IDENT package demo

The IDENT package provides rapper functions to the SLRA package for the purpose of linear time-invariant system identification. For a description of the identification problem, solution method, and software, see the references. This document gives a demo. After brief installed and usage information, we show numerical examples that illustrate features of the package such as identification of unstable models, use of data from multiple experiments, and missing data estimation. We validate the results obtained and compare the performance with the ones of the System Identification Toolbox.

1 Installation and usage

The software is hosted in GitHub. It can be download as a single zip file, or installed in a sub-directory to the current directory by executing the following commands into MATLAB:

%% Installation
unzip('https://github.com/slra/slra/zipball/master')
addpath(fullfile(cd, 'slra-slra-3e77deb'))

The package has two functions:

[sys, info, wh] = ident(w, m, ell, opt) — for identification of a model, and
[M, wh] = misfit(w, sys, opt) — for validation of a model.

The model sys is a discrete-time linear time-invariant system and the data w is a set of trajectories. The function misfit(w, sys) computes the projection of the data w on a given model sys and returns as a result the 2-norm of the error signal w - wh, called misfit of the data with respect to the model. The function ident(w, m, ell) minimizes the misfit misfit(w, sys) over all linear time-invariant systems with m inputs and lag at most ell.

The input/output parameters are:

data w — set of time series \(\{\, w^1, \ldots, w^N \,\}\), \(w^k \in (\mathbb{R} \cup \{\mathtt{NaN}\}^q)^{T_k}\), specified by
- \(T\times q\) matrix, in case of a \(q\)-variate time series with \(T\) samples,
- \(T\times q \times N\) array, in case of \(N\), \(q\)-variate time series with \(T\) samples, or
- cell array of \(T_k\times q\) matrices, in case of vector time series with different number of samples.
Elements of w with value NaN indicate missing values.
model sys — discrete-time linear time-invariant system (an ss object).
data approximation wh — the projection of w on sys.
misfit M \(=\) norm(w - wh).
number of inputs m and lag ell — specify the model's complexity: order of sys \(\leq (\mathtt{q}-\mathtt{m}) \star \mathtt{ell}\).
optional parameters opt
- opt.exct — exact/fixed variables (vector of integers between 1 and \(q\)),
- opt.wini — initial conditions (set equal to zero for zero initial conditions)
- opt.sys0 — initial approximation (a linear time-invariant system with m inputs and lag ell)
- opt.n — order of the model.
output information info
- opt.M — misfit norm(w - wh).
- opt.iter — number of iterations.

2 Examples setup

The following sections show numerical examples illustrating features of the ident toolbox. The data that is used in the examples is generated in the output error setting. By default, the ident function treats all variables as inexact (noisy), which corresponds to the errors-in-variables setting. However, the optional parameter opt.exct allows specification of exact (noise free) variables. Thus, setting the inputs as exact, results in output identification.

clear all, rng(1)
%
%% simulation parameters
m = 1; p = 1; ell = 2; % # of inputs, # of outputs, and lag
T = 500; s = 0.1; % # of samples and noise level (relative to the true signal)
%
%% IDENT package options
opt_oe.exct = 1:m; % fixed inputs = output error identification
opt_eiv.exct = []; % default, errors-in-variables setup
%
%% true and noisy data
n = ell * p; q = m + p;
sys0 = drss(n, p, m);                  % random "true" system
u0 = rand(T, m); xini0 = rand(n, 1);   % random "true" trajectory
y0 = lsim(sys0, u0, [], xini0); u = u0;
yt = randn(T, p);
y = y0 + s * norm(y0) * yt / norm(yt); % output error
w0 = [u0 y0]; w = [u y];

3 Equivalence with PEM in the output error case

The output error identification problem can be solved also by the pem function from the System Identification Toolbox of \matlab. The following simulation example shows that for output error identification, the ident and pem functions obtain equivalent results.

%% ident and pem give the same model in the OE setting
tic, sysh_ident = ident(w, m, ell, opt_oe); t_ident = toc;
tic, sysh_pem = pem(iddata(y, u), n); t_pem = toc;
[Yh, M] = compare(iddata(y, u), idss(sysh_ident), sysh_pem);
[{'' 'fit' 'time'}; {'ident'; 'pem'} ...
 num2cell([[mean(M{1}); mean(M{2})] [t_ident; t_pem]])]

ans = 

    ''         'fit'        'time'  
    'ident'    [78.9405]    [0.0851]
    'pem'      [78.9363]    [1.3349]

The estimation errors (computed by the compare function or the System Identification Toolbox) are close to each other. (They are not exactly the same due to different default convergence tolerances of ident and pem.) The execution time for the ident function, however, is an order of magnitude smaller than the one of pem. This is misleading as a comparison of the computational efficiency of ident and pem, because the example is small size and in this case the execution time of pem is dominated by the parameter checks rather than the actual computation of the model. Even for larger size problems, however, the ident function tends to be faster than pem due to the C++ implementation of the underlying computational method \cite{slra-software}.

4 Permutation of variables

By default the ident package treats all variables as inexact (errors-in-variables setup). Moreover, no a priori fixed input/output partitioning of the variables is assumed (behavioral setup). For convenience of the user, however, the model sysh, obtained by ident, is returned in an input/state/output form (ss object), where the first m variables are inputs and the remaining p variables are outputs. In order to illustrate the point that ident does not distinguish a priori between inputs and outputs, we perform next the following experiment:

identify a model from the data \((w_1,w_2)\),
identify a model from the data \((w_2,w_1)\).

%% the identified model for [y u] is the inverse of the one for [u y]
[sysh1, info1] = ident(w, m, ell, opt_eiv);
[sysh2, info2] = ident(fliplr(w), m, ell, opt_eiv);
[{'' 'sysh1' 'sysh2'}; ['misfit' num2cell([info1.M info2.M])]]

ans = 

    ''          'sysh1'     'sysh2' 
    'misfit'    [1.4805]    [1.4805]

The result shows that the two models achieve the same estimation error. This is due to the fact that the misfit criterion is symmetric with respect to the variables. Moreover, the two models are inverse of each other

[{'zero(sysh1)' 'pole(sysh2)' 'pole(sysh1)' 'zero(sysh2)'}
 num2cell(sort([zero(sysh1) pole(sysh2) pole(sysh1) zero(sysh2)]))]

ans = 

    'zero(sysh1)'    'pole(sysh2)'    'pole(sysh1)'         'zero(sysh2)'     
    [     0.0442]    [     0.0419]    [0.1292 - 0.2066i]    [0.1281 - 0.2063i]
    [    61.1458]    [    61.0538]    [0.1292 + 0.2066i]    [0.1281 + 0.2063i]

5 Unstable system: time-reversed trajectory

Apart from treating all variables on an equal footing, the ident package does not impose a stability constraint. This is another important difference with other identification packages, such as the System Identification Toolbox of \matlab, which, by default imposes a stability constraint. The reason for making identification of a stable model a default option is that 1) in the classical stochastic setting the data is assumed stationary which implies stability, 2) unstable models may cause numerical issues in the evaluation of the likelihood when using the classical Kalman filter. In order to illustrate this point consider identification from the time-reversed trajectory.

%% pem without stability constraint fails when the model is unstable
opt_pem = ssestOptions; opt_pem.Advanced.StabilityThreshold.z = inf;
try 
  sysh_pem = pem(iddata(flipud(y0), flipud(u0)), n, opt_pem, 'dist', 'none');
  disp('PEM succeeds')
catch
  disp('PEM fails')
end
PEM fails

Time reversal has the effect of flipping the model's poles with respect to the unit circle, so that if the "forward-time" model is stable, the "backward-time" model is unstable.

%% the model for the reversed time data has reciprocal poles
sysh1 = ident(w0, m, ell, opt_eiv);
sysh2 = ident(flipud(w0), m, ell, opt_eiv);
[{'1 ./ eig(sysh1)' 'eig(sysh3)'};
 num2cell(sort([1 ./ eig(sysh1) eig(sysh2)]))]

ans = 

    '1 ./ eig(sysh1)'     'eig(sysh3)'      
    [2.5039 - 4.7492i]    [2.5039 - 4.7492i]
    [2.5039 + 4.7492i]    [2.5039 + 4.7492i]

6 Trajectory of minimal length

The minimum number of samples needed to identify a single-input single-model linear time-invariant system of order \(n\) is \(T_{\min} := 2n + 1\). The code below finds the minimum number of samples needed for ident and pem to find a model.

%% pem needs more samples than ident for exact identification
Tmin = 2 * ell + 1;
%
%% IDENT
for Tmin_ident = Tmin:14; 
  try 
    sysh_ident = ident(w0(1:Tmin_ident, :), m, ell, opt_oe); 
    if (norm(sys0 - sysh_ident) < 1e-5), break, end
  end
end
%
% PEM
for Tmin_pem = Tmin:14; 
  try 
    sysh_pem = pem(iddata(y0(1:Tmin_pem, :), u0(1:Tmin_pem, :)), n, 'dist', 'none');
    if (norm(sys0 - sysh_pem) < 1e-5), break, end
  end
end
%
%% results
[{'' 'ident' 'pem'}; ['Tmin' num2cell([Tmin_ident Tmin_pem])]]

ans = 

    ''        'ident'    'pem'
    'Tmin'    [    7]    [ 13]

It turns out that pem needs more samples than ident. However, ident does not achieve the theoretical bound of \(T_{\min} := 2n + 1\) samples.

7 Multiple short trajectories

In some cases the data consists of multiple trajectories, obtained in different experiments (initial conditions and/or inputs). Both pem and ident can use data from multiple experiments. As shown in \cite{slra-consistency}, consistent estimation of a model can be achieved not only asymptotically in the number of samples of a single experiment but also asymptotically in the number of experiments. In the latter case, the observed trajectories may be finite. Identification from multiple "short" trajectories is of interest, for example, when the system is unstable. Also, when the system is stable autonomous, consistent estimation can not be achieved from a single trajectory, but is possible by using multiple trajectories. In the following simulation example, we illustrate this fact: multiple "short" trajectories give equivalent "information" about the model as a single "long" trajectory.

%% multiple "short" trajectories give equivalent information about the model
%% as a single "long" trajectory
%
%% generate N trajectories with Tshort samples, where N * Nshort = T
Tshort = 15; N = round(T / Tshort);
u_mult = {}; y_mult = {}; w_mult = [];
for k = 1:N,
  u0 = rand(Tshort, m); xini0 = rand(n, 1); 
  y0 = lsim(sys0, u0, [], xini0); 
  yt = randn(Tshort, p); y_mult{k} = y0 + s * norm(y0) * yt / norm(yt); 
  u_mult{k} = u0; w_mult(:, :, k) = [u0 y_mult{k}]; 
end
%
%% estimate a model from the multiple short trajectories
tic, sysh_ident = ident(w_mult, m, ell, opt_oe); t_ident = toc;
tic, sysh_pem = pem(iddata(y_mult, u_mult), n, 'dist', 'none'); t_pem = toc;
%
%% compare the identified models by pem and ident
[Yh, M] = compare(iddata(y_mult, u_mult), idss(sysh_ident), sysh_pem);
[{'' 'fit' 'time'};
 {'ident'; 'pem'} num2cell([[mean(M{1}); mean(M{2})] [t_ident; t_pem]])]

ans = 

    ''         'fit'        'time'  
    'ident'    [86.4743]    [0.0704]
    'pem'      [86.7589]    [1.4642]

The result shows that the identified models achieve approximately the same accuracy, however, the ident function is faster than pem. Next, we compare the estimation accuracy of the true model sys0 when using one trajectory with \(T\) samples and \(N\) trajectories with \(T_\min\) samples, where the total number of samples are the same, i.e., \(T = N T_\min\).

% compare the identified model from the multiple trajectories
% with the one from an equivalent (N * Nshort = T) single trajectory
sysh1 = ident(w, m, ell, opt_oe);
[{'' '1 traj.' 'N traj.'}; '||sys0 - sysh||' ...
 num2cell([norm(sys0 - sysh1) norm(sys0 - sysh_ident)])]

ans = 

    ''                   '1 traj.'    'N traj.'
    '||sys0 - sysh||'    [ 0.0327]    [ 0.0241]

The result shows that the obtained accuracy is about the same which empirically confirms the claim that multiple "short" trajectories give equivalent information about the model as a single "long" trajectory.

8 Missing data

Our final example shows the feature of the ident package to estimate missing data. Missing data are marked with NaN's. They can be arbitrary distributed in time as well as in variables. No assumption is made about the missing data except from the basic one that they complete a full trajectory of a linear time-invariant system with bounded complexity. The ident function estimates the missing data from the underlying rank constraint (see \cite{slra-ext}). The System Identification Toolbox has a function misdata that allows estimation of missing data. misdata and the subspace method n4sid are used for the computation of initial approximation for both pem and ident.

%% randomly distributed missing data in time and variables
Tp = 100; Tm = round(0.2 * q * Tp);
Im = randperm(q * Tp, Tm);
wm = w(1:Tp, :); wm(Im) = NaN;
um = wm(:, 1:m); ym = wm(:, m + 1:end);
%
%% use misdata+n4sid for computation of initial approximation
tic,
sysh_n4sid = ss(n4sid(misdata(iddata(ym, um), 10), n, 'dist', 'none'));
t_n4sid = toc;
%
%% PEM
tic,
sysh_pem = pem(misdata(iddata(ym, um), 10), sysh_n4sid, 'dist', 'none');
t_pem = toc;
%
%% IDENT
opt_oe.sys0 = sysh_n4sid; % use the same initial approximation as pem
tic, sysh_ident = ident(wm, m, ell, opt_oe); t_ident = toc;

The identified models are validated against the true complete trajectory, using the compare function.

[Yh, M] = compare(iddata(y0, u0), sysh_n4sid, idss(sysh_ident), sysh_pem);
[{'' 'fit' 'time'}; {'n4sid'; 'ident'; 'pem'} ...
 num2cell([[mean(M{1}); mean(M{2}); mean(M{3})] [t_n4sid; t_ident; t_pem]])]

ans = 

    ''         'fit'        'time'  
    'n4sid'    [86.3034]    [8.2505]
    'ident'    [86.1149]    [0.0823]
    'pem'      [86.8504]    [8.7056]

9 References

I. Markovsky and K. Usevich. Software for weighted structured low-rank approximation. J. Comput. Appl. Math., 256:278-292, 2014. [ bib | DOI | pdf | .html | Abstract ]
I. Markovsky. A software package for system identification in the behavioral setting. Control Engineering Practice, 21:1422-1436, 2013. [ bib | DOI | pdf | .html | Abstract ]