Selected topics in nonlinear system identification
lecturer: Ivan Markovsky
Organization of the course
Topics
- Behavioral approach to system theory
- A general framework for system identification
- Linear time/parameter-varying systems
- Static nonlinear systems
- Representation and identification of nonlinear dynamic systems
- Data-driven control and signal processing
common theme: data modeling using matrix low-rank approximation
Individual projects
- Choose an individual mini-project.
- Work on the mini-project and prepare a written report (to be submitted by ).
- Give a 15-minute oral presentation to all course participants. The presentation is followed by 15 minutes of questions, discussion, and feedback.
Exam
If you complete the mini-project and deliver a report of your work, you earn 10 points. The final mark (10–20) is determined by the quality of the report, oral presentation, and participation in the discussions.
Behavioral approach to system theory
A prerequisite for the new material on nonlinear system identification is a working knowledge of linear time-invariant systems theory. Why is this? On the one hand, the class of linear time-invariant systems is a subclass of the class of nonlinear systems. Therefore, understanding of the latter implies understanding of the former. On the other hand, many results for nonlinear systems are inspired by, and are often direct generalizations of, corresponding results for linear systems. For example, the three main types of representations (Volterra series, nonlinear ARX, and nonlinear state space) correspond to the impulse response (convolution), ARX, and state space representations of linear time-invariant systems.
Linear time-invariant systems theory
We will often use the following concepts:
- behavioral definition of a dynamical system; properties of linearity, time-invariance, and finite dimensionality;
- representations of linear time-invariant systems—convolution, transfer function, state space;
- links among representations—the realization problem.
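As a small illustration of the link between the state space and convolution representations, the following sketch computes the impulse response (Markov parameters) \(H(t) = CA^{t-1}B\) of a state space model and checks that the rank of the Hankel matrix built from them equals the order. The matrices are hypothetical and chosen only for illustration; the rank property holds for minimal (controllable and observable) representations.

```python
import numpy as np

# Hypothetical minimal second-order single-input single-output model.
A = np.array([[0.5, 1.0], [0.0, -0.3]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

# Markov parameters H(t) = C A^(t-1) B, for t = 1, ..., 9.
h = [(C @ np.linalg.matrix_power(A, t - 1) @ B).item() for t in range(1, 10)]

# 4 x 4 Hankel matrix with entries H(i + j + 1), i, j = 0, ..., 3.
hankel = np.array([[h[i + j] for j in range(4)] for i in range(4)])

# For a minimal model, the rank of the Hankel matrix equals the order.
order = np.linalg.matrix_rank(hankel)
```

The realization problem goes in the opposite direction: it recovers a state space model from the Markov parameters, for example from a rank-revealing factorization of this Hankel matrix.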
For detailed notes, see:
- I. Markovsky, Exact and approximate modeling in the behavioral setting, Chapter 7
- J.-W. Polderman and J. Willems, Introduction to mathematical systems theory
- D. Luenberger, Introduction to dynamical systems: Theory, models and applications
Three types of problems are discussed in engineering mathematics:
- analysis problems, e.g., establishing stability, controllability, and observability properties of a system
- synthesis problems, e.g., design of an observer and controller for a given system, and
- identification problems, i.e., deriving a model of the system from observed trajectories.
This course focuses on the identification problem. Solving identification problems, however, requires knowledge of analysis. Also the product of identification—the model—is used in analysis and synthesis. There are deep connections and interactions among the three problem areas.
Problems
- (Testing if a system is static linear)
- You are given a system as a black-box or executable code of a software function. You don't know what is within the black-box (or you cannot see the code of the function) but you can do an arbitrary experiment with the system, i.e., you can observe its response. Assume that the variables are partitioned into inputs that you can choose and outputs that you observe.
- What experiments would you do in order to find out whether the system is static linear?
- How would you infer from the observations if the system is static linear?
- What if the observations are corrupted by noise?
- Consider, now, the case when you are given (noisy) data \(\big\{w^{(1)},\ldots,w^{(N)}\big\}\) of \(N\) experiments with the system, i.e., you have no control over the experiment. How would you infer from the observations if the system is static linear?
- (Testing if a system is linear time-invariant)
- The setup is the same as in problem 1: you are given a system as a black-box and you are allowed to do experiments in order to find out what type of system it is. In this case, the properties of interest are linearity and time-invariance, i.e., the system may be dynamic. What experiments would you do and how would you infer from the observations if the system is linear time-invariant? Consider separately the cases of exact and noisy data.
- Consider, now, the case when you are given trajectories \(\big\{w^{(1)},\ldots,w^{(N)}\big\}\) \[ w^{(i)} = \big( w^{(i)}(1),\ldots,w^{(i)}(T) \big) \] of the system, i.e., you have no control over the experiment. How would you infer from the observations if the system is linear time-invariant? Consider separately the cases of exact and noisy data.
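One possible way to approach the "given data" versions of both problems, in line with the low-rank theme of the course, is a rank test. This is a sketch under stated assumptions, not a full solution: the example systems, the tolerance, and the helper names `numerical_rank` and `block_hankel` are hypothetical. For exact data of a static linear system, the matrix of stacked data points is rank deficient; for an exact trajectory of a linear time-invariant system with \(m\) inputs and order \(n\), the block Hankel matrix with \(L > n\) block rows has rank at most \(mL + n\). With noisy data, one would look at a gap in the singular values instead of an exact rank.

```python
import numpy as np

def numerical_rank(M, tol=1e-8):
    """Number of singular values above tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def block_hankel(w, L):
    """Block Hankel matrix with L block rows of a trajectory w (T x q array)."""
    T, q = w.shape
    return np.hstack([w[i:i + L].reshape(-1, 1) for i in range(T - L + 1)])

rng = np.random.default_rng(0)
N = 50
u = rng.standard_normal(N)

# Static case: data points w = (u, y) stacked as columns of a 2 x N matrix.
W_lin = np.vstack([u, 2.0 * u])      # hypothetical static linear system y = 2u
W_nl = np.vstack([u, u ** 2])        # hypothetical static nonlinear system y = u^2
rank_lin = numerical_rank(W_lin)
rank_nl = numerical_rank(W_nl)

# Dynamic case: trajectory of the hypothetical system y(t+1) = 0.5 y(t) + u(t).
y = np.zeros(N)
for t in range(N - 1):
    y[t + 1] = 0.5 * y[t] + u[t]
w = np.column_stack([u, y])          # T x q trajectory with q = 2

L = 3                                # number of block rows, larger than the order
rank_lti = numerical_rank(block_hankel(w, L))   # expected: m L + n = 3 + 1
```

Rank deficiency is a necessary condition for the data to be consistent with a linear model of bounded complexity; whether it is also conclusive depends on how informative the data is (e.g., on the input being sufficiently rich).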
A general framework for system identification
The three main players in system identification are the data, the model class, and the approximation criterion. The data is a finite set of time series. The model class is a set of candidate models. The approximation criterion is a measure for the distance from the data to a model.
In system identification, we aim to find a model from given data. This problem, however, is ill-posed: we need additional prior knowledge about the model. This prior knowledge is the model class and the approximation criterion. The model class imposes a hard constraint: the model must belong to the model class. The approximation criterion imposes a soft constraint: the model should minimize the approximation error.
If there are multiple models that achieve the same approximation error, we prefer the least complex one. This is the principle of Occam's razor. In fact, system identification is a trade-off between fitting accuracy and model complexity. Indeed, with a finite amount of data (which is the case in practice), exact fit is always possible by making the model complexity sufficiently high.
In the case of linear time-invariant systems, the model complexity is measured by a pair of natural numbers: the number of inputs and the order of the system. Similar characterization of model complexity does not exist for general nonlinear models.
Exact identification aims to find the least complicated model that fits the data. Although it is an academic problem (one can argue that it never occurs in practice due to disturbances and measurement noise), it is important pedagogically and leads to practical approximate identification methods. In this course, we will always start with the "exact" version of the problems before treating the approximate ones.
Problems
(Exact modeling of an input/output static polynomial system, a.k.a., polynomial interpolation)
Given a set of \(N\) data points \(\big\{w^{(1)},\ldots,w^{(N)}\big\} \subset \mathbb{R}^2\), where \(w^{(i)} = \big( u^{(i)}, y^{(i)}\big)\), find a polynomial static model \[\mathcal{B} = \{\, (u, y) \ | \ y = p_0u^0 + p_1 u^1 + \cdots + p_d u^d \,\} \] that fits the data exactly.
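A minimal sketch of one standard approach to this problem: with \(N\) data points that have distinct \(u^{(i)}\), choosing the degree \(d = N - 1\) gives a square Vandermonde system for the coefficients \(p_0, \ldots, p_d\). The data values below are hypothetical.

```python
import numpy as np

# Hypothetical data: N = 4 points sampled from y = 1 - 2u + 3u^3.
u = np.array([-1.0, 0.0, 1.0, 2.0])
y = 1 - 2 * u + 3 * u ** 3

d = len(u) - 1                              # degree N - 1 gives a square system
V = np.vander(u, d + 1, increasing=True)    # columns u^0, u^1, ..., u^d
p = np.linalg.solve(V, y)                   # coefficients p_0, ..., p_d

fit_error = np.max(np.abs(V @ p - y))       # exact fit up to rounding errors
```

This also illustrates the remark on complexity above: an exact fit is always achievable by taking the degree high enough, so the interesting question is what the smallest degree is that still fits the data.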
(Exact modeling of a static polynomial system in the behavioral setting)
Given a set of \(N\) data points \(\big\{w^{(1)},\ldots,w^{(N)}\big\} \subset \mathbb{R}^2\), find a polynomial static model \[\mathcal{B} = \{\, w \ | \ p_{00} w_1^0 w_2^0 + p_{10} w_1^1 w_2^0 + p_{01} w_1^0 w_2^1 + p_{11} w_1^1 w_2^1 + p_{20} w_1^2 w_2^0 + p_{02} w_1^0 w_2^2 = 0 \,\}\] that fits the data exactly.
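In the behavioral setting no variable is singled out as an output, so the coefficient vector is determined only up to scaling and can be sought in the left null space of the matrix of monomials evaluated at the data points. The sketch below assumes hypothetical exact data on the unit circle and uses the singular value decomposition to compute the null space; the helper name `monomials` is an assumption.

```python
import numpy as np

def monomials(w1, w2):
    """Rows 1, w1, w2, w1*w2, w1^2, w2^2, matching p00, p10, p01, p11, p20, p02."""
    return np.vstack([np.ones_like(w1), w1, w2, w1 * w2, w1 ** 2, w2 ** 2])

# Hypothetical exact data: N = 8 points on the unit circle w1^2 + w2^2 = 1.
angles = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
w1, w2 = np.cos(angles), np.sin(angles)

Phi = monomials(w1, w2)                  # 6 x N matrix of monomials
U, s, Vt = np.linalg.svd(Phi)

# The left singular vector of the smallest singular value satisfies p^T Phi = 0.
p = U[:, -1]
p = p / p[4]                             # fix the scaling so that p20 = 1
```

After the normalization, `p` describes the curve \(-1 + w_1^2 + w_2^2 = 0\). With noisy data the smallest singular value is no longer zero, and the same computation gives an approximate (unstructured low-rank) fit.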
Linear time/parameter-varying systems
There are two main paths of generalizing the linear time-invariant model class:
- allowing for time-variation (i.e., relaxing the time-invariance property), and
- allowing for nonlinearities (i.e., relaxing the linearity assumption).
In this lecture, we begin with the first generalization, which leads to the class of linear time-varying systems and the related class of linear parameter-varying systems.
Linear time-varying systems
- linear differential/difference equation: the coefficients are functions of time
- state-space representation: the \(A,B,C,D\) matrices are functions of time
- autonomous case and state transition matrix \(\Phi(t_2, t_1) := A(t_2-1)A(t_2-2)\cdots A(t_1)\), for \(t_2>t_1\), with \(\Phi(t, t) := I\)
- driven response \[ x(t) = \Phi(t,0)x(0) + \underbrace{\begin{bmatrix} \Phi(t, 1)B(0) & \Phi(t, 2)B(1) & \cdots & B(t-1) \end{bmatrix}}_{\Gamma(t,0)} \begin{bmatrix} u(0)\\ u(1)\\ \vdots\\ u(t-1) \end{bmatrix} \]
- similarities and differences with the linear time-invariant case
- the notions of poles and zeros are no longer relevant
- stability analysis is more complicated (requires construction of a Lyapunov function)
- linear time-varying systems are essentially nonparametric: the number of parameters is \(O(T)\), where \(T\) is the number of data points
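The driven response formula above can be checked numerically. The following sketch simulates a hypothetical time-varying system \(x(t+1) = A(t)x(t) + B(t)u(t)\) recursively and compares the result with the closed form \(\Phi(t,0)x(0) + \Gamma(t,0)\,[u(0); \ldots; u(t-1)]\). The functions `A`, `B`, and `transition` are illustrative assumptions.

```python
import numpy as np

def A(t):
    """Hypothetical time-varying state matrix."""
    return np.array([[0.5, 0.1 * t], [0.0, 0.8]])

def B(t):
    """Hypothetical time-varying input matrix."""
    return np.array([[1.0], [np.cos(t)]])

def transition(t2, t1):
    """Phi(t2, t1) = A(t2-1) A(t2-2) ... A(t1), with Phi(t, t) = I."""
    Phi = np.eye(2)
    for t in range(t1, t2):
        Phi = A(t) @ Phi
    return Phi

rng = np.random.default_rng(0)
T = 6
x0 = rng.standard_normal(2)
u = rng.standard_normal(T)

# Recursive simulation of x(t+1) = A(t) x(t) + B(t) u(t).
x = x0.copy()
for t in range(T):
    x = A(t) @ x + B(t)[:, 0] * u[t]

# Closed form with Gamma(T, 0) = [Phi(T,1)B(0)  Phi(T,2)B(1)  ...  B(T-1)].
Gamma = np.hstack([transition(T, k + 1) @ B(k) for k in range(T)])
x_closed = transition(T, 0) @ x0 + Gamma @ u
```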
Exercises
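- Show that if \(\|A(t)\| \leq \gamma < 1\), for all \(t\), the autonomous system \(x(t+1) = A(t)x(t)\) is asymptotically stable, i.e., \(x(t) \to 0\), for any \(x(0)\). (The uniform bound \(\gamma\) is needed: \(\|A(t)\| < 1\) alone allows \(\|A(t)\| \to 1\), in which case \(x(t)\) need not converge to zero.)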
- Show that \[ \begin{bmatrix} y(0)\\ y(1)\\ \vdots\\ y(t) \end{bmatrix} = \mathcal{O}(0,t)x(0) + \mathcal{G}(0,t) \begin{bmatrix} u(0)\\ u(1)\\ \vdots\\ u(t) \end{bmatrix},\] for some \(\mathcal{O}(0,t)\) and \(\mathcal{G}(0,t)\).
Identification of linear periodically time-varying systems
- when the coefficients vary periodically in time, the number of parameters is fixed (independent of \(T\))
- a "lifting" technique reduces a linear periodically time-varying system to a linear time-invariant system
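The idea behind lifting can be sketched in the simplest, autonomous, case: if \(A(t+P) = A(t)\), the state sampled once per period evolves under a constant matrix (the product of \(A(t)\) over one period, often called the monodromy matrix). The period and the matrices below are hypothetical; the general lifting construction additionally stacks the inputs and outputs over one period.

```python
import numpy as np

P = 3                                        # period, assumed known
A_seq = [np.array([[0.9, 0.2], [0.0, 0.5]]),
         np.array([[0.3, 0.0], [0.4, 1.1]]),
         np.array([[1.0, -0.5], [0.2, 0.6]])]

def A(t):
    """Periodic state matrix, A(t + P) = A(t)."""
    return A_seq[t % P]

# Product of A(t) over one period: M = A(P-1) ... A(1) A(0).
M = np.eye(2)
for t in range(P):
    M = A(t) @ M

# Simulate the periodically time-varying system over K periods.
x0 = np.array([1.0, -1.0])
K = 4
x = x0.copy()
for t in range(K * P):
    x = A(t) @ x

# The state sampled every P steps follows the time-invariant recursion z(k+1) = M z(k).
x_lifted = np.linalg.matrix_power(M, K) @ x0
```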
Multiple-model system modeling and estimation
In order to capture the global behavior of a strongly nonlinear system, a single linear time-invariant model is not sufficient. If the system has a number of equilibrium points, it is reasonable to expect that "most of the time" the system is in the neighborhood of these points. More generally, we have a number of operating points (these can be equilibrium points as well as other points of interest) that represent well the global nonlinear behavior.
Linearization around the set of operating points leads to a set of linear time-invariant models (local models) that are "good" approximations of the original nonlinear system in the neighborhoods of the operating points. Such a system is called a multiple model system. As an analogy, consider piecewise linear approximation of a nonlinear function. The number and location of the operating points are design parameters in the construction of the approximation.
The approximation of a nonlinear system by a multiple model system can be done by switching from one local model to another depending on the state of the system, or by "blending" all models, i.e., taking a linear combination of the local models. The first approach leads to a switched system, which is a type of hybrid system: continuous/discrete-time dynamics of the local models and discrete-event dynamics of the switching rule. The second approach leads to a parameter-dependent system.
(figure: ./mm-approx.pdf)
The parameter is the vector of the coefficients in the linear combination. We will consider next the second approach.
The parameter of the multiple model system can be used for system approximation, estimation, and control. In the system approximation problem, a system or input/output data is given, and the goal is to find the value of the parameter that minimizes the error between the multiple model system and the given system. In the estimation problem, a linear combination of a bank of local estimators is taken, and the parameters of the linear combination are optimized to achieve the smallest estimation error.
(figure: ./mm-est.pdf)
Finally, in the control problem, the linear combination is formed from the control signals of local controllers, and the objective is to select the optimal coefficients according to a specified control objective.
(figure: ./mm-ctr.pdf)
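For the system approximation problem with a constant parameter, finding the blending coefficients from input/output data reduces to linear least squares once the outputs of the local models are simulated. The sketch below is an illustration under simplifying assumptions: the two first-order local models, the noise level, and the helper `simulate_first_order` are hypothetical, and the "given system" is itself generated as a blend so that the result can be checked.

```python
import numpy as np

def simulate_first_order(a, b, u):
    """Output of the local model y(t+1) = a y(t) + b u(t), with y(0) = 0."""
    y = np.zeros(len(u))
    for t in range(len(u) - 1):
        y[t + 1] = a * y[t] + b * u[t]
    return y

rng = np.random.default_rng(0)
u = rng.standard_normal(200)

# Bank of hypothetical local models (a_i, b_i), all driven by the same input.
local_models = [(0.2, 1.0), (0.9, 0.5)]
Y = np.column_stack([simulate_first_order(a, b, u) for a, b in local_models])

# Data from the "given system": here a blend with known coefficients plus noise.
theta_true = np.array([0.3, 0.7])
y = Y @ theta_true + 0.01 * rng.standard_normal(len(u))

# Blending coefficients minimizing the output error in the least squares sense.
theta, *_ = np.linalg.lstsq(Y, y, rcond=None)
```

A state- or time-dependent parameter would make the blended system genuinely nonlinear or time-varying; the constant-parameter case shown here is only the simplest instance.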
Static nonlinear systems
- a static nonlinear system identification problem is a curve fitting problem
- the problem is linear in the coefficients; however, when all variables are noisy (errors-in-variables setting), the ordinary least squares estimator is biased
- it is possible to correct for the bias, resulting in a consistent estimator
- polynomial decoupling
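The bias of ordinary least squares and one way of correcting it can be illustrated on a toy example. The sketch assumes the hypothetical model \(y = p\,u^2\) with \(p = 1\), a noise-free output, and an input measured with zero-mean Gaussian noise of known variance \(\sigma^2\). Under these assumptions \(\mathbf{E}[\tilde u^2 - \sigma^2] = u^2\) and \(\mathbf{E}[\tilde u^4 - 6\sigma^2\tilde u^2 + 3\sigma^4] = u^4\), so replacing the noisy monomials by these corrected versions in the normal equation removes the bias asymptotically.

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 400_000, 0.3

u = rng.uniform(-1.0, 1.0, N)                   # true (unobserved) input
y = u ** 2                                      # true model y = p u^2 with p = 1
u_noisy = u + sigma * rng.standard_normal(N)    # measured input

# Ordinary least squares with the noisy regressor u_noisy^2: biased estimate.
p_ols = np.sum(u_noisy ** 2 * y) / np.sum(u_noisy ** 4)

# Bias-corrected estimate (assumes Gaussian noise with known sigma).
r2 = u_noisy ** 2 - sigma ** 2
r4 = u_noisy ** 4 - 6 * sigma ** 2 * u_noisy ** 2 + 3 * sigma ** 4
p_corrected = np.sum(r2 * y) / np.sum(r4)
```

In this setup the ordinary least squares estimate stays well below the true value no matter how many data points are used, while the corrected estimate approaches it as \(N\) grows.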
Representations of nonlinear dynamic systems
Volterra series
A Volterra series expansion is a non-parametric system description that has a universal approximation property. It generalizes the concept of the finite impulse response to the nonlinear case, in much the same way as a Taylor series expansion for function approximation. In simple terms, a Volterra series is nothing more than a polynomial function of time-shifted input signals. The output \(y\) of a system can hence be written as the finite sum \[y(t) =\sum_{i=1}^d \, \sum_{\tau_1} \sum_{\tau_2} \cdots \sum_{\tau_i} H_i(\tau_1, \tau_2, \ldots, \tau_i)\, u(t - \tau_1)\, u(t - \tau_2)\, \cdots u(t - \tau_i) \] where \(u\) denotes the input and \(H_i\) is the \(i\)th-order Volterra kernel. A major advantage is that estimating the Volterra kernel coefficients is a linear problem. However, the number of coefficients grows at least combinatorially with the system memory (number of lags) and order.
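Both the linearity of the estimation problem and the growth of the number of coefficients can be seen in a small sketch. It assumes symmetric kernels, so that only distinct monomials of the lagged inputs are used, giving \(\sum_{i=1}^{d} \binom{m+i-1}{i}\) coefficients for memory \(m\) and order \(d\). The example system and the helper `volterra_regressors` are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement
from math import comb

def volterra_regressors(u, memory, degree):
    """Columns are the products u(t - tau_1) ... u(t - tau_i), for i = 1, ..., degree."""
    T = len(u)
    # lagged[:, tau] holds u(t - tau) for t = memory - 1, ..., T - 1.
    lagged = np.column_stack([u[memory - 1 - tau:T - tau] for tau in range(memory)])
    cols = []
    for i in range(1, degree + 1):
        for taus in combinations_with_replacement(range(memory), i):
            cols.append(np.prod(lagged[:, list(taus)], axis=1))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
u = rng.standard_normal(100)
memory, degree = 3, 2

# Hypothetical system: y(t) = u(t) + 0.5 u(t-1) - 0.2 u(t) u(t-2).
t = np.arange(memory - 1, len(u))
y = u[t] + 0.5 * u[t - 1] - 0.2 * u[t] * u[t - 2]

# Estimating the kernel coefficients is a linear least squares problem.
Phi = volterra_regressors(u, memory, degree)
h, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Number of distinct coefficients for the given memory and degree.
n_coeff = sum(comb(memory + i - 1, i) for i in range(1, degree + 1))
```

With memory 3 and degree 2 there are only 9 coefficients, but the count grows quickly: the same formula gives 65 coefficients for memory 10 and degree 2, and 285 for memory 10 and degree 3.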
Data-driven control and signal processing
Mini-projects
- Realization and identification of autonomous linear periodically time-varying systems
- Multiple model adaptive filtering
- Static nonlinear system identification
- Nonlinear subspace identification
- Nonlinear auto-regressive-exogenous system identification
- Autonomous Wiener system analysis and identification
- Data-driven control generalization to nonlinear systems