Considering an inverted pendulum as the one depicted in the figure: a cart of mass M (which can move in one direction) has a pendulum of mass m attached to a massless rigid rod (which is constrained to move along an arc of a circle centered at the pivot point which is fixed on the cart); assuming x is the direction where the cart can move, L is the fixed length of the connecting rod, \theta is the angle it makes with the vertical axis, a friction with a coefficient d applied to the cart (considering this friction much greater than the friction on the pendulum which is then approximated to zero) and an force F = u applied to the cart in the x direction.
It is possible to derive the governing equations for the motion of the two mass, and a system of equations which can be solved to determine the position and the period of oscillation.
Using the non-variational formulation of Lagrange equations
\frac{\mathrm d}{\mathrm dt} \left(\frac{\partial L}{\partial \dot{q}^j}\right)-\frac{\partial L}{\partial q^j} = Q_j, \quad j \in \{1,\ldots, n\}
where Q_j are the generalized forces that do not have generalized potentials.
Three forced are applied:
As generalized coordinates, the displacement of the cart x and the angle \theta are chosen. The equations of motions becomes:
\begin{aligned} & \frac{\mathrm d}{\mathrm dt} \left(\frac{\partial L}{\partial \dot x}\right)-\frac{\partial L}{\partial x} = F_x \\ & \frac{\mathrm d}{\mathrm dt} \left(\frac{\partial L}{\partial \dot \theta}\right)-\frac{\partial L}{\partial \theta} = F_\theta \end{aligned}
Starting from the computation of the Lagrangian:
T - V = \frac{1}{2} \left(M\,v_1^2 + m\,v_2^2 \right) - m\,L\,g \, y_2
The only potential energy is the one associated with the pendulum, as the cart is moving horizontally.
The velocities (in the x direction \mathbf i and the y direction \mathbf j) can be expressed in function of x and theta as:
\begin{aligned} & x_1 = x \quad y_1 = 0 \\ & \mathbf v_1 = \dot x \mathbf i \\ & x_2 = x + L \, \sin(\theta), \quad y_2 = - L \, \cos(\theta) \\ & \mathbf v_2 = \left[ \dot x + L \, \cos(\theta)\dot \theta \right] \mathbf i + \left[L \, \sin(\theta)\dot \theta\right] \mathbf j \\ \end{aligned}
The potential energy (g is considered as the modulus of the gravitational acceleration, and therefore there is a minus sign) is:
V = - m\,L\,g \cos(\theta)
Using these expressions, the Lagrangian becomes:
\begin{aligned} T - V & = \frac{1}{2} \left(M\,\dot x^2 + m\left[(\dot x + L \, \cos(\theta)\dot \theta)^2 + (L\sin(\theta)\dot \theta) ^2 \right] \right) + m\,L\,g \cos(\theta) \\ & = \frac{1}{2} (M + m) \dot x^2 + m\,L \cos(\theta)\dot x \, \dot \theta + \frac{1}{2} m\,L^2 \, \dot \theta^2 + m\,L\,g\, \cos(\theta) \end{aligned}
It is now possible to compute the equations of motion. For x:
\begin{aligned} & \frac{\mathrm d}{\mathrm dt} \left(\frac{\partial L}{\partial \dot x}\right)-\frac{\partial L}{\partial x} = F_x \\ & \frac{\mathrm d}{\mathrm dt} \left( (M+m) \dot x + m\,L \cos(\theta) \dot \theta \right) = -\mu\, \dot x + u \\ & (M+m) \ddot x + m\,L \cos(\theta) \ddot \theta - m\,L \sin(\theta)\dot \theta^2 = -\mu\, \dot x + u \end{aligned}
For \theta:
\begin{aligned} & \frac{\mathrm d}{\mathrm dt} \left(\frac{\partial L}{\partial \dot \theta}\right)-\frac{\partial L}{\partial \theta} = F_\theta \\ & \frac{\mathrm d}{\mathrm dt} \left( m\,L \cos(\theta) \dot x + m\,L^2 \dot \theta \right) + m(L\,\dot x + L\,g)\sin(\theta) = 0 \\ & m\,L \cos(\theta) -m\,L\dot x \sin(\theta) + \ddot x + m\,L^2 \ddot \theta + m(L\,\dot x + L\,g)\sin(\theta) = 0 \\ & m\,L \left( \cos(\theta) \ddot x + L \ddot \theta + g \sin(\theta)\right) = 0 \end{aligned}
Therefore the motion of the system is:
\begin{aligned} & (M+m) \ddot x + m\,L \cos(\theta) \ddot \theta - m\,L \sin(\theta)\dot \theta^2 = -\mu\, \dot x + u \\ & m\,L \left( \cos(\theta) \ddot x + L \ddot \theta + g \sin(\theta)\right) = 0 \end{aligned}
It can be explicitly written as function of \ddot x and \ddot \theta:
\begin{aligned} & \ddot \theta = \frac{1}{L} (- \cos(\theta) \ddot x - g \sin(\theta))\\ & (M+m) \ddot x + m\,L \cos(\theta) \frac{1}{L} (- \cos(\theta) \ddot x - g \sin(\theta)) - m\,L \sin(\theta)\dot \theta^2 = -\mu\, \dot x + u \\ & (M+m) \ddot x - m \cos(\theta)^2 \ddot x - m\, g \cos(\theta)\sin(\theta) - m\,L \sin(\theta)\dot \theta^2 = -\mu\, \dot x + u \\ & \ddot x = \frac{m\, g \cos(\theta)\sin(\theta) + m\,L \sin(\theta)\dot \theta^2 -\mu\, \dot x + u} {M + m(1- \cos(\theta)^2} \\ & \ddot \theta = \frac{1}{L} \left(- \cos(\theta) \frac{m\, g \cos(\theta)\sin(\theta) + m\,L \sin(\theta)\dot \theta^2 -\mu\, \dot x + u} {(M + m(1- \cos(\theta)^2)} - g \sin(\theta)\right)\\ & \ddot \theta = \frac{1}{L} \left( \frac{-m\, g \cos(\theta)^2\sin(\theta) - m\,L \sin(\theta) \cos(\theta) \dot \theta^2 +\mu\, \cos(\theta) \, \dot x - \cos(\theta) u - g (M + m(1- \cos(\theta)^2) \sin(\theta)} {(M + m(1- \cos(\theta)^2)}\right)\\ & \ddot \theta = \frac{1}{L} \left( \frac{ - m\,L \sin(\theta) \cos(\theta) \dot \theta^2 +\mu\, \cos(\theta) \, \dot x - \cos(\theta) u - g (M + m) \sin(\theta)} {(M + m(1- \cos(\theta)^2)}\right)\\ & \ddot \theta = \frac{1}{L} \left( \frac{ - g (M + m) \sin(\theta) - \cos(\theta)(m\,L \sin(\theta) \dot \theta^2 - \mu\, \dot x + u)} {(M + m(1- \cos(\theta)^2)}\right) \end{aligned}
The system can be rewritten as a set of four equations of the first order setting v = \dot x and \omega = \dot \theta:
\begin{aligned} & \dot x = v \\ & \dot v = \frac{m\, g \cos(\theta)\sin(\theta) + m\,L \sin(\theta)\omega^2 - \mu\, v + u} {M + m(1- \cos(\theta)^2)} \\ & \dot \theta = \omega \\ & \dot \omega = \frac{ - g (M + m) \sin(\theta) - \cos(\theta)(m\,L \sin(\theta) \omega^2 - \mu\, v + u)} {L(M + m(1- \cos(\theta)^2)} \end{aligned}
Given a nonlinear input-output system:
\frac{\mathrm d}{\mathrm dt} \mathbf x = f(\mathbf x, \mathbf u)
it is possible to linearize the dynamics near a fix point (\bar {\mathbf x}, \bar {\mathbf u}) where f(\bar {\mathbf x}, \bar {\mathbf u}) = \mathbf 0 expanding with a Taylor series:
\mathbf{f}(\mathbf{x} + \Delta \mathbf x, \mathbf u + \Delta \mathbf u) = \mathbf{f}(\bar{\mathbf{x}}, \bar{\mathbf{u}}) + \frac{D\mathbf{f}}{D\mathbf{x}} \bigg\vert_{\bar{\mathbf{x}}, \bar{\mathbf u}} \Delta \mathbf{x} + \frac{D\mathbf{f}}{D\mathbf{u}} \bigg\vert_{\bar{\mathbf{x}}, \bar{\mathbf u}} \Delta \mathbf{u} + \cdots
The linearized system can be written as:
\frac{\mathrm d}{\mathrm dt} \mathbf x = \mathbf {A\,x} + \mathbf {B\,u}
Computing the Jacobian for the system \mathbf F(\mathbf x, \mathbf u):
\begin{aligned} & \frac{D \mathbf f}{D \mathbf x} = \mathbf A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & - \dfrac{\mu}{M + m(1- \cos(\theta)^2)} & \dfrac{\partial v}{\partial \theta} & \dfrac{2\, m\, L \sin(\theta)}{M + m(1- \cos(\theta)^2)}\\ 0 & 0 & 0 & 1 \\ 0 & \dfrac{- \cos(\theta)\mu}{L \left( M + m(1- \cos(\theta)^2) \right)} & \dfrac{\partial \omega}{\partial \theta} & \dfrac{-2 \cos(\theta)m\,L\, \sin(\theta)}{L \left( M + m(1- \cos(\theta)^2) \right)} \end{bmatrix} \\ \\ & \frac{D \mathbf f}{D \mathbf u} = \mathbf B = \begin{bmatrix} 0 \\ \dfrac{1}{M + m(1- \cos(\theta)^2)} \\ 0 \\ \dfrac{- \cos(\theta)}{L \left( M + m(1- \cos(\theta)^2) \right)}\end{bmatrix} \end{aligned}
The partial derivative respect of \theta for v and \omega are:
\begin{aligned} \dfrac{\partial v}{\partial \theta} = & \dfrac{-g \, m \sin (\theta)^2 + g \,m \cos (\theta)^2+L\, m \, \omega^2 \cos (\theta)}{m \left(1-\cos(\theta)^2\right)+M}\\ & -\frac{2 \,m \sin (\theta) \cos (\theta) (-\mu \,v+g\, m \sin (x) \cos (\theta)+L\, m \omega^2 \sin (\theta)+u)}{\left(m \left(1-\cos (\theta)^2\right)+M\right)^2} \\ \dfrac{\partial \omega}{\partial \theta} = & \dfrac{\sin (\theta) (-\mu\,v+L \, m\,\omega^2 \sin (\theta)+u)-g (m+M) \cos (\theta)-L\, m\,\omega^2 \cos (\theta)^2}{L \left(m \left(1-\cos (\theta)^2\right)+M\right)} \\ & -\frac{2 m \sin (\theta) \cos (\theta) (-\cos (\theta) (-\mu \,v+L\, m\, \omega^2 \sin (\theta)+u)-g (m+M) \sin (\theta))}{L \left(m \left(1-\cos(\theta)^2\right)+M\right)^2} \end{aligned}
The equilibrium points of the matrix \mathbf A are (x,v,\theta, \omega) = (0, 0, n \, \pi,0), n \in \mathbb Z, with even n = 0, 2, 4, \dots corresponding to the pendulum in the down position and the odd n = 1, 3, 5, \dots in the up position.
For x=0, v=0, \theta = 0, \omega = 0:
\begin {aligned} & \dfrac{\partial v}{\partial \theta} = \dfrac{m\,g + L \,m \,\omega }{M} = \dfrac{m\,g}{M} \\ & \dfrac{\partial \omega}{\partial \theta} = \dfrac{-( m + M)g - L \,m \,\omega }{L\,M} = \dfrac{-( m + M)g}{L\,M} \end{aligned}
For x=0, v=0, \theta = \pi, \omega = 0:
\begin {aligned} & \dfrac{\partial v}{\partial \theta} =\dfrac{m\,g + L \,m \,\omega }{M} = \dfrac{m\,g}{M} \\ & \dfrac{\partial \omega}{\partial \theta} = \dfrac{( m + M)g - L \,m \,\omega }{L\,M} = \dfrac{( m + M)g}{L\,M} \end{aligned}
With these information it is possible to linearize around these points.
For the up position:
\dfrac{\mathrm d}{\mathrm dt} \begin{bmatrix} x \\ v \\ \theta \\ \omega \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -\dfrac{\mu}{M} & \dfrac{m\,g}{M} & 0 \\[7pt] 0 & 0 & 0 & 1 \\ 0 & - \dfrac{\mu}{M\,L} & \dfrac{(m+M)g}{M\,L} & 0 \end{bmatrix} \begin{bmatrix} x \\ v \\ \theta \\ \omega \end{bmatrix} + \begin{bmatrix} 0 \\ \dfrac{1}{M} \\[7pt] 0 \\ \dfrac{1}{M\,L} \end{bmatrix} u
For the down position:
\dfrac{\mathrm d}{\mathrm dt} \begin{bmatrix} x \\ v \\ \theta \\ \omega \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -\dfrac{\mu}{M} & \dfrac{m\,g}{M} & 0 \\[7pt] 0 & 0 & 0 & 1 \\ 0 & \dfrac{\mu}{M\,L} & -\dfrac{(m+M)g}{M\,L} & 0 \end{bmatrix} \begin{bmatrix} x \\ v \\ \theta \\ \omega \end{bmatrix} + \begin{bmatrix} 0 \\ \dfrac{1}{M} \\[7pt] 0 \\ -\dfrac{1}{M\,L} \end{bmatrix} u
The control strategy of the system for the up condition can be computed numerically. The open-loop control is depending from the eigenvalues of \mathbf A.
Taking for example as values for the up position, with m =1.5, M=5 and L=1.5 and \mu = 0.75:
\begin{aligned} \mathbf A & = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -0.15 & 2.9420 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & -0.1 & 8.4991 & 0 \end{bmatrix}\\ \mathbf B & = \begin{bmatrix} 0 \\ 0.2 \\ 0 \\ 0.1333 \end{bmatrix} \\ \lambda_i(\mathbf A) & = \begin{bmatrix} 0 \\-2.9334 \\ -0.1153 \\ 2.8987 \end{bmatrix} \end{aligned}
Similarly, as example for the down position:
\begin{aligned} \mathbf A & = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2.9420 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -8.4991 & 0 \end{bmatrix}\\ \mathbf B & = \begin{bmatrix} 0 \\ 0.2 \\ 0 \\ -0.1333 \end{bmatrix} \\ \lambda_i(\mathbf A) & = \begin{bmatrix} 0 \\ 0 \\ 2.9153 \,i \\ -2.9153\,i \end{bmatrix} \end{aligned}
No open-loop control strategy can change the dynamics of the system, given the eigenvalues of \mathbf A with one eigenvalue positive in the former case (unstable system) and with pure imaginary in the latter (periodic system). However, with full-state control feedback, given by:
u = -\mathbf{K\,x}
The closed-loop system becomes:
\frac{\mathrm d}{\mathrm dt} \mathbf x = \mathbf {A\,x} + \mathbf {B\,u} = (\mathbf A - \mathbf{B \, K}) \mathbf x
With an opportune choice of \mathbf K it is possible to change the eigenvalues of the systems in a way they become stable; the first step is to check if the system is controllable at all, and that involve to check the rank of the controllability matrix (ctrb(A,B)
); since the rank of the matrix is 4 the system is controllable. Then it is possible to move these eigenvalues and place them with the real part negative (for example with values [-0.5,\,-0.7,\,-0.9,\,-1.1]^T) so that the system is now stable.
For the up position example:
\lambda_p = \begin{bmatrix} -0.5 \\ -0.7 \\ -0.9 \\ -1.1 \end{bmatrix} \quad \mathbf K = \begin{bmatrix} -0.2650 \\ -2.1939 \\ 92.1907 \\ 26.1659 \end{bmatrix}
For the down position:
\lambda_p = \begin{bmatrix} -0.5 \\ -0.7 \\ -0.9 \\ -1.1 \end{bmatrix} \quad \mathbf K = \begin{bmatrix} 0.2650 \\ 1.4439 \\ 36.0907 \\ -21.8341 \end{bmatrix}
There are better strategy to chose the placement of the eigenvectors in place of an arbitrary choice (for example it is then possible to specify given two matrices \mathbf Q and \mathbf R matrices for the cost function and design an \mathbf{LQR} controller gain matrix \mathbf K); the two cases were chosen as reference to compare with the results a genetic algorithm based on neural networks.
I implemented a program that, in addition to classical feedback control, uses a neural network (NN) trained through a genetic algorithm (GA) to optimize the control strategy for the inverted pendulum on a cart. The system receives the cart’s position (x) and velocity (\dot{x}), along with the pendulum’s angle (\theta) and angular velocity (\dot{\theta}) as inputs. These values are normalized by divisors (XDIVISOR for position and velocity, and TDIVISOR for angle and angular velocity) to ensure consistent scaling for the neural network.
The neural network processes these inputs to generate four output signals, which are used to compute the control input. The control input, scaled by a gain K_{\text{NN}}, adjusts the force applied to the cart to stabilize the pendulum. The control law combines the neural network outputs with a feedback mechanism to ensure the cart and pendulum remain close to their desired reference states.
The fitness function evaluates the performance of each NN in the population by calculating the accumulated error over a simulation period. This error is computed by measuring the absolute differences between the cart’s position and velocity, and the pendulum’s angle and angular velocity, against their reference values. A heavier penalty is applied to deviations in the pendulum’s angle (\theta) to prioritize stabilizing the pendulum. If the system exceeds a predefined threshold (i.e., the cart’s position or pendulum’s angle becomes too large), the fitness is penalized, assigning a maximum value to that network.
The genetic algorithm evolves the population over multiple generations. Each generation, the neural networks are ranked based on their fitness scores, and the best-performing networks are selected to generate the next population using crossover and mutation. This process ensures that the population continuously improves over time, with the best individuals being preserved through elitism.
In each generation, the network with the best fitness is used to update the weights and biases of the model. Periodically, the system saves the trained network to a file, ensuring progress is preserved. The neural network evolves alongside classical control strategies, ultimately enhancing the system’s ability to stabilize the pendulum, even in highly nonlinear or unstable conditions.
The complete code is available on GitHub. The repository contains detailed instructions on how to set up the environment, compile, run the training and the tests and it is available at the following link.