---
title: MPPI
layout: page
description: Overview of the algorithm and its variants
---
{% assign use_output = false %}
Consider a general nonlinear system with discrete-time dynamics and a cost function of the following form:
{% include math_functions.md %}
{% if use_output == true %}
$$ \begin{align} \vb{x}_{t+1} &= \vb{F}\PP{\vb{x}_t, \vb{u}_t} \\ \vb{y}_{t} &= G\PP{\vb{x}_t, \vb{u}_t} \\ \vb{J}(Y, U) &= \phi(\vb{y}_{T}) + \sum_{t = 0}^{T - 1}\vb{\ell}\PP{\vb{y}_t, \vb{u}_{t}} \end{align} $$
{% else %}
$$ \begin{align} \vb{x}_{t+1} &= \vb{F}\PP{\vb{x}_t, \vb{u}_t} \\ \vb{J}(X, U) &= \phi(\vb{x}_{T}) + \sum_{t = 0}^{T - 1}\vb{\ell}\PP{\vb{x}_t, \vb{u}_{t}} \label{eq:cost_function} \end{align} $$
{% endif %}
where $\vb{x}_t$ is the system state at time $t$, $\vb{u}_t$ is the control input, $\vb{F}\PP{\cdot}$ is the state transition function, $\phi\PP{\cdot}$ is the terminal cost, $\vb{\ell}\PP{\cdot}$ is the running cost, and $T$ is the time horizon.
{% if use_output == true %}
Looking at the above equations, we can see a minor difference from the typical nonlinear control setup, shown below, in that our cost function uses the output $\vb{y}_t$ produced by $G\PP{\cdot}$ rather than the state $\vb{x}_t$:
$$ \begin{align} \vb{J}(X,U) &= \phi(\vb{x}_T) + \sum_{t=0}^{T-1} \vb{\ell}\PP{\vb{x}_t, \vb{u}_t} \end{align} $$
For the vast majority of systems, the output can simply be taken to be the state itself, $\vb{y}_t = \vb{x}_t$, making the two formulations equivalent.
{% endif %}
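For concreteness, here is one hypothetical instantiation of $\vb{F}$, $\vb{\ell}$, and $\phi$: a double integrator driven to the origin, written as a plain Python/NumPy sketch for illustration only (the system, names, and constants are our own choices, not part of the library):

```python
import numpy as np

DT = 0.02  # illustrative discretization step

def dynamics(x, u):
    """F(x_t, u_t): double integrator with state x = [position, velocity]."""
    pos, vel = x
    return np.array([pos + DT * vel, vel + DT * u[0]])

def running_cost(x, u):
    """l(x_t, u_t): quadratic penalty on state error and control effort."""
    return x @ x + 0.1 * (u @ u)

def terminal_cost(x):
    """phi(x_T): heavier quadratic penalty on the final state."""
    return 10.0 * (x @ x)
```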
Model Predictive Path Integral (MPPI) is a stochastic optimal control algorithm that minimizes the cost function \eqref{eq:cost_function} through the use of sampling. We start by sampling control trajectories, run each trajectory through the dynamics to create a corresponding state trajectory, and then evaluate each state and control trajectory through the cost function. Each trajectory's cost is then run through the exponential transform, $\expf{-\frac{1}{\lambda} \vb{J}}$,
where $\lambda$ is the inverse temperature. The optimal control sequence is then the cost-weighted average of the sampled controls,
$$ \begin{align} \mathcal{U}^{*}_t &= \sum_{m=1}^{M} \frac{\expf{-\frac{1}{\lambda} \vb{J}\PP{X^m,V^m}}\vb{v}^m_t}{\sum_{j=1}^{M}\expf{-\frac{1}{\lambda} \vb{J}\PP{X^j,V^j}}} \\ &= \vb{u}_t + \sum_{m=1}^{M} \frac{\expf{-\frac{1}{\lambda} \vb{J}\PP{X^m,V^m}}\epsilon^m_t}{\sum_{j=1}^{M}\expf{-\frac{1}{\lambda} \vb{J}\PP{X^j,V^j}}}, \label{eq:mppi_update_rule} \end{align} $$
where $M$ is the number of sampled trajectories, $V^m$ is the $m$-th sampled control trajectory with $\vb{v}^m_t = \vb{u}_t + \epsilon^m_t$, $\epsilon^m_t \sim \mathcal{N}\PP{0, \Sigma}$ is zero-mean Gaussian noise, and $\vb{u}_t$ is the current control plan at time $t$.
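To make the transform concrete, here is a minimal Python/NumPy sketch (our illustration, not the library's API) of computing the weights in \eqref{eq:mppi_update_rule} from a batch of trajectory costs; subtracting the minimum cost before exponentiating is a standard numerical-stability trick that cancels in the ratio:

```python
import numpy as np

def mppi_weights(costs, lam):
    """Exponential-transform weights from a batch of M trajectory costs.

    Shifting by the minimum cost avoids exp underflow for large costs;
    the shift cancels between numerator and denominator of the update rule.
    """
    w = np.exp(-(costs - costs.min()) / lam)
    return w / w.sum()
```

The full procedure is summarized in the pseudocode below.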
{% include pseudocode.html id="mppi-pseudocode" code="
\begin{algorithm}
\caption{MPPI}
\begin{algorithmic}
\REQUIRE dynamics $F$, cost $J$, initial control sequence $U$, number of samples $M$, horizon $T$, inverse temperature $\lambda$, noise covariance $\Sigma$
\FOR{$m = 1$ \TO $M$}
\STATE Sample $\epsilon^m_t \sim \mathcal{N}(0, \Sigma)$, set $v^m_t = u_t + \epsilon^m_t$, roll out $V^m$ through $F$, and evaluate $J(X^m, V^m)$
\ENDFOR
\STATE Update $U$ via the weighted average in the MPPI update rule
\RETURN $U$
\end{algorithmic}
\end{algorithm}
" %}
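Putting the pieces together, one MPPI iteration might look like the following sketch, which reuses the hypothetical `dynamics`, `running_cost`, `terminal_cost`, and `mppi_weights` helpers from the sketches above (all illustrative stand-ins, not functions from our library):

```python
def mppi_step(x0, U, dynamics, running_cost, terminal_cost,
              M=1024, lam=1.0, sigma=0.5, rng=None):
    """One MPPI update: sample controls, roll out, weight, and average."""
    rng = rng or np.random.default_rng(0)
    T, nu = U.shape
    eps = rng.normal(0.0, sigma, size=(M, T, nu))  # epsilon^m_t ~ N(0, sigma^2 I)
    V = U[None] + eps                              # v^m_t = u_t + epsilon^m_t
    costs = np.zeros(M)
    for m in range(M):                             # roll out each sample through F
        x = x0
        for t in range(T):
            costs[m] += running_cost(x, V[m, t])
            x = dynamics(x, V[m, t])
        costs[m] += terminal_cost(x)
    w = mppi_weights(costs, lam)                   # exponential-transform weights
    return U + np.einsum('m,mtu->tu', w, eps)      # weighted-average update

# Example (receding horizon): repeatedly call mppi_step, execute U[0], shift U.
```

Each of the $M$ rollouts is independent, which is what makes MPPI amenable to massive parallelization; the explicit loops above are written for clarity only.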
The original derivation of MPPI took a path-integral approach [1]. Later papers derived MPPI from information-theoretic [2], stochastic search [3], and mirror descent [4] perspectives. A Tube-based MPPI controller [5] was also created to improve robustness to state disturbances: it uses a tracking controller to pull the real system back toward a nominal system that ignores state disturbances which would otherwise cause large costs. Both the real and nominal trajectories are computed with MPPI, while the tracking controller is an iterative Linear Quadratic Regulator (iLQR). In this setup, the tracking controller is always the one sending controls to the system, and since MPPI is unaware of the tracking controller, it can end up fighting against it. To address this, Robust MPPI (RMPPI) was developed in [6], [7], which applies the tracking-controller feedback within the samples MPPI evaluates. RMPPI also contains other changes that, taken together, provide an upper bound on how quickly the cost function can grow due to disturbances. Our library contains implementations of these algorithmic improvements, as different controllers are the best choice in different scenarios.
[1] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, "Aggressive Driving with Model Predictive Path Integral Control," in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 1433–1440. [Online]. Available: https://ieeexplore.ieee.org/document/7487277/
[2] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, "Information-Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving," IEEE Transactions on Robotics, vol. 34, no. 6, pp. 1603–1622, 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8558663
[3] Z. Wang, O. So, K. Lee, and E. A. Theodorou, "Adaptive Risk Sensitive Model Predictive Control with Stochastic Search," in Proceedings of the 3rd Conference on Learning for Dynamics and Control. PMLR, 2021, pp. 510–522. [Online]. Available: https://proceedings.mlr.press/v144/wang21b.html
[4] N. Wagener, C.-A. Cheng, J. Sacks, and B. Boots, "An Online Learning Approach to Model Predictive Control," in Proceedings of Robotics: Science and Systems, Freiburg im Breisgau, Germany, Jun. 2019. [Online]. Available: https://www.roboticsproceedings.org/rss15/p33.pdf
[5] G. Williams, B. Goldfain, P. Drews, K. Saigol, J. Rehg, and E. Theodorou, "Robust Sampling Based Model Predictive Control with Sparse Objective Information," in Robotics: Science and Systems XIV. Robotics: Science and Systems Foundation, Jun. 2018. [Online]. Available: http://www.roboticsproceedings.org/rss14/p42.pdf
[6] M. Gandhi, B. Vlahov, J. Gibson, G. Williams, and E. A. Theodorou, "Robust Model Predictive Path Integral Control: Analysis and Performance Guarantees," IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1423–1430, Feb. 2021. [Online]. Available: https://arxiv.org/abs/2102.09027v1
[7] G. R. Williams, "Model Predictive Path Integral Control: Theoretical Foundations and Applications to Autonomous Driving," Ph.D. dissertation, Georgia Institute of Technology, 2019.