The official PyTorch implementations of the Positive-Negative Momentum (PNM) optimizers.
The algorithms are proposed in our paper *Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization*, accepted by ICML 2021. In the updated arXiv version, we fixed several notation typos that appeared in the ICML version due to notation conflicts.
It is well known that stochastic gradient noise matters a lot to generalization. The Positive-Negative Momentum (PNM) approach, a powerful alternative to conventional Momentum in classic optimizers, can manipulate stochastic gradient noise by adjusting an extra hyperparameter.
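The core idea can be sketched in a few lines of plain Python: two momentum buffers are maintained on alternating steps and combined with a positive and a negative weight, so the negative weight amplifies the gradient-noise component. This is a simplified toy sketch on a 1-D quadratic, following the update form described in the paper; the function name and constants here are illustrative, not the repository's implementation.

```python
import math

def pnm_sgd(grad_fn, x0, lr=0.1, beta1=0.9, beta0=1.0, steps=200):
    """Minimize a 1-D function with a PNM-style update (toy sketch).

    Two momentum buffers are updated on alternating steps; the step
    direction is (1 + beta0) * m_new - beta0 * m_old, normalized so
    that the update scale stays comparable to plain momentum.
    """
    x = x0
    m = [0.0, 0.0]  # momentum buffers for even / odd steps
    norm = math.sqrt((1 + beta0) ** 2 + beta0 ** 2)
    for t in range(steps):
        g = grad_fn(x)
        i = t % 2  # buffer updated at this step
        m[i] = beta1 ** 2 * m[i] + (1 - beta1 ** 2) * g
        d = ((1 + beta0) * m[i] - beta0 * m[1 - i]) / norm
        x -= lr * d
    return x

# Toy problem: f(x) = (x - 3)^2, gradient 2(x - 3); minimum at x = 3.
x_star = pnm_sgd(lambda x: 2 * (x - 3.0), x0=0.0)
```

With `beta0 = 0` the negative buffer is ignored and the update reduces to ordinary momentum SGD; larger `beta0` injects more gradient noise, which is the knob the paper uses to improve generalization.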
- Python 3.7.3
- PyTorch >= 1.4.0
```python
# You may use them as standard PyTorch optimizers.
from pnm_optim import PNM, AdaPNM

PNM_optimizer = PNM(net.parameters(), lr=lr, betas=(0.9, 1.), weight_decay=weight_decay)
AdaPNM_optimizer = AdaPNM(net.parameters(), lr=lr, betas=(0.9, 0.999, 1.), eps=1e-08, weight_decay=weight_decay)
```
PNM versus conventional Momentum. We report the means and standard deviations (as subscripts) of the optimal test errors over three runs of each experiment. The proposed PNM-based methods show significantly better generalization than conventional momentum-based methods. In particular, as the theoretical analysis indicates, Stochastic PNM consistently outperforms the conventional baseline, SGD.
Dataset | Model | PNM | AdaPNM | SGD M | Adam | AMSGrad | AdamW | AdaBound | Padam | Yogi | RAdam
---|---|---|---|---|---|---|---|---|---|---|---
CIFAR-10 | ResNet18 | 4.48<sub>0.09</sub> | 4.94<sub>0.05</sub> | 5.01<sub>0.03</sub> | 6.53<sub>0.03</sub> | 6.16<sub>0.18</sub> | 5.08<sub>0.07</sub> | 5.65<sub>0.08</sub> | 5.12<sub>0.04</sub> | 5.87<sub>0.12</sub> | 6.01<sub>0.10</sub>
CIFAR-10 | VGG16 | 6.26<sub>0.05</sub> | 5.99<sub>0.11</sub> | 6.42<sub>0.02</sub> | 7.31<sub>0.25</sub> | 7.14<sub>0.14</sub> | 6.48<sub>0.13</sub> | 6.76<sub>0.12</sub> | 6.15<sub>0.06</sub> | 6.90<sub>0.22</sub> | 6.56<sub>0.04</sub>
CIFAR-100 | ResNet34 | 20.59<sub>0.29</sub> | 20.41<sub>0.18</sub> | 21.52<sub>0.37</sub> | 27.16<sub>0.55</sub> | 25.53<sub>0.19</sub> | 22.99<sub>0.40</sub> | 22.87<sub>0.13</sub> | 22.72<sub>0.10</sub> | 23.57<sub>0.12</sub> | 24.41<sub>0.40</sub>
CIFAR-100 | DenseNet121 | 19.76<sub>0.28</sub> | 20.68<sub>0.11</sub> | 19.81<sub>0.33</sub> | 25.11<sub>0.15</sub> | 24.43<sub>0.09</sub> | 21.55<sub>0.14</sub> | 22.69<sub>0.15</sub> | 21.10<sub>0.23</sub> | 22.15<sub>0.36</sub> | 22.27<sub>0.22</sub>
CIFAR-100 | GoogLeNet | 20.38<sub>0.31</sub> | 20.26<sub>0.21</sub> | 21.21<sub>0.29</sub> | 26.12<sub>0.33</sub> | 25.53<sub>0.17</sub> | 21.29<sub>0.17</sub> | 23.18<sub>0.31</sub> | 21.82<sub>0.17</sub> | 24.24<sub>0.16</sub> | 22.23<sub>0.15</sub>
If you use Positive-Negative Momentum in your work, please cite:

```
@InProceedings{xie2021positive,
  title     = {Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization},
  author    = {Xie, Zeke and Yuan, Li and Zhu, Zhanxing and Sugiyama, Masashi},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {11448--11458},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
}
```