
Positive-Negative-Momentum

The official PyTorch implementations of Positive-Negative Momentum optimizers.

The algorithms are proposed in our paper: Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization, accepted at ICML 2021. In the updated arXiv version, we fixed several notation typos that appeared in the ICML version due to notation conflicts.

Why Positive-Negative Momentum?

It is well known that stochastic gradient noise matters a lot for generalization. The Positive-Negative Momentum (PNM) approach, a powerful alternative to conventional momentum in classic optimizers, can manipulate stochastic gradient noise by adjusting an extra hyperparameter.
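For intuition, here is a minimal Python sketch of a positive-negative combination of two alternating momentum buffers. Everything below (the function name pnm_step, the buffer update rule, and the normalization constant) is our illustrative assumption, not the repository's implementation; the actual optimizers live in pnm_optim.

import torch

def pnm_step(param, grad, m_a, m_b, step, lr=0.1, beta1=0.9, beta0=1.0):
    # Hypothetical sketch: the two buffers are refreshed on alternating steps,
    # so each accumulates gradient noise from a disjoint half of the batches.
    m_new, m_old = (m_a, m_b) if step % 2 == 0 else (m_b, m_a)
    m_new.mul_(beta1 ** 2).add_(grad, alpha=1 - beta1 ** 2)
    # Positive weight (1 + beta0) on the fresh buffer and negative weight
    # -beta0 on the stale one: the mean components combine to the usual
    # momentum direction, while the independent noise components add up,
    # so a larger beta0 means larger stochastic gradient noise.
    combined = (1 + beta0) * m_new - beta0 * m_old
    # Assumed normalization so that increasing beta0 keeps the step size comparable.
    scale = ((1 + beta0) ** 2 + beta0 ** 2) ** 0.5
    param.data.add_(combined, alpha=-lr / scale)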

The environment is as below:

Python 3.7.3

PyTorch >= 1.4.0

Usage

# You may use it as a standard PyTorch optimizer.

from pnm_optim import *

# The final value in betas is the extra (positive-negative) hyperparameter discussed above;
# for AdaPNM, the middle value is the second-moment factor as in Adam.
PNM_optimizer = PNM(net.parameters(), lr=lr, betas=(0.9, 1.), weight_decay=weight_decay)
AdaPNM_optimizer = AdaPNM(net.parameters(), lr=lr, betas=(0.9, 0.999, 1.), eps=1e-08, weight_decay=weight_decay)
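As a concrete usage sketch, the snippet below drops PNM into an ordinary PyTorch training loop. The toy model, synthetic batches, and hyperparameter values are placeholders for illustration, not the paper's settings; only the standard optimizer interface (zero_grad / backward / step) is assumed, as stated above.

import torch
import torch.nn as nn
from pnm_optim import PNM

net = nn.Linear(10, 2)                              # placeholder toy model
criterion = nn.CrossEntropyLoss()
optimizer = PNM(net.parameters(), lr=0.1, betas=(0.9, 1.), weight_decay=5e-4)

for step in range(100):
    inputs = torch.randn(32, 10)                    # synthetic batch
    targets = torch.randint(0, 2, (32,))
    optimizer.zero_grad()
    loss = criterion(net(inputs), targets)
    loss.backward()
    optimizer.step()                                # PNM update via the standard API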

Test performance

PNM versus conventional momentum. We report the mean and standard deviation (in parentheses) of the optimal test errors over three runs of each experiment. The proposed PNM-based methods generalize significantly better than conventional momentum-based methods. In particular, as the theoretical analysis indicates, Stochastic PNM indeed consistently outperforms the conventional baseline, SGD.

| Dataset | Model | PNM | AdaPNM | SGDM | Adam | AMSGrad | AdamW | AdaBound | Padam | Yogi | RAdam |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | ResNet18 | 4.48 (0.09) | 4.94 (0.05) | 5.01 (0.03) | 6.53 (0.03) | 6.16 (0.18) | 5.08 (0.07) | 5.65 (0.08) | 5.12 (0.04) | 5.87 (0.12) | 6.01 (0.10) |
| CIFAR-10 | VGG16 | 6.26 (0.05) | 5.99 (0.11) | 6.42 (0.02) | 7.31 (0.25) | 7.14 (0.14) | 6.48 (0.13) | 6.76 (0.12) | 6.15 (0.06) | 6.90 (0.22) | 6.56 (0.04) |
| CIFAR-100 | ResNet34 | 20.59 (0.29) | 20.41 (0.18) | 21.52 (0.37) | 27.16 (0.55) | 25.53 (0.19) | 22.99 (0.40) | 22.87 (0.13) | 22.72 (0.10) | 23.57 (0.12) | 24.41 (0.40) |
| CIFAR-100 | DenseNet121 | 19.76 (0.28) | 20.68 (0.11) | 19.81 (0.33) | 25.11 (0.15) | 24.43 (0.09) | 21.55 (0.14) | 22.69 (0.15) | 21.10 (0.23) | 22.15 (0.36) | 22.27 (0.22) |
| CIFAR-100 | GoogLeNet | 20.38 (0.31) | 20.26 (0.21) | 21.21 (0.29) | 26.12 (0.33) | 25.53 (0.17) | 21.29 (0.17) | 23.18 (0.31) | 21.82 (0.17) | 24.24 (0.16) | 22.23 (0.15) |

Citing

If you use Positive-Negative Momentum in your work, please cite

@InProceedings{xie2021positive,
  title     = {Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization},
  author    = {Xie, Zeke and Yuan, Li and Zhu, Zhanxing and Sugiyama, Masashi},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {11448--11458},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
}
