docs: update parameter guide on lr and momentum
pavlin-policar committed May 24, 2023
1 parent f5c1f1b commit 252f610
Showing 2 changed files with 3 additions and 5 deletions.
2 changes: 1 addition & 1 deletion docs/source/benchmarks.rst
@@ -41,4 +41,4 @@ Similarly, care must be taken when benchmarking against numba-dependent librarie
Reproducibility
---------------

All benchmarks were run on an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz processor. We also ran a subset of these benchmarks on a consumer-grade Intel Core i7-7700HQ processor found in laptop computers. The general trends were similar. All benchmarks were run using the provided benchmark script in the openTSNE repository ``openTSNE/benchmarks/benchmark.py``. The data set used can be found in the example notebooks. A direct link to the preprocessed pickled matrix file is available at ``http://file.biolab.si/opentsne/10x_mouse_zheng.pkl.gz``.
All benchmarks were run on an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz processor. We also ran a subset of these benchmarks on a consumer-grade Intel Core i7-7700HQ processor found in laptop computers. The general trends were similar. All benchmarks were run using the provided benchmark script in the openTSNE repository ``openTSNE/benchmarks/benchmark.py``. The data set used can be found in the example notebooks. A direct link to the preprocessed pickled matrix file is available at ``http://file.biolab.si/opentsne/benchmark/10x_mouse_zheng.pkl.gz``.
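As a hedged illustration of fetching this data set, a minimal Python sketch is shown below; the URL is taken from the paragraph above, while the exact contents of the pickle (a plain array vs. a dict) are an assumption and should be inspected after loading.

.. code-block:: python

    # Minimal sketch: download and load the preprocessed benchmark data.
    # The pickle's exact structure is an assumption here -- inspect it first.
    import gzip
    import pickle
    from urllib.request import urlretrieve

    url = "http://file.biolab.si/opentsne/benchmark/10x_mouse_zheng.pkl.gz"
    fname, _ = urlretrieve(url, "10x_mouse_zheng.pkl.gz")

    with gzip.open(fname, "rb") as f:
        data = pickle.load(f)

    print(type(data))  # check what the pickle actually contains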
6 changes: 2 additions & 4 deletions docs/source/parameters.rst
@@ -37,12 +37,10 @@ Optimization parameters
t-SNE uses a variant of the gradient descent optimization procedure that incorporates momentum to speed up convergence of the embedding [3]_.

learning_rate: float
The learning rate controls the step size of the gradient updates. This typically ranges from 100 to 1000, but usually the default (200) works well enough.

When dealing with large data sets e.g 500k samples or more, it may be necessary to increase the learning rate or to increase the number of iterations [1]_.
The learning rate controls the step size of the gradient updates. This parameter can be set manually; however, we recommend using the default value of "auto", which sets the learning rate by dividing the number of samples by the exaggeration factor.
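For illustration, a minimal sketch of the "auto" rule described above; the exaggeration factor of 12 is an assumed early-exaggeration default and may differ between openTSNE versions.

.. code-block:: python

    from openTSNE import TSNE

    n_samples = 500_000                    # e.g. a large single-cell data set
    exaggeration = 12                      # assumed early-exaggeration factor
    manual_lr = n_samples / exaggeration   # the "auto" rule: ~41,667 here

    # Recommended: simply let openTSNE choose the learning rate itself.
    tsne = TSNE(learning_rate="auto")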

momentum: float
Gradient descent with momentum keeps a sum exponentially decaying weights from previous iterations, speeding up convergence. In early stages of the optimization, this is typically set to a lower value (0.5 in most implementations) since points generally move around quite a bit in this phase and increased after the initial early exaggeration phase (typically to 0.8) to speed up convergence.
To increase convergence speed and reduce the number of iterations required, we can augment gradient descent with a momentum term. Momentum stores an exponentially decaying sum of gradient updates from previous iterations. By default, this is set to 0.8.
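As a rough sketch (not openTSNE's internal implementation), a single gradient-descent-with-momentum step could look like this:

.. code-block:: python

    import numpy as np

    def momentum_step(y, gradient, update, learning_rate=200.0, momentum=0.8):
        # `update` accumulates an exponentially decaying sum of past gradients;
        # with momentum=0.8, older contributions shrink by that factor each step.
        update = momentum * update - learning_rate * gradient
        return y + update, update

    y = np.zeros((100, 2))               # toy embedding coordinates
    update = np.zeros_like(y)
    gradient = np.full_like(y, 0.001)    # placeholder gradient
    y, update = momentum_step(y, gradient, update)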

max_grad_norm: float
By default, openTSNE does not apply gradient clipping. However, when embedding new data into an existing embedding, care must be taken that the data points do not "shoot off". Gradient clipping alleviates this issue.
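A hedged usage sketch follows; the parameter name matches the entry above, while the clipping value of 0.25 and the variable names are illustrative only.

.. code-block:: python

    from openTSNE import TSNE

    # Clip gradient norms so newly embedded points cannot "shoot off".
    # The value 0.25 is illustrative, not a recommended setting.
    tsne = TSNE(max_grad_norm=0.25)

    # embedding = tsne.fit(X_train)             # existing embedding
    # new_points = embedding.transform(X_new)   # map new data into it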
