docs: update parameter guide on lr and momentum
pavlin-policar committed May 24, 2023
1 parent f5c1f1b commit 252f610
Showing 2 changed files with 3 additions and 5 deletions.
2 changes: 1 addition & 1 deletion docs/source/benchmarks.rst
@@ -41,4 +41,4 @@ Similarly, care must be taken when benchmarking against numba-dependent librarie
Reproducibility
---------------

All benchmarks were run on an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz processor. We also ran a subset of these benchmarks on a consumer-grade Intel Core i7-7700HQ processor found in laptop computers. The general trends were similar. All benchmarks were run using the provided benchmark script in the openTSNE repository ``openTSNE/benchmarks/benchmark.py``. The data set used can be found in the example notebooks. A direct link to the preprocessed pickled matrix file is available at ``http://file.biolab.si/opentsne/10x_mouse_zheng.pkl.gz``.
All benchmarks were run on an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz processor. We also ran a subset of these benchmarks on a consumer-grade Intel Core i7-7700HQ processor found in laptop computers. The general trends were similar. All benchmarks were run using the provided benchmark script in the openTSNE repository ``openTSNE/benchmarks/benchmark.py``. The data set used can be found in the example notebooks. A direct link to the preprocessed pickled matrix file is available at ``http://file.biolab.si/opentsne/benchmark/10x_mouse_zheng.pkl.gz``.
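As a hedged illustration of fetching this data set, a minimal Python sketch is shown below; the URL is taken from the paragraph above, while the exact contents of the pickle (a plain array vs. a dict) are an assumption and should be inspected after loading.

.. code-block:: python

    # Minimal sketch: download and load the preprocessed benchmark data.
    # The pickle's exact structure is an assumption here -- inspect it first.
    import gzip
    import pickle
    from urllib.request import urlretrieve

    url = "http://file.biolab.si/opentsne/benchmark/10x_mouse_zheng.pkl.gz"
    fname, _ = urlretrieve(url, "10x_mouse_zheng.pkl.gz")

    with gzip.open(fname, "rb") as f:
        data = pickle.load(f)

    print(type(data))  # check what the pickle actually contains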
6 changes: 2 additions & 4 deletions docs/source/parameters.rst
@@ -37,12 +37,10 @@ Optimization parameters
t-SNE uses a variant of the gradient descent optimization procedure that incorporates momentum to speed up convergence of the embedding [3]_.

learning_rate: float
The learning rate controls the step size of the gradient updates. This typically ranges from 100 to 1000, but usually the default (200) works well enough.

When dealing with large data sets e.g 500k samples or more, it may be necessary to increase the learning rate or to increase the number of iterations [1]_.
The learning rate controls the step size of the gradient updates. This parameter can be set manually; however, we recommend using the default value of "auto", which sets the learning rate by dividing the number of samples by the exaggeration factor.
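For illustration, a minimal sketch of the "auto" rule described above; the exaggeration factor of 12 is an assumed early-exaggeration default and may differ between openTSNE versions.

.. code-block:: python

    from openTSNE import TSNE

    n_samples = 500_000                    # e.g. a large single-cell data set
    exaggeration = 12                      # assumed early-exaggeration factor
    manual_lr = n_samples / exaggeration   # the "auto" rule: ~41,667 here

    # Recommended: simply let openTSNE choose the learning rate itself.
    tsne = TSNE(learning_rate="auto")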

momentum: float
Gradient descent with momentum keeps a sum exponentially decaying weights from previous iterations, speeding up convergence. In early stages of the optimization, this is typically set to a lower value (0.5 in most implementations) since points generally move around quite a bit in this phase and increased after the initial early exaggeration phase (typically to 0.8) to speed up convergence.
To increase convergence speed and reduce the number of iterations required, we can augment gradient descent with a momentum term. Momentum stores an exponentially decaying sum of gradient updates from previous iterations. By default, this is set to 0.8.
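As a rough sketch (not openTSNE's internal implementation), a single gradient-descent-with-momentum step could look like this:

.. code-block:: python

    import numpy as np

    def momentum_step(y, gradient, update, learning_rate=200.0, momentum=0.8):
        # `update` accumulates an exponentially decaying sum of past gradients;
        # with momentum=0.8, older contributions shrink by that factor each step.
        update = momentum * update - learning_rate * gradient
        return y + update, update

    y = np.zeros((100, 2))               # toy embedding coordinates
    update = np.zeros_like(y)
    gradient = np.full_like(y, 0.001)    # placeholder gradient
    y, update = momentum_step(y, gradient, update)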

max_grad_norm: float
By default, openTSNE does not apply gradient clipping. However, when embedding new data into an existing embedding, care must be taken that the data points do not "shoot off". Gradient clipping alleviates this issue.
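A hedged usage sketch follows; the parameter name matches the entry above, while the clipping value of 0.25 and the variable names are illustrative only.

.. code-block:: python

    from openTSNE import TSNE

    # Clip gradient norms so newly embedded points cannot "shoot off".
    # The value 0.25 is illustrative, not a recommended setting.
    tsne = TSNE(max_grad_norm=0.25)

    # embedding = tsne.fit(X_train)             # existing embedding
    # new_points = embedding.transform(X_new)   # map new data into it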
