Talk about gradients in algorithmic_differentiation.md #457

Open
Jollywatt wants to merge 2 commits into main from docs

Conversation

Jollywatt

This PR adds another "aside" section explaining why the choice of inner product does not affect the gradient of functions. I also took the liberty of changing the wording in some places.


codecov bot commented Feb 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

@Jollywatt Jollywatt force-pushed the docs branch 3 times, most recently from 1bdbe0b to ccf16e1 Compare February 5, 2025 15:35
@Jollywatt Jollywatt changed the title Minor additions to algorithmic_differentiation.md Talk about gradients in algorithmic_differentiation.md Feb 5, 2025
willtebbutt (Member) left a comment


I really like this -- it's a fantastic extension to the docs that ties a lot together. I just have a few small comments, but think it's basically good to go.


The role of the adjoint is revealed when we consider ``f := \mathcal{l} \circ g``, where ``g : \mathcal{X} \to \mathcal{Y}``, ``\mathcal{l}(y) := \langle \bar{y}, y \rangle``, and ``\bar{y} \in \mathcal{Y}`` is some fixed vector.
Noting that ``D \mathcal{l} [y](\dot{y}) = \langle \bar{y}, \dot{y} \rangle``, we apply the chain rule to obtain
An alternative characterisation is that ``\nabla f(x)`` is the vector pointing in the direction of steepest ascent on ``f`` at ``x``, with magnitude equal to the directional derivative in that steepest direction.
willtebbutt (Member):

I think I'm not quite clear what is meant by "with magnitude equal to the directional derivative in that steepest direction" -- is there a precise mathematical statement by which you can explain what this means?
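
One way to make this precise, assuming the norm induced by the chosen inner product:

```math
\max_{\lVert v \rVert = 1} D f[x](v) = \lVert \nabla f(x) \rVert,
\qquad \text{attained at } v = \frac{\nabla f(x)}{\lVert \nabla f(x) \rVert},
```

which follows from ``D f[x](v) = \langle \nabla f(x), v \rangle`` together with the Cauchy--Schwarz inequality.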

Notice that the value of the gradient depends on how the inner product on ``\mathcal{X}`` is defined.
Indeed, different choices of inner product result in different values of ``\nabla f``.
Adjoints such as ``D f[x]^*`` are also inner product dependent.
However, the actual derivative ``D f[x]`` is of course invariant -- it makes no reference to the inner product.
willtebbutt (Member):

Is this correct, technically? We make use of the norms for both X and Y in the definition of the Fréchet derivative, which I've been assuming we take to be the norms induced by whichever inner products we pick on X and Y. Would it be more accurate to point out that the definition is invariant because all norms are equivalent in finite dimensions?
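
A small Julia sketch of the inner-product dependence discussed here; the function and weight matrix below are arbitrary illustrations, not anything from Mooncake:

```julia
using LinearAlgebra

# f(x) = x₁² + 2x₂; its derivative at x is the linear map
# ẋ ↦ 2x₁ẋ₁ + 2ẋ₂, with coefficient vector df.
x = [3.0, 1.0]
df = [2 * x[1], 2.0]

# Euclidean inner product: the gradient is just df.
grad_euclidean = df

# Weighted inner product ⟨a, b⟩_W = a' * W * b, with W symmetric positive definite.
# The gradient g must satisfy ⟨g, ẋ⟩_W = D f[x](ẋ) for all ẋ, i.e. W * g = df.
W = [2.0 0.0; 0.0 1.0]
grad_weighted = W \ df   # [3.0, 2.0], versus grad_euclidean = [6.0, 2.0]

# The derivative itself is unchanged: both gradients reproduce D f[x](ẋ).
ẋ = [1.0, 0.0]
@assert grad_weighted' * W * ẋ ≈ grad_euclidean' * ẋ
```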

Comment on lines +433 to +434
In practice, Mooncake uses the Euclidean inner product, extended in the "obvious way" to other composite data types (that is, as if everything is flattened and embedded in ``\mathbb{R}^N``).
But we endeavour to keep the discussion general in order to make the role of the inner product explicit.
willtebbutt (Member):

Suggested change
In practice, Mooncake uses the Euclidean inner product, extended in the "obvious way" to other composite data types (that is, as if everything is flattened and embedded in ``\mathbb{R}^N``).
But we endeavour to keep the discussion general in order to make the role of the inner product explicit.
In practice, Mooncake uses the Euclidean inner product, extended in the "obvious way" to other composite data types (that is, as if everything is flattened and embedded in ``\mathbb{R}^N``), but we endeavour to keep the discussion general in order to make the role of the inner product explicit.

grammar
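
As a concrete illustration of that "obvious way", here is a hypothetical `flatten`-based extension of the Euclidean inner product to tuples and arrays (a sketch, not Mooncake's actual machinery):

```julia
# Flatten composite data into a vector in ℝ^N, then take the usual dot product.
flatten(x::Real) = [float(x)]
flatten(x::AbstractArray{<:Real}) = vec(float.(x))
flatten(x::Tuple) = reduce(vcat, map(flatten, x))

inner(a, b) = sum(flatten(a) .* flatten(b))

inner((1.0, [2.0, 3.0]), (4.0, [5.0, 6.0]))  # 1*4 + 2*5 + 3*6 = 32.0
```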

from which we conclude that ``D g [x]^\ast (\bar{y})`` is the gradient of the composition ``l \circ g`` at ``x``.
where the second equality follows from the gradient's implicit definition.
willtebbutt (Member):

Suggested change
where the second equality follows from the gradient's implicit definition.
where the second equality follows from the gradient's definition.

Reading this, I briefly thought that we had multiple definitions of the gradient lying around, and the one you are using here is the "implicit" one, before realising you're just trying to point out that our definition of the gradient is implicit. I wonder if others might read it in the same way, meaning that it's better just to refer to the "gradient's definition"?
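
For reference, the implicit definition in question characterises ``\nabla f(x)`` as the unique vector satisfying

```math
\langle \nabla f(x), \dot{x} \rangle = D f[x](\dot{x}) \quad \text{for all } \dot{x} \in \mathcal{X},
```

so matching the quoted chain of equalities against this definition identifies ``D g[x]^\ast (\bar{y})`` as the gradient of ``l \circ g`` at ``x``.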


_**Example**_

The adjoint derivative of ``f(x, y) = x + y_1 y_2`` (see [above](#AD-of-a-Julia-function:-a-slightly-less-trivial-example)) immediately gives
willtebbutt (Member):

Suggested change
The adjoint derivative of ``f(x, y) = x + y_1 y_2`` (see [above](#AD-of-a-Julia-function:-a-slightly-less-trivial-example)) immediately gives
The adjoint of the derivative of ``f(x, y) = x + y_1 y_2`` (see [above](#AD-of-a-Julia-function:-a-slightly-less-trivial-example)) immediately gives

nit-pick: I don't believe we refer to the "adjoint derivative" anywhere, but we do refer to the "adjoint" and the "adjoint of the derivative" interchangeably. Is this a typo, or ought we to be talking about the "adjoint derivative"?

```math
\nabla f(x, y) = D f[x, y]^\ast (1) = (1, (y_2, y_1)) .
```
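
A quick numerical sanity check of this example in plain Julia, using central finite differences (illustrative only):

```julia
# Check ∇f(x, y) = (1, (y₂, y₁)) for f(x, y) = x + y₁y₂.
f(x, y) = x + y[1] * y[2]

x, y, h = 2.0, (3.0, 5.0), 1e-6
dfdx  = (f(x + h, y) - f(x - h, y)) / 2h                        # ≈ 1.0
dfdy1 = (f(x, (y[1] + h, y[2])) - f(x, (y[1] - h, y[2]))) / 2h  # ≈ y[2] = 5.0
dfdy2 = (f(x, (y[1], y[2] + h)) - f(x, (y[1], y[2] - h))) / 2h  # ≈ y[1] = 3.0
```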

_**Aside: Adjoint Derivatives as Gradients**_
willtebbutt (Member):

Same here -- "Adjoint Derivatives" vs "Adjoint" or "Adjoint of the Derivative" etc


To compute the gradient in forwards-mode, we need to evaluate the forwards pass ``\dim \mathcal{X}`` times.
We also need to refer to a basis ``\{\mathbf{e}_i\}`` of ``\mathcal{X}`` and its reciprocal basis ``\{\mathbf{e}^i\}`` defined by ``\langle \mathbf{e}_i, \mathbf{e}^j \rangle = \delta_i^j``.
(For any basis there exists such a reciprocal basis, and they are the same if the basis is orthonormal.)
willtebbutt (Member):

Suggested change
(For any basis there exists such a reciprocal basis, and they are the same if the basis is orthonormal.)
For any basis there exists such a reciprocal basis, and they are the same for orthonormal bases such as the standard basis. As a result, you can replace any occurrences of ``\{\mathbf{e}^i\}`` with ``\{\mathbf{e}_i\}`` in what follows and still have a correct understanding of the mathematics underpinning Mooncake.

nit-pick: I don't think we need the brackets here, and I think it would be good to allude to the standard basis.

What are your thoughts on my bit about "replace occurrences of..."? I'm not 100% certain I've phrased this perfectly, but I would like to reassure readers that this is indeed the consequence of the previous sentence. Maybe I'm overthinking it...
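
A compact Julia sketch of the reciprocal basis, together with the standard forwards-mode gradient construction ``\nabla f(x) = \sum_i D f[x](\mathbf{e}_i) \, \mathbf{e}^i`` (the basis and function below are arbitrary examples):

```julia
using LinearAlgebra

# Basis vectors e₁, e₂ as the columns of E (Euclidean inner product assumed).
E = [1.0 1.0; 0.0 1.0]
# Reciprocal basis: ⟨eᵢ, eʲ⟩ = δᵢʲ means E' * R == I, so R = inv(E)'.
R = inv(E)'
@assert E' * R ≈ I   # for an orthonormal E (e.g. the standard basis), R == E

# Forwards-mode gradient: one derivative evaluation per basis vector,
# i.e. dim(𝒳) evaluations in total.
f(v) = v[1]^2 + v[1] * v[2]
Df(v, v̇) = (2 * v[1] + v[2]) * v̇[1] + v[1] * v̇[2]  # D f[v](v̇)

v = [1.0, 2.0]
∇f = sum(Df(v, E[:, i]) * R[:, i] for i in 1:2)  # [4.0, 1.0]
```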
