Three suggestions from performance profiling #193

meedstrom · 2024-10-27T17:58:31Z

Hi and thank you for this svelte library!

I recently discovered some perf hotspots by profiling, and I have some suggestions to deal with them, but they are suggestions only! Up to you as developer :)

First, I'll say that my attempts to run the profiler hit some roadblocks, because this form SOMETIMES signals errors. Don't know if that is a bug or my mistake. Backtrace below. (EDIT: See a working version at #193 (comment))

(let ((done-ctr 0)
      (max 20)) ;; Pretend 20 cores -> 20 processes
  (profiler-start 'cpu+mem)
  (dotimes (_ max)
    (async-start (lambda ()
                   (sleep-for 0.5)
                   ;; Simulate a real-world "hairy" dataset
                   (make-list 30 (cl-loop
                                  repeat (random 5) collect
                                  (make-list (random 5)
                                             (number-to-string (random))))))
                 (lambda (result)
                   (ignore result)
                   (when (= (cl-incf done-ctr) max)
                     (profiler-stop)
                     (profiler-report))))))

Backtrace:

Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
  async-handle-result((closure ((max . 20) (done-ctr . 0)) (result) (ignore result) (if (= (setq done-ctr (1+ done-ctr)) max) (progn (profiler-stop) (profiler-report)))) (async-signal (wrong-type-argument number-or-marker-p nil)) #<buffer *emacs*<18>>)
  async-when-done(#<process emacs> "finished\n")

Fortunately, I was able to get results anyway, because when it hits the error, the profiler has not been stopped. So I can manually stop and produce a report, and see results from the processes that did not hit the error.

Findings follow.

1. `backward-sexp`

In the case of a large amount of data, async-when-done spends half its CPU just to call backward-sexp once. The rest is spent on the FINISH-FUNC, so it's pretty sleek aside from this one call.

Suggestion: Run something other than backward-sexp This substitute works in my application (org-node, which I'm refactoring to depend on async.el):

;; We can see that replacing the `backward-sexp' call
;; completely eliminates that perf bottleneck
(advice-add 'backward-sexp :override
             (byte-compile (defun async--goto-start-of-value ()
                             (goto-char 1)
                             (search-forward "Lisp expression: \n"))))
;; Clean up after testing
(advice-remove 'backward-sexp #'async--goto-start-of-value)

2. `locate-library`

Having solved the case of a large amount of data, over 60% of CPU time is spent on locate-library, which is repeated for every subprocess spawned.

Suggestion: Memoize the result of locate-library.

To expire this memoization, I see two options:

a timer of 10 seconds or so
make a convenience wrapper for spawning many async processes at once, in which this value and potentially other values could be memoized. I would also be willing to develop such a wrapper, if the PR would be accepted.

Bonus: I happen to use in production something that's faster than locate-library, and ensures to use the .eln if available. Don't have FSF assignment yet, but if you want this verbatim, I'll get off my butt and submit the paperwork.

In any case, using .eln somehow would promise some all-around perf boosts.

(defun ensure-compiled-lib (feature)
  "Look for .eln, .elc or .el file corresponding to FEATURE.
FEATURE is a symbol as it shows up in `features'.

Guess which one was in fact loaded by the current Emacs,
and return it if it is .elc or .eln.

If it is .el, then opportunistically compile it and return the newly
compiled file instead.  This returns an .elc on the first call, then an
.eln on future calls.

Note: if you are currently editing the source code for FEATURE, use
`eval-buffer' and save to ensure this finds the correct file."
  (let* ((hit (cl-loop
               for (file . elems) in load-history
               when (eq feature (cdr (assq 'provide elems)))
               return
               ;; Want two pieces of info: the file path according to
               ;; `load-history', and some function supposedly defined
               ;; there.  The function is a better source of info, for
               ;; discovering an .eln.
               (cons file (cl-loop
                           for elem in elems
                           when (and (consp elem)
                                     (eq 'defun (car elem))
                                     (not (consp (symbol-function (cdr elem))))
                                     (not (function-alias-p (cdr elem))))
                           return (cdr elem)))))
         (file-name-handler-alist '(("\\.gz\\'" . jka-compr-handler))) ;; perf
         (loaded (or (and (ignore-errors
                            ;; REVIEW: `symbol-file' uses expand-file-name,
                            ;;         but I'm not convinced it is needed
                            (expand-file-name
                             (native-comp-unit-file
                              (subr-native-comp-unit
                               (symbol-function (cdr hit)))))))
                     (car hit))))
    (unless loaded
      (error "Current Lisp definitions need to come from a file %S[.el/.elc/.eln]"
             feature))
    ;; HACK: Sometimes comp.el makes freefn- temp files; pretend we found .el.
    ;;       Not a good hack, because load-path is NOT as trustworthy as
    ;;       load-history.
    (when (string-search "freefn-" loaded)
      (setq loaded
            (locate-file (symbol-name feature) load-path '(".el" ".el.gz"))))
    (if (or (string-suffix-p ".el" loaded)
            (string-suffix-p ".el.gz" loaded))
        (or (when (native-comp-available-p)
              ;; If we built an .eln last time, use it now even though
              ;; the current Emacs process is still running interpreted .el.
              (comp-lookup-eln loaded))
            (let* ((elc (file-name-concat temporary-file-directory
                                          (concat (symbol-name feature)
                                                  ".elc")))
                   (byte-compile-dest-file-function
                    `(lambda (&rest _) ,elc)))
              (when (native-comp-available-p)
                (native-compile-async (list loaded)))
              ;; Native comp may take a while, so return elc this time.
              ;; We should not pick an .elc from load path if Emacs is now
              ;; running interpreted code, since the code's likely newer.
              (if (or (file-newer-than-file-p elc loaded)
                      (byte-compile-file loaded))
                  ;; NOTE: On Guix we should never end up here, but if we
                  ;;       did, that'd be a problem as Guix will probably
                  ;;       reuse the first .elc we ever made forever, even
                  ;;       after upgrades to .el, due to 1970 timestamps.
                  elc
                loaded)))
      ;; Either .eln or .elc was loaded, so use the same for the
      ;; subprocesses.  We should not opportunistically build an .eln if
      ;; Emacs had loaded an .elc for the current process, because we
      ;; cannot assume the source .el is equivalent code.
      ;; The .el could be in-development, newer than .elc, so subprocesses
      ;; should use the same .elc for compatibility right up until the point
      ;; the developer actually evals the .el buffer.
      loaded)))

(EDIT Oct 29: Fixed some issues in that code snippet)

3. `file-truename`

While locate-library stood for 60%+, file-truename stood for about 8%.

Suggestions:

memoize this value, or
don't run file-truename if not needed.
- Don't know what issue it tried to solve, but perhaps file-chase-links could suffice?

The text was updated successfully, but these errors were encountered:

thierryvolpiatto · 2024-10-28T22:55:19Z

Martin Edström ***@***.***> writes:

Hi and thank you for this svelte library! I recently discovered some perf hotspots by profiling, and I have some suggestions to deal with them, but they are suggestions only! Up to you as developer :) First, I'll say that my attempts to run the profiler hit some roadblocks, because this form SOMETIMES signals errors. Don't know if that is a bug or my mistake. Backtrace below. (let ((done-ctr 0) (max 20)) ;; Pretend 20 cores -> 20 processes (profiler-start 'cpu+mem) (dotimes (_ max) (async-start (lambda () (sleep-for 0.5) ;; Simulate a real-world "hairy" dataset (make-list 30 (cl-loop repeat (random 5) collect (make-list (random 5) (number-to-string (random)))))) (lambda (result) (ignore result) (when (= (cl-incf done-ctr) max) (profiler-stop) (profiler-report))))))

Thanks to look into this, but I wonder why you run (dotime (_ 20) (async-start (lambda () (do-something))) ...) instead of (async-start (lambda () (dotimes (_ 20) (do-something))) ...) ? In the first place your application is not really async because you are back to the emacs parent after each iteration, while in the second sexp you are completely async, running only one emacs child.

…

-- Thierry

meedstrom · 2024-10-28T23:03:15Z

It is still async, just with 20 children instead of one working in parallel (if your computer has 20 cores). Nice for performance-intensive applications :)

But yea, then you have 20 sentinels waiting to run, instead of 1. So it's good to ensure that they are optimized, also to eliminate overhead in things like async-when-done.

thierryvolpiatto · 2024-10-29T13:40:13Z

Martin Edström ***@***.***> writes:

1. ( ) text/plain (*) text/html It is still async, just with 20 children instead of one working in parallel (if your computer has 20 cores).

I hardly see how it could be async as the loop is turning on the parent side and blocking Emacs even if at each iteration an async process is running, did I miss something?

…

-- Thierry

meedstrom · 2024-10-29T14:27:15Z

OK, first, I fixed the test-snippet. For some reason the error wrong-type argument number-or-marker-p nil did not appear once I removed cl-loop from the START-FUNC.

(defvar done-ctr 0)
(let ((max 20))
  (setq done-ctr 0)
  (profiler-start 'cpu+mem)
  ;; Supposing device has 20 cores, launch 20 processes
  (dotimes (_ max)
    (async-start (lambda ()
                   (sleep-for 0.5)
                   ;; Simulate a real-world "hairy" dataset
                   (thread-last (make-list 50 nil)
                                (mapcar #'random)
                                (mapcar #'number-to-string)
                                (make-list (random 15))
                                (make-list (random 15))))
                 `(lambda (result)
                    (ignore result)
                    (cl-incf done-ctr)
                    (message "Receiving results from... process %d" done-ctr)
                    (when (= done-ctr ,max)
                      (profiler-stop)
                      (profiler-report))))))

Second, you'll see if you eval that, that Emacs stays responsive until the results start coming in from the 20 different subprocesses.

In my mental model, async-start is just make-process, and FINISH-FUNC is the process sentinel. Right? The sentinels are not called during the dotimes loop, but at an undefined point in the future, after the processes have finished.

That's how this is async. The dotimes loop just launches 20 system processes, so the loop itself finishes in milliseconds, long before any one of the processes have finished.

EDIT: Interestingly, I'm getting different profiler results. Now it is string-match standing for 65% of CPU time. May be affected by the exact shape of the simulated "hairy" dataset.

        1325  67% - #<byte-code-function D58>
        1325  67%  - async-read-from-client
        1284  65%     string-match
          24   1%     read
          12   0%     search-forward
         384  19% - command-execute
         384  19%  - funcall-interactively
         384  19%   - eval-last-sexp
         384  19%    - #<byte-code-function D3C>
         384  19%     - elisp--eval-last-sexp
         384  19%      - eval
         384  19%       - progn
         384  19%        - let
         384  19%         - let
         384  19%          - while
         384  19%           - let
         384  19%            - async-start
         354  18%             - async--emacs-program-args
         354  18%              + locate-library
          19   0%             + file-truename
          10   0%             + async-start-process
           1   0%             + async--transmit-sexp

meedstrom · 2024-10-29T14:42:37Z

Speaking of string-match, I have not understood all the reasons that went into async.el design, like why it uses a custom process filter.

If it is in fact possible to refactor... the Emacs 30 NEWS file makes an argument for using the built-in process filter:

---
** The default process filter was rewritten in native code.
The round-trip through the Lisp function
'internal-default-process-filter' is skipped when the process filter is
the default one.  It is reimplemented in native code, reducing GC churn.
To undo this change, set 'fast-read-process-output' to nil.

+++

Saw the issues on here about problems decoding hash (#) characters, like #145, but are they still current? Perhaps only caused by using a custom process filter?

I've tested with vanilla make-process and the default process filter. Made the subprocesses call prin1 to return literal records looking like #s(data data data), and there was no issue calling read in the sentinel. That's what org-node does now (at these lines https://github.com/meedstrom/org-node/blob/3241743c4b3d0c69968b301e62cb0602932297da/org-node.el#L1215-L1255).

thierryvolpiatto · 2024-10-29T15:09:32Z

Martin Edström ***@***.***> writes:

The dotimes loop just launches 20 system processes, so the loop itself finishes in milliseconds, long before any one of the processes have finished.

Ah yes of course, so I will look into this as soon as I am back at home in november. Thanks.

…

-- Thierry

meedstrom · 2024-10-29T15:21:05Z

Take your time :)

meedstrom · 2024-11-01T18:46:29Z

Don't know if this will be useful, but I found a clean way to erase the "Lisp expression: " from the process buffer before the output is inserted!

Try in the Scratch buffer:

;; Will print empty string "" when called
(defun before-sentinel (proc)
  (with-current-buffer (process-buffer proc)
    ;; For battle-testing: Send hash table with 1 key, where the value is
    ;; another `record' type called `org-node'
    (process-send-string proc "#s(hash-table size 3694 test equal rehash-size 1.5 rehash-threshold 0.8125 data (\"cf486b81-a7bf-480c-9a94-317f980e1ee0\" #s(org-node nil nil \"~/org/daily/partner/2020-11-22.org\" \"2020-11-22\" \"cf486b81-a7bf-480c-9a94-317f980e1ee0\" 0 nil 1 nil ((\"ID\" . \"cf486b81-a7bf-480c-9a94-317f980e1ee0\") (\"CREATED\" . \"[2020-11-22]\")) nil nil (\"privy\" \"daily\") (\"privy\" \"daily\") \"2020-11-22\" nil)))")
    (process-send-eof proc)
    ;; Reveal the initial "Lisp expression: " garbage.
    ;; Only works with connection-type `pipe', for some reason.
    (accept-process-output proc)
    (erase-buffer) ;; Erase the "Lisp expression: "
    (print (buffer-string))))


;; Will print only the actual output when called,
;; hopefully the same hash table we inserted.
;; No "Lisp expression: " before it.
(defun sentinel (proc _)
  (with-current-buffer (process-buffer proc)
    (print (buffer-string))
    (delete-process proc)))


(let ((proc (make-process
             :buffer (get-buffer-create " *test*" t)
             :name "test"
             :connection-type 'pipe
             :command (list "emacs" "-Q" "--batch"
                            "--eval" "(pp (read t))")
             :sentinel #'sentinel)))
  (before-sentinel proc)
  nil)

Anyway, it's just aesthetics.

EDIT: A simpler trick: instead of (read), use (read-minibuffer ""). Because that's what (read) does anyway: it calls (read-minibuffer "Lisp expression: ").

meedstrom · 2024-11-18T09:58:46Z

Update: I wrote a library for my purposes, so don't worry about me: https://github.com/meedstrom/el-job/ Thanks for engaging! I learned a lot from async.el.

FWIW, my library uses the default process filter. But I learned that if I just use after-change-functions to notice new output in the process buffers, it's fast on Emacs 31.0.50 but slow on Emacs 29.4. Not only due to the rewritten default process filter, but after-change-functions also seems to perform differently. So on Emacs 29.4, better to poll the buffers with a timer. 🤷🏼‍♀️

thierryvolpiatto · 2024-11-19T07:22:04Z

Sorry for the delay, I didn't have the time to look into this. Martin Edström ***@***.***> writes:

1. ( ) text/plain (*) text/html Update: I wrote a library for my purposes, so don't worry about me: https://github.com/meedstrom/el-job/ Thanks for engaging! I learned a lot from async.el.

Nice, thanks. I think I will make something for locate-library along these lines: diff --git a/async.el b/async.el index b960ebc..749cdca 100644 --- a/async.el +++ b/async.el @@ -466,11 +466,17 @@ Can be one of \"-Q\" or \"-q\". Default is \"-Q\" but it is sometimes useful to use \"-q\" to have a enhanced config or some more variables loaded.") +(defvar async-library nil + "Cache async library path. +This variable should be let bounded around an `async-start' call and not +used globally. Should be found with `locate-library'.") + (defun async--emacs-program-args (&optional sexp) "Return a list of arguments for invoking the child Emacs." ;; Using `locate-library' ensure we use the right file ;; when the .elc have been deleted. - (let ((args (list async-quiet-switch "-l" (locate-library "async")))) + (let ((args (list async-quiet-switch "-l" (or async-library + (locate-library "async"))))) (when async-child-init (setq args (append args (list "-l" async-child-init)))) (append args (list "-batch" "-f" "async-batch-invoke" So a simple var to allow caching the locate-library output around your loop calling multiple async processes.

FWIW, my library uses the default process filter. But I learned that if I just use after-change-functions to notice new output in the process buffers, it's fast on Emacs 31.0.50 but slow on Emacs 29.4. Not only due to the new default process filter, but after-change-functions also seems to perform differently.

Yes, @monnier made changes to improve after-change-functions but I can't remember what it was.

…

So on Emacs 29.4, better to poll the buffers with a timer. 🤷🏼‍♀️ — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.*Message ID: ***@***.***>

-- Thierry

monnier · 2024-11-19T14:42:10Z

+This variable should be let bounded around an `async-start' call and not

^^^^^^^ bound

So a simple var to allow caching the locate-library output around your loop calling multiple async processes.

Maybe this should be mentioned as an other example of slowness over at Emacs bug#41646.

> FWIW, my library uses the default process filter. But I learned that > if I just use after-change-functions to notice new output in the > process buffers, it's fast on Emacs 31.0.50 but slow on Emacs > 29.4. Not only due to the new default process filter, but > after-change-functions also seems to perform differently. Yes, @monnier made changes to improve after-change-functions but I can't remember what it was.

I'm glad to hear Emacs-31 is faster, but I must admit I can't think of anything I've changed which would cause `after-change-functions` to be significantly faster in Emacs-31 than in Emacs-29.

thierryvolpiatto · 2024-11-19T16:38:59Z

monnier ***@***.***> writes:

I'm glad to hear Emacs-31 is faster, but I must admit I can't think of anything I've changed which would cause `after-change-functions` to be significantly faster in Emacs-31 than in Emacs-29.

My bad! I was pretty sure you had done some work around this.

…

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.*Message ID: ***@***.***>

-- Thierry

monnier · 2024-11-19T18:53:11Z

> I'm glad to hear Emacs-31 is faster, but I must admit I can't think of > anything I've changed which would cause `after-change-functions` to be > significantly faster in Emacs-31 than in Emacs-29. My bad! I was pretty sure you had done some work around this.

No harm at all, my point is rather to clarify that the origin of the speed up is still unknown, so it might be worthwhile to track it down.

meedstrom · 2024-11-19T19:05:46Z

It could always be a fluke on my end.

The library I made has three code paths to allow comparison (docstring at https://github.com/meedstrom/el-job/blob/32ea3c18394ef56bb61c4699c02038122160ea3a/el-job.el#L255-L271), and maybe other people don't see the same difference.

But eh, it's a bit of a crazy library. Too many lines of code, one reason I'll be dropping the three code paths once Debian trixie is out.

thierryvolpiatto added a commit that referenced this issue Nov 21, 2024

Allow caching locate-library result (#193)

863b336

thierryvolpiatto mentioned this issue Nov 25, 2024

Use temporary file for async byte compile log #194

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Three suggestions from performance profiling #193

Three suggestions from performance profiling #193

meedstrom commented Oct 27, 2024 •

edited

Loading

thierryvolpiatto commented Oct 28, 2024 via email

meedstrom commented Oct 28, 2024

thierryvolpiatto commented Oct 29, 2024 via email

meedstrom commented Oct 29, 2024 •

edited

Loading

meedstrom commented Oct 29, 2024

thierryvolpiatto commented Oct 29, 2024 via email

meedstrom commented Oct 29, 2024

meedstrom commented Nov 1, 2024 •

edited

Loading

meedstrom commented Nov 18, 2024 •

edited

Loading

thierryvolpiatto commented Nov 19, 2024 via email

monnier commented Nov 19, 2024 via email

thierryvolpiatto commented Nov 19, 2024 via email

monnier commented Nov 19, 2024 via email

meedstrom commented Nov 19, 2024

Three suggestions from performance profiling #193

Three suggestions from performance profiling #193

Comments

meedstrom commented Oct 27, 2024 • edited Loading

1. backward-sexp

2. locate-library

3. file-truename

thierryvolpiatto commented Oct 28, 2024 via email

meedstrom commented Oct 28, 2024

thierryvolpiatto commented Oct 29, 2024 via email

meedstrom commented Oct 29, 2024 • edited Loading

meedstrom commented Oct 29, 2024

thierryvolpiatto commented Oct 29, 2024 via email

meedstrom commented Oct 29, 2024

meedstrom commented Nov 1, 2024 • edited Loading

meedstrom commented Nov 18, 2024 • edited Loading

thierryvolpiatto commented Nov 19, 2024 via email

monnier commented Nov 19, 2024 via email

thierryvolpiatto commented Nov 19, 2024 via email

monnier commented Nov 19, 2024 via email

meedstrom commented Nov 19, 2024

meedstrom commented Oct 27, 2024 •

edited

Loading

1. `backward-sexp`

2. `locate-library`

3. `file-truename`

meedstrom commented Oct 29, 2024 •

edited

Loading

meedstrom commented Nov 1, 2024 •

edited

Loading

meedstrom commented Nov 18, 2024 •

edited

Loading