Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Three suggestions from performance profiling #193

Open
meedstrom opened this issue Oct 27, 2024 · 14 comments
Open

Three suggestions from performance profiling #193

meedstrom opened this issue Oct 27, 2024 · 14 comments

Comments

@meedstrom
Copy link

meedstrom commented Oct 27, 2024

Hi and thank you for this svelte library!

I recently discovered some perf hotspots by profiling, and I have some suggestions to deal with them, but they are suggestions only! Up to you as developer :)

First, I'll say that my attempts to run the profiler hit some roadblocks, because this form SOMETIMES signals errors. Don't know if that is a bug or my mistake. Backtrace below. (EDIT: See a working version at #193 (comment))

(let ((done-ctr 0)
      (max 20)) ;; Pretend 20 cores -> 20 processes
  (profiler-start 'cpu+mem)
  (dotimes (_ max)
    (async-start (lambda ()
                   (sleep-for 0.5)
                   ;; Simulate a real-world "hairy" dataset
                   (make-list 30 (cl-loop
                                  repeat (random 5) collect
                                  (make-list (random 5)
                                             (number-to-string (random))))))
                 (lambda (result)
                   (ignore result)
                   (when (= (cl-incf done-ctr) max)
                     (profiler-stop)
                     (profiler-report))))))

Backtrace:

Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
  async-handle-result((closure ((max . 20) (done-ctr . 0)) (result) (ignore result) (if (= (setq done-ctr (1+ done-ctr)) max) (progn (profiler-stop) (profiler-report)))) (async-signal (wrong-type-argument number-or-marker-p nil)) #<buffer *emacs*<18>>)
  async-when-done(#<process emacs> "finished\n")

Fortunately, I was able to get results anyway, because when it hits the error, the profiler has not been stopped. So I can manually stop and produce a report, and see results from the processes that did not hit the error.

Findings follow.

1. backward-sexp

In the case of a large amount of data, async-when-done spends half its CPU just to call backward-sexp once. The rest is spent on the FINISH-FUNC, so it's pretty sleek aside from this one call.

Suggestion: Run something other than backward-sexp This substitute works in my application (org-node, which I'm refactoring to depend on async.el):

;; We can see that replacing the `backward-sexp' call
;; completely eliminates that perf bottleneck
(advice-add 'backward-sexp :override
             (byte-compile (defun async--goto-start-of-value ()
                             (goto-char 1)
                             (search-forward "Lisp expression: \n"))))
;; Clean up after testing
(advice-remove 'backward-sexp #'async--goto-start-of-value)

2. locate-library

Having solved the case of a large amount of data, over 60% of CPU time is spent on locate-library, which is repeated for every subprocess spawned.

Suggestion: Memoize the result of locate-library.

To expire this memoization, I see two options:

  1. a timer of 10 seconds or so
  2. make a convenience wrapper for spawning many async processes at once, in which this value and potentially other values could be memoized. I would also be willing to develop such a wrapper, if the PR would be accepted.

Bonus: I happen to use in production something that's faster than locate-library, and ensures to use the .eln if available. Don't have FSF assignment yet, but if you want this verbatim, I'll get off my butt and submit the paperwork.

In any case, using .eln somehow would promise some all-around perf boosts.

(defun ensure-compiled-lib (feature)
  "Look for .eln, .elc or .el file corresponding to FEATURE.
FEATURE is a symbol as it shows up in `features'.

Guess which one was in fact loaded by the current Emacs,
and return it if it is .elc or .eln.

If it is .el, then opportunistically compile it and return the newly
compiled file instead.  This returns an .elc on the first call, then an
.eln on future calls.

Note: if you are currently editing the source code for FEATURE, use
`eval-buffer' and save to ensure this finds the correct file."
  (let* ((hit (cl-loop
               for (file . elems) in load-history
               when (eq feature (cdr (assq 'provide elems)))
               return
               ;; Want two pieces of info: the file path according to
               ;; `load-history', and some function supposedly defined
               ;; there.  The function is a better source of info, for
               ;; discovering an .eln.
               (cons file (cl-loop
                           for elem in elems
                           when (and (consp elem)
                                     (eq 'defun (car elem))
                                     (not (consp (symbol-function (cdr elem))))
                                     (not (function-alias-p (cdr elem))))
                           return (cdr elem)))))
         (file-name-handler-alist '(("\\.gz\\'" . jka-compr-handler))) ;; perf
         (loaded (or (and (ignore-errors
                            ;; REVIEW: `symbol-file' uses expand-file-name,
                            ;;         but I'm not convinced it is needed
                            (expand-file-name
                             (native-comp-unit-file
                              (subr-native-comp-unit
                               (symbol-function (cdr hit)))))))
                     (car hit))))
    (unless loaded
      (error "Current Lisp definitions need to come from a file %S[.el/.elc/.eln]"
             feature))
    ;; HACK: Sometimes comp.el makes freefn- temp files; pretend we found .el.
    ;;       Not a good hack, because load-path is NOT as trustworthy as
    ;;       load-history.
    (when (string-search "freefn-" loaded)
      (setq loaded
            (locate-file (symbol-name feature) load-path '(".el" ".el.gz"))))
    (if (or (string-suffix-p ".el" loaded)
            (string-suffix-p ".el.gz" loaded))
        (or (when (native-comp-available-p)
              ;; If we built an .eln last time, use it now even though
              ;; the current Emacs process is still running interpreted .el.
              (comp-lookup-eln loaded))
            (let* ((elc (file-name-concat temporary-file-directory
                                          (concat (symbol-name feature)
                                                  ".elc")))
                   (byte-compile-dest-file-function
                    `(lambda (&rest _) ,elc)))
              (when (native-comp-available-p)
                (native-compile-async (list loaded)))
              ;; Native comp may take a while, so return elc this time.
              ;; We should not pick an .elc from load path if Emacs is now
              ;; running interpreted code, since the code's likely newer.
              (if (or (file-newer-than-file-p elc loaded)
                      (byte-compile-file loaded))
                  ;; NOTE: On Guix we should never end up here, but if we
                  ;;       did, that'd be a problem as Guix will probably
                  ;;       reuse the first .elc we ever made forever, even
                  ;;       after upgrades to .el, due to 1970 timestamps.
                  elc
                loaded)))
      ;; Either .eln or .elc was loaded, so use the same for the
      ;; subprocesses.  We should not opportunistically build an .eln if
      ;; Emacs had loaded an .elc for the current process, because we
      ;; cannot assume the source .el is equivalent code.
      ;; The .el could be in-development, newer than .elc, so subprocesses
      ;; should use the same .elc for compatibility right up until the point
      ;; the developer actually evals the .el buffer.
      loaded)))

(EDIT Oct 29: Fixed some issues in that code snippet)

3. file-truename

While locate-library stood for 60%+, file-truename stood for about 8%.

Suggestions:

  1. memoize this value, or
  2. don't run file-truename if not needed.
    • Don't know what issue it tried to solve, but perhaps file-chase-links could suffice?
@thierryvolpiatto
Copy link
Collaborator

thierryvolpiatto commented Oct 28, 2024 via email

@meedstrom
Copy link
Author

It is still async, just with 20 children instead of one working in parallel (if your computer has 20 cores). Nice for performance-intensive applications :)

But yea, then you have 20 sentinels waiting to run, instead of 1. So it's good to ensure that they are optimized, also to eliminate overhead in things like async-when-done.

@thierryvolpiatto
Copy link
Collaborator

thierryvolpiatto commented Oct 29, 2024 via email

@meedstrom
Copy link
Author

meedstrom commented Oct 29, 2024

OK, first, I fixed the test-snippet. For some reason the error wrong-type argument number-or-marker-p nil did not appear once I removed cl-loop from the START-FUNC.

(defvar done-ctr 0)
(let ((max 20))
  (setq done-ctr 0)
  (profiler-start 'cpu+mem)
  ;; Supposing device has 20 cores, launch 20 processes
  (dotimes (_ max)
    (async-start (lambda ()
                   (sleep-for 0.5)
                   ;; Simulate a real-world "hairy" dataset
                   (thread-last (make-list 50 nil)
                                (mapcar #'random)
                                (mapcar #'number-to-string)
                                (make-list (random 15))
                                (make-list (random 15))))
                 `(lambda (result)
                    (ignore result)
                    (cl-incf done-ctr)
                    (message "Receiving results from... process %d" done-ctr)
                    (when (= done-ctr ,max)
                      (profiler-stop)
                      (profiler-report))))))

Second, you'll see if you eval that, that Emacs stays responsive until the results start coming in from the 20 different subprocesses.

In my mental model, async-start is just make-process, and FINISH-FUNC is the process sentinel. Right? The sentinels are not called during the dotimes loop, but at an undefined point in the future, after the processes have finished.

That's how this is async. The dotimes loop just launches 20 system processes, so the loop itself finishes in milliseconds, long before any one of the processes have finished.


EDIT: Interestingly, I'm getting different profiler results. Now it is string-match standing for 65% of CPU time. May be affected by the exact shape of the simulated "hairy" dataset.

        1325  67% - #<byte-code-function D58>
        1325  67%  - async-read-from-client
        1284  65%     string-match
          24   1%     read
          12   0%     search-forward
         384  19% - command-execute
         384  19%  - funcall-interactively
         384  19%   - eval-last-sexp
         384  19%    - #<byte-code-function D3C>
         384  19%     - elisp--eval-last-sexp
         384  19%      - eval
         384  19%       - progn
         384  19%        - let
         384  19%         - let
         384  19%          - while
         384  19%           - let
         384  19%            - async-start
         354  18%             - async--emacs-program-args
         354  18%              + locate-library
          19   0%             + file-truename
          10   0%             + async-start-process
           1   0%             + async--transmit-sexp

@meedstrom
Copy link
Author

Speaking of string-match, I have not understood all the reasons that went into async.el design, like why it uses a custom process filter.

If it is in fact possible to refactor... the Emacs 30 NEWS file makes an argument for using the built-in process filter:

---
** The default process filter was rewritten in native code.
The round-trip through the Lisp function
'internal-default-process-filter' is skipped when the process filter is
the default one.  It is reimplemented in native code, reducing GC churn.
To undo this change, set 'fast-read-process-output' to nil.

+++

Saw the issues on here about problems decoding hash (#) characters, like #145, but are they still current? Perhaps only caused by using a custom process filter?

I've tested with vanilla make-process and the default process filter. Made the subprocesses call prin1 to return literal records looking like #s(data data data), and there was no issue calling read in the sentinel. That's what org-node does now (at these lines https://github.com/meedstrom/org-node/blob/3241743c4b3d0c69968b301e62cb0602932297da/org-node.el#L1215-L1255).

@thierryvolpiatto
Copy link
Collaborator

thierryvolpiatto commented Oct 29, 2024 via email

@meedstrom
Copy link
Author

Take your time :)

@meedstrom
Copy link
Author

meedstrom commented Nov 1, 2024

Don't know if this will be useful, but I found a clean way to erase the "Lisp expression: " from the process buffer before the output is inserted!

Try in the Scratch buffer:

;; Will print empty string "" when called
(defun before-sentinel (proc)
  (with-current-buffer (process-buffer proc)
    ;; For battle-testing: Send hash table with 1 key, where the value is
    ;; another `record' type called `org-node'
    (process-send-string proc "#s(hash-table size 3694 test equal rehash-size 1.5 rehash-threshold 0.8125 data (\"cf486b81-a7bf-480c-9a94-317f980e1ee0\" #s(org-node nil nil \"~/org/daily/partner/2020-11-22.org\" \"2020-11-22\" \"cf486b81-a7bf-480c-9a94-317f980e1ee0\" 0 nil 1 nil ((\"ID\" . \"cf486b81-a7bf-480c-9a94-317f980e1ee0\") (\"CREATED\" . \"[2020-11-22]\")) nil nil (\"privy\" \"daily\") (\"privy\" \"daily\") \"2020-11-22\" nil)))")
    (process-send-eof proc)
    ;; Reveal the initial "Lisp expression: " garbage.
    ;; Only works with connection-type `pipe', for some reason.
    (accept-process-output proc)
    (erase-buffer) ;; Erase the "Lisp expression: "
    (print (buffer-string))))


;; Will print only the actual output when called,
;; hopefully the same hash table we inserted.
;; No "Lisp expression: " before it.
(defun sentinel (proc _)
  (with-current-buffer (process-buffer proc)
    (print (buffer-string))
    (delete-process proc)))


(let ((proc (make-process
             :buffer (get-buffer-create " *test*" t)
             :name "test"
             :connection-type 'pipe
             :command (list "emacs" "-Q" "--batch"
                            "--eval" "(pp (read t))")
             :sentinel #'sentinel)))
  (before-sentinel proc)
  nil)

Anyway, it's just aesthetics.

EDIT: A simpler trick: instead of (read), use (read-minibuffer ""). Because that's what (read) does anyway: it calls (read-minibuffer "Lisp expression: ").

@meedstrom
Copy link
Author

meedstrom commented Nov 18, 2024

Update: I wrote a library for my purposes, so don't worry about me: https://github.com/meedstrom/el-job/ Thanks for engaging! I learned a lot from async.el.

FWIW, my library uses the default process filter. But I learned that if I just use after-change-functions to notice new output in the process buffers, it's fast on Emacs 31.0.50 but slow on Emacs 29.4. Not only due to the rewritten default process filter, but after-change-functions also seems to perform differently. So on Emacs 29.4, better to poll the buffers with a timer. 🤷🏼‍♀️

@thierryvolpiatto
Copy link
Collaborator

thierryvolpiatto commented Nov 19, 2024 via email

@monnier
Copy link
Contributor

monnier commented Nov 19, 2024 via email

@thierryvolpiatto
Copy link
Collaborator

thierryvolpiatto commented Nov 19, 2024 via email

@monnier
Copy link
Contributor

monnier commented Nov 19, 2024 via email

@meedstrom
Copy link
Author

It could always be a fluke on my end.

The library I made has three code paths to allow comparison (docstring at https://github.com/meedstrom/el-job/blob/32ea3c18394ef56bb61c4699c02038122160ea3a/el-job.el#L255-L271), and maybe other people don't see the same difference.

But eh, it's a bit of a crazy library. Too many lines of code, one reason I'll be dropping the three code paths once Debian trixie is out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants