Three suggestions from performance profiling #193
Comments
Martin Edström ***@***.***> writes:
Hi and thank you for this svelte library!
I recently discovered some perf hotspots by profiling, and I have some suggestions to deal with them, but they are suggestions only! Up to you as developer :)
First, I'll say that my attempts to run the profiler hit some roadblocks, because this form SOMETIMES signals errors. Don't know if that is a bug or my mistake. Backtrace below.
(let ((done-ctr 0)
(max 20)) ;; Pretend 20 cores -> 20 processes
(profiler-start 'cpu+mem)
(dotimes (_ max)
(async-start (lambda ()
(sleep-for 0.5)
;; Simulate a real-world "hairy" dataset
(make-list 30 (cl-loop
repeat (random 5) collect
(make-list (random 5)
(number-to-string (random))))))
(lambda (result)
(ignore result)
(when (= (cl-incf done-ctr) max)
(profiler-stop)
(profiler-report))))))
Thanks, I will look into this, but I wonder why you run
(dotimes (_ 20) (async-start (lambda () (do-something)) ...))
instead of
(async-start (lambda () (dotimes (_ 20) (do-something))) ...)
?
In the first place your application is not really async, because you are
back in the Emacs parent after each iteration, while in the second sexp
you are completely async, running only one Emacs child.
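For concreteness, a minimal sketch of the two shapes being compared, with `do-something` as a placeholder for the real work and `ignore` standing in for a FINISH-FUNC:

;; Shape 1: the loop runs in the parent and launches twenty children,
;; one unit of work per child.
(dotimes (_ 20)
  (async-start (lambda () (do-something))
               #'ignore))

;; Shape 2: a single child Emacs performs all twenty iterations itself.
(async-start (lambda () (dotimes (_ 20) (do-something)))
             #'ignore)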
--
Thierry
It is still async, just with 20 children instead of one, working in parallel (if your computer has 20 cores). Nice for performance-intensive applications :) But yeah, then you have 20 sentinels waiting to run instead of 1, so it's good to ensure that they are optimized and to keep the per-sentinel overhead low.
Martin Edström ***@***.***> writes:
It is still async, just with 20 children instead of one working in
parallel (if your computer has 20 cores).
I hardly see how it could be async, as the loop is running on the parent
side and blocking Emacs even if at each iteration an async process is
running. Did I miss something?
--
Thierry
OK, first, I fixed the test-snippet. For some reason the error no longer occurs:

(defvar done-ctr 0)
(let ((max 20))
(setq done-ctr 0)
(profiler-start 'cpu+mem)
;; Supposing device has 20 cores, launch 20 processes
(dotimes (_ max)
(async-start (lambda ()
(sleep-for 0.5)
;; Simulate a real-world "hairy" dataset
(thread-last (make-list 50 nil)
(mapcar #'random)
(mapcar #'number-to-string)
(make-list (random 15))
(make-list (random 15))))
`(lambda (result)
(ignore result)
(cl-incf done-ctr)
(message "Receiving results from... process %d" done-ctr)
(when (= done-ctr ,max)
(profiler-stop)
(profiler-report))))))

Second, you'll see if you eval that, that Emacs stays responsive until the results start coming in from the 20 different subprocesses. In my mental model, async-start just spawns a child Emacs process and installs a sentinel for it; that's how this is async.

EDIT: Interestingly, I'm getting different profiler results. Now it is …
Speaking of which: if it is in fact possible to refactor... the Emacs 30 NEWS file makes an argument for using the built-in process filter.

I saw the issues on here about problems decoding hash (#) characters, like #145, but are they still current? Perhaps they were only caused by using a custom process filter? I've tested with vanilla make-process and the default process filter, and made the subprocesses call …
Martin Edström ***@***.***> writes:
The dotimes loop just launches 20 system processes, so the loop itself
finishes in milliseconds, long before any one of the processes has
finished.
Ah yes, of course, so I will look into this as soon as I am back home
in November.
Thanks.
--
Thierry
Take your time :)
Don't know if this will be useful, but I found a clean way to erase the "Lisp expression: " from the process buffer before the output is inserted! Try in the *scratch* buffer:

;; Will print empty string "" when called
(defun before-sentinel (proc)
(with-current-buffer (process-buffer proc)
;; For battle-testing: Send hash table with 1 key, where the value is
;; another `record' type called `org-node'
(process-send-string proc "#s(hash-table size 3694 test equal rehash-size 1.5 rehash-threshold 0.8125 data (\"cf486b81-a7bf-480c-9a94-317f980e1ee0\" #s(org-node nil nil \"~/org/daily/partner/2020-11-22.org\" \"2020-11-22\" \"cf486b81-a7bf-480c-9a94-317f980e1ee0\" 0 nil 1 nil ((\"ID\" . \"cf486b81-a7bf-480c-9a94-317f980e1ee0\") (\"CREATED\" . \"[2020-11-22]\")) nil nil (\"privy\" \"daily\") (\"privy\" \"daily\") \"2020-11-22\" nil)))")
(process-send-eof proc)
;; Reveal the initial "Lisp expression: " garbage.
;; Only works with connection-type `pipe', for some reason.
(accept-process-output proc)
(erase-buffer) ;; Erase the "Lisp expression: "
(print (buffer-string))))
;; Will print only the actual output when called,
;; hopefully the same hash table we inserted.
;; No "Lisp expression: " before it.
(defun sentinel (proc _)
(with-current-buffer (process-buffer proc)
(print (buffer-string))
(delete-process proc)))
(let ((proc (make-process
:buffer (get-buffer-create " *test*" t)
:name "test"
:connection-type 'pipe
:command (list "emacs" "-Q" "--batch"
"--eval" "(pp (read t))")
:sentinel #'sentinel)))
(before-sentinel proc)
nil)

Anyway, it's just aesthetics.

EDIT: A simpler trick: instead of …
Update: I wrote a library for my purposes, so don't worry about me: https://github.com/meedstrom/el-job/ Thanks for engaging! I learned a lot from async.el.

FWIW, my library uses the default process filter. But I learned that if I just use after-change-functions to notice new output in the process buffers, it's fast on Emacs 31.0.50 but slow on Emacs 29.4. Not only due to the new default process filter, but after-change-functions also seems to perform differently. So on Emacs 29.4, better to poll the buffers with a timer. 🤷🏼♀️
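To make that comparison concrete, here is a rough sketch of the two ways of noticing new output in a process buffer. The `my-handle-new-output` helper and the watcher functions are placeholders invented for the example, not el-job or async.el API, and the closures assume lexical binding.

(defun my-handle-new-output (buf)
  "Placeholder for whatever should run when BUF has received new output."
  (ignore buf))

;; Approach 1: react immediately via a buffer-local `after-change-functions'.
(defun my-watch-with-after-change (buf)
  (with-current-buffer buf
    (add-hook 'after-change-functions
              (lambda (_beg _end _len) (my-handle-new-output buf))
              nil t)))

;; Approach 2: poll the buffer size on a repeating timer (e.g. on Emacs 29.4).
;; Returns the timer so the caller can stop it later with `cancel-timer'.
(defun my-watch-with-timer (buf)
  (let ((last-size 0))
    (run-with-timer
     0.1 0.1
     (lambda ()
       (when (buffer-live-p buf)
         (let ((size (buffer-size buf)))
           (when (> size last-size)
             (setq last-size size)
             (my-handle-new-output buf))))))))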
Sorry for the delay, I didn't have the time to look into this.
Martin Edström ***@***.***> writes:
Update: I wrote a library for my purposes, so don't worry about me:
https://github.com/meedstrom/el-job/ Thanks for engaging! I learned a
lot from async.el.
Nice, thanks. I think I will make something for locate-library along
these lines:
diff --git a/async.el b/async.el
index b960ebc..749cdca 100644
--- a/async.el
+++ b/async.el
@@ -466,11 +466,17 @@ Can be one of \"-Q\" or \"-q\".
Default is \"-Q\" but it is sometimes useful to use \"-q\" to have a
enhanced config or some more variables loaded.")
+(defvar async-library nil
+ "Cache async library path.
+This variable should be let bounded around an `async-start' call and not
+used globally. Should be found with `locate-library'.")
+
(defun async--emacs-program-args (&optional sexp)
"Return a list of arguments for invoking the child Emacs."
;; Using `locate-library' ensure we use the right file
;; when the .elc have been deleted.
- (let ((args (list async-quiet-switch "-l" (locate-library "async"))))
+ (let ((args (list async-quiet-switch "-l" (or async-library
+ (locate-library "async")))))
(when async-child-init
(setq args (append args (list "-l" async-child-init))))
(append args (list "-batch" "-f" "async-batch-invoke"
So a simple var to allow caching the locate-library output around your loop
calling multiple async processes.
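For illustration, here is how a caller might use the proposed async-library variable around such a loop. This is a sketch against the unmerged diff above, not current async.el API, and `do-heavy-work` is a placeholder.

;; Look up async.el once, then reuse the cached path for every child process.
(let ((async-library (locate-library "async")))
  (dotimes (_ 20)
    (async-start (lambda () (do-heavy-work)) ; placeholder for real work
                 #'ignore)))

Since async-library is a defvar in the patch, the let-binding is dynamic and stays visible while async-start sets up each child in the parent Emacs.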
FWIW, my library uses the default process filter. But I learned that
if I just use after-change-functions to notice new output in the
process buffers, it's fast on Emacs 31.0.50 but slow on Emacs
29.4. Not only due to the new default process filter, but
after-change-functions also seems to perform differently.
Yes, @monnier made changes to improve after-change-functions but I can't
remember what it was.
So on Emacs 29.4, better to poll the buffers with a timer. 🤷🏼♀️
--
Thierry
+This variable should be let bounded around an `async-start' call and not
^^^^^^^
bound
So a simple var to allow caching the locate-library output around your loop
calling multiple async processes.
Maybe this should be mentioned as another example of slowness over at
Emacs bug#41646.
> FWIW, my library uses the default process filter. But I learned that
> if I just use after-change-functions to notice new output in the
> process buffers, it's fast on Emacs 31.0.50 but slow on Emacs
> 29.4. Not only due to the new default process filter, but
> after-change-functions also seems to perform differently.
Yes, @monnier made changes to improve after-change-functions but
I can't remember what it was.
I'm glad to hear Emacs-31 is faster, but I must admit I can't think of
anything I've changed which would cause `after-change-functions` to be
significantly faster in Emacs-31 than in Emacs-29.
monnier ***@***.***> writes:
I'm glad to hear Emacs-31 is faster, but I must admit I can't think of
anything I've changed which would cause `after-change-functions` to be
significantly faster in Emacs-31 than in Emacs-29.
My bad! I was pretty sure you had done some work around this.
--
Thierry
> I'm glad to hear Emacs-31 is faster, but I must admit I can't think of
> anything I've changed which would cause `after-change-functions` to be
> significantly faster in Emacs-31 than in Emacs-29.
My bad! I was pretty sure you had done some work around this.
No harm at all, my point is rather to clarify that the origin of the
speed up is still unknown, so it might be worthwhile to track it down.
It could always be a fluke on my end. The library I made has three code paths to allow comparison (docstring at https://github.com/meedstrom/el-job/blob/32ea3c18394ef56bb61c4699c02038122160ea3a/el-job.el#L255-L271), and maybe other people don't see the same difference. But eh, it's a bit of a crazy library. Too many lines of code; that's one reason I'll be dropping the three code paths once Debian trixie is out.
Hi and thank you for this svelte library!
I recently discovered some perf hotspots by profiling, and I have some suggestions to deal with them, but they are suggestions only! Up to you as developer :)
First, I'll say that my attempts to run the profiler hit some roadblocks, because this form SOMETIMES signals errors. Don't know if that is a bug or my mistake. Backtrace below. (EDIT: See a working version at #193 (comment))
Backtrace:
Fortunately, I was able to get results anyway, because when it hits the error, the profiler has not been stopped. So I can manually stop and produce a report, and see results from the processes that did not hit the error.
Findings follow.
1. `backward-sexp`

In the case of a large amount of data, `async-when-done` spends half its CPU just to call `backward-sexp` once. The rest is spent on the FINISH-FUNC, so it's pretty sleek aside from this one call.

Suggestion: Run something other than `backward-sexp`. This substitute works in my application (org-node, which I'm refactoring to depend on async.el).
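The substitute snippet used in org-node is not quoted above. As a rough illustration of the general idea only (not necessarily what org-node does): if the child process is changed to print a unique marker just before prin1-ing its result, the parent can find the start of the result with a cheap string search instead of scanning the whole sexp with `backward-sexp`. The function name and marker below are made up for the example.

(defun my-read-final-result (proc-buffer)
  "Read the last sexp from PROC-BUFFER without calling `backward-sexp'.
Assumes the child printed a \"--result--\" marker line just before the sexp."
  (with-current-buffer proc-buffer
    (goto-char (point-max))
    ;; A plain string search never parses the (possibly huge) sexp,
    ;; so it stays cheap even for hairy datasets.
    (search-backward "\n--result--\n")
    (goto-char (match-end 0))
    (read (current-buffer))))

The obvious caveat is that the marker must never occur inside the printed data; the sketch is only meant to show why a string search can beat a full backward sexp scan.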
2. `locate-library`

Having solved the case of a large amount of data, over 60% of CPU time is spent on `locate-library`, which is repeated for every subprocess spawned.

Suggestion: Memoize the result of `locate-library`. To expire this memoization, I see two options:

Bonus: I happen to use in production something that's faster than `locate-library`, and it ensures the .eln is used if available. I don't have FSF assignment yet, but if you want this verbatim, I'll get off my butt and submit the paperwork. In any case, using .eln somehow would promise some all-around perf boosts.

(EDIT Oct 29: Fixed some issues in that code snippet)
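The two expiry options from the original issue are not reproduced above. As a minimal sketch of the memoization idea itself (variable, function, and hook choice are illustrative, not async.el API):

(defvar my-async-library-cache nil
  "Cached return value of (locate-library \"async\"), or nil if not yet computed.")

(defun my-locate-async-library ()
  "Like (locate-library \"async\"), but compute the path only once."
  (or my-async-library-cache
      (setq my-async-library-cache (locate-library "async"))))

;; One possible expiry strategy: forget the cached path whenever a file is
;; loaded, so a freshly reinstalled or recompiled async.el is picked up.
(add-hook 'after-load-functions
          (lambda (_file) (setq my-async-library-cache nil)))

Thierry's patch later in this thread takes a related route: a let-bindable async-library variable consulted before falling back to `locate-library`.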
3. `file-truename`

While `locate-library` stood for 60%+, `file-truename` stood for about 8%.

Suggestions:
- Skip `file-truename` if not needed.
- `file-chase-links` could suffice?
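As a quick local sanity check for this suggestion, one could time both functions on the same path with `benchmark-run`; a sketch only, and note the two functions are not equivalent (`file-truename` also resolves symlinks in directory components, while `file-chase-links` only chases the file itself).

(require 'benchmark)

;; Resolve the same library path 1000 times with each function and return
;; the (elapsed-seconds gc-count gc-seconds) triples for comparison.
(let ((file (locate-library "async")))
  (list :file-truename    (benchmark-run 1000 (file-truename file))
        :file-chase-links (benchmark-run 1000 (file-chase-links file))))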