Beat ripgrep on all benchmarks, permanently and for all time #102
Comments
+1 for the "This Time It's Personal" milestone. ;)
Haha nice, best of luck to ya! I'll try to highlight some of the most important points from my writeup as they relate to
It will be interesting to see how much of a performance hit this entails! Also, I'll be adding a parallel directory iterator (hopefully soon), so I've still got a few more tricks left up my sleeve. :-)
Hey @BurntSushi, thanks again for the benchmarks and your excellent writeup thereof.
Aw shucks, I hate to blow my own horn, so it's especially nice when someone does it for me ;-). Seriously, like you say, much/most of the credit goes to PCRE/PCRE2's JIT. In my own testing as well, there's a big hit if JIT isn't enabled.
Definitely; benchmarking all these tools (ucg, rg, ag, ack, GNU grep, BSD grep, pcre[2]grep) against each other in any kind of meaningful manner is a real challenge, due to the many differences you've cited better than I can reproduce here. That's been the number one time-consumer for me recently, as I've tried to get comparisons across not just different use cases (~regex+corpus), but across dimensions such as:
Indeed. However, I usually don't use a
Indeed, though you may be interested in seeing (and I think you already have) how I'm handling this same issue in FileScanner_sse4_2.cpp::popcount16(). Kernighan's algorithm does a pretty respectable O(n) job in the number of set bits, and on x86 compiles down to... let me check... about 4 instructions including the loop's JNE. POPCNT still wins, but this seems to be a pretty close second. Also, if you haven't seen it already, you may want to read my treatise on the POPCNT subject entitled "Portability Pointers: The POPCNT Instruction: Not Portable Unless It Is, And Then Only Sometimes". Basically, virtualization (VirtualBox in my tale, but also valgrind and certainly others) can play havoc with what CPUID claims is supported, and with what really is supported, and the two may disagree. Hopefully Rust handles this for you, but that one was a real head-scratcher for me for a while.
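For readers following along, here is a minimal sketch of Kernighan's bit-counting loop in the spirit of the popcount16() mentioned above. This is an illustration of the technique in portable C++11, not the actual FileScanner_sse4_2.cpp code; the function name is chosen for illustration only.

```cpp
#include <cstdint>

// Kernighan's trick: b &= b - 1 clears the lowest set bit, so the loop body
// executes once per set bit rather than once per bit position.
inline int popcount16_portable(std::uint16_t b) {
    int count = 0;
    while (b) {
        b &= b - 1;  // drop the lowest set bit
        ++count;
    }
    return count;
}
```

Compilers typically turn this into the handful of instructions described above; a hardware POPCNT still wins when it is genuinely available.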
Yeah, I've been meaning to see what happens if I turn on UTF-8. Probably time to start some experimentation. Good to hear from you, and thanks again, Gary R. Van Sickle
Love this conversation! Great grep implementations to use and compare! This got me motivated to spend a couple of weekends coding in C++11 to build a new portable tool, ugrep, on GitHub. It has full support for Unicode. It's faster than ripgrep and GNU grep, at least for practical use cases with large files (a 13GB text file, 100MB of Wikipedia, and 35MB of source code) and with lots of small files (the Qt source). There is still room for improvement by tuning the algorithms and by using SIMD/AVX, which isn't yet the case in this early version of ugrep v1.5. With Unicode support and many other new easy-to-use features it is a good start IMHO. Love to hear from you what you think isn't so great and should be improved.
@genivia-inc Neat tool. Looks like you put a lot of work into it. Unfortunately, I have some concerns with the way you're comparing it to other tools. Firstly, I don't see any way to reproduce your benchmarks. Maybe I skimmed too quickly, but I don't see anything that discusses what actual commands were run, what corpora were searched and what patterns were used. Secondly, you're benchmarking a pretty old version of ripgrep. ripgrep 0.10.0 was released over a year ago, but ugrep 1.5.0 was released 3 hours ago. You'd ideally run your benchmarks against ripgrep master, which has an unreleased performance bug fix: BurntSushi/ripgrep#1335. Thirdly, it is not easy to build your tool:
Please consider making it easier to compile your tool. If your install instructions require me to run
Fourthly, you make weird claims, like the claim that Boost.Regex is faster than PCRE, which is just... not even remotely true in my experience. To me, this suggests you either have a bug in your measurement process or your benchmarks are deeply biased.
As you know, you do not need
The A's to your Q's:
As for your "weird claims" comments...? This is thoroughly tested and there are thousands of users of RE/flex and folks have replicated these results based on the files that are available.
Not really. I have my own set of benchmarks that suggest otherwise: https://github.com/rust-lang/regex/tree/master/bench I don't usually advertise those as they don't come with analysis, but you can compare the numbers the last time I ran my benchmarks against boost vs PCRE2. It's really not even close on most benchmarks. Others have drawn similar conclusions. Anyway, the thing left for me to do is to dig into your benchmark. Not sure when I'll have time to do that. But I don't see a lot of details on how your benchmark is run in your RE/flex README... You just vaguely describe the setup, but I don't know how to reproduce them.
Thanks. Yeah, a lot of your benchmarks are just about searching a lot of literals at once. ripgrep 0.10.0 was missing an important optimization there. It now just uses Aho-Corasick (or something better if it's a small number of literals) and beats ugrep on my machine:
I haven't tried all of them, but ripgrep and ugrep appear to be comparable on T-7 for example:
Your T-2 benchmark is interesting. Honestly, the file is so small that there is likely a lot of noise in the results. But I'm seeing a bigger difference on my system:
Such a small file is hard to benchmark reliably. If we expand the file 50 times:
then the difference is much more noticeable:
I also can't reproduce your T-8 benchmark while running on the
For your T-9 benchmark, it seems like
As for T-1, there's definitely a performance bug in ripgrep there related to the
Now that I've looked through all of your benchmarks, they appear to suffer from some pretty severe bias. In particular, it looks like all of them have very high match counts. While this isn't an uncommon use case, searching for things with very few results is itself a pretty common thing to do. It would be wise to expand your benchmarks to include cases like that. Otherwise, a significant portion of your benchmark set is just going to be testing match and printing overhead. (Which, again, definitely should be measured. But not to the exclusion of everything else.)
Sorry, but I don't know that and I don't know who your users are. I've been working on regex engines for years and I've never heard of RE/flex.
You might already know this, but ripgrep doesn't use Aho-Corasick when searching a smallish (~100 or less) set of patterns. It uses a special SIMD-accelerated algorithm that was developed in Hyperscan: https://github.com/BurntSushi/aho-corasick/tree/master/src/packed/teddy
My machine is an Intel i7-6900K 3.20GHz running Linux 5.3.0.
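For readers who want to see the non-SIMD baseline being discussed, here is a compact sketch of a classic byte-oriented Aho-Corasick automaton. It is a generic illustration of the algorithm only, not ripgrep's aho-corasick crate and certainly not the Teddy algorithm linked above.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Classic goto/fail Aho-Corasick automaton over bytes. Memory-hungry
// (256 transitions per node) but simple; real implementations compress this.
struct AhoCorasick {
    struct Node {
        int next[256];
        int fail = 0;
        std::vector<int> out;  // ids of patterns ending at this node
        Node() { std::fill(std::begin(next), std::end(next), -1); }
    };
    std::vector<Node> nodes{1};  // node 0 is the root

    void add(const std::string& pattern, int id) {
        int s = 0;
        for (unsigned char c : pattern) {
            if (nodes[s].next[c] < 0) {
                nodes[s].next[c] = static_cast<int>(nodes.size());
                nodes.emplace_back();
            }
            s = nodes[s].next[c];
        }
        nodes[s].out.push_back(id);
    }

    // BFS to fill failure links and complete the transition table.
    void build() {
        std::queue<int> q;
        for (int c = 0; c < 256; ++c) {
            int v = nodes[0].next[c];
            if (v < 0) nodes[0].next[c] = 0;
            else { nodes[v].fail = 0; q.push(v); }
        }
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int c = 0; c < 256; ++c) {
                int v = nodes[u].next[c];
                int f = nodes[nodes[u].fail].next[c];
                if (v < 0) { nodes[u].next[c] = f; continue; }
                nodes[v].fail = f;
                // inherit matches reachable through the failure link
                nodes[v].out.insert(nodes[v].out.end(), nodes[f].out.begin(), nodes[f].out.end());
                q.push(v);
            }
        }
    }

    // Calls report(end_offset, pattern_id) for every occurrence in text.
    template <class Report>
    void scan(const std::string& text, Report report) const {
        int s = 0;
        for (std::size_t i = 0; i < text.size(); ++i) {
            s = nodes[s].next[static_cast<unsigned char>(text[i])];
            for (int id : nodes[s].out) report(i, id);
        }
    }
};
```

Usage is add() for each literal, build() once, then scan(); production implementations replace the dense 256-entry transition tables with compressed representations and SIMD prefilters such as Teddy.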
This is great and confirms that ripgrep can be beaten (sorry), because I have a lot of room left to continue improving ugrep after 1.5.0 for short word searches, e.g. by using SSE/AVX/NEON, which is not yet used by the ugrep version you're testing. These planned SIMD improvements to ugrep will positively affect tests T-2, T-3, T-8, and T-9, which you elaborate on in your response and which I knew were not yet as fast without SIMD, but still faster than ripgrep 0.10.0 (which also lacked such optimizations, it seems). The latest ugrep 2.0 version is even faster than before (but still no SIMD) and clearly beats the performance of other greps for many practical use cases. The early ugrep version 1.5.0 that you compared is only best for the kind of searches that don't rely solely on single short word searches, as SIMD is not yet used to optimize single-word matches. It has excellent (better than ripgrep) parallel efficiency for recursive search. I've worked over 25 years in the research areas of compilers and high-performance computing. For example, the GNU compiler uses my research work (since GCC 4.0) on cross-iteration dependence testing for SIMD auto-vectorization (chains of recurrences, aka scalar evolutions). ugrep doesn't leverage SIMD/AVX yet, and I have a plan to go about doing just that.
This is a strange result with respect to threaded search: if
Your question about RE/flex: RE/flex is not a regex library, but a new lexical analyzer generator. Scanning is the forte of RE/flex, not searching. I added ugrep a few months ago as a relatively simple example and have been thinking since about improving it to make it "industrial strength" (whatever that means). That early version of ugrep with the older RE/flex engine was not optimized for searching yet. It is interesting (and fun coding) to optimize the engine for search as well, with a few tweaks and new methods/algorithms added over the last few weeks. However, it is still not fully optimized as of 1.5.0, as there is no SIMD/AVX usage, among other things. You see, the main difference with scanning (aka continuous matching, tokenizing, lexical analysis) is that you get a match for every character in the file, i.e. there are no gaps to skip ahead over. Searching, on the other hand, greatly benefits from an optimized
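To illustrate the scanning-versus-searching distinction in the paragraph above: a search can jump from one candidate position to the next, for example with memchr() on the pattern's first byte, whereas a scanner must classify every byte. A minimal sketch follows; it is a generic illustration with a hypothetical naive_search helper, not ugrep's actual matcher.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Return a pointer to the first occurrence of `needle` in hay[0..n), or
// nullptr. memchr skips straight to the next candidate first byte, so most
// bytes are never examined individually by this loop.
inline const char* naive_search(const char* hay, std::size_t n, const std::string& needle) {
    if (needle.empty() || n < needle.size()) return nullptr;
    const char* end = hay + (n - needle.size()) + 1;  // one past the last valid start
    const char* p = hay;
    while (p < end) {
        p = static_cast<const char*>(
            std::memchr(p, static_cast<unsigned char>(needle[0]),
                        static_cast<std::size_t>(end - p)));
        if (p == nullptr) return nullptr;
        if (std::memcmp(p, needle.data(), needle.size()) == 0) return p;  // confirmed
        ++p;  // false positive on the first byte: skip ahead again
    }
    return nullptr;
}
```

The same idea generalizes to multi-byte skip loops; the point is only that a searcher can skip bytes that a scanner must visit.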
With respect to Boost.Regex versus PCRE: there is a noticeable performance difference when regex engines are used for scanning, if you use the regex API smartly (e.g. don't extract matches as strings that are heap-allocated every time). The source code files for performance testing are available for download and linked in the CodeProject article. Quite a few people have looked into this too. And yes, Boost.Regex is faster than PCRE for this type of task, if you do it right. And some regex engines are unexpectedly terrible at this type of task. I've used the most efficient ways I could think of to compare and test the regex APIs, e.g. using iterators with Boost.Regex. Everything is precompiled to test compiled NFA/DFA hybrids fairly. Overall, I need a tool such as ugrep that is compatible with, and with results that are consistent with, GNU/BSD grep. Judging from forums and blogs, if the options are not compatible with GNU/BSD grep, a new grep tool is pretty much dead on arrival. My main focus is performance, compatibility, and safety (priorities not necessarily in that order).
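As a concrete illustration of the "use iterators, don't copy each match into a heap-allocated string" point above, here is a minimal sketch with boost::sregex_iterator. It is an editor's illustration under that assumption, not the actual benchmark code from the CodeProject article.

```cpp
#include <boost/regex.hpp>
#include <cstddef>
#include <iostream>
#include <string>

int main() {
    const std::string text = "tokenize this input without copying every match";
    const boost::regex word("\\w+");  // precompiled pattern

    // boost::sregex_iterator walks the matches in place; each sub_match holds
    // iterators into `text`, so nothing is copied into a new heap string
    // unless you explicitly call .str().
    std::size_t count = 0, bytes = 0;
    for (boost::sregex_iterator it(text.begin(), text.end(), word), end; it != end; ++it) {
        ++count;
        bytes += static_cast<std::size_t>((*it)[0].length());  // inspect via iterators
    }
    std::cout << count << " matches, " << bytes << " bytes matched\n";
}
```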
Well... I mean I do too... Best we can do is benchmark what we've got right now.
These are, by far, the most common kind of searches that ripgrep services. So I would be very careful about trying to claim that ugrep is faster in many practical use cases. A single search term is the single most practical example of using ripgrep in its target audience.
I haven't really seen any evidence to support this.
I followed the build steps you gave me exactly. Running with
Ah I see, I did not realize this was the main goal of ugrep. Sounds like they are probably designed for two very different use cases then. Might be good to mention that in the README.
These are crucial details. Looking back at your README, I can kind of see how it reads that way now. But it would probably help to add this important context to your benchmark results. It would ideally be nice if you made it easy for others to reproduce your benchmarks in a scriptable fashion. Like I did with benchsuite. And notice how the output documents everything pretty precisely, and provides the raw data in addition to confirming that match counts are correct.
ripgrep is mostly compatible with GNU/BSD grep. Certainly, it is not my primary goal, but I don't go out of my way to break compatibility unless there is a compelling reason to do so. There is much more in common between ripgrep and grep than there are differences. Either way, clearly, ripgrep is not "dead on arrival." ;-) My main focus is really on the user experience and performance. Of course, I care about safety too, which is at least partially why I built ripgrep in a language that is safe by default.
Elementary, my friend. Parallel computing 101. Sorry for thinking a bit fast; it's all in the README if you extrapolate the numbers. This is based on the ripgrep 0.10.0 results in the README, but a quick check shows similar numbers for ripgrep 1.5.0. Recall that for P processors (P = number of cores or threads) we have: $ S_P = \frac{t_s}{t_P} $ (speedup). Based on T-8 and T-9, the following is observed:
As you know, increasing P when the parallel execution time t_P does not decrease is not useful, and as a consequence the parallel efficiency E_P = S_P / P goes down. Increasing P to >= 6 for ripgrep still decreases parallel execution time to t_P = 0.12 compared to t_s = 0.36, so picking P = 6 is fair. For ugrep, t_s = 0.20 and t_P = 0.10 for P = 3. Hence, parallel efficiency is 67% > 50% for a common parallel search task (T-8). Ergo, ugrep has better parallel efficiency even when we plug in numbers that are biased in favor of ripgrep. Furthermore, parallel work indicates how well an algorithm is parallelized (lower is better, as it requires fewer CPU cycles summed over P). With P * E_P = 3, ripgrep is higher than ugrep's 2. That's the intuition in the back of my head. This got me a bit excited to spend a few more days trying some new things with ugrep, and to make RE/flex a better engine for search AND keep it the fast scanner that it already is. There is nothing peculiar here about the setup I've tested; it is perfectly replicable. Future testing will show if this swings one way or another. But for now, for our early ugrep stable release, these numbers look pretty good IMHO. It's just the SIMD/AVX stuff that is needed to search short words faster, which is not that exciting to work on. But, yes, it is important. I did not say that it wasn't.
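For reference, plugging the T-8 numbers quoted above into the standard definitions gives the efficiency figures claimed (a worked check, using the usual definition of parallel efficiency):

$ S_P = \frac{t_s}{t_P}, \qquad E_P = \frac{S_P}{P} $

$ \text{ripgrep: } S_6 = \frac{0.36}{0.12} = 3, \qquad E_6 = \frac{3}{6} = 50\% $

$ \text{ugrep: } S_3 = \frac{0.20}{0.10} = 2, \qquad E_3 = \frac{2}{3} \approx 67\% $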
I haven't been able to reproduce it, which is what I meant by evidence. You can spare the condescending lectures in your future responses, thanks.
Filed an issue on it: ugrep doesn't spawn more threads at all, it seems, which sucks as I was really eager to test this now! Hoping for a quick fix. Nice writeups, I love to read it all as I learn a lot, but it's a bit sad to talk about parallel computing 101 when the released code doesn't work. I suggest a bit more of a respectful tone; this is all about improving and combining power to really make the most of open source. But "it works here", so it's all good, right? Now that's coding 101. So far none of you guys have been able to beat find piped to xargs and zgrep on a lot of compressed files in a lot of directories, so I suggest less theory and more coding, as well as real-world benchmarking. So instead of focusing on beating each other on some of these tests, try to beat Unix on my complex real-world test. That's a challenge, a test none of you designed. Best of luck.
I've done plenty of both. That alone should be pretty clear. No need to talk down.
I was pretty sure the "This Time It's Personal" milestone/tag was a joke; now I'm not. Jk, but loosen up, I wasn't talking anyone down, or at least didn't mean to. It should be pretty clear that I'm here to learn and evolve, not to talk you down. And yes, it is pretty obvious that all of you have done plenty of both. Yet the discussion ends up being a lecture in parallel programming 101 instead of more actual tests on the cases none of you guys win at yet. This is about improving, after all, not fighting. I was referring to the condescending tone of genivia-inc and a few other comments here, and I suggested coding and benchmarking instead of endless theoretical writeups when the code in question doesn't even work at the moment. And in the end, the why and the theory don't matter that much if the results don't show improvements or can't be reproduced. It matters more to me that a program is faster, rather than the reasons why it should be. Although I love to learn about that as well.
The "why" and "theory" do matter. To build tools like this, it's usually necessary to dig into that stuff. And discussing why something should be faster is absolutely critical, because it's basically akin to asking questions and either validating them or invalidating them, which is an extremely educational process. |
Of course it does! That's not what I meant, exactly. And yes, it is. Discussion is what drives this forward, I never said anything against that; baseless claims are not. To say that why something should be faster matters more than what actually is the case is just not right. My point was that a claim that cannot be backed up is reduced from "why something should be faster" to "why someone thinks something should be faster", and that's pretty useless, because the theory is then wrong or the reasoning is off. Yes, you learn and evolve, that's all nice, but claims such as "my parallelism is better than yours" and "I have room for improvement" are baseless claims until proven in a reproducible manner.
No disrespect intended. I taught computer science for 20+ years to thousands of students and have a habit of explaining, in particular on matters where theory and practice diverge and where practice confirms theory. CS is truly a science of tradeoffs: a zero-sum game of algorithms/programs that typically trades increased parallelism and speed for increased energy use and/or increased memory use. There is rarely an absolute win-win, as those days may be gone. And no, to me grep-like tools are not close to being that important, even more so because I fully expect these will suffer one undeniable fate: the current state-of-the-art algorithms and design choices only work under the unrealistic assumption that software and hardware do not evolve. There are too many variables that depend on external factors that you cannot control. Take GNU grep's changes over the years, for example. Clever string search and matching algorithms are no longer as effective as they used to be 5 to 10 years ago. In the 5 to 10 years to come, software and hardware will continue to evolve. Worse, contemporary speedup and performance studies can only offer a snapshot of (actionable) information that might be useful to figure out what currently works (not so) well, what is comparable, and what approach may perhaps work better. But performance comparisons are fraught with issues, not to mention a fate of irrelevance. I mean, how many research publications that are 5 years old or older, with results based solely on empirical performance numbers, are still relevant today? I'm just repeating what is generally discussed in our circles.
I've tested on Mac (i7, 4 cores, SSD), CentOS (56 cores, NFS), Raspbian (a slow RPi 3B, class 10 flash), and Windows (i5, SSD), but don't see an issue with threads not being spawned when more than one file is searched, nor did any of my colleagues. I can ask them to chime in if need be. In all four cases, when more than one file is searched there is a speedup over a single thread with
The ugrep C++11 code is pretty transparent on this matter. The code checks for hardware threads with
I've investigated the use of pipes to buffer and merge the output, but that slowed things down for common cases, at least in initial tests to investigate alternative approaches. In some cases you may get an improvement. However, this is a clear case of speed versus memory use as a tradeoff, and as memory use grows, so eventually does the negative impact of the memory hierarchy on performance. Finally, ugrep is a small spinoff project that continues to evolve, and perhaps more importantly, it helps us with RE/flex development cycles. An updated version is in the works.
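For readers unfamiliar with the C++11 facility referred to above, here is a minimal sketch of a hardware-aware thread-count check. It uses only std::thread::hardware_concurrency; worker_count is a hypothetical helper for illustration, not the actual ugrep code.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Hypothetical helper: query the hardware thread count (0 means "unknown",
// so fall back to 1) and never spawn more workers than there are files.
inline std::size_t worker_count(std::size_t files_to_search) {
    std::size_t hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;
    return std::max<std::size_t>(1, std::min(hw, files_to_search));
}
```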
I filed an issue to be able to provide all the needed info there. While it's a somewhat IO-bound process, it clearly wasn't limited by that, as other approaches benefited from parallelism. It should be possible to do what find piped to xargs to zgrep does, just faster, which is interesting because all zgrep should do is zcat (which invokes gzip -cd) and then pipe to grep. Testing with parallel vs xargs shows that the improvement lies with using xargs for parallelism, which is interesting. This approach doesn't use a lot of RAM at all, and really shouldn't, as that would be less effective the way I see it. Are you sure you guys are compiling from master and not some unreleased code? I had to add a linking flag in the arguments for the compiler within the configure file (I made a pull request to fix that), which makes me think that you guys perhaps have different code, or use another way of building it. On Ubuntu 16.04 this is what I get:
Cool. Thanks for the details. Will take a look soon.
Your tests ran into an incomplete implementation of decompression and buffering in ugrep, which slowed things down a lot as it decompressed the input byte-by-byte! Also, your use of
Here are some of the changes I've made in ugrep v1.5.4:
Because all lines of the input files match, your suggested tests are a significant outlier compared to the usual queries that return sparser results. I reran the tests you conducted, now showing parallel speedups and also showing how much faster ugrep v1.5.4 is:
With one thread (
Using
where
Note: ugrep options
and with
Now the ugrep
The next version of ugrep has further speed improvements and additional features. Coming soon!!
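As a side note on the byte-by-byte decompression issue described above: reading compressed input in large blocks is the kind of change that makes this difference. A minimal, hedged sketch using zlib's gzread follows; gz_total_bytes is a hypothetical helper for illustration, not ugrep's actual code.

```cpp
#include <zlib.h>
#include <vector>

// Read a gzip-compressed file in 64 KiB blocks rather than byte-by-byte,
// avoiding per-call decompression overhead. Returns the decompressed size,
// or -1 on error. Error handling is deliberately simplified.
long gz_total_bytes(const char* path) {
    gzFile f = gzopen(path, "rb");
    if (!f) return -1;
    std::vector<char> buf(1 << 16);  // 64 KiB read buffer
    long total = 0;
    int n;
    while ((n = gzread(f, buf.data(), static_cast<unsigned>(buf.size()))) > 0)
        total += n;  // search/process buf[0..n) here
    gzclose(f);
    return n < 0 ? -1 : total;
}
```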
ugrep 1.56 is by far the fastest grep for searching compressed logs now. Impressive.
http://blog.burntsushi.net/ripgrep/
While we already win a few, and hold our own on others for the most part, @BurntSushi has found a few bugs and does actually beat us in some cases.
OH NOW IT IS ON! ;-)