Improve `MEDIUM_LINE_BYTES` guessing with heuristic #26
Labels
enhancement
important
perf impacted
This issue impacts the performance of duplicut (either positively or negatively)
`MEDIUM_LINE_BYTES` is currently hardcoded in `const.h` to a value of 8.

The hashmap and chunks are then built in such a way that, if the real average line length is `MEDIUM_LINE_BYTES`, the hashmap will be filled up to the factor defined by `HMAP_LOAD_FACTOR` (currently set to 0.5, for 50% hmap filling).

Therefore, we could read some random pages in the file (e.g. start/middle/end of file) and get a better guess of `MEDIUM_LINE_BYTES` from there.

It would greatly improve performance on wordlists with a lot of very long lines (for example, a list of MD5 hashes). If lines are 32 bytes long, the hmap will be only 12.5% full (50% / 2 / 2), and a lot more chunks are needed.
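
A rough sketch of what the sampling heuristic could look like, not a patch against the actual code base. The names `sample_stats`, `guess_line_bytes`, `SAMPLE_BYTES` and `FALLBACK_LINE_BYTES` are hypothetical. It assumes line lengths are counted including the trailing `'\n'`, and it uses plain `fseek`/`ftell` for brevity, which would need replacing (e.g. with offsets into an existing mapping) for files larger than `long` can address.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SAMPLE_BYTES        4096  /* hypothetical: bytes read per sample */
#define FALLBACK_LINE_BYTES 8     /* current hardcoded MEDIUM_LINE_BYTES */

/* Add the complete lines found in buf to the running totals.
 * If the sample does not start on a line boundary, the leading partial
 * line is skipped. A trailing line without '\n' is always ignored. */
void sample_stats(const char *buf, size_t len, int at_line_start,
                  size_t *bytes, size_t *lines)
{
    const char *p = buf;
    const char *end = buf + len;
    const char *nl;

    if (!at_line_start) {
        nl = memchr(p, '\n', (size_t)(end - p));
        if (nl == NULL)
            return;             /* no line boundary in this sample */
        p = nl + 1;
    }
    while (p < end && (nl = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        *bytes += (size_t)(nl - p) + 1;   /* line length including '\n' */
        *lines += 1;
        p = nl + 1;
    }
}

/* Estimate the average line length from samples taken at the start,
 * middle and end of the file. Falls back to the hardcoded value when
 * the file cannot be read or no complete line was sampled. */
size_t guess_line_bytes(const char *path)
{
    char buf[SAMPLE_BYTES];
    size_t bytes = 0, lines = 0;
    long size, offsets[3];
    int i;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return FALLBACK_LINE_BYTES;
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0) {
        fclose(fp);
        return FALLBACK_LINE_BYTES;
    }
    offsets[0] = 0;
    offsets[1] = size > SAMPLE_BYTES ? (size - SAMPLE_BYTES) / 2 : 0;
    offsets[2] = size > SAMPLE_BYTES ? size - SAMPLE_BYTES : 0;

    for (i = 0; i < 3; i++) {
        size_t n;

        /* Small file: do not read the same sample twice. */
        if (i > 0 && offsets[i] == offsets[i - 1])
            continue;
        if (fseek(fp, offsets[i], SEEK_SET) != 0)
            continue;
        n = fread(buf, 1, sizeof(buf), fp);
        sample_stats(buf, n, offsets[i] == 0, &bytes, &lines);
    }
    fclose(fp);

    /* Rounded average, or the fallback if nothing was measured. */
    return lines ? (bytes + lines / 2) / lines : FALLBACK_LINE_BYTES;
}
```

The returned value would then stand in for the constant when sizing the hashmap and chunks. Three fixed offsets keep the cost to a few page reads, but a file whose line lengths vary a lot between regions would probably need more samples.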