Improve MEDIUM_LINE_BYTES guessing with heuristic #26

Open
nil0x42 opened this issue Oct 3, 2020 · 1 comment
Labels
enhancement, important, perf impacted (this issue impacts the performance of duplicut, either positively or negatively)

Comments

nil0x42 commented Oct 3, 2020

MEDIUM_LINE_BYTES is currently hardcoded in const.h, to a value of 8.
The hashmap & chunks are then sized in such a way that if the real average line length is MEDIUM_LINE_BYTES, the hashmap is filled by the factor defined by HMAP_LOAD_FACTOR (currently set to 0.5, for 50% hmap filling).

Instead of relying on a fixed value, we could read some random pages in the file (e.g. start/middle/end of file), and get a better guess of MEDIUM_LINE_BYTES from there.
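A rough sketch of that sampling idea, assuming the file content is already reachable as an in-memory buffer. The name `estimate_line_bytes`, the window size, and the choice to count the newline byte as part of each line are all hypothetical and not taken from duplicut's code:

```c
#include <stddef.h>

/* Hypothetical helper (not part of duplicut): estimate the average number of
 * bytes per line, newline included, by sampling three windows of `window`
 * bytes at the start, middle and end of `buf`.
 * Returns `fallback` when no newline is found in the samples. */
static size_t estimate_line_bytes(const char *buf, size_t size,
                                  size_t window, size_t fallback)
{
    size_t offsets[3];
    size_t bytes = 0;
    size_t lines = 0;

    if (size == 0 || window == 0)
        return fallback;
    if (window > size)
        window = size;

    offsets[0] = 0;                    /* start of file  */
    offsets[1] = (size - window) / 2;  /* middle of file */
    offsets[2] = size - window;        /* end of file    */

    for (int w = 0; w < 3; w++) {
        /* On tiny inputs, skip a window identical to the previous one. */
        if (w > 0 && offsets[w] == offsets[w - 1])
            continue;
        for (size_t i = 0; i < window; i++) {
            if (buf[offsets[w] + i] == '\n')
                lines++;
        }
        bytes += window;
    }

    if (lines == 0)
        return fallback;
    return (bytes + lines / 2) / lines;  /* rounded to nearest integer */
}
```

The fallback would typically be the current hardcoded value (8), so a file whose samples contain no newline keeps today's behaviour.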

It would greatly improve performance on wordlists with a lot of very long lines (for example, a list of md5 hashes).
With 32-byte lines, the hmap is only filled to 12.5% (50% / 4, since lines are 4 times longer than the assumed 8 bytes), and a lot more chunks are needed.
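The arithmetic can be written down as a tiny helper. This assumes the number of hashmap slots is proportional to the chunk size divided by MEDIUM_LINE_BYTES, which matches the figures above but is not checked against the actual sizing code; `expected_hmap_fill` is a hypothetical name:

```c
/* Hypothetical helper (not part of duplicut): expected hashmap fill ratio
 * when the map is sized for `assumed_line_bytes` per line but the file
 * really averages `actual_line_bytes` per line. */
static double expected_hmap_fill(double load_factor,
                                 double assumed_line_bytes,
                                 double actual_line_bytes)
{
    /* Fewer real lines than assumed means proportionally fewer used slots. */
    return load_factor * assumed_line_bytes / actual_line_bytes;
}
```

With a load factor of 0.5, an assumed length of 8 and an actual length of 32, this gives 0.125, the 12.5% mentioned above.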


nil0x42 commented Oct 16, 2020

Counting occurrences of newline in a buffer (from stackoverflow; the quoted snippet counts '.' in a NUL-terminated string, so '\n' has to be swapped in):

Here's the way I'd do it (minimal number of variables needed):

for (i=0; s[i]; s[i]=='.' ? i++ : *s++);
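A page read from a file is not NUL-terminated, so a length-based variant is probably closer to what is needed here. A minimal sketch using the standard `memchr`; the function name `count_newlines` is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of duplicut): count '\n' bytes in the
 * first `len` bytes of `buf`. The buffer does not need a terminating NUL. */
static size_t count_newlines(const char *buf, size_t len)
{
    const char *end = buf + len;
    const char *p = buf;
    size_t count = 0;

    /* memchr returns a pointer to the next '\n', or NULL when none is left. */
    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        count++;
        p++;  /* resume the search just after the newline found */
    }
    return count;
}
```

Unlike the one-liner above, this leaves the caller's pointer untouched and never reads past `len` bytes.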
