UX: interrupt codeql database analyze should write the already computed results to the output file #17930

Open
disconnect3d opened this issue Nov 7, 2024 · 2 comments
Labels
question Further information is requested

Comments

@disconnect3d
Contributor

When executing codeql database analyze <db> --format=sarif-latest --output=out.sarif -- <querysuite> on a large project, the evaluation of certain queries can take a very long time.

In such a case it would be very useful to be able to interrupt the analyze process (e.g., via CTRL+C) and save the output file with results from the queries that have already been evaluated.

This along with #17929 and #17928 would substantially improve the user experience of CodeQL.

PS: Sorry if those issues should belong to the CodeQL CLI binaries repo. I started creating them here and realized it later on. Let me know if I should move them there.

@jketema
Contributor

jketema commented Nov 7, 2024

Hi @disconnect3d. Thanks for your suggestion, we'll take this into consideration.

@hmakholm
Contributor

hmakholm commented Nov 7, 2024

This is not as straightforward as it would appear at first. The database analyze command works in two phases: first, all of the queries are "evaluated" into tabular results (stored in an internal binary format as .bqrs files within the results subdirectory of the database), and then those tables are "interpreted" into alerts in SARIF format.

The reason for splitting into two phases is this: The evaluator really wants to evaluate all the queries together so it can share intermediate results between them. Could we temporarily pause it to do "interpretation" after each .bqrs has been produced? Not really. For alerts that show taint paths through the code, the interpretation phase is responsible for selecting some representative paths through a larger graph of dataflow that was produced during the evaluation phase. This is a potentially RAM-intensive computation, and if the evaluator is still active (even if it were paused), it is still using all of the configured --ram for its own intermediate results.

If you stop database analyze (or database run-queries) partway through the evaluation phase, you can still get alerts from the .bqrs files that have been completed by then, by using codeql database interpret-results manually afterwards. It will ignore the still-missing .bqrs files with a modicum of grace, save for printing whiny complaints to the console.
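
As a rough illustration of that workaround, here is a minimal Python sketch. The helper names (completed_bqrs, interpret_command, interpret_partial_results) are hypothetical and not part of the CodeQL CLI. The sketch assumes that codeql is on the PATH and that interpret-results takes the same --format and --output options as the analyze invocation quoted earlier in the thread. It counts the .bqrs files already present in the results subdirectory, then runs the interpretation step by hand.

```python
from pathlib import Path
import subprocess


def completed_bqrs(database: Path) -> list[Path]:
    """Return the .bqrs files already written under <database>/results."""
    results = database / "results"
    if not results.is_dir():
        return []
    return sorted(results.rglob("*.bqrs"))


def interpret_command(database: Path, query_suite: str, output: Path) -> list[str]:
    """Build the manual interpretation command (assumed flags, see lead-in)."""
    return [
        "codeql", "database", "interpret-results",
        "--format=sarif-latest",
        f"--output={output}",
        "--",
        str(database),
        query_suite,
    ]


def interpret_partial_results(database: Path, query_suite: str, output: Path) -> int:
    """Interpret whatever was evaluated before the interrupt.

    Returns the CLI exit code without asserting on it, since the command
    may complain about queries whose .bqrs files are still missing.
    """
    done = completed_bqrs(database)
    if not done:
        raise RuntimeError(f"no completed .bqrs files under {database / 'results'}")
    print(f"interpreting {len(done)} completed result file(s)")
    return subprocess.run(interpret_command(database, query_suite, output)).returncode
```

After interrupting the analysis, calling interpret_partial_results(Path("<db>"), "<querysuite>", Path("out.sarif")) with the same database and query suite should produce a SARIF file containing only the alerts whose evaluation had finished.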
