Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add shared basic block library #18497

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

paldepind
Copy link
Contributor

@paldepind paldepind commented Jan 15, 2025

Adds a shared basic block library in the controlflow pack and modifies the basic block implementation of Swift, Ruby, C#, and Rust to use it.

A few notes:

  • There's two ways to use the basic block library. Either through the new codeql.controlflow.BasicBlock or through the existing codeql.controlflow.Cfg. The former is suitable for languages that don't use the control graph library and the latter for those that do. In this PR all languages use the latter, but I've tried instantiating codeql.controlflow.BasicBlock for Go and that seems to work fine as well.

  • The BasicBlock class for C# now has both the current getASuccessorByType/1 method and a new getASuccessor/1 method that it inherits from the basic block library and which is the name used in Ruby, Rust, and Swift. We could consider deprecating getASuccessorByType/1 in order to not have two methods doing the same and to increase consistency between language libraries.

  • For the BasicBlock subclasses in Cfg.qll I've changed the current names a bit, such that they are all consistently of the form ${name}BasicBlock. For instance JoinBlock is instead JoinBasicBlock. For the existing language-level basic block libraries I've kept the current names for backwards compatibility, so only Rust use the new names.

@github-actions github-actions bot added C# Ruby Rust Pull requests that update Rust code Swift labels Jan 15, 2025
@paldepind paldepind force-pushed the shared-basic-block-library branch from 1c9e7cc to 5bfeff6 Compare January 15, 2025 14:26
@paldepind paldepind force-pushed the shared-basic-block-library branch 5 times, most recently from 7112872 to b313a48 Compare January 16, 2025 15:03
@paldepind paldepind force-pushed the shared-basic-block-library branch from b313a48 to 8b20b0d Compare January 16, 2025 15:37
@paldepind paldepind marked this pull request as ready for review January 16, 2025 15:45
@paldepind paldepind requested review from a team as code owners January 16, 2025 15:45
@hvitved
Copy link
Contributor

hvitved commented Jan 17, 2025

  • We could consider deprecating getASuccessorByType/1 in order to not have two methods doing the same and to increase consistency between language libraries.

I think we should do that.

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work, great to see even more shared code 🎉

shared/controlflow/codeql/controlflow/BasicBlock.qll Outdated Show resolved Hide resolved
shared/controlflow/codeql/controlflow/BasicBlock.qll Outdated Show resolved Hide resolved
shared/controlflow/codeql/controlflow/BasicBlock.qll Outdated Show resolved Hide resolved
shared/controlflow/codeql/controlflow/BasicBlock.qll Outdated Show resolved Hide resolved
shared/controlflow/codeql/controlflow/Cfg.qll Outdated Show resolved Hide resolved
ruby/ql/lib/codeql/ruby/controlflow/BasicBlocks.qll Outdated Show resolved Hide resolved
ruby/ql/lib/codeql/ruby/controlflow/BasicBlocks.qll Outdated Show resolved Hide resolved
@paldepind
Copy link
Contributor Author

Thanks for the review and comments. I've addressed all but one.

* Gets an `id` of `node`. This is used to order the predecessors of a join
* basic block.
*/
bindingset[node]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the bindingset here appropriate? My thinking was that we don't need IDs of all nodes, only the few that end up as the first in a basic block prior to a join block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the actual equivalenceRelation based hacks will be able to benefit, so we can remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, no. I don't think there's any way to implement a numbering scheme that can make use of a pre-bound node. So you're only risking inlining and recomputation.

Suggested change
bindingset[node]

@paldepind
Copy link
Contributor Author

paldepind commented Jan 17, 2025

  • We could consider deprecating getASuccessorByType/1 in order to not have two methods doing the same and to increase consistency between language libraries.

I think we should do that.

I'll do that in a follow up PR 👍

@paldepind paldepind requested a review from a team as a code owner January 17, 2025 12:32
@github-actions github-actions bot added the Actions Analysis of GitHub Actions label Jan 17, 2025
@@ -134,6 +134,10 @@ private module Implementation implements CfgShared::InputSig<Location> {
SuccessorType getAMatchingSuccessorType(Completion c) { result = c.getAMatchingSuccessorType() }

predicate isAbnormalExitType(SuccessorType t) { none() }

int idOfAstNode(AstNode node) { none() }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that getJoinBlockPredecessor isn't implemented in the Actions basic block library, so this instantiation is actually fine. And we could move Actions to the shared BB library in a future PR.

@@ -64,6 +80,7 @@ module Make<InputSig Input> {
BasicBlock getAPredecessor(SuccessorType t) { result.getASuccessor(t) = this }

/** Gets the control flow node at a specific (zero-indexed) position in this basic block. */
cached
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be defined inside the Cached module, and then just referenced here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to reference them with the BasicBlocks:: prefix?

Yes, that works. I didn't think of that 🙈

* arbitrary order.
*/
cached
JoinPredecessorBasicBlock getJoinBlockPredecessor(JoinBasicBlock jb, int i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to reference them with the BasicBlocks:: prefix?

Comment on lines 222 to 234
// if x or y
// foo
// else
// bar
// ```
//
// we have a CFG that looks like
//
// x --false--> [false] x or y --false--> bar
// \ |
// --true--> y --false--
// \
// --true--> [true] x or y --true--> foo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This CFG is for x and y - the true/false edges are mixed up. The easiest fix is probably just to replace x or y with x and y.

Comment on lines +253 to +254
private predicate bbIndex(Node bbStart, Node cfn, int i) =
shortestDistances(startsBB/1, intraBBSucc/2)(bbStart, cfn, i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned the other day, I don't think shortestDistances is optimal. IIRC it can be beat with a QL recursion, but it's something that'll need fresh measurements on some large examples. I expect the C++ implementation to be in the lead in terms of performance on this particular predicate, and C++ uses a straightforward QL recursion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since that requires some benchmarking I'd prefer to leave that for a future PR.

* Gets an `id` of `scope`. This is used to order the predecessors of a join
* basic block.
*/
bindingset[scope]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bindingset[scope]

Comment on lines +1564 to +1566
if this instanceof EntryBasicBlock
then result = this.getFirstNode().getScope()
else result = this.getAPredecessor().getScope()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this recursive and not just the following?

Suggested change
if this instanceof EntryBasicBlock
then result = this.getFirstNode().getScope()
else result = this.getAPredecessor().getScope()
result = this.getFirstNode().getScope()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been wondering the same, it's hard to imagine how they would not be equivalent for correct implementations of getScope and EntryBasicBlock, but I simply reproduced the existing code (which used overloading but was otherwise the same) thinking that probably was some reason 😅

final class BasicBlock extends BasicBlockImpl::BasicBlock {
// We extend `BasicBlockImpl::BasicBlock` to add the `getScope`.
/** Gets the scope of this basic block. */
CfgScope getScope() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on putting Node.getScope through the BasicBlock input, such that getScope could be added to BasicBlock directly in its definition inside shared/controlflow/codeql/controlflow/BasicBlock.qll? That way this could just be an alias rather than having to copy all the member predicates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would definitely save some boiler plate, but could make the BasicBlock module less flexible for instantiations that don't want a getScope.

Comment on lines +1737 to +1742
predicate immediatelyControls(BasicBlock succ, SuccessorType s) {
succ = this.getASuccessor(s) and
forall(BasicBlock pred | pred = succ.getAPredecessor() and pred != this |
succ.dominates(pred)
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Java, this predicate is called dominatingEdge and it includes bbIDominates(this, succ) as an additional conjunct. I think that ought to be semantically identical, but it might offer a better join-order since immediate dominance joined with the successor relation yields a much stronger context for the forall.
Also, in this general library, it's probably worth it to expand the qldoc a bit on the relationship between the concept of controls and dominance. In short: controls equals edge dominance. See also the qldoc for dominatingEdge in Java.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the concept of dominating edge aka immediatelyControls might very well be applicable beyond ConditionBasicBlocks, so it should possibly be available as a top-level predicate between basic blocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, this concept can be put into BasicBlocks.qll as it's independent of .isCondition().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Actions Analysis of GitHub Actions C# documentation Ruby Rust Pull requests that update Rust code Swift
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants