-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Java: sanitize values which are checked against an allowlist using java.util.List.contains or java.util.Set.contains #17051
base: main
Are you sure you want to change the base?
Conversation
9e85afb
to
d062b8b
Compare
List.contains
on a list constructed locally with List.of
)d062b8b
to
f523ff3
Compare
f523ff3
to
bf001fc
Compare
I should run QA to check alert changes before this is merged. I have run DCA but I don't have enough experience with interpreting java DCA results to know if it is indicative of a performance problem. |
Here is one problem I've seen while looking through MRVA results (slightly adapted from List<String> tables = new ArrayList<>();
SQLUtils.acceptTableSource(
sql,
DbType.odps,
e -> tables.add(((SQLExprTableSource)e).getTableName()),
e -> e instanceof SQLExprTableSource
);
if(tables.contains("src")) { Update: I've addressed this now. |
x = mc.getArgument(arg) | ||
) | ||
| | ||
x = e or arg = mc.(SafeCall).getArg() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move x = e to x != e
in the first clause of the forall -- as it stands, the two arms of the disjunction x = e or arg = mc.(SafeCall).getArg()
bind different variables, which seems like a recipe for bad RA
If a mutable allowlist flows to a captured variable in a lambda, we can't be sure that it won't have a non-constant element added, so we exclude it as a list of constants sanitizer.
1f0e26f
to
f37077d
Compare
} | ||
|
||
/** A comparison against a list of compile-time constants. */ | ||
abstract class ListOfConstantsComparison extends Guard { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking in not making this all private was that someone could add some classes representing their favourite way of making a list of constants.
} | ||
|
||
/** Classes for `java.util.List` and `java.util.Set`. */ | ||
module Collection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really a general module for collection classes, and if it were, it probably shouldn't be in a library called ListOfConstantsSanitizer.qll
, hence make private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking in not making this all private was that someone could add some classes representing their favourite way of making a list of constants.
Alright, I'm going to step back a bit and try to gather a higher-level picture (there are probably more details I could comment on). What we want is to identify expressions that are always sets/lists of known constants. This is a universal flow problem and the solution is generally expected to be a small set. Any use of local data flow is existential rather than universal flow, so that is only really valid in the dual setting, i.e. under a negation, where we identify things that are not constant, but that's generally a huge set, so probably a bad strategy. |
public static final Set<String> badAllowList5; | ||
public static Set<String> badAllowList6 = Set.of("allowed1", "allowed2", "allowed3"); | ||
public final Set<String> badAllowList7 = Set.of("allowed1", "allowed2", "allowed3"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these 3 cases considered bad? I think we can use just a tiny bit of closed-world assumption and identify e.g. "effectively final" fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote these as negative examples because it's hard to be sure that they don't contain a non-constant element. In practice, when people want to check a value is in an allowed list of constants, I think they construct that list locally if it will be used once or put it in a static field if it might be used more broadly, and they generally make that list final and immutable, so that it's clear that it only has the contents that it seems to have on first glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, I think we should err on the side of being cautious when identifying these sanitizers, and I think it's reasonable to be able to say to a user who complains about an FP "we didn't recognize that sanitizer, but if you made it a static final field/immutable then we would."
Thank you for the thorough review, @aschackmull . I have addressed the superficial comments as it is easy to do so. I will now try to rewrite it as you suggest, which will take longer. Do you have any examples of anything written in the format you are suggesting, which I can refer to? I'm not immediately sure how to structure it. |
The Java TypeFlow library is an example of a universal flow calculation. I was actually thinking that it might be useful to extract the "universal flow" part of that as a new individual shared library, and then use that. Aside: The "collection might reach update with non-constant" will need existential flow as it is only naturally expressed as a negation in the positive formulation of constants and collections-of-constants. |
I've put up a PR with a reimplementation of this in terms of universal flow here: #17901. It builds on top of #17863. |
Checking that a value is in a set of compile-time constant values should be a sanitizer, because what was untrusted data has now been checked to be in a known set of values. It is hard in CodeQL to identify when this happens, as there are many ways of checking that a value is in a known set of values. This PR introduces the framework for this sanitizer and implements it for a small number of cases:
contains
on ajava.util.List
orjava.util.Set
which contains only compile-time constant valuescontains
isx.toLowerCase()
orx.toUpperCase()
then the value that we sanitize isx
contains
) must either beList.of(...)
,Set.of(...)
,Collections.unmodifiableSet(Arrays.asList(...))
orCollections.unmodifiableList(Arrays.asList(...))
, either created locally or read from a final static field.List
orSet
(via a constructor orArrays.asList(...)
)add
,addFirst
oraddLast