Java: sanitize values which are checked against an allowlist using java.util.List.contains or java.util.Set.contains #17051

Open
wants to merge 14 commits into
base: main

owen-mc
Contributor

@owen-mc owen-mc commented Jul 23, 2024

Checking that a value is in a set of compile-time constant values should be a sanitizer, because what was untrusted data has now been checked to be in a known set of values. It is hard in CodeQL to identify when this happens, as there are many ways of checking that a value is in a known set of values. This PR introduces the framework for this sanitizer and implements it for a small number of cases:

  • currently only a call to contains on a java.util.List or java.util.Set which contains only compile-time constant values
    • if the argument to contains is x.toLowerCase() or x.toUpperCase() then the value that we sanitize is x
  • the allowlist (qualifier of contains) must either be
    • an immutable allowlist created using List.of(...), Set.of(...), Collections.unmodifiableSet(Arrays.asList(...)) or Collections.unmodifiableList(Arrays.asList(...)), either created locally or read from a final static field.
    • a locally constructed List or Set (via a constructor or Arrays.asList(...))
      • which may have had compile-time constant elements added using add, addFirst or addLast
      • which does not have any other methods called on it, and which is not an argument to any other methods
      • which is not captured in any lambda (as this might lead to elements that are not compile-time constants being added to it).
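
For illustration, the allowlist shapes listed above might look like the following in user code. This is only a sketch: the class, field and method names are hypothetical, and whether each exact shape is matched depends on the implementation in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical names throughout; none of them come from the PR or its tests.
class AllowlistShapes {
    // Immutable allowlist of compile-time constants, read from a final static field.
    static final Set<String> ALLOWED_COLUMNS = Set.of("id", "name", "created_at");

    // Immutable wrapper around a list of compile-time constants.
    static final List<String> ALLOWED_ORDERS =
            Collections.unmodifiableList(Arrays.asList("asc", "desc"));

    // Locally constructed list that only has constants added via add, is not
    // passed to any other method, and is not captured by a lambda.
    static boolean isAllowedTable(String table) {
        List<String> tables = new ArrayList<>();
        tables.add("users");
        tables.add("orders");
        return tables.contains(table);
    }

    // The contains calls act as guards. For the second one the argument is
    // order.toLowerCase(), and per the description the sanitized value is order.
    static String orderByClause(String column, String order) {
        if (ALLOWED_COLUMNS.contains(column) && ALLOWED_ORDERS.contains(order.toLowerCase())) {
            return " ORDER BY " + column + " " + order;
        }
        throw new IllegalArgumentException("column or order is not in the allowlist");
    }
}
```

Inside the guarded branch, `column` and `order` can only be one of the listed constants (up to letter case for `order`), which is why treating the check as a sanitizer is reasonable.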

@owen-mc owen-mc changed the title from "Java: sanitize values which are checked against an allowlist (currently only calling List.contains on a list constructed locally with List.of)" to "Java: sanitize values which are checked against an allowlist (currently only java.util.List.contains)" Oct 3, 2024
@owen-mc owen-mc marked this pull request as ready for review October 9, 2024 09:12
@owen-mc owen-mc requested a review from a team as a code owner October 9, 2024 09:12
@owen-mc
Contributor Author

owen-mc commented Oct 9, 2024

I should run QA to check alert changes before this is merged. I have run DCA, but I don't have enough experience interpreting Java DCA results to know whether they indicate a performance problem.

@owen-mc owen-mc changed the title from "Java: sanitize values which are checked against an allowlist (currently only java.util.List.contains)" to "Java: sanitize values which are checked against an allowlist using java.util.List.contains or java.util.Set.contains" Oct 9, 2024
@owen-mc
Contributor Author

owen-mc commented Oct 9, 2024

Here is one problem I've seen while looking through MRVA results (slightly adapted from core/src/test/java/com/alibaba/druid/bvt/sql/odps/issues/Issue4933.java in alibaba/druid). We don't recognise that the following code might add a non-constant string to tables.

List<String> tables = new ArrayList<>();
SQLUtils.acceptTableSource(
        sql,
        DbType.odps,
        e -> tables.add(((SQLExprTableSource)e).getTableName()),
        e -> e instanceof SQLExprTableSource
);
if(tables.contains("src")) {

Update: I've addressed this now.
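
A self-contained reduction of the same problem, without the druid dependency, could look like this. It is a sketch only: `visitNames` is a hypothetical stand-in for `SQLUtils.acceptTableSource`, and the other names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class LambdaCaptureExample {
    // Hypothetical stand-in for SQLUtils.acceptTableSource: calls the visitor once per name.
    static void visitNames(List<String> names, Consumer<String> visitor) {
        for (String name : names) {
            visitor.accept(name);
        }
    }

    static boolean isListed(List<String> untrustedNames, String candidate) {
        List<String> tables = new ArrayList<>();
        // tables is never passed as an argument to a method, but the lambda
        // captures it and adds elements that are not compile-time constants.
        visitNames(untrustedNames, name -> tables.add(name));
        // This check must not be treated as an allowlist sanitizer.
        return tables.contains(candidate);
    }
}
```

Here `isListed(List.of("evil"), "evil")` returns true, so a value passing the `contains` check says nothing about it being one of a known set of constants.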

x = mc.getArgument(arg)
)
|
x = e or arg = mc.(SafeCall).getArg()
Contributor

Move x = e to x != e in the first clause of the forall -- as it stands, the two arms of the disjunction x = e or arg = mc.(SafeCall).getArg() bind different variables, which seems like a recipe for bad RA

}

/** A comparison against a list of compile-time constants. */
abstract class ListOfConstantsComparison extends Guard {
Contributor

Make private?

Contributor Author

My thinking in not making this all private was that someone could add some classes representing their favourite way of making a list of constants.

}

/** Classes for `java.util.List` and `java.util.Set`. */
module Collection {
Contributor

This is not really a general module for collection classes, and if it were, it probably shouldn't be in a library called ListOfConstantsSanitizer.qll, hence make private.

Contributor Author

My thinking in not making this all private was that someone could add some classes representing their favourite way of making a list of constants.

@aschackmull
Contributor

Alright, I'm going to step back a bit and try to gather a higher-level picture (there are probably more details I could comment on). What we want is to identify expressions that are always sets/lists of known constants. This is a universal flow problem and the solution is generally expected to be a small set. Any use of local data flow is existential rather than universal flow, so that is only really valid in the dual setting, i.e. under a negation, where we identify things that are not constant, but that's generally a huge set, so probably a bad strategy.
So I'd recommend reworking this as a positive recursive definition that avoids any use of existential data flow. The base cases are simple and the recursive steps can take a variety of forms including library calls like Arrays.asList and SSA flow.
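
To make the universal/existential distinction concrete, here is a toy sketch in plain Java rather than QL. It is not how the CodeQL libraries are implemented; the node names and the `sourcesOf` map are assumptions made up for the example. A node counts as "always constant" if it is a base constant, or if it has at least one incoming flow step and every source of those steps is itself always constant. The existential variant only asks for some constant source.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UniversalFlowSketch {
    // Least fixpoint over a flow graph given as: node -> the nodes that flow into it.
    // universal = true:  all sources must be constant (small, precise set).
    // universal = false: one constant source is enough (existential flow).
    static Set<String> constantNodes(Set<String> baseConstants,
                                     Map<String, List<String>> sourcesOf,
                                     boolean universal) {
        Set<String> constant = new HashSet<>(baseConstants);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<String>> entry : sourcesOf.entrySet()) {
                List<String> sources = entry.getValue();
                boolean derived = universal
                        ? !sources.isEmpty() && constant.containsAll(sources)
                        : sources.stream().anyMatch(constant::contains);
                if (derived) {
                    // add returns true only if the node was newly derived
                    changed |= constant.add(entry.getKey());
                }
            }
        }
        return constant;
    }
}
```

With `x` fed by two literals and `y` fed by `x` and by user input, the universal version derives `x` but not `y`, while the existential version also derives `y`. That over-approximation is what makes existential flow unsuitable for the positive formulation.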

Comment on lines +28 to +30
public static final Set<String> badAllowList5;
public static Set<String> badAllowList6 = Set.of("allowed1", "allowed2", "allowed3");
public final Set<String> badAllowList7 = Set.of("allowed1", "allowed2", "allowed3");
Contributor

@aschackmull aschackmull Oct 17, 2024

Why are these 3 cases considered bad? I think we can use just a tiny bit of closed-world assumption and identify e.g. "effectively final" fields.

Contributor Author

I wrote these as negative examples because it's hard to be sure that they don't contain a non-constant element. In practice, when people want to check a value is in an allowed list of constants, I think they construct that list locally if it will be used once or put it in a static field if it might be used more broadly, and they generally make that list final and immutable, so that it's clear that it only has the contents that it seems to have on first glance.

Contributor Author

In other words, I think we should err on the side of being cautious when identifying these sanitizers, and I think it's reasonable to be able to say to a user who complains about an FP "we didn't recognize that sanitizer, but if you made it a static final field/immutable then we would."
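
As a rough illustration of the concern, the sketch below contrasts a static final field initialised in place with two of the shapes from the test file. The class and field names are hypothetical, and the static initialiser is an assumption: the conversation does not show how `badAllowList5` is actually assigned.

```java
import java.util.Set;

class FieldAllowlists {
    // Shape the PR accepts: static, final, initialised in place with constants.
    static final Set<String> GOOD = Set.of("allowed1", "allowed2", "allowed3");

    // Like badAllowList6: static but not final, so any code with access can reassign it.
    static Set<String> reassignable = Set.of("allowed1", "allowed2", "allowed3");

    // Like badAllowList5: final, but assigned away from the declaration, possibly
    // from a value that is not a compile-time constant (assumed for this example).
    static final Set<String> assignedLater;

    static {
        assignedLater = Set.of(System.getProperty("extra.allowed", "allowed1"));
    }
}
```

After `FieldAllowlists.reassignable = Set.of("evil");` runs anywhere in the program, `reassignable.contains("evil")` is true, whereas `GOOD` cannot change. Proving that no such reassignment exists is the closed-world reasoning discussed above.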

@owen-mc
Contributor Author

owen-mc commented Oct 17, 2024

Thank you for the thorough review, @aschackmull. I have addressed the superficial comments, as that was easy to do. I will now try to rewrite it as you suggest, which will take longer. Do you have any examples of anything written in the format you are suggesting that I can refer to? I'm not immediately sure how to structure it.

@aschackmull
Contributor

> Do you have any examples of anything written in the format you are suggesting, which I can refer to? I'm not immediately sure how to structure it.

The Java TypeFlow library is an example of a universal flow calculation. I was actually thinking that it might be useful to extract the "universal flow" part of that as a new individual shared library, and then use that.

Aside: The "collection might reach update with non-constant" will need existential flow as it is only naturally expressed as a negation in the positive formulation of constants and collections-of-constants.

@aschackmull
Contributor

I've put up a PR with a reimplementation of this in terms of universal flow here: #17901. It builds on top of #17863.
I've compared results to the implementation given in this PR, which has allowed me to iron out a couple of bugs and add useful improvements. This also identified several FPs in this PR.
