-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PPL: Add json_extract
function
#3262
base: main
Are you sure you want to change the base?
PPL: Add json_extract
function
#3262
Conversation
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Co-authored-by: Andrew Carbonetto <[email protected]> Signed-off-by: kenrickyap <[email protected]>
….java Co-authored-by: Andrew Carbonetto <[email protected]> Signed-off-by: kenrickyap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
…nsearch-project-sql into feature/json-valid Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Andrew Carbonetto <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
dfe5d3a
to
75e9cc3
Compare
Signed-off-by: kenrickyap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few comments to address
core/src/main/java/org/opensearch/sql/expression/json/JsonFunctions.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/opensearch/sql/expression/json/JsonFunctionsTest.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/opensearch/sql/expression/json/JsonFunctionsTest.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/opensearch/sql/expression/json/JsonFunctionsTest.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/opensearch/sql/expression/json/JsonFunctionsTest.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/expression/json/JsonFunctions.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]> # Conflicts: # core/src/main/java/org/opensearch/sql/expression/DSL.java # core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java # core/src/main/java/org/opensearch/sql/expression/json/JsonFunctions.java # core/src/main/java/org/opensearch/sql/utils/JsonUtils.java # core/src/test/java/org/opensearch/sql/expression/json/JsonFunctionsTest.java # docs/user/ppl/functions/json.rst # integ-test/src/test/java/org/opensearch/sql/ppl/JsonFunctionsIT.java # integ-test/src/test/resources/json_test.json # ppl/src/main/antlr/OpenSearchPPLLexer.g4 # ppl/src/main/antlr/OpenSearchPPLParser.g4
Signed-off-by: Kenrick Yap <[email protected]>
Signed-off-by: Kenrick Yap <[email protected]>
docs/user/ppl/functions/json.rst
Outdated
Usage: `json_extract(doc, path[, path])` Extracts a JSON value from a json document based on the path specified. | ||
|
||
Argument type: STRING, STRING | ||
|
||
Return type: STRING/BOOLEAN/DOUBLE/INTEGER/NULL/STRUCT/ARRAY | ||
|
||
- Up to 3 paths can be provided, and results of each `path` with be returned in an ARRAY. | ||
- Returns an ARRAY if `path` points to multiple results (e.g. $.a[*]) or if the `path` points to an array. | ||
- Return null if `path` is not valid, or if JSON `doc` is MISSING or NULL. | ||
- Throws SemanticCheckException if `doc` or `path` is malformed. | ||
- Throws ExpressionEvaluationException if `path` is missing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this documentation might need some updates for multiple paths?
- Usage:
json_extract(doc, path[, path[, path[[)
(or another way to indicate that up to 3 paths can be provided) - Usage: "based on the path(s) specified"
- Argument type: "STRING, STRING[, STRING[, STRING]] (or another way to indicate that up to 3 paths can be provided)
- What about the case where only one of the paths is invalid? Should this read "Returns null if any
path
is not valid"? Similarly, should the last two bullets be updated to specify that they apply to "anypath
"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed
@Test | ||
public void test_json_extract() throws IOException { | ||
JSONObject result; | ||
result = | ||
executeQuery( | ||
String.format( | ||
"source=%s | where json_valid(json_string) | eval" | ||
+ " extracted=json_extract(json_string, '$.b') | fields test_name, extracted", | ||
TEST_INDEX_JSON_TEST)); | ||
verifySchema( | ||
result, schema("test_name", null, "string"), schema("extracted", null, "undefined")); | ||
verifyDataRows( | ||
result, | ||
rows("json nested object", new JSONObject(Map.of("c", "3"))), | ||
rows("json object", "2"), | ||
rows("json array", null), | ||
rows("json nested array", null), | ||
rows("json scalar string", null), | ||
rows("json scalar int", null), | ||
rows("json scalar float", null), | ||
rows("json scalar double", null), | ||
rows("json scalar boolean true", null), | ||
rows("json scalar boolean false", null), | ||
rows("json empty string", null), | ||
rows("json nested list", new JSONArray(List.of(Map.of("c", "2"), Map.of("c", "3"))))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed during yesterday's review session, we certainly don't need to test all edge cases in IT, but I think we probably want to have more than one test. At least, we should probably have one with two path strings and one with three path strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and maybe test the 4 path case where we get a parsing error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want to test a couple of other error scenarios mentioned: such as null returns and Semantic errors for malformed json.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added test IT test cases for use of multiple paths and for the cases when semantic errors are thrown. Though I am curious why we need to test for errors in this case as I cannot find any other IT test testing for errors.
In addition by the time the IT test receive errors they are received as ResponseErrors so had to check if error contains reference to semantic error.
List<ExprValue> resultList = new ArrayList<>(); | ||
|
||
for (ExprValue path : paths) { | ||
System.out.println("Processing path: " + path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like an errant debug statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved
* @return ExprValue of value at given path of json string. | ||
*/ | ||
public static ExprValue extractJson(ExprValue json, ExprValue... paths) { | ||
List<ExprValue> resultList = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since know that the array list is always going to be the same since as paths
, can we just initialize it with a specified initialCapacity
to save some memory space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved
} | ||
|
||
private static ExprValue extractJsonPath(String json, String path) { | ||
if (json.isEmpty() || json.equals("null")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a separate case for either of these? Seems like this would just throw a PathNotFoundException
and returns LITERAL_NULL
below if not (at least for "null", I'm not sure about the empty string). Probably makes sense to reduce branching here if possible. 🤷
Moreover, do we actually want to return LITERAL_NULL
for the empty string? Is that an invalid JSON string? If so, does it make sense to raise the corresponding exception instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both "null" and "" are considered valid json strings by json_valid, since we expect that users will use json_valid to first filter for valid json then use json extract to avoid errors, we need these specific cases to be consistent with json_valid
assertEquals(expected, actual); | ||
} | ||
|
||
private static void execute_extract_json(ExprValue expected, String json, String path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be generalized to allow multiple paths, so you can use it for json_extract_search_list_of_paths
as well?
private static void execute_extract_json(ExprValue expected, String json, String path) { | |
private static void execute_extract_json(ExprValue expected, String json, String... paths) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved
} | ||
|
||
@Test | ||
void json_extract_throws_ExpressionEvaluationException() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could split execute_extract_json
into two methods: execute_extract_json
(which would just actually run jsonExtract
) and test_extract_json
(or assert_extract_json
), which would call execute_extract_json
and then actually do the comparison.
That way, test methods like this could call execute_extract_json
, and no have to worry about calling DSL.literal
🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
} | ||
|
||
@Test | ||
void json_extract_search_arrays() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have tests for the following:
- Multiple paths where one or more paths match an array
- Multiple paths where one or more paths match more than one value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed to ec6ff5e
.
Co-authored-by: Taylor Curran <[email protected]> Signed-off-by: kenrickyap <[email protected]>
|
||
Argument type: STRING, STRING | ||
|
||
Return type: STRING/BOOLEAN/DOUBLE/INTEGER/NULL/STRUCT/ARRAY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use JSON_STRING instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we always return an ARRAY?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the case that only one string is provided, we return the associated type which will not be a json string. e.g. if path "$.a" is provided from json_string "{"a": 1}: then a integer is returned
docs/user/ppl/functions/json.rst
Outdated
|
||
Return type: STRING/BOOLEAN/DOUBLE/INTEGER/NULL/STRUCT/ARRAY | ||
|
||
- Up to 3 paths can be provided, and results of each `path` with be returned in an ARRAY. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
phrasing is a little confusing.
- Up to 3 paths can be provided, and results of each `path` with be returned in an ARRAY. | |
- Up to 3 paths can be provided, and results of all possible `path`s will be returned in an ARRAY. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved
|
||
- Up to 3 paths can be provided, and results of each `path` with be returned in an ARRAY. | ||
- Returns an ARRAY if `path` points to multiple results (e.g. $.a[*]) or if the `path` points to an array. | ||
- Return null if `path` is not valid, or if JSON `doc` is MISSING or NULL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null or empty array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the case of multiple paths an array of nulls would be returned
docs/user/ppl/functions/json.rst
Outdated
- Returns an ARRAY if `path` points to multiple results (e.g. $.a[*]) or if the `path` points to an array. | ||
- Return null if `path` is not valid, or if JSON `doc` is MISSING or NULL. | ||
- Throws SemanticCheckException if `doc` or `path` is malformed. | ||
- Throws ExpressionEvaluationException if `path` is missing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not handled in code in this PR, this is the default error thrown if provided function does not matched defined schema for functions. Thrown by DefaultFunctionResolver.
@Test | ||
public void test_json_extract() throws IOException { | ||
JSONObject result; | ||
result = | ||
executeQuery( | ||
String.format( | ||
"source=%s | where json_valid(json_string) | eval" | ||
+ " extracted=json_extract(json_string, '$.b') | fields test_name, extracted", | ||
TEST_INDEX_JSON_TEST)); | ||
verifySchema( | ||
result, schema("test_name", null, "string"), schema("extracted", null, "undefined")); | ||
verifyDataRows( | ||
result, | ||
rows("json nested object", new JSONObject(Map.of("c", "3"))), | ||
rows("json object", "2"), | ||
rows("json array", null), | ||
rows("json nested array", null), | ||
rows("json scalar string", null), | ||
rows("json scalar int", null), | ||
rows("json scalar float", null), | ||
rows("json scalar double", null), | ||
rows("json scalar boolean true", null), | ||
rows("json scalar boolean false", null), | ||
rows("json empty string", null), | ||
rows("json nested list", new JSONArray(List.of(Map.of("c", "2"), Map.of("c", "3"))))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and maybe test the 4 path case where we get a parsing error
@Test | ||
public void test_json_extract() throws IOException { | ||
JSONObject result; | ||
result = | ||
executeQuery( | ||
String.format( | ||
"source=%s | where json_valid(json_string) | eval" | ||
+ " extracted=json_extract(json_string, '$.b') | fields test_name, extracted", | ||
TEST_INDEX_JSON_TEST)); | ||
verifySchema( | ||
result, schema("test_name", null, "string"), schema("extracted", null, "undefined")); | ||
verifyDataRows( | ||
result, | ||
rows("json nested object", new JSONObject(Map.of("c", "3"))), | ||
rows("json object", "2"), | ||
rows("json array", null), | ||
rows("json nested array", null), | ||
rows("json scalar string", null), | ||
rows("json scalar int", null), | ||
rows("json scalar float", null), | ||
rows("json scalar double", null), | ||
rows("json scalar boolean true", null), | ||
rows("json scalar boolean false", null), | ||
rows("json empty string", null), | ||
rows("json nested list", new JSONArray(List.of(Map.of("c", "2"), Map.of("c", "3"))))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want to test a couple of other error scenarios mentioned: such as null returns and Semantic errors for malformed json.
Signed-off-by: Kenrick Yap <[email protected]>
| json empty string | | null | | ||
+--------------------+--------------------------------------+-------------------------+ | ||
|
||
os> source=json_test | where test_name="json nested list" | eval json_extract=json_extract(json_string, '$.b[1].c') | fields test_name, json_string, json_extract |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be kind of you if you post examples and description to the PR
Description | ||
>>>>>>>>>>> | ||
|
||
Usage: `json_extract(doc, path[, path[, path])` Extracts a JSON value from a json document based on the path(s) specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it accepts 1-3 paths. Why not array of paths then?
|
||
Example:: | ||
|
||
os> source=json_test | where json_valid(json_string) | eval json_extract=json_extract(json_string, '$.b') | fields test_name, json_string, json_extract |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it accept "legacy" path (path which starts with ".")? if yes, this should be tested and documented. If no, this should be documented.
Description
Implements json_extract function that would be useful to extract specific objects and scalars from an existing json_object.
Related Issues
Resolves #3211
Follow up issue for #3027
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.