PPL: Add json_extract function #3262

Open
wants to merge 86 commits into
base: main
Changes from 84 commits
Commits
70152eb
added implementation
14yapkc1 Jan 3, 2025
76c3995
added doctest, integ-tests, and unit tests
14yapkc1 Jan 6, 2025
ce2c551
addressed PR comments
kenrickyap Jan 6, 2025
ad1bde3
fixed unit tests
kenrickyap Jan 7, 2025
ccf47a2
addressed pr comments
kenrickyap Jan 7, 2025
acc76a0
addressed PR comments
kenrickyap Jan 7, 2025
519c6f2
removed unused dependencies
kenrickyap Jan 7, 2025
2e319fe
linting
kenrickyap Jan 7, 2025
ee0820d
addressed pr comment and rolling back disabled test case
kenrickyap Jan 8, 2025
d44fc5a
Merge branch 'main' into feature/json-valid
kenrickyap Jan 8, 2025
3407d4a
removed disabled import
kenrickyap Jan 9, 2025
7ef6cc9
Update docs/user/ppl/functions/json.rst
kenrickyap Jan 9, 2025
e5e90ac
Update integ-test/src/test/java/org/opensearch/sql/ppl/JsonFunctionIT…
kenrickyap Jan 9, 2025
2187a5a
nit
kenrickyap Jan 9, 2025
5e1e488
Merge branch 'feature/json-valid' of https://github.com/Bit-Quill/ope…
kenrickyap Jan 9, 2025
3512b33
fixed integ test
kenrickyap Jan 9, 2025
9fea606
change text type to keyword
kenrickyap Jan 9, 2025
fbc54bc
addressed PR comments
kenrickyap Jan 10, 2025
31ad2a4
fix doc-test
kenrickyap Jan 11, 2025
2b2a8f3
added null test
kenrickyap Jan 14, 2025
dc96563
Merge branch 'main' into feature/json-valid
acarbonetto Jan 15, 2025
1913bfe
SQL: adding error case unit tests for json_valid
acarbonetto Jan 15, 2025
67d979d
json_valid: null and missing should return false
acarbonetto Jan 15, 2025
aa6b723
PPL: Add json and cast to json functions
acarbonetto Jan 8, 2025
4c99235
PPL: Update json cast for review
acarbonetto Jan 8, 2025
9ccde7f
Fix testes
acarbonetto Jan 9, 2025
4306bf3
spotless
acarbonetto Jan 9, 2025
613137b
Fix tests
acarbonetto Jan 14, 2025
ab28872
SPOTLESS
acarbonetto Jan 14, 2025
3ec16e0
Clean up for merge
acarbonetto Jan 15, 2025
6dbf37b
added implementation
14yapkc1 Jan 3, 2025
b8c6d68
added doctest, integ-tests, and unit tests
14yapkc1 Jan 6, 2025
afb668c
addressed pr comments
kenrickyap Jan 7, 2025
54ef183
addressed PR comments
kenrickyap Jan 7, 2025
d841394
removed unused dependencies
kenrickyap Jan 7, 2025
25fb527
linting
kenrickyap Jan 7, 2025
4a20d08
addressed pr comment and rolling back disabled test case
kenrickyap Jan 8, 2025
fdc4729
removed disabled import
kenrickyap Jan 9, 2025
707a0b9
nit
kenrickyap Jan 9, 2025
4f28211
Update integ-test/src/test/java/org/opensearch/sql/ppl/JsonFunctionIT…
kenrickyap Jan 9, 2025
9ec6335
fixed integ test
kenrickyap Jan 9, 2025
3324e66
SQL: adding error case unit tests for json_valid
acarbonetto Jan 15, 2025
7123c35
json_valid: null and missing should return false
acarbonetto Jan 15, 2025
dbca991
PPL: Add json and cast to json functions
acarbonetto Jan 8, 2025
7df87cb
PPL: Update json cast for review
acarbonetto Jan 8, 2025
cd45fcc
Fix testes
acarbonetto Jan 9, 2025
6f5dc07
spotless
acarbonetto Jan 9, 2025
0aae36e
Fix tests
acarbonetto Jan 14, 2025
b225f28
SPOTLESS
acarbonetto Jan 14, 2025
78af4f8
Clean up for merge
acarbonetto Jan 15, 2025
b84282a
clean up unit tests
acarbonetto Jan 15, 2025
1e23286
Add casting from undefined
acarbonetto Jan 15, 2025
343f5a2
Add cast to scalar from undefined expression
acarbonetto Jan 16, 2025
e8b6df3
Add test for missing/null
acarbonetto Jan 16, 2025
ab9be75
Clean up merge conflicts
acarbonetto Jan 17, 2025
788be9d
Fix jacoco coverage
acarbonetto Jan 17, 2025
a9721bf
Move to Switch by json type
acarbonetto Jan 17, 2025
daa95ff
Merge branch 'main' into feature/acarbo_json_cast_ppl
acarbonetto Jan 20, 2025
018e462
functionality implemented
kenrickyap Jan 20, 2025
c6c6cc1
Remove conflicted files
acarbonetto Jan 21, 2025
a5652ea
Add doctext row
acarbonetto Jan 21, 2025
2cd10a2
added integ-test and doc test
kenrickyap Jan 22, 2025
cd78ddd
fixed integ tests
kenrickyap Jan 22, 2025
afb385f
unit tests
kenrickyap Jan 23, 2025
0e91b2e
Merge branch 'main' into feature/json-extract
kenrickyap Jan 23, 2025
794db8a
finnished unit tests
kenrickyap Jan 23, 2025
0f0b8d4
update doctest
kenrickyap Jan 23, 2025
f030057
addessed comments
kenrickyap Jan 27, 2025
2b08007
added addition edge cases for unit tests
kenrickyap Jan 28, 2025
be52786
Merge branch 'feature/acarbo_json_cast_ppl' into feature/json-extract
kenrickyap Jan 28, 2025
6bd2f40
Merge branch 'feature/acarbo_json_cast_ppl' into feature/json-extract
kenrickyap Jan 28, 2025
0b9e9e4
Merge branch 'feature/json-extract' of https://github.com/Bit-Quill/o…
kenrickyap Jan 28, 2025
6678be4
addressed PR comments
kenrickyap Jan 29, 2025
e57fa21
fix code coverage
14yapkc1 Jan 30, 2025
112be65
Update core/src/test/java/org/opensearch/sql/expression/json/JsonFunc…
kenrickyap Jan 30, 2025
306ac97
address comments
kenrickyap Jan 30, 2025
75e9cc3
fix build error
kenrickyap Jan 30, 2025
77827bb
Merge branch 'main' into feature/json-extract
kenrickyap Jan 31, 2025
80f44e2
add header
kenrickyap Jan 31, 2025
0d1cc28
addressing PR comments
kenrickyap Feb 12, 2025
b6ae5ba
added multi path use case
kenrickyap Feb 12, 2025
adde88d
Merge branch 'main' into feature/json-extract
kenrickyap Feb 12, 2025
aa8b81e
linting
kenrickyap Feb 12, 2025
ec6ff5e
fixing doc tests
kenrickyap Feb 12, 2025
95e996b
Update core/src/main/java/org/opensearch/sql/utils/JsonUtils.java
kenrickyap Feb 13, 2025
591fb71
addressed PR comments
kenrickyap Feb 14, 2025
1 change: 1 addition & 0 deletions core/build.gradle
@@ -54,6 +54,7 @@ dependencies {
api "com.fasterxml.jackson.core:jackson-core:${versions.jackson}"
api "com.fasterxml.jackson.core:jackson-databind:${versions.jackson_databind}"
api "com.fasterxml.jackson.core:jackson-annotations:${versions.jackson}"
api group: 'com.jayway.jsonpath', name: 'json-path', version: '2.9.0'
api group: 'com.google.code.gson', name: 'gson', version: '2.8.9'
api group: 'com.tdunning', name: 't-digest', version: '3.3'
api project(':common')
4 changes: 4 additions & 0 deletions core/src/main/java/org/opensearch/sql/expression/DSL.java
@@ -687,6 +687,10 @@ public static FunctionExpression jsonValid(Expression... expressions) {
return compile(FunctionProperties.None, BuiltinFunctionName.JSON_VALID, expressions);
}

public static FunctionExpression jsonExtract(Expression... expressions) {
return compile(FunctionProperties.None, BuiltinFunctionName.JSON_EXTRACT, expressions);
}

public static FunctionExpression stringToJson(Expression value) {
return compile(FunctionProperties.None, BuiltinFunctionName.JSON, value);
}
@@ -207,6 +207,7 @@ public enum BuiltinFunctionName {
/** Json Functions. */
JSON_VALID(FunctionName.of("json_valid")),
JSON(FunctionName.of("json")),
JSON_EXTRACT(FunctionName.of("json_extract")),

/** GEOSPATIAL Functions. */
GEOIP(FunctionName.of("geoip")),
@@ -23,6 +23,7 @@ public class JsonFunctions {
public void register(BuiltinFunctionRepository repository) {
repository.register(jsonValid());
repository.register(jsonFunction());
repository.register(jsonExtract());
}

private DefaultFunctionResolver jsonValid() {
@@ -35,4 +36,12 @@ private DefaultFunctionResolver jsonFunction() {
BuiltinFunctionName.JSON.getName(),
impl(nullMissingHandling(JsonUtils::castJson), UNDEFINED, STRING));
}

private DefaultFunctionResolver jsonExtract() {
return define(
BuiltinFunctionName.JSON_EXTRACT.getName(),
impl(JsonUtils::extractJson, UNDEFINED, STRING, STRING),
impl(JsonUtils::extractJson, UNDEFINED, STRING, STRING, STRING),
impl(JsonUtils::extractJson, UNDEFINED, STRING, STRING, STRING, STRING));
}
}
58 changes: 58 additions & 0 deletions core/src/main/java/org/opensearch/sql/utils/JsonUtils.java
@@ -6,12 +6,18 @@
package org.opensearch.sql.utils;

import static org.opensearch.sql.data.model.ExprValueUtils.LITERAL_FALSE;
import static org.opensearch.sql.data.model.ExprValueUtils.LITERAL_MISSING;
import static org.opensearch.sql.data.model.ExprValueUtils.LITERAL_NULL;
import static org.opensearch.sql.data.model.ExprValueUtils.LITERAL_TRUE;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.jayway.jsonpath.InvalidJsonException;
import com.jayway.jsonpath.InvalidPathException;
import com.jayway.jsonpath.JsonPath;
import com.jayway.jsonpath.PathNotFoundException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
@@ -79,6 +85,58 @@ public static ExprValue castJson(ExprValue json) {
return processJsonNode(jsonNode);
}

/**
* Extract value of JSON string at given JSON path.
*
* @param json JSON string (e.g. "{\"hello\": \"world\"}").
* @param paths list of JSON path (e.g. "$.hello")
* @return ExprValue of value at given path of json string.
*/
public static ExprValue extractJson(ExprValue json, ExprValue... paths) {
List<ExprValue> resultList = new ArrayList<>();
Contributor

Since we know that the array list is always going to be the same size as paths, can we just initialize it with a specified initialCapacity to save some memory space?

Contributor Author

resolved
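
The pre-sizing suggested in this thread can be sketched as follows. This is a minimal, self-contained illustration rather than the PR's code: `PreSizedListSketch`, `collect`, and `describe` are hypothetical names, and `describe` only stands in for the per-path extraction step.

```java
import java.util.ArrayList;
import java.util.List;

class PreSizedListSketch {
  // Hypothetical stand-in for the per-path extraction step.
  static String describe(String path) {
    return "value at " + path;
  }

  static List<String> collect(String... paths) {
    // Exactly one result is added per path, so the final size is known up front.
    List<String> results = new ArrayList<>(paths.length);
    for (String path : paths) {
      results.add(describe(path));
    }
    return results;
  }

  public static void main(String[] args) {
    System.out.println(collect("$.a", "$.b"));
  }
}
```

Passing `paths.length` sizes the backing array to the number of results rather than the default capacity. With the one to three paths that the registered signatures accept, the saving is small.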


for (ExprValue path : paths) {
System.out.println("Processing path: " + path);
Contributor

Looks like an errant debug statement?

Contributor Author

resolved

if (json == LITERAL_NULL || json == LITERAL_MISSING) {
return json;
}

String jsonString = json.stringValue();
String jsonPath = path.stringValue();

resultList.add(extractJsonPath(jsonString, jsonPath));
}

if (resultList.size() == 1) {
return resultList.getFirst();
} else {
return new ExprCollectionValue(resultList);
}
}

private static ExprValue extractJsonPath(String json, String path) {
if (json.isEmpty() || json.equals("null")) {
Contributor

Do we need a separate case for either of these? Seems like this would just throw a PathNotFoundException and return LITERAL_NULL below if not (at least for "null"; I'm not sure about the empty string). Probably makes sense to reduce branching here if possible. 🤷

Moreover, do we actually want to return LITERAL_NULL for the empty string? Is that an invalid JSON string? If so, does it make sense to raise the corresponding exception instead?

Contributor Author

Both "null" and "" are considered valid JSON strings by json_valid. Since we expect users to filter for valid JSON with json_valid first and then call json_extract to avoid errors, we need these specific cases to stay consistent with json_valid.
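
The consistency argument in this reply amounts to short-circuiting before the parser runs. A minimal sketch, assuming a hypothetical `parser` callback in place of the real JsonPath call; `NullInputGuardSketch` and `extractOrNull` are illustrative names, not part of the PR:

```java
import java.util.function.Function;

class NullInputGuardSketch {
  /**
   * Returns null for the two inputs singled out in the thread ("" and "null")
   * without calling the parser; any other input is handed to the parser unchanged.
   */
  static Object extractOrNull(String json, Function<String, Object> parser) {
    if (json.isEmpty() || json.equals("null")) {
      return null; // stands in for LITERAL_NULL
    }
    return parser.apply(json);
  }

  public static void main(String[] args) {
    // Hypothetical parser that rejects every input, to show that the guard runs first.
    Function<String, Object> strict =
        s -> {
          throw new IllegalArgumentException("not parseable: " + s);
        };
    System.out.println(extractOrNull("", strict));
    System.out.println(extractOrNull("null", strict));
  }
}
```

Because the guard returns before `parser` is invoked, these two inputs can never surface a parse error, whatever the underlying library does with them.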

return LITERAL_NULL;
}

try {
Object results = JsonPath.parse(json).read(path);
return ExprValueUtils.fromObjectValue(results);
} catch (PathNotFoundException ignored) {
return LITERAL_NULL;
} catch (InvalidPathException invalidPathException) {
final String errorFormat = "JSON path '%s' is not valid. Error details: %s";
throw new SemanticCheckException(
String.format(errorFormat, path, invalidPathException.getMessage()),
invalidPathException);
} catch (InvalidJsonException invalidJsonException) {
final String errorFormat = "JSON string '%s' is not valid. Error details: %s";
throw new SemanticCheckException(
String.format(errorFormat, json, invalidJsonException.getMessage()),
invalidJsonException);
}
}

private static ExprValue processJsonNode(JsonNode jsonNode) {
switch (jsonNode.getNodeType()) {
case ARRAY:
@@ -22,6 +22,7 @@
import org.opensearch.sql.data.model.ExprBooleanValue;
import org.opensearch.sql.data.model.ExprCollectionValue;
import org.opensearch.sql.data.model.ExprDoubleValue;
import org.opensearch.sql.data.model.ExprFloatValue;
import org.opensearch.sql.data.model.ExprIntegerValue;
import org.opensearch.sql.data.model.ExprLongValue;
import org.opensearch.sql.data.model.ExprNullValue;
@@ -32,6 +33,7 @@
import org.opensearch.sql.exception.ExpressionEvaluationException;
import org.opensearch.sql.exception.SemanticCheckException;
import org.opensearch.sql.expression.DSL;
import org.opensearch.sql.expression.Expression;
import org.opensearch.sql.expression.FunctionExpression;
import org.opensearch.sql.expression.LiteralExpression;

@@ -216,5 +218,153 @@ void json_returnsSemanticCheckException() {
SemanticCheckException.class,
() -> DSL.castJson(expr).valueOf(),
"Expected to throw SemanticCheckException when calling castJson with " + expr));

// invalid type
assertThrows(
SemanticCheckException.class, () -> DSL.castJson(DSL.literal("invalid")).valueOf());

// missing bracket
assertThrows(SemanticCheckException.class, () -> DSL.castJson(DSL.literal("{{[}}")).valueOf());

// missing quote
assertThrows(
SemanticCheckException.class, () -> DSL.castJson(DSL.literal("\"missing quote")).valueOf());
}

@Test
void json_extract_search() {
ExprValue expected = new ExprIntegerValue(1);
execute_extract_json(expected, "{\"a\":1}", "$.a");
}

@Test
void json_extract_search_arrays() {
Contributor

Do we have tests for the following:

  • Multiple paths where one or more paths match an array
  • Multiple paths where one or more paths match more than one value

Contributor Author

added
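
The cases requested in this thread hinge on the result shape: in the diff, one path returns the value itself, while several paths return a collection with one entry per path. A minimal sketch of that shape, assuming a pre-parsed `Map` and a hypothetical `lookup` that only understands top-level `$.key` paths (it is not JsonPath, and `MultiPathShapeSketch` is an illustrative name):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MultiPathShapeSketch {
  // Stand-in lookup: handles only top-level "$.key" paths against a pre-parsed map.
  static Object lookup(Map<String, Object> doc, String path) {
    if (!path.startsWith("$.")) {
      throw new IllegalArgumentException("unsupported path: " + path);
    }
    return doc.get(path.substring(2)); // null when the key is absent
  }

  // One path yields the bare value; otherwise a list with one entry per path, in order.
  static Object extract(Map<String, Object> doc, String... paths) {
    List<Object> results = new ArrayList<>(paths.length);
    for (String path : paths) {
      results.add(lookup(doc, path));
    }
    // get(0) rather than getFirst(), which List only gained in Java 21.
    return results.size() == 1 ? results.get(0) : results;
  }

  public static void main(String[] args) {
    Map<String, Object> doc = Map.of("foo", "foo", "arr", List.of(1, 2, 3));
    System.out.println(extract(doc, "$.arr"));
    System.out.println(extract(doc, "$.foo", "$.arr", "$.potato"));
  }
}
```

A single path that matches an array and a multi-path call both come back as collections, so a test has to inspect the nesting to tell them apart. That is the distinction the two requested cases would pin down.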

String jsonArray = "{\"a\":[1,2.3,\"abc\",true,null,{\"c\":{\"d\":1}},[1,2,3]]}";
List<ExprValue> expectedExprValues =
List.of(
new ExprIntegerValue(1),
new ExprFloatValue(2.3),
new ExprStringValue("abc"),
LITERAL_TRUE,
LITERAL_NULL,
ExprTupleValue.fromExprValueMap(
Map.of("c", ExprTupleValue.fromExprValueMap(Map.of("d", new ExprIntegerValue(1))))),
new ExprCollectionValue(
List.of(
new ExprIntegerValue(1), new ExprIntegerValue(2), new ExprIntegerValue(3))));

// extract specific index from JSON list
for (int i = 0; i < expectedExprValues.size(); i++) {
String path = String.format("$.a[%d]", i);
execute_extract_json(expectedExprValues.get(i), jsonArray, path);
}

// extract nested object
ExprValue nestedExpected =
ExprTupleValue.fromExprValueMap(Map.of("d", new ExprIntegerValue(1)));
execute_extract_json(nestedExpected, jsonArray, "$.a[5].c");

// extract * from JSON list
ExprValue starExpected = new ExprCollectionValue(expectedExprValues);
execute_extract_json(starExpected, jsonArray, "$.a[*]");
}

@Test
void json_extract_returns_null() {
List<String> jsonStrings =
List.of(
"{\"a\":\"1\",\"b\":\"2\"}",
"{\"a\":1,\"b\":{\"c\":2,\"d\":3}}",
"{\"arr1\": [1,2,3], \"arr2\": [4,5,6]}",
"[1, 2, 3, 4]",
"[{\"a\":1,\"b\":2}, {\"c\":3,\"d\":2}]",
"\"abc\"",
"1234",
"12.34",
"true",
"false",
"");

jsonStrings.forEach(str -> execute_extract_json(LITERAL_NULL, str, "$.a.path_not_found_key"));

// null string literal
assertEquals(LITERAL_NULL, DSL.jsonExtract(DSL.literal("null"), DSL.literal("$.a")).valueOf());

// null json
assertEquals(
LITERAL_NULL, DSL.jsonExtract(DSL.literal(LITERAL_NULL), DSL.literal("$.a")).valueOf());

// missing json
assertEquals(
LITERAL_MISSING,
DSL.jsonExtract(DSL.literal(LITERAL_MISSING), DSL.literal("$.a")).valueOf());

// array out of bounds
execute_extract_json(LITERAL_NULL, "{\"a\":[1,2,3]}", "$.a[4]");
}

@Test
void json_extract_throws_SemanticCheckException() {
// invalid path
SemanticCheckException invalidPathError =
assertThrows(
SemanticCheckException.class,
() -> DSL.jsonExtract(DSL.literal("{\"a\":1}"), DSL.literal("$a")).valueOf());
assertEquals(
"JSON path '$a' is not valid. Error details: Illegal character at position 1 expected"
+ " '.' or '['",
invalidPathError.getMessage());

// invalid json
SemanticCheckException invalidJsonError =
assertThrows(
SemanticCheckException.class,
() ->
DSL.jsonExtract(
DSL.literal("{\"invalid\":\"json\", \"string\"}"), DSL.literal("$.a"))
.valueOf());
assertTrue(
invalidJsonError
.getMessage()
.startsWith(
"JSON string '{\"invalid\":\"json\", \"string\"}' is not valid. Error"
+ " details:"));
}

@Test
void json_extract_throws_ExpressionEvaluationException() {
Contributor

currantw Feb 13, 2025

You could split execute_extract_json into two methods: execute_extract_json (which would just run jsonExtract) and test_extract_json (or assert_extract_json), which would call execute_extract_json and then do the comparison.

That way, test methods like this could call execute_extract_json, and not have to worry about calling DSL.literal 🤷

Contributor Author

resolved

Contributor

Thanks!

// null path
assertThrows(
ExpressionEvaluationException.class,
() -> DSL.jsonExtract(DSL.literal("{\"a\":1}"), DSL.literal(LITERAL_NULL)).valueOf());

// missing path
assertThrows(
ExpressionEvaluationException.class,
() -> DSL.jsonExtract(DSL.literal("{\"a\":1}"), DSL.literal(LITERAL_MISSING)).valueOf());
}

@Test
void json_extract_search_list_of_paths() {
final String objectJson =
"{\"foo\": \"foo\", \"fuzz\": true, \"bar\": 1234, \"bar2\": 12.34, \"baz\": null, "
+ "\"obj\": {\"internal\": \"value\"}, \"arr\": [\"string\", true, null]}";

ExprValue expected =
new ExprCollectionValue(
List.of(new ExprStringValue("foo"), new ExprFloatValue(12.34), LITERAL_NULL));
Expression pathExpr1 = DSL.literal(ExprValueUtils.stringValue("$.foo"));
Expression pathExpr2 = DSL.literal(ExprValueUtils.stringValue("$.bar2"));
Expression pathExpr3 = DSL.literal(ExprValueUtils.stringValue("$.potato"));
Expression jsonExpr = DSL.literal(ExprValueUtils.stringValue(objectJson));
ExprValue actual = DSL.jsonExtract(jsonExpr, pathExpr1, pathExpr2, pathExpr3).valueOf();
assertEquals(expected, actual);
}

private static void execute_extract_json(ExprValue expected, String json, String path) {
Contributor

Can this be generalized to allow multiple paths, so you can use it for json_extract_search_list_of_paths as well?

Suggested change
- private static void execute_extract_json(ExprValue expected, String json, String path) {
+ private static void execute_extract_json(ExprValue expected, String json, String... paths) {

Contributor Author

resolved
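
The varargs generalization suggested here, combined with a split between running and asserting, could look roughly like the sketch below. It is self-contained and does not use the project's classes: `ExtractTestHelperSketch`, `execute`, and `assertExtract` are hypothetical names, and `execute` returns a placeholder string per path where the real helper would wrap each argument in `DSL.literal` and call `DSL.jsonExtract(...).valueOf()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class ExtractTestHelperSketch {
  // Runs the (stand-in) extraction only; callers that expect an exception can use this directly.
  static List<String> execute(String json, String... paths) {
    return Arrays.stream(paths)
        .map(path -> json + " @ " + path) // placeholder for the real per-path result
        .collect(Collectors.toList());
  }

  // Runs the extraction and compares; accepts one or many paths through varargs.
  static void assertExtract(List<String> expected, String json, String... paths) {
    List<String> actual = execute(json, paths);
    if (!Objects.equals(expected, actual)) {
      throw new AssertionError("expected " + expected + " but got " + actual);
    }
  }

  public static void main(String[] args) {
    assertExtract(List.of("{} @ $.a"), "{}", "$.a");
    assertExtract(List.of("{} @ $.a", "{} @ $.b"), "{}", "$.a", "$.b");
    System.out.println("both helper calls passed");
  }
}
```

With this shape, a single-path test and a list-of-paths test share one helper, and exception tests can call `execute` inside `assertThrows` without building the literal expressions by hand.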

Expression pathExpr = DSL.literal(ExprValueUtils.stringValue(path));
Expression jsonExpr = DSL.literal(ExprValueUtils.stringValue(json));
ExprValue actual = DSL.jsonExtract(jsonExpr, pathExpr).valueOf();
assertEquals(expected, actual);
}
}