PPL: Add `json_set` and `json_extend` commands to spark-ppl #1038
Conversation
Could you add some ITs that test integration across multiple JSON functions?
```diff
@@ -223,10 +223,6 @@ public enum BuiltinFunctionName {
     JSON_EXTRACT(FunctionName.of("json_extract")),
     JSON_KEYS(FunctionName.of("json_keys")),
     JSON_VALID(FunctionName.of("json_valid")),
-    // JSON_DELETE(FunctionName.of("json_delete")),
```
Hmm, how did `json_delete` work if we didn't include it here?
It's not a built-in function, but a user-defined function.
"Built-in" here means that this function is built-in in PPL-on-Spark, not that it is built-in in Spark. Therefore, as long as the implementation is provided, those that do not require a user to provide their own implementation are considered "built-in" (we currently do not provide a public interface for user-defined functions, so the functions we add are all built-in functions). It seems due to this code
Lines 203 to 208 in c6d8793
```java
if (BuiltinFunctionName.of(function.getFuncName()).isEmpty()) {
    ScalaUDF udf = SerializableUdf.visit(function.getFuncName(), args);
    if (udf == null) {
        throw new UnsupportedOperationException(function.getFuncName() + " is not a builtin function of PPL");
    }
    return udf;
```
A name missing from `BuiltinFunctionName` falls through to the UDF lookup there. (Not a blocker for this PR.)
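For illustration, here is a self-contained sketch (all names are hypothetical stand-ins, not the project's classes) of the dispatch behavior the quoted code implies: a name absent from the built-in enum falls through to the serializable-UDF table, which is why `json_delete` works without an enum entry.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class FunctionResolutionSketch {
    // Stand-in for BuiltinFunctionName.of(...): only one "built-in" name here.
    static Optional<String> builtin(String name) {
        return name.equals("json_extract") ? Optional.of(name) : Optional.empty();
    }

    // Stand-in for SerializableUdf.visit(...): json_delete is known only here.
    static final Function<String, String> JSON_DELETE_UDF = json -> "<json with key deleted>";
    static final Map<String, Function<String, String>> UDFS = Map.of("json_delete", JSON_DELETE_UDF);

    static Object resolve(String funcName) {
        if (builtin(funcName).isEmpty()) {
            Function<String, String> udf = UDFS.get(funcName);
            if (udf == null) {
                throw new UnsupportedOperationException(
                    funcName + " is not a builtin function of PPL");
            }
            return udf; // json_delete resolves here, without an enum entry
        }
        return funcName; // resolved as a built-in
    }

    public static void main(String[] args) {
        System.out.println(resolve("json_extract")); // built-in path
        System.out.println(resolve("json_delete"));  // UDF fallback path
        // resolve("json_bogus") would throw UnsupportedOperationException
    }
}
```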
Please add ITs for this PR and ensure the examples you add in the document are all correct.
docs/ppl-lang/functions/ppl-json.md
```diff
@@ -278,12 +301,57 @@ Example:
     |{"teacher":["Alice","Tom","Walt"],"student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]} |
     +-----------------------------------------------------------------------------------------------------------------------------------+

-    os> source=people | eval append = json_append(`{"school":{"teacher":["Alice"],"student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]}}`,array('school.teacher', 'Tom', 'Walt')) | head 1 | fields append
+    os> source=people | eval append = json_append(`{"school":{"teacher":["Alice"],"student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]}}`,array('school.teacher', array('Tom', 'Walt'))) | head 1 | fields append
```
Have you checked this PPL? Does it work? As far as I know, Spark's `array` function won't accept elements of different types.
Please take a look at the latest example/IT test. We can use a JSON-encoded string of arrays to do this, but any array we pass directly does have to abide by the typing rules of Spark's array object.
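A minimal Jackson sketch (standalone, with hypothetical names; not the PR's implementation) of that idea: the values arrive as a single JSON-encoded array string, so they can be heterogeneous on the JSON side even though a Spark `array()` literal must be homogeneous.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class JsonAppendSketch {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        ObjectNode root = (ObjectNode) mapper.readTree(
            "{\"school\":{\"teacher\":[\"Alice\"]}}");

        // Values encoded as one JSON array string: a single Spark-side
        // string argument, arbitrary JSON contents inside.
        JsonNode values = mapper.readTree("[\"Tom\",\"Walt\"]");

        ArrayNode teachers = (ArrayNode) root.at("/school/teacher");
        values.forEach(teachers::add); // append each decoded element

        System.out.println(mapper.writeValueAsString(root));
        // {"school":{"teacher":["Alice","Tom","Walt"]}}
    }
}
```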
docs/ppl-lang/functions/ppl-json.md
Example:

```
os> source=people | eval extend = json_extend(`{"teacher":["Alice"],"student":[{"name":"Bob","rank":1},{"name":"Charlie","rank":2}]}`, 'student', '{"name":"Tommy","rank":5}') | head 1 | fields extend
```
The same problem as found in `json_set`: these two functions take only two parameters as defined in the code, but three are given here.
@qianheng-aws the second parameter is actually a list of key-value pairs if I'm not mistaken - @acarbonetto is this correct?
Correct.
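As a sketch of that shape (hypothetical helper, assuming the flat `array(key1, value1, key2, value2, ...)` layout described in this thread), the second argument can be consumed two elements at a time:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PairArgsSketch {
    // Interpret a flat list as (path, value) pairs, as the second
    // parameter of json_set/json_extend is described above.
    static Map<String, String> toPairs(List<String> flat) {
        if (flat.size() % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < flat.size(); i += 2) {
            pairs.put(flat.get(i), flat.get(i + 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(toPairs(List.of(
            "student", "{\"name\":\"Tommy\",\"rank\":5}")));
        // {student={"name":"Tommy","rank":5}}
    }
}
```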
ppl-spark-integration/src/main/java/org/opensearch/sql/expression/function/JsonUtils.java
```java
 * @param depth - current traversal depth
 * @param valueToUpdate - value to update
 */
static void updateNestedValue(Object currentObj, String[] pathParts, int depth, Object valueToUpdate) {
```
Can we reuse the code from appendNestedValue? These two functions seem to have mostly duplicated code, with differences only in their final operations -- set or append.
I was thinking the same thing. I will try and consolidate today - maybe with another functional argument.
Same thought, I reckon we will end up having something similar in terms of class structure:

```java
JsonUtils {
    updateNestedValue(...) {
        // Invoke the shared traverse method with a lambda for the handling (update).
    }

    appendNestedValue(...) {
        // Invoke the shared traverse method with a lambda for the handling (append).
    }

    private static traverse(FunctionalInterface handler) {
    }
}
```
Please take a look at the latest iteration @qianheng-aws.
I've added traversal and update functions, as well as passing in a lambda that does the actual implementation.
Let me know if you find any more candidates for optimization.
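For reference, a self-contained sketch (all names hypothetical, not the PR's actual code) of the lambda-based consolidation discussed in this thread: one shared traversal walks the path, and a `BiConsumer` supplies the set-versus-append behavior at the leaf.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class TraverseSketch {
    // Shared traversal: walk pathParts through nested maps, then hand the
    // parent map and leaf key to the supplied handler.
    @SuppressWarnings("unchecked")
    static void traverse(Map<String, Object> root, String[] pathParts,
                         BiConsumer<Map<String, Object>, String> handler) {
        Map<String, Object> current = root;
        for (int depth = 0; depth < pathParts.length - 1; depth++) {
            Object next = current.computeIfAbsent(pathParts[depth], k -> new HashMap<String, Object>());
            if (!(next instanceof Map)) return; // path blocked by a scalar
            current = (Map<String, Object>) next;
        }
        handler.accept(current, pathParts[pathParts.length - 1]);
    }

    // json_set-style leaf operation: overwrite the value.
    static void updateNestedValue(Map<String, Object> root, String[] path, Object value) {
        traverse(root, path, (parent, key) -> parent.put(key, value));
    }

    // json_append-style leaf operation: append to (or create) a list.
    @SuppressWarnings("unchecked")
    static void appendNestedValue(Map<String, Object> root, String[] path, Object value) {
        traverse(root, path, (parent, key) -> {
            Object existing = parent.computeIfAbsent(key, k -> new ArrayList<Object>());
            if (existing instanceof List) ((List<Object>) existing).add(value);
        });
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        updateNestedValue(doc, new String[]{"a", "b"}, 1);
        appendNestedValue(doc, new String[]{"a", "c"}, "x");
        System.out.println(doc); // {a={b=1, c=[x]}}
    }
}
```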
@acarbonetto can you please check why CI failed?
Sure. IT failed, but it doesn't look like it's from a failed test.
The CI build failure is due to a Scala style format issue. Please run `sbt scalafmtAll` before submitting your code.
```diff
@@ -245,9 +270,12 @@ Example:

 **Description**

-`json_append(json_string, [path_key, list of values to add ])` appends values to end of an array within the json elements. Return the updated json object after appending .
+`json_append(json_string, array(key1, value1, key2, value2, ...))` appends values to end of an array at key within the json elements. Returns the updated json object after appending.
```
Update json_append syntax to align with SP2 syntax, which takes pairs of key/values. Tests also updated.
LGTM, Thanks for the contribution!
test("test json_extend() function: add single value key not found") { | ||
val frame = sql(s""" | ||
| source = $testTable | ||
| | eval result = json_extend('$validJson7',array('headmaster', 'Tom')) | head 1 | fields result |
Why does this 'Tom' work?
I see three different formats in this IT:
1. '\"Foobar\"'
2. '"Tom"'
3. 'Tom'

This is a bit confusing.
1 is completely equal to 2 in scala.StringContext, and 2 (the string `"Tom"`, quotes included) will be transformed to 3 (`Tom`) after parsing by Jackson.
> 1 is completely equal to 2 in scala.StringContext

I see.

> and 2 (`"Tom"`) will be transformed to 3 (`Tom`) after parsing by Jackson

Do you mean they are all equal?
Yeah, at least they are equal in our code implementation.
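A minimal standalone sketch (not the test suite's code) of the equivalence being described, using Java string escapes in place of scala.StringContext: forms 1 and 2 denote the same string once escapes are processed, and Jackson strips the JSON quotes to yield form 3.

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class QuoteFormsSketch {
    public static void main(String[] args) throws Exception {
        // Forms 1 and 2 are the same string after escape processing:
        // five characters, "Tom" including the surrounding double quotes.
        String escaped = "\"Tom\"";          // form 1: '\"Tom\"'
        String quoted  = '"' + "Tom" + '"';  // form 2: '"Tom"'
        System.out.println(escaped.equals(quoted)); // true

        // Jackson then strips the JSON quotes, yielding form 3: Tom.
        ObjectMapper mapper = new ObjectMapper();
        System.out.println(mapper.readTree(escaped).asText()); // Tom
    }
}
```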
Description

Adds `json_set` and `json_extend` functions to the spark PPL UDF.

Related Issues

Resolves #996
Check List

- Commits are signed per the DCO using `--signoff`
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.