Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#3016 New expand PPL command #3305

Draft
wants to merge 81 commits into
base: main
Choose a base branch
from

Conversation

currantw
Copy link
Contributor

@currantw currantw commented Feb 6, 2025

Signed-off-by: currantw [email protected]

Description

Adds a new expand PPL command to OpenSearch SQL that create multiple rows corresponding to the elements on an multi-valued field. If the field contains a single value (not an array, including null or missing), the row is not modified.

This does not support expanding object fields. This is handled by the flatten command - see #3030.

Related Issues

Resolves #3030

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Comment on lines +71 to 85
public boolean equal(ExprValue other) {

if (!(other instanceof ExprTupleValue)) {
return false;
} else {
ExprTupleValue other = (ExprTupleValue) o;
Iterator<Entry<String, ExprValue>> thisIterator = this.valueMap.entrySet().iterator();
Iterator<Entry<String, ExprValue>> otherIterator = other.valueMap.entrySet().iterator();
while (thisIterator.hasNext() && otherIterator.hasNext()) {
Entry<String, ExprValue> thisEntry = thisIterator.next();
Entry<String, ExprValue> otherEntry = otherIterator.next();
if (!(thisEntry.getKey().equals(otherEntry.getKey())
&& thisEntry.getValue().equals(otherEntry.getValue()))) {
return false;
}
}
return !(thisIterator.hasNext() || otherIterator.hasNext());
}

/**
* {@link Map#equals} returns true if the two maps' entry sets are equal, and works properly
* across all implementation of the {@link Map} interface. See {@link
* https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#equals-java.lang.Object-} for
* more details.
*/
return valueMap.equals(other.tupleValue());
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change pulled from flatten PR. See discussion there.
#3267

@@ -103,7 +103,7 @@ Description

Usage: nullif(field1, field2) return null if two parameters are same, otherwise return field1.

Argument type: all the supported data type, (NOTE : if two parameters has different type, if two parameters has different type, you will fail semantic check)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a typo that I came across. Unrelated.

result,
rows("Seattle Seahawks", 2014),
rows("Seattle Kraken", null),
rows("Vancouver Canucks", null),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies to any Canucks fans... 😢

import org.opensearch.sql.exception.SemanticCheckException;

@UtilityClass
public class PathUtils {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this new utilities module for dealing with paths. Includes unit tests. Once flatten has been merged (see here, I will also update that implementation to use these methods.

Comment on lines 14 to 41
The ``expand`` command expands a field that contains an array of values to produce a seperate row for each value in the
array. If the field does not contain an array, the row is not modified.

Syntax
============

``expand <field>``

* ``field``: reference to the field to flatten.

Example 1: Expand a field
=========================

PPL query::

os> source=expand | expand team | fields city, team.name
fetched rows / total rows = 7/7
+--------------+-------------------+
| city | team.name |
|--------------+-------------------|
| Seattle | Seattle Seahawks |
| Seattle | Seattle Kraken |
| Vancouver | Vancouver Canucks |
| Vancouver | BC Lions |
| San Antonio | San Antonio Spurs |
| Null Team | null |
| Missing Team | null |
+--------------+-------------------+
Copy link
Contributor Author

@currantw currantw Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a difference from both Splunk and OpenSearch Spark.

This support multi-values and single values. If expand is applied to a field that contains both scalars (i.e. single values, including null and missing) and multi-values (i.e. arrays), then the arrays are expanded into separate rows, while the scalar rows remain unchanged. An empty array appears to be converted to null "under-the-hood" by OpenSearch Spark (see OpenSearchExprValueFactoryTest.constructNullArrayValue), so an empty array is handled the same as a null value.

Comment on lines +118 to +122
/** With {@link org.opensearch.sql.data.model.ExprMissingValue} */
inputRow = ExprValueUtils.tupleValue(Map.of());
mockInput(inputRow);

actualRows = execute(expand(inputPlan, DSL.ref("scalar_missing", ARRAY)));
expectedRows = List.of(inputRow);

assertEquals(expectedRows, actualRows);

/** Without {@link org.opensearch.sql.data.model.ExprMissingValue} */
inputRow = ExprValueUtils.tupleValue(Map.of("scalar_missing", missingExprValue));
mockInput(inputRow);

actualRows = execute(expand(inputPlan, DSL.ref("scalar_missing", ARRAY)));
expectedRows = List.of(inputRow);

assertEquals(expectedRows, actualRows);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When debugging, I found that "missing" values seem to simply by missing from the ExprValue mapping when they get read into OpenSearch SQL, rather than being explicitly mapped to ExprMissingValue. I test both cases here.

@currantw currantw marked this pull request as draft February 6, 2025 01:54
@currantw currantw force-pushed the #3016_expand_command branch 2 times, most recently from 8f6838b to 7e0a31b Compare February 12, 2025 23:50
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
Signed-off-by: currantw <[email protected]>
@currantw currantw force-pushed the #3016_expand_command branch from 7e0a31b to 028e074 Compare February 13, 2025 03:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE]Add flatten Command to PPL
1 participant