
Ppl spark join command #69

Merged
merged 60 commits into opensearch-project:main on Dec 14, 2023

Conversation

@YANG-DB (Member) commented Oct 11, 2023

Description

New Correlation Query Command

Here is the new command that enables this type of investigation:

```
source alb_logs, traces | where alb_logs.ip="10.0.0.1" AND alb_logs.cloud.provider="aws" |
correlate exact fields(traceId, ip) scope(@timestamp, 1D) mapping(alb_logs.ip = traces.attributes.http.server.address, alb_logs.traceId = traces.traceId)
```

Let's break this down a bit:

1. source alb_logs, traces selects all the data sources that will be correlated to one another

2. where ip="10.0.0.1" AND cloud.provider="aws" is a predicate clause that constrains the scope of the search corpus

3. correlate exact fields(traceId, ip) expresses the correlation operation over the following list of fields:

- ip has an explicit filter condition, so this will be propagated into the correlation condition for all the data sources
- traceId has no explicit filter, so the correlation will only match identical traceId values across all the data sources

The field names indicate the logical meaning of each field within the correlation command; the actual join condition uses the mapping statement described below.

The term exact means the correlation statement requires all the fields to match in order to fulfill the query; see the join sketch below.

An alternative is approximate, which attempts a best-effort match and does not reject rows that only partially match.
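To make the exact semantics concrete, here is a minimal Spark (Scala) sketch of the join this command is logically equivalent to. It assumes the table names, field paths, and schemas from the example above, and is an illustration rather than the PR's actual translation code:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("correlate-exact-sketch").getOrCreate()

// The `where` predicate constrains alb_logs before the correlation.
val albLogs = spark.table("alb_logs")
  .where("ip = '10.0.0.1' AND cloud.provider = 'aws'")
val traces = spark.table("traces")

// `exact` requires every correlated field to match, hence an inner
// equi-join over all the mapped field pairs.
val correlated = albLogs.join(
  traces,
  albLogs("ip") === traces("attributes.http.server.address") &&
    albLogs("traceId") === traces("traceId"),
  "inner")
```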

Addressing different field mappings

In cases where the same logical field (such as ip) may be mapped differently across several data sources, an explicit mapping of the field path is expected.

The following syntax extends the correlation conditions to allow matching different field names with similar logical meaning:
alb_logs.ip = traces.attributes.http.server.address, alb_logs.traceId = traces.traceId

For each field that participates in the correlation join, there should be a corresponding mapping statement that includes all the tables to be joined by this correlation command.

Example:
In our case there are 2 sources: alb_logs, traces
There are 2 fields: traceId, ip
There are 2 mapping statements: alb_logs.ip = traces.attributes.http.server.address, alb_logs.traceId = traces.traceId
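As an illustration of how such mapping statements can drive the join, here is a small hypothetical Scala helper (not part of this PR) that folds the mapping pairs into a single join condition, one equality per pair combined with AND:

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical helper: each mapping statement becomes one equality,
// and all equalities are AND-ed into the final join condition.
def mappingCondition(left: DataFrame, right: DataFrame,
                     pairs: Seq[(String, String)]): Column =
  pairs
    .map { case (leftField, rightField) => left(leftField) === right(rightField) }
    .reduce(_ && _)

// The example's mapping clause corresponds to:
//   mappingCondition(albLogs, traces,
//     Seq("ip"      -> "attributes.http.server.address",
//         "traceId" -> "traceId"))
```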

Scoping the correlation timeframes

To simplify the work that has to be done by the execution engine (driver), the scope statement was added to explicitly direct the join query to the time range it should cover for this search.

scope(@timestamp, 1D) means that, in this example, the search should be scoped to a daily granularity, so correlations appearing within the same day are grouped together. This scoping mechanism simplifies the query, gives better control over the results, and enables incremental search resolution based on the user's needs; a sketch follows.
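Under the same assumptions as the sketches above, the scope clause can be thought of as one more join condition that buckets both sides to the day and requires the buckets to match (the actual generated plan may differ):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.date_trunc

val spark = SparkSession.builder().appName("correlate-scope-sketch").getOrCreate()
val albLogs = spark.table("alb_logs")
val traces = spark.table("traces")

// Truncate both timestamps to the day: rows correlate only when the
// field mappings match AND both events fall on the same day.
val scoped = albLogs.join(
  traces,
  albLogs("traceId") === traces("traceId") &&
    albLogs("ip") === traces("attributes.http.server.address") &&
    date_trunc("day", albLogs("@timestamp")) === date_trunc("day", traces("@timestamp")),
  "inner")
```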

Issues Resolved

#68

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

YANG-DB added 30 commits August 23, 2023 14:06
add ppl statement logical plan elements
add ppl parser components
add ppl expressions components

Signed-off-by: YANGDB <[email protected]>
 -  source = $testTable
 -  source = $testTable | fields name, age
 -  source = $testTable age=25 | fields name, age

Signed-off-by: YANGDB <[email protected]>
add AggregateFunction translation & tests
remove unused DSL builder

Signed-off-by: YANGDB <[email protected]>
# Conflicts:
#	build.sbt
#	flint-spark-integration/src/test/scala/org/opensearch/flint/spark/skipping/FlintSparkSkippingIndexSuite.scala
add actual ppl based table content fetch and verification

Signed-off-by: YANGDB <[email protected]>
Signed-off-by: YANGDB <[email protected]>
add README.md details for supported commands and planned future support

Signed-off-by: YANGDB <[email protected]>
add missing license header
update supported command in readme

Signed-off-by: YANGDB <[email protected]>
add join ast builder

Signed-off-by: YANGDB <[email protected]>
# Conflicts:
#	build.sbt
#	integ-test/src/test/scala/org/opensearch/flint/spark/LogicalPlanTestUtils.scala
#	ppl-spark-integration/README.md
#	ppl-spark-integration/src/main/antlr4/OpenSearchPPLLexer.g4
#	ppl-spark-integration/src/main/antlr4/OpenSearchPPLParser.g4
#	ppl-spark-integration/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
#	ppl-spark-integration/src/main/java/org/opensearch/sql/ast/tree/Project.java
#	ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/CatalystPlanContext.java
#	ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/CatalystQueryPlanVisitor.java
#	ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
#	ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java
#	ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/utils/AggregatorTranslator.java
#	ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/utils/ComparatorTransformer.java
#	ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/utils/DataTypeTransformer.java
#	ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/utils/SortUtils.java
#	ppl-spark-integration/src/main/scala/org/opensearch/flint/spark/ppl/FlintSparkPPLParser.scala
#	ppl-spark-integration/src/test/scala/org/opensearch/flint/spark/ppl/LogicalPlanTestUtils.scala
#	ppl-spark-integration/src/test/scala/org/opensearch/flint/spark/ppl/PPLLogicalAdvancedTranslatorTestSuite.scala
#	ppl-spark-integration/src/test/scala/org/opensearch/flint/spark/ppl/PPLLogicalPlanAggregationQueriesTranslatorTestSuite.scala
#	ppl-spark-integration/src/test/scala/org/opensearch/flint/spark/ppl/PPLLogicalPlanBasicQueriesTranslatorTestSuite.scala
#	ppl-spark-integration/src/test/scala/org/opensearch/flint/spark/ppl/PPLLogicalPlanFiltersTranslatorTestSuite.scala
add test parts

Signed-off-by: YANGDB <[email protected]>
 - add plan branches context traversal
 - add resolving of un-resolved attributes (columns)
 - add join spec transformer util API
 - add documentation about the correlation design considerations

Signed-off-by: YANGDB <[email protected]>
@YANG-DB YANG-DB marked this pull request as ready for review October 14, 2023 07:12
@YANG-DB YANG-DB merged commit a0ac1fb into opensearch-project:main Dec 14, 2023