Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: PPL command - WMA Trendline #3293

Open
wants to merge 37 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
927fbfa
WMA
andy-k-improving Jan 31, 2025
3836f31
Update switch
andy-k-improving Feb 1, 2025
0fcd688
Unit-test
andy-k-improving Feb 3, 2025
898d3a6
Integ-test
andy-k-improving Feb 3, 2025
ab58370
Doc test
andy-k-improving Feb 3, 2025
51d8395
Spotless
andy-k-improving Feb 3, 2025
e4c25b3
Update test cases
andy-k-improving Feb 3, 2025
f6c82ae
Update test coverage
andy-k-improving Feb 3, 2025
d0cf898
Update docs/user/ppl/cmd/trendline.rst
andy-k-improving Feb 6, 2025
ce94ca9
Update docs/user/ppl/cmd/trendline.rst
andy-k-improving Feb 6, 2025
f32e5ad
Remove debug
andy-k-improving Feb 6, 2025
4d95529
Address code comments
andy-k-improving Feb 6, 2025
df3c666
Add support to all numeric types
andy-k-improving Feb 8, 2025
3ed8fa8
Update generic
andy-k-improving Feb 8, 2025
7fc4075
Replace evaluator with functionalInterface
andy-k-improving Feb 10, 2025
3b7902f
Apply suggestions from code review
andy-k-improving Feb 10, 2025
47f9cec
Update wma doc
andy-k-improving Feb 10, 2025
b8cf496
Fix style
andy-k-improving Feb 10, 2025
4a56b57
Update docs/user/ppl/cmd/trendline.rst
andy-k-improving Feb 11, 2025
0b50637
Fix code commentse
andy-k-improving Feb 11, 2025
b8e3983
update default name
andy-k-improving Feb 11, 2025
8dc96c9
Doc
andy-k-improving Feb 12, 2025
c197ecd
Doc
andy-k-improving Feb 12, 2025
5ea47ae
Refactor test-cases
andy-k-improving Feb 12, 2025
8d6ac31
DataPoints test-cases
andy-k-improving Feb 12, 2025
9ce1d99
Update test-cases
andy-k-improving Feb 12, 2025
bdb0f28
Update test-cases
andy-k-improving Feb 12, 2025
fa685bd
Update test-cases
andy-k-improving Feb 12, 2025
b87db9f
Address code comments
andy-k-improving Feb 13, 2025
39bf113
Update test-cases
andy-k-improving Feb 13, 2025
ba9bfb6
Update test-cases
andy-k-improving Feb 13, 2025
9910b22
Update IT magic number
andy-k-improving Feb 14, 2025
4be0400
Update IT test
andy-k-improving Feb 14, 2025
8348950
Update test coverage
andy-k-improving Feb 14, 2025
6037742
Update unit test
andy-k-improving Feb 14, 2025
e590fb9
Code style
andy-k-improving Feb 14, 2025
eccbf1d
Update doc test
andy-k-improving Feb 14, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,7 @@ public <R, C> R accept(AbstractNodeVisitor<R, C> nodeVisitor, C context) {
}

public enum TrendlineType {
SMA
SMA,
WMA
}
}

Large diffs are not rendered by default.

Large diffs are not rendered by default.

85 changes: 75 additions & 10 deletions docs/user/ppl/cmd/trendline.rst
Original file line number Diff line number Diff line change
Expand Up @@ -15,25 +15,24 @@ Description

Syntax
============
`TRENDLINE [sort <[+|-] sort-field>] SMA(number-of-datapoints, field) [AS alias] [SMA(number-of-datapoints, field) [AS alias]]...`
`TRENDLINE [sort <[+|-] sort-field>] <trendline-type>(number-of-datapoints, field) [AS alias] [<trendline-type>(number-of-datapoints, field) [AS alias]]...`

* [+|-]: optional. The plus [+] stands for ascending order and NULL/MISSING first and a minus [-] stands for descending order and NULL/MISSING last. **Default:** ascending order and NULL/MISSING first.
* sort-field: mandatory when sorting is used. The field used to sort.
* trendline-type: mandatory. The type of algorithm being used for the calculation, only SMA (simple moving average) or WMA (weighted moving average) are supported at the moment.
* number-of-datapoints: mandatory. The number of datapoints to calculate the moving average (must be greater than zero).
* field: mandatory. The name of the field the moving average should be calculated for.
* alias: optional. The name of the resulting column containing the moving average (defaults to the field name with "_trendline").
* alias: optional. The name of the resulting column containing the moving average (defaults to the field name with "_<trendline-type>_trendline").

At the moment only the Simple Moving Average (SMA) type is supported.

It is calculated like
In the case of Simple Moving Average - SMA, result will be calculated as per the below formula.

f[i]: The value of field 'f' in the i-th data-point
n: The number of data-points in the moving window (period)
t: The current time index

SMA(t) = (1/n) * Σ(f[i]), where i = t-n+1 to t

Example 1: Calculate the moving average on one field.
Example 1: Calculate the simple moving average on one field.
=====================================================

The example shows how to calculate the moving average on one field.
Expand All @@ -52,7 +51,7 @@ PPL query::
+------+


Example 2: Calculate the moving average on multiple fields.
Example 2: Calculate the simple moving average on multiple fields.
===========================================================

The example shows how to calculate the moving average on multiple fields.
Expand All @@ -70,14 +69,14 @@ PPL query::
| 15.5 | 30.5 |
+------+-----------+

Example 4: Calculate the moving average on one field without specifying an alias.
Example 3: Calculate the simple moving average on one field without specifying an alias.
=================================================================================

The example shows how to calculate the moving average on one field.
The example shows how to calculate the moving average on one field without specifying an alias.

PPL query::

os> source=accounts | trendline sma(2, account_number) | fields account_number_trendline;
os> source=accounts | trendline sma(2, account_number) | fields account_number_sma_trendline;
fetched rows / total rows = 4/4
+--------------------------+
| account_number_trendline |
Expand All @@ -88,3 +87,69 @@ PPL query::
| 15.5 |
+--------------------------+



In the case of Weighted Moving Average - WMA, result will be calculated as per the below formula.

f[i]: The value of field 'f' in the i-th data point
n: The number of data points in the moving window (period)
t: The current time index
w[i]: The weight of the i-th data point, increasing by one per step to prioritize recent points.

WMA(t) = ( Σ from i=t−n+1 to t of (w[i] * f[i]) ) / ( Σ from i=t−n+1 to t of w[i] )

Example 1: Calculate the weighted moving average on one field.
=====================================================

The example shows how to calculate the weighted moving average on one field.

PPL query::

os> source=accounts | trendline wma(2, account_number) as an | fields an;
fetched rows / total rows = 4/4
+--------------------+
| an |
|--------------------|
| null |
| 4.333333333333333 |
| 10.666666666666666 |
| 16.333333333333332 |
+--------------------+

Example 2: Calculate the weighted moving average on multiple fields.
===========================================================

The example shows how to calculate the weighted moving average on multiple fields.

PPL query::

os> source=accounts | trendline wma(2, account_number) as an sma(2, age) as age_trend | fields an, age_trend ;
fetched rows / total rows = 4/4
+--------------------+-----------+
| an | age_trend |
|--------------------+-----------|
| null | null |
| 4.333333333333333 | 34.0 |
| 10.666666666666666 | 32.0 |
| 16.333333333333332 | 30.5 |
+--------------------+-----------+


Example 3: Calculate the weighted moving average on one field without specifying an alias.
=================================================================================

The example shows how to calculate the weighted moving average on one field without specifying an alias.

PPL query::

os> source=accounts | trendline wma(2, account_number) | fields account_number_wma_trendline;
fetched rows / total rows = 4/4
+--------------------------+
| account_number_trendline |
|--------------------------|
| null |
| 4.333333333333333 |
| 10.666666666666666 |
| 16.333333333333332 |
+--------------------------+

Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,9 @@

import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_BANK;
import static org.opensearch.sql.util.MatcherUtils.rows;
import static org.opensearch.sql.util.MatcherUtils.schema;
import static org.opensearch.sql.util.MatcherUtils.verifyDataRows;
import static org.opensearch.sql.util.MatcherUtils.verifySchema;

import java.io.IOException;
import org.json.JSONObject;
Expand Down Expand Up @@ -55,13 +57,14 @@ public void testTrendlineOverwritesExistingField() throws IOException {
}

@Test
public void testTrendlineNoAlias() throws IOException {
public void testTrendlineNoAliasDefaultName() throws IOException {
final JSONObject result =
executeQuery(
String.format(
"source=%s | where balance > 39000 | sort balance | trendline sma(2, balance) |"
+ " fields balance_trendline",
+ " fields balance_sma_trendline",
TEST_INDEX_BANK));
verifySchema(result, schema("balance_sma_trendline", "double"));
verifyDataRows(result, rows(new Object[] {null}), rows(44313.0), rows(39882.5));
}

Expand All @@ -71,8 +74,95 @@ public void testTrendlineWithSort() throws IOException {
executeQuery(
String.format(
"source=%s | where balance > 39000 | trendline sort balance sma(2, balance) |"
+ " fields balance_trendline",
+ " fields balance_sma_trendline",
TEST_INDEX_BANK));
verifyDataRows(result, rows(new Object[] {null}), rows(44313.0), rows(39882.5));
}

@Test
public void testTrendlineWma() throws IOException {
final JSONObject result =
executeQuery(
String.format(
"source=%s | sort balance | head 4 | trendline wma(4, balance) as"
+ " balance_trend | fields balance_trend",
TEST_INDEX_BANK));
verifyDataRows(
result,
rows(new Object[] {null}),
rows(new Object[] {null}),
rows(new Object[] {null}),
rows(19615.8));
}

@Test
public void testTrendlineMultipleFieldsWma() throws IOException {
final JSONObject result =
executeQuery(
String.format(
"source=%s | sort balance | head 5 | trendline wma(4, balance) as"
+ " balance_trend wma(5, account_number) as account_number_trend | fields"
+ " balance_trend, account_number_trend",
TEST_INDEX_BANK));
verifyDataRows(
result,
rows(null, null),
rows(null, null),
rows(null, null),
rows(19615.8, null),
rows(29393.6, 9.8));
}

@Test
public void testTrendlineOverwritesExistingFieldWma() throws IOException {
final JSONObject result =
executeQuery(
String.format(
"source=%s | sort balance | head 6 | trendline wma(4, balance) as"
+ " age | fields age",
TEST_INDEX_BANK));
verifyDataRows(
result,
rows(new Object[] {null}),
rows(new Object[] {null}),
rows(new Object[] {null}),
rows(19615.8),
rows(29393.6),
rows(36192.9));
}

@Test
public void testTrendlineNoAliasWmaDefaultName() throws IOException {
final JSONObject result =
executeQuery(
String.format(
"source=%s | sort balance | head 5 | trendline wma(4, balance) |"
+ " fields balance_wma_trendline",
TEST_INDEX_BANK));
verifySchema(result, schema("balance_wma_trendline", "double"));
verifyDataRows(
result,
rows(new Object[] {null}),
rows(new Object[] {null}),
rows(new Object[] {null}),
rows(19615.8),
rows(29393.6));
}

@Test
public void testTrendlineWithSortWma() throws IOException {
final JSONObject result =
executeQuery(
String.format(
"source=%s | sort balance | head 5 | trendline sort balance wma(4, balance) |"
+ " fields balance_wma_trendline",
TEST_INDEX_BANK));
verifyDataRows(
result,
rows(new Object[] {null}),
rows(new Object[] {null}),
rows(new Object[] {null}),
rows(19615.8),
rows(29393.6));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add test with both wma and sma trendline default name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with test-cases:
testTrendlineNoAliasDefaultName and testTrendlineNoAliasWmaDefaultName

}
1 change: 1 addition & 0 deletions ppl/src/main/antlr/OpenSearchPPLLexer.g4
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ NUM: 'NUM';

// TRENDLINE KEYWORDS
SMA: 'SMA';
WMA: 'WMA';

// ARGUMENT KEYWORDS
KEEPEMPTY: 'KEEPEMPTY';
Expand Down
1 change: 1 addition & 0 deletions ppl/src/main/antlr/OpenSearchPPLParser.g4
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,7 @@ trendlineClause

trendlineType
: SMA
| WMA
;

kmeansCommand
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -89,13 +89,15 @@ public Trendline.TrendlineComputation visitTrendlineClause(
}

final Field dataField = (Field) this.visitFieldExpression(ctx.field);
final Trendline.TrendlineType computationType =
Trendline.TrendlineType.valueOf(ctx.trendlineType().getText().toUpperCase(Locale.ROOT));
final String alias =
ctx.alias != null
? ctx.alias.getText()
: dataField.getChild().get(0).toString() + "_trendline";

final Trendline.TrendlineType computationType =
Trendline.TrendlineType.valueOf(ctx.trendlineType().getText().toUpperCase(Locale.ROOT));
: dataField.getChild().getFirst().toString()
+ "_"
+ computationType.name().toLowerCase()
+ "_trendline";
return new Trendline.TrendlineComputation(
numberOfDataPoints, dataField, alias, computationType);
}
Expand Down
Loading