Issue 1: source = $sourceTable | LOOKUP $lookupTbl id is expected to replace the id column, but the result includes two id columns. IMO, it should behave the same as source = $sourceTable | LOOKUP $lookupTbl id REPLACE id, department.
Issue 2: executing source = $sourceTable | LOOKUP $lookupTbl id REPLACE id, department throws an exception:
[AMBIGUOUS_REFERENCE] Reference `id` is ambiguous, could be: [`__auto_generated_subquery_name_l`.`id`, `__auto_generated_subquery_name_s`.`id`].
org.apache.spark.sql.AnalysisException: [AMBIGUOUS_REFERENCE] Reference `id` is ambiguous, could be: [`__auto_generated_subquery_name_l`.`id`, `__auto_generated_subquery_name_s`.`id`].
How can one reproduce the bug?
Add the following code to FlintSparkPPLLookupITSuite:
protected def sourceTable(testTable: String): Unit = {
  sql(s"""
       | CREATE TABLE $testTable
       | (
       |   id INT,
       |   name STRING,
       |   occupation STRING,
       |   country STRING,
       |   salary INT
       | )
       | USING $tableType $tableOptions
       |""".stripMargin)
  // Insert data into the new table
  sql(s"""
       | INSERT INTO $testTable
       | VALUES (1000, 'Jake', 'Engineer', 'England' , 100000),
       |        (1001, 'Hello', 'Artist', 'USA', 70000),
       |        (1002, 'John', 'Doctor', 'Canada', 120000),
       |        (1003, 'David', 'Doctor', null, 120000),
       |        (1004, 'David', null, 'Canada', 0),
       |        (1005, 'Jane', 'Scientist', 'Canada', 90000)
       | """.stripMargin)
}

protected def lookupTbl(testTable: String): Unit = {
  sql(s"""
       | CREATE TABLE $testTable
       | (
       |   id INT,
       |   department STRING
       | )
       | USING $tableType $tableOptions
       |""".stripMargin)
  // Insert data into the new table
  sql(s"""
       | INSERT INTO $testTable
       | VALUES (1000, 'IT'),
       |        (1002, 'DATA'),
       |        (1003, 'HR'),
       |        (1005, 'DATA'),
       |        (1006, 'SALES')
       | """.stripMargin)
}

test("test LOOKUP lookupTable") {
  // Issue 1: the result contains two id columns
  var frame = sql(s"source = $sourceTable | LOOKUP $lookupTbl id")
  frame.show()
  // Issue 2: throws AMBIGUOUS_REFERENCE
  frame = sql(s"source = $sourceTable | LOOKUP $lookupTbl id REPLACE id, department")
  frame.show()
}
I think you are right. The current documented behaviour of inputField is: "You can specify multiple <inputField> with comma-delimited. If you don't specify any <inputField>, all fields of <lookupIndex> where matched values are applied to result output."
The correct behaviour should be: "If you don't specify any <inputField>, all fields of <lookupIndex> that are not the match fields where matched values are applied to result output."
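The proposed default can be illustrated with a minimal sketch that models rows as plain Scala maps instead of Spark DataFrames. The names LookupSketch and lookup are hypothetical and are not part of the PPL implementation; the sketch only shows that excluding the match fields from the default input fields yields a single id column.

```scala
// Hypothetical model of the proposed LOOKUP default, using plain Scala collections (no Spark).
object LookupSketch {
  type Row = Map[String, Any]

  /** Left-joins `source` with `lookupRows` on `matchFields` and copies lookup fields into each source row.
    * Assumption: when `inputFields` is empty, the default is every lookup field that is NOT a match field,
    * so the match column is never duplicated in the output.
    */
  def lookup(
      source: Seq[Row],
      lookupRows: Seq[Row],
      matchFields: Seq[String],
      inputFields: Seq[String] = Nil): Seq[Row] = {
    val lookupColumns = lookupRows.headOption.map(_.keys.toSeq).getOrElse(Nil)
    val applied =
      if (inputFields.nonEmpty) inputFields
      else lookupColumns.filterNot(matchFields.contains)

    source.map { row =>
      val key = matchFields.map(row.get)
      val matched = lookupRows.find(l => matchFields.map(l.get) == key)
      // REPLACE semantics: overwrite (or add) each applied field; unmatched rows get null.
      applied.foldLeft(row) { (acc, field) =>
        acc.updated(field, matched.flatMap(_.get(field)).getOrElse(null))
      }
    }
  }

  // A subset of the rows from the reproduction above.
  val source: Seq[Row] = Seq(
    Map("id" -> 1000, "name" -> "Jake"),
    Map("id" -> 1001, "name" -> "Hello"))
  val lookupTbl: Seq[Row] = Seq(
    Map("id" -> 1000, "department" -> "IT"),
    Map("id" -> 1006, "department" -> "SALES"))

  // No inputFields given: only `department` is applied, `id` appears once.
  val result: Seq[Row] = lookup(source, lookupTbl, Seq("id"))
}
```

With this default, LOOKUP $lookupTbl id and LOOKUP $lookupTbl id REPLACE department produce the same columns in the sketch, which is the behaviour the issue asks for.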
What is the bug?
source = $sourceTable | LOOKUP $lookupTbl id should replace the id column, but the result includes two id columns. IMO, it should behave the same as source = $sourceTable | LOOKUP $lookupTbl id REPLACE id, department.
In addition, source = $sourceTable | LOOKUP $lookupTbl id REPLACE id, department fails with an AMBIGUOUS_REFERENCE exception.
What is the expected behavior?
source = $sourceTable | LOOKUP $lookupTbl id should return a single id column, the same as when REPLACE id, department is specified.