Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Wrong results by ORCA when NULL TEST on LOJ (#15358)
Issue: Orca tries to eliminate self comparisons at preprocessing, but this early optimization misleading the further expression preprocesing of LOJ. This PR tries to avoid self comparison check's of WHERE clause predicate when SELECT's logical child is LOJ. NOTE: Postgres Executor’s standard, restriction placed in the ON clause is processed before the join, while a restriction placed in the WHERE clause is processed after the join. That does not matter with inner joins, but it matters a lot with outer joins. Setup: CREATE TABLE t2(c0 int, c1 int not null); INSERT INTO t2 values(1, 2),(3,4),(5,6),(7,8); CREATE TABLE t3(c0 int not null, c1 int, c2 int); SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL; c1 ---- (0 rows) explain SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL; QUERY PLAN --------------------------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1324032.07 rows=1 width=4) -> Nested Loop (cost=0.00..1324032.07 rows=1 width=4) Join Filter: true -> Seq Scan on t2 (cost=0.00..431.00 rows=1 width=4) Filter: (true IS NULL) -> Materialize (cost=0.00..431.00 rows=1 width=1) -> Broadcast Motion 3:3 (slice2; segments: 3) (cost=0.00..431.00 rows=1 width=1) -> Seq Scan on t3 (cost=0.00..431.00 rows=1 width=1) Filter: c1 > c2 Optimizer: Pivotal Optimizer (GPORCA) (10 rows set optimizer=off; SET SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL; c1 ---- 4 8 2 6 (4 rows) explain SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL; QUERY PLAN --------------------------------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) (cost=10000000000.00..10044448648.78 rows=1117865000 width=4) -> Nested Loop Left Join (cost=10000000000.00..10029543782.11 rows=372621667 width=4) Filter: ((t3.c0 = t3.c0) IS NULL) -> Seq Scan on t2 (cost=0.00..321.00 rows=28700 width=4) -> Materialize (cost=0.00..834.64 rows=25967 width=4) -> Broadcast Motion 3:3 (slice2; segments: 3) (cost=0.00..704.81 rows=25967 width=4) -> Seq Scan on t3 (cost=0.00..358.58 rows=8656 width=4) Filter: (c1 > c2) Optimizer: Postgres query optimizer (8 rows) After Fix: SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL; c1 ---- 6 4 8 2 (4 rows) explain SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL; QUERY PLAN --------------------------------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1324032.37 rows=1 width=4) -> Result (cost=0.00..1324032.37 rows=1 width=4) Filter: ((t3.c0 = t3.c0) IS NULL) -> Nested Loop Left Join (cost=0.00..1324032.37 rows=1 width=8) Join Filter: true -> Seq Scan on t2 (cost=0.00..431.00 rows=1 width=4) -> Materialize (cost=0.00..431.00 rows=1 width=4) -> Broadcast Motion 3:3 (slice2; segments: 3) (cost=0.00..431.00 rows=1 width=4) -> Seq Scan on t3 (cost=0.00..431.00 rows=1 width=4) Filter: (c1 > c2) Optimizer: Pivotal Optimizer (GPORCA) (cherry picked from gpdb commit d3dd98c1a8daf04fbf6cb91fc4afa6f91b317e93)
- Loading branch information