This repository has been archived by the owner on May 9, 2024. It is now read-only.

[Join] Inline and parallelize tbb in getAllTableColumnFragments. #616

Merged: 1 commit merged into main on Oct 6, 2023

Conversation

Devjiu
Contributor

@Devjiu Devjiu commented Aug 3, 2023

This commit refactors and simplifies the method getAllTableColumnFragments and adds some parallelization.

Partially resolves: #574

Signed-off-by: Dmitrii Makarenko [email protected]

@Devjiu Devjiu force-pushed the dmitriim/remove_copies branch from 6cfba25 to 40fd548 Compare August 7, 2023 14:22
@Devjiu Devjiu changed the title Dmitriim/remove copies [Join] Remove redundant copies. Aug 7, 2023
@Devjiu Devjiu force-pushed the dmitriim/remove_copies branch from ab0e572 to eb8042f Compare August 8, 2023 14:38
@Devjiu Devjiu force-pushed the dmitriim/remove_copies branch 3 times, most recently from 1e7b776 to 32b48b4 Compare August 9, 2023 16:12
@Devjiu
Contributor Author

Devjiu commented Aug 17, 2023

Will be rebased over #623 --- Done

@Devjiu Devjiu force-pushed the dmitriim/remove_copies branch 3 times, most recently from 13a559f to 93f9e17 Compare August 24, 2023 17:03
@Devjiu
Contributor Author

Devjiu commented Aug 24, 2023

I was wrong here; it's interference between two issues.

Note: verify the continue/break issue; for some reason using continue causes the following failures:

[ RUN      ] Select.Subqueries
/localdisk/dmitriim/hdk/omniscidb/Tests/ArrowSQLRunner/SQLiteComparator.cpp:66: Failure
Expected equality of these values:
  connector.getNumRows()
    Which is: 2
  omnisci_results->rowCount()
    Which is: 1
CPU: SELECT CASE WHEN test.x IN (SELECT x FROM test_inner) THEN x ELSE NULL END AS c, COUNT(*) AS n FROM test WHERE y > 40 GROUP BY c ORDER BY n DESC;
/localdisk/dmitriim/hdk/omniscidb/Tests/ArrowSQLRunner/SQLiteComparator.cpp:100: Failure
Expected equality of these values:
  ref_val
    Which is: 10
  omnisci_val
    Which is: 0
CPU: SELECT COUNT(*) FROM test WHERE str IN (SELECT DISTINCT str FROM test_inner) AND x IN (SELECT DISTINCT x FROM test_inner);
/localdisk/dmitriim/hdk/omniscidb/Tests/ArrowSQLRunner/SQLiteComparator.cpp:100: Failure
Expected equality of these values:
  ref_val
    Which is: 10
  omnisci_val
    Which is: 0
CPU: SELECT COUNT(*) FROM test WHERE str IN (SELECT DISTINCT str FROM test_inner) AND x IN (SELECT x FROM test_inner);
/localdisk/dmitriim/hdk/omniscidb/Tests/ArrowSQLRunner/SQLiteComparator.cpp:100: Failure
Expected equality of these values:
  ref_val
    Which is: 10
  omnisci_val
    Which is: 0
CPU: SELECT COUNT(*) FROM test WHERE str IN (SELECT str FROM test_inner) AND x IN (SELECT x FROM test_inner);
[  FAILED  ] Select.Subqueries (7655 ms)
[ RUN      ] Select.Joins_Arrays
[       OK ] Select.Joins_Arrays (465 ms)
[ RUN      ] Select.Joins_Fixed_Size_Array_Multi_Frag
[       OK ] Select.Joins_Fixed_Size_Array_Multi_Frag (690 ms)
[ RUN      ] Select.Joins_EmptyTable
[       OK ] Select.Joins_EmptyTable (180 ms)
[ RUN      ] Select.Joins_Fragmented_SelfJoin_And_LoopJoin
[       OK ] Select.Joins_Fragmented_SelfJoin_And_LoopJoin (409 ms)
[ RUN      ] Select.Joins_ImplicitJoins
[       OK ] Select.Joins_ImplicitJoins (1647 ms)
[ RUN      ] Select.Joins_DifferentIntegerTypes
[       OK ] Select.Joins_DifferentIntegerTypes (48 ms)
[ RUN      ] Select.Joins_FilterPushDown
[       OK ] Select.Joins_FilterPushDown (1818 ms)
[ RUN      ] Select.Joins_InnerJoin_TwoTables
[       OK ] Select.Joins_InnerJoin_TwoTables (1643 ms)
[ RUN      ] Select.Joins_InnerJoin_AtLeastThreeTables
/localdisk/dmitriim/hdk/omniscidb/Tests/ArrowSQLRunner/SQLiteComparator.cpp:66: Failure
Expected equality of these values:
  connector.getNumRows()
    Which is: 2
  omnisci_results->rowCount()
    Which is: 1
CPU: SELECT a.x, b.x FROM test_inner a JOIN test_inner b ON a.x = b.x ORDER BY a.x;
[  FAILED  ] Select.Joins_InnerJoin_AtLeastThreeTables (1589 ms)

@Devjiu Devjiu force-pushed the dmitriim/remove_copies branch from 93f9e17 to 235508d Compare August 24, 2023 22:33
@Devjiu
Contributor Author

Devjiu commented Aug 25, 2023

This should be split into two PRs; also, the new CPU implementation fill_hash_join_buff_bucketized_cpu should be moved into new source files, HashJoinRuntimeCpu.cpp/h.

Some overall details are in #574 (comment).

@Devjiu
Contributor Author

Devjiu commented Aug 25, 2023

Benchmark status (with non-lazy):

 ./_launcher/solution.R --solution=pyhdk --task=join --nrow=1e8
[1] "./pyhdk/join-pyhdk.py"
# join-pyhdk.py
pyhdk data_name:  J1_1e8_NA_0_0
loading datasets J1_1e8_NA_0_0, J1_1e8_1e2_0_0, J1_1e8_1e5_0_0, J1_1e8_1e8_0_0
Using fragment size 32000000
100000000
100
100000
100000000
joining...
(89997128, 9)
(89997128, 9)
(89995511, 11)
(89995511, 11)
(100000000, 11)
(100000000, 11)
(89995511, 11)
(89995511, 11)
(90000000, 13)
(90000000, 13)
joining finished, took 73s
   on_disk               question run time_sec
1    FALSE     small inner on int   1    2.227
2    FALSE     small inner on int   2    1.914
3    FALSE    medium inner on int   1    2.315
4    FALSE    medium inner on int   2    2.167
5    FALSE    medium outer on int   1    2.257
6    FALSE    medium outer on int   2    2.475
7    FALSE medium inner on factor   1    2.227
8    FALSE medium inner on factor   2    2.240
9    FALSE       big inner on int   1    4.320
10   FALSE       big inner on int   2    3.970

 ./_launcher/solution.R --solution=pyhdk --task=join --nrow=1e8
[1] "./pyhdk/join-pyhdk.py"
# join-pyhdk.py
pyhdk data_name:  J1_1e8_NA_0_0
loading datasets J1_1e8_NA_0_0, J1_1e8_1e2_0_0, J1_1e8_1e5_0_0, J1_1e8_1e8_0_0
Using fragment size 4000000
100000000
100
100000
100000000
joining...
(89997128, 9)
(89997128, 9)
(89995511, 11)
(89995511, 11)
(100000000, 11)
(100000000, 11)
(89995511, 11)
(89995511, 11)
(90000000, 13)
(90000000, 13)
joining finished, took 59s
   on_disk               question run time_sec
1    FALSE     small inner on int   1    1.077
2    FALSE     small inner on int   2    0.765
3    FALSE    medium inner on int   1    1.063
4    FALSE    medium inner on int   2    1.017
5    FALSE    medium outer on int   1    0.704
6    FALSE    medium outer on int   2    0.726
7    FALSE medium inner on factor   1    1.056
8    FALSE medium inner on factor   2    1.004
9    FALSE       big inner on int   1    1.506
10   FALSE       big inner on int   2    1.293

@Devjiu Devjiu force-pushed the dmitriim/remove_copies branch 2 times, most recently from b81d607 to a6dbaa4 Compare September 13, 2023 12:50
@Devjiu
Contributor Author

Devjiu commented Sep 13, 2023

Rebased over #663. Let's merge #663 first.

@Devjiu Devjiu marked this pull request as ready for review September 13, 2023 13:07
@Devjiu Devjiu requested a review from ienkovich September 13, 2023 13:07
@Devjiu
Contributor Author

Devjiu commented Sep 13, 2023

export FRAGMENT_SIZE=4000000
/localdisk/dmitriim/benchmarks/db-benchmark ⑂master* $ ./_launcher/solution.R --solution=pyhdk --task=join --nrow=1e7
[1] "./pyhdk/join-pyhdk.py"
# join-pyhdk.py
pyhdk data_name:  J1_1e7_NA_0_0
loading datasets J1_1e7_NA_0_0, J1_1e7_1e1_0_0, J1_1e7_1e4_0_0, J1_1e7_1e7_0_0
Using fragment size 4000000
10000000
10
10000
10000000
joining...
(8998860, 9)
(8998860, 9)
(8998412, 11)
(8998412, 11)
(10000000, 11)
(10000000, 11)
(8998412, 11)
(8998412, 11)
(9000000, 13)
(9000000, 13)
joining finished, took 9s
   on_disk               question run time_sec
1    FALSE     small inner on int   1    0.346
2    FALSE     small inner on int   2    0.304
3    FALSE    medium inner on int   1    0.409
4    FALSE    medium inner on int   2    0.373
5    FALSE    medium outer on int   1    0.274
6    FALSE    medium outer on int   2    0.284
7    FALSE medium inner on factor   1    0.354
8    FALSE medium inner on factor   2    0.320
9    FALSE       big inner on int   1    0.774
10   FALSE       big inner on int   2    0.482

./_launcher/solution.R --solution=pyhdk --task=join --nrow=1e8
[1] "./pyhdk/join-pyhdk.py"
# join-pyhdk.py
pyhdk data_name:  J1_1e8_NA_0_0
loading datasets J1_1e8_NA_0_0, J1_1e8_1e2_0_0, J1_1e8_1e5_0_0, J1_1e8_1e8_0_0
Using fragment size 4000000
100000000
100
100000
100000000
joining...
(89997128, 9)
(89997128, 9)
(89995511, 11)
(89995511, 11)
(100000000, 11)
(100000000, 11)
(89995511, 11)
(89995511, 11)
(90000000, 13)
(90000000, 13)
joining finished, took 64s
   on_disk               question run time_sec
1    FALSE     small inner on int   1    1.009
2    FALSE     small inner on int   2    0.903
3    FALSE    medium inner on int   1    0.970
4    FALSE    medium inner on int   2    0.971
5    FALSE    medium outer on int   1    0.754
6    FALSE    medium outer on int   2    0.840
7    FALSE medium inner on factor   1    1.086
8    FALSE medium inner on factor   2    0.999
9    FALSE       big inner on int   1    1.501
10   FALSE       big inner on int   2    1.338

./_launcher/solution.R --solution=pyhdk --task=join --nrow=1e9
[1] "./pyhdk/join-pyhdk.py"
# join-pyhdk.py
pyhdk data_name:  J1_1e9_NA_0_0
loading datasets J1_1e9_NA_0_0, J1_1e9_1e3_0_0, J1_1e9_1e6_0_0, J1_1e9_1e9_0_0
Using fragment size 4000000
1000000000
1000
1000000
1000000000
joining...
(899999033, 9)
[thread 984394 also had an error][thread 983253 also had an error][thread 984245 also had an error][thread 986622 also had an error]



[thread 988259 also had an error][thread 985551 also had an error]

[thread 982395 also had an error]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f7723a7c2d6, pid=858290, tid=984071
#
# JRE version: OpenJDK Runtime Environment (20.0) (build 20-internal-adhoc..src)
# Java VM: OpenJDK 64-Bit Server VM (20-internal-adhoc..src, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libResultSet.so+0xbb2d6]  ResultSet::getRowAt[abi:cxx11](unsigned long, bool, bool, bool, std::vector<bool, std::allocator<bool> > const&) const+0x1d6
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /localdisk/dmitriim/benchmarks/db-benchmark/core.858290)
#
# An error report file with more information is saved as:
# /localdisk/dmitriim/benchmarks/db-benchmark/hs_err_pid858290.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
Aborted (core dumped)

@Devjiu
Contributor Author

Devjiu commented Sep 14, 2023

Judging by #616 (comment), large joins are failing in all backends.

https://duckdb.org/2023/04/14/h2oai.html

@ienkovich
Contributor

Judging by #616 (comment), large joins are failing in all backends.

It looks like it's simply due to the fact that the machine doesn't have enough memory for such a big join. It has just 160GB of memory and IIUC each of the joined tables is 50GB.

@Devjiu Devjiu force-pushed the dmitriim/remove_copies branch from a6dbaa4 to 3e26fa2 Compare September 15, 2023 16:06
@Devjiu Devjiu mentioned this pull request Sep 15, 2023
Contributor

@ienkovich ienkovich left a comment

Could you please explain what redundant copies you actually remove? To me, it looks like some inlining + parallelization using TBB.

The varlen part of the new code looks completely dysfunctional (the size to copy in write_ptrs would be some negative int cast to size_t and should cause a SEGFAULT on memcpy). Is it actually ever triggered in our tests? I feel like we don't actually support whole-column fetch for varlen data.

@@ -2899,6 +2899,9 @@ def if_then_else(self, cond, true_val, false_val):
"""
return self._builder.if_then_else(cond, true_val, false_val)

def clear_cache(self):
Contributor

Please avoid unrelated change.

@@ -73,6 +73,7 @@ struct JoinColumnIterator {
DEVICE FORCE_INLINE JoinColumnIterator& operator++() {
index += step;
index_inside_chunk += step;
// this loop is made to find index_of_chunk by total index of element
Contributor

Please avoid unrelated changes.

const auto fragments_it = all_tables_fragments.find({db_id, table_id});
CHECK(fragments_it != all_tables_fragments.end());

Contributor

What is this vertical space added for?

Contributor Author

For visual indentation; will remove.

auto merged_results =
ColumnarResults::mergeResults(executor_->row_set_mem_owner_, column_frags);
const auto& fragment = (*fragments)[frag_id];
const auto& rows_in_frag = fragment.getNumTuples();
Contributor

A reference here looks inappropriate.

total_row_count += rows_in_frag;
}

const auto& type_width = col_info->type->size();
Contributor

Why reference?

valid_fragments.push_back(frag_id);
}

if (write_ptrs.empty()) {
Contributor

Maybe make an empty table check right after the row count computation?

@Devjiu
Copy link
Contributor Author

Devjiu commented Sep 19, 2023

Could you please explain what redundant copies you actually remove? To me, it looks like some inlining + parallelization using TBB.

The varlen part of the new code looks completely dysfunctional (the size to copy in write_ptrs would be some negative int cast to size_t and should cause a SEGFAULT on memcpy). Is it actually ever triggered in our tests? I feel like we don't actually support whole-column fetch for varlen data.

Okay, maybe the current changes are a simple parallelization case. Originally we had three copies:

  1. in the ColumnarResults constructor
  2. in fetchBuffer (from getBuffer)
  3. in merge

Currently the ColumnarResults c-tor is already fixed, fetchBuffer should be zero-copy, and merge is inlined and parallelized.

@Devjiu Devjiu force-pushed the dmitriim/remove_copies branch from 3e26fa2 to e342fae Compare September 27, 2023 14:08
@Devjiu Devjiu changed the title [Join] Remove redundant copies. [Join] Inline and parallelize tbb in getAllTableColumnFragments. Sep 27, 2023
@Devjiu Devjiu force-pushed the dmitriim/remove_copies branch from ba5af91 to a501ccd Compare September 28, 2023 11:14
Contributor

@ienkovich ienkovich left a comment

This version is good overall! I suggest a few clean-ups though.

@@ -239,6 +240,11 @@ const int8_t* ColumnFetcher::getAllTableColumnFragments(
int db_id = col_info->db_id;
int table_id = col_info->table_id;
int col_id = col_info->column_id;
if (col_info->type->isString() || col_info->type->isArray()) {
Contributor

This should probably be a CHECK instead. It is an internal error, not some input error useful for the user.


size_t total_row_count = 0;
for (size_t frag_id = 0; frag_id < frag_count; ++frag_id) {
if (executor_->getConfig().exec.interrupt.enable_non_kernel_time_query_interrupt &&
Contributor

Interruption check in this loop doesn't make much sense because we don't do any actual data fetch here. I suggest moving it out of the loop or removing it completely.

raw_write_ptrs.emplace_back(write_ptrs[i].first);
}

std::unique_ptr<ColumnarResults> merged_results(new ColumnarResults(
Contributor

The vector ColumnarResults::ColumnarResults gets is not a pointer per fragment, it is a pointer per column (ColumnarResults can store multiple columns, but not multiple fragments). So actually you are supposed to pass a vector with a single pointer here. Your version works because the first vector element has a correct pointer and other elements are simply not used. But this code is still confusing, so let's fix it.

Contributor Author

I don't understand. Do you mean that we are using only the data from the first fragment? If so, why are we fetching the others?

Contributor Author

Ah, sorry, now I get it.

@Devjiu Devjiu force-pushed the dmitriim/remove_copies branch from a501ccd to 05b98f3 Compare October 2, 2023 14:25
This commit refactors and simplifies the method `getAllTableColumnFragments` and adds some parallelization.

Partially resolves: #574

Signed-off-by: Dmitrii Makarenko <[email protected]>
@Devjiu Devjiu force-pushed the dmitriim/remove_copies branch from 05b98f3 to 8c4a252 Compare October 6, 2023 15:01
@Devjiu Devjiu merged commit f363014 into main Oct 6, 2023
@Devjiu Devjiu deleted the dmitriim/remove_copies branch October 6, 2023 20:15
Successfully merging this pull request may close these issues:

[Perf][Bench] Join is slow on big tables.