CLBlast failed to compile with CLTune enabled #61

Closed
fonghou opened this issue May 28, 2016 · 9 comments

Comments

@fonghou

fonghou commented May 28, 2016

Hello, please see the following issue for details:

uncomplicate/neanderthal#17

@fonghou
Author

fonghou commented May 29, 2016

Hello, some progress: after installing CUDA Toolkit 7.5, CLTune compiled successfully.

The only other issue was:

/home/ec2-user/CLBlast/include/internal/database/xaxpy.h:35:41: error: redefinition of ‘const clblast::Database::DatabaseEntry clblast::Database::XaxpySingle’
 const Database::DatabaseEntry Database::XaxpySingle = {
                                         ^
/home/ec2-user/CLBlast/include/internal/database/xaxpy.h:17:31: error: ‘const clblast::Database::DatabaseEntry clblast::Database::XaxpySingle’ previously defined here
 const Database::DatabaseEntry Database::XaxpySingle = {
                               ^

Deleting the duplicate definitions resolved the errors. I am not sure about the root cause: is it the Python script, or duplicate entries in the JSON files?

Finally, I see some speedup.

GEMM 8192x8192

CPU: "Elapsed time: 11074.600465 msecs"
GPU: "Elapsed time: 1571.362177 msecs"

@CNugteren
Owner

I am not sure how your previous issues were caused and solved, but if you think CLBlast can be improved to prevent such issues in the future let me know how you solved them and what I can do to help out.

About your duplicate definitions issue: which branch did you check out? The master branch? I cannot find the duplicate definition, but maybe I am looking at the wrong place.

About the speed-up results: did you also actually run the tuner first? If you did, can you upload the results here: #1. That way it will be fast out-of-the-box in the next version for your GPU! If not, can you run them as well? Also, are the CPU times also using CLBlast or using some other BLAS library?

@fonghou
Author

fonghou commented May 29, 2016

> I am not sure how your previous issues were caused and solved.

The root cause of the previous issue (undefined references to "...@OPENCL_1.0" symbols in my libclblast.so build) was that I upgraded the NVIDIA driver in the AWS g2 AMI but didn't know I also needed to upgrade the old CUDA SDK, which has OPENCL_1.0 headers in /opt/nvidia/cuda. Installing CUDA SDK 7.5 resolved that issue, as your README mentions.

> About your duplicate definitions issue: which branch did you check out? The master branch? I cannot find the duplicate definition, but maybe I am looking at the wrong place.

Yes, I checked out the master branch. The duplicate definitions only showed up after running the two steps below:

make alltuners
python ../scripts/database/database.py . ..

I saw that the NVIDIA GRID K520 device got inserted into the database/*.h files. However, a new Database::*Single definition was also inserted. Here is a sample snippet. Could it be an issue in the Python JSON parsing or code generation? I can reproduce it on both my laptop and the AWS g2 environment. Let me know if more detail would help.

// =================================================================================================

const Database::DatabaseEntry Database::CopySingle = {
  "Copy", Precision::kSingle, {
    { // Intel GPUs
      kDeviceTypeGPU, "Intel", {
        { "Intel(R) HD Graphics Skylake ULT GT2",            { {"COPY_DIMX",8}, {"COPY_DIMY",8}, {"COPY_VW",4}, {"COPY_WPT",1} } },
        { "default",                                         { {"COPY_DIMX",8}, {"COPY_DIMY",8}, {"COPY_VW",4}, {"COPY_WPT",1} } },
      }
    },
    { // Default
      kDeviceTypeAll, "default", {
        { "default",                                         { {"COPY_DIMX",8}, {"COPY_DIMY",8}, {"COPY_VW",4}, {"COPY_WPT",1} } },
      }
    },
  }
};

// =================================================================================================

const Database::DatabaseEntry Database::CopySingle = {
  "Copy", Precision::kSingle, {
    { // AMD GPUs
      kDeviceTypeGPU, "AMD", {
        { "AMD Radeon R9 M370X Compute Engine",              { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",4}, {"COPY_WPT",1} } },
        { "Hawaii",                                          { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",2}, {"COPY_WPT",2} } },
        { "Pitcairn",                                        { {"COPY_DIMX",8}, {"COPY_DIMY",16}, {"COPY_VW",4}, {"COPY_WPT",1} } },
        { "Tahiti",                                          { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",2}, {"COPY_WPT",2} } },
        { "default",                                         { {"COPY_DIMX",8}, {"COPY_DIMY",8}, {"COPY_VW",2}, {"COPY_WPT",1} } },
      }
    },
    { // ARM GPUs
      kDeviceTypeGPU, "ARM", {
        { "Mali-T628",                                       { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",2}, {"COPY_WPT",4} } },
        { "default",                                         { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",2}, {"COPY_WPT",4} } },
      }
    },
    { // Intel CPUs
      kDeviceTypeCPU, "Intel", {
        { "Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz",        { {"COPY_DIMX",32}, {"COPY_DIMY",16}, {"COPY_VW",8}, {"COPY_WPT",2} } },
        { "Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz",         { {"COPY_DIMX",32}, {"COPY_DIMY",16}, {"COPY_VW",8}, {"COPY_WPT",1} } },
        { "Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz",        { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",8}, {"COPY_WPT",1} } },
        { "default",                                         { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",8}, {"COPY_WPT",1} } },
      }
    },
    { // Intel GPUs
      kDeviceTypeGPU, "Intel", {
        { "HD Graphics 4000",                                { {"COPY_DIMX",16}, {"COPY_DIMY",8}, {"COPY_VW",2}, {"COPY_WPT",1} } },
        { "Iris",                                            { {"COPY_DIMX",16}, {"COPY_DIMY",8}, {"COPY_VW",1}, {"COPY_WPT",2} } },
        { "Iris Pro",                                        { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",4}, {"COPY_WPT",4} } },
        { "default",                                         { {"COPY_DIMX",16}, {"COPY_DIMY",8}, {"COPY_VW",1}, {"COPY_WPT",1} } },
      }
    },
    { // Intel accelerators
      kDeviceTypeAccelerator, "Intel", {
        { "Intel(R) Many Integrated Core Acceleration Card", { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",8}, {"COPY_WPT",1} } },
        { "default",                                         { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",8}, {"COPY_WPT",1} } },
      }
    },
    { // NVIDIA GPUs
      kDeviceTypeGPU, "NVIDIA", {
        { "GeForce GTX 480",                                 { {"COPY_DIMX",8}, {"COPY_DIMY",8}, {"COPY_VW",4}, {"COPY_WPT",1} } },
        { "GeForce GTX 680",                                 { {"COPY_DIMX",32}, {"COPY_DIMY",16}, {"COPY_VW",4}, {"COPY_WPT",1} } },
        { "GeForce GTX 750 Ti",                              { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",2}, {"COPY_WPT",1} } },
        { "GeForce GTX 980",                                 { {"COPY_DIMX",32}, {"COPY_DIMY",16}, {"COPY_VW",1}, {"COPY_WPT",1} } },
        { "GeForce GTX TITAN",                               { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",2}, {"COPY_WPT",4} } },
        { "GeForce GTX TITAN X",                             { {"COPY_DIMX",32}, {"COPY_DIMY",8}, {"COPY_VW",2}, {"COPY_WPT",1} } },
        { "Tesla K20m",                                      { {"COPY_DIMX",8}, {"COPY_DIMY",8}, {"COPY_VW",4}, {"COPY_WPT",4} } },
        { "Tesla K40m",                                      { {"COPY_DIMX",8}, {"COPY_DIMY",8}, {"COPY_VW",4}, {"COPY_WPT",2} } },
        { "default",                                         { {"COPY_DIMX",8}, {"COPY_DIMY",8}, {"COPY_VW",1}, {"COPY_WPT",1} } },
      }
    },
    { // Default
      kDeviceTypeAll, "default", {
        { "default",                                         { {"COPY_DIMX",8}, {"COPY_DIMY",8}, {"COPY_VW",1}, {"COPY_WPT",1} } },
      }
    },
  }
};
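A duplicate like the one in the snippet above could be caught before compiling with a small check over the generated headers. The following is only a sketch and not part of CLBlast: the function name `find_duplicate_entries` and the regular expression are assumptions, based solely on the shape of the definition lines shown in the snippet.

```python
import re
from collections import Counter

# Matches definition lines such as:
#   const Database::DatabaseEntry Database::CopySingle = {
# The pattern is an assumption derived from the snippet in this thread.
ENTRY_RE = re.compile(
    r"^\s*const\s+Database::DatabaseEntry\s+Database::(\w+)\s*=\s*\{",
    re.MULTILINE,
)


def find_duplicate_entries(header_text):
    """Return the sorted names of entries defined more than once."""
    counts = Counter(ENTRY_RE.findall(header_text))
    return sorted(name for name, count in counts.items() if count > 1)
```

Reading a generated file such as include/internal/database/copy.h and passing its text to `find_duplicate_entries` should then list `CopySingle` if the header looks like the snippet above.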

> About the speed-up results: did you also actually run the tuner first? If you did, can you upload the results here: #1. That way it will be fast out-of-the-box in the next version for your GPU! If not, can you run them as well? Also, are the CPU times also using CLBlast or using some other BLAS library?

Yes, I did run "make alltuners; python ...; make". Although I noticed that a few tuner runs with certain parameters printed "FAILED", the overall build completed successfully. I'll upload the results soon.

Thank you very much for the help!

Regards,
Feng

@CNugteren
Owner

CNugteren commented May 30, 2016

Thanks for the details Feng!

The issue you are seeing is a mistake on my side, sorry for that. The 'database' of tuner results (scripts/database/database.db) is not stored in the GitHub repository, as it is quite large and only useful when doing the tuning. Therefore, it is downloaded automatically the first time. However, the database you downloaded is newer than the version actually used in master. It actually doesn't just include new tuning results, it also adds support for half-precision fp16. And that's where things go wrong.

To fix it for now (if you want), you can replace scripts/database/database.db with http://www.cedricnugteren.nl/tuning/2016_05_16_clblast.db.

I will add your tuning results (after you've uploaded the .JSON files) to CLBlast so that they are included in the next release (also of Neanderthal).

About the tuner failing once in a while: that's OK. It will automatically filter out those results.

@fonghou
Author

fonghou commented May 30, 2016

Thank you for looking into the issue!

I'll make a clean build today, and upload tuner .json files at #1 .

Regards,
Feng

@fonghou
Author

fonghou commented May 30, 2016

@CNugteren, I tried what you suggested but am still getting the same issue. It's not urgent, as editing database/*.h seems to work fine for making a local build. Since you are working on the fp16 dev branch right now, I'll wait and run it again when you have the next release ready. Just logging some details here.

First, I removed everything and checked out a fresh copy of the CLBlast master branch.

Then I modified DATABASE_SERVER_URL:

diff --git a/scripts/database/database.py b/scripts/database/database.py
index 8e8f37f..fc44891 100644
--- a/scripts/database/database.py
+++ b/scripts/database/database.py
@@ -24,7 +24,8 @@ except ImportError:
 import pandas as pd

 # Server storing a copy of the database
-DATABASE_SERVER_URL = "http://www.cedricnugteren.nl/tuning/clblast.db"
+DATABASE_SERVER_URL = "http://www.cedricnugteren.nl/tuning/2016_05_16_clblast.db"

Then I ran cmake ...; make; make alltuners.

While running python ..., I did see it downloading the previous db:

[ec2-user@ip-172-31-26-167 build]$ python ../scripts/database/database.py . ..
## Downloading database from 'http://www.cedricnugteren.nl/tuning/2016_05_16_clblast.db'...
## Loading the database from disk...
## Processing './clblast_xger_6464.json' with 130 new items
## Processing './clblast_xgemv_2_3232.json' with 18 new items
## Processing './clblast_transpose_64.json' with 56 new items
## Processing './clblast_copy_6464.json' with 144 new items
## Processing './clblast_xaxpy_64.json' with 80 new items
## Processing './clblast_xger_64.json' with 130 new items
## Processing './clblast_transpose_32.json' with 56 new items
## Processing './clblast_xgemv_1_6464.json' with 9 new items
## Processing './clblast_xgemm_6464.json' with 73 new items
## Processing './clblast_pad_32.json' with 81 new items
## Processing './clblast_xdot_2_32.json' with 6 new items
## Processing './clblast_xaxpy_6464.json' with 80 new items
## Processing './clblast_xger_32.json' with 130 new items
## Processing './clblast_xgemv_2_64.json' with 18 new items
## Processing './clblast_xgemv_1_3232.json' with 9 new items
## Processing './clblast_copy_64.json' with 144 new items
## Processing './clblast_padtranspose_32.json' with 18 new items
## Processing './clblast_xdot_2_6464.json' with 6 new items
## Processing './clblast_xdot_1_32.json' with 6 new items
## Processing './clblast_xger_3232.json' with 130 new items
## Processing './clblast_transpose_3232.json' with 56 new items
## Processing './clblast_padtranspose_64.json' with 18 new items
## Processing './clblast_pad_6464.json' with 81 new items
## Processing './clblast_xgemv_1_64.json' with 9 new items
## Processing './clblast_xgemv_2_6464.json' with 18 new items
## Processing './clblast_xgemm_3232.json' with 114 new items
## Processing './clblast_copy_3232.json' with 144 new items
## Processing './clblast_pad_3232.json' with 81 new items
## Processing './clblast_xdot_1_6464.json' with 6 new items
## Processing './clblast_xgemv_3_3232.json' with 18 new items
## Processing './clblast_xgemm_32.json' with 117 new items
## Processing './clblast_pad_64.json' with 81 new items
## Processing './clblast_copy_32.json' with 144 new items
## Processing './clblast_padtranspose_3232.json' with 18 new items
## Processing './clblast_xdot_2_3232.json' with 6 new items
## Processing './clblast_xgemv_2_32.json' with 18 new items
## Processing './clblast_xaxpy_32.json' with 80 new items
## Processing './clblast_xdot_1_64.json' with 6 new items
## Processing './clblast_xdot_1_3232.json' with 6 new items
## Processing './clblast_xgemv_3_6464.json' with 18 new items
## Processing './clblast_padtranspose_6464.json' with 12 new items
## Processing './clblast_transpose_6464.json' with 40 new items
## Processing './clblast_xgemm_64.json' with 104 new items
## Processing './clblast_xgemv_3_64.json' with 18 new items
## Processing './clblast_xaxpy_3232.json' with 80 new items
## Processing './clblast_xgemv_1_32.json' with 9 new items
## Processing './clblast_xdot_2_64.json' with 6 new items
## Processing './clblast_xgemv_3_32.json' with 18 new items
## Storing the database to disk...
## Calculating the best results per device/kernel...
## Producing a C++ database in '../include/internal/database'...
## All done

Still getting redefinition errors:

[ec2-user@ip-172-31-26-167 build]$ make
[  1%] Building CXX object CMakeFiles/clblast.dir/src/database.cc.o
In file included from /home/ec2-user/CLBlast/src/database.cc:15:0:
/home/ec2-user/CLBlast/include/internal/database/xaxpy.h:35:41: error: redefinition of ‘const clblast::Database::DatabaseEntry clblast::Database::XaxpySingle’
 const Database::DatabaseEntry Database::XaxpySingle = {
                                         ^
/home/ec2-user/CLBlast/include/internal/database/xaxpy.h:17:31: error: ‘const clblast::Database::DatabaseEntry clblast::Database::XaxpySingle’ previously defined here
 const Database::DatabaseEntry Database::XaxpySingle = {
                               ^
In file included from /home/ec2-user/CLBlast/src/database.cc:20:0:
/home/ec2-user/CLBlast/include/internal/database/copy.h:35:41: error: redefinition of ‘const clblast::Database::DatabaseEntry clblast::Database::CopySingle’
 const Database::DatabaseEntry Database::CopySingle = {
                                         ^
In file included from /home/ec2-user/CLBlast/src/database.cc:20:0:
/home/ec2-user/CLBlast/include/internal/database/copy.h:17:31: error: ‘const clblast::Database::DatabaseEntry clblast::Database::CopySingle’ previously defined here
 const Database::DatabaseEntry Database::CopySingle = {
                               ^
In file included from /home/ec2-user/CLBlast/src/database.cc:21:0:
/home/ec2-user/CLBlast/include/internal/database/pad.h:35:41: error: redefinition of ‘const clblast::Database::DatabaseEntry clblast::Database::PadSingle’
 const Database::DatabaseEntry Database::PadSingle = {
                                         ^
In file included from /home/ec2-user/CLBlast/src/database.cc:21:0:
/home/ec2-user/CLBlast/include/internal/database/pad.h:17:31: error: ‘const clblast::Database::DatabaseEntry clblast::Database::PadSingle’ previously defined here
 const Database::DatabaseEntry Database::PadSingle = {
                               ^
In file included from /home/ec2-user/CLBlast/src/database.cc:22:0:
/home/ec2-user/CLBlast/include/internal/database/transpose.h:35:41: error: redefinition of ‘const clblast::Database::DatabaseEntry clblast::Database::TransposeSingle’
 const Database::DatabaseEntry Database::TransposeSingle = {
                                         ^
In file included from /home/ec2-user/CLBlast/src/database.cc:22:0:
/home/ec2-user/CLBlast/include/internal/database/transpose.h:17:31: error: ‘const clblast::Database::DatabaseEntry clblast::Database::TransposeSingle’ previously defined here
 const Database::DatabaseEntry Database::TransposeSingle = {
                               ^
In file included from /home/ec2-user/CLBlast/src/database.cc:23:0:
/home/ec2-user/CLBlast/include/internal/database/padtranspose.h:35:41: error: redefinition of ‘const clblast::Database::DatabaseEntry clblast::Database::PadtransposeSingle’
 const Database::DatabaseEntry Database::PadtransposeSingle = {
                                         ^
/home/ec2-user/CLBlast/include/internal/database/padtranspose.h:17:31: error: ‘const clblast::Database::DatabaseEntry clblast::Database::PadtransposeSingle’ previously defined here
 const Database::DatabaseEntry Database::PadtransposeSingle = {
                               ^
make[2]: *** [CMakeFiles/clblast.dir/src/database.cc.o] Error 1
make[1]: *** [CMakeFiles/clblast.dir/all] Error 2
make: *** [all] Error 2

Here is the git diff at this point:
gitdiff.txt

@CNugteren
Owner

CNugteren commented May 31, 2016

Sorry, I didn't test it myself but guessed that that version would work based on the date. I tested myself and indeed, it doesn't. This one does not include the fp16 results, I just tested it: http://www.cedricnugteren.nl/tuning/2016_05_08_clblast.db (note: first remove the old database.db file, otherwise it will not re-download).
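The note about removing the old database.db first can be sketched as a small helper. This is an illustration rather than anything shipped with CLBlast: the function name `refresh_database` and its `fetch` parameter are hypothetical, the URL is the one given above, and the local path is the scripts/database/database.db location mentioned earlier in this thread. As described above, database.py downloads the file itself when it is missing, so only the removal is strictly needed; the sketch also fetches explicitly to stay self-contained.

```python
import os
import urllib.request

# URL from the comment above; path as mentioned earlier in this thread.
DATABASE_URL = "http://www.cedricnugteren.nl/tuning/2016_05_08_clblast.db"
DATABASE_PATH = os.path.join("scripts", "database", "database.db")


def refresh_database(path=DATABASE_PATH, url=DATABASE_URL,
                     fetch=urllib.request.urlretrieve):
    """Delete any stale local database, then download a fresh copy."""
    if os.path.exists(path):
        # A leftover file would otherwise be reused instead of re-downloaded.
        os.remove(path)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    fetch(url, path)  # hypothetical hook; defaults to a real HTTP download
    return path
```

Calling `refresh_database()` from the CLBlast checkout root before rerunning database.py should leave the 2016_05_08 database in place, assuming the URL is still reachable.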

Alternatively, you can just check out the development branch and use the latest version of the code and database. This branch should also compile properly. However, there is still an issue there with the new half-precision format: the database doesn't contain any tuning results for HGEMM yet. This will be fixed this week.

By the way, it will probably take another few weeks before a new release (0.8.0) will be available.

@fonghou
Author

fonghou commented May 31, 2016

Built the master branch successfully again with 2016_05_08_clblast.db. Thanks!

Here are some quick benchmarks. The CPU numbers are from single-threaded ATLAS compiled on the same AWS g2 instance. 20x speed-up!

matrix.core=> (bench-sgemv 8192)
CPU:
Warming up for JIT optimisations 5000000000 ...
Estimating execution count ...
Sampling ...
Final GC...
Checking GC...
Finding outliers ...
Bootstrapping ...
Checking outlier significance
Evaluation count : 30 in 6 samples of 5 calls.
             Execution time mean : 22.826832 ms
    Execution time std-deviation : 174.719555 µs
   Execution time lower quantile : 22.541743 ms ( 2.5%)
   Execution time upper quantile : 23.006858 ms (97.5%)
                   Overhead used : 2.067003 ns
GPU:
Warming up for JIT optimisations 5000000000 ...
  compilation occurred before 1 iterations
  compilation occurred before 330 iterations
Estimating execution count ...
Sampling ...
Final GC...
Checking GC...
Finding outliers ...
Bootstrapping ...
Checking outlier significance
Evaluation count : 216 in 6 samples of 36 calls.
             Execution time mean : 2.802051 ms
    Execution time std-deviation : 5.142751 µs
   Execution time lower quantile : 2.793694 ms ( 2.5%)
   Execution time upper quantile : 2.806382 ms (97.5%)
                   Overhead used : 2.067003 ns

matrix.core=> (bench-sgemm 4092)
CPU: "Elapsed time: 4132.343272 msecs"
GPU: "Elapsed time: 211.098989 msecs"
matrix.core=> (bench-sgemm 8192)
CPU: "Elapsed time: 32414.521019 msecs"
GPU: "Elapsed time: 1577.384848 msecs"
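For reference, the speed-up factors implied by the timings above can be computed as follows. The numbers are copied from the benchmark output (mean execution time for sgemv, elapsed time for sgemm); the dictionary layout and the `speedup` helper are just illustrative names.

```python
# (CPU ms, GPU ms) pairs copied from the benchmark output above.
timings_ms = {
    "sgemv 8192": (22.826832, 2.802051),
    "sgemm 4092": (4132.343272, 211.098989),
    "sgemm 8192": (32414.521019, 1577.384848),
}


def speedup(cpu_ms, gpu_ms):
    """Return how many times faster the GPU run was than the CPU run."""
    return cpu_ms / gpu_ms


speedups = {name: speedup(cpu, gpu) for name, (cpu, gpu) in timings_ms.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

This gives roughly 8x for sgemv and about 20x for both sgemm sizes, which matches the 20x figure quoted above for GEMM.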

@CNugteren
Owner

OK, I guess we can close this issue then. The database issue will remain until a new version of CLBlast is released, but this will not impact a regular user of the library.
