mm! is slower and crashing on Macbook pro gpu #17
mm could be slow(er) for two reasons:
Here is the list of devices that the current version (0.6.2) is tuned to:
Obviously, your GPUs are not on the list, so if you would like them to be in the next version, ask and I'll send you pointers on how to tune and optimize the library. It involves some native code compilation, but it is automated and reasonably easy. There are also tests for the native library that I use for GPU BLAS, so they may also help us debug the JVM crash that you have.
BTW, I am using Arch Linux, and one of the good things about it is that it has fairly recent drivers for my AMD GPUs, which are also regularly patched for newer kernels.
Thanks for the info. I'm a newbie in OpenCL, but would love to give it a try. Though I'm more interested in making it work than in tuning it. I'm on an AWS g2 instance, which has an NVIDIA GPU and runs an NVIDIA-released AMI. I've upgraded it to the latest NVIDIA driver and gcc-4.9/libstdc++, which resolved the libstdc++ CXXABI_1.3.8 link error. However, I'm getting another link error now:
java.lang.UnsatisfiedLinkError: /tmp/JOCLBlast_0_7_1-linux-x86_64_dependents/linux/x86_64/libclblast.so: /usr/lib64/libOpenCL.so: version `OPENCL_2.0' not found (required by /tmp/JOCLBlast_0_7_1-linux-x86_64_dependents/linux/x86_64/libclblast.so)
Here is some system info:
[ec2-user@ip-172-31-26-167 ~]$ uname -a
==============NVSMI LOG==============
Timestamp : Sat May 28 03:16:13 2016
Attached GPUs : 1
Apparently, NVIDIA on Linux only supports OpenCL 1.2. Also, the GPU device is a GRID K520, which is not in your tuned list. I think I have to compile the JOCL native libraries for OpenCL 1.2 first, then compile and tune your library, correct? I appreciate any help you can provide! Thanks!
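A "version `OPENCL_2.0' not found" error means the dynamic linker looked for a symbol version named OPENCL_2.0 in /usr/lib64/libOpenCL.so and did not find it. A rough way to see which version names a library carries is `strings <lib> | grep OPENCL_` (or `readelf -V <lib>` for the proper ELF view). The Python sketch below does the same string scan; the script and function names are hypothetical. Because it only pattern-matches raw bytes, it also reports versions a library merely requires, so point it at the library that should provide them: libOpenCL.so here, or libstdc++.so.6 with the CXXABI_ prefix for the earlier CXXABI_1.3.8 error.

```python
import re
import sys
from typing import List


def version_tags(data: bytes, prefix: str) -> List[str]:
    """Return the unique '<prefix><n>[.<n>...]' strings in data, sorted numerically.

    Heuristic, like `strings lib.so | grep PREFIX`: it does not parse the ELF
    version sections, so it also lists versions a library merely requires.
    """
    pattern = re.compile(re.escape(prefix.encode("ascii")) + rb"\d+(?:\.\d+)*")
    found = {m.group(0).decode("ascii") for m in pattern.finditer(data)}

    def numeric_key(tag: str):
        # "OPENCL_1.2" -> (1, 2), so that 1.3.10 sorts after 1.3.7
        return tuple(int(part) for part in tag[len(prefix):].split("."))

    return sorted(found, key=numeric_key)


def version_tags_in_file(path: str, prefix: str) -> List[str]:
    """Read a shared library from disk and scan it for version tags."""
    with open(path, "rb") as f:
        return version_tags(f.read(), prefix)


if __name__ == "__main__":
    # e.g. python3 version_tags.py /usr/lib64/libOpenCL.so OPENCL_
    #      python3 version_tags.py /usr/lib64/libstdc++.so.6 CXXABI_
    if len(sys.argv) == 3:
        for tag in version_tags_in_file(sys.argv[1], sys.argv[2]):
            print(tag)
```

If OPENCL_2.0 is absent from the list for libOpenCL.so, the loader that actually gets picked up does not define the symbol versions libclblast.so was built against, which matches the error above.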
You do not need to recompile; neanderthal should work with OpenCL 1.2. Check the OS, the drivers, and the calling code.
... but you can try to compile and test https://github.com/CNugteren/CLBlast to see what happens. Once we make it work, you can tune it to make it fast.
The above example works now after compiling CLBlast. However, I couldn't make it work with CLTune. Here is the cmake command that failed with lots of undefined reference to ATL_... errors:
cmake -DCMAKE_INSTALL_PREFIX=/opt/CLBlast-tune -DOPENCL_ROOT=/opt/nvidia/cuda -DCLTUNE_ROOT=/opt/CLTune -DTUNERS=ON -DTESTS=ON -DCBLAS_ROOT=/opt/ATLAS ..
@fonghou I think it is better to ask for help with that in the CLBlast issues.
Remove -DTESTS=ON:
cmake -DCMAKE_INSTALL_PREFIX=/opt/CLBlast-tune -DOPENCL_ROOT=/opt/nvidia/cuda -DCLTUNE_ROOT=/opt/CLTune -DTUNERS=ON ..
Scanning dependencies of target clblast_tuner_copy
@blueberry Yes, I'll follow up there. Thanks again for your help!
@fonghou On the surface, it looks to me like Nvidia made a mess in the drivers, so OpenCL 1.0 gets picked up, while CLBlast requires at least OpenCL 1.1, which is itself ancient. I hope Cedric (the CLBlast author) will be able to help, since he uses Nvidia himself.
Another idea: clean up the old Nvidia drivers, make sure nothing is left over, and then reinstall the latest drivers.
@blueberry Thanks for helping with the initial support. I'll follow up as soon as possible in the CLBlast issues. The requirements for CLBlast are indeed OpenCL 1.1 or higher and GCC 4.9.0 or higher.
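Since the thread turns on which OpenCL version gets picked up and on CLBlast's stated minimums (OpenCL 1.1, GCC 4.9.0), here is a small Python sketch of checking version strings against those minimums. It assumes you already have the strings in hand: CL_PLATFORM_VERSION or CL_DEVICE_VERSION from whatever OpenCL binding you use (the :device-version values in this thread, such as "OpenCL 1.2 ", are examples) and the output of `gcc -dumpversion`. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# The OpenCL spec formats CL_PLATFORM_VERSION / CL_DEVICE_VERSION as
# "OpenCL<space><major>.<minor><space><vendor-specific information>".
_OPENCL_VERSION = re.compile(r"^OpenCL (\d+)\.(\d+)")


def parse_opencl_version(version: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor), or None if the string is not in the spec format."""
    m = _OPENCL_VERSION.match(version)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def opencl_ok(version: str, minimum: Tuple[int, int] = (1, 1)) -> bool:
    """True if the reported OpenCL version is at least `minimum` (CLBlast: 1.1)."""
    parsed = parse_opencl_version(version)
    return parsed is not None and parsed >= minimum


def gcc_ok(dumpversion: str, minimum: Tuple[int, ...] = (4, 9, 0)) -> bool:
    """Compare `gcc -dumpversion` output (e.g. "4.8.3", or just "7") to a minimum."""
    parts = tuple(int(p) for p in dumpversion.strip().split("."))
    parts += (0,) * (len(minimum) - len(parts))  # pad "7" to (7, 0, 0)
    return parts >= minimum
```

Newer GCC releases may print only the major version for `-dumpversion`, which is why the sketch pads missing components with zeros before comparing tuples.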
@fonghou OK, since you found the solution through the CLBlast issues, I'll close this for now.
Hello,
Trying a few examples today. On my MacBook Pro (early 2013), I ran this code:
The first GPU device is an Intel HD Graphics 4000. The GPU is slower:
:device-type :gpu, :vendor "Intel", :vendor-id 16925696, :device-version "OpenCL 1.2 ", :driver-version "1.2(Apr 26 2016 00:33:44)"
CPU:
"Elapsed time: 835.101025 msecs"
GPU:
object[org.jocl.cl_command_queue 0x72581c95 cl_command_queue[0x7f897a5b03f0]]
CLGeneralMatrix[float, COL, mxn: 4096x4096, offset:0, ld:4096>]
"Elapsed time: 1968.578958 msecs"
object[org.jocl.cl_command_queue 0x72581c95 "cl_command_queue[0x7f897a5b03f0]"]
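To put the two timings in perspective, they can be converted to throughput. This Python sketch assumes the timed expression (the code itself is not shown in the thread) is a single single-precision product of two 4096x4096 matrices, as the CLGeneralMatrix output suggests, and uses the conventional 2*m*n*k flop count for matrix multiplication. The function name is hypothetical.

```python
def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput of C = A*B with A (m x k) and B (k x n).

    Uses the conventional count of 2*m*n*k floating-point operations
    (one multiply and one add per inner-product term).
    """
    return 2.0 * m * n * k / seconds / 1e9


# Elapsed times reported above, converted from msecs to seconds.
cpu = gemm_gflops(4096, 4096, 4096, 835.101025e-3)
gpu = gemm_gflops(4096, 4096, 4096, 1968.578958e-3)

print(f"CPU: {cpu:.1f} GFLOP/s")
print(f"GPU (Intel HD Graphics 4000): {gpu:.1f} GFLOP/s")
print(f"The GPU run took {cpu / gpu:.2f}x as long")
```

This prints about 164.6 GFLOP/s for the CPU and 69.8 GFLOP/s for the GPU, i.e. the GPU run took about 2.36 times as long. A single, first timed call may also include one-off costs such as OpenCL kernel compilation, so repeated runs would give a fairer comparison.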
I ran the same code after switching to the second GPU device, an NVIDIA GeForce GT 650M:
:device-type :gpu, :vendor "NVIDIA", :vendor-id 16918272, :device-version "OpenCL 1.2 ", :driver-version "10.10.10 310.42.25f01"
The JVM crashed in native code.
Are these results expected on an old MacBook Pro? I would like to see some results from a newer Mac.
Another note: I made a quick attempt on an AWS G2 instance using the Amazon Linux AMI with the NVIDIA GRID GPU driver. The OS failed to load libJOCL_2_0_0-linux-x86_64.so with a LinkError because its libstdc++ does not support CXXABI_1.3.8 (is there a JOCL build for CXXABI_1.3.7?). I may try to build a custom AMI. If someone has done this before, some guidance would be really helpful, e.g. which Linux distribution, OpenCL package, versions, etc.
Thanks,
Feng