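Note: do not attempt this inside a virtual machine. It will not succeed, because that setup is not currently supported.
To succeed, perform the steps on a physical machine.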
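Preparation
Confirm versions
The main things to confirm are the CUDA toolkit version and the NVIDIA driver version.
In practice, the most reliable approach turned out to be:
First determine the NVIDIA driver version from the graphics card model in the machine, then determine the CUDA toolkit version from the driver version.
- Check the graphics card model
The card in this machine is a GeForce GTX 1060 3GB.
Number of CUDA cores: 1152
- Determine the graphics driver version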
https://www.geforce.com/drivers
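This page lists every driver version that supports the card; the topmost entry is the latest release (excluding beta versions).
At the time of writing, the latest NVIDIA driver for this card was 390.87.
- Determine the CUDA toolkit version
Each CUDA toolkit release requires a minimum NVIDIA driver version; see the CUDA Driver section of https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html:
On Linux, the latest NVIDIA driver is 390.87, so CUDA 9.2 is ruled out because it requires driver >= 396.26. CUDA 9.1 requires >= 390.46, which 390.87 satisfies, so CUDA 9.1 is the choice.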
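- Check the system and kernel requirements
See the System Requirements section of https://docs.nvidia.com/cuda/archive/9.1/cuda-installation-guide-linux/index.html:
It lists what CUDA 9.1 requires of each supported distribution.
For example, CentOS 7.x requires kernel 3.10, gcc 4.8.5, and GLIBC 2.17.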
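The driver-versus-toolkit comparison described above can be sketched as a small shell check. This is only an illustration: the helper name `version_ge` is made up for this example, the minimum driver versions are the two quoted in the text, and the comparison relies on `sort -V` (version sort) being available.

```shell
#!/usr/bin/env bash
# Succeed (exit status 0) if version $1 >= version $2, using version sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Latest driver available for the card, as found in the text.
driver="390.87"

# "CUDA release:minimum Linux driver" pairs quoted in the text.
for entry in "9.2:396.26" "9.1:390.46"; do
    cuda="${entry%%:*}"
    min_driver="${entry##*:}"
    if version_ge "$driver" "$min_driver"; then
        echo "CUDA $cuda: usable (driver $driver >= $min_driver)"
    else
        echo "CUDA $cuda: not usable (driver $driver < $min_driver)"
    fi
done
```

With the values from the text, the loop reports CUDA 9.2 as not usable and CUDA 9.1 as usable, which matches the choice made above.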
Required checks
See chapter 2 of https://docs.nvidia.com/cuda/archive/9.1/cuda-installation-guide-linux/index.html.
- (1) Check that a CUDA-capable GPU is present
lspci | grep -i nvidia
You can check whether the card supports CUDA at https://developer.nvidia.com/cuda-gpus.
- (2) Check whether the current Linux version is supported
The CUDA Development Tools are only supported on some specific distributions of Linux.
$ uname -m && cat /etc/*release
You should see output similar to the following, modified for your particular system:
x86_64
Red Hat Enterprise Linux Workstation release 6.0 (Santiago)
The x86_64 line indicates you are running on a 64-bit system.
The remainder gives information about your distribution.
- (3) Check the gcc version:
$ gcc --version
- (4) Check the glibc version
ls -l /lib64/libc.so.*
- (5) Install the kernel headers for the running kernel
This step is important.
sudo yum install "kernel-devel-uname-r == $(uname -r)"
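After installing the headers, it can be worth confirming that they match the running kernel exactly. The sketch below assumes the CentOS layout in which the kernel-devel package places the tree under /usr/src/kernels/<release>; the helper name `have_kernel_headers` is made up for this example.

```shell
#!/usr/bin/env bash
# Succeed if a kernel header tree for release $1 exists under $2.
# $2 defaults to /usr/src/kernels (assumed CentOS kernel-devel location).
have_kernel_headers() {
    local release="$1"
    local base="${2:-/usr/src/kernels}"
    [ -d "$base/$release" ]
}

running="$(uname -r)"
if have_kernel_headers "$running"; then
    echo "kernel headers found for $running"
else
    echo "kernel headers missing for $running; install kernel-devel for this exact release"
fi
```

A mismatch typically appears after a kernel update without a reboot: the headers on disk then belong to a different release than the one reported by uname -r.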
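Install the graphics driver and CUDA toolkit
The Handle Conflicting Installation Methods section of the installation guide covers reinstallation.
In short, before installing a graphics driver or CUDA toolkit of the same version again, the old installation must be uninstalled.
If they are already installed, they can be removed as follows: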
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.1/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
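Install the graphics driver
- Install with yum
Most Linux distributions use the open-source nouveau graphics driver. For NVIDIA cards, the proprietary official driver works better.
To install the official driver, see: https://blog.csdn.net/u013378306/article/details/69229919
It describes a simple way to install the NVIDIA driver with yum.
Before proceeding, nouveau, which is loaded by default, must be disabled.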
lsmod | grep nouveau
If the command above prints nothing, nouveau has been disabled successfully.
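The text does not spell out how nouveau is disabled. A common way on CentOS, following the approach in NVIDIA's Linux installation guide, is to blacklist the module and rebuild the initramfs. The sketch below only writes the blacklist to a scratch file so it can be inspected safely; the helper name `write_nouveau_blacklist` is made up for this example, and the real steps (shown as comments) need root.

```shell
#!/usr/bin/env bash
# Write the nouveau blacklist entries to the file given as $1.
# On a real system the target would be /etc/modprobe.d/blacklist-nouveau.conf.
write_nouveau_blacklist() {
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$1"
}

# Dry run: write to a scratch file and show what would be installed.
conf="$(mktemp)"
write_nouveau_blacklist "$conf"
cat "$conf"
rm -f "$conf"

# On the real machine, as root, the steps would be:
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   dracut --force    # rebuild the initramfs so nouveau is not loaded at boot
#   reboot
```

After the reboot, the lsmod check above should print nothing.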
The last step of this method,
yum -y install kmod-nvidia
sometimes fails. In that case, run
nvidia-detect -v
and use its output to find and install the matching driver version.
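- Manual installation from the downloaded driver package
A reliable place to find drivers: https://www.geforce.com/drivers
For the installation procedure, see: https://blog.csdn.net/itaacy/article/details/72628792?utm_source=itdadao&utm_medium=referral
Once the driver is installed, the following command shows the card information: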
nvidia-smi
If it prints the GPU information, the driver was installed successfully.
To uninstall the driver, use:
nvidia-uninstall
Install the CUDA toolkit
Note: stop GNOME (the graphical desktop session) before installing.
CUDA toolkit download page: https://developer.nvidia.com/cuda-toolkit-archive
Download CUDA 9.1.
Install CUDA:
sh cuda_9.1.85_387.26_linux.run
Installation process (the log below is from an earlier installation of version 9.2 and is for reference only):
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.37?
(y)es/(n)o/(q)uit: yes
Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: yes
Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: y
Install the CUDA 9.2 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-9.2 ]: y
Toolkit location must be an absolute path.
Enter Toolkit Location
[ default is /usr/local/cuda-9.2 ]: /usr/local/cuda-9.2
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 9.2 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /root ]: y
Samples location must be an absolute path
Enter CUDA Samples Location
[ default is /root ]: y
Samples location must be an absolute path
Enter CUDA Samples Location
[ default is /root ]: /root
Installing the NVIDIA display driver...
Log of a successful installation:
Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-9.2 ...
Installing the CUDA Samples in /root ...
Copying samples to /root/NVIDIA_CUDA-9.2_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver: Installed
Toolkit: Installed in /usr/local/cuda-9.2
Samples: Installed in /root
Please make sure that
- PATH includes /usr/local/cuda-9.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-9.2/lib64, or, add /usr/local/cuda-9.2/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.2/doc/pdf for detailed information on setting up CUDA.
Logfile is /tmp/cuda_install_3101.log
Configure environment variables
See http://08643.cn/p/73399a4c9114 for how to set the environment variables.
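The installer summary above asks for the toolkit's bin directory on PATH and its lib64 directory on LD_LIBRARY_PATH. A minimal sketch of that setup follows. The variable name `CUDA_HOME` is a convention chosen for this example, and /usr/local/cuda-9.1 is assumed to be the install prefix for CUDA 9.1; adjust it to the location actually chosen during installation. To make the settings persistent, the export lines would typically go in a shell startup file such as ~/.bashrc.

```shell
#!/usr/bin/env bash
# CUDA_HOME is a name chosen for this sketch; the default is an assumed prefix.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-9.1}"

# Prepend the toolkit directories, keeping any existing value intact.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "PATH=$PATH"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

The `${VAR:+:$VAR}` form avoids leaving a stray trailing colon when the variable was previously empty.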
Verify that CUDA was installed successfully
cd /root/NVIDIA_CUDA-9.2_Samples/1_Utilities/deviceQuery
make
./deviceQuery
If it succeeds, the output ends with Result = PASS.
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 3GB"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 3013 MBytes (3159293952 bytes)
( 9) Multiprocessors, (128) CUDA Cores/MP: 1152 CUDA Cores
GPU Max Clock rate: 1747 MHz (1.75 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1060 3GB
Result = PASS
The output includes parameters such as
CUDA Driver Version / Runtime Version 8.0 / 8.0
( 9) Multiprocessors, (128) CUDA Cores/MP: 1152 CUDA Cores
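When the verification is scripted rather than read by eye, the PASS line can be checked automatically. The sketch below is one way to do that; the helper name `device_query_passed` is made up for this example, and the demo feeds it a shortened sample instead of real deviceQuery output.

```shell
#!/usr/bin/env bash
# Read deviceQuery output on stdin; succeed only if it reports "Result = PASS".
device_query_passed() {
    grep -q 'Result = PASS'
}

# On a real machine: ./deviceQuery | device_query_passed && echo "CUDA OK"
# Demo with a shortened sample of the output shown above:
if printf 'Detected 1 CUDA Capable device(s)\nResult = PASS\n' | device_query_passed; then
    echo "CUDA OK"
else
    echo "CUDA verification failed"
fi
```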
How to check the CUDA version
nvcc --version
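If only the release number is needed, for example to compare it against the driver requirement, it can be extracted from the nvcc output. The sketch below assumes the last line of the output has the form "Cuda compilation tools, release 9.1, V9.1.85"; the helper name `nvcc_release` is made up for this example.

```shell
#!/usr/bin/env bash
# Read `nvcc --version` output on stdin and print only the release number.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# On a real machine: nvcc --version | nvcc_release
# Demo with a sample line in the assumed format:
echo "Cuda compilation tools, release 9.1, V9.1.85" | nvcc_release
```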
Problems encountered and solutions:
- The driver installation is unable to locate the kernel source.
The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly.
If you know that the kernel source packages are installed and set up correctly, you may pass the location of the kernel source with the '--kernel-source-path' flag.
Solution:
sudo yum install epel-release
yum install --enablerepo=epel dkms
- Missing recommended library
Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-9.2 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
Installing the CUDA Samples in /root ...
Copying samples to /root/NVIDIA_CUDA-9.2_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver: Installed
Toolkit: Installed in /usr/local/cuda-9.2
Samples: Installed in /root, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-9.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-9.2/lib64, or, add /usr/local/cuda-9.2/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.2/doc/pdf for detailed information on setting up CUDA.
Logfile is /tmp/cuda_install_7498.log
Solution:
yum install mesa-libGLES.x86_64 mesa-libGL-devel.x86_64 \
    mesa-libGLU-devel.x86_64 mesa-libGLw.x86_64 \
    mesa-libGLw-devel.x86_64 libXi-devel.x86_64 \
    freeglut-devel.x86_64 freeglut.x86_64
- cudaGetDeviceCount returned 30
When verifying the CUDA installation, the following appears:
[root@localhost deviceQuery]# ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
[root@localhost deviceQuery]# pwd
/root/NVIDIA_CUDA-9.2_Samples/1_Utilities/deviceQuery
This is usually a problem with the NVIDIA graphics driver; install the latest NVIDIA driver.
http://elrepo.org/tiki/tiki-index.php
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
Then install the NVIDIA driver with yum as described in https://blog.csdn.net/u013378306/article/details/69229919.
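- cudaGetDeviceCount returned 35
This is usually a CUDA version problem. Determine the correct version and install it.
The message "CUDA driver version is insufficient for CUDA runtime version" means the CUDA runtime library is newer than the installed driver supports. Either install a newer driver or use an older CUDA runtime library. All versions are available at http://developer.download.nvidia.com/compute/cuda/repos/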
- Your kernel headers for kernel xxx cannot be found
The short version of the solution is to run
sudo yum install "kernel-devel-uname-r == $(uname -r)"
That will install the kernel headers for the version of the kernel you are currently running.
References:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
https://baiweiblog.wordpress.com/2017/07/21/cuda-8-0%E5%9C%A8linux%E4%B8%8A%E7%9A%84%E5%AE%89%E8%A3%85%E6%B5%81%E7%A8%8B/
https://stackoverflow.com/questions/38016466/installing-cuda-7-5-on-centos-7-unable-to-locate-the-kernel-source
https://bitsanddragons.wordpress.com/2016/10/07/cuda-on-centos-7/
https://devtalk.nvidia.com/default/topic/1027413/cuda-setup-and-installation/linux-installation-error-cudagetdevicecount-returned-30-gt-unknown-error/
https://developer.download.nvidia.com/compute/cuda/9.2/Prod2/docs/sidebar/CUDA_Installation_Guide_Linux.pdf
https://blog.csdn.net/10km/article/details/61665578
https://medium.com/@changrongko/nv-how-to-check-cuda-and-cudnn-version-e05aa21daf6c
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
https://www.cnblogs.com/wolflzc/p/9117291.html
http://detail.zol.com.cn/picture_index_1760/index17594460.shtml