GTX 1070 + Ubuntu 16.10 + tensorflow-gpu installation

petrowu
2017.03.31 15:32

# At boot the system reports "nouveau error: unknown chipset"

# nouveau cannot recognize the GTX 1080

- Disable nouveau

sudo vi /etc/modprobe.d/blacklist.conf

# Add:

blacklist nouveau

sudo update-initramfs -u

sudo reboot
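Before rebooting it can help to confirm that the blacklist entry was really saved. The following is a minimal sketch; `is_blacklisted` is a hypothetical helper name, and the default path is simply the file edited above.

```shell
# Return 0 if the given modprobe config file blacklists the given module.
# is_blacklisted is a hypothetical helper, not part of modprobe.
is_blacklisted() (
    module=$1
    conf=${2:-/etc/modprobe.d/blacklist.conf}
    # Match an uncommented "blacklist <module>" line, ignoring extra whitespace.
    grep -Eq "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*$" "$conf"
)

# Example calls:
# is_blacklisted nouveau && echo "nouveau is blacklisted"
# lsmod | grep nouveau    # after the reboot, no output means the module is not loaded
```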

# Prepare the system environment

sudo apt-get install build-essential wget

# Install gcc and g++ 4.8

sudo apt-get install gcc-4.8 gcc-4.8-multilib g++-4.8 g++-4.8-multilib

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 50

sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 60

sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 50

# Switch the gcc and g++ versions

sudo update-alternatives --config gcc

sudo update-alternatives --config g++
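Because the later steps depend on which compiler is active (gcc-4.8 for the CUDA toolkit, gcc-5 for the driver), it may be worth checking what `gcc` actually points to after each switch. This is a small sketch; `resolve_tool` is a hypothetical helper name, and it assumes GNU `readlink -f` is available.

```shell
# Print the real file behind a command after following every symlink.
# update-alternatives works through a chain of symlinks, so this shows
# which versioned binary is currently selected.
# resolve_tool is a hypothetical helper, not part of update-alternatives.
resolve_tool() (
    path=$(command -v "$1") || exit 1
    readlink -f "$path"
)

# Example calls after switching:
# resolve_tool gcc
# resolve_tool g++
# gcc -dumpversion
```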

# Remove gcc and g++ 4.8 from the alternatives

# sudo update-alternatives --remove gcc /usr/bin/gcc-4.8

# sudo update-alternatives --remove g++ /usr/bin/g++-4.8

# CUDA 8.0 RC

# https://developer.nvidia.com/cuda-release-candidate-download

# Install the CUDA toolkit

# Switch to gcc-4.8

sudo dpkg -i cuda-repo-ubuntu1604-8-0-rc_8.0.27-1_amd64.deb

sudo apt-get update

sudo apt-get install cuda

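# Configure the environment variables

# Single quotes keep $PATH and $LD_LIBRARY_PATH unexpanded until ~/.bashrc is read

echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc

echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc

echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc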
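Running the echo lines above more than once appends duplicate entries to ~/.bashrc. One possible alternative is a small helper that only prepends a directory when it is missing. This is a sketch, not the author's setup; `prepend_path` is a hypothetical name.

```shell
# Prepend a directory to a colon-separated list unless it is already present.
# Usage: prepend_path <dir> <current-list>; prints the resulting list.
# prepend_path is a hypothetical helper name.
prepend_path() (
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;          # already present: unchanged
        *) printf '%s\n' "${dir}${list:+:$list}" ;;   # prepend; no trailing colon if the list is empty
    esac
)

# Possible use in ~/.bashrc (paths taken from the steps above):
# export CUDA_HOME=/usr/local/cuda
# export PATH=$(prepend_path "$CUDA_HOME/bin" "$PATH")
# export LD_LIBRARY_PATH=$(prepend_path "$CUDA_HOME/lib64" "$LD_LIBRARY_PATH")
```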

# Install cuDNN

tar -xf cudnn-8.0-linux-x64-v5.0-ga.tgz

sudo cp -f cuda/lib64/*.* /usr/local/cuda/lib64/

sudo cp -f cuda/include/*.* /usr/local/cuda/include/
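To confirm which cuDNN version ended up in the CUDA directory, the version macros in the copied header can be read back. The sketch below assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` as plain `#define` lines, which is the case for cuDNN 5; `cudnn_version` is a hypothetical helper name.

```shell
# Print the cuDNN version recorded in cudnn.h as MAJOR.MINOR.PATCHLEVEL.
# cudnn_version is a hypothetical helper; the default path is the copy target used above.
cudnn_version() (
    header=${1:-/usr/local/cuda/include/cudnn.h}
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
)

# Example call:
# cudnn_version
```

The major number is what the TensorFlow configure step asks for later (5, not 5.0).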

# Note: GeForce GTX 1080 developers must re-install the latest driver from www.nvidia.com/drivers after installing any of these CUDA Toolkits.

# Note: gcc-4.8 cannot compile the NVIDIA driver

# Note: allow DKMS when installing the driver

# Switch to gcc-5

sudo sh NVIDIA-Linux-x86_64-*.run

# To uninstall the driver: sudo nvidia-uninstall

# Test

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

sudo make

./deviceQuery
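The deviceQuery sample ends its output with a `Result = PASS` line when it can talk to the GPU. A minimal sketch for checking that in a script follows; `device_query_passed` is a hypothetical helper name.

```shell
# Read deviceQuery output on stdin and succeed only if it reports PASS.
# device_query_passed is a hypothetical helper name.
device_query_passed() {
    grep -q '^Result = PASS'
}

# Example call from the deviceQuery directory:
# ./deviceQuery | device_query_passed && echo "CUDA can see the GPU"
```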

# modprobe: ERROR: could not insert 'nvidia_361_uvm': Invalid argument

# This happens because CUDA 8.0 bundles version 361 of the NVIDIA driver, which has to be removed

sudo apt-get remove nvidia-361

The following packages will be REMOVED:

cuda cuda-8-0 cuda-demo-suite-8-0 cuda-drivers cuda-runtime-8-0 nvidia-361 nvidia-361-dev

0 upgraded, 0 newly installed, 7 to remove and 76 not upgraded.

After this operation, 312 MB disk space will be freed.

Do you want to continue? [Y/n] y (don't worry, this is fine)

sudo reboot   # the display may misbehave after the reboot and the desktop may fail to load

Ctrl+Alt+F1 (switch to a text console)

sudo apt-add-repository ppa:graphics-drivers/ppa -y

sudo apt update

sudo apt install nvidia-367 nvidia-settings nvidia-prime

sudo reboot

The desktop now loads normally.

# TensorFlow 0.9.0 build from source

# Install bazel

sudo apt-get install openjdk-8-jdk

echo "deb http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list

curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -

sudo apt-get update

sudo apt-get install bazel

# Build TensorFlow

sudo apt-get install python-numpy swig python-dev

mkdir ~/github && cd ~/github

git clone --recurse-submodules https://github.com/tensorflow/tensorflow

cd ~/github/tensorflow && ./configure

---------------------------------------

Please specify the location of python. [Default is /usr/bin/python]:

Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n

No Google Cloud Platform support will be enabled for TensorFlow

Do you wish to build TensorFlow with GPU support? [y/N] y

GPU support will be enabled for TensorFlow

Please specify which gcc nvcc should use as the host compiler. [Default is /usr/bin/gcc]:

Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0

Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5 (not 5.0)

Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with.

You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.

Please note that each additional compute capability significantly increases your build time and binary size.

[Default is: "3.5,5.2"]:

Setting up Cuda include

Setting up Cuda lib64

Setting up Cuda bin

Setting up Cuda nvvm

Setting up CUPTI include

Setting up CUPTI lib64

Configuration finished

---------------------------------------

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

sudo pip install /tmp/tensorflow_pkg/tensorflow-…

# Test

python -c "import tensorflow"

# ImportError: cannot import name pywrap_tensorflow: a reboot is needed

sudo reboot

# Theano & Keras

sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose libopenblas-dev git

sudo pip install Theano

sudo pip install keras

# Configure Theano

echo "[global]" > ~/.theanorc

echo "floatX = float32" >> ~/.theanorc

echo "device = gpu0" >> ~/.theanorc

echo "[nvcc]" >> ~/.theanorc

echo "fastmath = True" >> ~/.theanorc
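The same configuration can also be written in a single step with a here-document, which makes the final file content easier to see at a glance. This is a sketch with the same settings as above; `write_theanorc` is a hypothetical helper name, and its optional argument exists only so the target file can be changed.

```shell
# Write the Theano configuration used in this post to a file in one step.
# write_theanorc is a hypothetical helper; the target defaults to ~/.theanorc.
# Like the first echo above, this overwrites any existing file.
write_theanorc() {
    cat > "${1:-$HOME/.theanorc}" <<'EOF'
[global]
floatX = float32
device = gpu0

[nvcc]
fastmath = True
EOF
}

# Example call:
# write_theanorc
```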

# Test

python -c "import keras"

# matplotlib

sudo apt-get build-dep python-matplotlib

# E: You must put some 'source' URIs in your sources.list

sudo vi /etc/apt/sources.list

# Remove the leading # from every deb-src line

sudo apt-get update

sudo pip install matplotlib
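The manual edit of sources.list above can also be done with sed. The following is a sketch, assuming the commented entries look like `# deb-src ...` or `#deb-src ...`; `enable_deb_src` is a hypothetical helper name, and a backup copy is kept with a `.bak` suffix.

```shell
# Uncomment every commented-out deb-src line in an apt sources file.
# enable_deb_src is a hypothetical helper; the original file is kept as <file>.bak.
enable_deb_src() (
    file=${1:-/etc/apt/sources.list}
    sed -i.bak -E 's/^#[[:space:]]*(deb-src[[:space:]])/\1/' "$file"
)

# The system file needs root, so either run the function in a root shell or use:
# sudo sed -i.bak -E 's/^#[[:space:]]*(deb-src[[:space:]])/\1/' /etc/apt/sources.list
# sudo apt-get update
```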

# h5py

sudo apt-get install libhdf5-dev

sudo apt-get install cython

sudo pip install h5py

# Docker

# Update apt sources

sudo apt-get update

sudo apt-get install apt-transport-https ca-certificates

sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D

sudo vi /etc/apt/sources.list.d/docker.list

# Add (14.04):

deb https://apt.dockerproject.org/repo ubuntu-trusty main

# Add (16.04):

deb https://apt.dockerproject.org/repo ubuntu-xenial main
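Choosing the repository line by hand is easy to get wrong, so it could be derived from the release codename instead. This sketch only covers the two releases listed above (trusty for 14.04, xenial for 16.04) and refuses anything else rather than guessing; `docker_repo_line` is a hypothetical helper name.

```shell
# Print the Docker apt repository line for an Ubuntu codename.
# Only the two releases named in this post are handled.
# docker_repo_line is a hypothetical helper name.
docker_repo_line() (
    case $1 in
        trusty|xenial)
            printf 'deb https://apt.dockerproject.org/repo ubuntu-%s main\n' "$1" ;;
        *)
            echo "no repository line listed here for: $1" >&2
            exit 1 ;;
    esac
)

# Example call (lsb_release -cs prints the codename of the running release):
# docker_repo_line "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/docker.list
```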

sudo apt-get update

sudo apt-get install docker-engine

sudo service docker start

# add user group

sudo groupadd docker

sudo usermod -aG docker [your username]
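Group changes only take effect in new login sessions, so log out and back in before running docker without sudo. To check whether the membership was recorded, something like the following sketch could be used; `user_in_group` is a hypothetical helper name built on the standard `id -nG` call.

```shell
# Succeed if the given user belongs to the given group.
# user_in_group is a hypothetical helper name.
user_in_group() (
    for g in $(id -nG "$1"); do
        [ "$g" = "$2" ] && exit 0
    done
    exit 1
)

# Example call:
# user_in_group "$(id -un)" docker && echo "docker group membership is set"
```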
