Running cxxnet on Amazon EC2 (Ubuntu 14.04)

Posted on August 9, 2015, 8:47 AM

1. Launch an EC2 instance of type g2.8xlarge from an Ubuntu 14.04 HVM AMI. When I launched the instance, I used a 300 GB root EBS volume (General Purpose SSD) to get decent disk I/O capacity. A General Purpose SSD volume provides 3 IOPS per GB of storage, so 300 GB gives me a baseline of 900 IOPS, with the ability to burst up to 3000 IOPS for an extended period of time.
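The baseline figure is simple arithmetic, and a small helper makes it easy to try other volume sizes. This is only a sketch: `baseline_iops` is a name made up for this post, and it applies nothing but the 3 IOPS per GB ratio quoted above, ignoring any minimum or maximum limits that AWS may apply to a volume.

```shell
#!/bin/sh
# Baseline IOPS of a General Purpose SSD volume, using the
# 3 IOPS per GB ratio quoted above (no caps or minimums applied).
baseline_iops() {
    size_gb=$1
    echo $(( size_gb * 3 ))
}

baseline_iops 300   # prints 900
```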

2. SSH into the EC2 instance and install the CUDA driver.

A detailed tutorial on this topic is available on GitHub:

https://github.com/BVLC/caffe/wiki/Install-Caffe-on-EC2-from-scratch-(Ubuntu,-CUDA-7,-cuDNN)
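Once the driver and toolkit are installed, it may be worth checking that the usual NVIDIA tools are reachable before going further. The sketch below only tests whether `nvidia-smi` and `nvcc` are on the PATH; `have_tool` is a helper name invented here. Note that `nvcc` can be installed without being on the PATH (it normally lives under the CUDA install directory), so "not found" for `nvcc` does not necessarily mean the toolkit is missing.

```shell
#!/bin/sh
# Succeeds when the given command can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the driver utility and the CUDA compiler.
for tool in nvidia-smi nvcc; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found"
    fi
done
```

If `nvidia-smi` is found, running it with no arguments should list the GPUs that the driver can see.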

3. Install OpenBLAS, as below:

$ sudo apt-get install make gfortran
$ wget http://github.com/xianyi/OpenBLAS/archive/v0.2.14.tar.gz
$ tar zxvf v0.2.14.tar.gz
$ cd OpenBLAS-0.2.14
$ make FC=gfortran
$ sudo make PREFIX=/usr/local/ install
$ cd /usr/local/lib
$ sudo ln -s libopenblas.so libblas.so
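The symbolic link matters because the cxxnet build links with `-lblas`. A quick way to confirm that the link exists and resolves to a real file is sketched below; `check_blas_link` is a hypothetical helper written for this post, not part of OpenBLAS.

```shell
#!/bin/sh
# Check that libblas.so in the given directory is a symlink
# that points at an existing file.
check_blas_link() {
    libdir=$1
    if [ -L "$libdir/libblas.so" ] && [ -e "$libdir/libblas.so" ]; then
        echo "ok: libblas.so -> $(readlink "$libdir/libblas.so")"
    else
        echo "missing or dangling: $libdir/libblas.so"
        return 1
    fi
}

# The install prefix used above puts the libraries in /usr/local/lib.
check_blas_link /usr/local/lib || true
```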

4. Install OpenCV

Detailed documentation is available from the Ubuntu community:

https://help.ubuntu.com/community/OpenCV

You will also need to install the header files for OpenCV:

$ sudo apt-get install libopencv-dev
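The cxxnet link step calls `pkg-config --libs opencv`, so it can help to confirm that pkg-config knows about OpenCV before building. A minimal sketch follows; `opencv_status` is a name invented here, and the check assumes the package is registered under the name `opencv`, as in the build command shown later in this post.

```shell
#!/bin/sh
# Print the OpenCV version and flags known to pkg-config,
# or a message when the package cannot be found.
opencv_status() {
    if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists opencv; then
        echo "OpenCV $(pkg-config --modversion opencv)"
        pkg-config --cflags --libs opencv
    else
        echo "pkg-config cannot find an 'opencv' package"
    fi
}

opencv_status
```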

5. Install cxxnet, as below:

$ cd ~
$ git clone https://github.com/dmlc/cxxnet.git
$ cd cxxnet
$ ./build.sh

In most cases, the build will fail on the first attempt. You need to adjust the Makefile to match your actual build environment. Below are the relevant lines from my environment:

CFLAGS += -g -O3 -I./mshadow/ -I./dmlc-core/include -I/usr/local/cuda/include -I/usr/include -fPIC $(MSHADOW_CFLAGS) $(DMLC_CFLAGS)
LDFLAGS = -pthread $(MSHADOW_LDFLAGS) $(DMLC_LDFLAGS) -L/usr/local/cuda/lib64 -L/usr/local/lib
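These flags only help if the directories they name actually exist on your machine. The sketch below checks the three absolute paths used in my Makefile lines above; `report_dir` is a hypothetical helper, and you should substitute your own paths if CUDA or OpenBLAS is installed elsewhere.

```shell
#!/bin/sh
# Report whether a directory referenced by the Makefile exists.
report_dir() {
    if [ -d "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Include and library paths taken from the CFLAGS/LDFLAGS above.
for dir in /usr/local/cuda/include /usr/local/cuda/lib64 /usr/local/lib; do
    report_dir "$dir"
done
```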

Then run make again:

$ make
g++ -DMSHADOW_FORCE_STREAM -Wall -g -O3 -I./mshadow/ -I./dmlc-core/include -I/usr/local/cuda/include -I/usr/include -fPIC -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_AZURE=0 -DCXXNET_USE_OPENCV=1 -DCXXNET_USE_OPENCV_DECODER=1 -fopenmp   -o bin/cxxnet src/local_main.cpp layer_cpu.o updater_cpu.o nnet_cpu.o main.o nnet_ps_server.o data.o dmlc-core/libdmlc.a layer_gpu.o updater_gpu.o nnet_gpu.o -pthread -lm -lcudart -lcublas -lcurand -lblas -lrt -L/usr/local/cuda/lib64 -L/usr/local/lib `pkg-config --libs opencv` -ljpeg
g++ -DMSHADOW_FORCE_STREAM -Wall -g -O3 -I./mshadow/ -I./dmlc-core/include -I/usr/local/cuda/include -I/usr/include -fPIC -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_AZURE=0 -DCXXNET_USE_OPENCV=1 -DCXXNET_USE_OPENCV_DECODER=1 -fopenmp   -o bin/im2rec tools/im2rec.cc dmlc-core/libdmlc.a -pthread -lm -lcudart -lcublas -lcurand -lblas -lrt -L/usr/local/cuda/lib64 -L/usr/local/lib `pkg-config --libs opencv` -ljpeg
g++ -DMSHADOW_FORCE_STREAM -Wall -g -O3 -I./mshadow/ -I./dmlc-core/include -I/usr/local/cuda/include -I/usr/include -fPIC -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_AZURE=0 -DCXXNET_USE_OPENCV=1 -DCXXNET_USE_OPENCV_DECODER=1 -fopenmp   -o bin/bin2rec tools/bin2rec.cc dmlc-core/libdmlc.a -pthread -lm -lcudart -lcublas -lcurand -lblas -lrt -L/usr/local/cuda/lib64 -L/usr/local/lib `pkg-config --libs opencv` -ljpeg
g++ -DMSHADOW_FORCE_STREAM -Wall -g -O3 -I./mshadow/ -I./dmlc-core/include -I/usr/local/cuda/include -I/usr/include -fPIC -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_AZURE=0 -DCXXNET_USE_OPENCV=1 -DCXXNET_USE_OPENCV_DECODER=1 -fopenmp  -shared -o wrapper/libcxxnetwrapper.so wrapper/cxxnet_wrapper.cpp layer_cpu.o updater_cpu.o nnet_cpu.o main.o nnet_ps_server.o data.o dmlc-core/libdmlc.a layer_gpu.o updater_gpu.o nnet_gpu.o -pthread -lm -lcudart -lcublas -lcurand -lblas -lrt -L/usr/local/cuda/lib64 -L/usr/local/lib `pkg-config --libs opencv` -ljpeg

Now we can run an example:

$ cd example/MNIST
$ ./run.sh MNIST_CONV.conf 
libdc1394 error: Failed to initialize libdc1394
Use CUDA Device 0: GRID K520
finish initialization with 1 devices
Initializing layer: cv1
Initializing layer: 1
Initializing layer: 2
Initializing layer: 3
Initializing layer: fc1
Initializing layer: se1
Initializing layer: fc2
Initializing layer: 7
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
node[in].shape: 100,1,28,28
node[1].shape: 100,32,14,14
node[2].shape: 100,32,7,7
node[3].shape: 100,1,1,1568
node[4].shape: 100,1,1,100
node[5].shape: 100,1,1,100
node[6].shape: 100,1,1,10
MNISTIterator: load 60000 images, shuffle=1, shape=100,1,28,28
MNISTIterator: load 10000 images, shuffle=0, shape=100,1,28,28
initializing end, start working
round        0:[     600] 2 sec elapsed[1]      train-error:0.211783	test-error:0.0435
round        1:[     600] 3 sec elapsed[2]      train-error:0.0522667	test-error:0.0263
round        2:[     600] 5 sec elapsed[3]      train-error:0.0370833	test-error:0.0214
round        3:[     600] 7 sec elapsed[4]      train-error:0.0316167	test-error:0.023
round        4:[     600] 9 sec elapsed[5]      train-error:0.02905	test-error:0.0152
round        5:[     600] 11 sec elapsed[6]     train-error:0.0265167	test-error:0.0166
round        6:[     600] 13 sec elapsed[7]     train-error:0.0248333	test-error:0.0164
round        7:[     600] 15 sec elapsed[8]     train-error:0.0226667	test-error:0.0144
round        8:[     600] 17 sec elapsed[9]     train-error:0.0234167	test-error:0.0139
round        9:[     600] 19 sec elapsed[10]    train-error:0.0221	test-error:0.0152
round       10:[     600] 21 sec elapsed[11]    train-error:0.0218667	test-error:0.0121
round       11:[     600] 23 sec elapsed[12]    train-error:0.02025	test-error:0.0128
round       12:[     600] 24 sec elapsed[13]    train-error:0.01925	test-error:0.0142
round       13:[     600] 26 sec elapsed[14]    train-error:0.0194333	test-error:0.0129
round       14:[     600] 28 sec elapsed[15]    train-error:0.0190167	test-error:0.0114

updating end, 28 sec in all
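If you save the training output to a file, a short awk filter can pick out the round with the lowest test error. This is a sketch under assumptions: `best_round` is a name made up here, the input is assumed to follow the `round N:[...] ... test-error:X` layout shown above with the test error as the last field, and how you capture the log (for example by redirecting the output of run.sh) is up to you. The sample input below is three lines copied from the run above.

```shell
#!/bin/sh
# Read cxxnet training output on stdin and print the round
# with the lowest test-error.
best_round() {
    awk '/test-error:/ {
        round = $2; sub(/:.*/, "", round)   # "14:[" -> "14"
        split($NF, kv, ":")                 # "test-error:0.0114"
        err = kv[2] + 0
        if (best == "" || err < best) { best = err; best_r = round }
    }
    END { if (best != "") print "round " best_r ": test-error " best }'
}

best_round <<'EOF'
round       12:[     600] 24 sec elapsed[13]    train-error:0.01925 test-error:0.0142
round       13:[     600] 26 sec elapsed[14]    train-error:0.0194333 test-error:0.0129
round       14:[     600] 28 sec elapsed[15]    train-error:0.0190167 test-error:0.0114
EOF
```

For the three sample lines this reports round 14, which is also the lowest test error in the full run above.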

At this point you can proceed to work with the examples provided by the cxxnet authors:

https://github.com/dmlc/cxxnet/tree/master/example

2 Responses to “Running cxxnet on Amazon EC2 (Ubuntu 14.04)”

  1. Silcowitz says:

    Great stuff. Could you make the image available as a community image?

  2. Bing Xu says:

    @Silcowitz Thanks for your interest and suggestion. We are currently building new generation toolkit which fully support Python/R. It is called MXNet(https://github.com/dmlc/mxnet). We will make an image of MXNet when we finish it!
