Hardware Support (CPU/GPU)

Using a CPU

Deep learning algorithms are computationally intensive and can be very slow to train on a CPU.
To run the deep learning tools on the CPU, call the EDeepLearningTool::SetEnableGPU(false) method.
The deep learning tools support CPU processing for both 32-bit and 64-bit applications.
A 32-bit application is limited to 2 GB of memory, which can slow down the training or the classification of large images.
The 64-bit version makes better use of the SIMD instructions of modern CPUs and is usually faster than the 32-bit version.

Using an NVIDIA CUDA® GPU

Using a recent NVIDIA GPU greatly increases the processing speed. Refer to the benchmarks for each tool type to compare the GPU and CPU speeds.

1. To use an NVIDIA GPU on Windows or Linux x64 (Intel) with the deep learning tools, install the following NVIDIA libraries on your computer:
- NVIDIA CUDA® Toolkit v11.1 (https://developer.nvidia.com/cuda-toolkit)
- NVIDIA CUDA® Deep Neural Network library (cuDNN) v8.1 for CUDA 11.1 (https://developer.nvidia.com/cudnn)
2. Depending on the installation location:
If you install the NVIDIA CUDA® Toolkit in its default location, the deep learning tools automatically find the libraries they need.

- On Windows: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1

- On Linux: /usr/local/cuda-11.1

Otherwise, copy the DLLs cusolver64_11.dll, curand64_10.dll, cufft64_10.dll and cublas64_11.dll to the Open eVision DLL folder (its default location is C:\Program Files (x86)\Euresys\Open eVision X.X\Bin64\).
3. Install the NVIDIA CUDA® Deep Neural Network library (cuDNN), which comes as a zip archive:
a. Unzip the files.
b. Copy the unzipped files to the NVIDIA CUDA® Toolkit installation directory as indicated in https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#installwindows.
c. If the NVIDIA CUDA® Toolkit is not installed in its default location, copy all the cudnn*8.dll files to the Open eVision DLL folder (its default location is C:\Program Files (x86)\Euresys\Open eVision X.X\Bin64\).
4. To use an NVIDIA GPU on Linux ARM (aarch64 Jetson platforms), you need:
- The NVIDIA JetPack SDK version 4.6, which includes the NVIDIA Jetson Linux Driver Package (L4T) version 32.6
- CUDA® 10.2
- cuDNN 8.2.1

On these platforms, the NVIDIA CUDA® Toolkit is located in /usr/local/cuda-10.2.

To use version 32.6 of the NVIDIA Jetson Linux Driver Package and install the required CUDA packages:

a. Edit the file /etc/apt/sources.list.d/nvidia-l4t-apt-source.list so that its content is:

 deb https://repo.download.nvidia.com/jetson/common r32.6 main
 deb https://repo.download.nvidia.com/jetson/t194 r32.6 main

b. Launch the following commands in a terminal:

 $ sudo apt update
 $ sudo apt install cuda-toolkit-10-2 cuda-tools-10-2 libcudnn8 libcudnn8-dev

c. If Open eVision does not detect the GPU, add the path of the CUDA library directory to the environment variable LD_LIBRARY_PATH.

- In a terminal, use the following command:

 $ export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH

- You can also add this command to the .bashrc file in your home directory to make it persistent across terminal sessions.

5. Use the method EDeepLearningTool::SetEnableGPU(true) to use the GPU with the deep learning tools.
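
As a minimal sketch of step 5, the helper below enables the GPU only in a 64-bit build, since GPU processing is not possible with 32-bit applications, and selects the CPU otherwise. It is written as a template so that it compiles without the Open eVision headers. The names Is64BitBuild and SelectDevice are hypothetical; the only Open eVision call assumed is SetEnableGPU(bool), as named on this page.

```cpp
#include <cstddef>

// Hypothetical helper: true when the application is built as 64-bit.
constexpr bool Is64BitBuild() {
    return sizeof(void*) == 8;
}

// Hypothetical helper: enables the GPU when it is requested and the build
// is 64-bit, otherwise selects the CPU.
// Tool is any type exposing SetEnableGPU(bool), which this page attributes
// to EDeepLearningTool.
// Returns true if the GPU was enabled.
template <typename Tool>
bool SelectDevice(Tool& tool, bool preferGpu) {
    const bool useGpu = preferGpu && Is64BitBuild();
    tool.SetEnableGPU(useGpu);
    return useGpu;
}
```

With a deep learning tool object named tool, SelectDevice(tool, true) would request the GPU. The sketch does not check that the CUDA libraries were actually found.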

Using multiple GPUs

You can use multiple GPUs for training and batch classification.

In the API, to set the list of GPUs, use the EDeepLearningTool::SetGPUIndexes method.

Using multiple GPUs increases the training and batch classification speed only if these GPUs are Quadro or Tesla models with the TCC driver model (see https://docs.nvidia.com/gameworks/content/developertools/desktop/nsight/tesla_compute_cluster.htm).
Using multiple GeForce GPUs is slower than using a single one. If more than one GPU is installed on your computer, set the index of the GPU to use with the EDeepLearningTool::SetGPUIndexes method.
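
The selection rule above can be sketched as a small function that builds the list of GPU indexes. This is only an illustration: ChooseGpuIndexes is a hypothetical name, the sketch assumes GPU indexes are zero-based and contiguous, and it does not call Open eVision itself.

```cpp
#include <vector>

// Hypothetical helper: returns the GPU indexes to use.
//   gpuCount  - number of GPUs installed on the computer
//   allTcc    - true only if every GPU is a Quadro or Tesla model
//               running the TCC driver model
//   preferred - index of the GPU to use when a single GPU is selected
// Assumption: indexes are zero-based and contiguous.
std::vector<int> ChooseGpuIndexes(int gpuCount, bool allTcc, int preferred = 0) {
    std::vector<int> indexes;
    if (gpuCount <= 0) {
        return indexes;  // no GPU available: empty list
    }
    if (allTcc) {
        // Multiple TCC GPUs speed up training and batch classification.
        for (int i = 0; i < gpuCount; ++i) {
            indexes.push_back(i);
        }
    } else {
        // Otherwise a single GPU is faster; fall back to index 0
        // when the preferred index is out of range.
        const bool valid = preferred >= 0 && preferred < gpuCount;
        indexes.push_back(valid ? preferred : 0);
    }
    return indexes;
}
```

The resulting list would then be passed to EDeepLearningTool::SetGPUIndexes. The exact argument type of that method is not stated here, so check the API reference before using a std::vector<int>.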

In Deep Learning Studio, to choose the processing devices, select an execution profile.

You can configure these execution profiles to match your needs.
GPU processing is not possible with 32-bit applications.

Image cache

The image cache is the part of the memory reserved for storing images during training.

The default size is 1 GB.
The training speed is higher when the image cache is big enough to hold all the images of your dataset.
If the dataset is too big to fit in the image cache, we recommend storing your images and project files on an SSD, which improves the training speed.

To specify the cache size in bytes:

- In the API, use the EDeepLearningTool::SetImageCacheSize method.
- In Deep Learning Studio, click the Configure button below the Execution profile control and select Image cache in the menu.

    

When there is enough memory to increase the image cache so that it can hold all the images in the dataset, Deep Learning Studio displays a recommendation next to the training button.
Click on the recommendation to change the image cache size and improve the training speed.
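
Because the cache size is given in bytes, a small calculation helps when setting it from the API. The sketch below estimates the size of a dataset and rounds it up to a whole number of gigabytes. All names are hypothetical, and two assumptions are made that this page does not confirm: 1 GB is counted as 1024^3 bytes, and cached images are stored uncompressed.

```cpp
#include <cstdint>

// Assumption: 1 GB is counted as 1024^3 bytes.
constexpr std::uint64_t kBytesPerGB = 1024ull * 1024ull * 1024ull;

// Hypothetical estimate of the memory needed to hold the whole dataset,
// assuming uncompressed images that all have the same size.
inline std::uint64_t EstimateDatasetBytes(std::uint64_t imageCount,
                                          std::uint64_t width,
                                          std::uint64_t height,
                                          std::uint64_t bytesPerPixel) {
    return imageCount * width * height * bytesPerPixel;
}

// Rounds the dataset size up to whole gigabytes and returns it in bytes.
// Design choice of this sketch: never go below the 1 GB default.
inline std::uint64_t SuggestCacheBytes(std::uint64_t datasetBytes) {
    const std::uint64_t gb = (datasetBytes + kBytesPerGB - 1) / kBytesPerGB;
    return (gb < 1 ? 1 : gb) * kBytesPerGB;
}
```

For example, 1000 RGB images of 1024 x 1024 pixels need about 2.93 GB under these assumptions, so the sketch suggests a 3 GB cache. The value would then be passed to EDeepLearningTool::SetImageCacheSize, provided the computer has enough free memory.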

Multicore processing

The deep learning tools support multicore processing (see Multicore Processing):

- In the API, use the multicore processing helper function from Open eVision (that is, Easy::SetMaxNumberOfProcessingThreads() with a value greater than 1).
- In Deep Learning Studio, click the Configure button below the Execution profile control and select CPU Settings in the menu.
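
One possible way to choose the thread count in the API is to start from the number of hardware threads reported by the C++ standard library. The sketch below is an illustration only: PickThreadCount and DetectThreadCount are hypothetical names, and the sketch assumes that Easy::SetMaxNumberOfProcessingThreads accepts an integer.

```cpp
#include <thread>

// Hypothetical helper: turns a detected hardware thread count into a
// usable value. std::thread::hardware_concurrency() may return 0 when
// the count cannot be determined, so fall back to a single thread.
inline int PickThreadCount(unsigned detected) {
    return detected == 0 ? 1 : static_cast<int>(detected);
}

// Hypothetical helper: thread count for the current machine (at least 1).
inline int DetectThreadCount() {
    return PickThreadCount(std::thread::hardware_concurrency());
}
```

The result of DetectThreadCount() would then be passed to Easy::SetMaxNumberOfProcessingThreads. A result of 1 leaves multicore processing off, since only a value greater than 1 enables it.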