Porting a seismic wave simulator to Graphics Processing Units: what for?
How can Graphics Processing Units (GPUs) help seismologists? Understanding seismic wave propagation is one of the central problems in seismology and has always been a numerical challenge. It also matters in industry, for example to mitigate the impact of earthquakes or during oil exploration operations. In this article, we show how GPUs offer great opportunities.
SeWaS (Seismic Wave Simulator) is a seismic wave propagation simulator developed by Aldwin by ANEO. It combines efficiency and accuracy by building on state-of-the-art CPU libraries. Because it relies on these HPC libraries, SeWaS can also serve as a benchmark tool. Given the long computation times required by typical use cases, it is worth exploring hardware accelerators such as Graphics Processing Units (GPUs) as a target computing platform for SeWaS.
First, we present SeWaS and explain why porting it to other hardware, such as the GPU, could be worthwhile. Then we discuss GPGPU. Finally, we look at how the GPU backend is implemented while keeping the application generic.
Presentation of SeWaS
SeWaS is a solver that can be used for reverse time migration.
The study of seismic wave propagation is based on linear elasticity, and more particularly on elastodynamics, the study of elastic waves. SeWaS therefore solves the elastodynamic equations to evaluate the components of the velocity and stress fields. This requires many vector operations: the discretized forms of the velocity and stress fields are computed by applying a 4th-order central finite difference operator. For SeWaS, we needed code that reads much like “natural” mathematical expressions while remaining efficient. The Expression Templates design pattern is a good way to combine abstraction and performance. Expression Templates are a C++ metaprogramming technique in which the structure of a computation is built at compile time, and only the computation itself is performed at runtime. SeWaS was therefore built using C++14 and Eigen.
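To make the Expression Templates idea more concrete, here is a minimal sketch in plain C++. It is not how Eigen or SeWaS implement it; the names `Expr`, `Sum`, `Scaled` and `Vec` are hypothetical and exist only for this illustration. The operators return lightweight objects describing the computation, and the arithmetic only happens inside the single loop of the assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base (hypothetical name): marks a type as a vector expression.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy element-wise sum: stores references to its operands and computes
// nothing until an element is requested. It must be consumed in the same
// statement that creates it, because it may refer to temporaries.
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Lazy product of a scalar and an expression.
template <typename E>
struct Scaled : Expr<Scaled<E>> {
    double factor;
    const E& arg;
    Scaled(double f, const E& a) : factor(f), arg(a) {}
    double operator[](std::size_t i) const { return factor * arg[i]; }
    std::size_t size() const { return arg.size(); }
};

// The operators only build the expression type; no arithmetic happens here.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

template <typename E>
Scaled<E> operator*(double f, const Expr<E>& e) {
    return Scaled<E>(f, e.self());
}

// Concrete vector that owns its storage.
struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment from any expression: one loop, no temporary vectors.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == data.size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr[i];
        return *this;
    }
};
```

With these definitions, a statement such as `r = a + 2.0 * b;` reads like the mathematical expression, yet it compiles down to one loop over the elements. The type `Sum<Vec, Scaled<Vec>>` encodes the structure of the computation at compile time.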
SeWaS has two levels of parallelism:
- The parallel computation of all velocity fields at a fixed time step: task parallelism
- The parallel computation of derivatives (application of the finite difference operator): SIMD parallelism
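As an illustration of the second level, the sketch below applies the textbook 4th-order central finite difference stencil to a 1D array. It is an assumption made for illustration: the function name `derivative4` is hypothetical, and the actual SeWaS operator (for instance on a staggered grid) may use different coefficients and a 3D layout. Each iteration is independent and reads contiguous memory, which is the property that makes such loops good candidates for SIMD execution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fourth-order central approximation of df/dx on a uniform grid of spacing h:
//   f'(x_i) ~ (f[i-2] - 8 f[i-1] + 8 f[i+1] - f[i+2]) / (12 h)
// Only interior points are computed; the two points at each end are left at 0.
std::vector<double> derivative4(const std::vector<double>& f, double h) {
    assert(f.size() >= 5 && h > 0.0);
    std::vector<double> d(f.size(), 0.0);
    const double c = 1.0 / (12.0 * h);
    // Independent iterations over contiguous data: vectorizable by the compiler.
    for (std::size_t i = 2; i + 2 < f.size(); ++i) {
        d[i] = c * (f[i - 2] - 8.0 * f[i - 1] + 8.0 * f[i + 1] - f[i + 2]);
    }
    return d;
}
```

A quick sanity check is to apply it to samples of a low-degree polynomial such as x^3, for which this stencil reproduces the exact derivative up to rounding error.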
It is therefore fair to ask whether using the CPU as the sole hardware is the most efficient approach.
Using other hardware?
CPUs are very good at executing sequential tasks quickly. GPUs, because of their different architecture, are not efficient for this type of task, but they are well suited to throughput computing. Until now SeWaS has been built around CPU parallelization; however, we can ask whether performance could be improved by using GPGPU (General-Purpose Computing on Graphics Processing Units). GPGPU performs general-purpose computations on a graphics processor, which would let us benefit from its parallel processing capacity. GPUs follow the SIMT (Single Instruction Multiple Thread) execution model, which combines SIMD (Single Instruction Multiple Data) with multithreading. This means that multiple independent threads can simultaneously execute the same instruction on different data.
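The SIMT model can be pictured with a small CPU-side emulation. This is only a sketch in plain C++, not GPU code: the names `axpy_kernel` and `launch` are hypothetical, and the loop in `launch` runs the "threads" one after another, whereas a GPU would execute them concurrently in groups. What it shows is the programming pattern: one function, written for a single thread, in which the thread index selects the data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The "kernel": what a single thread does. Every thread runs this same code;
// only the index tid, and therefore the data it touches, differs.
void axpy_kernel(std::size_t tid, double a, const std::vector<double>& x,
                 std::vector<double>& y) {
    if (tid < y.size()) {  // guard: more threads than elements may be launched
        y[tid] = a * x[tid] + y[tid];
    }
}

// CPU stand-in for a kernel launch. On a GPU these iterations would run
// concurrently; here they are executed sequentially.
void launch(std::size_t num_threads, double a, const std::vector<double>& x,
            std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t tid = 0; tid < num_threads; ++tid) {
        axpy_kernel(tid, a, x, y);
    }
}
```

Launching more threads than there are elements is harmless here thanks to the bounds check, which mirrors the guard commonly written at the top of GPU kernels.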
Furthermore, SeWaS is memory bound: the algorithm is limited by accesses to main memory (RAM). We try to take advantage of the cache, which is faster than RAM, but once the data no longer fits in it we are limited by memory bandwidth. GPUs offer higher memory bandwidth than CPUs, so they could also give us better performance.
Using different programming languages
Several programming languages can be used to program Graphics Processing Units. OpenCL and CUDA are the two main frameworks for GPU applications. CUDA is a parallel computing platform and programming model developed by Nvidia; it only works on Nvidia GPUs. OpenCL is an open standard that lets users program many GPUs (AMD, Nvidia, Intel…). CUDA is generally the most efficient option on Nvidia GPUs, but unlike CUDA, OpenCL also lets us target FPGAs. Currently, we are targeting Nvidia GPUs, but we also plan to leverage FPGAs, which can be programmed through OpenCL. Rather than writing two kernels (one in CUDA and one in OpenCL), we have chosen to use C++ generic programming to develop a multi-target backend for SeWaS. In particular, we will first explore the use of the VexCL library.
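The idea of writing a kernel once and selecting the target at compile time can be sketched as follows. This is a simplified illustration in standard C++, not SeWaS or VexCL code: `SerialBackend`, `BlockedBackend` and `scale_and_add` are hypothetical names, and both backends run on the CPU as stand-ins for real device backends.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backend 1: plain sequential loop.
struct SerialBackend {
    template <typename F>
    static void for_each_index(std::size_t n, F f) {
        for (std::size_t i = 0; i < n; ++i) f(i);
    }
};

// Hypothetical backend 2: traverses the range block by block, the way a
// device backend would split the work into groups of threads.
struct BlockedBackend {
    static constexpr std::size_t block = 4;
    template <typename F>
    static void for_each_index(std::size_t n, F f) {
        for (std::size_t begin = 0; begin < n; begin += block) {
            const std::size_t end = std::min(begin + block, n);
            for (std::size_t i = begin; i < end; ++i) f(i);
        }
    }
};

// The numerical kernel is written once, against the backend interface.
template <typename Backend>
void scale_and_add(double a, const std::vector<double>& x,
                   std::vector<double>& y) {
    assert(x.size() == y.size());
    Backend::for_each_index(y.size(),
                            [&](std::size_t i) { y[i] += a * x[i]; });
}
```

Calling `scale_and_add<SerialBackend>(...)` or `scale_and_add<BlockedBackend>(...)` runs the same numerical code through a different execution strategy. Because the choice is a template parameter, it is resolved at compile time and adds no runtime dispatch cost.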
VexCL is a vector expression template library for OpenCL/CUDA. It was developed to ease GPGPU development in C++, especially for applications featuring large vector operations. It provides a convenient and intuitive notation for vector arithmetic, and it supports multi-device and multi-platform computations. Using VexCL also lets us switch between CUDA and OpenCL very easily. We will first focus on the CUDA backend in order to program Nvidia GPUs. The CUDA backend is programmed using metaprogramming techniques such as expression templates, applied to Graphics Processing Units.
Future work
To sum up, we briefly presented SeWaS and explained why it would be worthwhile to target an architecture more focused on parallel computing, such as the GPU. We will soon cover porting SeWaS to GPUs, as well as Expression Templates, in more detail.
Aneo, “SeWaS,” [Online]. Available: https://github.com/aneoconsulting/SeWaS.
S. Moustafa, W. Kirschenmann, F. Dupros and H. Aochi, “Task-Based Programming on Emerging Parallel Architectures for Finite-Differences Seismic Numerical Kernel,” 2018.
T. Veldhuizen, “Expression Templates,” 1995.
W. Kirschenmann, “Vers des noyaux de calcul intensif pérennes,” 2012.
B. Jacob and G. Guennebaud, “Eigen,” [Online]. Available: https://eigen.tuxfamily.org/dox/.
Khronos Group, “OpenCL,” [Online]. Available: https://opencl.org/.
Nvidia, “CUDA,” [Online]. Available: https://developer.nvidia.com/cuda-toolkit.
D. Demidov, “VexCL,” [Online]. Available: https://vexcl.readthedocs.io/en/latest/index.html.