Porting a seismic wave simulator to the GPU? Yes, but why?
Understanding seismic wave propagation is one of the most important issues in seismology and has always been a numerical challenge. It matters greatly in industry, whether to mitigate the impact of earthquakes or during oil exploration operations. SeWaS (Seismic Wave Simulator) is a seismic wave propagation simulator developed by ANEO that combines efficiency and accuracy using state-of-the-art CPU libraries. Through the use of these HPC libraries, SeWaS also serves as a benchmarking tool. Given the large computation times required to solve typical use cases, it is worth exploring hardware accelerators such as Graphics Processing Units (GPUs) as a target computing platform for SeWaS.
First, let’s present SeWaS and why it could be interesting to port it to other hardware such as the GPU. Then we will discuss GPGPU. Finally, we will see how the GPU backend is implemented while keeping the genericity of the application.
Presentation of SeWaS
SeWaS is a solver that can be used for reverse time migration.
The study of seismic wave propagation is based on linear elasticity, and more particularly on elastodynamics, the study of elastic waves. SeWaS therefore relies on the elastodynamic equations to evaluate the components of the velocity and stress fields. This requires many vector operations: the discretized forms of the velocity and stress fields are computed with a 4th-order central finite difference operator. For SeWaS, we needed a method where the code closely resembles the “natural” mathematical expressions while remaining efficient. The Expression Templates design pattern is a good way to combine abstraction and performance. Expression Templates are a C++ meta-programming technique that builds the structure of a calculation at compile time; only the calculation itself is performed at runtime. SeWaS was therefore built using C++14 and Eigen.
SeWaS has two levels of parallelization:
- The parallel computation of all velocity fields at a fixed time step: task parallelism
- The parallel computation of derivatives (application of the finite difference operator): SIMD parallelism
It is therefore worth asking whether the CPU alone is the most efficient hardware for this workload.
Using other hardware?
CPUs are very efficient at quickly executing sequential tasks, unlike GPUs. Because of its different architecture, the GPU is not efficient for this type of task, but it is well suited to throughput computing. Until now, SeWaS has been built around CPU parallelization, but we may wonder whether performance could be enhanced by using GPGPU (General-Purpose computing on Graphics Processing Units), that is, performing generic calculations on a graphics processor to benefit from its parallel processing capacity. The GPU follows the SIMT (Single Instruction, Multiple Threads) execution model, which combines SIMD (Single Instruction, Multiple Data) with multithreading: multiple independent threads simultaneously execute the same instruction on different data. Furthermore, SeWaS is memory bound: the algorithm is limited by access to RAM. We try to take advantage of the cache (faster than RAM), but once it is full, we are limited by memory bandwidth. Since the GPU has higher memory bandwidth than the CPU, it could also allow us to achieve better performance.
Several frameworks exist for programming GPUs, the two main ones being OpenCL and CUDA. Currently we are targeting Nvidia GPUs, but we also look forward to leveraging FPGAs, which can be addressed through OpenCL. CUDA is a programming framework developed by Nvidia and only works on Nvidia GPUs. OpenCL is an open standard that, unlike CUDA, allows users to target many GPUs (AMD, Nvidia, Intel…). CUDA is the most efficient option on Nvidia GPUs, while OpenCL additionally allows us to program FPGAs. Rather than writing two kernels (one in CUDA and one in OpenCL), we have chosen to use C++ generic programming to develop the multi-target backend of SeWaS. In particular, we will first explore the VexCL library.
VexCL is a vector expression template library for OpenCL/CUDA. It was developed to ease GPGPU development with C++, especially for applications featuring large vector operations, and it provides a convenient and intuitive notation for vector arithmetic. Moreover, multi-device and multi-platform computations are supported. Using VexCL also allows us to switch between CUDA and OpenCL very easily. We will first focus on the CUDA backend, to target Nvidia GPUs. The CUDA backend itself is implemented using meta-programming techniques such as expression templates for GPUs.
A view on future work
In this article, we briefly presented SeWaS and why it would be interesting to use another architecture more focused on parallel computing, such as the GPU. We will soon discuss in more detail the porting of SeWaS to GPU, as well as Expression Templates.
ANEO, “SeWaS,” [Online]. Available: https://github.com/aneoconsulting/SeWaS.
S. Moustafa, W. Kirschenmann, F. Dupros and H. Aochi, “Task-Based Programming on Emerging Parallel Architectures for Finite-Differences Seismic Numerical Kernel,” 2018.
T. Veldhuizen, “Expression Templates,” 1995.
W. Kirschenmann, “Vers des noyaux de calcul intensif pérennes,” 2012.
B. Jacob and G. Guennebaud, “Eigen,” [Online]. Available: https://eigen.tuxfamily.org/dox/.
Khronos Group, “OpenCL,” [Online]. Available: https://opencl.org/.
Nvidia, “CUDA,” [Online]. Available: https://developer.nvidia.com/cuda-toolkit.
D. Demidov, “VexCL,” [Online]. Available: https://vexcl.readthedocs.io/en/latest/index.html.