Dear Sophie,
Thanks for your reply. Below are some questions I had after reading and checking the materials you sent me. I also sent these questions to Mr. Scot Schultz, the director of HPC at Mellanox.
We want to build a simple template for accessing data on the GPUs of one node from the GPUs of another node. To do that, we first want to run some benchmark or test code to confirm that GPUDirect RDMA works on our system, and then create the template code for future use.
Benchmark:
1. MPI level
A. Open MPI
Open MPI has supported GPUDirect RDMA since v1.7. I downloaded openmpi-v1.10; however, there is no GPUDirect RDMA sample code in the examples folder. Do you know of any place where we can find more sample code using GPUDirect RDMA + Open MPI? I only found one sample at https://github.com/parallel-forall/code-samples/tree/master/posts/cuda-aware-mpi-example/src
Also, do you know how we can verify that no CPU memory is involved in the transfer? The MPI_Send() and MPI_Recv() calls are essentially a black box that encapsulates GPUDirect RDMA.
B. MVAPICH2-GDR
MVAPICH2 uses the gdrcopy library you pointed to in your reply (https://github.com/NVIDIA/gdrcopy), which relies on APIs such as cuPointerSetAttribute(). However, the NVIDIA documentation has little detail or sample code demonstrating them. Do you know of any CUDA sample code that uses these APIs?
2. CUDA+IB verbs level
I am not able to open the link you gave me (git://git.openfabrics.org/~grockah/perftest.git). Could you please point me to another link, if possible?
3. IB tools
There is an article on benchmarking GPUDirect RDMA using ibv_ud_pingpong and ib_rdma_bw from libibverbs-1.1 and perftest-1.3 (https://devblogs.nvidia.com/parallelforall/benchmarking-gpudirect-rdma-on-modern-server-platforms/#platforms). The author, Davide Rossetti, is an NVIDIA developer. However, he did not describe how he ran the tests. As I understand it, ibv_ud_pingpong needs a server IP, so I am not sure how to use it to test the RDMA connection between GPUs or between host and GPU. Also, the newest perftest no longer provides ib_rdma_bw. I did find the author’s e-mail. Do you have any experience benchmarking GPUDirect RDMA with these IB tools? If so, could you please show us how to set it up?
I appreciate your help.