I figured out how to use the API. With the new OFED code from Mellanox, you only have to pass the address returned by cudaMalloc to the InfiniBand memory registration call (ibv_reg_mr in the verbs API). If the nvidia_peer_memory driver from Mellanox is installed, that driver is called to validate the address.
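For anyone landing here later, this is a minimal sketch of what that looks like in host code. It assumes the first InfiniBand device in the list is the one you want, uses an arbitrary 1 MiB buffer, and picks a set of access flags purely for illustration; none of those choices come from the original setup. It also assumes the peer memory driver is loaded, since without it the registration of a GPU address is expected to fail.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

int main(void)
{
    const size_t len = 1 << 20;   /* 1 MiB, arbitrary size for illustration */
    int rc = EXIT_FAILURE;
    int num_devices = 0;
    void *gpu_buf = NULL;
    struct ibv_device **dev_list = NULL;
    struct ibv_context *ctx = NULL;
    struct ibv_pd *pd = NULL;
    struct ibv_mr *mr = NULL;

    /* Open the first InfiniBand device (assumption: it is the right one). */
    dev_list = ibv_get_device_list(&num_devices);
    if (!dev_list || num_devices == 0) {
        fprintf(stderr, "no InfiniBand devices found\n");
        goto out;
    }
    ctx = ibv_open_device(dev_list[0]);
    if (!ctx) {
        fprintf(stderr, "ibv_open_device failed\n");
        goto out;
    }

    /* A protection domain is required before registering any memory. */
    pd = ibv_alloc_pd(ctx);
    if (!pd) {
        fprintf(stderr, "ibv_alloc_pd failed\n");
        goto out;
    }

    /* Allocate device memory with the ordinary CUDA runtime call. */
    if (cudaMalloc(&gpu_buf, len) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        goto out;
    }

    /* Pass the device pointer straight to the registration call,
     * exactly as you would a host pointer. */
    mr = ibv_reg_mr(pd, gpu_buf, len,
                    IBV_ACCESS_LOCAL_WRITE |
                    IBV_ACCESS_REMOTE_READ |
                    IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        perror("ibv_reg_mr on GPU memory");
        goto out;
    }

    printf("registered GPU buffer: lkey=0x%x rkey=0x%x\n", mr->lkey, mr->rkey);
    rc = EXIT_SUCCESS;

out:
    /* Release resources in reverse order of creation. */
    if (mr) ibv_dereg_mr(mr);
    if (gpu_buf) cudaFree(gpu_buf);
    if (pd) ibv_dealloc_pd(pd);
    if (ctx) ibv_close_device(ctx);
    if (dev_list) ibv_free_device_list(dev_list);
    return rc;
}
```

The resulting lkey and rkey can then be used in work requests like those of any other memory region. Linking against the CUDA runtime and libibverbs (for example with -lcudart -libverbs) should be enough to build it, though include and library paths will depend on your installation.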