I am new to InfiniBand. After installing Mellanox OFED with the default settings, I ran hca_self_test.ofed and got the following output:
root@gpu-cluster-4:/usr/bin# hca_self_test.ofed
---- Performing Adapter Device Self Test ----
Number of CAs Detected ................. 1
PCI Device Check ....................... PASS
Kernel Arch ............................ x86_64
Host Driver Version .................... MLNX_OFED_LINUX-2.3-2.0.0 (OFED-2.3-2.0.0): 3.13.0-32-generic
Host Driver RPM Check .................. PASS
Firmware on CA #0 VPI .................. v2.11.500
Firmware Check on CA #0 (VPI) .......... FAIL
REASON: mismatch CA #0 firmware detected (found v2.11.500, required v2.32.5100)
Host Driver Initialization ............. PASS
Number of CA Ports Active .............. 2
Port State of Port #1 on CA #0 (VPI)..... UP 4X QDR (InfiniBand)
Port State of Port #2 on CA #0 (VPI)..... UP 4X QDR (InfiniBand)
Error Counter Check on CA #0 (VPI)...... PASS
Kernel Syslog Check .................... PASS
Node GUID on CA #0 (VPI) ............... 00:e0:81:00:00:2a:e9:5b
------------------ DONE ---------------------
But when I check the IB interfaces with ibstat, everything looks fine:
root@gpu-cluster-1:/usr/sbin# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.11.500
    Hardware version: 0
    Node GUID: 0x00e08100002ae8a7
    System image GUID: 0x00e08100002ae8aa
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 1
        LMC: 0
        SM lid: 1
        Capability mask: 0x0251486a
        Port GUID: 0x00e08100002ae8a8
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 9
        LMC: 0
        SM lid: 1
        Capability mask: 0x02514868
        Port GUID: 0x00e08100002ae8a9
        Link layer: InfiniBand
Do I need to do anything to fix the FAIL result? What impact does the firmware mismatch have?
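
To make clear how I read the FAIL line: I assume the check is a plain comparison between the firmware version found on the card and the version the driver package expects. The Python sketch below is only my own illustration of that comparison, using the two version strings from the REASON line. The function names are made up, and I do not know how hca_self_test.ofed actually performs the check.

```python
def parse_fw_version(text):
    """Turn a string such as 'v2.11.500' or '2.11.500' into (2, 11, 500)."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def firmware_is_current(found, required):
    """Return True if the found version is at least the required version.

    Assumption: versions are compared field by field as integers,
    so 2.11.500 is older than 2.32.5100.
    """
    return parse_fw_version(found) >= parse_fw_version(required)


# Values copied from the REASON line of the self-test output.
found = "v2.11.500"
required = "v2.32.5100"

if firmware_is_current(found, required):
    print("Firmware check: PASS")
else:
    print(f"Firmware check: FAIL (found {found}, required {required})")
```

If this reading is right, the FAIL only says that the firmware is older than what the driver expects. It says nothing about whether the links are up, which would explain why ibstat still shows both ports as Active.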