Channel: Mellanox Interconnect Community: Message List

“Invalid module format” error while loading nv_peer_mem in CentOS 6.6


Hello everybody,

I have two twin servers with the same hardware (InfiniBand and NVIDIA Tesla) and the same OS (CentOS 6.6, kernel 2.6.32-504.el6.x86_64, and the same drivers).

On host1 everything works fine as usual, while on host2 I can no longer start this service, because I get this error:

[root@vega2 nvidia_peer_memory-1.0-0]# service nv_peer_mem start
starting... FATAL: Error inserting nv_peer_mem (/lib/modules/2.6.32-504.el6.x86_64/extra/nv_peer_mem.ko): Invalid module format
Failed to load nv_peer_mem

and dmesg says:

nv_p2p_dummy: exports duplicate symbol nvidia_p2p_free_page_table (owned by nvidia)

Note that host2 had been working fine for 2 months, until I rebooted it after the summer holidays. What could be the cause of this error? The main software components (kernel, NVIDIA drivers, Mellanox drivers) did not change, and the hardware is OK. I also tried repeating the installation procedure, but I get stuck at the module loading step:

[root@vega2 nvidia_peer_memory-1.0-0]# rpm -ivh /root/rpmbuild/RPMS/x86_64/nvidia_peer_memory-1.0-0.x86_64.rpm
Preparing... ########################################### [100%]
1:nvidia_peer_memory ########################################### [100%]
FATAL: Error inserting nv_peer_mem (/lib/modules/2.6.32-504.el6.x86_64/extra/nv_peer_mem.ko): Invalid module format

I found this post (http://stackoverflow.com/questions/3454740/what-will-happen-if-two-kernel-module-export-same-symbol) about two kernel modules exporting the same symbol, but why does this second module interfere with nv_peer_mem on host2 and not on host1? Here is the output of the nm commands, which is exactly the same on both hosts.

[root@vega2 nvidia_peer_memory-1.0-0]# nm /lib/modules/2.6.32-504.el6.x86_64/kernel/drivers/video/nvidia.ko | grep nvidia_p2p_free_page_table
0000000088765bb5 A __crc_nvidia_p2p_free_page_table
0000000000000028 r __kcrctab_nvidia_p2p_free_page_table
000000000000007e r __kstrtab_nvidia_p2p_free_page_table
0000000000000050 r __ksymtab_nvidia_p2p_free_page_table
00000000004bcb10 T nvidia_p2p_free_page_table

[root@vega2 nvidia_peer_memory-1.0-0]# nm /lib/modules/2.6.32-504.el6.x86_64/extra/nv_peer_mem.ko |grep nvidia_p2p_free_page_table 
  U nvidia_p2p_free_page_table
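
Since dmesg names a module called nv_p2p_dummy rather than nv_peer_mem, it may help to check whether any other .ko file under the kernel's module tree exports the same symbol. The following is only a minimal sketch, not a verified diagnosis: the function names `symbol_kind` and `find_exporters` are hypothetical helpers made up for this example, and the check relies on the `__ksymtab_<symbol>` entry that appears in the nm output for nvidia.ko above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any NVIDIA or Mellanox package.

# symbol_kind SYMBOL
# Reads `nm` output on stdin and prints one word:
#   exported   the module has a __ksymtab_SYMBOL entry (it exports SYMBOL)
#   undefined  the module only references SYMBOL (type U)
#   absent     SYMBOL does not appear at all
symbol_kind() {
    awk -v sym="$1" '
        $NF == ("__ksymtab_" sym)                   { kind = "exported" }
        $NF == sym && $(NF-1) == "U" && kind == ""  { kind = "undefined" }
        END { print (kind == "" ? "absent" : kind) }
    '
}

# find_exporters SYMBOL [DIR]
# Prints every .ko file under DIR (default: the running kernel's module
# tree) that exports SYMBOL according to symbol_kind.
find_exporters() {
    sym=$1
    dir=${2:-/lib/modules/$(uname -r)}
    find "$dir" -name '*.ko' | while read -r ko; do
        if [ "$(nm "$ko" 2>/dev/null | symbol_kind "$sym")" = "exported" ]; then
            echo "$ko"
        fi
    done
}

# Example (run as root on the affected host):
#   find_exporters nvidia_p2p_free_page_table
```

If this lists more than one file on host2 but only nvidia.ko on host1, the extra file would be a candidate for the module that conflicts with the nvidia driver.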

To conclude: the NVIDIA drivers are version 7.5, and the nvidia_peer_memory 1.0.0 package was downloaded from the Mellanox site (http://www.mellanox.com/page/products_dyn?product_family=116). I also tried the newer 1.0.1 version, with the same result. Moreover, the only packages present on host2 and NOT on host1 are:

lapack-3.2.1-4.el6.x86_64
lapack-devel-3.2.1-4.el6.x86_64
libX11-1.6.0-6.el6.x86_64
libX11-common-1.6.0-6.el6.noarch
libpng-1.2.49-2.el6_7.x86_64
libxcb-1.9.1-3.el6.x86_64
libxml2-2.7.6-21.el6_8.1.x86_64
libxml2-python-2.7.6-21.el6_8.1.x86_64

Can they interfere with the nv_peer_mem service?
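
For anyone who wants to reproduce the package comparison, here is a minimal sketch of one way to do it. It assumes the output of `rpm -qa` from each host has been saved to a file; the function name `only_in_second` and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# only_in_second FILE1 FILE2
# Prints the lines that occur in FILE2 but not in FILE1.
# comm needs sorted input, so both lists are sorted into temp files first.
only_in_second() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    # -1 hides lines unique to the first file, -3 hides lines common to both
    LC_ALL=C comm -13 "$a" "$b"
    rm -f "$a" "$b"
}

# Example (file names are placeholders):
#   on host1:  rpm -qa > host1.pkgs
#   on host2:  rpm -qa > host2.pkgs
#   then:      only_in_second host1.pkgs host2.pkgs
```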

 

Thanks in advance for any help.

  Stefano

