Channel: Mellanox Interconnect Community: Message List

mlx5_core - device's health compromised


Dear all,

 

I have a Connect-IB adapter (firmware version 10.10.5020) connected via a PCIe switch to a host running CentOS 7 with the Mellanox driver package MLNX_OFED_LINUX-2.4-1.0.4-rhel7.0-x86_64.

Just after boot, I see the following kernel messages:

 

localhost kernel: mlx5_core 0000:03:00.0: device's health compromised

localhost kernel: mlx5_core 0000:03:00.0: assert_var[0] 0x0000007a

localhost kernel: mlx5_core 0000:03:00.0: assert_var[1] 0x0000006e

localhost kernel: mlx5_core 0000:03:00.0: assert_var[2] 0x00000000

localhost kernel: mlx5_core 0000:03:00.0: assert_var[3] 0x00000000

localhost kernel: mlx5_core 0000:03:00.0: assert_var[4] 0x00000000

localhost kernel: mlx5_core 0000:03:00.0: assert_exit_ptr 0x006a013c

localhost kernel: mlx5_core 0000:03:00.0: assert_callra 0x006a0c9c

localhost kernel: mlx5_core 0000:03:00.0: fw_ver 0xa00a139c

localhost kernel: mlx5_core 0000:03:00.0: hw_id 0x000001ff

localhost kernel: mlx5_core 0000:03:00.0: irisc_index 0

localhost kernel: mlx5_core 0000:03:00.0: synd 0x10: High temprature

localhost kernel: mlx5_core 0000:03:00.0: ext_synd 0x0000

localhost kernel: mlx5_core 0000:03:00.0: handling bad device here
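For anyone comparing dumps like this across machines, here is a minimal sketch that collects the `key value` lines of the health dump per PCI address. The function names are hypothetical, not part of any Mellanox tool. The `fw_ver` decoding assumes a 4/12/16-bit major/minor/subminor split; that layout is only inferred from the fact that it reproduces the 10.10.5020 firmware version mentioned above, and is not taken from driver documentation.

```python
import re

# Matches lines such as:
#   localhost kernel: mlx5_core 0000:03:00.0: synd 0x10: High temprature
# Only lines whose value starts with a hex or decimal number are kept.
LINE_RE = re.compile(
    r"mlx5_core (?P<pci>[0-9a-f:.]+): "
    r"(?P<key>[\w\[\]]+) (?P<value>(?:0x[0-9a-fA-F]+|\d+).*)$"
)


def parse_health_dump(lines):
    """Collect 'key value' pairs per PCI address (hypothetical helper)."""
    devices = {}
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match:
            fields = devices.setdefault(match.group("pci"), {})
            fields[match.group("key")] = match.group("value")
    return devices


def decode_fw_ver(raw):
    """Split a 32-bit fw_ver into (major, minor, subminor).

    ASSUMPTION: 4-bit major, 12-bit minor, 16-bit subminor. This is
    inferred from this one dump, not from driver documentation.
    """
    value = int(raw, 16)
    return (value >> 28) & 0xF, (value >> 16) & 0xFFF, value & 0xFFFF


sample = [
    "localhost kernel: mlx5_core 0000:03:00.0: device's health compromised",
    "localhost kernel: mlx5_core 0000:03:00.0: fw_ver 0xa00a139c",
    "localhost kernel: mlx5_core 0000:03:00.0: irisc_index 0",
    "localhost kernel: mlx5_core 0000:03:00.0: synd 0x10: High temprature",
    "localhost kernel: mlx5_core 0000:03:00.0: handling bad device here",
]

if __name__ == "__main__":
    for pci, fields in parse_health_dump(sample).items():
        print(pci, fields.get("synd"), decode_fw_ver(fields["fw_ver"]))
```

The free-text lines ("device's health compromised", "handling bad device here") are skipped on purpose, so only the numeric fields end up in the result.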

 

PCIe device 03:00.0 is the Connect-IB card.

The system has run without problems so far, and the ports link up with both QDR and FDR cables.

However, I am worried about the health of the system.

Does anybody know specifically what the messages reported above mean?

 

Many thanks.

