Our SAN connections went down this morning, apparently because of a bad Mellanox driver. Our kern.log has thousands of lines like the ones below:
...
Aug 5 09:04:55 khazix kernel: [169598.036781] mlx4_en: eth4: CQE error - vendor syndrome: 0xf9 syndrome: 0x5
Aug 5 09:04:55 khazix kernel: [169598.037282] mlx4_en: eth4: CQE error - vendor syndrome: 0xf9 syndrome: 0x5
Aug 5 09:04:55 khazix kernel: [169598.277464] mlx4_en: eth4: CQE error - vendor syndrome: 0xf9 syndrome: 0x5
Aug 5 09:04:55 khazix kernel: [169598.278037] mlx4_en: eth4: CQE error - vendor syndrome: 0xf9 syndrome: 0x5
Aug 5 09:04:56 khazix kernel: [169598.685302] mlx4_en: eth4: CQE error - vendor syndrome: 0xf9 syndrome: 0x5
Aug 5 09:04:56 khazix kernel: [169599.501394] mlx4_en: eth4: CQE error - vendor syndrome: 0xf9 syndrome: 0x5
Aug 5 09:04:57 khazix kernel: [169600.031606] mlx4_en: eth4: CQE error - vendor syndrome: 0xf9 syndrome: 0x5
Aug 5 09:04:57 khazix kernel: [169600.302486] mlx4_en: eth4: CQE error - vendor syndrome: 0xf9 syndrome: 0x5
Aug 5 09:04:57 khazix kernel: [169600.302965] mlx4_en: eth4: CQE error - vendor syndrome: 0xf9 syndrome: 0x5
Aug 5 09:04:57 khazix kernel: [169600.303317] mlx4_en: eth4: CQE error - vendor syndrome: 0xf9 syndrome: 0x5
...
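In case it helps anyone triage a similar flood, here is a minimal Python sketch that counts these CQE error lines per interface and syndrome pair and reports the kernel-uptime range they cover. The regex is based only on the line format in the excerpt above. The function names and the default Debian log path /var/log/kern.log are my assumptions, not anything from Mellanox.

```python
import re
from collections import Counter

# Pattern derived from the excerpt above; other mlx4_en messages are ignored.
CQE_RE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] mlx4_en: (?P<iface>\S+): "
    r"CQE error - vendor syndrome: (?P<vendor>0x[0-9a-fA-F]+) "
    r"syndrome: (?P<syndrome>0x[0-9a-fA-F]+)"
)


def summarize_cqe_errors(lines):
    """Return (counts, first_uptime, last_uptime) for matching log lines.

    counts maps (interface, vendor syndrome, syndrome) to the number of hits.
    The uptimes are the bracketed kernel timestamps, in seconds.
    """
    counts = Counter()
    first = last = None
    for line in lines:
        m = CQE_RE.search(line)
        if m is None:
            continue
        counts[(m["iface"], m["vendor"], m["syndrome"])] += 1
        t = float(m["uptime"])
        first = t if first is None else min(first, t)
        last = t if last is None else max(last, t)
    return counts, first, last


def print_summary(path="/var/log/kern.log"):
    """Hypothetical helper: print a summary for one log file (path is assumed)."""
    with open(path, errors="replace") as fh:
        counts, first, last = summarize_cqe_errors(fh)
    for (iface, vendor, syndrome), n in counts.most_common():
        print(f"{iface} vendor={vendor} syndrome={syndrome}: {n}")
    if first is not None:
        print(f"kernel uptime range: {first:.6f} .. {last:.6f} s")
```

Calling print_summary() on the affected host should give a compact summary that is easier to attach to a support request than the raw log.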
It looks like driver version 2.2 is bad. I'm trying 2.4 now, but I can't build the 3.0 version because its install.sh wrongly believes that my compiler cannot build executables. I'm running Debian 8.0 (Jessie) on amd64.
I can provide all kinds of info, but don't know where to turn.
Mellanox doesn't seem to have a good support system. I tried contacting support, but they require a verification step, and that page doesn't work.
Do people actually use these cards for serious purposes?
Dave