
Re: Dell M1000e blade server, InfiniBand QDR subnet issue, OFED 4.4, opensm initialization error!


Well, it was one busy weekend of troubleshooting and a lot of work. I may have solved a few issues, but it is not perfect yet!

 

The OEM updates (I tried a few) would not work because of a PSID mismatch; if there is a workaround, please let me know. I'm not able to find any firmware online matching the PSID of the M3601Q switch.

[root@headnode Infini Switch firmware]# ls

fw-sx-9_2_8000-0269NG_B1.bin

[root@headnode Infini Switch firmware]# lspci | grep Mellanox

07:00.0 Infiniband controller: Mellanox Technologies MT27500 Family [ConnectX-3]

[root@headnode Infini Switch firmware]# mstflint -d 07:00.0 -i fw-sx-9_2_8000-0269NG_B1.bin b

    Current FW version on flash:  2.10.2132

    New FW version:              9.2.8000

-E- PSID mismatch. The PSID on flash (DEL0A10210018) differs from the PSID in the given image (DEL09E0210003).

[root@headnode Infini Switch firmware]#

 

I tried forcing the GUID through the command line as suggested, as I don't have an opensm.conf file anywhere.
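For reference, the SM can be bound to a specific port GUID directly on the opensm command line with -g/--guid; a minimal sketch, where the GUID shown is illustrative (it should be the port 1 GUID reported by ibstat/ibv_devinfo, e.g. the fe80::202:c903:f9:32f1 link-local address seen later would correspond to 0x0002c90300f932f1):

# opensm -B -g 0x0002c90300f932f1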

 

Then I went ahead and uninstalled Mellanox OFED and started with the OpenFabrics OFED. There were a few missing build dependencies (cmake, libnl3-devel, numactl-devel, valgrind-devel); after getting those RPMs and their dependencies sorted, it did install. The port GUID is recognized and InfiniBand is active. DHCP didn't do it, so I set the interface up manually; it may not be perfect yet. The issues lingering now are OFED related: I can't seem to get opensm to run automatically, it has to be started with # /etc/init.d/opensmd start. After starting it, ibv_devinfo and nmcli connection show give:
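A small aside on the auto-start part: a minimal sketch of enabling opensmd at boot, assuming a SysV-init CentOS 6 style install (the systemd line applies to CentOS 7 / Rocks 7; the service may be named opensm rather than opensmd depending on the package):

# chkconfig opensmd on
# systemctl enable --now opensmd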

[root@headnode ~]# ibv_devinfo

hca_id:    mlx4_0

    transport:            InfiniBand (0)

    fw_ver:                2.10.2132

    node_guid:            0002:c903:00f9:32f0

    sys_image_guid:            0002:c903:00f9:32f3

    vendor_id:            0x02c9

    vendor_part_id:            4099

    hw_ver:                0x0

    board_id:            DEL0A10210018

    phys_port_cnt:            2

        port:    1

            state:            PORT_ACTIVE (4)

            max_mtu:        4096 (5)

            active_mtu:        4096 (5)

            sm_lid:            1

            port_lid:        1

            port_lmc:        0x00

            link_layer:        InfiniBand

 

        port:    2

            state:            PORT_DOWN (1)

            max_mtu:        4096 (5)

            active_mtu:        4096 (5)

            sm_lid:            0

            port_lid:        0

            port_lmc:        0x00

            link_layer:        InfiniBand

 

[root@headnode ~]# nmcli connection show

NAME                UUID                                  TYPE            DEVICE

Wired connection 2  a40b3b41-66e7-3d87-a77c-e79ccd002698  802-3-ethernet  em1   

Wired connection 3  7b5a96ce-3df4-3534-8a35-b430f3f1e3e5  802-3-ethernet  em2   

ib0                 b4fdfa83-45ba-4904-a8ec-377234b898ee  infiniband      ib0   

virbr0              d36acaba-3663-4199-ae03-0b2a39aa75df  bridge          virbr0

Bridge em1          1dad842d-1912-ef5a-a43a-bc238fb267e7  bridge          --    

Bridge em2          0578038a-64e9-a2fd-0a28-e4cd0b553930  bridge          --    

System ib0          2ab4abde-b8a5-6cbc-19b1-2bfb193e4e89  infiniband      --    

System pem1         c19149d5-4e53-4636-b52a-81d213a8a3cb  802-3-ethernet  --    

System pem2         7379072d-ea75-335e-2486-0afa3cd10c77  802-3-ethernet  --    

Wired connection 1  d4070b38-e850-4a48-83a7-223ecca993f7  802-3-ethernet  --    

ib0                 4e22b1f1-3e0c-4b84-b0d9-85b0755728ac  infiniband      --    

ib0                 152321c5-8ba1-4865-9eca-5a18a889ffb7  infiniband      --    

ib1                 9fd439a6-da5e-4928-9265-47a636b3aaea  infiniband      --  
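As a side note, the stale duplicate ib0/System ib0 profiles above can be deleted and the static address set with nmcli; a rough sketch (the UUID and address are illustrative):

# nmcli connection delete 152321c5-8ba1-4865-9eca-5a18a889ffb7
# nmcli connection modify ib0 ipv4.method manual ipv4.addresses 10.1.27.7/8
# nmcli connection up ib0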

 

#ifconfig -a ib0

ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520

        inet 10.1.27.7  netmask 255.0.0.0  broadcast 10.1.77.77

        inet6 fe80::202:c903:f9:32f1  prefixlen 64  scopeid 0x20<link>

Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).

        infiniband 80:00:02:08:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)

        RX packets 0  bytes 0 (0.0 B)

        RX errors 0  dropped 0  overruns 0  frame 0

        TX packets 289  bytes 19652 (19.1 KiB)

        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
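About the "Infiniband hardware address can be incorrect!" line: ifconfig retrieves the link-layer address through an ioctl that cannot return the full 20-byte IPoIB address, so what it prints is truncated rather than a real fault. The full address can be checked with iproute2 instead:

# ip -d link show ib0
# ip addr show ib0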

 

Next resolution: I'm waiting on two Dell SD flash cards for the CMC, so I can get all drivers updated on the chassis and nodes. It is a lot slower through UEFI, and some drivers are too big anyway. Hopefully the I/O update will help! Next, I may do a fresh install of Rocks Cluster 7 (Manzanita) and try a prior version of Mellanox OFED such as 4.1 or 3.x. I can come back to OFED as well.

 

Issues persisting: The OFED commands ibstat, ibhosts, etc. do not work, perhaps a failure on the OFED side. ib0 still shows the hardware-address warning, perhaps a firmware issue. The HCA test commands do not work, but things look good since the port is active. I have a separate issue with the Rocks Clusters command "insert-ethers" not responding when connecting the switch and compute nodes, hence the reinstall.

 

Sorry, this seems like a mess; thank you for your time! I know I'll get around it one way or another, I may even have to buy a newer M4001 switch that has current drivers. I wonder if Mellanox will share archived M3601Q firmware?


Re: Remote VTEP mac learning is not working


Hi Maheedhara,

Can you please provide us with the running-config from both of the leaf switches?

 

Thanks,
Pratik Pande

Re: Tagged Ethernet interface (VLAN PFC) fails to activate on Ubuntu 16.04


Hi Michael,

Thank you for posting your question on the Mellanox Community.

Based on the information provided, please follow Mellanox Community Document -> https://community.mellanox.com/docs/DOC-2474

If, after applying the Community document, the issue is not resolved, please open a Mellanox Support case by sending an email to support@mellanox.com

Thanks and regards,
~Mellanox Technical Support

Re: Tagged Ethernet interface (VLAN PFC) fails to activate on Ubuntu 16.04


Hi,

 

I'm not sure how the "HowTo Configure PFC on ConnectX-4" article is helpful. It doesn't relate to NetworkManager at all.

 

To whoever is reading this thread - this question is NOT answered as of this moment.

Re: rx-out-of-buffer


Hi Tom,

Thank you for posting your question on the Mellanox Community.

Based on the information provided, the following Mellanox Community document explains the 'rx_out_of_buffer' ethtool/xstat statistic.

You can improve the rx_out_of_buffer behavior by tuning the node and also by modifying the RX ring size on the adapter (ethtool -g <interface> to show, ethtool -G <interface> to set).
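A minimal sketch of the ring-size part (the interface name is illustrative):

# ethtool -g enp3s0f0            (shows current and maximum RX/TX ring sizes)
# ethtool -G enp3s0f0 rx 8192    (raises the RX ring toward the reported maximum)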

Also make sure you follow the DPDK performance tuning recommendations from the following link -> https://doc.dpdk.org/guides/nics/mlx5.html#performance-tuning

If you still experience performance issues after these recommendations, please do not hesitate to open a Mellanox Support case by emailing support@mellanox.com

Thanks and regards,
~Mellanox Technical Support

Re: DPDK-mlx5 set_mac question on Mellanox NIC passthru (or SR-IOV) at VMWare Hypervisor.


Hi Edward,

Thank you for posting your question on the Mellanox Community.

Based on the information provided, this issue needs more engineering effort.

Can you please open a Mellanox Support case by emailing support@mellanox.com

Thanks and regards,
~Mellanox Technical Support

Re: Dell M1000e blade server, InfiniBand QDR subnet issue, OFED 4.4, opensm initialization error!


Thank you! I totally missed checking the adapters! I did find the correct file needed for the switch PSID. Also, I appreciate the explanation of the opensm config. I believe my flash cards for the CMC will be delivered tomorrow; I will start with the updates (BIOS, I/O, etc.) first, install the Rocks frontend, and then tackle the MOFED install (3.4 works). I think this will get things moving...!

How to Configure Docker in SR-IOV or Passthrough Mode with Mellanox Infiniband Adapters ?


How to Configure Docker in SR-IOV or Passthrough Mode with Mellanox Infiniband Adapters ?

 

I have the following InfiniBand HCAs:

 

ConnectX-3

Connect-IB

ConnectX-4

 

1. Are all of the above HCAs SR-IOV configurable for a Docker environment?
2. What documentation should be followed?

3. Can they establish RDMA communication while being used inside Docker?
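For context, a rough host-side sketch of what running RDMA inside a container can look like, assuming the RDMA stack is installed on the host; the image name and device paths are illustrative, and with SR-IOV each VF exposes its own uverbs device:

# docker run -it --net=host --cap-add=IPC_LOCK \
      --device=/dev/infiniband/uverbs0 \
      --device=/dev/infiniband/rdma_cm \
      centos:7 bash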


How to create a topology file with the MSB7890 and 2 HCA cards


Hello, Mellanox Academy support team,

I am studying the InfiniBand fabric and I have a problem creating the topology file for an MSB7890 with 2 HCA cards connected to it. I don't know how to create the topology file here; my questions are:

1: The course material on the topology file is out of date and not clear to me; it focuses on the SX6036/SX6025.

2: There is no ibnl file related to the MSB7890 switch.

3: No SM is defined in the topology file.

 

Here is what I have in my topology file:

 

[root@node01 ~]# ibdmchk -t a.topo

-------------------------------------------------

IBDMCHK Cluster Design Mode:

Topology File .. a.topo

SM Node ........

SM Port ........ 4294967295

LMC ............ 0

-I- Parsing topology definition:a.topo

-W- Ignoring 'p13 -> HCA-1 node01 p1' (line:2)

-W- Ignoring 'p17 -> HCA-1 node04 p1' (line:3)

-I- Defined 1/2 systems/nodes

-E- Fail to find SM node:

[root@node01 ~]# cat a.topo

MSB7700 MSB7700

p13 -> HCA-1 node01 p1

p17 -> HCA-1 node04 p1

 

 

Here is the link info for my IB network:

 

[root@node01 ~]# iblinkinfo

CA: node04 HCA-1:

      0x98039b0300078390      4    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3   17[  ] "SwitchIB Mellanox Technologies" ( )

Switch: 0x248a070300f82490 SwitchIB Mellanox Technologies:

           3    1[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3    2[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3    3[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3    4[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3    5[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3    6[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3    7[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3    8[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3    9[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   10[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   11[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   12[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   13[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       1    1[  ] "node01 HCA-1" ( )

           3   14[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   15[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   16[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   17[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       4    1[  ] "node04 HCA-1" ( )

           3   18[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   19[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   20[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   21[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   22[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   23[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   24[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   25[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   26[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   27[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   28[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   29[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   30[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   31[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   32[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   33[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   34[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   35[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   36[  ] ==(                Down/ Polling)==>             [  ] "" ( )

           3   37[  ] ==(                Down/ Polling)==>             [  ] "" ( )

CA: node01 HCA-1:

      0x98039b0300078348      1    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3   13[  ] "SwitchIB Mellanox Technologies" ( )
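As a cross-check (not a fix for the ibnl/SM questions above), the fabric as the tools actually see it can be dumped in topology form and compared against the hand-written a.topo:

# ibnetdiscover > discovered.topo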

Re: Small redundant MLAG setup


Hi, the best practice in this case is to connect the switches to each other and configure MPOs (MLAG port-channels) between them; this way all of the links will be active.

Add all of the ports between the switches into the same mlag-port-channel on each of the MLAG pairs.
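A rough MLNX-OS sketch of that step, run on each member of the pair (channel and port numbers are illustrative):

switch (config) # interface mlag-port-channel 1
switch (config interface mlag-port-channel 1) # exit
switch (config) # interface ethernet 1/1 mlag-channel-group 1 mode active
switch (config) # interface ethernet 1/2 mlag-channel-group 1 mode active
switch (config) # interface mlag-port-channel 1 no shutdown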

 

Logically it will act as below:

Re: Factors that determine compatibility of SFPs with new fibre services?

Re: Setup Mellanox MSX1012B in HA environment.


Hi, it should look similar to this:

 

Re: Line rate using Connect_X5 100G EN in Ubuntu; PCIe speed difference;


Can I get some expert advice on this if possible?

 

Thanks

Re: SN2100 QSFP (40Gbps) needs to be connected to Cisco Catalyst 3850 SFP 1Gbps port.


Hi Corbin,

 

If we plug an MC3208011-SX into a MAM1Q00A-QSA and configure "speed 1000" on that interface, will it work fine?

==> Yes, it shouldn't be an issue.

         Just make sure you configure the interface speed manually to 1G.
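On MLNX-OS that would look roughly like this (the port number is illustrative):

switch (config) # interface ethernet 1/1
switch (config interface ethernet 1/1) # speed 1000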

 

Thanks,

Pratik Pande

Re: How to enable the debuginfo for the libraries of OFED?


Hi Alkx,

Thanks a lot for your guidance. I just read this reply. In the past few days I used the approach of patching the RPMs, rebuilding them and force-installing them for debugging. It works and I can get the symbols, but it is very cumbersome. The method you provided here seems much more flexible; I will try it next time.
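For what it's worth, a rough sketch of that rebuild-and-force-install flow (package names are illustrative; rpmbuild emits the matching -debuginfo sub-package as a by-product when debug packages are not disabled in the spec):

# rpmbuild --rebuild libibverbs-*.src.rpm
# rpm -ivh --force ~/rpmbuild/RPMS/x86_64/libibverbs-*.rpm ~/rpmbuild/RPMS/x86_64/libibverbs-debuginfo-*.rpm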


Questions about how to Clean Fiber Optical Connector?


Dust is invisible to the naked eye and attaches very easily to fiber connectors. In routine maintenance, fiber optic connectors get contaminated with oil, powder and other contaminants. These contaminants can cause problems such as dirty fiber end faces, aging connectors, degraded cable quality, and blocked network links. Therefore, it is necessary to clean fiber connectors regularly and to take dust-proofing measures.

Re: When will an ACK generated in RDMA write?


Hi Alkx,

Many thanks.

I will read the specification later when time permits. Yes, I did ask the question on that blog; when I clicked to submit my question nothing updated there, so I assumed I had failed to submit it due to a network or server problem. To my understanding, the NIC will generate the ACK once it has received the whole frame/segment without any error, in order to accelerate processing. Indeed, almost all chips do this on embedded platforms, so the NIC on a server may well behave the same way.

 

I am just confused by the gap in poll-CQ cycles between RC and UC mode. And by the way, where can I get the latency and BW reports for MLNX NICs? Are they public? The blog says latency is usually less than 1 microsecond with small packets, but my test result is more than 1 us, no matter whether numactl is used to pin the CPU core and memory allocation.
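For reference, numbers like the ones in those reports are usually produced with the perftest tools; a minimal sketch (device name, message size and peer address are illustrative):

server# ib_write_lat -d mlx5_0 -s 8
client# numactl --cpunodebind=0 --membind=0 ib_write_lat -d mlx5_0 -s 8 192.168.1.10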

Redhat VM's on ESXi 6.5 U1 7967591 Cluster - poor 40G


Hi All,

 

Performance issue on 8 x Dell R740 servers with ESXi 6.5 and vCenter 6.5.

 

Each server needs to host 4 VMs (Red Hat 7.4) with 1Gb, 10Gb and 40Gb network access; to accommodate that we created 10Gb and 40Gb distributed switches and uplinked 2 ports each from a 40Gb and a 10Gb switch to the servers.

 

The 10Gb distributed switch works fine but our 40Gb distributed switch is problematic.

We are using the tool iPerf to validate the bandwidth between the cluster and an Isilon storage array.
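For reference, the kind of invocation this implies (single- vs. multi-stream makes a large difference at 40Gb; addresses and stream counts are illustrative, and iperf2 flags are similar):

server# iperf3 -s
client# iperf3 -c 192.168.40.10 -P 4 -t 30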

 

Troubleshooting steps:

 

- iPerf from one Isilon node to another Isilon node is consistently getting over 35Gb

- From VM to Isilon node, iPerf is satisfactorily getting over 35Gb

- From Isilon node to VM, iPerf is ranging between 13Gb and 20Gb

- Between Red Hat VMs on separate ESXi hosts, iPerf is ranging between 13Gb and 20Gb

- From VMs on the same ESXi host, iPerf is getting over 35Gb

- We checked the firmware of the 40Gb card (MLNX 40Gb 2P ConnectX3Pro Adpt) and it's at the latest version (2.42.5000).

 

 

Has anyone encountered a similar issue, or what steps need to be taken for ESXi 6.5 and the Mellanox cards (MLNX 40Gb 2P ConnectX3Pro Adpt)?

Re: Add iPXE support for Connectx-3-Pro MT27520


Here are the FW details of the card. I think these are the latest.

 

flint -d /dev/mst/mt4103_pciconf0 q

Image type:            FS2

FW Version:            2.42.5000

FW Release Date:       5.9.2017

Product Version:       02.42.50.00

Rom Info:              version_id=8025 type=CLP

                       type=UEFI version=14.11.45 cpu=AMD64

                       type=PXE version=3.4.752

Device ID:             4103

Description:           Node             Port1            Port2            Sys image

GUIDs:                 ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff

MACs:                                       70106fa0f9b0     70106fa0f9b1

VSD:                  

PSID:                  HP_2240110004
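For completeness, once a FlexBoot/iPXE ROM image suitable for this PSID is available, it can in principle be burned with flint's brom command (the ROM file name is purely illustrative, and the HP OEM PSID HP_2240110004 constrains which images will be accepted):

# flint -d /dev/mst/mt4103_pciconf0 brom FlexBoot-ConnectX-3Pro.rom
# flint -d /dev/mst/mt4103_pciconf0 q        (verify the new Rom Info)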

Support for "INBOX drivers?" for 18.04/connected mode?


I saw at http://www.mellanox.com/page/products_dyn?product_family=26

"Linux Inbox Drivers

Mellanox Adapters' Linux VPI Drivers for Ethernet and InfiniBand are also available Inbox in all the major distributions, RHEL, SLES, Ubuntu and more. Inbox drivers enable Mellanox High performance for Cloud, HPC, Storage, Financial Services and more with the Out of box experience of Enterprise grade Linux distributions."

I've used this for several generations of Mellanox cards over the last decade with a wide variety of Linux distributions. Just configure with /etc/network/interfaces; ifconfig ib0 works, datagram/connected mode works, IPoIB works.

From what I can tell, with 18.04 it doesn't work. With the default (Ubuntu-supplied/inbox) drivers:

# cat /sys/class/net/ib0/mode

datagram

# echo connected > /sys/class/net/ib0/mode

-bash: echo: write error: Invalid argument


I found docs saying that with the MLNX_OFED drivers you just do:

  # cat ib_ipoib.conf

  options ib_ipoib ipoib_enhanced=0

I get:

[   57.573664] ib_ipoib: unknown parameter 'ipoib_enhanced' ignored


Is there any documentation for the "INBOX drivers"? Does anyone know how to get connected mode working? I'm guessing the 18.04 drivers are too old to have the ability to disable enhanced mode, but too new to have connected mode working by default.
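One way to check what the inbox module actually supports (a diagnostic sketch, not a fix):

# ethtool -i ib0                      (shows which driver and version are bound to ib0)
# modinfo ib_ipoib | grep -i parm     (lists the module parameters the inbox ib_ipoib exposes)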
