Channel: Mellanox Interconnect Community: Message List

Mellanox ConnectX-3 SR-IOV problem


Hi, All,

 

I have spent quite some time searching for solutions. Tutorials and Q&As such as:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Host_Configuration_and_Guest_Installation_Guide/sect-Virtualization_Host_Configuration_and_Guest_Installation_Guide-SR_IOV-How_SR_IOV_Libvirt_Works.html

https://community.mellanox.com/docs/DOC-1317

https://community.mellanox.com/docs/DOC-1484

are all very helpful. However, they focus primarily on how to create Mellanox VFs and pass them to the guest, and stop there. Unfortunately, although my guest can see the VF as a PCI device, the driver fails to load on it. Here are some details:

 

Host: Intel Xeon CPU E5-2620 v3 @ 2.40GHz

         Debian 7

         Mellanox ConnectX-3 dual port

         Mellanox OFED driver v2.4-1.0.0.1

         VT-d and VT-x enabled in BIOS

         intel_iommu=on on the kernel command line

/etc/modprobe.d/mlx4_core.conf:

options mlx4_core port_type_array=2,2 num_vfs=4,4,0 probe_vf=4,4,0 enable_64b_cqe_eqe=0 log_num_mgm_entry_size=-1
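As a sanity check on this line, here is a minimal Python sketch that parses the options and sums the requested VFs. The helper name `parse_mlx4_options` is made up for illustration. The reading of the `num_vfs` triplet as (VFs on port 1, VFs on port 2, dual-port VFs) is my understanding of the MLNX_OFED documentation and should be treated as an assumption.

```python
def parse_mlx4_options(line):
    """Parse an 'options mlx4_core key=value ...' line into {key: [ints]}.

    Hypothetical helper for illustration; it only handles comma-separated
    integer values, which is all the line above contains.
    """
    tokens = line.split()
    if tokens[:2] != ["options", "mlx4_core"]:
        raise ValueError("not an mlx4_core options line")
    params = {}
    for token in tokens[2:]:
        key, _, value = token.partition("=")
        params[key] = [int(v) for v in value.split(",")]
    return params


line = ("options mlx4_core port_type_array=2,2 num_vfs=4,4,0 "
        "probe_vf=4,4,0 enable_64b_cqe_eqe=0 log_num_mgm_entry_size=-1")
params = parse_mlx4_options(line)

# Assumption: the three num_vfs values are port 1, port 2 and dual-port VFs,
# so their sum is the total number of VFs the host should expose.
total_vfs = sum(params["num_vfs"])
print("requested VFs:", total_vfs)
```

With the values above the sum is 8, which is the number of VF entries the host should then list.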

I can see virtual functions created on host via "lspci -nn | grep Mellanox":

04:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]

04:00.1 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

04:00.2 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

04:00.3 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

04:00.4 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

04:00.5 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

04:00.6 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

04:00.7 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]

04:01.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
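To double-check the listing, a small Python sketch can count the functions by PCI vendor:device ID. It assumes the `lspci -nn` output has been pasted into a string; the function name `count_by_device_id` is hypothetical. The IDs 15b3:1003 (physical function) and 15b3:1004 (virtual function) are taken from the output above.

```python
import re

# lspci -nn lines copied from the host output above.
LSPCI_OUTPUT = """\
04:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]
04:00.1 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:00.2 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:00.3 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:00.4 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:00.5 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:00.6 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:00.7 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:01.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
"""


def count_by_device_id(text):
    """Count lspci -nn lines per trailing [vendor:device] ID."""
    counts = {}
    for line in text.splitlines():
        # The vendor:device pair is the last bracketed field on each line.
        match = re.search(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]\s*$", line)
        if match:
            dev_id = match.group(1)
            counts[dev_id] = counts.get(dev_id, 0) + 1
    return counts


counts = count_by_device_id(LSPCI_OUTPUT)
print("PFs:", counts.get("15b3:1003", 0), "VFs:", counts.get("15b3:1004", 0))
```

For the listing above this counts one physical function and eight virtual functions.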

 

MSI-X is also enabled for the Mellanox card on the host, as shown by "lspci -vv -s 04:00.0":

04:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]

    Subsystem: Mellanox Technologies Device 0049

    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+

    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

    Latency: 0, Cache Line Size: 64 bytes

    Interrupt: pin A routed to IRQ 32

    Region 0: Memory at c7200000 (64-bit, non-prefetchable) [size=1M]

    Region 2: Memory at c5000000 (64-bit, prefetchable) [size=8M]

    Expansion ROM at c7100000 [disabled] [size=1M]

    Capabilities: [40] Power Management version 3

        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)

        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

    Capabilities: [48] Vital Product Data

        Product Name: CX312A - ConnectX-3 SFP+

        Read-only fields:

            [PN] Part number: MCX312A-XCBT        

            [EC] Engineering changes: A9

            [SN] Serial number: MT1445K01104           

            [V0] Vendor specific: PCIe Gen3 x8   

            [RV] Reserved: checksum good, 0 byte(s) reserved

        Read/write fields:

            [V1] Vendor specific: N/A  

            [YA] Asset tag: N/A                    

            [RW] Read-write area: 105 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 253 byte(s) free

            [RW] Read-write area: 252 byte(s) free

        End

    Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-

        Vector table: BAR=0 offset=0007c000

        PBA: BAR=0 offset=0007d000

    Capabilities: [60] Express (v2) Endpoint, MSI 00

        DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited

            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+

        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-

            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-

            MaxPayload 256 bytes, MaxReadReq 512 bytes

        DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-

        LnkCap:    Port #8, Speed 8GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited

            ClockPM- Surprise- LLActRep- BwNot-

        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+

            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

        LnkSta:    Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+

        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-

        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB

             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-

             Compliance De-emphasis: -6dB

        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+

             EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-

    Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)

        ARICap:    MFVC- ACS-, Next Function: 0

        ARICtl:    MFVC- ACS-, Function Group: 0

    Capabilities: [148 v1] Device Serial Number f4-52-14-03-00-94-cc-c0

    Capabilities: [108 v1] Single Root I/O Virtualization (SR-IOV)

        IOVCap:    Migration-, Interrupt Message Number: 000

        IOVCtl:    Enable+ Migration- Interrupt- MSE+ ARIHierarchy+

        IOVSta:    Migration-

        Initial VFs: 16, Total VFs: 16, Number of VFs: 8, Function Dependency Link: 00

        VF offset: 1, stride: 1, Device ID: 1004

        Supported Page Size: 000007ff, System Page Size: 00000001

        Region 2: Memory at 00000000bd000000 (64-bit, prefetchable)

        VF Migration: offset: 00000000, BIR: 0

    Capabilities: [154 v2] Advanced Error Reporting

        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

        UESvrt:    DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-

        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

        AERCap:    First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-

    Capabilities: [18c v1] #19

    Kernel driver in use: mlx4_core

 

I use qemu-kvm and libvirt for guest machines, and here is the interface section of my guest configuration xml:

    <interface type='network'>
      <mac address='52:54:00:78:06:44'/>
      <source network='default'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:6d:90:02'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
      </source>
      <vlan>
        <tag id='42'/>
      </vlan>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </interface>

 

I meant to pass the first virtual function to the guest.

After starting the guest, I can see the Mellanox device in the guest via lspci:

00:05.0 Ethernet controller [0200]: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] [15b3:1004]

Next, I installed the Mellanox Ethernet driver from http://www.mellanox.com/page/products_dyn?product_family=27, since I configured both ports as Ethernet in the mlx4_core.conf file.

However, after I rebooted the guest, dmesg shows:

mlx4_core: Mellanox ConnectX core driver v2.4-1.0.0.1 (Feb 19 2015)

mlx4_core: Initializing 0000:00:05.0

mlx4_core 0000:00:05.0: setting latency timer to 64

mlx4_core 0000:00:05.0: Detected virtual function - running in slave mode

mlx4_core 0000:00:05.0: Sending reset

mlx4_core 0000:00:05.0: Sending vhcr0

mlx4_core 0000:00:05.0: Requested number of MACs is too much for port 1, reducing to 64.

mlx4_core 0000:00:05.0: HCA minimum page size:512

mlx4_core 0000:00:05.0: Timestamping is not supported in slave mode.

  alloc irq_desc for 24 on node -1

  alloc kstat_irqs on node -1

mlx4_core 0000:00:05.0: irq 24 for MSI/MSI-X

  alloc irq_desc for 25 on node -1

  alloc kstat_irqs on node -1

mlx4_core 0000:00:05.0: irq 25 for MSI/MSI-X

mlx4_core 0000:00:05.0: communication channel command 0x31 timed out.

mlx4_core 0000:00:05.0: mlx4_enter_error_state: device is going to be reset

mlx4_core 0000:00:05.0: VF is sending reset request to Firmware.

mlx4_core 0000:00:05.0: VF Reset succeed, unloading VF driver.

mlx4_core 0000:00:05.0: mlx4_enter_error_state: device was reset successfully

mlx4_core 0000:00:05.0: mlx4_enter_error_state: end

mlx4_core 0000:00:05.0: NOP command failed to generate MSI-X interrupt IRQ 24).

mlx4_core 0000:00:05.0: Trying again without MSI-X.

mlx4_core 0000:00:05.0: Failed to close slave function.

mlx4_core: probe of 0000:00:05.0 failed with error -5
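For reference, the kernel reports probe failures as negated errno values. The short Python sketch below extracts the number from the last dmesg line and looks up its symbolic name with the standard `errno` module. The regular expression is an assumption about this message's wording, not a documented format.

```python
import errno
import os
import re

# Last line of the guest dmesg output above.
line = "mlx4_core: probe of 0000:00:05.0 failed with error -5"

match = re.search(r"failed with error (-?\d+)", line)
if match is None:
    raise ValueError("no probe error code found in line")

# The kernel prints the errno negated, so take the absolute value.
code = abs(int(match.group(1)))
name = errno.errorcode[code]
print(name, "-", os.strerror(code))
```

Error 5 maps to EIO, a generic I/O error, which is consistent with the command timeout earlier in the log but does not by itself identify the cause.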

Unloading and reloading mlx4_core via modprobe gives similar messages.

 

It appears to me that the driver cannot initialize the VF in the guest. Please advise, and many thanks in advance!

