Hi, All,
I have spent quite some time searching around for solutions. Tutorials and Q&As like:
https://community.mellanox.com/docs/DOC-1317
https://community.mellanox.com/docs/DOC-1484
are all very helpful. However, they focus mainly on creating Mellanox VFs and passing them to the guest, and stop there. In my case the guest sees the VF as a PCI device, but the driver fails to initialize it. Here are some details:
Host: Intel Xeon CPU E5-2620 v3 @ 2.40GHz
Debian 7
Mellanox ConnectX-3 dual port
Mellanox OFED driver v2.4-1.0.0.1
VT-d and VT-x enabled in BIOS
intel_iommu=on in kernel option
/etc/modprobe.d/mlx4_core.conf:
options mlx4_core port_type_array=2,2 num_vfs=4,4,0 probe_vf=4,4,0 enable_64b_cqe_eqe=0 log_num_mgm_entry_size=-1
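As a sanity check on the option line above, here is a minimal Python sketch that sums the VFs the host should expose. It assumes (my reading of the MLNX_OFED documentation, not verified here) that the three comma-separated num_vfs values are the VF counts for port 1 only, port 2 only, and dual-port VFs. The function names are my own and purely illustrative.

```python
def parse_module_options(line):
    """Parse a modprobe.d 'options <module> key=value ...' line into (module, params)."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "options":
        raise ValueError("not a modprobe 'options' line")
    module = tokens[1]
    # Each remaining token is expected to look like key=value.
    params = dict(tok.split("=", 1) for tok in tokens[2:])
    return module, params


def expected_vf_count(params):
    """Sum the num_vfs entries (assumption: a single value or a per-port triplet)."""
    return sum(int(v) for v in params["num_vfs"].split(","))


line = ("options mlx4_core port_type_array=2,2 num_vfs=4,4,0 probe_vf=4,4,0 "
        "enable_64b_cqe_eqe=0 log_num_mgm_entry_size=-1")
module, params = parse_module_options(line)
print(module, expected_vf_count(params))
```

For my configuration this gives 8 for mlx4_core.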
I can see the virtual functions created on the host via "lspci -nn | grep Mellanox":
04:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3] [15b3:1003]
04:00.1 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:00.2 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:00.3 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:00.4 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:00.5 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:00.6 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:00.7 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
04:01.0 Ethernet controller [0200]: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] [15b3:1004]
MSI-X is also enabled for the Mellanox card on the host, as shown by "lspci -vv -s 04:00.0":
04:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Mellanox Technologies Device 0049
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 32
Region 0: Memory at c7200000 (64-bit, non-prefetchable) [size=1M]
Region 2: Memory at c5000000 (64-bit, prefetchable) [size=8M]
Expansion ROM at c7100000 [disabled] [size=1M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Vital Product Data
Product Name: CX312A - ConnectX-3 SFP+
Read-only fields:
[PN] Part number: MCX312A-XCBT
[EC] Engineering changes: A9
[SN] Serial number: MT1445K01104
[V0] Vendor specific: PCIe Gen3 x8
[RV] Reserved: checksum good, 0 byte(s) reserved
Read/write fields:
[V1] Vendor specific: N/A
[YA] Asset tag: N/A
[RW] Read-write area: 105 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 252 byte(s) free
End
Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
Vector table: BAR=0 offset=0007c000
PBA: BAR=0 offset=0007d000
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [148 v1] Device Serial Number f4-52-14-03-00-94-cc-c0
Capabilities: [108 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
IOVSta: Migration-
Initial VFs: 16, Total VFs: 16, Number of VFs: 8, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: 1004
Supported Page Size: 000007ff, System Page Size: 00000001
Region 2: Memory at 00000000bd000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [154 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [18c v1] #19
Kernel driver in use: mlx4_core
I use qemu-kvm and libvirt for guest machines, and here is the interface section of my guest configuration xml:
<interface type='network'>
  <mac address='52:54:00:78:06:44'/>
  <source network='default'/>
  <model type='rtl8139'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
  <mac address='52:54:00:6d:90:02'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
  </source>
  <vlan>
    <tag id='42'/>
  </vlan>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
I meant to pass the first virtual function to the guest.
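To double-check that 04:00.1 really is the first VF, the address can be derived from the SR-IOV capability in the lspci output above (VF offset: 1, stride: 1). This is a minimal Python sketch, assuming the standard SR-IOV rule that VF n has routing ID = PF routing ID + First VF Offset + n * VF Stride. vf_address is a hypothetical helper name, not part of any tool.

```python
def vf_address(pf_bus, pf_dev, pf_fn, first_vf_offset, vf_stride, vf_index):
    """Return the 'bb:dd.f' address of VF number vf_index (0-based)."""
    # A PCI routing ID packs bus (8 bits), device (5 bits) and function (3 bits).
    pf_rid = (pf_bus << 8) | (pf_dev << 3) | pf_fn
    vf_rid = pf_rid + first_vf_offset + vf_index * vf_stride
    bus = (vf_rid >> 8) & 0xFF
    dev = (vf_rid >> 3) & 0x1F
    fn = vf_rid & 0x7
    return f"{bus:02x}:{dev:02x}.{fn:x}"


# PF is 04:00.0; offset and stride are taken from the SR-IOV capability.
for i in range(8):
    print(i, vf_address(0x04, 0, 0, 1, 1, i))
```

This yields 04:00.1 for VF 0 and 04:01.0 for VF 7, consistent with the lspci listing on the host.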
After starting the guest, I can see the Mellanox device inside the guest via lspci:
00:05.0 Ethernet controller [0200]: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] [15b3:1004]
Next, I installed the Mellanox Ethernet driver in the guest from http://www.mellanox.com/page/products_dyn?product_family=27, since the ports are configured as Ethernet (port_type_array=2,2) in the mlx4_core.conf file.
However, after rebooting the guest, dmesg shows:
mlx4_core: Mellanox ConnectX core driver v2.4-1.0.0.1 (Feb 19 2015)
mlx4_core: Initializing 0000:00:05.0
mlx4_core 0000:00:05.0: setting latency timer to 64
mlx4_core 0000:00:05.0: Detected virtual function - running in slave mode
mlx4_core 0000:00:05.0: Sending reset
mlx4_core 0000:00:05.0: Sending vhcr0
mlx4_core 0000:00:05.0: Requested number of MACs is too much for port 1, reducing to 64.
mlx4_core 0000:00:05.0: HCA minimum page size:512
mlx4_core 0000:00:05.0: Timestamping is not supported in slave mode.
alloc irq_desc for 24 on node -1
alloc kstat_irqs on node -1
mlx4_core 0000:00:05.0: irq 24 for MSI/MSI-X
alloc irq_desc for 25 on node -1
alloc kstat_irqs on node -1
mlx4_core 0000:00:05.0: irq 25 for MSI/MSI-X
mlx4_core 0000:00:05.0: communication channel command 0x31 timed out.
mlx4_core 0000:00:05.0: mlx4_enter_error_state: device is going to be reset
mlx4_core 0000:00:05.0: VF is sending reset request to Firmware.
mlx4_core 0000:00:05.0: VF Reset succeed, unloading VF driver.
mlx4_core 0000:00:05.0: mlx4_enter_error_state: device was reset successfully
mlx4_core 0000:00:05.0: mlx4_enter_error_state: end
mlx4_core 0000:00:05.0: NOP command failed to generate MSI-X interrupt IRQ 24).
mlx4_core 0000:00:05.0: Trying again without MSI-X.
mlx4_core 0000:00:05.0: Failed to close slave function.
mlx4_core: probe of 0000:00:05.0 failed with error -5
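To make the failing steps easier to spot, here is a small Python sketch that keeps only the mlx4_core lines for the VF that mention a timeout or a failure. The keyword list and the failure_lines helper are my own choices for illustration, not part of any Mellanox tool. The sample is an excerpt of the log above.

```python
import re

# Keywords treated as failure indicators (an illustrative choice).
FAILURE_PATTERN = re.compile(r"timed out|failed", re.IGNORECASE)


def failure_lines(dmesg_text, device="0000:00:05.0"):
    """Return mlx4_core lines about the given device that match a failure keyword."""
    hits = []
    for line in dmesg_text.splitlines():
        if "mlx4_core" in line and device in line and FAILURE_PATTERN.search(line):
            hits.append(line.strip())
    return hits


sample = """\
mlx4_core 0000:00:05.0: irq 25 for MSI/MSI-X
mlx4_core 0000:00:05.0: communication channel command 0x31 timed out.
mlx4_core 0000:00:05.0: VF Reset succeed, unloading VF driver.
mlx4_core 0000:00:05.0: NOP command failed to generate MSI-X interrupt IRQ 24).
mlx4_core: probe of 0000:00:05.0 failed with error -5
"""

for line in failure_lines(sample):
    print(line)
```

On this excerpt the first line reported is the "communication channel command 0x31 timed out" message, which is the earliest error in the log.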
Unloading and reloading mlx4_core via modprobe gives similar messages.
It appears that the driver cannot initialize the VF in the guest. Please advise. Many thanks in advance!