Hi OVS Networking Gurus,
I am a developer testing a new client/server app that will someday have to run through an OVS network. A colleague built me a working testbed from a spare Ubuntu host (16.04.2 LTS). My test client and server machines connect to the Ubuntu host, and within the host their traffic runs through an OVS bridge (OVS 2.12.1, DPDK 18.11.2). You can picture the testbed like this: client ↔ eno1 ↔ OVS bridge (dpdk1, dpdk2) ↔ eno2 ↔ server.
I know a bit about Ubuntu and OVS, but my colleague did the part where the host’s physical interfaces (eno1 and eno2) were connected to logical interfaces on the OVS bridge (dpdk1 and dpdk2). At the time, he told me he would “install the DPDK driver on the physical interfaces.” My notes tell me he used these commands:
/usr/local/share/dpdk/tools/dpdk_nic_bind.py -u 01:00.0
/usr/local/share/dpdk/tools/dpdk_nic_bind.py -u 01:00.1
/usr/local/share/dpdk/tools/dpdk_nic_bind.py -b igb_uio 01:00.1
/usr/local/share/dpdk/tools/dpdk_nic_bind.py -b igb_uio 01:00.0
ovs-vsctl add-port OVS_Bridge dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:01:00.0
ovs-vsctl add-port OVS_Bridge dpdk2 -- set Interface dpdk2 type=dpdk options:dpdk-devargs=0000:01:00.1

And everything worked great. My client and server hosts were able to talk to one another just fine.
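(For what it's worth, I believe a plain ovs-vsctl show is enough to sanity-check that layout, i.e. the bridge with dpdk1 and dpdk2 attached as type dpdk ports:)

sudo ovs-vsctl show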
But then, my colleague left the company.
And after that, I got a new testing requirement: my client and server have to communicate using jumbo packets, i.e. packets larger than 1518 bytes. I can set the packet sizes on my test hosts, no problem. But when they send traffic through the Ubuntu host, the traffic fragments at 1518 bytes. By playing around with virtual hosts inside the Ubuntu host, I know the problem isn't the OVS bridge. I'm guessing the bottleneck is the two physical interfaces.
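(If anyone wants to reproduce this, I think a do-not-fragment ping between the two test hosts shows the same behaviour; <server-IP> is a placeholder and the sizes assume a 9000-byte target MTU.)

ping -M do -s 1472 <server-IP>
ping -M do -s 8972 <server-IP>

The first one builds a 1500-byte IP packet, which fits today. The second builds a 9000-byte IP packet and should only get through once the whole path allows it.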
By default, an Ubuntu interface has an MTU of 1500 bytes. When my colleague set up this environment for me, we never played with the MTU, so I suppose eno1 and eno2 still have an MTU of 1500. But unfortunately, the interfaces no longer appear in ifconfig output.
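(I believe the DPDK bind script can still list them; I'm assuming the same script path from my notes above:)

/usr/local/share/dpdk/tools/dpdk_nic_bind.py --status

That shows which NICs are bound to igb_uio and which still use a kernel driver, but it doesn't report an MTU.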
Does anyone know how to check the current MTU if these interfaces are running the DPDK driver? And if so, can I set that MTU to 9000? Thank you!
Update for @heynnema...
me@linux:~# tracepath 168.161.114.120
 1?: [LOCALHOST]                      pmtu 1500
 1:  168.161.114.100        2999.544ms !H
     Resume: pmtu 1500
root@upce-superl1:~#

Second update for @heynnema:
The "ovs-vsctl list-ports PDH_bridge2" is abridged to show the two ports that I care about. The "sudo lshw -C bridge" is completely unabridged, as I don't know what is relevant here.
However, I'm fairly certain that the OVS bridge is not the bottleneck. When I spin up two VMs and use iPerf to run traffic between them, there is no problem with packets larger than 1500 bytes. But when I run traffic through the two physical interfaces connected to OVS bridge ports dpdk1 and dpdk2, my large packets fragment. I'm hoping to find a way to set the MTU on the physical interfaces AFTER they have been bound to the DPDK driver. Thank you.
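(For reference, the VM-to-VM check I mean is just an iPerf UDP run with a large datagram, something like the line below; the address and sizes are placeholders for whatever you use.)

iperf3 -c <vm2-IP> -u -b 1G -l 8000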
met@linux:~# ovs-vsctl list-ports PDH_bridge2
dpdk1
dpdk2
met@linux:~#
met@linux:~# sudo lshw -C bridge
  *-pci:0
       description: Host bridge
       product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2
       vendor: Intel Corporation
       physical id: 100
       bus info: pci@0000:00:00.0
       version: 01
       width: 32 bits
       clock: 33MHz
  *-pci:0
       description: PCI bridge
       product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1
       vendor: Intel Corporation
       physical id: 1
       bus info: pci@0000:00:01.0
       version: 01
       width: 32 bits
       clock: 33MHz
       capabilities: pci msi pciexpress pm normal_decode bus_master cap_list
       configuration: driver=pcieport
       resources: irq:25 ioport:6000(size=4096) memory:90000000-903fffff ioport:c7a00000(size=5242880)
  *-pci:1
       description: PCI bridge
       product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 01
       width: 32 bits
       clock: 33MHz
       capabilities: pci msi pciexpress pm normal_decode bus_master cap_list
       configuration: driver=pcieport
       resources: irq:27 ioport:5000(size=4096) memory:90400000-90afffff
  *-pci:2
       description: PCI bridge
       product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3
       vendor: Intel Corporation
       physical id: 3
       bus info: pci@0000:00:03.0
       version: 01
       width: 32 bits
       clock: 33MHz
       capabilities: pci msi pciexpress pm normal_decode bus_master cap_list
       configuration: driver=pcieport
       resources: irq:29
  *-pci:3
       description: PCI bridge
       product: C610/X99 series chipset PCI Express Root Port #1
       vendor: Intel Corporation
       physical id: 1c
       bus info: pci@0000:00:1c.0
       version: d5
       width: 32 bits
       clock: 33MHz
       capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
       configuration: driver=pcieport
       resources: irq:30
  *-pci:4
       description: PCI bridge
       product: C610/X99 series chipset PCI Express Root Port #5
       vendor: Intel Corporation
       physical id: 1c.4
       bus info: pci@0000:00:1c.4
       version: d5
       width: 32 bits
       clock: 33MHz
       capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
       configuration: driver=pcieport
       resources: irq:31 ioport:4000(size=4096) memory:c6000000-c70fffff
     *-pci
          description: PCI bridge
          product: AST1150 PCI-to-PCI Bridge
          vendor: ASPEED Technology, Inc.
          physical id: 0
          bus info: pci@0000:05:00.0
          version: 03
          width: 32 bits
          clock: 33MHz
          capabilities: pci msi pm pciexpress normal_decode bus_master cap_list
          resources: ioport:4000(size=4096) memory:c6000000-c70fffff
  *-isa
       description: ISA bridge
       product: C610/X99 series chipset LPC Controller
       vendor: Intel Corporation
       physical id: 1f
       bus info: pci@0000:00:1f.0
       version: 05
       width: 32 bits
       clock: 33MHz
       capabilities: isa bus_master cap_list
       configuration: driver=lpc_ich latency=0
       resources: irq:0
  *-pci:1
       description: PCI bridge
       product: Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3
       vendor: Intel Corporation
       physical id: 3
       bus info: pci@0000:80:03.0
       version: 01
       width: 32 bits
       clock: 33MHz
       capabilities: pci msi pciexpress pm normal_decode bus_master cap_list
       configuration: driver=pcieport
       resources: irq:33 memory:fba00000-fbefffff ioport:c8000000(size=1048576)
met@linux:~#
met@linux:~#

2 Answers
This isn't a concise answer, but all the information you probably need is here...
Source:
Also read:
Cheat Sheet:
Jumbo Frames
New in version 2.6.0.
By default, DPDK ports are configured with standard Ethernet MTU (1500B). To enable Jumbo Frames support for a DPDK port, change the Interface’s mtu_request attribute to a sufficiently large value. For example, to add a DPDK physical port with an MTU of 9000, run:
$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
      options:dpdk-devargs=0000:01:00.0 mtu_request=9000

Similarly, to change the MTU of an existing port to 6200, run:
$ ovs-vsctl set Interface dpdk-p0 mtu_request=6200

Some additional configuration is needed to take advantage of jumbo frames with vHost User ports:
Mergeable buffers must be enabled for vHost User ports, as demonstrated in the QEMU command line snippet below:
-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on

Where virtio devices are bound to the Linux kernel driver in a guest environment (i.e. interfaces are not bound to an in-guest DPDK driver), the MTU of those logical network interfaces must also be increased to a sufficiently large value. This avoids segmentation of Jumbo Frames received in the guest. Note that ‘MTU’ refers to the length of the IP packet only, and not that of the entire frame.
To calculate the exact MTU of a standard IPv4 frame, subtract the L2 header and CRC lengths (i.e. 18B) from the max supported frame size. So, to set the MTU for a 9018B Jumbo Frame:
$ ip link set eth1 mtu 9000

When Jumbo Frames are enabled, the size of a DPDK port’s mbuf segments are increased, such that a full Jumbo Frame of a specific size may be accommodated within a single mbuf segment.
Jumbo frame support has been validated against 9728B frames, which is the largest frame size supported by Fortville NIC using the DPDK i40e driver, but larger frames and other DPDK NIC drivers may be supported. These cases are common for use cases involving East-West traffic only.
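A DPDK-bound port has no kernel netdev to inspect, so the place to read the current value back is OVS itself. No guarantees, but with your port names (dpdk1 and dpdk2) something like this should show what's actually in effect:

ovs-vsctl get Interface dpdk1 mtu mtu_request
ovs-vsctl --columns=name,mtu,mtu_request list Interface

The mtu column is what the port is currently using; mtu_request is what was asked for.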
Update #1:
You probably need something similar to this... no guarantees though, as I'm not any kind of an OVS expert...
sudo ovs-vsctl set Interface dpdk1 mtu_request=9000
sudo ovs-vsctl set Interface dpdk2 mtu_request=9000
sudo ip link set eno1 mtu 9000
sudo ip link set eno2 mtu 9000

First, read man ip; man ip-link.
You can see your MTU values with:
ip link | grep -E 'mtu [0-9]+'

The above shows the MTU for every interface the system knows about, and you can change the MTU with ip link set, as described in the man page.
If your interfaces don't show up in the ip link output, you can watch how your system discovers its hardware/software environment, in excruciating detail, with sudo journalctl -b 0. Page through it to find out how your system is (mis)handling the interfaces.
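If that's too much to page through, you can grep the same journal for just the pieces involved here (eno1/eno2 and the igb_uio/DPDK bits from the question), for example:

sudo journalctl -b 0 | grep -i -E 'eno1|eno2|igb_uio|dpdk'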