Recently I observed peculiar behaviour with a Mellanox ConnectX-5 100 Gbps NIC while working on 100 Gbps packet capture using DPDK's rxonly mode. I was able to receive 142 Mpps using 12 queues; however, with 11 queues it was only 96 Mpps, with 10 queues 94 Mpps, and with 9 queues 92 Mpps. Can anyone explain why there is a sudden/abrupt jump in capture performance from 11 queues to 12 queues?
The details of the setup are given below.
I have connected two servers back to back. One of them (server-1) is used for traffic generation and the other (server-2) for traffic reception. Both servers use a Mellanox ConnectX-5 NIC.
The performance tuning parameters mentioned in section 3 of https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf [pp. 11-12] have been followed.
Both servers have the same configuration.
Server configuration
Processor: Intel Xeon Scalable 6148, 20 cores with HT, 2.4 GHz, 27.5 MB L3 cache
Number of processors: 4
RAM: 256 GB, 2666 MHz
The DPDK version used is dpdk-19.11 and the OS is RHEL-8.0.
For traffic generation, testpmd with --forward=txonly and --txonly-multi-flow is used. The command is below.
Packet generation testpmd command in server-1
./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15,16 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --rss-ip --no-numa --forward=txonly --txonly-multi-flow
testpmd> set txpkts 64
It was able to generate 64-byte packets at a sustained rate of 142.2 Mpps. This is used as input to the second server, which runs in rxonly mode. The command for reception is below.
Packet Reception command with 12 cores in server-2
./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15,16 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --rss-ip --no-numa
testpmd> set fwd rxonly
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 1363328297 RX-missed: 0 RX-bytes: 87253027549
RX-errors: 0
RX-nombuf: 0
TX-packets: 19 TX-errors: 0 TX-bytes: 3493
Throughput (since last show)
Rx-pps: 142235725 Rx-bps: 20719963768
Tx-pps: 0 Tx-bps: 0
############################################################################
Packet Reception command with 11 cores in server-2
./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=11 --txq=11 --nb-cores=11 -i -a --rss-ip --no-numa
testpmd> set fwd rxonly
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 1507398174 RX-missed: 112937160 RX-bytes: 96473484013
RX-errors: 0
RX-nombuf: 0
TX-packets: 867061720 TX-errors: 0 TX-bytes: 55491950935
Throughput (since last show)
Rx-pps: 96718960 Rx-bps: 49520107600
Tx-pps: 0 Tx-bps: 0
############################################################################
As you can see, there is a sudden jump in Rx-pps from 11 cores to 12 cores. This variation was not observed at the other transitions, such as 8 to 9, 9 to 10, or 10 to 11.
Can anyone explain the reason for this sudden jump in performance?
The same experiment was conducted, this time using 11 cores for traffic generation.
./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=11 --txq=11 --nb-cores=11 -i -a --rss-ip --no-numa --forward=txonly --txonly-multi-flow
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 0 RX-missed: 0 RX-bytes: 0
RX-errors: 0
RX-nombuf: 0
TX-packets: 2473087484 TX-errors: 0 TX-bytes: 158277600384
Throughput (since last show)
Rx-pps: 0 Rx-bps: 0
Tx-pps: 142227777 Tx-bps: 72820621904
############################################################################
On the capture side with 11 cores
./testpmd -l 1,2,3,4,5,6,10,11,12,13,14,15 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=11 --txq=11 --nb-cores=11 -i -a --rss-ip --no-numa
testpmd> set fwd rxonly
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 8411445440 RX-missed: 9685 RX-bytes: 538332508206
RX-errors: 0
RX-nombuf: 0
TX-packets: 0 TX-errors: 0 TX-bytes: 0
Throughput (since last show)
Rx-pps: 97597509 Rx-bps: 234643872
Tx-pps: 0 Tx-bps: 0
############################################################################
On the capture side with 12 cores
./testpmd -l 1,2,3,4,5,6,10,11,12,13,14,15,16 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --rss-ip --no-numa
testpmd> set fwd rxonly
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 9370629638 RX-missed: 6124 RX-bytes: 554429504128
RX-errors: 0
RX-nombuf: 0
TX-packets: 0 TX-errors: 0 TX-bytes: 0
Throughput (since last show)
Rx-pps: 140664658 Rx-bps: 123982640
Tx-pps: 0 Tx-bps: 0
############################################################################
The sudden jump in performance from 11 to 12 cores remains.
With the DPDK LTS releases 19.11, 20.11, and 21.11 running in plain vector mode (the default) for Mellanox CX-5 and CX-6, the problem mentioned above does not occur.
[EDIT-1] Retested with rxqs_min_mprq=1 for 2 * 100 Gbps at 64B. For 16 RXTX queues on 16T16C this resulted in a degradation of 9~10 Mpps. For every RX queue count from 1 to 7, there is a degradation of about 6 Mpps with rxqs_min_mprq=1.
Following is the capture for RXTX-to-core scaling.
Investigating the MPRQ claim, the following are some unique observations:
For both MLX CX-5 and CX-6, the maximum that each RX queue can attain is around 36 to 38 Mpps.
A single core can achieve up to 90 Mpps (64B) with 3 RXTX queues in IO mode using AMD EPYC Milan, on both CX-5 and CX-6.
100 Gbps at 64B can be achieved with 14 logical cores (7 physical cores) with testpmd in IO mode.
For both CX-5 and CX-6, 2 * 100 Gbps at 64B requires MPRQ and CQE compression to move enough packets in and out of the system.
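The 64B line-rate figure behind these observations follows from simple framing arithmetic (a sketch assuming standard Ethernet preamble and inter-frame gap):

```python
# 64-byte frames on 100 Gbps Ethernet: add the 8-byte preamble and the
# 12-byte inter-frame gap to get the true per-packet cost on the wire.
FRAME = 64
PREAMBLE, IFG = 8, 12
LINK_BPS = 100e9

wire_bits = (FRAME + PREAMBLE + IFG) * 8    # 672 bits per packet
line_rate_mpps = LINK_BPS / wire_bits / 1e6
print(f"line rate at 64B: {line_rate_mpps:.1f} Mpps")  # ~148.8 Mpps
```

So 2 * 100 Gbps at 64B means moving roughly 297 Mpps, which is why the per-queue cap of 36 to 38 Mpps forces many queues.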
A multitude of configuration tuning is required to achieve these high numbers. Please refer to the Stack Overflow question and the DPDK MLX5 tuning parameters for more information.
PCIe Gen4 bandwidth is not the limiting factor; rather, the NIC ASIC with its internal embedded switch produces the behaviour mentioned above. To overcome this limitation one needs PMD arguments to activate the hardware features, which in turn increases the CPU overhead of PMD processing: there is extra work (needing more CPU) to unpack the compressed and inlined multi-packet buffers into single DPDK mbufs. This is why more threads are required when these PMD arguments are used.
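To put that CPU barrier in rough numbers, the observed rates from the question can be converted into a cycles-per-packet budget (a sketch assuming the question's 2.4 GHz cores and an even spread of packets across the polling cores):

```python
# Cycles-per-packet budget at the observed Rx rates (assumption: 2.4 GHz
# cores, packets evenly distributed across queues/cores by RSS).
CPU_HZ = 2.4e9

for cores, total_mpps in [(11, 96.7), (12, 142.2)]:
    per_core_pps = total_mpps * 1e6 / cores
    cycles_per_pkt = CPU_HZ / per_core_pps
    print(f"{cores} cores: {per_core_pps / 1e6:.2f} Mpps/core, "
          f"~{cycles_per_pkt:.0f} cycles/packet")
```

Every extra cycle spent un-inlining MPRQ buffers eats directly into this budget, which is why the compressed modes need more cores to hold line rate.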
note:
Test application: testpmd
EAL Args: --in-memory --no-telemetry --no-shconf --single-file-segments --file-prefix=2 -l 7,8-31
PMD args vector: none
PMD args for 2 * 100Gbps line rate: txq_inline_mpw=204,txqs_min_inline=1,mprq_en=1,rxqs_min_mprq=1,mprq_log_stride_num=12,rxq_pkt_pad_en=1,rxq_cqe_comp_en=4
I have a VM running in Fusion that I want to reach by routing a specific endpoint address (multicast DNS, in particular) through the virtual ethernet interface. First I was sending packets and inspecting with Wireshark, noticing that nothing was getting through. Then I thought to check the routing table:
$ netstat -rn | grep vmnet8
Destination Gateway Flags Refs Use Netif Expire
172.16.12/24 link#29 UC 2 0 vmnet8 !
172.16.12.255 ff:ff:ff:ff:ff:ff UHLWbI 0 35 vmnet8 !
But unlike other interfaces,
Destination Gateway Flags Refs Use Netif Expire
224.0.0.251 a1:10:5e:50:0:fb UHmLWI 0 732 en0
224.0.0.251 a1:10:5e:50:0:fb UHmLWI 0 0 en8
There was no multicast route. So I added it:
$ sudo route add -host 224.0.0.251 -interface vmnet8
add host 224.0.0.251: gateway vmnet8
And so it was true
$ netstat -rn | grep vmnet8
Destination Gateway Flags Refs Use Netif Expire
172.16.12/24 link#29 UC 2 0 vmnet8 !
172.16.12.255 ff:ff:ff:ff:ff:ff UHLWbI 0 35 vmnet8 !
224.0.0.251 a1:10:5e:50:0:fb UHmLS 0 13 vmnet8
I also made sure to check the interface flags to ensure it had been configured to support multicast:
$ ifconfig vmnet8
vmnet8: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
ether 00:70:61:c0:11:08
inet 172.16.12.1 netmask 0xffffff00 broadcast 172.16.12.255
Still, no multicast packets I send are getting through. I noted that the other interfaces' multicast routes have different flags than the default ones given to my added route, namely UHmLWI vs. UHmLS. The differences I can see seem insignificant. From man netstat:
I RTF_IFSCOPE Route is associated with an interface scope
S RTF_STATIC Manually added
W RTF_WASCLONED Route was generated as a result of cloning
Then again, I'm not claiming to be a routing expert. Perhaps a multicast route entry must be made somehow differently?
You'll note that the Use column is non-zero, despite no packets showing in a sniffer.
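One thing worth ruling out here: IPv4 multicast sends can be steered by the socket itself rather than the unicast routing table, via the IP_MULTICAST_IF socket option, which selects the egress interface directly. A minimal sketch (the vmnet8 address 172.16.12.1 is taken from the question; the mDNS group/port are just illustrative):

```python
import socket

def multicast_sender(ifaddr: str) -> socket.socket:
    """Return a UDP socket whose IPv4 multicast traffic egresses via ifaddr.

    IP_MULTICAST_IF picks the outgoing interface for multicast directly,
    independent of the unicast routing table.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                 socket.inet_aton(ifaddr))
    # mDNS is link-local, so keep TTL at 1 to stay on the segment
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    return s

# On the machine above (vmnet8 address from the question):
#   multicast_sender("172.16.12.1").sendto(b"hello", ("224.0.0.251", 5353))
```

If packets sent this way still never appear on vmnet8, the drop is likely happening somewhere other than route selection.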
I am trying to simulate 5% packet loss using the tc tool on server port 1234. Here are my steps:
sudo tc qdisc del dev eth0 root
sudo tc qdisc add dev eth0 root handle 1: prio
sudo tc filter add dev eth0 parent 1: protocol ip prio 1 u32 flowid 1:1 match ip dport 1234 0xffff
sudo tc qdisc add dev eth0 parent 1:1 handle 1: netem loss 5%
The commands above produce no errors, but when I send TCP traffic to that port, no packet loss is observed.
What am I doing wrong in the above commands?
Any help is appreciated.
See https://serverfault.com/a/841865/342799 for similar case.
Commands I have in my testing rig to drop 5.5% of packets:
# tc qdisc add dev eth0 root handle 1: prio priomap 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
# tc qdisc add dev eth0 parent 1:1 handle 10: netem loss 5.5% 25%
# DST_IP=1.2.3.4/32
# tc filter add \
dev eth0 \
parent 1: \
protocol ip \
prio 1 \
u32 \
match ip dst $DST_IP \
flowid 1:1
To confirm, run:
# ping -f -c 1000 $DST_IP
before and after this setup.
Note: almost all hosting providers start throttling your traffic if you do a lot of flood pings.
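As a sanity check on what the flood ping should report, the expected drop count over 1000 pings is simply binomial; a quick sketch with independent 5.5% drops (this ignores netem's 25% correlation parameter, so it is only a rough guide):

```python
import random

# Independent 5.5% drops over 1000 pings; fixed seed so the sketch is
# repeatable. Expect the drop count to land near LOSS * N = 55.
random.seed(42)
LOSS, N = 0.055, 1000
drops = sum(random.random() < LOSS for _ in range(N))
print(f"{drops}/{N} dropped (~{100 * drops / N:.1f}%), expected ~{LOSS * N:.0f}")
```

If your ping statistics show a loss fraction far outside this ballpark, the filter is probably not classifying traffic into the netem band at all.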
I have a tun adapter (OS X) which looks like this:
tun11: flags=8851<UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 10.12.0.2 --> 10.12.0.1 netmask 0xff000000
open (pid 4004)
I send a UDP packet to it:
echo "lol" | nc -4u 10.12.0.1 8000
and I am able to see it with tcpdump:
➜ build git:(master) ✗ sudo tcpdump -i tun11 -vv
tcpdump: listening on tun11, link-type NULL (BSD loopback), capture size 262144 bytes
14:39:16.669055 IP (tos 0x0, ttl 64, id 21714, offset 0, flags [none], proto UDP (17), length 32)
10.12.0.2.55707 > 10.12.0.1.irdmi: [udp sum ok] UDP, length 4
However, I do not see anything when I use a capture filter:
➜ build git:(master) ✗ sudo tcpdump -i tun11 udp -vv
tcpdump: listening on tun11, link-type NULL (BSD loopback), capture size 262144 bytes
The same syntax works fine with an ethernet adapter:
➜ build git:(master) ✗ sudo tcpdump -i en0 udp -vv
tcpdump: listening on en0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:42:15.010329 IP (tos 0x0, ttl 128, id 7539, offset 0, flags [none], proto UDP (17), length 291)
xxxx.54915 > 10.64.3.255.54915: [udp sum ok] UDP, length 263
I checked man pcap-filter and found an interesting sentence related to capture filters:
Note that this primitive does not chase the protocol header chain.
Is it related to my problem? In any case, why do capture filters (at least the protocol part) not work for loopback adapters, and is there a way to make them work?
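One possible explanation worth checking: on a LINKTYPE_NULL (BSD loopback) capture, each frame starts with a 4-byte protocol-family value in host byte order, and a compiled filter like `udp` checks that family field before it ever looks at the IP protocol byte. If a tun driver writes an unexpected family value (or the wrong byte order), an unfiltered capture still decodes fine while the filter silently matches nothing. A small sketch of the layout the filter inspects (addresses from the question; the IP header is simplified, checksum left zero):

```python
import socket
import struct

# LINKTYPE_NULL (BSD loopback) frames begin with a 4-byte protocol-family
# value in *host* byte order, followed directly by the IP packet.
AF_INET = 2  # the value BSD systems use for IPv4 in this header

# Minimal 20-byte IPv4 header (sketch only, not a valid packet on the wire)
ip_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 20,                     # version/IHL, TOS, total length
    0, 0,                            # identification, flags/fragment offset
    64, 17, 0,                       # TTL, protocol = UDP (17), checksum
    socket.inet_aton("10.12.0.2"),   # source (from the question)
    socket.inet_aton("10.12.0.1"),   # destination
)

frame = struct.pack("=I", AF_INET) + ip_header
family = struct.unpack("=I", frame[:4])[0]   # what the filter tests first
proto = frame[4 + 9]                         # IP protocol byte at offset 9
print(family, proto)
```

tcpdump's `-y` option can force a different link-layer type on capture, which may help if the driver's framing does not match what libpcap assumes.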
Addition
Interestingly, it works with the tun device created by OpenVPN, but I do not understand what the difference is.
tun11: flags=8851<UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 10.12.0.2 --> 10.12.0.1 netmask 0xff000000
open (pid 5792)
utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
inet 198.18.1.214 --> 198.18.1.213 netmask 0xffffffff
inet6 xxxx%utun0 prefixlen 64 optimistic scopeid 0xa
inet6 xxxx::1074 prefixlen 64 tentative
nd6 options=1<PERFORMNUD>
I have the following routing table:
➜ ~ netstat -nr
Routing tables
Internet:
Destination Gateway Flags Refs Use Netif Expire
default 192.168.0.1 UGSc 63 1 en0
default 10.255.254.1 UGScI 1 0 ppp0
10 ppp0 USc 2 4 ppp0
10.255.254.1 10.255.254.2 UHr 1 0 ppp0
92.46.122.12 192.168.0.1 UGHS 0 0 en0
127 127.0.0.1 UCS 0 0 lo0
127.0.0.1 127.0.0.1 UH 2 62144 lo0
169.254 link#4 UCS 0 0 en0
192.168.0 link#4 UCS 8 0 en0
192.168.0.1 c0:4a:0:2d:18:48 UHLWIir 60 370 en0 974
192.168.0.100 a0:f3:c1:22:1d:6e UHLWIi 1 228 en0 1174
How can I add a gateway (10.25.1.252) to a specific IP (10.12.254.9) inside the VPN?
I tried this command, but with no luck:
sudo route -n add 10.12.0.0/16 10.25.1.252
But traceroute shows that it uses the default gateway:
~ traceroute 10.12.254.9
traceroute to 10.12.254.9 (10.12.254.9), 64 hops max, 52 byte packets
1 10.255.254.1 (10.255.254.1) 41.104 ms 203.766 ms 203.221 ms
Are you using Cisco AnyConnect? Here's a tidbit from https://supportforums.cisco.com/document/7651/anyconnect-vpn-client-faq
Q. How does the AnyConnect client enforce/monitor the
tunnel/split-tunnel policy?
A. AnyConnect enforces the tunnel policy in 2 ways:
1)Route monitoring and repair (e.g. if you change the route table),
AnyConnect will restore it to what was provisioned.
2)Filtering (on platforms that support filter engines). Filtering ensures that even if you could perform some sort of route injection, the filters would block the packets.
Which I interpret as: whenever you change the route, the Cisco client resets it to what your VPN administrator configured.
Your best bet is to talk to your VPN administrator and ask them to add your route.
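This interpretation is easy to check against the routing table: if the added 10.12.0.0/16 route had survived, longest-prefix match would prefer it over the broader 10/8 entry for 10.12.254.9, so a traceroute going via ppp0's gateway suggests the route has already been removed. A small longest-prefix-match sketch (entries taken from the question's table plus the attempted route):

```python
import ipaddress

# Routing-table entries (prefix -> gateway/interface) from the question,
# plus the /16 route the asker tried to add.
table = {
    ipaddress.ip_network("0.0.0.0/0"): "10.255.254.1",    # default via ppp0
    ipaddress.ip_network("10.0.0.0/8"): "ppp0",           # the VPN's 10/8 entry
    ipaddress.ip_network("10.12.0.0/16"): "10.25.1.252",  # the added route
}

dst = ipaddress.ip_address("10.12.254.9")
# Longest-prefix match: among all entries containing dst, pick the most specific
best = max((net for net in table if dst in net), key=lambda n: n.prefixlen)
gw = table[best]
print(best, "->", gw)   # the /16 wins -- if it is still in the table
```

Since the traceroute shows 10.255.254.1 instead, the client has almost certainly repaired the table, as the FAQ describes.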