PcapPlusPlus example DpdkTrafficFilter not performing like I would expect, do I have something misconfigured?

I am working on an application that uses DPDK to write packet payloads to file, and I am investigating whether PcapPlusPlus could be used for this purpose. My setup is as follows:
I am using a Mellanox ConnectX-5 NIC, Ubuntu 22.04.01, and the latest DPDK and PcapPlusPlus, with kernel 5.15.0-53-generic and 10 x 1 GB hugepages.
Mellanox ConnectX-5 NIC:
Device #1:
----------
Device Type: ConnectX5
Part Number: MCX516A-CCA_Ax
Description: ConnectX-5 EN network interface card; 100GbE dual-port QSFP28; PCIe3.0 x16; tall bracket; ROHS R6
PSID: MT_0000000012
PCI Device Name: /dev/mst/mt4119_pciconf0
Base GUID: b8cef60300fb1330
Base MAC: b8cef6fb1330
Versions: Current Available
FW 16.35.1012 N/A
PXE 3.6.0804 N/A
UEFI 14.28.0015 N/A
Status: No matching image found
Running ethtool on the card:
Settings for ens7f0np0:
Supported ports: [ Backplane ]
Supported link modes: 1000baseKX/Full
10000baseKR/Full
40000baseKR4/Full
40000baseCR4/Full
40000baseSR4/Full
40000baseLR4/Full
25000baseCR/Full
25000baseKR/Full
25000baseSR/Full
50000baseCR2/Full
50000baseKR2/Full
100000baseKR4/Full
100000baseSR4/Full
100000baseCR4/Full
100000baseLR4_ER4/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: None RS BASER
Advertised link modes: 100000baseKR4/Full
100000baseSR4/Full
100000baseCR4/Full
100000baseLR4_ER4/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: No
Advertised FEC modes: RS
Speed: 100000Mb/s
Duplex: Full
Auto-negotiation: off
Port: Direct Attach Copper
PHYAD: 0
Transceiver: internal
netlink error: Operation not permitted
Current message level: 0x00000004 (4)
link
Link detected: yes
Here is my issue. Each link consists of two sets of packets, one set from port 0 and one set from port 1, and so on. Using tcpdump I can record the packets from a single port without issue to a RAID0 array consisting of 4 NVMe drives.
sudo tcpdump -i ens7f0np0 udp -nn -# -N -B 1048576 -t -q -Q in -p -w /mnt/md0/test/test.pcap dst port 50340
I get a .pcap file that is 2 GB in size for a 1-second test, as expected. Two ports are too much for the kernel, so if I try to record everything I get dropped packets. This is where DPDK comes in: I wanted to use the DPDK filter example to do the same test with DPDK.
sudo ./DpdkTrafficFilter -d 2 -f /mnt/md0/ -o 'UDP' -c 7
I run the example and here is the output:
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 2048 kB hugepages reported
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1015) device: 0000:17:00.0 (socket 0)
mlx5_net: No available register for sampler.
EAL: Probe PCI driver: mlx5_pci (15b3:1015) device: 0000:17:00.1 (socket 0)
mlx5_net: No available register for sampler.
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.0 (socket 0)
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.1 (socket 0)
EAL: Probe PCI driver: mlx5_pci (15b3:1015) device: 0000:b1:00.0 (socket 1)
mlx5_net: No available register for sampler.
EAL: Probe PCI driver: mlx5_pci (15b3:1015) device: 0000:b1:00.1 (socket 1)
mlx5_net: No available register for sampler.
EAL: Probe PCI driver: mlx5_pci (15b3:1015) device: 0000:ca:00.0 (socket 1)
mlx5_net: No available register for sampler.
EAL: Probe PCI driver: mlx5_pci (15b3:1015) device: 0000:ca:00.1 (socket 1)
mlx5_net: No available register for sampler.
EAL: Probe PCI driver: mlx5_pci (15b3:1015) device: 0000:e3:00.0 (socket 1)
mlx5_net: No available register for sampler.
EAL: Probe PCI driver: mlx5_pci (15b3:1015) device: 0000:e3:00.1 (socket 1)
mlx5_net: No available register for sampler.
TELEMETRY: No legacy callbacks, legacy socket not created
Opened device #2 with 1 RX queues and 1 TX queues. RSS hash functions:
RSS_IPV4
RSS_IPV6
Using core 1
Core configuration:
DPDK device#2: RX-Queue#0;
Using core 2
Core configuration:
None
^C
Application stopped
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Core ID | Packet Cnt | Eth Cnt | ARP Cnt | IPv4 Cnt | IPv6 Cnt | TCP Cnt | UDP Cnt | HTTP Cnt | Matched TCP Flows | Matched UDP Flows | Matched Packets |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | 514051 | 514051 | 0 | 514051 | 0 | 0 | 514051 | 0 | 0 | 1 | 514050 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Total | 514051 | 514051 | 0 | 514051 | 0 | 0 | 514051 | 0 | 0 | 1 | 514050 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
However, the pcap file is only 725 MB for a 1-second test! Any idea what I am doing wrong? Are packets being dropped? If so, why is the performance worse than tcpdump?
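To answer the "are packets being dropped" part for myself, one check I can add is to read the port counters straight from DPDK (nothing PcapPlusPlus-specific) and look at imissed, which counts packets the NIC dropped because the RX queues were full. A minimal sketch, with the port ID hard-coded to match the device I open:

// Minimal diagnostic sketch: print hardware RX/drop counters for a DPDK port.
// Call after EAL is initialized and the port is started; port 2 matches "-d 2".
#include <rte_ethdev.h>
#include <cinttypes>
#include <cstdio>

void printPortDrops(uint16_t portId)
{
    rte_eth_stats stats;
    if (rte_eth_stats_get(portId, &stats) == 0)
    {
        // imissed: dropped by the NIC because the RX queues were full
        // rx_nombuf: RX failures caused by mbuf pool exhaustion
        std::printf("port %u: rx=%" PRIu64 " missed=%" PRIu64 " errors=%" PRIu64 " nombuf=%" PRIu64 "\n",
                    static_cast<unsigned>(portId), stats.ipackets, stats.imissed,
                    stats.ierrors, stats.rx_nombuf);
    }
}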
Also, not sure if it matters, but the traffic I am receiving uses an MTU of 9000. I did change the DpdkTrafficFilter code to set the MTU to 9000, but got the same results.
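For reference, this is roughly how I changed the device setup while experimenting, written as a PcapPlusPlus sketch rather than the stock DpdkTrafficFilter code. The core mask, port number, queue count, and the MTU call are my local assumptions (setMtu() is the DpdkDevice call I used for the 9000-byte change; adjust to whatever your PcapPlusPlus version exposes):

// Rough sketch of my modified device setup (not the stock DpdkTrafficFilter code).
// Core mask, port ID, queue count and MTU are specific to my test box.
#include "DpdkDeviceList.h"

bool openCaptureDevice()
{
    // Initialize DPDK through PcapPlusPlus: core mask 0x7 (cores 0-2),
    // 4095 mbufs in the per-device pool
    if (!pcpp::DpdkDeviceList::initDpdk(0x07, 4095))
        return false;

    // Port 2 matches the "-d 2" argument passed to the example
    pcpp::DpdkDevice* dev = pcpp::DpdkDeviceList::getInstance().getDeviceByPort(2);
    if (dev == nullptr)
        return false;

    // Open several RX queues so RSS can spread incoming packets over more cores,
    // instead of the single RX queue shown in the output above
    if (!dev->openMultiQueues(4, 1))
        return false;

    // Jumbo frames: this is where I applied the 9000-byte MTU change
    // (whether this belongs before or after opening may depend on the PMD)
    dev->setMtu(9000);
    return true;
}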

Related

Peculiar behaviour with Mellanox ConnectX-5 and DPDK in rxonly mode

Recently I observed peculiar behaviour with a Mellanox ConnectX-5 100 Gbps NIC while working at 100 Gbps in DPDK rxonly mode. I was able to receive 142 Mpps using 12 queues; however, with 11 queues it was only 96 Mpps, with 10 queues 94 Mpps, and with 9 queues 92 Mpps. Can anyone explain why there is such a sudden/abrupt jump in capture performance from 11 queues to 12 queues?
The details of the setup are given below.
I have connected two servers back to back. One of them (server-1) is used for traffic generation and the other (server-2) for traffic reception. Both servers use a Mellanox ConnectX-5 NIC.
The performance tuning parameters mentioned in section 3 of https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_performance_report.pdf (pages 11-12) have been followed.
Both servers have the same configuration.
Server configuration
Processor: Intel Xeon Scalable processor, 6148 series, 20 cores with HT, 2.4 GHz, 27.5 MB L3 cache
Number of processors: 4
RAM: 256 GB, 2666 MHz
The DPDK version used is dpdk-19.11 and the OS is RHEL-8.0.
For traffic generation, testpmd with --forward=txonly and --txonly-multi-flow is used. The command used is below.
Packet generation testpmd command on server-1:
./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15,16 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --rss-ip --no-numa --forward=txonly --txonly-multi-flow
testpmd> set txpkts 64
It was able to generate 64-byte packets at a sustained rate of 142.2 Mpps. This is used as input to the second server, which works in rxonly mode. The command for reception is below.
Packet reception command with 12 cores on server-2:
./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15,16 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --rss-ip --no-numa
testpmd> set fwd rxonly
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 1363328297 RX-missed: 0 RX-bytes: 87253027549
RX-errors: 0
RX-nombuf: 0
TX-packets: 19 TX-errors: 0 TX-bytes: 3493
Throughput (since last show)
Rx-pps: 142235725 Rx-bps: 20719963768
Tx-pps: 0 Tx-bps: 0
############################################################################
Packet reception command with 11 cores on server-2:
./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=11 --txq=11 --nb-cores=11 -i -a --rss-ip --no-numa
testpmd> set fwd rxonly
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 1507398174 RX-missed: 112937160 RX-bytes: 96473484013
RX-errors: 0
RX-nombuf: 0
TX-packets: 867061720 TX-errors: 0 TX-bytes: 55491950935
Throughput (since last show)
Rx-pps: 96718960 Rx-bps: 49520107600
Tx-pps: 0 Tx-bps: 0
############################################################################
As you can see, there is a sudden jump in Rx-pps from 11 cores to 12 cores. This variation was not observed elsewhere, e.g. from 8 to 9, 9 to 10, or 10 to 11 cores.
Can anyone explain the reason for this sudden jump in performance?
The same experiment was conducted, this time using 11 cores for traffic generation:
./testpmd -l 4,5,6,7,8,9,10,11,12,13,14,15 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=4096 --rxd=4096 --mbcache=512 --rxq=11 --txq=11 --nb-cores=11 -i -a --rss-ip --no-numa --forward=txonly --txonly-multi-flow
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 0 RX-missed: 0 RX-bytes: 0
RX-errors: 0
RX-nombuf: 0
TX-packets: 2473087484 TX-errors: 0 TX-bytes: 158277600384
Throughput (since last show)
Rx-pps: 0 Rx-bps: 0
Tx-pps: 142227777 Tx-bps: 72820621904
############################################################################
On the capture side with 11 cores
./testpmd -l 1,2,3,4,5,6,10,11,12,13,14,15 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=11 --txq=11 --nb-cores=11 -i -a --rss-ip --no-numa
testpmd> set fwd rxonly
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 8411445440 RX-missed: 9685 RX-bytes: 538332508206
RX-errors: 0
RX-nombuf: 0
TX-packets: 0 TX-errors: 0 TX-bytes: 0
Throughput (since last show)
Rx-pps: 97597509 Rx-bps: 234643872
Tx-pps: 0 Tx-bps: 0
############################################################################
On the capture side with 12 cores
./testpmd -l 1,2,3,4,5,6,10,11,12,13,14,15,16 -n 6 -w 17:00.0,mprq_en=1,rxq_pkt_pad_en=1 --socket-mem=4096,0,0,0 -- --socket-num=0 --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=12 --txq=12 --nb-cores=12 -i -a --rss-ip --no-numa
testpmd> set fwd rxonly
testpmd> show port stats all
######################## NIC statistics for port 0 ########################
RX-packets: 9370629638 RX-missed: 6124 RX-bytes: 554429504128
RX-errors: 0
RX-nombuf: 0
TX-packets: 0 TX-errors: 0 TX-bytes: 0
Throughput (since last show)
Rx-pps: 140664658 Rx-bps: 123982640
Tx-pps: 0 Tx-bps: 0
############################################################################
The sudden jump in performance from 11 to 12 cores remains the same.
With the DPDK LTS releases 19.11, 20.11 and 21.11, running just in vector mode (the default mode) on Mellanox CX-5 and CX-6 does not produce the problem mentioned above.
[EDIT-1] Retested with rxqs_min_mprq=1 at 2 x 100 Gbps with 64 B packets. For 16 RX/TX queues on 16T16C this resulted in a degradation of 9~10 Mpps. For every RX queue count from 1 to 7 there is a degradation of 6 Mpps with rxqs_min_mprq=1.
Following is the capture for RX/TX-to-core scaling.
Investigating the MPRQ claim, the following are some of the unique observations:
For both MLX CX-5 and CX-6, the maximum that each RX queue can attain is around 36 to 38 Mpps.
A single core can achieve up to 90 Mpps (64 B) with 3 RX/TX queues in IO mode, using AMD EPYC Milan, on both CX-5 and CX-6.
100 Gbps at 64 B can be achieved with 14 logical cores (7 physical cores) with testpmd in IO mode.
For both CX-5 and CX-6, 2 x 100 Gbps at 64 B requires MPRQ and the compression technique to allow more packets in and out of the system.
There is a multitude of configuration tuning required to achieve these high numbers. Please refer to the Stack Overflow question and the DPDK MLX tuning parameters for more information.
PCIe Gen4 bandwidth is not the limiting factor; rather, the NIC ASIC with its internal embedded switch produces the behaviour mentioned above. To overcome this limitation one needs to use PMD arguments to activate the hardware features, which in turn increases the CPU overhead of PMD processing: extra CPU is needed to convert the compressed and inlined multi-packet entries into single DPDK mbufs. This is why more threads are required when using the PMD arguments.
Note:
Test application: testpmd
EAL args: --in-memory --no-telemetry --no-shconf --single-file-segments --file-prefix=2 -l 7,8-31
PMD args (vector mode): none
PMD args for 2 x 100 Gbps line rate: txq_inline_mpw=204,txqs_min_inline=1,mprq_en=1,rxqs_min_mprq=1,mprq_log_stride_num=12,rxq_pkt_pad_en=1,rxq_cqe_comp_en=4
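To make it concrete how these PMD args reach the mlx5 driver outside of testpmd, here is a rough C++ sketch that forwards the same EAL arguments and devargs string through rte_eal_init. The PCI address 0000:17:00.0 is taken from the testpmd commands above, and the allow-list flag is -a on recent DPDK releases (19.11 used -w), so treat this as an illustration rather than a drop-in configuration:

// Hedged sketch: pass the EAL args and mlx5 PMD devargs from the note above
// programmatically. PCI address and core list mirror the commands in this post.
#include <rte_eal.h>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> args = {
        "rxapp",
        "--in-memory", "--no-telemetry", "--no-shconf",
        "--single-file-segments", "--file-prefix=2",
        "-l", "7,8-31",
        "-a", "0000:17:00.0,"                       // "-w" on DPDK 19.11
              "txq_inline_mpw=204,txqs_min_inline=1,mprq_en=1,rxqs_min_mprq=1,"
              "mprq_log_stride_num=12,rxq_pkt_pad_en=1,rxq_cqe_comp_en=4"
    };

    std::vector<char*> argv;
    for (auto& a : args)
        argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    // rte_eal_init parses the device args and hands them to the mlx5 PMD
    if (rte_eal_init(static_cast<int>(args.size()), argv.data()) < 0)
        return 1;

    // ... set up queues and start the port with the regular rte_ethdev calls ...
    return 0;
}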

Adding a multicast route to an interface in OSX

I have a VM running in Fusion that I want to reach by routing a specific endpoint address through the virtual Ethernet interface (multicast DNS, in particular). First I was sending packets and inspecting with Wireshark, and noticed that nothing was getting through. Then I thought to check the routing table:
$ netstat -rn | grep vmnet8
Destination Gateway Flags Refs Use Netif Expire
172.16.12/24 link#29 UC 2 0 vmnet8 !
172.16.12.255 ff:ff:ff:ff:ff:ff UHLWbI 0 35 vmnet8 !
But unlike other interfaces,
Destination Gateway Flags Refs Use Netif Expire
224.0.0.251 a1:10:5e:50:0:fb UHmLWI 0 732 en0
224.0.0.251 a1:10:5e:50:0:fb UHmLWI 0 0 en8
There was no multicast route. So I added it:
$ sudo route add -host 224.0.0.251 -interface vmnet8
add host 224.0.0.251: gateway vmnet8
And so it was:
$ netstat -rn | grep vmnet8
Destination Gateway Flags Refs Use Netif Expire
172.16.12/24 link#29 UC 2 0 vmnet8 !
172.16.12.255 ff:ff:ff:ff:ff:ff UHLWbI 0 35 vmnet8 !
224.0.0.251 a1:10:5e:50:0:fb UHmLS 0 13 vmnet8
I was also sure to check the interface flags to ensure it had been configured to support multicast:
$ ifconfig vmnet8
vmnet8: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
ether 00:70:61:c0:11:08
inet 172.16.12.1 netmask 0xffffff00 broadcast 172.16.12.255
Still, no multicast packets I send are getting through. I noted that the other interfaces' multicast routes have different flags than the default ones given to my added route, namely UHmLWI vs UHmLS. The differences I can see seem insignificant. From man netstat:
I RTF_IFSCOPE Route is associated with an interface scope
S RTF_STATIC Manually added
W RTF_WASCLONED Route was generated as a result of cloning
Then again, I'm not claiming to be a routing expert. Perhaps a multicast route entry must be made somehow differently?
You'll note that the Use column is non-zero, despite no packets showing in a sniffer.
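One more thing worth noting alongside the routing table: for outgoing multicast, the sending socket can pin the egress interface explicitly with IP_MULTICAST_IF, independently of the unicast routes shown above. A minimal, hypothetical sender sketch (172.16.12.1 is the vmnet8 address from the ifconfig output; the port and payload are just for illustration):

// Hypothetical sender: force mDNS packets out a specific interface with
// IP_MULTICAST_IF instead of relying on the unicast routing table.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return 1;

    // 172.16.12.1 is the vmnet8 address from the ifconfig output above
    in_addr ifaddr{};
    inet_pton(AF_INET, "172.16.12.1", &ifaddr);
    setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &ifaddr, sizeof(ifaddr));

    sockaddr_in dst{};
    dst.sin_family = AF_INET;
    dst.sin_port = htons(5353);                       // mDNS port
    inet_pton(AF_INET, "224.0.0.251", &dst.sin_addr); // mDNS group

    const char payload[] = "test";
    sendto(fd, payload, sizeof(payload), 0,
           reinterpret_cast<const sockaddr*>(&dst), sizeof(dst));
    close(fd);
    return 0;
}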

How can I simulate packet loss using tc netem?

I am trying to simulate a 5% packet loss using the tc tool at server port 1234. Here are my steps:
sudo tc qdisc del dev eth0 root
sudo tc qdisc add dev eth0 root handle 1: prio
sudo tc filter add dev eth0 parent 1: protocol ip prio 1 u32 flowid 1:1 match ip dport 1234 0xffff
sudo tc qdisc add dev eth0 parent 1:1 handle 1: netem loss 5%
There are no errors from the above commands, but when I send any TCP traffic to that port, no packet loss is observed.
What am I doing wrong in the above commands?
Any help is appreciated.
See https://serverfault.com/a/841865/342799 for a similar case.
Commands I have in my testing rig to drop 5.5% of packets:
# tc qdisc add dev eth0 root handle 1: prio priomap 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
# tc qdisc add dev eth0 parent 1:1 handle 10: netem loss 5.5% 25%
# DST_IP=1.2.3.4/32
# tc filter add \
dev eth0 \
parent 1: \
protocol ip \
prio 1 \
u32 \
match ip dst $DST_IP \
flowid 1:1
To confirm, run:
# ping -f -c 1000 $DST_IP
before and after this setup.
Note: almost all hosting providers start throttling your traffic if you do a lot of flood pings.

Packet filter syntax and loopback

I have a tun adapter (OS X) which looks like this:
tun11: flags=8851<UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 10.12.0.2 --> 10.12.0.1 netmask 0xff000000
open (pid 4004)
I send a UDP packet to it:
echo "lol" | nc -4u 10.12.0.1 8000
and I am able to see it with tcpdump:
➜ build git:(master) ✗ sudo tcpdump -i tun11 -vv
tcpdump: listening on tun11, link-type NULL (BSD loopback), capture size 262144 bytes
14:39:16.669055 IP (tos 0x0, ttl 64, id 21714, offset 0, flags [none], proto UDP (17), length 32)
10.12.0.2.55707 > 10.12.0.1.irdmi: [udp sum ok] UDP, length 4
However, I do not see anything when I use a capture filter:
➜ build git:(master) ✗ sudo tcpdump -i tun11 udp -vv
tcpdump: listening on tun11, link-type NULL (BSD loopback), capture size 262144 bytes
The same syntax works fine with an Ethernet adapter:
➜ build git:(master) ✗ sudo tcpdump -i en0 udp -vv
tcpdump: listening on en0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:42:15.010329 IP (tos 0x0, ttl 128, id 7539, offset 0, flags [none], proto UDP (17), length 291)
xxxx.54915 > 10.64.3.255.54915: [udp sum ok] UDP, length 263
I checked man pcap-filter and found an interesting sentence related to capture filters:
Note that this primitive does not chase the protocol header chain.
Is this related to my problem? In any case, why do capture filters (at least the protocol part) not work for loopback adapters, and is there a way to make them work?
Addition
Interestingly, it works with the tun device created by OpenVPN, but I do not understand what the difference is.
tun11: flags=8851<UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 10.12.0.2 --> 10.12.0.1 netmask 0xff000000
open (pid 5792)
utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
inet 198.18.1.214 --> 198.18.1.213 netmask 0xffffffff
inet6 xxxx%utun0 prefixlen 64 optimistic scopeid 0xa
inet6 xxxx::1074 prefixlen 64 tentative
nd6 options=1<PERFORMNUD>
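One way to dig into the difference would be to compile the same filter offline against the BSD loopback link-type and look at the generated BPF program: on DLT_NULL the filter has to test the 4-byte address-family header at the front of each packet, so a tun driver that frames packets differently would never match it. A small libpcap sketch (nothing here is specific to either tun implementation):

// Sketch: compile the "udp" capture filter for the BSD loopback link-type
// (DLT_NULL) and print the resulting BPF program for inspection.
#include <pcap.h>
#include <cstdio>

int main()
{
    pcap_t* dead = pcap_open_dead(DLT_NULL, 65535);
    if (dead == nullptr)
        return 1;

    bpf_program prog{};
    if (pcap_compile(dead, &prog, "udp", 1, PCAP_NETMASK_UNKNOWN) != 0)
    {
        std::fprintf(stderr, "compile failed: %s\n", pcap_geterr(dead));
        pcap_close(dead);
        return 1;
    }

    bpf_dump(&prog, 1);   // print the compiled BPF instructions

    pcap_freecode(&prog);
    pcap_close(dead);
    return 0;
}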

Add route in VPN connection Mac OS X

I have the following routing table:
➜ ~ netstat -nr
Routing tables
Internet:
Destination Gateway Flags Refs Use Netif Expire
default 192.168.0.1 UGSc 63 1 en0
default 10.255.254.1 UGScI 1 0 ppp0
10 ppp0 USc 2 4 ppp0
10.255.254.1 10.255.254.2 UHr 1 0 ppp0
92.46.122.12 192.168.0.1 UGHS 0 0 en0
127 127.0.0.1 UCS 0 0 lo0
127.0.0.1 127.0.0.1 UH 2 62144 lo0
169.254 link#4 UCS 0 0 en0
192.168.0 link#4 UCS 8 0 en0
192.168.0.1 c0:4a:0:2d:18:48 UHLWIir 60 370 en0 974
192.168.0.100 a0:f3:c1:22:1d:6e UHLWIi 1 228 en0 1174
How can I add a gateway (10.25.1.252) for a specific IP (10.12.254.9) inside the VPN?
I tried this command, but with no luck:
sudo route -n add 10.12.0.0/16 10.25.1.252
But traceroute shows that it uses the default gateway:
~ traceroute 10.12.254.9
traceroute to 10.12.254.9 (10.12.254.9), 64 hops max, 52 byte packets
1 10.255.254.1 (10.255.254.1) 41.104 ms 203.766 ms 203.221 ms
Are you using Cisco AnyConnect? Here's a tidbit from https://supportforums.cisco.com/document/7651/anyconnect-vpn-client-faq
Q. How does the AnyConnect client enforce/monitor the
tunnel/split-tunnel policy?
A. AnyConnect enforces the tunnel policy in 2 ways:
1)Route monitoring and repair (e.g. if you change the route table),
AnyConnect will restore it to what was provisioned.
2)Filtering (on platforms that support filter engines). Filtering ensures that even if you could perform some sort of route injection, the filters would block the packets.
Which I interpret as: whenever you change the route table, the Cisco client resets the routes to what your VPN administrator configured.
Your best bet is to talk to your VPN administrator and ask them to add your route.
