tcpdump & tshark (almost) fooled me
🖊️ About 2200 words ⏱️ 13 minutes
I’ve been using tcpdump and tshark for years now, and I’ve always trusted them because they’ve helped me fix so many things. But sometimes something reminds you not to trust blindly! Take a moment to pause, grab a ☕, find a 🪑 to sit down, and focus.
Intro
I’m building a new lab based on FreeBSD (you can check out my bio for more details). I need multiple VLANs for security reasons, and I’m trying to replicate what I did on Linux in the past.
High Level Architecture
- The network has three VLANs:
  - vlan140: Clients
  - vlan150: Web Servers
  - vlan160: Database Servers
- host2 is the main L3/inter-VLAN router, gateway and firewall, connected to the Internet router
- host3 hosts all the containers (jails):
  - 1x client jail, jail-c1, on vlan140
  - 1x web server jail, jail-s1, on vlan150
  - 1x database jail, jail-db1, on vlan160
Host2 (router) Network Configuration
- NIC ale0 connected to the Internet router
- NIC em0 for management traffic
- NIC em1 for trunk traffic, with one VLAN sub-interface per VLAN:
  - em1.140 with 10.10.40.254/24 (acting as vlan140 gateway)
  - em1.150 with 10.10.50.254/24 (acting as vlan150 gateway)
  - em1.160 with 10.10.60.254/24 (acting as vlan160 gateway)
This is how the VLAN sub-interfaces and their IP addresses were created:
$ doas ifconfig em1.140 create
$ doas ifconfig em1.150 create
$ doas ifconfig em1.160 create
$ doas ifconfig em1.140 10.10.40.254 netmask 255.255.255.0
$ doas ifconfig em1.150 10.10.50.254 netmask 255.255.255.0
$ doas ifconfig em1.160 10.10.60.254 netmask 255.255.255.0
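The ifconfig commands above only last until the next reboot. A minimal sketch of matching /etc/rc.conf entries for host2, assuming the stock rc.conf(5) VLAN conventions (vlans_<parent> lists the tags, ifconfig_<parent>_<tag> configures each sub-interface):

```shell
# /etc/rc.conf on host2 (sketch): persist the em1 VLAN sub-interfaces.
# The trunk NIC itself carries no IP address; it only has to be up.
ifconfig_em1="up"
# rc creates em1.140, em1.150 and em1.160 at boot from this list.
vlans_em1="140 150 160"
# One gateway address per VLAN, same values as the ifconfig commands above.
ifconfig_em1_140="inet 10.10.40.254/24"
ifconfig_em1_150="inet 10.10.50.254/24"
ifconfig_em1_160="inet 10.10.60.254/24"
```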
Host3 (Jails server) Network Configuration
- NIC em1 for management traffic
- NIC em0 for trunk traffic
This is how the virtual bridge vswitch_lab was created, and how the physical NIC em0 (plugged into a trunk port on the physical switch) was added to it:
$ doas ifconfig bridge0 create
$ doas ifconfig bridge0 name vswitch_lab
$ doas ifconfig vswitch_lab addm em0 up
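Likewise, a sketch of a boot-persistent version of the bridge on host3, assuming rc.conf(5)'s cloned_interfaces and ifconfig_<interface>_name variables (the bridge0 unit number is an assumption):

```shell
# /etc/rc.conf on host3 (sketch): persist the vswitch_lab bridge.
ifconfig_em0="up"                    # trunk NIC: up, no IP address
cloned_interfaces="bridge0"          # create bridge0 at boot...
ifconfig_bridge0_name="vswitch_lab"  # ...rename it to vswitch_lab...
ifconfig_vswitch_lab="addm em0 up"   # ...then attach em0 and bring it up
```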
Containers Management
I’m using the excellent Bastille to deploy jails (a type of OS-level virtualization with very little overhead, similar to LXC or Docker containers on Linux).
I’m also going to use VNET (Virtual Network), which gives a jail its own complete virtualized network stack, with its own network interface, hardware address, routing tables… (each VNET is attached to a prison, which is a pretty neat FreeBSD feature). So, as described in the Bastille documentation, I had to add this to /etc/devfs.rules:
[bastille_vnet=13]
add include $devfsrules_hide_all
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add include $devfsrules_jail
add include $devfsrules_jail_vnet
add path 'bpf*' unhide
And this to /etc/sysctl.conf:
net.link.bridge.pfil_bridge=0
net.link.bridge.pfil_onlyip=0
net.link.bridge.pfil_member=0
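/etc/sysctl.conf is only read at boot. A small sketch to apply the same three values right away; apply_bridge_sysctls and the RUN dry-run hook are hypothetical names (set RUN=echo to print the commands instead of running them through doas):

```shell
#!/bin/sh
# Apply the bridge pfil knobs from /etc/sysctl.conf immediately (sketch).
# RUN defaults to "doas"; RUN=echo turns this into a dry run.
apply_bridge_sysctls() {
    for oid in net.link.bridge.pfil_bridge \
               net.link.bridge.pfil_onlyip \
               net.link.bridge.pfil_member; do
        ${RUN:-doas} sysctl "${oid}=0"
    done
}

# Dry run: RUN=echo apply_bridge_sysctls
```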
Let’s get started by creating the jails with Bastille and adding the corresponding VLANs to the jails’ NICs (thanks to tschettervictor for implementing my suggestion to add a VLAN option to the bastille create command):
$ doas bastille create -B --vlan 140 jail-c1 14.2-RELEASE 10.10.40.1/24 vswitch_lab
$ doas bastille create -B --vlan 150 jail-s1 14.2-RELEASE 10.10.50.1/24 vswitch_lab
$ doas bastille create -B --vlan 160 jail-db1 14.2-RELEASE 10.10.60.1/24 vswitch_lab
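The three commands differ only by VLAN tag, name and address, so they can be driven from a small table. A sketch using exactly the bastille flags above; create_lab_jails and the RUN dry-run hook are hypothetical names:

```shell
#!/bin/sh
# Create the lab jails from a table: VLAN tag, jail name, IP/prefix.
# RUN defaults to "doas"; RUN=echo prints the commands instead (dry run).
create_lab_jails() {
    while read -r vlan name ip; do
        # </dev/null stops the command from swallowing the rest of the table on stdin.
        ${RUN:-doas} bastille create -B --vlan "$vlan" "$name" 14.2-RELEASE "$ip" vswitch_lab </dev/null
    done <<EOF
140 jail-c1 10.10.40.1/24
150 jail-s1 10.10.50.1/24
160 jail-db1 10.10.60.1/24
EOF
}
```

`RUN=echo create_lab_jails` prints the three commands without creating anything; adding a jail is one more line in the table.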
Checking the results:
$ doas bastille list -a
JID State IP Address Published Ports Hostname Release Path
21 Up 10.10.40.1 - jail-c1 14.2-RELEASE /usr/local/bastille/jails/jail-c1/root
20 Up 10.10.60.1 - jail-db1 14.2-RELEASE /usr/local/bastille/jails/jail-db1/root
10 Up 10.10.50.1 - jail-s1 14.2-RELEASE /usr/local/bastille/jails/jail-s1/root
Setup validation
Let’s ping jail-s1 from jail-c1 to confirm end-to-end connectivity:
root@jail-c1:~ # ping 10.10.50.1
ping: sendto: Host is down
...
👎 KO
Let’s ping the default gateway:
root@jail-c1:~ # ping 10.10.40.254
ping: sendto: Host is down
...
👎 Nada!
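Pinging hop by hop, nearest first, shows where the path breaks. A sketch using the base system's jexec(8) to run ping inside the jail; ping_path and the RUN dry-run hook are hypothetical names, and it relies on ping exiting non-zero when no reply comes back:

```shell
#!/bin/sh
# Ping each address from inside a jail, in order, and stop at the first failure.
# RUN defaults to "doas"; RUN=echo prints the commands instead (dry run).
ping_path() {
    jail=$1
    shift
    for ip in "$@"; do
        if ${RUN:-doas} jexec "$jail" ping -c 3 "$ip"; then
            echo "OK   $ip"
        else
            echo "FAIL $ip"
            return 1
        fi
    done
}

# Gateway first, then jail-s1 on the other VLAN:
#   ping_path jail-c1 10.10.40.254 10.10.50.1
```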
Preliminary check-up to avoid classic mistakes
Physical Switch
- ✅ trunk ports are configured correctly
- ✅ vlan140, vlan150 & vlan160 are allowed on my trunk ports
Physical Servers Host2 (router/gateway) & Host3 (jails server)
- ✅ gateway (IP Forwarding) is enabled on host2
- ✅ firewall (pf) rules on host2 allow traffic between these VLANs and networks
- ✅ virtual bridge vswitch_lab is set up correctly
Jails
- ✅ devfs and rules are set up correctly
- ✅ vnet0, vnet0.140 & vnet0.150 are UP and have the correct IPs
- ✅ gateway is set up correctly
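Two of those checks can be scripted instead of eyeballed. A sketch, assuming FreeBSD's net.inet.ip.forwarding sysctl (set to 1 by gateway_enable="YES") and the "gateway:" line printed by route -n get default; check_forwarding, default_gateway and the SYSCTL/ROUTE overrides are hypothetical names:

```shell
#!/bin/sh
# SYSCTL and ROUTE default to the real tools and can be overridden (e.g. stubs).

# host2: inter-VLAN routing needs IPv4 forwarding enabled (value 1).
check_forwarding() {
    [ "$(${SYSCTL:-sysctl} -n net.inet.ip.forwarding)" = "1" ]
}

# Inside a jail: print the default gateway from the routing table.
default_gateway() {
    ${ROUTE:-route} -n get default | awk '$1 == "gateway:" { print $2 }'
}

# host2:   check_forwarding && echo "forwarding on"
# jail-c1: [ "$(default_gateway)" = 10.10.40.254 ] && echo "gateway OK"
```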
Troubleshooting
The methodology I’ve used, from simple to advanced:
➡️ Drawing the traffic flow & breaking it down into steps
Sometimes that alone is enough to spot a simple & obvious problem…
🟣 shows the traffic leaving jail-c1 through vnet0.140 (vlan140 on top of vnet0).
Also, vnet0 is a friendly name for epair0b, the jail side of an epair. An epair works like a patch cable: its other end, epair0a, stays on the host and is plugged into the virtual bridge.
So, everything sent into vnet0 (epair0b) comes out of the other end of the cable (epair0a).
Since jail-c1 and jail-s1 are in different VLANs, the traffic is sent to jail-c1’s default gateway (em1.140@host2 on 10.10.40.254).
The traffic passes through NIC em0@host3 and goes into the physical switch trunk port, where the VLANs are allowed.
To reach that gateway, jail-c1 first broadcasts ARP Requests, which are received on em1 (and em1.140) @host2.
The return traffic then flows in the opposite direction, in 🔵…
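To see those patch cables from the host side, list the bridge members on host3: you should get the host-side epair ends plus em0. A sketch assuming FreeBSD's ifconfig prints one "member: <ifname> flags=…" line per bridge port; bridge_members and the IFCONFIG override are hypothetical names:

```shell
#!/bin/sh
# Print the interfaces attached to a bridge (default: vswitch_lab), one per line.
# IFCONFIG defaults to the real ifconfig and can be overridden (e.g. a stub).
bridge_members() {
    ${IFCONFIG:-ifconfig} "${1:-vswitch_lab}" | awk '$1 == "member:" { print $2 }'
}

# host3: bridge_members vswitch_lab
```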
➡️ Running tcpdump on vnet0@jail-c1 directly
Let’s validate the traffic flow from the jail’s point of view:
root@jail-c1:~ # tcpdump -i vnet0 -e
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vnet0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:26:17.599708 02:36:e7:5d:33:0b (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 140, p 0, ethertype ARP (0x0806), Request who-has 10.10.40.254 tell 10.10.40.1, length 28
21:26:18.662585 02:36:e7:5d:33:0b (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 140, p 0, ethertype ARP (0x0806), Request who-has 10.10.40.254 tell 10.10.40.1, length 28
...
I’m using the -e option to print the link-level header, which includes each packet’s VLAN ID.
ARP Requests are sent to the right VLAN ID/broadcast domain to find the default gateway’s MAC address, but my jail never receives any ARP Response, which is why my pings get no answer. But why?
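When comparing several capture points, counting requests against replies makes the asymmetry obvious. A sketch that reads tcpdump -e text output on stdin and summarizes ARP per VLAN; arp_summary is a hypothetical helper, and the awk patterns assume the output format shown above:

```shell
#!/bin/sh
# Count ARP Requests vs. Replies per VLAN in `tcpdump -e` text output (stdin).
# Possible feed (BPF filter: ARP inside VLAN 140):
#   doas tcpdump -n -e -i em0 -c 20 'vlan 140 and arp' | arp_summary
arp_summary() {
    awk '
        /ethertype ARP/ {
            vlan = "untagged"
            for (i = 1; i < NF; i++) {
                if ($i == "vlan") { vlan = $(i + 1); sub(/,$/, "", vlan) }
            }
            seen[vlan] = 1
            if (/Request who-has/) { req[vlan]++ } else if (/Reply /) { rep[vlan]++ }
        }
        END {
            for (v in seen)
                printf "vlan %s: %d request(s), %d reply(ies)\n", v, req[v], rep[v]
        }
    '
}
```

A capture point reporting requests but zero replies is exactly the symptom seen on vnet0@jail-c1.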
➡️ Running tcpdump on em0@host3
Let’s take a look at the IN/OUT traffic from host3’s point of view to rule out any potential virtual/bridge issue…
I’m running the same tcpdump command, but this time not inside the jail: it’s on em0@host3, the NIC connecting the bridge to the real switch:
blt at host3 in ~$ doas tcpdump -i em0 -e
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on em0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:33:07.147507 02:36:e7:5d:33:0b (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 140, p 0, ethertype ARP (0x0806), Request who-has 10.10.40.254 tell 10.10.40.1, length 28
21:33:08.162309 02:36:e7:5d:33:0b (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 140, p 0, ethertype ARP (0x0806), Request who-has 10.10.40.254 tell 10.10.40.1, length 28
...
Traffic is properly sent to the bridge and to em0@host3, so far, so good!
➡️ Running tcpdump on em1@host2
Let’s take a look at the other side on host2, so we can confirm the switch is doing its job properly:
blt at host2 in ~$ doas tcpdump -i em1 -e
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on em1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:35:41.268374 02:36:e7:5d:33:0b (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 60: vlan 140, p 0, ethertype ARP (0x0806), Request who-has 10.10.40.254 tell 10.10.40.1, length 42
21:35:41.268393 00:0e:0c:d0:b0:b1 (oui Unknown) > 02:36:e7:5d:33:0b (oui Unknown), ethertype 802.1Q (0x8100), length 46: vlan 140, p 0, ethertype ARP (0x0806), Reply 10.10.40.254 is-at 00:0e:0c:d0:b0:b1 (oui Unknown), length 28
21:35:42.331396 02:36:e7:5d:33:0b (oui Unknown) > Broadcast, ethertype 802.1Q (0x8100), length 60: vlan 140, p 0, ethertype ARP (0x0806), Request who-has 10.10.40.254 tell 10.10.40.1, length 42
21:35:42.331416 00:0e:0c:d0:b0:b1 (oui Unknown) > 02:36:e7:5d:33:0b (oui Unknown), ethertype 802.1Q (0x8100), length 46: vlan 140, p 0, ethertype ARP (0x0806), Reply 10.10.40.254 is-at 00:0e:0c:d0:b0:b1 (oui Unknown), length 28
...
Not only is the switch working perfectly fine, with em1@host2 receiving the ARP Request packets, but we can also see the long-awaited ARP Responses…
➡️ Out of ideas…
I installed tshark, and got the exact same results as tcpdump.
Forums, Stackoverflow, Bugzilla…
I thought I had found the cause in a Bugzilla ticket that was very similar to my issue, so I decided to check my ARP entries and force them.
➡️ Checking ARP entries @ jail-c1 and forcing them
root@jail-c1:~ # arp -n 10.10.40.254
10.10.40.254 (10.10.40.254) -- no entry
What if I manually add the entry ?
root@jail-c1:~ # arp -s 10.10.40.254 00:0e:0c:d0:b0:b1
root@jail-c1:~ # arp -n 10.10.40.254
? (10.10.40.254) at 00:0e:0c:d0:b0:b1 on vnet0.140 permanent [vlan]
Let’s ping the default gateway from jail-c1 again and check tcpdump on em1@host2:
root@jail-c1:~ # ping 10.10.40.254
PING 10.10.40.254 (10.10.40.254): 56 data bytes
--- 10.10.40.254 ping statistics ---
30 packets transmitted, 0 packets received, 100.0% packet loss
100% packet loss…
blt at host2 in ~$ doas tcpdump -i em1 -e
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on em1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:49:05.954190 02:36:e7:5d:33:0b (oui Unknown) > 00:0e:0c:d0:b0:b1 (oui Unknown), ethertype 802.1Q (0x8100), length 102: vlan 140, p 0, ethertype IPv4 (0x0800), 10.10.40.1 > 10.10.40.254: ICMP echo request, id 46251, seq 0, length 64
21:49:05.954221 00:0e:0c:d0:b0:b1 (oui Unknown) > 02:36:e7:5d:33:0b (oui Unknown), ethertype 802.1Q (0x8100), length 102: vlan 140, p 0, ethertype IPv4 (0x0800), 10.10.40.254 > 10.10.40.1: ICMP echo reply, id 46251, seq 0, length 64
It’s no surprise that ICMP Requests now reach the default gateway: after all, jail-c1 now knows where to send them. host2 is answering with ICMP Responses, but they are seen by neither host3 nor jail-c1.
This confirms that I’m not hitting the ARP bug I found previously…
➡️ Cross checking with TAP and dedicated packet capture probe
A TAP (Test Access Port) is like a water tap for packets! This small physical device sits in the “middle” of a link, and its output is connected to a dedicated passive packet capture probe. This makes it possible to follow the same packet end-to-end from multiple capture points, so there is no room left for interpretation.
ARP Responses are seen correctly on TAP 1 (packet #55 below), connected between the physical switch and em1@host2.
ARP Responses are also observed on TAP 2 (packet #56 below), connected between the physical switch and em0@host3, while tcpdump, capturing on that very same interface, didn’t see a single ARP Response!

➡️ So what ?
To sum things up, the traffic goes out correctly ✅ and comes back correctly ✅ (confirmed by the external probe)… but it’s not seen by tcpdump, tshark, my host, the virtual bridge or the jail 😡?
There’s definitely something in the system that’s silently filtering ✂️ all the responses when the packets are VLAN tagged.
We confirmed that the packets are received by the physical NIC em0@host3, but they never make it up to the OS level…
Solution
The external probe shows the packets but the system doesn’t, so they must disappear in the NIC itself. Let’s take a look at ifconfig em0 on host3:
blt at host3 in ~$ doas ifconfig em0
em0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
options=8120b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,HWSTATS>
ether 08:60:6e:44:cd:d0
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
And HERE my eyes 👀 were suddenly OPENED: VLAN_HWFILTER
After some digging on Bugzilla, it looks like VLAN hardware offloading (here, hardware VLAN filtering) has some bugs… Let’s disable it.
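The options= value is a hex bitmask, and the names in angle brackets are its decoded bits. A sketch of that decoding, assuming the bit values of FreeBSD's IFCAP_* constants from net/if.h (only a subset is listed, anything else is reported as hex); decode_ifcap is a hypothetical helper:

```shell
#!/bin/sh
# Decode an ifconfig "options=" hex mask (without the 0x prefix) into flag names.
decode_ifcap() {
    mask=$((0x$1))
    for pair in 0x1:RXCSUM 0x2:TXCSUM 0x8:VLAN_MTU 0x10:VLAN_HWTAGGING \
                0x20:JUMBO_MTU 0x80:VLAN_HWCSUM 0x100:TSO4 0x200:TSO6 \
                0x400:LRO 0x2000:WOL_MAGIC 0x10000:VLAN_HWFILTER \
                0x40000:VLAN_HWTSO 0x800000:HWSTATS; do
        bit=$((${pair%%:*}))
        if [ $((mask & bit)) -ne 0 ]; then
            echo "${pair#*:}"
            mask=$((mask & ~bit))   # clear the bits we could name
        fi
    done
    # Whatever is left is not in the table above.
    [ "$mask" -eq 0 ] || printf 'unknown bits: 0x%x\n' "$mask"
}

# decode_ifcap 8120b8
```

With these values, 8120b8 decodes to exactly the seven flags ifconfig printed, VLAN_HWFILTER being bit 0x10000; once that bit is cleared, the mask should read 8020b8.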
$ doas ifconfig em0 -vlanhwfilter
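A quick way to confirm the flag is really gone afterwards. has_vlan_hwfilter and the IFCONFIG override are hypothetical names, and the pattern assumes the options=<hex><FLAG,FLAG,…> format shown above:

```shell
#!/bin/sh
# Succeed if the interface's "options=" line still lists VLAN_HWFILTER.
# IFCONFIG defaults to the real ifconfig and can be overridden (e.g. a stub).
has_vlan_hwfilter() {
    ${IFCONFIG:-ifconfig} "${1:-em0}" |
        grep -Eq '^[[:space:]]*options=[0-9a-f]+<([^>]*,)?VLAN_HWFILTER(,[^>]*)?>'
}

# host3:
#   if has_vlan_hwfilter em0; then echo "still enabled"; else echo "disabled"; fi
```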
Let’s ping jail-s1 from jail-c1 again to confirm!
root@jail-c1:~ # ping 10.10.50.1
64 bytes from 10.10.50.1: icmp_seq=0 ttl=63 time=0.628 ms
...
To make it permanent, add this to /etc/rc.conf:
ifconfig_em0="up -vlanhwfilter"
If needed, I can also disable all hardware offloading; at least it’s worth keeping in mind:
$ doas ifconfig em0 -rxcsum -txcsum -tso -lro -vlanhwtso -vlanhwfilter
Conclusion
The traffic was hitting the NIC, and capturing it externally really helped to understand the issue.
The traffic never reached the OS level because it was filtered at the NIC level, so it was invisible to the OS and to tools like tcpdump/tshark…
I might have found the issue differently, but this was the quickest way for me.
Don’t ever blindly trust your (best|old|loyal) tools! They weren’t actually failing; they were blind, because the traffic was stopped at the NIC level.
The shocking part was seeing the responses on the external probe but not in the capture taken on the physical interface from the OS!
Monitor your network not from just one spot, but from as many measurement points as you can; it will make your troubleshooting so much smoother and faster!