tc
Configure the Linux kernel packet scheduler (Traffic Control).
Introduction
tc (Traffic Control) is the user-space utility used to configure the Linux kernel packet scheduler. It controls how packets are queued, ordered, delayed, and dropped as they leave your network interface.
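By default, Linux has traditionally used pfifo_fast, which behaves essentially as a "First-In, First-Out" (FIFO) queue (many current distributions switch the default to fq_codel). While simple, a plain FIFO is a poor fit for mixed workloads (e.g., bulky RTSP video + fragile API calls) because a burst of video packets can sit ahead of critical control signals in the queue. This is a form of Head-of-Line Blocking, and when the queue is oversized the resulting latency is known as Bufferbloat.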
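To see which queueing discipline an interface is actually using, the output of tc qdisc show can be parsed. The sketch below is illustrative only: the helper name root_qdisc_kind is made up for this example, and it assumes the usual "qdisc KIND HANDLE root ..." output layout.

```shell
# root_qdisc_kind: read `tc qdisc show dev <iface>` output on stdin and
# print the kind (e.g. pfifo_fast, fq_codel) of the root qdisc.
# Hypothetical helper; assumes the "qdisc KIND HANDLE root ..." layout.
root_qdisc_kind() {
    awk '$1 == "qdisc" && $4 == "root" { print $2; exit }'
}

# Usage (requires iproute2):
#   tc qdisc show dev [interface] | root_qdisc_kind
```

If the result is a multi-queue wrapper such as mq, the per-queue child qdiscs listed below it are the ones doing the actual queueing.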
Core Capabilities
- SHAPING: Limiting transmission rate (e.g., "Max 5Mbps").
- SCHEDULING: Reordering packets (e.g., "Send API packets before Video packets").
- POLICING: Dropping traffic that exceeds a limit (usually ingress).
- DROPPING: Selectively dropping packets to signal congestion (AQM).
Architecture: The QDisc Hierarchy
Traffic Control relies on three building blocks: QDisc, Class, and Filter.
QDisc (Queueing Discipline)
The "Scheduler." It lives on the interface root (egress).
- Classless QDiscs: Simple. Do not allow child queues. (e.g., pfifo_fast, fq_codel, tbf, netem).
- Classful QDiscs: The "Parent." Can contain multiple child classes with different rules. (e.g., prio, htb, cbq).
Class
The "Category." Classes exist only inside Classful QDiscs.
Example: A prio QDisc might have three classes: Band 0 (High), Band 1 (Medium), Band 2 (Low).
Filter
The "Classifier." Filters look at packet headers (IP, Port, Protocol) and decide which Class the packet belongs to.
Most common filter: u32 (matches specific bits in the packet header).
The "PRIO" QDisc (Strict Priority)
Best for: Ensuring critical traffic (VoIP, API) is never delayed by bulk traffic (Video, File Transfers).
The prio QDisc does not shape bandwidth; it manages order. It dequeues packets strictly:
- If Band 0 has packets, send them.
- If Band 0 is empty, check Band 1.
- If Band 1 is empty, check Band 2.
Warning: If Band 0 is flooded (100% bandwidth), Band 1 and 2 will starve completely.
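The strict dequeue order can be illustrated with a toy simulation. This is only a sketch of the rule described above, not kernel code: the function name prio_dequeue_order is invented for the example, and it drains three fixed-size queues with no new arrivals.

```shell
# prio_dequeue_order N0 N1 N2: print the band numbers in the order a
# strict-priority scheduler would drain queues holding N0, N1 and N2 packets.
# Toy model: no packets arrive while the queues drain.
prio_dequeue_order() {
    b0=$1 b1=$2 b2=$3
    order=""
    while [ $((b0 + b1 + b2)) -gt 0 ]; do
        if [ "$b0" -gt 0 ]; then
            order="$order 0"; b0=$((b0 - 1))     # Band 0 always wins
        elif [ "$b1" -gt 0 ]; then
            order="$order 1"; b1=$((b1 - 1))     # only when Band 0 is empty
        else
            order="$order 2"; b2=$((b2 - 1))     # only when Bands 0 and 1 are empty
        fi
    done
    echo $order   # unquoted on purpose: drops the leading space
}

# prio_dequeue_order 2 1 1    prints: 0 0 1 2
```

In the real scheduler packets keep arriving, so a Band 0 that never empties means the lower branches are never reached. That is the starvation the warning describes.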
Recipe: Prioritizing API over Video
This configuration creates a fast lane for port [port].
Clean existing rules.
tc qdisc del dev [interface] root 2>/dev/null
Create PRIO QDisc with 3 bands. Map Linux kernel priorities (TOS) to these bands (default mapping used here).
tc qdisc add dev [interface] root handle 1: prio bands 3 \
priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Create the Filters (The Logic).
Filter A: Match Source Port [port] (API Response) -> Send to Band 0 (FlowID 1:1).
tc filter add dev [interface] protocol ip parent 1:0 prio 1 u32 \
match ip sport [port] 0xffff \
flowid 1:1
Filter B: Match TCP ACKs (Protocol 6, small packets) -> Send to Band 0. Outgoing ACKs acknowledge data you are receiving; if they are stuck behind bulk uploads, the remote sender slows down and download throughput drops.
tc filter add dev [interface] protocol ip parent 1:0 prio 1 u32 \
match ip protocol 6 0xff \
match u8 0x05 0x0f at 0 \
match u16 0x0000 0xffc0 at 2 \
match u8 0x10 0xff at 33 \
flowid 1:1
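The three byte-level matches in Filter B can be read as plain bitwise tests. The sketch below restates them in shell arithmetic to show what each mask checks; the function name is_small_pure_ack and its argument order are assumptions made for this illustration, not part of tc.

```shell
# is_small_pure_ack FIRST_BYTE TOTAL_LEN TCP_FLAGS
# Returns 0 (true) when the filter's three header checks would all pass.
# FIRST_BYTE is byte 0 of the IP header (version + header length), e.g. 0x45.
is_small_pure_ack() {
    # match u8 0x05 0x0f at 0: header length is 5 words (20 bytes, no options)
    [ $(( $1 & 0x0f )) -eq 5 ] || return 1
    # match u16 0x0000 0xffc0 at 2: upper bits of Total Length are zero (< 64 bytes)
    [ $(( $2 & 0xffc0 )) -eq 0 ] || return 1
    # match u8 0x10 0xff at 33: ACK is the only TCP flag set
    [ $(( $3 & 0xff )) -eq 16 ] || return 1
    return 0
}
```

Offset 33 is the TCP flags byte only when the IP header is exactly 20 bytes (20 + 13), which is why the header-length check comes first.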
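All other traffic is placed by the priomap according to its kernel priority (derived from TOS): ordinary best-effort traffic lands in Band 1, bulk traffic in Band 2, and traffic marked interactive also reaches Band 0.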
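The priomap is a 16-entry lookup table: position N holds the band used for kernel priority N. A small sketch makes the mapping easy to inspect; the helper name band_for_priority is hypothetical, and the PRIOMAP value is simply the one from the recipe.

```shell
# The map used in the recipe (also the prio qdisc default).
PRIOMAP="1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1"

# band_for_priority N: print the band that kernel priority N (0-15) maps to.
band_for_priority() {
    echo "$PRIOMAP" | awk -v n="$1" '{ print $(n + 1) }'
}

# band_for_priority 0    prints: 1   (best effort)
# band_for_priority 2    prints: 2   (bulk)
# band_for_priority 6    prints: 0   (interactive)
```

Note that the printed band is zero-based, while the matching class handle is one-based: Band 0 is class 1:1, Band 1 is 1:2, Band 2 is 1:3.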
The "HTB" QDisc (Hierarchical Token Bucket)
Best for: Bandwidth Guarantees and Limiting. (e.g., "Video gets max 5Mbps, API gets guaranteed 1Mbps").
HTB allows "borrowing." If the API isn't using its 1Mbps, the Video stream can borrow it.
Recipe: Bandwidth Reservation
Scenario: 10Mbps Uplink. We want to guarantee 2Mbps for API and 8Mbps for Video, while letting either class borrow idle bandwidth up to the full 10Mbps.
Root HTB handle.
tc qdisc add dev [interface] root handle 1: htb default 20
Create the Main Class (Total speed 10mbps).
tc class add dev [interface] parent 1: classid 1:1 htb rate 10mbit burst 15k
Create Child Classes.
Class 10: API (High Priority, Guaranteed 2mbit, can borrow up to 10mbit).
tc class add dev [interface] parent 1:1 classid 1:10 htb rate 2mbit ceil 10mbit prio 1
Class 20: Video (Lower Priority, Guaranteed 8mbit, can borrow up to 10mbit).
tc class add dev [interface] parent 1:1 classid 1:20 htb rate 8mbit ceil 10mbit prio 2
Filters. Send Port [port] to Class 10.
tc filter add dev [interface] protocol ip parent 1:0 prio 1 u32 match ip sport [port] 0xffff flowid 1:10
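The recipe's steps can be collected into one function so they are applied in order and can be previewed first. This is a sketch under assumptions: the TC variable and the function name apply_htb are invented here, and the initial delete mirrors the cleanup step from the PRIO recipe.

```shell
TC="${TC:-tc}"   # set TC="echo tc" to print the commands instead of running them

# apply_htb DEV PORT: apply the HTB recipe to DEV, classifying source port PORT
# as API traffic. Stops at the first command that fails.
apply_htb() {
    dev=$1 port=$2
    # Remove any existing root qdisc; ignore the error if there is none.
    $TC qdisc del dev "$dev" root 2>/dev/null
    $TC qdisc add dev "$dev" root handle 1: htb default 20 &&
    $TC class add dev "$dev" parent 1: classid 1:1 htb rate 10mbit burst 15k &&
    $TC class add dev "$dev" parent 1:1 classid 1:10 htb rate 2mbit ceil 10mbit prio 1 &&
    $TC class add dev "$dev" parent 1:1 classid 1:20 htb rate 8mbit ceil 10mbit prio 2 &&
    $TC filter add dev "$dev" protocol ip parent 1:0 prio 1 u32 \
        match ip sport "$port" 0xffff flowid 1:10
}

# Preview:  TC="echo tc"; apply_htb eth0 8080
# Apply:    apply_htb eth0 8080      (needs root or CAP_NET_ADMIN)
```

The child rates (2mbit + 8mbit) add up to the parent's 10mbit, so both guarantees can be honored at the same time.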
Network Emulation (netem)
Best for: Developers testing application resilience.
You can use tc to simulate bad networks to ensure your retry logic works.
Recipe: Simulate a Bad 4G Connection
Run this inside your container to see if your app handles timeouts, retries, and dropped connections (e.g., "Socket Hang Up") correctly.
Add 100ms delay (+/- 10ms jitter) and 1% packet loss.
tc qdisc add dev [interface] root netem delay 100ms 10ms loss 1%
To remove:
tc qdisc del dev [interface] root
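Because a forgotten netem rule keeps degrading the interface, it can help to wrap the add and remove steps around the test command. The following is a minimal sketch: the TC variable and the with_netem name are assumptions for this example, and the netem parameters are the ones from the recipe.

```shell
TC="${TC:-tc}"   # set TC="echo tc" for a dry run

# with_netem DEV CMD [ARGS...]: run CMD while DEV has 100ms +/- 10ms delay
# and 1% loss, then remove the rule and return CMD's exit status.
with_netem() {
    dev=$1; shift
    $TC qdisc add dev "$dev" root netem delay 100ms 10ms loss 1% || return 1
    status=0
    "$@" || status=$?             # remember the test's result
    $TC qdisc del dev "$dev" root # always clean up
    return "$status"
}

# Usage:
#   with_netem [interface] ./run-integration-tests.sh    # hypothetical test script
```

This does not survive the shell being killed mid-test; in that case remove the rule manually with the delete command above.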
The u32 Match Syntax
The u32 filter is powerful but cryptic. It performs bitwise matching on packet headers.
Common Matches
- Source IP: match ip src 192.168.1.5
- Destination Port: match ip dport 80 0xffff (0xffff is the mask. It means "match all bits of the port number").
- Protocol: match ip protocol 6 0xff (6 = TCP, 17 = UDP, 1 = ICMP).
- Packet Size: match u16 0x0000 0xffc0 at 2 (Tests the Total Length field at byte offset 2 of the IP header; with this mask it matches packets shorter than 64 bytes).
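Every u32 match follows the same rule: the packet field is ANDed with the mask and compared with the value. The sketch below expresses that rule in shell arithmetic; the helper name u32_matches is made up for the illustration and is not a tc command.

```shell
# u32_matches FIELD VALUE MASK: true when FIELD equals VALUE on every bit
# that is set in MASK (bits cleared in MASK are ignored).
u32_matches() {
    [ $(( $1 & $3 )) -eq $(( $2 & $3 )) ]
}

# Full mask: only port 80 matches.
#   u32_matches 80 80 0xffff      -> true
#   u32_matches 8080 80 0xffff    -> false
# Partial mask: the low 4 bits are ignored, so 8080 0xfff0 covers 8080-8095.
#   u32_matches 8095 8080 0xfff0  -> true
#   u32_matches 8096 8080 0xfff0  -> false
```

A partial mask only describes ranges that line up on a power-of-two boundary; arbitrary port ranges need several matches.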
Monitoring & Debugging
Traffic control rules are invisible unless queried.
View Active QDiscs:
tc -s qdisc show dev [interface]
Look for the "Sent X bytes Y pkt (dropped Z, overlimits ...)" line. If dropped keeps climbing, your queue is full or limits are too aggressive.
View Active Classes (for HTB/PRIO):
tc -s class show dev [interface]
View Active Filters:
tc -s filter show dev [interface]
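For scripted checks, the dropped counter can be pulled out of the statistics output. This is a rough sketch that assumes the "(dropped N, overlimits ...)" layout printed by tc -s; the function name total_dropped is hypothetical.

```shell
# total_dropped: read `tc -s qdisc show dev <iface>` output on stdin and
# print the sum of all "dropped N" counters (0 if none are found).
total_dropped() {
    sed -n 's/.*(dropped \([0-9][0-9]*\),.*/\1/p' |
        awk '{ sum += $1 } END { print sum + 0 }'
}

# Usage:
#   tc -s qdisc show dev [interface] | total_dropped
```

Sampling this value twice and comparing the results shows whether drops are still happening or are left over from an earlier burst.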
Docker & Container Persistence
tc rules are lost when a container restarts or the interface ([interface]) is recreated.
Requirements
- Privileges: Container must run with privileged: true or --cap-add=NET_ADMIN.
- Tooling: Image must have iproute2 installed.
- Timing: Scripts must run after the interface exists.
Implementation Pattern
A dependable way to apply rules in Docker is to wait for the interface to exist before running tc:
Wait loop in entrypoint.
while ! ip link show [interface] > /dev/null 2>&1; do sleep 0.5; done
Apply rules.
tc qdisc add ...
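A wait loop without a limit can hang the entrypoint forever if the interface never appears. One possible variant adds a timeout. This sketch makes two assumptions of its own: it checks for the interface under /sys/class/net instead of calling ip link show, and the names wait_for_iface and SYS_NET are invented for the example.

```shell
SYS_NET="${SYS_NET:-/sys/class/net}"   # overridable, mainly for testing

# wait_for_iface DEV [TIMEOUT_SECONDS]: return 0 as soon as DEV exists,
# or 1 if it has not appeared after TIMEOUT_SECONDS (default 30).
wait_for_iface() {
    dev=$1 remaining=${2:-30}
    while [ ! -e "$SYS_NET/$dev" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage in an entrypoint:
#   wait_for_iface [interface] 30 || { echo "interface never appeared" >&2; exit 1; }
#   tc qdisc add ...
```

Failing fast with a clear message is usually easier to debug than a container that starts without its traffic rules.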