Networking

tc

Configure the Linux kernel packet scheduler (Traffic Control).

#networking #linux #traffic-control #qos

Introduction

tc (Traffic Control) is the user-space utility from the iproute2 package used to configure the Linux kernel packet scheduler. It controls how packets are queued, ordered, delayed, and dropped as they leave a network interface.

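A plain "First-In, First-Out" (FIFO) queue sends packets strictly in arrival order. (Historically the kernel's default root qdisc was pfifo_fast, a three-band FIFO; many current distributions default to fq_codel via the net.core.default_qdisc sysctl.) Plain FIFO is a poor fit for mixed workloads (e.g., bulky RTSP video plus latency-sensitive API calls) because small API packets must wait behind every video packet already queued ahead of them (head-of-line blocking). When the queue is oversized, that wait becomes large and persistent, a problem known as bufferbloat.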
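To put a number on head-of-line blocking: a packet that joins a FIFO behind B bytes on a link draining at R kbit/s waits roughly B × 8 / R milliseconds before it is sent. A minimal sketch of that arithmetic follows; the function name queue_delay_ms is hypothetical, and the model ignores framing overhead and assumes the link drains at exactly R.

```shell
# Hypothetical helper: milliseconds a newly queued packet waits behind
# $1 bytes already in a FIFO on a link draining at $2 kbit/s.
# bits / (kbit/s) = milliseconds, so ms = bytes * 8 / kbit (integer result).
queue_delay_ms() {
  echo $(( $1 * 8 / $2 ))
}
```

For example, queue_delay_ms 1000000 10000 prints 800: a 1 MB video burst on a 10 Mbit/s uplink holds an API packet back for about 0.8 s.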

Core Capabilities

  • SHAPING: Delaying packets so the transmission rate stays under a limit (e.g., "Max 5Mbps").
  • SCHEDULING: Reordering packets (e.g., "Send API packets before Video packets").
  • POLICING: Dropping traffic that exceeds a limit (usually ingress).
  • DROPPING: Selectively dropping packets to signal congestion (AQM).

Architecture: The QDisc Hierarchy

Traffic Control relies on three building blocks: QDisc, Class, and Filter.

QDisc (Queueing Discipline)

The "Scheduler." Every interface has a root QDisc that handles egress (outgoing) traffic; classful QDiscs can hold further QDiscs inside their classes.

  • Classless QDiscs: Simple. Do not allow child queues. (e.g., pfifo_fast, fq_codel, tbf, netem).
  • Classful QDiscs: The "Parent." Can contain multiple child classes with different rules. (e.g., prio, htb, hfsc; cbq still appears in older guides but was removed from the kernel in Linux 6.3).

Class

The "Category." Classes exist only inside Classful QDiscs. Example: a prio QDisc with handle 1: has three classes by default: 1:1 = Band 0 (High), 1:2 = Band 1 (Medium), 1:3 = Band 2 (Low).
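The band-to-class mapping is mechanical: for a prio QDisc with handle N:, band b is class N:(b+1), and tc writes the minor number in hexadecimal. The sketch below (the function name prio_classid is hypothetical) computes the class ID a filter's flowid should point at.

```shell
# Hypothetical helper: class ID of a prio band.
# $1 = qdisc handle major (e.g. 1 for "handle 1:"), $2 = 0-based band number.
# tc class minors are hexadecimal, so band 9 becomes minor "a".
prio_classid() {
  printf '%s:%x\n' "$1" $(( $2 + 1 ))
}
```

prio_classid 1 0 prints 1:1 (the flowid used for Band 0 in the PRIO recipe), and prio_classid 1 2 prints 1:3.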

Filter

The "Classifier." Filters look at packet headers (IP, Port, Protocol) and decide which Class the packet belongs to. Most common filter: u32 (matches specific bits in the packet header).


The "PRIO" QDisc (Strict Priority)

Best for: Ensuring critical traffic (VoIP, API) is never delayed by bulk traffic (Video, File Transfers).

The prio QDisc does not shape bandwidth; it manages order. It dequeues packets strictly:

  1. If Band 0 has packets, send them.
  2. If Band 0 is empty, check Band 1.
  3. If Band 1 is empty, check Band 2.

Warning: If Band 0 is flooded (100% bandwidth), Band 1 and 2 will starve completely.
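The strict dequeue rule can be illustrated with a toy simulation. This is a sketch, not kernel code: it assumes three bands, one packet leaving per tick, and arrival events given on stdin as lines of the form "tick band name"; the function name prio_sim is hypothetical.

```shell
# Hypothetical toy model of strict-priority dequeue (not the kernel's code).
# stdin: arrival events "tick band name" (names without spaces or colons).
# $1: number of ticks to simulate; one packet leaves per tick, "-" if idle.
prio_sim() {
  awk -v ticks="$1" '
    # Group arrivals by tick: arrivals[t] = " band:name band:name ..."
    { arrivals[$1] = arrivals[$1] " " $2 ":" $3 }
    END {
      for (t = 0; t < ticks + 0; t++) {
        # Enqueue this tick arrivals at the tail of their band.
        n = split(arrivals[t], ev, " ")
        for (i = 1; i <= n; i++) {
          split(ev[i], f, ":")
          queue[f[1], ++tail[f[1]]] = f[2]
        }
        # Strict priority: serve the lowest-numbered non-empty band.
        sent = "-"
        for (b = 0; b < 3; b++)
          if (head[b] < tail[b]) { sent = queue[b, ++head[b]]; break }
        line = (t == 0) ? sent : line " " sent
      }
      print line
    }'
}
```

Feeding it printf '0 0 api1\n0 2 bulk1\n1 0 api2\n2 1 vid1\n' | prio_sim 4 prints api1 api2 vid1 bulk1: bulk1 arrived first but leaves last. If Band 0 received a new packet on every tick, bulk1 would never be printed, which is the starvation described in the warning.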

Recipe: Prioritizing API over Video

This configuration creates a fast lane for port [port].

Clean existing rules.

tc qdisc del dev [interface] root 2>/dev/null

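Create a PRIO QDisc with 3 bands. The priomap assigns each of the 16 Linux packet priorities (the packet's skb->priority, which the kernel typically derives from the IP TOS field or an application sets with SO_PRIORITY) to a band; the values below are the kernel default.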

tc qdisc add dev [interface] root handle 1: prio bands 3 \
    priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

Create the Filters (The Logic).

Filter A: Match Source Port [port] (API Response) -> Send to Band 0 (FlowID 1:1).

tc filter add dev [interface] protocol ip parent 1:0 prio 1 u32 \
   match ip sport [port] 0xffff \
   flowid 1:1

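Filter B: Match small, pure TCP ACKs -> Send to Band 0. The keys check protocol 6 (TCP), an IP header length of 5 words (20 bytes, no options), an IP total length under 64 bytes (mask 0xffc0 on the length field), and a TCP flags byte (offset 20 + 13 = 33) equal to 0x10, i.e., ACK with no other flag set. Outgoing ACKs matter for download throughput: if they sit behind bulk uploads, the remote sender slows down and incoming transfers stall.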

tc filter add dev [interface] protocol ip parent 1:0 prio 1 u32 \
   match ip protocol 6 0xff \
   match u8 0x05 0x0f at 0 \
   match u16 0x0000 0xffc0 at 2 \
   match u8 0x10 0xff at 33 \
   flowid 1:1
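Each u32 key in Filter B is a masked comparison: the packet bytes at the given offset are ANDed with the mask and compared with the value. The sketch below applies the same four tests to header fields passed in as numbers, which can help when reasoning about what the filter catches. The function name is_small_pure_ack is hypothetical, and like the filter it assumes an IPv4 header without options.

```shell
# Hypothetical helper mirroring Filter B's four u32 keys.
# $1 = IP protocol, $2 = first IP header byte (version + IHL),
# $3 = IP total length in bytes, $4 = TCP flags byte.
# Exit status 0 means "Filter B would match".
is_small_pure_ack() {
  [ $(( $1 & 0xff )) -eq 6 ] &&       # match ip protocol 6 0xff
  [ $(( $2 & 0x0f )) -eq 5 ] &&       # match u8 0x05 0x0f at 0 (IHL = 5 words)
  [ $(( $3 & 0xffc0 )) -eq 0 ] &&     # match u16 0x0000 0xffc0 at 2 (length < 64)
  [ $(( $4 & 0xff )) -eq 16 ]         # match u8 0x10 0xff at 33 (flags == ACK only)
}
```

is_small_pure_ack 6 0x45 52 0x10 succeeds (a 52-byte pure ACK), while a 1500-byte data segment or an ACK that also carries PSH (flags 0x18) does not.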

Packets that match no filter are placed by the priomap according to their priority; ordinary traffic (priority 0) lands in Band 1.
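The priomap in the recipe is a 16-entry table indexed by packet priority. The sketch below (hypothetical function priomap_band) looks up the band for a priority using that default map; priority 0 is ordinary best-effort traffic, 2 is the kernel's "bulk" priority and 6 its "interactive" priority.

```shell
# Default prio/pfifo_fast priomap, indexed by Linux packet priority 0..15.
DEFAULT_PRIOMAP="1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1"

# Hypothetical helper: band chosen for priority $1 by priomap $2
# (defaults to DEFAULT_PRIOMAP).
priomap_band() {
  [ "$1" -ge 0 ] && [ "$1" -le 15 ] || return 1
  set -- "$1" ${2:-$DEFAULT_PRIOMAP}   # word-split the map into $2..$17
  shift $(( $1 + 1 ))                  # drop the priority and the entries before it
  echo "$1"
}
```

priomap_band 0 prints 1, priomap_band 2 prints 2, and priomap_band 6 prints 0.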


The "HTB" QDisc (Hierarchical Token Bucket)

Best for: Bandwidth Guarantees and Limiting. (e.g., "Video gets max 5Mbps, API gets guaranteed 1Mbps").

HTB allows "borrowing." If the API isn't using its 1Mbps, the Video stream can borrow it.
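Borrowing can be pictured with a deliberately simplified model: each class first receives up to its rate, then spare parent bandwidth is offered to the higher-priority class first, capped by its ceil and its actual demand. This is an assumption-laden sketch, not HTB's real algorithm (which works per packet with token buckets, bursts and quantums); the function names htb_share and min are hypothetical, and all values are in kbit/s.

```shell
# Hypothetical helper: smallest of its integer arguments.
min() {
  m=$1
  for v in "$@"; do [ "$v" -lt "$m" ] && m=$v; done
  echo "$m"
}

# Simplified two-class borrowing model (NOT the kernel algorithm).
# $1 = parent rate; $2..$4 = rate ceil demand of class A (higher priority);
# $5..$7 = rate ceil demand of class B. Prints the resulting shares.
htb_share() {
  parent=$1
  a=$(min "$4" "$2"); b=$(min "$7" "$5")    # guaranteed part: up to each rate
  spare=$(( parent - a - b )); [ "$spare" -lt 0 ] && spare=0
  extra=$(min "$spare" $(( $3 - a )) $(( $4 - a )))   # A borrows first
  a=$(( a + extra )); spare=$(( spare - extra ))
  extra=$(min "$spare" $(( $6 - b )) $(( $7 - b )))   # then B
  b=$(( b + extra ))
  echo "A=$a B=$b"
}
```

With the recipe's numbers, htb_share 10000 2000 10000 500 8000 10000 10000 prints A=500 B=9500: an API class using only 500 kbit/s leaves 1500 kbit/s for Video to borrow.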

Recipe: Bandwidth Reservation

Scenario: 10Mbps uplink. We want to guarantee 2Mbps for API and 8Mbps for Video, and let either class borrow unused bandwidth up to the full 10Mbps.
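The scenario's speeds have to be written with tc's rate suffixes, which are easy to misread: mbit means megabits per second, while mbps means megabytes per second (eight times more). The sketch below (hypothetical function tc_rate_bps) converts a whole-number tc rate to bits per second so a config can be sanity-checked; it only handles the SI suffixes listed and treats a bare number as bits per second, as tc does.

```shell
# Hypothetical helper: convert a whole-number tc rate (e.g. 10mbit) to bits/s.
# tc's SI suffixes use powers of 1000; the "...bps" suffixes are BYTES per second.
tc_rate_bps() {
  num=${1%%[!0-9]*}          # leading digits
  unit=${1#"$num"}           # remaining suffix
  [ -n "$num" ] || return 1
  case $unit in
    ""|bit) mult=1 ;;
    kbit)   mult=1000 ;;
    mbit)   mult=1000000 ;;
    gbit)   mult=1000000000 ;;
    bps)    mult=8 ;;
    kbps)   mult=8000 ;;
    mbps)   mult=8000000 ;;
    gbps)   mult=8000000000 ;;
    *) echo "unsupported rate unit: $unit" >&2; return 1 ;;
  esac
  echo $(( num * mult ))
}
```

tc_rate_bps 10mbit prints 10000000, while tc_rate_bps 10mbps prints 80000000, which would be 80 Mbit/s on a 10 Mbit/s uplink.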

Root HTB handle.

tc qdisc add dev [interface] root handle 1: htb default 20

Create the Main Class (total rate 10mbit). In tc, mbit means megabits per second; mbps would mean megabytes per second.

tc class add dev [interface] parent 1: classid 1:1 htb rate 10mbit burst 15k

Create Child Classes.

Class 10: API (High Priority, Guaranteed 2mbit, can borrow up to 10mbit).

tc class add dev [interface] parent 1:1 classid 1:10 htb rate 2mbit ceil 10mbit prio 1

Class 20: Video (Lower Priority, Guaranteed 8mbit, can borrow up to 10mbit).

tc class add dev [interface] parent 1:1 classid 1:20 htb rate 8mbit ceil 10mbit prio 2

Filters. Send source port [port] to Class 10; unmatched traffic goes to Class 20 because of default 20 on the root QDisc.

tc filter add dev [interface] protocol ip parent 1:0 prio 1 u32 match ip sport [port] 0xffff flowid 1:10
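The HTB steps can be collected into one re-runnable script. This is a sketch under assumptions: tc_cmd and apply_htb_recipe are hypothetical names, the interface and port are passed as arguments in place of the [interface] and [port] placeholders, and DRY_RUN defaults to 1 so the commands are printed rather than executed. Set DRY_RUN=0 (as root, or with NET_ADMIN) to apply them.

```shell
# Hypothetical wrapper: print tc commands when DRY_RUN=1 (default), run them otherwise.
tc_cmd() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "tc $*"
  else
    tc "$@"
  fi
}

# Hypothetical helper: the HTB recipe. $1 = interface, $2 = API source port.
apply_htb_recipe() {
  iface=$1 port=$2
  # Start clean so the script can be re-run; ignore "no qdisc" errors.
  tc_cmd qdisc del dev "$iface" root 2>/dev/null || true
  tc_cmd qdisc add dev "$iface" root handle 1: htb default 20
  tc_cmd class add dev "$iface" parent 1: classid 1:1 htb rate 10mbit burst 15k
  tc_cmd class add dev "$iface" parent 1:1 classid 1:10 htb rate 2mbit ceil 10mbit prio 1
  tc_cmd class add dev "$iface" parent 1:1 classid 1:20 htb rate 8mbit ceil 10mbit prio 2
  tc_cmd filter add dev "$iface" protocol ip parent 1:0 prio 1 u32 \
    match ip sport "$port" 0xffff flowid 1:10
}
```

For example, apply_htb_recipe eth0 8080 prints the six commands, and DRY_RUN=0 apply_htb_recipe eth0 8080 runs them (eth0 and 8080 are example values). Deleting the root QDisc first mirrors the cleanup step of the PRIO recipe and avoids "File exists" errors on a second run.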

Network Emulation (netem)

Best for: Developers testing application resilience. You can use tc to simulate bad networks to ensure your retry logic works.

Recipe: Simulate a Bad 4G Connection

Run this inside your container (it needs NET_ADMIN) to see whether your app's timeouts, retries, and handling of errors such as "Socket Hang Up" behave correctly on a slow, lossy link.

Add 100ms delay (+/- 10ms jitter) and 1% packet loss.

tc qdisc add dev [interface] root netem delay 100ms 10ms loss 1%

To remove:

tc qdisc del dev [interface] root
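When reading test results, keep two things in mind. netem on [interface] only affects packets leaving that interface, so the 100ms is added to one direction of each round trip. And the 1% loss is applied per packet, so a response spanning many packets is quite likely to lose at least one and pay for a retransmission. Assuming losses are independent (no correlation argument is given in the recipe), the chance of at least one loss among n packets is 1 - (1 - p)^n; the sketch below (hypothetical function loss_any_pct) computes it with awk.

```shell
# Hypothetical helper: percent chance that at least one of $1 packets is lost
# when each packet is dropped independently with probability $2 percent.
loss_any_pct() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%.1f\n", (1 - (1 - p / 100) ^ n) * 100 }'
}
```

loss_any_pct 100 1 prints 63.4: under the recipe above, roughly two out of three 100-packet transfers will see at least one retransmission.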

The u32 Match Syntax

The u32 filter is powerful but cryptic. It performs bitwise matching on packet headers.

Common Matches

  • Source IP: match ip src 192.168.1.5
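  • Destination Port: match ip dport 80 0xffff (0xffff is the mask. It means "match all bits of the port number"). u32 reads sport/dport at a fixed 20-byte offset, so these shortcuts assume an IP header without options and match any protocol unless combined with match ip protocol.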
  • Protocol: match ip protocol 6 0xff (6 = TCP, 17 = UDP, 1 = ICMP).
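  • Packet Size: match u16 0x0000 0xffc0 at 2 (Total Length field of the IP header; the mask keeps only the bits worth 64 and above, so this matches packets shorter than 64 bytes).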
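When u32 filters are listed, tc shows the port shortcuts as raw 32-bit keys. Because sport and dport share the 32-bit word at offset 20 (source port in the upper 16 bits, destination port in the lower 16), the key can be predicted in advance. The sketch below (hypothetical function u32_port_key) builds the value/mask pair; the printed layout mirrors what tc filter show typically prints for such keys, but treat the exact formatting as an assumption.

```shell
# Hypothetical helper: raw u32 key for "match ip sport|dport PORT 0xffff".
# Both ports live in the 32-bit word at offset 20 (assumes no IP options).
u32_port_key() {
  case $1 in
    sport) printf 'match %08x/ffff0000 at 20\n' $(( $2 << 16 )) ;;
    dport) printf 'match %08x/0000ffff at 20\n' "$2" ;;
    *) echo "expected sport or dport" >&2; return 1 ;;
  esac
}
```

u32_port_key dport 80 prints match 00000050/0000ffff at 20; comparing that with the filter listing is a quick way to confirm a rule landed as intended.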

Monitoring & Debugging

Traffic control rules are invisible unless queried.

View Active QDiscs:

tc -s qdisc show dev [interface]

Look for: Sent X bytes Y pkt (dropped Z, ...) and backlog. If dropped keeps rising, your queue is overflowing or the limits are too aggressive; backlog shows how much is queued right now.
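To watch the drop counter from a script, the statistics can be pulled out of the Sent lines. The sketch below (hypothetical function qdisc_drops) assumes the "Sent ... (dropped N, ...)" layout that tc -s qdisc show prints and reports one line per qdisc; adjust it if your iproute2 version formats the output differently.

```shell
# Hypothetical helper: read "tc -s qdisc show" output on stdin and print
# "<kind> <handle> <dropped>" for each qdisc.
qdisc_drops() {
  awk '
    $1 == "qdisc" { name = $2 " " $3 }          # e.g. "htb 1:"
    $1 == "Sent" {
      for (i = 1; i < NF; i++)
        if ($i == "(dropped") {
          d = $(i + 1); sub(/,$/, "", d)         # "3," -> "3"
          print name, d
        }
    }'
}
```

Typical use is tc -s qdisc show dev [interface] | qdisc_drops, run twice a few seconds apart to see whether drops are still climbing.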

View Active Classes (for HTB/PRIO):

tc -s class show dev [interface]

View Active Filters:

tc -s filter show dev [interface]

Docker & Container Persistence

tc rules are lost when a container restarts or the interface ([interface]) is recreated.

Requirements

  • Privileges: Container must run with privileged: true or --cap-add=NET_ADMIN.
  • Tooling: Image must have iproute2 installed.
  • Timing: Scripts must run after the interface exists.
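A quick way to confirm the privilege requirement from inside the container is to inspect the effective capability mask: CAP_NET_ADMIN is capability number 12, so bit 12 of the CapEff field in /proc/<pid>/status must be set. The sketch below (hypothetical function has_net_admin) reads the current shell's mask, or takes a hex mask as an argument for testing.

```shell
# Hypothetical helper: succeed if CAP_NET_ADMIN (capability bit 12) is in the
# effective set. $1 = optional hex CapEff mask; defaults to this shell's own.
has_net_admin() {
  capeff=$1
  if [ -z "$capeff" ]; then
    capeff=$(awk '$1 == "CapEff:" { print $2 }' "/proc/$$/status")
  fi
  [ -n "$capeff" ] || return 1
  [ $(( (0x$capeff >> 12) & 1 )) -eq 1 ]
}
```

Docker's default capability set (00000000a80425fb) does not include bit 12, so has_net_admin fails there until the container is started with --cap-add=NET_ADMIN. Calling it at the top of an entrypoint lets the script fail fast with a clear message instead of a cryptic tc error.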

Implementation Pattern

A simple, reliable pattern in Docker is to have the entrypoint wait until the interface exists, then apply the rules:

Wait loop in entrypoint.

while ! ip link show [interface] > /dev/null 2>&1; do sleep 0.5; done

Apply rules.

tc qdisc add ...
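The wait loop above never gives up if the interface never appears. A variant with a timeout is sketched below; wait_for_iface is a hypothetical name, and the 30-second default and 0.5-second poll interval are arbitrary choices.

```shell
# Hypothetical helper: wait for interface $1 to exist, polling every 0.5 s,
# for at most $2 seconds (default 30). Returns 1 on timeout.
wait_for_iface() {
  iface=$1
  tries=$(( ${2:-30} * 2 ))
  until ip link show "$iface" > /dev/null 2>&1; do
    if [ "$tries" -le 0 ]; then
      echo "interface $iface did not appear" >&2
      return 1
    fi
    tries=$(( tries - 1 ))
    sleep 0.5
  done
}
```

In an entrypoint this would typically run as wait_for_iface [interface] 30 || exit 1, followed by the tc commands and finally exec "$@", so the application replaces the shell and receives signals directly.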