Breaking the Handshake: Network Protocol Fuzzing
If you have ever run a fuzzer like AFL++ on a file parser (like libpng or
ffmpeg), you know the drill: you throw mutated bits at a binary until it
crashes. But try to point that same logic at a stateful network server, and you
will hit a wall.
Network fuzzing is a systematic approach to discovering security and robustness flaws in software that communicates over structured protocols. It differs from file fuzzing or command-line fuzzing because network protocols involve timing, ordering, and interaction among multiple peers. Recent surveys cover the landscape of protocol fuzzing and stateful system testing in depth [1] [2].
In this post we’re going to look at why naïve random mutation fails against network protocols, and how modern stateful fuzzing architectures address this using structure modeling and deterministic harnessing. We will also look at practical examples targeting the MQTT protocol [3].
Here’s the flow:
- Why network fuzzing fails with naïve mutation.
- How to model syntax and enforce determinism.
- Three practical MQTT setups: white-box, black-box, and gray-box.
The Problem: It’s Not Just About the Bytes
In file fuzzing, the input is static. In network fuzzing, the input is a conversation.
As defined in modern System Under Test (SUT) models, a network protocol is a
state machine. If your fuzzer sends a valid GET_DATA packet, but sends it
before the HANDSHAKE is complete, the server won’t crash; it will just close
the connection. The server logic is correct; the fuzzer is just “speaking out
of turn.”
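To make “speaking out of turn” concrete, here is a minimal sketch of a toy server state machine. The ToyServer class and its return strings are invented for illustration; HANDSHAKE and GET_DATA are the hypothetical message names from the paragraph above, not part of any real protocol.

```python
class ToyServer:
    """Hypothetical two-state protocol: INIT -> READY after HANDSHAKE."""

    def __init__(self):
        self.state = "INIT"
        self.open = True

    def handle(self, message: str) -> str:
        if not self.open:
            return "CLOSED"
        if self.state == "INIT":
            if message == "HANDSHAKE":
                self.state = "READY"
                return "OK"
            # Anything else before the handshake ends the conversation.
            self.open = False
            return "CLOSED"
        if message == "GET_DATA":
            return "DATA"
        return "ERROR"


def run(messages):
    server = ToyServer()
    return [server.handle(m) for m in messages]


out_of_turn = run(["GET_DATA", "GET_DATA"])
in_order = run(["HANDSHAKE", "GET_DATA"])
print(out_of_turn)  # ['CLOSED', 'CLOSED']
print(in_order)     # ['OK', 'DATA']
```

The out-of-turn sequence never reaches the GET_DATA handler at all, which is exactly where a mutation-only fuzzer tends to spend its time.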
There are four specific challenges that make this difficult:
- Statefulness: Each message is interpreted relative to prior state. Without valid handshakes, most messages are rejected, so random mutation remains stuck in the connection setup phase.
- Structure: Messages contain complex fields whose values depend on each other. Checksums, length prefixes, and magic bytes create dependencies. If you flip a bit in the payload but don’t update the CRC32, the packet is dropped before it ever hits the vulnerable logic.
- Asynchrony: Network I/O, timers, and multithreading introduce nondeterminism. A timeout may arise from scheduling rather than input semantics, producing noise in feedback.
- Diversity: Implementations interpret the same specification differently. Vendors add extensions or tolerate errors differently, requiring fuzzers that remain robust under dialect variation.
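The “Structure” point is easy to reproduce in a few lines. The sketch below assumes a made-up frame layout (2-byte big-endian length, payload, CRC32 over both); build_frame and parse_frame are illustrative helpers, not part of any real library.

```python
import struct
import zlib


def build_frame(payload: bytes) -> bytes:
    """Toy frame: 2-byte big-endian length, payload, CRC32 of both."""
    body = struct.pack("!H", len(payload)) + payload
    return body + struct.pack("!I", zlib.crc32(body))


def parse_frame(frame: bytes):
    """Return the payload, or None if the frame fails validation."""
    if len(frame) < 6:
        return None
    body = frame[:-4]
    (crc,) = struct.unpack("!I", frame[-4:])
    (length,) = struct.unpack("!H", body[:2])
    if length != len(body) - 2 or zlib.crc32(body) != crc:
        return None
    return body[2:]


valid = build_frame(b"hello")

# Naive mutation: flip one bit in the payload and leave the CRC stale.
naive = bytearray(valid)
naive[3] ^= 0x01
naive_result = parse_frame(bytes(naive))

# Field-aware mutation: change the payload, then rebuild length and CRC.
aware_result = parse_frame(build_frame(b"hellp"))

print(naive_result)  # None: rejected before any deeper logic runs
print(aware_result)  # b'hellp'
```

Both mutations change a single byte of payload, but only the one that repairs the dependent fields gets past the parser.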
Syntax Modeling
To get past the “Structure” challenge, we cannot simply cat /dev/urandom into
a socket. We need to model the protocol syntax.
When a formal grammar isn’t available, we often use tools like Scapy (Python) to define the packet boundaries. This allows the fuzzer to mutate fields rather than bytes.
Here is an example of modeling a custom binary protocol using Python. Instead of fuzzing the raw stream, we fuzz the fields, and let the generator handle the “hard” constraints like Length and Checksum.
```python
import struct
from zlib import crc32

from scapy.fields import IntField, ShortField, StrLenField
from scapy.packet import Packet


class MyCustomProto(Packet):
    name = "MyProto"
    fields_desc = [
        # The fuzzer shouldn't mutate this randomly,
        # or the parser rejects it immediately.
        IntField("magic", 0xDEADBEEF),

        # Field dependencies: the length must match the payload
        ShortField("len", None),

        # This is where we want the fuzzer to go wild
        StrLenField("payload", b"", length_from=lambda x: x.len),

        # Checksum must be recalculated on every mutation
        IntField("checksum", None),
    ]

    def post_build(self, p, pay):
        # Automatically fix length and checksum before sending
        if self.len is None:
            # Payload length = total minus magic (4), len (2), checksum (4)
            length = len(p) - 10
            p = p[:4] + struct.pack("!H", length) + p[6:]
        if self.checksum is None:
            ck = crc32(p[:-4]) & 0xFFFFFFFF
            p = p[:-4] + struct.pack("!I", ck)
        return p + pay
```
By using a model like this, generated test cases carry a correct length and checksum, so they pass the initial packet validation checks instead of being dropped at the door. This sharp reduction in “wasted executions” allows the fuzzer to reach semantic logic much faster.
The Harness: Enforcing Determinism
A major bottleneck in network fuzzing is the kernel. Opening a TCP socket,
performing a handshake, and waiting for recv() is incredibly slow (in CPU
time) and introduces timing noise. If a test case crashes the server, can you
reproduce it? If the network was laggy, was the crash caused by the packet
content or the delay?
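Stateful fuzzers such as NSFuzz, AFLNet, and StateAFL typically still drive
the target over real sockets, so this overhead is a known pain point. A common
mitigation is “desocketing”: LD_PRELOAD hooks that replace network calls with
memory buffers [4] [5] [6].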
Also lock down reproducibility: fix seeds, avoid background threads, and use
deterministic timers where possible.
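As a small illustration of the “fix seeds” advice, here is a sketch of a mutator that draws all of its randomness from one explicitly seeded generator. The mutate and campaign functions are hypothetical helpers; the point is only that a (seed, iteration) pair is enough to regenerate any input.

```python
import random


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Flip one random bit, using only the RNG that was passed in."""
    if not data:
        return data
    out = bytearray(data)
    pos = rng.randrange(len(out))
    out[pos] ^= 1 << rng.randrange(8)
    return bytes(out)


def campaign(seed: int, base: bytes, iterations: int):
    # A private RNG instance keeps the run independent of global state.
    rng = random.Random(seed)
    return [mutate(base, rng) for _ in range(iterations)]


first = campaign(seed=1234, base=b"\x10\x02\x00\x00", iterations=5)
replay = campaign(seed=1234, base=b"\x10\x02\x00\x00", iterations=5)
print(first == replay)  # True: same seed, same inputs, same order
```

If input number 4 crashes the target, rerunning the campaign with the same seed reproduces exactly that input without storing the whole corpus.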
Here is a conceptual C++ harness compatible with libFuzzer. Instead of running
the server and connecting via localhost, we link directly against the server
library and feed data directly to the parsing function.
```cpp
// harness.cc
#include "server_lib.h"
#include <stdint.h>
#include <stddef.h>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // 1. Start every run from a clean slate so that results stay
    //    reproducible.
    ServerState* SUT = server_init();

    // 2. Inject the data directly into the library
    server_process_buffer(SUT, data, size);

    // 3. Collect feedback / clean up
    server_teardown(SUT);

    return 0;
}
```
This harness controls Timing (no network waits), Isolation (fresh memory), and Synchronization (single-threaded execution).
The Feedback Loop
Once we have structured inputs and a deterministic harness, we need feedback. Classical greybox fuzzers use edge coverage. Protocol fuzzers extend this with state coverage [5] [6]:
- Response-based state modeling: Clustering responses (e.g., 200 OK vs 500 Error) to infer the protocol state machine.
- Variable-based state modeling: Instrumenting the SUT to watch key variables, like session_state.
The scheduler uses this info to prioritize inputs. If a specific sequence of
messages transitions the server from INIT to AUTHENTICATED, the fuzzer
saves that sequence and uses it as a prefix for future mutations.
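The prefix-saving idea can be sketched with a toy target. Everything here is an assumption for illustration: toy_target stands in for the server, response codes stand in for inferred states, and the exhaustive one-message extension replaces the random mutation a real fuzzer would use, so that the result is deterministic.

```python
ALPHABET = ["AUTH", "GET", "JUNK"]


def toy_target(messages):
    """Hypothetical server: returns one response code per message."""
    state, codes = "INIT", []
    for msg in messages:
        if state == "INIT" and msg == "AUTH":
            state = "AUTHENTICATED"
            codes.append(200)
        elif state == "AUTHENTICATED" and msg == "GET":
            codes.append(201)
        else:
            codes.append(500)
    return tuple(codes)


def explore(max_rounds: int):
    corpus = [[]]            # saved sequences, reused as prefixes
    seen = {toy_target([])}  # response traces observed so far
    for _ in range(max_rounds):
        for prefix in list(corpus):
            for msg in ALPHABET:
                candidate = prefix + [msg]
                trace = toy_target(candidate)
                if trace not in seen:  # new behaviour: keep as a prefix
                    seen.add(trace)
                    corpus.append(candidate)
    return corpus


corpus = explore(max_rounds=2)
print(["AUTH"] in corpus)         # True: saved after the first round
print(["AUTH", "GET"] in corpus)  # True: reached by extending that prefix
```

The sequence that reaches the 201 response is only found because the AUTH prefix was kept in the first round; without the saved prefix, every GET is answered with 500.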
Here is a quick comparison of the three approaches:
| Approach | Structure | State | Throughput | Setup effort |
|---|---|---|---|---|
| White-box (AFL++) | Optional | Optional | High | High |
| Black-box (Boofuzz) | Required | Required | Low | Medium |
| Gray-box (AFLNet) | Optional | Required | Medium | High |
Fuzzing MQTT
To demonstrate these concepts, let’s look at two practical approaches to fuzzing the MQTT protocol. The MQTT specification and its security guidance provide the baseline protocol behavior and constraints we are testing against [3] [7].
Whitish-Box: AFL++ (Source Available)
In a whitish-box scenario, we have access to the source code (e.g., the Mosquitto broker). Our goal is to bypass the operating system’s networking stack entirely to increase throughput and determinism.
By using socketpair, we create a bidirectional communication channel that
resides entirely in memory. The fuzzer writes mutated data to one end of the
pair (sv[1]), and we trick the Mosquitto instance into reading from the
other end (sv[0]). To Mosquitto, it looks like a standard network client,
but to the kernel, it is just a memory copy. This allows us to use standard
sanitizers (like ASAN) to detect memory corruption immediately, without the
flakiness of timeouts or dropped packets.
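Before the full harness, the socketpair mechanism can be seen on its own in a few lines of Python. This is only a sketch of the idea: the four bytes are an arbitrary stand-in for a test case, and the variable names are made up.

```python
import socket

# socketpair() returns two already-connected endpoints: no listen/accept,
# no port, no handshake.
fuzzer_end, server_end = socket.socketpair()

packet = b"\x10\x02\x00\x00"  # arbitrary bytes standing in for a test case
fuzzer_end.sendall(packet)
fuzzer_end.shutdown(socket.SHUT_WR)  # signal end of input to the reader

received = b""
while True:
    chunk = server_end.recv(4096)
    if not chunk:  # empty read means the writer is done
        break
    received += chunk

fuzzer_end.close()
server_end.close()
print(received == packet)  # True
```

The Mosquitto harness applies the same idea in C++, handing the “server” end of the pair to the broker code: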
```cpp
#include <stdint.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/socket.h>
#include <fcntl.h>

extern "C" {
#include "mosquitto.h"
#include "mosquitto_internal.h"
#include "packet_mosq.h"
}

static int g_initialized = 0;

extern "C" int LLVMFuzzerInitialize(int *argc, char ***argv) {
    (void)argc; (void)argv;
    if (mosquitto_lib_init() != MOSQ_ERR_SUCCESS) return 0;
    g_initialized = 1;
    return 0;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (!g_initialized || size < 2 || size > 65535) return 0;

    // Create a fresh Mosquitto instance for every iteration
    struct mosquitto *mosq = mosquitto_new(NULL, true, NULL);
    if (!mosq) return 0;

    // Create a socket pair to mock the network connection
    // sv[0] is for the server, sv[1] is for the fuzzer
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        mosquitto_destroy(mosq);
        return 0;
    }

    // Assumption: make the read side non-blocking so that a truncated
    // input returns immediately instead of stalling in read().
    fcntl(sv[0], F_SETFL, O_NONBLOCK);

    // Write the fuzzer's mutation directly into the socket buffer
    if (write(sv[1], data, size) != (ssize_t)size) {
        close(sv[0]); close(sv[1]);
        mosquitto_destroy(mosq);
        return 0;
    }

    // Assign the server-side socket to the mosquitto instance
    mosq->sock = sv[0];
    mosq->protocol = mosq_p_mqtt5;
    mosq->state = mosq_cs_connected;

    // Execute the protocol parsing logic synchronously
    packet__read(mosq);

    close(sv[0]); close(sv[1]);
    mosquitto_destroy(mosq);
    return 0;
}
```
Running this harness allows AFL++ to fuzz the Mosquitto broker at speeds exceeding 8,000 executions per second on a single core, as shown in the status screen below:
```
        american fuzzy lop ++4.35a {default} (./fuzz_mosquitto) [explore]
┌─ process timing ────────────────────────────────────┬─ overall results ────┐
│        run time : 0 days, 0 hrs, 3 min, 37 sec      │  cycles done : 30    │
│   last new find : 0 days, 0 hrs, 3 min, 28 sec      │ corpus count : 548   │
│last saved crash : 0 days, 0 hrs, 3 min, 11 sec      │saved crashes : 3     │
│ last saved hang : none seen yet                     │  saved hangs : 0     │
├─ cycle progress ─────────────────────┬─ map coverage┴──────────────────────┤
│  now processing : 78.97 (14.2%)      │    map density : 3.98% / 17.74%     │
│  runs timed out : 0 (0.00%)          │ count coverage : 2.81 bits/tuple    │
├─ stage progress ─────────────────────┼─ findings in depth ─────────────────┤
│  now trying : havoc                  │ favored items : 101 (18.43%)        │
│ stage execs : 73/100 (73.00%)        │  new edges on : 163 (29.74%)        │
│ total execs : 1.76M                  │ total crashes : 71 (3 saved)        │
│  exec speed : 8048/sec               │  total tmouts : 14 (0 saved)        │
├─ fuzzing strategy yields ────────────┴─────────────┬─ item geometry ───────┤
│   bit flips : 1/96, 1/95, 1/93                     │    levels : 2         │
│  byte flips : 0/12, 1/11, 2/9                      │   pending : 0         │
│ arithmetics : 1/812, 0/1344, 0/1032                │  pend fav : 0         │
│  known ints : 0/99, 2/386, 0/484                   │ own finds : 44        │
│  dictionary : 0/228, 0/247, 0/0, 0/0               │  imported : 425       │
│havoc/splice : 36/1.71M, 0/0                        │ stability : 100.00%   │
│py/custom/rq : unused, unused, unused, unused       ├───────────────────────┘
│    trim/eff : 0.66%/38.8k, 83.33%                  │          [cpu000: 31%]
└─ strategy: explore ────────── state: started :-) ──┘
```
However, this approach has limitations. It requires access to source code and
often necessitates significant changes to the build process to link statically
against the target library. It also mocks out the network layer, which means
bugs specific to the actual socket handling (like epoll race conditions)
might be missed.
Black-Box: Boofuzz
When source code is unavailable, we cannot bypass the network stack. Instead, we use tools like Boofuzz to model the protocol grammar and state machine explicitly. Related black-box work like Snipuzz shows how far message-structure inference can go without source access [8].
In this script, we solve two problems:
- Structure: We define a mqtt_varlen_encoder to handle MQTT’s variable-length integers automatically. If the fuzzer expands the payload, this encoder ensures the length field remains valid so the packet is accepted by the broker.
- State: We use session.connect to define a graph. The fuzzer learns that it must successfully send a MQTT_CONNECT packet before it attempts to send a MQTT_PUBLISH packet.
```python
from typing import Union

from boofuzz import Block, Byte, Bytes, Group, Request, Size, String, Word


def mqtt_varlen_encoder(value):
    # The wrapped Size primitive renders a 4-byte big-endian integer;
    # re-encode it as an MQTT Variable Byte Integer (1 to 4 bytes).
    n = int.from_bytes(value, byteorder="big", signed=False) if value else 0
    if n < 0 or n > 268_435_455:
        raise ValueError(f"Remaining Length out of range for MQTT varint: {n}")
    out = bytearray()
    while True:
        encoded = n % 128
        n //= 128
        if n > 0:
            encoded |= 0x80  # continuation bit: more bytes follow
        out.append(encoded)
        if n == 0:
            break
    if len(out) > 4:
        raise ValueError("MQTT varint produced >4 bytes, which is invalid.")
    return bytes(out)


def build_mqtt_packet(name: str, control_header: Union[int, dict],
                      variable_header_fields=None, payload_fields=None):
    variable_header_fields = variable_header_fields or []
    payload_fields = payload_fields or []

    def build_fields(field_defs):
        # Translate the small dict-based field description into primitives.
        elements = []
        for f in field_defs:
            ftype = f.get("type")
            fname = f.get("name")
            fval = f.get("value", 0)
            fuzzable = f.get("fuzzable", True)
            endian = f.get("endian", ">")
            max_len = f.get("max_len", None)

            if ftype == "group":
                values = f.get("values", [])
                default_value = f.get("default_value", None)
                elements.append(Group(name=fname, values=values,
                                      default_value=default_value, fuzzable=fuzzable))
            elif ftype == "byte":
                elements.append(Byte(name=fname, default_value=fval, fuzzable=fuzzable))
            elif ftype == "word":
                elements.append(Word(name=fname, default_value=fval,
                                     endian=endian, fuzzable=fuzzable))
            elif ftype == "string":
                # MQTT strings carry a 2-byte big-endian length prefix.
                elements.append(Size(name=f"{fname}_len", block_name=fname,
                                     length=2, endian=">", fuzzable=False))
                elements.append(String(name=fname, default_value=fval,
                                       fuzzable=fuzzable, max_len=max_len))
            elif ftype == "raw":
                elements.append(Bytes(name=fname, default_value=fval, fuzzable=fuzzable))

        return elements

    if isinstance(control_header, dict):
        fvalues = control_header.get("values", None)
        fdef = control_header.get("default_value", None)
        ffuzz = control_header.get("fuzzable", False)
        if fvalues is None and fdef is None:
            raise ValueError("[FATAL] At least values or default value has to be "
                             "specified for the control header")
        ch = Group(name="ControlHeader", values=fvalues, default_value=fdef, fuzzable=ffuzz)
    else:
        ch = Byte(name="ControlHeader", default_value=control_header, fuzzable=False)

    return Request(name, children=(
        Block(name="FixedHeader", children=(
            ch,  # Control Header
            # Remaining Length: size of "Body", re-encoded as a varint
            Block(name="RemainingLength",
                  children=Size(name="RemainingLengthRaw", block_name="Body",
                                fuzzable=False, length=4, endian=">"),
                  encoder=mqtt_varlen_encoder, fuzzable=False)
        )),
        Block(name="Body", children=(
            Block(name="VariableHeader", children=build_fields(variable_header_fields)),
            Block(name="Payload", children=build_fields(payload_fields))
        ))
    ))


def build_connect_request():
    variable_header = [
        {"type": "string", "name": "ProtocolName", "value": "MQTT", "fuzzable": False},
        {"type": "byte", "name": "ProtocolLevel", "value": 5, "fuzzable": False},
        {"type": "byte", "name": "ConnectFlags", "value": 0x02, "fuzzable": False},
        {"type": "word", "name": "KeepAlive", "value": 60},
        {"type": "byte", "name": "PropertiesLength", "value": 0, "fuzzable": False},
    ]

    payload = [
        {"type": "string", "name": "ClientID", "value": "boofuzz", "max_len": 30}
    ]
    return build_mqtt_packet("MQTT_CONNECT", 0x10, variable_header, payload)

# ... ... ...
```
The session.connect calls internally create a Finite State Machine representing
the protocol evolution. This can be shown with session.render_graph_graphviz().create_png().
The output from my script produced the following graph:
[Figure: protocol state graph rendered from the Boofuzz session]
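To show what such a graph buys the fuzzer, here is a plain-Python sketch that mimics the shape of session.connect and enumerates the message paths it implies. This is not Boofuzz code: connect, edges, and paths are stand-ins, and the three-node graph is an assumption based on the request names used in this post.

```python
from collections import defaultdict

edges = defaultdict(list)


def connect(src, dst=None):
    """Mimic session.connect: a single argument hangs off the root."""
    if dst is None:
        src, dst = "ROOT", src
    edges[src].append(dst)


connect("MQTT_CONNECT")
connect("MQTT_CONNECT", "MQTT_PUBLISH")
connect("MQTT_CONNECT", "MQTT_SUBSCRIBE")


def paths(node="ROOT", prefix=()):
    """Yield every root-to-node path; the last node is the one fuzzed."""
    for nxt in edges.get(node, []):
        path = prefix + (nxt,)
        yield path
        yield from paths(nxt, path)


all_paths = ["->".join(p) for p in paths()]
for p in all_paths:
    print(p)
# MQTT_CONNECT
# MQTT_CONNECT->MQTT_PUBLISH
# MQTT_CONNECT->MQTT_SUBSCRIBE
```

Each path is a test-case template: every node except the last is sent unmodified to move the broker into the right state, and only the final node is mutated.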

Boofuzz provides a web/TUI interface for tracking progress, with immediate feedback on how the protocol state graph is being traversed:
```
# NOTE: this is an extract of the TUI, as the full output was longer and not
# really useful to paste here

│[2025-12-29] Test Case: 1460: MQTT_CONNECT->MQTT_SUBSCRIBE:[MQTT_SUBSCRIBE.Body.Payload.TopicFilter:751]
│[2025-12-29] Info: Type: String
│[2025-12-29] Info: Opening target connection (127.0.0.1:1883)...
│[2025-12-29] Info: Connection opened.
│[2025-12-29] Test Step: Monitor CallbackMonitor#140523418095968[pre=[],post=[],restart=[],post_start_target=[]].pre_send()
│[2025-12-29] Test Step: Transmit Prep Node 'MQTT_CONNECT'
│[2025-12-29] Info: Sending 22 bytes...
│[2025-12-29] Transmitted 22 bytes: 10 14 00 04 4d 51 54 54 05 02 00 3c 00 00 07 62 6f 6f 66 75 7a 7a b'\x10\x14\x00\x04MQTT\x05\x02\x00<\x00\x00\x07boofuzz'
│[2025-12-29] Info: Receiving...
│[2025-12-29] Received: 20 09 00 00 06 22 00 0a 21 00 14 b' \t\x00\x00\x06"\x00\n!\x00\x14'
│[2025-12-29] Test Step: Callback function 'conn_callback'
│[2025-12-29] Info: Received CONNACK: 200900000622000a210014
│[2025-12-29] Test Step: Fuzzing Node 'MQTT_SUBSCRIBE'
│[2025-12-29] Info: Sending 1000010 bytes...
│[2025-12-29] Error!!!! SIGINT received ... exiting
```
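The CONNACK bytes in that log are short enough to decode by hand, which is a useful sanity check that the CONNECT packet was accepted. The sketch below hardcodes the hex string from the log and handles only the two property identifiers that appear in it (0x21 Receive Maximum and 0x22 Topic Alias Maximum, both two-byte integers in MQTT 5); it is not a general MQTT parser.

```python
connack = bytes.fromhex("200900000622000a210014")

packet_type = connack[0] >> 4      # 2 = CONNACK
remaining_length = connack[1]      # single-byte varint here (< 128)
ack_flags = connack[2]             # bit 0 = session present
reason_code = connack[3]           # 0x00 = success
props_len = connack[4]             # also a single-byte varint here
props = connack[5:5 + props_len]

# Both properties in this capture are: 1-byte identifier + 2-byte integer.
NAMES = {0x21: "Receive Maximum", 0x22: "Topic Alias Maximum"}
decoded = {}
i = 0
while i < len(props):
    ident = props[i]
    decoded[NAMES[ident]] = int.from_bytes(props[i + 1:i + 3], "big")
    i += 3

print(packet_type, remaining_length, reason_code)  # 2 9 0
print(decoded)  # {'Topic Alias Maximum': 10, 'Receive Maximum': 20}
```

A reason code of 0 means the broker accepted the connection, so the fuzzed SUBSCRIBE that follows is evaluated in the connected state rather than being rejected at the door.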
While Boofuzz is excellent for logic and state testing, it is significantly slower than AFL++ because it operates over the real network stack. It typically manages 10-50 executions per second compared to AFL++’s thousands. Furthermore, defining the protocol grammar manually is time-consuming and prone to errors if the specification is complex.
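To put the throughput gap in perspective, here is a back-of-the-envelope calculation using the numbers quoted in this post. It is only a rough comparison: an “execution” is not the same unit of work in the two tools, so treat the result as an order of magnitude.

```python
total_execs = 1_760_000    # "total execs : 1.76M" from the AFL++ screen
afl_rate = 8048            # "exec speed : 8048/sec"
boofuzz_rates = (10, 50)   # range quoted above

afl_seconds = total_execs / afl_rate
print(f"AFL++  : {afl_seconds / 60:.1f} min")
for rate in boofuzz_rates:
    hours = total_execs / rate / 3600
    print(f"{rate:>2}/sec : {hours:.1f} h")
# AFL++  : 3.6 min
# 10/sec : 48.9 h
# 50/sec : 9.8 h
```

The same number of test cases that AFL++ ran in under four minutes would take between roughly ten hours and two days over the real network stack.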
Gray-Box: AFLNet
//TBD
Conclusion
We covered why naïve mutation fails on protocols, how syntax modeling keeps inputs valid, and how deterministic harnesses reduce noise so feedback can guide exploration. The MQTT examples show the trade-offs: white-box harnesses buy speed, black-box frameworks buy reach, and gray-box fuzzers bridge the two. The practical takeaway is simple: start by making inputs valid enough to reach deep logic, then let coverage or state feedback do the exploration. LLM-guided approaches are emerging as another way to learn protocol structure and guide mutations [9].
References