SIMP Internal Architecture
SIMP (Simple In the Middle Proxy) started as a learning project: a small MITM TCP proxy I built to get hands-on with Go concurrency, programmable filtering, and epoll-based event loops. This post walks through the internal architecture, the tradeoffs between its legacy goroutine model and the custom poller, and the rough edges I would fix if I were building it today.
Project Structure and Separation of Concerns
The server codebase follows a modular architecture centered around the core
package, decoupling the business logic (proxies/filters) from the control plane
(gRPC) and configuration.
The Manager Pattern
The Manager serves as the central registry, holding references to all active
proxy services. This design choice isolates the lifecycle management of proxies
from the API layer. The gRPC service communicates only with the Manager, so the API
layer never touches raw socket structures directly and cannot race with the data path.
```go
// core/manager.go

type Manager struct {
	Services      map[string]proxy.Proxy
	NumOfServices int
}

func newManager() *Manager {
	return &Manager{
		Services:      make(map[string]proxy.Proxy),
		NumOfServices: 0,
	}
}
```
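For illustration, this is roughly how the control plane could register a new proxy through the Manager. It is a hypothetical sketch: the method name, the error handling, and the lack of extra synchronization are assumptions, not code from the SIMP repository.

```go
// Hypothetical sketch (not from the SIMP repository): the gRPC layer
// registers a proxy only through the Manager, never by touching the
// proxy's sockets directly. Assumes "fmt" is imported.
func (m *Manager) AddService(name string, p proxy.Proxy) error {
	if _, exists := m.Services[name]; exists {
		return fmt.Errorf("service %q already registered", name)
	}
	m.Services[name] = p
	m.NumOfServices++
	return nil
}
```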
Concurrency Models: Legacy vs. Epoll
SIMP implements two distinct networking engines, selectable via configuration. This dual approach allows benchmarking idiomatic Go against raw syscall usage.
Legacy Mode
This implementation uses standard net libraries. Upon accepting a TCP
connection, it spawns a BidirChannel which manages two goroutines per
connection (Client->Server and Server->Client).
Design Rationale: This maximizes development speed and reliability. The Go runtime scheduler efficiently handles I/O waits via the netpoller. This implementation is also portable to Windows.
In practice, every accepted connection becomes a BidirChannel with one
errChan that both directions report to. The pumps run until any side errors,
then they signal errChan, which triggers the deferred closes and a connection
counter decrement. Writes block when the peer is slow, so Go parks the goroutine
until the socket is writable instead of spinning.
```go
// core/proxy/legacy/bidirChannel.go

func (bidChan *BidirChannel) StartComunication() {
	defer bidChan.dstIo.Close()
	defer bidChan.srcIo.Close()

	// Blocking read/write pump, defined as a closure so it can reach
	// bidChan's filters and error channel.
	channel := func(src, dest io.ReadWriteCloser) {
		isFromOutside := src == bidChan.srcIo
		buff := make([]byte, 0xffff)

		for {
			// 1. Blocking read
			l, err := src.Read(buff)
			if err != nil {
				bidChan.errChan <- err // signal so the waiter can tear down
				return
			}
			b := buff[:l]

			// 2. Filter validation
			matched := false
			bidChan.filterArr.Iterate(func(idx int, f filter.Filter) (int32, bool) {
				if _, matched = f.Validate(b); matched {
					return 0, true
				}
				return 0, false
			})

			if isFromOutside && matched {
				bidChan.errChan <- nil
				return // drop the connection
			}

			// 3. Blocking write
			if _, err = dest.Write(b); err != nil {
				bidChan.errChan <- err
				return
			}
		}
	}

	// Spawn read/write pumps for both directions
	go channel(bidChan.srcIo, bidChan.dstIo)
	go channel(bidChan.dstIo, bidChan.srcIo)

	// Wait for an error or close signal; the deferred closes then run
	<-bidChan.errChan

	metrics.DecConnCounter()
}
```
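For context, the legacy listener ties these pieces together roughly as follows. This is a simplified sketch, not SIMP's actual accept loop: the function name, the upstream dialing, and the errChan construction are assumptions.

```go
// Simplified sketch of a legacy-mode accept loop (assumptions noted above):
// every accepted client connection gets a matching upstream connection and
// a BidirChannel that owns the two pump goroutines shown earlier.
func serveLegacy(listenAddr, upstreamAddr string, filters *filter.FilterArray) error {
	ln, err := net.Listen("tcp", listenAddr)
	if err != nil {
		return err
	}
	for {
		client, err := ln.Accept()
		if err != nil {
			return err
		}
		go func(client net.Conn) {
			upstream, err := net.Dial("tcp", upstreamAddr)
			if err != nil {
				client.Close()
				return
			}
			metrics.IncConnCounter()
			bc := &BidirChannel{
				srcIo:     client,
				dstIo:     upstream,
				filterArr: filters,
				errChan:   make(chan error, 2), // buffered so both pumps can signal
			}
			bc.StartComunication() // blocks until one direction fails
		}(client)
	}
}
```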
Poll Mode
This implementation bypasses Go’s netpoller in favor of a custom event loop
using syscall.
- The Poller: Wraps syscall.EpollWait using edge-triggered mode (POLL_EDGE_TRIGGERED).
- The Reactor: Implements a worker pool pattern to handle events generated by the Poller.
In poll mode the proxy takes on more of the scheduling work. The Poller emits
callbacks that drain an FD until EAGAIN so edge-triggered epoll goes quiet
until more data arrives. The Reactor keeps a per-FD TryLock so only one worker
touches a connection at a time; if two events collide, one gets re-queued
instead of racing. A bounded worker pool keeps goroutine counts stable, and
doneChan/closeWg let the loop finish in-flight callbacks before shutdown.
```go
// core/proxy/poll/reactor.go

func (r *Reactor) Run() {
	for {
		select {
		case <-r.doneChan:
			return
		case ev := <-r.eventChan:
			// Locking prevents parallel processing of the same connection
			if !ev.l.TryLock() {
				go func() {
					time.Sleep(100 * time.Nanosecond) // Reschedule event
					r.eventChan <- ev
				}()
				continue
			}

			r.workerPool <- struct{}{}
			r.closeWg.Add(1)

			go func() {
				start := time.Now()
				ev.callback() // Execute read/write logic

				metrics.ObserveReqHandlingTime(float64(time.Since(start).Milliseconds()))
				metrics.IncTotalReqCounter()

				ev.l.Unlock()
				<-r.workerPool
				r.closeWg.Done()
			}()
		}
	}
}
```
The actual I/O is handled inside the callback generated by the Poller:
```go
// core/proxy/poll/poll.go

// Callback generated for a POLL_READ event
callback: func() {
	// 1. Non-blocking read loop: drain the FD until EAGAIN
	var data []byte
	buf := make([]byte, 1024)
	for {
		n, err := sys.Read(conn.Fd, buf)
		if err == sys.EAGAIN {
			break // no more data for now
		}
		if n == 0 {
			conn.IsClosed = true // peer closed the connection
			return
		}
		data = append(data, buf[:n]...)
	}

	// 2. Filter logic (only client-originated traffic is inspected)
	if conn.Type == CLIENT {
		matched := false
		p.filterArr.Iterate(func(idx int, f filter.Filter) (int32, bool) {
			if _, matched = f.Validate(data); matched {
				return 0, true
			}
			return 0, false
		})
		if matched {
			conn.IsClosed = true
			return
		}
	}

	// 3. Non-blocking write loop
	written := 0
	for written < len(data) {
		n, err := sys.Write(conn.Other.Fd, data[written:])
		if err == sys.EAGAIN {
			break // socket full, should buffer remaining
		}
		written += n
	}
}
```
By controlling the event loop, SIMP reduces the memory footprint of stack allocations associated with thousands of goroutines.
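The Poller side that produces these callbacks is not shown above. Conceptually it is an edge-triggered epoll wait loop that maps ready file descriptors to events for the Reactor. Here is a minimal, self-contained sketch of such a loop, written against golang.org/x/sys/unix rather than SIMP's own syscall wrappers; the connection lookup and event construction are deliberately elided.

```go
import "golang.org/x/sys/unix"

// Minimal edge-triggered epoll loop. SIMP's real Poller additionally maps
// FDs back to connection objects and pushes callback events onto the
// Reactor's eventChan; that plumbing is omitted here.
func pollLoop(fds []int, onReadable func(fd int)) error {
	epfd, err := unix.EpollCreate1(0)
	if err != nil {
		return err
	}
	defer unix.Close(epfd)

	// Register every FD for edge-triggered read readiness.
	for _, fd := range fds {
		ev := unix.EpollEvent{Events: unix.EPOLLIN | unix.EPOLLET, Fd: int32(fd)}
		if err := unix.EpollCtl(epfd, unix.EPOLL_CTL_ADD, fd, &ev); err != nil {
			return err
		}
	}

	events := make([]unix.EpollEvent, 128)
	for {
		n, err := unix.EpollWait(epfd, events, -1)
		if err == unix.EINTR {
			continue // interrupted by a signal, retry
		}
		if err != nil {
			return err
		}
		for i := 0; i < n; i++ {
			// With EPOLLET the handler must drain the FD until EAGAIN,
			// exactly like the read loop in the callback above.
			onReadable(int(events[i].Fd))
		}
	}
}
```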
Dynamic Filtering System
The core/filter package defines the contract for traffic inspection:
```go
// core/filter/filter.go

type Filter interface {
	// Returns true if the filter matched, otherwise false
	Validate(data []byte) (response string, ok bool)
	ToString() (str string)
	GetUUID() string
	GenUUID()
}
```
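To make the contract concrete, a trivial substring filter satisfying this interface could look like the following. This is an illustrative sketch, not one of SIMP's built-in filters; it assumes the standard bytes package and github.com/google/uuid for ID generation.

```go
import (
	"bytes"

	"github.com/google/uuid"
)

// ContainsFilter is an illustrative Filter implementation that matches
// any payload containing a fixed byte sequence.
type ContainsFilter struct {
	id     string
	needle []byte
}

func (f *ContainsFilter) Validate(data []byte) (response string, ok bool) {
	// ok == true means the filter matched and the connection should be dropped.
	return "", bytes.Contains(data, f.needle)
}

func (f *ContainsFilter) ToString() (str string) { return "contains:" + string(f.needle) }
func (f *ContainsFilter) GetUUID() string        { return f.id }
func (f *ContainsFilter) GenUUID()               { f.id = uuid.NewString() }
```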
Hot-Swapping and Thread Safety
Filters are stored in a FilterArray which wraps a slice with a sync.RWMutex.
This allows the proxy to read rules at high throughput (RLock) while the gRPC
API can safely inject new rules (Lock) at runtime.
```go
// core/filter/filterArray.go

type FilterArray struct {
	filters []Filter
	mux     sync.RWMutex
}

// Thread-safe iteration used by the data path
func (arr *FilterArray) Iterate(fun func(idx int, f Filter) (errCode int32, brk bool)) int32 {
	var brk bool = false
	var errCode int32 = 0
	arr.mux.RLock()
	for idx, f := range arr.filters {
		if errCode, brk = fun(idx, f); brk {
			break
		}
	}
	arr.mux.RUnlock()
	return errCode
}
```
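The write side used by the gRPC API would be symmetric, taking the exclusive lock only for the brief mutation. The actual mutation methods are not shown in this post, so the method name below is an assumption.

```go
// Hypothetical write-side counterpart to Iterate: the control plane holds
// the exclusive lock only for the short append, so the data path is
// blocked for a negligible window.
func (arr *FilterArray) Add(f Filter) {
	arr.mux.Lock()
	arr.filters = append(arr.filters, f)
	arr.mux.Unlock()
}
```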
Tengo Integration
To support complex logic without recompiling, SIMP embeds Tengo, a fast scripting language for Go. The scripts are compiled once upon creation, then cached by the system.
Validate, from the Filter interface, passes the received payload to the script. This
provides a higher level of flexibility for application-specific checks that would be
difficult to block, or even just log, with regex or length-based filters.
```go
// core/filter/scriptFilter.go

func (f *ScriptFilter) Validate(data []byte) (response string, ok bool) {
	// Inject the payload into the script VM
	if err := f.compiled.Set("data", data); err != nil {
		logger.Logger.Error().Err(err).Msg("Error when setting `data` variable inside the script!")
		return "", false
	}

	// Execute
	if err := f.compiled.Run(); err != nil {
		logger.Logger.Error().Err(err).Msg("Error when running the script!")
		return "", false
	}

	// Check output variable
	blk := f.compiled.Get(outputVariableName).Bool()
	return "", blk
}
```
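The compile-once step mentioned earlier could look roughly like this when a script filter is created. The sketch uses the public Tengo v2 API; the helper name is an assumption, while the module allow-list and the data variable mirror what the post describes.

```go
import (
	"github.com/d5/tengo/v2"
	"github.com/d5/tengo/v2/stdlib"
)

// Sketch: compile a filter script once and keep the *tengo.Compiled around,
// so Validate only has to Set("data", ...) and Run() per payload.
func compileScriptFilter(src []byte) (*tengo.Compiled, error) {
	script := tengo.NewScript(src)

	// Restrict importable modules to the allow-list mentioned below.
	script.SetImports(stdlib.GetModuleMap("math", "fmt", "base64", "hex", "rand", "text"))

	// Declare the variable that Validate overwrites for every payload.
	if err := script.Add("data", []byte{}); err != nil {
		return nil, err
	}
	return script.Compile()
}
```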
Below is an example of a Tengo script that checks whether the packet payload (decoded from base64) contains a "secret" value.
As a side note, Tengo ships with many importable modules; for security reasons the application restricts them to
[math, fmt, base64, hex, rand, text].
```
text := import("text")
base64 := import("base64")

process := func(data) {
	decoded := base64.decode(data)
	if text.contains(decoded, "super-secret-payload") {
		return true
	}
	return false
}

matched := process(data)
```
Observability Integration
Observability is baked into the application. Prometheus counters and gauges are updated atomically, which keeps locking contention out of the hot path, so you can watch open connections and handling latency without slowing down the data path.
```go
// core/proxy/legacy/bidirChannel.go

func (bidChan *BidirChannel) StartComunication() {
	// ...
	metrics.IncConnCounter()
	// ...
	<-bidChan.errChan
	metrics.DecConnCounter()
}
```
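The metrics package referenced above is a thin wrapper over client_golang. A representative sketch follows; the metric names and histogram buckets are assumptions, not SIMP's actual ones.

```go
import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Representative sketch of the metrics package: a gauge for open connections,
// a counter for handled events, and a histogram for handling time.
var (
	connGauge = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "simp_open_connections",
		Help: "Number of currently open proxied connections.",
	})
	totalReqs = promauto.NewCounter(prometheus.CounterOpts{
		Name: "simp_handled_events_total",
		Help: "Total number of handled read events.",
	})
	reqHandling = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "simp_request_handling_ms",
		Help:    "Time spent handling a single event, in milliseconds.",
		Buckets: prometheus.ExponentialBuckets(0.1, 2, 12),
	})
)

func IncConnCounter()                   { connGauge.Inc() }
func DecConnCounter()                   { connGauge.Dec() }
func IncTotalReqCounter()               { totalReqs.Inc() }
func ObserveReqHandlingTime(ms float64) { reqHandling.Observe(ms) }
```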
This was useful when we first deployed the proxy in an attack/defense environment, as it gave a view of the program's behaviour under real traffic (memory usage, thread count, goroutine count, median filter execution time, and so on). The attack/defense game where this application runs lasts eight hours straight.
Review
While SIMP demonstrates advanced concepts, several architectural decisions should be changed.
- The Reactor "Spin-Lock": In core/proxy/poll/reactor.go, when a connection is already locked, the event is rescheduled via a sleeping goroutine:

```go
if !ev.l.TryLock() {
	go func() {
		time.Sleep(100 * time.Nanosecond) // Reschedule event
		r.eventChan <- ev
	}()
	continue
}
```

  Problem: this is essentially a distributed spin-lock. Under high contention it floods the scheduler with sleeping goroutines and burns CPU without making progress. A cleaner approach would be epoll's EPOLLONESHOT, which ensures only one thread receives an event for a specific FD until it is explicitly re-armed.

- Allocation-Heavy Epoll Loop: The epoll implementation allocates a new 1 KB buffer for every read event (buf := make([]byte, 1024) in the callback shown earlier). Problem: this produces a lot of short-lived garbage and increases GC work. A sync.Pool would let read buffers be reused across events instead of being allocated each time; see the sketch below.
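A minimal sketch of that buffer-pool change, applied to the read loop from the epoll callback shown earlier (the 32 KiB buffer size and the helper name are arbitrary choices for the sketch):

```go
import (
	"io"
	"sync"

	"golang.org/x/sys/unix"
)

// Pool of reusable read buffers, shared across epoll read events.
var readBufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 32*1024) // arbitrary size for the sketch
		return &b
	},
}

// readAll drains an edge-triggered FD into a result slice using a pooled
// buffer instead of allocating a fresh one per event.
func readAll(fd int) ([]byte, error) {
	bufp := readBufPool.Get().(*[]byte)
	defer readBufPool.Put(bufp)
	buf := *bufp

	var data []byte
	for {
		n, err := unix.Read(fd, buf)
		if n > 0 {
			data = append(data, buf[:n]...)
		}
		if err == unix.EAGAIN {
			return data, nil // drained for now
		}
		if err != nil {
			return data, err
		}
		if n == 0 {
			return data, io.EOF // peer closed the connection
		}
	}
}
```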
Final Thoughts and Roadmap
Reflecting on this project: building SIMP during my second/third year of university, without prior systems programming experience, was an ambitious undertaking. It successfully demonstrates the core concepts of proxying, event loops, and concurrency. Viewed through the lens of modern production engineering, however, the architecture shows how naive some of its choices were.
While the custom Epoll implementation was a fantastic learning exercise, it is not how one would write a production Go service today. That said, the foundation is solid enough to iterate upon.
Proposed Roadmap
Here is how I would re-architect SIMP today to address the limitations discussed above, balancing user-space optimizations with modern best practices:
| Category | Current Implementation | Proposed Improvement |
|---|---|---|
| Protocol Layer | Raw []byte slice passing | Abstract TCP, UDP, and HTTP/TLS into specific handlers, creating protocol-aware filters. |
| Traffic Support | TCP only | Add UDP support and TLS termination. |
| Memory | Frequent allocations (make([]byte)) | Use sync.Pool and splice syscalls to move data without user-space copies. |
| Extensibility | Regex/script on partial chunks | Full message parsing, allowing rules on HTTP headers, etc. |
The Case for Protocol-Aware Filters
The biggest limitation of the current design is that filters operate on arbitrary byte slices. In TCP, a logical message (like an HTTP request) might be split across multiple read events.
The proxy should reconstruct the stream based on the protocol (a rough interface sketch follows the list):
- L4 Filters: operate on connection metadata (IP, Port).
- L7 Filters: parse the stream. For HTTP, wait for the full header before running filters.
- TLS Wrappers: Transparently handle the TLS handshake to inspect encrypted traffic, falling back to plain TCP pass-through if decryption fails.
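One way to express this layering is a filter contract that consumes a reassembled message instead of a raw chunk. The types below are purely illustrative; none of them exist in SIMP today.

```go
import "net"

// ConnMeta carries the L4 metadata that connection-level filters match on.
type ConnMeta struct {
	SrcIP, DstIP     net.IP
	SrcPort, DstPort uint16
}

// Message is one reassembled protocol unit, e.g. a complete HTTP request.
type Message struct {
	Meta    ConnMeta
	Headers map[string]string // empty for opaque TCP payloads
	Body    []byte
}

// ProtocolFilter runs only once a full message has been parsed, so rules
// can reference headers or metadata instead of arbitrary byte offsets.
type ProtocolFilter interface {
	Protocol() string // "tcp", "http", "tls", ...
	Validate(msg *Message) (drop bool)
}
```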
eBPF
While the roadmap above focuses on optimizing the user-space Go implementation, modern high-performance networking is increasingly moving towards eBPF (Extended Berkeley Packet Filter).
Technologies like eBPF allow programmable packet processing directly in the Linux kernel (and soon on Windows?!), eliminating the context-switch overhead inherent in user-space proxies like SIMP.
Integrating eBPF would require a fundamental rewrite of the data plane (likely in C, or using Go libraries such as cilium/ebpf), but it represents the ultimate step in performance, allowing filtering and observability at speeds that pure user-space applications struggle to match.
For a project like SIMP, eBPF could be used to offload simple L4 filtering to the kernel, passing only complex L7 inspection tasks to the user-space application, creating a hybrid architecture that balances performance with programmability.
By moving to this layered approach, SIMP could evolve from a simple packet shoveler into a real security research tool.