Events vs. Threads: Two Sides of the Web Server Debate

This post is adapted from a paper review I wrote during CMU’s 18-845: Internet Services course, where we studied seminal papers in distributed systems, web architecture, and virtualization. I’ve converted my reviews into blog posts to share them more broadly.

Paper 1: “A Scalable and Explicit Event Delivery Mechanism for UNIX” by Gaurav Banga, Jeff Mogul, and Peter Druschel (USENIX Technical Conference, 1999)

Paper 2: “Why Events Are A Bad Idea (for high-concurrency servers)” by R. von Behren, J. Condit, and E. Brewer (HotOS IX, 2003)

TL;DR

These two papers represent opposing viewpoints in the classic events-versus-threads debate for web server design. The first proposes a scalable event delivery mechanism to replace the bottleneck-prone select() system call, while the second argues that threads – with proper compiler and runtime support – are the superior model for high-concurrency servers. Together, they frame one of the most enduring architectural questions in systems design.

Summary

These two papers take fundamentally different positions on web server architecture. The central question each addresses is which model is superior: the first paper contrasts state-based notification, as in select(), with explicit event-based delivery, while the second pits event-based servers against thread-based ones.

The first paper (Banga et al., 1999) argues that the traditional select() system call does not scale for large numbers of clients. The authors propose a new event delivery mechanism that allows applications to register interest in one or more event sources and to efficiently dequeue new events. The paper demonstrates how this event-based system is significantly more scalable and outperforms both select() and its modified variants.
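To make the scaling argument concrete, here is a minimal Python sketch. It is not the paper's implementation: it only models the two interface styles in user space and counts how many descriptors each call has to examine. The names select_style_wait, EventQueue, declare_interest, post, get_next_events, and compare are illustrative assumptions rather than the API defined in the paper.

```python
from collections import deque


def select_style_wait(interest_set, ready_fds):
    """Stateless interface: the caller passes the full interest set on
    every call, and every descriptor in it is examined.

    Returns (ready descriptors, number of descriptors examined).
    """
    examined = 0
    ready = []
    for fd in interest_set:
        examined += 1
        if fd in ready_fds:
            ready.append(fd)
    return ready, examined


class EventQueue:
    """Stateful interface (hypothetical): interest is registered once,
    and readiness is delivered as queued events."""

    def __init__(self):
        self._interest = set()
        self._pending = deque()
        self._queued = set()  # avoids queuing the same descriptor twice

    def declare_interest(self, fd):
        self._interest.add(fd)

    def post(self, fd):
        """Stand-in for the kernel noticing that fd became ready."""
        if fd in self._interest and fd not in self._queued:
            self._pending.append(fd)
            self._queued.add(fd)

    def get_next_events(self, max_events):
        """Dequeue up to max_events; the work done depends on the number
        of pending events, not on the size of the interest set."""
        events = []
        while self._pending and len(events) < max_events:
            fd = self._pending.popleft()
            self._queued.discard(fd)
            events.append(fd)
        return events, len(events)


def compare(cold, hot):
    """Return (descriptors examined by the select-style call,
    events dequeued by the queue-style call)."""
    interest = list(range(cold + hot))
    ready = set(range(hot))  # only the hot connections have data

    _, examined = select_style_wait(interest, ready)

    queue = EventQueue()
    for fd in interest:
        queue.declare_interest(fd)
    for fd in ready:
        queue.post(fd)
    _, dequeued = queue.get_next_events(len(interest))
    return examined, dequeued
```

Calling compare(750, 32) returns (782, 32): the select-style call examines every registered descriptor to find the 32 active ones, while the queue-style call touches only the pending events. This toy model ignores real kernel costs and only illustrates why per-call work that grows with the interest set hurts when most connections are idle.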

The second paper (von Behren et al., 2003) takes the opposite stance, arguing that threads are the best choice for highly concurrent applications. It systematically disputes the perceived drawbacks of threads and makes a strong case for why thread-based servers are superior to event-based servers. The paper also introduces the Knot server and shows that it outperforms event-based servers under high load.
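The difference in programming style between the two camps can be sketched in a few lines of Python. This is an illustration only, not code from Knot or from either paper, and the function names (handle_threaded, serve_threaded, serve_evented, run_event_loop) are hypothetical. The same "read a request, send a reply" logic is written once as straight-line code in a thread and once as a callback driven by an event loop.

```python
import selectors
import socket
import threading


def handle_threaded(conn):
    """Thread style: straight-line code. A blocking call suspends only
    this thread, and the connection's state stays on its stack."""
    with conn:
        request = conn.recv(1024)
        conn.sendall(request.upper())


def serve_threaded(conn):
    """Start one thread per connection and return it."""
    thread = threading.Thread(target=handle_threaded, args=(conn,))
    thread.start()
    return thread


def serve_evented(conn, sel):
    """Event style: the handler is split out into a callback, and the
    connection's state lives in the closure instead of on a stack."""
    conn.setblocking(False)

    def on_readable(c):
        request = c.recv(1024)
        # Assumes the small reply fits in the socket buffer; a real
        # server would also wait for writability before sending.
        c.sendall(request.upper())
        sel.unregister(c)
        c.close()

    sel.register(conn, selectors.EVENT_READ, on_readable)


def run_event_loop(sel):
    """Dispatch callbacks until no connections remain registered."""
    while sel.get_map():
        for key, _mask in sel.select(timeout=1.0):
            key.data(key.fileobj)
```

A quick way to try both is to create a pair with socket.socketpair(), hand one end to serve_threaded or serve_evented, and write a request on the other end. Even in this tiny example the threaded handler reads top to bottom, while the evented version has to register, dispatch, and unregister explicitly, which is the kind of control-flow overhead the second paper objects to.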

Strengths

The first paper effectively demonstrates the drawbacks of the select() system call and shows why modified versions of select() also fall short, building a strong case for the need for an event-based system.
The first paper provides detailed coverage of related work in Win32, POSIX, and Mach, giving useful cross-platform context.
The second paper systematically addresses perceived limitations of threads point by point, making a compelling argument for each. This methodical approach of first acknowledging thread criticisms before rebutting them strengthens the paper’s impact on the reader.

Weaknesses

The second paper references the Lauer and Needham duality argument extensively but should have provided more background or references explaining what the argument means before relying on it so heavily.
The authors of the second paper should have given more examples of the complex control flow patterns they reference.
In the first paper, the authors do not explain why they chose 750 cold connections and 32 hot connections for the initial experiment. More reasoning behind this selection would have been helpful.

Discussion Questions

The evaluation in the second paper is relatively narrow. It would be interesting to see how the coroutine library and thread package perform on other operating systems, particularly Windows, where threading support may differ significantly. The same cross-platform question applies to the first paper.
The second paper uses poll() for asynchronous socket I/O, but the first paper shows that poll() shares the same scaling problems as select(). If the threading library relies on poll() internally, how would scalability be affected?
In the second paper, the authors note that performance is limited by interrupt processing overhead in the kernel. It would be interesting to see how performance scales with modifications to the underlying kernel code. Pursuing kernel-level changes would have strengthened their argument.
The two papers use different hardware for evaluation (a 300 MHz Pentium II with 128 MB RAM versus dual 2000 MHz processors with 1 GB RAM), so their reported numbers cannot be compared directly. Running both approaches on identical hardware would be valuable.
In the first paper, the authors state that the cost of all event API operations is independent of event rate, as long as the HINTS queue is configured large enough to hold one entry per descriptor. Experiments showing what happens when this constraint is not met, and what the API call overhead looks like with a smaller queue, would have been informative.