Blog - Darsh Shah

On-device RAG QA AI Agent with local LLM

July 06, 2026

Tutorial for building an On-device RAG QA AI Agent with Ollama and Qwen.

Personal Web Research AI Agent using Ollama

June 30, 2026

Tutorial for building a personal Web Research AI Agent with Ollama and Qwen.

Apple Business is Here

May 01, 2026

Apple Business brings together device management, communication tools, and customer reach into one unified platform for businesses and schools. I've been working on this for a while, and it's exciting...

Live Migration of Virtual Machines: Seamless OS Migration with Minimal Downtime

April 21, 2025

This paper presents a technique for migrating running virtual machines across physical hosts with remarkably low downtime -- as little as 60 milliseconds. Using a pre-copy approach that iteratively transfers...

Nested Virtualization and the Turtles Project: Hypervisors All the Way Down

March 31, 2025

The Turtles Project demonstrates that hypervisors can efficiently run inside other hypervisors -- even without architectural support for nesting on x86. By implementing multi-dimensional paging for MMU virtualization and multi-level...

Xen and KVM: Two Approaches to Virtualization

March 10, 2025

Xen pioneered paravirtualization -- modifying guest OSes slightly to achieve near-native performance on a bare-metal hypervisor -- while KVM took a radically simpler approach by turning the Linux kernel itself...

Hardware vs. Software Virtualization: A Deep Dive into x86 Techniques

February 17, 2025

This paper provides a rigorous head-to-head comparison of software-based and hardware-based x86 virtualization. The surprising finding is that software VMMs outperform first-generation hardware VMMs for I/O-heavy and context-switch-heavy workloads, while...

Virtual Machine Monitors and Intel VT: Foundations of Modern Virtualization

January 27, 2025

These two papers provide a comprehensive look at virtualization from both the software and hardware sides. Rosenblum and Garfinkel (VMware's co-founder among them) lay out the design goals and implementation...

FAWN: Trading Raw Speed for Radical Energy Efficiency

January 06, 2025

FAWN proposes a cluster architecture built from many low-power ("wimpy") embedded processors paired with local flash storage, demonstrating that such a system can be several times more energy-efficient than traditional...

Inside the Warehouse-Scale Computer: Datacenter Basics

December 16, 2024

This book reframes the modern datacenter not as a collection of co-located servers, but as a single warehouse-scale computer requiring holistic design. Chapter 4 dives into the physical fundamentals --...

Building Raft with Test-Driven Development in Go

November 25, 2024

Report: “Building Raft - An Exploration of Test-Driven Development in Go” by Sean Klein and Advaya Krishna (Student Project Report, 2015)

Dapper: Google's Large-Scale Distributed Tracing Infrastructure

November 04, 2024

Dapper is Google's production distributed tracing system that provides low-overhead, application-transparent instrumentation across Google's massive infrastructure. Originally conceived as a tracing tool, it evolved into a general-purpose monitoring platform --...

Mesa: Google's Geo-Replicated, Near Real-Time Data Warehousing System

October 14, 2024

Mesa is Google's distributed data warehousing system designed for storing and querying critical advertising metrics. It achieves a unique combination of geo-replication across multiple datacenters, near real-time update ingestion, strong...

Percolator: Large-Scale Incremental Processing at Google

September 23, 2024

Percolator is Google's system for incrementally processing updates to massive datasets, built on top of Bigtable to replace batch-oriented MapReduce pipelines for web search indexing. By providing ACID snapshot-isolation transactions...

Bigtable: Google's Distributed Storage System for Structured Data

September 02, 2024

Bigtable is Google's sparse, distributed, multi-dimensional sorted map designed to scale to petabytes of data across thousands of machines. Rather than building a traditional RDBMS, Google created a system organized...

Chubby: Google's Distributed Lock Service for Loosely-Coupled Systems

August 12, 2024

Chubby is Google's distributed lock service that provides coarse-grained locking and reliable low-volume storage for loosely-coupled distributed systems. Rather than exposing a raw Paxos library to developers, Chubby offers a...

Paxos Made Live: Bridging Theory and Production Systems

July 22, 2024

This paper chronicles the hard-won engineering lessons from building a production fault-tolerant database using the Paxos consensus algorithm inside Google's Chubby lock service. The key takeaway is that the gap...

MapReduce: Simplified Data Processing on Large Clusters

July 01, 2024

This landmark paper introduces MapReduce, a programming model and runtime framework for processing massive datasets across thousands of machines. By abstracting away the complexity of parallelization, fault tolerance, and load...

Locality-Aware Request Distribution in Cluster-Based Web Servers

June 10, 2024

These two papers tackle the problem of intelligently distributing requests across web server clusters. The first introduces LARD, which routes requests to maximize backend cache locality while maintaining load balance....

The Birth of Google: Search Architecture and PageRank

May 20, 2024

These two foundational papers introduce the Google search engine and its core ranking algorithm, PageRank. The first paper describes Google's system architecture -- how web pages are crawled, indexed, and...

Measuring Web Server Capacity Under Realistic Conditions

April 29, 2024

This paper exposes critical shortcomings in how web server benchmarks were designed in the late 1990s -- they failed to push servers past their capacity and ignored the effects of...

Comparing Web Server Architectures: Events, Threads, and Pipelines

April 08, 2024

This paper provides a rigorous performance comparison of event-driven (userver), thread-per-connection (Knot), and hybrid pipeline (WatPipe) web server architectures. After carefully tuning each server and eliminating confounding factors, the authors...
