Paper Reviews
Kafka: a Distributed Messaging System for Log Processing
Summary of Kafka: a Distributed Messaging System for Log Processing
Apache Kafka is a system for streaming logs with the aim of providing high throughput over strict guarantees around delivery. This was an interesting paper to read ~13 years on, as Kafka has become more and more ubiquitous in system design.
Anvil: Verifying Liveness of Cluster Management Controllers
Summary of Anvil: Verifying Liveness of Cluster Management Controllers
Wouldn’t it be nice to write some software and confidently say that you know it’s right? That, as long as some assumptions about the world hold, it’s going to do exactly what you want it to, no matter what strange permutations or combinations of failures happen. In broad strokes that’s the promise of formal verification and proofs in software.
MRTOM: Mostly Reliable Totally Ordered Multicast
Summary of MRTOM: Mostly Reliable Totally Ordered Multicast
This paper is about building a network primitive to speed up consensus protocols. Like other papers in this area, MRTOM builds on the fact that network ordered protocols can have much higher throughput than standard consensus protocols. MRTOM takes this one step further by offloading not just packet ordering, but also the fast path of consensus protocols to programmable switches.
How Hard is Asynchronous Weight Reassignment?
Summary of How Hard is Asynchronous Weight Reassignment?
Majority quorum systems are useful in providing a simple mechanism for consensus. To accept a value, you need a majority of servers to agree to accepting it. Weighted majority quorum services (WQMS) take this approach and recognise that some servers are going to have better performance than others, so they should get more voting power.
Hydra: Serialization-Free Network Ordering for Strongly Consistent Distributed Applications
Summary of Hydra: Serialization-Free Network Ordering for Strongly Consistent Distributed Applications
Replicated systems typically pretty much always have some overhead in comparison to unreplicated systems, at least if you want strong consistency for your data. We need to do extra work in order to make sure that we get the same result across all nodes. The fastest systems minimise or avoid that coordination, but where we can’t avoid it, we need an algorithm to manage that consensus.
XFaaS: Hyperscale and Low Cost Serverless Functions at Meta
Summary of XFaaS: Hyperscale and Low Cost Serverless Functions at Meta
This paper from Meta presents their home-built Function-as-a-Service system called XFaaS. According to the paper, XFaaS handles trillions of function invocations per day, across hundreds of thousands of worker servers. The paper sets out the system architecture, and talks about how they’re able to achieve some quite impressive hardware utilization at scale.
Keeping CALM: when distributed consistency is easy
Summary of Keeping CALM: when distributed consistency is easy
This paper from Hellerstein and Alvaro centers around what we can do without coordinating. The core insight of the paper is that some distributed programs can run without coordination, as long as the output of running the program on a subset of the inputs doesn’t change once you get more information.