Cryptocurrencies such as Bitcoin and smart contract platforms like Ethereum are recent examples of the evolution of consensus algorithms blockchain represents. Consensus, getting distributed processes to agree on a single value, is a fundamental problem in computer science. Distributed processing is difficult. In fact, there are logical proofs that show pretty conclusively that there won’t be a single perfect algorithm for handling consensus in an asynchronous system made of imperfect nodes. As long as there is the possibility of a fault, there is a certainty of a problem.
It’s important to have a solid understanding of the fundamentals since we are tasked with identifying an imperfect consensus algorithm sufficient for complex distributed use cases. First, we’ll discuss consensus algorithms. Next, we’ll look into current algorithms addressing byzantine fault tolerance, followed by emerging algorithms. With that overview of the landscape, we can start to get a handle on selecting an algorithm for a particular use case. We’ll wrap up by looking through a glass darkly about what might be.
Brave New Error-Prone World
We have now fast-forwarded about twenty years into the future since Paxos first came around. In this brave new world, software is really awful. Treating a fault as a result of simple server failure rather than malicious or even poorly tested code is impractical. More and more trust and money is being placed on systems less and less deserving of either. Castro and Liskov’s Practical Byzantine Fault Tolerance was the first workable approach for managing byzantine fault tolerance in an asynchronous network by building on the work done using Fred Schneider’s replicated state machines and lessons learned from Paxos.
PBFT implements state machine replication using a primary and secondary replica, where it’s the job of the secondary to constantly monitor the primary for sanity and liveness and switch over to a new primary if necessary. Each server is a deterministic state machine, starts at the same initial state and is identified by a unique integer. The system model still assumes an asynchronous distributed system where messages may be delayed, duplicated or delivered out of order. However, now public-key cryptography is used to prevent message tampering, corruption or spoofing. While the network size of a system that can tolerate fail-stop failure is 2f + 1, PBFT assumes a network size of 3f + 1 so we do need to throw some money at the problem. The adversary model can now assume the ability to coordinate faulty replicas and delay communications (but not indefinitely) but the cryptography cannot be broken. A client sends requests to the primary but it is assumed that these messages are sent synchronously. The primary validates the request and initiates the 3-phase protocol to ensure consensus among all (non-faulty) replicas. The replicas execute the request and send result directly to the client who accepts the result after receiving f + 1 identical replies.
Messages are processed using a three-phase process: pre-prepare, prepare and commit. In the pre-prepare phase, the goal is to acknowledge the sequence number assigned to the message and broadcast to all the replicas is unique. After the replicas agree on the sequence number multicast by pre-prepare, the acknowledgement of the sequence number is multi-cast to all replicas. This provides a guarantee that different messages can never have the same sequence number. After all replicas have been prepared, a commit message is sent across all replicas. The synchrony assumptions helps to identify if there is a problem with the primary: since clients can only send one message at a time and sequence numbers are ordered, if a majority of non-faulty processes have committed a message with a sequence number of 8 and have been on the pre-prepare phase with a message of sequence number of 10, they will fail the primary if they don’t receive a message numbered 9 within a predefined amount of time. To initiate a view-change, which will switch primaries, all replicas broadcasts to all other replicas along with their current state (list pf prepared requests) so the new primary what messages have not been completed. A new primary is elected when the server receives 2f + 1 messages.
There are practical limitations of scale in this configuration, though. In private, permissioned enterprise networks this was considered a manageable risk. But you can’t have a public, asynchronous network without taking into account both types of faults. In 2008, an entity referring to itself as Satoshi Nakamoto proposed a novel implementation of Proof-of-Work as a probabilistic solution for managing byzantine fault tolerance in a public, trustless decentralized, asynchronous network using existing technologies in an artful way and combines it with a straightforward use case: solve the double spend problem. This opened the way to distributed trustless consensus. We’ve already talked at depth about distributed consensus so we can focus on the trustless part.
Churchill vs Double-Crossing Generals
So far, we have dealt with building consensus on internal systems where there is an implicit level of trust. Fail-stop tolerance assumes full trust – all data is reasonable. Byzantine fault tolerance assumes that most nodes are trustworthy but some may be unreasonable and an even smaller subset may be malicious. Now consider the perspective of a cryptocurrency developer: an adversary may have a cost-free mechanism for creating an unbounded number of colluding malicious entities making it impossible to resolve conflicting information. All the generals are lying. In this adversary model, I can only trust myself. This implies a requirement that I need to be able to discover and confirm everything myself. I need access to a distributed, shared ledger where I can view the entire transaction history. This seems great until you realize that the server on the shared ledger can also be compromised and that a compromised ledger can be as self-consistent as a valid ledger. Satoshi posited that on the internet, everything is eventually connected so all versions of a public ledger will eventually be observable by everyone. Consensus will be used to determine which state of a ledger is valid. We still have a problem, though.
Consensus can determine the correctness of the state of a ledger at a particular point in time. Consensus alone cannot determine the correctness of a ledger as it moves through states. Imagine that a malicious entity creates a ledger state showing them to be the sender of some transaction, receive compensation and then present another ledger state that negates the earlier transaction. We need to not just agree that a state is valid, but we must also agree that this state will not be changed at a later time. Satoshi proposed that if there are two conflicting states, the state that was the hardest to generate computationally wins. Proof of Work provides a mechanism for injecting artificial computational complexity to the process of adding to a ledger. In Bitcoin, that unit of work involves finding a hash value that is less than some number set by the network called a difficulty level. First node to solve the puzzle wins and gets to add its proposed block to the blockchain and receive a mining reward. In case of ties, the longest blockchain will ultimately win thus providing eventual consistency. With this solution, we assume that an adversary cannot break the cryptography used in an existing blockchain nor do they have the computational power necessary to compete with the entire existing network.
Using some form of voting among some set of participants to achieve consensus in a faulty system seems to be emerging as a common pattern. A good voting algorithm should have fairness, investment and verification. Distributing leader election across the broadest possible population is a measurement of fairness. Keeping the costs associated with leader election proportional to the value is an investment measurement. Simplicity of leader legitimacy is a verification measurement. PoW elects a leader through a lottery system and this leader than proposes the new block and the rest of the chain accepts this block. Bitcoin uses miners to solve a cryptographic puzzle to determine leader election. BFT algorithms use multiple rounds of explicit voting. We now have a general understanding of how to address fail-stop and byzantine failure in asynchronous distributed systems by using voting to achieve consensus and we have identified two different algorithms currently used in production. However, there are still more algorithms in progress and no doubt more coming in the future. Substitute”consensus” for “government” and “leader election algorithms” for “democracy” and I think Mr. Churchill sums up our current state of affairs nicely.
‘Many forms of Government have been tried, and will be tried in this world of sin and woe. No one pretends that democracy is perfect or all-wise. Indeed it has been said that democracy is the worst form of Government except for all those other forms that have been tried from time to time.…’
Winston S Churchill, 11 November 1947
We have seen there are production-grade systems that address byzantine fault tolerance. In the next post, we will see the landscape continues to grow with new and updated algorithms.