Consensus Algorithms Compared: PoA vs IBFT vs Raft

Consensus Algorithms: PoA, IBFT or Raft?

Since this post was written, Hyperledger FireFly has reached 1.0. Learn more here!

The Kaleido platform offers three types of consensus algorithms between three blockchain protocols. With the Go-ethereum client it offers Proof-of-Authority, Quorum client supports Istanbul BFT and Raft, and finally with Hyperledger Besu, formerly Pantheon, you get Proof-of-Authority and Istanbul BFT*.

Let’s take a closer look at them so you can decide with confidence which algorithm best fits the business needs for your consortium.

Blockchain Protocols Use Certain Consensus Algorithms

Geth is an open source blockchain protocol implementing the standard Ethereum specification plus other useful features such as a pluggable consensus engine. Geth is among the most widely used implementations of Ethereum nodes. It supports both the standard DAG based Proof-of-Work consensus and Proof-of-Authority (PoA) consensus with its implementation called Clique.

The Geth nodes in the Kaleido platform are based on release version v1.8.27 stable.

Quorum is another open source blockchain protocol specially designed for use in a private blockchain network, where there is only a single member owning all the nodes, or, a consortium blockchain network, where multiple members each own a portion of the network. Quorum is derived from Ethereum by modifying the Geth client. The latest release as of the writing of this blog is 2.2.3 based on Geth 1.8.27.

Some of the key features of Quorum include:

Privacy via private transactions: members of a Quorum network can send private transactions that are addressed to a subset of nodes, such that the contents of the transaction are not exposed to non-privy members.
Peer permissioning: a Quorum network can be configured to run in *permissioned* mode such that all nodes must be explicitly listed in an access control list enforced by all nodes. This prevents foreign nodes from tapping into the network and replicating blocks as is the case in permissionless networks.
Flexible consensus: described in greater detail later in this post, Quorum supports Raft and IBFT as valid consensus options. Both support transaction finality (i.e. lack of chain forking) and offer shorter block intervals than proof-of-work.

For more information on Quorum, visit the github.com repository or the Quorum wiki.

Hyperledger Besu is an open source Ethereum compliant blockchain implementation that is purpose-built for the enterprise permissioned blockchain market. Written in Java from scratch, Besu is built on a modular design that enables pluggable implementations of key components such as consensus, cryptography and signing wallets. Besu offers two consensus algorithms to choose from: IBFT* and Clique PoA. For privacy it also supports the Quorum-style private transactions in a separate component called Orion (equivalent of Quorum’s Constellation or Tessera). The latest release as of the writing is 1.1.1.

Some of the unique features in Besu:

User authentication with a built-in user registry for JSON RPC APIs
On-chain smart contracts based permissioning rules with local node-specific override

For more information on Besu, visit the github.com repository or the documentation.

What Is “Consensus” Anyway?

If you have not come across distributed systems before, the concept of “consensus” may be foreign to you. In short, because distributed systems act independently when processing information (i.e. executing transactions) and updating state, there must be non-disputable agreement in the resulting states among the nodes. The process of achieving the agreement among the distributed nodes’ states is called *consensus*. For a concise introduction to consensus algorithms, visit this published course by Duke University.

There are many different kinds of consensus algorithms. In the blockchain world you come across myriad *proof-of-X* variations. These types of algorithms tend to rely on laws of physics (limit on computing speed) or economics (incentives for honest behaviors or disincentives for dishonest behaviors) to guarantee agreement; they apply to the cryptocurrency networks in a public setting. Discussions on this class of algorithms are out of the scope of this blog. Visit this blog post for a comprehensive run down.

Private blockchains have different properties than public blockchains, and that affords opportunities to use alternative consensus algorithms. One key aspect is the fact that all nodes must be explicitly allowed to join the network, aka peer permissioning. In addition, clearly designated node identities apply accountability toward the participating members. These properties make it possible to employ consensus algorithms that are designed for a distributed network comprised entirely of trusted (honest) nodes. Examples include Paxos (as seen in Kafka Zookeeper implementations) and Raft (to be discussed below). For networks where participants don’t assume honesty toward each other, there are a number of Byzantine Fault Tolerant variants (e.g. Istanbul, addressed later in this post) that can be employed.

All three of the consensus algorithms discussed in this article offer absolute finality, which describe the fact that a confirmed transaction can not be reverted or annulled. Public blockchain networks utilizing Proof-of-Work do not offer absolute finality, because the cryptographic seal of a block is based on compute-intensive hashes, which with enough compute power can be calculated, resulting in a significant number of the blocks to be re-written. Only when the number of nodes that make up the network is large enough, will there be high enough probabilistic guarantee that a re-write can not happen. On the other hand, both Raft and IBFT offer immediate finality, where no forks are ever allowed to happen. Clique offers finality within the (N/2)+1 blocks because each of those blocks are protected by a different signer, thus forming a digital signature chain that can not be broken without majority of the network colluding together.

Clique (Proof-of-Authority)

Geth nodes in Kaleido use Proof-of-Authority (PoA) as the consensus algorithm. In particular, the implementation by the Geth team is called Clique. Clique uses digital signatures to seal the blocks and achieve data immutability. For consensus, PoA relies on a set of trusted nodes called Authorities that use a simplified messaging algorithm to achieve better performance than typical PBFT algorithms. There is only one round of messages exchanged among the authorities in PoA, compared to 3 rounds in PBFT. Thus, better performance is one of the claims of PoA when measured against other BFT algorithms, especially PBFT.

A PoA network can tolerate up to N/2 – 1 byzantine authority nodes. Namely, it can operate correctly when a simple majority of the authority nodes, N/2 + 1, are honest. For each block, multiple authority nodes are allowed to propose. The algorithm relies on Ethereum’s GHOST protocol to resolve forks that can result from multiple authorities proposing at the same time. Compared to PBFT algorithms, the design of PoA sacrifices consistency (forking can happen) for better availability (faster block committal). And in particular Clique offers eventual consistency PBFT vs. PoA analysis, when the forks get sorted out by the GHOST protocol.

PoA produce blocks at a configurable but fixed interval, regardless of whether there are transactions to include. Kaleido, as of this writing, sets the block interval to 5 seconds for all nodes using PoA that are created in the UI. Block period is configurable in the range 5s-10s when created via Kaleido APIs.

Clique is an appropriate choice for a network containing parties that do not trust each other. When considering this algorithm, makes sure to understand that forking will happen in a network with more than 4 authorities. In Kaleido, all nodes are currently established as authorities. So in a network of 6 nodes, 5 for the users plus the System Monitor node, up to 2 authority nodes can propose blocks for any given block interval. The formula is `N – (N/2 + 1)`.

Network Size	Quorum Size	Fault Tolerance
1	1	0
2	2	0
3	2	1
4	3	1
5	3	2
6	4	2

‍

Another rather important aspect of Clique is that each block is sealed by only one signature – that of the proposer’s. This is significantly different than PBFT where at least a super majority of validators provide their signatures to every block. The implication is that the data immutability guarantee is weaker than PBFT. However, it still provides a strong guarantee across a chain with many blocks, because every proposer is only allowed to sign once every N/2 + 1 blocks. Given the strong tamper proof feature of the blockchain itself, this means that in order to attack the data on the blockchain the attacker would still have to compromise the whole set of authority nodes.

Example block in a Geth chain using Clique PoA:

> eth.getBlock(10000){
  difficulty: 2,
  extraData: "0xd783010703846765746887676f312e392e36856c696e75780000000000000000c845989a1fd72cc3d45a9fc1a96429621f716461ef9b6fffca9e72f2fcdcdcbb719c36f47e26da5c749c59efd5692fe8fb5b664580297fb2d11f7eb8b9bb45da01",
  gasLimit: 4712388,
  gasUsed: 857520,
  hash: "0x4eac41573e910786b94536eb14eb8fb70fcce6ab62e6034d72a63660edf9b789",
  logsBloom: "0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
  miner: "0x0000000000000000000000000000000000000000",
  mixHash: "0x0000000000000000000000000000000000000000000000000000000000000000",
  nonce: "0x0000000000000000",
  number: 10000,
  parentHash: "0x0f588f510c528f3101220b70268d4162bc56dd693c1ed9c8f3456273ffa31df9",
  receiptsRoot: "0xbbdc940f6b7efb1930667aba9647ba133d75770d418ccdc0e388191610890ca9",
  sha3Uncles: "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
  size: 6762,
  stateRoot: "0xa9df63c158d4aeff04c3313f50b05bea1f37a57c515a6ee048a94e0d9c8a92c3",
  timestamp: 1526067551,
  totalDifficulty: 19858,
  transactions: ["0x9829a5e1c1d0b360e72e1d5bf6955d49d306d6b81decfe1576251c8bb7dc614e",
  "0xbdc1e9d58acc97eacc18b856738e0963af79b3e203093f9652281119327870ed",
  "0x8617a28e960c17d257a27b3f508fe4d01cc8b4eb4ab1310c9d5f7a8240bebdff"],
  transactionsRoot: "0x986a94e7fdc3f54a48c8b8785b64e55355f2fe8374685f6d309786c235a9ca32",
  uncles: []
}

Istanbul Byzantine Fault Tolerance (IBFT)

In 2017, JPMC and AMIS collaborated to introduce the first implementation of IBFT into Quorum, and it has proven to be a very popular consensus algorithm for production enterprise permissioned networks requiring byzantine fault tolerance and finality. Later, PegaSys team dedicated significant research and development effort in 2018 for a Java implementation targeting notable improvements in safety and liveness under byzantine validator behaviors.

Istanbul BFT, or IBFT, is an implementation of the Practical Byzantine Fault Tolerance algorithm with modifications. The main difference from Raft, or any other Crash Fault Tolerance algorithms, is that while Raft followers blindly trust their leader, in IBFT each block requires multiple rounds of voting by the set of validators to arrive at a mutual agreement, which is recorded as a collection of signatures on the block content. A validator never assumes the “leader”, or “block proposer”, to be correct or honest. Instead it verifies the proposed block just like other consensus engines operating in an untrusted environment (Proof-of-Work, etc.)

By definition, Byzantine Fault Tolerance means a network can continue to function correctly even if some nodes are dishonest and attempt to propose invalid blocks, or blocks that benefit certain parties at the expense of others. In particular, a PBFT implementation, which IBFT is one of, can tolerate up to *f* number of dishonest (faulty) nodes in a network of 3f + 1 nodes. This roughly translates to 1/3 of faulty nodes being tolerated. This can be even more roughly translated to “super-majority rules” algorithm.

Network Size	Quorum Size	Fault Tolerance
1	1	0
2	2	0
3	3	0
4	3	1
5	4	1
6	4	1
7	5	2
8	6	2
9	6	2

‍

When considering IBFT, it’s important to understand the properties of the nodes. If they belong to parties that you don’t want to assume will always act fairly and honestly, like your fiercest competitors, or may be susceptible to compromises, like those deployed in a less-than-ideal security environment, then IBFT is a necessary choice.

A network using IBFT will always produce blocks at a constant interval, regardless of whether there are pending transactions. This means there can be many blocks with zero transactions if the transaction load is low. Kaleido, as of this writing, sets the block interval to 10 seconds for all nodes using IBFT.

Blocks produced by IBFT are strongly protected against tampering through the collection of signatures from the proposer and the voting validators. It will be impossible to rewrite the block content without having access to all the private signing keys of the proposer and the validator nodes. This provides strong guarantees to the immutability of the resulting blockchain.

The list of validators that get involved in voting for each block can be dynamically expanded or shrunk, by asking existing validators to vote:

To add a new validator, at least 2/3 of the existing validators must call `istanbul.propose(new_node_address, true);`
To evict an existing validator, at least 2/3 of the existing validators must call `istanbul.propose(new_node_address, false);`

Finally, the block interval is configurable, with the default being 1 second. Kaleido today sets the block interval to 10 seconds for all IBFT networks created in the UI, with configurable block interval for nodes created via APIs (allowed range 10s – 20s).

Example block in a Quorum chain using IBFT:

> eth.getBlock(2001){
  difficulty: 1,
  extraData: "0xd783010702846765746887676f312e372e33856c696e75780000000000000000f89ed594abbcf2086d69783e7c7bd88193860da082ac5a6fb84131cd6969c6eddd7cef61ce42c881566966922ebf530868b8769db1cb2742031a6045ee9dd1857a04654c672bdd3a05d25435ec9caa98a2388ab2cd1c9e91dad900f843b841f829ba18c2d4a55da9b62179f25488ae7205069e79d94cbf2da3b9ecb62746e50620c1ac308b01e0f23d85de147ec25826e8a03560b8efb068fd08894f8d8b5701",
  gasLimit: 113962254,
  gasUsed: 278693,
  hash: "0x65f173061375a984adc24edec314262f2c47d91ed026d11b0654ecb40747b497",
  logsBloom: "0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
  miner: "0x0000000000000000000000000000000000000000",
  mixHash: "0x63746963616c2062797a616e74696e65206661756c7420746f6c6572616e6365",
  nonce: "0x0000000000000000",
  number: 2001,
  parentHash: "0xfb307e7034bcf9553308afaf876991e0a6510ccdb4ac9ed5a918475cf7f9e269",
  receiptsRoot: "0x1ccca06221387a6e390167e39e1f08647686fe4150891fce5e5073a40d1dced4",
  sha3Uncles: "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
  size: 2708,
  stateRoot: "0xf0e5b4667d71b8505be207177df97e65a763aa09174da5c28350df279ed519f5",
  timestamp: 1521830564,
  totalDifficulty: 2002,
  transactions: ["0x7b031bda65dc52d2940ba221d81c3d4287bcc00b41f63693d60828f2fffdb15a",
  "0x91da07e8f350b55d1411dbf25ce8c4f675be494c2be73891c562265413d82b15"],
  transactionsRoot: "0x67e1cd0b9f6bd9973b11c5891a5b9711862191e9571b9b38f7b1eb1ea82b49bc",
  uncles: []
}

* Pantheon’s IBFT implementation is based on the original IBFT proposal and the implementation adopted by Quorum, but with a series of improvements mainly to address byzantine behaviors (validator nodes knowingly acting maliciously) as described in great details in Robert Saltini’s IBFT analysis paper. The Quorum and Pantheon developers are working together on a formal specification of IBFT: github.com/EntEthAlliance/enhanced-bft/issues.

Raft

Quorum uses a Raft implementation in `etcd` to provide a Crash Fault Tolerant (CFT) consensus engine. Raft is a well-known consensus algorithm developed by researchers at Stanford University. The `etcd` implementation has been battle-tested in many distributed frameworks including Kubernetes, Docker Swarm, CloudFoundry Diego containers, etc.

This is a CFT consensus, as opposed to BFT (Byzantine Fault Tolerance) because in Raft the leader is assumed to always act correctly (honestly). All the followers blindly replicate the entries proposed by the leader with no questions asked. If the leader crashes, the remainder of the network will automatically elect a new leader after a set period of timeout, and the network will continue to function. When the crashed node recovers, it will become a follower and start replicating the blocks it has missed while offline.

Network Size	Quorum Size	Fault Tolerance
1	1	0
2	2	0
3	2	1
4	3	1
5	3	2
6	4	2

‍

Blocks minted in a Quorum Raft consensus network, as of this writing, are not protected by either a unique hash, as in Proof-of-Work, or by signatures, as in other consensus algorithms like BFT. As a result, the blockchain data can be re-written rather easily by modifying historical data (transaction inputs) and re-calculating the block hash (as well as other relevant fields that must be kept coherent, like transactions trie root, etc.). Other means of data protection may need to be employed to ensure the integrity of the blockchain data. (Edit: this problem has been addressed by Quorum pull request #395, which was submitted by engineers from Kaleido).

Raft consensus does not mint blocks unless there are pending transactions. This can result in significant storage savings, especially when the transaction load is low, because no empty blocks containing zero transactions are being minted.

Another advantage of using Raft is faster block time compared to IBFT. The leader mints a block within 50ms of receiving the transaction (per the default setting), and flowing a proposed block through the Raft cluster and collecting majority acknowledgements is a very fast process. Under most typical network conditions, average block times are consistently sub-seconds.

Example block in a Quorum chain using Raft:

> eth.getBlock(2){
  difficulty: 131072,
  extraData: "0x",
  gasLimit: 802677719,
  gasUsed: 26620,
  hash: "0xc7afe8d21071120517fac800c5619b26efcbfba23f7974effcaf1f100eb1671a",
  logsBloom: "0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
  miner: "0x0000000000000000000000000000000000000000",
  mixHash: "0x0000000000000000000000000000000000000000000000000000000000000000",
  nonce: "0x0000000000000000",
  number: 2,
  parentHash: "0x341fed9780faa832a5207e1b7ccbc1d27903ff4d39e92fb02241d707cb49a488",
  receiptsRoot: "0xa13fa2556b7c907468667b128ebd08c4b69d9826e946252dbd230a46d9b1fe8d",
  sha3Uncles: "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
  size: 652,
  stateRoot: "0xc8ed03350b0c8b5a272e8602ca49c91a52b67beb04463e755d120f33ce52d786",
  timestamp: 1522098611952556500,
  totalDifficulty: 262144,
  transactions: ["0x68ebea2905de935128660065eeb019d893f9fe86eff2fe06f56f6eb3a63646a8"],
  transactionsRoot: "0x7c77d4ffc657dfbe00b3d7dba1924041bc1a34b91ee95776ae740c924c31c144",
  uncles: []
}