Skip to content
Jason Law edited this page Feb 14, 2016 · 33 revisions

Zeno, a high-performance Byzantine fault tolerant protocol

Zeno is an implementation of RBFT, or Redundant Byzantine Fault Tolerance, a consensus algorithm proposed by Pierre-Louis Aublin, Sonia Ben Mokhtar, and Vivien Quéma. As described in their paper, existing BFT protocols "use a special replica, called the primary, which indicates to other replicas the order in which requests should be processed. This primary can be smartly malicious and degrade the performance of the system without being detected by correct replicas. Our evaluation shows that RBFT achieves similar performance as the most robust protocols when there is no failure and that, under faults, its maximum performance degradation is about 3%, whereas it is at least equal to 78% for existing protocols."

RBFT implements a new approach whereby multiple instances of the protocol run simultaneously, a Master instance, and one or more Backup instances. All the instances order the requests, but only the requests ordered by the Master instance, are actually executed. All nodes monitor the Master and compare its performance with that of the Backup instances. If the Master does not perform acceptably, it is considered malicious and replaced.

In addition to using RBFT, Zeno leverages RAET: Reliable Asynchronous Event Transport Protocol, a high-performance, fault-tolerant communications protocol on top of UDP. RAET leverages Daniel J. Bernstein's Curve25519, a highly-secure high performance elliptic curve.

Where Zeno differs from RBFT is that instead of using Message Authentication Codes, every communication is digitally signed using Curve25519. While MAC authenticators are computationally less expensive to verify than digital signatures, we feel that given the foreseeable protocol applications today, the security trade-offs of using MACs would be too high.

Also, RBFT does not specify the election process, that is, how the primaries of each protocol instance are selected. We've implemented a process that applies a voting to select the primary. The election strategy is plug-able, meaning another strategy with different security and performance characteristics could be substituted easily.

The Tutorial

This tutorial assumes an interest in byzantine agreement but not much else. We'll try to teach some of the concepts as we go, but these protocols are sophisticated. The goal is to give you an overview of the workings of our consensus protocol that helps to overcome Byzantine failures (random, spurious faults, or malicious, intelligent and coordinated attacks) in a distributed system. We refer you to the RBFT document if and when you'd like to know a bit more about RBFT. We recommend going through [Getting Started] (https://docs.google.com/document/d/1MtGYbf9EtIGKHZ8g_RaoXVutegTnXu2Lzb09mJGtyxA/edit?usp=sharing) to know more about Byzantine faults and failures, the origin of BFT, and more about Zeno.

Contributors

This is a collaborative effort from the Evernym team. Have something to add? See a typo? Fork and submit a pull request. We've had fun creating it and would welcome your contribution.

The CLI

To demonstrate the Zeno Protocol in action, we created a command-line interface (cli) that creates a "sandbox" of sorts. In this sandbox, you can spin up a small consensus pool, spin up clients, send requests, and see them move through the system.

We've tried to strike a balance between too much info, and not enough. Where the output can be a bit confusing, we'll call out what's happening. Also, the cli generates a log file with a lot more detail. It might be useful to tail the log while you run the cli to see what's happening. Also, you could attach to a running cli process with a debugger (some of us use PyCharm) and really see what's going on.

Feel free to run the tests. Though we are light on unit tests, we have a number of integration tests that demonstrate the protocol in a number of behaviors. We use py.test,

Future improvements

We are continuing to add to this protocol. Upcoming improvements include:

  • Plugable blacklisting strategies
  • Bootstrapping and authentication schemes (clients and nodes)
  • Persistence adapters (ledger/DHT/database)
  • Enhance tests with more Byzantine faults and categorized malicious behaviors
  • Refactoring for readability, separation of concerns, etc.
  • Take advantage of multiple cores (RAET's LaneStack for inter-process communication)
  • Hardening, load testing, and randomized simulations
  • TBD

Reference:

  1. Pierre-Louis Aublin, Sonia Ben Mokhtar, Vivien Quéma: “RBFT: Redundant Byzantine Fault Tolerance” In Proceedings of the International Conference on Distributed Computing Systems (ICDCS), Philadelphia, USA, July 2013
  2. RAET: Reliable Asynchronous Event Transport Protocol
  3. NaCl: Networking and Cryptography library
Clone this wiki locally