Byzantine Fault Tolerance Quickly Explained

Nicholas Tang
4 min readApr 18, 2021

In distributed computation such as blockchain, one of the key characteristic for the blockchain to work is to keep functioning in the presence of multiple faulty processes or nodes. The name given to this algorithm is the Byzantine Fault Tolerance.

Solving the impossible

Let’s start with a scenario of two generals. In order to win a battle, two generals must come to an agreement on whether to attack or to retreat. If one of the general retreats while the other attacks, the battle will result in a loss. They can communicate with each other before the battle and come to a consensus. Now, let’s say one of them is a traitor (faulty processes), when the other general agrees to attack, the traitor will retreat and vice versa. You can see it is impossible to win this battle. Thus, from this story, we learned that two nodes will not solve the distributed computing problem since if one of them is faulty, the whole system will collapse.

Byzantine Fault Tolerance

Now let’s make the problem easier for us. We will add two more generals into the scenario where if the majority of the vote to attack or retreat is achieved, they win the battle (in this case, at least 3 generals are needed to achieve consensus). Let’s imagine that 4 generals surrounded a castle and they are planning to attack or retreat.

All the generals can deliver messages to each other thus each general will have inputs from all three generals.

In order to come to a consensus, the captain of all the generals, let’s assume that general 1 is the captain, will give orders to all other generals whether to attack or to retreat. Also there will be one traitor among the generals. Keep in mind that the captain may also be a traitor but we will discuss more later. Now let’s say general 3 is a traitor so he will oppose the order received from the captain to lose the battle. Let’s say the captain (general 1) decides to attack, so he forward his messages to all other generals and as usual, general 3 will oppose his orders.

From the image above, green means attack while red means retreat. As shown, the ‘good’ generals will only relay what they heard from the captain while the traitor will oppose. General 2 and 4 will each receives 2 orders to attack and 1 order to retreat. As such, since the generals will only trust the majority, they will agree to attack instead. A consensus is achieved in this scenario.

The Traitor Captain

In the other scenario, unfortunately the captain himself is a traitor. In order to lose the battle, the captain will only have two options, either to give two attack orders to any two generals and one retreat order to a general; or two retreat orders to any two generals and one attack order to a general. If the captain gives the same orders to all three other generals, obviously a consensus would have been achieved and the traitor won’t allow it. Let’s have a looks on what happens when the general gives two attack orders to general 2 and 4 and one retreat order to general 3.

The generals other than the traitor will always accept the majority vote. In this scenario, all three generals will agree to attack since all of them receives the order to attack as their majority messages. General 1 may do what he wants to lose the battle since he is the traitor. But since the majority of the generals decides to attack, a consensus is already achieved.

If the number of traitor increases to two, just like we discussed above where 50% of the total generals are traitors, consensus will never be achieved. No more than 1/3 of the generals (or nodes) can be a traitor in order to achieve consensus.

Conclusion

The Byzantine Fault Tolerance is a complex algorithm to achieve consensus, especially in distributed computing, which is also used in the blockchain technology where the blockchain needs to be resilient under multiple faulty nodes. This algorithm can also be used in a lot other places such as airplanes where if one of the components failed, the plane has to keeps on flying. Nuclear power plants are also actively using this algorithm as a precaution as well as rocket launches.

--

--

Nicholas Tang

Amateur writer that writes about books review and the latest technology.