Replication and Sharding in Hyperledger Fabric, Part 1: Peer replication
When we start our adventure with Hyperledger Fabric, we probably don’t think too much about fault tolerance, scaling, data partitioning or other distributed system terms. We just want to get our chaincode up and running. But after some time, the distributed nature of Fabric creeps in, especially when we’re thinking about production-grade systems with much higher load and availability needs. Suddenly, we must switch and think about concepts like replication and sharding. It can be intimidating, since Hyperledger Fabric is a complex system and some of its abstractions differ from those of well-known systems such as Cassandra or Kafka.
In this short series, I’ll try to gather some useful information on the replication & sharding topics. We’ll start with replication - what happens when we don’t have enough peers, orderers and what problems it might cause.
Since we’ll be speaking on a higher level, some basic Fabric knowledge is required:
- What are Peers, Orderers? (Check the top part of our awesome cheat sheet for intro)
- How they are connected with each other by Channel abstraction
- Transaction Flow (cheat sheet)
Let’s meet our basic scenario that we will be discussing and upgrading in every step:
- At start, we’ll have 2 organizations (Company 1 and Company 2)
- connected by channel “channel1” with one chaincode “Chaincode 1” installed.
- Endorsement policy will be configured so that both orgs must approve incoming transactions - AND('Company1.member', 'Company2.member').
- At the beginning, each organization will have 1 peer.
- We will also have a separate organization for Orderers called “ordererOrg1”.
- Both organizations will have separate backend apps communicating with the blockchain via the SDK.
- We’ll focus on Peers, Channels, Orderers. Certificate Authorities will be omitted (but still are an important part of the Fabric network).
In this article, we will see how a partially broken network affects the chaincode and its invoke or query operations.
For simplicity, I’ll divide unavailability of a service into two categories. This division is very simplified and not formal in any way, but it's good for the purpose of this article.
Temporary downtime issue
A classic example is network failure. The service is up and running, but due to occasional network issues, we can’t connect to it. After some time, e.g. after 2 minutes, it may come back and be responsive again. No data loss in this case. The service might only be a little behind and needs to request data from the last two minutes.
Another example of a temporary downtime issue is a situation where Kubernetes moves a pod to another node because a partial cluster crash has shrunk the node pool. Of course, recovering this way is only possible if data backups were done right ;)
Long running failures
This is a more serious scenario. It can happen when peer data gets corrupted or a disk crashes completely without a backup. We must bring up a completely new instance of a peer (or other service) and start from the beginning.
First, let’s focus on peer failures. In distributed systems, every service can occasionally crash and it's a completely normal thing. Peers are no different and our current setup means potential problems in the future.
If we’re not prepared for such situations, both temporary downtimes and long running failures obviously mean unavailability of the network, although the consequences differ slightly between the two failure types.
A temporary downtime of Company 2's only peer means the system is unavailable for a short period of time, and each company sees different problems:
- Backend application of Company 2 can neither query nor invoke the chaincode, as it has no point of contact with the Fabric network. So it gets peer connection errors.
- Peers are the first services that we use during the Transaction Flow to start the simulation phase. For Company 2, since its peers are not available, it means fail fast. Backend 2 can't connect to the Fabric network, so the Transaction Proposal phase will fail.
- Query: Since peers of Company 1 are alive, Backend 1 can query the chaincode and get data.
- Invoke: Unfortunately, outage in Company 2 also means problems in Company 1. If Company 2 doesn’t have any peers alive, we cannot satisfy the endorsement policy of the chaincode, so we cannot process any new transactions. (The hidden beauty of distributed systems achieved: Failure in one organization can cause problems in the other.)
- Backend app of Company 1 can connect to the network, but it will get endorsement errors. Transactions may pass the first step of the transaction flow (simulation) on Company 1's peer, but without an endorsement from Company 2 they will be rejected by the SDK or, if submitted anyway, marked invalid by committing peers during validation, because the endorsement policy is not fulfilled.
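The effect of this outage on both backends can be sketched in a few lines of Python. This is only a toy model of the scenario, not Fabric code: the `Peer` class, the `can_query` and `can_invoke` helpers, and the peer names are made up for illustration, and the AND policy is simplified to "at least one live peer from every listed organization must endorse".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    org: str
    alive: bool


def can_query(backend_org: str, peers: list[Peer]) -> bool:
    """A backend can query only through a live peer of its own organization."""
    return any(p.alive and p.org == backend_org for p in peers)


def can_invoke(backend_org: str, peers: list[Peer], required_orgs: set[str]) -> bool:
    """An invoke needs a local point of contact plus one live endorser per required org."""
    if not can_query(backend_org, peers):
        return False  # fail fast: there is no peer to send the proposal to
    endorsing_orgs = {p.org for p in peers if p.alive}
    return required_orgs <= endorsing_orgs


# AND('Company1.member', 'Company2.member') modeled as "every listed org must endorse".
POLICY = {"Company1", "Company2"}

# Scenario from the text: the only peer of Company 2 is down.
network = [
    Peer("peer1", "Company1", alive=True),
    Peer("peer2", "Company2", alive=False),
]

for org in ("Company1", "Company2"):
    print(org, "query:", can_query(org, network), "invoke:", can_invoke(org, network, POLICY))
```

In this model, Company 1 can still query but cannot invoke, and Company 2 can do neither, which matches the list above.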
Long running failures
All issues mentioned in the section above will also apply to long running failures, but we will also get more serious problems. Peers are responsible for storing the copy of a ledger, so if we completely lose the ledger replica in Company 2, it’s not good. We could always ask Company 1 to send a copy via some secured channel. But in limited trust cases, it’s not a good idea. Data might have been manipulated. Company 1, for example, could modify account balances or asset ownership.
We need a few independent replicas of data to detect data manipulation. In the case of complete ledger change in one of the organizations:
- Two replicas (two independent organizations in our case) guarantee that new transactions won’t pass due to inconsistent peer responses. Such a configuration is good and common, but it gives no chance of recovering automatically from such a situation, as we don’t know whose replica is correct. It might be Company 1 or Company 2, and only human verification of transaction logs, backups or other goodies can reveal the truth. Until then, the whole system is inoperative.
- More than two organizations with independent replicas of the ledger are the best option. If a majority of organizations say that some replica is true, we may assume that they are right.
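The difference between two and three independent replicas can be illustrated with a simple majority vote over ledger fingerprints. This is a hypothetical sketch, not how Fabric compares ledgers: `ledger_digest` and `trusted_digest` are names invented here, and a ledger is reduced to a list of strings.

```python
from __future__ import annotations

import hashlib
from collections import Counter
from typing import Optional


def ledger_digest(blocks: list[str]) -> str:
    """Hash the whole ledger content into one comparable fingerprint."""
    h = hashlib.sha256()
    for block in blocks:
        h.update(block.encode("utf-8"))
        h.update(b"\x00")  # separator so block boundaries affect the digest
    return h.hexdigest()


def trusted_digest(digests: dict[str, str]) -> Optional[str]:
    """Return the digest held by a strict majority of organizations, or None."""
    if not digests:
        return None
    digest, votes = Counter(digests.values()).most_common(1)[0]
    return digest if votes * 2 > len(digests) else None


honest = ledger_digest(["a pays b 10", "b pays c 5"])
tampered = ledger_digest(["a pays b 1000", "b pays c 5"])

# Two organizations: the mismatch is detected, but nobody can tell who is right.
print(trusted_digest({"Company1": tampered, "Company2": honest}))

# Three organizations: the majority outvotes the manipulated replica.
three_orgs = {"Company1": tampered, "Company2": honest, "Company3": honest}
print(trusted_digest(three_orgs) == honest)
```

With two replicas the function returns `None`, which corresponds to the "human verification" case above. With three, the honest fingerprint wins.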
The above discussion is nothing other than the famous >50% attack known from public blockchains, applied to a private blockchain. Since each organization maintains and controls its own peers, the line of independence is different: the independent unit is a single organization, not a single peer as in the public blockchain world.
Lastly, it is worth mentioning that most organizations probably won’t intentionally make such a move - companies working on a common goal are by their nature more coupled than individual users of a public chain. A long-running outage of a crucial system means huge money loss for all participants, e.g. due to production line paralysis.
How to fix it? Scale peers up
Ok, enough of talking - let’s fix our setup.
Let's add a second peer for each organization. Now when Peer 2 of Company 2 fails, we can still communicate with Peer 4. The endorsement policy requires approval of each organization - which means a response from at least one peer from a given organization. Now in the case of a partial system crash, we can continue and process transactions.
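Here is a minimal sketch of that failover, assuming Peer 1 and Peer 3 belong to Company 1 and Peer 2 and Peer 4 belong to Company 2. The `collect_endorsements` helper is hypothetical. It only models "pick the first live peer of each organization" and is not an SDK API.

```python
from __future__ import annotations


def collect_endorsements(peers_by_org: dict[str, list[str]], alive: set[str]) -> dict[str, str]:
    """Pick the first live peer of each organization; raise if an org has none."""
    chosen = {}
    for org, peers in peers_by_org.items():
        live = next((p for p in peers if p in alive), None)
        if live is None:
            raise RuntimeError(f"endorsement policy unsatisfiable: no live peer in {org}")
        chosen[org] = live
    return chosen


# Assumed peer-to-organization mapping for the scaled-up setup.
PEERS = {
    "Company1": ["peer1", "peer3"],
    "Company2": ["peer2", "peer4"],
}

# Peer 2 is down, Peer 4 takes over for Company 2.
print(collect_endorsements(PEERS, alive={"peer1", "peer3", "peer4"}))
```

If both peers of one organization are down, the helper raises an error, which is the single-peer situation we started with.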
Bear in mind that having two peers fixed our example, but is still not enough for production. I’ll soon explain what default we should have.
What happens under the hood?
The gossip protocol is responsible for handling recovery situations - when Peer 2 is back, it asks peers in its channel about ledger updates. To be precise, it asks them about the current block height and, if it notices that it’s left behind, it fetches the missing blocks. Before being applied, the fetched blocks are also validated against the local replica of data. In Byzantine fault tolerant systems, we must exclude the possibility of fetching blocks from a corrupted peer.
Bear in mind that in the case of big data loss, the process of catching up with the network might take some time. Every block may contain multiple transactions and must be verified before being applied. In the case of a peer starting from scratch, it means verifying every block from the genesis block up to the latest one.
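The catch-up idea can be sketched as a toy hash chain. This is a simplified model, not Fabric's gossip implementation: the `Block` class, `build_chain` and `catch_up` are invented for illustration, and a real peer checks much more than this (for example block signatures and every transaction in the block). The sketch only shows that blocks which do not extend the local replica are refused.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


@dataclass(frozen=True)
class Block:
    number: int
    prev_hash: str
    payload: str

    def hash(self) -> str:
        data = f"{self.number}|{self.prev_hash}|{self.payload}".encode("utf-8")
        return hashlib.sha256(data).hexdigest()


def build_chain(payloads: list[str]) -> list[Block]:
    """Link payloads into a chain where each block stores its predecessor's hash."""
    chain: list[Block] = []
    prev = GENESIS_PREV
    for number, payload in enumerate(payloads):
        block = Block(number, prev, payload)
        chain.append(block)
        prev = block.hash()
    return chain


def catch_up(local: list[Block], remote: list[Block]) -> list[Block]:
    """Append the blocks the local ledger is missing, validating each one first."""
    ledger = list(local)
    for block in remote[len(ledger):]:  # compare heights, fetch only what is missing
        expected_prev = ledger[-1].hash() if ledger else GENESIS_PREV
        if block.number != len(ledger) or block.prev_hash != expected_prev:
            raise ValueError(f"block {block.number} does not extend the local ledger")
        ledger.append(block)
    return ledger


network = build_chain(["genesis", "tx-a", "tx-b", "tx-c"])
behind = network[:2]  # a peer that missed the last two blocks

print(len(catch_up(behind, network)))  # temporary downtime: two blocks to fetch
print(len(catch_up([], network)))      # fresh peer: every block is checked
```

A remote peer whose history diverges from ours, for example one built from `["genesis", "tx-a-FORGED", "tx-b", "tx-c"]`, makes `catch_up` raise a `ValueError` instead of silently applying its blocks.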
Lastly, note that one of the purposes of the gossip protocol established between peers is decreasing the usage of the Orderer. That way, the Orderer is not overwhelmed by queries from peers, and it does not act as a single source of truth holding the valid information about blocks. A single source of truth can be easily hacked, and we want to rule out such possibilities by any means in the blockchain world.
What defaults should I have?
The basic sane minimum is 3 peers in every organization and every one of these peers should be marked both as an anchor peer and a bootstrap peer. This enables some redundancy and, therefore, proper use of the gossip protocol.
Keep in mind that every peer should be deployed on a different physical machine (node), otherwise it won’t make much sense. If we use modern orchestration tools, e.g. Kubernetes, such a process should be fairly simple - we can define anti-affinity and keep them in different places.
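As a rough illustration, the snippet below builds the anti-affinity fragment of a Kubernetes pod spec and prints it as JSON. The field names (`podAntiAffinity`, `requiredDuringSchedulingIgnoredDuringExecution`, `topologyKey`) are the standard Kubernetes ones, while the `peer_anti_affinity` helper and the `app: company1-peer` label are assumptions made up for this example.

```python
import json


def peer_anti_affinity(app_label: str) -> dict:
    """Build a pod-spec fragment that forbids two matching pods on the same node."""
    return {
        "affinity": {
            "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        # Pods carrying this (hypothetical) label repel each other...
                        "labelSelector": {"matchLabels": {"app": app_label}},
                        # ...at the granularity of a single node.
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        }
    }


print(json.dumps(peer_anti_affinity("company1-peer"), indent=2))
```

The printed fragment belongs under the pod template's `spec`, and all peer pods of the organization need to carry the matching label for the rule to keep them apart.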
Hyperledger Fabric is meant for companies that desire to host a blockchain solution between them using private infrastructures of their choice. It’s very different from public blockchains where the network is spread all over the world. Fabric is more centralized and often run on two clusters between two companies.
Such a solution is not very different from classical distributed systems, where we must consider the various failures that might happen. The first and most important one is distributed ledger loss or corruption, which was covered in this article.
We’ve discussed various unavailability types that might affect peers and described the problems they might entail. We’ve also shown how important it is to have at least two independent replicas of the data hosted by different organizations, and proposed some default peer replication factors.
If you liked it, stay tuned for more! In the next episode, we’ll focus on Orderer replication - why it’s important, how it can be achieved, and how Orderers actually store data.