GDPR in Hyperledger Fabric

Blockchain is not a regular database. Once data is published on a blockchain, it stays there forever. This obviously conflicts with the ideas behind the GDPR. Hyperledger Fabric has some solutions that allow keeping the data secret and removable, but even then, there is a risk of doing something irreversible.

Important note: This post is not legal advice. Since I am a tech person, it contains only some suggestions and highlights on how to approach GDPR compliance in blockchain-based systems.

GDPR and blockchain

GDPR, a regulation with a great impact on the IT industry, obliges service providers (i.e. controllers) to allow end users (data subjects) to have their personal data erased. Article 17 states:

The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay (...).

There are some circumstances in which the controller can keep the data anyway, but still, from a legal point of view it is far safer to have the technical ability to remove it.

You can handle the right to be forgotten in two ways. You can either delete the data or “forget” it using some kind of crypto-shredding.

Since no data can be removed from a blockchain, you simply cannot store sensitive data there: no real names, personal ID numbers, e-mail addresses, phone numbers, postal addresses, etc. Free-text fields are risky as well. If users can type anything into a text field, it is only a matter of time before someone enters sensitive information that will eventually have to be removed.

If you want to use crypto-shredding, you need to store encrypted information on the blockchain. But this requires the encryption keys to be stored off the chain, and without those keys, the encrypted information stored on the chain is useless (which is exactly how data gets “forgotten”: you delete the key). The blockchain is no longer self-contained; it depends on an external system.
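A minimal crypto-shredding sketch in Node.js, assuming a hypothetical in-memory `keyStore` that stands in for the external key system: each data subject gets their own AES-256-GCM key, only the ciphertext would be written on the chain, and “forgetting” a subject means deleting their key. All function names are illustrative and not part of any Fabric API.

```javascript
const crypto = require("node:crypto");

// Hypothetical off-chain key store: one AES-256 key per data subject.
// In a real system this would be a database or a KMS/HSM outside the blockchain.
const keyStore = new Map();

function getOrCreateKey(subjectId) {
  if (!keyStore.has(subjectId)) {
    keyStore.set(subjectId, crypto.randomBytes(32));
  }
  return keyStore.get(subjectId);
}

// Encrypts personal data; only the returned record would be written on the chain.
function encryptForChain(subjectId, plaintext) {
  const key = getOrCreateKey(subjectId);
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Decrypts an on-chain record; fails once the subject's key has been shredded.
function decryptFromChain(subjectId, record) {
  const key = keyStore.get(subjectId);
  if (!key) {
    throw new Error(`Key for ${subjectId} has been shredded`);
  }
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, Buffer.from(record.iv, "base64"));
  decipher.setAuthTag(Buffer.from(record.tag, "base64"));
  return Buffer.concat([
    decipher.update(Buffer.from(record.data, "base64")),
    decipher.final(),
  ]).toString("utf8");
}

// "Forgetting" a data subject: the ciphertext stays on the chain but becomes unreadable.
function shred(subjectId) {
  keyStore.delete(subjectId);
}
```

Keeping one key per data subject is what makes selective forgetting possible: shredding one key leaves everyone else's records readable.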

It is also possible not to store any sensitive information on the blockchain at all and use it only as evidence of data integrity. A common scenario is to store a checksum (hash) of the sensitive data on the blockchain. Each time you want to verify the data, you calculate the hash again and compare it with the stored one. If they differ, the data has been modified.
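The verification step can be sketched with Node's built-in `crypto` module. This assumes flat records and a hypothetical canonical serialization (keys sorted alphabetically), so that the same data always produces the same hash regardless of field order; how the hash is written to the chain is left out.

```javascript
const { createHash } = require("node:crypto");

// SHA-256 over a canonical JSON serialization of a flat record.
// Assumption: passing the sorted key list as the replacer fixes the property order.
function hashRecord(record) {
  const canonical = JSON.stringify(record, Object.keys(record).sort());
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// Returns true if the off-chain record still matches the hash stored on the chain.
function verifyRecord(record, onChainHash) {
  return hashRecord(record) === onChainHash;
}
```

For example, the hash of `{ name: "Alice", email: "alice@example.com" }` verifies the same record with its fields in a different order, but not a record with a changed e-mail address.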

This approach has a significant advantage over storing encrypted data on the chain. Even in the event of a future breakthrough in cryptography, a hash does not contain the original information, so it cannot be used to restore it (as long as the hashed data is hard to guess).

Private data in Hyperledger Fabric

Hyperledger Fabric has a built-in mechanism for storing data off the chain and simultaneously keeping the hash of the data on the chain. It is called private data and basically consists of two elements:

  1. The private data itself, which is distributed via a peer-to-peer (gossip) protocol among the peers of authorized organizations. It is not stored in the blockchain.
  2. The private data hashes that are stored in the blockchain.

You can just invoke smart contracts and pass the private data to them (as transient parameters, as explained below). Hyperledger Fabric takes care of creating a hash, saving it on the blockchain, and syncing the private data with other authorized peers.

You don't need to create data hashes in the client application, and you don't need to sync the private data among multiple organizations yourself. No additional complexity or cost.

source: Hyperledger Fabric documentation

Blockchain makes sense when multiple organizations share data but do not trust each other. Hyperledger Fabric adds an abstraction called a “channel” to group organizations in terms of permissions. Organizations in the same channel have access to the same smart contracts and the same channel state (the same “part” of the blockchain and the database representing the business logic state).

Additionally, you can create private data collections that are independent from the channels. Depending on your needs, the collection may be shared:

  • within a single organization's peers only (collections of this kind are created automatically for each organization),
  • with a part of the organizations in the channel,
  • across channels.

In practice, Hyperledger Fabric gives you a database, or a set of databases, to store sensitive information, and those databases are synced among the peers of the organizations you choose, while the blockchain guarantees the integrity of their content. A very useful tool.

What is stored on chain in Hyperledger Fabric

Before I describe what might go wrong with private data, it is important to point out what information is stored on the chain in Hyperledger Fabric.

source: Hyperledger Fabric documentation

There is obviously some metadata about the blocks. The block header contains the block number, the hash of the block data, and the hash of the previous block header; thanks to them, the whole blockchain is interlinked and resistant to manipulation. The block metadata contains the signature of the block creator and some low-level data for ensuring the consistency of the state.
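To illustrate why this interlinking makes manipulation detectable, here is a simplified sketch. It does not reproduce Fabric's actual block format (Fabric uses its own header encoding); `buildChain`, `verifyChain`, and the `|`-joined header string are hypothetical and only show the idea of binding the number, the previous hash, and the data hash together.

```javascript
const { createHash } = require("node:crypto");

const sha256 = (data) => createHash("sha256").update(data).digest("hex");

// Simplified header hash: NOT Fabric's real encoding, just the same idea of
// binding the block number, previous header hash and data hash together.
function headerHash(header) {
  return sha256(`${header.number}|${header.previousHash}|${header.dataHash}`);
}

// Builds a chain of blocks from a list of transaction payloads (strings).
function buildChain(payloads) {
  const chain = [];
  let previousHash = "";
  payloads.forEach((payload, number) => {
    const header = { number, previousHash, dataHash: sha256(payload) };
    chain.push({ header, payload });
    previousHash = headerHash(header);
  });
  return chain;
}

// Checks that every block's data matches its header and links to the previous header.
function verifyChain(chain) {
  return chain.every((block, i) => {
    const dataOk = sha256(block.payload) === block.header.dataHash;
    const linkOk = i === 0
      ? block.header.previousHash === ""
      : block.header.previousHash === headerHash(chain[i - 1].header);
    return dataOk && linkOk;
  });
}
```

Changing a single transaction changes its block's data hash and, through the previous-hash links, would require rewriting every later block on every peer. This is why data written on the chain cannot be quietly removed.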

Finally, and more importantly, there is a list of transactions with client application signatures, chaincode names, input parameters, reads and writes to the world state (read-write sets), and the transaction execution results endorsed by all required organizations.

source: Hyperledger Fabric documentation

What is tricky in terms of private data is that the transactions stored on the chain contain:

  • Input parameters (Proposal)
  • Output of smart contracts and values saved to the world state (Response)

Note that since client identities (X.509 certificates) are stored on the blockchain together with the signatures, and the certificates are issued by CA nodes, you should not put sensitive information into the certificates requested from the CA either.

Be careful with private data

Use transient parameters

All regular input parameters of a smart contract invocation are stored on the chain. If you want to pass sensitive information to be saved in a private data collection, use transient parameters instead of regular ones. Transient data reaches the smart contract but is not saved on the chain.

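async putPrivateMessage(ctx) {
  // Transient fields reach the chaincode but are not recorded in the transaction
  const message = ctx.stub.getTransient().get("message");
  if (!message) throw new Error("Missing transient field: message");
  // The value goes to the collection; only its hash is written on the chain
  await ctx.stub.putPrivateData("my-collection", "message", message);
  return { success: "OK" };
}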

Beware of Hyperledger Fabric 1.4

When you restrict access to private data collections, you provide organization MSP names in the form of a signature policy, along with parameters describing read/write access to the collection. For example:

{
  "name": "my-collection",
  "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
  "memberOnlyRead": true,
  "memberOnlyWrite": true
}

Hyperledger Fabric 1.4.x does not support the memberOnlyWrite parameter. This means that even if not all organizations can read from a private data collection, every organization that can call the relevant smart contracts can also write to it.
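On Fabric 1.4, one possible workaround is to enforce write access in the chaincode itself by checking the MSP ID of the invoking client. A sketch, assuming fabric-contract-api's `ctx.clientIdentity.getMSPID()` and a hypothetical `WRITERS` allow-list that mirrors the collection policy from the config above; it is written as a standalone function rather than a contract method to keep it short.

```javascript
// Hypothetical allow-list mirroring the collection policy (Org1MSP, Org2MSP).
const WRITERS = new Set(["Org1MSP", "Org2MSP"]);

// Throws unless the invoking client belongs to an organization allowed to write.
function assertCanWrite(ctx) {
  const mspId = ctx.clientIdentity.getMSPID();
  if (!WRITERS.has(mspId)) {
    throw new Error(`Organization ${mspId} is not allowed to write to my-collection`);
  }
}

// Same shape as putPrivateMessage, with the write check in front.
async function putPrivateMessage(ctx) {
  assertCanWrite(ctx);
  const message = ctx.stub.getTransient().get("message");
  if (!message) throw new Error("Missing transient field: message");
  await ctx.stub.putPrivateData("my-collection", "message", message);
  return { success: "OK" };
}
```

Keep in mind that such a check runs on the endorsing peers, so it protects the collection only if the endorsement policy requires endorsements from peers that actually run this chaincode.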

Avoid returning private data in contracts

Can you spot what is wrong with the following example?

async getPrivateMessage(ctx) {
  const message = await ctx.stub.getPrivateData("my-collection", "message");
  const messageString = message.toBuffer ? message.toBuffer().toString() : message.toString();
  return { success: messageString };
}

Return values of smart contracts are saved on the chain. So if you submit a transaction that invokes this smart contract, messageString is saved on the chain in plain text, and all channel members have access to it.

Of course, you can avoid invoking (submitting) this smart contract and only query (evaluate) it. But this is fragile: a single simple mistake reveals the value, and then it will not be possible to remove it.

A possible approach to this issue is to require a salt as a transient parameter and encode the returned value with the salt.
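What “encode with the salt” means is not spelled out; one possible reading, sketched below, is to treat the transient value as a per-request encryption key and return only ciphertext. Assumptions: a hypothetical transient field `key` holding 32 random bytes generated by the client, AES-256-GCM, and an IV derived from `ctx.stub.getTxID()`, so that all endorsing peers return identical results (a random IV generated by each peer would make the endorsements differ).

```javascript
const crypto = require("node:crypto");

// Deterministic IV derived from the transaction ID, so that all endorsing
// peers produce the same output for the same transaction.
function ivFromTxId(txId) {
  return crypto.createHash("sha256").update(txId).digest().subarray(0, 12);
}

async function getPrivateMessage(ctx) {
  // Hypothetical transient field "key": 32 random bytes chosen by the client per request.
  const key = ctx.stub.getTransient().get("key");
  if (!key || key.length !== 32) throw new Error("Transient field 'key' must be 32 bytes");
  const message = await ctx.stub.getPrivateData("my-collection", "message");
  const iv = ivFromTxId(ctx.stub.getTxID());
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(message), cipher.final()]);
  // Only this ciphertext can end up on the chain if the call is submitted by mistake.
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Client side: decrypt the response with the same key.
function decryptResponse(key, response) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, Buffer.from(response.iv, "base64"));
  decipher.setAuthTag(Buffer.from(response.tag, "base64"));
  return Buffer.concat([
    decipher.update(Buffer.from(response.data, "base64")),
    decipher.final(),
  ]).toString("utf8");
}
```

Because the key travels as a transient parameter, it never reaches the chain; even if the transaction is submitted by mistake, only the ciphertext is recorded.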

Do not save simple values

A simple value, such as an e-mail address, a phone number, or an ID number, should not be saved in private data as is.

Technically, Hyperledger Fabric stores a plain SHA-256 hash of the data on the chain. If the stored object is a simple value, it can easily be guessed with a brute-force attack (see more in this thread on Stack Overflow).

To handle this issue, you can either store more complex objects in private data collections, or provide some salt as a transient parameter and add it to the value stored in a private data collection.
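The brute-force risk and the salting fix can be sketched as follows, assuming the on-chain hash is a plain SHA-256 of the stored bytes, as described above. `bruteForce` enumerates only a hypothetical 4-digit space to stay fast; phone or ID numbers span larger, but often still enumerable, spaces. The salt is passed in rather than generated inside the chaincode, because random values generated by each endorsing peer would differ and break endorsement; in practice it would arrive as a transient parameter.

```javascript
const crypto = require("node:crypto");

const sha256Hex = (s) => crypto.createHash("sha256").update(s, "utf8").digest("hex");

// Recovers a value from its on-chain hash by trying every candidate
// in a hypothetical 4-digit space ("0000" to "9999").
function bruteForce(onChainHash) {
  for (let i = 0; i < 10000; i++) {
    const candidate = String(i).padStart(4, "0");
    if (sha256Hex(candidate) === onChainHash) return candidate;
  }
  return null;
}

// Wraps a simple value with a client-provided random salt before it is stored
// as private data, so the on-chain hash no longer depends on the value alone.
function toSaltedRecord(value, salt) {
  if (!salt) throw new Error("A salt is required");
  return JSON.stringify({ value, salt });
}
```

Hashing the bare value `"0427"` lets `bruteForce` recover it, while hashing the salted record leaves nothing to match in that space.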

Do not assume private data is GDPR compliant out of the box

Private data aims to be GDPR compliant, but it is not, at least not yet. There are still some issues to be resolved. There is a whole epic dedicated to them in Hyperledger’s Jira (FAB-5097), and the problem is described in one of the design documents.

The issue here is quite complex. As the design doc states, private data is stored in multiple databases:

  1. Private Temp DB: stores transient (uncommitted) private read-write sets for transactions “on the side”, between endorsement time and commit time.
  2. Private write-set log: the primary storage for committed private write sets, i.e. a transaction log of private write sets keyed by blockNum or (blockNum:tranNum) to assist in state transfer alongside blocks (which are the transaction log for public data). This means that the private write-set log contains the data needed to (re-)compute the current state of the private state DB.
  3. Private State DB: similar to the state DB; stores the latest version of committed private keys/values. Used by chaincode APIs. Can be rebuilt from the private write-set log.
  4. Private History DB: stores the history of committed private value updates of a key (pointers to the private write-set log blockNum:tranNum).

The issue is that you cannot fully control the deletion of private data. Deleting the data from the Private State DB (3) is easy.

The deletion of the data from the Private Temp DB (1), the Private write-set log (2), and the Private State DB (3) can be handled by the blockToLive configuration parameter: the data is deleted automatically once the configured number of blocks has been added to the chain. So it becomes a matter of a retention policy. This is inconvenient, but the data will eventually be removed.

But the removal of the data from Private History DB (4) is not supported.

So, if you want to use private data in Hyperledger Fabric for storing personal information and, at the same time, you want to make your solution GDPR compliant, you need to find a way to manually remove the private data upon request.

Final remarks

Don’t get me wrong. Private data is a really useful and powerful tool to enhance privacy in a Hyperledger Fabric network. It may also be used to support GDPR compliance. But it is not a silver bullet, at least for now. Use it with care.

If you want to experiment with private data, feel free to use our open source project, Fablo, now under Hyperledger Labs. With Fablo, you can quickly set up local networks with private data for various Hyperledger Fabric versions, with REST API access to smart contracts.
