What is Data Privacy on a private blockchain and why do we need it?

    Vostok builds on and adapts existing solutions for private data storage and transmission on a private blockchain, optimising the benefits for our target market.

    When we talk about blockchain, the first association that comes to mind is Bitcoin. Bitcoin is a public blockchain, which means that anybody can access the full transaction history of the system, and anybody can set up their own full node and mine, which provides greater decentralisation for the system and better security guarantees. However, many companies could benefit from using blockchain in some of their business processes, but need to restrict participation to a limited number of authorised parties. In addition, the participants of a private blockchain usually have different roles in the system: miners, maintainers (or just regular participants), administrators who allocate permissions to the other parties, etc. With this in mind we can better understand what a private blockchain actually is.

    This article is about private data on a private blockchain. Let’s say we have a private blockchain maintained by 10 companies, and three of them want to make a deal in such a way that the blockchain contains certain information about the deal, but this information is either concealed or meaningless to the other parties. Because the information they exchange is so sensitive we cannot allow it even to be read by the other private network participants.

    We can define the problem we seek to solve as follows: business applications on the blockchain need mechanisms for handling confidential data among specified parties. The specified parties are only a subset of all the parties in the private network, which means that we need additional methods to protect sensitive information. The rest of the article is organised as follows:

    1. Classification of existing solutions - we will discuss the major existing solutions for the problem we defined. This will necessarily be brief, since a full overview would require a separate article for each of them. For each type of solution, we will also list the pros and cons.
    2. Vostok solution architecture - we will outline the lessons we learned from other solutions and from deploying enterprise private blockchain solutions, provide details about how Vostok was built, and discuss its pros and cons.
    3. Finally, we will give an overview of Vostok’s future development plans.

    Classification of existing solutions

    In order to build a successful and complete solution, we investigated the previous work in this area and came up with a classification of solutions which provide a way to deal with private data.

    1. Private database outside the node (but somewhat integrated). Data is distributed off the blockchain, while only hashes of the data are stored on-chain. This method is used by Hyperledger Fabric, Quorum, Masterchain, and also by Vostok.
    2. Data stored directly on the state of participants. Data is distributed across transactions in the blockchain, but requires a separate subnetwork (or channel). This method is used by Exonum and also Hyperledger Fabric (why Hyperledger is in both categories will be explained later).
    3. Partial state storage among participants. Some additional information about transactions is transferred off-chain. This approach is implemented by Corda.

    We will now explore different approaches to organising private data flows in a private blockchain in a little more detail. If you are interested in getting to know any of the solutions listed below, feel free to check out the documentation. All of these projects are interesting and have extensive documentation, but we will mention only key aspects related to the Private Data concept.

    Type 1 solutions

    The main idea of the solutions presented in this category is as follows:

    1. To send private data, a node stores it in a private database, which is somewhat integrated with the node.
    2. The node publishes a blockchain transaction containing the hash of this private data.
    3. The recipients are informed that there is some private data to retrieve. The recipient node makes a request to the data owner to obtain the private dataset.
    4. The owner node checks the request (which could be digitally signed) and, if the requester is one of the recipients, creates a connection (TLS or other encryption) and transfers the data.
    5. When the data is received, the recipient checks its hash against the hash on the blockchain.
    6. If a new participant is later authorised for some existing private data, they can make a request in the same way as any other participant.
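
    The steps above can be sketched in a few lines of Python. This is only an illustrative model, not the API of any of the platforms discussed: the `Blockchain`, `OwnerNode` and `receive_private` names are hypothetical, the private database is a plain dictionary, and the encrypted connection from step 4 is reduced to a function call.

```python
import hashlib


class Blockchain:
    """Toy shared ledger: every participant can read every entry."""

    def __init__(self):
        self.transactions = []

    def publish(self, data_hash, recipients):
        self.transactions.append({"hash": data_hash, "recipients": set(recipients)})
        return len(self.transactions) - 1  # position in the list serves as transaction id


class OwnerNode:
    def __init__(self, chain):
        self.chain = chain
        self.private_db = {}  # stand-in for the private database next to the node

    def send_private(self, data, recipients):
        data_hash = hashlib.sha256(data).hexdigest()
        self.private_db[data_hash] = data                 # step 1: store off-chain
        return self.chain.publish(data_hash, recipients)  # step 2: publish the hash

    def handle_request(self, tx_id, requester):
        tx = self.chain.transactions[tx_id]
        if requester not in tx["recipients"]:             # step 4: check the requester
            raise PermissionError(f"{requester} is not a recipient")
        return self.private_db[tx["hash"]]                # a real node would send this over TLS


def receive_private(chain, owner, tx_id, requester):
    data = owner.handle_request(tx_id, requester)         # step 3: request the dataset
    expected = chain.transactions[tx_id]["hash"]
    if hashlib.sha256(data).hexdigest() != expected:      # step 5: compare with the chain
        raise ValueError("private data does not match the on-chain hash")
    return data
```

    Note that the hash check in `receive_private` can only detect a modified dataset; it cannot recover the original, which remains available only from a party that still holds it.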

    The systems listed below implement this general concept with small variations that stem from their architectural specifics.

    Hyperledger Fabric

    When we talk about private blockchain solutions, Hyperledger Fabric is the first one to come to mind. The main feature of Fabric is flexibility. You can build almost any private blockchain solution on Hyperledger Fabric, but this also leads to its main disadvantage: it can be too complex even for experienced developers who are trying to build a sophisticated solution. When it comes to private data, Hyperledger Fabric allows you to choose one of two possibilities. In this section, we will look at the first one, which is to create a Private Data Collection. This involves exchanging data among participants (a subset of all channel participants) and recording the data hashes on the blockchain as proof. This is a reasonable approach when you know there will be many different groups to communicate with confidentially. The main privacy components of Fabric are:

    1. Each participant has an X.509 certificate for identity confirmation (PKI). The whole system has CAs, and MSPs are used to check the permissions of the certificates in the working domain.
    2. Consensus could be optimised in the channel configuration, which is good for overall performance.
    3. Private Data transfer is implemented using Gossip Protocol.
    4. Private Data transfer is done using TLS with point-to-point transmission.
    5. Anchor peers are responsible for maintenance of the Map of Internet Addresses of peers in the channel.

    Hyperledger Fabric private data flow high level overview

    Quorum

    Quorum is a modification of the Ethereum protocol produced by JP Morgan as their private blockchain solution. There are several advantages to using the Ethereum code base, such as a large community that helps detect bugs and improve the product in general, and already implemented smart-contract technologies, which lower the cost of development for Quorum. Since PoW consensus is redundant for the needs of a private blockchain, the Quorum team proposed three types of consensus: Raft and Clique PoA, which are fast, and Istanbul BFT, which provides higher security guarantees.

    Quorum private data flow low level overview

    In terms of private data, Quorum has the following features:

    1. Blockchain state is partitioned between the private and public. The public state is consistent among all nodes in the network.
    2. Storage and transmission of private data are organised by the Transaction Manager, which is an entity in the Quorum Node.
    3. Enclave is another entity, which is responsible for encryption and decryption of the private data.
    4. Data is transmitted point-to-point only in encrypted form (the Quorum documentation compares this to exchanging PGP-encrypted messages).
    5. The private transaction layer has two implementations, Constellation and Tessera, for greater flexibility.

    There are other modifications of the Ethereum protocol for private blockchains, for example Masterchain, which is a Russian project with a cryptography solution certified under Russian government standards. Other than having its own cryptography, it does not have major technical differences.

    Type 1 solutions pros and cons

    The relative strengths and weaknesses of the solution will determine whether the solution is suitable for your needs. As for the first type, the strong points are:

    1. The solution completely covers the problem domain.
    2. You can transfer private data of arbitrary size and format, because the blockchain stores only data hashes.
    3. The solution is clean and simple, which is good for development.

    However, we live in the real world and there are also some disadvantages:

    1. If an attacker is able to deploy a node to the network, they will be able to read all the data except the private datasets. This is not really suitable for a private blockchain, where security is one of the main concerns.
    2. In order to exchange private data, one must have a full node, which is a serious limitation for B2C cases.
    3. The solution is based on off-chain technology, which means that it provides weaker immutability guarantees for private data than for ordinary transactions, because a conflict cannot be resolved without a third party. For example, if one party modifies the document in private storage, we can use the hash stored on the blockchain to see that the document was modified, but we cannot force the dishonest party to restore the right version. The document itself is not stored on the blockchain directly, and can only be obtained by requesting it from an honest party.

    Type 2 Solutions

    This group of solutions aims to address the third downside of the previous group, which is off-chain private data storage and transfer. As you might guess, in these solutions data is transferred on-chain using data transactions. This fits the blockchain philosophy more naturally, but comes with certain limitations:

    • To exchange data among participants, you need to create separate channels or sub-chains. The more communication with different groups you have, the more channels you need to support.
    • As a result, you have higher infrastructure demands, because there is obviously more information stored directly on the blockchain.

    Hyperledger Fabric

    You can of course create a separate channel in Hyperledger Fabric, and this is reasonable when you have a consistently high volume of private data transfer among companies (for example, on a daily basis). However, it is an expensive operation in terms of the number of VMs you use, because you need additional nodes, an ordering service, MSP, etc. With a dedicated channel, you can simply write your data to the channel state, organised in key-value pairs.

    Exonum

    Exonum does not operate with concepts like channels, but you can achieve much the same by deploying another Exonum sub-chain. The essential features to support data transfer in these sub-chains are:

    1. Variable block and transaction size, which can be defined in a sub-chain config. The upper bound according to the Exonum documentation is 2^32 bytes (approximately 4.3 GB).
    2. pBFT consensus, which is secure and also fast when the number of participants is small.
    3. Anchoring to Bitcoin to provide more security guarantees.
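
    As a quick check of the size limit quoted above, the conversion from 2^32 bytes can be reproduced directly. The variable names are only illustrative.

```python
MAX_BYTES = 2 ** 32                 # upper bound quoted from the Exonum documentation

gigabytes = MAX_BYTES / 10 ** 9     # decimal gigabytes (GB)
gibibytes = MAX_BYTES / 2 ** 30     # binary gibibytes (GiB)

print(MAX_BYTES)   # 4294967296
print(gigabytes)   # 4.294967296
print(gibibytes)   # 4.0
```

    In other words, the limit is roughly 4.3 GB in decimal units, or exactly 4 GiB in binary units.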

    Type 2 solutions pros and cons

    In conclusion for this type of solution, the main benefits are:

    1. It can be very fast, because there is almost no overhead from off-chain systems (for example, a database for storing private data). There is no need to check data after transmission.
    2. The solution is fully based on the blockchain without the help of any off-chain components.

    The drawbacks include:

    1. For each instance of a business process, you have to deploy a separate channel, without the possibility of tying the channels together in one system.
    2. If an attacker can set up a node and connect to the network, they can obtain all of the private data during synchronisation.
    3. In order to exchange private data, one must have a full node, which is a serious limitation for B2C cases.

    Type 3 Solutions

    Corda

    Corda is a private blockchain solution developed by R3, which has gained popularity among financial institutions. Corda’s approach is reminiscent of the first category of solutions, but does not really fit it because of its unique features:

    1. There is no Transaction Broadcast. All transactions are transmitted point-to-point.
    2. There is no full copy of the blockchain on each node, which means that no node in the network knows the full current state. Each node stores only that part of the state where transactions are explicitly addressing it.
    3. The state is partitioned between the private and public.
    4. Private data is stored not in a specific database outside of the node, but in the special area of Corda Vault, so the Vault accumulates both public and private information.
    5. The private data itself, in the private area of the vault, is marked as ‘Notes’ to the transactions. Transactions contain the ids of the private data if it has to be transferred to other peers.

    Corda private architecture and private data store overview
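
    The idea of partial state storage can be illustrated with a deliberately simplified model. This is not Corda's API: the `Node` class and `send_transaction` function are hypothetical, and the vault is reduced to a dictionary that holds only the transactions a node took part in.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.vault = {}  # only the transactions explicitly addressed to this node


def send_transaction(tx_id, payload, participants):
    """Deliver a transaction point-to-point; nothing is broadcast to the network."""
    for node in participants:
        node.vault[tx_id] = payload


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
send_transaction("tx1", {"deal": "alice-bob"}, [alice, bob])
send_transaction("tx2", {"deal": "bob-carol"}, [bob, carol])
```

    After these two transfers, Bob holds both transactions, while Alice and Carol each hold only one, so no single vault is guaranteed to contain the whole state.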

    Corda: pros and cons

    The main advantages of Corda’s solution are:

    1. Corda has fast transaction finalisation, since consensus doesn't have to be reached among a relatively large number of participants and each node keeps only the ‘relevant’ part of the state.
    2. Following from this point, it is very efficient in terms of memory consumption.

    On the other hand, there are weaknesses to the concept:

    1. Unlike other blockchain solutions, Corda has a low level of replication, which is risky – especially when public state is partitioned among parties. In the case of an emergency, the probability of losing some parts of the state is higher than in more classic approaches.
    2. In order to exchange private data, one must have a full node, which is a serious limitation for B2C cases (as for all previous solutions).
    3. There is no data encryption for private storage.

    Vostok solution architecture

    So far we have provided a brief analysis of the strengths and weaknesses of the existing solutions, which can help us to formulate the main features we targeted while developing a similar concept in Vostok:

    1. Users should be able to exchange private data safely.
    2. The system should be reliable with standard public state replication.
    3. Private data exchanges should be regulated by the blockchain, at least in the form of data hashes.
    4. The new concept should not entail a significant performance downgrade for the whole system.

    To address the requirements listed above we divided our privacy concept into three main parts:

    • Node Registration - helps us to ensure that only known parties (both permissioned and registered) can gain access to the network. All unauthorised parties should be ignored in order to prevent leaking of the public state.
    • Node handshakes are cryptographically signed to ensure communication only among known parties.
    • Private data is stored and transmitted only in encrypted form.

    Now we will briefly address all of Vostok’s privacy components. If you are interested in learning more, feel free to check out our documentation.

    Node Registration

    If you are interested in running your own node on the Vostok mainnet or want to add nodes for your private blockchain, you have to register it on the network, so other participants will know that you are also a trusted party. The registration process is as follows:

    1. The new party generates a node owner key pair and sends the public key to the network participant who has a Connection Manager role, providing additional information about their organisation.
    2. The Connection Manager broadcasts a transaction to the network which contains the public key of the new node and its set of permissions in the network, so other parties know that they are allowed to process requests from a new node after the transaction is applied to the blockchain.
    3. As a result, at any given moment the blockchain state contains information about all network participants. Removal is also implemented as a transaction.

    Vostok node registration process overview
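
    The registration steps can be modelled as transactions applied to a shared state. The sketch below is a simplified illustration under our own assumptions, not Vostok's transaction format: `RegistrationTx` and `NetworkState` are hypothetical names, keys are plain strings, and signature verification of the sender is omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RegistrationTx:
    sender: str             # public key of the party that issued the transaction
    node_public_key: str    # public key of the node being registered or removed
    permissions: frozenset  # role names are illustrative, e.g. {"miner"}
    remove: bool = False


@dataclass
class NetworkState:
    connection_managers: set
    nodes: dict = field(default_factory=dict)  # node public key -> permissions

    def apply(self, tx):
        # Only a participant with the Connection Manager role may change membership.
        if tx.sender not in self.connection_managers:
            raise PermissionError("only a Connection Manager may register nodes")
        if tx.remove:
            self.nodes.pop(tx.node_public_key, None)
        else:
            self.nodes[tx.node_public_key] = tx.permissions

    def is_known(self, node_public_key):
        # Other nodes consult the state before processing a request.
        return node_public_key in self.nodes
```

    Because every node applies the same transactions, every node ends up with the same list of known participants and can ignore requests from anyone else.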

    Signed handshakes

    To ensure that during the network lifecycle we stay in touch only with authorised participants, we proposed the following signed handshake mechanics:

    1. Alice and Bob each generate a temporary key pair.
    2. Alice signs her temporary public key with her regular private key. Bob does the same with his key pairs.
    3. Alice and Bob exchange their temporary public keys together with the produced signatures. Because their regular public keys are stored on the blockchain, each can verify that they are really talking to the other, which makes man-in-the-middle attacks much harder.

    Temporary keys are also used in private data encryption, before transfer to another party.
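
    To make the handshake concrete, here is a minimal sketch that uses only the Python standard library. It is not Vostok's implementation and it is not secure: the signature scheme is textbook RSA with toy keys built from well-known Mersenne primes (trivially factorable), chosen only so that the example runs without third-party packages. The temporary public key is represented by random bytes, and all function names are hypothetical.

```python
import hashlib
import secrets

E = 65537  # commonly used RSA public exponent


def make_toy_keypair(p, q):
    """Textbook RSA from two primes; returns (public modulus, private exponent)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return n, d


def _digest(message, n):
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message, n, d):
    return pow(_digest(message, n), d, n)


def verify(message, signature, n):
    return pow(signature, E, n) == _digest(message, n)


def handshake_message(name, n, d):
    temp_public = secrets.token_bytes(32)  # stand-in for a temporary public key
    return {"from": name, "temp_public": temp_public,
            "signature": sign(temp_public, n, d)}


def accept(message, registry):
    """Accept a handshake only if it is signed by a key registered on the chain."""
    regular_public = registry.get(message["from"])
    if regular_public is None:
        return False
    return verify(message["temp_public"], message["signature"], regular_public)


# Toy long-term keys: INSECURE, for illustration only.
alice_n, alice_d = make_toy_keypair(2 ** 89 - 1, 2 ** 127 - 1)
bob_n, bob_d = make_toy_keypair(2 ** 107 - 1, 2 ** 61 - 1)
registry = {"alice": alice_n, "bob": bob_n}  # regular public keys stored on the blockchain
```

    Bob would call `accept(handshake_message("alice", alice_n, alice_d), registry)` on the message he receives from Alice, and Alice would do the same for Bob's message. A message with a replaced temporary key, or one signed by a key that is not registered for the claimed sender, is rejected.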

    Policy

    Vostok adopts Policy mechanics similar to those of Hyperledger Fabric in order to give users a more convenient way to share data with a group of participants. Strictly speaking, a Policy in Vostok is an entity that regulates access to a particular set of private data. In order to create a policy, you should submit a transaction to the blockchain containing:

    1. A list of participants and their roles - Owner or Recipient. The Recipients can only read and send data corresponding to this policy, while the Owner can add and remove parties from it.
    2. Policy name, with optional description.
    3. Expiration Date.

    Owners can add and remove policy participants using special transactions, to keep everything better synchronised among parties.
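
    A policy of this kind can be pictured as a small data structure. The sketch below is an assumption-laden illustration rather than Vostok's actual transaction schema: the `Policy` class and its method names are hypothetical, we assume that Owners can also read and send data, and we assume that access simply stops once the expiration date has passed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Policy:
    name: str
    owners: set
    recipients: set
    expires_at: datetime
    description: str = ""

    def is_active(self, now):
        return now < self.expires_at

    def can_access(self, participant, now):
        # Assumption: Owners may read and send data just like Recipients.
        return self.is_active(now) and participant in (self.owners | self.recipients)

    def add_recipient(self, sender, participant):
        self._require_owner(sender)
        self.recipients.add(participant)

    def remove_recipient(self, sender, participant):
        self._require_owner(sender)
        self.recipients.discard(participant)

    def _require_owner(self, sender):
        if sender not in self.owners:
            raise PermissionError("only an Owner may change policy membership")
```

    In a real network, each call to `add_recipient` or `remove_recipient` would correspond to a transaction on the blockchain, so that all parties see the same membership list.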

    Vostok private data flow high level overview

    Vostok’s private data transfer flow is organised much like in other type 1 solutions, but with its own features, such as:

    1. OAuth service out of the box, which is used to check the access level of users inside the company and to process only authorised requests. Organisations can control whether a particular employee has access to private data, to smart contracts, or only to reading public transactions from the network.
    2. For private storage (which we recommend deploying inside a secured perimeter), participants can use any SQL database with a JDBC driver, to allow better integration with companies’ existing architecture.
    3. Private data is stored and transferred only in encrypted form, as are the encryption keys for this data. If the data is larger than 20 MB, it is transferred in packages, again only in encrypted form. The keys are unique for each encryption session, which limits the damage if a key is compromised.
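
    The last point, splitting large data into encrypted packages with a fresh key per session, can be sketched as follows. This is an illustration under explicit assumptions, not Vostok's wire format: the function names are hypothetical, the 20 MB threshold is read as 20 MiB, and the cipher is a toy SHA-256 keystream used only as a placeholder so that the example runs with the standard library. A real system would use a vetted authenticated cipher and would also encrypt the session key before sending it.

```python
import hashlib
import secrets

CHUNK_SIZE = 20 * 1024 * 1024  # the 20 MB threshold, interpreted here as 20 MiB


def _keystream(key, nonce, length):
    """Toy SHA-256 counter-mode keystream; a placeholder for a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data, key, nonce):
    stream = _keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_for_transfer(data, chunk_size=CHUNK_SIZE):
    session_key = secrets.token_bytes(32)  # fresh key for every session
    packages = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        nonce = index.to_bytes(8, "big")   # distinct keystream for each package
        packages.append(_xor(data[start:start + chunk_size], session_key, nonce))
    return session_key, packages


def decrypt_transfer(session_key, packages):
    return b"".join(_xor(package, session_key, index.to_bytes(8, "big"))
                    for index, package in enumerate(packages))
```

    Data smaller than the chunk size yields a single package, and two transfers of the same data use different session keys, so their ciphertexts differ.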

    Vostok: pros and cons

    Ultimately, Vostok’s solution has the following key benefits:

    1. Security - minimised risk of an adversary gaining access even to public blockchain state. Private data is encrypted.
    2. As with any type 1 solution, one can transfer private data of arbitrary size and format because the blockchain stores only data hashes.
    3. Satisfies all criteria we defined for our solution.

    Naturally, there are some points to consider in order to use the solution properly:

    1. In order to exchange private data, a participant must have a full node, which means that to implement a B2C case with data privacy you need some kind of integrated business application with end-user data access control.
    2. There is a centralised step of adding participants to the network, which is necessary for the private network to control that only authorised parties are involved in the protocol. However, this means it is not well-suited for a public network.

    In conclusion, we would like to say that we are designing our platform to fit actual business needs. Our customers and partners are our best development drivers. We can already see how our customers' needs help us to expand the horizons and philosophy of blockchain technology: anonymity and transparency combined with data control and privacy in a trustless environment.

    As you read this article, we are continuing to implement in the Vostok Platform a synergy of technologies from two worlds: the decentralised and the centralised enterprise. Stay tuned.

    References

    1. Vostok WhitePaper.
    2. Vostok Documentation.
    3. Corda official documentation.
    4. Hyperledger Official Documentation.
    5. Exonum Official Documentation.
    6. Quorum Documentation.

    Written by George Ugulava