Overview Design of QuarkChain Sharding
Given the existing blockchain model, QuarkChain enhances the existing models with the following designs:
- (State to partition): QuarkChain partitions smart contracts, i.e., putting smart contracts in different shards if a reshard event happens. A smart contract contains code and storage, and the data size could be much larger than a user account (consider an ERC20 with a large number of address-to-balance mapping).
- (State to partition): We do not partition user accounts, and a user can easily move its account state (mostly balance) to another shard via cross-shard transactions. This allows a user with a private key to access all resources (smart contracts) in all shards.
- (Atomicity): Each smart contract has a shard key (fullShardId in our codebase), and any batch operations on all smart contracts with the same shard key are atomic. In addition, a smart contract tries to access another smart contract with different shard key is prohibited.
- (Balanced load/size): By assuming that the shard keys of smart contracts are evenly distributed, the smart contracts are expected to be evenly partitioned in different shards.
- (Reshard): To add more shards with low migration cost, we split a shard to two shards according to the shard key in smart contracts. This eliminates data migrations from other pre-reshard shards and thus simplifies reshard.
- (Reshard): The behaviors of the smart contracts will be the same before- and after- reshard, i.e., reading a smart contract result in the same value, and writing a smart contract result in the same system state changes before- and after- reshard.
System State and State Partition of QuarkChain
To support atomicity of a batch of CRUD operations in QuarkChain, QuarkChain redefines the address of a user account and smart contract by adding 32 bits shard key to each address:
Address := RIPEMD160(Public key) + Shard key,
where + is Cartesian product operator, an address of QuarkChain is a 192-bit data, and we refer to the first 160-bit of the address as the recipient of the address in QuarkChain.
The number of shards (shard size) in a QuarkChain network is a power of 2, and a reshard operation will double the number of shards in the network. Given a shard size, the shards are indexed by:
Shard Id := Shard key % Shard size.
The state in a shard is a key-value mapping, where the key is the recipient of addresses and value contains
- Balance;
- Nonce;
- Code;
- Storage;
- ShardKey.
where ShardKey is set when then key-value pair is created and is immutable thereafter.
With an additional shard key in address, a user (or a recipient) is able to manage all addresses in all shards with single private key.
Transactions in QuarkChain
Balance Transfer Transactions
A balance transfer transaction depends on source and destination addresses:
- (In-shard transaction): If both addresses have the same shard id (even the shard keys are different), the transfer is an in-shard transaction, and such transaction will only update the balance of the recipients in the same shard.
- (Cross-shard transaction): If both addresses have the different shard id’s, the transfer is a cross-shard transaction, and the atomicity of the transaction needs further coordination. Fortunately, such transaction is much simpler than traditional cross-chain transaction as both shards run the same cryptocurrency (QKC). The details of the cross-shard transaction will be discussed in another article.
Smart Contract Transactions
A smart contract transaction must be issued in the same shard, i.e., the shard id of the source user account and the destination smart contract must be the same. A smart contract can call another smart contract (by supplying another smart contract’s recipient, i.e., address in EVM, in the code to maintain EVM backward compatibility) with the same shard key. However, such contract call will fail if the smart contracts are with different shard key, where such call is equivalent of calling a smart contract with the following assembly code:
PUSH 0x0
DUP1
REVERT
Since the smart contracts with the same shard key will always be partitioned to the same shard, the behaviors of the smart contract (read/write) are guaranteed to be the same no matter how the system state is partitioned (See Reshard section in QuarkChain for more details).
Reshard in QuarkChain
A reshard operation will split each shard into two seperate shards, and the shard size will be doubled as a result. After doubling shard size, the destination of the user account/smart contracts is identified by the extra significant bit in new shard id. By assuming the shard keys of user accounts/smart contracts are uniformly distributed, half of user accounts/smart contracts should be partitioned to a new shard, and the rest to another shard. In addition, a splitted shard may still contain the smart contracts of another splitted shard by replacing the code with REVERT code and nulling its storage. These dummy smart contracts ensure that the smart contracts in new shard will still fail to call the smart contracts that were in the same shard but with different shard keys.
If the current node that processes both splitted shards is out of capacity, a new node can be added in the network to process a new shard by migrating its state, and thus the system capacity could be increased as the number of shards/nodes increases. This is achieved by a cluster, which will be discussed in future articles.
Shard Key Selection in QuarkChain
A key to load balancing in QuarkChain is to ensure all smart contracts spread over all shards. Since the shard key is immutable after the creation of a smart contract, the key selection during creation is important.
First, if a smart contract to be created depends on other smart contracts, the shard key has to be the same as the depending smart contracts.
Second, if a smart contract does not depend on any other smart contracts, a user (or a wallet) is free to choose any shard key or the system could impose some rules:
- Randomly-generated shard key; or
- Any 32-bit in source address’ recipient; or
- IP address of the wallet.
The first two selections could result in uniformly distributed shard keys. However, suppose that there are hundreds or thousands of shards. A user who wants to access any smart contract in the network may need frequent cross-shard transactions or maintain multiple balances, which reduces the efficiency of sharding or worsens user experience.
Using IP address may alleviate the issue by grouping the smart contracts with geographical information. If the user’s interaction with smart contract is geographically related (for example, interact with local merchandises/services), a user could just maintain its balances in the shards that are related to the locations that the user resides in or travels to, potentially saving a lot of unnecessary cross-shard transactions and simplifying user account management.
Comparison of QuarkChain and Google’s BigTable
A lot of QuarkChain’s sharding designs are inspired by Google’s BigTable, and they share many similarities as both of them are essentially key-value store. The following table compares them side by side.
Besides similarities, QuarkChain also has several major differences with BigTable:
- QuarkChain natively supports cross-shard transactions, which transfer the balance from one account in one shard to another account in another shard, while BigTable does not support any transactions across multiple row keys.
- BigTable allows merging two adjacent tablets (shards) into one tablet if the two tables are small because of deletions. In contrast, the deletion operations (e.g., selfdestruct) are rarely used in blockchain, and thus merging operation is not required. This greatly simplifies the system model and threat model of QuarkChain (such as replay attack).
Summary
Inspired by existing scalable systems, we proposed a new blockchain system model and described our sharding scheme. In addition, we presented a comparison of our proposed model with Google’s BigTable. In the next article Part 4: Sharding in QuarkChain: Consensus, we will discuss how to build consensus to support sharding and the corresponding threat model of QuarkChain.