Living in a state of accord.

Ethereum State Rent Proof of Concept

I’ve had the opportunity to do some proof of concept development of the Ethereum state-rent proposal that Alexey Akhunov has been leading on the Pantheon code base. The proposal evolved as the work continued so the actual implementation is now a lot simpler than described in that PDF.

Note that the aim is to explore what it takes to implement the proposal, not to create production ready code.  The current work is all available on on my state-rent branch.

The easiest way to begin exploring the changes required to implement state-rent in Pantheon is to start with the hard forks required which translate to new ProtocolSpecs in Pantheon which are defined in MainnetProtocolSpecs. Ultimately there should be four hard forks:

  1. Introduce replay protection for accounts that are evicted
  2. Charge fixed state rent for “owned” accounts
  3. Charge state rent for newly allocated storage
  4. Charge state rent for pre-existing storage

Fork 1: Replay Protection

When accounts are evicted, their nonce would reset to 0, potentially enabled old transactions to be replayed. There are a few ways to prevent this, my preference was to introduce temporal production (variant 2 in the PDF) which is a simple extra validation to perform before processing transactions.  As such, I skipped implementing it in the proof of concept.

Fork 2: State Rent for Owned Accounts

In this milestone “owned” accounts are charged a fixed rent per block. An “owned” account is defined as one without code in the PoC. To avoid having to process every account on every block, the rent is only recalculated when the account is changed at the end of the block (ie a change would already have to be written to disk for it).

TODO: Formalise the definition of owned account more. Particularly, clarify if an account created with empty code is considered owned or not.

The ProtocolSpec for this fork is defined based on Constantinople with three changes:

public static ProtocolSpecBuilder<Void> stateRentOwnedAccountsDefinition(
      final int chainId, final long rentEnabledBlockNumber) {
    return constantinopleDefinition(chainId)
            rentCost -> new OwnedAccountsStateRentProcessor(rentCost, rentEnabledBlockNumber))

Firstly, we set a rent cost of 2 Gwei/block. This is a pretty arbitrary figure and will likely need tweaking. This value is used by the rent processor we also set but has been split out so it’s easy for a hypothetical future fork to change the rent cost without having to implement a completely different rent processor.

Finally, we set a StateRentOwnedAccountsAccountInit which is used to setup defaults for any newly created accounts (post-fork). In this case, it sets the rent balance to zero and rent block to the current block number. These are two new fields added to account state in this fork. This is the first time the new account process has varied in a hard fork, so there was a bunch of plumbing code required to add the concept of a pluggable AccountInit but none of it difficult.

However, it’s also the first hard fork that adds fields to account state, which makes serialising and deserialising them a little more difficult.  That code is in AccountSerializer. Fortunately it’s easy in RLP to check if you are at the end of the current list or not, so when deserialising we can tell if rent balance and rent block have been added to this account state by checking if we’re at the end of the list (line 51).

Calculating the rent owed is handled by OwnedAccountsStateRentProcessor with much of the code able to be reused with later forks via AbstractRentProcessor. The rent balance is one of the very few things in Ethereum that can be negative which makes life a little frustrating as the existing Wei class used for account balance can’t be reused and there’s a few points where things need to be converted between Wei and BigInteger but it’s manageable.

Rent is only recalculated when the account has been changed in the block. In Pantheon the simplest way to achieve this was to piggy-back on the concept of “touched” accounts used in Spurious Dragon.  We simply iterate through the touched accounts and check if the root of their account state trie is dirty (which is also already tracked), if it is we recalculate rent via the RentProcessor.

At this stage, any account which is unable to pay its rent can be deleted in the same way it would have been if it were empty. Again, simple to reuse existing code for this.

For the record, actually calling the rent processor is hooked into MainnetBlockProcessor so that it happens as the last step in processing a block.

TODO: Additional clarification is required about exactly when rent should be charged. In the PoC it’s applied even after paying the miner which seems sensible but there are a number of things happening at the end of a block so it needs to be completely clear.

Note that the priority queue for eviction has been scrapped (I forget why it was originally required…).

Fork 3: State Rent for New Storage

This fork introduces quite a bit of new stuff, but most importantly adds a new storagesize field to account state which is adjusted as part of every SSTORE call. Existing storage is not included in the storagesize field at this point, only a count of the storage size either added or cleared since fork 3 became active. The field is added when new accounts are created, when rent is recalculated for an existing account or when an SSTORE call results in the account storage size changing.

TODO: Need to be clear about exactly when this field is first added (particularly if multiple SSTORE calls are made that leave the storage at the same size or even completely unchanged).

The ProtocolSpec for this fork is based on the one for fork 2 but with a few changes:

  public static ProtocolSpecBuilder<Void> stateRentNewStorageDefinition(
      final int chainId, final long rentEnabledBlockNumber) {
    return stateRentOwnedAccountsDefinition(chainId, rentEnabledBlockNumber)
            rentCost -> new StorageSizeStateRentProcessor(rentCost, rentEnabledBlockNumber))

There’s a new AccountInit to set storagesize to 0 when a new account is created.

Rent cost is reduced to 1 Gwei since it’s now charged per byte per block. This is almost certainly still the wrong value but it definitely should be less than when rent was purely per block.

We also supply a new RentProcessor, StorageSizeStateRentProcessor, which applies to all accounts whether they have code or not. It also charges rent based on the value of the new storagesize field.  If storagesize hasn’t yet been set, it’s set to a fixed value to represent the size of an empty account (73 which is the length of an empty account RLP in bytes).

TODO: The PoC deliberately uses the value of storagesize from the start of the block, ignoring any changes applied in the current block. Otherwise, if an account is untouched for 100 blocks, then increases its storage size, it will be charged rent at the increased storage size for the whole 100 blocks it was untouched.  This rule may need to be tweaked slightly to use the new storagesize value for the current block and the old one for the untouched blocks but it seems reasonable to defer rent changes until the block after they take effect – increased storage would have cost gas in the current block and reduced storage doesn’t really give a benefit until the block is complete and committed anyway.

NOTE: storagesize may be negative during this fork, indicating that the account now uses less storage than it did when this fork first happened.  After fork 5, storagesize should only ever be positive. 

TODO: Rent calculation during this fork should probably use max(storagesize, 0) to avoid the rent due being negative.

Accounts with code are not deleted when they can’t pay their rent, they are instead “evicted” and replaced with a hash stub so they can be restored via the new RESTORETO operation (see below). This is one of the more complicated changes to work through the code – firstly just in terms of tracking evicted accounts through the transaction commit/rollback code but also in terms of detecting them in the account state trie.

Hash Stubs

Challenging bit: Determining if an address in the account state trie is an actual account state or just a hash stub.

Looking again at AccountSerializer in the serializeHashStub method, we can see that a hash stub is an RLP list containing the code hash and storage root hash. RLP doesn’t provide any typing information though so to detect a hash stub, we check if the first item is a nonce or a hash. If it is a hash stub, the size of the RLP item must be 32 bytes. Theoretically a nonce is an arbitrary unsigned scalar number, so it could wind up being 32 bytes (a 256bit number) but at least in Pantheon we already don’t support nonces that big and you’d have to spend a lot of eth on gas to get a nonce to make a nonce that big. So the item length appears to be a suitable way to distinguish between account state and hash stubs.

UPDATE: A smarter way to do this would have been to just count the number of items in the RLP list which is relatively straight-forward. If there are 2 items, it must be a hash stub.

TODO: Clarify interactions for any operation that looks up an account state if the account state has been replaced by a hash stub. Currently Pantheon treats hash stubs as if they don’t exist except in the RESTORETO operation. This is mostly right, except that contract creation should consider it an address conflict if a hash stub is at the target address (not currently implemented but pretty straight forward).

New EVM Operations

There are three EVM operations added in this fork: PAYRENT, RENTBALANCE, SSIZE and RESTORETO. Additionally SSTORE is updated to update the storagesize field. PAYRENT, RENTBALANCE and the SSTORE changes are pretty trivial but RESTORETO has quite a few complexities.

TODO: Decide on the gas cost for these new opcodes.

Tool Support: RENTBALANCE returns an Int256 (signed) which I think is the only time an EVM operation returns a signed number. Need to double check compatibility for this with things like Solidity but should be fine.

Restore To Operation

The RESTORETO operation turns out to be one of the more interesting parts of this spec to implement.  The basic workflow is slightly nuanced and involves three accounts.  Let’s say that Account A is evicted, leaving behind a stub with its code and storage root hashes.  To restore Account A:

  1. Account B is created with the same code as Account A.
  2. Account C is created with code that sets up the storage in Account C to precisely match the storage Account A had before being evicted. It may do this in it’s constructor or expose functions that can be called over time to build up the correct state from multiple sources (potentially sharing the gas cost of rebuilding that state across multiple people).
  3. Account C calls RESTORETO <address of Account A> <address of Account B>.  The EVM checks that account B’s code hash matches what Account A’s stub records, and that Account C’s storage root hash matches what Account A’s stub records.  If so, it creates a new contract at Address A, replacing the hash stub, with the code from Account B and the storage from Account C. It also transfers any balance from Account C to Account A and destroys Account C (just as SELFDESTRUCT would).

Mostly this is fairly straight-forward to implement as seen in The hidden catch is that most Ethereum clients, including Pantheon, don’t update the storage trie until the transaction actually commits – the changes made mid-transaction are stored in a simple HashMap which is much faster. As such, getting the storage root hash for Account C isn’t straight-forward. In fact, in the PoC I haven’t implemented it at all. It’s certainly possible to create a copy of the unmodified Trie, update that with pending changes and calculate the storage root but it’s an expensive operation and that code would only be exercised by RESTORETO, significantly increasing the testing required to fully cover RESTORETO.

TODO: Find a way to change RESTORETO so that the storage root is more easily available when required.

One option here would be to have execution immediately halt when RESTORETO is called, and then to perform the actual restore at the end of the transaction, much like how SELFDESTRUCT only applies at the end of the transaction.  Geth and Pantheon currently update the storage trie after every transaction (they don’t necessarily calculate the root hash but that’s simple once you have the trie), but TurboGeth delays updating the trie until the end of the block, improving performance if the same account is used from multiple transactions in one block.  Implementing that improvement for Pantheon is also planned.  Delaying application of RESTORETO until the end of the block feels a bit too weird though.

More thought is required in this area to find a solution that both makes sense to users and can be implemented without significantly increasing the surface area required for testing.

Fork 4: State Rent for Existing Storage

The PoC doesn’t include any of this fork currently, but it’s fairly straight-forward. When an account’s rentblock field is first updated to a block after fork 4 is in effect, the account’s storagesize field is increased by the size of its storage immediately prior to fork 3 coming into effect. Basically, we add the historical state size in.

This is done as a separate fork because it gives times for client developers to precompute the storage size for every account on the various public networks at the fork 3 block and include that in a client update. This means the cost of calculating that existing storage size is done offline and doesn’t impact on performance of every node.

However, because the storagesize field is only updated when the account is first touched anyway, it is still feasible to calculate the existing storage size on the fly, either because the precomputed values aren’t available for that particular network or in order to verify them.

TODO: It’s not explicitly stated anywhere at the moment, but when calculating the rent due for the first time after fork 4, rent should be calculated up to fork 4 block using the new-storage-only-value, then add in the existing storage size and calculate rent from fork 4 block up to the current block. Otherwise accounts start being charge rent for existing storage at different times which is too hard to understand (and unfair).

Other Work Required

There’s some other stuff that would need to be done to make state-rent production ready:

  • Additional JSON-RPC methods such as getRentBalance and probably something to calculate the rent owing for a given account at a specific block.
  • CALLFEE opcode. The spec refers to this to provide a way for contracts to require a rent contribution when users call into it, but there weren’t enough details to actually implement it. The main question is where the extra fee is taken from (it can’t be an extra gas charge because rent is tracked in ETH not gas).
  • Currently if you deploy a contract with Remix it doesn’t get any eth balance so it is immediately evicted when you interact with it. This is quite confusing and frustrating as a user.  We’ll either need to update tools to allow sending an initial rent balance, provide some free rent on contract creation or allow a grace period after contract creation before rent is charged.

Introducing Pantheon

This week, the work I’ve been doing for the past 6 months, and that PegaSys has been working on for the past 18 months or so was released into the world. Specifically we’ve released Pantheon 0.8.1, our new MainNet compatible, Apache 2 licensed, Java-based Ethereum client. And it’s open source.

I’m pretty excited about it on a few fronts.  Firstly I think it’s a pretty important thing for the Ethereum community. To be a healthy ecosystem, Ethereum needs to have diversity in its clients to avoid a bug in one client taking out or accidentally hard forking the entire network. Currently though, Geth and Parity dominate the Ethereum client landscape.  Pantheon clearly won’t change that in the short term, but it is backed by significant engineering resources to help it keep up with the ever changing Ethereum landscape and be a dependable option.

I’m also really excited that Pantheon is released under the Apache 2.0 license. Both Parity and Geth along with most other clients are licensed under the GPL or LGPL. There are still a large number of enterprises that completely avoid the GPL and LGPL which has closed off Ethereum to them. Having Pantheon available under a permissive license and in a highly familiar language like Java will make it much easier for many enterprises to start using, building on and innovating with Ethereum.

Pantheon will also be building out functionality from the Enterprise Ethereum standard for things like privacy and permissioning to make private chains more powerful and flexible. Meanwhile we have a significant number of researches continuing to work on developing new ways to get the most out of Ethereum.

Finally I’m quite excited to be able to contribute to an open source project as my full time job. I’ve had the opportunity to do some open source for work in the past but only on a fairly small scale. It’s a bit daunting to have everything in the open and on the record, but I’m really looking forward to engaging with the community and being able to show exactly what I’ve been doing.