Symphonious

Living in a state of accord.

Exploring Eth2: Attestation Inclusion Rates with chaind

For the beacon chain MainNet, I’ve set up an instance of chaind – a handy little utility that connects to a beacon node (like Teku) and dumps out the chain data to a database so it’s easy to run various queries against. While you can do this with various REST API queries and custom code, having it all accessible by SQL makes ad-hoc queries a lot easier and lets you add it as a datasource for Grafana to display lots of powerful metrics.

Herewith, a few useful queries and snippets that make life easier. I’ve set up my dashboard with two variables – genesisTime (1606824023 for MainNet) and perfValidators – a list of validator indices that I often want to monitor (e.g. 1,1234,5533,2233).

Filter an epoch based table to the current Grafana time range:

WHERE f_epoch >= ($__unixEpochFrom() - $genesisTime) / 12 / 32 AND f_epoch <= ($__unixEpochTo() - $genesisTime) / 12 / 32

Filter a slot based table to the current Grafana time range:

WHERE f_slot >= ($__unixEpochFrom() - $genesisTime) / 12 AND f_slot <= ($__unixEpochTo() - $genesisTime) / 12
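The arithmetic behind both filters can be sketched in Python. This is only an illustration: the function names are made up for this example, and the constants are the 12 second slots and 32 slot epochs that the queries above already assume. It matches the SQL integer division only for timestamps at or after genesis.

```python
GENESIS_TIME = 1606824023  # MainNet genesis, the value used for $genesisTime
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32


def slot_at(unix_time: int) -> int:
    """Slot in progress at a Unix timestamp (integer division, like the SQL)."""
    return (unix_time - GENESIS_TIME) // SECONDS_PER_SLOT


def epoch_at(unix_time: int) -> int:
    """Epoch in progress at a Unix timestamp."""
    return slot_at(unix_time) // SLOTS_PER_EPOCH


def epoch_start_time(epoch: int) -> int:
    """Unix timestamp of the first slot of an epoch (the "time" column mapping)."""
    return epoch * SLOTS_PER_EPOCH * SECONDS_PER_SLOT + GENESIS_TIME
```

The last function is the inverse mapping, which is what lets Grafana plot an epoch based row on a time axis.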

Average balance, in ETH, of a set of validators for the latest epoch:

SELECT
f_epoch * 32 * 12 + $genesisTime AS "time",
AVG(f_balance) / 1000000000.0 as "balance"
FROM t_validator_balances
WHERE
f_epoch >= ($__unixEpochFrom() - $genesisTime) / 12 / 32 AND f_epoch <= ($__unixEpochTo() - $genesisTime) / 12 / 32 AND
f_validator_index IN ($perfValidators)
GROUP BY f_epoch
ORDER BY f_epoch DESC
LIMIT 1

Balances by validator and epoch, suitable for graphing:

SELECT
CONCAT(f_validator_index, ' ') AS metric,
f_epoch * 32 * 12 + $genesisTime AS "time",
MAX(f_balance) / 1000000000.0 as "balance"
FROM t_validator_balances
WHERE
f_epoch >= ($__unixEpochFrom() - $genesisTime) / 12 / 32 AND f_epoch <= ($__unixEpochTo() - $genesisTime) / 12 / 32 AND
f_validator_index IN ($perfValidators)
GROUP BY f_epoch, f_validator_index
ORDER BY f_epoch

Calculate the percentage of blocks successfully produced and included on the canonical chain:

SELECT
(SUM(1) FILTER (WHERE b.f_root IS NOT NULL))::double precision / COUNT(*) as "value"
FROM t_proposer_duties AS d
LEFT JOIN t_blocks AS b ON b.f_slot = d.f_slot
WHERE
d.f_slot >= ($__unixEpochFrom() - $genesisTime) / 12 AND d.f_slot <= ($__unixEpochTo() - $genesisTime) / 12 AND
d.f_validator_index IN ($perfValidators)
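To make the left join logic concrete, here is a rough Python equivalent. The function name and the plain lists of slots are assumptions for illustration, not anything chaind provides: every duty is kept, and a duty counts as successful only if a block exists for its slot.

```python
def proposal_success_rate(duty_slots, block_slots):
    """Fraction of assigned proposal slots that ended up with a block."""
    duties = list(duty_slots)
    if not duties:
        return None  # no duties in range, so the rate is undefined
    blocks = set(block_slots)
    produced = sum(1 for slot in duties if slot in blocks)
    return produced / len(duties)  # true division, so the result is a fraction
```

For example, four duties at slots 10, 20, 30 and 40 with a block missing at slot 30 give a rate of 0.75.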

And then we come to attestations… They’re a special kind of fun because they can be aggregated and the validator is identified by its position in a committee rather than by its validator index directly. We’re going to build up to having a query that calculates the inclusion rate for attestations but we’ll take it a step at a time with a bunch of useful queries and snippets along the way.

Firstly, we need to deal with the fact that attestations for the committee may have been spread across multiple aggregates when included in blocks and recombine them into one set of aggregation bits that represents every validator from that slot and committee which was included in any attestation:

SELECT x.f_slot, x.f_committee_index, bit_or(RIGHT(x.f_aggregation_bits::text, -1)::varbit) AS f_aggregation_bits 
FROM t_attestations x
GROUP BY x.f_slot, x.f_committee_index

Note: you may want to add a WHERE clause to constrain this to the relevant time range to improve performance.

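The aggregation bits are stored as a bytea, so we first have to convert them to a varbit with a weird RIGHT(x.f_aggregation_bits::text, -1)::varbit because the internet said to. It works because the text form of a bytea is \x followed by hex digits (with the default bytea_output setting), and dropping the leading backslash leaves x followed by hex digits, which Postgres accepts as a hexadecimal bit string. Then we can do a bitwise or to combine them all.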
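Both steps can be mimicked in a few lines of Python. This is a sketch under assumptions: the helper names are invented here, the text form is assumed to be the hex style `\x…` output, and all aggregates for one slot and committee are assumed to have the same length.

```python
from functools import reduce


def hex_text_to_bits(bytea_text: str) -> str:
    """Mimic RIGHT(text, -1)::varbit: drop the backslash, read the hex digits as bits."""
    trimmed = bytea_text[1:]  # RIGHT(..., -1) drops the first character
    if not trimmed.startswith("x"):
        raise ValueError("expected hex-format bytea text such as \\x35")
    return "".join(f"{int(digit, 16):04b}" for digit in trimmed[1:])


def combine_aggregation_bits(aggregates):
    """Bitwise-OR equal-length byte strings, like bit_or over the GROUP BY."""
    def or_pair(a: bytes, b: bytes) -> bytes:
        if len(a) != len(b):
            raise ValueError("aggregates for one committee must have the same length")
        return bytes(x | y for x, y in zip(a, b))
    return reduce(or_pair, aggregates)
```

Because every aggregate for a given committee has its marker bit in the same place, the OR keeps the marker intact while merging the attester bits.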

Now we have slot, committee index and a long binary string indicating which validators in the committee attested. To identify the validators we’re interested in, we need to convert that binary string to an array of validator indices. I don’t know of a function that can do that in Postgres, but we can write one:

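create or replace function get_indices(varbit, bigint[]) returns bigint[] as $$
declare
    i int;      -- bit position within the committee
    l int;      -- zero bits in the last byte ahead of the marker bit
    s int;      -- number of validators the bitlist covers
    c bigint[]; -- validator indices found so far
begin
    c := '{}';
    l := 0;
    -- Scan the last byte from its most significant bit to find the SSZ marker bit.
    loop
        exit when get_bit($1, bit_length($1) - 8 + l) = 1;
        l := l + 1;
    end loop;
    -- The marker bit's position gives the exact list size.
    s := (7 - l) + (octet_length($1) - 1) * 8;
    -- List bit i lives in byte i / 8, counted from that byte's least significant bit.
    for i in reverse s-1..0 loop
        if get_bit($1, (i / 8) * 8 + 7 - (i % 8)) = 1 then
            c := c || $2[i + 1];
        end if;
    end loop;
    return c;
end;
$$ language plpgsql;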

UPDATE: The initial version of this function was incorrect. Endianness strikes again… The updated version fully parses the SSZ bitlist format, using the marker bit to calculate the exact list size.

This takes the binary string we had as the first argument (varbit) and the f_committee field from the t_beacon_committees table which is an array of validator indices. It iterates through each bit in the binary string and if it’s set adds the validator index from the same position in the beacon committee to the result.
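The same decoding is easier to follow in Python. This is a sketch of the technique rather than a port of the function: the Python name simply mirrors the SQL one, the input is the raw bytes of the SSZ bitlist, and the result comes back in ascending committee order, which makes no difference once we intersect it with our validators.

```python
def get_indices(bitlist: bytes, committee):
    """Return the committee members whose bit is set in an SSZ bitlist."""
    last = bitlist[-1]
    if last == 0:
        raise ValueError("invalid bitlist: the last byte must contain the marker bit")
    # The highest set bit of the last byte is the marker; everything below it is data.
    size = (len(bitlist) - 1) * 8 + last.bit_length() - 1
    return [
        committee[i]
        for i in range(size)
        if (bitlist[i // 8] >> (i % 8)) & 1  # bit i counts from the LSB of byte i // 8
    ]
```

With a five member committee [10, 20, 30, 40, 50] and the single byte 0b00110101, the marker sits at bit 5, so the list has 5 entries and bits 0, 2 and 4 select validators 10, 30 and 50.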

Combining the two, we can get the list of validator indices that were included by slot and committee index:

SELECT c.f_slot, c.f_index, get_indices(a.f_aggregation_bits, c.f_committee)
FROM t_beacon_committees c
JOIN (SELECT x.f_slot, x.f_committee_index, bit_or(RIGHT(x.f_aggregation_bits::text, -1)::varbit) AS f_aggregation_bits
FROM t_attestations x
GROUP BY x.f_slot, x.f_committee_index) a ON a.f_slot = c.f_slot AND a.f_committee_index = c.f_index

If we just want to know how many validators attested from that committee and slot, we’d change the get_indices call to:

cardinality(get_indices(a.f_aggregation_bits, c.f_committee))

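But we really only want to know how many of our validators attested, so we need to find the intersection of our validators and the validators that attested. Using the & operator and icount function from the intarray extension (run CREATE EXTENSION intarray if it isn’t already installed), that part of the select becomes: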

icount(get_indices(a.f_aggregation_bits, c.f_committee)::int[] & '{$perfValidators}'::int[])

Since we ultimately just want the total number of attestations included from our validators, we can aggregate all those rows with the SUM function and our query becomes:

SELECT SUM(icount(get_indices(a.f_aggregation_bits, c.f_committee)::int[] & '{$perfValidators}'::int[]))
FROM t_beacon_committees c
JOIN (SELECT x.f_slot, x.f_committee_index, bit_or(RIGHT(x.f_aggregation_bits::text, -1)::varbit) AS f_aggregation_bits
FROM t_attestations x
GROUP BY x.f_slot, x.f_committee_index) a ON a.f_slot = c.f_slot AND a.f_committee_index = c.f_index

But that’s only half the story (fortunately it’s the more complicated half) – we also want to know how many attestations we should have produced. That means counting the number of times our validators appear in committees:

SELECT SUM(icount(c.f_committee::int[] & '{$perfValidators}'))
FROM t_beacon_committees c

It’s tempting to just use a COUNT(*), but that counts the committees containing any of our validators, which undercounts the number of expected attestations if two of our validators are in the same committee. So we have to apply the same trick as we did for counting the number of attestations – find the intersection and then count.
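A small Python illustration of the difference, with made-up committees and hypothetical helper names: counting committees gives one answer, summing the intersections gives another as soon as two of our validators share a committee.

```python
def count_ours(members, ours):
    """Size of the intersection, like icount(a & b) in the SQL."""
    return len(set(members) & set(ours))


def expected_attestations(committees, ours):
    """Total attestations our validators should produce across the committees."""
    return sum(count_ours(committee, ours) for committee in committees)


def committees_with_ours(committees, ours):
    """What a plain COUNT(*) over the matching committees would report."""
    return sum(1 for committee in committees if set(committee) & set(ours))
```

With committees [1, 7, 1234], [5533, 8] and [9, 10] and our validators 1, 1234, 5533 and 2233, two committees match but three attestations are expected.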

We can filter the number of rows we have to inspect by only including committees containing our validators with:

WHERE c.f_committee && '{$perfValidators}'

So putting it all together we can get the percentage of included attestations in the current time range with:

SELECT SUM(icount(get_indices(a.f_aggregation_bits, c.f_committee)::int[] & '{$perfValidators}'::int[]))::double precision / SUM(icount(c.f_committee::int[] & '{$perfValidators}'))
FROM t_beacon_committees c
JOIN (SELECT x.f_slot, x.f_committee_index, bit_or(RIGHT(x.f_aggregation_bits::text, -1)::varbit) AS f_aggregation_bits
FROM t_attestations x
GROUP BY x.f_slot, x.f_committee_index) a ON a.f_slot = c.f_slot AND a.f_committee_index = c.f_index
WHERE c.f_slot >= ($__unixEpochFrom() - $genesisTime) / 12 AND c.f_slot <= ($__unixEpochTo() - $genesisTime) / 12
AND c.f_committee && '{$perfValidators}';

Which is up there with the least comprehensible SQL I’ve ever written.  But it works and it says Teku is awesome.

But why stop there? What if we wanted to know the number of correct attestations?

Remember way back up when we were first aggregating attestations? If we just filtered the attestations to the ones which have the right beacon block root, our aggregate will only count validators that attested to the correct head. A straight join to t_blocks will do the trick:

SELECT SUM(icount(get_indices(a.f_aggregation_bits, c.f_committee)::int[] & '{$perfValidators}'::int[]))::double precision / SUM(icount(c.f_committee::int[] & '{$perfValidators}'))
FROM t_beacon_committees c
JOIN (SELECT x.f_slot, x.f_committee_index, bit_or(RIGHT(x.f_aggregation_bits::text, -1)::varbit) AS f_aggregation_bits
FROM t_attestations x
JOIN t_blocks b ON x.f_beacon_block_root = b.f_root AND b.f_slot = (SELECT MAX(b2.f_slot) FROM t_blocks b2 WHERE b2.f_slot <= x.f_slot)
WHERE x.f_slot >= ($__unixEpochFrom() - $genesisTime) / 12 AND x.f_slot <= ($__unixEpochTo() - $genesisTime) / 12
GROUP BY x.f_slot, x.f_committee_index) a ON a.f_slot = c.f_slot AND a.f_committee_index = c.f_index
WHERE c.f_slot >= ($__unixEpochFrom() - $genesisTime) / 12 AND c.f_slot <= ($__unixEpochTo() - $genesisTime) / 12
AND c.f_committee && '{$perfValidators}';

It’s tempting to just test that the block root and slot match, but then attestations that correctly pointed to empty slots wouldn’t be included. So we have to check that the block root is the last one at or before the attestation slot.
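The "last block at or before the slot" rule is the part most worth spelling out. Here is a minimal Python sketch, assuming the canonical blocks are available as a list of slots sorted ascending plus a parallel list of roots; both lists and the function names are hypothetical.

```python
from bisect import bisect_right


def canonical_root_at(slot, block_slots, block_roots):
    """Root of the latest block at or before `slot`; block_slots must be sorted ascending."""
    pos = bisect_right(block_slots, slot)
    if pos == 0:
        return None  # no block at or before this slot
    return block_roots[pos - 1]


def is_correct_head(att_slot, att_root, block_slots, block_roots):
    """True if the attestation's beacon block root is the head as of its slot."""
    expected = canonical_root_at(att_slot, block_slots, block_roots)
    return expected is not None and expected == att_root
```

With blocks at slots 100, 101 and 103, an attestation for the empty slot 102 is correct if it points at the block from slot 101, which a simple slot equality check would miss.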

Or we could change that join to check whether the target root is correct (it should point to the block root at the first slot of the epoch):

JOIN t_blocks b ON x.f_target_root = b.f_root AND b.f_slot = (SELECT MAX(b2.f_slot) FROM t_blocks b2 WHERE b2.f_slot <= x.f_target_epoch * 32)

And hey look, we made that incomprehensible SQL even worse!

This may not be the best use of chaind but it’s fun, and the snippets of SQL here can be remixed and combined to do quite a lot of things when investigating what happened on a chain. I suspect it would be a lot easier to have a denormalised table that recorded the first time a validator’s attestation was included to save all the bit twiddling and array intersections. It may come at a fairly large cost of disk space though.

Category: General