Deobfuscating genesis accounts


In 2017, DFINITY held a public sale of DFN tokens, which manifested as staked neurons when the network launched. Each "Genesis Account" was to have its allocation unlocked linearly every month - 31 months for "Early Contributors" (EC) or 49 months for "Seed Round" (SR). Because the total ICP allocation is significant (160,561,922 ICP, 33% of the total supply), it's important to be able to identify these accounts in order to properly assess market dynamics.
The Internet Computer Ledger is not quite like other blockchain ledgers - this one is simply an application canister running on the system subnet. Therefore, its data can only be read through the methods that the team explicitly chooses to expose. This is the same story for other NNS canisters, including Governance, which tracks all neurons and proposals, and GTC, which stores genesis account data.
Down the rabbit hole
What do we have with us as we embark upon this adventure?
A get_account function on the GTC (Genesis Token Canister), which returns the list of NeuronIds for each genesis account, along with the principal that claimed it
Many AccountIdentifiers in the Ledger, without any links to NeuronIds
At this point, one may think to simply search for all the accounts in Ledger that match a given neuron's expected balance. There are two issues with this approach, however:
A neuron's public info can be queried with get_neuron_info on Governance, but this data only includes voting power based on age and dissolve delay, not the original stake. Of course, it is possible to calculate backwards for stake amount, but...
All 31 or 49 neurons of an account contain exactly the same ICP amount (except the last one, which also holds the leftovers from bucketing). This means that a balance match cannot tell us which AccountIdentifier is linked to which neuron, and we need to know exactly that in order to know when each account will become liquid.
Perhaps there is another solution.
How are neurons linked to accounts?
Let's take a look at the GTC init:
/// Given a list of "Seed Round" accounts, create each account's neuron set
/// and add a mapping from the account's address to these neurons in
/// `gtc_neurons`
pub fn add_sr_neurons(&mut self, sr_accounts: &[(&str, u32)]) {
    let sr_months_to_release = self
        .sr_months_to_release
        .expect("sr_months_to_release must be set");
    for (address, icpts) in sr_accounts.iter() {
        self.total_alloc += *icpts;
        let icpts = ICPTs::from_icpts(*icpts as u64).unwrap();
        let sr_stakes = evenly_split_e8s(icpts.get_e8s(), sr_months_to_release);
        let aging_since_timestamp_seconds = self.aging_since_timestamp_seconds;
        let mut sr_neurons = make_neurons(
            address,
            INVESTOR_TYPE_SR,
            sr_stakes,
            self.get_rng(None),
            aging_since_timestamp_seconds,
        );
        let entry = self.gtc_neurons.entry(address.to_string()).or_default();
        entry.append(&mut sr_neurons);
    }
}
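The body of evenly_split_e8s is not shown here, so the following is only a minimal sketch of what such a split might look like, assuming (as noted earlier) that every bucket gets the same stake and the last one absorbs the integer-division leftovers. The name split_evenly is hypothetical and is not the GTC's own function.

```rust
/// Hypothetical stand-in for `evenly_split_e8s` (an assumption, not the GTC code):
/// splits `total_e8s` into `buckets` equal stakes and adds the
/// integer-division remainder to the last bucket.
fn split_evenly(total_e8s: u64, buckets: u64) -> Vec<u64> {
    assert!(buckets > 0, "need at least one bucket");
    let base = total_e8s / buckets;
    let remainder = total_e8s % buckets;
    let mut stakes = vec![base; buckets as usize];
    // Only the final bucket differs from the others.
    if let Some(last) = stakes.last_mut() {
        *last += remainder;
    }
    stakes
}
```

Under this assumption, every neuron of an account except the last carries an identical stake, which is why matching ledger balances alone cannot tell the neurons apart.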
Simple enough - neurons are created based on ICP amount and investor type, and pre-aged 18 months. make_neurons is defined as:
/// Return a list of neurons that contain the stakes given in `stakes` and
/// dissolve at monotonically increasing months.
///
/// The first neuron's dissolve delay will be set to 0, the following neurons
/// will dissolve at a random time in the month after the previous neuron.
fn make_neurons(
    address: &str,
    investor_type: &str,
    stakes: Vec<u64>,
    rng: &mut StdRng,
    aging_since_timestamp_seconds: u64,
) -> Vec<Neuron> {
    stakes
        .into_iter()
        .enumerate()
        .map(|(month_i, stake_e8s)| {
            let random_offset_within_one_month_seconds = rng.next_u64() % ONE_MONTH_SECONDS;
            let dissolve_delay_seconds = if month_i == 0 {
                0
            } else {
                ((month_i as u64) * ONE_MONTH_SECONDS) + random_offset_within_one_month_seconds
            };
            make_neuron(
                address,
                investor_type,
                stake_e8s,
                dissolve_delay_seconds,
                aging_since_timestamp_seconds,
            )
        })
        .collect()
}
This does the bucketing into months, with a random offset. Perhaps this randomness creates a smoother release of supply. Finally, make_neuron:
fn make_neuron(
    address: &str,
    investor_type: &str,
    stake_e8s: u64,
    dissolve_delay_seconds: u64,
    aging_since_timestamp_seconds: u64,
) -> Neuron {
    let subaccount = {
        let mut state = Sha256::new();
        state.write(b"gtc-neuron");
        state.write(address.as_bytes());
        state.write(investor_type.as_bytes());
        state.write(&dissolve_delay_seconds.to_be_bytes());
        state.finish()
    };
    Neuron {
        id: Some(NeuronId::from_subaccount(&subaccount)),
        account: subaccount.to_vec(),
        controller: Some(GENESIS_TOKEN_CANISTER_ID.get()),
        cached_neuron_stake_e8s: stake_e8s,
        dissolve_state: Some(DissolveState::DissolveDelaySeconds(dissolve_delay_seconds)),
        aging_since_timestamp_seconds,
        ..Default::default()
    }
}
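For anyone reimplementing this off-chain, the exact byte layout fed into the hash matters more than the hash itself. Below is a small sketch of that layout, assuming the hasher's write calls simply append raw bytes with no length prefixes or separators. The function name subaccount_preimage is made up for illustration; hashing its output with SHA-256 would then give the subaccount.

```rust
/// Builds the byte string that `make_neuron` appears to feed into SHA-256,
/// in the same order as the `state.write(...)` calls.
/// Assumption: `write` appends raw bytes without any framing.
fn subaccount_preimage(
    address: &str,
    investor_type: &str,
    dissolve_delay_seconds: u64,
) -> Vec<u8> {
    let mut bytes = Vec::new();
    bytes.extend_from_slice(b"gtc-neuron");
    // The address is hashed as the text of the string, not as decoded bytes.
    bytes.extend_from_slice(address.as_bytes());
    bytes.extend_from_slice(investor_type.as_bytes());
    // The dissolve delay is always 8 bytes, big-endian.
    bytes.extend_from_slice(&dissolve_delay_seconds.to_be_bytes());
    bytes
}
```

Two details are easy to get wrong: the address goes in as its string form (address.as_bytes()), and the dissolve delay is a fixed-width big-endian u64 rather than a decimal string.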
Ok, so this subaccount looks promising. We can use that later. But getting the NeuronId requires another step:
impl pb::v1::NeuronId {
    pub fn from_subaccount(subaccount: &[u8; 32]) -> Self {
        Self {
            id: {
                let mut state = Sha256::new();
                state.write(subaccount);
                // TODO(NNS1-192) We should just store the Sha256, but for now
                // we convert it to a number
                u64::from_ne_bytes(state.finish()[0..8].try_into().unwrap())
            },
        }
    }
}
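One subtlety when porting this: from_ne_bytes uses the native byte order of whatever machine runs the code. Canisters compile to WebAssembly, which is little-endian, so an off-chain reimplementation should presumably read the first 8 bytes as little-endian no matter where it runs. A minimal sketch, with a hypothetical function name:

```rust
/// Interprets the first 8 bytes of a 32-byte digest as a little-endian u64.
/// Assumption: this matches `from_ne_bytes` as executed inside a
/// (little-endian) WebAssembly canister.
fn neuron_id_from_digest(digest: &[u8; 32]) -> u64 {
    let mut first8 = [0u8; 8];
    first8.copy_from_slice(&digest[..8]);
    u64::from_le_bytes(first8)
}
```

In JS, the equivalent read would be a little-endian 64-bit read into a BigInt, since the result does not fit safely into a regular number.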
We can thus summarize the process of NeuronId generation as follows:
bucket into 31 or 49 months
for each month:
    let dissolve_delay = month + random()
    let subaccount = sha256(constant + genesis_address + investor_type + dissolve_delay)
    let neuronId = first 8 bytes of sha256(subaccount), as u64
    return neuronId
In our case, we are starting with the set of known NeuronIds and need to run the algorithm until we have matched all of them with genesis accounts and subaccounts. The random dissolve_delay may seem to be an issue, but in practice the search space is small enough to make this only slightly annoying. We only need one more step to produce the AccountIdentifier from a subaccount:
pub fn new(account: PrincipalId, sub_account: Option<Subaccount>) -> AccountIdentifier {
    let mut hash = Sha224::new();
    hash.write(ACCOUNT_DOMAIN_SEPERATOR);
    hash.write(account.as_slice());
    let sub_account = sub_account.unwrap_or(SUB_ACCOUNT_ZERO);
    hash.write(&sub_account.0[..]);
    AccountIdentifier {
        hash: hash.finish(),
    }
}
// ...
pub fn to_vec(&self) -> Vec<u8> {
    [&self.generate_checksum()[..], &self.hash[..]].concat()
}
pub fn generate_checksum(&self) -> [u8; 4] {
    let mut hasher = crc32fast::Hasher::new();
    hasher.update(&self.hash);
    hasher.finalize().to_be_bytes()
}
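To make the framing in to_vec concrete, here is a dependency-free sketch that prepends the checksum to a 28-byte hash and hex-encodes the result. It swaps crc32fast for a plain bitwise CRC-32 (the standard IEEE variant, which is what crc32fast computes) and takes the SHA-224 output as given. The function names are illustrative, and the idea that the 64-character hex form is what explorers display is an assumption.

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
/// Slow but dependency-free; used here in place of `crc32fast`.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the lowest bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// 4-byte big-endian checksum followed by the 28-byte hash, as lowercase hex.
fn account_identifier_hex(hash: &[u8; 28]) -> String {
    let mut bytes = crc32(hash).to_be_bytes().to_vec();
    bytes.extend_from_slice(hash);
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

The result is 32 bytes (64 hex characters): the checksum comes first and covers only the hash that follows it.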
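The ledger account of every neuron is owned by the Governance canister (the neuron itself is controlled by a separate principal), so the account passed to new is always the Governance canister's principal and only the subaccount differs between neurons. Now, let's put everything together to find all of the genesis neuron accounts: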
for each genesis_address:
    let neuron_set = gtc.get_account(genesis_address)
    bucket into 31 or 49 months
    for each month:
        let nonce = 0
        while not found:
            let dissolve_delay = month + nonce
            let subaccount = sha256(constant + genesis_address + investor_type + dissolve_delay)
            let neuronId = first 8 bytes of sha256(subaccount), as u64
            if neuronId not in neuron_set:
                nonce++
            else:
                let hash = sha224(constant + governance_principal + subaccount)
                let accountId = crc32(hash) + hash
                assert(account.starting_balance == bucket.amount)
                link accountId to neuronId
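The inner loop of the search can be sketched in Rust as follows. This is not the JS implementation mentioned below, only an outline under stated assumptions: ONE_MONTH_SECONDS is taken to be 2,629,800 (one twelfth of a 365.25-day year), and derive_id is a caller-supplied closure standing in for the two SHA-256 steps, so the sketch needs no hashing library. All function names are hypothetical.

```rust
use std::collections::HashSet;

/// Assumed value of the NNS constant: one twelfth of a 365.25-day year.
const ONE_MONTH_SECONDS: u64 = 2_629_800;

/// Candidate dissolve delays for the neuron at index `month_i`:
/// exactly 0 for the first neuron, otherwise one month-wide window.
fn candidate_delays(month_i: u64) -> std::ops::Range<u64> {
    if month_i == 0 {
        0..1
    } else {
        let start = month_i * ONE_MONTH_SECONDS;
        start..start + ONE_MONTH_SECONDS
    }
}

/// Returns the first `(delay, id)` in the window for `month_i` whose derived
/// id is one of the account's known neuron ids, or `None` if there is none.
fn find_delay<F>(month_i: u64, known_ids: &HashSet<u64>, derive_id: F) -> Option<(u64, u64)>
where
    F: Fn(u64) -> u64,
{
    candidate_delays(month_i)
        .map(|delay| (delay, derive_id(delay)))
        .find(|(_, id)| known_ids.contains(id))
}

/// Upper bound on the number of candidates to try for one account.
fn max_candidates(months: u64) -> u64 {
    1 + months.saturating_sub(1) * ONE_MONTH_SECONDS
}
```

With these assumptions, a 49-neuron account needs at most about 126 million candidates (two SHA-256 calls each), which is consistent with the search space being only slightly annoying rather than prohibitive.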
I implemented this in JS (not recommended) and ran it for about two days, accounting for bugfixing and database handling. It produced 15472 accounts, which is exactly the number expected 🎉.
The complete data is available on ic.rocks.
Final Thoughts
Neurons, excluding genesis accounts and the initial team-controlled ones, have randomly generated IDs. The Governance canister does not expose a way to list neuron IDs. This is likely a design tradeoff made by the team to reduce operating costs, but this does present a problem - we do not have a complete picture of all ICP that is staked in neurons.
There are some workarounds, such as the use of alternate neuron management UIs that store neuron IDs, or dumping the heap of the Governance canister and debugging it (this is not a task I wish to take on).
Perhaps a better approach is to ask the DFINITY team directly - why is neuron info difficult to access? Why are fields like controller and stake_e8s private? Perhaps we should open this data up in the name of transparency?