Deobfuscating genesis accounts

Norton Wang
6/21/2021

In 2017, DFINITY held a public sale of DFN tokens, which became staked neurons when the network launched. Each "Genesis Account" was to have its allocation unlocked linearly every month: over 31 months for "Early Contributors" (EC) or 49 months for "Seed Round" (SR). Because the total allocation is significant (160,561,922 ICP, 33% of the total supply), identifying these accounts is important for properly assessing market dynamics.

The Internet Computer Ledger is not quite like other blockchain ledgers: it is simply an application canister running on the system subnet. As a result, its data can only be read through whatever interfaces the team has explicitly exposed. The same goes for the other NNS canisters, including Governance, which tracks all neurons and proposals, and GTC, which stores genesis account data.

Down the rabbit hole

What do we have with us as we embark upon this adventure?

  • A list of all genesis accounts

  • A get_account function on the GTC (Genesis Token Canister), which returns the list of NeuronIds for each genesis account, along with the principal that claimed it

  • Many AccountIdentifiers in the Ledger without any links to NeuronIds

At this point, one may think to simply search the Ledger for all accounts that match a given neuron's expected balance. There are a few issues with this approach, however:

  • A neuron's public info can be queried with get_neuron_info on Governance, but this data only includes voting power (which depends on age and dissolve delay), not the original stake. Of course, it is possible to work backwards from voting power to the stake amount, but...

  • All 31 or 49 neurons of an account contain exactly the same ICP amount (except the last one, which also holds the leftovers from bucketing). This means that we won't know exactly which AccountIdentifier is linked to each neuron.

  • We need to know exactly which neuron an account is linked to in order to know when it will become liquid.
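To see why balances alone cannot tell the neurons apart, here is a hypothetical sketch of the bucketing step. The real evenly_split_e8s is not shown in this post; this version simply follows the description above (equal stakes, remainder added to the last one), and the name evenly_split_e8s_sketch is made up for illustration.

```rust
/// 1 ICP = 10^8 e8s.
const E8S_PER_ICP: u64 = 100_000_000;

/// Hypothetical sketch of bucketing: split `total_e8s` into `months` equal
/// stakes and add any remainder to the final stake. Every stake except the
/// last is identical, so a balance match cannot say which month it belongs to.
fn evenly_split_e8s_sketch(total_e8s: u64, months: u64) -> Vec<u64> {
    assert!(months > 0, "need at least one month");
    let base = total_e8s / months;
    let remainder = total_e8s % months;
    let mut stakes = vec![base; months as usize];
    if let Some(last) = stakes.last_mut() {
        *last += remainder;
    }
    stakes
}
```

For example, evenly_split_e8s_sketch(1_000 * E8S_PER_ICP, 49) yields 48 stakes of 2,040,816,326 e8s and a final stake of 2,040,816,352 e8s.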

Perhaps there is another solution.

How are neurons linked to accounts?

Let's take a look at the GTC init:

/// Given a list of "Seed Round" accounts, create each account's neuron set
/// and add a mapping from the account's address to these neurons in
/// `gtc_neurons`
pub fn add_sr_neurons(&mut self, sr_accounts: &[(&str, u32)]) {
    let sr_months_to_release = self
        .sr_months_to_release
        .expect("sr_months_to_release must be set");

    for (address, icpts) in sr_accounts.iter() {
        self.total_alloc += *icpts;
        let icpts = ICPTs::from_icpts(*icpts as u64).unwrap();
        let sr_stakes = evenly_split_e8s(icpts.get_e8s(), sr_months_to_release);
        let aging_since_timestamp_seconds = self.aging_since_timestamp_seconds;
        let mut sr_neurons = make_neurons(
            address,
            INVESTOR_TYPE_SR,
            sr_stakes,
            self.get_rng(None),
            aging_since_timestamp_seconds,
        );
        let entry = self.gtc_neurons.entry(address.to_string()).or_default();
        entry.append(&mut sr_neurons);
    }
}

Simple enough: neurons are created based on ICP amount and investor type, and pre-aged 18 months (via aging_since_timestamp_seconds). make_neurons is defined as:

/// Return a list of neurons that contain the stakes given in `stakes` and
/// dissolve at monotonically increasing months.
///
/// The first neuron's dissolve delay will be set to 0, the following neurons
/// will dissolve at a random time in the month after the previous neuron.
fn make_neurons(
    address: &str,
    investor_type: &str,
    stakes: Vec<u64>,
    rng: &mut StdRng,
    aging_since_timestamp_seconds: u64,
) -> Vec<Neuron> {
    stakes
        .into_iter()
        .enumerate()
        .map(|(month_i, stake_e8s)| {
            let random_offset_within_one_month_seconds = rng.next_u64() % ONE_MONTH_SECONDS;
            let dissolve_delay_seconds = if month_i == 0 {
                0
            } else {
                ((month_i as u64) * ONE_MONTH_SECONDS) + random_offset_within_one_month_seconds
            };

            make_neuron(
                address,
                investor_type,
                stake_e8s,
                dissolve_delay_seconds,
                aging_since_timestamp_seconds,
            )
        })
        .collect()
}

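This spreads the stakes across months: the first neuron gets a dissolve delay of 0, and the neuron at index month_i gets month_i months plus a random offset of less than one month. Perhaps this randomness is meant to smooth out the release of supply. Finally, make_neuron: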

fn make_neuron(
    address: &str,
    investor_type: &str,
    stake_e8s: u64,
    dissolve_delay_seconds: u64,
    aging_since_timestamp_seconds: u64,
) -> Neuron {
    let subaccount = {
        let mut state = Sha256::new();
        state.write(b"gtc-neuron");
        state.write(address.as_bytes());
        state.write(investor_type.as_bytes());
        state.write(&dissolve_delay_seconds.to_be_bytes());
        state.finish()
    };

    Neuron {
        id: Some(NeuronId::from_subaccount(&subaccount)),
        account: subaccount.to_vec(),
        controller: Some(GENESIS_TOKEN_CANISTER_ID.get()),
        cached_neuron_stake_e8s: stake_e8s,
        dissolve_state: Some(DissolveState::DissolveDelaySeconds(dissolve_delay_seconds)),
        aging_since_timestamp_seconds,
        ..Default::default()
    }
}

OK, so this subaccount looks promising; we will use it later. Getting the NeuronId, however, requires one more step:

impl pb::v1::NeuronId {
    pub fn from_subaccount(subaccount: &[u8; 32]) -> Self {
        Self {
            id: {
                let mut state = Sha256::new();
                state.write(subaccount);
                // TODO(NNS1-192) We should just store the Sha256, but for now
                // we convert it to a number
                u64::from_ne_bytes(state.finish()[0..8].try_into().unwrap())
            },
        }
    }
}
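The two hashing steps above can be reproduced outside the canister. Below is a dependency-free Rust sketch of make_neuron's subaccount and NeuronId::from_subaccount; SHA-256 is hand-rolled only so the block runs without crates (use a vetted library such as sha2 in practice). Assumptions: from_ne_bytes ran on the little-endian wasm32 target, so the ID is read as little-endian, and the actual INVESTOR_TYPE_SR / INVESTOR_TYPE_EC string values are not shown here, so any investor-type string you pass is a placeholder.

```rust
// SHA-256 round constants (FIPS 180-4).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// Minimal one-shot SHA-256, for illustration only.
fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length (big-endian).
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());
    for block in msg.chunks(64) {
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            let t2 = s0.wrapping_add(maj);
            hh = g; g = f; f = e; e = d.wrapping_add(t1);
            d = c; c = b; b = a; a = t1.wrapping_add(t2);
        }
        let state = [a, b, c, d, e, f, g, hh];
        for i in 0..8 {
            h[i] = h[i].wrapping_add(state[i]);
        }
    }
    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Mirrors make_neuron: SHA-256 over "gtc-neuron", the genesis address, the
/// investor type and the big-endian dissolve delay, written back to back.
fn gtc_subaccount(address: &str, investor_type: &str, dissolve_delay_seconds: u64) -> [u8; 32] {
    let mut input = Vec::new();
    input.extend_from_slice(b"gtc-neuron");
    input.extend_from_slice(address.as_bytes());
    input.extend_from_slice(investor_type.as_bytes());
    input.extend_from_slice(&dissolve_delay_seconds.to_be_bytes());
    sha256(&input)
}

/// Mirrors NeuronId::from_subaccount: first 8 bytes of SHA-256(subaccount).
/// ASSUMPTION: read as little-endian, matching from_ne_bytes on wasm32.
fn neuron_id_from_subaccount(subaccount: &[u8; 32]) -> u64 {
    let digest = sha256(subaccount);
    let mut first8 = [0u8; 8];
    first8.copy_from_slice(&digest[..8]);
    u64::from_le_bytes(first8)
}
```

A candidate ID is then neuron_id_from_subaccount(&gtc_subaccount(address, investor_type, delay)). Getting the byte order wrong in a reimplementation silently produces IDs that never match, which is why the endianness assumption is called out.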

We can thus summarize the process of NeuronId generation as follows:

bucket into 31 or 49 months
for each month:
  let dissolve_delay = (month == 0) ? 0 : month * ONE_MONTH_SECONDS + random_offset
  let subaccount = sha256("gtc-neuron" + genesis_address + investor_type + be_bytes(dissolve_delay))
  let neuronId = first 8 bytes of sha256(subaccount), as little-endian u64
  emit neuronId
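Because each random offset stays below one month, every neuron's dissolve delay is confined to a fixed window. This sketch computes those windows and the worst-case number of delays to try per account. ONE_MONTH_SECONDS is not defined in the excerpts, so the value below (1/12 of a 365.25-day year) is an assumption; a slightly different month length only changes the counts marginally.

```rust
use std::ops::Range;

/// ASSUMED value: 1/12 of a 365.25-day year, in seconds.
const ONE_MONTH_SECONDS: u64 = 2_629_800;

/// Half-open range of dissolve delays the neuron at `month_i` can have:
/// exactly 0 for the first neuron, otherwise month_i * ONE_MONTH_SECONDS
/// plus an offset in [0, ONE_MONTH_SECONDS).
fn dissolve_delay_window(month_i: u64) -> Range<u64> {
    if month_i == 0 {
        0..1
    } else {
        let start = month_i * ONE_MONTH_SECONDS;
        start..start + ONE_MONTH_SECONDS
    }
}

/// Worst-case number of dissolve delays to try for an account with `months`
/// neurons: the sum of all window sizes.
fn worst_case_candidates(months: u64) -> u64 {
    (0..months)
        .map(|m| {
            let w = dissolve_delay_window(m);
            w.end - w.start
        })
        .sum()
}
```

Under this assumption a 31-month EC account has at most 78,894,001 candidate delays and a 49-month SR account at most 126,230,401. Each candidate costs two SHA-256 hashes, and since the offsets are close to uniform the expected work is roughly half the worst case.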

In our case, we start with the set of known NeuronIds and need to run this algorithm until every one of them is matched with a genesis account and subaccount. The random dissolve_delay may seem like a problem, but the offset is always less than one month (roughly 2.6 million possible values per neuron), so the search space is small enough to make this only slightly annoying. We only need one more step to produce the AccountIdentifier from a subaccount:

pub fn new(account: PrincipalId, sub_account: Option<Subaccount>) -> AccountIdentifier {
    let mut hash = Sha224::new();
    hash.write(ACCOUNT_DOMAIN_SEPERATOR);
    hash.write(account.as_slice());

    let sub_account = sub_account.unwrap_or(SUB_ACCOUNT_ZERO);
    hash.write(&sub_account.0[..]);

    AccountIdentifier {
        hash: hash.finish(),
    }
}
// ...
pub fn to_vec(&self) -> Vec<u8> {
    [&self.generate_checksum()[..], &self.hash[..]].concat()
}

pub fn generate_checksum(&self) -> [u8; 4] {
    let mut hasher = crc32fast::Hasher::new();
    hasher.update(&self.hash);
    hasher.finalize().to_be_bytes()
}
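The checksum half of the AccountIdentifier is easy to reproduce. This sketch implements the standard CRC-32 (IEEE, reflected polynomial 0xEDB88320, the variant crc32fast computes) and prepends it big-endian to a 28-byte hash, as to_vec does. To stay dependency-free it starts from an already computed SHA-224 hash rather than computing it; that hash is taken over the domain separator, the owning principal and the subaccount, as in new above.

```rust
/// Bitwise CRC-32 (IEEE 802.3 / ISO-HDLC), the same checksum crc32fast computes.
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Mirrors to_vec: 4-byte big-endian CRC-32 of the hash, followed by the hash.
fn account_id_bytes(hash: &[u8; 28]) -> Vec<u8> {
    let mut out = crc32(hash).to_be_bytes().to_vec();
    out.extend_from_slice(hash);
    out
}

/// Lowercase hex encoding, the form in which account IDs are usually displayed.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

The result is always 32 bytes (64 hex characters), and crc32(b"123456789") returns the standard check value 0xCBF43926, a quick way to confirm a reimplementation uses the right CRC variant.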

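Neuron accounts always belong to the Governance canister: the AccountIdentifier is derived from the Governance canister's principal plus the neuron's subaccount, while control of the neuron itself is recorded separately as a controller Principal. Now, let's put everything together to find all of the genesis neuron accounts: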

for each genesis_address:
  let neuron_set = gtc.get_account(genesis_address)
  bucket into 31 or 49 months
  for each month:
    let nonce = 0, found = false
    while not found:
      let dissolve_delay = (month == 0) ? 0 : month * ONE_MONTH_SECONDS + nonce
      let subaccount = sha256("gtc-neuron" + genesis_address + investor_type + be_bytes(dissolve_delay))
      let neuronId = first 8 bytes of sha256(subaccount), as little-endian u64
      if neuronId not in neuron_set:
        nonce++
      else:
        let hash = sha224(ACCOUNT_DOMAIN_SEPERATOR + governance_principal + subaccount)
        let accountId = crc32(hash) + hash
        assert(starting_balance(accountId) == bucket.amount)
        link accountId to neuronId
        found = true
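The loop structure of the search above does not depend on the hash itself, so this sketch takes the delay-to-NeuronId derivation as a closure and scans each month's window until it hits an ID from the account's neuron set. The function names and the toy hash in the usage example are hypothetical; in practice the closure would compute sha256 of the GTC subaccount and read the NeuronId from it, as described above.

```rust
use std::collections::HashSet;

/// Scan the dissolve-delay window for `month_i` and return the first
/// (delay, neuron_id) whose ID is in `neuron_set`. The pseudocode's `nonce`
/// is the offset of `delay` within the window.
fn find_month_match<F>(month_i: u64, month_seconds: u64, neuron_set: &HashSet<u64>, neuron_id_for: F) -> Option<(u64, u64)>
where
    F: Fn(u64) -> u64,
{
    let window = if month_i == 0 {
        0..1
    } else {
        month_i * month_seconds..(month_i + 1) * month_seconds
    };
    for delay in window {
        let id = neuron_id_for(delay);
        if neuron_set.contains(&id) {
            return Some((delay, id));
        }
    }
    None
}

/// Run the per-month search for every month of one genesis account.
/// `None` marks a month whose window contained no known NeuronId.
fn match_account<F>(months: u64, month_seconds: u64, neuron_set: &HashSet<u64>, neuron_id_for: F) -> Vec<Option<(u64, u64)>>
where
    F: Fn(u64) -> u64,
{
    (0..months)
        .map(|m| find_month_match(m, month_seconds, neuron_set, &neuron_id_for))
        .collect()
}
```

With a toy ID function |d| d * 7 + 3, a 10-second "month" and a neuron set of {3, 94, 192}, match_account(4, 10, ...) returns matches at delays 0, 13 and 27 and None for the fourth month. Returning None instead of looping forever also makes a missing neuron easy to spot.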

I implemented this in JS (not recommended), and it ran for about two days, including time spent on bug fixes and database handling. It produced 15,472 accounts, which is exactly the number expected 🎉.

The complete data is available on ic.rocks.

Final Thoughts

Neurons, excluding genesis accounts and the initial team-controlled ones, have randomly generated IDs, and the Governance canister does not expose a way to list neuron IDs. This is likely a design tradeoff made by the team to reduce operating costs, but it does present a problem: we do not have a complete picture of all the ICP staked in neurons.

There are some workarounds, such as using alternate neuron management UIs that store neuron IDs, or dumping the Governance canister's heap and debugging it (not a task I wish to take on).

Perhaps a better approach is to ask the DFINITY team directly - why is neuron info difficult to access? Why are fields like controller and stake_e8s private? Perhaps we should open this data up in the name of transparency?

