Deriving blinded keys and subcredentials
In each time period (see [TIME-PERIODS] for a definition of time periods), a hidden service host uses a different blinded private key to sign its directory information, and clients use a different blinded public key as the index for fetching that information.
For a candidate for a key derivation method, see Appendix [KEYBLIND].
Additionally, clients and hosts derive a subcredential for each period. Knowledge of the subcredential is needed to decrypt hidden service descriptors for each period and to authenticate with the hidden service host in the introduction process. Unlike the credential, it changes each period. Knowing the subcredential does not enable the hidden service host to derive the main credential--therefore, it is safe to put the subcredential on the hidden service host while leaving the hidden service's private key offline.
The subcredential for a period is derived as:
N_hs_subcred = H("subcredential" | N_hs_cred | blinded-public-key).
In the above formula, credential corresponds to:
N_hs_cred = H("credential" | public-identity-key)
where public-identity-key
is the public identity master key of the hidden
service.
Locating, uploading, and downloading hidden service descriptors
To avoid attacks where a hidden service's descriptor is easily targeted for censorship, we store them at different directories over time, and use shared random values to prevent those directories from being predictable far in advance.
Which Tor servers hosts a hidden service depends on:
* the current time period,
* the daily subcredential,
* the hidden service directories' public keys,
* a shared random value that changes in each time period,
shared_random_value.
* a set of network-wide networkstatus consensus parameters.
(Consensus parameters are integer values voted on by authorities
and published in the consensus documents, described in
dir-spec.txt, section 3.3.)
Below we explain in more detail.
Dividing time into periods
To prevent a single set of hidden service directory from becoming a target by adversaries looking to permanently censor a hidden service, hidden service descriptors are uploaded to different locations that change over time.
The length of a "time period" is controlled by the consensus parameter 'hsdir-interval', and is a number of minutes between 30 and 14400 (10 days). The default time period length is 1440 (one day).
Time periods start at the Unix epoch (Jan 1, 1970), and are computed by taking the number of minutes since the epoch and dividing by the time period. However, we want our time periods to start at a regular offset from the SRV voting schedule, so we subtract a "rotation time offset" of 12 voting periods from the number of minutes since the epoch, before dividing by the time period (effectively making "our" epoch start at Jan 1, 1970 12:00UTC when the voting period is 1 hour.)
Example: If the current time is 2016-04-13 11:15:01 UTC, making the seconds since the epoch 1460546101, and the number of minutes since the epoch 24342435. We then subtract the "rotation time offset" of 12*60 minutes from the minutes since the epoch, to get 24341715. If the current time period length is 1440 minutes, by doing the division we see that we are currently in time period number 16903.
Specifically, time period #16903 began 16903*1440*60 + (12*60*60) seconds after the epoch, at 2016-04-12 12:00 UTC, and ended at 16904*1440*60 + (12*60*60) seconds after the epoch, at 2016-04-13 12:00 UTC.
When to publish a hidden service descriptor
Hidden services periodically publish their descriptor to the responsible HSDirs. The set of responsible HSDirs is determined as specified in [WHERE-HSDESC].
Specifically, every time a hidden service publishes its descriptor, it also sets up a timer for a random time between 60 minutes and 120 minutes in the future. When the timer triggers, the hidden service needs to publish its descriptor again to the responsible HSDirs for that time period. [TODO: Control republish period using a consensus parameter?]
Overlapping descriptors
Hidden services need to upload multiple descriptors so that they can be reachable to clients with older or newer consensuses than them. Services need to upload their descriptors to the HSDirs before the beginning of each upcoming time period, so that they are readily available for clients to fetch them. Furthermore, services should keep uploading their old descriptor even after the end of a time period, so that they can be reachable by clients that still have consensuses from the previous time period.
Hence, services maintain two active descriptors at every point. Clients on the other hand, don't have a notion of overlapping descriptors, and instead always download the descriptor for the current time period and shared random value. It's the job of the service to ensure that descriptors will be available for all clients. See section [FETCHUPLOADDESC] for how this is achieved.
[TODO: What to do when we run multiple hidden services in a single host?]
Where to publish a hidden service descriptor
This section specifies how the HSDir hash ring is formed at any given time. Whenever a time value is needed (e.g. to get the current time period number), we assume that clients and services use the valid-after time from their latest live consensus.
The following consensus parameters control where a hidden service descriptor is stored;
hsdir_n_replicas = an integer in range [1,16] with default value 2.
hsdir_spread_fetch = an integer in range [1,128] with default value 3.
hsdir_spread_store = an integer in range [1,128] with default value 4.
(Until 0.3.2.8-rc, the default was 3.)
To determine where a given hidden service descriptor will be stored in a given period, after the blinded public key for that period is derived, the uploading or downloading party calculates:
for replicanum in 1...hsdir_n_replicas:
hs_service_index(replicanum) = H("store-at-idx" |
blinded_public_key |
INT_8(replicanum) |
INT_8(period_length) |
INT_8(period_num) )
where blinded_public_key is specified in section [KEYBLIND], period_length is the length of the time period in minutes, and period_num is calculated using the current consensus "valid-after" as specified in section [TIME-PERIODS].
Then, for each node listed in the current consensus with the HSDir flag, we compute a directory index for that node as:
hs_relay_index(node) = H("node-idx" | node_identity |
shared_random_value |
INT_8(period_num) |
INT_8(period_length) )
where shared_random_value is the shared value generated by the authorities in section [PUB-SHAREDRANDOM], and node_identity is the ed25519 identity key of the node.
Finally, for replicanum in 1...hsdir_n_replicas, the hidden service host uploads descriptors to the first hsdir_spread_store nodes whose indices immediately follow hs_service_index(replicanum). If any of those nodes have already been selected for a lower-numbered replica of the service, any nodes already chosen are disregarded (i.e. skipped over) when choosing a replica's hsdir_spread_store nodes.
When choosing an HSDir to download from, clients choose randomly from among the first hsdir_spread_fetch nodes after the indices. (Note that, in order to make the system better tolerate disappearing HSDirs, hsdir_spread_fetch may be less than hsdir_spread_store.) Again, nodes from lower-numbered replicas are disregarded when choosing the spread for a replica.
Using time periods and SRVs to fetch/upload HS descriptors
Hidden services and clients need to make correct use of time periods (TP) and shared random values (SRVs) to successfully fetch and upload descriptors. Furthermore, to avoid problems with skewed clocks, both clients and services use the 'valid-after' time of a live consensus as a way to take decisions with regards to uploading and fetching descriptors. By using the consensus times as the ground truth here, we minimize the desynchronization of clients and services due to system clock. Whenever time-based decisions are taken in this section, assume that they are consensus times and not system times.
As [PUB-SHAREDRANDOM] specifies, consensuses contain two shared random values (the current one and the previous one). Hidden services and clients are asked to match these shared random values with descriptor time periods and use the right SRV when fetching/uploading descriptors. This section attempts to precisely specify how this works.
Let's start with an illustration of the system:
+------------------------------------------------------------------+
| |
| 00:00 12:00 00:00 12:00 00:00 12:00 |
| SRV#1 TP#1 SRV#2 TP#2 SRV#3 TP#3 |
| |
| $==========|-----------$===========|-----------$===========| |
| |
| |
+------------------------------------------------------------------+
Legend: [TP#1 = Time Period #1]
[SRV#1 = Shared Random Value #1]
["$" = descriptor rotation moment]
Client behavior for fetching descriptors
And here is how clients use TPs and SRVs to fetch descriptors:
Clients always aim to synchronize their TP with SRV, so they always want to use TP#N with SRV#N: To achieve this wrt time periods, clients always use the current time period when fetching descriptors. Now wrt SRVs, if a client is in the time segment between a new time period and a new SRV (i.e. the segments drawn with "-") it uses the current SRV, else if the client is in a time segment between a new SRV and a new time period (i.e. the segments drawn with "="), it uses the previous SRV.
Example:
+------------------------------------------------------------------+
| |
| 00:00 12:00 00:00 12:00 00:00 12:00 |
| SRV#1 TP#1 SRV#2 TP#2 SRV#3 TP#3 |
| |
| $==========|-----------$===========|-----------$===========| |
| ^ ^ |
| C1 C2 |
+------------------------------------------------------------------+
If a client (C1) is at 13:00 right after TP#1, then it will use TP#1 and SRV#1 for fetching descriptors. Also, if a client (C2) is at 01:00 right after SRV#2, it will still use TP#1 and SRV#1.
Service behavior for uploading descriptors
As discussed above, services maintain two active descriptors at any time. We call these the "first" and "second" service descriptors. Services rotate their descriptor every time they receive a consensus with a valid_after time past the next SRV calculation time. They rotate their descriptors by discarding their first descriptor, pushing the second descriptor to the first, and rebuilding their second descriptor with the latest data.
Services like clients also employ a different logic for picking SRV and TP values based on their position in the graph above. Here is the logic:
First descriptor upload logic
Here is the service logic for uploading its first descriptor:
When a service is in the time segment between a new time period a new SRV (i.e. the segments drawn with "-"), it uses the previous time period and previous SRV for uploading its first descriptor: that's meant to cover for clients that have a consensus that is still in the previous time period.
Example: Consider in the above illustration that the service is at 13:00 right after TP#1. It will upload its first descriptor using TP#0 and SRV#0. So if a client still has a 11:00 consensus it will be able to access it based on the client logic above.
Now if a service is in the time segment between a new SRV and a new time period (i.e. the segments drawn with "=") it uses the current time period and the previous SRV for its first descriptor: that's meant to cover clients with an up-to-date consensus in the same time period as the service.
Example:
+------------------------------------------------------------------+
| |
| 00:00 12:00 00:00 12:00 00:00 12:00 |
| SRV#1 TP#1 SRV#2 TP#2 SRV#3 TP#3 |
| |
| $==========|-----------$===========|-----------$===========| |
| ^ |
| S |
+------------------------------------------------------------------+
Consider that the service is at 01:00 right after SRV#2: it will upload its first descriptor using TP#1 and SRV#1.
Second descriptor upload logic
Here is the service logic for uploading its second descriptor:
When a service is in the time segment between a new time period a new SRV (i.e. the segments drawn with "-"), it uses the current time period and current SRV for uploading its second descriptor: that's meant to cover for clients that have an up-to-date consensus on the same TP as the service.
Example: Consider in the above illustration that the service is at 13:00 right after TP#1: it will upload its second descriptor using TP#1 and SRV#1.
Now if a service is in the time segment between a new SRV and a new time period (i.e. the segments drawn with "=") it uses the next time period and the current SRV for its second descriptor: that's meant to cover clients with a newer consensus than the service (in the next time period).
Example:
+------------------------------------------------------------------+
| |
| 00:00 12:00 00:00 12:00 00:00 12:00 |
| SRV#1 TP#1 SRV#2 TP#2 SRV#3 TP#3 |
| |
| $==========|-----------$===========|-----------$===========| |
| ^ |
| S |
+------------------------------------------------------------------+
Consider that the service is at 01:00 right after SRV#2: it will upload its second descriptor using TP#2 and SRV#2.
Directory behavior for handling descriptor uploads [DIRUPLOAD]
Upon receiving a hidden service descriptor publish request, directories MUST check the following:
* The outer wrapper of the descriptor can be parsed according to
[DESC-OUTER]
* The version-number of the descriptor is "3"
* If the directory has already cached a descriptor for this hidden service,
the revision-counter of the uploaded descriptor must be greater than the
revision-counter of the cached one
* The descriptor signature is valid
If any of these basic validity checks fails, the directory MUST reject the descriptor upload.
NOTE: Even if the descriptor passes the checks above, its first and second layers could still be invalid: directories cannot validate the encrypted layers of the descriptor, as they do not have access to the public key of the service (required for decrypting the first layer of encryption), or the necessary client credentials (for decrypting the second layer).
Expiring hidden service descriptors
Hidden services set their descriptor's "descriptor-lifetime" field to 180 minutes (3 hours). Hidden services ensure that their descriptor will remain valid in the HSDir caches, by republishing their descriptors periodically as specified in [WHEN-HSDESC].
Hidden services MUST also keep their introduction circuits alive for as long as descriptors including those intro points are valid (even if that's after the time period has changed).
URLs for anonymous uploading and downloading
Hidden service descriptors conforming to this specification are uploaded
with an HTTP POST request to the URL /tor/hs/<version>/publish
relative to
the hidden service directory's root, and downloaded with an HTTP GET
request for the URL /tor/hs/<version>/<z>
where <z>
is a base64 encoding of
the hidden service's blinded public key and <version>
is the protocol
version which is "3" in this case.
These requests must be made anonymously, on circuits not used for anything else.
Client-side validation of onion addresses
When a Tor client receives a prop224 onion address from the user, it MUST first validate the onion address before attempting to connect or fetch its descriptor. If the validation fails, the client MUST refuse to connect.
As part of the address validation, Tor clients should check that the underlying ed25519 key does not have a torsion component. If Tor accepted ed25519 keys with torsion components, attackers could create multiple equivalent onion addresses for a single ed25519 key, which would map to the same service. We want to avoid that because it could lead to phishing attacks and surprising behaviors (e.g. imagine a browser plugin that blocks onion addresses, but could be bypassed using an equivalent onion address with a torsion component).
The right way for clients to detect such fraudulent addresses (which should only occur malevolently and never naturally) is to extract the ed25519 public key from the onion address and multiply it by the ed25519 group order and ensure that the result is the ed25519 identity element. For more details, please see [TORSION-REFS].