Filename: 268-guard-selection.txt
Title: New Guard Selection Behaviour
Author: Isis Lovecruft, George Kadianakis, [Ola Bini]
Created: 2015-10-28
Status: Obsolete
(Editorial note: this was origianlly written as a revision of
proposal 259, but it diverges so substantially that it seemed
better to assign it a new number for reference, so that we
aren't always talking about "The old 259" and "the new 259". -NM)
This proposal has been obsoleted by proposal #271.
§1. Overview
Tor uses entry guards to prevent an attacker who controls some
fraction of the network from observing a fraction of every user's
traffic. If users chose their entries and exits uniformly at
random from the list of servers every time they build a circuit,
then an adversary who had (k/N) of the network would deanonymize
F=(k/N)^2 of all circuits... and after a given user had built C
circuits, the attacker would see them at least once with
probability 1-(1-F)^C. With large C, the attacker would get a
sample of every user's traffic with probability 1.
To prevent this from happening, Tor clients choose a small number of
guard nodes (currently 3). These guard nodes are the only nodes
that the client will connect to directly. If they are not
compromised, the user's paths are not compromised.
But attacks remain. Consider an attacker who can run a firewall
between a target user and the Tor network, and make
many of the guards they don't control appear to be unreachable.
Or consider an attacker who can identify a user's guards, and mount
denial-of-service attacks on them until the user picks a guard
that the attacker controls.
In the presence of these attacks, we can't continue to connect to
the Tor network unconditionally. Doing so would eventually result
in the user choosing a hostile node as their guard, and losing
anonymity.
This proposal outlines a new entry guard selection algorithm, which
addresses the following concerns:
- Heuristics and algorithms for determining how and which guard(s)
is(/are) chosen should be kept as simple and easy to understand
as possible.
- Clients in censored regions or who are behind a fascist firewall
who connect to the Tor network should not experience any
significant disadvantage in terms of reachability or usability.
- Tor should make a best attempt at discovering the most
appropriate behaviour, with as little user input and
configuration as possible.
§2. Design
Alice, an OP attempting to connect to the Tor network, should
undertake the following steps to determine information about the
local network and to select (some) appropriate entry guards. In the
following scenario, it is assumed that Alice has already obtained a
recent, valid, and verifiable consensus document.
The algorithm is divided into four components such that the full
algorithm is implemented by first invoking START, then repeatedly
calling NEXT while adviced it SHOULD_CONTINUE and finally calling
END. For an example usage see §A. Appendix.
Several components of NEXT can be invoked asynchronously. SHOULD_CONTINUE
is used for the algorithm to be able to tell the caller whether we
consider the work done or not - this can be used to retry primary
guards when we finally are able to connect to a guard after a long
network outage, for example.
This algorithm keeps track of the unreachability status for guards
in state global to the system, so that repeated runs will not have
to rediscover unreachability over and over again. However, this
state does not need to be persisted permanently - it is purely an
optimization.
The algorithm expects several arguments to guide its behavior. These
will be defined in §2.1.
The goal of this algorithm is to strongly prefer connecting to the
same guards we have connected to before, while also trying to detect
conditions such as a network outage. The way it does this is by keeping
track of how many guards we have exposed ourselves to, and if we have
connected to too many we will fall back to only retrying the ones we have
already tried. The algorithm also decides on sample set that should
be persisted - in order to minimize the risk of an attacker forcing
enumeration of the whole network by triggering rebuilding of
circuits.
§2.1. Definitions
Bad guard: a guard is considered bad if it conforms with the function IS_BAD
(see §G. Appendix for details).
Dead guard: a guard is considered dead if it conforms with the function
IS_DEAD (see §H. Appendix for details).
Obsolete guard: a guard is considered obsolete if it conforms with the
function IS_OBSOLETE (see §I. Appendix for details).
Live entry guard: a guard is considered live if it conforms with the function
IS_LIVE (see §D. Appendix for details).
§2.1. The START algorithm
In order to start choosing an entry guard, use the START
algorithm. This takes four arguments that can be used to fine tune
the workings:
USED_GUARDS
This is a list that contains all the guards that have been used
before by this client. We will prioritize using guards from this
list in order to minimize our exposure. The list is expected to
be sorted based on priority, where the first entry will have the
highest priority.
SAMPLED_GUARDS
This is a set that contains all guards that should be considered
for connection. This set should be persisted between runs. It
should be filled by using NEXT_BY_BANDWIDTH with GUARDS as an
argument if it's empty, or if it contains less than SAMPLE_SET_THRESHOLD
guards after winnowing out older guards.
N_PRIMARY_GUARDS
The number of guards we should consider our primary
guards. These guards will be retried more frequently and will
take precedence in most situations. By default the primary
guards will be the first N_PRIMARY_GUARDS guards from USED_GUARDS.
When the algorith is used in constrained mode (have bridges or entry
nodes in the configuration file), this value should be 1 otherwise the
proposed value is 3.
DIR
If this argument is set, we should only consider guards that can
be directory guards. If not set, we will consider all guards.
The primary work of START is to initialize the state machine depicted
in §2.2. The initial state of the machine is defined by:
GUARDS
This is a set of all guards from the consensus. It will primarily be used
to fill in SAMPLED_GUARDS
FILTERED_SAMPLED
This is a set that contains all guards that we are willing to connect to.
It will be obtained from calling FILTER_SET with SAMPLED_GUARDS as
argument.
REMAINING_GUARDS
This is a running set of the guards we have not yet tried to connect to.
It should be initialized to be FILTERED_SAMPLED without USED_GUARDS.
STATE
A variable that keeps track of which state in the state
machine we are currently in. It should be initialized to
STATE_PRIMARY_GUARDS.
PRIMARY_GUARDS
This list keeps track of our primary guards. These are guards
that we will prioritize when trying to connect, and will also
retry more often in case of failure with other guards.
It should be initialized by calling algorithm
NEXT_PRIMARY_GUARD repeatedly until PRIMARY_GUARDS contains
N_PRIMARY_GUARDS elements.
§2.2. The NEXT algorithm
The NEXT algorithm is composed of several different possibly flows. The
first one is a simple state machine that can transfer between two
different states. Every time NEXT is invoked, it will resume at the
state where it left off previously. In the course of selecting an
entry guard, a new consensus can arrive. When that happens we need
to update the data structures used, but nothing else should change.
Before jumping in to the state machine, we should first check if it
was at least PRIMARY_GUARDS_RETRY_INTERVAL minutes since we tried
any of the PRIMARY_GUARDS. If this is the case, and we are not in
STATE_PRIMARY_GUARDS, we should save the previous state and set the
state to STATE_PRIMARY_GUARDS.
§2.2.1. The STATE_PRIMARY_GUARDS state
Return each entry in PRIMARY_GUARDS in turn. For each entry, if the
guard should be retried and considered suitable use it. A guard is
considered to eligible to retry if is marked for retry or is live
and id not bad. Also, a guard is considered to be suitable if is
live and, if is a directory it should not be a cache.
If all entries have been tried transition to STATE_TRY_REMAINING.
§2.2.2. The STATE_TRY_REMAINING state
Return each entry in USED_GUARDS that is not in PRIMARY_GUARDS in
turn.For each entry, if a guard is found return it.
Return each entry from REMAINING_GUARDS in turn.
For each entry, if the guard should be retried and considered
suitable use it and mark it as unreachable. A guard is
considered to eligible to retry if is marked for retry or is live
and id not bad. Also, a guard is considered to be suitable if is
live and, if is a directory it should not be a cache.
If no entries remain in REMAINING_GUARDS, transition to
STATE_PRIMARY_GUARDS.
§2.2.3. ON_NEW_CONSENSUS
First, ensure that all guard profiles are updated with information
about whether they were in the newest consensus or not.
Update the bad status for all guards in USED_GUARDS and SAMPLED_GUARDS.
Remove all dead guards from USED_GUARDS and SAMPLED_GUARDS.
Remove all obsolete guards from USED_GUARDS and SAMPLED_GUARDS.
§2.3. The SHOULD_CONTINUE algorithm
This algorithm takes as an argument a boolean indicating whether the
circuit was successfully built or not.
After the caller have tried to build a circuit with a returned
guard, they should invoke SHOULD_CONTINUE to understand if the
algorithm is finished or not. SHOULD_CONTINUE will always return
true if the circuit failed. If the circuit succeeded,
SHOULD_CONTINUE will always return false, unless the guard that
succeeded was the first guard to succeed after
INTERNET_LIKELY_DOWN_INTERVAL minutes - in that case it will set the
state to STATE_PRIMARY_GUARDS and return true.
§2.4. The END algorithm
The goal of this algorithm is simply to make sure that we keep track
of successful connections made. This algorithm should be invoked
with the guard that was used to correctly set up a circuit.
Once invoked, this algorithm will mark the guard as used, and make
sure it is in USED_GUARDS, by adding it at the end if it was not there.
§2.5. Helper algorithms
These algorithms are used in the above algorithms, but have been
separated out here in order to make the flow clearer.
NEXT_PRIMARY_GUARD
- Return the first entry from USED_GUARDS that is not in
PRIMARY_GUARDS and that is in the most recent consensus.
- If USED_GUARDS is empty, use NEXT_BY_BANDWIDTH with
REMAINING_GUARDS as the argument.
NEXT_BY_BANDWIDTH
- Takes G as an argument, which should be a set of guards to
choose from.
- Return a randomly select element from G, weighted by bandwidth.
FILTER_SET
- Takes G as an argument, which should be a set of guards to filter.
- Filter out guards in G that don't comply with IS_LIVE (see
§D. Appendix for details).
- If the filtered set is smaller than MINIMUM_FILTERED_SAMPLE_SIZE and G
is smaller than MAXIMUM_SAMPLE_SIZE_THRESHOLD, expand G and try to
filter out again. G is expanded by adding one new guard at a time using
NEXT_BY_BANDWIDTH with GUARDS as an argument.
- If G is not smaller than MAXIMUM_SAMPLE_SIZE_THRESHOLD, G should not be
expanded. Abort execution of this function by returning null and report
an error to the user.
§3. Consensus Parameters, & Configurable Variables
This proposal introduces several new parameters that ideally should
be set in the consensus but that should also be possible to
set or override in the client configuration file. Some of these have
proposed values, but for others more simulation and trial needs to
happen.
PRIMARY_GUARDS_RETRY_INTERVAL
In order to make it more likely we connect to a primary guard,
we would like to retry the primary guards more often than other
types of guards. This parameter controls how many minutes should
pass before we consider retrying primary guards again. The
proposed value is 3.
SAMPLE_SET_THRESHOLD
In order to allow us to recognize completely unreachable network,
we would like to avoid connecting to too many guards before switching
modes. We also want to avoid exposing ourselves to too many nodes in a
potentially hostile situation. This parameter, expressed as a
fraction, determines the number of guards we should keep as the
sampled set of the only guards we will consider connecting
to. It will be used as a fraction for the sampled set.
If we assume there are 1900 guards, a setting of 0.02
means we will have a sample set of 38 guards.
This limits our total exposure. Proposed value is 0.02.
MINIMUM_FILTERED_SAMPLE_SIZE
The minimum size of the sampled set after filtering out nodes based on
client configuration (FILTERED_SAMPLED). Proposed value is ???.
MAXIMUM_SAMPLE_SIZE_THRESHOLD
In order to guarantee a minimum size of guards after filtering,
we expand SAMPLED_GUARDS until a limit. This fraction of GUARDS will be
used as an upper bound when expanding SAMPLED_GUARDS.
Proposed value is 0.03.
INTERNET_LIKELY_DOWN_INTERVAL
The number of minutes since we started trying to find an entry
guard before we should consider the network down and consider
retrying primary guards before using a functioning guard
found. Proposed value 5.
§4. Security properties and behavior under various conditions
Under normal conditions, this algorithm will allow us to quickly
connect and use guards we have used before with high likelihood of
working. Assuming the first primary guard is reachable and in the
consensus, this algorithm will deterministically always return that
guard.
Under dystopic conditions (when a firewall is in place that blocks
all ports except for potentially port 80 and 443), this algorithm
will try to connect to 2% of all guards before switching modes to try
dystopic guards. Currently, that means trying to connect to circa 40
guards before getting a successful connection. If we assume a
connection try will take maximum 10 seconds, that means it will take
up to 6 minutes to get a working connection.
When the network is completely down, we will try to connect to 2% of
all guards plus 2% of all dystopic guards before realizing we are
down. This means circa 50 guards tried assuming there are 1900 guards
in the network.
In terms of exposure, we will connect to a maximum of 2% of all
guards plus 2% of all dystopic guards, or 3% of all guards,
whichever is lower. If N is the number of guards, and k is the
number of guards an attacker controls, that means an attacker would
have a probability of 1-(1-(k/N)^2)^(N * 0.03) to have one of their
guards selected before we fall back. In real terms, this means an
attacker would need to control over 10% of all guards in order to
have a larger than 50% chance of controlling a guard for any given client.
In addition, since the sampled set changes slowly (the suggestion
here is that guards in it expire every month) it is not possible for
an attacker to force a connection to an entry guard that isn't
already in the users sampled set.
§A. Appendix: An example usage
In order to clarify how this algorithm is supposed to be used, this
pseudo code illustrates the building of a circuit:
ESTABLISH_CIRCUIT:
if chosen_entry_node = NULL
if context = NULL
context = ALGO_CHOOSE_ENTRY_GUARD_START(used_guards,
sampled_guards=[],
options,
n_primary_guards=3,
dir=false,
guards_in_consensus)
chosen_entry_node = ALGO_CHOOSE_ENTRY_GUARD_NEXT(context)
if not IS_SUITABLE(chosen_entry_node)
try another entry guard
circuit = composeCircuit(chosen_entry_node)
return circuit
ON_FIRST_HOP_CALLBACK(channel):
if !SHOULD_CONTINUE:
ALGO_CHOOSE_ENTRY_GUARD_END(entryGuard)
else
chosen_entry_node = NULL
§B. Appendix: Entry Points in Tor
In order to clarify how this algorithm is supposed to be integrated with
Tor, here are some entry points to trigger actions mentioned in spec:
When establish_circuit:
If *chosen_entry_node* doesn't exist
If *context* exist, populate the first one as *context*
Otherwise, use ALGO_CHOOSE_ENTRY_GUARD_START to initalize a new *context*.
After this when we want to choose_good_entry_server, we will use
ALGO_CHOOSE_ENTRY_GUARD_NEXT to get a candidate.
Use chosen_entry_node to build_circuit and handle_first_hop,
return this circuit
When entry_guard_register_connect_status(should_continue):
if !should_continue:
Call ALGO_CHOOSE_ENTRY_GUARD_END(chosen_entry_node)
else:
Set chosen_entry_node to NULL
When new directory_info_has_arrived:
Do ON_NEW_CONSENSUS
§C. Appendix: IS_SUITABLE helper function
A guard is suitable if it satisfies all of the folowing conditions:
- It's considered to be live, according to IS_LIVE.
- It's a directory cache if a directory guard is requested.
- It's not the chosen exit node.
- It's not in the family of the chosen exit node.
This conforms to the existing conditions in "populate_live_entry_guards()".
§D. Appendix: IS_LIVE helper function
A guard is considered live if it satisfies all of the folowing conditions:
- It's not disabled because of path bias issues (path_bias_disabled).
- It was not observed to become unusable according to the directory or
the user configuration (bad_since).
- It's marked for retry (can_retry) or it's been unreachable for some
time (unreachable_since) but enough time has passed since we last tried
to connect to it (entry_is_time_to_retry).
- It's in our node list, meaninig it's present in the latest consensus.
- It has a usable descriptor (either a routerdescriptor or a
microdescriptor) unless a directory guard is requested.
- It's a general-purpose router unless UseBridges is configured.
- It's reachable by the configuration (fascist_firewall_allows_node).
This conforms to the existing conditions in "entry_is_live()".
A guard is observed to become unusable according to the directory or the
user configuration if it satisfies any of the following conditions:
- It's not in our node list, meaninig it's present in the latest consensus.
- It's not currently running (is_running).
- It's not a bridge and not a configured bridge
(node_is_a_configured_bridge) and UseBridges is True.
- It's not a possible guard and is not in EntryNodes and UseBridges is
False.
- It's in ExcludeNodes. Nevertheless this is ignored when
loading from config.
- It's not reachable by the configuration (fascist_firewall_allows_node).
- It's disabled because of path bias issues (path_bias_disabled).
This conforms to the existing conditions in "entry_guards_compute_status()".
§E. Appendix: UseBridges and Bridges configurations
This is mutually exclusive with EntryNodes.
If options->UseBridges OR options->EntryNodes:
- guards = populate_live_entry_guards() - this is the "bridge flavour" of
IS_SUITABLE as mentioned before.
- return node_sl_choose_by_bandwidth(guards, WEIGHT_FOR_GUARD)
This is "choose a guard from S by bandwidth weight".
UseBridges and Bridges must be set together. Bridges go to bridge_list (via
bridge_add_from_config()), but how is it used?
learned_bridge_descriptor() adds the bridge to the global entry_guards if
UseBridges = True.
We either keep the existing global entry_guards OR incorporate bridges in the
proposal (remove non bridges from USED_GUARDS, and REMAINING_GUARDS = bridges?)
If UseBridges is set as true, we need to fill the SAMPLED_GUARDS
with bridges specified and learned from consensus.
§F. Appendix: EntryNodes configuration
This is mutually exclusive with Bridges.
The global entry_guards will be updated with entries in EntryNodes
(see entry_guards_set_from_config()).
If EntryNodes is set, we need to fill the SAMPLED_GUARDS with
EntryNodes specified in options.
§G. Appendix: IS_BAD helper function
A guard is considered bad if is not included in the newest
consensus.
§H. Appendix: IS_DEAD helper function
A guard is considered dead if it's marked as bad for
ENTRY_GUARD_REMOVE_AFTER period (30 days) unless they have been disabled
because of path bias issues (path_bias_disabled).
§I. Appendix: IS_OBSOLETE helper function
A guard is considered obsolete if it was chosen by an Tor
version we can't recognize or it was chosen more than GUARD_LIFETIME ago.
-*- coding: utf-8 -*-