Filename: 218-usage-controller-events.txt
Title: Controller events to better understand connection/circuit usage
Author: Rob Jansen, Karsten Loesing
Created: 2013-02-06
Status: Closed
Implemented-In: 0.2.5.2-alpha

1. Overview

  This proposal defines three new controller events that shall help
  understand connection and circuit usage.  These events are designed
  to be emitted in private Tor networks only.  This proposal also
  defines a tweak to an existing event for the same purpose.

2. Motivation

  We need to better understand connection and circuit usage in order to
  better simulate Tor networks.  Existing controller events are a fine
  start, but we need more detailed information about per-connection
  bandwidth, processed cells by circuit, and token bucket refills.  This
  proposal defines controller events containing the desired information.

  Most of these usage data are too sensitive to be captured in the
  public network, unless aggregated sufficiently.  That is why we're
  focusing on private Tor networks first, that is, relays that have
  TestingTorNetwork set.  The new controller events described in this
  proposal shall all be restricted to private Tor networks.  In the next
  step we might define aggregate statistics to be gathered by public
  relays, but that will require a new proposal.

3. Design

  The proposed new event types use Tor's asynchronous event mechanism
  where a controller registers for events by type and processes events
  received from the Tor process.

  Tor controllers can register for any of the new event types, but
  events will only be emitted if the Tor process is running in
  TestingTorNetwork mode.

4. Security implications

  There should be no security implications from the new event types,
  because they are only emitted in private Tor networks.

5. Specification

5.1. ConnID Token

  Addition for section 2.4 of the control-spec (General-use tokens).

  ; Unique identifiers for connections or queues.  Only included in
  ; TestingTorNetwork mode.

  ConnID = 1*16 IDChar
  QueueID = 1*16 IDChar

5.2. Adding an ID field to ORCONN events

  The new syntax for ORCONN events is:

    "650" SP "ORCONN" SP (LongName / Target) SP ORStatus
             [ SP "ID=" ConnID ] [ SP "REASON=" Reason ]
             [ SP "NCIRCS=" NumCircuits ] CRLF

  The remaining specification of that event type stays unchanged.

5.3. Bandwidth used on an OR or DIR or EXIT connection

  The syntax is:
     "650" SP "CONN_BW" SP "ID=" ConnID SP "TYPE=" ConnType
              SP "READ=" BytesRead SP "WRITTEN=" BytesWritten CRLF
     ConnType = "OR" / "DIR" / "EXIT"
     BytesRead = 1*DIGIT
     BytesWritten = 1*DIGIT

  Controllers MUST tolerate unrecognized connection types.

  BytesWritten and BytesRead are the number of bytes written and read
  by Tor since the last CONN_BW event on this connection.

  These events are generated about once per second per connection; no
  events are generated for connections that have not read or written.
  These events are only generated if TestingTorNetwork is set.

5.4. Bandwidth used by all streams attached to a circuit

  The syntax is:
     "650" SP "CIRC_BW" SP "ID=" CircuitID SP "READ=" BytesRead SP
              "WRITTEN=" BytesWritten CRLF
     BytesRead = 1*DIGIT
     BytesWritten = 1*DIGIT

  BytesRead and BytesWritten are the number of bytes read and written by
  all applications with streams attached to this circuit since the last
  CIRC_BW event.

  These events are generated about once per second per circuit; no events
  are generated for circuits that had no attached stream writing or
  reading.

5.5. Per-circuit cell stats

  The syntax is:
     "650" SP "CELL_STATS"
              [ SP "ID=" CircuitID ]
              [ SP "InboundQueue=" QueueID SP "InboundConn=" ConnID ]
              [ SP "InboundAdded=" CellsByType ]
              [ SP "InboundRemoved=" CellsByType SP
                   "InboundTime=" MsecByType ]
              [ SP "OutboundQueue=" QueueID SP "OutboundConn=" ConnID ]
              [ SP "OutboundAdded=" CellsByType ]
              [ SP "OutboundRemoved=" CellsByType SP
                   "OutboundTime=" MsecByType ] CRLF
     CellsByType, MsecByType = CellType ":" 1*DIGIT
                               0*( "," CellType ":" 1*DIGIT )
     CellType = 1*( "a" - "z" / "0" - "9" / "_" )

  Examples are:
     650 CELL_STATS ID=14 OutboundQueue=19403 OutboundConn=15
         OutboundAdded=create_fast:1,relay_early:2
         OutboundRemoved=create_fast:1,relay_early:2
         OutboundTime=create_fast:0,relay_early:0
     650 CELL_STATS InboundQueue=19403 InboundConn=32
         InboundAdded=relay:1,created_fast:1
         InboundRemoved=relay:1,created_fast:1
         InboundTime=relay:0,created_fast:0
         OutboundQueue=6710 OutboundConn=18
         OutboundAdded=create:1,relay_early:1
         OutboundRemoved=create:1,relay_early:1
         OutboundTime=create:0,relay_early:0

  ID is the locally unique circuit identifier that is only included if the
  circuit originates at this node.

  Inbound and outbound refer to the direction of cell flow through the
  circuit which is either to origin (inbound) or from origin (outbound).

  InboundQueue and OutboundQueue are identifiers of the inbound and
  outbound circuit queues of this circuit.  These identifiers are only
  unique per OR connection.  OutboundQueue is chosen by this node and
  matches InboundQueue of the next node in the circuit.

  InboundConn and OutboundConn are locally unique IDs of inbound and
  outbound OR connection.  OutboundConn does not necessarily match
  InboundConn of the next node in the circuit.

  InboundQueue and InboundConn are not present if the circuit originates
  at this node.  OutboundQueue and OutboundConn are not present if the
  circuit (currently) ends at this node.

  InboundAdded and OutboundAdded are total number of cells by cell type
  added to inbound and outbound queues.  Only present if at least one cell
  was added to a queue.

  InboundRemoved and OutboundRemoved are total number of cells by
  cell type processed from inbound and outbound queues.  InboundTime and
  OutboundTime are total waiting times in milliseconds of all processed
  cells by cell type.  Only present if at least one cell was removed from
  a queue.

  These events are generated about once per second per circuit; no
  events are generated for circuits that have not added or processed any
  cell.  These events are only generated if TestingTorNetwork is set.

5.6. Token buckets refilled

  The syntax is:
     "650" SP "TB_EMPTY" SP BucketName [ SP "ID=" ConnID ] SP
              "READ=" ReadBucketEmpty SP "WRITTEN=" WriteBucketEmpty SP
              "LAST=" LastRefill CRLF

     BucketName = "GLOBAL" / "RELAY" / "ORCONN"
     ReadBucketEmpty = 1*DIGIT
     WriteBucketEmpty = 1*DIGIT
     LastRefill = 1*DIGIT

  Examples are:
     650 TB_EMPTY ORCONN ID=16 READ=0 WRITTEN=0 LAST=100
     650 TB_EMPTY GLOBAL READ=93 WRITTEN=93 LAST=100
     650 TB_EMPTY RELAY READ=93 WRITTEN=93 LAST=100

  This event is generated when refilling a previously empty token
  bucket.  BucketNames "GLOBAL" and "RELAY" keywords are used for the
  global or relay token buckets, BucketName "ORCONN" is used for the
  token buckets of an OR connection.  Controllers MUST tolerate
  unrecognized bucket names.

  ConnID is only included if the BucketName is "ORCONN".

  If both global and relay buckets and/or the buckets of one or more OR
  connections run out of tokens at the same time, multiple separate
  events are generated.

  ReadBucketEmpty (WriteBucketEmpty) is the time in millis that the read
  (write) bucket was empty since the last refill.  LastRefill is the
  time in millis since the last refill.

  If a bucket went negative and if refilling tokens didn't make it go
  positive again, there will be multiple consecutive TB_EMPTY events for
  each refill interval during which the bucket contained zero tokens or
  less.  In such a case, ReadBucketEmpty or WriteBucketEmpty are capped
  at LastRefill in order not to report empty times more than once.

  These events are only generated if TestingTorNetwork is set.

6. Compatibility

  There should not be any compatibility issues with other Tor versions.

7. Implementation

  Most of the implementation should be straight-forward.

8. Performance and scalability notes

  Most of the new code won't be executed in normal Tor mode.  Wherever
  we needed new fields in existing structs, we tried hard to keep them
  as small as possible.  Still, we should make sure that memory
  requirements won't grow significantly on busy relays.