# PRIVACY BADGER DESIGN AND ROADMAP

## DESIGN

### OBJECTIVE

Privacy Badger aims to

 - Protect users against non-consensual tracking by third party domains as they
   browse the Web.

 - Send and enforce the Do Not Track signal to sites (especially "third party"
   sites since they are in a position to collect a large fraction of the user's
   browsing history).

Privacy Badger consists of a primary tracker blocking algorithm, augmented by
a number of secondary features that extend further privacy protection and
reduce breakage from the primary mechanism.

### PRIMARY MECHANISM

Privacy Badger:

1. Ensures your browser is sending the DNT: 1 header (in some regulatory
   environments, it is advisable to note "installing Privacy Badger will enable
   Do Not Track" on your installation page / app store entry).
2. Observes which first party origins a given third party origin is setting cookies on
   (certain cookies are deemed to be "low entropy", as discussed below).

   2a. Observes which first party origins a given third party is doing certain
   types of fingerprinting on.

   2b. Observes which first party origins a given third party is setting certain types
   of supercookies on.

   2c. Observes which first party origins a given third party is sending
   certain parts of first party cookies back to itself using image query
   strings (pixel cookie sharing).

3. If a third party origin receives a cookie, a supercookie, an image pixel
   containing first party cookie data, or makes JavaScript fingerprinting API
   calls on 3 or more first party origins, this is deemed to be "cross site
   tracking".
4. Typically, cross site trackers are blocked completely; Privacy Badger
   prevents the browser from communicating with them. The exception is if the
   site is on Privacy Badger's "yellow list" (aka the "cookie block list"), in
   which case resources from the site are loaded, but without access to their
   (third party) cookies or local storage, and with the referer header either
   trimmed down to the origin (for GET requests) or removed outright (all other
   requests). The yellow list is routinely fetched from [an EFF URL](https://www.eff.org/files/cookieblocklist_new.txt)
   to allow prompt fixes for breakage.

   Until methods for blocking them have been implemented, domains that perform
   fingerprinting or use third party supercookies should not be added to the
   yellow list.
5. Users can also choose custom rules for any given domain flagged by Privacy Badger,
   overruling any automatic decision Privacy Badger has made about the domain.
   Privacy Badger uses three-state sliders (red → block, yellow → cookie block, green → allow) to convey this
   state in UI. We believe this is less confusing than the UI in many other
   blocking tools, which often leave the user confused about whether a visual
   state represents current blocking or the opportunity to block.
6. Domains can agree to EFF's [Do Not Track policy](https://eff.org/dnt-policy). If a domain does this,
   Privacy Badger will no longer block its traffic or cookies. If a
   first-party domain posts the policy, this applies to all third parties
   embedded on that domain.
   Sites post the policy at [a well-known URL](https://example.com/.well-known/dnt-policy.txt)
   on their domains. The contents must match those of a file from the list of
   acceptable policies exactly; the policy file is [maintained on GitHub](https://github.com/EFForg/dnt-policy/),
   but Privacy Badger periodically fetches a list of known-good hashes [from EFF](https://www.eff.org/files/dnt-policies.json)
   (version 1.0 of the policy file will be added to that list when Privacy Badger
   reaches version 1.0).
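
The referer handling described in step 4 for yellow-listed domains could be sketched roughly as follows. This is a minimal illustration, not Privacy Badger's actual code: the function name `trimReferer` and its `null` return convention are assumptions made for this example.

```javascript
// Hypothetical helper: returns the referer value to send to a yellow-listed
// third party, or null when the header should be removed.
function trimReferer(method, referer) {
  if (!referer) {
    return null; // nothing to trim
  }
  if (String(method).toUpperCase() !== "GET") {
    return null; // all non-GET requests: remove the header outright
  }
  try {
    // URL.origin keeps scheme, host and port, and drops path and query.
    return new URL(referer).origin + "/";
  } catch (e) {
    return null; // unparseable referer: dropping it is the safe choice
  }
}

console.log(trimReferer("GET", "https://www.news-example.com/story?id=42"));
console.log(trimReferer("POST", "https://www.news-example.com/story?id=42"));
```

Trimming to the origin still tells the third party which site embedded it, but not which page the user was reading.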

#### Further Details

# :warning: THIS SECTION IS OUTDATED AND NEEDS TO BE REWRITTEN :warning:

Data Structures:

- action_map = { 'google.com': 'blocked', 'fonts.google.com': 'cookieblocked', 'apis.fonts.google.com': 'user_cookieblock', 'foo.tracker.net': 'allow', 'tracker.net': 'DNT' }
- snitch_map = { 'google.com': array('cooperq.com', 'noah.com', 'eff.org'), 'tracker.net': array('a.com', 'b.com', 'c.com') }
- dnt_domains = array('tracker.net', 'dnt.eff.org')
- settings = { 'social_widgets': true, ... }
- cookie_block_list = "{'fonts.google.com': true, 'maps.google.com': true}"


On Request():

      if privacy badger is not enabled for the tab domain then return
      if fqdn is not a third party then return

      action = check_action(fqdn) (described below)

      if action is block then cancel request
      if action is cookie_block then strip headers
      if fqdn is nontracking (i.e. check_action returned nothing) then do nothing
      if action is noaction or any user override then async_check_tracking
      if action is allow && count == 2 then blocking_check_tracking
        if check_tracking changed action then call check_action again
        else do_nothing

      async_check_dnt(fqdn)

check_action(fqdn): returns action

      related_domains = array()
      best_action = 'noaction'

      for $domain in range(fqdn ... etld+1)
        if action_map contains $domain
          related_domains.push($domain)

      for each domain in related_domains
        if score(domain.action) > score(best_action)
          best_action = domain.action

      return best_action
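
As a rough JavaScript rendering of the `check_action` pseudocode above: the sketch walks from the FQDN up to its eTLD+1 and keeps the highest-scoring action found. The numeric scores in `SCORE`, the helper `candidateDomains`, and passing the base domain in as a parameter are all assumptions for this example; the document does not specify the real ordering.

```javascript
// Hypothetical scores: a higher score overrides a lower one.
const SCORE = {
  noaction: 0,
  allow: 1,
  blocked: 2,
  cookieblocked: 3,
  DNT: 4,
  user_cookieblock: 5,
};

// "a.b.tracker.net" with base "tracker.net" yields
// ["a.b.tracker.net", "b.tracker.net", "tracker.net"].
function candidateDomains(fqdn, baseDomain) {
  const out = [];
  let current = fqdn;
  while (current.length > baseDomain.length && current.endsWith("." + baseDomain)) {
    out.push(current);
    current = current.slice(current.indexOf(".") + 1); // drop the leftmost label
  }
  out.push(baseDomain);
  return out;
}

// actionMap is a Map from domain to action name.
function checkAction(actionMap, fqdn, baseDomain) {
  let best = "noaction";
  for (const domain of candidateDomains(fqdn, baseDomain)) {
    const action = actionMap.get(domain);
    if (action !== undefined && SCORE[action] > SCORE[best]) {
      best = action;
    }
  }
  return best;
}

const actionMap = new Map([
  ["google.com", "blocked"],
  ["fonts.google.com", "cookieblocked"],
  ["apis.fonts.google.com", "user_cookieblock"],
]);
console.log(checkAction(actionMap, "apis.fonts.google.com", "google.com"));
```

With these assumed scores, a user choice on a subdomain outranks the automatic decisions recorded for its parent domains.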

check_tracking(fqdn): return boolean

      var base_domain = etld+1(fqdn)

      if has_cookie or has_supercookie or has_fingerprinting
        if snitch_map doesn't have base_domain add it
        if snitch_map.base_domain doesn't have first party add it
        if snitch_map.base_domain.len >= 3
          add base_domain to action_map as blocked
          add all children of base_domain and self from yellow list to action_map
          return true

      return false

##### What is an "origin" for Privacy Badger?

Privacy Badger has two notions of origin.  One is the [effective top level
domain](https://wiki.mozilla.org/Public_Suffix_List) plus one level of
subdomain (eTLD+1), computed using
[getBaseDomain](https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIEffectiveTLDService)
(which is built into Firefox; in Chrome we [ship a
copy](https://github.com/EFForg/privacybadger/blob/8e8ad9838b74b6d13354163f78d362ca60dd44f9/src/lib/basedomain.js#L75)).
The accounting for which origins are trackers or not is performed by looking
up how many first party fully qualified domain names (FQDNs) have been tracked by each
of these eTLD + 1 origins.  This is a conservative choice, which avoids the
need to evaluate sets of cookies with different scopes.
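
To make the eTLD+1 idea concrete, here is a toy sketch. It is not the `getBaseDomain` implementation: `TOY_SUFFIXES` is a tiny hypothetical stand-in for the Public Suffix List, and the sketch ignores the wildcard and exception rules that the real list contains.

```javascript
// Hypothetical, hand-picked suffixes; the real Public Suffix List is far larger.
const TOY_SUFFIXES = new Set(["com", "org", "info", "uk", "co.uk"]);

// Returns the eTLD+1 of fqdn, or null if fqdn is itself a suffix
// or ends in a suffix this toy list does not know.
function toyBaseDomain(fqdn) {
  const labels = fqdn.toLowerCase().split(".");
  // Scanning from the left means the first match is the longest suffix.
  for (let i = 0; i < labels.length; i++) {
    const suffix = labels.slice(i).join(".");
    if (TOY_SUFFIXES.has(suffix)) {
      // eTLD+1 is the suffix plus one label to its left, if there is one.
      return i === 0 ? null : labels.slice(i - 1).join(".");
    }
  }
  return null;
}

console.log(toyBaseDomain("a.tracking.co.uk"));
console.log(toyBaseDomain("www.news-example.com"));
```

Matching the longest suffix first is what keeps `co.uk` from being mistaken for a registrable domain under `uk`.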

When the heuristic determines that the correct response is to block,
that decision is applied to the third party eTLD+1 from which tracking was seen.

Users are able to override Privacy Badger's decision for any given FQDN if they
do not wish to block something that is otherwise blocked (or block something
that is not blocked).

To illustrate this, suppose the site <tt>tracking.co.uk</tt> was embedded on
every site on the Web, but each embed came from a randomly selected subdomain
<tt>a.tracking.co.uk</tt>, <tt>b.tracking.co.uk</tt>,
<tt>c.tracking.co.uk</tt>, etc.  Suppose the user visits
<tt>www.news-example.com</tt> and <tt>search.jobs-example.info</tt>.

The accounting data structure <tt>seenThirdParties</tt> would come to include:

```
{
  ...
  "tracking.co.uk" : {
    "news-example.com"  : true,
    "jobs-example.info" : true,
  }
  ...
}
```

Now suppose the user visits a third site, <tt>clickbait.nonprofit.org</tt>,
and is tracked by <tt>q.tracking.co.uk</tt> on that site.  The
seenThirdParties data structure will have a third entry added to it, meeting
the threshold of three first party origins and defining
<tt>tracking.co.uk</tt> as a tracking eTLD+1.  At this point
<tt>tracking.co.uk</tt> will be added to the block list. Any future requests to
<tt>tracking.co.uk</tt>, or any of its subdomains, will be blocked.
The user can manually unblock specific subdomains as necessary via the popup menu.
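
The accounting in this example can be sketched as below. The class and method names are hypothetical, and the sketch assumes base domains (eTLD+1) have already been computed for both the third party and the first party.

```javascript
const TRACKING_THRESHOLD = 3; // first party origins, as described above

class SeenThirdParties {
  constructor() {
    this.seen = new Map();    // third party eTLD+1 -> Set of first party origins
    this.blocked = new Set(); // third party eTLD+1s that met the threshold
  }

  // Records one tracking observation. Returns true only when this
  // observation is the one that gets the third party blocked.
  recordTracking(thirdPartyBase, firstPartyBase) {
    if (!this.seen.has(thirdPartyBase)) {
      this.seen.set(thirdPartyBase, new Set());
    }
    const firstParties = this.seen.get(thirdPartyBase);
    firstParties.add(firstPartyBase); // a Set ignores repeat visits
    if (firstParties.size >= TRACKING_THRESHOLD && !this.blocked.has(thirdPartyBase)) {
      this.blocked.add(thirdPartyBase);
      return true;
    }
    return false;
  }

  isBlocked(thirdPartyBase) {
    return this.blocked.has(thirdPartyBase);
  }
}

const seenThirdParties = new SeenThirdParties();
seenThirdParties.recordTracking("tracking.co.uk", "news-example.com");
seenThirdParties.recordTracking("tracking.co.uk", "jobs-example.info");
seenThirdParties.recordTracking("tracking.co.uk", "nonprofit.org");
console.log(seenThirdParties.isBlocked("tracking.co.uk"));
```

Because the key is the third party's eTLD+1, rotating through subdomains such as <tt>a.tracking.co.uk</tt> and <tt>q.tracking.co.uk</tt> does not reset the count.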

##### What is a "low entropy" cookie?

Our [current cookie heuristic](https://github.com/EFForg/privacybadger/blob/8e8ad9838b74b6d13354163f78d362ca60dd44f9/src/js/heuristicblocking.js#L632) is to assign "number of identifying bits" estimates to
some known common cookie values, and to bound the sum of these to 12.
Predetermined low-entropy cookies will not be identified as tracking, nor will
combinations of them so long as their total estimated entropy is under 12 bits.
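
A minimal sketch of this kind of bounded-entropy check follows. The table entries and their bit estimates are invented for illustration and are not the values used in `heuristicblocking.js`. Treating any value outside the table as identifying, and treating a total of exactly 12 bits as tracking, are also assumptions of this sketch.

```javascript
const MAX_COOKIE_ENTROPY_BITS = 12;

// Hypothetical estimates of identifying bits for common cookie values.
const LOW_ENTROPY_VALUES = new Map([
  ["", 3],
  ["0", 4],
  ["1", 4],
  ["true", 5],
  ["false", 5],
  ["en-US", 6],
]);

// Returns true if the cookie values could plausibly identify a user.
function looksLikeTracking(cookieValues) {
  let totalBits = 0;
  for (const value of cookieValues) {
    if (!LOW_ENTROPY_VALUES.has(value)) {
      return true; // unknown value: assumed to be potentially identifying
    }
    totalBits += LOW_ENTROPY_VALUES.get(value);
    if (totalBits >= MAX_COOKIE_ENTROPY_BITS) {
      return true; // many low-entropy cookies together can still identify
    }
  }
  return false;
}

console.log(looksLikeTracking(["1", "true"]));
console.log(looksLikeTracking(["a8f3e9c2d17b"]));
```

Summing the estimates guards against a tracker splitting an identifier across several individually harmless-looking cookies.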

### ADDITIONAL MECHANISMS

#### Widget Substitution

Many social media widgets are inherently designed to combine tracking
and occasionally-useful functionality in a single resource load.
Privacy Badger aims to give the user access to the functionality when they want
it, but protection against the tracking at all other times.

To that end, Privacy Badger has incorporated code from the ShareMeNot project
so that it is able to replace various types of widgets hosted
by third party origins with local, static equivalents that either replace the
original widget faithfully, or create a click-through step before the widget
is loaded and tracks the user.

The widget replacement table lives in the [socialwidgets.json file](https://github.com/EFForg/privacybadger/blob/8e8ad9838b74b6d13354163f78d362ca60dd44f9/src/data/socialwidgets.json).
Widgets are replaced unless the user has chosen to specifically allow that third party
domain (by moving the slider to 'green' in the UI), so users can selectively
disable this functionality if they wish. The code for social media widgets is
quite diverse, so not all variants (especially custom variants that sites build
for themselves) are necessarily replaced.

#### What are the states for domain responses?

Currently domains have three states: no action, cookie block, and block. No
action allows all requests to resolve as normal without intervention from
Privacy Badger. Cookie block allows for requests to resolve normally but will
block cookies from being read or created. Cookie block also trims or removes
the referer header. Block will cause any requests to that origin to be
blocked entirely, before even a TCP connection can be established. The user can
toggle these options manually, which will supersede any determinations made
automatically by Privacy Badger.

#### What does EFF's Do Not Track policy stipulate?

Currently the Do Not Track policy covers where the agreement will be hosted,
how users who send the DNT header are treated, log retention, how information
will be shared with other domains, notifications of disclosure, and possible exceptions.
It can be read in full [here](https://www.eff.org/dnt-policy).

#### How do sites agree to EFF's Do Not Track policy?

Sites can agree to this policy by posting it at https://subdomain.example.com/.well-known/dnt-policy.txt,
where "subdomain" is any domain to which the policy applies, for a given third party.
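
A check of a posted policy against a list of known-good hashes might look roughly like this Node.js sketch. The function names are hypothetical, and the choice of SHA-256 is an assumption: this document does not state which hash algorithm the EFF list uses.

```javascript
const crypto = require("crypto");

// Builds the well-known policy URL for a host that wants to opt in.
function dntPolicyUrl(hostname) {
  return "https://" + hostname + "/.well-known/dnt-policy.txt";
}

// Hex digest of the policy text. The default algorithm is an assumption.
function policyHash(policyText, algorithm = "sha256") {
  return crypto.createHash(algorithm).update(policyText, "utf8").digest("hex");
}

// knownGoodHashes is a Set of hex digests of the acceptable policy files.
// The text must match exactly: any edit changes the digest.
function isAcceptedPolicy(policyText, knownGoodHashes, algorithm = "sha256") {
  return knownGoodHashes.has(policyHash(policyText, algorithm));
}

const knownGood = new Set([policyHash("example policy text\n")]);
console.log(dntPolicyUrl("subdomain.example.com"));
console.log(isAcceptedPolicy("example policy text\n", knownGood));
```

Comparing hashes rather than text lets the extension verify an exact match without shipping every acceptable policy version.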

#### Fingerprinting detection

Certain aspects of the browser, such as fonts, add-ons or extensions, screen size,
and seen links, can be used to give the browser a fingerprint that is unique, or shared
with only a very small number of other users (see [Panopticlick](https://panopticlick.eff.org/) for more information).

As of Privacy Badger 1.0, any third party script that writes to an HTML5
canvas object and then reads a sufficiently large amount back from the third
party canvas object will be treated the same way as a third party cookie, blocking the
third party origin if it does this across multiple first party origins. Our
research has determined that this is a reliable way to distinguish between
fingerprinting and other third party canvas uses.
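
The write-then-read pattern can be sketched as a small state tracker. This is an illustration only: the class name, the unit of `amountRead`, and the threshold value are hypothetical, and the sketch does not show how canvas calls are actually intercepted in the page.

```javascript
class CanvasReadbackDetector {
  constructor(minReadSize) {
    this.minReadSize = minReadSize;  // hypothetical "sufficiently large" threshold
    this.wroteToCanvas = new Set();  // script origins that have drawn to a canvas
  }

  // Call when a third party script draws to a canvas.
  recordWrite(scriptOrigin) {
    this.wroteToCanvas.add(scriptOrigin);
  }

  // Call when a script reads canvas data back. Returns true when the read
  // should count as a fingerprinting signal for that origin.
  recordRead(scriptOrigin, amountRead) {
    return this.wroteToCanvas.has(scriptOrigin) && amountRead >= this.minReadSize;
  }
}

const detector = new CanvasReadbackDetector(1024);
detector.recordWrite("tracking.co.uk");
console.log(detector.recordRead("tracking.co.uk", 4096));
```

A positive result would then feed into the same three-site accounting used for third party cookies.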

This may be augmented by hooks to detect extensive enumeration of properties
in the <tt>navigator</tt> object in the future.

#### Pixel cookie sharing detection

Detection of first to third party cookie sharing via image pixels was added in [#2088](https://github.com/EFForg/privacybadger/issues/2088).
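
One simplified way to picture this detection is the sketch below. It is not the implementation from #2088. The minimum length `MIN_SHARED_LENGTH` is an invented threshold, and the sketch only looks for whole cookie values inside query parameters, whereas the real heuristic concerns "certain parts" of first party cookies.

```javascript
const MIN_SHARED_LENGTH = 8; // hypothetical: ignore short, non-identifying values

// Returns true if any sufficiently long first party cookie value appears
// inside a query parameter of the third party image URL.
function sharesFirstPartyCookie(imageUrl, firstPartyCookieValues) {
  // searchParams yields already-decoded parameter values.
  const params = Array.from(new URL(imageUrl).searchParams.values());
  return firstPartyCookieValues.some(
    (cookieValue) =>
      cookieValue.length >= MIN_SHARED_LENGTH &&
      params.some((param) => param.includes(cookieValue))
  );
}

console.log(
  sharesFirstPartyCookie(
    "https://tracking.co.uk/pixel.gif?uid=GA1.2.1234567890.1600000000&r=5",
    ["GA1.2.1234567890.1600000000"]
  )
);
```

The length floor avoids flagging coincidental matches on short values such as language codes or boolean flags.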

### ROADMAP

#### High priority issues

Please see our ["high priority"-labeled issues](https://github.com/EFForg/privacybadger/issues?q=is%3Aissue+is%3Aopen+label%3A%22high+priority%22).

## Technical Implementation

### How are origins and the rules for them stored?

When a browser with Privacy Badger enabled makes a request to a third party, if
the request contains a cookie or the response tries to set a cookie it gets
flagged as 'tracking'. Origins that make tracking requests get stored in a
key→value store where the keys are the third party origins making the requests, and the
values are the lists of first party origins these requests were made on. If the list for
a third party contains three or more first party origins, the third party origin
gets added to another list of known trackers. When Privacy Badger sees a
request to an origin on the known trackers list, if it is not on the yellow
list then Privacy Badger blocks that request. If it is on the yellow list then
the request is allowed to resolve, but all cookie setting and getting parts of
it are blocked, while the referer header is trimmed or removed. Both of these
lists are stored on disk, and persist between browser sessions.

Additionally, users can manually set the desired action for any FQDN.
These get added to their own lists, which are also stored on disk, and get checked
before Privacy Badger does its default action for a given origin. These are managed
from the popup window for Privacy Badger on the page as well as the options menu
for the whole extension.
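
The lookup order described here, with user choices consulted before Privacy Badger's own decision, can be sketched as follows. The store names and action strings are hypothetical.

```javascript
// userActions holds manual choices keyed by FQDN; heuristicActions holds
// Privacy Badger's automatic decisions. Both are Maps in this sketch.
function effectiveAction(fqdn, userActions, heuristicActions) {
  if (userActions.has(fqdn)) {
    return userActions.get(fqdn); // the user's choice always wins
  }
  return heuristicActions.get(fqdn) || "noaction";
}

const userActions = new Map([["cdn.tracking.co.uk", "allow"]]);
const heuristicActions = new Map([
  ["cdn.tracking.co.uk", "block"],
  ["ads.tracking.co.uk", "block"],
]);
console.log(effectiveAction("cdn.tracking.co.uk", userActions, heuristicActions));
console.log(effectiveAction("ads.tracking.co.uk", userActions, heuristicActions));
```

Keeping the two stores separate means a user override can be removed later without losing what the heuristic had learned.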