summaryrefslogtreecommitdiffstats
path: root/doc/DESIGN-AND-ROADMAP.md
diff options
context:
space:
mode:
Diffstat (limited to '')
-rw-r--r--doc/DESIGN-AND-ROADMAP.md276
1 files changed, 276 insertions, 0 deletions
diff --git a/doc/DESIGN-AND-ROADMAP.md b/doc/DESIGN-AND-ROADMAP.md
new file mode 100644
index 0000000..ca96989
--- /dev/null
+++ b/doc/DESIGN-AND-ROADMAP.md
@@ -0,0 +1,276 @@
+# PRIVACY BADGER DESIGN AND ROADMAP
+
+## DESIGN
+
+### OBJECTIVE
+
+Privacy Badger aims to
+
+ - Protect users against non-consensual tracking by third party domains as they
+ browse the Web.
+
+ - Send and enforce the Do Not Track signal to sites (especially "third party"
+ sites since they are in a position to collect a large fraction of the user's
+ browsing history).
+
+Privacy Badger consists of a primary tracker blocking algorithm, augmented by
+a number of secondary features that extend further privacy protection and
+reduce breakage from the primary mechanism.
+
+### PRIMARY MECHANISM
+
+Privacy Badger:
+
+1. Ensures your browser is sending the DNT: 1 header (in some regulatory
+ environments, it is advisable to note "installing Privacy Badger will enable
+ Do Not Track" on your installation page / app store entry.
+2. Observes which first party origins a given third party origin is setting cookies on
+ (certain cookies are deemed to be "low entropy", as discussed below).
+
+ 2a. Observes which first party origins a given third party is doing certain
+ types of fingerprinting on.
+
+ 2b. Observes which first party origins a given third party is setting certain types
+ of supercookies on.
+
+ 2c. Observes which first party origins a given third party is sending
+ certain parts of first party cookies back to itself using image query
+ strings (pixel cookie sharing).
+
+3. If a third party origin receives a cookie, a supercookie, an image pixel
+ containing first party cookie data, or makes JavaScript fingerprinting API
+ calls on 3 or more first party origins, this is deemed to be "cross site
+ tracking".
+4. Typically, cross site trackers are blocked completely; Privacy Badger
+ prevents the browser from communicating with them. The exception is if the
+ site is on Privacy Badger's "yellow list" (aka the "cookie block list"), in
+ which case resources from the site are loaded, but without access to their
+ (third party) cookies or local storage, and with the referer header either
+ trimmed down to the origin (for GET requests) or removed outright (all other
+ requests). The yellow list is routinely fetched from [an EFF URL](https://www.eff.org/files/cookieblocklist_new.txt)
+ to allow prompt fixes for breakage.
+
+ Until methods for blocking them have been implemented, domains that perform
+ fingerprinting or use third party supercookies should not be added to the
+ yellow list.
+5. Users can also choose custom rules for any given domain flagged by Privacy Badger,
+ overrulling any automatic decision Privacy Badger has made about the domain.
+ Privacy Badger uses three-state sliders (red → block, yellow → cookie block, green → allow) to convey this
+ state in UI. We believe this is less confusing than the UI in many other
+ blocking tools, which often leave the user confused about whether a visual
+ state represents current blocking or the opportunity to block.
+6. Domains can agree to EFF's [Do Not Track policy](https://eff.org/dnt-policy). If a domain does this
+ Privacy Badger will no longer block its traffic or cookies. If a
+ first-party domain posts the policy, this applies to all third parties
+ embedded on that domain.
+ Sites post the policy at [a well-known URL](https://example.com/.well-known/dnt-policy.txt)
+ on their domains. The contents must match those of a file from the list of
+ acceptable policies exactly; the policy file is [maintained on github](https://github.com/EFForg/dnt-policy/),
+ but Privacy Badger fetches a list of known-good hashes periodically [from EFF](https://www.eff.org/files/dnt-policies.json)
+ (version 1.0 of the policy file will be added to that list when Privacy Badger
+ reaches version 1.0)
+
+#### Further Details
+
+# :warning: THIS SECTION IS OUTDATED AND NEEDS TO BE REWRITTEN :warning:
+
+Data Structures:
+
+- action_map = { 'google.com': blocked, 'fonts.google.com': 'cookieblocked', 'apis.fonts.google.com': 'user_cookieblock', 'foo.tracker.net': 'allow', 'tracker.net': 'DNT', }
+- snitch_map = {google.com: array('cooperq.com', 'noah.com', 'eff.org'), tracker.net: array(a.com, b.com, c.com)}
+- dnt_domains = array('tracker.net', 'dnt.eff.org')
+- settings = {social_widgets = true, ...}
+- cookie_block_list = "{'fonts.google.com': true, 'maps.google.com', true}"
+
+
+On Request():
+
+ if privacy badger is not enabled for the tab domain then return
+ if fqdn is not a third party then return
+
+ action = check_action(fqdn) (described below)
+
+ if action is block then cancel request
+ if action is cookie_block then strip headers
+ if fqdn is nontracking (i.e check_action returned nothing) then do nothing
+ if action is noaction or any user override then async_check_tracking
+ if action is allow && count == 2 then blocking_check_tracking
+ if check_tracking changed action then call check_action again
+ else do_nothing
+
+ async_check_dnt(fqdn)
+
+check_action(fqdn): returns action
+
+ related_domains = array()
+ best_action = 'noaction'
+
+ for $domain in range(fqdn ... etld+1)
+ if action_map contains $domain
+ related_domains.shift($domain)
+
+ for each domain in related domains
+ if score(domain.action) > score(best_action)
+ best_action = domain.action
+
+ return best_action
+
+check_tracking(fqdn): return boolean
+
+ var base_domain = etld+1(fqdn)
+
+ if has_cookie or has_supercookie or has_fingerprinting
+ if snitch_map doesn't have base domain add it
+ if snitch_map doesn't have first party add it
+ if snitch_map.base_domain.len >= 3
+ add base domain to action map as blocked
+ add all chlidren of base_domain and self from yellow list to action map
+ return true
+
+##### What is an "origin" for Privacy Badger?
+
+Privacy Badger has two notions of origin. One is the [effective top level
+domain](https://wiki.mozilla.org/Public_Suffix_List) plus one level of
+subdomain (eTLD+1), computed using
+[getBaseDomain](https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIEffectiveTLDService)
+(which is built-in to Firefox; in Chrome we [ship a
+copy](https://github.com/EFForg/privacybadger/blob/8e8ad9838b74b6d13354163f78d362ca60dd44f9/src/lib/basedomain.js#L75).
+The accounting for which origins are trackers or not is performed by looking
+up how many first party fully qualified domain names (FQDNs) have been tracked by each
+of these eTLD + 1 origins. This is a conservative choice, which avoids the
+need to evaluate sets of cookies with different scopes.
+
+When the heuristic determines that the correct response is to block,
+that decision is applied to the third party eTLD+1 from which tracking was seen.
+
+Users are able to override Privacy Badger's decision for any given FQDN if they
+do not wish to block something that is otherwise blocked (or block something
+that is not blocked).
+
+To illustrate this, suppose the site <tt>tracking.co.uk</tt> was embedded on
+every site on the Web, but each embed came from a randomly selected subdomain
+<tt>a.tracking.co.uk</tt>, <tt>b.tracking.co.uk</tt>,
+<tt>c.tracking.co.uk</tt>, etc. Suppose the user visits
+<tt>www.news-example.com</tt> and <tt>search.jobs-example.info</tt>.
+
+The accounting data structure <tt>seenThirdParties</tt> would come to include:
+
+```
+{
+ ...
+ "tracking.co.uk" : {
+ "news-example.com" : true,
+ "jobs-example.info" : true,
+ }
+ ...
+}
+```
+
+Now suppose the user visits a third site, <tt>clickbait.nonprofit.org</tt>,
+and is tracked by <tt>q.tracking.co.uk</tt> on that site. The
+seenThirdParties data structure will have a third entry added to it, meeting
+the threshold of three first party origins and defining
+<tt>tracking.co.uk</tt> as a tracking eTLD+1. At this point
+<tt>tracking.co.uk</tt> will be added to the block list. Any future requests to
+<tt>tracking.co.uk</tt>, or any of its subdomains, will be blocked.
+The user can manually unblock specific subdomains as necessary via the popup menu.
+
+##### What is a "low entropy" cookie?
+
+Our [current cookie heuristic](https://github.com/EFForg/privacybadger/blob/8e8ad9838b74b6d13354163f78d362ca60dd44f9/src/js/heuristicblocking.js#L632) is to assign "number of identifying bits" estimates to
+some known common cookie values, and to bound the sum of these to 12.
+Predetermined low-entropy cookies will not be identified as tracking, nor will
+combinations of them so long as their total estimated entropy is under 12 bits.
+
+### ADDITIONAL MECHANISMS
+
+#### Widget Substitution
+
+Many social media widgets are inherently designed to combine tracking
+and occasionally-useful functionality in a single resource load.
+Privacy Badger aims to give the user access to the functionality when they want
+it, but protection against the tracking at all other times.
+
+To that end, Privacy Badger has incorporated code from the ShareMeNot project
+so that it is able to replace various types of widgets hosted
+by third party origins with local, static equivalents that either replace the
+original widget faithfully, or create a click-through step before the widget
+is loaded and tracks the user.
+
+The widget replacement table lives in the [socialwidgets.json file](https://github.com/EFForg/privacybadger/blob/8e8ad9838b74b6d13354163f78d362ca60dd44f9/src/data/socialwidgets.json).
+Widgets are replaced unless the user has chosen to specifically allow that third party
+domain (by moving the slider to 'green' in the UI), so users can selectively
+disable this functionality if they wish. The code for social media widgets is
+quite diverse, so not all variants (especially custom variants that sites build
+for themselves) are necessarily replaced.
+
+#### What are the states for domain responses?
+
+Currently domains have three states: no action, cookie block, and block. No
+action allows all requests to resolve as normal without intervention from
+Privacy Badger. Cookie block allows for requests to resolve normally but will
+block cookies from being read or created. Cookie block also trims or removes
+the referer header. Block will cause any requests from that origin to be
+blocked entirely; before even a TCP connection can be established. The user can
+toggle these options manually, which will supersede any determinations made
+automatically by Privacy Badger.
+
+#### What does EFFs Do Not Track policy stipulate?
+
+Currently the Do Not Track policy covers where the agreement will be hosted,
+how users who send the DNT header are treated, log retention, how information
+will be shared with other domains, notifications of disclosure, and possible exceptions.
+It can be read in full [here](https://www.eff.org/dnt-policy).
+
+#### How do sites agree to EFFs Do Not Track policy?
+
+Sites can agree to this policy by posting at https://subdomain.example.com/.well-known/dnt-policy.txt,
+where "subdomain" is any domain to which the policy applies, for a given third party.
+
+#### Fingerprinting detection
+Certain aspects of the browser, such as fonts, add-ons or extensions, screen size,
+and seen links, can be used to give the browser a fingerprint that is unique out
+of a very small amount of users (see [Panopticlick](https://panopticlick.eff.org/) for more information).
+
+As of Privacy Badger 1.0, any third party script that writes to an HTML5
+canvas object and then reads a sufficiently large amount back from the third
+party canvas object will be treated the same way as a third party cookie, blocking the
+third party origin if it does this across multiple first party origins. Our
+research has determined that this is a reliable way to distinguish between
+fingerprinting and other third party canvas uses.
+
+This may be augmented by hooks to detect extensive enumeration of properties
+in the <tt>navigator</tt> object in the future.
+
+#### Pixel cookie sharing detection
+
+Detection of first to third party cookie sharing via image pixels was added in [#2088](https://github.com/EFForg/privacybadger/issues/2088).
+
+### ROADMAP
+
+#### High priority issues
+
+Please see our ["high priority"-labeled issues](https://github.com/EFForg/privacybadger/issues?q=is%3Aissue+is%3Aopen+label%3A%22high+priority%22).
+
+## Technical Implementation
+
+### How are origins and the rules for them stored?
+
+When a browser with Privacy Badger enabled makes a request to a third party, if
+the request contains a cookie or the response tries to set a cookie it gets
+flagged as 'tracking'. Origins that make tracking requests get stored in a
+key→value store where the keys are the origins making the request, and the
+values are the first party origins these requests were made on. If that list of
+third parties contains three or more first party origins the third party origin
+gets added to another list of known trackers. When Privacy Badger gets a
+request from an origin on the known trackers list, if it is not on the yellow
+list then Privacy Badger blocks that request. If it is on the yellow list then
+the request is allowed to resolve, but all cookie setting and getting parts of
+it are blocked, while the referer header is trimmed or removed. Both of these
+lists are stored on disk, and persist between browser sessions.
+
+Additionally users can manually set the desired action for any FQDN.
+These get added to their own lists, which are also stored on disk, and get checked
+before Privacy Badger does its default action for a given origin. These are managed
+from the popup window for Privacy Badger on the page as well as the options menu
+for the whole extension.