summaryrefslogtreecommitdiffstats
path: root/platform/nodejs/README.md
blob: 0b3e3d8c074dc26858a8aa798d4520e24d12cbbc (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
# uBlock Origin Core

The core filtering engines used in the uBlock Origin ("uBO") extension, and has
no external dependencies.

## Installation

Install: `npm install @gorhill/ubo-core`

This is a very early version and the API is subject to change at any time.

This package uses [native JavaScript modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules).


## Description

The package contains uBO's static network filtering engine ("SNFE"), which
purpose is to parse and enforce filter lists. The matching algorithm is highly
efficient, and _especially_ optimized to match against large sets of pure
hostnames.

The SNFE can be fed filter lists from a variety of sources, such as [EasyList/EasyPrivacy](https://easylist.to/), 
[uBlock filters](https://github.com/uBlockOrigin/uAssets/tree/master/filters), 
and also lists of domain names or hosts file format (i.e. block lists from [The Block List Project](https://github.com/blocklistproject/Lists#the-block-list-project), 
[Steven Black's HOSTS](https://github.com/StevenBlack/hosts#readme), etc).


## Usage

At the moment, there can be only one instance of the static network filtering
engine ("SNFE"), which proxy API must be imported as follow:

```js
import { StaticNetFilteringEngine } from '@gorhill/ubo-core';
```

If you must import as a NodeJS module:

```js
const { StaticNetFilteringEngine } = await import('@gorhill/ubo-core');
```


Create an instance of SNFE:

```js
const snfe = StaticNetFilteringEngine.create();
```

Feed the SNFE with filter lists -- `useLists()` accepts an array of
objects (or promises to object) which expose the raw text of a list
through the `raw` property, and optionally the name of the list through the
`name` property (how you fetch the lists is up to you):

```js
await snfe.useLists([
    fetch('easylist').then(raw => ({ name: 'easylist', raw })),
    fetch('easyprivacy').then(raw => ({ name: 'easyprivacy', raw })),
]);
```

Now we are ready to match network requests:

```js
// Not blocked
if ( snfe.matchRequest({
    originURL: 'https://www.bloomberg.com/',
    url: 'https://www.bloomberg.com/tophat/assets/v2.6.1/that.css',
    type: 'stylesheet'
}) !== 0 ) {
    console.log(snfe.toLogData());
}

// Blocked
if ( snfe.matchRequest({
    originURL: 'https://www.bloomberg.com/',
    url: 'https://securepubads.g.doubleclick.net/tag/js/gpt.js',
    type: 'script'
}) !== 0 ) {
    console.log(snfe.toLogData());
}

// Unblocked
if ( snfe.matchRequest({
    originURL: 'https://www.bloomberg.com/',
    url: 'https://sourcepointcmp.bloomberg.com/ccpa.js',
    type: 'script'
}) !== 0 ) {
    console.log(snfe.toLogData());
}
```

It is possible to pre-parse filter lists and save the intermediate results for 
later use -- useful to speed up the loading of filter lists. This will be 
documented eventually, but if you feel adventurous, you can look at the code 
and use this capability now if you figure out the details.

---

## Extras

You can directly use specific APIs exposed by this package, here are some of 
them, which are used internally by uBO's SNFE.

### HNTrieContainer

A well optimised [compressed trie](https://en.wikipedia.org/wiki/Trie#Compressing_tries) 
container specialized to specifically store and lookup hostnames.

The matching algorithm is designed for hostnames, i.e. the hostname labels 
making up a hostname are matched from right to left, such that `www.example.org` 
with be a match if `example.org` is stored into the trie, while 
`anotherexample.org` won't be a match.

`HNTrieContainer` is designed to store a large number of hostnames with CPU and 
memory efficiency as a main concern -- and is a key component of uBO.

To create and use a standalone `HNTrieContainer` object:

```js
import HNTrieContainer from '@gorhill/ubo-core/js/hntrie.js';

const trieContainer = new HNTrieContainer();

const aTrie = trieContainer.createOne();
trieContainer.add(aTrie, 'example.org');
trieContainer.add(aTrie, 'example.com');

const anotherTrie = trieContainer.createOne();
trieContainer.add(anotherTrie, 'foo.invalid');
trieContainer.add(anotherTrie, 'bar.invalid');

// matches() return the position at which the match starts, or -1 when
// there is no match.

// Matches: return 4
console.log("trieContainer.matches(aTrie, 'www.example.org')", trieContainer.matches(aTrie, 'www.example.org'));

// Does not match: return -1
console.log("trieContainer.matches(aTrie, 'www.foo.invalid')", trieContainer.matches(aTrie, 'www.foo.invalid'));

// Does not match: return -1
console.log("trieContainer.matches(anotherTrie, 'www.example.org')", trieContainer.matches(anotherTrie, 'www.example.org'));

// Matches: return 0
console.log("trieContainer.matches(anotherTrie, 'foo.invalid')", trieContainer.matches(anotherTrie, 'foo.invalid'));
```

The `reset()` method must be used to remove all the tries from a trie container, 
you can't remove a single trie from the container.

```js
trieContainer.reset();
```

When you reset a trie container, you can't use the reference to prior instances 
of trie, i.e. `aTrie` and `anotherTrie` are no longer valid and shouldn't be 
used following a reset.