# URL parsers

```{warning}
To ensure thread safety, URI objects must be immutable through every interface they expose.
If you are implementing a new URI type, make sure that none of the type's public methods change the URL.
```

## Definitions
- URI - Uniform Resource Identifier
- URL - Uniform Resource Locator

These two terms are used interchangeably throughout the codebase and essentially mean the same thing: a string of characters that identifies a specific resource.

## Motivation

While we could simply pass strings around and leave it to the final consumer to parse them, that would burden every consumer and would probably be inefficient, since the same string might be parsed several times. Instead we parse the string into an `nsIURI` object as soon as possible and pass that object through function calls. This allows the consumer to easily extract only the part of the string they are interested in (e.g. the hostname or the path).
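
As a rough illustration of the idea, the sketch below uses Python's standard `urllib.parse` as a stand-in for creating an `nsIURI`; it is not the Gecko parser, and the example URL is made up. The string is parsed once, and consumers then read only the components they need.

```python
from urllib.parse import urlsplit

# Parse once, near the point where the string enters the system.
# (urlsplit stands in for creating an nsIURI; it is not the Gecko parser.)
uri = urlsplit("https://user@www.example.com:8443/docs/index.html?q=1#top")

# Consumers read individual components instead of re-parsing the string.
host = uri.hostname  # "www.example.com"
port = uri.port      # 8443
path = uri.path      # "/docs/index.html"
```

Passing `uri` rather than the raw string means no later caller has to repeat the parsing step or agree on how to split the string.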

## Interfaces
- [nsIURI](https://searchfox.org/mozilla-central/source/netwerk/base/nsIURI.idl)
    - This is the most important interface for URI parsing. It contains a series of readonly attributes that consumers can use to extract information from the URI.
- [nsIURL](https://searchfox.org/mozilla-central/source/netwerk/base/nsIURL.idl)
    - Defines a structure for the URI's path (directory, fileName, fileBaseName, fileExtension)
- [nsIFileURL](https://searchfox.org/mozilla-central/source/netwerk/base/nsIFileURL.idl)
    - Has a file attribute of type `nsIFile`
    - Used for local protocols to access the file represented by the `nsIURI`
- [nsIMozIconURI](https://searchfox.org/mozilla-central/source/image/nsIIconURI.idl)
    - Used to represent an icon. Contains additional attributes such as the size and contentType or state of the URL.
- [nsIJARURI](https://searchfox.org/mozilla-central/source/modules/libjar/nsIJARURI.idl)
    - Used to represent a resource inside of a JAR (zip archive) file.
    - For example `jar:http://www.example.com/blue.jar!/ocean.html` represents the `/ocean.html` resource located inside the `blue.jar` archive that can be fetched via HTTP from example.com.
- [nsIStandardURL](https://searchfox.org/mozilla-central/source/netwerk/base/nsIStandardURL.idl)
    - Defines a few constant flags used to determine the type of the URL. No other attributes.
- [nsINestedURI](https://searchfox.org/mozilla-central/source/netwerk/base/nsINestedURI.idl)
    - Defines `innerURI` and `innermostURI`.
    - `innermostURI` is just a helper - one could also get it by going through `innerURI` repeatedly until the attribute no longer QIs to nsINestedURI.
- [nsISensitiveInfoHiddenURI](https://searchfox.org/mozilla-central/source/netwerk/base/nsISensitiveInfoHiddenURI.idl)
    - Objects that implement this interface will have a `getSensitiveInfoHiddenSpec()` method that returns the spec of the URI with sensitive info (such as the password) replaced by the `*` symbol.
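
The relationship between `innerURI` and `innermostURI` described above can be sketched as a simple loop. The `NestedSketchURI` class and `innermost_uri` function below are hypothetical names used only for illustration; in Gecko the check is a QueryInterface to `nsINestedURI`, which this sketch approximates by testing whether an inner URI is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NestedSketchURI:
    """Hypothetical stand-in for a URI object; not a Gecko class."""
    spec: str
    # Set only for nested URIs; plays the role of nsINestedURI.innerURI.
    inner_uri: Optional["NestedSketchURI"] = None


def innermost_uri(uri: NestedSketchURI) -> NestedSketchURI:
    """Follow inner_uri until reaching a URI that is not nested."""
    while uri.inner_uri is not None:
        uri = uri.inner_uri
    return uri


http = NestedSketchURI("http://example.com/path")
view_source = NestedSketchURI("view-source:http://example.com/path", inner_uri=http)

# Both calls end at the plain http URI.
assert innermost_uri(view_source) is http
assert innermost_uri(http) is http
```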

### Diagram of interfaces
```{mermaid}
classDiagram
nsISupports <-- nsIURI
nsIURI <-- nsIURL
nsIURL <-- nsIFileURL
nsIURI <-- nsIMozIconURI
nsIURL <-- nsIJARURI
nsISupports <-- nsIStandardURL
nsISupports <-- nsINestedURI
nsISupports <-- nsISensitiveInfoHiddenURI
```

### Mutation

To ensure thread safety all implementations of nsIURI must be immutable.
To change a URI the consumer must call `nsIURI.mutate()`, which returns an `nsIURIMutator`. The `nsIURIMutator` has several setter methods that can be used to change attributes on the concrete object. Once done changing the object, the consumer calls `nsIURIMutator.finalize()` to obtain an immutable `nsIURI`.

- [nsIURIMutator](https://searchfox.org/mozilla-central/source/netwerk/base/nsIURIMutator.idl)
    - This interface contains a series of setters that can be used to mutate and/or construct an `nsIURI`
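
The mutate, set, finalize flow can be sketched in a few lines. This is a minimal model of the pattern, not Gecko code: `SketchURI`, `SketchMutator`, and the `set_host` and `set_path` methods are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchURI:
    """Immutable value; hypothetical stand-in for an nsIURI implementation."""
    scheme: str
    host: str
    path: str

    @property
    def spec(self) -> str:
        return f"{self.scheme}://{self.host}{self.path}"

    def mutate(self) -> "SketchMutator":
        # Hand out a separate mutator; the URI itself is left untouched.
        return SketchMutator(self)


class SketchMutator:
    """Collects changes, then hands back a new immutable SketchURI."""

    def __init__(self, uri: SketchURI) -> None:
        self._uri = uri

    def set_host(self, host: str) -> "SketchMutator":
        # replace() builds a new frozen object instead of editing in place.
        self._uri = replace(self._uri, host=host)
        return self  # returning self allows setters to be chained

    def set_path(self, path: str) -> "SketchMutator":
        self._uri = replace(self._uri, path=path)
        return self

    def finalize(self) -> SketchURI:
        return self._uri


original = SketchURI("https", "example.com", "/a")
changed = original.mutate().set_host("example.org").set_path("/b").finalize()
```

Because `original` is never modified, other threads holding a reference to it keep seeing a consistent value while `changed` is being built.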


### Additional interfaces

- [nsISerializable](https://searchfox.org/mozilla-central/source/xpcom/ds/nsISerializable.idl)
    - Allows us to serialize and deserialize URL objects into strings for persistent storage (such as session restore).

## Implementations
- [nsStandardURL](https://searchfox.org/mozilla-central/source/netwerk/base/nsStandardURL.h)
- [SubstitutingURL](https://searchfox.org/mozilla-central/source/netwerk/protocol/res/SubstitutingURL.h)
    - Overrides `nsStandardURL::GetFile` to provide `nsIFile` resolution.
    - This allows us to map URLs such as `resource://gre/actors/RemotePageChild.jsm` to the actual file on the disk.
- [nsMozIconURI](https://searchfox.org/mozilla-central/source/image/decoders/icon/nsIconURI.h)
    - Used to represent icon URLs
- [nsSimpleURI](https://searchfox.org/mozilla-central/source/netwerk/base/nsSimpleURI.h)
    - Used for simple URIs that normally don't have an authority (username, password, host, port)
- [nsSimpleNestedURI](https://searchfox.org/mozilla-central/source/netwerk/base/nsSimpleNestedURI.h)
    - e.g. `view-source:http://example.com/path`
    - Normally only the extra scheme of the nested URI is relevant (e.g. `view-source:`)
    - Most of the getters and setters are delegated to the `innerURI`
- [nsNestedAboutURI](https://searchfox.org/mozilla-central/source/netwerk/protocol/about/nsAboutProtocolHandler.h)
    - Similar to nsSimpleNestedURI, but has an extra `mBaseURI` member that allows us to propagate the base URI to about:blank correctly
- [BlobURL](https://searchfox.org/mozilla-central/source/dom/file/uri/BlobURL.h)
    - Used for JavaScript blobs
    - Similar to nsSimpleURI, but also has a revoked field.
- [DefaultURI](https://searchfox.org/mozilla-central/source/netwerk/base/DefaultURI.h)
    - This class wraps an object parsed by the `rust-url` crate.
    - While not yet enabled by default, due to small bugs in that parser, the plan is to eventually use this implementation for all _unknown protocols_ that don't have their own URL parser.
- [nsJSURI](https://searchfox.org/mozilla-central/source/dom/jsurl/nsJSProtocolHandler.h)
    - Used to represent JavaScript code (e.g. `javascript:alert('hello')`)
- [nsJARURI](https://searchfox.org/mozilla-central/source/modules/libjar/nsJARURI.h)
    - Used to represent resources inside of JAR files.
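
To make the structure of a `jar:` URI concrete, here is a minimal sketch that splits the example spec from the interfaces section into the archive URL and the entry path. The `split_jar_spec` helper is hypothetical and does not mirror the actual `nsJARURI` parser; it only assumes the `jar:<archive URL>!/<entry>` layout shown in that example.

```python
from typing import Tuple


def split_jar_spec(spec: str) -> Tuple[str, str]:
    """Split 'jar:<archive-url>!/<entry>' into (archive URL, entry path).

    Hypothetical helper for illustration; not the nsJARURI parser.
    """
    prefix = "jar:"
    if not spec.startswith(prefix):
        raise ValueError("not a jar: URI")
    # Split on the last "!/" so an archive URL that itself contains "!/"
    # (an assumption about nested archives) stays in one piece.
    archive, sep, entry = spec[len(prefix):].rpartition("!/")
    if not sep:
        raise ValueError("missing '!/' separator")
    return archive, "/" + entry


archive_url, entry_path = split_jar_spec(
    "jar:http://www.example.com/blue.jar!/ocean.html"
)
# archive_url is "http://www.example.com/blue.jar"
# entry_path is "/ocean.html"
```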

### Diagram of implementations

```{mermaid}
classDiagram
nsSimpleURI o-- BlobURL
nsIMozIconURI o-- nsMozIconURI
nsIFileURL o-- nsStandardURL
nsIStandardURL o-- nsStandardURL
nsISensitiveInfoHiddenURI o-- nsStandardURL
nsStandardURL o-- SubstitutingURL
nsIURI o-- nsSimpleURI
nsSimpleURI o-- nsSimpleNestedURI
nsSimpleNestedURI o-- nsNestedAboutURI

nsIURI o-- DefaultURI

nsSimpleURI o-- nsJSURI

nsINestedURI o-- nsJARURI
nsIJARURI o-- nsJARURI
```

## Class and interface diagram

```{mermaid}
classDiagram
nsISupports <-- nsIURI
nsIURI <-- nsIURL
nsIURL <-- nsIFileURL
nsIURI <-- nsIMozIconURI
nsIURL <-- nsIJARURI
nsISupports <-- nsIStandardURL
nsISupports <-- nsINestedURI
nsISupports <-- nsISensitiveInfoHiddenURI

%% classes

nsSimpleURI o-- BlobURL
nsSimpleURI o-- nsJSURI
nsIMozIconURI o-- nsMozIconURI
nsIFileURL o-- nsStandardURL
nsIStandardURL o-- nsStandardURL
nsISensitiveInfoHiddenURI o-- nsStandardURL
nsStandardURL o-- SubstitutingURL
nsIURI o-- nsSimpleURI
nsINestedURI o-- nsJARURI
nsIJARURI o-- nsJARURI
nsSimpleURI o-- nsSimpleNestedURI
nsSimpleNestedURI o-- nsNestedAboutURI
nsIURI o-- DefaultURI

```