HTTP Cache
==========

This document describes the **HTTP cache implementation**.

The code resides in `/netwerk/cache2 (searchfox)
<https://searchfox.org/mozilla-central/source/netwerk/cache2>`_

API
---

Here is a detailed description of the HTTP cache v2 API, examples
included.  This document only contains what cannot be found or may not
be clear directly from the `IDL files <https://searchfox.org/mozilla-central/search?q=&path=cache2%2FnsICache&case=false&regexp=false>`_ comments.

-  The cache API is **completely thread-safe** and **non-blocking**.
-  There is **no IPC support**.  It's only accessible on the default
   chrome process.
-  When there is no profile, the new HTTP cache still works, but
   everything is stored only in memory and no particular limits are obeyed.

.. _nsICacheStorageService:

nsICacheStorageService
----------------------

-  The HTTP cache entry-point. Accessible as a service only, fully
   thread-safe, scriptable.

-  `nsICacheStorageService.idl (searchfox) <https://searchfox.org/mozilla-central/source/netwerk/cache2/nsICacheStorageService.idl>`_

-   \ ``"@mozilla.org/netwerk/cache-storage-service;1"``

-  Provides methods for accessing "storage" objects (see :ref:`nsICacheStorage <nsICacheStorage>` below), which in turn give access to cache entries (see :ref:`nsICacheEntry <nsICacheEntry>` further below) per specific URL.

-  Currently there are 3 types of storage; all the access methods return
   an :ref:`nsICacheStorage <nsICacheStorage>` object:

   -  **memory-only** (``memoryCacheStorage``): stores data only in a
      memory cache, data in this storage are never put to disk

   -  **disk** (``diskCacheStorage``): stores data on disk, but for
      existing entries also looks into the memory-only storage; when
      instructed via a special argument also primarily looks into
      application caches

   .. note::

      **application cache** (``appCacheStorage``): when a consumer has a
      specific ``nsIApplicationCache`` (i.e. a particular app cache
      version in a group) in hands, this storage will provide read and
      write access to entries in that application cache; when the app
      cache is not specified, this storage will operate over all
      existing app caches. **This kind of storage is deprecated and will be removed** in `bug 1694662 <https://bugzilla.mozilla.org/show_bug.cgi?id=1694662>`_

-  The service also provides methods to clear the whole disk and memory
   cache content or purge any intermediate memory structures:

   -  ``clear`` – after it returns, all entries are no longer accessible
      through the cache APIs; the method is fast to execute and
      non-blocking in any way; the actual erase happens in the background

   -  ``purgeFromMemory`` – removes (schedules to remove) any
      intermediate cache data held in memory for faster access (more
      about :ref:`Intermediate_Memory_Caching <Intermediate_Memory_Caching>` below)

.. _nsILoadContextInfo:

nsILoadContextInfo
------------------

-  Distinguishes the scope of the storage to be opened.

-  Mandatory argument to ``*Storage`` methods of :ref:`nsICacheStorageService <nsICacheStorageService>`.

-  `nsILoadContextInfo.idl (searchfox) <https://searchfox.org/mozilla-central/source/netwerk/base/nsILoadContextInfo.idl>`_


-  It is a helper interface wrapping the following arguments into a single one:

   -  **private-browsing** boolean flag
   -  **anonymous load** boolean flag
   -  **origin attributes** js value

   .. note::

      Helper functions to create nsILoadContextInfo objects:

      -  C++ consumers: functions at ``LoadContextInfo.h`` exported
         header

      -  JS consumers: ``Services.loadContextInfo`` which is an instance of ``nsILoadContextInfoFactory``.

-  Two storage objects created with the same set of
   ``nsILoadContextInfo`` arguments are identical, containing the same
   cache entries.

-  Two storage objects created with ``nsILoadContextInfo`` arguments
   that differ in any way are strictly and completely distinct, and
   cache entries in them do not overlap even when they have
   the same URIs.
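
The identity rule above can be illustrated with a small Python model.
This is only a sketch and not the Gecko implementation:
``LoadContextInfo``, ``StorageRegistry`` and ``get_storage`` are
hypothetical names, and origin attributes are reduced to a plain string.

.. code:: python

   from dataclasses import dataclass


   @dataclass(frozen=True)
   class LoadContextInfo:
       """Hypothetical stand-in for nsILoadContextInfo."""
       private_browsing: bool = False
       anonymous: bool = False
       origin_attributes: str = ""  # simplified; the real value is structured


   class StorageRegistry:
       """Hands out one table of entries per distinct context."""

       def __init__(self):
           self._storages = {}

       def get_storage(self, info):
           # Equal contexts map to the same table; different ones never share.
           return self._storages.setdefault(info, {})


   registry = StorageRegistry()
   regular = registry.get_storage(LoadContextInfo())
   same = registry.get_storage(LoadContextInfo())
   private = registry.get_storage(LoadContextInfo(private_browsing=True))

   # An entry stored for a URI in one scope is invisible in the other scope.
   regular["https://example.com/"] = "cached response"

In this model ``same`` is the very same table as ``regular``, while
``private`` stays empty even though the URI is identical.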

.. _nsICacheStorage:

nsICacheStorage
---------------

-  `nsICacheStorage.idl (searchfox) <https://searchfox.org/mozilla-central/source/netwerk/cache2/nsICacheStorage.idl>`_

-  Obtained from call to one of the ``*Storage`` methods on
   :ref:`nsICacheStorageService <nsICacheStorageService>`.

-  Represents a distinct storage area (or scope) into which cache
   entries mapped by URLs are put and from which they are retrieved.

-  *Similarity with the old cache*\ : with some limitations, this
   interface may be considered a mirror of ``nsICacheSession``, but it
   is less generic and less prone to abuse.

nsICacheEntryOpenCallback
-------------------------

-  `nsICacheEntryOpenCallback.idl (searchfox) <https://searchfox.org/mozilla-central/source/netwerk/cache2/nsICacheEntryOpenCallback.idl>`_

-  The result of ``nsICacheStorage.asyncOpenURI`` is always and only
   sent to callbacks on this interface.

-  These callbacks are guaranteed to be invoked when ``asyncOpenURI``
   returns ``NS_OK``.

-

   .. note::

      When the
      cache entry object is already present in memory or is opened as
      "force-new" (a.k.a. "open-truncate"), this callback is invoked
      before the ``asyncOpenURI`` method returns (i.e.
      immediately); there is currently no way to opt out of this feature
      (see `bug
      938186 <https://bugzilla.mozilla.org/show_bug.cgi?id=938186>`__).

.. _nsICacheEntry:

nsICacheEntry
-------------

-  `nsICacheEntry.idl (searchfox) <https://searchfox.org/mozilla-central/source/netwerk/cache2/nsICacheEntry.idl>`_

-  Obtained asynchronously or pseudo-asynchronously by a call to
   ``nsICacheStorage.asyncOpenURI``.

-  Provides access to a cached entry's data and meta data for reading or
   writing or, in some cases, both; see below.

Lifetime of a new entry
-----------------------

-  Such an entry is initially empty (no data or meta data is stored in it).

-  The ``aNew`` argument in ``onCacheEntryAvailable`` is ``true`` for
   new entries, and only for them.

-  Only one consumer (the so-called "*writer*") may have such an entry
   available (obtained via ``onCacheEntryAvailable``).

-  Other parallel openers of the same cache entry are blocked (they wait
   for invocation of their ``onCacheEntryAvailable``) until one of the
   following occurs:

   -  The *writer* simply throws the entry away: the next waiting opener
      in line gets the entry again as "*new*", and the cycle repeats.

      .. note::

         This applies in general, writers throwing away the cache entry
         means a failure to write the cache entry and a new writer is
         being looked for again, the cache entry remains empty (a.k.a.
         "new").

   -  The *writer* stored all necessary meta data in the cache entry and
      called ``metaDataReady`` on it: other consumers now get the entry
      and may examine and potentially modify the meta data and read the
      data (if any) of the cache entry.
   -  When the *writer* has data (i.e. the response payload) to write to
      the cache entry, it **must** open the output stream on it
      **before** it calls ``metaDataReady``.

-  While the *writer* still holds the cache entry and keeps the output
   stream on it open, other consumers may open input streams
   on the entry. The data becomes available immediately as the *writer*
   writes it to the cache entry's output stream, even before the
   output stream is closed. This is called :ref:`concurrent
   read/write <Concurrent_read_and_write>`.

.. _Concurrent_read_and_write:

Concurrent read and write
-------------------------

The cache supports reading a cache entry's data while it is still being
written by the first consumer - the *writer*.
This can only be engaged for resumable responses that (`bug
960902 <https://bugzilla.mozilla.org/show_bug.cgi?id=960902#c17>`__)
don't need revalidation. The reason is that when the writer is interrupted
(e.g. by external canceling of the loading channel), concurrent readers
would otherwise not be able to reach the remaining unread content.

.. note::

   This could be improved by keeping the network load running and being
   stored to the cache entry even after the writing channel has been
   canceled.

When the *writer* is interrupted, the first concurrent *reader* in line
does a range request for the rest of the data and in this way becomes the
new *writer*. The rest of the *readers* keep reading the content
concurrently, since the output stream for the cache entry is open again and
kept by the current *writer*.
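
The hand-over described above can be modelled in a few lines of Python.
This is a toy sketch under simplifying assumptions, not the actual
implementation: the ``Entry`` class, its methods and the channel names
are all hypothetical, and the network is not modelled at all.

.. code:: python

   class Entry:
       """Toy cache entry that can be read while it is still being written."""

       def __init__(self):
           self.data = bytearray()
           self.writer = None
           self.readers = []  # concurrent readers, in the order they arrived

       def write(self, who, chunk):
           if who != self.writer:
               raise PermissionError("only the current writer may write")
           self.data += chunk

       def read(self, offset):
           # Readers see whatever has been written so far.
           return bytes(self.data[offset:])

       def writer_interrupted(self):
           """Promote the first reader; return the offset to resume from."""
           self.writer = self.readers.pop(0) if self.readers else None
           return len(self.data)


   entry = Entry()
   entry.writer = "channel-1"
   entry.readers = ["channel-2", "channel-3"]

   entry.write("channel-1", b"hello ")
   seen_early = entry.read(0)  # visible before the write has finished

   resume_at = entry.writer_interrupted()  # channel-1 was canceled
   range_request = f"bytes={resume_at}-"   # what channel-2 would ask for
   entry.write("channel-2", b"world")

After the interruption ``channel-2`` is the writer, and ``channel-3``
keeps reading from the same buffer without noticing the change.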

Lifetime of an existing entry with only a partial content
---------------------------------------------------------

-  Such a cache entry is first examined in the
   ``nsICacheEntryOpenCallback.onCacheEntryCheck`` callback, where it
   has to be checked for completeness.
-  In this case, the ``Content-Length`` header (or a different indicator)
   does not equal the data size reported by the cache entry.
-  The consumer then indicates the cache entry needs to be revalidated
   by returning ``ENTRY_NEEDS_REVALIDATION`` from
   ``onCacheEntryCheck``.
-  This consumer, from the cache's point of view, takes the role of the
   *writer*.
-  Other parallel consumers, if any, are blocked until the *writer*
   calls ``setValid`` on the cache entry.
-  The consumer is then responsible for validating the partial content
   cache entry with the network server and attempting to load the rest of
   the data.
-  When the server responds positively (in case of an HTTP server with a
   206 response code) the *writer* (in this order) opens the output
   stream on the cache entry and calls ``setValid`` to unblock other
   pending openers.
-  Concurrent read/write is engaged.
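
The completeness check from the steps above might look roughly like the
following Python sketch. It is an illustration only: the function names
are hypothetical, the result codes are modelled as plain strings rather
than the IDL constants, and only ``Content-Length`` is considered as the
completeness indicator.

.. code:: python

   ENTRY_WANTED = "ENTRY_WANTED"
   ENTRY_NEEDS_REVALIDATION = "ENTRY_NEEDS_REVALIDATION"


   def on_cache_entry_check(headers, stored_size):
       """Ask for revalidation when the stored data looks incomplete."""
       declared = headers.get("Content-Length")
       if declared is not None and int(declared) != stored_size:
           return ENTRY_NEEDS_REVALIDATION
       return ENTRY_WANTED


   def range_header(stored_size):
       """Standard HTTP header asking for the bytes the entry is missing."""
       return {"Range": f"bytes={stored_size}-"}


   partial = on_cache_entry_check({"Content-Length": "1000"}, 400)
   complete = on_cache_entry_check({"Content-Length": "1000"}, 1000)
   request = range_header(400)

Here ``partial`` asks for revalidation, and the consumer that returned
it would continue with a request for the bytes from offset 400 onwards.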

Lifetime of an existing entry that doesn't pass server revalidation
-------------------------------------------------------------------

-  Such a cache entry is first examined in the
   ``nsICacheEntryOpenCallback.onCacheEntryCheck`` callback, where the
   consumer finds out it must be revalidated with the server before use.
-  The consumer then indicates the cache entry needs to be revalidated
   by returning ``ENTRY_NEEDS_REVALIDATION`` from
   ``onCacheEntryCheck``.
-  This consumer, from the cache's point of view, takes the role of the
   *writer*.
-  Other parallel consumers, if any, are blocked until the *writer*
   calls ``setValid`` on the cache entry.
-  The consumer is then responsible for validating the cache entry with
   the network server.
-  The server responds with a 200 response, which means the cached
   content is no longer valid and a new version must be loaded from the
   network.
-  The *writer* then calls ``recreate`` on the cache entry. This
   returns a new empty entry to write the meta data and data to; the
   *writer* replaces its cache entry with this new one and handles it as
   a new one.
-  The *writer* then (in this order) fills the necessary meta data of
   the cache entry, opens the output stream on it and calls
   ``metaDataReady`` on it.
-  Any other pending openers, if any, are now given this new entry to
   examine and read as an existing entry.

Adding a new storage
--------------------

Should there be a need to add a new distinct storage for which the
current scoping model would not be sufficient - use one of the two
following ways:

#. *[preferred]* Add a new ``<Your>Storage`` method on
   :ref:`nsICacheStorageService <nsICacheStorageService>` and if needed give it any arguments to
   specify the storage scope even more.  The implementation should only
   need to enhance the context key generation and parsing code and enhance
   current - or create new when needed - :ref:`nsICacheStorage <nsICacheStorage>`
   implementations to carry any additional information down to the cache
   service.
#. *[*\ **not**\ *preferred]* Add a new argument to
   :ref:`nsILoadContextInfo <nsILoadContextInfo>`; **be careful
   here**, since some arguments on the context may not be known during
   the load time, which may lead to inter-context data leaking or
   implementation problems. Adding more distinction to
   :ref:`nsILoadContextInfo <nsILoadContextInfo>` also affects all existing storages, which may
   not always be desirable.

See context keying details for more information.

Threading
---------

The cache API is fully thread-safe.

The cache uses a single background thread where all IO operations
like opening, reading, writing and erasing happen.  Memory pool
management, eviction and visiting loops also happen on this thread.

The thread supports several priority levels. Work dispatched to a level
with a lower number is executed sooner than work dispatched to levels
with a higher number; also, any loop on a lower-priority level yields to
higher-priority levels so that a scheduled deletion of 1000 files will
not block opening cache entries.

#. **OPEN_PRIORITY:** besides opening priority cache files, file
   dooming also happens here to prevent races
#. **READ_PRIORITY:** top level documents and head blocking script cache
   files are opened and read first
#. **OPEN**
#. **READ:** any normal priority content, such as images, is opened and
   read here
#. **WRITE:** writes are processed last; we cache data in memory in
   the meantime
#. **MANAGEMENT:** level for the memory pool and CacheEntry background
   operations
#. **CLOSE:** file closing level
#. **INDEX:** the index is rebuilt here
#. **EVICT:** files exceeding the disk space consumption limit are
   evicted here

NOTE: Special case for eviction - when an eviction is scheduled on the
IO thread, all operations pending on the OPEN level are first merged to
the OPEN_PRIORITY level. The eviction preparation operation - i.e.
clearing of the internal IO state - is then put to the end of the
OPEN_PRIORITY level.  All this happens atomically.
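
A minimal Python sketch of such a multi-level queue is shown here,
including the eviction special case. It is an assumption-laden model and
not the real IO thread: ``IOQueue`` and its methods are hypothetical
names, the numeric values of the levels are illustrative (only their
order matters), and everything runs on the calling thread.

.. code:: python

   from collections import deque
   from enum import IntEnum


   class Level(IntEnum):
       OPEN_PRIORITY = 0
       READ_PRIORITY = 1
       OPEN = 2
       READ = 3
       WRITE = 4
       MANAGEMENT = 5
       CLOSE = 6
       INDEX = 7
       EVICT = 8


   class IOQueue:
       def __init__(self):
           self._queues = {level: deque() for level in Level}

       def dispatch(self, level, task):
           self._queues[level].append(task)

       def schedule_eviction(self, prepare):
           # Pending OPEN work is merged into OPEN_PRIORITY first, then the
           # eviction preparation goes to the end of OPEN_PRIORITY.
           top = self._queues[Level.OPEN_PRIORITY]
           top.extend(self._queues[Level.OPEN])
           self._queues[Level.OPEN].clear()
           top.append(prepare)

       def run(self):
           """Run one task at a time, always rescanning from the top level."""
           while True:
               for level in Level:
                   if self._queues[level]:
                       self._queues[level].popleft()()
                       break
               else:
                   return  # every level is empty


   log = []
   queue = IOQueue()
   queue.dispatch(Level.EVICT, lambda: log.append("evict"))
   queue.dispatch(Level.WRITE, lambda: log.append("write"))
   queue.dispatch(Level.OPEN, lambda: log.append("open"))
   queue.dispatch(Level.OPEN_PRIORITY, lambda: log.append("open-priority"))
   queue.run()

Rescanning from the top after every single task is what lets a long
batch on a low-priority level yield to newly dispatched high-priority
work.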

Storage and entries scopes
--------------------------

A *scope key* string used to map the storage scope is based on the
arguments of :ref:`nsILoadContextInfo <nsILoadContextInfo>`. The form is as follows (currently
pending in `bug
968593 <https://bugzilla.mozilla.org/show_bug.cgi?id=968593>`__):

.. code::

   a,b,i1009,p,

-  Regular expression: ``(.([^,]+)?,)*``
-  The first letter is an identifier; identifiers are sorted
   alphabetically and each is always terminated with ','
-  a - when present, the scope belongs to an **anonymous** load
-  b - when present, the scope is an **in browser element** load
-  i - when present, must have a decimal integer value that represents
   the app ID the scope belongs to; otherwise there is no app (app ID is
   considered ``0``)
-  p - when present, the scope is of a **private browsing** load; this
   never persists
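
The key format can be exercised with a short Python sketch. The helper
names ``make_scope_key`` and ``parse_scope_key`` are hypothetical and
only mirror the rules listed above; the real key generation and parsing
code may differ.

.. code:: python

   import re

   # One tag: an identifier character, an optional value, a terminating comma.
   TAG = re.compile(r"(.)([^,]+)?,")


   def make_scope_key(anonymous=False, in_browser=False, app_id=0,
                      private=False):
       """Build a key with identifiers in alphabetical order."""
       parts = []
       if anonymous:
           parts.append("a,")
       if in_browser:
           parts.append("b,")
       if app_id:
           parts.append(f"i{app_id},")
       if private:
           parts.append("p,")
       return "".join(parts)


   def parse_scope_key(key):
       """Split a key into a mapping of identifier to value (or None)."""
       tags = {}
       pos = 0
       while pos < len(key):
           match = TAG.match(key, pos)
           if match is None:
               raise ValueError(f"malformed scope key at offset {pos}: {key!r}")
           tags[match.group(1)] = match.group(2)
           pos = match.end()
       return tags


   key = make_scope_key(anonymous=True, in_browser=True, app_id=1009,
                        private=True)
   tags = parse_scope_key(key)

With these arguments ``key`` reproduces the example string shown above,
and an empty key stands for a regular, non-anonymous load with no app.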

``CacheStorageService`` keeps a global hashtable mapped by the *scope
key*. Elements in this global hashtable are hashtables of cache entries.
The cache entries are mapped by the concatenation of the Enhance ID and
the URI passed to ``nsICacheStorage.asyncOpenURI``.  So when an entry is
being looked up, first the global hashtable is searched using the
*scope key*, which yields an entries hashtable. Then this entries hashtable
is searched using the ``<enhance-id:><uri>`` string. The elements in this
hashtable are ``CacheEntry`` classes, see below.

The hash tables keep a strong reference to ``CacheEntry`` objects. One
way to remove ``CacheEntry`` objects from memory is by exhausting the
memory limit for :ref:`Intermediate_Memory_Caching <Intermediate_Memory_Caching>`, which triggers a background
process of purging expired and then least used entries from memory.
Another way is to directly call the
``nsICacheStorageService.purgeFromMemory`` method. That method is also
called automatically on the ``"memory-pressure"`` indication.

Access to the hashtables is protected by a global lock. We also - in a
thread-safe manner - count the number of consumers keeping a reference
on each entry. The open callback doesn't actually give the consumer
the ``CacheEntry`` object directly but a small wrapper class that
manages the 'consumer reference counter' on its cache entry. Both of these
mechanisms ensure thread-safe access and also make it impossible to have
more than a single instance of a ``CacheEntry`` for a single
``<scope+enhanceID+URL>`` key.
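
The combination of a locked table and a counting wrapper can be sketched
in Python as follows. All names (``EntryTable``, ``EntryHandle``,
``release``) are hypothetical, and for brevity the sketch reuses the
table lock for the counter, whereas the text above only says that the
counting is thread-safe.

.. code:: python

   import threading


   class CacheEntry:
       def __init__(self, key):
           self.key = key
           self.consumers = 0  # the 'consumer reference counter'


   class EntryHandle:
       """Small wrapper handed to consumers instead of the entry itself."""

       def __init__(self, table, entry):
           self._table = table
           self.entry = entry

       def release(self):
           # Releasing twice must not decrement the counter twice.
           if self.entry is not None:
               with self._table.lock:
                   self.entry.consumers -= 1
               self.entry = None


   class EntryTable:
       def __init__(self):
           self.lock = threading.Lock()
           self._entries = {}

       def open(self, scope_key, enhance_id, uri):
           key = (scope_key, enhance_id, uri)
           with self.lock:
               entry = self._entries.get(key)
               if entry is None:
                   entry = self._entries[key] = CacheEntry(key)
               entry.consumers += 1
           return EntryHandle(self, entry)


   table = EntryTable()
   first = table.open("p,", "", "https://example.com/")
   second = table.open("p,", "", "https://example.com/")
   shared = first.entry

Both handles wrap the same ``CacheEntry`` instance, and its counter
drops back to zero only once every handle has been released.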

``CacheStorage``, implementing the :ref:`nsICacheStorage <nsICacheStorage>` interface,
forwards all calls to internal methods of ``CacheStorageService``,
passing itself as an argument.  ``CacheStorageService`` then generates
the *scope key* using the ``nsILoadContextInfo`` of the storage. Note:
``CacheStorage`` keeps a thread-safe copy of the ``nsILoadContextInfo``
passed to a ``*Storage`` method on ``nsICacheStorageService``.

Invoking open callbacks
-----------------------

``CacheEntry``, implementing the ``nsICacheEntry`` interface, is
responsible for managing the cache entry's internal state and for properly
invoking the ``onCacheEntryCheck`` and ``onCacheEntryAvailable`` callbacks
on all callers of ``nsICacheStorage.asyncOpenURI``.

-  Keeps a FIFO of all openers.
-  Keeps its internal state like NOTLOADED, LOADING, EMPTY, WRITING,
   READY, REVALIDATING.
-  Keeps the number of consumers keeping a reference to it.
-  Refers to a ``CacheFile`` object that holds actual data and meta data
   and, when told to, persists it to the disk.

The openers FIFO is an array of ``CacheEntry::Callback`` objects.
``CacheEntry::Callback`` keeps a strong reference to the opener plus the
opening flags.  ``nsICacheStorage.asyncOpenURI`` forwards to
``CacheEntry::AsyncOpen`` and triggers the following pseudo-code:

**CacheStorage::AsyncOpenURI** - the API entry point:

-  globally atomic:

   -  look up a given ``CacheEntry`` in the ``CacheStorageService`` hash
      tables
   -  if not found: create a new one, add it to the proper hash table
      and set its state to NOTLOADED
   -  consumer reference ++

-  call to ``CacheEntry::AsyncOpen``
-  consumer reference --

**CacheEntry::AsyncOpen** (entry atomic):

-  the opener is added to FIFO, consumer reference ++ (dropped back
   after an opener is removed from the FIFO)
-  state == NOTLOADED:

   -  state = LOADING
   -  when OPEN_TRUNCATE flag was used:

      -  ``CacheFile`` is created as 'new', state = EMPTY

   -  otherwise:

      -  ``CacheFile`` is created and loading of it is started
      -  ``CacheEntry::OnFileReady`` notification is now expected

-  state == LOADING: just do nothing and exit
-  call to ``CacheEntry::InvokeCallbacks``

**CacheEntry::InvokeCallbacks** (entry atomic):

-  called on:

   -  a new opener has been added to the FIFO via an ``AsyncOpen`` call
   -  asynchronous result of the ``CacheFile`` open - ``CacheEntry::OnFileReady``
   -  the writer throws the entry away - ``CacheEntry::OnHandleClosed``
   -  the **output stream** of the entry has been **opened** or
      **closed**
   -  ``metaDataReady`` or ``setValid`` on the entry has been called
   -  the entry has been **doomed**

-  state == EMPTY:

   -  when the OPEN_READONLY flag is used: onCacheEntryAvailable with
      ``null`` for the cache entry
   -  otherwise:

      -  state = WRITING
      -  opener is removed from the FIFO and remembered as the current
         '*writer*'
      -  onCacheEntryAvailable with ``aNew = true`` and this entry is
         invoked (on the caller thread) for the *writer*

-  state == READY:

   -  onCacheEntryCheck with the entry is invoked on the first opener in
      FIFO - on the caller thread if demanded
   -  result == RECHECK_AFTER_WRITE_FINISHED:

      -  opener is left in the FIFO with a flag ``RecheckAfterWrite``
      -  such openers are skipped until the output stream on the entry
         is closed, then ``onCacheEntryCheck`` is re-invoked on them
      -  Note: here is a potential for endless looping when
         RECHECK_AFTER_WRITE_FINISHED is abused

   -  result == ENTRY_NEEDS_REVALIDATION:

      -  state = REVALIDATING, this prevents invocation of any callback
         until ``CacheEntry::SetValid`` is called
      -  continue as for result ENTRY_WANTED (just below)

   -  result == ENTRY_WANTED:

      -  consumer reference ++ (dropped back when the consumer releases
         the entry)
      -  onCacheEntryAvailable is invoked on the opener with
         ``aNew = false`` and the entry
      -  opener is removed from the FIFO

   -  result == ENTRY_NOT_WANTED:

      -  ``onCacheEntryAvailable`` is invoked on the opener with
         ``null`` for the entry
      -  opener is removed from the FIFO

-  state == WRITING or REVALIDATING:

   -  do nothing and exit

-  any other value of state is unexpected here (assertion failure)
-  loop this process while there are openers in the FIFO

**CacheEntry::OnFileReady** (entry atomic):

-  load result == failure or the file has not been found on disk (is
   new): state = EMPTY
-  otherwise: state = READY since the cache file has been found and is
   usable containing meta data and data of the entry
-  call to ``CacheEntry::InvokeCallbacks``

**CacheEntry::OnHandleClosed** (entry atomic):

-  Called when any consumer throws the cache entry away
-  If the handle is not the handle given to the current *writer*, then
   exit
-  state == WRITING: the writer failed to call ``metaDataReady`` on the
   entry - state = EMPTY
-  state == REVALIDATING: the writer failed the re-validation process
   and failed to call ``setValid`` on the entry - state = READY
-  call to ``CacheEntry::InvokeCallbacks``
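
The pseudo-code for ``InvokeCallbacks`` and ``OnHandleClosed`` can be
condensed into a small, single-threaded Python model. It is a sketch
under stated assumptions, not the real ``CacheEntry``: the ``Opener``
class and all method names are hypothetical, file loading and consumer
reference counting are left out, and ``aNew`` is reported as ``False``
when no entry is given.

.. code:: python

   from collections import deque
   from enum import Enum, auto


   class Check(Enum):
       ENTRY_WANTED = auto()
       RECHECK_AFTER_WRITE_FINISHED = auto()
       ENTRY_NEEDS_REVALIDATION = auto()
       ENTRY_NOT_WANTED = auto()


   class Opener:
       """Hypothetical consumer that records what the entry reports to it."""

       def __init__(self, check=Check.ENTRY_WANTED, read_only=False):
           self.check = check
           self.read_only = read_only
           self.recheck_after_write = False
           self.result = None  # (entry or None, is_new)

       def on_cache_entry_check(self, entry):
           return self.check

       def on_cache_entry_available(self, entry, is_new):
           self.result = (entry, is_new)


   class Entry:
       def __init__(self, state):
           self.state = state  # EMPTY, WRITING, READY or REVALIDATING
           self.fifo = deque()
           self.writer = None
           self.output_open = False

       def invoke_callbacks(self):
           skipped = deque()
           while self.fifo and self.state not in ("WRITING", "REVALIDATING"):
               opener = self.fifo.popleft()
               if self.state == "EMPTY":
                   if opener.read_only:
                       opener.on_cache_entry_available(None, False)
                   else:
                       self.state, self.writer = "WRITING", opener
                       opener.on_cache_entry_available(self, True)
               elif self.state == "READY":
                   if opener.recheck_after_write and self.output_open:
                       skipped.append(opener)  # wait for the write to end
                       continue
                   result = opener.on_cache_entry_check(self)
                   if result is Check.RECHECK_AFTER_WRITE_FINISHED:
                       opener.recheck_after_write = True
                       skipped.append(opener)
                   elif result is Check.ENTRY_NOT_WANTED:
                       opener.on_cache_entry_available(None, False)
                   else:
                       if result is Check.ENTRY_NEEDS_REVALIDATION:
                           self.state, self.writer = "REVALIDATING", opener
                       opener.on_cache_entry_available(self, False)
               else:
                   raise AssertionError(f"unexpected state {self.state}")
           # Openers that must be rechecked stay at the front of the FIFO.
           self.fifo.extendleft(reversed(skipped))

       def meta_data_ready(self):  # also models setValid
           self.state = "READY"
           self.invoke_callbacks()

       def on_handle_closed(self, opener):
           if opener is not self.writer:
               return
           if self.state == "WRITING":
               self.state = "EMPTY"  # metaDataReady was never called
           elif self.state == "REVALIDATING":
               self.state = "READY"  # setValid was never called
           self.writer = None
           self.invoke_callbacks()


   entry = Entry("EMPTY")
   first, second = Opener(), Opener()
   entry.fifo.extend([first, second])
   entry.invoke_callbacks()  # first becomes the writer, second waits
   waiting = second.result
   entry.meta_data_ready()   # second now gets the entry as an existing one

In this run ``first`` receives the entry as new, while ``second`` is
held in the FIFO until ``meta_data_ready`` moves the entry to READY.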

**All consumers release the reference:**

-  the entry may now be purged (removed) from memory when found expired
   or least used on overrun of the :ref:`memory
   pool <Intermediate_Memory_Caching>` limit
-  when this is a disk cache entry, its cached data chunks are released
   from memory and only meta data is kept

.. _Intermediate_Memory_Caching:

Intermediate memory caching
---------------------------

Intermediate memory caching of frequently used metadata (a.k.a. disk cache memory pool).

For disk cache entries we keep the meta data of some of the most recent
and most used cache entries in memory for immediate zero-thread-loop
opening. The default size of this meta data memory pool is only 250 kB
and is controlled by the ``browser.cache.disk.metadata_memory_limit``
preference. When the limit is exceeded, we purge (throw away) first
**expired** and then **least used** entries to free up memory again.

Only ``CacheEntry`` objects that are already loaded and filled with data
and having the 'consumer reference == 0' (`bug
942835 <https://bugzilla.mozilla.org/show_bug.cgi?id=942835#c3>`__) can
be purged.

The 'least used' entries are recognized by the lowest value of
`frecency <https://wiki.mozilla.org/User:Jesse/NewFrecency?title=User:Jesse/NewFrecency>`__,
which we re-compute for each entry on every access. The decay time is
controlled by the ``browser.cache.frecency_half_life_hours`` preference
and defaults to 6 hours. The best decay time will be based on results of
`an experiment <https://bugzilla.mozilla.org/show_bug.cgi?id=986728>`__.

The memory pool is represented by two lists (strongly referencing ordered
arrays) of ``CacheEntry`` objects:

#. Sorted by expiration time (which defaults to 0xFFFFFFFF)
#. Sorted by frecency (which defaults to 0)

We have two such pools, one for memory-only entries, actually
representing the memory-only cache, and one for disk cache entries, for
which we only keep the meta data.  Each pool has its own limit
check - the memory cache pool is controlled by
``browser.cache.memory.capacity``, the disk entries pool is
described above. The pools can be accessed and modified only on the cache
background thread.
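
The purge order described in this section (expired entries first, then
the least used ones, and only entries that are loaded and have no
consumers) can be sketched in Python. ``PoolEntry`` and ``purge`` are
hypothetical names, a single list stands in for the two sorted arrays,
and treating an entry as expired when its expiration time is not later
than "now" is an assumption of this sketch.

.. code:: python

   import time
   from dataclasses import dataclass

   NO_EXPIRATION = 0xFFFFFFFF  # default expiration time


   @dataclass
   class PoolEntry:
       key: str
       size: int
       expiration_time: int = NO_EXPIRATION  # seconds since the epoch
       frecency: float = 0.0
       consumers: int = 0
       loaded: bool = True


   def purge(pool, limit, now=None):
       """Remove entries until the pool fits the limit; return purged keys."""
       now = time.time() if now is None else now
       used = sum(entry.size for entry in pool)
       purged = []

       def drop(candidates):
           nonlocal used
           for entry in candidates:
               if used <= limit:
                   return
               # Entries still in use or not yet loaded must be kept.
               if entry.loaded and entry.consumers == 0:
                   pool.remove(entry)
                   used -= entry.size
                   purged.append(entry.key)

       # Pass 1: expired entries, earliest expiration first.
       drop(sorted((e for e in pool if e.expiration_time <= now),
                   key=lambda e: e.expiration_time))
       # Pass 2: least used entries, lowest frecency first.
       drop(sorted(pool, key=lambda e: e.frecency))
       return purged


   pool = [
       PoolEntry("fresh-hot", 100, frecency=5.0),
       PoolEntry("expired", 100, expiration_time=10, frecency=9.0),
       PoolEntry("fresh-cold", 100, frecency=1.0),
       PoolEntry("in-use", 100, frecency=0.0, consumers=1),
   ]
   purged = purge(pool, limit=200, now=1000)

Note that the expired entry goes first despite its high frecency, and
the entry with a live consumer survives even though it is the least
used one.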