summaryrefslogtreecommitdiffstats
path: root/browser/components/urlbar/docs/ranking.rst
blob: a1c9d03c3cd2aee962761c80bb9861b7b8cb9397 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
=======
Ranking
=======

Before results appear in the UrlbarView, they are fetched from providers.

Each `UrlbarProvider <https://firefox-source-docs.mozilla.org/browser/urlbar/overview.html#urlbarprovider>`_
implements its own internal ranking and returns sorted results.

Externally all the results are ranked by the `UrlbarMuxer <https://searchfox.org/mozilla-central/source/browser/components/urlbar/UrlbarMuxerUnifiedComplete.jsm>`_
according to an hardcoded list of groups and sub-grups.

.. NOTE:: Preferences can influence the groups order, for example by putting
  Firefox Suggest before Search Suggestions.

The Places provider, responsible to return history and bookmark results, uses
an internal ranking algorithm called Frecency.

Frecency implementation
=======================

Frecency is a term derived from `frequency` and `recency`, its scope is to provide a
ranking algorithm that gives importance both to how often a page is accessed and
when it was last visited.
Additionally, it accounts for the type of each visit through a bonus system.

To account for `recency`, a bucketing system is implemented.
If a page has been visited later than the bucket cutoff, it gets the weight
associated with that bucket:

- Up to 4 days old - weight 100 - ``places.frecency.firstBucketCutoff/Weight``
- Up to 14 days old - weight 70 - ``places.frecency.secondBucketCutoff/Weight``
- Up to 31 days old - weight 50 - ``places.frecency.thirdBucketCutoff/Weight``
- Up to 90 days old - weight 30 - ``places.frecency.fourthBucketCutoff/Weight``
- Anything else - weight 10 - ``places.frecency.defaultBucketWeight``

To account for `frequency`, the total number of visits to a page is used to
calculate the final score.

The type of each visit is taken into account using specific bonuses:

Default bonus
  Any unknown type gets a default bonus. This is expected to be unused.
  Pref ``places.frecency.defaultVisitBonus`` current value: 0.
Embed
  Used for embedded/framed visits not due to user actions. These visits today
  are stored in memory and never participate to frecency calculation.
  Thus this is currently unused.
  Pref ``places.frecency.embedVisitBonus`` current value: 0.
Framed Link
  Used for cross-frame visits due to user action.
  Pref ``places.frecency.framedLinkVisitBonus`` current value: 0.
Download
  Used for download visits. It’s important to support link coloring for these
  visits, but they are not necessarily useful address bar results (the Downloads
  view can do a better job with  these), so their frecency can be low.
  Pref ``places.frecency.downloadVisitBonus`` current value: 0.
Reload
  Used for reload visits (refresh same page). Low because it should not be
  possible to influence frecency by multiple reloads.
  Pref ``places.frecency.reloadVisitBonus`` current value: 0.
Redirect Source
  Used when the page redirects to another one.
  It’s a low value because we give more importance to the final destination,
  that is what the user actually visits, especially for permanent redirects.
  Pref ``places.frecency.redirectSourceVisitBonus`` current value: 25.
Temporary Redirect
  Used for visits resulting from a temporary redirect (HTTP 307).
  Pref ``places.frecency.tempRedirectVisitBonus`` current value: 40.
Permanent Redirect
  Used for visits resulting from a permanent redirect (HTTP 301). This is the
  new supposed destination for a url, thus the bonus is higher than temporary.
  In this case it may be advisable to just pick the bonus for the source visit.
  Pref ``places.frecency.permRedirectVisitBonus`` current value: 50.
Bookmark
  Used for visits generated from bookmark views.
  Pref ``places.frecency.bookmarkVisitBonus`` current value: 75.
Link
  Used for normal visits, for example when clicking on a link.
  Pref ``places.frecency.linkVisitBonus`` current value: 100.
Typed
  Intended to be used for pages typed by the user, in reality it is used when
  the user picks a url from the UI (history views or the Address Bar).
  Pref ``places.frecency.typedVisitBonus`` current value: 2000.

The above bonuses are applied to visits, in addition to that there are also a
few bonuses applied in case a page is not visited at all, both of these bonuses
can be applied at the same time:

Unvisited bookmarked page
  Used for pages that are bookmarked but unvisited.
  Pref ``places.frecency.unvisitedBookmarkBonus`` current value: 140.
Unvisited typed page
  Used for pages that were typed and now are bookmarked (otherwise they would
  be orphans).
  Pref ``places.frecency.unvisitedTypedBonus`` current value: 200.

Two special frecency values are also defined:

- ``-1`` represents a just inserted entry in the database, whose score has not
  been calculated yet.
- ``0`` represents an entry for which a new value should not be calculated,
  because it has a poor user value (e.g. place: queries) among search results.

Finally, because calculating a score from all of the visits every time a new
visit is added would be expensive, only a sample of the last 10
(pref ``places.frecency.numVisits``) visits is used.

How frecency for a page is calculated
-------------------------------------

.. mermaid::
    :align: center
    :caption: Frecency calculation flow

    flowchart TD
        start[URL]
        a0{Has visits?}
        a1[Get last 10 visit]
        a2[bonus = unvisited_bonus + bookmarked + typed]
        a3{bonus > 0?}
        end0[Frecency = 0]
        end1["frecency = age_bucket_weight * (bonus / 100)"]
        a4[Sum points of all sampled visits]
        a5{points > 0?}
        end2[frecency = -1]
        end3["Frecency = visit_count * (points / sample_size)"]
        subgraph sub [Per each visit]
            sub0[bonus = visit_type_bonus]
            sub1{bookmarked?}
            sub2[add bookmark bonus]
            sub3["score = age_bucket_weight * (bonus / 100)"]
            sub0 --> sub1
            sub1 -- yes --> sub2
            sub1 -- no --> sub3
            sub2 --> sub3;
        end
        start --> a0
        a0 -- no --> a2
        a2 --> a3
        a3 -- no --> end0
        a3 -- yes --> end1
        a0 -- yes --> a1
        a1 --> sub
        sub --> a4
        a4 --> a5
        a5 -- no --> end2
        a5 -- yes --> end3

1. If the page is visited, get a sample of ``NUM_VISITS`` most recent visits.
2. For each visit get a transition bonus, depending on the visit type.
3. If the page is bookmarked, add to the bonus an additional bookmark bonus.
4. If the bonus is positive, get a bucket weight depending on the visit date.
5. Calculate points for the visit as ``age_bucket_weight * (bonus / 100)``.
6. Sum points for all the sampled visits.
7. If the points sum is zero, return a ``-1`` frecency, it will still appear in the UI.
   Otherwise, frecency is ``visitCount * points / NUM_VISITS``.
8. If the page is unvisited and not bookmarked, or it’s a bookmarked place-query,
   return a ``0`` frecency, to hide it from the UI.
9. If it’s bookmarked, add the bookmark bonus.
10. If it’s also a typed page, add the typed bonus.
11. Frecency is ``age_bucket_weight * (bonus / 100)``

When frecency for a page is calculated
--------------------------------------

Operations that may influence the frecency score are:

* Adding visits
* Removing visits
* Adding bookmarks
* Removing bookmarks
* Changing the url of a bookmark

Frecency is recalculated:

* Immediately, when a new visit is added. The user expectation here is that the
  page appears in search results after being visited. This is also valid for
  any History API that allows to add visits.
* In background on idle times, in any other case. In most cases having a
  temporarily stale value is not a problem, the main concern would be privacy
  when removing history of a page, but removing whole history will either
  completely remove the page or, if it's bookmarked, it will still be relevant.
  In this case, when a change influencing frecency happens, the ``recalc_frecency``
  database field for the page is set to ``1``.

Recalculation is done by the `PlacesFrecencyRecalculator <https://searchfox.org/mozilla-central/source/toolkit/components/places/PlacesFrecencyRecalculator.sys.mjs>`_ module.
The Recalculator is notified when ``PlacesUtils.history.shouldStartFrecencyRecalculation``
value changes from false to true, that means there's values to recalculate.
A DeferredTask is armed, that will look for a user idle opportunity
in the next 5 minutes, otherwise it will run when that time elapses.
Once all the outdated values have been recalculated
``PlacesUtils.history.shouldStartFrecencyRecalculation`` is set back to false
until the next operation invalidating a frecency.
The recalculation task is also armed on the ``idle-daily`` notification.

When the task is executed, it recalculates frecency of a chunk of pages. If
there are more pages left to recalculate, the task is re-armed. After frecency
of a page is recalculated, its ``recalc_frecency`` field is set back to ``0``.

Frecency is also decayed daily during the ``idle-daily`` notification, by
multiplying all the scores by a decay rate  of ``0.975`` (half-life of 28 days).
This guarantees entries not receiving new visits or bookmarks lose relevancy.


Adaptive Input History
======================

Input History (also known as Adaptive History) is a feature that allows to
find urls that the user previously picked. To do so, it associates search strings
with picked urls.

Adaptive results are usually presented before frecency derived results, making
them appear as having an infinite frecency.

When the user types a given string, and picks a result from the address bar, that
relation is stored and increases a use_count field for the given string.
The use_count field asymptotically approaches a max of ``10`` (the update is
done as ``use_count * .9 + 1``).

On querying, all the search strings that start with the input string are matched,
a rank is calculated per each page as ``ROUND(MAX(use_count) * (1 + (input = :search_string)), 1)``,
so that results perfectly matching the search string appear at the top.
Results with the same rank are additionally sorted by descending frecency.

On daily idles, when frecency is decayed, also input history gets decayed, in
particular the use_count field is multiplied by a decay rate  of ``0.975``.
After decaying, any entry that has a ``use_count < 0.975^90 (= 0.1)`` is removed,
thus entries are removed if unused for 90 days.