summaryrefslogtreecommitdiffstats
path: root/browser/components/urlbar/docs/ranking.rst
diff options
context:
space:
mode:
Diffstat (limited to 'browser/components/urlbar/docs/ranking.rst')
-rw-r--r--browser/components/urlbar/docs/ranking.rst229
1 files changed, 229 insertions, 0 deletions
diff --git a/browser/components/urlbar/docs/ranking.rst b/browser/components/urlbar/docs/ranking.rst
new file mode 100644
index 0000000000..a1c9d03c3c
--- /dev/null
+++ b/browser/components/urlbar/docs/ranking.rst
@@ -0,0 +1,229 @@
+=======
+Ranking
+=======
+
+Before results appear in the UrlbarView, they are fetched from providers.
+
+Each `UrlbarProvider <https://firefox-source-docs.mozilla.org/browser/urlbar/overview.html#urlbarprovider>`_
+implements its own internal ranking and returns sorted results.
+
+Externally all the results are ranked by the `UrlbarMuxer <https://searchfox.org/mozilla-central/source/browser/components/urlbar/UrlbarMuxerUnifiedComplete.jsm>`_
+according to an hardcoded list of groups and sub-grups.
+
+.. NOTE:: Preferences can influence the groups order, for example by putting
+ Firefox Suggest before Search Suggestions.
+
+The Places provider, responsible to return history and bookmark results, uses
+an internal ranking algorithm called Frecency.
+
+Frecency implementation
+=======================
+
+Frecency is a term derived from `frequency` and `recency`, its scope is to provide a
+ranking algorithm that gives importance both to how often a page is accessed and
+when it was last visited.
+Additionally, it accounts for the type of each visit through a bonus system.
+
+To account for `recency`, a bucketing system is implemented.
+If a page has been visited later than the bucket cutoff, it gets the weight
+associated with that bucket:
+
+- Up to 4 days old - weight 100 - ``places.frecency.firstBucketCutoff/Weight``
+- Up to 14 days old - weight 70 - ``places.frecency.secondBucketCutoff/Weight``
+- Up to 31 days old - weight 50 - ``places.frecency.thirdBucketCutoff/Weight``
+- Up to 90 days old - weight 30 - ``places.frecency.fourthBucketCutoff/Weight``
+- Anything else - weight 10 - ``places.frecency.defaultBucketWeight``
+
+To account for `frequency`, the total number of visits to a page is used to
+calculate the final score.
+
+The type of each visit is taken into account using specific bonuses:
+
+Default bonus
+ Any unknown type gets a default bonus. This is expected to be unused.
+ Pref ``places.frecency.defaultVisitBonus`` current value: 0.
+Embed
+ Used for embedded/framed visits not due to user actions. These visits today
+ are stored in memory and never participate to frecency calculation.
+ Thus this is currently unused.
+ Pref ``places.frecency.embedVisitBonus`` current value: 0.
+Framed Link
+ Used for cross-frame visits due to user action.
+ Pref ``places.frecency.framedLinkVisitBonus`` current value: 0.
+Download
+ Used for download visits. It’s important to support link coloring for these
+ visits, but they are not necessarily useful address bar results (the Downloads
+ view can do a better job with these), so their frecency can be low.
+ Pref ``places.frecency.downloadVisitBonus`` current value: 0.
+Reload
+ Used for reload visits (refresh same page). Low because it should not be
+ possible to influence frecency by multiple reloads.
+ Pref ``places.frecency.reloadVisitBonus`` current value: 0.
+Redirect Source
+ Used when the page redirects to another one.
+ It’s a low value because we give more importance to the final destination,
+ that is what the user actually visits, especially for permanent redirects.
+ Pref ``places.frecency.redirectSourceVisitBonus`` current value: 25.
+Temporary Redirect
+ Used for visits resulting from a temporary redirect (HTTP 307).
+ Pref ``places.frecency.tempRedirectVisitBonus`` current value: 40.
+Permanent Redirect
+ Used for visits resulting from a permanent redirect (HTTP 301). This is the
+ new supposed destination for a url, thus the bonus is higher than temporary.
+ In this case it may be advisable to just pick the bonus for the source visit.
+ Pref ``places.frecency.permRedirectVisitBonus`` current value: 50.
+Bookmark
+ Used for visits generated from bookmark views.
+ Pref ``places.frecency.bookmarkVisitBonus`` current value: 75.
+Link
+ Used for normal visits, for example when clicking on a link.
+ Pref ``places.frecency.linkVisitBonus`` current value: 100.
+Typed
+ Intended to be used for pages typed by the user, in reality it is used when
+ the user picks a url from the UI (history views or the Address Bar).
+ Pref ``places.frecency.typedVisitBonus`` current value: 2000.
+
+The above bonuses are applied to visits, in addition to that there are also a
+few bonuses applied in case a page is not visited at all, both of these bonuses
+can be applied at the same time:
+
+Unvisited bookmarked page
+ Used for pages that are bookmarked but unvisited.
+ Pref ``places.frecency.unvisitedBookmarkBonus`` current value: 140.
+Unvisited typed page
+ Used for pages that were typed and now are bookmarked (otherwise they would
+ be orphans).
+ Pref ``places.frecency.unvisitedTypedBonus`` current value: 200.
+
+Two special frecency values are also defined:
+
+- ``-1`` represents a just inserted entry in the database, whose score has not
+ been calculated yet.
+- ``0`` represents an entry for which a new value should not be calculated,
+ because it has a poor user value (e.g. place: queries) among search results.
+
+Finally, because calculating a score from all of the visits every time a new
+visit is added would be expensive, only a sample of the last 10
+(pref ``places.frecency.numVisits``) visits is used.
+
+How frecency for a page is calculated
+-------------------------------------
+
+.. mermaid::
+ :align: center
+ :caption: Frecency calculation flow
+
+ flowchart TD
+ start[URL]
+ a0{Has visits?}
+ a1[Get last 10 visit]
+ a2[bonus = unvisited_bonus + bookmarked + typed]
+ a3{bonus > 0?}
+ end0[Frecency = 0]
+ end1["frecency = age_bucket_weight * (bonus / 100)"]
+ a4[Sum points of all sampled visits]
+ a5{points > 0?}
+ end2[frecency = -1]
+ end3["Frecency = visit_count * (points / sample_size)"]
+ subgraph sub [Per each visit]
+ sub0[bonus = visit_type_bonus]
+ sub1{bookmarked?}
+ sub2[add bookmark bonus]
+ sub3["score = age_bucket_weight * (bonus / 100)"]
+ sub0 --> sub1
+ sub1 -- yes --> sub2
+ sub1 -- no --> sub3
+ sub2 --> sub3;
+ end
+ start --> a0
+ a0 -- no --> a2
+ a2 --> a3
+ a3 -- no --> end0
+ a3 -- yes --> end1
+ a0 -- yes --> a1
+ a1 --> sub
+ sub --> a4
+ a4 --> a5
+ a5 -- no --> end2
+ a5 -- yes --> end3
+
+1. If the page is visited, get a sample of ``NUM_VISITS`` most recent visits.
+2. For each visit get a transition bonus, depending on the visit type.
+3. If the page is bookmarked, add to the bonus an additional bookmark bonus.
+4. If the bonus is positive, get a bucket weight depending on the visit date.
+5. Calculate points for the visit as ``age_bucket_weight * (bonus / 100)``.
+6. Sum points for all the sampled visits.
+7. If the points sum is zero, return a ``-1`` frecency, it will still appear in the UI.
+ Otherwise, frecency is ``visitCount * points / NUM_VISITS``.
+8. If the page is unvisited and not bookmarked, or it’s a bookmarked place-query,
+ return a ``0`` frecency, to hide it from the UI.
+9. If it’s bookmarked, add the bookmark bonus.
+10. If it’s also a typed page, add the typed bonus.
+11. Frecency is ``age_bucket_weight * (bonus / 100)``
+
+When frecency for a page is calculated
+--------------------------------------
+
+Operations that may influence the frecency score are:
+
+* Adding visits
+* Removing visits
+* Adding bookmarks
+* Removing bookmarks
+* Changing the url of a bookmark
+
+Frecency is recalculated:
+
+* Immediately, when a new visit is added. The user expectation here is that the
+ page appears in search results after being visited. This is also valid for
+ any History API that allows to add visits.
+* In background on idle times, in any other case. In most cases having a
+ temporarily stale value is not a problem, the main concern would be privacy
+ when removing history of a page, but removing whole history will either
+ completely remove the page or, if it's bookmarked, it will still be relevant.
+ In this case, when a change influencing frecency happens, the ``recalc_frecency``
+ database field for the page is set to ``1``.
+
+Recalculation is done by the `PlacesFrecencyRecalculator <https://searchfox.org/mozilla-central/source/toolkit/components/places/PlacesFrecencyRecalculator.sys.mjs>`_ module.
+The Recalculator is notified when ``PlacesUtils.history.shouldStartFrecencyRecalculation``
+value changes from false to true, that means there's values to recalculate.
+A DeferredTask is armed, that will look for a user idle opportunity
+in the next 5 minutes, otherwise it will run when that time elapses.
+Once all the outdated values have been recalculated
+``PlacesUtils.history.shouldStartFrecencyRecalculation`` is set back to false
+until the next operation invalidating a frecency.
+The recalculation task is also armed on the ``idle-daily`` notification.
+
+When the task is executed, it recalculates frecency of a chunk of pages. If
+there are more pages left to recalculate, the task is re-armed. After frecency
+of a page is recalculated, its ``recalc_frecency`` field is set back to ``0``.
+
+Frecency is also decayed daily during the ``idle-daily`` notification, by
+multiplying all the scores by a decay rate of ``0.975`` (half-life of 28 days).
+This guarantees entries not receiving new visits or bookmarks lose relevancy.
+
+
+Adaptive Input History
+======================
+
+Input History (also known as Adaptive History) is a feature that allows to
+find urls that the user previously picked. To do so, it associates search strings
+with picked urls.
+
+Adaptive results are usually presented before frecency derived results, making
+them appear as having an infinite frecency.
+
+When the user types a given string, and picks a result from the address bar, that
+relation is stored and increases a use_count field for the given string.
+The use_count field asymptotically approaches a max of ``10`` (the update is
+done as ``use_count * .9 + 1``).
+
+On querying, all the search strings that start with the input string are matched,
+a rank is calculated per each page as ``ROUND(MAX(use_count) * (1 + (input = :search_string)), 1)``,
+so that results perfectly matching the search string appear at the top.
+Results with the same rank are additionally sorted by descending frecency.
+
+On daily idles, when frecency is decayed, also input history gets decayed, in
+particular the use_count field is multiplied by a decay rate of ``0.975``.
+After decaying, any entry that has a ``use_count < 0.975^90 (= 0.1)`` is removed,
+thus entries are removed if unused for 90 days.