diff options
Diffstat (limited to 'browser/components/urlbar/docs/ranking.rst')
-rw-r--r-- | browser/components/urlbar/docs/ranking.rst | 229 |
1 files changed, 229 insertions, 0 deletions
diff --git a/browser/components/urlbar/docs/ranking.rst b/browser/components/urlbar/docs/ranking.rst new file mode 100644 index 0000000000..a1c9d03c3c --- /dev/null +++ b/browser/components/urlbar/docs/ranking.rst @@ -0,0 +1,229 @@ +======= +Ranking +======= + +Before results appear in the UrlbarView, they are fetched from providers. + +Each `UrlbarProvider <https://firefox-source-docs.mozilla.org/browser/urlbar/overview.html#urlbarprovider>`_ +implements its own internal ranking and returns sorted results. + +Externally all the results are ranked by the `UrlbarMuxer <https://searchfox.org/mozilla-central/source/browser/components/urlbar/UrlbarMuxerUnifiedComplete.jsm>`_ +according to an hardcoded list of groups and sub-grups. + +.. NOTE:: Preferences can influence the groups order, for example by putting + Firefox Suggest before Search Suggestions. + +The Places provider, responsible to return history and bookmark results, uses +an internal ranking algorithm called Frecency. + +Frecency implementation +======================= + +Frecency is a term derived from `frequency` and `recency`, its scope is to provide a +ranking algorithm that gives importance both to how often a page is accessed and +when it was last visited. +Additionally, it accounts for the type of each visit through a bonus system. + +To account for `recency`, a bucketing system is implemented. +If a page has been visited later than the bucket cutoff, it gets the weight +associated with that bucket: + +- Up to 4 days old - weight 100 - ``places.frecency.firstBucketCutoff/Weight`` +- Up to 14 days old - weight 70 - ``places.frecency.secondBucketCutoff/Weight`` +- Up to 31 days old - weight 50 - ``places.frecency.thirdBucketCutoff/Weight`` +- Up to 90 days old - weight 30 - ``places.frecency.fourthBucketCutoff/Weight`` +- Anything else - weight 10 - ``places.frecency.defaultBucketWeight`` + +To account for `frequency`, the total number of visits to a page is used to +calculate the final score. + +The type of each visit is taken into account using specific bonuses: + +Default bonus + Any unknown type gets a default bonus. This is expected to be unused. + Pref ``places.frecency.defaultVisitBonus`` current value: 0. +Embed + Used for embedded/framed visits not due to user actions. These visits today + are stored in memory and never participate to frecency calculation. + Thus this is currently unused. + Pref ``places.frecency.embedVisitBonus`` current value: 0. +Framed Link + Used for cross-frame visits due to user action. + Pref ``places.frecency.framedLinkVisitBonus`` current value: 0. +Download + Used for download visits. It’s important to support link coloring for these + visits, but they are not necessarily useful address bar results (the Downloads + view can do a better job with these), so their frecency can be low. + Pref ``places.frecency.downloadVisitBonus`` current value: 0. +Reload + Used for reload visits (refresh same page). Low because it should not be + possible to influence frecency by multiple reloads. + Pref ``places.frecency.reloadVisitBonus`` current value: 0. +Redirect Source + Used when the page redirects to another one. + It’s a low value because we give more importance to the final destination, + that is what the user actually visits, especially for permanent redirects. + Pref ``places.frecency.redirectSourceVisitBonus`` current value: 25. +Temporary Redirect + Used for visits resulting from a temporary redirect (HTTP 307). + Pref ``places.frecency.tempRedirectVisitBonus`` current value: 40. +Permanent Redirect + Used for visits resulting from a permanent redirect (HTTP 301). This is the + new supposed destination for a url, thus the bonus is higher than temporary. + In this case it may be advisable to just pick the bonus for the source visit. + Pref ``places.frecency.permRedirectVisitBonus`` current value: 50. +Bookmark + Used for visits generated from bookmark views. + Pref ``places.frecency.bookmarkVisitBonus`` current value: 75. +Link + Used for normal visits, for example when clicking on a link. + Pref ``places.frecency.linkVisitBonus`` current value: 100. +Typed + Intended to be used for pages typed by the user, in reality it is used when + the user picks a url from the UI (history views or the Address Bar). + Pref ``places.frecency.typedVisitBonus`` current value: 2000. + +The above bonuses are applied to visits, in addition to that there are also a +few bonuses applied in case a page is not visited at all, both of these bonuses +can be applied at the same time: + +Unvisited bookmarked page + Used for pages that are bookmarked but unvisited. + Pref ``places.frecency.unvisitedBookmarkBonus`` current value: 140. +Unvisited typed page + Used for pages that were typed and now are bookmarked (otherwise they would + be orphans). + Pref ``places.frecency.unvisitedTypedBonus`` current value: 200. + +Two special frecency values are also defined: + +- ``-1`` represents a just inserted entry in the database, whose score has not + been calculated yet. +- ``0`` represents an entry for which a new value should not be calculated, + because it has a poor user value (e.g. place: queries) among search results. + +Finally, because calculating a score from all of the visits every time a new +visit is added would be expensive, only a sample of the last 10 +(pref ``places.frecency.numVisits``) visits is used. + +How frecency for a page is calculated +------------------------------------- + +.. mermaid:: + :align: center + :caption: Frecency calculation flow + + flowchart TD + start[URL] + a0{Has visits?} + a1[Get last 10 visit] + a2[bonus = unvisited_bonus + bookmarked + typed] + a3{bonus > 0?} + end0[Frecency = 0] + end1["frecency = age_bucket_weight * (bonus / 100)"] + a4[Sum points of all sampled visits] + a5{points > 0?} + end2[frecency = -1] + end3["Frecency = visit_count * (points / sample_size)"] + subgraph sub [Per each visit] + sub0[bonus = visit_type_bonus] + sub1{bookmarked?} + sub2[add bookmark bonus] + sub3["score = age_bucket_weight * (bonus / 100)"] + sub0 --> sub1 + sub1 -- yes --> sub2 + sub1 -- no --> sub3 + sub2 --> sub3; + end + start --> a0 + a0 -- no --> a2 + a2 --> a3 + a3 -- no --> end0 + a3 -- yes --> end1 + a0 -- yes --> a1 + a1 --> sub + sub --> a4 + a4 --> a5 + a5 -- no --> end2 + a5 -- yes --> end3 + +1. If the page is visited, get a sample of ``NUM_VISITS`` most recent visits. +2. For each visit get a transition bonus, depending on the visit type. +3. If the page is bookmarked, add to the bonus an additional bookmark bonus. +4. If the bonus is positive, get a bucket weight depending on the visit date. +5. Calculate points for the visit as ``age_bucket_weight * (bonus / 100)``. +6. Sum points for all the sampled visits. +7. If the points sum is zero, return a ``-1`` frecency, it will still appear in the UI. + Otherwise, frecency is ``visitCount * points / NUM_VISITS``. +8. If the page is unvisited and not bookmarked, or it’s a bookmarked place-query, + return a ``0`` frecency, to hide it from the UI. +9. If it’s bookmarked, add the bookmark bonus. +10. If it’s also a typed page, add the typed bonus. +11. Frecency is ``age_bucket_weight * (bonus / 100)`` + +When frecency for a page is calculated +-------------------------------------- + +Operations that may influence the frecency score are: + +* Adding visits +* Removing visits +* Adding bookmarks +* Removing bookmarks +* Changing the url of a bookmark + +Frecency is recalculated: + +* Immediately, when a new visit is added. The user expectation here is that the + page appears in search results after being visited. This is also valid for + any History API that allows to add visits. +* In background on idle times, in any other case. In most cases having a + temporarily stale value is not a problem, the main concern would be privacy + when removing history of a page, but removing whole history will either + completely remove the page or, if it's bookmarked, it will still be relevant. + In this case, when a change influencing frecency happens, the ``recalc_frecency`` + database field for the page is set to ``1``. + +Recalculation is done by the `PlacesFrecencyRecalculator <https://searchfox.org/mozilla-central/source/toolkit/components/places/PlacesFrecencyRecalculator.sys.mjs>`_ module. +The Recalculator is notified when ``PlacesUtils.history.shouldStartFrecencyRecalculation`` +value changes from false to true, that means there's values to recalculate. +A DeferredTask is armed, that will look for a user idle opportunity +in the next 5 minutes, otherwise it will run when that time elapses. +Once all the outdated values have been recalculated +``PlacesUtils.history.shouldStartFrecencyRecalculation`` is set back to false +until the next operation invalidating a frecency. +The recalculation task is also armed on the ``idle-daily`` notification. + +When the task is executed, it recalculates frecency of a chunk of pages. If +there are more pages left to recalculate, the task is re-armed. After frecency +of a page is recalculated, its ``recalc_frecency`` field is set back to ``0``. + +Frecency is also decayed daily during the ``idle-daily`` notification, by +multiplying all the scores by a decay rate of ``0.975`` (half-life of 28 days). +This guarantees entries not receiving new visits or bookmarks lose relevancy. + + +Adaptive Input History +====================== + +Input History (also known as Adaptive History) is a feature that allows to +find urls that the user previously picked. To do so, it associates search strings +with picked urls. + +Adaptive results are usually presented before frecency derived results, making +them appear as having an infinite frecency. + +When the user types a given string, and picks a result from the address bar, that +relation is stored and increases a use_count field for the given string. +The use_count field asymptotically approaches a max of ``10`` (the update is +done as ``use_count * .9 + 1``). + +On querying, all the search strings that start with the input string are matched, +a rank is calculated per each page as ``ROUND(MAX(use_count) * (1 + (input = :search_string)), 1)``, +so that results perfectly matching the search string appear at the top. +Results with the same rank are additionally sorted by descending frecency. + +On daily idles, when frecency is decayed, also input history gets decayed, in +particular the use_count field is multiplied by a decay rate of ``0.975``. +After decaying, any entry that has a ``use_count < 0.975^90 (= 0.1)`` is removed, +thus entries are removed if unused for 90 days. |