diff --git a/widget/windows/docs/windows-pointing-device/index.rst b/widget/windows/docs/windows-pointing-device/index.rst
new file mode 100644
index 0000000000..eda552b3dd
--- /dev/null
+++ b/widget/windows/docs/windows-pointing-device/index.rst
@@ -0,0 +1,1384 @@
+################################################################################
+Windows Pointing Device Support in Firefox
+################################################################################
+
+.. contents:: Table of Contents
+ :depth: 4
+
+================================================================================
+Introduction
+================================================================================
+
+This document is intended to provide the reader with a quick primer and/or
+refresher on pointing devices and the various operating system APIs, user
+experience guidelines, and Web standards that contribute to the way Firefox
+handles input devices on Microsoft Windows.
+
+The documentation for these things is scattered across the web and has varying
+levels of detail and completeness; some of it is missing or ambiguous and was
+only determined experimentally or by reading about other people's experiences
+through forum posts. An explicit goal of this document is to gather this
+information into a cohesive picture.
+
+We will then discuss the ways in which Firefox currently (as of early 2023)
+produces incorrect or suboptimal behavior when implementing those standards
+and guidelines.
+
+Finally, we will raise some thoughts and questions to spark discussion on how
+we might improve the situation and handle corner cases. Some of
+these issues are intrinsically "opinion based" or "policy based", so clear
+direction on these is desirable before engineering effort is invested into
+reimplementation.
+
+
+================================================================================
+Motivation
+================================================================================
+
+A quick look at the `pile of defects <https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&status_whiteboard=%5Bwin%3Atouch%5D&list_id=16586149&status_whiteboard_type=allwordssubstr>`__
+on *bugzilla.mozilla.org* marked with *[win:touch]* will show anyone that
+Firefox's input stack for pointer devices has issues, but the bugs recorded
+there don't begin to capture the full range of unreported glitches and
+difficult-to-reproduce hiccups that users run into while using touchscreen
+hardware and pen digitizers on Firefox, nor do they capture the ways that
+Firefox deviates from various W3C standards that are (luckily) either
+rarely used or worked around in web apps (and thus go undetected or
+unreported).
+
+These bugs primarily manifest in a few ways that will each be discussed in
+their own section:
+
+1. Firefox failing to return the proper values for the ``pointer``,
+ ``any-pointer``, ``hover``, and ``any-hover`` CSS Media Queries
+
+2. Firefox failing to fire the correct pointer-related DOM events at the
+ correct time (or at all)
+
+3. Firefox's inconsistent handling of touch-related gestures like scrolling,
+ where certain machines (like the Surface Pro) fail to meet the expected
+ behavior of scrolling inertia and overscroll. This leads to a weird touch
+ experience where the page comes to a choppy dead stop when using
+ single-finger scrolling
+
+
+It's worth noting that Firefox is not alone in having these types of issues,
+and that handling input devices is a notoriously difficult task for many
+applications; even a substantial amount of Microsoft's own software has trouble
+navigating this minefield on their own Microsoft Surface devices. Defects are
+instigated by a combination of the *intrinsic complexity* of the problem domain
+and the *accidental complexity* introduced by device vendors and Windows
+itself.
+
+The *intrinsic complexity* comes from the simple fact that human-machine
+interaction is difficult. A person must attempt to convey complex
+and abstract goals through a series of simple movements involving a few pieces
+of physical hardware. The devices can send signals that are unclear
+or even contradictory, and the software must decide how to handle
+this.
+
+As a trivial example, every software engineer that's ever written
+page scrolling logic has to answer the question, "What should my
+program do if the user hits 'Page Up' and 'Page Down' at the same time?".
+While it may seem obvious that the answer is "Do nothing.", naively-written
+keyboard input logic might assume the two are mutually-exclusive and only
+process whichever key is handled first in program order.
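+
+The pitfall above can be sketched in a few lines. This is a minimal
+illustration rather than Firefox code: ``PageKeys`` and ``ResolvePageScroll``
+are hypothetical names, and the snapshot of held keys is assumed to come from
+whatever keyboard state the application already tracks.
+
+.. code-block:: cpp
+
+    // Hypothetical snapshot of the two keys; not a Firefox or Windows type.
+    struct PageKeys {
+      bool pageUpHeld;
+      bool pageDownHeld;
+    };
+
+    // Returns -1 to scroll up one page, +1 to scroll down one page, and 0
+    // when neither key is held or when both are held (contradictory input).
+    inline int ResolvePageScroll(PageKeys keys) {
+      if (keys.pageUpHeld == keys.pageDownHeld) {
+        return 0;
+      }
+      return keys.pageUpHeld ? -1 : 1;
+    }
+
+Comparing the two flags first makes the contradictory case explicit, instead
+of letting the outcome depend on whichever branch happens to be tested first.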
+
+Occasionally, a new device will be invented that doesn't obviously map to
+existing abstractions and input pipelines. There will be a period of time where
+applications will want to support the new device, but it won't be well
+understood by either the application developers or the device vendor
+themselves what ideal integration would look like. The new Apple Vision VR
+headset is such a device; traditional VR headsets have used controllers to
+point at things, but Apple insists that the entire thing should be done using
+only hand tracking and eye tracking. Developers of VR video games and other
+apps (like Firefox) will inevitably make many mistakes on the road to
+supporting this new headset.
+
+A major source of defect-causing *accidental complexity* is the lack of clear
+expectations and documentation from Microsoft for apps (like Firefox) that are
+not using their Universal Windows Platform (UWP). The Microsoft Developer
+Network (MSDN) mentions concepts like inertia, overscroll, elastic bounce,
+single-finger panning, etc., but the solution is presented in the context
+of UWP, and the solution for non-UWP apps is either unclear or undocumented.
+
+Adding to this complexity is the fact that Windows itself has gone through
+several iterations of input APIs for different classes of devices, and
+these APIs interact with each other in ways that are surprising or
+unintuitive. Again, the advice given on MSDN pertains to UWP apps, and the
+documentation about the newer "pointer" based window messages is both
+incomplete and, in places, inaccurate.
+
+Finally, individual input devices have bugs in their driver software that
+would disrupt even applications that are using the Windows input APIs perfectly.
+Handling all of these deviations is impossible and would result in fragile,
+unmaintainable code, but Firefox inevitably has to work around common ones to
+avoid alienating large portions of the userbase.
+
+
+================================================================================
+Technical Background
+================================================================================
+
+
+A Quick Primer on Pointing Devices
+======================================
+
+
+Traditionally, web browsers were designed to accommodate computer mice and
+devices that behave in a similar way, like trackballs and touchpads on
+laptops. Generally, it was assumed that there would be one such device attached
+to the computer, and it would be used to control a hovering "cursor" whose
+movements would be changed by relative movement of the physical input device.
+
+However, modern computers can be controlled using a variety of different
+pointing devices, all with different characteristics. Many allow
+multiple concurrent targets to be pointed at and have multiple sensors,
+buttons, and other actuators.
+
+For example, the screen of the Microsoft Surface Pro has dual capabilities
+of being a touch sensor and a digitizer for a tablet pen. When being used as a
+workstation, it's not uncommon for a user to also connect the "keyboard +
+touchpad" cover and a mouse (via USB or Bluetooth) to provide the more
+productivity-oriented "keyboard and mouse" setup. In that configuration, there
+are 4 pointer devices connected to the machine simultaneously: a touch screen,
+a pen digitizer, a touchpad, and a mouse.
+
+The next section will give a quick overview of common pointing devices.
+Many will be familiar to the reader, but they are still mentioned to establish
+common terminology and to avoid making assumptions about familiarity with every
+input device.
+
+
+Common Pointing Devices
+---------------------------
+
+Here are some descriptions of a few pointing device types that demonstrate
+the diversity of hardware:
+
+**Touchscreen**
+
+ A touchscreen is a computer display that is able to sense the
+ location of one or more fingers (or a stylus) making contact with its
+ surface. Software can then respond to the touches by changing the displayed
+ objects quickly, giving the user a sense of actually physically manipulating
+ them on screen with their hands.
+
+ .. image:: touchscreen.jpg
+ :width: 25%
+
+
+**Digitizing Tablet + Pen Stylus**
+
+ These advanced pointing devices tend to
+ exist in two forms: as an external sensing "pad" that can be plugged into a
+ computer and sits on a desk or in someone's lap, or as a sensor built right
+ into a computer display. Both use a "stylus", which is a pen-shaped
+ electronic device that is detectable by the surface. Common features
+ include the ability to distinguish proximity to the surface ("hovering")
+ versus actual contact, pressure sensitivity, angle/tilt detection, multiple
+ "ends" such as a tip and an eraser, and one-or-more buttons/switch
+ actuators.
+
+ .. image:: wacom_tablet.png
+ :width: 25%
+
+
+**Joystick/Pointer Stick**
+
+ Pointer sticks are most often seen in laptop
+ computers made by IBM/Lenovo, where they exist as a little red nub located
+ between the G, H, and B keys on a standard QWERTY keyboard. They function
+ similarly to the analog sticks on a game controller: the user displaces
+ the stick from its center position, and that is interpreted as a relative
+ direction to move the on-screen cursor. A greater displacement from center
+ is interpreted as increased velocity of movement.
+
+ .. image:: trackpoint.jpg
+ :width: 25%
+
+
+**Touchpad**
+
+ A touchpad is a rectangular surface (often found on laptop
+ computers) that detects touch and motion of a finger and moves an on-screen
+ cursor relative to the motion. Modern touchpads often support multiple
+ touches simultaneously, and therefore offer functionality that is quite
+ similar to a touchscreen, albeit with different movement semantics because
+ of their physical separation from the screen (discussed below).
+
+ .. image:: touchpad.jpg
+ :width: 25%
+
+
+**VR Controllers**
+
+ VR controllers (and other similar devices like the
+ Wiimote from the Nintendo Wii) allow users to point at objects in a
+ three-dimensional virtual world by moving a real-world controller and
+ "projecting" the controller's position into the virtual space. They often
+ also include sensors to detect the yaw, pitch, and roll of the controller.
+ There are often other inputs in the controller device, like analog sticks
+ and buttons.
+
+ .. image:: vrcontroller.jpg
+ :width: 25%
+
+
+**Hand Tracking**
+
+ Devices like the Apple Vision (introduced during the
+ time this document was being written) and (to a lesser extent) the Meta
+ Quest have the ability to track the wearer's hand and directly interpret
+ gestures and movements as input. As the human hand can assume a staggering
+ number of orientations and configurations, a finite list of specific shapes
+ and movements must be identified and labelled to allow for clear
+ software-user interaction.
+
+ .. image:: apple_vision_user.webp
+ :width: 25%
+
+ .. image:: apple_vision.jpg
+ :width: 25%
+
+
+**Mouse**
+
+ A pointing device that needs no introduction. Moving a physical
+ clam-shaped device across a surface translates to relative movement of a
+ cursor on screen.
+
+ .. image:: mouse.jpg
+ :width: 25%
+
+
+The Buxton Three-State Model
+-------------------------------
+
+
+Bill Buxton, an early pioneer in the field of human-computer interaction,
+came up with a three-state model for pointing devices; a device can be
+"Out of Range", "Tracking", or "Dragging". Not all devices support all three
+states, and some devices have multiple actuators that can have the three-state
+model individually applied.
+
+.. mermaid::
+
+ stateDiagram-v2
+ direction LR
+ state "State 0" as s0
+ state "State 1" as s1
+ state "State 2" as s2
+ s0 --> s0 : Out Of Range
+ s1 --> s1 : Tracking
+ s2 --> s2 : Dragging
+ s0 --> s1 : Stylus On
+ s1 --> s0 : Stylus Lift
+ s1 --> s2 : Tip Switch Close
+ s2 --> s1 : Tip Switch Open
+
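+The edges in the diagram can also be written as a small transition function.
+The following is a sketch under stated assumptions: ``BuxtonState``,
+``BuxtonSignal``, and ``NextState`` are names invented for this illustration
+(they are not Windows or Firefox types), the code requires C++17 for
+``std::optional``, and rejecting any edge not drawn in the diagram is a choice
+made for this sketch rather than part of the model itself.
+
+.. code-block:: cpp
+
+    #include <optional>
+
+    // The three states of the model, named as in the diagram above.
+    enum class BuxtonState { OutOfRange, Tracking, Dragging };
+
+    // The four signals that move a device between states.
+    enum class BuxtonSignal { StylusOn, StylusLift, TipSwitchClose, TipSwitchOpen };
+
+    // Returns the state reached by applying `signal` in `state`, or
+    // std::nullopt if the diagram has no such edge (for example, a tip
+    // switch closing while the stylus is out of range).
+    inline std::optional<BuxtonState> NextState(BuxtonState state,
+                                                BuxtonSignal signal) {
+      switch (state) {
+        case BuxtonState::OutOfRange:
+          if (signal == BuxtonSignal::StylusOn) return BuxtonState::Tracking;
+          break;
+        case BuxtonState::Tracking:
+          if (signal == BuxtonSignal::StylusLift) return BuxtonState::OutOfRange;
+          if (signal == BuxtonSignal::TipSwitchClose) return BuxtonState::Dragging;
+          break;
+        case BuxtonState::Dragging:
+          if (signal == BuxtonSignal::TipSwitchOpen) return BuxtonState::Tracking;
+          break;
+      }
+      return std::nullopt;
+    }
+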
+
+For demonstration, here is the model applied to a few devices:
+
+**Computer Mouse**
+
+ A mouse is never in the "Out of Range" state. Even though it can technically
+ be lifted off its surface, the mouse does not report this as a separate
+ condition; instead, it behaves as if it is stationary until it can once
+ again sense the surface moving underneath.
+
+ The remaining two states apply to each button individually; when a button is
+ not being pressed, the mouse is considered in the "tracking" state with
+ respect to that button. When a button is held down, the mouse is "dragging"
+ with respect to that button. A "click" is simply considered a zero-length
+ drag under this model.
+
+ In the case of a two-button mouse, this means that the mouse can be in a
+ total of 4 different states: tracking, left button dragging, right button
+ dragging, and two-button dragging. In practice, very little software
+ actually does anything meaningful with two-button dragging.
+
+**Touch Screen**
+
+ Applying the model to a touch screen, one can observe that current hardware
+ has no way to sense a finger that is "hovering, but not quite making
+ contact with the screen". This means that the "Tracking" state can be ruled
+ out, leaving only the "Out of Range" and "Dragging" states. Since many touch
+ screens can support multiple fingers touching the screen concurrently, and
+ each finger can be in one of two states, there are potentially 2^N different
+ "states" that a touchscreen can be in. Windows assigns meaning to many two-,
+ three-, and four-finger gestures.
+
+**Tablet Digitizer**
+
+ A tablet digitizer supports all three states: when the stylus is far away
+ from the surface, it is considered "out of range"; when it is located
+ slightly above the surface, it is "tracking"; and when it is making contact
+ with the surface, it is "dragging".
+
+The W3C standards for pointing devices are based on this three-state model, but
+applied to each individual web element instead of the entire system. This
+makes things like "Out-of-Range" possible for the mouse, since it can be
+out of range of a web element.
+
+The W3C uses the terms "over" and "out" to convey the transition between
+"out-of-range" and "tracking" (which the W3C calls "hover"), and the terms
+"down" and "up" convey the transition between "tracking" and "dragging".
+
+The standards also address some of the known shortcomings of the model to
+improve portability and consistency; these improvements will be discussed more
+below.
+
+The Windows Pointer API is *supposedly* based around this model,
+but unfortunately real-world testing shows that the model is not followed
+very consistently with respect to the actual signals sent to the application.
+
+
+Gestures
+=====================================
+
+
+In contrast to the sort-of "anything goes" UI designs of the past,
+modern operating systems like Windows, Mac OS X, iOS, Android, and even
+modern Linux DEs have an "opinionated" idea of how user interaction
+should behave across all apps on the platform (the so-called "look and feel"
+of the operating system).
+
+Users expect gestures like swipes, pinches, and taps to act the same way
+across all apps for a given operating system, and they expect things like
+on-screen keyboards or handwriting recognition to pop up in certain contexts.
+Failing to meet those expectations makes an app look less polished, and
+(especially as far as accessibility is concerned) it frustrates the user
+and makes it more difficult for them to interact with the app.
+
+Microsoft defines guidelines for various behaviours that Windows applications
+should ideally adhere to in the `Input and Interactions <https://learn.microsoft.com/en-us/windows/apps/design/input/>`__
+section on MSDN. Some of these are summarized quickly below:
+
+**Drag and Drop**
+
+ Drag and drop allows a user to transfer data from one application to
+ another. The gesture begins when a pointer device moves into the "Dragging"
+ state over top of a UI element, usually as a result of holding down a mouse
+ button or pressing a finger on a touchscreen. The user moves the pointer
+ over top of the receiver of the data, and then ends the gesture by releasing
+ the mouse button or lifting their finger off the touchscreen. Windows
+ interprets this transition out of the "Dragging" state as permission to
+ initiate the data transfer.
+
+ Firefox has supported Drag and Drop for a very long time, so it will not be
+ discussed further.
+
+
+**Pan and Zoom**
+
+ When using touchscreens (and multi-touch touchpads), users expect to be able
+ to cause the viewport to "pan" left/right/up/down by pressing two fingers on
+ the screen (creating two pointers in "Dragging" state) and moving their
+ fingers in the direction of movement. When they are done, they can release
+ both fingers (changing both pointers to "Out of Range").
+
+ A zoom can be signalled by moving the two fingers apart or together
+ in a "pinch" or "reverse pinch" gesture.
+
+
+**Single Pointer Panning**
+
+ Applications that are based on a UI model of the user interacting with a
+ "page" often allow a single pointer "Dragging" over the viewport to cause
+ the viewport to pan, similarly to the two-finger panning discussed in the
+ previous section.
+
+ Note that this gesture is not as universal as two-finger panning is -- as a
+ counterexample, graphics programs tend to treat one-finger dragging as
+ object manipulation and two-finger dragging as viewport panning.
+
+
+**Inertia**
+
+ When a user is done panning, they may lift their finger/pen off the screen
+ while the viewport is still in motion. Users expect that the page will
+ continue to move for a little while, as if the user had "tossed" the page
+ when they let go. Effectively, the page behaves as though it has "momentum"
+ that needs to be gradually lost before the page comes to a full stop.
+
+ Modern operating systems provide this behavior via their various native
+ widget toolkits, and the deceleration curve that objects follow as they slow
+ to a stop differs across OSes. In that way, it can be considered part of the
+ unique "look and feel" of the OS. Users expect the scrolling of pages in
+ their web browser to behave this way, and so when Firefox fails to provide
+ this behavior it can be jarring.
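+
+ One simple way to model this "momentum" is exponential decay of the release
+ velocity. The sketch below is only an illustration of that idea: the
+ ``InertiaParams`` type, its ``decayPerSecond`` and ``stopSpeed`` fields, and
+ both functions are hypothetical, and the curve is an assumption rather than
+ the one Windows or any other OS actually uses.
+
+ .. code-block:: cpp
+
+     #include <cmath>
+
+     // Hypothetical tuning values; not taken from any operating system.
+     struct InertiaParams {
+       double decayPerSecond;  // Larger values make the page stop sooner.
+       double stopSpeed;       // Speeds below this (px/s) count as stopped.
+     };
+
+     // Velocity (px/s) `seconds` after the finger is lifted at `v0` px/s.
+     inline double VelocityAfter(double v0, double seconds,
+                                 const InertiaParams& params) {
+       double v = v0 * std::exp(-params.decayPerSecond * seconds);
+       return std::fabs(v) < params.stopSpeed ? 0.0 : v;
+     }
+
+     // Distance (px) travelled `seconds` after release: the closed-form
+     // integral of the decaying velocity, ignoring the stop threshold.
+     inline double DistanceAfter(double v0, double seconds,
+                                 const InertiaParams& params) {
+       return (v0 / params.decayPerSecond) *
+              (1.0 - std::exp(-params.decayPerSecond * seconds));
+     }
+
+ With this model the total coast distance approaches ``v0 / decayPerSecond``,
+ so a faster "toss" carries the page proportionally further.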
+
+
+**Overscroll and Elastic Bounce**
+
+ When a user is panning the page and reaches the outer edges, Microsoft
+ recommends that the app should begin an "elastic bounce" animation, where
+ the page will allow the user to scroll past the end ("overscroll"),
+ show empty space underneath the page, and then sort of "snap back" like a
+ rubber band that's been stretched and then released. You can see a
+ demonstration in `this article <https://www.windowslatest.com/2020/05/21/microsoft-is-adding-elastic-scrolling-to-chrome-on-windows-10/>`__,
+ which discusses Microsoft adding it to Chromium.
+
+
+History of Web Standards and Windows APIs
+===========================================
+
+The World-Wide Web Consortium (W3C) and the Web Hypertext Application
+Technology Working Group (WHATWG) manage the standards that detail the
+interface between a user agent (like Firefox) and applications designed to run
+on the Web Platform. The user agent, in turn, must rely on the operating system
+(Windows, in this case) to provide the necessary APIs to implement the
+standards required by the Web Platform.
+
+As a result of that relationship, a Web Standard is unlikely to be created
+until all widely-used operating systems provide the required APIs. That allows
+us to build a linear timeline with a predictable pattern: a new type of device
+becomes popular, the APIs to support it are introduced into operating systems,
+and eventually a cross-platform standard is introduced into the Web Platform.
+
+The following sections detail the history of input devices supported by
+Windows and the Web Platform:
+
+
+**1985 - Computer Mouse Support (Windows 1.0)**
+
+ The first version of Windows (1985) supported a computer mouse. Support
+ for other input devices is not well-documented, but probably nonexistent.
+
+
+**1991 - Third-Party De-facto Pen Support (Wintab)**
+
+ In the late 80s and early 90s, any tablet pen hardware vendor that wanted
+ to support Windows would need to write a device driver and design a
+ proprietary user-mode API to expose the device to user applications. In
+ turn, application developers would have to write and maintain code to
+ support the APIs of every relevant device vendor.
+
+ In 1991, a company named LCS/Telegraphics released an API for Windows
+ called "Wintab", which was designed in collaboration with hardware and
+ software vendors to define a general API that could be targeted by
+ device drivers and applications.
+
+ It would take Microsoft more than a decade to include first-party support
+ for tablet pens in Windows, which allowed Wintab to become the de-facto
+ standard for pen support on Windows. The Wintab API continues to be
+ supported by virtually all artist tablets to this day. Notable companies
+ include Wacom, Huion, XP-Pen, etc.
+
+
+**1992 - Early Windows Pen Support (Windows for Pen Computing)**
+
+ The earliest Windows operating system to support non-mouse pointing devices
+ was Windows 3.1 with the "Windows for Pen Computing" add-on (1992).
+ (`For the curious <https://socket3.wordpress.com/2019/07/31/windows-for-pen-computing-1-0/>`__,
+ and I'm certain `this book <https://www.amazon.com/Microsoft-Windows-Pen-Computing-Programmers/dp/1556154690>`__
+ is a must-read!). Pen support was mostly implemented by translating actions
+ into the existing ``WM_MOUSExxx`` messages, but also "upgraded" any
+ application's ``EDIT`` controls into ``HEDIT`` controls, which looked the
+ same but were capable of being handwritten into using a pen. This was not
+ very user-friendly, as the controls stayed the same size and the UI was not
+ adapted to the input method. This add-on never achieved much popularity.
+
+ It is not documented whether Netscape Navigator (the ancestor of Mozilla
+ Firefox) supported this add-on or not, but there is no trace of it in modern
+ Firefox code.
+
+
+**1995 - Introduction of JavaScript and Mouse Events (De-facto Web Standard)**
+
+ The introduction of JavaScript in 1995 by Netscape Communications added a
+ programmable, event-driven scripting environment to the Web Platform.
+ Browser vendors quickly added the ability for scripts to listen for and
+ react to mouse events. These are the well-known events like ``mouseover``,
+ ``mouseenter``, ``mousedown``, etc. that are ubiquitous on the web, and are
+ known by basically anyone who has ever written front-end JavaScript.
+
+ This ubiquity created a de-facto standard for mouse input, which would
+ eventually be formally standardized by the W3C in the DOM Level 2 Events
+ specification in 2000.
+
+ The Mouse Event APIs assume that the computer has one single pointing device
+ which is always present, has a single cursor capable of "hovering" over an
+ element, and has between one and three buttons.
+
+ When support for other pointing devices like touchscreen and pen first
+ became available in operating systems, it was exposed to the web by
+ interpreting user actions into equivalent mouse events. Unfortunately, this
+ is unable to handle multiple concurrent pointers (like one would get from
+ multitouch screens) or report the kind of rich information a pen digitizer
+ can provide, like tilt angle, pressure, etc. This eventually led the W3C
+ to develop the new "Touch Events" standard to expose touch functionality,
+ and eventually the "Pointer Events" to expose more of the rich information
+ provided by pens.
+
+
+**2005 - Mainstream Pen Support (Windows XP Tablet PC Edition)**
+
+ It was the release of Windows XP Tablet PC Edition (2005) that allowed
+ Windows applications to directly support tablet pens by using the new COM
+ "`Windows Tablet PC <https://learn.microsoft.com/en-us/windows/win32/tablet/tablet-pc-development-guide>`__"
+ APIs, most of which are provided through the main `InkCollector <https://learn.microsoft.com/en-us/windows/win32/tablet/inkcollector-class>`__
+ class. The ``InkCollector`` functionality would eventually be "mainlined"
+ into Windows XP Professional Service Pack 2, and continues to exist in
+ modern Windows releases.
+
+ The Tablet PC APIs consist of a large group of COM objects that work
+ together to facilitate enumerating attached pens, detecting pen movement and
+ pen strokes, and analyzing them to provide:
+
+ 1. **Cursor Movement**: translates the movements of the pen into the
+ standard mouse events that applications expect from mouse cursor
+ movement, namely ``WM_NCHITTEST``, ``WM_SETCURSOR`` and
+ ``WM_MOUSEMOVE``.
+
+ 2. **Gesture Recognition**: detects common user actions, like "tap",
+ "double-tap", "press-and-hold", and "drag". The ``InkCollector`` delivers
+ these events via COM `SystemGesture <https://learn.microsoft.com/en-us/windows/win32/tablet/inkcollector-systemgesture>`__
+ events using the `InkSystemGesture <https://learn.microsoft.com/en-us/windows/win32/api/msinkaut/ne-msinkaut-inksystemgesture>`__
+ enumeration. It will also translate them into common Win32 messages; for
+ example, a "drag" gesture would be translated into a ``WM_LBUTTONDOWN``
+ message, several ``WM_MOUSEMOVE`` messages, and finally a
+ ``WM_LBUTTONUP`` message.
+
+ An application that is using ``InkCollector`` will receive both types of
+ messages: traditional mouse input through the Win32 message queue, and
+ "Tablet PC API" events through COM callbacks. It is up to the
+ application to determine which events matter to it in a given context,
+ as the two types of events are not guaranteed by Microsoft to correspond
+ in any predictable way.
+
+ 3. **Shape and Text Recognition**: allows the app to
+ recognize letters, numbers, punctuation, and other `common shapes <https://learn.microsoft.com/en-us/windows/win32/api/msinkaut/ne-msinkaut-inkapplicationgesture>`__
+ the user might make using their pen. Supported shapes include circles,
+ squares, arrows, and motions like "scratch out" to correct a misspelled
+ word. Custom recognizers exist that allow recognition of other symbols,
+ like music notes or mathematical notation.
+
+ 4. **Flick Recognition**: allows the user to invoke actions via quick,
+ linear motions that are recognized by Windows and sent to the app as
+ ``WM_TABLET_FLICK`` messages. The app can choose to handle the window
+ message or pass it on to the default window procedure, which will
+ translate it to scrolling messages or mouse messages.
+
+ For example, a quick upward 'flick' corresponds to "Page up", and
+ a quick sideways flick in a web browser would be "back". Flicks were
+ never widely used by Windows apps, and they may have been removed in
+ more recent versions of Windows, as the existing Control Panel menus
+ for configuring them seem to no longer exist as of Windows 10 22H2.
+
+
+ Firefox does not appear to have ever used these APIs to allow tablet pen
+ input, with the exception of `one piece of code <https://searchfox.org/mozilla-central/rev/e6cb503ac22402421186e7488d4250cc1c5fecab/widget/windows/InkCollector.cpp>`__
+ to detect when the pen leaves the Firefox window to solve
+ `Bug 1016232 <https://bugzilla.mozilla.org/show_bug.cgi?id=1016232>`__.
+
+
+**2009 - Touch Support: WM_GESTURE (Windows 7)**
+
+ While attempts were made with the release of Windows Vista (2007) to support
+ touchscreens through the existing tablet APIs, it was ultimately the release
+ of Windows 7 (2009) that brought first-class support for Touchscreen devices
+ to Windows with new Win32 APIs and two main window messages: ``WM_TOUCH``
+ and ``WM_GESTURE``.
+
+ These two messages are mutually-exclusive, and all applications are
+ initially set to receive only ``WM_GESTURE`` messages. Under this
+ configuration, Windows will attempt to recognize specific movements on a
+ touch digitizer and post "gesture" messages to the application's message
+ queue. These gestures are similar to (but, somewhat-confusingly, not
+ identical to) the gestures provided by the "Windows Tablet PC" APIs
+ mentioned above. The main gesture messages are: zoom, pan, rotate,
+ two-finger-tap, and press-and-tap (one finger presses, another finger
+ quickly taps the screen).
+
+ In contrast to the behavior of the ``InkCollector`` APIs, which will send
+ both gesture events and translated mouse messages, the ``WM_GESTURE``
+ message is truly "upstream" of the translated mouse messages; the translated
+ mouse messages will only be generated if the application forwards the
+ ``WM_GESTURE`` message to the default window procedure. This makes
+ programming against this API simpler than the ``InkCollector`` API, as
+ there is no need to state-fully "remember" that an action has already been
+ serviced by one codepath and needs to be ignored by the other.
+
+ Firefox currently supports the ``WM_GESTURE`` message when Asynchronous Pan
+ and Zoom (APZ) is not enabled (although we do not handle inertia in this
+ case, so the page comes to a dead-stop immediately when the user stops
+ scrolling).
+
+
+**2009 - Touch Support: WM_TOUCH (Windows 7)**
+
+ Also introduced in Windows 7, an application that needs full control over
+ touchscreen events can use `RegisterTouchWindow <https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-registertouchwindow>`__
+ to change any of its windows to receive ``WM_TOUCH`` messages instead of the
+ more high-level ``WM_GESTURE`` messages. These messages explicitly notify
+ the application about every finger that contacts or breaks contact with the
+ digitizer (as well as each finger's movement over time). This provides
+ absolute control over touch interpretation, but also means that the burden
+ of handling touch behavior falls completely on the application.
+
+ To help ease this burden, Microsoft provides two COM APIs to interpret
+ touch messages, ``IManipulationProcessor`` and ``IInertiaProcessor``.
+
+ ``IManipulationProcessor`` can be considered a superset of the functionality
+ available through normal gestures. The application feeds ``WM_TOUCH`` data
+ into it (along with other state, such as pivot points and timestamps), and
+ it allows for manipulations like: two-finger rotation around a pivot,
+ single-finger rotation around a pivot, simultaneous rotation and translation
+ (for example, 'dragging' a single corner of a square).
+ `These MSDN diagrams <https://learn.microsoft.com/en-us/windows/win32/wintouch/advanced-manipulations-overview>`__
+ give a good overview of the kinds of advanced manipulations an app might
+ support.
+
+ ``IInertiaProcessor`` works with ``IManipulationProcessor`` to add inertia
+ to objects in a standard way across the operating system. It is likely that
+ later APIs that provide this (like DirectManipulation) are using these COM
+ objects under the hood to accomplish their inertia handling.
+
+ Firefox currently handles the ``WM_TOUCH`` event when Asynchronous Pan and
+ Zoom (APZ) is enabled, but we do not use either the ``IInertiaProcessor``
+ or the ``IManipulationProcessor``.
+
+
+**2012 - Unified Pointer API (Windows 8)**
+
+ Windows 8 (2012) was Microsoft's initial attempt to make a touch-first,
+ mobile-first operating system that (ideally) would make it easy for app
+ developers to treat touch, pen, and mouse as first-class input devices.
+
+ By this point, the Windows Tablet APIs would allow tablet pens to draw
+ text and shapes like squares, triangles, and music notes, and those shapes
+ would be recognizable by the Windows Ink subsystem.
+
+ At the same time, Windows Touch allowed touchscreens to have advanced
+ manipulation, like rotate + translate, or simultaneous pan and zoom, and it
+ allowed objects manipulated by touch to have momentum and angular velocity.
+
+ The shortcomings of having separate input stacks for these various devices
+ start to become apparent after a while: Why shouldn't a touchscreen be
+ able to recognize a circle or a triangle? Why shouldn't a pen be able to
+ have complex rotation and zoom functionality? How do we handle these newer
+ laptop touchpads that are starting to handle multi-touch gestures like a
+ touchscreen, but still cause relative cursor movement like a mouse? Why does
+ my program have to have 3 separate codepaths for different pointing devices
+ that are all very similar?
+
+ The Windows Pointer Device Input Stack introduces new APIs and window
+ messages that generalize the various types of pointing devices under a
+ single API while still falling back to the legacy touch and tablet input
+ stacks in the event that the API is unused. (Note that the touch and tablet
+ stacks themselves fall back to the traditional mouse input stack when they
+ are unused.)
+
+ Microsoft based its pointer APIs on the Buxton Three-State Model
+ (discussed earlier), where changes between "Out-of-Range" and "Tracking" are
+ signalled by ``WM_POINTERENTER`` and ``WM_POINTERLEAVE`` messages, and
+ changes between "Tracking" and "Dragging" are signalled by
+ ``WM_POINTERDOWN`` and ``WM_POINTERUP``. Movement is indicated via
+ ``WM_POINTERUPDATE`` messages.
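+
+ As a rough illustration, the mapping between these messages and the three
+ states can be modeled as a small transition table. This is a simplified
+ sketch in Python rather than real Win32 code: the ``PointerState`` enum and
+ the ``next_state`` and ``run`` helpers are hypothetical names, and it
+ assumes that a message with no entry for the current state is ignored.
+

```python
from enum import Enum


class PointerState(Enum):
    OUT_OF_RANGE = "Out-of-Range"
    TRACKING = "Tracking"
    DRAGGING = "Dragging"


# Transition table: (current state, message) -> next state.
TRANSITIONS = {
    (PointerState.OUT_OF_RANGE, "WM_POINTERENTER"): PointerState.TRACKING,
    (PointerState.TRACKING, "WM_POINTERLEAVE"): PointerState.OUT_OF_RANGE,
    (PointerState.TRACKING, "WM_POINTERDOWN"): PointerState.DRAGGING,
    (PointerState.DRAGGING, "WM_POINTERUP"): PointerState.TRACKING,
    # Movement updates the position but never changes the state.
    (PointerState.TRACKING, "WM_POINTERUPDATE"): PointerState.TRACKING,
    (PointerState.DRAGGING, "WM_POINTERUPDATE"): PointerState.DRAGGING,
}


def next_state(state, message):
    """Return the state after `message`; unlisted pairs leave it unchanged."""
    return TRANSITIONS.get((state, message), state)


def run(messages, state=PointerState.OUT_OF_RANGE):
    """Feed a sequence of message names through the model."""
    for message in messages:
        state = next_state(state, message)
    return state
```

+
+ For example, feeding ``WM_POINTERENTER`` followed by ``WM_POINTERDOWN``
+ into ``run`` leaves the modeled pointer in the "Dragging" state.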
+
+ If these messages are unhandled (the message is forwarded to
+ ``DefWindowProc``), the Win32 subsystem will translate them
+ into touch or gesture messages. If unhandled, those will be further
+ translated into mouse and system messages.
+
+ While the Pointer API is not without some unfortunate pitfalls (which will
+ be discussed later), it still provides several advantages over the
+ previously available APIs: it can allow a mostly-unified codepath for
+ handling pointing devices, it circumvents many of the often-complex
+ interactions between the previous APIs, and it provides the ability to
+ simulate pointing devices to help facilitate end-to-end automated testing.
+
+ Firefox currently uses the Pointer APIs to handle tablet stylus input only,
+ while other input methods still use the historical mouse and touch input
+ APIs above.
+
+
+**2013 - DirectManipulation (Windows 8.1)**
+
+ DirectManipulation is a DirectX based API that was added during the release
+ of Windows 8.1 (2013). This API allows an app to create a series of
+ "viewports" inside a window and have scrollable content within each of these
+ viewports. The manipulation engine will then take care of automatically
+ reading Pointer API messages from the window's event queue and generating
+ pan and zoom events to be consumed by the app.
+
+ In the case that the app is also using DirectComposition to draw its window,
+ DirectManipulation can pipe the events directly into it, causing the app
+ to essentially get asynchronous pan and zoom with proper handling of inertia
+ and overscroll with very little coding.
+
+ DirectManipulation is only used in Firefox to handle data coming from
+ Precision Touchpads, as Microsoft provides no other convenient API for
+ obtaining data from such devices. Firefox creates fake content inside of
+ a fake viewport to capture the incoming events from the touchpad and
+ translates them into the standard Asynchronous Pan and Zoom (APZ) events
+ that the rest of the input pipeline uses.
+
+
+**2013 - Touch Events (Web Standard)**
+
+ "`Touch Events <https://www.w3.org/TR/touch-events/>`__" became a W3C
+ recommendation in October, 2013.
+
+ At this point, Microsoft's first operating system to include touch support
+ (Windows 7) was the most popular desktop operating system, and the ubiquity
+ of smartphones brought a huge uptick in users with touchscreen inputs. All
+ major browsers included some API that allowed reading touch input,
+ prompting the W3C to formalize a new standard to ensure interoperability.
+
+ With the Touch Events API, multiple touch interactions may be reported
+ simultaneously, each with their own separate identifier for tracking and
+ their own coordinates within the screen, viewport, and client area. A
+ touch is reported by: a ``touchstart`` event with a unique ID for each
+ contact, zero-or-more ``touchmove`` events with that ID, and finally a
+ ``touchend`` event to signal the end of that specific contact.
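+
+ The per-contact lifecycle can be sketched as a small tracker keyed by the
+ touch identifier. This is an illustrative Python model, not browser code:
+ the ``track_touches`` function and its tuple layout
+ ``(event_type, touch_id, x, y)`` are assumptions made for the example.
+

```python
def track_touches(events):
    """Replay (event_type, touch_id, x, y) tuples.

    Returns (active, finished): the contacts that are still down, and the
    complete paths of the contacts that have ended.
    """
    active = {}    # touch_id -> list of (x, y) points seen so far
    finished = {}  # touch_id -> complete path, including the end point
    for event_type, touch_id, x, y in events:
        if event_type == "touchstart":
            active[touch_id] = [(x, y)]
        elif event_type == "touchmove" and touch_id in active:
            active[touch_id].append((x, y))
        elif event_type == "touchend" and touch_id in active:
            path = active.pop(touch_id)
            path.append((x, y))
            finished[touch_id] = path
    return active, finished
```

+
+ In this model, a contact whose ``touchend`` never arrives stays in the
+ ``active`` map indefinitely, which is why the final event matters to pages
+ that track fingers individually.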
+
+ The API also has some amount of support for pen styluses, but it lacks
+ important features necessary to truly support them: hovering, pressure,
+ tilt, or multiple cursors (such as an eraser). Ultimately, its functionality
+ has been superseded by the newer "Pointer Events" API, discussed below.
+
+
+**2016 - Precision Touchpads (Windows 10)**
+
+ Early touchpads emulated a computer mouse by directly using the same IBM
+ PS/2 interface that most computer mice used and translating relative
+ movement of the user's finger into equivalent movements of a mouse on a
+ surface.
+
+ As touchpad technology advanced and more powerful interface standards like
+ USB began to take over the consumer market, touchpad vendors started adding
+ extra features to their hardware, like tap-to-click, tap-and-drag, and
+ tap-and-hold (to simulate a right click). These behaviors were implemented
+ by touchpad vendors in hardware drivers and/or user-mode "hooks" that
+ injected equivalent Win32 messages into the appropriate target.
+
+ As expected, each touchpad vendor's driver had its own subtly-different
+ behavior from others, its own bugs, and its own negative interactions with
+ other software.
+
+ During the later years of Windows 8, Microsoft and touchpad company
+ Synaptics co-developed the "Precision Touchpad" standard, which defines an
+ interface for touchpad hardware to report its physical measurements,
+ precision, and sensor configuration to Windows and allows it to deliver raw
+ touch data. Windows then interprets the data and generates gestures and
+ window messages in a standard way, removing the burden of implementing these
+ behaviors from the touchpad vendor and providing the OS with rich
+ information about the user's movements.
+
+ It wasn't until the 2016 release of Windows 10 build 14946 that Microsoft would
+ support all the standard gestures through the new standard. Although
+ adoption by vendors has been a bit slow, the fact that
+ `it is a requirement for Windows 11 <https://pocketnow.com/all-windows-11-pcs-will-be-required-to-have-a-precision-touchpad-and-webcam/>`__
+ means that vendor support for this standard is imminent.
+
+ Unfortunately, there's a piece of bad news: Microsoft did not
+ implement the above "Unified Pointer API" for use with touchpads, as the
+ developers of Blender discovered when `they moved to the Pointer API <https://archive.blender.org/developer/D7660>`__.
+ Instead, Microsoft expects developers to either use DirectManipulation to
+ automatically get pan/zoom enabled for their app, or the RawInput API to
+ directly read touchpad data.
+
+
+**2019 - Pointer Events (Web Standard)**
+
+ "`Pointer Events <https://www.w3.org/TR/pointerevents/>`__" became a level 2
+ W3C recommendation in April, 2019. The W3C considered `the work done by Microsoft <https://www.w3.org/Submission/2012/SUBM-pointer-events-20120907/>`__
+ as part of the design of their own Pointer API, and in many ways the W3C
+ standard resembles an improved, better specified, more consistent, and
+ easier-to-use version of the APIs provided by the Win32 subsystem.
+
+ The Pointer Events API generalizes devices like touchscreens, mice, tablet
+ pens, VR controllers, etc. into a "thing that points". A pointer has
+ (optional) properties: a width and height (big for a finger, 1px for a
+ mouse), an amount of pressure, a tilt angle relative to the surface, some
+ buttons, etc. This helps applications maximize code reuse for handling
+ pointer input by having a common codebase written against these generalized
+ traits. If needed, the application may also have smaller, specialized
+ sections of code for each concrete pointer type.
+
+ Certain types of pointers (like pens and touchscreens) have a behavior where
+ they are always "captured" by the first object that they interact with. For
+ example, if a user puts their finger on an empty part of a web page and
+ starts to scroll, their finger is now "captured" by the web page itself.
+ "Captured" means that even if their finger moves over an element in
+ the web page, that element will not receive events from the finger -- the
+ page itself will until the entire interaction stops.
+
+ The events themselves very closely follow the Buxton Three-State Model
+ (discussed earlier), where ``pointerover/pointerout`` messages indicate
+ transitions from "Out of Range" to "Tracking" and vice versa, and
+ ``pointerdown/pointerup`` messages transition between "Tracking" and
+ "Dragging". ``pointermove`` updates the position of the pointer, and a
+ special ``pointercancel`` message is sent to inform the page that the
+ browser is "cancelling" a ``pointerdown`` event because it has decided to
+ consume it for a gesture or because the operating system cancelled the
+ pointer for its own reasons.
+
+
+CSS "interaction" Media Queries
+==========================================
+
+(Note that this section is **not** about the `pointer-events <https://developer.mozilla.org/en-US/docs/Web/CSS/pointer-events>`__
+CSS property, which defines the circumstances where an element can be the target
+of pointer events.)
+
+The W3C defines the interaction-related media queries in the
+`Media Queries Level 4 - Interaction Media Features <https://www.w3.org/TR/mediaqueries-4/#mf-interaction>`__
+document.
+
+To summarize, the main interaction-related CSS Media Queries that Firefox must
+support are ``pointer``, ``any-pointer``, ``hover`` and ``any-hover``.
+
+
+``pointer``
+
+ Allows the webpage to query the existence of a pointing device on
+ the machine, and (if available) the assumed "pointing accuracy" of the
+ "primary" pointing device. The device considered "primary" on a machine with
+ multiple input devices is a policy decision that must be made by the web
+ browser; Windows simply provides the APIs to query information about
+ attached devices.
+
+ The browser is expected to return one of three strings to this media query:
+
+ ``none``
+
+ There is no pointing device attached to the computer.
+
+ ``coarse``
+
+ The primary pointing device is capable of approximately
+ pointing at a relatively large target (like a finger on a
+ touchscreen).
+
+ ``fine``
+
+ The primary pointing device is capable of near-pixel-level
+ accuracy (like a computer mouse or a tablet pen).
+
+
+``any-pointer``
+
+ Similar to ``pointer``, but represents the union of
+ capabilities of all pointers attached to the system, such that the meanings
+ become:
+
+ ``none``
+
+ There is no pointing device attached to the computer.
+
+ ``coarse``
+
+ There is at least one "coarse" pointer attached.
+
+ ``fine``
+
+ There is at least one "fine" pointer attached.
+
+
+``hover``
+
+ Allows the webpage to query whether the primary pointer is
+ capable of "hovering" over top of elements on the page. Computer mice,
+ touchpad cursors, and higher-end pen tablets all support this, whereas
+ current touchscreens are "touch" or "no touch", and they cannot detect a
+ finger hovering over the screen.
+
+ ``hover``
+
+ The primary pointer is capable of reporting hovering.
+
+ ``none``
+
+ The primary pointer is not capable of reporting hovering.
+
+``any-hover``
+
+ Indicates whether any pointer attached to the system has the
+ ``hover`` capability.
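+
+ To make the relationship between the four features concrete, here is a
+ small Python model that derives them from a list of attached devices. It is
+ only a sketch: the ``Device`` record and the ``media_features`` function are
+ hypothetical, and the choice of ``primary`` is passed in because it is a
+ browser policy decision rather than something that can be computed here.
+

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    accuracy: str     # "coarse" or "fine"
    can_hover: bool


def media_features(devices, primary=None):
    """Compute the four interaction media features for a device list.

    `primary` is whichever device the browser's policy selected, or None
    when there is no pointing device at all.
    """
    # any-pointer is a union, so both "coarse" and "fine" may match at once.
    any_pointer = sorted({d.accuracy for d in devices}) or ["none"]
    any_hover = "hover" if any(d.can_hover for d in devices) else "none"
    return {
        "pointer": primary.accuracy if primary is not None else "none",
        "hover": "hover" if primary is not None and primary.can_hover else "none",
        "any-pointer": any_pointer,
        "any-hover": any_hover,
    }
```

+
+ With a touchscreen chosen as primary and a hover-capable pen also attached,
+ this model yields ``pointer: coarse`` and ``hover: none`` while
+ ``any-pointer`` matches both ``coarse`` and ``fine`` and ``any-hover``
+ matches ``hover``.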
+
+
+Selection of the Primary Pointing Device
+--------------------------------------------
+
+To illustrate the complexity of this topic, consider the Microsoft Surface Pro.
+
+The Surface Pro has an advanced screen that is capable of receiving touch
+input, but it can also behave like a pen digitizer and receive input from a
+stylus with advanced pen capabilities, like hover sensing, pressure
+sensitivity, multiple buttons, and even multiple "tips" (a pen and eraser end).
+
+In this case, what should Firefox consider the primary pointing device?
+
+Perhaps the user intends to use their Surface Pro like a touchscreen tablet,
+at which point Firefox should report ``pointer: coarse`` and ``hover: none``
+capabilities.
+
+But what if, instead, the user wants to sketch art or take notes using a pen on
+their Surface Pro? In this case, Firefox should be reporting ``pointer: fine``
+and ``hover: hover``.
+
+Imagine that the user then attaches the "keyboard + touchpad" cover attachment
+to their Surface Pro; naturally, we will consider that the user's intent is for
+the touchpad to become the primary pointing device, and so it is fairly clear
+that we should return ``pointer: fine`` and ``hover: hover`` in this state.
+
+However, what if the user tucks the keyboard/touchpad attachment behind the
+tablet and begins exclusively operating the device with their finger?
+
+This example shows that complex, multi-input machines can resist classification
+and blur the lines between labels like "touch device", "laptop", "drawing
+tablet", etc. It also illustrates that identifying the "primary" pointing
+device using only machine configuration may yield unintuitive and suboptimal
+results.
+
+While we can almost-certainly improve our hardware detection heuristics to
+better answer this question (and we should, at the very least), perhaps it
+makes more sense for Firefox to incorporate user intentions into the decision.
+Intentions could be communicated directly by the user through some sort of
+setting or indirectly through the user's actions.
+
+For example, if the user intends to draw on the screen with a pen, perhaps
+Firefox provides something like a "drawing mode" that the user can toggle to
+change the primary pointing device to the pen. Or perhaps it's better for
+Firefox to interpret the mere fact of receiving pen input as evidence of the
+user's intent and switch the reported primary pointing device automatically.
+
+If we wanted to switch automatically, there are predictable traps and pitfalls
+we need to think about: we need to ensure that we don't create frustrating user
+experiences where web pages may "pop" beneath the user suddenly, and
+we should likely incorporate some kind of "settling time" so we don't
+oscillate between devices.
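+
+One possible shape for such a "settling time" is sketched below in Python.
+Everything here is hypothetical (the ``PrimaryPointerSelector`` class, the
+two-second default, and the string device names); it only illustrates the
+idea that a new device must keep producing input, uninterrupted by the
+current primary device, for some period before the reported primary changes.
+

```python
class PrimaryPointerSelector:
    """Switch the reported primary device only after a candidate device has
    produced input, uninterrupted by the current primary, for
    `settle_seconds`."""

    def __init__(self, initial, settle_seconds=2.0):
        self.primary = initial
        self.settle_seconds = settle_seconds
        self._candidate = None
        self._candidate_since = None

    def on_input(self, device, now):
        """Record input from `device` at time `now` (in seconds) and return
        the device currently reported as primary."""
        if device == self.primary:
            # Input from the current primary cancels any pending switch.
            self._candidate = None
            self._candidate_since = None
        elif device != self._candidate:
            # A new candidate starts its own settling timer.
            self._candidate = device
            self._candidate_since = now
        elif now - self._candidate_since >= self.settle_seconds:
            # The candidate has been in use long enough: switch over.
            self.primary = device
            self._candidate = None
            self._candidate_since = None
        return self.primary
```

+
+A brief stray touch while the user is working with the touchpad would not
+change the reported primary device in this model, because the next touchpad
+input resets the timer.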
+
+It's worth noting that Chromium doesn't seem to incorporate anything like
+what's being suggested here, so if this is well-designed it may be an
+opportunity for Firefox to try something novel.
+
+
+
+
+================================================================================
+State of the Browser
+================================================================================
+
+Pan and Zoom, Inertia, Overscroll, and Elastic Bounce
+=========================================================
+
+As can be seen in the videos below, Firefox's support for inertia, overscroll,
+and elastic bounce works well on all platforms when a stylus pen is used
+as the input device, and it also works just fine with the touchscreen on the
+Dell XPS 15. However, it completely fails when the touchscreen is used on
+the Microsoft Surface Pro. While more investigation is needed to completely
+understand these issues, the fact that the correctly-behaving digitizing pens
+use the Pointer API and the misbehaving input devices do not may be related.
+
+- `Video 1 <https://drive.google.com/file/d/1Z1QRSf2RluNhJwkKCzPb6-14vRtkqK8s/view?usp=sharing>`__
+ showcasing overscroll and bounce not working on Surface Pro with touch, but
+ other devices/inputs are working
+
+- `Video 2 <https://drive.google.com/file/d/1bOgpVGBeZtwelvPJzYdA6uFRpubGtu4W/view?usp=sharing>`__
+ showing that everything works just fine with an external Wacom digitizer
+
+
+Pointer Media Queries
+=========================================================
+
+**"any-pointer" Queries**
+
+Unlike the ``pointer`` media queries, which rely on the browser to make a policy
+decision about what should be considered the "primary" pointer in a given
+system configuration, the ``any-pointer`` queries are much more objective and
+binary: the computer either has a type of device attached to it, or it
+doesn't.
+
+**any-pointer: coarse**
+
+Firefox reports that there are "coarse" pointing devices present if either of
+these two points is true:
+
+1. ``GetSystemMetrics(SM_DIGITIZER)`` reports that a device that supports
+ touch or pen is present.
+
+2. Based on heuristics, Firefox concludes that it is running on a computer it
+ considers a "tablet".
+
+Point #1 is incorrect, as a pen is not a "coarse" pointing device. Note that
+this is a recent regression in `Bug 1811303 <https://bugzilla.mozilla.org/show_bug.cgi?id=1811303>`__
+that was uplifted to Firefox 112, so this actually regressed as this document
+was being written! This is responsible for the incorrect "Windows 10 Desktop +
+Wacom USB Tablet" issue in the table below.
+
+Point #2 is a clear case of the `XY Problem <https://en.wikipedia.org/wiki/XY_problem>`__,
+where Firefox is trying to determine if a coarse pointing device is present
+by determining whether it is running on a tablet, when instead it should be
+directly testing for coarse pointing devices (since, of course, those can exist
+on machines that wouldn't normally be considered a "tablet"). This is
+responsible for the incorrect "Windows 10 Dell XPS 15 (Touch Disabled) + Wacom
+USB Tablet" issue in the table below.
+
+**any-pointer: fine**
+
+Firefox reports that there are "fine" pointing devices present if and only if
+it detects a mouse. This is clearly already wrong. Firefox determines that the
+computer has a mouse using the following algorithm:
+
+1. If ``GetSystemMetrics(SM_MOUSEPRESENT)`` returns false, report no mouse.
+
+2. If Firefox does not consider the current computer to be a tablet, report a
+ mouse if there is at least one "mouse" device driver running on the
+ computer.
+
+3. If Firefox considers the current computer to be a tablet or a touch system,
+ only report a mouse if there are at least two "mouse" device drivers
+ running. This exists because some tablet pens and touch digitizers report
+ themselves as computer mice.
+
+This algorithm also suffers from the XY problem -- Firefox is trying to
+determine whether a fine pointing device exists by determining if there is
+a computer mouse present, when instead it should be directly testing for
+fine pointing devices, since mice are not the only fine pointing
+devices.
+
+Because of this proxy question, this algorithm is completely dependent on any
+attached fine pointing device (like a pen tablet) to report itself as a mouse.
+Point #3 makes the problem even worse, because if a computer that resembles a
+tablet fails to report its digitizers as mice, the algorithm will completely
+ignore an actual computer mouse attached to the system because it expects two
+of them to be reported!
+
+Unfortunately, the Surface Pro has both a pen digitizer and a touch digitizer,
+and it reports neither as a mouse. As a result, this algorithm completely falls
+apart on the Surface Pro, failing to report any "fine" pointing device even
+when a computer mouse is plugged in, a pen is plugged in, or even when
+the tablet is docked because its touchpad is only one mouse and it expects
+at least two.
+
+This is also responsible for failing to report the trackpad on the Dell XPS 15
+as "fine", because the Dell XPS 15 has a touchscreen and therefore looks like
+a "tablet", but doesn't report 2 mouse drivers.
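+
+The failure mode is easy to reproduce in a small Python model of the three
+steps above. This is not the Firefox source: ``has_mouse`` and
+``any_pointer_fine`` are hypothetical names, the three inputs stand in for
+the values Firefox obtains from Windows, and the "tablet" and "touch system"
+conditions of steps 2 and 3 are merged into one flag for simplicity.
+

```python
def has_mouse(mouse_present, is_tablet_or_touch, mouse_driver_count):
    """Model of the three-step mouse check described above."""
    if not mouse_present:             # step 1: SM_MOUSEPRESENT is false
        return False
    if not is_tablet_or_touch:        # step 2: one "mouse" driver is enough
        return mouse_driver_count >= 1
    # Step 3: on tablets, one "mouse" is assumed to be a digitizer in
    # disguise, so at least two drivers are required.
    return mouse_driver_count >= 2


def any_pointer_fine(mouse_present, is_tablet_or_touch, mouse_driver_count):
    """In this model, `any-pointer: fine` is reported iff a mouse is found."""
    return has_mouse(mouse_present, is_tablet_or_touch, mouse_driver_count)
```

+
+A docked Surface Pro corresponds to ``any_pointer_fine(True, True, 1)``:
+the touchpad is the only "mouse" driver, the digitizers do not report as
+mice, and so the model reports no fine pointer at all.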
+
+**any-hover: hover**
+
+
+Firefox reports that any device that is a "fine" pointer also supports "hover",
+which does generally hold true, but isn't necessarily true for lower-end pens
+that only support tapping. It would be better for Firefox to directly
+query the operating system instead of just assuming.
+
+**"pointer" media query**
+
+As discussed previously at length, this media query relies on a "primary"
+designation made by the browser. Below is the current algorithm used to
+determine this:
+
+1. If the computer is considered a "tablet" (see below), report primary
+ pointer as "coarse" (this is clearly already the wrong behavior).
+
+2. Otherwise, if the computer has a mouse plugged in, report "fine".
+
+3. Otherwise, if the computer has a touchscreen or pen digitizer, report
+ "coarse" (this is wrong in the case of the digitizer).
+
+4. Otherwise, report "fine" (this is wrong; should report "none").
+
+Firefox uses the following algorithm to determine if the computer is a
+"tablet" for point #1 above:
+
+1. It is not a tablet if it's not running at least Windows 8.
+
+2. If Windows "Tablet Mode" is enabled, it is a tablet no matter what.
+
+3. If no touch-capable digitizers are attached, it is not a tablet.
+
+4. If the system doesn't support auto-rotation, perhaps because it has
+ no rotation sensor, or perhaps because it's docked and operating in
+ "laptop mode" where rotation won't happen, it's not a tablet.
+
+5. If the vendor that made the computer reports to Windows that it supports
+ "convertible slate mode" and it is currently operating in "slate mode",
+ it's a tablet.
+
+6. Otherwise, it's not a tablet.
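+
+Both decision lists can be written out as straight-line code, which makes
+the ordering of the checks easier to reason about. The Python below is a
+model of the prose, not the actual implementation: the function and
+parameter names are hypothetical, and ``in_slate_mode`` stands for "the
+vendor reports convertible slate mode support and the machine is currently
+in slate mode".
+

```python
def is_tablet(windows_8_or_later, tablet_mode, has_touch_digitizer,
              supports_auto_rotation, in_slate_mode):
    """Model of the six-step "is this a tablet?" heuristic."""
    if not windows_8_or_later:       # step 1
        return False
    if tablet_mode:                  # step 2: wins no matter what
        return True
    if not has_touch_digitizer:      # step 3
        return False
    if not supports_auto_rotation:   # step 4
        return False
    return in_slate_mode             # steps 5 and 6


def primary_pointer(tablet, has_mouse, has_touch_or_pen_digitizer):
    """Model of the four-step `pointer` media query decision."""
    if tablet:                       # step 1
        return "coarse"
    if has_mouse:                    # step 2
        return "fine"
    if has_touch_or_pen_digitizer:   # step 3
        return "coarse"
    return "fine"                    # step 4 (never reports "none")
```

+
+Note how, in this model, a machine classified as a tablet reports
+``coarse`` even when a mouse is attached, because the tablet check runs
+first, and how a machine with no pointing device at all still reports
+``fine``.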
+
+
+**Table with comparison to Chromium**
+
+The following table shows how Firefox and Chromium respond to various pointer
+queries. The "any-pointer" and "any-hover" columns are not subjective and
+therefore are always either green or red to indicate "pass" or "fail", but the
+"pointer" and "hover" may also be yellow to indicate that it's "open to
+interpretation" because of the aforementioned difficulty in determining the
+"primary pointer".
+
+.. image:: touch_media_queries.png
+ :width: 100%
+
+
+**Related Bugs**
+
+- Bug 1813979 - For Surface Pro media query "any-pointer: fine" is true only
+ when both the Type Cover and mouse are connected
+
+- Bug 1747942 - Incorrect CSS media query matches for pointer, any-pointer,
+ hover and any-hover on Surface Laptop
+
+- Bug 1528441 - @media (hover) and (any-hover) does not work on Firefox 64/65
+ where certain dual inputs are present
+
+- Bug 1697294 - Content processes unable to detect Windows 10 Tablet Mode
+
+- Bug 1806259 - CSS media queries wrongly detect a Win10 desktop computer
+ with a mouse and a touchscreen, as a device with no mouse (hover: none)
+ and a touchscreen (pointer: coarse)
+
+
+Web Events
+=====================
+
+The pen stylus worked well on all tested systems: the correct pointer events
+were fired in the correct order, and mouse events were properly simulated in
+case the default behavior was allowed.
+
+The touchscreen input was less reliable. On the Dell XPS 15, the
+"Pointer Events" were flawless, but the "Touch Events" were missing
+an important step: the ``touchstart`` and ``touchmove`` events were sent just
+fine, but Firefox never sent the ``touchend`` event! (Hopefully that isn't
+too difficult to fix!)
+
+Unfortunately, everything really falls apart on the Surface Pro using the
+touchscreen -- neither the "Pointer Events" nor the "Touch Events" fire at all!
+Instead, the touch is completely absorbed by pan and zoom gestures, and nothing
+is sent to the web page. The website's request for ``touch-action: none`` is
+ignored, and the web page is never given any opportunity to call
+``Event.preventDefault()`` to cancel the pan/zoom behavior.
+
+
+Operating System Interfaces
+================================
+
+As was discussed above, Windows has multiple input APIs that were each
+introduced in newer versions of Windows to handle devices that were not
+well-served by existing APIs.
+
+Backward compatibility with applications designed against older APIs is
+realized when applications call the default event handler (``DefWindowProc``)
+upon receiving an event type that they don't recognize (which is what apps have
+always been instructed to do if they receive events they don't recognize).
+The unrecognized newer events will be translated by the default event handler
+into older events and sent back to the application. A very old application may
+have this process repeat through several generations of APIs until it finally
+sees events that it recognizes.
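+
+The repeated translation can be pictured as walking down a list of message
+generations until one is found that the application handles. The Python
+sketch below is a deliberately simplified model: the ``GENERATIONS`` list
+collapses the touch, gesture, and tablet stacks into a single middle step,
+and the ``deliver`` function is a hypothetical stand-in for the round trips
+through ``DefWindowProc``.
+

```python
# Newest message family first; mouse messages are the end of the chain.
GENERATIONS = ["pointer", "touch_or_gesture", "mouse"]


def deliver(handled_families, family="pointer"):
    """Return (family the app finally handles, number of translations).

    `handled_families` is the set of message families the application's
    window procedure recognizes; everything else goes to the default handler.
    """
    hops = 0
    index = GENERATIONS.index(family)
    while GENERATIONS[index] not in handled_families:
        if index + 1 == len(GENERATIONS):
            return None, hops   # nothing older left to translate into
        index += 1              # default handler re-posts an older message
        hops += 1
    return GENERATIONS[index], hops
```

+
+An application that only understands mouse messages sees its pointer input
+after two translations in this model, while one that handles pointer
+messages directly sees it after none.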
+
+Firefox currently uses a mix of the older and newer APIs, which complicates
+the input handling logic and may be responsible for some of the
+difficult-to-explain bugs that we see reported by users.
+
+Here is an explanation of the codepaths Firefox uses to handle pointer input:
+
+1. Firefox handles the ``WM_POINTER[LEAVE|DOWN|UP|UPDATE]`` messages if the
+ input device is a tablet pen and an Asynchronous Pan and Zoom (APZ)
+ compositor is available. Note that this already may not be ideal, as
+ Microsoft warns (`here <https://learn.microsoft.com/en-us/windows/win32/inputmsg/wm-pointercapturechanged>`__)
+ that handling some pointer messages and passing other pointer messages to
+ ``DefWindowProc`` has unspecified behavior (meaning that Win32 may do
+ something unexpected or nonsensical).
+
+ If the above criteria aren't met, Firefox will call ``DefWindowProc``, which
+ will re-post the pointer messages as either touch messages or mouse
+ messages.
+
+2. If DirectManipulation is being used for APZ, it will send a
+ ``WM_POINTERCAPTURECHANGED`` message if it detects a pan or zoom gesture it
+ can handle. It will then handle the rest of the gesture itself.
+
+ DirectManipulation is used for all top-level and popup windows as long as
+ it isn't disabled via the ``apz.allow_zooming``,
+ ``apz.windows.use_direct_manipulation``, or
+ ``apz.windows.force_disable_direct_manipulation`` prefs.
+
+3. If the pointing device is touch, the next action depends on
+ whether an Asynchronous Pan and Zoom (APZ) compositor is available. If it
+ is, the window will have been registered using ``RegisterTouchWindow``, and
+ Firefox will receive ``WM_TOUCH`` messages, which will be sent to the
+ "Touch Event" API and handled directly by the APZ compositor.
+
+ If there is no APZ compositor, it will instead be received as a
+ ``WM_GESTURE`` message or a mouse message, depending on the movement. Note
+ that these will be more basic gestures, like tap-and-hold.
+
+4. If none of the above apply, the message will be converted into standard
+ ``WM_MOUSExxx`` messages via a call to ``DefWindowProc``.
+
+
+================================================================================
+Discussion
+================================================================================
+
+Here is where some of the outstanding thoughts or questions can be listed.
+This can be updated as more questions come about and (hopefully) as answers to
+questions become apparent.
+
+CSS "pointer" Media Queries
+===============================
+
+- The logic for the ``any-pointer`` and ``any-hover`` queries is objectively
+ incorrect and should be rewritten altogether. That is not as
+ big of a job as it sounds, as the code is fairly straightforward and
+ self-contained. (Note: Improvements have already been made in
+ `Bug 1813979 <https://bugzilla.mozilla.org/show_bug.cgi?id=1813979>`__)
+
+- There are a few behaviors for ``pointer`` and ``hover`` that are
+ objectively wrong (such as reporting a ``coarse`` pointer when the
+ Surface Pro is docked with a touchpad). Those should be fixable with a
+ code change similar to the previous bullet.
+
+- Do we want to continue to use only machine configuration to decide what
+ the "primary" pointer is, or do we also want to incorporate user intent
+ into the algorithm? Or, alternatively:
+
+ 1. Do we create a way for the user to override? For example, a "Drawing
+ Mode" button if a tablet digitizer is sensed.
+
+ 2. Do we attempt to change automatically in response to user action?
+
+ - An example was used above of a docked Surface Pro computer, where
+ the user may use the keyboard and touchpad for a while, then perhaps
+ tuck that behind and use the device as a touchscreen, and then
+ perhaps draw on it with a tablet stylus.
+
+ - We would need to be careful to avoid careless "popping" or
+ "oscillating" if we react too quickly to changing input types.
+
+- On a separate-but-related note, the `W3C suggested <https://www.w3.org/TR/mediaqueries-5/#descdef-media-pointer>`__
+ that it might be beneficial to at least allow disabling all
+ reporting of ``fine`` pointing devices for users who may have a disability
+ that prevents them from being able to click small objects, even with a fine
+ pointing device.
+
+
+Pan-and-Zoom, Inertia, Overscroll, and Elastic Bounce
+=========================================================
+
+- Inertia, overscroll, and elastic bounce are just plain broken on the
+ Surface Pro. That should definitely be investigated.
+
+- We can see from the video below that Microsoft Edge has quite a bit more
+ overscroll and a more elastic bounce than Firefox does, and it also
+ allows elastic bounce in directions that the page itself doesn't scroll.
+
+ Edge's way seems more similar to the user experience I'd expect from using
+ Firefox on an iPhone or Android device. Perhaps we should consider
+ following suit?
+
+ (`Link to video <https://drive.google.com/file/d/14XVLT6CNn2RaXcHHCRIrQmRwoMYjj6fu/view?usp=sharing>`__)
+
+
+Web Events
+==============
+
+- It's worth investigating why the ``touchend`` message never seems
+ to be sent by Firefox on any tested devices.
+
+- It's very disappointing that neither the Pointer Events API nor the
+ Touch Events API works at all on Firefox on the Surface Pro. That should
+ be investigated very soon!
+
+
+Operating System Interfaces
+================================
+
+- With the upcoming sun-setting of Windows 7 support, Firefox has an
+ opportunity to revisit the implementation of our input handling and try to
+ simplify our codepaths and eliminate some of the workarounds that exist to
+ handle some of these complex interactions, as well as fix entire classes of
+ bugs - both reported and unreported - that currently exist as a result.
+
+- Does it make sense to combine the touchscreen and pen handling together
+ and use the ``WM_POINTERXXX`` messages for both?
+
+ - This would eliminate the need to handle the ``WM_TOUCH`` and
+ ``WM_GESTURE`` messages at all.
+
+ - Note that there is precedent for this, as `GTK <https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/1563>`__
+ has already done so. It appears that `Blender <https://archive.blender.org/developer/D7660>`__
+ has plans to move toward this as well.
+
+ - Tablet pens seemed to do very well in most of the testing,
+ and they are also the part of the code that mainly exercises the
+ ``WM_POINTERXXX`` codepaths. That may imply increased reliability in
+ that codepath?
+
+ - The Pointer APIs also have good device simulation for integration
+ testing.
+
+ - Would we also want to roll mouse handling into it using the
+ `EnableMouseInPointer <https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-enablemouseinpointer>`__
+ call? That would allow us to also get rid of handling
+ ``WM_MOUSE[MOVE/WHEEL/HWHEEL]`` and ``WM_[LRM]BUTTON[UP|DOWN]``
+ messages. Truly one codepath (with a few minor branches) to rule them
+ all!
+
+ - Nick Rishel sent `this link <http://the-witness.net/news/2012/10/wm_touch-is-totally-bananas/>`__
+ that details the troubles that the developers of The Witness (a video
+ game) ran into when using the ``WM_TOUCH`` API. It argues that the API
+ is poorly-designed, and advises that if Windows 7 support is not
+ needed, the API should be avoided.
+
+- Should we exclusively use DirectManipulation for Pan/Zoom?
+
+ - Multitouch touchpads bypass all of the ``WM_POINTER`` machinery
+ for anything gesture-related and directly send their messages to
+ DirectManipulation. We then "capture" all the DirectManipulation events
+ and pump them into our events pipeline, as explained above.
+
+ - DirectManipulation also handles "overscroll + elastic bounce" in a way
+ that aligns with Windows look-and-feel.
+
+ - Perhaps it makes sense to just use DirectManipulation for all APZ
+ handling and eliminate any attempt at handling this through other
+ codepaths.
+
+High-Frequency Input
+================================
+
+"High-Frequency Input" refers to an app's ability to observe every input
+event even when the events arrive faster than the app itself processes them.
+
+Consider a mouse that moves through several points: "A->B->C->D->E". If the
+application processes input when the mouse is at "A" and doesn't poll again
+until the mouse is at point "E", the default behavior of all modern operating
+systems is to "coalesce" these events and simply report "A->E". This is fine
+for the majority of use cases, but certain workloads (such as digital
+handwriting and video games) can benefit from knowing the complete path that
+was taken to get from the start point to the end point.
+
+Generally, solutions to this involve the operating system keeping a history of
+pointer movements that can be retrieved through an API. For example,
+Android provides the `MotionEvent <https://developer.android.com/reference/android/view/MotionEvent.html>`__
+API that batches historical movements.
+
+Unfortunately, the APIs to do this in Windows are terribly broken. As
+`this Paint.NET blog post <https://blog.getpaint.net/2019/11/14/paint-net-4-2-6-alpha-build-7258/>`__
+makes clear, `GetMouseMovePointsEx <https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getmousemovepointsex>`__
+has so many issues that the Paint.NET developers removed it from their program
+because of the burden. That same blog entry also explains that the newer
+Pointer API has the `GetPointerInfoHistory <https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getpointerinfohistory>`__
+function, which is *supposed* to support tracking pointer history but only ever
+tracks a single entry!
+
+Perhaps luckily, there is currently no web standard for high-frequency input,
+although it `has been asked about in the past <https://lists.w3.org/Archives/Public/public-pointer-events/2014AprJun/0057.html>`__.
+
+If such a standard were ever created, it would likely be very difficult for
+Firefox on Windows to support it.
+
+
+DirectManipulation and Pens
+=============================
+
+- TODO: Investigate whether DirectManipulation can directly consume pen
+ input, or whether the application has to handle it (and forward it to
+ DirectManipulation if desired).