Diffstat (limited to 'widget/windows/docs/windows-pointing-device/index.rst')
-rw-r--r-- | widget/windows/docs/windows-pointing-device/index.rst | 1384
1 files changed, 1384 insertions, 0 deletions
diff --git a/widget/windows/docs/windows-pointing-device/index.rst b/widget/windows/docs/windows-pointing-device/index.rst
new file mode 100644
index 0000000000..eda552b3dd
--- /dev/null
+++ b/widget/windows/docs/windows-pointing-device/index.rst
@@ -0,0 +1,1384 @@
################################################################################
Windows Pointing Device Support in Firefox
################################################################################

.. contents:: Table of Contents
   :depth: 4

================================================================================
Introduction
================================================================================

This document is intended to provide the reader with a quick primer and/or
refresher on pointing devices and the various operating system APIs, user
experience guidelines, and Web standards that contribute to the way Firefox
handles input devices on Microsoft Windows.

The documentation for these things is scattered across the web and has varying
levels of detail and completeness; some of it is missing or ambiguous and was
only determined experimentally or by reading about other people's experiences
through forum posts. An explicit goal of this document is to gather this
information into a cohesive picture.

We will then discuss the ways in which Firefox currently (as of early 2023)
produces incorrect or suboptimal behavior when implementing those standards
and guidelines.

Finally, we will raise some thoughts and questions to spark discussion on how
we might improve the situation and handle corner cases. Some of these issues
are intrinsically "opinion based" or "policy based", so clear direction on them
is desirable before engineering effort is invested in reimplementation.


================================================================================
Motivation
================================================================================

A quick look at the `pile of defects <https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&status_whiteboard=%5Bwin%3Atouch%5D&list_id=16586149&status_whiteboard_type=allwordssubstr>`__
on *bugzilla.mozilla.org* marked with *[win:touch]* will show anyone that
Firefox's input stack for pointer devices has issues, but the bugs recorded
there don't begin to capture the full range of unreported glitches and
difficult-to-reproduce hiccups that users run into while using touchscreen
hardware and pen digitizers with Firefox, nor do they capture the ways that
Firefox misbehaves according to various W3C standards that are (luckily) either
rarely used or worked around in web apps (and thus go undetected or
unreported).

These bugs primarily manifest in a few ways that will each be discussed in
their own section:

1. Firefox failing to return the proper values for the ``pointer``,
   ``any-pointer``, ``hover``, and ``any-hover`` CSS Media Queries

2. Firefox failing to fire the correct pointer-related DOM events at the
   correct time (or at all)

3. Firefox's inconsistent handling of touch-related gestures like scrolling,
   where certain machines (like the Surface Pro) fail to meet the expected
   behavior of scrolling inertia and overscroll.
This leads to a weird touch + experience where the page comes to a choppy, dead-stop when using + single-finger scrolling + + +It's worth noting that Firefox is not alone in having these types of issues, +and that handling input devices is a notoriously difficult task for many +applications; even a substantial amount of Microsoft's own software has trouble +navigating this minefield on their own Microsoft Surface devices. Defects are +instigated by a combination of the *intrinsic complexity* of the problem domain +and the *accidential complexity* introduced by device vendors and Windows +itself. + +The *intrinsic complexity* comes from the simple fact that human-machine +interaction is difficult. A person must attempt to convey complex +and abstract goals through a series of simple movements involving a few pieces +of physical hardware. The devices can send signals that are unclear +or even contradictory, and the software must decide how to handle +this. + +As a trivial example, every software engineer that's ever written +page scrolling logic has to answer the question, "What should my +program do if the user hits 'Page Up' and 'Page Down' at the same time?". +While it may seem obvious that the answer is "Do nothing.", naively-written +keyboard input logic might assume the two are mutually-exclusive and only +process whichever key is handled first in program order. + +Occasionally, a new device will be invented that doesn't obviously map to +existing abstractions and input pipelines. There will be a period of time where +applications will want to support the new device, but it won't be well +understood by either the application developers nor the device vendor +themselves what ideal integration would look like. The new Apple Vision VR +headset is such a device; traditional VR headsets have used controllers to +point at things, but Apple insists that the entire thing should be done using +only hand tracking and eye tracking. Developers of VR video games and other +apps (like Firefox) will inevitably make many mistakes on the road to +supporting this new headset. + +A major source of defect-causing *accidental complexity* is the lack of clear +expectations and documentation from Microsoft for apps (like Firefox) that are +not using their Universal Windows Platform (UWP). The Microsoft Developer +Network (MSDN) mentions concepts like inertia, overscroll, elastic bounce, +single-finger panning, etc., but the solution is presented in the context +of UWP, and the solution for non-UWP apps is either unclear or undocumented. + +Adding to this complexity is the fact that Windows itself has gone through +several iterations of input APIs for different classes of devices, and +these APIs interact with each other in ways that are surprising or +unintuitive. Again, the advice given on MSDN pertains to UWP apps, and the +documentation about the newer "pointer" based window messages is +a mix of incomplete and inaccurate. + +Finally, individual input devices have bugs in their driver software that +would disrupt even applications that are using the Windows input APIs perfectly. +Handling all of these deviations is impossible and would result in fragile, +unmaintainable code, but Firefox inevitably has to work around common ones to +avoid alienating large portions of the userbase. 
+ + +================================================================================ +Technical Background +================================================================================ + + +A Quick Primer on Pointing Devices +====================================== + + +Traditionally, web browsers were designed to accommodate computer mice and +devices that behave in a similar way, like trackballs and touchpads on +laptops. Generally, it was assumed that there would be one such device attached +to the computer, and it would be used to control a hovering "cursor" whose +movements would be changed by relative movement of the physical input device. + +However, modern computers can be controlled using a variety of different +pointing devices, all with different characteristics. Many allow +multiple concurrent targets to be pointed at and have multiple sensors, +buttons, and other actuators. + +For example, the screen of the Microsoft Surface Pro has dual capabilities +of being a touch sensor and a digitizer for a tablet pen. When being used as a +workstation, it's not uncommon for a user to also connect the "keyboard + +touchpad" cover and a mouse (via USB or Bluetooth) to provide the more +productivity-oriented "keyboard and mouse" setup. In that configuration, there +are 4 pointer devices connected to the machine simultaneously: a touch screen, +a pen digitizer, a touchpad, and a mouse. + +The next section will give a quick overview of common pointing devices. +Many will be familiar to the reader, but they are still mentioned to establish +common terminology and to avoid making assumptions about familiarity with every +input device. + + +Common Pointing Devices +--------------------------- + +Here are some descriptions of a few pointing device types that demonstrate +the diversity of hardware: + +**Touchscreen** + + A touchscreen is a computer display that is able to sense the + location of (possibly-multiple) fingers (or stylus) making contact with its + surface. Software can then respond to the touches by changing the displayed + objects quickly, giving the user a sense of actually physically manipulating + them on screen with their hands. + + .. image:: touchscreen.jpg + :width: 25% + + +**Digitizing Tablet + Pen Stylus** + + These advanced pointing devices tend to + exist in two forms: as an external sensing "pad" that can be plugged into a + computer and sits on a desk or in someone's lap, or as a sensor built right + into a computer display. Both use a "stylus", which is a pen-shaped + electronic device that is detectable by the surface. Common features + include the ability to distinguish proximity to the surface ("hovering") + versus actual contact, pressure sensitivity, angle/tilt detection, multiple + "ends" such as a tip and an eraser, and one-or-more buttons/switch + actuators. + + .. image:: wacom_tablet.png + :width: 25% + + +**Joystick/Pointer Stick** + + Pointer sticks are most often seen in laptop + computers made by IBM/Lenovo, where they exist as a little red nub located + between the G, H, and B keys on a standard QWERTY keyboard. They function + similarly to the analog sticks on a game controller -- The user displaces + the stick from its center position, and that is interpreted as a relative + direction to move the on-screen cursor. A greater displacement from center + is interpreted as increased velocity of movement. + + .. 
image:: trackpoint.jpg + :width: 25% + + +**Touchpad** + + A touchpad is a rectangular surface (often found on laptop + computers) that detects touch and motion of a finger and moves an on-screen + cursor relative to the motion. Modern touchpads often support multiple + touches simultaneously, and therefore offer functionality that is quite + similar to a touchscreen, albeit with different movement semantics because + of their physical separation from the screen (discussed below). + + .. image:: touchpad.jpg + :width: 25% + + +**VR Controllers** + + VR controllers (and other similar devices like the + Wiimote from the Nintendo Wii) allow users to point at objects in a + three-dimensional virtual world by moving a real-world controller and + "projecting" the controller's position into the virtual space. They often + also include sensors to detect the yaw, pitch, and roll of the sensors. + There are often other inputs in the controller device, like analog sticks + and buttons. + + .. image:: vrcontroller.jpg + :width: 25% + + +**Hand Tracking** + + Devices like the Apple Vision (introduced during the + time this document was being written) and (to a lesser extent) the Meta + Quest have the ability to track the wearer's hand and directly interpret + gestures and movements as input. As the human hand can assume a staggering + number of orientations and configurations, a finite list of specific shapes + and movements must be identified and labelled to allow for clear + software-user interaction. + + .. image:: apple_vision_user.webp + :width: 25% + + .. image:: apple_vision.jpg + :width: 25% + + +**Mouse** + + A pointing device that needs no introduction. Moving a physical + clam-shaped device across a surface translates to relative movement of a + cursor on screen. + + .. image:: mouse.jpg + :width: 25% + + +The Buxton Three-State Model +------------------------------- + + +Bill Buxton, an early pioneer in the field of human-computer interaction, +came up with a three-state model for pointing devices; a device can be +"Out of Range", "Tracking", or "Dragging". Not all devices support all three +states, and some devices have multiple actuators that can have the three-state +model individually applied. + +.. mermaid:: + + stateDiagram-v2 + direction LR + state "State 0" as s0 + state "State 1" as s1 + state "State 2" as s2 + s0 --> s0 : Out Of Range + s1 --> s1 : Tracking + s2 --> s2 : Dragging + s0 --> s1 : Stylus On + s1 --> s0 : Stylus Lift + s1 --> s2 : Tip Switch Close + s2 --> s1 : Tip Switch Open + + +For demonstration, here is the model applied to a few devices: + +**Computer Mouse** + + A mouse is never in the "Out of Range" state. Even though it can technically + be lifted off its surface, the mouse does not report this as a separate + condition; instead, it behaves as-if it is stationary until it can once + again sense the surface moving underneath. + + The remaining two states apply to each button individually; when a button is + not being pressed, the mouse is considered in the "tracking" state with + respect to that button. When a button is held down, the mouse is "dragging" + with respect to that button. A "click" is simply considered a zero-length + drag under this model. + + In the case of a two-button mouse, this means that the mouse can be in a + total of 4 different states: tracking, left button dragging, right button + dragging, and two-button dragging. In practice, very little software + actually does anything meaningful with two-button dragging. 
+ +**Touch Screen** + + Applying the model to a touch screen, one can observe that current hardware + has no way to sense that a finger that is "hovering, but not quite making + contact with the screen". This means that the "Tracking" state can be ruled + out, leaving only the "Out of Range" and "Dragging" states. Since many touch + screens can support multiple fingers touching the screen concurrently, and + each finger can be in one of two states, there are potentially 2^N different + "states" that a touchscreen can be in. Windows assigns meaning to many two, + three, and four-finger gestures. + +**Tablet Digitizer** + + A tablet digitizer supports all three states: when the stylus is far away + from the surface, it is considered "out of range"; when it is located + slightly above the surface, it is "tracking"; and when it is making contact + with the surface, it is "dragging". + +The W3C standards for pointing devices are based on this three-state model, but +applied to each individual web element instead of the entire system. This +makes things like "Out-of-Range" possible for the mouse, since it can be +out of range of a web element. + +The W3C uses the terms "over" and "out" to convey the transition between +"out-of-range" and "tracking" (which the W3C calls "hover"), and the terms +"down" and "up" convey the transition between "tracking" and "dragging". + +The standard also address some of the known shortcomings of the model to +improve portability and consistency; these improvements will be discussed more +below. + +The Windows Pointer API is *supposedly* based around this model, +but unfortunately real-world testing shows that the model is not followed +very consistently with respect to the actual signals sent to the application. + + +Gestures +===================================== + + +In contrast to the sort-of "anything goes" UI designs of the past, +modern operating systems like Windows, Mac OS X, iOS, Android, and even +modern Linux DEs have an "opinionated" idea of how user interaction +should behave across all apps on the platform (the so-called "look and feel" +of the operating system). + +Users expect gestures like swipes, pinches, and taps to act the same way +across all apps for a given operating system, and they expect things like +on-screen keyboards or handwriting recognition to pop up in certain contexts. +Failing to meet those expectations makes an app look less polished, and +(especially as far as accessibility is concerned) it frustrates the user +and makes it more difficult for them to interact with the app. + +Microsoft defines guidelines for various behaviours that Windows applications +should ideally adhere to in the `Input and Interactions <https://learn.microsoft.com/en-us/windows/apps/design/input/>`__ +section on MSDN. Some of these are summarized quickly below: + +**Drag and Drop** + + Drag and drop allows a user to transfer data from one application to + another. The gesture begins when a pointer device moves into the "Dragging" + state over top of a UI element, usually as a result of holding down a mouse + button or pressing a finger on a touchscreen. The user moves the pointer + over top of the receiver of the data, and then ends the gesture by releasing + the mouse button or lifting their finger off the touchscreen. Window + interprets this transition out of the "Dragging" state as permission to + initiate the data transfer. + + Firefox has supported Drag and Drop for a very long time, so it will not be + discussed further. 
+ + +**Pan and Zoom** + + When using touchscreens (and multi-touch touchpads), users expect to be able + to cause the viewport to "pan" left/right/up/down by pressing two fingers on + the screen (creating two pointers in "Dragging" state) and moving their + fingers in the direction of movement. When they are done, they can release + both fingers (changing both pointers to "Out of Bounds"). + + A zoom can be signalled by moving the two fingers apart or together + in a "pinch" or "reverse pinch" gesture. + + +**Single Pointer Panning** + + Applications that are based on a UI model of the user interacting with a + "page" often allow a single pointer "Dragging" over the viewport to cause + the viewport to pan, similarly to the two-finger panning discussed in the + previous section. + + Note that this gesture is not as universal as two-finger panning is -- as a + counterexample, graphics programs tend to treat one-finger dragging as + object manipulation and two-finger dragging as viewport panning. + + +**Inertia** + + When a user is done panning, they may lift their finger/pen off the screen + while the viewport is still in motion. Users expect that the page will + continue to move for a little while, as-if the user had "tossed" the page + when they let go. Effectively, the page behaves as though it has "momentum" + that needs to be gradually lost before the page comes to a full stop. + + Modern operating systems provide this behavior via their various native + widget toolkits, and the curve that objects follow as they slow to a stop + are different across OSes. In that way, they can be considered part of the + unique "look and feel" of the OS. Users expect the scrolling of pages in + their web browser to behave this way, and so when Firefox fails to provide + this behavior it can be jarring. + + +**Overscroll and Elastic Bounce** + + When a user is panning the page and reaches the outer edges, Microsoft + recommends that the app should begin an "elastic bounce" animation, where + the page will allow the user to scroll past the end ("overscroll"), + show empty space underneath the page, and then sort of "snap back" like a + rubber band that's been stretched and then released. You can see a + demonstration in `this article <https://www.windowslatest.com/2020/05/21/microsoft-is-adding-elastic-scrolling-to-chrome-on-windows-10/>`__, + which discusses Microsoft adding it to Chromium. + + +History of Web Standards and Windows APIs +=========================================== + +The World-Wide Web Consortium (W3C) and the Web Hypertext Application +Technology Working Group (WHATWG) manage the standards that detail the +interface between a user agent (like Firefox) and applications designed to run +on the Web Platform. The user agent, in turn, must rely on the operating system +(Windows, in this case) to provide the necessary APIs to implement the +standards required by the Web Platform. + +As a result of that relationship, a Web Standard is unlikely to be created +until all widely-used operating systems provide the required APIs. That allows +us to build a linear timeline with a predictable pattern: a new type of device +becomes popular, the APIs to support it are introduced into operating systems, +and eventually a cross-platform standard is introduced into the Web Platform. + +The following sections detail the history of input devices supported by +Windows and the Web Platform: + + +**1985 - Computer Mouse Support (Windows 1.0)** + + The first version of Windows (1985) supported a computer mouse. 
Support + for other input devices is not well-documented, but probably non-existant. + + +**1991 - Third-Party De-facto Pen Support (Wintab)** + + In the late 80s and early 90s, any tablet pen hardware vendor that wanted + to support Windows would need to write a device driver and design a + proprietary user-mode API to expose the device to user applications. In + turn, application developers would have to write and maintain code to + support the APIs of every relevant device vendor. + + In 1991, a company named LCS/Telegraphics released an API for Windows + called "Wintab", which was designed in collaboration with hardware and + software vendors to define a general API that could be targetted by + device drivers and applications. + + It would take Microsoft more than a decade to include first-party support + for tablet pens in Windows, which allowed Wintab to become the de-facto + standard for pen support on Windows. The Wintab API continues to be + supported by virtually all artist tablets to this day. Notable companies + include Wacom, Huion, XP-Pen, etc. + + +**1992 - Early Windows Pen Support (Windows for Pen Computing)** + + The earliest Windows operating system to support non-mouse pointing devices + was Windows 3.1 with the "Windows for Pen Computing" add-on (1992). + (`For the curious <https://socket3.wordpress.com/2019/07/31/windows-for-pen-computing-1-0/>`__, + and I'm certain `this book <https://www.amazon.com/Microsoft-Windows-Pen-Computing-Programmers/dp/1556154690>`__ + is a must-read!). Pen support was mostly implemented by translating actions + into the existing ``WM_MOUSExxx`` messages, but also "upgraded" any + application's ``EDIT`` controls into ``HEDIT`` controls, which looked the + same but were capable of being handwritten into using a pen. This was not + very user-friendly, as the controls stayed the same size and the UI was not + adapted to the input method. This add-on never achieved much popularity. + + It is not documented whether Netscape Navigator (the ancestor of Mozilla + Firefox) supported this add-on or not, but there is no trace of it in modern + Firefox code. + + +**1995 - Introduction of JavaScript and Mouse Events (De-facto Web Standard)** + + The introduction of JavaScript in 1995 by Netscape Communications added a + programmable, event-driven scripting environment to the Web Platform. + Browser vendors quickly added the ability for scripts to listen for and + react to mouse events. These are the well-known events like ``mouseover``, + ``mouseenter``, ``mousedown``, etc. that are ubiquitous on the web, and are + known by basically anyone who has ever written front-end JavaScript. + + This ubiquity created a de-facto standard for mouse input, which would + eventually be formally standardized by the W3C in the HTML Living Standard + in 2001. + + The Mouse Event APIs assume that the computer has one single pointing device + which is always present, has a single cursor capable of "hovering" over an + element, and has between one and three buttons. + + When support for other pointing devices like touchscreen and pen first + became available in operating systems, it was exposed to the web by + interpreting user actions into equivalent mouse events. Unfortunately, this + is unable to handle multiple concurrent pointers (like one would get from + multitouch screens) or report the kind of rich information a pen digitizer + can provide, like tilt angle, pressure, etc. 
This eventually lead the W3C + to develop the new "Touch Events" standard to expose touch functionality, + and eventually the "Pointer Events" to expose more of the rich information + provided by pens. + + +**2005 - Mainstream Pen Support (Windows XP Tablet PC Edition)** + + It was the release of Windows XP Tablet PC Edition (2005) that allowed + Windows applications to directly support tablet pens by using the new COM + "`Windows Tablet PC <https://learn.microsoft.com/en-us/windows/win32/tablet/tablet-pc-development-guide>`__" + APIs, most of which are provided through the main `InkCollector <https://learn.microsoft.com/en-us/windows/win32/tablet/inkcollector-class>`__ + class. The ``InkCollector`` functionality would eventually be "mainlined" + into Windows XP Professional Service Pack 2, and continues to exist in + modern Windows releases. + + The Tablet PC APIs consist of a large group of COM objects that work + together to facilitate enumerating attached pens, detecting pen movement and + pen strokes, and analyzing them to provide: + + 1. **Cursor Movement**: translates the movements of the pen into the + standard mouse events that applications expect from mouse cursor + movement, namely ``WM_NCHITTEST``, ``WM_SETCURSOR`` and + ``WM_MOUSEMOVE``. + + 2. **Gesture Recognition**: detects common user actions, like "tap", + "double-tap", "press-and-hold", and "drag". The `InkCollector` delivers + these events via COM `SystemGesture <https://learn.microsoft.com/en-us/windows/win32/tablet/inkcollector-systemgesture>`__ + events using the `InkSystemGesture <https://learn.microsoft.com/en-us/windows/win32/api/msinkaut/ne-msinkaut-inksystemgesture>`__ + enumeration. It will also translate them into common Win32 messages; for + example, a "drag" gesture would be translated into a ``WM_LBUTTONDOWN`` + message, several ``WM_MOUSEMOVE`` messages, and finally a + ``WM_LBUTTONUP`` message. + + An application that is using ``InkCollector`` will receive both types of + messages: traditional mouse input through the Win32 message queue, and + "Tablet PC API" events through COM callbacks. It is up to the + application to determine which events matter to it in a given context, + as the two types of events are not guaranteed by Microsoft to correspond + in any predictable way. + + 3. **Shape and Text Recognition**: allows the app to + recognize letters, numbers, punctuation, and other `common shapes <https://learn.microsoft.com/en-us/windows/win32/api/msinkaut/ne-msinkaut-inkapplicationgesture>`__ + the user might make using their pen. Supported shapes include circles, + squares, arrows, and motions like "scratch out" to correct a misspelled + word. Custom recognizers exist that allow recognition of other symbols, + like music notes or mathematical notation. + + 4. **Flick Recognition**: allows the user to invoke actions via quick, + linear motions that are recognized by Windows and sent to the app as + ``WM_TABLET_FLICK`` messages. The app can choose to handle the window + message or pass it on to the default window procedure, which will + translate it to scrolling messages or mouse messages. + + For example, a quick upward 'flick' corresponds to "Page up", and + a quick sideways flick in a web browser would be "back". Flicks were + never widely used by Windows apps, and they may have been removed in + more recent versions of Windows, as the existing Control Panel menus + for configuring them seem to no longer exist as of Windows 10 22H2. 
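
   For orientation, here is a rough sketch of the minimal COM setup an
   application would perform to start receiving pen data through an
   ``InkCollector``. It is illustrative only: it assumes the ``msinkaut``
   automation interfaces and omits all error handling, COM initialization, and
   event-sink plumbing. It is not code that Firefox ships.

   .. code-block:: cpp

      #include <windows.h>
      #include <msinkaut.h>

      // Attach a Tablet PC InkCollector to a window so that pen activity is
      // reported through the IInkCollectorEvents dispinterface (and, by
      // default, also translated into ordinary mouse messages).
      bool AttachInkCollector(HWND aWnd, IInkCollector** aOutCollector) {
        IInkCollector* collector = nullptr;
        HRESULT hr = ::CoCreateInstance(CLSID_InkCollector, nullptr,
                                        CLSCTX_INPROC_SERVER,
                                        IID_PPV_ARGS(&collector));
        if (FAILED(hr)) {
          return false;
        }
        // Bind the collector to the window and turn it on. Events will now be
        // delivered via the collector's connection point.
        collector->put_hWnd((LONG_PTR)aWnd);
        collector->put_Enabled(VARIANT_TRUE);
        *aOutCollector = collector;
        return true;
      }
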
+ + + Firefox does not appear to have ever used these APIs to allow tablet pen + input, with the exception of `one piece of code <https://searchfox.org/mozilla-central/rev/e6cb503ac22402421186e7488d4250cc1c5fecab/widget/windows/InkCollector.cpp>`__ + to detect when the pen leaves the Firefox window to solve + `Bug 1016232 <https://bugzilla.mozilla.org/show_bug.cgi?id=1016232>`__. + + +**2009 - Touch Support: WM_GESTURE (Windows 7)** + + While attempts were made with the release of Windows Vista (2007) to support + touchscreens through the existing tablet APIs, it was ultimately the release + of Windows 7 (2009) that brought first-class support for Touchscreen devices + to Windows with new Win32 APIs and two main window messages: ``WM_TOUCH`` + and ``WM_GESTURE``. + + These two messages are mutually-exclusive, and all applications are + initially set to receive only ``WM_GESTURE`` messages. Under this + configuration, Windows will attempt to recognize specific movements on a + touch digitizer and post "gesture" messages to the application's message + queue. These gestures are similar to (but, somewhat-confusingly, not + identical to) the gestures provided by the "Windows Tablet PC" APIs + mentioned above. The main gesture messages are: zoom, pan, rotate, + two-finger-tap, and press-and-tap (one finger presses, another finger + quickly taps the screen). + + In contrast to the behavior of the ``InkCollector`` APIs, which will send + both gesture events and translated mouse messages, the ``WM_GESTURE`` + message is truly "upstream" of the translated mouse messages; the translated + mouse messages will only be generated if the application forwards the + ``WM_GESTURE`` message to the default window procedure. This makes + programming against this API simpler than the ``InkCollector`` API, as + there is no need to state-fully "remember" that an action has already been + serviced by one codepath and needs to be ignored by the other. + + Firefox current supports the ``WM_GESTURE`` message when Asynchronous Pan + and Zoom (APZ) is not enabled (although we do not handle inertia in this + case, so the page comes to a dead-stop immediately when the user stops + scrolling). + + +**2009 - Touch Support: WM_TOUCH (Windows 7)** + + Also introduced in Windows 7, an application that needs full control over + touchscreen events can use `RegisterTouchWindow <https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-registertouchwindow>`__ + to change any of its windows to receive ``WM_TOUCH`` messages instead of the + more high-level ``WM_GESTURE`` messages. These messages explicitly notify + the application about every finger that contacts or breaks contact with the + digitizer (as well as each finger's movement over time). This provides + absolute control over touch interpretation, but also means that the burden + of handling touch behavior falls completely on the application. + + To help ease this burden, Microsoft provides two COM APIs to interpret + touch messages, ``IManipulationProcessor`` and ``IInertiaProcessor``. + + ``IManipulationProcessor`` can be considered a superset of the functionality + available through normal gestures. The application feeds ``WM_TOUCH`` data + into it (along with other state, such as pivot points and timestamps), and + it allows for manipulations like: two-finger rotation around a pivot, + single-finger rotation around a pivot, simultaneous rotation and translation + (for example, 'dragging' a single corner of a square). 
+ `These MSDN diagrams <https://learn.microsoft.com/en-us/windows/win32/wintouch/advanced-manipulations-overview>`__ + give a good overview of the kinds of advanced manipulations an app might + support. + + ``IInertiaProcessor`` works with ``IManipulationProcessor`` to add inertia + to objects in a standard way across the operating system. It is likely that + later APIs that provide this (like DirectManipulation) are using these COM + objects under the hood to accomplish their inertia handling. + + Firefox currently handles the ``WM_TOUCH`` event when Asynchronous Pan and + Zoom (APZ) is enabled, but we do not use either the ``IInertiaProcessor`` + nor the ``IManipulationProcessor``. + + +**2012 - Unified Pointer API (Windows 8)** + + Windows 8 (2012) was Microsoft's initial attempt to make a touch-first, + mobile-first operating system that (ideally) would make it easy for app + developers to treat touch, pen, and mouse as first-class input devices. + + By this point, the Windows Tablet APIs would allow tablet pens to draw + text and shapes like squares, triangles, and music notes, and those shapes + would be recognizable by the Windows Ink subsystem. + + At the same time, Windows Touch allowed touchscreens to have advanced + manipulation, like rotate + translate, or simultaneous pan and zoom, and it + allowed objects manipulated by touch to have momentum and angular velocity. + + The shortcomings of having separate input stacks for these various devices + starts to be become apparent after a while: Why shouldn't a touchscreen be + able to recognize a circle or a triangle? Why shouldn't a pen be able to + have complex rotation and zoom functionality? How do we handle these newer + laptop touchpads that are starting to handle multi-touch gestures like a + touchscreen, but still cause relative cursor movement like a mouse? Why does + my program have to have 3 separate codepaths for different pointing devices + that are all very similar? + + The Windows Pointer Device Input Stack introduces new APIs and window + messages that generalize the various types of pointing devices under a + single API while still falling back to the legacy touch and tablet input + stacks in the event that the API is unused. (Note that the touch and tablet + stacks themselves fall back to the traditional mouse input stack when they + are unused.) + + Microsoft based their pointer APIs off the Buxton Three-State Model + (discussed earlier), where changes between "Out-of-Range" and "Tracking" are + signalled by ``WM_POINTERENTER`` AND ``WM_POINTERLEAVE`` messages, and + changes between "Tracking" and "Dragging" are signalled by + ``WM_POINTERDOWN`` and ``WM_POINTERUP``. Movement is indicated via + ``WM_POINTERUPDATE`` messages. + + If these messages are unhandled (the message is forwarded to + ``DefWindowProc``), the Win32 subsystem will translate them + into touch or gesture messages. If unhandled, those will be further + translated into mouse and system messages. + + While the Pointer API is not without some unfortunate pitfalls (which will + be discussed later), it still provides several advantages over the + previously available APIs: it can allow a mostly-unified codepath for + handling pointing devices, it circumvents many of the often-complex + interactions between the previous APIs, and it provides the ability to + simulate pointing devices to help facilitate end-to-end automated testing. 
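
   To make the message flow concrete, here is a minimal sketch of a window
   procedure that consumes pointer messages for pens and lets everything else
   fall back to ``DefWindowProc``. The handler name is a hypothetical
   placeholder and error handling is omitted; this is not the code Firefox
   ships, just an illustration of the model described above.

   .. code-block:: cpp

      // Requires the Windows 8+ SDK headers (_WIN32_WINNT >= 0x0602).
      #include <windows.h>

      LRESULT CALLBACK PointerAwareWndProc(HWND aWnd, UINT aMsg,
                                           WPARAM aWParam, LPARAM aLParam) {
        switch (aMsg) {
          case WM_POINTERDOWN:
          case WM_POINTERUPDATE:
          case WM_POINTERUP: {
            UINT32 pointerId = GET_POINTERID_WPARAM(aWParam);
            POINTER_INPUT_TYPE type = PT_POINTER;
            if (::GetPointerType(pointerId, &type) && type == PT_PEN) {
              POINTER_PEN_INFO penInfo{};
              if (::GetPointerPenInfo(pointerId, &penInfo)) {
                // penInfo.pressure, penInfo.tiltX / penInfo.tiltY, and
                // penInfo.pointerInfo.ptPixelLocation carry the rich pen
                // state.
                // HandlePenInput(penInfo);  // hypothetical app handler
              }
              // Returning zero marks the message as handled; it will not be
              // translated into legacy touch or mouse messages.
              return 0;
            }
            // Other pointer types fall through to DefWindowProc below, which
            // re-posts them as touch, gesture, or mouse messages.
            break;
          }
          default:
            break;
        }
        return ::DefWindowProc(aWnd, aMsg, aWParam, aLParam);
      }
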
   Firefox currently uses the Pointer APIs to handle tablet stylus input only,
   while other input methods still use the historical mouse and touch input
   APIs above.


**2013 - DirectManipulation (Windows 8.1)**

   DirectManipulation is a DirectX-based API that was added in the release of
   Windows 8.1 (2013). This API allows an app to create a series of
   "viewports" inside a window and have scrollable content within each of
   these viewports. The manipulation engine will then take care of
   automatically reading Pointer API messages from the window's event queue
   and generating pan and zoom events to be consumed by the app.

   In the case that the app is also using DirectComposition to draw its
   window, DirectManipulation can pipe the events directly into it, causing
   the app to essentially get asynchronous pan and zoom with proper handling
   of inertia and overscroll with very little coding.

   DirectManipulation is only used in Firefox to handle data coming from
   Precision Touchpads, as Microsoft provides no other convenient API for
   obtaining data from such devices. Firefox creates fake content inside of
   a fake viewport to capture the incoming events from the touchpad and
   translates them into the standard Asynchronous Pan and Zoom (APZ) events
   that the rest of the input pipeline uses.


**2013 - Touch Events (Web Standard)**

   "`Touch Events <https://www.w3.org/TR/touch-events/>`__" became a W3C
   recommendation in October, 2013.

   At this point, Microsoft's first operating system to include touch support
   (Windows 7) was the most popular desktop operating system, and the ubiquity
   of smart phones brought a huge uptick in users with touchscreen inputs. All
   major browsers included some API that allowed reading touch input,
   prompting the W3C to formalize a new standard to ensure interoperability.

   With the Touch Events API, multiple touch interactions may be reported
   simultaneously, each with their own separate identifier for tracking and
   their own coordinates within the screen, viewport, and client area. A
   touch is reported by: a ``touchstart`` event with a unique ID for each
   contact, zero-or-more ``touchmove`` events with that ID, and finally a
   ``touchend`` event to signal the end of that specific contact.

   The API also has some amount of support for pen styluses, but it lacks
   important features necessary to truly support them: hovering, pressure,
   tilt, or multiple cursors like an eraser. Ultimately, its functionality
   has been superseded by the newer "Pointer Events" API, discussed below.


**2016 - Precision Touchpads (Windows 10)**

   Early touchpads emulated a computer mouse by directly using the same IBM
   PS/2 interface that most computer mice used and translating relative
   movement of the user's finger into equivalent movements of a mouse on a
   surface.

   As touchpad technology advanced and more powerful interface standards like
   USB began to take over the consumer market, touchpad vendors started adding
   extra features to their hardware, like tap-to-click, tap-and-drag, and
   tap-and-hold (to simulate a right click). These behaviors were implemented
   by touchpad vendors in hardware drivers and/or user-mode "hooks" that
   injected equivalent Win32 messages into the appropriate target.

   As expected, each touchpad vendor's driver had its own subtly-different
   behavior from others, its own bugs, and its own negative interactions with
   other software.
+ + During the later years of Windows 8, Microsoft and touchpad company + Synaptics co-developed the "Precision Touchpad" standard, which defines an + interface for touchpad hardware to report its physical measurements, + precision, and sensor configuration to Windows and allows it to deliver raw + touch data. Windows then interprets the data and generates gestures and + window messages in a standard way, removing the burden of implementing these + behaviors from the touchpad vendor and providing the OS with rich + information about the user's movements. + + It wasn't until the 2016 release of Windows 10 14946 that Microsoft would + support all the standard gestures through the new standard. Although + adoption by vendors has been a bit slow, the fact that + `it is a requirement for Windows 11 <https://pocketnow.com/all-windows-11-pcs-will-be-required-to-have-a-precision-touchpad-and-webcam/>`__ + means that vendor support for this standard is imminent. + + Unfortunately, there's a piece of bad news: Microsoft did not + implement the above "Unified Pointer API" for use with touchpads, as the + developers of Blender discovered when `they moved to the Pointer API <https://archive.blender.org/developer/D7660>`__. + Instead, Microsoft expects developers to either use DirectManipulation to + automatically get pan/zoom enabled for their app, or the RawInput API to + directly read touchpad data. + + +**2019 - Pointer Events (Web Standard)** + + "`Pointer Events <https://www.w3.org/TR/pointerevents/>`__" became a level 2 + W3C recommendation in April, 2019. They considered `the work done by Microsoft <https://www.w3.org/Submission/2012/SUBM-pointer-events-20120907/>`__ + as part of the design of their own Pointer API, and in many ways the W3C + standard resembles an improved, better specified, more consistent, and + easier-to-use version of the APIs provided by the Win32 subsystem. + + The Pointer Events API generalizes devices like touchscreens, mice, tablet + pens, VR controllers, etc. into a "thing that points". A pointer has + (optional) properties: a width and height (big for a finger, 1px for a + mouse), an amount of pressure, a tilt angle relative to the surface, some + buttons, etc. This helps applications maximize code reuse for handling + pointer input by having a common codebase written against these generalized + traits. If needed, the application may also have smaller, specialized + sections of code for each concrete pointer type. + + Certain types of pointers (like pens and touchscreens) have a behavior where + they are always "captured" by the first object that they interact with. For + example, if a user puts their finger on an empty part of a web page and + starts to scroll, their finger is now "captured" by the web page itself. + "Captured" means that even if their finger moves over an element in + the web page, that element will not receive events from the finger -- the + page itself will until the entire interaction stops. + + The events themselves very closely follow the Buxton Three-State Model + (discussed earlier), where ``pointerover/pointerout`` messages indicate + transitions from "Out of Range" to "Tracking" and visa-versa, and + ``pointerdown/pointerup`` messages transition between "Tracking" and + "Dragging". 
``pointermove`` updates the position of the pointer, and a + special ``pointercancel`` message is sent to inform the page that the + browser is "cancelling" a ``pointerdown`` event because it has decided to + consume it for a gesture or because the operating system cancelled the + pointer for its own reasons. + + +CSS "interaction" Media Queries +========================================== + +(Note that this section is **not** about the `pointer-events <https://developer.mozilla.org/en-US/docs/Web/CSS/pointer-events>`__ +CSS property, which defines the circumstances where an element can be the target +of pointer events.) + +The W3C defines the interaction-related media queries in the +`Media Queries Level 4 - Interaction Media Features <https://www.w3.org/TR/mediaqueries-4/#mf-interaction>`__ +document. + +To summarize, the main interaction-related CSS Media Queries that Firefox must +support are ``pointer``, ``any-pointer``, ``hover`` and ``any-hover``. + + +``pointer`` + + Allows the webpage to query the existence of a pointing device on + the machine, and (if available) the assumed "pointing accuracy" of the + "primary" pointing device. The device considered "primary" on a machine with + multiple input devices is a policy decision that must be made by the web + browser; Windows simply provides the APIs to query information about + attached devices. + + The browser is expected to return one of three strings to this media query: + + ``none`` + + There is no pointing device attached to the computer. + + ``coarse`` + + The primary pointing device is capable of approximately + pointing at a relatively large target (like a finger on a + touchscreen). + + ``fine`` + + The primary pointing device is capable of near-pixel-level + accuracy (like a computer mouse or a tablet pen). + + +``any-pointer`` + + Similar to ``pointer``, but represents the union of + capabilities of all pointers attached to the system, such that the meanings + become: + + ``none`` + + There is no pointing device attached to the computer. + + ``coarse`` + + There is at-least one "coarse" pointer attached. + + ``fine`` + + There is at-least one "fine" pointer attached. + + +``hover`` + + Allows the webpage to query whether the primary pointer is + capable of "hovering" over top of elements on the page. Computer mice, + touchpad cursors, and higher-end pen tablets all support this, whereas + current touchscreens are "touch" or "no touch", and they cannot detect a + finger hovering over the screen. + + ``hover`` + + The primary pointer is capable of reporting hovering. + + ``none`` + + The primary pointer is not capable of reporting hovering. + +``any-hover`` + + Indicates whether any pointer attached to the system has the + ``hover`` capability. + + +Selection of the Primary Pointing Device +-------------------------------------------- + +To illustrate the complexity of this topic, consider the Microsoft Surface Pro. + +The Surface Pro has an advanced screen that is capable of receiving touch +input, but it can also behave like a pen digitizer and receive input from a +stylus with advanced pen capabilities, like hover sensing, pressure +sensitivity, multiple buttons, and even multiple "tips" (a pen and eraser end). + +In this case, what should Firefox consider the primary pointing device? + +Perhaps the user intends to use their Surface Pro like a touchscreen tablet, +at which point Firefox should report ``pointer: coarse`` and ``hover: none`` +capabilities. 
+ +But what if, instead, the user wants to sketch art or take notes using a pen on +their Surface Pro? In this case, Firefox should be reporting ``pointer: fine`` +and ``hover: hover``. + +Imagine that the user then attaches the "keyboard + touchpad" cover attachment +to their Surface Pro; naturally, we will consider that the user's intent is for +the touchpad to become the primary pointing device, and so it is fairly clear +that we should return ``pointer: fine`` and ``hover: hover`` in this state. + +However, what if the user tucks the keyboard/touchpad attachment behind the +tablet and begins exclusively operating the device with their finger? + +This example shows that complex, multi-input machines can resist classification +and blur the lines between labels like "touch device", "laptop", "drawing +tablet", etc. It also illustrates that identifying the "primary" pointing +device using only machine configuration may yield unintuitive and suboptimal +results. + +While we can almost-certainly improve our hardware detection heuristics to +better answer this question (and we should, at the very least), perhaps it +makes more sense for Firefox to incorporate user intentions into the decision. +Intentions could be communicated directly by the user through some sort of +setting or indirectly through the user's actions. + +For example, if the user intends to draw on the screen with a pen, perhaps +Firefox provides something like a "drawing mode" that the user can toggle to +change the primary pointing device to the pen. Or perhaps it's better for +Firefox to interpret the mere fact of receiving pen input as evidence of the +user's intent and switch the reported primary pointing device automatically. + +If we wanted to switch automatically, there are predictable traps and pitfalls +we need to think about: we need to ensure that we don't create frustrating user +experiences where web pages may "pop" beneath the user suddenly, and +we should likely incorporate some kind of "settling time" so we don't +oscillate between devices. + +It's worth noting that Chromium doesn't seem to incorporate anything like +what's being suggested here, so if this is well-designed it may be an +opportunity for Firefox to try something novel. + + + + +================================================================================ +State of the Browser +================================================================================ + +Pan and Zoom, Inertia, Overscroll, and Elastic Bounce +========================================================= + +As can be seen in the videos below, Firefox's support for inertia, overscroll, +and elastic bounce works well on all platforms when a stylus pen is used +as the input device, and it also works just fine with the touchscreen on the +Dell XPS 15. However, it completely fails when the touchscreen is used on +the Microsoft Surface Pro. While more investigation is needed to completely +understand these issues, the fact that the correctly-behaving digitizing pens +use the Pointer API and the misbehaving input devices do not may be related. 
+ +- `Video 1 <https://drive.google.com/file/d/1Z1QRSf2RluNhJwkKCzPb6-14vRtkqK8s/view?usp=sharing>`__ + showcasing overscroll and bounce not working on Surface Pro with touch, but + other devices/inputs are working + +- `Video 2 <https://drive.google.com/file/d/1bOgpVGBeZtwelvPJzYdA6uFRpubGtu4W/view?usp=sharing>`__ + showing that everything works just fine with an external Wacom digitizer + + +Pointer Media Queries +========================================================= + +**"any-pointer" Queries** + +Unlike the ``pointer`` media queries, which rely on the browser to make a policy +decision about what should be considered the "primary" pointer in a given +system configuration, the ``any-pointer`` queries are much more objective and +binary: the computer either has a type of device attached to it, or it +doesn't. + +**any-pointer: coarse** + +Firefox reports that there are "coarse" pointing devices present if either of +these two points is true: + +1. ``GetSystemMetrics(SM_DIGITIZER)`` reports that a device that supports + touch or pen is present. + +2. Based on heuristics, Firefox concludes that it is running on a computer it + considers a "tablet". + +Point #1 is incorrect, as a pen is not a "coarse" pointing device. Note that +this is a recent regression in `Bug 1811303 <https://bugzilla.mozilla.org/show_bug.cgi?id=1811303>`__ +that was uplifted to Firefox 112, so this actually regressed as this document +was being written! This is responsible for the incorrect "Windows 10 Desktop + +Wacom USB Tablet" issue in the table. + +Point #2 is a clear case of the `XY Problem <https://en.wikipedia.org/wiki/XY_problem>`__, +where Firefox is trying to determine if a coarse pointing device is present +by determining whether it is running on a tablet, when instead it should be +directly testing for coarse pointing devices (since, of course, those can exist +on machines that wouldn't normally be considered a "tablet"). This is +responsible for the incorrect "Windows 10 Dell XPS 15 (Touch Disabled) + Wacom +USB Tablet" issue in the table below. + +**any-pointer: fine** + +Firefox reports that there are "fine" pointing devices present if and only if +it detects a mouse. This is clearly already wrong. Firefox determines that the +computer has a mouse using the following algorithm: + +1. If ``GetSystemMetrics(SM_MOUSEPRESENT)`` returns false, report no mouse. + +2. If Firefox does not consider the current computer to be a tablet, report a + mouse if there is at-least one "mouse" device driver running on the + computer. + +3. If Firefox considers the current computer to be a tablet or a touch system, + only report a mouse if there are at-least two "mouse" device drivers + running. This exists because some tablet pens and touch digitizers report + themselves as computer mice. + +This algorithm also suffers from the XY problem -- Firefox is trying to +determine whether a fine pointing device exists by determining if there is +a computer mouse present, when instead it should be directly testing for +fine pointing devices, since mice are not the only fine pointing +devices. + +Because of this proxy question, this algorithm is completely dependent on any +attached fine pointing device (like a pen tablet) to report itself as a mouse. +Point #3 makes the problem even worse, because if a computer that resembles a +tablet fails to report its digitizers as mice, the algorithm will completely +ignore an actual computer mouse attached to the system because it expects two +of them to be reported! 
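
As a point of comparison, a more direct approach could enumerate the attached
pointer hardware itself rather than counting mouse drivers. The sketch below is
purely illustrative: it assumes Windows 8+, combines the ``GetPointerDevices``
enumeration with ``SM_MOUSEPRESENT``, and is not the algorithm Firefox
currently implements. Whether it covers every relevant device class (for
example, non-precision touchpads) would need to be verified.

.. code-block:: cpp

   #include <windows.h>
   #include <vector>

   struct AnyPointerCapabilities {
     bool coarse = false;  // e.g. a touchscreen
     bool fine = false;    // e.g. a mouse, pen, or precision touchpad
     bool hover = false;   // at least one device can track without contact
   };

   AnyPointerCapabilities QueryAnyPointerCapabilities() {
     AnyPointerCapabilities caps;

     // Rough signal only: as noted above, some pen/touch digitizers also
     // register themselves as mice, so this can over-report.
     if (::GetSystemMetrics(SM_MOUSEPRESENT)) {
       caps.fine = caps.hover = true;
     }

     // Enumerate pointer devices known to the Windows 8+ pointer stack.
     UINT32 count = 0;
     if (!::GetPointerDevices(&count, nullptr) || count == 0) {
       return caps;
     }
     std::vector<POINTER_DEVICE_INFO> devices(count);
     if (!::GetPointerDevices(&count, devices.data())) {
       return caps;
     }
     for (const POINTER_DEVICE_INFO& device : devices) {
       switch (device.pointerDeviceType) {
         case POINTER_DEVICE_TYPE_TOUCH:
           caps.coarse = true;  // finger input: coarse, no hover state
           break;
         case POINTER_DEVICE_TYPE_INTEGRATED_PEN:
         case POINTER_DEVICE_TYPE_EXTERNAL_PEN:
           caps.fine = true;
           caps.hover = true;  // most pens report an in-range ("hover") state
           break;
         case POINTER_DEVICE_TYPE_TOUCH_PAD:
           caps.fine = true;
           caps.hover = true;  // cursor-based, so it behaves as hovering
           break;
         default:
           break;
       }
     }
     return caps;
   }

The mouse-counting heuristic that Firefox actually uses, by contrast, breaks
down badly in practice:
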
+ +Unfortunately, the Surface Pro has both a pen digitizer and a touch digitizer, +and it reports neither as a mouse. As a result, this algorithm completely falls +apart on the Surface Pro, failing to report any "fine" pointing device even +when a computer mouse is plugged in, a pen is plugged in, or even when +the tablet is docked because its touchpad is only one mouse and it expects +at least two. + +This is also responsible for failing to report the trackpad on the Dell XPS 15 +as "fine", because the Dell XPS 15 has a touchscreen and therefore looks like +a "tablet", but doesn't report 2 mouse drivers. + +**any-pointer: hover** + + +Firefox reports that any device that is a "fine" pointer also supports "hover", +which does generally hold true, but isn't necessarily true for lower-end pens +that only support tapping. It would be better for Firefox to directly +query the operating system instead of just assuming. + +**"pointer" media query** + +As discussed previously at length, this media query relies on a "primary" +designation made by the browser. Below is the current algorithm used to +determine this: + +1. If the computer is considered a "tablet" (see below), report primary + pointer as "coarse" (this is clearly already the wrong behavior). + +2. Otherwise, if the computer has a mouse plugged in, report "fine". + +3. Otherwise, if the computer has a touchscreen or pen digitizer, report + "coarse" (this is wrong in the case of the digitizer). + +4. Otherwise, report "fine" (this is wrong; should report "None"). + +Firefox uses the following algorithm to determine if the computer is a +"tablet" for point #1 above: + +1. It is not a tablet if it's not at-least running Windows 8. + +2. If Windows "Tablet Mode" is enabled, it is a tablet no matter what. + +3. If no touch-capable digitizers are attached, it is not a tablet. + +4. If the system doesn't support auto-rotation, perhaps because it has + no rotation sensor, or perhaps because it's docked and operating in + "laptop mode" where rotation won't happen, it's not a tablet. + +5. If the vendor that made the computer reports to Windows that it supports + "convertible slate mode" and it is currently operating in "slate mode", + it's a tablet. + +6. Otherwise, it's not a tablet. + + +**Table with comparison to Chromium** + +The following table shows how Firefox and Chromium respond to various pointer +queries. The "any-pointer" and "any-hover" columns are not subjective and +therefore are always either green or red to indicate "pass" or "fail", but the +"pointer" and "hover" may also be yellow to indicate that it's "open to +interpretation" because of the aforementioned difficulty in determining the +"primary pointer". + +.. 
image:: touch_media_queries.png + :width: 100% + + +**Related Bugs** + +- Bug 1813979 - For Surface Pro media query "any-pointer: fine" is true only + when both the Type Cover and mouse are connected + +- Bug 1747942 - Incorrect CSS media query matches for pointer, any-pointer, + hover and any-hover on Surface Laptop + +- Bug 1528441 - @media (hover) and (any-hover) does not work on Firefox 64/65 + where certain dual inputs are present + +- Bug 1697294 - Content processes unable to detect Windows 10 Tablet Mode + +- Bug 1806259 - CSS media queries wrongly detect a Win10 desktop computer + with a mouse and a touchscreen, as a device with no mouse (hover: none) + and a touchscreen (pointer: coarse) + + +Web Events +===================== + +The pen stylus worked well on all tested systems -- The correct pointer events +were fired in the correct order, and mouse events were properly simulated in +case the default behavior was allowed. + +The touchscreen input was less reliable. On the Dell XPS 15, the +"Pointer Events" were flawless, but the "Touch Events" were missing +an important step: the ``touchstart`` and ``touchmove`` messages were sent just +fine, but Firefox never sends the ``touchend`` message! (Hopefully that isn't +too difficult to fix!) + +Unfortunately, everything really falls apart on the Surface Pro using the +touchscreen -- neither the "Pointer Events" nor the "Touch Events" fire at all! +Instead, the touch is completely absorbed by pan and zoom gestures, and nothing +is sent to the web page. The website's request for ``touch-action: none`` is +ignored, and the web page is never given any opportunity to call +``Event.preventDefault()`` to cancel the pan/zoom behavior. + + +Operating System Interfaces +================================ + +As was discussed above, Windows has multiple input APIs that were each +introduced in newer version of Windows to handle devices that were not +well-served by existing APIs. + +Backward compatibility with applications designed against older APIs is +realized when applications call the default event handler (``DefWindowProc``) +upon receiving an event type that they don't recognize (which is what apps have +always been instructed to do if they receive events they don't recognize). +The unrecognized newer events will be translated by the default event handler +into older events and sent back to the application. A very old application may +have this process repeat through several generations of APIs until it finally +sees events that it recognizes. + +Firefox currently uses a mix of the older and newer APIs, which complicates +the input handling logic and may be responsible for some of the +difficult-to-explain bugs that we see reported by users. + +Here is an explanation of the codepaths Firefox uses to handle pointer input: + +1. Firefox handles the ``WM_POINTER[LEAVE|DOWN|UP|UPDATE]`` messages if the + input device is a tablet pen and an Asynchronous Pan and Zoom (APZ) + compositor is available. Note that this already may not be ideal, as + Microsoft warns (`here <https://learn.microsoft.com/en-us/windows/win32/inputmsg/wm-pointercapturechanged>`__) + that handling some pointer messages and passing other pointer messages to + ``DefWindowProc`` has unspecified behavior (meaning that Win32 may do + something unexpected or nonsensical). + + If the above criteria aren't met, Firefox will call ``DefWindowProc``, which + will re-post the pointer messages as either touch messages or mouse + messages. + +2. 
Firefox currently uses a mix of the older and newer APIs, which complicates
the input handling logic and may be responsible for some of the
difficult-to-explain bugs that we see reported by users.

Here is an explanation of the codepaths Firefox uses to handle pointer input
(a condensed sketch follows the list):

1. Firefox handles the ``WM_POINTER[LEAVE|DOWN|UP|UPDATE]`` messages if the
   input device is a tablet pen and an Asynchronous Pan and Zoom (APZ)
   compositor is available. Note that this already may not be ideal, as
   Microsoft warns (`here <https://learn.microsoft.com/en-us/windows/win32/inputmsg/wm-pointercapturechanged>`__)
   that handling some pointer messages while passing other pointer messages
   to ``DefWindowProc`` has unspecified behavior (meaning that Win32 may do
   something unexpected or nonsensical).

   If the above criteria aren't met, Firefox will call ``DefWindowProc``,
   which will re-post the pointer messages as either touch messages or mouse
   messages.

2. If DirectManipulation is being used for APZ, Windows will deliver a
   ``WM_POINTERCAPTURECHANGED`` message when DirectManipulation detects a pan
   or zoom gesture that it can handle. DirectManipulation will then handle
   the rest of the gesture itself.

   DirectManipulation is used for all top-level and popup windows as long as
   it isn't disabled via the ``apz.allow_zooming``,
   ``apz.windows.use_direct_manipulation``, or
   ``apz.windows.force_disable_direct_manipulation`` prefs.

3. If the pointing device is touch, the next action depends on
   whether an Asynchronous Pan and Zoom (APZ) compositor is available. If it
   is, the window will have been registered using ``RegisterTouchWindow``, and
   Firefox will receive ``WM_TOUCH`` messages, which are sent to the
   "Touch Event" API and handled directly by the APZ compositor.

   If there is no APZ compositor, the input will instead arrive as a
   ``WM_GESTURE`` message or a mouse message, depending on the movement. Note
   that these will be more basic gestures, like tap-and-hold.

4. If none of the above apply, the message will be converted into standard
   ``WM_MOUSExxx`` messages via a call to ``DefWindowProc``.

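The sketch below condenses that dispatch into a single window procedure so the
branches are easier to see. The helper names (``HasApzCompositor``,
``HandlePenPointerMessage``, ``HandleTouchMessage``) are placeholders for
illustration only; they are not actual Firefox functions, and the real logic
has many more branches.

.. code-block:: cpp

   #include <windows.h>

   // Placeholder helpers; not real Firefox functions.
   bool HasApzCompositor();
   void HandlePenPointerMessage(UINT msg, WPARAM wParam, LPARAM lParam);
   void HandleTouchMessage(HWND hwnd, WPARAM wParam, LPARAM lParam);

   LRESULT CALLBACK InputWndProc(HWND hwnd, UINT msg, WPARAM wParam,
                                 LPARAM lParam) {
     switch (msg) {
       case WM_POINTERDOWN:
       case WM_POINTERUP:
       case WM_POINTERUPDATE:
       case WM_POINTERLEAVE: {
         // Step 1: only pen input is consumed here, and only when APZ is
         // available. Everything else falls through to DefWindowProc, which
         // re-posts it as touch or mouse input.
         UINT32 pointerId = GET_POINTERID_WPARAM(wParam);
         POINTER_INPUT_TYPE type = PT_POINTER;
         if (HasApzCompositor() && GetPointerType(pointerId, &type) &&
             type == PT_PEN) {
           HandlePenPointerMessage(msg, wParam, lParam);
           return 0;
         }
         break;
       }
       case WM_TOUCH:
         // Step 3: WM_TOUCH only arrives because RegisterTouchWindow() was
         // called during APZ setup. The handler is expected to read the
         // touch points with GetTouchInputInfo() and close the handle with
         // CloseTouchInputHandle() before feeding them to APZ.
         if (HasApzCompositor()) {
           HandleTouchMessage(hwnd, wParam, lParam);
           return 0;
         }
         break;
       default:
         break;
     }
     // Steps 2 and 4: DirectManipulation-related messages and everything
     // unhandled above end up in DefWindowProc, which falls back to
     // WM_GESTURE and WM_MOUSE* messages as appropriate.
     return DefWindowProc(hwnd, msg, wParam, lParam);
   }
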
================================================================================
Discussion
================================================================================

Here is where some of the outstanding thoughts or questions can be listed.
This can be updated as more questions come about and (hopefully) as answers to
those questions become apparent.

CSS "pointer" Media Queries
===============================

- The logic for the ``any-pointer`` and ``any-hover`` queries is objectively
  incorrect and should be rewritten altogether. That is not as big of a job
  as it sounds, as the code is fairly straightforward and self-contained.
  (Note: Improvements have already been made in
  `Bug 1813979 <https://bugzilla.mozilla.org/show_bug.cgi?id=1813979>`__.)

- There are a few behaviors for ``pointer`` and ``hover`` that are
  objectively wrong (such as reporting a ``coarse`` pointer when the
  Surface Pro is docked with a touchpad). Those should be fixable with a
  code change similar to the previous bullet.

- Do we want to continue to use only machine configuration to decide what
  the "primary" pointer is, or do we also want to incorporate user intent
  into the algorithm? Or, alternatively:

  1. Do we create a way for the user to override? For example, a "Drawing
     Mode" button if a tablet digitizer is sensed.

  2. Do we attempt to change automatically in response to user action?

     - An example was used above of a docked Surface Pro computer, where
       the user may use the keyboard and touchpad for a while, then perhaps
       tuck that behind and use the device as a touchscreen, and then
       perhaps draw on it with a tablet stylus.

     - We would need to be careful to avoid careless "popping" or
       "oscillating" if we react too quickly to changing input types.

- On a separate-but-related note, the `W3C suggested <https://www.w3.org/TR/mediaqueries-5/#descdef-media-pointer>`__
  that it might be beneficial to allow users to at least disable all
  reporting of ``fine`` pointing devices, for users who may have a disability
  that prevents them from being able to click small objects, even with a fine
  pointing device.


Pan-and-Zoom, Inertia, Overscroll, and Elastic Bounce
=========================================================

- Inertia, overscroll, and elastic bounce are just plain broken on the
  Surface Pro. That should definitely be investigated.

- We can see from the video below that Microsoft Edge has quite a bit more
  overscroll and a more elastic bounce than Firefox does, and it also
  allows elastic bounce in directions that the page itself doesn't scroll.

  Edge's behavior seems closer to the user experience I'd expect from using
  Firefox on an iPhone or Android device. Perhaps we should consider
  following suit?

  (`Link to video <https://drive.google.com/file/d/14XVLT6CNn2RaXcHHCRIrQmRwoMYjj6fu/view?usp=sharing>`__)


Web Events
==============

- It's worth investigating why the ``touchend`` event never seems
  to be sent by Firefox on any tested device.

- It's very disappointing that neither the Pointer Events API nor the
  Touch Events API works at all on Firefox on the Surface Pro. That should
  be investigated very soon!


Operating System Interfaces
================================

- With the upcoming sunsetting of Windows 7 support, Firefox has an
  opportunity to revisit the implementation of our input handling and try to
  simplify our codepaths and eliminate some of the workarounds that exist to
  handle some of these complex interactions, as well as fix entire classes of
  bugs - both reported and unreported - that currently exist as a result.

- Does it make sense to combine the touchscreen and pen handling together
  and use the ``WM_POINTERXXX`` messages for both?

  - This would eliminate the need to handle the ``WM_TOUCH`` and
    ``WM_GESTURE`` messages at all.

  - Note that there is precedent for this, as `GTK <https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/1563>`__
    has already done so. It appears that `Blender <https://archive.blender.org/developer/D7660>`__
    has plans to move toward this as well.

  - Tablet pens did very well in most of the testing, and they are also the
    input that mainly exercises the ``WM_POINTERXXX`` codepaths. That may
    imply increased reliability in those codepaths.

  - The Pointer APIs also have good device simulation for integration
    testing.

  - Would we also want to roll mouse handling into it using the
    `EnableMouseInPointer <https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-enablemouseinpointer>`__
    call? That would allow us to also get rid of handling
    ``WM_MOUSE[MOVE/WHEEL/HWHEEL]`` and ``WM_[LRM]BUTTON[UP|DOWN]``
    messages. Truly one codepath (with a few minor branches) to rule them
    all! (A sketch of this idea follows the list.)

  - Nick Rishel sent `this link <http://the-witness.net/news/2012/10/wm_touch-is-totally-bananas/>`__
    that details the troubles that the developers of The Witness (a video
    game) ran into when using the ``WM_TOUCH`` API. It argues that the API
    is poorly designed and advises that, if Windows 7 support is not
    needed, the API should be avoided.

- Should we exclusively use DirectManipulation for Pan/Zoom?

  - Multitouch touchpads bypass all of the ``WM_POINTER`` machinery
    for anything gesture-related and directly send their messages to
    DirectManipulation. We then "capture" all the DirectManipulation events
    and pump them into our events pipeline, as explained above.

  - DirectManipulation also handles "overscroll + elastic bounce" in a way
    that aligns with the Windows look-and-feel.

  - Perhaps it makes sense to just use DirectManipulation for all APZ
    handling and eliminate any attempt at handling this through other
    codepaths.

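Below is a hedged sketch of the "one codepath" idea from the
``EnableMouseInPointer`` bullet above: once mouse input is routed through the
pointer messages, a single handler can branch on the pointer type. The
function names are illustrative only, and this is not how Firefox currently
processes input.

.. code-block:: cpp

   #include <windows.h>

   // Illustrative only: opt the process into receiving mouse input as
   // WM_POINTER* messages. Per MSDN this is process-wide and cannot be
   // reversed once set, so it must be a deliberate architectural choice.
   void InitUnifiedPointerInput() {
     EnableMouseInPointer(TRUE);
   }

   // A single handler for WM_POINTERDOWN/UP/UPDATE that branches on the
   // device type instead of on the message family.
   void OnPointerMessage(WPARAM wParam) {
     UINT32 pointerId = GET_POINTERID_WPARAM(wParam);
     POINTER_INPUT_TYPE type = PT_POINTER;
     if (!GetPointerType(pointerId, &type)) {
       return;
     }
     switch (type) {
       case PT_MOUSE:
         // Previously delivered as WM_MOUSE* / WM_*BUTTON* messages.
         break;
       case PT_TOUCH:
         // Previously delivered as WM_TOUCH or WM_GESTURE.
         break;
       case PT_PEN:
         // Already handled via WM_POINTER* today.
         break;
       default:
         break;
     }
   }
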

High-Frequency Input
================================

"High-frequency input" refers to an app's ability to perceive input events
even when they arrive faster than the app itself processes them.

Consider a mouse that moves through several points: "A->B->C->D->E". If the
application processes input when the mouse is at "A" and doesn't poll again
until the mouse is at point "E", the default behavior of all modern operating
systems is to "coalesce" these events and simply report "A->E". This is fine
for the majority of use cases, but certain workloads (such as digital
handwriting and video games) can benefit from knowing the complete path that
was taken to get from the start point to the end point.

Generally, solutions to this involve the operating system keeping a history of
pointer movements that can be retrieved through an API. For example,
Android provides the `MotionEvent <https://developer.android.com/reference/android/view/MotionEvent.html>`__
API, which batches historical movements.

Unfortunately, the APIs to do this on Windows are terribly broken. As
`this blog <https://blog.getpaint.net/2019/11/14/paint-net-4-2-6-alpha-build-7258/>`__
makes clear, `GetMouseMovePointsEx <https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getmousemovepointsex>`__
has so many issues that the Paint.NET developers had to remove its usage from
their program because of the burden. That same blog entry also notes that the
newer Pointer API has
`GetPointerInfoHistory <https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getpointerinfohistory>`__,
which is *supposed* to support tracking pointer history, but it only ever
tracks a single entry!

Perhaps luckily, there is currently no web standard for high-frequency input,
although it `has been asked about in the past <https://lists.w3.org/Archives/Public/public-pointer-events/2014AprJun/0057.html>`__.

If such a standard were ever created, it would likely be very difficult for
Firefox on Windows to support it.


DirectManipulation and Pens
=============================

- This is a to-do item, but it needs to be investigated whether
  DirectManipulation can directly scoop up pen input, or whether it has
  to be handled by the application (and forwarded to DirectManipulation if
  desired). A possible experiment for the "forward it" option is sketched
  below.

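If the answer turns out to be "forward it", one possible experiment is the
hedged sketch below: hand a pen contact to an already-created
DirectManipulation viewport and observe whether DirectManipulation takes over
the manipulation. The ``gViewport`` name is purely illustrative (it stands in
for a viewport created and activated elsewhere, e.g. during APZ setup), and
this is not how Firefox currently handles pens.

.. code-block:: cpp

   #include <windows.h>
   #include <directmanipulation.h>

   // Illustrative placeholder: a viewport created and activated for this
   // window somewhere else in the application.
   extern IDirectManipulationViewport* gViewport;

   // Called from the window procedure on WM_POINTERDOWN.
   void MaybeHandPenContactToDirectManipulation(WPARAM wParam) {
     UINT32 pointerId = GET_POINTERID_WPARAM(wParam);
     POINTER_INPUT_TYPE type = PT_POINTER;
     if (GetPointerType(pointerId, &type) && type == PT_PEN && gViewport) {
       // SetContact associates the contact with the viewport. If
       // DirectManipulation decides it can drive the gesture, the
       // application should subsequently observe WM_POINTERCAPTURECHANGED,
       // as described in the codepath list earlier in this document.
       gViewport->SetContact(pointerId);
     }
   }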