################################################################################
Windows Pointing Device Support in Firefox
################################################################################
|
|
|
|
.. contents:: Table of Contents
   :depth: 4
|
|
|
|
================================================================================
Introduction
================================================================================
|
|
|
|
This document is intended to provide the reader with a quick primer and/or
|
|
refresher on pointing devices and the various operating system APIs, user
|
|
experience guidelines, and Web standards that contribute to the way Firefox
|
|
handles input devices on Microsoft Windows.
|
|
|
|
The documentation for these things is scattered across the web and has varying
|
|
levels of detail and completeness; some of it is missing or ambiguous and was
|
|
only determined experimentally or by reading about other people's experiences
|
|
through forum posts. An explicit goal of this document is to gather this
|
|
information into a cohesive picture.
|
|
|
|
We will then discuss the ways in which Firefox currently (as of early 2023)
|
|
produces incorrect or suboptimal behavior when implementing those standards
|
|
and guidelines.
|
|
|
|
Finally, we will raise some thoughts and questions to spark discussion on how
|
|
we might improve the situation and handle corner cases. Some of
|
|
these issues are intrinsically "opinion based" or "policy based", so clear
|
|
direction on these is desirable before engineering effort is invested into
|
|
reimplementation.
|
|
|
|
|
|
================================================================================
Motivation
================================================================================
|
|
|
|
A quick look at the `pile of defects <https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&status_whiteboard=%5Bwin%3Atouch%5D&list_id=16586149&status_whiteboard_type=allwordssubstr>`__
on *bugzilla.mozilla.org* marked with *[win:touch]* will show anyone that
Firefox's input stack for pointer devices has issues. The bugs recorded
there don't begin to capture the full range of unreported glitches and
difficult-to-reproduce hiccups that users run into while using touchscreen
hardware and pen digitizers with Firefox, nor do they capture the ways that
Firefox deviates from various W3C standards that are (luckily) either
rarely used or worked around in web apps (and thus go undetected or
unreported).
|
|
|
|
These bugs primarily manifest in a few ways that will each be discussed in
|
|
their own section:
|
|
|
|
1. Firefox failing to return the proper values for the ``pointer``,
   ``any-pointer``, ``hover``, and ``any-hover`` CSS Media Queries

2. Firefox failing to fire the correct pointer-related DOM events at the
   correct time (or at all)

3. Firefox's inconsistent handling of touch-related gestures like scrolling,
   where certain machines (like the Surface Pro) fail to meet the expected
   behavior of scrolling inertia and overscroll. This leads to an awkward touch
   experience where the page comes to a choppy dead stop when using
   single-finger scrolling
|
|
|
|
|
|
It's worth noting that Firefox is not alone in having these types of issues,
and that handling input devices is a notoriously difficult task for many
applications; even a substantial amount of Microsoft's own software has trouble
navigating this minefield on Microsoft's own Surface devices. Defects are
caused by a combination of the *intrinsic complexity* of the problem domain
and the *accidental complexity* introduced by device vendors and Windows
itself.
|
|
|
|
The *intrinsic complexity* comes from the simple fact that human-machine
|
|
interaction is difficult. A person must attempt to convey complex
|
|
and abstract goals through a series of simple movements involving a few pieces
|
|
of physical hardware. The devices can send signals that are unclear
|
|
or even contradictory, and the software must decide how to handle
|
|
this.
|
|
|
|
As a trivial example, every software engineer who has ever written
page scrolling logic has to answer the question, "What should my
program do if the user hits 'Page Up' and 'Page Down' at the same time?".
While it may seem obvious that the answer is "Do nothing.", naively written
keyboard input logic might assume the two are mutually exclusive and only
process whichever key is handled first in program order.
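
As a minimal sketch of the non-naive approach (the function name and key
strings here are illustrative assumptions, not Firefox code), the two keys can
be treated as independent inputs whose effects cancel out:

.. code-block:: python

   def net_page_scroll(pressed_keys):
       """Return -1 (page up), +1 (page down), or 0 (no scroll)."""
       up = "PageUp" in pressed_keys
       down = "PageDown" in pressed_keys
       # Treat the keys as independent inputs and let them cancel out,
       # rather than acting on whichever one happens to be checked first.
       return int(down) - int(up)

With both keys held, ``net_page_scroll({"PageUp", "PageDown"})`` evaluates to
``0``, which matches the "Do nothing." answer.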
|
|
|
|
Occasionally, a new device will be invented that doesn't obviously map to
existing abstractions and input pipelines. There will be a period of time where
applications will want to support the new device, but neither the application
developers nor the device vendor will fully understand what ideal integration
would look like. The new Apple Vision Pro headset is such a device;
traditional VR headsets have used controllers to
point at things, but Apple insists that the entire thing should be done using
only hand tracking and eye tracking. Developers of VR video games and other
apps (like Firefox) will inevitably make many mistakes on the road to
supporting this new headset.
|
|
|
|
A major source of defect-causing *accidental complexity* is the lack of clear
|
|
expectations and documentation from Microsoft for apps (like Firefox) that are
|
|
not using their Universal Windows Platform (UWP). The Microsoft Developer
|
|
Network (MSDN) mentions concepts like inertia, overscroll, elastic bounce,
|
|
single-finger panning, etc., but the solution is presented in the context
|
|
of UWP, and the solution for non-UWP apps is either unclear or undocumented.
|
|
|
|
Adding to this complexity is the fact that Windows itself has gone through
several iterations of input APIs for different classes of devices, and
these APIs interact with each other in ways that are surprising or
unintuitive. Again, the advice given on MSDN pertains to UWP apps, and the
documentation about the newer "pointer" based window messages is
both incomplete and, in places, inaccurate.
|
|
|
|
Finally, individual input devices have bugs in their driver software that
|
|
would disrupt even applications that are using the Windows input APIs perfectly.
|
|
Handling all of these deviations is impossible and would result in fragile,
|
|
unmaintainable code, but Firefox inevitably has to work around common ones to
|
|
avoid alienating large portions of the userbase.
|
|
|
|
|
|
================================================================================
Technical Background
================================================================================
|
|
|
|
|
|
A Quick Primer on Pointing Devices
======================================
|
|
|
|
|
|
Traditionally, web browsers were designed to accommodate computer mice and
|
|
devices that behave in a similar way, like trackballs and touchpads on
|
|
laptops. Generally, it was assumed that there would be one such device attached
|
|
to the computer, and it would be used to control a hovering "cursor" whose
|
|
movements would be changed by relative movement of the physical input device.
|
|
|
|
However, modern computers can be controlled using a variety of different
|
|
pointing devices, all with different characteristics. Many allow
|
|
multiple concurrent targets to be pointed at and have multiple sensors,
|
|
buttons, and other actuators.
|
|
|
|
For example, the screen of the Microsoft Surface Pro has dual capabilities
|
|
of being a touch sensor and a digitizer for a tablet pen. When being used as a
|
|
workstation, it's not uncommon for a user to also connect the "keyboard +
|
|
touchpad" cover and a mouse (via USB or Bluetooth) to provide the more
|
|
productivity-oriented "keyboard and mouse" setup. In that configuration, there
|
|
are 4 pointer devices connected to the machine simultaneously: a touch screen,
|
|
a pen digitizer, a touchpad, and a mouse.
|
|
|
|
The next section will give a quick overview of common pointing devices.
|
|
Many will be familiar to the reader, but they are still mentioned to establish
|
|
common terminology and to avoid making assumptions about familiarity with every
|
|
input device.
|
|
|
|
|
|
Common Pointing Devices
---------------------------
|
|
|
|
Here are some descriptions of a few pointing device types that demonstrate
|
|
the diversity of hardware:
|
|
|
|
**Touchscreen**
|
|
|
|
A touchscreen is a computer display that is able to sense the
|
|
location of (possibly-multiple) fingers (or stylus) making contact with its
|
|
surface. Software can then respond to the touches by changing the displayed
|
|
objects quickly, giving the user a sense of actually physically manipulating
|
|
them on screen with their hands.
|
|
|
|
.. image:: touchscreen.jpg
   :width: 25%
|
|
|
|
|
|
**Digitizing Tablet + Pen Stylus**
|
|
|
|
These advanced pointing devices tend to
|
|
exist in two forms: as an external sensing "pad" that can be plugged into a
|
|
computer and sits on a desk or in someone's lap, or as a sensor built right
|
|
into a computer display. Both use a "stylus", which is a pen-shaped
|
|
electronic device that is detectable by the surface. Common features
|
|
include the ability to distinguish proximity to the surface ("hovering")
|
|
versus actual contact, pressure sensitivity, angle/tilt detection, multiple
|
|
"ends" such as a tip and an eraser, and one-or-more buttons/switch
|
|
actuators.
|
|
|
|
.. image:: wacom_tablet.png
   :width: 25%
|
|
|
|
|
|
**Joystick/Pointer Stick**
|
|
|
|
Pointer sticks are most often seen in laptop
|
|
computers made by IBM/Lenovo, where they exist as a little red nub located
|
|
between the G, H, and B keys on a standard QWERTY keyboard. They function
|
|
similarly to the analog sticks on a game controller -- The user displaces
|
|
the stick from its center position, and that is interpreted as a relative
|
|
direction to move the on-screen cursor. A greater displacement from center
|
|
is interpreted as increased velocity of movement.
|
|
|
|
.. image:: trackpoint.jpg
   :width: 25%
|
|
|
|
|
|
**Touchpad**
|
|
|
|
A touchpad is a rectangular surface (often found on laptop
|
|
computers) that detects touch and motion of a finger and moves an on-screen
|
|
cursor relative to the motion. Modern touchpads often support multiple
|
|
touches simultaneously, and therefore offer functionality that is quite
|
|
similar to a touchscreen, albeit with different movement semantics because
|
|
of their physical separation from the screen (discussed below).
|
|
|
|
.. image:: touchpad.jpg
   :width: 25%
|
|
|
|
|
|
**VR Controllers**
|
|
|
|
VR controllers (and other similar devices like the
Wiimote from the Nintendo Wii) allow users to point at objects in a
three-dimensional virtual world by moving a real-world controller and
"projecting" the controller's position into the virtual space. They often
also include sensors to detect the yaw, pitch, and roll of the controller.
There are often other inputs in the controller device, like analog sticks
and buttons.
|
|
|
|
.. image:: vrcontroller.jpg
   :width: 25%
|
|
|
|
|
|
**Hand Tracking**
|
|
|
|
Devices like the Apple Vision Pro (introduced during the
time this document was being written) and (to a lesser extent) the Meta
Quest have the ability to track the wearer's hand and directly interpret
gestures and movements as input. As the human hand can assume a staggering
number of orientations and configurations, a finite list of specific shapes
and movements must be identified and labelled to allow for clear
software-user interaction.
|
|
|
|
.. image:: apple_vision_user.jpg
   :width: 25%

.. image:: apple_vision.jpg
   :width: 25%
|
|
|
|
|
|
**Mouse**
|
|
|
|
A pointing device that needs no introduction. Moving a physical
|
|
clam-shaped device across a surface translates to relative movement of a
|
|
cursor on screen.
|
|
|
|
.. image:: mouse.jpg
   :width: 25%
|
|
|
|
|
|
The Buxton Three-State Model
-------------------------------
|
|
|
|
|
|
Bill Buxton, an early pioneer in the field of human-computer interaction,
|
|
came up with a three-state model for pointing devices; a device can be
|
|
"Out of Range", "Tracking", or "Dragging". Not all devices support all three
|
|
states, and some devices have multiple actuators that can have the three-state
|
|
model individually applied.
|
|
|
|
.. mermaid::

   stateDiagram-v2
     direction LR
     state "State 0" as s0
     state "State 1" as s1
     state "State 2" as s2
     s0 --> s0 : Out Of Range
     s1 --> s1 : Tracking
     s2 --> s2 : Dragging
     s0 --> s1 : Stylus On
     s1 --> s0 : Stylus Lift
     s1 --> s2 : Tip Switch Close
     s2 --> s1 : Tip Switch Open
|
|
|
|
|
|
For demonstration, here is the model applied to a few devices:
|
|
|
|
**Computer Mouse**
|
|
|
|
A mouse is never in the "Out of Range" state. Even though it can technically
be lifted off its surface, the mouse does not report this as a separate
condition; instead, it behaves as if it is stationary until it can once
again sense the surface moving underneath.
|
|
|
|
The remaining two states apply to each button individually; when a button is
|
|
not being pressed, the mouse is considered in the "tracking" state with
|
|
respect to that button. When a button is held down, the mouse is "dragging"
|
|
with respect to that button. A "click" is simply considered a zero-length
|
|
drag under this model.
|
|
|
|
In the case of a two-button mouse, this means that the mouse can be in a
|
|
total of 4 different states: tracking, left button dragging, right button
|
|
dragging, and two-button dragging. In practice, very little software
|
|
actually does anything meaningful with two-button dragging.
|
|
|
|
**Touch Screen**
|
|
|
|
Applying the model to a touch screen, one can observe that current hardware
has no way to sense a finger that is "hovering, but not quite making
contact with the screen". This means that the "Tracking" state can be ruled
out, leaving only the "Out of Range" and "Dragging" states. Since many touch
screens can support multiple fingers touching the screen concurrently, and
each finger can be in one of two states, there are potentially 2^N different
"states" that a touchscreen tracking N fingers can be in. Windows assigns
meaning to many two-, three-, and four-finger gestures.
|
|
|
|
**Tablet Digitizer**
|
|
|
|
A tablet digitizer supports all three states: when the stylus is far away
|
|
from the surface, it is considered "out of range"; when it is located
|
|
slightly above the surface, it is "tracking"; and when it is making contact
|
|
with the surface, it is "dragging".
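
The model and the per-device restrictions described above can be sketched as a
small transition table. This is only an illustration: the signal names are
taken from the diagram, while the Python names (``State``, ``step``,
``SUPPORTED_STATES`` and the device keys) are assumptions made for this sketch.

.. code-block:: python

   from enum import Enum


   class State(Enum):
       OUT_OF_RANGE = 0
       TRACKING = 1
       DRAGGING = 2


   # Transitions between states, as drawn in the state diagram.
   TRANSITIONS = {
       (State.OUT_OF_RANGE, "stylus_on"): State.TRACKING,
       (State.TRACKING, "stylus_lift"): State.OUT_OF_RANGE,
       (State.TRACKING, "tip_switch_close"): State.DRAGGING,
       (State.DRAGGING, "tip_switch_open"): State.TRACKING,
   }

   # States each kind of device can report, per the descriptions above.
   SUPPORTED_STATES = {
       "mouse_button": {State.TRACKING, State.DRAGGING},
       "touch_contact": {State.OUT_OF_RANGE, State.DRAGGING},
       "tablet_stylus": {State.OUT_OF_RANGE, State.TRACKING, State.DRAGGING},
   }


   def step(state, signal):
       """Apply one signal; signals that do not apply in this state are ignored."""
       return TRANSITIONS.get((state, signal), state)

Note that a touch contact moves directly between "Out of Range" and
"Dragging", a shortcut that the four diagram transitions in ``TRANSITIONS`` do
not cover.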
|
|
|
|
The W3C standards for pointing devices are based on this three-state model, but
|
|
applied to each individual web element instead of the entire system. This
|
|
makes things like "Out-of-Range" possible for the mouse, since it can be
|
|
out of range of a web element.
|
|
|
|
The W3C uses the terms "over" and "out" to convey the transition between
|
|
"out-of-range" and "tracking" (which the W3C calls "hover"), and the terms
|
|
"down" and "up" convey the transition between "tracking" and "dragging".
|
|
|
|
The standards also address some of the known shortcomings of the model to
improve portability and consistency; these improvements will be discussed more
below.
|
|
|
|
The Windows Pointer API is *supposedly* based around this model,
|
|
but unfortunately real-world testing shows that the model is not followed
|
|
very consistently with respect to the actual signals sent to the application.
|
|
|
|
|
|
Gestures
=====================================
|
|
|
|
|
|
In contrast to the sort-of "anything goes" UI designs of the past,
|
|
modern operating systems like Windows, Mac OS X, iOS, Android, and even
|
|
modern Linux DEs have an "opinionated" idea of how user interaction
|
|
should behave across all apps on the platform (the so-called "look and feel"
|
|
of the operating system).
|
|
|
|
Users expect gestures like swipes, pinches, and taps to act the same way
|
|
across all apps for a given operating system, and they expect things like
|
|
on-screen keyboards or handwriting recognition to pop up in certain contexts.
|
|
Failing to meet those expectations makes an app look less polished, and
|
|
(especially as far as accessibility is concerned) it frustrates the user
|
|
and makes it more difficult for them to interact with the app.
|
|
|
|
Microsoft defines guidelines for various behaviours that Windows applications
|
|
should ideally adhere to in the `Input and Interactions <https://learn.microsoft.com/en-us/windows/apps/design/input/>`__
|
|
section on MSDN. Some of these are summarized quickly below:
|
|
|
|
**Drag and Drop**
|
|
|
|
Drag and drop allows a user to transfer data from one application to
another. The gesture begins when a pointer device moves into the "Dragging"
state over top of a UI element, usually as a result of holding down a mouse
button or pressing a finger on a touchscreen. The user moves the pointer
over top of the receiver of the data, and then ends the gesture by releasing
the mouse button or lifting their finger off the touchscreen. Windows
interprets this transition out of the "Dragging" state as permission to
initiate the data transfer.
|
|
|
|
Firefox has supported Drag and Drop for a very long time, so it will not be
|
|
discussed further.
|
|
|
|
|
|
**Pan and Zoom**
|
|
|
|
When using touchscreens (and multi-touch touchpads), users expect to be able
to cause the viewport to "pan" left/right/up/down by pressing two fingers on
the screen (creating two pointers in "Dragging" state) and moving their
fingers in the direction of movement. When they are done, they can release
both fingers (changing both pointers to "Out of Range").
|
|
|
|
A zoom can be signalled by moving the two fingers apart or together
|
|
in a "pinch" or "reverse pinch" gesture.
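
One common way to derive both quantities from a pair of contacts is sketched
below. This is an illustration, not how Windows or Firefox computes them: the
function names and the ``(x, y)`` tuple representation are assumptions.

.. code-block:: python

   import math


   def pinch_scale(start_a, start_b, now_a, now_b):
       """Ratio of the current finger separation to the starting separation."""
       start = math.dist(start_a, start_b)
       if start == 0:
           return 1.0  # Degenerate gesture: both fingers began at one point.
       return math.dist(now_a, now_b) / start


   def pan_delta(start_a, start_b, now_a, now_b):
       """Movement of the midpoint between the two fingers, as (dx, dy)."""
       start_mid = ((start_a[0] + start_b[0]) / 2, (start_a[1] + start_b[1]) / 2)
       now_mid = ((now_a[0] + now_b[0]) / 2, (now_a[1] + now_b[1]) / 2)
       return (now_mid[0] - start_mid[0], now_mid[1] - start_mid[1])

A scale above 1 corresponds to the fingers moving apart (zoom in), and a scale
below 1 to a pinch (zoom out). Tracking the midpoint lets a single gesture pan
and zoom at the same time.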
|
|
|
|
|
|
**Single Pointer Panning**
|
|
|
|
Applications that are based on a UI model of the user interacting with a
|
|
"page" often allow a single pointer "Dragging" over the viewport to cause
|
|
the viewport to pan, similarly to the two-finger panning discussed in the
|
|
previous section.
|
|
|
|
Note that this gesture is not as universal as two-finger panning is -- as a
|
|
counterexample, graphics programs tend to treat one-finger dragging as
|
|
object manipulation and two-finger dragging as viewport panning.
|
|
|
|
|
|
**Inertia**
|
|
|
|
When a user is done panning, they may lift their finger/pen off the screen
while the viewport is still in motion. Users expect that the page will
continue to move for a little while, as if the user had "tossed" the page
when they let go. Effectively, the page behaves as though it has "momentum"
that needs to be gradually lost before the page comes to a full stop.
|
|
|
|
Modern operating systems provide this behavior via their various native
widget toolkits, and the curves that objects follow as they slow to a stop
differ across OSes. In that way, they can be considered part of the
unique "look and feel" of the OS. Users expect the scrolling of pages in
their web browser to behave this way, and so when Firefox fails to provide
this behavior it can be jarring.
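
The "momentum" idea can be illustrated with a simple per-frame decay. This is
a sketch under stated assumptions only: the exponential friction model and the
``friction`` and ``min_speed`` values are illustrative, and are not the curve
that Windows or any other OS actually uses.

.. code-block:: python

   def inertia_offsets(velocity, friction=0.95, min_speed=0.5):
       """Return per-frame scroll offsets applied after the finger lifts."""
       offsets = []
       # Keep scrolling, a little less each frame, until the page is
       # moving too slowly to be worth animating.
       while abs(velocity) >= min_speed:
           offsets.append(velocity)
           velocity *= friction
       return offsets

For example, ``inertia_offsets(4.0, friction=0.5, min_speed=1.0)`` gives
``[4.0, 2.0, 1.0]``. An implementation with no inertia is the same as
returning an empty list, which is the "dead stop" described earlier.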
|
|
|
|
|
|
**Overscroll and Elastic Bounce**
|
|
|
|
When a user is panning the page and reaches the outer edges, Microsoft
|
|
recommends that the app should begin an "elastic bounce" animation, where
|
|
the page will allow the user to scroll past the end ("overscroll"),
|
|
show empty space underneath the page, and then sort of "snap back" like a
|
|
rubber band that's been stretched and then released. You can see a
|
|
demonstration in `this article <https://www.windowslatest.com/2020/05/21/microsoft-is-adding-elastic-scrolling-to-chrome-on-windows-10/>`__,
|
|
which discusses Microsoft adding it to Chromium.
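
The two halves of the effect, resistance while stretching and the snap back
afterwards, can be sketched as follows. The formulas and the ``max_stretch``,
``stiffness``, and ``epsilon`` parameters are assumptions chosen for
illustration; Microsoft does not document the curve Windows uses.

.. code-block:: python

   def displayed_overscroll(raw_overscroll, max_stretch=100.0):
       """Map finger travel past the edge (>= 0) to on-screen displacement."""
       # Resistance grows with distance: the result approaches max_stretch
       # but never reaches it, like a rubber band getting harder to pull.
       return max_stretch * raw_overscroll / (raw_overscroll + max_stretch)


   def snap_back(offset, stiffness=0.5, epsilon=0.5):
       """Return the per-frame offsets as the page springs back to the edge."""
       frames = []
       while offset > epsilon:
           offset *= 1 - stiffness
           frames.append(offset)
       frames.append(0.0)  # Finish exactly at rest.
       return frames

With the defaults, dragging 100 pixels past the edge displaces the page by
only 50 pixels, and releasing an 8-pixel stretch produces the frames
``[4.0, 2.0, 1.0, 0.5, 0.0]``.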
|
|
|
|
|
|
History of Web Standards and Windows APIs
===========================================
|
|
|
|
The World-Wide Web Consortium (W3C) and the Web Hypertext Application
|
|
Technology Working Group (WHATWG) manage the standards that detail the
|
|
interface between a user agent (like Firefox) and applications designed to run
|
|
on the Web Platform. The user agent, in turn, must rely on the operating system
|
|
(Windows, in this case) to provide the necessary APIs to implement the
|
|
standards required by the Web Platform.
|
|
|
|
As a result of that relationship, a Web Standard is unlikely to be created
|
|
until all widely-used operating systems provide the required APIs. That allows
|
|
us to build a linear timeline with a predictable pattern: a new type of device
|
|
becomes popular, the APIs to support it are introduced into operating systems,
|
|
and eventually a cross-platform standard is introduced into the Web Platform.
|
|
|
|
The following sections detail the history of input devices supported by
|
|
Windows and the Web Platform:
|
|
|
|
|
|
**1985 - Computer Mouse Support (Windows 1.0)**
|
|
|
|
The first version of Windows (1985) supported a computer mouse. Support
for other input devices is not well-documented, but probably nonexistent.
|
|
|
|
|
|
**1991 - Third-Party De-facto Pen Support (Wintab)**
|
|
|
|
In the late 80s and early 90s, any tablet pen hardware vendor that wanted
|
|
to support Windows would need to write a device driver and design a
|
|
proprietary user-mode API to expose the device to user applications. In
|
|
turn, application developers would have to write and maintain code to
|
|
support the APIs of every relevant device vendor.
|
|
|
|
In 1991, a company named LCS/Telegraphics released an API for Windows
called "Wintab", which was designed in collaboration with hardware and
software vendors to define a general API that could be targeted by
device drivers and applications.
|
|
|
|
It would take Microsoft more than a decade to include first-party support
|
|
for tablet pens in Windows, which allowed Wintab to become the de-facto
|
|
standard for pen support on Windows. The Wintab API continues to be
|
|
supported by virtually all artist tablets to this day. Notable companies
|
|
include Wacom, Huion, XP-Pen, etc.
|
|
|
|
|
|
**1992 - Early Windows Pen Support (Windows for Pen Computing)**
|
|
|
|
The earliest Windows operating system to support non-mouse pointing devices
|
|
was Windows 3.1 with the "Windows for Pen Computing" add-on (1992).
|
|
(`For the curious <https://socket3.wordpress.com/2019/07/31/windows-for-pen-computing-1-0/>`__,
|
|
and I'm certain `this book <https://www.amazon.com/Microsoft-Windows-Pen-Computing-Programmers/dp/1556154690>`__
|
|
is a must-read!). Pen support was mostly implemented by translating actions
|
|
into the existing ``WM_MOUSExxx`` messages, but also "upgraded" any
|
|
application's ``EDIT`` controls into ``HEDIT`` controls, which looked the
|
|
same but were capable of being handwritten into using a pen. This was not
|
|
very user-friendly, as the controls stayed the same size and the UI was not
|
|
adapted to the input method. This add-on never achieved much popularity.
|
|
|
|
It is not documented whether Netscape Navigator (the ancestor of Mozilla
|
|
Firefox) supported this add-on or not, but there is no trace of it in modern
|
|
Firefox code.
|
|
|
|
|
|
**1995 - Introduction of JavaScript and Mouse Events (De-facto Web Standard)**
|
|
|
|
The introduction of JavaScript in 1995 by Netscape Communications added a
|
|
programmable, event-driven scripting environment to the Web Platform.
|
|
Browser vendors quickly added the ability for scripts to listen for and
|
|
react to mouse events. These are the well-known events like ``mouseover``,
|
|
``mouseenter``, ``mousedown``, etc. that are ubiquitous on the web, and are
|
|
known by basically anyone who has ever written front-end JavaScript.
|
|
|
|
This ubiquity created a de-facto standard for mouse input, which would
eventually be formally standardized by the W3C in the DOM Level 2 Events
specification in 2000.
|
|
|
|
The Mouse Event APIs assume that the computer has one single pointing device
|
|
which is always present, has a single cursor capable of "hovering" over an
|
|
element, and has between one and three buttons.
|
|
|
|
When support for other pointing devices like touchscreen and pen first
became available in operating systems, it was exposed to the web by
interpreting user actions into equivalent mouse events. Unfortunately, this
is unable to handle multiple concurrent pointers (like one would get from
multitouch screens) or report the kind of rich information a pen digitizer
can provide, like tilt angle, pressure, etc. This eventually led the W3C
to develop the new "Touch Events" standard to expose touch functionality,
and eventually the "Pointer Events" standard to expose more of the rich
information provided by pens.
|
|
|
|
|
|
**2005 - Mainstream Pen Support (Windows XP Tablet PC Edition)**
|
|
|
|
It was the release of Windows XP Tablet PC Edition (2005) that allowed
|
|
Windows applications to directly support tablet pens by using the new COM
|
|
"`Windows Tablet PC <https://learn.microsoft.com/en-us/windows/win32/tablet/tablet-pc-development-guide>`__"
|
|
APIs, most of which are provided through the main `InkCollector <https://learn.microsoft.com/en-us/windows/win32/tablet/inkcollector-class>`__
|
|
class. The ``InkCollector`` functionality would eventually be "mainlined"
|
|
into Windows XP Professional Service Pack 2, and continues to exist in
|
|
modern Windows releases.
|
|
|
|
The Tablet PC APIs consist of a large group of COM objects that work
|
|
together to facilitate enumerating attached pens, detecting pen movement and
|
|
pen strokes, and analyzing them to provide:
|
|
|
|
1. **Cursor Movement**: translates the movements of the pen into the
   standard mouse events that applications expect from mouse cursor
   movement, namely ``WM_NCHITTEST``, ``WM_SETCURSOR`` and
   ``WM_MOUSEMOVE``.

2. **Gesture Recognition**: detects common user actions, like "tap",
   "double-tap", "press-and-hold", and "drag". The ``InkCollector`` delivers
   these events via COM `SystemGesture <https://learn.microsoft.com/en-us/windows/win32/tablet/inkcollector-systemgesture>`__
   events using the `InkSystemGesture <https://learn.microsoft.com/en-us/windows/win32/api/msinkaut/ne-msinkaut-inksystemgesture>`__
   enumeration. It will also translate them into common Win32 messages; for
   example, a "drag" gesture would be translated into a ``WM_LBUTTONDOWN``
   message, several ``WM_MOUSEMOVE`` messages, and finally a
   ``WM_LBUTTONUP`` message.

   An application that is using ``InkCollector`` will receive both types of
   messages: traditional mouse input through the Win32 message queue, and
   "Tablet PC API" events through COM callbacks. It is up to the
   application to determine which events matter to it in a given context,
   as the two types of events are not guaranteed by Microsoft to correspond
   in any predictable way.

3. **Shape and Text Recognition**: allows the app to
   recognize letters, numbers, punctuation, and other `common shapes <https://learn.microsoft.com/en-us/windows/win32/api/msinkaut/ne-msinkaut-inkapplicationgesture>`__
   the user might make using their pen. Supported shapes include circles,
   squares, arrows, and motions like "scratch out" to correct a misspelled
   word. Custom recognizers exist that allow recognition of other symbols,
   like music notes or mathematical notation.

4. **Flick Recognition**: allows the user to invoke actions via quick,
   linear motions that are recognized by Windows and sent to the app as
   ``WM_TABLET_FLICK`` messages. The app can choose to handle the window
   message or pass it on to the default window procedure, which will
   translate it to scrolling messages or mouse messages.

   For example, a quick upward 'flick' corresponds to "Page up", and
   a quick sideways flick in a web browser would be "back". Flicks were
   never widely used by Windows apps, and they may have been removed in
   more recent versions of Windows, as the existing Control Panel menus
   for configuring them seem to no longer exist as of Windows 10 22H2.
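
The drag translation described in item 2 can be modeled as a simple mapping
from a pen stroke to a message sequence. This is a conceptual sketch only: the
function name and the list-of-points input are assumptions, and the real
translation happens inside Windows rather than in application code.

.. code-block:: python

   def drag_to_mouse_messages(points):
       """Translate a pen drag through `points` into Win32 mouse message names."""
       if not points:
           return []
       # Button goes down at the first point of the stroke...
       messages = [("WM_LBUTTONDOWN", points[0])]
       # ...each later point becomes a move...
       messages += [("WM_MOUSEMOVE", p) for p in points[1:]]
       # ...and the button comes up where the stroke ends.
       messages.append(("WM_LBUTTONUP", points[-1]))
       return messages

An application using ``InkCollector`` would see a sequence like this on its
message queue *in addition to* the COM gesture event for the same stroke, which
is why it must decide which of the two to act on.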
|
|
|
|
|
|
Firefox does not appear to have ever used these APIs to allow tablet pen
|
|
input, with the exception of `one piece of code <https://searchfox.org/mozilla-central/rev/e6cb503ac22402421186e7488d4250cc1c5fecab/widget/windows/InkCollector.cpp>`__
|
|
to detect when the pen leaves the Firefox window to solve
|
|
`Bug 1016232 <https://bugzilla.mozilla.org/show_bug.cgi?id=1016232>`__.
|
|
|
|
|
|
**2009 - Touch Support: WM_GESTURE (Windows 7)**
|
|
|
|
While attempts were made with the release of Windows Vista (2007) to support
|
|
touchscreens through the existing tablet APIs, it was ultimately the release
|
|
of Windows 7 (2009) that brought first-class support for Touchscreen devices
|
|
to Windows with new Win32 APIs and two main window messages: ``WM_TOUCH``
|
|
and ``WM_GESTURE``.
|
|
|
|
These two messages are mutually-exclusive, and all applications are
|
|
initially set to receive only ``WM_GESTURE`` messages. Under this
|
|
configuration, Windows will attempt to recognize specific movements on a
|
|
touch digitizer and post "gesture" messages to the application's message
|
|
queue. These gestures are similar to (but, somewhat-confusingly, not
|
|
identical to) the gestures provided by the "Windows Tablet PC" APIs
|
|
mentioned above. The main gesture messages are: zoom, pan, rotate,
|
|
two-finger-tap, and press-and-tap (one finger presses, another finger
|
|
quickly taps the screen).
|
|
|
|
In contrast to the behavior of the ``InkCollector`` APIs, which will send
both gesture events and translated mouse messages, the ``WM_GESTURE``
message is truly "upstream" of the translated mouse messages; the translated
mouse messages will only be generated if the application forwards the
``WM_GESTURE`` message to the default window procedure. This makes
programming against this API simpler than the ``InkCollector`` API, as
there is no need to statefully "remember" that an action has already been
serviced by one codepath and needs to be ignored by the other.
|
|
|
|
Firefox currently supports the ``WM_GESTURE`` message when Asynchronous Pan
and Zoom (APZ) is not enabled (although we do not handle inertia in this
case, so the page comes to a dead stop immediately when the user stops
scrolling).
|
|
|
|
|
|
**2009 - Touch Support: WM_TOUCH (Windows 7)**

Windows 7 also introduced the ``WM_TOUCH`` message. An application that needs
full control over touchscreen events can use
`RegisterTouchWindow <https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-registertouchwindow>`__
to change any of its windows to receive ``WM_TOUCH`` messages instead of the
more high-level ``WM_GESTURE`` messages. These messages explicitly notify
the application about every finger that contacts or breaks contact with the
digitizer (as well as each finger's movement over time). This provides
absolute control over touch interpretation, but also means that the burden
of handling touch behavior falls completely on the application.

To help ease this burden, Microsoft provides two COM APIs to interpret
touch messages, ``IManipulationProcessor`` and ``IInertiaProcessor``.

``IManipulationProcessor`` can be considered a superset of the functionality
available through normal gestures. The application feeds ``WM_TOUCH`` data
into it (along with other state, such as pivot points and timestamps), and
it allows for manipulations like: two-finger rotation around a pivot,
single-finger rotation around a pivot, simultaneous rotation and translation
(for example, 'dragging' a single corner of a square).
`These MSDN diagrams <https://learn.microsoft.com/en-us/windows/win32/wintouch/advanced-manipulations-overview>`__
give a good overview of the kinds of advanced manipulations an app might
support.

``IInertiaProcessor`` works with ``IManipulationProcessor`` to add inertia
to objects in a standard way across the operating system. It is likely that
later APIs that provide this (like DirectManipulation) are using these COM
objects under the hood to accomplish their inertia handling.

Firefox currently handles the ``WM_TOUCH`` message when Asynchronous Pan and
Zoom (APZ) is enabled, but we use neither the ``IInertiaProcessor``
nor the ``IManipulationProcessor``.


**2012 - Unified Pointer API (Windows 8)**

Windows 8 (2012) was Microsoft's initial attempt to make a touch-first,
mobile-first operating system that (ideally) would make it easy for app
developers to treat touch, pen, and mouse as first-class input devices.

By this point, the Windows Tablet APIs would allow tablet pens to draw
text and shapes like squares, triangles, and music notes, and those shapes
would be recognizable by the Windows Ink subsystem.

At the same time, Windows Touch allowed touchscreens to have advanced
manipulation, like rotate + translate, or simultaneous pan and zoom, and it
allowed objects manipulated by touch to have momentum and angular velocity.

The shortcomings of having separate input stacks for these various devices
start to become apparent after a while: Why shouldn't a touchscreen be
able to recognize a circle or a triangle? Why shouldn't a pen be able to
have complex rotation and zoom functionality? How do we handle these newer
laptop touchpads that are starting to handle multi-touch gestures like a
touchscreen, but still cause relative cursor movement like a mouse? Why does
my program have to have 3 separate codepaths for different pointing devices
that are all very similar?

The Windows Pointer Device Input Stack introduces new APIs and window
messages that generalize the various types of pointing devices under a
single API while still falling back to the legacy touch and tablet input
stacks in the event that the API is unused. (Note that the touch and tablet
stacks themselves fall back to the traditional mouse input stack when they
are unused.)

Microsoft based its pointer APIs on the Buxton Three-State Model
(discussed earlier), where changes between "Out-of-Range" and "Tracking" are
signalled by ``WM_POINTERENTER`` and ``WM_POINTERLEAVE`` messages, and
changes between "Tracking" and "Dragging" are signalled by
``WM_POINTERDOWN`` and ``WM_POINTERUP``. Movement is indicated via
``WM_POINTERUPDATE`` messages.

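As a rough illustration of the mapping just described, the three states and
the messages that move between them can be written as a small transition
table. This is only a sketch in Python: the ``PointerState``, ``next_state``,
and ``run`` names are made up for this example, and real message sequences
have more cases than are modeled here.

```python
from enum import Enum


class PointerState(Enum):
    OUT_OF_RANGE = "Out-of-Range"
    TRACKING = "Tracking"
    DRAGGING = "Dragging"


# (current state, message) -> next state, following the description above.
TRANSITIONS = {
    (PointerState.OUT_OF_RANGE, "WM_POINTERENTER"): PointerState.TRACKING,
    (PointerState.TRACKING, "WM_POINTERLEAVE"): PointerState.OUT_OF_RANGE,
    (PointerState.TRACKING, "WM_POINTERDOWN"): PointerState.DRAGGING,
    (PointerState.DRAGGING, "WM_POINTERUP"): PointerState.TRACKING,
    # Movement never changes the state.
    (PointerState.TRACKING, "WM_POINTERUPDATE"): PointerState.TRACKING,
    (PointerState.DRAGGING, "WM_POINTERUPDATE"): PointerState.DRAGGING,
}


def next_state(state, message):
    """Return the state reached from `state` after `message`.

    Messages that are not listed for the current state leave it unchanged
    (a simplification made for this sketch).
    """
    return TRANSITIONS.get((state, message), state)


def run(messages, state=PointerState.OUT_OF_RANGE):
    """Feed a sequence of message names through the state machine."""
    for message in messages:
        state = next_state(state, message)
    return state
```

For example, a pen that comes into range, presses, and moves ends up in the
"Dragging" state, and lifting and withdrawing it returns to "Out-of-Range".
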
If these messages are unhandled (the message is forwarded to
``DefWindowProc``), the Win32 subsystem will translate them
into touch or gesture messages. If those are also unhandled, they will be
further translated into mouse and system messages.

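The fallback order can be modeled as a short chain that is walked until the
application handles something. The sketch below is a deliberately simplified
model written for this document, not Win32 code: ``FALLBACK_CHAIN`` and
``deliver`` are hypothetical names, and touch and gesture messages are lumped
into a single generation.

```python
# Order in which unhandled input is re-expressed, newest API first.
FALLBACK_CHAIN = ["pointer", "touch/gesture", "mouse"]


def deliver(event_kind, handled_kinds):
    """Return the API generation that finally services the input.

    `handled_kinds` is the set of generations the application handles itself.
    Anything else is treated as forwarded to the default window procedure,
    which re-expresses it as the next older kind. Returns None if nothing in
    the chain is handled.
    """
    start = FALLBACK_CHAIN.index(event_kind)
    for kind in FALLBACK_CHAIN[start:]:
        if kind in handled_kinds:
            return kind
    return None
```

An application that only understands mouse messages still receives pointer
input, just two translations later: ``deliver("pointer", {"mouse"})`` returns
``"mouse"``.
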
While the Pointer API is not without some unfortunate pitfalls (which will
be discussed later), it still provides several advantages over the
previously available APIs: it can allow a mostly-unified codepath for
handling pointing devices, it circumvents many of the often-complex
interactions between the previous APIs, and it provides the ability to
simulate pointing devices to help facilitate end-to-end automated testing.

Firefox currently uses the Pointer APIs to handle tablet stylus input only,
while other input methods still use the historical mouse and touch input
APIs above.


**2013 - DirectManipulation (Windows 8.1)**

DirectManipulation is a DirectX-based API that was added in
Windows 8.1 (2013). This API allows an app to create a series of
"viewports" inside a window and have scrollable content within each of these
viewports. The manipulation engine will then take care of automatically
reading Pointer API messages from the window's event queue and generating
pan and zoom events to be consumed by the app.

In the case that the app is also using DirectComposition to draw its window,
DirectManipulation can pipe the events directly into it, causing the app
to essentially get asynchronous pan and zoom with proper handling of inertia
and overscroll with very little coding.

DirectManipulation is only used in Firefox to handle data coming from
Precision Touchpads, as Microsoft provides no other convenient API for
obtaining data from such devices. Firefox creates fake content inside of
a fake viewport to capture the incoming events from the touchpad and
translates them into the standard Asynchronous Pan and Zoom (APZ) events
that the rest of the input pipeline uses.


**2013 - Touch Events (Web Standard)**

"`Touch Events <https://www.w3.org/TR/touch-events/>`__" became a W3C
recommendation in October 2013.

At this point, Microsoft's first operating system to include touch support
(Windows 7) was the most popular desktop operating system, and the ubiquity
of smart phones brought a huge uptick in users with touchscreen inputs. All
major browsers included some API that allowed reading touch input,
prompting the W3C to formalize a new standard to ensure interoperability.

With the Touch Events API, multiple touch interactions may be reported
simultaneously, each with its own separate identifier for tracking and
its own coordinates within the screen, viewport, and client area. A
touch is reported by: a ``touchstart`` event with a unique ID for each
contact, zero-or-more ``touchmove`` events with that ID, and finally a
``touchend`` event to signal the end of that specific contact.

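To make the lifecycle concrete, here is a minimal sketch of how a consumer
might track contacts by identifier. It is written in Python purely for
illustration; ``TouchTracker`` and its ``handle`` method are hypothetical and
do not correspond to any browser API.

```python
class TouchTracker:
    """Tracks active contacts from a stream of touch events."""

    def __init__(self):
        # Maps each contact's identifier to its latest (x, y) position.
        self.active = {}

    def handle(self, event_type, identifier, x, y):
        if event_type == "touchstart":
            # A new contact begins and gets its own entry.
            self.active[identifier] = (x, y)
        elif event_type == "touchmove":
            # Only contacts that were started can move.
            if identifier in self.active:
                self.active[identifier] = (x, y)
        elif event_type == "touchend":
            # The contact is over; forget it.
            self.active.pop(identifier, None)
        else:
            raise ValueError(f"unexpected event type: {event_type}")
```

Note that a contact stays in ``active`` until its ``touchend`` arrives, so a
consumer written like this depends on the browser always delivering that
final event.
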
The API also has some amount of support for pen styluses, but it lacks
important features necessary to truly support them: hovering, pressure,
tilt, or multiple cursors like an eraser. Ultimately, its functionality
has been superseded by the newer "Pointer Events" API, discussed below.


**2016 - Precision Touchpads (Windows 10)**

Early touchpads emulated a computer mouse by directly using the same IBM
PS/2 interface that most computer mice used and translating relative
movement of the user's finger into equivalent movements of a mouse on a
surface.

As touchpad technology advanced and more powerful interface standards like
USB began to take over the consumer market, touchpad vendors started adding
extra features to their hardware, like tap-to-click, tap-and-drag, and
tap-and-hold (to simulate a right click). These behaviors were implemented
by touchpad vendors in hardware drivers and/or user mode "hooks" that
injected equivalent Win32 messages into the appropriate target.

As expected, each touchpad vendor's driver had its own subtly-different
behavior from others, its own bugs, and its own negative interactions with
other software.

During the later years of Windows 8, Microsoft and touchpad company
Synaptics co-developed the "Precision Touchpad" standard, which defines an
interface for touchpad hardware to report its physical measurements,
precision, and sensor configuration to Windows and allows it to deliver raw
touch data. Windows then interprets the data and generates gestures and
window messages in a standard way, removing the burden of implementing these
behaviors from the touchpad vendor and providing the OS with rich
information about the user's movements.

It wasn't until the 2016 release of Windows 10 build 14946 that Microsoft
would support all the standard gestures through the new standard. Although
adoption by vendors has been a bit slow, the fact that
`it is a requirement for Windows 11 <https://pocketnow.com/all-windows-11-pcs-will-be-required-to-have-a-precision-touchpad-and-webcam/>`__
means that vendor support for this standard is imminent.

Unfortunately, there's a piece of bad news: Microsoft did not
implement the above "Unified Pointer API" for use with touchpads, as the
developers of Blender discovered when `they moved to the Pointer API <https://archive.blender.org/developer/D7660>`__.
Instead, Microsoft expects developers to either use DirectManipulation to
automatically get pan/zoom enabled for their app, or the RawInput API to
directly read touchpad data.


**2019 - Pointer Events (Web Standard)**

"`Pointer Events <https://www.w3.org/TR/pointerevents/>`__" became a level 2
W3C recommendation in April 2019. The W3C considered `the work done by Microsoft <https://www.w3.org/Submission/2012/SUBM-pointer-events-20120907/>`__
on its own Pointer API as part of the design, and in many ways the W3C
standard resembles an improved, better specified, more consistent, and
easier-to-use version of the APIs provided by the Win32 subsystem.

The Pointer Events API generalizes devices like touchscreens, mice, tablet
pens, VR controllers, etc. into a "thing that points". A pointer has
(optional) properties: a width and height (big for a finger, 1px for a
mouse), an amount of pressure, a tilt angle relative to the surface, some
buttons, etc. This helps applications maximize code reuse for handling
pointer input by having a common codebase written against these generalized
traits. If needed, the application may also have smaller, specialized
sections of code for each concrete pointer type.

Certain types of pointers (like pens and touchscreens) have a behavior where
they are always "captured" by the first object that they interact with. For
example, if a user puts their finger on an empty part of a web page and
starts to scroll, their finger is now "captured" by the web page itself.
"Captured" means that even if their finger moves over an element in
the web page, that element will not receive events from the finger -- the
page itself will until the entire interaction stops.

The events themselves very closely follow the Buxton Three-State Model
(discussed earlier), where ``pointerover``/``pointerout`` events indicate
transitions from "Out of Range" to "Tracking" and vice versa, and
``pointerdown``/``pointerup`` events transition between "Tracking" and
"Dragging". ``pointermove`` updates the position of the pointer, and a
special ``pointercancel`` event is sent to inform the page that the
browser is "cancelling" a ``pointerdown`` event because it has decided to
consume it for a gesture or because the operating system cancelled the
pointer for its own reasons.


CSS "interaction" Media Queries
==========================================

(Note that this section is **not** about the `pointer-events <https://developer.mozilla.org/en-US/docs/Web/CSS/pointer-events>`__
CSS property, which defines the circumstances where an element can be the target
of pointer events.)

The W3C defines the interaction-related media queries in the
`Media Queries Level 4 - Interaction Media Features <https://www.w3.org/TR/mediaqueries-4/#mf-interaction>`__
document.

To summarize, the main interaction-related CSS Media Queries that Firefox must
support are ``pointer``, ``any-pointer``, ``hover`` and ``any-hover``.


``pointer``
   Allows the webpage to query the existence of a pointing device on
   the machine, and (if available) the assumed "pointing accuracy" of the
   "primary" pointing device. The device considered "primary" on a machine with
   multiple input devices is a policy decision that must be made by the web
   browser; Windows simply provides the APIs to query information about
   attached devices.

   The browser is expected to report one of three values for this media query:

   ``none``
      There is no pointing device attached to the computer.

   ``coarse``
      The primary pointing device is capable of approximately
      pointing at a relatively large target (like a finger on a
      touchscreen).

   ``fine``
      The primary pointing device is capable of near-pixel-level
      accuracy (like a computer mouse or a tablet pen).


``any-pointer``
   Similar to ``pointer``, but represents the union of
   capabilities of all pointers attached to the system, such that the meanings
   become:

   ``none``
      There is no pointing device attached to the computer.

   ``coarse``
      There is at least one "coarse" pointer attached.

   ``fine``
      There is at least one "fine" pointer attached.


``hover``
   Allows the webpage to query whether the primary pointer is
   capable of "hovering" over elements on the page. Computer mice,
   touchpad cursors, and higher-end pen tablets all support this, whereas
   current touchscreens are "touch" or "no touch", and they cannot detect a
   finger hovering over the screen.

   ``hover``
      The primary pointer is capable of reporting hovering.

   ``none``
      The primary pointer is not capable of reporting hovering.

``any-hover``
   Indicates whether any pointer attached to the system has the
   ``hover`` capability.


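The relationship between the four queries can be sketched with a small model.
The code below is an illustration written for this document, not how any
browser computes these values; ``PointingDevice`` and the four functions are
hypothetical names. The choice of the primary device is left to the caller,
because it is a policy decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointingDevice:
    name: str
    accuracy: str    # "coarse" or "fine"
    can_hover: bool


def pointer(primary):
    """Value of `pointer` for the chosen primary device (or None)."""
    return "none" if primary is None else primary.accuracy


def hover(primary):
    """Value of `hover` for the chosen primary device (or None)."""
    return "hover" if primary is not None and primary.can_hover else "none"


def any_pointer(devices):
    """Set of values that `any-pointer` matches.

    Because it is a union over all devices, both "coarse" and "fine"
    can match at the same time.
    """
    if not devices:
        return {"none"}
    return {device.accuracy for device in devices}


def any_hover(devices):
    """Value of `any-hover`: "hover" if at least one device can hover."""
    return "hover" if any(device.can_hover for device in devices) else "none"
```

With a touchscreen and a hover-capable pen attached, ``any_pointer`` yields
both ``coarse`` and ``fine`` regardless of which device is picked as primary,
while ``pointer`` and ``hover`` depend entirely on that pick.
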
Selection of the Primary Pointing Device
--------------------------------------------

To illustrate the complexity of this topic, consider the Microsoft Surface Pro.

The Surface Pro has an advanced screen that is capable of receiving touch
input, but it can also behave like a pen digitizer and receive input from a
stylus with advanced pen capabilities, like hover sensing, pressure
sensitivity, multiple buttons, and even multiple "tips" (a pen and eraser end).

In this case, what should Firefox consider the primary pointing device?

Perhaps the user intends to use their Surface Pro like a touchscreen tablet,
at which point Firefox should report ``pointer: coarse`` and ``hover: none``
capabilities.

But what if, instead, the user wants to sketch art or take notes using a pen on
their Surface Pro? In this case, Firefox should be reporting ``pointer: fine``
and ``hover: hover``.

Imagine that the user then attaches the "keyboard + touchpad" cover attachment
to their Surface Pro; naturally, we will consider that the user's intent is for
the touchpad to become the primary pointing device, and so it is fairly clear
that we should return ``pointer: fine`` and ``hover: hover`` in this state.

However, what if the user tucks the keyboard/touchpad attachment behind the
tablet and begins exclusively operating the device with their finger?

This example shows that complex, multi-input machines can resist classification
and blur the lines between labels like "touch device", "laptop", "drawing
tablet", etc. It also illustrates that identifying the "primary" pointing
device using only machine configuration may yield unintuitive and suboptimal
results.

While we can almost certainly improve our hardware detection heuristics to
better answer this question (and we should, at the very least), perhaps it
makes more sense for Firefox to incorporate user intentions into the decision.
Intentions could be communicated directly by the user through some sort of
setting or indirectly through the user's actions.

For example, if the user intends to draw on the screen with a pen, perhaps
Firefox provides something like a "drawing mode" that the user can toggle to
change the primary pointing device to the pen. Or perhaps it's better for
Firefox to interpret the mere fact of receiving pen input as evidence of the
user's intent and switch the reported primary pointing device automatically.

If we wanted to switch automatically, there are predictable traps and pitfalls
we need to think about: we need to ensure that we don't create frustrating user
experiences where web pages may "pop" beneath the user suddenly, and
we should likely incorporate some kind of "settling time" so we don't
oscillate between devices.

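One possible shape for such a "settling time" is sketched below. This is only
a thought experiment, not a proposal for the actual implementation: the
``PrimaryPointerSelector`` class, its method names, and the two-second default
are all invented for this example.

```python
class PrimaryPointerSelector:
    """Switches the reported primary pointer only after sustained use."""

    def __init__(self, initial, settle_seconds=2.0):
        self.primary = initial
        self.settle_seconds = settle_seconds  # arbitrary value for the sketch
        self._candidate = None
        self._candidate_since = None

    def on_input(self, device, now):
        """Record input from `device` at time `now` (in seconds).

        Returns the device that should currently be reported as primary.
        """
        if device == self.primary:
            # The current primary is still in use: drop any pending switch.
            self._candidate = None
            self._candidate_since = None
        elif device != self._candidate:
            # A new contender appeared: start its settling timer.
            self._candidate = device
            self._candidate_since = now
        elif now - self._candidate_since >= self.settle_seconds:
            # The contender has been used consistently for long enough.
            self.primary = device
            self._candidate = None
            self._candidate_since = None
        return self.primary
```

With this design, a single stray pen event does not flip the primary device,
and returning to the current primary cancels a pending switch. The cost is
that media queries lag behind the user's real intent by the settling time.
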
It's worth noting that Chromium doesn't seem to incorporate anything like
what's being suggested here, so if this is well-designed it may be an
opportunity for Firefox to try something novel.



================================================================================
State of the Browser
================================================================================

Pan and Zoom, Inertia, Overscroll, and Elastic Bounce
=========================================================

As can be seen in the videos below, Firefox's support for inertia, overscroll,
and elastic bounce works well on all platforms when a stylus pen is used
as the input device, and it also works just fine with the touchscreen on the
Dell XPS 15. However, it completely fails when the touchscreen is used on
the Microsoft Surface Pro. While more investigation is needed to completely
understand these issues, the fact that the correctly-behaving digitizing pens
use the Pointer API and the misbehaving input devices do not may be related.

- `Video 1 <https://drive.google.com/file/d/1Z1QRSf2RluNhJwkKCzPb6-14vRtkqK8s/view?usp=sharing>`__
  showcasing overscroll and bounce not working on Surface Pro with touch, but
  other devices/inputs are working

- `Video 2 <https://drive.google.com/file/d/1bOgpVGBeZtwelvPJzYdA6uFRpubGtu4W/view?usp=sharing>`__
  showing that everything works just fine with an external Wacom digitizer


Pointer Media Queries
=========================================================

**"any-pointer" Queries**

Unlike the ``pointer`` media queries, which rely on the browser to make a policy
decision about what should be considered the "primary" pointer in a given
system configuration, the ``any-pointer`` queries are much more objective and
binary: the computer either has a type of device attached to it, or it
doesn't.

**any-pointer: coarse**

Firefox reports that there are "coarse" pointing devices present if either of
these two points is true:

1. ``GetSystemMetrics(SM_DIGITIZER)`` reports that a device that supports
   touch or pen is present.

2. Based on heuristics, Firefox concludes that it is running on a computer it
   considers a "tablet".

Point #1 is incorrect, as a pen is not a "coarse" pointing device. Note that
this is a recent regression in `Bug 1811303 <https://bugzilla.mozilla.org/show_bug.cgi?id=1811303>`__
that was uplifted to Firefox 112, so this actually regressed as this document
was being written! This is responsible for the incorrect "Windows 10 Desktop +
Wacom USB Tablet" issue in the table below.

Point #2 is a clear case of the `XY Problem <https://en.wikipedia.org/wiki/XY_problem>`__,
where Firefox is trying to determine if a coarse pointing device is present
by determining whether it is running on a tablet, when instead it should be
directly testing for coarse pointing devices (since, of course, those can exist
on machines that wouldn't normally be considered a "tablet"). This is
responsible for the incorrect "Windows 10 Dell XPS 15 (Touch Disabled) + Wacom
USB Tablet" issue in the table below.

**any-pointer: fine**

Firefox reports that there are "fine" pointing devices present if and only if
it detects a mouse. This is clearly already wrong. Firefox determines that the
computer has a mouse using the following algorithm:

1. If ``GetSystemMetrics(SM_MOUSEPRESENT)`` returns false, report no mouse.

2. If Firefox does not consider the current computer to be a tablet, report a
   mouse if there is at least one "mouse" device driver running on the
   computer.

3. If Firefox considers the current computer to be a tablet or a touch system,
   only report a mouse if there are at least two "mouse" device drivers
   running. This exists because some tablet pens and touch digitizers report
   themselves as computer mice.

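The three steps above can be restated as a small pure function, which makes
it easy to try different machine configurations. This is a simplified model
written for this document, not Firefox's actual code: ``has_mouse`` and its
parameters are hypothetical names, and it assumes that step 3 takes
precedence whenever the machine looks like a tablet or a touch system.

```python
def has_mouse(mouse_present, looks_like_tablet_or_touch, mouse_driver_count):
    """Model of the mouse-detection steps described above.

    mouse_present: stands in for GetSystemMetrics(SM_MOUSEPRESENT).
    looks_like_tablet_or_touch: Firefox's own "tablet or touch system" guess.
    mouse_driver_count: number of "mouse" device drivers running.
    """
    # Step 1: Windows says there is no mouse at all.
    if not mouse_present:
        return False
    # Step 3: on tablet-like machines, one "mouse" is assumed to be a
    # digitizer in disguise, so two are required.
    if looks_like_tablet_or_touch:
        return mouse_driver_count >= 2
    # Step 2: on everything else, one mouse driver is enough.
    return mouse_driver_count >= 1


def any_pointer_fine(mouse_present, looks_like_tablet_or_touch,
                     mouse_driver_count):
    """Per the text, "fine" is reported if and only if a mouse is detected."""
    return has_mouse(mouse_present, looks_like_tablet_or_touch,
                     mouse_driver_count)
```

A plain desktop with one mouse driver passes, but a tablet-like machine with
exactly one real mouse and no digitizer posing as a mouse does not:
``any_pointer_fine(True, True, 1)`` is ``False``.
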
This algorithm also suffers from the XY problem -- Firefox is trying to
determine whether a fine pointing device exists by determining if there is
a computer mouse present, when instead it should be directly testing for
fine pointing devices, since mice are not the only fine pointing
devices.

Because of this proxy question, this algorithm is completely dependent on any
attached fine pointing device (like a pen tablet) to report itself as a mouse.
Point #3 makes the problem even worse, because if a computer that resembles a
tablet fails to report its digitizers as mice, the algorithm will completely
ignore an actual computer mouse attached to the system because it expects two
of them to be reported!

Unfortunately, the Surface Pro has both a pen digitizer and a touch digitizer,
and it reports neither as a mouse. As a result, this algorithm completely falls
apart on the Surface Pro, failing to report any "fine" pointing device even
when a computer mouse is plugged in, a pen is plugged in, or even when
the tablet is docked, because its touchpad is only one mouse and the algorithm
expects at least two.

This is also responsible for failing to report the trackpad on the Dell XPS 15
as "fine", because the Dell XPS 15 has a touchscreen and therefore looks like
a "tablet", but doesn't report 2 mouse drivers.

**any-hover: hover**


Firefox reports that any device that is a "fine" pointer also supports "hover",
which does generally hold true, but isn't necessarily true for lower-end pens
that only support tapping. It would be better for Firefox to directly
query the operating system instead of just assuming.

**"pointer" media query**

As discussed previously at length, this media query relies on a "primary"
designation made by the browser. Below is the current algorithm used to
determine this:

1. If the computer is considered a "tablet" (see below), report primary
   pointer as "coarse" (this is clearly already the wrong behavior).

2. Otherwise, if the computer has a mouse plugged in, report "fine".

3. Otherwise, if the computer has a touchscreen or pen digitizer, report
   "coarse" (this is wrong in the case of the pen digitizer).

4. Otherwise, report "fine" (this is wrong; it should report ``none``).

Firefox uses the following algorithm to determine if the computer is a
"tablet" for point #1 above:

1. It is not a tablet if it's not running at least Windows 8.

2. If Windows "Tablet Mode" is enabled, it is a tablet no matter what.

3. If no touch-capable digitizers are attached, it is not a tablet.

4. If the system doesn't support auto-rotation, perhaps because it has
   no rotation sensor, or perhaps because it's docked and operating in
   "laptop mode" where rotation won't happen, it's not a tablet.

5. If the vendor that made the computer reports to Windows that it supports
   "convertible slate mode" and it is currently operating in "slate mode",
   it's a tablet.

6. Otherwise, it's not a tablet.


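Both lists above translate directly into code, which can be a handy way to
check what a given configuration would report. The functions below are a
simplified restatement written for this document, not Firefox's actual
implementation; every name and parameter is hypothetical, and each boolean
stands in for a check that Firefox makes against Windows.

```python
def is_tablet(windows_8_or_later, tablet_mode_enabled, has_touch_digitizer,
              supports_auto_rotation, in_slate_mode):
    """Model of the six-step "tablet" heuristic.

    in_slate_mode should be True only if the vendor reports convertible
    slate mode support AND the machine is currently in slate mode.
    """
    if not windows_8_or_later:        # step 1
        return False
    if tablet_mode_enabled:           # step 2
        return True
    if not has_touch_digitizer:       # step 3
        return False
    if not supports_auto_rotation:    # step 4
        return False
    return in_slate_mode              # steps 5 and 6


def primary_pointer(tablet, has_mouse, has_touch_or_pen_digitizer):
    """Model of the four-step `pointer` algorithm, flaws included."""
    if tablet:                        # step 1
        return "coarse"
    if has_mouse:                     # step 2
        return "fine"
    if has_touch_or_pen_digitizer:    # step 3
        return "coarse"
    return "fine"                     # step 4 (the text argues for "none")
```

For instance, ``primary_pointer(False, False, False)`` returns ``"fine"`` for
a machine with no pointing device at all, and
``primary_pointer(True, True, False)`` returns ``"coarse"`` even though a
mouse is attached.
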
**Table with comparison to Chromium**

The following table shows how Firefox and Chromium respond to various pointer
queries. The "any-pointer" and "any-hover" columns are not subjective and
therefore are always either green or red to indicate "pass" or "fail", but the
"pointer" and "hover" columns may also be yellow to indicate that the result is
"open to interpretation" because of the aforementioned difficulty in
determining the "primary pointer".

.. image:: touch_media_queries.png
   :width: 100%


**Related Bugs**

- Bug 1813979 - For Surface Pro media query "any-pointer: fine" is true only
  when both the Type Cover and mouse are connected

- Bug 1747942 - Incorrect CSS media query matches for pointer, any-pointer,
  hover and any-hover on Surface Laptop

- Bug 1528441 - @media (hover) and (any-hover) does not work on Firefox 64/65
  where certain dual inputs are present

- Bug 1697294 - Content processes unable to detect Windows 10 Tablet Mode

- Bug 1806259 - CSS media queries wrongly detect a Win10 desktop computer
  with a mouse and a touchscreen, as a device with no mouse (hover: none)
  and a touchscreen (pointer: coarse)


Web Events
=====================

The pen stylus worked well on all tested systems -- the correct pointer events
were fired in the correct order, and mouse events were properly simulated in
case the default behavior was allowed.

The touchscreen input was less reliable. On the Dell XPS 15, the
"Pointer Events" were flawless, but the "Touch Events" were missing
an important step: the ``touchstart`` and ``touchmove`` events were sent just
fine, but Firefox never sent the ``touchend`` event! (Hopefully that isn't
too difficult to fix!)

Unfortunately, everything really falls apart on the Surface Pro using the
touchscreen -- neither the "Pointer Events" nor the "Touch Events" fire at all!
Instead, the touch is completely absorbed by pan and zoom gestures, and nothing
is sent to the web page. The website's request for ``touch-action: none`` is
ignored, and the web page is never given any opportunity to call
``Event.preventDefault()`` to cancel the pan/zoom behavior.


Operating System Interfaces
================================

As was discussed above, Windows has multiple input APIs that were each
introduced in a newer version of Windows to handle devices that were not
well-served by existing APIs.

Backward compatibility with applications designed against older APIs is
realized when applications call the default event handler (``DefWindowProc``)
upon receiving an event type that they don't recognize (which is what apps have
always been instructed to do if they receive events they don't recognize).
The unrecognized newer events will be translated by the default event handler
into older events and sent back to the application. A very old application may
have this process repeat through several generations of APIs until it finally
sees events that it recognizes.

Firefox currently uses a mix of the older and newer APIs, which complicates
the input handling logic and may be responsible for some of the
difficult-to-explain bugs that we see reported by users.

Here is an explanation of the codepaths Firefox uses to handle pointer input:

1. Firefox handles the ``WM_POINTER[LEAVE|DOWN|UP|UPDATE]`` messages if the
   input device is a tablet pen and an Asynchronous Pan and Zoom (APZ)
   compositor is available. Note that this already may not be ideal, as
   Microsoft warns (`here <https://learn.microsoft.com/en-us/windows/win32/inputmsg/wm-pointercapturechanged>`__)
   that handling some pointer messages and passing other pointer messages to
   ``DefWindowProc`` has unspecified behavior (meaning that Win32 may do
   something unexpected or nonsensical).

   If the above criteria aren't met, Firefox will call ``DefWindowProc``,
   which will re-post the pointer messages as either touch messages or mouse
   messages.

2. If DirectManipulation is being used for APZ, it will send the
   ``WM_POINTERCAPTURECHANGED`` message if it detects a pan or zoom gesture it
   can handle. It will then handle the rest of the gesture itself.

   DirectManipulation is used for all top-level and popup windows as long as
   it isn't disabled via the ``apz.allow_zooming``,
   ``apz.windows.use_direct_manipulation``, or
   ``apz.windows.force_disable_direct_manipulation`` prefs.

3. If the pointing device is touch, the next action depends on
   whether an Asynchronous Pan and Zoom (APZ) compositor is available. If it
   is, the window will have been registered using ``RegisterTouchWindow``, and
   Firefox will receive ``WM_TOUCH`` messages, which will be sent to the
   "Touch Event" API and handled directly by the APZ compositor.

   If there is no APZ compositor, the input will instead be received as a
   ``WM_GESTURE`` message or a mouse message, depending on the movement. Note
   that these will be more basic gestures, like tap-and-hold.

4. If none of the above apply, the message will be converted into standard
   ``WM_MOUSExxx`` messages via a call to ``DefWindowProc``.


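As a reading aid, the decision order above can be condensed into one
function. This is a loose model written for this document, not a description
of the real widget code: ``select_codepath`` and its parameters are
hypothetical, the DirectManipulation step is reduced to a single flag, and the
case of a pen without APZ is simplified to the mouse path.

```python
def select_codepath(device, apz_available,
                    direct_manipulation_claims_gesture=False):
    """Return a label for the codepath that services the input.

    device: "pen", "touch", or anything else (treated as a mouse).
    apz_available: whether an APZ compositor is available.
    direct_manipulation_claims_gesture: whether DirectManipulation detected
        a pan or zoom gesture it can handle (step 2 above).
    """
    # Step 1: pen input with APZ is handled through the Pointer API.
    if device == "pen" and apz_available:
        return "WM_POINTER"
    # Step 2: DirectManipulation takes over gestures it recognizes.
    if direct_manipulation_claims_gesture:
        return "DirectManipulation"
    # Step 3: touch depends on whether the window was registered for touch.
    if device == "touch":
        return "WM_TOUCH" if apz_available else "WM_GESTURE or mouse"
    # Step 4: everything else ends up as mouse messages.
    return "WM_MOUSE"
```

Even in this reduced form, the same physical device can arrive through
different codepaths depending on configuration, which hints at why the
interactions between them are hard to reason about.
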
================================================================================
Discussion
================================================================================

Here is where some of the outstanding thoughts or questions can be listed.
This can be updated as more questions come about and (hopefully) as answers to
questions become apparent.

CSS "pointer" Media Queries
===============================

- The logic for the ``any-pointer`` and ``any-hover`` queries is objectively
  incorrect and should be rewritten altogether. That is not as
  big of a job as it sounds, as the code is fairly straightforward and
  self-contained. (Note: Improvements have already been made in
  `Bug 1813979 <https://bugzilla.mozilla.org/show_bug.cgi?id=1813979>`__.)

- There are a few behaviors for ``pointer`` and ``hover`` that are
  objectively wrong (such as reporting a ``coarse`` pointer when the
  Surface Pro is docked with a touchpad). Those should be fixable with a
  code change similar to the previous bullet.

- Do we want to continue to use only machine configuration to decide what
  the "primary" pointer is, or do we also want to incorporate user intent
  into the algorithm? If so:

  1. Do we create a way for the user to override? For example, a "Drawing
     Mode" button if a tablet digitizer is sensed.

  2. Do we attempt to change automatically in response to user action?

     - An example was used above of a docked Surface Pro computer, where
       the user may use the keyboard and touchpad for a while, then perhaps
       tuck that behind and use the device as a touchscreen, and then
       perhaps draw on it with a tablet stylus.

     - We would need to be careful to avoid careless "popping" or
       "oscillating" if we react too quickly to changing input types.

- On a separate-but-related note, the `W3C suggested <https://www.w3.org/TR/mediaqueries-5/#descdef-media-pointer>`__
  that it might be beneficial to allow users to at least disable all
  reporting of ``fine`` pointing devices for users who may have a disability
  that prevents them from being able to click small objects, even with a fine
  pointing device.


Pan-and-Zoom, Inertia, Overscroll, and Elastic Bounce
=========================================================

- Inertia, overscroll, and elastic bounce are just plain broken on the
  Surface Pro. That should definitely be investigated.

- We can see from the video below that Microsoft Edge has quite a bit more
  overscroll and a more elastic bounce than Firefox does, and it also
  allows elastic bounce in directions that the page itself doesn't scroll.

  Edge's way seems more similar to the user experience I'd expect from using
  Firefox on an iPhone or Android device. Perhaps we should consider
  following suit?

  (`Link to video <https://drive.google.com/file/d/14XVLT6CNn2RaXcHHCRIrQmRwoMYjj6fu/view?usp=sharing>`__)


Web Events
|
|
==============
|
|
|
|
- It's worth investigating why the ``touchend`` event never seems
|
|
to be sent by Firefox on any tested devices.
|
|
|
|
- It's very disappointing that neither the Pointer Events API nor the
|
|
Touch Events API works at all on Firefox on the Surface Pro. That should
|
|
be investigated very soon!
|
|
|
|
|
|
Operating System Interfaces
|
|
================================
|
|
|
|
- With the upcoming sun-setting of Windows 7 support, Firefox has an
|
|
opportunity to revisit the implementation of our input handling and try to
|
|
simplify our codepaths and eliminate some of the workarounds that exist to
|
|
handle some of these complex interactions, as well as fix entire classes of
|
|
bugs - both reported and unreported - that currently exist as a result.
|
|
|
|
- Does it make sense to combine the touchscreen and pen handling together
|
|
and use the ``WM_POINTERXXX`` messages for both?
|
|
|
|
- This would eliminate the need to handle the ``WM_TOUCH`` and
|
|
``WM_GESTURE`` messages at all.
|
|
|
|
- Note that there is precedent for this, as `GTK <https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/1563>`__
|
|
has already done so. It appears that `Blender <https://archive.blender.org/developer/D7660>`__
|
|
has plans to move toward this as well.
|
|
|
|
- Tablet pens seemed to do very well in most of the testing,
|
|
and they are also the part of the code that mainly exercises the
|
|
``WM_POINTERXXX`` codepaths. That may imply increased reliability in
|
|
that codepath?
|
|
|
|
- The Pointer APIs also have good device simulation for integration
|
|
testing.
|
|
|
|
- Would we also want to roll mouse handling into it using the
|
|
`EnableMouseInPointer <https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-enablemouseinpointer>`__
|
|
call? That would allow us to also get rid of handling
|
|
``WM_MOUSE[MOVE|WHEEL|HWHEEL]`` and ``WM_[LRM]BUTTON[UP|DOWN]``
|
|
messages. Truly one codepath (with a few minor branches) to rule them
|
|
all!
|
|
|
|
- Nick Rishel sent `this link <http://the-witness.net/news/2012/10/wm_touch-is-totally-bananas/>`__
|
|
that details the troubles that the developers of The Witness (a video
|
|
game) ran into when using the ``WM_TOUCH`` API. It argues that the API
|
|
is poorly-designed, and advises that if Windows 7 support is not
|
|
needed, the API should be avoided.
|
|
|
|
- Should we exclusively use DirectManipulation for Pan/Zoom?
|
|
|
|
- Multitouch touchpads bypass all of the ``WM_POINTER`` machinery
|
|
for anything gesture-related and directly send their messages to
|
|
DirectManipulation. We then "capture" all the DirectManipulation events
|
|
and pump them into our events pipeline, as explained above.
|
|
|
|
- DirectManipulation also handles "overscroll + elastic bounce" in a way
|
|
that aligns with Windows look-and-feel.
|
|
|
|
- Perhaps it makes sense to just use DirectManipulation for all APZ
|
|
handling and eliminate any attempt at handling this through other
|
|
codepaths.
|
|
|
|
High-Frequency Input
|
|
================================
|
|
|
|
"High-Frequency Input" refers to the ability of an app to still
|
|
perceive input events despite them happening at a rate faster than the app
|
|
itself actually handles them.
|
|
|
|
Consider a mouse that moves through several points: "A->B->C->D->E". If the
|
|
application processes input when the mouse is at "A" and doesn't poll again
|
|
until the mouse is at point "E", the default behavior of all modern operating
|
|
systems is to "coalesce" these events and simply report "A->E". This is fine
|
|
for the majority of use cases, but certain workloads (such as digital
|
|
handwriting and video games) can benefit from knowing the complete path that
|
|
was taken to get from the start point to the end point.
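
The difference can be illustrated with a toy model. This is a minimal sketch in
Python, not a Windows or Android API; ``PointerQueue`` and ``path_seen_by_app``
are hypothetical names invented for this example.

.. code-block:: python

    class PointerQueue:
        """Toy model of an OS pointer queue (not a real OS interface)."""

        def __init__(self, keep_history=False):
            self.keep_history = keep_history
            self._pending = []

        def push(self, point):
            if self.keep_history:
                # Every movement since the last poll is retained.
                self._pending.append(point)
            else:
                # Coalescing: a new movement overwrites the pending one.
                self._pending = [point]

        def poll(self):
            """Return what the app sees since its last poll, then reset."""
            seen, self._pending = self._pending, []
            return seen


    def path_seen_by_app(queue, last_known, moves):
        """Feed `moves` into the queue between two polls of a slow app."""
        for point in moves:
            queue.push(point)
        return [last_known] + queue.poll()


    moves = ["B", "C", "D", "E"]
    coalesced = path_seen_by_app(PointerQueue(), "A", moves)
    full_path = path_seen_by_app(PointerQueue(keep_history=True), "A", moves)

Here ``coalesced`` is ``["A", "E"]``, while ``full_path`` retains every
intermediate point, which is what handwriting or game workloads would want.
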
|
|
|
|
Generally, solutions to this involve the operating system keeping a history of
|
|
pointer movements that can be retrieved through an API. For example,
|
|
Android provides the `MotionEvent <https://developer.android.com/reference/android/view/MotionEvent.html>`__
|
|
API, which batches historical movements.
|
|
|
|
Unfortunately, the APIs to do this in Windows are terribly broken. As
|
|
`this blog <https://blog.getpaint.net/2019/11/14/paint-net-4-2-6-alpha-build-7258/>`__
|
|
makes clear, `GetMouseMovePointsEx <https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getmousemovepointsex>`__
|
|
has so many issues that the Paint.NET developers removed its usage because
|
|
of the burden. That same blog entry also details that the newer Pointer API has
|
|
the `GetPointerInfoHistory <https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getpointerinfohistory>`__
|
|
function that is *supposed* to support tracking pointer history, but it only ever tracks
|
|
a single entry!
|
|
|
|
Perhaps luckily, there is currently no web standard for high-frequency input,
|
|
although it `has been asked about in the past <https://lists.w3.org/Archives/Public/public-pointer-events/2014AprJun/0057.html>`__.
|
|
|
|
If such a standard were ever created, it would likely be very difficult for
|
|
Firefox on Windows to support it.
|
|
|
|
|
|
DirectManipulation and Pens
|
|
=============================
|
|
|
|
- This is a to-do item: we need to investigate whether or not
|
|
DirectManipulation can directly scoop up pen input, or whether it has
|
|
to be handled by the application (and forwarded to DM if desired).
|