diff options
Diffstat (limited to 'xpcom/docs')
-rw-r--r-- | xpcom/docs/cc-macros.rst | 190 | ||||
-rw-r--r-- | xpcom/docs/collections.rst | 114 | ||||
-rw-r--r-- | xpcom/docs/hashtables.rst | 141 | ||||
-rw-r--r-- | xpcom/docs/hashtables_detailed.rst | 121 | ||||
-rw-r--r-- | xpcom/docs/huntingleaks.rst | 22 | ||||
-rw-r--r-- | xpcom/docs/index.rst | 19 | ||||
-rw-r--r-- | xpcom/docs/logging.rst | 488 | ||||
-rw-r--r-- | xpcom/docs/refptr.rst | 81 | ||||
-rw-r--r-- | xpcom/docs/stringguide.rst | 1094 | ||||
-rw-r--r-- | xpcom/docs/thread-safety.rst | 354 | ||||
-rw-r--r-- | xpcom/docs/writing-xpcom-interface.rst | 287 | ||||
-rw-r--r-- | xpcom/docs/xpidl.rst | 402 |
12 files changed, 3313 insertions, 0 deletions
diff --git a/xpcom/docs/cc-macros.rst b/xpcom/docs/cc-macros.rst new file mode 100644 index 0000000000..7877b3aabe --- /dev/null +++ b/xpcom/docs/cc-macros.rst @@ -0,0 +1,190 @@ +======================================= +How to make a C++ class cycle collected +======================================= + +Should my class be cycle collected? +=================================== + +First, you need to decide if your class should be cycle +collected. There are three main criteria: + +* It can be part of a cycle of strong references, including + refcounted objects and JS. Usually, this happens when it can hold + alive and be held alive by cycle collected objects or JS. + +* It must be refcounted. + +* It must be single threaded. The cycle collector can only work with + objects that are used on a single thread. The main thread and DOM + worker and worklet threads each have their own cycle collectors. + +If your class meets the first criteria but not the second, then +whatever class uniquely owns it should be cycle collected, assuming +that is refcounted, and this class should be traversed and unlinked as +part of that. + +The cycle collector supports both nsISupports and non-nsISupports +(known as "native" in CC nomenclature) refcounting. However, we do not +support native cycle collection in the presence of inheritance, if two +classes related by inheritance need different CC +implementations. (This is because we use QueryInterface to find the +right CC implementation for an object.) + +Once you've decided to make a class cycle collected, there are a few +things you need to add to your implementation: + +* Cycle collected refcounting. Special refcounting is needed so that + the CC can tell when an object is created, used, or destroyed, so + that it can determine if an object is potentially part of a garbage + cycle. + +* Traversal. Once the CC has decided an object **might** be garbage, + it needs to know what other cycle collected objects it holds strong + references to. This is done with a "traverse" method. + +* Unlinking. Once the CC has decided that an object **is** garbage, it + needs to break the cycles by clearing out all strong references to + other cycle collected objects. This is done with an "unlink" + method. This usually looks very similar to the traverse method. + +The traverse and unlink methods, along with other methods needed for +cycle collection, are defined on a special inner class object, called a +"participant", for performance reasons. The existence of the +participant is mostly hidden behind macros so you shouldn't need +to worry about it. + +Next, we'll go over what the declaration and definition of these parts +look like. (Spoiler: there are lots of `ALL_CAPS` macros.) This will +mostly cover the most common variants. If you need something slightly +different, you should look at the location of the declaration of the +macros we mention here and see if the variants already exist. + + +Reference counting +================== + +nsISupports +----------- + +If your class inherits from nsISupports, you'll need to add +`NS_DECL_CYCLE_COLLECTING_ISUPPORTS` to the class declaration. This +will declare the QueryInterface (QI), AddRef and Release methods you +need to implement nsISupports, as well as the actual refcount field. + +In the `.cpp` file for your class you'll have to define the +QueryInterface, AddRef and Release methods. The ins and outs of +defining the QI method is out-of-scope for this document, but you'll +need to use the special cycle collection variants of the macros, like +`NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION`. (This is because we use +the nsISupports system to define a special interface used to +dynamically find the correct CC participant for the object.) + +Finally, you'll have to actually define the AddRef and Release methods +for your class. If your class is called `MyClass`, then you'd do this +with the declarations `NS_IMPL_CYCLE_COLLECTING_ADDREF(MyClass)` and +`NS_IMPL_CYCLE_COLLECTING_RELEASE(MyClass)`. + +non-nsISupports +--------------- + +If your class does **not** inherit from nsISupports, you'll need to +add `NS_INLINE_DECL_CYCLE_COLLECTING_NATIVE_REFCOUNTING` to the class +declaration. This will give inline definitions for the AddRef and +Release methods, as well as the actual refcount field. + +Cycle collector participant +=========================== + +Next we need to declare and define the cycle collector +participant. This is mostly boilerplate hidden behind macros, but you +will need to specify which fields are to be traversed and unlinked +because they are strong references to cycle collected objects. + +Declaration +----------- + +First, we need to add a declaration for the participant. As before, +let's say your class is `MyClass`. + +The basic way to declare this for an nsISupports class is +`NS_DECL_CYCLE_COLLECTION_CLASS(MyClass)`. + +If your class inherits from multiple classes that inherit from +nsISupports classes, say `Parent1` and `Parent2`, then you'll need to +use `NS_DECL_CYCLE_COLLECTION_CLASS_AMBIGUOUS(MyClass, Parent1)` to +tell the CC to cast to nsISupports via `Parent1`. You probably want to +pick the first class it inherits from. (The cycle collector needs to be +able to cast `MyClass*` to `nsISupports*`.) + +Another situation you might encounter is that your nsISupports class +inherits from another cycle collected class `CycleCollectedParent`. In +that case, your participant declaration will look like +`NS_DECL_CYCLE_COLLECTION_CLASS_INHERITED(MyClass, +CycleCollectedParent)`. (This is needed so that the CC can cast from +nsISupports down to `MyClass`.) Note that we do not support inheritance +for non-nsISupports classes. + +If your class is non-nsISupports, then you'll need to use the `NATIVE` +family of macros, like +`NS_DECL_CYCLE_COLLECTION_NATIVE_CLASS(MyClass)`. + +In addition to these modifiers, these different variations have +further `SCRIPT_HOLDER` variations which are needed if your class +holds alive JavaScript objects. This is because the tracing of JS +objects held alive by this class must be declared separately from the +tracing of C++ objects held alive by this class so that the garbage +collector can also use the tracing. For example, +`NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS_AMBIGUOUS(MyClass, +Parent1)`. + +There are also `WRAPPERCACHE` variants of the macros which you need to +use if your class is wrapper cached. These are effectively a +specialized form of `SCRIPT_HOLDER`, as a cached wrapper is a +JS object held alive by the C++ object. For example, +`NS_DECL_CYCLE_COLLECTION_WRAPPERCACHE_CLASS_AMBIGUOUS(MyClass, +Parent1)`. + +There is yet another variant of these macros, `SKIPPABLE`. This +document won't go into detail here about how this works, but the basic +idea is that a class can tell the CC when it is definitely alive, +which lets the CC skip it. This is a very important optimization for +things like DOM elements in active documents, but a new class you are +making cycle collected is likely not common enough to worry about. + +Implementation +-------------- + +Finally, you must write the actual implementation of the CC +participant, in the .cpp file for your class. This will define the +traverse and unlink methods, and some other random helper +functions. In the simplest case, this can be done with a single macro +like this: `NS_IMPL_CYCLE_COLLECTION(MyClass, mField1, mField2, +mField3)`, where `mField1` and the rest are the names of the fields of +your class that are strong references to cycle collected +objects. There is some template magic that says how many common types +like RefPtr, nsCOMPtr, and even some arrays, should be traversed and +unlinked. There’s also a variant `NS_IMPL_CYCLE_COLLECTION_INHERITED`, +which you should use when there’s a parent class that is also cycle +collected, to ensure that fields of the parent class are traversed and +unlinked. The name of that parent class is passed in as the second +argument. If either of these work, then you are done. Your class is +now cycle collected. Note that this does not work for fields that are +JS objects. + +However, if that doesn’t work, you’ll have to get into the details a +bit more. A good place to start is by copying the definition of +`NS_IMPL_CYCLE_COLLECTION`. + +For a script holder method, you also need to define a trace method in +addition to the traverse and unlink, using +`NS_IMPL_CYCLE_COLLECTION_TRACE_BEGIN` and other similar +macros. You'll need to include all of the JS fields that your class +holds alive. The trace method will be used by the GC as well as the +CC, so if you miss something you can end up with use-after-free +crashes. You'll also need to call `mozilla::HoldJSObjects(this);` in +the ctor for your class, and `mozilla::DropJSObjects(this);` in the +dtor. This will register (and unregister) each instance of your object +with the JS runtime, to ensure that it gets traced properly. This +does not apply if you have a wrapper cached class that does not have +any additional JS fields, as nsWrapperCache deals with all of that +for you. diff --git a/xpcom/docs/collections.rst b/xpcom/docs/collections.rst new file mode 100644 index 0000000000..8c9356e499 --- /dev/null +++ b/xpcom/docs/collections.rst @@ -0,0 +1,114 @@ +XPCOM Collections +================= + +``nsTArray`` and ``AutoTArray`` +------------------------------- + +``nsTArray<T>`` is a typesafe array for holding various objects, similar to ``std::vector<T>``. (note that +``nsTArray<T>`` is dynamically-sized, unlike ``std::array<T>``) Here's an +incomplete list of mappings between the two: + +================== ================================================== +std::vector<T> nsTArray<T> +================== ================================================== +``size()`` ``Length()`` +``empty()`` ``IsEmpty()`` +``resize()`` ``SetLength()`` or ``SetLengthAndRetainStorage()`` +``capacity()`` ``Capacity()`` +``reserve()`` ``SetCapacity()`` +``push_back()`` ``AppendElement()`` +``insert()`` ``AppendElements()`` +``emplace_back()`` ``EmplaceBack()`` +``clear()`` ``Clear()`` or ``ClearAndRetainStorage()`` +``data()`` ``Elements()`` +``at()`` ``ElementAt()`` +``back()`` ``LastElement()`` +================== ================================================== + +Rust Bindings +~~~~~~~~~~~~~ + +When the ``thin_vec`` crate is built in Gecko, ``thin_vec::ThinVec<T>`` is +guaranteed to have the same memory layout and allocation strategy as +``nsTArray``, meaning that the two types may be used interchangeably across +FFI boundaries. The type is **not** safe to pass by-value over FFI +boundaries, due to Rust and C++ differing in when they run destructors. + +The element type ``T`` must be memory-compatible with both Rust and C++ code +to use over FFI. + +``nsTHashMap`` and ``nsTHashSet`` +--------------------------------- + +These types are the recommended interface for writing new XPCOM hashmaps and +hashsets in XPCOM code. + +Supported Hash Keys +~~~~~~~~~~~~~~~~~~~ + +The following types are supported as the key parameter to ``nsTHashMap`` and +``nsTHashSet``. + +========================== ====================== +Type Hash Key +========================== ====================== +``T*`` ``nsPtrHashKey<T>`` +``T*`` ``nsPtrHashKey<T>`` +``nsCString`` ``nsCStringHashKey`` +``nsString`` ``nsStringHashKey`` +``uint32_t`` ``nsUint32HashKey`` +``uint64_t`` ``nsUint64HashKey`` +``intptr_t`` ``IntPtrHashKey`` +``nsCOMPtr<nsISupports>`` ``nsISupportsHashKey`` +``RefPtr<T>`` ``nsRefPtrHashKey<T>`` +``nsID`` ``nsIDHashKey`` +========================== ====================== + +Any key not in this list must inherit from the ``PLDHashEntryHdr`` class to +implement manual hashing behaviour. + +Class Reference +~~~~~~~~~~~~~~~ + +.. note:: + + The ``nsTHashMap`` and ``nsTHashSet`` types are not declared exactly like + this in code. This is intended largely as a practical reference. + +.. cpp:class:: template<K, V> nsTHashMap<K, V> + + The ``nsTHashMap<K, V>`` class is currently defined as a thin type alias + around ``nsBaseHashtable``. See the methods defined on that class for + more detailed documentation. + + https://searchfox.org/mozilla-central/source/xpcom/ds/nsBaseHashtable.h + + .. cpp:function:: uint32_t Count() const + + .. cpp:function:: bool IsEmpty() const + + .. cpp:function:: bool Get(KeyType aKey, V* aData) const + + Get the value, returning a flag indicating the presence of the entry + in the table. + + .. cpp:function:: V Get(KeyType aKey) const + + Get the value, returning a default-initialized object if the entry is + not present in the table. + + .. cpp:function:: Maybe<V> MaybeGet(KeyType aKey) const + + Get the value, returning Nothing if the entry is not present in the table. + + .. cpp:function:: V& LookupOrInsert(KeyType aKey, Args&&... aArgs) const + + .. cpp:function:: V& LookupOrInsertWith(KeyType aKey, F&& aFunc) const + +.. cpp:class:: template<K> nsTHashSet<K> + + The ``nsTHashSet<K>`` class is currently defined as a thin type alias + around ``nsTBaseHashSet``. See the methods defined on that class for + more detailed documentation. + + https://searchfox.org/mozilla-central/source/xpcom/ds/nsTHashSet.h diff --git a/xpcom/docs/hashtables.rst b/xpcom/docs/hashtables.rst new file mode 100644 index 0000000000..8debd11eb7 --- /dev/null +++ b/xpcom/docs/hashtables.rst @@ -0,0 +1,141 @@ +XPCOM Hashtable Guide +===================== + +.. note:: + + For a deep-dive into the underlying mechanisms that power our hashtables, + check out the :ref:`XPCOM Hashtable Technical Details` + document. + +What Is a Hashtable? +-------------------- + +A hashtable is a data construct that stores a set of **items**. Each +item has a **key** that identifies the item. Items are found, added, and +removed from the hashtable by using the key. Hashtables may seem like +arrays, but there are important differences: + ++-------------------------+----------------------+----------------------+ +| | Array | Hashtable | ++=========================+======================+======================+ +| **Keys** | *integer:* arrays | *any type:* almost | +| | are always keyed on | any datatype can be | +| | integers and must | used as key, | +| | be contiguous. | including strings, | +| | | integers, XPCOM | +| | | interface pointers, | +| | | IIDs, and almost | +| | | anything else. Keys | +| | | can be disjunct | +| | | (i.e. you can store | +| | | entries with keys 1, | +| | | 5, and 3000). | ++-------------------------+----------------------+----------------------+ +| **Lookup Time** | *O(1):* lookup time | *O(1):* lookup time | +| | is a simple constant | is mostly-constant, | +| | | but the constant | +| | | time can be larger | +| | | than an array lookup | ++-------------------------+----------------------+----------------------+ +| **Sorting** | *sorted:* stored | *unsorted:* stored | +| | sorted; iterated | unsorted; cannot be | +| | over in a sorted | iterated over in a | +| | fashion. | sorted manner. | ++-------------------------+----------------------+----------------------+ +| **Inserting/Removing** | *O(n):* adding and | *O(1):* adding and | +| | removing items from | removing items from | +| | a large array can be | hashtables is a | +| | time-consuming | quick operation | ++-------------------------+----------------------+----------------------+ +| **Wasted space** | *none:* Arrays are | *some:* hashtables | +| | packed structures, | are not packed | +| | so there is no | structures; | +| | wasted space. | depending on the | +| | | implementation, | +| | | there may be | +| | | significant wasted | +| | | memory. | ++-------------------------+----------------------+----------------------+ + +In their implementation, hashtables take the key and apply a +mathematical **hash function** to **randomize** the key and then use the +hash to find the location in the hashtable. Good hashtable +implementations will automatically resize the hashtable in memory if +extra space is needed, or if too much space has been allocated. + +.. _When_Should_I_Use_a_Hashtable.3F: + +When Should I Use a Hashtable? +------------------------------ + +Hashtables are useful for + +- sets of data that need swift **random access** +- with **non-integral keys** or **non-contiguous integral keys** +- or where **items will be frequently added or removed** + +Hashtables should *not* be used for + +- Sets that need to be **sorted** +- Very small datasets (less than 12-16 items) +- Data that does not need random access + +In these situations, an array, a linked-list, or various tree data +structures are more efficient. + +.. _Which_Hashtable_Should_I_Use.3F: + +Which Hashtable Should I Use? +----------------------------- + +If there is **no** key type, you should use an ``nsTHashSet``. + +If there is a key type, you should use an ``nsTHashMap``. + +``nsTHashMap`` is a template with two parameters. The first is the hash key +and the second is the data to be stored as the value in the map. Most of +the time, you can simply pass the raw key type as the first parameter, +so long as its supported by `nsTHashMap.h <https://searchfox.org/mozilla-central/source/xpcom/ds/nsTHashMap.h>`_. +It is also possible to specify custom keys if necessary. See `nsHashKeys.h +<https://searchfox.org/mozilla-central/source/xpcom/ds/nsHashKeys.h>`_ for examples. + +There are a number of more esoteric hashkey classes in nsHashKeys.h, and +you can always roll your own if none of these fit your needs (make sure +you're not duplicating an existing hashkey class though!) + +Once you've determined what hashtable and hashkey classes you need, you +can put it all together. A few examples: + +- A hashtable that maps UTF-8 origin names to a DOM Window - + ``nsTHashMap<nsCString, nsCOMPtr<nsIDOMWindow>>`` +- A hashtable that maps 32-bit integers to floats - + ``nsTHashMap<uint32_t, float>`` +- A hashtable that maps ``nsISupports`` pointers to reference counted + ``CacheEntry``\ s - + ``nsTHashMap<nsCOMPtr<nsISupports>, RefPtr<CacheEntry>>`` +- A hashtable that maps ``JSContext`` pointers to a ``ContextInfo`` + struct - ``nsTHashMap<JSContext*, UniquePtr<ContextInfo>>`` +- A hashset of strings - ``nsTHashSet<nsString>`` + +.. _nsBaseHashtable_and_friends:_nsDataHashtable.2C_nsInterfaceHashtable.2C_and_nsClassHashtable: + +Hashtable API +------------- + +The hashtable classes all expose the same basic API. There are three +key methods, ``Get``, ``InsertOrUpdate``, and ``Remove``, which retrieve entries from the +hashtable, write entries into the hashtable, and remove entries from the +hashtable respectively. See `nsBaseHashtable.h <https://searchfox.org/mozilla-central/source/xpcom/ds/nsBaseHashtable.h>`_ +for more details. + +The hashtables that hold references to pointers (nsRefPtrHashtable and +nsInterfaceHashtable) also have GetWeak methods that return non-AddRefed +pointers. + +Note that ``nsRefPtrHashtable``, ``nsInterfaceHashtable`` and ``nsClassHashtable`` +are legacy hashtable types which have some extra methods, and don't have automatic +key type handling. + +All of these hashtable classes can be iterated over via the ``Iterator`` +class, with normal C++11 iterators or using the ``Keys()`` / ``Values()`` ranges, +and all can be cleared via the ``Clear`` method. diff --git a/xpcom/docs/hashtables_detailed.rst b/xpcom/docs/hashtables_detailed.rst new file mode 100644 index 0000000000..200c47490d --- /dev/null +++ b/xpcom/docs/hashtables_detailed.rst @@ -0,0 +1,121 @@ +XPCOM Hashtable Technical Details +================================= + +.. note:: + + This is a deep-dive into the underlying mechanisms that power the XPCOM + hashtables. Some of this information is quite old and may be out of date. If + you're looking for how to use XPCOM hashtables, you should consider reading + the :ref:`XPCOM Hashtable Guide` instead. + +Mozilla's Hashtable Implementations +----------------------------------- + +Mozilla has several hashtable implementations, which have been tested +and tuned, and hide the inner complexities of hashtable implementations: + +- ``PLHashTable`` - low-level C API; entry class pointers are constant; + more efficient for large entry structures; often wastes memory making + many small heap allocations. +- ``nsTHashtable`` - low-level C++ wrapper around ``PLDHash``; + generates callback functions and handles most casting automagically. + Client writes their own entry class which can include complex key and + data types. +- ``nsTHashMap/nsInterfaceHashtable/nsClassHashtable`` - + simplifies the common usage pattern mapping a simple keytype to a + simple datatype; client does not need to declare or manage an entry class; + ``nsTHashMap`` datatype is a scalar such as ``uint64_t``; + ``nsInterfaceHashtable`` datatype is an XPCOM interface; + ``nsClassHashtable`` datatype is a class pointer owned by the + hashtable. + +.. _PLHashTable: + +PLHashTable +~~~~~~~~~~~ + +``PLHashTable`` is a part of NSPR. The header file can be found at `plhash.h +<https://searchfox.org/mozilla-central/source/nsprpub/lib/ds/plhash.h>`_. + +There are two situations where ``PLHashTable`` may be preferable: + +- You need entry-pointers to remain constant. +- The entries stored in the table are very large (larger than 12 + words). + +.. _nsTHashtable: + +nsTHashtable +~~~~~~~~~~~~ + +To use ``nsTHashtable``, you must declare an entry-class. This +entry class contains the key and the data that you are hashing. It also +declares functions that manipulate the key. In most cases, the functions +of this entry class can be entirely inline. For examples of entry classes, +see the declarations at `nsHashKeys.h +<https://searchfox.org/mozilla-central/source/xpcom/ds/nsHashKeys.h>`_. + +The template parameter is the entry class. After construction, use the +functions ``PutEntry/GetEntry/RemoveEntry`` to alter the hashtable. The +``Iterator`` class will do iteration, but beware that the iteration will +occur in a seemingly-random order (no sorting). + +- ``nsTHashtable``\ s can be allocated on the stack, as class members, + or on the heap. +- Entry pointers can and do change when items are added to or removed + from the hashtable. Do not keep long-lasting pointers to entries. +- because of this, ``nsTHashtable`` is not inherently thread-safe. If + you use a hashtable in a multi-thread environment, you must provide + locking as appropriate. + +Before using ``nsTHashtable``, see if ``nsBaseHashtable`` and relatives +will work for you. They are much easier to use, because you do not have +to declare an entry class. If you are hashing a simple key type to a +simple data type, they are generally a better choice. + +.. _nsBaseHashtable_and_friends:nsTHashMap.2C_nsInterfaceHashtable.2C_and_nsClassHashtable: + +nsBaseHashtable and friends: nsTHashMap, nsInterfaceHashtable, and nsClassHashtable +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +These C++ templates provide a high-level interface for using hashtables +that hides most of the complexities of the underlying implementation. They +provide the following features: + +- hashtable operations can be completed without using an entry class, + making code easier to read +- optional thread-safety: the hashtable can manage a read-write lock + around the table +- predefined key classes provide automatic cleanup of + strings/interfaces +- ``nsInterfaceHashtable`` and ``nsClassHashtable`` automatically + release/delete objects to avoid leaks. + +``nsBaseHashtable`` is not used directly; choose one of the three +derivative classes based on the data type you want to store. The +``KeyClass`` is taken from `nsHashKeys.h +<https://searchfox.org/mozilla-central/source/xpcom/ds/nsHashKeys.h>`_ and is the same for all +three classes: + +- ``nsTHashMap<KeyClass, DataType>`` - ``DataType`` is a simple + type such as ``uint32_t`` or ``bool``. +- ``nsInterfaceHashtable<KeyClass, Interface>`` - ``Interface`` is an + XPCOM interface such as ``nsISupports`` or ``nsIDocShell`` +- ``nsClassHashtable<KeyClass, T>`` - ``T`` is any C++ class. The + hashtable stores a pointer to the object, and deletes that object + when the entry is removed. + +The important files to read are +`nsBaseHashtable.h <https://searchfox.org/mozilla-central/source/xpcom/ds/nsBaseHashtable.h>`_ +and +`nsHashKeys.h <https://searchfox.org/mozilla-central/source/xpcom/ds/nsHashKeys.h>`_. +These classes can be used on the stack, as a class member, or on the heap. + +.. _Using_nsTHashtable_as_a_hash-set: + +Using nsTHashtable as a hash-set +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A hash set only tracks the existence of keys: it does not associate data +with the keys. This can be done using ``nsTHashtable<nsSomeHashKey>``. +The appropriate entries are GetEntry and PutEntry. diff --git a/xpcom/docs/huntingleaks.rst b/xpcom/docs/huntingleaks.rst new file mode 100644 index 0000000000..9e0585da4e --- /dev/null +++ b/xpcom/docs/huntingleaks.rst @@ -0,0 +1,22 @@ +Hunting Leaks +============= + +.. contents:: Table of Contents + :local: + :depth: 2 + +Different tools and techniques are used to hunt leaks: + +.. list-table:: + :header-rows: 1 + + * - Tools + - Description + * - :ref:`Bloatview` + - BloatView is a tool that shows information about cumulative memory usage and leaks. + * - :ref:`Refcount Tracing and Balancing` + - Refcount tracing and balancing are advanced techniques for tracking down leak of refcounted objects found with BloatView. + * - `GC and CC logs </performance/memory/gc_and_cc_logs.html>`_ + - Garbage collector (GC) and cycle collector (CC) logs give information about why various JS and C++ objects are alive in the heap. + * - :ref:`DMD Heap Scan Mode` + - Heap profiler within Firefox diff --git a/xpcom/docs/index.rst b/xpcom/docs/index.rst new file mode 100644 index 0000000000..bf4e3157c4 --- /dev/null +++ b/xpcom/docs/index.rst @@ -0,0 +1,19 @@ +XPCOM +===== + +These pages contain documentation for Mozilla's Cross-Platform Component Object Model (XPCOM) module. It abstracts core systems functionality for cross-platform use. The component architecture follows the standard COM approach. + +.. toctree:: + :maxdepth: 1 + + logging + stringguide + refptr + thread-safety + huntingleaks + collections + xpidl + writing-xpcom-interface + hashtables + hashtables_detailed + cc-macros diff --git a/xpcom/docs/logging.rst b/xpcom/docs/logging.rst new file mode 100644 index 0000000000..0105f018ad --- /dev/null +++ b/xpcom/docs/logging.rst @@ -0,0 +1,488 @@ +Gecko Logging +============= + +A minimal C++ logging framework is provided for use in core Gecko code. It is +enabled for all builds and is thread-safe. + +This page covers enabling logging for particular logging module, configuring +the logging output, and how to use the logging facilities in native code. + +Enabling and configuring logging +++++++++++++++++++++++++++++++++ + +Caveat: sandboxing when logging to a file +----------------------------------------- + +A sandboxed content process cannot write to ``stderr`` or any file. The easiest +way to log these processes is to disable the content sandbox by setting the +preference ``security.sandbox.content.level`` to ``0``, or setting the environment +variable ``MOZ_DISABLE_CONTENT_SANDBOX`` to ``1``. + +On Windows, you can still see child process messages by using DOS (not the +``MOZ_LOG_FILE`` variable defined below) to redirect output to a file. For +example: ``MOZ_LOG="CameraChild:5" mach run >& my_log_file.txt`` will include +debug messages from the camera's child actor that lives in a (sandboxed) content +process. + +Logging to the Firefox Profiler +------------------------------- + +When a log statement is logged on a thread and the `Firefox Profiler +<https://profiler.firefox.com>`_ is profiling that thread, the log statements is +recorded as a profiler marker. + +This allows getting logs alongside profiler markers and lots of performance +and contextual information, in a way that doesn't require disabling the +sandbox, and works across all processes. + +The profile can be downloaded and shared e.g. via Bugzilla or email, or +uploaded, and the logging statements will be visible either in the marker chart +or the marker table. + +While it is possible to manually configure logging module and start the profiler +with the right set of threads to profile, ``about:logging`` makes this task a lot +simpler and error-proof. + + +The ``MOZ_LOG`` syntax +---------------------- + +Logging is configured using a special but simple syntax: which module should be +enabled, at which level, and what logging options should be enabled or disabled. + +The syntax is a list of terms, separated by commas. There are two types of +terms: + +- A log module and its level, separated by a colon (``:``), such as + ``example_module:5`` to enable the module ``example_module`` at logging level + ``5`` (verbose). This `searchfox query + <https://searchfox.org/mozilla-central/search?q=LazyLogModule+.*%5C%28%22&path=&case=true®exp=true>`_ + returns the complete list of modules available. +- A special string in the following table, to configure the logging behaviour. + Some configuration switch take an integer parameter, in which case it's + separated from the string by a colon (``:``). Most switches only apply in a + specific output context, noted in the **Context** column. + ++----------------------+---------+-------------------------------------------------------------------------------------------+ +| Special module name | Context | Action | ++======================+=========+===========================================================================================+ +| append | File | Append new logs to existing log file. | ++----------------------+---------+-------------------------------------------------------------------------------------------+ +| sync | File | Print each log synchronously, this is useful to check behavior in real time or get logs | +| | | immediately before crash. | ++----------------------+---------+-------------------------------------------------------------------------------------------+ +| raw | File | Print exactly what has been specified in the format string, without the | +| | | process/thread/timestamp, etc. prefix. | ++----------------------+---------+-------------------------------------------------------------------------------------------+ +| timestamp | File | Insert timestamp at start of each log line. | ++----------------------+---------+-------------------------------------------------------------------------------------------+ +| rotate:**N** | File | | This limits the produced log files' size. Only most recent **N megabytes** of log data | +| | | | is saved. We rotate four log files with .0, .1, .2, .3 extensions. Note: this option | +| | | | disables 'append' and forces 'timestamp'. | ++----------------------+---------+-------------------------------------------------------------------------------------------+ +| maxsize:**N** | File | Limit the log to **N** MB. Only work in append mode. | ++----------------------+---------+-------------------------------------------------------------------------------------------+ +| prependheader | File | Prepend a simple header while distinguishing logging. Useful in append mode. | ++----------------------+---------+-------------------------------------------------------------------------------------------+ +| profilerstacks | Profiler| | When profiling with the Firefox Profiler and log modules are enabled, capture the call | +| | | | stack for each log statement. | ++----------------------+---------+-------------------------------------------------------------------------------------------+ + +This syntax is used for most methods of enabling logging, with the exception of +settings preferences directly, see :ref:`this section <Enabling logging using preferences>` for directions. + + +Enabling Logging +---------------- + +Enabling logging can be done in a variety of ways: + +- via environment variables +- via command line switches +- using ``about:config`` preferences +- using ``about:logging`` + +The first two allow logging from the start of the application and are also +useful in case of a crash (when ``sync`` output is requested, this can also be +done with ``about:config`` as well to a certain extent). The last two +allow enabling and disabling logging at runtime and don't require using the +command-line. + +By default all logging output is disabled. + +Enabling logging using ``about:logging`` +'''''''''''''''''''''''''''''''''''''''' + +``about:logging`` allows enabling logging by entering a ``MOZ_LOG`` string in the +text input, and validating. + +Options allow logging to a file or using the Firefox Profiler, that can be +started and stopped right from the page. + +Logging presets for common scenarios are available in a drop-down. They can be +associated with a profiler preset. + +It is possible, via URL parameters, to select a particular logging +configuration, or to override certain parameters in a preset. This is useful to +ask a user to gather logs efficiently without having to fiddle with prefs and/or +environment variable. + +URL parameters are described in the following table: + ++---------------------+---------------------------------------------------------------------------------------------+ +| Parameter | Description | ++=====================+=============================================================================================+ +| ``preset`` | a `logging preset <https://searchfox.org/mozilla-central/search?q=gLoggingPresets>`_ | ++---------------------+---------------------------------------------------------------------------------------------+ +| ``logging-preset`` | alias for ``preset`` | ++---------------------+---------------------------------------------------------------------------------------------+ +| ``modules`` | a string in ``MOZ_LOG`` syntax | ++---------------------+---------------------------------------------------------------------------------------------+ +| ``module`` | alias for ``modules`` | ++---------------------+---------------------------------------------------------------------------------------------+ +| ``threads`` | a list of threads to profile, overrides what a profiler preset would have picked | ++---------------------+---------------------------------------------------------------------------------------------+ +| ``thread`` | alias for ``threads`` | ++---------------------+---------------------------------------------------------------------------------------------+ +| ``output`` | either ``profiler`` or ``file`` | ++---------------------+---------------------------------------------------------------------------------------------+ +| ``output-type`` | alias for ``output`` | ++---------------------+---------------------------------------------------------------------------------------------+ +| ``profiler-preset`` | a `profiler preset <https://searchfox.org/mozilla-central/search?q=%40type+{Presets}>`_ | ++---------------------+---------------------------------------------------------------------------------------------+ + +If a preset is selected, then ``threads`` or ``modules`` can be used to override the +profiled threads or logging modules enabled, but keeping other aspects of the +preset. If no preset is selected, then a generic profiling preset is used, +``firefox-platform``. For example: + +:: + + about:logging?output=profiler&preset=media-playback&modules=cubeb:4,AudioSinkWrapper:4:AudioSink:4 + +will profile the threads in the ``Media`` profiler preset, but will only log +specific log modules (instead of the `long list +<https://searchfox.org/mozilla-central/search?q="media-playback"&path=toolkit%2Fcontent%2FaboutLogging.js>`_ +in the ``media-playback`` preset). In addition, it disallows logging to a file. + +Enabling logging using environment variables +'''''''''''''''''''''''''''''''''''''''''''' + +On UNIX, setting and environment variable can be done in a variety of ways + +:: + + set MOZ_LOG="example_logger:3" + export MOZ_LOG="example_logger:3" + MOZ_LOG="example_logger:3" ./mach run + +In the Windows Command Prompt (``cmd.exe``), don't use quotes: + +:: + + set MOZ_LOG=example_logger:3 + +If you want this on GeckoView example, use the following adb command to launch process: + +:: + + adb shell am start -n org.mozilla.geckoview_example/.GeckoViewActivity --es env0 "MOZ_LOG=example_logger:3" + +There are special module names to change logging behavior. You can specify one or more special module names without logging level. + +For example, if you want to specify ``sync``, ``timestamp`` and ``rotate``: + +:: + + set MOZ_LOG="example_logger:3,timestamp,sync,rotate:10" + +Enabling logging usually outputs the logging statements to the terminal. To +have the logs written to a file instead (one file per process), the environment +variable ``MOZ_LOG_FILE`` can be used. Logs will be written at this path +(either relative or absolute), suffixed by a process type and its PID. +``MOZ_LOG`` files are text files and have the extension ``.moz_log``. + +For example, setting: + +:: + + set MOZ_LOG_FILE="firefox-logs" + +can create a number of files like so: + +:: + + firefox-log-main.96353.moz_log + firefox-log-child.96354.moz_log + +respectively for a parent process of PID 96353 and a child process of PID +96354. + +Enabling logging using command-line flags +''''''''''''''''''''''''''''''''''''''''' + +The ``MOZ_LOG`` syntax can be used with the command line switch on the same +name, and specifying a file with ``MOZ_LOG_FILE`` works in the same way: + +:: + + ./mach run -MOZ_LOG=timestamp,rotate:200,example_module:5 -MOZ_LOG_FILE=%TEMP%\firefox-logs + +will enable verbose (``5``) logging for the module ``example_module``, with +timestamp prepended to each line, rotate the logs with 4 files of each 50MB +(for a total of 200MB), and write the output to the temporary directory on +Windows, with name starting with ``firefox-logs``. + +.. _Enabling logging using preferences: + +Enabling logging using preferences +'''''''''''''''''''''''''''''''''' + +To adjust the logging after Firefox has started, you can set prefs under the +`logging.` prefix. For example, setting `logging.foo` to `3` will set the log +module `foo` to start logging at level 3. A number of special prefs can be set, +described in the table below: + ++-------------------------------------+------------+-------------------------------+--------------------------------------------------------+ +| Preference name | Preference | Preference value | Description | ++=====================================+============+===============================+========================================================+ +| ``logging.config.clear_on_startup`` | bool | -- | Whether to clear all prefs under ``logging.`` | ++-------------------------------------+------------+-------------------------------+--------------------------------------------------------+ +| ``logging.config.LOG_FILE`` | string | A path (relative or absolute) | The path to which the log files will be written. | ++-------------------------------------+------------+-------------------------------+--------------------------------------------------------+ +| ``logging.config.add_timestamp`` | bool | -- | Whether to prefix all lines by a timestamp. | ++-------------------------------------+------------+-------------------------------+--------------------------------------------------------+ +| ``logging.config.sync`` | bool | -- | Whether to flush the stream after each log statements. | ++-------------------------------------+------------+-------------------------------+--------------------------------------------------------+ +| ``logging.config.profilerstacks`` | bool | -- | | When logging to the Firefox Profiler, whether to | +| | | | | include the call stack in each logging statement. | ++-------------------------------------+------------+-------------------------------+--------------------------------------------------------+ + +Enabling logging in Rust code +----------------------------- + +We're gradually adding more Rust code to Gecko, and Rust crates typically use a +different approach to logging. Many Rust libraries use the `log +<https://docs.rs/log>`_ crate to log messages, which works together with +`env_logger <https://docs.rs/env_logger>`_ at the application level to control +what's actually printed via `RUST_LOG`. + +You can set an overall logging level, though it could be quite verbose: + +:: + + set RUST_LOG="debug" + +You can also target individual modules by path: + +:: + + set RUST_LOG="style::style_resolver=debug" + +.. note:: + For Linux/MacOS users, you need to use `export` rather than `set`. + +.. note:: + Sometimes it can be useful to only log child processes and ignore the parent + process. In Firefox 57 and later, you can use `RUST_LOG_CHILD` instead of + `RUST_LOG` to specify log settings that will only apply to child processes. + +The `log` crate lists the available `log levels <https://docs.rs/log/0.3.8/log/enum.LogLevel.html>`_: + ++-----------+---------------------------------------------------------------------------------------------------------+ +| Log Level | Purpose | ++===========+=========================================================================================================+ +| error | Designates very serious errors. | ++-----------+---------------------------------------------------------------------------------------------------------+ +| warn | Designates hazardous situations. | ++-----------+---------------------------------------------------------------------------------------------------------+ +| info | Designates useful information. | ++-----------+---------------------------------------------------------------------------------------------------------+ +| debug | Designates lower priority information. | ++-----------+---------------------------------------------------------------------------------------------------------+ +| trace | Designates very low priority, often extremely verbose, information. | ++-----------+---------------------------------------------------------------------------------------------------------+ + +It is common for debug and trace to be disabled at compile time in release builds, so you may need a debug build if you want logs from those levels. + +Check the `env_logger <https://docs.rs/env_logger>`_ docs for more details on logging options. + +Additionally, a mapping from `RUST_LOG` is available. When using the `MOZ_LOG` +syntax, it is possible to enable logging in rust crate using a similar syntax: + +:: + + MOZ_LOG=rust_crate_name::*:4 + +will enable `debug` logging for all log statements in the crate +``rust_crate_name``. + +`*` can be replaced by a series of modules if more specificity is needed: + +:: + + MOZ_LOG=rust_crate_name::module::submodule:4 + +will enable `debug` logging for all log statements in the sub-module +``submodule`` of the module ``module`` of the crate ``rust_crate_name``. + + +A table mapping Rust log levels to `MOZ_LOG` log level is available below: + ++----------------+---------------+-----------------+ +| Rust log level | MOZ_LOG level | Numerical value | ++================+===============+=================+ +| off | Disabled | 0 | ++----------------+---------------+-----------------+ +| error | Error | 1 | ++----------------+---------------+-----------------+ +| warn | Warning | 2 | ++----------------+---------------+-----------------+ +| info | Info | 3 | ++----------------+---------------+-----------------+ +| debug | Debug | 4 | ++----------------+---------------+-----------------+ +| trace | Verbose | 5 | ++----------------+---------------+-----------------+ + + +Enabling logging on Android, interleaved with system logs (``logcat``) +---------------------------------------------------------------------- + +While logging to the Firefox Profiler works it's sometimes useful to have +system logs (``adb logcat``) interleaved with application logging. With a +device (or emulator) that ``adb devices`` sees, it's possible to set +environment variables like so, for e.g. ``GeckoView_example``: + + +.. code-block:: sh + + adb shell am start -n org.mozilla.geckoview_example/.GeckoViewActivity --es env0 MOZ_LOG=MediaDemuxer:4 + + +It is then possible to see the logging statements like so, to display all logs, +including ``MOZ_LOG``: + +.. code-block:: sh + + adb logcat + +and to only see ``MOZ_LOG`` like so: + +.. code-block:: sh + + adb logcat Gecko:V '*:S' + +This expression means: print log module ``Gecko`` from log level ``Verbose`` +(lowest level, this means that all levels are printed), and filter out (``S`` +for silence) all other logging (``*``, be careful to quote it or escape it +appropriately, it so that it's not expanded by the shell). + +While interactive with e.g. ``GeckoView`` code, it can be useful to specify +more logging tags like so: + +.. code-block:: sh + + adb logcat GeckoViewActivity:V Gecko:V '*:S' + + +Enabling logging on Android, using the Firefox Profiler +------------------------------------------------------- + +Set the logging modules using `about:config` (this requires a Nightly build) +using the instructions outlined above, and start the profile using an +appropriate profiling preset to profile the correct threads using the instructions +written in Firefox Profiler documentation's `dedicated page +<https://profiler.firefox.com/docs/#/./guide-profiling-android-directly-on-device>`_. + +`Bug 1803607 <https://bugzilla.mozilla.org/show_bug.cgi?id=1803607>`_ tracks +improving the logging experience on mobile. + +Working with ``MOZ_LOG`` in the code +++++++++++++++++++++++++++++++++++++ + +Declaring a Log Module +---------------------- + +``LazyLogModule`` defers the creation the backing ``LogModule`` in a thread-safe manner and is the preferred method to declare a log module. Multiple ``LazyLogModules`` with the same name can be declared, all will share the same backing ``LogModule``. This makes it much simpler to share a log module across multiple translation units. ``LazyLogLodule`` provides a conversion operator to ``LogModule*`` and is suitable for passing into the logging macros detailed below. + +Note: Log module names can only contain specific characters. The first character must be a lowercase or uppercase ASCII char, underscore, dash, or dot. Subsequent characters may be any of those, or an ASCII digit. + +.. code-block:: cpp + + #include "mozilla/Logging.h" + + static mozilla::LazyLogModule sFooLog("foo"); + + +Logging interface +----------------- + +A basic interface is provided in the form of 2 macros and an enum class. + ++----------------------------------------+----------------------------------------------------------------------------+ +| MOZ_LOG(module, level, message) | Outputs the given message if the module has the given log level enabled: | +| | | +| | * module: The log module to use. | +| | * level: The log level of the message. | +| | * message: A printf-style message to output. Must be enclosed in | +| | parentheses. | ++----------------------------------------+----------------------------------------------------------------------------+ +| MOZ_LOG_TEST(module, level) | Checks if the module has the given level enabled: | +| | | +| | * module: The log module to use. | +| | * level: The output level of the message. | ++----------------------------------------+----------------------------------------------------------------------------+ + + ++-----------+---------------+-----------------------------------------------------------------------------------------+ +| Log Level | Numeric Value | Purpose | ++===========+===============+=========================================================================================+ +| Disabled | 0 | Indicates logging is disabled. This should not be used directly in code. | ++-----------+---------------+-----------------------------------------------------------------------------------------+ +| Error | 1 | An error occurred, generally something you would consider asserting in a debug build. | ++-----------+---------------+-----------------------------------------------------------------------------------------+ +| Warning | 2 | A warning often indicates an unexpected state. | ++-----------+---------------+-----------------------------------------------------------------------------------------+ +| Info | 3 | An informational message, often indicates the current program state. | ++-----------+---------------+-----------------------------------------------------------------------------------------+ +| Debug | 4 | A debug message, useful for debugging but too verbose to be turned on normally. | ++-----------+---------------+-----------------------------------------------------------------------------------------+ +| Verbose | 5 | A message that will be printed a lot, useful for debugging program flow and will | +| | | probably impact performance. | ++-----------+---------------+-----------------------------------------------------------------------------------------+ + +Example Usage +------------- + +.. code-block:: cpp + + #include "mozilla/Logging.h" + + using mozilla::LogLevel; + + static mozilla::LazyLogModule sLogger("example_logger"); + + static void DoStuff() + { + MOZ_LOG(sLogger, LogLevel::Info, ("Doing stuff.")); + + int i = 0; + int start = Time::NowMS(); + MOZ_LOG(sLogger, LogLevel::Debug, ("Starting loop.")); + while (i++ < 10) { + MOZ_LOG(sLogger, LogLevel::Verbose, ("i = %d", i)); + } + + // Only calculate the elapsed time if the Warning level is enabled. + if (MOZ_LOG_TEST(sLogger, LogLevel::Warning)) { + int elapsed = Time::NowMS() - start; + if (elapsed > 1000) { + MOZ_LOG(sLogger, LogLevel::Warning, ("Loop took %dms!", elapsed)); + } + } + + if (i != 10) { + MOZ_LOG(sLogger, LogLevel::Error, ("i should be 10!")); + } + } diff --git a/xpcom/docs/refptr.rst b/xpcom/docs/refptr.rst new file mode 100644 index 0000000000..f71acc6828 --- /dev/null +++ b/xpcom/docs/refptr.rst @@ -0,0 +1,81 @@ +Reference Counting Helpers +========================== + +RefPtr versus nsCOMPtr +---------------------- + +The general rule of thumb is to use ``nsCOMPtr<T>`` when ``T`` is an +interface type which inherits from ``nsISupports``, and ``RefPtr<T>`` when +``T`` is a concrete type. + +This basic rule derives from some ``nsCOMPtr<T>`` code being factored into +the ``nsCOMPtr_base`` base class, which stores the pointer as a +``nsISupports*``. This design was intended to save some space in the binary +(though it is unclear if it still does). Since ``nsCOMPtr`` stores the +pointer as ``nsISupports*``, it must be possible to unambiguously cast from +``T*`` to ``nsISupports**``. Many concrete classes inherit from more than +one XPCOM interface, meaning that they cannot be used with ``nsCOMPtr``, +which leads to the suggestion to use ``RefPtr`` for these classes. + +``nsCOMPtr<T>`` also requires that the target type ``T`` be a valid target +for ``QueryInterface`` so that it can assert that the stored pointer is a +canonical ``T`` pointer (i.e. that ``mRawPtr->QueryInterface(T_IID) == +mRawPtr``). + +do_XXX() nsCOMPtr helpers +------------------------- + +There are a number of ``do_XXX`` helper methods across the codebase which can +be assigned into ``nsCOMPtr`` (and sometimes ``RefPtr``) to perform explicit +operations based on the target pointer type. + +In general, when these operations succeed, they will initialize the smart +pointer with a valid value, and otherwise they will silently initialize the +smart pointer to ``nullptr``. + +``do_QueryInterface`` and ``do_QueryObject`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Attempts to cast the provided object to the target class using the XPCOM +``QueryInterface`` mechanism. In general, use ``do_QueryInterface`` may only +be used to cast between interface types in a ``nsCOMPtr<T>``, and +``do_QueryObject`` in situations when downcasting to concrete types. + + +``do_GetInterface`` +~~~~~~~~~~~~~~~~~~~ + +Looks up an object implementing the requested interface using the +``nsIInterfaceRequestor`` interface. If the target object doesn't implement +``nsIInterfaceRequestor`` or doesn't provide the given interface, initializes +the smart pointer with ``nullptr``. + + +``do_GetService`` +~~~~~~~~~~~~~~~~~ + +Looks up the component defined by the passed-in CID or ContractID string in +the component manager, and returns a pointer to the service instance. This +may start the service if it hasn't been started already. The resulting +service will be cast to the target interface type using ``QueryInterface``. + + +``do_CreateInstance`` +~~~~~~~~~~~~~~~~~~~~~ + +Looks up the component defined by the passed-in CID or ContractID string in +the component manager, creates and returns a new instance. The resulting +object will be cast to the target interface type using ``QueryInterface``. + + +``do_QueryReferent`` and ``do_GetWeakReference`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When passed a ``nsIWeakReference*`` (e.g. from a ``nsWeakPtr``), +``do_QueryReferent`` attempts to re-acquire a strong reference to the held +type, and cast it to the target type with ``QueryInterface``. Initializes the +smart pointer with ``nullptr`` if either of these steps fail. + +In contrast ``do_GetWeakReference`` does the opposite, using +``QueryInterface`` to cast the type to ``nsISupportsWeakReference*``, and +acquire a ``nsIWeakReference*`` to the passed-in object. diff --git a/xpcom/docs/stringguide.rst b/xpcom/docs/stringguide.rst new file mode 100644 index 0000000000..a3266d9604 --- /dev/null +++ b/xpcom/docs/stringguide.rst @@ -0,0 +1,1094 @@ +String Guide +============ + +Most of the Mozilla code uses a C++ class hierarchy to pass string data, +rather than using raw pointers. This guide documents the string classes which +are visible to code within the Mozilla codebase (code which is linked into +``libxul``). + +Introduction +------------ + +The string classes are a library of C++ classes which are used to manage +buffers of wide (16-bit) and narrow (8-bit) character strings. The headers +and implementation are in the `xpcom/string +<https://searchfox.org/mozilla-central/source/xpcom/string>`_ directory. All +strings are stored as a single contiguous buffer of characters. + +The 8-bit and 16-bit string classes have completely separate base classes, +but share the same APIs. As a result, you cannot assign a 8-bit string to a +16-bit string without some kind of conversion helper class or routine. For +the purpose of this document, we will refer to the 16-bit string classes in +class documentation. Every 16-bit class has an equivalent 8-bit class: + +===================== ====================== +Wide Narrow +===================== ====================== +``nsAString`` ``nsACString`` +``nsString`` ``nsCString`` +``nsAutoString`` ``nsAutoCString`` +``nsDependentString`` ``nsDependentCString`` +===================== ====================== + +The string classes distinguish, as part of the type hierarchy, between +strings that must have a null-terminator at the end of their buffer +(``ns[C]String``) and strings that are not required to have a null-terminator +(``nsA[C]String``). nsA[C]String is the base of the string classes (since it +imposes fewer requirements) and ``ns[C]String`` is a class derived from it. +Functions taking strings as parameters should generally take one of these +four types. + +In order to avoid unnecessary copying of string data (which can have +significant performance cost), the string classes support different ownership +models. All string classes support the following three ownership models +dynamically: + +* reference counted, copy-on-write, buffers (the default) + +* adopted buffers (a buffer that the string class owns, but is not reference + counted, because it came from somewhere else) + +* dependent buffers, that is, an underlying buffer that the string class does + not own, but that the caller that constructed the string guarantees will + outlive the string instance + +Auto strings will prefer reference counting an existing reference-counted +buffer over their stack buffer, but will otherwise use their stack buffer for +anything that will fit in it. + +There are a number of additional string classes: + + +* Classes which exist primarily as constructors for the other types, + particularly ``nsDependent[C]String`` and ``nsDependent[C]Substring``. These + types are really just convenient notation for constructing an + ``nsA[C]String`` with a non-default ownership mode; they should not be + thought of as different types. + +* ``nsLiteral[C]String`` which should rarely be constructed explicitly but + usually through the ``""_ns`` and ``u""_ns`` user-defined string literals. + ``nsLiteral[C]String`` is trivially constructible and destructible, and + therefore does not emit construction/destruction code when stored in static, + as opposed to the other string classes. + +The Major String Classes +------------------------ + +The list below describes the main base classes. Once you are familiar with +them, see the appendix describing What Class to Use When. + + +* **nsAString**/**nsACString**: the abstract base class for all strings. It + provides an API for assignment, individual character access, basic + manipulation of characters in the string, and string comparison. This class + corresponds to the XPIDL ``AString`` or ``ACString`` parameter types. + ``nsA[C]String`` is not necessarily null-terminated. + +* **nsString**/**nsCString**: builds on ``nsA[C]String`` by guaranteeing a + null-terminated storage. This allows for a method (``.get()``) to access the + underlying character buffer. + +The remainder of the string classes inherit from either ``nsA[C]String`` or +``ns[C]String``. Thus, every string class is compatible with ``nsA[C]String``. + +.. note:: + + In code which is generic over string width, ``nsA[C]String`` is sometimes + known as ``nsTSubstring<CharT>``. ``nsAString`` is a type alias for + ``nsTSubstring<char16_t>``, and ``nsACString`` is a type alias for + ``nsTSubstring<char>``. + +.. note:: + + The type ``nsLiteral[C]String`` technically does not inherit from + ``nsA[C]String``, but instead inherits from ``nsStringRepr<CharT>``. This + allows the type to not generate destructors when stored in static + storage. + + It can be implicitly coerced to ``const ns[C]String&`` (though can never + be accessed mutably) and generally acts as-if it was a subclass of + ``ns[C]String`` in most cases. + +Since every string derives from ``nsAString`` (or ``nsACString``), they all +share a simple API. Common read-only methods include: + +* ``.Length()`` - the number of code units (bytes for 8-bit string classes and ``char16_t`` for 16-bit string classes) in the string. +* ``.IsEmpty()`` - the fastest way of determining if the string has any value. Use this instead of testing ``string.Length() == 0`` +* ``.Equals(string)`` - ``true`` if the given string has the same value as the current string. Approximately the same as ``operator==``. + +Common methods that modify the string: + +* ``.Assign(string)`` - Assigns a new value to the string. Approximately the same as ``operator=``. +* ``.Append(string)`` - Appends a value to the string. +* ``.Insert(string, position)`` - Inserts the given string before the code unit at position. +* ``.Truncate(length)`` - shortens the string to the given length. + +More complete documentation can be found in the `Class Reference`_. + +As function parameters +~~~~~~~~~~~~~~~~~~~~~~ + +In general, use ``nsA[C]String`` references to pass strings across modules. For example: + +.. code-block:: cpp + + // when passing a string to a method, use const nsAString& + nsFoo::PrintString(const nsAString& str); + + // when getting a string from a method, use nsAString& + nsFoo::GetString(nsAString& result); + +The Concrete Classes - which classes to use when +------------------------------------------------ + +The concrete classes are for use in code that actually needs to store string +data. The most common uses of the concrete classes are as local variables, +and members in classes or structs. + +.. digraph:: concreteclasses + + node [shape=rectangle] + + "nsA[C]String" -> "ns[C]String"; + "ns[C]String" -> "nsDependent[C]String"; + "nsA[C]String" -> "nsDependent[C]Substring"; + "nsA[C]String" -> "ns[C]SubstringTuple"; + "ns[C]String" -> "nsAuto[C]StringN"; + "ns[C]String" -> "nsLiteral[C]String" [style=dashed]; + "nsAuto[C]StringN" -> "nsPromiseFlat[C]String"; + "nsAuto[C]StringN" -> "nsPrintfCString"; + +The following is a list of the most common concrete classes. Once you are +familiar with them, see the appendix describing What Class to Use When. + +* ``ns[C]String`` - a null-terminated string whose buffer is allocated on the + heap. Destroys its buffer when the string object goes away. + +* ``nsAuto[C]String`` - derived from ``nsString``, a string which owns a 64 + code unit buffer in the same storage space as the string itself. If a string + less than 64 code units is assigned to an ``nsAutoString``, then no extra + storage will be allocated. For larger strings, a new buffer is allocated on + the heap. + + If you want a number other than 64, use the templated types ``nsAutoStringN`` + / ``nsAutoCStringN``. (``nsAutoString`` and ``nsAutoCString`` are just + typedefs for ``nsAutoStringN<64>`` and ``nsAutoCStringN<64>``, respectively.) + +* ``nsDependent[C]String`` - derived from ``nsString``, this string does not + own its buffer. It is useful for converting a raw string pointer (``const + char16_t*`` or ``const char*``) into a class of type ``nsAString``. Note that + you must null-terminate buffers used by to ``nsDependentString``. If you + don't want to or can't null-terminate the buffer, use + ``nsDependentSubstring``. + +* ``nsPrintfCString`` - derived from ``nsCString``, this string behaves like an + ``nsAutoCString``. The constructor takes parameters which allows it to + construct a 8-bit string from a printf-style format string and parameter + list. + +There are also a number of concrete classes that are created as a side-effect +of helper routines, etc. You should avoid direct use of these classes. Let +the string library create the class for you. + +* ``ns[C]SubstringTuple`` - created via string concatenation +* ``nsDependent[C]Substring`` - created through ``Substring()`` +* ``nsPromiseFlat[C]String`` - created through ``PromiseFlatString()`` +* ``nsLiteral[C]String`` - created through the ``""_ns`` and ``u""_ns`` user-defined literals + +Of course, there are times when it is necessary to reference these string +classes in your code, but as a general rule they should be avoided. + +Iterators +--------- + +Because Mozilla strings are always a single buffer, iteration over the +characters in the string is done using raw pointers: + +.. code-block:: cpp + + /** + * Find whether there is a tab character in `data` + */ + bool HasTab(const nsAString& data) { + const char16_t* cur = data.BeginReading(); + const char16_t* end = data.EndReading(); + + for (; cur < end; ++cur) { + if (char16_t('\t') == *cur) { + return true; + } + } + return false; + } + +Note that ``end`` points to the character after the end of the string buffer. +It should never be dereferenced. + +Writing to a mutable string is also simple: + +.. code-block:: cpp + + /** + * Replace every tab character in `data` with a space. + */ + void ReplaceTabs(nsAString& data) { + char16_t* cur = data.BeginWriting(); + char16_t* end = data.EndWriting(); + + for (; cur < end; ++cur) { + if (char16_t('\t') == *cur) { + *cur = char16_t(' '); + } + } + } + +You may change the length of a string via ``SetLength()``. Note that +Iterators become invalid after changing the length of a string. If a string +buffer becomes smaller while writing it, use ``SetLength`` to inform the +string class of the new size: + +.. code-block:: cpp + + /** + * Remove every tab character from `data` + */ + void RemoveTabs(nsAString& data) { + int len = data.Length(); + char16_t* cur = data.BeginWriting(); + char16_t* end = data.EndWriting(); + + while (cur < end) { + if (char16_t('\t') == *cur) { + len -= 1; + end -= 1; + if (cur < end) + memmove(cur, cur + 1, (end - cur) * sizeof(char16_t)); + } else { + cur += 1; + } + } + + data.SetLength(len); + } + +Note that using ``BeginWriting()`` to make a string longer is not OK. +``BeginWriting()`` must not be used to write past the logical length of the +string indicated by ``EndWriting()`` or ``Length()``. Calling +``SetCapacity()`` before ``BeginWriting()`` does not affect what the previous +sentence says. To make the string longer, call ``SetLength()`` before +``BeginWriting()`` or use the ``BulkWrite()`` API described below. + +Bulk Write +---------- + +``BulkWrite()`` allows capacity-aware cache-friendly low-level writes to the +string's buffer. + +Capacity-aware means that the caller is made aware of how the +caller-requested buffer capacity was rounded up to mozjemalloc buckets. This +is useful when initially requesting best-case buffer size without yet knowing +the true size need. If the data that actually needs to be written is larger +than the best-case estimate but still fits within the rounded-up capacity, +there is no need to reallocate despite requesting the best-case capacity. + +Cache-friendly means that the zero terminator for C compatibility is written +after the new content of the string has been written, so the result is a +forward-only linear write access pattern instead of a non-linear +back-and-forth sequence resulting from using ``SetLength()`` followed by +``BeginWriting()``. + +Low-level means that writing via a raw pointer is possible as with +``BeginWriting()``. + +``BulkWrite()`` takes three arguments: The new capacity (which may be rounded +up), the number of code units at the beginning of the string to preserve +(typically the old logical length), and a boolean indicating whether +reallocating a smaller buffer is OK if the requested capacity would fit in a +buffer that's smaller than current one. It returns a ``mozilla::Result`` which +contains either a usable ``mozilla::BulkWriteHandle<T>`` (where ``T`` is the +string's ``char_type``) or an ``nsresult`` explaining why none can be had +(presumably OOM). + +The actual writes are performed through the returned +``mozilla::BulkWriteHandle<T>``. You must not access the string except via this +handle until you call ``Finish()`` on the handle in the success case or you let +the handle go out of scope without calling ``Finish()`` in the failure case, in +which case the destructor of the handle puts the string in a mostly harmless but +consistent state (containing a single REPLACEMENT CHARACTER if a capacity +greater than 0 was requested, or in the ``char`` case if the three-byte UTF-8 +representation of the REPLACEMENT CHARACTER doesn't fit, an ASCII SUBSTITUTE). + +``mozilla::BulkWriteHandle<T>`` autoconverts to a writable +``mozilla::Span<T>`` and also provides explicit access to itself as ``Span`` +(``AsSpan()``) or via component accessors named consistently with those on +``Span``: ``Elements()`` and ``Length()``. (The latter is not the logical +length of the string but the writable length of the buffer.) The buffer +exposed via these methods includes the prefix that you may have requested to +be preserved. It's up to you to skip past it so as to not overwrite it. + +If there's a need to request a different capacity before you are ready to +call ``Finish()``, you can call ``RestartBulkWrite()`` on the handle. It +takes three arguments that match the first three arguments of +``BulkWrite()``. It returns ``mozilla::Result<mozilla::Ok, nsresult>`` to +indicate success or OOM. Calling ``RestartBulkWrite()`` invalidates +previously-obtained span, raw pointer or length. + +Once you are done writing, call ``Finish()``. It takes two arguments: the new +logical length of the string (which must not exceed the capacity returned by +the ``Length()`` method of the handle) and a boolean indicating whether it's +OK to attempt to reallocate a smaller buffer in case a smaller mozjemalloc +bucket could accommodate the new logical length. + +Helper Classes and Functions +---------------------------- + +Converting Cocoa strings +~~~~~~~~~~~~~~~~~~~~~~~~ + +Use ``mozilla::CopyCocoaStringToXPCOMString()`` in +``mozilla/MacStringHelpers.h`` to convert Cocoa strings to XPCOM strings. + +Searching strings - looking for substrings, characters, etc. +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ``nsReadableUtils.h`` header provides helper methods for searching in runnables. + +.. code-block:: cpp + + bool FindInReadable(const nsAString& pattern, + nsAString::const_iterator start, nsAString::const_iterator end, + nsStringComparator& aComparator = nsDefaultStringComparator()); + +To use this, ``start`` and ``end`` should point to the beginning and end of a +string that you would like to search. If the search string is found, +``start`` and ``end`` will be adjusted to point to the beginning and end of +the found pattern. The return value is ``true`` or ``false``, indicating +whether or not the string was found. + +An example: + +.. code-block:: cpp + + const nsAString& str = GetSomeString(); + nsAString::const_iterator start, end; + + str.BeginReading(start); + str.EndReading(end); + + constexpr auto valuePrefix = u"value="_ns; + + if (FindInReadable(valuePrefix, start, end)) { + // end now points to the character after the pattern + valueStart = end; + } + +Checking for Memory Allocation failure +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Like other types in Gecko, the string classes use infallible memory +allocation by default, so you do not need to check for success when +allocating/resizing "normal" strings. + +Most functions that modify strings (``Assign()``, ``SetLength()``, etc.) also +have an overload that takes a ``mozilla::fallible_t`` parameter. These +overloads return ``false`` instead of aborting if allocation fails. Use them +when creating/allocating strings which may be very large, and which the +program could recover from if the allocation fails. + +Substrings (string fragments) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It is very simple to refer to a substring of an existing string without +actually allocating new space and copying the characters into that substring. +``Substring()`` is the preferred method to create a reference to such a +string. + +.. code-block:: cpp + + void ProcessString(const nsAString& str) { + const nsAString& firstFive = Substring(str, 0, 5); // from index 0, length 5 + // firstFive is now a string representing the first 5 characters + } + +Unicode Conversion +------------------ + +Strings can be stored in two basic formats: 8-bit code unit (byte/``char``) +strings, or 16-bit code unit (``char16_t``) strings. Any string class with a +capital "C" in the classname contains 8-bit bytes. These classes include +``nsCString``, ``nsDependentCString``, and so forth. Any string class without +the "C" contains 16-bit code units. + +A 8-bit string can be in one of many character encodings while a 16-bit +string is always in potentially-invalid UTF-16. (You can make a 16-bit string +guaranteed-valid UTF-16 by passing it to ``EnsureUTF16Validity()``.) The most +common encodings are: + + +* ASCII - 7-bit encoding for basic English-only strings. Each ASCII value + is stored in exactly one byte in the array with the most-significant 8th bit + set to zero. + +* `UCS2 <http://www.unicode.org/glossary/#UCS_2>`_ - 16-bit encoding for a + subset of Unicode, `BMP <http://www.unicode.org/glossary/#BMP>`_. The Unicode + value of a character stored in UCS2 is stored in exactly one 16-bit + ``char16_t`` in a string class. + +* `UTF-8 <http://www.faqs.org/rfcs/rfc3629.html>`_ - 8-bit encoding for + Unicode characters. Each Unicode characters is stored in up to 4 bytes in a + string class. UTF-8 is capable of representing the entire Unicode character + repertoire, and it efficiently maps to `UTF-32 + <http://www.unicode.org/glossary/#UTF_32>`_. (Gtk and Rust natively use + UTF-8.) + +* `UTF-16 <http://www.unicode.org/glossary/#UTF_16>`_ - 16-bit encoding for + Unicode storage, backwards compatible with UCS2. The Unicode value of a + character stored in UTF-16 may require one or two 16-bit ``char16_t`` in a + string class. The contents of ``nsAString`` always has to be regarded as in + this encoding instead of UCS2. UTF-16 is capable of representing the entire + Unicode character repertoire, and it efficiently maps to UTF-32. (Win32 W + APIs and Mac OS X natively use UTF-16.) + +* Latin1 - 8-bit encoding for the first 256 Unicode code points. Used for + HTTP headers and for size-optimized storage in text node and SpiderMonkey + strings. Latin1 converts to UTF-16 by zero-extending each byte to a 16-bit + code unit. Note that this kind of "Latin1" is not available for encoding + HTML, CSS, JS, etc. Specifying ``charset=latin1`` means the same as + ``charset=windows-1252``. Windows-1252 is a similar but different encoding + used for interchange. + +In addition, there exist multiple other (legacy) encodings. The Web-relevant +ones are defined in the `Encoding Standard <https://encoding.spec.whatwg.org/>`_. +Conversions from these encodings to +UTF-8 and UTF-16 are provided by `mozilla::Encoding +<https://searchfox.org/mozilla-central/source/intl/Encoding.h#109>`_. +Additionally, on Windows the are some rare cases (e.g. drag&drop) where it's +necessary to call a system API with data encoded in the Windows +locale-dependent legacy encoding instead of UTF-16. In those rare cases, use +``MultiByteToWideChar``/``WideCharToMultiByte`` from kernel32.dll. Do not use +``iconv`` on *nix. We only support UTF-8-encoded file paths on *nix, non-path +Gtk strings are always UTF-8 and Cocoa and Java strings are always UTF-16. + +When working with existing code, it is important to examine the current usage +of the strings that you are manipulating, to determine the correct conversion +mechanism. + +When writing new code, it can be confusing to know which storage class and +encoding is the most appropriate. There is no single answer to this question, +but the important points are: + + +* **Surprisingly many strings are very often just ASCII.** ASCII is a subset of + UTF-8 and is, therefore, efficient to represent as UTF-8. Representing ASCII + as UTF-16 bad both for memory usage and cache locality. + +* **Rust strongly prefers UTF-8.** If your C++ code is interacting with Rust + code, using UTF-8 in ``nsACString`` and merely validating it when converting + to Rust strings is more efficient than using ``nsAString`` on the C++ side. + +* **Networking code prefers 8-bit strings.** Networking code tends to use 8-bit + strings: either with UTF-8 or Latin1 (byte value is the Unicode scalar value) + semantics. + +* **JS and DOM prefer UTF-16.** Most Gecko code uses UTF-16 for compatibility + with JS strings and DOM string which are potentially-invalid UTF-16. However, + both DOM text nodes and JS strings store strings that only contain code points + below U+0100 as Latin1 (byte value is the Unicode scalar value). + +* **Windows and Cocoa use UTF-16.** Windows system APIs take UTF-16. Cocoa + ``NSString`` is UTF-16. + +* **Gtk uses UTF-8.** Gtk APIs take UTF-8 for non-file paths. In the Gecko + case, we support only UTF-8 file paths outside Windows, so all Gtk strings + are UTF-8 for our purposes though file paths received from Gtk may not be + valid UTF-8. + +To assist with ASCII, Latin1, UTF-8, and UTF-16 conversions, there are some +helper methods and classes. Some of these classes look like functions, +because they are most often used as temporary objects on the stack. + +Short zero-terminated ASCII strings +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If you have a short zero-terminated string that you are certain is always +ASCII, use these special-case methods instead of the conversions described in +the later sections. + +* If you are assigning an ASCII literal to an ``nsACString``, use + ``AssignLiteral()``. +* If you are assigning a literal to an ``nsAString``, use ``AssignLiteral()`` + and make the literal a ``u""`` literal. If the literal has to be a ``""`` + literal (as opposed to ``u""``) and is ASCII, still use ``AppendLiteral()``, + but be aware that this involves a run-time inflation. +* If you are assigning a zero-terminated ASCII string that's not a literal from + the compiler's point of view at the call site and you don't know the length + of the string either (e.g. because it was looked up from an array of literals + of varying lengths), use ``AssignASCII()``. + +UTF-8 / UTF-16 conversion +~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. cpp:function:: NS_ConvertUTF8toUTF16(const nsACString&) + + a ``nsAutoString`` subclass that converts a UTF-8 encoded ``nsACString`` + or ``const char*`` to a 16-bit UTF-16 string. If you need a ``const + char16_t*`` buffer, you can use the ``.get()`` method. For example: + + .. code-block:: cpp + + /* signature: void HandleUnicodeString(const nsAString& str); */ + object->HandleUnicodeString(NS_ConvertUTF8toUTF16(utf8String)); + + /* signature: void HandleUnicodeBuffer(const char16_t* str); */ + object->HandleUnicodeBuffer(NS_ConvertUTF8toUTF16(utf8String).get()); + +.. cpp:function:: NS_ConvertUTF16toUTF8(const nsAString&) + + a ``nsAutoCString`` which converts a 16-bit UTF-16 string (``nsAString``) + to a UTF-8 encoded string. As above, you can use ``.get()`` to access a + ``const char*`` buffer. + + .. code-block:: cpp + + /* signature: void HandleUTF8String(const nsACString& str); */ + object->HandleUTF8String(NS_ConvertUTF16toUTF8(utf16String)); + + /* signature: void HandleUTF8Buffer(const char* str); */ + object->HandleUTF8Buffer(NS_ConvertUTF16toUTF8(utf16String).get()); + +.. cpp:function:: CopyUTF8toUTF16(const nsACString&, nsAString&) + + converts and copies: + + .. code-block:: cpp + + // return a UTF-16 value + void Foo::GetUnicodeValue(nsAString& result) { + CopyUTF8toUTF16(mLocalUTF8Value, result); + } + +.. cpp:function:: AppendUTF8toUTF16(const nsACString&, nsAString&) + + converts and appends: + + .. code-block:: cpp + + // return a UTF-16 value + void Foo::GetUnicodeValue(nsAString& result) { + result.AssignLiteral("prefix:"); + AppendUTF8toUTF16(mLocalUTF8Value, result); + } + +.. cpp:function:: CopyUTF16toUTF8(const nsAString&, nsACString&) + + converts and copies: + + .. code-block:: cpp + + // return a UTF-8 value + void Foo::GetUTF8Value(nsACString& result) { + CopyUTF16toUTF8(mLocalUTF16Value, result); + } + +.. cpp:function:: AppendUTF16toUTF8(const nsAString&, nsACString&) + + converts and appends: + + .. code-block:: cpp + + // return a UTF-8 value + void Foo::GetUnicodeValue(nsACString& result) { + result.AssignLiteral("prefix:"); + AppendUTF16toUTF8(mLocalUTF16Value, result); + } + + +Latin1 / UTF-16 Conversion +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The following should only be used when you can guarantee that the original +string is ASCII or Latin1 (in the sense that the byte value is the Unicode +scalar value; not in the windows-1252 sense). These helpers are very similar +to the UTF-8 / UTF-16 conversion helpers above. + + +UTF-16 to Latin1 converters +``````````````````````````` + +These converters are **very dangerous** because they **lose information** +during the conversion process. You should **avoid UTF-16 to Latin1 +conversions** unless your strings are guaranteed to be Latin1 or ASCII. (In +the future, these conversions may start asserting in debug builds that their +input is in the permissible range.) If the input is actually in the Latin1 +range, each 16-bit code unit in narrowed to an 8-bit byte by removing the +high half. Unicode code points above U+00FF result in garbage whose nature +must not be relied upon. (In the future the nature of the garbage will be CPU +architecture-dependent.) If you want to ``printf()`` something and don't care +what happens to non-ASCII, please convert to UTF-8 instead. + + +.. cpp:function:: NS_LossyConvertUTF16toASCII(const nsAString&) + + A ``nsAutoCString`` which holds a temporary buffer containing the Latin1 + value of the string. + +.. cpp:function:: void LossyCopyUTF16toASCII(Span<const char16_t>, nsACString&) + + Does an in-place conversion from UTF-16 into an Latin1 string object. + +.. cpp:function:: void LossyAppendUTF16toASCII(Span<const char16_t>, nsACString&) + + Appends a UTF-16 string to a Latin1 string. + +Latin1 to UTF-16 converters +``````````````````````````` + +These converters are very dangerous because they will **produce wrong results +for non-ASCII UTF-8 or windows-1252 input** into a meaningless UTF-16 string. +You should **avoid ASCII to UTF-16 conversions** unless your strings are +guaranteed to be ASCII or Latin1 in the sense of the byte value being the +Unicode scalar value. Every byte is zero-extended into a 16-bit code unit. + +It is correct to use these on most HTTP header values, but **it's always +wrong to use these on HTTP response bodies!** (Use ``mozilla::Encoding`` to +deal with response bodies.) + +.. cpp:function:: NS_ConvertASCIItoUTF16(const nsACString&) + + A ``nsAutoString`` which holds a temporary buffer containing the value of + the Latin1 to UTF-16 conversion. + +.. cpp:function:: void CopyASCIItoUTF16(Span<const char>, nsAString&) + + does an in-place conversion from Latin1 to UTF-16. + +.. cpp:function:: void AppendASCIItoUTF16(Span<const char>, nsAString&) + + appends a Latin1 string to a UTF-16 string. + +Comparing ns*Strings with C strings +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You can compare ``ns*Strings`` with C strings by converting the ``ns*String`` +to a C string, or by comparing directly against a C String. + +.. cpp:function:: bool nsAString::EqualsASCII(const char*) + + Compares with an ASCII C string. + +.. cpp:function:: bool nsAString::EqualsLiteral(...) + + Compares with a string literal. + +Common Patterns +--------------- + +Literal Strings +~~~~~~~~~~~~~~~ + +A literal string is a raw string value that is written in some C++ code. For +example, in the statement ``printf("Hello World\n");`` the value ``"Hello +World\n"`` is a literal string. It is often necessary to insert literal +string values when an ``nsAString`` or ``nsACString`` is required. Two +user-defined literals are provided that implicitly convert to ``const +nsString&`` resp. ``const nsCString&``: + +* ``""_ns`` for 8-bit literals, converting implicitly to ``const nsCString&`` +* ``u""_ns`` for 16-bit literals, converting implicitly to ``const nsString&`` + +The benefits of the user-defined literals may seem unclear, given that +``nsDependentCString`` will also wrap a string value in an ``nsCString``. The +advantage of the user-defined literals is twofold. + +* The length of these strings is calculated at compile time, so the string does + not need to be scanned at runtime to determine its length. + +* Literal strings live for the lifetime of the binary, and can be moved between + the ``ns[C]String`` classes without being copied or freed. + +Here are some examples of proper usage of the literals (both standard and +user-defined): + +.. code-block:: cpp + + // call Init(const nsLiteralString&) - enforces that it's only called with literals + Init(u"start value"_ns); + + // call Init(const nsAString&) + Init(u"start value"_ns); + + // call Init(const nsACString&) + Init("start value"_ns); + +In case a literal is defined via a macro, you can just convert it to +``nsLiteralString`` or ``nsLiteralCString`` using their constructor. You +could consider not using a macro at all but a named ``constexpr`` constant +instead. + +In some cases, an 8-bit literal is defined via a macro, either within code or +from the environment, but it can't be changed or is used both as an 8-bit and +a 16-bit string. In these cases, you can use the +``NS_LITERAL_STRING_FROM_CSTRING`` macro to construct a ``nsLiteralString`` +and do the conversion at compile-time. + +String Concatenation +~~~~~~~~~~~~~~~~~~~~ + +Strings can be concatenated together using the + operator. The resulting +string is a ``const nsSubstringTuple`` object. The resulting object can be +treated and referenced similarly to a ``nsAString`` object. Concatenation *does +not copy the substrings*. The strings are only copied when the concatenation +is assigned into another string object. The ``nsSubstringTuple`` object holds +pointers to the original strings. Therefore, the ``nsSubstringTuple`` object is +dependent on all of its substrings, meaning that their lifetime must be at +least as long as the ``nsSubstringTuple`` object. + +For example, you can use the value of two strings and pass their +concatenation on to another function which takes an ``const nsAString&``: + +.. code-block:: cpp + + void HandleTwoStrings(const nsAString& one, const nsAString& two) { + // call HandleString(const nsAString&) + HandleString(one + two); + } + +NOTE: The two strings are implicitly combined into a temporary ``nsString`` +in this case, and the temporary string is passed into ``HandleString``. If +``HandleString`` assigns its input into another ``nsString``, then the string +buffer will be shared in this case negating the cost of the intermediate +temporary. You can concatenate N strings and store the result in a temporary +variable: + +.. code-block:: cpp + + constexpr auto start = u"start "_ns; + constexpr auto middle = u"middle "_ns; + constexpr auto end = u"end"_ns; + // create a string with 3 dependent fragments - no copying involved! + nsString combinedString = start + middle + end; + + // call void HandleString(const nsAString&); + HandleString(combinedString); + +It is safe to concatenate user-defined literals because the temporary +``nsLiteral[C]String`` objects will live as long as the temporary +concatenation object (of type ``nsSubstringTuple``). + +.. code-block:: cpp + + // call HandlePage(const nsAString&); + // safe because the concatenated-string will live as long as its substrings + HandlePage(u"start "_ns + u"end"_ns); + +Local Variables +~~~~~~~~~~~~~~~ + +Local variables within a function are usually stored on the stack. The +``nsAutoString``/``nsAutoCString`` classes are subclasses of the +``nsString``/``nsCString`` classes. They own a 64-character buffer allocated +in the same storage space as the string itself. If the ``nsAutoString`` is +allocated on the stack, then it has at its disposal a 64-character stack +buffer. This allows the implementation to avoid allocating extra memory when +dealing with small strings. ``nsAutoStringN``/``nsAutoCStringN`` are more +general alternatives that let you choose the number of characters in the +inline buffer. + +.. code-block:: cpp + + ... + nsAutoString value; + GetValue(value); // if the result is less than 64 code units, + // then this just saved us an allocation + ... + +Member Variables +~~~~~~~~~~~~~~~~ + +In general, you should use the concrete classes ``nsString`` and +``nsCString`` for member variables. + +.. code-block:: cpp + + class Foo { + ... + // these store UTF-8 and UTF-16 values respectively + nsCString mLocalName; + nsString mTitle; + }; + +A common incorrect pattern is to use ``nsAutoString``/``nsAutoCString`` +for member variables. As described in `Local Variables`_, these classes have +a built in buffer that make them very large. This means that if you include +them in a class, they bloat the class by 64 bytes (``nsAutoCString``) or 128 +bytes (``nsAutoString``). + + +Raw Character Pointers +~~~~~~~~~~~~~~~~~~~~~~ + +``PromiseFlatString()`` and ``PromiseFlatCString()`` can be used to create a +temporary buffer which holds a null-terminated buffer containing the same +value as the source string. ``PromiseFlatString()`` will create a temporary +buffer if necessary. This is most often used in order to pass an +``nsAString`` to an API which requires a null-terminated string. + +In the following example, an ``nsAString`` is combined with a literal string, +and the result is passed to an API which requires a simple character buffer. + +.. code-block:: cpp + + // Modify the URL and pass to AddPage(const char16_t* url) + void AddModifiedPage(const nsAString& url) { + constexpr auto httpPrefix = u"http://"_ns; + const nsAString& modifiedURL = httpPrefix + url; + + // creates a temporary buffer + AddPage(PromiseFlatString(modifiedURL).get()); + } + +``PromiseFlatString()`` is smart when handed a string that is already +null-terminated. It avoids creating the temporary buffer in such cases. + +.. code-block:: cpp + + // Modify the URL and pass to AddPage(const char16_t* url) + void AddModifiedPage(const nsAString& url, PRBool addPrefix) { + if (addPrefix) { + // MUST create a temporary buffer - string is multi-fragmented + constexpr auto httpPrefix = u"http://"_ns; + AddPage(PromiseFlatString(httpPrefix + modifiedURL)); + } else { + // MIGHT create a temporary buffer, does a runtime check + AddPage(PromiseFlatString(url).get()); + } + } + +.. note:: + + It is **not** possible to efficiently transfer ownership of a string + class' internal buffer into an owned ``char*`` which can be safely + freed by other components due to the COW optimization. + + If working with a legacy API which requires malloced ``char*`` buffers, + prefer using ``ToNewUnicode``, ``ToNewCString`` or ``ToNewUTF8String`` + over ``strdup`` to create owned ``char*`` pointers. + +``printf`` and a UTF-16 string +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For debugging, it's useful to ``printf`` a UTF-16 string (``nsString``, +``nsAutoString``, etc). To do this usually requires converting it to an 8-bit +string, because that's what ``printf`` expects. Use: + +.. code-block:: cpp + + printf("%s\n", NS_ConvertUTF16toUTF8(yourString).get()); + +Sequence of appends without reallocating +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +``SetCapacity()`` allows you to give the string a hint of the future string +length caused by a sequence of appends (excluding appends that convert +between UTF-16 and UTF-8 in either direction) in order to avoid multiple +allocations during the sequence of appends. However, the other +allocation-avoidance features of XPCOM strings interact badly with +``SetCapacity()`` making it something of a footgun. + +``SetCapacity()`` is appropriate to use before a sequence of multiple +operations from the following list (without operations that are not on the +list between the ``SetCapacity()`` call and operations from the list): + +* ``Append()`` +* ``AppendASCII()`` +* ``AppendLiteral()`` +* ``AppendPrintf()`` +* ``AppendInt()`` +* ``AppendFloat()`` +* ``LossyAppendUTF16toASCII()`` +* ``AppendASCIItoUTF16()`` + +**DO NOT** call ``SetCapacity()`` if the subsequent operations on the string +do not meet the criteria above. Operations that undo the benefits of +``SetCapacity()`` include but are not limited to: + +* ``SetLength()`` +* ``Truncate()`` +* ``Assign()`` +* ``AssignLiteral()`` +* ``Adopt()`` +* ``CopyASCIItoUTF16()`` +* ``LossyCopyUTF16toASCII()`` +* ``AppendUTF16toUTF8()`` +* ``AppendUTF8toUTF16()`` +* ``CopyUTF16toUTF8()`` +* ``CopyUTF8toUTF16()`` + +If your string is an ``nsAuto[C]String`` and you are calling +``SetCapacity()`` with a constant ``N``, please instead declare the string as +``nsAuto[C]StringN<N+1>`` without calling ``SetCapacity()`` (while being +mindful of not using such a large ``N`` as to overflow the run-time stack). + +There is no need to include room for the null terminator: it is the job of +the string class. + +Note: Calling ``SetCapacity()`` does not give you permission to use the +pointer obtained from ``BeginWriting()`` to write past the current length (as +returned by ``Length()``) of the string. Please use either ``BulkWrite()`` or +``SetLength()`` instead. + +.. _stringguide.xpidl: + +XPIDL +----- + +The string library is also available through IDL. By declaring attributes and +methods using the specially defined IDL types, string classes are used as +parameters to the corresponding methods. + +XPIDL String types +~~~~~~~~~~~~~~~~~~ + +The C++ signatures follow the abstract-type convention described above, such +that all method parameters are based on the abstract classes. The following +table describes the purpose of each string type in IDL. + ++-----------------+----------------+----------------------------------------------------------------------------------+ +| XPIDL Type | C++ Type | Purpose | ++=================+================+==================================================================================+ +| ``string`` | ``char*`` | Raw character pointer to ASCII (7-bit) string, no string classes used. | +| | | | +| | | High bit is not guaranteed across XPConnect boundaries. | ++-----------------+----------------+----------------------------------------------------------------------------------+ +| ``wstring`` | ``char16_t*`` | Raw character pointer to UTF-16 string, no string classes used. | ++-----------------+----------------+----------------------------------------------------------------------------------+ +| ``AString`` | ``nsAString`` | UTF-16 string. | ++-----------------+----------------+----------------------------------------------------------------------------------+ +| ``ACString`` | ``nsACString`` | 8-bit string. All bits are preserved across XPConnect boundaries. | ++-----------------+----------------+----------------------------------------------------------------------------------+ +| ``AUTF8String`` | ``nsACString`` | UTF-8 string. | +| | | | +| | | Converted to UTF-16 as necessary when value is used across XPConnect boundaries. | ++-----------------+----------------+----------------------------------------------------------------------------------+ + +Callers should prefer using the string classes ``AString``, ``ACString`` and +``AUTF8String`` over the raw pointer types ``string`` and ``wstring`` in +almost all situations. + +C++ Signatures +~~~~~~~~~~~~~~ + +In XPIDL, ``in`` parameters are read-only, and the C++ signatures for +``*String`` parameters follows the above guidelines by using ``const +nsAString&`` for these parameters. ``out`` and ``inout`` parameters are +defined simply as ``nsAString&`` so that the callee can write to them. + +.. code-block:: cpp + + interface nsIFoo : nsISupports { + attribute AString utf16String; + AUTF8String getValue(in ACString key); + }; + +.. code-block:: cpp + + class nsIFoo : public nsISupports { + NS_IMETHOD GetUtf16String(nsAString& aResult) = 0; + NS_IMETHOD SetUtf16String(const nsAString& aValue) = 0; + NS_IMETHOD GetValue(const nsACString& aKey, nsACString& aResult) = 0; + }; + +In the above example, ``utf16String`` is treated as a UTF-16 string. The +implementation of ``GetUtf16String()`` will use ``aResult.Assign`` to +"return" the value. In ``SetUtf16String()`` the value of the string can be +used through a variety of methods including `Iterators`_, +``PromiseFlatString``, and assignment to other strings. + +In ``GetValue()``, the first parameter, ``aKey``, is treated as a raw +sequence of 8-bit values. Any non-ASCII characters in ``aKey`` will be +preserved when crossing XPConnect boundaries. The implementation of +``GetValue()`` will assign a UTF-8 encoded 8-bit string into ``aResult``. If +the this method is called across XPConnect boundaries, such as from a script, +then the result will be decoded from UTF-8 into UTF-16 and used as a Unicode +value. + +String Guidelines +----------------- + +Follow these simple rules in your code to keep your fellow developers, +reviewers, and users happy. + +* Use the most abstract string class that you can. Usually this is: + * ``nsAString`` for function parameters + * ``nsString`` for member variables + * ``nsAutoString`` for local (stack-based) variables +* Use the ``""_ns`` and ``u""_ns`` user-defined literals to represent literal strings (e.g. ``"foo"_ns``) as nsAString-compatible objects. +* Use string concatenation (i.e. the "+" operator) when combining strings. +* Use ``nsDependentString`` when you have a raw character pointer that you need to convert to an nsAString-compatible string. +* Use ``Substring()`` to extract fragments of existing strings. +* Use `iterators`_ to parse and extract string fragments. + +Class Reference +--------------- + +.. cpp:class:: template<T> nsTSubstring<T> + + .. note:: + + The ``nsTSubstring<char_type>`` class is usually written as + ``nsAString`` or ``nsACString``. + + .. cpp:function:: size_type Length() const + + .. cpp:function:: bool IsEmpty() const + + .. cpp:function:: bool IsVoid() const + + .. cpp:function:: const char_type* BeginReading() const + + .. cpp:function:: const char_type* EndReading() const + + .. cpp:function:: bool Equals(const self_type&, comparator_type = ...) const + + .. cpp:function:: char_type First() const + + .. cpp:function:: char_type Last() const + + .. cpp:function:: size_type CountChar(char_type) const + + .. cpp:function:: int32_t FindChar(char_type, index_type aOffset = 0) const + + .. cpp:function:: void Assign(const self_type&) + + .. cpp:function:: void Append(const self_type&) + + .. cpp:function:: void Insert(const self_type&, index_type aPos) + + .. cpp:function:: void Cut(index_type aCutStart, size_type aCutLength) + + .. cpp:function:: void Replace(index_type aCutStart, size_type aCutLength, const self_type& aStr) + + .. cpp:function:: void Truncate(size_type aLength) + + .. cpp:function:: void SetIsVoid(bool) + + Make it null. XPConnect and WebIDL will convert void nsAStrings to + JavaScript ``null``. + + .. cpp:function:: char_type* BeginWriting() + + .. cpp:function:: char_type* EndWriting() + + .. cpp:function:: void SetCapacity(size_type) + + Inform the string about buffer size need before a sequence of calls + to ``Append()`` or converting appends that convert between UTF-16 and + Latin1 in either direction. (Don't use if you use appends that + convert between UTF-16 and UTF-8 in either direction.) Calling this + method does not give you permission to use ``BeginWriting()`` to + write past the logical length of the string. Use ``SetLength()`` or + ``BulkWrite()`` as appropriate. + + .. cpp:function:: void SetLength(size_type) + + .. cpp:function:: Result<BulkWriteHandle<char_type>, nsresult> BulkWrite(size_type aCapacity, size_type aPrefixToPreserve, bool aAllowShrinking) diff --git a/xpcom/docs/thread-safety.rst b/xpcom/docs/thread-safety.rst new file mode 100644 index 0000000000..be2f156804 --- /dev/null +++ b/xpcom/docs/thread-safety.rst @@ -0,0 +1,354 @@ +**Thread safety analysis in Gecko** +=================================== + +Clang thread-safety analysis is supported in Gecko. This means +builds will generate warnings when static analysis detects an issue with +locking of mutex/monitor-protected members and data structures. Note +that Chrome uses the same feature. An example warning: :: + + warning: dom/media/AudioStream.cpp:504:22 [-Wthread-safety-analysis] + reading variable 'mDataSource' requires holding mutex 'mMonitor' + +If your patch causes warnings like this, you’ll need to resolve them; +they will be errors on checkin. + +This analysis depends on thread-safety attributions in the source. These +have been added to Mozilla’s Mutex and Monitor classes and subclasses, +but in practice the analysis is largely dependent on additions to the +code being checked, in particular adding MOZ_GUARDED_BY(mutex) attributions +on the definitions of member variables. Like this: :: + + mozilla::Mutex mLock; + bool mShutdown MOZ_GUARDED_BY(mLock); + +For background on Clang’s thread-safety support, see `their +documentation <https://clang.llvm.org/docs/ThreadSafetyAnalysis.html>`__. + +Newly added Mutexes and Monitors **MUST** use thread-safety annotations, +and we are enabling static checks to verify this. Legacy uses of Mutexes +and Monitors are marked with MOZ_UNANNOTATED. + +If you’re modifying code that has been annotated with +MOZ_GUARDED_BY()/MOZ_REQUIRES()/etc, you should **make sure that the annotations +are updated properly**; e.g. if you change the thread-usage of a member +variable or method you should mark it accordingly, comment, and resolve +any warnings. Since the warnings will be errors in autoland/m-c, you +won’t be able to land code with active warnings. + +**Annotating locking and usage requirements in class definitions** +------------------------------------------------------------------ + +Values that require a lock to access, or which are simply used from more +than one thread, should always have documentation in the definition +about the locking requirements and/or what threads it’s touched from: :: + + // This array is accessed from both the direct video thread, and the graph + // thread. Protected by mMutex. + + nsTArray<std::pair<ImageContainer::FrameID, VideoChunk>> mFrames + MOZ_GUARDED_BY(mMutex); + + // Set on MainThread, deleted on either MainThread mainthread, used on + // MainThread or IO Thread in DoStopSession + nsCOMPtr<nsITimer> mReconnectDelayTimer MOZ_GUARDED_BY(mMutex); + +It’s strongly recommended to group values by access pattern, but it’s +**critical** to make it clear what the requirements to access a value +are. With values protected by Mutexes and Monitors, adding a +MOZ_GUARDED_BY(mutex/monitor) should be sufficient, though you may want to +also document what threads access it, and if they read or write to it. + +Values which have more complex access requirements (see single-writer +and time-based-locking below) need clear documentation where they’re +defined: :: + + MutexSingleWriter mMutex; + + // mResource should only be modified on the main thread with the lock. + // So it can be read without lock on the main thread or on other threads + // with the lock. + RefPtr<ChannelMediaResource> mResource MOZ_GUARDED_BY(mMutex); + +**WARNING:** thread-safety analysis is not magic; it depends on you telling +it the requirements around access. If you don’t mark something as +MOZ_GUARDED_BY() it won’t figure it out for you, and you can end up with a data +race. When writing multithreaded code, you should always be thinking about +which threads can access what and when, and document this. + +**How to annotate different locking patterns in Gecko** +------------------------------------------------------- + +Gecko uses a number of different locking patterns. They include: + +- **Always Lock** - + Multiple threads may read and write the value + +- **Single Writer** - + One thread does all the writing, other threads + read the value, but code on the writing thread also reads it + without the lock + +- **Out-of-band invariants** - + A value may be accessed from other threads, + but only after or before certain other events or in a certain state, + like when after a listener has been added or before a processing + thread has been shut down. + +The simplest and easiest to check with static analysis is **Always +Lock**, and generally you should prefer this pattern. This is very +simple; you add MOZ_GUARDED_BY(mutex/monitor), and must own the lock to +access the value. This can be implemented by some combination of direct +Lock/AutoLock calls in the method; an assertion that the lock is already +held by the current thread, or annotating the method as requiring the +lock (MOZ_REQUIRES(mutex)) in the method definition: :: + + // Ensures mSize is initialized, if it can be. + // mLock must be held when this is called, and mInput must be non-null. + void EnsureSizeInitialized() MOZ_REQUIRES(mLock); + ... + // This lock protects mSeekable, mInput, mSize, and mSizeInitialized. + Mutex mLock; + int64_t mSize MOZ_GUARDED_BY(mLock); + +**Single Writer** is tricky for static analysis normally, since it +doesn’t know what thread an access will occur on. In general, you should +prefer using Always Lock in non-performance-sensitive code, especially +since these mutexes are almost always uncontended and therefore cheap to +lock. + +To support this fairly common pattern in Mozilla code, we’ve added +MutexSingleWriter and MonitorSingleWriter subclasses. To use these, you +need to subclass SingleWriterLockOwner on one object (typically the +object containing the Mutex), implement ::OnWritingThread(), and pass +the object to the constructor for MutexSingleWriter. In code that +accesses the guarded value from the writing thread, you need to add +mMutex.AssertIsOnWritingThread(), which both does a debug-only runtime +assertion by calling OnWritingThread(), and also asserts to the static +analyzer that the lock is held (which it isn’t). + +There is one case this causes problems with: when a method needs to +access the value (without the lock), and then decides to write to the +value from the same method, taking the lock. To the static analyzer, +this looks like a double-lock. Either you will need to add +MOZ_NO_THREAD_SAFETY_ANALYSIS to the method, move the write into another +method you call, or locally disable the warning with +MOZ_PUSH_IGNORE_THREAD_SAFETY and MOZ_POP_THREAD_SAFETY. We’re discussing with +the clang static analysis developers how to better handle this. + +Note also that this provides no checking that the lock is taken to write +to the value: :: + + MutexSingleWriter mMutex; + // mResource should only be modified on the main thread with the lock. + // So it can be read without lock on the main thread or on other threads + // with the lock. + RefPtr<ChannelMediaResource> mResource MOZ_GUARDED_BY(mMutex); + ... + nsresult ChannelMediaResource::Listener::OnStartRequest(nsIRequest *aRequest) { + mMutex.AssertOnWritingThread(); + + // Read from the only writing thread; no lock needed + if (!mResource) { + return NS_OK; + } + return mResource->OnStartRequest(aRequest, mOffset); + } + +If you need to assert you’re on the writing thread, then later take a +lock to modify a value, it will cause a warning: ”acquiring mutex +'mMutex' that is already held”. You can resolve this by turning off +thread-safety analysis for the lock: :: + + mMutex.AssertOnWritingThread(); + ... + { + MOZ_PUSH_IGNORE_THREAD_SAFETY + MutexSingleWriterAutoLock lock(mMutex); + MOZ_POP_THREAD_SAFETY + +**Out-of-band Invariants** is used in a number of places (and may be +combined with either of the above patterns). It's using other knowledge +about the execution pattern of the code to assert that it's safe to avoid +taking certain locks. A primary example is when a value can +only be accessed from a single thread for part of its lifetime (this can +also be referred to as "time-based locking"). + +Note that thread-safety analysis always ignores constructors and destructors +(which shouldn’t have races with other threads barring really odd usages). +Since only a single thread can access during those time periods, locking is +not required there. However, if a method is called from a constructor, +that method may generate warnings since the compiler doesn't know if it +might be called from elsewhere: :: + + ... + class nsFoo { + public: + nsFoo() { + mBar = true; // Ok since we're in the constructor, no warning + Init(); + } + void Init() { // we're only called from the constructor + // This causes a thread-safety warning, since the compiler + // can't prove that Init() is only called from the constructor + mQuerty = true; + } + ... + mMutex mMutex; + uint32_t mBar MOZ_GUARDED_BY(mMutex); + uint32_t mQuerty MOZ_GUARDED_BY(mMutex); + } + +Another example might be a value that’s used from other threads, but only +if an observer has been installed. Thus code that always runs before the +observer is installed, or after it’s removed, does not need to lock. + +These patterns are impossible to statically check in most cases. If all +the periods where it’s accessed from one thread only are on the same +thread, you could use the Single Writer pattern support to cover this +case. You would add AssertIsOnWritingThread() calls to methods that meet +the criteria that only a single thread can access the value (but only if +that holds). Unlike regular uses of SingleWriter, however, there’s no way +to check if you added such an assertion to code that runs on the “right” +thread, but during a period where another thread might modify it. + +For this reason, we **strongly** suggest that you convert cases of +Out-of-band-invariants/Time-based-locking to Always Lock if you’re +refactoring the code or making major modifications. This is far less prone +to error, and also to future changes breaking the assumptions about other +threads accessing the value. In all but a few cases where code is on a very +‘hot’ path, this will have no impact on performance - taking an uncontended +lock is cheap. + +To quiet warnings where these patterns are in use, you'll need to either +add locks (preferred), or suppress the warnings with MOZ_NO_THREAD_SAFETY_ANALYSIS or +MOZ_PUSH_IGNORE_THREAD_SAFETY/MOZ_POP_THREAD_SAFETY. + +**This pattern especially needs good documentation in the code as to what +threads will access what members under what conditions!**:: + + // Can't be accessed by multiple threads yet + nsresult nsAsyncStreamCopier::InitInternal(nsIInputStream* source, + nsIOutputStream* sink, + nsIEventTarget* target, + uint32_t chunkSize, + bool closeSource, + bool closeSink) + MOZ_NO_THREAD_SAFETY_ANALYSIS { + +and:: + + // We can't be accessed by another thread because this hasn't been + // added to the public list yet + MOZ_PUSH_IGNORE_THREAD_SAFETY + mRestrictedPortList.AppendElement(gBadPortList[i]); + MOZ_POP_THREAD_SAFETY + +and:: + + // This is called on entries in another entry's mCallback array, under the lock + // of that other entry. No other threads can access this entry at this time. + bool CacheEntry::Callback::DeferDoom(bool* aDoom) const { + +**Known limitations** +--------------------- + +**Static analysis can’t handle all reasonable patterns.** In particular, +per their documentation, it can’t handle conditional locks, like: :: + + if (OnMainThread()) { + mMutex.Lock(); + } + +You should resolve this either via MOZ_NO_THREAD_SAFETY_ANALYSIS on the +method, or MOZ_PUSH_IGNORE_THREAD_SAFETY/MOZ_POP_THREAD_SAFETY. + +**Sometimes the analyzer can’t figure out that two objects are both the +same Mutex**, and it will warn you. You may be able to resolve this by +making sure you’re using the same pattern to access the mutex: :: + + mChan->mMonitor->AssertCurrentThreadOwns(); + ... + { + - MonitorAutoUnlock guard(*monitor); + + MonitorAutoUnlock guard(*(mChan->mMonitor.get())); // avoids mutex warning + ok = node->SendUserMessage(port, std::move(aMessage)); + } + +**Maybe<MutexAutoLock>** doesn’t tell the static analyzer when the mutex +is owned or freed; follow locking via the MayBe<> by +**mutex->AssertCurrentThreadOwns();** (and ditto for Monitors): :: + + Maybe<MonitorAutoLock> lock(std::in_place, *mMonitor); + mMonitor->AssertCurrentThreadOwns(); // for threadsafety analysis + +If you reset() the Maybe<>, you may need to surround it with +MOZ_PUSH_IGNORE_THREAD_SAFETY and MOZ_POP_THREAD_SAFETY macros: :: + + MOZ_PUSH_IGNORE_THREAD_SAFETY + aLock.reset(); + MOZ_POP_THREAD_SAFETY + +**Passing a protected value by-reference** sometimes will confuse the +analyzer. Use MOZ_PUSH_IGNORE_THREAD_SAFETY and MOZ_POP_THREAD_SAFETY macros to +resolve this. + +**Classes which need thread-safety annotations** +------------------------------------------------ + +- Mutex + +- StaticMutex + +- RecursiveMutex + +- BaseProfilerMutex + +- Monitor + +- StaticMonitor + +- ReentrantMonitor + +- RWLock + +- Anything that hides an internal Mutex/etc and presents a Mutex-like + API (::Lock(), etc). + +**Additional Notes** +-------------------- + +Some code passes **Proof-of-Lock** AutoLock parameters, as a poor form of +static analysis. While it’s hard to make mistakes if you pass an AutoLock +reference, it is possible to pass a lock to the wrong Mutex/Monitor. + +Proof-of-lock is basically redundant to MOZ_REQUIRES() and obsolete, and +depends on the optimizer to remove it, and per above it can be misused, +with effort. With MOZ_REQUIRES(), any proof-of-lock parameters can be removed, +though you don't have to do so immediately. + +In any method taking an aProofOfLock parameter, add a MOZ_REQUIRES(mutex) to +the definition (and optionally remove the proof-of-lock), or add a +mMutex.AssertCurrentThreadOwns() to the method: :: + + nsresult DispatchLockHeld(already_AddRefed<WorkerRunnable> aRunnable, + - nsIEventTarget* aSyncLoopTarget, + - const MutexAutoLock& aProofOfLock); + + nsIEventTarget* aSyncLoopTarget) MOZ_REQUIRES(mMutex); + +or (if for some reason it's hard to specify the mutex in the header):: + + nsresult DispatchLockHeld(already_AddRefed<WorkerRunnable> aRunnable, + - nsIEventTarget* aSyncLoopTarget, + - const MutexAutoLock& aProofOfLock); + + nsIEventTarget* aSyncLoopTarget) { + + mMutex.AssertCurrentThreadOwns(); + +In addition to MOZ_GUARDED_BY() there’s also MOZ_PT_GUARDED_BY(), which says +that the pointer isn’t guarded, but the data pointed to by the pointer +is. + +Classes that expose a Mutex-like interface can be annotated like Mutex; +see some of the examples in the tree that use MOZ_CAPABILITY and +MOZ_ACQUIRE()/MOZ_RELEASE(). + +Shared locks are supported, though we don’t use them much. See RWLock. diff --git a/xpcom/docs/writing-xpcom-interface.rst b/xpcom/docs/writing-xpcom-interface.rst new file mode 100644 index 0000000000..9eeb1c72a2 --- /dev/null +++ b/xpcom/docs/writing-xpcom-interface.rst @@ -0,0 +1,287 @@ +.. _writing_xpcom_interface: + +Tutorial for Writing a New XPCOM Interface +========================================== + +High Level Overview +------------------- + +In order to write code that works in native code (C++, Rust), and JavaScript contexts, it's necessary to have a mechanism to do so. For chrome privileged contexts, this is the XPCOM Interface Class. + +This mechanism starts with an :ref:`XPIDL` file to define the shape of the interface. In the `build system`_, this file is processed, and `Rust`_ and `C++`_ code is automatically generated. + +.. _build system: https://searchfox.org/mozilla-central/source/xpcom/idl-parser/xpidl +.. _Rust: https://searchfox.org/mozilla-central/source/__GENERATED__/dist/xpcrs/rt +.. _C++: https://searchfox.org/mozilla-central/source/__GENERATED__/dist/include + +Next, the interface's methods and attributes must be implemented. This can be done through either a JSM module, or through a C++ interface class. Once these steps are done, the new files must be added to the appropriate :code:`moz.build` files to ensure the build system knows how to find them and process them. + +Often these XPCOM components are wired into the :code:`Services` JavaScript object to allow for ergonomic access to the interface. For example, open the `Browser Console`_ and type :code:`Services.` to interactively access these components. + +.. _Browser Console: https://developer.mozilla.org/en-US/docs/Tools/Browser_Console + +From C++, components can be accessed via :code:`mozilla::components::ComponentName::Create()` using the :code:`name` option in the :code:`components.conf`. + +While :code:`Services` and :code:`mozilla::components` are the preferred means of accessing components, many are accessed through the historical (and somewhat arcane) :code:`createInstance` mechanism. New usage of these mechanisms should be avoided if possible. + +.. code:: javascript + + let component = Cc["@mozilla.org/component-name;1"].createInstance( + Ci.nsIComponentName + ); + +.. code:: c++ + + nsCOMPtr<nsIComponentName> component = do_CreateInstance( + "@mozilla.org/component-name;1"); + +Writing an XPIDL +---------------- + +First decide on a name. Conventionally the interfaces are prefixed with :code:`nsI` (historically Netscape) or :code:`mozI` as they are defined in the global namespace. While the interface is global, the implementation of an interface can be defined in a namespace with no prefix. Historically many component implementations still use the :code:`ns` prefixes (notice that the :code:`I` was dropped), but this convention is no longer needed. + +This tutorial assumes the component is located at :code:`path/to` with the name :code:`ComponentName`. The interface name will be :code:`nsIComponentName`, while the implementation will be :code:`mozilla::ComponentName`. + +To start, create an :ref:`XPIDL` file: + +.. code:: bash + + touch path/to/nsIComponentName.idl + +And hook it up to the :code:`path/to/moz.build` + +.. code:: python + + XPIDL_SOURCES += [ + "nsIComponentName.idl", + ] + +Next write the initial :code:`.idl` file: :code:`path/to/nsIComponentName.idl` + +.. _contract_ids: +.. code:: c++ + + /* This Source Code Form is subject to the terms of the Mozilla Public + * License, v. 2.0. If a copy of the MPL was not distributed with this + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ + + // This is the base include which defines nsISupports. This class defines + // the QueryInterface method. + #include "nsISupports.idl" + + // `scriptable` designates that this object will be used with JavaScript + // `uuid` The example below uses a UUID with all Xs. Replace the Xs with + // your own UUID generated here: + // http://mozilla.pettay.fi/cgi-bin/mozuuid.pl + + /** + * Make sure to document your interface. + */ + [scriptable, uuid(xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)] + interface nsIComponentName : nsISupports { + + // Fill out your definition here. This example attribute only returns a bool. + + /** + * Make sure to document your attributes. + */ + readonly attribute bool isAlive; + }; + +This definition only includes one attribute, :code:`isAlive`, which will demonstrate that we've done our work correctly at the end. For a more comprehensive guide for this syntax, see the :ref:`XPIDL` docs. + +Once :code:`./mach build` is run, the XPIDL parser will read this file, and give any warnings if the syntax is wrong. It will then auto-generate the C++ (or Rust) code for us. For this example the generated :code:`nsIComponentName` class will be located in: + +:code:`{obj-directory}/dist/include/nsIComponentName.h` + +It might be useful to check out what was automatically generated here, or see the existing `generated C++ header files on SearchFox <https://searchfox.org/mozilla-central/source/__GENERATED__/dist/>`_. + +Writing the C++ implementation +------------------------------ + +Now we have a definition for an interface, but no implementation. The interface could be backed by a JavaScript implementation using a JSM, but for this example we'll use a C++ implementation. + +Add the C++ sources to :code:`path/to/moz.build` + +.. code:: python + + EXPORTS.mozilla += [ + "ComponentName.h", + ] + + UNIFIED_SOURCES += [ + "ComponentName.cpp", + ] + +Now write the header: :code:`path/to/ComponentName.h` + +.. code:: c++ + + /* This Source Code Form is subject to the terms of the Mozilla Public + * License, v. 2.0. If a copy of the MPL was not distributed with this + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ + #ifndef mozilla_nsComponentName_h__ + #define mozilla_nsComponentName_h__ + + // This will pull in the header auto-generated by the .idl file: + // {obj-directory}/dist/include/nsIComponentName.h + #include "nsIComponentName.h" + + // The implementation can be namespaced, while the XPCOM interface is globally namespaced. + namespace mozilla { + + // Notice how the class name does not need to be prefixed, as it is defined in the + // `mozilla` namespace. + class ComponentName final : public nsIComponentName { + // This first macro includes the necessary information to use the base nsISupports. + // This includes the QueryInterface method. + NS_DECL_ISUPPORTS + + // This second macro includes the declarations for the attributes. There is + // no need to duplicate these declarations. + // + // In our case it includes a declaration for the isAlive attribute: + // GetIsAlive(bool *aIsAlive) + NS_DECL_NSICOMPONENTNAME + + public: + ComponentName() = default; + + private: + // A private destructor must be declared. + ~ComponentName() = default; + }; + + } // namespace mozilla + + #endif + +Now write the definitions: :code:`path/to/ComponentName.cpp` + +.. code:: c++ + + /* This Source Code Form is subject to the terms of the Mozilla Public + * License, v. 2.0. If a copy of the MPL was not distributed with this + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ + + #include "ComponentName.h" + + namespace mozilla { + + // Use the macro to inject all of the definitions for nsISupports. + NS_IMPL_ISUPPORTS(ComponentName, nsIComponentName) + + // This is the actual implementation of the `isAlive` attribute. Note that the + // method name is somewhat different than the attribute. We specified "read-only" + // in the attribute, so only a getter, not a setter was defined for us. Here + // the name was adjusted to be `GetIsAlive`. + // + // Another common detail of implementing an XPIDL interface is that the return values + // are passed as out parameters. The methods are treated as fallible, and the return + // value is an `nsresult`. See the XPIDL documentation for the full nitty gritty + // details. + // + // A common way to know the exact function signature for a method implementation is + // to copy and paste from existing examples, or inspecting the generated file + // directly: {obj-directory}/dist/include/nsIComponentName.h + NS_IMETHODIMP + ComponentName::GetIsAlive(bool* aIsAlive) { + *aIsAlive = true; + return NS_OK; + } + + } // namespace: mozilla + +Registering the component +------------------------- + +At this point, the component should be correctly written, but it's not registered with the component system. In order to this, we'll need to create or modify the :code:`components.conf`. + +.. code:: bash + + touch path/to/components.conf + + +Now update the :code:`moz.build` to point to it. + +.. code:: python + + XPCOM_MANIFESTS += [ + "components.conf", + ] + +It is probably worth reading over :ref:`defining_xpcom_components`, but the following config will be sufficient to hook up our component to the :code:`Services` object. +Services should also be added to ``tools/lint/eslint/eslint-plugin-mozilla/lib/services.json``. +The easiest way to do that is to copy from ``<objdir>/xpcom/components/services.json``. + +.. code:: python + + Classes = [ + { + # This CID is the ID for component entries, and needs a separate UUID from + # the .idl file. Replace the Xs with a uuid from: + # http://mozilla.pettay.fi/cgi-bin/mozuuid.pl + 'cid': '{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}', + 'interfaces': ['nsIComponentName'], + + # A contract ID is a human-readable identifier for an _implementation_ of + # an XPCOM interface. + # + # "@mozilla.org/process/environment;1" + # ^^^^^^^^^^^^ ^^^^^^^ ^^^^^^^^^^^ ^ + # | | | | + # | | | The version number, usually just 1. + # | | Component name + # | Module + # Domain + # + # This design goes back to a time when XPCOM was intended to be a generalized + # solution for the Gecko Runtime Environment (GRE). At this point most (if + # not all) of mozilla-central has an @mozilla domain. + 'contract_ids': ['@mozilla.org/component-name;1'], + + # This is the name of the C++ type that implements the interface. + 'type': 'mozilla::ComponentName', + + # The header file to pull in for the implementation of the interface. + 'headers': ['path/to/ComponentName.h'], + + # In order to hook up this interface to the `Services` object, we can + # provide the "js_name" parameter. This is an ergonomic way to access + # the component. + 'js_name': 'componentName', + }, + ] + +At this point the full :code:`moz.build` file should look like: + +.. code:: python + + # -*- Mode: python; indent-tabs-mode: nil; tab-width: 40 -*- + # vim: set filetype=python: + # This Source Code Form is subject to the terms of the Mozilla Public + # License, v. 2.0. If a copy of the MPL was not distributed with this + # file, You can obtain one at http://mozilla.org/MPL/2.0/. + + XPIDL_SOURCES += [ + "nsIComponentName.idl", + ] + + XPCOM_MANIFESTS += [ + "components.conf", + ] + + EXPORTS.mozilla += [ + "ComponentName.h", + ] + + UNIFIED_SOURCES += [ + "ComponentName.cpp", + ] + +This completes the implementation of a basic XPCOM Interface using C++. The component should be available via the `Browser Console`_ or other chrome contexts. + +.. code:: javascript + + console.log(Services.componentName.isAlive); + > true diff --git a/xpcom/docs/xpidl.rst b/xpcom/docs/xpidl.rst new file mode 100644 index 0000000000..41a039710b --- /dev/null +++ b/xpcom/docs/xpidl.rst @@ -0,0 +1,402 @@ +XPIDL +===== + +**XPIDL** is an Interface Description Language used to specify XPCOM interface +classes. + +Interface Description Languages (IDL) are used to describe interfaces in a +language- and machine-independent way. IDLs make it possible to define +interfaces which can then be processed by tools to autogenerate +language-dependent interface specifications. + +An xpidl file is essentially just a series of declarations. At the top level, +we can define typedefs, native types, or interfaces. Interfaces may +furthermore contain typedefs, natives, methods, constants, or attributes. +Most declarations can have properties applied to them. + +Types +----- + +There are three ways to make types: a typedef, a native, or an interface. In +addition, there are a few built-in native types. The built-in native types +are those listed under the type_spec production above. The following is the +correspondence table: + +=========================== =============== =========================== ============================ ======================= ======================= +IDL Type Javascript Type C++ in parameter C++ out parameter Rust in parameter Rust out parameter +=========================== =============== =========================== ============================ ======================= ======================= +``boolean`` boolean ``bool`` ``bool*`` ``bool`` ``*mut bool`` +``char`` string ``char`` ``char*`` ``c_char`` ``*mut c_char`` +``double`` number ``double`` ``double*`` ``f64`` ``*mut f64`` +``float`` number ``float`` ``float*`` ``f32`` ``*mut f32`` +``long`` number ``int32_t`` ``int32_t*`` ``i32`` ``*mut i32`` +``long long`` number ``int64_t`` ``int64_t*`` ``i64`` ``*mut i64`` +``octet`` number ``uint8_t`` ``uint8_t*`` ``u8`` ``*mut u8`` +``short`` number ``uint16_t`` ``uint16_t*`` ``u16`` ``*mut u16`` +``string`` [#strptr]_ string ``const char*`` ``char**`` ``*const c_char`` ``*mut *mut c_char`` +``unsigned long`` number ``uint32_t`` ``uint32_t*`` ``u32`` ``*mut u32`` +``unsigned long long`` number ``uint64_t`` ``uint64_t*`` ``u64`` ``*mut u64`` +``unsigned short`` number ``uint16_t`` ``uint16_t*`` ``u16`` ``*mut u16`` +``wchar`` string ``char16_t`` ``char16_t*`` ``i16`` ``*mut i16`` +``wstring`` [#strptr]_ string ``const char16_t*`` ``char16_t**`` ``*const i16`` ``*mut *mut i16`` +``MozExternalRefCountType`` number ``MozExternalRefCountType`` ``MozExternalRefCountType*`` ``u32`` ``*mut u32`` +``Array<T>`` [#array]_ array ``const nsTArray<T>&`` ``nsTArray<T>&`` ``*const ThinVec<T>`` ``*mut ThinVec<T>`` +=========================== =============== =========================== ============================ ======================= ======================= + +.. [#strptr] + + Prefer using the string class types such as ``AString``, ``AUTF8String`` + or ``ACString`` to this type. The behaviour of these types is documented + more in the :ref:`String Guide <stringguide.xpidl>` + +.. [#array] + + The C++ or Rust exposed type ``T`` will be an owned variant. (e.g. + ``ns[C]String``, ``RefPtr<T>``, or ``uint32_t``) + + ``string``, ``wstring``, ``[ptr] native`` and ``[ref] native`` are + unsupported as element types. + + +In addition to this list, nearly every IDL file includes ``nsrootidl.idl`` in +some fashion, which also defines the following types: + +======================= ======================= ======================= ======================= ======================= ======================= +IDL Type Javascript Type C++ in parameter C++ out parameter Rust in parameter Rust out parameter +======================= ======================= ======================= ======================= ======================= ======================= +``PRTime`` number ``uint64_t`` ``uint64_t*`` ``u64`` ``*mut u64`` +``nsresult`` number ``nsresult`` ``nsresult*`` ``u32`` [#rsresult]_ ``*mut u32`` +``size_t`` number ``uint32_t`` ``uint32_t*`` ``u32`` ``*mut u32`` +``voidPtr`` N/A ``void*`` ``void**`` ``*mut c_void`` ``*mut *mut c_void`` +``charPtr`` N/A ``char*`` ``char**`` ``*mut c_char`` ``*mut *mut c_char`` +``unicharPtr`` N/A ``char16_t*`` ``char16_t**`` ``*mut i16`` ``*mut *mut i16`` +``nsIDRef`` ID object ``const nsID&`` ``nsID*`` ``*const nsID`` ``*mut nsID`` +``nsIIDRef`` ID object ``const nsIID&`` ``nsIID*`` ``*const nsIID`` ``*mut nsIID`` +``nsCIDRef`` ID object ``const nsCID&`` ``nsCID*`` ``*const nsCID`` ``*mut nsCID`` +``nsIDPtr`` ID object ``const nsID*`` ``nsID**`` ``*const nsID`` ``*mut *mut nsID`` +``nsIIDPtr`` ID object ``const nsIID*`` ``nsIID**`` ``*const nsIID`` ``*mut *mut nsIID`` +``nsCIDPtr`` ID object ``const nsCID*`` ``nsCID**`` ``*const nsCID`` ``*mut *mut nsCID`` +``nsID`` N/A ``nsID`` ``nsID*`` N/A N/A +``nsIID`` N/A ``nsIID`` ``nsIID*`` N/A N/A +``nsCID`` N/A ``nsCID`` ``nsCID*`` N/A N/A +``nsQIResult`` object ``void*`` ``void**`` ``*mut c_void`` ``*mut *mut c_void`` +``AUTF8String`` [#str]_ string ``const nsACString&`` ``nsACString&`` ``*const nsACString`` ``*mut nsACString`` +``ACString`` [#str]_ string ``const nsACString&`` ``nsACString&`` ``*const nsACString`` ``*mut nsACString`` +``AString`` [#str]_ string ``const nsAString&`` ``nsAString&`` ``*const nsAString`` ``*mut nsAString`` +``jsval`` any ``HandleValue`` ``MutableHandleValue`` N/A N/A +``jsid`` N/A ``jsid`` ``jsid*`` N/A N/A +``Promise`` Promise object ``dom::Promise*`` ``dom::Promise**`` N/A N/A +======================= ======================= ======================= ======================= ======================= ======================= + +.. [#rsresult] + + A bare ``u32`` is only for bare ``nsresult`` in/outparams in XPIDL. The + result should be wrapped as the ``nserror::nsresult`` type. + +.. [#str] + + The behaviour of these types is documented more in the :ref:`String Guide + <stringguide.xpidl>` + +Typedefs in IDL are basically as they are in C or C++: you define first the +type that you want to refer to and then the name of the type. Types can of +course be one of the fundamental types, or any other type declared via a +typedef, interface, or a native type. + +Native types are types which correspond to a given C++ type. Most native +types are not scriptable: if it is not present in the list above, then it is +certainly not scriptable (some of the above, particularly jsid, are not +scriptable). + +The contents of the parentheses of a native type declaration (although native +declarations without parentheses are parsable, I do not trust that they are +properly handled by the xpidl handlers) is a string equivalent to the C++ +type. XPIDL itself does not interpret this string, it just literally pastes +it anywhere the native type is used. The interpretation of the type can be +modified by using the ``[ptr]`` or ``[ref]`` attributes on the native +declaration. Other attributes are only intended for use in ``nsrootidl.idl``. + +WebIDL Interfaces +~~~~~~~~~~~~~~~~~ + +WebIDL interfaces are also valid XPIDL types. To declare a WebIDL interface in +XPIDL, write: + +.. code-block:: JavaScript + + webidl InterfaceName; + +WebIDL types will be passed as ``mozilla::dom::InterfaceName*`` when used as +in-parameters, as ``mozilla::dom::InterfaceName**`` when used as out or +inout-parameters, and as ``RefPtr<mozilla::dom::InterfaceName>`` when used as +an array element. + +.. note:: + + Other WebIDL types (e.g. dictionaries, enums, and unions) are not currently + supported. + +Constants and CEnums +~~~~~~~~~~~~~~~~~~~~ + +Constants must be attached to an interface. The only constants supported are +those which become integer types when compiled to source code; string constants +and floating point constants are currently not supported. + +Often constants are used to describe a set of enum values. In cases like this +the ``cenum`` construct can be used to group constants together. Constants +grouped in a ``cenum`` will be reflected as-if they were declared directly on +the interface, in Rust and Javascript code. + +.. code-block:: JavaScript + + cenum MyCEnum : 8 { + eSomeValue, // starts at 0 + eSomeOtherValue, + }; + +The number after the enum name, like ``: 8`` in the example above, defines the +width of enum values with the given type. The cenum's type may be referenced in +xpidl as ``nsIInterfaceName_MyCEnum``. + +Interfaces +---------- + +Interfaces are basically a collection of constants, methods, and attributes. +Interfaces can inherit from one-another, and every interface must eventually +inherit from ``nsISupports``. + +Interface Attributes +~~~~~~~~~~~~~~~~~~~~ + +Interfaces may have the following attributes: + +``uuid`` +```````` + +The internal unique identifier for the interface. it must be unique, and the +uuid must be generated when creating the interface. After that, it doesn't need +to be changed any more. + +Online tools such as http://mozilla.pettay.fi/cgi-bin/mozuuid.pl can help +generate UUIDs for new interfaces. + +``builtinclass`` +```````````````` + +JavaScript classes are forbidden from implementing this interface. All child +interfaces must also be marked with this property. + +``function`` +```````````` + +The JavaScript implementation of this interface may be a function that is +invoked on property calls instead of an object with the given property + +``scriptable`` +`````````````` + +This interface is usable by JavaScript classes. Must inherit from a +``scriptable`` interface. + +``rust_sync`` +````````````` + +This interface is safe to use from multiple threads concurrently. All child +interfaces must also be marked with this property. Interfaces marked this way +must be either non-scriptable or ``builtinclass``, and must use threadsafe +reference counting. + +Interfaces marked as ``rust_sync`` will implement the ``Sync`` trait in Rust. +For more details on what that means, read the trait's documentation: +https://doc.rust-lang.org/nightly/std/marker/trait.Sync.html. + +Methods and Attributes +~~~~~~~~~~~~~~~~~~~~~~ + +Interfaces declare a series of attributes and methods. Attributes in IDL are +akin to JavaScript properties, in that they are a getter and (optionally) a +setter pair. In JavaScript contexts, attributes are exposed as a regular +property access, while native code sees attributes as a Get and possibly a Set +method. + +Attributes can be declared readonly, in which case setting causes an error to +be thrown in script contexts and native contexts lack the Set method, by using +the ``readonly`` keyword. + +To native code, on attribute declared ``attribute type foo;`` is syntactic +sugar for the declaration of two methods ``type getFoo();`` and ``void +setFoo(in type foo);``. If ``foo`` were declared readonly, the latter method +would not be present. Attributes support all of the properties of methods with +the exception of ``optional_argc``, as this does not make sense for attributes. + +There are some special rules for attribute naming. As a result of vtable +munging by the MSVC++ compiler, an attribute with the name ``IID`` is +forbidden. Also like methods, if the first character of an attribute is +lowercase in IDL, it is made uppercase in native code only. + +Methods define a return type and a series of in and out parameters. When called +from a JavaScript context, they invocation looks as it is declared for the most +part; some parameter properties can adjust what the code looks like. The calls +are more mangled in native contexts. + +An important attribute for methods and attributes is scriptability. A method or +attribute is scriptable if it is declared in a ``scriptable`` interface and it +lacks a ``noscript`` or ``notxpcom`` property. Any method that is not +scriptable can only be accessed by native code. However, ``scriptable`` methods +must contain parameters and a return type that can be translated to script: any +native type, save a few declared in ``nsrootidl.idl`` (see above), may not be +used in a scriptable method or attribute. An exception to the above rule is if +a ``nsQIResult`` parameter has the ``iid_is`` property (a special case for some +QueryInterface-like operations). + +Methods and attributes are mangled on conversion to native code. If a method is +declared ``notxpcom``, the mangling of the return type is prevented, so it is +called mostly as it looks. Otherwise, the return type of the native method is +``nsresult``, and the return type acts as a final outparameter if it is not +``void``. The name is translated so that the first character is +unconditionally uppercase; subsequent characters are unaffected. However, the +presence of the ``binaryname`` property allows the user to select another name +to use in native code (to avoid conflicts with other functions). For example, +the method ``[binaryname(foo)] void bar();`` becomes ``nsresult Foo()`` in +native code (note that capitalization is still applied). However, the +capitalization is not applied when using ``binaryname`` with attributes; i.e., +``[binaryname(foo)] readonly attribute Quux bar;`` becomes ``Getfoo(Quux**)`` +in native code. + +The ``implicit_jscontext`` and ``optional_argc`` parameters are properties +which help native code implementations determine how the call was made from +script. If ``implicit_jscontext`` is present on a method, then an additional +``JSContext* cx`` parameter is added just after the regular list which receives +the context of the caller. If ``optional_argc`` is present, then an additional +``uint8_t _argc`` parameter is added at the end which receives the number of +optional arguments that were actually used (obviously, you need to have an +optional argument in the first place). Note that if both properties are set, +the ``JSContext* cx`` is added first, followed by the ``uint8_t _argc``, and +then ending with return value parameter. Finally, as an exception to everything +already mentioned, for attribute getters and setters the ``JSContext *cx`` +comes before any other arguments. + +Another native-only property is ``nostdcall``. Normally, declarations are made +in the stdcall ABI on Windows to be ABI-compatible with COM interfaces. Any +non-scriptable method or attribute with ``nostdcall`` instead uses the +``thiscall`` ABI convention. Methods without this property generally use +``NS_IMETHOD`` in their declarations and ``NS_IMETHODIMP`` in their definitions +to automatically add in the stdcall declaration specifier on requisite +compilers; those that use this method may use a plain ``nsresult`` instead. + +Another property, ``infallible``, is attribute-only. When present, it causes an +infallible C++ getter function definition to be generated for the attribute +alongside the normal fallible C++ getter declaration. It should only be used if +the fallible getter will be infallible in practice (i.e. always return +``NS_OK``) for all possible implementations. This infallible getter contains +code that calls the fallible getter, asserts success, and returns the gotten +value directly. The point of using this property is to make C++ code nicer -- a +call to the infallible getter is more concise and readable than a call to the +fallible getter. This property can only be used for attributes having built-in +or interface types, and within classes that are marked with ``builtinclass``. +The latter restriction is because C++ implementations of fallible getters can +be audited for infallibility, but JS implementations can always throw (e.g. due +to OOM). + +The ``must_use`` property is useful if the result of a method call or an +attribute get/set should always (or usually) be checked, which is frequently +the case. (e.g. a method that opens a file should almost certainly have its +result checked.) This property will cause ``[[nodiscard]]`` to be added to the +generated function declarations, which means certain compilers (e.g. clang and +GCC) will reports errors if these results are not used. + +Method Parameters +~~~~~~~~~~~~~~~~~ + +Each method parameter can be specified in one of three modes: ``in``, ``out``, +or ``inout``. An ``out`` parameter is essentially an auxiliary return value, +although these are moderately cumbersome to use from script contexts and should +therefore be avoided if reasonable. An ``inout`` parameter is an in parameter +whose value may be changed as a result of the method; these parameters are +rather annoying to use and should generally be avoided if at all possible. + +``out`` and ``inout`` parameters are reflected as objects having the ``.value`` +property which contains the real value of the parameter; the ``value`` +attribute is missing in the case of ``out`` parameters and is initialized to +the passed-in-value for ``inout`` parameters. The script code needs to set this +property to assign a value to the parameter. Regular ``in`` parameters are +reflected more or less normally, with numeric types all representing numbers, +booleans as ``true`` or ``false``, the various strings (including ``AString`` +etc.) as a JavaScript string, and ``nsID`` types as a ``Components.ID`` +instance. In addition, the ``jsval`` type is translated as the appropriate +JavaScript value (since a ``jsval`` is the internal representation of all +JavaScript values), and parameters with the ``nsIVeriant`` interface have their +types automatically boxed and unboxed as appropriate. + +The equivalent representations of all IDL types in native code is given in the +earlier tables; parameters of type ``inout`` follow their ``out`` form. Native +code should pay particular attention to not passing in null values for out +parameters (although some parts of the codebase are known to violate this, it +is strictly enforced at the JS<->native barrier). + +Representations of types additionally depend on some of the many types of +properties they may have. The ``array`` property turns the parameter into an array; +the parameter must also have a corresponding ``size_is`` property whose argument is +the parameter that has the size of the array. In native code, the type gains +another pointer indirection, and JavaScript arrays are used in script code. +Script code callers can ignore the value of array parameter, but implementers +must still set the values appropriately. + +.. note:: + + Prefer using the ``Array<T>`` builtin over the ``[array]`` attribute for + new code. It is more ergonomic to use from both JS and C++. In the future, + ``[array]`` may be deprecated and removed. + +The ``const`` and ``shared`` properties are special to native code. As its name +implies, the ``const`` property makes its corresponding argument ``const``. The +``shared`` property is only meaningful for ``out`` or ``inout`` parameters and +it means that the pointer value should not be freed by the caller. Only simple +native pointer types like ``string``, ``wstring``, and ``octetPtr`` may be +declared shared. The shared property also makes its corresponding argument +const. + +The ``retval`` property indicates that the parameter is actually acting as the +return value, and it is only the need to assign properties to the parameter +that is causing it to be specified as a parameter. It has no effect on native +code, but script code uses it like a regular return value. Naturally, a method +which contains a ``retval`` parameter must be declared ``void``, and the +parameter itself must be an ``out`` parameter and the last parameter. + +Other properties are the ``optional`` and ``iid_is`` property. The ``optional`` +property indicates that script code may omit the property without problems; all +subsequent parameters must either by optional themselves or the retval +parameter. Note that optional out parameters still pass in a variable for the +parameter, but its value will be ignored. The ``iid_is`` parameter indicates +that the real IID of an ``nsQIResult`` parameter may be found in the +corresponding parameter, to allow script code to automatically unbox the type. + +Not all type combinations are possible. Native types with the various string +properties are all forbidden from being used as an ``inout`` parameter or as an +``array`` parameter. In addition, native types with the ``nsid`` property but +lacking either a ``ptr`` or ``ref`` property are forbidden unless the method is +``notxpcom`` and it is used as an ``in`` parameter. + +Ownership Rules +``````````````` + +For types that reference heap-allocated data (strings, arrays, interface +pointers, etc), you must follow the XPIDL data ownership conventions in order +to avoid memory corruption and security vulnerabilities: + +* For ``in`` parameters, the caller allocates and deallocates all data. If the + callee needs to use the data after the call completes, it must make a private + copy of the data, or, in the case of interface pointers, ``AddRef`` it. +* For ``out`` parameters, the callee creates the data, and transfers ownership + to the caller. For buffers, the callee allocates the buffer with ``malloc``, + and the caller frees the buffer with ``free``. For interface pointers, the + callee does the ``AddRef`` on behalf of the caller, and the caller must call + ``Release``. This manual reference/memory management should be performed + using the ``getter_AddRefs`` and ``getter_Transfers`` helpers in new code. +* For ``inout`` parameters, the callee must clean up the old data if it chooses + to replace it. Buffers must be deallocated with ``free``, and interface + pointers must be ``Release``'d. Afterwards, the above rules for ``out`` + apply. +* ``shared`` out-parameters should not be freed, as they are intended to refer + to constant string literals. |