1 files changed, 396 insertions, 0 deletions
diff --git a/Documentation/filesystems/seq_file.rst b/Documentation/filesystems/seq_file.rst
new file mode 100644
index 0000000000..1e1713d000
--- /dev/null
+++ b/Documentation/filesystems/seq_file.rst
@@ -0,0 +1,396 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+The seq_file Interface
+======================
+
+	Copyright 2003 Jonathan Corbet <corbet@lwn.net>
+
+	This file is originally from the LWN.net Driver Porting series at
+	https://lwn.net/Articles/driver-porting/
+
+
+There are numerous ways for a device driver (or other kernel component) to
+provide information to the user or system administrator.  One useful
+technique is the creation of virtual files, in debugfs, /proc or elsewhere.
+Virtual files can provide human-readable output that is easy to get at
+without any special utility programs; they can also make life easier for
+script writers. It is not surprising that the use of virtual files has
+grown over the years.
+
+Creating those files correctly has always been a bit of a challenge,
+however. It is not that hard to make a virtual file which returns a
+string. But life gets trickier if the output is long - anything greater
+than an application is likely to read in a single operation.  Handling
+multiple reads (and seeks) requires careful attention to the reader's
+position within the virtual file - that position is, likely as not, in the
+middle of a line of output. The kernel has traditionally had a number of
+implementations that got this wrong.
+
+The 2.6 kernel contains a set of functions (implemented by Alexander Viro)
+which are designed to make it easy for virtual file creators to get it
+right.
+
+The seq_file interface is available via <linux/seq_file.h>. There are
+three aspects to seq_file:
+
+     * An iterator interface which lets a virtual file implementation
+       step through the objects it is presenting.
+
+     * Some utility functions for formatting objects for output without
+       needing to worry about things like output buffers.
+
+     * A set of canned file_operations which implement most operations on
+       the virtual file.
+
+We'll look at the seq_file interface via an extremely simple example: a
+loadable module which creates a file called /proc/sequence. The file, when
+read, simply produces a set of increasing integer values, one per line. The
+sequence will continue until the user loses patience and finds something
+better to do. The file is seekable, in that one can do something like the
+following::
+
+    dd if=/proc/sequence of=out1 count=1
+    dd if=/proc/sequence skip=1 of=out2 count=1
+
+Then concatenate the output files out1 and out2 and get the right
+result. Yes, it is a thoroughly useless module, but the point is to show
+how the mechanism works without getting lost in other details.  (Those
+wanting to see the full source for this module can find it at
+https://lwn.net/Articles/22359/).
+
+Deprecated create_proc_entry
+============================
+
+Note that the above article uses create_proc_entry which was removed in
+kernel 3.10. Current versions require the following update::
+
+    -	entry = create_proc_entry("sequence", 0, NULL);
+    -	if (entry)
+    -		entry->proc_fops = &ct_file_ops;
+    +	entry = proc_create("sequence", 0, NULL, &ct_file_ops);
+
+The iterator interface
+======================
+
+Modules implementing a virtual file with seq_file must implement an
+iterator object that allows stepping through the data of interest
+during a "session" (roughly one read() system call).  If the iterator
+is able to move to a specific position - like the file they implement,
+though with freedom to map the position number to a sequence location
+in whatever way is convenient - the iterator need only exist
+transiently during a session.  If the iterator cannot easily find a
+numerical position but works well with a first/next interface, the
+iterator can be stored in the private data area and continue from one
+session to the next.
+
+A seq_file implementation that is formatting firewall rules from a
+table, for example, could provide a simple iterator that interprets
+position N as the Nth rule in the chain.  A seq_file implementation
+that presents the content of a, potentially volatile, linked list
+might record a pointer into that list, providing that can be done
+without risk of the current location being removed.
+
+Positioning can thus be done in whatever way makes the most sense for
+the generator of the data, which need not be aware of how a position
+translates to an offset in the virtual file. The one obvious exception
+is that a position of zero should indicate the beginning of the file.
+
+The /proc/sequence iterator just uses the count of the next number it
+will output as its position.
+
+Four functions must be implemented to make the iterator work. The
+first, called start(), starts a session and takes a position as an
+argument, returning an iterator which will start reading at that
+position.  The pos passed to start() will always be either zero, or
+the most recent pos used in the previous session.
+
+For our simple sequence example,
+the start() function looks like::
+
+	static void *ct_seq_start(struct seq_file *s, loff_t *pos)
+	{
+	        loff_t *spos = kmalloc(sizeof(loff_t), GFP_KERNEL);
+	        if (! spos)
+	                return NULL;
+	        *spos = *pos;
+	        return spos;
+	}
+
+The entire data structure for this iterator is a single loff_t value
+holding the current position. There is no upper bound for the sequence
+iterator, but that will not be the case for most other seq_file
+implementations; in most cases the start() function should check for a
+"past end of file" condition and return NULL if need be.
+
+For more complicated applications, the private field of the seq_file
+structure can be used to hold state from session to session.  There is
+also a special value which can be returned by the start() function
+called SEQ_START_TOKEN; it can be used if you wish to instruct your
+show() function (described below) to print a header at the top of the
+output. SEQ_START_TOKEN should only be used if the offset is zero,
+however.  SEQ_START_TOKEN has no special meaning to the core seq_file
+code.  It is provided as a convenience for a start() function to
+communicate with the next() and show() functions.
+
+The next function to implement is called, amazingly, next(); its job is to
+move the iterator forward to the next position in the sequence.  The
+example module can simply increment the position by one; more useful
+modules will do what is needed to step through some data structure. The
+next() function returns a new iterator, or NULL if the sequence is
+complete. Here's the example version::
+
+	static void *ct_seq_next(struct seq_file *s, void *v, loff_t *pos)
+	{
+	        loff_t *spos = v;
+	        *pos = ++*spos;
+	        return spos;
+	}
+
+The next() function should set ``*pos`` to a value that start() can use
+to find the new location in the sequence.  When the iterator is being
+stored in the private data area, rather than being reinitialized on each
+start(), it might seem sufficient to simply set ``*pos`` to any non-zero
+value (zero always tells start() to restart the sequence).  This is not
+sufficient due to historical problems.
+
+Historically, many next() functions have *not* updated ``*pos`` at
+end-of-file.  If the value is then used by start() to initialise the
+iterator, this can result in corner cases where the last entry in the
+sequence is reported twice in the file.  In order to discourage this bug
+from being resurrected, the core seq_file code now produces a warning if
+a next() function does not change the value of ``*pos``.  Consequently a
+next() function *must* change the value of ``*pos``, and of course must
+set it to a non-zero value.
+
+The stop() function closes a session; its job, of course, is to clean
+up. If dynamic memory is allocated for the iterator, stop() is the
+place to free it; if a lock was taken by start(), stop() must release
+that lock.  The value that ``*pos`` was set to by the last next() call
+before stop() is remembered, and used for the first start() call of
+the next session unless lseek() has been called on the file; in that
+case next start() will be asked to start at position zero::
+
+	static void ct_seq_stop(struct seq_file *s, void *v)
+	{
+	        kfree(v);
+	}
+
+Finally, the show() function should format the object currently pointed to
+by the iterator for output.  The example module's show() function is::
+
+	static int ct_seq_show(struct seq_file *s, void *v)
+	{
+	        loff_t *spos = v;
+	        seq_printf(s, "%lld\n", (long long)*spos);
+	        return 0;
+	}
+
+If all is well, the show() function should return zero.  A negative error
+code in the usual manner indicates that something went wrong; it will be
+passed back to user space.  This function can also return SEQ_SKIP, which
+causes the current item to be skipped; if the show() function has already
+generated output before returning SEQ_SKIP, that output will be dropped.
+
+We will look at seq_printf() in a moment. But first, the definition of the
+seq_file iterator is finished by creating a seq_operations structure with
+the four functions we have just defined::
+
+	static const struct seq_operations ct_seq_ops = {
+	        .start = ct_seq_start,
+	        .next  = ct_seq_next,
+	        .stop  = ct_seq_stop,
+	        .show  = ct_seq_show
+	};
+
+This structure will be needed to tie our iterator to the /proc file in
+a little bit.
+
+It's worth noting that the iterator value returned by start() and
+manipulated by the other functions is considered to be completely opaque by
+the seq_file code. It can thus be anything that is useful in stepping
+through the data to be output. Counters can be useful, but it could also be
+a direct pointer into an array or linked list. Anything goes, as long as
+the programmer is aware that things can happen between calls to the
+iterator function. However, the seq_file code (by design) will not sleep
+between the calls to start() and stop(), so holding a lock during that time
+is a reasonable thing to do. The seq_file code will also avoid taking any
+other locks while the iterator is active.
+
+The iterator value returned by start() or next() is guaranteed to be
+passed to a subsequent next() or stop() call.  This allows resources
+such as locks that were taken to be reliably released.  There is *no*
+guarantee that the iterator will be passed to show(), though in practice
+it often will be.
+
+
+Formatted output
+================
+
+The seq_file code manages positioning within the output created by the
+iterator and getting it into the user's buffer. But, for that to work, that
+output must be passed to the seq_file code. Some utility functions have
+been defined which make this task easy.
+
+Most code will simply use seq_printf(), which works pretty much like
+printk(), but which requires the seq_file pointer as an argument.
+
+For straight character output, the following functions may be used::
+
+	seq_putc(struct seq_file *m, char c);
+	seq_puts(struct seq_file *m, const char *s);
+	seq_escape(struct seq_file *m, const char *s, const char *esc);
+
+The first two output a single character and a string, just like one would
+expect. seq_escape() is like seq_puts(), except that any character in s
+which is in the string esc will be represented in octal form in the output.
+
+There are also a pair of functions for printing filenames::
+
+	int seq_path(struct seq_file *m, const struct path *path,
+		     const char *esc);
+	int seq_path_root(struct seq_file *m, const struct path *path,
+			  const struct path *root, const char *esc)
+
+Here, path indicates the file of interest, and esc is a set of characters
+which should be escaped in the output.  A call to seq_path() will output
+the path relative to the current process's filesystem root.  If a different
+root is desired, it can be used with seq_path_root().  If it turns out that
+path cannot be reached from root, seq_path_root() returns SEQ_SKIP.
+
+A function producing complicated output may want to check::
+
+	bool seq_has_overflowed(struct seq_file *m);
+
+and avoid further seq_<output> calls if true is returned.
+
+A true return from seq_has_overflowed means that the seq_file buffer will
+be discarded and the seq_show function will attempt to allocate a larger
+buffer and retry printing.
+
+
+Making it all work
+==================
+
+So far, we have a nice set of functions which can produce output within the
+seq_file system, but we have not yet turned them into a file that a user
+can see. Creating a file within the kernel requires, of course, the
+creation of a set of file_operations which implement the operations on that
+file. The seq_file interface provides a set of canned operations which do
+most of the work. The virtual file author still must implement the open()
+method, however, to hook everything up. The open function is often a single
+line, as in the example module::
+
+	static int ct_open(struct inode *inode, struct file *file)
+	{
+		return seq_open(file, &ct_seq_ops);
+	}
+
+Here, the call to seq_open() takes the seq_operations structure we created
+before, and gets set up to iterate through the virtual file.
+
+On a successful open, seq_open() stores the struct seq_file pointer in
+file->private_data. If you have an application where the same iterator can
+be used for more than one file, you can store an arbitrary pointer in the
+private field of the seq_file structure; that value can then be retrieved
+by the iterator functions.
+
+There is also a wrapper function to seq_open() called seq_open_private(). It
+kmallocs a zero filled block of memory and stores a pointer to it in the
+private field of the seq_file structure, returning 0 on success. The
+block size is specified in a third parameter to the function, e.g.::
+
+	static int ct_open(struct inode *inode, struct file *file)
+	{
+		return seq_open_private(file, &ct_seq_ops,
+					sizeof(struct mystruct));
+	}
+
+There is also a variant function, __seq_open_private(), which is functionally
+identical except that, if successful, it returns the pointer to the allocated
+memory block, allowing further initialisation e.g.::
+
+	static int ct_open(struct inode *inode, struct file *file)
+	{
+		struct mystruct *p =
+			__seq_open_private(file, &ct_seq_ops, sizeof(*p));
+
+		if (!p)
+			return -ENOMEM;
+
+		p->foo = bar; /* initialize my stuff */
+			...
+		p->baz = true;
+
+		return 0;
+	}
+
+A corresponding close function, seq_release_private() is available which
+frees the memory allocated in the corresponding open.
+
+The other operations of interest - read(), llseek(), and release() - are
+all implemented by the seq_file code itself. So a virtual file's
+file_operations structure will look like::
+
+	static const struct file_operations ct_file_ops = {
+	        .owner   = THIS_MODULE,
+	        .open    = ct_open,
+	        .read    = seq_read,
+	        .llseek  = seq_lseek,
+	        .release = seq_release
+	};
+
+There is also a seq_release_private() which passes the contents of the
+seq_file private field to kfree() before releasing the structure.
+
+The final step is the creation of the /proc file itself. In the example
+code, that is done in the initialization code in the usual way::
+
+	static int ct_init(void)
+	{
+	        struct proc_dir_entry *entry;
+
+	        proc_create("sequence", 0, NULL, &ct_file_ops);
+	        return 0;
+	}
+
+	module_init(ct_init);
+
+And that is pretty much it.
+
+
+seq_list
+========
+
+If your file will be iterating through a linked list, you may find these
+routines useful::
+
+	struct list_head *seq_list_start(struct list_head *head,
+	       		 		 loff_t pos);
+	struct list_head *seq_list_start_head(struct list_head *head,
+			 		      loff_t pos);
+	struct list_head *seq_list_next(void *v, struct list_head *head,
+					loff_t *ppos);
+
+These helpers will interpret pos as a position within the list and iterate
+accordingly.  Your start() and next() functions need only invoke the
+``seq_list_*`` helpers with a pointer to the appropriate list_head structure.
+
+
+The extra-simple version
+========================
+
+For extremely simple virtual files, there is an even easier interface.  A
+module can define only the show() function, which should create all the
+output that the virtual file will contain. The file's open() method then
+calls::
+
+	int single_open(struct file *file,
+	                int (*show)(struct seq_file *m, void *p),
+	                void *data);
+
+When output time comes, the show() function will be called once. The data
+value given to single_open() can be found in the private field of the
+seq_file structure. When using single_open(), the programmer should use
+single_release() instead of seq_release() in the file_operations structure
+to avoid a memory leak.