Diffstat (limited to '')
-rw-r--r-- docs/source/data.rst 193
1 files changed, 193 insertions, 0 deletions
diff --git a/docs/source/data.rst b/docs/source/data.rst
new file mode 100644
index 0000000..a7352e1
--- /dev/null
+++ b/docs/source/data.rst
@@ -0,0 +1,193 @@
+
+.. _data-ext:
+
+Extracting Data
+===============
+
+**Note**: This feature is still in **BETA**; expect bugs and incompatible
+changes in the future.
+
+Log messages contain a good deal of useful data, but it's not always easy to get
+at. The log parser built into **lnav** can extract data as described by
+:ref:`log_formats` and can also discover data in plain text messages. This data
+can then be queried and processed using the SQLite front-end that is also
+incorporated into **lnav**. As an example, the following Syslog message from
+:code:`sudo` can be processed to extract several key/value pairs::
+
+ Jul 31 11:42:26 Example-MacBook-Pro.local sudo[87024]: testuser : TTY=ttys004 ; PWD=/Users/testuser/github/lbuild ; USER=root ; COMMAND=/usr/bin/make install
+
+The data that can be extracted by the parser is viewable directly in **lnav**
+by pressing the 'p' key. The results will be shown in an overlay like the
+following::
+
+ Current Time: 2013-07-31T11:42:26.000 Original Time: 2013-07-31T11:42:26.000 Offset: +0.000
+ Known message fields:
+ ├ log_hostname = Example-MacBook-Pro.local
+ ├ log_procname = sudo
+ ├ log_pid = 87024
+ Discovered message fields:
+ ├ col_0 = testuser
+ ├ TTY = ttys004
+ ├ PWD = /Users/testuser/github/lbuild
+ ├ USER = root
+ └ COMMAND = /usr/bin/make install
+
+Notice that the parser has detected pairs of the form '<key>=<value>'. The data
+parser will also look for pairs separated by a colon. If there are no clearly
+demarcated pairs, then the parser will extract anything that looks like data
+values and assign them keys of the form 'col_N'. For example, two data values,
+an IPv4 address and a symbol, will be extracted from the following log
+message::
+
+ Apr 29 08:13:43 sample-centos5 avahi-daemon[2467]: Registering new address record for 10.1.10.62 on eth0.
+
+Since there are no keys for the values in the message, the parser will assign
+'col_0' for the IP address and 'col_1' for the symbol, as seen here::
+
+ Current Time: 2013-04-29T08:13:43.000 Original Time: 2013-04-29T08:13:43.000 Offset: +0.000
+ Known message fields:
+ ├ log_hostname = sample-centos5
+ ├ log_procname = avahi-daemon
+ ├ log_pid = 2467
+ Discovered message fields:
+ ├ col_0 = 10.1.10.62
+ └ col_1 = eth0
+
+Now that you have an idea of how the parser works, you can begin to query the
+extracted data. The SQLite database engine is embedded into **lnav**, and its
+`Virtual Table <http://www.sqlite.org/vtab.html>`_ mechanism is used to expose
+the log data as tables. Each log format has its own table that can be used to
+access all of the loaded messages that are in that format. For log message
+content that is more free-form, like the examples given here, use the
+**logline** table. The **logline** table is recreated for each query and is
+based on the format and pairs discovered in the log message at the top of the
+display.
+
+Queries can be performed by pressing the semicolon (;) key in **lnav**. After
+pressing the key, the overlay showing any known or discovered fields will be
+displayed to give you an idea of what data is available. The query can be any
+`SQL query <http://sqlite.org/lang.html>`_ supported by SQLite. To make
+analysis easier, **lnav** includes many extra functions for processing strings,
+paths, and IP addresses. See :ref:`sql-ext` for more information.
+
+As an example, the simplest query to perform initially would be a "select all",
+like so:
+
+.. code-block:: sql
+
+ SELECT * FROM logline
+
+When this query is run against the second example log message given above, the
+following results are returned::
+
+ log_line  log_part  log_time                 log_idle_msecs  log_level  log_hostname    log_procname  log_pid  col_0                     col_1
+
+ 292       p.0       2013-04-11T16:42:51.000  0               info       localhost       avahi-daemon  2480     fe80::a00:27ff:fe98:7f6e  eth0
+ 293       p.0       2013-04-11T16:42:51.000  0               info       localhost       avahi-daemon  2480     10.0.2.15                 eth0
+ 330       p.0       2013-04-11T16:47:02.000  0               info       localhost       avahi-daemon  2480     fe80::a00:27ff:fe98:7f6e  eth0
+ 336       p.0       2013-04-11T16:47:02.000  0               info       localhost       avahi-daemon  2480     10.1.10.75                eth0
+ 343       p.0       2013-04-11T16:47:02.000  0               info       localhost       avahi-daemon  2480     10.1.10.75                eth0
+ 370       p.0       2013-04-11T16:59:39.000  0               info       localhost       avahi-daemon  2480     10.1.10.75                eth0
+ 377       p.0       2013-04-11T16:59:39.000  0               info       localhost       avahi-daemon  2480     10.1.10.75                eth0
+ 382       p.0       2013-04-11T16:59:41.000  0               info       localhost       avahi-daemon  2480     fe80::a00:27ff:fe98:7f6e  eth0
+ 401       p.0       2013-04-11T17:20:45.000  0               info       localhost       avahi-daemon  4247     fe80::a00:27ff:fe98:7f6e  eth0
+ 402       p.0       2013-04-11T17:20:45.000  0               info       localhost       avahi-daemon  4247     10.1.10.75                eth0
+
+ 735       p.0       2013-04-11T17:41:46.000  0               info       sample-centos5  avahi-daemon  2465     fe80::a00:27ff:fe98:7f6e  eth0
+ 736       p.0       2013-04-11T17:41:46.000  0               info       sample-centos5  avahi-daemon  2465     10.1.10.75                eth0
+ 781       p.0       2013-04-12T03:32:30.000  0               info       sample-centos5  avahi-daemon  2465     10.1.10.64                eth0
+ 788       p.0       2013-04-12T03:32:30.000  0               info       sample-centos5  avahi-daemon  2465     10.1.10.64                eth0
+ 1166      p.0       2013-04-25T10:56:00.000  0               info       sample-centos5  avahi-daemon  2467     fe80::a00:27ff:fe98:7f6e  eth0
+ 1167      p.0       2013-04-25T10:56:00.000  0               info       sample-centos5  avahi-daemon  2467     10.1.10.111               eth0
+ 1246      p.0       2013-04-26T06:06:25.000  0               info       sample-centos5  avahi-daemon  2467     10.1.10.49                eth0
+ 1253      p.0       2013-04-26T06:06:25.000  0               info       sample-centos5  avahi-daemon  2467     10.1.10.49                eth0
+ 1454      p.0       2013-04-28T06:53:55.000  0               info       sample-centos5  avahi-daemon  2467     10.1.10.103               eth0
+ 1461      p.0       2013-04-28T06:53:55.000  0               info       sample-centos5  avahi-daemon  2467     10.1.10.103               eth0
+
+ 1497      p.0       2013-04-29T08:13:43.000  0               info       sample-centos5  avahi-daemon  2467     10.1.10.62                eth0
+ 1504      p.0       2013-04-29T08:13:43.000  0               info       sample-centos5  avahi-daemon  2467     10.1.10.62                eth0
+
+Note that **lnav** is not returning results for all messages that are in this
+syslog file. Rather, it searches for messages that match the format of the
+given line and returns only those messages in the results. In this case, that
+format is "Registering new address record for <IP> on <symbol>", which
+corresponds to the parts of the message that were not recognized as data.
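+
+The same approach applies to messages that contain named pairs. As a sketch,
+assuming the discovered keys are exposed as column names in the same way that
+'col_0' and 'col_1' are above, the following query could be run with the
+:code:`sudo` message from the first example at the top of the display:
+
+.. code-block:: sql
+
+ -- Assumes USER, COMMAND, and PWD appear as columns of the logline table.
+ SELECT log_time, USER, COMMAND, PWD FROM logline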
+
+More sophisticated queries can be done, of course. For example, to find out the
+frequency of IP addresses mentioned in these messages, you can run:
+
+.. code-block:: sql
+
+ SELECT col_0,count(*) FROM logline GROUP BY col_0
+
+The results for this query are::
+
+ col_0                     count(*)
+
+ 10.0.2.15                 1
+ 10.1.10.49                2
+ 10.1.10.62                2
+ 10.1.10.64                2
+ 10.1.10.75                6
+ 10.1.10.103               2
+ 10.1.10.111               1
+ fe80::a00:27ff:fe98:7f6e  6
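+
+To order the output by frequency directly in SQL, a variation on the query
+above might look like the following. This is only a sketch that relies on
+standard SQLite syntax; the 'hits' alias is an arbitrary name chosen for
+illustration:
+
+.. code-block:: sql
+
+ -- Most frequently mentioned addresses first.
+ SELECT col_0, count(*) AS hits FROM logline GROUP BY col_0 ORDER BY hits DESC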
+
+Since this type of query is fairly common, **lnav** includes a "summarize"
+command that will compute the frequencies of identifiers as well as min, max,
+average, median, and standard deviation for numeric columns. In this case, you
+can run the following to compute the frequencies and return an ordered set of
+results::
+
+ :summarize col_0
+
+
+Recognized Data Types
+---------------------
+
+When searching for data to extract from log messages, **lnav** looks for the
+following set of patterns:
+
+
+Strings
+ Single- and double-quoted strings. Example: "The quick brown fox."
+
+URLs
+ URLs that contain the '://' separator. Example: http://example.com
+
+Paths
+ File system paths. Examples: /path/to/file, ./relative/path
+
+MAC Addresses
+ Ethernet MAC addresses. Example: c4:2c:03:0e:e4:4a
+
+Hex Dumps
+ A colon-separated string of hex numbers. Example: e8:06:88:ff
+
+Date/Time
+ Date and time stamps of the form "YYYY-mm-DD" and "HH:MM:SS".
+
+IP Addresses
+ IPv4 and IPv6 addresses. Examples: 127.0.0.1, fe80::c62c:3ff:fe0e:e44a%en0
+
+UUID
+ The common formatting for 128-bit UUIDs. Example:
+ 0E305E39-F1E9-4DE4-B10B-5829E5DF54D0
+
+Version Numbers
+ Dot-separated version numbers. Example: 3.7.17
+
+Numbers
+ Numbers in base ten, hex, and octal formats. Examples: 1234, 0xbeef, 0777
+
+E-Mail Addresses
+ Strings that resemble an e-mail address. Example: gary@example.com
+
+Constants
+ Common constants in programming languages, such as: true, false, null, None.
+
+Symbols
+ Words that follow the common conventions for symbols in programming
+ languages, such as names written in all capital letters or names separated
+ by colons. Examples: SOME_CONSTANT_VALUE, namespace::value