summaryrefslogtreecommitdiffstats
path: root/src/prompt_toolkit/contrib/regular_languages/__init__.py
blob: c947fd533771205313736267a4a03a51c5e3d464 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
r"""
Tool for expressing the grammar of an input as a regular language.
==================================================================

The grammar for the input of many simple command line interfaces can be
expressed by a regular language. Examples are PDB (the Python debugger); a
simple (bash-like) shell with "pwd", "cd", "cat" and "ls" commands; arguments
that you can pass to an executable; etc. It is possible to use regular
expressions for validation and parsing of such a grammar. (More about regular
languages: http://en.wikipedia.org/wiki/Regular_language)

Example
-------

Let's take the pwd/cd/cat/ls example. We want to have a shell that accepts
these three commands. "cd" is followed by a quoted directory name and "cat" is
followed by a quoted file name. (We allow quotes inside the filename when
they're escaped with a backslash.) We could define the grammar using the
following regular expression::

    grammar = \s* (
        pwd |
        ls |
        (cd  \s+ " ([^"]|\.)+ ") |
        (cat \s+ " ([^"]|\.)+ ")
    ) \s*


What can we do with this grammar?
---------------------------------

- Syntax highlighting: We could use this for instance to give file names
                       different color.
- Parse the result: .. We can extract the file names and commands by using a
                       regular expression with named groups.
- Input validation: .. Don't accept anything that does not match this grammar.
                       When combined with a parser, we can also recursively do
                       filename validation (and accept only existing files.)
- Autocompletion: .... Each part of the grammar can have its own autocompleter.
                       "cat" has to be completed using file names, while "cd"
                       has to be completed using directory names.

How does it work?
-----------------

As a user of this library, you have to define the grammar of the input as a
regular expression. The parts of this grammar where autocompletion, validation
or any other processing is required need to be marked using a regex named
group. Like ``(?P<varname>...)`` for instance.

When the input is processed for validation (for instance), the regex will
execute, the named group is captured, and the validator associated with this
named group will test the captured string.

There is one tricky bit:

    Often we operate on incomplete input (this is by definition the case for
    autocompletion) and we have to decide for the cursor position in which
    possible state the grammar it could be and in which way variables could be
    matched up to that point.

To solve this problem, the compiler takes the original regular expression and
translates it into a set of other regular expressions which each match certain
prefixes of the original regular expression. We generate one prefix regular
expression for every named variable (with this variable being the end of that
expression).


TODO: some examples of:
    - How to create a highlighter from this grammar.
    - How to create a validator from this grammar.
    - How to create an autocompleter from this grammar.
    - How to create a parser from this grammar.
"""
from __future__ import annotations

from .compiler import compile

__all__ = ["compile"]