diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 11:33:32 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 11:33:32 +0000 |
commit | 1f403ad2197fc7442409f434ee574f3e6b46fb73 (patch) | |
tree | 0299c6dd11d5edfa918a29b6456bc1875f1d288c /doc/docs/formatterdevelopment.rst | |
parent | Initial commit. (diff) | |
download | pygments-1f403ad2197fc7442409f434ee574f3e6b46fb73.tar.xz pygments-1f403ad2197fc7442409f434ee574f3e6b46fb73.zip |
Adding upstream version 2.14.0+dfsg.upstream/2.14.0+dfsgupstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/docs/formatterdevelopment.rst')
-rw-r--r-- | doc/docs/formatterdevelopment.rst | 169 |
1 files changed, 169 insertions, 0 deletions
diff --git a/doc/docs/formatterdevelopment.rst b/doc/docs/formatterdevelopment.rst new file mode 100644 index 0000000..2bfac05 --- /dev/null +++ b/doc/docs/formatterdevelopment.rst @@ -0,0 +1,169 @@ +.. -*- mode: rst -*- + +======================== +Write your own formatter +======================== + +As well as creating :doc:`your own lexer <lexerdevelopment>`, writing a new +formatter for Pygments is easy and straightforward. + +A formatter is a class that is initialized with some keyword arguments (the +formatter options) and that must provides a `format()` method. +Additionally a formatter should provide a `get_style_defs()` method that +returns the style definitions from the style in a form usable for the +formatter's output format. + + +Quickstart +========== + +The most basic formatter shipped with Pygments is the `NullFormatter`. It just +sends the value of a token to the output stream: + +.. sourcecode:: python + + from pygments.formatter import Formatter + + class NullFormatter(Formatter): + def format(self, tokensource, outfile): + for ttype, value in tokensource: + outfile.write(value) + +As you can see, the `format()` method is passed two parameters: `tokensource` +and `outfile`. The first is an iterable of ``(token_type, value)`` tuples, +the latter a file like object with a `write()` method. + +Because the formatter is that basic it doesn't overwrite the `get_style_defs()` +method. + + +Styles +====== + +Styles aren't instantiated but their metaclass provides some class functions +so that you can access the style definitions easily. + +Styles are iterable and yield tuples in the form ``(ttype, d)`` where `ttype` +is a token and `d` is a dict with the following keys: + +``'color'`` + Hexadecimal color value (eg: ``'ff0000'`` for red) or `None` if not + defined. + +``'bold'`` + `True` if the value should be bold + +``'italic'`` + `True` if the value should be italic + +``'underline'`` + `True` if the value should be underlined + +``'bgcolor'`` + Hexadecimal color value for the background (eg: ``'eeeeeee'`` for light + gray) or `None` if not defined. + +``'border'`` + Hexadecimal color value for the border (eg: ``'0000aa'`` for a dark + blue) or `None` for no border. + +Additional keys might appear in the future, formatters should ignore all keys +they don't support. + + +HTML 3.2 Formatter +================== + +For an more complex example, let's implement a HTML 3.2 Formatter. We don't +use CSS but inline markup (``<u>``, ``<font>``, etc). Because this isn't good +style this formatter isn't in the standard library ;-) + +.. sourcecode:: python + + from pygments.formatter import Formatter + + class OldHtmlFormatter(Formatter): + + def __init__(self, **options): + Formatter.__init__(self, **options) + + # create a dict of (start, end) tuples that wrap the + # value of a token so that we can use it in the format + # method later + self.styles = {} + + # we iterate over the `_styles` attribute of a style item + # that contains the parsed style values. + for token, style in self.style: + start = end = '' + # a style item is a tuple in the following form: + # colors are readily specified in hex: 'RRGGBB' + if style['color']: + start += '<font color="#%s">' % style['color'] + end = '</font>' + end + if style['bold']: + start += '<b>' + end = '</b>' + end + if style['italic']: + start += '<i>' + end = '</i>' + end + if style['underline']: + start += '<u>' + end = '</u>' + end + self.styles[token] = (start, end) + + def format(self, tokensource, outfile): + # lastval is a string we use for caching + # because it's possible that an lexer yields a number + # of consecutive tokens with the same token type. + # to minimize the size of the generated html markup we + # try to join the values of same-type tokens here + lastval = '' + lasttype = None + + # wrap the whole output with <pre> + outfile.write('<pre>') + + for ttype, value in tokensource: + # if the token type doesn't exist in the stylemap + # we try it with the parent of the token type + # eg: parent of Token.Literal.String.Double is + # Token.Literal.String + while ttype not in self.styles: + ttype = ttype.parent + if ttype == lasttype: + # the current token type is the same of the last + # iteration. cache it + lastval += value + else: + # not the same token as last iteration, but we + # have some data in the buffer. wrap it with the + # defined style and write it to the output file + if lastval: + stylebegin, styleend = self.styles[lasttype] + outfile.write(stylebegin + lastval + styleend) + # set lastval/lasttype to current values + lastval = value + lasttype = ttype + + # if something is left in the buffer, write it to the + # output file, then close the opened <pre> tag + if lastval: + stylebegin, styleend = self.styles[lasttype] + outfile.write(stylebegin + lastval + styleend) + outfile.write('</pre>\n') + +The comments should explain it. Again, this formatter doesn't override the +`get_style_defs()` method. If we would have used CSS classes instead of +inline HTML markup, we would need to generate the CSS first. For that +purpose the `get_style_defs()` method exists: + + +Generating Style Definitions +============================ + +Some formatters like the `LatexFormatter` and the `HtmlFormatter` don't +output inline markup but reference either macros or css classes. Because +the definitions of those are not part of the output, the `get_style_defs()` +method exists. It is passed one parameter (if it's used and how it's used +is up to the formatter) and has to return a string or ``None``. |