author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-29 04:24:24 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-29 04:24:24 +0000 |
commit | 12e8343068b906f8b2afddc5569968a8a91fa5b0 | |
tree | 75cc5e05a4392ea0292251898f992a15a16b172b /docs | |
parent | Initial commit. | |
Adding upstream version 2.1.0. (refs: upstream/2.1.0, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'docs')
-rw-r--r-- | docs/.gitignore | 1 |
-rw-r--r-- | docs/Makefile | 28 |
-rw-r--r-- | docs/_static/custom.css | 5 |
-rw-r--r-- | docs/architecture.md | 176 |
-rw-r--r-- | docs/conf.py | 150 |
-rw-r--r-- | docs/contributing.md | 108 |
-rw-r--r-- | docs/index.md | 41 |
-rw-r--r-- | docs/other.md | 66 |
-rw-r--r-- | docs/plugins.md | 50 |
-rw-r--r-- | docs/using.md | 399 |
10 files changed, 1024 insertions, 0 deletions
diff --git a/docs/.gitignore b/docs/.gitignore new file mode 100644 index 0000000..fa65608 --- /dev/null +++ b/docs/.gitignore @@ -0,0 +1 @@ +*.ipynb diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 0000000..d3e262d --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,28 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = . +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +# raise warnings to errors +html-strict: + @$(SPHINXBUILD) -b html -nW --keep-going "$(SOURCEDIR)" "$(BUILDDIR)/html" $(SPHINXOPTS) $(O) + +# increase logging level to verbose +html-verbose: + @$(SPHINXBUILD) -b html -v "$(SOURCEDIR)" "$(BUILDDIR)/html" $(SPHINXOPTS) $(O) diff --git a/docs/_static/custom.css b/docs/_static/custom.css new file mode 100644 index 0000000..9a16010 --- /dev/null +++ b/docs/_static/custom.css @@ -0,0 +1,5 @@ +.code-cell > .highlight > pre { + border-left-color: green; + border-left-width: medium; + border-left-style: solid; +} diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..bebcf9d --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,176 @@ +(md/architecture)= + +# markdown-it design principles + +(md/data-flow)= +## Data flow + +Input data is parsed via nested chains of rules. There are 3 nested chains - +`core`, `block` & `inline`: + +``` +core + core.rule1 (normalize) + ... + core.ruleX + + block + block.rule1 (blockquote) + ... 
+ block.ruleX + + core.ruleX1 (intermediate rule that applies on block tokens, nothing yet) + ... + core.ruleXX + + inline (applied to each block token with "inline" type) + inline.rule1 (text) + ... + inline.ruleX + + core.ruleYY (applies to all tokens) + ... (abbreviation, footnote, typographer, linkifier) + +``` + +The result of the parsing is a *list of tokens*, that will be passed to the `renderer` to generate the html content. + +These tokens can be themselves parsed again to generate more tokens (ex: a `list token` can be divided into multiple `inline tokens`). + +An `env` sandbox can be used alongside tokens to inject external variables for your parsers and renderers. + +Each chain (core / block / inline) uses an independent `state` object when parsing data, so that each parsing operation is independent and can be disabled on the fly. + + +## Token stream + +Instead of traditional AST we use more low-level data representation - *tokens*. +The difference is simple: + +- Tokens are a simple sequence (Array). +- Opening and closing tags are separate. +- There are special token objects, "inline containers", having nested tokens. + sequences with inline markup (bold, italic, text, ...). + +See [token class](https://github.com/executablebooks/markdown-it-py/tree/master/markdown_it/token.py) +for details about each token content. + +In total, a token stream is: + +- On the top level - array of paired or single "block" tokens: + - open/close for headers, lists, blockquotes, paragraphs, ... + - codes, fenced blocks, horizontal rules, html blocks, inlines containers +- Each inline token have a `.children` property with a nested token stream for inline content: + - open/close for strong, em, link, code, ... + - text, line breaks + +Why not AST? Because it's not needed for our tasks. We follow KISS principle. +If you wish - you can call a parser without a renderer and convert the token stream +to an AST. 
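The conversion from a flat token stream to a tree, mentioned above, can be sketched in plain Python. This is only an illustration under stated assumptions: `Tok`, `Node` and `build_tree` are hypothetical names invented here, not part of markdown-it-py, and `Tok` keeps just the `type`, `nesting` and `content` fields. The package documents its own `SyntaxTreeNode` class for real use.

```python
from dataclasses import dataclass, field


@dataclass
class Tok:
    """Hypothetical stand-in for a token (not the real Token class)."""
    type: str
    nesting: int  # 1 = opening, 0 = self-contained, -1 = closing
    content: str = ""


@dataclass
class Node:
    """One tree node; an open/close token pair collapses into a single node."""
    type: str
    content: str = ""
    children: list = field(default_factory=list)


def build_tree(tokens):
    """Fold a flat token sequence into a tree using only the nesting values."""
    root = Node("root")
    stack = [root]
    for tok in tokens:
        if tok.nesting == 1:
            # "paragraph_open" becomes a "paragraph" node that collects children.
            name = tok.type[: -len("_open")] if tok.type.endswith("_open") else tok.type
            node = Node(name)
            stack[-1].children.append(node)
            stack.append(node)
        elif tok.nesting == -1:
            if len(stack) == 1:
                raise ValueError("closing token without a matching opening token")
            stack.pop()
        else:
            stack[-1].children.append(Node(tok.type, tok.content))
    if len(stack) != 1:
        raise ValueError("opening token without a matching closing token")
    return root


# A hand-written stream shaped like the one produced for "> a *quote*".
tokens = [
    Tok("blockquote_open", 1),
    Tok("paragraph_open", 1),
    Tok("inline", 0, "a *quote*"),
    Tok("paragraph_close", -1),
    Tok("blockquote_close", -1),
]
tree = build_tree(tokens)
print(tree.children[0].type, "->", tree.children[0].children[0].type)
```

The stack mirrors the rule that every opening token is eventually closed, so the running sum of `nesting` values is the current depth in the tree.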
+ +More details about tokens: + +- [Renderer source](https://github.com/executablebooks/markdown-it-py/tree/master/markdown_it/renderer.py) +- [Token source](https://github.com/executablebooks/markdown-it-py/tree/master/markdown_it/token.py) +- [Live demo](https://markdown-it.github.io/) - type your text and click `debug` tab. + + +## Rules + +Rules are functions, doing "magic" with parser `state` objects. A rule is associated with one or more *chains* and is unique. For instance, a `blockquote` token is associated with `blockquote`, `paragraph`, `heading` and `list` chains. + +Rules are managed by names via [Ruler](https://markdown-it.github.io/markdown-it/#Ruler) instances and can be `enabled` / `disabled` from the [MarkdownIt](https://markdown-it.github.io/markdown-it/#MarkdownIt) methods. + +You can note, that some rules have a `validation mode` - in this mode rules do not +modify the token stream, and only look ahead for the end of a token. It's one +important design principle - a token stream is "write only" on block & inline parse stages. + +Parsers are designed to keep rules independent of each other. You can safely enable/disable them, or +add new ones. There are no universal recipes for how to create new rules - design of +distributed state machines with good data isolation is a tricky business. But you +can investigate existing rules & plugins to see possible approaches. + +Also, in complex cases you can try to ask for help in tracker. Condition is very +simple - it should be clear from your ticket, that you studied docs, sources, +and tried to do something yourself. We never reject with help to real developers. + + +## Renderer + +After the token stream is generated, it's passed to a [renderer](https://github.com/executablebooks/markdown-it-py/tree/master/markdown_it/renderer.py). +It then plays all the tokens, passing each to a rule with the same name as token type. 
+ +Renderer rules are located in `md.renderer.rules[name]` and are simple functions +with the same signature: + +```python +def function(renderer, tokens, idx, options, env): + return htmlResult +``` + +In many cases that allows easy output change even without parser intrusion. +For example, let's replace images with vimeo links to player's iframe: + +```python +import re +md = MarkdownIt("commonmark") + +vimeoRE = re.compile(r'^https?:\/\/(www\.)?vimeo.com\/(\d+)($|\/)') + +def render_vimeo(self, tokens, idx, options, env): + token = tokens[idx] + + if vimeoRE.match(token.attrs["src"]): + + ident = vimeoRE.match(token.attrs["src"])[2] + + return ('<div class="embed-responsive embed-responsive-16by9">\n' + + ' <iframe class="embed-responsive-item" src="//player.vimeo.com/video/' + + ident + '"></iframe>\n' + + '</div>\n') + return self.image(tokens, idx, options, env) + +md = MarkdownIt("commonmark") +md.add_render_rule("image", render_vimeo) +print(md.render("![](https://www.vimeo.com/123)")) +``` + +Here is another example, how to add `target="_blank"` to all links: + +```python +from markdown_it import MarkdownIt + +def render_blank_link(self, tokens, idx, options, env): + tokens[idx].attrSet("target", "_blank") + + # pass token to default renderer. + return self.renderToken(tokens, idx, options, env) + +md = MarkdownIt("commonmark") +md.add_render_rule("link_open", render_blank_link) +print(md.render("[a]\n\n[a]: b")) +``` + +Note, if you need to add attributes, you can do things without renderer override. +For example, you can update tokens in `core` chain. That is slower, than direct +renderer override, but can be more simple. + +You also can write your own renderer to generate other formats than HTML, such as +JSON/XML... You can even use it to generate AST. + +## Summary + +This was mentioned in [Data flow](md/data-flow), but let's repeat sequence again: + +1. Blocks are parsed, and top level of token stream filled with block tokens. +2. 
Content on inline containers is parsed, filling `.children` properties. +3. Rendering happens. + +And somewhere between you can apply additional transformations :) . Full content +of each chain can be seen on the top of +[parser_core.py](https://github.com/executablebooks/markdown-it-py/tree/master/markdown_it/parser_core.py), +[parser_block.py](https://github.com/executablebooks/markdown-it-py/tree/master/markdown_it/parser_block.py) and +[parser_inline.py](https://github.com/executablebooks/markdown-it-py/tree/master/markdown_it/parser_inline.py) +files. + +Also you can change output directly in [renderer](https://github.com/executablebooks/markdown-it-py/tree/master/markdown_it/renderer.py) for many simple cases. diff --git a/docs/conf.py b/docs/conf.py new file mode 100644 index 0000000..786eff0 --- /dev/null +++ b/docs/conf.py @@ -0,0 +1,150 @@ +# Configuration file for the Sphinx documentation builder. +# +# This file only contains a selection of the most common options. For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +# +from glob import glob +import os + +# import sys +# sys.path.insert(0, os.path.abspath('.')) + + +# -- Project information ----------------------------------------------------- + +project = "markdown-it-py" +copyright = "2020, executable book project" +author = "executable book project" + + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. 
+extensions = [ + "sphinx.ext.autodoc", + "sphinx.ext.viewcode", + "sphinx.ext.intersphinx", + "myst_parser", + "sphinx_copybutton", + "sphinx_design", +] + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"] + +nitpicky = True +nitpick_ignore = [ + ("py:class", "Match"), + ("py:class", "Path"), + ("py:class", "x in the interval [0, 1)."), + ("py:class", "markdown_it.helpers.parse_link_destination._Result"), + ("py:class", "markdown_it.helpers.parse_link_title._Result"), + ("py:class", "MarkdownIt"), + ("py:class", "RuleFunc"), + ("py:class", "_NodeType"), + ("py:class", "typing_extensions.Protocol"), +] + + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_title = "markdown-it-py" +html_theme = "sphinx_book_theme" +html_theme_options = { + "use_edit_page_button": True, + "repository_url": "https://github.com/executablebooks/markdown-it-py", + "repository_branch": "master", + "path_to_docs": "docs", +} +html_static_path = ["_static"] +html_css_files = ["custom.css"] + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". 
+# html_static_path = ["_static"] + + +intersphinx_mapping = { + "python": ("https://docs.python.org/3.7", None), + "mdit-py-plugins": ("https://mdit-py-plugins.readthedocs.io/en/latest/", None), +} + + +def run_apidoc(app): + """generate apidoc + + See: https://github.com/rtfd/readthedocs.org/issues/1139 + """ + import os + import shutil + + import sphinx + from sphinx.ext import apidoc + + logger = sphinx.util.logging.getLogger(__name__) + logger.info("running apidoc") + # get correct paths + this_folder = os.path.abspath(os.path.dirname(os.path.realpath(__file__))) + api_folder = os.path.join(this_folder, "api") + module_path = os.path.normpath(os.path.join(this_folder, "../")) + ignore_paths = ["../profiler.py", "../conftest.py", "../tests", "../benchmarking"] + ignore_paths = [ + os.path.normpath(os.path.join(this_folder, p)) for p in ignore_paths + ] + # functions from these modules are all imported in the __init__.py with __all__ + for rule in ("block", "core", "inline"): + for path in glob( + os.path.normpath( + os.path.join(this_folder, f"../markdown_it/rules_{rule}/*.py") + ) + ): + if os.path.basename(path) not in ("__init__.py", f"state_{rule}.py"): + ignore_paths.append(path) + + if os.path.exists(api_folder): + shutil.rmtree(api_folder) + os.mkdir(api_folder) + + argv = ["-M", "--separate", "-o", api_folder, module_path] + ignore_paths + + apidoc.OPTIONS.append("ignore-module-all") + apidoc.main(argv) + + # we don't use this + if os.path.exists(os.path.join(api_folder, "modules.rst")): + os.remove(os.path.join(api_folder, "modules.rst")) + + +def setup(app): + """Add functions to the Sphinx setup.""" + if os.environ.get("SKIP_APIDOC", None) is None: + app.connect("builder-inited", run_apidoc) + + from sphinx.directives.code import CodeBlock + + class CodeCell(CodeBlock): + """Custom code block directive.""" + + def run(self): + """Run the directive.""" + self.options["class"] = ["code-cell"] + return super().run() + + # note, these could be run by 
myst-nb, + # but currently this causes a circular dependency issue + app.add_directive("code-cell", CodeCell) diff --git a/docs/contributing.md b/docs/contributing.md new file mode 100644 index 0000000..6c43e0e --- /dev/null +++ b/docs/contributing.md @@ -0,0 +1,108 @@ +# Contribute to markdown-it-py + +We welcome all contributions! ✨ + +See the [EBP Contributing Guide](https://executablebooks.org/en/latest/contributing.html) for general details, and below for guidance specific to markdown-it-py. + +Before continuing, make sure you've read: + +1. [Architecture description](md/architecture) +2. [Security considerations](md/security) +3. [API documentation](api/markdown_it) + +## Development guidance + +Details of the port can be found in the `markdown_it/port.yaml` and in `port.yaml` files, within the extension folders. + +## Code Style + +Code style is tested using [flake8](http://flake8.pycqa.org), with the configuration set in `.flake8`, and code formatted with [black](https://github.com/ambv/black). + +Installing with `markdown-it-py[code_style]` makes the [pre-commit](https://pre-commit.com/) package available, which will ensure this style is met before commits are submitted, by reformatting the code and testing for lint errors. +It can be setup by: + +```shell +>> cd markdown-it-py +>> pre-commit install +``` + +Editors like VS Code also have automatic code reformat utilities, which can adhere to this standard. + +All functions and class methods should be annotated with types and include a docstring. 
+ +## Testing + +For code tests, markdown-it-py uses [pytest](https://docs.pytest.org)): + +```shell +>> cd markdown-it-py +>> pytest +``` + +You can also use [tox](https://tox.readthedocs.io), to run the tests in multiple isolated environments (see the `tox.ini` file for available test environments): + +```shell +>> cd markdown-it-py +>> tox -p +``` + +This can also be used to run benchmarking tests using [pytest-benchmark](https://pytest-benchmark.readthedocs.io): + +```shell +>> cd markdown-it-py +tox -e py38-bench-packages -- --benchmark-min-rounds 50 +``` + +For documentation build tests: + +```shell +>> cd markdown-it-py/docs +>> make clean +>> make html-strict +``` + +## Contributing a plugin + +1. Does it already exist as JavaScript implementation ([see npm](https://www.npmjs.com/search?q=keywords:markdown-it-plugin))? + Where possible try to port directly from that. + It is usually better to modify existing code, instead of writing all from scratch. +2. Try to find the right place for your plugin rule: + - Will it conflict with existing markup (by priority)? + - If yes - you need to write an inline or block rule. + - If no - you can morph tokens within core chains. + - Remember that token morphing in core chains is always more simple than writing + block or inline rules, if you don't copy existing ones. However, + block and inline rules are usually faster. + - Sometimes, it's enough to only modify the renderer, for example, to add + header IDs or `target="_blank"` for the links. + +## FAQ + +### I need async rule, how to do it? + +Sorry. You can't do it directly. All complex parsers are sync by nature. But you +can use workarounds: + +1. On parse phase, replace content by random number and store it in `env`. +2. Do async processing over collected data. +3. Render content and replace those random numbers with text; or replace first, then render. 
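The three steps above can be sketched with the standard library alone. Everything here is an assumption made for illustration: the `{{key}}` marker syntax, the function names, and `lookup` (a stand-in for real asynchronous work) are hypothetical, and a regular expression substitution stands in for a real parser rule.

```python
import asyncio
import re
import secrets

# Hypothetical marker syntax, used only for this illustration.
MARKER = re.compile(r"\{\{(\w+)\}\}")


def parse_phase(text, env):
    """Step 1: swap each marker for a random placeholder and record it in env."""
    def stash(match):
        placeholder = "ASYNC" + secrets.token_hex(8)
        env.setdefault("pending", {})[placeholder] = match.group(1)
        return placeholder
    return MARKER.sub(stash, text)


async def lookup(key):
    """Stand-in for real async work (a network call, a database query, ...)."""
    await asyncio.sleep(0)
    return key.upper()


async def resolve(env):
    """Step 2: run the async work for everything collected during parsing."""
    pending = env.get("pending", {})
    results = await asyncio.gather(*(lookup(key) for key in pending.values()))
    # gather() keeps argument order, so results line up with the placeholders.
    return dict(zip(pending, results))


def fill(text, resolved):
    """Step 3: put the finished text where the placeholders were."""
    for placeholder, value in resolved.items():
        text = text.replace(placeholder, value)
    return text


env = {}
staged = parse_phase("hello {{world}} and {{moon}}", env)
final = fill(staged, asyncio.run(resolve(env)))
print(final)
```

In a real pipeline the substitution in step 3 would more safely happen on token content before rendering, so that the renderer escapes the inserted text; filling placeholders into already rendered HTML needs the same care as any other untrusted output.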
+ +Alternatively, you can render HTML, then parse it to DOM, or +[cheerio](https://github.com/cheeriojs/cheerio) AST, and apply transformations +in a more convenient way. + +### How to replace part of text token with link? + +The right sequence is to split text to several tokens and add link tokens in between. +The result will be: `text` + `link_open` + `text` + `link_close` + `text`. + +See implementations of [linkify](https://github.com/markdown-it/markdown-it/blob/master/lib/rules_core/linkify.js) and [emoji](https://github.com/markdown-it/markdown-it-emoji/blob/master/lib/replace.js) - those do text token splits. + +__Note:__ Don't try to replace text with HTML markup! That's not secure. + +### Why is my inline rule not executed? + +The inline parser skips pieces of texts to optimize speed. It stops only on [a small set of chars](https://github.com/markdown-it/markdown-it/blob/master/lib/rules_inline/text.js), which can be tokens. We did not made this list extensible for performance reasons too. + +If you are absolutely sure that something important is missing there - create a +ticket and we will consider adding it as a new charcode. diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..64fd344 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,41 @@ +# markdown-it-py + +> Markdown parser done right. + +- {fa}`check,text-success mr-1` Follows the __[CommonMark spec](http://spec.commonmark.org/)__ for baseline parsing +- {fa}`check,text-success mr-1` Configurable syntax: you can add new rules and even replace existing ones. +- {fa}`check,text-success mr-1` Pluggable: Adds syntax extensions to extend the parser (see the [plugin list](md/plugins)) +- {fa}`check,text-success mr-1` High speed (see our [benchmarking tests](md/performance)) +- {fa}`check,text-success mr-1` [Safe by default](md/security) + +For a good introduction to [markdown-it] see the __[Live demo](https://markdown-it.github.io)__. 
+This is a Python port of the well used [markdown-it], and some of its associated plugins. +The driving design philosophy of the port has been to change as little of the fundamental code structure (file names, function name, etc) as possible, just sprinkling in a little Python syntactical sugar ✨. +It is very simple to write complimentary extensions for both language implementations! + +## References & Thanks + +Big thanks to the authors of [markdown-it] + +- Alex Kocharin [github/rlidwka](https://github.com/rlidwka) +- Vitaly Puzrin [github/puzrin](https://github.com/puzrin) + +Also [John MacFarlane](https://github.com/jgm) for his work on the CommonMark spec and reference implementations. + +## Related Links + +- <https://github.com/jgm/CommonMark> - reference CommonMark implementations in C & JS, also contains latest spec & online demo. +- <http://talk.commonmark.org> - CommonMark forum, good place to collaborate developers' efforts. + +```{toctree} +:maxdepth: 2 + +using +architecture +other +plugins +contributing +api/markdown_it +``` + +[markdown-it]: https://github.com/markdown-it/markdown-it diff --git a/docs/other.md b/docs/other.md new file mode 100644 index 0000000..4d77360 --- /dev/null +++ b/docs/other.md @@ -0,0 +1,66 @@ +(md/security)= + +# Security + +Many people don't understand that markdown format does not care much about security. +In many cases you have to pass output to sanitizers. +`markdown-it` provides 2 possible strategies to produce safe output: + +1. Don't enable HTML. Extend markup features with [plugins](md/plugins). + We think it's the best choice and use it by default. + - That's ok for 99% of user needs. + - Output will be safe without sanitizer. +2. Enable HTML and use external sanitizer package(s). + +Also by default `markdown-it` prohibits some kind of links, which could be used +for XSS: + +- `javascript:`, `vbscript:` +- `file:` +- `data:`, except some images (gif/png/jpeg/webp). + +So, by default `markdown-it` should be safe. 
We care about it. + +If you find a security problem - contact us via tracker or email. +Such reports are fixed with top priority. + +## Plugins + +Usually, plugins operate with tokenized content, and that's enough to provide safe output. + +But there is one non-evident case you should know - don't allow plugins to generate arbitrary element `id` and `name`. +If those depend on user input - always add prefixes to avoid DOM clobbering. +See [discussion](https://github.com/markdown-it/markdown-it/issues/28) for details. + +So, if you decide to use plugins that add extended class syntax or autogenerating header anchors - be careful. + +(md/performance)= + +# Performance + +You can view our continuous integration benchmarking analysis at: <https://executablebooks.github.io/markdown-it-py/dev/bench/>, +or you can run it for yourself within the repository: + +```console +$ tox -e py38-bench-packages -- --benchmark-columns mean,stddev + +Name (time in ms) Mean StdDev +--------------------------------------------------------------- +test_mistune 70.3272 (1.0) 0.7978 (1.0) +test_mistletoe 116.0919 (1.65) 6.2870 (7.88) +test_markdown_it_py 152.9022 (2.17) 4.2988 (5.39) +test_commonmark_py 326.9506 (4.65) 15.8084 (19.81) +test_pymarkdown 368.2712 (5.24) 7.5906 (9.51) +test_pymarkdown_extra 640.4913 (9.11) 15.1769 (19.02) +test_panflute 678.3547 (9.65) 9.4622 (11.86) +--------------------------------------------------------------- +``` + +As you can see, `markdown-it-py` doesn't pay with speed for it's flexibility. + +```{note} +`mistune` is not CommonMark compliant, which is what allows for its +faster parsing, at the expense of issues, for example, with nested inline parsing. +See [mistletoes's explanation](https://github.com/miyuchina/mistletoe/blob/master/performance.md) +for further details. 
+``` diff --git a/docs/plugins.md b/docs/plugins.md new file mode 100644 index 0000000..51a2fa6 --- /dev/null +++ b/docs/plugins.md @@ -0,0 +1,50 @@ +(md/plugins)= + +# Plugin extensions + +The following plugins are embedded within the core package: + +- [tables](https://help.github.com/articles/organizing-information-with-tables/) (GFM) +- [strikethrough](https://help.github.com/articles/basic-writing-and-formatting-syntax/#styling-text) (GFM) + +These can be enabled individually: + +```python +from markdown_it import MarkdownIt +md = MarkdownIt("commonmark").enable('table') +``` + +or as part of a configuration: + +```python +from markdown_it import MarkdownIt +md = MarkdownIt("gfm-like") +``` + +```{seealso} +See [](using.md) +``` + +Many other plugins are then available *via* the `mdit-py-plugins` package, including: + +- Front-matter +- Footnotes +- Definition lists +- Task lists +- Heading anchors +- LaTeX math +- Containers +- Word count + +For full information see: <https://mdit-py-plugins.readthedocs.io> + +Or you can write them yourself! + +They can be chained and loaded *via*: + +```python +from markdown_it import MarkdownIt +from mdit_py_plugins import plugin1, plugin2 +md = MarkdownIt().use(plugin1, keyword=value).use(plugin2, keyword=value) +html_string = md.render("some *Markdown*") +``` diff --git a/docs/using.md b/docs/using.md new file mode 100644 index 0000000..8387203 --- /dev/null +++ b/docs/using.md @@ -0,0 +1,399 @@ +--- +jupytext: + formats: ipynb,md:myst + text_representation: + extension: .md + format_name: myst + format_version: '0.8' + jupytext_version: 1.4.2 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Using `markdown_it` + +> This document can be opened to execute with [Jupytext](https://jupytext.readthedocs.io)! + +markdown-it-py may be used as an API *via* the [`markdown-it-py`](https://pypi.org/project/markdown-it-py/) package. 
+ +The raw text is first parsed to syntax 'tokens', +then these are converted to other formats using 'renderers'. + ++++ + +## Quick-Start + +The simplest way to understand how text will be parsed is using: + +```{code-cell} python +from pprint import pprint +from markdown_it import MarkdownIt +``` + +```{code-cell} python +md = MarkdownIt() +md.render("some *text*") +``` + +```{code-cell} python +for token in md.parse("some *text*"): + print(token) + print() +``` + +## The Parser + ++++ + +The `MarkdownIt` class is instantiated with parsing configuration options, +dictating the syntax rules and additional options for the parser and renderer. +You can define this configuration *via* directly supplying a dictionary or a preset name: + +- `zero`: This configures the minimum components to parse text (i.e. just paragraphs and text) +- `commonmark` (default): This configures the parser to strictly comply with the [CommonMark specification](http://spec.commonmark.org/). +- `js-default`: This is the default in the JavaScript version. + Compared to `commonmark`, it disables HTML parsing and enables the table and strikethrough components. +- `gfm-like`: This configures the parser to approximately comply with the [GitHub Flavored Markdown specification](https://github.github.com/gfm/). + Compared to `commonmark`, it enables the table, strikethrough and linkify components. + **Important**, to use this configuration you must have `linkify-it-py` installed. + +```{code-cell} python +from markdown_it.presets import zero +zero.make() +``` + +```{code-cell} python +md = MarkdownIt("zero") +md.options +``` + +You can also override specific options: + +```{code-cell} python +md = MarkdownIt("zero", {"maxNesting": 99}) +md.options +``` + +```{code-cell} python +pprint(md.get_active_rules()) +``` + +You can find all the parsing rules in the source code: +`parser_core.py`, `parser_block.py`, +`parser_inline.py`. 
+ +```{code-cell} python +pprint(md.get_all_rules()) +``` + +Any of the parsing rules can be enabled/disabled, and these methods are "chainable": + +```{code-cell} python +md.render("- __*emphasise this*__") +``` + +```{code-cell} python +md.enable(["list", "emphasis"]).render("- __*emphasise this*__") +``` + +You can temporarily modify rules with the `reset_rules` context manager. + +```{code-cell} python +with md.reset_rules(): + md.disable("emphasis") + print(md.render("__*emphasise this*__")) +md.render("__*emphasise this*__") +``` + +Additionally `renderInline` runs the parser with all block syntax rules disabled. + +```{code-cell} python +md.renderInline("__*emphasise this*__") +``` + +### Typographic components + +The `smartquotes` and `replacements` components are intended to improve typography: + +`smartquotes` will convert basic quote marks to their opening and closing variants: + +- 'single quotes' -> ‘single quotes’ +- "double quotes" -> “double quotes” + +`replacements` will replace particular text constructs: + +- ``(c)``, ``(C)`` → © +- ``(tm)``, ``(TM)`` → ™ +- ``(r)``, ``(R)`` → ® +- ``(p)``, ``(P)`` → § +- ``+-`` → ± +- ``...`` → … +- ``?....`` → ?.. +- ``!....`` → !.. +- ``????????`` → ??? +- ``!!!!!`` → !!! +- ``,,,`` → , +- ``--`` → &ndash +- ``---`` → &mdash + +Both of these components require typography to be turned on, as well as the components enabled: + +```{code-cell} python +md = MarkdownIt("commonmark", {"typographer": True}) +md.enable(["replacements", "smartquotes"]) +md.render("'single quotes' (c)") +``` + +### Linkify + +The `linkify` component requires that [linkify-it-py](https://github.com/tsutsu3/linkify-it-py) be installed (e.g. *via* `pip install markdown-it-py[linkify]`). 
+This allows URI autolinks to be identified, without the need for enclosing in `<>` brackets: + +```{code-cell} python +md = MarkdownIt("commonmark", {"linkify": True}) +md.enable(["linkify"]) +md.render("github.com") +``` + +### Plugins load + +Plugins load collections of additional syntax rules and render methods into the parser. +A number of useful plugins are available in [`mdit_py_plugins`](https://github.com/executablebooks/mdit-py-plugins) (see [the plugin list](./plugins.md)), +or you can create your own (following the [markdown-it design principles](./architecture.md)). + +```{code-cell} python +from markdown_it import MarkdownIt +import mdit_py_plugins +from mdit_py_plugins.front_matter import front_matter_plugin +from mdit_py_plugins.footnote import footnote_plugin + +md = ( + MarkdownIt() + .use(front_matter_plugin) + .use(footnote_plugin) + .enable('table') +) +text = (""" +--- +a: 1 +--- + +a | b +- | - +1 | 2 + +A footnote [^1] + +[^1]: some details +""") +md.render(text) +``` + +## The Token Stream + ++++ + +Before rendering, the text is parsed to a flat token stream of block level syntax elements, with nesting defined by opening (1) and closing (-1) attributes: + +```{code-cell} python +md = MarkdownIt("commonmark") +tokens = md.parse(""" +Here's some *text* + +1. 
a list + +> a *quote*""") +[(t.type, t.nesting) for t in tokens] +``` + +Naturally all openings should eventually be closed, +such that: + +```{code-cell} python +sum([t.nesting for t in tokens]) == 0 +``` + +All tokens are the same class, which can also be created outside the parser: + +```{code-cell} python +tokens[0] +``` + +```{code-cell} python +from markdown_it.token import Token +token = Token("paragraph_open", "p", 1, block=True, map=[1, 2]) +token == tokens[0] +``` + +The `'inline'` type token contain the inline tokens as children: + +```{code-cell} python +tokens[1] +``` + +You can serialize a token (and its children) to a JSONable dictionary using: + +```{code-cell} python +print(tokens[1].as_dict()) +``` + +This dictionary can also be deserialized: + +```{code-cell} python +Token.from_dict(tokens[1].as_dict()) +``` + +### Creating a syntax tree + +```{versionchanged} 0.7.0 +`nest_tokens` and `NestedTokens` are deprecated and replaced by `SyntaxTreeNode`. +``` + +In some use cases it may be useful to convert the token stream into a syntax tree, +with opening/closing tokens collapsed into a single token that contains children. + +```{code-cell} python +from markdown_it.tree import SyntaxTreeNode + +md = MarkdownIt("commonmark") +tokens = md.parse(""" +# Header + +Here's some text and an image ![title](image.png) + +1. a **list** + +> a *quote* +""") + +node = SyntaxTreeNode(tokens) +print(node.pretty(indent=2, show_text=True)) +``` + +You can then use methods to traverse the tree + +```{code-cell} python +node.children +``` + +```{code-cell} python +print(node[0]) +node[0].next_sibling +``` + +## Renderers + ++++ + +After the token stream is generated, it's passed to a [renderer](https://github.com/executablebooks/markdown-it-py/tree/master/markdown_it/renderer.py). +It then plays all the tokens, passing each to a rule with the same name as token type. 
+ +Renderer rules are located in `md.renderer.rules` and are simple functions +with the same signature: + +```python +def function(renderer, tokens, idx, options, env): + return htmlResult +``` + ++++ + +You can inject render methods into the instantiated render class. + +```{code-cell} python +md = MarkdownIt("commonmark") + +def render_em_open(self, tokens, idx, options, env): + return '<em class="myclass">' + +md.add_render_rule("em_open", render_em_open) +md.render("*a*") +``` + +This is a slight change to the JS version, where the renderer argument is at the end. +Also `add_render_rule` method is specific to Python, rather than adding directly to the `md.renderer.rules`, this ensures the method is bound to the renderer. + ++++ + +You can also subclass a render and add the method there: + +```{code-cell} python +from markdown_it.renderer import RendererHTML + +class MyRenderer(RendererHTML): + def em_open(self, tokens, idx, options, env): + return '<em class="myclass">' + +md = MarkdownIt("commonmark", renderer_cls=MyRenderer) +md.render("*a*") +``` + +Plugins can support multiple render types, using the `__ouput__` attribute (this is currently a Python only feature). 
+ +```{code-cell} python +from markdown_it.renderer import RendererHTML + +class MyRenderer1(RendererHTML): + __output__ = "html1" + +class MyRenderer2(RendererHTML): + __output__ = "html2" + +def plugin(md): + def render_em_open1(self, tokens, idx, options, env): + return '<em class="myclass1">' + def render_em_open2(self, tokens, idx, options, env): + return '<em class="myclass2">' + md.add_render_rule("em_open", render_em_open1, fmt="html1") + md.add_render_rule("em_open", render_em_open2, fmt="html2") + +md = MarkdownIt("commonmark", renderer_cls=MyRenderer1).use(plugin) +print(md.render("*a*")) + +md = MarkdownIt("commonmark", renderer_cls=MyRenderer2).use(plugin) +print(md.render("*a*")) +``` + +Here's a more concrete example; let's replace images with vimeo links to player's iframe: + +```{code-cell} python +import re +from markdown_it import MarkdownIt + +vimeoRE = re.compile(r'^https?:\/\/(www\.)?vimeo.com\/(\d+)($|\/)') + +def render_vimeo(self, tokens, idx, options, env): + token = tokens[idx] + + if vimeoRE.match(token.attrs["src"]): + + ident = vimeoRE.match(token.attrs["src"])[2] + + return ('<div class="embed-responsive embed-responsive-16by9">\n' + + ' <iframe class="embed-responsive-item" src="//player.vimeo.com/video/' + + ident + '"></iframe>\n' + + '</div>\n') + return self.image(tokens, idx, options, env) + +md = MarkdownIt("commonmark") +md.add_render_rule("image", render_vimeo) +print(md.render("![](https://www.vimeo.com/123)")) +``` + +Here is another example, how to add `target="_blank"` to all links: + +```{code-cell} python +from markdown_it import MarkdownIt + +def render_blank_link(self, tokens, idx, options, env): + tokens[idx].attrSet("target", "_blank") + + # pass token to default renderer. + return self.renderToken(tokens, idx, options, env) + +md = MarkdownIt("commonmark") +md.add_render_rule("link_open", render_blank_link) +print(md.render("[a]\n\n[a]: b")) +``` |