Diffstat (limited to 'tests/test_cmark_spec/test_spec')
-rw-r--r-- | tests/test_cmark_spec/test_spec/test_file.html | 7174 |
1 files changed, 7174 insertions, 0 deletions
diff --git a/tests/test_cmark_spec/test_spec/test_file.html b/tests/test_cmark_spec/test_spec/test_file.html new file mode 100644 index 0000000..1c2dc3c --- /dev/null +++ b/tests/test_cmark_spec/test_spec/test_file.html @@ -0,0 +1,7174 @@ +<hr /> +<p>title: CommonMark Spec +author: John MacFarlane +version: 0.30 +date: '2021-06-19' +license: '<a href="http://creativecommons.org/licenses/by-sa/4.0/">CC-BY-SA 4.0</a>' +...</p> +<h1>Introduction</h1> +<h2>What is Markdown?</h2> +<p>Markdown is a plain text format for writing structured documents, +based on conventions for indicating formatting in email +and usenet posts. It was developed by John Gruber (with +help from Aaron Swartz) and released in 2004 in the form of a +<a href="http://daringfireball.net/projects/markdown/syntax">syntax description</a> +and a Perl script (<code>Markdown.pl</code>) for converting Markdown to +HTML. In the next decade, dozens of implementations were +developed in many languages. Some extended the original +Markdown syntax with conventions for footnotes, tables, and +other document elements. Some allowed Markdown documents to be +rendered in formats other than HTML. Websites like Reddit, +StackOverflow, and GitHub had millions of people using Markdown. +And Markdown started to be used beyond the web, to author books, +articles, slide shows, letters, and lecture notes.</p> +<p>What distinguishes Markdown from many other lightweight markup +syntaxes, which are often easier to write, is its readability. +As Gruber writes:</p> +<blockquote> +<p>The overriding design goal for Markdown's formatting syntax is +to make it as readable as possible. The idea is that a +Markdown-formatted document should be publishable as-is, as +plain text, without looking like it's been marked up with tags +or formatting instructions. 
+(<a href="http://daringfireball.net/projects/markdown/">http://daringfireball.net/projects/markdown/</a>)</p> +</blockquote> +<p>The point can be illustrated by comparing a sample of +<a href="http://www.methods.co.nz/asciidoc/">AsciiDoc</a> with +an equivalent sample of Markdown. Here is a sample of +AsciiDoc from the AsciiDoc manual:</p> +<pre><code>1. List item one. ++ +List item one continued with a second paragraph followed by an +Indented block. ++ +................. +$ ls *.sh +$ mv *.sh ~/tmp +................. ++ +List item continued with a third paragraph. + +2. List item two continued with an open block. ++ +-- +This paragraph is part of the preceding list item. + +a. This list is nested and does not require explicit item +continuation. ++ +This paragraph is part of the preceding list item. + +b. List item b. + +This paragraph belongs to item two of the outer list. +-- +</code></pre> +<p>And here is the equivalent in Markdown:</p> +<pre><code>1. List item one. + + List item one continued with a second paragraph followed by an + Indented block. + + $ ls *.sh + $ mv *.sh ~/tmp + + List item continued with a third paragraph. + +2. List item two continued with an open block. + + This paragraph is part of the preceding list item. + + 1. This list is nested and does not require explicit item continuation. + + This paragraph is part of the preceding list item. + + 2. List item b. + + This paragraph belongs to item two of the outer list. +</code></pre> +<p>The AsciiDoc version is, arguably, easier to write. You don't need +to worry about indentation. But the Markdown version is much easier +to read. The nesting of list items is apparent to the eye in the +source, not just in the processed document.</p> +<h2>Why is a spec needed?</h2> +<p>John Gruber's <a href="http://daringfireball.net/projects/markdown/syntax">canonical description of Markdown's +syntax</a> +does not specify the syntax unambiguously. 
Here are some examples of +questions it does not answer:</p> +<ol> +<li> +<p>How much indentation is needed for a sublist? The spec says that +continuation paragraphs need to be indented four spaces, but is +not fully explicit about sublists. It is natural to think that +they, too, must be indented four spaces, but <code>Markdown.pl</code> does +not require that. This is hardly a "corner case," and divergences +between implementations on this issue often lead to surprises for +users in real documents. (See <a href="http://article.gmane.org/gmane.text.markdown.general/1997">this comment by John +Gruber</a>.)</p> +</li> +<li> +<p>Is a blank line needed before a block quote or heading? +Most implementations do not require the blank line. However, +this can lead to unexpected results in hard-wrapped text, and +also to ambiguities in parsing (note that some implementations +put the heading inside the blockquote, while others do not). +(John Gruber has also spoken <a href="http://article.gmane.org/gmane.text.markdown.general/2146">in favor of requiring the blank +lines</a>.)</p> +</li> +<li> +<p>Is a blank line needed before an indented code block? +(<code>Markdown.pl</code> requires it, but this is not mentioned in the +documentation, and some implementations do not require it.)</p> +<pre><code class="language-markdown">paragraph + code? +</code></pre> +</li> +<li> +<p>What is the exact rule for determining when list items get +wrapped in <code><p></code> tags? Can a list be partially "loose" and partially +"tight"? What should we do with a list like this?</p> +<pre><code class="language-markdown">1. one + +2. two +3. three +</code></pre> +<p>Or this?</p> +<pre><code class="language-markdown">1. one + - a + + - b +2. two +</code></pre> +<p>(There are some relevant comments by John Gruber +<a href="http://article.gmane.org/gmane.text.markdown.general/2554">here</a>.)</p> +</li> +<li> +<p>Can list markers be indented? 
Can ordered list markers be right-aligned?</p> +<pre><code class="language-markdown"> 8. item 1 + 9. item 2 +10. item 2a +</code></pre> +</li> +<li> +<p>Is this one list with a thematic break in its second item, +or two lists separated by a thematic break?</p> +<pre><code class="language-markdown">* a +* * * * * +* b +</code></pre> +</li> +<li> +<p>When list markers change from numbers to bullets, do we have +two lists or one? (The Markdown syntax description suggests two, +but the perl scripts and many other implementations produce one.)</p> +<pre><code class="language-markdown">1. fee +2. fie +- foe +- fum +</code></pre> +</li> +<li> +<p>What are the precedence rules for the markers of inline structure? +For example, is the following a valid link, or does the code span +take precedence ?</p> +<pre><code class="language-markdown">[a backtick (`)](/url) and [another backtick (`)](/url). +</code></pre> +</li> +<li> +<p>What are the precedence rules for markers of emphasis and strong +emphasis? For example, how should the following be parsed?</p> +<pre><code class="language-markdown">*foo *bar* baz* +</code></pre> +</li> +<li> +<p>What are the precedence rules between block-level and inline-level +structure? For example, how should the following be parsed?</p> +<pre><code class="language-markdown">- `a long code span can contain a hyphen like this + - and it can screw things up` +</code></pre> +</li> +<li> +<p>Can list items include section headings? (<code>Markdown.pl</code> does not +allow this, but does allow blockquotes to include headings.)</p> +<pre><code class="language-markdown">- # Heading +</code></pre> +</li> +<li> +<p>Can list items be empty?</p> +<pre><code class="language-markdown">* a +* +* b +</code></pre> +</li> +<li> +<p>Can link references be defined inside block quotes or list items?</p> +<pre><code class="language-markdown">> Blockquote [foo]. 
+> +> [foo]: /url +</code></pre> +</li> +<li> +<p>If there are multiple definitions for the same reference, which takes +precedence?</p> +<pre><code class="language-markdown">[foo]: /url1 +[foo]: /url2 + +[foo][] +</code></pre> +</li> +</ol> +<p>In the absence of a spec, early implementers consulted <code>Markdown.pl</code> +to resolve these ambiguities. But <code>Markdown.pl</code> was quite buggy, and +gave manifestly bad results in many cases, so it was not a +satisfactory replacement for a spec.</p> +<p>Because there is no unambiguous spec, implementations have diverged +considerably. As a result, users are often surprised to find that +a document that renders one way on one system (say, a GitHub wiki) +renders differently on another (say, converting to docbook using +pandoc). To make matters worse, because nothing in Markdown counts +as a "syntax error," the divergence often isn't discovered right away.</p> +<h2>About this document</h2> +<p>This document attempts to specify Markdown syntax unambiguously. +It contains many examples with side-by-side Markdown and +HTML. These are intended to double as conformance tests. An +accompanying script <code>spec_tests.py</code> can be used to run the tests +against any Markdown program:</p> +<pre><code>python test/spec_tests.py --spec spec.txt --program PROGRAM +</code></pre> +<p>Since this document describes how Markdown is to be parsed into +an abstract syntax tree, it would have made sense to use an abstract +representation of the syntax tree instead of HTML. But HTML is capable +of representing the structural distinctions we need to make, and the +choice of HTML for the tests makes it possible to run the tests against +an implementation without writing an abstract syntax tree renderer.</p> +<p>Note that not every feature of the HTML samples is mandated by +the spec. For example, the spec says what counts as a link +destination, but it doesn't mandate that non-ASCII characters in +the URL be percent-encoded. 
To use the automatic tests, +implementers will need to provide a renderer that conforms to +the expectations of the spec examples (percent-encoding +non-ASCII characters in URLs). But a conforming implementation +can use a different renderer and may choose not to +percent-encode non-ASCII characters in URLs.</p> +<p>This document is generated from a text file, <code>spec.txt</code>, written +in Markdown with a small extension for the side-by-side tests. +The script <code>tools/makespec.py</code> can be used to convert <code>spec.txt</code> into +HTML or CommonMark (which can then be converted into other formats).</p> +<p>In the examples, the <code>→</code> character is used to represent tabs.</p> +<h1>Preliminaries</h1> +<h2>Characters and lines</h2> +<p>Any sequence of [characters] is a valid CommonMark +document.</p> +<p>A <a href="@">character</a> is a Unicode code point. Although some +code points (for example, combining accents) do not correspond to +characters in an intuitive sense, all code points count as characters +for purposes of this spec.</p> +<p>This spec does not specify an encoding; it thinks of lines as composed +of [characters] rather than bytes. 
A conforming parser may be limited +to a certain encoding.</p> +<p>A <a href="@">line</a> is a sequence of zero or more [characters] +other than line feed (<code>U+000A</code>) or carriage return (<code>U+000D</code>), +followed by a [line ending] or by the end of file.</p> +<p>A <a href="@">line ending</a> is a line feed (<code>U+000A</code>), a carriage return +(<code>U+000D</code>) not followed by a line feed, or a carriage return and a +following line feed.</p> +<p>A line containing no characters, or a line containing only spaces +(<code>U+0020</code>) or tabs (<code>U+0009</code>), is called a <a href="@">blank line</a>.</p> +<p>The following definitions of character classes will be used in this spec:</p> +<p>A <a href="@">Unicode whitespace character</a> is +any code point in the Unicode <code>Zs</code> general category, or a tab (<code>U+0009</code>), +line feed (<code>U+000A</code>), form feed (<code>U+000C</code>), or carriage return (<code>U+000D</code>).</p> +<p><a href="@">Unicode whitespace</a> is a sequence of one or more +[Unicode whitespace characters].</p> +<p>A <a href="@">tab</a> is <code>U+0009</code>.</p> +<p>A <a href="@">space</a> is <code>U+0020</code>.</p> +<p>An <a href="@">ASCII control character</a> is a character between <code>U+0000–1F</code> (both +including) or <code>U+007F</code>.</p> +<p>An <a href="@">ASCII punctuation character</a> +is <code>!</code>, <code>"</code>, <code>#</code>, <code>$</code>, <code>%</code>, <code>&</code>, <code>'</code>, <code>(</code>, <code>)</code>, +<code>*</code>, <code>+</code>, <code>,</code>, <code>-</code>, <code>.</code>, <code>/</code> (U+0021–2F), +<code>:</code>, <code>;</code>, <code><</code>, <code>=</code>, <code>></code>, <code>?</code>, <code>@</code> (U+003A–0040), +<code>[</code>, <code>\</code>, <code>]</code>, <code>^</code>, <code>_</code>, <code>`</code> (U+005B–0060), +<code>{</code>, <code>|</code>, <code>}</code>, or <code>~</code> (U+007B–007E).</p> +<p>A <a href="@">Unicode 
punctuation character</a> is an [ASCII +punctuation character] or anything in +the general Unicode categories <code>Pc</code>, <code>Pd</code>, <code>Pe</code>, <code>Pf</code>, <code>Pi</code>, <code>Po</code>, or <code>Ps</code>.</p> +<h2>Tabs</h2> +<p>Tabs in lines are not expanded to [spaces]. However, +in contexts where spaces help to define block structure, +tabs behave as if they were replaced by spaces with a tab stop +of 4 characters.</p> +<p>Thus, for example, a tab can be used instead of four spaces +in an indented code block. (Note, however, that internal +tabs are passed through as literal tabs, not expanded to +spaces.)</p> +<pre><code class="language-example">→foo→baz→→bim +. +<pre><code>foo→baz→→bim +</code></pre> +</code></pre> +<pre><code class="language-example"> →foo→baz→→bim +. +<pre><code>foo→baz→→bim +</code></pre> +</code></pre> +<pre><code class="language-example"> a→a + ὐ→a +. +<pre><code>a→a +ὐ→a +</code></pre> +</code></pre> +<p>In the following example, a continuation paragraph of a list +item is indented with a tab; this has exactly the same effect +as indentation with four spaces would:</p> +<pre><code class="language-example"> - foo + +→bar +. +<ul> +<li> +<p>foo</p> +<p>bar</p> +</li> +</ul> +</code></pre> +<pre><code class="language-example">- foo + +→→bar +. +<ul> +<li> +<p>foo</p> +<pre><code> bar +</code></pre> +</li> +</ul> +</code></pre> +<p>Normally the <code>></code> that begins a block quote may be followed +optionally by a space, which is not considered part of the +content. In the following case <code>></code> is followed by a tab, +which is treated as if it were expanded into three spaces. +Since one of these spaces is considered part of the +delimiter, <code>foo</code> is considered to be indented six spaces +inside the block quote context, so we get an indented +code block starting with two spaces.</p> +<pre><code class="language-example">>→→foo +. 
+<blockquote> +<pre><code> foo +</code></pre> +</blockquote> +</code></pre> +<pre><code class="language-example">-→→foo +. +<ul> +<li> +<pre><code> foo +</code></pre> +</li> +</ul> +</code></pre> +<pre><code class="language-example"> foo +→bar +. +<pre><code>foo +bar +</code></pre> +</code></pre> +<pre><code class="language-example"> - foo + - bar +→ - baz +. +<ul> +<li>foo +<ul> +<li>bar +<ul> +<li>baz</li> +</ul> +</li> +</ul> +</li> +</ul> +</code></pre> +<pre><code class="language-example">#→Foo +. +<h1>Foo</h1> +</code></pre> +<pre><code class="language-example">*→*→*→ +. +<hr /> +</code></pre> +<h2>Insecure characters</h2> +<p>For security reasons, the Unicode character <code>U+0000</code> must be replaced +with the REPLACEMENT CHARACTER (<code>U+FFFD</code>).</p> +<h2>Backslash escapes</h2> +<p>Any ASCII punctuation character may be backslash-escaped:</p> +<pre><code class="language-example">\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~ +. +<p>!&quot;#$%&amp;'()*+,-./:;&lt;=&gt;?@[\]^_`{|}~</p> +</code></pre> +<p>Backslashes before other characters are treated as literal +backslashes:</p> +<pre><code class="language-example">\→\A\a\ \3\φ\« +. +<p>\→\A\a\ \3\φ\«</p> +</code></pre> +<p>Escaped characters are treated as regular characters and do +not have their usual Markdown meanings:</p> +<pre><code class="language-example">\*not emphasized* +\<br/> not a tag +\[not a link](/foo) +\`not code` +1\. not a list +\* not a list +\# not a heading +\[foo]: /url "not a reference" +\&ouml; not a character entity +. +<p>*not emphasized* +&lt;br/&gt; not a tag +[not a link](/foo) +`not code` +1. not a list +* not a list +# not a heading +[foo]: /url &quot;not a reference&quot; +&amp;ouml; not a character entity</p> +</code></pre> +<p>If a backslash is itself escaped, the following character is not:</p> +<pre><code class="language-example">\\*emphasis* +. 
+<p>\<em>emphasis</em></p> +</code></pre> +<p>A backslash at the end of the line is a [hard line break]:</p> +<pre><code class="language-example">foo\ +bar +. +<p>foo<br /> +bar</p> +</code></pre> +<p>Backslash escapes do not work in code blocks, code spans, autolinks, or +raw HTML:</p> +<pre><code class="language-example">`` \[\` `` +. +<p><code>\[\`</code></p> +</code></pre> +<pre><code class="language-example"> \[\] +. +<pre><code>\[\] +</code></pre> +</code></pre> +<pre><code class="language-example">~~~ +\[\] +~~~ +. +<pre><code>\[\] +</code></pre> +</code></pre> +<pre><code class="language-example"><http://example.com?find=\*> +. +<p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p> +</code></pre> +<pre><code class="language-example"><a href="/bar\/)"> +. +<a href="/bar\/)"> +</code></pre> +<p>But they work in all other contexts, including URLs and link titles, +link references, and [info strings] in [fenced code blocks]:</p> +<pre><code class="language-example">[foo](/bar\* "ti\*tle") +. +<p><a href="/bar*" title="ti*tle">foo</a></p> +</code></pre> +<pre><code class="language-example">[foo] + +[foo]: /bar\* "ti\*tle" +. +<p><a href="/bar*" title="ti*tle">foo</a></p> +</code></pre> +<pre><code class="language-example">``` foo\+bar +foo +``` +. +<pre><code class="language-foo+bar">foo +</code></pre> +</code></pre> +<h2>Entity and numeric character references</h2> +<p>Valid HTML entity references and numeric character references +can be used in place of the corresponding Unicode character, +with the following exceptions:</p> +<ul> +<li> +<p>Entity and character references are not recognized in code +blocks and code spans.</p> +</li> +<li> +<p>Entity and character references cannot stand in place of +special characters that define structural elements in +CommonMark. 
For example, although <code>&#42;</code> can be used +in place of a literal <code>*</code> character, <code>&#42;</code> cannot replace +<code>*</code> in emphasis delimiters, bullet list markers, or thematic +breaks.</p> +</li> +</ul> +<p>Conforming CommonMark parsers need not store information about +whether a particular character was represented in the source +using a Unicode character or an entity reference.</p> +<p><a href="@">Entity references</a> consist of <code>&</code> + any of the valid +HTML5 entity names + <code>;</code>. The +document <a href="https://html.spec.whatwg.org/entities.json">https://html.spec.whatwg.org/entities.json</a> +is used as an authoritative source for the valid entity +references and their corresponding code points.</p> +<pre><code class="language-example">&nbsp; &amp; &copy; &AElig; &Dcaron; +&frac34; &HilbertSpace; &DifferentialD; +&ClockwiseContourIntegral; &ngE; +. +<p> &amp; © Æ Ď +¾ ℋ ⅆ +∲ ≧̸</p> +</code></pre> +<p><a href="@">Decimal numeric character +references</a> +consist of <code>&#</code> + a string of 1--7 arabic digits + <code>;</code>. A +numeric character reference is parsed as the corresponding +Unicode character. Invalid Unicode code points will be replaced by +the REPLACEMENT CHARACTER (<code>U+FFFD</code>). For security reasons, +the code point <code>U+0000</code> will also be replaced by <code>U+FFFD</code>.</p> +<pre><code class="language-example">&#35; &#1234; &#992; &#0; +. +<p># Ӓ Ϡ �</p> +</code></pre> +<p><a href="@">Hexadecimal numeric character +references</a> consist of <code>&#</code> + +either <code>X</code> or <code>x</code> + a string of 1-6 hexadecimal digits + <code>;</code>. +They too are parsed as the corresponding Unicode character (this +time specified with a hexadecimal numeral instead of decimal).</p> +<pre><code class="language-example">&#X22; &#XD06; &#xcab; +. 
+<p>&quot; ആ ಫ</p> +</code></pre> +<p>Here are some nonentities:</p> +<pre><code class="language-example">&nbsp &x; &#; &#x; +&#87654321; +&#abcdef0; +&ThisIsNotDefined; &hi?; +. +<p>&amp;nbsp &amp;x; &amp;#; &amp;#x; +&amp;#87654321; +&amp;#abcdef0; +&amp;ThisIsNotDefined; &amp;hi?;</p> +</code></pre> +<p>Although HTML5 does accept some entity references +without a trailing semicolon (such as <code>&copy</code>), these are not +recognized here, because it makes the grammar too ambiguous:</p> +<pre><code class="language-example">&copy +. +<p>&amp;copy</p> +</code></pre> +<p>Strings that are not on the list of HTML5 named entities are not +recognized as entity references either:</p> +<pre><code class="language-example">&MadeUpEntity; +. +<p>&amp;MadeUpEntity;</p> +</code></pre> +<p>Entity and numeric character references are recognized in any +context besides code spans or code blocks, including +URLs, [link titles], and [fenced code block][] [info strings]:</p> +<pre><code class="language-example"><a href="&ouml;&ouml;.html"> +. +<a href="&ouml;&ouml;.html"> +</code></pre> +<pre><code class="language-example">[foo](/f&ouml;&ouml; "f&ouml;&ouml;") +. +<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> +</code></pre> +<pre><code class="language-example">[foo] + +[foo]: /f&ouml;&ouml; "f&ouml;&ouml;" +. +<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> +</code></pre> +<pre><code class="language-example">``` f&ouml;&ouml; +foo +``` +. +<pre><code class="language-föö">foo +</code></pre> +</code></pre> +<p>Entity and numeric character references are treated as literal +text in code spans and code blocks:</p> +<pre><code class="language-example">`f&ouml;&ouml;` +. +<p><code>f&amp;ouml;&amp;ouml;</code></p> +</code></pre> +<pre><code class="language-example"> f&ouml;f&ouml; +. 
+<pre><code>f&amp;ouml;f&amp;ouml; +</code></pre> +</code></pre> +<p>Entity and numeric character references cannot be used +in place of symbols indicating structure in CommonMark +documents.</p> +<pre><code class="language-example">&#42;foo&#42; +*foo* +. +<p>*foo* +<em>foo</em></p> +</code></pre> +<pre><code class="language-example">&#42; foo + +* foo +. +<p>* foo</p> +<ul> +<li>foo</li> +</ul> +</code></pre> +<pre><code class="language-example">foo&#10;&#10;bar +. +<p>foo + +bar</p> +</code></pre> +<pre><code class="language-example">&#9;foo +. +<p>→foo</p> +</code></pre> +<pre><code class="language-example">[a](url &quot;tit&quot;) +. +<p>[a](url &quot;tit&quot;)</p> +</code></pre> +<h1>Blocks and inlines</h1> +<p>We can think of a document as a sequence of +<a href="@">blocks</a>---structural elements like paragraphs, block +quotations, lists, headings, rules, and code blocks. Some blocks (like +block quotes and list items) contain other blocks; others (like +headings and paragraphs) contain <a href="@">inline</a> content---text, +links, emphasized text, images, code spans, and so on.</p> +<h2>Precedence</h2> +<p>Indicators of block structure always take precedence over indicators +of inline structure. So, for example, the following is a list with +two items, not a list with one item containing a code span:</p> +<pre><code class="language-example">- `one +- two` +. +<ul> +<li>`one</li> +<li>two`</li> +</ul> +</code></pre> +<p>This means that parsing can proceed in two steps: first, the block +structure of the document can be discerned; second, text lines inside +paragraphs, headings, and other block constructs can be parsed for inline +structure. The second step requires information about link reference +definitions that will be available only at the end of the first +step. 
Note that the first step requires processing lines in sequence, +but the second can be parallelized, since the inline parsing of +one block element does not affect the inline parsing of any other.</p> +<h2>Container blocks and leaf blocks</h2> +<p>We can divide blocks into two types: +<a href="#container-blocks">container blocks</a>, +which can contain other blocks, and <a href="#leaf-blocks">leaf blocks</a>, +which cannot.</p> +<h1>Leaf blocks</h1> +<p>This section describes the different kinds of leaf block that make up a +Markdown document.</p> +<h2>Thematic breaks</h2> +<p>A line consisting of optionally up to three spaces of indentation, followed by a +sequence of three or more matching <code>-</code>, <code>_</code>, or <code>*</code> characters, each followed +optionally by any number of spaces or tabs, forms a +<a href="@">thematic break</a>.</p> +<pre><code class="language-example">*** +--- +___ +. +<hr /> +<hr /> +<hr /> +</code></pre> +<p>Wrong characters:</p> +<pre><code class="language-example">+++ +. +<p>+++</p> +</code></pre> +<pre><code class="language-example">=== +. +<p>===</p> +</code></pre> +<p>Not enough characters:</p> +<pre><code class="language-example">-- +** +__ +. +<p>-- +** +__</p> +</code></pre> +<p>Up to three spaces of indentation are allowed:</p> +<pre><code class="language-example"> *** + *** + *** +. +<hr /> +<hr /> +<hr /> +</code></pre> +<p>Four spaces of indentation is too many:</p> +<pre><code class="language-example"> *** +. +<pre><code>*** +</code></pre> +</code></pre> +<pre><code class="language-example">Foo + *** +. +<p>Foo +***</p> +</code></pre> +<p>More than three characters may be used:</p> +<pre><code class="language-example">_____________________________________ +. +<hr /> +</code></pre> +<p>Spaces and tabs are allowed between the characters:</p> +<pre><code class="language-example"> - - - +. +<hr /> +</code></pre> +<pre><code class="language-example"> ** * ** * ** * ** +. 
+<hr /> +</code></pre> +<pre><code class="language-example">- - - - +. +<hr /> +</code></pre> +<p>Spaces and tabs are allowed at the end:</p> +<pre><code class="language-example">- - - - +. +<hr /> +</code></pre> +<p>However, no other characters may occur in the line:</p> +<pre><code class="language-example">_ _ _ _ a + +a------ + +---a--- +. +<p>_ _ _ _ a</p> +<p>a------</p> +<p>---a---</p> +</code></pre> +<p>It is required that all of the characters other than spaces or tabs be the same. +So, this is not a thematic break:</p> +<pre><code class="language-example"> *-* +. +<p><em>-</em></p> +</code></pre> +<p>Thematic breaks do not need blank lines before or after:</p> +<pre><code class="language-example">- foo +*** +- bar +. +<ul> +<li>foo</li> +</ul> +<hr /> +<ul> +<li>bar</li> +</ul> +</code></pre> +<p>Thematic breaks can interrupt a paragraph:</p> +<pre><code class="language-example">Foo +*** +bar +. +<p>Foo</p> +<hr /> +<p>bar</p> +</code></pre> +<p>If a line of dashes that meets the above conditions for being a +thematic break could also be interpreted as the underline of a [setext +heading], the interpretation as a +[setext heading] takes precedence. Thus, for example, +this is a setext heading, not a paragraph followed by a thematic break:</p> +<pre><code class="language-example">Foo +--- +bar +. +<h2>Foo</h2> +<p>bar</p> +</code></pre> +<p>When both a thematic break and a list item are possible +interpretations of a line, the thematic break takes precedence:</p> +<pre><code class="language-example">* Foo +* * * +* Bar +. +<ul> +<li>Foo</li> +</ul> +<hr /> +<ul> +<li>Bar</li> +</ul> +</code></pre> +<p>If you want a thematic break in a list item, use a different bullet:</p> +<pre><code class="language-example">- Foo +- * * * +. 
+<ul> +<li>Foo</li> +<li> +<hr /> +</li> +</ul> +</code></pre> +<h2>ATX headings</h2> +<p>An <a href="@">ATX heading</a> +consists of a string of characters, parsed as inline content, between an +opening sequence of 1--6 unescaped <code>#</code> characters and an optional +closing sequence of any number of unescaped <code>#</code> characters. +The opening sequence of <code>#</code> characters must be followed by spaces or tabs, or +by the end of line. The optional closing sequence of <code>#</code>s must be preceded by +spaces or tabs and may be followed by spaces or tabs only. The opening +<code>#</code> character may be preceded by up to three spaces of indentation. The raw +contents of the heading are stripped of leading and trailing space or tabs +before being parsed as inline content. The heading level is equal to the number +of <code>#</code> characters in the opening sequence.</p> +<p>Simple headings:</p> +<pre><code class="language-example"># foo +## foo +### foo +#### foo +##### foo +###### foo +. +<h1>foo</h1> +<h2>foo</h2> +<h3>foo</h3> +<h4>foo</h4> +<h5>foo</h5> +<h6>foo</h6> +</code></pre> +<p>More than six <code>#</code> characters is not a heading:</p> +<pre><code class="language-example">####### foo +. +<p>####### foo</p> +</code></pre> +<p>At least one space or tab is required between the <code>#</code> characters and the +heading's contents, unless the heading is empty. Note that many +implementations currently do not require the space. However, the +space was required by the +<a href="http://www.aaronsw.com/2002/atx/atx.py">original ATX implementation</a>, +and it helps prevent things like the following from being parsed as +headings:</p> +<pre><code class="language-example">#5 bolt + +#hashtag +. +<p>#5 bolt</p> +<p>#hashtag</p> +</code></pre> +<p>This is not a heading, because the first <code>#</code> is escaped:</p> +<pre><code class="language-example">\## foo +. 
+<p>## foo</p> +</code></pre> +<p>Contents are parsed as inlines:</p> +<pre><code class="language-example"># foo *bar* \*baz\* +. +<h1>foo <em>bar</em> *baz*</h1> +</code></pre> +<p>Leading and trailing spaces or tabs are ignored in parsing inline content:</p> +<pre><code class="language-example"># foo +. +<h1>foo</h1> +</code></pre> +<p>Up to three spaces of indentation are allowed:</p> +<pre><code class="language-example"> ### foo + ## foo + # foo +. +<h3>foo</h3> +<h2>foo</h2> +<h1>foo</h1> +</code></pre> +<p>Four spaces of indentation is too many:</p> +<pre><code class="language-example"> # foo +. +<pre><code># foo +</code></pre> +</code></pre> +<pre><code class="language-example">foo + # bar +. +<p>foo +# bar</p> +</code></pre> +<p>A closing sequence of <code>#</code> characters is optional:</p> +<pre><code class="language-example">## foo ## + ### bar ### +. +<h2>foo</h2> +<h3>bar</h3> +</code></pre> +<p>It need not be the same length as the opening sequence:</p> +<pre><code class="language-example"># foo ################################## +##### foo ## +. +<h1>foo</h1> +<h5>foo</h5> +</code></pre> +<p>Spaces or tabs are allowed after the closing sequence:</p> +<pre><code class="language-example">### foo ### +. +<h3>foo</h3> +</code></pre> +<p>A sequence of <code>#</code> characters with anything but spaces or tabs following it +is not a closing sequence, but counts as part of the contents of the +heading:</p> +<pre><code class="language-example">### foo ### b +. +<h3>foo ### b</h3> +</code></pre> +<p>The closing sequence must be preceded by a space or tab:</p> +<pre><code class="language-example"># foo# +. +<h1>foo#</h1> +</code></pre> +<p>Backslash-escaped <code>#</code> characters do not count as part +of the closing sequence:</p> +<pre><code class="language-example">### foo \### +## foo #\## +# foo \# +. 
+<h3>foo ###</h3> +<h2>foo ###</h2> +<h1>foo #</h1> +</code></pre> +<p>ATX headings need not be separated from surrounding content by blank +lines, and they can interrupt paragraphs:</p> +<pre><code class="language-example">**** +## foo +**** +. +<hr /> +<h2>foo</h2> +<hr /> +</code></pre> +<pre><code class="language-example">Foo bar +# baz +Bar foo +. +<p>Foo bar</p> +<h1>baz</h1> +<p>Bar foo</p> +</code></pre> +<p>ATX headings can be empty:</p> +<pre><code class="language-example">## +# +### ### +. +<h2></h2> +<h1></h1> +<h3></h3> +</code></pre> +<h2>Setext headings</h2> +<p>A <a href="@">setext heading</a> consists of one or more +lines of text, not interrupted by a blank line, of which the first line does not +have more than 3 spaces of indentation, followed by +a [setext heading underline]. The lines of text must be such +that, were they not followed by the setext heading underline, +they would be interpreted as a paragraph: they cannot be +interpretable as a [code fence], [ATX heading][ATX headings], +[block quote][block quotes], [thematic break][thematic breaks], +[list item][list items], or [HTML block][HTML blocks].</p> +<p>A <a href="@">setext heading underline</a> is a sequence of +<code>=</code> characters or a sequence of <code>-</code> characters, with no more than 3 +spaces of indentation and any number of trailing spaces or tabs. If a line +containing a single <code>-</code> can be interpreted as an +empty [list items], it should be interpreted this way +and not as a [setext heading underline].</p> +<p>The heading is a level 1 heading if <code>=</code> characters are used in +the [setext heading underline], and a level 2 heading if <code>-</code> +characters are used. The contents of the heading are the result +of parsing the preceding lines of text as CommonMark inline +content.</p> +<p>In general, a setext heading need not be preceded or followed by a +blank line. 
However, it cannot interrupt a paragraph, so when a +setext heading comes after a paragraph, a blank line is needed between +them.</p> +<p>Simple examples:</p> +<pre><code class="language-example">Foo *bar* +========= + +Foo *bar* +--------- +. +<h1>Foo <em>bar</em></h1> +<h2>Foo <em>bar</em></h2> +</code></pre> +<p>The content of the header may span more than one line:</p> +<pre><code class="language-example">Foo *bar +baz* +==== +. +<h1>Foo <em>bar +baz</em></h1> +</code></pre> +<p>The contents are the result of parsing the headings's raw +content as inlines. The heading's raw content is formed by +concatenating the lines and removing initial and final +spaces or tabs.</p> +<pre><code class="language-example"> Foo *bar +baz*→ +==== +. +<h1>Foo <em>bar +baz</em></h1> +</code></pre> +<p>The underlining can be any length:</p> +<pre><code class="language-example">Foo +------------------------- + +Foo += +. +<h2>Foo</h2> +<h1>Foo</h1> +</code></pre> +<p>The heading content can be preceded by up to three spaces of indentation, and +need not line up with the underlining:</p> +<pre><code class="language-example"> Foo +--- + + Foo +----- + + Foo + === +. +<h2>Foo</h2> +<h2>Foo</h2> +<h1>Foo</h1> +</code></pre> +<p>Four spaces of indentation is too many:</p> +<pre><code class="language-example"> Foo + --- + + Foo +--- +. +<pre><code>Foo +--- + +Foo +</code></pre> +<hr /> +</code></pre> +<p>The setext heading underline can be preceded by up to three spaces of +indentation, and may have trailing spaces or tabs:</p> +<pre><code class="language-example">Foo + ---- +. +<h2>Foo</h2> +</code></pre> +<p>Four spaces of indentation is too many:</p> +<pre><code class="language-example">Foo + --- +. +<p>Foo +---</p> +</code></pre> +<p>The setext heading underline cannot contain internal spaces or tabs:</p> +<pre><code class="language-example">Foo += = + +Foo +--- - +. 
+<p>Foo += =</p> +<p>Foo</p> +<hr /> +</code></pre> +<p>Trailing spaces or tabs in the content line do not cause a hard line break:</p> +<pre><code class="language-example">Foo +----- +. +<h2>Foo</h2> +</code></pre> +<p>Nor does a backslash at the end:</p> +<pre><code class="language-example">Foo\ +---- +. +<h2>Foo\</h2> +</code></pre> +<p>Since indicators of block structure take precedence over +indicators of inline structure, the following are setext headings:</p> +<pre><code class="language-example">`Foo +---- +` + +<a title="a lot +--- +of dashes"/> +. +<h2>`Foo</h2> +<p>`</p> +<h2>&lt;a title=&quot;a lot</h2> +<p>of dashes&quot;/&gt;</p> +</code></pre> +<p>The setext heading underline cannot be a [lazy continuation +line] in a list item or block quote:</p> +<pre><code class="language-example">> Foo +--- +. +<blockquote> +<p>Foo</p> +</blockquote> +<hr /> +</code></pre> +<pre><code class="language-example">> foo +bar +=== +. +<blockquote> +<p>foo +bar +===</p> +</blockquote> +</code></pre> +<pre><code class="language-example">- Foo +--- +. +<ul> +<li>Foo</li> +</ul> +<hr /> +</code></pre> +<p>A blank line is needed between a paragraph and a following +setext heading, since otherwise the paragraph becomes part +of the heading's content:</p> +<pre><code class="language-example">Foo +Bar +--- +. +<h2>Foo +Bar</h2> +</code></pre> +<p>But in general a blank line is not required before or after +setext headings:</p> +<pre><code class="language-example">--- +Foo +--- +Bar +--- +Baz +. +<hr /> +<h2>Foo</h2> +<h2>Bar</h2> +<p>Baz</p> +</code></pre> +<p>Setext headings cannot be empty:</p> +<pre><code class="language-example"> +==== +. +<p>====</p> +</code></pre> +<p>Setext heading text lines must not be interpretable as block +constructs other than paragraphs. So, the line of dashes +in these examples gets interpreted as a thematic break:</p> +<pre><code class="language-example">--- +--- +. 
+<hr /> +<hr /> +</code></pre> +<pre><code class="language-example">- foo +----- +. +<ul> +<li>foo</li> +</ul> +<hr /> +</code></pre> +<pre><code class="language-example"> foo +--- +. +<pre><code>foo +</code></pre> +<hr /> +</code></pre> +<pre><code class="language-example">> foo +----- +. +<blockquote> +<p>foo</p> +</blockquote> +<hr /> +</code></pre> +<p>If you want a heading with <code>> foo</code> as its literal text, you can +use backslash escapes:</p> +<pre><code class="language-example">\> foo +------ +. +<h2>&gt; foo</h2> +</code></pre> +<p><strong>Compatibility note:</strong> Most existing Markdown implementations +do not allow the text of setext headings to span multiple lines. +But there is no consensus about how to interpret</p> +<pre><code class="language-markdown">Foo +bar +--- +baz +</code></pre> +<p>One can find four different interpretations:</p> +<ol> +<li>paragraph "Foo", heading "bar", paragraph "baz"</li> +<li>paragraph "Foo bar", thematic break, paragraph "baz"</li> +<li>paragraph "Foo bar --- baz"</li> +<li>heading "Foo bar", paragraph "baz"</li> +</ol> +<p>We find interpretation 4 most natural, and interpretation 4 +increases the expressive power of CommonMark, by allowing +multiline headings. Authors who want interpretation 1 can +put a blank line after the first paragraph:</p> +<pre><code class="language-example">Foo + +bar +--- +baz +. +<p>Foo</p> +<h2>bar</h2> +<p>baz</p> +</code></pre> +<p>Authors who want interpretation 2 can put blank lines around +the thematic break,</p> +<pre><code class="language-example">Foo +bar + +--- + +baz +. +<p>Foo +bar</p> +<hr /> +<p>baz</p> +</code></pre> +<p>or use a thematic break that cannot count as a [setext heading +underline], such as</p> +<pre><code class="language-example">Foo +bar +* * * +baz +. +<p>Foo +bar</p> +<hr /> +<p>baz</p> +</code></pre> +<p>Authors who want interpretation 3 can use backslash escapes:</p> +<pre><code class="language-example">Foo +bar +\--- +baz +. 
+<p>Foo +bar +--- +baz</p> +</code></pre> +<h2>Indented code blocks</h2> +<p>An <a href="@">indented code block</a> is composed of one or more +[indented chunks] separated by blank lines. +An <a href="@">indented chunk</a> is a sequence of non-blank lines, +each preceded by four or more spaces of indentation. The contents of the code +block are the literal contents of the lines, including trailing +[line endings], minus four spaces of indentation. +An indented code block has no [info string].</p> +<p>An indented code block cannot interrupt a paragraph, so there must be +a blank line between a paragraph and a following indented code block. +(A blank line is not needed, however, between a code block and a following +paragraph.)</p> +<pre><code class="language-example"> a simple + indented code block +. +<pre><code>a simple + indented code block +</code></pre> +</code></pre> +<p>If there is any ambiguity between an interpretation of indentation +as a code block and as indicating that material belongs to a [list +item][list items], the list item interpretation takes precedence:</p> +<pre><code class="language-example"> - foo + + bar +. +<ul> +<li> +<p>foo</p> +<p>bar</p> +</li> +</ul> +</code></pre> +<pre><code class="language-example">1. foo + + - bar +. +<ol> +<li> +<p>foo</p> +<ul> +<li>bar</li> +</ul> +</li> +</ol> +</code></pre> +<p>The contents of a code block are literal text, and do not get parsed +as Markdown:</p> +<pre><code class="language-example"> <a/> + *hi* + + - one +. +<pre><code>&lt;a/&gt; +*hi* + +- one +</code></pre> +</code></pre> +<p>Here we have three chunks separated by blank lines:</p> +<pre><code class="language-example"> chunk1 + + chunk2 + + + + chunk3 +. +<pre><code>chunk1 + +chunk2 + + + +chunk3 +</code></pre> +</code></pre> +<p>Any initial spaces or tabs beyond four spaces of indentation will be included in +the content, even in interior blank lines:</p> +<pre><code class="language-example"> chunk1 + + chunk2 +. 
+<pre><code>chunk1 + + chunk2 +</code></pre> +</code></pre> +<p>An indented code block cannot interrupt a paragraph. (This +allows hanging indents and the like.)</p> +<pre><code class="language-example">Foo + bar + +. +<p>Foo +bar</p> +</code></pre> +<p>However, any non-blank line with fewer than four spaces of indentation ends +the code block immediately. So a paragraph may occur immediately +after indented code:</p> +<pre><code class="language-example"> foo +bar +. +<pre><code>foo +</code></pre> +<p>bar</p> +</code></pre> +<p>And indented code can occur immediately before and after other kinds of +blocks:</p> +<pre><code class="language-example"># Heading + foo +Heading +------ + foo +---- +. +<h1>Heading</h1> +<pre><code>foo +</code></pre> +<h2>Heading</h2> +<pre><code>foo +</code></pre> +<hr /> +</code></pre> +<p>The first line can be preceded by more than four spaces of indentation:</p> +<pre><code class="language-example"> foo + bar +. +<pre><code> foo +bar +</code></pre> +</code></pre> +<p>Blank lines preceding or following an indented code block +are not included in it:</p> +<pre><code class="language-example"> + + foo + + +. +<pre><code>foo +</code></pre> +</code></pre> +<p>Trailing spaces or tabs are included in the code block's content:</p> +<pre><code class="language-example"> foo +. +<pre><code>foo +</code></pre> +</code></pre> +<h2>Fenced code blocks</h2> +<p>A <a href="@">code fence</a> is a sequence +of at least three consecutive backtick characters (<code>`</code>) or +tildes (<code>~</code>). (Tildes and backticks cannot be mixed.) +A <a href="@">fenced code block</a> +begins with a code fence, preceded by up to three spaces of indentation.</p> +<p>The line with the opening code fence may optionally contain some text +following the code fence; this is trimmed of leading and trailing +spaces or tabs and called the <a href="@">info string</a>. If the [info string] comes +after a backtick fence, it may not contain any backtick +characters. 
(The reason for this restriction is that otherwise +some inline code would be incorrectly interpreted as the +beginning of a fenced code block.)</p> +<p>The content of the code block consists of all subsequent lines, until +a closing [code fence] of the same type as the code block +began with (backticks or tildes), and with at least as many backticks +or tildes as the opening code fence. If the leading code fence is +preceded by N spaces of indentation, then up to N spaces of indentation are +removed from each line of the content (if present). (If a content line is not +indented, it is preserved unchanged. If it is indented N spaces or less, all +of the indentation is removed.)</p> +<p>The closing code fence may be preceded by up to three spaces of indentation, and +may be followed only by spaces or tabs, which are ignored. If the end of the +containing block (or document) is reached and no closing code fence +has been found, the code block contains all of the lines after the +opening code fence until the end of the containing block (or +document). (An alternative spec would require backtracking in the +event that a closing code fence is not found. But this makes parsing +much less efficient, and there seems to be no real down side to the +behavior described here.)</p> +<p>A fenced code block may interrupt a paragraph, and does not require +a blank line either before or after.</p> +<p>The content of a code fence is treated as literal text, not parsed +as inlines. The first word of the [info string] is typically used to +specify the language of the code sample, and rendered in the <code>class</code> +attribute of the <code>code</code> tag. However, this spec does not mandate any +particular treatment of the [info string].</p> +<p>Here is a simple example with backticks:</p> +<pre><code class="language-example">``` +< + > +``` +. +<pre><code>&lt; + &gt; +</code></pre> +</code></pre> +<p>With tildes:</p> +<pre><code class="language-example">~~~ +< + > +~~~ +. 
+<pre><code>&lt; + &gt; +</code></pre> +</code></pre> +<p>Fewer than three backticks is not enough:</p> +<pre><code class="language-example">`` +foo +`` +. +<p><code>foo</code></p> +</code></pre> +<p>The closing code fence must use the same character as the opening +fence:</p> +<pre><code class="language-example">``` +aaa +~~~ +``` +. +<pre><code>aaa +~~~ +</code></pre> +</code></pre> +<pre><code class="language-example">~~~ +aaa +``` +~~~ +. +<pre><code>aaa +``` +</code></pre> +</code></pre> +<p>The closing code fence must be at least as long as the opening fence:</p> +<pre><code class="language-example">```` +aaa +``` +`````` +. +<pre><code>aaa +``` +</code></pre> +</code></pre> +<pre><code class="language-example">~~~~ +aaa +~~~ +~~~~ +. +<pre><code>aaa +~~~ +</code></pre> +</code></pre> +<p>Unclosed code blocks are closed by the end of the document +(or the enclosing [block quote][block quotes] or [list item][list items]):</p> +<pre><code class="language-example">``` +. +<pre><code></code></pre> +</code></pre> +<pre><code class="language-example">````` + +``` +aaa +. +<pre><code> +``` +aaa +</code></pre> +</code></pre> +<pre><code class="language-example">> ``` +> aaa + +bbb +. +<blockquote> +<pre><code>aaa +</code></pre> +</blockquote> +<p>bbb</p> +</code></pre> +<p>A code block can have all empty lines as its content:</p> +<pre><code class="language-example">``` + + +``` +. +<pre><code> + +</code></pre> +</code></pre> +<p>A code block can be empty:</p> +<pre><code class="language-example">``` +``` +. +<pre><code></code></pre> +</code></pre> +<p>Fences can be indented. If the opening fence is indented, +content lines will have equivalent opening indentation removed, +if present:</p> +<pre><code class="language-example"> ``` + aaa +aaa +``` +. +<pre><code>aaa +aaa +</code></pre> +</code></pre> +<pre><code class="language-example"> ``` +aaa + aaa +aaa + ``` +. 
+<pre><code>aaa +aaa +aaa +</code></pre> +</code></pre> +<pre><code class="language-example"> ``` + aaa + aaa + aaa + ``` +. +<pre><code>aaa + aaa +aaa +</code></pre> +</code></pre> +<p>Four spaces of indentation is too many:</p> +<pre><code class="language-example"> ``` + aaa + ``` +. +<pre><code>``` +aaa +``` +</code></pre> +</code></pre> +<p>Closing fences may be preceded by up to three spaces of indentation, and their +indentation need not match that of the opening fence:</p> +<pre><code class="language-example">``` +aaa + ``` +. +<pre><code>aaa +</code></pre> +</code></pre> +<pre><code class="language-example"> ``` +aaa + ``` +. +<pre><code>aaa +</code></pre> +</code></pre> +<p>This is not a closing fence, because it is indented 4 spaces:</p> +<pre><code class="language-example">``` +aaa + ``` +. +<pre><code>aaa + ``` +</code></pre> +</code></pre> +<p>Code fences (opening and closing) cannot contain internal spaces or tabs:</p> +<pre><code class="language-example">``` ``` +aaa +. +<p><code> </code> +aaa</p> +</code></pre> +<pre><code class="language-example">~~~~~~ +aaa +~~~ ~~ +. +<pre><code>aaa +~~~ ~~ +</code></pre> +</code></pre> +<p>Fenced code blocks can interrupt paragraphs, and can be followed +directly by paragraphs, without a blank line between:</p> +<pre><code class="language-example">foo +``` +bar +``` +baz +. +<p>foo</p> +<pre><code>bar +</code></pre> +<p>baz</p> +</code></pre> +<p>Other blocks can also occur before and after fenced code blocks +without an intervening blank line:</p> +<pre><code class="language-example">foo +--- +~~~ +bar +~~~ +# baz +. +<h2>foo</h2> +<pre><code>bar +</code></pre> +<h1>baz</h1> +</code></pre> +<p>An [info string] can be provided after the opening code fence. +Although this spec doesn't mandate any particular treatment of +the info string, the first word is typically used to specify +the language of the code block. 
In HTML output, the language is +normally indicated by adding a class to the <code>code</code> element consisting +of <code>language-</code> followed by the language name.</p> +<pre><code class="language-example">```ruby +def foo(x) + return 3 +end +``` +. +<pre><code class="language-ruby">def foo(x) + return 3 +end +</code></pre> +</code></pre> +<pre><code class="language-example">~~~~ ruby startline=3 $%@#$ +def foo(x) + return 3 +end +~~~~~~~ +. +<pre><code class="language-ruby">def foo(x) + return 3 +end +</code></pre> +</code></pre> +<pre><code class="language-example">````; +```` +. +<pre><code class="language-;"></code></pre> +</code></pre> +<p>[Info strings] for backtick code blocks cannot contain backticks:</p> +<pre><code class="language-example">``` aa ``` +foo +. +<p><code>aa</code> +foo</p> +</code></pre> +<p>[Info strings] for tilde code blocks can contain backticks and tildes:</p> +<pre><code class="language-example">~~~ aa ``` ~~~ +foo +~~~ +. +<pre><code class="language-aa">foo +</code></pre> +</code></pre> +<p>Closing code fences cannot have [info strings]:</p> +<pre><code class="language-example">``` +``` aaa +``` +. +<pre><code>``` aaa +</code></pre> +</code></pre> +<h2>HTML blocks</h2> +<p>An <a href="@">HTML block</a> is a group of lines that is treated +as raw HTML (and will not be escaped in HTML output).</p> +<p>There are seven kinds of [HTML block], which can be defined by their +start and end conditions. The block begins with a line that meets a +<a href="@">start condition</a> (after up to three optional spaces of indentation). +It ends with the first subsequent line that meets a matching +<a href="@">end condition</a>, or the last line of the document, or the last line of +the <a href="#container-blocks">container block</a> containing the current HTML +block, if no line is encountered that meets the [end condition]. 
If +the first line meets both the [start condition] and the [end +condition], the block will contain just that line.</p> +<ol> +<li> +<p><strong>Start condition:</strong> line begins with the string <code><pre</code>, +<code><script</code>, <code><style</code>, or <code><textarea</code> (case-insensitive), followed by a space, +a tab, the string <code>></code>, or the end of the line.<br /> +<strong>End condition:</strong> line contains an end tag +<code></pre></code>, <code></script></code>, <code></style></code>, or <code></textarea></code> (case-insensitive; it +need not match the start tag).</p> +</li> +<li> +<p><strong>Start condition:</strong> line begins with the string <code><!--</code>.<br /> +<strong>End condition:</strong> line contains the string <code>--></code>.</p> +</li> +<li> +<p><strong>Start condition:</strong> line begins with the string <code><?</code>.<br /> +<strong>End condition:</strong> line contains the string <code>?></code>.</p> +</li> +<li> +<p><strong>Start condition:</strong> line begins with the string <code><!</code> +followed by an ASCII letter.<br /> +<strong>End condition:</strong> line contains the character <code>></code>.</p> +</li> +<li> +<p><strong>Start condition:</strong> line begins with the string +<code><![CDATA[</code>.<br /> +<strong>End condition:</strong> line contains the string <code>]]></code>.</p> +</li> +<li> +<p><strong>Start condition:</strong> line begins with the string <code><</code> or <code></</code> +followed by one of the strings (case-insensitive) <code>address</code>, +<code>article</code>, <code>aside</code>, <code>base</code>, <code>basefont</code>, <code>blockquote</code>, <code>body</code>, +<code>caption</code>, <code>center</code>, <code>col</code>, <code>colgroup</code>, <code>dd</code>, <code>details</code>, <code>dialog</code>, +<code>dir</code>, <code>div</code>, <code>dl</code>, <code>dt</code>, <code>fieldset</code>, <code>figcaption</code>, <code>figure</code>, +<code>footer</code>, 
<code>form</code>, <code>frame</code>, <code>frameset</code>, +<code>h1</code>, <code>h2</code>, <code>h3</code>, <code>h4</code>, <code>h5</code>, <code>h6</code>, <code>head</code>, <code>header</code>, <code>hr</code>, +<code>html</code>, <code>iframe</code>, <code>legend</code>, <code>li</code>, <code>link</code>, <code>main</code>, <code>menu</code>, <code>menuitem</code>, +<code>nav</code>, <code>noframes</code>, <code>ol</code>, <code>optgroup</code>, <code>option</code>, <code>p</code>, <code>param</code>, +<code>section</code>, <code>source</code>, <code>summary</code>, <code>table</code>, <code>tbody</code>, <code>td</code>, +<code>tfoot</code>, <code>th</code>, <code>thead</code>, <code>title</code>, <code>tr</code>, <code>track</code>, <code>ul</code>, followed +by a space, a tab, the end of the line, the string <code>></code>, or +the string <code>/></code>.<br /> +<strong>End condition:</strong> line is followed by a [blank line].</p> +</li> +<li> +<p><strong>Start condition:</strong> line begins with a complete [open tag] +(with any [tag name] other than <code>pre</code>, <code>script</code>, +<code>style</code>, or <code>textarea</code>) or a complete [closing tag], +followed by zero or more spaces and tabs, followed by the end of the line.<br /> +<strong>End condition:</strong> line is followed by a [blank line].</p> +</li> +</ol> +<p>HTML blocks continue until they are closed by their appropriate +[end condition], or the last line of the document or other <a href="#container-blocks">container +block</a>. This means any HTML <strong>within an HTML +block</strong> that might otherwise be recognised as a start condition will +be ignored by the parser and passed through as-is, without changing +the parser's state.</p> +<p>For instance, <code><pre></code> within an HTML block started by <code><table></code> will not affect +the parser state; as the HTML block was started by start condition 6, it +will end at any blank line. 
This can be surprising:</p> +<pre><code class="language-example"><table><tr><td> +<pre> +**Hello**, + +_world_. +</pre> +</td></tr></table> +. +<table><tr><td> +<pre> +**Hello**, +<p><em>world</em>. +</pre></p> +</td></tr></table> +</code></pre> +<p>In this case, the HTML block is terminated by the blank line — the <code>**Hello**</code> +text remains verbatim — and regular parsing resumes, with a paragraph, +emphasised <code>world</code> and inline and block HTML following.</p> +<p>All types of [HTML blocks] except type 7 may interrupt +a paragraph. Blocks of type 7 may not interrupt a paragraph. +(This restriction is intended to prevent unwanted interpretation +of long tags inside a wrapped paragraph as starting HTML blocks.)</p> +<p>Some simple examples follow. Here are some basic HTML blocks +of type 6:</p> +<pre><code class="language-example"><table> + <tr> + <td> + hi + </td> + </tr> +</table> + +okay. +. +<table> + <tr> + <td> + hi + </td> + </tr> +</table> +<p>okay.</p> +</code></pre> +<pre><code class="language-example"> <div> + *hello* + <foo><a> +. + <div> + *hello* + <foo><a> +</code></pre> +<p>A block can also start with a closing tag:</p> +<pre><code class="language-example"></div> +*foo* +. +</div> +*foo* +</code></pre> +<p>Here we have two HTML blocks with a Markdown paragraph between them:</p> +<pre><code class="language-example"><DIV CLASS="foo"> + +*Markdown* + +</DIV> +. +<DIV CLASS="foo"> +<p><em>Markdown</em></p> +</DIV> +</code></pre> +<p>The tag on the first line can be partial, as long +as it is split where there would be whitespace:</p> +<pre><code class="language-example"><div id="foo" + class="bar"> +</div> +. +<div id="foo" + class="bar"> +</div> +</code></pre> +<pre><code class="language-example"><div id="foo" class="bar + baz"> +</div> +. +<div id="foo" class="bar + baz"> +</div> +</code></pre> +<p>An open tag need not be closed:</p> +<pre><code class="language-example"><div> +*foo* + +*bar* +. 
+<div> +*foo* +<p><em>bar</em></p> +</code></pre> +<p>A partial tag need not even be completed (garbage +in, garbage out):</p> +<pre><code class="language-example"><div id="foo" +*hi* +. +<div id="foo" +*hi* +</code></pre> +<pre><code class="language-example"><div class +foo +. +<div class +foo +</code></pre> +<p>The initial tag doesn't even need to be a valid +tag, as long as it starts like one:</p> +<pre><code class="language-example"><div *???-&&&-<--- +*foo* +. +<div *???-&&&-<--- +*foo* +</code></pre> +<p>In type 6 blocks, the initial tag need not be on a line by +itself:</p> +<pre><code class="language-example"><div><a href="bar">*foo*</a></div> +. +<div><a href="bar">*foo*</a></div> +</code></pre> +<pre><code class="language-example"><table><tr><td> +foo +</td></tr></table> +. +<table><tr><td> +foo +</td></tr></table> +</code></pre> +<p>Everything until the next blank line or end of document +gets included in the HTML block. So, in the following +example, what looks like a Markdown code block +is actually part of the HTML block, which continues until a blank +line or the end of the document is reached:</p> +<pre><code class="language-example"><div></div> +``` c +int x = 33; +``` +. +<div></div> +``` c +int x = 33; +``` +</code></pre> +<p>To start an [HTML block] with a tag that is <em>not</em> in the +list of block-level tags in (6), you must put the tag by +itself on the first line (and it must be complete):</p> +<pre><code class="language-example"><a href="foo"> +*bar* +</a> +. +<a href="foo"> +*bar* +</a> +</code></pre> +<p>In type 7 blocks, the [tag name] can be anything:</p> +<pre><code class="language-example"><Warning> +*bar* +</Warning> +. +<Warning> +*bar* +</Warning> +</code></pre> +<pre><code class="language-example"><i class="foo"> +*bar* +</i> +. +<i class="foo"> +*bar* +</i> +</code></pre> +<pre><code class="language-example"></ins> +*bar* +. 
+</ins> +*bar* +</code></pre> +<p>These rules are designed to allow us to work with tags that +can function as either block-level or inline-level tags. +The <code><del></code> tag is a nice example. We can surround content with +<code><del></code> tags in three different ways. In this case, we get a raw +HTML block, because the <code><del></code> tag is on a line by itself:</p> +<pre><code class="language-example"><del> +*foo* +</del> +. +<del> +*foo* +</del> +</code></pre> +<p>In this case, we get a raw HTML block that just includes +the <code><del></code> tag (because it ends with the following blank +line). So the contents get interpreted as CommonMark:</p> +<pre><code class="language-example"><del> + +*foo* + +</del> +. +<del> +<p><em>foo</em></p> +</del> +</code></pre> +<p>Finally, in this case, the <code><del></code> tags are interpreted +as [raw HTML] <em>inside</em> the CommonMark paragraph. (Because +the tag is not on a line by itself, we get inline HTML +rather than an [HTML block].)</p> +<pre><code class="language-example"><del>*foo*</del> +. +<p><del><em>foo</em></del></p> +</code></pre> +<p>HTML tags designed to contain literal content +(<code>pre</code>, <code>script</code>, <code>style</code>, <code>textarea</code>), comments, processing instructions, +and declarations are treated somewhat differently. +Instead of ending at the first blank line, these blocks +end at the first line containing a corresponding end tag. +As a result, these blocks can contain blank lines:</p> +<p>A pre tag (type 1):</p> +<pre><code class="language-example"><pre language="haskell"><code> +import Text.HTML.TagSoup + +main :: IO () +main = print $ parseTags tags +</code></pre> +okay +. 
+<pre language="haskell"><code> +import Text.HTML.TagSoup + +main :: IO () +main = print $ parseTags tags +</code></pre> +<p>okay</p> +</code></pre> +<p>A script tag (type 1):</p> +<pre><code class="language-example"><script type="text/javascript"> +// JavaScript example + +document.getElementById("demo").innerHTML = "Hello JavaScript!"; +</script> +okay +. +<script type="text/javascript"> +// JavaScript example + +document.getElementById("demo").innerHTML = "Hello JavaScript!"; +</script> +<p>okay</p> +</code></pre> +<p>A textarea tag (type 1):</p> +<pre><code class="language-example"><textarea> + +*foo* + +_bar_ + +</textarea> +. +<textarea> + +*foo* + +_bar_ + +</textarea> +</code></pre> +<p>A style tag (type 1):</p> +<pre><code class="language-example"><style + type="text/css"> +h1 {color:red;} + +p {color:blue;} +</style> +okay +. +<style + type="text/css"> +h1 {color:red;} + +p {color:blue;} +</style> +<p>okay</p> +</code></pre> +<p>If there is no matching end tag, the block will end at the +end of the document (or the enclosing [block quote][block quotes] +or [list item][list items]):</p> +<pre><code class="language-example"><style + type="text/css"> + +foo +. +<style + type="text/css"> + +foo +</code></pre> +<pre><code class="language-example">> <div> +> foo + +bar +. +<blockquote> +<div> +foo +</blockquote> +<p>bar</p> +</code></pre> +<pre><code class="language-example">- <div> +- foo +. +<ul> +<li> +<div> +</li> +<li>foo</li> +</ul> +</code></pre> +<p>The end tag can occur on the same line as the start tag:</p> +<pre><code class="language-example"><style>p{color:red;}</style> +*foo* +. +<style>p{color:red;}</style> +<p><em>foo</em></p> +</code></pre> +<pre><code class="language-example"><!-- foo -->*bar* +*baz* +. +<!-- foo -->*bar* +<p><em>baz</em></p> +</code></pre> +<p>Note that anything on the last line after the +end tag will be included in the [HTML block]:</p> +<pre><code class="language-example"><script> +foo +</script>1. *bar* +. 
+<script> +foo +</script>1. *bar* +</code></pre> +<p>A comment (type 2):</p> +<pre><code class="language-example"><!-- Foo + +bar + baz --> +okay +. +<!-- Foo + +bar + baz --> +<p>okay</p> +</code></pre> +<p>A processing instruction (type 3):</p> +<pre><code class="language-example"><?php + + echo '>'; + +?> +okay +. +<?php + + echo '>'; + +?> +<p>okay</p> +</code></pre> +<p>A declaration (type 4):</p> +<pre><code class="language-example"><!DOCTYPE html> +. +<!DOCTYPE html> +</code></pre> +<p>CDATA (type 5):</p> +<pre><code class="language-example"><![CDATA[ +function matchwo(a,b) +{ + if (a < b && a < 0) then { + return 1; + + } else { + + return 0; + } +} +]]> +okay +. +<![CDATA[ +function matchwo(a,b) +{ + if (a < b && a < 0) then { + return 1; + + } else { + + return 0; + } +} +]]> +<p>okay</p> +</code></pre> +<p>The opening tag can be preceded by up to three spaces of indentation, but not +four:</p> +<pre><code class="language-example"> <!-- foo --> + + <!-- foo --> +. + <!-- foo --> +<pre><code>&lt;!-- foo --&gt; +</code></pre> +</code></pre> +<pre><code class="language-example"> <div> + + <div> +. + <div> +<pre><code>&lt;div&gt; +</code></pre> +</code></pre> +<p>An HTML block of types 1--6 can interrupt a paragraph, and need not be +preceded by a blank line.</p> +<pre><code class="language-example">Foo +<div> +bar +</div> +. +<p>Foo</p> +<div> +bar +</div> +</code></pre> +<p>However, a following blank line is needed, except at the end of +a document, and except for blocks of types 1--5, [above][HTML +block]:</p> +<pre><code class="language-example"><div> +bar +</div> +*foo* +. +<div> +bar +</div> +*foo* +</code></pre> +<p>HTML blocks of type 7 cannot interrupt a paragraph:</p> +<pre><code class="language-example">Foo +<a href="bar"> +baz +. 
+<p>Foo +<a href="bar"> +baz</p> +</code></pre> +<p>This rule differs from John Gruber's original Markdown syntax +specification, which says:</p> +<blockquote> +<p>The only restrictions are that block-level HTML elements — +e.g. <code><div></code>, <code><table></code>, <code><pre></code>, <code><p></code>, etc. — must be separated from +surrounding content by blank lines, and the start and end tags of the +block should not be indented with spaces or tabs.</p> +</blockquote> +<p>In some ways Gruber's rule is more restrictive than the one given +here:</p> +<ul> +<li>It requires that an HTML block be preceded by a blank line.</li> +<li>It does not allow the start tag to be indented.</li> +<li>It requires a matching end tag, which it also does not allow to +be indented.</li> +</ul> +<p>Most Markdown implementations (including some of Gruber's own) do not +respect all of these restrictions.</p> +<p>There is one respect, however, in which Gruber's rule is more liberal +than the one given here, since it allows blank lines to occur inside +an HTML block. There are two reasons for disallowing them here. +First, it removes the need to parse balanced tags, which is +expensive and can require backtracking from the end of the document +if no matching end tag is found. Second, it provides a very simple +and flexible way of including Markdown content inside HTML tags: +simply separate the Markdown from the HTML using blank lines:</p> +<p>Compare:</p> +<pre><code class="language-example"><div> + +*Emphasized* text. + +</div> +. +<div> +<p><em>Emphasized</em> text.</p> +</div> +</code></pre> +<pre><code class="language-example"><div> +*Emphasized* text. +</div> +. +<div> +*Emphasized* text. +</div> +</code></pre> +<p>Some Markdown implementations have adopted a convention of +interpreting content inside tags as text if the open tag has +the attribute <code>markdown=1</code>. 
The rule given above seems a simpler and +more elegant way of achieving the same expressive power, which is also +much simpler to parse.</p> +<p>The main potential drawback is that one can no longer paste HTML +blocks into Markdown documents with 100% reliability. However, +<em>in most cases</em> this will work fine, because the blank lines in +HTML are usually followed by HTML block tags. For example:</p> +<pre><code class="language-example"><table> + +<tr> + +<td> +Hi +</td> + +</tr> + +</table> +. +<table> +<tr> +<td> +Hi +</td> +</tr> +</table> +</code></pre> +<p>There are problems, however, if the inner tags are indented +<em>and</em> separated by spaces, as then they will be interpreted as +an indented code block:</p> +<pre><code class="language-example"><table> + + <tr> + + <td> + Hi + </td> + + </tr> + +</table> +. +<table> + <tr> +<pre><code>&lt;td&gt; + Hi +&lt;/td&gt; +</code></pre> + </tr> +</table> +</code></pre> +<p>Fortunately, blank lines are usually not necessary and can be +deleted. The exception is inside <code><pre></code> tags, but as described +[above][HTML blocks], raw HTML blocks starting with <code><pre></code> +<em>can</em> contain blank lines.</p> +<h2>Link reference definitions</h2> +<p>A <a href="@">link reference definition</a> +consists of a [link label], optionally preceded by up to three spaces of +indentation, followed +by a colon (<code>:</code>), optional spaces or tabs (including up to one +[line ending]), a [link destination], +optional spaces or tabs (including up to one +[line ending]), and an optional [link +title], which if it is present must be separated +from the [link destination] by spaces or tabs. +No further character may occur.</p> +<p>A [link reference definition] +does not correspond to a structural element of a document. Instead, it +defines a label which can be used in [reference links] +and reference-style [images] elsewhere in the document. 
[Link +reference definitions] can come either before or after the links that use +them.</p> +<pre><code class="language-example">[foo]: /url "title" + +[foo] +. +<p><a href="/url" title="title">foo</a></p> +</code></pre> +<pre><code class="language-example"> [foo]: + /url + 'the title' + +[foo] +. +<p><a href="/url" title="the title">foo</a></p> +</code></pre> +<pre><code class="language-example">[Foo*bar\]]:my_(url) 'title (with parens)' + +[Foo*bar\]] +. +<p><a href="my_(url)" title="title (with parens)">Foo*bar]</a></p> +</code></pre> +<pre><code class="language-example">[Foo bar]: +<my url> +'title' + +[Foo bar] +. +<p><a href="my%20url" title="title">Foo bar</a></p> +</code></pre> +<p>The title may extend over multiple lines:</p> +<pre><code class="language-example">[foo]: /url ' +title +line1 +line2 +' + +[foo] +. +<p><a href="/url" title=" +title +line1 +line2 +">foo</a></p> +</code></pre> +<p>However, it may not contain a [blank line]:</p> +<pre><code class="language-example">[foo]: /url 'title + +with blank line' + +[foo] +. +<p>[foo]: /url 'title</p> +<p>with blank line'</p> +<p>[foo]</p> +</code></pre> +<p>The title may be omitted:</p> +<pre><code class="language-example">[foo]: +/url + +[foo] +. +<p><a href="/url">foo</a></p> +</code></pre> +<p>The link destination may not be omitted:</p> +<pre><code class="language-example">[foo]: + +[foo] +. +<p>[foo]:</p> +<p>[foo]</p> +</code></pre> +<p>However, an empty link destination may be specified using +angle brackets:</p> +<pre><code class="language-example">[foo]: <> + +[foo] +. +<p><a href="">foo</a></p> +</code></pre> +<p>The title must be separated from the link destination by +spaces or tabs:</p> +<pre><code class="language-example">[foo]: <bar>(baz) + +[foo] +. +<p>[foo]: <bar>(baz)</p> +<p>[foo]</p> +</code></pre> +<p>Both title and destination can contain backslash escapes +and literal backslashes:</p> +<pre><code class="language-example">[foo]: /url\bar\*baz "foo\"bar\baz" + +[foo] +. 
+<p><a href="/url%5Cbar*baz" title="foo&quot;bar\baz">foo</a></p> +</code></pre> +<p>A link can come before its corresponding definition:</p> +<pre><code class="language-example">[foo] + +[foo]: url +. +<p><a href="url">foo</a></p> +</code></pre> +<p>If there are several matching definitions, the first one takes +precedence:</p> +<pre><code class="language-example">[foo] + +[foo]: first +[foo]: second +. +<p><a href="first">foo</a></p> +</code></pre> +<p>As noted in the section on [Links], matching of labels is +case-insensitive (see [matches]).</p> +<pre><code class="language-example">[FOO]: /url + +[Foo] +. +<p><a href="/url">Foo</a></p> +</code></pre> +<pre><code class="language-example">[ΑΓΩ]: /φου + +[αγω] +. +<p><a href="/%CF%86%CE%BF%CF%85">αγω</a></p> +</code></pre> +<p>Whether something is a [link reference definition] is +independent of whether the link reference it defines is +used in the document. Thus, for example, the following +document contains just a link reference definition, and +no visible content:</p> +<pre><code class="language-example">[foo]: /url +. +</code></pre> +<p>Here is another one:</p> +<pre><code class="language-example">[ +foo +]: /url +bar +. +<p>bar</p> +</code></pre> +<p>This is not a link reference definition, because there are +characters other than spaces or tabs after the title:</p> +<pre><code class="language-example">[foo]: /url "title" ok +. +<p>[foo]: /url &quot;title&quot; ok</p> +</code></pre> +<p>This is a link reference definition, but it has no title:</p> +<pre><code class="language-example">[foo]: /url +"title" ok +. +<p>&quot;title&quot; ok</p> +</code></pre> +<p>This is not a link reference definition, because it is indented +four spaces:</p> +<pre><code class="language-example"> [foo]: /url "title" + +[foo] +. 
+<pre><code>[foo]: /url &quot;title&quot; +</code></pre> +<p>[foo]</p> +</code></pre> +<p>This is not a link reference definition, because it occurs inside +a code block:</p> +<pre><code class="language-example">``` +[foo]: /url +``` + +[foo] +. +<pre><code>[foo]: /url +</code></pre> +<p>[foo]</p> +</code></pre> +<p>A [link reference definition] cannot interrupt a paragraph.</p> +<pre><code class="language-example">Foo +[bar]: /baz + +[bar] +. +<p>Foo +[bar]: /baz</p> +<p>[bar]</p> +</code></pre> +<p>However, it can directly follow other block elements, such as headings +and thematic breaks, and it need not be followed by a blank line.</p> +<pre><code class="language-example"># [Foo] +[foo]: /url +> bar +. +<h1><a href="/url">Foo</a></h1> +<blockquote> +<p>bar</p> +</blockquote> +</code></pre> +<pre><code class="language-example">[foo]: /url +bar +=== +[foo] +. +<h1>bar</h1> +<p><a href="/url">foo</a></p> +</code></pre> +<pre><code class="language-example">[foo]: /url +=== +[foo] +. +<p>=== +<a href="/url">foo</a></p> +</code></pre> +<p>Several [link reference definitions] +can occur one after another, without intervening blank lines.</p> +<pre><code class="language-example">[foo]: /foo-url "foo" +[bar]: /bar-url + "bar" +[baz]: /baz-url + +[foo], +[bar], +[baz] +. +<p><a href="/foo-url" title="foo">foo</a>, +<a href="/bar-url" title="bar">bar</a>, +<a href="/baz-url">baz</a></p> +</code></pre> +<p>[Link reference definitions] can occur +inside block containers, like lists and block quotations. They +affect the entire document, not just the container in which they +are defined:</p> +<pre><code class="language-example">[foo] + +> [foo]: /url +. +<p><a href="/url">foo</a></p> +<blockquote> +</blockquote> +</code></pre> +<h2>Paragraphs</h2> +<p>A sequence of non-blank lines that cannot be interpreted as other +kinds of blocks forms a <a href="@">paragraph</a>. +The contents of the paragraph are the result of parsing the +paragraph's raw content as inlines. 
The paragraph's raw content +is formed by concatenating the lines and removing initial and final +spaces or tabs.</p> +<p>A simple example with two paragraphs:</p> +<pre><code class="language-example">aaa + +bbb +. +<p>aaa</p> +<p>bbb</p> +</code></pre> +<p>Paragraphs can contain multiple lines, but no blank lines:</p> +<pre><code class="language-example">aaa +bbb + +ccc +ddd +. +<p>aaa +bbb</p> +<p>ccc +ddd</p> +</code></pre> +<p>Multiple blank lines between paragraphs have no effect:</p> +<pre><code class="language-example">aaa + + +bbb +. +<p>aaa</p> +<p>bbb</p> +</code></pre> +<p>Leading spaces or tabs are skipped:</p> +<pre><code class="language-example"> aaa + bbb +. +<p>aaa +bbb</p> +</code></pre> +<p>Lines after the first may be indented any amount, since indented +code blocks cannot interrupt paragraphs.</p> +<pre><code class="language-example">aaa + bbb + ccc +. +<p>aaa +bbb +ccc</p> +</code></pre> +<p>However, the first line may be preceded by up to three spaces of indentation. +Four spaces of indentation is too many:</p> +<pre><code class="language-example"> aaa +bbb +. +<p>aaa +bbb</p> +</code></pre> +<pre><code class="language-example"> aaa +bbb +. +<pre><code>aaa +</code></pre> +<p>bbb</p> +</code></pre> +<p>Final spaces or tabs are stripped before inline parsing, so a paragraph +that ends with two or more spaces will not end with a [hard line +break]:</p> +<pre><code class="language-example">aaa +bbb +. +<p>aaa<br /> +bbb</p> +</code></pre> +<h2>Blank lines</h2> +<p>[Blank lines] between block-level elements are ignored, +except for the role they play in determining whether a [list] +is [tight] or [loose].</p> +<p>Blank lines at the beginning and end of the document are also ignored.</p> +<pre><code class="language-example"> + +aaa + + +# aaa + + +. +<p>aaa</p> +<h1>aaa</h1> +</code></pre> +<h1>Container blocks</h1> +<p>A <a href="#container-blocks">container block</a> is a block that has other +blocks as its contents. 
There are two basic kinds of container blocks: +[block quotes] and [list items]. +[Lists] are meta-containers for [list items].</p> +<p>We define the syntax for container blocks recursively. The general +form of the definition is:</p> +<blockquote> +<p>If X is a sequence of blocks, then the result of +transforming X in such-and-such a way is a container of type Y +with these blocks as its content.</p> +</blockquote> +<p>So, we explain what counts as a block quote or list item by explaining +how these can be <em>generated</em> from their contents. This should suffice +to define the syntax, although it does not give a recipe for <em>parsing</em> +these constructions. (A recipe is provided below in the section entitled +<a href="#appendix-a-parsing-strategy">A parsing strategy</a>.)</p> +<h2>Block quotes</h2> +<p>A <a href="@">block quote marker</a>, +optionally preceded by up to three spaces of indentation, +consists of (a) the character <code>></code> together with a following space of +indentation, or (b) a single character <code>></code> not followed by a space of +indentation.</p> +<p>The following rules define [block quotes]:</p> +<ol> +<li> +<p><strong>Basic case.</strong> If a string of lines <em>Ls</em> constitute a sequence +of blocks <em>Bs</em>, then the result of prepending a [block quote +marker] to the beginning of each line in <em>Ls</em> +is a <a href="#block-quotes">block quote</a> containing <em>Bs</em>.</p> +</li> +<li> +<p><strong>Laziness.</strong> If a string of lines <em>Ls</em> constitute a <a href="#block-quotes">block +quote</a> with contents <em>Bs</em>, then the result of deleting +the initial [block quote marker] from one or +more lines in which the next character other than a space or tab after the +[block quote marker] is [paragraph continuation +text] is a block quote with <em>Bs</em> as its content. 
+<a href="@">Paragraph continuation text</a> is text +that will be parsed as part of the content of a paragraph, but does +not occur at the beginning of the paragraph.</p> +</li> +<li> +<p><strong>Consecutiveness.</strong> A document cannot contain two [block +quotes] in a row unless there is a [blank line] between them.</p> +</li> +</ol> +<p>Nothing else counts as a <a href="#block-quotes">block quote</a>.</p> +<p>Here is a simple example:</p> +<pre><code class="language-example">> # Foo +> bar +> baz +. +<blockquote> +<h1>Foo</h1> +<p>bar +baz</p> +</blockquote> +</code></pre> +<p>The space or tab after the <code>></code> characters can be omitted:</p> +<pre><code class="language-example">># Foo +>bar +> baz +. +<blockquote> +<h1>Foo</h1> +<p>bar +baz</p> +</blockquote> +</code></pre> +<p>The <code>></code> characters can be preceded by up to three spaces of indentation:</p> +<pre><code class="language-example"> > # Foo + > bar + > baz +. +<blockquote> +<h1>Foo</h1> +<p>bar +baz</p> +</blockquote> +</code></pre> +<p>Four spaces of indentation is too many:</p> +<pre><code class="language-example"> > # Foo + > bar + > baz +. +<pre><code>&gt; # Foo +&gt; bar +&gt; baz +</code></pre> +</code></pre> +<p>The Laziness clause allows us to omit the <code>></code> before +[paragraph continuation text]:</p> +<pre><code class="language-example">> # Foo +> bar +baz +. +<blockquote> +<h1>Foo</h1> +<p>bar +baz</p> +</blockquote> +</code></pre> +<p>A block quote can contain some lazy and some non-lazy +continuation lines:</p> +<pre><code class="language-example">> bar +baz +> foo +. +<blockquote> +<p>bar +baz +foo</p> +</blockquote> +</code></pre> +<p>Laziness only applies to lines that would have been continuations of +paragraphs had they been prepended with [block quote markers]. 
+For example, the <code>> </code> cannot be omitted in the second line of</p> +<pre><code class="language-markdown">> foo +> --- +</code></pre> +<p>without changing the meaning:</p> +<pre><code class="language-example">> foo +--- +. +<blockquote> +<p>foo</p> +</blockquote> +<hr /> +</code></pre> +<p>Similarly, if we omit the <code>> </code> in the second line of</p> +<pre><code class="language-markdown">> - foo +> - bar +</code></pre> +<p>then the block quote ends after the first line:</p> +<pre><code class="language-example">> - foo +- bar +. +<blockquote> +<ul> +<li>foo</li> +</ul> +</blockquote> +<ul> +<li>bar</li> +</ul> +</code></pre> +<p>For the same reason, we can't omit the <code>> </code> in front of +subsequent lines of an indented or fenced code block:</p> +<pre><code class="language-example">> foo + bar +. +<blockquote> +<pre><code>foo +</code></pre> +</blockquote> +<pre><code>bar +</code></pre> +</code></pre> +<pre><code class="language-example">> ``` +foo +``` +. +<blockquote> +<pre><code></code></pre> +</blockquote> +<p>foo</p> +<pre><code></code></pre> +</code></pre> +<p>Note that in the following case, we have a [lazy +continuation line]:</p> +<pre><code class="language-example">> foo + - bar +. +<blockquote> +<p>foo +- bar</p> +</blockquote> +</code></pre> +<p>To see why, note that in</p> +<pre><code class="language-markdown">> foo +> - bar +</code></pre> +<p>the <code>- bar</code> is indented too far to start a list, and can't +be an indented code block because indented code blocks cannot +interrupt paragraphs, so it is [paragraph continuation text].</p> +<p>A block quote can be empty:</p> +<pre><code class="language-example">> +. +<blockquote> +</blockquote> +</code></pre> +<pre><code class="language-example">> +> +> +. +<blockquote> +</blockquote> +</code></pre> +<p>A block quote can have initial or final blank lines:</p> +<pre><code class="language-example">> +> foo +> +. 
+<blockquote> +<p>foo</p> +</blockquote> +</code></pre> +<p>A blank line always separates block quotes:</p> +<pre><code class="language-example">> foo + +> bar +. +<blockquote> +<p>foo</p> +</blockquote> +<blockquote> +<p>bar</p> +</blockquote> +</code></pre> +<p>(Most current Markdown implementations, including John Gruber's +original <code>Markdown.pl</code>, will parse this example as a single block quote +with two paragraphs. But it seems better to allow the author to decide +whether two block quotes or one are wanted.)</p> +<p>Consecutiveness means that if we put these block quotes together, +we get a single block quote:</p> +<pre><code class="language-example">> foo +> bar +. +<blockquote> +<p>foo +bar</p> +</blockquote> +</code></pre> +<p>To get a block quote with two paragraphs, use:</p> +<pre><code class="language-example">> foo +> +> bar +. +<blockquote> +<p>foo</p> +<p>bar</p> +</blockquote> +</code></pre> +<p>Block quotes can interrupt paragraphs:</p> +<pre><code class="language-example">foo +> bar +. +<p>foo</p> +<blockquote> +<p>bar</p> +</blockquote> +</code></pre> +<p>In general, blank lines are not needed before or after block +quotes:</p> +<pre><code class="language-example">> aaa +*** +> bbb +. +<blockquote> +<p>aaa</p> +</blockquote> +<hr /> +<blockquote> +<p>bbb</p> +</blockquote> +</code></pre> +<p>However, because of laziness, a blank line is needed between +a block quote and a following paragraph:</p> +<pre><code class="language-example">> bar +baz +. +<blockquote> +<p>bar +baz</p> +</blockquote> +</code></pre> +<pre><code class="language-example">> bar + +baz +. +<blockquote> +<p>bar</p> +</blockquote> +<p>baz</p> +</code></pre> +<pre><code class="language-example">> bar +> +baz +. 
+<blockquote> +<p>bar</p> +</blockquote> +<p>baz</p> +</code></pre> +<p>It is a consequence of the Laziness rule that any number +of initial <code>></code>s may be omitted on a continuation line of a +nested block quote:</p> +<pre><code class="language-example">> > > foo +bar +. +<blockquote> +<blockquote> +<blockquote> +<p>foo +bar</p> +</blockquote> +</blockquote> +</blockquote> +</code></pre> +<pre><code class="language-example">>>> foo +> bar +>>baz +. +<blockquote> +<blockquote> +<blockquote> +<p>foo +bar +baz</p> +</blockquote> +</blockquote> +</blockquote> +</code></pre> +<p>When including an indented code block in a block quote, +remember that the [block quote marker] includes +both the <code>></code> and a following space of indentation. So <em>five spaces</em> are needed +after the <code>></code>:</p> +<pre><code class="language-example">> code + +> not code +. +<blockquote> +<pre><code>code +</code></pre> +</blockquote> +<blockquote> +<p>not code</p> +</blockquote> +</code></pre> +<h2>List items</h2> +<p>A <a href="@">list marker</a> is a +[bullet list marker] or an [ordered list marker].</p> +<p>A <a href="@">bullet list marker</a> +is a <code>-</code>, <code>+</code>, or <code>*</code> character.</p> +<p>An <a href="@">ordered list marker</a> +is a sequence of 1--9 arabic digits (<code>0-9</code>), followed by either a +<code>.</code> character or a <code>)</code> character. 
(The reason for the length +limit is that with 10 digits we start seeing integer overflows +in some browsers.)</p> +<p>The following rules define [list items]:</p> +<ol> +<li> +<p><strong>Basic case.</strong> If a sequence of lines <em>Ls</em> constitute a sequence of +blocks <em>Bs</em> starting with a character other than a space or tab, and <em>M</em> is +a list marker of width <em>W</em> followed by 1 ≤ <em>N</em> ≤ 4 spaces of indentation, +then the result of prepending <em>M</em> and the following spaces to the first line +of Ls*, and indenting subsequent lines of <em>Ls</em> by <em>W + N</em> spaces, is a +list item with <em>Bs</em> as its contents. The type of the list item +(bullet or ordered) is determined by the type of its list marker. +If the list item is ordered, then it is also assigned a start +number, based on the ordered list marker.</p> +<p>Exceptions:</p> +<ol> +<li>When the first list item in a [list] interrupts +a paragraph---that is, when it starts on a line that would +otherwise count as [paragraph continuation text]---then (a) +the lines <em>Ls</em> must not begin with a blank line, and (b) if +the list item is ordered, the start number must be 1.</li> +<li>If any line is a [thematic break][thematic breaks] then +that line is not a list item.</li> +</ol> +</li> +</ol> +<p>For example, let <em>Ls</em> be the lines</p> +<pre><code class="language-example">A paragraph +with two lines. + + indented code + +> A block quote. +. +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</code></pre> +<p>And let <em>M</em> be the marker <code>1.</code>, and <em>N</em> = 2. Then rule #1 says +that the following is an ordered list item with start number 1, +and the same contents as <em>Ls</em>:</p> +<pre><code class="language-example">1. A paragraph + with two lines. + + indented code + + > A block quote. +. 
+<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol> +</code></pre> +<p>The most important thing to notice is that the position of +the text after the list marker determines how much indentation +is needed in subsequent blocks in the list item. If the list +marker takes up two spaces of indentation, and there are three spaces between +the list marker and the next character other than a space or tab, then blocks +must be indented five spaces in order to fall under the list +item.</p> +<p>Here are some examples showing how far content must be indented to be +put under the list item:</p> +<pre><code class="language-example">- one + + two +. +<ul> +<li>one</li> +</ul> +<p>two</p> +</code></pre> +<pre><code class="language-example">- one + + two +. +<ul> +<li> +<p>one</p> +<p>two</p> +</li> +</ul> +</code></pre> +<pre><code class="language-example"> - one + + two +. +<ul> +<li>one</li> +</ul> +<pre><code> two +</code></pre> +</code></pre> +<pre><code class="language-example"> - one + + two +. +<ul> +<li> +<p>one</p> +<p>two</p> +</li> +</ul> +</code></pre> +<p>It is tempting to think of this in terms of columns: the continuation +blocks must be indented at least to the column of the first character other than +a space or tab after the list marker. However, that is not quite right. +The spaces of indentation after the list marker determine how much relative +indentation is needed. Which column this indentation reaches will depend on +how the list item is embedded in other constructions, as shown by +this example:</p> +<pre><code class="language-example"> > > 1. one +>> +>> two +. 
+<blockquote> +<blockquote> +<ol> +<li> +<p>one</p> +<p>two</p> +</li> +</ol> +</blockquote> +</blockquote> +</code></pre> +<p>Here <code>two</code> occurs in the same column as the list marker <code>1.</code>, +but is actually contained in the list item, because there is +sufficient indentation after the last containing blockquote marker.</p> +<p>The converse is also possible. In the following example, the word <code>two</code> +occurs far to the right of the initial text of the list item, <code>one</code>, but +it is not considered part of the list item, because it is not indented +far enough past the blockquote marker:</p> +<pre><code class="language-example">>>- one +>> + > > two +. +<blockquote> +<blockquote> +<ul> +<li>one</li> +</ul> +<p>two</p> +</blockquote> +</blockquote> +</code></pre> +<p>Note that at least one space or tab is needed between the list marker and +any following content, so these are not list items:</p> +<pre><code class="language-example">-one + +2.two +. +<p>-one</p> +<p>2.two</p> +</code></pre> +<p>A list item may contain blocks that are separated by more than +one blank line.</p> +<pre><code class="language-example">- foo + + + bar +. +<ul> +<li> +<p>foo</p> +<p>bar</p> +</li> +</ul> +</code></pre> +<p>A list item may contain any kind of block:</p> +<pre><code class="language-example">1. foo + + ``` + bar + ``` + + baz + + > bam +. +<ol> +<li> +<p>foo</p> +<pre><code>bar +</code></pre> +<p>baz</p> +<blockquote> +<p>bam</p> +</blockquote> +</li> +</ol> +</code></pre> +<p>A list item that contains an indented code block will preserve +empty lines within the code block verbatim.</p> +<pre><code class="language-example">- Foo + + bar + + + baz +. +<ul> +<li> +<p>Foo</p> +<pre><code>bar + + +baz +</code></pre> +</li> +</ul> +</code></pre> +<p>Note that ordered list start numbers must be nine digits or less:</p> +<pre><code class="language-example">123456789. ok +. 
+<ol start="123456789"> +<li>ok</li> +</ol> +</code></pre> +<pre><code class="language-example">1234567890. not ok +. +<p>1234567890. not ok</p> +</code></pre> +<p>A start number may begin with 0s:</p> +<pre><code class="language-example">0. ok +. +<ol start="0"> +<li>ok</li> +</ol> +</code></pre> +<pre><code class="language-example">003. ok +. +<ol start="3"> +<li>ok</li> +</ol> +</code></pre> +<p>A start number may not be negative:</p> +<pre><code class="language-example">-1. not ok +. +<p>-1. not ok</p> +</code></pre> +<ol start="2"> +<li><strong>Item starting with indented code.</strong> If a sequence of lines <em>Ls</em> +constitute a sequence of blocks <em>Bs</em> starting with an indented code +block, and <em>M</em> is a list marker of width <em>W</em> followed by +one space of indentation, then the result of prepending <em>M</em> and the +following space to the first line of <em>Ls</em>, and indenting subsequent lines +of <em>Ls</em> by <em>W + 1</em> spaces, is a list item with <em>Bs</em> as its contents. +If a line is empty, then it need not be indented. The type of the +list item (bullet or ordered) is determined by the type of its list +marker. If the list item is ordered, then it is also assigned a +start number, based on the ordered list marker.</li> +</ol> +<p>An indented code block will have to be preceded by four spaces of indentation +beyond the edge of the region where text will be included in the list item. +In the following case that is 6 spaces:</p> +<pre><code class="language-example">- foo + + bar +. +<ul> +<li> +<p>foo</p> +<pre><code>bar +</code></pre> +</li> +</ul> +</code></pre> +<p>And in this case it is 11 spaces:</p> +<pre><code class="language-example"> 10. foo + + bar +. 
+<ol start="10"> +<li> +<p>foo</p> +<pre><code>bar +</code></pre> +</li> +</ol> +</code></pre> +<p>If the <em>first</em> block in the list item is an indented code block, +then by rule #2, the contents must be preceded by <em>one</em> space of indentation +after the list marker:</p> +<pre><code class="language-example"> indented code + +paragraph + + more code +. +<pre><code>indented code +</code></pre> +<p>paragraph</p> +<pre><code>more code +</code></pre> +</code></pre> +<pre><code class="language-example">1. indented code + + paragraph + + more code +. +<ol> +<li> +<pre><code>indented code +</code></pre> +<p>paragraph</p> +<pre><code>more code +</code></pre> +</li> +</ol> +</code></pre> +<p>Note that an additional space of indentation is interpreted as space +inside the code block:</p> +<pre><code class="language-example">1. indented code + + paragraph + + more code +. +<ol> +<li> +<pre><code> indented code +</code></pre> +<p>paragraph</p> +<pre><code>more code +</code></pre> +</li> +</ol> +</code></pre> +<p>Note that rules #1 and #2 only apply to two cases: (a) cases +in which the lines to be included in a list item begin with a +character other than a space or tab, and (b) cases in which +they begin with an indented code +block. In a case like the following, where the first block begins with +three spaces of indentation, the rules do not allow us to form a list item by +indenting the whole thing and prepending a list marker:</p> +<pre><code class="language-example"> foo + +bar +. +<p>foo</p> +<p>bar</p> +</code></pre> +<pre><code class="language-example">- foo + + bar +. +<ul> +<li>foo</li> +</ul> +<p>bar</p> +</code></pre> +<p>This is not a significant restriction, because when a block is preceded by up to +three spaces of indentation, the indentation can always be removed without +a change in interpretation, allowing rule #1 to be applied. So, in +the above case:</p> +<pre><code class="language-example">- foo + + bar +. 
+<ul> +<li> +<p>foo</p> +<p>bar</p> +</li> +</ul> +</code></pre> +<ol start="3"> +<li><strong>Item starting with a blank line.</strong> If a sequence of lines <em>Ls</em> +starting with a single [blank line] constitute a (possibly empty) +sequence of blocks <em>Bs</em>, and <em>M</em> is a list marker of width <em>W</em>, +then the result of prepending <em>M</em> to the first line of <em>Ls</em>, and +preceding subsequent lines of <em>Ls</em> by <em>W + 1</em> spaces of indentation, is a +list item with <em>Bs</em> as its contents. +If a line is empty, then it need not be indented. The type of the +list item (bullet or ordered) is determined by the type of its list +marker. If the list item is ordered, then it is also assigned a +start number, based on the ordered list marker.</li> +</ol> +<p>Here are some list items that start with a blank line but are not empty:</p> +<pre><code class="language-example">- + foo +- + ``` + bar + ``` +- + baz +. +<ul> +<li>foo</li> +<li> +<pre><code>bar +</code></pre> +</li> +<li> +<pre><code>baz +</code></pre> +</li> +</ul> +</code></pre> +<p>When the list item starts with a blank line, the number of spaces +following the list marker doesn't change the required indentation:</p> +<pre><code class="language-example">- + foo +. +<ul> +<li>foo</li> +</ul> +</code></pre> +<p>A list item can begin with at most one blank line. +In the following example, <code>foo</code> is not part of the list +item:</p> +<pre><code class="language-example">- + + foo +. +<ul> +<li></li> +</ul> +<p>foo</p> +</code></pre> +<p>Here is an empty bullet list item:</p> +<pre><code class="language-example">- foo +- +- bar +. +<ul> +<li>foo</li> +<li></li> +<li>bar</li> +</ul> +</code></pre> +<p>It does not matter whether there are spaces or tabs following the [list marker]:</p> +<pre><code class="language-example">- foo +- +- bar +. 
+<ul> +<li>foo</li> +<li></li> +<li>bar</li> +</ul> +</code></pre> +<p>Here is an empty ordered list item:</p> +<pre><code class="language-example">1. foo +2. +3. bar +. +<ol> +<li>foo</li> +<li></li> +<li>bar</li> +</ol> +</code></pre> +<p>A list may start or end with an empty list item:</p> +<pre><code class="language-example">* +. +<ul> +<li></li> +</ul> +</code></pre> +<p>However, an empty list item cannot interrupt a paragraph:</p> +<pre><code class="language-example">foo +* + +foo +1. +. +<p>foo +*</p> +<p>foo +1.</p> +</code></pre> +<ol start="4"> +<li><strong>Indentation.</strong> If a sequence of lines <em>Ls</em> constitutes a list item +according to rule #1, #2, or #3, then the result of preceding each line +of <em>Ls</em> by up to three spaces of indentation (the same for each line) also +constitutes a list item with the same contents and attributes. If a line is +empty, then it need not be indented.</li> +</ol> +<p>Indented one space:</p> +<pre><code class="language-example"> 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol> +</code></pre> +<p>Indented two spaces:</p> +<pre><code class="language-example"> 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol> +</code></pre> +<p>Indented three spaces:</p> +<pre><code class="language-example"> 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol> +</code></pre> +<p>Four spaces indent gives a code block:</p> +<pre><code class="language-example"> 1. A paragraph + with two lines. 
+ + indented code + + > A block quote. +. +<pre><code>1. A paragraph + with two lines. + + indented code + + &gt; A block quote. +</code></pre> +</code></pre> +<ol start="5"> +<li><strong>Laziness.</strong> If a string of lines <em>Ls</em> constitute a <a href="#list-items">list +item</a> with contents <em>Bs</em>, then the result of deleting +some or all of the indentation from one or more lines in which the +next character other than a space or tab after the indentation is +[paragraph continuation text] is a +list item with the same contents and attributes. The unindented +lines are called +<a href="@">lazy continuation line</a>s.</li> +</ol> +<p>Here is an example with [lazy continuation lines]:</p> +<pre><code class="language-example"> 1. A paragraph +with two lines. + + indented code + + > A block quote. +. +<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol> +</code></pre> +<p>Indentation can be partially deleted:</p> +<pre><code class="language-example"> 1. A paragraph + with two lines. +. +<ol> +<li>A paragraph +with two lines.</li> +</ol> +</code></pre> +<p>These examples show how laziness can work in nested structures:</p> +<pre><code class="language-example">> 1. > Blockquote +continued here. +. +<blockquote> +<ol> +<li> +<blockquote> +<p>Blockquote +continued here.</p> +</blockquote> +</li> +</ol> +</blockquote> +</code></pre> +<pre><code class="language-example">> 1. > Blockquote +> continued here. +. +<blockquote> +<ol> +<li> +<blockquote> +<p>Blockquote +continued here.</p> +</blockquote> +</li> +</ol> +</blockquote> +</code></pre> +<ol start="6"> +<li><strong>That's all.</strong> Nothing that is not counted as a list item by rules +#1--5 counts as a <a href="#list-items">list item</a>.</li> +</ol> +<p>The rules for sublists follow from the general rules +[above][List items]. 
A sublist must be indented the same number +of spaces of indentation a paragraph would need to be in order to be included +in the list item.</p> +<p>So, in this case we need two spaces indent:</p> +<pre><code class="language-example">- foo + - bar + - baz + - boo +. +<ul> +<li>foo +<ul> +<li>bar +<ul> +<li>baz +<ul> +<li>boo</li> +</ul> +</li> +</ul> +</li> +</ul> +</li> +</ul> +</code></pre> +<p>One is not enough:</p> +<pre><code class="language-example">- foo + - bar + - baz + - boo +. +<ul> +<li>foo</li> +<li>bar</li> +<li>baz</li> +<li>boo</li> +</ul> +</code></pre> +<p>Here we need four, because the list marker is wider:</p> +<pre><code class="language-example">10) foo + - bar +. +<ol start="10"> +<li>foo +<ul> +<li>bar</li> +</ul> +</li> +</ol> +</code></pre> +<p>Three is not enough:</p> +<pre><code class="language-example">10) foo + - bar +. +<ol start="10"> +<li>foo</li> +</ol> +<ul> +<li>bar</li> +</ul> +</code></pre> +<p>A list may be the first block in a list item:</p> +<pre><code class="language-example">- - foo +. +<ul> +<li> +<ul> +<li>foo</li> +</ul> +</li> +</ul> +</code></pre> +<pre><code class="language-example">1. - 2. foo +. +<ol> +<li> +<ul> +<li> +<ol start="2"> +<li>foo</li> +</ol> +</li> +</ul> +</li> +</ol> +</code></pre> +<p>A list item can contain a heading:</p> +<pre><code class="language-example">- # Foo +- Bar + --- + baz +. +<ul> +<li> +<h1>Foo</h1> +</li> +<li> +<h2>Bar</h2> +baz</li> +</ul> +</code></pre> +<h3>Motivation</h3> +<p>John Gruber's Markdown spec says the following about list items:</p> +<ol> +<li> +<p>"List markers typically start at the left margin, but may be indented +by up to three spaces. List markers must be followed by one or more +spaces or a tab."</p> +</li> +<li> +<p>"To make lists look nice, you can wrap items with hanging indents.... +But if you don't want to, you don't have to."</p> +</li> +<li> +<p>"List items may consist of multiple paragraphs. 
Each subsequent +paragraph in a list item must be indented by either 4 spaces or one +tab."</p> +</li> +<li> +<p>"It looks nice if you indent every line of the subsequent paragraphs, +but here again, Markdown will allow you to be lazy."</p> +</li> +<li> +<p>"To put a blockquote within a list item, the blockquote's <code>></code> +delimiters need to be indented."</p> +</li> +<li> +<p>"To put a code block within a list item, the code block needs to be +indented twice — 8 spaces or two tabs."</p> +</li> +</ol> +<p>These rules specify that a paragraph under a list item must be indented +four spaces (presumably, from the left margin, rather than the start of +the list marker, but this is not said), and that code under a list item +must be indented eight spaces instead of the usual four. They also say +that a block quote must be indented, but not by how much; however, the +example given has four spaces indentation. Although nothing is said +about other kinds of block-level content, it is certainly reasonable to +infer that <em>all</em> block elements under a list item, including other +lists, must be indented four spaces. This principle has been called the +<em>four-space rule</em>.</p> +<p>The four-space rule is clear and principled, and if the reference +implementation <code>Markdown.pl</code> had followed it, it probably would have +become the standard. However, <code>Markdown.pl</code> allowed paragraphs and +sublists to start with only two spaces indentation, at least on the +outer level. Worse, its behavior was inconsistent: a sublist of an +outer-level list needed two spaces indentation, but a sublist of this +sublist needed three spaces. It is not surprising, then, that different +implementations of Markdown have developed very different rules for +determining what comes under a list item. 
(Pandoc and python-Markdown, +for example, stuck with Gruber's syntax description and the four-space +rule, while discount, redcarpet, marked, PHP Markdown, and others +followed <code>Markdown.pl</code>'s behavior more closely.)</p> +<p>Unfortunately, given the divergences between implementations, there +is no way to give a spec for list items that will be guaranteed not +to break any existing documents. However, the spec given here should +correctly handle lists formatted with either the four-space rule or +the more forgiving <code>Markdown.pl</code> behavior, provided they are laid out +in a way that is natural for a human to read.</p> +<p>The strategy here is to let the width and indentation of the list marker +determine the indentation necessary for blocks to fall under the list +item, rather than having a fixed and arbitrary number. The writer can +think of the body of the list item as a unit which gets indented to the +right enough to fit the list marker (and any indentation on the list +marker). (The laziness rule, #5, then allows continuation lines to be +unindented if needed.)</p> +<p>This rule is superior, we claim, to any rule requiring a fixed level of +indentation from the margin. The four-space rule is clear but +unnatural. It is quite unintuitive that</p> +<pre><code class="language-markdown">- foo + + bar + + - baz +</code></pre> +<p>should be parsed as two lists with an intervening paragraph,</p> +<pre><code class="language-html"><ul> +<li>foo</li> +</ul> +<p>bar</p> +<ul> +<li>baz</li> +</ul> +</code></pre> +<p>as the four-space rule demands, rather than a single list,</p> +<pre><code class="language-html"><ul> +<li> +<p>foo</p> +<p>bar</p> +<ul> +<li>baz</li> +</ul> +</li> +</ul> +</code></pre> +<p>The choice of four spaces is arbitrary. It can be learned, but it is +not likely to be guessed, and it trips up beginners regularly.</p> +<p>Would it help to adopt a two-space rule? 
The problem is that such +a rule, together with the rule allowing up to three spaces of indentation for +the initial list marker, allows text that is indented <em>less than</em> the +original list marker to be included in the list item. For example, +<code>Markdown.pl</code> parses</p> +<pre><code class="language-markdown"> - one + + two +</code></pre> +<p>as a single list item, with <code>two</code> a continuation paragraph:</p> +<pre><code class="language-html"><ul> +<li> +<p>one</p> +<p>two</p> +</li> +</ul> +</code></pre> +<p>and similarly</p> +<pre><code class="language-markdown">> - one +> +> two +</code></pre> +<p>as</p> +<pre><code class="language-html"><blockquote> +<ul> +<li> +<p>one</p> +<p>two</p> +</li> +</ul> +</blockquote> +</code></pre> +<p>This is extremely unintuitive.</p> +<p>Rather than requiring a fixed indent from the margin, we could require +a fixed indent (say, two spaces, or even one space) from the list marker (which +may itself be indented). This proposal would remove the last anomaly +discussed. Unlike the spec presented above, it would count the following +as a list item with a subparagraph, even though the paragraph <code>bar</code> +is not indented as far as the first paragraph <code>foo</code>:</p> +<pre><code class="language-markdown"> 10. foo + + bar +</code></pre> +<p>Arguably this text does read like a list item with <code>bar</code> as a subparagraph, +which may count in favor of the proposal. However, on this proposal indented +code would have to be indented six spaces after the list marker. And this +would break a lot of existing Markdown, which has the pattern:</p> +<pre><code class="language-markdown">1. foo + + indented code +</code></pre> +<p>where the code is indented eight spaces. 
The spec above, by contrast, will +parse this text as expected, since the code block's indentation is measured +from the beginning of <code>foo</code>.</p> +<p>The one case that needs special treatment is a list item that <em>starts</em> +with indented code. How much indentation is required in that case, since +we don't have a "first paragraph" to measure from? Rule #2 simply stipulates +that in such cases, we require one space indentation from the list marker +(and then the normal four spaces for the indented code). This will match the +four-space rule in cases where the list marker plus its initial indentation +takes four spaces (a common case), but diverge in other cases.</p> +<h2>Lists</h2> +<p>A <a href="@">list</a> is a sequence of one or more +list items [of the same type]. The list items +may be separated by any number of blank lines.</p> +<p>Two list items are <a href="@">of the same type</a> +if they begin with a [list marker] of the same type. +Two list markers are of the +same type if (a) they are bullet list markers using the same character +(<code>-</code>, <code>+</code>, or <code>*</code>) or (b) they are ordered list numbers with the same +delimiter (either <code>.</code> or <code>)</code>).</p> +<p>A list is an <a href="@">ordered list</a> +if its constituent list items begin with +[ordered list markers], and a +<a href="@">bullet list</a> if its constituent list +items begin with [bullet list markers].</p> +<p>The <a href="@">start number</a> +of an [ordered list] is determined by the list number of +its initial list item. The numbers of subsequent list items are +disregarded.</p> +<p>A list is <a href="@">loose</a> if any of its constituent +list items are separated by blank lines, or if any of its constituent +list items directly contain two block-level elements with a blank line +between them. Otherwise a list is <a href="@">tight</a>. 
+(The difference in HTML output is that paragraphs in a loose list are +wrapped in <code><p></code> tags, while paragraphs in a tight list are not.)</p> +<p>Changing the bullet or ordered list delimiter starts a new list:</p> +<pre><code class="language-example">- foo +- bar ++ baz +. +<ul> +<li>foo</li> +<li>bar</li> +</ul> +<ul> +<li>baz</li> +</ul> +</code></pre> +<pre><code class="language-example">1. foo +2. bar +3) baz +. +<ol> +<li>foo</li> +<li>bar</li> +</ol> +<ol start="3"> +<li>baz</li> +</ol> +</code></pre> +<p>In CommonMark, a list can interrupt a paragraph. That is, +no blank line is needed to separate a paragraph from a following +list:</p> +<pre><code class="language-example">Foo +- bar +- baz +. +<p>Foo</p> +<ul> +<li>bar</li> +<li>baz</li> +</ul> +</code></pre> +<p><code>Markdown.pl</code> does not allow this, through fear of triggering a list +via a numeral in a hard-wrapped line:</p> +<pre><code class="language-markdown">The number of windows in my house is +14. The number of doors is 6. +</code></pre> +<p>Oddly, though, <code>Markdown.pl</code> <em>does</em> allow a blockquote to +interrupt a paragraph, even though the same considerations might +apply.</p> +<p>In CommonMark, we do allow lists to interrupt paragraphs, for +two reasons. First, it is natural and not uncommon for people +to start lists without blank lines:</p> +<pre><code class="language-markdown">I need to buy +- new shoes +- a coat +- a plane ticket +</code></pre> +<p>Second, we are attracted to a</p> +<blockquote> +<p><a href="@">principle of uniformity</a>: +if a chunk of text has a certain +meaning, it will continue to have the same meaning when put into a +container block (such as a list item or blockquote).</p> +</blockquote> +<p>(Indeed, the spec for [list items] and [block quotes] presupposes +this principle.) 
This principle implies that if</p> +<pre><code class="language-markdown"> * I need to buy + - new shoes + - a coat + - a plane ticket +</code></pre> +<p>is a list item containing a paragraph followed by a nested sublist, +as all Markdown implementations agree it is (though the paragraph +may be rendered without <code><p></code> tags, since the list is "tight"), +then</p> +<pre><code class="language-markdown">I need to buy +- new shoes +- a coat +- a plane ticket +</code></pre> +<p>by itself should be a paragraph followed by a nested sublist.</p> +<p>Since it is well-established Markdown practice to allow lists to +interrupt paragraphs inside list items, the [principle of +uniformity] requires us to allow this outside list items as +well. (<a href="http://docutils.sourceforge.net/rst.html">reStructuredText</a> +takes a different approach, requiring blank lines before lists +even inside other list items.)</p> +<p>In order to solve the problem of unwanted lists in paragraphs with +hard-wrapped numerals, we allow only lists starting with <code>1</code> to +interrupt paragraphs. Thus,</p> +<pre><code class="language-example">The number of windows in my house is +14. The number of doors is 6. +. +<p>The number of windows in my house is +14. The number of doors is 6.</p> +</code></pre> +<p>We may still get an unintended result in cases like</p> +<pre><code class="language-example">The number of windows in my house is +1. The number of doors is 6. +. +<p>The number of windows in my house is</p> +<ol> +<li>The number of doors is 6.</li> +</ol> +</code></pre> +<p>but this rule should prevent most spurious list captures.</p> +<p>There can be any number of blank lines between items:</p> +<pre><code class="language-example">- foo + +- bar + + +- baz +. +<ul> +<li> +<p>foo</p> +</li> +<li> +<p>bar</p> +</li> +<li> +<p>baz</p> +</li> +</ul> +</code></pre> +<pre><code class="language-example">- foo + - bar + - baz + + + bim +. 
+<ul> +<li>foo +<ul> +<li>bar +<ul> +<li> +<p>baz</p> +<p>bim</p> +</li> +</ul> +</li> +</ul> +</li> +</ul> +</code></pre> +<p>To separate consecutive lists of the same type, or to separate a +list from an indented code block that would otherwise be parsed +as a subparagraph of the final list item, you can insert a blank HTML +comment:</p> +<pre><code class="language-example">- foo +- bar + +<!-- --> + +- baz +- bim +. +<ul> +<li>foo</li> +<li>bar</li> +</ul> +<!-- --> +<ul> +<li>baz</li> +<li>bim</li> +</ul> +</code></pre> +<pre><code class="language-example">- foo + + notcode + +- foo + +<!-- --> + + code +. +<ul> +<li> +<p>foo</p> +<p>notcode</p> +</li> +<li> +<p>foo</p> +</li> +</ul> +<!-- --> +<pre><code>code +</code></pre> +</code></pre> +<p>List items need not be indented to the same level. The following +list items will be treated as items at the same list level, +since none is indented enough to belong to the previous list +item:</p> +<pre><code class="language-example">- a + - b + - c + - d + - e + - f +- g +. +<ul> +<li>a</li> +<li>b</li> +<li>c</li> +<li>d</li> +<li>e</li> +<li>f</li> +<li>g</li> +</ul> +</code></pre> +<pre><code class="language-example">1. a + + 2. b + + 3. c +. +<ol> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +</li> +<li> +<p>c</p> +</li> +</ol> +</code></pre> +<p>Note, however, that list items may not be preceded by more than +three spaces of indentation. Here <code>- e</code> is treated as a paragraph continuation +line, because it is indented more than three spaces:</p> +<pre><code class="language-example">- a + - b + - c + - d + - e +. +<ul> +<li>a</li> +<li>b</li> +<li>c</li> +<li>d +- e</li> +</ul> +</code></pre> +<p>And here, <code>3. c</code> is treated as an indented code block, +because it is indented four spaces and preceded by a +blank line.</p> +<pre><code class="language-example">1. a + + 2. b + + 3. c +. +<ol> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +</li> +</ol> +<pre><code>3. 
c +</code></pre> +</code></pre> +<p>This is a loose list, because there is a blank line between +two of the list items:</p> +<pre><code class="language-example">- a +- b + +- c +. +<ul> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +</li> +<li> +<p>c</p> +</li> +</ul> +</code></pre> +<p>So is this, with an empty second item:</p> +<pre><code class="language-example">* a +* + +* c +. +<ul> +<li> +<p>a</p> +</li> +<li></li> +<li> +<p>c</p> +</li> +</ul> +</code></pre> +<p>These are loose lists, even though there are no blank lines between the items, +because one of the items directly contains two block-level elements +with a blank line between them:</p> +<pre><code class="language-example">- a +- b + + c +- d +. +<ul> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +<p>c</p> +</li> +<li> +<p>d</p> +</li> +</ul> +</code></pre> +<pre><code class="language-example">- a +- b + + [ref]: /url +- d +. +<ul> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +</li> +<li> +<p>d</p> +</li> +</ul> +</code></pre> +<p>This is a tight list, because the blank lines are in a code block:</p> +<pre><code class="language-example">- a +- ``` + b + + + ``` +- c +. +<ul> +<li>a</li> +<li> +<pre><code>b + + +</code></pre> +</li> +<li>c</li> +</ul> +</code></pre> +<p>This is a tight list, because the blank line is between two +paragraphs of a sublist. So the sublist is loose while +the outer list is tight:</p> +<pre><code class="language-example">- a + - b + + c +- d +. +<ul> +<li>a +<ul> +<li> +<p>b</p> +<p>c</p> +</li> +</ul> +</li> +<li>d</li> +</ul> +</code></pre> +<p>This is a tight list, because the blank line is inside the +block quote:</p> +<pre><code class="language-example">* a + > b + > +* c +. +<ul> +<li>a +<blockquote> +<p>b</p> +</blockquote> +</li> +<li>c</li> +</ul> +</code></pre> +<p>This list is tight, because the consecutive block elements +are not separated by blank lines:</p> +<pre><code class="language-example">- a + > b + ``` + c + ``` +- d +. 
+<ul> +<li>a +<blockquote> +<p>b</p> +</blockquote> +<pre><code>c +</code></pre> +</li> +<li>d</li> +</ul> +</code></pre> +<p>A single-paragraph list is tight:</p> +<pre><code class="language-example">- a +. +<ul> +<li>a</li> +</ul> +</code></pre> +<pre><code class="language-example">- a + - b +. +<ul> +<li>a +<ul> +<li>b</li> +</ul> +</li> +</ul> +</code></pre> +<p>This list is loose, because of the blank line between the +two block elements in the list item:</p> +<pre><code class="language-example">1. ``` + foo + ``` + + bar +. +<ol> +<li> +<pre><code>foo +</code></pre> +<p>bar</p> +</li> +</ol> +</code></pre> +<p>Here the outer list is loose, the inner list tight:</p> +<pre><code class="language-example">* foo + * bar + + baz +. +<ul> +<li> +<p>foo</p> +<ul> +<li>bar</li> +</ul> +<p>baz</p> +</li> +</ul> +</code></pre> +<pre><code class="language-example">- a + - b + - c + +- d + - e + - f +. +<ul> +<li> +<p>a</p> +<ul> +<li>b</li> +<li>c</li> +</ul> +</li> +<li> +<p>d</p> +<ul> +<li>e</li> +<li>f</li> +</ul> +</li> +</ul> +</code></pre> +<h1>Inlines</h1> +<p>Inlines are parsed sequentially from the beginning of the character +stream to the end (left to right, in left-to-right languages). +Thus, for example, in</p> +<pre><code class="language-example">`hi`lo` +. +<p><code>hi</code>lo`</p> +</code></pre> +<p><code>hi</code> is parsed as code, leaving the backtick at the end as a literal +backtick.</p> +<h2>Code spans</h2> +<p>A <a href="@">backtick string</a> +is a string of one or more backtick characters (<code>`</code>) that is neither +preceded nor followed by a backtick.</p> +<p>A <a href="@">code span</a> begins with a backtick string and ends with +a backtick string of equal length. 
The contents of the code span are +the characters between these two backtick strings, normalized in the +following ways:</p> +<ul> +<li>First, [line endings] are converted to [spaces].</li> +<li>If the resulting string both begins <em>and</em> ends with a [space] +character, but does not consist entirely of [space] +characters, a single [space] character is removed from the +front and back. This allows you to include code that begins +or ends with backtick characters, which must be separated by +whitespace from the opening or closing backtick strings.</li> +</ul> +<p>This is a simple code span:</p> +<pre><code class="language-example">`foo` +. +<p><code>foo</code></p> +</code></pre> +<p>Here two backticks are used, because the code contains a backtick. +This example also illustrates stripping of a single leading and +trailing space:</p> +<pre><code class="language-example">`` foo ` bar `` +. +<p><code>foo ` bar</code></p> +</code></pre> +<p>This example shows the motivation for stripping leading and trailing +spaces:</p> +<pre><code class="language-example">` `` ` +. +<p><code>``</code></p> +</code></pre> +<p>Note that only <em>one</em> space is stripped:</p> +<pre><code class="language-example">` `` ` +. +<p><code> `` </code></p> +</code></pre> +<p>The stripping only happens if the space is on both +sides of the string:</p> +<pre><code class="language-example">` a` +. +<p><code> a</code></p> +</code></pre> +<p>Only [spaces], and not [unicode whitespace] in general, are +stripped in this way:</p> +<pre><code class="language-example">` b ` +. +<p><code> b </code></p> +</code></pre> +<p>No stripping occurs if the code span contains only spaces:</p> +<pre><code class="language-example">` ` +` ` +. +<p><code> </code> +<code> </code></p> +</code></pre> +<p>[Line endings] are treated like spaces:</p> +<pre><code class="language-example">`` +foo +bar +baz +`` +. +<p><code>foo bar baz</code></p> +</code></pre> +<pre><code class="language-example">`` +foo +`` +. 
+<p><code>foo </code></p> +</code></pre> +<p>Interior spaces are not collapsed:</p> +<pre><code class="language-example">`foo bar +baz` +. +<p><code>foo bar baz</code></p> +</code></pre> +<p>Note that browsers will typically collapse consecutive spaces +when rendering <code><code></code> elements, so it is recommended that +the following CSS be used:</p> +<pre><code>code{white-space: pre-wrap;} +</code></pre> +<p>Note that backslash escapes do not work in code spans. All backslashes +are treated literally:</p> +<pre><code class="language-example">`foo\`bar` +. +<p><code>foo\</code>bar`</p> +</code></pre> +<p>Backslash escapes are never needed, because one can always choose a +string of <em>n</em> backtick characters as delimiters, where the code does +not contain any strings of exactly <em>n</em> backtick characters.</p> +<pre><code class="language-example">``foo`bar`` +. +<p><code>foo`bar</code></p> +</code></pre> +<pre><code class="language-example">` foo `` bar ` +. +<p><code>foo `` bar</code></p> +</code></pre> +<p>Code span backticks have higher precedence than any other inline +constructs except HTML tags and autolinks. Thus, for example, this is +not parsed as emphasized text, since the second <code>*</code> is part of a code +span:</p> +<pre><code class="language-example">*foo`*` +. +<p>*foo<code>*</code></p> +</code></pre> +<p>And this is not parsed as a link:</p> +<pre><code class="language-example">[not a `link](/foo`) +. +<p>[not a <code>link](/foo</code>)</p> +</code></pre> +<p>Code spans, HTML tags, and autolinks have the same precedence. +Thus, this is code:</p> +<pre><code class="language-example">`<a href="`">` +. +<p><code>&lt;a href=&quot;</code>&quot;&gt;`</p> +</code></pre> +<p>But this is an HTML tag:</p> +<pre><code class="language-example"><a href="`">` +. +<p><a href="`">`</p> +</code></pre> +<p>And this is code:</p> +<pre><code class="language-example">`<http://foo.bar.`baz>` +. 
+<p><code>&lt;http://foo.bar.</code>baz&gt;`</p> +</code></pre> +<p>But this is an autolink:</p> +<pre><code class="language-example"><http://foo.bar.`baz>` +. +<p><a href="http://foo.bar.%60baz">http://foo.bar.`baz</a>`</p> +</code></pre> +<p>When a backtick string is not closed by a matching backtick string, +we just have literal backticks:</p> +<pre><code class="language-example">```foo`` +. +<p>```foo``</p> +</code></pre> +<pre><code class="language-example">`foo +. +<p>`foo</p> +</code></pre> +<p>The following case also illustrates the need for opening and +closing backtick strings to be equal in length:</p> +<pre><code class="language-example">`foo``bar`` +. +<p>`foo<code>bar</code></p> +</code></pre> +<h2>Emphasis and strong emphasis</h2> +<p>John Gruber's original <a href="http://daringfireball.net/projects/markdown/syntax#em">Markdown syntax +description</a> says:</p> +<blockquote> +<p>Markdown treats asterisks (<code>*</code>) and underscores (<code>_</code>) as indicators of +emphasis. Text wrapped with one <code>*</code> or <code>_</code> will be wrapped with an HTML +<code><em></code> tag; double <code>*</code>'s or <code>_</code>'s will be wrapped with an HTML <code><strong></code> +tag.</p> +</blockquote> +<p>This is enough for most users, but these rules leave much undecided, +especially when it comes to nested emphasis. 
The original +<code>Markdown.pl</code> test suite makes it clear that triple <code>***</code> and +<code>___</code> delimiters can be used for strong emphasis, and most +implementations have also allowed the following patterns:</p> +<pre><code class="language-markdown">***strong emph*** +***strong** in emph* +***emph* in strong** +**in strong *emph*** +*in emph **strong*** +</code></pre> +<p>The following patterns are less widely supported, but the intent +is clear and they are useful (especially in contexts like bibliography +entries):</p> +<pre><code class="language-markdown">*emph *with emph* in it* +**strong **with strong** in it** +</code></pre> +<p>Many implementations have also restricted intraword emphasis to +the <code>*</code> forms, to avoid unwanted emphasis in words containing +internal underscores. (It is best practice to put these in code +spans, but users often do not.)</p> +<pre><code class="language-markdown">internal emphasis: foo*bar*baz +no emphasis: foo_bar_baz +</code></pre> +<p>The rules given below capture all of these patterns, while allowing +for efficient parsing strategies that do not backtrack.</p> +<p>First, some definitions. A <a href="@">delimiter run</a> is either +a sequence of one or more <code>*</code> characters that is not preceded or +followed by a non-backslash-escaped <code>*</code> character, or a sequence +of one or more <code>_</code> characters that is not preceded or followed by +a non-backslash-escaped <code>_</code> character.</p> +<p>A <a href="@">left-flanking delimiter run</a> is +a [delimiter run] that is (1) not followed by [Unicode whitespace], +and either (2a) not followed by a [Unicode punctuation character], or +(2b) followed by a [Unicode punctuation character] and +preceded by [Unicode whitespace] or a [Unicode punctuation character]. 
+For purposes of this definition, the beginning and the end of +the line count as Unicode whitespace.</p> +<p>A <a href="@">right-flanking delimiter run</a> is +a [delimiter run] that is (1) not preceded by [Unicode whitespace], +and either (2a) not preceded by a [Unicode punctuation character], or +(2b) preceded by a [Unicode punctuation character] and +followed by [Unicode whitespace] or a [Unicode punctuation character]. +For purposes of this definition, the beginning and the end of +the line count as Unicode whitespace.</p> +<p>Here are some examples of delimiter runs.</p> +<ul> +<li> +<p>left-flanking but not right-flanking:</p> +<pre><code>***abc + _abc +**"abc" + _"abc" +</code></pre> +</li> +<li> +<p>right-flanking but not left-flanking:</p> +<pre><code> abc*** + abc_ +"abc"** +"abc"_ +</code></pre> +</li> +<li> +<p>Both left and right-flanking:</p> +<pre><code> abc***def +"abc"_"def" +</code></pre> +</li> +<li> +<p>Neither left nor right-flanking:</p> +<pre><code>abc *** def +a _ b +</code></pre> +</li> +</ul> +<p>(The idea of distinguishing left-flanking and right-flanking +delimiter runs based on the character before and the character +after comes from Roopesh Chander's +<a href="http://www.vfmd.org/vfmd-spec/specification/#procedure-for-identifying-emphasis-tags">vfmd</a>. 
+vfmd uses the terminology "emphasis indicator string" instead of "delimiter +run," and its rules for distinguishing left- and right-flanking runs +are a bit more complex than the ones given here.)</p> +<p>The following rules define emphasis and strong emphasis:</p> +<ol> +<li> +<p>A single <code>*</code> character <a href="@">can open emphasis</a> +iff (if and only if) it is part of a [left-flanking delimiter run].</p> +</li> +<li> +<p>A single <code>_</code> character [can open emphasis] iff +it is part of a [left-flanking delimiter run] +and either (a) not part of a [right-flanking delimiter run] +or (b) part of a [right-flanking delimiter run] +preceded by a [Unicode punctuation character].</p> +</li> +<li> +<p>A single <code>*</code> character <a href="@">can close emphasis</a> +iff it is part of a [right-flanking delimiter run].</p> +</li> +<li> +<p>A single <code>_</code> character [can close emphasis] iff +it is part of a [right-flanking delimiter run] +and either (a) not part of a [left-flanking delimiter run] +or (b) part of a [left-flanking delimiter run] +followed by a [Unicode punctuation character].</p> +</li> +<li> +<p>A double <code>**</code> <a href="@">can open strong emphasis</a> +iff it is part of a [left-flanking delimiter run].</p> +</li> +<li> +<p>A double <code>__</code> [can open strong emphasis] iff +it is part of a [left-flanking delimiter run] +and either (a) not part of a [right-flanking delimiter run] +or (b) part of a [right-flanking delimiter run] +preceded by a [Unicode punctuation character].</p> +</li> +<li> +<p>A double <code>**</code> <a href="@">can close strong emphasis</a> +iff it is part of a [right-flanking delimiter run].</p> +</li> +<li> +<p>A double <code>__</code> [can close strong emphasis] iff +it is part of a [right-flanking delimiter run] +and either (a) not part of a [left-flanking delimiter run] +or (b) part of a [left-flanking delimiter run] +followed by a [Unicode punctuation character].</p> +</li> +<li> 
+<p>Emphasis begins with a delimiter that [can open emphasis] and ends +with a delimiter that [can close emphasis], and that uses the same +character (<code>_</code> or <code>*</code>) as the opening delimiter. The +opening and closing delimiters must belong to separate +[delimiter runs]. If one of the delimiters can both +open and close emphasis, then the sum of the lengths of the +delimiter runs containing the opening and closing delimiters +must not be a multiple of 3 unless both lengths are +multiples of 3.</p> +</li> +<li> +<p>Strong emphasis begins with a delimiter that +[can open strong emphasis] and ends with a delimiter that +[can close strong emphasis], and that uses the same character +(<code>_</code> or <code>*</code>) as the opening delimiter. The +opening and closing delimiters must belong to separate +[delimiter runs]. If one of the delimiters can both open +and close strong emphasis, then the sum of the lengths of +the delimiter runs containing the opening and closing +delimiters must not be a multiple of 3 unless both lengths +are multiples of 3.</p> +</li> +<li> +<p>A literal <code>*</code> character cannot occur at the beginning or end of +<code>*</code>-delimited emphasis or <code>**</code>-delimited strong emphasis, unless it +is backslash-escaped.</p> +</li> +<li> +<p>A literal <code>_</code> character cannot occur at the beginning or end of +<code>_</code>-delimited emphasis or <code>__</code>-delimited strong emphasis, unless it +is backslash-escaped.</p> +</li> +</ol> +<p>Where rules 1--12 above are compatible with multiple parsings, +the following principles resolve ambiguity:</p> +<ol start="13"> +<li> +<p>The number of nestings should be minimized. 
Thus, for example, +an interpretation <code><strong>...</strong></code> is always preferred to +<code><em><em>...</em></em></code>.</p> +</li> +<li> +<p>An interpretation <code><em><strong>...</strong></em></code> is always +preferred to <code><strong><em>...</em></strong></code>.</p> +</li> +<li> +<p>When two potential emphasis or strong emphasis spans overlap, +so that the second begins before the first ends and ends after +the first ends, the first takes precedence. Thus, for example, +<code>*foo _bar* baz_</code> is parsed as <code><em>foo _bar</em> baz_</code> rather +than <code>*foo <em>bar* baz</em></code>.</p> +</li> +<li> +<p>When there are two potential emphasis or strong emphasis spans +with the same closing delimiter, the shorter one (the one that +opens later) takes precedence. Thus, for example, +<code>**foo **bar baz**</code> is parsed as <code>**foo <strong>bar baz</strong></code> +rather than <code><strong>foo **bar baz</strong></code>.</p> +</li> +<li> +<p>Inline code spans, links, images, and HTML tags group more tightly +than emphasis. So, when there is a choice between an interpretation +that contains one of these elements and one that does not, the +former always wins. Thus, for example, <code>*[foo*](bar)</code> is +parsed as <code>*<a href="bar">foo*</a></code> rather than as +<code><em>[foo</em>](bar)</code>.</p> +</li> +</ol> +<p>These rules can be illustrated through a series of examples.</p> +<p>Rule 1:</p> +<pre><code class="language-example">*foo bar* +. +<p><em>foo bar</em></p> +</code></pre> +<p>This is not emphasis, because the opening <code>*</code> is followed by +whitespace, and hence not part of a [left-flanking delimiter run]:</p> +<pre><code class="language-example">a * foo bar* +. 
+<p>a * foo bar*</p> +</code></pre> +<p>This is not emphasis, because the opening <code>*</code> is preceded +by an alphanumeric and followed by punctuation, and hence +not part of a [left-flanking delimiter run]:</p> +<pre><code class="language-example">a*"foo"* +. +<p>a*&quot;foo&quot;*</p> +</code></pre> +<p>Unicode nonbreaking spaces count as whitespace, too:</p> +<pre><code class="language-example">* a * +. +<p>* a *</p> +</code></pre> +<p>Intraword emphasis with <code>*</code> is permitted:</p> +<pre><code class="language-example">foo*bar* +. +<p>foo<em>bar</em></p> +</code></pre> +<pre><code class="language-example">5*6*78 +. +<p>5<em>6</em>78</p> +</code></pre> +<p>Rule 2:</p> +<pre><code class="language-example">_foo bar_ +. +<p><em>foo bar</em></p> +</code></pre> +<p>This is not emphasis, because the opening <code>_</code> is followed by +whitespace:</p> +<pre><code class="language-example">_ foo bar_ +. +<p>_ foo bar_</p> +</code></pre> +<p>This is not emphasis, because the opening <code>_</code> is preceded +by an alphanumeric and followed by punctuation:</p> +<pre><code class="language-example">a_"foo"_ +. +<p>a_&quot;foo&quot;_</p> +</code></pre> +<p>Emphasis with <code>_</code> is not allowed inside words:</p> +<pre><code class="language-example">foo_bar_ +. +<p>foo_bar_</p> +</code></pre> +<pre><code class="language-example">5_6_78 +. +<p>5_6_78</p> +</code></pre> +<pre><code class="language-example">пристаням_стремятся_ +. +<p>пристаням_стремятся_</p> +</code></pre> +<p>Here <code>_</code> does not generate emphasis, because the first delimiter run +is right-flanking and the second left-flanking:</p> +<pre><code class="language-example">aa_"bb"_cc +. +<p>aa_&quot;bb&quot;_cc</p> +</code></pre> +<p>This is emphasis, even though the opening delimiter is +both left- and right-flanking, because it is preceded by +punctuation:</p> +<pre><code class="language-example">foo-_(bar)_ +. 
+<p>foo-<em>(bar)</em></p> +</code></pre> +<p>Rule 3:</p> +<p>This is not emphasis, because the closing delimiter does +not match the opening delimiter:</p> +<pre><code class="language-example">_foo* +. +<p>_foo*</p> +</code></pre> +<p>This is not emphasis, because the closing <code>*</code> is preceded by +whitespace:</p> +<pre><code class="language-example">*foo bar * +. +<p>*foo bar *</p> +</code></pre> +<p>A line ending also counts as whitespace:</p> +<pre><code class="language-example">*foo bar +* +. +<p>*foo bar +*</p> +</code></pre> +<p>This is not emphasis, because the second <code>*</code> is +preceded by punctuation and followed by an alphanumeric +(hence it is not part of a [right-flanking delimiter run]):</p> +<pre><code class="language-example">*(*foo) +. +<p>*(*foo)</p> +</code></pre> +<p>The point of this restriction is more easily appreciated +with this example:</p> +<pre><code class="language-example">*(*foo*)* +. +<p><em>(<em>foo</em>)</em></p> +</code></pre> +<p>Intraword emphasis with <code>*</code> is allowed:</p> +<pre><code class="language-example">*foo*bar +. +<p><em>foo</em>bar</p> +</code></pre> +<p>Rule 4:</p> +<p>This is not emphasis, because the closing <code>_</code> is preceded by +whitespace:</p> +<pre><code class="language-example">_foo bar _ +. +<p>_foo bar _</p> +</code></pre> +<p>This is not emphasis, because the second <code>_</code> is +preceded by punctuation and followed by an alphanumeric:</p> +<pre><code class="language-example">_(_foo) +. +<p>_(_foo)</p> +</code></pre> +<p>This is emphasis within emphasis:</p> +<pre><code class="language-example">_(_foo_)_ +. +<p><em>(<em>foo</em>)</em></p> +</code></pre> +<p>Intraword emphasis is disallowed for <code>_</code>:</p> +<pre><code class="language-example">_foo_bar +. +<p>_foo_bar</p> +</code></pre> +<pre><code class="language-example">_пристаням_стремятся +. +<p>_пристаням_стремятся</p> +</code></pre> +<pre><code class="language-example">_foo_bar_baz_ +. 
+<p><em>foo_bar_baz</em></p> +</code></pre> +<p>This is emphasis, even though the closing delimiter is +both left- and right-flanking, because it is followed by +punctuation:</p> +<pre><code class="language-example">_(bar)_. +. +<p><em>(bar)</em>.</p> +</code></pre> +<p>Rule 5:</p> +<pre><code class="language-example">**foo bar** +. +<p><strong>foo bar</strong></p> +</code></pre> +<p>This is not strong emphasis, because the opening delimiter is +followed by whitespace:</p> +<pre><code class="language-example">** foo bar** +. +<p>** foo bar**</p> +</code></pre> +<p>This is not strong emphasis, because the opening <code>**</code> is preceded +by an alphanumeric and followed by punctuation, and hence +not part of a [left-flanking delimiter run]:</p> +<pre><code class="language-example">a**"foo"** +. +<p>a**&quot;foo&quot;**</p> +</code></pre> +<p>Intraword strong emphasis with <code>**</code> is permitted:</p> +<pre><code class="language-example">foo**bar** +. +<p>foo<strong>bar</strong></p> +</code></pre> +<p>Rule 6:</p> +<pre><code class="language-example">__foo bar__ +. +<p><strong>foo bar</strong></p> +</code></pre> +<p>This is not strong emphasis, because the opening delimiter is +followed by whitespace:</p> +<pre><code class="language-example">__ foo bar__ +. +<p>__ foo bar__</p> +</code></pre> +<p>A line ending counts as whitespace:</p> +<pre><code class="language-example">__ +foo bar__ +. +<p>__ +foo bar__</p> +</code></pre> +<p>This is not strong emphasis, because the opening <code>__</code> is preceded +by an alphanumeric and followed by punctuation:</p> +<pre><code class="language-example">a__"foo"__ +. +<p>a__&quot;foo&quot;__</p> +</code></pre> +<p>Intraword strong emphasis is forbidden with <code>__</code>:</p> +<pre><code class="language-example">foo__bar__ +. +<p>foo__bar__</p> +</code></pre> +<pre><code class="language-example">5__6__78 +. +<p>5__6__78</p> +</code></pre> +<pre><code class="language-example">пристаням__стремятся__ +. 
+<p>пристаням__стремятся__</p> +</code></pre> +<pre><code class="language-example">__foo, __bar__, baz__ +. +<p><strong>foo, <strong>bar</strong>, baz</strong></p> +</code></pre> +<p>This is strong emphasis, even though the opening delimiter is +both left- and right-flanking, because it is preceded by +punctuation:</p> +<pre><code class="language-example">foo-__(bar)__ +. +<p>foo-<strong>(bar)</strong></p> +</code></pre> +<p>Rule 7:</p> +<p>This is not strong emphasis, because the closing delimiter is preceded +by whitespace:</p> +<pre><code class="language-example">**foo bar ** +. +<p>**foo bar **</p> +</code></pre> +<p>(Nor can it be interpreted as an emphasized <code>*foo bar *</code>, because of +Rule 11.)</p> +<p>This is not strong emphasis, because the second <code>**</code> is +preceded by punctuation and followed by an alphanumeric:</p> +<pre><code class="language-example">**(**foo) +. +<p>**(**foo)</p> +</code></pre> +<p>The point of this restriction is more easily appreciated +with these examples:</p> +<pre><code class="language-example">*(**foo**)* +. +<p><em>(<strong>foo</strong>)</em></p> +</code></pre> +<pre><code class="language-example">**Gomphocarpus (*Gomphocarpus physocarpus*, syn. +*Asclepias physocarpa*)** +. +<p><strong>Gomphocarpus (<em>Gomphocarpus physocarpus</em>, syn. +<em>Asclepias physocarpa</em>)</strong></p> +</code></pre> +<pre><code class="language-example">**foo "*bar*" foo** +. +<p><strong>foo &quot;<em>bar</em>&quot; foo</strong></p> +</code></pre> +<p>Intraword emphasis:</p> +<pre><code class="language-example">**foo**bar +. +<p><strong>foo</strong>bar</p> +</code></pre> +<p>Rule 8:</p> +<p>This is not strong emphasis, because the closing delimiter is +preceded by whitespace:</p> +<pre><code class="language-example">__foo bar __ +. 
+<p>__foo bar __</p> +</code></pre> +<p>This is not strong emphasis, because the second <code>__</code> is +preceded by punctuation and followed by an alphanumeric:</p> +<pre><code class="language-example">__(__foo) +. +<p>__(__foo)</p> +</code></pre> +<p>The point of this restriction is more easily appreciated +with this example:</p> +<pre><code class="language-example">_(__foo__)_ +. +<p><em>(<strong>foo</strong>)</em></p> +</code></pre> +<p>Intraword strong emphasis is forbidden with <code>__</code>:</p> +<pre><code class="language-example">__foo__bar +. +<p>__foo__bar</p> +</code></pre> +<pre><code class="language-example">__пристаням__стремятся +. +<p>__пристаням__стремятся</p> +</code></pre> +<pre><code class="language-example">__foo__bar__baz__ +. +<p><strong>foo__bar__baz</strong></p> +</code></pre> +<p>This is strong emphasis, even though the closing delimiter is +both left- and right-flanking, because it is followed by +punctuation:</p> +<pre><code class="language-example">__(bar)__. +. +<p><strong>(bar)</strong>.</p> +</code></pre> +<p>Rule 9:</p> +<p>Any nonempty sequence of inline elements can be the contents of an +emphasized span.</p> +<pre><code class="language-example">*foo [bar](/url)* +. +<p><em>foo <a href="/url">bar</a></em></p> +</code></pre> +<pre><code class="language-example">*foo +bar* +. +<p><em>foo +bar</em></p> +</code></pre> +<p>In particular, emphasis and strong emphasis can be nested +inside emphasis:</p> +<pre><code class="language-example">_foo __bar__ baz_ +. +<p><em>foo <strong>bar</strong> baz</em></p> +</code></pre> +<pre><code class="language-example">_foo _bar_ baz_ +. +<p><em>foo <em>bar</em> baz</em></p> +</code></pre> +<pre><code class="language-example">__foo_ bar_ +. +<p><em><em>foo</em> bar</em></p> +</code></pre> +<pre><code class="language-example">*foo *bar** +. +<p><em>foo <em>bar</em></em></p> +</code></pre> +<pre><code class="language-example">*foo **bar** baz* +. 
+<p><em>foo <strong>bar</strong> baz</em></p> +</code></pre> +<pre><code class="language-example">*foo**bar**baz* +. +<p><em>foo<strong>bar</strong>baz</em></p> +</code></pre> +<p>Note that in the preceding case, the interpretation</p> +<pre><code class="language-markdown"><p><em>foo</em><em>bar<em></em>baz</em></p> +</code></pre> +<p>is precluded by the condition that a delimiter that +can both open and close (like the <code>*</code> after <code>foo</code>) +cannot form emphasis if the sum of the lengths of +the delimiter runs containing the opening and +closing delimiters is a multiple of 3 unless +both lengths are multiples of 3.</p> +<p>For the same reason, we don't get two consecutive +emphasis sections in this example:</p> +<pre><code class="language-example">*foo**bar* +. +<p><em>foo**bar</em></p> +</code></pre> +<p>The same condition ensures that the following +cases are all strong emphasis nested inside +emphasis, even when the interior whitespace is +omitted:</p> +<pre><code class="language-example">***foo** bar* +. +<p><em><strong>foo</strong> bar</em></p> +</code></pre> +<pre><code class="language-example">*foo **bar*** +. +<p><em>foo <strong>bar</strong></em></p> +</code></pre> +<pre><code class="language-example">*foo**bar*** +. +<p><em>foo<strong>bar</strong></em></p> +</code></pre> +<p>When the lengths of the interior closing and opening +delimiter runs are <em>both</em> multiples of 3, though, +they can match to create emphasis:</p> +<pre><code class="language-example">foo***bar***baz +. +<p>foo<em><strong>bar</strong></em>baz</p> +</code></pre> +<pre><code class="language-example">foo******bar*********baz +. +<p>foo<strong><strong><strong>bar</strong></strong></strong>***baz</p> +</code></pre> +<p>Indefinite levels of nesting are possible:</p> +<pre><code class="language-example">*foo **bar *baz* bim** bop* +. +<p><em>foo <strong>bar <em>baz</em> bim</strong> bop</em></p> +</code></pre> +<pre><code class="language-example">*foo [*bar*](/url)* +. 
+<p><em>foo <a href="/url"><em>bar</em></a></em></p> +</code></pre> +<p>There can be no empty emphasis or strong emphasis:</p> +<pre><code class="language-example">** is not an empty emphasis +. +<p>** is not an empty emphasis</p> +</code></pre> +<pre><code class="language-example">**** is not an empty strong emphasis +. +<p>**** is not an empty strong emphasis</p> +</code></pre> +<p>Rule 10:</p> +<p>Any nonempty sequence of inline elements can be the contents of a +strongly emphasized span.</p> +<pre><code class="language-example">**foo [bar](/url)** +. +<p><strong>foo <a href="/url">bar</a></strong></p> +</code></pre> +<pre><code class="language-example">**foo +bar** +. +<p><strong>foo +bar</strong></p> +</code></pre> +<p>In particular, emphasis and strong emphasis can be nested +inside strong emphasis:</p> +<pre><code class="language-example">__foo _bar_ baz__ +. +<p><strong>foo <em>bar</em> baz</strong></p> +</code></pre> +<pre><code class="language-example">__foo __bar__ baz__ +. +<p><strong>foo <strong>bar</strong> baz</strong></p> +</code></pre> +<pre><code class="language-example">____foo__ bar__ +. +<p><strong><strong>foo</strong> bar</strong></p> +</code></pre> +<pre><code class="language-example">**foo **bar**** +. +<p><strong>foo <strong>bar</strong></strong></p> +</code></pre> +<pre><code class="language-example">**foo *bar* baz** +. +<p><strong>foo <em>bar</em> baz</strong></p> +</code></pre> +<pre><code class="language-example">**foo*bar*baz** +. +<p><strong>foo<em>bar</em>baz</strong></p> +</code></pre> +<pre><code class="language-example">***foo* bar** +. +<p><strong><em>foo</em> bar</strong></p> +</code></pre> +<pre><code class="language-example">**foo *bar*** +. +<p><strong>foo <em>bar</em></strong></p> +</code></pre> +<p>Indefinite levels of nesting are possible:</p> +<pre><code class="language-example">**foo *bar **baz** +bim* bop** +. 
+<p><strong>foo <em>bar <strong>baz</strong> +bim</em> bop</strong></p> +</code></pre> +<pre><code class="language-example">**foo [*bar*](/url)** +. +<p><strong>foo <a href="/url"><em>bar</em></a></strong></p> +</code></pre> +<p>There can be no empty emphasis or strong emphasis:</p> +<pre><code class="language-example">__ is not an empty emphasis +. +<p>__ is not an empty emphasis</p> +</code></pre> +<pre><code class="language-example">____ is not an empty strong emphasis +. +<p>____ is not an empty strong emphasis</p> +</code></pre> +<p>Rule 11:</p> +<pre><code class="language-example">foo *** +. +<p>foo ***</p> +</code></pre> +<pre><code class="language-example">foo *\** +. +<p>foo <em>*</em></p> +</code></pre> +<pre><code class="language-example">foo *_* +. +<p>foo <em>_</em></p> +</code></pre> +<pre><code class="language-example">foo ***** +. +<p>foo *****</p> +</code></pre> +<pre><code class="language-example">foo **\*** +. +<p>foo <strong>*</strong></p> +</code></pre> +<pre><code class="language-example">foo **_** +. +<p>foo <strong>_</strong></p> +</code></pre> +<p>Note that when delimiters do not match evenly, Rule 11 determines +that the excess literal <code>*</code> characters will appear outside of the +emphasis, rather than inside it:</p> +<pre><code class="language-example">**foo* +. +<p>*<em>foo</em></p> +</code></pre> +<pre><code class="language-example">*foo** +. +<p><em>foo</em>*</p> +</code></pre> +<pre><code class="language-example">***foo** +. +<p>*<strong>foo</strong></p> +</code></pre> +<pre><code class="language-example">****foo* +. +<p>***<em>foo</em></p> +</code></pre> +<pre><code class="language-example">**foo*** +. +<p><strong>foo</strong>*</p> +</code></pre> +<pre><code class="language-example">*foo**** +. +<p><em>foo</em>***</p> +</code></pre> +<p>Rule 12:</p> +<pre><code class="language-example">foo ___ +. +<p>foo ___</p> +</code></pre> +<pre><code class="language-example">foo _\__ +. 
+<p>foo <em>_</em></p> +</code></pre> +<pre><code class="language-example">foo _*_ +. +<p>foo <em>*</em></p> +</code></pre> +<pre><code class="language-example">foo _____ +. +<p>foo _____</p> +</code></pre> +<pre><code class="language-example">foo __\___ +. +<p>foo <strong>_</strong></p> +</code></pre> +<pre><code class="language-example">foo __*__ +. +<p>foo <strong>*</strong></p> +</code></pre> +<pre><code class="language-example">__foo_ +. +<p>_<em>foo</em></p> +</code></pre> +<p>Note that when delimiters do not match evenly, Rule 12 determines +that the excess literal <code>_</code> characters will appear outside of the +emphasis, rather than inside it:</p> +<pre><code class="language-example">_foo__ +. +<p><em>foo</em>_</p> +</code></pre> +<pre><code class="language-example">___foo__ +. +<p>_<strong>foo</strong></p> +</code></pre> +<pre><code class="language-example">____foo_ +. +<p>___<em>foo</em></p> +</code></pre> +<pre><code class="language-example">__foo___ +. +<p><strong>foo</strong>_</p> +</code></pre> +<pre><code class="language-example">_foo____ +. +<p><em>foo</em>___</p> +</code></pre> +<p>Rule 13 implies that if you want emphasis nested directly inside +emphasis, you must use different delimiters:</p> +<pre><code class="language-example">**foo** +. +<p><strong>foo</strong></p> +</code></pre> +<pre><code class="language-example">*_foo_* +. +<p><em><em>foo</em></em></p> +</code></pre> +<pre><code class="language-example">__foo__ +. +<p><strong>foo</strong></p> +</code></pre> +<pre><code class="language-example">_*foo*_ +. +<p><em><em>foo</em></em></p> +</code></pre> +<p>However, strong emphasis within strong emphasis is possible without +switching delimiters:</p> +<pre><code class="language-example">****foo**** +. +<p><strong><strong>foo</strong></strong></p> +</code></pre> +<pre><code class="language-example">____foo____ +. 
+<p><strong><strong>foo</strong></strong></p> +</code></pre> +<p>Rule 13 can be applied to arbitrarily long sequences of +delimiters:</p> +<pre><code class="language-example">******foo****** +. +<p><strong><strong><strong>foo</strong></strong></strong></p> +</code></pre> +<p>Rule 14:</p> +<pre><code class="language-example">***foo*** +. +<p><em><strong>foo</strong></em></p> +</code></pre> +<pre><code class="language-example">_____foo_____ +. +<p><em><strong><strong>foo</strong></strong></em></p> +</code></pre> +<p>Rule 15:</p> +<pre><code class="language-example">*foo _bar* baz_ +. +<p><em>foo _bar</em> baz_</p> +</code></pre> +<pre><code class="language-example">*foo __bar *baz bim__ bam* +. +<p><em>foo <strong>bar *baz bim</strong> bam</em></p> +</code></pre> +<p>Rule 16:</p> +<pre><code class="language-example">**foo **bar baz** +. +<p>**foo <strong>bar baz</strong></p> +</code></pre> +<pre><code class="language-example">*foo *bar baz* +. +<p>*foo <em>bar baz</em></p> +</code></pre> +<p>Rule 17:</p> +<pre><code class="language-example">*[bar*](/url) +. +<p>*<a href="/url">bar*</a></p> +</code></pre> +<pre><code class="language-example">_foo [bar_](/url) +. +<p>_foo <a href="/url">bar_</a></p> +</code></pre> +<pre><code class="language-example">*<img src="foo" title="*"/> +. +<p>*<img src="foo" title="*"/></p> +</code></pre> +<pre><code class="language-example">**<a href="**"> +. +<p>**<a href="**"></p> +</code></pre> +<pre><code class="language-example">__<a href="__"> +. +<p>__<a href="__"></p> +</code></pre> +<pre><code class="language-example">*a `*`* +. +<p><em>a <code>*</code></em></p> +</code></pre> +<pre><code class="language-example">_a `_`_ +. +<p><em>a <code>_</code></em></p> +</code></pre> +<pre><code class="language-example">**a<http://foo.bar/?q=**> +. +<p>**a<a href="http://foo.bar/?q=**">http://foo.bar/?q=**</a></p> +</code></pre> +<pre><code class="language-example">__a<http://foo.bar/?q=__> +. 
+<p>__a<a href="http://foo.bar/?q=__">http://foo.bar/?q=__</a></p> +</code></pre> +<h2>Links</h2> +<p>A link contains [link text] (the visible text), a [link destination] +(the URI that is the link destination), and optionally a [link title]. +There are two basic kinds of links in Markdown. In [inline links] the +destination and title are given immediately after the link text. In +[reference links] the destination and title are defined elsewhere in +the document.</p> +<p>A <a href="@">link text</a> consists of a sequence of zero or more +inline elements enclosed by square brackets (<code>[</code> and <code>]</code>). The +following rules apply:</p> +<ul> +<li> +<p>Links may not contain other links, at any level of nesting. If +multiple otherwise valid link definitions appear nested inside each +other, the inner-most definition is used.</p> +</li> +<li> +<p>Brackets are allowed in the [link text] only if (a) they +are backslash-escaped or (b) they appear as a matched pair of brackets, +with an open bracket <code>[</code>, a sequence of zero or more inlines, and +a close bracket <code>]</code>.</p> +</li> +<li> +<p>Backtick [code spans], [autolinks], and raw [HTML tags] bind more tightly +than the brackets in link text. Thus, for example, +<code>[foo`]`</code> could not be a link text, since the second <code>]</code> +is part of a code span.</p> +</li> +<li> +<p>The brackets in link text bind more tightly than markers for +[emphasis and strong emphasis]. 
Thus, for example, <code>*[foo*](url)</code> is a link.</p> +</li> +</ul> +<p>A <a href="@">link destination</a> consists of either</p> +<ul> +<li> +<p>a sequence of zero or more characters between an opening <code><</code> and a +closing <code>></code> that contains no line endings or unescaped +<code><</code> or <code>></code> characters, or</p> +</li> +<li> +<p>a nonempty sequence of characters that does not start with <code><</code>, +does not include [ASCII control characters][ASCII control character] +or [space] character, and includes parentheses only if (a) they are +backslash-escaped or (b) they are part of a balanced pair of +unescaped parentheses. +(Implementations may impose limits on parentheses nesting to +avoid performance issues, but at least three levels of nesting +should be supported.)</p> +</li> +</ul> +<p>A <a href="@">link title</a> consists of either</p> +<ul> +<li> +<p>a sequence of zero or more characters between straight double-quote +characters (<code>"</code>), including a <code>"</code> character only if it is +backslash-escaped, or</p> +</li> +<li> +<p>a sequence of zero or more characters between straight single-quote +characters (<code>'</code>), including a <code>'</code> character only if it is +backslash-escaped, or</p> +</li> +<li> +<p>a sequence of zero or more characters between matching parentheses +(<code>(...)</code>), including a <code>(</code> or <code>)</code> character only if it is +backslash-escaped.</p> +</li> +</ul> +<p>Although [link titles] may span multiple lines, they may not contain +a [blank line].</p> +<p>An <a href="@">inline link</a> consists of a [link text] followed immediately +by a left parenthesis <code>(</code>, an optional [link destination], an optional +[link title], and a right parenthesis <code>)</code>. +These four components may be separated by spaces, tabs, and up to one line +ending. 
+If both [link destination] and [link title] are present, they <em>must</em> be +separated by spaces, tabs, and up to one line ending.</p> +<p>The link's text consists of the inlines contained +in the [link text] (excluding the enclosing square brackets). +The link's URI consists of the link destination, excluding enclosing +<code><...></code> if present, with backslash-escapes in effect as described +above. The link's title consists of the link title, excluding its +enclosing delimiters, with backslash-escapes in effect as described +above.</p> +<p>Here is a simple inline link:</p> +<pre><code class="language-example">[link](/uri "title") +. +<p><a href="/uri" title="title">link</a></p> +</code></pre> +<p>The title, the link text and even +the destination may be omitted:</p> +<pre><code class="language-example">[link](/uri) +. +<p><a href="/uri">link</a></p> +</code></pre> +<pre><code class="language-example">[](./target.md) +. +<p><a href="./target.md"></a></p> +</code></pre> +<pre><code class="language-example">[link]() +. +<p><a href="">link</a></p> +</code></pre> +<pre><code class="language-example">[link](<>) +. +<p><a href="">link</a></p> +</code></pre> +<pre><code class="language-example">[]() +. +<p><a href=""></a></p> +</code></pre> +<p>The destination can only contain spaces if it is +enclosed in pointy brackets:</p> +<pre><code class="language-example">[link](/my uri) +. +<p>[link](/my uri)</p> +</code></pre> +<pre><code class="language-example">[link](</my uri>) +. +<p><a href="/my%20uri">link</a></p> +</code></pre> +<p>The destination cannot contain line endings, +even if enclosed in pointy brackets:</p> +<pre><code class="language-example">[link](foo +bar) +. +<p>[link](foo +bar)</p> +</code></pre> +<pre><code class="language-example">[link](<foo +bar>) +. +<p>[link](<foo +bar>)</p> +</code></pre> +<p>The destination can contain <code>)</code> if it is enclosed +in pointy brackets:</p> +<pre><code class="language-example">[a](<b)c>) +. 
+<p><a href="b)c">a</a></p> +</code></pre> +<p>Pointy brackets that enclose links must be unescaped:</p> +<pre><code class="language-example">[link](<foo\>) +. +<p>[link](&lt;foo&gt;)</p> +</code></pre> +<p>These are not links, because the opening pointy bracket +is not matched properly:</p> +<pre><code class="language-example">[a](<b)c +[a](<b)c> +[a](<b>c) +. +<p>[a](&lt;b)c +[a](&lt;b)c&gt; +[a](<b>c)</p> +</code></pre> +<p>Parentheses inside the link destination may be escaped:</p> +<pre><code class="language-example">[link](\(foo\)) +. +<p><a href="(foo)">link</a></p> +</code></pre> +<p>Any number of parentheses are allowed without escaping, as long as they are +balanced:</p> +<pre><code class="language-example">[link](foo(and(bar))) +. +<p><a href="foo(and(bar))">link</a></p> +</code></pre> +<p>However, if you have unbalanced parentheses, you need to escape or use the +<code><...></code> form:</p> +<pre><code class="language-example">[link](foo(and(bar)) +. +<p>[link](foo(and(bar))</p> +</code></pre> +<pre><code class="language-example">[link](foo\(and\(bar\)) +. +<p><a href="foo(and(bar)">link</a></p> +</code></pre> +<pre><code class="language-example">[link](<foo(and(bar)>) +. +<p><a href="foo(and(bar)">link</a></p> +</code></pre> +<p>Parentheses and other symbols can also be escaped, as usual +in Markdown:</p> +<pre><code class="language-example">[link](foo\)\:) +. +<p><a href="foo):">link</a></p> +</code></pre> +<p>A link can contain fragment identifiers and queries:</p> +<pre><code class="language-example">[link](#fragment) + +[link](http://example.com#fragment) + +[link](http://example.com?foo=3#frag) +. +<p><a href="#fragment">link</a></p> +<p><a href="http://example.com#fragment">link</a></p> +<p><a href="http://example.com?foo=3#frag">link</a></p> +</code></pre> +<p>Note that a backslash before a non-escapable character is +just a backslash:</p> +<pre><code class="language-example">[link](foo\bar) +. 
+<p><a href="foo%5Cbar">link</a></p> +</code></pre> +<p>URL-escaping should be left alone inside the destination, as all +URL-escaped characters are also valid URL characters. Entity and +numerical character references in the destination will be parsed +into the corresponding Unicode code points, as usual. These may +be optionally URL-escaped when written as HTML, but this spec +does not enforce any particular policy for rendering URLs in +HTML or other formats. Renderers may make different decisions +about how to escape or normalize URLs in the output.</p> +<pre><code class="language-example">[link](foo%20b&auml;) +. +<p><a href="foo%20b%C3%A4">link</a></p> +</code></pre> +<p>Note that, because titles can often be parsed as destinations, +if you try to omit the destination and keep the title, you'll +get unexpected results:</p> +<pre><code class="language-example">[link]("title") +. +<p><a href="%22title%22">link</a></p> +</code></pre> +<p>Titles may be in single quotes, double quotes, or parentheses:</p> +<pre><code class="language-example">[link](/url "title") +[link](/url 'title') +[link](/url (title)) +. +<p><a href="/url" title="title">link</a> +<a href="/url" title="title">link</a> +<a href="/url" title="title">link</a></p> +</code></pre> +<p>Backslash escapes and entity and numeric character references +may be used in titles:</p> +<pre><code class="language-example">[link](/url "title \"&quot;") +. +<p><a href="/url" title="title &quot;&quot;">link</a></p> +</code></pre> +<p>Titles must be separated from the link using spaces, tabs, and up to one line +ending. +Other [Unicode whitespace] like non-breaking space doesn't work.</p> +<pre><code class="language-example">[link](/url "title") +. +<p><a href="/url%C2%A0%22title%22">link</a></p> +</code></pre> +<p>Nested balanced quotes are not allowed without escaping:</p> +<pre><code class="language-example">[link](/url "title "and" title") +. 
+<p>[link](/url &quot;title &quot;and&quot; title&quot;)</p> +</code></pre> +<p>But it is easy to work around this by using a different quote type:</p> +<pre><code class="language-example">[link](/url 'title "and" title') +. +<p><a href="/url" title="title &quot;and&quot; title">link</a></p> +</code></pre> +<p>(Note: <code>Markdown.pl</code> did allow double quotes inside a double-quoted +title, and its test suite included a test demonstrating this. +But it is hard to see a good rationale for the extra complexity this +brings, since there are already many ways---backslash escaping, +entity and numeric character references, or using a different +quote type for the enclosing title---to write titles containing +double quotes. <code>Markdown.pl</code>'s handling of titles has a number +of other strange features. For example, it allows single-quoted +titles in inline links, but not reference links. And, in +reference links but not inline links, it allows a title to begin +with <code>"</code> and end with <code>)</code>. <code>Markdown.pl</code> 1.0.1 even allows +titles with no closing quotation mark, though 1.0.2b8 does not. +It seems preferable to adopt a simple, rational rule that works +the same way in inline links and link reference definitions.)</p> +<p>Spaces, tabs, and up to one line ending is allowed around the destination and +title:</p> +<pre><code class="language-example">[link]( /uri + "title" ) +. +<p><a href="/uri" title="title">link</a></p> +</code></pre> +<p>But it is not allowed between the link text and the +following parenthesis:</p> +<pre><code class="language-example">[link] (/uri) +. +<p>[link] (/uri)</p> +</code></pre> +<p>The link text may contain balanced brackets, but not unbalanced ones, +unless they are escaped:</p> +<pre><code class="language-example">[link [foo [bar]]](/uri) +. +<p><a href="/uri">link [foo [bar]]</a></p> +</code></pre> +<pre><code class="language-example">[link] bar](/uri) +. 
+<p>[link] bar](/uri)</p> +</code></pre> +<pre><code class="language-example">[link [bar](/uri) +. +<p>[link <a href="/uri">bar</a></p> +</code></pre> +<pre><code class="language-example">[link \[bar](/uri) +. +<p><a href="/uri">link [bar</a></p> +</code></pre> +<p>The link text may contain inline content:</p> +<pre><code class="language-example">[link *foo **bar** `#`*](/uri) +. +<p><a href="/uri">link <em>foo <strong>bar</strong> <code>#</code></em></a></p> +</code></pre> +<pre><code class="language-example">[![moon](moon.jpg)](/uri) +. +<p><a href="/uri"><img src="moon.jpg" alt="moon" /></a></p> +</code></pre> +<p>However, links may not contain other links, at any level of nesting.</p> +<pre><code class="language-example">[foo [bar](/uri)](/uri) +. +<p>[foo <a href="/uri">bar</a>](/uri)</p> +</code></pre> +<pre><code class="language-example">[foo *[bar [baz](/uri)](/uri)*](/uri) +. +<p>[foo <em>[bar <a href="/uri">baz</a>](/uri)</em>](/uri)</p> +</code></pre> +<pre><code class="language-example">![[[foo](uri1)](uri2)](uri3) +. +<p><img src="uri3" alt="[foo](uri2)" /></p> +</code></pre> +<p>These cases illustrate the precedence of link text grouping over +emphasis grouping:</p> +<pre><code class="language-example">*[foo*](/uri) +. +<p>*<a href="/uri">foo*</a></p> +</code></pre> +<pre><code class="language-example">[foo *bar](baz*) +. +<p><a href="baz*">foo *bar</a></p> +</code></pre> +<p>Note that brackets that <em>aren't</em> part of links do not take +precedence:</p> +<pre><code class="language-example">*foo [bar* baz] +. +<p><em>foo [bar</em> baz]</p> +</code></pre> +<p>These cases illustrate the precedence of HTML tags, code spans, +and autolinks over link grouping:</p> +<pre><code class="language-example">[foo <bar attr="](baz)"> +. +<p>[foo <bar attr="](baz)"></p> +</code></pre> +<pre><code class="language-example">[foo`](/uri)` +. +<p>[foo<code>](/uri)</code></p> +</code></pre> +<pre><code class="language-example">[foo<http://example.com/?search=](uri)> +. 
+<p>[foo<a href="http://example.com/?search=%5D(uri)">http://example.com/?search=](uri)</a></p> +</code></pre> +<p>There are three kinds of <a href="@">reference link</a>s: +<a href="#full-reference-link">full</a>, <a href="#collapsed-reference-link">collapsed</a>, +and <a href="#shortcut-reference-link">shortcut</a>.</p> +<p>A <a href="@">full reference link</a> +consists of a [link text] immediately followed by a [link label] +that [matches] a [link reference definition] elsewhere in the document.</p> +<p>A <a href="@">link label</a> begins with a left bracket (<code>[</code>) and ends +with the first right bracket (<code>]</code>) that is not backslash-escaped. +Between these brackets there must be at least one character that is not a space, +tab, or line ending. +Unescaped square bracket characters are not allowed inside the +opening and closing square brackets of [link labels]. A link +label can have at most 999 characters inside the square +brackets.</p> +<p>One label <a href="@">matches</a> +another just in case their normalized forms are equal. To normalize a +label, strip off the opening and closing brackets, +perform the <em>Unicode case fold</em>, strip leading and trailing +spaces, tabs, and line endings, and collapse consecutive internal +spaces, tabs, and line endings to a single space. If there are multiple +matching reference link definitions, the one that comes first in the +document is used. (It is desirable in such cases to emit a warning.)</p> +<p>The link's URI and title are provided by the matching [link +reference definition].</p> +<p>Here is a simple example:</p> +<pre><code class="language-example">[foo][bar] + +[bar]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +</code></pre> +<p>The rules for the [link text] are the same as with +[inline links]. 
Thus:</p> +<p>The link text may contain balanced brackets, but not unbalanced ones, +unless they are escaped:</p> +<pre><code class="language-example">[link [foo [bar]]][ref] + +[ref]: /uri +. +<p><a href="/uri">link [foo [bar]]</a></p> +</code></pre> +<pre><code class="language-example">[link \[bar][ref] + +[ref]: /uri +. +<p><a href="/uri">link [bar</a></p> +</code></pre> +<p>The link text may contain inline content:</p> +<pre><code class="language-example">[link *foo **bar** `#`*][ref] + +[ref]: /uri +. +<p><a href="/uri">link <em>foo <strong>bar</strong> <code>#</code></em></a></p> +</code></pre> +<pre><code class="language-example">[![moon](moon.jpg)][ref] + +[ref]: /uri +. +<p><a href="/uri"><img src="moon.jpg" alt="moon" /></a></p> +</code></pre> +<p>However, links may not contain other links, at any level of nesting.</p> +<pre><code class="language-example">[foo [bar](/uri)][ref] + +[ref]: /uri +. +<p>[foo <a href="/uri">bar</a>]<a href="/uri">ref</a></p> +</code></pre> +<pre><code class="language-example">[foo *bar [baz][ref]*][ref] + +[ref]: /uri +. +<p>[foo <em>bar <a href="/uri">baz</a></em>]<a href="/uri">ref</a></p> +</code></pre> +<p>(In the examples above, we have two [shortcut reference links] +instead of one [full reference link].)</p> +<p>The following cases illustrate the precedence of link text grouping over +emphasis grouping:</p> +<pre><code class="language-example">*[foo*][ref] + +[ref]: /uri +. +<p>*<a href="/uri">foo*</a></p> +</code></pre> +<pre><code class="language-example">[foo *bar][ref]* + +[ref]: /uri +. +<p><a href="/uri">foo *bar</a>*</p> +</code></pre> +<p>These cases illustrate the precedence of HTML tags, code spans, +and autolinks over link grouping:</p> +<pre><code class="language-example">[foo <bar attr="][ref]"> + +[ref]: /uri +. +<p>[foo <bar attr="][ref]"></p> +</code></pre> +<pre><code class="language-example">[foo`][ref]` + +[ref]: /uri +. 
+<p>[foo<code>][ref]</code></p> +</code></pre> +<pre><code class="language-example">[foo<http://example.com/?search=][ref]> + +[ref]: /uri +. +<p>[foo<a href="http://example.com/?search=%5D%5Bref%5D">http://example.com/?search=][ref]</a></p> +</code></pre> +<p>Matching is case-insensitive:</p> +<pre><code class="language-example">[foo][BaR] + +[bar]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +</code></pre> +<p>Unicode case fold is used:</p> +<pre><code class="language-example">[ẞ] + +[SS]: /url +. +<p><a href="/url">ẞ</a></p> +</code></pre> +<p>Consecutive internal spaces, tabs, and line endings are treated as one space for +purposes of determining matching:</p> +<pre><code class="language-example">[Foo + bar]: /url + +[Baz][Foo bar] +. +<p><a href="/url">Baz</a></p> +</code></pre> +<p>No spaces, tabs, or line endings are allowed between the [link text] and the +[link label]:</p> +<pre><code class="language-example">[foo] [bar] + +[bar]: /url "title" +. +<p>[foo] <a href="/url" title="title">bar</a></p> +</code></pre> +<pre><code class="language-example">[foo] +[bar] + +[bar]: /url "title" +. +<p>[foo] +<a href="/url" title="title">bar</a></p> +</code></pre> +<p>This is a departure from John Gruber's original Markdown syntax +description, which explicitly allows whitespace between the link +text and the link label. It brings reference links in line with +[inline links], which (according to both original Markdown and +this spec) cannot have whitespace after the link text. More +importantly, it prevents inadvertent capture of consecutive +[shortcut reference links]. 
If whitespace is allowed between the +link text and the link label, then in the following we will have +a single reference link, not two shortcut reference links, as +intended:</p> +<pre><code class="language-markdown">[foo] +[bar] + +[foo]: /url1 +[bar]: /url2 +</code></pre> +<p>(Note that [shortcut reference links] were introduced by Gruber +himself in a beta version of <code>Markdown.pl</code>, but never included +in the official syntax description. Without shortcut reference +links, it is harmless to allow space between the link text and +link label; but once shortcut references are introduced, it is +too dangerous to allow this, as it frequently leads to +unintended results.)</p> +<p>When there are multiple matching [link reference definitions], +the first is used:</p> +<pre><code class="language-example">[foo]: /url1 + +[foo]: /url2 + +[bar][foo] +. +<p><a href="/url1">bar</a></p> +</code></pre> +<p>Note that matching is performed on normalized strings, not parsed +inline content. So the following does not match, even though the +labels define equivalent inline content:</p> +<pre><code class="language-example">[bar][foo\!] + +[foo!]: /url +. +<p>[bar][foo!]</p> +</code></pre> +<p>[Link labels] cannot contain brackets, unless they are +backslash-escaped:</p> +<pre><code class="language-example">[foo][ref[] + +[ref[]: /uri +. +<p>[foo][ref[]</p> +<p>[ref[]: /uri</p> +</code></pre> +<pre><code class="language-example">[foo][ref[bar]] + +[ref[bar]]: /uri +. +<p>[foo][ref[bar]]</p> +<p>[ref[bar]]: /uri</p> +</code></pre> +<pre><code class="language-example">[[[foo]]] + +[[[foo]]]: /url +. +<p>[[[foo]]]</p> +<p>[[[foo]]]: /url</p> +</code></pre> +<pre><code class="language-example">[foo][ref\[] + +[ref\[]: /uri +. +<p><a href="/uri">foo</a></p> +</code></pre> +<p>Note that in this example <code>]</code> is not backslash-escaped:</p> +<pre><code class="language-example">[bar\\]: /uri + +[bar\\] +. 
+<p><a href="/uri">bar\</a></p> +</code></pre> +<p>A [link label] must contain at least one character that is not a space, tab, or +line ending:</p> +<pre><code class="language-example">[] + +[]: /uri +. +<p>[]</p> +<p>[]: /uri</p> +</code></pre> +<pre><code class="language-example">[ + ] + +[ + ]: /uri +. +<p>[ +]</p> +<p>[ +]: /uri</p> +</code></pre> +<p>A <a href="@">collapsed reference link</a> +consists of a [link label] that [matches] a +[link reference definition] elsewhere in the +document, followed by the string <code>[]</code>. +The contents of the first link label are parsed as inlines, +which are used as the link's text. The link's URI and title are +provided by the matching reference link definition. Thus, +<code>[foo][]</code> is equivalent to <code>[foo][foo]</code>.</p> +<pre><code class="language-example">[foo][] + +[foo]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +</code></pre> +<pre><code class="language-example">[*foo* bar][] + +[*foo* bar]: /url "title" +. +<p><a href="/url" title="title"><em>foo</em> bar</a></p> +</code></pre> +<p>The link labels are case-insensitive:</p> +<pre><code class="language-example">[Foo][] + +[foo]: /url "title" +. +<p><a href="/url" title="title">Foo</a></p> +</code></pre> +<p>As with full reference links, spaces, tabs, or line endings are not +allowed between the two sets of brackets:</p> +<pre><code class="language-example">[foo] +[] + +[foo]: /url "title" +. +<p><a href="/url" title="title">foo</a> +[]</p> +</code></pre> +<p>A <a href="@">shortcut reference link</a> +consists of a [link label] that [matches] a +[link reference definition] elsewhere in the +document and is not followed by <code>[]</code> or a link label. +The contents of the first link label are parsed as inlines, +which are used as the link's text. The link's URI and title +are provided by the matching link reference definition. 
+Thus, <code>[foo]</code> is equivalent to <code>[foo][]</code>.</p> +<pre><code class="language-example">[foo] + +[foo]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +</code></pre> +<pre><code class="language-example">[*foo* bar] + +[*foo* bar]: /url "title" +. +<p><a href="/url" title="title"><em>foo</em> bar</a></p> +</code></pre> +<pre><code class="language-example">[[*foo* bar]] + +[*foo* bar]: /url "title" +. +<p>[<a href="/url" title="title"><em>foo</em> bar</a>]</p> +</code></pre> +<pre><code class="language-example">[[bar [foo] + +[foo]: /url +. +<p>[[bar <a href="/url">foo</a></p> +</code></pre> +<p>The link labels are case-insensitive:</p> +<pre><code class="language-example">[Foo] + +[foo]: /url "title" +. +<p><a href="/url" title="title">Foo</a></p> +</code></pre> +<p>A space after the link text should be preserved:</p> +<pre><code class="language-example">[foo] bar + +[foo]: /url +. +<p><a href="/url">foo</a> bar</p> +</code></pre> +<p>If you just want bracketed text, you can backslash-escape the +opening bracket to avoid links:</p> +<pre><code class="language-example">\[foo] + +[foo]: /url "title" +. +<p>[foo]</p> +</code></pre> +<p>Note that this is a link, because a link label ends with the first +following closing bracket:</p> +<pre><code class="language-example">[foo*]: /url + +*[foo*] +. +<p>*<a href="/url">foo*</a></p> +</code></pre> +<p>Full and collapsed references take precedence over shortcut +references:</p> +<pre><code class="language-example">[foo][bar] + +[foo]: /url1 +[bar]: /url2 +. +<p><a href="/url2">foo</a></p> +</code></pre> +<pre><code class="language-example">[foo][] + +[foo]: /url1 +. +<p><a href="/url1">foo</a></p> +</code></pre> +<p>Inline links also take precedence:</p> +<pre><code class="language-example">[foo]() + +[foo]: /url1 +. +<p><a href="">foo</a></p> +</code></pre> +<pre><code class="language-example">[foo](not a link) + +[foo]: /url1 +. 
+<p><a href="/url1">foo</a>(not a link)</p> +</code></pre> +<p>In the following case <code>[bar][baz]</code> is parsed as a reference, +<code>[foo]</code> as normal text:</p> +<pre><code class="language-example">[foo][bar][baz] + +[baz]: /url +. +<p>[foo]<a href="/url">bar</a></p> +</code></pre> +<p>Here, though, <code>[foo][bar]</code> is parsed as a reference, since +<code>[bar]</code> is defined:</p> +<pre><code class="language-example">[foo][bar][baz] + +[baz]: /url1 +[bar]: /url2 +. +<p><a href="/url2">foo</a><a href="/url1">baz</a></p> +</code></pre> +<p>Here <code>[foo]</code> is not parsed as a shortcut reference, because it +is followed by a link label (even though <code>[bar]</code> is not defined):</p> +<pre><code class="language-example">[foo][bar][baz] + +[baz]: /url1 +[foo]: /url2 +. +<p>[foo]<a href="/url1">bar</a></p> +</code></pre> +<h2>Images</h2> +<p>Syntax for images is like the syntax for links, with one +difference. Instead of [link text], we have an +<a href="@">image description</a>. The rules for this are the +same as for [link text], except that (a) an +image description starts with <code>![</code> rather than <code>[</code>, and +(b) an image description may contain links. +An image description has inline elements +as its contents. When an image is rendered to HTML, +this is standardly used as the image's <code>alt</code> attribute.</p> +<pre><code class="language-example">![foo](/url "title") +. +<p><img src="/url" alt="foo" title="title" /></p> +</code></pre> +<pre><code class="language-example">![foo *bar*] + +[foo *bar*]: train.jpg "train & tracks" +. +<p><img src="train.jpg" alt="foo bar" title="train &amp; tracks" /></p> +</code></pre> +<pre><code class="language-example">![foo ![bar](/url)](/url2) +. +<p><img src="/url2" alt="foo bar" /></p> +</code></pre> +<pre><code class="language-example">![foo [bar](/url)](/url2) +. 
+<p><img src="/url2" alt="foo bar" /></p> +</code></pre> +<p>Though this spec is concerned with parsing, not rendering, it is +recommended that in rendering to HTML, only the plain string content +of the [image description] be used. Note that in +the above example, the alt attribute's value is <code>foo bar</code>, not <code>foo [bar](/url)</code> or <code>foo <a href="/url">bar</a></code>. Only the plain string +content is rendered, without formatting.</p> +<pre><code class="language-example">![foo *bar*][] + +[foo *bar*]: train.jpg "train & tracks" +. +<p><img src="train.jpg" alt="foo bar" title="train &amp; tracks" /></p> +</code></pre> +<pre><code class="language-example">![foo *bar*][foobar] + +[FOOBAR]: train.jpg "train & tracks" +. +<p><img src="train.jpg" alt="foo bar" title="train &amp; tracks" /></p> +</code></pre> +<pre><code class="language-example">![foo](train.jpg) +. +<p><img src="train.jpg" alt="foo" /></p> +</code></pre> +<pre><code class="language-example">My ![foo bar](/path/to/train.jpg "title" ) +. +<p>My <img src="/path/to/train.jpg" alt="foo bar" title="title" /></p> +</code></pre> +<pre><code class="language-example">![foo](<url>) +. +<p><img src="url" alt="foo" /></p> +</code></pre> +<pre><code class="language-example">![](/url) +. +<p><img src="/url" alt="" /></p> +</code></pre> +<p>Reference-style:</p> +<pre><code class="language-example">![foo][bar] + +[bar]: /url +. +<p><img src="/url" alt="foo" /></p> +</code></pre> +<pre><code class="language-example">![foo][bar] + +[BAR]: /url +. +<p><img src="/url" alt="foo" /></p> +</code></pre> +<p>Collapsed:</p> +<pre><code class="language-example">![foo][] + +[foo]: /url "title" +. +<p><img src="/url" alt="foo" title="title" /></p> +</code></pre> +<pre><code class="language-example">![*foo* bar][] + +[*foo* bar]: /url "title" +. 
+<p><img src="/url" alt="foo bar" title="title" /></p> +</code></pre> +<p>The labels are case-insensitive:</p> +<pre><code class="language-example">![Foo][] + +[foo]: /url "title" +. +<p><img src="/url" alt="Foo" title="title" /></p> +</code></pre> +<p>As with reference links, spaces, tabs, and line endings are not allowed +between the two sets of brackets:</p> +<pre><code class="language-example">![foo] +[] + +[foo]: /url "title" +. +<p><img src="/url" alt="foo" title="title" /> +[]</p> +</code></pre> +<p>Shortcut:</p> +<pre><code class="language-example">![foo] + +[foo]: /url "title" +. +<p><img src="/url" alt="foo" title="title" /></p> +</code></pre> +<pre><code class="language-example">![*foo* bar] + +[*foo* bar]: /url "title" +. +<p><img src="/url" alt="foo bar" title="title" /></p> +</code></pre> +<p>Note that link labels cannot contain unescaped brackets:</p> +<pre><code class="language-example">![[foo]] + +[[foo]]: /url "title" +. +<p>![[foo]]</p> +<p>[[foo]]: /url &quot;title&quot;</p> +</code></pre> +<p>The link labels are case-insensitive:</p> +<pre><code class="language-example">![Foo] + +[foo]: /url "title" +. +<p><img src="/url" alt="Foo" title="title" /></p> +</code></pre> +<p>If you just want a literal <code>!</code> followed by bracketed text, you can +backslash-escape the opening <code>[</code>:</p> +<pre><code class="language-example">!\[foo] + +[foo]: /url "title" +. +<p>![foo]</p> +</code></pre> +<p>If you want a link after a literal <code>!</code>, backslash-escape the +<code>!</code>:</p> +<pre><code class="language-example">\![foo] + +[foo]: /url "title" +. +<p>!<a href="/url" title="title">foo</a></p> +</code></pre> +<h2>Autolinks</h2> +<p><a href="@">Autolink</a>s are absolute URIs and email addresses inside +<code><</code> and <code>></code>. 
They are parsed as links, with the URL or email address +as the link label.</p> +<p>A <a href="@">URI autolink</a> consists of <code><</code>, followed by an +[absolute URI] followed by <code>></code>. It is parsed as +a link to the URI, with the URI as the link's label.</p> +<p>An <a href="@">absolute URI</a>, +for these purposes, consists of a [scheme] followed by a colon (<code>:</code>) +followed by zero or more characters other than [ASCII control +characters][ASCII control character], [space], <code><</code>, and <code>></code>. +If the URI includes these characters, they must be percent-encoded +(e.g. <code>%20</code> for a space).</p> +<p>For purposes of this spec, a <a href="@">scheme</a> is any sequence +of 2--32 characters beginning with an ASCII letter and followed +by any combination of ASCII letters, digits, or the symbols plus +("+"), period ("."), or hyphen ("-").</p> +<p>Here are some valid autolinks:</p> +<pre><code class="language-example"><http://foo.bar.baz> +. +<p><a href="http://foo.bar.baz">http://foo.bar.baz</a></p> +</code></pre> +<pre><code class="language-example"><http://foo.bar.baz/test?q=hello&id=22&boolean> +. +<p><a href="http://foo.bar.baz/test?q=hello&amp;id=22&amp;boolean">http://foo.bar.baz/test?q=hello&amp;id=22&amp;boolean</a></p> +</code></pre> +<pre><code class="language-example"><irc://foo.bar:2233/baz> +. +<p><a href="irc://foo.bar:2233/baz">irc://foo.bar:2233/baz</a></p> +</code></pre> +<p>Uppercase is also fine:</p> +<pre><code class="language-example"><MAILTO:FOO@BAR.BAZ> +. +<p><a href="MAILTO:FOO@BAR.BAZ">MAILTO:FOO@BAR.BAZ</a></p> +</code></pre> +<p>Note that many strings that count as [absolute URIs] for +purposes of this spec are not valid URIs, because their +schemes are not registered or because of other problems +with their syntax:</p> +<pre><code class="language-example"><a+b+c:d> +. +<p><a href="a+b+c:d">a+b+c:d</a></p> +</code></pre> +<pre><code class="language-example"><made-up-scheme://foo,bar> +. 
+<p><a href="made-up-scheme://foo,bar">made-up-scheme://foo,bar</a></p> +</code></pre> +<pre><code class="language-example"><http://../> +. +<p><a href="http://../">http://../</a></p> +</code></pre> +<pre><code class="language-example"><localhost:5001/foo> +. +<p><a href="localhost:5001/foo">localhost:5001/foo</a></p> +</code></pre> +<p>Spaces are not allowed in autolinks:</p> +<pre><code class="language-example"><http://foo.bar/baz bim> +. +<p>&lt;http://foo.bar/baz bim&gt;</p> +</code></pre> +<p>Backslash-escapes do not work inside autolinks:</p> +<pre><code class="language-example"><http://example.com/\[\> +. +<p><a href="http://example.com/%5C%5B%5C">http://example.com/\[\</a></p> +</code></pre> +<p>An <a href="@">email autolink</a> +consists of <code><</code>, followed by an [email address], +followed by <code>></code>. The link's label is the email address, +and the URL is <code>mailto:</code> followed by the email address.</p> +<p>An <a href="@">email address</a>, +for these purposes, is anything that matches +the <a href="https://html.spec.whatwg.org/multipage/forms.html#e-mail-state-(type=email)">non-normative regex from the HTML5 +spec</a>:</p> +<pre><code>/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? +(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ +</code></pre> +<p>Examples of email autolinks:</p> +<pre><code class="language-example"><foo@bar.example.com> +. +<p><a href="mailto:foo@bar.example.com">foo@bar.example.com</a></p> +</code></pre> +<pre><code class="language-example"><foo+special@Bar.baz-bar0.com> +. +<p><a href="mailto:foo+special@Bar.baz-bar0.com">foo+special@Bar.baz-bar0.com</a></p> +</code></pre> +<p>Backslash-escapes do not work inside email autolinks:</p> +<pre><code class="language-example"><foo\+@bar.example.com> +. +<p>&lt;foo+@bar.example.com&gt;</p> +</code></pre> +<p>These are not autolinks:</p> +<pre><code class="language-example"><> +. 
+<p>&lt;&gt;</p> +</code></pre> +<pre><code class="language-example">< http://foo.bar > +. +<p>&lt; http://foo.bar &gt;</p> +</code></pre> +<pre><code class="language-example"><m:abc> +. +<p>&lt;m:abc&gt;</p> +</code></pre> +<pre><code class="language-example"><foo.bar.baz> +. +<p>&lt;foo.bar.baz&gt;</p> +</code></pre> +<pre><code class="language-example">http://example.com +. +<p>http://example.com</p> +</code></pre> +<pre><code class="language-example">foo@bar.example.com +. +<p>foo@bar.example.com</p> +</code></pre> +<h2>Raw HTML</h2> +<p>Text between <code><</code> and <code>></code> that looks like an HTML tag is parsed as a +raw HTML tag and will be rendered in HTML without escaping. +Tag and attribute names are not limited to current HTML tags, +so custom tags (and even, say, DocBook tags) may be used.</p> +<p>Here is the grammar for tags:</p> +<p>A <a href="@">tag name</a> consists of an ASCII letter +followed by zero or more ASCII letters, digits, or +hyphens (<code>-</code>).</p> +<p>An <a href="@">attribute</a> consists of spaces, tabs, and up to one line ending, +an [attribute name], and an optional +[attribute value specification].</p> +<p>An <a href="@">attribute name</a> +consists of an ASCII letter, <code>_</code>, or <code>:</code>, followed by zero or more ASCII +letters, digits, <code>_</code>, <code>.</code>, <code>:</code>, or <code>-</code>. (Note: This is the XML +specification restricted to ASCII. 
HTML5 is laxer.)</p> +<p>An <a href="@">attribute value specification</a> +consists of optional spaces, tabs, and up to one line ending, +a <code>=</code> character, optional spaces, tabs, and up to one line ending, +and an [attribute value].</p> +<p>An <a href="@">attribute value</a> +consists of an [unquoted attribute value], +a [single-quoted attribute value], or a [double-quoted attribute value].</p> +<p>An <a href="@">unquoted attribute value</a> +is a nonempty string of characters not +including spaces, tabs, line endings, <code>"</code>, <code>'</code>, <code>=</code>, <code><</code>, <code>></code>, or <code>`</code>.</p> +<p>A <a href="@">single-quoted attribute value</a> +consists of <code>'</code>, zero or more +characters not including <code>'</code>, and a final <code>'</code>.</p> +<p>A <a href="@">double-quoted attribute value</a> +consists of <code>"</code>, zero or more +characters not including <code>"</code>, and a final <code>"</code>.</p> +<p>An <a href="@">open tag</a> consists of a <code><</code> character, a [tag name], +zero or more [attributes], optional spaces, tabs, and up to one line ending, +an optional <code>/</code> character, and a <code>></code> character.</p> +<p>A <a href="@">closing tag</a> consists of the string <code></</code>, a +[tag name], optional spaces, tabs, and up to one line ending, and the character +<code>></code>.</p> +<p>An <a href="@">HTML comment</a> consists of <code><!--</code> + <em>text</em> + <code>--></code>, +where <em>text</em> does not start with <code>></code> or <code>-></code>, does not end with <code>-</code>, +and does not contain <code>--</code>. 
(See the +<a href="http://www.w3.org/TR/html5/syntax.html#comments">HTML5 spec</a>.)</p> +<p>A <a href="@">processing instruction</a> +consists of the string <code><?</code>, a string +of characters not including the string <code>?></code>, and the string +<code>?></code>.</p> +<p>A <a href="@">declaration</a> consists of the string <code><!</code>, an ASCII letter, zero or more +characters not including the character <code>></code>, and the character <code>></code>.</p> +<p>A <a href="@">CDATA section</a> consists of +the string <code><![CDATA[</code>, a string of characters not including the string +<code>]]></code>, and the string <code>]]></code>.</p> +<p>An <a href="@">HTML tag</a> consists of an [open tag], a [closing tag], +an [HTML comment], a [processing instruction], a [declaration], +or a [CDATA section].</p> +<p>Here are some simple open tags:</p> +<pre><code class="language-example"><a><bab><c2c> +. +<p><a><bab><c2c></p> +</code></pre> +<p>Empty elements:</p> +<pre><code class="language-example"><a/><b2/> +. +<p><a/><b2/></p> +</code></pre> +<p>Whitespace is allowed:</p> +<pre><code class="language-example"><a /><b2 +data="foo" > +. +<p><a /><b2 +data="foo" ></p> +</code></pre> +<p>With attributes:</p> +<pre><code class="language-example"><a foo="bar" bam = 'baz <em>"</em>' +_boolean zoop:33=zoop:33 /> +. +<p><a foo="bar" bam = 'baz <em>"</em>' +_boolean zoop:33=zoop:33 /></p> +</code></pre> +<p>Custom tag names can be used:</p> +<pre><code class="language-example">Foo <responsive-image src="foo.jpg" /> +. +<p>Foo <responsive-image src="foo.jpg" /></p> +</code></pre> +<p>Illegal tag names, not parsed as HTML:</p> +<pre><code class="language-example"><33> <__> +. +<p>&lt;33&gt; &lt;__&gt;</p> +</code></pre> +<p>Illegal attribute names:</p> +<pre><code class="language-example"><a h*#ref="hi"> +. +<p>&lt;a h*#ref=&quot;hi&quot;&gt;</p> +</code></pre> +<p>Illegal attribute values:</p> +<pre><code class="language-example"><a href="hi'> <a href=hi'> +. 
+<p>&lt;a href=&quot;hi'&gt; &lt;a href=hi'&gt;</p> +</code></pre> +<p>Illegal whitespace:</p> +<pre><code class="language-example">< a>< +foo><bar/ > +<foo bar=baz +bim!bop /> +. +<p>&lt; a&gt;&lt; +foo&gt;&lt;bar/ &gt; +&lt;foo bar=baz +bim!bop /&gt;</p> +</code></pre> +<p>Missing whitespace:</p> +<pre><code class="language-example"><a href='bar'title=title> +. +<p>&lt;a href='bar'title=title&gt;</p> +</code></pre> +<p>Closing tags:</p> +<pre><code class="language-example"></a></foo > +. +<p></a></foo ></p> +</code></pre> +<p>Illegal attributes in closing tag:</p> +<pre><code class="language-example"></a href="foo"> +. +<p>&lt;/a href=&quot;foo&quot;&gt;</p> +</code></pre> +<p>Comments:</p> +<pre><code class="language-example">foo <!-- this is a +comment - with hyphen --> +. +<p>foo <!-- this is a +comment - with hyphen --></p> +</code></pre> +<pre><code class="language-example">foo <!-- not a comment -- two hyphens --> +. +<p>foo &lt;!-- not a comment -- two hyphens --&gt;</p> +</code></pre> +<p>Not comments:</p> +<pre><code class="language-example">foo <!--> foo --> + +foo <!-- foo---> +. +<p>foo &lt;!--&gt; foo --&gt;</p> +<p>foo &lt;!-- foo---&gt;</p> +</code></pre> +<p>Processing instructions:</p> +<pre><code class="language-example">foo <?php echo $a; ?> +. +<p>foo <?php echo $a; ?></p> +</code></pre> +<p>Declarations:</p> +<pre><code class="language-example">foo <!ELEMENT br EMPTY> +. +<p>foo <!ELEMENT br EMPTY></p> +</code></pre> +<p>CDATA sections:</p> +<pre><code class="language-example">foo <![CDATA[>&<]]> +. +<p>foo <![CDATA[>&<]]></p> +</code></pre> +<p>Entity and numeric character references are preserved in HTML +attributes:</p> +<pre><code class="language-example">foo <a href="&ouml;"> +. +<p>foo <a href="&ouml;"></p> +</code></pre> +<p>Backslash escapes do not work in HTML attributes:</p> +<pre><code class="language-example">foo <a href="\*"> +. +<p>foo <a href="\*"></p> +</code></pre> +<pre><code class="language-example"><a href="\""> +. 
+<p>&lt;a href=&quot;&quot;&quot;&gt;</p> +</code></pre> +<h2>Hard line breaks</h2> +<p>A line ending (not in a code span or HTML tag) that is preceded +by two or more spaces and does not occur at the end of a block +is parsed as a <a href="@">hard line break</a> (rendered +in HTML as a <code><br /></code> tag):</p> +<pre><code class="language-example">foo +baz +. +<p>foo<br /> +baz</p> +</code></pre> +<p>For a more visible alternative, a backslash before the +[line ending] may be used instead of two or more spaces:</p> +<pre><code class="language-example">foo\ +baz +. +<p>foo<br /> +baz</p> +</code></pre> +<p>More than two spaces can be used:</p> +<pre><code class="language-example">foo +baz +. +<p>foo<br /> +baz</p> +</code></pre> +<p>Leading spaces at the beginning of the next line are ignored:</p> +<pre><code class="language-example">foo + bar +. +<p>foo<br /> +bar</p> +</code></pre> +<pre><code class="language-example">foo\ + bar +. +<p>foo<br /> +bar</p> +</code></pre> +<p>Hard line breaks can occur inside emphasis, links, and other constructs +that allow inline content:</p> +<pre><code class="language-example">*foo +bar* +. +<p><em>foo<br /> +bar</em></p> +</code></pre> +<pre><code class="language-example">*foo\ +bar* +. +<p><em>foo<br /> +bar</em></p> +</code></pre> +<p>Hard line breaks do not occur inside code spans</p> +<pre><code class="language-example">`code +span` +. +<p><code>code span</code></p> +</code></pre> +<pre><code class="language-example">`code\ +span` +. +<p><code>code\ span</code></p> +</code></pre> +<p>or HTML tags:</p> +<pre><code class="language-example"><a href="foo +bar"> +. +<p><a href="foo +bar"></p> +</code></pre> +<pre><code class="language-example"><a href="foo\ +bar"> +. +<p><a href="foo\ +bar"></p> +</code></pre> +<p>Hard line breaks are for separating inline content within a block. +Neither syntax for hard line breaks works at the end of a paragraph or +other block element:</p> +<pre><code class="language-example">foo\ +. 
+<p>foo\</p> +</code></pre> +<pre><code class="language-example">foo +. +<p>foo</p> +</code></pre> +<pre><code class="language-example">### foo\ +. +<h3>foo\</h3> +</code></pre> +<pre><code class="language-example">### foo +. +<h3>foo</h3> +</code></pre> +<h2>Soft line breaks</h2> +<p>A regular line ending (not in a code span or HTML tag) that is not +preceded by two or more spaces or a backslash is parsed as a +<a href="@">softbreak</a>. (A soft line break may be rendered in HTML either as a +[line ending] or as a space. The result will be the same in +browsers. In the examples here, a [line ending] will be used.)</p> +<pre><code class="language-example">foo +baz +. +<p>foo +baz</p> +</code></pre> +<p>Spaces at the end of the line and beginning of the next line are +removed:</p> +<pre><code class="language-example">foo + baz +. +<p>foo +baz</p> +</code></pre> +<p>A conforming parser may render a soft line break in HTML either as a +line ending or as a space.</p> +<p>A renderer may also provide an option to render soft line breaks +as hard line breaks.</p> +<h2>Textual content</h2> +<p>Any characters not given an interpretation by the above rules will +be parsed as plain textual content.</p> +<pre><code class="language-example">hello $.;'there +. +<p>hello $.;'there</p> +</code></pre> +<pre><code class="language-example">Foo χρῆν +. +<p>Foo χρῆν</p> +</code></pre> +<p>Internal spaces are preserved verbatim:</p> +<pre><code class="language-example">Multiple spaces +. +<p>Multiple spaces</p> +</code></pre> +<!-- END TESTS --> +<h1>Appendix: A parsing strategy</h1> +<p>In this appendix we describe some features of the parsing strategy +used in the CommonMark reference implementations.</p> +<h2>Overview</h2> +<p>Parsing has two phases:</p> +<ol> +<li> +<p>In the first phase, lines of input are consumed and the block +structure of the document---its division into paragraphs, block quotes, +list items, and so on---is constructed. 
Text is assigned to these +blocks but not parsed. Link reference definitions are parsed and a +map of links is constructed.</p> +</li> +<li> +<p>In the second phase, the raw text contents of paragraphs and headings +are parsed into sequences of Markdown inline elements (strings, +code spans, links, emphasis, and so on), using the map of link +references constructed in phase 1.</p> +</li> +</ol> +<p>At each point in processing, the document is represented as a tree of +<strong>blocks</strong>. The root of the tree is a <code>document</code> block. The <code>document</code> +may have any number of other blocks as <strong>children</strong>. These children +may, in turn, have other blocks as children. The last child of a block +is normally considered <strong>open</strong>, meaning that subsequent lines of input +can alter its contents. (Blocks that are not open are <strong>closed</strong>.) +Here, for example, is a possible document tree, with the open blocks +marked by arrows:</p> +<pre><code class="language-tree">-> document + -> block_quote + paragraph + "Lorem ipsum dolor\nsit amet." + -> list (type=bullet tight=true bullet_char=-) + list_item + paragraph + "Qui *quodsi iracundia*" + -> list_item + -> paragraph + "aliquando id" +</code></pre> +<h2>Phase 1: block structure</h2> +<p>Each line that is processed has an effect on this tree. 
The line is +analyzed and, depending on its contents, the document may be altered +in one or more of the following ways:</p> +<ol> +<li>One or more open blocks may be closed.</li> +<li>One or more new blocks may be created as children of the +last open block.</li> +<li>Text may be added to the last (deepest) open block remaining +on the tree.</li> +</ol> +<p>Once a line has been incorporated into the tree in this way, +it can be discarded, so input can be read in a stream.</p> +<p>For each line, we follow this procedure:</p> +<ol> +<li> +<p>First we iterate through the open blocks, starting with the +root document, and descending through last children down to the last +open block. Each block imposes a condition that the line must satisfy +if the block is to remain open. For example, a block quote requires a +<code>></code> character. A paragraph requires a non-blank line. +In this phase we may match all or just some of the open +blocks. But we cannot close unmatched blocks yet, because we may have a +[lazy continuation line].</p> +</li> +<li> +<p>Next, after consuming the continuation markers for existing +blocks, we look for new block starts (e.g. <code>></code> for a block quote). +If we encounter a new block start, we close any blocks unmatched +in step 1 before creating the new block as a child of the last +matched container block.</p> +</li> +<li> +<p>Finally, we look at the remainder of the line (after block +markers like <code>></code>, list markers, and indentation have been consumed). +This is text that can be incorporated into the last open +block (a paragraph, code block, heading, or raw HTML).</p> +</li> +</ol> +<p>Setext headings are formed when we see a line of a paragraph +that is a [setext heading underline].</p> +<p>Reference link definitions are detected when a paragraph is closed; +the accumulated text lines are parsed to see if they begin with +one or more reference link definitions. 
Any remainder becomes a +normal paragraph.</p> +<p>We can see how this works by considering how the tree above is +generated by four lines of Markdown:</p> +<pre><code class="language-markdown">> Lorem ipsum dolor +sit amet. +> - Qui *quodsi iracundia* +> - aliquando id +</code></pre> +<p>At the outset, our document model is just</p> +<pre><code class="language-tree">-> document +</code></pre> +<p>The first line of our text,</p> +<pre><code class="language-markdown">> Lorem ipsum dolor +</code></pre> +<p>causes a <code>block_quote</code> block to be created as a child of our +open <code>document</code> block, and a <code>paragraph</code> block as a child of +the <code>block_quote</code>. Then the text is added to the last open +block, the <code>paragraph</code>:</p> +<pre><code class="language-tree">-> document + -> block_quote + -> paragraph + "Lorem ipsum dolor" +</code></pre> +<p>The next line,</p> +<pre><code class="language-markdown">sit amet. +</code></pre> +<p>is a "lazy continuation" of the open <code>paragraph</code>, so it gets added +to the paragraph's text:</p> +<pre><code class="language-tree">-> document + -> block_quote + -> paragraph + "Lorem ipsum dolor\nsit amet." +</code></pre> +<p>The third line,</p> +<pre><code class="language-markdown">> - Qui *quodsi iracundia* +</code></pre> +<p>causes the <code>paragraph</code> block to be closed, and a new <code>list</code> block +opened as a child of the <code>block_quote</code>. A <code>list_item</code> is also +added as a child of the <code>list</code>, and a <code>paragraph</code> as a child of +the <code>list_item</code>. The text is then added to the new <code>paragraph</code>:</p> +<pre><code class="language-tree">-> document + -> block_quote + paragraph + "Lorem ipsum dolor\nsit amet." 
+ -> list (type=bullet tight=true bullet_char=-) + -> list_item + -> paragraph + "Qui *quodsi iracundia*" +</code></pre> +<p>The fourth line,</p> +<pre><code class="language-markdown">> - aliquando id +</code></pre> +<p>causes the <code>list_item</code> (and its child the <code>paragraph</code>) to be closed, +and a new <code>list_item</code> opened up as child of the <code>list</code>. A <code>paragraph</code> +is added as a child of the new <code>list_item</code>, to contain the text. +We thus obtain the final tree:</p> +<pre><code class="language-tree">-> document + -> block_quote + paragraph + "Lorem ipsum dolor\nsit amet." + -> list (type=bullet tight=true bullet_char=-) + list_item + paragraph + "Qui *quodsi iracundia*" + -> list_item + -> paragraph + "aliquando id" +</code></pre> +<h2>Phase 2: inline structure</h2> +<p>Once all of the input has been parsed, all open blocks are closed.</p> +<p>We then "walk the tree," visiting every node, and parse raw +string contents of paragraphs and headings as inlines. At this +point we have seen all the link reference definitions, so we can +resolve reference links as we go.</p> +<pre><code class="language-tree">document + block_quote + paragraph + str "Lorem ipsum dolor" + softbreak + str "sit amet." + list (type=bullet tight=true bullet_char=-) + list_item + paragraph + str "Qui " + emph + str "quodsi iracundia" + list_item + paragraph + str "aliquando id" +</code></pre> +<p>Notice how the [line ending] in the first paragraph has +been parsed as a <code>softbreak</code>, and the asterisks in the first list item +have become an <code>emph</code>.</p> +<h3>An algorithm for parsing nested emphasis and links</h3> +<p>By far the trickiest part of inline parsing is handling emphasis, +strong emphasis, links, and images. 
This is done using the following +algorithm.</p> +<p>When we're parsing inlines and we hit either</p> +<ul> +<li>a run of <code>*</code> or <code>_</code> characters, or</li> +<li>a <code>[</code> or <code>![</code></li> +</ul> +<p>we insert a text node with these symbols as its literal content, and we +add a pointer to this text node to the <a href="@">delimiter stack</a>.</p> +<p>The [delimiter stack] is a doubly linked list. Each +element contains a pointer to a text node, plus information about</p> +<ul> +<li>the type of delimiter (<code>[</code>, <code>![</code>, <code>*</code>, <code>_</code>),</li> +<li>the number of delimiters,</li> +<li>whether the delimiter is "active" (all are active to start), and</li> +<li>whether the delimiter is a potential opener, a potential closer, +or both (which depends on what sort of characters precede +and follow the delimiters).</li> +</ul> +<p>When we hit a <code>]</code> character, we call the <em>look for link or image</em> +procedure (see below).</p> +<p>When we hit the end of the input, we call the <em>process emphasis</em> +procedure (see below), with <code>stack_bottom</code> = NULL.</p> +<h4><em>look for link or image</em></h4> +<p>Starting at the top of the delimiter stack, we look backwards +through the stack for an opening <code>[</code> or <code>![</code> delimiter.</p> +<ul> +<li> +<p>If we don't find one, we return a literal text node <code>]</code>.</p> +</li> +<li> +<p>If we do find one, but it's not <em>active</em>, we remove the inactive +delimiter from the stack, and return a literal text node <code>]</code>.</p> +</li> +<li> +<p>If we find one and it's active, then we parse ahead to see if +we have an inline link/image, reference link/image, collapsed reference +link/image, or shortcut reference link/image.</p> +<ul> +<li> +<p>If we don't, then we remove the opening delimiter from the +delimiter stack and return a literal text node <code>]</code>.</p> +</li> +<li> +<p>If we do, then</p> +<ul> +<li> +<p>We 
return a link or image node whose children are the inlines +after the text node pointed to by the opening delimiter.</p> +</li> +<li> +<p>We run <em>process emphasis</em> on these inlines, with the <code>[</code> opener +as <code>stack_bottom</code>.</p> +</li> +<li> +<p>We remove the opening delimiter.</p> +</li> +<li> +<p>If we have a link (and not an image), we also set all +<code>[</code> delimiters before the opening delimiter to <em>inactive</em>. (This +will prevent us from getting links within links.)</p> +</li> +</ul> +</li> +</ul> +</li> +</ul> +<h4><em>process emphasis</em></h4> +<p>Parameter <code>stack_bottom</code> sets a lower bound to how far we +descend in the [delimiter stack]. If it is NULL, we can +go all the way to the bottom. Otherwise, we stop before +visiting <code>stack_bottom</code>.</p> +<p>Let <code>current_position</code> point to the element on the [delimiter stack] +just above <code>stack_bottom</code> (or the first element if <code>stack_bottom</code> +is NULL).</p> +<p>We keep track of the <code>openers_bottom</code> for each delimiter +type (<code>*</code>, <code>_</code>), indexed to the length of the closing delimiter run +(modulo 3) and to whether the closing delimiter can also be an +opener. Initialize this to <code>stack_bottom</code>.</p> +<p>Then we repeat the following until we run out of potential +closers:</p> +<ul> +<li> +<p>Move <code>current_position</code> forward in the delimiter stack (if needed) +until we find the first potential closer with delimiter <code>*</code> or <code>_</code>. 
+(This will be the potential closer closest +to the beginning of the input -- the first one in parse order.)</p> +</li> +<li> +<p>Now, look back in the stack (staying above <code>stack_bottom</code> and +the <code>openers_bottom</code> for this delimiter type) for the +first matching potential opener ("matching" means same delimiter).</p> +</li> +<li> +<p>If one is found:</p> +<ul> +<li> +<p>Figure out whether we have emphasis or strong emphasis: +if both closer and opener spans have length >= 2, we have +strong, otherwise regular.</p> +</li> +<li> +<p>Insert an emph or strong emph node accordingly, after +the text node corresponding to the opener.</p> +</li> +<li> +<p>Remove any delimiters between the opener and closer from +the delimiter stack.</p> +</li> +<li> +<p>Remove 1 (for regular emph) or 2 (for strong emph) delimiters +from the opening and closing text nodes. If they become empty +as a result, remove them and remove the corresponding element +of the delimiter stack. If the closing node is removed, reset +<code>current_position</code> to the next element in the stack.</p> +</li> +</ul> +</li> +<li> +<p>If none is found:</p> +<ul> +<li> +<p>Set <code>openers_bottom</code> to the element before <code>current_position</code>. +(We know that there are no openers for this kind of closer up to and +including this point, so this puts a lower bound on future searches.)</p> +</li> +<li> +<p>If the closer at <code>current_position</code> is not a potential opener, +remove it from the delimiter stack (since we know it can't +be a closer either).</p> +</li> +<li> +<p>Advance <code>current_position</code> to the next element in the stack.</p> +</li> +</ul> +</li> +</ul> +<p>After we're done, we remove all delimiters above <code>stack_bottom</code> from the +delimiter stack.</p>
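<p>The <em>process emphasis</em> steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the reference implementation. The names <code>Node</code>, <code>Delim</code>, <code>tokenize</code>, <code>process_emphasis</code>, <code>render</code>, and <code>to_html</code> are hypothetical. The delimiter stack is a plain list rather than a doubly linked list, and <code>stack_bottom</code> is an index, with <code>-1</code> standing in for NULL. Opener and closer status is decided by adjacent whitespace only, which ignores the punctuation and intraword rules. "Matching" means same delimiter character, as in the text above. Links and images are not handled.</p>

```python
import re
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: instances are compared by identity
class Node:
    kind: str                      # "text", "emph" or "strong"
    literal: str = ""
    children: list = field(default_factory=list)


@dataclass(eq=False)
class Delim:
    node: Node                     # text node holding the delimiter run
    char: str                      # "*" or "_"
    can_open: bool
    can_close: bool


def tokenize(text):
    """Build text nodes and push every delimiter run onto the stack.

    Assumption (simplified flanking): a run can open if it is not followed
    by whitespace, and can close if it is not preceded by whitespace.
    """
    nodes, stack = [], []
    for m in re.finditer(r"\*+|_+|[^*_]+", text):
        node = Node("text", m.group())
        nodes.append(node)
        if node.literal[0] in "*_":
            before = text[m.start() - 1] if m.start() > 0 else " "
            after = text[m.end()] if m.end() < len(text) else " "
            stack.append(Delim(node, node.literal[0],
                               can_open=not after.isspace(),
                               can_close=not before.isspace()))
    return nodes, stack


def process_emphasis(nodes, stack, stack_bottom=-1):
    openers_bottom = {}            # key -> Delim that bounds the search
    pos = stack_bottom + 1         # current_position
    while pos < len(stack):
        closer = stack[pos]
        if not closer.can_close:
            pos += 1
            continue
        key = (closer.char, closer.can_open, len(closer.node.literal) % 3)
        floor = openers_bottom.get(key)   # None: search down to stack_bottom
        i = pos - 1
        while i > stack_bottom and stack[i] is not floor:
            if stack[i].char == closer.char and stack[i].can_open:
                break
            i -= 1
        else:
            i = None               # no matching opener was found
        if i is None:
            openers_bottom[key] = stack[pos - 1] if pos - 1 > stack_bottom else None
            if closer.can_open:
                pos += 1
            else:
                del stack[pos]     # the next element slides into pos
            continue
        opener = stack[i]
        n = 2 if min(len(opener.node.literal), len(closer.node.literal)) >= 2 else 1
        a, b = nodes.index(opener.node), nodes.index(closer.node)
        kind = "strong" if n == 2 else "emph"
        nodes[a + 1:b] = [Node(kind, children=nodes[a + 1:b])]
        del stack[i + 1:pos]       # drop delimiters between opener and closer
        pos = i + 1                # the closer now sits right after the opener
        opener.node.literal = opener.node.literal[n:]
        closer.node.literal = closer.node.literal[n:]
        if not opener.node.literal:
            nodes.remove(opener.node)
            del stack[i]
            pos -= 1
        if not closer.node.literal:
            nodes.remove(closer.node)
            del stack[pos]         # pos now points at the next element
    del stack[stack_bottom + 1:]   # remove all delimiters above stack_bottom


def render(nodes):
    out = []
    for node in nodes:
        if node.kind == "text":
            out.append(node.literal)
        else:
            tag = "em" if node.kind == "emph" else "strong"
            out.append(f"<{tag}>{render(node.children)}</{tag}>")
    return "".join(out)


def to_html(text):
    nodes, stack = tokenize(text)
    process_emphasis(nodes, stack)
    return render(nodes)
```

<p>In this sketch, <code>to_html("**foo*")</code> leaves one literal asterisk in the opening text node and wraps <code>foo</code> in <code>em</code>, because only one delimiter is consumed from each side when the closer run has length 1. Storing <code>openers_bottom</code> entries as object references rather than indices keeps them valid when elements are deleted from the list.</p>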