1 files changed, 167 insertions, 0 deletions
diff --git a/third-party/utf8cpp/test_data/utf8samples/Unicode_transcriptions.html b/third-party/utf8cpp/test_data/utf8samples/Unicode_transcriptions.html
new file mode 100644
index 0000000..69b29ff
--- /dev/null
+++ b/third-party/utf8cpp/test_data/utf8samples/Unicode_transcriptions.html
@@ -0,0 +1,167 @@
+? 	*Unicode Transcriptions* 	Notes <#Notes>
+
+Glyphs <http://www.macchiato.com/unicode/show.html> | Samples
+<http://www.macchiato.com/unicode/Unicode_transcriptions.html> | Charts
+<http://www.macchiato.com/unicode/charts.html> | UTF
+<http://www.macchiato.com/unicode/convert.html> | Forms
+<http://www-4.ibm.com/software/developer/library/utfencodingforms/> |
+Home <http://www.macchiato.com>.
+<http://member.linkexchange.com/cgi-bin/fc/fastcounter-login?750641>
+
+Name 	Text 	Image
+Arabic (Arabic) 	يونِكود 	?
+Arabic (Persian) 	یونی‌کُد 	/ ?/
+Armenian 	Յունիկօդ 	
+Bengali 	য়ূনিকোড 	
+Bopomofo 	ㄊㄨㄥ˅ ㄧˋ ㄇㄚ˅ 	
+ㄨㄢˋ ㄍㄨㄛˊ ㄇㄚ˅ 	
+Braille 	  	 
+Buhid 	  	 
+Canadian Aboriginal 	ᔫᗂᑰᑦ 	
+Cherokee 	ᏳᏂᎪᏛ 	
+Cypriot 	  	 
+Cyrillic (Russian) 	Юникод 	?
+Deseret (English) 	??????? 	
+Devanagari (Hindi) 	यूनिकोड 	?
+Ethiopic 	ዩኒኮድ 	
+Georgian 	უნიკოდი 	?
+Gothic 	  	 
+Greek 	Γιούνικοντ 	
+Gujarati 	યૂનિકોડ 	
+Gurmukhi 	ਯੂਨਿਕੋਡ 	
+Han (Chinese) 	统一码 	?
+統一碼 	?
+万国码 	?
+萬國碼 	?
+Hangul 	유니코드 	
+Hanunoo 	  	 
+Hebrew 	יוניקוד 	
+Hebrew (pointed) 	יוּנִיקוׁד 	
+Hebrew (Yiddish) 	יוניקאָד 	?
+Hiragana (Japanese) 	ゆにこおど 	 
+Katakana (Japanese) 	ユニコード 	?
+Kannada 	ಯೂನಿಕೋಡ್ 	
+Khmer 	យូនីគោដ 	
+Lao 	  	 
+Latin 	Unicode 	Unicode
+Latin (IPA <#English_Pronunciation>) 	ˈjunɪˌkoːd 	?
+Latin (Am. Dict. <#American_Dictionary>) 	Ūnĭcōde̽ 	?
+Limbu 	  	 
+Linear B 	  	 
+Malayalam 	യൂനികോഡ് 	
+Mongolian 	  	
+Myanmar 	  	
+Ogham 	ᚔᚒᚅᚔᚉᚑᚇ 	/ /
+Old Italic 	  	 
+Oriya 	ୟୂନିକୋଡ 	
+Osmanya 	  	 
+Runic (Anglo-Saxon) 	ᛡᚢᚾᛁᚳᚩᛞ 	
+Shavian 	  	 
+Sinhala 	යණනිකෞද් 	
+Syriac 	ܝܘܢܝܩܘܕ 	
+Tagbanwa 	  	 
+Tagalog 	  	 
+Tai Le 	  	 
+Tamil 	யூனிகோட் 	
+Telugu 	యూనికోడ్ 	
+Thaana 	  	
+Thai 	ยูนืโคด 	
+Tibetan (Dzongkha) 	ཨུ་ནི་ཀོཌྲ། 	
+Ugaritic 	  	 
+Yi 	  	
+
+
+      Notes:
+
+There are different ways to transcribe the word “Unicode”, depending on
+the language and script. In some cases only one language customarily
+uses a given script; in others there are many. The goal here is to
+collect at least one transcription for each script, in a language
+customarily written in that script, and more languages where possible.
+If the transcription is the same for multiple languages in a script,
+then a single representative language is used.
+
+Still missing are transcriptions for the items above in RED (in at least
+one language). I would appreciate any other transcriptions, or
+corrections for the ones listed here. Send to mark3@macchiato.com
+<mailto:mark3@macchiato.com>, using the directions below:
+
+    * *Supplying Missing Items*
+          o Most Latin-script languages will follow the spelling, and
+            change the pronunciation. For any that would not, it would
+            be good to have the alternate spelling.
+          o For non-Latin scripts the goal is to match the English
+            pronunciation, /*not*/ the spelling. The IPA <#IPA> above
+            (in phonemic transcription) should be matched as closely as
+            possible, without sounding affected in the target
+            language.
+          o Text is best supplied either as UTF-8 text or as code points in
+            hex HTML character references. E.g. either of the following:
+                + "Юникод"
+                + "&#x042E;&#x043D;&#x0438;&#x043A;&#x043E;&#x0434;"
+                + Note: for / supplementary characters/
+                  <http://www.unicode.org/glossary/#supplementary_character>,
+                  there should be one hex number per code point, not two
+                  surrogates
+                  <http://www.unicode.org/glossary/#surrogate_code_point>:
+                      # &#x10000; /*not*/ &#xD800;&#xDC00;
+          o If you have a good font, I'd also appreciate a GIF. It
+            should be *96 x 24* pixels, with the text centered, in black
+            on white (plus grays if smoothed).
+    * *Other Comments*
+          o Because some browsers won't handle the text, both text and
+            GIF image are supplied. If you can’t read the text columns,
+            see Display Problems
+            <http://www.unicode.org/help/display_problems.html>.
+          o The Chinese versions (inc. Bopomofo) are translations, not
+            transcriptions, since "transcription in Chinese is pretty
+            lame" [J. Becker].
+          o There are other "translations" of Unicode that may be in
+            use, such as the Vietnamese "Thống Nhất Mã".
+          o For sample pages in different languages on the Unicode site,
+            see What is Unicode?
+            <http://www.unicode.org/unicode/standard/WhatIsUnicode.html>
+          o Americans are not generally used to IPA, and find a variety
+            of different systems in their dictionaries. The Am. Dict. row
+            above leaves the base letters as they are, and uses diacritics
+            to mark pronunciation.
+    * *Etymology of /Unicode/*
+          o Coined by J. Becker. Not related to previous usages, such as:
+                + A telegraphic code in which one word or set of letters
+                  represents a sentence or phrase; a telegram or message
+                  in this. (late 19th century, OED)
+          o According to my references, the prefix "uni" is directly
+            from Latin while the word "code" is through French.
+          o The original Indo-European apparently would have been
+            *oino-kau-do ("one strike give"): *kau apparently being
+            related to such English words as: hew, haggle, hoe, hag,
+            hay, hack, caudad, caudal, caudate, caudex, coda, codex,
+            codicil, coward, incus, and Kovač (personal name: "smith").
+                + I will leave the exact derivations to the exegetes,
+                  but I like the association with "haggle" myself.
+    * *Contributions*
+          o This draws on contributions or comments from:
+                + Dixon Au
+                + Joe Becker
+                + Maurice Bauhahn
+                + Abel Cheung
+                + Peter Constable
+                + Michael Everson
+                + Christopher John Fynn
+                + Michael Kaplan
+                + George Kiraz
+                + Abdul Malik
+                + Siva Nataraja
+                + Roozbeh Pournader
+                + Jonathan Rosenne
+                + Jungshik Shin
+
+------------------------------------------------------------------------
+	
+
+Terms of Use <http://www.macchiato.com/terms_of_use.html>. Last updated:
+MED - 04/20/2003 15:30:33.
+<http://member.linkexchange.com/cgi-bin/fc/fastcounter-login?750641>
+
+ 
+
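
Note on the "one hex number per code point, not two surrogates" rule in the
added file: the following is a minimal Python sketch of how such hex HTML
references could be produced. It is not part of the patch or of utf8cpp; the
function names to_hex_ncr and utf16_surrogates are hypothetical and chosen
only for illustration. It relies on Python 3 strings iterating by code point,
so a supplementary character yields a single reference.

```python
def to_hex_ncr(text):
    """Return text as hex HTML numeric character references, one per code point."""
    # Python 3 str iterates by code point, so U+10000 stays a single unit.
    return "".join(f"&#x{ord(ch):04X};" for ch in text)


def utf16_surrogates(code_point):
    """Split a supplementary code point into its UTF-16 surrogate pair.

    Shown only for contrast: these two values are what should NOT be
    written as separate character references.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print(to_hex_ncr("Юникод"))      # &#x042E;&#x043D;&#x0438;&#x043A;&#x043E;&#x0434;
print(to_hex_ncr("\U00010000"))  # &#x10000;
```

The first call reproduces the Cyrillic example given in the file, and the
second yields the single reference the note asks for, whereas
utf16_surrogates(0x10000) returns the pair (0xD800, 0xDC00) that the note
says to avoid.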