summaryrefslogtreecommitdiffstats
path: root/extensions/spellcheck/hunspell/src/README
diff options
context:
space:
mode:
Diffstat (limited to 'extensions/spellcheck/hunspell/src/README')
-rw-r--r--extensions/spellcheck/hunspell/src/README315
1 files changed, 315 insertions, 0 deletions
diff --git a/extensions/spellcheck/hunspell/src/README b/extensions/spellcheck/hunspell/src/README
new file mode 100644
index 0000000000..27240e7880
--- /dev/null
+++ b/extensions/spellcheck/hunspell/src/README
@@ -0,0 +1,315 @@
+# About Hunspell
+
+Hunspell is a free spell checker and morphological analyzer library
+and command-line tool, licensed under LGPL/GPL/MPL tri-license.
+
+Hunspell is used by LibreOffice office suite, free browsers, like
+Mozilla Firefox and Google Chrome, and other tools and OSes, like
+Linux distributions and macOS. It is also a command-line tool for
+Linux, Unix-like and other OSes.
+
+It is designed for quick and high quality spell checking and
+correcting for languages with word-level writing system,
+including languages with rich morphology, complex word compounding
+and character encoding.
+
+Hunspell interfaces: Ispell-like terminal interface using Curses
+library, Ispell pipe interface, C++/C APIs and shared library, also
+with existing language bindings for other programming languages.
+
+Hunspell's code base comes from OpenOffice.org's MySpell library,
+developed by Kevin Hendricks (originally a C++ reimplementation of
+spell checking and affixation of Geoff Kuenning's International
+Ispell from scratch, later extended with eg. n-gram suggestions),
+see http://lingucomponent.openoffice.org/MySpell-3.zip, and
+its README, CONTRIBUTORS and license.readme (here: license.myspell) files.
+
+Main features of Hunspell library, developed by László Németh:
+
+ - Unicode support
+ - Highly customizable suggestions: word-part replacement tables and
+ stem-level phonetic and other alternative transcriptions to recognize
+ and fix all typical misspellings, don't suggest offensive words etc.
+ - Complex morphology: dictionary and affix homonyms; twofold affix
+ stripping to handle inflectional and derivational morpheme groups for
+ agglutinative languages, like Azeri, Basque, Estonian, Finnish, Hungarian,
+ Turkish; 64 thousand affix classes with arbitrary number of affixes;
+ conditional affixes, circumfixes, fogemorphemes, zero morphemes,
+ virtual dictionary stems, forbidden words to avoid overgeneration etc.
+ - Handling complex compounds (for example, for Finno-Ugric, German and
+ Indo-Aryan languages): recognizing compounds made of arbitrary
+ number of words, handle affixation within compounds etc.
+ - Custom dictionaries with affixation
+ - Stemming
+ - Morphological analysis (in custom item and arrangement style)
+ - Morphological generation
+ - SPELLML XML API over plain spell() API function for easier integration
+ of stemming, morpological generation and custom dictionaries with affixation
+ - Language specific algorithms, like special casing of Azeri or Turkish
+ dotted i and German sharp s, and special compound rules of Hungarian.
+
+Main features of Hunspell command line tool, developed by László Németh:
+
+ - Reimplementation of quick interactive interface of Geoff Kuenning's Ispell
+ - Parsing formats: text, OpenDocument, TeX/LaTeX, HTML/SGML/XML, nroff/troff
+ - Custom dictionaries with optional affixation, specified by a model word
+ - Multiple dictionary usage (for example hunspell -d en_US,de_DE,de_medical)
+ - Various filtering options (bad or good words/lines)
+ - Morphological analysis (option -m)
+ - Stemming (option -s)
+
+See man hunspell, man 3 hunspell, man 5 hunspell for complete manual.
+
+# Dependencies
+
+Build only dependencies:
+
+ g++ make autoconf automake autopoint libtool
+
+Runtime dependencies:
+
+| | Mandatory | Optional |
+|---------------|------------------|------------------|
+|libhunspell | | |
+|hunspell tool | libiconv gettext | ncurses readline |
+
+# Compiling on GNU/Linux and Unixes
+
+We first need to download the dependencies. On Linux, `gettext` and
+`libiconv` are part of the standard library. On other Unixes we
+need to manually install them.
+
+For Ubuntu:
+
+ sudo apt install autoconf automake autopoint libtool
+
+Then run the following commands:
+
+ autoreconf -vfi
+ ./configure
+ make
+ sudo make install
+ sudo ldconfig
+
+For dictionary development, use the `--with-warnings` option of
+configure.
+
+For interactive user interface of Hunspell executable, use the
+`--with-ui option`.
+
+Optional developer packages:
+
+ - ncurses (need for --with-ui), eg. libncursesw5 for UTF-8
+ - readline (for fancy input line editing, configure parameter:
+ --with-readline)
+
+In Ubuntu, the packages are:
+
+ libncurses5-dev libreadline-dev
+
+# Compiling on OSX and macOS
+
+On macOS for compiler always use `clang` and not `g++` because Homebrew
+dependencies are build with that.
+
+ brew install autoconf automake libtool gettext
+ brew link gettext --force
+
+Then run autoreconf, configure, make. See above.
+
+# Compiling on Windows
+
+## Compiling with Mingw64 and MSYS2
+
+Download Msys2, update everything and install the following
+ packages:
+
+ pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-libtool
+
+Open Mingw-w64 Win64 prompt and compile the same way as on Linux, see
+above.
+
+## Compiling in Cygwin environment
+
+Download and install Cygwin environment for Windows with the following
+extra packages:
+
+ - make
+ - automake
+ - autoconf
+ - libtool
+ - gcc-g++ development package
+ - ncurses, readline (for user interface)
+ - iconv (character conversion)
+
+Then compile the same way as on Linux. Cygwin builds depend on
+Cygwin1.dll.
+
+# Debugging
+
+It is recommended to install a debug build of the standard library:
+
+ libstdc++6-6-dbg
+
+For debugging we need to create a debug build and then we need to start
+`gdb`.
+
+ ./configure CXXFLAGS='-g -O0 -Wall -Wextra'
+ make
+ ./libtool --mode=execute gdb src/tools/hunspell
+
+You can also pass the `CXXFLAGS` directly to `make` without calling
+`./configure`, but we don't recommend this way during long development
+sessions.
+
+If you like to develop and debug with an IDE, see documentation at
+https://github.com/hunspell/hunspell/wiki/IDE-Setup
+
+# Testing
+
+Testing Hunspell (see tests in tests/ subdirectory):
+
+ make check
+
+or with Valgrind debugger:
+
+ make check
+ VALGRIND=[Valgrind_tool] make check
+
+For example:
+
+ make check
+ VALGRIND=memcheck make check
+
+# Documentation
+
+features and dictionary format:
+
+ man 5 hunspell
+ man hunspell
+ hunspell -h
+
+http://hunspell.github.io/
+
+# Usage
+
+After compiling and installing (see INSTALL) you can run the Hunspell
+spell checker (compiled with user interface) with a Hunspell or Myspell
+dictionary:
+
+ hunspell -d en_US text.txt
+
+or without interface:
+
+ hunspell
+ hunspell -d en_GB -l <text.txt
+
+Dictionaries consist of an affix (.aff) and dictionary (.dic) file, for
+example, download American English dictionary files of LibreOffice
+(older version, but with stemming and morphological generation) with
+
+ wget -O en_US.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.aff?id=a4473e06b56bfe35187e302754f6baaa8d75e54f
+ wget -O en_US.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.dic?id=a4473e06b56bfe35187e302754f6baaa8d75e54f
+
+and with command line input and output, it's possible to check its work quickly,
+for example with the input words "example", "examples", "teached" and
+"verybaaaaaaaaaaaaaaaaaaaaaad":
+
+ $ hunspell -d en_US
+ Hunspell 1.7.0
+ example
+ *
+
+ examples
+ + example
+
+ teached
+ & teached 9 0: taught, teased, reached, teaches, teacher, leached, beached
+
+ verybaaaaaaaaaaaaaaaaaaaaaad
+ # verybaaaaaaaaaaaaaaaaaaaaaad 0
+
+Where in the output, `*` and `+` mean correct (accepted) words (`*` = dictionary stem,
+`+` = affixed forms of the following dictionary stem), and
+`&` and `#` mean bad (rejected) words (`&` = with suggestions, `#` = without suggestions)
+(see man hunspell).
+
+Example for stemming:
+
+ $ hunspell -d en_US -s
+ mice
+ mice mouse
+
+Example for morphological analysis (very limited with this English dictionary):
+
+ $ hunspell -d en_US -m
+ mice
+ mice st:mouse ts:Ns
+
+ cats
+ cats st:cat ts:0 is:Ns
+ cats st:cat ts:0 is:Vs
+
+# Other executables
+
+The src/tools directory contains the following executables after compiling.
+
+ - The main executable:
+ - hunspell: main program for spell checking and others (see
+ manual)
+ - Example tools:
+ - analyze: example of spell checking, stemming and morphological
+ analysis
+ - chmorph: example of automatic morphological generation and
+ conversion
+ - example: example of spell checking and suggestion
+ - Tools for dictionary development:
+ - affixcompress: dictionary generation from large (millions of
+ words) vocabularies
+ - makealias: alias compression (Hunspell only, not back compatible
+ with MySpell)
+ - wordforms: word generation (Hunspell version of unmunch)
+ - hunzip: decompressor of hzip format
+ - hzip: compressor of hzip format
+ - munch (DEPRECATED, use affixcompress): dictionary generation
+ from vocabularies (it needs an affix file, too).
+ - unmunch (DEPRECATED, use wordforms): list all recognized words
+ of a MySpell dictionary
+
+Example for morphological generation:
+
+ $ ~/hunspell/src/tools/analyze en_US.aff en_US.dic /dev/stdin
+ cat mice
+ generate(cat, mice) = cats
+ mouse cats
+ generate(mouse, cats) = mice
+ generate(mouse, cats) = mouses
+
+# Using Hunspell library with GCC
+
+Including in your program:
+
+ #include <hunspell.hxx>
+
+Linking with Hunspell static library:
+
+ g++ -lhunspell-1.7 example.cxx
+ # or better, use pkg-config
+ g++ $(pkg-config --cflags --libs hunspell) example.cxx
+
+## Dictionaries
+
+Hunspell (MySpell) dictionaries:
+
+ - https://wiki.documentfoundation.org/Language_support_of_LibreOffice
+ - http://cgit.freedesktop.org/libreoffice/dictionaries
+ - http://extensions.libreoffice.org
+ - http://extensions.openoffice.org
+ - http://wiki.services.openoffice.org/wiki/Dictionaries
+
+Aspell dictionaries (conversion: man 5 hunspell):
+
+ - ftp://ftp.gnu.org/gnu/aspell/dict
+
+László Németh, nemeth at numbertext org
+