summaryrefslogtreecommitdiffstats
path: root/intl/icu_capi/cpp/docs/source/segmenter_word_ffi.rst
diff options
context:
space:
mode:
Diffstat (limited to 'intl/icu_capi/cpp/docs/source/segmenter_word_ffi.rst')
-rw-r--r--intl/icu_capi/cpp/docs/source/segmenter_word_ffi.rst150
1 files changed, 150 insertions, 0 deletions
diff --git a/intl/icu_capi/cpp/docs/source/segmenter_word_ffi.rst b/intl/icu_capi/cpp/docs/source/segmenter_word_ffi.rst
new file mode 100644
index 0000000000..fc9cbe1494
--- /dev/null
+++ b/intl/icu_capi/cpp/docs/source/segmenter_word_ffi.rst
@@ -0,0 +1,150 @@
+``segmenter_word::ffi``
+=======================
+
+.. cpp:enum-struct:: ICU4XSegmenterWordType
+
+ See the `Rust documentation for WordType <https://docs.rs/icu/latest/icu/segmenter/enum.WordType.html>`__ for more information.
+
+
+ .. cpp:enumerator:: None
+
+ .. cpp:enumerator:: Number
+
+ .. cpp:enumerator:: Letter
+
+.. cpp:class:: ICU4XWordBreakIteratorLatin1
+
+ See the `Rust documentation for WordBreakIterator <https://docs.rs/icu/latest/icu/segmenter/struct.WordBreakIterator.html>`__ for more information.
+
+
+ .. cpp:function:: int32_t next()
+
+ Finds the next breakpoint. Returns -1 if at the end of the string or if the index is out of range of a 32-bit signed integer.
+
+ See the `Rust documentation for next <https://docs.rs/icu/latest/icu/segmenter/struct.WordBreakIterator.html#method.next>`__ for more information.
+
+
+ .. cpp:function:: ICU4XSegmenterWordType word_type() const
+
+ Return the status value of break boundary.
+
+ See the `Rust documentation for word_type <https://docs.rs/icu/latest/icu/segmenter/struct.WordBreakIterator.html#method.word_type>`__ for more information.
+
+
+ .. cpp:function:: bool is_word_like() const
+
+ Return true when break boundary is word-like such as letter/number/CJK
+
+ See the `Rust documentation for is_word_like <https://docs.rs/icu/latest/icu/segmenter/struct.WordBreakIterator.html#method.is_word_like>`__ for more information.
+
+
+.. cpp:class:: ICU4XWordBreakIteratorUtf16
+
+ See the `Rust documentation for WordBreakIterator <https://docs.rs/icu/latest/icu/segmenter/struct.WordBreakIterator.html>`__ for more information.
+
+
+ .. cpp:function:: int32_t next()
+
+ Finds the next breakpoint. Returns -1 if at the end of the string or if the index is out of range of a 32-bit signed integer.
+
+ See the `Rust documentation for next <https://docs.rs/icu/latest/icu/segmenter/struct.WordBreakIterator.html#method.next>`__ for more information.
+
+
+ .. cpp:function:: ICU4XSegmenterWordType word_type() const
+
+ Return the status value of break boundary.
+
+ See the `Rust documentation for word_type <https://docs.rs/icu/latest/icu/segmenter/struct.WordBreakIterator.html#method.word_type>`__ for more information.
+
+
+ .. cpp:function:: bool is_word_like() const
+
+ Return true when break boundary is word-like such as letter/number/CJK
+
+ See the `Rust documentation for is_word_like <https://docs.rs/icu/latest/icu/segmenter/struct.WordBreakIterator.html#method.is_word_like>`__ for more information.
+
+
+.. cpp:class:: ICU4XWordBreakIteratorUtf8
+
+ See the `Rust documentation for WordBreakIterator <https://docs.rs/icu/latest/icu/segmenter/struct.WordBreakIterator.html>`__ for more information.
+
+
+ .. cpp:function:: int32_t next()
+
+ Finds the next breakpoint. Returns -1 if at the end of the string or if the index is out of range of a 32-bit signed integer.
+
+ See the `Rust documentation for next <https://docs.rs/icu/latest/icu/segmenter/struct.WordBreakIterator.html#method.next>`__ for more information.
+
+
+ .. cpp:function:: ICU4XSegmenterWordType word_type() const
+
+ Return the status value of break boundary.
+
+ See the `Rust documentation for word_type <https://docs.rs/icu/latest/icu/segmenter/struct.WordBreakIterator.html#method.word_type>`__ for more information.
+
+
+ .. cpp:function:: bool is_word_like() const
+
+ Return true when break boundary is word-like such as letter/number/CJK
+
+ See the `Rust documentation for is_word_like <https://docs.rs/icu/latest/icu/segmenter/struct.WordBreakIterator.html#method.is_word_like>`__ for more information.
+
+
+.. cpp:class:: ICU4XWordSegmenter
+
+ An ICU4X word-break segmenter, capable of finding word breakpoints in strings.
+
+ See the `Rust documentation for WordSegmenter <https://docs.rs/icu/latest/icu/segmenter/struct.WordSegmenter.html>`__ for more information.
+
+
+ .. cpp:function:: static diplomat::result<ICU4XWordSegmenter, ICU4XError> create_auto(const ICU4XDataProvider& provider)
+
+ Construct an :cpp:class:`ICU4XWordSegmenter` with automatically selecting the best available LSTM or dictionary payload data.
+
+ Note: currently, it uses dictionary for Chinese and Japanese, and LSTM for Burmese, Khmer, Lao, and Thai.
+
+ See the `Rust documentation for new_auto <https://docs.rs/icu/latest/icu/segmenter/struct.WordSegmenter.html#method.new_auto>`__ for more information.
+
+
+ .. cpp:function:: static diplomat::result<ICU4XWordSegmenter, ICU4XError> create_lstm(const ICU4XDataProvider& provider)
+
+ Construct an :cpp:class:`ICU4XWordSegmenter` with LSTM payload data for Burmese, Khmer, Lao, and Thai.
+
+ Warning: :cpp:class:`ICU4XWordSegmenter` created by this function doesn't handle Chinese or Japanese.
+
+ See the `Rust documentation for new_lstm <https://docs.rs/icu/latest/icu/segmenter/struct.WordSegmenter.html#method.new_lstm>`__ for more information.
+
+
+ .. cpp:function:: static diplomat::result<ICU4XWordSegmenter, ICU4XError> create_dictionary(const ICU4XDataProvider& provider)
+
+ Construct an :cpp:class:`ICU4XWordSegmenter` with dictionary payload data for Chinese, Japanese, Burmese, Khmer, Lao, and Thai.
+
+ See the `Rust documentation for new_dictionary <https://docs.rs/icu/latest/icu/segmenter/struct.WordSegmenter.html#method.new_dictionary>`__ for more information.
+
+
+ .. cpp:function:: ICU4XWordBreakIteratorUtf8 segment_utf8(const std::string_view input) const
+
+ Segments a (potentially ill-formed) UTF-8 string.
+
+ See the `Rust documentation for segment_utf8 <https://docs.rs/icu/latest/icu/segmenter/struct.WordSegmenter.html#method.segment_utf8>`__ for more information.
+
+ Lifetimes: ``this``, ``input`` must live at least as long as the output.
+
+
+ .. cpp:function:: ICU4XWordBreakIteratorUtf16 segment_utf16(const diplomat::span<const uint16_t> input) const
+
+ Segments a UTF-16 string.
+
+ See the `Rust documentation for segment_utf16 <https://docs.rs/icu/latest/icu/segmenter/struct.WordSegmenter.html#method.segment_utf16>`__ for more information.
+
+ Lifetimes: ``this``, ``input`` must live at least as long as the output.
+
+
+ .. cpp:function:: ICU4XWordBreakIteratorLatin1 segment_latin1(const diplomat::span<const uint8_t> input) const
+
+ Segments a Latin-1 string.
+
+ See the `Rust documentation for segment_latin1 <https://docs.rs/icu/latest/icu/segmenter/struct.WordSegmenter.html#method.segment_latin1>`__ for more information.
+
+ Lifetimes: ``this``, ``input`` must live at least as long as the output.
+