Diffstat (limited to 'rust/vendor/widestring/src')
-rw-r--r-- | rust/vendor/widestring/src/lib.rs | 238 | ||||
-rw-r--r-- | rust/vendor/widestring/src/platform/mod.rs | 9 | ||||
-rw-r--r-- | rust/vendor/widestring/src/platform/other.rs | 9 | ||||
-rw-r--r-- | rust/vendor/widestring/src/platform/windows.rs | 11 | ||||
-rw-r--r-- | rust/vendor/widestring/src/ucstr.rs | 538 | ||||
-rw-r--r-- | rust/vendor/widestring/src/ucstring.rs | 1570 | ||||
-rw-r--r-- | rust/vendor/widestring/src/ustr.rs | 359 | ||||
-rw-r--r-- | rust/vendor/widestring/src/ustring.rs | 785 |
8 files changed, 3519 insertions, 0 deletions
diff --git a/rust/vendor/widestring/src/lib.rs b/rust/vendor/widestring/src/lib.rs new file mode 100644 index 0000000..d62fd98 --- /dev/null +++ b/rust/vendor/widestring/src/lib.rs @@ -0,0 +1,238 @@ +//! A wide string FFI module for converting to and from wide string variants. +//! +//! This module provides multiple types of wide strings: `U16String`, `U16CString`, `U32String`, +//! and `U32CString`. These types are backed by two generic implementations parameterized by +//! element size: `UString<C>` and `UCString<C>`. The `UCString` types are analogous to the +//! standard `CString` FFI type, while the `UString` types are analogous to `OsString`. Otherwise, +//! `U16` and `U32` types differ only in character width and encoding methods. +//! +//! For `U16String` and `U32String`, no guarantees are made about the underlying string data; they +//! are simply a sequence of UTF-16 *code units* or UTF-32 code points, both of which may be +//! ill-formed or contain nul values. `U16CString` and `U32CString`, on the other hand, are aware +//! of nul values and are guaranteed to be terminated with a nul value (unless unchecked methods +//! are used to construct the strings). Because `U16CString` and `U32CString` are C-style, +//! nul-terminated strings, they will have no interior nul values. All four string types may still +//! have unpaired UTF-16 surrogates or invalid UTF-32 code points; ill-formed data is preserved +//! until conversion to a standard Rust `String`. +//! +//! Use `U16String` or `U32String` when you simply need to pass strings through, or when you know +//! (or don't care) that you are not dealing with a nul-terminated string, such as when string +//! lengths are provided and you are only reading strings from FFI, not writing them out to an FFI. +//! +//! Use `U16CString` or `U32CString` when you must properly handle nul values, and must deal with +//! nul-terminated C-style wide strings, such as when you pass strings into FFI functions. +//! +//! 
# Relationship to other Rust Strings +//! +//! Standard Rust strings `String` and `str` are well-formed Unicode data encoded as UTF-8. The +//! standard strings provide proper handling of Unicode and ensure strong safety guarantees. +//! +//! `CString` and `CStr` are strings used for C FFI. They handle nul-terminated C-style strings. +//! However, they do not have a built-in encoding, and conversions between C-style and other Rust +//! strings must explicitly encode and decode the strings, and handle possibly invalid encoding +//! data. They are meant only for passing string-like data back and forth across C APIs and do +//! not provide any other guarantees, so their contents may not be well-formed. +//! +//! `OsString` and `OsStr` are also strings for use with FFI. Unlike `CString`, they do not +//! specially handle nul values, but instead use an OS-specific encoding. For example, on Linux +//! systems this is usually UTF-8, but that is not the case on every platform. The encoding may +//! not even be 8-bit: on Windows, the native encoding is (potentially ill-formed) UTF-16, which +//! `OsString` stores internally in an encoding sometimes referred to as "WTF-8". In any case, like +//! `CString`, `OsString` has no additional guarantees and may not be well-formed. +//! +//! Because these other string types lack those guarantees, conversion to a standard Rust `String` +//! can be lossy, and may require knowledge of the underlying encoding, including platform-specific +//! quirks. +//! +//! The wide strings in this crate are roughly based on the principles of the string types in +//! `std::ffi`, though there are differences. `U16String`, `U32String`, `U16Str`, and `U32Str` are +//! roughly similar in role to `OsString` and `OsStr`, while `U16CString`, `U32CString`, `U16CStr`, +//! and `U32CStr` are roughly similar in role to `CString` and `CStr`. Conversion to FFI string +//! types is generally straightforward and safe, while conversion directly to a standard Rust +//! `String` can be lossy, just as it is for `OsString`.
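The module docs note that decoding wide data into a Rust `String` can be lossy. As a hedged illustration that uses only the standard library (it does not call the widestring API; `decode_strict` and `decode_lossy` are hypothetical helper names), strict decoding rejects an unpaired surrogate while lossy decoding substitutes U+FFFD:

```rust
// Std-only sketch; not part of the widestring crate.
// Strict decoding: String::from_utf16 returns Err on ill-formed UTF-16.
fn decode_strict(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

// Lossy decoding: invalid data (e.g. a lone surrogate) becomes U+FFFD.
fn decode_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

fn main() {
    // 'h', 'i', then a high surrogate 0xD800 with no low surrogate after it.
    let ill_formed: [u16; 3] = [0x68, 0x69, 0xD800];
    assert_eq!(decode_strict(&ill_formed), None);
    assert_eq!(decode_lossy(&ill_formed), "hi\u{FFFD}");

    // Encoding a Rust &str to UTF-16 is lossless, so it round-trips.
    let wide: Vec<u16> = "héllo".encode_utf16().collect();
    assert_eq!(decode_strict(&wide).as_deref(), Some("héllo"));
}
```

The asymmetry is the point: going from `String` to wide data never loses information, while going back only does when the wide data was already well-formed.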
+//! +//! `U16String` and `U16CString` are treated as though they use UTF-16 encoding, even if they may +//! contain unpaired surrogates. `U32String` and `U32CString` are treated as though they use UTF-32 +//! encoding, even if they may contain values outside the valid Unicode character range. +//! +//! # Remarks on UTF-16 Code Units +//! +//! *Code units* are the 16-bit units that comprise UTF-16 sequences. Code units +//! can specify Unicode code points either as single units or in *surrogate pairs*. Because every +//! code unit might be part of a surrogate pair, many regular string operations, including +//! indexing into a wide string, writing to a wide string, or even iterating over a wide string, +//! should be handled with care and are strongly discouraged. Some operations have safer +//! alternatives provided, such as Unicode code point iteration instead of code unit iteration. +//! Always keep in mind that the number of code units (`len()`) of a wide string is **not** +//! equivalent to the number of Unicode characters in the string, merely the length of the UTF-16 +//! encoding sequence. In fact, Unicode code points do not even have a one-to-one mapping with characters! +//! +//! UTF-32 simply encodes Unicode code points as-is in 32-bit values, but Unicode code points only +//! need 21 bits (the highest is U+10FFFF), so many 32-bit values are not valid code points. Again, +//! Unicode code points do not have a one-to-one mapping with the concept of a visual character glyph. +//! +//! # FFI with C/C++ `wchar_t` +//! +//! C/C++'s `wchar_t` (and C++'s corresponding `std::wstring`) varies in size depending on compiler +//! and platform. Typically, `wchar_t` is 16 bits on Windows and 32 bits on most Unix-based +//! platforms. For convenience when using `wchar_t`-based FFIs, type aliases for the corresponding +//! string types are provided: `WideString` aliases `U16String` on Windows or `U32String` +//! elsewhere, `WideCString` aliases `U16CString` or `U32CString`, etc. The `WideChar` alias +//! 
is also provided, aliasing `u16` or `u32`. +//! +//! When not interacting with an FFI using `wchar_t`, it is recommended to use the string types +//! directly rather than via the `Wide` aliases. +//! +//! This crate supports `no_std` when default features are disabled. The `std` and `alloc` features +//! (enabled by default) enable the `U16String`, `U32String`, `U16CString`, and `U32CString` types +//! and aliases. Other types do not require allocation and can be used in a `no_std` environment. +//! +//! # Examples +//! +//! The following example uses `U16String` to get Windows error messages. `FormatMessageW` +//! returns the string length for us, and we don't need to pass error messages into other FFI +//! functions, so we don't need to worry about nul values. +//! +//! ```rust +//! # #[cfg(not(windows))] +//! # fn main() {} +//! # extern crate winapi; +//! # extern crate widestring; +//! # #[cfg(windows)] +//! # fn main() { +//! use winapi::um::winbase::{FormatMessageW, LocalFree, FORMAT_MESSAGE_FROM_SYSTEM, +//! FORMAT_MESSAGE_ALLOCATE_BUFFER, FORMAT_MESSAGE_IGNORE_INSERTS}; +//! use winapi::shared::ntdef::LPWSTR; +//! use winapi::shared::minwindef::HLOCAL; +//! use std::ptr; +//! use widestring::U16String; +//! # use winapi::shared::minwindef::DWORD; +//! # let error_code: DWORD = 0; +//! +//! let U16Str: U16String; +//! unsafe { +//! // First, get a string buffer from some Windows API such as FormatMessageW... +//! let mut buffer: LPWSTR = ptr::null_mut(); +//! let strlen = FormatMessageW(FORMAT_MESSAGE_FROM_SYSTEM | +//! FORMAT_MESSAGE_ALLOCATE_BUFFER | +//! FORMAT_MESSAGE_IGNORE_INSERTS, +//! ptr::null(), +//! error_code, // error code from GetLastError() +//! 0, +//! (&mut buffer as *mut LPWSTR) as LPWSTR, +//! 0, +//! ptr::null_mut()); +//! +//! // Get the buffer as a wide string +//! U16Str = U16String::from_ptr(buffer, strlen as usize); +//! // Since U16String creates an owned copy, it's safe to free the original buffer now +//! 
// If you didn't want an owned copy, you could use a borrowed &U16Str instead. +//! LocalFree(buffer as HLOCAL); +//! } +//! // Convert to a regular Rust String and use it to your heart's desire! +//! let message = U16Str.to_string_lossy(); +//! # assert_eq!(message, "The operation completed successfully.\r\n"); +//! # } +//! ``` +//! +//! The following example is functionally the same, only using `U16CString` instead. +//! +//! ```rust +//! # #[cfg(not(windows))] +//! # fn main() {} +//! # extern crate winapi; +//! # extern crate widestring; +//! # #[cfg(windows)] +//! # fn main() { +//! use winapi::um::winbase::{FormatMessageW, LocalFree, FORMAT_MESSAGE_FROM_SYSTEM, +//! FORMAT_MESSAGE_ALLOCATE_BUFFER, FORMAT_MESSAGE_IGNORE_INSERTS}; +//! use winapi::shared::ntdef::LPWSTR; +//! use winapi::shared::minwindef::HLOCAL; +//! use std::ptr; +//! use widestring::U16CString; +//! # use winapi::shared::minwindef::DWORD; +//! # let error_code: DWORD = 0; +//! +//! let U16Str: U16CString; +//! unsafe { +//! // First, get a string buffer from some Windows API such as FormatMessageW... +//! let mut buffer: LPWSTR = ptr::null_mut(); +//! FormatMessageW(FORMAT_MESSAGE_FROM_SYSTEM | +//! FORMAT_MESSAGE_ALLOCATE_BUFFER | +//! FORMAT_MESSAGE_IGNORE_INSERTS, +//! ptr::null(), +//! error_code, // error code from GetLastError() +//! 0, +//! (&mut buffer as *mut LPWSTR) as LPWSTR, +//! 0, +//! ptr::null_mut()); +//! +//! // Get the buffer as a wide string +//! U16Str = U16CString::from_ptr_str(buffer); +//! // Since U16CString creates an owned copy, it's safe to free the original buffer now +//! // If you didn't want an owned copy, you could use a borrowed &U16CStr instead. +//! LocalFree(buffer as HLOCAL); +//! } +//! // Convert to a regular Rust String and use it to your heart's desire! +//! let message = U16Str.to_string_lossy(); +//! # assert_eq!(message, "The operation completed successfully.\r\n"); +//! # } +//! 
``` + +#![deny(future_incompatible)] +#![warn( + unused, + anonymous_parameters, + missing_docs, + missing_copy_implementations, + missing_debug_implementations, + trivial_casts, + trivial_numeric_casts +)] +#![cfg_attr(not(feature = "std"), no_std)] + +#[cfg(all(feature = "alloc", not(feature = "std")))] +extern crate alloc; +#[cfg(feature = "std")] +extern crate core; + +use core::fmt::Debug; + +#[cfg(feature = "std")] +mod platform; +mod ucstr; +#[cfg(feature = "alloc")] +mod ucstring; +mod ustr; +#[cfg(feature = "alloc")] +mod ustring; + +pub use crate::ucstr::*; +#[cfg(feature = "alloc")] +pub use crate::ucstring::*; +pub use crate::ustr::*; +#[cfg(feature = "alloc")] +pub use crate::ustring::*; + +/// Marker trait for primitive types used to represent UTF character data. Should not be used +/// directly. +pub trait UChar: Debug + Sized + Copy + Ord + Eq { + /// NUL character value + const NUL: Self; +} + +impl UChar for u16 { + const NUL: u16 = 0; +} + +impl UChar for u32 { + const NUL: u32 = 0; +} + +#[cfg(not(windows))] +/// Alias for `u16` or `u32` depending on platform. Intended to match typical C `wchar_t` size on platform. +pub type WideChar = u32; + +#[cfg(windows)] +/// Alias for `u16` or `u32` depending on platform. Intended to match typical C `wchar_t` size on platform. 
+pub type WideChar = u16; diff --git a/rust/vendor/widestring/src/platform/mod.rs b/rust/vendor/widestring/src/platform/mod.rs new file mode 100644 index 0000000..b61cbb3 --- /dev/null +++ b/rust/vendor/widestring/src/platform/mod.rs @@ -0,0 +1,9 @@ +#[cfg(windows)] +mod windows; +#[cfg(windows)] +pub(crate) use self::windows::*; + +#[cfg(not(windows))] +mod other; +#[cfg(not(windows))] +pub(crate) use self::other::*; diff --git a/rust/vendor/widestring/src/platform/other.rs b/rust/vendor/widestring/src/platform/other.rs new file mode 100644 index 0000000..4cc6257 --- /dev/null +++ b/rust/vendor/widestring/src/platform/other.rs @@ -0,0 +1,9 @@ +use std::ffi::{OsStr, OsString}; + +pub(crate) fn os_to_wide(s: &OsStr) -> Vec<u16> { + s.to_string_lossy().encode_utf16().collect() +} + +pub(crate) fn os_from_wide(s: &[u16]) -> OsString { + OsString::from(String::from_utf16_lossy(s)) +} diff --git a/rust/vendor/widestring/src/platform/windows.rs b/rust/vendor/widestring/src/platform/windows.rs new file mode 100644 index 0000000..5ff0f12 --- /dev/null +++ b/rust/vendor/widestring/src/platform/windows.rs @@ -0,0 +1,11 @@ +#![cfg(windows)] +use std::ffi::{OsStr, OsString}; +use std::os::windows::ffi::{OsStrExt, OsStringExt}; + +pub(crate) fn os_to_wide(s: &OsStr) -> Vec<u16> { + s.encode_wide().collect() +} + +pub(crate) fn os_from_wide(s: &[u16]) -> OsString { + OsString::from_wide(s) +} diff --git a/rust/vendor/widestring/src/ucstr.rs b/rust/vendor/widestring/src/ucstr.rs new file mode 100644 index 0000000..5dbbf16 --- /dev/null +++ b/rust/vendor/widestring/src/ucstr.rs @@ -0,0 +1,538 @@ +use crate::{UChar, WideChar}; +use core::slice; + +#[cfg(all(feature = "alloc", not(feature = "std")))] +use alloc::{ + borrow::ToOwned, + boxed::Box, + string::{FromUtf16Error, String}, + vec::Vec, +}; +#[cfg(feature = "std")] +use std::{ + borrow::ToOwned, + boxed::Box, + string::{FromUtf16Error, String}, + vec::Vec, +}; + +/// An error returned from `UCString` and `UCStr` to indicate 
that a terminating nul value +/// was missing. +/// +/// The error optionally returns ownership of the invalid vector whenever a vector was owned. +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct MissingNulError<C> { + #[cfg(feature = "alloc")] + pub(crate) inner: Option<Vec<C>>, + #[cfg(not(feature = "alloc"))] + _p: core::marker::PhantomData<C>, +} + +impl<C: UChar> MissingNulError<C> { + #[cfg(feature = "alloc")] + fn empty() -> Self { + Self { inner: None } + } + + #[cfg(not(feature = "alloc"))] + fn empty() -> Self { + Self { + _p: core::marker::PhantomData, + } + } + + /// Consumes this error, returning the underlying vector of values which generated the + /// error in the first place. + #[cfg(feature = "alloc")] + pub fn into_vec(self) -> Option<Vec<C>> { + self.inner + } +} + +impl<C: UChar> core::fmt::Display for MissingNulError<C> { + fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result { + write!(f, "missing terminating nul value") + } +} + +#[cfg(feature = "std")] +impl<C: UChar> std::error::Error for MissingNulError<C> { + fn description(&self) -> &str { + "missing terminating nul value" + } +} + +/// C-style wide string reference for `UCString`. +/// +/// `UCStr` is aware of nul values. Unless unchecked conversions are used, all `UCStr` +/// strings end with a nul-terminator in the underlying buffer and contain no internal nul values. +/// The strings may still contain invalid or ill-formed UTF-16 or UTF-32 data. These strings are +/// intended to be used with FFI functions such as the Windows API that may require nul-terminated +/// strings. +/// +/// `UCStr` can be converted to and from many other string types, including `UString`, +/// `OsString`, and `String`, making proper Unicode FFI safe and easy. +/// +/// Prefer the type aliases `U16CStr`, `U32CStr`, or `WideCStr` over using +/// this type directly.
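The `UCStr` docs state an invariant: the buffer ends in a nul terminator and contains no interior nuls. A minimal std-only sketch of checking that invariant on a `&[u16]` buffer follows; `is_valid_ucstr` and `ucstr_len` are hypothetical names chosen for illustration, and the length helper assumes, as the crate's `len()` does, that the terminator is not counted.

```rust
// Hypothetical helpers, not widestring APIs: they check the documented
// invariant that the first nul exists and is the final element.
fn is_valid_ucstr(buf: &[u16]) -> bool {
    match buf.iter().position(|&c| c == 0) {
        // A nul was found; it must be the last element (no interior nuls).
        Some(i) => i + 1 == buf.len(),
        // No nul at all: the terminator is missing.
        None => false,
    }
}

// Length in code units, excluding the terminator.
fn ucstr_len(buf: &[u16]) -> usize {
    debug_assert!(is_valid_ucstr(buf));
    buf.len() - 1
}

fn main() {
    let mut buf: Vec<u16> = "Hi".encode_utf16().collect();
    buf.push(0); // append the terminator
    assert!(is_valid_ucstr(&buf));
    assert_eq!(ucstr_len(&buf), 2);

    assert!(!is_valid_ucstr(&[0x48, 0, 0x69, 0])); // interior nul
    assert!(!is_valid_ucstr(&[0x48, 0x69])); // missing terminator
}
```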
+#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash)] +pub struct UCStr<C: UChar> { + inner: [C], +} + +impl<C: UChar> UCStr<C> { + /// Coerces a value into a `UCStr`. + pub fn new<S: AsRef<UCStr<C>> + ?Sized>(s: &S) -> &Self { + s.as_ref() + } + + /// Constructs a `UCStr` from a nul-terminated string pointer. + /// + /// This will scan for nul values beginning with `p`. The first nul value will be used as the + /// nul terminator for the string, similar to how libc string functions such as `strlen` work. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid or has a + /// nul terminator, and the function could scan past the underlying buffer. + /// + /// `p` must be non-null. + /// + /// # Panics + /// + /// This function panics if `p` is null. + /// + /// # Caveat + /// + /// The lifetime for the returned string is inferred from its usage. To prevent accidental + /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the + /// context, such as by providing a helper function taking the lifetime of a host value for the + /// string, or by explicit annotation. + pub unsafe fn from_ptr_str<'a>(p: *const C) -> &'a Self { + assert!(!p.is_null()); + let mut i: isize = 0; + while *p.offset(i) != UChar::NUL { + i += 1; + } + let ptr: *const [C] = slice::from_raw_parts(p, i as usize + 1); + &*(ptr as *const UCStr<C>) + } + + /// Constructs a `UCStr` from a pointer and a length. + /// + /// The `len` argument is the number of elements, **not** the number of bytes, and does + /// **not** include the nul terminator of the string. Thus, a `len` of 0 is valid and means that + /// `p` is a pointer directly to the nul terminator of the string. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` + /// elements. + /// + /// `p` must be non-null, even for zero `len`.
+ /// + /// The interior values of the pointer are not scanned for nul. Any interior nul values will + /// result in an invalid `UCStr`. + /// + /// # Panics + /// + /// This function panics if `p` is null or if a nul value is not found at offset `len` of `p`. + /// Only pointers with a nul terminator are valid. + /// + /// # Caveat + /// + /// The lifetime for the returned string is inferred from its usage. To prevent accidental + /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the + /// context, such as by providing a helper function taking the lifetime of a host value for the + /// string, or by explicit annotation. + pub unsafe fn from_ptr_with_nul<'a>(p: *const C, len: usize) -> &'a Self { + assert!(!p.is_null() && *p.add(len) == UChar::NUL); + let ptr: *const [C] = slice::from_raw_parts(p, len + 1); + &*(ptr as *const UCStr<C>) + } + + /// Constructs a `UCStr` from a slice of values that has a nul terminator. + /// + /// The slice will be scanned for nul values. When a nul value is found, it is treated as the + /// terminator for the string, and the `UCStr` slice will be truncated to that nul. + /// + /// # Failure + /// + /// If there are no nul values in the slice, an error is returned. + pub fn from_slice_with_nul(slice: &[C]) -> Result<&Self, MissingNulError<C>> { + match slice.iter().position(|x| *x == UChar::NUL) { + None => Err(MissingNulError::empty()), + Some(i) => Ok(unsafe { UCStr::from_slice_with_nul_unchecked(&slice[..i + 1]) }), + } + } + + /// Constructs a `UCStr` from a slice of values that has a nul terminator. No + /// checking for nul values is performed. + /// + /// # Safety + /// + /// This function is unsafe because it can lead to invalid `UCStr` values when the slice + /// is missing a terminating nul value or there are non-terminating interior nul values + /// in the slice.
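The checked constructor `from_slice_with_nul` finds the first nul with `iter().position` and keeps the slice up to and including it, and it works for both element widths because the `UChar` marker trait supplies a `NUL` constant. A standalone sketch of that pattern, assuming a hypothetical `NulUnit` trait and `truncate_at_nul` function in place of the crate's own types so it compiles without the crate:

```rust
// Hypothetical stand-in for the crate's `UChar` marker trait: an associated
// NUL constant lets a single generic scan serve u16 and u32 data.
trait NulUnit: Copy + PartialEq {
    const NUL: Self;
}

impl NulUnit for u16 {
    const NUL: u16 = 0;
}

impl NulUnit for u32 {
    const NUL: u32 = 0;
}

/// Returns the prefix up to and including the first nul, or None if there is none.
fn truncate_at_nul<C: NulUnit>(slice: &[C]) -> Option<&[C]> {
    slice
        .iter()
        .position(|&c| c == C::NUL)
        .map(|i| &slice[..i + 1])
}

fn main() {
    // Everything after the first nul is dropped.
    let wide16: [u16; 5] = [0x41, 0x42, 0, 0x43, 0];
    assert_eq!(truncate_at_nul(&wide16), Some(&wide16[..3]));

    // No terminator: the scan fails, mirroring `MissingNulError`.
    let wide32: [u32; 2] = [0x1F600, 0x41];
    assert_eq!(truncate_at_nul(&wide32), None);
}
```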
+ pub unsafe fn from_slice_with_nul_unchecked(slice: &[C]) -> &Self { + let ptr: *const [C] = slice; + &*(ptr as *const UCStr<C>) + } + + /// Copies the wide string to a new owned `UCString`. + #[cfg(feature = "alloc")] + pub fn to_ucstring(&self) -> crate::UCString<C> { + unsafe { crate::UCString::from_vec_with_nul_unchecked(self.inner.to_owned()) } + } + + /// Copies the wide string to a new owned `UString`. + /// + /// The `UString` will **not** have a nul terminator. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U16CString; + /// let wcstr = U16CString::from_str("MyString").unwrap(); + /// // Convert U16CString to a U16String + /// let wstr = wcstr.to_ustring(); + /// + /// // U16CString will have a terminating nul + /// let wcvec = wcstr.into_vec_with_nul(); + /// assert_eq!(wcvec[wcvec.len()-1], 0); + /// // The resulting U16String will not have the terminating nul + /// let wvec = wstr.into_vec(); + /// assert_ne!(wvec[wvec.len()-1], 0); + /// ``` + /// + /// ```rust + /// use widestring::U32CString; + /// let wcstr = U32CString::from_str("MyString").unwrap(); + /// // Convert U32CString to a U32String + /// let wstr = wcstr.to_ustring(); + /// + /// // U32CString will have a terminating nul + /// let wcvec = wcstr.into_vec_with_nul(); + /// assert_eq!(wcvec[wcvec.len()-1], 0); + /// // The resulting U32String will not have the terminating nul + /// let wvec = wstr.into_vec(); + /// assert_ne!(wvec[wvec.len()-1], 0); + /// ``` + #[cfg(feature = "alloc")] + pub fn to_ustring(&self) -> crate::UString<C> { + crate::UString::from_vec(self.as_slice()) + } + + /// Converts to a slice of the wide string. + /// + /// The slice will **not** include the nul terminator. + pub fn as_slice(&self) -> &[C] { + &self.inner[..self.len()] + } + + /// Converts to a slice of the wide string, including the nul terminator. + pub fn as_slice_with_nul(&self) -> &[C] { + &self.inner + } + + /// Returns a raw pointer to the wide string.
+ /// + /// The pointer is valid only as long as the lifetime of this reference. + pub fn as_ptr(&self) -> *const C { + self.inner.as_ptr() + } + + /// Returns the length of the wide string as number of elements (**not** number of bytes) + /// **not** including nul terminator. + pub fn len(&self) -> usize { + self.inner.len() - 1 + } + + /// Returns whether this wide string contains no data (i.e. is only the nul terminator). + pub fn is_empty(&self) -> bool { + self.len() == 0 + } + + /// Converts a `Box<UCStr>` into a `UCString` without copying or allocating. + /// + /// # Examples + /// + /// ``` + /// use widestring::U16CString; + /// + /// let v = vec![102u16, 111u16, 111u16]; // "foo" + /// let c_string = U16CString::new(v.clone()).unwrap(); + /// let boxed = c_string.into_boxed_ucstr(); + /// assert_eq!(boxed.into_ucstring(), U16CString::new(v).unwrap()); + /// ``` + /// + /// ``` + /// use widestring::U32CString; + /// + /// let v = vec![102u32, 111u32, 111u32]; // "foo" + /// let c_string = U32CString::new(v.clone()).unwrap(); + /// let boxed = c_string.into_boxed_ucstr(); + /// assert_eq!(boxed.into_ucstring(), U32CString::new(v).unwrap()); + /// ``` + #[cfg(feature = "alloc")] + pub fn into_ucstring(self: Box<Self>) -> crate::UCString<C> { + let raw = Box::into_raw(self) as *mut [C]; + crate::UCString { + inner: unsafe { Box::from_raw(raw) }, + } + } + + #[cfg(feature = "alloc")] + pub(crate) fn from_inner(slice: &[C]) -> &UCStr<C> { + let ptr: *const [C] = slice; + unsafe { &*(ptr as *const UCStr<C>) } + } +} + +impl UCStr<u16> { + /// Decodes a wide string to an owned `OsString`. + /// + /// This makes a string copy of the `U16CStr`. Since `U16CStr` makes no guarantees that it is + /// valid UTF-16, there is no guarantee that the resulting `OsString` will be valid data. The + /// `OsString` will **not** have a nul terminator. 
+ /// + /// # Examples + /// + /// ```rust + /// use widestring::U16CString; + /// use std::ffi::OsString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U16CString::from_str(s).unwrap(); + /// // Create an OsString from the wide string + /// let osstr = wstr.to_os_string(); + /// + /// assert_eq!(osstr, OsString::from(s)); + /// ``` + #[cfg(feature = "std")] + pub fn to_os_string(&self) -> std::ffi::OsString { + crate::platform::os_from_wide(self.as_slice()) + } + + /// Copies the wide string to a `String` if it contains valid UTF-16 data. + /// + /// # Failures + /// + /// Returns an error if the string contains any invalid UTF-16 data. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U16CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U16CString::from_str(s).unwrap(); + /// // Create a regular string from the wide string + /// let s2 = wstr.to_string().unwrap(); + /// + /// assert_eq!(s2, s); + /// ``` + #[cfg(feature = "alloc")] + pub fn to_string(&self) -> Result<String, FromUtf16Error> { + String::from_utf16(self.as_slice()) + } + + /// Copies the wide string to a `String`. + /// + /// Any ill-formed UTF-16 sequences (such as unpaired surrogates) are replaced with U+FFFD REPLACEMENT CHARACTER. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U16CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U16CString::from_str(s).unwrap(); + /// // Create a regular string from the wide string + /// let s2 = wstr.to_string_lossy(); + /// + /// assert_eq!(s2, s); + /// ``` + #[cfg(feature = "alloc")] + pub fn to_string_lossy(&self) -> String { + String::from_utf16_lossy(self.as_slice()) + } +} + +impl UCStr<u32> { + /// Constructs a `U32CStr` from a nul-terminated `char` string pointer. + /// + /// This will scan for nul values beginning with `p`.
The first nul value will be used as the + /// nul terminator for the string, similar to how libc string functions such as `strlen` work. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid or has a + /// nul terminator, and the function could scan past the underlying buffer. + /// + /// `p` must be non-null. + /// + /// # Panics + /// + /// This function panics if `p` is null. + /// + /// # Caveat + /// + /// The lifetime for the returned string is inferred from its usage. To prevent accidental + /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the + /// context, such as by providing a helper function taking the lifetime of a host value for the + /// string, or by explicit annotation. + pub unsafe fn from_char_ptr_str<'a>(p: *const char) -> &'a Self { + UCStr::from_ptr_str(p as *const u32) + } + + /// Constructs a `U32CStr` from a `char` pointer and a length. + /// + /// The `len` argument is the number of `char` elements, **not** the number of bytes, and does + /// **not** include the nul terminator of the string. Thus, a `len` of 0 is valid and means that + /// `p` is a pointer directly to the nul terminator of the string. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` + /// elements. + /// + /// `p` must be non-null, even for zero `len`. + /// + /// The interior values of the pointer are not scanned for nul. Any interior nul values will + /// result in an invalid `U32CStr`. + /// + /// # Panics + /// + /// This function panics if `p` is null or if a nul value is not found at offset `len` of `p`. + /// Only pointers with a nul terminator are valid. + /// + /// # Caveat + /// + /// The lifetime for the returned string is inferred from its usage.
To prevent accidental + /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the + /// context, such as by providing a helper function taking the lifetime of a host value for the + /// string, or by explicit annotation. + pub unsafe fn from_char_ptr_with_nul<'a>(p: *const char, len: usize) -> &'a Self { + UCStr::from_ptr_with_nul(p as *const u32, len) + } + + /// Constructs a `U32CStr` from a slice of `char` values that has a nul terminator. + /// + /// The slice will be scanned for nul values. When a nul value is found, it is treated as the + /// terminator for the string, and the `U32CStr` slice will be truncated to that nul. + /// + /// # Failure + /// + /// If there are no nul values in `slice`, an error is returned. + pub fn from_char_slice_with_nul(slice: &[char]) -> Result<&Self, MissingNulError<u32>> { + let ptr: *const [char] = slice; + UCStr::from_slice_with_nul(unsafe { &*(ptr as *const [u32]) }) + } + + /// Constructs a `U32CStr` from a slice of `char` values that has a nul terminator. No + /// checking for nul values is performed. + /// + /// # Safety + /// + /// This function is unsafe because it can lead to invalid `U32CStr` values when `slice` + /// is missing a terminating nul value or there are non-terminating interior nul values + /// in the slice. + pub unsafe fn from_char_slice_with_nul_unchecked(slice: &[char]) -> &Self { + let ptr: *const [char] = slice; + UCStr::from_slice_with_nul_unchecked(&*(ptr as *const [u32])) + } + + /// Decodes a wide string to an owned `OsString`. + /// + /// This makes a string copy of the `U32CStr`. Since `U32CStr` makes no guarantees that it is + /// valid UTF-32, there is no guarantee that the resulting `OsString` will be valid data. The + /// `OsString` will **not** have a nul terminator.
+ /// + /// # Examples + /// + /// ```rust + /// use widestring::U32CString; + /// use std::ffi::OsString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U32CString::from_str(s).unwrap(); + /// // Create an OsString from the wide string + /// let osstr = wstr.to_os_string(); + /// + /// assert_eq!(osstr, OsString::from(s)); + /// ``` + #[cfg(feature = "std")] + pub fn to_os_string(&self) -> std::ffi::OsString { + self.to_ustring().to_os_string() + } + + /// Copies the wide string to a `String` if it contains valid UTF-32 data. + /// + /// # Failures + /// + /// Returns an error if the string contains any invalid UTF-32 data. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U32CString::from_str(s).unwrap(); + /// // Create a regular string from the wide string + /// let s2 = wstr.to_string().unwrap(); + /// + /// assert_eq!(s2, s); + /// ``` + #[cfg(feature = "alloc")] + pub fn to_string(&self) -> Result<String, crate::FromUtf32Error> { + self.to_ustring().to_string() + } + + /// Copies the wide string to a `String`. + /// + /// Any non-Unicode sequences are replaced with U+FFFD REPLACEMENT CHARACTER. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U32CString::from_str(s).unwrap(); + /// // Create a regular string from the wide string + /// let s2 = wstr.to_string_lossy(); + /// + /// assert_eq!(s2, s); + /// ``` + #[cfg(feature = "alloc")] + pub fn to_string_lossy(&self) -> String { + self.to_ustring().to_string_lossy() + } +} + +/// C-style wide string reference for `U16CString`. +/// +/// `U16CStr` is aware of nul values. Unless unchecked conversions are used, all `U16CStr` +/// strings end with a nul-terminator in the underlying buffer and contain no internal nul values. 
+/// The strings may still contain invalid or ill-formed UTF-16 data. These strings are intended to +/// be used with FFI functions such as Windows API that may require nul-terminated strings. +/// +/// `U16CStr` can be converted to and from many other string types, including `U16String`, +/// `OsString`, and `String`, making proper Unicode FFI safe and easy. +pub type U16CStr = UCStr<u16>; + +/// C-style wide string reference for `U32CString`. +/// +/// `U32CStr` is aware of nul values. Unless unchecked conversions are used, all `U32CStr` +/// strings end with a nul-terminator in the underlying buffer and contain no internal nul values. +/// The strings may still contain invalid or ill-formed UTF-32 data. These strings are intended to +/// be used with FFI functions such as Windows API that may require nul-terminated strings. +/// +/// `U32CStr` can be converted to and from many other string types, including `U32String`, +/// `OsString`, and `String`, making proper Unicode FFI safe and easy. +pub type U32CStr = UCStr<u32>; + +/// Alias for `U16CStr` or `U32CStr` depending on platform. Intended to match typical C `wchar_t` size on platform. +pub type WideCStr = UCStr<WideChar>; diff --git a/rust/vendor/widestring/src/ucstring.rs b/rust/vendor/widestring/src/ucstring.rs new file mode 100644 index 0000000..ddb7338 --- /dev/null +++ b/rust/vendor/widestring/src/ucstring.rs @@ -0,0 +1,1570 @@ +use crate::{MissingNulError, UCStr, UChar, UStr, UString, WideChar}; +use core::{ + borrow::Borrow, + mem, + mem::ManuallyDrop, + ops::{Deref, Index, RangeFull}, + ptr, slice, +}; + +#[cfg(all(feature = "alloc", not(feature = "std")))] +use alloc::{ + borrow::{Cow, ToOwned}, + boxed::Box, + vec::Vec, +}; +#[cfg(feature = "std")] +use std::{ + borrow::{Cow, ToOwned}, + boxed::Box, + vec::Vec, +}; + +/// An owned, mutable C-style "wide" string for FFI that is nul-aware and nul-terminated. +/// +/// `UCString` is aware of nul values. 
Unless unchecked conversions are used, all `UCString` +/// strings end with a nul-terminator in the underlying buffer and contain no internal nul values. +/// The strings may still contain invalid or ill-formed UTF-16 or UTF-32 data. These strings are +/// intended to be used with FFI functions such as Windows API that may require nul-terminated +/// strings. +/// +/// `UCString` can be converted to and from many other string types, including `UString`, +/// `OsString`, and `String`, making proper Unicode FFI safe and easy. +/// +/// Please prefer using the type aliases `U16CString` or `U32CString` or `WideCString` to using +/// this type directly. +/// +/// # Examples +/// +/// The following example constructs a `U16CString` and shows how to convert a `U16CString` to a +/// regular Rust `String`. +/// +/// ```rust +/// use widestring::U16CString; +/// let s = "Test"; +/// // Create a wide string from the rust string +/// let wstr = U16CString::from_str(s).unwrap(); +/// // Convert back to a rust string +/// let rust_str = wstr.to_string_lossy(); +/// assert_eq!(rust_str, "Test"); +/// ``` +/// +/// The same example using `U32CString`: +/// +/// ```rust +/// use widestring::U32CString; +/// let s = "Test"; +/// // Create a wide string from the rust string +/// let wstr = U32CString::from_str(s).unwrap(); +/// // Convert back to a rust string +/// let rust_str = wstr.to_string_lossy(); +/// assert_eq!(rust_str, "Test"); +/// ``` +#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)] +pub struct UCString<C: UChar> { + pub(crate) inner: Box<[C]>, +} + +/// An error returned from `UCString` to indicate that an invalid nul value was found. +/// +/// The error indicates the position in the vector where the nul value was found, as well as +/// returning the ownership of the invalid vector. 
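+///
+/// # Examples
+///
+/// A minimal sketch of inspecting the error. It relies only on the `nul_position` accessor that
+/// the other examples in this module already use.
+///
+/// ```rust
+/// use widestring::U16CString;
+/// // "ab", NUL, "c" encodes to four UTF-16 code units, with the nul at index 2
+/// let err = U16CString::from_str("ab\u{0}c").unwrap_err();
+/// assert_eq!(err.nul_position(), 2);
+/// ```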
+#[derive(Debug, Clone, PartialEq, Eq)] +pub struct NulError<C: UChar>(usize, Vec<C>); + +impl<C: UChar> UCString<C> { + /// Constructs a `UCString` from a container of wide character data. + /// + /// This method will consume the provided data and use the underlying elements to construct a + /// new string. The data will be scanned for invalid nul values. + /// + /// # Failures + /// + /// This function will return an error if the data contains a nul value. + /// The returned error will contain the `Vec` as well as the position of the nul value. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U16CString; + /// let v = vec![84u16, 104u16, 101u16]; // 'T' 'h' 'e' + /// # let cloned = v.clone(); + /// // Create a wide string from the vector + /// let wcstr = U16CString::new(v).unwrap(); + /// # assert_eq!(wcstr.into_vec(), cloned); + /// ``` + /// + /// ```rust + /// use widestring::U32CString; + /// let v = vec![84u32, 104u32, 101u32]; // 'T' 'h' 'e' + /// # let cloned = v.clone(); + /// // Create a wide string from the vector + /// let wcstr = U32CString::new(v).unwrap(); + /// # assert_eq!(wcstr.into_vec(), cloned); + /// ``` + /// + /// The following example demonstrates errors from nul values in a vector. 
+ /// + /// ```rust + /// use widestring::U16CString; + /// let v = vec![84u16, 0u16, 104u16, 101u16]; // 'T' NUL 'h' 'e' + /// // Create a wide string from the vector + /// let res = U16CString::new(v); + /// assert!(res.is_err()); + /// assert_eq!(res.err().unwrap().nul_position(), 1); + /// ``` + /// + /// ```rust + /// use widestring::U32CString; + /// let v = vec![84u32, 0u32, 104u32, 101u32]; // 'T' NUL 'h' 'e' + /// // Create a wide string from the vector + /// let res = U32CString::new(v); + /// assert!(res.is_err()); + /// assert_eq!(res.err().unwrap().nul_position(), 1); + /// ``` + pub fn new(v: impl Into<Vec<C>>) -> Result<Self, NulError<C>> { + let v = v.into(); + // Check for nul vals + match v.iter().position(|&val| val == UChar::NUL) { + None => Ok(unsafe { UCString::from_vec_unchecked(v) }), + Some(pos) => Err(NulError(pos, v)), + } + } + + /// Constructs a `UCString` from a nul-terminated container of UTF-16 or UTF-32 data. + /// + /// This method will consume the provided data and use the underlying elements to construct a + /// new string. The string will be truncated at the first nul value in the string. + /// + /// # Failures + /// + /// This function will return an error if the data does not contain a nul to terminate the + /// string. The returned error will contain the consumed `Vec`. 
+ /// + /// # Examples + /// + /// ```rust + /// use widestring::U16CString; + /// let v = vec![84u16, 104u16, 101u16, 0u16]; // 'T' 'h' 'e' NUL + /// # let cloned = v[..3].to_owned(); + /// // Create a wide string from the vector + /// let wcstr = U16CString::from_vec_with_nul(v).unwrap(); + /// # assert_eq!(wcstr.into_vec(), cloned); + /// ``` + /// + /// ```rust + /// use widestring::U32CString; + /// let v = vec![84u32, 104u32, 101u32, 0u32]; // 'T' 'h' 'e' NUL + /// # let cloned = v[..3].to_owned(); + /// // Create a wide string from the vector + /// let wcstr = U32CString::from_vec_with_nul(v).unwrap(); + /// # assert_eq!(wcstr.into_vec(), cloned); + /// ``` + /// + /// The following example demonstrates errors from missing nul values in a vector. + /// + /// ```rust + /// use widestring::U16CString; + /// let v = vec![84u16, 104u16, 101u16]; // 'T' 'h' 'e' + /// // Create a wide string from the vector + /// let res = U16CString::from_vec_with_nul(v); + /// assert!(res.is_err()); + /// ``` + /// + /// ```rust + /// use widestring::U32CString; + /// let v = vec![84u32, 104u32, 101u32]; // 'T' 'h' 'e' + /// // Create a wide string from the vector + /// let res = U32CString::from_vec_with_nul(v); + /// assert!(res.is_err()); + /// ``` + pub fn from_vec_with_nul(v: impl Into<Vec<C>>) -> Result<Self, MissingNulError<C>> { + let mut v = v.into(); + // Check for nul vals + match v.iter().position(|&val| val == UChar::NUL) { + None => Err(MissingNulError { inner: Some(v) }), + Some(pos) => { + v.truncate(pos + 1); + Ok(unsafe { UCString::from_vec_with_nul_unchecked(v) }) + } + } + } + + /// Creates a `UCString` from a vector without checking for interior nul values. + /// + /// A terminating nul value will be appended if the vector does not already have a terminating + /// nul. + /// + /// # Safety + /// + /// This method is equivalent to `new` except that no runtime assertion is made that `v` + /// contains no nul values. 
Providing a vector with nul values will result in an invalid
+ /// `UCString`.
+ pub unsafe fn from_vec_unchecked(v: impl Into<Vec<C>>) -> Self {
+ let mut v = v.into();
+ match v.last() {
+ None => v.push(UChar::NUL),
+ Some(&c) if c != UChar::NUL => v.push(UChar::NUL),
+ Some(_) => (),
+ }
+ UCString::from_vec_with_nul_unchecked(v)
+ }
+
+ /// Creates a `UCString` from a vector that should have a nul terminator, without checking
+ /// for any nul values.
+ ///
+ /// # Safety
+ ///
+ /// This method is equivalent to `from_vec_with_nul` except that no runtime assertion is made
+ /// that `v` contains no interior nul values or that it ends with a nul terminator. Providing a
+ /// vector with interior nul values or without a terminating nul value will result in an
+ /// invalid `UCString`.
+ pub unsafe fn from_vec_with_nul_unchecked(v: impl Into<Vec<C>>) -> Self {
+ UCString {
+ inner: v.into().into_boxed_slice(),
+ }
+ }
+
+ /// Constructs a `UCString` from anything that can be converted to a `UStr`.
+ ///
+ /// The string will be scanned for invalid nul values.
+ ///
+ /// # Failures
+ ///
+ /// This function will return an error if the data contains a nul value.
+ /// The returned error will contain a `Vec` as well as the position of the nul value.
+ pub fn from_ustr(s: impl AsRef<UStr<C>>) -> Result<Self, NulError<C>> {
+ UCString::new(s.as_ref().as_slice())
+ }
+
+ /// Constructs a `UCString` from anything that can be converted to a `UStr`, without
+ /// scanning for invalid nul values.
+ ///
+ /// # Safety
+ ///
+ /// This method is equivalent to `from_ustr` except that no runtime assertion is made that
+ /// `s` contains no nul values. Providing a string with nul values will result in an invalid
+ /// `UCString`.
+ pub unsafe fn from_ustr_unchecked(s: impl AsRef<UStr<C>>) -> Self {
+ UCString::from_vec_unchecked(s.as_ref().as_slice())
+ }
+
+ /// Constructs a `UCString` from anything that can be converted to a `UStr` with a nul
+ /// terminator.
+ ///
+ /// The string will be truncated at the first nul value in the string.
+ ///
+ /// # Failures
+ ///
+ /// This function will return an error if the data does not contain a nul to terminate the
+ /// string. The returned error will contain a `Vec` copy of the data.
+ pub fn from_ustr_with_nul(s: impl AsRef<UStr<C>>) -> Result<Self, MissingNulError<C>> {
+ UCString::from_vec_with_nul(s.as_ref().as_slice())
+ }
+
+ /// Constructs a `UCString` from anything that can be converted to a `UStr` with a nul
+ /// terminator, without checking the string for any invalid interior nul values.
+ ///
+ /// # Safety
+ ///
+ /// This method is equivalent to `from_ustr_with_nul` except that no runtime assertion is
+ /// made that `s` contains no nul values. Providing a string with interior nul values or
+ /// without a terminating nul value will result in an invalid `UCString`.
+ pub unsafe fn from_ustr_with_nul_unchecked(s: impl AsRef<UStr<C>>) -> Self {
+ UCString::from_vec_with_nul_unchecked(s.as_ref().as_slice())
+ }
+
+ /// Constructs a new `UCString` copied from a nul-terminated string pointer.
+ ///
+ /// This will scan for nul values beginning with `p`. The first nul value will be used as the
+ /// nul terminator for the string, similar to how libc string functions such as `strlen` work.
+ ///
+ /// # Safety
+ ///
+ /// This function is unsafe as there is no guarantee that the given pointer is valid or has a
+ /// nul terminator, and the function could scan past the underlying buffer.
+ ///
+ /// `p` must be non-null.
+ ///
+ /// # Panics
+ ///
+ /// This function panics if `p` is null.
+ ///
+ /// # Caveat
+ ///
+ /// The data is copied into a newly allocated `UCString`, so the returned string does not
+ /// borrow from `p`. The buffer behind `p` only needs to remain valid for the duration of this
+ /// call.
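+ ///
+ /// # Examples
+ ///
+ /// A minimal sketch in which a local array stands in for a nul-terminated buffer received
+ /// over FFI.
+ ///
+ /// ```rust
+ /// use widestring::U16CString;
+ /// let buf = [72u16, 105u16, 0u16, 33u16]; // 'H' 'i' NUL '!'
+ /// // Scanning stops at the first nul, so the trailing '!' is not copied
+ /// let wcstr = unsafe { U16CString::from_ptr_str(buf.as_ptr()) };
+ /// assert_eq!(wcstr.to_string_lossy(), "Hi");
+ /// ```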
+ pub unsafe fn from_ptr_str(p: *const C) -> Self { + assert!(!p.is_null()); + let mut i: isize = 0; + while *p.offset(i) != UChar::NUL { + i += 1; + } + let slice = slice::from_raw_parts(p, i as usize + 1); + UCString::from_vec_with_nul_unchecked(slice) + } + + /// Converts to a `UCStr` reference. + pub fn as_ucstr(&self) -> &UCStr<C> { + self + } + + /// Converts the wide string into a `Vec` without a nul terminator, consuming the string in + /// the process. + /// + /// The resulting vector will **not** contain a nul-terminator, and will contain no other nul + /// values. + pub fn into_vec(self) -> Vec<C> { + let mut v = self.into_inner().into_vec(); + v.pop(); + v + } + + /// Converts the wide string into a `Vec`, consuming the string in the process. + /// + /// The resulting vector will contain a nul-terminator and no interior nul values. + pub fn into_vec_with_nul(self) -> Vec<C> { + self.into_inner().into_vec() + } + + /// Transfers ownership of the wide string to a C caller. + /// + /// # Safety + /// + /// The pointer must be returned to Rust and reconstituted using `from_raw` to be properly + /// deallocated. Specifically, one should _not_ use the standard C `free` function to + /// deallocate this string. + /// + /// Failure to call `from_raw` will lead to a memory leak. + pub fn into_raw(self) -> *mut C { + Box::into_raw(self.into_inner()) as *mut C + } + + /// Retakes ownership of a `UCString` that was transferred to C. + /// + /// # Safety + /// + /// This should only ever be called with a pointer that was earlier obtained by calling + /// `into_raw` on a `UCString`. Additionally, the length of the string will be recalculated + /// from the pointer. 
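+ ///
+ /// # Examples
+ ///
+ /// A round-trip sketch. The pointer produced by `into_raw` is handed back to `from_raw`, so
+ /// the allocation is freed by Rust rather than leaked. A real program would pass `raw` through
+ /// C code in between.
+ ///
+ /// ```rust
+ /// use widestring::U16CString;
+ /// let wcstr = U16CString::from_str("Test").unwrap();
+ /// // Give up ownership of the buffer
+ /// let raw = wcstr.into_raw();
+ /// // Reclaim ownership; the length is recomputed by scanning for the nul terminator
+ /// let back = unsafe { U16CString::from_raw(raw) };
+ /// assert_eq!(back.to_string_lossy(), "Test");
+ /// ```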
+ pub unsafe fn from_raw(p: *mut C) -> Self { + assert!(!p.is_null()); + let mut i: isize = 0; + while *p.offset(i) != UChar::NUL { + i += 1; + } + let slice = slice::from_raw_parts_mut(p, i as usize + 1); + UCString { + inner: mem::transmute(slice), + } + } + + /// Converts this `UCString` into a boxed `UCStr`. + /// + /// # Examples + /// + /// ``` + /// use widestring::{U16CString, U16CStr}; + /// + /// let mut v = vec![102u16, 111u16, 111u16]; // "foo" + /// let c_string = U16CString::new(v.clone()).unwrap(); + /// let boxed = c_string.into_boxed_ucstr(); + /// v.push(0); + /// assert_eq!(&*boxed, U16CStr::from_slice_with_nul(&v).unwrap()); + /// ``` + /// + /// ``` + /// use widestring::{U32CString, U32CStr}; + /// + /// let mut v = vec![102u32, 111u32, 111u32]; // "foo" + /// let c_string = U32CString::new(v.clone()).unwrap(); + /// let boxed = c_string.into_boxed_ucstr(); + /// v.push(0); + /// assert_eq!(&*boxed, U32CStr::from_slice_with_nul(&v).unwrap()); + /// ``` + pub fn into_boxed_ucstr(self) -> Box<UCStr<C>> { + unsafe { Box::from_raw(Box::into_raw(self.into_inner()) as *mut UCStr<C>) } + } + + /// Bypass "move out of struct which implements [`Drop`] trait" restriction. + /// + /// [`Drop`]: ../ops/trait.Drop.html + fn into_inner(self) -> Box<[C]> { + unsafe { + let result = ptr::read(&self.inner); + mem::forget(self); + result + } + } +} + +impl UCString<u16> { + /// Constructs a `U16CString` from a `str`. + /// + /// The string will be scanned for invalid nul values. + /// + /// # Failures + /// + /// This function will return an error if the data contains a nul value. + /// The returned error will contain a `Vec<u16>` as well as the position of the nul value. 
+ /// + /// # Examples + /// + /// ```rust + /// use widestring::U16CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wcstr = U16CString::from_str(s).unwrap(); + /// # assert_eq!(wcstr.to_string_lossy(), s); + /// ``` + /// + /// The following example demonstrates errors from nul values in a vector. + /// + /// ```rust + /// use widestring::U16CString; + /// let s = "My\u{0}String"; + /// // Create a wide string from the string + /// let res = U16CString::from_str(s); + /// assert!(res.is_err()); + /// assert_eq!(res.err().unwrap().nul_position(), 2); + /// ``` + #[allow(clippy::should_implement_trait)] + pub fn from_str(s: impl AsRef<str>) -> Result<Self, NulError<u16>> { + let v: Vec<u16> = s.as_ref().encode_utf16().collect(); + UCString::new(v) + } + + /// Constructs a `U16CString` from a `str`, without checking for interior nul values. + /// + /// # Safety + /// + /// This method is equivalent to `from_str` except that no runtime assertion is made that `s` + /// contains no nul values. Providing a string with nul values will result in an invalid + /// `U16CString`. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U16CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wcstr = unsafe { U16CString::from_str_unchecked(s) }; + /// # assert_eq!(wcstr.to_string_lossy(), s); + /// ``` + pub unsafe fn from_str_unchecked(s: impl AsRef<str>) -> Self { + let v: Vec<u16> = s.as_ref().encode_utf16().collect(); + UCString::from_vec_unchecked(v) + } + + /// Constructs a `U16CString` from a `str` with a nul terminator. + /// + /// The string will be truncated at the first nul value in the string. + /// + /// # Failures + /// + /// This function will return an error if the data does not contain a nul to terminate the + /// string. The returned error will contain the consumed `Vec<u16>`. 
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U16CString;
+ /// let s = "My\u{0}String";
+ /// // Create a wide string from the string
+ /// let wcstr = U16CString::from_str_with_nul(s).unwrap();
+ /// assert_eq!(wcstr.to_string_lossy(), "My");
+ /// ```
+ ///
+ /// The following example demonstrates errors from missing nul values in a string.
+ ///
+ /// ```rust
+ /// use widestring::U16CString;
+ /// let s = "MyString";
+ /// // Create a wide string from the string
+ /// let res = U16CString::from_str_with_nul(s);
+ /// assert!(res.is_err());
+ /// ```
+ pub fn from_str_with_nul(s: impl AsRef<str>) -> Result<Self, MissingNulError<u16>> {
+ let v: Vec<u16> = s.as_ref().encode_utf16().collect();
+ UCString::from_vec_with_nul(v)
+ }
+
+ /// Constructs a `U16CString` from a `str` that should have a terminating nul, but without
+ /// checking for any nul values.
+ ///
+ /// # Safety
+ ///
+ /// This method is equivalent to `from_str_with_nul` except that no runtime assertion is made
+ /// that `s` contains no nul values. Providing a string with interior nul values or without a
+ /// terminating nul value will result in an invalid `U16CString`.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U16CString;
+ /// let s = "My String\u{0}";
+ /// // Create a wide string from the string
+ /// let wcstr = unsafe { U16CString::from_str_with_nul_unchecked(s) };
+ /// assert_eq!(wcstr.to_string_lossy(), "My String");
+ /// ```
+ pub unsafe fn from_str_with_nul_unchecked(s: impl AsRef<str>) -> Self {
+ let v: Vec<u16> = s.as_ref().encode_utf16().collect();
+ UCString::from_vec_with_nul_unchecked(v)
+ }
+
+ /// Constructs a new `U16CString` copied from a `u16` pointer and a length.
+ ///
+ /// The `len` argument is the number of `u16` elements, **not** the number of bytes.
+ ///
+ /// The string will be scanned for invalid nul values.
+ ///
+ /// # Failures
+ ///
+ /// This function will return an error if the data contains a nul value.
+ /// The returned error will contain a `Vec<u16>` as well as the position of the nul value.
+ ///
+ /// # Safety
+ ///
+ /// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
+ /// elements.
+ ///
+ /// # Panics
+ ///
+ /// Panics if `len` is greater than 0 but `p` is a null pointer.
+ pub unsafe fn from_ptr(p: *const u16, len: usize) -> Result<Self, NulError<u16>> {
+ if len == 0 {
+ return Ok(UCString::default());
+ }
+ assert!(!p.is_null());
+ let slice = slice::from_raw_parts(p, len);
+ UCString::new(slice)
+ }
+
+ /// Constructs a new `U16CString` copied from a `u16` pointer and a length.
+ ///
+ /// The `len` argument is the number of `u16` elements, **not** the number of bytes.
+ ///
+ /// The string will **not** be checked for invalid nul values.
+ ///
+ /// # Safety
+ ///
+ /// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
+ /// elements. In addition, no checking for invalid nul values is performed, so if any elements
+ /// of `p` are a nul value, the resulting `U16CString` will be invalid.
+ ///
+ /// # Panics
+ ///
+ /// Panics if `len` is greater than 0 but `p` is a null pointer.
+ pub unsafe fn from_ptr_unchecked(p: *const u16, len: usize) -> Self {
+ if len == 0 {
+ return UCString::default();
+ }
+ assert!(!p.is_null());
+ let slice = slice::from_raw_parts(p, len);
+ UCString::from_vec_unchecked(slice)
+ }
+
+ /// Constructs a new `U16CString` copied from a `u16` pointer and a length.
+ ///
+ /// The `len` argument is the number of `u16` elements, **not** the number of bytes.
+ ///
+ /// The string will be truncated at the first nul value in the string.
+ ///
+ /// # Failures
+ ///
+ /// This function will return an error if the data does not contain a nul to terminate the
+ /// string. The returned error will contain the copied data as a `Vec<u16>`. If `len` is 0, an
+ /// empty string is returned instead of an error.
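+ ///
+ /// # Examples
+ ///
+ /// A minimal sketch in which a local array stands in for a buffer and length returned by a
+ /// C API.
+ ///
+ /// ```rust
+ /// use widestring::U16CString;
+ /// let buf = [72u16, 105u16, 0u16, 33u16]; // 'H' 'i' NUL '!'
+ /// // The data is truncated at the first nul, so only "Hi" is kept
+ /// let wcstr = unsafe { U16CString::from_ptr_with_nul(buf.as_ptr(), buf.len()) }.unwrap();
+ /// assert_eq!(wcstr.to_string_lossy(), "Hi");
+ /// ```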
+ ///
+ /// # Safety
+ ///
+ /// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
+ /// elements.
+ ///
+ /// # Panics
+ ///
+ /// Panics if `len` is greater than 0 but `p` is a null pointer.
+ pub unsafe fn from_ptr_with_nul(
+ p: *const u16,
+ len: usize,
+ ) -> Result<Self, MissingNulError<u16>> {
+ if len == 0 {
+ return Ok(UCString::default());
+ }
+ assert!(!p.is_null());
+ let slice = slice::from_raw_parts(p, len);
+ UCString::from_vec_with_nul(slice)
+ }
+
+ /// Constructs a new `U16CString` copied from a `u16` pointer and a length.
+ ///
+ /// The `len` argument is the number of `u16` elements, **not** the number of bytes.
+ ///
+ /// The data should end with a nul terminator, but no checking is done on whether the data
+ /// actually ends with a nul terminator, or if the data contains any interior nul values.
+ ///
+ /// # Safety
+ ///
+ /// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
+ /// elements. In addition, no checking for nul values is performed, so if the data does not
+ /// end with a nul terminator, or if there are any interior nul values, the resulting
+ /// `U16CString` will be invalid.
+ ///
+ /// # Panics
+ ///
+ /// Panics if `len` is greater than 0 but `p` is a null pointer.
+ pub unsafe fn from_ptr_with_nul_unchecked(p: *const u16, len: usize) -> Self {
+ if len == 0 {
+ return UCString::default();
+ }
+ assert!(!p.is_null());
+ let slice = slice::from_raw_parts(p, len);
+ UCString::from_vec_with_nul_unchecked(slice)
+ }
+
+ /// Constructs a `U16CString` from anything that can be converted to an `OsStr`.
+ ///
+ /// The string will be scanned for invalid nul values.
+ ///
+ /// # Failures
+ ///
+ /// This function will return an error if the data contains a nul value.
+ /// The returned error will contain a `Vec<u16>` as well as the position of the nul value.
+ /// + /// # Examples + /// + /// ```rust + /// use widestring::U16CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wcstr = U16CString::from_os_str(s).unwrap(); + /// # assert_eq!(wcstr.to_string_lossy(), s); + /// ``` + /// + /// The following example demonstrates errors from nul values in a vector. + /// + /// ```rust + /// use widestring::U16CString; + /// let s = "My\u{0}String"; + /// // Create a wide string from the string + /// let res = U16CString::from_os_str(s); + /// assert!(res.is_err()); + /// assert_eq!(res.err().unwrap().nul_position(), 2); + /// ``` + #[cfg(feature = "std")] + pub fn from_os_str(s: impl AsRef<std::ffi::OsStr>) -> Result<Self, NulError<u16>> { + let v = crate::platform::os_to_wide(s.as_ref()); + UCString::new(v) + } + + /// Constructs a `U16CString` from anything that can be converted to an `OsStr`, without + /// checking for interior nul values. + /// + /// # Safety + /// + /// This method is equivalent to `from_os_str` except that no runtime assertion is made that + /// `s` contains no nul values. Providing a string with nul values will result in an invalid + /// `U16CString`. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U16CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wcstr = unsafe { U16CString::from_os_str_unchecked(s) }; + /// # assert_eq!(wcstr.to_string_lossy(), s); + /// ``` + #[cfg(feature = "std")] + pub unsafe fn from_os_str_unchecked(s: impl AsRef<std::ffi::OsStr>) -> Self { + let v = crate::platform::os_to_wide(s.as_ref()); + UCString::from_vec_unchecked(v) + } + + /// Constructs a `U16CString` from anything that can be converted to an `OsStr` with a nul + /// terminator. + /// + /// The string will be truncated at the first nul value in the string. + /// + /// # Failures + /// + /// This function will return an error if the data does not contain a nul to terminate the + /// string. 
The returned error will contain the consumed `Vec<u16>`. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U16CString; + /// let s = "My\u{0}String"; + /// // Create a wide string from the string + /// let wcstr = U16CString::from_os_str_with_nul(s).unwrap(); + /// assert_eq!(wcstr.to_string_lossy(), "My"); + /// ``` + /// + /// The following example demonstrates errors from missing nul values in a vector. + /// + /// ```rust + /// use widestring::U16CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let res = U16CString::from_os_str_with_nul(s); + /// assert!(res.is_err()); + /// ``` + #[cfg(feature = "std")] + pub fn from_os_str_with_nul( + s: impl AsRef<std::ffi::OsStr>, + ) -> Result<Self, MissingNulError<u16>> { + let v = crate::platform::os_to_wide(s.as_ref()); + UCString::from_vec_with_nul(v) + } + + /// Constructs a `U16CString` from anything that can be converted to an `OsStr` that should + /// have a terminating nul, but without checking for any nul values. + /// + /// # Safety + /// + /// This method is equivalent to `from_os_str_with_nul` except that no runtime assertion is + /// made that `s` contains no nul values. Providing a vector with interior nul values or + /// without a terminating nul value will result in an invalid `U16CString`. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U16CString; + /// let s = "My String\u{0}"; + /// // Create a wide string from the string + /// let wcstr = unsafe { U16CString::from_os_str_with_nul_unchecked(s) }; + /// assert_eq!(wcstr.to_string_lossy(), "My String"); + /// ``` + #[cfg(feature = "std")] + pub unsafe fn from_os_str_with_nul_unchecked(s: impl AsRef<std::ffi::OsStr>) -> Self { + let v = crate::platform::os_to_wide(s.as_ref()); + UCString::from_vec_with_nul_unchecked(v) + } +} + +impl UCString<u32> { + /// Constructs a `U32CString` from a container of wide character data. 
+ /// + /// This method will consume the provided data and use the underlying elements to construct a + /// new string. The data will be scanned for invalid nul values. + /// + /// # Failures + /// + /// This function will return an error if the data contains a nul value. + /// The returned error will contain the `Vec<u32>` as well as the position of the nul value. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32CString; + /// let v: Vec<char> = "Test".chars().collect(); + /// # let cloned: Vec<u32> = v.iter().map(|&c| c as u32).collect(); + /// // Create a wide string from the vector + /// let wcstr = U32CString::from_chars(v).unwrap(); + /// # assert_eq!(wcstr.into_vec(), cloned); + /// ``` + /// + /// The following example demonstrates errors from nul values in a vector. + /// + /// ```rust + /// use widestring::U32CString; + /// let v: Vec<char> = "T\u{0}est".chars().collect(); + /// // Create a wide string from the vector + /// let res = U32CString::from_chars(v); + /// assert!(res.is_err()); + /// assert_eq!(res.err().unwrap().nul_position(), 1); + /// ``` + pub fn from_chars(v: impl Into<Vec<char>>) -> Result<Self, NulError<u32>> { + let mut chars = v.into(); + let v: Vec<u32> = unsafe { + let ptr = chars.as_mut_ptr() as *mut u32; + let len = chars.len(); + let cap = chars.capacity(); + ManuallyDrop::new(chars); + Vec::from_raw_parts(ptr, len, cap) + }; + UCString::new(v) + } + + /// Constructs a `U32CString` from a nul-terminated container of UTF-32 data. + /// + /// This method will consume the provided data and use the underlying elements to construct a + /// new string. The string will be truncated at the first nul value in the string. + /// + /// # Failures + /// + /// This function will return an error if the data does not contain a nul to terminate the + /// string. The returned error will contain the consumed `Vec<u32>`. 
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U32CString;
+ /// let v: Vec<char> = "Test\u{0}".chars().collect();
+ /// # let cloned: Vec<u32> = v[..4].iter().map(|&c| c as u32).collect();
+ /// // Create a wide string from the vector
+ /// let wcstr = U32CString::from_chars_with_nul(v).unwrap();
+ /// # assert_eq!(wcstr.into_vec(), cloned);
+ /// ```
+ ///
+ /// The following example demonstrates errors from missing nul values in a vector.
+ ///
+ /// ```rust
+ /// use widestring::U32CString;
+ /// let v: Vec<char> = "Test".chars().collect();
+ /// // Create a wide string from the vector
+ /// let res = U32CString::from_chars_with_nul(v);
+ /// assert!(res.is_err());
+ /// ```
+ pub fn from_chars_with_nul(v: impl Into<Vec<char>>) -> Result<Self, MissingNulError<u32>> {
+ let mut chars = v.into();
+ let v: Vec<u32> = unsafe {
+ let ptr = chars.as_mut_ptr() as *mut u32;
+ let len = chars.len();
+ let cap = chars.capacity();
+ ManuallyDrop::new(chars);
+ Vec::from_raw_parts(ptr, len, cap)
+ };
+ UCString::from_vec_with_nul(v)
+ }
+
+ /// Creates a `U32CString` from a vector of `char`s without checking for interior nul values.
+ ///
+ /// A terminating nul value will be appended if the vector does not already have a terminating
+ /// nul.
+ ///
+ /// # Safety
+ ///
+ /// This method is equivalent to `from_chars` except that no runtime assertion is made that `v`
+ /// contains no nul values. Providing a vector with nul values will result in an invalid
+ /// `U32CString`.
+ pub unsafe fn from_chars_unchecked(v: impl Into<Vec<char>>) -> Self {
+ let mut chars = v.into();
+ let v: Vec<u32> = {
+ let ptr = chars.as_mut_ptr() as *mut u32;
+ let len = chars.len();
+ let cap = chars.capacity();
+ ManuallyDrop::new(chars);
+ Vec::from_raw_parts(ptr, len, cap)
+ };
+ UCString::from_vec_unchecked(v)
+ }
+
+ /// Creates a `U32CString` from a vector of `char`s that should have a nul terminator, without
+ /// checking for any nul values.
+ ///
+ /// # Safety
+ ///
+ /// This method is equivalent to `from_chars_with_nul` except that no runtime assertion is made
+ /// that `v` contains no nul values. Providing a vector with interior nul values or without a
+ /// terminating nul value will result in an invalid `U32CString`.
+ pub unsafe fn from_chars_with_nul_unchecked(v: impl Into<Vec<char>>) -> Self {
+ let mut chars = v.into();
+ let v: Vec<u32> = {
+ let ptr = chars.as_mut_ptr() as *mut u32;
+ let len = chars.len();
+ let cap = chars.capacity();
+ ManuallyDrop::new(chars);
+ Vec::from_raw_parts(ptr, len, cap)
+ };
+ UCString::from_vec_with_nul_unchecked(v)
+ }
+
+ /// Constructs a `U32CString` from a `str`.
+ ///
+ /// The string will be scanned for invalid nul values.
+ ///
+ /// # Failures
+ ///
+ /// This function will return an error if the data contains a nul value.
+ /// The returned error will contain a `Vec<u32>` as well as the position of the nul value.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U32CString;
+ /// let s = "MyString";
+ /// // Create a wide string from the string
+ /// let wcstr = U32CString::from_str(s).unwrap();
+ /// # assert_eq!(wcstr.to_string_lossy(), s);
+ /// ```
+ ///
+ /// The following example demonstrates errors from nul values in a string.
+ ///
+ /// ```rust
+ /// use widestring::U32CString;
+ /// let s = "My\u{0}String";
+ /// // Create a wide string from the string
+ /// let res = U32CString::from_str(s);
+ /// assert!(res.is_err());
+ /// assert_eq!(res.err().unwrap().nul_position(), 2);
+ /// ```
+ #[allow(clippy::should_implement_trait)]
+ pub fn from_str(s: impl AsRef<str>) -> Result<Self, NulError<u32>> {
+ let v: Vec<char> = s.as_ref().chars().collect();
+ UCString::from_chars(v)
+ }
+
+ /// Constructs a `U32CString` from a `str`, without checking for interior nul values.
+ ///
+ /// # Safety
+ ///
+ /// This method is equivalent to `from_str` except that no runtime assertion is made that `s`
+ /// contains no nul values.
Providing a string with nul values will result in an invalid + /// `U32CString`. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wcstr = unsafe { U32CString::from_str_unchecked(s) }; + /// # assert_eq!(wcstr.to_string_lossy(), s); + /// ``` + pub unsafe fn from_str_unchecked(s: impl AsRef<str>) -> Self { + let v: Vec<char> = s.as_ref().chars().collect(); + UCString::from_chars_unchecked(v) + } + + /// Constructs a `U32CString` from a `str` with a nul terminator. + /// + /// The string will be truncated at the first nul value in the string. + /// + /// # Failures + /// + /// This function will return an error if the data does not contain a nul to terminate the + /// string. The returned error will contain the consumed `Vec<u32>`. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "My\u{0}String"; + /// // Create a wide string from the string + /// let wcstr = U32CString::from_str_with_nul(s).unwrap(); + /// assert_eq!(wcstr.to_string_lossy(), "My"); + /// ``` + /// + /// The following example demonstrates errors from missing nul values in a vector. + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let res = U32CString::from_str_with_nul(s); + /// assert!(res.is_err()); + /// ``` + pub fn from_str_with_nul(s: impl AsRef<str>) -> Result<Self, MissingNulError<u32>> { + let v: Vec<char> = s.as_ref().chars().collect(); + UCString::from_chars_with_nul(v) + } + + /// Constructs a `U32CString` from a `str` that should have a terminating nul, but without + /// checking for any nul values. + /// + /// # Safety + /// + /// This method is equivalent to `from_str_with_nul` except that no runtime assertion is made + /// that `s` contains no nul values. 
Providing a string with interior nul values or without a + /// terminating nul value will result in an invalid `U32CString`. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "My String\u{0}"; + /// // Create a wide string from the string + /// let wcstr = unsafe { U32CString::from_str_with_nul_unchecked(s) }; + /// assert_eq!(wcstr.to_string_lossy(), "My String"); + /// ``` + pub unsafe fn from_str_with_nul_unchecked(s: impl AsRef<str>) -> Self { + let v: Vec<char> = s.as_ref().chars().collect(); + UCString::from_chars_with_nul_unchecked(v) + } + + /// Constructs a new `U32CString` copied from a `u32` pointer and a length. + /// + /// The `len` argument is the number of `u32` elements, **not** the number of bytes. + /// + /// The string will be scanned for invalid nul values. + /// + /// # Failures + /// + /// This function will return an error if the data contains a nul value. + /// The returned error will contain a `Vec<u32>` as well as the position of the nul value. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` + /// elements. + /// + /// # Panics + /// + /// Panics if `len` is greater than 0 but `p` is a null pointer. + pub unsafe fn from_ptr(p: *const u32, len: usize) -> Result<Self, NulError<u32>> { + if len == 0 { + return Ok(UCString::default()); + } + assert!(!p.is_null()); + let slice = slice::from_raw_parts(p, len); + UCString::new(slice) + } + + /// Constructs a new `U32CString` copied from a `u32` pointer and a length. + /// + /// The `len` argument is the number of `u32` elements, **not** the number of bytes. + /// + /// The string will **not** be checked for invalid nul values. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` + /// elements.
In addition, no checking for invalid nul values is performed, so if any elements + /// of `p` are a nul value, the resulting `U32CString` will be invalid. + /// + /// # Panics + /// + /// Panics if `len` is greater than 0 but `p` is a null pointer. + pub unsafe fn from_ptr_unchecked(p: *const u32, len: usize) -> Self { + if len == 0 { + return UCString::default(); + } + assert!(!p.is_null()); + let slice = slice::from_raw_parts(p, len); + UCString::from_vec_unchecked(slice) + } + + /// Constructs a new `U32CString` copied from a `u32` pointer and a length. + /// + /// The `len` argument is the number of `u32` elements, **not** the number of bytes. + /// + /// The string will be truncated at the first nul value in the string. + /// + /// # Failures + /// + /// This function will return an error if the data does not contain a nul to terminate the + /// string. The returned error will contain the consumed `Vec<u32>`. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` + /// elements. + /// + /// # Panics + /// + /// Panics if `len` is greater than 0 but `p` is a null pointer. + pub unsafe fn from_ptr_with_nul( + p: *const u32, + len: usize, + ) -> Result<Self, MissingNulError<u32>> { + if len == 0 { + return Ok(UCString::default()); + } + assert!(!p.is_null()); + let slice = slice::from_raw_parts(p, len); + UCString::from_vec_with_nul(slice) + } + + /// Constructs a new `U32CString` copied from a `u32` pointer and a length. + /// + /// The `len` argument is the number of `u32` elements, **not** the number of bytes. + /// + /// The data should end with a nul terminator, but no checking is done on whether the data + /// actually ends with a nul terminator, or if the data contains any interior nul values. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` + /// elements.
In addition, no checking for nul values is performed, so if the data does not + /// end with a nul terminator, or if there are any interior nul values, the resulting + /// `U32CString` will be invalid. + /// + /// # Panics + /// + /// Panics if `len` is greater than 0 but `p` is a null pointer. + pub unsafe fn from_ptr_with_nul_unchecked(p: *const u32, len: usize) -> Self { + if len == 0 { + return UCString::default(); + } + assert!(!p.is_null()); + let slice = slice::from_raw_parts(p, len); + UCString::from_vec_with_nul_unchecked(slice) + } + + /// Constructs a new `U32CString` copied from a `char` pointer and a length. + /// + /// The `len` argument is the number of `char` elements, **not** the number of bytes. + /// + /// The string will be scanned for invalid nul values. + /// + /// # Failures + /// + /// This function will return an error if the data contains a nul value. + /// The returned error will contain a `Vec<u32>` as well as the position of the nul value. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` + /// elements. + /// + /// # Panics + /// + /// Panics if `len` is greater than 0 but `p` is a null pointer. + pub unsafe fn from_char_ptr(p: *const char, len: usize) -> Result<Self, NulError<u32>> { + UCString::<u32>::from_ptr(p as *const u32, len) + } + + /// Constructs a new `U32CString` copied from a `char` pointer and a length. + /// + /// The `len` argument is the number of `char` elements, **not** the number of bytes. + /// + /// The string will **not** be checked for invalid nul values. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` + /// elements. In addition, no checking for invalid nul values is performed, so if any elements + /// of `p` are a nul value, the resulting `U32CString` will be invalid. + /// + /// # Panics + /// + /// Panics if `len` is greater than 0 but `p` is a null pointer.
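+ /// + /// # Examples + /// + /// A minimal sketch, assuming a local `char` array stands in for a buffer received over FFI. + /// The array contains no nul values, which satisfies the safety requirement above. + /// + /// ```rust + /// use widestring::U32CString; + /// // Stand-in for a `char` buffer and its length received over FFI. + /// let chars = ['M', 'y']; + /// let wcstr = unsafe { U32CString::from_char_ptr_unchecked(chars.as_ptr(), chars.len()) }; + /// assert_eq!(wcstr.to_string_lossy(), "My"); + /// ```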
+ pub unsafe fn from_char_ptr_unchecked(p: *const char, len: usize) -> Self { + UCString::<u32>::from_ptr_unchecked(p as *const u32, len) + } + + /// Constructs a new `U32CString` copied from a `char` pointer and a length. + /// + /// The `len` argument is the number of `char` elements, **not** the number of bytes. + /// + /// The string will be truncated at the first nul value in the string. + /// + /// # Failures + /// + /// This function will return an error if the data does not contain a nul to terminate the + /// string. The returned error will contain the consumed `Vec<u32>`. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` + /// elements. + /// + /// # Panics + /// + /// Panics if `len` is greater than 0 but `p` is a null pointer. + pub unsafe fn from_char_ptr_with_nul( + p: *const char, + len: usize, + ) -> Result<Self, MissingNulError<u32>> { + UCString::<u32>::from_ptr_with_nul(p as *const u32, len) + } + + /// Constructs a new `U32CString` copied from a `char` pointer and a length. + /// + /// The `len` argument is the number of `char` elements, **not** the number of bytes. + /// + /// The data should end with a nul terminator, but no checking is done on whether the data + /// actually ends with a nul terminator, or if the data contains any interior nul values. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` + /// elements. In addition, no checking for nul values is performed, so if the data does not + /// end with a nul terminator, or if there are any interior nul values, the resulting + /// `U32CString` will be invalid. + /// + /// # Panics + /// + /// Panics if `len` is greater than 0 but `p` is a null pointer.
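+ /// + /// # Examples + /// + /// A minimal sketch, assuming a local `char` array stands in for a buffer received over FFI. + /// The array ends with a nul terminator and has no interior nul values, as required above. + /// + /// ```rust + /// use widestring::U32CString; + /// // Stand-in for a nul-terminated `char` buffer; the length includes the nul. + /// let chars = ['M', 'y', '\0']; + /// let wcstr = + /// unsafe { U32CString::from_char_ptr_with_nul_unchecked(chars.as_ptr(), chars.len()) }; + /// assert_eq!(wcstr.to_string_lossy(), "My"); + /// ```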
+ pub unsafe fn from_char_ptr_with_nul_unchecked(p: *const char, len: usize) -> Self { + UCString::<u32>::from_ptr_with_nul_unchecked(p as *const u32, len) + } + + /// Constructs a `U32CString` from anything that can be converted to an `OsStr`. + /// + /// The string will be scanned for invalid nul values. + /// + /// # Failures + /// + /// This function will return an error if the data contains a nul value. + /// The returned error will contain a `Vec<u32>` as well as the position of the nul value. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wcstr = U32CString::from_os_str(s).unwrap(); + /// # assert_eq!(wcstr.to_string_lossy(), s); + /// ``` + /// + /// The following example demonstrates errors from nul values in a string. + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "My\u{0}String"; + /// // Create a wide string from the string + /// let res = U32CString::from_os_str(s); + /// assert!(res.is_err()); + /// assert_eq!(res.err().unwrap().nul_position(), 2); + /// ``` + #[cfg(feature = "std")] + pub fn from_os_str(s: impl AsRef<std::ffi::OsStr>) -> Result<Self, NulError<u32>> { + let v: Vec<char> = s.as_ref().to_string_lossy().chars().collect(); + UCString::from_chars(v) + } + + /// Constructs a `U32CString` from anything that can be converted to an `OsStr`, without + /// checking for interior nul values. + /// + /// # Safety + /// + /// This method is equivalent to `from_os_str` except that no runtime assertion is made that + /// `s` contains no nul values. Providing a string with nul values will result in an invalid + /// `U32CString`.
+ /// + /// # Examples + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wcstr = unsafe { U32CString::from_os_str_unchecked(s) }; + /// # assert_eq!(wcstr.to_string_lossy(), s); + /// ``` + #[cfg(feature = "std")] + pub unsafe fn from_os_str_unchecked(s: impl AsRef<std::ffi::OsStr>) -> Self { + let v: Vec<char> = s.as_ref().to_string_lossy().chars().collect(); + UCString::from_chars_unchecked(v) + } + + /// Constructs a `U32CString` from anything that can be converted to an `OsStr` with a nul + /// terminator. + /// + /// The string will be truncated at the first nul value in the string. + /// + /// # Failures + /// + /// This function will return an error if the data does not contain a nul to terminate the + /// string. The returned error will contain the consumed `Vec<u32>`. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "My\u{0}String"; + /// // Create a wide string from the string + /// let wcstr = U32CString::from_os_str_with_nul(s).unwrap(); + /// assert_eq!(wcstr.to_string_lossy(), "My"); + /// ``` + /// + /// The following example demonstrates errors from a missing nul terminator in a string. + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let res = U32CString::from_os_str_with_nul(s); + /// assert!(res.is_err()); + /// ``` + #[cfg(feature = "std")] + pub fn from_os_str_with_nul( + s: impl AsRef<std::ffi::OsStr>, + ) -> Result<Self, MissingNulError<u32>> { + let v: Vec<char> = s.as_ref().to_string_lossy().chars().collect(); + UCString::from_chars_with_nul(v) + } + + /// Constructs a `U32CString` from anything that can be converted to an `OsStr` that should + /// have a terminating nul, but without checking for any nul values.
+ /// + /// # Safety + /// + /// This method is equivalent to `from_os_str_with_nul` except that no runtime assertion is + /// made that `s` contains no nul values. Providing a string with interior nul values or + /// without a terminating nul value will result in an invalid `U32CString`. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "My String\u{0}"; + /// // Create a wide string from the string + /// let wcstr = unsafe { U32CString::from_os_str_with_nul_unchecked(s) }; + /// assert_eq!(wcstr.to_string_lossy(), "My String"); + /// ``` + #[cfg(feature = "std")] + pub unsafe fn from_os_str_with_nul_unchecked(s: impl AsRef<std::ffi::OsStr>) -> Self { + let v: Vec<char> = s.as_ref().to_string_lossy().chars().collect(); + UCString::from_chars_with_nul_unchecked(v) + } +} + +impl<C: UChar> Into<Vec<C>> for UCString<C> { + fn into(self) -> Vec<C> { + self.into_vec() + } +} + +impl<'a> From<UCString<u16>> for Cow<'a, UCStr<u16>> { + fn from(s: UCString<u16>) -> Cow<'a, UCStr<u16>> { + Cow::Owned(s) + } +} + +impl<'a> From<UCString<u32>> for Cow<'a, UCStr<u32>> { + fn from(s: UCString<u32>) -> Cow<'a, UCStr<u32>> { + Cow::Owned(s) + } +} + +#[cfg(feature = "std")] +impl From<UCString<u16>> for std::ffi::OsString { + fn from(s: UCString<u16>) -> std::ffi::OsString { + s.to_os_string() + } +} + +#[cfg(feature = "std")] +impl From<UCString<u32>> for std::ffi::OsString { + fn from(s: UCString<u32>) -> std::ffi::OsString { + s.to_os_string() + } +} + +impl<C: UChar> From<UCString<C>> for UString<C> { + fn from(s: UCString<C>) -> Self { + s.to_ustring() + } +} + +impl<'a, C: UChar, T: ?Sized + AsRef<UCStr<C>>> From<&'a T> for UCString<C> { + fn from(s: &'a T) -> Self { + s.as_ref().to_ucstring() + } +} + +impl<C: UChar> Index<RangeFull> for UCString<C> { + type Output = UCStr<C>; + + #[inline] + fn index(&self, _index: RangeFull) -> &UCStr<C> { + UCStr::from_inner(&self.inner) + } +} + +impl<C: UChar> Deref for UCString<C> { + type
Target = UCStr<C>; + + #[inline] + fn deref(&self) -> &UCStr<C> { + &self[..] + } +} + +impl<'a> Default for &'a UCStr<u16> { + fn default() -> Self { + const SLICE: &[u16] = &[UChar::NUL]; + unsafe { UCStr::from_slice_with_nul_unchecked(SLICE) } + } +} + +impl<'a> Default for &'a UCStr<u32> { + fn default() -> Self { + const SLICE: &[u32] = &[UChar::NUL]; + unsafe { UCStr::from_slice_with_nul_unchecked(SLICE) } + } +} + +impl Default for UCString<u16> { + fn default() -> Self { + let def: &UCStr<u16> = Default::default(); + def.to_ucstring() + } +} + +impl Default for UCString<u32> { + fn default() -> Self { + let def: &UCStr<u32> = Default::default(); + def.to_ucstring() + } +} + +// Turns this `UCString` into an empty string to prevent +// memory unsafe code from working by accident. Inline +// to prevent LLVM from optimizing it away in debug builds. +impl<C: UChar> Drop for UCString<C> { + #[inline] + fn drop(&mut self) { + unsafe { + *self.inner.get_unchecked_mut(0) = UChar::NUL; + } + } +} + +impl<C: UChar> Borrow<UCStr<C>> for UCString<C> { + fn borrow(&self) -> &UCStr<C> { + &self[..]
+ } +} + +impl<C: UChar> ToOwned for UCStr<C> { + type Owned = UCString<C>; + fn to_owned(&self) -> UCString<C> { + self.to_ucstring() + } +} + +impl<'a> From<&'a UCStr<u16>> for Cow<'a, UCStr<u16>> { + fn from(s: &'a UCStr<u16>) -> Cow<'a, UCStr<u16>> { + Cow::Borrowed(s) + } +} + +impl<'a> From<&'a UCStr<u32>> for Cow<'a, UCStr<u32>> { + fn from(s: &'a UCStr<u32>) -> Cow<'a, UCStr<u32>> { + Cow::Borrowed(s) + } +} + +impl<C: UChar> AsRef<UCStr<C>> for UCStr<C> { + fn as_ref(&self) -> &Self { + self + } +} + +impl<C: UChar> AsRef<UCStr<C>> for UCString<C> { + fn as_ref(&self) -> &UCStr<C> { + self + } +} + +impl<C: UChar> AsRef<[C]> for UCStr<C> { + fn as_ref(&self) -> &[C] { + self.as_slice() + } +} + +impl<C: UChar> AsRef<[C]> for UCString<C> { + fn as_ref(&self) -> &[C] { + self.as_slice() + } +} + +impl<'a, C: UChar> From<&'a UCStr<C>> for Box<UCStr<C>> { + fn from(s: &'a UCStr<C>) -> Box<UCStr<C>> { + let boxed: Box<[C]> = Box::from(s.as_slice_with_nul()); + unsafe { Box::from_raw(Box::into_raw(boxed) as *mut UCStr<C>) } + } +} + +impl<C: UChar> From<Box<UCStr<C>>> for UCString<C> { + #[inline] + fn from(s: Box<UCStr<C>>) -> Self { + s.into_ucstring() + } +} + +impl<C: UChar> From<UCString<C>> for Box<UCStr<C>> { + #[inline] + fn from(s: UCString<C>) -> Box<UCStr<C>> { + s.into_boxed_ucstr() + } +} + +impl<C: UChar> Default for Box<UCStr<C>> { + fn default() -> Box<UCStr<C>> { + let boxed: Box<[C]> = Box::from([UChar::NUL]); + unsafe { Box::from_raw(Box::into_raw(boxed) as *mut UCStr<C>) } + } +} + +impl<C: UChar> NulError<C> { + /// Returns the position of the nul value in the slice that was provided to `UCString`. + pub fn nul_position(&self) -> usize { + self.0 + } + + /// Consumes this error, returning the underlying vector of values which generated the error + /// in the first place.
+ pub fn into_vec(self) -> Vec<C> { + self.1 + } +} + +impl<C: UChar> Into<Vec<C>> for NulError<C> { + fn into(self) -> Vec<C> { + self.into_vec() + } +} + +impl<C: UChar> core::fmt::Display for NulError<C> { + fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result { + write!(f, "nul value found at position {}", self.0) + } +} + +#[cfg(feature = "std")] +impl<C: UChar> std::error::Error for NulError<C> { + fn description(&self) -> &str { + "nul value found" + } +} + +/// An owned, mutable C-style "wide" string for FFI that is nul-aware and nul-terminated. +/// +/// `U16CString` is aware of nul values. Unless unchecked conversions are used, all `U16CString` +/// strings end with a nul-terminator in the underlying buffer and contain no internal nul values. +/// The strings may still contain invalid or ill-formed UTF-16 data. These strings are intended to +/// be used with FFI functions such as Windows API that may require nul-terminated strings. +/// +/// `U16CString` can be converted to and from many other string types, including `U16String`, +/// `OsString`, and `String`, making proper Unicode FFI safe and easy. +/// +/// # Examples +/// +/// The following example constructs a `U16CString` and shows how to convert a `U16CString` to a +/// regular Rust `String`. +/// +/// ```rust +/// use widestring::U16CString; +/// let s = "Test"; +/// // Create a wide string from the rust string +/// let wstr = U16CString::from_str(s).unwrap(); +/// // Convert back to a rust string +/// let rust_str = wstr.to_string_lossy(); +/// assert_eq!(rust_str, "Test"); +/// ``` +pub type U16CString = UCString<u16>; + +/// An owned, mutable C-style wide string for FFI that is nul-aware and nul-terminated. +/// +/// `U32CString` is aware of nul values. Unless unchecked conversions are used, all `U32CString` +/// strings end with a nul-terminator in the underlying buffer and contain no internal nul values. +/// The strings may still contain invalid or ill-formed UTF-32 data. 
These strings are intended to + /// be used with FFI functions, such as C APIs using a 32-bit `wchar_t`, that may require nul-terminated strings. + /// + /// `U32CString` can be converted to and from many other string types, including `U32String`, + /// `OsString`, and `String`, making proper Unicode FFI safe and easy. + /// + /// # Examples + /// + /// The following example constructs a `U32CString` and shows how to convert a `U32CString` to a + /// regular Rust `String`. + /// + /// ```rust + /// use widestring::U32CString; + /// let s = "Test"; + /// // Create a wide string from the rust string + /// let wstr = U32CString::from_str(s).unwrap(); + /// // Convert back to a rust string + /// let rust_str = wstr.to_string_lossy(); + /// assert_eq!(rust_str, "Test"); + /// ``` +pub type U32CString = UCString<u32>; + +/// Alias for `U16CString` or `U32CString` depending on platform. Intended to match typical C `wchar_t` size on platform. +pub type WideCString = UCString<WideChar>; diff --git a/rust/vendor/widestring/src/ustr.rs b/rust/vendor/widestring/src/ustr.rs new file mode 100644 index 0000000..398b549 --- /dev/null +++ b/rust/vendor/widestring/src/ustr.rs @@ -0,0 +1,359 @@ +use crate::{UChar, WideChar}; +use core::{char, slice}; + +#[cfg(all(feature = "alloc", not(feature = "std")))] +use alloc::{ + boxed::Box, + string::{FromUtf16Error, String}, + vec::Vec, +}; +#[cfg(feature = "std")] +use std::{ + boxed::Box, + string::{FromUtf16Error, String}, + vec::Vec, +}; + +/// A possible error value when converting a `String` from a slice of UTF-32 data. +/// +/// This type is the error type for the `to_string` method on `U32Str`.
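+/// +/// # Examples +/// +/// A minimal sketch: `0xD800` is a UTF-16 surrogate value, which is not a Unicode scalar value, +/// so converting it with `to_string` fails. +/// +/// ```rust +/// use widestring::U32String; +/// // A single surrogate value is ill-formed UTF-32. +/// let wstr = U32String::from_vec(vec![0xD800u32]); +/// assert!(wstr.to_string().is_err()); +/// ```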
+#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub struct FromUtf32Error(); + +impl core::fmt::Display for FromUtf32Error { + fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result { + write!(f, "error converting from UTF-32 to UTF-8") + } +} + +#[cfg(feature = "std")] +impl std::error::Error for FromUtf32Error { + fn description(&self) -> &str { + "error converting from UTF-32 to UTF-8" + } +} + +/// String slice reference for `UString`. +/// +/// `UStr` is to `UString` as `str` is to `String`. +/// +/// `UStr` is not aware of nul values. Strings may or may not be nul-terminated, and may +/// contain invalid and ill-formed UTF-16 or UTF-32 data. These strings are intended to be used +/// with FFI functions that directly use string length, where the strings are known to have proper +/// nul-termination already, or where strings are merely being passed through without modification. +/// +/// `UCStr` should be used instead if nul-aware strings are required. +/// +/// `UStr` can be converted to many other string types, including `OsString` and `String`, making +/// proper Unicode FFI safe and easy. +/// +/// Please prefer using the type aliases `U16Str` or `U32Str` or `WideStr` to using this type +/// directly. +#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash)] +pub struct UStr<C: UChar> { + pub(crate) inner: [C], +} + +impl<C: UChar> UStr<C> { + /// Coerces a value into a `UStr`. + pub fn new<S: AsRef<Self> + ?Sized>(s: &S) -> &Self { + s.as_ref() + } + + /// Constructs a `UStr` from a pointer and a length. + /// + /// The `len` argument is the number of elements, **not** the number of bytes. + /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` + /// elements. + /// + /// # Panics + /// + /// This function panics if `p` is null. + /// + /// # Caveat + /// + /// The lifetime for the returned string is inferred from its usage.
To prevent accidental + /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the + /// context, such as by providing a helper function taking the lifetime of a host value for the + /// string, or by explicit annotation. + pub unsafe fn from_ptr<'a>(p: *const C, len: usize) -> &'a Self { + assert!(!p.is_null()); + let slice: *const [C] = slice::from_raw_parts(p, len); + &*(slice as *const UStr<C>) + } + + /// Constructs a `UStr` from a slice of UTF-16 code units or UTF-32 code points. + /// + /// No checks are performed on the slice. + pub fn from_slice(slice: &[C]) -> &Self { + let ptr: *const [C] = slice; + unsafe { &*(ptr as *const UStr<C>) } + } + + /// Copies the wide string to a new owned `UString`. + #[cfg(feature = "alloc")] + pub fn to_ustring(&self) -> crate::UString<C> { + crate::UString::from_vec(&self.inner) + } + + /// Converts to a slice of the wide string. + pub fn as_slice(&self) -> &[C] { + &self.inner + } + + /// Returns a raw pointer to the wide string. + /// + /// The pointer is valid only as long as the lifetime of this reference. + pub fn as_ptr(&self) -> *const C { + self.inner.as_ptr() + } + + /// Returns the length of the wide string as number of elements (**not** number of bytes). + pub fn len(&self) -> usize { + self.inner.len() + } + + /// Returns whether this wide string contains no data. + pub fn is_empty(&self) -> bool { + self.inner.is_empty() + } + + /// Converts a `Box<UStr>` into a `UString` without copying or allocating. + #[cfg(feature = "alloc")] + pub fn into_ustring(self: Box<Self>) -> crate::UString<C> { + let boxed = unsafe { Box::from_raw(Box::into_raw(self) as *mut [C]) }; + crate::UString { + inner: boxed.into_vec(), + } + } +} + +impl UStr<u16> { + /// Decodes a wide string to an owned `OsString`. + /// + /// This makes a string copy of the `U16Str`. Since `U16Str` makes no guarantees that it is + /// valid UTF-16, there is no guarantee that the resulting `OsString` will be valid data.
+ /// + /// # Examples + /// + /// ```rust + /// use widestring::U16String; + /// use std::ffi::OsString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U16String::from_str(s); + /// // Create an OsString from the wide string + /// let osstr = wstr.to_os_string(); + /// + /// assert_eq!(osstr, OsString::from(s)); + /// ``` + #[cfg(feature = "std")] + pub fn to_os_string(&self) -> std::ffi::OsString { + crate::platform::os_from_wide(&self.inner) + } + + /// Copies the wide string to a `String` if it contains valid UTF-16 data. + /// + /// # Failures + /// + /// Returns an error if the string contains any invalid UTF-16 data. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U16String; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U16String::from_str(s); + /// // Create a regular string from the wide string + /// let s2 = wstr.to_string().unwrap(); + /// + /// assert_eq!(s2, s); + /// ``` + #[cfg(feature = "alloc")] + pub fn to_string(&self) -> Result<String, FromUtf16Error> { + String::from_utf16(&self.inner) + } + + /// Copies the wide string to a `String`. + /// + /// Any non-Unicode sequences are replaced with *U+FFFD REPLACEMENT CHARACTER*. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U16String; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U16String::from_str(s); + /// // Create a regular string from the wide string + /// let lossy = wstr.to_string_lossy(); + /// + /// assert_eq!(lossy, s); + /// ``` + #[cfg(feature = "alloc")] + pub fn to_string_lossy(&self) -> String { + String::from_utf16_lossy(&self.inner) + } +} + +impl UStr<u32> { + /// Constructs a `U32Str` from a `char` pointer and a length. + /// + /// The `len` argument is the number of `char` elements, **not** the number of bytes. 
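+ /// + /// # Examples + /// + /// A minimal sketch, assuming a local `char` array stands in for memory owned by the caller. + /// The returned slice borrows that memory, so it must not outlive the array. + /// + /// ```rust + /// use widestring::U32Str; + /// // Stand-in for caller-owned `char` data. + /// let chars = ['H', 'i']; + /// let wstr = unsafe { U32Str::from_char_ptr(chars.as_ptr(), chars.len()) }; + /// assert_eq!(wstr.len(), 2); + /// assert_eq!(wstr.to_string_lossy(), "Hi"); + /// ```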
+ /// + /// # Safety + /// + /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` + /// elements. + /// + /// # Panics + /// + /// This function panics if `p` is null. + /// + /// # Caveat + /// + /// The lifetime for the returned string is inferred from its usage. To prevent accidental + /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the + /// context, such as by providing a helper function taking the lifetime of a host value for the + /// string, or by explicit annotation. + pub unsafe fn from_char_ptr<'a>(p: *const char, len: usize) -> &'a Self { + UStr::from_ptr(p as *const u32, len) + } + + /// Constructs a `U32Str` from a slice of `char` values. + /// + /// No checks are performed on the slice. + pub fn from_char_slice(slice: &[char]) -> &Self { + let ptr: *const [char] = slice; + unsafe { &*(ptr as *const UStr<u32>) } + } + + /// Decodes a wide string to an owned `OsString`. + /// + /// This makes a string copy of the `U32Str`. Since `U32Str` makes no guarantees that it is + /// valid UTF-32, any invalid values are replaced with *U+FFFD REPLACEMENT CHARACTER* in the + /// resulting `OsString`, because the conversion goes through `to_string_lossy`. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32String; + /// use std::ffi::OsString; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U32String::from_str(s); + /// // Create an OsString from the wide string + /// let osstr = wstr.to_os_string(); + /// + /// assert_eq!(osstr, OsString::from(s)); + /// ``` + #[cfg(feature = "std")] + pub fn to_os_string(&self) -> std::ffi::OsString { + self.to_string_lossy().into() + } + + /// Copies the wide string to a `String` if it contains valid UTF-32 data. + /// + /// # Failures + /// + /// Returns an error if the string contains any invalid UTF-32 data.
+ /// + /// # Examples + /// + /// ```rust + /// use widestring::U32String; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U32String::from_str(s); + /// // Create a regular string from the wide string + /// let s2 = wstr.to_string().unwrap(); + /// + /// assert_eq!(s2, s); + /// ``` + #[cfg(feature = "alloc")] + pub fn to_string(&self) -> Result<String, FromUtf32Error> { + let chars: Vec<Option<char>> = self.inner.iter().map(|c| char::from_u32(*c)).collect(); + if chars.iter().any(|c| c.is_none()) { + return Err(FromUtf32Error()); + } + // Push each `char` into the `String` so no uninitialized bytes are ever exposed. + let size = chars.iter().filter_map(|o| o.map(|c| c.len_utf8())).sum(); + let mut s = String::with_capacity(size); + for c in chars.iter().filter_map(|&o| o) { + s.push(c); + } + Ok(s) + } + + /// Copies the wide string to a `String`. + /// + /// Any non-Unicode sequences are replaced with *U+FFFD REPLACEMENT CHARACTER*. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32String; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U32String::from_str(s); + /// // Create a regular string from the wide string + /// let lossy = wstr.to_string_lossy(); + /// + /// assert_eq!(lossy, s); + /// ``` + #[cfg(feature = "alloc")] + pub fn to_string_lossy(&self) -> String { + let chars: Vec<char> = self + .inner + .iter() + .map(|&c| char::from_u32(c).unwrap_or(char::REPLACEMENT_CHARACTER)) + .collect(); + // Push each `char` into the `String` so no uninitialized bytes are ever exposed. + let size = chars.iter().map(|c| c.len_utf8()).sum(); + let mut s = String::with_capacity(size); + for c in chars { + s.push(c); + } + s + } +} + +/// String slice reference for `U16String`. +/// +/// `U16Str` is to `U16String` as `str` is to `String`.
+/// +/// `U16Str` is not aware of nul values. Strings may or may not be nul-terminated, and may +/// contain invalid and ill-formed UTF-16 data. These strings are intended to be used with +/// FFI functions that directly use string length, where the strings are known to have proper +/// nul-termination already, or where strings are merely being passed through without modification. +/// +/// `U16CStr` should be used instead if nul-aware strings are required. +/// +/// `U16Str` can be converted to many other string types, including `OsString` and `String`, making +/// proper Unicode FFI safe and easy. +pub type U16Str = UStr<u16>; + +/// String slice reference for `U32String`. +/// +/// `U32Str` is to `U32String` as `str` is to `String`. +/// +/// `U32Str` is not aware of nul values. Strings may or may not be nul-terminated, and may +/// contain invalid and ill-formed UTF-32 data. These strings are intended to be used with +/// FFI functions that directly use string length, where the strings are known to have proper +/// nul-termination already, or where strings are merely being passed through without modification. +/// +/// `U32CStr` should be used instead if nul-aware strings are required. +/// +/// `U32Str` can be converted to many other string types, including `OsString` and `String`, making +/// proper Unicode FFI safe and easy. +pub type U32Str = UStr<u32>; + +/// Alias for `U16Str` or `U32Str` depending on platform. Intended to match typical C `wchar_t` size on platform.
+pub type WideStr = UStr<WideChar>; diff --git a/rust/vendor/widestring/src/ustring.rs b/rust/vendor/widestring/src/ustring.rs new file mode 100644 index 0000000..8faabe8 --- /dev/null +++ b/rust/vendor/widestring/src/ustring.rs @@ -0,0 +1,785 @@ +use crate::{UChar, UStr, WideChar}; +use core::{ + borrow::Borrow, + char, cmp, + mem::ManuallyDrop, + ops::{Deref, Index, RangeFull}, + slice, +}; + +#[cfg(all(feature = "alloc", not(feature = "std")))] +use alloc::{ + borrow::{Cow, ToOwned}, + boxed::Box, + string::String, + vec::Vec, +}; +#[cfg(feature = "std")] +use std::{ + borrow::{Cow, ToOwned}, + boxed::Box, + string::String, + vec::Vec, +}; + +/// An owned, mutable "wide" string for FFI that is **not** nul-aware. +/// +/// `UString` is not aware of nul values. Strings may or may not be nul-terminated, and may +/// contain invalid and ill-formed UTF-16 or UTF-32 data. These strings are intended to be used +/// with FFI functions that directly use string length, where the strings are known to have proper +/// nul-termination already, or where strings are merely being passed through without modification. +/// +/// `UCString` should be used instead if nul-aware strings are required. +/// +/// `UString` can be converted to and from many other standard Rust string types, including +/// `OsString` and `String`, making proper Unicode FFI safe and easy. +/// +/// Please prefer using the type aliases `U16String` or `U32String` or `WideString` to using this +/// type directly. +/// +/// # Examples +/// +/// The following example constructs a `U16String` and shows how to convert a `U16String` to a +/// regular Rust `String`. 
+/// +/// ```rust +/// use widestring::U16String; +/// let s = "Test"; +/// // Create a wide string from the rust string +/// let wstr = U16String::from_str(s); +/// // Convert back to a rust string +/// let rust_str = wstr.to_string_lossy(); +/// assert_eq!(rust_str, "Test"); +/// ``` +/// +/// The same example using `U32String` instead: +/// +/// ```rust +/// use widestring::U32String; +/// let s = "Test"; +/// // Create a wide string from the rust string +/// let wstr = U32String::from_str(s); +/// // Convert back to a rust string +/// let rust_str = wstr.to_string_lossy(); +/// assert_eq!(rust_str, "Test"); +/// ``` +#[derive(Debug, Default, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)] +pub struct UString<C: UChar> { + pub(crate) inner: Vec<C>, +} + +impl<C: UChar> UString<C> { + /// Constructs a new empty `UString`. + pub fn new() -> Self { + Self { inner: Vec::new() } + } + + /// Constructs a `UString` from a vector of possibly invalid or ill-formed UTF-16 or UTF-32 + /// data. + /// + /// No checks are made on the contents of the vector. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U16String; + /// let v = vec![84u16, 104u16, 101u16]; // 'T' 'h' 'e' + /// # let cloned = v.clone(); + /// // Create a wide string from the vector + /// let wstr = U16String::from_vec(v); + /// # assert_eq!(wstr.into_vec(), cloned); + /// ``` + /// + /// ```rust + /// use widestring::U32String; + /// let v = vec![84u32, 104u32, 101u32]; // 'T' 'h' 'e' + /// # let cloned = v.clone(); + /// // Create a wide string from the vector + /// let wstr = U32String::from_vec(v); + /// # assert_eq!(wstr.into_vec(), cloned); + /// ``` + pub fn from_vec(raw: impl Into<Vec<C>>) -> Self { + Self { inner: raw.into() } + } + + /// Constructs a `UString` from a pointer and a length. + /// + /// The `len` argument is the number of elements, **not** the number of bytes. 
+ ///
+ /// # Safety
+ ///
+ /// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
+ /// elements.
+ ///
+ /// # Panics
+ ///
+ /// Panics if `len` is greater than 0 but `p` is a null pointer.
+ pub unsafe fn from_ptr(p: *const C, len: usize) -> Self {
+ if len == 0 {
+ return Self::new();
+ }
+ assert!(!p.is_null());
+ let slice = slice::from_raw_parts(p, len);
+ Self::from_vec(slice)
+ }
+
+ /// Creates a `UString` with the given capacity.
+ ///
+ /// The string will be able to hold at least `capacity` elements without reallocating.
+ /// If `capacity` is set to 0, the string will not initially allocate.
+ pub fn with_capacity(capacity: usize) -> Self {
+ Self {
+ inner: Vec::with_capacity(capacity),
+ }
+ }
+
+ /// Returns the number of elements this `UString` can hold without reallocating.
+ pub fn capacity(&self) -> usize {
+ self.inner.capacity()
+ }
+
+ /// Truncates the `UString` to zero length.
+ pub fn clear(&mut self) {
+ self.inner.clear()
+ }
+
+ /// Reserves capacity for at least `additional` more elements to be inserted in the given
+ /// `UString`.
+ ///
+ /// More space may be reserved to avoid frequent allocations.
+ pub fn reserve(&mut self, additional: usize) {
+ self.inner.reserve(additional)
+ }
+
+ /// Reserves the minimum capacity for exactly `additional` more elements to be inserted in the
+ /// given `UString`. Does nothing if the capacity is already sufficient.
+ ///
+ /// Note that the allocator may give more space than is requested. Therefore capacity cannot
+ /// be relied upon to be precisely minimal. Prefer `reserve` if future insertions are expected.
+ pub fn reserve_exact(&mut self, additional: usize) {
+ self.inner.reserve_exact(additional)
+ }
+
+ /// Converts the wide string into a `Vec`, consuming the string in the process.
+ pub fn into_vec(self) -> Vec<C> {
+ self.inner
+ }
+
+ /// Converts to a `UStr` reference.
+ pub fn as_ustr(&self) -> &UStr<C> {
+ self
+ }
+
+ /// Extends the wide string with the given `&UStr`.
+ ///
+ /// No checks are performed on the strings. It is possible to end up with nul values inside the
+ /// string, and it is up to the caller to determine if that is acceptable.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U16String;
+ /// let s = "MyString";
+ /// let mut wstr = U16String::from_str(s);
+ /// let cloned = wstr.clone();
+ /// // Push the clone to the end, repeating the string twice.
+ /// wstr.push(cloned);
+ ///
+ /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString");
+ /// ```
+ ///
+ /// ```rust
+ /// use widestring::U32String;
+ /// let s = "MyString";
+ /// let mut wstr = U32String::from_str(s);
+ /// let cloned = wstr.clone();
+ /// // Push the clone to the end, repeating the string twice.
+ /// wstr.push(cloned);
+ ///
+ /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString");
+ /// ```
+ pub fn push(&mut self, s: impl AsRef<UStr<C>>) {
+ self.inner.extend_from_slice(&s.as_ref().inner)
+ }
+
+ /// Extends the wide string with the given slice.
+ ///
+ /// No checks are performed on the strings. It is possible to end up with nul values inside the
+ /// string, and it is up to the caller to determine if that is acceptable.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U16String;
+ /// let s = "MyString";
+ /// let mut wstr = U16String::from_str(s);
+ /// let cloned = wstr.clone();
+ /// // Push the clone to the end, repeating the string twice.
+ /// wstr.push_slice(cloned);
+ ///
+ /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString");
+ /// ```
+ ///
+ /// ```rust
+ /// use widestring::U32String;
+ /// let s = "MyString";
+ /// let mut wstr = U32String::from_str(s);
+ /// let cloned = wstr.clone();
+ /// // Push the clone to the end, repeating the string twice.
+ /// wstr.push_slice(cloned);
+ ///
+ /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString");
+ /// ```
+ pub fn push_slice(&mut self, s: impl AsRef<[C]>) {
+ self.inner.extend_from_slice(s.as_ref())
+ }
+
+ /// Shrinks the capacity of the `UString` as close as possible to its length.
+ ///
+ /// The allocator may still leave some excess capacity, so afterwards the capacity is only
+ /// guaranteed to be at least the length.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U16String;
+ ///
+ /// let mut s = U16String::from_str("foo");
+ ///
+ /// s.reserve(100);
+ /// assert!(s.capacity() >= 100);
+ ///
+ /// s.shrink_to_fit();
+ /// assert!(s.capacity() >= 3);
+ /// ```
+ ///
+ /// ```rust
+ /// use widestring::U32String;
+ ///
+ /// let mut s = U32String::from_str("foo");
+ ///
+ /// s.reserve(100);
+ /// assert!(s.capacity() >= 100);
+ ///
+ /// s.shrink_to_fit();
+ /// assert!(s.capacity() >= 3);
+ /// ```
+ pub fn shrink_to_fit(&mut self) {
+ self.inner.shrink_to_fit();
+ }
+
+ /// Converts this `UString` into a boxed `UStr`.
+ ///
+ /// # Examples
+ ///
+ /// ```
+ /// use widestring::{U16String, U16Str};
+ ///
+ /// let s = U16String::from_str("hello");
+ ///
+ /// let b: Box<U16Str> = s.into_boxed_ustr();
+ /// ```
+ ///
+ /// ```
+ /// use widestring::{U32String, U32Str};
+ ///
+ /// let s = U32String::from_str("hello");
+ ///
+ /// let b: Box<U32Str> = s.into_boxed_ustr();
+ /// ```
+ pub fn into_boxed_ustr(self) -> Box<UStr<C>> {
+ let rw = Box::into_raw(self.inner.into_boxed_slice()) as *mut UStr<C>;
+ unsafe { Box::from_raw(rw) }
+ }
+}
+
+impl UString<u16> {
+ /// Encodes a `U16String` copy from a `str`.
+ ///
+ /// This makes a wide string copy of the `str`. Since `str` will always be valid UTF-8, the
+ /// resulting `U16String` will also be valid UTF-16.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U16String;
+ /// let s = "MyString";
+ /// // Create a wide string from the string
+ /// let wstr = U16String::from_str(s);
+ ///
+ /// assert_eq!(wstr.to_string().unwrap(), s);
+ /// ```
+ #[allow(clippy::should_implement_trait)]
+ pub fn from_str<S: AsRef<str> + ?Sized>(s: &S) -> Self {
+ Self {
+ inner: s.as_ref().encode_utf16().collect(),
+ }
+ }
+
+ /// Encodes a `U16String` copy from an `OsStr`.
+ ///
+ /// This makes a wide string copy of the `OsStr`. Since `OsStr` makes no guarantees that it is
+ /// valid data, there is no guarantee that the resulting `U16String` will be valid UTF-16.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U16String;
+ /// let s = "MyString";
+ /// // Create a wide string from the string
+ /// let wstr = U16String::from_os_str(s);
+ ///
+ /// assert_eq!(wstr.to_string().unwrap(), s);
+ /// ```
+ #[cfg(feature = "std")]
+ pub fn from_os_str<S: AsRef<std::ffi::OsStr> + ?Sized>(s: &S) -> Self {
+ Self {
+ inner: crate::platform::os_to_wide(s.as_ref()),
+ }
+ }
+
+ /// Extends the string with the given `&str`.
+ ///
+ /// No checks are performed on the strings. It is possible to end up with nul values inside the
+ /// string, and it is up to the caller to determine if that is acceptable.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U16String;
+ /// let s = "MyString";
+ /// let mut wstr = U16String::from_str(s);
+ /// // Push the original to the end, repeating the string twice.
+ /// wstr.push_str(s);
+ ///
+ /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString");
+ /// ```
+ pub fn push_str(&mut self, s: impl AsRef<str>) {
+ self.inner.extend(s.as_ref().encode_utf16())
+ }
+
+ /// Extends the string with the given `&OsStr`.
+ ///
+ /// No checks are performed on the strings. It is possible to end up with nul values inside the
+ /// string, and it is up to the caller to determine if that is acceptable.
+ /// + /// # Examples + /// + /// ```rust + /// use widestring::U16String; + /// let s = "MyString"; + /// let mut wstr = U16String::from_str(s); + /// // Push the original to the end, repeating the string twice. + /// wstr.push_os_str(s); + /// + /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString"); + /// ``` + #[cfg(feature = "std")] + pub fn push_os_str(&mut self, s: impl AsRef<std::ffi::OsStr>) { + self.inner.extend(crate::platform::os_to_wide(s.as_ref())) + } +} + +impl UString<u32> { + /// Constructs a `U32String` from a vector of UTF-32 data. + /// + /// No checks are made on the contents of the vector. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32String; + /// let v: Vec<char> = "Test".chars().collect(); + /// # let cloned: Vec<u32> = v.iter().map(|&c| c as u32).collect(); + /// // Create a wide string from the vector + /// let wstr = U32String::from_chars(v); + /// # assert_eq!(wstr.into_vec(), cloned); + /// ``` + pub fn from_chars(raw: impl Into<Vec<char>>) -> Self { + let mut chars = raw.into(); + UString { + inner: unsafe { + let ptr = chars.as_mut_ptr() as *mut u32; + let len = chars.len(); + let cap = chars.capacity(); + ManuallyDrop::new(chars); + Vec::from_raw_parts(ptr, len, cap) + }, + } + } + + /// Encodes a `U32String` copy from a `str`. + /// + /// This makes a wide string copy of the `str`. Since `str` will always be valid UTF-8, the + /// resulting `U32String` will also be valid UTF-32. + /// + /// # Examples + /// + /// ```rust + /// use widestring::U32String; + /// let s = "MyString"; + /// // Create a wide string from the string + /// let wstr = U32String::from_str(s); + /// + /// assert_eq!(wstr.to_string().unwrap(), s); + /// ``` + #[allow(clippy::should_implement_trait)] + pub fn from_str<S: AsRef<str> + ?Sized>(s: &S) -> Self { + let v: Vec<char> = s.as_ref().chars().collect(); + UString::from_chars(v) + } + + /// Encodes a `U32String` copy from an `OsStr`. 
+ ///
+ /// This makes a wide string copy of the `OsStr`. Since `OsStr` makes no guarantees that it is
+ /// valid data, the conversion is lossy: any invalid data is replaced with U+FFFD REPLACEMENT
+ /// CHARACTER, so the resulting `U32String` is always valid UTF-32.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U32String;
+ /// let s = "MyString";
+ /// // Create a wide string from the string
+ /// let wstr = U32String::from_os_str(s);
+ ///
+ /// assert_eq!(wstr.to_string().unwrap(), s);
+ /// ```
+ #[cfg(feature = "std")]
+ pub fn from_os_str<S: AsRef<std::ffi::OsStr> + ?Sized>(s: &S) -> Self {
+ let v: Vec<char> = s.as_ref().to_string_lossy().chars().collect();
+ UString::from_chars(v)
+ }
+
+ /// Constructs a `U32String` from a `char` pointer and a length.
+ ///
+ /// The `len` argument is the number of `char` elements, **not** the number of bytes.
+ ///
+ /// # Safety
+ ///
+ /// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
+ /// elements.
+ ///
+ /// # Panics
+ ///
+ /// Panics if `len` is greater than 0 but `p` is a null pointer.
+ pub unsafe fn from_char_ptr(p: *const char, len: usize) -> Self {
+ UString::from_ptr(p as *const u32, len)
+ }
+
+ /// Extends the string with the given `&str`.
+ ///
+ /// No checks are performed on the strings. It is possible to end up with nul values inside the
+ /// string, and it is up to the caller to determine if that is acceptable.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U32String;
+ /// let s = "MyString";
+ /// let mut wstr = U32String::from_str(s);
+ /// // Push the original to the end, repeating the string twice.
+ /// wstr.push_str(s);
+ ///
+ /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString");
+ /// ```
+ pub fn push_str(&mut self, s: impl AsRef<str>) {
+ self.inner.extend(s.as_ref().chars().map(|c| c as u32))
+ }
+
+ /// Extends the string with the given `&OsStr`.
+ ///
+ /// No checks are performed on the strings.
It is possible to end up with nul values inside the
+ /// string, and it is up to the caller to determine if that is acceptable.
+ ///
+ /// # Examples
+ ///
+ /// ```rust
+ /// use widestring::U32String;
+ /// let s = "MyString";
+ /// let mut wstr = U32String::from_str(s);
+ /// // Push the original to the end, repeating the string twice.
+ /// wstr.push_os_str(s);
+ ///
+ /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString");
+ /// ```
+ #[cfg(feature = "std")]
+ pub fn push_os_str(&mut self, s: impl AsRef<std::ffi::OsStr>) {
+ self.inner
+ .extend(s.as_ref().to_string_lossy().chars().map(|c| c as u32))
+ }
+}
+
+impl<C: UChar> Into<Vec<C>> for UString<C> {
+ fn into(self) -> Vec<C> {
+ self.into_vec()
+ }
+}
+
+impl<'a> From<UString<u16>> for Cow<'a, UStr<u16>> {
+ fn from(s: UString<u16>) -> Self {
+ Cow::Owned(s)
+ }
+}
+
+impl<'a> From<UString<u32>> for Cow<'a, UStr<u32>> {
+ fn from(s: UString<u32>) -> Self {
+ Cow::Owned(s)
+ }
+}
+
+impl Into<UString<u16>> for Vec<u16> {
+ fn into(self) -> UString<u16> {
+ UString::from_vec(self)
+ }
+}
+
+impl Into<UString<u32>> for Vec<u32> {
+ fn into(self) -> UString<u32> {
+ UString::from_vec(self)
+ }
+}
+
+impl Into<UString<u32>> for Vec<char> {
+ fn into(self) -> UString<u32> {
+ UString::from_chars(self)
+ }
+}
+
+impl From<String> for UString<u16> {
+ fn from(s: String) -> Self {
+ Self::from_str(&s)
+ }
+}
+
+impl From<String> for UString<u32> {
+ fn from(s: String) -> Self {
+ Self::from_str(&s)
+ }
+}
+
+#[cfg(feature = "std")]
+impl From<std::ffi::OsString> for UString<u16> {
+ fn from(s: std::ffi::OsString) -> Self {
+ Self::from_os_str(&s)
+ }
+}
+
+#[cfg(feature = "std")]
+impl From<std::ffi::OsString> for UString<u32> {
+ fn from(s: std::ffi::OsString) -> Self {
+ Self::from_os_str(&s)
+ }
+}
+
+#[cfg(feature = "std")]
+impl From<UString<u16>> for std::ffi::OsString {
+ fn from(s: UString<u16>) -> Self {
+ s.to_os_string()
+ }
+}
+
+#[cfg(feature = "std")]
+impl From<UString<u32>> for
std::ffi::OsString { + fn from(s: UString<u32>) -> Self { + s.to_os_string() + } +} + +impl<'a, C: UChar, T: ?Sized + AsRef<UStr<C>>> From<&'a T> for UString<C> { + fn from(s: &'a T) -> Self { + s.as_ref().to_ustring() + } +} + +impl<C: UChar> Index<RangeFull> for UString<C> { + type Output = UStr<C>; + + #[inline] + fn index(&self, _index: RangeFull) -> &UStr<C> { + UStr::from_slice(&self.inner) + } +} + +impl<C: UChar> Deref for UString<C> { + type Target = UStr<C>; + + #[inline] + fn deref(&self) -> &UStr<C> { + &self[..] + } +} + +impl<C: UChar> PartialEq<UStr<C>> for UString<C> { + #[inline] + fn eq(&self, other: &UStr<C>) -> bool { + self.as_ustr() == other + } +} + +impl<C: UChar> PartialOrd<UStr<C>> for UString<C> { + #[inline] + fn partial_cmp(&self, other: &UStr<C>) -> Option<cmp::Ordering> { + self.as_ustr().partial_cmp(other) + } +} + +impl<'a, C: UChar> PartialEq<&'a UStr<C>> for UString<C> { + #[inline] + fn eq(&self, other: &&'a UStr<C>) -> bool { + self.as_ustr() == *other + } +} + +impl<'a, C: UChar> PartialOrd<&'a UStr<C>> for UString<C> { + #[inline] + fn partial_cmp(&self, other: &&'a UStr<C>) -> Option<cmp::Ordering> { + self.as_ustr().partial_cmp(*other) + } +} + +impl<'a, C: UChar> PartialEq<Cow<'a, UStr<C>>> for UString<C> { + #[inline] + fn eq(&self, other: &Cow<'a, UStr<C>>) -> bool { + self.as_ustr() == other.as_ref() + } +} + +impl<'a, C: UChar> PartialOrd<Cow<'a, UStr<C>>> for UString<C> { + #[inline] + fn partial_cmp(&self, other: &Cow<'a, UStr<C>>) -> Option<cmp::Ordering> { + self.as_ustr().partial_cmp(other.as_ref()) + } +} + +impl<C: UChar> Borrow<UStr<C>> for UString<C> { + fn borrow(&self) -> &UStr<C> { + &self[..] 
+ }
+}
+
+impl<C: UChar> ToOwned for UStr<C> {
+ type Owned = UString<C>;
+ fn to_owned(&self) -> UString<C> {
+ self.to_ustring()
+ }
+}
+
+impl<'a> From<&'a UStr<u16>> for Cow<'a, UStr<u16>> {
+ fn from(s: &'a UStr<u16>) -> Self {
+ Cow::Borrowed(s)
+ }
+}
+
+impl<'a> From<&'a UStr<u32>> for Cow<'a, UStr<u32>> {
+ fn from(s: &'a UStr<u32>) -> Self {
+ Cow::Borrowed(s)
+ }
+}
+
+impl<C: UChar> AsRef<UStr<C>> for UStr<C> {
+ fn as_ref(&self) -> &Self {
+ self
+ }
+}
+
+impl<C: UChar> AsRef<UStr<C>> for UString<C> {
+ fn as_ref(&self) -> &UStr<C> {
+ self
+ }
+}
+
+impl<C: UChar> AsRef<[C]> for UStr<C> {
+ fn as_ref(&self) -> &[C] {
+ self.as_slice()
+ }
+}
+
+impl<C: UChar> AsRef<[C]> for UString<C> {
+ fn as_ref(&self) -> &[C] {
+ self.as_slice()
+ }
+}
+
+impl<'a, C: UChar> From<&'a UStr<C>> for Box<UStr<C>> {
+ fn from(s: &'a UStr<C>) -> Self {
+ let boxed: Box<[C]> = Box::from(&s.inner);
+ let rw = Box::into_raw(boxed) as *mut UStr<C>;
+ unsafe { Box::from_raw(rw) }
+ }
+}
+
+impl<C: UChar> From<Box<UStr<C>>> for UString<C> {
+ fn from(boxed: Box<UStr<C>>) -> Self {
+ boxed.into_ustring()
+ }
+}
+
+impl<C: UChar> From<UString<C>> for Box<UStr<C>> {
+ fn from(s: UString<C>) -> Self {
+ s.into_boxed_ustr()
+ }
+}
+
+impl<C: UChar> Default for Box<UStr<C>> {
+ fn default() -> Self {
+ let boxed: Box<[C]> = Box::from([]);
+ let rw = Box::into_raw(boxed) as *mut UStr<C>;
+ unsafe { Box::from_raw(rw) }
+ }
+}
+
+/// An owned, mutable 16-bit wide string for FFI that is **not** nul-aware.
+///
+/// `U16String` is not aware of nul values. Strings may or may not be nul-terminated, and may
+/// contain invalid and ill-formed UTF-16 data. These strings are intended to be used with
+/// FFI functions that directly use string length, where the strings are known to have proper
+/// nul-termination already, or where strings are merely being passed through without modification.
+///
+/// `U16CString` should be used instead if nul-aware 16-bit strings are required.
+/// +/// `U16String` can be converted to and from many other standard Rust string types, including +/// `OsString` and `String`, making proper Unicode FFI safe and easy. +/// +/// # Examples +/// +/// The following example constructs a `U16String` and shows how to convert a `U16String` to a +/// regular Rust `String`. +/// +/// ```rust +/// use widestring::U16String; +/// let s = "Test"; +/// // Create a wide string from the rust string +/// let wstr = U16String::from_str(s); +/// // Convert back to a rust string +/// let rust_str = wstr.to_string_lossy(); +/// assert_eq!(rust_str, "Test"); +/// ``` +pub type U16String = UString<u16>; + +/// An owned, mutable 32-bit wide string for FFI that is **not** nul-aware. +/// +/// `U32String` is not aware of nul values. Strings may or may not be nul-terminated, and may +/// contain invalid and ill-formed UTF-32 data. These strings are intended to be used with +/// FFI functions that directly use string length, where the strings are known to have proper +/// nul-termination already, or where strings are merely being passed through without modification. +/// +/// `U32CString` should be used instead if nul-aware 32-bit strings are required. +/// +/// `U32String` can be converted to and from many other standard Rust string types, including +/// `OsString` and `String`, making proper Unicode FFI safe and easy. +/// +/// # Examples +/// +/// The following example constructs a `U32String` and shows how to convert a `U32String` to a +/// regular Rust `String`. +/// +/// ```rust +/// use widestring::U32String; +/// let s = "Test"; +/// // Create a wide string from the rust string +/// let wstr = U32String::from_str(s); +/// // Convert back to a rust string +/// let rust_str = wstr.to_string_lossy(); +/// assert_eq!(rust_str, "Test"); +/// ``` +pub type U32String = UString<u32>; + +/// Alias for `U16String` or `U32String` depending on platform. Intended to match typical C `wchar_t` size on platform. 
+pub type WideString = UString<WideChar>;
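The `from_chars` constructor in the `U32String` impl reuses the allocation of a `Vec<char>` as the `Vec<u32>` that backs the string, instead of copying element by element. Below is a minimal standalone sketch of that reinterpretation using only `std`. It is not the crate's code, and the helper names `chars_into_u32s` and `chars_to_u32s_safe` are hypothetical. It wraps the vector in `ManuallyDrop` before reading the raw parts, so the original `Vec` never frees the buffer that the new one takes over.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical helper: reinterprets a `Vec<char>` as a `Vec<u32>` without copying.
///
/// Sound because `char` and `u32` have the same size and alignment, and every
/// `char` value is a valid `u32`. The reverse direction is NOT sound in general,
/// since not every `u32` is a valid `char`.
fn chars_into_u32s(chars: Vec<char>) -> Vec<u32> {
    // Wrap first so the original `Vec<char>` never runs its destructor.
    let mut chars = ManuallyDrop::new(chars);
    let ptr = chars.as_mut_ptr() as *mut u32;
    let len = chars.len();
    let cap = chars.capacity();
    // SAFETY: `ptr`, `len`, and `cap` come from a live `Vec<char>` whose element
    // type has the same layout as `u32`, and ownership of the buffer is
    // transferred exactly once.
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}

/// Hypothetical safe alternative: converts each `char` individually.
fn chars_to_u32s_safe(chars: Vec<char>) -> Vec<u32> {
    chars.into_iter().map(u32::from).collect()
}
```

Both helpers produce the same values; for example, `chars_into_u32s(vec!['T', 'h', 'e'])` yields `[84, 104, 101]`, the same data used in the `from_vec` examples in this file. The safe version avoids `unsafe` but may allocate a second buffer, while the zero-copy version keeps the original capacity and relies entirely on the layout equivalence noted in its comments.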