author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 16:49:04 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 16:49:04 +0000 |
commit | 16f504a9dca3fe3b70568f67b7d41241ae485288 (patch) | |
tree | c60f36ada0496ba928b7161059ba5ab1ab224f9d /src/libs/xpcom18a4/xpcom/string/doc | |
parent | Initial commit. (diff) | |
Adding upstream version 7.0.6-dfsg.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/libs/xpcom18a4/xpcom/string/doc')
-rw-r--r-- | src/libs/xpcom18a4/xpcom/string/doc/README.html | 44 |
-rw-r--r-- | src/libs/xpcom18a4/xpcom/string/doc/string-guide.html | 2508 |
2 files changed, 2552 insertions, 0 deletions
diff --git a/src/libs/xpcom18a4/xpcom/string/doc/README.html b/src/libs/xpcom18a4/xpcom/string/doc/README.html new file mode 100644 index 00000000..154b7969 --- /dev/null +++ b/src/libs/xpcom18a4/xpcom/string/doc/README.html @@ -0,0 +1,44 @@ +<html> +<!-- ***** BEGIN LICENSE BLOCK ***** + - Version: MPL 1.1/GPL 2.0/LGPL 2.1 + - + - The contents of this file are subject to the Mozilla Public License Version + - 1.1 (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - http://www.mozilla.org/MPL/ + - + - Software distributed under the License is distributed on an "AS IS" basis, + - WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License + - for the specific language governing rights and limitations under the + - License. + - + - The Original Code is Mozilla. + - + - The Initial Developer of the Original Code is + - Netscape Communications. + - Portions created by the Initial Developer are Copyright (C) 2001 + - the Initial Developer. All Rights Reserved. + - + - Contributor(s): + - Scott Collins <scc@mozilla.org> (original author) + - + - Alternatively, the contents of this file may be used under the terms of + - either of the GNU General Public License Version 2 or later (the "GPL"), + - or the GNU Lesser General Public License Version 2.1 or later (the "LGPL"), + - in which case the provisions of the GPL or the LGPL are applicable instead + - of those above. If you wish to allow use of your version of this file only + - under the terms of either the GPL or the LGPL, and not to allow others to + - use your version of this file under the terms of the MPL, indicate your + - decision by deleting the provisions above and replace them with the notice + - and other provisions required by the GPL or the LGPL. If you do not delete + - the provisions above, a recipient may use your version of this file under + - the terms of any one of the MPL, the GPL or the LGPL. 
+ - + - ***** END LICENSE BLOCK ***** --> +<body> + <h1><span class="LXRSHORTDESC">documentation aimed at programmers who are clients of the string library</span></h1> +<p> + <span class="LXRLONGDESC"></span> +</p> +</body> +</html> diff --git a/src/libs/xpcom18a4/xpcom/string/doc/string-guide.html b/src/libs/xpcom18a4/xpcom/string/doc/string-guide.html new file mode 100644 index 00000000..41dbd217 --- /dev/null +++ b/src/libs/xpcom18a4/xpcom/string/doc/string-guide.html @@ -0,0 +1,2508 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> +<html> + <head> + <title>an incomplete guide to mozilla/string</title> + + <link rel="stylesheet" href="http://www.mozilla.org/projects/string/string-guide.css" title="remote stylesheet" type="text/css"> + <link rel="alternate stylesheet" href="string-guide.css" title="local stylesheet" type="text/css"> + </head> + <body> +<!-- ----|---------|---------|---------|---------|---------|---------|---------| --> +<!-- ...............................................................Front Matter --> +<h1>an incomplete guide to <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/string/">mozilla/string</a></h1> + <h1><font color="red">This document is now deprecated in favor of <a href="http://www.mozilla.org/projects/xpcom/string-guide.html">The new string guide</a>.</font></h1> +<div class="author-note"> + <p>by <a href="http://ScottCollins.net/">Scott Collins</a><!-- /p --> + <p>last modified 8 April 2001<!-- /p --> +</div> + +<div class="abstract"> + <p> + <h1>Abstract</h1> + This document <span class="LXRSHORTDESC">provides + an <a href="#users_guide">introduction</a> to the design and use of the string classes in mozilla, + <a href="#implementors_guide">detailed information</a> on their implementation and how one may extend them, + and <a href="#faq">answers</a> to frequently asked questions about strings</span>. 
+ </p> +</div> + + + +<h2><a name="contents">contents</a></h2> + +<div class="contents"> + <ul> + <li><a href="#users_guide" >user's guide</a></li> + <li><a href="#implementors_guide">implementor's guide</a></li> + <li><a href="#faq" >frequently asked questions</a></li> + </ul> +</div> + +<p> + Please direct all comments, requests, and contributions to, + in order of preference, + the tracking bug <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=70076">#70076</a> for this document, + the author <a class="exact-uri" href="mailto:scc@mozilla.org?subject=string-guide">scc@mozilla.org</a>, and/or + the newsgroup <a class="exact-uri" href="news:netscape.public.mozilla.xpcom">news:netscape.public.mozilla.xpcom</a> + (should there be a strings newsgroup?) +</p> + +<div class="author-note"> + <p> + A note to potential editors: + don't even <strong>consider</strong> modifying this document with an HTML editor. + That would destroy the internal formatting, + and make patches unmanageable. + </p> +</div> + + + + +<!-- ...............................................................User's Guide --> +<hr> +<h1><a name="users_guide">user's guide</a></h1> + +<div class="author-note"> + <p> + Strings in mozilla are a world apart from <span class="code">char*</span>s. + If you don't know why they are different, + this section is the place for you to start. + If you're already familiar with the hierarchy of string classes in mozilla, + then you might want to skip ahead to the <a href="#implementors_guide">implementor's guide</a> + or the <a href="#faq">FAQ</a>. 
+ </p> +</div> + +<div class="contents"> + <ul> + <li><a href="#users_guide_introduction">introduction</a></li> + <li><a href="#users_guide_how_to" >using the string classes correctly; using the correct string class</a></li> + <li><a href="#users_guide_iterators" >using string iterators</a></li> + <li><a href="#users_guide_summary" >summary</a></li> + </ul> +</div> + +<h2><a name="users_guide_introduction">introduction</a></h2> + <h3>what and what isn't a string?</h3> +<p> + A string is an opaque container holding a, possibly zero length, linear sequence of characters. + Understanding the implications of this statement is the foundation for understanding all mozilla's string classes. +</p> + + <h3>readable and writable</h3> + <h3>dependent strings</h3> + <h3>flat strings</h3> + <h3>encoding</h3> + <h3>sharing</h3> + +<h2><a name="users_guide_how_to">using the string classes correctly; using the correct string class</a></h2> + <h3>basic string operations</h3> + <h4>comparison</h4> + <h4>concatenation</h4> + <h4>substrings</h4> + <h4>find and replace</h4> + <h3>conversions</h3> + <h4>calling a function that expects a different kind of string</h4> + <h4>converting between string classes</h4> + <h4>converting between encodings</h4> + <h3>selecting the right string class</h3> + <h4>user string classes</h4> + <h4>selecting the right string class for a parameter</h4> + <h4>selecting the right string class for a local variable</h4> + <h4>selecting the right string class for a member variable</h4> + <h4>selecting the right string class for a return value</h4> + <h4>selecting the right string class in IDL</h4> + <h3>dont's</h3> + +<h2><a name="users_guide_iterators">using string iterators</a></h2> + <h3>what is an iterator?</h3> + <h3>reading iterators and writing iterators</h3> + <h3>`chunky' iterating for efficiency</h3> + <h3><span class="code">copy_string</span>, character sources and sinks</h3> + <h3>encoding conversion iterators</h3> + +<h2><a 
name="users_guide_summary">summary</a></h2> + + +<!-- ........................................................Implementor's Guide --> +<hr> +<h1><a name="implementors_guide">implementor's guide</a></h1> + +<div class="author-note"> + <p> + + </p> +</div> + +<div class="contents"> + <ul> + <!-- li></li --> + </ul> +</div> + + + +<!-- ........................................................................FAQ --> +<hr> +<h1><a name="faq">frequently asked questions</a></h1> + +<div class="author-note"> +</div> + +<div class="contents"> + <ul> +<!-- + <li> + I have a wide string, i.e., an instance of a class derived from <span class="code">nsAString</span> + <ul> + <li>I want a pointer to the characters</span> + <li>I want a narrow string</li> + <li>I want to <span class="code">printf</span> it</li> + </ul> + </li> + <li> + I have a <span class="code">PRUnichar*</span> + <ul> + <li>I want a wide string</span> + <li>I want a narrow string</span> + <li>I want to <span class="code">printf</span> it</li> + </ul> + </li> + <li> + I have a narrow string, i.e., an instance of a class derived from <span class="code">nsACString</span> + <ul> + <li>I want a pointer to the characters</span> + <li>I want a narrow string</li> + <li>I want to <span class="code">printf</span> it</li> + </ul> + </li> + <li> + I have a <span class="code">char*</span> + <ul> + <li>I want a wide string</span> + <li>I want a narrow string</span> + </ul> + </li> + <li> + I have a literal character sequence, e.g., <span class="code">"Hello, World!\n"</span> + <ul> + <li>I want a wide string</span> + <li>I want a narrow string</span> + </ul> + </li> + <li>What's the best way to return a string?</li> + <li>How can I get a pointer to the characters in a string?</li> + <li>How can I <span class="code">printf</span> a string?</li> + </ul> +--> +</div> + + +<table class="chart"> + <tr> + <th></th> + <th colspan="5">you have some <span class="code">char</span>s</th> + </tr> + <tr> + <th>you want</th> + <th><span 
class="code">'x'</span></th> + <th><span class="code">char c</span></th> + <th><span class="code">"foo"</span></th> + <th><span class="code">char* cp</span></th> + <th><span class="code">nsACString& cs</span></th> + </tr> + <tr> + <th class="row-label"><span class="code">char</span></th> + <td colspan="2">.</td> +<!-- "foo" --> <td><span class="code">[]</span></td> +<!-- char* cp --> <td><span class="code">[]</span></td> +<!-- nsACString& cs --> <td><a href="#faq_how_to_extract_a_character">extract a character</a></td> + </tr> + <tr> + <th class="row-label"><span class="code">PRUnichar</span></th> +<!-- 'x' --> <td><span class="code">PRUnichar('x')</span></td> +<!-- char c --> <td><span class="code">PRUnichar(c)</span></td> + <td colspan="3"><a href="#faq_how_to_convert_encoding">convert encoding</a>, <a href="#faq_how_to_extract_a_character">extract a character</a></td> + </tr> + <tr> + <th class="row-label"><span class="code">char*</span></th> +<!-- 'x' --> <td><span class="code">&</span></td> +<!-- char c --> <td><span class="code">&</span></td> +<!-- "foo" --> <td><span class="code">&</span></td> +<!-- char* cp --> <td>.</td> +<!-- nsACString& cs --> <td><a href="#faq_how_to_get_a_pointer">get a pointer</a></td> + </tr> + <tr> + <th class="row-label"><span class="code">PRUnichar*</span></th> + <td colspan="5"><a href="#faq_how_to_convert_encoding">convert encoding</a>, <a href="#faq_how_to_get_a_pointer">get a pointer</a></td> + </tr> + <tr> + <th class="row-label"><span class="code">nsACString</span></th> +<!-- 'x' --> <td><span class="code">NS_LITERAL_CSTRING("x")</span></td> +<!-- char c --> <td><a href="#faq_how_to_make_a_string">make a string</a></td> +<!-- "foo" --> <td><span class="code">NS_LITERAL_CSTRING("foo")</td> +<!-- char* cp --> <td><a href="#faq_how_to_make_a_string">make a string</a></td> +<!-- nsACString& cs --> <td>.</td> + </tr> + <tr> + <th class="row-label"><span class="code">nsAString</span></th> +<!-- 'x' --> <td><span 
class="code">NS_LITERAL_STRING("x")</span></td> +<!-- char c --> <td><a href="#faq_how_to_convert_encoding">convert encoding</a></td> +<!-- "foo" --> <td><span class="code">NS_LITERAL_STRING("foo")</span></td> + <td colspan="2"><a href="#faq_how_to_convert_encoding">convert encoding</a></td> + </tr> + <tr> + <th class="row-label">to call <span class="code">printf</span></th> + <td colspan="4">.</td> +<!-- nsACString& cs --> <td><a href="#faq_how_to_call_printf">call <span class="code">printf</span></a></td> + </tr> +</table> + +<table class="chart"> + <tr> + <th></th> + <th colspan="3">you have some <span class="code">PRUnichar</span>s</th> + </tr> + <tr> + <th>you want</th> + <th><span class="code">PRUnichar w</span></th> + <th><span class="code">PRUnichar* wp</span></th> + <th><span class="code">nsAString& s</span></th> + </tr> + <tr> + <th class="row-label"><span class="code">char</span></th> +<!-- PRUnichar w --> <td></td> +<!-- PRUnichar* wp --> <td></td> +<!-- nsAString& s --> <td></td> + </tr> + <tr> + <th class="row-label"><span class="code">PRUnichar</span></th> +<!-- PRUnichar w --> <td></td> +<!-- PRUnichar* wp --> <td><span class="code">[]</span></td> +<!-- nsAString& s --> <td><a href="#faq_how_to_extract_a_character">extract a character</a></td> + </tr> + <tr> + <th class="row-label"><span class="code">char*</span></th> +<!-- PRUnichar w --> <td></td> +<!-- PRUnichar* wp --> <td></td> +<!-- nsAString& s --> <td></td> + </tr> + <tr> + <th class="row-label"><span class="code">PRUnichar*</span></th> +<!-- PRUnichar w --> <td><span class="code">&</span></td> +<!-- PRUnichar* wp --> <td></td> +<!-- nsAString& s --> <td><a href="#faq_how_to_get_a_pointer">get a pointer</a></td> + </tr> + <tr> + <th class="row-label"><span class="code">nsACString</span></th> +<!-- PRUnichar w --> <td></td> +<!-- PRUnichar* wp --> <td></td> +<!-- nsAString& s --> <td></td> + </tr> + <tr> + <th class="row-label"><span class="code">nsAString</span></th> +<!-- PRUnichar w --> 
<td></td> +<!-- PRUnichar* wp --> <td></td> +<!-- nsAString& s --> <td></td> + </tr> + <tr> + <th class="row-label">to call <span class="code">printf</span></th> +<!-- PRUnichar w --> <td></td> +<!-- PRUnichar* wp --> <td></td> +<!-- nsAString& s --> <td><a href="#faq_how_to_call_printf">call <span class="code">printf</span></a></td> + </tr> +</table> + +<div class="faq"> + <dl> + <dt> + is there any string doc? + </dt> + <dd> + Yes, you're soaking in it! + </dd> + + + +<!-- getting a pointer --> + <dt> + <a name="faq_how_to_get_a_pointer">I have a string, how do I get a pointer to the characters?</a> + </dt> + <dd> + You want to avoid this situation. + In your own interfaces, prefer string types over raw pointers. + Any interface that wants to process a string using a single pointer is making two expensive assumptions. + First, that the string is stored in one contiguous hunk; and + second, that the string is zero-terminated. + If this isn't the case, + then to get a pointer, storage must be allocated and the entire string must be copied to it and zero-terminated. + You may not be able to avoid needing a pointer when interacting with system calls. + </dd> + <dd> + Some string classes guarantee that they are `flat'. + That is, that their data is stored in one contiguous zero-terminated hunk. + This <strong>does not</strong> imply that there are no embedded nulls. Caveat emptor. + All strings that explicitly promise flatness + inherit from the class <span class="code">nsAFlatString</span> + or <span class="code">nsAFlatCString</span> + and can produce a constant pointer to their data with the <span class="code">get()</span> member function. + Even strings that don't explicitly promise to be flat + may happen to be flat. + The helper function <span class="code">PromiseFlatString</span> will produce + a <span class="code">const</span> dependent string that is guaranteed to be flat. 
+ If you use this on a string that already happens to be flat, + the result is simply a reference through to that string. + Otherwise, + <span class="code">PromiseFlatString</span> does the work to allocate, copy, terminate, and manage + a temporary flat string. + Since the result of <span class="code">PromiseFlatString</span> is a temporary, + you must be careful not to get and hold a pointer to its data for longer than the temporary itself lives. + </dd> + <dd> +<div class="source-code"> +<pre> + /* I have a string, how do I get a pointer to the characters? */ + +extern void EvilNarrowOSFunction( const char* ); // evil OS routines that want a pointer +extern void EvilWideOSFunction( const PRUnichar* ); + +void func( const nsAString& aString, const nsACString& aCString ) + { + EvilWideOSFunction( NS_LITERAL_STRING("Hello, World!").<span class="notice">get()</span> ); + // literal strings are flat already (as are |nsString|s, et al), just use |.get()| + + EvilWideOSFunction( <span class="notice">PromiseFlatString(</span>aString<span class="notice">).get()</span> ); + // for strings that don't explicitly guarantee flatness, use |PromiseFlatString| + + + // beware holding the pointer for longer than the life of the promise + <span class="warning">const PRUnichar* wp = PromiseFlatString(aString).get(); // BAD! |wp| dangles + EvilWideOSFunction(wp);</span> + + // if you really need to use the pointer from |PromiseFlatString| in more than one expression... 
+ const nsAFlatString& flat = <span class="notice">PromiseFlatString(</span>aString<span class="notice">)</span>; + EvilWideOSFunction(flat.<span class="notice">get()</span>); + SomeOtherFunction(flat.<span class="notice">get()</span>); + + // similarly for |char| strings + EvilNarrowOSFunction( <span class="notice">PromiseFlatCString(</span>aCString<span class="notice">).get()</span> ); + } +</pre> +</div> + </dd> + + + +<!-- extracting a character --> + <dt> + <a name="faq_how_to_extract_a_character">How do I get a particular character out of a string?</a> + </dt> + <dd> + Flat strings provide <span class="code">operator[]</span> and <span class="code">CharAt()</span>. + All strings provide <span class="code">First()</span>, <span class="code">Last()</span>, and access with iterators. + <strong>Don't</strong> promise a string flat just to do character indexing. + Prefer, instead, to get an iterator and <span class="code">advance</span> it to the position you care about. + </dd> + <dd> +<div class="source-code"> +<pre> + /* How do I get a particular character out of a string? */ + +PRUnichar Get5thCharacterOf( const nsAString& aString ) + { + if ( aString.Length() >= 5 ) + { + nsAString::const_iterator iter; + aString.BeginReading(iter); // make |iter| point to the beginning of |aString| + iter.advance(4); // the 5th character is at zero-based offset 4 + return *iter; + } + + return PRUnichar(0); + } +</pre> +</div> + </dd> + <dd> + Using iterators isn't as bad as the example above makes it feel. + The typical use is for advancing through a string, examining many characters. + </dd> + + + +<!-- how to convert encoding --> + <dt> + <a name="faq_how_to_convert_encoding">How do I convert from one encoding to another?</a> + </dt> + <dd> + </dd> + + + +<!-- how to make a string --> + <dt> + <a name="faq_how_to_make_a_string">How do I create a string?</a> + </dt> + <dd> + </dd> + + +<!-- how to return a string --> + <dt> + What is the best way to return a string? 
+ </dt> + <dd> + <p> + There are several reasonable ways to produce a string result from a function. + If you are already holding the answer as a sharable string, + you can simply return that string (pass-by-value). + Otherwise, + the most efficient and flexible way to return a string is + to assign your result into a non-<span class="code">const</span> reference parameter. + Don't bother to create a sharable string from scratch with your generated result. + </p> + <p> + Why? + The two things you want to minimize in string manipulation are, + in order of importance, + heap allocation, and + moving characters around. + </p> + </dd> + <dd> +<div class="source-code"> +<pre> + /* What is the best way to return a string? */ + +class foo + { + public: + // ... + void GetShortName( nsAString& aResult ) const; + nsCommonString GetFullName() const; + + private: + nsCommonString mFullName; + + const PRUnichar* mShortName; + PRUint32 mShortNameLength; + + }; + +nsCommonString +foo::GetFullName() const + { + return mFullName; + } + +void +foo::GetShortName( nsAString& aResult ) const + { + aResult = DependentString(mShortName, mShortNameLength); + } +</pre> +</div> + </dd> + + + <dt> + <a name="faq_how_to_call_printf">How do I <span class="code">printf</span> a string, e.g., for debugging.</a> + </dt> + <dd> + If your string is already narrow, you just have to worry about <a href="#faq_how_to_get_a_pointer">making it flat, and then getting a pointer</a>. + </dd> + <dd> + If your string happens to be wide, + you'll need to convert it before you can <span class="code">printf</span> something reasonable. + If it's just for debugging, + you probably wouldn't care if something odd was printed in the case of a Unicode character that didn't have + an ASCII equivalent. (If you have a UTF-8 terminal, the result is + perfectly legible and nothing odd is printed.) + The simplest thing in this case is to make a temporary conversion using <span class="code">NS_ConvertUTF16toUTF8</span>. 
+ The result is conveniently flat already, so getting the pointer is simple. + Remember not to hold onto the pointer you get out of this beyond the lifetime of the temporary. + </dd> + <dd> +<div class="source-code"> +<pre> + /* How do I |printf| a string? */ + + +void PrintSomeStrings( const nsAString& aString, const PRUnichar* aKey, const nsACString& aCString ) + { + // |printf|ing a narrow string is easy + printf("%s\n", <span class="notice">PromiseFlatCString(</span>aCString<span class="notice">).get()</span>); // GOOD + + // the simplest way to get a |printf|-able |const char*| out of a string + printf("%s\n", <span class="notice">NS_ConvertUTF16toUTF8(</span>aKey<span class="notice">).get()</span>); // GOOD + + // works just as well with a formal wide string type... + printf("%s\n", <span class="notice">NS_ConvertUTF16toUTF8(</span>aString<span class="notice">).get()</span>); + + + // But don't hold onto the pointer longer than the lifetime of the temporary! + <span class="warning">const char* cstring = NS_ConvertUTF16toUTF8(aKey).get(); // BAD! |cstring| is dangling + printf("%s\n", cstring);</span> + } +</pre> +</div> + </dd> + + </dl> + +<p> + Here are the email answers I have yet to format into the FAQ. + Some of the URLs may be outdated or moved. + The messages are in order from oldest to newest. +</p> +<p class="editnote">[Note: In June 2003, these emails were modified +to better reflect what is stored in 'wide' string +classes (UTF-16 string instead of UCS-2) and what +related methods do as a part of the patch for <a href= +"http://bugzilla.mozilla.org/show_bug.cgi?id=183156" +title="replace UCS2 in function/class/method names with UTF16">bug 183156</a>. 
+Therefore, they're a little different from the original emails +written by <a href="http://ScottCollins.net/">Scott Collins</a>] +</p> +<hr> +<pre> +Date: Thu, 13 Apr 2000 19:41:47 -0400 +</pre> + +<p>Encoding Wars + +<p>This message is all about strings and the various encodings that might +be used to interpret their contents, the ramifications of that, and +where we're heading. The point of this message is to say what we're +currently thinking, and get feedback. I apologize in advance for the +rambling, and for the fact that this message may accidentally mix +discussion of how things <strong>are</strong> and how they will be. + +<p>There are many different possible encodings. Three in common use in +the Mozilla source base are: ASCII, UTF-16, and UTF-8. In ASCII, every +<!--the Mozilla source base are: ASCII, UCS2, and UTF8. In ASCII, every--> +character fits in 7-bits and is typically stored in an 8-bit byte. We +usually represent ASCII strings with <span class="code">nsCString</span>s, <span class="code">nsXPIDLCString</span>s, +or <span class="code">char</span> string literals. In UTF-16, characters occupy one 16-bit code unit ( +<a href="http://www.unicode.org/glossary/index.html#BMP_character"> +<abbr title="Basic Multilingual Plane">BMP</abbr>characters</a>) +or two 16-bit code units +(<a href="http://www.unicode.org/glossary/index.html#supplementary_character"> +<abbr title="Supplementary Plane : Plane 1 through 16">non-BMP</abbr> characters</a>). +We usually represent UTF-16 strings as <span class="code">nsString</span>s, etc., i.e., two-byte +or `wide' strings. UTF-8 is a multi-byte encoding. A character might +occupy one, two, three, or four bytes. It is easiest to store and +manipulate such a string within a single-byte or `narrow' string +implementation. + +<p>None of our current string implementations know the encoding of the +data they hold at any given moment. 
An <span class="code">nsCString</span> might legitimately +hold data encoded in ASCII, UTF-8 or even EBCDIC for that matter. + +<p>Operations that convert from one encoding to another, or operations +that are encoding sensitive (e.g., <span class="code">to_upper</span>), rightly belong in +i18n. The fact that our current string interfaces automatically and +implicitly convert between wide and narrow strings is actually the +source of many errors in two particular categories: (1) unintended +extra work, (2) mistaken re-encoding, e.g., accidentally `converting' +a UTF-8 string to UTF-16 by pretending the UTF-8 string is ASCII and then +padding with <span class="code">'\0'</span>s. + +<p>We've known these were bad for a long time, and have been trying to +find the right way to fix them. The current thinking is to just byte +the bullet and eliminate implicit conversions. That has interesting +ramifications. + +<div class="source-code"> +<pre> +void foo( const nsString& aUTF16string ); + +foo("hello"); // works! constructs a temporary |nsString| by + // converting the ASCII literal with padding. + // Note: this requires an allocation +</pre> +</div> + +<p>Though we've always hated this form since it requires a heap +allocation. In current code, we recommend + +<div class="source-code"> +<pre> +foo( nsAutoString("hello") ); +</pre> +</div> + +<p>which still copy/converts, but at least it probably doesn't need to do +a heap allocation. In the best of all worlds, no conversion, copying, +or allocation would be necessary. To do that, you would need to be +able to directly specify a UTF-16 string, e.g., with the <span class="code">L"hello"</span> +notation, and wrap that in an interface that just held a pointer. +E.g., something like + +<div class="source-code"> +<pre> +void foo( const nsAReadableString& aUTF16string ); + +foo( nsLiteralString(L"hello") ); +</pre> +</div> + +<p>There are problems with this example, however. 
The <span class="code">L</span> notation +specifically makes objects that are arrays of <span class="code">wchar_t</span>, which under +GCC is a 4-byte element. This leads to incompatibility with JS, and +the annoyance of possibly bloated storage (I'm sort of minimizing the +situation here. It's worse than I make it sound). More about tricks +to get around this in a bit, but first, let me talk about what to do +in the meantime while we're just getting rid of implicit constructors. + Initially to get around this problem (what problem? The problem that +<span class="code">foo("hello")</span> stopped compiling on my machine when I threw the +switch) I made a routine called <span class="code">NS_ConvertToString</span> which looked like +this + +<div class="source-code"> +<pre> +inline +nsAutoString +NS_ConvertToString( const char* anASCIIstring ) + { + nsAutoString aUCS2string; + aUCS2string.AssignWithConversion(anASCIIstring); + return aUCS2string; + } +</pre> +</div> + +<p>Which lets me write + +<div class="source-code"> +<pre> +foo( NS_ConvertToString("hello") ); +</pre> +</div> + +<p>This was <strong>OK</strong>, but in discussion there were concerns about performance +on machines that didn't <span class="code">inline</span> well, and issues about naming. In +that meeting we came up with an alternate naming strategy that we +think has room for growth and an implementation more likely to be +efficient on every platform. The implementation is to define a new +class that derives from <span class="code">nsAutoString</span>, but allows construction from a +<span class="code">char*</span> + +<div class="source-code"> +<pre> +class NS_ConvertASCIItoUTF16 : public nsAutoString + { + public: + NS_ConvertASCIItoUTF16( const char* ); + // ... 
+ }; +</pre> +</div> + +<p>Which gives identical (though renamed) notation for calling <span class="code">foo</span>: + +<div class="source-code"> +<pre> +foo( NS_ConvertASCIItoUTF16("hello") ); +</pre> +</div> + +<p>It looks like a function call to an explicit encoding conversion. It +acts like a function call to an explicit encoding conversion. It <strong>is</strong> +a function call to an explicit encoding conversion. We think that +this naming pattern has room for growth. In the meeting, we concluded +that the best representation for encoding conversions is a family of +functions, and <span class="code">NS_ConvertASCIItoUTF16</span> fits right in. We think that +XPCOM probably can't live without the ASCII to UTF-16 conversion (though +as explicit as possible) but that all others rightly belong in i18n +land. + +<p>You can probably deduce from the clues in <span class="code">NS_ConvertToString</span>, above, +that constructors weren't the only thing that became explicit. +Assignment, appending, comparison, et al, got renamed so that when +assigning, appending, or comparing to a value in a different encoding +the `WithConversion' form must be used. E.g., + +<div class="source-code"> +<pre> +nsString aUTF16string; +nsCString anASCIIstring; +// ... + +aUTF16string += anASCIIstring; // Currently legal, but not for long +aUTF16string.Append(anASCIIstring); // same + +aUTF16string.AppendWithConversion(anASCIIstring); // the new way + +if ( aUTF16string == anASCIIstring ) // Sorry, this is going away too + // ... + +if ( aUTF16string.EqualsWithConversion(anASCIIstring) ) + // ... +</pre> +</div> + +<p>Yes, it's long and annoying. Just like the extra work you were +implicitly asking to have done, perhaps incorrectly. There are other +reasons to rename these functions. 
When <span class="code">nsString</span> and <span class="code">nsCString</span> +defined a ton of, e.g., <span class="code">Append</span>s each there was no problem, because +nobody wanted to override <span class="code">Append</span>. Now, with strings inheriting from +abstract base classes we immediately run into the problem that +overriding and overloading don't mix very well in C++. Because of a +feature of C++ called name hiding, it is problematic to override only +a single signature of a name overloaded in a base class. The base +<span class="code">nsAWritableString</span> provides several <span class="code">Append</span>s, all for objects of +(hopefully) the same encoding. <span class="code">nsString</span> can't easily add a bunch of +new <span class="code">Append</span>s (the converting ones) without running face first into +the name hiding problem. The discussion of the fix for this is mostly +unrelated to encoding issues, so I'll defer it to another post. + +<p>In hindsight, after the meeting, it seemed clear that all the +`WithConversion' forms would be better named + +<div class="source-code"> +<pre> +xxxConvertingASCIItoUTF16 +xxxConvertingUTF16toASCII +</pre> +</div> + +<p>however, the <strong>real</strong> goal (probably) is to move most such conversions +into i18n. Just bringing attention to the previously implicit +conversions is a good first step. Renaming these conversions as just +suggested is probably the right thing to do, though it sort of +validates them, which I'm not sure we really want. This is a decision +we need to discuss further. + +<p>Now, back to the string literal problem above. One possible solution +is to use a macro. 
Imagine + +<div class="source-code"> +<pre> +NS_LITERAL_STRING("Hello") +</pre> +</div> + +<p>which on a machine where the <span class="code">L</span> trick works, turns into + +<div class="source-code"> +<pre> +nsLiteralString(L"Hello") +</pre> +</div> + +<p>but on a machine where there is trouble, turns into something less +appealing, but more likely to work, like + +<div class="source-code"> +<pre> +NS_ConvertASCIItoUTF16("Hello") +</pre> +</div> + +<p>Another solution is to add a compilation step that fixes <span class="code">L</span> strings +on bad platforms to be non-<span class="code">L</span> strings, but padded with <span class="code">\0</span>s. E.g., +<span class="code">L"Hello"</span> gets preprocessed into <span class="code">"\000H\000e\000l\000l\000o\000"</span>. +This solution is more annoying to the developer, where the prior +solution is more annoying during the runtime. + +<p>Before we go to too much trouble on this specific feature, we will +probably want to do more measurement to see just how much and how +often we are converting constant literal strings, and why. + + +<p>I'm currently ripping through the tree fixing things to use the +`WithConversion' forms where appropriate. I was also converting +things to use <span class="code">NS_ConvertToString</span> where appropriate; unless I get +talked out of it, I want to switch midstream to +<span class="code">NS_ConvertASCIItoUTF16</span>, then go back and fix up the +<span class="code">NS_ConvertToString</span> instances later. I've set things up so I can +check in as I go. After all these conversions have been done, I'll be +able to throw the switch (what switch? NEW_STRING_APIS) which will +make <span class="code">nsString</span> inherit from <span class="code">nsAWritableString</span>, etc. and allow us to +start exploiting these other opportunities (e.g., for literal strings, +shared strings, etc. 
See +<a class="exact-uri" href="http://bugzilla.mozilla.org/show_bug.cgi?id=28221">http://bugzilla.mozilla.org/show_bug.cgi?id=28221</a> for details and +reasoning.) + +<p>I guess I'm expecting comments on: + +<ul> + <li>how really annoying this whole topic is + <li>how bad <span class="code">L"xxx"</span> is + <li>whether to move forward with <span class="code">NS_ConvertASCIItoUTF16</span> + <li>whether we should move to xxxConvertingASCIItoUTF16 etc instead + of `WithConversion' + <li>arguments about where encoding conversions should live + <li>arguments about whether going between 1 and 2 byte storage is an + encoding conversion + <li>questions about stuff I didn't mention or didn't explain well + <li>pointing out stuff I'm just plain wrong about, or things I forgot + <li>etc +</ul> + +<p>So as not to jumble the discussion, I'll be separately posting other +requests for comments about specific features of the design of the new +string hierarchy. + +<p>I hope this helps keep everybody filled in on what we're thinking and +able to point out what we're forgetting or screwing up :-) + + + + + +<hr> +<pre> +Date: Wed, 19 Apr 2000 21:12:47 -0400 +Subject: more string info +</pre> + +<p> <a class="exact-uri" href="news://news.mozilla.org/scc-705460.16423913042000@news.mozilla.org">news://news.mozilla.org/scc-705460.16423913042000@news.mozilla.org</a> + + + + + +<hr> +<pre> +Date: Fri, 26 May 2000 15:31:37 -0400 +Subject: Re: Question on == +</pre> + +<p>I would prefer you compare with <span class="code">Equals</span> (which should really be named +<span class="code">IsEqualTo</span>) rather than <span class="code">operator==()</span> because of this: + +<div class="source-code"> +<pre> +char* a; +char* b; + +// ... + +if ( a == b ) + // ... +</pre> +</div> + +<p>Comparing two raw `string' pointers doesn't compare the characters +they point to, but instead compares the bits of the pointers.
For +this reason, I may eventually make comparison of a string with a +pointer using operators just go away. + + + + + +<hr> +<pre> +Date: Wed, 14 Jun 2000 14:38:55 -0400 +Subject: Re: Fix to XprtDefs.h +</pre> + +<p>Yes, we're aware that turning off <span class="code">wchar_t</span> support makes <span class="code">wchar_t</span> be +a synonym for <span class="code">unsigned short</span> under Metrowerks. We know that the +current version of VC++ also makes these types equivalent. In theory, +though, the types are distinct even when they are the same size and +shape. By using real <span class="code">wchar_t</span> support, we are forced to recognize +the distinction and navigate it appropriately with <span class="code">reinterpret_cast</span> +(via <span class="code">NS_REINTERPRET_CAST</span>). The win here is that we aren't caught by +compiler changes that suddenly make some set of compilers compliant +and therefore break our code. We will add an autoconf test that lets +UNIX compilers opt in to our string scheme when they have an +appropriately shaped <span class="code">wchar_t</span>. If these happen to be compliant +compilers, all will be well. If they don't, the casts don't hurt, +because they are type correct. We are writing our code to meet the +standard as we move forward. + +<p>The win for us is realized by the following macros + +<div class="source-code"> +<pre> +#ifdef HAVE_CPP_2BYTE_WCHAR_T + #define NS_LITERAL_STRING(s) nsLiteralString(L##s, \ + (sizeof(L##s)/sizeof(wchar_t))-1) +#else + #define NS_LITERAL_STRING(s) NS_ConvertASCIItoUTF16(s, \ + sizeof(s)-1) +#endif +</pre> +</div> + +<p>An <span class="code">nsLiteralString</span> points directly to the literal characters. No +copying, no conversion, and the length calculation happens at compile +time. 
This has turned out to be as large a savings as 15% of code +space and 8% of data space, net, in our string test harness. It's +faster as well, again by eliminating the copying, conversion, and +length calculation. We don't know yet what those numbers translate +into in our real code base, but we have high hopes. + +<p>I don't want to be in the position to ask you to change your code. I +don't think it's appropriate for me to do so. The AIM application +that is your client is our client as well. They need to resolve this +difference between us in whatever way they think best. That may mean +asking you if changing your apis is the right thing to do. Or it may +mean applying the casts. Our code-base and yours, Justin, are more +like cousins. I don't think you should have to change just to conform +to us. You may think my arguments for using real <span class="code">wchar_t</span> have +merit, and adopt similar usage just because you agree; but I think the +only obligation you have is to follow the technical solution you think +is right for your code. + +<p>If you decide to make this api change, it will mean shipping a new +binary (on Mac) for your library to clients who want to switch over to +the new api (since the name mangling will be different, and therefore, +the link requirements will change). + +<p>Hope this helps, + + + + + +<hr> +<pre> +Date: Thu, 15 Jun 2000 19:36:55 -0400 +Subject: Re: Checkin approval for bug 32336 +</pre> + +<div class="source-code"> +<pre> +S.Equals(NS_LITERAL_STRING("bar"), PR_TRUE, 3) +</pre> +</div> + +<p>doesn't compile because there is no three parameter form for <span class="code">Equals</span>.
+ For all definitions of <span class="code">Equals</span> on strings, see "nsAReadableString.h" + +<p><a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a> + +<p>There is an <span class="code">EqualsWithConversion</span> that takes three parameters. + +<p> <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsString2.h#731">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsString2.h#731</a> + +<p>It is ``EqualsWithConversion'' because it admits the possibility of an +encoding specific transformation, in this case to provide +case-insensitive comparison. This also wouldn't compile, however, +since, at the moment, an <span class="code">nsLiteralString</span> doesn't provide an operator +to produce a <span class="code">const PRUnichar*</span> (though perhaps it should), and it +doesn't satisfy the other interfaces that match this call, e.g., a +<span class="code">const nsString&</span>. + +<p>Perhaps I need to move case-insensitive comparison up out of +<span class="code">nsString</span> into a global encoding specific transformations and +algorithms file (which was on its way anyway, as Waterson knows); this +use is one bit of evidence to support this. In the short term, this +can be fixed (if we think the current behavior is wrong) by providing +<span class="code">operator const CharT*() const</span> on literal string. + +<p>If you can live without case-folding, the earlier form is preferred + +<div class="source-code"> +<pre> +S == NS_LITERAL_STRING("bar") +</pre> +</div> + +<p>if you can't, then one of the fixes I mentioned is in order. + + + + + +<hr> +<pre> +Date: Thu, 15 Jun 2000 19:47:12 -0400 +Subject: Re: [Fwd: how to use nsString ?] +</pre> + +<pre class="email-quote"> + >I see these same examples time and again in the embedding + >samples/docs, but I can't compile them. +</pre> + +<p>Apologies.
Documentation mentioning strings is getting out of date. +Here are some specific answers. + + +<pre class="email-quote"> + >nsString URLString("http://www.mozilla.org"); +</pre> + +<p>...is now perhaps best expressed as + + nsString URLString( NS_LITERAL_STRING("http://www.mozilla.org") ); + +<p>since an <span class="code">nsString</span> is a sequence of 2-byte wide characters, and the +routines that implicitly convert 1-byte sequences (like the literal +sequence you specified, "http:...") are now gone. + +<p>Up until not too long ago, one would have had to say + +<div class="source-code"> +<pre> +nsString URLString; +URLString.AssignWithConversion("http://www.mozilla.org"); +</pre> +</div> + +<p>The <span class="code">NS_LITERAL_STRING</span> construction is new machinery that has the +potential to make many operations much more efficient. + +<pre class="email-quote"> + >nsString URLString; + >URLString.SetString("www.mozilla.org"); +</pre> + +<p><span class="code">SetString</span> was a synonym for <span class="code">Assign</span> or assignment with +<span class="code">operator=()</span>, it too went away. The equivalent is the second +example I gave above, that is, the one with <span class="code">AssignWithConversion</span>. + +<p><span class="code">Assign</span> still exists. <span class="code">AssignWithConversion</span> takes on that +functionality for assignments that require encoding transformations +(e.g., from ASCII to UTF16). <span class="code">SetString</span> is gone, since it was always +a synonym for <span class="code">Assign</span>. 
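<p>To make the "assignment that requires an encoding transformation" idea concrete, here is a minimal sketch in standard C++. It is not the Mozilla implementation: <span class="code">std::u16string</span> stands in for a 2-byte string, and the names <span class="code">AssignASCIIWithConversion</span> and <span class="code">AssignASCIILiteral</span> are hypothetical. It assumes the input really is 7-bit ASCII, which is the only case where widening each byte is a correct conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for an explicit "assign with conversion" routine.
// std::u16string plays the role of a 2-byte string; this is an assumption
// made for illustration, not the real nsString API.
inline void AssignASCIIWithConversion(std::u16string& aDest,
                                      const char* aSource,
                                      std::size_t aLength)
{
  aDest.clear();
  aDest.reserve(aLength);
  for (std::size_t i = 0; i < aLength; ++i) {
    unsigned char c = static_cast<unsigned char>(aSource[i]);
    // Widening a byte is only a valid conversion for 7-bit ASCII.
    assert(c < 0x80 && "input must be ASCII");
    aDest.push_back(static_cast<char16_t>(c));
  }
}

// Literal-only overload: binding to an array reference lets the compiler
// supply the length (N includes the zero terminator), so nothing is
// measured at run time.
template <std::size_t N>
inline void AssignASCIILiteral(std::u16string& aDest,
                               const char (&aLiteral)[N])
{
  AssignASCIIWithConversion(aDest, aLiteral, N - 1);
}
```

<p>The explicit name keeps the conversion visible at the call site, which is the point of the `WithConversion' forms; the literal overload mirrors the way a quoted string's length can be known at compile time.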
+ +<p>Learn more about the general APIs for strings that we are trying to +move to by examining + +<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a> +<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h</a> + +<p>Hope this helps, + + + + + +<hr> +<pre> +Date: Thu, 15 Jun 2000 21:26:51 -0400 +Subject: Re: Checkin approval for bug 32336 +</pre> + +<pre class="email-quote"> + >I *need* the count attribute, because I need to compare only the first + >chars (that's inherent to the logic). +</pre> + +<p>This is what substrings are for. In that case, you could use + +<div class="source-code"> +<pre> +Substring(S, 0, 3) == NS_LITERAL_STRING("bar") +</pre> +</div> + +<p>As for case-folding, it's best if you can case-fold everything up +front, instead of doing it repeatedly. I'll have to get back to you +on a general solution to that problem, or what my schedule for getting +it checked in would be. I'm sorry, I know that's not what you needed +to hear. If the source string is an <span class="code">nsString</span>, you can continue to +exploit its implementation of these routines, e.g., <span class="code">ToLower</span> all +up-front. + +<p>Hope this helps, + + + + + +<hr> +<pre> +Date: Mon, 19 Jun 2000 14:23:47 -0400 +Subject: Re: string fu +</pre> + +<pre class="email-quote"> + >It seems less convenient to have to first check path.IsEmpty, and + >then if false get path.Last and test it. +</pre> + +<p>What would you prefer? That extracting a character not in the string +always return <span class="code">CharT(0)</span>? 
Can't do it for two reasons: (1) <span class="code">0</span> may be +a valid character in a particular encoding, so it can't be used in +general as a ``no character at that position'' marker; and (2) I can't +control what an individual string implementation does when asked to +get an out-of-bounds fragment, it's explicitly undefined. That means +the result of <span class="code">CharAt</span> is explicitly undefined for indexes outside the +defined contents of the string. As a debugging convenience, I have +made this assert, but it has always been the case that retrieving such +a character had undefined results ... even in [the old] code. + +<p>OK, you might say, well at least let me ask for a character that is +only off the end by one. E.g., <span class="code">Last</span> of an empty string. Reason (1) +from above still applies. How bad is it to say, for the case you gave + +<div class="source-code"> +<pre> +PRBool needsDelim = PR_FALSE; +if ( !path.IsEmpty() ) + { + PRUnichar last = path.Last(); + needsDelim = !(last == '/' || last == '\\'); + } +</pre> +</div> + +<p>In general, you probably want to opt out of a whole lot of work when +the source string is empty. It is slightly less convenient, but it +doesn't tie us to a bunch of implementation specific mojo. + + +<pre class="email-quote"> + >Can we fix GetUnicode in this case? +</pre> + +<p>This is an annoying property of auto strings, e.g., that they always +have an allocated buffer. I'm happy to fix this bug, however, be +aware that <span class="code">GetUnicode</span> and <span class="code">GetBuffer</span> are artifacts of [the old] +implementation that we don't want to support. They are not part of +the abstract interface. We will keep them no longer than we have to. +They don't support our multi-fragment paradigm. People who require a +contiguous hunk of characters in the future, and are unwilling to +switch over to chunky-iterators, may be forced to copy the string to +their own buffer. 
There will be an implementation of narrow character +string that guarantees contiguous allocation and a zero-terminator, +much as <span class="code">nsCString</span> does now, for compatibility with platform uses, +but this won't be the default string class. + + + + + +<hr> +<pre> +Date: Mon, 19 Jun 2000 17:22:31 -0400 +</pre> + +<p>Clarifying String Semantics + +<p>Recently, I added an assert to the string operations that extract +characters, namely <span class="code">First()</span>, <span class="code">Last()</span>, <span class="code">CharAt()</span>, and +<span class="code">operator[]()</span>. This assert fires when any of these routines are used +to access a character outside the defined contents of the string. For +<span class="code">First()</span> and <span class="code">Last()</span> that means whenever they are applied to an +empty string. For <span class="code">CharAt()</span> and <span class="code">operator[]()</span>, that means whenever +they are used to access an index outside the range of +<span class="code">0</span>..<span class="code">Length()-1</span>. There have been some complaints; however, the +result was always undefined. What follows is extracted from an email +exchange between me and warren on this topic. I hope it clarifies +string semantics. + +<p>Warren writes: +<pre class="email-quote"> + >I hit your funky CharAt assertion tonight in this piece of code: + + >NS_IMETHODIMP + >nsIOService::ResolveRelativePath( + > const char *relativePath, + > const char* basePath, + > char **result ) + > { + > nsCAutoString name; + > nsCAutoString path(basePath); + > + > PRUnichar last = path.Last(); + > PRBool needsDelim = !(last == '/' || last == '\\' || last == + > '\0'); + > ... + + >where basePath is null. It seems less convenient to have to first + >check path.IsEmpty, and then if false get path.Last and test it. +</pre> + +<p>I replied: +<pre class="email-quote"> + >What would you prefer?
That extracting a character not in the + >string always return <span class="code">CharT(0)</span>? Can't do it for two reasons: + >(1) <span class="code">0</span> may be a valid character in a particular encoding, so it + >can't be used in general as a ``no character at that position'' + >marker; and (2) I can't control what an individual string + >implementation does when asked to get an out-of-bounds fragment, + >it's explicitly undefined. That means the result of <span class="code">CharAt</span> is + >explicitly undefined for indexes outside the defined contents of + >the string. As a debugging convenience, I have made this assert, + >but it has always been the case that retrieving such a character + >had undefined results ... even in [the old] code. + + >OK, you might say, well at least let me ask for a character that + >is only off the end by one. E.g., <span class="code">Last</span> of an empty string. + >Reason (1) from above still applies. How bad is it to say, for the + >case you gave + + > PRBool needsDelim = PR_FALSE; + > if ( !path.IsEmpty() ) + > { + > PRUnichar last = path.Last(); + > needsDelim = !(last == '/' || last == '\\'); + > } + + >In general, you probably want to opt out of a whole lot of work + >when the source string is empty. It is slightly less convenient, + >but it doesn't tie us to a bunch of implementation specific mojo. +</pre> + +<p>Warren also asks: +<pre class="email-quote"> + >Here's another issue, perhaps more serious. If I say this: + + > foo(const PRUnichar* s) { + > nsAutoString str(s); + > bar(str.get()); + > } + + >where s is null, bar will get passed a zero-length PRUnichar + >sequence instead of null. This makes it so that you can't just + >test for the argument == null. You have to nsCRT::strlen(arg) == 0 + >which is much less efficient. Can we fix GetUnicode in this case? 
+</pre> + +<p>And I reply: +<pre class="email-quote"> + >This is an annoying property of auto strings, e.g., that they + >always have an allocated buffer. I'm happy to fix this bug, + >however, be aware that <span class="code">GetUnicode</span> and <span class="code">GetBuffer</span> are artifacts + >of [the old] implementation that we don't want to support. They + >are not part of the abstract interface. We will keep them no + >longer than we have to. They don't support our multi-fragment + >paradigm. People who require a contiguous hunk of characters in + >the future, and are unwilling to switch over to chunky-iterators, + >may be forced to copy the string to their own buffer. There will + >be an implementation of narrow character string that guarantees + >contiguous allocation and a zero-terminator, much as <span class="code">nsCString</span> + >does now, for compatibility with platform uses, but this won't be + >the default string class. +</pre> + +<p>In a later message, Chris Waterson asks a related question +<pre class="email-quote"> + >scc: should we add <span class="code">operator PRUnichar*()</span> to + >NS_ConvertASCIItoUTF16? +</pre> + +<p>And I reply: +<pre class="email-quote"> + >It seems reasonable. A lot more reasonable than forcing people to + >call <span class="code">GetUnicode()</span>. I alluded to platform specific classes in an + >earlier message to warren that you were cc'd on, Chris. I imagine + >that the <span class="code">...Convert...</span> routines would be required to produce + >contiguous allocation 0-terminated strings (though the as yet + >unimplemented <span class="code">...Copy...</span> forms, of course, wouldn't). So <span class="code">operator + >const PRUnichar*() const</span> makes perfect sense to me here.
+</pre> + +<p>Hope this makes sense, + + + + +<hr> +<pre> +Date: Tue, 20 Jun 2000 04:05:31 -0400 +Subject: Re: NS_LITERAL_STRING is broken +</pre> + +<p>The behavior you describe sounds exactly like when you say + +<div class="source-code"> +<pre> +const char* foobar = "foobar"; + +... NS_LITERAL_STRING(foobar).get() ... +</pre> +</div> + +<p>because in this case, the thing passed in is a <span class="code">const char*</span>. +<span class="code">NS_LITERAL_STRING</span> is not meant to be used in this way. It is only +meant to be used around a <span class="code">"</span> delimited string. The type of such is +<span class="code">const char[N]</span> where N is the number of characters in the string + 1 +for the zero terminator it helpfully adds. <span class="code">sizeof</span> such a type is +<span class="code">N</span>. + +<p>Are you sure you had the actual string as an argument, as in your +example to me? Or could the actual code have been like my sample, +above? + + + + + +<hr> +<pre> +Date: Thu, 29 Jun 2000 13:35:10 -0400 +Subject: Re: a fix +</pre> + +<pre class="email-quote"> + > + if (Length() == 0) { return nsnull; } +</pre> + + +<p>Dave, + +<p>please read + + <a class="exact-uri" href="news://news.mozilla.org/scc-314ABF.14261619062000@news.mozilla.org">news://news.mozilla.org/scc-314ABF.14261619062000@news.mozilla.org</a> + +<p>It's just plain wrong to let people try to index into a string outside +its defined contents. I can't just return <span class="code">'\0'</span> or <span class="code">PRUnichar('\0')</span> +there as that <strong>could</strong> be a legal value to have somewhere in your +string for some encodings ... and the encoding is not specified. So +your patch has the basic problem of defeating my plan to stop people +from doing this bad thing. 
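<p>The "check for empty first, then look at the last character" pattern discussed in the referenced post can be sketched in standard C++ as follows. This is an illustration under stated assumptions, not Mozilla code: <span class="code">std::u16string</span> stands in for the string class, and <span class="code">LastChar</span> and <span class="code">NeedsDelimiter</span> are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical analogue of Last(): it asserts on an empty string instead
// of inventing a "no character here" value, because any value it could
// return (including 0) might be a legitimate character in the string.
inline char16_t LastChar(const std::u16string& aString)
{
  assert(!aString.empty() && "last character of an empty string is undefined");
  return aString.back();
}

// The caller opts out early when there is nothing to inspect.
inline bool NeedsDelimiter(const std::u16string& aPath)
{
  if (aPath.empty())
    return false;

  char16_t last = LastChar(aPath);
  return !(last == u'/' || last == u'\\');
}
```

<p>Note that a string holding a single zero character is not empty here, which is why zero cannot double as an out-of-bounds marker.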
+ +<p>The second problem with your patch is that you use the symbolic +constant <span class="code">nsnull</span>, which is ostensibly a pointer value; <span class="code">Last</span> returns +a character. <span class="code">nsnull</span> is not appropriate for that purpose. In fact, +C++ gurus pretty much eschew the use of symbolic constants for <span class="code">0</span>. +<span class="code">NULL</span> is to be avoided. <span class="code">nsnull</span> is wrong-headed in that it presumes +we could have some <strong>other</strong> application specific value for <span class="code">NULL</span>. We +can't, it would never work. It's just wasted brain-print. Always use +<span class="code">0</span> for these situations, and if you want to communicate the fact that +something is a pointer type, either use a comment or a +(construction-style) cast, like so (graded examples from worst to +best:) + +<ul> + <li>F: FindChildByNameWithHint("Chuck", nsnull); + + <li>D: FindChildByNameWithHint("Chuck", NULL); + + <li>C: FindChildByNameWithHint("Chuck", /* Child* */ 0); + + <li>B: typedef Child* Child_ptr; + FindChildByNameWithHint("Chuck", Child_ptr(0)); + + <li>A: FindChildByNameWithHint("Chuck", 0); +</ul> + +<p>Don't let this discourage you; keep up the good work :-) + + + + + +<hr> +<pre> +Date: Tue, 8 Aug 2000 23:47:16 -0400 +Subject: Re: nsWritingIterator? +</pre> + +<pre class="email-quote"> + >Can you give me any pointers to examples, or docs, or just some + >general advice? +</pre> + + <a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_iterators.html">http://ScottCollins.net/Journal/discussion/string_iterators.html</a> + +<p>does this help? + +<p>I can personally walk you through any specific scenario you need. + + + + + +<hr> +<pre> +Date: Wed, 9 Aug 2000 02:35:03 -0400 +Subject: Re: nsWritingIterator? +</pre> + +<p>You got it right... 
it's <span class="code">nsWritingIterator<CharT></span> for whichever +character type you care about, either <span class="code">char</span> or <span class="code">PRUnichar</span>. You +_can_ use this iterator like a character pointer ... that is, you can +dereference it, assign into its dereference, etc. It is more +efficient, though, to directly address a particular range of +characters around where it points by asking it for its actual +character pointer with <span class="code">get</span>, and knowing that there are +<span class="code">size_forward()</span> characters available ahead of that pointer and +<span class="code">size_backward()</span> characters available behind it. After examining +those characters by hand, you can advance the iterator beyond the +characters you have examined (and possibly into the next chunk, should +one exist) by adding into it (with +=) the count of the characters you +have processed. + +<p>Here are three examples of running through a string and modifying some +of the characters in it. All use <span class="code">nsWritingIterator</span>s. 
+ + +<div class="source-code"> +<pre> + // inefficient, but works in a pinch: + // iterators can hide all details of chunks by acting like + // a raw character pointer + +nsWritingIterator<PRUnichar> s = S.BeginWriting(); +nsWritingIterator<PRUnichar> done_with_string = S.EndWriting(); + + // for each character in the string |S| +while ( s != done_with_string ) + { + // if the character is lower case, capitalize it + if ( 'a' <= *s && *s <= 'z' ) + *s = *s - 'a' + 'A'; + + // advance to the next character + ++s; + } + + + + + // efficient + // iterators provide a mechanism by which you can process + // a chunk-at-a-time + +nsWritingIterator<PRUnichar> iter = S.BeginWriting(); +nsWritingIterator<PRUnichar> done_with_string = S.EndWriting(); + + // for each chunk of the string +while ( iter != done_with_string ) + { + size_t N = iter.size_forward(); // # of chars in this chunk + PRUnichar* s = iter.get(); + PRUnichar* done_with_chunk = s + N; + + // for each character in this chunk + for ( ; s < done_with_chunk; ++s ) + { + // if the character is lower case, capitalize it + if ( 'a' <= *s && *s <= 'z' ) + *s = *s - 'a' + 'A'; + } + + // advance the iterator past characters + // we examined (and into the next chunk, if any) + iter += N; + } + + + + // elegant + // pull your transformation into a `sink', and |copy_string| + // will efficiently pump any kind of string into it + +struct Capitalize + { + // inline + PRUint32 + write( PRUnichar* s, PRUint32 N ) + // processes one chunk, called repeatedly by |copy_string| + { + PRUnichar* done_with_chunk = s + N; + + // for each character in this chunk + for ( ; s < done_with_chunk; ++s ) + { + // if the character is lower case, capitalize it + if ( 'a' <= *s && *s <= 'z' ) + *s = *s - 'a' + 'A'; + } + + // all N characters in this chunk were processed + return N; + } + }; + +copy_string(S.BeginWriting(), S.EndWriting(), Capitalize()); +</pre> +</div> + + + +<p>Does this show it better?
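<p>If you want to play with the chunk-at-a-time idea outside the tree, the following self-contained sketch models it in standard C++. Everything here is an assumption made for illustration: a multi-fragment string is modeled as a <span class="code">std::vector</span> of <span class="code">std::u16string</span> chunks, and <span class="code">for_each_chunk</span> is a hypothetical stand-in for the pumping that <span class="code">copy_string</span> does; none of it is the real iterator machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a multi-fragment string: one element per chunk.
typedef std::vector<std::u16string> FragmentedString;

// A `sink' in the spirit of the example above: it sees one contiguous
// chunk at a time and never needs to know how the string is laid out.
struct Capitalize
{
  // Processes one chunk in place; returns the number of characters consumed.
  std::size_t write(char16_t* s, std::size_t N) const
  {
    char16_t* done_with_chunk = s + N;
    for ( ; s < done_with_chunk; ++s )
      if ( u'a' <= *s && *s <= u'z' )
        *s = static_cast<char16_t>(*s - u'a' + u'A');
    return N;
  }
};

// Hypothetical driver: hands each non-empty chunk to the sink in order.
template <class Sink>
inline void for_each_chunk(FragmentedString& aString, Sink aSink)
{
  for (std::size_t i = 0; i < aString.size(); ++i) {
    std::u16string& chunk = aString[i];
    if (chunk.empty())
      continue;  // nothing to hand to the sink

    std::size_t consumed = aSink.write(&chunk[0], chunk.size());
    assert(consumed == chunk.size());
    (void)consumed;  // silence unused-variable warnings when asserts are off
  }
}
```

<p>The inner loop runs over a raw pointer range, so the per-character cost is the same as for a flat buffer; only the outer loop knows about fragments.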
+ + + + + +<hr> +<pre> +Date: Thu, 17 Aug 2000 18:23:22 -0400 +</pre> + +<pre class="email-quote"> + >I tried looking at the string header files but they + >are awfully complicated. +</pre> + +<p>I'll explain things in a little <strong>more</strong> detail than you need, then so +that some of the stuff you see in these headers will make more sense. +I'll also answer your questions out of order. + +<p>First: the string hierarchy looks like this + +<a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_hierarchy.gif">http://ScottCollins.net/Journal/discussion/string_hierarchy.gif</a> + +<p>The two most important headers are: + +<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a> +<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h</a> + +<p>These abstract classes, <span class="code">nsAReadable[C]String</span>, and +<span class="code">nsAWritable[C]String</span> are typically what you will want to use in the +interfaces of new code. If you write a piece of code that takes a +string for input, consider, e.g., + +<div class="source-code"> +<pre> +void consumes_a_string( const nsAReadableString& aInput ); +</pre> +</div> + +<p>If you write a piece of code that modifies a string, consider + +<div class="source-code"> +<pre> +void modifies_a_string( nsAWritableString& aResult ); +</pre> +</div> + + +<p>When creating your own classes, member strings will typically be +<span class="code">nsString</span>s. When you can't avoid creating a short string that you +need only temporarily during a function, you will typically use +<span class="code">nsAutoString</span>. 
When someone passes you a raw pointer, or a raw +pointer and a length, representing a buffer of characters that you may +examine, but won't own, you can treat it like a string by wrapping it +in an <span class="code">nsLiteralString</span>, e.g., + +<div class="source-code"> +<pre> +void +reads_a_buffer( const PRUnichar* aInput, PRUint32 aInputLength ) + { + nsLiteralString input(aInput, aInputLength); + // doesn't allocate or copy + + // ... + } +</pre> +</div> + +<p>You will use <span class="code">nsLiteralString</span> around quoted constant strings as well, +though typically through the <span class="code">NS_LITERAL_STRING</span> macro, to avoid doing +a length calculation + +<div class="source-code"> +<pre> +NS_LITERAL_STRING("x") +</pre> +</div> + +<p>expands to + +<div class="source-code"> +<pre> +nsLiteralString(L"x", (sizeof(L"x")/sizeof(PRUnichar) - 1)) +</pre> +</div> + +<p>if <span class="code">L</span> notation works as needed on your platform. + +Those are the basics. Now onto your questions: + + +<pre class="email-quote"> + >For example this won't compile. [...] + + >str1 += L"abc " + str2 + L"def"; +</pre> + + +<p><span class="code">L"abc "</span> makes an object that is a <span class="code">const wchar_t[5]</span>, and none of +the string code knows about <span class="code">wchar_t</span>. The main reason is that +<span class="code">wchar_t</span> is not necessarily the right size (it can be 4 bytes under +gcc). If you wrap these constant expressions in <span class="code">NS_LITERAL_STRING</span>, +as described above, you should get the right thing, e.g., + +<div class="source-code"> +<pre> +str1 += NS_LITERAL_STRING("abc ") + str2 + NS_LITERAL_STRING("def"); +</pre> +</div> + + +<pre class="email-quote"> + >Another one is: + >function(const PRUnichar *foo); + >call function(L"abc " + str2); + + >It won't create a temporary nsString. +</pre> + +<p>This one, I have a quick and easy explanation for.
If <span class="code">function</span> was +declared like this + +<div class="source-code"> +<pre> +function( const nsAReadableString& ) +</pre> +</div> + +<p>then, no problem, since a <span class="code">nsPromiseConcatenation</span> (which was the +result of adding those two things together) <strong>is</strong> a readable string. +No other objects need to be created; no copying needs to be performed. + +<p>In all cases, we want the creation of <span class="code">nsString</span>s et al, to be +<span class="code">explicit</span>, since creation is unbelievably expensive, requiring heap +allocation, locks, copying, etc. + +<p>I hope this answers both your posts, + + + + + +<hr> +<pre> +Date: Thu, 17 Aug 2000 20:57:08 -0400 +Subject: re our conversation +</pre> + + return ToNewUnicode( nsLiteralCString(buffer) ); + + + + + + +<hr> +<pre> +Date: Fri, 18 Aug 2000 02:52:45 -0400 +Subject: Re: More questions and new string API +</pre> + +<pre class="email-quote"> + >1) How do I return a static string? + + >const nsAReadableString& foo() {return NS_LITERAL_STRING("x");} + >errors on taking the address of a temporary variable. +</pre> + +<p>Unfortunately, the definition of <span class="code">NS_LITERAL_STRING</span> is not particularly +amenable to this use. Instead, you would have to say something like +this: + +<div class="source-code"> +<pre> +const nsAReadableString& +foo() + { +#ifdef HAVE_CPP_2BYTE_WCHAR_T + static nsLiteralString static_foo(L"x", 1); +#else + static nsLiteralString static_foo; + static PRBool initialized = PR_FALSE; + if ( !initialized ) + { + static_foo.AssignWithConversion("x", 1); + initialized = PR_TRUE; + } +#endif + return static_foo; + } +</pre> +</div> + + +<pre class="email-quote"> + >2) I'm using these with the STL library in an XPCOM component. + >What type should I use with map? This doesn't work...
+ + >typedef map<const nsAReadableString&, myType*> mapStringMyType; + >mapStringMyType foo; + >foo.find(nsAReadableString); - I want to find on a ReadableString +</pre> + +<p>I don't know what errors you are getting; but it probably doesn't work +because a reference isn't an assignable type. This is just a guess. +You may need to use + +<div class="source-code"> +<pre> +map<const nsAReadableString*, myType*> +</pre> +</div> + +<p>If you actually want the map to manage ownership of the keys, then +you'll want to use a concrete type, e.g., + +<div class="source-code"> +<pre> +map<nsString, myType*> +</pre> +</div> + +<p>or perhaps + +<div class="source-code"> +<pre> +map<nsSharedStringPtr, myType*> +</pre> +</div> + +<p>Or maybe there's something else wrong. Send me the error messages. +If you end up using a pointer, then of course you'll have to supply a +comparison function to the <span class="code">map</span> template. You won't be satisfied +with the default comparison of pointers :-) Sorry I couldn't answer +this one more completely. + + +<pre class="email-quote"> + >3) How do a get a raw PRUnichar pointer out of nsAReadableString + >when I need to call something that wants 'unsigned short *'? +</pre> + +<p>The problem with this scenario is that an <span class="code">nsAReadableString</span> doesn't +promise that all its data is contiguous, nor that it is +zero-terminated, which is what I suspect you want in this case. If +the function you want to call can take {pointer, length} tuples, and +can consume the string in hunks without zero termination ... 
then you +can use <span class="code">copy_string</span> to pump the string into your function, see + + <a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_iterators.html">http://ScottCollins.net/Journal/discussion/string_iterators.html</a> + +<p>If not, and you absolutely have to have a contiguous zero-terminated +buffer, then there is a new facility (part of the DOMAPI branch) that +does what you need. It's not checked in on the trunk; it should +be in early next week. It is <span class="code">nsPromiseFlatString</span>. This class +promises a contiguous zero-terminated buffer; and has an <span class="code">operator +PRUnichar*</span> to produce a pointer to that buffer automatically. If the +underlying class <strong>is</strong> one that happens to be a single fragment and +zero-terminated, then, like <span class="code">nsPromiseSubstring</span> and +<span class="code">nsPromiseConcatenation</span>, this class merely holds a reference into the +original data. If, however, the underlying string is multi-fragment +or not zero-terminated, then <span class="code">nsPromiseFlatString</span> allocates a +contiguous buffer of appropriate size and copies the fragmented string +data to it. So given + +<div class="source-code"> +<pre> +void ReadBuffer( PRUnichar* ); +</pre> +</div> + +<p>You can call this as efficiently as possible with an arbitrary string +like so + +<div class="source-code"> +<pre> +ReadBuffer( nsPromiseFlatString(aString) ); +</pre> +</div> + + +<p>If the function you are calling needs to take ownership of the buffer +you hand it, then you will probably call <span class="code">ToNewUnicode</span> like so + +<div class="source-code"> +<pre> +void ConsumeBuffer( PRUnichar* ); + +ConsumeBuffer( ToNewUnicode(aString) ); +</pre> +</div> + +<p>The global function <span class="code">ToNewUnicode</span> is declared in "nsReadableUtils.h", +and was only recently added to the build. It is currently being used +in the DOMAPI branch. 
It is part of the build, but the file +"dlldeps.c" in XPCOM may need to be modified to ensure it is exported +on your platform if you are building the tip. + +Needless to say, you want to avoid functions that require bare +pointers for several reasons: (a) they typically assume +zero-termination, which is not guaranteed by the normal encodings; (b) +they require contiguous allocation, which may not be possible; (c) +they scan for the end of the string, at linear cost (if the encoding +makes it possible at all), when the length could be known in advance. +If you have to do it, the above mechanisms work, but be aware of the +cost and the potential need to copy. + + +<pre class="email-quote"> + >4) How do I declare a local variable to hold a nsAReadableString? + >and a member variable? +</pre> + +<p><span class="code">nsAReadableString</span> is an abstract type. So you can't have a concrete +instance of it. All strings in the hierarchy are readable strings. +If you just want a reference to a readable string, you can say, e.g., + +<div class="source-code"> +<pre> +struct foo + { + const nsAReadableString& mString; + // ... + + foo( const nsAReadableString& aString ) : mString(aString) { } + }; +</pre> +</div> + +<p>...similarly with pointers; but I suspect you are looking for +something more concrete. An <span class="code">nsString</span> is a <span class="code">nsAReadableString</span>, and +is the typical thing you want as a member variable. An <span class="code">nsAutoString</span> +is also an <span class="code">nsAReadableString</span> and is typically what you would use for +a short (in length) temporary (in lifetime) local variable, as I +mentioned in my previous post. + + +<pre class="email-quote"> + >5) If I call a function that returns a PRUnichar* and I want t + >use it as a nsAReadableString should I wrap it in a + >nsLiteralString? 
+</pre> + +<p>Yes, though remember, an <span class="code">nsLiteralString</span> assumes the lifetime of the +underlying data is under someone else's control. If the called +function gives you a buffer that you need to <span class="code">delete</span>, you will have +to manage that yourself. Currently, people often use <span class="code">nsXPIDLString</span> +to handle that. XPIDL strings are <strong>not</strong> part of the hierarchy. They +are only used as a sort of string-<span class="code">auto_ptr</span>. However, I'm +integrating their functionality into <span class="code">nsString</span>. There is no problem +in wrapping the same pointer in both as two separate local variables, +one to give you the readable interface, and one to manage the +lifetime. + +<p>If it's OK with you, I'd like to post this reply (including your +quoted questions) to n.p.m.xpcom and also put a copy near the string +iterator discussion I provided a link to above, so that other people +with similar questions can see these answers. + +<p>Hope this helps, + + + + + +<hr> +<pre> +Date: Sun, 3 Sep 2000 03:52:17 -0400 +</pre> + +<p>In article <8nu9m2$eo14@secnews.netscape.com>, "Jon Smirl" +<jonsmirl@mediaone.com> wrote: + +> I have the new strings up and running in my app. They work as +> advertised and +> I haven't found any bugs. Thanks for the good job in designing and +> implementing them. Here's are a summary of issues I've encountered +> so far... + +<p>Thanks, and I appreciate your comments and insights. + + +> +> 1) Should there be a nsSegmentedString derived from nsString instead +> of building segment support into nsString? None of my strings are +> segmented but +> I keep executing code that is supports it. nsPromiseFlatString would +> be trivial in the non-segmented case. + +<p>The general case is that a string does not promise to have contiguous +data. A specific case is that, for some implementations, it does. 
+You couldn't do it the other way around, because a segmented string +couldn't satisfy all the promises of a flat string. However, through +the use of chunky iterators, operating on strings that happen to be +flat is very efficient. In fact, <span class="code">nsPromiseFlatString</span> is trivial in +the non-segmented case. In addition, I'll be adding an abstract flat +class into the hierarchy, which will present additional interface ... +in your local routines where you actually have declared a concrete +string instance that happens to be flat, the compiler will give you +the benefit of using the flat specific routines (e.g., a substring +object over a flat string is simpler than the general purpose +substring). I need to be cautious about this, though, since I don't +automatically want people propagating the flat type through their +interfaces. That would put us in the same boat we're in right now ... +where routines only work on a specific kind of string, which denies +other parts of the code the opportunity to use an implementation +beneficial to its specific needs, and typically for no good reason. + +> +> 2) Should nsAWritableString have a way to get the buffer and then +> return it? +> I need to get the buffer to pass it to OS calls. I'm doing this now +> by passing around nsStrings instead of the interface. If I just use +> the interface I encur an extra copy since I have to use a temporary +> buffer. + +<p>A specific string implementation could promise this, but in general, a +writable could not. After all, a writable doesn't even guarantee +contiguous storage. To some degree, this is what +<span class="code">nsPromiseFlatString</span> is for. However, this is a readable promise +only. It will also be the case that <span class="code">ns[C]String</span>s, in the very near +future will be able to just assume ownership of an arbitrary buffer +allocated on the free store with the XPCOM allocators ... 
getting one +to give up its buffer, on the other hand, presents some problems. Do +you have a lot of places where the system writes into your string +buffer space? Or do you have a lot of system routines that return you +new buffers? I can imagine using <span class="code">nsPromiseFlatString</span> for this, but +what happens when the OS alters the underlying data? If the promise +had generated that flat data on behalf of a multi-fragment string, +should it now put the changes back? It's possible to do, I just want +to know if it's correct to allow this situation to happen. + + + +> +> 3) There needs to be a NS_LITERAL_CHAR() to go along with +> NS_LITERAL_STRING(). + +<p>OK. + + + +> Having NS_LITERAL_STRING() all over the code clutters +> it up and makes it hard to tell what the code is doing, could we +> have a standard short alias for this? + +<p>Yes, I'll try to think of something ... perhaps <span class="code">NS_LSTR</span>? + + +> 4) nsLiteralString should support n.ToInteger(&error); + +<p><span class="code">ToInteger</span> is actually a bad interface. It's only good if your +entire string is the number; this encourages you to edit your string +until it is one, or perhaps copy the numeric part to another string. +Better if you just <span class="code">sscanf</span> a string (don't know if I can provide +that in the general case, but I'm thinking about it), or else use +regular C++ extractors (which wouldn't be too hard for me to +provide), or else I could give you a <span class="code">ToInteger</span> that works on a pair +of iterators, extracting the integer from the digits between them. + +> +> 5) There should be a global define for an interface to a readonly +> empty string. + +<p>Yes, there will be. + + +> +> 6) Something is wrong with concatenation.... + +<p>Hopefully I've fixed this now. + + + +> 8) A forward definition is missing in the h files + +<p>I'll check it out. 
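On item 4 above: a ToInteger that works on a pair of iterators might look roughly like the sketch below. This is only an illustration in standard C++, not the XPCOM implementation. The name parse_integer, its signature, and its overflow policy are assumptions made up for this example; the iterators can be any forward iterators over character-like values.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of the XPCOM string API.
// Reads an optionally signed decimal integer from the characters in
// [first, last). On success it stores the value in |result|, advances
// |first| past the digits that were consumed, and returns true.
// On failure (no digits, or the value does not fit in 32 bits) it
// returns false and leaves |first| and |result| untouched.
template <class Iterator>
bool parse_integer(Iterator& first, Iterator last, std::int32_t& result)
{
    Iterator cursor = first;

    bool negative = false;
    if (cursor != last && (*cursor == '-' || *cursor == '+')) {
        negative = (*cursor == '-');
        ++cursor;
    }

    // Largest magnitude we accept: 2^31, which is only valid when negative.
    const std::int64_t limit =
        static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::max()) + 1;

    std::int64_t value = 0;
    bool saw_digit = false;
    for (; cursor != last && *cursor >= '0' && *cursor <= '9'; ++cursor) {
        value = value * 10 + (*cursor - '0');
        if (value > limit)
            return false;               // overflow
        saw_digit = true;
    }

    if (!saw_digit)
        return false;                   // nothing to extract
    if (negative)
        value = -value;
    else if (value == limit)
        return false;                   // +2^31 does not fit

    result = static_cast<std::int32_t>(value);
    first = cursor;
    return true;
}
```

The point of this shape is that the caller never has to edit or copy the string: it positions an iterator at the digits, calls the helper, and continues from wherever the iterator ends up.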
+ + + +<p>My understanding is that you have already found the answers to your +other questions. + +<p>I hope this helps, + + + + +<hr> +<pre> +Date: Wed, 20 Sep 2000 17:32:13 -0400 +Subject: Re: how to free an nsString::ToNewCString +</pre> + +<pre class="email-quote"> + >What's the current approved way to free an nsString::ToNewCString? +</pre> + +<p><span class="code">nsMemory::Free</span> + + + + + +<hr> + +<p>You use several <span class="code">NS_ConvertASCIItoUTF16("...").get()</span>; these should be + + NS_LITERAL_STRING("...").get() + +<p>Don't do this to the very first case where you aren't wrapping an actual literal string. +The first instance should exploit <span class="code">NS_LITERAL_STRING</span> technology as well, +around the initial declarations of the strings ... probably want to do this with +<span class="code">NS_NAMED_LITERAL_STRING</span>. + + + +<hr> +<pre> +Date: Thu, 12 Oct 2000 00:57:28 -0400 +Subject: string answers +</pre> + +<div class="source-code"> +<pre> +nsresult +DoSomething( nsAWritableString& answer ) + { + nsresult rv = NS_OK; // stays NS_OK when no conversion is attempted + + nsXPIDLString registry_data; + Fetch("key", getter_Shares(registry_data)); + + nsLiteralString path(not_my_string); + + PRInt32 first_colon = path.FindChar(PRUnichar(':')); + if ( first_colon != -1 ) + { + // convert ... extract path from |path| + nsCOMPtr<nsILocalFile> localFile( do_CreateInstance(CID, &rv) ); + if ( localFile ) + { + localFile->SetPersistentDescriptor(NS_ConvertUTF16toUTF8(path)); + + nsXPIDLString converted_path; + localFile->GetUnicodePath(getter_Copies(converted_path)); + answer = converted_path.get(); + } + } + else + { + answer = path; + } + + + return rv; + } +</pre> +</div> + + + + + +<hr> +<pre> +Date: Thu, 12 Oct 2000 02:03:49 -0400 +Subject: Re: and the answer is ... +</pre> + +<p>You can see from the line of code that you're on, that this should +have been fine. <span class="code">nsMemory::Alloc</span> would be asked to allocate a 1-byte +object. 
But it failed trying to allocate that. Which suggests that +the allocator was busy and non-reentrant and the debugger tried to +misuse it. Yes? + +<p>Of course, this doesn't solve your problem. Perhaps we need to go +back to the idea of a function that returns a pointer to the first +hunk of the string. + +<div class="source-code"> +<pre> +const char* +debug_string( const nsAReadableCString& aCString ) + { + nsReadingIterator<char> iter; + aCString.BeginReading(iter); + return aCString.IsEmpty() ? "" : iter.get(); + } +</pre> +</div> + +<p>This code should work regardless of what the allocator is doing. The +downsides are (a) it only returns the first hunk of the string, in the +case of a multi-fragment string; and (b) that hunk <strong>might</strong> not be +zero-terminated. + +<p>Hope this helps, + + + + + +<hr> +<pre> +Date: Thu, 12 Oct 2000 08:30:32 -0400 +Subject: Re: Self healing the cache :-) +</pre> + +<p>At 3:04 PM -0400 10/11/00, Mike Shaver wrote: +<pre class="email-quote"> + >NS_LITERAL_STRING(NS_XPCOM_SHUTDOWN_OBSERVER_ID); +</pre> + +<p>Macro ugliness makes <span class="code">NS_LITERAL_STRING</span> inappropriate for use over +other macros. In other words: + +<div class="source-code"> +<pre> +NS_LITERAL_STRING("foo") +</pre> +</div> + +<p>is <strong>good</strong>. + +<div class="source-code"> +<pre> +#define FOO "foo" +NS_LITERAL_STRING(FOO) +</pre> +</div> + +<p>is <strong>bad</strong>. Why? Because it turns into + +<div class="source-code"> +<pre> +nsLiteralString(LFOO, sizeof(LFOO)... +</pre> +</div> + +<p>and there is no <span class="code">LFOO</span>. Sorry. 
If you have to do this to a +macro-ized string, do the magic by hand, e.g., + +<div class="source-code"> +<pre> +nsLiteralString(FOO, sizeof(FOO)/sizeof(PRUnichar) + + sizeof(PRUnichar('\0'))) +</pre> +</div> + +<p>or else if you don't care that <span class="code">nsLiteralString</span> will scan for the +length, just say + +<div class="source-code"> +<pre> +nsLiteralString(FOO) +</pre> +</div> + +<p>Hope this helps, + + + + + +<hr> +<pre> +Date: Thu, 12 Oct 2000 08:36:14 -0400 +Subject: Re: Self healing the cache :-) +</pre> + +<p>Actually, I'm not even sure you can do it by hand, since you didn't + +<div class="source-code"> +<pre> +#define FOO L"foo" +</pre> +</div> + +<p>and <strong>can't</strong> do that cross-platform. The other way around this is to +define a global instead of a macro, that is, instead of saying + +<div class="source-code"> +<pre> +#define FOO "foo" +</pre> +</div> + +<p>at the top of your file, say + +<div class="source-code"> +<pre> +NS_NAMED_LITERAL_STRING(FOO, "foo") +</pre> +</div> + +<p>or else, if the macro was used only in one spot ... perhaps you could +just eliminate the macro in favor of <span class="code">NS_NAMED_LITERAL</span> in situ. + +<p>Arghh. In this case, you may be stuck with the extra work of +<span class="code">AssignWithConversion</span>. + + + + + +<hr> +<pre> +Date: Sun, 3 Dec 2000 16:38:07 -0400 +Subject: Re: another copy_string question +</pre> + +<pre class="email-quote"> + >Is there a way to tell, inside the write() sink, if one is in the + >final hunk? I need to do some special processing at the end. +</pre> + +<p>No, there isn't. But you could move such special processing into the +destructor of the sink. Remember, the sink is passed by reference, so +you can exactly control its lifetime. 
+ +<div class="source-code"> +<pre> +{ + MySink sink; + nsReadingIterator<PRUnichar> sourceStart = aStr.BeginReading(); + nsReadingIterator<PRUnichar> sourceEnd = aStr.EndReading(); + copy_string(sourceStart, sourceEnd, sink); + // |sink| destructor executed here +} +</pre> +</div> + +<p>Hope this helps, + + + + + +<hr> +<pre> +Date: Fri, 15 Dec 2000 20:02:08 -0400 +Subject: fragment of code +</pre> + +<div class="source-code"> +<pre> +nsPromiseFlatString flatKey(aReadable); + +flatKey.get() +</pre> +</div> + + + + + + +<hr> +<pre> +Date: Tue, 16 Jan 2001 16:47:37 -0400 +Subject: Re: a few string questions... +</pre> + +>I've accumulated a few questions I've been wanting to ask you, mostly +>about string stuff. Nothing urgent, but I want to ask them before I +>forget. So here goes...: +> +>1) Is it acceptable to use nsLiteralCString or nsLiteralString on +>something that's not a literal? This can be useful in some places, +>for example, to convert a char* to PRUnichar*: +> +>PRUnichar* new = ToNewUnicode(nsLiteralCString(myCharPtr)); + +<p>This is explicitly allowed. That's why I'm proposing to change the +names of those classes to <span class="code">nsLocal[C]String</span>. + + +>2) Should nsString2x.h and nsString2x.cpp go away? They look like a +>never-completed rewrite or something... + +<p>Yes. They should go away. They are uncompleted [old] bullshit, +exactly as you diagnosed. + +<p>I'll look into the other two questions. + + + + + +<hr> +<pre> +Date: Thu, 1 Feb 2001 15:12:41 -0400 +Subject: Re: [Fwd: bad string, bad string] +</pre> + +<p>We've been removing implicit conversion operators because they +_always_ lead to trouble. Usually they make it harder to pick the +right function when overloading is involved and in the past they have +led to huge performance suckage because we ended up doing conversions +when we didn't need to because the implicit operator made us pick the +wrong function. 
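A small illustration of the overloading problem, in standard C++ with toy types. Nothing here is an XPCOM class: ToyString, Describe, DescribeImplicit and DescribeExplicit are names invented for this sketch. It shows how an implicit conversion operator lets overload resolution quietly select a function the caller almost certainly did not mean.

```cpp
#include <string>

// Toy stand-in for a string class; not an XPCOM type.
struct ToyString {
    std::string data;

    // Implicit conversion operator of the kind being removed.
    operator const char*() const { return data.c_str(); }

    // Explicit accessor: the caller has to ask for the text by name.
    const std::string& text() const { return data; }
};

// Two overloads; a caller holding a string obviously wants the second one.
std::string Describe(bool) { return "bool overload"; }
std::string Describe(const std::string&) { return "string overload"; }

// ToyString -> const char* -> bool is a legal implicit conversion sequence,
// while ToyString -> std::string would need two user-defined conversions,
// so the bool overload is the only viable candidate and is chosen silently.
std::string DescribeImplicit(const ToyString& s) { return Describe(s); }

// Going through the explicit accessor selects the intended overload.
std::string DescribeExplicit(const ToyString& s) { return Describe(s.text()); }
```

Without the conversion operator, the call in DescribeImplicit would simply fail to compile, which is the behavior you want: the mistake is reported instead of hidden.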
+ +<p>It's borderline when the class implements something that is <strong>so</strong> +close, as with a guaranteed flat string or an <span class="code">nsCOMPtr</span> ... but the +general recommendation is to avoid implicit conversions. + +<p>See bug #53057. + + + + + +<hr> +<pre> +Date: Tue, 6 Feb 2001 18:52:23 -0400 +Subject: seeking review for bug #57087 +</pre> + +<p> bug: + <a class="exact-uri" href="http://bugzilla.mozilla.org/show_bug.cgi?id=57087">http://bugzilla.mozilla.org/show_bug.cgi?id=57087</a> + + patch: + <a class="exact-uri" href="http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24576">http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24576</a> + +<p>This patch is supposed to add the ability to define very long literal +strings more easily by breaking lines, e.g., + +<div class="source-code"> +<pre> +NS_MULTILINE_LITERAL( NS_L("This is the start of a very long line") + NS_L(" which actually continues across") + NS_L(" a couple more.") ) +</pre> +</div> + +<p>The main danger in this scheme is callers who omit the inner <span class="code">NS_L</span> +wrapping. Though I believe this will be caught at compile time as the +wrong type initializer. + +<p>Seeking input from everybody, and waterson in particular. + + + + + +<hr> +<pre> +Date: Wed, 14 Feb 2001 16:09:10 -0400 +Subject: Re: Question... +</pre> + +<p>There are some utilities in "xpcom/ds/nsReadableUtils.h". In +particular, if you want to get back a new heap-allocated ASCII string +with the minimal work, you would say + +<div class="source-code"> +<pre> +PRUnichar* sourceChars = ...; + +char* destChars = ToNewCString(nsLiteralString(sourceChars)); +</pre> +</div> + + +<p>It's more efficient if you happen to already know the length. If you +don't, don't bother counting, that's what I'll do in the constructor +for <span class="code">nsLiteralString</span>. 
If you do, then call like this + +<div class="source-code"> +<pre> +destChars = ToNewCString( nsLiteralString(sourceChars, length) ); +</pre> +</div> + +<p>Other routines in that file will help you if, for instance, you wanted +to translate into a buffer you had already allocated. + +<p>Hope this helps, + + + + + +<hr> +<pre> +Date: Fri, 23 Feb 2001 03:12:58 -0400 +Subject: string snippet +</pre> + +<div class="source-code"> +<pre> +nsCString aInput; + + + +nsReadingIterator<char> search_start; +aInput.BeginReading(search_start); + +nsReadingIterator<char> search_end; +aInput.EndReading(search_end); + +if ( FindCharInReadable(':', search_start, search_end) ) + { + ++search_start; + return ToNewCString( Substring(aInput, search_start, search_end) ); + } +</pre> +</div> + + + + + + +<hr> +<pre> +Date: Wed, 7 Mar 2001 19:44:08 -0400 +Subject: string help +</pre> + +<p>Here you go, Mike: + + http://scottcollins.net/journal/discussion/mjudge-scratch.cpp + + + + + + +<hr> +<pre> +Date: Fri, 9 Mar 2001 20:56:07 -0400 +Subject: Re: string assertions +</pre> + +<p>If you get an iterator into a string and you advance it all the way to +the end of the string, and then <strong>keep</strong> trying to advance it, you hit +this assert. This could happen, for example, if you tried to copy 10 +characters out of a 9-character string. I've tried to make this +impossible to get to. As far as I know, all my routines trim requests +in advance of manipulating iterators. When you see this, you should +get the stack. That will take you right to the bad spot. + + + + + +<hr> +<pre> +Date: Sat, 31 Mar 2001 11:04:03 -0400 +Subject: Re: Sun bustage and string advice +</pre> + +<p>You do know you are comparing two pointers now? It seems unlikely +those two pointers would ever be the same pointer. 
You probably want +to say something like + +<div class="source-code"> +<pre> +NS_LITERAL_STRING("foo").Equals(aTopic) // or + +NS_LITERAL_STRING("foo") == nsLiteralString(aTopic) +</pre> +</div> + +<p>...so that you compare the <strong>contents</strong> of two strings. Right now, +you're just testing to see if two pointers both point to the same +location in memory. A lot of people make this mistake. I would like +to make it obvious to people that comparing two pointers does not +compare strings. Can you tell me what gave you that impression so +that I can figure out how to better educate people not to do this? By +the way, it's not that I don't <strong>want</strong> to make this compare two +strings; it's that in C++, you can't override operations for built-in +types. And pointers are built-in types. So I can't make +<span class="code">operator==(const PRUnichar*, const PRUnichar*)</span> do anything different +than it already does, which is the same thing it does for any other +pointer. + + + + + + +</div> + + + +<!-- .................................................................End Matter --> + + + + </body> +</html> |