summaryrefslogtreecommitdiffstats
path: root/src/doc/book/nostarch/chapter08.md
diff options
context:
space:
mode:
Diffstat (limited to 'src/doc/book/nostarch/chapter08.md')
-rw-r--r--src/doc/book/nostarch/chapter08.md1284
1 files changed, 1284 insertions, 0 deletions
diff --git a/src/doc/book/nostarch/chapter08.md b/src/doc/book/nostarch/chapter08.md
new file mode 100644
index 000000000..1c7968c99
--- /dev/null
+++ b/src/doc/book/nostarch/chapter08.md
@@ -0,0 +1,1284 @@
+<!-- DO NOT EDIT THIS FILE.
+
+This file is periodically generated from the content in the `/src/`
+directory, so all fixes need to be made in `/src/`.
+-->
+
+[TOC]
+
+# Common Collections
+
+Rust’s standard library includes a number of very useful data structures called
+*collections*. Most other data types represent one specific value, but
+collections can contain multiple values. Unlike the built-in array and tuple
+types, the data these collections point to is stored on the heap, which means
+the amount of data does not need to be known at compile time and can grow or
+shrink as the program runs. Each kind of collection has different capabilities
+and costs, and choosing an appropriate one for your current situation is a
+skill you’ll develop over time. In this chapter, we’ll discuss three
+collections that are used very often in Rust programs:
+
+* A *vector* allows you to store a variable number of values next to each other.
+* A *string* is a collection of characters. We’ve mentioned the `String` type
+ previously, but in this chapter we’ll talk about it in depth.
+* A *hash map* allows you to associate a value with a particular key. It’s a
+ particular implementation of the more general data structure called a *map*.
+
+To learn about the other kinds of collections provided by the standard library,
+see the documentation at *https://doc.rust-lang.org/std/collections/index.html*.
+
+We’ll discuss how to create and update vectors, strings, and hash maps, as well
+as what makes each special.
+
+## Storing Lists of Values with Vectors
+
+The first collection type we’ll look at is `Vec<T>`, also known as a *vector*.
+Vectors allow you to store more than one value in a single data structure that
+puts all the values next to each other in memory. Vectors can only store values
+of the same type. They are useful when you have a list of items, such as the
+lines of text in a file or the prices of items in a shopping cart.
+
+### Creating a New Vector
+
+To create a new empty vector, we call the `Vec::new` function, as shown in
+Listing 8-1.
+
+```
+let v: Vec<i32> = Vec::new();
+```
+
+Listing 8-1: Creating a new, empty vector to hold values of type `i32`
+
+Note that we added a type annotation here. Because we aren’t inserting any
+values into this vector, Rust doesn’t know what kind of elements we intend to
+store. This is an important point. Vectors are implemented using generics;
+we’ll cover how to use generics with your own types in Chapter 10. For now,
+know that the `Vec<T>` type provided by the standard library can hold any type.
+When we create a vector to hold a specific type, we can specify the type within
+angle brackets. In Listing 8-1, we’ve told Rust that the `Vec<T>` in `v` will
+hold elements of the `i32` type.
+
+More often, you’ll create a `Vec<T>` with initial values and Rust will infer
+the type of value you want to store, so you rarely need to do this type
+annotation. Rust conveniently provides the `vec!` macro, which will create a
+new vector that holds the values you give it. Listing 8-2 creates a new
+`Vec<i32>` that holds the values `1`, `2`, and `3`. The integer type is `i32`
+because that’s the default integer type, as we discussed in the “Data Types”
+section of Chapter 3.
+
+```
+let v = vec![1, 2, 3];
+```
+
+Listing 8-2: Creating a new vector containing values
+
+Because we’ve given initial `i32` values, Rust can infer that the type of `v`
+is `Vec<i32>`, and the type annotation isn’t necessary. Next, we’ll look at how
+to modify a vector.
+
+### Updating a Vector
+
+To create a vector and then add elements to it, we can use the `push` method,
+as shown in Listing 8-3.
+
+```
+let mut v = Vec::new();
+
+v.push(5);
+v.push(6);
+v.push(7);
+v.push(8);
+```
+
+Listing 8-3: Using the `push` method to add values to a vector
+
+As with any variable, if we want to be able to change its value, we need to
+make it mutable using the `mut` keyword, as discussed in Chapter 3. The numbers
+we place inside are all of type `i32`, and Rust infers this from the data, so
+we don’t need the `Vec<i32>` annotation.
+
+<!--
+I think people from other languages may get stuck a bit here because this is
+the first time (I think?) that we're showing a hindley-milner style type
+inference in action (rather than using the initializer to infer the type).
+
+Should we show the definition for `push`? That'd let us tie together the method
+call, mutable reference to self drawing on the `impl` we saw in earlier
+chapters and help to explain a little why the above works without having to
+annotate the type of the Vec.
+/JT --->
+<!-- I think readers would be more confused showing the definition of `push`
+here because we haven't covered generics yet. I haven't gotten comments about
+people being confused at this point (which doesn't mean they aren't), but
+personally when I learned this, it made sense to me that the type of the vector
+would be known from what I put in it. I'm leaning towards not elaborating here.
+/Carol -->
+
+### Reading Elements of Vectors
+
+There are two ways to reference a value stored in a vector: via indexing or
+using the `get` method. In the following examples, we’ve annotated the types of
+the values that are returned from these functions for extra clarity.
+
+Listing 8-4 shows both methods of accessing a value in a vector, with indexing
+syntax and the `get` method.
+
+```
+let v = vec![1, 2, 3, 4, 5];
+
+[1] let third: &i32 = &v[2];
+println!("The third element is {}", third);
+
+[2] let third: Option<&i32> = v.get(2);
+match third {
+ Some(third) => println!("The third element is {}", third),
+ None => println!("There is no third element."),
+}
+```
+
+Listing 8-4: Using indexing syntax or the `get` method to access an item in a
+vector
+
+Note a few details here. We use the index value of `2` to get the third element
+[1] because vectors are indexed by number, starting at zero. Using `&` and `[]`
+gives us a reference to the element at the index value. When we use the `get`
+method with the index passed as an argument [2], we get an `Option<&T>` that we
+can use with `match`.
+
+<!---
+I think it should be "Second, we get the third element by using both `&` and
+`[]`"
+/JT --->
+<!-- No, it shouldn't, but I reworded this whole paragraph and added wingdings
+because it was unclear /Carol -->
+
+The reason Rust provides these two ways to reference an element is so you can
+choose how the program behaves when you try to use an index value outside the
+range of existing elements. As an example, let’s see what happens when we have
+a vector of five elements and then we try to access an element at index 100
+with each technique, as shown in Listing 8-5.
+
+```
+let v = vec![1, 2, 3, 4, 5];
+
+let does_not_exist = &v[100];
+let does_not_exist = v.get(100);
+```
+
+Listing 8-5: Attempting to access the element at index 100 in a vector
+containing five elements
+
+When we run this code, the first `[]` method will cause the program to panic
+because it references a nonexistent element. This method is best used when you
+want your program to crash if there’s an attempt to access an element past the
+end of the vector.
+
+When the `get` method is passed an index that is outside the vector, it returns
+`None` without panicking. You would use this method if accessing an element
+beyond the range of the vector may happen occasionally under normal
+circumstances. Your code will then have logic to handle having either
+`Some(&element)` or `None`, as discussed in Chapter 6. For example, the index
+could be coming from a person entering a number. If they accidentally enter a
+number that’s too large and the program gets a `None` value, you could tell the
+user how many items are in the current vector and give them another chance to
+enter a valid value. That would be more user-friendly than crashing the program
+due to a typo!
+
+When the program has a valid reference, the borrow checker enforces the
+ownership and borrowing rules (covered in Chapter 4) to ensure this reference
+and any other references to the contents of the vector remain valid. Recall the
+rule that states you can’t have mutable and immutable references in the same
+scope. That rule applies in Listing 8-6, where we hold an immutable reference
+to the first element in a vector and try to add an element to the end. This
+program won’t work if we also try to refer to that element later in the
+function:
+
+```
+let mut v = vec![1, 2, 3, 4, 5];
+
+let first = &v[0];
+
+v.push(6);
+
+println!("The first element is: {}", first);
+```
+
+Listing 8-6: Attempting to add an element to a vector while holding a reference
+to an item
+
+Compiling this code will result in this error:
+
+```
+ --> src/main.rs:6:5
+ |
+4 | let first = &v[0];
+ | - immutable borrow occurs here
+5 |
+6 | v.push(6);
+ | ^^^^^^^^^ mutable borrow occurs here
+7 |
+8 | println!("The first element is: {}", first);
+ | ----- immutable borrow later used here
+```
+
+The code in Listing 8-6 might look like it should work: why should a reference
+to the first element care about changes at the end of the vector? This error is
+due to the way vectors work: because vectors put the values next to each other
+in memory, adding a new element onto the end of the vector might require
+allocating new memory and copying the old elements to the new space, if there
+isn’t enough room to put all the elements next to each other where the vector
+is currently stored. In that case, the reference to the first element would be
+pointing to deallocated memory. The borrowing rules prevent programs from
+ending up in that situation.
+
+> Note: For more on the implementation details of the `Vec<T>` type, see “The
+> Rustonomicon” at *https://doc.rust-lang.org/nomicon/vec/vec.html*.
+
+### Iterating over the Values in a Vector
+
+To access each element in a vector in turn, we would iterate through all of the
+elements rather than use indices to access one at a time. Listing 8-7 shows how
+to use a `for` loop to get immutable references to each element in a vector of
+`i32` values and print them.
+
+```
+let v = vec![100, 32, 57];
+for i in &v {
+ println!("{}", i);
+}
+```
+
+Listing 8-7: Printing each element in a vector by iterating over the elements
+using a `for` loop
+
+We can also iterate over mutable references to each element in a mutable vector
+in order to make changes to all the elements. The `for` loop in Listing 8-8
+will add `50` to each element.
+
+```
+let mut v = vec![100, 32, 57];
+for i in &mut v {
+ *i += 50;
+}
+```
+
+Listing 8-8: Iterating over mutable references to elements in a vector
+
+To change the value that the mutable reference refers to, we have to use the
+`*` dereference operator to get to the value in `i` before we can use the
+`+=` operator. We’ll talk more about the dereference operator in the
+“Following the Pointer to the Value with the Dereference Operator”
+section of Chapter 15.
+
+Iterating over a vector, whether immutably or mutably, is safe because of the
+borrow checker’s rules. If we attempted to insert or remove items in the `for`
+loop bodies in Listing 8-7 and Listing 8-8, we would get a compiler error
+similar to the one we got with the code in Listing 8-6. The reference to the
+vector that the `for` loop holds prevents simultaneous modification of the
+whole vector.
+
+<!--
+Maybe worth a mention: the above use of the mutable reference while you iterate
+is perfectly safe because there's no changing that's happening to the vector
+that would invalidate the iterator. But, if you wanted to iterate the vector
+while also trying to remove or insert elements, you'd get an error. For example:
+
+```
+let mut v = vec![100, 32, 57];
+for i in &mut v {
+ *i += 50;
+ if *i > 100 {
+ v.push(10); // <-- a second mutable reference is needed and will fail to compile
+ }
+}
+```
+
+Things like this help Rust prevent some classic C++ issues where people didn't
+think about the implications of growing/shrinking a container while iterating
+over it.
+/JT --->
+<!-- I thought Listing 8-6 covered this, but I can see how driving home the
+connection with iteration as well is worthwhile so I added a paragraph just
+before this comment. Please check for clarity Liz! /Carol -->
+
+### Using an Enum to Store Multiple Types
+
+Vectors can only store values that are the same type. This can be inconvenient;
+there are definitely use cases for needing to store a list of items of
+different types. Fortunately, the variants of an enum are defined under the
+same enum type, so when we need one type to represent elements of different
+types, we can define and use an enum!
+
+For example, say we want to get values from a row in a spreadsheet in which
+some of the columns in the row contain integers, some floating-point numbers,
+and some strings. We can define an enum whose variants will hold the different
+value types, and all the enum variants will be considered the same type: that
+of the enum. Then we can create a vector to hold that enum and so, ultimately,
+holds different types. We’ve demonstrated this in Listing 8-9.
+
+```
+enum SpreadsheetCell {
+ Int(i32),
+ Float(f64),
+ Text(String),
+}
+
+let row = vec![
+ SpreadsheetCell::Int(3),
+ SpreadsheetCell::Text(String::from("blue")),
+ SpreadsheetCell::Float(10.12),
+];
+```
+
+Listing 8-9: Defining an `enum` to store values of different types in one
+vector
+
+Rust needs to know what types will be in the vector at compile time so it knows
+exactly how much memory on the heap will be needed to store each element. We
+must also be explicit about what types are allowed in this vector. If Rust
+allowed a vector to hold any type, there would be a chance that one or more of
+the types would cause errors with the operations performed on the elements of
+the vector. Using an enum plus a `match` expression means that Rust will ensure
+at compile time that every possible case is handled, as discussed in Chapter 6.
+
+If you don’t know the exhaustive set of types a program will get at runtime to
+store in a vector, the enum technique won’t work. Instead, you can use a trait
+object, which we’ll cover in Chapter 17.
+
+Now that we’ve discussed some of the most common ways to use vectors, be sure
+to review the API documentation for all the many useful methods defined on
+`Vec<T>` by the standard library. For example, in addition to `push`, a `pop`
+method removes and returns the last element.
+
+### Dropping a Vector Drops Its Elements
+
+Like any other `struct`, a vector is freed when it goes out of scope, as
+annotated in Listing 8-10.
+
+```
+{
+ let v = vec![1, 2, 3, 4];
+
+ // do stuff with v
+} // <- v goes out of scope and is freed here
+```
+
+Listing 8-10: Showing where the vector and its elements are dropped
+
+When the vector gets dropped, all of its contents are also dropped, meaning the
+integers it holds will be cleaned up. The borrow checker ensures that any
+references to contents of a vector are only used while the vector itself is
+valid.
+
+Let’s move on to the next collection type: `String`!
+
+<!--
+nit: I think "meaning the integers it holds will be cleaned up" reads a little
+better
+
+nit #2: imho dropping isn't as imports when you start using vectors as reading
+elements from the vector. Is it better for training to mention it here, or
+would it be possible to move it later?
+/JT -->
+<!-- Took both nit suggestions-- reworded for nit #1 and moved this section to
+the end of the Vec section (and renumbered the listings) for nit #2. Liz,
+please check to make sure I didn't miss anything in the way the Vec section
+flows now! /Carol -->
+
+## Storing UTF-8 Encoded Text with Strings
+
+We talked about strings in Chapter 4, but we’ll look at them in more depth now.
+New Rustaceans commonly get stuck on strings for a combination of three
+reasons: Rust’s propensity for exposing possible errors, strings being a more
+complicated data structure than many programmers give them credit for, and
+UTF-8. These factors combine in a way that can seem difficult when you’re
+coming from other programming languages.
+
+We discuss strings in the context of collections because strings are
+implemented as a collection of bytes, plus some methods to provide useful
+functionality when those bytes are interpreted as text. In this section, we’ll
+talk about the operations on `String` that every collection type has, such as
+creating, updating, and reading. We’ll also discuss the ways in which `String`
+is different from the other collections, namely how indexing into a `String` is
+complicated by the differences between how people and computers interpret
+`String` data.
+
+### What Is a String?
+
+We’ll first define what we mean by the term *string*. Rust has only one string
+type in the core language, which is the string slice `str` that is usually seen
+in its borrowed form `&str`. In Chapter 4, we talked about *string slices*,
+which are references to some UTF-8 encoded string data stored elsewhere. String
+literals, for example, are stored in the program’s binary and are therefore
+string slices.
+
+The `String` type, which is provided by Rust’s standard library rather than
+coded into the core language, is a growable, mutable, owned, UTF-8 encoded
+string type. When Rustaceans refer to “strings” in Rust, they might be
+referring to either the `String` or the string slice `&str` types, not just one
+of those types. Although this section is largely about `String`, both types are
+used heavily in Rust’s standard library, and both `String` and string slices
+are UTF-8 encoded.
+
+<!---
+I'm wondering if listing the above makes it a bit more cumbersome. In effect,
+out of gate we're saying there are a lot of different string types.
+
+But perhaps we could focus on String and &str here and let them learn about
+CString/CStr when doing FFI and OsString/OsStr when they work on paths?
+Basically, I'm wondering if we should cut down on the concept count and let
+them come across those alternate strings more naturally.
+/JT --->
+<!-- I'm ok with that! I removed the paragraph talking about the other, rarer
+string types. /Carol -->
+
+### Creating a New String
+
+Many of the same operations available with `Vec<T>` are available with `String`
+as well, because `String` is actually implemented as a wrapper around a vector
+of bytes with some extra guarantees, restrictions, and capabilities. An example
+of a function that works the same way with `Vec<T>` and `String` is the `new`
+function to create an instance, shown in Listing 8-11.
+
+```
+let mut s = String::new();
+```
+
+Listing 8-11: Creating a new, empty `String`
+
+This line creates a new empty string called `s`, which we can then load data
+into. Often, we’ll have some initial data that we want to start the string
+with. For that, we use the `to_string` method, which is available on any type
+that implements the `Display` trait, as string literals do. Listing 8-12 shows
+two examples.
+
+```
+let data = "initial contents";
+
+let s = data.to_string();
+
+// the method also works on a literal directly:
+let s = "initial contents".to_string();
+```
+
+Listing 8-12: Using the `to_string` method to create a `String` from a string
+literal
+
+This code creates a string containing `initial contents`.
+
+We can also use the function `String::from` to create a `String` from a string
+literal. The code in Listing 8-13 is equivalent to the code from Listing 8-12
+that uses `to_string`.
+
+```
+let s = String::from("initial contents");
+```
+
+Listing 8-13: Using the `String::from` function to create a `String` from a
+string literal
+
+Because strings are used for so many things, we can use many different generic
+APIs for strings, providing us with a lot of options. Some of them can seem
+redundant, but they all have their place! In this case, `String::from` and
+`to_string` do the same thing, so which you choose is a matter of style and
+readability.
+
+Remember that strings are UTF-8 encoded, so we can include any properly encoded
+data in them, as shown in Listing 8-14.
+
+```
+let hello = String::from("السلام عليكم");
+let hello = String::from("Dobrý den");
+let hello = String::from("Hello");
+let hello = String::from("שָׁלוֹם");
+let hello = String::from("नमस्ते");
+let hello = String::from("こんにちは");
+let hello = String::from("안녕하세요");
+let hello = String::from("你好");
+let hello = String::from("Olá");
+let hello = String::from("Здравствуйте");
+let hello = String::from("Hola");
+```
+
+Listing 8-14: Storing greetings in different languages in strings
+
+All of these are valid `String` values.
+
+### Updating a String
+
+A `String` can grow in size and its contents can change, just like the contents
+of a `Vec<T>`, if you push more data into it. In addition, you can conveniently
+use the `+` operator or the `format!` macro to concatenate `String` values.
+
+#### Appending to a String with `push_str` and `push`
+
+We can grow a `String` by using the `push_str` method to append a string slice,
+as shown in Listing 8-15.
+
+```
+let mut s = String::from("foo");
+s.push_str("bar");
+```
+
+Listing 8-15: Appending a string slice to a `String` using the `push_str` method
+
+After these two lines, `s` will contain `foobar`. The `push_str` method takes a
+string slice because we don’t necessarily want to take ownership of the
+parameter. For example, in the code in Listing 8-16, we want to be able to use
+`s2` after appending its contents to `s1`.
+
+```
+let mut s1 = String::from("foo");
+let s2 = "bar";
+s1.push_str(s2);
+println!("s2 is {}", s2);
+```
+
+Listing 8-16: Using a string slice after appending its contents to a `String`
+
+If the `push_str` method took ownership of `s2`, we wouldn’t be able to print
+its value on the last line. However, this code works as we’d expect!
+
+The `push` method takes a single character as a parameter and adds it to the
+`String`. Listing 8-17 adds the letter "l" to a `String` using the `push`
+method.
+
+```
+let mut s = String::from("lo");
+s.push('l');
+```
+
+Listing 8-17: Adding one character to a `String` value using `push`
+
+As a result, `s` will contain `lol`.
+
+#### Concatenation with the `+` Operator or the `format!` Macro
+
+Often, you’ll want to combine two existing strings. One way to do so is to use
+the `+` operator, as shown in Listing 8-18.
+
+```
+let s1 = String::from("Hello, ");
+let s2 = String::from("world!");
+let s3 = s1 + &s2; // note s1 has been moved here and can no longer be used
+```
+
+Listing 8-18: Using the `+` operator to combine two `String` values into a new
+`String` value
+
+The string `s3` will contain `Hello, world!`. The reason `s1` is no longer
+valid after the addition, and the reason we used a reference to `s2`, has to do
+with the signature of the method that’s called when we use the `+` operator.
+The `+` operator uses the `add` method, whose signature looks something like
+this:
+
+```
+fn add(self, s: &str) -> String {
+```
+
+In the standard library, you’ll see `add` defined using generics and associated
+types. Here, we’ve substituted in concrete types, which is what happens when we
+call this method with `String` values. We’ll discuss generics in Chapter 10.
+This signature gives us the clues we need to understand the tricky bits of the
+`+` operator.
+
+First, `s2` has an `&`, meaning that we’re adding a *reference* of the second
+string to the first string. This is because of the `s` parameter in the `add`
+function: we can only add a `&str` to a `String`; we can’t add two `String`
+values together. But wait—the type of `&s2` is `&String`, not `&str`, as
+specified in the second parameter to `add`. So why does Listing 8-18 compile?
+
+<!--
+The above isn't quite right - the trait for ops::Add uses an Rhs associated type
+instead of using T for both lhs and rhs.
+
+```
+pub trait Add<Rhs = Self> {
+ type Output;
+ fn add(self, rhs: Rhs) -> Self::Output;
+}
+```
+
+The implementation of Add for String fills in Rhs with the slice:
+
+```
+impl<'_> Add<&'_ str> for String
+```
+
+Not sure if it's better to fix the description and not have deref coercion
+discussion following, or fix the example so you can have the coercion
+discussion.
+/JT --->
+<!-- I've made an edit above to address this /Carol -->
+
+The reason we’re able to use `&s2` in the call to `add` is that the compiler
+can *coerce* the `&String` argument into a `&str`. When we call the `add`
+method, Rust uses a *deref coercion*, which here turns `&s2` into `&s2[..]`.
+We’ll discuss deref coercion in more depth in Chapter 15. Because `add` does
+not take ownership of the `s` parameter, `s2` will still be a valid `String`
+after this operation.
+
+Second, we can see in the signature that `add` takes ownership of `self`,
+because `self` does *not* have an `&`. This means `s1` in Listing 8-18 will be
+moved into the `add` call and will no longer be valid after that. So although
+`let s3 = s1 + &s2;` looks like it will copy both strings and create a new one,
+this statement actually takes ownership of `s1`, appends a copy of the contents
+of `s2`, and then returns ownership of the result. In other words, it looks
+like it’s making a lot of copies but isn’t; the implementation is more
+efficient than copying.
+
+If we need to concatenate multiple strings, the behavior of the `+` operator
+gets unwieldy:
+
+```
+let s1 = String::from("tic");
+let s2 = String::from("tac");
+let s3 = String::from("toe");
+
+let s = s1 + "-" + &s2 + "-" + &s3;
+```
+
+At this point, `s` will be `tic-tac-toe`. With all of the `+` and `"`
+characters, it’s difficult to see what’s going on. For more complicated string
+combining, we can instead use the `format!` macro:
+
+```
+let s1 = String::from("tic");
+let s2 = String::from("tac");
+let s3 = String::from("toe");
+
+let s = format!("{}-{}-{}", s1, s2, s3);
+```
+
+This code also sets `s` to `tic-tac-toe`. The `format!` macro works like
+`println!`, but instead of printing the output to the screen, it returns a
+`String` with the contents. The version of the code using `format!` is much
+easier to read, and the code generated by the `format!` macro uses references
+so that this call doesn’t take ownership of any of its parameters.
+
+### Indexing into Strings
+
+In many other programming languages, accessing individual characters in a
+string by referencing them by index is a valid and common operation. However,
+if you try to access parts of a `String` using indexing syntax in Rust, you’ll
+get an error. Consider the invalid code in Listing 8-19.
+
+```
+let s1 = String::from("hello");
+let h = s1[0];
+```
+
+Listing 8-19: Attempting to use indexing syntax with a String
+
+This code will result in the following error:
+
+```
+error[E0277]: the type `String` cannot be indexed by `{integer}`
+ --> src/main.rs:3:13
+ |
+3 | let h = s1[0];
+ | ^^^^^ `String` cannot be indexed by `{integer}`
+ |
+ = help: the trait `Index<{integer}>` is not implemented for `String`
+```
+
+The error and the note tell the story: Rust strings don’t support indexing. But
+why not? To answer that question, we need to discuss how Rust stores strings in
+memory.
+
+#### Internal Representation
+
+A `String` is a wrapper over a `Vec<u8>`. Let’s look at some of our properly
+encoded UTF-8 example strings from Listing 8-14. First, this one:
+
+```
+let hello = String::from("Hola");
+```
+
+In this case, `len` will be 4, which means the vector storing the string “Hola”
+is 4 bytes long. Each of these letters takes 1 byte when encoded in UTF-8. The
+following line, however, may surprise you. (Note that this string begins with
+the capital Cyrillic letter Ze, not the Arabic number 3.)
+
+```
+let hello = String::from("Здравствуйте");
+```
+
+Asked how long the string is, you might say 12. In fact, Rust’s answer is 24:
+that’s the number of bytes it takes to encode “Здравствуйте” in UTF-8, because
+each Unicode scalar value in that string takes 2 bytes of storage. Therefore,
+an index into the string’s bytes will not always correlate to a valid Unicode
+scalar value. To demonstrate, consider this invalid Rust code:
+
+```
+let hello = "Здравствуйте";
+let answer = &hello[0];
+```
+
+You already know that `answer` will not be `З`, the first letter. When encoded
+in UTF-8, the first byte of `З` is `208` and the second is `151`, so it would
+seem that `answer` should in fact be `208`, but `208` is not a valid character
+on its own. Returning `208` is likely not what a user would want if they asked
+for the first letter of this string; however, that’s the only data that Rust
+has at byte index 0. Users generally don’t want the byte value returned, even
+if the string contains only Latin letters: if `&"hello"[0]` were valid code
+that returned the byte value, it would return `104`, not `h`.
+
+The answer, then, is that to avoid returning an unexpected value and causing
+bugs that might not be discovered immediately, Rust doesn’t compile this code
+at all and prevents misunderstandings early in the development process.
+
+#### Bytes and Scalar Values and Grapheme Clusters! Oh My!
+
+Another point about UTF-8 is that there are actually three relevant ways to
+look at strings from Rust’s perspective: as bytes, scalar values, and grapheme
+clusters (the closest thing to what we would call *letters*).
+
+If we look at the Hindi word “नमस्ते” written in the Devanagari script, it is
+stored as a vector of `u8` values that looks like this:
+
+```
+[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164,
+224, 165, 135]
+```
+
+That’s 18 bytes and is how computers ultimately store this data. If we look at
+them as Unicode scalar values, which are what Rust’s `char` type is, those
+bytes look like this:
+
+```
+['न', 'म', 'स', '्', 'त', 'े']
+```
+
+There are six `char` values here, but the fourth and sixth are not letters:
+they’re diacritics that don’t make sense on their own. Finally, if we look at
+them as grapheme clusters, we’d get what a person would call the four letters
+that make up the Hindi word:
+
+```
+["न", "म", "स्", "ते"]
+```
+
+Rust provides different ways of interpreting the raw string data that computers
+store so that each program can choose the interpretation it needs, no matter
+what human language the data is in.
+
+A final reason Rust doesn’t allow us to index into a `String` to get a
+character is that indexing operations are expected to always take constant time
+(O(1)). But it isn’t possible to guarantee that performance with a `String`,
+because Rust would have to walk through the contents from the beginning to the
+index to determine how many valid characters there were.
+
+### Slicing Strings
+
+Indexing into a string is often a bad idea because it’s not clear what the
+return type of the string-indexing operation should be: a byte value, a
+character, a grapheme cluster, or a string slice. If you really need to use
+indices to create string slices, therefore, Rust asks you to be more specific.
+
+Rather than indexing using `[]` with a single number, you can use `[]` with a
+range to create a string slice containing particular bytes:
+
+```
+let hello = "Здравствуйте";
+
+let s = &hello[0..4];
+```
+
+Here, `s` will be a `&str` that contains the first 4 bytes of the string.
+Earlier, we mentioned that each of these characters was 2 bytes, which means
+`s` will be `Зд`.
+
+If we were to try to slice only part of a character’s bytes with something like
+`&hello[0..1]`, Rust would panic at runtime in the same way as if an invalid
+index were accessed in a vector:
+
+```
+thread 'main' panicked at 'byte index 1 is not a char boundary; it is inside 'З' (bytes 0..2) of `Здравствуйте`', src/main.rs:4:14
+```
+
+You should use ranges to create string slices with caution, because doing so
+can crash your program.
+
+### Methods for Iterating Over Strings
+
+<!--- is there a reason this comes after how to slice, rather than after the
+discussion on why we can't directly index into a string? /LC --->
+<!-- I think the idea was that we show this progression of from worst technique
+to best:
+
+1. direct indexing, which doesn't compile
+2. slicing with a range, which looks similar to indexing, which does compile
+but might panic at runtime
+3. iterating over chars or bytes, which compiles and won't panic
+
+Do you have suggestions on making this clearer? I've tried to add a bit at the
+beginning of this section /Carol
+-->
+<!-- JT, what do you think -- is this ordering clear to you? /LC -->
+<!---
+I'm okay with the current order - I think showing why it's bad, what's close to
+what you try first, and then finally the idiomatic Rust solution reads okay.
+
+One tiny nit, for flow, would be to use the Cyrillic example first here to show
+how `.chars()` works well for it and then mention that for more complex
+scripts, like Hindi, you'll need to use the more full-featured string handling
+you find on crates.io.
+/JT --->
+<!-- I've taken JT's suggestion here to use part of the Cyrillic string, then
+mention you'll need a crate to correctly get the grapheme clusters for Hindi
+/Carol -->
+
+The best way to operate on pieces of strings is to be explicit about whether
+you want characters or bytes. For individual Unicode scalar values, use the
+`chars` method. Calling `chars` on “Зд” separates out and returns two values
+of type `char`, and you can iterate over the result to access each element:
+
+```rust
+for c in "Зд".chars() {
+ println!("{}", c);
+}
+```
+
+This code will print the following:
+
+```text
+```
+
+Alternatively, the `bytes` method returns each raw byte, which might be
+appropriate for your domain:
+
+```rust
+for b in "Зд".bytes() {
+ println!("{}", b);
+}
+```
+
+This code will print the four bytes that make up this string:
+
+```text
+208
+151
+208
+180
+```
+
+But be sure to remember that valid Unicode scalar values may be made up of more
+than 1 byte.
+
+Getting grapheme clusters from strings as with the Devanagari script is
+complex, so this functionality is not provided by the standard library. Crates
+are available on *https://crates.io/* if this is the functionality you need.
+
+### Strings Are Not So Simple
+
+<!---
+Because Strings are quite complicated, and have complications that are all
+their own and unlike any other containers, I wonder if maybe this chapter
+should be two different chapters with one specifically being about strings,
+string slices, chars, and related?
+/JT --->
+<!-- I don't think I want to make that big of a change at this point... the
+original idea was to compare and contrast the different containers, perhaps
+that's not serving its purpose as well as a chapter split could... I'll think
+about this for the next major revision. /Carol -->
+
+<!---
+We don't talk about searching in a string. Feels like it could use an example
+or two?
+/JT --->
+<!-- To address this suggestion and a bit of the previous suggestion as well, I
+changed the first paragraph in the "Creating a New String" section to mention
+that a `String` is implemented using a `Vec`. Then, to echo the last paragraph
+before the "Dropping a Vector Drops Its Elements" section, I've added some text
+here to again urge the reader to check out the standard library documentation
+for more functionality. /Carol -->
+
+To summarize, strings are complicated. Different programming languages make
+different choices about how to present this complexity to the programmer. Rust
+has chosen to make the correct handling of `String` data the default behavior
+for all Rust programs, which means programmers have to put more thought into
+handling UTF-8 data upfront. This trade-off exposes more of the complexity of
+strings than is apparent in other programming languages, but it prevents you
+from having to handle errors involving non-ASCII characters later in your
+development life cycle.
+
+The good news is that the standard library offers a lot of functionality built
+off the `String` and `&str` types to help handle these complex situations
+correctly. Be sure to check out the documentation for useful methods like
+`contains` for searching in a string and `replace` for substituting parts of a
+string with another string.
+
+Let’s switch to something a bit less complex: hash maps!
+
+## Storing Keys with Associated Values in Hash Maps
+
+The last of our common collections is the *hash map*. The type `HashMap<K, V>`
+stores a mapping of keys of type `K` to values of type `V` using a
+*hashing function*, which determines how it places these keys and values into
+memory. Many programming languages support this kind of data structure, but
+they often use a different name, such as hash, map, object, hash table,
+dictionary, or associative array, just to name a few.
+
+Hash maps are useful when you want to look up data not by using an index, as
+you can with vectors, but by using a key that can be of any type. For example,
+in a game, you could keep track of each team’s score in a hash map in which
+each key is a team’s name and the values are each team’s score. Given a team
+name, you can retrieve its score.
+
+We’ll go over the basic API of hash maps in this section, but many more goodies
+are hiding in the functions defined on `HashMap<K, V>` by the standard library.
+As always, check the standard library documentation for more information.
+
+### Creating a New Hash Map
+
+One way to create an empty hash map is using `new` and adding elements with
+`insert`. In Listing 8-20, we’re keeping track of the scores of two teams whose
+names are *Blue* and *Yellow*. The Blue team starts with 10 points, and the
+Yellow team starts with 50.
+
+```
+use std::collections::HashMap;
+
+let mut scores = HashMap::new();
+
+scores.insert(String::from("Blue"), 10);
+scores.insert(String::from("Yellow"), 50);
+```
+
+Listing 8-20: Creating a new hash map and inserting some keys and values
+
+Note that we need to first `use` the `HashMap` from the collections portion of
+the standard library. Of our three common collections, this one is the least
+often used, so it’s not included in the features brought into scope
+automatically in the prelude. Hash maps also have less support from the
+standard library; there’s no built-in macro to construct them, for example.
+
+Just like vectors, hash maps store their data on the heap. This `HashMap` has
+keys of type `String` and values of type `i32`. Like vectors, hash maps are
+homogeneous: all of the keys must have the same type as each other, and all of
+the values must have the same type.
+
+<!---
+I'm not sure I've seen this in the wild? I'm tempted to say to skip the zip
+example for flow and go from creating the hash map to working with its
+contents.
+/JT --->
+<!-- Cut Listing 8-21 and renumbered! /Carol -->
+
+### Accessing Values in a Hash Map
+
+<!---
+For flow, would it make sense for this section to follow creating the hash map?
+That way we introduce a useful concept and also continue the teams example.
+/JT --->
+<!-- Ok, I've switched the order of "Accessing Values in a Hash Map" and "Hash
+Maps and Ownership" and renumbered! Does this still make sense Liz? /Carol -->
+
+We can get a value out of the hash map by providing its key to the `get`
+method, as shown in Listing 8-21.
+
+```
+use std::collections::HashMap;
+
+let mut scores = HashMap::new();
+
+scores.insert(String::from("Blue"), 10);
+scores.insert(String::from("Yellow"), 50);
+
+let team_name = String::from("Blue");
+let score = scores.get(&team_name).unwrap_or(0);
+```
+
+Listing 8-21: Accessing the score for the Blue team stored in the hash map
+
+Here, `score` will have the value that’s associated with the Blue team, and the
+result will be `10`. The `get` method returns an `Option<&V>`; if there’s no
+value for that key in the hash map, `get` will return `None`. This program
+handles the `Option` by calling `unwrap_or` to set `score` to zero if `scores`
+doesn’t have an entry for the key.
+
+<!---
+Should there be a quick example here to show handling Some/None again before
+we move on to iteration?
+/JT --->
+<!-- I've changed the code in Listing 8-21 a bit to actually handle the
+`Option` instead of referencing chapter 6, what do you think Liz? /Carol -->
+
+We can iterate over each key/value pair in a hash map in a similar manner as we
+do with vectors, using a `for` loop:
+
+```
+use std::collections::HashMap;
+
+let mut scores = HashMap::new();
+
+scores.insert(String::from("Blue"), 10);
+scores.insert(String::from("Yellow"), 50);
+
+for (key, value) in &scores {
+ println!("{}: {}", key, value);
+}
+```
+
+This code will print each pair in an arbitrary order:
+
+```
+Yellow: 50
+Blue: 10
+```
+
+### Hash Maps and Ownership
+
+For types that implement the `Copy` trait, like `i32`, the values are copied
+into the hash map. For owned values like `String`, the values will be moved and
+the hash map will be the owner of those values, as demonstrated in Listing 8-22.
+
+```
+use std::collections::HashMap;
+
+let field_name = String::from("Favorite color");
+let field_value = String::from("Blue");
+
+let mut map = HashMap::new();
+map.insert(field_name, field_value);
+// field_name and field_value are invalid at this point, try using them and
+// see what compiler error you get!
+```
+
+Listing 8-22: Showing that keys and values are owned by the hash map once
+they’re inserted
+
+We aren’t able to use the variables `field_name` and `field_value` after
+they’ve been moved into the hash map with the call to `insert`.
+
+If we insert references to values into the hash map, the values won’t be moved
+into the hash map. The values that the references point to must be valid for at
+least as long as the hash map is valid. We’ll talk more about these issues in
+the “Validating References with Lifetimes” section in Chapter 10.
+
+### Updating a Hash Map
+
+Although the number of key and value pairs is growable, each unique key can
+only have one value associated with it at a time (but not vice versa: for
+example, both the Blue team and the Yellow team could have value 10 stored in
+the `scores` hash map).
+<!--- And vice versa? /LC --->
+<!-- No, you could have a hashmap that has ("Blue", 10) and ("Yellow", 10) for
+example. Stating this here feels a bit off topic for updating the value of an
+existing key, though, I'm not sure how to work it in. Do you think that's
+important enough to state here? If so, do you have suggestions on how to do it
+without distracting from the main point of this section? /Carol -->
+<!-- It may not be important enough, what do you think JT? /LC -->
+<!---
+I think it's maybe worth calling out. Something you could use to drive
+this home is the `.entry()` call. This makes it clear that for any key there's
+one cell (or entry) that you're updating in the hash map. I see we use it
+later, though worth a thought if bringing it earlier helps?
+/JT --->
+<!-- I've added a short sentence here, but every time I try to add something
+more, I end up getting tangled in saying things like "key value" as opposed to
+"value value", which is terrible... or I worry about misleading readers into
+thinking that you can't use a `Vec<T>` as a HashMap value type, which you
+totally can to store multiple "values" in one vector "value", which you totally
+can, it's just a little more complicated. Or I try to say "multiple keys can
+have the same value" which sounds like it could imply that there would be a
+*shared* value stored in the HashMap, which wouldn't be the case, there would
+be two separate allocations that would happen to have the same value... I just
+haven't heard a reader wondering if each value can only have one key with it
+before (which doesn't mean they haven't wondered it, I just haven't heard of
+it) so I don't want to lead readers astray if they weren't already going that
+direction? What do you think about what's here now, Liz? /Carol -->
+
+When you want to change the data in a hash map, you have to decide how to
+handle the case when a key already has a value assigned. You could replace the
+old value with the new value, completely disregarding the old value. You could
+keep the old value and ignore the new value, only adding the new value if the
+key *doesn’t* already have a value. Or you could combine the old value and the
+new value. Let’s look at how to do each of these!
+
+#### Overwriting a Value
+
+If we insert a key and a value into a hash map and then insert that same key
+with a different value, the value associated with that key will be replaced.
+Even though the code in Listing 8-23 calls `insert` twice, the hash map will
+only contain one key/value pair because we’re inserting the value for the Blue
+team’s key both times.
+
+```
+use std::collections::HashMap;
+
+let mut scores = HashMap::new();
+
+scores.insert(String::from("Blue"), 10);
+scores.insert(String::from("Blue"), 25);
+
+println!("{:?}", scores);
+```
+
+Listing 8-23: Replacing a value stored with a particular key
+
+This code will print `{"Blue": 25}`. The original value of `10` has been
+overwritten.
+
+#### Adding a Key and Value Only If a Key Isn’t Present
+
+<!--- to be clear, are we talking about default values here, or just checking
+for an existing value before allowing insertion of a value? /LC--->
+<!-- I'm not sure what you mean exactly. Checking for an existing value before
+allowing insertion of a value can be used to insert whatever value would mean
+"default" in your program, or it can be used to insert some other value that
+you wouldn't call a default. That is, in Listing 8-25, would you call 50 a
+default value or no? (I don't think we've given enough information about what
+the program is ultimately trying to do to tell if 50 is a default or not, and I
+don't think it matters, but I am interested to know if there's something I'm
+missing that you're trying to get at). Can you elaborate on what was confusing
+and perhaps propose wording that would have cleared this up for you, and I can
+fix if needed? /Carol-->
+<!-- I suppose what I'm asking is whether a value is inserted from the started
+as a default value and then updated, meaning the key never has no value, or
+whether we're only allowing insertion of a value if there isn't already a
+value. I think it's the latter and maybe that's clear enough as is! JT, what do
+you think? /LC -->
+<!---
+I think the idea is generally right, we're going to insert the value if the
+key is not already in the hash map. Maybe the title could be:
+
+"Adding a key and value only if a key isn't present"
+
+Worth a note: I think "default" values are a bit of a loaded term in Rust. If
+we use it, we may confuse people later if we they come across `Default`, which
+is the default value of a type (like 0 is for i64, via `i64::default()`)
+/JT --->
+<!-- Ok, I've taken JT's suggestion for the section title and tried to reword
+the text here a bit; is this clearer, Liz? I share JT's concern about using the
+word "default"... /Carol -->
+
+It’s common to check whether a particular key already exists in the hash map
+with a value then take the following actions: if the key does exist in the hash
+map, the existing value should remain the way it is. If the key doesn’t exist,
+insert it and a value for it.
+
+Hash maps have a special API for this called `entry` that takes the key you
+want to check as a parameter. The return value of the `entry` method is an enum
+called `Entry` that represents a value that might or might not exist. Let’s say
+we want to check whether the key for the Yellow team has a value associated
+with it. If it doesn’t, we want to insert the value 50, and the same for the
+Blue team. Using the `entry` API, the code looks like Listing 8-24.
+
+```
+use std::collections::HashMap;
+
+let mut scores = HashMap::new();
+scores.insert(String::from("Blue"), 10);
+
+scores.entry(String::from("Yellow")).or_insert(50);
+scores.entry(String::from("Blue")).or_insert(50);
+
+println!("{:?}", scores);
+```
+
+Listing 8-24: Using the `entry` method to only insert if the key does not
+already have a value
+
+The `or_insert` method on `Entry` is defined to return a mutable reference to
+the value for the corresponding `Entry` key if that key exists, and if not,
+inserts the parameter as the new value for this key and returns a mutable
+reference to the new value. This technique is much cleaner than writing the
+logic ourselves and, in addition, plays more nicely with the borrow checker.
+
+Running the code in Listing 8-24 will print `{"Yellow": 50, "Blue": 10}`. The
+first call to `entry` will insert the key for the Yellow team with the value
+50 because the Yellow team doesn’t have a value already. The second call to
+`entry` will not change the hash map because the Blue team already has the
+value 10.
+
+#### Updating a Value Based on the Old Value
+
+Another common use case for hash maps is to look up a key’s value and then
+update it based on the old value. For instance, Listing 8-25 shows code that
+counts how many times each word appears in some text. We use a hash map with
+the words as keys and increment the value to keep track of how many times we’ve
+seen that word. If it’s the first time we’ve seen a word, we’ll first insert
+the value 0.
+
+```
+use std::collections::HashMap;
+
+let text = "hello world wonderful world";
+
+let mut map = HashMap::new();
+
+for word in text.split_whitespace() {
+ let count = map.entry(word).or_insert(0);
+ *count += 1;
+}
+
+println!("{:?}", map);
+```
+
+Listing 8-25: Counting occurrences of words using a hash map that stores words
+and counts
+
+This code will print `{"world": 2, "hello": 1, "wonderful": 1}`. You might see
+the same key/value pairs printed in a different order: recall from the
+“Accessing Values in a Hash Map” section that iterating over a hash map happens
+in an arbitrary order.
+
+The `split_whitespace` method returns an iterator over sub-slices, separated by
+whitespace, of the value in `text`. The `or_insert` method returns a mutable
+reference (`&mut V`) to the value for the specified key. Here we store that
+mutable reference in the `count` variable, so in order to assign to that value,
+we must first dereference `count` using the asterisk (`*`). The mutable
+reference goes out of scope at the end of the `for` loop, so all of these
+changes are safe and allowed by the borrowing rules.
+
+<!---
+Running the above gave me `{"world": 2, "wonderful": 1, "hello": 1}` so the key
+order may not be deterministic or may change based on changes to the hashing
+function in the std lib.
+/JT --->
+<!-- I've added a note that getting a different order is perfectly normal
+/Carol -->
+
+### Hashing Functions
+
+By default, `HashMap` uses a hashing function called *SipHash* that can provide
+resistance to Denial of Service (DoS) attacks involving hash tables. This is
+not the fastest hashing algorithm available, but the trade-off for better
+security that comes with the drop in performance is worth it. If you profile
+your code and find that the default hash function is too slow for your
+purposes, you can switch to another function by specifying a different hasher.
+A *hasher* is a type that implements the `BuildHasher` trait. We’ll talk about
+traits and how to implement them in Chapter 10. You don’t necessarily have to
+implement your own hasher from scratch; *https://crates.io/* has libraries
+shared by other Rust users that provide hashers implementing many common
+hashing algorithms.
+
+## Summary
+
+Vectors, strings, and hash maps will provide a large amount of functionality
+necessary in programs when you need to store, access, and modify data. Here are
+some exercises you should now be equipped to solve:
+
+* Given a list of integers, use a vector and return the median (when sorted,
+ the value in the middle position) and mode (the value that occurs most often;
+ a hash map will be helpful here) of the list.
+* Convert strings to pig latin. The first consonant of each word is moved to
+ the end of the word and “ay” is added, so “first” becomes “irst-fay.” Words
+ that start with a vowel have “hay” added to the end instead (“apple” becomes
+ “apple-hay”). Keep in mind the details about UTF-8 encoding!
+* Using a hash map and vectors, create a text interface to allow a user to add
+ employee names to a department in a company. For example, “Add Sally to
+ Engineering” or “Add Amir to Sales.” Then let the user retrieve a list of all
+ people in a department or all people in the company by department, sorted
+ alphabetically.
+
+The standard library API documentation describes methods that vectors, strings,
+and hash maps have that will be helpful for these exercises!
+
+We’re getting into more complex programs in which operations can fail, so, it’s
+a perfect time to discuss error handling. We’ll do that next!
+