Concurrency and mutability

Rust's approach to managing memory is a powerful concept. In fact, it is powerful enough to also facilitate concurrency and parallel execution. However, first things first: how do threads work in the Rust standard library?

Concurrency and parallelism are two different modes of execution. While concurrency means that parts of a program run independently of each other, parallelism refers to these parts executing at the same time. For simplicity, we will refer to both concepts as concurrency.

Due to its low-level nature, Rust provides an API to the operating system's threading capabilities (for example, POSIX on Linux/Unix systems). If no variables are passed into the thread's scope, using it is very straightforward:

use std::thread;

fn threading() {
    // The two pipes (||) are where parameters go, akin to a
    // function signature's parameters, without the need to
    // always declare types explicitly. This is also how
    // variables get captured from the outer scope.
    let handle = thread::spawn(|| {
        println!("Hello from a thread");
    });
    handle.join().unwrap();
}

However, when passing data back and forth, more work has to be done to hold up Rust's safety guarantees, especially when mutability comes into play. Before getting into that, it is important to recap immutability.

Immutable variables

Rust—like many functional languages—embraces immutable variables. They are the default, and making a variable mutable requires an explicit declaration with mut, which tells the compiler what the variable is going to be used for (reading or writing).

Functional programming languages are known for making concurrent work easier, thanks to their immutability guarantees: reading data does not produce side effects! Requiring explicit mutability gives the compiler a chance to check where and whether mutability is required, and therefore whether a data race may occur.

This results in compile-time warnings and errors instead of crashes and strange race conditions at runtime, something that many production users appreciate. In short, it's easier to think through your code if mutability is a (rare) option instead of the norm.
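As a minimal sketch (the function and variable names here are illustrative, not taken from the book's listings), an immutable binding rejects reassignment, while an explicit mut binding allows it:

fn mutability() {
    let x = 10;     // immutable by default
    // x = 11;      // would not compile: cannot assign twice to immutable variable
    let mut y = 10; // explicitly declared mutable
    y += 1;         // writing is fine; the compiler knows y is meant to change
    println!("x = {}, y = {}", x, y);
}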

Shadowing

Instead of changing variable properties, it's often more readable to overwrite a variable with a different value (for example, a changed copy of the original). This technique is called shadowing.

Typically, this is used to reuse a variable name even though the actual value has changed. The following snippet sanitizes a String and, by using the same name throughout the function, makes it clear at every step that it is the input parameter that is being transformed:

fn sanitize(s: String) -> String {
    let s = s.trim();            // shadow s with a whitespace-trimmed slice
    let s = s.replace(" ", "_"); // shadow again with an owned, underscore-separated String
    s
}

While this is akin to changing the value of a variable, shadowing does not replace mutability, especially when it's less costly to actually change properties of that variable; Rust has a specific design pattern for that!

Interior mutability

Can a variable be immutable and mutable at the same time? Of course. Boxed variables (Box, Rc, and so on) are immutable handles to the heap, which is where the actual value lives.

For these kinds of containers, there is no reason why the inner variable cannot be changed—a task that can be done safely in Rust using RefCell. RefCell maintains single ownership of a value but allows mutable borrowing checked at runtime. Instead of compiler errors, violating the rules of borrowing will lead to a runtime panic!, crashing the program.

This entire concept is called interior mutability and is often used in combination with Rc in order to provide a value to multiple owners, with mutability at will. Clearly, to provide a great user experience, it is strongly recommended to design the code so that the borrowing rules cannot be violated at runtime.

Wrapping a RefCell in an Rc acts as the gatekeeper for having multiple owners, including a way to change the contents. This is actually similar to more traditional programming languages such as Java or C#, where typically references are moved between method calls, pointing to the object's instance on the heap memory.
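As a brief illustration of the pattern (this snippet is a sketch, not one of the book's listings), two Rc handles to the same RefCell can both mutate the shared value, even though neither handle is itself declared mutable:

use std::cell::RefCell;
use std::rc::Rc;

fn interior_mutability() {
    // one value, two owners; the Rc handles are immutable bindings
    let shared = Rc::new(RefCell::new(vec![1, 2, 3]));
    let also_shared = Rc::clone(&shared);

    // borrow_mut() hands out a mutable reference, checked at runtime
    also_shared.borrow_mut().push(4);
    shared.borrow_mut().push(5);

    println!("{:?}", shared.borrow()); // [1, 2, 3, 4, 5]
}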

This pattern is very important for implementing complex programs and data structures, since ownership of a specific variable is not always clear. For example, later in the book we will examine doubly linked lists, which famously have a pointer to the preceding and succeeding node. Which node should have ownership of which pointer? Interior mutability allows us to say both. Consider the node declaration we will use later:

use std::cell::RefCell;
use std::rc::Rc;

#[derive(Clone)]
struct Node {
    value: String,
    next: Link,
    prev: Link,
}

type Link = Option<Rc<RefCell<Node>>>;

With this list declaration, we can see the pattern in this simpler version of the append function:

pub fn append(&mut self, value: String) {
    let new = Rc::new(RefCell::new(Node::new(value)));
    match self.tail.take() {
        Some(old) => {
            // the old tail points forward to the new node ...
            old.borrow_mut().next = Some(new.clone());
            // ... and the new node points back to the old tail
            new.borrow_mut().prev = Some(old);
        }
        None => self.head = Some(new.clone()),
    };
    // the new node becomes the list's tail
    self.tail = Some(new);
}

This code adds a new node at the end (tail) of the list, which contains all of its data in the form of nodes stored on the heap. In order to append a node, the references have to be set properly, so the previous and next pointers actually refer to the same nodes instead of copies. A more detailed exploration is going to be covered in Chapter 3, Lists, Lists, and More Lists. For now, the important part is setting the variables using borrow_mut(). This mutable reference only lives as long as the assignment takes, which rules out a scope that is too large and would violate the borrowing rules.

RefCell's borrow_mut() function checks for and enforces the borrowing rules at runtime, panicking in the case of a violation. Later on, we will also talk about the Mutex type, which is essentially a multithreaded version of these cells.
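To see what such a violation looks like, consider this hypothetical snippet (it is not part of the list implementation): keeping two mutable borrows of the same RefCell alive at once compiles fine, but panics as soon as the second borrow_mut() runs:

use std::cell::RefCell;

fn borrow_violation() {
    let cell = RefCell::new(0);
    let _first = cell.borrow_mut();  // the first mutable borrow stays alive in a variable...
    let _second = cell.borrow_mut(); // ...so this panics with a BorrowMutError at runtime
}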

Moving data

The introductory snippet showed code that spawns a thread but did not pass any data into the scope. Just like any other scope, it requires either ownership of a value or at least a borrowed reference in order to work with that data. In this case, passing ownership is what we want, something that is called moving data into the scope.

If we change the snippet from the introduction to include a simple variable to print from within the thread, compilation is going to fail:

use std::thread;

fn threading() {
    let x = 10;
    let handle = thread::spawn(|| {
        println!("Hello from a thread, the number is {}", x);
    });
    handle.join().unwrap();
}

The reason for this is simple: the compiler cannot determine the lifetimes of each of the scopes (will x still be there when the thread needs it?), so it refuses to compile the code:

Compiling ch1 v0.1.0 (file:///code/ch1) 
error[E0373]: closure may outlive the current function, but it borrows `x`, which is owned by the current function
--> src/main.rs:5:32
|
5 | let handle = thread::spawn(|| {
| ^^ may outlive borrowed value `x`
6 | println!("Hello from a thread, the number is {}", x);
| - `x` is borrowed here
help: to force the closure to take ownership of `x` (and any other referenced variables), use the `move` keyword
|
5 | let handle = thread::spawn(move || {
| ^^^^^^^

As the compiler message indicates, adding the move keyword will solve the issue! It makes the closure take ownership of the variables it captures, "moving" them into the thread:

fn threading() {
    let x = 10;
    let handle = thread::spawn(move || {
        println!("Hello from a thread, the number is {}", x);
    });
    handle.join().unwrap();
}

When running this snippet, the output is as follows:

Hello from a thread, the number is 10

However, for passing multiple messages into a thread or implementing an actor model, the Rust standard library offers channels. Channels are multi-producer, single-consumer (mpsc) queues that let many threads send messages to a single receiver.

This snippet will spawn 10 threads and have each send a number into the channel, where it will be collected into a vector after the senders have finished executing:

use std::sync::mpsc::{channel, Sender, Receiver};
use std::thread;

fn channels() {
    const N: i32 = 10;
    let (tx, rx): (Sender<i32>, Receiver<i32>) = channel();
    let handles = (0..N).map(|i| {
        // each thread gets its own clone of the sending end
        let _tx = tx.clone();
        thread::spawn(move || {
            _tx.send(i).unwrap();
        })
    });
    // wait for all threads to finish
    for h in handles {
        h.join().unwrap();
    }
    // receive N times
    let numbers: Vec<i32> = (0..N).map(|_| rx.recv().unwrap()).collect();

    println!("{:?}", numbers);
}

As expected, the output is as follows:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

With these tools, a multithreaded application can move data between threads without the need for manual locking or the dangers of inadvertently creating side effects.

Sharing data

Other than sending data into threads one way, many programs operate on a shared state where multiple execution streams have to access and change one or more shared variables. Typically, this warrants a mutex (short for mutual exclusion), so that whenever the data behind the locked mutex is accessed, it is guaranteed that only a single thread can do so at a time.

This is an old concept that is implemented in the Rust standard library. How does it facilitate accessing a variable? Wrapping a variable in a Mutex type provides the locking mechanism, thereby making it safely accessible from multiple concurrent writers. However, those writers don't have ownership of that memory area yet.

In order to provide that ownership across threads—similar to what Rc does within a single thread—Rust provides the concept of an Arc, an atomic reference counter. With a Mutex inside, it's the thread-safe equivalent of an Rc wrapping a RefCell: a reference counter that wraps a mutable container. To provide an example, this works nicely:

use std::thread;
use std::sync::{Mutex, Arc};

fn shared_state() {
    let v = Arc::new(Mutex::new(vec![]));
    let handles = (0..10).map(|i| {
        let numbers = Arc::clone(&v);
        thread::spawn(move || {
            let mut vector = numbers.lock().unwrap();
            (*vector).push(i);
        })
    });

    for handle in handles {
        handle.join().unwrap();
    }
    println!("{:?}", *v.lock().unwrap());
}

When running this example, the output is this:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 

While the preferred way of doing concurrent programming is still to use immutable variables as often as possible, safe Rust provides the tools for working with shared data without side effects.

Send and Sync

These marker traits are fundamental to Rust's multithreading policies. They have distinct purposes:

  • Send: A data type is safe to send (move) from one thread to another
  • Sync: The data type can be shared across threads without manual locks or mutex areas

These marker traits are implemented for all basic types of the standard library and are derived automatically for custom types (if all fields of a type are Send and Sync, then the type itself is too).
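As a small sketch of that rule (the type and the helper function are made up for illustration), a struct whose fields are all Send and Sync is itself Send and Sync, which a generic trait bound can verify at compile time:

// compiles only if T implements both Send and Sync
fn assert_send_sync<T: Send + Sync>() {}

struct Config {
    name: String, // String is Send + Sync
    retries: u32, // u32 is Send + Sync
}

fn markers() {
    assert_send_sync::<Config>();
    // assert_send_sync::<std::rc::Rc<u32>>(); // would not compile: Rc is neither Send nor Sync
}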

Implementing Send or Sync manually is unsafe because the compiler has no way of knowing whether you are right and the type can really be shared/sent between threads, which is why it's very unusual to do this.

In case your program requires this depth of Rust programming, be sure to read up on this topic in the Rust Book, chapter 16 (https://doc.rust-lang.org/1.31.0/book/ch16-04-extensible-concurrency-sync-and-send.html).