Book Image

Hands-On Data Structures and Algorithms with Rust

By : Claus Matzinger
Book Image

Hands-On Data Structures and Algorithms with Rust

By: Claus Matzinger

Overview of this book

Rust has come a long way and is now utilized in several contexts. Its key strengths are its software infrastructure and resource-constrained applications, including desktop applications, servers, and performance-critical applications, not forgetting its importance in systems' programming. This book will be your guide as it takes you through implementing classic data structures and algorithms in Rust, helping you to get up and running as a confident Rust programmer. The book begins with an introduction to Rust data structures and algorithms, while also covering essential language constructs. You will learn how to store data using linked lists, arrays, stacks, and queues. You will also learn how to implement sorting and searching algorithms. You will learn how to attain high performance by implementing algorithms to string data types and implement hash structures in algorithm design. The book will examine algorithm analysis, including Brute Force algorithms, Greedy algorithms, Divide and Conquer algorithms, Dynamic Programming, and Backtracking. By the end of the book, you will have learned how to build components that are easy to understand, debug, and use in different applications.
Table of Contents (15 chapters)

Borrowing and ownership

Rust is famous for its memory management model, which replaces runtime garbage collection with compile-time checks for memory safety. The reason why Rust can work without a garbage collector and still free the programmer from error-prone memory management is simple (but not easy): borrowing and ownership.

While the particulars are quite complex, the high-level view is that the compiler inserts any "provide x amounts of memory" and "remove x amounts of memory" (somewhat like malloc() and free() for C programmers) statements for the developer. Yet how can it do that?

The rules of ownership are as follows:
  • The owner of a value is a variable
  • At any time, only a single owner is allowed
  • The value is lost once the owner goes out of scope

This is where Rust's declarative syntax comes into play. By declaring a variable, the compiler knows—at compile time—that a certain amount of memory needs to be reserved. The lifetime is clearly defined too, from the beginning to end of a block or function, or as long as the struct instance lives. If the size of this variable is known at compile time, the compiler can provide exactly the necessary amount of memory to the function for the time required. To illustrate, let's consider this snippet, where two variables are allocated and removed in a deterministic order:

fn my_func() {
// the compiler allocates memory for x
let x = LargeObject::new();
x.do_some_computation();
// allocate memory for y
let y = call_another_func();
if y > 10 {
do_more_things();
}
} // deallocate (drop) x, y

Is this not what every other compiler does? The answer is yes—and no. At compile time, the "provide x amounts of memory" part is fairly simple; the tricky part is keeping track of how much is still in use when references can be passed around freely. If, during the course of a function, a particular local reference becomes invalid, a static code analysis will tell the compiler about the lifetime of the value behind the reference. However, what if a thread changes that value at an unknown time during the function's execution?

At compile time, this is impossible to know, which is why many languages do these checks at runtime using a garbage collector. Rust forgoes this, with two primary strategies:

  • Every variable is owned by exactly one scope at any time
  • Therefore, the developer is forced to pass ownership as required

Especially when working with scopes, the nature of stack variables comes in handy. There are two areas of memory, stack and heap, and, similar to other languages, the developer uses types to decide whether to allocate heap (Box, Rc, and so on) or stack memory.

Stack memory is usually short-lived and smaller, and operates in a first-in, last-out manner. Consequently, a variable's size has to be known before it is put on the stack:

Heap memory is different; it's a large portion of the memory, which makes it easy to allocate more whenever needed. There is no ordering, and memory is accessed by using an addresses. Since the pointer to an address on the heap has a known size at compile time, it fits nicely on the stack:

Stack variables are typically passed by value in other languages, which means that the entire value is copied and placed into the stack frame of the function. Rust does the same, but it also invalidates further use of that variable in the—now parent—scope. Ownership moves into the new scope and can only be transferred back as a return value. When trying to compile this snippet, the compiler will complain:

fn my_function() {
let x = 10;
do_something(x); // ownership is moved here
let y = x; // x is now invalid!
}

Borrowing is similar but, instead of copying the entire value, a reference to the original value is moved into the new scope. Just like in real life, the value continues to be owned by the original scope; scopes with a reference are just allowed to use it as it was provided. Of course, this comes with drawbacks for mutability, and some functions will require ownership for technical and semantic reasons, but it also has advantages such as a smaller memory footprint.

These are the rules of borrowing:
  • Owners can have immutable or mutable references, but not both
  • There can be multiple immutable references, but only one mutable reference
  • References cannot be invalid

By changing the previous snippet to borrow the variable to do_something() (assuming this is allowed, of course), the compiler will be happy:

fn my_function() {
let x = 10;
do_something(&x); // pass a reference to x
let y = x; // x is still valid!
}

Borrowed variables rely heavily on lifetimes. The most basic lifetime is the scope it was created in. However, if a reference should go into a struct field, how can the compiler know that the underlying value has not been invalidated? The answer is explicit lifetimes!

Exceptional lifetimes

Some lifetimes are different and Rust denominates them with a '. While this could be the predefined 'static, it's equally possible to create your own, something that is often required when working with structures.

This makes sense when thinking about the underlying memory structure: if an input parameter is passed into the function and returned at the end, its lifetime surpasses the function's. While the function owns this part of the memory during its lifetime, it cannot borrow a variable for longer than it actually exists. So, this snippet cannot work:

fn another_function(mut passing_through: MyStruct) -> MyStruct {
let x = vec![1, 2, 3];

// passing_through cannot hold a reference
// to a shorter lived x!
// the compiler will complain.
passing_through.x = &x;

return passing_through;
} // x's life ends here

The reason is that the passing_through variable outlives x. There are several solutions to this problem:

  • Change the type definition of MyStruct to require ownership. This way, the structure now owns the variable and it will live as long as the structure:
fn another_function(mut passing_through: MyStruct) -> MyStruct {
let x = vec![1, 2, 3];

// passing_through owns x and it will be
// dropped together with passing_through.
passing_through.x = x;

return passing_through;
}
  • Clone x to pass ownership into passing_through:
fn another_function(mut passing_through: MyStruct) -> MyStruct {
let x = vec![1, 2, 3];
let y = &x;

// passing_through owns a deep copy of x'value that is be
// dropped together with passing_through.
passing_through.x = y.clone();

return passing_through;
}
  • In this case, vec![] is statically defined, so it could make sense to add it as a function parameter. This is not only more allocation-efficient, but also can enforce an appropriate lifetime:
fn another_function<'a>(mut passing_through: MyStruct<'a>, x: &'a Vec<u32>) -> MyStruct<'a> {

// The compiler knows and expects a lifetime that is
// at least as long as the struct's
// of any reference passed in as x.
passing_through.x = x;

return passing_through;
}

Lifetimes cause a lot of strange errors for many Rust users, and in the 2018 edition there is one less to worry about. With the introduction of non-lexical lifetimes, the borrow checker got a lot smarter and it is now able to check—up to a certain degree—semantically whether the variable was used. Recall from the rules of borrowing that, if a mutable reference is created, no immutable references can exist.

This code did not compile before Rust 1.31:

fn main() {     
let mut a = 42;
let b = &a; // borrow a
let c = &mut a; // borrow a again, mutably
// ... but don't ever use b
}

Now it will compile since the compiler does not just check the beginning and ending of a scope, but also if the reference was used at all.

Multiple owners

As powerful as single ownership is, it does not work for every use case. Large objects or shared objects that other instances need to own are examples where immutable ownership makes life easier. Consider a function that requires an owned object to be passed in:

#[derive(Debug)]
struct FileName {
name: String,
ext: String
}

fn no_ref_counter() {
let name = String::from("main");
let ext = String::from("rs");

for _ in 0..3 {
println!("{;?}",
FileName {
name: name,
ext: ext
});

}
}

When trying to compile no_ref_counter(), the compiler creates a scope for each iteration of the loop and owns any value that is used within it. This works exactly once, since afterward, the variable has been moved and is inaccessible for subsequent iterations.

Consequently, these values (in this case, name and ext) are gone and compilation will yield two errors, one for each "second" move of a string:

error[E0382]: use of moved value: `name`
--> src/main.rs:63:33
|
63 | let _ = FileName { name: name, ext: ext };
| ^^^^ value moved here in previous iteration of loop
|
= note: move occurs because `name` has type `std::string::String`, which does not implement the `Copy` trait

error[E0382]: use of moved value: `ext`
--> src/main.rs:63:44
|
63 | let _ = FileName { name: name, ext: ext };
| ^^^ value moved here in previous iteration of loop
|
= note: move occurs because `ext` has type `std::string::String`, which does not implement the `Copy` trait

One solution is to clone the object in every iteration, but that causes a lot of slow memory allocations. For this, the Rust standard library provides a solution: reference counting.

A reference counter (std::rc::Rc<T>) encapsulates a variable of type T allocated on the heap and returns an immutable reference when created. This reference can be cloned with low overhead (it's only a reference count that is incremented) but never transformed into a mutable reference. Regardless, it acts just like owned data, passing through function calls and property lookups.

While this requires a change to the variable types, a call to clone() is now far cheaper than cloning the data directly:

use std::rc::Rc;

#[derive(Debug)]
struct FileName {
name: Rc<String>,
ext: Rc<String>
}

fn ref_counter() {
let name = Rc::new(String::from("main"));
let ext = Rc::new(String::from("rs")));

for _ in 0..3 {
println!("{;?}",
FileName {
name: name.clone(),
ext: ext.clone()
});

}
}

Running this snippet prints the debug version of the FileName object three times:

FileName { name: "main", ext: "rs" }
FileName { name: "main", ext: "rs" }
FileName { name: "main", ext: "rs" }

This approach works great for single-threaded and immutable scenarios, but will refuse to compile multithreaded code. The solution to this will be discussed in the next section.