Andrea Lattuada Blog

A hammer you can only hold by the handle

05 November 2018

This post was cross-posted to the ETH Zürich Systems Group Blog.


Today we’re looking at the rust borrow checker from a different perspective. As you may know, the borrow checker is designed to safely handle memory allocation and ownership, preventing accessess to invalid memory and ensuring data-race freedom. This is a form of resource management: the borrow checker is tracking who’s in charge of a chunk of memory, and who is currently allowed to read or write to it. In this post, we’ll see how these facilities can be used to enforce higher-level API constraints in your libraries and software. Once you’re familiar with these techniques, we’ll cover how the same principles apply to advanced memory management and handling of other more abstract resources.

This is an extended blog post version of my RustFest Zürich talk (it was recorded). That’s why you’ll find “rustfest” in some of the code examples.

Affine type systems

First, a refresher on affine types. Affine type systems, like Rust’s, only allow a variable to be used once (if it’s not a reference). This is at the core of the ownership semantics, and it’s a significant departure from other mainstream languages (think of using a variable multiple times in C). Here’s an example:

1
2
3
4
5
6
7
8
fn use_name(name: String) { }

fn main() {
    let name = String::from("Andrea");
    use_name(name);

    println!("{}", name);
}

Note that use_name takes name’s ownership (and it’s not pass-by-reference) so name is moved on line 5, and it cannot be used again on line 7. Here’s the compiler output:

error[E0382]: use of moved value: `name`
 --> affine.rs:7:18
  |
5 |   use_name(name);
  |            ---- value moved here
6 | 
7 |   println!("{}", name);
  |                  ^^^^ value used here after move
  |
  = note: move occurs because `name` has type `std::string::String`,
          which does not implement the `Copy` trait

error: aborting due to previous error

For more information about this error, try `rustc --explain E0382`.

If you’d like a more in-depth explanation on ownership (Rust’s lingo for its affine type system), you can take a look at the relevant chapter in the Rust book.

Drop

Now we know what happens if we try to transfer ownership of a variable more than once, but what if we never use it inside a scope?

First of all, Rust is lexically scoped, so variable names are only valid within the scope where they’re defined.

1
2
3
4
5
6
7
8
9
10
11
12
13
struct Thing {
    number: u32,
}

fn main() {
    let a = 4;

    if a > 3 {
        let thing = Thing { number: a };
    } // `thing` dropped here

    println!("{}", thing);
}
error[E0425]: cannot find value `thing` in this scope
  --> drop.rs:12:18
   |
12 |   println!("{}", thing);
   |                  ^^^^^ not found in this scope

error: aborting due to previous error

For more information about this error, try `rustc --explain E0425`.

No thing there, it went out of scope on line 10. Importantly, thing goes out of scope before its ownership is transfered, so Rust drops it: the compiler inserts code to clean up all resources associated with thing and frees its memory. We can hook into this mechanism by providing an implementation of the special Drop trait:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
impl Drop for Thing {
    fn drop(&mut self) {
        eprintln!("dropping thing {}", self.number);
    }
}

fn main() {
    let a = 4;

    if a > 3 {
        let thing = Thing { number: a };
        eprintln!("inside scope");
    } // `thing` dropped here
    eprintln!("outside scope");
}

If we run this we get:

inside scope
dropping thing 4
outside scope

Again, more details are in the book.

Managing resources

We’re going to try to encode higher level API constraints using the linear typing (ownership) semantics of Rust.

envelope, letter, lorry

The interaction we’re going to describe is pretty simple: sending a letter via a delivery service. One has a written letter they’d like to send: they put it in a pre-stamped envelope, they close the envelope and they hand it to the lorry driver. Of course, all of this applies to many APIs: we’ll see a couple of examples at the end.

Here’s one way to model our protocol in Rust:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#[derive(Clone)]
pub struct Letter {
    text: String,
}

pub struct Envelope {
    letter: Option<Letter>,
}

pub struct PickupLorryHandle {
    done: bool,
    // references to lorry's resources
}

impl Letter {
    pub fn new(text: String) -> Self { Letter { text: text } }
}

impl Envelope {
    /// Put a letter in the envelope and seal it.
    pub fn wrap(&mut self, letter: &Letter) {
        self.letter = Some(letter.clone());
    }
}

impl PickupLorryHandle {
    /// Give an envelope to the delivery driver.
    pub fn pickup(&mut self, envelope: &Envelope) {
        // (the details here don't matter)
    }

    /// Tell the driver we don't have anything else for them.
    pub fn done(&mut self) {
        self.done = true; println!("sent");
    }
}

pub fn order_pickup() -> PickupLorryHandle {
    // ...
}

With this in place we can write our client code:

1
2
3
4
5
6
7
8
9
// in a separate module
fn main() {
    let rustfest_letter = Letter::new(String::from("Dear RustFest"));
    let mut rustfest_envelope = buy_prestamped_envelope();
    rustfest_envelope.wrap(&rustfest_letter);
    let mut lorry = order_pickup();
    lorry.pickup(&rustfest_envelope);
    lorry.done();
}

Our client code:

1, 2, 3

Our API has three shortcomings we can better address with Rust’s type system:

  1. we’d like to prevent re-use of what we know it’s a finite resource: we only have one physical copy of the letter;
    letter duplicate

  2. we want to make sure that we perform a series of steps in the right order (and only once): put the letter in the envelope, seal it, and give it to the driver (i.e. avoid giving an empty envelope);
    letter, envelope, lorry

  3. we don’t want to forget to release a resource when we’re done: we ensure we tell the driver they can leave.
    lorry question

Use a resource only once

letter duplicate

Here’s some problematic client code:

1
2
3
4
5
6
7
8
9
10
11
fn main() {
    let rustfest_letter = Letter::new(String::from("Dear RustFest"));
    let mut envelopes = vec![
        buy_prestamped_envelope(), buy_prestamped_envelope()];
    let mut lorry = order_pickup();
    for e in envelopes.iter_mut() {
        e.wrap(&rustfest_letter);
        lorry.pickup(&e);
    }
    lorry.done();
}

No compiler error, but somehow the letter was magically duplicated and inserted in both envelopes. Sometimes, this is perfectly fine (copying some memory isn’t a big deal), but sometimes the resource represented by our struct cannot be easily duplicated: in this example, if it’s representing a constraint in our business logic. In general, if our struct represents a handle to a resource out of our control, we may not be able to clone it without breaking some safety or correctness guarantees.

So let’s remove the Clone implementation for Letter, and adjust the client code:

1
2
3
4
5
6
7
8
9
10
#[derive(Clone)]
pub struct Letter {
    text: String,
}

impl Envelope {
    pub fn wrap(&mut self, letter: Letter) { // take ownership of `letter`
        self.letter = Some(letter);
    }
}
1
2
3
4
5
6
7
8
9
10
11
fn main() {
    let rustfest_letter = Letter::new(String::from("Dear RustFest"));
    let mut envelopes = vec![
        buy_prestamped_envelope(), buy_prestamped_envelope()];
    let mut lorry = order_pickup();
    for e in envelopes.iter_mut() {
        e.wrap(rustfest_letter); // give ownership of `rustfest_letter`
        lorry.pickup(&e);
    }
    lorry.done();
}

Here’s the compiler output:

error[E0382]: use of moved value: `rustfest_letter`
  --> letter1.rs:7:16
   |
 7 |         e.wrap(rustfest_letter);
   |                ^^^^^^^^^^^^^^^ value moved here in previous iteration of loop
   |
   = note: move occurs because `rustfest_letter` has type `Letter`,
           which does not implement the `Copy` trait

error: aborting due to previous error

For more information about this error, try `rustc --explain E0382`.

Great! We can now only use each letter once!

Enforce order

letter duplicate

One issue down, two to go. We’d like to make sure that the steps of the protocol are carried out in the proper order: we must not forget to put the letter in the envelope before handing it to the lorry driver! And, can we prevent inserting two letters in the same envelope at compile time?

Here’s the broken client code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
fn main() {
    let rustfest_letter = Letter::new(String::from("Dear RustFest"));
    let mut first_envelope = buy_prestamped_envelope();
    first_envelope.wrap(rustfest_letter);

    let eth_letter = Letter::new(String::from("Dear ETH"));
    first_envelope.wrap(eth_letter);

    let mut lorry = order_pickup();
    lorry.pickup(&first_envelope);

    let another_envelope = buy_prestamped_envelope();
    lorry.pickup(&another_envelope);

    lorry.done();
}
1
2
3
4
5
6
impl Envelope {
    pub fn wrap(&mut self, letter: &Letter) {
        assert!(self.letter.is_none());
        self.letter = Some(letter.clone());
    }
}

This compiles, but of course the assert in wrap will fire, at runtime.

thread 'main' panicked at 'assertion failed: self.letter.is_none()'
note: Run with `RUST_BACKTRACE=1` for a backtrace.

We’d like to prevent this at compile time. And once that’s fixed, what about making sure we don’t send empty envelopes (another_envelope on lines 12-13 of main)?

We can make a couple of classes to represent the current state of the Envelope: EmptyEnvelope is an empty pre-stampted envelope, and ClosedEnvelope is a closed envelope guaranteed to contain a letter. We can then only provide implementations for actions that make sense for that specific state. Then, to make sure we don’t send an empty envelope, we make sure that pickup only takes a ClosedEnvelope (and we make it take ownership, to avoid spurious copies).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
/// An empty pre-stamped envelope.
pub struct EmptyEnvelope { }

/// A closed envelope containing a letter.
pub struct ClosedEnvelope {
    letter: Letter,
}

impl EmptyEnvelope {
    /// Put a letter in the envelope and seal it.
    pub fn wrap(self, letter: Letter) -> ClosedEnvelope {
        ClosedEnvelope { letter: letter }
    }
}

impl PickupLorryHandle {
    /// Give an envelope to the delivery driver.
    pub fn pickup(&mut self, envelope: ClosedEnvelope) {
        /* give letter */
    }

    ...
}

pub fn buy_prestamped_envelope() -> EmptyEnvelope { EmptyEnvelope { } }

And here’s the updated client code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
fn main() {
    let rustfest_letter = Letter::new(String::from("Dear RustFest"));
    let mut first_envelope = buy_prestamped_envelope();
    first_envelope.wrap(rustfest_letter);

    let eth_letter = Letter::new(String::from("Dear ETH"));
    first_envelope.wrap(eth_letter);

    let mut lorry = order_pickup();
    lorry.pickup(first_envelope);

    let another_envelope = buy_prestamped_envelope();
    lorry.pickup(another_envelope);

    lorry.done();
}
error[E0308]: mismatched types
  --> letter2.rs:10:18
   |
10 |     lorry.pickup(first_envelope);
   |                  ^^^^^^^^^^^^^^ expected struct `ClosedEnvelope`,
   |                                 found struct `EmptyEnvelope`
   |
   = note: expected type `ClosedEnvelope`
              found type `EmptyEnvelope`

error[E0308]: mismatched types
  --> letter2.rs:13:18
   |
13 |     lorry.pickup(another_envelope);
   |                  ^^^^^^^^^^^^^^^^ expected struct `ClosedEnvelope`,
   |                                      found struct `EmptyEnvelope`
   |
   = note: expected type `ClosedEnvelope`
              found type `EmptyEnvelope`

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0308`.

One problem prevented: no empty envelopes can be sent! Note how the compiler errors point us towards a solution: we need a ClosedEnvelope for the lorry to pickup. Let’s fix the client code again.

1
2
3
4
5
6
7
8
9
10
11
12
13
fn main() {
    let rustfest_letter = Letter::new(String::from("Dear RustFest"));

    let envelope = buy_prestamped_envelope();
    let closed_envelope = envelope.wrap(rustfest_letter);

    let eth_letter = Letter::new(String::from("Dear ETH"));
    let closed_envelope = envelope.wrap(eth_letter);

    let mut lorry = order_pickup();
    lorry.pickup(closed_envelope);
    lorry.done();
}
error[E0382]: use of moved value: `envelope`
  --> letter2.rs:8:27
   |
 5 |     let closed_envelope = envelope.wrap(rustfest_letter);
   |                           -------- value moved here
...
 8 |     let closed_envelope = envelope.wrap(eth_letter);
   |                           ^^^^^^^^ value used here after move
   |
   = note: move occurs because `envelope` has type `EmptyEnvelope`,
           which does not implement the `Copy` trait

error: aborting due to previous error

For more information about this error, try `rustc --explain E0382`.

By making EmptyEnvelope take self’s ownership we can use the linear typing technique from earlier to prevent reuse of envelope: once we’ve put a letter in an envelope, we get back a ClosedEnvelope, that can only be handed over to the lorry driver. Now the compiler can help us make sure we follow the protocol steps in order: put the letter in the envelope, then send it.

We’ll see a more complex example in which we compose an http response in the right order using this technique. For now, here’s the correct client code with the new API:

1
2
3
4
5
6
7
8
9
fn main() {
    let rustfest_letter = Letter::new(String::from("Dear RustFest"));
    let mut envelope = buy_prestamped_envelope();
    let closed_envelope = envelope.wrap(rustfest_letter);

    let mut lorry = order_pickup();
    lorry.pickup(closed_envelope);
    lorry.done();
}

Ensure a resource is released

lorry question

Another common mistake: forgetting to release a resource. In the following client code, lorry.done() is missing, and we never tell the delivery driver we’re done. And, in this tortured methaphor, we’d never deliver the letter because the driver never leaves…

1
2
3
4
5
6
7
8
fn main() {
    let rustfest_letter = Letter::new(String::from("Dear RustFest"));
    let mut envelope = buy_prestamped_envelope();
    let closed_envelope = envelope.wrap(rustfest_letter);

    let mut lorry = order_pickup();
    lorry.pickup(closed_envelope);
}

In the real world, we may be keeping a connection open, or never completing a process in our business logic; and it’s just because we forgot a single line.

We’ve seen that we can hook into Rust’s drop mechanism; here’s how we’d ensure that we release the lorry:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
impl Drop for PickupLorryHandle {
    fn drop(&mut self) {
        if !self.done {
            self.done();
        }
    }
}

impl PickupLorryHandle {
    /// Tell the driver we don't have anything else for them.
    pub fn done(self) {
        self.done = true; println!("sent");
    }

    ...
}
1
2
3
4
5
6
7
8
9
10
11
fn main() {
    let rustfest_letter = Letter::new(String::from("Dear RustFest"));

    let envelope = buy_prestamped_envelope();
    let closed_envelope = envelope.wrap(rustfest_letter);

    let mut lorry = order_pickup();
    lorry.pickup(closed_envelope);

    // `lorry` dropped here
}

Now when the lorry goes out of scope the drop implementation is called, and we automatically send the driver on their merry way.

In the std library

This last pattern is often used in the std library when building abstractions where there’s a need to release a resource or a handle to a resource.

Rc, “a single-threaded reference-counting pointer”, tracks the number of references to a resource, and the ownership of the Rc pointer represents a strong reference: when no references are left (all Rcs are dropped), the underlying resource (memory and other elements of the struct) should be freed.

There’s some unsafe trickery under the hood, but what’s relevant here is that Rc uses Drop to release the underlying memory once the reference-count reaches zero.

Here’s the relevant code: std::rc::Rc.

unsafe impl<#[may_dangle] T: ?Sized> Drop for Rc<T> {
    // ...
    fn drop(&mut self) {
        unsafe {
            self.dec_strong();
            if self.strong() == 0 {
                // destroy the contained object
                ptr::drop_in_place(self.ptr.as_mut());
                // ...
            }
        }
    }
}

Let’s look at another example in std: Mutex, “a mutual exclusion primitive useful for protecting shared data”.

Locking a Mutex returns a MutexGuard, “an RAII implementation of a scoped lock of a mutex”, that is, an object that represents the fact that we’re holding the lock. And Drop is used so we can release the lock just by dropping the guard (either explicitly, with std::mem::drop, or when it goes out of scope).

This is the signature of Mutex::lock:

pub fn lock(&self) -> LockResult<MutexGuard<T>>

And here’s the Drop implementation for MutexGuard: std::sync::MutexGuard.

#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, T: ?Sized> Drop for MutexGuard<'a, T> {
    #[inline]
    fn drop(&mut self) {
        unsafe {
            self.__lock.poison.done(&self.__poison);
            self.__lock.inner.raw_unlock();
        }
    }
}

Example: http response

Now let’s look at an example of how all of these techniques can be combined in a realistic API. Let’s say we’re building an http server library, and we want to make sure that when we write our response we first send all the headers, and only then start writing the body. We may also want to make sure not to forget to flush the buffer at the end.

We use two structs to represent the two states: HttpResponseWritingHeader, and HttpResponseWritingBody. The method body on HttpResponseWritingHeader takes ownership of self, ensuring that the header writer can no longer be used after this call (its ownership is transfered), and we can only append additional chunks to the response body.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
pub struct HttpResponseWritingHeaders { /* connection, … */ }

pub struct HttpResponseWritingBody { /* ... */ }

pub fn start_response() -> HttpResponseWritingHeaders { /* ... */ }

impl HttpResponseWritingHeaders {
    fn header(&mut self, header: Header) { /* ... */ }

    fn body(self) -> HttpResponseWritingBody { /* ... */ }
}

impl HttpResponseWritingBody {
    fn write(&mut self, chunk: Chunk) { /* ... */ }

    fn cease(self) { }
}

impl Drop for HttpResponseWritingBody {
    fn drop(&mut self) {
        self.flush();
    }
}

The Drop implementation ensures that a completed response is always fully written out to the client.

State explosion

Note that the technique of representing the state of a protocol/object with many structs has a significant drawback when the possible state space grows large: we may end up juggling a lot of structs, and the benefit of compile time checks may not be worth the cost in lines-of-code, and maintainability.

Example: streaming engine

At the Systems Group of the ETH Zürich CS department, we’re working on timely dataflow, a low-latency cyclic dataflow computational model: it lets you describe computations as a cyclic graph of nodes (operators), that perform some transformation on incoming data and maintain local state, and edges (channels) that carry data between operators. A computation built this way can be automatically parallelised across cores and computers, over the network.

In timely dataflow, data is transported on channels as tuples, each carrying a logical timestamp.

lorry-time

Timestamps are used to represent logical boundaries between groups of tuples so that operations can be performed on some subset, and timely dataflow tracks which timestamps are in flight in the system. Importantly, for timely dataflow’s correctness, operators are only allowed to send messages with timestamp t when they hold a Capability for the same timestamp t. In addition, they need to report whenever they relinquish one of these capabilities, so that the system can make forward progress (that is, a downstream operator can then process all the tuples associated with the capability’s timestamp).

In timely, we use something like the following to represent a Capability, a resource which grants permission to send data at a certain timestamp.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
/// The capability to send data with a certain timestamp on a dataflow edge.
pub struct Capability<T: Timestamp> {
    time: T,
    internal: Rc<RefCell<ChangeBatch<T>>>,
}

impl<T: Timestamp> Clone for Capability<T> {
    fn clone(&self) -> Capability<T> {
        // … update self.internal …
    }
}

impl<T: Timestamp> Drop for Capability<T> {
    fn drop(&mut self) {
        // … update self.internal …
    }
}

/// Return a channel handle to send data at the timestamp carried by `cap`.
pub fn session(&mut self, cap: &Capability<T>) -> Handle

Capability has a custom Clone implementation, that keeps track of how many copies we’ve made (a bit like Rc), and a Drop implementation, so we can be sure that a Capability is dropped if not explicitly retained. The function that an operator uses to write to its output requires a reference to a valid Capability for a certain timestamp: this way we enforce, at compile time, one of the protocol constraints of timely dataflow.

Thanks for reading! You can send me notes and questions on twitter. There’s a lobste.rs and a reddit post for comments.

Written by Andrea Lattuada (@utaal)