This post was cross-posted to the ETH Zürich Systems Group Blog.
Today we’re looking at the rust borrow checker from a different perspective. As you may know, the borrow checker is designed to safely handle memory allocation and ownership, preventing accessess to invalid memory and ensuring data-race freedom. This is a form of resource management: the borrow checker is tracking who’s in charge of a chunk of memory, and who is currently allowed to read or write to it. In this post, we’ll see how these facilities can be used to enforce higher-level API constraints in your libraries and software. Once you’re familiar with these techniques, we’ll cover how the same principles apply to advanced memory management and handling of other more abstract resources.
This is an extended blog post version of my RustFest Zürich talk (it was recorded). That’s why you’ll find “rustfest” in some of the code examples.
First, a refresher on affine types. Affine type systems, like Rust’s, only allow a variable to be used once (if it’s not a reference). This is at the core of the ownership semantics, and it’s a significant departure from other mainstream languages (think of using a variable multiple times in C). Here’s an example:
1
2
3
4
5
6
7
8
fn use_name(name: String) { }
fn main() {
let name = String::from("Andrea");
use_name(name);
println!("{}", name);
}
Note that use_name
takes name
’s ownership (and it’s not pass-by-reference) so name
is moved on line 5, and it cannot be used again on line 7. Here’s the compiler output:
error[E0382]: use of moved value: `name` --> affine.rs:7:18 | 5 | use_name(name); | ---- value moved here 6 | 7 | println!("{}", name); | ^^^^ value used here after move | = note: move occurs because `name` has type `std::string::String`, which does not implement the `Copy` trait error: aborting due to previous error For more information about this error, try `rustc --explain E0382`.
If you’d like a more in-depth explanation on ownership (Rust’s lingo for its affine type system), you can take a look at the relevant chapter in the Rust book.
Now we know what happens if we try to transfer ownership of a variable more than once, but what if we never use it inside a scope?
First of all, Rust is lexically scoped, so variable names are only valid within the scope where they’re defined.
1
2
3
4
5
6
7
8
9
10
11
12
13
struct Thing {
number: u32,
}
fn main() {
let a = 4;
if a > 3 {
let thing = Thing { number: a };
} // `thing` dropped here
println!("{}", thing);
}
error[E0425]: cannot find value `thing` in this scope --> drop.rs:12:18 | 12 | println!("{}", thing); | ^^^^^ not found in this scope error: aborting due to previous error For more information about this error, try `rustc --explain E0425`.
No thing
there, it went out of scope on line 10. Importantly, thing
goes out of scope before its ownership is transfered, so Rust drops it: the compiler inserts code to clean up all resources associated with thing
and frees its memory. We can hook into this mechanism by providing an implementation of the special Drop
trait:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
impl Drop for Thing {
fn drop(&mut self) {
eprintln!("dropping thing {}", self.number);
}
}
fn main() {
let a = 4;
if a > 3 {
let thing = Thing { number: a };
eprintln!("inside scope");
} // `thing` dropped here
eprintln!("outside scope");
}
If we run this we get:
inside scope dropping thing 4 outside scope
Again, more details are in the book.
We’re going to try to encode higher level API constraints using the linear typing (ownership) semantics of Rust.
The interaction we’re going to describe is pretty simple: sending a letter via a delivery service. One has a written letter they’d like to send: they put it in a pre-stamped envelope, they close the envelope and they hand it to the lorry driver. Of course, all of this applies to many APIs: we’ll see a couple of examples at the end.
Here’s one way to model our protocol in Rust:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#[derive(Clone)]
pub struct Letter {
text: String,
}
pub struct Envelope {
letter: Option<Letter>,
}
pub struct PickupLorryHandle {
done: bool,
// references to lorry's resources
}
impl Letter {
pub fn new(text: String) -> Self { Letter { text: text } }
}
impl Envelope {
/// Put a letter in the envelope and seal it.
pub fn wrap(&mut self, letter: &Letter) {
self.letter = Some(letter.clone());
}
}
impl PickupLorryHandle {
/// Give an envelope to the delivery driver.
pub fn pickup(&mut self, envelope: &Envelope) {
// (the details here don't matter)
}
/// Tell the driver we don't have anything else for them.
pub fn done(&mut self) {
self.done = true; println!("sent");
}
}
pub fn order_pickup() -> PickupLorryHandle {
// ...
}
With this in place we can write our client code:
1
2
3
4
5
6
7
8
9
// in a separate module
fn main() {
let rustfest_letter = Letter::new(String::from("Dear RustFest"));
let mut rustfest_envelope = buy_prestamped_envelope();
rustfest_envelope.wrap(&rustfest_letter);
let mut lorry = order_pickup();
lorry.pickup(&rustfest_envelope);
lorry.done();
}
Our client code:
Our API has three shortcomings we can better address with Rust’s type system:
we’d like to prevent re-use of what we know it’s a finite resource: we only have one physical copy of the letter;
we want to make sure that we perform a series of steps in the right order (and only once): put the letter in the envelope, seal it, and give it to the driver (i.e. avoid giving an empty envelope);
we don’t want to forget to release a resource when we’re done: we ensure we tell the driver they can leave.
Here’s some problematic client code:
1
2
3
4
5
6
7
8
9
10
11
fn main() {
let rustfest_letter = Letter::new(String::from("Dear RustFest"));
let mut envelopes = vec![
buy_prestamped_envelope(), buy_prestamped_envelope()];
let mut lorry = order_pickup();
for e in envelopes.iter_mut() {
e.wrap(&rustfest_letter);
lorry.pickup(&e);
}
lorry.done();
}
No compiler error, but somehow the letter was magically duplicated and inserted in both envelopes. Sometimes, this is perfectly fine (copying some memory isn’t a big deal), but sometimes the resource represented by our struct
cannot be easily duplicated: in this example, if it’s representing a constraint in our business logic. In general, if our struct
represents a handle to a resource out of our control, we may not be able to clone
it without breaking some safety or correctness guarantees.
So let’s remove the Clone
implementation for Letter
, and adjust the client code:
1
2
3
4
5
6
7
8
9
10
#[derive(Clone)]
pub struct Letter {
text: String,
}
impl Envelope {
pub fn wrap(&mut self, letter: Letter) { // take ownership of `letter`
self.letter = Some(letter);
}
}
1
2
3
4
5
6
7
8
9
10
11
fn main() {
let rustfest_letter = Letter::new(String::from("Dear RustFest"));
let mut envelopes = vec![
buy_prestamped_envelope(), buy_prestamped_envelope()];
let mut lorry = order_pickup();
for e in envelopes.iter_mut() {
e.wrap(rustfest_letter); // give ownership of `rustfest_letter`
lorry.pickup(&e);
}
lorry.done();
}
Here’s the compiler output:
error[E0382]: use of moved value: `rustfest_letter` --> letter1.rs:7:16 | 7 | e.wrap(rustfest_letter); | ^^^^^^^^^^^^^^^ value moved here in previous iteration of loop | = note: move occurs because `rustfest_letter` has type `Letter`, which does not implement the `Copy` trait error: aborting due to previous error For more information about this error, try `rustc --explain E0382`.
Great! We can now only use each letter once!
One issue down, two to go. We’d like to make sure that the steps of the protocol are carried out in the proper order: we must not forget to put the letter in the envelope before handing it to the lorry driver! And, can we prevent inserting two letters in the same envelope at compile time?
Here’s the broken client code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
fn main() {
let rustfest_letter = Letter::new(String::from("Dear RustFest"));
let mut first_envelope = buy_prestamped_envelope();
first_envelope.wrap(rustfest_letter);
let eth_letter = Letter::new(String::from("Dear ETH"));
first_envelope.wrap(eth_letter);
let mut lorry = order_pickup();
lorry.pickup(&first_envelope);
let another_envelope = buy_prestamped_envelope();
lorry.pickup(&another_envelope);
lorry.done();
}
1
2
3
4
5
6
impl Envelope {
pub fn wrap(&mut self, letter: &Letter) {
assert!(self.letter.is_none());
self.letter = Some(letter.clone());
}
}
This compiles, but of course the assert in wrap
will fire, at runtime.
thread 'main' panicked at 'assertion failed: self.letter.is_none()' note: Run with `RUST_BACKTRACE=1` for a backtrace.
We’d like to prevent this at compile time. And once that’s fixed, what about making sure we don’t send empty envelopes (another_envelope
on lines 12-13 of main
)?
We can make a couple of classes to represent the current state of the Envelope
: EmptyEnvelope
is an empty pre-stampted envelope, and ClosedEnvelope
is a closed envelope guaranteed to contain a letter. We can then only provide implementations for actions that make sense for that specific state. Then, to make sure we don’t send an empty envelope, we make sure that pickup
only takes a ClosedEnvelope
(and we make it take ownership, to avoid spurious copies).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
/// An empty pre-stamped envelope.
pub struct EmptyEnvelope { }
/// A closed envelope containing a letter.
pub struct ClosedEnvelope {
letter: Letter,
}
impl EmptyEnvelope {
/// Put a letter in the envelope and seal it.
pub fn wrap(self, letter: Letter) -> ClosedEnvelope {
ClosedEnvelope { letter: letter }
}
}
impl PickupLorryHandle {
/// Give an envelope to the delivery driver.
pub fn pickup(&mut self, envelope: ClosedEnvelope) {
/* give letter */
}
...
}
pub fn buy_prestamped_envelope() -> EmptyEnvelope { EmptyEnvelope { } }
And here’s the updated client code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
fn main() {
let rustfest_letter = Letter::new(String::from("Dear RustFest"));
let mut first_envelope = buy_prestamped_envelope();
first_envelope.wrap(rustfest_letter);
let eth_letter = Letter::new(String::from("Dear ETH"));
first_envelope.wrap(eth_letter);
let mut lorry = order_pickup();
lorry.pickup(first_envelope);
let another_envelope = buy_prestamped_envelope();
lorry.pickup(another_envelope);
lorry.done();
}
error[E0308]: mismatched types --> letter2.rs:10:18 | 10 | lorry.pickup(first_envelope); | ^^^^^^^^^^^^^^ expected struct `ClosedEnvelope`, | found struct `EmptyEnvelope` | = note: expected type `ClosedEnvelope` found type `EmptyEnvelope` error[E0308]: mismatched types --> letter2.rs:13:18 | 13 | lorry.pickup(another_envelope); | ^^^^^^^^^^^^^^^^ expected struct `ClosedEnvelope`, | found struct `EmptyEnvelope` | = note: expected type `ClosedEnvelope` found type `EmptyEnvelope` error: aborting due to 2 previous errors For more information about this error, try `rustc --explain E0308`.
One problem prevented: no empty envelopes can be sent! Note how the compiler errors point us towards a solution: we need a ClosedEnvelope
for the lorry
to pickup
. Let’s fix the client code again.
1
2
3
4
5
6
7
8
9
10
11
12
13
fn main() {
let rustfest_letter = Letter::new(String::from("Dear RustFest"));
let envelope = buy_prestamped_envelope();
let closed_envelope = envelope.wrap(rustfest_letter);
let eth_letter = Letter::new(String::from("Dear ETH"));
let closed_envelope = envelope.wrap(eth_letter);
let mut lorry = order_pickup();
lorry.pickup(closed_envelope);
lorry.done();
}
error[E0382]: use of moved value: `envelope` --> letter2.rs:8:27 | 5 | let closed_envelope = envelope.wrap(rustfest_letter); | -------- value moved here ... 8 | let closed_envelope = envelope.wrap(eth_letter); | ^^^^^^^^ value used here after move | = note: move occurs because `envelope` has type `EmptyEnvelope`, which does not implement the `Copy` trait error: aborting due to previous error For more information about this error, try `rustc --explain E0382`.
By making EmptyEnvelope
take self
’s ownership we can use the linear typing technique from earlier to prevent reuse of envelope
: once we’ve put a letter in an envelope, we get back a ClosedEnvelope
, that can only be handed over to the lorry
driver. Now the compiler can help us make sure we follow the protocol steps in order: put the letter in the envelope, then send it.
We’ll see a more complex example in which we compose an http response in the right order using this technique. For now, here’s the correct client code with the new API:
1
2
3
4
5
6
7
8
9
fn main() {
let rustfest_letter = Letter::new(String::from("Dear RustFest"));
let mut envelope = buy_prestamped_envelope();
let closed_envelope = envelope.wrap(rustfest_letter);
let mut lorry = order_pickup();
lorry.pickup(closed_envelope);
lorry.done();
}
Another common mistake: forgetting to release a resource. In the following client code, lorry.done()
is missing, and we never tell the delivery driver we’re done. And, in this tortured methaphor, we’d never deliver the letter because the driver never leaves…
1
2
3
4
5
6
7
8
fn main() {
let rustfest_letter = Letter::new(String::from("Dear RustFest"));
let mut envelope = buy_prestamped_envelope();
let closed_envelope = envelope.wrap(rustfest_letter);
let mut lorry = order_pickup();
lorry.pickup(closed_envelope);
}
In the real world, we may be keeping a connection open, or never completing a process in our business logic; and it’s just because we forgot a single line.
We’ve seen that we can hook into Rust’s drop mechanism; here’s how we’d ensure that we release the lorry
:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
impl Drop for PickupLorryHandle {
fn drop(&mut self) {
if !self.done {
self.done();
}
}
}
impl PickupLorryHandle {
/// Tell the driver we don't have anything else for them.
pub fn done(self) {
self.done = true; println!("sent");
}
...
}
1
2
3
4
5
6
7
8
9
10
11
fn main() {
let rustfest_letter = Letter::new(String::from("Dear RustFest"));
let envelope = buy_prestamped_envelope();
let closed_envelope = envelope.wrap(rustfest_letter);
let mut lorry = order_pickup();
lorry.pickup(closed_envelope);
// `lorry` dropped here
}
Now when the lorry goes out of scope the drop
implementation is called, and we automatically send the driver on their merry way.
std
libraryThis last pattern is often used in the std
library when building abstractions where there’s a need to release a resource or a handle to a resource.
Rc
, “a single-threaded reference-counting pointer”, tracks the number of references to a resource, and the ownership of the Rc
pointer represents a strong reference: when no references are left (all Rc
s are dropped), the underlying resource (memory and other elements of the struct
) should be freed.
There’s some unsafe
trickery under the hood, but what’s relevant here is that Rc
uses Drop
to release the underlying memory once the reference-count reaches zero.
Here’s the relevant code: std::rc::Rc
.
Let’s look at another example in std
: Mutex
, “a mutual exclusion primitive useful for protecting shared data”.
Locking a Mutex
returns a MutexGuard
, “an RAII implementation of a scoped lock of a mutex”, that is, an object that represents the fact that we’re holding the lock. And Drop
is used so we can release the lock just by drop
ping the guard (either explicitly, with std::mem::drop
, or when it goes out of scope).
This is the signature of Mutex::lock
:
pub fn lock(&self) -> LockResult<MutexGuard<T>>
And here’s the Drop
implementation for MutexGuard
:
std::sync::MutexGuard
.
Now let’s look at an example of how all of these techniques can be combined in a realistic API. Let’s say we’re building an http server library, and we want to make sure that when we write our response we first send all the headers, and only then start writing the body. We may also want to make sure not to forget to flush the buffer at the end.
We use two structs to represent the two states: HttpResponseWritingHeader
, and HttpResponseWritingBody
. The method body
on HttpResponseWritingHeader
takes ownership of self
, ensuring that the header writer can no longer be used after this call (its ownership is transfered), and we can only append additional chunks to the response body.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
pub struct HttpResponseWritingHeaders { /* connection, … */ }
pub struct HttpResponseWritingBody { /* ... */ }
pub fn start_response() -> HttpResponseWritingHeaders { /* ... */ }
impl HttpResponseWritingHeaders {
fn header(&mut self, header: Header) { /* ... */ }
fn body(self) -> HttpResponseWritingBody { /* ... */ }
}
impl HttpResponseWritingBody {
fn write(&mut self, chunk: Chunk) { /* ... */ }
fn cease(self) { }
}
impl Drop for HttpResponseWritingBody {
fn drop(&mut self) {
self.flush();
}
}
The Drop
implementation ensures that a completed response is always fully written out to the client.
Note that the technique of representing the state of a protocol/object with many structs has a significant drawback when the possible state space grows large: we may end up juggling a lot of structs, and the benefit of compile time checks may not be worth the cost in lines-of-code, and maintainability.
At the Systems Group of the ETH Zürich CS department, we’re working on timely dataflow, a low-latency cyclic dataflow computational model: it lets you describe computations as a cyclic graph of nodes (operators), that perform some transformation on incoming data and maintain local state, and edges (channels) that carry data between operators. A computation built this way can be automatically parallelised across cores and computers, over the network.
In timely dataflow, data is transported on channels as tuples, each carrying a logical timestamp.
Timestamps are used to represent logical boundaries between groups of tuples so that operations can be performed on some subset, and timely dataflow tracks which timestamps are in flight in the system. Importantly, for timely dataflow’s correctness, operators are only allowed to send messages with timestamp t
when they hold a Capability
for the same timestamp t
. In addition, they need to report whenever they relinquish one of these capabilities, so that the system can make forward progress (that is, a downstream operator can then process all the tuples associated with the capability’s timestamp).
In timely, we use something like the following to represent a Capability
, a resource which grants permission to send data at a certain timestamp.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
/// The capability to send data with a certain timestamp on a dataflow edge.
pub struct Capability<T: Timestamp> {
time: T,
internal: Rc<RefCell<ChangeBatch<T>>>,
}
impl<T: Timestamp> Clone for Capability<T> {
fn clone(&self) -> Capability<T> {
// … update self.internal …
}
}
impl<T: Timestamp> Drop for Capability<T> {
fn drop(&mut self) {
// … update self.internal …
}
}
/// Return a channel handle to send data at the timestamp carried by `cap`.
pub fn session(&mut self, cap: &Capability<T>) -> Handle
Capability
has a custom Clone
implementation, that keeps track of how many copies we’ve made (a bit like Rc
), and a Drop
implementation, so we can be sure that a Capability
is dropped if not explicitly retained. The function that an operator uses to write to its output requires a reference to a valid Capability
for a certain timestamp: this way we enforce, at compile time, one of the protocol constraints of timely dataflow.
Thanks for reading! You can send me notes and questions on twitter. There’s a lobste.rs and a reddit post for comments.