从C++和OCaml程序员的视角看Rust (Part 2)

本文来源地址

SEPTEMBER 29, 2017

by GUILLAUME ENDIGNOUX
@GEndignoux

This post is the second of my series about Rust compared to other languages. After a first post about the basics of the language, I’ll now discuss memory management, a cornerstone of Rust’s safety guarantees. You will hopefully get a clear idea about how moves, borrows and other lifetimes work!

Despite the title, in this post, I will mostly compare Rust to C++, because there isn’t much memory management to do in OCaml, which is garbage-collected.


Allocation and memory access

Memory management is a core aspect of programming. The fundamental reason is that most complex algorithms require a variable amount of memory, depending on their input. Therefore, we obtain such memory at runtime with allocators. Most of the time, we also want to deallocate this memory once we are done using it, to avoid exhausting memory resources (otherwise we have a memory leak). But we must make sure that we do not access the memory again after deallocation, because this memory may have been reallocated for other unrelated objects (use-after-free bug).

Several strategies have been implemented to manage memory resources.

  • In C, memory management is manual with the malloc() and free() functions. This gives full control to the programmer, but the code is tedious and repetitive, and a single mistake can lead to (security) bugs (memory leak, use-after-free, double-free, etc.).
  • In C++, the <abbr title="Resource acquisition is initialization" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">RAII</abbr> paradigm automates memory (and more generally resources) management with constructors and destructors. If destructors are properly implemented, the resources acquired by a variable are released as soon as the variable goes out-of-scope. The destructing code is automatically added by the compiler, and this ensures that neither memory leaks nor use-after-freehappen.
  • In most modern languages (e.g. Java, Python, OCaml), a garbage collector (<abbr title="Garbage collector" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">GC</abbr>) runs from time to time and looks for unreachable memory by traversing all allocated memory.

Although garbage collection avoids pitfalls of manual memory management and is easy to use for the programmer (there is essentially nothing to do), it introduces some overhead (due to memory traversal) and unpredictability. There is no way to know when the <abbr title="Garbage collector" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">GC</abbr> will run. But every time it runs there is some delay before the program resumes, which can be problematic for highly-interactive scenarios. Also, <abbr title="Garbage collector" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">GC</abbr> only cares about memory, so other resources (e.g. files, network connections) must be managed by other means, e.g. back with the manual strategy, or with finalizers.

Note for the curious: In OCaml, the native integer type has one bit reserved by the <abbr title="Garbage collector" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">GC</abbr>, to avoid wrapping the value into a box. So int is a 63-bit integer (on 64-bit architectures), i.e. in range <math><semantics><annotation encoding="application/x-tex">[-2^{62}, 2^{62}-1]</annotation></semantics></math>[−262,262−1].

Another problem is unsafe access to memory. In C (and C++), because you have pointers with unrestricted arithmetic operations, you can (accidentally) access unallocated memory. This often boils down to access an array index i larger than the array’s length l, i.e. an out-of-bound access (or buffer overflow). This can be avoided at the cost of a comparison i < l, as implemented in C++’s std::vector::at() function. In OCaml, there are no pointers, and array access always checks bounds, so you don’t have these safety problems.

In Rust, the <abbr title="Resource acquisition is initialization" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">RAII</abbr> strategy of allocation is chosen for its predictability and performance properties, and its safety is enhanced compared to C++. There is no direct access to pointers (unless you explicitly write unsafe code), and we will now see that safe access is enforced by the compiler thanks to more static checks!

Ownership

Rust goes beyond <abbr title="Resource acquisition is initialization" style="box-sizing: border-box; margin: 0px; padding: 0px; border-bottom: 1px dotted; cursor: help;">RAII</abbr>, by introducing ownership rules that are enforced statically at compile time.

Some objects are not copyable, which means that only one variable can have ownership of a given object. This is, for example, the case of heap-allocated objects (the Box type), that implement deallocation in their destructor. Indeed, if such an object could be copied verbatim, then each copy would call the deallocation and we would obtain a double-free bug.

Instead, ownership can be transferred via a move. This is similar to std::move() in C++11, but in Rust it’s automatic and the compiler prevents access to the moved-from variable.

// C++14
auto a = std::make_unique<int>(1);
// The compiler forbids a copy "b = a", we need to move explicitly.
auto b = std::move(a);
std::cout << "b = " << *b << std::endl;
// The following is accepted by the compiler, despite "a" being in an unspecified state at runtime.
std::cout << "a = " << *a << std::endl;

// Rust
let a = Box::new(1);
// The box is moved from "a" to "b" and ownership is transfered.
let b = a;
println!("b = {}", b);
// This does not compile because "a" has lost ownership.
println!("a = {}", a);
// error[E0382]: use of moved value: `a`
//  --> src/main.rs:7:24
//   |
// 4 |     let b = a;
//   |         - value moved here
// ...
// 7 |     println!("a = {}", a);
//   |                        ^ value used here after move
//   |
//   = note: move occurs because `a` has type `std::boxed::Box<i32>`, which does not implement the `Copy` trait

Although transferring ownership is safe, we often want to have several variables pointing to the same object, in particular for function calls. Typically, the calling function wants to give access to some data to the called function (via an argument), but needs to access it after the call, so transferring ownership to the called function does not work.

For this purpose, Rust introduces a concept of references, that borrow objects from their owner. Similarly to C/C++, a reference can be obtained with the & operator, and the corresponding reference type is &T.

// Type annotations are optional, written here for clarity.
let a : i32 = 1;
let b : &i32 = &a;
println!("a = {}", a);
println!("b = {}", b);

Borrows and mutability

By default, Rust variables are immutable. If you want to borrow a variable to modify it, you need to use a mutable borrow, as follows.

// The original variable must be mutable to obtain a mutable borrow
let mut a = 1;
{
    let b = &mut a; // Mutable borrow
    *b += 1; // Use "*" to dereference
}
println!("a = {}", a);
// a = 2

Here, you can notice that we used a temporary scope to operate on b. This is because Rust has strict rules about borrows, enforced by the so-called borrow checker. These rules can be formalized as follows.

  • (Paraphrased from the Rust book) You may have one or the other of these two kinds of borrows, but not both at the same time:
    • one or more immutable references (&T) to a resource,
    • exactly one mutable reference (&mut T).
  • When a variable is mutably borrowed, it cannot be read. Likewise, when a variable is (mutably or immutably) borrowed, it cannot be modified.

To summarize, as long as a variable is mutably borrowed, nothing else can access this variable. This rules may seem strict but are here to avoid race-conditions. Indeed, « threads without data races » is one of the main goals of Rust.

But these rules also avoid data races in single-threaded programs! For example, in C++ you can obtain a reference to a vector cell, and then resize this vector, obtaining a dangling reference…

// C++
std::vector<int> v(10, 0); // Vector of length 10.
int& x = v[9];
v.resize(5); // The resize invalidates reference "x".
// Undefined behavior.
std::cout << "x = " << x << std::endl;

In Rust, this would violate the third rule: v cannot be modified as long as reference x is in scope. This example might seem obvious, but if you hide references behind iterators and wrap this code in a more complex program, it becomes hard to debug in practice!

Warning: Despite the name and the type syntax, references in Rust really are safe pointers. Contrary to C++ references, they can be rebound to another object, as long as they are declared mutable (mut). Note that this is orthogonal to a mutable borrow, for which the referred-to object is mutable.

let a = 1;
let b = 2;
let mut r = &a;
println!("r = {}", r);
r = &b;
println!("r = {}", r);
// r = 1
// r = 2

Lifetimes

Beyond the basic rules on mutability and borrows, Rust introduces the concept of lifetimes, to avoid dangling references. A typical example of dangling reference is to return a reference to a temporary.

fn bad() -> &i32 {
    let a : i32 = 1;
    return &a;
}
// error[E0106]: missing lifetime specifier
//  --> src/main.rs:1:13
//   |
// 1 | fn bad() -> &i32 {
//   |             ^ expected lifetime parameter
//   |

This does not compile, and Rust requests a « lifetime specifier » for the return type. Returning a reference is not forbidden in itself, and there are indeed valid reasons to do it. For example, you may want to access an array cell or some internal data in an object, to modify it or to read it without an expensive copy.

Let’s take a more concrete example with a Point structure over integers, and define a getx() function that returns a reference to the x coordinate (i.e. takes an argument of type &Point and returns a &i32). The returned reference to the x coordinate will logically have the same lifetime as the Point parameter because x is part of the point. But we don’t know exactly what this lifetime is, and in fact, this varies for each call to the function getx(), depending on the parameter. So we use a generic lifetime 'a (or any word preceded by an apostrophe) and write the function as follows.

#[derive(Debug)]
struct Point { x: i32, y: i32 }

fn getx<'a>(p : &'a Point) -> &'a i32 {
    return &p.x;
}

fn main() {
    let p = Point {x: 1, y: 2};
    let px = getx(&p);
    println!("p.x = {}", px);
    println!("p = {:?}", p);
}
// p.x = 1
// p = Point { x: 1, y: 2 }

Here, the diamond syntax getx<'a> means that getx() is parametrized by a generic lifetime 'a. This syntax is similar to templates in C++. Then, &'a Point represents a reference to Point which has lifetime 'a, and the result &'a i32 is a reference to i32 which has the same lifetime 'a.

In fact, I lied a bit to you because in practice you don’t need to explicit lifetimes for simple functions such as getx(). This is due to lifetime elision rules, that allows the programmer to remove lifetime annotations in non-ambiguous cases, where the compiler can figure them out itself. For example, if there is only one reference parameter, a reference result can only come from this parameter (reference to temporaries have an invalid lifetime), so the compiler infers that they share a common lifetime 'a. So you can rewrite getx() as follows.

fn getx(p : &Point) -> &i32 {
    return &p.x;
}

Lifetimes in structures

Thanks to elision rules, you probably won’t have to write lifetimes in functions, unless you write complex stuff. However, a common case where you won’t avoid lifetimes is if you use references in structures.

To give a more concrete example, let’s consider a common exercise in functional programming: writing a linked list. Typically, a list is either empty or contains a pointer to the first node. In turn, each node contains a value and a pointer to the remaining of the list, which is itself a list. Let’s try to formalize this.

enum List { Nil(), Next(Node) }
struct Node { value: i32, next: List }
// error[E0072]: recursive type `List` has infinite size
//  --> src/main.rs:1:1
//   |
// 1 | enum List { Nil(), Next(Node) }
//   | ^^^^^^^^^               ----- recursive without indirection
//   | |
//   | recursive type has infinite size
//   |
//   = help: insert indirection (e.g., a `Box`, `Rc`, or `&`) at some point to make `List` representable

As we can see, if we don’t use any kind of indirection and put the next node directly inside the current node, we obtain a recursive structure and the compiler detects that it is not representable. Instead, let’s try to use a reference to the next node.

enum List { Nil(), Next(&Node) }
// error[E0106]: missing lifetime specifier
//  --> src/main.rs:1:25
//   |
// 1 | enum List { Nil(), Next(&Node) }
//   |                         ^ expected lifetime parameter

Now, we are back to lifetimes. You cannot write a reference inside a structure (or enumeration) without specifying its lifetime.This makes sense because such a reference points to another object, which must be alive at least as long as the current object exists (otherwise we have a dangling reference).

So let’s try with a generic lifetime 'a.

enum List { Nil(), Next(&'a Node) }
// error[E0261]: use of undeclared lifetime name `'a`
//  --> src/main.rs:1:26
//   |
// 1 | enum List { Nil(), Next(&'a Node) }
//   |                          ^^ undeclared lifetime

This lifetime must be declared. For this, we add it to the List enumeration, using the diamond notation. The following means that the List has some lifetime 'a, and that the reference to the next Node must have at least this lifetime 'a.

enum List<'a> { Nil(), Next(&'a Node) }

Now, we rewrite the Node structure to also declare and use the lifetime 'a.

struct Node<'a> { value: i32, next: List<'a> }
// error[E0106]: missing lifetime specifier
//  --> src/main.rs:1:33
//   |
// 1 | enum List<'a> { Nil(), Next(&'a Node) }
//   |                                 ^^^^ expected lifetime parameter

By doing this, we also need to propagate the lifetime to the Node instance in the List definition. The following code finally works and implements our linked list!

#[derive(Debug)]
enum List<'a> { Nil(), Next(&'a Node<'a>) }
#[derive(Debug)]
struct Node<'a> { value: i32, next: List<'a> }

fn main() {
    let n1 = Node {value: 1, next: List::Nil()};
    let n2 = Node {value: 2, next: List::Next(&n1)};
    println!("node = {:?}", n2);
}
// node = Node { value: 2, next: Next(Node { value: 1, next: Nil }) }

Now, let’s see if the borrow checker really checks lifetimes. In the following example, we replace the next node of n2 by a node n3which does not live long enough, due to a temporary scope. We can verify that the borrow checker complains!

fn main() {
    let n1 = Node {value: 1, next: List::Nil()};
    let mut n2 = Node {value: 2, next: List::Next(&n1)};
    {
        let n3 = Node {value: 3, next: List::Nil()};
        n2.next = List::Next(&n3);
    }
    println!("node = {:?}", n2);
}
// error[E0597]: `n3` does not live long enough
//   --> src/main.rs:12:5
//    |
// 11 |         n2.next = List::Next(&n3);
//    |                               -- borrow occurs here
// 12 |     }
//    |     ^ `n3` dropped here while still borrowed
// 13 |     println!("node = {:?}", n2);
// 14 | }
//    | - borrowed value needs to live until here

Compare this to pointers in C/C++, for which the compiler does not complain about such unsafe code!

// DO NOT DO THIS!
struct Node {
    Node(int v) : value(v), next(nullptr) {}
    Node(int v, Node* n) : value(v), next(n) {}

    int value;
    Node* next;
}

void print(Node n);

int main() {
    Node n1(1);
    Node n2(2, &n1);
    {
        Node n3(3);
        n2.next = &n3;
    }
    // n2.next is now invalid but the compiler does not complain...
    print(n2);
}

In summary, the borrow checker verifies mutable borrows and lifetimes, to ensure safe access to all references. There are more interesting and powerful features about lifetimes, such as bounds and coercion, but this post is already long enough so I invite the curious reader to have a look at them in the Rust book.

Conclusion

Memory management is a ubiquitous programming problem, and there is no simple solution. You can hide this complexity with a simple API (malloc/free) or an invisible garbage collector, but with complex programs, you can end up with security problems or performance issues. Instead, Rust chose to expose powerful abstractions that are not easy to understand at first glance but provide compile-time safety without compromising on performance.

There is still a lot to say about Rust, so get ready for my next post in this series, and more Rust projects! In the meantime, you can find me at RustFest this weekend.

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 158,736评论 4 362
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 67,167评论 1 291
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 108,442评论 0 243
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 43,902评论 0 204
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 52,302评论 3 287
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 40,573评论 1 216
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 31,847评论 2 312
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 30,562评论 0 197
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 34,260评论 1 241
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 30,531评论 2 245
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 32,021评论 1 258
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 28,367评论 2 253
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 33,016评论 3 235
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 26,068评论 0 8
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 26,827评论 0 194
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 35,610评论 2 274
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 35,514评论 2 269