Last summer, I finally decided to start learning Rust. I published my first blog post about Rust in September, then attended my first Rust conference to learn more about the language, and started three Rust projects on GitHub. I now have more experience with the Rust ecosystem, and I want to share some feedback. The call for community blog posts by the Rust team comes right on time!

In this post, I first want to summarize some key features that I particularly enjoy in Rust, and then focus on what I think is still missing from the language!


Best features of Rust so far

If you haven’t tried Rust yet, here are some key features to convince you that this language really improves the programming experience. In this post I’ll focus on the Rust ecosystem in general; for more details about the language itself I encourage you to have a look at my previous posts.

Seamless compilation and dependency management

The first important feature is that it is very easy to compile a project and manage dependencies. Thanks to cargo, compiling, running or testing a project is as simple as a one-line command.

cargo build
cargo run
cargo test

Adding source files to your project is also quite simple. The root folder src/ contains a file named either main.rs (for a binary) or lib.rs (for a library), and each subfolder contains a mod.rs file. New files or subfolders are registered with a simple mod foo; statement. Organizing the code into modules also lets you manage public/private visibility.

// Private module
mod foo;
// Public module
pub mod bar;
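
For example, a small binary project using these modules could be laid out as follows (a hypothetical layout for illustration).

src/
├── main.rs    // declares `mod foo;` and `pub mod bar;`
├── foo.rs     // private module
└── bar/
    ├── mod.rs    // root of the public module, declaring `mod baz;`
    └── baz.rs    // submodule of bar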

Specifying dependencies is also straightforward if they are published on https://crates.io/. All you have to do is add a line in the Cargo.toml configuration file, and add an extern crate foo; statement in your project root (main.rs or lib.rs).

[dependencies]
byteorder = "^1.0.0"
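
Once the dependency is declared, its types and traits are directly usable. Here is a minimal sketch of how one might use byteorder to decode a big-endian integer (this snippet is mine, for illustration only).

extern crate byteorder;

use byteorder::{BigEndian, ReadBytesExt};

fn main() {
    // A byte slice implements std::io::Read, so we can decode from it.
    let data = [0x12, 0x34];
    let mut cursor = &data[..];
    let value = cursor.read_u16::<BigEndian>().unwrap();
    assert_eq!(value, 0x1234);
}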

All of this contrasts with C/C++, where the compiler only knows how to compile one file at a time and there is no notion of module. This leads to the artificial separation into .h and .c files glued together by #include preprocessor directives, which is a bit of a hack.

It is not uncommon to see (mostly old) C programs consisting of a single .c file with 10,000 lines of code, which makes it hard for external contributors to join the project. And writing a good Makefile that correctly keeps track of dependencies is hard, even though tools like CMake have improved the landscape. Yet, people tend to reinvent the wheel, and there are at least a dozen C/C++ build systems, which makes dependency management even harder.

To take the example of another language, the OCaml compiler also follows the file-by-file compilation model, so one has to use other tools like ocamlbuild and ocamlfind to manage files. Adding dependencies can also be quite challenging (here is an example of a Makefile for an OCaml project).

To sum up, having a good build system and package manager by default makes it much simpler to share projects, and Rust has done it right so far!

Easy testing and continuous integration

Writing tests in Rust couldn’t be more straightforward! All you have to do is declare a function with the #[test] attribute.

// A function in your module.
fn add(x: i32, y: i32) -> i32 {
    x + y
}

// Let's write a test for it.
#[test]
fn test_add() {
    assert_eq!(add(12, 34), 46);
}

Then, you just have to run cargo test and Cargo will collect all tests, compile and run them. This is really simple: no need to add extra dependencies or to put tests in a separate file!

Rust also integrates well with continuous integration systems. For example, a .travis.yml configuration file for Travis-CI is as simple as the following.

language: rust
rust:
    - stable
    - beta
    - nightly
matrix:
    allow_failures:
        - rust: nightly

You can also write helper functions that are only compiled for tests and never shipped to production. For this, simply declare them with #[cfg(test)].

// A function included in the final binary.
fn add(x: i32, y: i32) -> i32 {
    x + y
}

// A helper function for easier tests.
// Not included in the final binary.
#[cfg(test)]
fn add_zero(x: i32) -> i32 {
    add(x, 0)
}

// A test.
// Not included in the final binary.
#[test]
fn test_add_zero() {
    let x = 123;
    assert_eq!(add_zero(x), x);
}

Error messages

If you have programmed in C++ before, you probably noticed that clang greatly improved error messages compared to GCC. The Rust compiler goes one step further in terms of clarity. Error messages are well presented (formatting, colors), aware of the context, and suggest potential misspellings. As an example, the following code:

fn main() {
    let foo = "Hello, world!";
    println!("{}", fool);
}

yields this nice error message – and in reality it’s colored in your console!

error[E0425]: cannot find value `fool` in this scope
 --> src/main.rs:3:20
  |
3 |     println!("{}", fool);
  |                    ^^^^ did you mean `foo`?

error: aborting due to previous error

If hard work on the compiler is what makes such beautiful error messages possible, the design of the language itself also matters. For example, producing meaningful error messages in C++ is sometimes impossible when templates expand to hundreds of nested types. This is notably because SFINAE is a C++ hack rather than a first-class citizen, whereas Rust traits allow the compiler to provide good error messages out of the box.

One feature that I like is that the Rust compiler aggressively warns you about every piece of unreachable code, as well as unused variables, fields or functions. This is sometimes annoying when you are refactoring code or programming a new feature, but it helps you end up with clean code. Here again, detecting unused functions is easier for a language with a unified compilation model, as opposed to file-by-file compilation plus a linker.
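
For example, the following sketch triggers an unused-variable warning; prefixing the name with an underscore is the idiomatic way to state that this is intentional.

fn main() {
    // The compiler warns that `unused` is never read.
    let unused = 42;
    // No warning: the leading underscore marks the variable as
    // intentionally unused.
    let _intentional = 43;
}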

Code formatting with rustfmt

Thanks to rustfmt, you don’t have to worry about formatting conventions. All you have to do is set up this tool and your code will be formatted automatically. There is no need to adapt to a new convention every time you switch projects, because rustfmt provides a common baseline (a maximum line width, position of spaces and braces, etc.). This saves time on details and allows you to focus on the code itself.

Yet, you can configure some formatting rules if you want, and thanks to a configuration file these rules will also be adopted by other users of rustfmt. But I personally find the default formatting convention really good.
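
For example, dropping a rustfmt.toml file at the root of the repository overrides the defaults for everyone. A minimal sketch (max_width is a real option; the value here is only for illustration):

max_width = 100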

There is even a vim plugin that integrates well with rustfmt, and all you have to do is add the following line to your .vimrc to format all files upon saving. No need to think about rustfmt anymore!

let g:rustfmt_autosave = 1

Documentation quality

I must say that the documentation about the Rust language and individual crates is really clear! The Rust books provide detailed explanations about all aspects of the language, with runnable snippets of code thanks to the Rust playground.

The user experience on https://docs.rs/ is also well-polished, thanks to:

  • many links throughout the code, towards language primitives, traits in the standard library, or types in the current crate;
  • foldable paragraphs to quickly see a summary of all functions as well as detailed documentation for a specific function;
  • many annotations and interactive features.

Coming from the C++ world, I find this at least as good as https://en.cppreference.com.

What I dream of for 2018

As you have probably guessed, I find that Rust is already a very good programming language. Yet, there is always room for improvement, and I would like to share some features that would be useful additions to Rust. These wishes are based on my experience with real projects, so I’ll reference real-world code throughout this section.

Landing constexpr and usize generic parameters

The constexpr keyword is a major feature of modern C++ (post-2011). It allows computing some expressions at compile time, and using them where a compile-time value is expected, for example as template parameters. This provides a zero-cost abstraction for clear generic code.

One use case is an array whose size depends on a compile-time expression. Until constant generics are implemented in Rust, one can use a Vec instead of an array, but this is unnecessary overhead when the array length is fixed and known at compile time. Not only are the size and capacity fields of Vec useless in that case, but the data is also allocated somewhere on the heap, which reduces cache locality and can decrease performance due to cache misses. Also, some bounds checks on arrays could be optimized away at compile time, given that the length is already known.

Compile-time expressions and constant-based generics have been proposed for some time in Rust, but the implementation is not trivial so these features have not landed in the compiler yet.

A real-world example for these features is the BitTree struct in my lzma-rs crate: given a number of bits N, the code allocates an array of 2^N elements. The number of bits is known at compile time, so it would be nice to use a compile-time array, and to parametrize the BitTree struct by the number of bits.
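
For reference, here is a simplified sketch of what the current Vec-based layout looks like (not the exact lzma-rs code).

pub struct BitTree {
    num_bits: usize,
    // Allocated on the heap, away from the rest of the struct.
    probs: Vec<u16>,
}

impl BitTree {
    pub fn new(num_bits: usize) -> Self {
        BitTree {
            num_bits: num_bits,
            probs: vec![0x400; 1 << num_bits],
        }
    }
}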

I would ideally rewrite it as follows, to obtain a memory layout similar to the original C implementation of LZMA (for which all of the decompression state is in a contiguous block of memory), and hopefully improve performance.

pub struct BitTree<const NUM_BITS: usize> {
    probs: [u16; 1 << NUM_BITS],
}

impl<const NUM_BITS: usize> BitTree<NUM_BITS> {
    pub fn new() -> Self {
        BitTree {
            probs: [0x400; 1 << NUM_BITS],
        }
    }

    // ...
}

Array initialization without copy

Let’s assume that you want to initialize an array of non-copyable objects. Unfortunately, the following code does not compile, because the initialization syntax [Bar::new(); 5] copies the given element Bar::new() into all the array slots.

// Does not #[derive(Copy)]
struct Bar { /* ... */ }

impl Bar {
    fn new() -> Bar { /* ... */ }
}

struct Foo {
    bar: [Bar; 5]
}

impl Foo {
    fn new() -> Foo {
        Foo {
            bar: [Bar::new(); 5]
        }
    }
}

The workaround that I have found so far is to use a Vec instead of an array, along with the vec! macro.

struct Foo {
    bar: Vec<Bar>
}

impl Foo {
    fn new() -> Foo {
        Foo {
            bar: vec![Bar::new(); 5]
        }
    }
}

This works, but it is not satisfactory, because we lose the compile-time information about the fixed size, and the vector allocates its memory separately instead of directly inside the structure.

A nice feature would be some kind of macro that initializes an array by evaluating an expression once for each slot, equivalent to the following.

bar: [Bar::new(), Bar::new(), Bar::new(), Bar::new(), Bar::new()]
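
In the meantime, one can approximate this with a macro_rules macro that expands the expression a fixed number of times. Here is a minimal sketch for 5 elements (the name array_init5 is hypothetical, and the hard-coded size is precisely the limitation):

// Hypothetical helper: `$e` is expanded once per slot, so the
// expression is re-evaluated for each element instead of copied.
macro_rules! array_init5 {
    ($e:expr) => {
        [$e, $e, $e, $e, $e]
    };
}

// Usage in Foo::new():
// bar: array_init5![Bar::new()]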

Such a macro would allow creating arrays of objects that are neither copyable nor default-constructible. There is a real-world example in my lzma-rs crate where this would be useful.

If such a feature already exists or has been proposed, I am looking forward to hearing about it!

Array size and slicing

In today’s Rust, arrays have a compile-time size built into their type. This prevents passing an array of the wrong length.

fn foo(x: &[i32; 3]) {
    println!("x = {:?}", x)
}

fn main() {
    let x = [1, 2, 3, 4];
    // Does not compile, foo expects 3 elements.
    foo(&x);
}

Rust also introduced the concept of slices, which are essentially views into a contiguous sequence (e.g. an array or a vector). Such views don’t own the data, so the associated type is a reference, &[T]. Contrary to arrays (or array references such as &[T; 5]), size information is not part of the type, but only known at runtime.

let x = [1, 2, 3, 4];
// Slicing: size information is lost by the type system.
// The type is `&[i32]`.
let y: &[i32] = &x[0..2];
assert_eq!(y, &[1, 2]);

This is useful when the size is fixed but not known at compile time. On the other hand, size information is lost during slicing (the &x[0..2] operation), even if the size was known at compile time! This means that the size is now checked at runtime, using extra memory and instructions to store and check it, and delaying the discovery of potential bugs.

Additionally, slices cannot be converted back to array references, so APIs often have to accept slices even when a fixed size is expected.

fn foo(x: &[i32; 3]) {
    println!("x = {:?}", x)
}

fn main() {
    let x = [1, 2, 3, 4];
    // Does not compile, foo expects an array of 3 elements, not a slice.
    foo(&x[0..3]);
}

In this case, there is no fundamental reason for the compiler to reject this code, as the indices are known at compile time. Bounds checking could be done by the compiler, and the size of the slice could be embedded in the resulting type.

To circumvent this problem, the arrayref crate allows this kind of manipulation, but it calls the slicing operator under the hood, so bounds are again checked at runtime. In principle, the compiler could optimize away these bounds checks when the sizes are known (as in the example above), but the information is still lost by the type system.
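
For illustration, here is a minimal sketch of how the arrayref macros apply to the previous example.

#[macro_use]
extern crate arrayref;

fn foo(x: &[i32; 3]) {
    println!("x = {:?}", x)
}

fn main() {
    let x = [1, 2, 3, 4];
    // array_ref![expr, offset, len] yields a &[i32; 3], but the
    // bounds are still checked at runtime.
    foo(array_ref![x, 0, 3]);
}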

I use the arrayref crate extensively in cryptographic code, but I really think that this should be idiomatic Rust, with compile-time bounds checking as much as possible. This is certainly related to my previous section about usize generic parameters.

Stabilizing part of asm or providing intrinsics

For many reasons, some code heavily relies on CPU-specific instructions, such as vectorized instructions (SSE, AVX) or cryptographic primitives. These instructions yield better performance, but also provide constant-time guarantees, which are useful for cryptography (e.g. AES-NI instructions).

Rust already provides an asm! macro to directly emit assembly code and hence get access to these specific instructions. However, this macro is only available on Rust’s nightly compiler.

Some examples of cryptographic code using assembly instructions are the aesni crate for a fast AES implementation, or my implementation of the Haraka hash function (which also uses AES instructions under the hood). In my case, I just wrapped individual assembly instructions into some kind of intrinsics.
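
To give an idea of the pattern (this is not the actual Haraka code), here is a sketch of an intrinsic-like wrapper around the x86 rdtsc instruction, using the nightly asm! syntax of that era.

#![feature(asm)]

// Wrap a single instruction in a small function, and let the
// compiler inline away the boilerplate.
fn rdtsc() -> u64 {
    let lo: u32;
    let hi: u32;
    unsafe {
        // rdtsc puts the low 32 bits in eax and the high 32 bits in edx.
        asm!("rdtsc" : "={eax}"(lo), "={edx}"(hi) : : : "volatile");
    }
    ((hi as u64) << 32) | (lo as u64)
}

fn main() {
    println!("timestamp counter = {}", rdtsc());
}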

By looking at the assembly output with the following command, I could confirm that the Rust compiler optimized away the boilerplate of these intrinsic-like functions!

cargo rustc --release -- --emit asm

I know that this has been debated for a while and that there does not seem to be an easy solution, but stabilizing the most common instructions via intrinsics would already be beneficial to provide more compiler checks, in particular regarding type-checking and mutability.

Stable code coverage and benchmarking

Benchmarking and code coverage are essential tools of modern programming. Rust nightly already supports benchmarking via cargo bench, but it would be nice to have it in stable Rust ;-) Work towards this is already in progress.
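
For reference, here is a minimal sketch of a benchmark that cargo bench picks up on nightly, reusing the add function from earlier.

#![feature(test)]
extern crate test;

use test::Bencher;

fn add(x: i32, y: i32) -> i32 {
    x + y
}

#[bench]
fn bench_add(b: &mut Bencher) {
    // The closure is run repeatedly to measure its average runtime.
    b.iter(|| add(12, 34));
}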

Regarding code coverage, it is possible to use kcov on Linux, as explained here. However, there does not seem to be any portable implementation yet.

Conclusion

Rust is already a great language that provides high-level abstractions for a very small performance overhead, and gives access to many useful features beyond code safety. Adding more features could make it even more efficient, so I am looking forward to what new cool surprises will come into Rust in 2018!


Comments

To react to this blog post please check the Twitter thread.



