Writing const functions has been supported since Rust 1.31 in 2018. These functions can be evaluated at compile time, which is for example useful to shift expensive calculations before the program runs. Knowing values at compile time is also necessary for const generic parameters, a feature available since Rust 1.51 in 2021.

Even though const functions have existed for 6 years, only a subset of Rust is available in const context, as tracked in rust-lang/rust#57563. As this subset expands, more and more functions in the standard library become exposed as const fn. A recent example is the inline_const feature, stabilized in Rust 1.79.

In this post, I will focus on std::array::from_fn, a function that is unfortunately not const. After explaining why I’d find it useful in const contexts, I’ll investigate why it’s not const yet and what it would really take to make it so (spoiler: with lots of unstable Rust features).

For reproducibility, the code snippets in this post have been tested with Rust nightly version 1.81.

$ cargo +nightly -V
cargo 1.81.0-nightly (a1f47ec3f 2024-06-15)

Preamble: why array::from_fn is useful at compile time?

My journey started by investigating an experimental feature of Rust: portable_simd. Exposed in the std::simd module, this exposes SIMD operations in a hardware-agnostic way. In one sentence, SIMD allows to more efficiently calculate arithmetic operations on multiple values in parallel, by packing multiple values into a single register. It is widely available in one form or another in most modern CPUs.

For example, the u32x8 type represents 8 elements of type u32 packed together. As such, the most common use of SIMD operations is to speed-up element-wise operations on arrays.

#![feature(portable_simd)]
use std::simd::u32x8;

fn main() {
    let x = u32x8::from_array([0, 1, 2, 3, 4, 5, 6, 7]);
    let y = u32x8::from_array([8, 9, 10, 11, 12, 13, 14, 15]);
    // Element-wise sum.
    let z: [u32; 8] = (x + y).into();
    assert_eq!(z, [8, 10, 12, 14, 16, 18, 20, 22]);
}

Another useful operation is shuffling elements within registers, which is exposed via the simd_swizzle! macro

#![feature(portable_simd)]
use std::simd::{simd_swizzle, u32x8};

fn main() {
    let x = u32x8::from_array([42, 43, 44, 45, 46, 47, 48, 49]);
    let y = u32x8::from_array([50, 51, 52, 53, 54, 55, 56, 57]);

    // Swap pairs of consecutive elements.
    let z = simd_swizzle!(x, [1, 0, 3, 2, 5, 4, 7, 6]);
    let z: [u32; 8] = z.into();
    assert_eq!(z, [43, 42, 45, 44, 47, 46, 49, 48]);

    // Select even elements.
    let t = simd_swizzle!(x, y, [0, 2, 4, 6, 8, 10, 12, 14]);
    let t: [u32; 8] = t.into();
    assert_eq!(t, [42, 44, 46, 48, 50, 52, 54, 56]);
}

One practical application of shuffles is implementing Fast Fourier Transforms, where you typically need to add and subtract consecutive elements.

FFT step Simplified Fast Fourier Transform step.

This non-trivial operation can be implemented from basic SIMD operations thanks to shuffling elements.

FFT step with SIMD shuffles FFT step implemented with SIMD shuffles.

#![feature(portable_simd)]
use std::simd::{simd_swizzle, u32x8};

// Ceci n'est pas une FFT.
fn foo(x: u32x8, y: u32x8) -> (u32x8, u32x8) {
    let even = simd_swizzle!(x, y, [0, 2, 4, 6, 8, 10, 12, 14]);
    let odd = simd_swizzle!(x, y, [1, 3, 5, 7, 9, 11, 13, 15]);

    let a = even + odd;
    let b = even - odd;

    let x = simd_swizzle!(a, b, [0, 8, 1, 9, 2, 10, 3, 11]);
    let y = simd_swizzle!(a, b, [4, 12, 5, 13, 6, 14, 7, 15]);
    (x, y)
}

As you may have noticed, the simd_swizzle! macro takes an array parameter that describes the permutation in terms of indices: this array is in fact a compile-time array. And this is where array::from_fn would be useful, to automatically generate the sequence of even or odd indices.

// The code I wish to write:
let even = simd_swizzle!(x, y, array::from_fn(|i| 2 * i));
let odd = simd_swizzle!(x, y, array::from_fn(|i| 2 * i + 1));

This is even more useful if we want to generalize the FFT implementation to any number of lanes (not necessarily 8). Indeed, the number of lanes supported natively depends on the CPU architecture, so making the code generic and instantiating it based on CPU support can yield better performance. For that, the portable SIMD library has a generic Simd type.

#![feature(portable_simd)]
use std::array;
use std::simd::{simd_swizzle, LaneCount, Simd, SupportedLaneCount};

fn foo<const N: usize>(x: Simd<u32, N>, y: Simd<u32, N>) -> (Simd<u32, N>, Simd<u32, N>)
where
    LaneCount<N>: SupportedLaneCount,
{
    let even = simd_swizzle!(x, y, /* insert array of N indices */);
    let odd = simd_swizzle!(x, y, /* insert array of N indices */);
    (even, odd)
}

Alternatively to the simd_swizzle! macro, you may try to directly implement the Swizzle trait for every possible length.

#![feature(portable_simd)]
use std::simd::{LaneCount, Simd, SupportedLaneCount, Swizzle};

fn foo<const N: usize>(x: Simd<u32, N>, y: Simd<u32, N>) -> (Simd<u32, N>, Simd<u32, N>)
where
    LaneCount<N>: SupportedLaneCount,
{
    let even = FetchEven::concat_swizzle(x, y);
    ...
}

// Custom type to represent a swizzle operation that fetches even indices.
struct FetchEven;

// Let's implement it for all valid lane counts.
impl Swizzle<1> for FetchEven {
    const INDEX: [usize; 1] = [0];
}
impl Swizzle<2> for FetchEven {
    const INDEX: [usize; 2] = [0, 2];
}
impl Swizzle<4> for FetchEven {
    const INDEX: [usize; 4] = [0, 2, 4, 6];
}
impl Swizzle<8> for FetchEven { ... }
impl Swizzle<16> for FetchEven { ... }
impl Swizzle<32> for FetchEven { ... }
impl Swizzle<64> for FetchEven { ... }

However, this approach will not work: even though SupportedLaneCount is a sealed trait and you implement it for all acceptable values of N (powers of two from 1 to 64), the compiler will not allow that.

error[E0277]: the trait bound `FetchEven: Swizzle<N>` is not satisfied
 --> src/main.rs:8:16
  |
8 |     let even = FetchEven::concat_swizzle(x, y);
  |                ^^^^^^^^^ the trait `Swizzle<N>` is not implemented for `FetchEven`
  |
  = help: the following other types implement trait `Swizzle<N>`:
            `FetchEven` implements `Swizzle<16>`
            `FetchEven` implements `Swizzle<1>`
            `FetchEven` implements `Swizzle<2>`
            `FetchEven` implements `Swizzle<32>`
            `FetchEven` implements `Swizzle<4>`
            `FetchEven` implements `Swizzle<64>`
            `FetchEven` implements `Swizzle<8>`

What we really need is to implement FetchEven for any N, which requires calling array::from_fn in a const context.

impl<const N: usize> Swizzle<N> for FetchEven {
    const INDEX: [usize; N] = std::array::from_fn(|i| 2 * i);
}

This brings us to the topic of this post.

error[E0015]: cannot call non-const fn `std::array::from_fn::<usize, N, {closure@src/main.rs:15:51: 15:54}>` in constants
  --> src/main.rs:15:31
   |
15 |     const INDEX: [usize; N] = std::array::from_fn(|i| 2 * i);
   |                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: calls in constants are limited to constant functions, tuple structs and tuple variants

As mentioned by SkiFire13 on Reddit, it’s possible to directly generate an const array without using array::from_fn. That said, it’s much more verbose and therefore less ergonomic.

struct FetchEven;

impl<const N: usize> Swizzle<N> for FetchEven {
    const INDEX: [usize; N] = {
        let mut out = [0; N];
        let mut i = 0;
        while i < N {
            out[i] = 2 * i;
            i += 1;
        }
        out
    };
}

A minimal implementation

Let’s try to implement a compile-time array::from_fn. How hard can it be?

The first idea would be to take the existing implementation and add some const keywords on it. However, you’ll quickly realize that this implementation isn’t that simple: the std::array module contains almost 1000 lines at the time of writing (including documentation), of which array::from_fn() isn’t particularly trivial.

A more pragmatic approach is to implement it by hand: with MaybeUninit and a loop it should be easy, right?

use std::mem::MaybeUninit;

// A first naive implementation.
const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T) -> [T; N] {
    let mut array = MaybeUninit::uninit_array::<N>();
    for i in 0..N {
        array[i].write(f(i));
    }
    unsafe { MaybeUninit::array_assume_init(array) }
}

Not that easy. The compiler greets us with several rather straightforward errors, and guides us to use the following unstable features.

error[E0658]: `for` is not allowed in a `const fn`
 --> src/main.rs:6:5
  |
6 | /     for i in 0..N {
7 | |         array[i].write(f(i));
8 | |     }
  | |_____^
  |
  = note: see issue #87575 <https://github.com/rust-lang/rust/issues/87575> for more information
  = help: add `#![feature(const_for)]` to the crate attributes to enable

error[E0658]: use of unstable library feature 'maybe_uninit_uninit_array'
 --> src/main.rs:5:21
  |
5 |     let mut array = MaybeUninit::uninit_array::<N>();
  |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: see issue #96097 <https://github.com/rust-lang/rust/issues/96097> for more information
  = help: add `#![feature(maybe_uninit_uninit_array)]` to the crate attributes to enable

error[E0658]: use of unstable library feature 'maybe_uninit_array_assume_init'
 --> src/main.rs:9:14
  |
9 |     unsafe { MaybeUninit::array_assume_init(array) }
  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: see issue #96097 <https://github.com/rust-lang/rust/issues/96097> for more information
  = help: add `#![feature(maybe_uninit_array_assume_init)]` to the crate attributes to enable

For more information about this error, try `rustc --explain E0658`.
  • const_for, to be able to use for loops in const functions. For loops are not as trivial as they seem, as under the hood they desugar to using the Iterator trait, and that itself requires being able to use traits in const contexts.
  • maybe_uninit_array_assume_init, maybe_uninit_uninit_array because the corresponding functions are not yet stable.
#![feature(
    const_for,
    maybe_uninit_array_assume_init,
    maybe_uninit_uninit_array,
)]

At the next iteration, the compiler suggests a few more unstable features.

  • const_maybe_uninit_array_assume_init, const_maybe_uninit_uninit_array and const_maybe_uninit_write to use the corresponding functions in const contexts.
  • const_trait_impl as a follow-up of const_for, to use the underlying Iterator trait in the for loop.
  • const_mut_refs to be able to mutate the array, as well as the implicit iterator in the for loop.
  • const_refs_to_cell to be able to call the function and overcome the following error.
error[E0658]: cannot borrow here, since the borrowed element may contain interior mutability
  --> src/main.rs:15:24
   |
15 |         array[i].write(f(i));
   |                        ^
   |
   = note: see issue #80384 <https://github.com/rust-lang/rust/issues/80384> for more information
   = help: add `#![feature(const_refs_to_cell)]` to the crate attributes to enable

The code doesn’t compile yet, but we now use 9 unstable features!

#![feature(
    const_for,
    const_maybe_uninit_array_assume_init,
    const_maybe_uninit_uninit_array,
    const_maybe_uninit_write,
    const_mut_refs,
    const_refs_to_cell,
    const_trait_impl,
    maybe_uninit_array_assume_init,
    maybe_uninit_uninit_array,
)]

At this point, things start to get spicy. First, the compiler mentions that iterating over a range doesn’t work because the Iterator::next() function isn’t const, but that the effects feature should fix that.

error[E0015]: cannot convert `std::ops::Range<usize>` into an iterator in constant functions
  --> src/main.rs:17:14
   |
17 |     for i in 0..N {
   |              ^^^^
   |
note: impl defined here, but it is not `const`
  --> /rustc/d7f6ebacee13b6c03623c4b74197280454ede8de/library/core/src/iter/traits/collect.rs:349:1
   = note: calls in constant functions are limited to constant functions, tuple structs and tuple variants
help: add `#![feature(effects)]` to the crate attributes to enable

Likewise, the function parameter f isn’t const, but adding a trait bound ~const Fn(usize) is supposed to fix it. However, the ~const syntax appears to be deprecated, according to rust-lang/rust#110395.

error[E0015]: cannot call non-const closure in constant functions
  --> src/main.rs:18:24
   |
18 |         array[i].write(f(i));
   |                        ^^^^
   |
   = note: calls in constant functions are limited to constant functions, tuple structs and tuple variants
help: consider further restricting this bound
   |
15 | const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + ~const Fn(usize)) -> [T; N] {
   |                                                                  ++++++++++++++++++
help: add `#![feature(effects)]` to the crate attributes to enable

Additionally, we’re told that the destructor of the function object f cannot be called in const contexts.

error[E0493]: destructor of `impl Fn(usize) -> T` cannot be evaluated at compile-time
  --> src/main.rs:15:43
   |
15 | const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T) -> [T; N] {
   |                                           ^ the destructor for this type cannot be evaluated in constant functions
...
21 | }
   | - value is dropped here

For now, let’s simply add the effects feature. Rounded up to 10 features!

#![feature(
    const_for,
    const_maybe_uninit_array_assume_init,
    const_maybe_uninit_uninit_array,
    const_maybe_uninit_write,
    const_mut_refs,
    const_refs_to_cell,
    const_trait_impl,
    effects,
    maybe_uninit_array_assume_init,
    maybe_uninit_uninit_array,
)]

use std::mem::MaybeUninit;

const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T) -> [T; N] {
    let mut array = MaybeUninit::uninit_array::<N>();
    for i in 0..N {
        array[i].write(f(i));
    }
    unsafe { MaybeUninit::array_assume_init(array) }
}

No more complaint about ~const, only the destructor question about the parameter f remains. Unfortunately, the compiler doesn’t propose any fix, but after digging into the documentation I found that Drop and Copy are exclusive. So if we require f to be Copy, we should hopefully not have to worry about its destructor.

And the following code compiles! More unstable features than lines of actual business logic!

#![feature(
    const_for,
    const_maybe_uninit_array_assume_init,
    const_maybe_uninit_uninit_array,
    const_maybe_uninit_write,
    const_mut_refs,
    const_refs_to_cell,
    const_trait_impl,
    effects,
    maybe_uninit_array_assume_init,
    maybe_uninit_uninit_array,
)]

use std::mem::MaybeUninit;

const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + Copy) -> [T; N] {
    let mut array = MaybeUninit::uninit_array::<N>();
    for i in 0..N {
        array[i].write(f(i));
    }
    unsafe { MaybeUninit::array_assume_init(array) }
}

As mentioned by CAD1997 on Reddit, the Destruct trait is a more suitable bound than Copy. This is a special trait for types that can be dropped in const contexts.

use std::marker::Destruct;

const fn array_from_fn<T, const N: usize>(
    f: impl Fn(usize) -> T + ~const Destruct,
) -> [T; N] {
    ...
}

An example of function that is ~const Destruct but not Copy is a closure that captures a non-copy object, such as a Vec.

fn main() {
    let v = vec![42];
    let array: [_; 5] = array_from_fn(move |_| v.clone());
    assert_eq!(array, [vec![42], vec![42], vec![42], vec![42], vec![42]]);
}

This code compiles fine with the ~const Destruct bound, but returns the following error with the Copy bound.

error[E0277]: the trait bound `Vec<i32>: Copy` is not satisfied in `{closure@src/main.rs:26:39: 26:47}`
  --> src/main.rs:26:39
   |
26 |     let array: [_; 5] = array_from_fn(move |_| v.clone());
   |                         ------------- --------^^^^^^^^^^
   |                         |             |
   |                         |             within `{closure@src/main.rs:26:39: 26:47}`, the trait `Copy` is not implemented for `Vec<i32>`, which is required by `{closure@src/main.rs:26:39: 26:47}: Copy`
   |                         |             within this `{closure@src/main.rs:26:39: 26:47}`
   |                         required by a bound introduced by this call
   |
note: required because it's used within this closure
  --> src/main.rs:26:39
   |
26 |     let array: [_; 5] = array_from_fn(move |_| v.clone());
   |                                       ^^^^^^^^
note: required by a bound in `array_from_fn`
  --> src/main.rs:16:68
   |
16 | const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + Copy) -> [T; N] {
   |                                                                    ^^^^ required by this bound in `array_from_fn`

Does it work?

Well, our custom array_from_fn compiles, but we haven’t even tried to call it, so does it really compile? Let’s try with a simple case.

const ARRAY: [usize; 10] = array_from_fn(|i| 2 * i);

At this point, the compiler back-tracks and tells us that the iterator cannot in fact be called in the for loop.

error[E0080]: evaluation of constant value failed
  --> src/main.rs:18:14
   |
18 |     for i in 0..N {
   |              ^^^^ calling non-const function `<std::ops::Range<usize> as IntoIterator>::into_iter`
   |
note: inside `array_from_fn::<usize, 10, {closure@src/main.rs:24:42: 24:45}>`
  --> src/main.rs:18:14
   |
18 |     for i in 0..N {
   |              ^^^^
note: inside `ARRAY`
  --> src/main.rs:24:28
   |
24 | const ARRAY: [usize; 10] = array_from_fn(|i| 2 * i);
   |                            ^^^^^^^^^^^^^^^^^^^^^^^^

Fortunately, we can still manually unroll the loop, and remove a couple unstable features (const_for, const_trait_impl).

const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + Copy) -> [T; N] {
    let mut array = MaybeUninit::uninit_array::<N>();
    let mut i = 0;
    loop {
        array[i].write(f(i));
        i += 1;
        if i == N {
            break;
        }
    }
    unsafe { MaybeUninit::array_assume_init(array) }
}

The compiler now tells us that the function f isn’t const.

error[E0080]: evaluation of constant value failed
  --> src/main.rs:18:24
   |
18 |         array[i].write(f(i));
   |                        ^^^^ calling non-const function `ARRAY::{closure#0}`

This can be fixed at the call site by making the closure const.

const ARRAY: [usize; 10] = array_from_fn(const |i| 2 * i);

This additionally requires the const_closures feature, which itself triggers a warning as it’s incomplete.

warning: the feature `const_closures` is incomplete and may not be safe to use and/or cause compiler crashes
 --> src/main.rs:2:5
  |
2 |     const_closures,
  |     ^^^^^^^^^^^^^^
  |
  = note: see issue #106003 <https://github.com/rust-lang/rust/issues/106003> for more information
  = note: `#[warn(incomplete_features)]` on by default

But for the purpose of this exercise, the code compiles and runs.

#![feature(
    const_closures,
    const_maybe_uninit_array_assume_init,
    const_maybe_uninit_uninit_array,
    const_maybe_uninit_write,
    const_mut_refs,
    const_refs_to_cell,
    effects,
    maybe_uninit_array_assume_init,
    maybe_uninit_uninit_array,
)]

use std::mem::MaybeUninit;

const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + Copy) -> [T; N] {
    let mut array = MaybeUninit::uninit_array::<N>();
    let mut i = 0;
    loop {
        array[i].write(f(i));
        i += 1;
        if i == N {
            break;
        }
    }
    unsafe { MaybeUninit::array_assume_init(array) }
}

const ARRAY: [usize; 10] = array_from_fn(const |i| 2 * i);

fn main() {
    println!("array = {ARRAY:?}");
}
array = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Is it usable?

So we’ve proved that the function compiles and can be called in a simple case. However, is it broadly usable?

One thing you may notice is that the original array::from_fn takes a FnMut parameter. This would come up if you want to pass a closure that has mutable state, for example computing factorials.

const fn factorials<const N: usize>() -> [usize; N] {
    let mut x = 1;
    array_from_fn(const |i| {
        if i == 0 {
            1
        } else {
            x *= i;
            x
        }
    })
}

const FACTORIALS: [usize; 10] = factorials();

fn main() {
    // Expected output: "factorials = [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]"
    println!("factorials = {FACTORIALS:?}");
}

Firstly, we need to add the move keyword to this mutable closure, otherwise we get a weird compilation error.

error[E0277]: the trait bound `&mut usize: Copy` is not satisfied in `{closure@src/main.rs:30:19: 30:28}`
  --> src/main.rs:30:19
   |
30 |       array_from_fn(const |i| {
   |       ------------- ^--------
   |       |             |
   |  _____|_____________within this `{closure@src/main.rs:30:19: 30:28}`
   | |     |
   | |     required by a bound introduced by this call
31 | |         if i == 0 {
32 | |             1
33 | |         } else {
...  |
36 | |         }
37 | |     })
   | |_____^ within `{closure@src/main.rs:30:19: 30:28}`, the trait `Copy` is not implemented for `&mut usize`, which is required by `{closure@src/main.rs:30:19: 30:28}: Copy`
   |
   = help: the trait `Copy` is implemented for `usize`
   = note: `Copy` is implemented for `&usize`, but not for `&mut usize`
note: required because it's used within this closure
  --> src/main.rs:30:19
   |
30 |     array_from_fn(const |i| {
   |                   ^^^^^^^^^
note: required by a bound in `array_from_fn`
  --> src/main.rs:15:68
   |
15 | const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + Copy) -> [T; N] {
   |                                                                    ^^^^ required by this bound in `array_from_fn`

Once we add the move keyword to the closure, the compiler guides us to accepting a FnMut.

error[E0594]: cannot assign to `x`, as it is a captured variable in a `Fn` closure
  --> src/main.rs:34:13
   |
15 | const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + Copy) -> [T; N] {
   |                                              -------------------------- change this to accept `FnMut` instead of `Fn`
...
30 |     array_from_fn(const move |i| {
   |     ------------- -------------- in this closure
   |     |
   |     expects `Fn` instead of `FnMut`
...
34 |             x *= i;
   |             ^^^^^^ cannot assign

We also need to make f a mut parameter, and can remove the const_refs_to_cell unstable feature.

const fn array_from_fn<T, const N: usize>(mut f: impl FnMut(usize) -> T + Copy) -> [T; N] {
    let mut array = MaybeUninit::uninit_array::<N>();
    let mut i = 0;
    loop {
        array[i].write(f(i));
        i += 1;
        if i == N {
            break;
        }
    }
    unsafe { MaybeUninit::array_assume_init(array) }
}
factorials = [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]

Final boss: fetching even elements

With that, we can come back to our original problem: fetching even elements from two SIMD vectors. Starting with a simple case, we can indeed write a function that operates on a fixed vector type, such as u32x8.

fn fetch_even(x: u32x8, y: u32x8) -> u32x8 {
    simd_swizzle!(x, y, array_from_fn::<_, 8>(const |i| 2 * i))
}

fn main() {
    let x = Simd::from_array(std::array::from_fn(|i| i as u32));
    let y = Simd::from_array(std::array::from_fn(|i| 8 + i as u32));
    let z = fetch_even(x, y);
    assert_eq!(<[u32; 8]>::from(x), [0, 1, 2, 3, 4, 5, 6, 7]);
    assert_eq!(<[u32; 8]>::from(y), [8, 9, 10, 11, 12, 13, 14, 15]);
    assert_eq!(<[u32; 8]>::from(z), [0, 2, 4, 6, 8, 10, 12, 14]);
}

We can easily generalize this to any SIMD type with a fixed number of lanes (here still 8).

fn fetch_even<T: SimdElement>(x: Simd<T, 8>, y: Simd<T, 8>) -> Simd<T, 8> {
    simd_swizzle!(x, y, array_from_fn::<_, 8>(const |i| 2 * i))
}

What’s more interesting is to generalize the number of lanes. I’d like to write the following:

fn fetch_even<const N: usize>(x: Simd<u32, N>, y: Simd<u32, N>) -> Simd<u32, N>
where
    LaneCount<N>: SupportedLaneCount,
{
    // error[E0401]: can't use generic parameters from outer item
    simd_swizzle!(x, y, array_from_fn::<_, { N }>(const |i| 2 * i))
}

Unfortunately, this doesn’t work.

error[E0401]: can't use generic parameters from outer item
  --> src/main.rs:34:46
   |
29 | fn fetch_even<const N: usize>(x: Simd<u32, N>, y: Simd<u32, N>) -> Simd<u32, N>
   |                     - const parameter from outer item
...
34 |     simd_swizzle!(x, y, array_from_fn::<_, { N }>(const |i| 2 * i))
   |     -----------------------------------------^---------------------
   |     |                                        |
   |     |                                        use of generic parameter from outer item
   |     help: try introducing a local generic parameter here: `<N>`

Using the simd_swizzle! macro simply doesn’t work in this case. However, a workaround is to define a new type and implement the Swizzle trait for it.

struct FetchEven;

impl<const N: usize> Swizzle<N> for FetchEven {
    const INDEX: [usize; N] = array_from_fn(const |i| 2 * i);
}

fn main() {
    let x = Simd::from_array(std::array::from_fn(|i| i as u32));
    let y = Simd::from_array(std::array::from_fn(|i| 8 + i as u32));
    let z = FetchEven::concat_swizzle(x, y);
    assert_eq!(<[u32; 8]>::from(x), [0, 1, 2, 3, 4, 5, 6, 7]);
    assert_eq!(<[u32; 8]>::from(y), [8, 9, 10, 11, 12, 13, 14, 15]);
    assert_eq!(<[u32; 8]>::from(z), [0, 2, 4, 6, 8, 10, 12, 14]);
}

At this point I wished I could present a full FFT-like snippet, but I got blocked by a compiler error when trying to fetch odd elements.

struct FetchOdd;

impl<const N: usize> Swizzle<N> for FetchOdd {
    const INDEX: [usize; N] = array_from_fn(const |i| 2 * i + 1);
}
thread 'rustc' panicked at /rustc/ada5e2c7b5427a591e30baeeee2698a5eb6db0bd/compiler/rustc_middle/src/ty/util.rs:945:22:
ConstContext::Maybe must have host effect param

You know the experiment with incomplete nightly features has gone too far when you encounter an Internal Compiler Error (ICE). I reported this one as rust-lang/rust#125866. Time to contribute to the Rust compiler, sponsor contributors and/or be patient :)

Here is my final working code.

#![feature(
    const_closures,
    const_maybe_uninit_array_assume_init,
    const_maybe_uninit_uninit_array,
    const_maybe_uninit_write,
    const_mut_refs,
    effects,
    maybe_uninit_array_assume_init,
    maybe_uninit_uninit_array,
    portable_simd,
)]

use std::mem::MaybeUninit;
use std::simd::*;

const fn array_from_fn<T, const N: usize>(mut f: impl FnMut(usize) -> T + Copy) -> [T; N] {
    let mut array = MaybeUninit::uninit_array::<N>();
    let mut i = 0;
    loop {
        array[i].write(f(i));
        i += 1;
        if i == N {
            break;
        }
    }
    unsafe { MaybeUninit::array_assume_init(array) }
}

struct FetchEven;
impl<const N: usize> Swizzle<N> for FetchEven {
    const INDEX: [usize; N] = array_from_fn(const |i| 2 * i);
}

fn main() {
    let x = Simd::from_array(std::array::from_fn(|i| i as u32));
    let y = Simd::from_array(std::array::from_fn(|i| 8 + i as u32));
    let z = FetchEven::concat_swizzle(x, y);
    assert_eq!(<[u32; 8]>::from(x), [0, 1, 2, 3, 4, 5, 6, 7]);
    assert_eq!(<[u32; 8]>::from(y), [8, 9, 10, 11, 12, 13, 14, 15]);
    assert_eq!(<[u32; 8]>::from(z), [0, 2, 4, 6, 8, 10, 12, 14]);
}

One more step: panic safety

Assuming the necessary language features are stabilized, are we done? As I mentioned in the beginning, the actual implementation of array::from_fn is far from trivial, calling into a chain of helper functions.

#[inline]
#[stable(feature = "array_from_fn", since = "1.63.0")]
pub fn from_fn<T, const N: usize, F>(cb: F) -> [T; N]
where
    F: FnMut(usize) -> T,
{
    try_from_fn(NeverShortCircuit::wrap_mut_1(cb)).0
}

#[inline]
#[unstable(feature = "array_try_from_fn", issue = "89379")]
pub fn try_from_fn<R, const N: usize, F>(cb: F) -> ChangeOutputType<R, [R::Output; N]>
where
    F: FnMut(usize) -> R,
    R: Try,
    R::Residual: Residual<[R::Output; N]>,
{
    let mut array = MaybeUninit::uninit_array::<N>();
    match try_from_fn_erased(&mut array, cb) {
        ControlFlow::Break(r) => FromResidual::from_residual(r),
        ControlFlow::Continue(()) => {
            // SAFETY: All elements of the array were populated.
            try { unsafe { MaybeUninit::array_assume_init(array) } }
        }
    }
}

// ...

Part of this complexity is a red herring, as array::from_fn is implemented in terms of the companion array::try_from_fn (where the closure is allowed to be fallible). But another part is legitimate, to ensure exception safety (sometimes also called panic safety).

One problem in particular is what to do if a panic occurs when creating an element: in that case part of the array is already initialized, while the other part isn’t. The Rust language handles panics with one of two strategies, as controlled by the panic setting in Cargo.toml:

There isn’t much to worry about in the second case (resources like allocated memory and file descriptors will be freed by the operating system). Unwinding is more insteresting: objects are dropped as the panic handling walks back the stack (potentially calling custom destructor code), and the panic may eventually be caught via the catch_unwind() function. This behavior is akin to exception handling in C++.

Back to our array initialization, one thing we should do in case of panic is drop the items that have already been initialized. Otherwise, it’s for example possible to trigger memory leaks if these items contain heap allocations, or deadlocks if these items contain mutex guards.

FFT step Panic with a partially initialized array.

Taking a step back from const considerations, here is an example of program that will reach an out-of-memory error with a naive implementation of array_from_fn that doesn’t try to handle panics.

#![feature(
    maybe_uninit_array_assume_init,
    maybe_uninit_slice,
    maybe_uninit_uninit_array
)]

use std::mem::MaybeUninit;

fn main() {
    // Loop enough times to exhaust memory on the Rust Playground configuration.
    for _ in 0..50 {
        let result = std::panic::catch_unwind(|| {
            // Create an array of 100 items, each allocating 1MB, but the
            // function panics creating the 51th item.
            let x: [_; 100] = array_from_fn(|i| {
                if i < 50 {
                    vec![i as u8; 1_000_000]
                } else {
                    panic!("overflow!")
                }
            });
        });
        assert!(result.is_err());
    }
    println!("Hello, world!");
}

fn array_from_fn<T, const N: usize>(mut f: impl FnMut(usize) -> T + Copy) -> [T; N] {
    let mut array = MaybeUninit::uninit_array::<N>();
    for i in 0..N {
        array[i].write(f(i));
    }
    unsafe { MaybeUninit::array_assume_init(array) }
}

Running this code on the Rust Playground will eventually lead to an out-of-memory error (and the program getting killed).

thread 'main' panicked at src/main.rs:19:21:
overflow!
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at src/main.rs:19:21:
overflow!
thread 'main' panicked at src/main.rs:19:21:
overflow!
...
thread 'main' panicked at src/main.rs:19:21:
overflow!
Exited with signal 9 (SIGKILL): kill program

The proper way to handle this, as done by the standard library, is to have a special guard object that keeps track of the number of initialized items, and drops them if a panic occurs before the whole array has been created. As you can see, this adds quite a bit of code.

fn array_from_fn<T, const N: usize>(mut f: impl FnMut(usize) -> T + Copy) -> [T; N] {
    let mut array = MaybeUninit::uninit_array::<N>();
    let mut guard = Guard {
        array_mut: &mut array,
        initialized: 0,
    };
    for i in 0..N {
        guard.array_mut[i].write(f(i));
        guard.initialized += 1;
    }
    std::mem::forget(guard);
    unsafe { MaybeUninit::array_assume_init(array) }
}

struct Guard<'a, T, const N: usize> {
    array_mut: &'a mut [MaybeUninit<T>; N],
    initialized: usize,
}

impl<T, const N: usize> Drop for Guard<'_, T, N> {
    fn drop(&mut self) {
        eprintln!("Panic detected, dropping {} items", self.initialized);
        // SAFETY: this slice will contain only initialized objects.
        unsafe {
            std::ptr::drop_in_place(MaybeUninit::slice_assume_init_mut(
                self.array_mut.get_unchecked_mut(..self.initialized),
            ));
        }
    }
}

This time the program completes normally.

thread 'main' panicked at src/main.rs:19:21:
overflow!
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Panic detected, dropping 50 items
...
thread 'main' panicked at src/main.rs:19:21:
overflow!
Panic detected, dropping 50 items
Hello, world!

Note that even though I’ve mentioned “exception safety”, memory leaks and deadlocks are not in scope for Rust’s notion of safety, which focuses on memory safety and undefined behavior. In particular, reference cycles can leak memory, as was learned the hard way a few months before the Rust 1.0 release. Still, it is generally accepted that standard library types should drop already initialized items in case of panic whenever that’s possible.

It’s also clear that this creates additional code that isn’t useful at all when panic = "abort" is configured. And things are changing in this space: the mutex poisoning logic is disabled since Rust 1.78 unless panicking is in unwind mode. This leverages the cfg(panic = "...") conditional compilation offered by the language.

A lot of places in the standard library could be adapted in a similar way to reduce the cost of unwinding for those who don’t unwind, but that requires a lot of code changes. Niko Matsakis recently proposed a more radical approach: deprecating panic = "unwind" altogether. This would definitely be a breaking change and therefore unlikely to happen in my opinion, but would simplify the work of library authors who wouldn’t have to worry about carefully handling panics.

Conclusion

What started as a seemingly simple problem – a naive implementation of array::from_fn would only take five lines of code – ended up much more complex. It’s of course frustrating when a given function isn’t available, but it’s often easy to overlook all the work that happens behind the scenes of fundamental standard library functions. Trying to (re-)implement some of it oneself is a great way to appreciate the contributions of others!

In my case, doing this re-implementation exercise made it clear that there are fundamental blockers in the language that need to be addressed before array::from_fn can be const. But in other cases, you may end up with an implementation ready to contribute back to the community, either directly in the standard library or in a separate crate.

Don’t be afraid to look behind the scenes and learn or contribute something on the way!

Bonus 1: Can we use a macro instead?

Often times, a way to work around unsupported const generics features in Rust is to leverage the macro system. So you may wonder if we could use a macro to generate an array of integers, rather than resorting to countless unstable features.

Let’s start with a simple case: a macro that takes an integer and outputs an array from zero to that integer. For example, integers!(5) should expand to [0, 1, 2, 3, 4, 5].

Rust macros come in two flavors: declarative macros and procedural macros. Declarative macros define rules in terms of a grammar, directly in the source code. I find them nice as they are also hygienic, with a type system over metavariables (e.g. expressions, literals, types), reducing the chances of programming errors. However, they are also inflexible: even though you can match a generic expression, you cannot transform it, nor specialize it to a base case where it is zero.

// A first attempt that doesn't work.
macro_rules! integers {
    ($i:tt) => {
        [integer_sequence!($i)]
    };
}

macro_rules! integer_sequence {
    (0) => { 0 };
    // "$i - 1" expands "42" into "42 - 1", not into "41".
    ($i:expr) => { $i, integer_sequence!($i - 1) };
}

Procedural macros are more flexible, and indeed the seq-macro crate allows to generate an array of consecutive integers. This could be adapted to generate an array of shuffled pairs.

let numbers = seq_macro::seq!(i in 0..=5 { [ #( i, )* ] });
assert_eq!(numbers, [0, 1, 2, 3, 4, 5]);

However, that wouldn’t be really useful: as mentioned in the preamble, the macro would receive as parameter a const generic parameter N, not a specific integer. So there isn’t any specific array the macro should expand to!

impl<const N: usize> Swizzle<N> for FetchEven {
    // What tokens should "even_integers!(N)" expand to?
    // N is a const generic parameter, not a specific integer!
    const INDEX: [usize; N] = even_integers!(N);
}

The macro approach would simply not work for this problem – unless one puts the whole computation in a big macro and discards const generics altogether, which wouldn’t be very practical.

Bonus 2: How efficient is std::simd::simd_swizzle!?

One thing you may wonder is whether the code generated by simd_swizzle! is optimal, or if one should use lower-level intrinsics or inline assembly to get better performance.

This is a legitimate question: for example on Intel the generic SIMD permutation instruction on u32x8 (vpermps) has 3 cycles of latency and 1 cycle of throughput, whereas more specific instructions such as vpermilps have only 1 cycle of latency and 1 cycle of throughput.

A great tool to confirm that is the Compiler Explorer. To test the Rust compiler optimizations, I used the following program which contains shuffles that are either arbitrary or correspond to more specific intrinsics.

I’m using extern "C" to obtain a simpler assembly without moves from and to the stack around the relevant instructions. However, SIMD types are not FFI-safe and will trigger a warning. See also rust-lang/rust#53346 for a longer explanation. Don’t repeat this at home!

The #[no_mangle] annotation is simply to help the Compiler Explorer.

#![feature(portable_simd)]
use std::simd::*;

// Intel-specific intrinsic
#[no_mangle]
pub extern "C" fn unpackhi_epi32(x: u32x8, y: u32x8) -> u32x8 {
    simd_swizzle!(x, y, [2, 10, 3, 11, 6, 14, 7, 15])
}

// ARM-specific intrinsic
#[no_mangle]
pub extern "C" fn vzip1q_u8(x: u8x16, y: u8x16) -> u8x16 {
    simd_swizzle!(x, y, [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23])
}

// Relatively simple permutation
#[no_mangle]
pub extern "C" fn swap_pairs(x: u32x8) -> u32x8 {
    simd_swizzle!(x, [1, 0, 3, 2, 5, 4, 7, 6])
}

// Arbitrary shuffle
#[no_mangle]
pub extern "C" fn arbitrary_shuffle(x: u32x8) -> u32x8 {
    simd_swizzle!(x, [2, 7, 4, 1, 3, 5, 0, 6])
}

Compiling this on an Intel CPU with AVX2 instructions gives a well-optimized assembly (Rust flags: -O -C target-cpu=haswell).

unpackhi_epi32:
        vunpckhps       ymm0, ymm0, ymm1
        ret

vzip1q_u8:
        vpunpcklbw      xmm0, xmm0, xmm1
        ret

swap_pairs:
        vshufps ymm0, ymm0, ymm0, 177
        ret

.LCPI3_0:
        .long   2
        .long   7
        .long   4
        .long   1
        .long   3
        .long   5
        .long   0
        .long   6
arbitrary_shuffle:
        vmovaps ymm1, ymmword ptr [rip + .LCPI3_0]
        vpermps ymm0, ymm1, ymm0
        ret

On an ARM CPU with NEON instructions, the output is quite optimized as well (Rust flags: -O --target=aarch64-unknown-linux-gnu).

unpackhi_epi32:
        zip2    v1.4s, v1.4s, v3.4s
        zip2    v0.4s, v0.4s, v2.4s
        ret

vzip1q_u8:
        zip1    v0.16b, v0.16b, v1.16b
        ret

swap_pairs:
        rev64   v0.4s, v0.4s
        rev64   v1.4s, v1.4s
        ret

arbitrary_shuffle:
        ext     v2.16b, v0.16b, v1.16b, #4
        ext     v3.16b, v0.16b, v0.16b, #4
        trn2    v0.4s, v1.4s, v2.4s
        ext     v1.16b, v3.16b, v1.16b, #12
        ext     v0.16b, v0.16b, v2.16b, #4
        zip2    v1.4s, v3.4s, v1.4s
        ret

What’s interesting is to inspect the LLVM intermediate representation, which you can view in the Compiler Explorer with the --emit=llvm-ir flag. This shows that this each permutation is encoded in terms of the shufflevector LLVM instruction, and that LLVM later selects the best instructions depending on the target CPU.

define <8 x i32> @unpackhi_epi32(<8 x i32> %x, <8 x i32> %y) unnamed_addr #0 !dbg !6 {
  %0 = shufflevector <8 x i32> %x, <8 x i32> %y, <8 x i32> <i32 2, i32 10, i32 3, i32 11, i32 6, i32 14, i32 7, i32 15>, !dbg !11
  ret <8 x i32> %0, !dbg !21
}

define <16 x i8> @vzip1q_u8(<16 x i8> %x, <16 x i8> %y) unnamed_addr #0 !dbg !22 {
  %0 = shufflevector <16 x i8> %x, <16 x i8> %y, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>, !dbg !23
  ret <16 x i8> %0, !dbg !28
}

define <8 x i32> @swap_pairs(<8 x i32> %x) unnamed_addr #0 !dbg !29 {
  %0 = shufflevector <8 x i32> %x, <8 x i32> poison, <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>, !dbg !30
  ret <8 x i32> %0, !dbg !35
}

define <8 x i32> @arbitrary_shuffle(<8 x i32> %x) unnamed_addr #0 !dbg !36 {
  %0 = shufflevector <8 x i32> %x, <8 x i32> poison, <8 x i32> <i32 2, i32 7, i32 4, i32 1, i32 3, i32 5, i32 0, i32 6>, !dbg !37
  ret <8 x i32> %0, !dbg !42
}

In fact, using the respective intrinsics gives the exact same output as the simd_swizzle! macro, both in terms of LLVM IR and assembly.

#![feature(portable_simd)]
use std::simd::*;
use std::arch::x86_64::*;

#[no_mangle]
pub extern "C" fn unpackhi_epi32_intrinsic(x: u32x8, y: u32x8) -> u32x8 {
    unsafe { _mm256_unpackhi_epi32(x.into(), y.into()).into() }
}
define <8 x i32> @unpackhi_epi32_intrinsic(<8 x i32> %x, <8 x i32> %y) unnamed_addr #0 !dbg !7 {
  %0 = shufflevector <8 x i32> %x, <8 x i32> %y, <8 x i32> <i32 2, i32 10, i32 3, i32 11, i32 6, i32 14, i32 7, i32 15>, !dbg !12
  ret <8 x i32> %0, !dbg !22
}
unpackhi_epi32_intrinsic:
        vunpckhps       ymm0, ymm0, ymm1
        ret

These experiments show that the simd_swizzle! macro is a zero-cost abstraction: both user friendly and well optimized. Once the portable_simd feature is stabilized, there should be no need to learn hundreds of CPU-specific intrinsics anymore.

This also shows intrinsics are not tied to specific instructions but can be further optimized by LLVM, so it’s pointless to use intrinsics to force the Rust compiler to generate a specific sequence of instructions. Here is one last example to illustrate that.

#![feature(portable_simd)]
use std::arch::x86_64::*;
use std::simd::*;

#[no_mangle]
pub extern "C" fn swap_pairs_two_intrinsics(x: u32x8) -> u32x8 {
    unsafe {
        _mm256_permutevar8x32_epi32(
            x.into(),
            _mm256_set_epi32(6, 7, 4, 5, 2, 3, 0, 1),
        ).into()
    }
}

#[no_mangle]
pub extern "C" fn swap_pairs_one_intrinsic(x: f32x8) -> f32x8 {
    unsafe {
        _mm256_shuffle_ps(
            x.into(),
            x.into(),
            0b10_11_00_01,
        ).into()
    }
}
swap_pairs_two_intrinsics:
        vshufps ymm0, ymm0, ymm0, 177
        ret

swap_pairs_one_intrinsic:
        vshufps ymm0, ymm0, ymm0, 177
        ret

This post was edited to take into account feedback on reddit, in particular about the Destruct trait.


Comments

To react to this blog post please check the Mastodon thread, the Lobste.rs thread and the Reddit thread.


RSS | Mastodon | GitHub


You may also like

Asynchronous streams in Rust (part 1) - Futures, buffering and mysterious compilation error messages
STV-rs: Single Transferable Vote implementation in Rust
Why my Rust benchmarks were wrong, or how to correctly use std::hint::black_box?
Optimization adventures: making a parallel Rust workload 10x faster with (or without) Rayon
And 32 more posts on this blog!