Making a const version of Rust's array::from_fn - How hard can it be?
Writing const
functions has been supported since Rust 1.31 in 2018.
These functions can be evaluated at compile time, which is for example useful to shift expensive calculations before the program runs.
Knowing values at compile time is also necessary for const
generic parameters, a feature available since Rust 1.51 in 2021.
Even though const functions have existed for 6 years, only a subset of Rust is available in const context, as tracked in rust-lang/rust#57563.
As this subset expands, more and more functions in the standard library become exposed as const fn
.
A recent example is the inline_const
feature, stabilized in Rust 1.79.
In this post, I will focus on std::array::from_fn
, a function that is unfortunately not const.
After explaining why I’d find it useful in const contexts, I’ll investigate why it’s not const yet and what it would really take to make it so (spoiler: with lots of unstable Rust features).
For reproducibility, the code snippets in this post have been tested with Rust nightly version 1.81.
$ cargo +nightly -V cargo 1.81.0-nightly (a1f47ec3f 2024-06-15)
- Preamble: why
array::from_fn
is useful at compile time? - A minimal implementation
- Does it work?
- Is it usable?
- Final boss: fetching even elements
- One more step: panic safety
- Conclusion
- Bonus 1: Can we use a macro instead?
- Bonus 2: How efficient is
std::simd::simd_swizzle!
?
Preamble: why array::from_fn
is useful at compile time?
My journey started by investigating an experimental feature of Rust: portable_simd
.
Exposed in the std::simd
module, this exposes SIMD operations in a hardware-agnostic way.
In one sentence, SIMD allows to more efficiently calculate arithmetic operations on multiple values in parallel, by packing multiple values into a single register.
It is widely available in one form or another in most modern CPUs.
For example, the u32x8
type represents 8 elements of type u32
packed together.
As such, the most common use of SIMD operations is to speed-up element-wise operations on arrays.
#![feature(portable_simd)]
use std::simd::u32x8;
fn main() {
let x = u32x8::from_array([0, 1, 2, 3, 4, 5, 6, 7]);
let y = u32x8::from_array([8, 9, 10, 11, 12, 13, 14, 15]);
// Element-wise sum.
let z: [u32; 8] = (x + y).into();
assert_eq!(z, [8, 10, 12, 14, 16, 18, 20, 22]);
}
Another useful operation is shuffling elements within registers, which is exposed via the simd_swizzle!
macro
#![feature(portable_simd)]
use std::simd::{simd_swizzle, u32x8};
fn main() {
let x = u32x8::from_array([42, 43, 44, 45, 46, 47, 48, 49]);
let y = u32x8::from_array([50, 51, 52, 53, 54, 55, 56, 57]);
// Swap pairs of consecutive elements.
let z = simd_swizzle!(x, [1, 0, 3, 2, 5, 4, 7, 6]);
let z: [u32; 8] = z.into();
assert_eq!(z, [43, 42, 45, 44, 47, 46, 49, 48]);
// Select even elements.
let t = simd_swizzle!(x, y, [0, 2, 4, 6, 8, 10, 12, 14]);
let t: [u32; 8] = t.into();
assert_eq!(t, [42, 44, 46, 48, 50, 52, 54, 56]);
}
One practical application of shuffles is implementing Fast Fourier Transforms, where you typically need to add and subtract consecutive elements.
Simplified Fast Fourier Transform step.
This non-trivial operation can be implemented from basic SIMD operations thanks to shuffling elements.
FFT step implemented with SIMD shuffles.
#![feature(portable_simd)]
use std::simd::{simd_swizzle, u32x8};
// Ceci n'est pas une FFT.
fn foo(x: u32x8, y: u32x8) -> (u32x8, u32x8) {
let even = simd_swizzle!(x, y, [0, 2, 4, 6, 8, 10, 12, 14]);
let odd = simd_swizzle!(x, y, [1, 3, 5, 7, 9, 11, 13, 15]);
let a = even + odd;
let b = even - odd;
let x = simd_swizzle!(a, b, [0, 8, 1, 9, 2, 10, 3, 11]);
let y = simd_swizzle!(a, b, [4, 12, 5, 13, 6, 14, 7, 15]);
(x, y)
}
As you may have noticed, the simd_swizzle!
macro takes an array parameter that describes the permutation in terms of indices: this array is in fact a compile-time array.
And this is where array::from_fn
would be useful, to automatically generate the sequence of even or odd indices.
// The code I wish to write:
let even = simd_swizzle!(x, y, array::from_fn(|i| 2 * i));
let odd = simd_swizzle!(x, y, array::from_fn(|i| 2 * i + 1));
This is even more useful if we want to generalize the FFT implementation to any number of lanes (not necessarily 8).
Indeed, the number of lanes supported natively depends on the CPU architecture, so making the code generic and instantiating it based on CPU support can yield better performance.
For that, the portable SIMD library has a generic Simd
type.
#![feature(portable_simd)]
use std::array;
use std::simd::{simd_swizzle, LaneCount, Simd, SupportedLaneCount};
fn foo<const N: usize>(x: Simd<u32, N>, y: Simd<u32, N>) -> (Simd<u32, N>, Simd<u32, N>)
where
LaneCount<N>: SupportedLaneCount,
{
let even = simd_swizzle!(x, y, /* insert array of N indices */);
let odd = simd_swizzle!(x, y, /* insert array of N indices */);
(even, odd)
}
Alternatively to the simd_swizzle!
macro, you may try to directly implement the Swizzle
trait for every possible length.
#![feature(portable_simd)]
use std::simd::{LaneCount, Simd, SupportedLaneCount, Swizzle};
fn foo<const N: usize>(x: Simd<u32, N>, y: Simd<u32, N>) -> (Simd<u32, N>, Simd<u32, N>)
where
LaneCount<N>: SupportedLaneCount,
{
let even = FetchEven::concat_swizzle(x, y);
...
}
// Custom type to represent a swizzle operation that fetches even indices.
struct FetchEven;
// Let's implement it for all valid lane counts.
impl Swizzle<1> for FetchEven {
const INDEX: [usize; 1] = [0];
}
impl Swizzle<2> for FetchEven {
const INDEX: [usize; 2] = [0, 2];
}
impl Swizzle<4> for FetchEven {
const INDEX: [usize; 4] = [0, 2, 4, 6];
}
impl Swizzle<8> for FetchEven { ... }
impl Swizzle<16> for FetchEven { ... }
impl Swizzle<32> for FetchEven { ... }
impl Swizzle<64> for FetchEven { ... }
However, this approach will not work: even though SupportedLaneCount
is a sealed trait and you implement it for all acceptable values of N
(powers of two from 1 to 64), the compiler will not allow that.
error[E0277]: the trait bound `FetchEven: Swizzle<N>` is not satisfied
--> src/main.rs:8:16
|
8 | let even = FetchEven::concat_swizzle(x, y);
| ^^^^^^^^^ the trait `Swizzle<N>` is not implemented for `FetchEven`
|
= help: the following other types implement trait `Swizzle<N>`:
`FetchEven` implements `Swizzle<16>`
`FetchEven` implements `Swizzle<1>`
`FetchEven` implements `Swizzle<2>`
`FetchEven` implements `Swizzle<32>`
`FetchEven` implements `Swizzle<4>`
`FetchEven` implements `Swizzle<64>`
`FetchEven` implements `Swizzle<8>`
What we really need is to implement FetchEven
for any N
, which requires calling array::from_fn
in a const
context.
impl<const N: usize> Swizzle<N> for FetchEven {
const INDEX: [usize; N] = std::array::from_fn(|i| 2 * i);
}
This brings us to the topic of this post.
error[E0015]: cannot call non-const fn `std::array::from_fn::<usize, N, {closure@src/main.rs:15:51: 15:54}>` in constants
--> src/main.rs:15:31
|
15 | const INDEX: [usize; N] = std::array::from_fn(|i| 2 * i);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: calls in constants are limited to constant functions, tuple structs and tuple variants
As mentioned by SkiFire13 on Reddit, it’s possible to directly generate an const array without using
array::from_fn
. That said, it’s much more verbose and therefore less ergonomic.struct FetchEven; impl<const N: usize> Swizzle<N> for FetchEven { const INDEX: [usize; N] = { let mut out = [0; N]; let mut i = 0; while i < N { out[i] = 2 * i; i += 1; } out }; }
A minimal implementation
Let’s try to implement a compile-time array::from_fn
.
How hard can it be?
The first idea would be to take the existing implementation and add some const
keywords on it.
However, you’ll quickly realize that this implementation isn’t that simple: the std::array
module contains almost 1000 lines at the time of writing (including documentation), of which array::from_fn()
isn’t particularly trivial.
A more pragmatic approach is to implement it by hand: with MaybeUninit
and a loop it should be easy, right?
use std::mem::MaybeUninit;
// A first naive implementation.
const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T) -> [T; N] {
let mut array = MaybeUninit::uninit_array::<N>();
for i in 0..N {
array[i].write(f(i));
}
unsafe { MaybeUninit::array_assume_init(array) }
}
Not that easy. The compiler greets us with several rather straightforward errors, and guides us to use the following unstable features.
error[E0658]: `for` is not allowed in a `const fn`
--> src/main.rs:6:5
|
6 | / for i in 0..N {
7 | | array[i].write(f(i));
8 | | }
| |_____^
|
= note: see issue #87575 <https://github.com/rust-lang/rust/issues/87575> for more information
= help: add `#![feature(const_for)]` to the crate attributes to enable
error[E0658]: use of unstable library feature 'maybe_uninit_uninit_array'
--> src/main.rs:5:21
|
5 | let mut array = MaybeUninit::uninit_array::<N>();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: see issue #96097 <https://github.com/rust-lang/rust/issues/96097> for more information
= help: add `#![feature(maybe_uninit_uninit_array)]` to the crate attributes to enable
error[E0658]: use of unstable library feature 'maybe_uninit_array_assume_init'
--> src/main.rs:9:14
|
9 | unsafe { MaybeUninit::array_assume_init(array) }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: see issue #96097 <https://github.com/rust-lang/rust/issues/96097> for more information
= help: add `#![feature(maybe_uninit_array_assume_init)]` to the crate attributes to enable
For more information about this error, try `rustc --explain E0658`.
const_for
, to be able to usefor
loops inconst
functions. For loops are not as trivial as they seem, as under the hood they desugar to using theIterator
trait, and that itself requires being able to use traits inconst
contexts.maybe_uninit_array_assume_init
,maybe_uninit_uninit_array
because the corresponding functions are not yet stable.
#![feature(
const_for,
maybe_uninit_array_assume_init,
maybe_uninit_uninit_array,
)]
At the next iteration, the compiler suggests a few more unstable features.
const_maybe_uninit_array_assume_init
,const_maybe_uninit_uninit_array
andconst_maybe_uninit_write
to use the corresponding functions inconst
contexts.const_trait_impl
as a follow-up ofconst_for
, to use the underlyingIterator
trait in the for loop.const_mut_refs
to be able to mutate thearray
, as well as the implicit iterator in the for loop.const_refs_to_cell
to be able to call the function and overcome the following error.
error[E0658]: cannot borrow here, since the borrowed element may contain interior mutability
--> src/main.rs:15:24
|
15 | array[i].write(f(i));
| ^
|
= note: see issue #80384 <https://github.com/rust-lang/rust/issues/80384> for more information
= help: add `#![feature(const_refs_to_cell)]` to the crate attributes to enable
The code doesn’t compile yet, but we now use 9 unstable features!
#![feature(
const_for,
const_maybe_uninit_array_assume_init,
const_maybe_uninit_uninit_array,
const_maybe_uninit_write,
const_mut_refs,
const_refs_to_cell,
const_trait_impl,
maybe_uninit_array_assume_init,
maybe_uninit_uninit_array,
)]
At this point, things start to get spicy.
First, the compiler mentions that iterating over a range doesn’t work because the Iterator::next()
function isn’t const
, but that the effects
feature should fix that.
error[E0015]: cannot convert `std::ops::Range<usize>` into an iterator in constant functions
--> src/main.rs:17:14
|
17 | for i in 0..N {
| ^^^^
|
note: impl defined here, but it is not `const`
--> /rustc/d7f6ebacee13b6c03623c4b74197280454ede8de/library/core/src/iter/traits/collect.rs:349:1
= note: calls in constant functions are limited to constant functions, tuple structs and tuple variants
help: add `#![feature(effects)]` to the crate attributes to enable
Likewise, the function parameter f
isn’t const, but adding a trait bound ~const Fn(usize)
is supposed to fix it.
However, the ~const
syntax appears to be deprecated, according to rust-lang/rust#110395.
error[E0015]: cannot call non-const closure in constant functions
--> src/main.rs:18:24
|
18 | array[i].write(f(i));
| ^^^^
|
= note: calls in constant functions are limited to constant functions, tuple structs and tuple variants
help: consider further restricting this bound
|
15 | const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + ~const Fn(usize)) -> [T; N] {
| ++++++++++++++++++
help: add `#![feature(effects)]` to the crate attributes to enable
Additionally, we’re told that the destructor of the function object f
cannot be called in const
contexts.
error[E0493]: destructor of `impl Fn(usize) -> T` cannot be evaluated at compile-time
--> src/main.rs:15:43
|
15 | const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T) -> [T; N] {
| ^ the destructor for this type cannot be evaluated in constant functions
...
21 | }
| - value is dropped here
For now, let’s simply add the effects
feature.
Rounded up to 10 features!
#![feature(
const_for,
const_maybe_uninit_array_assume_init,
const_maybe_uninit_uninit_array,
const_maybe_uninit_write,
const_mut_refs,
const_refs_to_cell,
const_trait_impl,
effects,
maybe_uninit_array_assume_init,
maybe_uninit_uninit_array,
)]
use std::mem::MaybeUninit;
const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T) -> [T; N] {
let mut array = MaybeUninit::uninit_array::<N>();
for i in 0..N {
array[i].write(f(i));
}
unsafe { MaybeUninit::array_assume_init(array) }
}
No more complaint about ~const
, only the destructor question about the parameter f
remains.
Unfortunately, the compiler doesn’t propose any fix, but after digging into the documentation I found that Drop
and Copy
are exclusive.
So if we require f
to be Copy
, we should hopefully not have to worry about its destructor.
And the following code compiles! More unstable features than lines of actual business logic!
#![feature(
const_for,
const_maybe_uninit_array_assume_init,
const_maybe_uninit_uninit_array,
const_maybe_uninit_write,
const_mut_refs,
const_refs_to_cell,
const_trait_impl,
effects,
maybe_uninit_array_assume_init,
maybe_uninit_uninit_array,
)]
use std::mem::MaybeUninit;
const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + Copy) -> [T; N] {
let mut array = MaybeUninit::uninit_array::<N>();
for i in 0..N {
array[i].write(f(i));
}
unsafe { MaybeUninit::array_assume_init(array) }
}
As mentioned by CAD1997 on Reddit, the
Destruct
trait is a more suitable bound thanCopy
. This is a special trait for types that can be dropped inconst
contexts.use std::marker::Destruct; const fn array_from_fn<T, const N: usize>( f: impl Fn(usize) -> T + ~const Destruct, ) -> [T; N] { ... }
An example of function that is
~const Destruct
but notCopy
is a closure that captures a non-copy object, such as aVec
.fn main() { let v = vec![42]; let array: [_; 5] = array_from_fn(move |_| v.clone()); assert_eq!(array, [vec![42], vec![42], vec![42], vec![42], vec![42]]); }
This code compiles fine with the
~const Destruct
bound, but returns the following error with theCopy
bound.error[E0277]: the trait bound `Vec<i32>: Copy` is not satisfied in `{closure@src/main.rs:26:39: 26:47}` --> src/main.rs:26:39 | 26 | let array: [_; 5] = array_from_fn(move |_| v.clone()); | ------------- --------^^^^^^^^^^ | | | | | within `{closure@src/main.rs:26:39: 26:47}`, the trait `Copy` is not implemented for `Vec<i32>`, which is required by `{closure@src/main.rs:26:39: 26:47}: Copy` | | within this `{closure@src/main.rs:26:39: 26:47}` | required by a bound introduced by this call | note: required because it's used within this closure --> src/main.rs:26:39 | 26 | let array: [_; 5] = array_from_fn(move |_| v.clone()); | ^^^^^^^^ note: required by a bound in `array_from_fn` --> src/main.rs:16:68 | 16 | const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + Copy) -> [T; N] { | ^^^^ required by this bound in `array_from_fn`
Does it work?
Well, our custom array_from_fn
compiles, but we haven’t even tried to call it, so does it really compile?
Let’s try with a simple case.
const ARRAY: [usize; 10] = array_from_fn(|i| 2 * i);
At this point, the compiler back-tracks and tells us that the iterator cannot in fact be called in the for loop.
error[E0080]: evaluation of constant value failed
--> src/main.rs:18:14
|
18 | for i in 0..N {
| ^^^^ calling non-const function `<std::ops::Range<usize> as IntoIterator>::into_iter`
|
note: inside `array_from_fn::<usize, 10, {closure@src/main.rs:24:42: 24:45}>`
--> src/main.rs:18:14
|
18 | for i in 0..N {
| ^^^^
note: inside `ARRAY`
--> src/main.rs:24:28
|
24 | const ARRAY: [usize; 10] = array_from_fn(|i| 2 * i);
| ^^^^^^^^^^^^^^^^^^^^^^^^
Fortunately, we can still manually unroll the loop, and remove a couple unstable features (const_for
, const_trait_impl
).
const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + Copy) -> [T; N] {
let mut array = MaybeUninit::uninit_array::<N>();
let mut i = 0;
loop {
array[i].write(f(i));
i += 1;
if i == N {
break;
}
}
unsafe { MaybeUninit::array_assume_init(array) }
}
The compiler now tells us that the function f
isn’t const
.
error[E0080]: evaluation of constant value failed
--> src/main.rs:18:24
|
18 | array[i].write(f(i));
| ^^^^ calling non-const function `ARRAY::{closure#0}`
This can be fixed at the call site by making the closure const
.
const ARRAY: [usize; 10] = array_from_fn(const |i| 2 * i);
This additionally requires the const_closures
feature, which itself triggers a warning as it’s incomplete.
warning: the feature `const_closures` is incomplete and may not be safe to use and/or cause compiler crashes
--> src/main.rs:2:5
|
2 | const_closures,
| ^^^^^^^^^^^^^^
|
= note: see issue #106003 <https://github.com/rust-lang/rust/issues/106003> for more information
= note: `#[warn(incomplete_features)]` on by default
But for the purpose of this exercise, the code compiles and runs.
#![feature(
const_closures,
const_maybe_uninit_array_assume_init,
const_maybe_uninit_uninit_array,
const_maybe_uninit_write,
const_mut_refs,
const_refs_to_cell,
effects,
maybe_uninit_array_assume_init,
maybe_uninit_uninit_array,
)]
use std::mem::MaybeUninit;
const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + Copy) -> [T; N] {
let mut array = MaybeUninit::uninit_array::<N>();
let mut i = 0;
loop {
array[i].write(f(i));
i += 1;
if i == N {
break;
}
}
unsafe { MaybeUninit::array_assume_init(array) }
}
const ARRAY: [usize; 10] = array_from_fn(const |i| 2 * i);
fn main() {
println!("array = {ARRAY:?}");
}
array = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Is it usable?
So we’ve proved that the function compiles and can be called in a simple case. However, is it broadly usable?
One thing you may notice is that the original array::from_fn
takes a FnMut
parameter.
This would come up if you want to pass a closure that has mutable state, for example computing factorials.
const fn factorials<const N: usize>() -> [usize; N] {
let mut x = 1;
array_from_fn(const |i| {
if i == 0 {
1
} else {
x *= i;
x
}
})
}
const FACTORIALS: [usize; 10] = factorials();
fn main() {
// Expected output: "factorials = [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]"
println!("factorials = {FACTORIALS:?}");
}
Firstly, we need to add the move
keyword to this mutable closure, otherwise we get a weird compilation error.
error[E0277]: the trait bound `&mut usize: Copy` is not satisfied in `{closure@src/main.rs:30:19: 30:28}`
--> src/main.rs:30:19
|
30 | array_from_fn(const |i| {
| ------------- ^--------
| | |
| _____|_____________within this `{closure@src/main.rs:30:19: 30:28}`
| | |
| | required by a bound introduced by this call
31 | | if i == 0 {
32 | | 1
33 | | } else {
... |
36 | | }
37 | | })
| |_____^ within `{closure@src/main.rs:30:19: 30:28}`, the trait `Copy` is not implemented for `&mut usize`, which is required by `{closure@src/main.rs:30:19: 30:28}: Copy`
|
= help: the trait `Copy` is implemented for `usize`
= note: `Copy` is implemented for `&usize`, but not for `&mut usize`
note: required because it's used within this closure
--> src/main.rs:30:19
|
30 | array_from_fn(const |i| {
| ^^^^^^^^^
note: required by a bound in `array_from_fn`
--> src/main.rs:15:68
|
15 | const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + Copy) -> [T; N] {
| ^^^^ required by this bound in `array_from_fn`
Once we add the move
keyword to the closure, the compiler guides us to accepting a FnMut
.
error[E0594]: cannot assign to `x`, as it is a captured variable in a `Fn` closure
--> src/main.rs:34:13
|
15 | const fn array_from_fn<T, const N: usize>(f: impl Fn(usize) -> T + Copy) -> [T; N] {
| -------------------------- change this to accept `FnMut` instead of `Fn`
...
30 | array_from_fn(const move |i| {
| ------------- -------------- in this closure
| |
| expects `Fn` instead of `FnMut`
...
34 | x *= i;
| ^^^^^^ cannot assign
We also need to make f
a mut
parameter, and can remove the const_refs_to_cell
unstable feature.
const fn array_from_fn<T, const N: usize>(mut f: impl FnMut(usize) -> T + Copy) -> [T; N] {
let mut array = MaybeUninit::uninit_array::<N>();
let mut i = 0;
loop {
array[i].write(f(i));
i += 1;
if i == N {
break;
}
}
unsafe { MaybeUninit::array_assume_init(array) }
}
factorials = [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
Final boss: fetching even elements
With that, we can come back to our original problem: fetching even elements from two SIMD vectors.
Starting with a simple case, we can indeed write a function that operates on a fixed vector type, such as u32x8
.
fn fetch_even(x: u32x8, y: u32x8) -> u32x8 {
simd_swizzle!(x, y, array_from_fn::<_, 8>(const |i| 2 * i))
}
fn main() {
let x = Simd::from_array(std::array::from_fn(|i| i as u32));
let y = Simd::from_array(std::array::from_fn(|i| 8 + i as u32));
let z = fetch_even(x, y);
assert_eq!(<[u32; 8]>::from(x), [0, 1, 2, 3, 4, 5, 6, 7]);
assert_eq!(<[u32; 8]>::from(y), [8, 9, 10, 11, 12, 13, 14, 15]);
assert_eq!(<[u32; 8]>::from(z), [0, 2, 4, 6, 8, 10, 12, 14]);
}
We can easily generalize this to any SIMD type with a fixed number of lanes (here still 8).
fn fetch_even<T: SimdElement>(x: Simd<T, 8>, y: Simd<T, 8>) -> Simd<T, 8> {
simd_swizzle!(x, y, array_from_fn::<_, 8>(const |i| 2 * i))
}
What’s more interesting is to generalize the number of lanes. I’d like to write the following:
fn fetch_even<const N: usize>(x: Simd<u32, N>, y: Simd<u32, N>) -> Simd<u32, N>
where
LaneCount<N>: SupportedLaneCount,
{
// error[E0401]: can't use generic parameters from outer item
simd_swizzle!(x, y, array_from_fn::<_, { N }>(const |i| 2 * i))
}
Unfortunately, this doesn’t work.
error[E0401]: can't use generic parameters from outer item
--> src/main.rs:34:46
|
29 | fn fetch_even<const N: usize>(x: Simd<u32, N>, y: Simd<u32, N>) -> Simd<u32, N>
| - const parameter from outer item
...
34 | simd_swizzle!(x, y, array_from_fn::<_, { N }>(const |i| 2 * i))
| -----------------------------------------^---------------------
| | |
| | use of generic parameter from outer item
| help: try introducing a local generic parameter here: `<N>`
Using the simd_swizzle!
macro simply doesn’t work in this case.
However, a workaround is to define a new type and implement the Swizzle
trait for it.
struct FetchEven;
impl<const N: usize> Swizzle<N> for FetchEven {
const INDEX: [usize; N] = array_from_fn(const |i| 2 * i);
}
fn main() {
let x = Simd::from_array(std::array::from_fn(|i| i as u32));
let y = Simd::from_array(std::array::from_fn(|i| 8 + i as u32));
let z = FetchEven::concat_swizzle(x, y);
assert_eq!(<[u32; 8]>::from(x), [0, 1, 2, 3, 4, 5, 6, 7]);
assert_eq!(<[u32; 8]>::from(y), [8, 9, 10, 11, 12, 13, 14, 15]);
assert_eq!(<[u32; 8]>::from(z), [0, 2, 4, 6, 8, 10, 12, 14]);
}
At this point I wished I could present a full FFT-like snippet, but I got blocked by a compiler error when trying to fetch odd elements.
struct FetchOdd;
impl<const N: usize> Swizzle<N> for FetchOdd {
const INDEX: [usize; N] = array_from_fn(const |i| 2 * i + 1);
}
thread 'rustc' panicked at /rustc/ada5e2c7b5427a591e30baeeee2698a5eb6db0bd/compiler/rustc_middle/src/ty/util.rs:945:22:
ConstContext::Maybe must have host effect param
You know the experiment with incomplete nightly features has gone too far when you encounter an Internal Compiler Error (ICE). I reported this one as rust-lang/rust#125866. Time to contribute to the Rust compiler, sponsor contributors and/or be patient :)
Here is my final working code.
#![feature(
const_closures,
const_maybe_uninit_array_assume_init,
const_maybe_uninit_uninit_array,
const_maybe_uninit_write,
const_mut_refs,
effects,
maybe_uninit_array_assume_init,
maybe_uninit_uninit_array,
portable_simd,
)]
use std::mem::MaybeUninit;
use std::simd::*;
const fn array_from_fn<T, const N: usize>(mut f: impl FnMut(usize) -> T + Copy) -> [T; N] {
let mut array = MaybeUninit::uninit_array::<N>();
let mut i = 0;
loop {
array[i].write(f(i));
i += 1;
if i == N {
break;
}
}
unsafe { MaybeUninit::array_assume_init(array) }
}
struct FetchEven;
impl<const N: usize> Swizzle<N> for FetchEven {
const INDEX: [usize; N] = array_from_fn(const |i| 2 * i);
}
fn main() {
let x = Simd::from_array(std::array::from_fn(|i| i as u32));
let y = Simd::from_array(std::array::from_fn(|i| 8 + i as u32));
let z = FetchEven::concat_swizzle(x, y);
assert_eq!(<[u32; 8]>::from(x), [0, 1, 2, 3, 4, 5, 6, 7]);
assert_eq!(<[u32; 8]>::from(y), [8, 9, 10, 11, 12, 13, 14, 15]);
assert_eq!(<[u32; 8]>::from(z), [0, 2, 4, 6, 8, 10, 12, 14]);
}
One more step: panic safety
Assuming the necessary language features are stabilized, are we done?
As I mentioned in the beginning, the actual implementation of array::from_fn
is far from trivial, calling into a chain of helper functions.
#[inline]
#[stable(feature = "array_from_fn", since = "1.63.0")]
pub fn from_fn<T, const N: usize, F>(cb: F) -> [T; N]
where
F: FnMut(usize) -> T,
{
try_from_fn(NeverShortCircuit::wrap_mut_1(cb)).0
}
#[inline]
#[unstable(feature = "array_try_from_fn", issue = "89379")]
pub fn try_from_fn<R, const N: usize, F>(cb: F) -> ChangeOutputType<R, [R::Output; N]>
where
F: FnMut(usize) -> R,
R: Try,
R::Residual: Residual<[R::Output; N]>,
{
let mut array = MaybeUninit::uninit_array::<N>();
match try_from_fn_erased(&mut array, cb) {
ControlFlow::Break(r) => FromResidual::from_residual(r),
ControlFlow::Continue(()) => {
// SAFETY: All elements of the array were populated.
try { unsafe { MaybeUninit::array_assume_init(array) } }
}
}
}
// ...
Part of this complexity is a red herring, as array::from_fn
is implemented in terms of the companion array::try_from_fn
(where the closure is allowed to be fallible).
But another part is legitimate, to ensure exception safety (sometimes also called panic safety).
One problem in particular is what to do if a panic occurs when creating an element: in that case part of the array is already initialized, while the other part isn’t.
The Rust language handles panics with one of two strategies, as controlled by the panic
setting in Cargo.toml
:
- unwind the stack (the default),
- abort the program.
There isn’t much to worry about in the second case (resources like allocated memory and file descriptors will be freed by the operating system).
Unwinding is more insteresting: objects are dropped as the panic handling walks back the stack (potentially calling custom destructor code), and the panic may eventually be caught via the catch_unwind()
function.
This behavior is akin to exception handling in C++.
Back to our array initialization, one thing we should do in case of panic is drop the items that have already been initialized. Otherwise, it’s for example possible to trigger memory leaks if these items contain heap allocations, or deadlocks if these items contain mutex guards.
Panic with a partially initialized array.
Taking a step back from const
considerations, here is an example of program that will reach an out-of-memory error with a naive implementation of array_from_fn
that doesn’t try to handle panics.
#![feature(
maybe_uninit_array_assume_init,
maybe_uninit_slice,
maybe_uninit_uninit_array
)]
use std::mem::MaybeUninit;
fn main() {
// Loop enough times to exhaust memory on the Rust Playground configuration.
for _ in 0..50 {
let result = std::panic::catch_unwind(|| {
// Create an array of 100 items, each allocating 1MB, but the
// function panics creating the 51th item.
let x: [_; 100] = array_from_fn(|i| {
if i < 50 {
vec![i as u8; 1_000_000]
} else {
panic!("overflow!")
}
});
});
assert!(result.is_err());
}
println!("Hello, world!");
}
fn array_from_fn<T, const N: usize>(mut f: impl FnMut(usize) -> T + Copy) -> [T; N] {
let mut array = MaybeUninit::uninit_array::<N>();
for i in 0..N {
array[i].write(f(i));
}
unsafe { MaybeUninit::array_assume_init(array) }
}
Running this code on the Rust Playground will eventually lead to an out-of-memory error (and the program getting killed).
thread 'main' panicked at src/main.rs:19:21:
overflow!
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at src/main.rs:19:21:
overflow!
thread 'main' panicked at src/main.rs:19:21:
overflow!
...
thread 'main' panicked at src/main.rs:19:21:
overflow!
Exited with signal 9 (SIGKILL): kill program
The proper way to handle this, as done by the standard library, is to have a special guard object that keeps track of the number of initialized items, and drops them if a panic occurs before the whole array has been created. As you can see, this adds quite a bit of code.
fn array_from_fn<T, const N: usize>(mut f: impl FnMut(usize) -> T + Copy) -> [T; N] {
let mut array = MaybeUninit::uninit_array::<N>();
let mut guard = Guard {
array_mut: &mut array,
initialized: 0,
};
for i in 0..N {
guard.array_mut[i].write(f(i));
guard.initialized += 1;
}
std::mem::forget(guard);
unsafe { MaybeUninit::array_assume_init(array) }
}
struct Guard<'a, T, const N: usize> {
array_mut: &'a mut [MaybeUninit<T>; N],
initialized: usize,
}
impl<T, const N: usize> Drop for Guard<'_, T, N> {
fn drop(&mut self) {
eprintln!("Panic detected, dropping {} items", self.initialized);
// SAFETY: this slice will contain only initialized objects.
unsafe {
std::ptr::drop_in_place(MaybeUninit::slice_assume_init_mut(
self.array_mut.get_unchecked_mut(..self.initialized),
));
}
}
}
This time the program completes normally.
thread 'main' panicked at src/main.rs:19:21:
overflow!
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Panic detected, dropping 50 items
...
thread 'main' panicked at src/main.rs:19:21:
overflow!
Panic detected, dropping 50 items
Hello, world!
Note that even though I’ve mentioned “exception safety”, memory leaks and deadlocks are not in scope for Rust’s notion of safety, which focuses on memory safety and undefined behavior. In particular, reference cycles can leak memory, as was learned the hard way a few months before the Rust 1.0 release. Still, it is generally accepted that standard library types should drop already initialized items in case of panic whenever that’s possible.
It’s also clear that this creates additional code that isn’t useful at all when panic = "abort"
is configured.
And things are changing in this space: the mutex poisoning logic is disabled since Rust 1.78 unless panicking is in unwind mode.
This leverages the cfg(panic = "...")
conditional compilation offered by the language.
A lot of places in the standard library could be adapted in a similar way to reduce the cost of unwinding for those who don’t unwind, but that requires a lot of code changes.
Niko Matsakis recently proposed a more radical approach: deprecating panic = "unwind"
altogether.
This would definitely be a breaking change and therefore unlikely to happen in my opinion, but would simplify the work of library authors who wouldn’t have to worry about carefully handling panics.
Conclusion
What started as a seemingly simple problem – a naive implementation of array::from_fn
would only take five lines of code – ended up much more complex.
It’s of course frustrating when a given function isn’t available, but it’s often easy to overlook all the work that happens behind the scenes of fundamental standard library functions.
Trying to (re-)implement some of it oneself is a great way to appreciate the contributions of others!
In my case, doing this re-implementation exercise made it clear that there are fundamental blockers in the language that need to be addressed before array::from_fn
can be const
.
But in other cases, you may end up with an implementation ready to contribute back to the community, either directly in the standard library or in a separate crate.
Don’t be afraid to look behind the scenes and learn or contribute something on the way!
Bonus 1: Can we use a macro instead?
Often times, a way to work around unsupported const generics features in Rust is to leverage the macro system. So you may wonder if we could use a macro to generate an array of integers, rather than resorting to countless unstable features.
Let’s start with a simple case: a macro that takes an integer and outputs an array from zero to that integer.
For example, integers!(5)
should expand to [0, 1, 2, 3, 4, 5]
.
Rust macros come in two flavors: declarative macros and procedural macros. Declarative macros define rules in terms of a grammar, directly in the source code. I find them nice as they are also hygienic, with a type system over metavariables (e.g. expressions, literals, types), reducing the chances of programming errors. However, they are also inflexible: even though you can match a generic expression, you cannot transform it, nor specialize it to a base case where it is zero.
// A first attempt that doesn't work.
macro_rules! integers {
($i:tt) => {
[integer_sequence!($i)]
};
}
macro_rules! integer_sequence {
(0) => { 0 };
// "$i - 1" expands "42" into "42 - 1", not into "41".
($i:expr) => { $i, integer_sequence!($i - 1) };
}
Procedural macros are more flexible, and indeed the seq-macro
crate allows to generate an array of consecutive integers.
This could be adapted to generate an array of shuffled pairs.
let numbers = seq_macro::seq!(i in 0..=5 { [ #( i, )* ] });
assert_eq!(numbers, [0, 1, 2, 3, 4, 5]);
However, that wouldn’t be really useful: as mentioned in the preamble, the macro would receive as parameter a const generic parameter N
, not a specific integer.
So there isn’t any specific array the macro should expand to!
impl<const N: usize> Swizzle<N> for FetchEven {
// What tokens should "even_integers!(N)" expand to?
// N is a const generic parameter, not a specific integer!
const INDEX: [usize; N] = even_integers!(N);
}
The macro approach would simply not work for this problem – unless one puts the whole computation in a big macro and discards const generics altogether, which wouldn’t be very practical.
Bonus 2: How efficient is std::simd::simd_swizzle!
?
One thing you may wonder is whether the code generated by simd_swizzle!
is optimal, or if one should use lower-level intrinsics or inline assembly to get better performance.
This is a legitimate question: for example on Intel the generic SIMD permutation instruction on u32x8
(vpermps
) has 3 cycles of latency and 1 cycle of throughput, whereas more specific instructions such as vpermilps
have only 1 cycle of latency and 1 cycle of throughput.
A great tool to confirm that is the Compiler Explorer. To test the Rust compiler optimizations, I used the following program which contains shuffles that are either arbitrary or correspond to more specific intrinsics.
I’m using
extern "C"
to obtain a simpler assembly without moves from and to the stack around the relevant instructions. However, SIMD types are not FFI-safe and will trigger a warning. See also rust-lang/rust#53346 for a longer explanation. Don’t repeat this at home!The
#[no_mangle]
annotation is simply to help the Compiler Explorer.
#![feature(portable_simd)]
use std::simd::*;
// Intel-specific intrinsic
#[no_mangle]
pub extern "C" fn unpackhi_epi32(x: u32x8, y: u32x8) -> u32x8 {
simd_swizzle!(x, y, [2, 10, 3, 11, 6, 14, 7, 15])
}
// ARM-specific intrinsic
#[no_mangle]
pub extern "C" fn vzip1q_u8(x: u8x16, y: u8x16) -> u8x16 {
simd_swizzle!(x, y, [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23])
}
// Relatively simple permutation
#[no_mangle]
pub extern "C" fn swap_pairs(x: u32x8) -> u32x8 {
simd_swizzle!(x, [1, 0, 3, 2, 5, 4, 7, 6])
}
// Arbitrary shuffle
#[no_mangle]
pub extern "C" fn arbitrary_shuffle(x: u32x8) -> u32x8 {
simd_swizzle!(x, [2, 7, 4, 1, 3, 5, 0, 6])
}
Compiling this on an Intel CPU with AVX2 instructions gives a well-optimized assembly (Rust flags: -O -C target-cpu=haswell
).
unpackhi_epi32:
vunpckhps ymm0, ymm0, ymm1
ret
vzip1q_u8:
vpunpcklbw xmm0, xmm0, xmm1
ret
swap_pairs:
vshufps ymm0, ymm0, ymm0, 177
ret
.LCPI3_0:
.long 2
.long 7
.long 4
.long 1
.long 3
.long 5
.long 0
.long 6
arbitrary_shuffle:
vmovaps ymm1, ymmword ptr [rip + .LCPI3_0]
vpermps ymm0, ymm1, ymm0
ret
On an ARM CPU with NEON instructions, the output is quite optimized as well (Rust flags: -O --target=aarch64-unknown-linux-gnu
).
unpackhi_epi32:
zip2 v1.4s, v1.4s, v3.4s
zip2 v0.4s, v0.4s, v2.4s
ret
vzip1q_u8:
zip1 v0.16b, v0.16b, v1.16b
ret
swap_pairs:
rev64 v0.4s, v0.4s
rev64 v1.4s, v1.4s
ret
arbitrary_shuffle:
ext v2.16b, v0.16b, v1.16b, #4
ext v3.16b, v0.16b, v0.16b, #4
trn2 v0.4s, v1.4s, v2.4s
ext v1.16b, v3.16b, v1.16b, #12
ext v0.16b, v0.16b, v2.16b, #4
zip2 v1.4s, v3.4s, v1.4s
ret
What’s interesting is to inspect the LLVM intermediate representation, which you can view in the Compiler Explorer with the --emit=llvm-ir
flag.
This shows that this each permutation is encoded in terms of the shufflevector
LLVM instruction, and that LLVM later selects the best instructions depending on the target CPU.
define <8 x i32> @unpackhi_epi32(<8 x i32> %x, <8 x i32> %y) unnamed_addr #0 !dbg !6 {
%0 = shufflevector <8 x i32> %x, <8 x i32> %y, <8 x i32> <i32 2, i32 10, i32 3, i32 11, i32 6, i32 14, i32 7, i32 15>, !dbg !11
ret <8 x i32> %0, !dbg !21
}
define <16 x i8> @vzip1q_u8(<16 x i8> %x, <16 x i8> %y) unnamed_addr #0 !dbg !22 {
%0 = shufflevector <16 x i8> %x, <16 x i8> %y, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>, !dbg !23
ret <16 x i8> %0, !dbg !28
}
define <8 x i32> @swap_pairs(<8 x i32> %x) unnamed_addr #0 !dbg !29 {
%0 = shufflevector <8 x i32> %x, <8 x i32> poison, <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>, !dbg !30
ret <8 x i32> %0, !dbg !35
}
define <8 x i32> @arbitrary_shuffle(<8 x i32> %x) unnamed_addr #0 !dbg !36 {
%0 = shufflevector <8 x i32> %x, <8 x i32> poison, <8 x i32> <i32 2, i32 7, i32 4, i32 1, i32 3, i32 5, i32 0, i32 6>, !dbg !37
ret <8 x i32> %0, !dbg !42
}
In fact, using the respective intrinsics gives the exact same output as the simd_swizzle!
macro, both in terms of LLVM IR and assembly.
#![feature(portable_simd)]
use std::simd::*;
use std::arch::x86_64::*;
#[no_mangle]
pub extern "C" fn unpackhi_epi32_intrinsic(x: u32x8, y: u32x8) -> u32x8 {
unsafe { _mm256_unpackhi_epi32(x.into(), y.into()).into() }
}
define <8 x i32> @unpackhi_epi32_intrinsic(<8 x i32> %x, <8 x i32> %y) unnamed_addr #0 !dbg !7 {
%0 = shufflevector <8 x i32> %x, <8 x i32> %y, <8 x i32> <i32 2, i32 10, i32 3, i32 11, i32 6, i32 14, i32 7, i32 15>, !dbg !12
ret <8 x i32> %0, !dbg !22
}
unpackhi_epi32_intrinsic:
vunpckhps ymm0, ymm0, ymm1
ret
These experiments show that the simd_swizzle!
macro is a zero-cost abstraction: both user friendly and well optimized.
Once the portable_simd
feature is stabilized, there should be no need to learn hundreds of CPU-specific intrinsics anymore.
This also shows intrinsics are not tied to specific instructions but can be further optimized by LLVM, so it’s pointless to use intrinsics to force the Rust compiler to generate a specific sequence of instructions. Here is one last example to illustrate that.
#![feature(portable_simd)]
use std::arch::x86_64::*;
use std::simd::*;
#[no_mangle]
pub extern "C" fn swap_pairs_two_intrinsics(x: u32x8) -> u32x8 {
unsafe {
_mm256_permutevar8x32_epi32(
x.into(),
_mm256_set_epi32(6, 7, 4, 5, 2, 3, 0, 1),
).into()
}
}
#[no_mangle]
pub extern "C" fn swap_pairs_one_intrinsic(x: f32x8) -> f32x8 {
unsafe {
_mm256_shuffle_ps(
x.into(),
x.into(),
0b10_11_00_01,
).into()
}
}
swap_pairs_two_intrinsics:
vshufps ymm0, ymm0, ymm0, 177
ret
swap_pairs_one_intrinsic:
vshufps ymm0, ymm0, ymm0, 177
ret
This post was edited to take into account feedback on reddit, in particular about the Destruct
trait.
Comments
To react to this blog post please check the Mastodon thread, the Lobste.rs thread and the Reddit thread.
You may also like