In this blog post, I’m announcing STV-rs, a Rust implementation of Single Transferable Vote (STV) algorithms. STV is a voting method that allows voters to rank the candidates, with an iterative counting process that automatically re-distributes votes when candidates get elected or defeated. For example, considering the ballot “Alice > Bob > Charles”, if the candidate ranked first (Alice) loses the election (because she obtained the least number of votes), then the vote gets re-attributed to the candidate ranked next (Bob). A perhaps more surprising behavior of STV is that if Alice becomes elected, then some part of this vote is transferred to Bob as well.

Although it’s an uncommon election system, various STV methods have been used in some real-world elections (see the history of STV on Wikipedia). After giving a more detailed description of STV, in particular Meek’s method, I’ll present a new mathematical method that allows to handle ballots with equal preferences.

Lastly, I’ll describe practical aspects of implementing STV using Rust. We’ll see that exact arithmetic has a prohibitive computational cost, and how the choice of the rounding strategy can affect the results. I’ll then show how Rust is a good choice in testing and optimizing this arithmetic-intensive workload. I’ll finish by presenting benchmarks to compare the various methods.


Single Transferable Vote (STV)

Single Transferable Vote (STV) is an electoral system that aims to offer proportional representation, for elections with multiple seats.

“Traditional” voting systems allow each voter to cast only one vote (first past the post voting), or to pick up to nn candidates for nn seats (plurality block voting), and elect winners by counting which candidate(s) have the most votes. While these systems are simple to count and verify – the score for each candidate is simply the number of votes attributed to them – they suffer from a “winner takes all” effect which leads to non-proportionality of the representation.

Besides, a large amount of votes are usually wasted.

  • Indeed, voting for one’s favorite candidate is a lost vote if this candidate lost with a high margin. It would have been more efficient to vote for one’s second favorite candidate instead if this candidate was on the edge of winning.
  • Likewise, voting for a very popular candidate that won by a large margin is an excess vote, and also inefficient. If the voter already knew that their favorite candidate was winning, they could have voted for their next preference instead to make sure they were elected as well.

In contrast, Single Transferable Vote gives each voter one vote, but asks them to rank the candidates. The counting process starts by attributing one vote per ballot to the preferred candidate (like for traditional systems), but then gives the opportunity to transfer the vote to the next candidate(s) in the ranking, if that vote was wasteful.

The nice thing is that this transfer happens in an automatic manner, following an iterative counting algorithm. The main drawback is that the counting process is non-trivial and therefore less easy to verify – typically one needs to run it on a computer. Additionally, implementation details (or bugs) can have ripple effects on the results: I’ll expand on these in the rest of my article.

In summary, with STV, greater1 fairness in proportional representation (by reducing wasted votes) comes at the price of a higher barrier to entry in verifying the counting: only “power users” who already have sufficient knowledge in mathematics and programming can independently verify the results.

Counting process

There are multiple variants of STV (see details on Wikipedia), but they share the following general idea.

At a high level, candidates are categorized as either hopeful (the starting status), elected or defeated. The votes are counted in multiple rounds until all candidates end up in the elected and defeated buckets.

To start, we simply attribute one vote for each ballot to the candidate ranked first. In subsequent rounds:

  • If a candidate X becomes defeated – typically the candidate with the least votes at the end of each round – their votes are transferred to the next candidates. For example, each ballot where X was ranked first will have its vote transferred to the candidate ranked second.
  • Once a hopeful candidate gets more votes than a required threshold, they become elected. There are multiple ways to define this threshold, see for example Droop’s quota in the next section. Because any votes in excess of this threshold were wasteful, most STV methods will also transfer this surplus of votes to candidates ranked next. However, only a fraction of the votes are transferred in this case: the votes required to meet the threshold must be kept. I’ll develop the details in Meek’s method below.

Droop quota

The first question is how to define the quota qq for a candidate to be elected. Droop’s quota defines it as:

q=tn+1+εq = \frac{t}{n + 1} + \varepsilon

where tt is the total number of votes, nn is the number of seats and ε\varepsilon is a positive number. To explain this formula, we can see that at most nn candidates can reach this quota, because if n+1n + 1 candidates would reach it, they would collectively get at least t+(n+1)εt + (n + 1) \varepsilon votes, which is greater than the total tt. Therefore, any candidate that reaches this quota is assured to win.

More concretely, with one seat the winner needs to have 50% of the votes (plus one). With two seats a winner needs to have one third of the votes (plus one), etc.

However, it isn’t guaranteed that any candidate will reach this quota from the start: for example with 3 candidates and 1 seat, the initial distribution of votes could be 40%, 35% and 25%, with no candidate reaching 50%. This is why we need to transfer votes, to make candidates reach the quota.

Meek

Meek’s method works by attributing a keep factor 0k10 \le k \le 1 to each candidate, corresponding to the fraction of votes that this candidate keeps for themselves, the rest being transferred to the next candidates. Keep factors are defined as follows.

  • If an elected candidate received vv votes, but the required quota was qq, their keep factor is defined as k=q/vk = q / v. This corresponds to the fraction of votes that were needed to reach the quota.
  • Candidates that are still hopeful have a keep factor of 1: they need to keep all of their votes to hope to reach the quota.
  • Defeated candidates have a keep factor of 0: all of their votes are transferred to other candidates.

Keep factors are then used to transfer votes as follows. Let’s say a ballot contains the ranking “A > B > C”.

  • In the first round, this ballot attributes exactly 1 vote to candidate A. Let’s assume that A becomes elected at the end of the first round, with a keep factor 0<a10 < a \le 1.
  • In the next round, we re-count the ballots. This time, this ballot attributes a fractional vote aa to A, and a further 1a1 - a is transferred to candidate B.
  • Likewise, if B becomes elected, then in the next count B keeps only a fraction bb of this 1a1 - a.

Overall, at any round this ballot attributes a fraction aa to A, (1a)b(1-a)b to B and (1a)(1b)c(1-a)(1-b)c to C.

Meek counting of a ballot Counting of the ballot “Apple > Banana > Cherry”, with respective keep factors 0.5, 0.7 and 0.8.
Apple keeps 50% of the whole vote.
Banana receives 0.5 votes and keeps 70% of that = 0.35.
Cherry receives 0.15 votes and keeps 80% of that = 0.12.
0.03 votes are discarded.

One thing to note is that if all of the candidates in a ballot are elected, a fraction of the vote is discarded – in our example (1a)(1b)(1c)(1-a)(1-b)(1-c). This is because voters are not required to rank all the candidates, otherwise the ballot would end up attributing votes to a hopeful candidate.

Because these “residual” votes are lost, they are subtracted from the “total votes” part of the quota. Otherwise, we could end up with most votes being discarded and no other candidate ever reaching the quota. Given the total residue rr across all votes, the quota is updated after each round to the following.

q=trn+1+εq = \frac{t - r}{n + 1} + \varepsilon

Likewise, keep factors are re-computed after each round, decreasing round after round. Indeed, even though a keep factor is initially defined so that an elected candidate reaches exactly the quota in the current round, they can receive more votes transferred from other candidates in the next round. The iterative update is to simply re-multiply each keep factor by q/vq / v.

kn+1=knqnvnk_{n+1} = k_n \frac{q_n}{v_n}

Handling equal preferences

Now that I’ve introduced the overall idea of Single Transferable Vote with Meek’s method, let me discuss the main idea of this article: handling equal preferences.

Indeed, so far the description assumed a strict ordering of candidates, such as “A > B > C”. However, this can be cumbersome for voters when there are many seats and candidates. Voters may want to rank some candidates equally, if they have no specific preference between them. For example, a ballot could be “A=B > C”, meaning that candidates A and B are equally preferred to C.

However, how should votes be attributed in this case? The “chaining” formula with keep factors doesn’t directly apply.

Prior art: Droop.py

Looking at prior art, the Droop.py implementation (v0.14) uses a “recursive decent” method.

The idea is to split the vote equally among candidates at the same level, and transfer any residual vote to the next candidate. For example, with the ballot A=B > C:

  • 0.50.5 votes would be attributed to both A and B, who respectively keep 0.5a0.5 a and 0.5b0.5 b,
  • C obtains the residual 0.5((1a)+(1b))0.5 ((1-a) + (1-b)).

In effect, this is as if the ballot was a superposition of two votes, each weighted 0.50.5: A > C and B > C.

Counting of a ballot by recursion Counting of the ballot “Apple = Banana > Cherry” by recursive descent.
Even though Banana has a keep factor of 1, some votes are attributed to Cherry.

This is problematic because such superposition doesn’t necessarily represent the wish of the voter: if A is elected but B is not, then some votes from A will be transferred to C rather than to B. A more faithful representation would be a superposition of A > B > C and B > A > C, i.e. A=B is interpreted as any permutation of A and B.

Counting of a ballot by permutations Counting of the ballot “Apple = Banana > Cherry” by permutations.
Banana has a keep factor of 1, so no votes are transferred to Cherry.

Another drawback of the “recursive descent” method is that it become quite slow if a ballot contains many equal rankings. Consider the ballot A=B > C=D > E=F > G=H: at each level there are two branches in the recursion, so there are 242^4 recursive calls to compute. This grows exponentially: a ballot containing nn equal pairs will be counted in O(2n)O(2^n) operations.

Interpreting the ballots differently does affect the results. You can find the following simple example in equal_preference.blt, with 6 ballots cast in the election:

  • 3 ballots for apple,
  • 2 ballots for apple=banana cherry,
  • 1 ballot for cherry apple.

With these 3 candidates and 2 seats:

Another problem (implementation bug?) with Droop.py’s method is that if all candidates that are ranked equally are defeated, then the transfer stops! I.e. A=B > C will attribute no vote to C if A and B are both defeated, rather than attributing all the votes to C. Worse, the implementation also stops at the first defeated candidate, if the ballot contains one equal ranking somewhere. So the ballot A > B > C=D won’t attribute any vote if A is defeated. This is certainly a major flaw compared to the general expectation of STV that each ballot attributes one vote (unless there is some residue remaining after going through all the candidates in the ballot).

Given that equal preferences are a reasonable and practical requirement, can we do better?

Counting permutations

I mentioned previously that a reasonable interpretation would be to consider each equal ranking A=B as a superposition of all possible permutations, i.e. A > B and B > A. However, this would become impractical in terms of complexity: if there are nn candidates ranked equally somewhere in a ballot, this expands to n!n! permutations!

Let’s still look at how the formulas would look like. Let’s assume that we need to distribute a vote fraction xx when we process the rank A=B (some votes were already kept by candidates ranked before in the ballot, so x1x \le 1). Let’s also denote by a=1aa' = 1 - a the unused factor of A, i.e. the complement of the keep factor aa.

If we interpret A=B as a superposition of these permutations, each candidate receives the following votes.

Permutation A B Remainder (next candidates in the ballot)
A > B axa x abxa' b x abxa' b' x
B > A baxb' a x bxb x abxa' b' x
Total 12(1+b)ax\frac{1}{2} (1 + b') a x 12(1+a)bx\frac{1}{2} (1 + a') b x abxa' b' x

Likewise, with three candidates ranked equally, A=B=C is a superposition of 6 permutations.

Permutation A B C Remainder
A > B > C axa x abxa' b x abcxa' b' c x abcxa' b' c' x
A > C > B axa x acbxa' c' b x acxa' c x abcxa' b' c' x
B > A > C baxb' a x bxb x abcxa' b' c x abcxa' b' c' x
B > C > A bcaxb' c' a x bxb x bcxb' c x abcxa' b' c' x
C > A > B caxc' a x acbxa' c' b x cxc x abcxa' b' c' x
C > B > A bcaxb' c' a x cbxc' b x cxc x abcxa' b' c' x
Total 16(2+b+c+2bc)ax\frac{1}{6} (2 + b' + c' + 2b'c') a x 16(2+a+c+2ac)bx\frac{1}{6} (2 + a' + c' + 2a'c') b x 16(2+a+b+2ab)cx\frac{1}{6} (2 + a' + b' + 2a'b') c x abcxa' b' c' x

Similarly, with 4 candidates A=B=C=D, we have a superposition of 4!=244! = 24 permutations and, sparing you the calculation details, candidate A receives:

14!(6+2(b+c+d)+2(bc+bd+cd)+6bcd)ax\frac{1}{4!} (6 + 2(b' + c' + d') + 2(b'c' + b'd' + c'd') + 6b'c'd') a x

Other candidates receive a symmetric amount, and the remainder is abcdxa'b'c'd' x.

Generalized formula

Now, we need to generalize these examples into a formula that works (and can be computed) for all ballots. The first observation is to use the elementary symmetric polynomials, defined as follows (more details on the Wikipedia page).

ek(X1,,Xn)=1j1<j2<<jknXj1Xjke_k(X_1, \ldots, X_n) = \sum_{1 \le j_1 < j_2 < \cdots < j_k \le n} X_{j_1} \dotsm X_{j_k}

These allow to re-write the formulas for candidate A.

  • with A=B, A receives:
12!(e0(b)+e1(b))ax\frac{1}{2!} (e_0(b') + e_1(b')) a x
  • with A=B=C, A receives:
13!(2e0(b,c)+e1(b,c)+2e2(b,c))ax\frac{1}{3!} (2 e_0(b', c') + e_1(b', c') + 2 e_2(b', c')) a x
  • with A=B=C=D, A receives:
14!(6e0(b,c,d)+2e1(b,c,d)+2e2(b,c,d)+6e3(b,c,d))ax\frac{1}{4!} (6 e_0(b', c', d') + 2 e_1(b', c', d') + 2 e_2(b', c', d') + 6 e_3(b', c', d')) a x

We start to see a pattern here, but one question is which coefficients should appear in front of each eie_i?

Let’s say nn candidates are ranked equally (including A). From a combinatorial point of view, ei(b,c,...)e_i(b', c', ...) represents all the permutations where A is at the i-th position (counting from index zero). For any subset of ii elements among {b,c,...}\{b', c', ...\}, there will be a term in eie_i for this subset ; additionally, there are i!i! ways to permute these ii elements before A, and (n1i)!(n-1-i)! ways to permute the remaining n1in-1-i elements after A.

Therefore, candidate A receives the following votes:

1n!i=0n1i!(n1i)!ei(b,c,...)ax\frac{1}{n!} \sum_{i=0}^{n-1} i! (n-1-i)! e_i(b', c', ...) a x

One last combinatorics trick is that:

n!i!(n1i)!=n(n1i)\frac{n!}{i! (n-1-i)!} = n {n-1 \choose i}

which will be an integer. The final formula for votes attributed to candidate A is:

axni=0n11(n1i)ei(b,c,...)\frac{a x}{n} \sum_{i=0}^{n-1} \frac{1}{n-1 \choose i} e_i(b', c', ...)

Computing this formula

In terms of efficiency, we can pre-compute and cache all the necessary (n1i)n-1 \choose i using Pascal’s triangle.

Another question is how to compute the elementary symmetric polynomials ei(b,c,...)e_i(b', c', ...), which themselves have many terms? For this, we can use Vieta’s formulas, namely that the polynomial (X+b)(X+c)...(X+n)(X + b') (X + c') ... (X + n') will have for coefficients precisely these elementary polynomials, i.e. be equal to:

i=0nei(b,...,n)Xni\sum_{i=0}^n e_i(b', ..., n') X^{n-i}

We can expand this polynomial into an array of coefficients in O(n2)O(n^2) steps, by iteratively multiplying by each (X+i)(X + i').

Although this formula is (much) more involved than the baseline Meek method, it is nonetheless relatively fast to compute from a complexity perspective (still in polynomial time). We avoid the exponential blow-up of Droop.py’s implementation, because we don’t need to branch and recurse through all levels in a ballot. If some rank A=B=...=N receives some input votes xx, we simply need to split that into candidates {A, B, …, N}, and pass on ab...nxa'b'...n' x to the next rank.

From a fairness perspective, this formula ensures that if any candidate in a rank is not elected yet, then zero votes are passed to the next rank in the ballot, which is exactly what we wanted to achieve. For example, let’s say candidate B is not elected yet, then the keep factor is b=1b = 1, therefore the unused factor is b=0b' = 0 which leads to ab...nx=0a'b'...n' x = 0.

Arithmetic

So far, I’ve described the counting method in purely mathematical terms, where all expressions are rational numbers. However, in practice an exact computation would quickly blow up!

Indeed, with a computer, one can represent rational numbers of arbitrary precision with big integers, i.e. x=abx = \frac{a}{b} were aa and bb are big integers. However, increasing the precision (i.e. the number of bits of aa and bb) leads to slower arithmetic operations: for example multiplying two integers of nn bits has an asymptotic complexity of O(nlogn)O(n \log n).

Additionally, each operation on rational numbers typically increases the precision of the result. For example multiplying two numbers with nn bits of precision x=abx = \frac{a}{b} and y=cdy = \frac{c}{d} yields a result xy=acbdxy = \frac{ac}{bd} of 2n2n bits of precision (minus the reduction by gcd(ac,bd)\gcd(ac, bd)). Likewise x+y=ad+bcbdx + y = \frac{ad + bc}{bd} has around 2n2n bits of precision. Therefore, chaining NN basic operations in a row can yield numbers of exponential precision, approximately 2N2^N times the initial precision!

At each round, the precision of the vote counts depends on the precision of the keep factors, with some amount of multiplications and additions. Then, the quota depends on the votes, and the next keep factors are updated based on the quota and each candidate’s votes. So we can see that the arithmetic “depth” increases at each round, and the precision grows (at least) exponentially.

To alleviate this problem, we need to use some arithmetic approximation.

Floating- and fixed-precision

The first approach you may think of is floating-point numbers (e.g. f64 in Rust). However, this suffers from various drawbacks: the precision is not uniform (but “floating”), and the main issue is that rounding can go up or down (due to rounding to the nearest representable number), which can affect fairness. Indeed, when counting votes attributed to a candidate, we always want to round down, so that a candidate close but below the threshold in exact computation stays below the threshold with rounding. Conversely, when computing a candidate’s keep factor, we always want to round up, to make sure that the candidate still gets at least the threshold in the next round, and that no vote gets unduly transferred to other candidates.

The next option, supported by Droop.py, is to use a fixed precision, e.g. 9 decimal places. This means that each number is represented by an integer, to multiply by 10910^{-9}. With that, addition and subtraction are trivial: we just add/subtract the integers. Multiplication requires rounding: x109y109=xy1018x 10^{-9} y 10^{-9} = xy 10^{-18}, so we need to discard the last 9 digits of xyxy. Likewise, division needs to be trimmed to the precision. However, we can control whether each rounding goes up or down.

One issue with rounding is that multiplication becomes non-associative! Indeed, x(yz)x \cdot (y \cdot z) is not always equal to (xy)z(x \cdot y) \cdot z, because of the order in which rounding happens. For example, with rounding after 6 decimal places:

{0.000001(0.52)=0.0000011=0.000001(0.0000010.5)2=02=0\begin{cases} 0.000001 \cdot (0.5 \cdot 2) = 0.000001 \cdot 1 = 0.000001 \\ (0.000001 \cdot 0.5) \cdot 2 = 0 \cdot 2 = 0 \end{cases}

This means that we need to be careful about the order of chained multiplications, to make sure we don’t introduce bias for or against any candidate.

With Meek’s strict ranking of candidates, the counting formula for a ballot comes with a natural ordering of multiplications: 1a1 - a becomes (1a)(1b)(1 - a)(1 - b), which becomes (1a)(1b)(1c)(1 - a)(1 - b)(1 - c), i.e. we always append a multiplication to the right (according to the order defined in the ballot). However, with equal rankings in ballots we lose this natural ordering of arithmetic operations, especially with the new formula I’ve described. In particular, we should be careful not to introduce bias based on an arbitrary order such as alphabetical order!

Other approaches

One way to avoid all these issues is to use a larger precision, hoping that errors will stay small enough. I don’t have a mathematical proof, but if there are NN ballots to count, we can hope that the rounding errors will not matter as long as the precision follows some function of 1/N1 / N. But is the correct precision 1/2N1 / 2N, 1/N21 / N^2, something else?

To get more clarity, we could use interval arithmetic and keep track of all rounding errors: any time a rounding occurs, the result becomes an interval between the rounded down and rounded up values.

Another approach is to mix exact arithmetic with some amount of rounding: count all the votes exactly, but always round the keep factors to some fixed precision (e.g. 6 decimal places). Indeed, this small loss in precision doesn’t have a large effect given that keep factors are updated all the time, and may conservatively be greater than their exact value. I’ve implemented this in approx_rational.rs.

Needless to say, there are many trade-offs to be made in the choice of arithmetic. Importantly, this choice may affect the final election results, which means that any election must specify the choice of arithmetic in advance, to make sure the results are reproducible, and are not going to be contested after the fact.

Practical implementation in Rust

Let’s now discuss some insights from my Rust implementation: STV-rs. I’ve first re-implemented Meek’s method, and after some trial-and-error validated that it is consistent with Droop.py’s implementation, i.e. both produce the same results of the election transcript (with details round by round). On top of it, I’ve implemented the new method I introduced to count equally-ranked candidates.

Re-implementing a protocol from general specifications and comparing it to existing implementations is generally a good way to question one’s assumptions, but also to discover bugs in other implementations. This generally applies to file formats, parsers, network protocols, cryptographic protocols, etc. In this case, I discovered Droop.py’s recursive decent method while trying to reproduce the results of some election.

I also discovered inputs that crash the counting algorithm! They crash both Droop.py’s and my implementation (which is good in terms of consistency), and I’m proposing a fix.

Testing

Testing is particularly important for vote-counting software: the last thing we want is to report incorrect election results because of a bug!

End-to-end tests are definitely a good start, providing a way to compare results with other implementations. However, despite the advantage of being decoupled from internal details and making it easy to compare with other implementations, they are often not enough. In our case, we have a complex program consisting of 3 main layers of abstraction: arithmetic operations, counting ballots, and iterating until each candidate is either elected or defeated. This makes it hard for end-to-end tests to cover all branches of all the functions, and to prove to the reader that the tests are correct.

Rust makes it very easy to create unit tests that cover each function independently, via the #[test] tag and the cargo test command. An interesting kind of unit test I’ve implemented is property testing for the base arithmetic operations. One example is to test if multiplication is associative (which as we saw isn’t always the case).

pub fn test_mul_is_associative() {
    let test_values = get_test_values();
    for a in &test_values {
        for b in &test_values {
            for c in &test_values {
                assert_eq!(
                    &(a * b) * c,
                    a * &(b * c),
                    "(a * b) * c != a * (b * c) for {a}, {b}, {c}"
                );
            }
        }
    }
}

I then wanted to test all these properties on various of arithmetic implementations (exact, fixed precision, floating point, etc.). However, for each arithmetic some properties hold and other don’t. To scale these property tests, I leveraged Rust’s declarative macros to succinctly describe which properties are expected.

The syntax I defined looks like this: a list of tests, those that shouldn’t pass being followed by a => fail("...") annotation with an expected counter-example.

big_numeric_tests!(
    FixedDecimal9,
    test_add_is_associative,
    test_mul_is_associative => fail(r"assertion failed: `(left == right)`
  left: `FixedDecimal9(0)`,
 right: `FixedDecimal9(1)`: (a * b) * c != a * (b * c) for 0.000000001, 0.536870912, 2.147483646"),
    ...
)

In principle, property testing can use randomized inputs (making it similar to fuzzing), but to keep the tests and the counter-examples deterministic I chose a fixed set of values for each type instead.

Lastly, it’s good to keep track of code coverage to ensure code is indeed being tested. There are multiple frameworks for surfacing it in Rust: in my case I found success using cargo-llvm-cov with Codecov.

Parallelism

Even though Rust is fast, counting STV elections can take seconds or even minutes depending on the chosen arithmetic and the complexity of the ballots. A simple optimization we can apply is to parallelize vote counting across multiple CPU cores. The basic implementation of vote counting is a serial for loop over the ballots, where each ballot can increment votes for the various candidates.

A high-level, simplified view is the following, accumulating the votes into a dedicated structure.

// Note: The generic parameter T represents the arithmetic type.
struct VoteAccumulator<T> {
    /// Sum of votes for each candidate.
    sum: Vec<T>,
}

impl<T> VoteAccumulator<T> {
    fn new(num_candidates: usize) -> Self {
        Self { sum: vec![T::zero(); num_candidates] }
    }
}

fn count_votes_serial<T>(
    num_candidates: usize,
    ballots: &[Ballot]
) -> VoteAccumulator<T> {
    let mut vote_accumulator = VoteAccumulator::new(num_candidates);
    for ballot in ballots.iter() {
        process_ballot(&mut vote_accumulator, ballot);
    }
    vote_accumulator
}

The loop over the ballots is embarrassingly parallel, and can be distributed across multiple threads via a map-reduce pattern.

  • The “map” part consists in the process_ballot() function, mapping a ballot to the votes attributed to each candidate.
  • The “reduce” part consists in summing the accumulators, candidate-wise.

The rayon crate makes it very easy to apply this map-reduce pattern in Rust: we just need to call its par_iter() function, and to define an additional reduce() function on our accumulator.

impl<T> VoteAccumulator<T> {
    fn reduce(self, other: Self) -> Self {
        Self {
            sum: std::iter::zip(self.sum, other.sum)
                .map(|(a, b)| a + b)
                .collect(),
        }
    }
}

fn count_votes_parallel<T>(
    num_candidates: usize,
    ballots: &[Ballot]
) -> VoteAccumulator<T> {
    ballots
        .par_iter()
        .map(|ballot| {
            let mut vote_accumulator = VoteAccumulator::new(num_candidates);
            process_ballot(&mut vote_accumulator, ballot);
            vote_accumulator
        })
        .reduce(
            || VoteAccumulator::new(num_candidates),
            |a, b| a.reduce(b),
        )
}

Under the hood, rayon spawns a suitable number of threads and dispatches all the tasks to map and reduce across these threads. This can speed up the wall time by a factor up to the number of available CPUs (not quite in practice, see the benchmarks below). The only caveat is that we allocate a new VoteAccumulator for each task (i.e. ballot), which doesn’t seem optimal.

After a bit of digging, I found the fold method and its sibling fold_with, which precisely allow to reuse the same accumulator on each thread (conceptually). All we need to do is replace the map call.

     ballots
         .par_iter()
-        .map(|ballot| {
-            let mut vote_accumulator = VoteAccumulator::new(num_candidates);
-            process_ballot(&mut vote_accumulator, ballot);
-            vote_accumulator
-        })
+        .fold_with(
+            VoteAccumulator::new(num_candidates),
+            |mut vote_accumulator, ballot| {
+                process_ballot(&mut vote_accumulator, ballot);
+                vote_accumulator
+            },
+        )
         .reduce(
             || VoteAccumulator::new(num_candidates),
             |a, b| a.reduce(b),
         )

Arithmetic optimizations

As I’ve explained, the computational complexity of exact arithmetic blows up after a few rounds and cannot be used directly for real-world elections.

An interesting case performance-wise is the integer type used for fixed-point arithmetic (times 10910^{-9}). Using BigInt from the num crate is a conservative choice, but unfortunately it means that each arithmetic operation leads to allocating its result on the heap, which is slow.

However, if we assume that all numbers are small enough to be represented by 64 bits, we can use a i64 instead. This comes with the drawback that we need to check for overflow (and panic! if one occurs). Still, thanks to the avoided allocation and the direct use of CPU instructions for each arithmetic operation, the performance was 10-100x faster than BigInt (see the benchmarks below)!

It’d be interesting to see the performance effect of a small-size optimization applied to the BigInt type. Under the hood, this type is backed by a Vec<u64> (on 64-bit targets). The Vec typically contains 3 usize values (size, capacity, pointer), plus the backing array on the heap. If the size is 1 or 2, the data could be stored directly in the capacity and pointer fields rather than on the heap, which is what the smallvec crate implements. This removes expensive allocations/deallocations to create the data and indirection to access the data. There is a pending pull request to back BigInt by smallvec, however it hasn’t received much traction since 2021.

Benchmarks

An article on this blog wouldn’t be complete without a performance analysis of the software (see my other posts about writing optimized software in Rust). I will focus this analysis on 3 aspects:

Firstly, I measured the basic operations (addition, subtraction, multiplication, division), as well as the time to count ballots of various shapes (with more or less candidates ranked equally). We can see on the following graph (logarithmic scale) that the choice of the arithmetic approximately incurs a multiplier on all the operations.

Benchmarks by arithmetic implementation Comparison of arithmetic implementations (logarithmic scale).

Arithmetic Multiplier
f64 1
Fixed-point arithmetic (9 decimal places) backed by i64 ~4x
Fixed-point arithmetic (9 decimal places) backed by BigInt ~300x
Rational arithmetic using Ratio<i64> ~100x
Exact Rational arithmetic using BigRational = Ratio<BigInt> ~1000x

These benchmarks are using small enough numbers that all the BigInts actually fit into an i64, i.e. the gap observed between i64 and BigInt is only about the baseline overhead of BigInt in terms of allocations and additional code to run.

Next, I evaluated the performance of my new equalized counting formula vs. the recursive descent implemented by Droop. The following benchmark measures the time to count a single ballot, depending on the ballot “pattern”, i.e. how many candidates are ranked equally. For example:

  • “2x5” reads as “5 consecutive pairs of equally ranked candidates”, such as the ballot A=B C=D E=F G=H I=J,
  • “10x2” reads as “2 consecutive tuples of 10 equally ranked candidates”, such as the ballot A=B=C=D=E=F=G=H=I=J K=L=M=N=O=P=Q=R=S=T.

With such nmn * m ballot patterns, the expected computational complexity is O(nm)O(n^m) for the recursive descent, and O(n2m)O(n^2 m) for my equalized counting method (see above). The following plot confirms this in practice.

Benchmarks for equally-ranked candidates Comparison of methods to count equally-ranked candidates (logarithmic scale).

Lastly, how does the parallelism enabled by rayon perform? I had great hopes for this drop-in parallelism framework, but it performed a bit worse than expected out-of-the-box. I’ve used the following measurement setup.

  • Using the hyperfine tool to benchmark command-line programs.
  • Like in previous posts, I’m using Docker’s --cpus parameter to simulate a range of available CPU cores.
  • For my implementation, I’ve chosen the fixed-point arithmetic with 9 decimal places backed by a BigInt, to be comparable with Droop.py (assuming Python integers are backed by big integers).

Here is how Droop and my implementation performed on the rand_2x10 ballot file. As a control, the dashed lines represent the benchmarks with parallelism disabled.

Benchmarks of parallelism (ballot file = rand_2x10) Comparison between Droop and my implementation to count a whole election.

  • We can see that Droop’s performance is roughly constant, as this Python implementation isn’t parallel. Likewise, my implementation has constant performance when parallelism is disabled.
  • At the baseline (no parallelism and/or 1 CPU), my Rust implementation is 2-3x faster than Python.
  • Disappointingly, when more CPUs are available, my parallel implementation only offers up to a 2x speedup in terms of wall time.
  • Worse, the total CPU time spent in userspace grows linearly with the number of available CPUs! This means that giving more CPUs only makes my program waste more CPU time rather than doing useful work.
  • An interesting metric is the CPU time spent in the kernel handling system calls, which also grows with the number of available CPUs.

Below is a longer benchmark, for the rand_hypergeometric ballot file. We observe similar effects (with more noise because hyperfine samples slow tests only 10 times). Note that the “system” time measurement has its own scale on the right-hand side.

Benchmarks of parallelism (ballot file = rand_hypergeometric) Comparison between Droop and my implementation to count a more complex election.
The system time is using the scaled axis on the right-hand-side.

One hypothesis is that the rayon framework is experiencing contention (waiting on mutexes?) to distribute available items across CPUs. Another possibility (less likely in my opinion) is that each call to Rayon’s par_iter() method incurs some overhead, e.g. to configure a thread pool.

I haven’t studied the details yet, but profiling and optimizing this would be a nice follow-up article.

Final thoughts

I’m by no means an expert in voting systems, but it’s clear that computers have enabled new technologies to emerge and support elections. Voting machines aim to reduce the need for manual labor in counting paper ballots, electronic voting allows people to vote from home without the need to travel to a polling station. And computers enable more involved voting systems like STV, which would be practically impossible to (correctly) count by hand.

However, some of these systems have become quite complex, due to many fundamental requirements like security (nobody should be able to tamper with the election), auditability (making sure nobody voted more than once, making sure every vote has been counted), vote privacy, etc. With an increase in complexity comes an increased risk of implementation bugs, or outright design flaws.

In this post I’ve proposed a new method extending STV. I find it promising, but would I recommend it? Definitely not before others have audited it!

Lastly, let’s not forget that although we can design many technological solutions to run elections, they remain first and foremost a human problem. If the voters don’t trust it, even the “best” technology won’t make a good election system.


References


  1. Due to Arrow’s impossibility theorem, although STV aims to provide more proportionality, it is not a perfect election system. 


Comments

To react to this blog post please check the Mastodon thread and the Reddit thread.


RSS | Mastodon | GitHub


You may also like

Optimization adventures: making a parallel Rust workload even faster with data-oriented design (and other tricks)
Optimization adventures: making a parallel Rust workload 10x faster with (or without) Rayon
Making my website 10x smaller in 2024, with a dark mode
Asynchronous streams in Rust (part 1) - Futures, buffering and mysterious compilation error messages
And 32 more posts on this blog!