Nine Rules to Formally Validate Rust Algorithms with Dafny (Part 1)

Lessons from Verifying the range-set-blaze Crate

Carl M. Kadie
Towards Data Science

--

By Carl M. Kadie and Divyanshu Ranjan

Crab looking at the Pythagorean theorem on a chalkboard.
Crab proving Pythagoras’ theorem — Source: https://openai.com/dall-e-2/ & CC BY-SA 3.0 File:Pythagorean.svg

My Rust crate range-set-blaze depends on a critical function named internal_add. The function is supposed to insert a range of integers into the crate’s data structure. But does it do so correctly? Of course, I test, but testing can miss bugs. Ideally, I want mathematical certainty of correctness.

Aside: As Rust programmers, we appreciate certainty. The Rust type system guarantees we won’t dereference null pointers. The Rust borrow checker guarantees we won’t use memory after it is freed. Tools such as the Kani Rust crate guarantee — in some cases — that arithmetic will not overflow. But what if we want certainty that an algorithm is correct?

To achieve this certainty, Divyanshu Ranjan and I ported internal_add's algorithm to the Dafny language. We then validated the Dafny-version of the algorithm. (We chose Dafny for its combination of power and ease of use. We’ll talk a little more about this choice, presently.)

Over the course of the validation, we learned nine rules that can help you validate algorithms — written in Rust or other languages — with Dafny. You may also find these rules interesting as a way to gauge the ease or difficulty of such verification using modern tools.

The rules are:

  1. Don’t Learn Dafny.
  2. Learn Dafny.
  3. Define Your Algorithm’s Basic Concepts.
  4. Specify Your Algorithm.
  5. Get Help from the Dafny Community.
  6. Validate a Different, Easier, Algorithm.

See Part 2 for these rules:

7. Port your Real Algorithm to Dafny.

8. Validate the Dafny Version of Your Algorithm.

9. Rework Your Validation for Reliability.

Aside: To avoid wishy-washiness, we call these “rules”, but they are, of course, just suggestions.

The internal_add function tries to efficiently insert a new range of integers into an existing list of sorted and disjoint integer ranges. For example, if we started with [101..=102, 400..=402, 404..=405] and added 402..=404, we expect a result of [101..=102, 400..=405].

Source: This and all following images by author.

Ideally, I’d formally verify this algorithm with Rust-specific tools [1,2]. Those tools, however, seem hard to use. Instead, I chose Dafny. Dafny is a language and verification system. It is taught to undergraduates at universities around the world. It is used in industry. I find it to be addictively interactive and programmer friendly.

Aside: Dafny creator, Dr. Rustan Leino has a connection to Rust beyond the coincidence of his first name. He helped create Spec#, the first language to use a type system to avoid null pointers. Rust, of course, adopted this idea to great success.

This article covers rules 1 to 6. Part 2 covers rules 7 to 9.

Rule 1: Don’t Learn Dafny.

Before trying to prove the mathematical correctness of your algorithm, decide if the effort is worth the benefit.

Dafny is not Rust. Using Dafny requires porting algorithms of interest from Rust to Dafny. This port can miss details and introduce errors. Given this risk, should you use Dafny to verify Rust algorithms? I boldly claim that “it depends”.

  • How important is your algorithm’s correctness? If you are printing a report and it looks right, it probably is right. The internal_add algorithm relates to a data structure that I’d like others to use with confidence, giving me extra motivation to verify it.
  • Maybe all formal verification, with current tools, is too hard. I believe, however, that Dafny makes formal verification as easy as currently possible. You will find formally verifying code easier if you are already familiar with types (for example, from Rust) and recursion/induction (used infrequently in Rust). You can read this article and decide for yourself if/when formal verification is easy enough to be valuable to you.
  • Maybe fuzzing (such as cargo-fuzz) and property-based testing (such as QuickCheck) are good enough. Although these methods do not provide mathematical certainty, they are clever, useful, and easy to use. (The range-set-blaze crate already uses QuickCheck. See Rule 9.5 in a previous article for details).
  • Maybe formal verification is and will always be doomed because writing a specification is as hard as writing code. I disagree. Think about refactoring. I often start coding by writing something simple. I then refactor this simple code for efficiency. For internal_add, I found the specification to be simpler than any code. (You can judge this for yourself in Rule 4.)

Aside: Verification then becomes a computer-checked refactoring from a simple specification to a final, efficient algorithm.

  • Maybe formal verification is and will always be doomed because the halting problem tells us formally that formality isn’t generally possible. The halting problem does not doom us. While we can’t always understand arbitrary code, we don’t need to. We only need to understand our own code, which we (hopefully) wrote to be understandable. Starting in Rule 2, we’ll see how Dafny easily verifies that specific loops and recursions halt.
  • Maybe porting to Dafny is too hard. This has not been my experience. Like Rust, Dafny mixes and matches imperative and functional programming. I found porting my algorithm to be straightforward.

Assuming you still want to verify your algorithm with Dafny, your next step is to learn Dafny.

Step 2: Learn Dafny.

Dafny is both a programming language and an interactive verification system. I recommend you install it as a VS Code extension.

To learn it, start at https://dafny.org/. Of special interest is the Online Tutorial and the Reference Manual. I also found the Verification Corner videos on YouTube helpful. (Of possible interest is the college textbook, Program Proofs, $49 for the Kindle Edition). I found the programming language part of Dafny easier to learn than Rust, perhaps similar in difficulty to C#.

Dafny, like Rust, is fully typed. Dafny, like Python, is garbage collected. Here is a “Hello World”:

// hello.dfy
method Main()
{
var s := "Hello World";
print s, "\n";
}

Dafny, also like Python, offers integers of arbitrary size. Here is a program that provably adds two natural numbers by repeatedly incrementing.

method SlowAdd(x: nat, y: nat) returns (r: nat)
 ensures r == x + y
 {
 r := x;
 var y2 := y;
 while y2 > 0
 invariant r + y2 == x + y
 {
 r := r + 1;
 y2 := y2–1;
 }
 }

Some points of interest:

  • Dafny coding guidelines follow C#, not Rust. So, we name the function SlowAdd not slow_add (although either will run).
  • Dafny supports subtypes. For example, any int that can be shown to be non-negative is also a nat.
  • Assignment is := and equality is == . (There is no = .)
  • Function parameters, for example, x and y above, are immutable.
  • Dafny uses ensures and invariant statements to verify the code at compile-type. It then removes these statements to finish compiling.
  • The green check mark shows that this code verifies. Dafny’s VS Code extension will, by default, continuously try to validate each method. This adds an almost gambling-like excitement to working with Dafny. In the example above, if I make y an int rather than a nat, then validation should and will fail. (Can you figure out why?) Dafny will mark my function with a red X and tell me “This postcondition might not hold: r == x + y”.
  • Dafny knows some of the mathematics of integers, arrays, sets, maps, sequences, etc. This often allows it to finish the last details of validation by itself.

Now that you know about Dafny, you should let it know about your algorithm.

Rule 3: Define Your Algorithm’s Basic Concepts.

The range-set-blaze crate represents sets of integers as sorted, disjoint ranges. For example, this list of three ranges:

100..=2_393,
20_303..=30_239_000,
501_000_013..=501_000_016

represents a set of 30,220,996 integers.

In Rust, the RangeSetBlaze struct represents this data structure internally with a standard BTreeMap. Recall that a BTreeMap represents a list of key/value pairs, sorted by key. Here, our keys are the ranges’ starts (for example, 100, 20_303, 501_000_013) and the values are the ranges’ inclusive ends (for example, 2_393, 30_239_000, 501_000_016. RangeSetBlaze stores the list with a BTreeMap rather than a vec to make key look up more cache friendly.

RangeSetBlaze depends on BTreeMap, so must we implement BTreeMap in Dafny? Happily, no. We can, instead, use Dafny’s vec-like seq data type. This substitution works because BTreeMap, vec, and seq can all represent sorted lists — just with different efficiencies. For the purpose of formal verification, we only care about correctness and can ignore efficiency.

RangeSetBlaze requires the list of ranges be sorted and disjoint. How do we say “sorted and disjoint” in Dafny? We can say it via this ghost predicate (and related code):

ghost predicate ValidSeq(sequence: seq<NeIntRange>) {
(forall i:nat, j:nat | i < j < |sequence| :: sequence[i].0 < sequence[j].0)
&& (forall i:nat, j:nat | i < j < |sequence| :: !Touch(sequence[i], sequence[j]))
}

type IntRange = (int, int)
type NeIntRange = x: IntRange | !IsEmpty(x) witness (0,0)

function IsEmpty(r: IntRange): bool
{
r.0 > r.1
}

A predicate is another name for a method that returns bool. A ghost method (or predicate) is one that can only be used for validation, not for running the final code.

At a high level, the ValidSeq predicate takes as input a sequence of non-empty integer ranges. It then tests that the start values are sorted and that the ranges don’t touch. Specifically,

  • An IntRange is a tuple of two int values.
  • An IntRange IsEmpty exactly when its start is greater than its end. (This follows Rust’s convention.)
  • A NeIntRange (non-empty integer range) is an IntRange that is not empty, for example, (0,0). [All our ranges are end inclusive.]
  • This expression tests that the start values are sorted:
forall i:nat, j:nat | i < j < |sequence| :: sequence[i].0 < sequence[j].0

It can be read as “for all natural numbers i and j — such that i is less than j and j is less than the length of the sequence — test that the start value at index i is less than the start value as index j”.

Aside: Note that a Rust BTreeMap doesn’t support (random-access) indexing but here we are using such indexing. This is OK because ValidSeq is a ghost predicate and so will only be used for validation.

  • This expression tests that the ranges are disjoint:
forall i:nat, j:nat | i < j < |sequence| :: !Touch(sequence[i], sequence[j])

It can be read as “for all natural numbers i and j — such that i is less than j and j is less than the length of the sequence — test that the range at index i does not touch the range at index j. But what is Touch?

We’ll define Touch on two-levels. On a mathematical level, a range i is said to touch a range j if there exists an integer i0 in range i and an integer j0 in range j such that i0 and j0 are within a distance of one of each other. On an efficient programming level, we want to avoid definitions depending on “there exists”. Here is a Dafny predicate that is both mathematical and efficient:

predicate Touch(i: NeIntRange, j: NeIntRange)
ensures Touch(i, j) == exists i0, j0 ::
Contains(i, i0) && Contains(j, j0) && -1 <= i0 - j0 <= 1
{
assert Contains(i, i.0) && Contains(i, i.1) && Contains(j, j.0) && Contains(j, j.1);
if i.1 < j.0 then
assert (-1 <= i.1 - j.0 <= 1) == (i.1+1 == j.0);
i.1+1 == j.0
else if j.1 < i.0 then
assert (-1 <= j.1 - i.0 <= 1) == (j.1+1 == i.0);
j.1+1 == i.0
else
var k0 := Max(i.0, j.0);
assert Contains(i, k0) && Contains(j, k0);
true
}

function Contains(r: IntRange, i: int): bool
{
r.0 <= i && i <= r.1
}

function Max(a: int, b: int): int
{
if a < b then b else a
}

Some points of interest:

  • Touch is not a ghost. In other words, we can use it in both regular code and validation code.
  • The assert statements help Dafny prove that the regular code meets the mathematical ensures statement.
  • For efficiency, the Dafny prover validates the inside of a method separately from its outside. Only the ensures (and the yet-to-be-seen, requires) statements cross this border. In contrast to a method, a Dafny function is transparent to the validator. (I think of it as inlining code with respect to validation.)

With concepts such as ValidSeq and Touch defined, we next move onto specifying what our algorithm is supposed to do.

Rule 4: Specify Your Algorithm.

Ultimately, I want to prove that my specific Rust algorithm for inserting a new range into a RangeSetBlaze is correct. Before we do that, however, let’s define what “correct” range insertion is.

method InternalAdd(xs: seq<NeIntRange>, a: IntRange) returns (rs: seq<NeIntRange>)
requires ValidSeq(xs)
ensures ValidSeq(rs)
ensures SeqToSet(rs) == SeqToSet(xs) + RangeToSet(a)
{
if IsEmpty(a)
{
rs := xs;
}
else
{
assume false; // cheat for now
}
}

This says that InternalAdd is a method that takes xs, a sequence of non-empty integer ranges, and a, an integer range (that could be empty). The method outputs rs, a new sequence of non-empty integer ranges.

We need to say that xs and rs must be sorted and disjoint. That is easily done with the ValidSeq’s in the requires and first ensures.

We also need to say that rs contains the right stuff. Is this hard? It is not. We just say that the set of integers in rs must equal the set of integers in xs unioned with the integers in a.

Aside: In Dafny, “+” when applied to sets is “union”.

The set of integers in a range is:

ghost function RangeToSet(pair: IntRange): set<int>
{
set i {:autotriggers false} | pair.0 <= i <= pair.1 :: i
}

And the set of integers in a sequence of non-empty ranges can be define inductively (that is, recursively):

ghost function SeqToSet(sequence: seq<NeIntRange>): set<int>
decreases |sequence|
requires ValidSeq(sequence)
{
if |sequence| == 0 then {}
else if |sequence| == 1 then RangeToSet(sequence[0])
else RangeToSet(sequence[0]) + SeqToSet(sequence[1..])
}

Some points of interest:

  • The line: assume false; // cheat for now makes validation work even if it really shouldn’t. We use it as a temporary placeholder.
  • We make RangeToSet and SeqToSet ghosts to stop us from using them in regular code. We make them functions (instead of methods) to inline them with respect to validation.
  • Because Dafny knows a lot about creating and manipulating sets and sequences, we often profit by using sets and sequences in our specification.
  • Even if our regular code uses loops instead of recursion, our validation code will often use recursive-like induction.
  • The {:autotriggers false} relates to avoiding a warning message. For more information see this Stack Overflow answer by Prof. James Wilcox.

We now have a formal specification of InternalAdd. I find this specification short and intuitive. But what if you need help figuring out a specification or other Dafny code?

Rule 5: Get help from the Dafny Community.

The main forum for Dafny questions is Stack Overflow. To my surprise, I actually received much useful help there.

I recommend starting your question’s title with “Dafny:”. Also, be sure to tag your question with dafny and, perhaps, formal-verification.

Aside: On the site, you can see my 11 questions and Divyanshu Ranjan’s 48 Dafny-related answers.

As an open-source project on GitHub, Dafny also hosts GitHub Discussions and Issues.

The Dafny community is small but seems enthusiastic about helping users and improving the project.

With help at hand, we must next find an algorithm that meets the specification.

Rule 6: Validate a Different, Easier, Algorithm.

As a novice to formal verification, I decided to postpone work on the real internal_add used in my Rust code. Instead, I started work on an InternalAdd algorithm that I hoped would be easier to validate. I ended up with this:

method InternalAdd(xs: seq<NeIntRange>, a: IntRange) returns (rs: seq<NeIntRange>)
requires ValidSeq(xs)
ensures ValidSeq(rs)
ensures SeqToSet(rs) == SeqToSet(xs) + RangeToSet(a)
{
if IsEmpty(a)
{
rs := xs;
}
else
{
var notTouching, merged := PartitionAndMerge(xs, a);
var indexAfter := NoTouchIndexAfter(notTouching, merged);
rs := InsertAt(notTouching, [merged], indexAfter);
}
}

The idea is that if range a is empty, we return the input sequence unchanged. Otherwise, we divide the work into three steps, which we can validate independently. The first step, PartitionAndMerge, returns:

  • notTouching, a sequence of ranges that don’t touch range a, and
  • merged, a single range created from a and everything it touches.

Here is an example input and output:

InternalAdd next finds where to insert merged and, finally, inserts it.

Here is the code for PartitionAndMerge:

method PartitionAndMerge(xs: seq<NeIntRange>, a: NeIntRange) returns (notTouching: seq<NeIntRange>, merged: NeIntRange)
requires ValidSeq(xs)

ensures ValidSeq(notTouching)
ensures RangeToSet(merged) >= RangeToSet(a)
ensures forall range | range in notTouching :: !Touch(range, merged)
ensures SeqToSet(xs) + RangeToSet(a) == SeqToSet(notTouching) + RangeToSet(merged)
{
// Split into touching and not touching seqs
var touching: seq<NeIntRange>;
touching, notTouching := Partition(a, xs);

// Merge the touching seq into one range with our original range
merged := UnionSeq(a, touching);
}

This says that PartitionAndMerge requires that xs be a valid sequence of non-empty integer ranges and that a be a non-empty integer range. It ensures that nonTouching is another valid sequence of non-empty integer ranges. It ensures that the integers in range merged are a superset of those in range a. It ensures that no range in notTouching touches range merged. And finally, it ensures that the integers in xs and a are exactly the same as the integers in notTouching and merged.

PartitionAndMerge also divides the work, this time into two steps (Partition and UnionSeq) that can be validated independently. Those steps continue to subdivide their work. Where does it end? Let’s look at one example.

The method UnionSeq calls UnionRange which merges two ranges:

function UnionRange(x: IntRange, y: IntRange): IntRange
requires IsEmpty(x) || IsEmpty(y) || Touch(x, y)
ensures RangeToSet(x) + RangeToSet(y) == RangeToSet(UnionRange(x,y))
{
if IsEmpty(x) then y
else if IsEmpty(y) then x
else (Min(x.0, y.0), Max(x.1, y.1))
}

The UnionRange code handles the empty cases and then returns the minimum bounding range. (The minimum bounding range is the range from the smaller of the two starts to the larger of the two ends.) But how can this be correct? In general, a minimum bounding range of two ranges might include extra integers. We might get something bigger than the union of the inputs, like so:

The code is correct because it requires that the two input ranges touch or are empty. This ensures that the union of the integers in range x with the integers in range y are exactly the integers in the output range.

At compile time, Dafny proves this function correct. Beyond that, it proves that everything that calls this function provides inputs that are empty or touching.

I think of this as a generalization of Rust’s borrow checker. At compile-time Rust checks that we are safe from many memory errors. At compile time, verification systems, such as Dafny, can prove almost arbitrary properties. Of course, as we are seeing, this ability comes at the cost of complexity.

The full code for this verified algorithm is about 200 lines, organized into about a dozen methods and functions.

This rule shows that we can verify an algorithm for InternalAdd, but it is not the algorithm I used in Rust. We will turn to that next.

Those are the first six rules for validating Rust algorithms with Dafny. See Part 2 for rules 7 to 9.

Please follow Carl on Medium. I write on scientific programming in Rust and Python, machine learning, and statistics. I tend to write about one article per month.

Read more of Divyanshu Ranjan’s work on his blog. In addition to formal methods, the blog touches on geometry, statistics, and more.

--

--

Ph.D. in CS and Machine Learning. Retired Microsoft & Microsoft Research. Volunteer, open-source projects related to ML, Genomics, Rust, and Python.