
Update: You can see a YouTube version of this article. I presented it at the Rust Linz Meetup.
I love the Rust programming language, but it’s not perfect. For example, do you know how to write a Rust function that accepts any string-like thing as an input? What about accepting an iterator of any kind of path? Can you write a function that accepts a `Vec<f32>` as an `ndarray::ArrayView1`?
Rust functions can accept all these inputs, but I find the syntax hard to remember and read. So, I created the `anyinput` macro to remember the tricky syntax for me and other Rust programmers. (See it at https://crates.io/crates/anyinput.)
While creating `anyinput`, I learned nine rules that can help you easily create procedural macros in Rust. The rules are:
- Use a Rust workspace and `proc_macro2` to develop, debug, and unit-test your macro in a normal (non-macro) project.
- Use `syn`, `proc_macro2`, and `quote` to convert freely among literal code, tokens, syntax trees, and strings.
- Create easily debuggable unit tests that report any differences between what your macro does and what you expect.
- Use AST Explorer and the `syn` documentation to understand Rust syntax trees. Destructure syntax trees with Rust’s pattern matching and struct/enum access.
- Construct syntax trees with `parse_quote!` and Rust’s struct update syntax.
- Use `syn`’s `Fold` trait to recursively traverse, destructure, and construct syntax trees.
- Use `proc_macro_error` to return ergonomic and testable errors.
- Create integration tests. Include UI tests based on `trybuild`.
- Follow the rules of elegant Rust API design, especially eating your own dogfood, using Clippy, and writing good documentation.
Rust’s powerful macro system lets us use Rust to write Rust. The system offers two kinds of macros. With the first kind, you use the `macro_rules!` macro to declare a new macro. This approach is generally simple. Sadly, `macro_rules!` could not do what I wanted. With the second kind, procedural macros, you get more power because you program them in Rust.
Procedural macros come in three flavors:
- Function-like macros: `custom!(...)`
- Derive macros: `#[derive(CustomDerive)]`
- Attribute macros: `#[CustomAttribute]`
My macro, `anyinput`, is an attribute macro, but these rules apply to all three flavors.
Here is a simple example of the `anyinput` macro in use:
Task: Create a function that adds `2` to the length of any string-like thing.
The next example shows that `anyinput` supports multiple inputs and nesting.
Task: Create a function with two inputs. One input accepts any iterator-like thing of `usize`. The second input accepts any iterator-like thing of string-like things. The function returns the sum of the numbers and string lengths. Apply the function to the range `1..=10` and a slice of `&str`s.
These two examples use `AnyString` and `AnyIter`. The macro also understands `AnyPath`, `AnyArray`, and (optionally) `AnyNdArray`.
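For comparison, here is a hand-written sketch of the second task without the macro, using only standard-library traits. The bounds (`IntoIterator<Item = usize>` for the numbers, `IntoIterator` over `AsRef<str>` items for the strings) are my assumptions, the function name `sum_numbers_and_lengths` is hypothetical, and the generic names only mimic the macro's naming style; `anyinput`'s real expansion may differ.

```rust
/// Hand-written generic version of the "two inputs" task (hypothetical name).
/// AnyIter0: any iterator-like thing of usize.
/// AnyIter1: any iterator-like thing of string-like things (AnyString2).
fn sum_numbers_and_lengths<AnyIter0, AnyIter1, AnyString2>(
    numbers: AnyIter0,
    strings: AnyIter1,
) -> usize
where
    AnyIter0: IntoIterator<Item = usize>,
    AnyIter1: IntoIterator<Item = AnyString2>,
    AnyString2: AsRef<str>,
{
    // Sum the numbers.
    let number_sum: usize = numbers.into_iter().sum();
    // Convert each string-like item to &str and sum the lengths.
    let length_sum: usize = strings.into_iter().map(|s| s.as_ref().len()).sum();
    number_sum + length_sum
}

fn main() {
    // A slice of &str; iterating it yields &&str, which also implements AsRef<str>.
    let words: &[&str] = &["a", "bb", "ccc"];
    // 1 + 2 + ... + 10 = 55, plus string lengths 1 + 2 + 3 = 6.
    let total = sum_numbers_and_lengths(1..=10, words);
    println!("{total}"); // prints 61
}
```

Writing and reading these `where` clauses by hand is exactly the friction the macro is meant to remove.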
How does applying the `anyinput` macro to a user’s function work? Behind the scenes, it rewrites the function with appropriate Rust generics. It also adds lines to the function to efficiently convert from any top-level generic to a concrete type. Specifically, in the first example, it rewrites the `len_plus_2` function into:
Here `AnyString0` is a generic type. The line `let s = s.as_ref();` converts `s` from generic type `AnyString0` to concrete type `&str`.
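Based on that description, the rewritten function presumably looks something like the following sketch. The `AsRef<str>` bound is my assumption, inferred from the `s.as_ref()` call that produces a `&str`; the sketch needs only the standard library:

```rust
// Sketch of the macro's output for len_plus_2 (assumed AsRef<str> bound).
fn len_plus_2<AnyString0: AsRef<str>>(s: AnyString0) -> usize {
    // Convert the generic input to a concrete &str once, at the top.
    let s = s.as_ref();
    s.len() + 2
}

fn main() {
    // The same function accepts several string-like types.
    let owned = String::from("abc");
    println!("{}", len_plus_2("hello")); // &str, prints 7
    println!("{}", len_plus_2(&owned)); // &String, prints 5
    println!("{}", len_plus_2(owned)); // String, prints 5
}
```

Because the conversion happens once, everything after that line is ordinary `&str` code.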
Creating a procedural macro requires many decisions. Based on my experience with `anyinput`, here are the decisions I recommend. To avoid wishy-washiness, I’ll express these recommendations as rules.
Rule 1: Use a Rust workspace and proc_macro2 to develop, debug, and unit-test your macro in a normal (non-macro) project
If we set things up just right, we can develop and debug our macro’s core code as a regular Rust project. For example, I use VS Code set up for Rust. With my core code, I can set interactive breakpoints, step through code line-by-line, run unit tests, etc.
Rust macros require at least two projects (top-level and derive). We’re going to add a third project, called core, to make development easier.
IMPORTANT: Everywhere you see "anyinput" in a project or file name, substitute the name of your project.
This table summarizes how you should lay out your files:
Next, let’s go through the file layout in detail.
Top-Level Project: Create the top-level files using Rust’s usual [`cargo new anyinput --lib`](https://doc.rust-lang.org/cargo/guide/creating-a-new-project.html) command. (REMEMBER: Replace `anyinput` with your project name.) Open the newly created top-level [Cargo.toml](https://github.com/CarlKCarlK/anyinput/blob/9rules/Cargo.toml) and add these lines to the bottom of the file:
The `[workspace]` section defines the three-project Rust workspace. Our `[dev-dependencies]` contains `trybuild`, a dependency we’ll use for integration testing (Rule #8). Our `[dependencies]` contains just `anyinput-derive`, with the current version and a path of `"anyinput-derive"`.
If you look at my [src/lib.rs](https://github.com/CarlKCarlK/anyinput/blob/9rules/src/lib.rs) file, you’ll see it mostly contains documentation. The only critical line is:
```rust
pub use anyinput_derive::anyinput;
```
This makes `anyinput::anyinput`, the macro, visible by re-exporting `anyinput_derive::anyinput`. We’ll define `anyinput_derive::anyinput` shortly.
The top-level [README.md](https://github.com/CarlKCarlK/anyinput/blob/9rules/README.md) file contains the project’s main documentation.
We’ll talk about the top-level `tests` folder when we get to Rule #8.
Derive Project: From inside the top-level folder, create the derive files with the command [`cargo new anyinput-derive --lib`](https://doc.rust-lang.org/cargo/guide/creating-a-new-project.html). Add these lines to the bottom of [anyinput-derive/Cargo.toml](https://github.com/CarlKCarlK/anyinput/blob/9rules/anyinput-derive/Cargo.toml):
The `[lib]` section defines `anyinput-derive` as a procedural-macro project. The `[dependencies]` section brings in, first, our yet-to-be-created `anyinput-core` project, with its current version and local path. It also brings in two important external crates (to be discussed in Rules #2 and #7).
File [anyinput-derive/README.md](https://github.com/CarlKCarlK/anyinput/blob/9rules/anyinput-derive/README.md) is really a "don’t read me" file. It literally says, "You are probably looking for the `anyinput` crate, which wraps this crate and is much more ergonomic to use."
My [anyinput-derive/src/lib.rs](https://github.com/CarlKCarlK/anyinput/blob/9rules/anyinput-derive/src/lib.rs) file contains exactly eleven lines:
Here is what these lines do:
- They pull in the "don’t read me" file as documentation.
- They import the `anyinput_core` function from the `anyinput_core` project.
- They use `#[proc_macro_attribute]` to define a macro via function `anyinput`. As with all attribute-macro functions, this function takes two `TokenStream` inputs and returns a `TokenStream`. (If you wish to create a function-like macro or a derive macro, the function signature differs slightly. See "Procedural Macros" in The Rust Reference for details.)
- They define the `anyinput` function in terms of the `anyinput_core` function. The `.into()` method converts between two versions of a type called `TokenStream`: the compiler’s `proc_macro::TokenStream` and `proc_macro2::TokenStream`.
- They use `#[proc_macro_error]` to capture `abort!`s and return ergonomic errors. See Rule #7 for details.
Core Project: From inside the top-level folder, create the core project with the command [`cargo new anyinput-core --lib`](https://doc.rust-lang.org/cargo/guide/creating-a-new-project.html). Add these lines to the bottom of [anyinput-core/Cargo.toml](https://github.com/CarlKCarlK/anyinput/blob/9rules/anyinput-core/Cargo.toml):
File [anyinput-core/README.md](https://github.com/CarlKCarlK/anyinput/blob/9rules/anyinput-core/README.md) is another "don’t read me" file that refers users to the top-level project.
File [anyinput-core/src/tests.rs](https://github.com/CarlKCarlK/anyinput/blob/9rules/anyinput-core/src/tests.rs) contains the unit tests. We’ll cover this in Rule #3.
File [anyinput-core/src/lib.rs](https://github.com/CarlKCarlK/anyinput/blob/9rules/anyinput-core/src/lib.rs) will eventually contain most of the macro’s code. For now, start it with:
Later rules will detail what these lines do. The gist is that `anyinput_core` calls `transform_fn`. For now, the `transform_fn` function turns any user function into a "Hello World" function.
File [anyinput-core/src/tests.rs](https://github.com/CarlKCarlK/anyinput/blob/9rules/anyinput-core/src/tests.rs) will eventually contain all unit tests (discussed in Rule #3). For now, it contains:
These lines create a unit test that checks whether the macro changed the user’s `hello_universe` function into a `hello_world` function. You can test it from the `anyinput-core` directory by running `cargo test first`. (The `first` argument is a filter: Cargo runs only tests whose names contain "first".) (You can also run a version of this on the Rust Playground.)
Because `anyinput-core` is a normal (non-macro) Rust package, you can develop its code with your normal Rust tools. For example, if your code editor supports interactive debugging, you can set breakpoints and/or single-step through the code.
Of course, the `anyinput` macro should not turn users’ functions into the `hello_world` function. Instead, it should rewrite user functions to accept any string, path, etc. Toward that end, we must understand how to convert among literal code, tokens, syntax trees, and strings. That is the topic of Rule #2.
Rule 2: Use syn, proc_macro2, and quote to convert freely among literal code, tokens, syntax trees, and strings
The [syn](https://docs.rs/syn/latest/syn/), [proc_macro2](https://docs.rs/proc-macro2/latest/proc_macro2/), and [quote](https://docs.rs/quote/latest/quote/) crates make creating procedural macros much easier by letting us work with syntax trees.
For example, using the three crates, you can print the input to `transform_fn` as a string. This is useful for debugging.
First, we add two temporary `println!` statements.
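Then, from the `anyinput-core` folder, we run `cargo test first -- --nocapture`. (The `--nocapture` flag tells the test harness to show `println!` output instead of capturing it. You can also run a version of this on the Rust Playground.) Finally, we see: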
To take advantage of the three crates, you must understand the following items and how to convert among them.
- literal code: This is code in a file. For example:
  ```rust
  fn hello_universe() {
      println!("Hello, universe!");
  }
  ```
- `TokenStream`: This represents an abstract stream of tokens. The Rust compiler applies a macro by first turning a user’s literal code into a `TokenStream`. The compiler next feeds that `TokenStream` to the macro for processing. Finally, the macro returns a new `TokenStream`, which the compiler compiles.
- syntax tree: This is nested Rust structs and enums that represent parsed code. The structs and enums are defined in crate `syn`. For example, [`ItemFn`](https://docs.rs/syn/latest/syn/struct.ItemFn.html) is a struct representing a free-standing function. One of `ItemFn`’s four fields, `block`, contains a vector of [`Stmt`](https://docs.rs/syn/latest/syn/enum.Stmt.html). The enum `Stmt` represents a Rust statement. (Rule #4 tells how to learn about the structs and enums that `syn` defines.)
- strings of code, syntax, and tokens: We can turn the previous items into strings. Also, we can turn strings into the previous items.
This table summarizes how to convert into the type you want from the other types.
Aside: Here is an idea for new Rust macros: a set of macros that do these conversions more uniformly.
Next, let’s look at sample code that demonstrates these conversions. (You can also run this sample code in the Rust Playground.)
Literal code to tokens, syntax, and string-of-code
If you have literal code, use `quote!`, `parse_quote!`, and `stringify!` to convert it into a `TokenStream`, a syntax tree, or a string-of-code, respectively.
Note that `parse_quote!` must be told which `syn` type to produce, here the `ItemFn` struct, typically through a type annotation. This tells it what to parse into. Also, recall that Rust lets us call function-like macros with any kind of bracket: `!(…)`, `![…]`, or `!{…}`. The choice is mostly stylistic.
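Of the three macros, only `stringify!` is in the standard library, so it can be tried without dependencies. This small sketch turns literal code into a string-of-code and checks that the outer bracket style doesn't change the result. The exact spacing `stringify!` produces isn't guaranteed across compiler versions, so the sketch compares the three results with each other rather than with a hard-coded string:

```rust
// Literal code -> string-of-code, invoked with parentheses.
fn code_with_parens() -> &'static str {
    stringify!(fn hello_universe() { println!("Hello, universe!"); })
}

// The same tokens, invoked with square brackets.
fn code_with_brackets() -> &'static str {
    stringify![fn hello_universe() { println!("Hello, universe!"); }]
}

// The same tokens, invoked with curly braces (bound with `let` so the
// invocation is parsed in expression position).
fn code_with_braces() -> &'static str {
    let code = stringify! { fn hello_universe() { println!("Hello, universe!"); } };
    code
}

fn main() {
    println!("{}", code_with_parens());
    // The outer delimiter is not part of the macro's input, so all three match.
    assert_eq!(code_with_parens(), code_with_brackets());
    assert_eq!(code_with_parens(), code_with_braces());
}
```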
Tokens to string-of-code & string-of-tokens
It’s often useful to see what code a `TokenStream` represents. Use `.to_string()`. You might also be interested in a string representation of the tokens themselves. If so, use `format!("{:?}", ...)`. Use `format!("{:#?}", ...)` to pretty print, that is, to add newlines and indentation to the string-of-tokens.
Syntax tree to string-of-code & string-of-syntax
It’s often useful to see what code a syntax tree represents. Use `quote!(#syntax).to_string()`. It’s also often useful to see the syntax tree itself as a string. Use `format!("{:?}", syntax)`. Use `format!("{:#?}", syntax)` to add newlines and indentation to the string-of-syntax.
Tokens ↔ syntax
To turn a `TokenStream` into a syntax tree, use `syn::parse2`. Notice that `parse2` requires the `syn` type into which to parse (here, `ItemFn`). Also, notice that `parse2` can return an error result. We’ll see how to handle errors in Rule #7.
To turn a syntax tree into a `TokenStream`, use `quote!(#syntax)`.
String-of-code to syntax tree or tokens
To turn a string-of-code into a syntax tree or a `TokenStream`, use `parse_str`. It requires a `syn` type or `TokenStream` so that it knows what to parse into. It can return an error result.
Literal code to string-of-tokens and string-of-syntax
Finally, to turn literal code into a string-of-tokens, we first convert to a `TokenStream` and then turn that into a string. Turning literal code into a string-of-syntax requires three steps: to tokens, to syntax tree (with possible error result), to string.
With the knowledge of how to convert among these items of interest, we next move on to unit testing.
Rule 3: Create easily debuggable unit tests that report any differences between what your macro does and what you expect
Aside: The Rust Book recommends you put unit tests in your `lib.rs`. I prefer them in a `tests.rs` file. Either setup is fine.
As per Rule #1, our unit tests live in a standard (non-macro) Rust project and can be run and debugged with standard Rust tools. But what form should those unit tests take?
I recommend tests that
- Specify the user’s literal code. This is the before code that goes into the macro.
- Specify the expected literal code after the application of the macro.
- Apply the macro and then check that expected equals after. If expected differs from after, display the difference.
- Finally, if possible, run a copy of the expected code.
Here is a simple unit test. You can see that it expects the macro to rewrite user function `any_str_len`. It also checks that the expected `any_str_len` code actually returns the length of a string. (You can run this on the Rust Playground and see the test fail.)
What happens when we run it? It fails!

Why? In VS Code, I set a breakpoint in the unit test and then single-step through the code. I see that the `anyinput_core` function calls `transform_fn`. I then see that the current version of `transform_fn` transforms all user functions into a `hello_world` function.
The test’s output also shows the difference between expected and after. The helper function `assert_tokens_eq`, called by the unit test, reports the difference. The helper function is:
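As a dependency-free sketch of the same idea, the hypothetical `check_code_eq` below compares two strings of code (for example, the `.to_string()` output of two `TokenStream`s) and, when they differ, reports the first differing character with some surrounding context. The names `check_code_eq`, `context`, and `assert_code_eq` are illustrative only; a real helper might compare `TokenStream`s directly or use a diffing crate:

```rust
/// Hypothetical helper: Ok(()) if the two strings of code match,
/// otherwise an Err describing the first difference with some context.
fn check_code_eq(expected: &str, after: &str) -> Result<(), String> {
    if expected == after {
        return Ok(());
    }
    let e: Vec<char> = expected.chars().collect();
    let a: Vec<char> = after.chars().collect();
    // Index of the first mismatching char, or the end of the shorter string.
    let idx = e
        .iter()
        .zip(a.iter())
        .position(|(x, y)| x != y)
        .unwrap_or_else(|| e.len().min(a.len()));
    Err(format!(
        "code differs at character {idx}\n  expected: ...{}...\n  after:    ...{}...",
        context(&e, idx),
        context(&a, idx)
    ))
}

/// Up to 20 chars on either side of `center` (center <= chars.len()).
fn context(chars: &[char], center: usize) -> String {
    let start = center.saturating_sub(20);
    let end = (center + 20).min(chars.len());
    chars[start..end].iter().collect()
}

/// Panicking wrapper, convenient inside #[test] functions.
fn assert_code_eq(expected: &str, after: &str) {
    if let Err(msg) = check_code_eq(expected, after) {
        panic!("{msg}");
    }
}

fn main() {
    assert_code_eq("fn hello_world() {}", "fn hello_world() {}");
    let err = check_code_eq("fn hello_world() {}", "fn hello_universe() {}").unwrap_err();
    println!("{err}");
}
```

Returning a `Result` from the core check keeps the helper itself easy to test; the panicking wrapper is what a unit test would call.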
That was a simple unit test. Our next step is to create more unit tests that will exercise our yet-to-be-written macro. For the `anyinput` macro, these unit tests include processing user functions with two inputs, inputs with paths and arrays, nested inputs, etc.
With some unit tests in place, we’d next like to start writing our macro. That entails creating a better `transform_fn`. That, however, will require an understanding of Rust syntax trees, which brings us to Rule #4.
Aside: The `anyinput` macro transforms a user’s function via `transform_fn` (a function in [anyinput-core/src/lib.rs](https://github.com/CarlKCarlK/anyinput/blob/9rules/anyinput-core/src/lib.rs)). If your macro transforms, for example, a user’s struct, you’d change `transform_fn` to `transform_struct`.
Rule 4: Use AST Explorer and the syn documentation to understand Rust syntax trees. Destructure syntax trees with Rust’s pattern matching and struct/enum access
We can use the on-line tool AST Explorer to see the syntax tree created by `any_str_len`, our simple test case.
Aside: Be sure to set AST Explorer’s language to Rust. I didn’t figure this out at first and ended up creating my own little on-line tool, Rust AST Explorer.
If you paste the before version of `any_str_len` into AST Explorer, it reports:

We guess that `ItemFn` represents the user’s function. `ItemFn` seems to be a four-field struct. We confirm this by searching the `syn` crate’s documentation for `ItemFn`.
Here is an example of using this information. The `anyinput` macro will often need to add items to the where clause of the user’s function. The macro, thus, needs an initial list of items already in the where clause. Using the AST Explorer, the `syn` documentation, and standard Rust pattern matching, I came up with:
See Destructuring Nested Structs and Enums for an overview of standard Rust pattern-matching and destructuring techniques.
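To show the shape of such a pattern match without depending on `syn`, here is a sketch using simplified, hypothetical stand-in structs. They mirror the relevant nesting in `syn` (an `ItemFn` has `sig: Signature`, which has `generics: Generics`, which has `where_clause: Option<WhereClause>`), but the real types hold richer data, such as a `Punctuated` list of `WherePredicate`s rather than strings:

```rust
// Simplified stand-ins for syn's types (hypothetical; strings replace real syntax nodes).
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct WhereClause {
    predicates: Vec<String>,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Generics {
    params: Vec<String>,
    where_clause: Option<WhereClause>,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Signature {
    ident: String,
    generics: Generics,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct ItemFn {
    sig: Signature,
    block: Vec<String>,
}

/// Returns the predicates already in the function's where clause (empty if none).
fn existing_where_predicates(item_fn: &ItemFn) -> Vec<String> {
    // One nested pattern reaches three levels down; `..` ignores the other fields.
    let ItemFn {
        sig: Signature {
            generics: Generics { where_clause, .. },
            ..
        },
        ..
    } = item_fn;
    match where_clause {
        Some(WhereClause { predicates }) => predicates.clone(),
        None => Vec::new(),
    }
}

/// Builds a tiny example function, with or without a where clause.
fn make_fn(predicates: Option<&[&str]>) -> ItemFn {
    ItemFn {
        sig: Signature {
            ident: "example".to_string(),
            generics: Generics {
                params: vec!["T".to_string()],
                where_clause: predicates.map(|ps| WhereClause {
                    predicates: ps.iter().map(|p| p.to_string()).collect(),
                }),
            },
        },
        block: Vec::new(),
    }
}

fn main() {
    let with_where = make_fn(Some(&["T: Clone"][..]));
    println!("{:?}", existing_where_predicates(&with_where)); // ["T: Clone"]
    println!("{:?}", existing_where_predicates(&make_fn(None))); // []
}
```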
We can now extract information from the input syntax tree. We next look at adding information to the output syntax tree.
Rule 5: Construct syntax trees with parse_quote! and Rust’s struct update syntax
Rather than editing parts of an existing struct or enum in place, it is often easier to create a new struct or enum, perhaps based on information from an old one. The `syn` crate’s [parse_quote!](https://docs.rs/syn/latest/syn/macro.parse_quote.html) macro is an easy (and kind of fun) way to do this. It combines literal code with syntax trees to create new syntax trees.
Here is an example from the `anyinput` macro. This code generates a statement to be added to the user’s function. For example, under some conditions it adds the statement `let s = s.as_ref();` to the user’s function.
I also use `parse_quote!` to create little bits of syntax such as a left angle bracket and an empty list of `WherePredicate`. (Runnable in the Rust Playground.)
The `parse_quote!` macro is great for building up new syntax, but what if you want to change existing syntax? For that, I recommend Rust’s standard struct update syntax.
Here is an example from the `anyinput` macro. It creates an `ItemFn` struct with new values for fields `sig` and `block` but that is otherwise the same as `old_fn`. Likewise, a new `Signature` struct contains new values for fields `generics` and `inputs` but is otherwise the same as `old_fn`’s `sig`. Why not just specify every field? Well, the `Signature` struct contains 11 fields, so the update syntax is much more concise.
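Here is a dependency-free sketch of the same technique, applied to hypothetical stand-in structs that are far smaller than `syn`’s real `ItemFn` and `Signature`. The new values for `generics`, `inputs`, and `block` are made-up strings for illustration; every other field is carried over with `..`:

```rust
// Hypothetical, simplified stand-ins for syn's ItemFn and Signature.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Signature {
    is_async: bool,
    ident: String,
    generics: Vec<String>,
    inputs: Vec<String>,
    output: String,
}

#[derive(Debug, Clone, PartialEq)]
struct ItemFn {
    attrs: Vec<String>,
    sig: Signature,
    block: Vec<String>,
}

/// Builds a new function that differs from `old_fn` only in generics, inputs, and block.
/// (Real macro code would usually own `old_fn` and move from it instead of cloning.)
fn rewrite(old_fn: &ItemFn) -> ItemFn {
    let sig = Signature {
        generics: vec!["AnyString0: AsRef<str>".to_string()],
        inputs: vec!["s: AnyString0".to_string()],
        ..old_fn.sig.clone() // keeps is_async, ident, and output
    };
    // Prepend a conversion statement to the old body.
    let mut block = vec!["let s = s.as_ref();".to_string()];
    block.extend(old_fn.block.iter().cloned());
    ItemFn {
        sig,
        block,
        ..old_fn.clone() // keeps attrs
    }
}

/// A small "before" function to rewrite.
fn make_old() -> ItemFn {
    ItemFn {
        attrs: vec!["#[inline]".to_string()],
        sig: Signature {
            is_async: false,
            ident: "len_plus_2".to_string(),
            generics: Vec::new(),
            inputs: vec!["s: AnyString".to_string()],
            output: "usize".to_string(),
        },
        block: vec!["s.len() + 2".to_string()],
    }
}

fn main() {
    let new_fn = rewrite(&make_old());
    println!("{new_fn:#?}");
}
```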
We now know how to use standard Rust methods (plus `parse_quote!`) to work with syntax trees. Sometimes, however, we want something more powerful. Rule #6 shows how to transform syntax trees by exploiting their inherent recursion and nesting.
Rule 6: Use syn’s Fold trait to recursively traverse, destructure, and construct syntax trees
The `anyinput` macro must handle inputs to a user’s function such as these:
- `s: AnyString`
- `v: Vec<AnyPath>`
- `a: AnyArray<AnyPath>`
- `yikes: AnyIter<Vec<AnyArray<AnyPath>>>`
Such nesting strongly suggests that we should use recursion. The `syn` crate enables such recursion via its `Fold` trait. Here is an example.
Suppose we want a macro that counts the number of statements in a function. This is harder than it sounds because, by using curly braces, Rust statements can contain sub-statements. Suppose, also, the macro should replace any statement containing "galaxy" with one that prints "hello universe". (You can run this example in the Rust Playground.)
We first create our own struct. It holds any information we need while processing the user’s syntax tree:
Next, we define a `Fold` implementation for the type(s) we wish to inspect or rewrite. In this case, we wish to visit every `Stmt`, so:
`Fold` supports 187 types, but we only implement the ones of interest, here just one. The others automatically receive default implementations.
You may wonder about the `let stmt_middle = syn::fold::fold_stmt(self, stmt_old);` line. It is important: it runs the default traversal, which is required if we wish to visit statements and other types inside the statement we are currently visiting.
Here is a full `fold_stmt` implementation. Note that we return a (possibly transformed) `Stmt`.
Perhaps surprisingly, we don’t call our `fold_stmt` implementation directly. Instead, we call `fold_item_fn` because, in this example, `ItemFn` is the type of input we get from the user.
Running `count_statements` counts the statements recursively and replaces "hello galaxy" with "hello universe":
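To show the mechanics without any dependencies, here is a toy sketch that mimics the structure of `syn`’s `Fold`: a trait whose default methods call free functions (named after `syn::fold::fold_item_fn` and `syn::fold::fold_stmt`) that recurse into children. The `Stmt` and `ItemFn` types here are hypothetical stand-ins with only two variants and two fields, not `syn`’s types:

```rust
// Toy syntax tree (hypothetical stand-ins, not syn's types).
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    Expr(String),     // a simple statement, stored as its source text
    Block(Vec<Stmt>), // a braced block containing nested statements
}

#[derive(Debug, Clone, PartialEq)]
struct ItemFn {
    name: String,
    stmts: Vec<Stmt>,
}

// Like syn's Fold: every method has a default that just recurses.
trait Fold {
    fn fold_item_fn(&mut self, node: ItemFn) -> ItemFn {
        fold_item_fn(self, node)
    }
    fn fold_stmt(&mut self, node: Stmt) -> Stmt {
        fold_stmt(self, node)
    }
}

// Default traversals: rebuild each node from its folded children.
fn fold_item_fn<F: Fold + ?Sized>(f: &mut F, node: ItemFn) -> ItemFn {
    ItemFn {
        name: node.name,
        stmts: node.stmts.into_iter().map(|s| f.fold_stmt(s)).collect(),
    }
}

fn fold_stmt<F: Fold + ?Sized>(f: &mut F, node: Stmt) -> Stmt {
    match node {
        Stmt::Expr(text) => Stmt::Expr(text),
        Stmt::Block(stmts) => Stmt::Block(stmts.into_iter().map(|s| f.fold_stmt(s)).collect()),
    }
}

// Our struct holds the state we need while folding.
struct StmtCounter {
    count: usize,
}

impl Fold for StmtCounter {
    fn fold_stmt(&mut self, stmt_old: Stmt) -> Stmt {
        self.count += 1;
        // Recurse via the default traversal so nested statements are visited too.
        let stmt_middle = fold_stmt(self, stmt_old);
        match stmt_middle {
            Stmt::Expr(text) if text.contains("galaxy") => {
                Stmt::Expr(r#"println!("hello universe");"#.to_string())
            }
            other => other,
        }
    }
}

fn count_statements(item_fn: ItemFn) -> (usize, ItemFn) {
    let mut counter = StmtCounter { count: 0 };
    // Start at the top-level type, not at fold_stmt.
    let new_fn = counter.fold_item_fn(item_fn);
    (counter.count, new_fn)
}

fn sample_fn() -> ItemFn {
    ItemFn {
        name: "hello".to_string(),
        stmts: vec![
            Stmt::Expr(r#"println!("hello world");"#.to_string()),
            Stmt::Block(vec![
                Stmt::Expr(r#"println!("hello galaxy");"#.to_string()),
                Stmt::Expr("let x = 1;".to_string()),
            ]),
        ],
    }
}

fn main() {
    let (count, new_fn) = count_statements(sample_fn());
    println!("{count} statements"); // prints "4 statements"
    println!("{new_fn:#?}");
}
```

The block statement counts as one statement, and its two children count as two more, which is why the total is four rather than three.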
We now have all the tools we need to write macros for perfect users. But what if our users sometimes make mistakes? That is the topic of Rule #7.
Rule 7: Use proc_macro_error to return ergonomic and testable errors
The `anyinput` macro supports nested inputs such as `AnyIter<AnyString>`. What should happen if the user uses parentheses instead of angled brackets? Ideally, they would receive a message that points to the exact error location:
There are, however, difficulties. Namely:
- Procedural macros return a `TokenStream`, not a standard Rust error [`Result`](https://doc.rust-lang.org/std/result/).
- The standard `panic!` macro works, but it only returns error messages, not error locations.
- [`proc_macro::Diagnostic`](https://doc.rust-lang.org/proc_macro/struct.Diagnostic.html) does what we want, but it is nightly-only.
- [`std::compile_error`](https://doc.rust-lang.org/std/macro.compile_error.html) also does what we want but can only be used in the top macro function. So, it doesn’t help, for example, when we find a user error dozens of levels down a `Fold` traversal. (Rule #6 describes `Fold` traversals.)
The [proc_macro_error](https://docs.rs/proc-macro-error/latest/proc_macro_error/) crate solves these problems (at the cost of a little overhead). It also offers nightly compatibility for future-proofing.
Here is how to set it up. First, apply `#[proc_macro_error]` to your macro function in [anyinput-derive/src/lib.rs](https://github.com/CarlKCarlK/anyinput/blob/9rules/anyinput-derive/src/lib.rs). (See Rule #1 for details.)
When you find a user error in your core code ([anyinput-core/src/lib.rs](https://github.com/CarlKCarlK/anyinput/blob/9rules/anyinput-core/src/lib.rs)), call the `abort!` macro. For example, this code checks for three user errors. It calls `abort!` when it finds one.
The `abort!` macro works just like the standard `panic!` macro except for a first argument that tells the location of the error. That argument can be a syntax tree from the user’s code. Alternatively, it can be a `SpanRange` extracted from a syntax tree or `TokenStream`, for example:
```rust
let span_range = SpanRange::from_tokens(&type_path_old);
```
Your unit tests should exercise this error handling. Here is the unit test for the "parentheses instead of angled brackets" error.
You’ll notice it uses the standard [should_panic](https://doc.rust-lang.org/reference/attributes/testing.html#the-should_panic-attribute) testing attribute. The message the unit test looks for, however, is strange. This is the best a unit test can do. The next rule shows how an integration test can check the text of the error messages.
Rule 8: Create integration tests. Include UI tests based on trybuild
Integration tests apply your macro to real code! They live in [tests/integration_test.rs](https://github.com/CarlKCarlK/anyinput/blob/9rules/tests/integration_test.rs). My first integration test is:
If you set up your projects as per Rule #1, you can run this test from the top-level folder with `cargo test`. Cargo will automatically take care of compiling all the levels of the workspace for testing and debugging.
What if you want to run all integration tests and all unit tests? Use the command `cargo test --workspace`. What if you want to run your code interactively? With VS Code set up for Rust, I can set a breakpoint on `s.len()` in the integration test and single-step through the after-macro-application code.
One hole remains in our testing: we still haven’t fully tested our error handling (described in Rule #7). Fill this hole by adding [trybuild](https://crates.io/crates/trybuild) UI testing to the integration tests. The steps to add this are:
- Create files containing user code that uses your macro. Here is the UI test for "parentheses instead of angled brackets". It lives in file [tests/ui/anyiter_paren.rs](https://github.com/CarlKCarlK/anyinput/blob/9rules/tests/ui/anyiter_paren.rs):
- In [tests/integration_test.rs](https://github.com/CarlKCarlK/anyinput/blob/9rules/tests/integration_test.rs), add integration test `ui`:
- Run these UI integration tests from the top-level folder with `cargo test ui`. The first time you run this test, it will fail, but it will also create file `wip/anyiter_paren.stderr`.
- Look at `wip/anyiter_paren.stderr`. If you decide it contains the correct output, move it to [tests/ui/anyiter_paren.stderr](https://github.com/CarlKCarlK/anyinput/blob/9rules/tests/ui/anyiter_paren.stderr). The next time you run the test(s), they will expect this output. See the [trybuild](https://crates.io/crates/trybuild) documentation for more details on (re-)generating expected output files.
With your macro fully tested, you can think about using it and sharing it. Rule 9 discusses how to make it ready for sharing.
Rule 9: Follow the rules of elegant Rust API design, especially eating your own dogfood, using Clippy, and writing good documentation
The previous rules describe coding a macro, but how should you design your macro? I recommend you follow the Nine Rules for Elegant Rust Library APIs, especially these three:
- Use Clippy: Apply strict linting with rust-clippy.
- Write good documentation to keep your design honest: The command for generating documentation and popping it up in your browser is `cargo doc --no-deps --open`.
- Know your users’ needs, ideally by eating your own dogfood: This means using your macro in other projects to see how well it works. In the case of the `anyinput` macro, I used it in:
  - [fetch-data](https://crates.io/crates/fetch-data), my crate to download and cache sample files from the Internet. I used `anyinput` to specify paths. It worked great.
  - [bed-reader](https://crates.io/crates/bed-reader), our genomics file reader. I used `anyinput` to specify paths (worked great) and iterators of string-like things (yikes).
With `bed-reader`, I realized that users would see the generic names my macro generates. Both bed-reader’s documentation and code editors such as VS Code displayed the macro-generated generics. If I named them `T0`, `T1`, etc., they would be too vague and might collide with other generics in the user’s function. To avoid collisions, I tried names like `T8a0700b3-d16b-4b8e-bdb4-8fcb7e809ff3`, but they looked terrible. I ended up giving generics names such as `AnyString0`, `AnyIter1`, etc. Thanks to "eating my own dogfood", I ended up with a design that I’m happy with.
So, there you have it: nine rules for creating procedural macros in Rust. If you want to publish your macro to crates.io, you’ll need to publish core first, then derive, and finally the top level, because each crate depends on the one published before it. You may need to wait 5 seconds or so between each [cargo publish](https://doc.rust-lang.org/cargo/commands/cargo-publish.html) command.
My experience with `anyinput` shows that writing a macro can be as easy as writing a regular Rust program … that manipulates large, nested structs. The key is organizing the workspace, using the right tools, and understanding those tools. Follow these nine rules to create your own, powerful, Rust macros.
Aside: Please try https://crates.io/crates/anyinput. If you like this article, please give it a "clap". If you’re interested in future articles on Rust, Machine Learning, Python, Statistics, etc., please follow my writing.