Comparing two of the fastest programming languages

Is C faster than Rust? I had always assumed the answer was yes, but I recently felt the need to test that assumption. I searched for information on the subject but could not find a side-by-side comparison of the two languages, so I did my own testing. What I found were cases where Rust could be seen as faster than C.
C vs Rust
C and Rust are similar low-level systems programming languages designed for high performance; where they differ is in their design philosophy and features.
- C is about simplicity and giving the programmer complete control over memory. With this freedom come potential issues such as buffer overflows, null pointer dereferences, and memory leaks if the programmer does not handle memory properly. The language has no built-in safety features to prevent these; the programmer has to provide them manually.
- Rust is the polar opposite: it emphasizes memory safety while sacrificing as little performance as possible. Its ownership system and borrow checker enforce rules at compile time to ensure that memory is accessed safely.
While Rust’s main feature is automatic memory safety, the language lets the programmer opt out of it when needed, which makes it very versatile.
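As a hedged sketch of what opting out looks like (the function and values here are just for illustration): inside an unsafe block you can dereference raw pointers, something safe Rust refuses to compile.

```rust
// Illustrative sketch: summing a slice through a raw pointer.
// Safe Rust would force a bounds-checked index or an iterator here;
// `unsafe` skips the checks, as long as we uphold them ourselves.
fn sum_via_raw_pointer(values: &[i32]) -> i32 {
    let ptr = values.as_ptr();
    let mut total = 0;
    for i in 0..values.len() {
        // SAFETY: `i` is always within the bounds of `values`.
        total += unsafe { *ptr.add(i) };
    }
    total
}

fn main() {
    println!("{}", sum_via_raw_pointer(&[1, 2, 3, 4])); // prints 10
}
```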
Comparing the Speed
For testing I used:
- Debian 10
- Clang 7.0.1
- GCC 8.3.0
- Clang 13.0.1
- Cargo 1.83.0
- Rustc 1.83.0
The first thing I did was make a hello world program in each language.
C
#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}
Rust
fn main() {
    println!("Hello, world!");
}
This is just the default experience for both programs; I didn't add any optimizations to either.
gcc hello.c -o helloc
cargo build --release
Once I had both binaries, I used Valgrind's massif tool to measure the time and memory usage of each program.
C Hello World Execution

Massif works by taking snapshots throughout the program's execution, printing the total amount of memory used in bytes at each point, along with the time elapsed so far. Time in this case (i) is measured in CPU instructions. We can see that the C program took 4 snapshots and 131,349 CPU instructions to print hello world.
Why do we care about CPU instruction count? CPU speed is generally measured in clock speed, these days usually gigahertz, meaning a 3.0 GHz processor can execute 3 billion clock cycles per second. One clock cycle is not equal to one instruction, though; how many clock cycles an instruction takes is highly architecture dependent, and many processors can now execute multiple instructions in one cycle. On the same platform, though, as I am using here, we can compare the number of instructions each program executes to get an understanding of which one took less work for the CPU to complete. Time to complete and amount of work to complete are roughly related.
Rust Hello World Execution

Unlike the C program, the Rust program took 14 snapshots and 508,165 total CPU instructions to print hello world. From this we can tell that the Rust program took more work to print hello world than the C program.
The Rust program also used more RAM, and while it is not shown here, I also noticed a large difference in binary size: the Rust binary was 412K, while the C binary was 16K.
This test does not mean that C is faster than Rust yet, though, as there are ways to reduce the time it takes Rust to print to the screen. I just ran this first out of curiosity, before I started optimizing and performing operations.
I managed to reduce it further by adding this to my Cargo.toml:
[profile.release]
opt-level = 3
lto = true
panic = "abort"
codegen-units = 1
# target-cpu is not a valid profile key; it has to be passed to rustc instead:
# RUSTFLAGS="-C target-cpu=native" cargo build --release
And I used std::io::Write instead of println! to shave off a little overhead.
use std::io::{self, Write};

fn main() {
    io::stdout().write_all(b"Hello, World!\n").unwrap();
}
By doing this I managed to reduce the time a little. It is still nowhere near the C program, but I am sure it can be reduced further.
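One further reduction I can think of (an assumption on my part, not something I benchmarked above) is locking stdout once and buffering writes, so the hot path avoids taking a lock and making a syscall per call:

```rust
use std::io::{self, Write};

// Write the greeting to any writer; factored out so it can be tested
// against an in-memory buffer as well as stdout.
fn write_hello<W: Write>(out: &mut W) -> io::Result<()> {
    out.write_all(b"Hello, World!\n")
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Lock once and buffer, so the write is a memcpy into the buffer
    // rather than a lock acquisition and syscall per call.
    let mut out = io::BufWriter::new(stdout.lock());
    write_hello(&mut out)?;
    out.flush()
}
```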

QuickSort
One of the things that matters a lot in a program is how quickly it can perform computations.
To test this I wrote a simple quick sort program in Rust and C.
To keep random number generation from contributing to the measured time, I used the same array of 1,000 numbers in both programs.
Rust Code:
use std::time::Instant;

// QuickSort implementation in Rust
fn quick_sort<T: Ord>(arr: &mut [T]) {
    if arr.len() <= 1 {
        return;
    }
    let pivot_index = partition(arr);
    quick_sort(&mut arr[0..pivot_index]);
    quick_sort(&mut arr[pivot_index + 1..]);
}

fn partition<T: Ord>(arr: &mut [T]) -> usize {
    let pivot_index = arr.len() / 2;
    arr.swap(pivot_index, arr.len() - 1);
    let mut i = 0;
    for j in 0..arr.len() - 1 {
        if arr[j] <= arr[arr.len() - 1] {
            arr.swap(i, j);
            i += 1;
        }
    }
    arr.swap(i, arr.len() - 1);
    i
}

fn main() {
    let numbers: [i32; 1000] = [
        //put array of 1000 numbers here
    ];

    // Measure time before operation
    let start = Instant::now();

    // Perform quicksort
    let mut numbers = numbers.clone(); // We need a mutable copy for sorting
    quick_sort(&mut numbers);

    // Measure time after operation
    let duration = start.elapsed();
    let seconds = duration.as_secs_f64(); // Convert to seconds as f64
    println!("Sorting 1000 numbers took {:.6} seconds", seconds);

    // Simple operation: sum of numbers
    let start = Instant::now();
    let sum: i32 = numbers.iter().sum();
    let duration = start.elapsed();
    let seconds = duration.as_secs_f64(); // Convert to seconds as f64
    println!("Summing the numbers took {:.6} seconds", seconds);
}
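As a quick sanity check that the implementation above sorts correctly, the same quick_sort and partition can be exercised on a small array and compared against the standard library's sort:

```rust
// Same quick_sort/partition as above, plus a small correctness check.
fn quick_sort<T: Ord>(arr: &mut [T]) {
    if arr.len() <= 1 {
        return;
    }
    let pivot_index = partition(arr);
    quick_sort(&mut arr[0..pivot_index]);
    quick_sort(&mut arr[pivot_index + 1..]);
}

fn partition<T: Ord>(arr: &mut [T]) -> usize {
    // Move the middle element to the end, then do a Lomuto-style pass.
    let pivot_index = arr.len() / 2;
    arr.swap(pivot_index, arr.len() - 1);
    let mut i = 0;
    for j in 0..arr.len() - 1 {
        if arr[j] <= arr[arr.len() - 1] {
            arr.swap(i, j);
            i += 1;
        }
    }
    arr.swap(i, arr.len() - 1);
    i
}

fn main() {
    let mut v = vec![5, 3, 8, 1, 9, 2, 2];
    let mut expected = v.clone();
    expected.sort();
    quick_sort(&mut v);
    // The quicksort result must match the standard library's sort.
    assert_eq!(v, expected);
    println!("{:?}", v); // [1, 2, 2, 3, 5, 8, 9]
}
```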
C Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int partition(int arr[], int low, int high); // Forward declaration

// QuickSort implementation in C
void quick_sort(int arr[], int low, int high) {
    if (low < high) {
        int pivot_index = partition(arr, low, high);
        quick_sort(arr, low, pivot_index - 1);  // Sort the left part
        quick_sort(arr, pivot_index + 1, high); // Sort the right part
    }
}

int partition(int arr[], int low, int high) {
    int pivot = arr[low]; // Use the first element as pivot
    int i = low + 1;      // Start the i index just after the pivot
    int j = high;         // Start the j index at the last element
    while (i <= j) {
        // Move the i index right while the elements are less than or equal to the pivot
        while (i <= high && arr[i] <= pivot) {
            i++;
        }
        // Move the j index left while the elements are greater than the pivot
        while (arr[j] > pivot) {
            j--;
        }
        if (i < j) {
            // Swap arr[i] and arr[j] if they are in the wrong order
            int temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
        }
    }
    // Swap pivot into its correct position
    arr[low] = arr[j];
    arr[j] = pivot;
    return j; // Return the index of the pivot
}

void benchmark() {
    int numbers[1000] = {
        //put array of 1000 numbers here
    };

    // Measure time before sorting
    clock_t start = clock();

    // Perform quicksort
    quick_sort(numbers, 0, 999);

    // Measure time after sorting
    clock_t end = clock();
    double sort_duration = (double)(end - start) / CLOCKS_PER_SEC;
    printf("Sorting 1000 numbers took %f seconds\n", sort_duration);

    // Sum of array elements
    start = clock();
    long sum = 0;
    for (int i = 0; i < 1000; i++) {
        sum += numbers[i];
    }
    end = clock();
    double sum_duration = (double)(end - start) / CLOCKS_PER_SEC;
    printf("Summing the numbers took %f seconds\n", sum_duration);
}

int main() {
    benchmark();
    return 0;
}
I then built both programs with the following arguments:
cargo build --release
gcc quicksort.c -o quicksort
What I expected, based on the first test, was for the C code to be faster, but to my surprise the Rust code reported a faster time.


What confused me is that when I ran both binaries through Valgrind, it reported that the C binary was still executing fewer instructions.


Since Valgrind was just measuring the total number of instructions executed, this made me wonder if it had something to do with how Rust was optimizing the binary. That led me to try different compilers and optimization flags with C to see if that was the issue.
gcc -O3 -funroll-loops -march=native -flto -fomit-frame-pointer -DNDEBUG -o sort sort.c

clang -Ofast -march=native -flto -funroll-loops -ftree-vectorize -fno-exceptions -fomit-frame-pointer -fno-math-errno -fstack-protector-strong -o sort sort.c

clang-13 -Ofast -march=native -flto -funroll-loops -ftree-vectorize -fno-exceptions -fomit-frame-pointer -fno-math-errno -fstack-protector-strong -o sort sort.c

Everything I tried showed the Rust binary beating the C binary.
This made me wonder if Rust's timer was faster than C's, so I wrote a simple program in each that incremented a number as many times as it could in one second.
C code:
#include <stdio.h>
#include <time.h>

int main() {
    clock_t start = clock();
    clock_t end = start;
    int count = 0;
    // Loop until 1 second has passed
    while (((double)(end - start)) / CLOCKS_PER_SEC < 1.0) {
        count++;
        end = clock();
    }
    printf("The number after 1 second is: %d\n", count);
    return 0;
}
Rust code:
use std::time::{Instant, Duration};

fn main() {
    let start = Instant::now();
    let mut count = 0;
    // Loop until 1 second has passed
    while start.elapsed() < Duration::new(1, 0) {
        count += 1;
    }
    println!("The number after 1 second is: {}", count);
}
Compile flags:
cargo build --release
clang-13 -Ofast -march=native -flto -funroll-loops -ftree-vectorize -fno-exceptions -fomit-frame-pointer -fno-math-errno -fstack-protector-strong -o timer timer.c
What I found was that the Rust program's number was always larger than the C program's, showing the Rust program completing more loop iterations in the same second.


No matter what I tried, this was always the case. The only way to find out why was to look at the actual assembly.
I disassembled both programs in Ghidra, and I found the function that iterated the number for a second.
The loop in Rust Assembly:

The loop in C Assembly:

Each line in the disassembled output corresponds to one instruction that the CPU will have to execute.
What I found was that Rust and C were handling the time differently, and the Rust version was far better optimized. The C code had to execute many more instructions per loop iteration than the Rust code did just to check whether the elapsed time was still under one second.
Looking at the Rust assembly made me realize that my C code could be optimized further, but by me, not the compiler.
Here is the new C counter.
#include <stdio.h>
#include <time.h>

int main() {
    time_t start = time(NULL); // Get the current time in seconds
    time_t end = start;
    int count = 0;
    // Loop until 1 second has passed
    while (end - start < 1) {
        count++;
        end = time(NULL); // Only check the time once per iteration
    }
    printf("The number after 1 second is: %d\n", count);
    return 0;
}
Compile flags same as above.

The new counter iterated the number significantly faster than either of the previous two.
When I looked at the disassembled version, I found it to be similar to the Rust one.

The only problem is that this code does not actually run for one second; it finishes faster than that.
I tested this using a bash script to get the system time before and after execution.
#!/bin/bash
# Capture start time
start=$(date +%s%N) # Get time in nanoseconds
# Run the program
./newtimer
# Capture end time
end=$(date +%s%N)
# Calculate the time difference in nanoseconds
duration=$((end - start))
# Convert duration to milliseconds (optional, if you need a more readable format)
ms=$((duration / 1000000))
echo "Program took $ms milliseconds to complete."
What I found after running this is the new timer completed in much less than a second:

Even worse, once the code was cached it would finish dramatically faster than one second, which makes clock(), despite taking more work in the loop, the better choice if you are trying to wait for a timeframe in C.
The two original C and Rust programs both completed in about the same amount of time, one second:
C Counter Execution Time:

Rust Counter Execution Time:

So while the disassembly of the C code using the time() function looks similar to Rust's, Rust's time function works differently from both of C's clocks.
The minor performance difference comes from the three clocks getting the time in different ways. I should do a deep dive into all three sometime.
I believe this showed that Rust had the more optimized clock.
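My understanding (not something verified in the disassembly above) is that std::time::Instant is a monotonic clock meant for measuring intervals, while C's time() reads the wall clock. A small sketch of the guarantee Instant makes:

```rust
use std::time::{Duration, Instant, SystemTime};

// Sleep for `ms` milliseconds and return the interval measured by
// Instant, Rust's monotonic clock: it can never go backwards.
fn monotonic_elapsed(ms: u64) -> Duration {
    let t0 = Instant::now();
    std::thread::sleep(Duration::from_millis(ms));
    t0.elapsed()
}

fn main() {
    // Monotonic: the measured interval is at least the time we slept.
    assert!(monotonic_elapsed(10) >= Duration::from_millis(10));

    // SystemTime is the wall clock; it can jump if the system time is
    // changed, so computing an interval from it can fail.
    let wall = SystemTime::now();
    match wall.elapsed() {
        Ok(d) => println!("wall clock advanced by {:?}", d),
        Err(_) => println!("wall clock went backwards"),
    }
}
```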
None of this answered my original question, though: was Rust actually performing the quicksort faster than C?
If the clock calculations were what was slowing things down, then all I had to do was increase the number of items in the array: the clock adds a fixed cost, while the quicksort work grows with the input. With a larger array the difference between the two implementations should grow, and we would see whether the C code really was faster.
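To sketch that reasoning (using the standard library's sort_unstable as a stand-in for the quicksort above, and a hypothetical xorshift generator instead of the fixed array), the fixed cost of reading the clock is amortized away as the input grows:

```rust
use std::time::Instant;

// Deterministic xorshift PRNG so every run sees identical data
// without pulling in an external crate.
fn xorshift_fill(len: usize, mut state: u64) -> Vec<i32> {
    (0..len)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            (state & 0x7fff_ffff) as i32 // keep values non-negative
        })
        .collect()
}

fn main() {
    // As the input grows, the fixed cost of the clock reads stays the
    // same while the sorting work grows, so only the sort matters.
    for len in [1_000, 100_000, 1_000_000] {
        let mut v = xorshift_fill(len, 0x1234_5678);
        let start = Instant::now();
        v.sort_unstable(); // stand-in for quick_sort; same growth rate
        println!("{:>9} items: {:?}", len, start.elapsed());
    }
}
```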
Side note: I found the C clock comparison could be done without converting to double, but the Rust code was still faster.
#include <stdio.h>
#include <time.h>

int main() {
    clock_t start = clock();
    clock_t end = start;
    int count = 0;
    // Get the number of clock ticks per second
    clock_t ticks_per_second = CLOCKS_PER_SEC;
    // Loop until 1 second of CPU time has passed
    while ((end - start) < ticks_per_second) {
        count++;
        end = clock(); // Update the clock value once per iteration
    }
    printf("The number after 1 second is: %d\n", count);
    return 0;
}
Using Rust and C as a Quick Sort External Library
To solve my problem I decided to build both the Rust and C quicksorts as external libraries called from a Python script. That way I could use Python's timer to determine which was executing faster, and I could have Python generate large arrays of random numbers to use for both.
Python code:
import ctypes
import random
import time

# Load the C library
c_lib = ctypes.CDLL('./libquicksort.so')
# C function prototype
c_lib.quicksort.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int, ctypes.c_int]

# Load the Rust library
rust_lib = ctypes.CDLL('./quicksort/target/release/libquicksort.so')
# Rust function prototype
rust_lib.quicksort.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int]

def test_quicksort():
    # Test with a random list of 100,000,000 integers
    size = 100000000
    arr = [random.randint(0, 10000) for _ in range(size)]

    # C QuickSort
    c_arr = (ctypes.c_int * size)(*arr)
    start_time = time.time()
    c_lib.quicksort(c_arr, 0, size - 1)
    c_time = time.time() - start_time
    print(f"C QuickSort completed in {c_time:.6f} seconds")

    # Rust QuickSort
    rust_arr = (ctypes.c_int * size)(*arr)
    start_time = time.time()
    rust_lib.quicksort(rust_arr, size)
    rust_time = time.time() - start_time
    print(f"Rust QuickSort completed in {rust_time:.6f} seconds")

    # Run each a second time
    c_arr = (ctypes.c_int * size)(*arr)
    start_time = time.time()
    c_lib.quicksort(c_arr, 0, size - 1)
    c_time = time.time() - start_time
    print(f"C QuickSort completed in {c_time:.6f} seconds")

    rust_arr = (ctypes.c_int * size)(*arr)
    start_time = time.time()
    rust_lib.quicksort(rust_arr, size)
    rust_time = time.time() - start_time
    print(f"Rust QuickSort completed in {rust_time:.6f} seconds")

    # Time Python's built-in sort for comparison
    start_time = time.time()
    python_sorted = sorted(arr)
    python_time = time.time() - start_time
    print(f"Python sort completed in {python_time:.6f} seconds")

    # Check that both libraries sorted correctly
    assert list(c_arr) == python_sorted, "C QuickSort failed!"
    assert list(rust_arr) == python_sorted, "Rust QuickSort failed!"
    print("Both QuickSort implementations worked correctly!")

if __name__ == '__main__':
    test_quicksort()
Rust Library
#[no_mangle]
pub extern "C" fn quicksort(arr: *mut i32, len: usize) {
    let slice = unsafe { std::slice::from_raw_parts_mut(arr, len) };
    quicksort_recursive(slice);
}

fn quicksort_recursive<T: Ord>(arr: &mut [T]) {
    if arr.len() <= 1 {
        return;
    }
    let pivot_index = partition(arr);
    quicksort_recursive(&mut arr[0..pivot_index]);
    quicksort_recursive(&mut arr[pivot_index + 1..]);
}

fn partition<T: Ord>(arr: &mut [T]) -> usize {
    let pivot_index = arr.len() / 2;
    arr.swap(pivot_index, arr.len() - 1);
    let mut i = 0;
    for j in 0..arr.len() - 1 {
        if arr[j] <= arr[arr.len() - 1] {
            arr.swap(i, j);
            i += 1;
        }
    }
    arr.swap(i, arr.len() - 1);
    i
}
Cargo.toml
[package]
name = "quicksort"
version = "0.1.0"
edition = "2021"
[profile.release]
opt-level = 3
lto = true
panic = 'abort'
codegen-units = 1
# target-cpu is not a valid profile key; it has to be passed to rustc instead:
# RUSTFLAGS="-C target-cpu=native" cargo build --release
[dependencies]
[lib]
crate-type = ["cdylib"] # This is the key part, specifying the shared library
C Library
// quicksort.c
int partition(int arr[], int low, int high); // Forward declaration

void quicksort(int arr[], int low, int high) {
    if (low < high) {
        int pivot_index = partition(arr, low, high);
        quicksort(arr, low, pivot_index - 1);
        quicksort(arr, pivot_index + 1, high);
    }
}

int partition(int arr[], int low, int high) {
    int pivot = arr[high];
    int i = low - 1;
    for (int j = low; j < high; j++) {
        if (arr[j] <= pivot) {
            i++;
            int temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
        }
    }
    int temp = arr[i + 1];
    arr[i + 1] = arr[high];
    arr[high] = temp;
    return i + 1;
}
Compile flags:
cargo build --release
gcc -shared -o libquicksort.so -fPIC -O3 -march=native -flto -funroll-loops quicksort.c
I then built and ran the program:

What I found is that the C implementation of quicksort performed faster than the Rust implementation. I used one hundred million numbers in the array so the difference would be more noticeable. You may also notice that Python's built-in sort completed significantly faster than either the Rust or the C library; this is because Python uses Timsort, which has a best case of O(n), while Quicksort's best case is O(n log n).

Working that out shows roughly the expected relationship between the two.
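The arithmetic behind that: for n = 100,000,000, the n in each best case cancels, leaving a factor of about log2(n) between them (a rough model counting comparisons only, ignoring constant factors):

```rust
// Ratio of Quicksort's best case (n log2 n) to Timsort's best case (n):
// the n cancels, leaving just log2(n).
fn best_case_ratio(n: f64) -> f64 {
    (n * n.log2()) / n
}

fn main() {
    let n = 100_000_000f64;
    // For n = 1e8, log2(n) = 8 * log2(10), about 26.6.
    println!("Quicksort/Timsort best-case ratio: {:.1}x", best_case_ratio(n));
}
```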
Wrapping Up
What I learned from all this is that the gap between C and Rust is not that big. C is usually faster, uses less RAM, and produces smaller binaries than Rust, but a programmer can easily make a mistake in C that reverses this: C code can be slower than Rust code. That makes choosing between the two depend more on other factors than on speed, because the performance is so similar.
What really matters is the algorithm that is chosen for the task.
Thanks for reading, and let me know what you think!
Aaron
References
The code used can be found on my GitHub here