
nrposner/fizzcrate: Optimizing FizzBuzz in Rust for fun and profit

This page was created programmatically; to read the article in its original location you can visit the link below:
https://github.com/nrposner/fizzcrate
If you wish to remove this article from our site, please contact us.


My little cousin is learning Python, and wanted to know what kinds of questions he might expect to get in coding interviews. I started him off with good ol’ FizzBuzz: obsolete for any actual coding interview today (at least, I hope), but good at his level. After a bit of back and forth getting reminded about how to create range iterators and fixing the order, he landed on this implementation:

def basic_fizzbuzz():
    for i in range(1, 101):
        if i % 15 == 0:
            print("FizzBuzz")
        elif i % 3 == 0:
            print("Fizz")
        elif i % 5 == 0:
            print("Buzz")
        else:
            print(i)

A good first shot: it’d get him to the next question without much fuss.

But then, by coincidence, I watched an old Prime video and decided to put the question to him: how would you extend this to 7 = “Baz”?

He expanded the if-else chain: I asked him to find a way to do it without explosively growing the number of necessary checks with each new term added. After some hints and more discussion, we arrived at this implementation:

def enhanced_fizzbuzzbaz():
    for i in range(1, 101):
        s = ""

        if i % 3 == 0:
            s += "Fizz"
        if i % 5 == 0:
            s += "Buzz"
        if i % 7 == 0:
            s += "Baz"

        if not s:
            print(i)
        else:
            print(s)

Nice and simple. But this did get me thinking: how would we make this as performant and extensible as possible?

Some quick-and-dirty benchmarking shows that each fizzbuzz (give or take some loop overhead) takes around 105 microseconds. Slow, sure, but not tripping my alarm bells just yet. Python is slow, say less.

Adding ‘Baz’ takes us up to 110 microseconds. Switching to our composable version shaves off a little bit and gets us to around 107 microseconds. Easier on the developer AND marginally more performant, just what I like to see.

I’m sure there are further optimizations to be made in Python… but Python’s not the tool I reach for when I want to explore optimization potential.

So I took this to the natural conclusion of all software, and rewrote it in Rust.

pub fn enhanced_fizzrustbaz() {
    for i in 1..=100 {
        let mut s = String::new();

        if i % 3 == 0 {
            // this is sugar for s.push_str("Fizz"),
            // cf. impl Add<&str> for String
            s += "Fizz";
        }
        if i % 5 == 0 {
            s += "Buzz";
        }
        if i % 7 == 0 {
            s += "Baz";
        }

        if s.is_empty() {
            println!("{i}");
        } else {
            println!("{s}");
        }
    }
}

Thanks to some syntactic sugar, the implementation is nearly identical. How does this stack up against Python?

enhanced_fizzrustbaz    time:   [57.892 µs 58.948 µs 60.083 µs]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Around 60 microseconds, not too shabby. Now, let’s take a peek at what it’s actually doing, and look at how we might speed it up:

First, it’s iterating through an integer range: we could replace this with a .into_iter() for a small performance gain, since its size is known at compile time.

Then it performs a modulo and numerical comparison, evaluates, and potentially executes the string addition. There might be a clever way to eliminate some of these checks: we know, after all, that if n is a multiple of 3, n+1 and n+2 will not be. But the obvious possibilities almost certainly won’t be performant: integer modulo is a single CPU instruction, while a counter- or step-based method would require constantly referencing a different object for each branch, and we’d end up taking a nice, regular pattern away from the branch predictor. We could try manual loop unrolling, and that might be useful if, say, we were doing N=100000 on the original 3 and 5 cases: but at N=100 on 3, 5, and 7 we’re not doing even a single completion.

What about the actual string addition, then? We start each loop body by heap-initializing an empty String, and then either don’t use it or extend it with the composed terms. We know what these terms are and their lengths at compile time… could we make this heapless?

pub fn heapless_fizzrustbaz() {
    for i in 1..=100 {
        // max length of the string, assuming we extended beyond 100, would be 11 utf-8 characters
        let mut buf: [char; 11] = [' '; 11];
        let mut start: usize = 0;

        if i % 3 == 0 {
            let end = start + 4;
            let target_slice = &mut buf[start..end];
            for (dest_char, src_char) in target_slice.iter_mut().zip("Fizz".chars()) {
                *dest_char = src_char;
            }
            start = end; // update the start index
        }
        if i % 5 == 0 {
            let end = start + 4;
            let target_slice = &mut buf[start..end];
            for (dest_char, src_char) in target_slice.iter_mut().zip("Buzz".chars()) {
                *dest_char = src_char;
            }
            start = end;
        }
        if i % 7 == 0 {
            let end = start + 3;
            let target_slice = &mut buf[start..end];
            for (dest_char, src_char) in target_slice.iter_mut().zip("Baz".chars()) {
                *dest_char = src_char;
            }
            start = end;
        }

        if start == 0 {
            println!("{i}");
        } else {
            let res = buf[0..start].iter().collect::<String>();
            println!("{res}");
        }
    }
}

This isn’t quite heapless… we do end up collecting into a String from a buffer at the very end, but it’s immutable, and the rest of our operations should be happening entirely on the stack. Does this deliver any performance improvements?

heapless_fizzrustbaz    time:   [58.466 µs 59.442 µs 60.435 µs]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Nope! A small performance regression, if anything, though the difference is small. If I had to guess why, all our nonsense slicing, zipping, and collecting of iterators introduces more overhead than just allocating small Strings. We’d have been better off switching String::new() to String::with_capacity(11).
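For concreteness, here is what that tweak might look like. This is a sketch, not the benchmarked code, and the `label` helper is my own framing; the only change from enhanced_fizzrustbaz is pre-sizing the String:

```rust
// Hypothetical variant: identical logic, but the per-iteration String is
// pre-sized so the push_str calls never trigger a reallocation.
fn label(i: u32) -> String {
    // 11 bytes covers the longest composite, "FizzBuzzBaz"
    let mut s = String::with_capacity(11);
    if i % 3 == 0 { s.push_str("Fizz"); }
    if i % 5 == 0 { s.push_str("Buzz"); }
    if i % 7 == 0 { s.push_str("Baz"); }
    if s.is_empty() { i.to_string() } else { s }
}

fn main() {
    for i in 1..=100 {
        println!("{}", label(i));
    }
}
```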

But what’s actually taking so long here? What actually needs to happen? 100 small heap allocations, 300 modulo operations and numerical comparisons, a few dozen String additions, and then printing either a number or the string. The numerical operations should take much less than one microsecond all together, as should the looping overhead. Heap allocations and string additions are expensive by comparison, but not that expensive, especially if they’re small and occur often.

So it can only be the printing process.

The only thing slower than going to the heap is going to the disk… or to the screen. What if we just remove the print line?
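The no-print variant’s source isn’t shown here, so take this as a minimal sketch of the idea under one assumption: the finished line has to be fed to something like std::hint::black_box, or the optimizer would happily delete the whole loop. The function name is mine:

```rust
use std::hint::black_box;

// Same computation as enhanced_fizzrustbaz, but the finished line goes to
// black_box instead of stdout, so the work survives optimization without I/O.
fn fizz_no_print() -> usize {
    let mut total = 0;
    for i in 1..=100u32 {
        let mut s = String::new();
        if i % 3 == 0 { s += "Fizz"; }
        if i % 5 == 0 { s += "Buzz"; }
        if i % 7 == 0 { s += "Baz"; }
        let line = if s.is_empty() { i.to_string() } else { s };
        total += black_box(line).len();
    }
    total
}

fn main() {
    println!("total bytes: {}", fizz_no_print());
}
```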

enhanced_fizzrustbaz_no_print   time:   [888.64 ns 890.28 ns 891.88 ns]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Watch the units: nanoseconds, not microseconds. 98.5% of all runtime is spent printing. Without that step, the entire thing finishes in under a microsecond, string addition included.

In other words, fizzbuzz is overwhelmingly I/O bottlenecked.

But we’re not taking our ball and going home, oh no sir. We have ways to deal with this.

Right now, we’re printing one line at a time: what if we just saved all our output to a single buffer, and then wrote it all at once?

use std::fmt::Write as FmtWrite;
use std::io::{stdout, Write as IoWrite};

pub fn fizzrustbaz_buffered() {
    // set up a string buffer up front
    let mut output_buffer = String::new();

    for i in 1..=100 {
        // set up a boolean for the local checks, since
        // we're not writing to a local variable
        let mut written = false;
        if i % 3 == 0 {
            output_buffer.push_str("Fizz");
            written = true;
        }
        if i % 5 == 0 {
            output_buffer.push_str("Buzz");
            written = true;
        }
        if i % 7 == 0 {
            output_buffer.push_str("Baz");
            written = true;
        }

        if !written {
            write!(&mut output_buffer, "{i}").unwrap();
        }
        output_buffer.push('\n');
    }

    // get a lock on stdout and print all at once from bytes
    let mut handle = stdout().lock();
    handle.write_all(output_buffer.as_bytes()).unwrap();
}

fizzrustbaz_buffered  time:   [45.809 µs 46.382 µs 46.955 µs]
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

That shaved off around 14 microseconds, nearly a quarter of the runtime.

At this point, we’re heavily, heavily bound not just by our I/O strategy, but by the program on the other side of the pipe: the terminal/console/emulator itself, which is outside the control of our particular Rust program. To show just how bottlenecked we are by it, here is the most significant optimization I could find: eliminating newlines.

fizzrustbaz_buffered_no_newline     time:   [9.0496 µs 9.2078 µs 9.3825 µs]
Found 15 outliers among 100 measurements (15.00%)
  6 (6.00%) high mild
  9 (9.00%) high severe

The simple act of removing newlines, printing the entire buffer as one continuous, illegible block instead of allocating a new line to each number, reduces our runtime by roughly 80%.

Homework: Implement an async variant that prints to stdout without blocking the computation loop. Is this faster? Why?

At this point, I’m out of ideas. The most impactful move would probably be to switch to a quicker terminal… but I’m already running Ghostty! I thought it was a pretty performant terminal to begin with!

If each character is utf-8 encoded, and we assume each ‘line’ averages roughly 5 characters, then at 9 microseconds per call we’re putting text on screen at a rate on the order of 0.5Gb/s, which seems… low? At the very least, the specs for my computer indicate a 300Gb/s memory throughput, unless that spec is actually measuring something else entirely.

I’m sure there are further optimizations to be made here, but I’m out of my depth and this has grown well beyond my original plans for a funny, low-effort post. So let’s leave I/O and throughput to the side, and ask the question you Rustaceans have been waiting on since we started: can we parallelize this?

Well yes, obviously.
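The “contrived” benchmarks below collect every line into a vector rather than printing. Their source isn’t reproduced here, so this is only a guess at the shape using nothing but std scoped threads (the real code may well use rayon or differ in other ways); the names are mine:

```rust
use std::thread;

// the serial kernel: compute the label for one value of i
fn label(i: u64) -> String {
    let mut s = String::new();
    if i % 3 == 0 { s += "Fizz"; }
    if i % 5 == 0 { s += "Buzz"; }
    if i % 7 == 0 { s += "Baz"; }
    if s.is_empty() { i.to_string() } else { s }
}

// split 1..=n into contiguous chunks, one scoped thread per chunk,
// then splice the per-chunk vectors back together in order
fn parallel_fizz(n: u64, workers: u64) -> Vec<String> {
    let chunk = (n + workers - 1) / workers;
    let mut out = Vec::with_capacity(n as usize);
    thread::scope(|scope| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let lo = w * chunk + 1;
                let hi = ((w + 1) * chunk).min(n);
                scope.spawn(move || (lo..=hi).map(label).collect::<Vec<_>>())
            })
            .collect();
        for h in handles {
            out.extend(h.join().unwrap());
        }
    });
    out
}

fn main() {
    let lines = parallel_fizz(105, 4);
    println!("{}", lines.join("\n"));
}
```

Because the chunks are contiguous and spliced back in worker order, the output order matches the serial version, at the cost of one Vec allocation per worker.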

serial_contrived_no_print_fizz_100  time:   [2.6293 µs 2.6392 µs 2.6546 µs]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high severe

parallel_contrived_no_print_fizz_100    time:   [44.210 µs 44.497 µs 44.757 µs]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild

serial_contrived_no_print_fizz_100000   time:   [2.9230 ms 2.9283 ms 2.9336 ms]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

parallel_contrived_no_print_fizz_100000     time:   [1.4538 ms 1.4577 ms 1.4616 ms]
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low mild
  4 (4.00%) high mild

That said, it doesn’t do us any favors until we get well above N=100 (thread startup seems to be around 40µs, which sounds about right).

Note that the single-threaded version on 100 is slower than the 900ns we timed earlier: mainly because this one is actually saving the contents of each loop and collecting them into a vector.

This is a bit of a low point to end on, so let’s do one last thing to lift our spirits. Rather than emphasizing runtime performance, let’s flex with some code golf, and what better way to play code golf than with an overcomplicated, Rube Goldberg driver.

Let’s build a procedural macro for fizzbuzz.

First, we’ll define a couple of structs to make our lives easier:

/// Represents a single FizzBuzz mapping, e.g. `(3, "Fizz")`
struct FizzBuzzRule {
    divisor: Expr,
    phrase: Expr,
}

/// Represents the full macro input, e.g. `100, [(3, "Fizz"), (5, "Buzz")]`
struct FizzBuzzInput {
    max: LitInt,
    rules: Vec<FizzBuzzRule>,
}

Then a quick little parser:

/// Attempts to parse the ParseStream input as a FizzBuzzInput
impl Parse for FizzBuzzInput {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        // take the first token in the ParseStream and parse it as an integer literal
        let max: LitInt = input.parse()?; // TODO make sure that max is not 0

        // parse the comma
        input.parse::<Token![,]>()?;

        // parse the array of tuples containing the rules
        // there had better not be any extra tokens beyond this that we're missing!
        let rules_array;
        syn::bracketed!(rules_array in input);
        let rules_punctuated: Punctuated<(Expr, Token![,], Expr), Token![,]> =
            rules_array.parse_terminated(|stream| {
                let tuple_content;
                syn::parenthesized!(tuple_content in stream);
                let divisor: Expr = tuple_content.parse()?;
                tuple_content.parse::<Token![,]>()?;
                let phrase: Expr = tuple_content.parse()?;
                Ok((divisor, Token![,](proc_macro2::Span::call_site()), phrase))
            }, Token![,])?;

        let rules = rules_punctuated
            .into_iter()
            .map(|(divisor, _, phrase)| FizzBuzzRule { divisor, phrase })
            .collect();

        Ok(FizzBuzzInput { max, rules })
    }
}

And finally, put it all together:

/// Produces a buffered FizzBuzz implementation at compile time from the input maximum and mappings
#[proc_macro]
pub fn fizzbuzz(input: TokenStream) -> TokenStream {
    // call the Parse implementation we wrote above
    let FizzBuzzInput { max, rules } = parse_macro_input!(input as FizzBuzzInput);

    // create the if checks programmatically
    let checks = rules.iter().map(|rule| {
        let divisor = &rule.divisor;
        let phrase = &rule.phrase;
        // the text inside the quote! macro is reproduced verbatim (apart from the #divisor and
        // #phrase variables) in the code generated at compile time
        quote! {
            if i % #divisor == 0 {
                write!(&mut output_buffer, "{}", #phrase).unwrap();
                written = true;
            }
        }
    });

    // assemble the final block of code we'll substitute for the macro
    let expanded_code = quote! {
        {
            use std::io::{stdout, Write as IoWrite};
            use std::fmt::Write as FmtWrite; // avoid name collision

            // coerce the maximum to at least 1
            let input_max = #max as usize;
            let max = input_max.max(1);

            // pre-allocate a buffer, 20 bytes per row (should be enough for small rules)
            let mut output_buffer = String::with_capacity(max * 20);

            for i in 1..=max {
                let mut written = false;

                // aka 'the checks TokenStream is going to be parsed as an iterator of things
                // which you then also have to expand'
                // if an if statement had a type, this would be of type [IfStatement]
                #(#checks)*

                // because we're just writing everything to the buffer, we don't need to care
                // about distinguishing between the numbers and phrases
                if !written {
                    write!(&mut output_buffer, "{}", i).unwrap();
                }
                output_buffer.push('\n');
            }

            let mut handle = stdout().lock();
            handle.write_all(output_buffer.as_bytes()).unwrap();
        }
    };

    // convert back to a TokenStream to be generated and compiled at compile time
    TokenStream::from(expanded_code)
}

And just like that, we have a nice, one-line, extensible, and fairly performant implementation of fizzbuzz built programmatically at compile time.

fizzbuzz!(105, [(3, "Elementary, "), (5, "my dear "), (7, "Watson")]);

Thank you for reading and following along! If you found this amusing and mildly educational, please give the repo a star and share it in your communities.

Join the conversation on Lobste.rs.

And if you’re the kind of nerd who read all the way to the end… I’ll see you at RustConf2025.

