Programming is a mix of both: There are certainly "solutions" - algorithms, programs, etc. - that can be objectively proven without a shadow of a doubt to be ineffective or outright incorrect. There is also the possibility of an algorithm that solves the problem, but is objectively inferior to another, such as an algorithm that runs in O(n) time versus one that runs in O(1) time. On the other hand, there is a fair amount of subjective nuance involved in arriving at a correct solution. Design methodologies, languages, and APIs all come and go in a technological blink of an eye, and each will have its pros and cons, its proponents and its skeptics.
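To make the O(n)-versus-O(1) comparison concrete, here is a minimal sketch in Python 3. It is not one of the FuzzBuzz solutions; the function names are made up for illustration, and the O(1) label assumes the arithmetic itself takes constant time.

```python
def sum_loop(n):
    """O(n): add the numbers 1..n one at a time."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n):
    """O(1): the closed form n * (n + 1) / 2 gives the same answer."""
    return n * (n + 1) // 2


print(sum_loop(100), sum_formula(100))  # prints: 5050 5050
```

Both functions are correct, but the second does the same job without walking through every number.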

Nobody - sane, at least - involved in all this is under the impression that there is One True Way to do things. You could lock several seasoned programmers up in separate rooms, all of whom subscribe to the exact same design methodologies and use the exact same programming languages, ask them to design a program, and they would all likely emerge with completely separate, and likely equally or near-equally satisfactory, solutions. So, without further ado, let's talk about a simple programming problem, and some possible solutions to it. At the end, we'll take a quick minute to discuss which of these solutions someone might actually end up using.

The problem in question is one I've named FuzzBuzz, which is definitely not a reference to an overused programming interview question. To solve FuzzBuzz, one must write a program that considers the numbers from 1 to 100, inclusive. For each number, if the number in question is even, the program must print "Fuzz". If the number is odd, the program must print "Buzz". If the number is a multiple of 3, the program should ignore its even or odd nature and print "FuzzBuzz" instead.

This is an intentionally simplified problem, both to make it easy to come up with multiple solutions, and to make the solutions themselves clearer. By the time you finish reading this sentence, I bet you'll probably have thought of at least one way to solve this problem, but let me go over a couple first.

First up is probably the one that came to everyone's mind: a simple restatement of the problem's definition in code.

    for x in xrange(1, 101):  # range/xrange stops at the integer before the 2nd argument
        if x % 3 == 0:
            print "FuzzBuzz"
        elif x % 2 == 0:
            print "Fuzz"
        else:
            print "Buzz"

Not really a whole lot to talk about with this one; like I said earlier, it's probably the first one you thought up yourself - it was the first one I came up with, as well. Honestly, if this were a real-world problem, most people would (and probably should) stop here. But innovation never sprang from a contented mind, right?

So, for our next solution, let's limit ourselves in some way. The modulo operator is pretty central to the prior solution, so let's take that out next.

    def is_multiple_of(number, mul):
        """Return True if number is a multiple of mul, False otherwise.

        Okay, so here's how this works: Normally, Python 2 - and most other
        programming languages - chops off the decimal part of the quotient
        if both terms are integers, i.e., integer division is performed.
        However, if one of the terms is a float, then the decimal part will
        be kept. Any number plus a decimal part is going to be greater than
        simply the number itself, so we compare the two. If number is in
        fact a multiple of mul, then there will be no decimal part, and both
        sides of the expression will be equal.
        """
        if float(number) / mul == number / mul:
            return True
        else:
            return False

    def multiple_of_three(number):
        return is_multiple_of(number, 3)

    def multiple_of_two(number):
        return is_multiple_of(number, 2)

    for x in xrange(1, 101):
        if multiple_of_three(x):
            print "FuzzBuzz"
        elif multiple_of_two(x):
            print "Fuzz"
        else:
            print "Buzz"
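A note for anyone following along on Python 3: there, `/` always performs true division, so comparing `float(number) / mul` with `number / mul` would be true for every input. A minimal sketch of the same idea, assuming Python 3 and using `//` for the truncating side, might look like this (the `fuzzbuzz_no_modulo` wrapper is a name I made up so the result can be inspected as a list):

```python
def is_multiple_of(number, mul):
    # In Python 3, `/` keeps the fractional part and `//` drops it,
    # so the two agree only when there is no fractional part.
    return number / mul == number // mul


def fuzzbuzz_no_modulo():
    """Hypothetical helper: collect the 100 results instead of printing them."""
    results = []
    for x in range(1, 101):
        if is_multiple_of(x, 3):
            results.append("FuzzBuzz")
        elif is_multiple_of(x, 2):
            results.append("Fuzz")
        else:
            results.append("Buzz")
    return results


print(fuzzbuzz_no_modulo()[:6])
# prints: ['Buzz', 'Fuzz', 'FuzzBuzz', 'Fuzz', 'Buzz', 'FuzzBuzz']
```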

Still pretty straightforward. Not having access to the modulo operator means we had to basically reinvent one ourselves, though. But we could go further still, couldn't we? How about no floating-point math?

    def is_multiple_of(number, mul):
        """Return True if number is a multiple of mul, False otherwise.

        So again we're exploiting the properties of integer division.
        For any divisor, integer division maps each run of mul consecutive
        dividends to the exact same quotient, and the smallest number in
        each run is a multiple of the divisor. All we have to do is check
        that number is the smallest of its run. We do that by seeing if the
        result changes if we subtract 1. If so, number is the start of a run
        (i.e., number - 1 is part of the previous run), so return True.
        Otherwise, we were given a number from later in the run, so return
        False.
        """
        if number / mul > (number - 1) / mul:
            return True
        else:
            return False

    def multiple_of_three(number):
        return is_multiple_of(number, 3)

    def multiple_of_two(number):
        return is_multiple_of(number, 2)

    for x in xrange(1, 101):
        if multiple_of_three(x):
            print "FuzzBuzz"
        elif multiple_of_two(x):
            print "Fuzz"
        else:
            print "Buzz"
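If the subtract-one trick seems opaque, printing a few quotients side by side may help. The sketch below assumes Python 3, so it spells integer division as `//`; the quotient only steps up when the dividend reaches a multiple of the divisor.

```python
def is_multiple_of(number, mul):
    # The integer quotient increases exactly when number hits a multiple of mul.
    return number // mul > (number - 1) // mul


for n in range(1, 8):
    print(n, n // 3, (n - 1) // 3, is_multiple_of(n, 3))
# prints:
# 1 0 0 False
# 2 0 0 False
# 3 1 0 True
# 4 1 1 False
# 5 1 1 False
# 6 2 1 True
# 7 2 2 False
```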

Now things are getting ever-so-slightly hairy. But let's get dangerous now. How about no Boolean logic whatsoever?

    from collections import deque

    sequence = deque(["Buzz", "Fuzz", "FuzzBuzz", "Fuzz", "Buzz", "FuzzBuzz"])

    def next_in_sequence():
        """Return next string in FuzzBuzz sequence.

        The FuzzBuzz problem follows a pattern:

        * Odd
        * Even
        * Multiple of 3
        * Even
        * Odd
        * Multiple of 3

        All we have to do is keep track of where we are in the sequence.
        To accomplish this, we use a deque and a little bit of pushing and
        popping to keep from having to either pre-generate the entire
        sequence beforehand, or from having to use an iterator.
        """
        next = sequence.popleft()
        sequence.append(next)
        return next

    for x in xrange(1, 101):
        print next_in_sequence()
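The pattern has six entries because 6 is the least common multiple of 2 and 3, so the even/odd/multiple-of-3 status of a number repeats every six steps. One way to convince yourself is to compare the sequence against the modulo version directly. The sketch below assumes Python 3, uses function names of my own choosing, and swaps the popleft/append pair for `deque.rotate(-1)`, which has the same effect of moving the front item to the back.

```python
from collections import deque

PATTERN = ["Buzz", "Fuzz", "FuzzBuzz", "Fuzz", "Buzz", "FuzzBuzz"]


def fuzzbuzz_modulo(x):
    """The straightforward rule, used here as the reference answer."""
    if x % 3 == 0:
        return "FuzzBuzz"
    if x % 2 == 0:
        return "Fuzz"
    return "Buzz"


def fuzzbuzz_sequence(count):
    """Produce the first `count` results by cycling through PATTERN."""
    sequence = deque(PATTERN)
    results = []
    for _ in range(count):
        results.append(sequence[0])
        sequence.rotate(-1)  # move the front item to the back
    return results


reference = [fuzzbuzz_modulo(x) for x in range(1, 101)]
print(fuzzbuzz_sequence(100) == reference)  # prints: True
```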

Alright, now we're talking. Admittedly, you probably saw this one coming if you were running the programs alongside reading this article - once you recognize the pattern, this solution is pretty obvious. But let's make one final push: No modulo operator, no math, no Boolean logic, and no loops.

    from collections import deque
    import sys

    # Python allows the user to manually set the recursion limit.
    # So, we exploit this by setting it to 101, which simulates the number of
    # loop iterations we'd execute if we were using a loop.
    # Eventually, we'll exceed this number, and Python will throw
    # a RuntimeError, essentially accomplishing the same thing as
    # a regular for/while loop.
    sys.setrecursionlimit(101)

    sequence = deque(["Buzz", "Fuzz", "FuzzBuzz", "Fuzz", "Buzz", "FuzzBuzz"])

    def next_in_sequence():
        """Print next string in FuzzBuzz sequence.

        This is identical to the deque solution in sequence.py, save for
        printing out the next string instead of returning it, and the
        recursion bit, so go see sequence.py for an explanation of what's
        going on here.
        """
        next = sequence.popleft()
        sequence.append(next)
        print next
        next_in_sequence()

    try:
        next_in_sequence()
    except RuntimeError:
        # swallow the eventual exception due to recursion taking one step too many
        pass

I feel now is a good time to stop and discuss the various solutions, now that we have an ample amount. Like I mentioned earlier on, if this were a real-world problem, I imagine most people would use the first solution. It's clear, concise, and gets its point across without needing much in the way of hand-holding. It's easy to extend if we need to tackle additional multiples, as well.
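As a rough illustration of that last point, here is one way the first solution could be extended, assuming Python 3. The table-driven shape, the `label` function name, and especially the multiples-of-5 "Bazz" rule are all made up for this example; none of them are part of the FuzzBuzz problem.

```python
# Rules are checked in order, and the first matching divisor wins.
# The (5, "Bazz") entry is a made-up rule, included only for illustration.
RULES = [(5, "Bazz"), (3, "FuzzBuzz"), (2, "Fuzz")]
DEFAULT = "Buzz"


def label(x, rules=RULES, default=DEFAULT):
    """Return the word for x according to the first rule that matches."""
    for divisor, word in rules:
        if x % divisor == 0:
            return word
    return default


for x in (4, 7, 9, 10):
    print(x, label(x))
# prints:
# 4 Fuzz
# 7 Buzz
# 9 FuzzBuzz
# 10 Bazz
```

Adding another multiple then means adding one entry to the table rather than another `elif` branch.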

Most of the rest of the solutions are pretty silly if we examine them without the restrictions we came up with. The modulo operator is a pretty useful tool for this problem, so it makes no sense to try to work around it, or reimplement it with our own code. I could see an argument being made for the non-recursive sequence solution - some might consider it more "elegant" than the original solution, though obviously that's up for debate.

Some might also think the sequence version is faster, considering the O(1) running time of deque.popleft() and deque.append() (in CPython, collections.deque is implemented as a doubly-linked list of fixed-size blocks under the hood), but at least on my machine, the initial solution is faster when testing 10,000,000 numbers, completing in about 3.1 seconds versus 3.3 seconds for the sequence version. After some fiddling around, I was able to produce a variant on the sequence solution that ran in about 2.9 seconds on the same set:

    from collections import deque

    sequence = deque(["Buzz", "Fuzz", "FuzzBuzz", "Fuzz", "Buzz", "FuzzBuzz"])

    # Binding oft-used functions to local scope results in a speed increase
    # due to decreased variable lookup time
    popleft = sequence.popleft
    append = sequence.append

    for x in xrange(1, 101):
        # Moving the function's code to the loop body removes the function call,
        # which actually increases performance slightly
        next = popleft()
        append(next)
        print next

By this point though, we've lost a fair amount of readability, which was one of the main reasons to use the sequence solution. On the upside, we have a reasonable increase in speed, which might be useful in a performance-critical application. On a side note, the performance of the other solutions is pretty abysmal in comparison; the modulo-less solution ran in about 8.5 seconds, whereas the float-less solution ran in about 6 seconds. As for the recursion-based solution, it segfaulted on my machine before it could actually complete the 10,000,000-iteration loop.
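If you want to repeat this kind of comparison yourself, a small `timeit` harness is one option. The sketch below assumes Python 3 and is not the script that produced the numbers above: the function names and the iteration counts are my own choices, and the results are collected into a list instead of printed so that terminal output doesn't dominate the measurement. Expect different timings on different machines and Python versions.

```python
import timeit
from collections import deque


def modulo_version(count):
    """The first solution, returning its results as a list."""
    out = []
    for x in range(1, count + 1):
        if x % 3 == 0:
            out.append("FuzzBuzz")
        elif x % 2 == 0:
            out.append("Fuzz")
        else:
            out.append("Buzz")
    return out


def sequence_version(count):
    """The inlined deque variant, with its bound methods held in local names."""
    sequence = deque(["Buzz", "Fuzz", "FuzzBuzz", "Fuzz", "Buzz", "FuzzBuzz"])
    popleft = sequence.popleft
    append = sequence.append
    out = []
    for _ in range(count):
        item = popleft()
        append(item)
        out.append(item)
    return out


if __name__ == "__main__":
    for fn in (modulo_version, sequence_version):
        seconds = timeit.timeit(lambda: fn(100000), number=5)
        print(fn.__name__, round(seconds, 3))
```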

So, to summarize: Programming is about 50% art and 50% science; there's usually more than one demonstrably correct solution to a problem, and on top of that, there's often a fair amount of room for debate about which of the solutions one should proceed with. We presented a simple programming problem, came up with some possible solutions for it, and discussed their pros and cons.

As an added bonus, I've set up a GitHub repository at http://github.com/DangerOnTheRanger/fuzzbuzz that contains all the solutions in this article, as well as reference output to test your own solutions with. If you come up with a solution you like, regardless of its performance or what language it's written in, send me a pull request, and I'll add it to the repo.

Data challenges like FIZZBUZZ have a set of fairly optimal solutions each of which has subtle advantages and disadvantages over the others. The real interesting situation is where you build a complex application or system to address a complex domain. Once you start doing this, you have to contend with the information side of things as well as the data side of things. We don't presently have the tools to write code to handle information well. So each application addresses the information challenge in an incomplete, ad hoc, and frankly sub-optimal way without proper tools. These solutions have huge differences with each other because they all end up falling 90% under the 'art' category. Someday we will know how to work with information inside a computer program just as well as data, and there will be a 'FIZZBUZZ' interview question for information too.
