Thank you for your interest in our newsletters. You will receive an email shortly to confirm your subscription.
Matt Hillman, Principal Security Researcher
15 mins read
Fuzzing is a way of discovering bugs in software by providing randomized inputs to programs to find test cases that cause a crash. Fuzzing your programs can give you a quick view on their overall robustness and help you find and fix critical bugs.
It’s ultimately a black box technique, requiring no access to source code, but it can still be used against software for which you do have source code. This is because it will potentially find bugs more quickly and avoid the need to review lots of code. Once a crash is detected, if you have the source code, it should become much easier to fix.
Fuzzing can be very useful, but it’s no silver bullet. Here are some of the pros and cons of the fuzzing technique:
Fuzzers provide random input to software. This may be in the form of a network protocol, a file of a certain format, or direct user input. The fuzzed input can be completely random with no knowledge of what the expected input should look like, or it can be created to look like valid input with some alterations.
A fuzzer that generates completely random input is known as a “dumb” fuzzer, as it has no built-in intelligence about the program it’s fuzzing. A dumb fuzzer requires the smallest amount of work to produce (it could be as simplistic as piping /dev/random into a program). This small amount of work can produce results for very little cost – one of fuzzing’s big advantages. However, sometimes a program will only perform certain processing if particular aspects of the input are present. For example, a program may accept a “name” field in its input, and this field may have a “name length” associated with it.
If these fields aren’t present in a form that’s valid enough for the program to identify, it may never attempt to read the name. And if the fields are present in a valid form but the length value is set to the incorrect value, the program may read beyond the buffer containing the name and trigger a crash. Without input that’s at least partly valid, this is very unlikely to happen. In these cases, “smart” fuzzers can be used. Smart fuzzers are programmed with knowledge of the input format, i.e. a protocol definition or rules for a file format. It can then construct mostly valid input and only fuzz parts of the input within that basic format.
The greater the level of intelligence you build into a fuzzer, the deeper you may be able to go into a protocol or file format’s processing - but the more work you create for yourself too. A balance needs to be found between these two extremes. It can be good to begin with a much more dumb fuzzer and increase its intelligence as the code quality of the software you’re testing increases. If you get lots of crashes with a simplistic fuzzer, there’s no point spending a long time making it more intelligent until the code quality increases to a point where it’s required.
Broadly speaking, fuzzers can be split into two categories based on how they create input to programs – mutation-based and generation-based. This section details those categories as well as offering a brief description of a more advanced technique called Evolutionary Fuzzing.
Mutation-based fuzzers are arguably one of the easier types to create. This technique suites dumb fuzzing but can be used with more intelligent fuzzers as well. With mutation, samples of valid input are mutated randomly to produce malformed input.
A dumb mutation fuzzer can simply select a valid sample input and alter parts of it randomly. For many programs, this can provide a surprising amount of mileage, as inputs are still often significantly similar enough to a valid input. This means good code coverage can be achieved without the need for further intelligence.
You can build in greater intelligence by allowing the fuzzer to do some level of parsing of the samples to ensure it only modifies specific parts or doesn’t break the overall structure of the input so it’s immediately rejected by the program. Some protocols or file formats will incorporate checksums that will fail if they’re modified arbitrarily. A mutation-based fuzzer should usually fix these checksums so the input’s accepted for processing or the only code tested is the checksum validation and nothing else.
Two useful techniques that can be used by mutation-based fuzzers are described below.
A fuzzer can take saved sample inputs and replay them after mutating them. This works well for file format fuzzing where a number of sample files can be saved and fuzzed to provide to the target program.
Simple or stateless network protocols can also be fuzzed effectively with replay, as the fuzzer won’t need to make lots of legitimate requests to get deep into the protocol. For a more complex protocol, replay may be more difficult. This is because the fuzzer may need to respond in a dynamic way to the program to allow processing to continue deep into the protocol, or the protocol may simply be inherently non-replayable.
You may have heard of Man-in-the-Middle (MITM) as a technique used by penetration testers and hackers, but it can also be used for mutation-based network protocol fuzzing. With MITM, you place yourself in the middle of a client and server (or two clients in the case of peer-to-peer networking), intercepting and possibly modifying messages passed between them. In this way, you’re acting like a proxy between the two.
The term MITM is generally used when it’s not expected you’ll be acting like a proxy, but for our purposes the terms are largely interchangeable. By setting your fuzzer up as a proxy, it can mutate requests or responses depending on whether you’re fuzzing the server or the client. Again, the fuzzer could have no intelligence about the protocol and randomly alter some requests and not others. Or it could intelligently target requests at the specific level of the protocol you’re interested in. Proxy-based fuzzing can allow you to take an existing deployment of a networked program and quickly insert a fuzzing layer into it, without needing to make your fuzzer act like a client or server itself.
Generation-based fuzzers actually generate input from scratch rather than mutating existing input. They usually require some level of intelligence to construct input that makes at least some sense to the program, although generating completely random data would also technically be generation.
Generation fuzzers often split a protocol or file format into chunks, which they can build up in a valid order, and randomly fuzz some of those chunks independently. This can create inputs that preserve their overall structure, but contain inconsistent data within it. The granularity of these chunks and the intelligence with which they’re constructed define the level of intelligence of the fuzzer. While mutation-based fuzzing can have a similar effect as generation fuzzing (as, over time, mutations will be randomly applied without completely breaking the input’s structure), generating inputs ensures this will be so.
Generation fuzzing can also get deeper into a protocol more easily, as it can construct valid sequences of inputs applying fuzzing to specific parts of that communication. It also allows the fuzzer to act as a true client/server, generating correct, dynamic responses where these can’t be blindly replayed.
Evolutionary fuzzing’s an advanced technique, which we’ll briefly describe. It allows the fuzzer to use feedback from each test case to learn the format of the input over time. For example, by measuring the code coverage of each test case, the fuzzer can work out which properties of the test case exercise a given area of code, and gradually evolve a set of test cases that cover the majority of the program code. Evolutionary fuzzing often relies on other techniques similar to genetic algorithms and may require some form of binary instrumentation to operate correctly.
Even for relatively dumb fuzzers, it’s important to keep in mind what part of the code your test cases are actually likely to hit. To give a simple example, if you’re fuzzing an application protocol that uses TCP/IP and your fuzzer randomly mutates a raw packet capture, you’re likely to be corrupting the TCP/IP packets themselves. As a result, your input’s unlikely to get processed by the application at all.
Or if you were testing an OCR program that parsed images of text into real text, but you were mutating the whole of an image file, you could end up testing its image parsing code more often than the actual OCR code. If you wanted to target that OCR processing specifically, you might wish to keep the headers of the image file valid.
Likewise, you may be generating input that’s so random it doesn’t pass an initial sanity check in the program, or the code contains a checksum you don’t correct. You’re then only testing that first branch in the program, never getting deeper into the program code.
To operate effectively, a fuzzer needs to perform the following important tasks:
Fuzzers often split many of these tasks into separate modules. For example, having one library that can mutate data or generate it based on a definition and another to provide test cases to the target program and so on. Below are some notes on each of these tasks.
Generating test cases will vary depending on whether mutation-based or generation-based fuzzing is being employed. With either, there will be something that needs randomly transforming, whether it’s a field of a particular type or an arbitrary chunk of data.
These transformations can be completely random, but it’s worth remembering edge and corner cases can often be the source of bugs in programs. As such, you may wish to favor such cases and include values such as:
Depending on what you’re fuzzing, there may be specific values or characters that are more likely to trigger bugs. For example:
The simplest way to reproduce a test case is to record the exact input used when a crash is detected. However, there are other ways to ensure reproducibility that can be more convenient in certain circumstances. One way to do this is storing the initial seed used for the random component of test case generation, and ensuring all subsequent random behavior follows a path that can be traced back to that seed. By re-running the fuzzer with the same seed, the behavior should be reproducible. For example, you may only record the test case number and the initial seed and then quickly re-execute generation with that seed until you reach the given test case.
This technique can be useful when the target program may accumulate dependencies based on past inputs. Previous inputs may have caused the program to initialize various items in its memory that are required to be present to trigger the bug. In these situations, simply recording the crashing test case wouldn’t be sufficient to reproduce the bug.
Interfacing with the target program to provide the fuzzed input is often straightforward. For network protocols, it may involve sending the test case over the network, or responding to a client request. For file formats, it could mean executing the program with a command line argument pointing to the test case. However, sometimes the input’s provided in a form that’s not trivial to generate in an automated way or where scripting the program to execute each test case has a high overhead and proves to be very slow. Creative thinking in these cases can reveal ways to exercise the relevant piece of code with the right data.
For example, this may be performed by instrumenting a program in memory artificially to execute a parsing function with the input provided as an argument entirely in memory. This can remove the need for the program to go through a lengthy loading procedure before each test case. And further speed increases could be obtained by having test cases generated and provided completely in memory rather than going via the hard drive.
Crash detection is critical for fuzzing. If you can’t accurately determine when a program’s crashed, you won’t be able to identify a test case as triggering a bug. There are a number of common ways to approach this:
This provides you with the most accurate results, and you can script the debugger to provide you with a crash trace as soon as a crash is detected. However, having a debugger attached can slow programs significantly and can cause quite an overhead. The fewer test cases you can generate in a given period of time, the fewer chances you have of finding a crash.
Rather than attaching a debugger, you can simply see if the process ID of the target still exists on the system after executing the test case. If the process has disappeared, it probably crashed. You can re-run the test case in a debugger later if you want some more information about the crash. You can even do this automatically for each crash, while still avoiding the slowdown of having the debugger attached for every case.
If the program normally responds to your test cases, you can set a timeout after which you assume the program has either crashed or frozen. This can also detect bugs that cause the program to become unresponsive but not necessarily terminate.
Whichever method you use, the program should be restarted whenever it crashes or becomes unresponsive, in order to allow fuzzing to continue.
There are a number of things you can do to measure or improve the quality of your fuzzing. While these are useful things to keep in mind, you may not need to bother with them all if you’re already getting lots of unique crashes within a useful timeframe.
Possibly one of the most important factors in fuzzing is speed. How many test cases per second/minute can you run? Sensible values will of course depend on the target, but the more test cases you can execute, the more likely you will be to find a crash in a given time period. Fuzzing is random, so every test case is like a lottery ticket, and you want as many of them as you can get.
There are lots of things you can do to increase the speed of your test cases, such as improving the efficiency of your generation or mutation routines, parallelizing test cases, decreasing timeouts, or running programs in “headless” modes where they don’t display a GUI. If you want to, you can simply buy faster kit.
Finding crashes is only the start of the process. Once you find a crashing test case, you’ll need to analyze it, work out what the bug is, and either fix it or write an exploit for it depending on your motivation. If you have thousands of crashing test cases, this can be quite daunting. By categorizing crashes you can prioritize them according to which ones are most interesting to you. This can also help you identify when one test case is triggering the same bug as another, so you only keep the cases relating to unique crashes.
In order to do this, you’ll need some automated information about the crash so you can make a decision. Running the test case with the target attached to a debugger can provide a crash trace which you can parse to find values such as the exception type, register values, stack contents, and so on.
One Microsoft tool which can help with this is !exploitable (pronounced “bang exploitable”). It works with the WinDbg debugger to categorize crashes according to how exploitable it thinks the bug is.
As fuzzing randomly alters input, it’s common for a crashing test case to have multiple alterations which aren’t relevant to triggering the bug. Test case reduction is the act of narrowing down a test case to the minimum set of alterations from a valid input required to trigger the bug, so you only need to focus on that part of the input in your analysis.
This reduction can be performed manually, but can also be performed automatically by the fuzzer. When a crashing test case is encountered, the fuzzer can re-execute the test case several times. Each time, it gradually reduces the alterations made to the input until the smallest set of changes remain, whilst still triggering the bug. This can simplify your analysis and help to categorize crashing test cases as you’ll know precisely which parts of the input are affected.
This is a measure of how much of the program’s code has been executed by the fuzzer. The idea is the more coverage you get, the more of the program you’ve actually tested. Measuring code coverage can be tricky and often requires binary instrumentation to track which portions of code are being executed. You can also measure code coverage in different ways, such as by line, by basic block, by branch, or by code path.
Code coverage isn’t a perfect measure with regards to fuzzing, as it’s possible to execute code without revealing bugs in it. And there are often areas of code that almost never get executed, such as safety error checks that are unlikely to really be needed and or interesting to us anyway. Nevertheless, some form of code coverage measurement can provide insight into what your fuzzer is triggering within the program. This is especially the case when your fuzzing is completely black box and you may not yet know much about the program’s inner workings.Some tools and technologies that may help with code coverage include Pai Mei, Valgrind, DynamoRIO, and DTrace.
There are a number of existing frameworks that allow you to create fuzzers without having to work from scratch. Some of these frameworks are complex and it may still take a while to create a working fuzzer for your target. By contrast, others take a very simple approach.
A selection of these frameworks and fuzzers are listed below:
Radamsa’s designed to be easy to use and flexible. It attempts to “just work” for a variety of input types and contains a number of different fuzzing algorithms for mutation.
Sulley provides a comprehensive generation framework, allowing structured data to be represented for generation based fuzzing. It also contains components to help with recording test cases and detecting crashes.
The Peach framework can perform smart fuzzing for file formats and network protocols. It can perform both generation- and mutation-based fuzzing and contains components to help with modelling and monitoring the target.
SPIKE is a network protocol fuzzer. It requires good knowledge of C to use and is designed to run on Linux.
Grinder is a web browser fuzzer, which also has features to help in managing large numbers of crashes.
NodeFuzz is a node.js-based harness for web browsers, which includes instrumentation modules to gain further information from the client side.
AFL is a grey box fuzzing tool that utilizes instrumentation compiled in to its target code. AFL was originally written for fuzzing C and C++ programs in Linux, and was later forked to support Windows, Java, and .Net.