What is bug surface area?

Understand how the opportunity for new bugs increases as your products grow.

July 3, 2024

Introduction

Most defects end up costing more than it would have cost to prevent them. Defects are expensive when they occur, both the direct costs of fixing the defects and the indirect costs because of damaged relationships, lost business, and lost development time. — Kent Beck, Extreme Programming Explained

You likely know what a bug is in a software project: a feature behaves in an unexpected and unintended way because of an error in the underlying code. Knowing how bugs affect your products, and what to expect from them, is important for any software company. One of the challenges your development team faces is that as your project grows, the number of potential bugs grows at a much faster pace, sometimes even exponentially. Reducing the opportunity for bugs to surface before you release to production is essential for any successful software project. But why does this happen? It's best to look at an example project and then dive into the underlying behavior of what's happening.

How a simple chat application can turn into a wild mess

Let's say we're working on a simple chat application that opens a window between two computers. You write some text in a box, click a button, and your friend on the other side can view your message (that's already two features: sending a message and viewing one). Now suppose you want to chat with different people, so you need a list of friends you can talk to, giving you a third feature. But what happens if someone who isn't your friend sends you a chat? This is a new edge case to work out, and the solution requires another feature: a basic permission system that lets two users message each other only if each is on the other's friend list. What happens if someone tries to message a recipient whose friend list they aren't on? Should there be an alert? If so, that's another feature added.
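The permission rule described above can be sketched in a few lines of Python. This is only an illustration, not the design of any real chat product: the `User` class, `can_send`, `send_message`, and the alert string are all hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class User:
    """A chat user with a set of friend names (hypothetical model)."""
    name: str
    friends: set[str] = field(default_factory=set)


def can_send(sender: User, recipient: User) -> bool:
    """Allow a message only when each user lists the other as a friend."""
    return recipient.name in sender.friends and sender.name in recipient.friends


def send_message(sender: User, recipient: User, text: str) -> str:
    """Return a delivery status; the alert string stands in for a UI alert."""
    if not can_send(sender, recipient):
        return f"alert: {recipient.name} is not accepting messages from {sender.name}"
    return f"delivered to {recipient.name}: {text}"


alice = User("alice", {"bob"})
bob = User("bob", {"alice"})
carol = User("carol", {"alice"})  # carol lists alice, but alice does not list carol

print(send_message(alice, bob, "hi"))       # mutual friends: delivered
print(send_message(carol, alice, "hello"))  # one-sided: alert
```

Even in this tiny sketch, `send_message` already relies on both the friend list and the permission check, so a change to either one can alter what the sender sees.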

We can keep playing this game, but it should be clear that one of the core components is the chat system that sends messages between devices. If changing the chat feature's code requires every other part of the system to be updated, then every other feature in your product depends on the basic chat feature. That means any time you change the code for the chat functionality, you must make sure every other part of your system still works correctly.
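One way to picture this dependence is as a graph in which each feature lists the features it relies on. The sketch below assumes a hypothetical dependency map for the chat example (the feature names and their relationships are invented for illustration) and walks it to find everything that needs rechecking after a change.

```python
from __future__ import annotations

from collections import deque

# Hypothetical dependency map: each feature lists the features it relies on.
DEPENDS_ON = {
    "send_message": [],
    "view_message": ["send_message"],
    "friend_list": [],
    "permissions": ["send_message", "friend_list"],
    "blocked_alert": ["permissions"],
}


def affected_by(changed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every feature that directly or indirectly depends on `changed`."""
    # Invert the map so we can walk from a feature to its dependents.
    dependents: dict[str, list[str]] = {name: [] for name in depends_on}
    for feature, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(feature)

    # Breadth-first walk over dependents, recording each feature once.
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for feature in dependents[current]:
            if feature not in seen:
                seen.add(feature)
                queue.append(feature)
    return seen


print(sorted(affected_by("send_message", DEPENDS_ON)))
```

With this assumed map, a change to `send_message` touches three of the other four features, while a change to `blocked_alert` touches none, which is why changes to core components are the expensive ones.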

What's worse is that so far we've assumed everything works fine after changing code in the chat system. In practice, these changes commonly introduce bugs in the dependent features you've updated. Now you need to change code in every feature that depends on the chat system and fix whatever bugs were introduced. But that puts you back where you started: those fixes could cascade into new bugs in other features. This game of cat and mouse comes from bug surface area: the non-linear relationship between the number of features and the number of potential bugs within a software project. Whenever you add new functionality or change existing code, you could be introducing problems that cascade through the interdependent features of the project.
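To get a feel for how non-linear this growth can be, here is a rough counting sketch. It rests on two simplifying assumptions that are not measurements of any real project: that a bug can hide in the interaction between any pair of features, or, in the worst case, in any group of two or more features. The function names are hypothetical.

```python
from math import comb


def pairwise_interactions(n: int) -> int:
    """Number of distinct feature pairs: n * (n - 1) / 2."""
    return comb(n, 2)


def feature_combinations(n: int) -> int:
    """Number of groups of two or more features: 2**n - n - 1."""
    return 2**n - n - 1


# Print the feature count next to both interaction counts.
for n in (2, 5, 10, 20):
    print(n, pairwise_interactions(n), feature_combinations(n))
```

Under these assumptions, going from 5 to 10 features doubles the feature count but raises the number of pairs from 10 to 45 and the number of possible groups from 26 to 1013. Real projects sit well below the worst case, since most features do not interact, but the counts show why the number of places a bug can hide outpaces the number of features.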

How QA takes control of bug surface area

Thinking about this for the first time might make your head spin. It's a challenge every software project runs into, and it's well studied. One of the most effective tools for fighting these kinds of bug explosions is automated testing: automated checks your developers run before they upload their code changes to the main repository. Without them, developers are left scrounging around the codebase trying to find any functionality their changes have broken. This is a tedious and error-prone process. Would you want your users to wake up one day to a broken workflow because a developer uploaded a last-minute change and forgot to check an edge case? As your product matures and becomes more complex, manual checking becomes untenable because of how many checks a developer would have to make before uploading their changes.
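As a minimal sketch of what such an automated check might look like, the example below uses Python's built-in `unittest` module to pin down the friend-list permission rule from the chat example. The `can_send` function and the `FRIENDS` data are hypothetical stand-ins for real application code.

```python
import unittest

# Hypothetical friend lists: each user maps to the set of users they have added.
FRIENDS = {
    "alice": {"bob"},
    "bob": {"alice"},
    "carol": {"alice"},
}


def can_send(sender: str, recipient: str, friends: dict) -> bool:
    """Allow a message only when both users list each other as friends."""
    return (
        recipient in friends.get(sender, set())
        and sender in friends.get(recipient, set())
    )


class CanSendTests(unittest.TestCase):
    def test_mutual_friends_can_chat(self):
        self.assertTrue(can_send("alice", "bob", FRIENDS))

    def test_one_sided_friendship_is_rejected(self):
        # The edge case a rushed manual check is most likely to miss.
        self.assertFalse(can_send("carol", "alice", FRIENDS))

    def test_unknown_user_is_rejected(self):
        self.assertFalse(can_send("dave", "alice", FRIENDS))
```

Saved to a file, these checks can be run with `python -m unittest <file>` before every upload. If a later change to the chat code breaks the one-sided friendship case, the test fails immediately, so the bug is caught before a user finds it.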

References