Is there a place for manual testing in software development?

Anyone who has read my work knows that I’m a huge fan of automated testing. I’ve preached automated testing ever since Chris Hartjes (Grumpy Programmer) wrote about it years ago. He converted me, and I have been a dutiful, loyal apostle of the “test everything” movement for years.

Similarly, I’ve long been a faithful proponent of code review to catch bugs and improve development processes.

But what happens when automated testing and code review fall flat? What do we do then? How do we recover?

What happened

A client of mine recently prevented a very serious bug from going into production. The bug was introduced in an area of the code that few people understood and most people avoided. That code had thorough automated tests – but it also had special flags for development versus production. And the code review that was performed apparently missed the bug's introduction, leaving the client preparing to release the product without having detected it.

At least, that was the story – until manual testers got involved. A manual tester identified the issue before shipping the product to customers. They were able to articulate that there was a problem, and over the weekend the team was able to resolve the issue without impact to the customers or the timeline.

When automation and code review aren’t enough

Automation and code review are exceptionally good tools. They provide us with insights into our work, but as tools they have natural limitations that mean they aren’t always effective. That was the case here.

Automated tests are only really as good as the bugs you’ve already found. While we can thoroughly test expected functionality, we cannot test every permutation of possible failure cases – at least not without additional, specialized techniques like mutation testing. This means that our tests, while potentially thorough, are not impervious to missing a bug.
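To make that concrete, here is a contrived sketch (hypothetical names, not the client’s code) of a test with full line coverage that would still let a bug slip through – exactly the gap mutation testing tools probe by flipping an operator and checking whether any test fails:

```python
def apply_discount(price, percent):
    # A mutation testing tool would try variants of this line, such as
    # `price + price * percent / 100`, and report any mutant that survives.
    return price - price * percent / 100

def test_apply_discount_zero_percent():
    # Full line coverage, but a 0% discount cannot distinguish `-` from `+`:
    # the mutated version passes this test too, so the mutant survives.
    assert apply_discount(100.0, 0) == 100.0
```

A second assertion with a nonzero discount (e.g. 10% of 100.0 yielding 90.0) would kill that mutant – which is precisely the kind of missing case such tools surface.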

In the case of my client, the tests were also focused mainly on the development flag – rather than the production flag. Tests need to run as close to production as possible, but this is not always a reality for all tests. In fact, many tests cannot be run against production-like environments for various reasons – they’re destructive, for example – meaning the best we can do is a contrived example. But contrived examples can still miss real-world failures.
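A minimal sketch of how a flag like this hides a bug – the function, flag name, and numbers here are all hypothetical, not drawn from the client’s codebase:

```python
import os

def checkout_total(prices, env=None):
    # Hypothetical feature whose behavior depends on an environment flag.
    env = env or os.environ.get("APP_ENV", "development")
    subtotal = sum(prices)
    if env == "production":
        # Production-only path: apply tax. A bug on this branch is
        # invisible to any test pinned to the development flag.
        return round(subtotal * 1.08, 2)
    return subtotal

def test_checkout_development_only():
    # The kind of coverage described above: thorough, but dev-flag only.
    assert checkout_total([10.0, 5.0], env="development") == 15.0

def test_checkout_both_flags():
    # Exercising both flags is what would surface a production-only bug.
    assert checkout_total([10.0, 5.0], env="development") == 15.0
    assert checkout_total([10.0, 5.0], env="production") == 16.2
```

The fix is not complicated – it is simply remembering that every branch guarded by an environment flag needs its own test.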

This is where code review often comes in. Experienced developers can spot bugs or potential problems, and junior developers can learn from more senior developers’ work. But in this case, code review wasn’t as thorough as it should have been. This part of the code was rarely modified, it was central to the application as a whole, and the changes were esoteric, meaning they weren’t clear to most people who might have seen them. In other words, code review was not the failsafe it needed to be.

How to address these weaknesses

What do we do when automation and code review fail us?

The answer to this is complicated. First and foremost, we have to recognize that these tools are just that – tools – that are fallible and potentially subject to failure. But there are some strategies we can employ to hopefully solve these issues.

Test-driven development

No new tests were written for the code that was inserted, which means the old tests were used to assert new functionality – something they were never designed to do. This was an addition of new functionality, not a refactoring. When refactoring, we often use existing tests to assert that behavior has not changed, but when adding new code, new tests are a must.

New tests are best written before the new code exists. Writing the tests first helps articulate what the code should do, and gives future reviewers something to peg their expectations against. Additionally, writing new tests can reveal ways in which old tests are insufficient or outdated – and might need updating.
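The test-first flow can be sketched in a few lines, using a hypothetical `slugify` helper (my example, not the client’s feature):

```python
import re

# Step 1: write the tests first. They fail because slugify doesn't exist
# yet, and in failing they document exactly what the new code must do.
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_slugify_strips_punctuation():
    assert slugify("What's new?") == "whats-new"

# Step 2: write just enough code to make the tests pass.
def slugify(title):
    # Drop everything except letters, digits and spaces, then hyphenate.
    cleaned = re.sub(r"[^a-z0-9 ]", "", title.lower())
    return "-".join(cleaned.split())
```

A reviewer reading those two tests knows the intended behavior before reading a single line of the implementation – which is precisely the “peg” described above.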

Some areas of the code are more sensitive than others

Because the area of the code my client was updating was both central to the application and not well understood by other developers, it was imperative that code review be both careful and slow. Unfortunately that wasn’t the case.

Certain areas of the code will have greater impact than others. It is incumbent on reviewers to know when areas of the code that are sensitive are being modified and, as appropriate, subject them to greater scrutiny. There’s a difference between changing a button color and changing the login mechanism for an application, for example.

It might be useful to document certain features that must be reviewed by senior developers rather than by more junior team members. Though junior team members were not responsible for the failure in this case, flagging special areas of the code for a second look is an appropriate way to ensure code reviews are thorough.

Staging must match production (and test environments as closely as possible)

Because the bug was not visible in dev mode, and developers tend to work in dev mode, the bug almost made it into production. And test environments that were stood up mirrored dev mode, which means they didn’t catch the bug either.

Test environments and staging should match production as closely as possible, to the point (if possible) of running the same containers or code base. This way, when you do test against staging or your test builds, you’re really testing a production-like environment.

Building a production-like staging environment isn’t an extra cost – it’s an insurance policy. Take the time and money to build a scaled-down version of production to test against. This means having staging mirror production as closely as possible: if you run multiple containers of your application, there should be multiples in staging. If you use Redis caching and message queues in production, you should do the same in staging. And you should test against a sample Postgres/MySQL database in test and staging, not a SQLite database.
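One small way to enforce that last point in a test suite – a sketch with a hypothetical `TEST_DATABASE_URL` variable and helper name, not a standard of any particular framework:

```python
import os

def resolve_test_database_url():
    # Tests read their database URL from the environment, so CI can point
    # the same suite at a production-like Postgres instance.
    url = os.environ.get("TEST_DATABASE_URL")
    if url is None:
        raise RuntimeError(
            "TEST_DATABASE_URL is not set; refusing to guess. "
            "Point it at a database that mirrors production."
        )
    if url.startswith("sqlite"):
        # Fail loudly instead of silently testing against the wrong engine.
        raise RuntimeError(
            "SQLite does not behave like the production database; "
            "use a URL matching the production engine."
        )
    return url
```

The design choice here is to fail loudly: a suite that silently falls back to SQLite passes for the wrong reasons, which is exactly the dev-flag problem in a different costume.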

Leverage manual tests – they’re valuable!

Finally, automation, code review and comparable tools are not always going to catch everything. Manual testing still matters. Automation and code review reduce the risk; they do not eliminate it. Take the time to carefully test your work, especially new features.

That’s not to say automated testing is unreliable or that code review doesn’t matter. They matter! But there will be things you miss, and a multi-layered approach is necessary to ship a solid, production-ready application.

Conclusion

Shipping production-ready applications is a hard thing. We all make mistakes, and failures can happen at any point. Employing a multi-pronged approach is important for deploying and shipping applications into production that are stable and serve their purpose without failure.
