On Unit Tests

Stephan Dahl, June 2, 2009

Unit tests are an integral part of a solid application. While the unit tests are of course never seen by the users of the application, they are part of the foundation that ensures that the application works according to intent, and, more importantly, that it continues to do so in the face of a lifetime of changes.

This article discusses the principles of designing unit tests into an application.

The Testable Unit

Before discussing how to design unit tests, we need to define what a testable unit is. In the following, I will use this definition: anything that can be removed from the application and still run is a unit. This has a few important implications.

Unit Sizes

By this definition, units can be quite small (e.g., a set of application-specific date-formatting routines) or quite large (e.g., everything needed to perform a login).

What is not a unit

A part of the software that requires something external to itself is never a unit in itself. For example, a JSP page does not work without its backing action class, which in turn does not work without its backing persistence layer, which in turn does not work without a database.

Units can compose units

A testable unit thus consists of a part of the code, plus all the lower-level components (business logic, persistence layers, etc.) required to make it work. The unit tests for such a component will generally only consider the function provided by the component itself, including how it handles errors in its underlying layers. They can safely assume that the lower layers have their own unit tests and will therefore behave according to specification. However, it is wise to keep in mind that the component stack under test can be fairly deep, and that subtle bugs in the lower substrates sometimes don't manifest until they are triggered by higher-level complex functions. This, in itself, is a good reason to include the lower-level components in the unit.

Stubs, Mocks and Simulators

Sometimes it is not practical to include everything a testable unit needs within the unit test - for example, parts of the application may not be developed yet, or the application may need to call external services that are not available to a development environment, or that charge per call. Such incomplete or otherwise unavailable systems must be simulated in some way; the techniques for this are usefully categorized as Stubs, Mocks or Simulators.

Stubs

A stub is incomplete production code. Instead of making the "real" external call, the code makes up a fake response and returns that to the caller. Stubs are commonly seen where the interface to a particular component is not yet specified, allowing development of the calling code (e.g., the user interface) to proceed unhindered. The intent is that once the interface is designed (or once the development schedule permits) the stub is replaced by real code.
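As a minimal sketch, a stub can be as simple as a method that skips the real call and returns a canned value (the ExchangeRateService name and its intended real behaviour are hypothetical):

    // Hypothetical service; the finished version would call an
    // external rate provider over the network.
    public class ExchangeRateService {

        public double getRate(String fromCurrency, String toCurrency) {
            // STUB: canned response so callers can be developed and
            // tested; to be replaced by the real provider call.
            return 1.0;
        }
    }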

Stubs are good and useful when they permit decoupling development efforts - the developer of a user interface need not be delayed by as-yet missing business logic, since it can just be stubbed out as needed. However, once the application is ready for production deployment, there must not be any stubs left.

Unit tests for items that are built on stubs should run unchanged once the stub is replaced by real code. Implementing a previously stubbed-out unit implies not only building unit tests for the unit itself, but also ensuring that all units that call it still run successfully.

Mocks

A Mock is used where the application is designed with a "pluggable backend". For instance, it may allow the use of different database managers, thus making the entire persistence layer replaceable. This concept generalizes to testing scaffolding - a pluggable persistence layer could be replaced by a layer that doesn't use a real database at all.
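A minimal sketch of the idea, assuming a hypothetical UserStore interface behind which either the real database-backed implementation or a test mock can be plugged:

    // Hypothetical pluggable persistence interface.
    public interface UserStore {
        void save(String userId, String name);
        String load(String userId);
        void delete(String userId);
    }

    // Mock backend for unit tests: no database involved at all.
    public class InMemoryUserStore implements UserStore {
        private final java.util.Map<String, String> data =
                new java.util.HashMap<String, String>();

        public void save(String userId, String name) { data.put(userId, name); }
        public String load(String userId) { return data.get(userId); }
        public void delete(String userId) { data.remove(userId); }
    }

The application is wired against UserStore only, so the test scaffolding never leaks into the production code paths.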

Mocks should never be built unless there is a design requirement for a replaceable backend; the overhead involved in making a robust interface is rarely worth it, unless for some strange reason the entire layer is awaiting the start of development while a higher layer has already begun.

Simulators

A simulator usually replaces a partner system or application by implementing the same interface as the partner and substituting the actual implementation. The variation is thus usually limited to addresses and security parameters, which in a well-designed application are externalized to run-time parameters anyway.
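For example, if the partner address is read from a run-time property (the property name and addresses below are hypothetical), pointing the application at a simulator is purely a configuration change:

    public class PartnerConfig {
        // Hypothetical property; defaults to the production address.
        public static String endpoint() {
            return System.getProperty("partner.endpoint",
                    "https://partner.example.com/api");
        }
    }

    // A test run would start the JVM with
    // -Dpartner.endpoint=http://localhost:8099/simulator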

A separate sub-category is Simulated Users. Frequently, a particular process to be tested includes one or more human decisions and actions, and a programmable simulated user can then bridge those gaps on behalf of a unit test. Any application that includes a significant amount of human interaction will have simulated users anyway, to execute unit tests on the user interface units, so there is rarely any overhead involved in building the simulated user itself.

The advantage of simulating a partner is that it keeps the production code clean of test scaffolding, leading to a simpler code base and easier maintenance. The disadvantage, of course, is that the simulator then needs to be maintained too. However, the simulated partner and the application itself usually change on completely unrelated schedules and for different reasons, so it makes sense to keep the two completely separate.

Performance

A particular concern when using any kind of function substitution is the effect it has on the application performance. Stubs and mocks usually speed up performance, which is nice when running unit tests but which can lead to missed non-functional requirements, for instance when the ever-popular round-tripping anti-pattern hides behind a microsecond-per-fake-call stub. If the stub- or mock-based partially completed application is shown to actual end-users (which is generally a good idea, to catch design misunderstandings early) then an unrealistically good responsiveness is bound to lead to disappointment once the real backend is attached. For this reason, it is usually a good idea to build configurable delays into the stubs and mocks, to allow tuning their performance depending on whether they are running as a unit test or as an early prototype user-demo.
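A sketch of such a tunable stub, assuming a hypothetical quote service and delay property:

    // Hypothetical stub for a quote service.
    public class StubbedQuoteService {
        // Artificial latency in milliseconds: 0 for unit test runs,
        // something realistic (say 200) for user demos.
        private static final long DELAY_MS =
                Long.getLong("stub.delay.millis", 0L);

        public double getQuote(String symbol) {
            try {
                Thread.sleep(DELAY_MS); // simulate backend latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return 42.0; // canned stub response
        }
    }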

Simulators, with their real network calls (even if just to the local development machine), exhibit much more realistic performance - which can be a problem once a complete unit test of the entire application starts to take more than a few minutes. If the test suites start to expand past a minute or so of elapsed time, it is time to start breaking them out into sub-suites to allow the developer to run only those tests that are relevant for the code being worked on. The very healthy development style of "write a stub, write a test that fails, write code to make the test pass, then repeat the last two steps until done" absolutely requires that running the tests is fast and doesn't halt the developer's train of thought.
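With JUnit 4, for example, breaking tests into sub-suites is just a matter of grouping test classes (the listed test classes are hypothetical):

    import org.junit.runner.RunWith;
    import org.junit.runners.Suite;

    // One sub-suite per area, so a developer working on persistence
    // code can run just these tests instead of the whole tree.
    // The listed test classes are hypothetical.
    @RunWith(Suite.class)
    @Suite.SuiteClasses({
        UserStoreTest.class,
        OrderStoreTest.class
    })
    public class PersistenceSuite {
    }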

Designing a Unit Test

The unit tests are part of the application, and must be included both in design and planning.

Design

When writing the design of a component, or a component template, a section should be devoted to the intended level of test. While it is obvious that (for example) all public methods on a class to be unit tested need to be called, the expected variation in data is not necessarily obvious to the developer building the code, or to the one refactoring it a few years down the road. The unit test section of the design page should provide examples of typical usage and expected results.

Planning

Creating or modifying the unit test code is done in tandem with writing or changing the production code, by the same person. It is to all intents and purposes the same task. For this reason, it is pointless to create separate planning tasks for unit tests and code.

However, it is important that the developers and designers making the estimates include the effort needed to write unit test code and create suitable unit test data in their numbers. Unit tests and related data usually take as much effort as the code they are intended to test, and inexperienced or pressured developers are prone to forget this work. It is equally important that the review task (which should be a separate planning task) includes reviewing the unit tests (at the very least, running them and verifying that they pass).

Writing the Unit Test

The unit test must cover all functionality provided by the unit being tested, excluding functionality provided by lower-level units, since those units have their own unit tests. Thus, tests of a persistence layer must cover read, write, concurrent access, commit, rollback, etc., but unit tests of a business logic layer that uses the persistence layer need only include tests for success and failure of the persistence layer. The following sections discuss general component stereotypes and their common testing requirements.

Class

A unit test for a Class (or Interface) must include tests of all public methods provided by the class. Note that overridden inherited methods must also be tested. 
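A minimal JUnit sketch, assuming a hypothetical DateFormats class with a single public method:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // DateFormats is a hypothetical application class; every public
    // method it offers gets at least one test.
    public class DateFormatsTest {

        @Test
        public void formatsIsoDate() {
            assertEquals("2009-06-02", DateFormats.iso(2009, 6, 2));
        }

        @Test
        public void padsSingleDigitMonthAndDay() {
            assertEquals("2009-01-05", DateFormats.iso(2009, 1, 5));
        }
    }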

Functional Module

Unit tests of a functional module must include tests of all entry points.

Persistence Layer

Unit tests of a persistence layer component must include the usual Create, Read, Update and Delete operations provided. In addition, persistence layer tests must test graceful handling of deadlocks and timeouts. If the persistence layer includes a caching mechanism, then the cache must be tested thoroughly for consistency too; this is in general a large and difficult task, and rolling one's own caching mechanism should only be attempted when no existing caching framework can be found to satisfy the application's requirements.
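A sketch of the CRUD portion of such a test, reusing the hypothetical UserStore interface from the mock example above:

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNull;
    import org.junit.Test;

    // Uses the hypothetical UserStore from the mock sketch above.
    public class UserStoreCrudTest {

        @Test
        public void createReadUpdateDelete() {
            UserStore store = new InMemoryUserStore(); // or the real DAO
            store.save("u1", "Alice");                 // Create
            assertEquals("Alice", store.load("u1"));   // Read
            store.save("u1", "Alice B.");              // Update
            assertEquals("Alice B.", store.load("u1"));
            store.delete("u1");                        // Delete
            assertNull(store.load("u1"));
        }
    }

Deadlock and timeout tests follow the same shape, but are necessarily specific to the database manager in use.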

Business Logic Layer

Business logic is usually expressible in terms of state, inputs and outputs. Tests can then usually be expressed as "given state S and input I, verify result state S' and output O". These expressions are sometimes provided directly in the use cases or a business rules catalog, but must sometimes be derived by the unit test author. It is always a good idea to phrase the rules to be tested in a consistent and formal way - preferably in the design specifications where the application owners can review them, but at the very least in the comments of the unit test where the code reviewer or maintainer can read them.
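A sketch of one such rule as a test, with hypothetical Account and Receipt classes (state S is the opening balance, input I the withdrawal, state S' the resulting balance, and output O the receipt):

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // Account and Receipt are hypothetical application classes.
    // Rule: given a balance of 100 and a withdrawal of 30, the
    // balance becomes 70 and the receipt shows 30.
    public class WithdrawalRuleTest {

        @Test
        public void givenBalance100whenWithdraw30thenBalance70() {
            Account account = new Account(100);     // state S
            Receipt receipt = account.withdraw(30); // input I
            assertEquals(70, account.getBalance()); // state S'
            assertEquals(30, receipt.getAmount());  // output O
        }
    }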

User Interface Layer

User interface tests usually include a few expected scenarios, plus as many error scenarios as the test author can think of. If the user interface to the application is more complex than a simple command line, then a programmable simulated user is a necessary tool; fortunately such tools are available for all user interface technologies. With highly interactive user interfaces (as provided by native GUIs, Java Swing, as well as Ajax-style web UIs) it is important to consider the timing of events, so that the simulated user can react to (e.g.) dynamically filled drop-down fields.
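As one example of such timing-aware simulation, using Selenium WebDriver as the UI-driving tool (the page address and element id are hypothetical), the simulated user explicitly waits for the dynamically filled field instead of assuming it is ready:

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    public class SimulatedRegistrationUser {

        public static void main(String[] args) {
            WebDriver driver = new FirefoxDriver();
            driver.get("http://localhost:8080/app/register"); // hypothetical page
            // Wait for the Ajax-filled drop-down before touching it.
            new WebDriverWait(driver, 10).until(
                    ExpectedConditions.presenceOfElementLocated(
                            By.id("citySelect"))); // hypothetical element id
            driver.findElement(By.id("citySelect")).sendKeys("Copenhagen");
            driver.quit();
        }
    }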

Creating this kind of test is generally more difficult than the more deterministic tests in the other layers, and a common anti-pattern is to just not do it, leaving tests of the human interface to human testers in the project's test phase. This will usually lead to lower quality, higher costs and delayed deliveries, as the UI bug-fixing process starts to include more than one organizational unit, plus the inevitable management layers.

External Interface

If the application includes external interfaces, then simulators are absolutely necessary to send and receive messages on those interfaces. The simulators' internals must be accessible from within the unit test code, in order to let the automated tests verify the content and format of the interfaces.
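A sketch of a receiving simulator whose internals are open to the test (all names are hypothetical):

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical stand-in for the real partner endpoint; records
    // every message it receives.
    public class PartnerSimulator {

        private final List<String> received = new ArrayList<String>();

        // Called by the application code under test in place of the
        // real partner.
        public void accept(String message) {
            received.add(message);
        }

        // Exposed so the unit test can verify content and format.
        public List<String> receivedMessages() {
            return received;
        }
    }

A unit test can then trigger the application's outbound call and assert directly on receivedMessages().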

In software programmes where several partner applications are built simultaneously, a side benefit of building simulators is realized: they also form a very precise requirements specification for the side of the interface that is built last, usually saving considerable effort on that project's part.

Code Coverage

Running the unit test must cover the unit under test - all lines of code must be exercised. Many tools exist that can measure this coverage, though all require a level of interpretation of the results. Unlike the metric "Unit tests passed", which should always be 100%, "Code Line Coverage" need only approach 100%. Note that for a particular unit, only the code within that unit needs to be 100% covered - code in lower layers has its own unit tests and coverage targets.

Build and Deploy

The application must include build scripts to deal with packaging the application with and without its test scaffolding; a well-designed application is built from a particular revision of the source repository, by an automated system, and a process for "deploying without touching" should guarantee that the application is never modified in any way after it is built.

These build scripts must be considered part of the application code, and should include scripted execution of the entire test suite in the environments where the test scaffolding is deployed.

The best solution is to build separate deployable modules, such that the production application can be promoted from the build environment, through its test environments, accompanied by its test scaffolding. Once the application is deployed to a production (or production-like) environment, it can then jettison its scaffolding. This approach has the advantage that it can (along with suitable deployment management systems) ensure that the application that was tested and approved is the exact same application as is actually deployed to production.

Maintaining the Unit Tests

When defining the configuration items for the project's configuration management plan, the unit tests must be managed on par with the production code. Unit tests, including mocks or simulators, are part of the code base and must be included in any change or bug fix made throughout the application's lifetime.