Chapter 5. Principles of Test Automation


For the most part, these principles apply equally well to unit tests and storytests. A possible exception is the principle Verify One Condition per Test, which
may not be practical for customer tests that exercise more involved chunks of
functionality. It is, however, still worth striving to follow these principles and to
deviate from them only when you are fully cognizant of the consequences.

Principle: Write the Tests First
Test-driven development is very much an acquired habit. Once one has “gotten
the hang of it,” writing code in any other way can seem just as strange as TDD
seems to those who have never done it. There are two major arguments in favor
of doing TDD:
1. The unit tests save us a lot of debugging effort—effort that often fully
offsets the cost of automating the tests.
2. Writing the tests before we write the code forces the code to be designed
for testability. We don’t need to think about testability as a separate
design condition; it just happens because we have written tests.

Principle: Design for Testability
Given the last principle, this principle may seem redundant. For developers
who choose to ignore Write the Tests First, Design for Testability becomes an
even more important principle because they won’t be able to write automated
tests after the fact if the testability wasn’t designed in. Anyone who has tried
to retrofit automated unit tests onto legacy software can testify to the difficulty
this raises. Mike Feathers talks about special techniques for introducing tests in
this case in [WEwLC].

Principle: Use the Front Door First
Also known as: Front Door First
Objects have several kinds of interfaces. There is the “public” interface that clients
are expected to use. There may also be a “private” interface that only close friends
should use. Many objects also have an “outgoing interface” consisting of the used
part of the interfaces of any objects on which they depend.
The types of interfaces we use influence the robustness of our tests. The use of
Back Door Manipulation (page 327) to set up the fixture or verify the expected
outcome of a test can result in Overcoupled Software (see Fragile Test on page
239) that needs more frequent test maintenance. Overuse of Behavior Verification (page 468) and Mock Objects (page 544) can result in Overspecified Software
(see Fragile Test) and tests that are more brittle and may discourage developers
from doing desirable refactorings.

When all choices are equally effective, we should use round-trip tests to test
our SUT. To do so, we test an object through its public interface and use State
Verification (page 462) to determine whether it behaved correctly. If this is not sufficient to accurately describe the expected behavior, we can make our tests layer-crossing tests and use Behavior Verification to verify the calls the SUT makes to
depended-on components (DOCs). If we must replace a slow or unavailable DOC
with a faster Test Double (page 522), using a Fake Object (page 551) is preferable
because it encodes fewer assumptions into the test (the only assumption is that the
component that the Fake Object replaces is actually needed).
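
As a minimal sketch of a round-trip test, the example below drives a hypothetical `Invoice` class only through its public interface and uses State Verification on the result. The slow DOC is replaced by a hypothetical `InMemoryRateSource` acting as a Fake Object. None of these names come from the text; they are assumptions made for illustration.

```python
import unittest


class InMemoryRateSource:
    """Fake Object: lightweight stand-in for a slow rate service (hypothetical)."""

    def __init__(self, rates):
        self._rates = dict(rates)

    def rate_for(self, region):
        # Tax rate as a whole-number percentage.
        return self._rates[region]


class Invoice:
    """Hypothetical SUT used only for illustration."""

    def __init__(self, region, rate_source):
        self._region = region
        self._rate_source = rate_source
        self._lines = []

    def add_line(self, description, amount_cents):
        self._lines.append((description, amount_cents))

    def total_cents(self):
        subtotal = sum(amount for _, amount in self._lines)
        tax = subtotal * self._rate_source.rate_for(self._region) // 100
        return subtotal + tax


class InvoiceRoundTripTest(unittest.TestCase):
    def test_total_includes_regional_tax(self):
        # Fixture setup through the public interface only (no Back Door Manipulation).
        invoice = Invoice("AB", InMemoryRateSource({"AB": 5}))
        invoice.add_line("widget", 1000)
        invoice.add_line("gadget", 2000)
        # State Verification: inspect the outcome through the front door.
        self.assertEqual(invoice.total_cents(), 3150)
```

The test never checks which calls `Invoice` makes to the rate source, so the implementation can be refactored freely as long as the total stays correct.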

Principle: Communicate Intent
Fully Automated Tests, especially Scripted Tests (page 285), are programs. They
need to be syntactically correct to compile and semantically correct to run successfully. They need to implement whatever detailed logic is required to put the SUT
into the appropriate starting state and to verify that the expected outcome has
occurred. While these characteristics are necessary, they are not sufficient because
they neglect the single most important interpreter of the tests: the test maintainer.
Tests that contain a lot of code¹ or Conditional Test Logic (page 200) are
usually Obscure Tests (page 186). They are much harder to understand because
we need to infer the “big picture” from all the details. This reverse engineering
of meaning takes extra time whenever we need to revisit the test either to maintain it or to use the Tests as Documentation. It also increases the cost of ownership of the tests and reduces their return on investment.
Tests can be made easier to understand and maintain if we Communicate Intent. We can do so by calling Test Utility Methods (page 599) with
Intent-Revealing Names [SBPP] to set up our test fixture and to verify that the
expected outcome has been realized. It should be readily apparent within the
Test Method (page 348) how the test fixture influences the expected outcome of
each test—that is, which inputs result in which outputs. A rich library of Test
Utility Methods also makes tests easier to write because we don’t have to code
the details into every test.
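
One way this might look in practice is sketched below. The `Flight` class and the helpers `create_flight_awaiting_approval` and `assert_flight_is_approved` are hypothetical Test Utility Methods invented for this example. The point is that the Test Method reads as a statement of intent while the details live in the helpers.

```python
import unittest


class Flight:
    """Hypothetical SUT: a flight that must be approved before it is scheduled."""

    def __init__(self, number):
        self.number = number
        self.state = "proposed"

    def request_approval(self):
        self.state = "awaiting_approval"

    def approve(self, approver):
        if self.state != "awaiting_approval":
            raise ValueError("flight is not awaiting approval")
        if not approver:
            raise ValueError("approver is required")
        self.state = "approved"


# Test Utility Methods with Intent-Revealing Names (illustrative names).
def create_flight_awaiting_approval():
    flight = Flight("AC101")
    flight.request_approval()
    return flight


def assert_flight_is_approved(test, flight):
    test.assertEqual(flight.state, "approved", "flight should be approved")


class AwaitingApprovalFlightTest(unittest.TestCase):
    def test_valid_approver_request_should_be_approved(self):
        flight = create_flight_awaiting_approval()   # which input...
        flight.approve("fred")                       # ...under which stimulus...
        assert_flight_is_approved(self, flight)      # ...gives which output
```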

Principle: Don’t Modify the SUT
Effective testing often requires us to replace a part of the application with a Test
Double or override part of its behavior using a Test-Specific Subclass (page 579).
This may be because we need to gain control over its indirect inputs or because we
need to perform Behavior Verification by intercepting its indirect outputs. It may
also be because parts of the application’s behavior have unacceptable side effects or
dependencies that are impossible to satisfy in our development or test environment.

¹ Anything more than about ten lines is getting to be too much.

Modifying the SUT is a dangerous thing whether we are putting in Test
Hooks (page 709), overriding behavior in a Test-Specific Subclass, or replacing
a DOC with a Test Double. In any of these circumstances, we may no longer
actually be testing the code we plan to put into production.
We need to ensure that we are testing the software in a configuration that is
truly representative of how it will be used in production. If we do need to replace
something the SUT depends on to get better control of the context surrounding the
SUT, we must make sure that we are doing so in a representative way. Otherwise,
we may end up replacing part of the SUT that we think we are testing. Suppose,
for example, that we are writing tests for objects X, Y, and Z, where object X
depends on object Y, which in turn depends on object Z. When writing tests for
X, it is reasonable to replace Y and Z with a Test Double. When testing Y, we can
replace Z with a Test Double. When testing Z, however, we cannot replace it with
a Test Double because Z is what we are testing! This consideration is particularly
salient when we have to refactor the code to improve its testability.
When we use a Test-Specific Subclass to override part of the behavior of an
object to allow testing, we have to be careful that we override only those methods that the test specifically needs to null out or use to inject indirect inputs. If
we choose to reuse a Test-Specific Subclass created for another test, we must
ensure that it does not override any of the behavior that this test is verifying.
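
A minimal sketch of a narrowly scoped Test-Specific Subclass follows. The `Greeter` class and its `current_hour` method are hypothetical; the subclass overrides only the method that supplies the indirect input and leaves `greeting`, the behavior under test, untouched.

```python
import datetime
import unittest


class Greeter:
    """Hypothetical SUT whose behavior depends on the current time (an indirect input)."""

    def current_hour(self):
        return datetime.datetime.now().hour

    def greeting(self):
        return "Good morning" if self.current_hour() < 12 else "Good afternoon"


class GreeterWithFixedHour(Greeter):
    """Test-Specific Subclass: overrides only the indirect input, never greeting()."""

    def __init__(self, hour):
        super().__init__()
        self._hour = hour

    def current_hour(self):
        return self._hour


class GreeterTest(unittest.TestCase):
    def test_morning_greeting(self):
        self.assertEqual(GreeterWithFixedHour(9).greeting(), "Good morning")

    def test_afternoon_greeting(self):
        self.assertEqual(GreeterWithFixedHour(15).greeting(), "Good afternoon")
```

If a later test needed to verify `current_hour` itself, this subclass could not be reused for it, because it replaces exactly that behavior.
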
Another way of looking at this principle is as follows: The term SUT is relative to the tests we are writing. In our “X uses Y uses Z” example, the SUT for
some component tests might be the aggregate of X, Y, and Z; for unit testing
purposes, it might be just X for some tests, just Y for other tests, and just Z for
yet other tests. Just about the only time we consider the entire application to be
the SUT is when we are doing user acceptance testing using the user interface
and going all the way back to the database. Even here, we might be testing only
one module of the entire application (e.g., the “Customer Management Module”). Thus “SUT” rarely equals “application.”

Principle: Keep Tests Independent
When doing manual testing, it is common practice to have long test procedures that
verify many aspects of the SUT’s behavior in a single test. This aggregation of tasks is
necessary because the steps involved in setting up the starting state of the system for
one test may simply repeat the steps used to verify other parts of its behavior. When
tests are executed manually, this repetition is not cost-effective. In addition, human
testers have the ability to recognize when a test failure should preclude continuing
execution of the test, when it should cause certain tests to be skipped, or when the
failure is immaterial to subsequent tests (though it may still count as a failed test).
If tests are interdependent and (even worse) order dependent, we will deprive
ourselves of the useful feedback that individual test failures provide. Interacting
Tests (see Erratic Test on page 228) tend to fail in a group. The failure of a test
that moved the SUT into the state required by the dependent test will lead to
the failure of the dependent test, too. With both tests failing, how can we tell
whether the failure reflects a problem in code that both tests rely on in some
way or whether it signals a problem in code that only the first test relies on?
When both tests fail, we can’t tell. And we are talking about only two tests in
this case—imagine how much worse matters would be with tens or even hundreds of Interacting Tests.
An Independent Test can be run by itself. It sets up its own Fresh Fixture (page 311) to put the SUT into a state that lets it verify the behavior it is
testing. Tests that build a Fresh Fixture are much more likely to be independent
than tests that use a Shared Fixture (page 317). The latter can lead to various
kinds of Erratic Tests, including Lonely Tests, Interacting Tests, and Test Run
Wars. With independent tests, unit test failures give us Defect Localization to
help us pinpoint the source of the failure.
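
The sketch below shows one way to build a Fresh Fixture in Python's `unittest`: `setUp` runs before every Test Method, so neither test relies on state left behind by the other. The `ShoppingCart` class is a hypothetical SUT used only for illustration.

```python
import unittest


class ShoppingCart:
    """Hypothetical SUT."""

    def __init__(self):
        self._items = {}

    def add(self, sku, quantity=1):
        self._items[sku] = self._items.get(sku, 0) + quantity

    def remove(self, sku):
        self._items.pop(sku, None)

    def item_count(self):
        return sum(self._items.values())


class ShoppingCartTest(unittest.TestCase):
    def setUp(self):
        # Fresh Fixture: a new cart is built for each test method.
        self.cart = ShoppingCart()
        self.cart.add("book", 2)

    def test_add_increases_item_count(self):
        self.cart.add("pen")
        self.assertEqual(self.cart.item_count(), 3)

    def test_remove_drops_all_units_of_the_item(self):
        self.cart.remove("book")
        self.assertEqual(self.cart.item_count(), 0)
```

Either test can be run alone or in any order, so a failure points at the behavior that test verifies rather than at an earlier test.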

Principle: Isolate the SUT
Some pieces of software depend on nothing but the (presumably correct) runtime system or operating system. Most pieces of software build on other pieces
of software developed by us or by others. When our software depends on other
software that may change over time, our tests may suddenly start failing because
the behavior of the other software has changed. This problem, which is called
Context Sensitivity (see Fragile Test), is a form of Fragile Test.
When our software depends on other software whose behavior we cannot
control, we may find it difficult to verify that our software behaves properly
with all possible return values. This is likely to lead to Untested Code (see Production Bugs on page 268) or Untested Requirements (see Production Bugs).
To avoid this problem, we need to be able to inject all possible reactions of the
DOC into our software under the complete control of our tests.
Whatever application, component, class, or method we are testing, we should
strive to isolate it as much as possible from all other parts of the software that
we choose not to test. This isolation of elements allows us to Test Concerns
Separately and allows us to Keep Tests Independent of one another. It also helps
us create a Robust Test by reducing the likelihood of Context Sensitivity caused
by too much coupling between our SUT and the software that surrounds it.
We can satisfy this principle by designing our software such that each piece
of depended-on software can be replaced with a Test Double using Dependency
Injection (page 678) or Dependency Lookup (page 686) or overridden with a
Test-Specific Subclass that gives us control of the indirect inputs of the SUT.
This design for testability makes our tests more repeatable and robust.
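
As an illustration, the sketch below uses constructor-based Dependency Injection so that a test can feed the SUT both a normal return value and an error from its DOC. `QuoteBuilder`, `StubPriceService`, and `PriceServiceUnavailable` are hypothetical names chosen for the example, and the stub is just one simple kind of Test Double.

```python
import unittest


class PriceServiceUnavailable(Exception):
    """Hypothetical error raised by the depended-on component."""


class QuoteBuilder:
    """Hypothetical SUT; the price service (the DOC) is injected via the constructor."""

    def __init__(self, price_service):
        self._price_service = price_service

    def quote(self, sku):
        try:
            price = self._price_service.price_of(sku)
        except PriceServiceUnavailable:
            return "price unavailable"
        return f"{sku}: {price:.2f}"


class StubPriceService:
    """Test Double: returns or raises whatever the test configures."""

    def __init__(self, price=None, error=None):
        self._price = price
        self._error = error

    def price_of(self, sku):
        if self._error is not None:
            raise self._error
        return self._price


class QuoteBuilderTest(unittest.TestCase):
    def test_quote_formats_price(self):
        builder = QuoteBuilder(StubPriceService(price=9.5))
        self.assertEqual(builder.quote("pen"), "pen: 9.50")

    def test_quote_reports_unavailable_service(self):
        builder = QuoteBuilder(StubPriceService(error=PriceServiceUnavailable()))
        self.assertEqual(builder.quote("pen"), "price unavailable")
```

The error path would be hard to trigger on demand against a real service; with the injected double it is under the complete control of the test.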

Principle: Minimize Test Overlap
Most applications have lots of functionality to verify. Proving that all of the
functionality works correctly in all possible combinations and interaction scenarios is nearly impossible. Therefore, picking the tests to write is an exercise in
risk management.
We should structure our tests so that as few tests as possible depend on a
particular piece of functionality. This may seem counter-intuitive at first because one would think that we would want to improve test coverage by testing
the software as often as possible. Unfortunately, tests that verify the same functionality typically fail at the same time. They also tend to need the same maintenance when the functionality of the SUT is modified. Having several tests verify
the same functionality is likely to increase test maintenance costs and probably
won’t improve quality very much.
We do want to ensure that all test conditions are covered by the tests that we
do use. Each test condition should be covered by exactly one test—no more, no
less. If it seems to provide value to test the code in several different ways, we
may have identified several different test conditions.

Principle: Minimize Untestable Code
Some kinds of code are difficult to test using Fully Automated Tests. GUI components, multithreaded code, and Test Methods immediately spring to mind as
“untestable” code. All of these kinds of code share the same problem: They are
embedded in a context that makes it hard to instantiate or interact with them
from automated tests.
Untestable code simply won’t have any Fully Automated Tests to protect it
from those nefarious little bugs that can creep into code when we aren’t looking. That makes it more difficult to refactor this code safely and more dangerous to modify existing functionality or introduce new functionality.
It is highly desirable to minimize the amount of untestable code that we have
to maintain. We can refactor the untestable code to improve its testability by
moving the logic we want to test out of the class that is causing the lack of testability. For active objects and multithreaded code, we can refactor to Humble
Executable (see Humble Object on page 695). For user interface objects, we
can refactor to Humble Dialog (see Humble Object). Even Test Methods can
have much of their untestable code extracted into Test Utility Methods, which
can then be tested.
When we Minimize Untestable Code, we improve the overall test coverage of
our code. In so doing, we also improve our confidence in the code and extend
our ability to refactor at will. The fact that this technique improves the quality
of the code is yet another benefit.
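
A rough sketch of the Humble Dialog idea follows. The validation logic is moved into a plain function, `validate_login_form`, that can be tested without any GUI, while `LoginDialog` is left with nothing but wiring. All names are hypothetical, and `widgets` stands in for whatever GUI toolkit object a real application would use.

```python
import unittest


def validate_login_form(username, password):
    """Logic extracted from the dialog; returns a list of error messages."""
    errors = []
    if not username.strip():
        errors.append("username is required")
    if len(password) < 8:
        errors.append("password must be at least 8 characters")
    return errors


class LoginDialog:
    """Humble Dialog: only moves data between the widgets and the extracted logic."""

    def __init__(self, widgets):
        self._widgets = widgets  # assumed GUI-toolkit wrapper, not exercised here

    def on_submit(self):
        errors = validate_login_form(
            self._widgets.username_text(), self._widgets.password_text()
        )
        self._widgets.show_errors(errors)


class ValidateLoginFormTest(unittest.TestCase):
    def test_accepts_complete_credentials(self):
        self.assertEqual(validate_login_form("pat", "s3cretpass"), [])

    def test_reports_missing_username_and_short_password(self):
        self.assertEqual(
            validate_login_form("  ", "short"),
            ["username is required", "password must be at least 8 characters"],
        )
```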

Principle: Keep Test Logic Out of Production Code
Also known as: No Test Logic in Production
When the production code hasn’t been designed for testability (whether as a
result of test-driven development or otherwise), we may be tempted to put
“hooks” into the production code to make it easier to test. These hooks typically take the form of if testing then ... and may either run alternative logic or
prevent certain logic from running.
Testing is about verifying the behavior of a system. If the system behaves differently when under test, then how can we be certain that the production code
actually works? Even worse, the test hooks could cause the software to fail in production.
The production code should not contain any conditional statements of the if
testing then sort. Likewise, it should not contain any test logic. A well-designed
system (from a testing perspective) is one that allows for the isolation of functionality. Object-oriented systems are particularly amenable to testing because
they are composed of discrete objects. Unfortunately, even object-oriented systems can be built in such a way as to be difficult to test, and we may still encounter code with embedded test logic.
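
To make the contrast concrete, here is a small sketch in which the production class contains no hook of the if testing then sort. The test gains control by passing in a collaborator instead. `OrderProcessor` and `RecordingNotifier` are hypothetical names, and the commented-out lines show the kind of hook being avoided.

```python
import unittest


class OrderProcessor:
    """Hypothetical production class with no test hooks."""

    def __init__(self, notifier):
        # The collaborator is supplied from outside, so this class never
        # has to ask whether it is running under test.
        self._notifier = notifier

    def complete(self, order_id):
        # Hook avoided here (shown only as a comment):
        #     if TESTING:
        #         return True   # would skip the notification under test
        self._notifier.send(f"order {order_id} completed")
        return True


class RecordingNotifier:
    """Stand-in that lives with the tests, not in the production code."""

    def __init__(self):
        self.messages = []

    def send(self, message):
        self.messages.append(message)


class OrderProcessorTest(unittest.TestCase):
    def test_complete_sends_notification(self):
        notifier = RecordingNotifier()
        processor = OrderProcessor(notifier)
        self.assertTrue(processor.complete(42))
        self.assertEqual(notifier.messages, ["order 42 completed"])
```

The same code path runs in the test as in production; only the object on the other side of the interface differs.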

Principle: Verify One Condition per Test
Many tests require a starting state other than the default state of the SUT, and
many operations of the SUT leave it in a different state from its original state.
There is a strong temptation to reuse the end state of one test condition as the
starting state of the next test condition by combining the verification of the two
test conditions into a single Test Method because this makes testing more efficient. This approach is not recommended, however, because when one assertion
fails, the rest of the test will not be executed. As a consequence, it becomes more
difficult to achieve Defect Localization.
Verifying multiple conditions in a single test makes sense when we execute
tests manually because of the high overhead of test setup and because the liveware can adapt to test failures. It is too much work to set up the fixture for a
large number of manual tests, so human testers naturally tend to write long
multiple-condition tests.² They also have the intelligence to work around any
issues they encounter so that all is not lost if a single step fails. In contrast, with
automated tests, a single failed assertion will cause the test to stop running and
the rest of the test will provide no data on what works and what doesn’t.
Each Scripted Test should verify a single test condition. This single-mindedness
is possible because the test fixture is set up programmatically rather than by a
human. Programs can set up fixtures very quickly and they don’t have trouble executing exactly the same sequence of steps hundreds of times! If several tests need
the same test fixture, either we can move the Test Methods into a single Testcase
Class per Fixture (page 631) so we can use Implicit Setup (page 424) or we can
call Test Utility Methods to set up the fixture using Delegated Setup (page 411).
We design each test to have four distinct phases (see Four-Phase Test on
page 358) that are executed in sequence: fixture setup, exercise SUT, result
verification, and fixture teardown.
• In the first phase, we set up the test fixture (the “before” picture) that
is required for the SUT to exhibit the expected behavior as well as anything we need to put in place to observe the actual outcome (such as
using a Test Double).
• In the second phase, we interact with the SUT to exercise whatever
behavior we are trying to verify. This should be a single, distinct behavior; if we try to exercise several parts of the SUT, we are not writing a
Single-Condition Test.
• In the third phase, we do whatever is necessary to determine whether
the expected outcome has been obtained and fail the test if it has not.
• In the fourth phase, we tear down the test fixture and put the world
back into the state in which we found it.
Note that there is a single exercise SUT phase and a single result verification
phase. We avoid having a series of such alternating calls (exercise, verify, exercise,
verify) because that approach would be trying to verify several distinct conditions—something that is better handled via distinct Test Methods.
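
The four phases can be sketched as follows. The `count_lines` function is a hypothetical SUT, and a temporary file serves as the fixture so that the teardown phase has something real to clean up.

```python
import os
import tempfile
import unittest


def count_lines(path):
    """Hypothetical SUT: counts the lines in a text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


class CountLinesTest(unittest.TestCase):
    def test_counts_each_line_once(self):
        # Phase 1: fixture setup (the "before" picture).
        fd, path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write("alpha\nbeta\ngamma\n")
        try:
            # Phase 2: exercise the SUT, once.
            result = count_lines(path)
            # Phase 3: result verification.
            self.assertEqual(result, 3)
        finally:
            # Phase 4: fixture teardown, even if the assertion failed.
            os.remove(path)
```

There is exactly one exercise step and one verification step; a second condition, such as an empty file, would get its own Test Method.
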
One possibly contentious aspect of Verify One Condition per Test is what
we mean by “one condition.” Some test drivers insist on one assertion per test.
This insistence may be based on using a Testcase Class per Fixture organization
of the Test Methods and naming each test based on what the one assertion is
verifying.³ Having one assertion per test makes such naming very easy but also
leads to many more test methods if we have to assert on many output fields. Of
course, we can often comply with this interpretation by extracting a Custom
Assertion (page 474) or Verification Method (see Custom Assertion) that allows
us to reduce the multiple assertion method calls to a single call. Sometimes that
approach makes the test more readable. When it doesn’t, I wouldn’t be too dogmatic about insisting on a single assertion.

² Clever testers often use automated test scripts to put the SUT into the correct starting state for their manual tests, thereby avoiding long manual test scripts.
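
As a sketch of extracting a Custom Assertion, the example below folds three field-level assertions into a single call. `LineItem`, `double_quantity`, and `assert_line_item_equals` are hypothetical names introduced only for this illustration.

```python
import unittest
from dataclasses import dataclass


@dataclass
class LineItem:
    """Hypothetical value object with several output fields."""
    sku: str
    quantity: int
    unit_price_cents: int


def double_quantity(item):
    """Hypothetical SUT: returns a copy of the item with twice the quantity."""
    return LineItem(item.sku, item.quantity * 2, item.unit_price_cents)


def assert_line_item_equals(test, expected, actual):
    """Custom Assertion: checks every field of interest in one call."""
    test.assertEqual(expected.sku, actual.sku, "sku")
    test.assertEqual(expected.quantity, actual.quantity, "quantity")
    test.assertEqual(
        expected.unit_price_cents, actual.unit_price_cents, "unit price"
    )


class LineItemTest(unittest.TestCase):
    def test_doubling_quantity_keeps_sku_and_unit_price(self):
        doubled = double_quantity(LineItem("pen", 2, 150))
        # One assertion call from the test's point of view.
        assert_line_item_equals(self, LineItem("pen", 4, 150), doubled)
```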

Principle: Test Concerns Separately
The behavior of a complex application consists of the aggregate of a large number of smaller behaviors. Sometimes several of these behaviors are provided by
the same component. Each of these behaviors is a different concern and may
have a significant number of scenarios in which it needs to be verified.
The problem with testing several concerns in a single Test Method is that
this method will be broken whenever any of the tested concerns is modified.
Even worse, it won’t be obvious which concern is the one at fault. Identifying the real culprit typically requires Manual Debugging (see Frequent Debugging on page 248) because of the lack of Defect Localization. The net effect is
that more tests will fail and each test will take longer to troubleshoot and fix.
Refactoring is also made more difficult by testing several concerns in the same
test; it will be harder to “tease apart” the eager class into several independent
classes, each of which implements a single concern, because the tests will need
extensive redesign.
Testing our concerns separately allows a failure to tell us that we have a
problem in a specific part of our system rather than simply saying that we
have a problem somewhere. This approach to testing also makes it easier
to understand the behavior now and to separate the concerns in subsequent
refactorings. That is, we should just be able to move a subset of the tests to
a different Testcase Class (page 373) that verifies the newly created class; it
shouldn’t be necessary to modify the test much more than changing the class
name of the SUT.
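
A small sketch of the idea: the hypothetical `Receipt` class below handles two concerns, totalling and formatting, and each concern gets its own Testcase Class. If formatting were later extracted into a separate class, `ReceiptFormatTest` could move with it and would need little more than a change of class name.

```python
import unittest


class Receipt:
    """Hypothetical class that handles two concerns: totalling and formatting."""

    def __init__(self, amounts_cents):
        self._amounts = list(amounts_cents)

    def total_cents(self):
        return sum(self._amounts)

    def format_cents(self, cents):
        return f"${cents // 100}.{cents % 100:02d}"


class ReceiptTotalTest(unittest.TestCase):
    # Concern 1: arithmetic only; no formatting is involved.
    def test_total_is_sum_of_amounts(self):
        self.assertEqual(Receipt([250, 175]).total_cents(), 425)


class ReceiptFormatTest(unittest.TestCase):
    # Concern 2: formatting only; the amount is supplied directly.
    def test_formats_cents_as_dollars(self):
        self.assertEqual(Receipt([]).format_cents(425), "$4.25")
```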

Principle: Ensure Commensurate Effort and Responsibility
The amount of effort it takes to write or modify tests should not exceed the
effort it takes to implement the corresponding functionality. Likewise, the tools
required to write or maintain the test should require no more expertise than the
tools used to implement the functionality. For example, if we can configure the
behavior of a SUT using metadata and we want to write tests that verify that
the metadata is set up correctly, we should not have to write code to do so. A
Data-Driven Test (page 288) would be much more appropriate in these circumstances.

³ For example, AwaitingApprovalFlight.validApproverRequestShouldBeApproved.
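
A minimal sketch of a Data-Driven Test is shown below. The test conditions are rows of data, so adding a condition means adding a row rather than writing more test code. `SHIPPING_RULES`, `shipping_fee`, and `CASES` are hypothetical; in practice the rows might well live in a file maintained by whoever maintains the metadata.

```python
import unittest

# Hypothetical metadata that configures the SUT: (weight limit in kg, fee in cents).
SHIPPING_RULES = [(1, 500), (5, 900), (20, 2000)]


def shipping_fee(weight_kg, rules=SHIPPING_RULES):
    """Hypothetical SUT: looks up the fee for the first limit the weight fits under."""
    for limit, fee in rules:
        if weight_kg <= limit:
            return fee
    raise ValueError("too heavy to ship")


# Test conditions expressed as data: (weight in kg, expected fee in cents).
CASES = [
    (0.5, 500),
    (1, 500),
    (3, 900),
    (20, 2000),
]


class ShippingFeeDataDrivenTest(unittest.TestCase):
    def test_fee_for_each_row(self):
        for weight, expected in CASES:
            # subTest reports each failing row separately instead of stopping at the first.
            with self.subTest(weight=weight):
                self.assertEqual(shipping_fee(weight), expected)
```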

What’s Next?
Previous chapters covered the common pitfalls (in the form of test smells) and
goals of test automation. This chapter made the value system we use while
choosing patterns explicit. In Chapter 6, Test Automation Strategy, we will
examine the “hard to change” decisions that we should try to get right early in
the project.

Chapter 6

Test Automation Strategy
About This Chapter
In previous chapters, we saw some of the problems we might encounter with
test automation. In Chapter 5, Principles of Test Automation, we learned about
some of the principles we can apply to help address those problems. This chapter
gets a bit more concrete but still focuses at the 30,000-foot level. In the logical
sequence of things, test strategy comes before fixture setup but is a somewhat
more advanced topic. If you are new to test automation using xUnit, you may
want to skip this chapter and come back after reading more about the basics
of xUnit in Chapter 7, xUnit Basics, and about fixture setup and teardown in
Chapter 8, Transient Fixture Management, and subsequent chapters.

What’s Strategic?
As the story in the preface amply demonstrates, it is easy to get off on the wrong
foot. This is especially true when you lack experience in test automation and
when this testing strategy is adopted “bottom up.” If we catch the problems early
enough, the cost of refactoring the tests to eliminate the problems can be manageable. If, however, the problems are left to fester for too long or the wrong approach
is taken to address them, a very large amount of effort can be wasted. This is not
to suggest that we should follow a “big design upfront” (BDUF) approach to test
automation. BDUF is almost always the wrong answer. Rather, it is helpful to be
aware of the strategic decisions necessary and to make them “just in time” rather
than “much too late.” This chapter gives a “heads-up” about some of the strategic
issues we want to keep in mind so that we don’t get blindsided by them later.
What makes a decision “strategic”? A decision is strategic if it is “hard to
change.” That is, a strategic decision affects a large number of tests, especially
such that many or all the tests would need to be converted to a different approach
at the same time. Put another way, any decision that could cost a large amount of
effort to change is strategic.
Common strategic decisions include the following considerations:
• Which kinds of tests to automate?
• Which tools to use to automate them?
• How to manage the test fixture?
• How to ensure that the system is easily tested and how the tests interact
with the SUT?
Each of these decisions can have far-reaching consequences, so they are best made
consciously, at the right time, and based on the best available information.
The strategies and more detailed patterns described in this book are equally
applicable regardless of the kind of Test Automation Framework (page 298)
we choose to use. Most of my experience is with xUnit, so it is the focus of this
book. But “don’t throw out the baby with the bath water”: If you find yourself
using a different kind of Test Automation Framework, remember that most of
what you learn in regard to xUnit may still be applicable.

Which Kinds of Tests Should We Automate?
Roughly speaking, we can divide tests into the following two categories:
• Per-functionality tests (also known as functional tests) verify the behavior
of the SUT in response to a particular stimulus.
• Cross-functional tests verify various aspects of the system’s behavior
that cut across specific functionality.
Figure 6.1 shows these two basic kinds of tests as two columns, each of which is
further subdivided into more specific kinds of tests.

Per-Functionality Tests
Per-functionality tests verify the directly observable behavior of a piece of software. The functionality can be business related (e.g., the principal use cases of
the system) or related to operational requirements (e.g., system maintenance
and specific fault-tolerance scenarios). Most of these requirements can also be
expressed as use cases, features, user stories, or test scenarios.
Per-functionality tests can be characterized by whether the functionality is
business (or user) facing and by the size of the SUT on which they operate.