Notes On Automated Acceptance Testing (from the Continuous Delivery book)

January 8, 2015

(Cross-posted from blog.iterate.no)

These are my rather extensive notes from reading chapter 8, Automated Acceptance Testing, of the Continuous Delivery bible by Humble and Farley. There is plenty of very good advice that I just had to take note of. Acceptance testing is an exciting and important subject. Why should you care about it? Because:

We know from experience that without excellent automated acceptance test coverage, one of three things happens: Either a lot of time is spent trying to find and fix bugs at the end of the process when you thought you were done, or you spend a great deal of time and money on manual acceptance and regression testing, or you end up releasing poor-quality software.

(Chapter 8 focuses primarily on "functional" requirements, chapter 9 on "non-functional", or rather cross-functional, requirements.)

acceptance tests are business-facing, story-level tests asserting that a story is complete and working; they run in a prod-like env. and also serve as regression tests
manual testing is expensive => done infrequently => defects are discovered late, when there is little time to fix them, and we risk introducing new regression defects
Acc.T. put the app through a series of states => great for discovering threading problems, emergent behavior in event-driven apps, bugs due to architectural mistakes, env/config problems
expensive if done poorly

How to Create Maintainable Acc. T. Suites

Good acceptance criteria ("INVEST": Independent, Negotiable, Valuable, Estimable, Small, Testable; especially valuable to users and testable)
Layered implementation:
1. Acceptance criteria (Given/When/Then) - as xUnit tests or with Concordion/FitNesse/...
2. Test implementation - it's crucial that it uses a (business) domain-specific language (DSL) with no direct reference to the UI/API, which would make it brittle
3. Application driver layer - translates the DSL to interactions with the API/UI, extracts and returns results
Take care to keep the test implementation efficient and well factored, especially wrt. managing state, handling timeouts, and using test doubles. Refactor.
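The three layers can be sketched in Python roughly as follows. This is a minimal sketch under assumptions: the in-memory FakeShop stands in for the real application, and all class and method names (ShopDriver, ShopDsl, place_order, ...) are hypothetical, not taken from the book. The point is that only the driver knows about status codes; the test reads as business language.

```python
from dataclasses import dataclass, field


@dataclass
class FakeShop:
    """Hypothetical stand-in for the real application and its API."""
    stock: dict = field(default_factory=lambda: {"Chocolate": 10})
    orders: list = field(default_factory=list)

    def post_order(self, customer: str, product: str, qty: int) -> int:
        # Technical, low-level interaction: returns an HTTP-like status code.
        if self.stock.get(product, 0) < qty:
            return 409
        self.stock[product] -= qty
        self.orders.append((customer, product, qty))
        return 201


class ShopDriver:
    """Layer 3 (application driver): turns DSL calls into app interactions."""

    def __init__(self, app: FakeShop):
        self.app = app

    def place_order(self, customer: str, product: str, qty: int) -> bool:
        return self.app.post_order(customer, product, qty) == 201


class ShopDsl:
    """Layer 2 (test implementation): business language, no UI/API details."""

    def __init__(self, driver: ShopDriver):
        self.driver = driver
        self.last_order_accepted = False

    def customer_orders(self, customer: str, product: str, qty: int) -> None:
        self.last_order_accepted = self.driver.place_order(customer, product, qty)

    def order_was_accepted(self) -> bool:
        return self.last_order_accepted


def test_customer_can_order_product_in_stock() -> None:
    """Layer 1 (acceptance criterion) as an xUnit-style test."""
    # Given a shop with chocolate in stock
    dsl = ShopDsl(ShopDriver(FakeShop()))
    # When Dave orders 5 units
    dsl.customer_orders("Dave", "Chocolate", 5)
    # Then the order is accepted
    assert dsl.order_was_accepted()
```

If the app switches from REST to, say, messaging, only ShopDriver should need to change.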

Testing against GUI

it's the most realistic option, but it is complex (to set up, ...), brittle, makes it hard to extract results, and is impossible with GUI technologies that aren't testable
if the GUI is a thin layer of display-only code with no business/application logic, there is little risk in bypassing it and using the API it talks to directly - this is recommended whenever possible (plus, perhaps, a few UI [smoke] tests)

Creating Acc. Tests

analysts, devs, and testers define the acceptance criteria together to ensure that everyone understands them and that they are testable

Acceptance Criteria as Executable Specifications

(See BDD) - the plain text specifications are bound to actual tests so that they have to be kept up to date [JH: "Living Documentation"]
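To illustrate how plain-text specifications get "bound" to code, here is a tiny Given/When/Then runner in Python. It is a sketch of the idea behind BDD tools, not the API of any real tool (Cucumber, JBehave, etc.); the step texts and the step decorator are hypothetical. Because an unbound or outdated step fails the run, the text cannot silently drift away from the tests.

```python
import re
from typing import Callable

# Registry of (compiled pattern, implementing function) pairs.
STEPS = []


def step(pattern: str):
    """Decorator binding a plain-text step pattern to its implementation."""
    def register(fn: Callable) -> Callable:
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register


def run_spec(spec: str, context: dict) -> None:
    """Execute each Given/When/Then line; fail loudly on an unbound step."""
    for line in spec.strip().splitlines():
        text = line.strip()
        keyword, _, rest = text.partition(" ")
        if keyword not in ("Given", "When", "Then", "And"):
            raise ValueError(f"Unexpected line: {text!r}")
        for pattern, fn in STEPS:
            match = pattern.fullmatch(rest)
            if match:
                fn(context, *match.groups())
                break
        else:
            raise LookupError(f"No step bound to: {rest!r}")


@step(r"an account with balance (\d+)")
def given_balance(ctx, amount):
    ctx["balance"] = int(amount)


@step(r"the user withdraws (\d+)")
def when_withdraw(ctx, amount):
    ctx["balance"] -= int(amount)


@step(r"the balance is (\d+)")
def then_balance(ctx, amount):
    assert ctx["balance"] == int(amount), ctx["balance"]
```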

The Application Driver Layer

provides a business-level API and interacts with the application; f.ex. admin_api.createUser("Dave") or app_api.placeOrder("Dave", {"product": "Chocolate", "quantity": "5kg"}), which both translate into a complex sequence of interactions with the API/UI of the app
tip: aliasing key values - createUser("Dave") actually creates a user with a random name but aliases it to "Dave" in the course of the test => readable test, unique data
tip: defaults - test data are created with reasonable defaults so that a test only needs to set what it cares about - so createUser takes many optional parameters (phone, email, balance, ...)
a well-done driver improves test reliability thanks to reuse - there is only one or a few places to fix when something changes
develop it iteratively, start with a few cases and simple tests, extend on-demand
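The aliasing and defaults tips combined might look like this Python sketch. FakeUserApi, AdminDriver, create_user and the default values are hypothetical; the idea shown is that the test always says "Dave" while the driver creates a unique real user behind the alias, and fills in every field the test doesn't care about.

```python
import uuid
from typing import Optional


class FakeUserApi:
    """Hypothetical stand-in for the application's user-management API."""

    def __init__(self):
        self.users = {}

    def create(self, name, phone, email, balance):
        if name in self.users:
            raise ValueError(f"duplicate user {name}")
        self.users[name] = {"phone": phone, "email": email, "balance": balance}


class AdminDriver:
    """Application driver with key-value aliasing and sensible defaults."""

    def __init__(self, api: FakeUserApi):
        self.api = api
        self.aliases = {}  # readable alias used in the test -> real unique name

    def create_user(self, alias: str, phone: str = "555-0100",
                    email: Optional[str] = None, balance: int = 0) -> str:
        # Unique real name, so repeated or parallel test runs never collide.
        real_name = f"{alias}-{uuid.uuid4().hex[:8]}"
        self.aliases[alias] = real_name
        self.api.create(real_name, phone,
                        email or f"{real_name}@example.com", balance)
        return real_name

    def user(self, alias: str) -> dict:
        """Look up a user by the alias the test uses."""
        return self.api.users[self.aliases[alias]]
```

Two tests can both create "Dave" against the same app instance without clashing, and each only specifies the balance if it matters to it.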

How to Express Your Acceptance Criteria

Internal DSL, i.e. in your programming language (f.ex. JUnit tests using the App. Driver) - simple(r), refactoring-friendly, scary for business people
External DSL - using FitNesse, Concordion etc. to record the acceptance criteria in plain text or HTML - easy to read and browse for non-tech people but more overhead to create, maintain, and keep in sync with the tests [JH: This can be added on top of the internal DSL, pulling up test parameters]
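As a sketch of the [JH] remark, a plain-text table can be layered on top of an internal DSL call in Python. The table syntax only resembles a FitNesse decision table and is not its real format; place_order and the stock data are hypothetical. Business people edit the table, while the execution logic stays in ordinary, refactorable code.

```python
# A FitNesse-like table kept as plain text that non-tech people can read.
ORDER_TABLE = """
| customer | product   | quantity | accepted? |
| Dave     | Chocolate | 5        | yes       |
| Alice    | Chocolate | 50       | no        |
"""


def parse_table(text: str) -> list:
    """Turn the pipe-separated table into one dict per data row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
    ]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def place_order(stock: dict, product: str, quantity: int) -> bool:
    """Internal DSL call (hypothetical); True if the order is accepted."""
    if stock.get(product, 0) < quantity:
        return False
    stock[product] -= quantity
    return True


def run_table(text: str, stock: dict) -> list:
    """Run every row through the internal DSL; return the failing rows."""
    failures = []
    for row in parse_table(text):
        accepted = place_order(stock, row["product"], int(row["quantity"]))
        if accepted != (row["accepted?"] == "yes"):
            failures.append(row)
    return failures
```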

The Window Driver Pattern: Decoupling the Tests from the GUI

W.D. is the part of App.Driver responsible for interaction with the GUI
may be split into multiple window drivers, one for each distinct part of the application [standard coding best practice]
write your tests so that if a new GUI is added, e.g. a gesture-based one, we only need to switch the driver without changing the test
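A Python sketch of the pattern, assuming a single window driver for the login screen. Both implementations are in-memory fakes (a real web driver would use something like Selenium, not shown here), and all names are hypothetical. The check talks only to the LoginWindowDriver interface, so it runs unchanged against either GUI.

```python
from abc import ABC, abstractmethod


class LoginWindowDriver(ABC):
    """Window driver for the login part of the app (one per UI area)."""

    @abstractmethod
    def log_in(self, user: str, password: str) -> None: ...

    @abstractmethod
    def greeting(self) -> str: ...


class FakeWebLoginDriver(LoginWindowDriver):
    """Would fill in form fields and click buttons; simulated in memory."""

    def __init__(self):
        self.page = {}

    def log_in(self, user, password):
        self.page["username_field"] = user
        self.page["banner"] = f"Welcome, {user}"

    def greeting(self):
        return self.page["banner"]


class FakeGestureLoginDriver(LoginWindowDriver):
    """Would perform gestures on a touch UI; simulated in memory."""

    def __init__(self):
        self.gestures = []
        self.message = ""

    def log_in(self, user, password):
        self.gestures.append(("swipe_unlock", user))
        self.message = f"Welcome, {user}"

    def greeting(self):
        return self.message


def check_user_sees_greeting(window: LoginWindowDriver) -> None:
    """The test uses only the window driver interface, never widgets."""
    window.log_in("dave", "secret")
    assert window.greeting() == "Welcome, dave"
```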

Implementing Acceptance Tests

Topics: state, handling of asynchronicity and timeouts, data management, test doubles management etc.

State in Acceptance Testing

"[..] getting the application ready to exhibit the behavior under test is often the most difficult part of writing the test." p204
we can't eliminate state; try to minimize dependency on complex state
- => resist the temptation to use a prod DB dump; instead, maintain a controlled, minimal set of data; we want to focus on testing behavior, not data.
- this minimal coherent set of data should be represented as a collection of scripts; ideally they use the app's public API to put it into the correct state - less brittle than dumping data into the DB - see ch12
tests are ideally atomic, which includes being independent => no hard-to-troubleshoot failures due to timing, and they can run in parallel
- an ideal test also creates all it needs and tidies up afterwards (this is admittedly difficult)
- tip: establish a transaction, roll back after the test - however this typically isn't possible if we treat acceptance testing as end-to-end testing as recommended [p205]
"The most effective approach to acceptance testing is to use the features of your application to isolate the scope of the tests." - f.ex. create a new user for every test, giving each test its own independent user account
if there is no way around tests sharing data, be very careful, as they'll be very fragile; don't forget the teardown
worst possible case: unknown start state, impossible to clean up => make the tests very defensive (verify preconditions, ...)
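Isolation via the app's own features, tidying up afterwards, and a defensive precondition check can be combined roughly like this in Python. FakeAccountsApi and its methods are hypothetical stand-ins for a shared app instance's public API; the context manager guarantees cleanup even when the test fails.

```python
import uuid
from contextlib import contextmanager


class FakeAccountsApi:
    """Hypothetical public API of an app instance shared by many tests."""

    def __init__(self):
        self.accounts = {}

    def open_account(self, owner: str) -> str:
        account_id = uuid.uuid4().hex
        self.accounts[account_id] = {"owner": owner, "balance": 0}
        return account_id

    def deposit(self, account_id: str, amount: int) -> None:
        self.accounts[account_id]["balance"] += amount

    def close_account(self, account_id: str) -> None:
        self.accounts.pop(account_id, None)


@contextmanager
def fresh_account(api: FakeAccountsApi, owner: str = "test-user"):
    """Create the state this test needs via the public API; always tidy up."""
    account_id = api.open_account(f"{owner}-{uuid.uuid4().hex[:8]}")
    try:
        yield account_id
    finally:
        api.close_account(account_id)


def check_deposit(api: FakeAccountsApi) -> None:
    with fresh_account(api) as acc:
        # Defensive precondition: fail clearly if the start state is wrong.
        assert api.accounts[acc]["balance"] == 0, "unexpected start state"
        api.deposit(acc, 50)
        assert api.accounts[acc]["balance"] == 50
```

The test never touches data it didn't create, so pre-existing accounts in the shared instance cannot break it, and it can be run repeatedly.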

Process Boundaries, Encapsulation, and Testing

preferably tests can act/verify without needing any privileged access (back doors) to the app - don't succumb to the temptation to introduce such back doors; rather, think hard about the design and improve modularity/encapsulation/verifiability [p206]
if back doors are the only option, there are 2 possibilities; both lead to brittle, high-maintenance code:
1. Add a test-specific API that enables you to modify the state/behavior of the app (e.g. switch a web service for a stub for a particular call)
2. React to "magic" data values (this is ugly, reserve it for your stubs)
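Reacting to magic values, kept inside a stub as recommended, might look like this in Python. The payment gateway and the magic card numbers are hypothetical values chosen for this sketch, not any real provider's test cards. A test can provoke a decline or a timeout simply by choosing the input, without any back door into the app.

```python
class PaymentGatewayStub:
    """Test double for a hypothetical external payment service.

    Reacts to 'magic' card numbers so that tests can provoke specific
    responses. Keep this trick inside stubs, never in production code.
    """

    DECLINED_CARD = "0000000000000001"  # hypothetical magic value
    TIMEOUT_CARD = "0000000000000002"   # hypothetical magic value

    def charge(self, card_number: str, amount_cents: int) -> str:
        if card_number == self.DECLINED_CARD:
            return "DECLINED"
        if card_number == self.TIMEOUT_CARD:
            raise TimeoutError("simulated gateway timeout")
        return "APPROVED"
```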

Managing Asynchrony and Timeouts

asynchrony arises f.ex. due to asynchronous communication, threads, transactions
push asynchronous behavior (waiting for a response, retries, ...) into the App Driver and expose a synchronous API to the tests => tests are easier to write and there are fewer places to tune; so a test contains f.ex. sendAsyncMsg(m); verifyMsgProcessed(); and the driver hides the waiting, e.g. in verifyMsgProcessed: while (!timeout) { if (pollResult()) return; sleep(N); } fail;
tip: instead of waiting for MAX_TIMEOUT and then polling the result once, poll it frequently until there is a response or the timeout expires. If possible, replace polling with hooking into events generated by the system (e.g. register a listener). Both make the tests finish sooner.
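Both variants, frequent polling with a timeout and waiting on an event, can be sketched in Python with the standard threading and time modules. MessagingDriver and its methods are hypothetical names mirroring sendAsyncMsg/verifyMsgProcessed; the background thread only simulates the system processing a message.

```python
import threading
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.05) -> None:
    """Poll frequently until condition() holds; fail once the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")


class MessagingDriver:
    """Hypothetical app driver exposing a synchronous API to the tests."""

    def __init__(self):
        self.processed = set()
        self.events = {}  # msg_id -> threading.Event, set when processed

    def send_async_msg(self, msg_id: str, delay: float = 0.2) -> None:
        event = self.events.setdefault(msg_id, threading.Event())

        # Simulates the system processing the message on another thread.
        def process():
            time.sleep(delay)
            self.processed.add(msg_id)
            event.set()  # stands in for an event the system would publish

        threading.Thread(target=process, daemon=True).start()

    def verify_msg_processed(self, msg_id: str, timeout: float = 5.0) -> None:
        # Polling variant: returns as soon as the result becomes visible.
        wait_until(lambda: msg_id in self.processed, timeout)

    def verify_msg_processed_via_event(self, msg_id: str,
                                       timeout: float = 5.0) -> None:
        # Listener variant: block on the event instead of polling.
        if not self.events[msg_id].wait(timeout):
            raise TimeoutError(f"{msg_id} not processed within {timeout}s")
```

In both cases the test returns roughly when the message is processed, not after the full timeout, and the timeout values live in one place.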

Using Test Doubles

Automated acceptance tests are not the same as User Acceptance Tests, i.e. they shouldn't use (all) the real external systems: we need to ensure a correct, known initial state, and an external system we don't control prevents that [JH: unless it's stateless?]
dilemma: integration is difficult to get right and a common source of errors => we must test integration points carefully and effectively, vs. external systems take away our control over the app's state and perhaps cannot handle the load generated by testing. One possible solution is to:
1. Create and use test doubles for all ext. systems
2. Create small test suites around every integration point using the real system
a benefit of test doubles is that they add points where we can control the behavior, simulate communication failures, simulate error responses or responses under load etc., that might be difficult to provoke in the real system
minimize and contain the dependencies on ext. systems - preferably one gateway/adapter per system
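One gateway per external system plus a controllable double could look like this Python sketch. The exchange-rate service, its rate method and the 11.0 rate are hypothetical test data. The double gives the tests a known state by default and a switch to simulate a communication failure that would be hard to provoke in the real system.

```python
from typing import Optional, Protocol


class ExchangeRateGateway(Protocol):
    """The single place where the app depends on the external service."""

    def rate(self, from_ccy: str, to_ccy: str) -> float: ...


class ExchangeRateDouble:
    """Test double: known rates by default, plus a failure switch."""

    def __init__(self, rates: Optional[dict] = None):
        self.rates = rates or {("EUR", "NOK"): 11.0}  # hypothetical data
        self.fail_with: Optional[Exception] = None   # e.g. ConnectionError()

    def rate(self, from_ccy: str, to_ccy: str) -> float:
        if self.fail_with is not None:
            raise self.fail_with
        return self.rates[(from_ccy, to_ccy)]


def convert(gateway: ExchangeRateGateway, amount: float,
            from_ccy: str, to_ccy: str) -> Optional[float]:
    """App logic that depends only on the gateway abstraction."""
    try:
        return round(amount * gateway.rate(from_ccy, to_ccy), 2)
    except ConnectionError:
        return None  # e.g. show "rates unavailable" instead of crashing
```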

Testing External Integration Points

these integration tests may need to run less frequently due to limitations of the target systems and might thus require a separate stage in the pipeline
focus on likely problems; f.ex. in evolving systems the schemas and contracts we rely upon will change, and thus we want to test them
"[..] there is usually a few obvious scenarios to simulate in most integrations" => do these, add more as defects are discovered. This approach isn't perfect but good wrt. cost/benefit.
only test calls and data that you use and care about, not everything <- cost/benefit
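A narrow contract check in this spirit, sketched in Python with the standard json module: it verifies only the fields (and types) our code actually reads and ignores everything else. The field names are hypothetical; in the pipeline, the raw response would come from calling the real external system in the separate integration stage.

```python
import json

# Fields our code actually reads from the external system's response
# (hypothetical for this sketch). Anything else in the payload is ignored.
FIELDS_WE_USE = {"orderId": str, "status": str, "totalCents": int}


def check_contract(raw_response: str) -> list:
    """Return a list of contract violations (an empty list means OK)."""
    payload = json.loads(raw_response)
    problems = []
    for name, expected_type in FIELDS_WE_USE.items():
        if name not in payload:
            problems.append(f"missing field {name}")
        elif not isinstance(payload[name], expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(payload[name]).__name__}")
    return problems
```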

The Acceptance Testing Stage

fail the build, without compromise, if acceptance tests fail; "stop the line"
tip: record the interaction of the test and UI for troubleshooting, e.g. via Vnc2flv [2/2010]
"We know from experience that without excellent automated acceptance test coverage, one of three things happens: Either a lot of time is spent trying to find and fix bugs at the end of the process when you thought you were done, or you spend a great deal of time and money on manual acceptance and regression testing, or you end up releasing poor-quality software." p213

Keeping Acceptance Tests Green

Due to their slowness, devs don't wait for the results of acceptance tests and thus tend to ignore their failures => build in discipline. If you let the tests rot, they will eventually die away, or it will cost you more to fix them before the release (delayed feedback, lost context, ...).

Deployment Tests

Ideal acceptance tests are atomic, set up and clean up their own data and thus have minimal dependency on existing state, and use public channels (API, ...) instead of back doors. Deployment tests, on the other hand, are intended to verify, for the first time, that our deployment script works in a prod-like env., so they consist of a few smoke tests checking that the env. is configured as expected and that the communication links between components are up and running. They run before the functional acc. tests and fail the build immediately (instead of letting the acc. tests time out due to dead dependencies etc.). If we have other slow but important tests (f.ex. expelled from the commit stage), we can run them here as well.
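A minimal connectivity smoke test of the kind described, sketched in Python with socket.create_connection from the standard library. It only checks that each endpoint accepts TCP connections; real deployment tests would likely also check configuration and versions. Endpoint names, hosts, and ports are whatever your env. defines.

```python
import socket


def check_reachable(endpoints, timeout: float = 2.0) -> list:
    """Smoke test: does each (name, host, port) accept TCP connections?

    Returns the names of unreachable endpoints so the pipeline stage can
    fail fast, before the slower functional acceptance tests start.
    """
    unreachable = []
    for name, host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:  # refused, unresolvable host, timeout, ...
            unreachable.append(name)
    return unreachable
```

A pipeline step would call it with the env.'s endpoints and exit non-zero if the returned list is not empty.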

Acceptance Test Performance

being comprehensive and realistic (close to the UI) is more important than speed; on large projects they often take a few hours. Speedup tips below.

Refactor Common Tasks

factor out and reuse common tasks, especially in setup code, and make them efficient
setup via the API is faster than via the UI; sadly, it is sometimes unavoidable to preload test data into the DB or use a back door, though we thus risk differences between this data and what the UI would create

Share Expensive Resources

Ideally we share nothing, but this is usually too slow; typically we share at least the instance of the app among all tests. One project considered sharing a single Selenium instance (=> more complex code, risk of session leaks) but in the end parallelized the tests instead.

Parallel Testing

Run multiple tests concurrently, perhaps against a single system instance - provided they're isolated.
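Running isolated tests concurrently against one shared app instance can be sketched in Python with concurrent.futures. SharedApp and cart_test are hypothetical; what makes the parallelism safe is that each test works only with its own unique user, while the app itself is thread-safe.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class SharedApp:
    """Hypothetical single app instance shared by all tests (thread-safe)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.carts = {}

    def add_to_cart(self, user: str, item: str) -> None:
        with self._lock:
            self.carts.setdefault(user, []).append(item)

    def cart(self, user: str) -> list:
        with self._lock:
            return list(self.carts.get(user, []))


def cart_test(app: SharedApp, item: str) -> bool:
    """Isolated test: it uses its own unique user, so it can run in parallel."""
    user = f"user-{uuid.uuid4().hex}"
    app.add_to_cart(user, item)
    return app.cart(user) == [item]


def run_in_parallel(app: SharedApp, items: list, workers: int = 4) -> list:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: cart_test(app, item), items))
```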

Using Compute Grids

Especially useful for single-user systems, very slow tests, or to simulate very many users. See f.ex. Selenium Grid.

Tags: book DevOps