Notes On Automated Acceptance Testing (from the Continuous Delivery book)

(Cross-posted from blog.iterate.no)

These are my rather extensive notes from reading chapter 8, on Automated Acceptance Testing, in the Continuous Delivery bible by Humble and Farley. There is plenty of very good advice that I just had to make a record of. Acceptance testing is an exciting and important subject. Why should you care about it? Because:

We know from experience that without excellent automated acceptance test coverage, one of three things happens: Either a lot of time is spent trying to find and fix bugs at the end of the process when you thought you were done, or you spend a great deal of time and money on manual acceptance and regression testing, or you end up releasing poor-quality software.




(Ch8 focuses primarily on "functional" req., ch9 on "non-functional" or rather cross-functional requirements.)

  • acceptance tests are business-facing, story-level tests asserting that a story is complete and working, run in a prod-like env; they also serve as regression tests
  • manual testing is expensive => done infrequently => defects discovered late, when there is little time to fix them and we risk introducing new regression defects
  • Acc.T. put the app through a series of states => great for discovering threading problems, emergent behavior in event-driven apps, bugs due to architectural mistakes, env/config problems
  • expensive if done poorly


How to Create Maintainable Acc. T. Suites



  1. Good acceptance criteria ("INVEST" - especially valuable to users, testable)
  2. Layered implementation:
    1. Acceptance criteria (Given/When/Then) - as xUnit tests or with Concordion/FitNesse/...
    2. Test implementation - it's crucial that they use a (business) domain-specific language (DSL), no direct relation to UI/API, which would make it brittle
    3. Application driver layer - translates the DSL to interactions with the API/UI, extracts and returns results
  3. Take care to keep test implementation efficient and well factored, especially wrt. managing state, handling timeouts, use of test doubles. Refactor.
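The three layers above can be sketched in a few lines. This is only an illustration, not code from the book: ShopDriver, its methods, and the in-memory "application" behind it are all assumptions made up so that the example runs on its own. The test reads as Given/When/Then in domain terms, and only the driver knows how the application is actually reached.

```python
class ShopDriver:
    """Application driver layer (hypothetical): turns domain-level calls
    into interactions with the system under test. The 'system' here is
    plain in-memory state so that the sketch is self-contained."""

    def __init__(self):
        self._balances = {}
        self._orders = []

    def create_user(self, name, balance=0):
        self._balances[name] = balance

    def place_order(self, user, product, price):
        # A real driver would call the app's API or drive its UI here.
        if self._balances[user] < price:
            return "rejected"
        self._balances[user] -= price
        self._orders.append((user, product))
        return "accepted"

    def balance_of(self, user):
        return self._balances[user]


def test_order_is_paid_from_the_users_balance():
    """Test implementation layer: domain language only, no UI/API details."""
    shop = ShopDriver()
    # Given a user with a balance of 10
    shop.create_user("Dave", balance=10)
    # When the user orders chocolate that costs 4
    outcome = shop.place_order("Dave", "Chocolate", price=4)
    # Then the order is accepted and the balance is reduced accordingly
    assert outcome == "accepted"
    assert shop.balance_of("Dave") == 6


test_order_is_paid_from_the_users_balance()
```

If the UI or API changes, only ShopDriver needs to follow; the test itself stays as it is.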


Testing against GUI



  • it's most realistic but complex (to set up, ..), brittle, hard to extract results, impossible with GUI technologies that aren't testable
  • if the GUI is a thin layer of display-only code with no business/application logic, there is little risk in bypassing it and using the API it talks to directly - this is recommended whenever possible (plus, perhaps, a few UI [smoke] tests)


Creating Acc. Tests



  • all (analysts, devs, testers) define the acceptance criteria together to ensure that everyone understands them and that they are testable


Acceptance Criteria as Executable Specifications



(See BDD) - the plain text specifications are bound to actual tests so that they have to be kept up to date [JH: "Living Documentation"]

The Application Driver Layer



  • provides business-level API and interacts with the application; f.ex. admin_api.createUser("Dave") or app_api.placeOrder("Dave", {"product": "Chocolate", "quantity": "5kg"}) - that both translate into a complex sequence of interactions with the API/UI of the app
  • tip: aliasing key values - createUser("Dave") actually creates a user with a random name but aliases it to "Dave" in the course of the test => readable test, unique data
  • tip: defaults - test data are created with reasonable defaults so that a test only needs to set what it cares about - so createUser takes many optional parameters (tlf, email, balance, ...)
  • a well done driver improves test reliability thanks to reuse - only 1/few places to fix on a change
  • develop it iteratively, start with a few cases and simple tests, extend on-demand
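A minimal sketch of the aliasing and defaults tips, under assumptions: AdminApi and FakeAdminBackend are hypothetical names, and the backend is an in-memory stand-in that rejects duplicate user names the way a real application might. The driver creates a uniquely named user behind the scenes while the test keeps talking about "Dave".

```python
import uuid


class FakeAdminBackend:
    """Stand-in for the real application; rejects duplicate user names."""

    def __init__(self):
        self.users = {}

    def register(self, name, email, balance):
        if name in self.users:
            raise ValueError(f"user {name!r} already exists")
        self.users[name] = {"email": email, "balance": balance}


class AdminApi:
    """Hypothetical driver: maps readable aliases to unique real names."""

    def __init__(self, backend):
        self._backend = backend
        self._aliases = {}

    def create_user(self, alias, email=None, balance=100):
        # Defaults: a test only passes the values it actually cares about.
        real_name = f"{alias}-{uuid.uuid4().hex[:8]}"
        self._aliases[alias] = real_name
        self._backend.register(
            real_name, email or f"{real_name}@example.com", balance
        )
        return real_name

    def balance_of(self, alias):
        return self._backend.users[self._aliases[alias]]["balance"]


# Two tests share one application instance and both use the name "Dave".
backend = FakeAdminBackend()
first_test = AdminApi(backend)
second_test = AdminApi(backend)
first_test.create_user("Dave")
second_test.create_user("Dave", balance=5)
```

Both calls succeed because the real names differ, yet each test still reads as if it simply created "Dave".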


How to Express Your Acceptance Criteria



  1. Internal DSL, i.e. in your programming language (f.ex. JUnit tests using App. Driver) - simple(r), refactoring-friendly, scary for business people
  2. External DSL - using FitNesse, Concordion etc. to record the acceptance criteria in plain text or HTML - easy to read and browse for non-tech people but more overhead to create, maintain, keep in synch with the tests [JH: This can be added on top of the internal DSL, pulling up test parameters]


The Window Driver Pattern: Decoupling the Tests from the GUI

  • W.D. is the part of App.Driver responsible for interaction with the GUI
  • may be split into multiple, for each distinct part of the application [standard coding best practice]
  • write your tests so that if a new GUI is added, e.g. a gesture-based one, we only need to switch the driver without changing the test
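As an illustration of switching the window driver without touching the test, here is a small sketch. Both driver classes are made up for the example, and they only record the UI actions they would perform instead of driving a real GUI.

```python
class WebLoginWindow:
    """Hypothetical window driver for a browser-based GUI."""

    def __init__(self):
        self.actions = []

    def log_in(self, user, password):
        # A real driver would send these actions to a browser.
        self.actions += [
            f"type #user {user}",
            f"type #password {password}",
            "click #submit",
        ]
        return f"Welcome, {user}"


class GestureLoginWindow:
    """Hypothetical window driver for a gesture-based GUI."""

    def __init__(self):
        self.actions = []

    def log_in(self, user, password):
        self.actions += [
            f"select avatar {user}",
            f"draw pattern {password}",
            "swipe right",
        ]
        return f"Welcome, {user}"


def check_user_can_log_in(login_window):
    """The test only knows the domain-level operation 'log in'."""
    assert login_window.log_in("Dave", "secret") == "Welcome, Dave"


# The same test runs unchanged against either GUI.
windows = [WebLoginWindow(), GestureLoginWindow()]
for window in windows:
    check_user_can_log_in(window)
```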


Implementing Acceptance Tests



  • Topics: state, handling of asynchronicity and timeouts, data management, test doubles management etc.


State in Acceptance Testing



  • "[..] getting the application ready to exhibit the behavior under test is often the most difficult part of writing the test." p204
  • we can't eliminate state; try to minimize dependency on complex state
    • => resist the tendency to use prod DB dump; instead, maintain a controlled, minimal set of data; we want to focus on testing behavior, not data.
    • this minimal coherent set of data should be represented as a collection of scripts; ideally they use the app's public API to put it into the correct state - less brittle than dumping data into the DB - see ch12
  • tests are ideally atomic and independent of each other => no hard-to-troubleshoot failures due to timing, possible to run in parallel
    • an ideal test also creates all it needs and tidies up afterwards (this is admittedly difficult)
    • tip: establish a transaction, roll back after the test - however this typically isn't possible if we treat acceptance testing as end-to-end testing as recommended [p205]
  • "The most effective approach to acceptance testing is to use the features of your application to isolate the scope of the tests." - f.ex. create a new user for every test, given independent user accounts
  • if there is no way around tests sharing data, be very careful, they'll be very fragile; don't forget tear down
  • worst possible case: unknown start state, impossible to clean up => make the tests very defensive (verify preconditions, ...)
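One way to make a test create what it needs and tidy up afterwards is a context manager around the app's own API. This is a sketch under assumptions: InMemoryApp and temporary_user are hypothetical, and a real implementation would call the application's public API rather than touch a set.

```python
import itertools
from contextlib import contextmanager


class InMemoryApp:
    """Stand-in for the application under test."""

    def __init__(self):
        self.users = set()

    def create_user(self, name):
        self.users.add(name)

    def delete_user(self, name):
        self.users.discard(name)


_sequence = itertools.count(1)


@contextmanager
def temporary_user(app, prefix="user"):
    """Create a fresh user for one test and remove it afterwards,
    even if the test body raises."""
    name = f"{prefix}-{next(_sequence)}"
    app.create_user(name)
    try:
        yield name
    finally:
        app.delete_user(name)


app = InMemoryApp()
with temporary_user(app) as user:
    assert user in app.users  # the test works with its own user only
assert app.users == set()     # nothing is left behind for other tests
```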


Process Boundaries, Encapsulation, and Testing



  • preferably tests can act/verify without needing any privileged access (back doors) to the app - don't succumb to the temptation to introduce such back doors, rather think hard about design, improve modularity/encapsulation/verifiability [p206]
  • if back doors are the only option, there are 2 possibilities; both lead to brittle, high-maintenance code:
    1. Add test-specific API that enables you to modify the state/behavior of the app (e.g. switch WS for a stub for a particular call)
    2. React to "magic" data values (this is ugly, reserve it for your stubs)


Managing Asynchrony and Timeouts



  • asynchrony arises f.ex. due to asynchronous communication, threads, transactions
  • push asynchronous behavior (wait for response, retries, ...) to the App Driver, expose synchronous API to the tests => easier to write tests, fewer places to tune; so in a test we will have f.ex. sendAsyncMsg(m); verifyMsgProcessed(); and in the driver's sendAsyncMsg: while(!timeout) if(pollResult) return; else sleep N; continue;
  • tip: instead of waiting for MAX_TIMEOUT and then polling the result, retry polling it more frequently until response or timeout. If possible, replace polling with hooking into events generated by the system (i.e. register a listener) => both result in a faster response
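The polling loop described above might look roughly like this. It is a sketch, not the book's code: wait_until, MessageDriver and FakeAsyncSystem are hypothetical names, and the fake system simply marks a message as processed after a short delay on another thread.

```python
import threading
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` frequently; return as soon as it is truthy,
    raise TimeoutError once `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


class FakeAsyncSystem:
    """Stand-in for the app: processes messages after a short delay."""

    def __init__(self, delay=0.1):
        self._delay = delay
        self._processed = set()

    def send_async(self, message):
        timer = threading.Timer(self._delay, self._processed.add, args=[message])
        timer.start()

    def is_processed(self, message):
        return message in self._processed


class MessageDriver:
    """Hypothetical app driver: hides the asynchrony from the tests."""

    def __init__(self, system):
        self._system = system

    def send_and_await_processing(self, message, timeout=2.0):
        self._system.send_async(message)
        wait_until(lambda: self._system.is_processed(message), timeout=timeout)


system = FakeAsyncSystem()
MessageDriver(system).send_and_await_processing("order-1")
assert system.is_processed("order-1")
```

The test sees a plain synchronous call, and the timeout and polling interval live in one place where they can be tuned.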


Using Test Doubles



  • Automated acceptance tests are not the same as User Acceptance Tests, i.e. they shouldn't use (all) the real external systems, we need to ensure correct, known initial state and an external system we don't control prevents that [JH: unless it's stateless?]
  • dilemma: integration is difficult to get right and a common source of errors => we should test integration points carefully and effectively; but external systems take the app's state out of our control and perhaps cannot handle the load generated by testing. One possible solution is to:
    1. Create and use test doubles for all ext. systems
    2. Create small test suites around every integration point using the real system
  • a benefit of test doubles is that they add points where we can control the behavior, simulate communication failures, simulate error responses or responses under load etc., that might be difficult to provoke in the real system
  • minimize and contain the dependencies on ext. systems - preferably one gateway/adapter per system
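A small sketch of a test double behind a single gateway, showing how it adds a control point for simulating failures. Everything here is hypothetical (PaymentGatewayStub, Checkout, the "payment-unavailable" result); the point is only that the app talks to one gateway object, which the acceptance tests can replace and steer.

```python
class PaymentGatewayStub:
    """Test double for a hypothetical external payment provider."""

    def __init__(self):
        self.fail_next = False  # control point: simulate an outage
        self.charges = []

    def charge(self, account, amount):
        if self.fail_next:
            self.fail_next = False
            raise ConnectionError("simulated outage")
        self.charges.append((account, amount))
        return "ok"


class Checkout:
    """Part of the app; reaches the external system only via the gateway."""

    def __init__(self, gateway):
        self._gateway = gateway

    def pay(self, account, amount):
        try:
            self._gateway.charge(account, amount)
        except ConnectionError:
            return "payment-unavailable"
        return "paid"


gateway = PaymentGatewayStub()
checkout = Checkout(gateway)

assert checkout.pay("Dave", 5) == "paid"

# Provoke a communication failure that would be hard to get on demand
# from the real system.
gateway.fail_next = True
assert checkout.pay("Dave", 7) == "payment-unavailable"
```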


Testing External Integration Points



  • these integration tests may need to run less frequently due to limitations of the target systems and might thus require a separate stage in the pipeline
  • focus on likely problems; f.ex. in an evolving system the schemas and contracts we rely upon will change and thus we want to test them
  • "[..] there is usually a few obvious scenarios to simulate in most integrations" => do these, add more as defects are discovered. This approach isn't perfect but good wrt. cost/benefit.
  • only test calls and data that you use and care about, not everything <- cost/benefit
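Checking only the part of a contract we rely on could be sketched like this. The field names and the response shape are invented for the example; in a real integration test the response would come from a call to the actual external system.

```python
# The fields our app actually reads from the (hypothetical) external
# payment service, with the types we expect.
REQUIRED_FIELDS = {
    "id": str,
    "amount": (int, float),
    "currency": str,
}


def contract_violations(response):
    """Return a list of problems with the fields we depend on.
    Extra fields we don't use are deliberately ignored."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            actual = type(response[field]).__name__
            problems.append(f"wrong type for {field}: {actual}")
    return problems


# A response with an unknown extra field still satisfies our contract.
good = {"id": "p-1", "amount": 5, "currency": "EUR", "newField": True}
assert contract_violations(good) == []

# A schema change that affects us is reported.
changed = {"id": "p-2", "amount": "5.00"}
problems = contract_violations(changed)
```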


The Acceptance Testing Stage



  • fail the build, without compromise, if the acceptance tests fail; "stop the line"
  • tip: record the interaction of the test and UI for troubleshooting, e.g. via Vnc2flv [2/2010]
  • "We know from experience that without excellent automated acceptance test coverage, one of three things happens: Either a lot of time is spent trying to find and fix bugs at the end of the process when you thought you were done, or you spend a great deal of time and money on manual acceptance and regression testing, or you end up releasing poor-quality software." p213


Keeping Acceptance Tests Green



Because acceptance tests are slow, devs don't wait for their results and thus tend to ignore their failures => build in discipline. If you let the tests rot, they will eventually die away or it will cost you more to fix them before the release (delayed feedback, lost context, ...).

Deployment Tests



Ideal acceptance tests are atomic, set up and clean up their own data and thus have minimal dependency on existing state, and use public channels (API, ...) instead of back doors. On the other hand, deployment tests are intended to verify, for the first time, that our deployment script works on a prod-like env., so they consist of a few smoke tests checking that the env. is configured as expected and that the communication links between components are up and running. They run before the functional acc. tests and fail the build immediately (instead of letting the acc. tests time out due to dead dependencies etc.). If we have other slow but important tests (f.ex. expelled from the commit stage), we can run them here as well.
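A rough sketch of such a deployment smoke-test step: run a handful of named checks and report every failure up front, so that the stage can fail before the slow acceptance tests start. The check names and functions are placeholders; real checks would open connections to the deployed components.

```python
def run_smoke_checks(checks):
    """Run each named check; return descriptions of those that failed.
    A check fails by returning a falsy value or raising an exception."""
    failures = []
    for name, check in checks.items():
        try:
            if not check():
                failures.append(name)
        except Exception as error:
            failures.append(f"{name}: {error}")
    return failures


def broker_check():
    # Placeholder for a real connection attempt to a dead dependency.
    raise ConnectionError("connection refused")


checks = {
    "database reachable": lambda: True,
    "message broker reachable": broker_check,
    "configuration loaded": lambda: True,
}

failures = run_smoke_checks(checks)
if failures:
    print("Deployment tests failed:", "; ".join(failures))
```

In a pipeline the step would exit with a non-zero status when failures is non-empty, which fails the build immediately.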

Acceptance Test Performance



  • being comprehensive and realistic (close to UI) is more important than speed; on large projects they often take a few hours. Speedup tips below.


Refactor Common Tasks



  • factor out and reuse common tasks, especially in setup code, and make them efficient
  • setup via API is faster than via UI; sadly, sometimes it is unavoidable to preload test data into the DB or use a back door, though we thus risk differences between this data and what the UI would create


Share Expensive Resources



Ideally we share nothing but this is usually too slow; typically we share at least the instance of the app for all tests. On one project, sharing the Selenium instance was considered (=> more complex code, risk of session leaks) but in the end they parallelized the tests instead.

Parallel Testing



Run multiple tests concurrently, perhaps against a single system instance - provided they're isolated.
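As a sketch of running isolated tests concurrently against one shared instance: each test below creates its own uniquely named user, so the tests cannot see each other's data. SharedApp and cart_test are made up for the illustration; the standard library's ThreadPoolExecutor does the parallel execution.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class SharedApp:
    """Stand-in for the single application instance shared by all tests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._carts = {}

    def add_to_cart(self, user, item):
        with self._lock:
            self._carts.setdefault(user, []).append(item)

    def cart_of(self, user):
        with self._lock:
            return list(self._carts.get(user, []))


def cart_test(app, item):
    """One isolated test: it uses its own user, so no data is shared."""
    user = f"user-{uuid.uuid4().hex}"
    app.add_to_cart(user, item)
    return app.cart_of(user) == [item]


app = SharedApp()
items = ["tea", "milk", "jam", "rice"]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda item: cart_test(app, item), items))
```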

Using Compute Grids



Especially useful for single-user systems, very slow tests, or to simulate very many users. See f.ex. Selenium Grid.

Tags: book DevOps


Copyright © 2025 Jakub Holý