Trevor Wagner

Project-Focused Software Engineer, QA Automation

Understanding Test Automation Frameworks: What is a Test Runner?


In order to automate tests, an engineer responsible for test automation first needs some means of defining which tests should be run. Somehow, the engineer needs to be able to define a subset of code where operations happen to set the stage, extract output from the system under test, and (in a moment of truth) compare the actual output to the expected output. Maybe also interact with the system under test (or trigger some behavior) before extracting output. Arrange/Act/Assert; rinse and repeat. Each test specification (or test step, which, much like a sentence serves as a statement within the structure of a paragraph, is generally a smaller self-contained segment of instructions that serves as a statement within a test specification) needs a clear set of boundaries that makes it possible to move and reorganize if needed.

Given a suite (read: assembly or grouping) of automated tests, it would be helpful to be able to execute the tests in that suite somehow. For example, imagine it were possible to invoke a command in the command line that, every time it ran, initiated the process of running that suite of tests; every time the command was invoked, it would start from the top, with zero awareness of what happened in the last run. It would likely also be helpful to pass some sort of configuration parameters to this process (for example, via command-line options or a configuration file), so that users could fine-tune the behaviors that, every time tests were run, adapt each test run to the expectations of the user/s who configured it.

When somebody refers to a test runner, they're referring to a library that provides (at least) two distinct APIs that enable test automation: one API that can be used to define tests and another API that can be used to run them (and specify configurations for test runs). As a definition, this is admittedly oversimplified, but for the purposes of getting a clear understanding of the essential value a test runner provides in automating software testing, hopefully the simplicity here provides a helpful starting point.

Test runners can be versatile tools, but because their area of concern focuses on test specifications and test execution, they essentially serve as the drivetrain of a test automation framework. Like links in a chain on a bicycle, test specifications are drawn through runtime in order to get (and keep) a test run moving. If there is a break somewhere in the chain, the test run (just like a bicycle) fails, even if it generally continues to roll (at least downhill) for a while. Until that happens, though, it's one test specification after another that drives everything forward.

Test runners are used to execute automated tests defined at any applicable (i.e. non-manual) level of the testing pyramid. As implemented, most tend to follow general conventions that make their operations accessible and predictable to anybody familiar with them. This post will outline some of those conventions.

Test Runners Provide APIs Used to Execute Test Runs

As noted above, one of the general APIs a test runner provides allows a user to begin execution of a test run. Most test runners actually provide two separate APIs that facilitate starting a test run: a command-line API and a code-level API.

Command-Line API for Test Execution

As with most user-space processes (even systemd units, for anybody familiar), the execution of automated tests starts at the command line. If tests are run on a Continuous Integration server, the run is started from the command line. Debugger sessions in an IDE are generally started from the command line. Runs started in Terminal, cmd, and PowerShell are (for anybody curious at this point) all started from the command line. Even if tests are executed using a task runner (like npm run test, ./gradlew test, make tests, or something similar), the task names serve as aliases for executing the test runner as a program via the command line (even if indirectly, through an interpreter like node).

Regardless of whether a given command-line call executes the library directly or indirectly (beyond the node example above, by running code that then configures and executes a test run from within general runtime), the execution of a test run needs to start somehow. One of the key things a test runner provides is an API that can be used to start that execution from the command line.
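
For example, the Jasmine runner used in the code samples below can be invoked directly from the command line (the spec-file path here is hypothetical):

# Run the whole suite using the project's Jasmine configuration
npx jasmine

# Run a single spec file directly
npx jasmine tests/login.spec.js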

Code-Level API for Test Execution

Most test runners also provide classes and methods within code that can be used to configure and begin execution of a test run. For example, Jasmine allows defining a test run with JavaScript code that looks like this:


import Jasmine from 'jasmine';

const jasmine = new Jasmine();
jasmine.addMatchingSpecFiles(['tests/**/*.spec.js']);

// execute() returns a Promise that resolves once the test run completes
await jasmine.execute();

This can be useful in a number of cases, even if it's likely not especially helpful for getting beginners up and running. For example, I worked with an organization once that used code to define suites of Cucumber tests, each with different configurations, that could each be executed with a simple command. I have found this also provides opportunities to explore engineering a solution for multi-threaded (or multiprocess, where applicable) test runs if the runner does not support them natively.

Test Runners Provide Code-Level APIs Used to Define Test Operations

Getting back to the chain metaphor developed in the introduction: if test runners serve as the drivetrain of a test automation framework, test specifications are like links in a chain on a bicycle (or a motorcycle). Even if every test is potentially fairly unique, each specification is a relatively standardized segment of code that can be used to define how tests operate. If needed, specifications can (again: much like links in a bicycle chain, within reason) be managed or rearranged.

With test runners that use code to define specifications, test specifications generally look like some kind of method. In pytest and unittest in Python, these methods all incorporate the substring test in their names. In JUnit and TestNG, these methods carry the runner's respective @Test annotation.
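
As a concrete sketch in JavaScript (to match the Jasmine example above), each call to it() defines one specification, and describe() groups related specifications; validatePassword here stands in for a hypothetical function under test:

describe('password validation', () => {
  it('rejects an empty password', () => {
    const result = validatePassword('');
    expect(result.valid).toBe(false);
  });
});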

With test runners (like Cucumber and behave) that use natural language to define specifications, specifications are stored in text files formatted using Gherkin syntax. Test specifications generally have titles that start with the keyword Scenario: (unless they are parameterized, at which point they start with Scenario Outline:).
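
A (hypothetical) specification in a Gherkin feature file looks something like this:

Feature: Login

  Scenario: Logging in with valid credentials
    Given a registered user
    When that user logs in with valid credentials
    Then the user is shown the dashboard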

Lifecycle Hooks

In addition to allowing users to define test specifications, most runners also allow for defining operations that should occur before or after every single specification (or group of specifications), or before or after a test run. This can be useful for maintaining state (for example, cleaning up data stored within the system under test or an auxiliary datastore somewhere).

Most runners define these as specially named (or specially annotated) methods. Cucumber and behave provide two different forms of lifecycle hooks: one type defined in code, and another (strictly for setup) that applies to all scenarios in a file by using the keyword Background:.
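
In a JavaScript runner like Jasmine, for example, code-level hooks look something like this (a minimal sketch; seedTestData, resetTestData, and fetchRecords are hypothetical helpers):

describe('records API', () => {
  beforeAll(async () => {
    // Runs once, before the first spec in this group
    await seedTestData();
  });

  afterEach(async () => {
    // Runs after every individual spec
    await resetTestData();
  });

  it('lists the seeded records', async () => {
    const records = await fetchRecords();
    expect(records.length).toBeGreaterThan(0);
  });
});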

Test Runners Report on the Results of Tests and Test Runs

In addition to executing test specifications as part of a test run, most test runners provide some sort of output reflecting both the current status and the final result of the test run.

Runners can also be configured to save reports of test results to a file, like JUnit-style XML. That is out of scope for this post; this post deals more specifically with the ways results are reflected during and after test runtime.

Current Status or Progress to Standard Output

In addition to other output allowed to display on standard output (pytest, for example, requires special configuration to display output it captures from the code under test), many test runners will print the progress of the test run to standard output, including which test specifications have been run and what the result of each one was.

This can be valuable beyond monitoring test status in a terminal window or CI log as a test run executes: with some runners it is possible to correlate log output printed to standard output with the name of the test being run.
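
With Jasmine, for example, one way to do this is to register a minimal custom reporter (typically from a helper file loaded before the specs) that tags standard output with each spec's name; this is just a sketch, not the only approach:

jasmine.getEnv().addReporter({
  specStarted(result) {
    console.log(`STARTING: ${result.fullName}`);
  },
  specDone(result) {
    console.log(`FINISHED: ${result.fullName} (${result.status})`);
  }
});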

Test Run Results Affect Runner Exit Status

When test specifications run, they either pass or fail. The most common way for a test specification to fail is by way of an assertion error, where an assertion statement (tasked with comparing the actual result extracted from the system under test against the expected result defined within code for the test) throws an error upon detecting a mismatch.

Regardless of whether a specification's failure is the result of a test error or a test failure, a non-passing result generally occurs any time a test specification encounters an error or exception (as supported by the programming language tests are automated with) as the specification executes. As an aside, this can be a handy way to enforce test integrity: if the state or status of a test specification varies from expectations when checked within something like an if statement, an error or exception thrown manually can fail the specification immediately instead of letting it continue in an invalid state.
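
As a minimal Jasmine-style sketch (fetchRecords stands in for a hypothetical call to the system under test), a manually thrown error fails the spec just as an assertion error would:

it('returns the expected number of records', async () => {
  const records = await fetchRecords();

  // Enforce test integrity: fail outright if the precondition does not hold
  if (!Array.isArray(records)) {
    throw new Error('Expected the system under test to return a list of records');
  }

  expect(records.length).toBe(3);
});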

If any test specification completes with a non-passing result, the associated test run fails. Once a test run completes, the test runner will generally return a result status to the command line: passing runs generally exit cleanly (exit status 0), and failed runs generally exit with a nonzero (unsuccessful) exit status.

A nonzero (unsuccessful) exit status from a runner fails the CI task executing it (although many CI platforms are able to determine, before the process exits, whether tests appear to be failing or not).
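
For example, in a POSIX shell the special variable $? holds the exit status of the last command, which makes it easy to see what a CI server sees (assuming a Jasmine suite like the one above):

npx jasmine
echo $?   # prints 0 when every spec passed, a nonzero value otherwise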

Test Runners Generally Provide Means for Configuration and Extension

One of the natural benefits of any automated test is the consistency a test runner provides by executing test runs the same way every time. Most test runners make this consistency customizable, so that it can be adapted to consumer expectations.

Extension via Configuration

Most test runners allow configuration via command-line arguments. In addition to this, many also allow configuration by way of configuration files. Some expect a configuration file in order to start executing a test run: TestNG, for example, is conventionally driven by a testng.xml suite file that defines which tests to execute.
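
As a sketch of what this can look like, Jasmine (used in the examples above) reads a jasmine.json file; the exact keys vary by version, but a minimal configuration looks roughly like this:

{
  "spec_dir": "tests",
  "spec_files": ["**/*.spec.js"],
  "helpers": ["helpers/**/*.js"]
}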

Extension via Code

Some test runners allow subclassing or overriding portions of their APIs (as supported by the programming language they are implemented in) to adapt and add behaviors to meet consumer needs.
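
Jasmine, for example, also exposes registration points in code: a custom matcher registered from within a lifecycle hook extends the runner's integrated assertion API. A minimal sketch might look like this:

beforeEach(() => {
  jasmine.addMatchers({
    // Adds expect(value).toBeWithinRange(floor, ceiling) to the assertion API
    toBeWithinRange: () => ({
      compare: (actual, floor, ceiling) => {
        const pass = actual >= floor && actual <= ceiling;
        return {
          pass,
          message: `Expected ${actual} to be between ${floor} and ${ceiling}`
        };
      }
    })
  });
});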

Conclusion

To anybody familiar, there is (of course) much more to test runners than what's provided in this post. Depending on the runner, it may provide functionality that supports parameterizing tests, nesting test methods, executing tests in parallel, specifying a subset of tests (within files located at the specified file path) to include or exclude from test runtime, defining test fixtures to persist between specifications, or something else. Some test runners (like Jasmine and Jest -- both for JavaScript) provide integrated assertion APIs (distinct from standalone assertion libraries like Chai -- also for JavaScript). Many provide integrated functionality to export results in a specific report format (like JUnit XML). The list goes on, especially with runners like pytest or (regardless of the language-specific implementation) Cucumber.

But for anybody unfamiliar, hopefully this provides a frame of reference for what a test runner does. Even if the objective for reading this post was to get a sense of what a specific test runner might be doing, hopefully this provides some context (beyond the documentation for that test runner, which should be essential reading) in terms of what test runners do conventionally.

In the end, everything comes back to the chain (or maybe chains, if run in parallel, and regardless of whether tests are run at random or in the same sequence every time) of test specifications and/or steps used both to define and to drive test runtime. When building or debugging a framework (much like working on a bicycle), additional functionality used in tests works either inside or outside the scope of this drivetrain. Flow of control (and often object lifecycles, for runtimes featuring garbage collection) is handled by default within the scope of a link in the chain.