Test Pyramid vs Testing Trophy: What’s the Difference?

December 27, 2022 / Katarina Rudela

Reading Time: 15 minutes

Software testing consists of examining code and its behavior through various means of validation and verification. It should provide an objective view of the software independent from the developers, allowing stakeholders to understand the risks of implementing it. Testing was originally an integral part of the coding process during the early days of software development, such that it was considered a type of debugging. This approach began to change when Glenford J. Myers introduced the idea of separating testing from debugging in 1979. His primary focus was on “breakage” testing, now known as destructive testing, but it illustrated the software development community’s general desire to make development and testing separate activities.

The pyramid and trophy models are two of the most commonly used approaches to software testing. They’re quite similar in many ways, but they also have distinct differences that developers should clearly understand before selecting a model for testing their software.

Trends

Software testing is a rapidly evolving field driven by several dominant trends. In general, developers are performing testing earlier in the software development life cycle (SDLC), as they experience greater incentive to get their products to market as quickly as possible. Agile and DevOps are considered best practices in achieving this goal, but testers are also implementing a more recent refinement of these methodologies known as QAOps.

Agile and DevOps

The Agile methodology is an approach to software development that emphasizes incremental delivery, continual planning and team collaboration. It also requires testing as an essential component of the development process. DevOps combines the development (Dev) and operations (Ops) phases of software implementation, which shortens the SDLC and facilitates the management of end-to-end (E2E) processes. Organizations are also using these methodologies to improve software quality through more rigorous testing.

The adoption of Agile and DevOps is increasing dramatically. While enterprises have been using these methodologies for well over a decade, it’s only during the last few years that small businesses have started doing so.

The chart below shows the workflow for both Agile and DevOps processes:

Figure 1: Agile and DevOps Workflow

The Agile process begins with planning and enters a cycle of design, development, testing, deployment and review, which is typically repeated multiple times until the product is ready for release. The DevOps process is a continuing cycle of development and operations. The phases of the development part of the cycle are planning, coding, building and testing. Once the software is released, the operational phases consist of deployment, operations and monitoring until the next release of software.

QAOps

Most developers currently follow DevOps, but analysts expect the adoption of QAOps to become widespread in the foreseeable future. QAOps is a combination of quality assurance (QA) and operations (Ops), which is intended to deliver software to market more quickly without compromising quality. It emphasizes greater collaboration between developers and testers by integrating QA tasks into the Continuous Integration (CI)/Continuous Deployment (CD) pipeline of Agile development. QAOps expands the tester’s role to include all phases of the SDLC, so testers must have a thorough understanding of QAOps when following this methodology.

The following chart shows the workflow for QAOps:

Figure 2: QAOps Workflow

Note the general similarity in workflow between QAOps and DevOps. The QA side of QAOps consists of planning, automation, packaging and testing. Once the software is released, it enters the Ops cycle with phases that include triggers, execution and reporting before beginning the planning phase again.

Testing Models

Today, most development methodologies perform formal testing as a distinct phase in software development. Developers may perform some basic unit testing as part of development, but the majority of testing is still separate from development. In some methodologies, the SDLC may transition rapidly between development and testing, or separate teams may perform the two tasks in parallel. In all these cases, however, development and testing are viewed as different activities.

Many types of software testing exist, but E2E, integration and unit testing are most significant for comparing the pyramid and trophy models.

E2E Testing

E2E tests, or user interface (UI) tests, consist of using the software to see if it works, making them the most intuitive form of testing. These tests are entirely automated in most cases, allowing them to simulate every user interaction, including clicking widgets, entering values and evaluating the UI.

E2E tests verify the users’ actual experience with the software and are the most realistic tests available. They exercise the entire system, allowing them to identify incompatibilities with particular platforms. However, E2E tests aren’t as good at determining exactly what the problem is. In addition, these tests can fail for reasons other than the software, such as network conditions. E2E testing is also slower than other types.

Integration Testing

Integration tests, also known as service tests, determine how well software components work together. Common examples include testing a module’s ability to correctly exchange data with a database and retrieve information from an application programming interface (API). Integration tests don’t need to interact with a UI, since they can directly execute code.

Integration tests are designed to determine if the software is doing what it’s supposed to do, especially when developers implement internal changes. While robust to these types of changes, integration testing can become quite slow for software that performs resource-intensive tasks such as writing to databases, routing HTTP requests and rendering page templates. Testers often need to examine many layers of integration tests to identify exact problems.
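As a rough illustration of the database case mentioned above, the sketch below exercises a small data-access module against a real (in-memory) SQLite database rather than a stand-in. The UserRepository class, its table layout and the test names are hypothetical and exist only for this example.

```python
import sqlite3


class UserRepository:
    """Hypothetical data-access module; the component under test."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)"
        )

    def add(self, email: str) -> int:
        cur = self.conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        self.conn.commit()
        return cur.lastrowid

    def find_email(self, user_id: int):
        row = self.conn.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def test_repository_round_trip():
    # The module and the database engine are exercised together.
    conn = sqlite3.connect(":memory:")
    try:
        repo = UserRepository(conn)
        user_id = repo.add("ada@example.com")
        assert repo.find_email(user_id) == "ada@example.com"
        assert repo.find_email(user_id + 1) is None
    finally:
        conn.close()


def test_duplicate_email_is_rejected():
    # The UNIQUE constraint lives in the database, so only a test that
    # talks to the database can observe it.
    conn = sqlite3.connect(":memory:")
    try:
        repo = UserRepository(conn)
        repo.add("ada@example.com")
        try:
            repo.add("ada@example.com")
        except sqlite3.IntegrityError:
            return
        raise AssertionError("expected sqlite3.IntegrityError")
    finally:
        conn.close()


if __name__ == "__main__":
    test_repository_round_trip()
    test_duplicate_email_is_rejected()
    print("integration tests passed")
```

Note that the tests never inspect how UserRepository builds its queries, which is why this style of test tends to survive internal refactoring.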

Unit Testing

Within the context of software development, a unit is a logically discrete piece of code. It’s usually small, such as a class, function or method. Unit testing only ensures that the unit behaves as the developer intended, so the tester only needs to call that code directly and analyze its output. That means unit testing doesn’t depend on other components like services or the UI.

Unit testing is usually much faster than the other types, especially for single modules. It provides highly specific information about the problems, since this type of testing isn’t concerned about how the software should act as a whole. The drawback is that unit tests are sensitive to implementation changes: a refactoring that doesn’t affect the software’s behavior may still require the modification of many individual tests.
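A minimal sketch of what this looks like in practice follows. The apply_discount function is a hypothetical unit invented for the example; each test calls it directly and checks its output, with no database, service or UI involved. The test functions use plain assert statements, so a runner such as pytest could collect them, but they also run as an ordinary script.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent; a hypothetical unit under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def test_typical_discount():
    assert apply_discount(200.0, 25) == 150.0


def test_zero_discount_returns_original_price():
    assert apply_discount(99.99, 0) == 99.99


def test_rejects_out_of_range_percent():
    try:
        apply_discount(100.0, 120)
    except ValueError:
        return
    raise AssertionError("expected ValueError")


if __name__ == "__main__":
    test_typical_discount()
    test_zero_discount_returns_original_price()
    test_rejects_out_of_range_percent()
    print("unit tests passed")
```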

Test Balancing

E2E, integration and unit tests have quite different scopes. E2E tests require a complete application to run and are the most comprehensive type of test, so they require the most computing resources. Integration tests only find problems with the interface between components, such as a database service and a database. Unit tests only look for logical errors at a fundamental level, so they require few resources to run.

All three test types are thus essential, but each provides a different set of benefits and costs. In addition, testing must be limited in some way, since it’s impractical to test every possible case in real-world software. Testers must therefore determine the right balance between these testing types to achieve the best return on investment (ROI). This issue has been a matter of strong debate for a long period, but the testing pyramid and trophy have emerged as two leading models.

Testing Pyramid

The testing pyramid model comes from the book Succeeding with Agile by Mike Cohn, who proposes two basic rules for software testing. The first rule is to write tests with different granularities, meaning testing should be done at different levels. The second rule is to test less as the testing level increases, so you should have a few E2E tests and a lot of unit tests. These rules result in a testing model with the following shape:

Figure 3: Testing Pyramid

The lower the testing levels in the above figure, the more tests are needed. The tip of the pyramid is narrow because relatively few E2E tests are needed, whereas the base is wider to indicate the need for more unit testing. The idea behind the pyramid model is that unit tests should be run earlier in the testing process because they’re faster and can expose problems quickly. In summary, the pyramid model says to conduct a few E2E tests, a decent number of integration tests and many unit tests.

The pyramid model is probably the most widespread practice for testing software at this time. It’s effective for organizing tests and is particularly suitable in test-driven development (TDD) methodologies, since unit testing forms the foundation for these processes. One reason for this model’s popularity is that it often appears spontaneously during the development process, even when developers don’t intend for this to happen. It’s challenging to write E2E tests in the early stages of a project unless the team writes acceptance tests from the beginning, usually as part of a behavior-driven development (BDD) framework. Otherwise, E2E tests are usually written only after the completion of a basic prototype or minimum viable product (MVP), by which time developers have already conducted unit and integration tests.

Test speed is another factor that creates the pyramid shape. Developers run test suites more often when they complete quickly, as slower tests delay the feedback needed for a productive test environment. Since unit tests at the base of the pyramid are faster, testers tend to write more of them. By the same logic, E2E testing takes longer, so developers use it more sparingly. This tendency means that a large web app typically has dozens of E2E tests, hundreds of integration tests and thousands of unit tests.

Testing Trophy

The testing pyramid originated in 2009, when Node.js was just being created. Internet Explorer and Adobe Flash were still in widespread use, and front end frameworks with rich feature sets like Angular and React weren’t yet available. However, by 2016, many developers felt that technological advancements could support a different approach to testing for front end development.

Guillermo Rauch, author of Socket.io and other JavaScript-based technologies, originated the philosophy behind the testing trophy when he advised developers to "Write tests. Not too many. Mostly integration." Like Cohn, Rauch believes strongly in the use of testing as the foundation of software development. However, Rauch also argues that the ROI for testing diminishes beyond a certain point, making it important to locate the sweet spot: enough tests to find the most significant bugs, but not so many that testing effort is wasted. This philosophy emphasizes integration tests because a relatively small number of them are needed to identify real problems. In addition, integration tests aren’t tightly bound to specific implementations, yet they’re still fast enough that testers can perform a comparatively large number of them.

In 2018, Kent C. Dodds developed Rauch’s idea into the Testing Trophy model we know today, as shown in the following diagram:

Figure 4: Testing Trophy

The figure above shows that the Testing Trophy model reorders the priorities of testing. The role of unit testing is greatly reduced and is largely replaced by static tools like ESLint and JSHint. This change leaves integration tests as the primary focus, since today’s UIs typically rely on back end components that are challenging to test in isolation. The role of E2E tests in the trophy model is similar to the pyramid model.

The reason for these changes in testing front ends is that unit tests are coupled to internal implementation, so they must be updated whenever that implementation changes, which costs testers time. Because such changes often don’t affect the software’s behavior, the effort spent on unit testing should be kept under control. As a result, an increasing number of testers are starting to favor the trophy model. The testing trophy is currently the second most widespread model for testing software, after the pyramid model.
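One rough way to see which of the two shapes an existing suite resembles is to compare the number of tests at each level. The heuristic below is an assumption made for illustration only; neither Cohn nor Dodds defines the models by exact counts, and the function name suite_shape is hypothetical.

```python
def suite_shape(unit: int, integration: int, e2e: int) -> str:
    """Classify a test suite by the relative number of tests per level.

    Illustrative heuristic only:
    - "pyramid": more unit than integration tests, more integration than E2E
    - "trophy":  integration tests outnumber both unit and E2E tests
    - "other":   anything else, e.g. a suite dominated by E2E tests
    """
    if unit > integration > e2e:
        return "pyramid"
    if integration > unit and integration > e2e:
        return "trophy"
    return "other"


if __name__ == "__main__":
    # Thousands of unit, hundreds of integration, dozens of E2E tests.
    print(suite_shape(unit=2000, integration=300, e2e=40))
    # Integration tests as the primary focus.
    print(suite_shape(unit=150, integration=400, e2e=30))
```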

Static Testing

Static testing is the testing of software without executing it, in contrast with dynamic testing that’s performed on running programs. Automated tools typically perform static testing, although human testers are still needed for analysis, program comprehension and code review. These tools scan the code to identify problems like a lack of adherence to naming conventions and unsafe statements. The analysis is usually performed on the source code, but object code analysis is also a common part of static testing. Code reviews include inspections and walkthroughs in most cases.

The diagram of the trophy testing model shows that it’s supported by static testing, which isn’t the case for the pyramid model. This is another difference due to the trophy model’s greater focus on maximizing ROI, since static tests are relatively inexpensive unless the developer needs to run them in real time. The low cost of static testing is primarily due to the ready availability of advanced analyzers like linters and type checkers, which are still worth using even though they rarely find problems in the software’s logic.
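To make the idea of scanning code without running it concrete, here is a toy linter built on Python’s standard ast module. It flags the two kinds of problems mentioned above: a naming-convention violation and an unsafe statement. The rules, the lint function and the SAMPLE snippet are assumptions for this sketch; real linters such as ESLint apply far richer rule sets.

```python
import ast
import re

# Assumed convention for this sketch: function names are snake_case.
SNAKE_CASE = re.compile(r"^_*[a-z][a-z0-9_]*$")
UNSAFE_CALLS = {"eval", "exec"}


def lint(source: str) -> list:
    """Return findings for `source`, sorted by line, without executing it."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not SNAKE_CASE.match(node.name):
                findings.append(
                    (node.lineno, f"function '{node.name}' is not snake_case")
                )
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in UNSAFE_CALLS
        ):
            findings.append((node.lineno, f"call to '{node.func.id}' is unsafe"))
    return [f"line {line}: {message}" for line, message in sorted(findings)]


SAMPLE = '''
def computeTotal(items):
    return sum(items)

def run(expr):
    return eval(expr)
'''

if __name__ == "__main__":
    for finding in lint(SAMPLE):
        print(finding)
```

The sample is only parsed into a syntax tree, so the eval call inside it is reported but never run.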

Testing Distribution and Coverage

Both pyramid and trophy testing require additional best practices to ensure that testing is adequately distributed to cover the maximum number of cases without diminishing its ROI. For example, testing plain getters and setters or internal methods generally isn’t worth the time. The sweet spot for testing is usually about 80 percent coverage of the code, with programming language being one of the most important variables.

Developers don’t need to write as much code when the language is more expressive, especially for complex actions. Testing is more important for complex cases, so software written in a highly expressive language like Python may require 90 percent coverage. Furthermore, an extreme case like porting software from Python 2 to Python 3 could require complete testing coverage, since that would be the only way to ensure the port hasn’t changed any behavior.
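The arithmetic behind these targets is simple, as the sketch below shows. It assumes coverage is measured as the share of executable lines run by the tests, which is one common definition but not the only one; the function names are hypothetical, and the 80 and 90 percent thresholds are the figures quoted above.

```python
def coverage_percent(covered_lines: int, total_lines: int) -> float:
    """Line coverage as a percentage of executable lines."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return 100.0 * covered_lines / total_lines


def meets_target(covered_lines: int, total_lines: int,
                 target_percent: float = 80.0) -> bool:
    """Check measured coverage against a target (80 percent by default)."""
    return coverage_percent(covered_lines, total_lines) >= target_percent


if __name__ == "__main__":
    # A project where tests execute 850 of 1,000 executable lines.
    print(coverage_percent(850, 1000))
    print(meets_target(850, 1000))                       # general 80% target
    print(meets_target(850, 1000, target_percent=90.0))  # stricter 90% target
```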

In addition, testing practices related to TDD typically handle testing only until the software is released. Testing in modern methodologies continues after release, especially system testing, which attempts to identify problems that only occur in a production environment with more users and data than the test environment. Teams often neglect to establish this type of testing, however. The ability to stage environments and simulate the behavior of real users is often the only way to find bugs that only occur after a system has been in continuous and heavy use for a prolonged period. Some developers go to the extreme of using tools that inject problems into a production system, solely for the purpose of verifying that it’s robust.
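The fault-injection idea can be sketched on a very small scale inside a single process. In the example below, a wrapper deliberately makes a dependency fail a fixed number of times so that the caller’s retry logic can be checked. All of the names here (FaultInjector, InjectedFault, call_with_retry) are hypothetical, and real tools of this kind operate on live infrastructure rather than on a Python callable.

```python
class InjectedFault(Exception):
    """Raised on purpose by the fault injector."""


class FaultInjector:
    """Wrap a callable so that its first `failures` calls raise."""

    def __init__(self, func, failures: int) -> None:
        self.func = func
        self.remaining = failures
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            raise InjectedFault("simulated outage")
        return self.func(*args, **kwargs)


def call_with_retry(func, attempts: int = 3):
    """Call `func` until it succeeds or `attempts` is exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as exc:
            last_error = exc
    raise last_error


if __name__ == "__main__":
    # The dependency fails twice, then recovers; three attempts are enough.
    flaky = FaultInjector(lambda: "ok", failures=2)
    print(call_with_retry(flaky, attempts=3), "after", flaky.calls, "calls")
```

Because the failures are injected on a fixed schedule rather than at random, the check is repeatable.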

Testing Matrix

The degree of confidence that testing should provide has a great influence on the actual shape of the testing model. Only E2E tests can truly validate an application’s usability, but extensive E2E testing is rarely worth the time needed to create, execute and maintain them. Gleb Bahmutov and Roman Sandler have proposed the Testing Matrix as a means of developing a testing strategy as illustrated in the diagram below:

Figure 5: Testing Matrix

The figure above indicates effort on the x axis and confidence on the y axis. The ideal location is in the upper left quadrant, but most development efforts begin in the lower left quadrant. Testers should create more tests as the project matures and developers add more features. Failure to maintain the test suite in this manner can push the project into the lower right quadrant, which is the least desirable.

Summary

The unique characteristics of each software project will determine the ideal shape for its testing model. The testing pyramid model has provided good service for over a decade by establishing a common language in software testing. However, advancements in testing practices and technology are causing testers to suggest new approaches like the testing trophy, especially for front end development.

About Baytech

Baytech is passionate about the technology we use to build custom business applications, especially enterprise solutions that optimize business processes. We’ve been delivering software solutions in a variety of technologies since 1997. Our success is due to the skill and efficiency of our senior staff, which includes software engineers, project managers, and DevOps experts. All of our engineers are onshore, salaried staff members.

We focus on the quality, usability, and scalability of our software and don’t believe in mitigating cost at the risk of quality. We manage project costs by implementing an efficient development process that’s completely transparent and uses the latest standards and practices to build software right the first time. Contact us today to learn more about how we can help your business. Find us online at https://www.baytechconsulting.com/contact.