Recently a client remarked that the time spent planning their testing was as productive at finding faults as the actual execution of test cases. Since I was in the midst of helping another client plan their testing strategy, this remark got my immediate attention. My second client was trying to determine exactly which of the many permutations of data values to test in an application that has a very complex GUI plus an extensible API. Both of these experiences reminded me that planning any activity is important, but in today's world of "Just do it!" there is a tendency to act rather than think.
The testing process can be divided into three phases [Hetzel]: planning, acquisition, and execution and evaluation. The planning phase provides an opportunity for the tester to determine what to test and how to test it. The acquisition phase is the time during which the required testing software is manufactured, data sets are defined and collected, and detailed test scripts are written. During the execution and evaluation phase, the test scripts are executed and the results of that execution are evaluated to determine whether the product passed the test.
In this month's column I will focus on the planning phase. I am going to briefly discuss test strategies and then focus on some techniques for determining which features or components within a program to test more intensely than others. I will also discuss a technique for incrementally defining a test suite so that the level of intensity is used to determine the number of test cases to be created.
The project test plan should describe the overall strategy that the project will follow for testing the final application and the products leading up to the completed application. Strategic decisions that may be influenced by the choice of development paradigms and process models include:
When to test. The test plan should show how the stages of the testing process, such as component, integration and acceptance, correspond to stages of the development process. For those of us who have adopted an iterative, incremental development strategy, incremental testing is a natural fit. Testing can begin as soon as some coherent unit is developed and continues on successively larger units until the complete application is tested. This approach provides for earlier detection of faults and feedback into development. For projects that do not schedule periodic deliveries of executable units, the "big bang" testing strategy, in which the first test is performed on the complete product, is necessary. This strategy often results in costly bottlenecks as small faults prevent the major system functionality from being exercised.
Who will test. The test plan should clearly assign responsibilities for the various stages of testing to project personnel. The independent tester brings a fresh perspective to how well the application meets the requirements. Using such a person for component testing involves a long learning curve, which may not be practical in a highly iterative environment. The developer brings a knowledge of the details of the program but also a bias concerning his/her own work. I favor involving developers in testing, as do many others [Beizer], but this only works if there are clear guidelines about what to test and how. I will discuss one type of guideline below.
What will be tested. The test plan should provide clear objectives for each stage in the testing process. The amount of testing at each stage will be determined by various factors. For example, the higher the priority of reuse in the project plan, the higher should be the priority of component testing in the testing strategy. Component testing is a major resource sink, but it can have tremendous impact on quality. I will devote a future column to techniques for minimizing the resources required for component testing and maximizing its benefits. For the remainder of this column I will discuss techniques for system testing, such as determining how many cases to build.
Writing the detailed test plans provides an opportunity for a detailed investigation of the requirements model. A test plan for a use case requires the identification of the underlying domain objects for each use case. Since an object will typically apply to more than one use case, this gives the opportunity to locate inconsistencies in the requirements model. Typical errors include conflicting defaults, inconsistent naming, incomplete domain definitions and unanticipated interactions. This is precisely the activity that my client found to be very helpful.
The individual test cases are constructed for a use case by identifying the domain objects that cooperate to provide the use and by identifying the equivalence classes<1> for each object. The equivalence classes for a domain object can be thought of as subsets of the states identified in the dynamic model of the object. Each test case represents one combination of values for each domain object in the use scenario.
As the use case test plan is written, an input data specification table captures the information required to construct the test cases. That information includes the class from which the domain object is instantiated, the state space of the class and significant states (boundary values) for the objects. As the tester writes additional test plans and encounters additional objects from the same class, the information from one test plan can be used to facilitate the completion of the current test plan. This leads to the administrative pattern: Assign responsibility for test plans for related use cases to one individual.
Creating use case-level test plans also facilitates the identification and investigation of interactions: situations in which one object affects another one, or one attribute of an object affects other attributes of the same object. Certainly many interactions are useful and necessary. That is how objects achieve their responsibilities. However, there are also undesirable or unintended interactions where an object's state is affected by another object in unanticipated ways. For example, two objects might share a component object because a pointer to the one object was inadvertently passed to the two encapsulating objects instead of a second new object being created and passed to one of them. A change made in one of the encapsulating objects is then seen by the other encapsulating object.
Even an intended interaction gone bad can cause trouble. For example, if an error prevents the editing of a field, then it is more probable that the same, or a related, error will prevent us from clearing that same field. This is due to the intentional use of a single component object to handle both responsibilities.
The brute force technique for searching for unanticipated interactions is to test all possible permutations of the equivalence classes entered in the input data specification table. If this proves to be too much information or requires too many resources for the information gained, the tester can use all possible permutations of successful execution but only include a single instance of error conditions and exceptional situations. These selection criteria represent successively less thorough coverage but also require fewer resources.
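The two selection criteria can be sketched in Python. This is only an illustration under stated assumptions: the object names, their equivalence classes and the split into "valid" and "error" classes are hypothetical, and "a single instance of error conditions" is read here as one test case per error class, placed into an otherwise valid baseline.

```python
from itertools import product

# Hypothetical input data specification (not from any real project):
# each object lists its valid and its error equivalence classes.
SPEC = {
    "name": {"valid": ["short", "max_length"], "error": ["blank", "too_long"]},
    "person": {"valid": ["new", "existing"], "error": []},
}


def all_permutations(spec):
    """Brute force: every combination of every equivalence class."""
    names = list(spec)
    pools = [spec[n]["valid"] + spec[n]["error"] for n in names]
    return [dict(zip(names, combo)) for combo in product(*pools)]


def valid_plus_single_errors(spec):
    """All combinations of valid classes, plus one case per error class."""
    names = list(spec)
    valid_pools = [spec[n]["valid"] for n in names]
    cases = [dict(zip(names, combo)) for combo in product(*valid_pools)]
    # Assumed baseline: the first valid class of every object.
    baseline = {n: spec[n]["valid"][0] for n in names}
    for n in names:
        for err in spec[n]["error"]:
            cases.append({**baseline, n: err})
    return cases
```

With this hypothetical specification the brute force suite has 8 cases and the reduced suite has 6. The saving grows quickly as more objects and error classes are added.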
Since the tester often does not have access to the code, the identification of interactions is partially a matter of intuition and inference. The resources required for the permutation approach can be reduced further by making assumptions about where interactions do not exist. That is, there is no need to consider different combinations of object values if the value of one object does not influence the value of another. So test cases are constructed to exercise permutations within a set of interacting objects but not to include other objects that we assume are independent of the first group. Obviously this opens the door to faults not being detected, but that is true of any strategy other than an all permutations strategy.
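A rough sketch of this reduction follows. It assumes the tester has already decided which objects interact and grouped them accordingly. The function name `grouped_cases`, the object names and the way the independent groups are laid side by side are all illustrative choices, not a prescribed method.

```python
from itertools import product


def grouped_cases(classes, groups):
    """Permute classes only within each group of interacting objects."""
    per_group = []
    for group in groups:
        pools = [classes[name] for name in group]
        per_group.append([dict(zip(group, combo)) for combo in product(*pools)])
    # Groups are ASSUMED independent, so the i-th suite case simply takes
    # the i-th case of each group, cycling through the shorter lists.
    size = max(len(cases) for cases in per_group)
    suite = []
    for i in range(size):
        case = {}
        for cases in per_group:
            case.update(cases[i % len(cases)])
        suite.append(case)
    return suite


# Hypothetical objects and equivalence classes.
classes = {
    "name": ["n1", "n2", "n3"],
    "person": ["p1", "p2"],
    "authorization": ["a1", "a2"],
    "printer": ["r1", "r2"],
}
suite = grouped_cases(classes, [("name", "person"), ("authorization", "printer")])
```

Here the suite needs 6 cases instead of the 24 required to permute all four objects together, at the cost of missing any fault that depends on an interaction across the two groups.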
Testing can be a resource intensive activity. The tester may need to reserve special hardware or he/she may have to construct large, complex data sets. The tester always will have to spend large amounts of time verifying that the expected results section of each test case actually corresponds to the correct behavior. In this section I want to present two techniques for determining which parts of the product should be tested more intensely than other parts. This information will be used to reduce the amount of effort expended while only marginally affecting the quality of the resulting product.
One technique for allocating testing resources uses use profiles as the basis for determining which parts of the application will be utilized the most and then tests those parts the most. The principle here is: test the most-used parts of the program over a wider range of inputs than the lesser-used portions to ensure the greatest user satisfaction.
A use profile is simply a frequency graph that illustrates the number of times that an end user function is used, or is anticipated to be used, in the actual operation of the program. The profile can be constructed in a couple of ways. First, data can be collected from actual use such as during usability testing. This results in a raw count profile. Second, a profile can be constructed by reasoning about the meanings and responsibilities of the system interface. The result is a relative ordering of the end user functions rather than a precise frequency count.
For example, the EXIT function for the system will be successfully completed exactly once per invocation of the program, but the SAVE function may be used numerous times. It is conceivable that the Create FOOTNOTE function might not be used at all during a use of the system. This results in a profile that indicates an ordering of SAVE, EXIT and Create FOOTNOTE. The SAVE function will be tested over a much wider range of inputs than the Create FOOTNOTE function.
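A raw count profile of this kind can be sketched in a few lines of Python. The sketch assumes usage data is available as a plain list of invoked function names. The helper `raw_count_profile` and the sample session are hypothetical.

```python
from collections import Counter


def raw_count_profile(functions, observed):
    """Return (function, count) pairs, most frequently used first.

    Functions that were never observed are kept with a count of zero
    so that they still appear at the bottom of the profile.
    """
    counts = Counter(observed)
    pairs = [(f, counts[f]) for f in functions]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical log from one session of the program.
session = ["SAVE", "SAVE", "EXIT", "SAVE"]
profile = raw_count_profile(["EXIT", "SAVE", "Create FOOTNOTE"], session)
```

For this made-up session the profile orders the functions as SAVE, EXIT, Create FOOTNOTE, matching the ordering reasoned out above.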
A second approach to use profiling is to rate each use case on a scale. In projects mentored by Software Architects, we use a form of a use case that includes fields that record an estimate of how frequently the use will be activated and how critical the use described in the use scenario is to the operation of the system. An estimate is also made of the relative complexity of each use case. Table 1 shows an example from a recent project. During the initial iterations, the team responsible for the use case model uses domain and application information to complete these fields.
Table 1 - Example Use Case
Use Case # 001
Use Scenario: The user selects the SAVE option from the menu. The system responds by saving the current file using the current file name, if it exists. If it does not exist, a dialog box requests a new filename and a new file is created.
Frequency: High
Criticality: High
The frequency field can be used to support the first approach to ordering the use cases. The criticality field can also be used to order the use cases. However, neither of these attributes is really adequate by itself. For example, we might paint a logo in the lower right hand corner of each window. This would be a relatively frequent event, but should it fail, the system will still be able to provide the important functionality to the user. Likewise, attaching to the local database server would happen very seldom but its success is critical to the success of certain other functions.
What is required is a technique for combining these two attributes of the use cases. Table 2 illustrates this using a matrix in which the vertical axis is the criticality rating scale and the horizontal is the frequency rating scale. This results in cells that represent categories such as highly critical and very frequent or minimally critical and seldom used. These two classifications are combined into a single scale that ranges from low in the upper left hand corner to high in the lower right corner. In the next section I will illustrate a couple of strategies for mapping the attribute values into the rating scale.
Table 2 - Combining Use Case Rankings
Criticality \ Frequency minimally critical moderately critical highly critical
The amount of resources and the size of the project will help determine the granularity of the scale. The objective is to identify a set of uses, in the top one or two categories of rating scale, whose combined size is manageable within the available resources. The scale may simply be low, medium and high. If too many cases are categorized as high, we might add a very high category to further discriminate among the cases.
Once the target group of use cases is identified, these heavily used cases can then be tested over the largest possible range of input values. Below I will present one scheme for varying the number of test cases systematically.
A second technique for allocating testing resources is based on the concept of risk. A risk is anything that threatens the successful achievement of the project's goals. The principle here is: test most heavily those portions of the system that pose the highest risk to the project to ensure that the most harmful faults are identified.
Risks are divided into three types: business, technical and project risks. Project risks are largely managerial and environmental risks, such as an insufficient supply of qualified personnel, that do not directly affect the testing process. I will show how both business and technical risks are applicable to testing software systems.
Business risks correspond to domain-related concepts. For example, changes in IRS reporting regulations would be a risk for an accounting system because the system's functionality must be altered to conform to the new regulations. This type of risk is related to the functionality of the program and therefore to system-level testing.
Technical risks include some implementation concepts. For example, the quality of code generated by the compiler is a technical risk. This type of risk is related to the implementation of the program and hence to the component level testing process.
The output from the risk analysis process is a prioritized list of risks to the project. This list must be translated into an ordering of the use cases. The ordering in turn is used to determine the amount of testing applied to each use case. For the purposes of system testing, I will only consider those business risks that address the domain within which the application is located.
An individual use case can be assigned a risk rating by the tester by considering how the risks identified at the project level apply to the specific use case. For example, those requirements that are rated most likely to change are high risks, those requirements that are outside the expertise of the development team are even higher risks, and those requirements that rely on new technology such as hardware being developed in parallel to the software are high risks as well. In fact it is usually harder to find low risk use cases than high risk. The exact classification scheme can vary from one project to another. It should have sufficient levels to separate the use cases into reasonable groupings but not have so many categories that some categories have no members.
The criticality value combined with the risk associated with the use case can produce a classification that identifies those use cases which describe behavior that is critical to the success of the system but that is also most vulnerable to the risks faced by the project. A highly critical use that has a high risk should obviously receive a high rating for the number of test cases to be generated while a non-critical use with low risk should receive a low rating.
There are several strategies possible for combining the risk and criticality values when the result is not so obvious. An averaging strategy would assign a medium rating to a low risk yet highly critical use while a conservative strategy would assign a high rating to that same use. The choice of strategy is not important to our discussion and is domain and application specific.
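The two strategies can be sketched as follows, assuming a three-level scale of low, medium and high. The rounding rule in the averaging strategy (a tie rounds up) is an assumption made for this illustration, since the column does not specify one.

```python
# Assumed three-level rating scale, ordered from lowest to highest.
LEVELS = ["low", "medium", "high"]


def averaging(risk, criticality):
    """Average the two ratings. Ties between levels round up (assumption)."""
    total = LEVELS.index(risk) + LEVELS.index(criticality)
    return LEVELS[(total + 1) // 2]


def conservative(risk, criticality):
    """Take the higher of the two ratings."""
    return LEVELS[max(LEVELS.index(risk), LEVELS.index(criticality))]
```

For a low risk yet highly critical use, `averaging("low", "high")` gives a medium rating while `conservative("low", "high")` gives a high rating, as described above.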
The combined rating for the use case is used to determine which combinations of values for domain objects are used in test cases. A low rating would indicate that each equivalence class for each object should be represented in some test case. A high rating indicates that each equivalence class for each object should be used in combination with every equivalence class from the other objects (all permutations).
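As a sketch of how the rating might drive test case construction, the following Python maps the two ratings described above onto two selection functions. The function names and the sample equivalence classes are hypothetical. Intermediate ratings are deliberately left out because their treatment is project-specific.

```python
from itertools import product


def each_class_once(classes):
    """Smallest suite in which every equivalence class appears in some case."""
    names = list(classes)
    size = max(len(classes[n]) for n in names)
    # Cycle through the shorter lists so every case has a value per object.
    return [{n: classes[n][i % len(classes[n])] for n in names} for i in range(size)]


def all_combinations(classes):
    """Every equivalence class combined with every class of the other objects."""
    names = list(classes)
    pools = [classes[n] for n in names]
    return [dict(zip(names, combo)) for combo in product(*pools)]


def cases_for_rating(rating, classes):
    """Select test cases for a use case from its combined rating."""
    if rating == "low":
        return each_class_once(classes)
    if rating == "high":
        return all_combinations(classes)
    raise ValueError(f"no selection rule defined here for rating {rating!r}")


# Hypothetical equivalence classes for three domain objects.
classes = {"name": ["n1", "n2", "n3"], "person": ["p1", "p2"], "authorization": ["a1", "a2"]}
```

With these made-up classes a low rating yields 3 test cases and a high rating yields 12.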
Let's consider an example. In the interest of space I will summarize a set of use cases in Table 3. An analysis of the use cases identifies the following domain objects: Name, Person and Authorization.
Table 3 - Three Use Cases
Use | Risk | Frequency | Criticality | Scenario
Edit Name | Low | Medium | Low | The user modifies the name field of an existing personnel record to which they have the appropriate security authorization.
Save Record | Medium | High | High | The user saves a record that has been newly created or modified.
Delete Record | High | Medium | Medium | The user deletes an existing record for which they have the appropriate authorization.
The equivalence classes for each object are described in Table 4.
Table 4: Input Value Specifications
Variable Name | Object Type | Equivalence Classes
Name | String | 1. name that exceeds the maximum length of the string
Person | Personnel | 1. newly created
Authorization | Security Code | 1. authorized for local access only
Table 5: Input Data Permutations
Name | Person | Authorization
complete name with remaining space | pre-existing | authorized for local access only
complete name with remaining space | newly created | authorized for local access only
Considering the number of equivalence classes for each object type, there are 16 possible permutations of these classes. Table 5 shows only a small subset of the possible permutations. Based on a conservative ranking strategy, the Save Record use case would have a rating of High and would be tested using all 16 permutations. The Delete Record use case would have a rating of Medium and would only be tested using 8 permutations. This would exclude permutations that include error conditions such as entering a blank name. The Edit Name use case would be tested with only 4 permutations because it has a Low rating and because there is little chance of interactions between this use case and the other two.
Good planning can find faults before live testing begins. By examining relationships among the objects required by various use cases, the requirements can be checked for consistency and completeness. This finds faults much more cheaply than creating test cases based on faulty requirements.
Good planning can optimize scarce resources. There is much information that can be used to make testing as productive as possible. By considering the frequency of use of program features and the risk associated with each use, the tester can make more intelligent decisions about where to emphasize testing. Considering equivalence classes of objects allows the tester to achieve better coverage without increasing the number of test cases.
Beizer, Boris. Software Testing Techniques, Second Edition, Boston, Mass.: International Thomson Computer Press, 1990.
Hetzel, William C. The Complete Guide to Software Testing, Wellesley, Mass.: QED Information Sciences, 1984.
<1> An equivalence class for an object is the set of values for which the system response should be the same.