Statistical Analysis: Evaluating Single-Subject Designs

The purpose of using behavior modification and applied behavior analysis techniques in the classroom is to achieve, and verify, meaningful changes in a student's behavior. The effectiveness of an intervention is commonly judged against both an experimental criterion and a therapeutic criterion. The experimental criterion verifies that an independent variable (an intervention) was responsible for the change in the dependent variable (a behavior). Single-subject designs demonstrating within-subject replications of effect satisfy this criterion (Baer, Wolf, & Risley, 1968; Barlow & Hersen, 1984; Kazdin, 1977).

The therapeutic criterion is a judgment as to whether the results of the teacher's intervention are "important or of clinical or applied significance" (Kazdin, 1982, p. 47). For example, the teacher should ask herself whether it is truly meaningful to increase a student's grade from a D- to a D (Baer, Wolf, & Risley, 1968), to decrease a child's self-destructive behavior from 100 to 60 instances per hour (Kazdin, 1977), or to reduce a student's off-task behavior in a resource room while it remains high in the regular classroom. Kazdin (1977) suggests a third evaluation criterion: social validation, the "social acceptability of an intervention program" (p. 430). Social validation was discussed at length in another chapter.

Intervention effects in applied behavior analysis are usually evaluated through inspection of the graph displaying the plotted data points of the various phases (conditions). Interpretation of data based on visual inspection sounds unrefined and is certainly subjective, so some may view it as a weak form of evaluation. Evaluation resulting from visual inspection, however, reveals only strong intervention effects, and strong effects are what teachers are seeking.

The grossness and subjectivity of visual inspection are somewhat modified by common agreement that certain characteristics of the graphed data should be evaluated. These characteristics include the means of the data in the phases, the levels of performance in the phases, the trends in performance, and the rapidity of behavior change (Kazdin, 1982).

Evaluation of changes in means focuses on the change in the average rate of student performance across the phases of a design. Within each phase, the mean (average) of the data points is determined and may be indicated on the graph by drawing a horizontal, dashed line corresponding to the value on the ordinate scale. Visual inspection of the relationship of these means will help determine if the intervention resulted in consistent and meaningful changes in the behavior in the desired direction of change. In Figure 5-29, Fox and Shapiro (1978) have supplied such indicators of means. The viewer can easily see the relative position of the students' disruptive behavior across the various design phases.
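The computation behind these mean lines is simple enough to sketch in a few lines of Python. The phase labels and data values below are hypothetical, not Fox and Shapiro's:

```python
# Hypothetical counts of disruptive behavior, grouped by design phase.
phases = {
    "baseline": [8, 9, 7, 8, 9],
    "intervention": [5, 4, 4, 3, 3],
    "return to baseline": [7, 8, 8, 9],
}

# The mean of each phase is the ordinate value at which the
# horizontal dashed mean line would be drawn on the graph.
for name, points in phases.items():
    mean = sum(points) / len(points)
    print(f"{name}: mean = {mean:.2f}")
```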

Evaluation of the level of performance refers to the increase or decrease in student performance from the end of one phase to the beginning of the next phase. The teacher wants to evaluate the magnitude and direction of this change. "When a large change in level occurs immediately after the introduction of a new condition, the level change is considered abrupt, which is indicative of a powerful or effective intervention" (Tawney & Gast, 1984, p. 162). Tawney and Gast suggest the following steps to determine and evaluate a level change between two adjacent conditions: (1) identify the ordinate value of the last data point of the first condition and the first data point value of the second condition, (2) subtract the smallest value from the largest, and (3) note whether the change in level is in an improving or decaying direction (p. 162). In Figure 5-29, the arrows have been added to indicate level changes.
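Tawney and Gast's three steps translate directly into a small computation. The following sketch uses invented data, and the improving_direction parameter is a convention assumed here for deciding step 3:

```python
def level_change(first_phase, second_phase, improving_direction="decrease"):
    """Apply Tawney and Gast's (1984) three steps to two adjacent phases.

    first_phase / second_phase are lists of ordinate values;
    improving_direction names the desired direction of behavior
    change ("increase" or "decrease").
    """
    # Step 1: last data point of the first condition and
    # first data point of the second condition.
    last_of_first = first_phase[-1]
    first_of_second = second_phase[0]

    # Step 2: subtract the smaller value from the larger.
    magnitude = abs(first_of_second - last_of_first)

    # Step 3: note whether the change is improving or decaying.
    went_down = first_of_second < last_of_first
    improving = went_down == (improving_direction == "decrease")
    return magnitude, "improving" if improving else "decaying"

# Hypothetical disruptive-behavior counts: the drop from 9 to 4 at
# the phase change is an abrupt, improving change in level.
print(level_change([8, 9, 7, 9], [4, 3, 3, 2]))  # (5, 'improving')
```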

Evaluation of a trend in performance focuses on systematic and consistent increases or decreases in performance. Data trends are most often evaluated using a procedure known as the quarter-intersect method (White & Liberty, 1976). Evaluation of trends is based on lines of progress developed from the median value of the data points in each phase. The use of a trend line increases the reliability of visual analysis among people looking at a graph (Bailey, 1984; Ottenbacher & Cusick, 1991). This is of particular importance as teams of teachers, students, parents, and other concerned individuals review student data to assess progress and make decisions about future instruction or intervention. Steps for computing lines of progress are illustrated in Figure 5-30. Trend lines can provide (1) an indication of the direction of behavior change in the past and (2) a prediction of the direction of behavior change in the future. This information can help the teacher determine whether to change the intervention.
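As one way to make the procedure concrete, here is a sketch of the quarter-intersect computation. It assumes sessions are numbered consecutively from 1, adopts one common convention for odd-length phases (omitting the middle data point from both halves), and uses invented data:

```python
import statistics

def quarter_intersect(values):
    """Quarter-intersect line of progress (White & Liberty, 1976).

    Splits the phase data in half, finds the mid-session and median
    value of each half, and returns the slope and intercept of the
    line through those two points.
    """
    n = len(values)
    first, second = values[: n // 2], values[(n + 1) // 2 :]

    # The x coordinate of each half is its middle session number.
    x1 = (1 + n // 2) / 2
    x2 = ((n + 1) // 2 + 1 + n) / 2
    y1 = statistics.median(first)
    y2 = statistics.median(second)

    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

# Hypothetical data with a rising trend of roughly +1 per session.
slope, intercept = quarter_intersect([3, 4, 4, 6, 5, 7, 8, 8])
print(f"trend: {slope:+.2f} per session")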

Taking this process one step further will yield a split-middle line of progress (White & Haring, 1980). This line of progress is drawn so that an equal number of data points fall on and above the line as fall on and below the line. As illustrated in Figure 5-31, if the data points do not naturally fall in such a pattern, the line is redrawn higher or lower, parallel to the original line, until the balance of data points is equal.
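The redrawing step can be expressed as shifting the line's intercept by the median of the vertical distances between the data points and the line, which leaves as many points on or above the line as on or below it. A minimal sketch, reusing the hypothetical quarter_intersect helper above:

```python
import statistics

def split_middle(values, slope, intercept):
    """Shift a line of progress to the split-middle position
    (White & Haring, 1980): same slope, new intercept, so that an
    equal number of data points fall on or above the line as fall
    on or below it. Adding the median residual to the intercept
    accomplishes this.
    """
    residuals = [
        y - (slope * x + intercept)
        for x, y in enumerate(values, start=1)
    ]
    return slope, intercept + statistics.median(residuals)

# Usage with the earlier sketch:
# slope, intercept = split_middle(data, *quarter_intersect(data))
```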

A fourth characteristic that may be evaluated through visual inspection is the rapidity of the behavior change. This refers to the length of time between the onset or termination of one phase and changes in performance. The sooner the change occurs after the intervention has been applied (or withdrawn), the clearer the intervention effect. It should be noted that "rapidity of change is a difficult notion to specify because it is a joint function of changes in level and slope (trend).... A marked change in level and in slope usually reflects a rapid change" (Kazdin, 1982, p. 316).

Although visual inspection is useful, convenient, and basically reliable for identifying or verifying strong intervention effects, much of the current published research using single-subject designs has accompanied visual inspection with statistical verification of effects (for example, t test, F test, R test, time series analysis). Kazdin (1976) offers three reasons to support the use of statistical techniques (a sketch of one such test follows the list):

  • To assist in distinguishing subtle effects from chance occurrence
  • To analyze the effects of a treatment procedure when a stable baseline cannot be established
  • To assess the treatment effects in environments that lack control
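For illustration only, a simple two-sample t test comparing baseline and intervention observations might look like the following sketch; the data are invented, and scipy is assumed to be available. Note that an ordinary t test assumes independent observations, while successive data points in single-subject designs are often serially dependent, which is one reason time series analyses are also used:

```python
from scipy import stats

# Hypothetical baseline and intervention observations.
baseline = [12, 14, 13, 15, 14, 13]
intervention = [9, 8, 8, 7, 6, 7]

# Two-sample t test comparing the phase means.
t, p = stats.ttest_ind(baseline, intervention)
print(f"t = {t:.2f}, p = {p:.4f}")
```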

More information about advanced uses of visual inspection and especially about statistical evaluation in single-subject designs can be found in Barlow and Hersen (1984), Kazdin (1982), and Tawney and Gast (1984).

Replication

A frequently asked question regarding the use of single-subject designs for research is whether the results can be generalized. If a study shows that a procedure is effective with a single subject, does this mean that it will be effective with others? Applied behavior analysts do not assume generalizability of research results based on a single successful intervention. Instead, they depend on replication, the repeated application of the same intervention with different subjects. That systematic teacher praise increases one student's rate of doing math problems may not be a convincing argument for the use of praise. However, documentation that such praise has increased production of not only math problems but also many other academic and social behaviors with dozens of students at many different ages is convincing. By replication, applied behavior analysts gradually identify procedures and techniques effective with many students. These procedures and techniques can then be adopted by others with considerable confidence that they will work.
