40 Years of Rigorous Impact Evaluation: Then & Now

In honor of our 40th anniversary, Metis looks at how rigorous impact evaluation has changed over the past 40 years.

President and CEO Stan Schneider with Dr. Jing Zhu at a Metis reception.

In October of 1977 (just two months after Metis’s incorporation), the Joint Dissemination Review Panel (funded by the U.S. Department of Health, Education and Welfare, the National Institute of Education, and the U.S. Office of Education) published the Ideabook (G. Kasten Tallmadge, RMC Research Corporation, Mountain View, California). The Ideabook was prepared to give practitioners guidance on gathering “convincing” evidence of the effectiveness of educational innovations, many of which were supported by Title I of the Elementary and Secondary Education Act (ESEA). Clearly, what passed for “convincing” in those days would today fall far short of the rigorous standards promulgated by the What Works Clearinghouse (WWC), an initiative of the U.S. Department of Education’s Institute of Education Sciences.

Forty years ago, norm-referenced evaluations of Title I interventions relied almost exclusively on pre-intervention and post-intervention comparisons of treated children’s percentile ranks. For example, if a treated group’s average reading pretest performance corresponded to the 20th percentile and the same group’s average posttest corresponded to the 30th percentile, the group was considered to have made a 10-point percentile-rank improvement attributable to the intervention. Assuming that the actual calculations used equal-interval scales (e.g., normal curve equivalents), the mean differences were then tested for statistical significance. If a significant mean difference exceeded a particular threshold (a third of a standard deviation was the most commonly used “rule of thumb”), the intervention was considered educationally meaningful.
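To make the arithmetic behind that example concrete, here is a minimal Python sketch of the pre/post comparison on an equal-interval scale. It assumes the standard normal curve equivalent (NCE) definition, NCE = 50 + 21.06 × z, and treats the 20th and 30th percentiles from the example above as if they were single scores. The function names are hypothetical, and the significance test is left out; only the scale conversion and the one-third standard deviation rule of thumb are shown.

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06  # scale factor that lines NCEs up with percentiles at 1, 50, and 99


def percentile_to_nce(percentile_rank):
    """Convert a percentile rank (0-100, exclusive) to a normal curve equivalent."""
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return NCE_MEAN + NCE_SD * z


def gain_in_sd_units(pre_percentile, post_percentile):
    """Pre/post gain expressed in standard deviation units of the NCE scale."""
    gain = percentile_to_nce(post_percentile) - percentile_to_nce(pre_percentile)
    return gain / NCE_SD


# Illustrative values taken from the example in the text.
pre, post = 20, 30
nce_gain = percentile_to_nce(post) - percentile_to_nce(pre)
sd_gain = gain_in_sd_units(pre, post)

# Rule of thumb from the text: a third of a standard deviation.
meets_rule_of_thumb = sd_gain > 1 / 3

print(f"NCE gain: {nce_gain:.2f}")
print(f"Gain in SD units: {sd_gain:.3f}")
print(f"Exceeds one-third SD: {meets_rule_of_thumb}")
```

Under these assumptions, a gain that looks like 10 percentile ranks is a smaller number of NCE points, because percentile ranks are not an equal-interval scale. That is why the calculations had to be carried out on a scale such as NCEs before mean differences could be tested.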

Current standards of evidence hinge on more rigorous methods for estimating what would have happened in the absence of treatment. Accordingly, most evidence review bodies (such as the WWC) focus on whether treatment effects are measured against an adequate comparison group, and in particular on whether the comparison group looked just like the “treated” group before the intervention. The highest evidence standards are reserved for designs that use chance alone to decide which study participants are treated and which are not, often referred to as randomized controlled trials (RCTs). However, other evaluation designs can meet evidence standards, albeit with some reservations, primarily because they cannot control for unobserved factors such as motivation, which leaves them open to self-selection bias. These other evaluation designs (as well as poorly implemented RCTs and RCTs with high attrition) require a demonstration of baseline equivalence: that the treatment and comparison groups were demonstrably similar on observable characteristics prior to intervention.
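A baseline equivalence check of this kind can be sketched in a few lines of Python. The example below computes a standardized mean difference (Hedges' g, with the usual small-sample correction) between treatment and comparison groups on a pretest. The 0.05 and 0.25 cutoffs are an assumption based on thresholds commonly cited from the WWC standards and should be checked against the current handbook. The pretest scores and function names are hypothetical and purely illustrative.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(treatment, comparison):
    """Standardized mean difference with a small-sample correction."""
    n_t, n_c = len(treatment), len(comparison)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(comparison)
    ) / (n_t + n_c - 2)
    d = (mean(treatment) - mean(comparison)) / sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return d * correction


def baseline_status(g, satisfied=0.05, adjustable=0.25):
    """Classify a baseline difference; the cutoffs are assumed, not from the text."""
    size = abs(g)
    if size <= satisfied:
        return "equivalent"
    if size <= adjustable:
        return "equivalent only with statistical adjustment"
    return "not equivalent"


# Hypothetical pretest scores for two small groups.
treatment_pretest = [31, 35, 38, 40, 42, 45, 47, 50]
comparison_pretest = [30, 36, 37, 41, 43, 44, 48, 52]

g = hedges_g(treatment_pretest, comparison_pretest)
print(f"Baseline difference (Hedges' g): {g:.3f}")
print(f"Status: {baseline_status(g)}")
```

A check like this covers only observable characteristics. It says nothing about unobserved factors such as motivation, which is why designs that rely on it meet evidence standards only with reservations.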

These more rigorous definitions of evidence have shifted emphasis (and financial support) toward evidence-based practices that meet high standards of evidence in support of social services reform efforts, especially when such practices are implemented with a high degree of fidelity. And why not? If they are available and situationally appropriate, “proven” practices should always be preferred over unproven ones, and the body of evidence that distinguishes them from the unproven variety must be as rigorous as possible.

Directed by Dr. Jing Zhu, one of only 300 certified WWC reviewers nationally, Metis currently works with diverse clients to provide rigorous evaluation and support services, helping them plan and implement carefully controlled research designs that can establish an evidence base for their initiatives.