To make an informed decision about how best to design your evaluation, it is important to understand a few things:
- The types of evaluations there are to choose from and what they entail
- The information you will need to answer your questions of interest
- Different data collection methods and what they involve
Designing an evaluation means finding the best way to obtain the information needed to answer the evaluation questions. It involves developing a plan that specifies what information will be collected, from what sources, and how and when it will be collected.
Broadly speaking, there are three "types" of evaluations:
- Implementation Evaluation
- Outcomes Monitoring
- Impact Evaluation
The following pages describe each of these types of evaluation in greater detail, using the logic model framework to illustrate the kinds of questions that can be addressed by each type.
Design Options: Implementation Evaluation
Implementation Evaluation examines:
- What is actually being implemented
- The way in which it is being implemented
- Staff and client reactions to the program
- Whether participants are learning program content
Implementation evaluation serves many purposes:
- To document what you did (to facilitate replication and, if coupled with an impact evaluation, to help interpret impacts)
- To provide information for on-going program improvements
- To report progress to stakeholders (including funders)
- To provide a context for interpreting impact findings (if coupled with an impact evaluation)
Using the logic model framework and terminology, an Implementation Evaluation addresses the following components:
- What you are doing ("interventions, activities")
- How you are doing it ("interventions, activities")
- Why you are doing it ("program goals" and "underlying program assumptions")
- What it takes to do what you are doing ("inputs")
- What your program is producing ("outputs")
- Whom you are targeting, recruiting, enrolling, and serving (participation, an "output")
- What participants are learning, their reactions to and satisfaction with the program, and any early behavior change ("immediate outcomes")
- The context in which the program is operating
Design Options: Outcomes Monitoring
Outcomes Monitoring involves assessing:
- Participant outcomes directly related to program content (that is, knowledge, skills, attitudes, and behavior)
- Participant reactions to and satisfaction with the program
The purpose of outcomes monitoring is to assess whether participants appear to be learning (and retaining) program content, which can help providers decide whether adjustments or improvements are needed in program content, dosage, or service delivery. Outcomes monitoring is another aspect of performance monitoring.
The outcomes selected for monitoring should directly and narrowly reflect what the program did, gave, or taught (that is, what it should be held accountable for), not the broader societal goals sought.
Outcomes monitoring is most useful when coupled with a good implementation evaluation; it can provide clues as to why you are (or are not) achieving expected outcomes. For this reason, outcomes monitoring can be considered a key component of an implementation evaluation.
Using the logic model framework and terminology, outcomes monitoring involves examining immediate outcomes upon program completion.
You may also want to assess baseline levels of immediate outcomes (i.e., prior to program entry) to allow you to calculate changes in knowledge, skills, attitudes, intentions, and behaviors targeted by the program.
In addition, you may want to monitor certain subsequent outcomes, such as retention of knowledge and use of skills taught in the program, as this can also provide valuable information for program improvement.
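For example, if you collect scores on a monitored outcome at program entry, at completion, and at a later follow-up, change scores can be computed in a few lines of code. The sketch below is illustrative only: the column names, scores, and follow-up interval are hypothetical placeholders for your own survey data, and it assumes the pandas library is available.

```python
# Minimal sketch: computing pre/post change scores for a monitored outcome.
# All column names and values below are hypothetical placeholders.
import pandas as pd

# Hypothetical participant scores on an immediate outcome (e.g., a knowledge scale)
scores = pd.DataFrame({
    "participant_id": [1, 2, 3, 4],
    "baseline_score": [12, 15, 9, 14],    # at program entry
    "exit_score": [18, 17, 13, 16],       # at program completion
    "followup_score": [17, 16, 11, 15],   # at a later follow-up (retention)
})

# Change from entry to completion, and how much is retained at follow-up
scores["change_at_exit"] = scores["exit_score"] - scores["baseline_score"]
scores["change_at_followup"] = scores["followup_score"] - scores["baseline_score"]

print(scores[["participant_id", "change_at_exit", "change_at_followup"]])
print("Average gain at program exit:", scores["change_at_exit"].mean())
```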
Design Options: Impact Evaluation
Impact Evaluations assess program effectiveness: whether a program "works." An impact evaluation is most useful when coupled with a good implementation evaluation. An impact evaluation provides rigorous evidence regarding what changes resulted from the program, and an accompanying implementation evaluation provides clues as to why or how these impacts came about.
Using the logic model framework and terminology, an impact evaluation involves:
- Examining what participants are learning in, and their reactions to, the program (immediate outcomes)
- Examining whether this learning is sustained and translates into behavior change (subsequent outcomes)
- Comparing immediate and subsequent outcomes of program participants to those in a comparison group of "similar" people who have not been exposed to your program
- Using statistical techniques to rule out other possible explanations for any observed changes and/or differences in outcomes between the program and comparison groups
There are two approaches to constructing a comparison group:
- Experimental design
- Quasi-experimental design
1. Impact Evaluation: Experimental Methods
In an "experimental design," subjects are randomly assigned either to the program or to a control group that does not receive the program. Random assignment assures that the two groups are identical, on average, in every way prior to enrolling the program group into the intervention. Thus, the only difference between the two groups is the program group's exposure to the program. If the difference in outcomes between the program and control groups is statistically significant-and there are adequate retention rates and no systematic attrition-then you can confidently conclude whether or not the program caused any observed differences between the groups. (For more information on attrition, see Stage 5, Section III – What is Attrition and How Does it Affect Interpretation of Findings?)
2. Impact Evaluation: Quasi-Experimental Methods
In a "quasi-experimental design," a comparison group is identified or constructed from a sample of people who are "similar" to the program group. If the difference in outcomes between the program and comparison groups is statistically significant, you can cautiously infer that the program created these outcomes.
Drawing causal conclusions from a quasi-experimental design can be tricky: it hinges on how similar the comparison and program groups are. If the program and comparison groups differ in important ways (e.g., race/ethnicity, scores on immediate outcomes at baseline), you can be less sure that the outcomes observed were actually caused by the program.
You will want to discuss this with your evaluator. Ask him or her:
- What s/he's doing to make sure that the two groups are as identical as possible at the outset
- How s/he's going to adjust for any remaining differences in the statistical analyses (one common approach is sketched after this list)
- How s/he's going to account for any remaining differences when reporting the findings
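One common way evaluators account for remaining differences is to adjust for observed baseline characteristics in a regression model. The sketch below is a simplified illustration using hypothetical, simulated data and variable names; it is not a substitute for the analysis plan your evaluator develops, and it can only adjust for differences that are actually measured.

```python
# Minimal sketch: adjusting for baseline differences between program and
# comparison groups with a regression model. Data and names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(seed=0)
n = 300

# Hypothetical data: program and comparison members differ at baseline
df = pd.DataFrame({
    "in_program": rng.integers(0, 2, size=n),       # 1 = program, 0 = comparison
    "baseline_score": rng.normal(50, 10, size=n),   # immediate outcome at entry
    "age": rng.integers(18, 60, size=n),
})
df["followup_score"] = (
    0.6 * df["baseline_score"] + 0.1 * df["age"]
    + 3.0 * df["in_program"]                        # simulated program effect
    + rng.normal(0, 5, size=n)
)

# Regress the follow-up outcome on program status, controlling for
# observed baseline characteristics
model = smf.ols("followup_score ~ in_program + baseline_score + age", data=df).fit()
print(model.summary())
```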
Cautions
As noted above, an impact evaluation is most useful when coupled with a good implementation evaluation: the impact evaluation tells you what changes resulted from the program, and the implementation evaluation provides clues as to why or how these impacts came about.
An impact evaluation is not appropriate for programs in the pilot phase, as these programs are still getting up and running and, generally, still figuring out how to deliver marriage/relationship education. Programs in the pilot phase should focus on a strong implementation evaluation examining major start-up activities (often called a formative evaluation).
Even for mature programs, there are circumstances under which an impact evaluation is not advisable. If you are a mature program, talk to your evaluator about whether the following constraints apply to you:
- Is your program too small?
- Are you or your stakeholders opposed to denying the program to the comparison/control group?
- Will enough of your participants get enough of the program to give it a fair test?
- Are conditions right for constructing a strong comparison group?
- Do you have the budget, and is your evaluator equipped, to conduct a rigorous impact evaluation?
Remember: You will learn more from a strong implementation evaluation with outcomes monitoring than a weak impact evaluation!
Other Resources
- The 2002 User-Friendly Handbook for Project Evaluation, National Science Foundation
- Developing Process Evaluation Questions, Centers for Disease Control and Prevention
- Harrell, A., Burt, M., Hatry, H., Rossman, S., Roth, J., & Sabol, W. (n.d.). Evaluation Strategies for Human Services Programs. Washington, DC: The Urban Institute.
- Evaluation Resource Guide for Responsible Fatherhood Program, Office of Family Assistance
- Data Collection Methods, the Compassion Capital Fund National Resource Center
- Checklist for Data Collection Design, the Compassion Capital Fund National Resource Center
- U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research & Evaluation. The Program Manager's Guide to Evaluation Glossary.