Learning What Works
A new push for evidence on the impacts of development programs is prompting careful assessment of what works and under which conditions, potentially resulting in more effective interventions for poor people.
In recent years, many development program coordinators have been asked to provide something they have rarely needed to offer in the past: empirical evidence that their programs work.
Although some fields—especially medicine—have long embraced the habit of evaluating effectiveness, rigorous impact evaluation is relatively new in international development. When the International Initiative for Impact Evaluation (3ie) got underway in 2009, Executive Director Howard White said, “Each year billions of dollars are spent on development programs with scant evidence on whether those programs are having any impact at all on poverty, child mortality, getting girls to school, and so on. Policymakers simply don’t know the best use of their resources.”
If evaluating impact—that is, attributing a particular result to a particular program or policy—were easy, it probably would have been commonplace long ago. In fact, conducting high-quality evaluations faces a host of political, financial, and technical obstacles. Nonetheless, aid donors and governments have shown new interest in acquiring solid evidence of the effectiveness of development programs. “Impact evaluation has received more attention in the past decade since the Millennium Development Goals were set,” says Mywish Maredia, an associate professor at Michigan State University. “There has been a push for evidence-based policy.” This new focus has resulted in a number of careful evaluations that have shown definitively whether programs were working and how they could be improved.
Working in the dark
In 2003, the Center for Global Development convened the Evaluation Gap Working Group—a group of economists and evaluation methodologists with experience in a wide range of development activities—to try to understand why rigorous evaluations of social interventions are so rare. “We realized how deficient the evidence was about large-scale development programs,” says Ruth Levine, formerly with the Center for Global Development and now deputy assistant administrator for the Bureau of Policy, Planning, and Learning at the United States Agency for International Development (USAID). “We tried to find evidence backing up programs that people said were successful, and we found very little credible evidence.” The Working Group spent 18 months consulting with policymakers, project managers, agency staff, and evaluation experts and came up with a report—titled When Will We Ever Learn?—about how to stimulate more and better impact evaluations.
As an example of the field's evaluation gap, the report describes findings on community-based health organizations. These organizations—alternatives to expensive, centralized, large-scale health systems—have become increasingly popular as a way for low-income countries to extend health services to their poor populations. Although researchers had conducted plenty of studies, there was still no evidence on whether community health organizations were an effective use of resources. Among 127 studies covering 258 cases of community-based health organizations, only 2 reported on whether the organizations improved people's access to health services. None measured the organizations' impact on people's health. The problem is similar for other types of interventions: evaluations of health programs, education reforms, nutrition and feeding projects, and antipoverty programs are often poor in quality or nonexistent. Policymakers and development practitioners have been largely operating in the dark.
The push for results
The report When Will We Ever Learn? came at a moment when a number of aid donors and developing-country governments were demanding more evidence that the millions spent on development interventions were actually meeting their intended goals. Development programs and projects have long included a component called "monitoring and evaluation," which typically answers questions about how money allocated for a project has been disbursed and how it was spent. This practice shows whether administrators have properly implemented a program, but it does not show whether that program makes any difference in the lives of its intended beneficiaries. "Everybody does M&E [monitoring and evaluation] as routine," says Maredia, "but not enough attention has been given to causal effect and tying results to interventions."
In the late 1990s and early 2000s, a new push for results emerged. Mexico was a pioneer in carefully evaluating impact and rigorously attributing results to interventions. In the 1990s it adopted a conditional cash transfer program called PROGRESA—an anti-poverty program that provides cash to poor families on the condition that they participate in education and health services. Policymakers incorporated a plan for evaluating PROGRESA’s impact right from the start. Researchers, led by a team from IFPRI, carefully compared the program’s beneficiaries with members of a control group who had not yet been incorporated into the program to see exactly how the program affected participants.
“PROGRESA set a new standard for evaluation in terms of quality and rigor,” says Michelle Adato, IFPRI senior research fellow and coeditor of a new book on conditional cash transfer programs. “As good impact evaluations are done, governments have seen the benefits of these types of programs and evaluations.” The benefits of evaluation include making programs more effective, generating political support for programs, increasing transparency and accountability, providing support for budgetary decisions, and identifying why programs are effective for some intended beneficiaries and not others, she says.
Aid donors have jumped on board. USAID and the World Bank have both put increased emphasis on systematic and rigorous impact evaluations of their programs and initiatives in recent years. "In the past five years at the Bank," says Ariel Fiszbein, chief economist of the Human Development Network, "impact evaluation has gone from being a curiosity to a core part of our program. It has been slowly and profoundly changing the culture here." The Millennium Challenge Corporation, a United States aid agency that provides grants to what it calls "well-performing" developing countries, weaves impact evaluation into all of the projects it supports. And the new results agenda laid the groundwork for the creation of 3ie—a suggestion of the Evaluation Gap Working Group—to advocate for evidence-based policymaking and provide financing for rigorous impact evaluations of socioeconomic interventions in developing countries.
By highlighting successes, evaluations can point the way to future investment opportunities. A recent IFPRI project supported by the Bill & Melinda Gates Foundation sought to identify successes in agricultural development over the past 50 years. Hard evidence of impact was not always easy to come by, especially from interventions carried out decades ago, but ultimately the project identified 20 successes whose impact could be rigorously demonstrated. The resulting book, Millions Fed: Proven Successes in Agricultural Development, describes how actions ranging from crop research to water management to market reforms had measurable impacts on poor people, improving the food security and livelihoods of millions of people. David Spielman, an IFPRI research fellow and coeditor of the book, says, “We spent a lot of time reviewing the evidence of impact for more than 250 interventions. In the end, we found a handful of well-documented interventions and discovered that some of the best evidence came from combinations of different methods—for example, statistical analyses of changes in crop yields and household incomes, geospatial imagery showing changes in agroecological landscapes over time, and in-depth socio-anthropological studies of how real people benefited from an intervention.” The book’s other coeditor, Rajul Pandya-Lorch, head of IFPRI’s 2020 Vision Initiative, says, “The case studies provided lessons on how to target, implement, and sequence agricultural development interventions to achieve the greatest impact.”
Agricultural research played an important role in many of these successes, but assessing the impact of research can be difficult, says Derek Byerlee, chair of the Standing Panel on Impact Assessment of the Consultative Group on International Agricultural Research: "Research is cumulative—you build on the work of others and it is often difficult to attribute impacts to a particular piece of research. And research impacts are very uncertain. You invest in a lot of research that doesn't have an impact, but you can also learn from failure." Recognizing the importance of determining the impacts of CGIAR agricultural technologies not only on productivity but also on poverty, the panel therefore invested in a groundbreaking multicountry study of poverty impacts. Michelle Adato, coeditor of a book published from that study, notes that the study emphasized the importance of asking the question, Impact on whom? "Agricultural research will differentially impact wealthier and poorer farmers, men and women, and other groups with different vulnerabilities and access to assets," she says.
Acting on the evidence
Impact evaluations are most useful when decisionmakers take the evidence into account and act on it. Dan Gilligan, a senior research fellow at IFPRI, evaluated the impact of food-for-schooling programs at camps for internally displaced people in northern Uganda. He compared two types of programs offering food in exchange for school attendance. One was a school-feeding program in which children were fed in school. The other program gave children who attended school a take-home ration of food. The take-home ration did just as well as the school-feeding program—and sometimes better—at improving school attendance and raising children’s test scores because, as Gilligan says, a take-home ration is easier to target to the neediest people. When people see food being distributed at school, they tend to demand it, even if they do not meet the criteria for the program. Because a take-home ration is less obvious to observers, he explains, it is easier to target it only to the people who need it the most. “Better targeting is one of the most effective ways to increase cost-effectiveness,” Gilligan says. When officials of the World Food Programme in Uganda saw the results, they decided to adopt take-home rations in other similar programs in that country. The study thus allowed practitioners to scale up the more cost-effective solution.
Evidence of impact can provide useful information not only for improving or scaling up a particular intervention, but also for designing interventions elsewhere. In the wake of the favorable evaluation of PROGRESA, for example, the Inter-American Development Bank delivered a loan of US$1 billion—its largest ever—to Mexico so that it could consolidate and expand the program (now called Oportunidades). The evaluation also had ripple effects far beyond Mexico’s borders. “When countries in Latin America started conditional cash transfer programs, they tended to also do impact evaluation,” says Dan Gilligan. “Conditional cash transfers took off, but impact evaluation also took off more broadly.” Brazil, Colombia, El Salvador, and Honduras are among the countries adopting this combined approach to conditional cash transfers and impact evaluation.
It is important to remember, however, that lessons from one experience are not always applicable to another situation. “Even when a program has worked in the past, it is not guaranteed to work in another location, at a different time, or at a different scale,” says Shenggen Fan, director general of IFPRI. Similarly, John Maluccio, an assistant professor of economics at Middlebury College, points out, “Impact evaluation studies typically evaluate a specific program at a specific time and place, so the degree to which the results are externally valid to other places can be limited.” But this situation may simply demonstrate the need to further improve how evaluations are conducted and reported so that their results can be more easily used elsewhere, he says.
Part of the challenge of impact evaluation is not only to show impact, but also to understand why and under which conditions a program works. One approach to this challenge is theory based, says Howard White of 3ie. In this case, researchers would start by mapping out a theory of how a program could have impact, describing a chain of causation from inputs to outputs, outcomes, and impact. The impact evaluation should then be designed specifically to test this theory and determine whether events unfold as predicted. This kind of information can make impact evaluations much more useful to policymakers who want to know whether an intervention should be adopted or scaled up.
Overcoming the obstacles
Rigorous impact evaluation is still far from universal in development interventions. One important reason for this is that the results of an impact evaluation may be disappointing to a program’s champions. As a result, impact evaluations may not always be in the interest of people who administer or support a development program. “It’s a big political economy issue,” says Ruth Levine. “Information and knowledge can constrain politicians and reduce their discretionary space, and their appetite for being constrained is pretty limited.” Moreover, some people devote their careers to particular ideas and approaches to problems because they genuinely believe in what they are doing. They may not see the need for a rigorous evaluation of their favorite intervention, which could reveal that it does not, in fact, work as well as hoped or expected.
Levine also points out that agencies and ministries whose core purpose is implementing programs may be more focused on getting money out the door than on evaluating how well each program works: “The institutional rewards are almost all based on how much money you move,” she points out.
Evaluations can also present technical challenges. Many consider the “gold standard” of evaluation to be the experimental approach (for more discussion of experimental methods, see the IFPRI Forum article “Putting Poverty and Hunger Solutions to the Test,” Volume 1, 2010), but this viewpoint is controversial and has recently been questioned by the American Evaluation Association. This approach involves randomly assigning an intervention to a group of people while a control group, made up of people with similar characteristics, does not receive the intervention. After the intervention has been implemented, researchers compare the changes that have occurred in the “treatment” group with the changes that have occurred in the control group.
The approach sounds straightforward, but in fact it is fraught with complications. It can be difficult to keep the intervention from "leaking" into the control group. Sometimes the treatment group is too small for a valid experimental analysis. Ethical considerations call into question the deliberate withholding of resources from people who would otherwise get them in the absence of the research. In practice, experimental evaluations normally circumvent this dilemma: administrative or financial capacity often prevents the rollout of a program to all potential beneficiaries simultaneously, and those awaiting incorporation most often serve as the control group. Politicians can also be reluctant to assign an intervention randomly when political considerations may push for the inclusion of certain groups of people. New programs often face intense pressure to get resources into people's hands quickly, and administrators may decide to skip the time-consuming tasks of conducting a baseline study and identifying a control group. "It makes it harder to study the effects of an intervention if you don't start at the beginning, when you can set up good data collection," says Paul Winters, an associate professor at American University who has conducted many impact evaluations. "The better the data, the simpler the method you can use to study the program."
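The comparison at the heart of an experimental evaluation can be sketched in a few lines. The sketch below uses entirely invented household data and an assumed program effect, not figures from any real study: it randomizes assignment, then compares the average change in the treatment group with the average change in the control group.

```python
import random
import statistics

random.seed(1)

# Hypothetical baseline outcomes for 200 households (illustrative only).
households = [{"baseline": random.gauss(50, 10)} for _ in range(200)]

# Randomly assign half of the households to receive the intervention.
random.shuffle(households)
treatment, control = households[:100], households[100:]

# Simulate follow-up outcomes: both groups drift upward over time,
# and the treatment group also gains an assumed +5 program effect.
for h in treatment:
    h["followup"] = h["baseline"] + random.gauss(2, 5) + 5
for h in control:
    h["followup"] = h["baseline"] + random.gauss(2, 5)

def mean_change(group):
    """Average change from baseline to follow-up within a group."""
    return statistics.mean(h["followup"] - h["baseline"] for h in group)

# Comparing the change in the treatment group with the change in the
# control group nets out the background trend common to both groups,
# leaving an estimate of the program's effect.
impact = mean_change(treatment) - mean_change(control)
print(round(impact, 1))
```

Because assignment is random, the control group serves as a valid counterfactual: any systematic difference in how the two groups change over time can be attributed to the intervention rather than to preexisting differences between them.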
Timing can also be an issue. Meredith Soule, a research adviser at USAID, points out that the impacts of investments may accrue or diminish over time in ways that evaluations focused on short-term results might miss: “At the end of a project, things might look pretty good, but five years later the benefits may have disappeared.” Other interventions, such as providing education or training to people in developing countries, may deliver their benefits years after the initial investment.
A rigorous impact evaluation can be costly, according to Dan Gilligan. “You need to do multiple survey rounds, and these can be relatively costly to run,” he says. Howard White, however, notes that paying for impact evaluation makes more sense than bearing the cost of a failed intervention: “An impact evaluation collecting its own primary data can cost from US$100,000 to over a million, but much cheaper studies can be done if existing data are used. There is a much higher cost to spending money on interventions that don’t work.”
Yet surveys on their own are not sufficient for understanding impacts. According to Michelle Adato, “Development programs that depend on behavior change and adoption of new practices confront cultural and social norms, values, belief systems, and different ways of doing things.” Understanding why interventions do or do not have an impact on people of diverse countries, regions, and backgrounds requires mixed research methodologies. The best evaluations combine quantitative surveys with qualitative methods such as in-depth interviews and observation, she says.
Finally, large-scale policy changes related to national- or global-level issues like trade, prices, marketing, sectoral investments, and institutions can potentially have much larger impacts than smaller-scale program interventions. But determining the impact of such wholesale policy shifts poses particular difficulties. “In the case of large-scale policies, it is much more challenging to attribute results to actions and contributors, to measure time lags between policy changes and impacts, and to account for spillover effects,” says Shenggen Fan. In the mid-1990s, IFPRI researchers conducted a study showing that if the Government of Vietnam relaxed both its international and domestic restrictions on the trade of rice, it could improve rice prices and reduce poverty while enhancing food security and avoiding regional disparities in rice supply. Vietnam adopted the recommended reforms, and an external evaluator then had the daunting task of assessing the impact of these reforms. The evaluator, James Ryan, worked out the technical challenges and determined that the policy reforms had greatly improved the welfare of Vietnamese farmers and consumers, achieving a cost-benefit ratio of 1 to 56.
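The Vietnam figure is a ratio of discounted benefits to discounted costs. As an illustration only, using invented cost and benefit streams rather than the actual study's numbers, such a ratio might be computed like this:

```python
# Illustrative benefit-cost calculation for a policy reform.
# All figures are hypothetical, not from the Vietnam rice study.

def present_value(stream, rate):
    """Discount a stream of annual values back to year 0."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(stream))

costs = [2.0, 1.0, 0.5]             # research and reform costs, $ millions/year
benefits = [0, 10, 40, 60, 60, 60]  # welfare gains after reform, $ millions/year

rate = 0.05  # assumed annual discount rate
bcr = present_value(benefits, rate) / present_value(costs, rate)
print(round(bcr, 1))
```

Discounting matters because costs are typically incurred up front while benefits accrue over many years; without it, a long stream of future gains would be overweighted relative to immediate spending.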
Making development work better
In a time of tight budgets and burgeoning information sources, the imperative for impact evaluation of development programs is gaining strength. According to IFPRI Research Fellow David Spielman, “Donors understand that they need to do more to show effectiveness because they are now accountable to more stakeholders. And national governments are recognizing that evaluating impact can be helpful because, among other things, it can get them more money to do what they want to do.”
Impact evaluation will not give quick and easy answers, cautions Ariel Fiszbein of the World Bank. “People love the idea that you’ve found the solution to a problem—you do an evaluation and suddenly you’ve found the key to eradicating poverty,” he says. “But we’re dealing with complex issues, and we don’t have all the answers. We need to set in motion a process of experimentation and learning.” By evaluating impact, researchers, policymakers, and donors can make modest improvements that lead to both substantial advances in how well programs work and, ultimately, real progress in the fight against poverty and hunger.
—Reported by Heidi Fritschel