When, and what level of, rigour is appropriate in impact evaluations of aid?

By Steven Synyshyn

For those following the ongoing Millennium Villages (MVP) saga, Dr. Michael Clemens of the Center for Global Development in Washington, D.C., a long-time advocate of more transparency in evaluating the MVPs, gave a web-recorded talk earlier this week at the Overseas Development Institute in London entitled "Evaluating the impact of aid to Africa: lessons from the Millennium Villages". His lecture centred on the need for a better understanding of when, and how much, rigour ought to be applied in evaluating impact, both for the MVPs and for development interventions more broadly. He argues that if development practitioners and academics are to talk about results and cause and effect, then they must carry out impact evaluations.

The crux of his argument, however, is that this "must" falls along a continuum of rigour. For example, as the cost of being wrong increases, so should the level of rigour in impact evaluation. To him, the MVP is an example of where the level of rigour was set much too low for such a high-profile intervention. The MVPs were more or less the only tangible policy commitment to come out of the 2000 Millennium Summit, which was the largest gathering of heads of state ever assembled. Goal posts kept shifting, all evaluation was done internally, and the data and statistics used to compare the villages with the rest of each country's population lacked rigour. Dr. Clemens laments the effect this lack of attention to rigour has had, and will have, on the reputation of the MVPs and, undoubtedly, on the world. He asks: toward what other ends could the money for this project have gone? Hundreds of insecticide-treated anti-malarial bed nets, which have been shown to reduce child mortality in Africa, could have been purchased and distributed with the money spent on a single household in one of the MVPs (see his talk for data). One might also include the cost, however intangible, to perceptions among donors in rich countries of another black mark against the role of aid in helping Africa's poorest. These costs should not be taken lightly, considering the world continues to embark on an age of austerity not seen since the days of the Great Depression.

Dr. Clemens points to the Mexican government's PROGRESA program - cash transfers to poor households for their children's confirmed school attendance - as one example of where the correct level of rigour was used, and one where the benefits of having done so are profound. He points out that the program was continued under successive governments (it began in 1997; a change of government came in 2002) and has spread throughout Latin America and now to other places around the world. He adds that this was due largely to the emphasis placed on its proven results and to how it used its evaluation findings to assist in scaling. In comparison to the MVPs, this program offers great hope for future aid-financed programs and a model for how impact evaluation can be used to affirm and improve what works.

Another clear point made, and a topical one at that, is Dr. Clemens's assertion that rigorous evaluation is NOT the same thing as randomization. He quotes Mark Kramer, who testified earlier this year in Washington, D.C. that only 1 out of 1000 aid programs ought to be subject to randomized controlled trials, the often-referenced gold standard in impact evaluation. Dr. Clemens focuses on the need for evaluation to be cost-effective in order for it to be deployed correctly and expertly.

All in all, Dr. Clemens offers a very strong case that the level of rigour in evaluation should fall along a continuum. His comparisons to the legal and medical clinical testing fields are also insightful - he convinces the listener that aid has a great deal of distance to cover in order to develop the safeguards and soundness of these other fields, despite their own faults. I agree wholeheartedly that aid deserves, and demands, the same care in deciding on the level of rigour for evaluating impact as these other fields, which often receive greater emphasis because they involve "matters of life and death". Indeed, it likely doesn't take more than one or two trips to the "field", or reading a few chapters of Joseph Stiglitz's "Globalization and its discontents", to conclude that aid, helping and good intentions can cause just as much harm as a poor legal or medical outcome.

To Dr. Clemens's analysis and argument I offer two points. First, the need for an added layer of analysis around the time or length of an intervention, integrated somehow into his continuum. And, second, the importance of political considerations.

First, to me, the length of an intervention matters for deciding the overall level of rigour in assessing its impact. With time and aid, just as in personal finance (think of the magic of compounding interest), how long an intervention or series of interventions has been at work should affect how much rigour is applied. For example, an aid program that has been in effect for 10 years, or has been renewed by donor governments for several decades, ought to have a higher level of rigour applied to evaluating its impact, even if its outlay of funds is relatively minimal and irrespective of the cost of being wrong or perhaps other factors.

As I understand it, this would not fit into Dr. Clemens's definition of how to weigh the level of rigour to apply. If, say, the cost of being wrong is low when only a project's or program's one- or two-year intervention is considered, then (according to Dr. Clemens) it could get by with a lower level of evidence to stay alive. But fast forward 10, 15 or maybe even 20 years. I argue that there is "compounding" going on in the associated costs if no high level of rigour was applied earlier on. The "amount" of money, path dependency, vested interests, and stereotypes perpetuated in the donor communities around what's really working will and do add up, and even compound (i.e., it will cost even more to end or correct them after the fast forward has occurred). While the yearly cost is low, the cumulative totals (say, many millions over a decade or two, on top of the other costs noted) will be large. In short, inch by inch, centimetre by centimetre, the costs add up, and suddenly the projects and programs, people and communities involved (perhaps defined as "goodwill", but in a pejorative sense) require an onerous level of cost to correct, to evaluate well or, if found to have zero or negligible impact, to end! Examples of recent cutbacks in Canada's aid program, while not ideal, hint at this. Scott Gilmore's piece here also points to this cumulative growth in costs and the penalty in dealing with them later on.

Conversely, bringing this time aspect into decisions about when and how to apply higher levels of rigour can be helpful, similar to what Dr. Clemens says. I argue that higher levels of rigour can be of great help to these types of long-running programs, so that they are actually forced to learn, improve, scale and ultimately make themselves no longer needed, regardless of contextual and political factors.

Second, I felt that political considerations warranted a deeper discussion than what I heard in the presentation. Evaluations, and their level of rigour, are not decided on in a vacuum. Aid is notorious for being politically driven and allocated. While the attitude towards aid that Dr. Clemens's talk embodies is both admirable and something we all should strive for, it is not as practical as it might be. Unfortunately, our world is full of lies, damned lies and statistics - all of which can be employed to make it seem like an aid project is having a greater impact than an impact evaluation, whatever its level of rigour, would conclude. To use his own comparisons to legal and medical clinical trials against him (purely to illustrate my point, of course), law and medicine are inherently political and are often meddled in by those who stand to lose or gain from one outcome or another. Moreover, he says himself that the factors underlying the success of the evaluation of Mexico's PROGRESA were highly idiosyncratic to that case.

This is not to say that wanting to evaluate impact better is futile, but rather that we must remain continually aware of context. As social scientists we must all face the fact that deciding on a level of rigour for evaluating impact must also be approached strategically - humans are involved, after all. If there is anything true about development, it is that it has always been about winners and losers. Nevertheless, Dr. Clemens shows us that there is much hope and that our efforts for sound evidence are not in vain - PROGRESA is a program that works, has gone on to be replicated in many countries around the world, and continues to this day.

Here are some screenshots of Dr. Clemens's presentation and a link to it online: http://www.odi.org.uk/events/details.asp?id=2956&title=evaluating-impact-aid-africa-lessons-millennium-villages.

Note that I did not listen to the comments made after his by Dr Belay Ejigu Begashaw (Director of the MDG Centre for East and Southern Africa and the Columbia Global Center in Africa, in Nairobi, and former Minister of Agriculture in the Ethiopian government), as I ran out of time! I'm sure they were good and worth listening to!

About the author:

Steven joined CAIDC after planning its 2009 conference, first as a member and then as a Director in 2011. A young professional and self-professed "keener", Steven strives to balance his passion for youth-related issues with his interest in growing organizations. He has travelled in the Middle East and North Africa and has completed research and social enterprise development assignments in both Kenya and India (the latter being a CIDA youth internship). His background includes a number of operational, strategic and research roles as well as a Master's degree in International Affairs from Carleton University in Ottawa. Steven is married, has yet to acquire a permanent address, and is proud to say that his most prized possession in the world is his three-year-old basset hound named Sally.