Ten Hacks Towards World Class Evaluation

What is the best way of evaluating interventions aiming for systemic change? How do we evaluate responses to complex social challenges for effectiveness? How do we know when something we have done is working? How do we know when it’s failing?

Evaluation and metrics represent an attempt to bring “hard” numbers and “hard” results into the labs and innovation space. Some of what’s being asked for makes sense. Unfortunately, a lot of what’s written about evaluation is based on a misunderstanding of what we are dealing with when we attempt to change systems.

This attempt at “hardness” is touching but mostly it’s just misguided. Part of it is “physics envy” – the wish for a theory with universal, predictive capacity – and part of it is “market envy” – the wish for a shared “consensus reality” about how to “rationally” assess value.

The problem is that neither of these makes a lot of sense. We are dealing with complexity and we are dealing with people. Systemic change will never be physics. And markets function on the myth of rationality; they are not rational. Rather, they are ruled by “animal spirits” that we would do well to remember (but more on this later).

Forget physics and market envy. Here are ten “hacks” that will dramatically improve your evaluation. Some of the ideas here are inspired by Edward Tufte’s book “Beautiful Evidence,” in particular a chapter titled “The Fundamentals of Analytical Design.”

1. Establish Confidence.

A lack of confidence in a strategy can manifest in many different ways. One of the ways it can manifest, either inadvertently or by design, is in evaluations.

Lacking confidence in a strategy can give rise to “noise,” where those trying to discern the value of an intervention are distracted by unnecessary data, presentations, artefacts and activities.

Having confidence in a strategy manifests through letting the data speak for itself and not needing to introduce noise into the system. In general it means that your evaluations will be elegant, simple to grasp and compelling.

One simple way of establishing confidence in whatever you’re doing is to get early feedback from friendly outsiders. Don’t make the mistake of trying to get it right before getting feedback; it will be much harder to undo later. As the adage goes, “the perfect is the enemy of the good.”

2. Establish an Ex-Ante Baseline.

Before starting anything, establish some sort of baseline, no matter how rough or vague. This will help you demonstrate any changes from the baseline. Arguing about why these shifts happen is a different problem from showing that shifts have happened. Not having a baseline makes it many times harder to show that something has shifted or changed.

Don’t be put off by the idea that you need a statistically significant baseline. It is better to have some baseline than none. Think creatively about what you can “snapshot” before you start.

An ex-ante baseline should be coupled with an ex-post (or “post-ante”) evaluation. This allows an assessment of the impact of whatever you have attempted.
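As a minimal sketch of what coupling a rough baseline with a later snapshot can look like, the Python below compares two sets of measurements and reports the shift in each variable. The function name, the variable names and the numbers are all made up for illustration; they are assumptions, not data from any real intervention.

```python
from typing import Dict, Optional, Tuple


def shifts_from_baseline(
    baseline: Dict[str, float], follow_up: Dict[str, float]
) -> Dict[str, Tuple[float, Optional[float]]]:
    """Return (absolute change, percentage change) for every variable
    present in both snapshots. The percentage is None when the baseline
    value is zero, because a relative change is undefined in that case."""
    result = {}
    for name in sorted(baseline.keys() & follow_up.keys()):
        before, after = baseline[name], follow_up[name]
        change = after - before
        percent = None if before == 0 else 100.0 * change / before
        result[name] = (change, percent)
    return result


# Hypothetical numbers, purely illustrative.
ex_ante = {"people_engaged": 40, "hours_invested_per_month": 120}
later = {"people_engaged": 55, "hours_invested_per_month": 150}

for name, (change, percent) in shifts_from_baseline(ex_ante, later).items():
    print(name, change, percent)
```

Note that this only shows that something has shifted. Why it shifted is a separate argument.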

3. Make A Comparative Case.

Evaluating something always means that it is being assessed in comparison to something else, even when the comparison is not made explicit. If someone is investing in your lab or initiative then they are not investing in something else. They are also probably making a comparative case in their heads. So overall, if you can make a comparative case at every step of the way, you’re likely to show what value you’re creating.

Comparisons can be done at multiple levels. So an ex-ante and post-ante evaluation can be done when starting and ending a workshop, a comparison can be made between multiple prototyping teams, and a comparative evaluation can be done against a “typical” business-as-usual (BAU) intervention. There are many creative and interesting ways of making a comparative case.

See Tufte’s explanation of Minard’s drawing of Napoleon’s March to Moscow for an example.

Showing comparisons is what Tufte calls “the First Principle for the analysis and presentation of data.”

4. Tell A Story About Causality.

One of the most prevalent and counter-productive ideas floating around when it comes to complexity is that it’s impossible to prove causality. That may well be philosophically true (Wittgenstein called belief in the causal nexus superstition) but what you’ll notice if you look is that everyone does it.

What do I mean by this? What I mean is that it’s probably impossible to definitively prove a relationship between cause and effect in a complex system. Complex systems are governed by “complex causality” and not “simple” causality.

But this doesn’t mean that you can’t make a best guess as to the impacts your intervention has had. Everyone who does anything does this. Big government departments spending billions and billions of dollars do this and so do big non-profits that claim to be reducing poverty in Africa or whatever. This is where “confidence” comes in.

So tell a story, post-ante, after you have done something. Say what you think the impact of what you have done is. Back this up with numbers if you can.

In Tufte’s Napoleon’s March to Moscow example above, the chart tells the story that “Napoleon’s army was defeated by the cold,” and that is a story of causality.

Tufte reminds us to “show causality, mechanism, explanation, systemic structure” and this is “the Second Principle for the analysis and presentation of data.”

5. Track Multiple Variables (But Not Too Many).

I’ve seen evaluations that seem to be trying to track 20-30 different variables and metrics. This doesn’t make sense. Don’t do that. It will generate an insane amount of data that will be very hard to parse and make sense of.

Instead think about 3-6 variables you can track that are of interest. So for example, a for-profit company tracks revenues and profits. That’s pretty simple. And performance reporting is built around these variables. We are obviously not in such a context when trying to change systems.

See the six-capitals model for a broader take on possible ways to report on what you’re doing. Note that “social capital” or “natural capital” are not simple variables that you can track; they are composite variables. A “simple” variable would be “investment of time” or “investment of money.” And so on.

For a systems change effort you may want to track things like the number of people you reach or engage with, or the amount of time people invest in what you’re doing. Being able to demonstrate shifts in these variables over time forms the basis of an evaluation narrative.
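A rough sketch of what tracking a handful of variables over time might look like in practice is below. The class name, the variable names, the period labels and the numbers are all hypothetical, and the 3-to-6 limit is simply the rule of thumb from this section turned into a check.

```python
MIN_VARIABLES = 3  # rule of thumb from the text, not a hard law
MAX_VARIABLES = 6


class VariableTracker:
    """Record a small, fixed set of simple variables period by period."""

    def __init__(self, variables):
        variables = list(variables)
        if not MIN_VARIABLES <= len(variables) <= MAX_VARIABLES:
            raise ValueError(
                f"track between {MIN_VARIABLES} and {MAX_VARIABLES} variables, "
                f"got {len(variables)}"
            )
        self.variables = variables
        self.records = []  # list of (period label, {variable: value})

    def record(self, period, **values):
        """Store one period's observations; reject variables not agreed up front."""
        unknown = set(values) - set(self.variables)
        if unknown:
            raise ValueError(f"unknown variables: {sorted(unknown)}")
        self.records.append((period, dict(values)))

    def series(self, name):
        """Return the (period, value) pairs recorded for one variable, in order."""
        return [(p, v[name]) for p, v in self.records if name in v]

    def net_shift(self, name):
        """Difference between the last and first recorded value of a variable."""
        points = self.series(name)
        if len(points) < 2:
            raise ValueError("need at least two observations to show a shift")
        return points[-1][1] - points[0][1]


# Hypothetical figures, purely illustrative.
tracker = VariableTracker(["people_reached", "hours_invested", "money_invested"])
tracker.record("period 1", people_reached=40, hours_invested=120, money_invested=5000)
tracker.record("period 2", people_reached=55, hours_invested=150, money_invested=5000)
tracker.record("period 3", people_reached=90, hours_invested=210, money_invested=7500)

print(tracker.series("people_reached"))
print(tracker.net_shift("hours_invested"))
```

Fixing the list of variables at the start is a deliberate choice here: it makes it awkward to quietly add a twentieth metric halfway through.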

Tufte refers to this as the “Third Principle for the analysis and presentation of data…show multivariate data; that is, show more than 1 or 2 variables.”

6. Focus on Results, Not Just Process.

The field of systemic change is rife with process-junkies. We like process. Process is obviously critical. It is one part of the “how” of things, but we frequently treat it as the only “how.” It is not.

If we take an analogy with cooking, then process represents frying or baking, but this of course is not the sum of cooking as a practice. While it takes great skill to be good at a process, equally important are things like ingredients and the space one cooks in.

Many evaluations in this space focus on describing process to the exclusion of outlining results or what actually happened. Because the field is so skewed to process, we need to put process back in its box. Mainstream clients don’t care about process as much as they do about results. Describe process, what has happened and what you have done, but focus on results, on what it is that has resulted from what you have done and why it matters.

7. Integrate Data Into Your Artefacts.

The rise of graphic facilitation has led to an explosion of visuals associated with group work. One description of this approach from Drew Dernavich is, “Graphic recording is the process of translating a group dialogue into images in real time. Along with providing a synthesized recording of ideas, these images engage participants, create a common information set, illuminate patterns and insights, and spark further creative thinking.”

A real challenge arising from the use of graphic facilitation is that its real-time nature tends to generate artefacts that violate most of Tufte’s principles. In deconstructing some examples of graphic facilitation, my conclusion is that with a little pre-work and the establishment of “good practice” guidelines it would be possible to dramatically improve the artefacts coming out of graphic facilitation or graphic recording.

So for example, if a graphic recorder is documenting a group session, then labelling the session (date, location, purpose) and incorporating simple data such as “who was in the room?” would transform an artefact from being an emotionally pleasing record to being actually useful.

A second step, incorporating other data sources into the final output, such as actual trend data or statistics from other sources, would immeasurably improve the quality and usefulness of graphic recording outputs. This would require either preparation beforehand or work on the artefact after it has been recorded in real time.

This is Tufte’s Fourth Principle, “Completely integrate words, numbers, images, diagrams.”

8. Eliminate Chartjunk.

This is an idea that hit me right between the eyes. It’s pretty simple: “all non-data ink” or “redundant data-ink” is chartjunk, and “Like weeds, many varieties of chartjunk flourish.” See Tufte’s two essays, “Chartjunk: Vibrations, Grids and Ducks” and “The Cognitive Style of PowerPoint,” for a full explanation.

Chartjunk is basically a visual device that upon closer examination reveals itself to not tell us anything at all. It is a form of visual noise and it is a function of a lack of confidence in the data sets and ideas being presented.

Unfortunately it is very common in the social sphere. One particularly egregious example I’ve often seen in graphic facilitation is the use of a wave, implying that some surfing is going on. Waves look visually good but what do they actually mean? Not a whole lot.

It’s worth reminding ourselves what we are doing when we are making a presentation:

“Making a presentation is a moral act as well as an intellectual activity. The use of corrupt manipulation and blatant rhetorical ploys in a report or presentation – outright lying, flagwaving, personal attacks, setting up phony alternatives, misdirection, jargon-mongering, evading key issues, feigning disinterested objectivity, wilful misunderstanding of other points of view – suggests that the presenter lacks both credibility and evidence. To maintain standards of quality, relevance, and integrity for evidence, consumers of presentations should insist that presenters be held intellectually and ethically responsible for what they show and tell. Thus consuming a presentation is also an intellectual and moral activity.” (Edward Tufte)

9. Communicate Frequently.

This one is simple to say and hard to do in practice. Don’t wait 2 years into an intervention to say how it’s going and don’t settle for sending a newsletter out every 3 months. Try to figure out a way of communicating and getting your story out on an ongoing basis. Figure out a regime of small nuggets of valuable information, data, assessments and so on.

Full blown evaluations that take years to put together might be what your donors want, but the story of whether you are succeeding or failing is a story that will come from the day-to-day. All too often there is a vacuum for months and months where people outside of the team are left guessing as to how things are going. Don’t let people project into a vacuum. Tell small stories, put out status updates and put out more substantial evaluations as frequently as possible.

10. Remember The Animal Spirits.

The social sector as a whole believes in the notion that we hold results or outcomes valuable. So when social enterprises or non-profits attempt to demonstrate “value” they scramble to show rational value; they struggle to put numbers together that somehow show they are producing results that are worth the money. These rational cases are, in general, weak.

One reason for this is that social interventions are often under-capitalised. Compare, for example, the budget of a typical non-profit to that of a government department in an OECD country. They just don’t compare.

The other reason, perhaps more importantly, is that no sector is assessed purely on a rational basis – not the private sector and certainly not the public sector.

We somehow believe that the non-profit sector must demonstrate its value rationally, through hard numbers. This standard of evidence is actually impossible to achieve – and in many ways, it’s a waste of time trying. When you add complexity and the non-technical nature of many interventions, it’s even harder to show purely rational results with clear causality.

Yet, too many people in the non-profit sector buy the idea that this is the standard to achieve, as if perfect markets exist, as if people make decisions purely on a rational basis.

According to Akerlof and Shiller in their book “Animal Spirits” the irrationality of markets can be explained through five elements of human psychology:

1. confidence
2. fairness
3. corruption and bad faith
4. money illusion
5. stories

The term “animal spirits” was brought into economics by John Maynard Keynes, who wrote,

“Even apart from the instability due to speculation, there is the instability due to the characteristic of human nature that a large proportion of our positive activities depend on spontaneous optimism rather than mathematical expectations, whether moral or hedonistic or economic. Most, probably, of our decisions to do something positive, the full consequences of which will be drawn out over many days to come, can only be taken as the result of animal spirits—a spontaneous urge to action rather than inaction, and not as the outcome of a weighted average of quantitative benefits multiplied by quantitative probabilities.”

When thinking about the purpose of evaluation, we have to remember that animal spirits guide much economic activity and we have to remember that making purely rational cases, as if this is what creates “confidence,” is a fool’s errand.

The evaluations we create have to make both a rational and emotional case. Think as hard about the emotional case as you do about the rational case because there are animal spirits at work.