Lost in Aggregation

Talking at Microsoft the week before last was an exciting experience.  I saw the fabled soccer fields while soccer was being played, I got to see piles of Windows Vista in an actual bargain bin for $10 at the Microsoft store, and I gave my talk to a room that was filled with close to 100 people.  They asked me lots of intelligent questions.

There are few things more useful to research than intelligent, hard questions.  They show me what my project is lacking.  If I can answer a hard question, I know that I’m very close to having an idea fleshed out.  The same is true for presentations.  If the audience doesn’t understand what I’m talking about or if certain points are unclear, it shows in their questions.

This happened at Microsoft.  I got comments about how my treemap of tests was noise.  Someone else told me that “Bill G” would not have liked being presented with so many data points at all.  Someone else suggested that I take a more aggregated approach to presenting the data.

What this tells me is that I missed a MAJOR point in my presentation.

Ladies, gents, testers, Microsofties …I give you Professor Tufte:

“Graphical displays should show the data…Graphics reveal data.”

-The Visual Display of Quantitative Information [13]

Think about how a visualization reveals its data.  In a bar chart, since you are looking at a count of either items or percentage points, an accurate scale will show you the quantity of individual data points.  If you look at the same data in a pie chart, you see only each slice’s share of the whole, not the underlying counts.  Part of the appeal of a treemap is that you can see the individual items as parts of a whole.  Once you aggregate, you lose this perspective.
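Here is a rough sketch of what gets lost, using made-up test run times.  The numbers, the suite names, and the `aggregate` and `outliers` helpers are all invented for illustration; they are not from my treemap or from any real test run.

```python
from statistics import mean, median

# Hypothetical per-test run times in seconds (invented for illustration).
suite_a = [2.0, 2.0, 2.0, 2.0, 2.0]
suite_b = [0.5, 0.5, 0.5, 0.5, 8.0]


def aggregate(durations):
    """Collapse a suite to one number, the way a summary chart would."""
    return mean(durations)


def outliers(durations, factor=3.0):
    """Return the individual values larger than `factor` times the median.

    The threshold is an arbitrary assumption; any rule that looks at
    individual points would make the same argument.
    """
    cutoff = factor * median(durations)
    return [d for d in durations if d > cutoff]


# The aggregated view cannot tell the two suites apart...
print(aggregate(suite_a), aggregate(suite_b))

# ...but the individual data points can.
print(outliers(suite_a), outliers(suite_b))
```

Both suites average out to the same figure, so a chart of averages would show two identical bars.  Only the view that keeps every data point shows that one test in the second suite is taking far longer than the rest.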

In the Ghost Map, you can see individual deaths.  This is how Dr. Snow was revealing his data.

As an example of how this plays out in business, I work under an executive VP who will not look at charts.  Why does he prefer to look only at numbers?  Keep in mind that I work in financial services and that our business is driven by obscenely large, even excessive, amounts of financial data.  The tables this man analyzes have thousands of data points.  I have never observed him exploring this data personally, but I have been told that he can zero in on an interesting number in seconds.  He prefers looking at data as opposed to charts because a data point is a number he can see and, to an extent, trust…a percentage, average or slice of pie, not so much.

Mistrust in charts is why I am specializing in visualizations that show every data point and not just an aggregation.  The best visualizations show not just conclusions, but also supporting facts.  I haven’t met Bill G. so I don’t know exactly what he would want to see, but I would give him the choice of staying with the aggregated view or drilling down.  Viewers should be allowed to interact with data enough to form their own trust in it.  Providing only an aggregated view or a conclusion does not give them this opportunity.  Excellent visualizations reveal their data.

2 thoughts on “Lost in Aggregation”

  1. I enjoyed the talk you gave at MS. It got my analytical juices flowing! I agree that testers need to use this stuff more in their day-to-day jobs.

    One of the principles that might drive some folks to avoid aggregations (or even mere depictions) of data is the belief that “even a pie chart is a work of an artist”. The visualizations of various aspects and metrics that someone deems “interesting” reflect, to some extent, the suspicions, theories, and even values of the person creating them. Like an artist, the creator seeks to tell her story with her depiction of the state of affairs. Someone whose livelihood depends on the conclusions drawn from data when the stakes are high might prefer to do their own analysis and even go to extraordinary lengths to avoid the biases that might have crept into someone else’s “work of art”. That might explain some of this aggregation backlash.

    Nevertheless, I think the process of creating the visualization should not be undervalued. It is during the process of defining what is important to show and how it would best be shown that theories are tested, suspicions confirmed, and truths revealed. Good stuff. Keep it coming!

  2. Thanks for the thoughtful comments :)

    I agree with you that, “the process of creating the visualization should not be undervalued.” This is at the heart of exploratory data analysis and has some very interesting implications when it comes to metrics.

    I’ve recently been going through some of James Bach’s information on this subject, and noticed a slide where he says that even lackluster metrics can be useful if they aid in some other discovery.

    This does change the rules somewhat, and I’m still learning about this. As far as I know, this is how we are finding new ways of making visualization useful.

    I have ordered the book “Exploratory Data Analysis” by John Tukey. He was a colleague of Prof. Tufte’s. It will be interesting to see what shakes loose from reading it.
