Visualizing Defect Percentages with Parallel Sets

Prof. Robert Kosara’s visualization tool, Parallel Sets (Parsets), fascinates me. If you download it and play with the sample datasets, you will likely be fascinated as well. It displays aggregations of categorical data interactively.

I am enamored with this tool in particular because it hits the sweet spot between beauty and utility. I’m a real fan of abstract and performance art. I love crazy paintings, sculptures and whatnot that force you to question their very existence: art that walks the line between brilliant and senseless.

When I look at the visualizations Parsets produces, I’m inclined to print them off and stick them on my cube wall just because they’re “purty.” However, they are also quite utilitarian, as every visualization should be. I’ll show you how with an example set of defects. Linda Wilkinson’s post last week inspired this, and you can get some of the metrics she talks about in her post with this tool.

For my example, I created a dataset for a fictitious system under test (SUT). The SUT has defects broken down by operating system (Mac or Windows), who reported them (client or QA) and which part of the system they affect (UI, JRE, Database, Http, Xerces, SOAP).

Keeping in mind that I faked this data, here is the format:

DefectID,Reported By,OS,Application Component
Defect1,QA,MacOSX,SOAP
Defect2,Client,Windows,UI
Defect3,Client,MacOSX,Database
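If you want to sanity-check what Parsets will aggregate before importing, here is a minimal Python sketch (my own, not part of Parsets) that reads a file in the four-column layout above and counts defects per category on each axis. It assumes the header row shown above and uses only the standard library; `load_defects` and `category_counts` are illustrative names.

```python
import csv
import io
from collections import Counter

# The three example rows shown above (the full, faked dataset is larger).
SAMPLE_CSV = """DefectID,Reported By,OS,Application Component
Defect1,QA,MacOSX,SOAP
Defect2,Client,Windows,UI
Defect3,Client,MacOSX,Database
"""


def load_defects(text):
    """Parse the four-column defect CSV into one dict per defect, keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def category_counts(rows, column):
    """Count how many defects fall into each category of a single column."""
    return Counter(row[column] for row in rows)


defects = load_defects(SAMPLE_CSV)
for column in ("Reported By", "OS", "Application Component"):
    print(column, dict(category_counts(defects, column)))
```

These per-axis totals correspond to the heights of the category boxes on each axis; the ribbons between axes need the combinations, not just the totals.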

The import process is simple: I click a button, choose my CSV file, and it’s imported. More info on the operation of Parsets is here. A warning: I had to revert to version 2.0. Maybe Prof. Kosara could be convinced to allow downloads of 2.0.

I had to check and recheck the boxes on the left to get the data into the order I wanted. Here is what I got:

See the highlighted defect.

So who wants to show me the pie chart they think is perfectly capable of showing this? Oh wait, PIE CHARTS WON’T DO THIS. A pie chart can only show you one variable. This one shows four.

This is very similar to the parallel coordinate plot Stephen Few describes in Now You See It, and it shows Wilkinson’s example of analyzing who reported defects. She was showing how to calculate a percentage for defects. See how QA at the top is highlighted? There’s your percentage. Beyond who reported the defects, Parsets makes it incredibly easy to see which OS has more defects and how the defects are spread across the components. If I had more time, I would add a severity level to each defect. Wouldn’t that tell a story!
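For anyone who wants the numbers behind the highlighted band, here is a rough Python sketch of two calculations. It assumes the percentage in question is each reporter’s share of all defects, and the five rows are made up for illustration (they are not my dataset). `percent_by` and `cross_tab` are hypothetical helper names.

```python
from collections import Counter

# Made-up rows in the (DefectID, Reported By, OS, Application Component) order;
# these are NOT the dataset used for the charts in this post.
DEFECTS = [
    ("Defect1", "QA", "MacOSX", "SOAP"),
    ("Defect2", "Client", "Windows", "UI"),
    ("Defect3", "Client", "MacOSX", "Database"),
    ("Defect4", "QA", "Windows", "Database"),
    ("Defect5", "QA", "MacOSX", "Database"),
]
REPORTED_BY, OS, COMPONENT = 1, 2, 3  # column positions within each tuple


def percent_by(rows, column):
    """Percentage of all defects that falls into each category of one column."""
    counts = Counter(row[column] for row in rows)
    total = len(rows)
    return {category: 100.0 * n / total for category, n in counts.items()}


def cross_tab(rows, row_column, col_column):
    """Defect count for every (row category, column category) pair."""
    return Counter((row[row_column], row[col_column]) for row in rows)


print(percent_by(DEFECTS, REPORTED_BY))   # share of defects reported by QA vs. client
print(cross_tab(DEFECTS, OS, COMPONENT))  # how each OS's defects spread over components
```

With these invented rows, QA accounts for 60% of the defects and Mac OS X has two database defects. The cross-tab holds the same counts a table would; Parsets draws them as ribbons instead.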

Parallel Sets is highly interactive. I can reorder the categories by checking and unchecking boxes, and I can remove a category entirely by unchecking its box.

I took away the individual defects.
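Unchecking the Defect ID box amounts to regrouping the data. The sketch below is a simplified model of that idea, not Parsets’ actual code: it groups made-up rows by their categories on whichever dimensions are checked, in the chosen order. Each distinct combination corresponds to one path across the axes, and its count to that ribbon’s thickness. `ribbon_counts` is a hypothetical name.

```python
from collections import Counter

HEADER = ("DefectID", "Reported By", "OS", "Application Component")
# Made-up rows for illustration only; not the dataset behind these charts.
RAW = [
    ("Defect1", "QA", "MacOSX", "SOAP"),
    ("Defect2", "Client", "Windows", "UI"),
    ("Defect3", "Client", "MacOSX", "Database"),
    ("Defect4", "QA", "Windows", "Database"),
    ("Defect5", "QA", "MacOSX", "Database"),
]
DEFECTS = [dict(zip(HEADER, raw)) for raw in RAW]


def ribbon_counts(rows, dimensions):
    """Group rows by their categories on the checked dimensions, in display order.

    Each distinct tuple is one path across the axes, and its count stands in
    for the ribbon's thickness. Leaving a dimension out merges rows that
    differed only on that dimension.
    """
    return Counter(tuple(row[d] for d in dimensions) for row in rows)


with_ids = ribbon_counts(DEFECTS, ["DefectID", "Reported By", "OS"])
without_ids = ribbon_counts(DEFECTS, ["Reported By", "OS"])
reordered = ribbon_counts(DEFECTS, ["OS", "Reported By"])
print(with_ids)
print(without_ids)
```

With the IDs included, every path has a count of 1, which is why the individual-defect view looks like a fan of hairlines. Without them, defects that share a reporter and OS merge into a thicker band; reordering the axes changes the grouping order but not the totals.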

By moving the mouse around, I can highlight and trace data points. Here I see that Defect 205 is a database defect on Mac OS X. Although I didn’t do it here, I bet I could merge the Defect ID with a defect description and see both in the mouseover.

See the highlighted defect.
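I haven’t tried the Defect ID plus description merge, but if I did, I would prepare the file before import. This is a hypothetical Python sketch: the description text is invented, and I’m assuming Parsets would show the merged label in the mouseover, which I have not verified. It folds each description into the DefectID value so a single axis label carries both, then writes the CSV back out.

```python
import csv
import io

FIELDS = ["DefectID", "Reported By", "OS", "Application Component"]

# Two of the sample rows from the CSV format shown earlier in the post.
ROWS = [
    {"DefectID": "Defect1", "Reported By": "QA", "OS": "MacOSX", "Application Component": "SOAP"},
    {"DefectID": "Defect3", "Reported By": "Client", "OS": "MacOSX", "Application Component": "Database"},
]

# Hypothetical descriptions; the dataset in this post has no description column.
DESCRIPTIONS = {"Defect1": "SOAP fault on empty request"}


def merge_id_and_description(rows, descriptions):
    """Return copies of rows whose DefectID reads 'ID - description' when one exists."""
    merged = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows stay unchanged
        description = descriptions.get(row["DefectID"])
        if description:
            row["DefectID"] = f"{row['DefectID']} - {description}"
        merged.append(row)
    return merged


def to_csv(rows, fieldnames):
    """Serialize rows back to CSV text, ready to save and import again."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


MERGED = merge_id_and_description(ROWS, DESCRIPTIONS)
print(to_csv(MERGED, FIELDS))
```

I would keep commas out of the descriptions: Python’s csv module quotes such fields, and I don’t know whether the Parsets importer handles quoted values.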

Parallel Sets is still young, but it is very promising. I hope it will eventually be viewable in a browser and easier to share. Visualizations like this one keep me engaged while giving me useful information for exploratory analysis. That’s the promise of data viz, and Parallel Sets delivers.

3 thoughts on “Visualizing Defect Percentages with Parallel Sets”

  1. Hi Marlena,

    I’d seen these types of diagrams before but didn’t know there was a tool out there for generating them.

    I can already think of some uses for this for representing some multi-layered data in my workplace. I’m going to try this out.

    Thanks!

    /Simon

  2. Hi Marlena, You’re right! A pie chart can’t show this data, but a simple table can, and it’s far easier to read. Why overcomplicate such data by using ANY chart or visualization? The default position should be a table; employ charts/viz when (& only when) a table can’t help. Parsets may have a potential advantage with huge data sets (thousands of variables).

  3. Thanks for the comment, Sally.

    Your observation points out something people frequently forget when they decide to create a graph or even a percentage. A percentage should rest on at least 25 to 30 data points to mean anything, and a graph probably needs many more.

    Data points aside, charts such as this one can be useful in exposing relationships in the data that might be elusive if you are faced with many categories. Even then, however, it is good to remember that correlation does not equal causation.
