
Category “Data Visualization”

Credo Work: A few of my ultimate nerds

Previously, I blogged about “The Ultimate Nerd.”  In this post, I will introduce you to some people who make my list of Ultimate Nerds.  The only rule I made for my list is that I’m not writing about people I know personally.  I’ve done this to force myself into doing some research and thinking through the reasons why each of these people has made the list.  There were people who I initially thought were no-brainers that came off and people I added much to my own surprise.  It will be interesting to come back to this list in a few years and see how things have changed.

 

Bjork in Jalisco. (Photo credit: Wikipedia)

Bjork
Fans of Bjork won’t be surprised to see her on my list.  She left “has a good beat and you can dance to it” somewhere in the last millennium in favor of risking weird crazy experiments that don’t always work out but can be quite inspiring when they do.  Whenever I read about her projects, it’s not just about her singing or coming out with a new album, it’s about some new technology she’s exploring for making her music.  ReacTable?  Check!  Tesla Coil?  Check!  Swinging pendulums?  Check! Check!!  This is someone who sees technology and pushes it through music.  In my dream of dreams, she and Jack White have a love child.  But that’s another blog post.

Charles Joseph Minard
There are so many reasons why Minard makes my list of ultimate nerds.  Aside from being a pioneer in the field of data visualization, he’s a study in sticking with what you want to do even if it’s not something those around you care about or immediately understand.  He earned his living as a civil engineer and retired as a superintendent for a school of roads and bridges at a French university. It was only during his retirement that he started producing visualizations.

Charles Minard's 1869 chart showing the losses...

Beyond the fact that his visualization of Napoleon’s Russian campaign is groundbreaking work, this is something he created at seventy-eight years old. There is so much focus in tech on completing your most important work in your twenties or maybe early thirties at a stretch, and it’s bullshit.

While Minard could have chosen any number of subjects for his visualizations, he chose to visualize the loss incurred by war.  Looking at his flow map of Napoleon’s Russian campaign, the horrific losses of troops are immediately visible.  Minard has set an example of visualization as humanitarianism that I intend to follow with the open data available on today’s web.

 

Eric Schmidt
In addition to being the former CEO of Google, Eric Schmidt wrote the foreword to one of the few “business books” that managed to hold my attention, Artful Making.  You can see Chris McMahon’s blog about it here.

Recently, James Whittaker wrote a post about why he quit Google.  In the post he talks about the culture of innovation that Schmidt fostered at Google and it’s right in line with what I read in Artful Making.  In the testing world, Google has recently gone through an exodus of A-talent in testing.  It’s quite telling that a good number of the folks whose talks I most enjoyed at the last Google Test Automation Conference have quit the big G.

By all appearances, the company is undergoing a lot of change and seems to be embracing more of a top-down management.  Seeing this change and looking at the “leadership” I’ve experienced in my own career has shown me that it is much more challenging to embrace the path of trust and letting smart people do what they will than it is to throw on the black turtleneck and go all Steve Jobs on people.  Screaming may have worked for Steve Jobs, but those who decide to follow his leadership path should take a hard look in the mirror and ask themselves 1. Am I really that brilliant?  2. Am I living in the same context?  (Hint: the answer is NO.)

In my own work life, I often hear ideas that I myself would not reach for.  I work at saying, “let’s see if this will work out” instead of “you are freaking crazy.”

Eric Schmidt makes it onto my nerd-of-nerds list because he had the audacity to hire smart people, trust them and let them go.  Look at what they built. The ideas about software and software testing that came out of Google under Eric Schmidt’s leadership changed me and my career forever. I suspect I am not alone.  We need more leadership like this in software and software testing.

 

Fernanda Viégas and Martin Wattenberg
These two people are a dream team of data visualization.  Although they both do solo work, they work mostly as a team.  In fact, Google hired them as a team and they lead Google’s “Big Picture” visualization research group.  Although their work is always gorgeous, it’s also thought provoking and always has a solid basis in data.  Their first collaborative project, a visualization of Wikipedia, highlighted the controversy amongst some of the pages which, to be honest, I’d never stopped to consider before.

abortion on wikipedia

http://www.research.ibm.com/visual/projects/history_flow/gallery.htm

The artist statement on their website reveals how connected they are to what they do.  Although it’s worth reading the whole thing, (it’s not that long, actually), here are a few of my favorite bits:

“…our artwork complicates and subverts a tool that is largely used by the business and military elite. Unlike these traditional uses, we believe visualization to be an expressive medium that invites emotion.”

“Eventually we start to ask questions that can’t be answered by direct observation.”

For me, this team is an example of being open about collaborative work they do to move technology and mankind forward.

 

Ward Cunningham
There are a few reasons why Ward Cunningham makes my list of ultimate nerds.  He’s the father of the wiki which is a tool I consider mandatory to be effective for exploratory testing and can also be a framework for automated tests.  He helped lay the groundwork for design patterns.  He was also involved in the writing of the Agile Manifesto which is something I frequently reach for to remind myself about the human aspects of software.

Currently, Cunningham is a fellow in the “Code for a Better World” program at Nike where he oversees the Smallest Federated Wiki project.  This is a really neat project as the focus seems to be creating a community of wikis that can talk with each other.  It is also completely open and available on GitHub.  You can look back through some of the closed issues and see the constructive way in which Ward engages contributors to the project.

Part of the skill for maintaining longevity in a tech career seems to be the ability to simultaneously have a vision but also the ability to break that vision down into pieces small enough to implement.  Cunningham’s engagement with the idea of a wiki over time has shown me what this looks like.

 

So that’s my list of Ultimate Nerds.  In these people, I see what I wish for myself reflected back at me.  There are themes of collaboration, creativity, experimentation and longevity.  It’s easy to get burned out in this industry, but we are surrounded by fantastic mentors and role models if we choose to seek them out.

Last Fall, I ran a half-marathon.  Everything after mile 8 happened through a wall of exhaustion, but I stuck it out.  When I made it to the finish line, there were people lined up outside of the barricades on the street, cheering all of the finishers on.  I ran over to one side and got high-fives from anybody who would give me one, and ended the race feeling ecstatic.  When I feel burned out or when I just don’t know where I’m going with my tech career, it will be easy enough to picture these people lined up and ready to give me a high-five.  Now, there’s a visualization.


Tilt Visualization and CSS Performance

Facebook City

Today, I’m blogging from the HTML5Dev Conference.  The house is packed, and I’m breaking out of my shell as a tester to have a look at all of this from the dev perspective.  Of course, the tester in me has tagged along.  What’s great about conferences in the age of twitter is that you are never in a bubble during these things.  One of my co-workers, Greg Koberger, tweeted about some performance testing win for the latest Firefox release, Firefox 7.

Greg's tweet

In taking a look at the lifehacker article he tweeted, I noticed that there was a category for css performance tests.  Since I spend all day, every day looking at selenium tests that have css locators and I sit next to a css dev (who has excellent taste in rap), css has been on the brain lately.  “Hmm…css performance…LET’S GOOGLE.”  If you google for css performance, the first link is for an article titled Performance Impact of CSS Selectors, by Steve Souders.  Granted, the article is talking about Firefox 3.0 so it’s obviously a bit long in the tooth, but it interests me for not one, but two reasons:

  1. I’m waiting for the talk “High Performance HTML5” to begin.  It’s being given by Steve Souders.  (Serendipity or Coincidence?  I shall leave you, dear reader, to ponder.)
  2. The article has a list of different websites and the number of DOM elements in each one.

Since this is a guerilla blog post I’m finishing up as a talk starts, I have no intention of delving deeply into the guts of the article at the moment.  What I can share, however, is a new Firefox addon I’ve been playing with called Tilt.  The picture you see above is my Facebook page, turned on its side in Tilt.  Tilt visualizes a web page’s DOM elements in 3D and was developed by a Mozilla intern who has recently been hired.  So if we filter the list in the article, Google has the fewest elements while Facebook has the most.  It should be no surprise that the Google home page has had its performance tweaked to oblivion.  Here we can compare the DOM of Google visually with the DOM of Facebook.

Facebook:

Facebook Tilt

 

Google:

Google tilt

 

I’m definitely forming a hypothesis about the CSS performance of these two pages based on their tilt results.
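If you want numbers to go with the pictures, here is a minimal sketch in Python that counts elements and measures the deepest nesting in a chunk of HTML.  It only reads static markup with the standard library’s html.parser, so it is a rough stand-in for what Tilt shows (Tilt works on the live DOM in the browser), and the SAMPLE string is something I made up for illustration.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not deepen the tree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomStats(HTMLParser):
    """Counts elements and tracks the deepest nesting level seen."""

    def __init__(self):
        super().__init__()
        self.element_count = 0
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.element_count += 1
        if tag in VOID_TAGS:
            # A void element sits one level below its parent but has no children.
            self.max_depth = max(self.max_depth, self.depth + 1)
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def dom_stats(html):
    """Return (element_count, max_depth) for a string of HTML."""
    parser = DomStats()
    parser.feed(html)
    parser.close()
    return parser.element_count, parser.max_depth


# Made-up sample markup, not taken from either site.
SAMPLE = "<html><body><div><p>hi<br></p></div><div></div></body></html>"
print(dom_stats(SAMPLE))
```

Feeding it the saved source of two pages would give a crude element count and depth to set next to the Tilt screenshots.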

And that’s your guerilla blog post from the HTML5Dev  Conf.   They really ought to make this thing 2 days next year.  It’s pretty cool.

Continuous Deployment and Data Visualization

Mozilla-firefox-usage-data

Image via Wikipedia

A phrase I hear a lot around Mozilla is “continuous deployment.”  I hear there’s this product Mozilla makes that’s competing with some other product that has rapid release cycles.  So, yeah, we’re working on continuous deployment.

 

I’ve noticed that a main resource around our office for information about continuous deployment is this video from Etsy.  Hearing, “We’re moving to continuous deployment,” is nothing new for me.  This is the 2nd job I’ve had where it’s been a major focus.  Since I’ve  heard of the Flickr version, I decided to watch this Etsy video.

 

Picture yourself at your computer about to hit the big button and deploy a feature you’ve been working on.  You are fairly confident that nothing catastrophic will happen, but you don’t know.  (I’m writing this from a dev perspective, but even if you’re a tester…come on…you never know, even if you’ve tested the hell out of something).  In the talk, this is what is referred to frequently as, “the fear.”  It’s actually referred to as either, “THE FEAR” or “the fear.”

 

“Fear for startups is the biggest no-no.”

“Fear is what keeps you from deleting your database.”

“Fear doesn’t go with creative work.”

 

This rings true for me because I frequently deploy selenium tests for addons.mozilla.org.  My teammates and I have talked about “THE FEAR.”  We have strategies for coping with it such as holding one’s breath, saying a prayer or running the 90+ tests one more time.  When Etsy talks about “The Fear” I know exactly what they mean.

 

Etsy’s video fascinates me because of how they have conquered “The Fear.”  It’s been on my mind every day since I watched the video.  What’s the special-continuous-deployment-sekrit-sauce-that-makes-everything-all-better?

 

Etsy combats “the fear” with visibility.  You see, at Etsy, EVERYTHING IS GRAPHED ALL THE TIME.

 

Here are some of the things they mentioned graphing in the video:
How many visitors are using this thing?
Can we deploy that to 100%?
Did we make it faster?
Did I just break something?
How long is it taking to generate a page?
How many users are logged in?
How is the bandwidth?
What’s the database load?
What’s the requests per second?

 

If you look at the graphs, they are simple bar or line graphs.  They are not exceptionally fancy but they are numerous and the maintenance admittedly takes work.  They are not, however, maintained by specialists working in a silo.  The graphs are created by an engineer.  Here are some numbers:

 

20,000 lines/second is their log traffic, at times
16,000 is the number of “metrics” they have organized through dashboards
25 engineers committing code to dashboards
20 dashboards

 

I doubt that when Etsy decided to start graphing everything they woke up one day with 25 dashboards.  It sounded very much like they put the tools in the developers’ hands and lovingly nudged them along.

 

This is a serious commitment to data.
Data doesn’t just happen.  It takes a persistent effort to include log messages in your code. It takes servers and databases capable of handling the traffic created by the log messages and staff to maintain them.  It takes investing in huge monitors all around the office and giving people the bandwidth to figure out how to work with the data & graphics stack.  Most importantly, it takes trust so that employees are allowed to see the data without making them jump through hoops.

So how can a team move closer to the graphing part of continuous deployment?
According to Etsy:

  • Give people access to production data — without making them wait months for a special password or even log in every time.
  • Make the data real time instead of daily.  When I say access, I mean feeds.  This goes well beyond a spreadsheet.
  • Create copious amounts of log messages.  If someone clicks a link, goes full screen or downloads something…log it.
  • Once you have the data, make graphs for features before you release them.
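To make the “log it, then graph it” idea concrete, here is a small sketch in Python.  It is not Etsy’s tooling; the event names, the addon field and the timestamps are all hypothetical.  It writes one JSON log line per event and then rolls the lines up into per-minute counts, which is the shape of data a simple line graph wants.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def log_event(name, timestamp, **fields):
    """Return one JSON log line for an event such as a click or a download."""
    record = {"event": name, "ts": timestamp.isoformat(), **fields}
    return json.dumps(record, sort_keys=True)


def counts_per_minute(log_lines):
    """Aggregate log lines into (minute, event) counts ready for a line graph."""
    counts = Counter()
    for line in log_lines:
        record = json.loads(line)
        minute = datetime.fromisoformat(record["ts"]).strftime("%H:%M")
        counts[(minute, record["event"])] += 1
    return counts


def at(minute, second):
    """Helper for building made-up timestamps on an arbitrary day."""
    return datetime(2011, 1, 1, 12, minute, second, tzinfo=timezone.utc)


# Hypothetical events; in real life these would stream in from production.
LINES = [
    log_event("download", at(0, 5), addon="tilt"),
    log_event("download", at(0, 40), addon="tilt"),
    log_event("fullscreen", at(0, 59)),
    log_event("download", at(1, 2), addon="tilt"),
]
COUNTS = counts_per_minute(LINES)
for (minute, event), n in sorted(COUNTS.items()):
    print(minute, event, n)
```

The point of keeping each log line as structured JSON is that nobody has to parse free text later to answer a new question.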

 

I love data, but will be the first to admit that it is not pretty.  The plain truth about data is that it takes patience because combing through and refining it can be tedious, monotonous work.  It is very easy to buy a bunch of monitors and put them on a wall showing an inst-o-matic graph that came with your bug tracker (I’ve seen this done.  O hai, expensive wallpaper!).  It takes more time to ask deeper, meaningful questions.  It takes even more time to filter the data into something graph-able.  After that, you have to find the right way to share it.  Note that even if you do all of this and the data successfully tells a story, you’ll have to spend time dealing with, “and why did you use those colors.”  What was I saying? Oh yes, data is not pretty.

Now that I’m working every day with tests I visualized a couple of years ago, I’m continuing my quest for deeper questions about tests.  In my context, the tests are the selenium tests I work with day in and day out, so besides coming to grips with “THE FEAR,” I’ve also been thinking about, “THE FAIL.”  But wait!  That’s another blog post.

 

If you want to read more about Etsy’s graphs and data, they have written their own post about it.


Paradigm Shifts: A Year of Getting the Visualization Stack in Order

Kuhn used the duck-rabbit optical illusion to ...
Image via Wikipedia

In which you learn why Marlena has so woefully neglected her blog.

What a heavy, freaking year it’s been.  Considering that I moved to a different hemisphere, that’s no surprise, but I’m not even talking about the move itself.  One of my goals for the year that I don’t think I ever articulated even to myself was that I wanted to work on a paradigm shift for my own approach to visualizing data.  My effort to write a treemap application in Processing with Java was the last straw.  I guess my experiments with Erlang and Scheme corrupted me.  They showed me that there is a way to break free of the loops within loops within loops.  Aside from language choice, I had a “Gone with the Wind” moment of deciding that I would never use a spreadsheet as part of a visualization process again.  I don’t want to be stuck with static data forever.  It’s time to get closer to working with real time feeds as they are the best way to suck in extremely large amounts of data.  The sum total of these decisions has been a year spent building new skills.  I’ve learned how difficult that can be in the midst of a new job that requires my full attention, growing Weekend Testing in my new corner of the planet and enduring my husband’s experiments with Australian cuisine (He doesn’t read my “nerdy” blog, so don’t y’all tell him I said that.)

Part I of the Epic Visualization Quest:  A language (or two)

For most of this year, I’ve been on a quest for a new language.  I tried on Python and attended Pycon which happened to take place in Atlanta a couple of weeks before I moved.  I’ve done a lot of work with ruby which felt more comfortable for me than python (who knows why, I certainly can’t explain these things.)   At the end of the year, I started learning javascript.  When I predicted, at the beginning of 2010, that functional programming would show up on my doorstep, I had NO idea that javascript is a highly functional language.  This really hit me hard when I sat down to write a javascript program with a colleague of mine and we both stared at the screen for 5 minutes before uttering a bunch of sentence fragments that went something like, “well you need a class…”  (ain’t no classes in javascript.)

I’m a fan of not just learning a language, but of understanding the headspace of that language.  This makes it harder to get started, but ultimately means that I won’t be trying to force java concepts that don’t belong into a javascript program.  I’ve also tried to understand which parts of javascript I might not want to use.  David Burns, The Automated Tester, suggested I give Douglas Crockford’s JavaScript: The Good Parts a read.  I’m halfway through, and while it’s not as hands on as some programming books I’ve read, it’s showing me the headspace I should be in to take better advantage of what JS has to offer.  It’s taking me some time, but I have more confidence that what I write will be better code.

Part 2 of the Epic Visualization Quest: Data Access

Most of the experiments I’ve done with data viz have involved spreadsheets, comma delimited data or tab delimited data.  I’m completely over using all three.  I can’t tell you how much time I spent schlepping data files from application to application in order to get my data in good enough shape to import into a visualization app.  Since the files were usually pretty big this turned into going and getting some coffee while Excel would open the file.  It was SO annoying.  When I attended the Writing About Testing conference in May, Chris McMahon did a short presentation on REST and it opened my eyes.  Over the rest of the year, I gradually built up my knowledge of REST and JSON which culminated in an example Ruby script you can use to pull data from JIRA, the Atlassian Issue tracker.

Part 3 of the Epic Visualization Quest:  A Visualization Library

Just as important as choosing a language is choosing a graphics library.  The 2 major libraries used with javascript, specifically for data visualization are Processing.js and Protovis.  Previously, I’ve worked my way through all of the examples in Ben Fry’s “Visualizing Data.” This was the book that initially introduced me to data visualization and convinced me that I needed to read everything by Edward Tufte.  Since Ben Fry is one of the creators of the processing language, the code in the book is processing and java.   This makes processing.js a no-brainer, but then I took a look at protovis.  I’m so intrigued with their example of a parallel coordinate plot that I have to give it a try.  I also think that their syntax will be slightly easier to use.

This has been a lot of change on top of change to digest and it’s made the year frustrating.  I am still horrible at writing javascript, but I’m also determined to be patient.  Good visualization takes time.  It is all about details and refinement which requires patience.  This patience means that my blog will probably continue to suffer but I’m hoping it also means I’ll have my visualization stack in order which will lead to better focus for 2011.

Btw…next Weekend Testing is on Sunday, January 23.  This month we’ll be pushing further into critical thinking.


If I blog about visualizing defects in JIRA it means I will do it.

@woodybrood's tweet

Sometimes peer pressure is a good thing.  Today I got an unexpected tweet from Daniel Woodward a.k.a @woodybrood asking about a FedEx project where I visualize JIRA issues.  I’ve now done 2 FedExes.  In my first one, I collaborated with Anna Dominguez to create a network graph based on comments in JIRA and Confluence. It was a map of who was commenting on whose issues and pages.  I wrote a blog post about that one here.

My second FedEx was an attempt at visualizing code churn as a horizon graph using data from Atlassian’s source code analysis tool, Fisheye.  It turned out to be one of my more unsuccessful attempts at visualizing.  I couldn’t get code churn data that really meant anything and I realized that code churn should not be visualized as a horizon graph when I started putting the graphic together.  (That was painful.)

So, Daniel, the short answer to your question of whether I have a great way to visualize defects in JIRA is:  not really.  I have put JIRA issues into a treemap before, but I wouldn’t recommend that either.

Where does that leave things?

It leaves me feeling a little frustrated but still curious.  I refuse to believe that the data from bugs is not to be visualized.  My gut tells me that I just haven’t found the right questions to ask or the right style to use for visualization.  I do have one more trick up my sleeve before I’m completely out of ideas.  Here are the pieces:

I have, in the past, visualized counts of fake defects with the parallel coordinate plot software, ParallelSets.  I already hear screams about visualizing bug counts so at least let me explain before you flame me.  When I did this, I was very happy with the results.  However, the version of the software I was using was not robust enough for me to use it on a regular basis.  It’s been updated to be more robust with data, but the newer version has the side effect of not showing the data points individually, plus it only works on windows and I’m more into mac.  When it was working for me, I really loved it.  They really nailed the interaction between the user and the data.  It really pains me to say it, but I’ve moved away from using parallel sets for now.

What I noticed a while ago is that the visualization language protovis has an example of parallel coordinate plots.  I encourage everyone who is interested in visualization to play with protovis.  It’s from a group at Stanford and uses javascript.  I’ve followed the first tutorial for protovis published by Dr. Robert Kosara, and it’s pretty cool.

So I’ve got my visualization idea.  How do we get it out of JIRA?  In the past I’ve gotten information about JIRA defects into a spreadsheet and maybe also csv.  One of the reasons I liked Atlassian so much before they hired me was that data exports pretty cleanly from JIRA. There is no way to overstate how much easier that makes visualization.  Unfortunately, once I got the data out, it did not work in a treemap.  Even though getting the data out through the UI is not that bad, I’d like to try something I’ve done some fiddling around with in the past year:  REST.

JIRA has just released 4.2 which contains a new version of their REST api, and I love the documentation they have for it.  It makes working with the api extremely accessible, much like the docs on twitter’s api.  They’ve got curl examples and a script you can use to make a graph of links.  They also have a page for simple REST examples which is not completely filled in.

Here’s what I’m gonna do:

Use the new JIRA REST api to create a data set in Ruby to be used for creating a parallel coordinate plot of JIRA issues.  If I’m lucky, I’ll get 20% time to do this.  I know some guys who, I’m guessing, will help me get the example in ProtoVis to work with the JIRA data.  I bet I can have something together by next month.  The goal would be to provide a ruby example for the page on “The Simplest Possible JIRA REST examples” along with some javascript that shows the data in a parallel coordinate plot using protovis.
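The data-shaping half of that plan might look something like the sketch below.  It is in Python rather than Ruby, and it does not call JIRA at all: the payload is a simplified, hypothetical stand-in I wrote by hand, so the key names will need adjusting to whatever the real api returns.  What it shows is the step of flattening nested JSON into one row per issue with one value per axis, which is what a parallel coordinate plot needs.

```python
import json

# Hypothetical, simplified payload. The real JIRA REST response is shaped
# differently; adjust the key lookups to match what the api returns.
SAMPLE_RESPONSE = """
{"issues": [
  {"key": "DEMO-1", "fields": {"priority": "Major", "status": "Open", "component": "UI"}},
  {"key": "DEMO-2", "fields": {"priority": "Minor", "status": "Closed", "component": "Database"}},
  {"key": "DEMO-3", "fields": {"priority": "Major", "status": "Closed"}}
]}
"""

# Each dimension becomes one vertical axis in the plot (names are assumptions).
DIMENSIONS = ["priority", "status", "component"]


def issues_to_rows(payload, dimensions, missing="Unknown"):
    """Flatten issues into one dict per issue, one key per plot axis."""
    rows = []
    for issue in json.loads(payload)["issues"]:
        fields = issue.get("fields", {})
        row = {"key": issue["key"]}
        for name in dimensions:
            # Fill gaps explicitly so every row has a value on every axis.
            row[name] = fields.get(name, missing)
        rows.append(row)
    return rows


ROWS = issues_to_rows(SAMPLE_RESPONSE, DIMENSIONS)
print(json.dumps(ROWS, indent=2))
```

Dumping the rows back out as JSON means the javascript side can load them directly, with no spreadsheet in the middle.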

Readers are officially allowed to hassle me about finishing this before Christmas 2010 on twitter and in the comments here.

Visualizing Defect Percentages with Parallel Sets

Prof. Robert Kosara’s visualization tool, Parallel Sets (Parsets) fascinates me. If you download it and play with the sample datasets, you will likely be fascinated as well. It shows aggregations of categorical data in an interactive way.

I am so enamored with this tool, in particular, because it hits the sweet spot between beauty and utility. I’m a real fan of abstract and performance art. I love crazy paintings, sculptures and whatnot that force you to question their very existence. This is art that walks the line between brilliant and senseless.

When I look at the visualizations by Parsets, I’m inclined to print them off and stick them on my cube wall just because they’re “purty.” However, they are also quite utilitarian as every visualization should be. I’m going to show you how by using an example set of defects. Linda Wilkinson’s post last week was the inspiration for this. You can get some of the metrics she talks about in her post with this tool.

For my example, I created a dataset for a fictitious system under test (SUT). The SUT has defects broken down by operating system (Mac or Windows), who reported them (client or QA) and which part of the system they affect (UI, JRE, Database, Http, Xerces, SOAP).

Keeping in mind that I faked this data, here is the format:

DefectID,Reported By,OS,Application Component
Defect1,QA,MacOSX,SOAP
Defect2,Client,Windows,UI
Defect3,Client,MacOSX,Database

The import process is pretty simple. I click a button, choose my csv file, it’s imported. More info on the operation of Parsets is here. A warning: I did have to revert back to version 2.0. Maybe Prof. Kosara could be convinced to allow downloads of 2.0.

I had to check and recheck the boxes on the left to get the data into the order I wanted. Here is what I got:

See the highlighted defect.

So who wants to show me their piechart that they think is perfectly capable of showing this??? Oh wait, PIE CHARTS WON’T DO THIS.  Pie Charts can only show you one variable.  This one has 4.

This is very similar to the parallel coordinate plot described by Stephen Few in Now You See It and shows Wilkinson’s example of analyzing who has reported defects. She was showing how to calculate a percentage for defects.  See how the QA at the top is highlighted?  There’s your percentage.  Aside from who has reported the defects, Parsets makes it incredibly easy to see which OS has more defects and how the defects are spread out among the components.  If I had more time, I would add a severity level to each defect.  Wouldn’t that tell a story.
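For anyone who wants the percentages as plain numbers, here is a minimal Python sketch that computes them from the same csv format shown above.  It uses only the three fake sample rows from this post, and the function name is my own invention.

```python
import csv
import io
from collections import Counter

# The three fake sample rows from the format shown earlier in this post.
SAMPLE_CSV = """DefectID,Reported By,OS,Application Component
Defect1,QA,MacOSX,SOAP
Defect2,Client,Windows,UI
Defect3,Client,MacOSX,Database
"""


def percentages(csv_text, column):
    """Return {category: percent of all defects} for one categorical column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row[column] for row in rows)
    total = len(rows)
    return {value: 100.0 * n / total for value, n in counts.items()}


BY_REPORTER = percentages(SAMPLE_CSV, "Reported By")
BY_OS = percentages(SAMPLE_CSV, "OS")
for name, pct in sorted(BY_REPORTER.items()):
    print(f"{name}: {pct:.1f}%")
```

This gives one column at a time.  What Parsets adds is seeing how those slices flow into each other across all the columns at once.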

Parallel Sets is highly interactive.  I can reorder the categories by checking and unchecking boxes.  I can remove a category by unchecking a box if I wish.

I took away the individual defects.

By moving the mouse around, I can highlight and trace data points.  Here I see that Defect 205 is a database defect for Mac OS X.  Although I didn’t do it here, I bet that I could merge the Defect ID with a Defect Description and see both in the mouse over.

See the highlighted defect.

Parallel Sets is still pretty young, but is just so promising.  I’m hoping that eventually, it will be viewable in a browser and easier to share.  Visualizations like this one keep me engaged while providing me with useful information for exploratory analysis.  That’s the promise of data viz, and Parallel Sets delivers.

Underpants Gnomes Among Us: Exploratory Analysis for Visualization and Testing

Here’s a picture of tester dog, Laika, with Dr. James Whittaker’s new book, Exploratory Software Testing: Tips, Tricks, Tours, and Techniques to Guide Test Design. It showed up on my doorstep last week, and is my first free testing book ever (thanks Dr. Whittaker!)

i can haz testr buk.

Tester Dog

In reading through Stephen Few’s new book, Now You See It, I came across a completely separate perspective of looking at graphics in an “exploratory” manner. I can literally hold a book preaching the value of “exploratory testing” in one hand and a book preaching the value of “exploratory analysis” in the other. They are the same concept. If you have ever wondered what interdisciplinary means, this is a great example of an interdisciplinary concept.

Stephen Few does a great job of explaining exploratory analysis with pictures:

where's the profit?

Exploratory Analysis

Half of the people reading this now understand the underpants gnome tie-in. For those who don’t get it, here’s a link to the original South Park clip (NSFW).

Jokes aside, I’m going to start with the picture, discuss what this says to me about testing and see if it meshes with JW’s definition of exploratory testing. I will then look at how this applies to visualization. At the end, the two will either come together or not. At this point, I’m not sure if they will. I’ll just have to keep exploring until I have an answer or a comment telling me why my answer is crap (which is fine with me if you have a good point).

Starting with the picture and testing. I’m assuming the “?” means “write tests.” The eyeball means analyze. The light bulb is the decision of pass or fail. The illustration of directed analysis looks like the process HP Quality Center assumes. QC assumes you’ve primarily written tests and test steps before testing based on written requirements. Then you test. After you’ve tested, you have an outcome.

The second line for “exploratory” analysis looks like a much more cognitive and iterative process. This says that the tester has the opportunity to interact with the system-under-test (SUT) before formulating any tests (eyeball). After playing with the SUT, the tester pokes it with a few tests (“?”). At this point the tester may decide some stuff works and keep poking or decide that some stuff has failed and write defects (light bulb). Chapter 2 of Exploratory Testing describes how JW defines exploratory testing: “Testers may interact with the application in whatever way they want and use the information the application provides to react, change course and generally explore the application’s functionality without restraint (16).” So far this is looking very similar.

Now that I’ve looked at how the exploratory analysis paradigm applies to testing, here’s how it applies to visualization. As an example visualization, I’m looking at a New York Times graphic, How Different Groups Spend their Day. When I open this graphic, I can see that it’s interactive, so I immediately slide my mouse across the screen. I notice the tool tips. Reading these gets me started reading the labels and eventually the description at the top. Then I start clicking. The boxes on the top right act as a filter. There is also a filter that engages when a particular layer is clicked.

Few’s point in describing directed analysis vs. exploratory analysis is that in the wild, when we look at visualizations, we use exploratory analysis. It’s not like I knew what I was going to see before I opened the visualization. Few describes the process known as “Shneiderman’s mantra” (for Ben Shneiderman of treemap fame) in more detail saying that we make an overall assessment (eyeball), take a few specific actions (“?”), then reassess (eyeball). Although Few doesn’t say that there is a decision made at some point in this process, I’m assuming there is because of the light bulb in the picture (84).

Recently, Stephen Few asked for industry examples of people using visualization to do their work. Some of the replies were from the airline industry, a mail order warehouse and a medical center. Software engineers should be included in this mix and, judging from the treemap of Vista code complexity on page 130 of JW’s book, already are. Given that both use the same form of exploratory analysis, I can see why.

Exploratory analysis in software testing and in visualization diverge, however, when you look at the scale of data for which each is effective. Visualization requires a large dataset. This could be multiple runs of a set of tests or, as in JW’s example, analysis of large amounts of source code. Exploratory testing as JW describes it can occur at a high level, as in the case of a visualization, or at the level of an individual test.

One thing my exercise has shown me for sure is that I have to read more of Exploratory Testing.

Horizon Graphs for Code Churn

My advisor has asked that I do some posts as a journal for my software visualization work this semester. Some will hopefully come to fruition before the end of the semester. Do or die time, right?

The first idea that I’ve tweeted about, talked about, cannot shut up about is using a horizon graph to depict code churn. Currently, most of my work has centered around treemaps, but they have a weakness. They only depict one moment in time. Since code churn is concerned with the number of changes over time, a treemap will only show you code churn up to a certain moment in time.
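To make the “changes over time” part concrete, here is a minimal sketch of turning a change log into one churn series per component. Everything in it is an assumption for illustration: the sample records are made up, and churn is taken to mean lines added plus lines deleted per month, which is only one of several possible definitions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical change records: (component, commit date, lines added, lines deleted)
CHANGES = [
    ("parser", date(2009, 1, 5), 120, 30),
    ("parser", date(2009, 1, 20), 15, 40),
    ("parser", date(2009, 2, 3), 200, 10),
    ("ui", date(2009, 1, 11), 5, 5),
    ("ui", date(2009, 3, 2), 60, 90),
]


def churn_by_month(changes):
    """Sum added + deleted lines per component per (year, month)."""
    series = defaultdict(lambda: defaultdict(int))
    for component, day, added, deleted in changes:
        series[component][(day.year, day.month)] += added + deleted
    # Plain dicts with months in chronological order, one series per component
    return {name: dict(sorted(months.items())) for name, months in series.items()}


if __name__ == "__main__":
    for name, months in churn_by_month(CHANGES).items():
        print(name, months)
```

A treemap would draw only one month of these series at a time; a graph over time would draw each component’s whole series as a row.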

This example horizon graph on Stephen Few’s blog shows the stock price performance for 50 stocks over 1 year. Greater color saturation indicates a larger change, whether up or down. The hue indicates whether the change is positive (blue) or negative (red).
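The encoding described above can be sketched as a small function that folds a signed value into stacked bands. This is my own rough reading of how a horizon graph works, not Panopticon’s implementation; the function name, the band size and the three-band default are all assumptions.

```python
def horizon_bands(value, band_size, num_bands=3):
    """Fold a signed value into stacked bands.

    Returns (sign, fills). sign picks the hue (positive or negative).
    fills[i] is between 0 and 1 and says how much of band i is covered;
    each higher band would be drawn in a more saturated shade of the hue.
    """
    sign = "positive" if value >= 0 else "negative"
    # Values beyond the top band are clipped rather than drawn off the chart
    magnitude = min(abs(value), band_size * num_bands)
    fills = []
    for i in range(num_bands):
        covered = magnitude - i * band_size
        fills.append(max(0.0, min(1.0, covered / band_size)))
    return sign, fills


if __name__ == "__main__":
    print(horizon_bands(25, 10))
    print(horizon_bands(-4, 10))
```

Because every band is drawn in the same strip of vertical space, a row stays short no matter how large the change is, which is what would let hundreds of components fit on one screen.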

Although the example is from Panopticon, I’m confident that Processing could produce similar results. The idea is really genius, and says very good things about Panopticon. I starred this a while ago in reader. Recently, Alan Page turned me on to some papers by an expert at Microsoft, Dr. Nachi Nagappan. I’ve been rooting around in some of them and noticed that code churn is mentioned again and again. I also noticed that he’s frequently working with millions of lines of code. I don’t know how many components that adds up to, but I’d like to see if the horizon graph would still be useful with hundreds of components, or if it would overwhelm the viewer.

If we’re going for a t.h.u.d, I think we’d have the treemap for the upper levels in a hierarchy of components. If the user wanted to investigate an area further, there would be a click through to something like the horizon graph.

Plagues aren’t just for blog posts

Vibrio cholerae with a Leifson flagella stain ...
Image via Wikipedia

For the past couple of months, James Whittaker has been writing about the “plagues of testing.” As he’s been posting, I’ve been reading through a book about a real plague.

As software testers, we see a system from a perspective that developers and business types rarely, and may never, see. We know our tests, and we know how well they ran. We know our system under test and which components are picky. If you are like me, and have access to the code base, you also know the code. In the case of both the system and its code, you know what should be better. Sometimes this is not a big deal, but sometimes it is a warning that it’s time to polish the resume. I have not seen this, personally, but I know that there are testers who find themselves in this position.

John Snow was a doctor in this position. He was an expert in the new art of anesthesiology in 1850s London. He was also deeply involved in the study of cholera and concluded that it was a waterborne illness. This was very much counter to the prevailing theory of the time that cholera was passed through a sheer volume of stench or miasma (I can hear Dr. Evil saying this word). Health authorities in London were so convinced that miasma, or extreme smelliness, was the reason for disease that they passed a law in 1848 requiring Londoners to drain their waste into a sewage system that would deposit into the River Thames. Unfortunately, the Thames was also a main source of drinking water for the city. Snow knew all of this and could see a health crisis in the making.

In 1854, when a major outbreak of cholera erupted in a neighborhood very close to where Snow lived, he conducted a thorough investigation in order to prove his theory that cholera was passed through water. He had a list that detailed the names and addresses of 83 people who had died from cholera. He also had an invaluable resource of detailed information about the neighborhood’s residents in the form of local clergyman Henry Whitehead. While Whitehead tracked down and questioned everyone he possibly could about their drinking habits, Snow analyzed this data to find patterns of who had been drinking the water, who had died and, just as importantly, who had NOT died. Not only were Snow and Whitehead able to convince the local parish board to remove the handle of a contaminated water pump, they knew their data so well that they were able to figure out the index, or original, case that had started the outbreak.

John Snow Map

After the outbreak had subsided, Snow put the analysis from his investigation together with a very famous map into a monograph that circulated among London’s health professionals.  His monograph slowly but effectively turned the tide of thinking among health professionals away from the miasma theory.  This and a very smelly Thames convinced authorities to build a new sewer system that drained into the sea.

Frequently, when I talk to people about data visualization, they ask me how I know what to visualize. The Ghost Map, by Steven Johnson, illustrates this perfectly. John Snow went from having a theory to proving his theory with visualization in a convincing way. There’s no wizard or easy-button for this one. It takes knowing what you are trying to say and knowing the data you are using to prove your theory, inside and out. For testers, this means digging into test runs and testcase organization. Where are the tests that failed? How many times did they fail? How are they grouped together? Does an object that makes 1 test fail make others fail as well? If you know which tests are failing, what do you know about the code you were exercising? How complex is it? I know this is awful, but I would so go here if I thought it made a difference…who coded it and do I respect their coding talent? Even if I think they are solid, did they know what they were supposed to be doing? You have to know exactly why you are telling the PM you think there is a serious problem and have a way to show it. I lump business people in with having the same short attention span as doctors and politicians. My blog post probably lost them at “James Whittaker.”
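The first few of those questions are the kind that a short script can answer before any visualization gets drawn. Here is a minimal sketch that counts failures per component; the test names, the components and the tuple layout are all made up for the example and do not come from any real test tool.

```python
from collections import Counter

# Hypothetical test results: (test name, component under test, passed?)
RESULTS = [
    ("login_ok", "auth", True),
    ("login_bad_password", "auth", False),
    ("login_locked", "auth", False),
    ("cart_add", "cart", True),
    ("cart_remove", "cart", False),
]


def failures_by_component(results):
    """Count failed tests per component, most failures first."""
    counts = Counter(component for _, component, passed in results if not passed)
    return counts.most_common()


if __name__ == "__main__":
    for component, failed in failures_by_component(RESULTS):
        print(component, failed)
```

Counts like these are the raw material; deciding which grouping actually tells the story still takes knowing the tests and the code.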

Snow and Whitehead could not drag the dead people’s relatives in front of London’s public health community and force the doctors to listen. These days doctors’ short attention spans are because of insurance, but I’m sure there were other reasons back in the day. A good visualization does not take a long time for the viewer to process. That is its special power. Snow’s map is much more concise than a table of 83 names and addresses along with their individual stories. Visualization can quickly show you groups of tests that are failing. It can show that severe defects are increasing, and not decreasing, over time. Business may drive the ship/do not ship decision, but a good tester will know why a seriously ailing system is in so much trouble. A great tester can effectively communicate this to a business team.


Submitted an Abstract to PNSQC

I’m posting the abstract I just submitted to PNSQC. It’s also the abstract of the thesis I’m writing for my Masters. I’ve submitted a poster to the Grace Hopper Conference, but never before have I submitted a full-on paper requiring a full-on presentation. I chose PNSQC for 2 reasons: the focus is more on the practical side, unlike some of the ACM conferences, and the conference is in Portland, Oregon. God, I love Portland.

Anyway, here is what I submitted:

Visualizing Software Quality

Moving quality forward will require better methods of assessing quality more quickly for large software systems. Assessing how much to test a software application is consistently a challenge for software testers especially when requirements are less than clear and deadlines are constrained.

For my graduate research and my job as a software tester, I have been looking at how visualization can benefit software testing. In assessing the quality of large-scale software systems, data visualization can be used as an aid. Visualizations can show complexity in a system, coverage of system or unit tests, where tests are passing vs. failing and which areas of a system contain the most frequent and severe defects.

In order to create visualizations for testing with a high level of utility and trustworthiness, I studied the principles of good data visualizations vs. visualizations with compromised integrity. Reading about these led me to change some of the graphs that I had been using for my QA assessment and to adopt newer types of visualizations such as treemaps to show me where I should be testing and which areas of source code are more likely to have defects.

This paper will describe the principles of visualization I have been using, the visualizations I have created and how they are used as well as anecdotal evidence of their effectiveness for testing.