Data Driven Insights for Better Decisions

Blog

Visualization Matters

Developed by F.J. Anscombe in 1973, Anscombe's Quartet consists of four data sets that produce nearly identical means, standard deviations, and correlations, yet a scatter plot shows that each one is distinct.
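To see this for yourself, here is a minimal Python sketch that computes those summary statistics for each of the four data sets. The data values are the commonly reproduced ones from Anscombe's paper; the names `QUARTET`, `pearson_r`, and `summarize` are my own for illustration, and I'm using the sample standard deviation.

```python
from math import sqrt
from statistics import mean, stdev

# Data sets I-III share the same x values; data set IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return the summary statistics discussed in the post as a dict."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "sd_x": stdev(xs),  # sample standard deviation
        "sd_y": stdev(ys),
        "r": pearson_r(xs, ys),
    }


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(f"{name:>3}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
          f"sd_x={s['sd_x']:.2f} sd_y={s['sd_y']:.2f} r={s['r']:.3f}")
```

The four rows that print agree to about two decimal places, which is the whole point: nothing in this table hints at how different the scatter plots are.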

When we calculate summary statistics, we lose some information about the data we’re analyzing. A single number derived from a data set cannot capture all the information it contains. Other statistical measures characterize the shape of the distribution and offer a more complete understanding. For example, skewness is a measure of asymmetry, and it does not have the same value for each of these four data sets.
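As a rough illustration of the skewness point, the sketch below computes a moment-based skewness for the y values of each data set. The `skewness` helper and the `Y_VALUES` name are my own; other definitions of skewness (for example, bias-corrected ones) will give somewhat different numbers, so treat the output as indicative only.

```python
from statistics import mean

# y values of Anscombe's four data sets (commonly reproduced values).
Y_VALUES = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}


def skewness(values):
    """Moment-based skewness: third central moment over variance^1.5.

    Uses population (divide-by-n) moments; this is one of several
    common definitions, chosen here for simplicity.
    """
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


for name, ys in Y_VALUES.items():
    print(f"{name:>3}: skewness of y = {skewness(ys):+.3f}")
```

Even though the means and standard deviations of y are nearly the same across the quartet, the skewness values differ in both size and sign.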

Anscombe’s Quartet is a reminder of why it’s essential to use visualizations when exploring data and how summary statistics can mislead when used alone. Visualization provides a unique view that can make it much easier to discover interesting structures than numerical methods alone. It also provides the context needed for more accurate analysis and better choices.

Anscobe1March2020.jpg

Each of Anscombe’s examples illustrates the relationship between the two variables, but only one of them matches the story drawn from the summary statistics. Only data set I appears to be a well-behaved data set with a linear model.

Data sets III and IV demonstrate the effect a single outlier can have, especially when the sample size is small. Outliers can skew a data set in a way that is hidden in its statistical summary but readily apparent when the data is visualized. In these two examples, the box plots do a good job of highlighting the presence of an outlier.
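A box plot typically flags points that fall more than 1.5 times the interquartile range beyond the quartiles. Here is a small sketch of that rule applied to the y values of data set III and the x values of data set IV. The `tukey_outliers` helper is my own, and the choice of the "inclusive" quartile method is an assumption; plotting tools differ in how they compute quartiles, so borderline points may be flagged differently elsewhere.

```python
from statistics import quantiles

# y values of data set III and x values of data set IV
# (commonly reproduced values from Anscombe's quartet).
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]


def tukey_outliers(values, k=1.5):
    """Return the values lying outside the k * IQR fences."""
    # Quartiles via linear interpolation between data points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


print("Data set III, y outliers:", tukey_outliers(Y3))
print("Data set IV, x outliers:", tukey_outliers(X4))
```

In both cases a single point falls outside the fences, which is the same point a box plot would draw on its own beyond the whiskers.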

Anscobe2March2020.jpg

The folks at Autodesk Research have done great work in the field of visualization. Their post has some terrifically informative animated visualizations, and it makes a compelling case for using multiple visualizations to reach a complete understanding.

Dave Kinney