Garrett Mayock's Blog


Learning Tableau (or, "viz viz viz viz viz Python viz")

Exploring a popular business intelligence tool.

  Garrett Mayock posted 2019-01-09 22:57:47 UTC

Check out my Tableau CV!

Learning to use Tableau follows the same path as any software, really - learn by doing. Because of that, I won't be too elaborate about what I've done in this blog, but rather direct you to the Tableau Work section of my website (and to my Tableau Public profile).

What is Tableau?

Tableau is a popular business intelligence tool used to represent data. It's a great tool for communicating information - once you learn something, Tableau makes it easy to communicate that knowledge using visualizations (or "vizzes" for short).

Tableau is a bit less capable when it comes to actually analyzing data, because it is weak at cleaning data. Their own training video on cleaning messy data is largely about using Excel and OpenRefine, and even has the teacher say dirty data is a "nightmare to deal with in Tableau". Furthermore, TabPy (their Python connector) isn't compatible with Tableau Public, so I couldn't use Python directly either (although I did manipulate some data using Python's SciPy module to get smooth graphs for my CV).

However, Tableau does have built-in capabilities to calculate new fields, aggregates, and other functions over data including statistics. It's also commonly referred to as the industry leader in data visualization technology.

I'll be adding to my site's Tableau Work section as I build more public vizzes over time, so be sure to keep checking it out!

How did you get those smooth graphs in the CV?

(Check out my GitHub repository for the code and input/output files.)

I wanted to graph my skills over time in my CV. However, I really only "knew" some relative skill levels at certain points in time - such as "I started using this program then" or "I went from not-so-skilled to fairly-skilled during whatever month I took a training on that". That meant I had a data source with a LOT of missing values.

There were a few options to fill those missing values.

The first option is easy in Excel, and the second isn't too hard, but all three of them are possible in Python fairly easily. Here's the code I used:

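import pandas as pd
import scipy  # not called directly; pandas needs SciPy installed for method='pchip'

# Read the sparse skill levels and index the rows by their parsed dates
df = pd.read_csv("C:/filepath/skills_input.csv", parse_dates=["Date"], index_col="Date")
# One row for every calendar day between the first and last dates
rng = pd.date_range(df.index.min(), df.index.max())
# Fill each skill column on the new daily rows using PCHIP interpolation
df = df.reindex(rng).interpolate(method='pchip')
# Write the date index out as the Date column so each row keeps its date
df.to_csv(r'C:/filepath/skills_output.csv', index_label='Date')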

Basically, it reads the CSV (which had sparsely populated values for the first of each month from 10/1/2010 through 2/1/2019), indexes the rows by the Date column, and then reindexes to create a row for every single day between the first and last dates. It then interpolates the column values (one column for each skill) for each day using the pandas wrapper of SciPy's pchip interpolator. Finally, it exports the result to a location I was able to use in Tableau.
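If you want to try the same idea without my files, here's a small self-contained sketch. The data is made up (two hypothetical skill columns over five months, not my real CV numbers), and it reads from an in-memory string instead of a file path, but the reindex-then-interpolate steps are the same ones described above.

```python
import io

import pandas as pd  # method="pchip" also needs SciPy installed

# Hypothetical sparse input: skill levels known only on some month starts
csv_text = """Date,Python,Tableau
10/1/2018,0,1
11/1/2018,,
12/1/2018,2,1
1/1/2019,,
2/1/2019,4,3
"""

# Parse the Date column and use it as the index
sparse = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# One row per calendar day between the first and last known dates
days = pd.date_range(sparse.index.min(), sparse.index.max())

# Reindex to daily rows (new rows are NaN), then fill them with PCHIP
daily = sparse.reindex(days).interpolate(method="pchip")

print(daily.head())
```

One reason pchip suits this kind of chart: it's a shape-preserving interpolator, so where the known points only go up, the curve between them only goes up too, and a flat stretch stays flat rather than dipping or overshooting.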

Fun, no?
