Tableau and Write-Back – Together At Last

Editor’s Note: Huge thanks to special contributor Gordon Rose for this blog post.

Tableau helps people see and understand their data – and guarantees that, in the process, it will never make any changes to that data. Tableau is a strictly read-only technology. However, many customers want the ability to modify the data that lies behind a Tableau visualization (Viz), and then see those changes immediately reflected in the Viz, make other applications aware of those changes, or both. With a small amount of supporting technology, Tableau’s read-only behavior can easily be integrated into so-called “write-back” use cases.

In this blog article, we’ll explore a way to do exactly that – one in which the write-back components are external to the Viz. An alternative approach is one in which those components are more tightly integrated into the Viz itself – that’s for a later blog article to explore. Ideally you will find that you can use one of these two approaches as a launching point for the development of your own write-back use case.



Pre-aggregating data with full drill-down

Have you heard this one before? “Just connect to your data in Tableau and start visualizing. Then you’ll publish and share with your whole organization.” It’s a great line, because it’s true. You CAN get started with analysis on top of just about any data in Tableau. But “can” is not “should”: what is possible may not be the BEST way, particularly if you want to scale up. When dealing with massive amounts of data, a better solution is to have two data sources: (1) a pre-aggregated data set for overviews, which I’ll call the Overview data source, and (2) the row-level data set, which I’ll call the Granular data source. Tableau’s abilities to filter between two data sources (actions & cross-datasource filters in Tableau 10) make this an excellent strategy, and one that I have seen massively improve performance over and over.
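
To make the two-data-source idea concrete, here is a minimal sketch of how the Overview table might be derived from the Granular one. It is not taken from the post itself: the table names, the columns (`order_id`, `region`, `order_date`, `sales`), and the sample rows are all assumptions for illustration, and an in-memory SQLite database stands in for whatever database you actually use. In Tableau, each table would be its own data source, with an action or cross-data-source filter playing the role of `drill_down`.

```python
import sqlite3

# Hypothetical sample rows: (order_id, region, order_date, sales)
SAMPLE_ROWS = [
    (1, "East", "2016-01-05", 100.0),
    (2, "East", "2016-01-20", 50.0),
    (3, "West", "2016-01-11", 75.0),
    (4, "East", "2016-02-02", 20.0),
]


def build_sources(rows):
    """Load row-level data, then derive a pre-aggregated overview table."""
    conn = sqlite3.connect(":memory:")
    # Granular data source: one row per order (assumed schema).
    conn.execute(
        "CREATE TABLE granular ("
        "order_id INTEGER, region TEXT, order_date TEXT, sales REAL)"
    )
    conn.executemany("INSERT INTO granular VALUES (?, ?, ?, ?)", rows)
    # Overview data source: one row per region and month.
    conn.execute(
        """
        CREATE TABLE overview AS
        SELECT region,
               substr(order_date, 1, 7) AS month,
               COUNT(*) AS order_count,
               SUM(sales) AS total_sales
        FROM granular
        GROUP BY region, substr(order_date, 1, 7)
        """
    )
    return conn


def drill_down(conn, region, month):
    """Fetch only the row-level records behind one overview mark."""
    return conn.execute(
        "SELECT order_id, order_date, sales FROM granular "
        "WHERE region = ? AND substr(order_date, 1, 7) = ? "
        "ORDER BY order_id",
        (region, month),
    ).fetchall()


conn = build_sources(SAMPLE_ROWS)
overview = conn.execute(
    "SELECT region, month, order_count, total_sales "
    "FROM overview ORDER BY region, month"
).fetchall()
detail = drill_down(conn, "East", "2016-01")
```

The overview query touches only the small aggregated table, while the row-level table is read only for the slice a user selects, which is where the performance benefit of this strategy would come from.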


Thoughts on MPP databases and Tableau

As the recent post on Vertica brings to light, even very high-performing systems sometimes need a little configuration to perform optimally with Tableau. One particular set of systems requires some extra thought and care. If you set off without any planning, expecting to combine Tableau’s ease of use with the speed of these systems, and end up staring at the “query executing” screen for 10 minutes, you may start to doubt everyone’s claims.

The systems I’m talking about are the Massively Parallel Processing (MPP) databases. There’s already a great explanation of them here, so I’m not going to go too deeply into how they work beyond what is relevant for Tableau. The MPP systems that Tableau supports are, in no particular order (and don’t get too angry if I get this a bit wrong):

  • Teradata
  • Vertica
  • Redshift
  • Netezza
  • Greenplum
  • Aster (although there is some Hadoop going on in the backend)
  • ParAccel