Building a Flexible Extract Generator using the Extract API

One of the least mentioned but most useful APIs in Tableau is the Extract API, which lets you create an Extract file programmatically (a Hyper file starting in version 10.5; a TDE file before that). Its main use case is data sources that require programmatic access rather than one of Tableau's native connectors. Some situations where this is useful:

  • Data coming from a Web Service / RESTful API with an object response
  • ODBC / JDBC drivers that Tableau cannot use
  • Additional programmatic modeling / statistical analysis against a whole data set

This post focuses mostly on the first use case: making data from some type of Web Service / RESTful API available in Tableau. In particular, if you need to expose only a subset of a large, flexible set of possible fields for “ad hoc” analysis, this technique is the most practical solution to the problem.
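Whatever the service, its nested response has to be turned into the flat rows an extract expects. The sketch below assumes a JSON response with its records under a `results` key (a hypothetical layout); `flatten_record` and `response_to_rows` are illustrative names, and using dotted column names and storing lists as JSON strings are design choices, not requirements.

```python
import json
from typing import Any, Dict, List


def flatten_record(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON objects into one level with dotted keys.

    Lists are stored as JSON strings here (an assumption; a real generator
    might explode them into child rows instead).
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse so {"customer": {"name": ...}} becomes "customer.name"
            flat.update(flatten_record(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def response_to_rows(response_body: str, records_key: str = "results") -> List[Dict[str, Any]]:
    """Parse a JSON response body and flatten every record under records_key."""
    payload = json.loads(response_body)
    return [flatten_record(r) for r in payload.get(records_key, [])]
```

For example, a record such as `{"id": 1, "customer": {"name": "Acme"}}` becomes the row `{"id": 1, "customer.name": "Acme"}`.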

When should I build a Flexible Extract Generator?

If:

  • You know the structure of your web service responses
  • The total number of fields is manageable
  • The web service responses will not change frequently
  • Workbooks are fully built out and will not allow web editing
  • The data source structure can be reused across multiple reports (and possibly customers)

then the better solution for Web Service / REST API based data sources is “Live” Web Services Connections in Tableau.

If instead you want to provide a selection screen that generates an Extract to power a Web Edit session, it makes sense to build a Flexible Extract Generator. This is particularly useful when the set of fields can change drastically from one extract to the next, or when other processing (such as machine learning) must be applied with differing parameters before the data reaches the end user. That said, if the actual output columns stay consistent, the “Live” Web Services solution could still work.
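The selection screen ultimately produces a list of field names, which the generator should validate before building anything. A minimal sketch, assuming a hypothetical catalog of fields the service can return (`FIELD_CATALOG`, `validate_selection` and `project_rows` are illustrative names, not part of any Tableau API):

```python
from typing import Any, Dict, Iterable, List, Sequence

# Hypothetical catalog of fields the web service can return; in practice this
# might come from the service's documentation or a metadata endpoint.
FIELD_CATALOG = {"id", "customer.name", "customer.region", "order.total", "order.date"}


def validate_selection(selected: Sequence[str], catalog: Iterable[str] = FIELD_CATALOG) -> List[str]:
    """Reject unknown fields and drop duplicates while keeping the user's order."""
    known = set(catalog)
    unknown = [f for f in selected if f not in known]
    if unknown:
        raise ValueError(f"Unknown fields requested: {unknown}")
    return list(dict.fromkeys(selected))


def project_rows(rows: Iterable[Dict[str, Any]], fields: Sequence[str]) -> List[Dict[str, Any]]:
    """Keep only the selected fields; a missing value becomes None (a NULL in the extract)."""
    return [{f: row.get(f) for f in fields} for row in rows]
```

Validating against a catalog keeps arbitrary user input away from the extract schema, and the user's field order becomes the column order.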

What is a Flexible Extract Generator?

In simplest terms, it’s a Web Data Connector with two differences:

  • The sign-in / option selection portion is run as a separate web application, not hosted via Tableau Server
  • The extract itself is generated via the Extract API and then pushed to the Tableau Server via the REST API
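The extract creation step can be sketched in Python. This assumes the Python bindings of Extract API 2.0 (the `tableausdk` package and its `HyperExtract` module) are installed, and that rows arrive as flat dictionaries keyed by field name; `infer_column_types` and `write_extract` are hypothetical helpers, and the class and method names should be checked against the SDK version you actually have.

```python
from typing import Any, Dict, Sequence


def infer_column_types(rows: Sequence[Dict[str, Any]], fields: Sequence[str]) -> Dict[str, str]:
    """Choose a column type per field by looking at every non-null value.

    The type labels are our own; write_extract maps them to Extract API types.
    """
    types: Dict[str, str] = {}
    for field in fields:
        values = [r[field] for r in rows if r.get(field) is not None]
        if values and all(isinstance(v, bool) for v in values):
            types[field] = "boolean"
        elif values and all(isinstance(v, int) and not isinstance(v, bool) for v in values):
            types[field] = "integer"
        elif values and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            types[field] = "double"  # mixed ints and floats are widened to double
        else:
            types[field] = "string"  # text, mixed types, and all-null columns
    return types


def write_extract(path: str, rows: Sequence[Dict[str, Any]], fields: Sequence[str]) -> None:
    """Write rows to a .hyper file with Extract API 2.0 (assumes tableausdk is installed)."""
    # Imported here so the type inference above works without the SDK.
    from tableausdk import Type
    from tableausdk.HyperExtract import ExtractAPI, Extract, TableDefinition, Row

    col_types = infer_column_types(rows, fields)
    tableau_type = {"boolean": Type.BOOLEAN, "integer": Type.INTEGER,
                    "double": Type.DOUBLE, "string": Type.UNICODE_STRING}

    ExtractAPI.initialize()
    try:
        extract = Extract(path)
        table_def = TableDefinition()
        for field in fields:
            table_def.addColumn(field, tableau_type[col_types[field]])
        table = extract.addTable("Extract", table_def)  # the table must be named "Extract"
        for record in rows:
            row = Row(table_def)
            for i, field in enumerate(fields):
                value, kind = record.get(field), col_types[field]
                if value is None:
                    row.setNull(i)
                elif kind == "boolean":
                    row.setBoolean(i, value)
                elif kind == "integer":
                    row.setInteger(i, value)
                elif kind == "double":
                    row.setDouble(i, float(value))
                else:
                    row.setString(i, str(value))
            table.insert(row)
        extract.close()
    finally:
        ExtractAPI.cleanup()
```

Inferring types from the data keeps the generator flexible: whatever fields the user selected, the schema is derived at run time rather than hard-coded.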

A fully developed system would also include a Scheduler for refreshing and pushing new versions of the Extract, because the Tableau Server cannot automatically refresh Extracts created programmatically.
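Such a scheduler can start very small. The sketch below is a hypothetical design (`ExtractJob`, `due_jobs` and `run_pending` are illustrative names): each job carries a refresh interval and a callback that rebuilds and republishes its extract. In production, cron or an existing job scheduler may be a better fit.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ExtractJob:
    """One extract definition to rebuild on a fixed interval (hypothetical structure)."""
    name: str
    interval_seconds: float
    rebuild_and_publish: Callable[[], None]  # e.g. fetch data, write extract, publish
    last_run: Optional[float] = None


def due_jobs(jobs: List[ExtractJob], now: float) -> List[ExtractJob]:
    """Return jobs that have never run or whose interval has elapsed."""
    return [j for j in jobs if j.last_run is None or now - j.last_run >= j.interval_seconds]


def run_pending(jobs: List[ExtractJob], now: Optional[float] = None) -> List[str]:
    """Run every due job once, record its run time, and return the names that ran."""
    now = time.time() if now is None else now
    ran: List[str] = []
    for job in due_jobs(jobs, now):
        try:
            job.rebuild_and_publish()
            job.last_run = now
            ran.append(job.name)
        except Exception as exc:  # one failing job should not block the others
            print(f"{job.name} failed: {exc}")
    return ran
```

Calling `run_pending(jobs)` from a loop with `time.sleep(60)` is enough for a small deployment. A failed job keeps its previous `last_run`, so it is retried on the next pass.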

Here’s a hastily generated image of the workflow:

[Figure: Extract Generator Overview]

This architecture leaves a lot up to you, but that’s the general idea: it can be made as flexible and as tailored to your needs as necessary. Unlike a Web Data Connector, which is written in JavaScript, the step that actually creates the extract file can use any language the Extract API supports (C, C++, Python, Java). It also lets you run the extract creation step on your own server machines, using their processing power, separate from the Tableau Server (or the user’s Tableau Desktop). As a result, no data has to cross the wire to Tableau while the extract is being built; once it is finished, you can push the resulting Extract file to a Tableau Server or Desktop in another location.
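For the push step, one option is Tableau Server Client (`tableauserverclient`), Tableau's Python wrapper around the REST API, instead of building the publish request by hand. This is a sketch under that assumption: `publish_extract` is a hypothetical helper, and the server URL, credentials, site and project id are placeholders you supply.

```python
def publish_extract(server_url: str, username: str, password: str, site: str,
                    project_id: str, name: str, extract_path: str) -> str:
    """Publish an extract file as a data source and return its id.

    Assumes the tableauserverclient package is installed; it is imported here
    so this module can be loaded without it.
    """
    import tableauserverclient as TSC

    auth = TSC.TableauAuth(username, password, site_id=site)
    server = TSC.Server(server_url, use_server_version=True)
    with server.auth.sign_in(auth):
        item = TSC.DatasourceItem(project_id, name=name)
        # Overwrite replaces an existing data source with the same name in the project
        published = server.datasources.publish(item, extract_path, TSC.Server.PublishMode.Overwrite)
    return published.id
```

Publishing in Overwrite mode under a stable name is what lets a scheduled rebuild replace the previous version rather than creating a new data source each time.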
