Web Services / REST APIs as a Data Source in Tableau: A PostgreSQL/Python solution

The trend toward wrapping together data and security filtering into RESTful web services has only increased in the past few years, and for a lot of good reasons. If the REST API is optimized to return very quickly based on a set of filtering parameters, the performance alone can justify the architecture. However, Tableau is planning to do more with the result set than simply display it directly — Tableau is ready to do deep analysis and discover new insights. Because of this, Tableau’s VizQL engine needs something it can query in many different ways as the basis for a data source connection.

How can we bridge the gap between a JSON (or XML, if anyone still does that) object response and a relational query engine?

What if I told you THIS was all that was necessary on the Tableau Desktop side to build a data source that acts as a “live” connection to a web service?:

Accessing the PL/Python Function in Tableau

Custom SQL in Tableau

Connect to the PostgreSQL database in Tableau Desktop using whatever credentials you created for an end user.

Then on the Data Connection screen, you’ll find on New Custom SQL on the left:

If you drag that out, then the dialog to define the Custom SQL query appears:

Custom SQL dialog

As you can see, the query is just SELECT * FROM function_name( arguments ), with whatever parameters you want to attach. You can create new parameters right from this dialog box, then put them in place. Make sure to give sensible defaults, and probably most should default to an empty string so that someone can only access if they have the right tokens / etc.

How is this possible?

There’s a whole lot of setup on the PostgreSQL side, but the payoff is the simplicity of the setup in Tableau.

The solution is based on the architecture originally laid out here, but I’ve put together a Dockerfile which builds out a PostgreSQL database ready for your custom set of functions which will dynamically call a RESTful Web Service based on Tableau’s parameters.


The Missing Link – Putting a Hyper File from the Hyper API into a TDSX or TWBX File

The Tableau Hyper API is an amazingly powerful tool for building out Extracts that, for whatever reason, cannot be built or maintained using the standard Tableau Server extract refresh process. You can publish a Hyper file directly to a Tableau Server, but there are several drawbacks:

  • Tableau Server will build out an automatic TDS file, taking a rough guess at any type of metadata categorization (Measure vs. Dimensions, Hierarchies, Geographic info, etc.)
  • The only use for this data source will be creating Ad Hoc reports using Web Edit (or hoping someone in Desktop now knows that it exists). You can’t integrate it easily in an existing Workbook
  • It only works with Single-Table Extracts, and in older versions of Tableau, the single table must be named “Extract” or you will encounter errors.

What is missing is a TDS file to pair up with the Hyper file, describing the exact metadata that you want to go along with the Extracted data. In this article, I’ll describe a workflow that result in a fully controlled TDSX file with a newly generated Hyper file.


Building a Flexible Extract Generator using the Extract API

One of the least mentioned, but incredibly useful APIs in Tableau is the Extract API, which allows you to programmatically create an Extract file (Hyper files starting in 10.5, previously TDE files). The main use case is for data sources that require programmatic access (as opposed to using the one of the native connectors in Tableau). Some situations where this would be useful:

  • Data coming from a Web Service/ RESTful API with an object response
  • ODBC / JDBC drivers that Tableau cannot use
  • Additional programmatic modeling / statistical analysis against a whole data set

This post is focused mostly on first use case, where you are trying to make data available from some type of Web Service / RESTful API. In particular, if you need to provide only a subset from a very flexible set of possible fields for “ad hoc” analysis, this technique is the most functional solution to the problem.

When should I build a Flexible Extract Generator?

If you:

  • Know the structure of your web service responses
  • The amount of total fields is reasonably sized
  • The web service responses will not change frequently
  • Workbooks are fully built out and will not allow web editing
  • Data Source structure can be reused across multiple reports (and possibly customers)

then the better solution for Web Service/REST API based data sources is “Live” Web Services Connections in Tableau.

If instead you want to provide a selection screen to generate an Extract that will power a Web Edit session, then it makes sense to build a Flexible Extract Generator process. This is particularly useful when the set of fields could change drastically from extract to extract, or if other processing (such as machine learning) needs to be applied based on differing parameters prior to its use by the end user (that said, if the actual output columns are consistent, the “Live” Web Services solution could still work).


Multiple Table (Normalized) Hyper Extracts

Starting in Tableau 2018.3+:

  • The design for row level security will be the same in both live connections and extracts
  • Extract files with security will create much faster
  • Best practices for entitlements tables are now feasible in Extracts

Let’s dig into the essentials and how we can make this work for effective Row Level Security.


Isolating Tableau Server Performance Issues

In this post, I’ll be describing a set of steps to follow to isolate the causes of performance issues on Tableau Server.

Here are the basic steps:

  1. Test the workbook in Tableau Desktop. Does it perform well? If yes:
  2. Test the workbook in Tableau Desktop on the Tableau Server machine. Does it perform the same as it did on the previous machine? If yes:
  3. Publish the workbook to Tableau Server, and find a time when there is low-to-no usage on the Tableau Server. Go to the published workbook. Did it perform relatively the same as the test in Step 2 (within 1-3 seconds)?  If yes:
  4. Test the workbook during a time of high usage on the Tableau Server (either natural or do load testing using TabJolt).


Pre-aggregating data with full drill-down

Have you heard this one before? “Just connect to your data in Tableau and start visualizing. Then you’ll publish and share with your whole organization.” It’s a great line, because it’s true. You CAN get started with analysis on top of just about any data in Tableau. But “can” is not “should” — what is possible may not be the BEST way, particularly if you want to scale up. When dealing with massive amounts of data, a better solution is to have two data sources: (1) A pre-aggregated data set for overviews, which I’ll call the Overview data source (2) The row-level data set, which I’ll call the Granular data source. Tableau’s abilities to filter between two data sources (actions & cross-datasource filters in Tableau 10) make this an excellent strategy, and one that I have seen massively improve performance over and over.


Caching in Tableau Server and Row Level Security

Tableau Server, particularly since the 9.0 release has  fantastic caching mechanism. Once a view has been loaded into the cache, any subsequent view using the same data will load extremely quickly. This is why you may notice that a first view in the morning takes some amount of time to load, but every other view is much quicker. Some Tableau customers even “warm” the cache on some of their views by scheduling an e-mail or pinging the Tableau Server for a request of a PDF early in the morning, before any of the regular viewer come in. You can even force a refresh using the “warming” technique by appending the :refresh parameter to the end of your request.


Performance tip: Use “Selected Fields” in Dashboard Actions

When you are trying to maximize performance in Tableau, particularly on a live connection, sometimes the smallest changes can make a big difference. All of your choices in Tableau Desktop eventually end up as a real live SQL query, which the database will have to interpret. The simpler the query, the easier the interpretation, and in most cases the quicker the results.

generated action

Tableau’s Dashboard Actions are amazing, and in the newer versions there is a quick little “Use as filter” button on each sheet in a Dashboard. This creates an Action in the Dashboard->Actions menu which is set to “All Fields” down at the bottom. This is incredibly convenient from a creation standpoint; however, it means that the selected values for every single dimension in the Source Sheet will be passed along as filters in the WHERE clause of the eventual SQL query. This includes categorical information which you are displaying: if you are showing Product Category, Product Sub-Category, and Product ID; all three will be sent in the eventual query.

Particularly when you are getting down to granular details, you really only need the most granular piece of information to be passed into the WHERE clause. For optimal performance, you really only want to pass in values for fields that are indexed in the database. In the previous example, presuming that a Product ID can only belong to one Category and Sub-Category, setting the Action to “Selected Fields” and choosing “Product ID” would simplify the query sent; hopefully Product ID is indexed and thus you get an incredibly quick lookup.




How to set up your Database for Row Level Security in Tableau

Editor’s Note 2: There is a newer, simplified overview of just “how to make RLS work in Tableau” available now: Quick Explanation of Row Level Security in Tableau. This post and the whitepaper provide a more thorough explanation of “why” and how to set up your database.

Editor’s Note: The official (and improved) whitepaper version of this is available from Tableau here

The techniques outlined in this post are applicable to Live Connections and Multi-Table Extracts (available in Tableau 2018.3+). If you need to use Extracts and are on a version of Tableau prior to 2018.3, please see Keeping Your Extracts From Blowing Up .