The currently available Beta 1 of Tableau 2018.3 includes a long-requested feature for creating multiple table Hyper extracts — that is to say, each table you see in the connection pane will be brought in and stored as separate tables in a single Hyper extract file. Why is this so exciting? Because it’s the end of the need for Defusing Row Level Security in Tableau Data Extracts (Before They Blow Up) Part 1 (and Part 2)!
Starting in 2018.3:
- The design for row level security will be the same in both live connections and extracts
- Extract files with security will be created much faster
- Best practices for entitlements tables are now feasible in Extracts
Let’s dig into the essentials and how we can make this work for effective Row Level Security.
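To make the entitlements idea concrete, here is a minimal sketch of the general pattern, using Python's built-in sqlite3 rather than Tableau itself. The table names, column names, and usernames are all made up for illustration; the point is only that a separate entitlements table, joined to the data and filtered on the current user, decides which rows that user sees.

```python
import sqlite3

# Hypothetical schema: a fact table plus an entitlements table that maps
# each username to the regions that user is allowed to see.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL);
    CREATE TABLE entitlements (username TEXT, region TEXT);
    INSERT INTO sales VALUES (1, 'East', 100.0), (2, 'West', 250.0), (3, 'East', 75.0);
    INSERT INTO entitlements VALUES ('alice', 'East'), ('bob', 'West'), ('bob', 'East');
""")


def rows_visible_to(username):
    """Return only the sales rows the given user is entitled to see."""
    query = """
        SELECT s.order_id, s.region, s.amount
        FROM sales AS s
        JOIN entitlements AS e ON e.region = s.region
        WHERE e.username = ?
        ORDER BY s.order_id
    """
    return conn.execute(query, (username,)).fetchall()


print(rows_visible_to("alice"))  # only the East rows
print(rows_visible_to("bob"))    # East and West rows
```

Because the two tables stay separate, the fact table is not multiplied out by the number of users entitled to each row, which is the reason a multiple table extract helps here.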
In this post, I’ll be describing a set of steps to follow to isolate the causes of performance issues on Tableau Server.
Here are the basic steps:
1. Test the workbook in Tableau Desktop. Does it perform well? If yes:
2. Test the workbook in Tableau Desktop on the Tableau Server machine. Does it perform the same as it did on the previous machine? If yes:
3. Publish the workbook to Tableau Server, and find a time when there is low-to-no usage on the Tableau Server. Go to the published workbook. Did it perform relatively the same as the test in Step 2 (within 1-3 seconds)? If yes:
4. Test the workbook during a time of high usage on the Tableau Server (either natural usage or a load test using TabJolt).
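The steps above amount to a short decision procedure, which can be sketched roughly as follows. This is only an illustration: the function name, the idea of recording one load time per test, and the acceptable_s threshold are my assumptions, and the 3-second tolerance simply borrows the upper end of the "within 1-3 seconds" guideline.

```python
def first_failing_step(desktop_s, desktop_on_server_s, server_idle_s,
                       acceptable_s=5.0, tolerance_s=3.0):
    """Return the number of the first step whose check fails.

    All arguments are load times in seconds. A result of 4 means the first
    three checks passed, so the remaining test is the one under high usage.
    The thresholds are illustrative assumptions, not fixed rules.
    """
    # Step 1: does the workbook perform well in Tableau Desktop at all?
    if desktop_s > acceptable_s:
        return 1
    # Step 2: does Desktop on the Tableau Server machine perform the same?
    if abs(desktop_on_server_s - desktop_s) > tolerance_s:
        return 2
    # Step 3: does the published workbook on an idle Server match Step 2?
    if abs(server_idle_s - desktop_on_server_s) > tolerance_s:
        return 3
    # Step 4: everything matches so far; test during high usage next.
    return 4


print(first_failing_step(2.0, 2.5, 10.0))  # the idle-server test diverges
```

Whichever step the function stops at is where the investigation should focus, since every earlier layer has already been ruled out.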
Have you heard this one before? “Just connect to your data in Tableau and start visualizing. Then you’ll publish and share with your whole organization.” It’s a great line, because it’s true. You CAN get started with analysis on top of just about any data in Tableau. But “can” is not “should” — what is possible may not be the BEST way, particularly if you want to scale up. When dealing with massive amounts of data, a better solution is to have two data sources: (1) A pre-aggregated data set for overviews, which I’ll call the Overview data source (2) The row-level data set, which I’ll call the Granular data source. Tableau’s abilities to filter between two data sources (actions & cross-datasource filters in Tableau 10) make this an excellent strategy, and one that I have seen massively improve performance over and over.
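As a rough sketch of the two-data-source idea, outside of Tableau: the Overview set is a small pre-aggregated summary, and the Granular set is only consulted for the rows matching a selection. The field names and sample rows below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical row-level data (the "Granular" data source).
granular = [
    {"order_id": 1, "category": "Furniture", "sales": 120.0},
    {"order_id": 2, "category": "Furniture", "sales": 80.0},
    {"order_id": 3, "category": "Technology", "sales": 300.0},
]


def build_overview(rows):
    """Pre-aggregate the row-level data into one total per category."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["category"]] += row["sales"]
    return dict(totals)


def drill_down(rows, category):
    """Fetch only the granular rows for the category selected in the overview."""
    return [row for row in rows if row["category"] == category]


overview = build_overview(granular)        # small: one entry per category
details = drill_down(granular, "Technology")  # only the rows actually needed
print(overview)
print(details)
```

The overview is what most viewers look at most of the time, so the expensive row-level query only runs after a selection has narrowed it down.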
Tableau Server, particularly since the 9.0 release, has a fantastic caching mechanism. Once a view has been loaded into the cache, any subsequent view using the same data will load extremely quickly. This is why you may notice that the first view in the morning takes some time to load, but every other view is much quicker. Some Tableau customers even “warm” the cache on some of their views by scheduling an e-mail or pinging the Tableau Server with a request for a PDF early in the morning, before any of the regular viewers come in. You can even force a refresh using the “warming” technique by appending the :refresh parameter to the end of your request.
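Here is a small sketch of how a warming request URL might be put together. The server address, workbook name, and view name are placeholders, the helper function is hypothetical, and the path assumes the default site. Actually sending the request also requires an authenticated session, which is not shown here.

```python
def warm_url(server, workbook, view, refresh=False):
    """Build the URL that requests a view as a PDF.

    With refresh=True, the :refresh=yes parameter is appended so the
    request asks for fresh data rather than whatever is already cached.
    """
    url = f"{server.rstrip('/')}/views/{workbook}/{view}.pdf"
    if refresh:
        url += "?:refresh=yes"
    return url


# Placeholder names; substitute your own server, workbook, and view.
print(warm_url("https://tableau.example.com", "Sales", "Overview"))
print(warm_url("https://tableau.example.com", "Sales", "Overview", refresh=True))
```

A scheduled job that requests these URLs shortly before the workday starts would play the same role as the scheduled e-mail.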
When you are trying to maximize performance in Tableau, particularly on a live connection, sometimes the smallest changes can make a big difference. All of your choices in Tableau Desktop eventually end up as a real live SQL query, which the database will have to interpret. The simpler the query, the easier the interpretation, and in most cases the quicker the results.
Tableau’s Dashboard Actions are amazing, and in the newer versions there is a quick little “Use as filter” button on each sheet in a Dashboard. This creates an Action in the Dashboard->Actions menu which is set to “All Fields” down at the bottom. This is incredibly convenient from a creation standpoint; however, it means that the selected values for every single dimension in the Source Sheet will be passed along as filters in the WHERE clause of the eventual SQL query. This includes any categorical information you are displaying: if you are showing Product Category, Product Sub-Category, and Product ID, all three will be sent in the eventual query.
Particularly when you are getting down to granular details, you really only need the most granular piece of information to be passed into the WHERE clause. For optimal performance, you really only want to pass in values for fields that are indexed in the database. In the previous example, presuming that a Product ID can only belong to one Category and Sub-Category, setting the Action to “Selected Fields” and choosing “Product ID” would simplify the query sent; hopefully Product ID is indexed and thus you get an incredibly quick lookup.
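To see the difference, here is a simplified sketch of the WHERE clause each setting would lead to. The SQL Tableau really generates looks different and depends on the database; the helper function and the sample values are made up purely to show how many predicates get sent.

```python
def where_clause(selection, fields=None):
    """Build an illustrative WHERE clause from the selected mark's values.

    fields=None mimics "All Fields"; a list of names mimics "Selected Fields".
    """
    chosen = selection if fields is None else {f: selection[f] for f in fields}
    parts = ["\"{}\" = '{}'".format(field, value) for field, value in chosen.items()]
    return " AND ".join(parts)


# Hypothetical values for one selected mark in the Source Sheet.
selection = {
    "Product Category": "Furniture",
    "Product Sub-Category": "Chairs",
    "Product ID": "FUR-CH-1001",
}

print(where_clause(selection))                  # "All Fields": three predicates
print(where_clause(selection, ["Product ID"]))  # "Selected Fields": one predicate
```

With only Product ID in the clause, the database has a single predicate to evaluate, and if that column is indexed it can go straight to the matching rows.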
I’ve put together a new page in the top menu on Tableau Performance which links out to a lot of brilliant posts from all over (and a few of mine) on all of the topics involved in making things fast in Tableau. I’ll continue to update it over time with more and more resources.
Learning to design so that you limit loading unnecessary rows of granular data is the most important technique you can learn to make Tableau perform well. It reduces the strain on the database in finding and returning data, and it limits the amount that Tableau needs to return. Ask yourself: “Am I getting tired of scrolling through this list trying to find what I need?” If the answer is yes, consider what other views of the data might help you filter down just to the rows you need to see.
Setting Actions to Exclude when nothing is selected
The ideal workflow in Tableau, both from a performance and a visual analysis perspective, is useful visualizations that filter via Dashboard Actions that act on click / select. You can limit what is shown by setting your Actions to Exclude when nothing is selected. After the first time you do a selection and clear it, the affected sheets won’t show anything until something has been selected.