The techniques outlined in this post are applicable to Live Connections and Multi-Table Extracts (available in Tableau 2018.3+). If you need to use Extracts and are on a version of Tableau prior to 2018.3, please see Keeping Your Extracts From Blowing Up.
Learning to design so that you limit loading unnecessary rows of granular data is the most important technique you can learn to make Tableau perform well. It reduces the strain on the database in finding and returning data, and it limits the amount that Tableau needs to return. Ask yourself: “Am I getting tired of scrolling through this list trying to find what I need?” If the answer is yes, consider what other views of the data might help you filter down just to the rows you need to see.
Setting Actions to Exclude when nothing is selected
The ideal workflow in Tableau, both from a performance and a visual analysis perspective, is useful visualizations that filter via Dashboard Actions that act on click / select. You can limit what is shown by setting your Actions to Exclude when nothing is selected. After the first time you do a selection and clear it, the affected sheets won’t show anything until something has been selected.
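The behavior described above can be sketched as a toy model. This is not Tableau code: the class, the `on_clear` labels, and the sample rows are all hypothetical, chosen only to mirror the choices offered for what happens when a selection is cleared.

```python
# Toy model of a sheet targeted by a filter action (not a Tableau API).
# The on_clear labels are hypothetical names for the "clearing the
# selection" choices in the action dialog.
ROWS = [
    {"region": "East", "order_id": 1},
    {"region": "East", "order_id": 2},
    {"region": "West", "order_id": 3},
]


class TargetSheet:
    def __init__(self, rows, on_clear="exclude_all"):
        if on_clear not in ("exclude_all", "show_all", "keep_filtered"):
            raise ValueError(f"unknown on_clear setting: {on_clear}")
        self.rows = rows
        self.on_clear = on_clear
        self.state = "untouched"  # no action has fired yet
        self.selected = set()

    def select(self, regions):
        """Simulate clicking marks in the source sheet."""
        self.selected = set(regions)
        self.state = "filtered"

    def clear(self):
        """Simulate clearing the selection in the source sheet."""
        if self.on_clear == "exclude_all":
            self.state = "excluded"
        elif self.on_clear == "show_all":
            self.state = "untouched"
        # "keep_filtered": leave the last filter in place

    def visible_rows(self):
        if self.state == "excluded":
            return []
        if self.state == "filtered":
            return [r for r in self.rows if r["region"] in self.selected]
        return list(self.rows)


sheet = TargetSheet(ROWS, on_clear="exclude_all")
before_any_click = sheet.visible_rows()   # all rows: the action has never fired
sheet.select(["East"])
after_select = sheet.visible_rows()       # only the East rows
sheet.clear()
after_clear = sheet.visible_rows()        # nothing until the next selection
```

Note that the sheet still shows every row before the first selection, which is why the text says the sheets go empty only after you select and then clear.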
“Tell us about Tableau on Azure”. “Does Tableau run on Azure?” There is probably no more confusing question at the moment, because the terminology for Microsoft Azure is unclear, and Microsoft have been making changes to both the technology and the nomenclature so quickly that it’s best to step back and understand the ways in which Tableau can interact with the Azure platform.
There are three different situations that can fall under the “Tableau and Azure” moniker, all of which are possible:
- Tableau Server hosted on a Virtual Machine in Azure
- Tableau Desktop or Server connecting to Azure SQL Database, which Microsoft runs and operates
- Tableau Desktop or Server connecting to a database (often Microsoft SQL Server) on a hosted Virtual Machine in Azure
Tableau Server on a hosted VM
Much like AWS, Tableau Server can be run on a hosted VM in the Microsoft Azure cloud. Russell Christopher has done an excellent job of testing out the VM and storage configurations available in Azure and making recommendations on what is necessary for good performance with Tableau Server. If you want to host Tableau Server on Azure, stop now and read Russell’s blog.
In virtualized environments, disk access / IOPS tends to be the biggest hidden issue for Tableau Server performance, particularly if you are using extracts. This is true of Azure, AWS, and also your internal VMware configuration.
The official Tableau KB article on installing Tableau Server on Azure is now available here.
Tableau Desktop and Server connecting to Azure SQL Database
Here is where the wording gets fun (i.e. confusing). As if “SQL Server” wasn’t already a generic enough name, Microsoft refers to their managed and hosted cloud database, based on “SQL Server”, as “Azure SQL Database”. Internally, it is very similar to SQL Server, and as of Tableau 9.1, you simply connect via the standard Tableau native connector for Microsoft SQL Server. Put in your credentials, and you are good to go.
Tableau Desktop and Server connecting to a Microsoft SQL Server database on a hosted VM in Azure
You can also put a database (usually Microsoft SQL Server) on a hosted VM in Azure. Luckily, to Tableau Desktop and Server, the process for connecting is identical to that of any SQL Server connection: put in your credentials and voilà!
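To make the point concrete that both database options look the same from the connector's side, here is a minimal sketch of the fields you would fill in. The helper names, the sample server and database names, and the VM hostname are hypothetical. The `.database.windows.net` suffix and port 1433 are the usual Azure SQL Database and SQL Server defaults, but check your own environment.

```python
# Sketch only: these helpers are hypothetical and do not call Tableau or
# any database driver. They just show that the two cases differ by host.
DEFAULT_SQL_SERVER_PORT = 1433


def azure_sql_database_host(logical_server):
    """Host name for a managed Azure SQL Database logical server."""
    return f"{logical_server}.database.windows.net"


def connection_fields(server_host, database, username):
    """Fields you would enter in a SQL Server connection dialog."""
    return {
        "server": server_host,
        "port": DEFAULT_SQL_SERVER_PORT,
        "database": database,
        "username": username,
    }


# Managed Azure SQL Database (server name is made up)
managed = connection_fields(azure_sql_database_host("myserver"), "sales", "analyst")

# SQL Server on a hosted VM in Azure (hostname is made up)
on_vm = connection_fields("myvm.example.com", "sales", "analyst")
```

Apart from the `server` value, the two dictionaries are identical, which is the whole point: the same native SQL Server connector handles both.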
As the recent post on Vertica brings to light, even very high-performing systems sometimes need a little configuration to perform optimally with Tableau. A particular set of systems requires some extra thought and care: if you set off without any planning, expecting to combine Tableau’s ease of use with the speed of these systems, and end up staring at the “query executing” screen for 10 minutes, you may start to doubt everyone’s claims.
The systems I’m talking about are the Massively Parallel Processing (MPP) databases. There’s already a great explanation of them here, so I’m not going to go too deeply into how they work, other than what is relevant for Tableau. The MPP systems that Tableau supports (in no particular order, and don’t get too angry if I get this a bit wrong):
- Aster (although there is some Hadoop going on in the backend)
Really exciting announcement of TabMon for monitoring Tableau Server performance!
There are some good reasons to look at the actual queries Tableau sends to a live relational database:
- Performance is not good and your DBA needs to know what is happening so they can optimize. This goes for live connections and slow extract generation.
- Your viz still isn’t performing well enough even with extracts, but you don’t understand the TQL language you see in the Performance Recorder (it’s okay, no one does!). Seeing the same logic in SQL can help you understand what exactly is going on.
- You just want to marvel at the amazing ability of the VizQL engine to translate your actions into SQL. Check out what LOD calculations, Sets, or calculated fields look like sometime to see what is going on.
Read more for how to accomplish #1
Update: You might just skip all this and go with the newly released TabMon to bring together all your Tableau Server system performance needs.
There’s a great Tableau KB article on using the Windows Performance Monitor (perfmon) that unfortunately hasn’t been updated since the 9.0 release. The steps in perfmon are all the same, but there are a few more processes to monitor than there were in the 8.0 series:
- Vizqlserver – processes that query data and build views
- Vizportal – The Tableau Server UI that is not a viz (shouldn’t be used much in an embedded situation)
- Wgserver – Now just the REST API server (and a few other tasks)
- Backgrounder – takes on long running tasks like extract refreshes
- TDEServer64 – The Tableau Data Engine (TDE), our fast columnar data extract engine
- Searchserver – the SOLR service Tableau runs
- Zookeeper – Tableau coordination Services
- Redis-server – Shared cache services
- Httpd – The Apache web host and load balancer, usually a very low user of memory, but worth watching
For how to analyze it all, nothing beats Alan ‘Ty Alevizos’ Smithee’s original workbook and instructions.
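If you would rather script the counter list than click through perfmon, the process names above can be turned into counter paths. This is a minimal sketch: the lowercase instance names are an assumption based on the list above, so confirm them against what perfmon actually shows on your server. The two counters are just a reasonable starting pair, and the `#1`, `#2` suffixes follow the usual perfmon convention for multiple instances of the same process.

```python
# Build Windows performance counter paths for the Tableau Server processes
# listed above. Instance names are assumed (lowercased from the list);
# verify them in perfmon on your own server.
TABLEAU_PROCESSES = [
    "vizqlserver",
    "vizportal",
    "wgserver",
    "backgrounder",
    "tdeserver64",
    "searchserver",
    "zookeeper",
    "redis-server",
    "httpd",
]

COUNTERS = ["% Processor Time", "Private Bytes"]


def counter_paths(processes=TABLEAU_PROCESSES, counters=COUNTERS, instances=1):
    """Return paths like \\Process(vizqlserver)\\Private Bytes.

    With instances > 1, later copies of a process get the perfmon-style
    suffixes #1, #2, ... (for example, two vizqlserver processes).
    """
    paths = []
    for process in processes:
        for i in range(instances):
            instance = process if i == 0 else f"{process}#{i}"
            for counter in counters:
                paths.append(f"\\Process({instance})\\{counter}")
    return paths


if __name__ == "__main__":
    # One path per line, suitable for saving to a counter file.
    print("\n".join(counter_paths()))
```

Saved one per line to a text file, a list like this could be passed to the Windows `typeperf` utility through its `-cf` option, or added by hand in perfmon.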