The Tenets of Tableau Templates on Multi-tenants

We have a tendency to answer questions of multi-tenancy in Tableau swiftly by pointing to Sites, which are the virtual tenements for your Tableau tenants. But Sites are the simple part of the equation; when there is a need for multi-tenancy in Tableau, there is most likely existing multi-tenancy in the data systems Tableau must connect to. I'm going to dive into the diverse ways that customers corral their data and outline all the tenets of deploying effectively from a single template to all your tenants.

Why do you need to do these things?

Tableau Data Sources, which can either be a separate TDS file or be included within a TWB file, include a lot of information about the connection being made, from the database type and name to the tables and how they will be JOINed together. Tableau Server actually knows only a very small portion of this information: those elements that can be updated using the Update Datasource Connection method of the REST API. Credentials are one of them, which means you can push credential updates through the REST API to all of the content published to your Tableau Server. The database server address and port can also be updated, which is handy if you do a data center swap and only the server address (and port) needs to change.

The rest of the information, including such things as the database name/schema, the table JOINs (or Custom SQL), and any filters or extract definitions, resides only in the XML of the TDS or TWB. This means that generating variations of a Data Source from a template involves modifying those elements in the XML prior to publish time. You can do this manually, but the vast majority of things that need to be modified are encapsulated in the tableau_tools library. All of the examples in this post use the tableau_tools Python library.

A note on terminology

The database world is full of terminology that is used differently by each vendor, so I’m going to use some wonderfully generic terms to describe the situations we commonly see. In this post, a database server is the thing you type credentials into in Tableau during “Connect to Data”, referenced by a single domain name or IP address. A logical partition is a grouping of tables (and so forth) on that database server (which is referred to as a “database” in MS SQL Server and a “schema” in Oracle and who knows what else in other RDBMS systems). A schema is the organization and structure of those tables in the logical partition.

A Data Source in capitals means a Tableau Data Source, the thing that can be saved as a TDS file or published to a Tableau Server. Data Sources can also exist within a TWB file, in which case they are known as embedded Data Sources, as opposed to a Published Data Source. The XML for a Data Source is identical whether it is in a separate TDS file or a TWB file, so any description or example of the techniques for modifying Data Sources will apply to either.

Best Practices for Data Source Management

Connecting your Workbooks to Published Data Sources is considered best practice for the following reasons:

  • Published Data Sources can have different credentials from the Workbooks that connect to them, so even if a user can download the Workbook, all they will see in the XML is the reference to the Published Data Source. While credentials are never provided back in any Tableau XML, this protects you from revealing the true server names and ports, among other things.
  • Published Data Sources can be shared among multiple Workbooks, so things like Row Level Security Data Source Filters only need to be implemented once and will apply to any Workbook that uses them. When using extracts, you will not be generating extracts with duplicated data, thus reducing overall workload.
  • In Workbooks with multiple Data Sources, it is not possible to publish the credentials for each of the connections via the REST API. So if you have a programmatic publishing need for these Workbooks, you need to publish them each separately so you can specify credentials.

In some older versions of Tableau Server there were occasional performance differences between Embedded and Published Data Sources but these have mostly been resolved in the more recent releases.

The Central Tenet of Multi-tenants in Tableau

For a template Tableau Data Source to work with variations, the schema (remember above, the organization and structure of the tables) must be the same between the original set of tables the Data Source was connected to and whatever tables the Data Source will connect to based on the changes you make to the template.

Example 1: There is a table called “DataFacts” in logical partition 1 with three columns – A, B, C, which you have connected to in Tableau. You can save the Data Source to a TDS file (or edit the relevant XML within the TWB file) and change the XML reference to point to a different logical partition (even on a different database server), as long as that new partition has a table called “DataFacts” with three columns A, B, C.
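
Under the hood, this repointing is just an attribute change in the Data Source XML. Here is a minimal sketch of the kind of change tableau_tools makes for you, using Python's standard ElementTree on a heavily simplified TDS snippet (the server names and partition names are invented, but dbname and server are the actual attribute names Tableau uses on its connection tag):

```python
import xml.etree.ElementTree as ET

# Heavily simplified stand-in for the XML inside a TDS file
tds_xml = """
<datasource formatted-name='DataFacts Template' inline='true'>
  <connection class='sqlserver' dbname='tenant_1' server='db1.example.com' port='1433'>
    <relation name='DataFacts' table='[dbo].[DataFacts]' type='table' />
  </connection>
</datasource>
"""

root = ET.fromstring(tds_xml)
conn = root.find('connection')

# Repoint the template at a different logical partition (and server, if needed).
# The schema of DataFacts must match in the new partition.
conn.set('dbname', 'tenant_2')
conn.set('server', 'db2.example.com')

new_xml = ET.tostring(root, encoding='unicode')
print(new_xml)
```

In practice you would not hand-roll this: tableau_tools exposes these same values as the dbname and server properties of the TableauConnection object, as the later examples in this post show.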

Example 2: You are using Custom SQL that connects to the same logical partition where the SELECT statements do not change, but you have different values in the WHERE statements that restrict the data differently for each Tableau Data Source that you publish. You can modify the WHERE clauses in the Custom SQL that is stored in the Data Source XML and it should work without any issues. If that Custom SQL needs to apply on different logical partitions, then you will also modify the reference to the logical partition as in Example 1.
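
Tableau stores Custom SQL as the text of a relation tag inside the connection XML, so this modification amounts to a string replacement. A minimal ElementTree sketch (the table, columns, and WHERE clause here are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Simplified connection XML with a Custom SQL relation, as Tableau stores it
conn_xml = """
<connection class='sqlserver' dbname='tenant_1' server='db1.example.com'>
  <relation name='Custom SQL Query' type='text'>
    SELECT a, b, c FROM DataFacts WHERE customer_id = 1
  </relation>
</connection>
"""

conn = ET.fromstring(conn_xml)
relation = conn.find('relation')

# Swap the WHERE clause restriction for the next tenant's Data Source;
# the SELECT list stays identical so the schema does not change
relation.text = relation.text.replace('customer_id = 1', 'customer_id = 2')

print(ET.tostring(conn, encoding='unicode'))
```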

Example 3: There is a table called “DataFacts” in logical partition 1 with three columns – A, B, C.  However, the second logical partition actually has a table called “data_facts” with the same columns, because the universe is cruel. To use the template file, you must change the XML reference to point to a different logical partition (even on a different database server), and also change the name of the table that you are connected to.

In any of these situations, the essential aspect is that the underlying column names and data types match exactly, regardless of which table on whichever logical partition you are pointed to. If things start to vary, then you have to be concerned with considerably more detail (see the end of the article).

A quick note about Dev -> Test -> UAT -> Production

If your use case is not multi-tenanted, but you still need to change some elements of the data source connections when moving through Dev -> Test -> UAT -> Production environments, the exact same techniques apply. Just imagine each of those environments is a "tenant" and follow the instructions below for "1 client, 1 tenant".

Types of multi-tenancy and how to work with them

“Virtual” multi-tenancy / Single Data Warehouse: 1 database server, 1 logical partition

Some customers store all of their customer data in a single logical partition, with a column in every table identifying which tenant the data belongs to. Often this is the case in a reporting data warehouse, but your operational system may be organized this way as well. It is rarer in industries with strict regulations around personally identifying information, such as healthcare.

Solution A: One Workbook with Row Level Security

A simple example is a “customer_ID” column on every row of data in the database. Using Row Level Security, join in the “security_table” on “customer_ID”, then put your security calculation of “[Username] = USERNAME()” as a data source filter. This will always filter the data down to just those rows that the user should see.

One or many Projects or Sites

When you have Row Level Security in place, you can theoretically have all of your customers access the same Tableau Server Site, accessing a single published Workbook. You won't run into any security issues from a data perspective: when you do Row Level Security the correct way, a customer will only ever see the data you have given them entitlements for. However, there are other aspects of Tableau Server, such as Custom Views and Subscriptions, which are a little more "leaky", in the sense that a user might glean the existence of other users through those functionalities.

The most airtight way is to publish a copy of the Workbook to a Site for each of your customers. The convenient thing is that since all of the tenants are in the same database, you just need to publish the workbook or data source to each Tableau Site, with no changes. Pretty nifty, huh?

Here’s an example using tableau_tools in Python, assuming you have a Project for each of your customers already:


from tableau_tools import *

t = TableauRestApiConnection26(server=u'https://myserver', username=u'you',
                               password=u'secure', site_content_url=u'my_production_site')
t.signin()

for proj in [u'Customer A', u'Customer B', u'Customer C']:
    proj_obj = t.query_project(proj)
    t.publish_workbook(u'Workbook.twbx', u'Workbook with RLS', proj_obj,
                       overwrite=True, save_credentials=True,
                       connection_username=u'addmen', connection_password=u'addmen')

Instead, you could loop through your various sites, and publish to a Project with the same name on each Site:

 
from tableau_tools import *

for customer_site_content_url in [u'customer_a', u'customer_b', u'customer_c']:
    t = TableauRestApiConnection26(server=u'https://myserver', username=u'you',
                                   password=u'secure', site_content_url=customer_site_content_url)
    t.signin()
    proj_obj = t.query_project(u'Destination Project')
    t.publish_workbook(u'Workbook.twbx', u'Workbook with RLS', proj_obj,
                       overwrite=True, save_credentials=True,
                       connection_username=u'addmen', connection_password=u'addmen')

If you are using Extracts (or are paranoid)

If you are publishing multiple workbooks, however, you are presented with the opportunity to do a little hard-coding of the filters that cause the multi-tenancy. For example, you might put a data source filter in place, set to the value of each of the customers. This way even if you somehow screwed up your Row Level Security calculations (not that you would!), the customer would still only see their data. This is particularly useful for Extracts, because a filter at the Extract level will be applied when the data is pulled in, thus giving you a bunch of small, efficient extracts, rather than one large extract with every customer in it.

Whether you should do this really comes down to how large your data is. If your whole data set is (and this is an arbitrary number) less than 5 million records across all your customers, it may not be worth the effort to maintain the different filters and generate different extracts. In that case, you should probably publish a single Data Source, with a Workbook for each customer connected to it and a Data Source filter in place. If each customer's data runs in the 5 million record range or above, you'll definitely get performance improvements by having an extract per customer.

In this example, we'll put a Data Source filter on a Live Connection. The code for a filter on an Extract is different: an Extract filter is applied when the extract is created or refreshed, while a Data Source filter is applied at query time, whether the connection is live or an extract.


import os
from tableau_tools import *
from tableau_tools.tableau_documents import *

t = TableauRestApiConnection26(server=u'https://myserver', username=u'you',
                               password=u'secure', site_content_url=u'my_production_site')
t.signin()

for customer_name in [u'Customer A', u'Customer B', u'Customer C']:
    # Reload the template on each pass so filters do not accumulate across customers
    t_file = TableauFile(u'Workbook.twbx')
    dses = t_file.tableau_document.datasources  # type: list[TableauDatasource]
    proj_obj = t.query_project(customer_name)
    for ds in dses:
        # If you also needed to repoint the connection, this is where you would do it:
        # for conn in ds.connections:
        #     conn.dbname = u'Global SuperStore Star Schema - Staging'
        ds.add_dimension_datasource_filter(column_name=u'customer_name', values=[customer_name, ])
    new_filename = t_file.save_new_file(u'Updated Workbook')
    t.publish_workbook(new_filename, u'Workbook with RLS', proj_obj,
                       overwrite=True, save_credentials=True,
                       connection_username=u'addmen', connection_password=u'addmen')
    # Always remember to clean up
    os.remove(new_filename)

Standard SaaS / HIPAA-Compliant: 1 database server, 1 logical partition per tenant

If you have a single database server, with an identical logical partition per tenant, then your only prerequisite is some kind of table that records the logical partition name for each tenant ID. Ideally, you will have named your Tableau Sites to match your tenant IDs, but if not, you’ll need a way to match the Site names to the tenant IDs.

Solution: Programmatically publish from a template, changing the connection XML to point to the correct logical partition

As we saw in the Extracts/Paranoia section above, you can change the details in the data source connection on each round of publishing using the ds.connections collection. This is where to use your mapping of the customer / tenant to the correct logical partition on the database server. In this example, we’re assuming you are using a distinct Tableau Server Site for each Customer/Tenant, since the rest of your application is already partitioned and managed this way. It also assumes that the logical partitions of the database are named the same as the Tableau Server Sites — if not, you’ll have to map out which Sites match to which logical partition somewhere.

 
import os
from tableau_tools import *
from tableau_tools.tableau_documents import *

for customer_site_content_url in [u'customer_a', u'customer_b', u'customer_c']:
    t = TableauRestApiConnection26(server=u'https://myserver', username=u'you',
                                   password=u'secure', site_content_url=customer_site_content_url)
    t.signin()
    t_file = TableauFile(u'Workbook.twbx')
    dses = t_file.tableau_document.datasources  # type: list[TableauDatasource]

    proj_obj = t.query_project(u'Destination Project')
    for ds in dses:
        for conn in ds.connections:
            conn.dbname = customer_site_content_url

    new_filename = t_file.save_new_file(u'Updated Workbook')
    t.publish_workbook(new_filename, u'Workbook with RLS', proj_obj,
                       overwrite=True, save_credentials=True,
                       connection_username=u'addmen', connection_password=u'addmen')
    # Always remember to clean up
    os.remove(new_filename)

Multiple database servers, 1 logical partition per tenant

If your logical partitions are spread among multiple database servers, you need a mapping of the tenant ID to the logical partition name and the server address. You are basically just extending the mechanism from the SaaS setup above to take into account a few extra properties.

Solution: Modify the XML for Logical Partition Name and Server Address

If you are doing XML modification, you might as well change both while you are at it; they are just properties on the same tag. The server property of the TableauConnection object is what you'll use to set the actual IP address or FQDN. You can see all of the properties in the README.

You can update the Server Address using the REST API by itself later, if you need to, but when publishing changes to the Data Source, it’s easier just to do it all using the XML modification at once prior to publish.
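
Putting the tenant mapping and the two attribute changes together, the per-tenant loop can be sketched like this. The sketch uses plain ElementTree on a simplified connection tag so it is self-contained; the mapping and all server/partition names are invented, and with tableau_tools you would set conn.server and conn.dbname rather than raw attributes:

```python
import xml.etree.ElementTree as ET

# Invented mapping of tenant ID to (database server, logical partition name)
tenant_map = {
    'customer_a': ('db-east.example.com', 'customer_a_db'),
    'customer_b': ('db-west.example.com', 'customer_b_db'),
}

# Simplified template connection tag; both values are attributes on the same tag
template_xml = "<connection class='sqlserver' server='placeholder' dbname='placeholder' port='1433' />"

modified = {}
for tenant, (server, dbname) in tenant_map.items():
    conn = ET.fromstring(template_xml)
    conn.set('server', server)   # IP address or FQDN of the tenant's database server
    conn.set('dbname', dbname)   # the tenant's logical partition
    modified[tenant] = ET.tostring(conn, encoding='unicode')

print(modified['customer_b'])
```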

Multiple database servers, 1 logical partition per server

If you have multiple database servers but the same logical partition name on each of them, with some arbitrary number of tenants spread among the servers, then you simply need a mapping of the tenants to the correct database server connection.

You will end up with a Workbook and Data Source for each set of customers on a given database server. You’ll have to have Row Level Security in place in the Data Source to make sure that the customers on a given logical partition only see their data. You’ll also need to set your Permissions in Tableau Server to show the correct Workbook to the set of customers on the correct database server. In this case, I’d recommend a Site per set of customers on a database server.

Solution: Modify the XML for Server Address

If your logical partition names are the same on the various servers, you only need to modify the server property.

More Complex Situations / Non-Identical Tenants

So far, the examples have relied on the logical partitions being identical in their schemas. This is the preferred arrangement, and the one we can confidently say will work (Tableau has released the Document API as an official way to do this process, on the basis that it is expected to work). However, you may be doing things beyond that simple use case, and that's why the tableau_tools.tableau_documents sub-module exists.

Different Table Names or Custom SQL or Stored Procedure Parameters

tableau_documents includes methods to alter the table names, or the Custom SQL, or the Stored Procedure name and any hard-coded SP parameters. The README has the best documentation, and if you are doing something simple, they should work without too much trouble.

Currently only the “main table”, that is, the one that is pulled out first in the data connection pane of Tableau Desktop, can be modified, just owing to the complexity of the way Tableau stores its table JOIN relationships.

Column Aliases (Pretty Names Customers See)

You can change the Column Names that your end users see, because those are stored as separate properties alongside the actual column definitions. This is currently undocumented in tableau_tools, but the functionality has been tested in previous versions.
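
For reference, the pretty name lives in the caption attribute of a column tag, separate from the database column name, so changing it is another simple attribute edit. A simplified sketch (a real Data Source carries more attributes on each column; the names here are invented):

```python
import xml.etree.ElementTree as ET

# Simplified column definition as stored in a Data Source
ds_xml = """
<datasource>
  <column caption='cust_nm' datatype='string' name='[cust_nm]' role='dimension' type='nominal' />
</datasource>
"""

root = ET.fromstring(ds_xml)
col = root.find('column')

# caption is what end users see; name stays tied to the database column
col.set('caption', 'Customer Name')

print(ET.tostring(root, encoding='unicode'))
```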

Different Table Column Names

If the tables in your different logical partitions actually have different column names, things become considerably more difficult. All of that information is stored in the Data Source XML as well, but you have to be sure the changes you are making are exact and there is a much higher risk of things breaking. At minimum, the data type needs to be consistent for the same “column” in the different logical partitions.

Workbooks that Connect to Published Data Sources

As mentioned in the Best Practices section (despite all of the examples showing embedded Data Sources within Workbook files), you should be connecting your Workbooks to Published Data Sources. However, publishing a template Workbook to a different Site when it is connected to Published Data Sources means you also need to modify the connection details in the Workbook XML that describe the connection to the Published Data Source.

  1. You must publish the Published Data Sources to the target Site before you publish the Workbook, or you will get an error
  2. Once you have published the Data Source to the destination Site, you need to grab its contentUrl property to put into the Workbook XML prior to publish

There is a blog post that discusses how this plays out when you are trying to replicate an existing Workbook with Published Data Sources. Theoretically, if you know exactly which Data Sources you need to publish over, you don't need the full set of scripting in that post, but understanding it and the example script it links to will help you do the whole process properly.
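
In the Workbook XML, the connection to a Published Data Source takes the form of a repository-location tag, whose id attribute holds the Data Source's contentUrl and whose path and site attributes name the Site. A simplified sketch of the swap (the Site name and contentUrl values here are invented, and a real tag carries additional attributes):

```python
import xml.etree.ElementTree as ET

# Simplified version of what a Workbook stores for a Published Data Source
ds_xml = """
<datasource caption='My Published DS'>
  <repository-location id='MyDatasource' path='/t/old_site/datasources' revision='1.0' site='old_site' />
</datasource>
"""

root = ET.fromstring(ds_xml)
loc = root.find('repository-location')

# Point the Workbook at the copy published to the destination Site.
# The id must be the contentUrl you grabbed after publishing the Data Source there;
# it can differ per Site, so always check rather than assume it matches.
loc.set('site', 'customer_a')
loc.set('path', '/t/customer_a/datasources')
loc.set('id', 'MyDatasource')

print(ET.tostring(root, encoding='unicode'))
```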
