Replicating Workbooks with Published Data Sources

If you were ever wondering why there is both a REST API and a Document API produced by Tableau, or why we at this blog put out tableau_tools implementing both of those functionalities (and more!), this use case will illustrate it clearly.

The desired action: Specify a workbook on one Tableau Server site to be downloaded and published to a different Tableau Server site (we’ll call this “replicating over”).

Why it is complicated: Best practice with Tableau Workbooks is to Publish their Data Sources separately, to aid in managing the metadata and to provide for unbreakable Row Level Security, among other great reasons. This means we need to download any Published Data Sources that the Workbook is connected to, and publish them over to the new site as well. Simple enough, right?

After a lot of research and testing, the following steps are required to accomplish this correctly:

  1. Download all of the workbooks you are interested in using the REST API
    1. Makes sure to do this one Project at a time, because Workbooks can have the same name if they are in different Projects
  2. Open up each of the workbook files to look at which published data sources (use tableau_tools.tableau_documents)
    1. Scan through all of the datasource elements in the Workbook XML.
    2. Check to see if each datasource is a published data sources
    3. If a published data source is found, find the contentUrl referenced within
  3. Query all Data Sources using the REST API. Search for any Data Source whose  contentURL attribute matches one of those from the workbooks
  4. Download the matching data sources using the REST API
  5. Publish the data sources across to the new Site
    1. You will need to provide the credentials for any data source at publish time, since there is no way to securely retrieve them from the originating site
  6. Once published, retrieve the details from the new Data Source on the new site, including the new contentUrl property
  7. Reopen the workbook file, then change the Site and Data Source cotentUrls to match the the newly published Data Sources on the destination site
  8. Publish the workbook using the REST API

Luckily, all of this is possible using tableau_tools, and there is a sample script available now showing how to do it.

Example Script

The example script for this process is contained in template_publish_sample.py under the function replicate_workbooks_with_publish_dses. It is well commented with explanations similar to this blog post as to why steps and the order of them are necessary.

Best Practices if you have this need

Give unique names to all of your Workbooks and Data Sources on one Site. There will be much less confusion overall if everything has unique names (you could theoretically look up things purely on the name attribute if you don’t have to worry that there is more than one)

Specify all your workbooks you want to move from one Project at a time, so that you don’t have any naming collisions as you download things down. If you follow the order of the example script, this also allows you to only publish over Data Sources one time, even if multiple Workbooks connect to them.

Why you have to go and make things so complicated?

Tableau Server only requires unique names for Workbooks and Data Sources within a Project, rather than within the Site. This means you can have multiple workbooks or data sources with the same “visible name”. Internally, Tableau Server generates a unique “contentUrl” property using a pattern which removes spaces and other characters, and appends numbers if the pattern would result in overwriting any existing contentUrl. However, there is no guarantee that you will get the same contentUrl from Site to Site, since the publish order could result in the numbering being different

  • Ex: Publishing “My Great Datasource” in Project A on Site A might get contentUrl “MyGreatDatasource”. Publishing “My Great Datasource” to Project B on Site A might get “MyGreatDatasource_1” as a contentUrl. But if you publish them to Site B in the opposite order, you might get “MyGreatDatasource” contentUrl for the data source in Project B, and “MyGreatSource_1” in Project A.

Tableau Server doesn’t have any REST API capability for looking into connection details beyond the few in the Query Datasource Connections method or Query Workbook Connections There are a few hints, but it’s not enough to make a full decision on what to publish over.

Query Workbook Connections will tell you the type property is “sqlproxy”, which indicates that you are dealing with a published workbook. However, if you look at the datasource tags within the connection tag, you will find that these “connections” cannot be queried via the REST API — they only “exist” when they come back from the Query Workbook Connections method, and if you use the IDs with the Query Datasource method, you will get an error.

Query Workbook Connections also will give you the name property, but as noted earlier, multiple datasources can have the exact same name if they are in different Projects. You actually need the contentUrl property of the datasource, which is not available.

The contentUrl property exists within the datasource XML of the workbook itself, and does need to be correct with regard to both Site and Data Source. There is no REST API method for querying or changing this at or after publish time, so you have to make that modification in the XML. Tableau’s official Document API does not have a mechanism to adjust these properties, but tableau_tools does.

 

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s