Publishing Datasources and Workbooks to Multiple Sites

This post coincides with the 1.2.0 release of the tableau_rest_api library, which incorporates code for handling all of this programmatically, in an integrated way with the REST API. As always, absolutely none of this is supported and it may break at any time.

You’ve built out a some great visualizations on your test database and now want to deploy them to the production servers. And then you want to deploy to each of your customer’s sites, but each customer’s data is silo’d off. You look for the button to deploy in Tableau Desktop, but alas it does not exist. This scenario is incredibly common, particularly for SaaS software deployments or if you are deploying multiple copies of Tableau Server in different data centers. Luckily it is very possible, and there are now facilities in the tableau_rest_api library to easily handle most common use cases.

The 9.0 REST API has a method for changing the database server, but no facility for changing the schema/database or whatever else your favorite RDBMS likes to call its logical partitions. I’m going to call them schemas throughout here because even though SQL Server and PostgreSQL call them databases it’s too confusing; don’t get me started about Tableau Server having a web server component on a physical server.

Best Practice – Publish all your datasources

The number one best practice if implementing this way is to Publish All Your Datasources. Publish them successfully to the first site, then connect your template workbook to those published datasources and build your visualizations from there.

Second, always publish ‘identical’ datasources (with just a change to the ‘dbname’ parameter / schema ) with the exact same name across all of your sites. Why? If a site has a Published Data Source with an identical name as the one you connected to in the template workbook, you can publish that workbook to ANY site without making any changes to the XML, and it will just work (only tested in 9.0). This is worth repeating — if you manage your datasources correctly, you only need ONE workbook. This was all true prior to 9.2. Additional checking was implemented in 9.2, so you do now need to change the site_content_url to successfully publish to a different site.

Published data sources are a necessity when you are publishing a workbook with multiple connections and you want to publish programmatically. The Tableau Desktop user interface has a way to embed the credentials of multiple live connections in a workbook, but the REST API (and tabcmd) only allow one set of credentials to be passed at publish time. So if you need to embed credentials to multiple datasources in a workbook, you need to upload those datasources first with the appropriate credentials, then publish the workbook pointing to the published data sources.

USING Tableau_REST_API 1.2.0 TO DEPLOY

Originally, the tableau_rest_api Python library published workbooks and datasources directly from disk, and was limited to writing anything brought back via the ‘save’ methods to disk. Version 1.2.0 includes a number of changes to make the process of changing datasource connection information easy.

TableauDatasource and TableauWorkbook are new classes with the necessary methods to do safely and consistently change only the XML we are interested in. They actually only change the ‘connection’ tag lines and use everything else as it came in originally. Each TableauDatasource contains one TableauConnection object that handles the changes to the ‘connection’ tag and nothing else. TableauWorkbook objects can contain multiple TableauDatasources in the ‘datasources’ attribute. You can create a TableauWorkbook or TableauDatasource from the XML as a string; in 1.2.0, the download_ methods for workbooks and datasources return the downloaded XML for this very purpose. You can also load directly into them from disk. Similarly, the publish_ methods can now take their appropriate object instead of a filename, so you can keep everything in memory while making your variations.

Remember, you always need to pass your database credentials during the publish step if you want security to be embedded (not necessary for SQL Server when Windows Authentication). Changing the username in the TDS or TWB is unnecessary.


from tableau_rest_api.tableau_rest_api import *
import urllib2

username = ''
password = ''
server = 'http://127.0.0.1'

logger = Logger('publish.log')

test = TableauRestApi(server, username, password, content_url='Test')
test.enable_logging(logger)
test.signin()

prod = TableauRestApi(server, username, password, content_url='Prod')
prod.enable_logging(logger)
prod.signin()

test_ds_proj_luid = test.query_project_luid_by_name('Datasources')
# Publishing up my first data source to Test, from disk
new_ds_luid = test.publish_datasource('postgres.tds', 'Postgres Test Two', test_ds_proj_luid,
overwrite=True, connection_password='', connection_username='')
# Download the datasource that we just published, to test that it worked. Now returns the string
downloaded_ds = tab_srv.download_datasource_by_luid(new_ds_luid, 'Downloaded source - demo 1')

# Create a TableauDatasource object to allow us to change the dbname
ds_obj = TableauDatasource(downloaded_ds)
ds_obj.connection.set_dbname('demo_2')

# Save to disk just so we can see what we did
destination_tds = 'postgres_2.tds'
ds_obj.save_datasource_xml(destination_tds)

# Publish the changed version up to the Prod server
prod_ds_proj_luid = prod.query_project_luid_by_name('Datasources')
# Here we're passing the TableauDatasource. Notice I'm naming it the same thing in both systems
new_ds_luid_prod = prod.publish_datasource(ds_obj, 'Postgres Test Two', prod_ds_proj_luid,
overwrite=True, connection_password='', connection_username='')

At this point, you could publish a workbook connected to the ‘Postgres Test Two’ source on the Test site to both the Test and Prod sites, without making any changes.

If you want to update the live connections which are stored directly in a TWB, there you can access the ‘datasources’ attribute, which is a dict in format { datasource_name : TableauDatasource}. Then use the ‘connection’ attribute of the TableauDatasource, which is a TableauConnection object that has method for setting the dbname, port, and server.

Here’s an example (also showing loading the TableauWorkbook object from a file on disk)


wb_filename = 'two source.twb'
fh = open(wb_filename, 'rb')
wb_obj = TableauWorkbook(fh.read(), logger_obj=logger)
# wb_obj.enable_logging(logger)
for ds in wb_obj.datasources.values():
print ds.get_datasource_name()
print ds.connection.is_published_datasource()
print ds.connection.get_dbname()
if ds.connection.get_dbname() == 'demo':
ds.connection.set_dbname('demo_2')

wb_obj.save_workbook_xml('saved_workbook.twb')

tab_srv.publish_workbook(wb_obj, 'Switched Up Datasource', proj_luid, overwrite=True, connection_password='',
connection_username='')

HOW IT WORKS

One of the most best things about Tableau’s file formats are that they are relatively easily understood XML files. It’s an ‘open secret’ that if you have databases with identical schemas, you can simply create an initial template file, save the TDS (or the TWB) and then make copies with the small changes necessary in the XML to change from one schema to another.

If you pop open a TDS in a good text editor, most likely on line 5 you will see the connection tag


  <connection class='postgres' dbname='demo' odbc-native-protocol='yes' port='5432' server='localhost' username='postgres'>

There are our dbname, server and port attributes. We just need an easy way to change them and we’re off to the races. The ‘username’ attribute does not drive the username used by server at all — you need to include both the username and the password when you publish (if you want embedded credentials) and the ‘username’ attribute in the XML does not need to match the credentials you publish with.

If you open up a TWB file, you’ll find a datasources tag, and inside it will contain datasource tags that are exactly the same as those in the TDS file. The workbook literally contains its datasources.

The only major difference in a workbook from a TDS is when there is a published data connection.


<datasource caption='Postgres Test Two' inline='true' name='sqlproxy.42178.428603298613' version='9.0'>
<repository-location derived-from='http://127.0.0.1/t/Prod/datasources/PostgresTestOne?rev=1.1' id='PostgresTestTwo' path='/t/Test/datasources' revision='1.3' site='Test' />
<connection channel='http' class='sqlproxy' dbname='PostgresTestTwo' directory='/dataserver' port='80' server='localhost' username=''>
<relation name='sqlproxy' table='[sqlproxy]' type='table' />

The ‘caption’ attribute in the ‘datasource’ tag is the name that the data source was published under. You’ll see it matches very closely to ‘dbname’ in the ‘connection’ tag — Tableau reduces all spaces and extraneous characters to give an ‘internal name’. This is why datasources should be published with the exact same name from one site to the next; this will result in the same ‘dbname’. You can see a ‘repository-location’ tag; nothing needs to be modified here, and the ‘derived-from’ attribute has no effect on anything in my testing.

Because Tableau Server handles all necessary transformations in these portions of the XML, there is no need to modify anything on a published data source in a workbook, simply republish it to a different site after publishing up the data source to that site (with whatever changes are necessary). This was all true prior to 9.2. Additional checking was implemented in 9.2, so you do now need to change the site_content_url to successfully publish to a different site.

Advertisements

4 comments

  1. Hi there! Thanks for the awesome write up and for the work on tableau_rest_api. I’m trying to make use of it to automate the deploy of our datasources. I can sign in, list data sources and projects, but when I try to publish a datasource (in a way almost identical to your first example) I get the following error output. I was wondering if you have ever run across something like this.

    Traceback (most recent call last):
    File “~/python/rest-api-python-sample/publishDataSource.py”, line 32, in
    a.publish_datasource(‘headcount.tds’, ‘headcount’, DEFAULT_PROJECT_ID, overwrite=True, connection_username=”,connection_password=”)
    File “build/bdist.macosx-10.10-x86_64/egg/tableau_rest_api/tableau_rest_api.py”, line 1407, in publish_datasource
    File “build/bdist.macosx-10.10-x86_64/egg/tableau_rest_api/tableau_rest_api.py”, line 1511, in publish_content
    File “build/bdist.macosx-10.10-x86_64/egg/tableau_rest_api/tableau_rest_api.py”, line 343, in send_publish_request
    File “build/bdist.macosx-10.10-x86_64/egg/tableau_rest_api/tableau_rest_api.py”, line 1701, in request_from_api
    File “build/bdist.macosx-10.10-x86_64/egg/tableau_rest_api/tableau_rest_api.py”, line 1669, in __make_request
    File “/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py”, line 437, in open
    response = meth(req, response)
    File “/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py”, line 550, in http_response
    ‘http’, request, response, code, msg, hdrs)
    File “/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py”, line 475, in error
    return self._call_chain(*args)
    File “/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py”, line 409, in _call_chain
    result = func(*args)
    File “/usr/local/Cellar/python/2.7.9/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py”, line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 400: Bad Request

    I’ve also experimented using Postman to issue the POST requests using the request body generated by tableau_rest_api. I get a 400 Bad Request, “Your browser sent a request that this server could not understand.” response.

    Any input would be greatly appreciated 🙂

    Thanks!

    Like

    1. Hi Drew,
      If you aren’t uploading credentials, then you can just drop out the connection_username= and connection_password= arguments. Otherwise they need some kind of values — the method itself that builds the XML expects a python None rather than empty strings. If you are putting in values there and just leaving them out for obfuscation, then there may be something else going on. What is appearing in the log files, those should be capturing all of the XML requests that are attempted.

      Like

      1. Thanks for the pointers! I removed the connection username and password in the publish_datasource command, but I’m still getting the same error. The actual POST request looks ok via the logs for what it’s worth. The end of the log file has the response below:

        2015-07-10 10:46:12: HTTP verb is post
        2015-07-10 10:46:12: Making REST request to Tableau Server using post
        2015-07-10 10:46:12: Request URI: https://tableau-test.redfintest.com/api/2.0/sites/695a28d9-f7e7-494e-be47-fb365a7cbb0d/datasources?overwrite=true
        2015-07-10 10:46:13: Received a 400 error, here was response:
        2015-07-10 10:46:13:

        Bad Request
        There was a problem publishing the file ‘headcount.tds’.

        Any additional thoughts?

        Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s