With the release of tableau_tools 4.0.0 and Tableau Server 10.5, most of the pieces are in place in the library and in the product itself to allow for an efficient path for publishing unique extracts for different customers all from a single original data source (or workbook) template.
The basic steps of the technique are:
- Create a template live connection to a database table, Custom SQL or a Stored Procedure in Tableau Desktop. This does not need to be the final table/custom SQL or Stored Proc; you can use a test or QA data source and switch it programmatically to the final source
- Optional: Set up the appropriate filtering for a single customer / user / etc. (whatever the main filtering field will be). You can instead add this later programmatically.
- Save that file (TDS or TWB)
- Use the tableau_tools.tableau_documents sub-module to programmatically add any additional filters or modify the filters / parameters you set
- Use tableau_tools to alter the actual table / SP / Custom SQL to the final version for that customer
- Add an extract to that data source in tableau_tools. This will use the Extract API / SDK to generate an empty extract with the bare minimum of requirements to allow it to publish and refresh
- Save the new file. It will be saved as a TWBX or TDSX, based on the input file type
- Publish the file to Tableau Server
- Send an Extract Refresh command to Tableau Server using the REST API (using the tableau_tools.tableau_rest_api sub-module).
- The extract will refresh based on the information in the TDS and be filled with data just for the specified customer / user / whatever you filtered on
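Compressed into code, the flow looks roughly like the following sketch. The server URL, credentials, file names, and field names are all illustrative placeholders, and the refresh call assumes the data source variant of the 10.5 refresh methods; full worked examples follow below.

```python
from tableau_tools import *
from tableau_tools.tableau_rest_api import *
from tableau_tools.tableau_documents import *

logger = Logger(u'template_publish.log')

# Open the saved template (TDS or TWB) and modify the data source(s)
t_file = TableauFile(u'template.tds', logger)
for ds in t_file.tableau_document.datasources:
    ds.add_dimension_extract_filter(u'Customer Name', [u'Customer A', ])  # per-customer filter
    ds.add_extract(u'Extract File.tde')  # empty extract stub so the file can publish and refresh
new_file = t_file.save_new_file(u'Customer A Data Source')  # becomes a TDSX automatically

# Publish, then kick off the refresh through the REST API
t = TableauRestApiConnection28(server=u'https://tableau.example.com', username=u'admin',
                               password=u'secret', site_content_url=u'customer_site')
t.signin()
proj_obj = t.query_project(u'Customer A')
new_ds = t.publish_datasource(ds_filename=new_file, ds_name=u'Customer A Data Source',
                              project_obj=proj_obj, overwrite=True,
                              connection_username=u'db_user', connection_password=u'db_pw')
t.run_extract_refresh_for_datasource(new_ds)
```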
Why use tableau_tools and this method vs. the Extract API / SDK directly?
The direct Extract SDK method has the following limitations:
- You have to know how to do the connections programmatically in whichever language you are using
- Each query that is built must be hard-coded, or you need to build out a flexible platform for handling arbitrary queries (see the sketch after this list)
- The resulting TDE or Hyper file has no knowledge of how it was created, so every update requires rerunning the Extract API script to create a new file, then pushing it to Tableau Server
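To make the first two limitations concrete, here is a minimal sketch of the direct route, assuming the Python flavor of the Extract API (the tableausdk module); the table schema and values are made up for illustration:

```python
# Direct Extract API / SDK route: schema and rows are all hard-coded by us
from tableausdk import *
from tableausdk.Extract import *

ExtractAPI.initialize()
extract = Extract(u'customer_a.tde')

# Every column must be declared by hand, per extract
table_def = TableDefinition()
table_def.addColumn(u'Region', Type.UNICODE_STRING)
table_def.addColumn(u'Sales', Type.DOUBLE)
table = extract.addTable(u'Extract', table_def)

# ...and every row must be queried and inserted by code we write ourselves
row = Row(table_def)
row.setString(0, u'North')
row.setDouble(1, 1234.5)
table.insert(row)

extract.close()
ExtractAPI.cleanup()
```

Nothing in the resulting file records where those values came from, which is the third limitation: any update means running a script like this again and republishing the file.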
If instead we start with TDS or TWB files, our regular Tableau Desktop users (in many cases the data stewards) can do all their usual work to create great data sources, then simply hand them over to the publishing process, which adds the empty extracts, pushes them out, and refreshes them.
Examples of the Process using tableau_tools
tableau_tools 4.4.0 and above contain the most complete set of tools for addressing any of these situations. Any version of Tableau Server after 10.3 has the necessary REST API methods to refresh an extract on demand, but with 10.5 you can be more specific about the particular workbook or data source you want refreshed, and you can also put the newly published workbook or data source on a refresh schedule.
Table-based Data Sources
Let’s look at a basic connection created in Tableau Desktop, to a regular table or view in a database. We want to use that connection, but change the name of the table and also add a filter on region to create separate workbooks for each regional team.
```python
# TableauFile and TableauDatasource come from the tableau_documents sub-module;
# the REST API classes and the Logger come from the main tableau_tools packages
from tableau_tools import *
from tableau_tools.tableau_rest_api import *
from tableau_tools.tableau_documents import *

logger = Logger(u'extract_publishing.log')

# Because all of the extracts will be published to the same site, we can use a single connection
t = TableauRestApiConnection28(server=u'https://tab.internal.lan', username=u'my_name',
                               password=u'donthackme', site_content_url=u'regional_site')
t.signin()
t.enable_logging(logger)

for region in [u'North', u'South', u'Central', u'West']:
    t_file = TableauFile(u'Regular Table.twb', logger)
    dses = t_file.tableau_document.datasources  # type: list[TableauDatasource]
    # In this example, there is only one data source, but the loop is still good practice
    # You can create more logic to protect and only change when you find particular variations
    for ds in dses:
        # Change the database connection to the Prod_DB database
        for conn in ds.connections:
            conn.dbname = u'Prod_DB'
        # Change the table name from test_table to customer_fact
        ds.main_table_relation.set(u'table', u'[customer_fact]')
        ds.add_extract(u'Extract File.tde')
        # Set the filter on the extract
        ds.add_dimension_extract_filter(u'Region', [region, ])
    new_file = t_file.save_new_file(u'{} with Extract'.format(region))
    # In our example, each region has their own Project on the same site, named for the Region
    proj_obj = t.query_project(region)
    new_wb = t.publish_workbook(workbook_filename=new_file,
                                workbook_name=u'{} Customer Details'.format(region),
                                project_obj=proj_obj, overwrite=True,
                                connection_username=u'tableau_db_user',
                                connection_password=u'tableau_user_passwrd')
    t.add_workbook_to_schedule(wb_name_or_luid=new_wb, schedule_name_or_luid=u'6 in tha morning')
    t.run_extract_refresh_for_workbook(new_wb)
```
Custom SQL Data Sources
With Custom SQL, often we aren’t adding an extract filter but substituting in new Custom SQL with different WHERE clauses, etc. Let’s take a similar example to the one before, but this time modify a TDS file rather than a TWB.
```python
# t and logger are the signed-in connection and Logger from the example above
for region in [u'North', u'South', u'Central', u'West']:
    t_file = TableauFile(u'Custom SQL.tds', logger)
    dses = t_file.tableau_document.datasources  # type: list[TableauDatasource]
    # TDSes only have one data source, but the loop is still good practice
    # You can create more logic to protect and only change when you find particular variations
    for ds in dses:
        # Change the database connection to the Prod_DB database
        for conn in ds.connections:
            conn.dbname = u'Prod_DB'
        # With Custom SQL, you change the text of the relation directly
        # (Yes, this would be a SQL injection attack vector if you allowed the outside world to set it)
        ds.main_table_relation.text = u"SELECT * FROM customer_fact WHERE region = '{}'".format(region)
        ds.add_extract(u'Extract File.tde')
    new_file = t_file.save_new_file(u'{} with Extract'.format(region))
    # In our example, each region has their own Project on the same site, named for the Region
    proj_obj = t.query_project(region)
    new_ds = t.publish_datasource(ds_filename=new_file,
                                  ds_name=u'{} Customer Details'.format(region),
                                  project_obj=proj_obj, overwrite=True,
                                  connection_username=u'tableau_db_user',
                                  connection_password=u'tableau_user_passwrd')
    t.add_datasource_to_schedule(ds_name_or_luid=new_ds, schedule_name_or_luid=u'6 in tha morning')
    # Refresh the newly published data source (the data source variant of the refresh call)
    t.run_extract_refresh_for_datasource(new_ds)
```
Stored Procedure Data Sources
A lot of people use Stored Procedures to efficiently deliver back the data that Tableau will be ingesting as an extract. You can also set the parameter values of the Stored Procedure programmatically:
```python
# t and logger are the signed-in connection and Logger from the first example
for region in [u'North', u'South', u'Central', u'West']:
    t_file = TableauFile(u'Stored Proc.tds', logger)
    dses = t_file.tableau_document.datasources  # type: list[TableauDatasource]
    # TDSes only have one data source, but the loop is still good practice
    # You can create more logic to protect and only change when you find particular variations
    for ds in dses:
        # Change the database connection to the Prod_DB database
        for conn in ds.connections:
            conn.dbname = u'Prod_DB'
        # With stored procedure parameters, you need to specify the name as it would show up
        # in Tableau Desktop. This includes the @ for MS SQL Server and Sybase ASE
        ds.set_stored_proc_parameter_value_by_name(u'@Region', region)
        ds.add_extract(u'Extract File.tde')
    new_file = t_file.save_new_file(u'{} with Extract'.format(region))
    # In our example, each region has their own Project on the same site, named for the Region
    proj_obj = t.query_project(region)
    new_ds = t.publish_datasource(ds_filename=new_file,
                                  ds_name=u'{} Customer Details'.format(region),
                                  project_obj=proj_obj, overwrite=True,
                                  connection_username=u'tableau_db_user',
                                  connection_password=u'tableau_user_passwrd')
    t.add_datasource_to_schedule(ds_name_or_luid=new_ds, schedule_name_or_luid=u'6 in tha morning')
    t.run_extract_refresh_for_datasource(new_ds)
```