The Missing Link – Putting a Hyper File from the Extract API into a TDSX File

The Tableau Extract API 2.0 is an amazingly powerful tool for building out Extracts that, for whatever reason, cannot be built or maintained using the standard Tableau Server extract refresh process. The output of the Extract API 2.0 is a Hyper file (just as the older Extract API pushed out TDE files). You can publish a Hyper file directly to a Tableau Server, but there are several drawbacks:

  • Tableau Server will build out an automatic TDS file, taking a rough guess at any type of metadata categorization (Measure vs. Dimensions, Hierarchies, Geographic info, etc.)
  • The only use for this data source will be creating Ad Hoc reports using Web Edit (or hoping someone in Desktop now knows that it exists). You can’t integrate it easily in an existing Workbook

What is missing is a TDS file to pair up with the Hyper file, describing the exact metadata that you want to go along with the Extracted data. In this article, I’ll describe two workflows that result in a fully controlled TDSX file with a newly generated Hyper file.

Building a TDSX/TWBX in Desktop, then Dropping in a New Hyper file

The first workflow gives you the most control, and is recommended particularly if you know that your Extract definitions won’t be changing very frequently. You set up a template TDSX/TWBX:

  1. Create the Hyper file via the Extract API
  2. Open the Hyper file as your Data Source in Tableau Desktop
  3. Make any changes to the Data Source you would like
  4. Save the Data Source as a TDSX file / Save the Workbook as TWBX

Then for any subsequent updates / variations, you open the existing TDSX/TWBX and replace the .Hyper file with the new version. This will result in a new file, which can then be published via the Tableau Server REST API.

The one requirement for this method is that the schema of your Hyper file should not change after you have built out your template TDSX/TWBX, or things might break.

Here is an example using tableau_tools to create an Extract using the Extract API with two Tables (built using PyODBC cursors):

</pre>
<pre>def hyper_create_two_tables():
    hyper_final_filename = 'Test File.hyper'

    h = HyperFileGenerator(logger)

    # Definition of ODBC connection string for PostgreSQL
    db_server = 'dbserv1.company.lan'
    db_port = '5432'
    db_db = 'database_name'
    db_username = 'username'
    db_password = 'password'
    pg_conn_string = 'Driver={{PostgreSQL Unicode}};Server={};Port={};Database={};Uid={};Pwd={};'.format(
        db_server, db_port, db_db, db_username, db_password)

    # Connect using pyodbc to get first table
    fact_table_query = 'SELECT * FROM fact_table;'

    try:
        conn = pyodbc.connect(pg_conn_string)
        conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
        conn.setencoding(str, encoding='utf-8')
        conn.setencoding(unicode, encoding='utf-8', ctype=pyodbc.SQL_CHAR)

        # https://github.com/mkleehammer/pyodbc/issues/194 for this encoding fix

        conn.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-32le')
    except pyodbc.Error as e:
        print("ODBC Connection Error Message:\n")
        print(e)
        print("ODBC error, exiting...\n")
        sys.exit()
    cursor = conn.cursor()

    try:
        cursor.execute(fact_table_query)
    except pyodbc.ProgrammingError as e:
        print("\nODBC Query Error Message:\n")
        print(e)
        print("\nODBC error, exiting...\n")
        sys.exit()

    # This method gets all of the info from the table schema and correctly creates the extract schema
    h.create_table_definition_from_pyodbc_cursor(cursor)
    # This takes the cursor, reads through all the rows, and ads them into the extract
    h.create_extract(hyper_final_filename, append=True, table_name=u'Fact Table', pyodbc_cursor=cursor)

    print('Table 1 added to file {}'.format(hyper_final_filename))
    # Now repeat for the second entitlements table
    entitlement_table_query = 'SELECT * FROM entitlements;'
    try:
        cursor.execute(entitlement_table_query)
    except pyodbc.ProgrammingError as e:
        print("\nODBC Query Error Message:\n")
        print(e)
        print("\nODBC error, exiting...\n")
        sys.exit()

    # This method gets all of the info from the table schema and correctly creates the extract schema
    h.create_table_definition_from_pyodbc_cursor(cursor)
    # This takes the cursor, reads through all the rows, and ads them into the extract
    h.create_extract(hyper_final_filename, append=True, table_name=u'Entitlement Table', pyodbc_cursor=cursor)
    print('Table 2 added to file {}'.format(hyper_final_filename))

    print('All Done with Hyper create')</pre>
<pre>

And then the next step which is to swap in this newly created Hyper file into the existing TDSX file that we built using Tableau Desktop.

</pre>
<pre><code># First, we need to find the exact filepath name of the Hyper file
t_file = TableauFile('My AmazingDatasource.tdsx')
for file in t_file.other_files:
    print(file)
# Then we construct a dictionary telling what file to swap in
</code></pre>
<pre><code>t_file = TableauFile('My AmazingDatasource.tdsx')
file_map = { 'Data/Test File.hyper' : '/Users/bhowell/Documents/Test File 2.hyper'}
t_file.save_new_file('My AmazingDatasource - Updated', data_file_replacement_map=file_map)</code><code></code></pre>
<pre># Last, we publish to Server
ds_name = 'Extract API Generated Data Source'</pre>
<pre>t = TableauRestApiConnection33(server=u'http://myTableauServer', username=u'bhowell',
                               password=u'mypassword', site_content_url='site_b')
t.signin()
t.enable_logging(logger)
proj_obj = t.query_project('Default')
ds_id = t.publish_datasource('My AmazingDatasource - Updated.tdsx', ds_name, proj_obj, overwrite=True)
print('New DS ID {}'.format(ds_id))</pre>
<pre>

Building a TDS Programmatically

If your Hyper building process might result in files with different columns depending on what the user chooses or different customers have different columns (for example, if you are following the Flexible Extract Generator pattern), then it may not be possible to build the a single template TDSX/TWBX that works for all the possible Hyper files you are creating. In this case, it may be preferable to create a TDS file programmatically, then wrap it all together as a TDSX file.

We can build the Extract using the exact same code from the first example above. However, after creating the Hyper file, we’ll build a TDS file directly:

[Example to come soon ]

The last aspect is to put it all together as a TDSX file so it can be published together via the REST API.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s