Full Localization of Tableau Workbooks

Tableau has a great whitepaper that explains all of the out of the box supported localization methods. But they are not exactly “seamless”; they take a lot of effort and still result in some quirks. In this post, I’ll explore how some simple modification to the Tableau XML files can provide a method to generate “fully” translated workbooks that are won’t look out of place even when using Web Edit or See Underlying Data.

To build a translatable workbook, follow these steps:

  1. Connect to your datasource. Do not rename any fields. Also do not create any calculations
  2. Publish the datasource to Tableau Server.
  3. Save the datasource as a .TDS file for later, in case you need to make changes.
  4. Connect to the published datasource on Tableau Server. Close the original local connection
  5. Save the workbook as a TWB

Now you have a workbook that is fully ready to be translated. Keep reading for how to make it work.

If you are one of the 10’s of regular readers here, you’ll know that TWB files are just XML encoded in UTF-8. We can open them up in a regular text editor and make changes to see what happens.

If you set the locale in Tableau Desktop, this actually introduces a locale parameter to the main workbook tag. You can modify theis programmatically when publishing out localized copies.


<workbook locale='en_GB' source-platform='win' version='9.1' xmlns:user='http://www.tableausoftware.com/xml/user'>

Data Source Captions

Workbooks have a set of datasources tags, within which each datasource is defined. All of the visible attributes that you can change about a Dimension or Measure within a datasource are defined in a set of column tags.

These are not the column tags inside a set of columns tags — so if you see something like

<columns header='yes' outcome='6'>
            <column datatype='integer' name='Row ID' ordinal='0' />
            <column datatype='string' name='Order ID' ordinal='1' />
            <column datatype='date' name='Order Date' ordinal='2' />
            <column datatype='date' name='Ship Date' ordinal='3' />
            <column datatype='string' name='Ship Mode' ordinal='4' />
</columns>

you are in the wrong place. The columns you are looking for come after

<aliases enabled='yes' />

They look something like

 <column aggregation="Count" datatype="integer" default-type="ordinal" layered="true" name="[IDCustomer (DimCustomer)]" pivot="key" role="dimension" type="ordinal" user-datatype="integer" visual-totals="Default">
      </column>
      <column aggregation="Count" caption="ID Customer" datatype="integer" default-type="ordinal" layered="true" name="[IDCustomer]" pivot="key" role="dimension" type="ordinal" user-datatype="integer" visual-totals="Default">
      </column>

You’ll notice that one of these has a caption attribute. When this is visible, it is was is displayed in the Tableau UI for the Measure or Dimension name. The name attribute actually matches up to the name defined for the field elsewhere in the XML. This is why our steps are to publish without making any changes to how Tableau connects to the data initially — we want to make sure our names match up. Additionally, some caption information stored in a published data source will override the captions in the workbook, so it’s safer to minimize those and keep all the things we want to change in the workbook.

Our process for translation is just to add or change the caption attribute with the new values we want to use. For the most thorough method, you’ll go through and do a Rename in the Tableau UI and give each Measure and Dimension a token that can be easily identified and stored in a translation table somewhere.

So you might rename Order Date to {Order Date}, then set up a table in your DB or a Unicode CSV like the following:

Token English German Russian Thai
{{{Order Date}}} Order Date Auftragsdatum Дата заказа วันสั่ง
{{{Sales}}} Sales Bestellungen заказы คำสั่งซื้อ

Then you just go through and replace the tokens in the caption attribute with the appropriate translation.

Changing other elements

There are all sorts of places with text within a workbook or a dashboard, for example:

  • Sheet Names
  • Sheet Titles (when different from name)
  • Filter Aliases
  • Text Blocks on a Dashboard

 

All of these can also be modified using the same technique, but since they live in the workbook portion of a TWB, they have not historically been the focus of the XML modification capabilities delivered by Tableau. Rather than actually treating the XML as XML and working with the elements directly, it may be easier simply to treat those portions of the file as text and do a simple FIND/REPLACE operation on the tokens. Make sure your tokens are totally unique for each element, then label them in a way that is easy for a find-replace to work.

One thing to watch out for is that if your translated elements have any quotation marks (either single or double quotes), these must be encoded to be stored in the attribute values of an XML element.

For example, if your dashboard is named Bryant’s Boss’s Dashboard, this will be stored like


dashboard='Bryant's Boss's Dashboard'

So make sure to do the correct XML quoting prior to inserting any of your values. Or don’t use any quotes or apostrophes in any of the text you are substituting in, but that can be quite a challenge.

Actions require additional actions

Just when you think you’ve got it all figured out, there is one more place that requires its own set of rules. Dashboard and Worksheet Actions involve a section that is URL Encoded, including transforming any Unicode text into the % encodings. Here’s an example of what I’m talking about, where there is a sheet named 대시 보드 (the Google Translate into Korean of “Great Dashboard”). When you look at the link element in the expression attribute, there is a big long expression that I promise includes that sheet name:


<action caption='Filter 1 (generated)' name='[Action1]'>
<activation auto-clear='true' type='on-select' />
<source dashboard='대시 ' type='sheet' worksheet='My Darshbord's Fantabulous' />
<link caption='Filter 1 (generated)' delimiter=',' escape='\' expression='tsl:%EB%8C%80%EC%8B%9C%20?%5Bfederated.0sszrd31rd2h7c10h6udq0qnog6y%5D.%5BProduct%20ID%5D~s0=<[Category]~na>' include-null='true' multi-select='true' url-escape='true' />
<command command='tsc:tsl-filter'>
<param name='target' value='대시 ' />
</command>

That’s right, in this, the bolded portion represents 대시 보드 ! :

expression=’tsl:%EB%8C%80%EC%8B%9C%20?%5Bfederated.0sszrd31rd2h7c10h6udq0qnog6y%5D.%5BProduct%20ID%5D~s0=<[Category]~na>’

So to correctly replace your tokenized version (which should look like {{{tokenASCIICharactersOnly}}} ), you need to use the URL encoding of the Unicode text to insert in there.

A quick Python script to translate a whole document

If you are changing more than the data source captions, you might as well tokenize every element you want to translate, then just do a massive find/replace against the whole document. The following example script uses the amazing FlashText library to do the replacements in almost no time. You’ll notice I’m using a dictionary to define my tokens and values; you could instead run through a database or CSV and generate whatever data structure you like. (If using Python 3, you don’t need those strings to have the Unicode u’ prefix thankfully)

# -*- coding: utf-8 -*-
import codecs
from flashtext import KeywordProcessor
def localize():
    template_filename = 'Tokenized Template.twb'
    translated_filename = 'Tokenized - English.twb'
    translated_filename_ko = 'Tokenized - Korean.twb'
    keyword_processor = KeywordProcessor()
    token_dict = {
        u"{{{Sheet1}}}": u"Call Length By Agent",
        u"{{{ActivityIDDim}}}": u"Activity ID",
        u"{{{AgentIDDim}}}": u"Agent ID",
        u"{{{CallStatusDim}}}": u"Call Status",
        u"{{{TeamNameDim}}}": u"Team Name",
        u"{{{StatusByTeamSheet}}}": u"Status By Team",
        u"{{{Sheet1Title}}}": u"Call Length in Minutes",
        u"{{{Dashboard1}}}": u"Overview Dashboard"
    }
    for token in token_dict:
        keyword_processor.add_keyword(token, token_dict[token])

    o_ds_fh = codecs.open(template_filename, 'r', encoding='utf-8')
    new_file = keyword_processor.replace_keywords(o_ds_fh.read())
    o_ds_fh.close()
    ds_fh = codecs.open(translated_filename, 'w', encoding='utf-8')
    ds_fh.write(new_file)
    ds_fh.close()
    print("Finished translating English")

    keyword_processor = KeywordProcessor()
    token_dict = {
        u"{{{Sheet1}}}": u"상담원 별 통화 길이",
        u"{{{ActivityIDDim}}}": u"활동 ID",
        u"{{{AgentIDDim}}}": u"에이전트 ID",
        u"{{{CallStatusDim}}}": u"통화 상태",
        u"{{{TeamNameDim}}}": u"팀 이름",
        u"{{{StatusByTeamSheet}}}": u"팀 별 상태",
        u"{{{Sheet1Title}}}": u"상담원 별 통화 길이",
        u"{{{Dashboard1}}}": u"개요"
    }
    # Fix any quotes that might be in your elements
    for value in token_dict.values():
        value = quoteattr(value)
    for token in token_dict:
        keyword_processor.add_keyword(token, token_dict[token])
    
    # honestly you might need to make two passes here, one that only grabs action lines and puts the
    # urlencoded values, then another that replaces all the other instances in the file
    # This example only implements the simple find / replace
    o_ds_fh = codecs.open(template_filename, 'r', encoding='utf-8')
    new_file = keyword_processor.replace_keywords(o_ds_fh.read())
    o_ds_fh.close()

    ds_fh = codecs.open(translated_filename_ko, 'w', encoding='utf-8')
    ds_fh.write(new_file)
    ds_fh.close()
    print("Finished translating Korean")

Using tableau_tools to change captions

This caption translation functionality is built into tableau_tools (see section 2.5.6 of the README), using the translate_captions method of the TableauColumns object, which is accessible via the TableauDatasource.columns property.

<pre>english_dict = {'{token1}': 'category', '{token2}': 'sub-category'}
german_dict = {'{token1}': 'Kategorie', '{token2}': 'Unterkategorie'}
tab_file = TableauFile('template_file.tds')
dses = tab_file.tableau_document.datasources  # type: list[TableauDatasource]
for ds in dses:
    ds.columns.translate_captions(english_dict)
new_eng_filename = tab_file.save_new_file(u'English Version')
# Reload template again
tab_file = TableauFile('template_file.tds')
dses = tab_file.tableau_document.datasources  # type: list[TableauDatasource]
for ds in dses:
    ds.columns.translate_captions(german_dict)
new_ger_filename = tab_file.save_new_file(u'German Version')</pre>

For other elements, you can just do a full text find/replace on the entirety of the TWB file.

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s