Full Localization of Tableau Workbooks

Tableau has a great whitepaper that explains all of the out of the box supported localization methods. But they are not exactly “seamless”; they take a lot of effort and still result in some quirks. In this post, I’ll explore how some simple modification to the Tableau XML files can provide a method to generate “fully” translated workbooks that are won’t look out of place even when using Web Edit or See Underlying Data.

To build a translatable workbook, follow these steps:

  1. Connect to your datasource. Do not rename any fields. Also do not create any calculations
  2. Publish the datasource to Tableau Server.
  3. Save the datasource as a .TDS file for later, in case you need to make changes.
  4. Connect to the published datasource on Tableau Server. Close the original local connection
  5. Save the workbook as a TWB

Now you have a workbook that is fully ready to be translated. Keep reading for how to make it work.

If you are one of the 10’s of regular readers here, you’ll know that TWB files are just XML encoded in UTF-8. We can open them up in a regular text editor and make changes to see what happens.

If you set the locale in Tableau Desktop, this actually introduces a locale parameter to the main workbook tag. You can modify theis programmatically when publishing out localized copies.


<workbook locale='en_GB' source-platform='win' version='9.1' xmlns:user='http://www.tableausoftware.com/xml/user'>

Data Source Captions

Workbooks have a set of datasources tags, within which each datasource is defined. All of the visible attributes that you can change about a Dimension or Measure within a datasource are defined in a set of column tags.

These are not the column tags inside a set of columns tags — so if you see something like

<columns header='yes' outcome='6'>
            <column datatype='integer' name='Row ID' ordinal='0' />
            <column datatype='string' name='Order ID' ordinal='1' />
            <column datatype='date' name='Order Date' ordinal='2' />
            <column datatype='date' name='Ship Date' ordinal='3' />
            <column datatype='string' name='Ship Mode' ordinal='4' />
</columns>

you are in the wrong place. The columns you are looking for come after

<aliases enabled='yes' />

They look something like

 <column aggregation="Count" datatype="integer" default-type="ordinal" layered="true" name="[IDCustomer (DimCustomer)]" pivot="key" role="dimension" type="ordinal" user-datatype="integer" visual-totals="Default">
      </column>
      <column aggregation="Count" caption="ID Customer" datatype="integer" default-type="ordinal" layered="true" name="[IDCustomer]" pivot="key" role="dimension" type="ordinal" user-datatype="integer" visual-totals="Default">
      </column>

You’ll notice that one of these has a caption attribute. When this is visible, it is was is displayed in the Tableau UI for the Measure or Dimension name. The name attribute actually matches up to the name defined for the field elsewhere in the XML. This is why our steps are to publish without making any changes to how Tableau connects to the data initially — we want to make sure our names match up. Additionally, some caption information stored in a published data source will override the captions in the workbook, so it’s safer to minimize those and keep all the things we want to change in the workbook.

Our process for translation is just to add or change the caption attribute with the new values we want to use. For the most thorough method, you’ll go through and do a Rename in the Tableau UI and give each Measure and Dimension a token that can be easily identified and stored in a translation table somewhere.

So you might rename Order Date to {Order Date}, then set up a table in your DB or a Unicode CSV like the following:

Token English German Russian Thai
{{{Order Date}}} Order Date Auftragsdatum Дата заказа วันสั่ง
{{{Sales}}} Sales Bestellungen заказы คำสั่งซื้อ

Then you just go through and replace the tokens in the caption attribute with the appropriate translation.

Changing other elements

There are all sorts of places with text within a workbook or a dashboard, for example:

  • Sheet Names
  • Sheet Titles (when different from name)
  • Filter Aliases
  • Text Blocks on a Dashboard

 

All of these can also be modified using the same technique, but since they live in the workbook portion of a TWB, they have not historically been the focus of the XML modification capabilities delivered by Tableau. Rather than actually treating the XML as XML and working with the elements directly, it may be easier simply to treat those portions of the file as text and do a simple FIND/REPLACE operation on the tokens. Make sure your tokens are totally unique for each element, then label them in a way that is easy for a find-replace to work.

A quick Python script to translate a whole document

If you are changing more than the data source captions, you might as well tokenize every element you want to translate, then just do a massive find/replace against the whole document. The following example script uses the amazing FlashText library to do the replacements in almost no time. You’ll notice I’m using a dictionary to define my tokens and values; you could instead run through a database or CSV and generate whatever data structure you like. (If using Python 3, you don’t need those strings to have the Unicode u’ prefix thankfully)

# -*- coding: utf-8 -*-
import codecs
from flashtext import KeywordProcessor
def localize():
    template_filename = 'Tokenized Template.twb'
    translated_filename = 'Tokenized - English.twb'
    translated_filename_ko = 'Tokenized - Korean.twb'
    keyword_processor = KeywordProcessor()
    token_dict = {
        u"{{{Sheet1}}}": u"Call Length By Agent",
        u"{{{ActivityIDDim}}}": u"Activity ID",
        u"{{{AgentIDDim}}}": u"Agent ID",
        u"{{{CallStatusDim}}}": u"Call Status",
        u"{{{TeamNameDim}}}": u"Team Name",
        u"{{{StatusByTeamSheet}}}": u"Status By Team",
        u"{{{Sheet1Title}}}": u"Call Length in Minutes",
        u"{{{Dashboard1}}}": u"Overview Dashboard"
    }
    for token in token_dict:
        keyword_processor.add_keyword(token, token_dict[token])

    o_ds_fh = codecs.open(template_filename, 'r', encoding='utf-8')
    new_file = keyword_processor.replace_keywords(o_ds_fh.read())
    o_ds_fh.close()
    ds_fh = codecs.open(translated_filename, 'w', encoding='utf-8')
    ds_fh.write(new_file)
    ds_fh.close()
    print("Finished translating English")

    keyword_processor = KeywordProcessor()
    token_dict = {
        u"{{{Sheet1}}}": u"상담원 별 통화 길이",
        u"{{{ActivityIDDim}}}": u"활동 ID",
        u"{{{AgentIDDim}}}": u"에이전트 ID",
        u"{{{CallStatusDim}}}": u"통화 상태",
        u"{{{TeamNameDim}}}": u"팀 이름",
        u"{{{StatusByTeamSheet}}}": u"팀 별 상태",
        u"{{{Sheet1Title}}}": u"상담원 별 통화 길이",
        u"{{{Dashboard1}}}": u"개요"
    }
    for token in token_dict:
        keyword_processor.add_keyword(token, token_dict[token])

    o_ds_fh = codecs.open(template_filename, 'r', encoding='utf-8')
    new_file = keyword_processor.replace_keywords(o_ds_fh.read())
    o_ds_fh.close()

    ds_fh = codecs.open(translated_filename_ko, 'w', encoding='utf-8')
    ds_fh.write(new_file)
    ds_fh.close()
    print("Finished translating Korean")

Using tableau_tools to change captions

This caption translation functionality is built into tableau_tools (see section 2.5.6 of the README), using the translate_captions method of the TableauColumns object, which is accessible via the TableauDatasource.columns property.

<pre>english_dict = {'{token1}': 'category', '{token2}': 'sub-category'}
german_dict = {'{token1}': 'Kategorie', '{token2}': 'Unterkategorie'}
tab_file = TableauFile('template_file.tds')
dses = tab_file.tableau_document.datasources  # type: list[TableauDatasource]
for ds in dses:
    ds.columns.translate_captions(english_dict)
new_eng_filename = tab_file.save_new_file(u'English Version')
# Reload template again
tab_file = TableauFile('template_file.tds')
dses = tab_file.tableau_document.datasources  # type: list[TableauDatasource]
for ds in dses:
    ds.columns.translate_captions(german_dict)
new_ger_filename = tab_file.save_new_file(u'German Version')</pre>

For other elements, you can just do a full text find/replace on the entirety of the TWB file.

Advertisements

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s