Tableau Localization – Controlling the assigned contentUrl with non-ASCII (extended Latin or CJK) characters in content names

Tableau Server stores the “full names” of published Workbooks, Data Sources and Worksheets in Unicode exactly as published. However, the algorithm that creates the “contentUrl”, the identifier used in URLs to reference published content, works only with ASCII characters. It drops every non-ASCII character (and a few other characters that are not URL-safe). If the resulting contentUrl already exists, an underscore and a numeral are appended to the end.

In European languages whose alphabets extend beyond ASCII, the algorithm produces reduced but usually still distinctive contentUrls. For example, in Spanish: “Años” -> “Aos”. The risk, depending on the language, is that distinct content names reduce to the same contentUrl, so a numeral is appended to tell them apart: “Aos” vs. “Aos_1”.

In Chinese, Korean and Japanese (and any other fully non-Latin character sets), the names are reduced to nothing, and then a random number is assigned: “销售运营分析仪表板”-> “_0” or “_1”.
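
To see which of your planned names are at risk before publishing, the reduction described above can be approximated locally. The following is a minimal sketch, not Tableau's actual algorithm: it assumes that only ASCII letters and digits survive (the exact set of characters Tableau Server keeps is not documented here), and the function names `approximate_content_url` and `find_collisions` are made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict

# Assumption: only ASCII letters and digits survive the reduction.
# The real server may keep or drop other characters.
_KEEP = re.compile(r"[A-Za-z0-9]")


def approximate_content_url(name: str) -> str:
    """Return a rough guess at the base contentUrl for a content name."""
    # Normalise to NFC so an accented letter is one code point and is
    # dropped as a whole (a choice made for this sketch only).
    composed = unicodedata.normalize("NFC", name)
    return "".join(_KEEP.findall(composed))


def find_collisions(names):
    """Group names whose reduced form is empty or shared with another name."""
    groups = defaultdict(list)
    for name in names:
        groups[approximate_content_url(name)].append(name)
    return {
        base: members
        for base, members in groups.items()
        if len(members) > 1 or base == ""
    }


if __name__ == "__main__":
    planned = ["Años", "Aos", "Ventas", "销售运营分析仪表板"]
    for base, members in find_collisions(planned).items():
        print(repr(base), "<-", members)
```

Any name reported under an empty base would end up with only an assigned ending such as “_0”, and any group with more than one member would be told apart only by an appended numeral.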

Best Practices

  1. Always publish all Published Data Sources first to a Tableau Server, before publishing Workbooks
  2. Give a unique name to all Content you publish, especially Data Sources. Tableau Server allows you to reuse a name if the Content is in different Projects on the same Site, but the resulting contentUrls will be given “_1” or “_923304292” type endings, with no way to control or change them other than through the names you choose at publish time
  3. If your content is named using non-Latin characters, add additional ASCII text to your names to ensure distinctive and controllable content URLs. This is especially important for Chinese, Japanese and Korean (CJK) text or other fully non-Latin scripts, but also might be necessary in Latin based scripts that frequently use diacritic marks on characters to distinguish (Vietnamese comes to mind)
    1. One solution for CJK languages: Add a transliteration (Pinyin, Hepburn, etc.) in parentheses after the name in the actual script. Alternatively add a unique numeric identifier after a Workbook or Data Source Name, and number your published Worksheets within a workbook.
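
The naming convention in practice 3 can be applied consistently with a small helper. This is only a sketch of one possible convention: the function names `tagged_name` and `numbered_sheet_names`, the two-digit numbering, and the rule that a tag may contain only ASCII letters and digits are all assumptions made for illustration. The transliteration itself still has to be supplied by a person or another tool.

```python
import re

# Assumption: a tag made only of ASCII letters and digits is the safest
# choice, since those characters are expected to survive in the contentUrl.
_ASCII_TAG = re.compile(r"[A-Za-z0-9]+")


def tagged_name(native_name: str, ascii_tag: str) -> str:
    """Append an ASCII tag (transliteration or ID) in parentheses."""
    if not _ASCII_TAG.fullmatch(ascii_tag):
        raise ValueError(f"tag must be ASCII letters/digits only: {ascii_tag!r}")
    return f"{native_name} ({ascii_tag})"


def numbered_sheet_names(sheet_names):
    """Number the Worksheets of one Workbook in their publish order."""
    return [f"{name} {i:02d}" for i, name in enumerate(sheet_names, start=1)]


if __name__ == "__main__":
    print(tagged_name("销售运营分析仪表板", "XiaoshouYunying"))
    print(numbered_sheet_names(["概要", "明细"]))
```

Rejecting non-ASCII tags up front keeps the convention honest: a tag that would itself be dropped from the contentUrl gives no control over it.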

Why It Matters

When using Published Data Sources, the XML of the Workbook references the contentUrl of the Data Source on Tableau Server. The contentUrl is the only identifier embedded in the Workbook: there are no other details (like the fullName, the numeric Data Source ID or the LUID) to help map to a particular Published Data Source.
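
To check which contentUrls a Workbook depends on, you can read them out of the .twb XML. The sketch below assumes that each Published Data Source appears as a `datasource` element with a `repository-location` child whose `id` attribute holds the contentUrl; that layout is an assumption based on typical workbook files and may differ between versions. The sample XML and the function name `referenced_datasource_urls` are invented for the example.

```python
import xml.etree.ElementTree as ET


def referenced_datasource_urls(twb_xml: str) -> list:
    """List the contentUrls of Published Data Sources referenced in a .twb.

    Assumption: the contentUrl is the 'id' attribute of a
    <repository-location> element directly under <datasource>.
    """
    root = ET.fromstring(twb_xml)
    urls = []
    for datasource in root.iter("datasource"):
        location = datasource.find("repository-location")
        if location is not None and location.get("id"):
            urls.append(location.get("id"))
    return urls


# Hypothetical, heavily trimmed workbook XML for illustration only.
SAMPLE_TWB = """<workbook>
  <datasources>
    <datasource caption='销售数据 (XiaoshouShuju)' name='sqlproxy.0abc'>
      <repository-location id='XiaoshouShuju' path='/t/Dev/datasources' site='Dev' />
    </datasource>
    <datasource name='Parameters' />
  </datasources>
</workbook>"""

if __name__ == "__main__":
    print(referenced_datasource_urls(SAMPLE_TWB))
```

Running this against a Workbook before migration shows exactly which contentUrls must exist on the target Site.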

When you publish content that was originally on one Site to another Site (either as part of an SDLC process or as a republish when moving environments), and a Published Data Source exists on the new Site/Server with the same contentUrl as in the Workbook XML, Tableau Server automatically updates the Site ID reference in the Workbook XML during the publish process. This allows the Content Migration Tool, REST API and tabcmd to be used in programmatic publishing processes.

A second problem is that assigned contentUrls can differ based on publish order and, for entirely non-Latin names, only the assigned ending remains, so it is very difficult to even know what a contentUrl is referencing. If you are making any hard-coded references to Viz URLs for loading them embedded into another application, this will break your Dev->Test->Prod publishing process, since references you are storing on Dev might point to a different Workbook entirely when the content is published to Prod. Following the best practices above will ensure that you are determining the contentUrl names in a way that will be the same on any Site/Server you publish to.
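
One way to catch this before it reaches production is to compare the contentUrls assigned on each environment. The sketch below assumes you have already collected a mapping from content name to contentUrl for each Site (for example through the REST API); the dictionaries, their values and the function name `content_url_mismatches` are hypothetical. It also assumes names are unique per Site, as recommended above.

```python
def content_url_mismatches(dev: dict, prod: dict) -> dict:
    """Return names whose contentUrl on Prod differs from the one on Dev.

    Each value is a (dev_url, prod_url) tuple; prod_url is None when the
    content has not been published to Prod at all.
    """
    mismatches = {}
    for name, dev_url in dev.items():
        prod_url = prod.get(name)
        if prod_url != dev_url:
            mismatches[name] = (dev_url, prod_url)
    return mismatches


# Hypothetical data: the two CJK-named Workbooks were published in a
# different order on Prod, so their assigned endings are swapped.
DEV = {"销售运营分析仪表板": "_0", "库存概览": "_1", "Años (2024)": "Aos2024"}
PROD = {"销售运营分析仪表板": "_1", "库存概览": "_0", "Años (2024)": "Aos2024"}

if __name__ == "__main__":
    for name, (dev_url, prod_url) in content_url_mismatches(DEV, PROD).items():
        print(f"{name}: Dev={dev_url} Prod={prod_url}")
```

In this made-up data, an embedded URL stored as “_0” on Dev would load a different Workbook on Prod, while the name with an ASCII component resolves to the same contentUrl on both.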
