Multiple Table (Normalized) Hyper Extracts

Starting in Tableau 2018.3+:

  • The design for row level security will be the same in both live connections and extracts
  • Extract files with security will create much faster
  • Best practices for entitlements tables are now feasible in Extracts

Let’s dig into the essentials and how we can make this work for effective Row Level Security.

Some options from Single Table Extract are not available

The option for Multiple Table extracts is available in the Extract dialog box starting in 2018.3 Beta 1. You will notice when you click it that several of the option boxes grey out:

  • Number of Rows options
    • Incremental Extract: The load times for Hyper extracts are so much faster than TDE that this shouldn’t be an issue even on large data sets
    • Top
    • Sample
  • Aggregation
  • Filters

Let’s talk about Filtering, because it has the biggest consequences.

  • INNER JOINs actually cause filtering in SQL, even if you don’t think of it that way. If Table A is INNER JOINed to Table B, where Table A has users [X, Y, Z] and Table B has users [R, Y], the result set is actually only going to have [Y] (this is why we represent it as the middle portion of a Venn diagram).
    • In a single table extract, this happens naturally at the database level because Tableau actually makes a query that includes the INNER JOIN clauses to build the extracts
    • In a multiple table extract, JOIN filtering does not happen. All tables are queried separately, without any JOIN clauses. The data will still be filtered to the end users, but it does mean you might be bringing in more data than you would have with a single table extract.
  • You cannot filter in the Extract dialog box on individual fields. The queries for retrieval are separated, so the amount of logic / number of decisions necessary to do the exact filtering you want would be very difficult to represent in these dialogs.

Conclusion: You will need to do your filtering yourself, in one of two ways

  • Convert the each table to Custom SQL and add the appropriate filter clauses
  • Put the filtering logic into a View in your database

These two solutions apply equally as well for Top or Sample queries. You have to know the appropriate way to make them happen for your database, but they are all possible (that’s how Tableau accomplishes them in the first place).

Best Practices for Row Level Security

Given the restrictions on filtering in the UI and the fact that you still want to minimize the number of tables you are bringing in for fast-performing JOINs, if you are using Multiple Table Extracts for a security filtering, our best recommendation is to work toward two “objects”, which will look like two tables in the data connection pane, but could be database views or Custom SQL queries.

Data Object

Think of the “Data Object” as a single View that joins together whatever tables you need to for the main data you are trying to show, including any filters to reduce out unnecessary data. Any filters you would have put in at the Single Table Extract creation screen, should be incorporated into the definition of this object.

Entitlements Object

The “Entitlements Object” should be a denormalized View of whatever entitlements are necessary to filter the data at the most granular level. In almost all cases, this means the View should contain:

  • A column for Username, containing the exact usernames in Tableau Server, and
  • A row for each level of the most granular entitlements to the Data Object.

What does “level of the most granular entitlements” mean? An illustration might help.

Let’s consider data about Sales that will be viewed by a Sales management team. There are many different levels in the Sales management team, but in our example, there are three elements as to what the different levels can see:

  1. Category: 3 total
  2. Sub-Category: 17 total, but each a sub-set of a single category
  3. Segment: 3 total, with any category possibly in any of the three segments

We have a top level user, “the big boss”, who can see everything. And then at the lowest level, there are sales managers who only see their particular set of Category-SubCategory-Segment.

To represent the “big boss”, we’ll need a row that represents each of those lowest levels. So the “big boss” will have 17*3 (all sub-categories * segments)

A “category manger” will have 1 category’s sub-category count * 3 segment, while a “sub-category manager” will have 3 rows per sub-category they covert. Notice this model allows us to have a sub-category manager over more than one sub-category — they just end up with more rows.

You might be thinking, well this seems to blow up the entitlements table and the answer to that is exactly. In a denormalized, single table extract, this would blow up the extract size to enormous proportions instantly. But in a multiple table extract, both tables stay the same size, no matter how big either one is. And owing to Hyper being really great at JOINing and filtering, the technique remains very fast even on large data sets. It’s actually exactly how we’ve always describing setting up a database for row level security, only it works in an extract.

If you’ve never set up entitlements this way before in a database, I’ll be working up a post soon to show an example of how these relationships can be represented, and what type of queries you can do to create the denormalized view you’ll want to use as the “entitlements object” in your Multiple Table Extract.

Have a more complex situation: Hyper API!

If you are hitting up against some of the limitations of generating a multi-table extract within Desktop or Server, you should look to the Hyper API . While it is technically the third version of an Extract API, the Hyper API actually allows for full SQL access to any Hyper file, so you can update, insert, transform and read more effectively. You can create multiple tables, and load directly from a CSV source in almost no time.

Leave a comment