Utilizing Tableau Server’s Search Server in an Embedded Portal

Tableau Server 9.0 has amazing search functionality: type a value into the top search box and it looks across everything available on the Tableau Server and brings back results almost instantly. It’s no particular secret that the Search & Browse process is powered by Apache Solr/Lucene. It’s a blazing fast piece of technology that supports a lot of the instantaneous feel in Tableau Server 9.0 (the portal and the REST API also use Solr).

I was asked recently how to reproduce some of the search functionality that exists in Tableau Server 9.0, but in an embedded portal. Some of it is possible via the REST API, while other requests would require opening up the PostgreSQL Repository. I wasn’t even sure some of the requests were possible, yet when I typed into the multi-search box, it seemed to be searching across all of the attributes we were looking to tap into.

I took a look at the ports Tableau Server uses, to see where the Search and Browse service might live. Sure enough, port 11000 looked like a possibility. Digging into the logs a bit, I found some GET requests matching this pattern:

/solr/project/select?q=*%3A*&fq=site_id%3A3&sort=id+asc

Combining the host, the port, and that path into

http://localhost:11000/solr/project/select?q=*%3A*&fq=site_id%3A3&sort=id+asc

I was a bit shocked to discover that it returned:


<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="q">*:*</str>
<str name="fq">site_id:3</str>
<str name="sort">id asc</str>
</lst>
</lst>
<result name="response" numFound="12" start="0">
<doc>
<int name="id">11</int>
<str name="luid">7f33bda4-f7c7-11e3-b52c-fb7449e1a05c</str>
<int name="site_id">3</int>
<int name="owner_id">19</int>
<str name="name">default</str>
<str name="owner_name">Bryant Howell</str>
<str name="owner_alias">bhowell</str>
<str name="owner_email">bhowell</str>
<str name="owner_domain">local</str>
<str name="description"/>
<bool name="controlled_permissions_enabled">false</bool>
<date name="created_time">2015-03-12T01:19:14.583Z</date>
<date name="modified_time">2015-03-12T01:19:14.583Z</date>
<arr name="denied_user_ids"><int>20</int></arr>
<int name="checksum">-1207487158</int>
<long name="_version_">1512300172092112896</long>
</doc>
<doc>
<int name="id">12</int>
<str name="luid">10aa15c1-1755-4762-afd2-f9247d25c33a</str>
<int name="site_id">3</int>
<int name="owner_id">19</int>
<str name="name">Internal Sandbox</str>
<str name="owner_name">Bryant Howell</str>
<str name="owner_alias">bhowell</str>
<str name="owner_email">bhowell</str>
<str name="owner_domain">local</str>
<str name="description"/>
<bool name="controlled_permissions_enabled">false</bool>
<date name="created_time">2015-03-12T03:08:29.036Z</date>
<date name="modified_time">2015-03-12T03:08:29.036Z</date>
<arr name="denied_user_ids"><int>20</int></arr>
<arr name="allowed_group_ids"><int>27</int><int>12</int></arr>
<arr name="allowed_leader_group_ids"><int>12</int></arr>
<int name="checksum">1718309931</int>
<long name="_version_">1512300172094210048</long>
</doc>
etc.
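
To use a response like the one above from an embedded portal, you would typically turn each doc element into a native structure. Here is a minimal sketch in Python using only the standard library. The SAMPLE string is a trimmed-down copy of the response above, and the names parse_docs and _convert are my own, not anything Tableau or Solr provides.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample in the same shape as the response above.
SAMPLE = """<response>
<result name="response" numFound="2" start="0">
<doc>
<int name="id">11</int>
<str name="luid">7f33bda4-f7c7-11e3-b52c-fb7449e1a05c</str>
<str name="name">default</str>
<str name="description"/>
<arr name="denied_user_ids"><int>20</int></arr>
</doc>
<doc>
<int name="id">12</int>
<str name="luid">10aa15c1-1755-4762-afd2-f9247d25c33a</str>
<str name="name">Internal Sandbox</str>
<bool name="controlled_permissions_enabled">false</bool>
<arr name="allowed_group_ids"><int>27</int><int>12</int></arr>
</doc>
</result>
</response>"""


def _convert(el):
    """Turn one typed Solr field element into a Python value."""
    if el.tag == "arr":
        return [_convert(child) for child in el]
    text = el.text or ""
    if el.tag in ("int", "long"):
        return int(text)
    if el.tag == "bool":
        return text == "true"
    return text  # str, date, and anything else stay as plain text


def parse_docs(xml_text):
    """Return every <doc> in a Solr XML response as a dict keyed by field name."""
    root = ET.fromstring(xml_text)
    return [{f.get("name"): _convert(f) for f in doc} for doc in root.iter("doc")]


docs = parse_docs(SAMPLE)
```

Dates are left as strings here; whether you parse them further depends on what the portal needs.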

Needless to say, this information can be very useful. A little more digging revealed the following keywords that you can put in the URL pattern in place of project:

  • favorite
  • group
  • project
  • site
  • user
  • view
  • workbook
  • unified_datasource
  • domain
  • system_user

What about the rest of the stuff in that URL? The q= and fq= parameters are part of the standard Solr query syntax, which has a nicely written guide available. Interestingly, while it appears that all queries generated by Tableau Server include the site_id filter, you don’t have to include it; if you leave it out, results from all sites come back. There are great use cases here for server administration, but it does lead to the Security Concerns section later on.
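
As a rough sketch, the URL pattern and the optional site filter can be assembled like this in Python. The function name build_select_url and its defaults are my own; the keyword list and the localhost:11000 base come from this post. I have not checked that every keyword exposes both an id and a site_id field, so treat the sort and fq defaults as assumptions.

```python
from urllib.parse import urlencode

# Keywords listed above; anything else is rejected.
CORES = {"favorite", "group", "project", "site", "user", "view", "workbook",
         "unified_datasource", "domain", "system_user"}


def build_select_url(core, site_id=None, q="*:*", sort="id asc",
                     base="http://localhost:11000/solr"):
    """Build a Solr select URL; omit site_id to search across all sites."""
    if core not in CORES:
        raise ValueError(f"unknown core: {core}")
    params = [("q", q)]
    if site_id is not None:
        params.append(("fq", f"site_id:{int(site_id)}"))
    params.append(("sort", sort))
    # safe="*" keeps asterisks literal, matching the URLs seen in the logs.
    return f"{base}/{core}/select?" + urlencode(params, safe="*")


project_url = build_select_url("project", site_id=3)
all_sites_url = build_select_url("workbook")
```

With site_id=3 this reproduces the project URL shown earlier; without it, the fq parameter is simply left off.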

What are the IDs?

As the small sample of results I posted reveals, there is an id field that is different from the luid used by the REST API. My understanding is that there has long been an internal API to Tableau Server that referenced things using the original id field, and the luid is a newer concept (it is also the only way to specify things via the REST API). It appears the Solr search still relies mostly on the internal id.

How do you know what the IDs stand for? Use the search!

The URL

http://localhost:11000/solr/site/select?q=*&rows=100

has a result set like:


<doc>
<int name="id">1</int>
<str name="name">Default</str>
<str name="url_namespace"/>
<int name="checksum">708955710</int>
<long name="_version_">1512300168151564288</long>
</doc>
<doc>
<int name="id">3</int>
<str name="name">agency</str>
<str name="url_namespace">agency</str>
<int name="checksum">1814298721</int>
<long name="_version_">1512300168233353216</long>
</doc>
<doc>
<int name="id">4</int>
<str name="name">Test</str>
<str name="url_namespace">Test</str>
<int name="checksum">83335618</int>
<long name="_version_">1512300168237547520</long>
</doc>

Grab the id that matches the name or the url_namespace (equivalent in REST API terms to the site_content_url) and you can make site-specific searches. The same process works for the other keywords as well (and they return the luid, which means you can move smoothly from the Solr results to results from the REST API).
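
The lookup described above might be sketched as follows in Python. SITE_RESPONSE is a condensed copy of the site results shown earlier, wrapped in the response and result elements that the full reply would carry, and site_ids_by_namespace is a helper name of my own.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the site results above.
SITE_RESPONSE = """<response>
<result name="response" numFound="3" start="0">
<doc><int name="id">1</int><str name="name">Default</str><str name="url_namespace"/></doc>
<doc><int name="id">3</int><str name="name">agency</str><str name="url_namespace">agency</str></doc>
<doc><int name="id">4</int><str name="name">Test</str><str name="url_namespace">Test</str></doc>
</result>
</response>"""


def site_ids_by_namespace(xml_text):
    """Map each site's url_namespace to its internal integer id."""
    mapping = {}
    for doc in ET.fromstring(xml_text).iter("doc"):
        fields = {f.get("name"): (f.text or "") for f in doc}
        mapping[fields["url_namespace"]] = int(fields["id"])
    return mapping


site_ids = site_ids_by_namespace(SITE_RESPONSE)
# Filter value to pass as fq= when searching only the "agency" site.
agency_filter = f"site_id:{site_ids['agency']}"
```

Note that the Default site has an empty url_namespace, so it ends up under the empty-string key.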

Security Concerns

This functionality is a great reminder of why you need to configure the firewalls and the trust on your Tableau Server. You don’t want to expose this functionality except to the other machines in a Tableau Server cluster, or to other trusted application servers within your network. Even within your corporate network, you should be blocking open access to the port that the search service runs on, or an intrepid and curious employee might gain access to information about users or content they are not supposed to have access to.

Access to port 11000 for the searchserver service is limited not only by firewall restrictions, but also by a policy in the Apache Tomcat/Catalina server that restricts it to localhost. If you need to grant access to any other machines, the configuration file is found at C:\ProgramData\Tableau\Tableau Server\data\tabsvc\config\searchserver\server.xml.

By default, you will find a section near the bottom that looks like


<Valve className="org.apache.catalina.valves.RemoteAddrValve"
allow="127\.0\.0\.1(%\d+)?|0:0:0:0:0:0:0:1(%\d+)?" />

This is a regular expression pattern that says which machines can have access. By default, it should allow localhost and any other machines in the cluster (the pattern shown here covers the IPv4 and IPv6 loopback addresses). If you remove this whole tag, access is thrown wide open to any machine. To allow only certain other machines, you’ll have to add their IP addresses using the correct format per the Apache Tomcat docs. Remember to save a copy of the original server.xml file before you make any modifications. You will also want to keep a copy of the modified version in case an upgrade or configuration change overwrites your rules. Changes only take effect when you restart Tableau Server.
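
Before editing server.xml, it can help to check a candidate pattern offline. The sketch below uses Python's re module as a stand-in for the Java regular expressions Tomcat actually uses; for a simple pattern like this one the two should behave the same, but that is an assumption on my part. The address 10.0.0.5 is a made-up example, and extend_allow and is_allowed are helper names of my own.

```python
import re

# Pattern copied from the default Valve shown above.
DEFAULT_ALLOW = r"127\.0\.0\.1(%\d+)?|0:0:0:0:0:0:0:1(%\d+)?"


def extend_allow(pattern, *ip_addresses):
    """Append literal IPv4 addresses to an allow pattern, escaping the dots."""
    extra = [re.escape(ip) for ip in ip_addresses]
    return "|".join([pattern, *extra])


def is_allowed(pattern, remote_addr):
    # The valve requires the whole address to match, hence fullmatch.
    return re.fullmatch(pattern, remote_addr) is not None


# Hypothetical application server address, for illustration only.
candidate = extend_allow(DEFAULT_ALLOW, "10.0.0.5")
```

The escaping matters: an unescaped dot matches any character, so a pattern like 10.0.0.5 would let through more addresses than intended.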

Your other option is to write your own pass-through service that runs on the Tableau Server on an unused port; it would access the Solr service via localhost and communicate with your other application server through that other port. We don’t usually recommend running anything additional on the Tableau Server, but a tiny service would be unlikely to affect performance much.
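
For illustration only, here is roughly what such a pass-through could look like using Python's standard library. Everything configurable is an assumption of mine: the listening port 8099, the three allowed keywords, and the choice to pin every query to site 3. It is a sketch of the idea, not a hardened service, and it has no authentication of its own.

```python
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qsl, urlencode, urlsplit
from urllib.request import urlopen

SOLR_BASE = "http://localhost:11000"             # search service, per this post
ALLOWED_CORES = {"project", "workbook", "view"}  # hypothetical whitelist
FORCED_SITE_ID = 3                               # hypothetical: pin queries to one site
LISTEN_PORT = 8099                               # hypothetical unused port

_PATH = re.compile(r"^/solr/([a-z_]+)/select$")


def rewrite(raw_path):
    """Return the upstream Solr URL for a request path, or None to reject it."""
    parts = urlsplit(raw_path)
    m = _PATH.match(parts.path)
    if not m or m.group(1) not in ALLOWED_CORES:
        return None
    # Drop any caller-supplied fq and substitute the pinned site filter.
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "fq"]
    params.append(("fq", f"site_id:{FORCED_SITE_ID}"))
    return f"{SOLR_BASE}{parts.path}?{urlencode(params)}"


class Proxy(BaseHTTPRequestHandler):
    def do_GET(self):
        upstream = rewrite(self.path)
        if upstream is None:
            self.send_error(403, "core or path not allowed")
            return
        with urlopen(upstream, timeout=10) as resp:
            body = resp.read()
            ctype = resp.headers.get("Content-Type", "application/xml")
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Start the pass-through; blocks until the process is stopped."""
    HTTPServer(("0.0.0.0", LISTEN_PORT), Proxy).serve_forever()
```

Calling serve() starts listening. The point of routing through rewrite() is that the application server never gets to choose the site filter or reach keywords such as user, so the firewall rule for the new port exposes much less than opening port 11000 would.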
