In this post, I describe how to connect Tableau 10 Desktop to Apache Drill and explore Hive or HBase data on Hadoop directly. Combining these tools gives us direct access to semi-structured data, such as key-value and document stores, without having to define a data schema in advance. Apache Drill is an important component to bridge Tableau Desktop with
To use Apache Drill with Tableau 10 Desktop, we will need to complete the following steps:
- Install Apache Drill on Hadoop
- Configure Hive Storage Plugin
- Configure HBase Storage Plugin
- Install the Drill ODBC driver from MapR and configure ODBC
- Connect to HBase and Hive with Tableau and Drill and show data
Install Apache Drill on Hadoop
To install Apache Drill, refer to https://drill.apache.org/docs/install-drill-introduction/, which provides detailed steps. Note: (a) Install Drill on one node of the Hadoop cluster. (b) The Hive and HBase storage configurations need to be identified by cluster ID.
After the installation, start the Drill daemon, for example by running bin/drillbit.sh start from the Drill installation directory.
Once the Drill daemon has started, you can access the Drill Web Console at http://localhost:8047/
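Besides opening the Web Console in a browser, you can check that Drill responds by submitting a query to its REST endpoint on the same port. The following is a minimal sketch in Python, assuming a Drillbit on localhost:8047 without authentication; the helper names build_query_request and run_query are illustrative and not part of Drill.

```python
import json
import urllib.request


def build_query_request(sql, base_url="http://localhost:8047"):
    """Build a POST request for Drill's REST query endpoint (/query.json)."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url="http://localhost:8047", timeout=30):
    """Send the query to a running Drillbit and return the decoded JSON reply."""
    request = build_query_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs a running Drillbit, so it is left commented out):
# print(run_query("SELECT version FROM sys.version"))
```

If the call returns a JSON document instead of raising a connection error, the daemon is up and accepting queries.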
Configure Hive Storage Plugin
To update the Hive storage plugin, you can select the Storage tab on the Drill Web Console.
From the list of disabled storage plugins in the Drill Web Console, click Update next to hive. The Hive metastore runs as a separate service outside of Hive. Drill queries the Hive metastore through Thrift, and the metastore service communicates with the Hive database over JDBC. You can find the metastore connection details in the Hive configuration file (typically hive-site.xml).
NOTE: Verify that the Hive metastore service is running before you register the Hive metastore.
Here is the Hive configuration in Drill:
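The exact values depend on your cluster. As a minimal sketch, the Python snippet below builds a Hive storage plugin definition for a remote metastore and prints it as JSON that can be pasted into the Web Console. The host name metastore-host, the port 9083, and the HDFS URI are placeholders, not values from this setup; take the real ones from your Hive configuration file.

```python
import json


def hive_plugin_config(metastore_host, metastore_port=9083,
                       hdfs_uri="hdfs://namenode-host:8020/"):
    """Return a Drill Hive storage plugin definition for a remote metastore.

    All arguments are placeholders to be replaced with cluster-specific values.
    """
    return {
        "type": "hive",
        "enabled": True,
        "configProps": {
            # Thrift URI of the Hive metastore service
            "hive.metastore.uris": f"thrift://{metastore_host}:{metastore_port}",
            "hive.metastore.sasl.enabled": "false",
            # Default file system that holds the Hive table data
            "fs.default.name": hdfs_uri,
        },
    }


print(json.dumps(hive_plugin_config("metastore-host"), indent=2))
```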
Configure HBase Storage Plugin
To connect Apache Drill to an HBase data source with the HBase storage plugin, you need to specify a ZooKeeper quorum. In the Web Console, select the Storage tab, and then click the Update button for the hbase storage plugin configuration. The following shows a typical HBase storage plugin:
Note: Add the setting "zookeeper.znode.parent": "/hbase-unsecure" to the plugin configuration; otherwise queries fail with this error:
The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
Here is the HBase configuration in Drill:
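As a rough illustration of such a configuration, the Python sketch below assembles an HBase storage plugin definition that includes the zookeeper.znode.parent setting and prints it as JSON. The ZooKeeper host names zk1, zk2, and zk3 and the client port 2181 are placeholders, and the function name is only illustrative.

```python
import json


def hbase_plugin_config(zk_hosts, zk_port=2181, znode_parent="/hbase-unsecure"):
    """Return a Drill HBase storage plugin definition.

    zk_hosts is a list of ZooKeeper host names (placeholders here).
    """
    return {
        "type": "hbase",
        "enabled": True,
        "config": {
            # Comma-separated list of ZooKeeper quorum members
            "hbase.zookeeper.quorum": ",".join(zk_hosts),
            "hbase.zookeeper.property.clientPort": str(zk_port),
            # Must match the znode parent configured on the HBase master
            "zookeeper.znode.parent": znode_parent,
        },
        "size.calculator.enabled": False,
    }


print(json.dumps(hbase_plugin_config(["zk1", "zk2", "zk3"]), indent=2))
```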
Install the Drill ODBC driver from MapR and configure ODBC
You can find the Drill ODBC driver and its installation instructions here: https://drill.apache.org/docs/installing-the-odbc-driver/. Install it on the same machine as Tableau Desktop. We need to configure the ODBC connection as follows:
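An ODBC connection to Drill can also be described by a connection string, either directly to a Drillbit or through ZooKeeper using the cluster ID. The Python sketch below only assembles such a string. The driver name "MapR Drill ODBC Driver", the host names, and the cluster ID drillbits1 are assumptions, so check them against your driver installation and your Drill configuration.

```python
def drill_odbc_connection_string(host=None, port=31010, zk_quorum=None,
                                 zk_cluster_id=None,
                                 driver="MapR Drill ODBC Driver"):
    """Build a Drill ODBC connection string.

    Pass zk_quorum and zk_cluster_id for a ZooKeeper connection,
    or host (and optionally port) for a direct Drillbit connection.
    """
    parts = [f"DRIVER={driver}"]
    if zk_quorum:
        if not zk_cluster_id:
            raise ValueError("zk_cluster_id is required with zk_quorum")
        parts += [
            "ConnectionType=ZooKeeper",
            f"ZKQuorum={zk_quorum}",
            f"ZKClusterID={zk_cluster_id}",
        ]
    elif host:
        parts += ["ConnectionType=Direct", f"Host={host}", f"Port={port}"]
    else:
        raise ValueError("provide either host or zk_quorum")
    return ";".join(parts)


# Direct connection to a Drillbit on a placeholder host
print(drill_odbc_connection_string(host="drill-host"))
# ZooKeeper connection; quorum and cluster ID are placeholders
print(drill_odbc_connection_string(zk_quorum="zk1:2181,zk2:2181",
                                   zk_cluster_id="drillbits1"))
```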
Query HBase and Hive with Tableau and Drill
At this point the tables already exist in HBase and Hive. We can also browse the HBase and Hive tables with Drill Explorer.
Next, open Tableau Desktop and connect to Drill through the ODBC data source.
Note: Querying HBase tables fails with the JAR conflict error shown below. To work around it, replace the Guava 18 JAR with the Guava 16 JAR in the apache-drill-1.9.0/jars/3rdparty folder. You can find the Guava 16 JAR here: https://github.com/google/guava/wiki/Release16.
SYSTEM ERROR: IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator.
This issue is tracked at https://issues.apache.org/jira/browse/DRILL-4931.
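Before and after swapping the JAR, it can help to confirm which Guava version Drill will actually load. The small Python sketch below lists the Guava JARs found in the jars/3rdparty folder of a Drill installation; the function name and the example path are illustrative assumptions.

```python
from pathlib import Path


def find_guava_jars(drill_home):
    """Return the sorted file names of Guava JARs under <drill_home>/jars/3rdparty."""
    third_party = Path(drill_home) / "jars" / "3rdparty"
    return sorted(p.name for p in third_party.glob("guava-*.jar"))


# Example with a placeholder installation path:
# print(find_guava_jars("/opt/apache-drill-1.9.0"))
```

Only one Guava JAR should be listed once the replacement is done.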
Reference
https://drill.apache.org/