How to connect to Hive using a JDBC connection?
Hive JDBC
Hive allows applications to connect to it through a JDBC driver. The JDBC driver uses Thrift to communicate with the Hive Server; Thrift is a software framework that allows applications to access Hive remotely. Because HiveServer is built on Apache Thrift (http://thrift.apache.org/), it is sometimes called the Thrift server. To connect to Hive using JDBC, we need the following details:
- Hive JDBC URL
- Hive JDBC driver
- Kerberos config and keytab file details (if the Hadoop cluster is enabled with Kerberos authentication)
Hive JDBC URL
For a secure cluster, the Hive JDBC URL takes the format below. The client needs a valid Kerberos ticket in its ticket cache before connecting.
jdbc:hive2://<host>:<port>/<db>;principal=<Server_Principal_of_HiveServer2>
- Host: The host name or IP address of the server hosting HiveServer2.
- Port: The port on which the HiveServer2 instance listens. The port number is defined in the hive-site.xml file, which is located in /etc/hive/conf. In this example, we use port 10005:
<property>
  <name>hive.server2.thrift.http.port</name>
  <value>10005</value>
</property>
If the cluster is secured with Kerberos authentication, we can also find the principal value in hive-site.xml:
<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
</property>

<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>

<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hive/_HOST@RCSUPPORT.COM</value>
</property>
Hive JDBC driver
The Hive JDBC driver enables users to connect to live Hive data directly from any application that supports JDBC connectivity. For a Java application, we need to provide the Hive JDBC jar file that matches the Hive version in use.
If it is a Maven project, we just need to add the dependency below to the pom.xml; Maven will download the jar automatically.
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>1.2.1</version>
</dependency>
Otherwise, we can download the jar directly from a repository website and add it as a dependency in the project. The jars for each Hive version are available in the Hortonworks repository.
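Once the driver jar is on the classpath, it can be registered from Java with a single call. This is the same step used in the complete example at the end of this article; with JDBC 4.0+ drivers the registration also happens automatically when the jar is present, so the explicit call mainly serves to fail fast if the jar is missing.

// Registers the Hive JDBC driver with the DriverManager
// (throws ClassNotFoundException if the hive-jdbc jar is not on the classpath)
Class.forName("org.apache.hive.jdbc.HiveDriver");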
Kerberos config and keytab file details
Kerberos is an authentication protocol that lets a user prove its identity to various services. Its configuration is stored in a file named krb5.conf. The Hadoop user must have a Kerberos principal or keytab to obtain Kerberos credentials, which allow access to the cluster and the Hadoop services.
krb5.conf – The krb5.conf file contains Kerberos configuration information, including the locations of KDCs and admin servers for the Kerberos realms of interest, defaults for the current realm and for Kerberos applications, and mappings of hostnames onto Kerberos realms. Here is an example Kerberos config file:
[libdefaults]
  renew_lifetime = 7d
  forwardable = true
  default_realm = RCSUPPORT.COM
  ticket_lifetime = 24h
  dns_lookup_realm = false
  dns_lookup_kdc = false
  default_ccache_name = /tmp/krb5cc_%{uid}

[logging]
  default = FILE:/var/log/krb5kdc.log
  admin_server = FILE:/var/log/kadmind.log
  kdc = FILE:/var/log/krb5kdc.log

[realms]
  RCSUPPORT.COM = {
    admin_server = lltt234.rcsupport.com
    kdc = lltt234.rcsupport.com
  }
Keytab file – The user keytab file is typically created with the name <username>.keytab. It contains the principal entries, and we can use the klist -kt <keytab_file_name> command to list them:
klist -kt rc_corp_batch.keytab
Keytab name: FILE:rc_corp_batch.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   9 12/31/1969 16:00:00 rc_corp_batch@RCSUPPORT.COM
Example principal: rc_corp_batch@RCSUPPORT.COM
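As noted earlier, a client that relies on the ticket cache needs a valid ticket before connecting. When not logging in from the keytab programmatically (as the Java example below does), a ticket can be obtained with kinit using the keytab and principal above; klist then shows the cached ticket. This sketch assumes the keytab file sits in the current directory:

kinit -kt rc_corp_batch.keytab rc_corp_batch@RCSUPPORT.COM
klist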
Hive JDBC connection with Kerberos authentication using Java
Let’s write a simple Java program to connect to Hive using Kerberos authentication. In this program, we retrieve the number of records in a Hive table.
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class TestHiveJdbc {

    public static void main(String[] args) throws SQLException {

        // Configure the Hadoop UserGroupInformation API for Kerberos authentication
        System.setProperty("java.security.krb5.conf", "/x/home/rc_corp_batch/.keytabs/krb5.conf");
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "Kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log in from the keytab so the process holds valid Kerberos credentials
        try {
            UserGroupInformation.loginUserFromKeytab("rc_corp_batch@RCSUPPORT.COM",
                    "/x/home/rc_corp_batch/.keytabs/rc_corp_batch.keytab");
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Register the Hive JDBC driver
        try {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }

        // Access the Hive service
        Connection connection = DriverManager.getConnection(
                "jdbc:hive2://rc-hive.vip.rccorp.com:10005/default;principal=hive/_HOST@RCSUPPORT.COM");
        Statement statement = connection.createStatement();
        ResultSet result = null;

        // Execute the Hive query
        try {
            result = statement.executeQuery("select count(*) from cust_db.orderDetails");
            result.next();
            System.out.println("Table contains " + result.getInt("_c0") + " rows");
        } catch (Exception exception) {
            exception.printStackTrace();
        }

        result.close();
        statement.close();
        connection.close();
    }
}
Output:
Table contains 2 rows
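To compile and run the program, both the Hive JDBC driver and the Hadoop client libraries must be on the classpath. A minimal sketch, assuming the standalone driver jar and a Hadoop common jar have been copied into the working directory (jar names and versions are illustrative; additional Hadoop dependencies may be required depending on the distribution):

javac -cp "hive-jdbc-1.2.1-standalone.jar:hadoop-common-2.7.3.jar" TestHiveJdbc.java
java -cp ".:hive-jdbc-1.2.1-standalone.jar:hadoop-common-2.7.3.jar" TestHiveJdbc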