How to connect to Hive using a JDBC connection?
Hive JDBC
Hive allows applications to connect to it through a JDBC driver. The JDBC driver uses Thrift to communicate with the Hive Server; Thrift is a software framework that allows applications to access Hive remotely. Because HiveServer is built on Apache Thrift (http://thrift.apache.org/), it is sometimes called the Thrift server. To connect to Hive using JDBC, we need the following details:
- Hive JDBC URL
- Hive JDBC driver
- Kerberos config and keytab file details (if the Hadoop cluster is enabled with Kerberos authentication)
Hive JDBC URL
For a secure cluster, the Hive JDBC URL takes the format below. The client needs a valid Kerberos ticket in its ticket cache before connecting.
jdbc:hive2://<host>:<port>/<db>;principal=<Server_Principal_of_HiveServer2>
- Host: The host name or IP address of the server hosting HiveServer2.
- Port: The port on which the HiveServer2 instance listens. The port number is defined in the hive-site.xml file, which is located in /etc/hive/conf. In this example, we use port 10005:
<property>
  <name>hive.server2.thrift.http.port</name>
  <value>10005</value>
</property>
If the cluster is secured with Kerberos authentication, we can also find the principal value in hive-site.xml:
<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
</property>

<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>

<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hive/_HOST@RCSUPPORT.COM</value>
</property>
Hive JDBC driver
The Hive JDBC driver enables users to connect to live Hive data directly from any application that supports JDBC connectivity. For a Java application, we need to provide the Hive JDBC jar file that matches the Hive version in use.
If it is a Maven project, we just need to add the dependency below to the pom.xml; Maven will download the jar automatically.
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>1.2.1</version>
</dependency>
Otherwise, we can download the jar directly from a repository website and add it as a dependency in the project. The jars for each Hive version are available in the Hortonworks repository.
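Once the driver jar is on the classpath, it can be registered from Java with a single call. This is the same step used in the complete example at the end of this article; with JDBC 4.0+ drivers the registration also happens automatically when the jar is present, so the explicit call mainly serves to fail fast if the jar is missing.

// Registers the Hive JDBC driver with the DriverManager
// (throws ClassNotFoundException if the hive-jdbc jar is not on the classpath)
Class.forName("org.apache.hive.jdbc.HiveDriver");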
Kerberos config and keytab file details
Kerberos is an authentication protocol that lets a user prove its identity to various services. Its configuration is stored in a file named krb5.conf. The Hadoop user must have a Kerberos principal or keytab to obtain Kerberos credentials, which allow access to the cluster and the Hadoop services.
krb5.conf – The krb5.conf file contains Kerberos configuration information, including the locations of KDCs and admin servers for the Kerberos realms of interest, defaults for the current realm and for Kerberos applications, and mappings of hostnames onto Kerberos realms. Here is an example Kerberos config file:
[libdefaults]
  renew_lifetime = 7d
  forwardable = true
  default_realm = RCSUPPORT.COM
  ticket_lifetime = 24h
  dns_lookup_realm = false
  dns_lookup_kdc = false
  default_ccache_name = /tmp/krb5cc_%{uid}

[logging]
  default = FILE:/var/log/krb5kdc.log
  admin_server = FILE:/var/log/kadmind.log
  kdc = FILE:/var/log/krb5kdc.log

[realms]
  RCSUPPORT.COM = {
    admin_server = lltt234.rcsupport.com
    kdc = lltt234.rcsupport.com
  }
Keytab file – The user keytab file is typically created with the name <username>.keytab. It contains the principal entries, and we can use the klist -kt <keytab_file_name> command to list them:
klist -kt rc_corp_batch.keytab
Keytab name: FILE:rc_corp_batch.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   9 12/31/1969 16:00:00 rc_corp_batch@RCSUPPORT.COM
Example principal: rc_corp_batch@RCSUPPORT.COM
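As noted earlier, a client that relies on the ticket cache needs a valid ticket before connecting. When not logging in from the keytab programmatically (as the Java example below does), a ticket can be obtained with kinit using the keytab and principal above; klist then shows the cached ticket. This sketch assumes the keytab file sits in the current directory:

kinit -kt rc_corp_batch.keytab rc_corp_batch@RCSUPPORT.COM
klist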
Hive JDBC connection with Kerberos authentication using Java
Let’s write a simple Java program to connect to Hive using Kerberos authentication. In this program, we retrieve the number of records in a Hive table.
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class TestHiveJdbc {

    public static void main(String[] args) throws SQLException {

        // Configure the Hadoop UserGroupInformation API for Kerberos authentication
        System.setProperty("java.security.krb5.conf", "/x/home/rc_corp_batch/.keytabs/krb5.conf");
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "Kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log in from the keytab so the process holds valid Kerberos credentials
        try {
            UserGroupInformation.loginUserFromKeytab("rc_corp_batch@RCSUPPORT.COM",
                    "/x/home/rc_corp_batch/.keytabs/rc_corp_batch.keytab");
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Register the Hive JDBC driver
        try {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }

        // Access the Hive service
        Connection connection = DriverManager.getConnection(
                "jdbc:hive2://rc-hive.vip.rccorp.com:10005/default;principal=hive/_HOST@RCSUPPORT.COM");
        Statement statement = connection.createStatement();
        ResultSet result = null;

        // Execute the Hive query
        try {
            result = statement.executeQuery("select count(*) from cust_db.orderDetails");
            result.next();
            System.out.println("Table contains " + result.getInt("_c0") + " rows");
        } catch (Exception exception) {
            exception.printStackTrace();
        }

        result.close();
        statement.close();
        connection.close();
    }
}
Output:
Table contains 2 rows
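To compile and run the program, both the Hive JDBC driver and the Hadoop client libraries must be on the classpath. A minimal sketch, assuming the standalone driver jar and a Hadoop common jar have been copied into the working directory (jar names and versions are illustrative; additional Hadoop dependencies may be required depending on the distribution):

javac -cp "hive-jdbc-1.2.1-standalone.jar:hadoop-common-2.7.3.jar" TestHiveJdbc.java
java -cp ".:hive-jdbc-1.2.1-standalone.jar:hadoop-common-2.7.3.jar" TestHiveJdbc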