How to set the different execution engine in Hive with examples

Posted on 28th June 2019 (updated 7th May 2022) by RevisitClass

Contents

1 Execution engine in Hive
2 Different types of Execution engine in Hive
3 Example to set the execution engine in Hive
- 3.1 Execution and output

Execution engine in Hive

The execution engine communicates with Hadoop daemons such as the NameNode, DataNodes, and JobTracker to execute a Hive query on top of the Hadoop file system. It executes the execution plan created by the compiler.

Different types of Execution engine in Hive

Hive queries can run on three different execution engines:

Map Reduce
Tez
Spark

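Previously, the default execution engine in Hive was MapReduce (MR). Apache Tez has since replaced MapReduce as the default Hive execution engine in many distributions, and Hive on MapReduce has been deprecated since Hive 2. We can choose the execution engine for the current session with the SET command, for example SET hive.execution.engine=tez;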

If you want to change the execution engine for all queries, override the hive.execution.engine property in the hive-site.xml file.
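After changing hive-site.xml, it can help to confirm which engine is actually in effect. The sketch below assumes the hive CLI is on the PATH; the ENGINE variable is only an illustrative name. Running SET with a property name and no value prints the current setting, and --hiveconf overrides a property for a single invocation without touching hive-site.xml.

```shell
# Illustrative engine choice; mr and spark are the other values covered in this article.
ENGINE="tez"

if command -v hive >/dev/null 2>&1; then
  # Print the engine currently in effect (from hive-site.xml or the built-in default).
  hive -e 'SET hive.execution.engine;'
  # Print it again with a one-off override that applies to this invocation only.
  hive --hiveconf "hive.execution.engine=${ENGINE}" -e 'SET hive.execution.engine;'
else
  echo "hive CLI not found on PATH; nothing was run" >&2
fi
```

A SET statement inside a query file still takes precedence over the value passed with --hiveconf, because it runs later in the same session.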

Map Reduce (MR)

If we choose MR as the execution engine, the query is submitted as MapReduce jobs. Hive assigns the number of mappers and reducers, and the jobs run in the traditional distributed MapReduce way.

SET hive.execution.engine=mr;

TEZ execution engine

Apache Tez is an application framework built on top of Hadoop YARN. It is used for building high-performance batch and interactive data processing applications. Tez improves query performance by expressing a job as a directed acyclic graph (DAG) of tasks connected by data transfer primitives. It is an alternative to the traditional MapReduce design in Hadoop.

SET hive.execution.engine=tez;

Spark execution engine

The Spark execution engine runs Hive queries on Apache Spark, which is used for large-scale data processing. It aims to overcome performance issues faced with the MR engine and, for some workloads, the Tez engine, although the actual gain depends on the query and the cluster.

SET hive.execution.engine=spark;
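One way to see how the engine setting changes a query is to compare the EXPLAIN plan under different engines. This is a minimal sketch, assuming the hive CLI is on the PATH and that both mr and tez are available on the cluster (some distributions disable one of them). The database temp and the table History_details are borrowed from the example later in this article, and the GROUP BY query is only an illustration chosen so that the plan needs a real job rather than a simple fetch.

```shell
DATABASE="temp"            # database name taken from this article's example
TABLE="History_details"    # table name taken from this article's example

for ENGINE in mr tez; do
  # Set the engine for this session, then ask Hive for the plan without running the query.
  HQL="SET hive.execution.engine=${ENGINE}; EXPLAIN SELECT name, COUNT(*) FROM ${DATABASE}.${TABLE} GROUP BY name;"
  if command -v hive >/dev/null 2>&1; then
    echo "== plan with engine: ${ENGINE} =="
    hive -e "${HQL}"
  else
    echo "hive CLI not found; would run: ${HQL}"
  fi
done
```

With mr the plan is typically described as Map Reduce stages, while with tez it is described as Tez vertices and edges, which makes the difference between the two engines visible before any data is processed.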

Example to set the execution engine in Hive

Let's write the Hive queries in a file and set the execution engine only for those queries. The queries below are saved in the test.hql file. They use the variable ${database} and set the Hive execution engine to Tez. When we execute the queries, we need to pass a value for the variable using the --hivevar option.

SET hive.execution.engine=tez;

USE ${database};

CREATE TABLE Employee(
  Id int,
  Name string);

INSERT INTO Employee
SELECT
  id,
  name
FROM History_details;

Execution and output

Since the queries are stored in a file, we use the hive -f option to execute them. We also use the --hivevar option to pass a value for the database variable.

hive -f <file_name> --hivevar <variable_name>=<value>

The Hive queries run on the Tez engine because the file sets the execution engine to Tez. The VERTICES progress table in the output is Tez reporting on its tasks.

hive -f test.hql --hivevar database=temp

log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.

Logging initialized using configuration in file:/etc/hive/2.6.5.0-292/0/hive-log4j.properties
OK
Time taken: 1.411 seconds
OK
Time taken: 0.252 seconds
Query ID = revisit_class_20190628102218_7d0463b2-53b6-4510-b6a6-b02fa03a0ff3
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1554473216483_1375648)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      5          5        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 7.10 s
--------------------------------------------------------------------------------
Loading data to table temp.Employee
Table temp.Employee stats: [numFiles=5, numRows=12, totalSize=140, rawDataSize=90]
OK
Time taken: 9.831 seconds
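The invocation above can be wrapped in a small helper so that a missing file or a missing hive CLI is caught before anything is submitted. This is only a sketch: the function name run_hql and its argument order are made up for illustration, while the hive -f and --hivevar options are the ones used in this article.

```shell
# Hypothetical helper: run_hql <hql_file> <database>
run_hql() {
  hql_file="$1"
  database="$2"

  # Refuse to continue if the query file does not exist.
  if [ ! -f "$hql_file" ]; then
    echo "query file not found: $hql_file" >&2
    return 1
  fi

  # Refuse to continue if the hive CLI is not installed.
  if ! command -v hive >/dev/null 2>&1; then
    echo "hive CLI not found on PATH" >&2
    return 127
  fi

  # The file sets its own engine with SET; the value replaces ${database} inside it.
  hive -f "$hql_file" --hivevar "database=$database"
}
```

Calling run_hql test.hql temp is then equivalent to the command shown above.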
