How to set different execution engines in Hive with examples
Execution engine in Hive
The execution engine communicates with Hadoop daemons such as the NameNode, DataNodes, and JobTracker to execute a Hive query on top of the Hadoop file system. It runs the execution plan created by the compiler.
Different types of execution engines in Hive
Hive queries can run on three different execution engines:
- Map Reduce
- Tez
- Spark
Previously, the default execution engine in Hive was MapReduce (MR). Apache Tez has since replaced MapReduce as the default Hive execution engine. We can choose the execution engine for the current session with the SET command, for example SET hive.execution.engine=tez;
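As a small sketch of the session-level switch described above: in the Hive CLI or Beeline, SET with only a property name prints that property's current value, so it can be used to check the engine before and after changing it. The value tez here is only an example.

```sql
-- Print the engine currently configured for this session
SET hive.execution.engine;

-- Switch the engine for this session only (example value)
SET hive.execution.engine=tez;

-- Print it again to confirm the change took effect
SET hive.execution.engine;
```

The setting lasts only for the current session; a new session starts again from the configured default.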
To change the execution engine for all queries, override the hive.execution.engine property in the hive-site.xml file.
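As a rough illustration, the override could look like the fragment below. It assumes the usual Hadoop-style property layout and that the entry is placed inside the existing configuration element of hive-site.xml; the value tez is only an example.

```xml
<!-- Assumed example: goes inside the existing <configuration> element -->
<property>
  <name>hive.execution.engine</name>
  <!-- Example value; mr, tez, or spark, depending on what the cluster supports -->
  <value>tez</value>
</property>
```

A SET hive.execution.engine=... statement in a session or script still overrides this value for that session.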
Map Reduce (MR)
If we choose MR as the execution engine, the query is submitted as MapReduce jobs. Hive assigns the number of mappers and reducers, and the jobs run in the traditional distributed way.
    SET hive.execution.engine=mr;
Tez execution engine
Apache Tez is an application framework built on top of Hadoop YARN. It is used for building high-performance batch and interactive data processing applications. Tez improves query performance by expressing a query as a directed acyclic graph (DAG) of tasks and by using data transfer primitives between them. It is an alternative to the traditional MapReduce design in Hadoop.
    SET hive.execution.engine=tez;
Spark execution engine
Spark is another engine for running queries on Hive and is used for large-scale data processing. It aims to overcome performance issues faced by the MR engine and, for some workloads, the Tez engine.
    SET hive.execution.engine=spark;
Example to set the execution engine in Hive
Let's write the Hive queries in a file and set the execution engine only for those queries. We have written the queries below in the test.hql file. They use the variable ${database} and set the Hive execution engine to Tez. When we execute the queries, we need to pass a value for the variable using the --hivevar option.
    SET hive.execution.engine=tez;

    USE ${database};

    CREATE TABLE Employee(
    Id int,
    Name string);

    INSERT INTO Employee
    SELECT id, name
    from History_details;
Execution and output
Since the queries are stored in a file, we use the hive -f option to execute them. We also use the --hivevar option to pass a value to the database variable.
    hive -f <file_name> --hivevar <variable_name=value>
The queries run on the Tez engine because the file sets the execution engine to Tez.
    hive -f test.hql --hivevar database=temp
    log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.

    Logging initialized using configuration in file:/etc/hive/2.6.5.0-292/0/hive-log4j.properties
    OK
    Time taken: 1.411 seconds
    OK
    Time taken: 0.252 seconds
    Query ID = revisit_class_20190628102218_7d0463b2-53b6-4510-b6a6-b02fa03a0ff3
    Total jobs = 1
    Launching Job 1 out of 1

    Status: Running (Executing on YARN cluster with App id application_1554473216483_1375648)

    --------------------------------------------------------------------------------
            VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
    --------------------------------------------------------------------------------
    Map 1 ..........   SUCCEEDED      5          5        0        0       0       0
    --------------------------------------------------------------------------------
    VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 7.10 s
    --------------------------------------------------------------------------------
    Loading data to table temp.Employee
    Table temp.Employee stats: [numFiles=5, numRows=12, totalSize=140, rawDataSize=90]
    OK
    Time taken: 9.831 seconds
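The steps above can also be scripted end to end. The following is a minimal sketch, assuming a POSIX shell and the same test.hql file name, queries, and temp database used in this example; the check for the hive CLI is an addition so the script does not fail on a machine without Hive. Note the quoted heredoc delimiter: without the quotes, the shell itself would try to expand ${database} before Hive ever sees it.

```shell
#!/bin/sh
# Write the HiveQL file. The quoted 'EOF' stops the shell from expanding
# ${database}, so the placeholder reaches Hive unchanged.
cat > test.hql <<'EOF'
SET hive.execution.engine=tez;

USE ${database};

CREATE TABLE Employee(
Id int,
Name string);

INSERT INTO Employee
SELECT id, name
from History_details;
EOF

# Run the file only when the hive CLI is available (assumption: a configured client).
if command -v hive >/dev/null 2>&1; then
    # --hivevar supplies the value substituted for ${database}
    hive -f test.hql --hivevar database=temp
else
    echo "hive CLI not found; test.hql was written but not executed"
fi
```

Because the engine is set inside the file, it applies only to the session that runs this file and does not change the default for other sessions.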