How to execute HiveQL queries using Shell script?
Contents
Introduction
Shell script can be used to run the Hive queries in batch mode. It will handle the input values/arguments , execution of queries and errors during the execution. In this tutorial, we will write the hive queries in a file. Then the file name is given as argument to the shell script.
Also we are using hive -f option and hivevar option for executing the hive queries using shell script
Hive query to run in batch mode
Lets consider that we have a below Hive query to run in batch mode. This is saved in a file with the name of input_hive_query.q. The query is inserting the customer order details into product_order_master table. Here the source table is customer_orders which present in the cust_DB database.
Also the variables ${database} and {run_date} are used in the Hive query. We need to set the values for these variables during the execution.
1 2 3 4 5 6 |
USE ${database}; insert overwrite table product_order_master select cust_id,product_id,product_name,price from cust_DB.customer_orders where order_date ={run_date}; |
Shell script to run the Hive query
The below shell script has three functions
- main () – The execution of the script is starts from here. It calls the other two functions one by one.
- setup() – It will process the arguments that we pass while executing the script. Also it set the values for the variables ( run_date & database ).
- run() – This function execute the hive query which is present in the file. The below command is used for the execution
1 |
hive --hivevar database=product_DB --hivevar run_date=2021-07-25 -v -f input_hive_query.q |
We saved the below shell script in the name of execute_hive.sh.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 |
#!/bin/bash usage="$(basename $0) -f <script> -d <batch_date>" # function setup to process the arguments and set the values for the variables setup() { unset BATCH_DATE unset HIVEQL_FILE while getopts "d:f:" arg do case ${arg} in d) BATCH_DATE=$(date --date="${OPTARG}" +%Y-%m-%d) if [[ $? -ne 0 ]] then echo "Invalid batch date ${BATCH_DATE}" echo "${usage}" exit 1 fi HIVEVAR_LIST="${HIVEVAR_LIST} --hivevar run_date=${BATCH_DATE}" ;; f) HIVEQL_FILE=${OPTARG} if [[ ! -f ${OPTARG} ]] then echo "File does not exist" echo "${HIVEQL_FILE}" exit 1 fi HIVEVAR_DATABASE="product_DB" HIVEVAR_LIST="${HIVEVAR_LIST} --hivevar database=${HIVEVAR_DATABASE}" ;; \?) echo "Invalid option: -${OPTARG}" echo "${usage}" exit 1 ;; esac done if [[ -z ${HIVEQL_FILE} ]] then echo "Missing hiveql file" echo "${usage}" exit 1 fi NOW=$(date +%Y%m%d%H%M%S) BASE_LOG_DIR="/x/home/user/revisit_class" LOG_FILE="${BASE_LOG_DIR}/hive_query.${NOW}.log" # -v means --verbose, Verbose mode (echo executed SQL to the console) HIVE_VERBOSE="-v" } # function run to execute the hive query run() { echo "hive ${HIVEVAR_LIST} ${HIVE_VERBOSE} -f ${HIVEQL_FILE} 2>&1" | tee -a ${LOG_FILE} hive ${HIVEVAR_LIST} ${HIVE_VERBOSE} -f ${HIVEQL_FILE} 2>&1 | tee -a ${LOG_FILE} if [[ ${PIPESTATUS[0]} -ne 0 ]] then echo " ********** Hive execution failed at `date` " exit 1 else echo " ********** Hive executed successfully at `date`." fi } main() { setup "$@" run } # Execution starts from here main "$@" |
Command to execute the shell script
We need two arguments to execute our shell script execute_hive.sh .
- HiveQL file name – The file name input_hive_query.q is given as a first argument with the name of -f.
- Batch date – The batch date is given as second argument with the name of -d.
sh execute_hive.sh -f input_hive_query.q -d ‘2021-07-25’
After we executing this command, the shell script process the arguments. Then it called hive command to execute the hive query. If there is any error in the execution, it will write the errors in the log file => /x/home/user/revisit_class/hive_query.${NOW}.log.
Recommended Articles
- How to use variables in Hive query?
- How to run one or more hive queries in a file using hive -f command?