Explode function in Hive with examples
Contents
Explode function in Hive
Explode is a User Defined Table generating Function(UDTF) in Hive. It takes an array (or a map) as an input and outputs the elements of the array (or a map) as separate rows. UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.
LATERAL VIEW statement is used with UDTF such as explode(). It creates a virtual table by applying the UDTF to each row of the base table and then joins resulting output rows to the input row.
Explode function syntax
1 2 3 |
select explode(<ARRAY>) from <table_name>; (or) select explode (<MAP>) from <table_name>; |
It will return n number of rows where n is the size of the array/map. This function represent each element of array/map as a row.
Lateral view syntax
1 2 |
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)* fromClause: FROM baseTable (lateralView)* |
Example
Consider that we have a table investment_report in Hive. It has the columns such as user_id, account_value and invested_stocks. The column invested_stocks is an array data type.
It has the invested stock details of users as below. Let’s use the explode function on this column.
Now we will write a select query with explode function. It will accept the one row as input and return multiple rows as output.
1 2 |
select explode(invested_stocks) as stock_name from investment_report; |
As mentioned above, the explode function flatten the array. The select query returned each stock name as a separate row. Similarly we can use the explode function for the map data type values.
Let’s try to add the column user_id in the select query. It is throwing the semantic exception [Error 10081] due to explode function in the select.
Lateral view with explode in Hive
Lateral view with explode function creates a virtual table for the flatten rows. Then we can apply the join to the base table.
1 2 3 4 |
select user_id, stock_column from investment_report lateral view explode(invested_stocks) stock_tble as stock_column; |
As per the syntax, we have added the lateral view statement with explode function after the from clause. The alias name stock_tble is given for this results. Similarly we have provided the column alias as stock_column.
1 |
lateral view explode(invested_stocks) stock_tble as stock_column; |
Now we can select the base table column user_id with exploded values as below.
If we use the Select * statement, it will include the lateral view results also. As we shown below, the column stock_column is included in the result set.
Let’s use the Where condition to filter the results based on the stock names. As shown below, we have fetched only Costco stock.
From this lateral view results, we can get the count of each stock names using group by and count.
1 2 3 4 5 |
select stock_column, count(1) as count from investment_report lateral view explode(invested_stocks) stock_tble as stock_column group by stock_column; |
If we have multiple array/map columns in the table, we can create the multiple lateral view statement as below.
multiple lateral view syntax
1 2 3 |
SELECT myCol1, myCol2 FROM baseTable LATERAL VIEW explode(col1) myTable1 AS myCol1 LATERAL VIEW explode(col2) myTable2 AS myCol2; |
Moreover, the join operation can be performed on the lateral view results. For that, we can write the lateral view statement as inner query. From the outer query, we can do the join on different tables.
Recommended Articles
References from Apache Hive documentation