How to create hourly partitions in Hive table
Contents
Hourly partitions in Hive table
When we have large quantities of data, we look for partition column to improve the query execution time. In some cases, we may get a requirement to load the data every day or hour. For that, we can create a daily or hourly partition to improve the query processing in Hive. In this article, we will create hourly partition and insert the data into a hive table.
Create table with hourly partition in Hive
In this example, we are going to load every day transaction data with the time interval of 4 hour. So the partition columns are date and hour. Lets create the customer_txn table with partitions (txn_date and hour) as below
1 2 3 4 5 6 |
create table customer_txn (cust_id int, txn_type string, amount int, currency_type string) PARTITIONED BY (txn_date date,hour int); |
Alter table to add hourly partition in Hive
Consider that we need to load the data for the date ‘2021-07-12’ and hour is 4 AM. Lets add the partitions for the same in the customer_txn table.
1 |
ALTER TABLE customer_txn ADD PARTITION (txn_date = '2021-07-12',hour=4); |
The alter statement has created the hourly partition as below
Insert the data into hourly partition in Hive
Here we are using insert statement to load the data into hive table. Also the partition columns such as txn_date and hour are included in the insert statement.
1 2 3 4 5 6 7 8 |
INSERT INTO customer_txn PARTITION (txn_date = '2021-07-12',hour=4) values (803839,'Credit Card',550,'USD'); INSERT INTO customer_txn PARTITION (txn_date = '2021-07-12',hour=4) values (803021,'Debit Card',120,'USD'); |
After inserting the values, Lets check the data in the customer_txn table.
Get the location of the partitions in Hive
Using the show table statement, we can get the hdfs location of the partitioned table as below
1 |
hive> show table extended like 'customer_txn' partition (txn_date='2021-07-12',hour=4); |
The output of the show table extended statement provided the location of partitions with other file level details(number of files, file size)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
tab_name tableName:customer_txn owner:test_user_1 location:hdfs://revisit_class/apps/hive/warehouse/test_db.db/customer_txn/txn_date=2021-07-12/hour=4 inputformat:org.apache.hadoop.mapred.TextInputFormat outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat columns:struct columns { i32 cust_id, string txn_type, i32 amount, string currency_type} partitioned:true partitionColumns:struct partition_columns { date txn_date, i32 hour} totalNumberFiles:2 totalFileSize:69 maxFileSize:35 minFileSize:34 lastAccessTime:1626619382324 lastUpdateTime:1626619382456 |
Drop the partitions in Hive
Lets say that we want to drop the partitions for the particular hour. This is can be done using the Alter table statement. As we inserted the data for the hour 4, we can drop this partition from the customer_txn table as below.
1 |
ALTER TABLE customer_txn DROP IF EXISTS PARTITION (txn_date ='2021-07-12',hour=4); |
It will remove the directory of txn_date=2021-07-12/hour=4 in the hdfs location of this table.
Recommended Articles