How to create hourly partitions in Hive table

Posted on 18th July 202114th May 2022 by RevisitClass

Contents

1 Hourly partitions in Hive table

Hourly partitions in Hive table

When we have large quantities of data, we look for partition column to improve the query execution time. In some cases, we may get a requirement to load the data every day or hour. For that, we can create a daily or hourly partition to improve the query processing in Hive. In this article, we will create hourly partition and insert the data into a hive table.

Create table with hourly partition in Hive

In this example, we are going to load every day transaction data with the time interval of 4 hour. So the partition columns are date and hour. Lets create the customer_txn table with partitions (txn_date and hour) as below

create table customer_txn
(cust_id int,
txn_type string,
amount int,
currency_type string)
PARTITIONED BY (txn_date date,hour int);

create table customer_txn

(cust_id int,

txn_type string,

amount int,

currency_type string)

PARTITIONED BY (txn_date date,hour int);

Create table with hourly partition in Hive

Alter table to add hourly partition in Hive

Consider that we need to load the data for the date ‘2021-07-12’ and hour is 4 AM. Lets add the partitions for the same in the customer_txn table.

ALTER TABLE customer_txn ADD PARTITION (txn_date = '2021-07-12',hour=4);

1	ALTER TABLE customer_txn ADD PARTITION (txn_date = '2021-07-12',hour=4);

The alter statement has created the hourly partition as below

Alter table to add hourly partition in Hive

Insert the data into hourly partition in Hive

Here we are using insert statement to load the data into hive table. Also the partition columns such as txn_date and hour are included in the insert statement.


INSERT INTO customer_txn
PARTITION (txn_date = '2021-07-12',hour=4)
values (803839,'Credit Card',550,'USD');

INSERT INTO customer_txn
PARTITION (txn_date = '2021-07-12',hour=4)
values (803021,'Debit Card',120,'USD');

INSERT INTO customer_txn

PARTITION (txn_date = '2021-07-12',hour=4)

values (803839,'Credit Card',550,'USD');

INSERT INTO customer_txn

PARTITION (txn_date = '2021-07-12',hour=4)

values (803021,'Debit Card',120,'USD');

After inserting the values, Lets check the data in the customer_txn table.

Select values from hourly partitioned table in hive

Get the location of the partitions in Hive

Using the show table statement, we can get the hdfs location of the partitioned table as below

hive> show table extended like 'customer_txn' partition (txn_date='2021-07-12',hour=4);

1	hive> show table extended like 'customer_txn' partition (txn_date='2021-07-12',hour=4);

The output of the show table extended statement provided the location of partitions with other file level details(number of files, file size)

tab_name
tableName:customer_txn
owner:test_user_1
location:hdfs://revisit_class/apps/hive/warehouse/test_db.db/customer_txn/txn_date=2021-07-12/hour=4
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 cust_id, string txn_type, i32 amount, string currency_type}
partitioned:true
partitionColumns:struct partition_columns { date txn_date, i32 hour}
totalNumberFiles:2
totalFileSize:69
maxFileSize:35
minFileSize:34
lastAccessTime:1626619382324
lastUpdateTime:1626619382456

tab_name

tableName:customer_txn

owner:test_user_1

location:hdfs://revisit_class/apps/hive/warehouse/test_db.db/customer_txn/txn_date=2021-07-12/hour=4

inputformat:org.apache.hadoop.mapred.TextInputFormat

outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

columns:struct columns { i32 cust_id, string txn_type, i32 amount, string currency_type}

partitioned:true

partitionColumns:struct partition_columns { date txn_date, i32 hour}

totalNumberFiles:2

totalFileSize:69

maxFileSize:35

minFileSize:34

lastAccessTime:1626619382324

lastUpdateTime:1626619382456

Drop the partitions in Hive

Lets say that we want to drop the partitions for the particular hour. This is can be done using the Alter table statement. As we inserted the data for the hour 4, we can drop this partition from the customer_txn table as below.

ALTER TABLE customer_txn DROP IF EXISTS PARTITION (txn_date ='2021-07-12',hour=4);

1	ALTER TABLE customer_txn DROP IF EXISTS PARTITION (txn_date ='2021-07-12',hour=4);

It will remove the directory of txn_date=2021-07-12/hour=4 in the hdfs location of this table.

Recommended Articles

Share your knowledge with other Developers in REVISIT CLASS.

How to create hourly partitions in Hive table

Hourly partitions in Hive table

Create table with hourly partition in Hive

Alter table to add hourly partition in Hive

Insert the data into hourly partition in Hive

Get the location of the partitions in Hive

Drop the partitions in Hive

Leave a Reply Cancel reply

Tags

Recent Posts

Hourly partitions in Hive table

Create table with hourly partition in Hive

Alter table to add hourly partition in Hive

Insert the data into hourly partition in Hive

Get the location of the partitions in Hive

Drop the partitions in Hive

Related Posts

How to count the number of lines in a HDFS file?

How to create a database at specific location in Hive?

Instr function in Hive with examples

Length function in Hive with examples

Get the create table statement for an existing Hive table using Show create table

Leave a Reply Cancel reply

Tags

Recent Posts