Coalesce function in Hive with examples
Contents
COALESCE function in Hive
Coalesce is a conditional function in Hive which is used to handle the NULL values. It accepts a number of arguments and returns the first non-NULL argument. If all the arguments are NULL, it will return the NULL value.
COALESCE function syntax
1 |
COALESCE(<value_1>, <value_2>, ...) |
- The arguments can be value, column name or expression which will be evaluated by COALESCE function.
Example
Let’s understand the COALESCE function with examples in Hive. We are giving two fruit names as arguments in the coalesce function. Since there is no null value, it returned the first non-null argument as favourite fruit.
Now the first argument is NULL and the second argument is orange. As mentioned earlier, Coalesce will return the first non-null argument. So the favourite fruit is orange.
If we give the NULL values in all the arguments, Coalesce function will return the NULL as output.
Similarly we can pass n number of arguments in Coalesce function to handle the null values.
Replace null value with default value using Coalesce
In this example, we will use the Coalesce function on the table. We have created a table called as user_subscriptions in Hive. This table contains the billing information of customers. It contains the below columns.
The table user_subscriptions has 3 records. As mentioned below, few row has NULL values in the columns card_number, home_address and office_address.
Query without Coalesce function
Let’s write a query to extract the first two digits from the card_number. This helps us to find the credit card issuer of each customer. We used substring function to get the card_issuer_id.
Since the user_id #44821 doesn’t have the card number, the query returned the NULL value. For other users, we got the first two digits from card_number.
Query with Coalesce function
We want to avoid the missing data in card_number field. Because it can affect the downstream system/product if they consumed this table. Example : Bias in Machine learning models
One solution would be replacing the NULL value with some default value. In this example, we are going to replace the NULL value with 0. Let’s add the Coalesce function on top of the substring function in the query.
1 2 3 |
select User_Id, coalesce(substr(Card_Number,1,2),0) as card_issuer_id from user_subscriptions; |
As we shown in the screenshot, the Coalesce function is replaced the NULL value with default value 0 for user_id #44821.
Replace missing data with another column
Consider that we want to get the user’s address from the same table user_subscriptions. There are two address columns in the table such as home_address and office_address.
Our goal is to get the address from the column office_address. If the office address is not present in the table, we need to get the address from the column home_address.
If we look at the table user_subscriptions, the user_id #38392 has NULL value in the column office_address. To handle the NULL value and get the result, we can use the COALESCE function with the arguments of office_address & home_address
1 2 3 |
select User_Id, coalesce(Office_Address,Home_Address) as mailing_address from user_subscriptions; |
As we shown above, Coalesce function has returned the address after evaluating the NULL values from the columns. For the user_id #38392, it returned the home_address as mailing address because the office_address value is NULL.
Recommended Articles