Sum function with Case expression in Hive
Contents
Sum() function in Hive
Sum is one of the Aggregate function that returns the sum of the values of the column in a group. The CASE expression check the conditions against column values and returns a values from the THEN clause. Hive supports aggregate functions such as count,min,max,avg,sum,..etc with the Case expression. So the aggregate functions can perform the calculations on the values which are returned from the CASE expression.
Syntax for Sum function with Case expression
1 2 3 4 5 6 |
SUM( CASE WHEN <condition_against_column_value> THEN <return_value_1> ELSE <return_value_2> END ) |
As mentioned above, the case expression should be mentioned inside the sum function. The Case expression contains the WHEN condition which is validated against the value of the column.
- If the condition is TRUE, it returns the value from the THEN clause.
- Otherwise it returns the value from the ELSE clause.
Example : SUM function in Hive
Lets understand the sum function with examples. Consider that we have a table called item_purchase in Hive. It contains the sales details of products. From that table, we need to find the total amount of sales for Electronics Category.
Lets write the Hive query using the SUM function. First we need to filter the product category of Electronics using the WHERE condition. Then we need to sum the values of amount column.
As we can see below, the sum function returns the total_amount as 2120 for the product category of Electronics.
Example : SUM function with CASE expression
For this example, we will take the student marks list table as below in Hive. This table contains the marks for four subjects such as Maths, Physics, Chemistry and Biology.
From this table, we want to find below details for each subjects.
- Number of Students who got equal or greater than 90 marks
- Number of Students who got equal or lesser than 65 marks
Using count() function with group by statement
If we don’t use sum with case expression, we need to write two queries as below
1 2 3 4 5 6 7 8 9 10 11 |
--equal or greater than 90 marks select subject,count(*) as greater_than_90 from student_results where marks>=90 group by subject; --equal or lesser than 65 marks select subject,count(*) as lesser_than_65 from student_results where marks<=65 group by subject; |
Output
In this query, we are using count() function with group by statement for subject.
If we see the student_results table, none of the students scored greater than 90 in Chemistry. So it is not included in the output.
Similarly none of the students scored below 65 in the subject of Biology. So we can’t see the count value for the Biology subject. But if we want the count value for all the subjects, we need to use sum function with case expression in Hive.
Using sum() function with case expression
If we use sum function with case expression, we can get the required result in single query as below.
1 2 3 4 5 |
select subject, sum(case when marks>=90 then 1 else 0 end) as greater_than_90, sum(case when marks<=65 then 1 else 0 end) as lesser_than_65 from student_results group by subject; |
As mentioned in the query, we have used the case expression as below to validate the marks. If the condition is TRUE, it will return 1. Otherwise it will return 0.
1 2 |
case when marks>=90 then 1 else 0 end case when marks<=65 then 1 else 0 end |
In addition, we used the sum function with the group by of subject column. So it will add the values which are returned from the case expression and give the final output as below.
Output
Finally we got the number of students who scored greater than 90 and lesser than 65 in Hive.
Recommended Articles