Substring function in Hive with examples
Contents
Hive substring :
Substring is a built-in string function in Hive which is used to extract a part of a string. In the hive sql, we can either specify substring or substr to get the required string from the column/value.
Substring syntax in Hive
1 2 3 |
substr(<input string/column>, int start, int length) (or) substring(<input string/column>, int start,int length) |
- <input string/column> – Here we need to mention the input string value or the column name in which we want to perform the substring operation.
- start – This is the starting position of the substring value.
- length – This is the optional argument which denotes the length/end position of the substring. If we didn’t mention the length, the function takes the remaining characters from the starting position.
Example 1 : Substring with input string in Hive
Lets look at the substring function with simple example.
- Input string -> Revisit Class
- Required substring -> Revisit
Here we are interested to take the substring as “Revisit” from the input string. When we compared the substring with the input string, we know that the starting position of the substring is 1 and length/end position of the substring is 7. Lets pass these values to substring function to get the required string in Hive.
Example 2 : Substring with column in Hive
There is a customer transaction table “cust_transaction” in hive as below. Lets write a hive sql to extract the substring from the account_type column.
As we can see that the account_type column has currency code with the account type values.Here the currency codes are USD,CAD,EUR and AUD. The account types are Saving,Checking and Term.
We want to extract the currency code from the account_type column. The starting position of the currency code is 1 and the length/ending position of the currency code is 3.
1 |
substr(account_type,1,3) |
As we can see the function returned the currency codes successfully. If we didn’t specify the length as 3 in substring, it will return the entire the string which is starting from position 1.
Example 3 : Get last few characters using substring
Hive allows to give the negative values as starting position in substring. It will return the characters from right to left. The length argument is optional here.
Example 4 : Instr function with substring in hive
In some cases, we don’t know the starting position of the substring. Also the starting position will change for each value in hive table. But we might have some common string/character present in that column. In that case, we can use instr function to get the starting position of the string which is used to substring function in hive.
Lets extract the account type alone without currency code from the customer transaction table.
1 |
instr(account_type,'-') = Returns the position of dash '-' in account type column |
1 2 |
substr(account_type, instr(account_type,'-')+1) = Returns the account type value after the dash character. |
Recommended Articles
Instr function in Hive with examples
Your Suggestions