Split function in BigQuery with examples
Contents
Split function in BigQuery
Split is a part of String function in BigQuery which helps to split the value based on given delimiter. The String and Bytes values can be used in this function.
- For String data type, the default delimiter is comma(,).
- For Bytes values, the delimit should be specified.
Syntax
1 |
SPLIT(value[, delimiter]) |
The function return data type is
- ARRAY of type STRING for string value
- If the given input string is empty, the function returns ARRAY with a single empty STRING.
- ARRAY of type BYTES for byte value
Examples
Let’s look at the SPLIT function with some examples in BigQuery. For these examples, we have created a table as customer_info in BigQuery. This table contains the personal details of bank customers.
As mentioned below, the table has information about customer such as customer_id, name, date_of_birth, address, email_id, phone_numer, state and zip_code.
Example 1:
The column “name” has both first and last name of the customer with the separator of comma (,). Let’s separate the first name and last name using SPLIT function.
1 2 3 4 |
select customer_id,name, split(name,','), date_of_birth from rc_fin_test_tables.customer_info; |
In the query, we have used SPLIT function for the column “name“. Also we mentioned the delimiter as comma (,). As a result, it returned the first name and last name in an ARRAY of STRING.
Example: Patricia,Hamilton is split into Patricia and Hamilton.
If we want to split the string into columns, we have to use the OFFSET (for zero-based indexes) or ORDINAL (for one-based indexes) operator. With that, we can fetch the specific value from the ARRAY.
SPLIT the string into columns using OFFSET
To get the first element from the array, we need to use the index value as 0 in OFFSET.
1 2 3 4 5 |
select customer_id,name, split(name,',') [offset(0)] as first_name, split(name,',') [offset(1)] as last_name, date_of_birth from rc_fin_test_tables.customer_info; |
SPLIT the string into columns using ORDINAL
To get the first element from the array, we need to use the index value as 1 in ORDINAL.
1 2 3 4 5 |
select customer_id,name, split(name,',') [ordinal(1)] as first_name, split(name,',') [ordinal(2)] as last_name, date_of_birth from rc_fin_test_tables.customer_info; |
As shown below, both the queries split the name and returned the first name and last name in a column.
Please note that, if the given index value is not found in the array, it will throw an error as “Array index is out of bounds (overflow)“. To avoid those errors, we can use SAFE_OFFSET or SAFE_ORDINAL function. Instead of error, it will return a NULL value for array index out of bounds.
1 2 3 4 5 |
select customer_id,name, split(name,',') [safe_offset(3)] as first_name, split(name,',') [safe_offset(3)] as last_name, date_of_birth from rc_fin_test_tables.customer_info; |
In the query, we have used the index value as 3 using safe_offset operator.
- The split function returns only two values(first, last name) in the array.
- It means that it can return the values only for index 0 and 1.
- Since there is no value for index 3 in array, the safe_offset operator returned the NULL values for both first_name and last_name columns.
Recommended Articles
- String REPLACE function in BigQuery with examples
- Regexp_extract function in BigQuery with examples
- How to flatten an array using UNNEST function in BigQuery?
- Substring function in BigQuery with examples
References from GCP official documentation