How to execute a Select query on BigQuery using Java?
Contents
- 1 BigQuery
- 1.1 BigQuery Job
- 1.2 Prerequisites to run a BigQuery job using Java
- 1.3 Java program to execute a Select query on BigQuery:
- 1.3.1 Step 1: Set the client libraries in pom.xml
- 1.3.2 Step 2: Import the Google Cloud BigQuery libraries in the program
- 1.3.3 Step 3: Set the Service Account key credential
- 1.3.4 Step 4: Initialize a BigQuery client
- 1.3.5 Step 5: Define the query with a QueryJobConfiguration
- 1.3.6 Step 6: Start a BigQuery job
- 1.3.7 Step 7: Wait for the query to complete
- 1.3.8 Step 8: Check for errors
- 1.3.9 Step 9: Get the results
- 1.3.10 Step 10: Print the results
- 1.4 Complete program
BigQuery
BigQuery is a fully managed, serverless data warehouse that is part of Google Cloud Platform. It can process massive amounts of data and return results quickly. In this tutorial, we are going to read a table from a BigQuery dataset using a Java program. For this example, we use the SELECT query below.
```sql
SELECT commit, author, repo_name
FROM `bigquery-public-data.github_repos.commits`
WHERE subject like '%bigquery%'
ORDER BY subject DESC
LIMIT 10
```
BigQuery Job
The unit of work in BigQuery is called a job. A job can load data, export data, query data, or copy data. When a job is created programmatically, BigQuery executes it asynchronously, and the job can be polled for its status.
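To make this asynchronous model concrete, here is a minimal sketch of polling a query job for its status with the same google-cloud-bigquery client used in the rest of this tutorial. It assumes a BigQuery client named bigquery has already been built (as in Step 4 below); the trivial SELECT 1 query is only a placeholder.

```java
// Minimal sketch: poll a BigQuery job instead of blocking on it.
// Assumes "bigquery" is an initialized BigQuery client (see Step 4).
QueryJobConfiguration config = QueryJobConfiguration.newBuilder("SELECT 1").build();
Job job = bigquery.create(JobInfo.of(config));

// isDone() asks the service whether the job has finished.
while (!job.isDone()) {
    Thread.sleep(1000); // wait a second between polls (may throw InterruptedException)
}

// reload() fetches the final job state so we can read its status.
job = job.reload();
System.out.println("Final state: " + job.getStatus().getState());
```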
Prerequisites to run a BigQuery job using Java
- Project ID: To use Google Cloud Platform, we need to create a project in GCP. GCP then assigns a project ID, a unique string that distinguishes our project from all other projects in Google Cloud.
- Service Account: A service account is a Google account that is associated with our Google Cloud project. To access the BigQuery API from a program or application, we need to create a service account and grant it the required roles on our project.
- Service Account Key: For the service account, we need to create a key, which the program uses as the service account credential. BigQuery verifies the client's identity using this key.
Java program to execute a Select query on BigQuery:
To access BigQuery, we need to add the Google Cloud BigQuery client libraries to our program, so we create a Maven project as shown below.
Step 1: Set the client libraries in pom.xml
```xml
<!-- Using libraries-bom to manage versions.
     See https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google-Cloud-Platform-Libraries-BOM -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>25.2.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-bigquery</artifactId>
  </dependency>
</dependencies>
```
Step 2: Import the Google Cloud BigQuery libraries in the program
```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;
import java.util.UUID;
```
Step 3: Set the Service Account key credential
As mentioned earlier, a key must be created for the service account. From the GCP console, we can download that key as a JSON file. The same JSON file is used in the program to set the credential; its path is given in the code below.
```java
// Path of the Service Account key file
String jsonPath = "D:\\Learning\\GCP\\my-rcs-project-833123-ef45632b1b12.json";

// Set the Service Account key as the credential
Credentials credentials = GoogleCredentials
        .fromStream(new FileInputStream(jsonPath));
```
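As an alternative to hard-coding the key file path, the library can also resolve Application Default Credentials, typically from the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing at the same JSON key file. A minimal sketch:

```java
// Sketch: resolve Application Default Credentials instead of a hard-coded path.
// Assumes GOOGLE_APPLICATION_CREDENTIALS points to the service account key JSON file.
Credentials credentials = GoogleCredentials.getApplicationDefault();
```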
Step 4: Initialize a BigQuery client
In this step, we initialize the BigQuery client using our project ID and the service account credential.
```java
// Initialize the BigQuery client by setting the project ID and credential
BigQuery bigquery = BigQueryOptions.newBuilder()
        .setProjectId("my-rcs-project-833123")
        .setCredentials(credentials)
        .build().getService();
```
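If the program runs on Google Cloud, or if GOOGLE_APPLICATION_CREDENTIALS is set as described above, the client can also be built from defaults without passing the project ID and credentials explicitly. A minimal sketch:

```java
// Sketch: build the client from Application Default Credentials and the default project.
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
```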
Step 5: Define the query with a QueryJobConfiguration
Next, we define our query in a QueryJobConfiguration as shown below. In this example, we query the BigQuery public dataset bigquery-public-data.github_repos.commits.
```java
QueryJobConfiguration queryConfig =
        QueryJobConfiguration.newBuilder(
                "SELECT commit, author, repo_name "
                    + "FROM `bigquery-public-data.github_repos.commits` "
                    + "WHERE subject like '%bigquery%' "
                    + "ORDER BY subject DESC LIMIT 10")
            .build();
```
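If the filter value were to come from user input, a parameterized query would be safer than concatenating it into the SQL string. Here is a minimal sketch using a named parameter; the @pattern name is only an illustration, and QueryParameterValue would need to be imported from com.google.cloud.bigquery.

```java
// Sketch: the same query with a named parameter instead of an inline literal.
QueryJobConfiguration parameterizedConfig =
        QueryJobConfiguration.newBuilder(
                "SELECT commit, author, repo_name "
                    + "FROM `bigquery-public-data.github_repos.commits` "
                    + "WHERE subject like @pattern "
                    + "ORDER BY subject DESC LIMIT 10")
            // Bind the illustrative @pattern parameter to the search value.
            .addNamedParameter("pattern", QueryParameterValue.string("%bigquery%"))
            .build();
```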
Step 6: Start a BigQuery job
In this step, we create a job ID and start the BigQuery job with the BigQuery.create() method, passing the QueryJobConfiguration to it. Supplying our own job ID lets us retry the request safely: if the first attempt actually succeeded, BigQuery rejects the duplicate ID instead of running the query a second time.
```java
// Create a job ID so that we can safely retry.
JobId jobId = JobId.of(UUID.randomUUID().toString());
Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());
```
Step 7: Wait for the query to complete
```java
// Wait for the query to complete.
queryJob = queryJob.waitFor();
```
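waitFor() blocks with the client's default polling and timeout settings. If we want to cap how long the program waits, the method also accepts retry options. A small sketch, assuming the threeten-backport Duration type that this client version's RetryOption expects:

```java
// Sketch: wait for the query, but give up after roughly five minutes.
// RetryOption comes from com.google.cloud; Duration from org.threeten.bp.
queryJob = queryJob.waitFor(RetryOption.totalTimeout(Duration.ofMinutes(5)));
```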
Step 8: Check for errors
The code below checks for errors in the BigQuery job.
1 2 3 4 5 6 7 8 |
// Check for errors if (queryJob == null) { throw new RuntimeException("Job no longer exists"); } else if (queryJob.getStatus().getError() != null) { // You can also look at queryJob.getStatus().getExecutionErrors() for all // errors, not just the latest one. throw new RuntimeException(queryJob.getStatus().getError().toString()); } |
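As the comment in the snippet notes, the job status also exposes the complete list of execution errors. A small sketch that logs them all before failing (BigQueryError and java.util.List would need to be imported):

```java
// Sketch: log every execution error, not just the latest one.
List<BigQueryError> executionErrors = queryJob.getStatus().getExecutionErrors();
if (executionErrors != null) {
    for (BigQueryError error : executionErrors) {
        System.err.println(error.getReason() + ": " + error.getMessage());
    }
}
```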
Step 9: Get the results
```java
// Get the results.
TableResult result = queryJob.getQueryResults();
```
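For simple queries, the client also provides a one-call shorthand, BigQuery.query(), that creates the job, waits for it, and returns the results; steps 6 through 9 roughly collapse into the sketch below.

```java
// Sketch: run the query and wait for its result in a single call.
// The call throws an exception if the query fails, replacing the manual checks above.
TableResult result = bigquery.query(queryConfig);
```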
Step 10: Print the results
Here we iterate over each row and print it to the console.
```java
// Print all pages of the results.
for (FieldValueList row : result.iterateAll()) {
    // String type
    String commit = row.get("commit").getStringValue();

    // Record type
    FieldValueList author = row.get("author").getRecordValue();
    String name = author.get("name").getStringValue();
    String email = author.get("email").getStringValue();

    // Repeated string type
    String repoName = row.get("repo_name").getRecordValue().get(0).getStringValue();

    System.out.printf(
            "Repo name: %s Author name: %s email: %s commit: %s\n",
            repoName, name, email, commit);
}
```
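The accessors above assume we already know the column names and types. If we want to inspect the result more generically, the TableResult also carries the schema; here is a small sketch that prints every column by name and skips NULLs (Schema, Field, and FieldValue would need to be imported from com.google.cloud.bigquery).

```java
// Sketch: print rows generically using the result schema instead of hard-coded columns.
Schema schema = result.getSchema();
for (FieldValueList row : result.iterateAll()) {
    for (Field field : schema.getFields()) {
        FieldValue value = row.get(field.getName());
        // Nested RECORD/REPEATED columns would still need the typed accessors shown above.
        System.out.printf("%s=%s ", field.getName(), value.isNull() ? "NULL" : value.getValue());
    }
    System.out.println();
}
```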
Complete program
```java
import com.google.auth.Credentials;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.UUID;

public class QueryManager {

    public static void main(String[] args) throws InterruptedException, IOException {

        // Path of the Service Account key file
        String jsonPath = "D:\\Learning\\GCP\\my-rcs-project-833123-ef45632b1b12.json";

        // Set the Service Account key as the credential
        Credentials credentials = GoogleCredentials
                .fromStream(new FileInputStream(jsonPath));

        // Initialize the BigQuery client by setting the project ID and credential
        BigQuery bigquery = BigQueryOptions.newBuilder()
                .setProjectId("my-rcs-project-833123")
                .setCredentials(credentials)
                .build().getService();

        // Define the query with a QueryJobConfiguration
        QueryJobConfiguration queryConfig =
                QueryJobConfiguration.newBuilder(
                        "SELECT commit, author, repo_name "
                            + "FROM `bigquery-public-data.github_repos.commits` "
                            + "WHERE subject like '%bigquery%' "
                            + "ORDER BY subject DESC LIMIT 10")
                    .build();

        // Create a job ID so that we can safely retry.
        JobId jobId = JobId.of(UUID.randomUUID().toString());
        Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

        // Wait for the query to complete.
        queryJob = queryJob.waitFor();

        // Check for errors
        if (queryJob == null) {
            throw new RuntimeException("Job no longer exists");
        } else if (queryJob.getStatus().getError() != null) {
            // You can also look at queryJob.getStatus().getExecutionErrors() for all
            // errors, not just the latest one.
            throw new RuntimeException(queryJob.getStatus().getError().toString());
        }

        // Get the results.
        TableResult result = queryJob.getQueryResults();

        // Print all pages of the results.
        for (FieldValueList row : result.iterateAll()) {
            // String type
            String commit = row.get("commit").getStringValue();

            // Record type
            FieldValueList author = row.get("author").getRecordValue();
            String name = author.get("name").getStringValue();
            String email = author.get("email").getStringValue();

            // Repeated string type
            String repoName = row.get("repo_name").getRecordValue().get(0).getStringValue();

            System.out.printf(
                    "Repo name: %s Author name: %s email: %s commit: %s\n",
                    repoName, name, email, commit);
        }
    }
}
```
Output
Finally, we executed our Java program to read data from the BigQuery dataset bigquery-public-data.github_repos.commits, and the program printed the query results to the console.
Output in the Google Cloud console
We also verified the results of the SELECT query in the Google Cloud console, and they look correct.
Recommended Articles
- How to export data from BigQuery table to a file in Cloud Storage?
- How to run a BigQuery SQL using Python?