Integration of Google BigQuery with SAP Data Warehouse Cloud (DWC)
Analyzing business data often requires the consumption of additional business-critical data from third-party sources. It is counterproductive to have to copy or move all this data into DWC for such analysis. Instead, DWC has the provision to integrate data from multiple sources, one such example being Google BigQuery.
Data federation between Google BigQuery and DWC has been explored in the following blogs:
- Google BigQuery to SAP Data Warehouse Cloud DWC
- Data Federation Between SAP Data Warehouse Cloud (DWC) and Google BigQuery
Here, delving deeper into this usecase, we create a Google BigQuery connection based on a specific GCP Service account in DWC and access data from a different third-party account in BigQuery. Read on to find out how this is done.
SAP Data Warehouse Cloud(DWC) is a cloud-based data warehousing solution that combines both efficient data management and advanced analytics. Built on the SAP HANA Cloud database, DWC combines data from several sources (SAP and third-party data, across clouds and on-premise repositories), enabling highly efficient real-time analysis.
Google BigQuery is part of the Google Cloud Platform (GCP) – It is a serverless, scalable, multicloud data warehousing solution that enables data analysis at a petabyte scale.
Once integrated with DWC, data from Google BigQuery can be queried directly by means of virtual tables in DWC. Live connectivity ensures that queries are federated through virtual tables with fresh data and there is never any need for caching or to replicate the data from the source.
DWC can connect to GBQ using two mechanisms:
- Through SDI with the use of the Google BigQuery adapter
- Directly from DWC through the connection type 'Google BigQuery'
As of May 2021, the first connection type (through SDI) is not available. It requires a 3rd party driver, but this has been blocked by SAP for security reasons.
This blog will focus on the second method, i.e., establishing a connection directly from DWC using the ‘Google BigQuery’ connection type.
Following are the prerequisites to integrating Google BigQuery data with SAP DWC
- Google Cloud Platform Account
- User with relevant authorization to use Google BigQuery
- Google BigQuery Service Instance with data
- SAP DWC instance
Steps:
-
The following steps need to be carried out on the Google BigQuery (GBQ) side:
- i) Login to the GCP console and create a *Project
*NOTE: Google BigQuery is managed through Projects, which consists of ‘Datasets’. Datasets in turn are made up of views and tables.
- ii) In the GCP console, create a ‘Service account’
- Go to IAM> Service Account > Create Service Account
- This account is necessary in order to connect with DWC
- iii) Give the user a name and grant the following roles:
- BigQuery User
- iv) Generate a Key file for the Service account – this can be done from the ‘Actions’ menu, once the Service account has been created.
- The key file must be generated and downloaded in the json format
- This key establishes a secure, trusted connection
- In GCP IAM (Identity Access Management), the generated key ID field will be populated automatically
- v) Download Google cloud certificate:
- In your browser go to google.com. Open the certificate and drag it to the desktop.
- Import the certificate to Keychain Access (on macOS) and export it as a .pem file
- i) Login to the GCP console and create a *Project
-
Once GBQ has been configured, the 3rd-party vendor will have to associate the Service account (created in previous step) directly with its own GBQ project and configure it to have access to data there
- i) - The vendor would require the service account (Gmail address) and Key file in order to do this
-
The following steps will then have to be carried out on the DWC side:
- i) Go to Administration section of SAP Data Warehouse Cloud and import the certificate (Downloaded in Step I)
- ii) Go to space management / space / connections and add Connection and choose BigQuery from the list
- iii) Enter the GCP Project name and upload the key exported in Step INOTE: By default, when a GBQ service account is associated with multiple projects, the DWC connection will show data from the “home” project of that service account, i.e., the Key file of that service account (Created and downloaded in Step I).
However, once this Service account has been associated by a 3rd party SaaS vendor to their standalone GBQ project, it is possible to get this data in DWC.
This can be achieved by creating a new DWC connection using the same Key file (generated in Step I), however, the project name that is specified has to be the project we need data from (i.e., the 3rd party project name in this case). - iv) After this is done, data from the project can be accessed via DWC. Create a new graphical view and navigate to ‘Sources’ and expand ‘Connections’ – the newly created BigQuery connection should now be available for use
Further Reading: