Data Egress from GCP and Data Hub
Both GCP and Data Hub support egressing data to other platforms. Below are a few examples of possible methods and use cases for moving data to other platforms.
For customers with a GCP project who want data in other data platforms
The main options for pushing data from a Google project are:
Dataflow – a Google Cloud service for batch and streaming data processing.
- Customers can write their own pipeline code or use Google-provided templates. This allows Dataflow jobs to push data to destinations such as Azure SQL or Azure Data Lake.
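As a sketch of what kicking off such a job can look like, the snippet below builds the REST request for Dataflow's `templates:launch` endpoint. The template path, bucket names, and parameter names here are illustrative placeholders, not a specific Google-provided template; consult the documentation for the template you actually use.

```python
import json

def build_template_launch_request(project, region, job_name, template_gcs_path, parameters):
    """Build the URL and JSON body for the Dataflow templates:launch REST call.

    The template path and parameter names are hypothetical placeholders;
    real templates define their own required parameters.
    """
    url = (
        f"https://dataflow.googleapis.com/v1b3/projects/{project}"
        f"/locations/{region}/templates:launch?gcsPath={template_gcs_path}"
    )
    body = {
        "jobName": job_name,
        "parameters": parameters,  # template-specific key/value pairs
        "environment": {"tempLocation": f"gs://{project}-dataflow-temp/tmp"},
    }
    return url, json.dumps(body)

# Hypothetical egress template that reads a BigQuery table and pushes it out.
url, body = build_template_launch_request(
    "my-project", "us-central1", "bq-egress-job",
    "gs://my-bucket/templates/my_egress_template",
    {"inputTable": "my-project:dataset.table", "outputPath": "placeholder-destination"},
)
```

The request would be sent as an authenticated `POST`; the response contains the launched job's ID for monitoring.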
Cloud Data Fusion – a Google-managed data integration service.
- Customers use a web interface and plugins to build pipelines that connect to other data sources and destinations for data egress.
Pub/Sub – topics can be used to stream real-time data to subscribers on other platforms.
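For illustration, the snippet below prepares a record the way the Pub/Sub client libraries expect it: a fully qualified topic path and a bytes payload. The project, topic, and record fields are made-up examples.

```python
import json

def to_pubsub_message(project_id, topic_id, record):
    """Serialize a record into the bytes payload Pub/Sub expects,
    plus the fully qualified topic path used by the client libraries."""
    topic_path = f"projects/{project_id}/topics/{topic_id}"
    data = json.dumps(record).encode("utf-8")  # Pub/Sub payloads are bytes
    return topic_path, data

# With the google-cloud-pubsub client, this payload would be published as:
#   publisher.publish(topic_path, data)  # returns a future resolving to a message ID
topic, payload = to_pubsub_message("my-project", "egress-topic", {"id": 1, "amount": 9.5})
```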
Pulling data from BigQuery into other solutions
Other cloud vendors and data platforms, such as Azure, AWS, and Snowflake, provide tools that can connect to and pull data from BigQuery.
- Azure
- Data Factory – data integration service for automating data workflows
- Databricks – analytics platform using data lake
- AWS
- Glue – data integration, preparation, and discovery
- Data Exchange – connect to other data providers
- Snowflake
- Data loading - cloud storage connectors, Snowpipe, and Kafka topics
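Under the hood, connectors in these tools pull rows by querying BigQuery. As a minimal sketch, the snippet below builds the REST call to BigQuery's `jobs.query` endpoint that such a pull effectively makes; the project and table names are illustrative.

```python
import json

def build_bigquery_pull_request(project_id, sql):
    """Build the URL and JSON body for BigQuery's jobs.query REST endpoint,
    which external connectors use (directly or via client libraries) to pull rows."""
    url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/queries"
    body = {"query": sql, "useLegacySql": False}
    return url, json.dumps(body)

url, body = build_bigquery_pull_request(
    "my-project",
    "SELECT * FROM `my-project.dataset.accounts` WHERE load_date = CURRENT_DATE()",
)
```

The authenticated `POST` response returns the result rows (or a job reference to page through them), which the destination tool then writes to its own storage.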
A variety of third-party tools can also be used to move data between data platforms.
- In general, a customer configures the tool's BigQuery connector as the source, then selects the destination, such as a Microsoft Fabric Lakehouse, as the target.
- Some common options include:
If a customer chooses not to use a GCP project, Data Hub supports a batch delivery method for delivering data to Azure or AWS.
Data Hub Data Egress
Supports exporting data in CSV, Avro, or Parquet format, allowing customers to load the data into other platforms such as Microsoft Fabric, AWS, or Snowflake.
This solution uses scheduled jobs to pull data from BigQuery tables based on a time column, generally the partition key. The job schedule varies with the pipeline's data delivery method. The files are written to a dedicated Google Cloud Storage bucket for the customer.
Once the files are created, the Jack Henry File Transfer (JHFT) service securely moves the data to its final destination. The customer must provide the landing area, firewall rules, and authentication that JHFT needs to write the files to their location.