Banno Ads Reporting

Introduction

The goal of this pipeline is to make the Banno Ads Reporting dataset available in Google BigQuery, where it can be queried through the SQL interface for business purposes and used to drive downstream business processes.

Scenarios

The Banno Ads Reporting dataset is made up of data files sourced from Kafka. Each data file, in JSON format and based on specific topics, is written directly to Google Cloud Storage. The files are then imported into BigQuery and exposed as a native table, which provides a SQL interface over the data in order to drive downstream business processes.

This pipeline enables three scenarios:

Initial load: The data files are loaded for the first time
Incremental load: Subsequent data file loads after the initial load
Historical load / backfill: Existing data is cleaned up and reloaded from the beginning of time

Pipelines

Ads Reporting data is sent to Kafka as JSON strings. A dedicated Kafka consumer group pulls those records. Once pulled, they are deserialized into Scala objects, processed, and transformed into a format suitable for BigQuery. Finally, the data is converted to JSONL and written to a Google Cloud Storage bucket using the Google API.
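The final conversion step can be illustrated with a minimal sketch. The production consumer is written in Scala; the Python below is only an illustration of turning JSON strings into a JSONL payload, and the function name `to_jsonl` and the choice to drop null fields are assumptions made for the example, not the pipeline's actual transformation.

```python
import json
from typing import Iterable


def to_jsonl(raw_records: Iterable[str]) -> str:
    """Convert raw JSON strings into a newline-delimited JSON (JSONL) payload."""
    lines = []
    for raw in raw_records:
        # Deserialize one message value (a JSON string) into a dict.
        record = json.loads(raw)
        # Hypothetical transformation: drop fields whose value is null.
        row = {key: value for key, value in record.items() if value is not None}
        # JSONL requires exactly one compact JSON object per line.
        lines.append(json.dumps(row, separators=(",", ":"), sort_keys=True))
    # Terminate every line, including the last, with a newline.
    return "\n".join(lines) + ("\n" if lines else "")
```

The resulting string is what would be written as a single object to the Cloud Storage bucket; the upload itself is omitted here.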

A Data Transfer job then picks up the data file from the Google Cloud Storage bucket and imports it into Google BigQuery, where it is exposed as a native table. After the import, the copies of the data file in the Google Cloud Storage bucket are retained for later archival and audit purposes. After the initial import into BigQuery, subsequent imports only append to the existing table; no additional work is done to de-duplicate data or purge older entries. Per the current requirements, the two jobs are time-driven rather than triggered by events such as the availability of the source file or the completion of the first job.
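Because imports only append and nothing is de-duplicated, a record that is ever written more than once would appear more than once in the table. Consumers that need one row per event may have to handle this themselves. The sketch below shows one way to do that on rows already fetched from the table; the field names `event_id` and `ingested_at` are hypothetical and are not taken from the actual schema.

```python
from typing import Any, Dict, Iterable, List


def dedupe_latest(
    rows: Iterable[Dict[str, Any]],
    key: str = "event_id",          # hypothetical unique identifier field
    order_by: str = "ingested_at",  # hypothetical load-time field
) -> List[Dict[str, Any]]:
    """Keep one row per key, preferring the row with the greatest order_by value."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        current = latest.get(row[key])
        # Replace the stored row when this one is as new or newer.
        if current is None or row[order_by] >= current[order_by]:
            latest[row[key]] = row
    # Rows come back in the order each key was first seen.
    return list(latest.values())
```

The same effect can usually be achieved in SQL at query time; this version only illustrates the logic.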

Schedule

Initially, the data import process is expected to run hourly. The pipeline is extensible, so the schedule can be changed as needed in the future.

Querying the Data

Banno Ads Data Descriptions in Dataset `jh_banno_ads_reporting_us`

impressions: The event of a user “seeing” an Ad. More specifically, a placed Ad is detected within the viewport of a user’s device or browser.

clickthroughs: The event of a user clicking on a placed Ad.

impression_campaign_users: Links an impression event for a user to the specific Campaign the placed Ad is a part of, as well as whether or not the impression led to a clickthrough.

impression_campaigns: Links an impression to the specific Campaign the placed Ad is a part of.

impression_segment_users: Links an impression event for a user to the specific Segment that user was included in for Ad targeting, as well as whether or not the impression led to a clickthrough.
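As a starting point for querying, the sketch below builds a simple row-count query for any of the data sets described above. It assumes each description corresponds to a table of the same name in `jh_banno_ads_reporting_us` and that the query runs in a project where that dataset is visible; the helper name `row_count_query` is made up for this example, and no column names are assumed.

```python
DATASET = "jh_banno_ads_reporting_us"

# Names taken from the descriptions above; assumed to be table names.
TABLES = (
    "impressions",
    "clickthroughs",
    "impression_campaign_users",
    "impression_campaigns",
    "impression_segment_users",
)


def row_count_query(table: str) -> str:
    """Return a BigQuery SQL statement that counts the rows in one table."""
    # Only allow known names so arbitrary text is never spliced into SQL.
    if table not in TABLES:
        raise ValueError(f"unknown table: {table!r}")
    return f"SELECT COUNT(*) AS row_count FROM `{DATASET}.{table}`"
```

The returned string can be pasted into the BigQuery console or passed to whichever client is used to run queries.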

Last updated Wed Mar 4 2026