Developer Programs

Learn

Docs

Silverlake

Pipelined Data Sets > Silverlake

Introduction

The goal of this pipeline is to make SilverLake RAW Core extracts available on a weekly basis with optional near-real time streaming of CORE record changes.

Scenarios

Customers could use the data broker data set of SilverLake to report and analyze the financial institutions unedited data produced from the CORE raw extracts.

Pipelines

The SilverLake pipeline utilizes a weekly full extraction by each CORE and an internal adapter service to ingest and append weekly extracts to raw tables in BigQuery.

Schedule

Weekly full extracts. Continuous near-real time streaming of changed records.

Dataset Types

SilverLake pipeline datasets provided four types of data:

  1. raw - Producer source system raw extract without transformations
  2. cdc - Change data capture indicates this table / view contains all instances of a specific thing and receives all changes to those things, INSERT, UPDATE, or DELETE
  3. map - Internal processing table for Data Broker Adapter Service
  4. sch - Producer source schema definition

Dataset Table Partitioning for Query Performance

DataBroker dataset tables are partitioned by DAY Timestamps, __op_ts column, to improve query performance and lower cost of data reads. Recommended to include the timestamp in any where clause to reduce data reads and CPU costs of queries.

SELECT * FROM `<FI_project>.<FI_dataset>.glmast_raw` 
WHERE TIMESTAMP_TRUNC(__op_ts, DAY) = TIMESTAMP("2025-02-18")

Combining RAW and CDC

Combining records from matching raw and cdc tables can be combined together to produce a merged dataset of the weekly SilverLake extract with the near real time change-data-caputure records to provide a near-real-time result set for that table.

Example A: Basic union of glmast raw and cdc SilverLake tables

with raw_date as (
select max(__op_ts) as base_date
from <FI_project>.<FI_dataset>.glmast_raw
),
combined_glmast as (
    select *
    from `<FI_project>.<FI_dataset>.glmast_raw`
    where __op_ts = (select base_date from raw_date)
    union all
    select *
    from `<FI_project>.<FI_dataset>.glmast_cdc`
    where __op_ts > (select base_date from raw_date)
),
ranked_glmast as (
    select *,
        row_number() over (
            partition by __uid
            order by __op_ts desc
        ) rank
    from combined_glmast
),
current_glmast as (
    select * except (rank)
    from ranked_glmast
    where rank = 1
        and __op in ('I', 'U')
)
select *
from current_glmast

Example B: Filtering union of cfmast raw and cdc SilverLake tables with an example join to publicly available Google BigQuery geological data

with raw_date_cfmast as (
select max(__op_ts) as base_date 
from `<FI_project>.<FI_dataset>.cfmast_raw` 
),
combined_cfmast as (
    select *
    from `<FI_project>.<FI_dataset>.cfmast_raw`
    where __op_ts >= (select base_date from raw_date)
    union all
    select *
    from `<FI_project>.<FI_dataset>.cfmast_cdc`
    where __op_ts > (select base_date from raw_date_cfmast)
),
ranked_cfmast as (
    select *,
        row_number() over (
            partition by __uid
            order by __op_ts desc
        ) as rank
    from combined_cfmast
),
cfmast_cur as (
    select * except (rank)
    from ranked_cfmast
    where rank = 1
        and __op in ('I', 'U')
),
raw_zip as (
    select lpad(cast(cfzip/10000 as string), 5, '0') as ZIP,
        count(*) as Customers
    from cfmast_cur
    group by ZIP
)
select if(Customers > 9, internal_point_lat, null) as Latitude,
    if(Customers > 9, internal_point_lon, null) as Longitude,
    sum(Customers) as Customers
from raw_zip
join bigquery-public-data.geo_us_boundaries.zip_codes
    on ZIP = zip_code
group by Latitude, Longitude

ZipCode Leading Zero

Zip codes are stored as Numeric something and required a leading zero to be added for certain zipcodes.

Left Padding zero’s will be required in order to replace the leading zero due to the data type.

Example A

select distinct (cfzip, LPAD(cast (cfzip AS STRING) ,9,'0') ) from `<FI_project>.<FI_dataset>.cfmast_raw`

Example B

select substr(lpad(cast(cfzip as string), 9, '0'), 0, 5) as ZIP from `<FI_project>.<FI_dataset>.cfmast_raw`

Column Name Special Characters

Columns with special characters supported by SilverLake will have the special characters removed from column names within DataBroker.

Special Character List

'#' _num_
'$' _mon_
'&' _amp_
'@' _cat_

Source Column DataBroker Column

group --> grp 
desc --> description 
limit --> lmt

Timestamp Conversion to Datetime

Timestamp column convert:
CAST (__op_ts as STRING) __op_ts_DELIVERED,
CAST (TIMESTAMP_SECONDS(__op_ts)  AS DATETIME) AS __op_ts_CONVERTED,

NULL values as null string

missing values from a CORE column may be sent across as a string with the string value of “null” in certain circumstances.

Topics in this section


Have a Question?
Have a how-to question? Seeing a weird error? Get help on StackOverflow.
Register for the Digital Toolkit Meetup where we answer technical Q&A from the audience.
Last updated Tue Mar 4 2025