Silverlake
Introduction
The goal of this pipeline is to make SilverLake RAW Core extracts available on a weekly basis with optional near-real time streaming of CORE record changes.
Scenarios
Customers could use the data broker data set of SilverLake to report and analyze the financial institutions unedited data produced from the CORE raw extracts.
Pipelines
The SilverLake pipeline utilizes a weekly full extraction by each CORE and an internal adapter service to ingest and append weekly extracts to raw tables in BigQuery.
Schedule
Weekly full extracts. Continuous near-real time streaming of changed records.
Dataset Types
SilverLake pipeline datasets provided four types of data:
- raw - Producer source system raw extract without transformations
- cdc - Change data capture indicates this table / view contains all instances of a specific thing and receives all changes to those things, INSERT, UPDATE, or DELETE
- map - Internal processing table for Data Broker Adapter Service
- sch - Producer source schema definition
Dataset Table Partitioning for Query Performance
DataBroker dataset tables are partitioned by DAY Timestamps, __op_ts column, to improve query performance and lower cost of data reads. Recommended to include the timestamp in any where clause to reduce data reads and CPU costs of queries.
SELECT * FROM `<FI_project>.<FI_dataset>.glmast_raw`
WHERE TIMESTAMP_TRUNC(__op_ts, DAY) = TIMESTAMP("2025-02-18")
Combining RAW and CDC
Combining records from matching raw and cdc tables can be combined together to produce a merged dataset of the weekly SilverLake extract with the near real time change-data-caputure records to provide a near-real-time result set for that table.
Example A: Basic union of glmast raw and cdc SilverLake tables
with raw_date as (
select max(__op_ts) as base_date
from <FI_project>.<FI_dataset>.glmast_raw
),
combined_glmast as (
select *
from `<FI_project>.<FI_dataset>.glmast_raw`
where __op_ts = (select base_date from raw_date)
union all
select *
from `<FI_project>.<FI_dataset>.glmast_cdc`
where __op_ts > (select base_date from raw_date)
),
ranked_glmast as (
select *,
row_number() over (
partition by __uid
order by __op_ts desc
) rank
from combined_glmast
),
current_glmast as (
select * except (rank)
from ranked_glmast
where rank = 1
and __op in ('I', 'U')
)
select *
from current_glmast
Example B: Filtering union of cfmast raw and cdc SilverLake tables with an example join to publicly available Google BigQuery geological data
with raw_date_cfmast as (
select max(__op_ts) as base_date
from `<FI_project>.<FI_dataset>.cfmast_raw`
),
combined_cfmast as (
select *
from `<FI_project>.<FI_dataset>.cfmast_raw`
where __op_ts >= (select base_date from raw_date)
union all
select *
from `<FI_project>.<FI_dataset>.cfmast_cdc`
where __op_ts > (select base_date from raw_date_cfmast)
),
ranked_cfmast as (
select *,
row_number() over (
partition by __uid
order by __op_ts desc
) as rank
from combined_cfmast
),
cfmast_cur as (
select * except (rank)
from ranked_cfmast
where rank = 1
and __op in ('I', 'U')
),
raw_zip as (
select lpad(cast(cfzip/10000 as string), 5, '0') as ZIP,
count(*) as Customers
from cfmast_cur
group by ZIP
)
select if(Customers > 9, internal_point_lat, null) as Latitude,
if(Customers > 9, internal_point_lon, null) as Longitude,
sum(Customers) as Customers
from raw_zip
join bigquery-public-data.geo_us_boundaries.zip_codes
on ZIP = zip_code
group by Latitude, Longitude
ZipCode Leading Zero
Zip codes are stored as Numeric something and required a leading zero to be added for certain zipcodes.
Left Padding zero’s will be required in order to replace the leading zero due to the data type.
Example A
select distinct (cfzip, LPAD(cast (cfzip AS STRING) ,9,'0') ) from `<FI_project>.<FI_dataset>.cfmast_raw`
Example B
select substr(lpad(cast(cfzip as string), 9, '0'), 0, 5) as ZIP from `<FI_project>.<FI_dataset>.cfmast_raw`
Column Name Special Characters
Columns with special characters supported by SilverLake will have the special characters removed from column names within DataBroker.
Special Character List
'#' _num_
'$' _mon_
'&' _amp_
'@' _cat_
Source Column DataBroker Column
group --> grp
desc --> description
limit --> lmt
Timestamp Conversion to Datetime
Timestamp column convert:
CAST (__op_ts as STRING) __op_ts_DELIVERED,
CAST (TIMESTAMP_SECONDS(__op_ts) AS DATETIME) AS __op_ts_CONVERTED,
NULL values as null string
missing values from a CORE column may be sent across as a string with the string value of “null” in certain circumstances.
Topics in this section
- acpar5_raw
- acpara_raw
- brinfo_raw
- cdhist_raw
- cdmast_raw
- cdpar2_raw
- cdpar3_raw
- cdpar4_raw
- cdpar5_raw
- cdpar6_raw
- cdparc_raw
- cdparp_raw
- cdpars_raw
- cdparu_raw
- cdplan_raw
- cdrmst_raw
- cfacct_raw
- cfmast_raw
- cfpar1_raw
- cfpar5_raw
- cfparb_raw
- cfpard_raw
- cfsicc_raw
- cfusrflds_raw
- ddaam1_raw
- ddaams_raw
- ddaap2_raw
- ddbrst_raw
- ddefth_raw
- ddfeesiw_raw
- ddhist_raw
- ddmast_raw
- ddpar2_raw
- ddpar3_raw
- ddpar5_raw
- ddpar6_raw
- ddparc_raw
- ddparf_raw
- ddpars_raw
- ddparu_raw
- ddusrflds_raw
- gletrn_raw
- glint1_raw
- glmast_raw
- glpar3_raw
- glpar5_raw
- glpar7_raw
- glpar8_raw
- glpar9_raw
- glparz_raw
- glparz1_raw
- hbacct_raw
- hbmast_raw
- jhclas_raw
- jhcldr_raw
- jhdata_raw
- jhmast_raw
- jhoffr_raw
- jhparl_raw
- jhparsrc_raw
- jhrgms_raw
- jhstat_raw
- jhwith_raw
- lnfee_raw
- lnhist_raw
- lnline_raw
- lnmast_raw
- lnnaic_raw
- lnpar2_raw
- lnpar3_raw
- lnpar8_raw
- lnpar9_raw
- lnparq_raw
- lnparr_raw
- lnparu_raw
- lnsicc_raw
- lnsmtg_raw
- lnusrflds_raw
- sdmast_raw
- sdpars_raw