Slowly changing dimension (SCD) types using Data Services
What is a Slowly Changing Dimension
A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse. It is considered and implemented as one of the most critical ETL tasks in tracking the history of dimension records.
- TYPE 0 - Fixed Dimension.
No changes allowed, dimension never changes. - TYPE 1- No History.
Update record directly, there is no record of historical values, only current state. - TYPE 2- Row Versioning.
Track changes as version records with current flag & active dates and other metadata - Type 3 – Previous Value column.
Track change to a specific attribute, add a column to show the previous value, which is updated as further changes occur - Type 4 – History Table
Show current value in dimension table but track all changes in separate table - Type 6 – Hybrid SCD
Utilize techniques from SCD Types 1, 2, and 3 to track change
In reality, only types 0, 1, and 2 are widely used, with the others reserved for very specific requirements.
Here we are going to implement the commonly used SCDs i.e.: TYPE0, TYPE 1and TYPE 2 using BODS
Fixed Dimension.
Value remains the same as it were at the time the dimension record was first entered.
Consider an example with fields Name, IDate and Designation.
Here the date and designation remain the same.
We can simply map all the values to a table using query transform.
DF_TYPE_0: Map all the fields
Direct mapping of the fields alone is required.
The output will be the same as the source.
TYPE 1-No History.
This method does not track any history data. This methodology overwrites old data with new data without keeping the history. This method is mainly used for misspelled names.
Let us consider below-given data as our target data after the first run.
Consider the designation of ‘John’ is changed to B on the date 2003.12.01 and a new member’ June’ also joined.
In TYPE2, No history has preserved the designation and IDate are updated with new value.
In BODS it can be implemented by using the following transformation.<.p>
DF_TYPE1_source: Map the source to the table.
DF_TYPE1_UPDATED: Implement the type 2 logic.
Source: The file which contains the changed values:
Query: Map the source data to query transform without applying any transformation.
Table Comparison: Table comparison used to compare source data and the target table data.
Here we are comparing the source data with the target table (create a staging table with required fields as the target table i.e.: add ID field for key generation).
Key_Generation: This transform is used to generate a surrogated key.
After the execution of the job, we will get the required output:
Here the IDATE and Designation is replaced with current values.
TYPE 2-Row Versioning.
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present.
In order to support type 2 changes, we need to add four columns to our table:
Surrogate Key – the original ID will no longer be sufficient to identify the specific record we require, we therefore need to create a new ID that the fact records can join to specifically.
Current Flag – A quick method of returning only the current version of each record
Start Date – The date from which the specific historical version is active
End Date – The date to which the specific historical version record is active
In the first run, we have the following values,
In BODS, it can be implemented by using the following transformation.
DF_TYPE2_TABLE: Map the values
DF_TYPE2_UPDATION: Implement the logic.
Source: The file which contains the changed values:
Query: Map the source data to query transform without applying any transformation. Add extra 2 columns for the target table ID and Current flag.
Table_Comparison: Table comparison used to compare source data and the target table data.
History_Preserving:- This transform is used to store the history data, and if any updates on source data then new row will be inserted.
Here we are providing Valid-from date as Start Date and Valid to date as End_Date. For new records, Valid to date has to be given. Current _Flag can be set to determine the current and old values.
Key_Generation: This transform is used to generate a surrogated key.
After the execution of the job.
Here we can see that John has 2 records, the Current-flay set to “N” are the old record, and the Current-flay set to “Y” is the current record.