Thursday, October 19, 2017

Type 2 Slowly Changing Dimensions

SCD type 2 will store the entire history in the dimension table. In SCD type 2 effective date, the dimension table will have Start_Date (Begin_Date) and End_Date as the fields. If the End_Date is Null, then it indicates the current row. it will  accurately keep all historical information, but will cause the size of the table to grow fast which needs more storage space and poor performance.So SCD2 is used only when it is necessary to track all historical data.

Example:
--Source Table

Create Table Customers
(
  Customer_Id integer Primary Key,
  Location    Varchar(30)
);

--Target Dimension Table

Create Table Customers_Dim
(
  Cust_Key integer Primary Key,
  Customer_Id   integer,
  Location      Varchar(30),
  Starting_Date    Date,
  End_Date      Date
);

Steps

Identify new records and insert into dimension table Customers_Dim with  Starting_Date as Current date (getdate()) and End_Date as NULL.
Identify changed record and insert into dimension table with Starting_Date as Current date (getdate()) and End_Date as NULL.
Identify the changed record and update the existing record in dimension table with End_Date as Current date (getdate()).

In type 2, you can store the data in three different ways. They are
Versioning
Flagging
Effective Date

Versioning: A sequence number is used to represent the change. The latest sequence number always represents the current row and the previous sequence numbers represents the past data. The records with the highest version number is treated as an active record.
 Flagging: A flag column is created in the dimension table. The current record will have the flag value as 1 and the previous records will have the flag as 0/ If the flag indicates ‘Y’ the record is latest and if the flag indicates ‘N’ then the record is outdated.
 Effective Date: The period of the change is tracked using the start_date and end_date columns in the dimension table. NULL in the End_Date indicates the current version of the data and the remaining records indicate the past data. Four attributes of a Type 2 dimension:


1) SCD original ID -  An alternative primary key for the dimension
2) SCD start date - date this dimension member become active
3) SCD end date - date this dimension becomes inactive
4) SCD status - current state, either active or inactive usually represented by Flags

Example:

Type 1 : In the Type 1 SCD example, each employee can only appear once in the Salesperson
dimension.

Type 2: In the Type 2 SCD, a salesperson can appear in the dimension more than once
A Type 3 SCD is similar to a Type 2 SCD with one exception. A Type 3 SCD does not
track the entire history of the dimension members. Instead, a Type 3 SCD tracks only the current state and the original state of a dimension member.
A Type 3 SCD is implemented using two additional attributes:
1)  SCD Start Date The date the current state of this dimension member became active 2)  SCD Initial Value:  The original state of this attribute

Source: Delivering Business Intelligence with Microsoft SQL Server 2012 (3rd Edition) Author: Brian Larson
http://www.ajer.org/papers/v2(4)/K0248691.pdf

No comments:

Post a Comment

GEN AI

  Stay Tuned....