Friday, March 15, 2013

SCD Types... Part 1


Type 0
The Type 0 method is passive. It manages dimensional changes and no action is performed. Values remain as they were at the time the dimension record was first inserted
The most common types are I, II, and III.
Type I
This methodology overwrites old with new data, and therefore does not track historical data. Its common uses are for misspelled names
Example of a Customer table:
Customer_Key
Customer_Code
Customer_Name
Customer_State
1
ABC
Oracle
CA
In the above example, Customer_Code is the natural key and Customer_Key is a surrogate key.

If the Customer relocates the headquarters to New Jersey. The updated table would overwrite this record:
Customer_Key
Customer_Code
Customer_Name
Customer_State
1
ABC
Oracle
NJ
Cons:
·         No history in the data warehouse.
·         If you have calculated an aggregate table summarizing facts by state, it will need to be recalculated when the Customer_State is changed.
Pros: It's easy to maintain.

Type II
This method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. Unlimited history is preserved for each insert.
For example, if the Customer relocates to New Jersey the version numbers will be incremented sequentially:
Customer_Key
Customer_Code
Customer_Name
Customer_State
Version.
1
ABC
Oracle
CA
0
2
ABC
Oracle
NJ
1
Another method is to add 'effective date' columns.
Customer_Key
Customer_Code
Customer_Name
Customer_State
Start_Date
End_Date
1
ABC
Oracle
CA
01-Jan-2005
21-Dec-2009
2
ABC
Oracle
NJ
22-Dec-2009
The null End_Date in row two indicates the current record version.
Pros: An aggregate table summarizing facts by state continues to reflect the historical state, i.e. the state the Customer was in at the time of the transaction; no update is needed.
Cons: If there are retrospective changes made to the contents of the dimension, or if new attributes are added to the dimension (for example a Customer_Rep column) which have different effective dates from those already defined, then this can result in the existing transactions needing to be updated to reflect the new situation. This can be an expensive database operation, so Type 2 SCDs are not a good choice if the dimensional model is subject to change.

No comments:

Post a Comment