Showing posts with label Dimension Modelling concepts. Show all posts
Showing posts with label Dimension Modelling concepts. Show all posts

Friday, March 15, 2013

SCD Types .... Part III

Here is the final post for the last  SCD type ...


Type 6
The Type 6 method combines the approaches of types 1, 2 and 3 (1 + 2 + 3 = 6).
The Customer table starts out with one record for our example Customer:
Customer_Key
Customer_Code
Customer_Name
Current_State
Historical_State
Start_Date
End_Date
Current_Flag
1
ABC
Oracle
NJ
CA
01-Jan-2005
31-Dec-9999
Y
The Current_State and the Historical_State are the same. The Current_Flag attribute indicates that this is the current or most recent record for this Customer.
When Oracle company moves to New Jersey, we add a new record, as in Type 2 processing:
Customer_Key
Customer_Code
Customer_Name
Current_State
Historical_State
Start_Date
End_Date
Current_Flag
1
ABC
Oracle
NJ
CA
01-Jan-2005
21-Dec-2009
N
2
ABC
Oracle
NJ
NJ
22-Dec-2009
31-Dec-9999
Y
We overwrite the Current_State information in the first record (Customer_Key = 1) with the new information, as in Type 1 processing. We create a new record to track the changes, as in Type 2 processing. And we store the history in a second State column (Historical_State), which incorporates Type 3 processing.
For example if the Customer were to relocate again, we would add another record to the Customer dimension, and we would overwrite the contents of the Current_State column:
Customer_Key
Customer_Code
Customer_Name
Current_State
Historical_State
Start_Date
End_Date
Current_Flag
1
ABC
Oracle
NY
CA
01-Jan-2005
21-Dec-2009
N
2
ABC
Oracle
NY
NJ
22-Dec-2009
03-Feb-2013
N
3
ABC
Oracle
NY
NY
04-Feb-2013
31-Dec-9999
Y
Note that, for the current record (Current_Flag = 'Y'), the Current_State and the Historical_State are always the same

SCD Part I     SCD Part 2

SCD Types.. Part II


Type III
This method tracks changes using separate columns and preserves limited history. The Type II preserves unlimited history as it's limited to the number of columns designated for storing historical data. The original table structure in Type I and Type II is the same but Type III adds additional columns. In the following example, an additional column has been added to the to record the Customer's original state - only the previous history is stored.
Customer_Key
Customer_Code
Customer_Name
Original_Customer_State
Effective_Date
Current_Customer_State
1
ABC
Oracle
CA
22-Dec-2009
NJ

Cons: This record contains a column for the original state and current state—cannot track the changes if the Customer relocates a second time. One variation of this is to create the field Previous_Customer_State instead of Original_Customer_State which would track only the most recent historical change.
Type IV
The Type 4 method is usually referred to as using "history tables", where one table keeps the current data, and an additional table is used to keep a record of some or all changes.
For the above example the original table name is Customer and the history table is Customer_History.
Customer
Customer_key
Customer_Code
Customer_Name
Customer_State
1
ABC
Oracle
NJ

Customer_History
Customer_key
Customer_Code
Customer_Name
Customer_State
Create_Date
1
ABC
Oracle
CA
22-Dec-2009


SCD Types... Part 1


Type 0
The Type 0 method is passive. It manages dimensional changes and no action is performed. Values remain as they were at the time the dimension record was first inserted
The most common types are I, II, and III.
Type I
This methodology overwrites old with new data, and therefore does not track historical data. Its common uses are for misspelled names
Example of a Customer table:
Customer_Key
Customer_Code
Customer_Name
Customer_State
1
ABC
Oracle
CA
In the above example, Customer_Code is the natural key and Customer_Key is a surrogate key.

If the Customer relocates the headquarters to New Jersey. The updated table would overwrite this record:
Customer_Key
Customer_Code
Customer_Name
Customer_State
1
ABC
Oracle
NJ
Cons:
·         No history in the data warehouse.
·         If you have calculated an aggregate table summarizing facts by state, it will need to be recalculated when the Customer_State is changed.
Pros: It's easy to maintain.

Type II
This method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. Unlimited history is preserved for each insert.
For example, if the Customer relocates to New Jersey the version numbers will be incremented sequentially:
Customer_Key
Customer_Code
Customer_Name
Customer_State
Version.
1
ABC
Oracle
CA
0
2
ABC
Oracle
NJ
1
Another method is to add 'effective date' columns.
Customer_Key
Customer_Code
Customer_Name
Customer_State
Start_Date
End_Date
1
ABC
Oracle
CA
01-Jan-2005
21-Dec-2009
2
ABC
Oracle
NJ
22-Dec-2009
The null End_Date in row two indicates the current record version.
Pros: An aggregate table summarizing facts by state continues to reflect the historical state, i.e. the state the Customer was in at the time of the transaction; no update is needed.
Cons: If there are retrospective changes made to the contents of the dimension, or if new attributes are added to the dimension (for example a Customer_Rep column) which have different effective dates from those already defined, then this can result in the existing transactions needing to be updated to reflect the new situation. This can be an expensive database operation, so Type 2 SCDs are not a good choice if the dimensional model is subject to change.