Oracle Business Intelligence: Dimension Modelling concepts

SCD Types .... Part III

Here is the final post for the last SCD type ...

Type 6

The Type 6 method combines the approaches of types 1, 2 and 3 (1 + 2 + 3 = 6).

The Customer table starts out with one record for our example Customer:

Customer_Key	Customer_Code	Customer_Name	Current_State	Historical_State	Start_Date	End_Date	Current_Flag
1	ABC	Oracle	NJ	CA	01-Jan-2005	31-Dec-9999	Y

The Current_State and the Historical_State are the same. The Current_Flag attribute indicates that this is the current or most recent record for this Customer.

When Oracle company moves to New Jersey, we add a new record, as in Type 2 processing:

Customer_Key	Customer_Code	Customer_Name	Current_State	Historical_State	Start_Date	End_Date	Current_Flag
1	ABC	Oracle	NJ	CA	01-Jan-2005	21-Dec-2009	N
2	ABC	Oracle	NJ	NJ	22-Dec-2009	31-Dec-9999	Y

We overwrite the Current_State information in the first record (Customer_Key = 1) with the new information, as in Type 1 processing. We create a new record to track the changes, as in Type 2 processing. And we store the history in a second State column (Historical_State), which incorporates Type 3 processing.

For example if the Customer were to relocate again, we would add another record to the Customer dimension, and we would overwrite the contents of the Current_State column:

Customer_Key	Customer_Code	Customer_Name	Current_State	Historical_State	Start_Date	End_Date	Current_Flag
1	ABC	Oracle	NY	CA	01-Jan-2005	21-Dec-2009	N
2	ABC	Oracle	NY	NJ	22-Dec-2009	03-Feb-2013	N
3	ABC	Oracle	NY	NY	04-Feb-2013	31-Dec-9999	Y

Note that, for the current record (Current_Flag = 'Y'), the Current_State and the Historical_State are always the same

SCD Part I SCD Part 2

SCD Types.. Part II

Type III

This method tracks changes using separate columns and preserves limited history. The Type II preserves unlimited history as it's limited to the number of columns designated for storing historical data. The original table structure in Type I and Type II is the same but Type III adds additional columns. In the following example, an additional column has been added to the to record the Customer's original state - only the previous history is stored.

Customer_Key	Customer_Code	Customer_Name	Original_Customer_State	Effective_Date	Current_Customer_State
1	ABC	Oracle	CA	22-Dec-2009	NJ

Cons: This record contains a column for the original state and current state—cannot track the changes if the Customer relocates a second time. One variation of this is to create the field Previous_Customer_State instead of Original_Customer_State which would track only the most recent historical change.

Type IV

The Type 4 method is usually referred to as using "history tables", where one table keeps the current data, and an additional table is used to keep a record of some or all changes.

For the above example the original table name is Customer and the history table is Customer_History.

Customer
Customer_key	Customer_Code	Customer_Name	Customer_State
1	ABC	Oracle	NJ

Customer_History
Customer_key	Customer_Code	Customer_Name	Customer_State	Create_Date
1	ABC	Oracle	CA	22-Dec-2009

Part III here ...

Part I here...

SCD Types... Part 1

Type 0

The Type 0 method is passive. It manages dimensional changes and no action is performed. Values remain as they were at the time the dimension record was first inserted

The most common types are I, II, and III.

Type I

This methodology overwrites old with new data, and therefore does not track historical data. Its common uses are for misspelled names

Example of a Customer table:

Customer_Key	Customer_Code	Customer_Name	Customer_State
1	ABC	Oracle	CA

In the above example, Customer_Code is the natural key and Customer_Key is a surrogate key.

If the Customer relocates the headquarters to New Jersey. The updated table would overwrite this record:

Customer_Key	Customer_Code	Customer_Name	Customer_State
1	ABC	Oracle	NJ

Cons:

· No history in the data warehouse.

· If you have calculated an aggregate table summarizing facts by state, it will need to be recalculated when the Customer_State is changed.

Pros: It's easy to maintain.

Type II

This method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. Unlimited history is preserved for each insert.

For example, if the Customer relocates to New Jersey the version numbers will be incremented sequentially:

Customer_Key	Customer_Code	Customer_Name	Customer_State	Version.
1	ABC	Oracle	CA	0
2	ABC	Oracle	NJ	1

Another method is to add 'effective date' columns.

Customer_Key	Customer_Code	Customer_Name	Customer_State	Start_Date	End_Date
1	ABC	Oracle	CA	01-Jan-2005	21-Dec-2009
2	ABC	Oracle	NJ	22-Dec-2009

The null End_Date in row two indicates the current record version.

Pros: An aggregate table summarizing facts by state continues to reflect the historical state, i.e. the state the Customer was in at the time of the transaction; no update is needed.

Cons: If there are retrospective changes made to the contents of the dimension, or if new attributes are added to the dimension (for example a Customer_Rep column) which have different effective dates from those already defined, then this can result in the existing transactions needing to be updated to reflect the new situation. This can be an expensive database operation, so Type 2 SCDs are not a good choice if the dimensional model is subject to change.

Part 2 here .....

Part 3 here.....

Friday, March 15, 2013

SCD Types .... Part III

SCD Types.. Part II

SCD Types... Part 1