Table of Contents
Data Models
A data model (or datamodel) is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world entities. For instance, a data model may specify that the data element representing a car be composed of a number of other elements which, in turn, represent the color and size of the car and define its owner.
- E-R Model
- Multidimensional Model
E-R Model
- It is the representation of How various entities are related to each other.
- Relation Types
- 1-1 Relation
- 1-M Relation
- M-1 Relation
- M-M Relation
- Relation Types
- 1-1
One Instance of Entity A related to only One instance of Entity B - 1-M
One instance of Entity A related to Many instances of Entity B - M-1
Many Instances of Entity A related to Only one instance of Entity B - M-M
Many Instances of Entity A related to Many Instances of Entity B or vice versa
- 1-1
E-R Model Example
Dimensional Data Modeling
Dimensional Data Modeling comprises of one or more dimension tables and fact tables. Good examples of dimensions are location, product, time, promotion, organization etc. Dimension tables store records related to that particular dimension and no facts (measures) are stored in these tables.
Example – Star Schema
Dimensions
- Dimensions are the subject areas against which the analysis of facts (numerical measures) is carried out
- Determine these by the ways you want to slice and dice the data
- Small number of rows compared to facts
- Usually 5-10 dimensions surrounding a fact table
- Time is almost always a dimension used by every fact
- Track history
- Uses Surrogate Keys
- Hierarchies are usually built into them if possible
Tracking History in Dimensions
- Three types of history updates
- Type 1 – Overwrite the change
- Type 2 – Save the original version and create a new record with the updated information
- Type 3 – Move the current data to another field and replace the original with the new data.
Dimensional Update Flow
Dimensions: Type 1 Changes
- Simply update the field with the new data
- History data will be lost
Dimensions: Type 2 Changes
- Create a new record where any of the type 2 data has changed.
- Use surrogate keys to allow for multiple real keys
Dimensions: Type 3 Changes
- Update the current production row moving the original field to a newly created column (field), and putting the new data in the original
- Create a new column (field) to store the data when the dimension was changed for tracking purpose if required
Facts
- Facts are the numerical measures which the business house wants to monitor for analysis purpose
- After the dimensions have been updated, then the facts can be generated
- Use only surrogate keys when generating facts
- A fact record is the intersection of all the dimensions
- Fact records are created using SQL that groups by each of the dimensions and stores the measures
- You can have a fact record with no measures. This would be used for something like attendance. (Factless Fact table)
- Fact Table
The centralized table in a star schema is called as FACT table. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. - In the example “Sales Dollar” is a fact (measure) and it can be added across several dimensions. Fact tables store different types of measures like additive, non additive and semi additive measures.
- Measure Types
- Additive – Measures that can be added across all dimensions.
- Non Additive – Measures that cannot be added across all dimensions.
- Semi Additive – Measures that can be added across few dimensions and not with others.
- Steps in designing Fact Table
- Identify a business process for analysis (like sales).
- Identify measures or facts (sales dollar).
- Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
- List the columns that describe each dimension (region name, branch name, region name).
- Determine the lowest level of summary in a fact table (sales dollar).
Fact Table Update Logic Flow
Snow Flake Schema
- One Dimension is stored in more then one related table
- Product Dimension is normalized
- Snow flaking affects the Query Performance and should avoided unless absolutely required for frequent analytical purpose
Dimensional Modeling Benefits
- Ease of Use
- Best suited for large scale data operations (Storing and Retrieving)
- Performance – Because of the use of integer surrogate keys and the traversal of smaller dimensions before approaching the larger fact tables
- Many Tools require Dimensional Models for BI Reporting
- Industry Standard
Dimensional Modeling Challenges
- Complexity of the Loads, especially Type 2 changes and surrogate keys
- Need for Staging Data and how they co-exist
- Databases not always geared to optimize star schemas
- Added complexity when relationships are many to many