Table of Contents
Star and SnowFlake Schema in Data Warehousing
Star Schema
The star schema consists of one or more fact tables referencing any number of dimension tables.
- It is called a star schema because the diagram resembles a star, with points radiating from a center.
- The center of the star consists of fact table and the points of the star are the dimension tables.
- Usually the fact tables in a star schema are in third normal form(3NF) whereas dimensional tables are de-normalized.
- The star schema separates business process data into facts, which hold the measurable, quantitative data about a business, and dimensions which are descriptive attributes related to fact data.
Examples of fact data include sales price, sale quantity, and time, distance, speed, and weight measurements. - Related dimension attribute examples include product models, product colors, product sizes, geographic locations, and salesperson names.
- A star schema that has many dimensions is sometimes called a centipede schema.
Benefits of Star Schema
Star schemas are denormalized, meaning the normal rules of normalization applied to transactional relational databases are relaxed during star schema design and implementation.
The benefits of star schema denormalization are:
- Simpler queries.
- Simplified business reporting logic.
- Query performance
- Fast aggregations
- Feeding cubes
Simpler queries
Star schema join logic is generally simpler than the join logic required to retrieve data from a highly normalized transactional schemas.
Simplified business reporting logic
When compared to highly normalized schemas, the star schema simplifies common business reporting logic, such as period-over-period and as-of reporting.
Query performance
Star schemas can provide performance enhancements for read-only reporting applications when compared to highly normalized schemas.
Fast aggregations
The simpler queries against a star schema can result in improved performance for aggregation operations.
Feeding cubes
Star schemas are used by all OLAP systems to build proprietary OLAP cubes efficiently; in fact, most major OLAP systems provide a ROLAP mode of operation which can use a star schema directly as a source without building a proprietary cube structure.
Disadvantages of Start Schema
- The main disadvantage of the star schema is that data integrity is not enforced as well as it is in a highly normalized database.
- One-off inserts and updates can result in data anomalies which normalized schemas are designed to avoid.
- Generally speaking, star schemas are loaded in a highly controlled fashion via batch processing or near-real time “trickle feeds”, to compensate for the lack of protection afforded by normalization.
Example
Consider a database of sales, perhaps from a store chain, classified by date, store and product.
Fact_Sales is the fact table and there are three dimension tables Dim_Date, Dim_Store and Dim_Product.
Snowflake Schema
- Is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape.
- The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions.
- “Snowflaking” is a method of normalising the dimension tables in a star schema.
- When it is completely normalised along all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle.
- The principle behind snow flaking is normalization of the dimension tables by removing low cardinality attributes and forming separate tables.
- The snowflake schema is similar to the star schema.
- However, in the snowflake schema, dimensions are normalized into multiple related tables, whereas the star schema’s dimensions are denormalized with each dimension represented by a single table.
- A complex snowflake shape emerges when the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and the child tables have multiple parent tables.
Benefits of Snowflake Schema
- The snowflake schema is in the same family as the star schema logical model. In fact, the star schema is considered a special case of the snowflake schema.
- The snowflake schema provides some advantages over the star schema in certain situations, including:
- Some OLAP multidimensional database modeling tools are optimized for snowflake schemas.
- Normalizing attributes results in storage savings, the tradeoff being additional complexity in source query joins.
Disadvantages of Snowflake Schema
- The primary disadvantage of the snowflake schema is that the additional levels of attribute normalization adds complexity to source query joins, when compared to the star schema.
- When compared to a highly normalized transactional schema, the snowflake schema’s denormalization removes the data integrity assurances provided by normalized schemas.
- Data loads into the snowflake schema must be highly controlled and managed to avoid update and insert anomalies.
Example of Snowflake Schema