In the previous blog post, we learned about fact and dimension tables. Today we are going to see that how these facts and dimension tables can be combined to build a dimensional model data warehouse. Let’s understand the terminology Schema first.
A Schema is a logical representation of the entire database.
Similar to the database, data warehouses also have to maintain a particular schema.
While designing a data warehouse, there are a variety of ways in which we can arrange the schema objects.
Two most popular schema types among them are Star and Snowflake schema.
This is the simplest data warehouse schema. A star schema is represented by one large fact table and many dimension tables. The schema diagram looks like a star with a central fact table from which points radiating to the surrounding dimension tables.
The fact table contains the numeric measurements such as sales_amount, cost, and #quantity_sold and keys to the dimension tables.
Each dimension table is joined to the fact table using their primary key, but the dimension tables are not joined to each other.
An example for star schema representation is given below:
- In the above representation, Sales fact table contains measures such as sales_dollars and sales_units. It also contains keys such as date_key, customer_key, product_key and store_key.
- Each dimension table contains detailed information about particular attribute (ie. customer/product/store info), which is present in the fact table entry
- Star Schema is useful in designing both data marts and large data warehouses
- Due to fewer joins, this Schema gives better performance
- Each and every dimension is treated as equivalent in star schema
- We can add completely new dimensions to the Schema as long as a single value of that dimension is defined for each existing fact row
- Dimension tables are highly denormalized in star schema(i.e. has redundant data )
Snowflake schema can be considered as a variant of the star schema. However, this is a more complex data model compared to the star schema.
In a snowflake schema, there is single, large and central fact table and one or more tables for each dimension. In order to eliminate redundancy, dimension tables split data into different tables. Due to this normalization, often it results in more complex queries and reduced query performance.
Example for a snowflake schema representation is given below:
In the above representation, the Product dimension table split data into multiple dimension tables such as Branch dimension and Package type dimension.Even the branch dimension split its data into category dimension table in order to minimize redundancy. Or in other words, to implement normalization, the secondary tables are connected to base dimension table by an attribute key.This result in a multilevel structure called snowflake.
Even though it represents data accurately, it is difficult for business users to understand and navigate snowflake schema. Also, this multilevel structure negatively impacts query performance.