Dimensional Modeling is a widely accepted technique for representing data in data warehouse/business intelligence applications.
Moreover, it is one of the methods of data modeling, that will help to store the data in such a way that it is relatively easy to retrieve the data from the database.
Before getting deep into the dimensional modeling concepts, let’s try to understand
- What is meant by Data modeling?
- Why do we prefer to use dimensional modeling in data warehouses?
- How dimensional modeling differ from relational modeling?
Data modeling is considered as the first step in the database design. In this process, Data modelers work with various business stakeholders and create a conceptual/logical and physical data model. There are different modeling techniques which store data in different ways.One among them is relational modeling. It heavily relies on the principles of Normalization.
Note: If you are not familiar with Normalisation, you could refer the link here.They have provided pretty good explanations for various normal forms using simple examples.
These normalized tables perform pretty well on transactional systems(i.e. OLTP systems) because of the insert/update transactions which touch database in only one place.
However, when it comes to reporting, joining dozens of tables isn’t very efficient. It will give a terrible query performance. Fortunately, this problem is effectively addressed by the dimensional modeling technique.
Dimensional databases are used to slice data along certain dimensions. Here everything is divided into 2 distinct categories – dimension or facts. Anything we try to model must fit in one of these two categories.
Fact tables generally represent a business measurement.
For example consider a sales transaction: Some XYZ product of certain quantity has been sold for 100$. This particular transaction event can correspond to a single row in a fact table. Here the data on each row of a fact table is at a specific level of detail (also referred as Granularity).
- All the measurement rows in a fact table must be at the same grain
- A measurement event in the physical world has one-to-one relationship to a single row in the corresponding fact table
- Facts are numeric and additive in nature
- Fact table tends to be deep in terms of number of rows, but narrow in terms of number of columns
- All Fact tables have two or more foreign keys that connect to dimension tables’ primary keys
Dimension tables usually contain all the important information (who/ what/ where/ when/why/how) associated with a transaction event.
Examples of various dimensions are location, product, time, promotion, customer etc.
Each dimension is defined by a single primary key, which serves as the foreign key with any fact table to which it is joined.
Hope you understood the basics about fact and dimension tables.
How to bring these two building blocks to create a dimensional model?
We are going to discuss this in detail in the upcoming blog post.
References: The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling 2nd Edition by Ralph Kimball