A fact table in a data warehouse contains business facts (or measures).
These measures serve as the core of dimensional model and can be summed/ mathematically manipulated.
The measures can be categorized as
- Non additive
- These are the most flexible and useful facts
- Measures can be summed across any of the dimensions associated with the fact table
Let us use below example to illustrate this types of facts. The example assumes that we have a retail sales fact table with the following columns:
The purpose of this table is to record the sales amount in dollars for each product in each store on a daily basis. Sales Dollars is a fully additive measure.We can sum up this measure along any of the dimensions present in the fact table. Or in other words, we can slice and dice the fact table by the dimension attributes and every sum is valid and correct.
- Semi-additive measures can be summed up across some dimensions, not all.
- Inventory levels and financial account balances are semi-additive because they are additive across all dimensions except time
Consider the below scenario:
Here the Current Balance measure is semi-additive because adding up all the current balances for a given account for each day of the month doesn’t give any useful information.
In order to get further clarity, let’s assume that on the first day of the week, you have $50 in your account. Next two days, the balance remains unchanged. On the fourth day, you deposit another $50 so the balance is now $100. The account has no further activity through the end of the week.This case we can’t merely add up the daily balances during the week and declare that the ending balance is $550 (based on $50 + $50 + $50 + $100 + $100+ $100 + $100). The most useful way to combine account balances is to find average daily balance on a monthly account summary. Because in this example table, Current Balance represents the balance amount at one point in time (or, snapshot).
- Non-Additive measures can’t be summed up across any of the dimensions in the fact table
- Unit prices, percentages, and ratios are non-additive.
For example, if a product is sold at 5% margin, and then another one is sold at 5% margin, it is a mistake to say that total margin for the two sales is 10%.
However, it is considered as a good practice that to never store percentages or ratios in fact tables but only calculate these in the BI tools. It only stores the numerator and denominator in the fact table, which then can be aggregated and then can then be used for calculating the ratio or percentage in BI tool.
Fact Table Types:
There are three fundamental types of fact tables. They are
- Transaction fact tables
- Periodic snapshot fact tables
- Accumulating snapshot fact tables
Transaction fact table :
- They are the most basic and fundamental fact table type
- They hold data of the most detailed level, by having a great number of dimensions associated with it
- This type of tables helps answer the “how many?” question. For example, how many cars were sold last week?
- The grain of transaction table is one row per transaction. For example, the grain of the policy transaction fact table should be one row for each individual policy transaction.
- The metrics resulting from transactional events are typically additive
Periodic snapshot fact table:
- This type of fact table describes the state of things in a particular instance of time
- It usually includes more semi-additive and non-additive facts
- It answers the “how much?” question and making sure that the right product is present is in the right store at the right time. This will, in turn, minimizes the possibility out-of-stocks.
- In periodic snapshot tables, the grain is the time period, not the individual transaction
Accumulating snapshot fact table:
- These are typically appropriate for predictable workflows with well-established milestones
- Accumulating snapshot fact tables have multiple date stamps which represent the most important milestones in the workflow.
For example, in an order processing system, set of dates which describe the typical process workflow are order date, actual ship date, delivery date, final payment date, and return date.Each time any of the above processes happens, the accumulating snapshot fact row is getting modified.Or in other words, the date foreign keys are overwritten each time.
The Data Warehouse Toolkit by Ralph Kimball & Margy Ross