A Short Primer on Fact Tables - Data Warehousing

A comprehensive introduction to the concept of "Fact Tables" in data warehousing.

I remember when I had my first business intelligence course, we were introduced to different dimensional modelling concepts such as fact tables, dimensions, the Kimball approach, the Inmon approach, etc. It was a bit confusing and overwhelming at first because my background was mainly in backend development and relational transactional databases. But a few years later, I took the time to learn about data warehousing and dimensional modelling again using various resources.

Out of all the things I learned, understanding fact tables and dimension tables is crucial for grasping dimensional modelling. That's why I decided to write this short primer. It's a way for me to share and reinforce what I've learned, and I believe it can serve as a handy guide to help you understand and build fact tables the right way.

In this guide, we will learn what fact tables are, their properties, basic techniques to build them and some advanced tips, at the end we will take a look at two different use cases and try to model the corresponding fact tables.

1. What is a Fact Table?

In dimensional modelling, we have mainly two types of tables: fact tables and dimension tables. A fact table holds physical events called facts or measures, such as orders, shipments, etc. It’s a one-to-one mapping with an operational business event, that’s the most important takeaway from this guide.

Each row in the fact table represents an occurrence of the event, if we are building a fact table to hold orders, each row in the fact table will correspond to one and only one order.

A fact table is composed of measures, these measures are numeric information that composes our facts, to better understand facts and measures, let’s take a look at the following fact table from the classic book « The Data Warehouse Toolkit » by Kimball that represents a dimensional model of an order process in a retail business.

A fact table for orders in a store, source : The Data Warehouse Toolkit book (Kimball)

In this model, orders are the physical event resulting from the business operations, therefore it makes sense to model orders as facts. An order has a date, ID, total amount, etc. Unlike transactional databases where we would put all these attributes in the same table, a fact table contains only attributes that are essential to defining the physical activity we are representing, a good rule of thumb is that measurements in a fact table are always numeric, this attributes such as shipping date and order date are not included in the fact table, instead they are represented as dimensions, which will help us to organize our order analytics by date: Average orders per day, total orders per quarter, etc.

This thinking approach leaves us with few attributes to consider as measurements in our fact table, attributes such as net total amount, discounted total and ordered quantity are all numerical measurements that are essential in defining an order, they don’t make sense to build analytical queries against them (e.g Total orders per order quantity !?), thus they should be included in the fact table.

A fact table is related to other dimensions through one-to-one relationships: the primary key of each dimension table is included in the fact table as a foreign key ( a reference to the dimension table).

2. Fact Tables vs. Dimensions Tables

One issue that newcomers to dimensional modelling face frequently is the differentiation between a fact table and a dimension table, I remember that I was struggling to decide whether a given entity should be regarded as a dimension or as a fact.

A dimension describes the context, it usually answers one of the following questions: Who, What, Where and When regarding a business process event. Dates, for example, are always modelled as dimensions, as they describe the temporal aspect of an event (an order, a visit, a delivery, etc.). Other examples of dimensions could include customer, product, or location. These dimensions provide additional information about the facts. Dimensions don't have to be numerical and are stored in dimension tables and are linked to the fact table through keys (foreign keys) to establish relationships.

A fact on the other hand, corresponds solely to a physical activity or an event resulting from an operation, they are typically numerical and are often associated with Key Performance Indicators (KPIs), they are the core of what we are analyzing. Examples of facts: orders, deliveries, hirings, errors, returns, etc.

Hopefully, by now, you have a clearer understanding of what a fact is and what a dimension is, for the next few sections, we will look at some properties of fact tables and expand on different types of facts that can be modelled.

3. Additivity

Facts can be either additive, semi-additive or non-additive :

Additive facts contain measures that can be summed across various dimensions. Additive measures are particularly useful because they can be aggregated regardless of the dimension. For instance, consider the example of total sales in a retail store, where the dimensions are Date and Location. With an additive measure, we can easily calculate the total sales for a specific quarter by summing the sales across different dates. Similarly, we can determine the total sales in a particular location, such as Manchester, by summing the sales across different dates. When designing a data warehouse it's important to differentiate these additive facts as they are usually the most useful for conducting meaningful analytics.

Semi-additive facts are measures that can be summed across some dimensions but not other dimensions, while non-additive facts cannot be summed across any of the available dimensions. To better understand this point, let's take an example: suppose we have a fact table in a bank data warehouse, composed of the current balance and a ratio that represents the ratio of the current balance to the balance of last year, this fact table is related to two dimensions: Date and Account. The current balance is semi-additive, as it makes sense to sum the current balance across all accounts, but it's not logical, nor useful to sum the current balance across the date dimension, it provides no analytical value.

On the other hand, the return ratio is a non-additive fact. It cannot be summed across any dimension. Ratios, in general, tend to be non-additive. They represent relationships or proportions and don't lend themselves to simple summation. Another example of a non-additive fact is a qualitative measure like customer satisfaction, which cannot be meaningfully summed across any dimension.

Additivity table for a bank deposit

4. Types of Fact Tables

The concept of additivity is applied to facts and measures only and not to tables, in the case of the whole fact table, we have 4 main types :

  • Transaction Fact Tables: The most common type of fact tables are transaction fact tables, these tables represent facts of an event that occurs at a given point in time such as an order, a delivery, a deposit, etc. They are composed of additive facts and are usually expressive and the most useful from an analytical perspective.

  • Periodic Snapshot Fact Tables: These fact tables represent the state of an event at a specific and well-defined point in time, such as the performance of a manager over a given period, we would take a snapshot at the end of a week, month or any period. It is composed of semi-additive and non-additive facts and it's useful for reporting. It is worth noting that in a periodic snapshot table, a row is added in each period even if no measurement takes place.

  • Accumulating Snapshot Fact Tables: accumulating snapshot fact tables track the progress or status of a process as it moves through various stages or milestones. They are typically used to analyze and monitor processes that have a predefined workflow or sequence of steps. The accumulating snapshot fact table is updated periodically or at specific points in the process, capturing the relevant measures and dimensions at each stage. This allows analysts to analyze the duration, throughput, efficiency, and other performance metrics of the process across different periods.

  • Factless Fact Tables: Some fact tables might have no numerical measurement at all and they only capture a set of dimensions (through their foreign keys) at a given point in time, in this case, the table is called a factless fact table as it doesn't hold any measurement. A nice example given by Kimball is a fact table that records the presence of a student in a classroom, this fact table doesn't have any numerical measurement, it only holds the foreign keys of the student, class and date dimensions as presented in the following diagram :

5. Conclusion

So far, we have taken a look at different properties and types of facts and fact tables, understanding how to design fact tables correctly is crucial to design efficient data warehouses. I will try to write another short guide to dimensions and dimension tables in a future article. If you enjoyed reading this article, a share would be appreciated.