Data Vault and Dimensional Modelling, a happy marriage! (Data Vault Series)

We need ambassadors!

I was at this one day course recently and spoke to a professional working in the business intelligence domain.
He said that he had just followed a two week course for datawarehouse architects.
I asked him, what did they tell about Data Vault?
His answer was shocking: “Not so much ..”

If recently trained datawarehouse architects do not get trained to use Data Vault in a datawarehouse architecture, there definitely is a lot of missionary work to do.
Because the Data Vault can add so much value to your datawarehouse solution. 
And more important, it is not a choice, a Kimball dimensional model OR a Linstedt Data Vault. Use both and
combine each strengths!

Intended audience

This is a strong call for everybody who reads this to share this post with at least one person you know working in the business intelligence domain using star schemas (a.k.a. dimensional modelling) directly fed from source systems.

What are the strengths of a Dimensional Model (a.k.a. Star Schema)?

  • Because the dimensional model is already around for decades, a lot of front end tools have an excellent support for it.
  • A dimensional model is designed for reporting, slicing and dicing, for getting data out. That is what it is good at.

    This becomes clear by the following examples:

    • Query performance is usually better than of other models, like Data Vault or Relational Datawarehouse, especially for aggregations.
    • A star model is an excellent source for Cube databases, like SSAS, especially when using a multidimensional model.
    • Date selections and comparison of periods (e.g. Q1 this year versus Q1 Last Year) is easier than in Data Vault or Relational models.
  • It enables self service BI. Give business (power) users (direct) access to a dimensional model, and there is a good chance that they can get  data out in the format they want.

What are the strengths of a Data Vault?

  • A Data Vault is meant to capture all the data all the time. Therefore it stores all data from the source systems, captures the changes in it, and keeps a full history of all changes (with a sidemark, that multiple changes between two load cycles, will be seen as one).
  • A Data Vault captures all data exactly as it is stored in the source systems. “The good, the bad and the ugly”with no business rules applied to it (except, in some cases hard business rules like data types). This makes all data auditable.
  • Data Vault 2.0 supports cross platform integration with other systems (e.g. Hadoop) and decentralized datawarehouses, for instance in different global regions.
  • With Data Vault 2.0 working agile is easy, in sprints, building the system in increments.
  • Data Vault has a very good support for Data Integration. Satellites, hanging off the Hub, containing data from different source systems.

So? What are the strengths of a datawarehouse architecture using both?

  • You can change your mind! If your business rules change (and it happens!), just reload your dimensional model from the Data Vault with the new rules applied.
  • You have one version of the facts . If you have multiple dimensional models, you have not.
  • Data history is not like a destroyed monument. With a Data Vault, you capture all data from the source systems, and all changes applied to it. Source systems usually reflect only the current state of the data, so without a change capturing central storage like Data Vault, your history is gone forever!
  • Your dimensional model does not have to serve two purposes: data storage and presentation of information. These two tasks can be in conflict, for instance there is currently no report need for all data that you could grap from source systems, so you leave it out of your star schema. But if the question comes tomorrow, you have no data, no history!
  • You have full auditibility. Because the (raw) Data Vault stores the data from source systems unmodified, every measure, every dimensional attribute, can be traced back to the source it came from and when. This makes it able to refute claims that the data is not correct. Maybe now it will become clear that the old datawarehouse- or reporting system has lied for years!

How can you use both?

The answer is simple: by using a multi-tier architecture.
Data from source systems is first extracted to a Staging area, before it is moved into the Enterprise Data Warehouse using a Data Vault Model (with or without some optional components).
From there it will be distributed to dimensional models (star schemas) and cubes, and whilst the data is on its way, business rules can be applied.
DV-S01E02-multi-tier-architecture
Multi-tier architecture using both Data Vault and Dimensional Modelling techniques.

Conclusion / Wrap up

I have written this post to create more awareness about using both Data Vault and Dimensional Modelling or Star Schemas in a data warehouse architecture.
I have listed strengths and benefits of Data Vault, Dimensional Modelling and of an architecture using both.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Hans Michiels

Hans Michiels

Hans is an Independent Business Intelligence and Datawarehouse Consultant & Microsoft SQL Server Consultant, working in the Netherlands. He has been working in the software industry since 1996, with SQL Server since the year 2001, and since 2008 he has a primary focus on datawarehouse- and business intelligence projects using Microsoft technology, using a Datavault and Kimball architecture. He has a special interest in Datawarehouse Automation and Metadata driven solutions. * Certified in Data Vault Modeling: Certified Data Vault 2.0 Practitioner, CDVDM (aka Data Vault 1.0) * Certified in MS SQL Server: * MCSA (Microsoft Certified Solutions Associate) SQL Server 2012 - MCITP Business Intelligence Developer 2005/2008 - MCITP Database Developer 2005/2008 - MCITP Database Administrator 2005/2008

More Posts

13 thoughts on “Data Vault and Dimensional Modelling, a happy marriage! (Data Vault Series)”

    1. Hi Lohic, thank you for feedback. Good to hear that this topic is recognized by a lot of people (also on LinkedIn).

  1. Hi Hans,
    we started to use data vault last year. One of the biggest problems is the performance when we populate the multidimensional models from the Enterprise Data Warehouse. We really need to work out some views to be able to extract the data for the multidimensional entities, especially of course the fact tables. Another drawback is, the DW model is not easy to understand. What is your experience implementing ETL interfaces between Data Warehouse and data marts?
    Kind regards,
    Paul Hernandez

    1. Hi Paul,
      If you do not use them already, you definitely need Point-in-time (or PIT) tables and maybe Bridge tables.
      Joins between hubs and satellites using BETWEEN or >= and < can become slow on big tables and many satellites. PIT tables can increase performance because INNER JOINs with equal conditions are used. Is your Data Vault a 2.0 model or do you use sequence numbers? If you use SQL Server Enterprise Edition you might be able to gain performance using table partitioning. If you are interested I can give more info by email. About the complexity of the model, if possible use metadata and generate views and/or ETL code. It is inherent to a Data Vault model that it has more tables than a 3NF database or a dimensional model. This makes the Data Vault extremely flexible, but also a bit complex sometimes. Best regards, Hans

      1. Hi Hans,
        Many thanks for your useful answer. A developer from our DW team told me that he plans to implement PIT and bridge tables too, he just need some time to do it 😉
        I will contact you through Linkedin to stay connected .
        Cheers, Paul

  2. Good article, data warehousing is a collection of methods, techniques and tools which is used to support knowledge workers such as senior managers, directors, managers, and business analysts to conduct data analyses that help with performing decision-making processes and improving information resources. It is secured place for the historical data where data is never deleted and data updates are normally carried out when data warehouses are offline.
    http://www.sql-datatools.com/2015/09/data-warehouse-architecture-and-multidimensional-model.html

    1. Hello Klaus,
      That’s good to hear! I am really interested in this, when I have time I will download the trial.
      Best regards, Hans

  3. Hi Hans,
    Teradata enterprise DW also uses a three tier architecture which implements NCR – FSLDM model which is a 3NF relational model which acts as a data vault. Here data from all sources are brought in staged and loaded to core database where versioning of data records is implemented and history is maintained. Data is than snapshotted to the different datamarts (star schemas) as needed for reporting.

    1. Hi Mohandorai,

      As I understand after doing some research on the web FSLDM means Financial Services Logical Data Model and it was founded in 1999.
      Also, as far as I understand, the focus is on the logical model for financial services (more specific, banks) and the historization of data, the similarity with Data Vault, seems a (technical) by-product. But I can be wrong. Also I read nothing about methodology, which is part of Data Vault 2.0 as well.
      So it has some similarity, but more differences, which doesn’t mean it cannot work.
      Thank you for sharing this.

      Best regards,
      Hans

Leave a Reply to Lohic Beneyzet Cancel reply

Your email address will not be published. Required fields are marked *