Introduction
Dan Linstedt, the inventor of Data Vault, has written a lot about hash keys.
For instance, one of his latest blog posts:
#datavault 2.0, Hashes, one more time.
I will not list all other sources, as you can use Google yourself.
A few comments on hash keys:
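- You need them for scalability. With sequence numbers, every load of a link or satellite must first look up the number assigned in the hub. A hash key, by contrast, can be computed directly from the business key, so loads can run in parallel. Sequence numbers therefore risk a data warehouse that does not scale well as the amount of data grows.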
- They can collide: two different business keys can produce the same hash key. However, the chance that this happens is very small. For instance, when using SHA-1 (which produces a hash value of 160 bits), you have a 1 in 10^18 chance of a hash collision with 1.71 * 10^15 hash values (read: hub rows), according to this blog post.
- If collisions are unacceptable, you need a hash key collision strategy.
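To make the points above concrete, here is a minimal Python sketch, not a reference implementation. It assumes a common but not universal convention: business key parts are trimmed, upper-cased and joined with a "||" delimiter (a hypothetical choice) before hashing with SHA-1. Because the hash key depends only on the business key, any loader can compute it without a hub lookup. collision_probability applies the birthday approximation p = 1 - e^(-n^2 / 2^(b+1)), which reproduces the 1 in 10^18 figure for n = 1.71 * 10^15 and b = 160. HubLoader is a hypothetical in-memory hub illustrating one possible collision strategy: store the business key next to the hash key and fail loudly when the same hash key arrives with a different business key.

```python
import hashlib
import math


def normalize_parts(parts: tuple[str, ...]) -> tuple[str, ...]:
    # Assumed normalization rule: trim whitespace and upper-case each part.
    return tuple(part.strip().upper() for part in parts)


def hub_hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """SHA-1 hash key (40 hex characters = 160 bits) for a business key."""
    # The delimiter (hypothetical "||") keeps ("AB", "C") and ("A", "BC") apart.
    joined = delimiter.join(normalize_parts(business_key_parts))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def collision_probability(n: float, bits: int) -> float:
    """Birthday approximation: p = 1 - exp(-n^2 / 2^(bits + 1))."""
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-(n * n) / 2.0 ** (bits + 1))


class HubLoader:
    """Hypothetical in-memory hub that detects hash key collisions."""

    def __init__(self) -> None:
        # hash key -> normalized business key stored alongside it
        self.rows: dict[str, tuple[str, ...]] = {}

    def load(self, *business_key_parts: str) -> str:
        key = hub_hash_key(*business_key_parts)
        business_key = normalize_parts(business_key_parts)
        existing = self.rows.get(key)
        if existing is None:
            self.rows[key] = business_key  # new hub row
        elif existing != business_key:
            # Same hash key, different business key: a real collision.
            raise ValueError(f"hash key collision on {key}")
        # Existing row with the same business key: nothing to insert.
        return key


print(hub_hash_key(" cust-001 "))
print(collision_probability(1.71e15, 160))
```

For example, collision_probability(1.71e15, 160) returns roughly 1e-18. In a real data warehouse the business key comparison would typically be a join between staging data and the hub table rather than a Python dict.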
The full article is posted on DWA.Guide, so you can read further there.
Picture credits: © Can Stock Photo / alexskp