Top 10 Design Mistakes seen in OLTP production systems (SQL Server Series)

Introduction


If you ever had to source your datawarehouse from OnLine Transaction Processing databases, I suppose this top 10 will be a “joy” of recognition for you.

Top 10 Design Mistakes seen in OLTP production systems

1. Don’t have referential integrity enforced

Without actual foreign key constraints in the database, you will get orphaned records sooner or later (e.g. orderline rows for already deleted orders). This is bad for dataquality and can hide bugs in the front end software maintaining the data.

Back to top

2. Foreign key columns having a totally different name than the primary key they refer to

Especially when point 1 is the case, it is difficult to tell that a foreign key column “account_id” refers to the “customer_id” column in the customer table. When reverse engineering a OLTP design this does not really help to understand it.

Back to top

3. Foreign key columns missing “Id”, “Key” or “Guid” in the name

I personally dislike a column name “customer” when it is an integer or global unique identifier datatype. However from the datatype you can draw the conclusion that it cannot be the name, there are many situations where you only see the column name, and then it can be confusing.

Back to top

4. Foreign key columns meaning more than one thing

More than once, in standard software which is highly adjustable or configurable, the datamodel is “flexible”, e.g. link tables where object_id can be a customer, an employee, or yet something else. Apart from the problem this gives in enforcing referential integrity without a physical constraint, it makes it more difficult to understand the model and/or get data out.

Back to top

5. Accessing the datamodel through low-performance all-in-one views, protecting base tables from direct access

They exist! Views with 700+ columns, probably joining 20+ tables, to give an all-in-one view for an account. And still, then you need an extra column from an already joined table, that is not in the list of 700+ columns. You can guess what this leads to, suboptimal queries with poor performance. While, if you were allowed to write a view directly on the tables, you could optimize it for your needs.

Back to top

6. Oversized column lengths

Ever seen a varchar(max) column for a country code with a maximum length of 3? I Have. While this is an extreme example, oversized columns lead to several problems:

  • SQL Server handles the column differently. Without going into too much detail, a char(3) column for the country code would be stored much more efficiently than a varchar(max) column.
  • You do not always know how much space must be reserved for the column on reports and so on.
  • If you use the value in concatenated columns (e.g. business keys), you need to size it after the physical maximum size, even when the actual data will not fill it up.


Oversizing can happen:

  • when front end C# developers use “code first”, and the column length is not defined in the code. Therefore I recommend, when “code first” is used, a database developer or DBA has a critical look at the generated physical datamodel before going live.
  • when the column size is not specified in design documents.
  • when people just don’t care to size the column properly. Oversizing seems like a safe thing to do, if the column is big enough, no one will complain!

Back to top

7. Oversized INT Datatypes

Or: To Smallint or not to Smallint. When reference tables only contain a few rows, a smallint or even tinyint datatype for the primary key will do! If the primary key is used in foreign key constraints of large tables (with many rows) less storage is needed, indexes will be smaller, all leading to better performance.

Unfortunately, SQL Server does not support unsigned smallint, int and bigint datatypes, but nobody stops you from start counting at the minimum negative value, so -32768 instead of 1 for a smallint! Just change the seed part of the IDENTITY specification. If you do this, you can store up to 65536 rows in a table with a smallint primary key.

MSSQL-S01E01-Top 10 Design Mista…

–\
—) Author: Hans Michiels
—) Script to demonstrate how to start an
—) identity column with a negative number
–/

IF OBJECT_ID(‘[dbo].[A65kRowsTable]’, ‘U’) IS NOT NULL
DROP TABLE [dbo].[A65kRowsTable]
GO
CREATE TABLE [dbo].[A65kRowsTable](
[ID] SMALLINT IDENTITY(32768, 1) NOT NULL,
[SomeColumn] VARCHAR(50) NOT NULL
CONSTRAINT [PK_dbo_A65kRowsTable_ID] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO [dbo].[A65kRowsTable] ( [SomeColumn] )
VALUES (‘-32768 is the first ID.’)
GO
SELECT [ID], [SomeColumn]
FROM [dbo].[A65kRowsTable]
GO


If you do this trick with an INT, you can store 4.2 billion rows and you might in some cases be able to prevent that you need a bigint.

Back to top

8. Lack of unique indexes for business or natural keys

Again a dataquality issue. If the primary key is surrogate (an integer type which is autonumbered), a table still can have one or more different columns that make a row unique. For example, a social security number in a citizens table. Not enforcing this constraint via an unique index, can lead to two citizens with the same social security number! This can hide bugs in the front end software, fraude, or both. Anyway, it is not wise to take the risk that this happens.

Back to top

 9. Guids stored as (n)varchar

A guid, short for Global Unique IDentifier, is a generated value of 16 bytes, guaranteed to be globally unique. It is sometimes used for primary key columns, instead of integer datatypes.
I will keep it in the middle if this is wise to do, there might be some use cases when this is a valid approach, but I would only do this if using integers is not an option.

SQL Server has the UNIQUEIDENTIFIER datatype to store guids. Internally, this datatype uses 16 bytes, but the value is represented as a hexadecimal string of 36 characters (32 for the 16 bytes, with 4 hyphens in between, sometimes surrounded by curly brackets, e.g. {935D9FA8-2C77-4C34-894C-8FCDA8E47F19}.
But if the guid is stored in a varchar(36) column, it will use 36 bytes for the value plus 2 bytes overhead, so 38 bytes in total. Compared with the 16 bytes for the UNIQUEIDENTIFIER, you can imagine that storage space and index size will increase and performance will decrease.
Not to mention Nvarchar(36), the Unicode variant of varchar, which will use 2 bytes for every character plus 2 bytes overhead, so 74 bytes in total. This is massive! I have seen nvarchar(36) in a production system to store a guid and would classify it as a “My first database” mistake. Only very unexperienced developers, hardly knowing anything about datatypes would do this. Don’t be one of them!

MSSQL-S01E01-mistakes-homer-doh

Back to top

10. Where’s the DBA?

This point is more about daily operations than about the design, but I still want to mention it. If you are using SQL Server databases you need to maintain them. A simple maintenance plan is easy to set up, and will do index and statistics maintenance for you, make backups and could prevent that the transaction log file grows excessively. If you don’t have an internal DBA, hire one to set it up and monitor your SQL Server instances.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Hans Michiels

Hans Michiels

Hans is an Independent Business Intelligence and Datawarehouse Consultant & Microsoft SQL Server Consultant, working in the Netherlands. He has been working in the software industry since 1996, with SQL Server since the year 2001, and since 2008 he has a primary focus on datawarehouse- and business intelligence projects using Microsoft technology, using a Datavault and Kimball architecture. He has a special interest in Datawarehouse Automation and Metadata driven solutions. * Certified in Data Vault Modeling: Certified Data Vault 2.0 Practitioner, CDVDM (aka Data Vault 1.0) * Certified in MS SQL Server: * MCSA (Microsoft Certified Solutions Associate) SQL Server 2012 - MCITP Business Intelligence Developer 2005/2008 - MCITP Database Developer 2005/2008 - MCITP Database Administrator 2005/2008

More Posts

Leave a Reply

Your email address will not be published. Required fields are marked *