Category Archives: SQL Server Series

Contains blog posts related to SQL Server that cannot be placed in other categories like “SSIS Series” or “T-SQL Scripting Series”.

Using alternate (non-time) triggers to start a SQL Agent job (SQL Server Series)

Updates

After the first publication of this post, I have made a number of modifications to the tool:

Change log

Date | Changes
25 September, 2016 | Major update:
(1) extended to use an unlimited number of jobtrigger conditions;
(2) added SQLAgentJobResult trigger conditions;
(3) configuration via config file no longer supported.
28 January, 2017 | Major update:
(1) added “Running” as a state that can be checked for a job (in the previous version the values were “Succeeded”, “Failed” or “Completed”);
(2) added the jobcondition property “reverse_condition”, which enables negative triggers, e.g. a file may not be found, a job may not be running;
(3) a check is done that the jobs used for jobtriggers and jobtrigger conditions actually exist; if not, an error is thrown. This prevents a silent failure when a job is renamed or removed after a jobtrigger has been made.

Introduction

Today some SQL Agent stuff. I thought the SQL Agent worked just great, until I got the question: can you start this job when a file is dropped in folder X, but only if another process has already done something in the database? Huh?

Problem

With the SQL Agent I can only schedule a job using time-based triggers, while I need different triggers, like the
existence of one or more files, a certain database state and/or other SQL Agent job(s) having succeeded, failed or just completed.

Solution

So I needed something that checks if a file exists in a folder and if something was done in the database, and starts a SQL Agent job if both are true.
I decided to create a C# console application named SQLAgentJobStarter that could do this job for me. But I wanted it to be a bit more flexible, using one or more trigger conditions of different types, like a file trigger or a sql trigger, or both (as in the requirement above).
Later on I extended this with jobtrigger conditions and changed the definitions a bit. A trigger can now have one or more conditions of different types:

  • file jobtrigger condition
  • sql jobtrigger condition
  • SQLAgentJobResult jobtrigger condition

The different condition types can be combined in one jobtrigger, e.g. SQLAgentJob “Job_A” must have succeeded and file test.txt must be in folder C:\temp.

In this article I want to concentrate on instructing you how to use this tool, rather than going into depth from a coding point of view.
I have made this tool an open source project on GitHub, so you can still look at the code (SQLAgentJobStarter_VisualStudio2015_SourceCode.zip) if you want, but that’s up to you.

DISCLAIMER: please use this tool at your own risk. It is in a beta state; you should not use it in production environments until you have convinced yourself that it does what it should do.

Installation and demonstration of the tool

One important comment first. This tool is not a Windows Service. It is “just” a Console Application that needs to be scheduled using a SQL Agent job or Windows Task Scheduler.
Having said that, what are we going to do?

  • We are going to create two tables in the msdb database: [dbo].[sysjobtriggers] and [dbo].[sysjobtrigger_conditions]. All other SQL Agent job data is also in this database, so this seems a fair place. Still, I want to emphasize the importance of backing up this database on a regular basis, preferably daily. But if you have a DBA and a maintenance plan set up, this might be needless to say.
  • We are going to create a few stored procedures in the msdb database to maintain those tables.
  • We are going to install the executable HansMichiels.com.Tools.SQLAgentJobStarter.exe and corresponding config file in the C:\Program Files\SQLAgentJobStarter\MSSQL13.MSSQLSERVER folder. MSSQL13.MSSQLSERVER represents the SQL Version number and Instance name, in the example a default instance of SQL Server 2016. Change it to the name of your SQL Server version and instance. In this way you can install the tool side by side for different SQL Instances on the same machine.
  • Before you start using the tool, check the SQLConnectionString in the HansMichiels.com.Tools.SQLAgentJobStarter.exe.config file. By default, this is configured as follows: key="SQLConnectionString" value="Data Source=(local);Initial Catalog=msdb;Integrated Security=SSPI;". This is okay for a local default SQL Server instance, but not usable when you are using named instances.
  • We are going to schedule the tool using a SQL Agent job.
  • Now you are ready to rock! All that needs to be done is adding jobtriggers and jobtrigger_conditions to the msdb database. But to help you with this, I will show you a demo now.

Installation – technical steps

  • Download all the needed files here.
  • Unzip the files in SQLAgentJobStarter_ProgramFiles.zip and copy them to a folder on your server; I used C:\Program Files\SQLAgentJobStarter\MSSQL13.MSSQLSERVER.
  • Review the script msdb_install.sql and if needed adjust the folder name here: @command=N'"C:\Program Files\SQLAgentJobStarter\MSSQL13.MSSQLSERVER\HansMichiels.com.Tools.SQLAgentJobStarter.exe"',
  • After reviewing, execute the script msdb_install.sql.
  • Review the SQL Agent job ##SQLAgentJobStarter##, especially the schedule. For the demo the polling interval of the schedule is 1 minute, but I advise setting it to 3 or 5 minutes for production environments, because that interval is usually good enough.

    Also enable the schedule; it is installed in a disabled state.

So far, so good. The tool is installed.

Installation – preparation for the demo

If you want to join the demo, also do the following:

  • execute script demo_setup_sqlagentjobs.sql: this script will set up dummy SQL Agent jobs that are used for the demo.
  • execute script demo_setup_storedprocedures.sql: this script will create a database [MSSQL_S01E04] with some stored procedures that are used for sql triggers.
  • create a folder C:\Temp, if it does not exist yet, and create a file with a .txt extension in it.
  • execute script demo_setup_sysjobtriggers.sql: this script will add demo rows to the tables [dbo].[sysjobtriggers] and [dbo].[sysjobtrigger_conditions]
  • enable all schedules. To do this you can right-click on the Jobs node of SQL Server Agent and click on Manage Schedules, then select all the schedules for the demo jobs (their names start with “Schedule for MSSQL_S01E04”) and the schedule for ##SQLAgentJobStarter##, if not enabled yet.


The Manage Schedules menu-item ..


Check the Enabled checkbox for the applicable schedules ..

The jobs that are created so far

The tables in msdb that hold the trigger information ..

The stored procedures in msdb used to maintain the jobtriggers ..

The [MSSQL_S01E04] database with stored procedures, used for the demo.

The C:\Temp folder, that should contain one or more .txt files ..

Viewing the demo results

Now just wait for about 10 minutes.
In the meantime I can tell you the expected test results:

  • MSSQL_S01E04_Job01_SqlConditionTrue, MSSQL_S01E04_Job02_FileConditionTrue and MSSQL_S01E04_Job03_FileAndSqlConditionTrue will be executed every time after the job ##SQLAgentJobStarter## has run, because their trigger conditions (C:\Temp\*.txt and [MSSQL_S01E04].[sp].[SqlTriggerExample]) both return True.
  • MSSQL_S01E04_Job04_TwoJobSuccessConditions will only be executed after both triggering jobs have succeeded.
  • MSSQL_S01E04_Job05_JobFailureCondition will only be executed after the triggering job has failed.
  • MSSQL_S01E04_Job06_JobConditionThatNeedsToComplete will only be executed after the triggering job has either succeeded or failed.
  • MSSQL_S01E04_Job07_RunningJob will only be executed when the triggering job is running.
  • MSSQL_S01E04_Job08_NonRunningJob will only be executed when the triggering job is not running. Because jobs 7 and 8 both have the same triggering job, either MSSQL_S01E04_Job07_RunningJob or MSSQL_S01E04_Job08_NonRunningJob will be started after the job ##SQLAgentJobStarter## has run.
  • MSSQL_S01E04_Job09_SqlConditionFalse will never be executed because the jobtrigger uses a sql trigger using stored procedure [MSSQL_S01E04].[sp].[SqlTriggerFalse], that always returns 0 (false).
  • MSSQL_S01E04_Job10_ReversedFileCondition_RunsWhenFileDoesNotExist will only be executed when the file “C:\Temp\NowJob10DoesNotRun.txt” does not exist.

Let’s see if it worked out!


Viewing the job history of jobs 1 to 3..
In the job history I can see that:

  • MSSQL_S01E04_Job01_SqlConditionTrue, MSSQL_S01E04_Job02_FileConditionTrue and MSSQL_S01E04_Job03_FileAndSqlConditionTrue were executed every time after the job ##SQLAgentJobStarter## had run.
  • ##SQLAgentJobStarter##, the job that caused this, was also executed every minute, as scheduled.


Viewing the job history of job 4..

  • MSSQL_S01E04_Job04 was only executed after both triggering jobs had succeeded.


Viewing the job history of job 5..

  • MSSQL_S01E04_Job05 was only executed after the triggering job had failed.


Viewing the job history of job 6..

  • MSSQL_S01E04_Job06 was executed after the triggering job had either succeeded or failed.


Viewing the job history of jobs 7 and 8..

  • MSSQL_S01E04_Job07_RunningJob was executed when the triggering job was still running (job MSSQL_S01E04_Trigger_for_Job07_and_Job08 runs every other minute for 30 seconds).


Viewing the job history of job 9..

  • MSSQL_S01E04_Job09_SqlConditionFalse was never executed: it is checked on the left, but does not occur on the right.


Viewing the job history of job 10 when the file “C:\Temp\NowJob10DoesNotRun.txt” does not exist.

Viewing the job history of job 10 after the file “C:\Temp\NowJob10DoesNotRun.txt” is created.

  • MSSQL_S01E04_Job10_ReversedFileCondition_RunsWhenFileDoesNotExist was only executed when the file “C:\Temp\NowJob10DoesNotRun.txt” did not exist.

But there is more. The tool will write to the Windows Eventlog when an error occurs.
To view the Windows Event Log, press <Windows Key>+R, then type eventvwr


Example of errors that are logged.

Make sure you clean up your server! Use the script demo_cleanup.sql for that. Otherwise the tool will keep on executing the demo jobs. Also change the schedule interval of the ##SQLAgentJobStarter## job from 1 minute to 3 or 5 minutes.

Understanding the stored procedures of the solution

Stored procedure [dbo].[usp_upsert_jobtrigger]

With this stored procedure you can insert or update a jobtrigger.

You can set the following parameters:

  • @trigger_id: If provided as input, it will be used for checking if an update or insert needs to be done. If left null or smaller than or equal to 0, it will be filled in the stored procedure.
  • @enabled: Only enabled jobtriggers are processed. Values 0 or 1 (required).
  • @name: Name for the jobtrigger (unique + required). It is used for checking if an update or insert needs to be done when @trigger_id is null or smaller than or equal to 0.
  • @sqlagent_jobname: The name of the SQL Agent job that must be started when all jobtrigger conditions are true (required).
  • @processing_order: Indicates the order in which jobtriggers must be processed (required). Duplicate processing order values are allowed.
  • @description: An optional description.
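
A minimal sketch of a call, using example values (the exact invocations used for the demo can be found in demo_setup_sysjobtriggers.sql):

-- Sketch with example values; see demo_setup_sysjobtriggers.sql for real invocations.
EXECUTE [dbo].[usp_upsert_jobtrigger]
    @trigger_id = NULL -- let the stored procedure determine it
  , @enabled = 1
  , @name = N'Start Job_B when its trigger conditions are true'
  , @sqlagent_jobname = N'Job_B'
  , @processing_order = 10
  , @description = N'Example jobtrigger';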

Stored procedure [dbo].[usp_upsert_jobtrigger_condition]

With this stored procedure you can insert or update a jobtrigger condition of any type.

You can set the following parameters:

  • @condition_id: If provided as input, it will be used for checking if an update or insert needs to be done. If left null or smaller than or equal to 0, it will be filled in the stored procedure.
  • @trigger_id: The trigger_id of the jobtrigger to which the jobtrigger condition belongs.
  • @enabled: Only enabled jobtrigger conditions are processed. Values 0 or 1 (required).
  • @processing_order: Indicates the order in which jobtrigger conditions must be processed (required). Duplicate processing order values are allowed.
  • @description: An optional description.
  • Note regarding the parameters underneath: You must provide values for the one @sql* parameter OR the three @file* parameters OR the three @job* parameters.
  • @sql_stored_procedure: The fully qualified name (including database name) of a stored procedure that returns a single row with a single column named [SqlTriggerResult] of the BIT datatype.
  • @file_watch_folder: The folder in which files are searched for.
  • @file_recurse_subfolders: Indicates whether subfolders of the watch folder must also be checked.
  • @file_name_pattern: A filename or filename pattern (with wildcards) of the file that must exist for the trigger condition to be true.
  • @job_name: The name of a SQL Agent job that must be completed with the desired value for @job_result for the trigger condition to be true.
  • @job_result: Domain values [S | F | C], meaning Succeeded, Failed or Completed (= Succeeded or Failed).
  • @job_result_maxage: The maximum age in seconds since the job was completed. Use -1 for unlimited. For instance, suppose you want a SQL Agent job to start when 4 other jobs, scheduled daily and partially running simultaneously, have all succeeded.
    The time difference between the first and the last estimated completion time is 60 minutes.
    The time difference between the first scheduled start time and the last estimated completion time is 120 minutes.
    Any value (expressed in seconds) between 60 minutes (plus a safety margin) and 22 hours (24 hours minus 120 minutes) would be fine.
  • @reverse_condition: If the value 1 (true) is used, the condition must have the opposite result for the job to be started, e.g. a file must not be found, or a job must not be running.
    A few things are important to understand:

    1. Job results “Succeeded”, “Failed” or “Completed” check for a result; job state “Running” (or not) checks for a current state. Do not get confused about this.
    2. @reverse_condition does not work for job results “Succeeded” and “Failed”, and there is no need for it: the opposite of “Succeeded” is “Failed”, so you should implement it that way. “Not Succeeded”, implemented with @reverse_condition = 1, would match any result except “Succeeded”: “Failed”, “Completed” or “No execution found”.
    3. @reverse_condition does not work for “Completed” for the same reason: “Not Completed”, implemented with @reverse_condition = 1, would match any result except “Completed”, which in practice means “No execution found”, because “Failed” and “Succeeded” both count as “Completed”. Instead of “Not Completed” you could check for “Running”.
    4. If @job_result = ‘R’ (Running) and @reverse_condition = 0, then @job_result_maxage is not used: when the job is running (and the condition is met), @sqlagent_jobname is only started if it was not started yet during the current execution of the running job (so a running job will not cause the other job to run more than once).
    5. If @job_result = ‘R’ (Running) and @reverse_condition = 1, then @job_result_maxage is used differently: when the job is not running (and the condition is met), @sqlagent_jobname is only started if it was not started during the last @job_result_maxage seconds. Therefore it is best to use “Not Running” only in conjunction with some other condition.

Stored procedure [dbo].[usp_upsert_sql_jobtrigger_condition]

With this stored procedure you can insert or update a jobtrigger condition of type ‘sql’ (using a stored procedure).

You can set the following parameters:
@condition_id, @trigger_id, @enabled, @processing_order, @description, @sql_stored_procedure, @reverse_condition: as described above.
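
As an impression, a minimal sketch of such a sql trigger stored procedure (the body and the table [dbo].[StagingOrders] are just assumptions for this example; the file storedprocedure_template.sql, mentioned in the wrap up, contains the actual template). The contract is that it returns a single row with a single BIT column named [SqlTriggerResult]:

-- Minimal sketch; [dbo].[StagingOrders] is a hypothetical example table.
CREATE PROCEDURE [sp].[SqlTriggerExample]
AS
BEGIN
    SET NOCOUNT ON;
    -- Return a single row with one BIT column named [SqlTriggerResult]:
    -- 1 when the condition is met, otherwise 0.
    SELECT CONVERT(BIT, CASE
        WHEN EXISTS (SELECT 1 FROM [dbo].[StagingOrders] WHERE [IsProcessed] = 0)
        THEN 1 ELSE 0 END) AS [SqlTriggerResult];
END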

Stored procedure [dbo].[usp_upsert_file_jobtrigger_condition]

With this stored procedure you can insert or update a jobtrigger condition of type ‘file’.

You can set the following parameters:
@condition_id, @trigger_id, @enabled, @processing_order, @description, @file_watch_folder, @file_recurse_subfolders, @file_name_pattern, @reverse_condition: as described above.
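
For instance, a sketch with example values of a file condition that is true when a .txt file exists in C:\Temp:

-- Sketch with example values; @trigger_id must refer to an existing jobtrigger.
EXECUTE [dbo].[usp_upsert_file_jobtrigger_condition]
    @condition_id = NULL
  , @trigger_id = 1
  , @enabled = 1
  , @processing_order = 10
  , @description = N'A .txt file must exist in C:\Temp'
  , @file_watch_folder = N'C:\Temp'
  , @file_recurse_subfolders = 0
  , @file_name_pattern = N'*.txt'
  , @reverse_condition = 0;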

Stored procedure [dbo].[usp_upsert_jobresult_jobtrigger_condition]

With this stored procedure you can insert or update a jobtrigger condition of the SQLAgentJobResult type.

You can set the following parameters:
@condition_id, @trigger_id, @enabled, @processing_order, @description, @job_name, @job_result, @job_result_maxage, @reverse_condition: as described above.
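
Likewise, a sketch with example values of a condition that is true when Job_A has succeeded during the last hour:

-- Sketch with example values; @trigger_id must refer to an existing jobtrigger.
EXECUTE [dbo].[usp_upsert_jobresult_jobtrigger_condition]
    @condition_id = NULL
  , @trigger_id = 1
  , @enabled = 1
  , @processing_order = 20
  , @description = N'Job_A must have succeeded during the last hour'
  , @job_name = N'Job_A'
  , @job_result = 'S'
  , @job_result_maxage = 3600 -- seconds
  , @reverse_condition = 0;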

Stored procedure [dbo].[usp_delete_jobtrigger]

With this stored procedure you can delete a jobtrigger by trigger_id or name.

Stored procedure [dbo].[usp_delete_jobtrigger_condition]

With this stored procedure you can delete a jobtrigger condition by using the condition_id.

Stored procedure [dbo].[usp_delete_jobtrigger_conditions_by_trigger_id]

With this stored procedure you can delete all jobtrigger conditions for a given trigger_id.

Examples of usage of the stored procedures

Among the files you can download there is the file demo_setup_sysjobtriggers.sql, which uses the upsert stored procedures for different use cases.

Conclusion / Wrap up

I have tried to explain how the SQLAgentJobStarter tool can be used for starting SQL Agent jobs triggered by the existence of one or more files, by the result of a stored procedure execution or by the execution result of one or more other SQL Agent Jobs.
A few important side notes I still have to make:

  1. In real-life scenarios, make sure the files you are polling for are moved after processing, so that the tool does not try to start the job again.
  2. To create stored procedures that are used as a sql trigger, you can use the storedprocedure_template.sql file as a template.
  3. To create your own jobtriggers and trigger conditions, please check demo_setup_sysjobtriggers.sql, in which some of the stored procedures are used.
  4. I am aware that the source code of the tool might need some refactoring. It does the job, however.
  5. Anyone who can help me with creating a graphical user interface to manage the jobtriggers and trigger conditions using C# and WinForms is more than welcome. If you are interested, please contact me. I do not have enough time to do this myself.

(c) 2016/2017 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Zeros, bloody zeros! (Data Vault Series)

Introduction

I must admit I have a weakness for British humour.
When I had to cope with leading zeros in business keys some time ago, I spontaneously came up with the title of this post, not knowing that it would serve as such.
For those who do not know, “Meetings, bloody meetings” is a British comedy training film in which John Cleese plays a main role. It was made in 1976, and a remake was made in 2012.
It tells in a funny way what can go wrong at meetings and how you can do better, check it out if you can.

Mr John Cleese

But, obviously, this post is not about meetings but about zeros.

Problem

I can be short about that: leading zeros in business key values.
For instance a customer number is delivered to the data warehouse as 0001806 (instead of 1806).
This would not be a problem if it would always be delivered exactly like that. But to be honest, you cannot and will not know that upfront. Even if this is the case now, it might not be in the future.
When other tools are used, leading zeros could suddenly disappear (for instance when a csv file is modified using Excel), or (more rarely) the number of leading zeros could change (01806, 00001806). When this happens you have a problem, because for the data warehouse 01806, 0001806, 00001806 and 1806 are all different business keys! Even if you have only two variants, it is already a problem.
Because every business key gets a different row in the hub, this customer now exists multiple times!

(No acting here, this is how I look sometimes)

Solution

If you are familiar with Data Vault, you might already think of same-as-links to solve this.
But I think the solution should be implemented earlier, to avoid having multiple hub rows.
Simply always remove leading zeros when the source column is (part of) a business key (either primary or foreign key) and seems to be a number or ID but is delivered as a string/varchar. In this way 1806 will always be 1806! And I think it is pretty impossible that 001806 and 1806 would refer to two different customers.
Unless, of course, they would come from different source systems. But in that situation, depending on leading zeros would be a bad thing to do anyway, because if the leading zeros were dropped, satellite rows of different customers (in different source systems) could end up connected to the same hub row! In that situation, in a non-integrated Raw Vault, it would be better to prefix the business key with the source system code and remove the leading zeros, for instance CRM.1806 and ERP.1806.
In all cases, you can still store the original value (with leading zeros) as an ordinary attribute in a satellite for auditing reasons.
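
To illustrate, a minimal sketch of such a source-prefixed business key (the source system code ‘CRM’ and the column name are just example values; the zero-stripping methods themselves are worked out below):

-- Sketch with example values: 'CRM' + '0001806' becomes 'CRM.1806'.
SELECT
    src.[customer_number] AS [original_value] -- can still be stored in a satellite
  , N'CRM.' + CASE
        WHEN PATINDEX('%[^0]%', src.[customer_number]) = 0 THEN N'0'
        ELSE SUBSTRING(src.[customer_number], PATINDEX('%[^0]%', src.[customer_number]), LEN(src.[customer_number]))
    END AS [hub_business_key]
FROM (SELECT N'0001806' UNION ALL SELECT N'1806') AS src([customer_number]);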

How to implement the solution

There are many ways to remove leading zeros. When I was searching for this I had two requirements:

  • No casting from and to an integer may take place, otherwise all business keys would need to be numeric, which would make the solution less reliable.
  • No function, routine or assembly may be called, as this could negatively impact performance. I was looking for an “inline” conversion.

After some research I found an expression that was the same for SQL and SSIS and quite okay (T-SQL version by Robin Hames, my credits for his work), but it appeared to change a string with only one or more zeros to an empty string. And because a 0 can have a meaning – and is certainly different from an empty string – this is undesired behavior, IMHO.
So I had to add some logic to it: a CASE expression in T-SQL and an inline condition (format {condition} ? {true part} : {false part}) in the SSIS expression.
Furthermore I came up with a different method for T-SQL as well, using the PATINDEX function, which is more compact than the other solution.
For SSIS I still use the ‘Robin Hames’ method, because the PATINDEX function is not available in SSIS Expressions.
So .. this is what it has become:

T-SQL

Remove_leading_zeros.sql

SELECT
    example.[id_with_leading_zeros],
   CASE
      WHEN LTRIM(example.[id_with_leading_zeros]) = '' THEN ''
      WHEN PATINDEX( '%[^0 ]%', example.[id_with_leading_zeros]) = 0 THEN '0'
      ELSE SUBSTRING(example.[id_with_leading_zeros], PATINDEX('%[^0 ]%', example.[id_with_leading_zeros]), LEN(example.[id_with_leading_zeros]))
   END AS [id_without_zeros_method1],

   CASE
      WHEN LTRIM(example.[id_with_leading_zeros]) = '' THEN ''
      WHEN PATINDEX( '%[^0 ]%', example.[id_with_leading_zeros]) = 0 THEN '0'
      ELSE REPLACE(REPLACE(LTRIM(REPLACE(-- Robin Hames' method
            REPLACE(LTRIM(example.[id_with_leading_zeros]), ' ', '!#!') -- replace existing spaces with a string that does not occur in the column value, I have chosen '!#!'
            , '0', ' ') -- replace '0' with ' '
            ) -- end of LTRIM to remove leading '0's that have been changed to ' 's
            , ' ', '0') -- change ' ' back to '0'
            , '!#!', ' ') -- change '!#!' back to ' '
   END AS [id_without_zeros_method2]
FROM
    (
    SELECT
        TOP 1000 RIGHT('00000000000' + CONVERT(NVARCHAR(12), object_id), 14) AS [id_with_leading_zeros]
    FROM
        master.sys.objects
    UNION
    SELECT N' 00000 '
    UNION
    SELECT N'00'
    UNION
    SELECT N' '
    UNION
    SELECT ' 0099990 A '
    UNION
    SELECT '-5550'
    ) example

SSIS Expression (can be used in Derived Column)

(LTRIM(REPLACE(id_with_leading_zeros,"0", "")) == "" && LTRIM(id_with_leading_zeros) != "") ? "0" : REPLACE(REPLACE(LTRIM(REPLACE(REPLACE(LTRIM(id_with_leading_zeros)," ","!#!"),"0"," "))," ","0"),"!#!"," ")

In a Derived Column Transformation this looks for instance like this

Conclusion / Wrap up

In this post I have motivated why I think you should remove leading zeros from business keys when data is loaded from source systems to a data warehouse.
This post also contains different ways to remove leading zeros: two for T-SQL and one as an SSIS expression.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

SQL Server 2016 tempdb configurations benchmark (SQL Server Series)

Introduction

When you have installed SQL Server 2016 you will have noticed that the configuration for tempdb has changed.
Microsoft made a proper tempdb configuration easy to do via the GUI during installation.

Today I want to go through the settings offered and will measure performance of different setups.

What are the options?

During installation you can set the following values:
For TempDB datafiles:

Setting | Default value
Number of files | 4
Initial size (MB) | 8
Autogrowth (MB) | 64
Data directories | Default data directory, in my case C:\MSSQL\Instance root\MSSQL13.MSSQLSERVER\MSSQL\Data

For TempDB logfile:

Setting | Default value
Initial size (MB) | 8
Autogrowth (MB) | 64
Log directory | Default data directory, in my case C:\MSSQL\Instance root\MSSQL13.MSSQLSERVER\MSSQL\Data

Configuration screen during SQL Server 2016 installation (I changed the directories already).

So what is the best setup?

This subject is not new for SQL Server 2016, but the installation procedure now just gives a better opportunity to set it up right the first time.
There are roughly five differentiators in the setup:

  • Initial size (MB)
  • Autogrowth (MB)
  • Number of data files
  • Location of data files
  • Read/write performance of the different storage media

What are the considerations when setting these values?

Initial size (MB)

A larger initial size will accomplish two things:

  1. It will take longer before the file will have the need to grow.
  2. The file can be stored less fragmented on disk.

If you have enough disk space, the larger the better; there is no real downside to making the initial size quite large. More about this in my conclusion.

Autogrowth (MB)

With autogrowth the choice is a bit more delicate, as both a small and a large value have pros and cons, as pointed out in the overview below.

Autogrowth value | Pros | Cons
Small | More frequent, smaller growth actions cause less noticeable hiccups | Tempdb data files will become more fragmented
Large | Tempdb data files will become less fragmented | Less frequent, larger growth actions might cause noticeable hiccups

Number of data files

In general, the recommendation is one data file per (logical) processor. You can configure more files than the number of (logical) processors, but do this only if you notice that this gives a better performance.

Location of data files

Locating the data files on different drives can have two effects:

  1. The queue length per disk drive in Windows is reduced (in all scenarios, also with virtual disks residing on the same physical drive).
  2. Physical writes to the disk drives can go in parallel (only with different physical drives).

The tests

The test cases

Creating combinations of all differentiators would just create *a lot* of test scenarios.
So I will be a bit pragmatic.
The test cases are chosen so that each can be compared with at least one other test case that differs in only one differentiator.

Test # | # Data files | Init size (MB) | Autogrowth (MB) | # Folders | Scenario
0 | 4 | 8 (x4) | 64 | 1 | Default installation (data files on OS disk)
1 | 4 | 8 (x4) | 64 | 1 | Data files on dedicated drive
2 | 4 | 8 (x4) | 64 | 4 | Data files on more drives
3 | 4 | 2048 (x4) | 64 | 4 | Data files on more drives + Larger initial size
4 | 4 | 8 (x4) | 256 | 4 | Data files on more drives + Larger autogrowth
5 | 4 | 2048 (x4) | 64 | 1 | Data files on dedicated drive + Larger initial size
6 | 4 | 2048 (x4) | 64 | 1 | Data files on dedicated drive + Larger initial size + TempDB on different SSD
7 | 8 | 2048 (x8) | 64 | 1 | Data files on dedicated drive + Larger initial size + More data files
8 | 4 | 2048 (x4) | 64 | 1 | Default installation (data files on OS disk) + Larger initial size

Test results

I will present you two graphs as well as the raw data. I think especially the first graph gives some useful insights, as it compares all test cases with test case 0 (the default installation), by using an index of 100 for test case 0. If a different test case performs better, its index will be less than 100. Also, in this way, the average result of both test sets (A and B) can be shown.
I left test case 6 out of the graph; later I will tell why.
Index-based comparison of test results

Execution times of test cases for test sets A and B

Test results figures

Test # | Scenario | Result test set A (ms) | Result test set B (ms)
0 | Default installation | 30359 | 232795
1 | Data files on dedicated drive | 28344 | 261467
2 | Data files on more drives | 22514 | 248484
3 | Data files on more drives + Larger initial size | 28040 | 210223
4 | Data files on more drives + Larger autogrowth | 24173 | 200632
5 | Data files on dedicated drive + Larger initial size | 19608 | 192587
6 | Data files on dedicated drive + Larger initial size + TempDB on different SSD | 123289 | 387646
7 | Data files on dedicated drive + Larger initial size + More data files | 19692 | 211501
8 | Default installation + Larger initial size | 20149 | 361844

Conclusion

Data files on dedicated drive + Larger initial size (test 5) gives the best performance for both test sets. Putting tempdb files on multiple disks cannot beat this, although the differences are relatively small.

Surprisingly, performance decreases dramatically when the virtual disk on which tempdb is stored is moved to a different (faster) SSD (this is test 6)! I have no explanation for this. Therefore I will leave test 6 out of the analysis; this way the physical storage is the same for all test cases and changes in performance are only caused by the other differentiators.

My most important conclusion is that the default values offered during installation are quite okay, except for two things:

The Initial size (MB)

You should really set the initial size to a big number (gigabytes if possible). Note that if you leave the number of files at 4 (which I recommend), the space used is four times the size that you use for Initial size.
If you have a dedicated disk for tempdb (which is recommended), you might want to set the initial size of the data file(s) to fill up the disk. You should then disable Autogrowth.
Also, on a production server, check the size of the tempdb files after a week of uptime (provided the SQL instance has not been restarted in the meantime). Say the tempdb size of all files together (can also be one file) is 22GB. You could then set your initial size to 28GB in total, so 7GB per file if you configured 4 files. This would mean that during normal operation, tempdb never has to grow.

Dedicated disk

It is a common best practice to put tempdb on a separate disk so that the Windows disk queue does not have to compete with other read/write actions to that disk.

Other settings are somewhat less important but … think for yourself (my disclaimer)

In my opinion both putting tempdb files on multiple different disks and setting Autogrowth to a different value do not offer much performance benefit.
The Autogrowth value is even irrelevant if the initial size is so large that tempdb never has to grow during normal operation.
But every server is different! Test and think for yourself.
My test results are presented “AS IS” in the hope they will be useful as a starting point for further analysis of your own. I am not responsible for any disadvantage you might have by using my test results for your tempdb setup.

What if SQL Server is already installed and/or it is an older version?

No worries, you can still change the tempdb configuration.
You can move and add files using SQL scripts.
Initial size and Autogrowth can also be set with Management Studio.
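
For example, a minimal sketch (logical file names, sizes and the drive letter are assumptions for this example; look up your actual file names in tempdb.sys.database_files first):

-- Look up the current tempdb files first:
SELECT name, physical_name, size FROM tempdb.sys.database_files;

-- Enlarge an existing data file and set its autogrowth (logical name assumed):
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 2048MB, FILEGROWTH = 64MB);

-- Add an extra data file on a (hypothetical) dedicated drive:
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf', SIZE = 2048MB, FILEGROWTH = 64MB);

-- Move a file to a new location; this takes effect after a restart of the SQL Server service:
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdev.mdf');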
Go to the properties of tempdb

Here you can change Initial size and Autogrowth

By the way, I noticed that SQL Server does not delete moved tempdb files. So if you move tempdb files, you should delete the files in the old location, so that they do not use disk space and do not cause confusion for administrators.

Test justification

  • All tests are done on a Hyper-V Virtual Machine running Windows Server 2012 R2 and SQL Server 2016 Developer Edition.
  • Change of physical storage type is done by moving vhdx files to different physical disks, and then changing the file location in the settings of the Hyper-V Virtual Machine (while it is not running, of course).
  • Each test is done on a fresh initial copy of the Virtual Machine, so the starting situation is equal for every test (e.g. no tempdb growth caused by previous tests).
  • Tests were done with two sets of scripts (set A and B) that inserted, updated and deleted rows in temporary tables in tempdb. Set A consists of three sql scripts, and set B of five.
  • All scripts were developed so that all tempdb inserts started at the same time (with at most milliseconds of difference in start time). This ensured that every test run was executed in the same way.
  • All scripts were executed in SQL Server Management Studio.
  • The first file of tempdb is stored on its own (virtual) disk, except for test case 0, where the tempdb files are on the C-drive where Windows is installed.
  • When spreading files over multiple virtual disks, the disks created for database files, log files, and backup files were used. This was done for pragmatic reasons during the test. It was defendable, because no other activity on databases and log files was going on during the test. It is however not recommended to spread tempdb like that in a production environment, unless every disk is dedicated to tempdb use only.
  • The host machine had the same applications open during every test, being a Windows Explorer and Microsoft Excel. No other applications that used CPU and could influence performance of the virtual machines were running.
  • The tempdb log file was placed on a separate disk (the same one) for each test. Therefore it was not a subject of the performance investigation; including it would have at least doubled the number of test cases.

Wrap up

Microsoft did a good job in improving the setup options for tempdb in the installation procedure.
It is a bit of a pity that they did not go the last mile, giving the tempdb data files a more realistic Initial size.
Luckily this is easy to change during installation, and it is a quick win for anyone installing SQL Server 2016.

Download all the used scripts here.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

When reverse engineering is a necessary evil .. (SQL Server Series)

.. or: How to find data (text, guids, numbers) in your entire SQL Server database ..

Introduction

A number of years ago I had to write queries on an OLTP database used by third-party software. As a person preferring data warehouses for too many reasons to mention here, I am not proud of it, but sometimes these requests just happen. If you have the same now and then, you are not the only one 😉 .

Problem

A little simplified, this is what happened:
I started asking around, is there documentation? “No”, there wasn’t.
Is there a data dictionary? “No”, there wasn’t, either.
Can I have access to the database? “Yes, here’s the connection string.”
So I made a connection, but .. what do all these tables mean? How are they related? What are the foreign keys? The model didn’t tell me: there were no physical constraints, and the table and column names were unclear.
Do you have anything else for me to see how this works? “Well, we have a PROD and a TEST environment, you can have read only access to PROD and read/write to the TEST.”
Okay, that is something ..

MSSQL-S01E02-testconnection
Picture credits: © David Coleman | Dreamstime Stock Photos
When there is only a (test)connection..

Towards a solution

Could you show me how you use the system? Luckily the employee of my client was very helpful and showed me how he used the application.
I tried to get a grip on the SQL Statements caused by his actions, but due to a lack of permissions on the database, this didn’t work.
This could be arranged, but could take a while before I actually would have temporary “elevated” permissions.
As it was just some small piece of “in-between” work, that seemed an inefficient route to follow.
Okay, I have seen what you did, can I do that on the TEST as well? “Yeah, go ahead.”
So I started to make an order, using silly names and descriptions, and choosing weird, seldom-used products. So, what’s next?
Okay, if I can find where my description “First order of Hans” is stored, I can start to map the front end to the back end, or make clear to myself where the entered information is stored in the database.

Solution

This is where this script became useful. It finds all occurrences of one or more text values in a SQL Server database.

WARNING: Do not use in production environments, as this script can potentially run pretty long (10 to 20 minutes when I used it) and can have a negative impact on database performance in general (so other users might regret that you are running the script).
So use it in a test environment when you can.

SearchTextInDatabase.sql

--\
---) SearchTextInDatabase script.
---) Author: Hans Michiels
---) Searches for one or more texts in all (n)(var)char and (n)text
---) columns of a SQL Server database.
---) WARNING: Can potentially run very long, my advice is not
---) to run this script in production environments!
--/
/*
(c) Copyright 2016 - hansmichiels.com
 
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
 
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
 
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
*/

USE [AdventureWorks2014]
GO

--\-----------------------------------------------------------------------------------
---) DECLARATIONS
--/-----------------------------------------------------------------------------------
DECLARE @schema_name SYSNAME
DECLARE @table_name SYSNAME
DECLARE @column_name SYSNAME
DECLARE @nsql NVARCHAR(4000)
DECLARE @param_def NVARCHAR(500)
DECLARE @search_text NVARCHAR(1000)
DECLARE @search_texts TABLE ([search_text] NVARCHAR(1000))
DECLARE @debug_print NVARCHAR(500)
DECLARE @results_count INT = 0

IF OBJECT_ID('tempdb..#search_results') IS NOT NULL BEGIN
    DROP TABLE #search_results
END
CREATE TABLE #search_results
    ( [search_text] NVARCHAR(1000)
    , [schema_name] NVARCHAR(128)
    , [table_name] NVARCHAR(128)
    , [column_name] NVARCHAR(128)
    , [the_value] NVARCHAR(1000)
    )

DECLARE columns_cursor CURSOR LOCAL STATIC FOR
    SELECT sch.name AS [schema_name]
    , tbl.name AS [table_name]
    , col.name AS [column_name]
    FROM sys.tables tbl
     JOIN sys.schemas sch
        ON sch.schema_id = tbl.schema_id
      JOIN sys.columns col
        ON col.object_id = tbl.object_id
    WHERE col.system_type_id IN (175, 167, 231, 239, 35, 99) -- (N)(VAR)CHAR, (N)TEXT

--\-----------------------------------------------------------------------------------
---) INITIALIZATION
---) Specify the texts to search for below.
---) Use the wildcard % if you want to do a 'like' search.
--/-----------------------------------------------------------------------------------
INSERT INTO @search_texts

  SELECT '%Assembly%'
  UNION SELECT 'First order of Hans'

DECLARE search_cursor CURSOR LOCAL STATIC FOR
    SELECT search_text
    FROM @search_texts

--\-----------------------------------------------------------------------------------
---) MAIN
--/-----------------------------------------------------------------------------------
OPEN columns_cursor
FETCH NEXT FROM columns_cursor INTO @schema_name, @table_name, @column_name

WHILE @@FETCH_STATUS = 0
BEGIN

    SET @debug_print = 'Processing ' + @table_name + '.' + @column_name;
    PRINT REPLICATE('-', LEN(@debug_print))
    PRINT @debug_print
    PRINT REPLICATE('-', LEN(@debug_print))
    
    SET @nsql = 'INSERT INTO #search_results SELECT DISTINCT @inner_search_text, '
        + '''' + @schema_name + ''' AS [schema_name], '
        + '''' + @table_name + ''' AS [table_name], '
        + '''' + @column_name + ''' AS [column_name], CONVERT(NVARCHAR(1000), ' + QUOTENAME(@column_name) + ') AS [the_value] '
        + ' FROM ' + QUOTENAME(@schema_name) + '.' + QUOTENAME(@table_name) + ' WITH (NOLOCK) '
        + ' WHERE ' + QUOTENAME(@column_name) + ' LIKE @inner_search_text;';

    OPEN search_cursor
    FETCH NEXT FROM search_cursor INTO @search_text

    WHILE @@FETCH_STATUS = 0
    BEGIN
        PRINT ''
        PRINT '--> Processing search text ''' + @search_text + '''';
        SET @param_def = N'@inner_search_text NVARCHAR(1000)';
        EXECUTE sp_executesql @nsql, @param_def,
                              @inner_search_text = @search_text;

        FETCH NEXT FROM search_cursor INTO @search_text
    END
    
    CLOSE search_cursor

    FETCH NEXT FROM columns_cursor INTO @schema_name, @table_name, @column_name
    
END

CLOSE columns_cursor
DEALLOCATE columns_cursor

SELECT
      t.schema_name
    , t.table_name
    , t.column_name
    , t.the_value
    , 'SELECT * FROM ' + QUOTENAME(t.schema_name) + '.' + QUOTENAME(t.table_name)
      + ' WHERE ' + QUOTENAME(t.column_name) + ' = ''' + REPLACE(t.the_value, '''', '''''') + '''' AS [select_statement]
 FROM #search_results t

GO

Some time later, I also made two similar versions for searching unique identifiers and numeric values.
I know the scripts might not look so ‘pretty’, but they do the job when needed.

SearchUniqueIdentifierInDatabase.sql

--\
---) SearchUniqueIdentifierInDatabase script.
---) Author: Hans Michiels
---) Searches for one or more unique identifiers in all UNIQUEIDENTIFIER columns
---) of a SQL Server database.
---) WARNING: Can potentially run very long, my advice is not
---) to run this script in production environments!
--/
/*
(c) Copyright 2016 - hansmichiels.com
 
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
 
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
 
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
*/

USE [AdventureWorks2014]
GO

--\-----------------------------------------------------------------------------------
---) DECLARATIONS
--/-----------------------------------------------------------------------------------
DECLARE @schema_name SYSNAME
DECLARE @table_name SYSNAME
DECLARE @column_name SYSNAME
DECLARE @nsql NVARCHAR(4000)
DECLARE @param_def NVARCHAR(500)
DECLARE @search_value UNIQUEIDENTIFIER
DECLARE @search_values TABLE ([search_value] UNIQUEIDENTIFIER)
DECLARE @debug_print NVARCHAR(500)
DECLARE @results_count INT = 0

IF OBJECT_ID('tempdb..#search_results') IS NOT NULL BEGIN
    DROP TABLE #search_results
END
CREATE TABLE #search_results
    ( [search_value] UNIQUEIDENTIFIER
    , [schema_name] NVARCHAR(128)
    , [table_name] NVARCHAR(128)
    , [column_name] NVARCHAR(128)
    , [the_value] UNIQUEIDENTIFIER
    )

DECLARE columns_cursor CURSOR LOCAL STATIC FOR
    SELECT sch.name AS [schema_name]
    , tbl.name AS [table_name]
    , col.name AS [column_name]
    FROM sys.tables tbl
     JOIN sys.schemas sch
        ON sch.schema_id = tbl.schema_id
      JOIN sys.columns col
        ON col.object_id = tbl.object_id
                              -- UNIQUEIDENTIFIER
    WHERE col.system_type_id IN (36)

--\-----------------------------------------------------------------------------------
---) INITIALIZATION
---) Specify the unique identifiers to search for below.
--/-----------------------------------------------------------------------------------
INSERT INTO @search_values
SELECT 'D081136E-38D6-4D42-8FFD-19A6A8FA61E6'
UNION SELECT '9AADCB0D-36CF-483F-84D8-585C2D4EC6E9'
UNION SELECT '32A54B9E-E034-4BFB-B573-A71CDE60D8C0'
UNION SELECT '4C506923-6D1B-452C-A07C-BAA6F5B142A4'

DECLARE search_cursor CURSOR LOCAL STATIC FOR
    SELECT search_value
    FROM @search_values

--\-----------------------------------------------------------------------------------
---) MAIN
--/-----------------------------------------------------------------------------------
OPEN columns_cursor
FETCH NEXT FROM columns_cursor INTO @schema_name, @table_name, @column_name

WHILE @@FETCH_STATUS = 0
BEGIN

    SET @debug_print = 'Processing ' + @table_name + '.' + @column_name;
    PRINT REPLICATE('-', LEN(@debug_print))
    PRINT @debug_print
    PRINT REPLICATE('-', LEN(@debug_print))
    
    SET @nsql = 'INSERT INTO #search_results SELECT DISTINCT @inner_search_value, '
        + '''' + @schema_name + ''' AS [schema_name], '
        + '''' + @table_name + ''' AS [table_name], '
        + '''' + @column_name + ''' AS [column_name], ' + QUOTENAME(@column_name) + ' AS [the_value] '
        + ' FROM ' + QUOTENAME(@schema_name) + '.' + QUOTENAME(@table_name) + ' WITH (NOLOCK) '
        + ' WHERE ' + QUOTENAME(@column_name) + ' = @inner_search_value ;';

    OPEN search_cursor
    FETCH NEXT FROM search_cursor INTO @search_value

    WHILE @@FETCH_STATUS = 0
    BEGIN
        PRINT ''
        PRINT '--> Processing search value ''' + CONVERT(NVARCHAR(40), @search_value) + '''';
        SET @param_def = N'@inner_search_value UNIQUEIDENTIFIER';
        EXECUTE sp_executesql @nsql, @param_def,
                              @inner_search_value = @search_value;

        FETCH NEXT FROM search_cursor INTO @search_value
    END
    
    CLOSE search_cursor

    FETCH NEXT FROM columns_cursor INTO @schema_name, @table_name, @column_name
    
    SELECT @results_count = (SELECT COUNT(*) FROM #search_results);
    
END

CLOSE columns_cursor
DEALLOCATE columns_cursor

SELECT
      t.schema_name
    , t.table_name
    , t.column_name
    , t.the_value
    , 'SELECT * FROM ' + QUOTENAME(t.schema_name) + '.' + QUOTENAME(t.table_name)
      -- REPLACE() does not accept a UNIQUEIDENTIFIER argument (and a guid cannot
      -- contain quotes anyway), so convert the value to NVARCHAR instead:
      + ' WHERE ' + QUOTENAME(t.column_name) + ' = ''' + CONVERT(NVARCHAR(36), t.the_value) + '''' AS [select_statement]
 FROM #search_results t

GO


SearchNumberInDatabase.sql

--\
---) SearchNumberInDatabase script.
---) Author: Hans Michiels
---) Searches for one or more numeric values in all TINYINT, SMALLINT,
---) INT, FLOAT, DECIMAL and NUMERIC columns of a SQL Server database.
---) WARNING: Can potentially run very long, my advice is not
---) to run this script in production environments!
--/
/*
(c) Copyright 2016 - hansmichiels.com
 
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
 
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
 
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
*/

USE [AdventureWorks2014]
GO

--\-----------------------------------------------------------------------------------
---) DECLARATIONS
--/-----------------------------------------------------------------------------------
DECLARE @schema_name SYSNAME
DECLARE @table_name SYSNAME
DECLARE @column_name SYSNAME
DECLARE @nsql NVARCHAR(4000)
DECLARE @param_def NVARCHAR(500)
DECLARE @search_value DECIMAL(38, 2)
DECLARE @search_values TABLE ([search_value] DECIMAL(38, 2))
DECLARE @debug_print NVARCHAR(500)
DECLARE @results_count INT = 0
DECLARE @max_difference DECIMAL(38, 2) -- Use to find values 'near by'

IF OBJECT_ID('tempdb..#search_results') IS NOT NULL BEGIN
    DROP TABLE #search_results
END
CREATE TABLE #search_results
    ( [search_value] DECIMAL(38, 2)
    , [schema_name] NVARCHAR(128)
    , [table_name] NVARCHAR(128)
    , [column_name] NVARCHAR(128)
    , [the_value] DECIMAL(38, 2)
    )

DECLARE columns_cursor CURSOR LOCAL STATIC FOR
    SELECT sch.name AS [schema_name]
    , tbl.name AS [table_name]
    , col.name AS [column_name]
    FROM sys.tables tbl
     JOIN sys.schemas sch
        ON sch.schema_id = tbl.schema_id
      JOIN sys.columns col
        ON col.object_id = tbl.object_id
                              -- TINYINT, SMALLINT, INT, BIGINT
    WHERE col.system_type_id IN (48, 52, 56, 127
                              -- Uncomment to include: REAL, MONEY, FLOAT, DECIMAL, NUMERIC, SMALLMONEY
                              --, 59, 60, 62, 106, 108, 122
                                )

--\-----------------------------------------------------------------------------------
---) INITIALIZATION
---) Specify the numbers to search for below “INSERT INTO @search_values”
---) Also set a value for @max_difference
---) @max_difference makes it possible to find nearby numbers.
--/-----------------------------------------------------------------------------------
SELECT @max_difference = 1.0
INSERT INTO @search_values
  SELECT 755
  --UNION SELECT 97867563

DECLARE search_cursor CURSOR LOCAL STATIC FOR
    SELECT search_value
    FROM @search_values

-------------------------------------------------------------------------------------
-- MAIN
-------------------------------------------------------------------------------------
OPEN columns_cursor
FETCH NEXT FROM columns_cursor INTO @schema_name, @table_name, @column_name

WHILE @@FETCH_STATUS = 0
BEGIN

    SET @debug_print = 'Processing ' + @table_name + '.' + @column_name;
    PRINT REPLICATE('-', LEN(@debug_print))
    PRINT @debug_print
    PRINT REPLICATE('-', LEN(@debug_print))
    
    SET @nsql = 'INSERT INTO #search_results SELECT DISTINCT @inner_search_value, '
        + '''' + @schema_name + ''' AS [schema_name], '
        + '''' + @table_name + ''' AS [table_name], '
        + '''' + @column_name + ''' AS [column_name], CONVERT(DECIMAL(38, 2), ' + QUOTENAME(@column_name) + ') AS [the_value] '
        + ' FROM ' + QUOTENAME(@schema_name) + '.' + QUOTENAME(@table_name) + ' WITH (NOLOCK) WHERE ABS(' + QUOTENAME(@column_name) + ' - @inner_search_value) <= ' + CAST(@max_difference AS VARCHAR(50)) + ';';

    PRINT @nsql
    
    OPEN search_cursor
    FETCH NEXT FROM search_cursor INTO @search_value

    WHILE @@FETCH_STATUS = 0
    BEGIN
        PRINT ''
        PRINT '--> Processing search value ''' + CONVERT(NVARCHAR(40), @search_value) + '''';
        SET @param_def = N'@inner_search_value DECIMAL(38, 2)';
        EXECUTE sp_executesql @nsql, @param_def,
                              @inner_search_value = @search_value;

        FETCH NEXT FROM search_cursor INTO @search_value
    END
    
    CLOSE search_cursor

    FETCH NEXT FROM columns_cursor INTO @schema_name, @table_name, @column_name
    
    SELECT @results_count = (SELECT COUNT(*) FROM #search_results);
    
END

CLOSE columns_cursor
DEALLOCATE columns_cursor

SELECT
      t.schema_name
    , t.table_name
    , t.column_name
    , t.the_value
    -- REPLACE() does not accept a DECIMAL argument, and a numeric comparison
    -- needs no quotes, so simply convert the value to NVARCHAR:
    , 'SELECT * FROM ' + QUOTENAME(t.schema_name) + '.' + QUOTENAME(t.table_name) + ' WHERE ' + QUOTENAME(t.column_name) + ' = ' + CONVERT(NVARCHAR(40), t.the_value) AS [select_statement]
 FROM #search_results t

GO

Download all scripts here.

Conclusion / Wrap up

When you have to understand a SQL Server data model, for instance to get data out, and there is no documentation but only a test environment and a database connection available, these scripts can help you find where data entered in the front-end software is stored in the database.
A different use case is when the database is using unique identifiers for primary and foreign keys, and the foreign key relationships are not physically enforced and/or are flexible. In that case you can search for a specific unique identifier and see where it is used in the database.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Top 10 Design Mistakes seen in OLTP production systems (SQL Server Series)

Introduction


If you have ever had to source your data warehouse from Online Transaction Processing (OLTP) databases, I suppose this top 10 will be a “joy” of recognition for you.

Top 10 Design Mistakes seen in OLTP production systems

1. Don’t have referential integrity enforced

Without actual foreign key constraints in the database, you will get orphaned records sooner or later (e.g. order line rows for already deleted orders). This is bad for data quality and can hide bugs in the front-end software maintaining the data.
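
A minimal sketch (with hypothetical table and column names) of such a constraint:

-- Enforce that every order line refers to an existing order (hypothetical names).
ALTER TABLE [dbo].[OrderLine]
    ADD CONSTRAINT [FK_dbo_OrderLine_OrderId]
    FOREIGN KEY ([OrderId]) REFERENCES [dbo].[Order] ([OrderId]);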


2. Foreign key columns having a totally different name than the primary key they refer to

Especially when point 1 is the case, it is difficult to tell that a foreign key column “account_id” refers to the “customer_id” column in the customer table. When reverse engineering an OLTP design, this does not really help to understand it.


3. Foreign key columns missing “Id”, “Key” or “Guid” in the name

I personally dislike a column name like “customer” when the column has an integer or globally unique identifier datatype. Although from the datatype you can conclude that it cannot contain the name, there are many situations where you only see the column name, and then it can be confusing.


4. Foreign key columns meaning more than one thing

More than once, in standard software that is highly adjustable or configurable, the data model is “flexible”, e.g. link tables where object_id can be a customer, an employee, or yet something else. Apart from the problem this gives in enforcing referential integrity without a physical constraint, it makes it more difficult to understand the model and/or get data out.


5. Accessing the datamodel through low-performance all-in-one views, protecting base tables from direct access

They exist! Views with 700+ columns, probably joining 20+ tables, to give an all-in-one view for an account. And still, you might need an extra column from an already joined table that is not in the list of 700+ columns. You can guess what this leads to: suboptimal queries with poor performance. Whereas, if you were allowed to write a view directly on the tables, you could optimize it for your needs.


6. Oversized column lengths

Ever seen a varchar(max) column for a country code with a maximum length of 3? I have. While this is an extreme example, oversized columns lead to several problems:

  • SQL Server handles the column differently. Without going into too much detail, a char(3) column for the country code would be stored much more efficiently than a varchar(max) column.
  • You do not always know how much space must be reserved for the column on reports and so on.
  • If you use the value in concatenated columns (e.g. business keys), you need to size the concatenation according to the physical maximum size, even when the actual data will not fill it up.


Oversizing can happen:

  • when front-end C# developers use “code first” and the column length is not defined in the code. Therefore I recommend, when “code first” is used, having a database developer or DBA take a critical look at the generated physical data model before going live.
  • when the column size is not specified in design documents.
  • when people just don’t care to size the column properly. Oversizing seems like a safe thing to do, if the column is big enough, no one will complain!


7. Oversized INT Datatypes

Or: To Smallint or not to Smallint. When reference tables only contain a few rows, a smallint or even tinyint datatype for the primary key will do! If the primary key is used in foreign key constraints of large tables (with many rows), less storage is needed and indexes will be smaller, all leading to better performance.

Unfortunately, SQL Server does not support unsigned smallint, int and bigint datatypes, but nobody stops you from starting to count at the minimum negative value, so -32768 instead of 1 for a smallint! Just change the seed part of the IDENTITY specification. If you do this, you can store up to 65,536 rows in a table with a smallint primary key.


--\
---) Author: Hans Michiels
---) Script to demonstrate how to start an
---) identity column with a negative number
--/

IF OBJECT_ID('[dbo].[A65kRowsTable]', 'U') IS NOT NULL
    DROP TABLE [dbo].[A65kRowsTable]
GO
CREATE TABLE [dbo].[A65kRowsTable](
    -- Seed the identity at the minimum smallint value instead of 1:
    [ID] SMALLINT IDENTITY(-32768, 1) NOT NULL,
    [SomeColumn] VARCHAR(50) NOT NULL
    CONSTRAINT [PK_dbo_A65kRowsTable_ID] PRIMARY KEY CLUSTERED
    (
        [ID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO [dbo].[A65kRowsTable] ( [SomeColumn] )
VALUES ('-32768 is the first ID.')
GO
SELECT [ID], [SomeColumn]
FROM [dbo].[A65kRowsTable]
GO


If you do this trick with an INT, you can store about 4.2 billion rows, and in some cases you might be able to avoid needing a bigint.


8. Lack of unique indexes for business or natural keys

Again a data quality issue. If the primary key is a surrogate key (an autonumbered integer type), a table can still have one or more columns that make a row unique. For example, a social security number in a citizens table. Not enforcing this constraint via a unique index can lead to two citizens with the same social security number! This can hide bugs in the front-end software, fraud, or both. Anyway, it is not wise to take the risk that this happens.
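
A minimal sketch (with hypothetical table and column names): the surrogate key stays the primary key, while a unique index enforces the natural key:

-- Hypothetical names: prevent two citizens with the same social security number.
CREATE UNIQUE NONCLUSTERED INDEX [IX_dbo_Citizen_SocialSecurityNumber]
    ON [dbo].[Citizen] ([SocialSecurityNumber]);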


9. Guids stored as (n)varchar

A guid, short for Globally Unique IDentifier, is a generated value of 16 bytes, guaranteed to be globally unique. It is sometimes used for primary key columns, instead of integer datatypes.
I will leave it open whether this is wise; there might be some use cases where it is a valid approach, but I would only do this if using integers is not an option.

SQL Server has the UNIQUEIDENTIFIER datatype to store guids. Internally, this datatype uses 16 bytes, but the value is represented as a hexadecimal string of 36 characters (32 for the 16 bytes, with 4 hyphens in between), sometimes surrounded by curly brackets, e.g. {935D9FA8-2C77-4C34-894C-8FCDA8E47F19}.
But if the guid is stored in a varchar(36) column, it will use 36 bytes for the value plus 2 bytes overhead, so 38 bytes in total. Compared with the 16 bytes for the UNIQUEIDENTIFIER, you can imagine that storage space and index size will increase and performance will decrease.
Not to mention nvarchar(36), the Unicode variant of varchar, which will use 2 bytes for every character plus 2 bytes overhead, so 74 bytes in total. This is massive! I have seen nvarchar(36) used in a production system to store a guid and would classify it as a “My first database” mistake. Only very inexperienced developers, hardly knowing anything about datatypes, would do this. Don’t be one of them!
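
A quick way to see the difference for yourself; this small sketch compares the storage footprint of the same guid in the three datatypes:

-- DATALENGTH shows the bytes used by the value itself (excluding the 2 bytes
-- of variable-length column overhead mentioned above).
DECLARE @guid UNIQUEIDENTIFIER = NEWID();
SELECT
      DATALENGTH(@guid)                        AS [uniqueidentifier_bytes] -- 16
    , DATALENGTH(CONVERT(VARCHAR(36), @guid))  AS [varchar36_bytes]        -- 36
    , DATALENGTH(CONVERT(NVARCHAR(36), @guid)) AS [nvarchar36_bytes];      -- 72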



10. Where’s the DBA?

This point is more about daily operations than about design, but I still want to mention it. If you are using SQL Server databases you need to maintain them. A simple maintenance plan is easy to set up and will do index and statistics maintenance for you, make backups, and can prevent the transaction log file from growing excessively. If you don’t have an internal DBA, hire one to set it up and monitor your SQL Server instances.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.