The CONTROLLER – TASK design pattern for stored procedures

(This blog post was first published on 25-nov-2013 on my former blog)

The challenges

Working with SQL Server and stored procedures for several years, I have faced several challenges like:

  1. reuse of stored procedures: make sure stored procedures are small components that you can use for different purposes.
  2. maintainability of stored procedures: make sure stored procedures do not become large ‘procedural written’, otherwise they can become hard to maintain due to this.
  3. generating code versus manual maintenance: how to merge generated and manual SQL code easily.
  4. where to implement error handling and the automic unit using a transaction: how to ensure these without harming the needs above.

And I have found a way to deal with this: The CONTROLLER - TASK design pattern for stored procedures. This pattern solves my challenges into a working solution!

1) Reuse of stored procedures

Sometimes you want to reuse a particular INSERT, UPDATE or DELETE statement. When it is in a stored procedure you can simply call this stored procedure! But what if this stored procedure also contains other code, uses its own transaction or implements its own error handling? Then this might be not that simple!

The CONTROLLER-TASK design pattern for stored procedures makes reuse of SQL code very easy.

2) Maintainability of stored procedures

A best practice in C# programming (or any type of programming really) that has become even more popular over the last years, is to create short methods containing only a small amount of code, and having a “main” method to call all the short methods.

Advantages: the main method is very readable (when method names are self-explaining), and code stays really maintainable. Also, short methods can be unittested easier.

Why not do the same in SQL code then?

Why not have an inventory of short stored procedures, either generated or manually made, that do only one task, and shop around to glue together what you need in a “main” stored procedure?

When working like this in a team:

- tasks can be divided easier when stored procedures are short

- stored procedures are easier to understand for new developers when they are short

The CONTROLLER-TASK design pattern for stored procedures makes your SQL code better maintainable.

3) Generating code versus manual maintenance

How convenient would it be to generate generic stored procedures, for instance UPDATE stored procedures for each table, a LOAD stored procedure for each hub in a datavault datawarehouse, and so on? Generating is not extremely more difficult than creating normal stored procedures. Just use tables like sys.objecs, sys.columns, and so on, and a lot is possible.

Imagine these stored procedures populate your inventory of “tasks”, ready to use and glue together in  a “main” stored procedure?

The CONTROLLER-TASK design pattern makes combining both manually made and generated SQL code to a working solution extremely easy.

4) Where to implement the error handling and automic unit using a transaction?

Is a stored procedure still reusable, if it is running in its own transaction? Maybe, if you use nested transactions. But there are drawnbacks and risks. Code can become messy when not implemented in the same way in different stored procedures.

Is a stored procedure still reusable, if it implements its own error handling? Maybe, but you have to be careful. If it ‘swallows’ the error, the stored procedure that calls it will not notice it and will continue after the error. This is almost never what you want. Besides that, implementing error handling in every stored procedure, makes your code base consist of many more lines.

So .. why not divide the “core” SQL code to run, and the “housekeeping” (automic unit, error handling)? No more confusion, it is clear where things are done, and code is easier reusable and maintainable.

The CONTROLLER-TASK design pattern divides core functionality and housekeeping code, therewith keeping your code cleaner and again .. better maintainable!

My solution, easy to implement

My answer to the challenges above is the CONTROLLER-TASK design pattern for stored procedures. As a picture says more than a thousand words, take a close look at the one below.

controller_task_design_pattern

Getting started

Allright, if you see the value of it, how can you implement the CONTROLLER-TASK design pattern?

In its basic form, you’ll need only a few things: a naming convention and two stored procedure templates.

I will tell you how I did this, so you can use or modify it in the way you prefer.

Choosing a naming convention

As explained, you have three types of stored procedures:

  • Task stored procedures (manually made)
  • Task stored procedures (generated)
  • Controller stored procedures

I just prefix my stored procedures with the typename as follows:

  • TASK_<BaseStoredProcName> for stored procedures that are manually made
  • GTSK_<BaseStoredProcName> for stored procedures that are generated
  • CTLR_<BaseStoredProcName> for controller stored procedures

Of course you can invent your own different naming convention, as long as you use it consequently and document it for new developers working on your project.

Using two templates

Here are my templates, you can modify for your own needs

Prerequisites

prerequisites.sql
CREATE SCHEMA [sp] AUTHORIZATION [dbo];

TASK stored procedures

TASK stored procedures.sql
PRINT ‘ScriptFile P10.sp.TASK_ReplaceWithStoredProcBaseName.sql’
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF EXISTS (SELECT 1 FROM sys.objects WHERE object_id = OBJECT_ID(N’sp.TASK_ReplaceWithStoredProcBaseName’) AND type in (N’P’, N’PC’))
    DROP PROCEDURE sp.TASK_ReplaceWithStoredProcBaseName
GO
/* PRESS CTRL-SHIFT-M in SSMS to fill parameters
==========================================================================================
Author :
Create date :
Parameters:
- Input : none
- Output : none
Returnvalue : 0 if executed successfully.
Description : >
==========================================================================================
*/

CREATE PROCEDURE sp. TASK_ReplaceWithStoredProcBaseName
AS
BEGIN
  ---------------------------------------------------------------------------------------------------
  -- DECLARATIONS
  ---------------------------------------------------------------------------------------------------
  --N/A
  ---------------------------------------------------------------------------------------------------
  -- INITIALIZATION
  ---------------------------------------------------------------------------------------------------
  --N/A
  ---------------------------------------------------------------------------------------------------
  -- MAIN
  ---------------------------------------------------------------------------------------------------
  SELECT 1 AS [Replace with your own code]

END
GO

 

Controller stored procedures

Controller stored procedures.sql
PRINT ‘ScriptFile P20.sp.CTLR_ReplaceWithStoredProcBaseName.sql’
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF EXISTS (SELECT 1 FROM sys.objects WHERE object_id = OBJECT_ID(N’sp.CTLR_ReplaceWithStoredProcBaseName’) AND type in (N’P’, N’PC’))
    DROP PROCEDURE sp.CTLR_ReplaceWithStoredProcBaseName
GO
/* PRESS CTRL-SHIFT-M in SSMS to fill parameters
==========================================================================================
Author :
Create date :
Parameters:
- Input : none
- Output : none
Returnvalue : 0 if executed successfully.
Description : >
==========================================================================================
*/

CREATE PROCEDURE sp.CTLR_ReplaceWithStoredProcBaseName
AS
BEGIN
  ---------------------------------------------------------------------------------------------------
  -- TEMPLATE CODE: CONFIGURATION OF CONTROLLER, BEGIN TRANSACTION, INFO LOGGING
  ---------------------------------------------------------------------------------------------------
  BEGIN TRY

    SET XACT_ABORT ON
    BEGIN TRANSACTION

    ---------------------------------------------------------------------------------------------------
    -- (NON-TEMPLATE) DECLARATIONS
    ---------------------------------------------------------------------------------------------------
    -- N/A
    ---------------------------------------------------------------------------------------------------
    -- EXECUTE TASK STORED PROCEDURES
    ---------------------------------------------------------------------------------------------------
    -- Example call:
    EXECUTE sp.TASK_ReplaceWithStoredProcBaseName
    --

    ---------------------------------------------------------------------------------------------------
    -- TEMPLATE CODE: COMMIT TRANSACTION
    ---------------------------------------------------------------------------------------------------
    COMMIT TRANSACTION

  END TRY
  BEGIN CATCH
    ---------------------------------------------------------------------------------------------------
    -- TEMPLATE CODE: CATCH BLOCK WITH ROLLBACK AND THROW
    ---------------------------------------------------------------------------------------------------
    IF @@TRANCOUNT > 0
    BEGIN
        ROLLBACK TRANSACTION;
    END
    -- Options here:
    --\
    ---> 1) Log the error (optional)
    --/
    -- Not part of the BASIC template

    --\
    ---> 2) Rethrow the error (optional)
    --/
    -- Syntax for SQL Server 2012:
    -- ; THROW

    -- Syntax for SQL Server 2008 and earlier:
    DECLARE @ORIGINAL_ERROR NVARCHAR(MAX)
          , @ERROR_SEVERITY INT = ERROR_SEVERITY()
          , @ERROR_STATE INT = ERROR_STATE()

    SELECT @ORIGINAL_ERROR = + ERROR_MESSAGE()
                        + ‘ (ERROR_NUMBER=’ + CAST(ERROR_NUMBER() AS NVARCHAR)
                        + ‘, ERROR_LINE=’ + CAST(ERROR_LINE() AS NVARCHAR)
                        + ‘, ERROR_PROCEDURE=’ + ERROR_PROCEDURE() + ‘)’
    RAISERROR(@ORIGINAL_ERROR, @ERROR_SEVERITY, @ERROR_STATE)

  END CATCH

  ---------------------------------------------------------------------------------------------------
  -- RETURN THE RESULT WHEN EXECUTED SUCCESFULLY
  ---------------------------------------------------------------------------------------------------
  RETURN 0;
    
END
GO

 

Summary

In this blogpost I have explained to you the advantages of the CONTROLLER - TASK design pattern for stored procedures, which are:

  • makes reuse of SQL code very easy.
  • makes your SQL code better maintainable.
  • makes combining both manually made and generated SQL code to a working solution extremely easy.
  • divides core functionality and housekeeping code, therewith keeping your code cleaner and again .. better maintainable!

After that you were shown how to get started with a basic implementation of this design pattern.

Thank you for reading my blogpost and I hope you enjoyed it and found it useful.

How to add a rownumber to a Data Flow Task in SSIS 2012

(This blog post was first published on 20-jun-2013 on my former blog)

The problem

For one of my projects I needed to add the file line number to the staging table, so that the raw data that was inserted into the database, would be tracable back to the source files exact line.

First I wanted to use the RowCount component in the Data Flow Task, it was easy to configure (only assign a variable) but it appeared that the variable only got the value of the current row after the Data Flow Task had finished, so you knew afterwards how many rows had gone through the dataflow. But that was not what I needed! I want an instant row number, telling me which row is flowing through right now!

The solution: a script component

So what we have to do is add a script component to the Data Flow Task.

20130620_pscr1

 

But remember, if we work with a ReadWrite variable, we will have the same problem as with the RowCount component: the value of the variable is not updated until the dataflow task has finished, so in the flow it will have the initial value for every row.

To work around this, we have to add an output column to the script component as follows:

20130620_pscr2

 

Now we have to edit the script. No ReadOnlyVariables or ReadWriteVariables need to be set, just press the <Edit Script> button.

20130620_pscr3

As the script really is just a C# class, all we have to do is add a private variable, initialize it, increase it for every row and assign the value to the outputcolumn “RowNumber”.

I will show you how:

Simply add the parts in red boxes to the script:

20130620_pscr4

20130620_pscr5

That’s it. Simple, hey?

Now, there is one more thing to do. Open your OLE DB Destination and make a mapping for the RowNumber column. In my example the database field is also called RowNumber, but this is of course not required, you can give it any name you like.

20130620_pscr6

Now you’re done, run the script and you will see that the RowNumber column in the destination table will increase for every row that is inserted!

Query hints WITH (NOLOCK) and WITH (READPAST) in action

(This blog post was first published on 18-jan-2013 on my former blog)

Download all used scripts here.

Query hints WITH (NOLOCK) and WITH (READPAST) are a great way to increase performance on (readonly) queries that run on one or more tables in which data is added , edited or deleted at the same time.

However those query hints have also some drawbacks, to be aware of, when using them.

In this article I will demonstrate how these query hints work, and when to use them.

Let’s start with some theory: what do these query hints do?

It makes no sense to copy the MSDN description here. If you need more detailed information you can read it yourself on msdn

I will just summarize what these query hints can do for me.

WITH (READPAST) – shows only committed data. When a record gets updated, it will not be selected at all, not the previous version of the record, nor the updated but uncommitted version.

WITH (NOLOCK) – shows both committed and uncommitted (dirty) data. When a record gets updated, the new uncommitted values of the record will be selected.

When to use

In general, do not use these queryhints, when you retrieve data to edit, e.g. in disconnected scenario’s like editing in a webapplication. Working with uncommitted data will in this case seldom be what you want.

When it comes to read only access, it depends. You have to decide what is more important: showing always the right (committed) data, or showing data fast. If the risk that uncommitted data is shown is acceptable, WITH (NOLOCK) is the query hint for you, if the risk that data being changed is not shown at all, WITH (READPAST) is the query hint for you.

Also for semi-static data (dimension- or system tables), that is seldom changed, the risk of the drawbacks of both queryhints, is usually acceptable.

Executing the demo

Step 1

First of all we have to do some setup work. I have made a script for that. You can run it in SQL Server Management Studio (see link to download all scripts at the top of this article).

Note that for simplicity of the demo I have created a CHAR column to be primary key. I know it is not a best practice, but for the demo it is sufficient. Normally you would always strive for numeric primary keys.

step1_setup.sql

— Step 1: Create a QueryHintsDemo database with one table, DemoData
USE [master]
GO
IF NOT EXISTS (SELECT name FROM sys.databases WHERE name = N’QueryHintsDemo’)
  CREATE DATABASE [QueryHintsDemo]
GO
USE [QueryHintsDemo]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DemoData]’) AND type in (N’U’))
DROP TABLE [dbo].[DemoData]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[DemoData](
 [DemoDataCode] CHAR(6) NOT NULL,
 [DemoDataDescription] VARCHAR(100) NOT NULL,
 CONSTRAINT [PK_dbo_DemoData] PRIMARY KEY CLUSTERED
(
 [DemoDataCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

Download all used scripts here.

Step 2

What is next?

Now you have to make sure you have two query windows open in Management Studio, so you can execute one script, and before that is finished, execute another script as well.

The first script will insert some initial data in the DemoData table, and then update it in a transaction. To be able to do this demonstration properly, the WAITFOR DELAY feature is used to wait 20 seconds before the transaction is rolled back.

In the first query window, use this script:

step2a_keep_transaction_open.sql

TRUNCATE TABLE [QueryHintsDemo].[dbo].[DemoData]
GO

INSERT INTO [QueryHintsDemo].[dbo].[DemoData]
           ([DemoDataCode]
           ,[DemoDataDescription])
     VALUES
           (‘DEMO01’
           ,‘This is the original text.’)
GO

BEGIN TRANSACTION

  UPDATE [QueryHintsDemo].[dbo].[DemoData]
     SET [DemoDataDescription] = ‘This is the updated text that gets rolled back later.’
   WHERE [DemoDataCode] = ‘DEMO01’

  WAITFOR DELAY ’00:00:20′

ROLLBACK TRANSACTION
GO

BEGIN TRANSACTION

  UPDATE [QueryHintsDemo].[dbo].[DemoData]
     SET [DemoDataDescription] = ‘This is the updated text that gets committed.’
   WHERE [DemoDataCode] = ‘DEMO01’
  
COMMIT TRANSACTION
GO

 

This is the script for the second querywindow:

step2b_select.sql

— WITH READPAST, will not show the updated, non committed record
— but also not the ‘old’ version of the record, before it got updated.
— So you potentially miss out some data when using this query hint.
SELECT ‘WITH (READPAST)’ AS [READPAST], *
FROM [QueryHintsDemo].[dbo].[DemoData] WITH (READPAST)
GO

— WITH NOLOCK, will show the updated, but uncommitted record. This is also called a “dirty read”.
— The risk here is that you use/display data, that in the end will
— not be committed and therefor never became ‘official’.
SELECT ‘WITH (NOLOCK)’ AS [NOLOCK], *
FROM [QueryHintsDemo].[dbo].[DemoData] WITH (NOLOCK)
GO

— Without using any queryhint, the select statement will wait until the
— first transaction is rolled back, and then show the original value.
— This is the most reliable, but also the slowest.
— You will notice that the first two SELECT statements will return
— a result immediately, but that the query below will only show result after almost
— 20 seconds, caused by the 20 seconds delay in the previous script.
SELECT ‘Without queryhint (1)’ AS [No hint 1], *
FROM [QueryHintsDemo].[dbo].[DemoData]

— Wait a few seconds, so that the update to be committed in the other script, is completed.
WAITFOR DELAY ’00:00:02′
— Without using any queryhint, the select statement will wait until the
— update is finished, and then show the final value.
SELECT ‘Without queryhint (2)’ AS [No hint 2], *
FROM [QueryHintsDemo].[dbo].[DemoData]

 

As you might already have understood, you should execute both scripts in SQL Server Management Studio: execute the first script, and while it is still running, start the second script in a seperate query window as soon as possible (but at least within 20 seconds after starting the first script).

Analyzing the query result

20130118_analyze_result2

 
Clean up
Use this script to clean up.

cleanup.sql

USE [master]
GO
EXEC msdb.dbo.sp_delete_database_backuphistory @database_name = N’QueryHintsDemo’
GO
ALTER DATABASE [QueryHintsDemo] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
DROP DATABASE [QueryHintsDemo]
GO

 

Conclusion

The queryhints WITH (NOLOCK) and WITH (READPAST) are a great way to improve performance.

However they both have their own drawnbacks, to be aware of, when using them:

  • When using WITH (READPAST), you potentially miss out some data that is being updated, when your select query runs.
  • When using WITH (NOLOCK), the risk here is that you use/display data, that in the end will not be committed and therefor never became ‘official’.

It depends on the situation, when it is acceptable to use these queryhints.