Category Archives: SQL Server Series

Contains blog posts related to SQL Server that cannot be placed in other categories like “SSIS Series” or “T-SQL Scripting Series”.

Generate new predictable unique identifiers from existing ones

Introduction

Today I want to share a script with you that I used to copy some database rows that use unique identifiers (or GUIDs).
I had related rows that I needed to copy (compare with OrderHeader / OrderDetail, where OrderDetail contains the OrderHeaderGuid).

Problem

When you want to copy related records that are identified by unique identifiers, you have a dependency on the unique identifier values that are used in other tables as foreign keys.
Especially when you want to copy multiple rows (having multiple different unique identifiers), creating new predictable GUIDs from the existing ones can be useful.
In the example below I explain how this can be done using an MD5 hash of the source GUID.
The example uses the AdventureWorks database.
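
The core trick in isolation looks like this (a minimal sketch; the salt value is illustrative). HASHBYTES('MD5', ..) returns 16 bytes, which is exactly the size of a UNIQUEIDENTIFIER:

DECLARE @SourceGuid UNIQUEIDENTIFIER = NEWID();
DECLARE @Salt VARCHAR(30) = '2018-01-01 12:00:00';
SELECT CONVERT(UNIQUEIDENTIFIER,
       HASHBYTES('MD5', CONVERT(CHAR(36), @SourceGuid) + @Salt)) AS [NewPredictableGuid];
-- Running this twice with the same @SourceGuid and @Salt returns the same new guid.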

Solution

TSQL-E04-script.sql

--\
---) Script: Generate new predictable GUIDs from existing ones - demo
---) Author: Hans Michiels
---) Website: www.hansmichiels.com
--/

--\
---) 1. Prepare for the demo
--/
USE [tempdb]
GO

IF OBJECT_ID('[dbo].[SalesOrderHeader]', 'U') IS NOT NULL
   DROP TABLE [dbo].[SalesOrderHeader];
IF OBJECT_ID('[dbo].[SalesOrderDetail]', 'U') IS NOT NULL
   DROP TABLE [dbo].[SalesOrderDetail];
GO

SELECT NEWID() AS [SalesOrderGUID], h.*
INTO [dbo].[SalesOrderHeader]
FROM [AdventureWorks].[Sales].[SalesOrderHeader] h

SELECT h.SalesOrderGUID, NEWID() AS [SalesOrderDetailGUID], d.*
INTO [dbo].[SalesOrderDetail]
FROM [AdventureWorks].[Sales].[SalesOrderDetail] d
JOIN [dbo].[SalesOrderHeader] h ON h.SalesOrderID = d.SalesOrderID

ALTER TABLE [tempdb].[dbo].[SalesOrderHeader] DROP COLUMN SalesOrderID;
ALTER TABLE [tempdb].[dbo].[SalesOrderDetail] DROP COLUMN SalesOrderID;
ALTER TABLE [tempdb].[dbo].[SalesOrderDetail] DROP COLUMN SalesOrderDetailID;
GO

--\
---) 2. Demonstrate how it works given the example that you would
---) want to copy an existing SalesOrder and its detail rows to
---) a new one, without having the dependency to insert the
---) header first, then retrieve the value for [SalesOrderGUID]
---) for the row, and then use it to insert the detail rows.
---) With a new predictable unique identifier this lookup
---) is not needed.
--/

-- First we declare a value to use as 'salt'. This makes it possible to copy
-- the same source rows to new rows multiple times, because the generated
-- GUIDs will be different every time the script runs.
-- We keep it simple and just use a timestamp as the salt value.
DECLARE @NewGuidSalt VARCHAR(30) = CONVERT(VARCHAR, GETDATE(), 120)
PRINT @NewGuidSalt

-- Again I keep the example very simple, almost hypothetical, because the action
-- below is not likely in real life: we select 3 sales orders of a specific
-- CustomerID and copy them to a different CustomerID.
-- However, it explains the concept.
DECLARE @SourceCustomerID INT
DECLARE @TargetCustomerID INT

SELECT
  @SourceCustomerID = 11000
, @TargetCustomerID = 99999

-- Obviously it is important to run the entire operation
-- in a transaction and implement TRY CATCH to roll back
-- the transaction when an error occurs. For instance
-- a hash collision could occur (but the chance that it
-- happens is astronomically small).
BEGIN TRY

    BEGIN TRAN

    --\
    ---) 2a. Copy [SalesOrderHeader]
    --/
    INSERT INTO [dbo].[SalesOrderHeader]
      ( [SalesOrderGUID]
      , [RevisionNumber]
      , [OrderDate]
      , [DueDate]
      , [ShipDate]
      , [Status]
      , [OnlineOrderFlag]
      , [SalesOrderNumber]
      , [PurchaseOrderNumber]
      , [AccountNumber]
      , [CustomerID]
      , [SalesPersonID]
      , [TerritoryID]
      , [BillToAddressID]
      , [ShipToAddressID]
      , [ShipMethodID]
      , [CreditCardID]
      , [CreditCardApprovalCode]
      , [CurrencyRateID]
      , [SubTotal]
      , [TaxAmt]
      , [Freight]
      , [TotalDue]
      , [Comment]
      , [rowguid]
      , [ModifiedDate]
      )
    SELECT
        CONVERT(UNIQUEIDENTIFIER, HASHBYTES('MD5', CONVERT(CHAR(36), src.[SalesOrderGUID]) + @NewGuidSalt)) AS [SalesOrderGUID]
      , [RevisionNumber]
      , [OrderDate]
      , [DueDate]
      , [ShipDate]
      , [Status]
      , [OnlineOrderFlag]
      , [SalesOrderNumber]
      , [PurchaseOrderNumber]
      , [AccountNumber]
      , @TargetCustomerID AS [CustomerID]
      , [SalesPersonID]
      , [TerritoryID]
      , [BillToAddressID]
      , [ShipToAddressID]
      , [ShipMethodID]
      , [CreditCardID]
      , [CreditCardApprovalCode]
      , [CurrencyRateID]
      , [SubTotal]
      , [TaxAmt]
      , [Freight]
      , [TotalDue]
      , [Comment]
      , NEWID() AS [rowguid]
      , GETDATE() AS [ModifiedDate]
    FROM
        [dbo].[SalesOrderHeader] src
    WHERE
        src.CustomerID = @SourceCustomerID

    --\
    ---) 2b. Copy [SalesOrderDetail] rows
    --/
    INSERT INTO [dbo].[SalesOrderDetail]
      ( [SalesOrderGUID]
      , [SalesOrderDetailGUID]
      , [CarrierTrackingNumber]
      , [OrderQty]
      , [ProductID]
      , [SpecialOfferID]
      , [UnitPrice]
      , [UnitPriceDiscount]
      , [LineTotal]
      , [rowguid]
      , [ModifiedDate]
      )
    SELECT
        CONVERT(UNIQUEIDENTIFIER, HASHBYTES('MD5', CONVERT(CHAR(36), src.[SalesOrderGUID]) + @NewGuidSalt)) AS [SalesOrderGUID]
      , NEWID() AS [SalesOrderDetailGUID]
      -- For [SalesOrderDetailGUID] we simply use NEWID(), because it is not used anywhere else.
      , src.[CarrierTrackingNumber]
      , src.[OrderQty]
      , src.[ProductID]
      , src.[SpecialOfferID]
      , src.[UnitPrice]
      , src.[UnitPriceDiscount]
      , src.[LineTotal]
      , NEWID() AS [rowguid]
      , GETDATE() AS [ModifiedDate]
    FROM
        [dbo].[SalesOrderDetail] src
    JOIN
        [dbo].[SalesOrderHeader] h
        ON h.SalesOrderGUID = src.SalesOrderGUID
    WHERE
        h.CustomerID = @SourceCustomerID

    COMMIT TRAN;

END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;
    DECLARE @ErrorMessage NVARCHAR(MAX) = ERROR_MESSAGE();
    RAISERROR(@ErrorMessage, 16, 1);
END CATCH

--\
---) 3. Now view the results
---) Note that the order detail rows are copied using the
---) same generated guid for [SalesOrderGUID]. If you rerun
---) the script starting at “2. Demonstrate ..”, you get a
---) second copy of the same orders and order detail rows,
---) but with a different generated guid for [SalesOrderGUID],
---) due to the salt that was used.
--/
SELECT *
FROM
    [dbo].[SalesOrderHeader] h
WHERE
    h.CustomerID IN (@SourceCustomerID, @TargetCustomerID)

SELECT d.*
FROM
    [dbo].[SalesOrderDetail] d
JOIN
    [dbo].[SalesOrderHeader] h
    ON h.SalesOrderGUID = d.SalesOrderGUID
WHERE
    h.CustomerID IN (@SourceCustomerID, @TargetCustomerID)

GO

--\
---) 4. Cleanup
--/
USE [tempdb]
GO

IF OBJECT_ID('[dbo].[SalesOrderHeader]', 'U') IS NOT NULL
   DROP TABLE [dbo].[SalesOrderHeader];
IF OBJECT_ID('[dbo].[SalesOrderDetail]', 'U') IS NOT NULL
   DROP TABLE [dbo].[SalesOrderDetail];
GO

Download the script here.

© 2018 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

The SEQUENCE, an updatable alternative for an IDENTITY column

Introduction

Recently I used SEQUENCES instead of IDENTITY columns. I was aware of their existence but there was never a need to use them .. until now.

Problem

Identity values of another system needed to be preserved. I know it is debatable whether you would normally want this, but without going into detail, I can tell you that this time it made sense.
But the problem with IDENTITY columns is that you can only assign a specific value during INSERT using the IDENTITY_INSERT option.
In my case this was not a simple solution. I would need a complicated script to create a new table, use IDENTITY_INSERT to copy over the rows from the current table, then recreate all foreign keys that refer to the old table so that they refer to the new table, then drop foreign keys referring to the old table, then drop the old table, then rename the new table to the old name. Sounds like a complicated plan, right? This is what I thought as well ..
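
As a reminder: the only supported way to force a specific value into an IDENTITY column is during an INSERT. A minimal sketch with a hypothetical table [dbo].[MyTable] (Id INT IDENTITY(1,1), [Name] NVARCHAR(50)):

SET IDENTITY_INSERT [dbo].[MyTable] ON;
INSERT INTO [dbo].[MyTable] ([Id], [Name]) VALUES (42, N'Example');
SET IDENTITY_INSERT [dbo].[MyTable] OFF;
-- An UPDATE of the identity column, however, is simply not allowed:
-- UPDATE [dbo].[MyTable] SET [Id] = 43 WHERE [Id] = 42;
-- Msg 8102: Cannot update identity column 'Id'.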


Picture credits: © Can Stock Photo / Andreus

Solution

Using SEQUENCES made it possible to UPDATE the values. But I must admit, if you already have a table with an IDENTITY column, with rows, with foreign keys, it is already too late to switch to using a SEQUENCE, or at least you need the complicated plan, described above.
But in my case I generated a data warehouse with biGENiUS, a data warehouse automation tool.
Then I did an initial load, and then I wanted to set some IDs right. I added IDs to Data Vault 2.0 hubs to use for SCD Type 1 Dimensions.
The flexibility of biGENiUS made it possible for me to change the template for the generated hub, and use a column that gets a value from a SEQUENCE instead of an IDENTITY column. So 15 minutes later I had what I wanted, an updatable sequence number in the hub that had a hashkey as primary key.
Without going into more details about why I chose this approach and whether it is good or not, I want to share the scripts with you.

First the script for the table with a sequence number/autonumbered column, in my case a hub table in a data vault.

create_hub.sql

USE [MSSQL_E13]
GO
IF SCHEMA_ID(N'raw') IS NULL
    EXEC(N'CREATE SCHEMA [raw]');
GO

IF OBJECT_ID (N'[raw].[Example_Hub]', N'U') IS NOT NULL
    DROP TABLE [raw].[Example_Hub];
GO

CREATE TABLE [raw].[Example_Hub] (
     [Example_HK] CHAR(32) NOT NULL -- the hashkey
    ,[Example_BK] NVARCHAR(10) NOT NULL -- the business key
    ,[SA_SourceSystem_ID] SMALLINT NOT NULL
    ,[DW_Load_ID] BIGINT NOT NULL
    ,[DW_Sequence_ID] INT NOT NULL -- don't add IDENTITY(1,1) here,
                                   -- the alternative is below.
    ,CONSTRAINT [PK_Example_Hub] PRIMARY KEY NONCLUSTERED (
          [Example_HK] ASC -- due to the distributed character of
          -- the hashkey, it is created as NONCLUSTERED.
    )
    ,CONSTRAINT [IX_UN_Example_Hub] UNIQUE (
          [Example_BK]
    )
    ,CONSTRAINT [IX_UC_Example_Hub] UNIQUE CLUSTERED (
          [DW_Sequence_ID] -- the additional sequence number is
          -- created with a UNIQUE CLUSTERED index.
    )
);
GO
-- Create a sequence to use instead of IDENTITY.
-- Note that the sequence is not bound to the table.
-- It is in our case only related to the table using a naming
-- convention {TABLENAME}_Sequence, but you could use a sequence
-- to be used by multiple tables, although I do not know why you
-- would want that (it could become a bottleneck for insertions).
IF OBJECT_ID('[raw].[Example_Hub_Sequence]', 'SO') IS NULL
    CREATE SEQUENCE [raw].[Example_Hub_Sequence] AS INT START WITH 1;

-- Add default constraint for column [DW_Sequence_ID] so that
-- it uses the next value of the sequence.
ALTER TABLE [raw].[Example_Hub]
    ADD CONSTRAINT [DF_raw_Example_Hub__DW_Sequence_ID]
    DEFAULT NEXT VALUE FOR [raw].[Example_Hub_Sequence]
    FOR [DW_Sequence_ID];
GO
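
A quick sanity check (a sketch; the key values are made up) shows that the default constraint pulls the next value from the sequence:

INSERT INTO [raw].[Example_Hub] ([Example_HK], [Example_BK], [SA_SourceSystem_ID], [DW_Load_ID])
VALUES (REPLICATE('0', 32), N'BK-1', 1, 1);
SELECT [DW_Sequence_ID] FROM [raw].[Example_Hub] WHERE [Example_BK] = N'BK-1'; -- returns 1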

Then the script to update the sequence number from an existing table, based/joined on the business key.

update_sequence.sql

--\
---) Keep ExampleIds
---) Obviously it only makes sense to run this script when
---) (generated) values have been loaded into the table and its
---) [DW_Sequence_ID] column, which you now want to change.
--/
BEGIN TRY
    BEGIN TRAN;

    --\
    ---) Swap the sign of all ID's to a negative value.
    ---) This works only if all values are positive.
    --/
    UPDATE hub
      SET [DW_Sequence_ID] = -1 * [DW_Sequence_ID]
      FROM [raw].[Example_Hub] hub
      WHERE [DW_Sequence_ID] > 0;

    --\
    ---) Now update the values.
    --/
    UPDATE hub
      SET [DW_Sequence_ID] = curdb.ExampleId
      FROM [raw].[Example_Hub] hub
      JOIN [ExistingDB].[dbo].[dim_Example] curdb
        ON curdb.ExampleCode = hub.[Example_BK] -- matches the business key column defined in create_hub.sql

    --\
    ---) Make sure that the next value of the sequence will be higher
    ---) than the IDs already used in the existing table.
    --/
    DECLARE @MaxID_New INT = ISNULL((SELECT MAX([DW_Sequence_ID]) FROM [raw].[Example_Hub]), 0) + 1;
    DECLARE @sql VARCHAR(1000)
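    -- Dynamic SQL is needed here: ALTER SEQUENCE .. RESTART WITH accepts
    -- only a constant value, not a variable.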

    SET @sql = 'ALTER SEQUENCE [raw].[Example_Hub_Sequence] RESTART WITH ' + CONVERT(VARCHAR, @MaxID_New) + ';'
    PRINT @sql
    EXEC(@sql);

    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;
    THROW;
END CATCH

--\
---) If not all values are updated, you have to check manually.
---) Maybe you can just swap the sign again to make it a positive value, if that
---) does not conflict with existing values.
--/
IF EXISTS(SELECT 1 FROM [raw].[Example_Hub] hub WHERE [DW_Sequence_ID] < 0)
BEGIN
    PRINT 'Not all rows of [Example_Hub] are updated, please check.'
    SELECT * FROM [raw].[Example_Hub] hub WHERE [DW_Sequence_ID] < 0;
END

GO

Download the scripts here.

Conclusion / Wrap up

A sequence is an object type in SQL Server that is, as far as I know, not very widely used. But it can be useful!
It can be used as an alternative to a column with an IDENTITY specification, if you want to update the value later on.
If you update the values, please make sure that afterwards the SEQUENCE object will only return values that are higher than the updated values. It is your responsibility to make this happen. An example of how to do this is shown in the second script.

(c) 2017 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

How to mimic a wildcard search on Always Encrypted columns with Entity Framework

Introduction

The title of this post should have been “How to implement wildcard search functionality with Always Encrypted, make deterministic encryption safer, and load initial data using SqlBulkInsert”, but as you understand, that’s just too long for a title.

A number of years ago I built a web application with “always encrypted” data, stored in VARBINARY columns, before Microsoft offered this feature out of the SQL Server box. So in case the database server was compromised by hackers, no client details could be revealed.

Just like Microsoft's solution, my Data Access Layer, built in .NET, did the encryption and decryption, transparent to other code that accessed it. And I had the same challenge: how to search on encrypted columns.

So it was with (maybe more than average) interest that I looked forward to learning how the Always Encrypted feature of SQL Server 2016 works.

The full article is posted on SQLShack.com, so you can read further there ..

Temporal Table applications in SQL Data Warehouse environments (SQL Server Series)

Today the subject of investigation is the Temporal Table, which is a new feature in SQL Server 2016. My focus will be slightly on how to use it in Data Warehouse environments, but some general information passes by as I write.
I want to cover the following topics:

  1. What is a temporal table (in short)?
  2. Can I use a temporal table for a table in the PSA (Persistent Staging Area)?
  3. Can I use a temporal table for a Data Vault Satellite?
  4. Is using temporal tables for full auditing in an OLTP system a good idea?
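
For those who have never seen one, a minimal system-versioned (temporal) table definition could look like this (a sketch, assuming SQL Server 2016 or later; all names are illustrative):

CREATE TABLE [dbo].[Customer_Demo]
(
    [CustomerID] INT NOT NULL PRIMARY KEY CLUSTERED,
    [Name] NVARCHAR(100) NOT NULL,
    [ValidFrom] DATETIME2(2) GENERATED ALWAYS AS ROW START NOT NULL,
    [ValidTo] DATETIME2(2) GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME ([ValidFrom], [ValidTo])
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = [dbo].[Customer_Demo_History]));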

The full article is posted on SQLShack.com, so read further there ..

Using a Persistent Staging Area: What, Why, and How

Change log

Date Changes
21 February, 2017 Due to a very useful discussion on LinkedIn with Dan Linstedt, I have made some changes in this article to emphasize that a PSA does not have a permanent place in any EDW
(the exception being, as Dan suggests, the Hadoop NoSQL platform used for slower storage mechanisms).

Today I want to discuss the Persistent Staging Area in an Enterprise Data Warehouse, or PSA for short. First I want to give a credit to Roelant Vos who has blogged about this subject before. He triggered me to go further with this concept to see “what’s in it for me” and to do a proof of concept using SQL Server 2016.

What is a Persistent Staging Area or PSA?

As an introduction, I want to tell what a (non-persistent) Staging Area is. Some of you will already know, so be it. A Staging Area is a “landing zone” for data flowing into a data warehouse environment. The data in a Staging Area is only kept there until it is successfully loaded into the data warehouse. Between two loads, all staging tables are made empty again (or dropped and recreated before the next load).
A Persistent Staging Area on the other hand, is a Staging Area that is not wiped out between loads: it contains full history from the source(s) that deliver data to it.


Picture credits: © creativecommonsstockphotos | Dreamstime Stock Photos
The “Staging Area” for passengers at an airport ..

Why would you want to make use of a PSA, and what are its pros and cons?

A PSA makes it possible to reload the next layer in the Data Warehouse (for example a Data Vault) when requirements change during development of the Data Vault / EDW without going back to the original source(s). The original source(s) might not contain all history anymore (e.g. because it is an OLTP system or a web service returning only current data) or it is cumbersome to do such a reload because the source data is delivered in large numbers of flat files.

As you might know, for data warehouses with (semi-)structured data I always preach a multi-tier architecture, including a Data Vault for storing the full history of unmodified source data and one or more dimensional models (virtual if performance allows, otherwise physical) as data feeds for cubes, reports, dashboards and BI tools.

So, I hear you think, in a Data Vault you already have full history, why would you need a PSA then? The answer is simple: because building a Data Vault takes time. You do not load all data on day 1: maybe there are uncertainties about the model, and most likely your resources are restricted. So you build the Data Vault in increments. But you can start capturing as much source data as you can! When Data Warehouse Automation tools are used, this is fairly easy to build. So even though you do not take the data further into your data warehouse than the (Persistent) Staging Area, you already start collecting your history of data.

Note: a different reason to use a PSA could be that your source systems contain high data volumes and are geographically spread: you could then create a PSA on premise near the source system, and export only delta data from it – each row would get an I, U or D indication for Insert, Update and Delete – to massively reduce data traffic between source and EDW. I will not discuss this use case scenario today, because then pros and cons would be completely different, and time travelling and full reload would not be applicable.

Using a PSA has its pros and cons. To mention a few:

Pros:

  • You collect data, even for those parts of the data warehouse you have not properly designed and developed yet.
  • You can clear the Data Vault, or even change the model completely (this can happen, for instance when business concepts initially were not well understood), create new ETL Code to load data to the modified Data Vault and do a replay of loading full history of the data captured in the PSA.
  • It is important that the PSA is implemented with changed rows only. If done so, it does not have excessive storage needs. So like in a Data Vault Satellite, a new row for a table in the PSA is only added when something has changed for the primary (surrogate or natural) key. A start and end date/time can be used to implement this.
  • A (relational) PSA should have an optional, temporary character, and is not part of the core EDW tiers. A (relational) PSA should have an optional, temporary character, and is not part of the core EDW tiers! This was written twice on purpose! Do not forget! It is useful while building the EDW, but must be disarmed when the EDW has gone into production and has reached a maintenance phase. A good way to make sure this happens is to plan a decommission date and stick to the plan (this can also be a plan like “after sources X, Y and Z are connected and have been data feeds in production for 3 months”). Do not keep the PSA around for no good reason.
    In this way, most of the cons below are very likely not applicable!

Cons:

  • You can argue that you are in fact building a second data warehouse.
  • When sources change over time, your PSA might not reflect the current structure of the source system(s) anymore (e.g. a column is removed in the source system but you leave it in the PSA so that you can add new data with the column empty but keep the data in this column for older rows). This might be confusing.
  • When source systems are not used anymore (for instance when accounting software is replaced by a similar product from a different vendor) you need to add the tables of the new system to the PSA, while you also keep the tables from the old system, which will not receive new data anymore. Now you need to keep the ETL code to load from the “old system's” PSA tables, in case a full reload is needed, and you also need to develop code to get data from the “new system's” PSA, and might need logic to decide which tables should be loaded for which point in time. As you can understand, the maintenance of your PSA can now start to become a heavy burden. So going back to the pros, make sure your PSA is optional and can be disarmed when the maintenance efforts no longer justify the benefits.

How could you implement a PSA?

To give an idea how this fits in my favourite architecture, I’ve added the PSA to it.
For a normal load the PSA is just loaded from the Staging Area (preferably in parallel with loading the Enterprise Data Warehouse from the Staging Area).
A Virtual Staging Access Layer (for example implemented with views) ensures that ETL code for loading the Enterprise Data Warehouse can switch easily between the Staging Area and the PSA for loading data. An example of how to implement this follows below.


How the PSA fits in the EDW architecture

For the next schema I owe you an explanation, because I know it’s a bit weird to mix architecture with processing in one schema, but in this case I thought it made sense and could clarify how to reload the EDW from the PSA: no sources and Staging Area are involved, but instead a time travelling mechanism queries the data from the PSA through the Virtual Staging Access Layer, to replay all loads that have been done so far, but with modified ETL Code to load data into a modified or extended Data Vault.


Reloading the DWH from the Persistent Staging Area

To set up time travelling, you could create a list of load sessions and their start date/times, then subtract a small time fraction and use that value as the Point-In-Time date/time value to retrieve all rows from the previous load.
Note: For the DATETIME2 columns, I use a precision of centiseconds, so DATETIME2(2). For justification see one of my older blog posts: Stop being so precise! and more about using Load(end)dates (Datavault Series). If needed you can use a higher precision for your data warehouse.

In a table:

Load no               LoadStartDatetime        Point-In-Time value
1                     2017-02-01 02:00:01.74   2017-02-02 02:00:00.88
2                     2017-02-02 02:00:00.89   2017-02-02 02:00:02.11
3 (last load so far)  2017-02-03 02:00:02.12   9999-12-31 23:59:59.98 (! not 99 !)

The Time Travel icon with the arrows in the picture above refers to the process that loops through all Start date/time values chronologically and triggers execution of the load process from PSA to EDW for every Point-In-Time value.

SQL Server 2016 implementation: how could you implement loading the PSA from the Staging Area?

This logic is very similar to loading a Data Vault Satellite. For each staging table you should create a “History” version in the PSA database, with two extra columns: EffectiveStartDts and EffectiveEndDts and a primary key defined as the primary key of the staging table plus the EffectiveStartDts. Most concepts of loading a satellite could be applied: you could add rows that are new, insert new versions of rows that have changed, and mark absent rows (missing in the source) as such. For the PSA I would suggest just to update the end date time so that this row will simply not be in the selection when a later point-in-time is used. Also there is no need to create a separate “hub” table for each staging table.
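
To make this concrete, a sketch of such a “History” table for a staging table [stg].[Customer] could look like this (column list shortened; the names match the view example further below):

CREATE TABLE [psa].[CustomerHistory]
(
    [CustomerID] INT NOT NULL,
    [FirstName] NVARCHAR(100) NULL,
    -- .. the other columns of the staging table ..
    [RowHash] CHAR(32) NULL,
    [EffectiveStartDts] DATETIME2(2) NOT NULL,
    [EffectiveEndDts] DATETIME2(2) NOT NULL,
    CONSTRAINT [PK_CustomerHistory]
        PRIMARY KEY ([CustomerID], [EffectiveStartDts])
);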

SQL Server 2016 implementation: how could you implement the historic reload from the PSA?

In a previous blog post Isolate cross database “databasename” dependencies using synonyms and schemas I already advocated the concept of putting each tier of the EDW in its own database schema, and use synonyms to achieve interconnectivity between objects in different databases. In the rest of this article I will assume that you understand this concept (if not, you could read this blog post first).

With some imagination, we can use a modified version of this concept by creating the Virtual Staging Access Layer in the PSA database. This virtual layer is needed for loading data from the PSA instead of the Staging Area.
Suppose the Staging Database has a database schema stg and that schema contains all tables that can be loaded into the EDW / Raw Data Vault.
We can do a similar thing in the PSA database: create a stg schema, but instead of tables this schema contains views with the same names and columns as the tables in the Staging Database! These views select the rows that are valid at a certain point in time.

The challenge to retrieve only the valid rows at a point in time is an easy one: To the PSA database we add a table [stg].[PointInTime] with only one row and a [CurrentPointInTime] column of datatype DATETIME2. Now we can build a SELECT statement on the PSA table (same as the staging table, but with start- and end date/time stamps to hold historic changed and deleted rows) with a CROSS JOIN on the PointInTime table and a WHERE clause to select only the rows valid at the given point in time.
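
The point-in-time helper table itself is trivial (a sketch):

CREATE TABLE [stg].[PointInTime]
(
    [CurrentPointInTime] DATETIME2(2) NOT NULL
);
-- Initially select “the end of time”, so the views return the current rows:
INSERT INTO [stg].[PointInTime] ([CurrentPointInTime])
VALUES ('9999-12-31 23:59:59.98');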

An example of how such a view could look:

CREATE_VIEW_stg_Customer.sql

CREATE VIEW [stg].[Customer]
AS
/*
==========================================================================================
Author: Hans Michiels
Create date: 15-FEB-2017
Description: View that has exactly the same structure as a table with the same name
             in the Staging Database.
             This view can be used for full reloads of all data in the PSA Database.
==========================================================================================
*/

SELECT
    [CustomerID],
    [FirstName],
    [Initials],
    [MiddleName],
    [SurName],
    [DateOfBirth],
    [Gender],
    [SocialSecurityNumber],
    [Address],
    [PostalCode],
    [Residence],
    [StateOrProvince],
    [Country],
    [RowHash]
FROM
    [psa].[CustomerHistory] hist
CROSS JOIN
    [stg].[PointInTime] pit
WHERE
    hist.EffectiveStartDts <= pit.CurrentPointInTime
    AND hist.EffectiveEndDts >= pit.CurrentPointInTime -- When INCLUSIVE enddating is used.
    -- AND hist.EffectiveEndDts > pit.CurrentPointInTime -- When EXCLUSIVE enddating is used.
GO

The next steps are:

  1. Make sure your ETL Code to load the data has a mechanism to skip loading data from the sources to the Staging Area and from the Staging Area to the PSA when the PSA is used instead of the normal Staging Area. For instance you could add a view [stg].[IsPSA] to both the Staging database and the PSA database, returning a single row with a single value “False” or “True”, respectively. In your package you could retrieve this value using the Staging Connection Manager, and change the flow of your package depending on whether the Staging database or the PSA database is connected.
  2. Build a time travelling mechanism (could be a FullReloadMaster.dtsx package) that updates the stg.PointInTime table for every point-in-time date/time, and after every update starts the ETL Process (e.g. a Master.dtsx SSIS package) with a changed connection string, connected to the PSA instead of the “normal” Staging database! A sketch of this mechanism follows below.
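
To give an impression of step 2, here is a T-SQL sketch of the time travelling loop (in an SSIS implementation this would be the FullReloadMaster.dtsx package; the table [stg].[LoadSessions] holding the point-in-time values is hypothetical):

DECLARE @PointInTime DATETIME2(2);

DECLARE pit_cursor CURSOR LOCAL STATIC FOR
    SELECT [PointInTimeValue]
    FROM [stg].[LoadSessions] -- hypothetical table: one row per historic load
    ORDER BY [PointInTimeValue];

OPEN pit_cursor;
FETCH NEXT FROM pit_cursor INTO @PointInTime;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE [stg].[PointInTime] SET [CurrentPointInTime] = @PointInTime;
    -- Here: start the ETL process (e.g. Master.dtsx), connected to the PSA database.
    FETCH NEXT FROM pit_cursor INTO @PointInTime;
END
CLOSE pit_cursor;
DEALLOCATE pit_cursor;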

Here is a possible implementation of the [stg].[IsPSA] view, based on the fact that the PSA database will have _PSA_ in its name and the staging database will not. This allows you to use the same version of the view for the Staging and PSA databases.

CREATE_VIEW_stg_IsPSA.sql

CREATE VIEW [stg].[IsPSA]
AS
/*
==========================================================================================
Author: Hans Michiels
Create date: 10-FEB-2017
Description: View that returns one row with a bit column [IsPSA] that indicates
             if the current database is the PSA or the STG.
==========================================================================================
*/

SELECT
  CONVERT(BIT,
    CASE
    WHEN CHARINDEX('_PSA_', DB_NAME()) > 0 THEN 1
    ELSE 0
    END) AS [IsPSA]
GO
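
Usage is as simple as it gets; in a package you would run this query via the Staging Connection Manager (a sketch):

SELECT [IsPSA] FROM [stg].[IsPSA]; -- returns 1 on the PSA database, 0 on the Staging database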


An example of how a FullReloadMaster.dtsx package could look. It is important that the Staging database connection string is a parameter of Master.dtsx, and is passed in by FullReloadMaster.dtsx, but now as a connection to the PSA database.

Now the next challenge is how to get this working. If you use an SSIS dataflow with an OLEDB Source (and a destination), it is not so difficult: you can just change the connection string of the OLEDB Source. I have done a proof of concept and it worked! You do not have to redeploy SSIS Packages. If views in the PSA database are used with exactly the same structure as the tables in the Staging database, the SSIS Package agrees with the “metadata” and “VS_NEEDSNEWMETADATA” or “VS_ISBROKEN” errors do not occur. Of course you could also make views for reading the Staging tables in the Staging database. This would create an additional layer of abstraction, because then in both the Staging Area and the PSA you read the data from views, and you could, when needed, implement views whose structure differs from the underlying tables.

If you use SQL code (e.g. stored procedures) this will be a little more work, but still doable. You can follow one of the following strategies for storing the SQL Code:

  • a push strategy: SQL code is in the PSA database, and should then also be in the Staging database. From there, you push the data to the raw Data Vault.
  • a pull strategy: SQL code is in the raw Data Vault. From there you pull the data from the Staging Area or PSA. You will need to switch via synonyms: drop the stg synonyms and recreate them, now pointing to the PSA database, before a historic reload is done (see the sketch after this list). Afterwards, the reverse action needs to take place to let them point to the Staging database again.
  • an outsider strategy: SQL code is in a separate database, and has synonyms for both staging and raw Data Vault tables and views. Now the same switch via synonyms method can be used.
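
A sketch of the synonym switch mentioned in the pull and outsider strategies (database and synonym names are illustrative):

-- Before a historic reload: point the synonym to the PSA database.
IF OBJECT_ID('[stg].[Customer]', 'SN') IS NOT NULL DROP SYNONYM [stg].[Customer];
CREATE SYNONYM [stg].[Customer] FOR [Demo_PSA].[stg].[Customer];

-- After the historic reload: point it back to the Staging database.
IF OBJECT_ID('[stg].[Customer]', 'SN') IS NOT NULL DROP SYNONYM [stg].[Customer];
CREATE SYNONYM [stg].[Customer] FOR [Demo_STG].[stg].[Customer];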

A few considerations:

  • a push strategy forces you to install your SQL code twice: in the Staging and the PSA database. Also a Staging database might not feel like the “right place” for stored procedures.
  • a pull strategy and an outsider strategy force you to change synonyms, something that normally only has to be done during a deployment, but now before a full reload (and afterwards the reverse action).

I have a slight preference for switching via synonyms, assuming that a full reload will not happen often.
Still, using the push strategy could be a valid choice too.

Conclusion / Wrap up

In this post you could read more about the concept of a Persistent Staging Area. Although I have not used it in production environments yet, I think in some projects it can be very useful.
Important to keep in mind:

  • Make sure that your data warehouse can survive if the PSA is switched off. So do not expose it to end users or make it a mandatory part of your ETL Code.
  • Plan a decommission moment when the PSA is created and stick to the plan. Do not keep the PSA around for no good reason.
  • Use code generation tools to update the PSA from the staging area to limit the development effort to set it up.
  • Virtualize the read access to the staging tables, for instance by using views.

If you stick to these “rules” I think a PSA can prove its value for your Enterprise Data Warehouse!

(c) 2017 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Free download: Blog Yearbook 2016

I have made a compilation of all my blog posts of the past year, and it was more work than you would think.
Especially because it also contains a previously unpublished article.

After filling in the form you can download the e-book for free.

    I promise you the following:

    • I will not give or sell your email address to any third party.
    • I will not spam you.
    • If I start with a newsletter, you can easily unsubscribe or change the frequency.

    (c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

    Making your SQL Agent job scripts generic for all servers (SQL Server Series)

    Introduction

    If you schedule jobs and you are using SQL Server (not the Express Edition), SQL Agent is your best friend. I think it is easier to use than the Windows Scheduler for most use cases.
    The GUI that SQL Server Management Studio has for creating and editing SQL Agent Jobs is also quite good, I think.

    Problem

    However, after the agent jobs are created, the SQL Agent Job scripting done by SQL Server Management Studio has a number of issues (Microsoft developers of the SQL Server team, please read!).
    Where shall I start ..

    • I like to write repeatable scripts, that raise an error only when something is really wrong, so when the script is being executed as expected, no errors are thrown. The generated scripts however use brute force to delete any existing jobs. If they do not exist yet, an error occurs. This is not what I would call a real error, because it is logical that when a job does not exist yet, it cannot be deleted. A simple IF statement could check if the job exists and delete the job only if it exists.
    • Even worse, the generated scripts use the @job_id, a unique identifier, to delete the job. By definition, this will work only once! After the job is (re)created, the job_id will be different! It would be much better to use the @job_name as parameter for sp_delete_job (yes this is possible), it is just that the SQL Server developer team made the choice to use the @job_id .. 🙁
    • Because the scripts always delete and (re)create jobs, instead of altering/modifying them, your entire job history is gone when you change anything in your script and execute it on your server (sorry, no workaround for this).
    • Any (SQL) Server specific values, like SSIS Environment IDs and SQL instance names, can make your scripts suitable for only one server: the one you created the job on with the SSMS GUI.

    Solution

    Given all the challenges above, I was looking for a way so:

    • I could create the SQL Agent jobs with the SSMS GUI.
    • Then script the jobs and put them under source control.
    • Then also make sure I can install the SQL Agent jobs on a different server (development/test/production) with the same (nearly) unmodified script.

    For this to work, we are again visiting the system objects area of the msdb database.

    The plan is roughly as follows:

    1. Create a wrapper stored procedure for sp_add_jobstep.
    2. Create a generic install script that partially is manual labour, that you have to do only once, and partially contains a script generated by SSMS for creating jobs.

    Create a wrapper stored procedure for sp_add_jobstep.

    This wrapper stored procedure, called [dbo].[usp_add_jobstep_wrapper] consists of the following three parts:

    • PART 1: Check if a temp table #jobstep_replacements exists with project-/customer specific replacements: this temporary table can be created in the script that creates the jobsteps and contains info about textual replacements to be done in any (N)VARCHAR parameter of the stored procedure sp_add_jobstep. It would be neater to pass a table variable as a parameter to the stored procedure, but this would make adjusting scripted jobs more work, because a parameter would have to be added to the EXECUTE [dbo].[usp_add_jobstep_wrapper] command.
    • PART 2: Fix environment references in SSIS commands: a feature that is open for improvement in a future SQL Server Service Pack is that when you create a jobstep that executes an SSIS Package, and you use a reference to an environment, the script contains a technical ID that is most likely different on another server. This is also not easy to fix manually: you have to look up all the new IDs with a query, and then change them in the script. Good as occupational therapy, but not so good for your productivity.
      So this part of the stored procedure fixes the /ENVREFERENCE for you.
    • PART 3: After doing the replacements in the parameter values, and fixing the /ENVREFERENCE, the built-in stored procedure sp_add_jobstep is executed.
      This implies that we can simply execute [dbo].[usp_add_jobstep_wrapper] instead of [dbo].[sp_add_jobstep], and PART 1 and PART 2 will be done in addition.

    And here it is (can also be downloaded).

    msdb.dbo.usp_add_jobstep_wrapper.sql

    --\
    ---) hansmichiels.com [msdb].[dbo].[usp_add_jobstep_wrapper]
    ---) Author: Hans Michiels
    ---) Stored procedure that can help to make SQL Server Agent job scripts usable for multiple servers.
    --/
    /*
    (c) Copyright 2016 - hansmichiels.com
     
    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.
     
    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.
     
    You should have received a copy of the GNU General Public License
    along with this program. If not, see http://www.gnu.org/licenses/.
    */

    USE [msdb]
    GO
    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[usp_add_jobstep_wrapper]') AND type in (N'P', N'PC'))
      DROP PROCEDURE [dbo].[usp_add_jobstep_wrapper]
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER OFF
    GO
    CREATE PROCEDURE [dbo].[usp_add_jobstep_wrapper]
      @job_id UNIQUEIDENTIFIER = NULL, -- Must provide either this or job_name
      @job_name sysname = NULL, -- Must provide either this or job_id
      @step_id INT = NULL, -- The proc assigns a default
      @step_name sysname,
      @subsystem NVARCHAR(40) = N'TSQL',
      @command NVARCHAR(max) = NULL,
      @additional_parameters NVARCHAR(max) = NULL,
      @cmdexec_success_code INT = 0,
      @on_success_action TINYINT = 1, -- 1 = Quit With Success, 2 = Quit With Failure, 3 = Goto Next Step, 4 = Goto Step
      @on_success_step_id INT = 0,
      @on_fail_action TINYINT = 2, -- 1 = Quit With Success, 2 = Quit With Failure, 3 = Goto Next Step, 4 = Goto Step
      @on_fail_step_id INT = 0,
      @server sysname = NULL,
      @database_name sysname = NULL,
      @database_user_name sysname = NULL,
      @retry_attempts INT = 0, -- No retries
      @retry_interval INT = 0, -- 0 minute interval
      @os_run_priority INT = 0, -- -15 = Idle, -1 = Below Normal, 0 = Normal, 1 = Above Normal, 15 = Time Critical)
      @output_file_name NVARCHAR(200) = NULL,
      @flags INT = 0, -- 0 = Normal,
                                                         -- 1 = Encrypted command (read only),
                                                         -- 2 = Append output files (if any),
                                                         -- 4 = Write TSQL step output to step history,
                                                         -- 8 = Write log to table (overwrite existing history),
                                                         -- 16 = Write log to table (append to existing history)
                                                         -- 32 = Write all output to job history
                                                         -- 64 = Create a Windows event to use as a signal for the Cmd jobstep to abort
      @proxy_id INT = NULL,
      @proxy_name sysname = NULL,
      -- mutual exclusive; must specify only one of above 2 parameters to
      -- identify the proxy.
      @step_uid UNIQUEIDENTIFIER = NULL OUTPUT
    AS
    BEGIN
      DECLARE @retval INT

      DECLARE @sort_order INT
      DECLARE @max_sort_order INT
      DECLARE @subsystems NVARCHAR(128)
      DECLARE @replace_scope NVARCHAR(128)
      DECLARE @scripted_value NVARCHAR(128)
      DECLARE @replace_value NVARCHAR(128)
      DECLARE @message NVARCHAR(MAX)
      DECLARE @divider_length INT = 120

      DECLARE @folder_name NVARCHAR(128)
      DECLARE @project_name NVARCHAR(128)
      DECLARE @startpos INT
      DECLARE @endpos INT
      DECLARE @endpos_project INT
      DECLARE @reference_id INT = -1
      DECLARE @new_reference NVARCHAR(128)
      DECLARE @new_command NVARCHAR(MAX)
      
      PRINT ' ';
      PRINT REPLICATE('-', @divider_length);
      PRINT 'Stored proc : [usp_add_jobstep_wrapper]'
      PRINT 'Copyright : (c) 2016 - hansmichiels.com'
      PRINT 'License : GNU General Public License, see http://www.gnu.org/licenses/'

      IF @job_name IS NOT NULL
      BEGIN
      -- PRINT 'Stored proc:
        SELECT @message = 'Job : ' + @job_name;
      END ELSE BEGIN
        SELECT @message = 'Job : ' + ISNULL((SELECT TOP 1 j.name FROM dbo.sysjobs j WHERE job_id = @job_id), N'(unknown)');
      END
      PRINT @message;
      SELECT @message = 'Jobstep : ' + @step_name;
      PRINT @message;
      PRINT ' ';

      --\---------------------------------------------------------------------------------------------------------
      ---) PART 1: Check if temp table #jobstep_replacements exists with project-/customer specific replacements.
      --/---------------------------------------------------------------------------------------------------------

      IF OBJECT_ID('tempdb..#jobstep_replacements', 'U') IS NULL
      BEGIN

        PRINT 'No jobstep_replacements were found, installing original script.';

      END ELSE BEGIN

        PRINT 'If any replacements are made, they will be shown below.';

        DECLARE replacements_cursor CURSOR LOCAL STATIC FOR
        SELECT TOP (2147483647) -- Otherwise ORDER BY might not work.
            CASE
              WHEN ISNULL(v.[subsystems], N'') IN (N'*', N'')
              THEN N'*'
              ELSE N',' + v.[subsystems] + N','
              END AS [subsystems],
            CASE
              WHEN ISNULL(v.[replace_scope], N'') IN (N'*', N'')
              THEN N'*'
              ELSE N',' + v.[replace_scope] + N','
              END AS [replace_scope],
            v.[scripted_value],
            v.[replace_value]
        FROM
            #jobstep_replacements v
        ORDER BY
            v.sort_order,
            v.id;
                  
        OPEN replacements_cursor;

        FETCH NEXT FROM replacements_cursor
          INTO @subsystems, @replace_scope, @scripted_value, @replace_value;

        WHILE @@FETCH_STATUS = 0
        BEGIN

          IF (@subsystems = N'*' OR CHARINDEX(N',' + @subsystem + N',', @subsystems, 1) > 0)
              AND @replace_value IS NOT NULL
              AND @scripted_value != @replace_value
          BEGIN

            IF (@replace_scope = N'*' OR CHARINDEX(N',@command,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @command, 1) > 0
            BEGIN
                SET @message = 'Replacement in @command: ' + @scripted_value + ' by ' + @replace_value;
                PRINT ' '
                PRINT @message;
                PRINT 'Old @command: ' + @command;
                SELECT @command = REPLACE(@command, @scripted_value, @replace_value);
                PRINT 'New @command: ' + @command;
            END

            IF (@replace_scope = N'*' OR CHARINDEX(N',@additional_parameters,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @additional_parameters, 1) > 0
            BEGIN
                SET @message = 'Replacement in @additional_parameters: ' + @scripted_value + ' by ' + @replace_value;
                PRINT ' '
                PRINT @message;
                PRINT 'Old @additional_parameters: ' + @additional_parameters;
                SET @additional_parameters = REPLACE(@additional_parameters, @scripted_value, @replace_value);
                PRINT 'New @additional_parameters: ' + @additional_parameters;
            END

            IF (@replace_scope = N'*' OR CHARINDEX(N',@server,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @server, 1) > 0
            BEGIN
                SET @message = 'Replacement in @server: ' + @scripted_value + ' by ' + @replace_value;
                PRINT ' '
                PRINT @message;
                PRINT 'Old @server: ' + @server;
                SET @server = REPLACE(@server, @scripted_value, @replace_value);
                PRINT 'New @server: ' + @server;
            END

            IF (@replace_scope = N'*' OR CHARINDEX(N',@database_name,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @database_name, 1) > 0
            BEGIN
                SET @message = 'Replacement in @database_name: ' + @scripted_value + ' by ' + @replace_value;
                PRINT ' '
                PRINT @message;
                PRINT 'Old @database_name: ' + @database_name;
                SET @database_name = REPLACE(@database_name, @scripted_value, @replace_value);
                PRINT 'New @database_name: ' + @database_name;
            END

            IF (@replace_scope = N'*' OR CHARINDEX(N',@database_user_name,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @database_user_name, 1) > 0
            BEGIN
                SET @message = 'Replacement in @database_user_name: ' + @scripted_value + ' by ' + @replace_value;
                PRINT ' '
                PRINT @message;
                PRINT 'Old @database_user_name: ' + @database_user_name;
                SET @database_user_name = REPLACE(@database_user_name, @scripted_value, @replace_value);
                PRINT 'New @database_user_name: ' + @database_user_name;
            END

            IF (@replace_scope = N'*' OR CHARINDEX(N',@proxy_name,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @proxy_name, 1) > 0
            BEGIN
                SET @message = 'Replacement in @proxy_name: ' + @scripted_value + ' by ' + @replace_value;
                PRINT ' '
                PRINT @message;
                PRINT 'Old @proxy_name: ' + @proxy_name;
                SET @proxy_name = REPLACE(@proxy_name, @scripted_value, @replace_value);
                PRINT 'New @proxy_name: ' + @proxy_name;
            END

            IF (@replace_scope = N'*' OR CHARINDEX(N',@output_file_name,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @output_file_name, 1) > 0
            BEGIN
                SET @message = 'Replacement in @output_file_name: ' + @scripted_value + ' by ' + @replace_value;
                PRINT ' '
                PRINT @message;
                PRINT 'Old @output_file_name: ' + @output_file_name;
                SET @output_file_name = REPLACE(@output_file_name, @scripted_value, @replace_value);
                PRINT 'New @output_file_name: ' + @output_file_name;
            END

          END

          FETCH NEXT FROM replacements_cursor
            INTO @subsystems, @replace_scope, @scripted_value, @replace_value;
          
        END

        CLOSE replacements_cursor
        DEALLOCATE replacements_cursor
              
      END

      --\---------------------------------------------------------------------------------------------------------
      ---) PART 2: Fix environment references in ssis commands.
      --/---------------------------------------------------------------------------------------------------------
             
      --\
      ---) First check if there is something to do
      --/
      IF @subsystem = N'SSIS' AND CHARINDEX(N'/ENVREFERENCE', @command, 1) > 0 AND CHARINDEX(N'/ISSERVER "\"\SSISDB\', @command, 1) > 0
      BEGIN
        --\
        ---) Pull out @folder_name and @project_name from the @command variable value
        --/
        SELECT
          @startpos = CHARINDEX(N'/ISSERVER "\"\SSISDB\', @command, 1) + LEN(N'/ISSERVER "\"\SSISDB\'),
          -- @endpos = CHARINDEX(N'dtsx\""', @command, @startpos),
          @endpos = CHARINDEX(N'\', @command, @startpos + 1),
          @endpos_project = CHARINDEX(N'\', @command, @endpos + 1),
          @folder_name = SUBSTRING(@command, @startpos, @endpos - @startpos),
          @project_name = SUBSTRING(@command, @endpos + 1, @endpos_project - @endpos - 1);

          --\
          ---) Armed with the correct @folder_name and @project_name get the environment reference id.
          --/
          SELECT @reference_id = ISNULL((
              SELECT TOP 1
                  er.reference_id
              FROM
                  SSISDB.catalog.environments AS env
              JOIN
                  SSISDB.catalog.folders AS fld
                  ON fld.folder_id = env.folder_id
              JOIN
                  SSISDB.catalog.projects AS prj
                  ON prj.folder_id = fld.folder_id
              JOIN
                  SSISDB.catalog.environment_references AS er
                  ON er.project_id = prj.project_id
              WHERE
                  fld.name = @folder_name
                  AND prj.name = @project_name
          ), -1)

      END

      --\
      ---) If a valid environment reference id was found ..
      --/
      IF @reference_id != -1
      BEGIN
        SELECT
          --\
          ---) .. adjust the /ENVREFERENCE part of the @command so that it is followed by the retrieved @reference_id.
          --/
          @startpos = CHARINDEX(N'/ENVREFERENCE ', @command, 1),
          @endpos = CHARINDEX(N' ', @command, @startpos + LEN(N'/ENVREFERENCE ') + 1);

          SELECT
              @new_reference = '/ENVREFERENCE ' + CONVERT(NVARCHAR, @reference_id),
              @new_command = LEFT(@command, @startpos - 1) + @new_reference + SUBSTRING(@command, @endpos, 8000);
          IF @new_command != @command
          BEGIN
              SET @message = 'Replacement in @command: fixing /ENVREFERENCE';
              PRINT ' '
              PRINT @message;
              PRINT 'Old @command: ' + @command;
              PRINT 'New @command: ' + @new_command;
              SET @command = @new_command;
          END
      END

      --\---------------------------------------------------------------------------------------------------------
      ---) PART 3: Now we have done our trick with the parameter values,
      ---) execute the built-in stored procedure sp_add_jobstep.
      --/---------------------------------------------------------------------------------------------------------

      EXECUTE @retval = dbo.sp_add_jobstep
          @job_id = @job_id,
          @job_name = @job_name,
          @step_id = @step_id,
          @step_name = @step_name,
          @subsystem = @subsystem,
          @command = @command,
          @additional_parameters = @additional_parameters,
          @cmdexec_success_code = @cmdexec_success_code,
          @on_success_action = @on_success_action,
          @on_success_step_id = @on_success_step_id,
          @on_fail_action = @on_fail_action,
          @on_fail_step_id = @on_fail_step_id,
          @server = @server,
          @database_name = @database_name,
          @database_user_name = @database_user_name,
          @retry_attempts = @retry_attempts,
          @retry_interval = @retry_interval,
          @os_run_priority = @os_run_priority,
          @output_file_name = @output_file_name,
          @flags = @flags,
          @proxy_id = @proxy_id,
          @proxy_name = @proxy_name,
          @step_uid = @step_uid OUTPUT

      RETURN(@retval)
    END
    GO

    Create a generic install script

    This install script consists of the following parts:

    PART 1: Create temporary table #jobstep_replacements with specific replacements.

    This is the part where you customize for your own environments.
    Two things require further explanation: subsystems and replace-scope.
    Each jobstep is defined for a specific subsystem. Valid values are:

    Full name                                 Code
    ----------------------------------------  ---------------
    Operating system (CmdExec)                CmdExec
    PowerShell                                PowerShell
    Replication Distributor                   Distribution
    Replication Merge                         Merge
    Replication Queue Reader                  QueueReader
    Replication Snapshot                      Snapshot
    SQL Server Analysis Services Command      ANALYSISCOMMAND
    SQL Server Analysis Services Query        ANALYSISQUERY
    SQL Server Integration Services Package   SSIS
    Transact-SQL script (T-SQL)               TSQL

    The subsystem codes are used in SQL scripts generated by SSMS to create jobsteps.

    Replace scope is not an official term; I just call it that.
    Valid values are the names of any (N)VARCHAR parameters of sp_add_jobstep, so:

    • @command
    • @additional_parameters
    • @server
    • @database_name
    • @database_user_name
    • @proxy_name
    • @output_file_name

    The temporary table #jobstep_replacements contains the columns [subsystems] and [replace_scope].
    Both can be filled with a comma-separated list to control the subsystems and parameters where the replacement should take place. A * wildcard means “all subsystems” or “all parameters”.

    An example of part 1:

    MSSQL-E08-jobscript-part1

    --\------------------------------------------------------------------------------------
    ---) PART 1: Create temporary table #jobstep_replacements with specific replacements.
    --/------------------------------------------------------------------------------------

    IF OBJECT_ID('tempdb..#jobstep_replacements', 'U') IS NOT NULL
    DROP TABLE #jobstep_replacements;

    IF @@SERVERNAME != 'VWS63-SQL161' -- The server where the script was generated.
    BEGIN
        CREATE TABLE #jobstep_replacements
        --\------------------------------------------------------------------------------------
        ---) IMPORTANT; do not change the structure of this temporary table because
        ---) stored procedure [usp_add_jobstep_wrapper] depends on it.
        --/------------------------------------------------------------------------------------
        (
        [id] INT IDENTITY(1, 1),
        [subsystems] NVARCHAR(128) NOT NULL, -- Use wildcard * or comma separated list with subsystem codes e.g. SSIS,TSQL (don't use spaces!)
        /*
        SUBSYSTEMS
        -------------------------------------- ---------------------
        Full name Code
        -------------------------------------- ---------------------
        Operating system (CmdExec) CmdExec
        PowerShell PowerShell
        Replication Distributor Distribution
        Replication Merge Merge
        Replication Queue Reader QueueReader
        Replication Snapshot Snapshot
        SQL Server Analysis Services Command ANALYSISCOMMAND
        SQL Server Analysis Services Query ANALYSISQUERY
        SQL Server Integration Services Package SSIS
        Transact-SQL script (T-SQL) TSQL
        */

        [replace_scope] NVARCHAR(128) NOT NULL, -- Use wildcard * or comma separated list with parameter names e.g. @command,@proxy_name (don't use spaces!)
        /*
        --------------------------------------
        REPLACE-SCOPES
        --------------------------------------
        @command
        @additional_parameters
        @server
        @database_name
        @database_user_name
        @proxy_name
        @output_file_name
        */

        [scripted_value] NVARCHAR(128) NOT NULL,
        [replace_value] NVARCHAR(128) NULL,
        [sort_order] INT NOT NULL DEFAULT(100)
        );
    END

    --\
    ---) Fill the temporary table, based on the current SQL Server Instance name.
    ---) Include the server on which the scripts were made in one IF statement, so the records can be updated later.
    --/
    IF @@SERVERNAME = 'VWS63-SQL161\TABULAR'
    BEGIN
        INSERT INTO #jobstep_replacements
        ( [sort_order], [subsystems], [replace_scope], [scripted_value], [replace_value])
        VALUES
          (1, N'SSIS',
              N'@command',
              N'VWS63-SQL161\"',
              N'VWS63-SQL161\TABULAR\"'),

          (2, N'SSIS',
              N'@command',
              N'DEV',
              N'TST'),

         (3, N'TSQL',
              N'@command,@database_name',
              N'Demo_DEV',
              N'Demo_TST'),

         (4, N'PowerShell',
              N'@command',
              N'F:\Files\Development',
              N'F:\Files\Test')
    END
    ELSE IF @@SERVERNAME = 'PRODSERVER'
    BEGIN
        INSERT INTO #jobstep_replacements
        ( [sort_order], [subsystems], [replace_scope], [scripted_value], [replace_value])
        VALUES
          (1, N'SSIS',
              N'@command',
              N'VWS63-SQL161\"',
              N'VWS63-SQL163\PRD\"'),

          (2, N'SSIS',
              N'@command',
              N'DEV',
              N'PRD'),

         (3, N'TSQL',
              N'@command,@database_name',
              N'Demo_DEV',
              N'Demo_PRD'),

         (4, N'PowerShell',
              N'@command',
              N'F:\Files\Development',
              N'F:\Files\Production')
    END

    PART 2: Smart sp_delete_job statements, using @job_name and IF EXISTS.

    Those statements can be generated with a SQL query. This query is included as a comment in the MSSQL-E08-jobscript.sql script (download the scripts here). Just select it and execute it; then copy and paste the result from the query output window.

    An example of part 2:

    MSSQL-E08-jobscript-part2

    --\------------------------------------------------------------------------------------
    ---) PART 2: Smart sp_delete_job statements, using @job_name and IF EXISTS
    --/------------------------------------------------------------------------------------

    USE [msdb]
    GO
    /* -- Query to generate sp_delete_job statements:
    SELECT
        CASE sql.line_no
          WHEN 1 THEN 'IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE [name] = N'''+ name + ''')'
          WHEN 2 THEN ' EXEC msdb.dbo.sp_delete_job @job_name=N'''+ name + ''', @delete_unused_schedule=1;'
          WHEN 3 THEN 'GO'
        END AS [drop_sql_for_copy_and_paste]
    FROM
        msdb.dbo.sysjobs j
    CROSS JOIN -- To force new lines in the output, a cross join with 3 line_no rows is used.
        (SELECT 1 AS [line_no] UNION SELECT 2 UNION SELECT 3) AS [sql]
    WHERE
        j.name like 'MSSQL-E08%' -- IMPORTANT: here you filter on the jobs to create delete statements for.
    ORDER BY
        j.name,
        sql.line_no
    */

    IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE [name] = N'MSSQL-E08-Job1-SSIS')
      EXEC msdb.dbo.sp_delete_job @job_name=N'MSSQL-E08-Job1-SSIS', @delete_unused_schedule=1;
    GO
    IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE [name] = N'MSSQL-E08-Job2-TSQL')
      EXEC msdb.dbo.sp_delete_job @job_name=N'MSSQL-E08-Job2-TSQL', @delete_unused_schedule=1;
    GO
    IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE [name] = N'MSSQL-E08-Job3-Powershell')
      EXEC msdb.dbo.sp_delete_job @job_name=N'MSSQL-E08-Job3-Powershell', @delete_unused_schedule=1;
    GO

    PART 3: Create jobs, script generated by SSMS.

    Here you paste the script to create the jobs that was generated by SQL Server Management Studio. After pasting, replace sp_add_jobstep with usp_add_jobstep_wrapper.

    An example of part 3:

    MSSQL-E08-jobscript-part3

    --\------------------------------------------------------------------------------------
    ---) PART 3: Create jobs, script generated by SSMS
    --/------------------------------------------------------------------------------------
    --\
    ---) IMPORTANT NOTE: You can generate the part below as follows:
    ---) In SQL Server Management Studio select a SQL Agent job, then press F7
    ---) In the Object Explorer Details Pane, select the jobs you want to create a script for.
    ---) Then right-click and in the context menu select:
    ---) > Script Job As > CREATE To > New Query Editor Window
    ---) In the script generated with SQL Server Management Studio, replace
    ---) sp_add_jobstep
    ---) with
    ---) usp_add_jobstep_wrapper
    --/
    USE [msdb]
    GO

    /****** Object: Job [MSSQL-E08-Job1-SSIS] Script Date: 19-12-2016 18:59:58 ******/
    BEGIN TRANSACTION
    DECLARE @ReturnCode INT
    SELECT @ReturnCode = 0
    /****** Object: JobCategory [[Uncategorized (Local)]] Script Date: 19-12-2016 18:59:58 ******/
    IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
    BEGIN
    EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

    END

    DECLARE @jobId BINARY(16)
    EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'MSSQL-E08-Job1-SSIS',
      @enabled=1,
      @notify_level_eventlog=0,
      @notify_level_email=0,
      @notify_level_netsend=0,
      @notify_level_page=0,
      @delete_level=0,
      @description=N'No description available.',
      @category_name=N'[Uncategorized (Local)]',
      @owner_login_name=N'VWS63-SQL161\Hans', @job_id = @jobId OUTPUT
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    /****** Object: Step [AnyPackage.dtsx] Script Date: 19-12-2016 18:59:58 ******/
    EXEC @ReturnCode = msdb.dbo.usp_add_jobstep_wrapper @job_id=@jobId, @step_name=N'AnyPackage.dtsx',
      @step_id=1,
      @cmdexec_success_code=0,
      @on_success_action=1,
      @on_success_step_id=0,
      @on_fail_action=2,
      @on_fail_step_id=0,
      @retry_attempts=0,
      @retry_interval=0,
      @os_run_priority=0, @subsystem=N'SSIS',
      @command=N'/ISSERVER "\"\SSISDB\DEV\SSIS-E10-P2\AnyPackage.dtsx\"" /SERVER "\"VWS63-SQL161\"" /ENVREFERENCE 4 /Par "\"$ServerOption::LOGGING_LEVEL(Int16)\"";1 /Par "\"$ServerOption::SYNCHRONIZED(Boolean)\"";True /CALLERINFO SQLAGENT /REPORTING E',
      @database_name=N'master',
      @flags=0
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Schedule for MSSQL-E08-Job1-SSIS',
      @enabled=0,
      @freq_type=4,
      @freq_interval=1,
      @freq_subday_type=1,
      @freq_subday_interval=0,
      @freq_relative_interval=0,
      @freq_recurrence_factor=0,
      @active_start_date=20161216,
      @active_end_date=99991231,
      @active_start_time=60000,
      @active_end_time=235959,
      @schedule_uid=N'6d24a58d-f800-4341-ab5b-41fbe4923da8'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    COMMIT TRANSACTION
    GOTO EndSave
    QuitWithRollback:
        IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
    EndSave:

    GO

    /****** Object: Job [MSSQL-E08-Job2-TSQL] Script Date: 19-12-2016 18:59:58 ******/
    BEGIN TRANSACTION
    DECLARE @ReturnCode INT
    SELECT @ReturnCode = 0
    /****** Object: JobCategory [[Uncategorized (Local)]] Script Date: 19-12-2016 18:59:58 ******/
    IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
    BEGIN
    EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

    END

    DECLARE @jobId BINARY(16)
    EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'MSSQL-E08-Job2-TSQL',
      @enabled=1,
      @notify_level_eventlog=0,
      @notify_level_email=0,
      @notify_level_netsend=0,
      @notify_level_page=0,
      @delete_level=0,
      @description=N'No description available.',
      @category_name=N'[Uncategorized (Local)]',
      @owner_login_name=N'VWS63-SQL161\Hans', @job_id = @jobId OUTPUT
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    /****** Object: Step [Demo_DEV] Script Date: 19-12-2016 18:59:58 ******/
    EXEC @ReturnCode = msdb.dbo.usp_add_jobstep_wrapper @job_id=@jobId, @step_name=N'Demo_DEV',
      @step_id=1,
      @cmdexec_success_code=0,
      @on_success_action=1,
      @on_success_step_id=0,
      @on_fail_action=2,
      @on_fail_step_id=0,
      @retry_attempts=0,
      @retry_interval=0,
      @os_run_priority=0, @subsystem=N'TSQL',
      @command=N'SELECT * from sys.objects',
      @database_name=N'Demo_DEV',
      @flags=0
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Schedule for MSSQL-E08-Job2-TSQL',
      @enabled=0,
      @freq_type=4,
      @freq_interval=1,
      @freq_subday_type=1,
      @freq_subday_interval=0,
      @freq_relative_interval=0,
      @freq_recurrence_factor=0,
      @active_start_date=20161216,
      @active_end_date=99991231,
      @active_start_time=70000,
      @active_end_time=235959,
      @schedule_uid=N'a25c43d5-8543-4723-903b-beeb6d9a07a3'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    COMMIT TRANSACTION
    GOTO EndSave
    QuitWithRollback:
        IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
    EndSave:

    GO

    /****** Object: Job [MSSQL-E08-Job3-Powershell] Script Date: 19-12-2016 18:59:58 ******/
    BEGIN TRANSACTION
    DECLARE @ReturnCode INT
    SELECT @ReturnCode = 0
    /****** Object: JobCategory [[Uncategorized (Local)]] Script Date: 19-12-2016 18:59:58 ******/
    IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
    BEGIN
    EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

    END

    DECLARE @jobId BINARY(16)
    EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'MSSQL-E08-Job3-Powershell',
      @enabled=1,
      @notify_level_eventlog=0,
      @notify_level_email=0,
      @notify_level_netsend=0,
      @notify_level_page=0,
      @delete_level=0,
      @description=N'No description available.',
      @category_name=N'[Uncategorized (Local)]',
      @owner_login_name=N'VWS63-SQL161\Hans', @job_id = @jobId OUTPUT
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    /****** Object: Step [Powershell step] Script Date: 19-12-2016 18:59:59 ******/
    EXEC @ReturnCode = msdb.dbo.usp_add_jobstep_wrapper @job_id=@jobId, @step_name=N'Powershell step',
      @step_id=1,
      @cmdexec_success_code=0,
      @on_success_action=1,
      @on_success_step_id=0,
      @on_fail_action=2,
      @on_fail_step_id=0,
      @retry_attempts=0,
      @retry_interval=0,
      @os_run_priority=0, @subsystem=N'PowerShell',
      @command=N'dir "F:\Files\Development\*.csv"',
      @database_name=N'master',
      @flags=0
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Schedule for MSSQL-E08-Job3-Powershell',
      @enabled=0,
      @freq_type=4,
      @freq_interval=1,
      @freq_subday_type=1,
      @freq_subday_interval=0,
      @freq_relative_interval=0,
      @freq_recurrence_factor=0,
      @active_start_date=20161216,
      @active_end_date=99991231,
      @active_start_time=110000,
      @active_end_time=235959,
      @schedule_uid=N'a6f0e0ca-d27a-4d3f-a349-4a53c0392541'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
    IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
    COMMIT TRANSACTION
    GOTO EndSave
    QuitWithRollback:
        IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
    EndSave:

    GO

    It’s demo time ..

    It is also time to set up a demo; otherwise you would just have to believe me that this works, and I always like to prove it.
    For this I have deployed two SSIS projects from an earlier blog post to two SQL instances on my virtual server. One instance is the default instance MSSQLSERVER and the other one is called TABULAR (it has SSAS installed in Tabular mode).
    For the demo, however, I will pretend that the default instance is my development environment and the TABULAR instance is my test environment.
    On the default instance I create three SQL Agent jobs, with SSIS, T-SQL and PowerShell job steps, respectively.

    Jobs created for the demo ..

    Then I create a script from the jobs, paste it into script MSSQL-E08-jobscript.sql and replace sp_add_jobstep with usp_add_jobstep_wrapper.

    Generating a CREATE script for the jobs ..

    After doing the necessary preparation to install usp_add_jobstep_wrapper on both instances, I can create the SQL Agent jobs on the TABULAR instance.
    The output window shows which replacements were done.

    As you can see in the output window, multiple replacements will be applied one after the other ..

    Replacement in the TSQL jobstep ..

    Replacement in the PowerShell jobstep..

    I can still use the script on the default instance; there, simply no replacements will be done.

    No replacements done on the default instance where the CREATE script was generated ..

    Download the scripts here.

    Conclusion / Wrap up

    In this blog post you could read how to cope with some peculiarities of SQL Server Management Studio when it creates SQL Agent job scripts.
    I have explained how you can make your script generic for all environments, which makes it more suitable for putting under source control.
    For this a special stored procedure, [dbo].[usp_add_jobstep_wrapper], is installed in the [msdb] database, and a generic install script is assembled that is partly hand-written and partly pasted from the SSMS-generated script.

    And, of course, have a merry Christmas and a happy New Year!

    (c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

    Quiet install of SQL Server + SP1 + all tools (SQL Server Series)

    Introduction

    Due to some serious trouble with my SQL Server 2016 virtual machine I had to reinstall SQL Server + SP1 + SSMS + Visual Studio 2015 + SQL Server Data Tools on an older backup of the virtual machine.
    So the article I had planned next is delayed.
    But my bad luck was still inspiration for this unplanned article.

    Problem

    How can I do quiet installations of SQL Server 2016 including all service packs and tools?

    Solution

    So what I want to do is install the following products one after the other, with minimum effort and user interaction; in the Netherlands we would say “gewoon, omdat het kan” (this means something like “just because it is possible”):

    1. SQL Server 2016 Developer Edition RTM
    2. SQL Server 2016 SP1
    3. SQL Server Management Studio 2016
    4. Visual Studio Community 2015
    5. SQL Server Data Tools

    The structure of this article is practical and straightforward.
    For each program to install I will give comments and details about the command line options, one or more external references for more info when required, and a location to download the software.

    A few common remarks that are applicable to all programs:

    • I have chosen “unattended but visible” installations when possible. This means that no user interaction is required, but you can see what is happening.
      These switches are either called /qs or /passive.
    • Before doing an unattended install you have to download all media. I mounted one ISO (SQL Server) as a DVD drive in my Hyper-V virtual machine, and copied the other setup files to a virtual hard disk X: (e.g. I mounted the downloaded ISO file of SSDT and copied the files to X:\Install\SSDT).
    • Also /norestart is used whenever possible. Not because I do not want to reboot, but because the reboot is taken care of in the PowerShell script (using shutdown -r -t 0). Before rebooting, the script creates an empty dummy file; when this file exists, the script knows that the setup has already run (see the sketch after the next paragraph).
    • And last but not least: I think it is appropriate to thank Microsoft Corporation. All the software above is FOR FREE now (but please do respect the sometimes limiting license terms)! This is heaven for any developer who just wants to play around without any license costs. So: Microsoft, a sincere Thank You.

    Finally a PowerShell script (download the script here) is used that calls all installers, one after the other. You can choose to reboot in between, and then run the same script again.
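
    To give an idea, here is a minimal sketch of one such installer block, following the pattern described above (the paths, file names and done-file name are just examples, not the exact contents of the downloadable script):

    # Minimal sketch of one installer block: run the setup only when its "done" file
    # does not exist yet, then create the done file and reboot.
    $DoneFile = "X:\Install\sqlserver2016rtm.done" # example path and name
    if (-not (Test-Path $DoneFile)) {
        # Unattended but visible installation of SQL Server 2016.
        Start-Process -FilePath "D:\setup.exe" -Wait `
            -ArgumentList "/IACCEPTSQLSERVERLICENSETERMS", "/ConfigurationFile=X:\Install\ConfigurationFile.ini"
        # Create the empty dummy file, so that a rerun of the script skips this installer.
        New-Item -ItemType File -Path $DoneFile | Out-Null
        # Reboot now; after the reboot, just start this script again.
        shutdown -r -t 0
    }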

    SQL Server 2016 Developer Edition RTM

    Comments and command line options

    Two things are important here: you must add /IACCEPTSQLSERVERLICENSETERMS and you must have a ConfigurationFile that specifies all the details of the installation.
    The simplest way to obtain a ConfigurationFile is to start the setup manually and walk through the wizard until it is ready to install. Then a path to the ConfigurationFile.ini is shown.
    Save this file and you can use it for future installations.

    To make the configuration file suitable for a “progress only” installation, set QUIETSIMPLE="True".
    Also put a semicolon in front of the UIMODE line to comment it out, because it is not possible to use QUIET or QUIETSIMPLE together with UIMODE.
    Changes needed in the SQL Server Configuration File ..
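
    The relevant lines in the configuration file then look something like this (a sketch; the rest of the file stays as the wizard generated it):

    ; Setup will display progress only, without any user interaction.
    QUIETSIMPLE="True"
    ; UIMODE cannot be combined with QUIET or QUIETSIMPLE, so comment it out:
    ;UIMODE="Normal"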

    External references

    Install SQL Server 2016 Using a Configuration File.

    Where to download

    SQL Server 2016 Developer Edition Download (Microsoft Account required).

    SQL Server 2016 SP1

    Comments and command line options

    While writing this article I noticed that on the page above the installation media of SQL Server 2016 is also available including SP1, so in that case you do not need to download SP1 separately.
    If you need SP1, e.g. because you have existing installation media (maybe other than Developer Edition) the following can be said about the command line options:
    I have used /action=Patch /allinstances /qs /IAcceptSQLServerLicenseTerms.
    If you do not want to patch all instances on a server, check the external references below for more info.
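
    Put together, the call from the PowerShell script could look like this (a sketch; the file name shown is the SP1 download's name at the time of writing, so check yours):

    # Sketch: patch all instances, quiet but visible.
    Start-Process -FilePath "X:\Install\SP1\SQLServer2016SP1-KB3182545-x64-ENU.exe" -Wait `
        -ArgumentList "/action=Patch", "/allinstances", "/qs", "/IAcceptSQLServerLicenseTerms"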

    External references

    Silent install info on Technet.

    Where to download

    Microsoft® SQL Server® 2016 Service Pack 1 (SP1) download.

    SQL Server Management Studio 2016

    Comments and command line options

    The command line options used are /install /passive /norestart.
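
    From the PowerShell script this could be called as follows (a sketch; the installer name is the default name of the SSMS download):

    # Sketch: silent install of SQL Server Management Studio 2016.
    Start-Process -FilePath "X:\Install\SSMS\SSMS-Setup-ENU.exe" -Wait `
        -ArgumentList "/install", "/passive", "/norestart"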

    External references

    Performing a Silent Install of SQL Server Management Studio (2016) by Sven Aelterman.

    Where to download

    Download SQL Server Management Studio (SSMS).

    Visual Studio Community 2015

    Comments and command line options

    Microsoft recommends to install Visual Studio 2015 before SQL Server Data Tools, as follows:
    “We recommend installing Visual Studio 2015 prior to applying this update. Installing this update will replace SSDT RTM in Visual Studio 2015 with the latest version.
    If you do not have Visual Studio 2015, SSDT will install the Visual Studio 2015 Integrated shell and Visual Studio 2015 Isolated shell with limited feature support for SQL Server Database and BI Projects.”

    I like to install Visual Studio, so I can do other stuff like creating console applications. If you are sure you will only use the Business Intelligence “Suite” you could skip installing Visual Studio.
    The command line options used are /Passive /NoRestart /Log "X:\Install\VSTUD_LOG\VSTUD.log".
    These options lead to an installation of about 7.2 GB.
    It is best to provide a logfile name in a separate “dedicated” folder, because not one, but hundreds of logfiles are created. I created the folder manually beforehand. I have not tested (sorry, forgot) whether the setup would create the folder if it did not exist.

    I must also warn you that the command line option /Full leads to a massive install of about 56 GB and takes quite a long time. Because the default installation installs everything I could possibly need, I did not use /Full.

    When you run vs_community.exe /? you get an overview of the command line parameters.
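
    As a sketch, the call from the PowerShell script could look like this (with the dedicated log folder created beforehand, as mentioned above):

    # Sketch: default Visual Studio Community 2015 install with a dedicated log folder.
    New-Item -ItemType Directory -Path "X:\Install\VSTUD_LOG" -Force | Out-Null
    Start-Process -FilePath "X:\Install\VSTUD\vs_community.exe" -Wait `
        -ArgumentList "/Passive", "/NoRestart", '/Log "X:\Install\VSTUD_LOG\VSTUD.log"'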

    External references

    Using Command-Line Parameters to Install Visual Studio.
    How to: Create and Run an Unattended Installation of Visual Studio.
    Install Visual Studio 2015.

    Where to download

    VS 2015 Community Edition download.

    SQL Server Data Tools

    Comments and command line options

    Hoorah, Microsoft ended the confusion and separate setups for SSDT (Visual Studio project type for Database projects) and SSDT-BI (formerly known as BIDS with project types for SSIS-, SSRS- and SSAS-development).
    The current installer contains both!
    One thing about the command line options really caught me out, and therefore a
    WARNING!
    Do not use a / before the command line options INSTALLAS, INSTALLIS, INSTALLRS and INSTALLALL!
    Although a slash is not shown in the help screen below, it is easy to assume that a / is required (or to overlook this on the help screen). Intuitively you would expect all command line parameters to start with /, but believe me, I did this, and both the installed software and the install log file proved that the options were not applied when I provided /INSTALLALL=1 as a command line parameter.
    A line of the logfile: Condition ‘INSTALLIS = 1 OR INSTALLALL = 1’ evaluates to false.
    When I used INSTALLALL=1 as a command line parameter, all Business Intelligence project types were installed!

    When you run SSDTSETUP.EXE /? you get an overview of the command line parameters.
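
    So, as a sketch (assuming the same /passive and /norestart switches that were used for the other installers):

    # Sketch: note that INSTALLALL has NO leading slash; /INSTALLALL=1 is silently ignored.
    Start-Process -FilePath "X:\Install\SSDT\SSDTSETUP.EXE" -Wait `
        -ArgumentList "/passive", "/norestart", "INSTALLALL=1"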

    External references

    Blog article by Dandy Weyn.

    Where to download

    Download SQL Server Data Tools (SSDT).
    SQL Server Data Tools in Visual Studio 2015.

    Using PowerShell to install it all

    Please note that the installation is not “fully” automated. After each reboot, you have to restart the PowerShell script, and it will continue with the first program that is not installed yet.
    If you like, you could create a Windows Scheduled Task that starts the script “with the highest privileges” after a reboot. I tried this but it did not seem to work very well; maybe I did something wrong.
    So I just start the script a few times manually after a reboot, not a big deal IMHO (but of course this depends on how you intend to use the script).

    From a technical point of view, the comment in the script should help you further:
    “Help” section of the PowerShell script. Please read!

    The MAIN SCRIPT contains multiple code blocks of this kind, one for each installer.

    Running the PowerShell script ..

    My Install folder with some “done” files ..

    Command-based dialog after an installation is completed ..

    Printscreen of installation in progress ..

    After installation both “SQL Server Data Tools 2015” and “Visual Studio 2015” are available as apps on Windows Server 2012 R2. In my experience you can use both for all project types; however, I tend to use Visual Studio ..

    Where to download

    Download the PowerShell script here.

    (c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

    How to protect your stored procedures against multiple concurrent executions (SQL Server Series)

    Last update: 5 October, 2016: added examples of using sp_getapplock and sp_releaseapplock.

    Introduction

    I ran into this problem lately. I had a stored procedure that did some global stuff for a database; it would do the exact same thing every time you executed it. It had no parameters.
    In the best case it would be useless and use unnecessary server resources when executed multiple times concurrently, and in the worst case it would cause deadlocks.
    And this is exactly what this post is about: a stored procedure that is not supposed to be executed multiple times concurrently.
    Examples are stored procedures that purge a database log, update derived tables or columns, etc.

    Problem

    You can start a stored procedure and, before it has finished, start it again from a different connection.

    Solutions

    There are (at least) two possible solutions for this problem.

    1. Use sp_getapplock and sp_releaseapplock (the preferred solution). My credits go to Andy Novick, who has published about this before on MSSQLTips. I have added examples of using ‘Session’ as lock owner, and therefore I hope that my article still has added value.
    2. My initial solution using dynamic management views is still in this article for reference.

    Using sp_getapplock and sp_releaseapplock

    How it works

    For your convenience the links to MSDN for help on those two system stored procedures:
    MSDN page about sp_getapplock
    MSDN page about sp_releaseapplock

    Important to understand is that as @LockOwner you can either use a transaction or a session.
    I have included an example for each type of lock owner.

    Using ‘Transaction’ as lock owner

    In this stored procedure example (of which you can download the script) sp_getapplock is used with @LockOwner = ‘Transaction’.
    When you use this example, it is really important to use the TRY .. CATCH code so that the lock is always released before the stored procedure execution ends.

    sp_StoredProcUsingAppLock.sql

    -------------- [sp].[StoredProcUsingAppLock] --------------
    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[sp].[StoredProcUsingAppLock]') AND type in (N'P', N'PC'))
      DROP PROCEDURE [sp].[StoredProcUsingAppLock]
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    /*
    ==========================================================================================
    Author: Hans Michiels
    Create date: 5-oct-2016
    Description: Example of stored procedure that is using sp_getapplock by using a transaction.
    ==========================================================================================
    */

    CREATE PROCEDURE [sp].[StoredProcUsingAppLock]
    (
      @RaiseError BIT = 0
    )
    AS
    BEGIN
      SET NOCOUNT ON;
      DECLARE @RC INT
      DECLARE @message VARCHAR(500)

      BEGIN TRY

        --\
        ---) Protection Against Concurrent Executions
        --/
        BEGIN TRAN

        SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': Try to obtain a lock ..'
        RAISERROR(@message, 0, 1) WITH NOWAIT;

        EXEC @RC = sp_getapplock
            @Resource = 'StoredProcUsingAppLock',
            @LockMode = 'Exclusive',
            @LockOwner = 'Transaction',
            @LockTimeout = 60000 -- 1 minute

        IF @RC < 0
        BEGIN
            IF @@TRANCOUNT > 0 ROLLBACK TRAN;
            SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': Sorry, could not obtain a lock within the timeout period, return code was ' + CONVERT(VARCHAR(30), @RC) + '.'
            RAISERROR(@message, 0, 1) WITH NOWAIT;
            RETURN @RC
        END ELSE BEGIN
            SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': AppLock obtained ..'
            RAISERROR(@message, 0, 1) WITH NOWAIT;
        END
      
        --\
        ---) Stored procedure body
        --/
        -- Wait so that stored procedure has some considerable execution time.
        SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': Hello World!'
        RAISERROR(@message, 0, 1) WITH NOWAIT;

        DECLARE @i INT = 0

        WHILE @i < 6
        BEGIN
            WAITFOR DELAY '00:00:01';
            EXEC [sp].[SubStoredProc];
            SET @i = @i + 1
        END

        IF @RaiseError = 1
        BEGIN
            RAISERROR('An error on demand', 16, 1);
        END

        SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': Goodbye ..'
        RAISERROR(@message, 0, 1) WITH NOWAIT;

        COMMIT TRAN
        SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': Transaction committed, appLock released ..'
        RAISERROR(@message, 0, 1) WITH NOWAIT;

      END TRY
      BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRAN;
        SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': Transaction rolled back, appLock released ..'
        RAISERROR(@message, 0, 1) WITH NOWAIT;
        THROW;
      END CATCH
    END
    GO

    When executed ..

    I would not be me if I did not include a test and a proof that it works.

    As you can see in the printscreens, all executions run nicely one after another ..

    As you can see in the printscreens, this even works well when an error occurs, due to the TRY .. CATCH block ..

    Using ‘Session’ as lock owner

    In this stored procedure example (of which you can download the script) sp_getapplock is used with @LockOwner = ‘Session’.
    An “advantage” of using the session might be that you can also use it when you do not want to use a transaction within your stored procedure.
    As with the previous example, now also it is really important to use the TRY .. CATCH code so that the lock is always released before the stored procedure execution ends.

    demo_setup_sp_StoredProcUsingAppLockNoTran.sql
    -------------- [sp].[StoredProcUsingAppLockNoTran] --------------
    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[sp].[StoredProcUsingAppLockNoTran]') AND type in (N'P', N'PC'))
      DROP PROCEDURE [sp].[StoredProcUsingAppLockNoTran]
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    /*
    ==========================================================================================
    Author: Hans Michiels
    Create date: 5-oct-2016
    Description: Example of stored procedure that is using sp_getapplock without using a transaction.
    ==========================================================================================
    */

    CREATE PROCEDURE [sp].[StoredProcUsingAppLockNoTran]
    (
      @RaiseError BIT = 0
    )
    AS
    BEGIN
      SET NOCOUNT ON;
      DECLARE @RC INT
      DECLARE @message VARCHAR(500)

      BEGIN TRY

        --\
        ---) Protection Against Concurrent Executions
        --/
        SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': Try to obtain a lock ..'
        RAISERROR(@message, 0, 1) WITH NOWAIT;

        EXEC @RC = sp_getapplock
            @Resource = 'StoredProcUsingAppLockNoTran',
            @LockMode = 'Exclusive',
            @LockOwner = 'Session',
            @LockTimeout = 60000 -- 1 minute

        IF @RC < 0
        BEGIN
            SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': Sorry, could not obtain a lock within the timeout period, return code was ' + CONVERT(VARCHAR(30), @RC) + '.'
            RAISERROR(@message, 0, 1) WITH NOWAIT;
            RETURN @RC
        END ELSE BEGIN
            SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': AppLock obtained ..'
            RAISERROR(@message, 0, 1) WITH NOWAIT;
        END
      
        --\
        ---) Stored procedure body
        --/
        -- Wait so that stored procedure has some considerable execution time.
        SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': Hello World!'
        RAISERROR(@message, 0, 1) WITH NOWAIT;

        DECLARE @i INT = 0

        WHILE @i < 6
        BEGIN
            WAITFOR DELAY '00:00:01';
            EXEC [sp].[SubStoredProc];
            SET @i = @i + 1
        END

        IF @RaiseError = 1
        BEGIN
            RAISERROR('An error on demand', 16, 1);
        END

        SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': Goodbye ..'
        RAISERROR(@message, 0, 1) WITH NOWAIT;

        EXEC @RC = sp_releaseapplock @Resource='StoredProcUsingAppLockNoTran', @LockOwner='Session';
        SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': AppLock released ..'
        RAISERROR(@message, 0, 1) WITH NOWAIT;

      END TRY
      BEGIN CATCH
        EXEC @RC = sp_releaseapplock @Resource='StoredProcUsingAppLockNoTran', @LockOwner='Session';
        SELECT @message = CONVERT(VARCHAR(30), GETDATE(), 121) + ': AppLock released after error ..'
        RAISERROR(@message, 0, 1) WITH NOWAIT;
        THROW;
      END CATCH
    END
    GO

    When executed ..

    As you can see in the printscreens, all executions run nicely one after another ..

    As you can see in the printscreens, this even works well when an error occurs, due to the TRY .. CATCH block ..

    Other considerations

    As you might have noticed, this only worked because the timeout was 60 seconds and all the executions could be completed within those 60 seconds. If that were not the case, a stored procedure execution that was waiting to obtain the applock would time out and bail out due to a RETURN statement.
    So you might play a bit with the timeout value to influence this behavior.
    If you do not care that the non-first executions bail out, you could set the timeout to a smaller value, e.g. 5000 (milliseconds).
    If you absolutely want all executions to continue after waiting, you could set the timeout to a higher value that is above the maximum execution time of the stored procedure.
    In this way you can choose to let non-first executions wait or not, as could also be achieved with the @WaitIfExecuted parameter in my initial solution. A sketch of the bail-out variant follows below.
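
    This is a minimal sketch of the bail-out variant; only @LockTimeout differs from [sp].[StoredProcUsingAppLock] above (wrapped here in its own DECLARE and BEGIN TRAN so it can be tried standalone):

    -- Sketch: let non-first executions bail out after 5 seconds instead of waiting 1 minute.
    DECLARE @RC INT;
    BEGIN TRAN;
    EXEC @RC = sp_getapplock
        @Resource = 'StoredProcUsingAppLock',
        @LockMode = 'Exclusive',
        @LockOwner = 'Transaction',
        @LockTimeout = 5000; -- 5 seconds instead of 1 minute
    IF @RC < 0
    BEGIN
        -- Another execution holds the lock: roll back and bail out
        -- (in the stored procedure this is the place of the RETURN statement).
        IF @@TRANCOUNT > 0 ROLLBACK TRAN;
    END
    ELSE
    BEGIN
        -- Lock obtained: do the work here, then commit (which also releases the lock).
        COMMIT TRAN;
    END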

    My initial solution using dynamic management views

    So if you want to create a stored procedure that is Protected Against Concurrent Executions, you have to do it yourself.
    Making a PACE stored procedure is not very difficult. There is however one big pitfall. The examples I found on the internet all have a nasty bug in them, at least when your stored procedure executes other stored procedures: when this happens, the “main” stored procedure goes off the radar when the dynamic management objects sys.dm_exec_requests and sys.dm_exec_sql_text() are used; that means you cannot detect that it is running as long as the “sub” stored procedure is being executed.
    To make the challenge even bigger, the object_id of the “main” stored procedure is nowhere to be found, so with the available dynamic management objects there is no way to solve this.
    Therefore an extra “user” dynamic management table is needed; I called it [udm_storedproc_executions]. It is automatically cleaned up in the stored procedure [sp].[GetPace].
    This stored procedure is used to implement Protection Against Concurrent Executions.
    You execute it at the beginning of stored procedures that you want to protect against concurrent executions.

    How it works

    • You create the table [dbo].[udm_storedproc_executions] (or name it differently) in your database.
    • You create the stored procedure [sp].[GetPace] (or name it differently) in your database.
    • You add the code snippet below to stored procedures that are not supposed to run multiple times concurrently.

    Create the table

    udm_storedproc_executions.sql
    /*
    (c) Copyright 2016 - hansmichiels.com
     
    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.
     
    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.
     
    You should have received a copy of the GNU General Public License
    along with this program. If not, see <http://www.gnu.org/licenses/>.
    */

    IF OBJECT_ID('[dbo].[udm_storedproc_executions]', 'U') IS NOT NULL
       DROP TABLE [dbo].[udm_storedproc_executions];
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    SET ANSI_PADDING OFF
    GO

    CREATE TABLE [dbo].[udm_storedproc_executions](
     [start_time] [datetime] NOT NULL,
     [session_id] [smallint] NOT NULL,
     [connection_id] [uniqueidentifier] NOT NULL,
     [task_address] [varbinary](8) NOT NULL,
     [object_id] [int] NOT NULL,
      CONSTRAINT [PK_dbo_udm_storedproc_executions] PRIMARY KEY CLUSTERED
    ( [start_time] ASC,
      [session_id] ASC,
      [connection_id] ASC,
      [task_address] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    SET ANSI_PADDING OFF
    GO

    Create the stored procedure

    sp_GetPace.sql
    /*
    (c) Copyright 2016 - hansmichiels.com
     
    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.
     
    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.
     
    You should have received a copy of the GNU General Public License
    along with this program. If not, see <http://www.gnu.org/licenses/>.
    */

    -------------- [sp].[GetPace] --------------
    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[sp].[GetPace]') AND type in (N'P', N'PC'))
      DROP PROCEDURE [sp].[GetPace]
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    /*
    ==========================================================================================
    Author: Hans Michiels
    Create date: 17-sep-2016
    Description: Generic stored procedure that can give a different stored procedure
                 PACE (Protection Against Concurrent Execution).
    ==========================================================================================
    */

    CREATE PROCEDURE [sp].[GetPace]
    (
      @StoredProcedureName NVARCHAR(256), -- Quoted name including schema, e.g. [dbo].[SomeStoredProcedure]
      @WaitIfExecuted BIT = 0, -- Indication whether to wait when the stored procedure is already executing (and execute it afterwards).
      @PaceAction VARCHAR(8) OUTPUT, -- Domain values [ CONTINUE | WAIT | RETURN ]
      @InsertDmRow BIT = 0, -- Indication if a row must be inserted to table [dbo].[udm_storedproc_executions]
      @EnableDebugMessages BIT = 0 -- Speaks for itself
    )
    AS
    BEGIN
      SET NOCOUNT ON;

      --\
      ---) Declarations.
      --/
      DECLARE @SpidThatWins INT
      DECLARE @WaitTime CHAR(8) = '00:00:03'
      DECLARE @message NVARCHAR(500)

      --\
      ---) Insert new row into table [dbo].[udm_storedproc_executions].
      --/
      IF @InsertDmRow = 1
      BEGIN
          INSERT INTO [dbo].[udm_storedproc_executions]
            SELECT
                der.[start_time],
                der.[session_id],
                der.[connection_id],
                der.[task_address],
                OBJECT_ID(@StoredProcedureName, 'P') AS [object_id]
            FROM
                sys.dm_exec_requests der
            CROSS APPLY
                sys.dm_exec_sql_text(der.sql_handle) xst
            WHERE
                der.session_id = @@SPID
                AND der.[sql_handle] is not null
                AND xst.objectid = OBJECT_ID('[sp].[GetPace]', 'P');
      END

      -- We need to prevent that this stored procedure would be executed multiple times
      -- simultaneously.
      SELECT @SpidThatWins = -- The one that first started. If started at exact the same time, the lowest SPID wins.
        (
        SELECT TOP 1 der.[session_id]
        FROM
            sys.dm_exec_requests der
        JOIN
            dbo.udm_storedproc_executions spx
            ON spx.[session_id] = der.[session_id]
            AND spx.[start_time] = der.[start_time]
            AND spx.[connection_id] = der.[connection_id]
            AND spx.[task_address] = der.[task_address]
        WHERE
            der.[sql_handle] is not null
            AND spx.[object_id] = OBJECT_ID(@StoredProcedureName, 'P')
        ORDER BY
            der.[start_time] ASC,
            der.[session_id] ASC
        )

      SELECT @PaceAction =
      CASE
        WHEN @@SPID = @SpidThatWins THEN 'CONTINUE'
        WHEN @WaitIfExecuted = 0 THEN 'RETURN'
        WHEN @WaitIfExecuted = 1 THEN 'WAIT'
      END

      IF @PaceAction = 'WAIT'
      BEGIN
          -- Still executing, so wait until I can start.
          IF @EnableDebugMessages = 1
          BEGIN
              -- Still executing, so wait until I can start.
              SET @message = @StoredProcedureName + N' is waiting for other execution to finish .. ';
              RAISERROR(@message, 0, 1) WITH NOWAIT;
          END
          WAITFOR DELAY @WaitTime
      END
      IF @PaceAction = 'RETURN' AND @EnableDebugMessages = 1
      BEGIN
          -- Already running elsewhere, so cancel this execution.
          SET @message = N'Cancelling execution of stored proc ' + @StoredProcedureName;
          RAISERROR(@message, 0, 1) WITH NOWAIT;
      END

      --\
      ---) Cleanup old rows from [udm_storedproc_executions].
      --/
      DELETE FROM spx
        FROM
            [dbo].[udm_storedproc_executions] spx
        LEFT JOIN
            sys.dm_exec_requests der
            ON spx.[session_id] = der.[session_id]
            AND spx.[start_time] = der.[start_time]
            AND spx.[connection_id] = der.[connection_id]
            AND spx.[task_address] = der.[task_address]
        WHERE
            der.session_id IS NULL;
    END
    GO

    Add the code snippet

    code_snippet_sp_start.sql
    CREATE PROCEDURE [sp].[StoredProcGettingPace]
    AS
    BEGIN
      SET NOCOUNT ON;
      --\
      ---) PACE (Protection Against Concurrent Executions) code
      --/
      DECLARE @PaceAction VARCHAR(8), @InsertDmRow BIT = 1
      WHILE 1 = 1
      BEGIN
          EXECUTE [sp].[GetPace]
            @StoredProcedureName = '[sp].[StoredProcGettingPace]',
            @WaitIfExecuted = 1,
            @PaceAction = @PaceAction OUTPUT,
            @InsertDmRow = @InsertDmRow,
            @EnableDebugMessages = 1;

          IF @PaceAction = 'RETURN' RETURN 0;
          IF @PaceAction = 'CONTINUE' BREAK;
          SET @InsertDmRow = 0;
      END

      --\
      ---) Stored procedure body ..
      --/
    END
    GO

    Proof that it works

    For the demo I use the following stored procedures:

    • [sp].[SubStoredProc]
    • [sp].[StoredProcGettingPace]
    • [sp].[StoredProcGettingPaceNoWait]

    By the way, all the scripts can be downloaded here.

    [sp].[SubStoredProc] is being executed by both other stored procedures.

    [sp].[StoredProcGettingPace]

    First I will demonstrate the test results with [sp].[StoredProcGettingPace].
    It is executed from 4 query windows at the same time. Debug messages show us what happened.

    The query that predicts the order in which the session_ids will be processed: the order is 60, 59, 57, 52.

    As expected session 60 is processed first.

    As expected session 59 is processed after 60. The “Hello world” time (the beginning of the actual stored procedure body) is after the “Goodbye” time of session 60.

    Then session 57. Again the “Hello world” time is after the “Goodbye” time of session 59.

    And finally session 52. It goes without saying that the “Hello world” time is after the “Goodbye” time of the previous session.

    [sp].[StoredProcGettingPaceNoWait]

    It is important to understand parameter @WaitIfExecuted of stored procedure [sp].[GetPace].
    In the previous example we used the value 1 (true), which means that every execution of a stored procedure protected with [sp].[GetPace] is eventually executed, but later.
    But if we use the value 0 (false) for @WaitIfExecuted, we actually cancel the execution of a stored procedure if it is already running, as the sketch below shows.
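
    The PACE block of [sp].[StoredProcGettingPaceNoWait] is then a sketch like the code snippet shown earlier, with @WaitIfExecuted = 0 (no WHILE loop is needed, because there is nothing to wait for):

    -- Sketch: PACE block with @WaitIfExecuted = 0, placed at the top of the procedure body.
    DECLARE @PaceAction VARCHAR(8), @InsertDmRow BIT = 1;
    EXECUTE [sp].[GetPace]
      @StoredProcedureName = '[sp].[StoredProcGettingPaceNoWait]',
      @WaitIfExecuted = 0, -- 0 = cancel this execution when the procedure is already running
      @PaceAction = @PaceAction OUTPUT,
      @InsertDmRow = @InsertDmRow,
      @EnableDebugMessages = 1;
    IF @PaceAction = 'RETURN' RETURN 0; -- already running: cancel this execution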

    Here are the test results of executing [sp].[StoredProcGettingPaceNoWait].
    Again it is executed from 4 query windows at the same time, and debug messages show us what happened.

    As expected the stored procedure is executed only once; the other executions are cancelled.

    Download all the scripts here.

    Conclusion / Wrap up

    In this blog post you could read about two different solutions that can prevent a stored procedure from being executed multiple times simultaneously.
    This can be convenient in certain situations, when the stored procedure does some “global” work in your database, and it would be unnecessary and potentially “dangerous” when it is executing multiple times concurrently.

    (c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

    Zeros, bloody zeros! (Data Vault Series)

    Introduction

    I must admit I have a weakness for British humour.
    When I had to cope with leading zeros in business keys some time ago, I spontaneously came up with the title of this post, not knowing that it would serve as such.
    For those who do not know, “Meetings, bloody meetings” is a British comedy training film in which John Cleese plays a main role. It was made in 1976, and a remake was made in 2012.
    It tells in a funny way what can go wrong at meetings and how you can do better, check it out if you can.

    Mr John Cleese

    But, obviously, this post is not about meetings but about zeros.

    Problem

    I can be short about that: leading zeros in business key values.
    For instance a customer number is delivered to the data warehouse as 0001806 (instead of 1806).
    This would not be a problem if it were always delivered exactly like that. But to be honest, you cannot and will not know that upfront. Even if this is the case now, it might not be in the future.
    When other tools are used, leading zeros could suddenly disappear (for instance when a csv file is modified using Excel), or (more rarely) the number of leading zeros could change (01806, 00001806). When this happens you have a problem, because for the data warehouse 01806, 0001806, 00001806 and 1806 are all different business keys! Even if you have only two variants, it is already a problem.
    Every business key gets a different row in the hub, so this customer now exists multiple times!

    (No acting here, this is how I look sometimes)

    Solution

    If you are familiar with Data Vault, you might already think of same-as-links to solve this.
    But I think the solution should be implemented earlier, to avoid having multiple hub rows.
    Simply always remove leading zeros when the source column is (part of) a business key (either primary or foreign key) and appears to be a number or ID but is delivered as a string/varchar. In this way 1806 will always be 1806! And I think it is pretty impossible that 001806 and 1806 would refer to two different customers.
    Unless, of course, they come from different source systems. But in that situation, depending on leading zeros would be a bad thing to do anyway, because when the leading zeros are dropped, satellite rows of different customers (in different source systems) could end up connected to the same hub row! In that situation, in a non-integrated Raw Vault, it would be better to prefix the business key with the source system code and remove the leading zeros, for instance CRM.1806 and ERP.1806 (see the sketch below).
    In all cases, you can still store the original value (with leading zeros) as an ordinary attribute in a satellite for auditing reasons.
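
    A minimal sketch of such a prefixed business key, using the PATINDEX-based method 1 from the next section (the column name and source system code are just examples):

    -- Sketch: source system prefix plus stripping of leading zeros (method 1 below).
    DECLARE @SourceSystemCode VARCHAR(10) = 'CRM'; -- assumed to be known per source/staging table
    SELECT @SourceSystemCode + '.' +
        CASE
          WHEN LTRIM(example.[customer_number]) = '' THEN ''
          WHEN PATINDEX('%[^0 ]%', example.[customer_number]) = 0 THEN '0'
          ELSE SUBSTRING(example.[customer_number],
                         PATINDEX('%[^0 ]%', example.[customer_number]),
                         LEN(example.[customer_number]))
        END AS [business_key] -- yields 'CRM.1806'
    FROM (SELECT '0001806' AS [customer_number]) example;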

    How to implement the solution

    There are many ways to remove leading zeros. When I was searching for this I had two requirements:

    • No casting from and to an integer may take place, otherwise all business keys need to be numeric, so this would make the solution less reliable.
    • No function, routine or assembly may be called, as this could negatively impact performance. I was looking for an “inline” conversion.

    After some research I found an expression that was the same for SQL and SSIS and quite okay (T-SQL version by Robin Hames, my credits for his work), but it appeared to change a string with only one or more zeros to an empty string. And because a 0 can have a meaning – and is certainly different from an empty string – this is undesired behavior, IMHO.
    So I had to add some logic to it: a CASE expression in T-SQL and an inline condition (format {condition} ? {true part} : {false part}) in the SSIS expression.
    Furthermore I came across a different method for T-SQL as well, using the PATINDEX function, which is more compact than the other solution.
    For SSIS I still use the ‘Robin Hames’ method, because the PATINDEX function is not available in SSIS expressions.
    So .. this is what it has become:

    T-SQL

    Remove_leading_zeros.sql

    SELECT
        example.[id_with_leading_zeros],
       CASE
          WHEN LTRIM(example.[id_with_leading_zeros]) = '' THEN ''
          WHEN PATINDEX( '%[^0 ]%', example.[id_with_leading_zeros]) = 0 THEN '0'
          ELSE SUBSTRING(example.[id_with_leading_zeros], PATINDEX('%[^0 ]%', example.[id_with_leading_zeros]), LEN(example.[id_with_leading_zeros]))
       END AS [id_without_zeros_method1],

       CASE
          WHEN LTRIM(example.[id_with_leading_zeros]) = '' THEN ''
          WHEN PATINDEX( '%[^0 ]%', example.[id_with_leading_zeros]) = 0 THEN '0'
          ELSE REPLACE(REPLACE(LTRIM(REPLACE(-- Robin Hames' method
                REPLACE(LTRIM(example.[id_with_leading_zeros]), ' ', '!#!') -- replace existing spaces with a string that does not occur in the column value, I have chosen '!#!'
                , '0', ' ') -- replace '0' with ' '
                ) -- end of LTRIM to remove leading '0's that have been changed to ' 's
                , ' ', '0') -- change ' ' back to '0'
                , '!#!', ' ') -- change '!#!' back to ' '
       END AS [id_without_zeros_method2]
    FROM
        (
        SELECT
            TOP 1000 RIGHT('00000000000' + CONVERT(NVARCHAR(12), object_id), 14) AS [id_with_leading_zeros]
        FROM
            master.sys.objects
        UNION
        SELECT N' 00000 '
        UNION
        SELECT N'00'
        UNION
        SELECT N' '
        UNION
        SELECT ' 0099990 A '
        UNION
        SELECT '-5550'
        ) example

    SSIS Expression (can be used in Derived Column)

    (LTRIM(REPLACE(id_with_leading_zeros,"0", "")) == "" && LTRIM(id_with_leading_zeros) != "") ? "0" : REPLACE(REPLACE(LTRIM(REPLACE(REPLACE(LTRIM(id_with_leading_zeros)," ","!#!"),"0"," "))," ","0"),"!#!"," ")

    In a Derived Column Transformation this looks for instance like this

    Conclusion / Wrap up

    In this post I have motivated why I think you should remove leading zeros from business keys when data is loaded from source systems to a data warehouse.
    This post also contains different ways to remove leading zeros: two for T-SQL and one as an SSIS expression.

    (c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.