
The Data Warehouse Automation Tools Survey Results (Data Warehouse Series)

Introduction

As you might know, I started a survey on the usage and opinion about Data Warehouse Automation (DWA) Tools.
Today I am presenting the results.
I sincerely want to thank all the participants for sharing your thoughts.

The participants

First, some insight into the people who filled in the survey. There were not that many of you: 62 completed surveys.

In statistics there is a formula to compute the minimum sample size (number of participants) for a given margin of error, confidence level and total population. If I fill in a 10% margin of error, a 90% confidence level and a total population of 20,000 (this last number has little impact on the outcome), the minimum sample size is 68.
Please note that with a 10% margin of error and a 90% confidence level, there is a considerable chance that the answers are not representative of the entire population of data warehouse professionals.
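The sample-size calculation above can be sketched as follows. This is a minimal sketch assuming Cochran's formula with a finite population correction was the formula used (z ≈ 1.645 corresponds to a 90% confidence level); it reproduces the minimum of 68 mentioned above:

```python
import math

def minimum_sample_size(margin_of_error: float, z: float, population: int) -> int:
    # Cochran's formula with p = 0.5 (the worst case) ...
    n0 = (z ** 2) * 0.25 / margin_of_error ** 2
    # ... followed by the finite population correction.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# 10% margin of error, 90% confidence level (z ≈ 1.645), population 20,000
print(minimum_sample_size(0.10, 1.645, 20_000))  # → 68
```

Note that the population size barely matters here: without the correction the formula already gives about 67.7 participants.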

Participants by country

Participants are spread over 23 countries on several continents, with a concentration in the Netherlands and the United States.

Detailed list:

Country No of participants
Argentina 1
Australia 1
Austria 1
Belgium 3
Brazil 1
Bulgaria 1
Denmark 1
Germany 3
Ghana 1
Hungary 1
Iceland 1
India 4
Lithuania 1
Netherlands 20
New Zealand 1
Norway 1
Romania 1
Slovakia 1
Spain 1
Switzerland 3
Turkey 1
United Kingdom 2
United States 10

Job roles of participants

Almost half of the participants selected Data Warehouse Architect as the role best describing their current job.
Furthermore, there were 12 ETL developers and a number of other job roles, as shown in the graph.

Exact counts:

Job role No of participants
BI & Analytics/Data Science manager 2
BI Consultant 1
Business analyst 1
Data warehouse architect 30
Director 2
ETL Developer 12
Front end BI Developer 1
Manager 1
Multi-disciplined BI Professional 4
Project manager 6
SQL Developer 1

Familiarity with the investigated tools

The first question to answer is: how familiar are the respondents with these tools?
I used five levels of familiarity, each with a short name and explanation. The option values have the following meaning:

  1. Don’t know: I do not know this product.
  2. Heard of: I have heard of this product but I have not worked in a project where it was tried or used.
  3. Tried: I have tried this product but not used it in production environments.
  4. Used: I have worked in a project where this product was used.
  5. Using: I am currently working in a project where this product is being used.

The results in a graph:

And the exact figures, first absolute, then percentages:

Issuer – Productname Don’t know Heard of Tried Used Using
AnalytiX DS – CatfX / Data Vault Code Gen Bundle 44 14 3 0 0
Attunity – Attunity Compose (formerly BiReady) 36 21 4 0 1
Gamma Systems – Data Warehouse Studio 55 5 1 0 0
Javlin – CloverETL 51 7 2 1 0
Magnitude – Magnitude Kalido 39 20 1 1 0
Qosqo – Quipu 43 12 3 2 1
TimeXtender – TimeXtender 42 14 3 0 2
Trivadis – biGenius 52 5 1 1 2
Varigence – BimlExpress / Bidshelper 43 3 4 3 8
Varigence – Mist / BimlStudio 46 4 9 1 1
WhereScape – RED 35 10 6 5 5
Average 44.2 10.5 3.4 1.3 1.8
Issuer – Productname Don’t know Heard of Tried Used Using
AnalytiX DS – CatfX / Data Vault Code Gen Bundle 72.1% 23% 4.9% 0% 0%
Attunity – Attunity Compose (formerly BiReady) 58.1% 33.9% 6.5% 0% 1.6%
Gamma Systems – Data Warehouse Studio 90.2% 8.2% 1.6% 0% 0%
Javlin – CloverETL 83.6% 11.5% 3.3% 1.6% 0%
Magnitude – Magnitude Kalido 63.9% 32.8% 1.6% 1.6% 0%
Qosqo – Quipu 70.5% 19.7% 4.9% 3.3% 1.6%
TimeXtender – TimeXtender 68.9% 23% 4.9% 0% 3.3%
Trivadis – biGenius 85.2% 8.2% 1.6% 1.6% 3.3%
Varigence – BimlExpress / Bidshelper 70.5% 4.9% 6.6% 4.9% 13.1%
Varigence – Mist / BimlStudio 75.4% 6.6% 14.8% 1.6% 1.6%
WhereScape – RED 57.4% 16.4% 9.8% 8.2% 8.2%
Average 72.3% 17.2% 5.6% 2.1% 2.9%
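The percentage table is derived from the absolute counts by dividing each answer count by the row total (the number of respondents who answered for that product, which varies slightly per row). A minimal sketch:

```python
def to_percentages(row):
    # Divide each answer count by the row total and round to one decimal,
    # as done for the percentage table.
    total = sum(row)
    return [round(100 * n / total, 1) for n in row]

# Absolute counts for two of the products from the first table:
print(to_percentages([35, 10, 6, 5, 5]))  # WhereScape – RED → [57.4, 16.4, 9.8, 8.2, 8.2]
print(to_percentages([36, 21, 4, 0, 1]))  # Attunity Compose
```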

I think the most remarkable finding is that most of the tools are not well known: on average, 72.3% of the participants answered “Don’t know”.

Product satisfaction

The product satisfaction question gave the insight that there are no ‘bad’ tools. All tools fulfill a need, and the average scores range from 3 to 5 (on a scale of 1 to 5). So even the products with the lowest satisfaction score still earn a ‘C’ (or, as we say in the Netherlands, a ‘vijfje’).

Value for money

Value for money is obviously a very subjective matter: what one person thinks is expensive, another might consider cheap. Most vendors are not transparent about their prices.
Scores vary from 1.80 to 3.46. What we can conclude from this is that some tools are perceived as offering relatively low value for money, while others offer better value, according to the participants of the survey.
The products with an average score of 3 or higher (again on a scale of 1 to 5) are the two Varigence products (BimlExpress/Bidshelper and Mist/BimlStudio), AnalytiX DS (CatfX / Data Vault Code Gen Bundle) and Trivadis (biGenius).

By the way, the survey question used the following scale: 1 = cheap and 5 = expensive. I translated this to a value-for-money score by reversing the answers (so 1 became 5 and vice versa, and 2 became 4 and vice versa). This way a higher score means a ‘cheaper’ pricing perception, and thus better value for money.
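On a 1-to-5 scale, the reversal described above is simply `6 - answer`:

```python
def value_for_money(price_perception: int) -> int:
    # Survey scale: 1 = cheap .. 5 = expensive.
    # Reversed score: a higher value means a cheaper pricing perception.
    return 6 - price_perception

print([value_for_money(a) for a in [1, 2, 3, 4, 5]])  # → [5, 4, 3, 2, 1]
```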

Other tools that were mentioned

A number of other tools were also mentioned, many of them only once or twice.
Remarkably, tools that do not meet the “DWA Tool” definition were also mentioned, for instance the Microsoft BI products.
What I learned from this is that I should have given a definition of what I meant by a “Data Warehouse Automation Tool”.
From Wikipedia:
“Data warehouse automation or DWA refers to the process of accelerating and automating the data warehouse development cycles, while assuring quality and consistency. DWA is believed to provide automation of the entire lifecycle of a data warehouse, from source system analysis to testing to documentation. It helps improve productivity, reduce cost, and improve overall quality.”

Worth mentioning is the open source tool ODE. I didn’t know it, but tried it recently. It’s a SQL-based, metadata-driven tool without a graphical user interface, and I could not get the “AdventureWorks” example working due to errors that occurred while working through it. So I think this tool needs to become more stable, or at least needs a better manual on how to use it. But it’s open source, and that puts a smile on my face.
From a company/human resources perspective, working with such a tool can however lead to hidden costs, because using it can be more laborious than working with a commercial, ‘more stable’ product. And once you start changing the source code (for yourself or your company, not as a member of the open source development team), you can no longer upgrade to newer versions and are stuck maintaining your copy of the tool. And if you join the open source team, you are now doing two projects instead of one!
In the Netherlands we have an old-fashioned saying, “Bezint eer gij begint”, which means “think well before you start”, and I think it is applicable here as well. A license for a commercial tool might be worth the money.

What also appears to be the case is that a number of these tools are not for sale, but are tools that consultancy firms use for their projects.

List of all the other tools that were mentioned:

Issuer – Productname Mentions
Adante – Datawarehouse on Demand 1
Automic – One Automation 2
Birst – Birst 1
Centennium – TBD engine (template based development) 1
Cubido – Cubikit 1
DIKW – DHW Accelerator 1
Grip Op Data B.V. – GDIP 1
Infent – Infenture Insight 1
Informatica – Powercenter 2
Informatica – B2b data exchange 1
Informatica – Informatica 1
Informatica – MFT 1
I-refact – I-refactory 1
Kadenza – Smart Datawarehouse generator 1
Microsoft – MSBI(SSIS,SSAS,SSRS) 9
Microsoft – Power BI 1
Optimalbi Limited – ODE (Open source) 1
Oracle – ODI 1
Oracle – OWB 1
Own build – Excelbased tool 1
Pentaho – Kettle 2
Pentaho – Big Data 1
Rabobank – Foundation Toolkit 1
SAP – Data Services 2
SAP – Information Steward 1
Tableau – Server 1
Talend – Data Fabric 1
Talend – Open Studio 1
TopBi – Data Vault Generator 1
TopBI – HSG 1
Varigence – BimlFlex 1
Varigence – BimlOnline 1
Viz Analytics – Data Warehouse Studio 1

Comments given about all the tools

Participants could give their opinion about the tools in free text format – and they did!
Here are just the bare comments.

Attunity – Attunity Compose (formerly BiReady)
  • “Like features, makes work flow easily “
Qosqo – Quipu
  • “This one could be split up between the free and paid variant for this research.”
  • “Most advanced options on data vault modeling and generation in one tool. Does not support data marts, requires Groovy knowledge to change templates.”
TimeXtender – TimeXtender
  • “vendor lock-in, no best practices applied when generating SSAS cubes”
  • “lock in, very good features, easy to use (even for business users)”
Trivadis – biGenius
  • “if they extend SSIS generation no lock in, very flexibel, fast, cool”
  • “No Vendor-Lock (generated Code can be scheduled and executed native), Microsoft and Oracle-Support, Data Vault Support”
  • “Integrated product, short learning curve, easy to adapt”
Varigence – BimlExpress / Bidshelper
  • “Great to building SSIS packages.”
  • “Free, but you need to write the generating code first. Ideal for consultants though.”
  • “Nice combination, but needs to mature.”
  • “no lock in, very cool that you generate SSIS/SSAS, free, a lot of work coding, not for everyone”
  • “Very flexible, but long learning curve”
  • “Second best tool to generate SSIS packages. Requires a lot of exceptions if you want to alter parts of code for specific tables. Debugging is hell.”
Varigence – Mist / BimlStudio
  • “Has potential but is progressing slowly. Pricing is fair.”
  • “No vendor lock-in, as the result of BIML are just SSIS packages”
  • “More mature version.”
  • “easier, expensive, nice you can easily re-engineer from existing DWH”
  • “There is no reverse engineering module.We can not convert old package to use BimlStudio”
  • “Very flexible”
  • “Best tool to generate SSIS packages. Still requires a lot of manual labor for low level changes.”
WhereScape – RED
  • “We were able to increase development speed with a factor 3. “
  • “Vendor-Lock (needs App-Server for Scheduling, Processing), Microsoft not fully supported (only Beta), Data Vault Support”
  • “Creates stored procedures which clutters db’s, powerful tool, but has some annoyances”
  • “Several products which are not completely integrated, some features not “ready”, several limitations (especially when working with datasets, expensive for what you get, only based overseas (except smalll base in UK)”
  • “You have to click and type almost everything, then it “generates” a stored procedure. Also stores its repository in the same database and schema as your data warehouse.”
  • “Changed projects into initiatives enabling business analytics. “
  • “Like lineage, like support for multiple generation types (such as data vault), dislike performance, dislike excessive storage, dislike cost of consulting, dislike unreliability, dislike UI, dislike lack of adapters, dislike reliance on database procedures, dislike scheduler, dislike target specific repository”

Comments for other tools (not all available for sale)

Adante – Datawarehouse on Demand
  • “Very easy to use automation tool for Dimensional modeling”
Centennium – TBD engine (template based development)
  • “SQL server based metadata based staging, vault en datamart generator. Developed by Centennium, for the larger part made adult by me.”
I-refact – I-refactory
  • “A very solid, well thought solution for the data management. It’s not just for dwh, but for the whole data platform of your company.”
Kadenza – Smart Datawarehouse generator
  • “Model driven generator c-sharp”
Talend – Data Fabric
  • “Talend is a very friendly for completing complex operations like Data Integration, Data Quality and Big Data (towards Hadoop) with minimum programming skills”
Viz Analytics – Data Warehouse Studio
  • “Custom built from the ground up, with own ETL engine, very quick to build and implement DWs.”

Used technologies

One of the questions was about the technology being used (taken from the Gartner Magic Quadrant for BI and Analytics). As more than one answer was allowed, the total number of responses exceeds the number of participants: some participants use two or more technologies.
In the Top 5 we see: Microsoft, Oracle, Tableau, SAP and Qlik. To be honest, nothing that surprises me.

Technology Mentions
Alteryx 7
Birst 3
Board International 3
Datameer 3
IBM 11
Information Builders 3
Microsoft 47
MicroStrategy 7
Oracle 25
Pentaho 6
Qlik 13
SAP 16
SAS 5
Sisense 1
Tableau 20
TIBCO Software 2
Yellowfin 1
Zoomdata 1

Conclusion / Wrap up

With this survey I tried to give the business intelligence community more insight into the available Data Warehouse Automation Tools.
Due to the low number of participants, the conclusions come with large uncertainty margins, which is a pity.
Nevertheless, this investigation could be a good starting point for further analysis of DWA tools, or for a DWA tool selection.

(c) 2017 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Is SSDT ready for Visual Studio 2017 RC already? (Tooling Series)

Introduction

At the moment of writing, Microsoft released Visual Studio 2017 RC (Release Candidate) a week ago. It is the fourth release of a Visual Studio 2017 “preview” since November 2016.
The final release is planned for the first half of 2017 (source: Redmond Channel Partner website).
Time to see what’s in the box!
I am especially interested in how it will cooperate with SQL Server Data Tools (SSDT).

Download and installation

Visual Studio 2017 RC1 can be downloaded here. It’s a web installer, so during installation it downloads the components you have selected.
Installation is quite straightforward.
A quiet install (with my PowerShell script) does not work yet; I did not put effort into finding out why.
I just started the executable for a normal install.
The number of components you can install is impressive.


Installable parts of Visual Studio 2017 ..

I have selected Data storage and processing and .NET desktop development (for my open source console applications).
After selection they appear in the summary on the right (see picture below).


Installation summary ..

The installation doesn’t take too long; I did not use a stopwatch, but I think it took about 10 to 15 minutes.

What draws the attention?

I have chosen the blue color theme, and Visual Studio 2017 looks very similar to Visual Studio 2015.


Choosing the color theme ..

What is new is a Visual Studio Installer shortcut in the Windows “Apps” list.


Visual Studio Installer shortcut ..
You can use it to modify your installation easily.


Visual Studio Installer screen ..

Although slightly out of scope, I couldn’t resist opening one of my console apps, SQLAgentJobStarter, in this version to check whether any upgrade or modification would be necessary. And the answer is: no. It just opens.
What draws my attention are editor features that in the past could only be achieved with third-party tools. For example, a code change is suggested and a preview of the change is shown before you apply it. Useful!

Visual Studio suggests code changes and shows a preview of the change ..

Is SSDT ready for Visual Studio 2017?

No. Sorry.
I downloaded SQL Server Data Tools (17.0 RC1); the description says “Includes support for SQL Server vNext CTP1, but not recommended for production use.”.
Apparently it is more related to “SQL Server vNext” (SQL Server 2018?) than to “Visual Studio vNext”.
And it’s just SSDT for Visual Studio 2015.
It doesn’t even install, probably because VS 2015 is not installed.

SSDT (17.0 RC1) does not install ..

But is this bad? Again, no. Just have some patience.
SSDT has a different release cycle than Visual Studio. On Microsoft’s SSDT blog there is no news on any SSDT activity for VS 2017 yet.
I have read on a forum (sorry, I could not find the link anymore) that it usually takes a few months after a new Visual Studio release before an SSDT update becomes available.

Conclusion / Wrap up

So, you can already install VS 2017, but for BI projects (SSIS, SSRS, SSAS) you will still have to use SSDT for VS 2015.
Visual Studio 2017 is expected to be released in the first half of 2017; add a few months for SSDT.

(c) 2017 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Free download: Blog Yearbook 2016

I have made a compilation of all my blog posts of the past year, and it was more work than you would think.
Especially because it also contains a previously unpublished article.

After filling in the form you can download the e-book for free.

I promise you the following:

  • I will not give or sell your email address to any third party.
  • I will not spam you.
  • If I start with a newsletter, you can easily unsubscribe or change the frequency.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Making your SQL Agent job scripts generic for all servers (SQL Server Series)

Introduction

If you schedule jobs and you are using SQL Server (not the Express Edition), SQL Agent is your best friend. I think it is easier to use than the Windows Task Scheduler for most use cases.
The GUI that SQL Server Management Studio offers for creating and editing SQL Agent jobs is also quite good, I think.

Problem

However, the SQL Agent job scripting done by SQL Server Management Studio after the jobs are created has a number of issues (Microsoft developers of the SQL Server team, please read!).
Where shall I start ..

  • I like to write repeatable scripts that raise an error only when something is really wrong, so that when the script executes as expected, no errors are thrown. The generated scripts, however, use brute force to delete any existing job; if the job does not exist yet, an error occurs. This is not what I would call a real error, because it is only logical that a job that does not exist cannot be deleted. A simple IF statement could check whether the job exists and delete it only if it does.
  • Even worse, the generated scripts use @job_id, a unique identifier, to delete the job. By definition, this works only once: after the job is (re)created, the job_id will be different! It would be much better to use @job_name as the parameter for sp_delete_job (yes, this is possible); it is just that the SQL Server developer team chose to use @job_id .. 🙁
  • Because the scripts always delete and (re)create instead of alter/modify, your entire job history is gone when you change anything in your script and execute it on your server (sorry, no workaround for this).
  • Any server-specific values, like SSIS environment IDs and SQL instance names, can make your scripts suitable for only one server: the one you created the job on with the SSMS GUI.

Solution

Given all the challenges above, I was looking for a way so:

  • I could create the SQL Agent jobs with the SSMS GUI.
  • Then script the jobs and put them under source control.
  • Then also make sure I can install the SQL Agent jobs on a different server (development/test/production) with the same, nearly unmodified, script.

For this to work, we are again visiting the system objects area of the msdb database.

The plan is roughly as follows:

  1. Create a wrapper stored procedure for sp_add_jobstep.
  2. Create a generic install script that is partially one-time manual labour and partially a script generated by SSMS for creating the jobs.

Create a wrapper stored procedure for sp_add_jobstep.

This wrapper stored procedure, called [dbo].[usp_add_jobstep_wrapper] consists of the following three parts:

  • PART 1: Check if a temp table #jobstep_replacements exists with project-/customer-specific replacements. This temporary table can be created in a script that creates job steps and contains information about the textual replacements to be made in any (N)VARCHAR parameter of the stored procedure sp_add_jobstep. It would be neater to pass a table variable as a parameter to the stored procedure, but that would make adjusting scripted jobs more work, because the parameter would have to be added to every EXECUTE [dbo].[usp_add_jobstep_wrapper] command.
  • PART 2: Fix environment references in SSIS commands. A feature that is open for improvement in a future SQL Server service pack: when you create a job step that executes an SSIS package and you use a reference to an environment, the script contains a technical ID that is most likely different on another server. This is not easy to fix manually either: you would have to look up all the new IDs with a query and then change them in the script. Good as occupational therapy, but not so good for your productivity.
    So this part of the stored procedure fixes the /ENVREFERENCE for you.
  • PART 3: After the replacements in the parameter values are done and the /ENVREFERENCE is fixed, the built-in stored procedure sp_add_jobstep is executed.
    This implies that we can simply execute [dbo].[usp_add_jobstep_wrapper] instead of [dbo].[sp_add_jobstep], and PART 1 and 2 will be done as extras.
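The string handling of PART 2 can be sketched as follows. This is a minimal Python sketch with hypothetical folder/project names; the real procedure does the same in T-SQL with CHARINDEX/SUBSTRING and then looks up the new reference id in the SSISDB catalog views:

```python
import re

def parse_folder_and_project(command: str) -> tuple:
    # The SSIS job step command contains
    # /ISSERVER "\"\SSISDB\<folder>\<project>\<package>.dtsx\""
    # so the folder and project are the first two path segments after \SSISDB\.
    marker = '/ISSERVER "\\"\\SSISDB\\'
    start = command.index(marker) + len(marker)
    end_folder = command.index('\\', start)
    end_project = command.index('\\', end_folder + 1)
    return command[start:end_folder], command[end_folder + 1:end_project]

def fix_envreference(command: str, new_reference_id: int) -> str:
    # Replace the scripted (server-specific) id behind /ENVREFERENCE
    # by the id that is valid on the target server.
    return re.sub(r'/ENVREFERENCE \d+', f'/ENVREFERENCE {new_reference_id}', command)

# Hypothetical example command (folder "MyFolder", project "MyProject"):
cmd = ('/ISSERVER "\\"\\SSISDB\\MyFolder\\MyProject\\Package.dtsx\\"" '
       '/SERVER "." /ENVREFERENCE 42')
print(parse_folder_and_project(cmd))  # → ('MyFolder', 'MyProject')
print(fix_envreference(cmd, 7))
```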

And here it is (can also be downloaded).

msdb.dbo.usp_add_jobstep_wrapper.sql

--\
---) hansmichiels.com [msdb].[dbo].[usp_add_jobstep_wrapper]
---) Author: Hans Michiels
---) Stored procedure that can help to make SQL Server Agent job scripts usable for multiple servers.
--/
/*
(c) Copyright 2016 - hansmichiels.com
 
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
 
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
 
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
*/

USE [msdb]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[usp_add_jobstep_wrapper]') AND type in (N'P', N'PC'))
  DROP PROCEDURE [dbo].[usp_add_jobstep_wrapper]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER OFF
GO
CREATE PROCEDURE [dbo].[usp_add_jobstep_wrapper]
  @job_id UNIQUEIDENTIFIER = NULL, -- Must provide either this or job_name
  @job_name sysname = NULL, -- Must provide either this or job_id
  @step_id INT = NULL, -- The proc assigns a default
  @step_name sysname,
  @subsystem NVARCHAR(40) = N'TSQL',
  @command NVARCHAR(max) = NULL,
  @additional_parameters NVARCHAR(max) = NULL,
  @cmdexec_success_code INT = 0,
  @on_success_action TINYINT = 1, -- 1 = Quit With Success, 2 = Quit With Failure, 3 = Goto Next Step, 4 = Goto Step
  @on_success_step_id INT = 0,
  @on_fail_action TINYINT = 2, -- 1 = Quit With Success, 2 = Quit With Failure, 3 = Goto Next Step, 4 = Goto Step
  @on_fail_step_id INT = 0,
  @server sysname = NULL,
  @database_name sysname = NULL,
  @database_user_name sysname = NULL,
  @retry_attempts INT = 0, -- No retries
  @retry_interval INT = 0, -- 0 minute interval
  @os_run_priority INT = 0, -- -15 = Idle, -1 = Below Normal, 0 = Normal, 1 = Above Normal, 15 = Time Critical)
  @output_file_name NVARCHAR(200) = NULL,
  @flags INT = 0, -- 0 = Normal,
                                                     -- 1 = Encrypted command (read only),
                                                     -- 2 = Append output files (if any),
                                                     -- 4 = Write TSQL step output to step history,
                                                     -- 8 = Write log to table (overwrite existing history),
                                                     -- 16 = Write log to table (append to existing history)
                                                     -- 32 = Write all output to job history
                                                     -- 64 = Create a Windows event to use as a signal for the Cmd jobstep to abort
  @proxy_id INT = NULL,
  @proxy_name sysname = NULL,
  -- mutual exclusive; must specify only one of above 2 parameters to
  -- identify the proxy.
  @step_uid UNIQUEIDENTIFIER = NULL OUTPUT
AS
BEGIN
  DECLARE @retval INT

  DECLARE @sort_order INT
  DECLARE @max_sort_order INT
  DECLARE @subsystems NVARCHAR(128)
  DECLARE @replace_scope NVARCHAR(128)
  DECLARE @scripted_value NVARCHAR(128)
  DECLARE @replace_value NVARCHAR(128)
  DECLARE @message NVARCHAR(MAX)
  DECLARE @divider_length INT = 120

  DECLARE @folder_name NVARCHAR(128)
  DECLARE @project_name NVARCHAR(128)
  DECLARE @startpos INT
  DECLARE @endpos INT
  DECLARE @endpos_project INT
  DECLARE @reference_id INT = -1
  DECLARE @new_reference NVARCHAR(128)
  DECLARE @new_command NVARCHAR(MAX)
  
  PRINT ' ';
  PRINT REPLICATE('-', @divider_length);
  PRINT 'Stored proc : [usp_add_jobstep_wrapper]'
  PRINT 'Copyright : (c) 2016 - hansmichiels.com'
  PRINT 'License : GNU General Public License, see http://www.gnu.org/licenses/'

  IF @job_name IS NOT NULL
  BEGIN
  -- PRINT 'Stored proc:
    SELECT @message = 'Job : ' + @job_name;
  END ELSE BEGIN
    SELECT @message = 'Job : ' + ISNULL((SELECT TOP 1 j.name FROM dbo.sysjobs j WHERE job_id = @job_id), N'(unknown)');
  END
  PRINT @message;
  SELECT @message = 'Jobstep : ' + @step_name;
  PRINT @message;
  PRINT ' ';

  --\---------------------------------------------------------------------------------------------------------
  ---) PART 1: Check if temp table #jobstep_replacements exists with project-/customer specific replacements.
  --/---------------------------------------------------------------------------------------------------------

  IF OBJECT_ID('tempdb..#jobstep_replacements', 'U') IS NULL
  BEGIN

    PRINT 'No jobstep_replacements were found, installing original script.';

  END ELSE BEGIN

    PRINT 'If any replacements are made, they will be shown below.';

    DECLARE replacements_cursor CURSOR LOCAL STATIC FOR
    SELECT TOP (2147483647) -- Otherwise ORDER BY might not work.
        CASE
          WHEN ISNULL(v.[subsystems], N'') IN (N'*', N'')
          THEN N'*'
          ELSE N',' + v.[subsystems] + N','
          END AS [subsystems],
        CASE
          WHEN ISNULL(v.[replace_scope], N'') IN (N'*', N'')
          THEN N'*'
          ELSE N',' + v.[replace_scope] + N','
          END AS [replace_scope],
        v.[scripted_value],
        v.[replace_value]
    FROM
        #jobstep_replacements v
    ORDER BY
        v.sort_order,
        v.id;
              
    OPEN replacements_cursor;

    FETCH NEXT FROM replacements_cursor
      INTO @subsystems, @replace_scope, @scripted_value, @replace_value;

    WHILE @@FETCH_STATUS = 0
    BEGIN

      IF (@subsystems = N'*' OR CHARINDEX(N',' + @subsystem + N',', @subsystems, 1) > 0)
          AND @replace_value IS NOT NULL
          AND @scripted_value != @replace_value
      BEGIN

        IF (@replace_scope = N'*' OR CHARINDEX(N',@command,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @command, 1) > 0
        BEGIN
            SET @message = 'Replacement in @command: ' + @scripted_value + ' by ' + @replace_value;
            PRINT ' '
            PRINT @message;
            PRINT 'Old @command: ' + @command;
            SELECT @command = REPLACE(@command, @scripted_value, @replace_value);
            PRINT 'New @command: ' + @command;
        END

        IF (@replace_scope = N'*' OR CHARINDEX(N',@additional_parameters,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @additional_parameters, 1) > 0
        BEGIN
            SET @message = 'Replacement in @additional_parameters: ' + @scripted_value + ' by ' + @replace_value;
            PRINT ' '
            PRINT @message;
            PRINT 'Old @additional_parameters: ' + @additional_parameters;
            SET @additional_parameters = REPLACE(@additional_parameters, @scripted_value, @replace_value);
            PRINT 'New @additional_parameters: ' + @additional_parameters;
        END

        IF (@replace_scope = N'*' OR CHARINDEX(N',@server,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @server, 1) > 0
        BEGIN
            SET @message = 'Replacement in @server: ' + @scripted_value + ' by ' + @replace_value;
            PRINT ' '
            PRINT @message;
            PRINT 'Old @server: ' + @server;
            SET @server = REPLACE(@server, @scripted_value, @replace_value);
            PRINT 'New @server: ' + @server;
        END

        IF (@replace_scope = N'*' OR CHARINDEX(N',@database_name,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @database_name, 1) > 0
        BEGIN
            SET @message = 'Replacement in @database_name: ' + @scripted_value + ' by ' + @replace_value;
            PRINT ' '
            PRINT @message;
            PRINT 'Old @database_name: ' + @database_name;
            SET @database_name = REPLACE(@database_name, @scripted_value, @replace_value);
            PRINT 'New @database_name: ' + @database_name;
        END

        IF (@replace_scope = N'*' OR CHARINDEX(N',@database_user_name,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @database_user_name, 1) > 0
        BEGIN
            SET @message = 'Replacement in @database_user_name: ' + @scripted_value + ' by ' + @replace_value;
            PRINT ' '
            PRINT @message;
            PRINT 'Old @database_user_name: ' + @database_user_name;
            SET @database_user_name = REPLACE(@database_user_name, @scripted_value, @replace_value);
            PRINT 'New @database_user_name: ' + @database_user_name;
        END

        IF (@replace_scope = N'*' OR CHARINDEX(N',@proxy_name,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @proxy_name, 1) > 0
        BEGIN
            SET @message = 'Replacement in @proxy_name: ' + @scripted_value + ' by ' + @replace_value;
            PRINT ' '
            PRINT @message;
            PRINT 'Old @proxy_name: ' + @proxy_name;
            SET @proxy_name = REPLACE(@proxy_name, @scripted_value, @replace_value);
            PRINT 'New @proxy_name: ' + @proxy_name;
        END

        IF (@replace_scope = N'*' OR CHARINDEX(N',@output_file_name,', @replace_scope, 1) > 0) AND CHARINDEX(@scripted_value, @output_file_name, 1) > 0
        BEGIN
            SET @message = 'Replacement in @output_file_name: ' + @scripted_value + ' by ' + @replace_value;
            PRINT ' '
            PRINT @message;
            PRINT 'Old @output_file_name: ' + @output_file_name;
            SET @output_file_name = REPLACE(@output_file_name, @scripted_value, @replace_value);
            PRINT 'New @output_file_name: ' + @output_file_name;
        END

      END

      FETCH NEXT FROM replacements_cursor
        INTO @subsystems, @replace_scope, @scripted_value, @replace_value;
      
    END

    CLOSE replacements_cursor
    DEALLOCATE replacements_cursor
          
  END

  --\---------------------------------------------------------------------------------------------------------
  ---) PART 2: Fix environment references in ssis commands.
  --/---------------------------------------------------------------------------------------------------------
         
  --\
  ---) First check if there is something to do
  --/
  IF @subsystem = N'SSIS' AND CHARINDEX(N'/ENVREFERENCE', @command, 1) > 0 AND CHARINDEX(N'/ISSERVER "\"\SSISDB\', @command, 1) > 0
  BEGIN
    --\
    ---) Pull out @folder_name and @project_name from the @command variable value
    --/
    SELECT
      @startpos = CHARINDEX(N'/ISSERVER "\"\SSISDB\', @command, 1) + LEN(N'/ISSERVER "\"\SSISDB\'),
      -- @endpos = CHARINDEX(N'dtsx\""', @command, @startpos),
      @endpos = CHARINDEX(N'\', @command, @startpos + 1),
      @endpos_project = CHARINDEX(N'\', @command, @endpos + 1),
      @folder_name = SUBSTRING(@command, @startpos, @endpos - @startpos),
      @project_name = SUBSTRING(@command, @endpos + 1, @endpos_project - @endpos - 1);

      --\
      ---) Armed with the correct @folder_name and @project_name get the environment reference id.
      --/
      SELECT @reference_id = ISNULL((
          SELECT TOP 1
              er.reference_id
          FROM
              SSISDB.catalog.environments AS env
          JOIN
              SSISDB.catalog.folders AS fld
              ON fld.folder_id = env.folder_id
          JOIN
              SSISDB.catalog.projects AS prj
              ON prj.folder_id = fld.folder_id
          JOIN
              SSISDB.catalog.environment_references AS er
              ON er.project_id = prj.project_id
          WHERE
              fld.name = @folder_name
              AND prj.name = @project_name
      ), -1)

  END

  --\
  ---) If a valid environment reference id was found ..
  --/
  IF @reference_id != -1
  BEGIN
    SELECT
      --\
      ---) .. adjust the /ENVREFERENCE part of the @command so that it is followed by the retrieved @reference_id.
      --/
      @startpos = CHARINDEX(N'/ENVREFERENCE ', @command, 1),
      @endpos = CHARINDEX(N' ', @command, @startpos + LEN(N'/ENVREFERENCE ') + 1);

      SELECT
          @new_reference = '/ENVREFERENCE ' + CONVERT(NVARCHAR, @reference_id),
          @new_command = LEFT(@command, @startpos - 1) + @new_reference + SUBSTRING(@command, @endpos, 8000);
      IF @new_command != @command
      BEGIN
          SET @message = 'Replacement in @command: fixing /ENVREFERENCE';
          PRINT ' '
          PRINT @message;
          PRINT 'Old @command: ' + @command;
          PRINT 'New @command: ' + @new_command;
          SET @command = @new_command;
      END
  END

  --\---------------------------------------------------------------------------------------------------------
  ---) PART 3: Now we have done our trick with the parameter values,
  ---) execute the built-in stored procedure sp_add_jobstep.
  --/---------------------------------------------------------------------------------------------------------

  EXECUTE @retval = dbo.sp_add_jobstep
      @job_id = @job_id,
      @job_name = @job_name,
      @step_id = @step_id,
      @step_name = @step_name,
      @subsystem = @subsystem,
      @command = @command,
      @additional_parameters = @additional_parameters,
      @cmdexec_success_code = @cmdexec_success_code,
      @on_success_action = @on_success_action,
      @on_success_step_id = @on_success_step_id,
      @on_fail_action = @on_fail_action,
      @on_fail_step_id = @on_fail_step_id,
      @server = @server,
      @database_name = @database_name,
      @database_user_name = @database_user_name,
      @retry_attempts = @retry_attempts,
      @retry_interval = @retry_interval,
      @os_run_priority = @os_run_priority,
      @output_file_name = @output_file_name,
      @flags = @flags,
      @proxy_id = @proxy_id,
      @proxy_name = @proxy_name,
      @step_uid = @step_uid OUTPUT

  RETURN(@retval)
END
GO

Create a generic install script

This install script consists of the following parts:

PART 1: Create temporary table #jobstep_replacements with specific replacements.

This is the part where you customize for your own environments.
Two things require further explanation: subsystems and replace-scope.
Each jobstep is defined for a specific subsystem. Valid values are:

Full name                                Code
---------------------------------------- ---------------
Operating system (CmdExec)               CmdExec
PowerShell                               PowerShell
Replication Distributor                  Distribution
Replication Merge                        Merge
Replication Queue Reader                 QueueReader
Replication Snapshot                     Snapshot
SQL Server Analysis Services Command     ANALYSISCOMMAND
SQL Server Analysis Services Query       ANALYSISQUERY
SQL Server Integration Services Package  SSIS
Transact-SQL script (T-SQL)              TSQL

The subsystem codes are used in SQL scripts generated by SSMS to create jobsteps.

“Replace scope” is not an official term; it is just what I call it.
Valid values are the names of any (n)varchar parameter of sp_add_jobstep, so:

  • @command
  • @additional_parameters
  • @server
  • @database_name
  • @database_user_name
  • @proxy_name
  • @output_file_name

The temporary table #jobstep_replacements contains the columns [subsystems] and [replace_scope].
Both can be filled with a comma-separated list to control the subsystems and parameters where the replacement should take place. A * wildcard means “all subsystems” or “all parameters”.
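The list matching itself can be done with a simple CHARINDEX trick: wrap both the list and the name being searched for in commas, so that a partial name can never match. A minimal sketch (the wrapper pre-wraps the scope string internally; here it is done inline, and the variable values are illustrative):

```sql
-- Minimal sketch of the comma-separated list matching (illustrative values).
DECLARE @replace_scope  NVARCHAR(128) = N'@command,@database_name';
DECLARE @parameter_name NVARCHAR(128) = N'@database_name';

-- Wrap both sides in commas before searching, so '@database_name'
-- cannot accidentally match inside a longer parameter name.
IF @replace_scope = N'*'
   OR CHARINDEX(N',' + @parameter_name + N',', N',' + @replace_scope + N',', 1) > 0
BEGIN
    PRINT 'Replacement applies to ' + @parameter_name;
END
```

The same technique is used for the [subsystems] column with subsystem codes instead of parameter names.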

An example of part 1:

MSSQL-E08-jobscript-part1

--\------------------------------------------------------------------------------------
---) PART 1: Create temporary table #jobstep_replacements with specific replacements.
--/------------------------------------------------------------------------------------

IF OBJECT_ID('tempdb..#jobstep_replacements', 'U') IS NOT NULL
DROP TABLE #jobstep_replacements;

IF @@SERVERNAME != 'VWS63-SQL161' -- The server where the script was generated.
BEGIN
    CREATE TABLE #jobstep_replacements
    --\------------------------------------------------------------------------------------
    ---) IMPORTANT; do not change the structure of this temporary table because
    ---) stored procedure [usp_add_jobstep_wrapper] depends on it.
    --/------------------------------------------------------------------------------------
    (
    [id] INT IDENTITY(1, 1),
    [subsystems] NVARCHAR(128) NOT NULL, -- Use wildcard * or comma separated list with subsystem codes e.g. SSIS,TSQL (don't use spaces!)
    /*
    SUBSYSTEMS
    -------------------------------------- ---------------------
    Full name Code
    -------------------------------------- ---------------------
    Operating system (CmdExec) CmdExec
    PowerShell PowerShell
    Replication Distributor Distribution
    Replication Merge Merge
    Replication Queue Reader QueueReader
    Replication Snapshot Snapshot
    SQL Server Analysis Services Command ANALYSISCOMMAND
    SQL Server Analysis Services Query ANALYSISQUERY
    SQL Server Integration Services Package SSIS
    Transact-SQL script (T-SQL) TSQL
    */

    [replace_scope] NVARCHAR(128) NOT NULL, -- Use wildcard * or comma separated list with parameter names e.g. @command,@proxy_name (don't use spaces!)
    /*
    --------------------------------------
    REPLACE-SCOPES
    --------------------------------------
    @command
    @additional_parameters
    @server
    @database_name
    @database_user_name
    @proxy_name
    @output_file_name
    */

    [scripted_value] NVARCHAR(128) NOT NULL,
    [replace_value] NVARCHAR(128) NULL,
    [sort_order] INT NOT NULL DEFAULT(100)
    );
END

--\
---) Fill the temporary table, based on the current SQL Server Instance name.
---) Include the server on which the scripts were made in one IF statement, so the records can be updated later.
--/
IF @@SERVERNAME = 'VWS63-SQL161\TABULAR'
BEGIN
    INSERT INTO #jobstep_replacements
    ( [sort_order], [subsystems], [replace_scope], [scripted_value], [replace_value])
    VALUES
      (1, N'SSIS',
          N'@command',
          N'VWS63-SQL161\"',
          N'VWS63-SQL161\TABULAR\"'),

      (2, N'SSIS',
          N'@command',
          N'DEV',
          N'TST'),

     (3, N'TSQL',
          N'@command,@database_name',
          N'Demo_DEV',
          N'Demo_TST'),

     (4, N'PowerShell',
          N'@command',
          N'F:\Files\Development',
          N'F:\Files\Test')
END
ELSE IF @@SERVERNAME = 'PRODSERVER'
BEGIN
    INSERT INTO #jobstep_replacements
    ( [sort_order], [subsystems], [replace_scope], [scripted_value], [replace_value])
    VALUES
      (1, N'SSIS',
          N'@command',
          N'VWS63-SQL161\"',
          N'VWS63-SQL163\PRD\"'),

      (2, N'SSIS',
          N'@command',
          N'DEV',
          N'PRD'),

     (3, N'TSQL',
          N'@command,@database_name',
          N'Demo_DEV',
          N'Demo_PRD'),

     (4, N'PowerShell',
          N'@command',
          N'F:\Files\Development',
          N'F:\Files\Production')
END

PART 2: Smart sp_delete_job statements, using @job_name and IF EXISTS.

Those statements can be generated using a SQL query. This query is available as a comment in the MSSQL-E08-jobscript.sql script (download the scripts here). Just select it and execute it, then copy and paste the result from the query output window.

An example of part 2:

MSSQL-E08-jobscript-part2

--\------------------------------------------------------------------------------------
---) PART 2: Smart sp_delete_job statements, using @job_name and IF EXISTS
--/------------------------------------------------------------------------------------

USE [msdb]
GO
/* -- Query to generate sp_delete_job statements:
SELECT
    CASE sql.line_no
      WHEN 1 THEN 'IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE [name] = N'''+ name + ''')'
      WHEN 2 THEN ' EXEC msdb.dbo.sp_delete_job @job_name=N'''+ name + ''', @delete_unused_schedule=1;'
      WHEN 3 THEN 'GO'
    END AS [drop_sql_for_copy_and_paste]
FROM
    msdb.dbo.sysjobs j
CROSS JOIN -- To force new lines in the output, a cross join with 3 line_no rows is used.
    (SELECT 1 AS [line_no] UNION SELECT 2 UNION SELECT 3) AS [sql]
WHERE
    j.name like 'MSSQL-E08%' -- IMPORTANT: here you filter on the jobs to create delete statements for.
ORDER BY
    j.name,
    sql.line_no
*/

IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE [name] = N'MSSQL-E08-Job1-SSIS')
  EXEC msdb.dbo.sp_delete_job @job_name=N'MSSQL-E08-Job1-SSIS', @delete_unused_schedule=1;
GO
IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE [name] = N'MSSQL-E08-Job2-TSQL')
  EXEC msdb.dbo.sp_delete_job @job_name=N'MSSQL-E08-Job2-TSQL', @delete_unused_schedule=1;
GO
IF EXISTS (SELECT 1 FROM msdb.dbo.sysjobs WHERE [name] = N'MSSQL-E08-Job3-Powershell')
  EXEC msdb.dbo.sp_delete_job @job_name=N'MSSQL-E08-Job3-Powershell', @delete_unused_schedule=1;
GO

PART 3: Create jobs, script generated by SSMS.

Here you paste the script to create jobs that was generated by SQL Server Management Studio. After pasting, replace sp_add_jobstep by usp_add_jobstep_wrapper.

An example of part 3:

MSSQL-E08-jobscript-part3

--\------------------------------------------------------------------------------------
---) PART 3: Create jobs, script generated by SSMS
--/------------------------------------------------------------------------------------
--\
---) IMPORTANT NOTE: You can generate the part below as follows:
---) In SQL Server Management Studio select a SQL Agent job, then press F7
---) In the Object Explorer Details Pane, select the jobs you want to create a script for.
---) Then right-click and in the context menu select:
---) > Script Job As > CREATE To > New Query Editor Window
---) In the script generated with SQL Server Management Studio, replace
---) sp_add_jobstep
---) by
---) usp_add_jobstep_wrapper
--/
USE [msdb]
GO

/****** Object: Job [MSSQL-E08-Job1-SSIS] Script Date: 19-12-2016 18:59:58 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object: JobCategory [[Uncategorized (Local)]] Script Date: 19-12-2016 18:59:58 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'MSSQL-E08-Job1-SSIS',
  @enabled=1,
  @notify_level_eventlog=0,
  @notify_level_email=0,
  @notify_level_netsend=0,
  @notify_level_page=0,
  @delete_level=0,
  @description=N'No description available.',
  @category_name=N'[Uncategorized (Local)]',
  @owner_login_name=N'VWS63-SQL161\Hans', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [AnyPackage.dtsx] Script Date: 19-12-2016 18:59:58 ******/
EXEC @ReturnCode = msdb.dbo.usp_add_jobstep_wrapper @job_id=@jobId, @step_name=N'AnyPackage.dtsx',
  @step_id=1,
  @cmdexec_success_code=0,
  @on_success_action=1,
  @on_success_step_id=0,
  @on_fail_action=2,
  @on_fail_step_id=0,
  @retry_attempts=0,
  @retry_interval=0,
  @os_run_priority=0, @subsystem=N'SSIS',
  @command=N'/ISSERVER "\"\SSISDB\DEV\SSIS-E10-P2\AnyPackage.dtsx\"" /SERVER "\"VWS63-SQL161\"" /ENVREFERENCE 4 /Par "\"$ServerOption::LOGGING_LEVEL(Int16)\"";1 /Par "\"$ServerOption::SYNCHRONIZED(Boolean)\"";True /CALLERINFO SQLAGENT /REPORTING E',
  @database_name=N'master',
  @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Schedule for MSSQL-E08-Job1-SSIS',
  @enabled=0,
  @freq_type=4,
  @freq_interval=1,
  @freq_subday_type=1,
  @freq_subday_interval=0,
  @freq_relative_interval=0,
  @freq_recurrence_factor=0,
  @active_start_date=20161216,
  @active_end_date=99991231,
  @active_start_time=60000,
  @active_end_time=235959,
  @schedule_uid=N'6d24a58d-f800-4341-ab5b-41fbe4923da8'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:

GO

/****** Object: Job [MSSQL-E08-Job2-TSQL] Script Date: 19-12-2016 18:59:58 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object: JobCategory [[Uncategorized (Local)]] Script Date: 19-12-2016 18:59:58 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'MSSQL-E08-Job2-TSQL',
  @enabled=1,
  @notify_level_eventlog=0,
  @notify_level_email=0,
  @notify_level_netsend=0,
  @notify_level_page=0,
  @delete_level=0,
  @description=N'No description available.',
  @category_name=N'[Uncategorized (Local)]',
  @owner_login_name=N'VWS63-SQL161\Hans', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [Demo_DEV] Script Date: 19-12-2016 18:59:58 ******/
EXEC @ReturnCode = msdb.dbo.usp_add_jobstep_wrapper @job_id=@jobId, @step_name=N'Demo_DEV',
  @step_id=1,
  @cmdexec_success_code=0,
  @on_success_action=1,
  @on_success_step_id=0,
  @on_fail_action=2,
  @on_fail_step_id=0,
  @retry_attempts=0,
  @retry_interval=0,
  @os_run_priority=0, @subsystem=N'TSQL',
  @command=N'SELECT * from sys.objects',
  @database_name=N'Demo_DEV',
  @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Schedule for MSSQL-E08-Job2-TSQL',
  @enabled=0,
  @freq_type=4,
  @freq_interval=1,
  @freq_subday_type=1,
  @freq_subday_interval=0,
  @freq_relative_interval=0,
  @freq_recurrence_factor=0,
  @active_start_date=20161216,
  @active_end_date=99991231,
  @active_start_time=70000,
  @active_end_time=235959,
  @schedule_uid=N'a25c43d5-8543-4723-903b-beeb6d9a07a3'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:

GO

/****** Object: Job [MSSQL-E08-Job3-Powershell] Script Date: 19-12-2016 18:59:58 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object: JobCategory [[Uncategorized (Local)]] Script Date: 19-12-2016 18:59:58 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'MSSQL-E08-Job3-Powershell',
  @enabled=1,
  @notify_level_eventlog=0,
  @notify_level_email=0,
  @notify_level_netsend=0,
  @notify_level_page=0,
  @delete_level=0,
  @description=N'No description available.',
  @category_name=N'[Uncategorized (Local)]',
  @owner_login_name=N'VWS63-SQL161\Hans', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [Powershell step] Script Date: 19-12-2016 18:59:59 ******/
EXEC @ReturnCode = msdb.dbo.usp_add_jobstep_wrapper @job_id=@jobId, @step_name=N'Powershell step',
  @step_id=1,
  @cmdexec_success_code=0,
  @on_success_action=1,
  @on_success_step_id=0,
  @on_fail_action=2,
  @on_fail_step_id=0,
  @retry_attempts=0,
  @retry_interval=0,
  @os_run_priority=0, @subsystem=N'PowerShell',
  @command=N'dir "F:\Files\Development\*.csv"',
  @database_name=N'master',
  @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Schedule for MSSQL-E08-Job3-Powershell',
  @enabled=0,
  @freq_type=4,
  @freq_interval=1,
  @freq_subday_type=1,
  @freq_subday_interval=0,
  @freq_relative_interval=0,
  @freq_recurrence_factor=0,
  @active_start_date=20161216,
  @active_end_date=99991231,
  @active_start_time=110000,
  @active_end_time=235959,
  @schedule_uid=N'a6f0e0ca-d27a-4d3f-a349-4a53c0392541'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:

GO

It’s demo time ..

It is also time to set up a demo, because otherwise you would just have to believe me that this works, and I always like to prove it.
For this I have deployed two SSIS projects from an earlier blog post to two SQL Instances on my virtual server. One instance is the default instance MSSQLSERVER and the other one is called TABULAR (it has SSAS installed in TABULAR mode).
For the demo I will pretend however that the default instance is my development environment and the TABULAR instance is my test.
On the default instance I create three SQL Agent jobs with an SSIS, a TSQL and a PowerShell jobstep, respectively.

Jobs created for the demo ..

Then I create a script from the jobs, paste it into script MSSQL-E08-jobscript.sql and replace sp_add_jobstep by usp_add_jobstep_wrapper.

Generating a CREATE script for the jobs ..

After doing the necessary preparation to install usp_add_jobstep_wrapper on both instances, I can create the SQL Agent jobs on the TABULAR instance.
The output window shows which replacements were done.

As you can see in the output window, multiple replacements will be applied one after the other ..

Replacement in the TSQL jobstep ..

Replacement in the PowerShell jobstep..

I can still use the script on the default instance; on that instance simply no replacements will be done.

No replacements done on the default instance where the CREATE script was generated ..

Download the scripts here.

Conclusion / Wrap up

In this blog post you could read how to cope with some peculiarities of SQL Server Management Studio when it creates SQL Agent job scripts.
I have explained how you can make your script generic for all environments, which makes it more suitable for putting under source control.
For this a special stored procedure, [dbo].[usp_add_jobstep_wrapper], is installed in the [msdb] database, and a generic install script must be made that is partly developed by hand and partly pasted from the SSMS-generated script.

And, of course, have a merry Christmas and a happy New Year!

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Quiet install of SQL Server + SP1 + all tools (SQL Server Series)

Introduction

Due to some serious trouble with my SQL Server 2016 Virtual Machine I had to reinstall SQL Server + SP1 + SSMS + Visual Studio 2015 + SQL Server Data Tools again on an older backup of the Virtual Machine.
So the article I had planned next is delayed.
But my bad luck was still inspiration for this unplanned article.

Problem

How can I do quiet installations of SQL Server 2016 including all service packs and tools?

Solution

So what I want to do is install the following products with minimum effort and user interaction after each other, in the Netherlands we would say “gewoon, omdat het kan” (this means something like “just because it is possible”):

  1. SQL Server 2016 Developer Edition RTM
  2. SQL Server 2016 SP1
  3. SQL Server Management Studio 2016
  4. Visual Studio Community 2015
  5. SQL Server Data Tools

The structure of this article will be practical and straightforward.
For each program to install I will give comments and details about the command line options, one or more external references for more info when required, and a location to download the software.

A few common remarks that are applicable to all programs:

  • I have chosen “unattended but visible” installations when possible. This means no user interaction is required, but you can see what is happening.
    These switches are called either /qs or /passive.
  • Before doing an unattended install you have to download all the media. I mounted one ISO (SQL Server) as a DVD drive in my Hyper-V virtual machine, and copied the other setup files onto a virtual hard disk X (e.g. mounted the downloaded ISO file of SSDT and copied the files to X:\Install\SSDT).
  • Also, /norestart is used whenever possible. Not because I do not want to reboot, but because the reboot is taken care of in the PowerShell script (using shutdown -r -t 0), so the script can first create an empty dummy file. When this file exists, the script knows that the setup has already run.
  • And last but not least: I think it is appropriate to thank Microsoft Corporation. All the software above is FOR FREE now (but please do respect the sometimes limiting license terms)! This is heaven for any developer who just wants to play around without any license costs. So: Microsoft, a sincere Thank You.

Finally a PowerShell script (download the script here) is used that calls all installers, one after the other. You can choose to reboot inbetween, and then run the same script again.
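To give an idea of the structure, one installer block of such a script could look like the sketch below. This is not the actual downloadable script; the paths, file names and "done"-file convention are illustrative assumptions based on the description above.

```powershell
# Illustrative sketch of one installer block (paths and names are assumptions).
$doneFile = "X:\Install\SQL2016_RTM.done"

if (-not (Test-Path $doneFile))
{
    # Run the installer unattended but visible, and wait for it to finish.
    Start-Process -FilePath "D:\setup.exe" `
        -ArgumentList "/IACCEPTSQLSERVERLICENSETERMS /ConfigurationFile=X:\Install\ConfigurationFile.ini" `
        -Wait

    # Create the empty dummy file, so a rerun of the script skips this step.
    New-Item -ItemType File -Path $doneFile | Out-Null

    # Reboot; after the reboot, run this script again and it continues
    # with the first program that is not installed yet.
    shutdown -r -t 0
}
```

The real script contains one such block per installer, executed in order.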

SQL Server 2016 Developer Edition RTM

Comments and command line options

Two things are important here, you must add /IACCEPTSQLSERVERLICENSETERMS and you must have a ConfigurationFile that specifies all the details for the installation.
The simplest way to obtain a ConfigurationFile is to start the setup manually and walk through the wizard until it is ready to install. Then a path to the ConfigurationFile.ini is shown.
Save this file and you can use it for future installations.

To make the configuration file suitable for a “progress only” installation, set QUIETSIMPLE="True".
Also put a semicolon in front of UIMODE to comment it out, because it is not possible to use QUIET or QUIETSIMPLE together with UIMODE.
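The relevant lines in the configuration file then look like this (fragment only; all other settings stay as the wizard generated them):

```ini
; ConfigurationFile.ini (fragment)
; Setup will display progress only, without any user interaction.
QUIETSIMPLE="True"
; UIMODE cannot be combined with QUIET or QUIETSIMPLE, so comment it out:
; UIMODE="Normal"
```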
Changes needed in the SQL Server Configuration File ..

External references

Install SQL Server 2016 Using a Configuration File.

Where to download

SQL Server 2016 Developer Edition Download (Microsoft Account required).

SQL Server 2016 SP1

Comments and command line options

While writing this article I noticed that on the page above the installation media of SQL Server 2016 is also available including SP1, in which case you do not need to download SP1 separately.
If you need SP1, e.g. because you have existing installation media (maybe other than Developer Edition) the following can be said about the command line options:
I have used /action=Patch /allinstances /qs /IAcceptSQLServerLicenseTerms.
If you do not want to patch all instances on a server, check the external references below for more info.
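Put together, the call could look like this. Note that the exact name of the downloaded SP1 package is an assumption here and may differ:

```bat
REM Illustrative: patch all instances unattended (package file name may differ).
SQLServer2016SP1-KB3182545-x64-ENU.exe /action=Patch /allinstances /qs /IAcceptSQLServerLicenseTerms
```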

External references

Silent install info on Technet.

Where to download

Microsoft® SQL Server® 2016 Service Pack 1 (SP1) download.

SQL Server Management Studio 2016

Comments and command line options

The command line options used are /install /passive /norestart.
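Put together, the call could look like this. The setup executable for SSMS 2016 is typically named SSMS-Setup-ENU.exe, but treat that name as an assumption for your download:

```bat
REM Illustrative: silent SSMS install (setup file name may differ per version).
SSMS-Setup-ENU.exe /install /passive /norestart
```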

External references

Performing a Silent Install of SQL Server Management Studio (2016) by Sven Aelterman.

Where to download

Download SQL Server Management Studio (SSMS).

Visual Studio Community 2015

Comments and command line options

Microsoft recommends installing Visual Studio 2015 before SQL Server Data Tools, as follows:
“We recommend installing Visual Studio 2015 prior to applying this update. Installing this update will replace SSDT RTM in Visual Studio 2015 with the latest version.
If you do not have Visual Studio 2015, SSDT will install the Visual Studio 2015 Integrated shell and Visual Studio 2015 Isolated shell with limited feature support for SQL Server Database and BI Projects.”

I like to install Visual Studio, so I can do other stuff like creating console applications. If you are sure you will only use the Business Intelligence “Suite” you could skip installing Visual Studio.
The command line options used are /Passive /NoRestart /Log “X:\Install\VSTUD_LOG\VSTUD.log”.
These options lead to an installation of about 7.2 GB.
It is best to provide a logfile name in a separate “dedicated” folder, because not one, but hundreds of logfiles are created. I created the folder manually beforehand. I have not tested (sorry, I forgot) whether the setup would create the folder if it did not exist.
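Using the options and log path mentioned above, the complete call looks like this (the X:\Install path is of course specific to my setup):

```bat
REM Illustrative: Visual Studio Community 2015 unattended install with logging.
vs_community.exe /Passive /NoRestart /Log "X:\Install\VSTUD_LOG\VSTUD.log"
```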

I must also warn you that the command line option /Full leads to a massive install of about 56 GB and takes quite a long time. Because the default installation installs everything I could possibly need, I did not use /Full.

When you run vs_community.exe /? you get an overview of the command line parameters.

External references

Using Command-Line Parameters to Install Visual Studio.
How to: Create and Run an Unattended Installation of Visual Studio.
Install Visual Studio 2015.

Where to download

VS 2015 Community Edition download.

SQL Server Data Tools

Comments and command line options

Hoorah, Microsoft ended the confusion and separate setups for SSDT (Visual Studio project type for Database projects) and SSDT-BI (formerly known as BIDS with project types for SSIS-, SSRS- and SSAS-development).
The current installer contains both!
One of the command line options really caught me out, and therefore a
WARNING!
Do not use a / before the command line options INSTALLAS, INSTALLIS, INSTALLRS and INSTALLALL!
Although a slash is not shown in the help screen below, it is easy to assume that a / is required (or to overlook this on the help screen). Intuitively you would expect all command line parameters to start with /, but believe me, I did this, and both the installed software and the install log file proved that the options were not applied when I provided /INSTALLALL=1 as command line parameter.
A line from the logfile: Condition ‘INSTALLIS = 1 OR INSTALLALL = 1’ evaluates to false.
When I used INSTALLALL=1 as command line parameter, all Business Intelligence project types were installed!

When you run SSDTSETUP.EXE /? you get an overview of the command line parameters.

External references

Blog article by Dandy Weyn.

Where to download

Download SQL Server Data Tools (SSDT).
SQL Server Data Tools in Visual Studio 2015.

Using PowerShell to install all

Please note that the installation is not “fully” automated. After each reboot, you have to restart the PowerShell script, and it will continue with the first program that is not installed yet.
If you like, you could create a Windows Scheduled Task and start the script “with the highest privileges” after a reboot. I tried this but it did not seem to work very well; maybe I did something wrong.
So I just start the script a few times manually after each reboot, not a big deal IMHO (but of course this depends on how you intend to use the script).

From a technical point of view, the comment in the script should help you further:
“Help” section of the PowerShell script. Please read!

The MAIN SCRIPT contains multiple code blocks like this, one for each installer.

Running the PowerShell script ..

My Install folder with some “done” files ..

Command-based dialog after an installation is completed ..

Printscreen of installation in progress ..

After installation both “SQL Server Data Tools 2015” and “Visual Studio 2015” are available as Apps on Windows Server 2012 R2. In my experience you can use both for all project types, however I tend to use Visual Studio ..

Where to download

Download the PowerShell script here.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Zeros, bloody zeros! (Data Vault Series)

Introduction

I must admit I have a weakness for British humour.
When I had to cope with leading zeros in business keys some time ago, I spontaneously came up with the title of this post, not knowing that it would serve as such.
For those who do not know, “Meetings, bloody meetings” is a British comedy training film in which John Cleese plays a main role. It was made in 1976, and a remake was made in 2012.
It tells in a funny way what can go wrong at meetings and how you can do better, check it out if you can.

Mr John Cleese

But, obviously, this post is not about meetings but about zeros.

Problem

I can be short about that: leading zeros in business key values.
For instance a customer number is delivered to the data warehouse as 0001806 (instead of 1806).
This would not be a problem if it would always be delivered exactly like that. But to be honest, you cannot and will not know that upfront. Even if this is the case now, it might not be in the future.
When other tools are used, leading zeros could suddenly disappear (for instance when a csv file is modified using Excel), or (more rarely) the number of leading zeros could change (01806, 00001806). When this happens you have a problem, because for the data warehouse 01806, 0001806, 00001806 and 1806 are all different business keys! Even if you have only two variants, it is already a problem.
Because every business key gets a different row in the hub, this customer now exists multiple times!
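A tiny example makes the pain visible. To the data warehouse, these four variants of the “same” customer number are four distinct business keys:

```sql
-- Four zero-padding variants of the same customer number:
SELECT DISTINCT src.[customer_number]
FROM (VALUES (N'1806'), (N'01806'), (N'0001806'), (N'00001806'))
     AS src([customer_number]);
-- Returns four rows: each variant would get its own hub row.
```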

(No acting here, this is how I look sometimes)

Solution

If you are familiar with Data Vault, you might already think of same-as-links to solve this.
But I think the solution should be implemented earlier, to avoid having multiple hub rows.
Simply always remove leading zeros when the source column is (part of) a business key (either primary or foreign key) and appears to be a number or ID but is delivered as a string/varchar. In this way 1806 will always be 1806! And I think it is pretty impossible that 001806 and 1806 would refer to two different customers.
Unless, of course, they would come from different source systems. But in that situation, depending on leading zeros would be a bad thing to do anyway, because when the leading zeros are dropped, satellite rows of different customers (in different source systems) could end up connected to the same hub row! In this situation, in a non-integrated Raw Vault, it would be better to prefix the business key with the source system code and remove the leading zeros, for instance CRM.1806 and ERP.1806.
In all cases, you can still store the original value (with leading zeros) as an ordinary attribute in a satellite for auditing reasons.
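To make the rule concrete, here is a small Python sketch of the normalization. The function name and the behavior for all-zero values are my own interpretation of the T-SQL logic later in this post; treat it as an illustration, not as the implementation used in the data warehouse.

```python
def normalize_business_key(raw_key, source_code=None):
    """Trim spaces and strip leading zeros from a string business key.

    A value consisting of only zeros (and spaces) becomes "0", because
    a plain 0 has a meaning and differs from an empty string.
    Optionally prefix with a source system code (for a non-integrated
    Raw Vault with multiple source systems).
    """
    key = raw_key.strip()
    if key != "":
        stripped = key.lstrip("0")
        key = stripped if stripped != "" else "0"
    if source_code is not None:
        key = source_code + "." + key
    return key

# 0001806, 01806 and 1806 all map to the same business key:
print(normalize_business_key("0001806"))      # 1806
print(normalize_business_key("01806"))        # 1806
print(normalize_business_key("1806", "CRM"))  # CRM.1806
print(normalize_business_key(" 00000 "))      # 0
```

With this in place, all variants of the customer number end up in one hub row, while the original value can still be kept in a satellite.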

How to implement the solution

There are many ways to remove leading zeros. When I was searching for this I had two requirements:

  • No casting from and to an integer may take place; otherwise all business keys would need to be numeric, which would make the solution less reliable.
  • No function, routine or assembly may be called, as this could negatively impact performance. I was looking for an “inline” conversion.

After some research I found an expression that was the same for SQL and SSIS and quite okay (a T-SQL version by Robin Hames, my credits for his work), but it appeared to change a string with only one or more zeros to an empty string. And because a 0 can have a meaning – and is certainly different from an empty string – this is undesired behavior, IMHO.
So I had to add some logic to it: a CASE expression in T-SQL and an inline condition ({condition} ? {true part} : {false part}) in the SSIS expression.
Furthermore I came across a different method for T-SQL as well, using the PATINDEX function, which is more compact than the other solution.
For SSIS I still use the ‘Robin Hames’ method, because the PATINDEX function is not available in SSIS Expressions.
So .. this is what it has become:

T-SQL

Remove_leading_zeros.sql

SELECT
    example.[id_with_leading_zeros],
   CASE
      WHEN LTRIM(example.[id_with_leading_zeros]) = '' THEN ''
      WHEN PATINDEX( '%[^0 ]%', example.[id_with_leading_zeros]) = 0 THEN '0'
      ELSE SUBSTRING(example.[id_with_leading_zeros], PATINDEX('%[^0 ]%', example.[id_with_leading_zeros]), LEN(example.[id_with_leading_zeros]))
   END AS [id_without_zeros_method1],

   CASE
      WHEN LTRIM(example.[id_with_leading_zeros]) = '' THEN ''
      WHEN PATINDEX( '%[^0 ]%', example.[id_with_leading_zeros]) = 0 THEN '0'
      ELSE REPLACE(REPLACE(LTRIM(REPLACE(-- Robin Hames' method
            REPLACE(LTRIM(example.[id_with_leading_zeros]), ' ', '!#!') -- replace existing spaces with a string that does not occur in the column value, I have chosen '!#!'
            , '0', ' ') -- replace '0' with ' '
            ) -- end of LTRIM to remove leading '0's that have been changed to ' 's
            , ' ', '0') -- change ' ' back to '0'
            , '!#!', ' ') -- change '!#!' back to ' '
   END AS [id_without_zeros_method2]
FROM
    (
    SELECT
        TOP 1000 RIGHT('00000000000' + CONVERT(NVARCHAR(12), object_id), 14) AS [id_with_leading_zeros]
    FROM
        master.sys.objects
    UNION
    SELECT N' 00000 '
    UNION
    SELECT N'00'
    UNION
    SELECT N' '
    UNION
    SELECT ' 0099990 A '
    UNION
    SELECT '-5550'
    ) example

SSIS Expression (can be used in Derived Column)

(LTRIM(REPLACE(id_with_leading_zeros,"0", "")) == "" && LTRIM(id_with_leading_zeros) != "") ? "0" : REPLACE(REPLACE(LTRIM(REPLACE(REPLACE(LTRIM(id_with_leading_zeros)," ","!#!"),"0"," "))," ","0"),"!#!"," ")

DV-S01E05-151
In a Derived Column Transformation this looks for instance like this

Conclusion / Wrap up

In this post I have explained why I think you should remove leading zeros from business keys when data is loaded from source systems to a data warehouse.
This post also contains different ways to remove leading zeros: two for T-SQL and one for an SSIS expression.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

SQL Server 2016 tempdb configurations benchmark (SQL Server Series)

Introduction

When you have installed SQL Server 2016 you will have noticed that the configuration for tempdb has changed.
Microsoft made a proper tempdb configuration easy to do via the GUI during installation.

Today I want to go through the settings offered and will measure performance of different setups.

What are the options?

During installation you can set the following values:
For TempDB datafiles:

Setting Default value
Number of files 4
Initial size (MB) 8
Autogrowth (MB) 64
Data directories Default data directory, in my case C:\MSSQL\Instance root\MSSQL13.MSSQLSERVER\MSSQL\Data

For TempDB logfile:

Setting Default value
Initial size (MB) 8
Autogrowth (MB) 64
Log directory Default data directory, in my case C:\MSSQL\Instance root\MSSQL13.MSSQLSERVER\MSSQL\Data

MSSQL-S01E03-tempdb-setup
Configuration screen during SQL Server 2016 installation (I changed the directories already).

So what is the best setup?

This subject is not new for SQL Server 2016, but the installation procedure now just gives a better opportunity to set it up right the first time.
There are roughly five differentiators in the setup:

  • Initial size (MB)
  • Autogrowth (MB)
  • Number of data files
  • Location of data files
  • Read/write performance of the different storage media

What are the considerations when setting these values?

Initial size (MB)

A larger initial size will accomplish two things:

  1. It will take longer before the file will have the need to grow.
  2. The file can be stored less fragmented on disk.

If you have enough disk space, the larger the better; there is no real downside to making the initial size quite large. More about this in my conclusion.

Autogrowth (MB)

With autogrowth the choice is a bit more delicate, as both a small and a large value have pros and cons, as pointed out in the table below.

Autogrowth value Pros Cons
Small More frequent, smaller growth actions will cause less noticeable hiccups TempDB data files will become more fragmented.
Large TempDB data files will become less fragmented. Less frequent, larger growth actions might cause noticeable hiccups

Number of data files

In general, the recommendation is one data file per (logical) processor. You can configure more files than the number of (logical) processors, but do this only if you notice that this gives a better performance.

Location of data files

Locating the data files on different drives can have two effects:

  1. The disk queue length per drive in Windows is reduced (in all scenarios, also with virtual disks residing on the same physical drive).
  2. Physical writes to the disk drives can go in parallel (only when different physical drives are used).

The tests

The test cases

Creating combinations of all differentiators would just create *a lot* of test scenarios.
So I will be a bit pragmatic.
The test cases are chosen so that each one can be compared with at least one other test case from which exactly one differentiator is different.

Test # # Data files Init size (MB) Auto growth (MB) # Folders Scenario
0 4 8 (x4) 64 1 Default installation (data files on OS disk)
1 4 8 (x4) 64 1 Data files on dedicated drive
2 4 8 (x4) 64 4 Data files on more drives
3 4 2048 (x4) 64 4 Data files on more drives + Larger initial size
4 4 8 (x4) 256 4 Data files on more drives + Larger autogrowth
5 4 2048 (x4) 64 1 Data files on dedicated drive + Larger initial size
6 4 2048 (x4) 64 1 Data files on dedicated drive + Larger initial size + TempDB on different SSD
7 8 2048 (x8) 64 1 Data files on dedicated drive + Larger initial size + More data files
8 4 2048 (x4) 64 1 Default installation (data files on OS disk) + Larger initial size

Test results

I will present two graphs as well as the raw data. I think especially the first graph gives some useful insights, as it compares all test cases with test case 0 (default installation), by using an index of 100 for test case 0. If a different test case performs better, its index will be less than 100. Also, in this way, the average result of both test sets (A and B) can be shown.
I left test case 6 out of the graph; later I will tell why I did this.
MSSQL-S01E03-testresult-index
Index-based comparison of test results

MSSQL-S01E03-testresult-seconds
Execution times of test cases for test sets A and B

Test results figures

Test # Scenario Test result, test set A (ms) Test result, test set B (ms)
0 Default installation 30359 232795
1 Data files on dedicated drive 28344 261467
2 Data files on more drives 22514 248484
3 Data files on more drives + Larger initial size 28040 210223
4 Data files on more drives + Larger autogrowth 24173 200632
5 Data files on dedicated drive + Larger initial size 19608 192587
6 Data files on dedicated drive + Larger initial size + TempDB on different SSD 123289 387646
7 Data files on dedicated drive + Larger initial size + More data files 19692 211501
8 Default installation + Larger initial size 20149 361844

Conclusion

Data files on dedicated drive + Larger initial size (test 5) gives the best performance for both test sets. Putting tempdb files on multiple disks cannot beat this, although the differences are relatively small.

Surprisingly, performance dramatically decreases when the virtual disk on which tempdb is stored is moved to a different (faster) SSD (this is test 6)! I have no explanation for this. Therefore I will leave test 6 out of the analysis; this way the physical storage is the same for all test cases and changes in performance are caused only by the other differentiators.

My most important conclusion is that the default values offered during installation are quite okay, except for two things:

The Initial size (MB)

You should really set the initial size to a big number (gigabytes if possible). Note that if you leave the number of files 4 (which I recommend), the space used is four times the size that you use for Initial size.
If you have a dedicated disk for tempdb (which is recommended), you might want to set the initial size of the datafile(s) to fill up the disk. You should then disable Autogrowth.
Also, on a production server, check the size of the tempdb files after a week of uptime (provided the SQL instance has not been restarted in the meantime). Say the tempdb size of all files together (this can also be one file) is 22GB. You could then set your initial size to 28GB in total, so 7GB per file if you configured 4 files. This means that during normal operation, tempdb never has to grow.
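The sizing advice above boils down to simple arithmetic. The sketch below is my own illustration; in particular the 25% headroom factor is an assumption of mine, not an official recommendation.

```python
import math

def tempdb_initial_size_per_file(observed_total_gb, number_of_files,
                                 headroom_factor=1.25):
    """Suggest an initial size (in whole GB) per tempdb data file.

    observed_total_gb: total size of all tempdb files after a
    representative period of uptime.
    headroom_factor: safety margin on top of the observed size
    (the 25% default is my own assumption).
    """
    total_with_headroom = observed_total_gb * headroom_factor
    return math.ceil(total_with_headroom / number_of_files)

# The example from the text: 22GB observed, 4 data files -> 7GB per file
# (28GB in total), so tempdb never has to grow during normal operation.
print(tempdb_initial_size_per_file(22, 4))  # 7
```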

Dedicated disk

It is a common best practice to put tempdb on a separate disk, so that the Windows disk queue does not have to compete with other read/write actions to that disk.

Other settings are somewhat less important but … think for yourself (my disclaimer)

In my opinion, neither putting tempdb files on multiple different disks nor setting Autogrowth to a different value offers much performance benefit.
The Autogrowth value is even irrelevant if the initial size is so large that tempdb never has to grow during normal operation.
But every server is different! Test and think for yourself.
My test results are presented “AS IS” in the hope they will be useful as a starting point for your own further analysis. I am not responsible for any disadvantage you might have by using my test results for your tempdb setup.

What if SQL Server is already installed and/or it is an older version?

No worries, you still can change the tempdb configuration.
You can move and add files using SQL scripts.
You can also set Initial size and Autogrowth with Management Studio.
MSSQL-S01E03-166
Go to properties of tempdb

MSSQL-S01E03-167
Here you can change Initial size and Autogrowth

By the way, I noticed that SQL Server does not delete moved tempdb files. So if you move tempdb files, you should delete the files in the old location, so that they do not use disk space and do not cause confusion for administrators.

Test justification

  • All tests are done on a Hyper V Virtual Machine running Windows Server 2012 R2 and SQL Server 2016 Developer Edition.
  • Change of physical storage type is done by moving vhdx files to different physical disks, and then change the file location in the settings of the Hyper V Virtual Machine (while it is not running, of course).
  • Each test is done on a fresh copy of the Virtual Machine, so the starting situation is equal for every test (e.g. no tempdb growth caused by previous tests).
  • Tests were done with two sets of scripts (set A and B) that inserted, updated and deleted rows in temporary tables in tempdb. Set A consists of three sql scripts, and set B of five.
  • All scripts were developed so that all tempdb inserts started at the same time (with at most milliseconds difference in start time). This ensured that every test run was executed in the same way.
  • All scripts were executed in SQL Server Management Studio.
  • The first file of tempdb is stored on its own (virtual) disk except for testcase 0, where tempdb files are on the C-drive where Windows is installed.
  • When spreading files over multiple virtual disks, the disks created for database files, log files, and backup files are used. This was done for pragmatic reasons during the test. This was defensible, because no other activity on databases and log files was going on during the test. It is however not recommended to spread tempdb like that in a production environment, unless every disk is dedicated to tempdb only.
  • The host machine had the same applications open during every test, being a Windows Explorer and Microsoft Excel. No other applications that used CPU and could influence performance of the virtual machines were running.
  • The tempdb log file was placed on a separate disk (the same one) for each test. Thus it was not a subject of the performance investigation; including it would have at least doubled the number of test cases.

Wrap up

Microsoft did a good job in improving the setup options for tempdb in the installation procedure.
It is a bit of a pity that they did not go the last mile by giving the tempdb data files a more realistic Initial size.
Luckily this is easy to change during installation, and it is a quick win for anyone installing SQL Server 2016.

Download all the used scripts here.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Zip sourcefiles in a SSIS Package (SSIS Series)

Introduction

As I am in an (un)zipping mood, this week I cover the opposite of last week’s problem.
The example is not as ready-to-use as in my previous post, as it depends on choices made in the development process of your other SSIS packages.
Still I hope you can adjust this example to match your needs.

Problem

If you have imported and archived (csv or xml) files, these files can take up a lot of disk space. The archive folder can also become overwhelmingly full with tens of thousands of files, which can make it slow to access and difficult to manage.

Solution

To tackle this problem I have made a fully configurable SSIS package, that can be executed from a different package with an Execute Package Task.
The package is made using SSIS 2014.

The zipping itself is done with 7za.exe, for more information on how to get and install this free program see my previous blog post.

Scenario

When I used this concept, a different SSIS package moved files to a subfolder of an archive folder after they were processed, with their names prefixed with the LoadCycleID including leading zeros, for instance 001345_customers.xml.

The archive folder and subfolder were separated, because the archive folder was an environment variable and the same for all packages, while the subfolder was different for different SSIS packages, and contained the name of the source system of the files.
If you only have one archive folder, just change the example to use that instead of the concatenation of archive folder and subfolder.

Overview

Here is a picture of the end result:

SSIS-S01E06-124
The Control Flow

Detailed steps

Parameters and variables

First you need to add a few parameters and variables to the package.

Parameters:

Name Data type Required Description
_7zaPathAndFilename String False The full path and filename of 7za.exe.
ArchiveFolder String True The root folder for archiving processed files.
ArchiveSubFolder String True The subfolder of the ArchiveFolder in which the files to be zipped exist, the subfolder can for instance be named after a source system like SAP or CRM.
LoadCycleID Integer True The ID of the LoadCycle that is processing these files. In your situation it could also be called ExecutionID, SessionID and so on, the name is not really important.

SSIS-S01E06-125
Package parameters

Variables:

Although not all variables might be strictly necessary, I like the concept of being able to configure all values in one place.
The data type of all variables is String, except LoadCycleID which is an Integer.
Every variable uses an expression to set its value.
Here is a detailed list of all variables and the expression used for each one.

Name Expression
ArchiveFilesToDeleteInFolder @[$Package::ArchiveFolder] + "\\" + @[$Package::ArchiveSubFolder]
ArchiveFilesToDeletePattern @[User::LoadCycleIDString] + "*"
ArchiveFilesToZipPattern @[$Package::ArchiveFolder] + "\\" + @[$Package::ArchiveSubFolder] + "\\" + @[User::LoadCycleIDString] + "*.*"
ArchiveFilesWorkingDirectory @[$Package::ArchiveFolder]
ArchiveZipFile @[User::LoadCycleIDString] + "_" + @[$Package::ArchiveSubFolder] + ".zip"
ArchiveZipPathFile_Final @[$Package::ArchiveFolder] + "\\" +@[$Package::ArchiveSubFolder] + "\\" + @[User::LoadCycleIDString] + "_" + @[$Package::ArchiveSubFolder] + ".zip"
ArchiveZipPathFile_Temp @[$Package::ArchiveFolder] + "\\" + @[User::LoadCycleIDString] + "_" + @[$Package::ArchiveSubFolder] + ".zip"
LoadCycleID @[$Package::LoadCycleID]
LoadCycleIDString RIGHT("00000" + (DT_STR, 10, 1252) @[$Package::LoadCycleID], 6)

Important note: if you expect that the value for LoadCycleID will grow beyond 999999 over time, you should add more prefixed zeros to LoadCycleIDString and increase the number 6, the second parameter of the RIGHT function.
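To see what the RIGHT-based expression does, including the overflow behavior the note above warns about, here is a Python sketch of the same logic (an illustration only, not part of the package):

```python
def load_cycle_id_string(load_cycle_id, width=6):
    """Mimic the SSIS expression
    RIGHT("00000" + (DT_STR, 10, 1252) LoadCycleID, 6):
    zero-pad the LoadCycleID to a fixed width."""
    return ("0" * (width - 1) + str(load_cycle_id))[-width:]

print(load_cycle_id_string(1345))     # 001345
# Beyond 999999 the leftmost digits are silently cut off:
print(load_cycle_id_string(1345678))  # 345678
```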

SSIS-S01E06-126
SSIS variables of the package

The Control Flow

The control flow is quite straightforward: only three tasks and we’re done.

EPR Zip files in archive subfolder

This task creates one zipfile containing multiple other files in the ArchiveSubFolder.
From the SSIS Toolbox drag an Execute Process Task to the Control Flow panel and open the Execute Process Task Editor.
On the General and Process panes you do not have to change anything.

SSIS-S01E06-128
SSIS-S01E06-129
No changes needed here

On the Expressions pane, press the ellipsis next to Expressions. In the Property Expressions Editor add the following:

Property Expression
Arguments "a "+ "\"" + @[User::ArchiveZipFile] + "\" " + "\"" + @[User::ArchiveFilesToZipPattern] + "\" -mx=9"
Executable @[$Package::_7zaPathAndFilename]
WorkingDirectory @[User::ArchiveFilesWorkingDirectory]

Please note that in the Arguments expression -mx=9 is used. This is the compression level, level 9 means “ultra” compression. This level gives the smallest zip files but is also the slowest compression level. You can for instance also use 5 for “normal” or 3 for “fast” compression, if this is what you want.
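Outside SSIS, the same 7za call could be sketched as follows in Python. This is an illustration only: the location of 7za.exe is an assumption, and an argument list needs no embedded quotes (the SSIS expression adds them because it builds a single command-line string).

```python
def build_7za_zip_args(zip_file, files_pattern, compression_level=9):
    """Build the 7za arguments corresponding to the Execute Process
    Task expression: a <zipfile> <pattern> -mx=<level>."""
    return ["a", zip_file, files_pattern,
            "-mx={0}".format(compression_level)]

args = build_7za_zip_args("001346_CRM.zip", r"C:\Archive\CRM\001346*.*")
print(" ".join(args))

# To actually run it (the path to 7za.exe is an assumption):
# import subprocess
# subprocess.run([r"C:\Program Files\7za\7za.exe"] + args,
#                cwd=r"C:\Archive", check=True)
```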

SSIS-S01E06-127
Expressions for the Execute Process Task

SCR Delete uncompressed files

This task deletes the files that were added to the zip file in the previous task.

In the Script Task Editor, add both variables User::ArchiveFilesToDeleteInFolder and User::ArchiveFilesToDeletePattern to ReadOnlyVariables, then press Edit Script ….

Then in the namespaces region add one namespace, as indicated below.

Namespaces region
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
// Added namespace:
using System.IO;
#endregion

Adjust public void main() as follows:

public void Main

public void Main()
{
  // Copy SSIS variable values to C# string variables.
  string archiveFilesToDeleteInFolder = Dts.Variables["User::ArchiveFilesToDeleteInFolder"].Value.ToString();
  string archiveFilesToDeletePattern = Dts.Variables["User::ArchiveFilesToDeletePattern"].Value.ToString();

  // Delete each file that just has been added to the zip archive.
  foreach (string f in Directory.EnumerateFiles(archiveFilesToDeleteInFolder, archiveFilesToDeletePattern))
  {
    File.Delete(f);
  }

 Dts.TaskResult = (int)ScriptResults.Success;
}


WARNING: Make sure that no files that meet the file name pattern of files to add to the zip file are added to the ArchiveSubFolder while or after the zipfile is (being) created. Otherwise files that are NOT zipped might get deleted!

Download the script here.

FST Move zip file

The File System Task moves the zip file to the ArchiveSubFolder, because the zip file is initially created in a temporary folder and not in the ArchiveSubFolder.
This prevents it from being deleted by coincidence, if you use a value for ArchiveFilesToDeletePattern that ends with .*
Configure the File System Task as shown in the picture below:

SSIS-S01E06-130
File System Task Configuration

Download the script here.

Let the car drive

Time to test the package and tell the result in pictures (if you don’t mind).

SSIS-S01E06-131
Before: files of LoadCycleID 001346 need to be zipped ..

SSIS-S01E06-132
Execute the package ..

SSIS-S01E06-133
All csv files replaced with one zip file ..

SSIS-S01E06-134
.. that contains the csv files ..

Conclusion / Wrap up

In this blog post you could read the steps needed to build an SSIS package that creates a zip file containing a number of files in an archive (sub)folder.
This makes the archive smaller and easier to manage.
However, beware that you delete only files that were actually added to the zip file. For more details see the warning above.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Unzip specific files from lots of zipped archives (SSIS Series)

Introduction

This week I want to discuss the trouble you may have when searching hundreds of zip files for specific files inside them.
Recognizable?
At least I faced this problem some time ago, and solved it using SSIS and the commandline version of 7-zip, named 7za.exe.
The choice for SSIS was purely based on convenience, as you could tackle this problem also using Powershell, C# or almost any other scripting language.
But I thought SSIS is just a neat and quick way to do this.

Problem

Finding specific files in huge amounts of zip files.

Solution

To tackle this problem I have made a fully configurable SSIS package, that can be executed with different parameters.
The main task of the package is to extract specific files (based on a file name pattern) from zipfiles in a folder and – optionally – subfolders.
The unzipping itself is done with 7za.exe, which is the commandline version of 7z, the great open source zip program, that exists already for several years and is used by millions of users.

By the way, the package is made using SSIS 2014.

Preparations: download and install 7za

It is important to download this package from the official 7z web site, just to avoid that you would get a copy from some deceitful web site infected with malware.
To go to the official 7-zip download page click on this link:
7-zip download page
Most important is the 7-Zip Extra: standalone console version.
For the example I have used version 16.02 (2016-05-21) for Windows, but it should also work with version 9.20 (2010-11-18), then called 7-Zip Command Line Version.

If you download the newest version, you also need software to unzip the 7z160-extra.7z file, for instance the regular 32-bit or 64-bit version of 7-zip, which can also be downloaded on the same page.

SSIS-S01E05-100
Download page of 7-zip

I assume you will use the newest version, 7z160-extra.7z.
Once unzipped, just take the time to read License.txt, readme.txt and history.txt.
Then copy the three files 7zxa.dll, 7za.dll and 7za.exe to a folder on the Windows computer where SSIS runs, usually a Windows Server.
If you run a x64 version of Windows, you can take the x64 version of the 7za files, which are in a subfolder “x64” of the unzipped files.

SSIS-S01E05-101
Unzipped 7za files ..

I have created a subfolder “7za” in the “C:\Program files” folder of my virtual Windows Server, and copied the x64-version of the three mentioned files to it, but you can put them in any folder you like.
When you have done this, the real fun can start ..

Overview

Here is a picture of the end result:

SSIS-S01E05-102
The Control Flow

Detailed steps

Parameters and variables

First you need to add a few parameters and variables to the package.

Parameters:

Name Data type Required Description
_7zaPathAndFilename String False The full path and filename of 7za.exe.
FilenamePattern String True The filename pattern of files you want to unzip.
SourceRootFolder String True The main folder where you want to search for zip files.
TraverseSubfolders Boolean False Indication if you want to look only in SourceRootFolder, or also in its subfolders.
UnzipRootFolder String True The folder to which files are unzipped. Important: this may NOT be a subfolder of SourceRootFolder.
ZipFilenamePattern String True The filename pattern of the zipfiles from which you want to unzip files.

SSIS-S01E05-103
Package parameters

Variables:

Name Data type Description
FLC_UnzipFolder String The subfolder where the files will be unzipped. This is a subfolder of the UnzipRootFolder, with the same relative subpath as the zipfile has in SourceRootFolder. Additionally, the zipfilename has been replaced with a subfolder that is the same as the zipfilename without the file extension.
FLC_ZipPathAndFileName String The zipfile full path, that is target of investigation if any files with FilenamePattern are zipped inside.

SSIS-S01E05-104
SSIS variables needed to do the job

Developing the package

FELC Unzip files

This foreach loop container finds all zip files.
In the Foreach Loop Editor, select Collection in the left pane.
As we want to make the solution generic by using all those parameters mentioned above, we have to add a few expressions.

SSIS-S01E05-105
Foreach Loop container Collection configuration

Add the following three expressions (in my opinion the editor is not very user-friendly, the window is too small, but it can be done):

Property Expression
Directory @[$Package::SourceRootFolder]
FileSpec @[$Package::ZipFilenamePattern]
Recurse @[$Package::TraverseSubfolders]

Please note that these expressions will set the values for Folder, Files and the Traverse subfolders checkbox in the Enumerator configuration part of the screen (inside the purple square in the picture above).
If you close and reopen the Foreach Loop Editor, you will notice that the configuration fields now use the values from the corresponding parameters, except the value Fully qualified for Retrieve file name, which is hardcoded.

SSIS-S01E05-106
Example of how to set an expression..

Then, still in the Foreach Loop Editor, select Variable Mappings in the left pane and map variable User::FLC_ZipPathAndFileName to Index 0, as shown in the picture below.

SSIS-S01E05-107
Variable Mappings

EXPR Set Unzip folder

The next nifty thing that needs to be done, is creating a folder to store the unzipped files. As we do not know if zipped files with the same name exist in multiple zip files, the safest thing to do is to create a directory structure identical to the directory structure of the zip files.
Additionally, the basename of the zip file (so without file extension) is used as an extra subfolder. In this way we are sure that unzipped files with the same name can be stored without overwriting other ones.

To do this, from the SSIS Toolbox drag an Expression Task inside the Foreach Loop container.
Open the Expression Task and paste this expression into the Expression textbox.

@[User::FLC_UnzipFolder] =
REPLACE(LEFT(@[User::FLC_ZipPathAndFileName],
FINDSTRING(@[User::FLC_ZipPathAndFileName], "."
+ TOKEN(@[User::FLC_ZipPathAndFileName], ".",
TOKENCOUNT(@[User::FLC_ZipPathAndFileName], ".")),1) -1),
@[$Package::SourceRootFolder], @[$Package::UnzipRootFolder])

Then press the Evaluate Expression button to check that the expression is valid.

SSIS-S01E05-108
Configuring the Expression Task
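A rough Python analogue of this expression (my own sketch, just to clarify what it computes: the zip file path without its extension, with the source root swapped for the unzip root):

```python
import os

def unzip_folder(zip_path, source_root, unzip_root):
    """Compute the unzip folder for a zip file: drop the file
    extension and replace the source root folder with the unzip
    root folder."""
    path_without_extension = os.path.splitext(zip_path)[0]
    return path_without_extension.replace(source_root, unzip_root)

print(unzip_folder(r"C:\Archive\CRM\Backup_201605.zip",
                   r"C:\Archive", r"C:\Unzipped"))
# C:\Unzipped\CRM\Backup_201605
```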

FSYS Create Unzip folder

In the previous step we have composed the full path to unzip files into, but this folder structure might not exist yet.
Therefore we use a File System Task named FSYS Create Unzip folder to create the folder structure.
The good news is that the File System Task can create multiple (sub)folders at once.
Configure it as shown in the picture.
SSIS-S01E05-109
Configuration of FSYS Create Unzip folder

EPR Unzip Selected Files

Below you see a few screenshots for the configuration of the Execute Process Task named EPR Unzip Selected Files.
The funny thing is that, in the Process pane, you do not have to change anything. Executable and Arguments are set using expressions (coming next) and all the other properties can keep their default values.

SSIS-S01E05-110
Process configuration.

SSIS-S01E05-111
Setting Executable and Arguments using expressions
Arguments (copy-friendly):

"-y x \"" + @[User::FLC_ZipPathAndFileName] + "\" -o" + @[User::FLC_UnzipFolder] + " " + @[$Package::FilenamePattern] + " -r"

SCR Remove Empty folders

There is one problem with the steps done so far. If a zip file does not contain a file that matches the FilenamePattern, the folder structure is created for nothing.
Although you could just leave these empty folders, I think the solution is a bit cleaner if these empty folders are deleted.
To do this, from the SSIS Toolbox drag a Script Task inside the Foreach Loop container.

SSIS-S01E05-112
In the Script Task Editor, add both variables User::FLC_UnzipFolder and $Package::UnzipRootFolder to ReadOnlyVariables

In the Script Task Editor window, press Edit Script ….

Please note that you can download the C# script here..

Then in the namespaces region add two namespaces, as indicated below.

Namespaces region
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
// Added namespaces:
using System.IO;
using System.Linq;
#endregion

Adjust public void main() as follows:

public void Main

public void Main()
{
  // Copy SSIS variable value of FLC_UnzipFolder to C# string variable.
  string unzipFolder = Dts.Variables["User::FLC_UnzipFolder"].Value.ToString();

  // Source: http://stackoverflow.com/questions/22520761/delete-all-empty-directories-in-specified-path
  // My credits go to author "Jon" from Berlin.
  // Because not all zipfiles might contain files with the FilenamePattern we are looking for, an empty path might have been created.
  // This script deletes these empty folders, so that in the end only folders will exist with unzipped files or subfolders in it.
  var di = new DirectoryInfo(unzipFolder);
  var root = Dts.Variables["$Package::UnzipRootFolder"].Value.ToString(); // no trailing slash!
  while (di.FullName != root
          && !di.EnumerateFiles().Any()
          && !di.EnumerateDirectories().Any())
  {
    di.Delete();
    di = di.Parent;
  }

 Dts.TaskResult = (int)ScriptResults.Success;
}

Download the script here.

The proof of the pudding is in the eating

Time to do some testing.
I have created a few folders with subfolders and subsubfolders. You can see the folder structure in the last picture below.
As you might have noticed, the default values for my parameters are already prepared for the test case: I want to unzip only customer files from zip files with 201605 in the name.

SSIS-S01E05-113
Executing the package in Debug mode..

Below you can see the result. Only customer files are unzipped. The subfolders under the Archive folder that contain no customer zip files (but only currency files), do not exist under the Unzipped folder.

SSIS-S01E05-114
Folder structure of Archive and Unzipped rootfolders..

SSIS-S01E05-115
Unzipped “Customer” files ..

Important notes for using with huge amounts of zip files

From my experience I must also warn you. If you run this package on a folder containing gigabytes of zipfiles, there are two things to be aware of:

  1. The execution of the package can potentially take a long time, several hours is not unusual.
  2. Make sure you have enough disk space. Suppose you have the following hypothetical scenario: zip files: 100GB, average zip compression rate: 20%, percentage of files to be unzipped: 30%. In this case you are going to unzip files that have a zipped size of 30% of 100GB, which is 30GB. Once unzipped, their size will be 30GB * (1/0.20) = 150GB. You need to have that disk space available on the disk where the Unzipped folder resides.
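The arithmetic of the second point can be captured in a small helper (just an illustration of the estimate above, nothing more):

```python
def required_unzip_space_gb(total_zip_gb, compression_rate,
                            fraction_to_unzip):
    """Estimate the disk space needed for the unzipped files.

    compression_rate: zipped size as a fraction of the original size
    (0.20 means the zip files are 20% of the original size).
    fraction_to_unzip: the portion of the zipped data that matches
    the filename pattern and will be extracted.
    """
    zipped_selection_gb = total_zip_gb * fraction_to_unzip
    return zipped_selection_gb / compression_rate

# The hypothetical scenario from the text: 100GB of zip files, 20%
# compression rate, 30% to be unzipped -> about 150GB needed.
print(required_unzip_space_gb(100, 0.20, 0.30))
```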

Conclusion / Wrap up

With this SSIS package you can selectively unzip files from archived zip files.
When using it, make sure you have enough disk space, and be aware that the package can run for several hours, depending on the number and (average) size of the zip files to process.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Adding custom header and footer to csv files (SSIS Series)

Introduction

This article wasn’t planned. SSIS again? No, I wanted to write a post about something else.
Nevertheless I got this question on one of my previous posts and thought I might as well create a blog post about it.

Problem

“Hi Hans,
I have created my ssis package for flat file destination. My problem is that I want to add header and footer in my flat file destination to send it to FTP location format will be

0|MMDDYYYY|HHMMSS
1|Row1
1|Row2
1|Row3
1|ROW4
9|6

The last line contains no of row count including header and trailer
Please let me know how to do that.. I am completely stuck with that and no good luck.. “

So to summarize, both the header and the footer line of the csv file need to contain dynamic content.
Also each row starts with a ‘row type’: 0 for the header, 1 for every detail row, and 9 for the footer.
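Stripped of all SSIS plumbing, the requested file layout boils down to a few lines of C#. A minimal sketch of the format only (the class and method names are mine, just for illustration):

```csharp
using System;
using System.Text;

public static class CsvFormatDemo
{
    // Build the file contents for the requested layout:
    // header (row type 0), detail rows (row type 1), footer (row type 9),
    // where the footer count includes the header and footer lines themselves.
    public static string BuildFileContents(DateTime timestamp, string[] detailRows)
    {
        var sb = new StringBuilder();
        // Header: 0|MMDDYYYY|HHMMSS
        sb.AppendFormat("0|{0:MMddyyyy}|{0:HHmmss}\r\n", timestamp);
        // Detail rows: 1|<value>
        foreach (string row in detailRows)
            sb.AppendFormat("1|{0}\r\n", row);
        // Footer: 9|<detail rows + header + footer>
        sb.AppendFormat("9|{0}\r\n", detailRows.Length + 2);
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write(BuildFileContents(
            new DateTime(2016, 5, 1, 13, 45, 30),
            new[] { "Row1", "Row2", "Row3", "ROW4" }));
        // First line: 0|05012016|134530, last line: 9|6
    }
}
```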

Solution

As an example I just use a simple Data Flow Task with an OLEDB Source connected to table DimCurrency from the AdventureWorksDW database.
The data then flows to a Script Component used as a Destination.
Here are a few screenshots to demonstrate what I mean. The Script Component will create the file.
The package is made using SSIS 2014.

Developing the package

SSIS-S01E04-099
The demo package just contains one Data Flow Task

SSIS-S01E04-205
Design of the Data Flow Task

SSIS-S01E04-101
OLEDB Source: DimCurrency from the AdventureWorksDW database

SSIS-S01E04-102
OLEDB Source: select all three columns

SSIS-S01E04-208
When you drag the Script Component to the Data Flow, you will get this dialog. Use the Script Component as a Destination

SSIS-S01E04-207
Variables that the package contains.

SSIS-S01E04-209
In the Script Transformation Editor, add both variables User::HeaderTimestamp and User::TargetCsvPathAndFileName to ReadOnlyVariables

Now comes the most interesting part! We use the endless possibilities of C# to fix our problem.
In the Script Transformation Editor window, press Edit Script ….

Please note that you can download the entire script here, so you do not have to copy the (sometimes malformed) code from this webpage.

Then in the namespaces region add two namespaces, as indicated below.

Namespaces region
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
// Added namespaces:
using System.IO;
using System.Text;
#endregion

Above public override void PreExecute() add the copyright notice and the following two regions: Configuration and Private variables.
Please note that these regions do not exist yet, so you have to copy the code including the #region and #endregion lines.

New regions
  /*
  (c) Copyright 2016 - hansmichiels.com
 
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.
 
  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.
 
  You should have received a copy of the GNU General Public License
  along with this program. If not, see http://www.gnu.org/licenses/.
  */

 
  #region ***** Configuration *****

  // Here you can change the column delimiter and the text qualifier easily.
  private const string _delimiter = "|";
  private const string _textQualifier = "";
  // To set the text qualifier to " use:
  // private const string _textQualifier = "\""; // I recommend to use a text qualifier.
  
  // Every x rows, write the buffer to the file, and clear the buffer.
  // This is faster than writing every individual line to the file.
  // The higher the value of _bufferRows is, the more memory will be used and the less write actions are done to the csv file.
  // A higher value is a good idea if you have enough memory in your server.
  private const int _bufferRows = 100000;

  #endregion ***** Configuration *****

  #region Private variables

  // Variable to hold the path and filename
  private string _fileName;
  // Variable to count the rows
  private int _rowCount;
  // StringBuilder to hold a buffer of text that still needs to be written to the csv file.
  private StringBuilder _fileContentsBuffer;

  #endregion Private variables

PreExecute() is called before the data starts to flow through. Here we do some preparation and compose the header line of the file.

PreExecute()

  /// <summary>
  /// This method is called once, before rows begin to be processed in the data flow.
  /// </summary>
  public override void PreExecute()
  {
    base.PreExecute();

    // Initialize variables
    _rowCount = 0;
    _fileContentsBuffer = new StringBuilder();
    _fileName = Variables.TargetCsvPathAndFileName;
    DateTime headerTimestamp = Variables.HeaderTimestamp;

    // Append the header line to the buffer.
    // It will be written to the file together with the first detail rows.
    // If there are no detail rows, no file will be created.
    // The requested header format is 0|MMDDYYYY|HHMMSS, so the date part uses MMddyyyy.
    _fileContentsBuffer.AppendFormat("{1}0{1}{0}{1}{2:MMddyyyy}{1}{0}{1}{2:HHmmss}{1}\r\n", _delimiter, _textQualifier, headerTimestamp);
  }

PostExecute() is called after the data has flowed through. Here we append the last lines from the _fileContentsBuffer variable and the footer line to the file.

PostExecute()
  /// <summary>
  /// This method is called after all the rows have passed through this component.
  ///
  /// You can delete this method if you don't need to do anything here.
  /// </summary>
  public override void PostExecute()
  {
    base.PostExecute();
    // If no detail rows, bail out. No file will be created when there are no detail rows.
    if (_rowCount == 0) return;

    using (StreamWriter sw = File.AppendText(_fileName))
    {
      // Append a footer line with the rowcount, example value: 9|105
      // The rowcount is a count of the detail rows, but the footer count
      // needs to include header and footer, so we add 2 to the value of _rowCount.
      _fileContentsBuffer.AppendFormat("{1}9{1}{0}{1}{2}{1}", _delimiter, _textQualifier, _rowCount + 2);
      sw.Write(_fileContentsBuffer.ToString());
      _fileContentsBuffer.Clear();
    }
  }

Input0_ProcessInputRow(Input0Buffer Row) is the place where all the data processing takes place. Because this script is the Destination for the data, we need to write the data to the file ourselves; there is no Flat File Destination that will do this for us.
So here we append every detail row to the _fileContentsBuffer variable. Once in a while, we write the buffer to the file, and clear the buffer again.

Input0_ProcessInputRow

  /// <summary>
  /// This method is called once for every row that passes through the component from Input0.
  /// </summary>
  /// <param name="Row">The row that is currently passing through the component</param>
  public override void Input0_ProcessInputRow(Input0Buffer Row)
  {
    // Increase the rowcounter
    _rowCount++;

    // Append the new line to the buffer.
    // The format needs to be like "{1}[middle part]{1}\r\n", so that the row starts and ends with the text qualifier.
    // Replace [middle part] with {0}{1}{x}{1} for every column that you want to append to the file, where x is 2 for the first column, 3 for the second column and so on.
    _fileContentsBuffer.AppendFormat("{1}1{1}{0}{1}{2}{1}{0}{1}{3}{1}{0}{1}{4}{1}\r\n", _delimiter, _textQualifier,
      SafeValue(Row.CurrencyKey), // First column, {2} in the format
      SafeValue(Row.CurrencyAlternateKey), // Second column, {3} in the format
      SafeValue(Row.CurrencyName) // Third column, {4} in the format
      // etc.
      );
    
    // Every x rows, write the buffer to the file, and clear the buffer.
    if (_rowCount % _bufferRows == 0)
    {
      using (StreamWriter sw = File.AppendText(_fileName))
      {
        sw.Write(_fileContentsBuffer.ToString());
        _fileContentsBuffer.Clear();
      }
    }
  }

Multiple overloaded versions of the SafeValue method (for different data types like string, int and DateTime; you can add more when needed) add escape characters to the column values where necessary.

SafeValue method and overloads
  /// <summary>
  /// Escapes a double quote delimiter in a column value by doubling it.
  /// </summary>
  /// <param name="value"></param>
  /// <returns></returns>
  private string SafeValue(string value)
  {
    // If _textQualifier is null or empty, just return the value.
    if (string.IsNullOrEmpty(_textQualifier)) return value;

    if (value.Contains(_textQualifier))
    {
      // If the value contains one or more text qualifiers,
      // escape them by replacing them with two text qualifiers.
      return value.Replace(_textQualifier, _textQualifier + _textQualifier);
    }
    return value;
  }

  /// <summary>
  /// Overloaded method for DateTime value (meant as an example)
  /// For other data types, create new overloaded methods,
  /// and in that overloaded method convert the value to string,
  /// before calling the 'SafeValue' method with a string parameter.
  /// </summary>
  /// <param name="value"></param>
  /// <returns></returns>
  private string SafeValue(DateTime value)
  {
    return SafeValue(value.ToString("yyyy-MM-dd HH:mm:ss.fff"));
  }

  /// <summary>
  /// Overloaded method for int value
  /// </summary>
  /// <param name="value"></param>
  /// <returns></returns>
  private string SafeValue(int value)
  {
    return SafeValue(value.ToString());
  }
}

When the script is ready, choose Rebuild from the BUILD menu in the VstaProjects – Microsoft Visual Studio window, then close that window. Back in the Script Transformation Editor, close it by pressing OK.

Testing the package

I simply run the package in debug mode. 105 currency rows are added to the csv file.
Including the header and footer, the csv file contains 107 rows, and this is the number that is written to the footer line.

SSIS-S01E04-110
Running the package in debug mode

SSIS-S01E04-111
First and last part of the created csv file.

Please note that to change the delimiter and text qualifier you only have to make two changes in the ***** Configuration ***** region.

Configuration
  #region ***** Configuration *****

  // Here you can change the column delimiter and the text qualifier easily.
  private const string _delimiter = ";";
  private const string _textQualifier = "\"";
. . .

  #endregion ***** Configuration *****

With this change the file will look as follows:
SSIS-S01E04-112
First and last part of the created csv file with a different delimiter and text qualifier.

I also did a test with a source query that returned over 1.1 million rows; the resulting 25 MB csv file was created in just 3 seconds (on a Core i7 laptop with an SSD).
So my conclusion is that the performance of this solution is very satisfying.

Download the C# script for the script component here.

Conclusion / Wrap up

In this post I have demonstrated how to use SSIS 2014 to create a csv file with a dynamic header (with a timestamp) and footer line (including a rowcount) using a Script Component as a Destination in a Data Flow Task.

(Promotional)

Big Data & Analytics insight 2016: save the date for an insightful conference
(as a speaker I was asked to do a bit of promotion ..)

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.