Using alternate (non-time) triggers to start a SQL Agent job (SQL Server Series)

Updates

After the first publication of this post, I have made a number of modifications to the tool:

Change log

Date Changes
25 September, 2016 Major update:
(1) extended to use an unlimited number of jobtrigger conditions;
(2) added SQLAgentJobResult trigger conditions;
(3) configuration via config file no longer supported.
28 January, 2017 Major update:
(1) added “Running” as a state that can be checked for a job (in the previous version the values were “Succeeded”, “Failed” or “Completed”);
(2) added jobcondition property “reverse_condition”, which enables negative triggers, e.g. a file may not be found, a job may not be running;
(3) added a check that the jobs used for jobtriggers and jobtrigger conditions actually exist; if not, an error is thrown. This prevents a silent failure when a job is renamed or removed after a jobtrigger has been made.

Introduction

Today some SQL Agent stuff. I thought the SQL Agent worked just great, until I got the question: can you start this job when a file is dropped in folder X, but only if another process has already done something in the database? Huh?

Problem

With the SQL Agent I can only schedule a job using time-based triggers, while I need different triggers, like the
existence of one or more files, a certain database state and/or other SQL Agent job(s) having succeeded, failed or just completed.

Solution

So I needed something that checked if a file existed in a folder and if something was done in the database, and that, if both were true, started a SQL Agent job.
I decided to create a C# console application named SQLAgentJobStarter that could do this job for me. But I wanted it to be a bit more flexible, using one or more trigger conditions of different types, like a file trigger or a sql trigger, or both (as in the requirement above).
Later on I extended this with jobtrigger conditions, and I changed the definitions a bit. A trigger can now have one or more conditions of different types:

  • file jobtrigger condition
  • sql jobtrigger condition
  • SQLAgentJobResult jobtrigger condition

The different condition types can be combined in one jobtrigger, e.g. SQLAgentJob “Job_A” must have succeeded and file test.txt must be in folder C:\temp.

In this article I want to concentrate on instructing you how to use this tool, instead of going into depth from a coding point of view.
I have made this tool an open source project on GitHub, so you still can look in the code (SQLAgentJobStarter_VisualStudio2015_SourceCode.zip) if you want, but that’s up to you.

DISCLAIMER: please use this tool at your own risk. It is in a beta state; you should not use it in production environments until you have convinced yourself that it does what it should do.

Installation and demonstration of the tool

One important comment first. This tool is not a Windows Service. It is “just” a console application that needs to be scheduled using a SQL Agent job or the Windows Task Scheduler.
Having said that, what are we going to do?

  • We are going to create two tables in the msdb database: [dbo].[sysjobtriggers] and [dbo].[sysjobtrigger_conditions]. All other SQL Agent job data is also in this database, so this seems a fair place. Still I want to emphasize the importance of backing up this database on a regular basis, preferably daily. But if you have a DBA and a maintenance plan set up, this might be needless to say. A quick query to inspect these tables follows after this list.
  • We are going to create a few stored procedures in the msdb database to maintain those tables.
  • We are going to install the executable HansMichiels.com.Tools.SQLAgentJobStarter.exe and corresponding config file in the C:\Program Files\SQLAgentJobStarter\MSSQL13.MSSQLSERVER folder. MSSQL13.MSSQLSERVER represents the SQL Version number and Instance name, in the example a default instance of SQL Server 2016. Change it to the name of your SQL Server version and instance. In this way you can install the tool side by side for different SQL Instances on the same machine.
  • Before you start using the tool, check the SQLConnectionString in the HansMichiels.com.Tools.SQLAgentJobStarter.exe.config file. By default this is configured as follows: key="SQLConnectionString" value="Data Source=(local);Initial Catalog=msdb;Integrated Security=SSPI;". This is okay for a local default SQL Server instance, but not usable when you are using named instances.
  • We are going to schedule the tool using a SQL Agent job.
  • Now you are ready to rock! All that needs to be done is adding jobtriggers and jobtrigger_conditions to the msdb database. But to help you with this, I will show you a demo now.
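
Once jobtriggers have been added, you can peek at the configuration directly. This is a minimal sketch; the column names below are inferred from the parameters of the upsert stored procedures described later in this article, so verify them against msdb_install.sql.

-- Inspect the configured jobtriggers (column names inferred from the upsert parameters):
SELECT t.[trigger_id], t.[name], t.[sqlagent_jobname], t.[enabled], t.[processing_order]
FROM [msdb].[dbo].[sysjobtriggers] t
ORDER BY t.[processing_order];

-- And their conditions:
SELECT c.[condition_id], c.[trigger_id], c.[enabled], c.[processing_order]
FROM [msdb].[dbo].[sysjobtrigger_conditions] c
ORDER BY c.[trigger_id], c.[processing_order];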

Installation – technical steps

  • Download all the needed files here.
  • Unzip the files in SQLAgentJobStarter_ProgramFiles.zip, and copy them to a folder on your server, I used C:\Program Files\SQLAgentJobStarter\MSSQL13.MSSQLSERVER.
  • Review the script msdb_install.sql and if needed adjust the folder name here: @command=N'"C:\Program Files\SQLAgentJobStarter\MSSQL13.MSSQLSERVER\HansMichiels.com.Tools.SQLAgentJobStarter.exe"',
  • After reviewing, execute the script msdb_install.sql
  • Review the SQL Agent job ##SQLAgentJobStarter##, especially the schedule. For the demo the polling interval of the schedule is 1 minute, but I advise setting it to 3 or 5 minutes for production environments, because that interval is usually good enough.

    Also enable the schedule; it is installed in a disabled state.

So far, so good. The tool is installed.

Installation – preparation for the demo.

If you want to join the demo, also do the following:

  • execute script demo_setup_sqlagentjobs.sql: this script will set up dummy SQL Agent jobs that are used for the demo.
  • execute script demo_setup_storedprocedures.sql: this script will create a database [MSSQL_S01E04] with some stored procedures that are used for sql triggers.
  • create a folder C:\Temp, if it does not exist yet, and create a file with a .txt extension in it.
  • execute script demo_setup_sysjobtriggers.sql: this script will add demo rows to the tables [dbo].[sysjobtriggers] and [dbo].[sysjobtrigger_conditions]
  • enable all schedules. To do this you can right-click the Jobs node of SQL Server Agent and click Manage Schedules, then select all the schedules for the demo jobs (their names start with “Schedule for MSSQL_S01E04”) and the schedule for ##SQLAgentJobStarter##, if not enabled yet.


The Manage Schedules menu-item ..


Check the Enabled checkbox for the applicable schedules ..

The jobs that are created so far

mssql-s01e04-342
The tables in msdb that hold the trigger information ..

mssql-s01e04-355
The stored procedures in msdb used to maintain the jobtriggers ..

mssql-s01e04-334
The [MSSQL_S01E04] database with stored procedures, used for the demo.

mssql-s01e04-335
The C:\Temp folder, that should contain one or more .txt files ..

Viewing the demo results.

Now just wait for about 10 minutes.
In the meantime I can tell you the expected test results:

  • MSSQL_S01E04_Job01_SqlConditionTrue, MSSQL_S01E04_Job02_FileConditionTrue and MSSQL_S01E04_Job03_FileAndSqlConditionTrue will be executed every time after the job ##SQLAgentJobStarter## has run, because their trigger conditions (C:\Temp\*.txt and [MSSQL_S01E04].[sp].[SqlTriggerExample]) both return True.
  • MSSQL_S01E04_Job04_TwoJobSuccessConditions will only be executed after both triggering jobs have succeeded.
  • MSSQL_S01E04_Job05_JobFailureCondition will only be executed after the triggering job has failed.
  • MSSQL_S01E04_Job06_JobConditionThatNeedsToComplete will only be executed after the triggering job has either succeeded or failed.
  • MSSQL_S01E04_Job07_RunningJob will only be executed when the triggering job is running.
  • MSSQL_S01E04_Job08_NonRunningJob will only be executed when the triggering job is not running. Because jobs 7 and 8 both have the same triggering job, either MSSQL_S01E04_Job07_RunningJob or MSSQL_S01E04_Job08_NonRunningJob will be started after the job ##SQLAgentJobStarter## has run.
  • MSSQL_S01E04_Job09_SqlConditionFalse will never be executed because the jobtrigger uses a sql trigger using stored procedure [MSSQL_S01E04].[sp].[SqlTriggerFalse], that always returns 0 (false).
  • MSSQL_S01E04_Job10_ReversedFileCondition_RunsWhenFileDoesNotExist will only be executed when the file “C:\Temp\NowJob10DoesNotRun.txt” does not exist.

Let’s see if it worked out!


Viewing the job history of job 1 to 3..
In the job history I can see that:

  • MSSQL_S01E04_Job01_SqlConditionTrue, MSSQL_S01E04_Job02_FileConditionTrue and MSSQL_S01E04_Job03_FileAndSqlConditionTrue were executed every time after the job ##SQLAgentJobStarter## had run.
  • ##SQLAgentJobStarter##, the job that caused this, was also executed every minute, as scheduled.


Viewing the job history of job 4..

  • MSSQL_S01E04_Job04 was only executed after both triggering jobs had succeeded.


Viewing the job history of job 5..

  • MSSQL_S01E04_Job05 was only executed after the triggering job had failed.


Viewing the job history of job 6..

  • MSSQL_S01E04_Job06 was executed after the triggering job had either succeeded or failed.


Viewing the job history of job 7 and 8..

  • MSSQL_S01E04_Job07_RunningJob was executed when the triggering job was still running (job MSSQL_S01E04_Trigger_for_Job07_and_Job08 runs every other minute for 30 seconds).


Viewing the job history of job 9..

  • MSSQL_S01E04_Job09_SqlConditionFalse was never executed: it is checked on the left, but does not occur on the right.


Viewing the job history of job 10 when the file “C:\Temp\NowJob10DoesNotRun.txt” does not exist.

Viewing the job history of job 10 after the file “C:\Temp\NowJob10DoesNotRun.txt” is created.

  • MSSQL_S01E04_Job10_ReversedFileCondition_RunsWhenFileDoesNotExist was only executed when the file “C:\Temp\NowJob10DoesNotRun.txt” did not exist.

But there is more. The tool writes to the Windows Event Log when an error occurs.
mssql-s01e04-337
To view the Windows Event Log, press <Windows key>+R, then type eventvwr


Example of errors that are logged.

Make sure you clean up your server! Use the script demo_cleanup.sql for that; otherwise the tool will keep on executing the demo jobs. Also change the schedule interval of ##SQLAgentJobStarter## from 1 minute to 3 or 5 minutes.

Understanding the stored procedures of the solution

Stored procedure [dbo].[usp_upsert_jobtrigger]

With this stored procedure you can insert or update a jobtrigger.

You can set the following parameters (a call sketch follows after the list):

  • @trigger_id: If provided as input, it will be used for checking if an update or insert needs to be done. If left null or smaller than or equal to 0, it will be filled in the stored procedure.
  • @enabled: Only enabled jobtriggers are processed. Values 0 or 1 (required).
  • @name: Name for the jobtrigger (unique + required). It is used for checking whether an update or insert needs to be done when @trigger_id is null or smaller than or equal to 0.
  • @sqlagent_jobname: The name of the SQL Agent job that must be started when all jobtrigger conditions are true (required).
  • @processing_order: Indicates the order in which jobtriggers must be processed (required). Duplicate processing order values are allowed.
  • @description: An optional description.
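
For example, a call could look like the sketch below. The job name Job_B is hypothetical, and the exact signature may differ slightly, so check msdb_install.sql.

-- Upsert a jobtrigger that starts Job_B (hypothetical job name).
EXEC [msdb].[dbo].[usp_upsert_jobtrigger]
    @trigger_id = NULL, -- left NULL, so the stored procedure fills it and @name is used for matching
    @enabled = 1,
    @name = N'Start Job_B when ready', -- unique name of the jobtrigger
    @sqlagent_jobname = N'Job_B', -- the job to start when all conditions are true
    @processing_order = 10,
    @description = N'Example jobtrigger';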

Stored procedure [dbo].[usp_upsert_jobtrigger_condition]

With this stored procedure you can insert or update a jobtrigger condition of any type.

You can set the following parameters:

  • @condition_id: If provided as input, it will be used for checking if an update or insert needs to be done. If left null or smaller than or equal to 0, it will be filled in the stored procedure.
  • @trigger_id: The trigger_id of the jobtrigger to which the jobtrigger condition belongs.
  • @enabled: Only enabled jobtrigger conditions are processed. Values 0 or 1 (required).
  • @processing_order: Indicates the order in which jobtrigger conditions must be processed (required). Duplicate processing order values are allowed.
  • @description: An optional description.
  • Note regarding the parameters underneath: You must provide values for the one @sql* parameter OR the three @file* parameters OR the three @job* parameters.
  • @sql_stored_procedure: The fully qualified name (including database name) of a stored procedure that returns a single row with a single column named [SqlTriggerResult] of the BIT datatype (a template sketch follows after this list).
  • @file_watch_folder: The folder that is searched for files.
  • @file_recurse_subfolders: Indicates whether subfolders of the watch folder must also be checked.
  • @file_name_pattern: A filename or filename pattern with wildcards of the file that must exist for the triggercondition to be true.
  • @job_name: The name of a SQL Agent job that must be completed with the desired value of @job_result for the trigger condition to be true.
  • @job_result: Domain values [S | F | C], meaning Succeeded, Failed or Completed (= Succeeded or Failed).
  • @job_result_maxage: The maximum age in seconds of the job result, counted from the moment the job completed. Use -1 for unlimited. For instance, suppose you want a SQL Agent job to start when 4 other jobs, that are scheduled daily and partially run simultaneously, have all succeeded.
    The time difference between the first and the last estimated completion time is 60 minutes.
    The time difference between the first scheduled start time and the last estimated completion time is 120 minutes.
    Any value (recalculated as seconds) between 60 minutes (plus a safety margin) and 22 hours (24 hours minus 120 minutes) would be fine.
  • @reverse_condition: If the value 1 (true) is used, the condition must have the opposite result to trigger the job to start. For example a file may not be found or a job may not be running.
    A few things are important to understand:

    1. Job results “Succeeded”, “Failed” or “Completed” check for a result; job state “Running” (or not) checks for a current state. Do not get confused about this.
    2. @reverse_condition does not work for job results “Succeeded” and “Failed”, and there is no need for it: the opposite of “Succeeded” is “Failed”, so you should implement it that way. “Not Succeeded”, implemented with @reverse_condition = 1, would be any result except “Succeeded”, so “Failed”, “Completed” or “No execution found”.
    3. @reverse_condition does not work for “Completed” for the same reason: “Not Completed”, implemented with @reverse_condition = 1, would be any result except “Completed”, so “Failed”, “Succeeded” or “No execution found”. Instead of “Not Completed” you could check for “Running”.
    4. If @job_result = ‘R’ (Running) and @reverse_condition = 0, then @job_result_maxage is not used: when the job is running (and the condition is met), @sqlagent_jobname is only started if it was not already started after the current execution of the running job began (so a running job will not cause the other job to run more than once).
    5. If @job_result = ‘R’ (Running) and @reverse_condition = 1, then @job_result_maxage is used differently: when the job is not running (and the condition is met), @sqlagent_jobname is only started if it was not started in the last @job_result_maxage seconds. Therefore it is best to use “Not Running” only in conjunction with some other condition.
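
To make the contract of @sql_stored_procedure concrete, here is a minimal template sketch; the [sp] schema matches the demo database, while the procedure name and the table in the EXISTS clause are hypothetical (the downloadable storedprocedure_template.sql serves the same purpose).

USE [MSSQL_S01E04]
GO
CREATE PROCEDURE [sp].[MySqlTriggerExample]
AS
BEGIN
  SET NOCOUNT ON;
  -- Return exactly one row with one BIT column named [SqlTriggerResult]:
  -- 1 (true) when the condition is met, otherwise 0 (false).
  SELECT CONVERT(BIT,
    CASE WHEN EXISTS (SELECT 1 FROM [dbo].[StagingReady] WHERE [IsReady] = 1) -- hypothetical condition
         THEN 1 ELSE 0
    END) AS [SqlTriggerResult];
END
GO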

Stored procedure [dbo].[usp_upsert_sql_jobtrigger_condition]

With this stored procedure you can insert or update a jobtrigger condition of type ‘sql’ (using a stored procedure).

You can set the following parameters:
@condition_id, @trigger_id, @enabled, @processing_order, @description, @sql_stored_procedure, @reverse_condition: as described above.

Stored procedure [dbo].[usp_upsert_file_jobtrigger_condition]

With this stored procedure you can insert or update a jobtrigger condition of type ‘file’.

You can set the following parameters:
@condition_id, @trigger_id, @enabled, @processing_order, @description, @file_watch_folder, @file_recurse_subfolders, @file_name_pattern, @reverse_condition: as described above.

Stored procedure [dbo].[usp_upsert_jobresult_jobtrigger_condition]

With this stored procedure you can insert or update a jobtrigger condition of the SQLAgentJobResult type.

You can set the following parameters:
@condition_id, @trigger_id, @enabled, @processing_order, @description, @job_name, @job_result, @job_result_maxage, @reverse_condition: as described above.

Stored procedure [dbo].[usp_delete_jobtrigger]

With this stored procedure you can delete a jobtrigger by trigger_id or name.

Stored procedure [dbo].[usp_delete_jobtrigger_condition]

With this stored procedure you can delete a jobtrigger condition by using the condition_id.

Stored procedure [dbo].[usp_delete_jobtrigger_conditions_by_trigger_id]

With this stored procedure you can delete all jobtrigger conditions for a given trigger_id.

Examples of usage of the stored procedures

Among the files you can download there is the file demo_setup_sysjobtriggers.sql, which uses the upsert stored procedures for different use cases.
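
To give an impression, here is a hedged sketch of the combined example from the introduction: start Job_B when Job_A has succeeded recently AND the file test.txt exists in C:\Temp. The job names are hypothetical, and I assume the generated trigger_id can be read back from [dbo].[sysjobtriggers] by its unique name; verify the details against demo_setup_sysjobtriggers.sql.

-- 1. Upsert the jobtrigger itself.
EXEC [msdb].[dbo].[usp_upsert_jobtrigger]
    @trigger_id = NULL, @enabled = 1,
    @name = N'Start Job_B after Job_A and file arrival',
    @sqlagent_jobname = N'Job_B', @processing_order = 10,
    @description = N'Combined file + SQLAgentJobResult example';

-- 2. Look up the generated trigger_id by the unique jobtrigger name.
DECLARE @trigger_id INT =
    (SELECT [trigger_id] FROM [msdb].[dbo].[sysjobtriggers]
     WHERE [name] = N'Start Job_B after Job_A and file arrival');

-- 3. Condition: the file test.txt must exist in C:\Temp.
EXEC [msdb].[dbo].[usp_upsert_file_jobtrigger_condition]
    @condition_id = NULL, @trigger_id = @trigger_id, @enabled = 1,
    @processing_order = 1, @description = N'test.txt must exist',
    @file_watch_folder = N'C:\Temp', @file_recurse_subfolders = 0,
    @file_name_pattern = N'test.txt', @reverse_condition = 0;

-- 4. Condition: Job_A must have succeeded in the last 4 hours (14400 seconds).
EXEC [msdb].[dbo].[usp_upsert_jobresult_jobtrigger_condition]
    @condition_id = NULL, @trigger_id = @trigger_id, @enabled = 1,
    @processing_order = 2, @description = N'Job_A succeeded recently',
    @job_name = N'Job_A', @job_result = 'S',
    @job_result_maxage = 14400, @reverse_condition = 0;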

Conclusion / Wrap up

I have tried to explain how the SQLAgentJobStarter tool can be used for starting SQL Agent jobs triggered by the existence of one or more files, by the result of a stored procedure execution or by the execution result of one or more other SQL Agent Jobs.
A few important side notes I still have to make:

  1. In real life scenarios, make sure the files you are polling for are moved after processing, so that the tool does not try to start the job again.
  2. To create stored procedures that are used as a sql trigger you can use the storedprocedure_template.sql file as a template.
  3. To create your own jobtriggers and triggerconditions please check demo_setup_sysjobtriggers.sql in which some of the stored procedures are used.
  4. I am aware that the source code of the tool might need some refactoring. It does the job, however.
  5. Anyone who can help me with creating a graphical user interface to manage the jobtriggers and trigger conditions using C# and WinForms is more than welcome. If you are interested, please contact me. I do not have enough time to do this myself.

(c) 2016/2017 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Using the project deployment model with environments (SSIS Series)

Introduction

Hi, as announced in my previous post of July 14, I took a summer break. It was really useful and I enjoyed it.
From now on I will try to write interesting posts weekly again.
Today I want to explain the project deployment model of SSIS, introduced back in SSIS 2012.
You will also find a (hopefully useful) script below that creates folders, environments and environment variables, and connects package parameters to environment variables.
But that’s for later, first a little introduction.

Folders and environments

What I found very confusing when I started using the project deployment model, already some years ago, is that an environment in this context is not what I expected it to be.
From a software development perspective, an environment is an installed and working version of your software, optionally using one or more databases and configuration settings exclusively for that environment, for instance a Development or a Test environment.
But the project deployment model of SSIS does not agree with that definition.
For the project deployment model an environment is only a set of values for all environment variables, and nothing more.
Can you think of what that means? Suppose you have one server, used for both Development and Test, and you have databases and cubes with a “_DEV” or a “_TST” suffix to their names.
As far as SSIS is concerned, you can only install one version of an SSIS package! Yes, you can execute it with a different set of parameter values stored in SSIS environment variables, but that’s it.
You cannot have two different versions of the same package installed on the same instance of SQL Server – at least not using only SSIS Environments.
I found it not so intuitive that you need to use folders inside the Integration Services Catalog to achieve this.
So much for the theory; let us try this stuff out together.

Let’s try it

I’ll just click an example together; you do not need to follow along, my goal is that you understand the concepts so that you can apply them to your own projects.
By the way, I am using SQL Server 2016, but it should also work in SQL Server 2012 and 2014.

My test setup is very basic:

  • I have made two databases, [SSIS_S01E08_DEV] and [SSIS_S01E08_TST], as development and test environments, respectively.
  • Both databases have a table [tbl].[Fruit], but the table in the DEV database has one extra column, [FruitDescription], to simulate a change that is under development but not yet deployed to test.
  • I have made one SSIS package AddFruit.dtsx, that inserts a row into the table [tbl].[Fruit], passing the value for [FruitName] only. During the demo I will change the package so that it also provides the [FruitDescription] column value.

Playing around with the example above, I can demonstrate the use of folders and environments when the package is deployed using the project deployment model.

The initial SSIS Package

The initial SSIS package contains just one Execute SQL Task, one parameter named FruitName and one OLEDB connection manager named MyDatabase.connectionmanager.
The connection manager is configured with an expression, so that its ConnectionString property gets the value of a project parameter that is also named ConnectionString, as demonstrated in the images below.
SSIS-S01E08-280
The Execute SQL Task

SSIS-S01E08-281
The package parameter

SSIS-S01E08-282
The project parameter

SSIS-S01E08-283
The OLEDB connection manager configured with an expression

SSIS-S01E08-284
Then rebuild the Solution

Deploying the SSIS Package

Once the solution has been rebuilt, you can deploy the package with the Integration Services Deployment Wizard. You can also call this utility from the command line, but explaining that is out of scope for this blog post.
To start this wizard, you have to locate the .ispac file; it is in a subfolder of your project’s bin folder.
If you have created additional solution configurations in your Visual Studio solution, this subfolder might be different from the image below (where the subfolder is Development).
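
For completeness: a deployment can also be scripted in T-SQL with the documented [SSISDB].[catalog].[deploy_project] stored procedure. The sketch below makes assumptions about your situation: the .ispac path, the project name, and the catalog folder TST (which must already exist).

DECLARE @project_binary VARBINARY(MAX), @operation_id BIGINT;

-- Read the .ispac file into a binary variable (path is an example).
SELECT @project_binary = BulkColumn
FROM OPENROWSET(BULK 'C:\Projects\SSIS_S01E08\bin\Development\SSIS_S01E08.ispac',
                SINGLE_BLOB) AS ispac;

EXEC [SSISDB].[catalog].[deploy_project]
    @folder_name = N'TST',
    @project_name = N'SSIS_S01E08',
    @project_stream = @project_binary,
    @operation_id = @operation_id OUTPUT;
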
SSIS-S01E08-285
Locating the .ispac file, that was created when the solution was rebuilt.

SSIS-S01E08-286
First step of the wizard

SSIS-S01E08-287
The default deployment model is Project Deployment, the one you need. The other option is Package Deployment.

SSIS-S01E08-288
Select a server name and a path [1]

SSIS-S01E08-289
Select a server name and a path [2]: create a new folder

SSIS-S01E08-290
Select a server name and a path [3]: name the folder after your environment

SSIS-S01E08-292
Select a server name and a path [4]: then click Next

SSIS-S01E08-293
Review and deploy.

SSIS-S01E08-295
Deployment succeeded.

SSIS-S01E08-296
In the Integration Services Catalog you can find your project and package.

SSIS-S01E08-297
You could configure it now, but there are no SSIS environments yet (note: these are a set of environment variable values).

SSIS-S01E08-298
Don’t configure just yet. We will do this later.

SSIS-S01E08-299
Here you could create an environment .. don’t do it now !

SSIS-S01E08-TST-environment_setup.sql

--\
---) A script to create a folder and environment in an SSIS Catalog.
---) Author: Hans Michiels
--/
/*
(c) Copyright 2016 - hansmichiels.com
 
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
 
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
 
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
*/

USE [SSISDB]
GO

DECLARE @environment_variables_table TABLE(
  [id] int identity(1, 1),
  [variable_name] nvarchar(128),
  [data_type] nvarchar(128),
  [sensitive] bit,
  [value] nvarchar(4000),
  [description] nvarchar(1024)
  )

DECLARE @object_parameter_value_table TABLE(
  [id] int identity(1, 1),
  [object_type] smallint,
  [object_name] nvarchar(260),
  [parameter_name] nvarchar(128)
  )

DECLARE @id INT
DECLARE @max_id INT
DECLARE @folder_name sysname
DECLARE @environment_name nvarchar(128)
DECLARE @environment_description nvarchar(128)
DECLARE @project_name nvarchar(128)
DECLARE @variable_name nvarchar(128)
DECLARE @data_type nvarchar(128)
DECLARE @sensitive bit
DECLARE @value sql_variant
DECLARE @value_tinyint tinyint
DECLARE @description nvarchar(1024)
DECLARE @nsql nvarchar(max)

DECLARE @object_type smallint
DECLARE @object_name nvarchar(260)
DECLARE @value_type CHAR(1)
DECLARE @parameter_name nvarchar(128)
DECLARE @parameter_value sql_variant

--\
---) Environment settings
--/
SELECT
  @folder_name = N'TST',
  @environment_name = N'TST',
  @environment_description = N'My demo Test environment',
  @project_name=N'SSIS_S01E08'

--\
---) Script to create environment and settings.
--/
IF NOT EXISTS (
    SELECT 1
    FROM [SSISDB].[internal].[environments]
    WHERE [environment_name] = @environment_name
    )
BEGIN
    EXEC [SSISDB].[catalog].[create_environment] @folder_name, @environment_name, @environment_description;
END

INSERT INTO @environment_variables_table
  ( [variable_name], [data_type], [sensitive], [value], [description] )
  VALUES
      ( N'ConnectionString', N'String', 0, N'Data Source=.;Initial Catalog=SSIS_S01E08_TST;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;', N'Connection string for the database.' )
    , ( N'FruitName', N'String', 0, N'Tamarillo', N'' )
    --, ( N'Int32Example', N'Int32', 0, N'1', N'' )
    --, ( N'BooleanExample', N'Boolean', 0, N'1', N'' )
    --, ( N'ByteExample', N'Byte', 0, N'5', N'' )

SELECT @id = 1, @max_id = MAX([id]) FROM @environment_variables_table
WHILE @id <= @max_id
BEGIN
    SELECT
      @variable_name = v.variable_name,
      @data_type = v.data_type,
      @sensitive = v.sensitive,
      @value = v.value,
      @description = v.[description]
    FROM @environment_variables_table v
    WHERE [id] = @id;

    IF EXISTS (
        SELECT 1
        FROM [SSISDB].[internal].[environment_variables] v
        JOIN [SSISDB].[internal].[environments] e ON e.environment_id = v.environment_id
        WHERE v.[name] = @variable_name
        AND e.environment_name = @environment_name
        )
    BEGIN
        SET @nsql = N'EXECUTE [catalog].[delete_environment_variable] '
          + N'@folder_name = N'''+ @folder_name + ''', @environment_name = N''' + @environment_name + ''', '
          + N'@variable_name = N''' + @variable_name + ''''
        PRINT @nsql;
        EXEC sp_executesql @nsql;
    END

    PRINT '/*'
    PRINT @variable_name
    PRINT CONVERT(nvarchar(128), SQL_VARIANT_PROPERTY(@value, 'BaseType'));
    PRINT '*/'

    SET @nsql = N'EXECUTE [catalog].[create_environment_variable] '
      + N'@folder_name = N'''+ @folder_name + ''', @environment_name = N''' + @environment_name + ''', '
      + N'@variable_name = N'''+ @variable_name + ''', @data_type = N''' + @data_type + ''', '
      + N'@sensitive = ' + CONVERT(NVARCHAR, @sensitive) + ', @description = N''' + @description + ''', '
      + CHAR(13) + CHAR(10) + N'@value = ' +
      CASE UPPER(@data_type)
      WHEN N'STRING' THEN 'N''' + CONVERT(NVARCHAR(1000), @value) + ''' ' -- compare against uppercase, so the match also works under case-sensitive collations
      ELSE CONVERT(NVARCHAR(1000), @value)
      END + '; '
    PRINT @nsql;
    EXEC sp_executesql @nsql;

    SET @id = @id + 1
END

--\
---) Add environment reference to project.
--/
Declare @reference_id bigint
IF NOT EXISTS(SELECT 1
  FROM [SSISDB].[internal].[environment_references] r
  JOIN [SSISDB].[internal].[projects] p
    ON p.project_id = r.project_id
  WHERE p.name = @project_name
    AND r.environment_name = @environment_name
  )
BEGIN
    EXEC [SSISDB].[catalog].[create_environment_reference]
        @environment_name=@environment_name, @reference_id=@reference_id OUTPUT,
        @project_name=@project_name, @folder_name=@folder_name, @reference_type='R';
    --Select @reference_id
END

--\
---) Connect environment variables to package parameters, based on the name
--/
INSERT INTO @object_parameter_value_table (
    [object_type],
    [object_name],
    [parameter_name]
    )
SELECT
    prm.[object_type],
    prm.[object_name],
    prm.[parameter_name]
FROM
    [SSISDB].[internal].[object_parameters] prm
JOIN
    [SSISDB].[internal].[environment_variables] ev
    ON ev.name = prm.parameter_name
JOIN
    [SSISDB].[internal].[projects] prj
    ON prj.project_id = prm.project_id
WHERE
    prm.[value_type] != 'R'
AND
    prm.value_set = 0
AND
    prj.name = @project_name
ORDER BY
    prm.object_name, prm.parameter_name

SELECT @id = 1, @max_id = MAX([id]) FROM @object_parameter_value_table
WHILE @id <= @max_id
BEGIN
    SELECT
      @object_type = v.[object_type],
      @object_name = v.[object_name],
      @parameter_name = v.[parameter_name]
    FROM @object_parameter_value_table v
    WHERE [id] = @id;

    SELECT @value_type = 'R', @parameter_value = @parameter_name;
    
    EXEC [SSISDB].[catalog].[set_object_parameter_value]
      @object_type, @folder_name, @project_name, @parameter_name,
      @parameter_value, @object_name, @value_type

    SET @id = @id + 1
END

GO

My script, which facilitates setting up multiple environments quickly (download all the used scripts here).

In the demo it would not be a problem to do all the configuration by hand, but if you have a lot of SSIS Packages with a lot of parameters, you really do not want to do this, especially not if you have different environments, like Development, Test and Production.
This is why I made a script that automates this work for me:

  • It creates an environment, if it doesn’t exist yet.
  • It (re)creates environment variables, defined in the script.
  • It adds an environment reference to the project, if it doesn’t exist yet.
  • It connects environment variables to package parameters, based on the name (a verification query follows after this list).
    For this to work, it is important that package parameters with the same name used in different packages mean the same thing!
    For instance if you have a parameter SourceFolder used in different packages, it should be the same folder! If not, rename the parameters so that they have unique names and – when mapped to environment variables – can each be assigned a different value, e.g. CrmSourceFolder and ErpSourceFolder.
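
After the script has run, you can verify the result with a query like this sketch, which uses the public SSISDB catalog views (the project name is the one used in the script):

SELECT op.[object_name], op.[parameter_name], op.[referenced_variable_name]
FROM [SSISDB].[catalog].[object_parameters] op
JOIN [SSISDB].[catalog].[projects] prj ON prj.[project_id] = op.[project_id]
WHERE prj.[name] = N'SSIS_S01E08'
  AND op.[value_type] = 'R'; -- 'R' means the parameter references an environment variable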

We run the script for the TST environment.

SSIS-S01E08-300
The script was executed successfully.

SSIS-S01E08-301
After the script has run, a TST folder with a TST environment inside exists. You can view or edit its properties.

SSIS-S01E08-302
When the Variables pane is selected on the left, the environment variables, created by the script, are shown.

SSIS-S01E08-303
View package configuration.

SSIS-S01E08-304
The package parameter FruitName is now connected to environment variable FruitName.

SSIS-S01E08-305
When I choose Execute ..

SSIS-S01E08-306
… all I have to do is check Environment; it will automatically show .\TST

SSIS-S01E08-307
After execution of the package, a row exists in the table (and one from a previous test)

SSIS-S01E08-308
We can also execute the package with a SQL Agent Job ..

SSIS-S01E08-310
Execute the package with a SQL Agent Job: all I have to do is check Environment; it will automatically show .\TST ..

SSIS-S01E08-311
.. and all values are connected properly ..

SSIS-S01E08-312
And yes, it works!

Extending the package with a FruitDescription

I must admit, I cheated a little. I skipped the part about deploying the first version of the SSIS package to the DEV environment.
So now I am going to deploy to a new DEV folder and environment for the first time, while it will be the second version of the package. This is not a recommended practice in real projects.
Anyway, the package gets an extra parameter FruitDescription.
SSIS-S01E08-313
Added FruitDescription

SSIS-S01E08-314
Modify the Execute SQL Task to insert also the value for @FruitDescription [1]

SSIS-S01E08-315
Modify the Execute SQL Task to insert also the value for @FruitDescription [2]

SSIS-S01E08-316
Then Rebuild

SSIS-S01E08-317
Then run the deployment wizard again, but now create a DEV folder

SSIS-S01E08-318
Script modifications [1]: use a DEV folder and a DEV environment name.

SSIS-S01E08-319
Add FruitDescription to the list of environment variables and change the values: the connection string must use the DEV database, and FruitName and FruitDescription something different than in the TST environment.

Then run the script.
SSIS-S01E08-320
The package was deployed using the deployment wizard.

SSIS-S01E08-321
SSIS environment was created using the script

SSIS-S01E08-322
Environment variables were added using the script

SSIS-S01E08-323
Environment variables were connected to package parameters using the script

SSIS-S01E08-324
Package execution test: note the new parameter FruitDescription

SSIS-S01E08-325
Test result

SSIS-S01E08-326
Final result: different versions of the same SSIS package installed side by side on the same machine. But in addition to SSIS environments with variable values, folders inside the Integration Services Catalog are needed for that!

Download the scripts here.

Conclusion / Wrap up

With the project deployment model of SSIS, introduced in SQL Server 2012, deployment and configuration of SSIS Packages has become easier.
However it is important to understand how folders and environments in the Integration Services Catalog work. I have tried to explain this, and provided a script to set up multiple environments quickly.

Why did you use those weird fruit examples?

D=Development
T=Test
😉
Read more here:
Durian
Tamarillo

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Implement logging in a Script Task (SSIS Series)

Introduction

This post is a follow-up on two older posts and will be the last one before my summer break.

Today I want to enhance the SSIS package made in post Fixing corrupt csv files in a SSIS Package (SSIS Series) by using the Plug and Play Logging Solution.

Problem

As long as a script runs well, you might not need logging inside it. BUT .. when something GOES wrong, having logging in the script can reduce the time needed to troubleshoot the issue.
In the example of fixing csv files, anything can be wrong with a file, causing an error in the script. In that case it is useful to know, for instance, which file was being processed when the error occurred.

Prerequisite

For this to work, you have to install my Plug and Play Logging Solution, which can be downloaded from this post. It is just one SQL script you have to run, which creates a [logdb] database with objects inside. You can also add those objects to a different database that already exists.
It is not mandatory to rebuild the Fixing corrupt csv files in a SSIS Package example, though, as you can also add logging to your own Script Task.

Solution

This post is an enhancement of the Fixing corrupt csv files in a SSIS Package post, so you might need to read that post first, if you have not done so already.

Therefore I will dive directly into the changes that are needed to add logging to the Script Task.
This will be done with lots of screenprints with some comments in between.

SSIS-S01E07-169
Adding a package parameter “LoggingConnectionString”
First you need to add a package parameter LoggingConnectionString that can be used in an expression of the OLE DB connection and as an input variable for the Script Task.

SSIS-S01E07-170
OLE DB Connection configuration ..
Then add an OLE DB connection for the logging database.

SSIS-S01E07-171
OLE DB Connection properties ..
Use an expression to make the ConnectionString of the OLE DB connection equal to the value of the package parameter LoggingConnectionString.

SSIS-S01E07-173
OLE DB Connection – Expression configuration [1]

SSIS-S01E07-172
OLE DB Connection – Expression configuration [2]

SSIS-S01E07-174
Final result
The fx icon shows that the connection manager uses an expression.

SSIS-S01E07-176
Enable logging
Now enable logging for the package.

SSIS-S01E07-177
Enable logging [2]
Add a log provider for SQL Server and let it use the logdb.oledbConnection by selecting it under the Configuration column header.

SSIS-S01E07-178
Enable logging [3]
Then select all events. Filtering on what is actually logged is done by the logging solution (by the value of @MaxMessageClass, see this blog post for more details).

SSIS-S01E07-179
Select the Script Task
Select the Script Task and add the following variables to ReadOnlyVariables:

  • System::ExecutionInstanceGUID
  • System::PackageID
  • $Package::LoggingConnectionString


SSIS-S01E07-180
The added ReadOnlyVariables in the red rectangles

Below you will find a number of screenprints of the Script Task to talk you through the changes.
You can download the C# script here.

SSIS-S01E07-181
Overview
First make sure the Namespaces region is as shown.
Then fold the namespace with the guid in the name, and paste the entire namespace HansMichielsCom.PlugAndPlaySSISLoggingSolution underneath it.
This code is put in a separate namespace, so that it could also be placed in a .NET assembly that is added to the GAC (Global Assembly Cache). If you would do that, you would not have to add the code to every Script Task.
For the example of today, we just put this namespace inside the Script Task to keep things simple for now.

SSIS-S01E07-182
Using the HansMichielsCom.PlugAndPlaySSISLoggingSolution namespace
As a result, you have to tell the first guid-like namespace that you want to call code inside the second namespace. Therefore add the using statement as shown above.

SSIS-S01E07-183
Constant used for logging
Below you will see some screenprints with the changed parts of the script.

SSIS-S01E07-184
Method GetSsisLogWriter to instantiate a SsisLogWriter object

SSIS-S01E07-187
Method Main is extended with logging.

SSIS-S01E07-188
Pass the logWriter as parameter to other methods ..

SSIS-S01E07-189
IMPORTANT: Bugfix in CheckAndReturnHeader!

SSIS-S01E07-190
IMPORTANT: Bugfix in CheckAndReturnHeader!
(header == null) is added to cope with empty files.

Testing, testing, one, two ..

SSIS-S01E07-191
Test preparations [1]

SSIS-S01E07-193
Test preparations [2]

SSIS-S01E07-194
Test execution

SSIS-S01E07-195
Test result: logging rows done inside the script are in the log table.

Conclusion / Wrap up

In this post I have demonstrated how to implement logging in SSIS Script Tasks using my Plug and Play Logging Solution.
This type of logging gives more control over what to log and how to log it than when you implement logging using SSIS events.
The examples given are very basic. You can use your imagination to implement logging of errors using a try .. catch block, or use all available parameters of logWriter.AddLogEntry to change the Retention Class, Message Class, and so on.

In the summer I will take some time for study, reflection, holiday, and still .. work.
My next post will be early September at the latest, maybe earlier.

Download the C# script here.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Zeros, bloody zeros! (Data Vault Series)

Introduction

I must admit I have a weakness for British humour.
When I had to cope with leading zeros in business keys some time ago, I spontaneously came up with the title of this post, not knowing that it would serve as such.
For those who do not know, “Meetings, bloody meetings” is a British comedy training film in which John Cleese plays a main role. It was made in 1976, and a remake was made in 2012.
It tells in a funny way what can go wrong at meetings and how you can do better, check it out if you can.

DV-S01E05-meetings
Mr John Cleese

But, obviously, this post is not about meetings but about zeros.

Problem

I can be short about that: leading zeros in business key values.
For instance a customer number is delivered to the data warehouse as 0001806 (instead of 1806).
This would not be a problem if it was always delivered exactly like that. But to be honest, you cannot and will not know that upfront. Even if this is the case now, it might not be in the future.
When other tools are used, leading zeros could suddenly disappear (for instance when a csv file is modified using Excel), or (more rarely) the number of leading zeros could change (01806, 00001806). When this happens you have a problem, because for the data warehouse 01806, 0001806, 00001806 and 1806 are all different business keys! Even if you have only two variants, it is already a problem.
Because every business key gets a different row in the hub, this customer now exists multiple times!

DV-S01E05-zeros
(No acting here, this is how I look sometimes)

Solution

If you are familiar with Data Vault, you might already think of same-as links to solve this.
But I think the solution should be implemented earlier, to avoid having multiple hub rows.
Simply always remove leading zeros when the source column is (part of) a business key (either primary or foreign key) and seems to be a number or ID but is delivered as a string/varchar. In this way 1806 will always be 1806! And I think it is pretty impossible that 001806 and 1806 would refer to two different customers.
Unless, of course, they would come from different source systems. But in that situation, depending on leading zeros would be a bad thing to do anyway, because when the leading zeros are dropped, satellite rows of different customers (in different source systems) could end up connected to the same hub row! In that situation, in a non-integrated Raw Vault, it would be better to prefix the business key with the source system code and remove the leading zeros, for instance CRM.1806 and ERP.1806.
In all cases, you can still store the original value (with leading zeros) as an ordinary attribute in a satellite for auditing reasons.

How to implement the solution

There are many ways to remove leading zeros. When I was searching for this I had two requirements:

  • No casting from and to an integer may take place, otherwise all business keys would need to be numeric, and this would make the solution less reliable.
  • No function, routine or assembly may be called, as this could negatively impact performance. I was looking for an “inline” conversion.

After some research I found an expression that was the same for SQL and SSIS and quite okay (T-SQL version by Robin Hames, my credits for his work), but it appeared to change a string with only one or more zeros to an empty string. And because a 0 can have a meaning – and is certainly different from an empty string – this is undesired behavior, IMHO.
So I had to add some logic to it: a CASE expression in T-SQL and an inline condition (format {condition} ? {true part} : {false part}) in the SSIS expression.
Furthermore I came across a different method for T-SQL as well, using the PATINDEX function, which is more compact than the other solution.
For SSIS I still use the ‘Robin Hames’ method, because the PATINDEX function is not available in SSIS expressions.
So .. this is what it has become:

T-SQL

Remove_leading_zeros.sql

SELECT
    example.[id_with_leading_zeros],
   CASE
      WHEN LTRIM(example.[id_with_leading_zeros]) = '' THEN ''
      WHEN PATINDEX( '%[^0 ]%', example.[id_with_leading_zeros]) = 0 THEN '0'
      ELSE SUBSTRING(example.[id_with_leading_zeros], PATINDEX('%[^0 ]%', example.[id_with_leading_zeros]), LEN(example.[id_with_leading_zeros]))
   END AS [id_without_zeros_method1],

   CASE
      WHEN LTRIM(example.[id_with_leading_zeros]) = '' THEN ''
      WHEN PATINDEX( '%[^0 ]%', example.[id_with_leading_zeros]) = 0 THEN '0'
      ELSE REPLACE(REPLACE(LTRIM(REPLACE(-- Robin Hames' method
            REPLACE(LTRIM(example.[id_with_leading_zeros]), ' ', '!#!') -- replace existing spaces with a string that does not occur in the column value, I have chosen '!#!'
            , '0', ' ') -- replace '0' with ' '
            ) -- end of LTRIM to remove leading '0's that have been changed to ' 's
            , ' ', '0') -- change ' ' back to '0'
            , '!#!', ' ') -- change '!#!' back to ' '
   END AS [id_without_zeros_method2]
FROM
    (
    SELECT
        TOP 1000 RIGHT('00000000000' + CONVERT(NVARCHAR(12), object_id), 14) AS [id_with_leading_zeros]
    FROM
        master.sys.objects
    UNION
    SELECT N' 00000 '
    UNION
    SELECT N'00'
    UNION
    SELECT N' '
    UNION
    SELECT ' 0099990 A '
    UNION
    SELECT '-5550'
    ) example

SSIS Expression (can be used in Derived Column)

(LTRIM(REPLACE(id_with_leading_zeros,"0", "")) == "" && LTRIM(id_with_leading_zeros) != "") ? "0" : REPLACE(REPLACE(LTRIM(REPLACE(REPLACE(LTRIM(id_with_leading_zeros)," ","!#!"),"0"," "))," ","0"),"!#!"," ")

DV-S01E05-151
In a Derived Column Transformation this looks for instance like this

Conclusion / Wrap up

In this post I have motivated why I think you should remove leading zeros from business keys when data is loaded from source systems to a data warehouse.
This post also contains different ways to remove leading zeros, two for T-SQL and one for a SSIS expression.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

SQL Server 2016 tempdb configurations benchmark (SQL Server Series)

Introduction

When you have installed SQL Server 2016 you will have noticed that the configuration for tempdb has changed.
Microsoft made a proper tempdb configuration easy to do via the GUI during installation.

Today I want to go through the settings offered and will measure performance of different setups.

What are the options?

During installation you can set the following values:
For TempDB datafiles:

Setting | Default value
Number of files | 4
Initial size (MB) | 8
Autogrowth (MB) | 64
Data directories | Default data directory, in my case C:\MSSQL\Instance root\MSSQL13.MSSQLSERVER\MSSQL\Data

For TempDB logfile:

Setting | Default value
Initial size (MB) | 8
Autogrowth (MB) | 64
Log directory | Default data directory, in my case C:\MSSQL\Instance root\MSSQL13.MSSQLSERVER\MSSQL\Data

MSSQL-S01E03-tempdb-setup
Configuration screen during SQL Server 2016 installation (I changed the directories already).

So what is the best setup?

This subject is not new for SQL Server 2016, but the installation procedure now just gives a better opportunity to set it up right the first time.
There are roughly five differentiators in the setup:

  • Initial size (MB)
  • Autogrowth (MB)
  • Number of data files
  • Location of data files
  • Read/write performance of the different storage media

What are the considerations when setting these values?

Initial size (MB)

A larger initial size will accomplish two things:

  1. It will take longer before the file needs to grow.
  2. The file can be stored less fragmented on disk.

If you have enough disk space, the larger the better; there is no real downside to making the initial size quite large. More about this in my conclusion.

Autogrowth (MB)

With autogrowth the choice is a bit more delicate, as both a small and a large value have pros and cons, as pointed out in the table below.

Autogrowth value | Pros | Cons
Small | More frequent, smaller growth actions cause less noticeable hiccups | TempDB data files will become more fragmented
Large | TempDB data files will become less fragmented | Less frequent, larger growth actions might cause noticeable hiccups

Number of data files

In general, the recommendation is one data file per (logical) processor. You can configure more files than the number of (logical) processors, but do this only if you notice that this gives a better performance.
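
As a starting point you can check how many logical processors the instance sees:

-- Number of logical processors visible to SQL Server:
SELECT [cpu_count] FROM sys.dm_os_sys_info;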

Location of data files

Locating the data files on different drives can have two effects:

  1. A shorter queue length per disk drive in Windows (all scenarios, also with virtual disks residing on the same physical drive).
  2. Physical writes to the disk drive can go in parallel (only with different physical drives).

The tests

The testcases

Creating combinations of all differentiators would just create *a lot* of test scenarios.
So I will be a bit pragmatic.
The testcases are chosen so that each one can be compared with at least one other testcase from which exactly one differentiator is different.

Test # | # Data files | Init size (MB) | Autogrowth (MB) | # Folders | Scenario
0 | 4 | 8 (x4) | 64 | 1 | Default installation (data files on OS disk)
1 | 4 | 8 (x4) | 64 | 1 | Data files on dedicated drive
2 | 4 | 8 (x4) | 64 | 4 | Data files on more drives
3 | 4 | 2048 (x4) | 64 | 4 | Data files on more drives + Larger initial size
4 | 4 | 8 (x4) | 256 | 4 | Data files on more drives + Larger autogrowth
5 | 4 | 2048 (x4) | 64 | 1 | Data files on dedicated drive + Larger initial size
6 | 4 | 2048 (x4) | 64 | 1 | Data files on dedicated drive + Larger initial size + TempDB on different SSD
7 | 8 | 2048 (x8) | 64 | 1 | Data files on dedicated drive + Larger initial size + More data files
8 | 4 | 8 (x4) | 64 | 1 | Default installation (data files on OS disk) + Larger initial size

Test results

I will present you two graphs as well as the raw data. I think especially the first graph gives some useful insights, as it compares all testcases with testcase 0 (the default installation), by using index 100 for testcase 0. If a different testcase performs better, its index will be below 100. Also, in this way, the average result of both testsets (A and B) can be shown.
I left testcase 6 out of the graph; later I will tell you why.
MSSQL-S01E03-testresult-index
Index-based comparison of test results

MSSQL-S01E03-testresult-seconds
Execution times of testcases for testsets A and B

Test results figures

Test # | Scenario | Testset A (ms) | Testset B (ms)
0 | Default installation | 30359 | 232795
1 | Data files on dedicated drive | 28344 | 261467
2 | Data files on more drives | 22514 | 248484
3 | Data files on more drives + Larger initial size | 28040 | 210223
4 | Data files on more drives + Larger autogrowth | 24173 | 200632
5 | Data files on dedicated drive + Larger initial size | 19608 | 192587
6 | Data files on dedicated drive + Larger initial size + TempDB on different SSD | 123289 | 387646
7 | Data files on dedicated drive + Larger initial size + More data files | 19692 | 211501
8 | Default installation + Larger initial size | 20149 | 361844

Conclusion

Data files on dedicated drive + Larger initial size (test 5) gives the best performance for both testsets. Putting tempdb files on multiple disks cannot beat this, although the differences are relatively small.

Surprisingly, performance decreases dramatically when the virtual disk on which tempdb is stored is moved to a different (faster) SSD (this is test 6)! I have no explanation for this. Therefore I left test 6 out of the analysis; this way the physical storage is the same for all testcases, and changes in performance are only caused by the other differentiators.

My most important conclusion is that the default values offered during installation are quite okay, except for two things:

The Initial size (MB)

You should really set the initial size to a big number (gigabytes if possible). Note that if you leave the number of files at 4 (which I recommend), the space used is four times the size that you use for Initial size.
If you have a dedicated disk for tempdb (which is recommended), you might want to set the initial size of the data file(s) to fill up the disk. You should then disable Autogrowth.
On a production server, also check the size of the tempdb files after a week of uptime (during which the SQL instance has not been restarted). Say the tempdb size of all files together (this can also be one file) is 22GB. You could then set the initial size to 28GB in total, so 7GB per file if you configured 4 files. This means that during normal operation, tempdb never has to grow. A query to check the current file sizes follows below.
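
A simple way to check the current file sizes ([size] is stored in 8 KB pages):

SELECT [name], [physical_name], [size] * 8 / 1024 AS [size_mb]
FROM [tempdb].sys.database_files;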

Dedicated disk

It is a common best practice to put tempdb on a separate disk, so that the Windows disk queue does not have to compete with other read/write actions to that disk.

Other settings are somewhat less important but … think for yourself (my disclaimer)

In my opinion, both putting tempdb files on multiple different disks and setting Autogrowth to a different value do not offer much performance benefit.
The Autogrowth value is even irrelevant if the initial size is so large that tempdb never has to grow during normal operation.
But every server is different! Test and think for yourself.
My test results are presented “AS IS” in the hope they will be useful as a starting point for further analysis of your own. I am not responsible for any disadvantage you might have from using my test results for your tempdb setup.

What if SQL Server is already installed and/or it is an older version?

No worries, you can still change the tempdb configuration.
You can move and add files using SQL scripts (a sketch follows below).
Initial size and Autogrowth can also be set with Management Studio.
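
A minimal sketch of such a script; the logical file names (like tempdev) and the drive T: are assumptions, so check tempdb.sys.database_files for the real names on your server.

-- Set a larger initial size and autogrowth for an existing tempdb data file:
ALTER DATABASE [tempdb] MODIFY FILE (NAME = N'tempdev', SIZE = 2048MB, FILEGROWTH = 256MB);

-- Move a file to a dedicated drive (takes effect after an instance restart;
-- delete the file in the old location afterwards, see the note below):
ALTER DATABASE [tempdb] MODIFY FILE (NAME = N'tempdev', FILENAME = N'T:\TempDB\tempdb.mdf');

-- Add an extra data file:
ALTER DATABASE [tempdb] ADD FILE (NAME = N'temp5', FILENAME = N'T:\TempDB\tempdb5.ndf',
    SIZE = 2048MB, FILEGROWTH = 256MB);
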
MSSQL-S01E03-166
Go to properties of tempdb

MSSQL-S01E03-167
Here you can change Initial size and Autogrowth

By the way, I noticed that SQL Server does not delete moved tempdb files. So if you move tempdb files, you should delete the files in the old location, so that they do not use disk space and do not cause confusion for administrators.

Test justification

  • All tests are done on a Hyper V Virtual Machine running Windows Server 2012 R2 and SQL Server 2016 Developer Edition.
  • Change of physical storage type is done by moving vhdx files to different physical disks, and then change the file location in the settings of the Hyper V Virtual Machine (while it is not running, of course).
  • Each test is done on a fresh initial copy of the Virtual Machine, so the starting situation is equal for every test (e.g. no tempdb growth caused by previous tests).
  • Tests were done with two sets of scripts (set A and B) that inserted, updated and deleted rows in temporary tables in tempdb. Set A consists of three sql scripts, and set B of five.
  • All scripts were developed so that all tempdb inserts started at the same time (with maybe only milliseconds difference in start time). This ensured that every test run was executed in the same way.
  • All scripts were executed in SQL Server Management Studio.
  • The first file of tempdb is stored on its own (virtual) disk except for testcase 0, where tempdb files are on the C-drive where Windows is installed.
  • When spreading files over multiple virtual disks, the disks created for database files, log files and backup files are used. This was done for pragmatic reasons during the test. It was defensible, because no other activity on databases and log files was going on during the test. It is however not recommended to spread tempdb like that in a production environment, unless every disk is dedicated to be used for tempdb only.
  • The host machine had the same applications open during every test, being a Windows Explorer and Microsoft Excel. No other applications that used CPU and could influence performance of the virtual machines were running.
  • The tempdb log file was placed on a separate disk (the same one) for each test. Therefore it was not a subject of performance investigation; including it would have at least doubled the number of testcases.

Wrap up

Microsoft did a good job in improving the setup options for tempdb in the installation procedure.
It is a bit of a pity that they did not go the last mile by giving the tempdb data files a more realistic initial size.
Luckily this is easy to change during installation, and it is a quick win for anyone installing SQL Server 2016.

Download all the used scripts here.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Zip sourcefiles in a SSIS Package (SSIS Series)

Introduction

As I am in a (un)zipping mood, this week the opposite problem of last week.
The example is not as ready-to-use as the one in my previous post, as it depends on choices made in the development process of your other SSIS packages.
Still I hope you can adjust this example to match your needs.

Problem

If you have imported and archived (csv or xml) files, these files can take up a lot of disk space. The archive folder can also become overwhelmingly full with tens of thousands of files, which can make it slow to access and difficult to manage.

Solution

To tackle this problem I have made a fully configurable SSIS package, that can be executed from a different package with an Execute Package Task.
The package is made using SSIS 2014.

The zipping itself is done with 7za.exe, for more information on how to get and install this free program see my previous blog post.

Scenario

When I used this concept, a different SSIS package moved files to a subfolder of an archive folder after they were processed, and their names were prefixed with the LoadCycleID with leading zeros, for instance 001345_customers.xml.

The archive folder and subfolder were separated, because the archive folder was an environment variable and the same for all packages, while the subfolder was different for different SSIS packages, and contained the name of the source system of the files.
If you only have one archive folder, just change the example to use that instead of the concatenation of archive folder and subfolder.

Overview

Here is a picture of the end result:

SSIS-S01E06-124
The Control Flow

Detailed steps

Parameters and variables

First you need to add a few parameters and variables to the package.

Parameters:

Name Data type Required Description
_7zaPathAndFilename String False The full path and filename of 7za.exe.
ArchiveFolder String True The root folder for archiving processed files.
ArchiveSubFolder String True The subfolder of the ArchiveFolder in which the files to be zipped exist, the subfolder can for instance be named after a source system like SAP or CRM.
LoadCycleID Integer True The ID of the LoadCycle that is processing these files. In your situation it could also be called ExecutionID, SessionID and so on, the name is not really important.

SSIS-S01E06-125
Package parameters

Variables:

Although not all variables might be strictly necessary, I like the concept of being able to configure all values in one place.
The data type of all variables is String, except LoadCycleID which is an Integer.
Every variable uses an expression to set its value.
Here is a detailed list of all variables and the expression used for each one.

Name Expression
ArchiveFilesToDeleteInFolder @[$Package::ArchiveFolder] + "\\" + @[$Package::ArchiveSubFolder]
ArchiveFilesToDeletePattern @[User::LoadCycleIDString] + "*"
ArchiveFilesToZipPattern @[$Package::ArchiveFolder] + "\\" + @[$Package::ArchiveSubFolder] + "\\" + @[User::LoadCycleIDString] + "*.*"
ArchiveFilesWorkingDirectory @[$Package::ArchiveFolder]
ArchiveZipFile @[User::LoadCycleIDString] + "_" + @[$Package::ArchiveSubFolder] + ".zip"
ArchiveZipPathFile_Final @[$Package::ArchiveFolder] + "\\" + @[$Package::ArchiveSubFolder] + "\\" + @[User::LoadCycleIDString] + "_" + @[$Package::ArchiveSubFolder] + ".zip"
ArchiveZipPathFile_Temp @[$Package::ArchiveFolder] + "\\" + @[User::LoadCycleIDString] + "_" + @[$Package::ArchiveSubFolder] + ".zip"
LoadCycleID @[$Package::LoadCycleID]
LoadCycleIDString RIGHT("00000" + (DT_STR, 10, 1252) @[$Package::LoadCycleID], 6)

Important note: if you expect the value of LoadCycleID to grow beyond 999999 over time, you should add more prefixed zeros to LoadCycleIDString and increase the number 6, the second parameter of the RIGHT function.
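
For example, with a hypothetical LoadCycleID of 1346, the expression for LoadCycleIDString evaluates as follows:

RIGHT("00000" + (DT_STR, 10, 1252) 1346, 6)
= RIGHT("000001346", 6)
= "001346"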

SSIS-S01E06-126
SSIS variables of the package

The Control Flow

The control flow is quite straightforward: only three tasks and we’re done.

EPR Zip files in archive subfolder

This task creates one zipfile containing multiple other files in the ArchiveSubFolder.
From the SSIS Toolbox drag an Execute Process Task to the Control Flow panel and open the Execute Process Task Editor.
On the General and Process panes you do not have to change anything.

SSIS-S01E06-128
SSIS-S01E06-129
No changes needed here

On the Expressions pane, press the ellipsis next to Expressions. In the Property Expressions Editor add the following:

Property Expression
Arguments "a "+ "\"" + @[User::ArchiveZipFile] + "\" " + "\"" + @[User::ArchiveFilesToZipPattern] + "\" -mx=9"
Executable @[$Package::_7zaPathAndFilename]
WorkingDirectory @[User::ArchiveFilesWorkingDirectory]

Please note that in the Arguments expression -mx=9 is used. This is the compression level, level 9 means “ultra” compression. This level gives the smallest zip files but is also the slowest compression level. You can for instance also use 5 for “normal” or 3 for “fast” compression, if this is what you want.
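
To make this concrete: with the hypothetical values ArchiveFolder = C:\Archive, ArchiveSubFolder = CRM and LoadCycleIDString = 001346, the Execute Process Task would effectively run a command line like this (a sketch, not literal SSIS output):

7za.exe a "001346_CRM.zip" "C:\Archive\CRM\001346*.*" -mx=9

Because the WorkingDirectory is set to the ArchiveFolder, the zip file 001346_CRM.zip is created in C:\Archive, not in the subfolder.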

SSIS-S01E06-127
Expressions for the Execute Process Task

SCR Delete uncompressed files

This task deletes the files that were added to the zip file in the previous task.

In the Script Task Editor, add both variables User::ArchiveFilesToDeleteInFolder and User::ArchiveFilesToDeletePattern to ReadOnlyVariables, then press Edit Script ….

Then in the namespaces region add one namespace, as indicated below.

Namespaces region
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
// Added namespace:
using System.IO;
#endregion

Adjust public void Main() as follows:

public void Main

public void Main()
{
  // Copy SSIS variable values to C# string variables.
  string archiveFilesToDeleteInFolder = Dts.Variables["User::ArchiveFilesToDeleteInFolder"].Value.ToString();
  string archiveFilesToDeletePattern = Dts.Variables["User::ArchiveFilesToDeletePattern"].Value.ToString();

  // Delete each file that just has been added to the zip archive.
  foreach (string f in Directory.EnumerateFiles(archiveFilesToDeleteInFolder, archiveFilesToDeletePattern))
  {
    File.Delete(f);
  }

  Dts.TaskResult = (int)ScriptResults.Success;
}
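
To see what this deletes, take the same hypothetical values as before: folder C:\Archive\CRM and pattern 001346*. The loop then touches only the files of this load cycle:

C:\Archive\CRM\001346_customers.xml -> deleted
C:\Archive\CRM\001346_orders.xml -> deleted
C:\Archive\CRM\001347_customers.xml -> kept (different LoadCycleID)

The zip file itself is safe at this point, because it was created in the working directory (the ArchiveFolder) and not in the ArchiveSubFolder.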


WARNING: Make sure that no files that meet the file name pattern of files to add to the zip file are added to the ArchiveSubFolder while or after the zipfile is (being) created. Otherwise files that are NOT zipped might get deleted!

Download the script here.

FST Move zip file

The File System Task moves the zip file to the ArchiveSubFolder, because the zipfile is initially created in a temporary folder and not in the ArchiveSubFolder.
This prevents it from being deleted by accident, if you use a value for ArchiveFilesToDeletePattern that ends with .*
Configure the File System Task as shown in the picture below:

SSIS-S01E06-130
File System Task Configuration


Let the car drive

Time to test the package and show the result in pictures (if you don’t mind).

SSIS-S01E06-131
Before: files of LoadCycleID 001346 need to be zipped ..

SSIS-S01E06-132
Execute the package ..

SSIS-S01E06-133
All csv files replaced with one zip file ..

SSIS-S01E06-134
.. that contains the csv files ..

Conclusion / Wrap up

In this blog post you could read the steps needed to build an SSIS package that creates a zip file containing a number of files from an archive (sub)folder.
This makes the archive smaller and easier to manage.
However, beware that you only delete files that were actually added to the zip file. For more details see the warning above.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Unzip specific files from lots of zipped archives (SSIS Series)

Introduction

This week I want to discuss the trouble you may have when searching hundreds of zip files for specific files inside them.
Recognizable?
At least I faced this problem some time ago, and solved it using SSIS and the command-line version of 7-Zip, named 7za.exe.
The choice for SSIS was purely based on convenience; you could also tackle this problem using PowerShell, C# or almost any other scripting language.
But I thought SSIS was just a neat and quick way to do this.

Problem

Finding specific files in huge amounts of zip files.

Solution

To tackle this problem I have made a fully configurable SSIS package, that can be executed with different parameters.
The main task of the package is to extract specific files (based on a file name pattern) from zipfiles in a folder and – optionally – subfolders.
The unzipping itself is done with 7za.exe, the command-line version of 7-Zip, the great open source zip program that has existed for several years and is used by millions of users.

By the way, the package is made using SSIS 2014.

Preparations: download and install 7za

It is important to download this package from the official 7-Zip web site, to avoid getting a copy from some deceitful web site, infected with malware.
To go to the official 7-zip download page click on this link:
7-zip download page
Most important is the 7-Zip Extra: standalone console version.
For the example I have used version 16.02 (2016-05-21) for Windows, but it should also work with version 9.20 (2010-11-18), then called 7-Zip Command Line Version.

If you download the newest version, you also need software to unzip the 7z160-extra.7z file, for instance the regular 32-bit or 64-bit version of 7-zip, which can also be downloaded on the same page.

SSIS-S01E05-100
Download page of 7-zip

I assume you will use the newest version, 7z160-extra.7z.
Once unzipped, just take the time to read License.txt, readme.txt and history.txt.
Then copy the three files 7zxa.dll, 7za.dll and 7za.exe to a folder on the Windows computer where SSIS runs, usually a Windows Server.
If you run an x64 version of Windows, you can take the x64 version of the 7za files, which is in the subfolder “x64” of the unzipped files.

SSIS-S01E05-101
Unzipped 7za files ..

I have created a subfolder “7za” in the “C:\Program files” folder of my virtual Windows Server, and copied the x64-version of the three mentioned files to it, but you can put them in any folder you like.
When you have done this, the real fun can start ..

Overview

Here is a picture of the end result:

SSIS-S01E05-102
The Control Flow

Detailed steps

Parameters and variables

First you need to add a few parameters and variables to the package.

Parameters:

Name Data type Required Description
_7zaPathAndFilename String False The full path and filename of 7za.exe.
FilenamePattern String True The filename pattern of files you want to unzip.
SourceRootFolder String True The main folder where you want to search for zip files.
TraverseSubfolders Boolean False Indication if you want to look only in SourceRootFolder, or also in its subfolders.
UnzipRootFolder String True The folder to which files are unzipped. Important: this may NOT be a subfolder of SourceRootFolder.
ZipFilenamePattern String True The filename pattern of the zipfiles from which you want to unzip files.

SSIS-S01E05-103
Package parameters

Variables:

Name Data type Description
FLC_UnzipFolder String The subfolder where the files will be unzipped. This is a subfolder of the UnzipRootFolder, with the same relative subpath as the zipfile has in SourceRootFolder. Additionally, the zipfilename has been replaced with a subfolder that is the same as the zipfilename without the file extension.
FLC_ZipPathAndFileName String The full path of the zipfile that is investigated to see if any files matching FilenamePattern are zipped inside.

SSIS-S01E05-104
SSIS variables needed to do the job

Developing the package

FELC Unzip files

This foreach loop container finds all zip files.
In the Foreach Loop Editor, select Collection in the left pane.
As we want to make the solution generic by using all those parameters mentioned above, we have to add a few expressions.

SSIS-S01E05-105
Foreach Loop container Collection configuration

Add the following three expressions (in my opinion the editor is not very user-friendly, the window is too small, but it can be done):

Property Expression
Directory @[$Package::SourceRootFolder]
FileSpec @[$Package::ZipFilenamePattern]
Recurse @[$Package::TraverseSubfolders]

Please note that these expressions will set the values for Folder, Files and the Traverse subfolders checkbox in the Enumerator configuration part of the screen (inside the purple square in the picture above).
If you close and reopen the Foreach Loop Editor, you will notice that the configuration fields now use the values from the corresponding parameters, except the value Fully qualified for Retrieve file name, which is hardcoded.

SSIS-S01E05-106
Example of how to set an expression..

Then, still in the Foreach Loop Editor, select Variable Mappings in the left pane and map variable User::FLC_ZipPathAndFileName to Index 0, as shown in the picture below.

SSIS-S01E05-107
Variable Mappings

EXPR Set Unzip folder

The next nifty thing that needs to be done is creating a folder to store the unzipped files. As we do not know if zipped files with the same name exist in multiple zip files, the safest thing to do is to create a directory structure identical to the directory structure of the zip files.
Additionally, the basename of the zip file (so without file extension) is used as an extra subfolder. In this way we are sure that unzipped files with the same name can be stored without overwriting other ones.

To do this, from the SSIS Toolbox drag an Expression Task inside the Foreach Loop container.
Open the Expression Task and paste this expression into the Expression textbox.

@[User::FLC_UnzipFolder] =
REPLACE(LEFT(@[User::FLC_ZipPathAndFileName],
FINDSTRING(@[User::FLC_ZipPathAndFileName], "."
+ TOKEN(@[User::FLC_ZipPathAndFileName], ".",
TOKENCOUNT(@[User::FLC_ZipPathAndFileName], ".")),1) -1),
@[$Package::SourceRootFolder], @[$Package::UnzipRootFolder])

Then press the Evaluate Expression button to check that the expression is valid.
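
To illustrate with hypothetical values: suppose SourceRootFolder is C:\Archive, UnzipRootFolder is C:\Unzipped and the Foreach Loop has just found C:\Archive\CRM\Backups_201605.zip. The expression strips the file extension and swaps the root folders:

FLC_ZipPathAndFileName : C:\Archive\CRM\Backups_201605.zip
FLC_UnzipFolder : C:\Unzipped\CRM\Backups_201605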

SSIS-S01E05-108
Configuring the Expression Task

FSYS Create Unzip folder

In the previous step we have composed the full path to unzip files into, but this folder structure might not exist yet.
Therefore we use a File System Task named FSYS Create Unzip folder to create the folder structure.
The good news is that the File System Task can create multiple (sub)folders at once.
Configure it as shown in the picture.
SSIS-S01E05-109
Configuration of FSYS Create Unzip folder

EPR Unzip Selected Files

Below you see a few screenshots for the configuration of the Execute Process Task named EPR Unzip Selected Files.
The funny thing is that in the Process pane you do not have to change anything. Executable and Arguments are set using expressions (coming next) and all the other properties can keep their default values.

SSIS-S01E05-110
Process configuration.

SSIS-S01E05-111
Setting Executable and Arguments using expressions
Arguments (copy-friendly):

"-y x \"" + @[User::FLC_ZipPathAndFileName] + "\" -o" + @[User::FLC_UnzipFolder] + " " + @[$Package::FilenamePattern] + " -r"

SCR Remove Empty folders

There is one problem with the steps done so far: if a zipfile does not contain a file that matches the FilenamePattern, the folder structure is created for nothing.
You could just leave these empty folders, but I think the solution is a bit better if they are deleted.
To do this, from the SSIS Toolbox drag a Script Task inside the Foreach Loop container.

SSIS-S01E05-112
In the Script Task Editor, add both variables User::FLC_UnzipFolder and $Package::UnzipRootFolder to ReadOnlyVariables

In the Script Task Editor window, press Edit Script ….

Please note that you can download the C# script here..

Then in the namespaces region add two namespaces, as indicated below.

Namespaces region
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
// Added namespaces:
using System.IO;
using System.Linq;
#endregion

Adjust public void Main() as follows:

public void Main

public void Main()
{
  // Copy SSIS variable value of FLC_UnzipFolder to C# string variable.
  string unzipFolder = Dts.Variables["User::FLC_UnzipFolder"].Value.ToString();

  // Source: http://stackoverflow.com/questions/22520761/delete-all-empty-directories-in-specified-path
  // My credits go to author "Jon" from Berlin.
  // Because not all zipfiles might contain files with the FilenamePattern we are looking for, an empty path might have been created.
  // This script deletes these empty folders, so that in the end only folders will exist with unzipped files or subfolders in it.
  var di = new DirectoryInfo(unzipFolder);
  var root = Dts.Variables["$Package::UnzipRootFolder"].Value.ToString(); // no trailing slash!
  while (di.FullName != root
          && !di.EnumerateFiles().Any()
          && !di.EnumerateDirectories().Any())
  {
    di.Delete();
    di = di.Parent;
  }

  Dts.TaskResult = (int)ScriptResults.Success;
}
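
As an illustration with the hypothetical values used earlier: if unzipFolder is C:\Unzipped\CRM\Backups_201605 and no files were extracted into it, the loop deletes Backups_201605, then CRM (if that became empty as well), and stops at the UnzipRootFolder C:\Unzipped, which is never deleted.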

Download the script here.

The proof of the pudding is in the eating

Time to do some testing.
I have created a few folders with subfolders and subsubfolders. You can see the folder structure in the last picture below.
As you might have noticed, the default values for my parameters already prepared for the test case: I want to unzip only customer files from zip files with 201605 in the name.

SSIS-S01E05-113
Executing the package in Debug mode..

Below you can see the result. Only customer files are unzipped. The subfolders under the Archive folder that contain no customer zip files (but only currency files), do not exist under the Unzipped folder.

SSIS-S01E05-114
Folder structure of Archive and Unzipped rootfolders..

SSIS-S01E05-115
Unzipped “Customer” files ..

Important notes for using with huge amounts of zip files

From my experience I must also warn you: if you run this package on a folder containing gigabytes of zipfiles, there are two things to be aware of:

  1. The execution of the package can potentially take a long time, several hours is not unusual.
  2. Make sure you have enough disk space. Suppose you have the following hypothetical scenario: zip files: 100GB, average zip compression rate: 20%, percentage of files to be unzipped: 30%. In this case you are going to unzip files that have a zipped size of 30% of 100GB = 30GB. Once unzipped, their size will be 30GB * (1 / 0.20) = 150GB. You need to have that disk space available on the disk where the Unzipped folder resides.

Conclusion / Wrap up

With this SSIS package you can selectively unzip files from archived zip files.
When using it, make sure you have enough disk space, and be aware that the package can run for several hours, depending on the number and (average) size of the zip files to process.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

11 Things you should know about Biml – an overview (Datawarehouse Series)

Intended audience and purpose of this post

This post gives a high-level overview of a number of products and features related to Biml.
It can thereby serve as a starting point for further learning and investigation.

Introduction


Biml means Business Intelligence Markup Language.
What else do you need to know about it, if you are a professional worker in the business intelligence and data warehouse domain?

SSIS-S01E05-biml-logo

11 Things you should know about Biml – an overview

  1. Who invented Biml?
  2. What is Biml?
  3. What are BimlExpress and BIDS Helper?
  4. What is BimlScript?
  5. What is BimlFlex?
  6. What are Mist and BimlStudio?
  7. What are BimlBundles?
  8. What are BimlTransformers?
  9. Is the free part of Biml a way to get me addicted to it?
  10. How do I know that Biml is a proven technology that is going to stay?
  11. Where can I find more information on the internet?

1. Who invented Biml?

Biml and all the tools around it are created by Varigence, a company whose CEO, Scott Currie, is a former Microsoft employee.

Back to top

2. What is Biml?

Biml – the markup language – is an XML dialect to describe Microsoft SSIS Packages and SSAS Cubes.
Everything you can do with SSIS, you can do with Biml (a quote from Scott Currie, made during the Biml usergroup NL kickoff on May 7, 2016).
The goal of Biml is to improve the productivity of data warehouse development.
But the Biml ‘ecosystem’ consists of more than only the markup language. Other components are:

  • Biml Script
  • BimlExpress and BIDS Helper
  • BimlFlex
  • BimlStudio (also/formerly known as Mist)
  • BimlBundles
  • BimlTransformers

Back to top

3. What are BimlExpress and BIDS Helper?

BimlExpress and BIDS Helper are both add-ins for Business Intelligence Development Studio (actually the Visual Studio version with the project types needed for Business Intelligence).
BimlExpress is the successor of BIDS Helper, and one of its biggest advantages is syntax highlighting and IntelliSense.
It is really worth checking them out, because they can improve your productivity, especially when you use metadata and design patterns for SSIS packages.
You can download and install BimlExpress. You have to register but can use the product at no cost.

Back to top

4. What is BimlScript?

BimlScript is what makes Biml (the markup language) even more interesting. With BimlScript you can write .NET code (C#) and do all kinds of interesting stuff, like accessing your metadata, using foreach loops, and generating SSIS packages using a design pattern.
I have generated packages to load dimensions of a star schema, and I intend to blog about it later this year in more detail.

Back to top

5. What is BimlFlex?

BimlFlex is a data warehouse automation framework from Varigence. It can greatly improve productivity when building data warehouses.
It uses predefined patterns for data warehouse automation, plus Extension Points that you can use to change the standard automation patterns at certain ‘change points’, or to build an exception to the pattern for that one very special table.
BimlFlex is part of BimlStudio and therefore a commercial (paid) product.

Back to top

6. What are Mist and BimlStudio?

Mist and BimlStudio are the same tool. BimlStudio is just the new name, chosen because of the strange associations the “Mist” name has in other languages, for example in German (What does the German word “Mist” mean?).
Also, the names of all products and techniques associated with Biml now start with Biml, which makes them more recognizable.
BimlStudio has a number of extra features when compared with BimlExpress.
BimlStudio is a paid product; you can buy a perpetual license or a monthly subscription.

SSIS-S01E05-bimlstudio-logo

Back to top

7. What are BimlBundles?

BimlBundles are part of BimlStudio and protect your source code, while your customer can still make changes at framework extension points.
Koos van Strien did a very good job at explaining BimlBundles in more detail here.

Back to top

8. What are BimlTransformers?

With BimlTransformers you can define and store changes on Biml code, instead of changing the Biml code itself. If you have worked with Visual Studio/.NET before, you can compare it with the changes you make to your config file for different environments.
Koos van Strien did a very good job at explaining BimlTransformers in more detail here.

Back to top

9. Is the free part of Biml a way to get me addicted to it?

Nope. Although Varigence's sales model is a freemium model (a free product that can be extended/enhanced with paid products), the promise of Scott Currie, made during the Biml usergroup NL kickoff on May 7, 2016, is that all features that are released for free will stay free. You can count on that.

Back to top

10. How do I know that Biml is a proven technology that is going to stay?

Biml has already existed for about 8 years and has proven itself. Varigence is a global player and a steady company that will not suddenly disappear. I am not concerned whether Biml will still exist 10 years from now, and you should not have to be either. Even in the worst case, you would still have the SSIS packages that Biml generated for you, and you could edit and maintain them without Biml.

Back to top

11. Where can I find more information on the internet?

To read more about Biml, here are a few interesting web sites, blog posts and LinkedIn user groups:

Back to top

Conclusion / Wrap up

Biml is here to stay and adds a lot of value, even if you use only the free parts of it. Varigence can only be praised for that.
In this blog post I briefly described a number of products and techniques that are part of the Biml ‘ecosystem’.
I do not have a commercial interest in promoting Biml; I am just a Business Intelligence consultant who likes Biml a lot.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Adding custom header and footer to csv files (SSIS Series)

Introduction

This article wasn’t planned. SSIS again? No, I wanted to write a post about something else.
Nevertheless I got this question on one of my previous posts and thought I might as well create a blog post about it.

Problem

“Hi Hans,
I have created my ssis package for flat file destination. My problem is that I want to add header and footer in my flat file destination to send it to FTP location format will be

0|MMDDYYYY|HHMMSS
1|Row1
1|Row2
1|Row3
1|ROW4
9|6

The last line contains no of row count including header and trailer
Please let me know how to do that.. I am completely stuck with that and no good luck.. “

So to summarize, both the header and the footer line of the csv file need to contain dynamic content.
Also each row starts with a ‘row type’: 0 for the header, 1 for every detail row, and 9 for the footer.

Solution

As an example I just use a simple Data Flow Task with an OLEDB Source connected to table DimCurrency from the AdventureWorksDW database.
The data then flows to a Script Component used as a Destination.
Here are a few screenshots to demonstrate what I mean. The Script Component will create the file.
The package is made using SSIS 2014.

Developing the package

SSIS-S01E04-099
The demo package just contains one Data Flow Task

SSIS-S01E04-205
Design of the Data Flow Task

SSIS-S01E04-101
OLEDB Source: DimCurrency from the AdventureWorksDW database

SSIS-S01E04-102
OLEDB Source: select all three columns

SSIS-S01E04-208
When you drag the Script Component to the Data Flow, you will get this dialog. Use the Script Component as a Destination

SSIS-S01E04-207
Variables that the package contains.

SSIS-S01E04-209
In the Script Transformation Editor, add both variables User::HeaderTimestamp and User::TargetCsvPathAndFileName to ReadOnlyVariables

Now comes the most interesting part! We use the endless possibilities of C# to fix our problem.
In the Script Transformation Editor window, press Edit Script ….

Please note that you can download the entire script here, so you do not have to copy the (sometimes malformatted) code from this webpage..

Then in the namespaces region add two namespaces, as indicated below.

Namespaces region
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
// Added namespaces:
using System.IO;
using System.Text;
#endregion

Above public override void PreExecute() add the copyright notice and the following two regions: Configuration and Private variables.
Please note that these regions do not exist yet, so you have to copy the code including the #region and #endregion lines.

New regions
  /*
  (c) Copyright 2016 - hansmichiels.com
 
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.
 
  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.
 
  You should have received a copy of the GNU General Public License
  along with this program. If not, see http://www.gnu.org/licenses/.
  */

 
  #region ***** Configuration *****

  // Here you can change the column delimiter and the text qualifier easily.
  private const string _delimiter = "|";
  private const string _textQualifier = "";
  // To set the text qualifier to " use:
  // private const string _textQualifier = "\""; // I recommend using a text qualifier.

  // Every x rows, write the buffer to the file, and clear the buffer.
  // This is faster than writing every individual line to the file.
  // The higher the value of _bufferRows, the more memory is used and the fewer write actions are done to the csv file.
  // A higher value is a good idea if you have enough memory in your server.
  private const int _bufferRows = 100000;

  #endregion ***** Configuration *****

  #region Private variables

  // Variable to hold the path and filename
  private string _fileName;
  // Variable to count the rows
  private int _rowCount;
  // StringBuilder to hold a buffer of text that still needs to be written to the csv file.
  private StringBuilder _fileContentsBuffer;

  #endregion Private variables

PreExecute() is called before the data starts to flow through. Here we do some preparation and compose the headerline of the file.

PreExecute()

  /// <summary>
  /// This method is called once, before rows begin to be processed in the data flow.
  /// </summary>
  public override void PreExecute()
  {
    base.PreExecute();

    // Initialize variables
    _rowCount = 0;
    _fileContentsBuffer = new StringBuilder();
    _fileName = Variables.TargetCsvPathAndFileName;
    DateTime headerTimestamp = Variables.HeaderTimestamp;

    // Append the header line to the buffer, using the MMDDYYYY and HHMMSS formats that were asked for.
    // It will be written to the file together with the first detail rows.
    // If there are no detail rows, no file will be created.
    _fileContentsBuffer.AppendFormat("{1}0{1}{0}{1}{2:MMddyyyy}{1}{0}{1}{2:HHmmss}{1}\r\n", _delimiter, _textQualifier, headerTimestamp);
  }
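
With a hypothetical HeaderTimestamp of September 25, 2016 14:30:00 and the default configuration (pipe delimiter, no text qualifier), this produces the header line:

0|09252016|143000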

PostExecute() is called after the data has flowed through. Here we append the last lines from the _fileContentsBuffer variable and the footer line to the file.

PostExecute()
  /// <summary>
  /// This method is called after all the rows have passed through this component.
  /// </summary>
  public override void PostExecute()
  {
    base.PostExecute();
    // If no detail rows, bail out. No file will be created when there are no detail rows.
    if (_rowCount == 0) return;

    using (StreamWriter sw = File.AppendText(_fileName))
    {
      // Append a footer line with the rowcount, example value: 9|105
      // The rowcount is a count of the detail rows, but the footer count
      // needs to include header and footer, so we add 2 to the value of _rowCount.
      _fileContentsBuffer.AppendFormat("{1}9{1}{0}{1}{2}{1}", _delimiter, _textQualifier, _rowCount + 2);
      sw.Write(_fileContentsBuffer.ToString());
      _fileContentsBuffer.Clear();
    }
  }
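
With the default configuration and the 105 detail rows of the example, _rowCount + 2 evaluates to 107, so the footer line written to the file is:

9|107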

Input0_ProcessInputRow(Input0Buffer Row) is the place where all the data processing takes place. As this script is the Destination for the data, we need to write the data to the file ourselves: there is no Flat File Destination to do it for us.
So here we append every detail row to the _fileContentsBuffer variable. Once in a while, we write the buffer to the file, and clear the buffer again.

Input0_ProcessInputRow

  /// <summary>
  /// This method is called once for every row that passes through the component from Input0.
  /// </summary>
  /// <param name=”Row”>The row that is currently passing through the component</param>
  public override void Input0_ProcessInputRow(Input0Buffer Row)
  {
    // Increase the rowcounter
    _rowCount++;

    // Append the new line to the buffer.
    // The format needs to be like "{1}[middle part]{1}\r\n", so that the row starts and ends with the text qualifier.
    // Replace [middle part] with {1}{0}{1}{x} for every column that you want to append to the file, where x is 2 for column 1, 3 for column 2 and so on.
    _fileContentsBuffer.AppendFormat("{1}1{1}{0}{1}{2}{1}{0}{1}{3}{1}{0}{1}{4}{1}\r\n", _delimiter, _textQualifier,
      SafeValue(Row.CurrencyKey), // First column, {2} in the format
      SafeValue(Row.CurrencyAlternateKey), // Second column, {3} in the format
      SafeValue(Row.CurrencyName) // Third column, {4} in the format
             // etc.
      );
    
    // Every x rows, write the buffer to the file, and clear the buffer.
    if (_rowCount % _bufferRows == 0)
    {
      using (StreamWriter sw = File.AppendText(_fileName))
      {
        sw.Write(_fileContentsBuffer.ToString());
        _fileContentsBuffer.Clear();
      }
    }
  }
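
A detail row produced by this format would, for a hypothetical DimCurrency record, look like this (pipe delimiter, no text qualifier):

1|6|AED|UAE Dirham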

Multiple overloaded versions of the SafeValue method (for different data types like string, int and DateTime; you can add more when needed) add escape characters to the column values when needed.

SafeValue method and overloads
  /// <summary>
  /// Escapes a text qualifier in a column value by doubling it.
  /// </summary>
  /// <param name="value"></param>
  /// <returns></returns>
  private string SafeValue(string value)
  {
    // If _textQualifier is null or empty, just return the value.
    if (string.IsNullOrEmpty(_textQualifier)) return value;

    if (value.Contains(_textQualifier))
    {
      // If the value contains one or more text qualifiers,
      // escape them by replacing them with two text qualifiers.
      return value.Replace(_textQualifier, _textQualifier + _textQualifier);
    }
    return value;
  }

  /// <summary>
  /// Overloaded method for DateTime values (meant as an example).
  /// For other data types, create new overloaded methods,
  /// and in that overloaded method convert the value to string,
  /// before calling the 'SafeValue' method with a string parameter.
  /// </summary>
  /// <param name="value"></param>
  /// <returns></returns>
  private string SafeValue(DateTime value)
  {
    return SafeValue(value.ToString("yyyy-MM-dd HH:mm:ss.fff"));
  }

  /// <summary>
  /// Overloaded method for int values.
  /// </summary>
  /// <param name="value"></param>
  /// <returns></returns>
  private string SafeValue(int value)
  {
    return SafeValue(value.ToString());
  }
}
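
As an example of the escaping behavior, assume the hypothetical configuration _textQualifier = "\"". An embedded text qualifier in a value is then doubled:

Input : size: 19" monitor
Output: size: 19"" monitor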

When the script is ready, choose Rebuild from the BUILD menu in the VstaProjects – Microsoft Visual Studio window, and close that window. Back in the Script Transformation Editor, close it by pressing OK.

Testing the package

I simply run the package in debug mode. 105 currency rows are added to the csv file.
Including the header and footer, the csv file contains 107 rows, and this is the number that is written to the footer line.

SSIS-S01E04-110
Running the package in debug mode

SSIS-S01E04-111
First and last part of the created csv file.

Please note that to change the delimiter and text qualifier you only have to make two changes in the ***** Configuration ***** region.

Configuration
  #region ***** Configuration *****

  // Here you can change the column delimiter and the text qualifier easily.
  private const string _delimiter = ";";
  private const string _textQualifier = "\"";
. . .

  #endregion ***** Configuration *****

By this change the file will look as follows:
SSIS-S01E04-112
First and last part of the created csv file with a different delimiter and text qualifier.

I also did a test with a source query that returned over 1.1 million rows; the csv file of 25 MB was created in just 3 seconds (on a Core i7 laptop with SSD).
So my conclusion is that the performance of this solution is very satisfying.

Download the C# script for the script component here.

Conclusion / Wrap up

In this post I have demonstrated how to use SSIS 2014 to create a csv file with a dynamic header (with a timestamp) and footer line (including a rowcount) using a Script Component as a Destination in a Data Flow Task.

(Promotional)

Big Data & Analytics insight 2016: save the date for an insightful conference
(as a speaker I was asked to do a bit of promotion ..)

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Big Data & Analytics insight 2016: save the date for an insightful conference

Dear reader,

Is the data warehouse at the end of its lifecycle or does the existing infrastructure gain new value from integrating with new Big Data technologies? If so, how do you go about it? Do you need inspiration for your analytics strategy? Then there’s plenty to go by at Big Data & Analytics Insight 2016, on June 7th in Utrecht, The Netherlands.

There will be input from Big Data domain experts and highly interesting presentations like the ones on Data Vault 2.0 & Big Data and Hands-on Predictive Text Analytics that will inspire you. Whether you are deep into the technical or the business aspects of Big Data & Analytics, this conference is a must attend if you want to stay ahead of future evolutions instead of enduring them.

Check the agenda and book your seat, or better, become a ba4all member and get access to one more conference and a hands-on session on Customer Analytics.

I look forward to seeing you at Big Data & Analytics Insight 2016!

Kind regards,

Hans Michiels
(as a speaker I was asked to do a bit of promotion ..)