Implement logging in a Script Task (SSIS Series)

Introduction

This post is a follow-up on two older posts and will be the last one before my summer break.

Today I want to enhance the SSIS package made in post Fixing corrupt csv files in a SSIS Package (SSIS Series) by using the Plug and Play Logging Solution.

Problem

As long as a script runs well, you might not need logging inside. BUT .. when something GOES wrong, having logging in the script can reduce the time needed to troubleshoot the issue.
In the example for fixing csv files, anything can be wrong with a file, causing an error in the script. In that case it is for instance useful to know which file was being processed when the error occurred.

Prerequisite

For this to work, you have to install my Plug and Play Logging Solution, which can be downloaded from this post. It is just one SQL Script you have to run, that creates a [logdb] database with objects inside. You can also add those objects to a different database that already exists.
It is however not mandatory to rebuild the Fixing corrupt csv files in a SSIS Package example, as you can also add logging to your own Script Task.

Solution

This post is a enhancement on the Fixing corrupt csv files in a SSIS Package post, so you might need to read that post first, if you did not do that already.

Therefore I will dive directly into the changes that are needed to add logging in the Script Task.
This will be done with lots of screenprints with some comment in between.

SSIS-S01E07-169Adding a package parameter “LoggingConnectionString”
First you need to add a package parameter LoggingConnectionString that can be used in an expression of the OLE DB Connection and as input variable for the Script Task.

SSIS-S01E07-170OLE DB Connection configuration ..
Then add an OLE DB Connection for the logging database.

SSIS-S01E07-171OLE DB Connection properties ..
Use an expression to let the ConnectionString of the OLE DB Connection be equal to the value of the package parameter LoggingConnectionString.

SSIS-S01E07-173OLE DB Connection – Expression configuration [1]

SSIS-S01E07-172OLE DB Connection – Expression configuration [2]

SSIS-S01E07-174Final result
By the fx icon you can see that the connection manager uses an expression.

SSIS-S01E07-176Enable logging
Now enable logging for the package.

SSIS-S01E07-177Enable logging [2]
Add a log provider for SQL Server and let it use the logdb.oledbConnection by selecting it under the Configuration column header.

SSIS-S01E07-178Enable logging [3]
Then select all events. Filtering on what is actually logged is done by the logging solution (by the value of @MaxMessageClass, see this blog post for more details).

SSIS-S01E07-179Select the Script Task
Select the Script Task and add the following Variables to ReadOnlyVariables:

  • System::ExecutionInstanceGUID
  • System::PackageID
  • $Package::LoggingConnectionString


  • SSIS-S01E07-180The added ReadOnlyVariables in the red rectangles

    Below you will find a number of screenprints of the script task to talk you through the changes.
    You can download the C# script here.

    SSIS-S01E07-181Overview
    First make sure the Namespaces region is as shown.
    Then fold the namespace with the guid in the name, and paste the entire namespace HansMichielsCom.PlugAndPlaySSISLoggingSolution underneath it.
    This code is put in a separate namespace, so that it could also be placed in a .NET assembly that is added to the GAC (Global Assembly Cache). When you would do this, you do not have to add the code to every Script Task.
    For the example of today, we just put this namespace inside the Script Task to make things not too complicated for now.

    SSIS-S01E07-182Using the HansMichielsCom.PlugAndPlaySSISLoggingSolution namespace
    As a result, you have to tell the first guid-like namespace, that you want to call code inside the second namespace. Therefore add the using statement as shown above.

    SSIS-S01E07-183Constant used for logging
    Below you will see some printscreens with changed parts in the script.

    SSIS-S01E07-184Method GetSsisLogWriter to instantiate a SsisLogWriter object

    SSIS-S01E07-187Method Main is extended with logging.

    SSIS-S01E07-188Pass the logWriter as parameter to other methods ..

    SSIS-S01E07-189IMPORTANT: Bugfix in CheckAndReturnHeader!

    SSIS-S01E07-190IMPORTANT: Bugfix in CheckAndReturnHeader!
    (header == null) is added to cope with empty files.

    Testing, testing, one, two ..



    SSIS-S01E07-191Test preparations [1]

    SSIS-S01E07-193Test preparations [2]

    SSIS-S01E07-194Test execution

    SSIS-S01E07-195Test result: logging rows done inside the script are in the log table.

    Conclusion / Wrap up

    In this post I have demonstrated how to implement logging in SSIS Script Tasks using my Plug and Play Logging Solution.
    This type of logging gives more control on what to log and how to log it than when you implement logging using SSIS events.
    The examples given are very basic. You can use your imagination to implement logging of errors using a try .. catch block, or use all available parameters of logWriter.AddLogEntry to change the Retention Class, Message Class, and so on.

    In the summer I will take some time for study, reflection, holiday, and still .. work.
    My next post will be early September at the latest, maybe earlier.

    Download the C# script here.

    (c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Zeros, bloody zeros! (Data Vault Series)

Introduction

I must admit I have a weakness for British humour.
When I had to cope with leading zeros in business keys some time ago, I spontaneously came up with the title of this post, not knowing that it would serve as such.
For those who do not know, “Meetings, bloody meetings” is a British comedy training film in which John Cleese plays a main role. It was made in 1976, and a remake was made in 2012.
It tells in a funny way what can go wrong at meetings and how you can do better, check it out if you can.

DV-S01E05-meetingsMr John Cleese

But, obviously, this post is not about meetings but about zeros.

Problem

I can be short about that: leading zeros in business key values.
For instance a customer number is delivered to the data warehouse as 0001806 (instead of 1806).
This would not be a problem it is would always be delivered exactly like that. But to be honest, you can and will not know that upfront. Even this might be the case now, it might not be in the future.
When other tools are used, leading zeros could suddenly disappear (for instance when a csv file is modified using Excel), or (more rarely) the number of leading zeros could change (01806, 00001806). When this happens you have a problem, because for the data warehouse 01806, 0001806, 00001806 and 1806 are all different business keys! Even if you have only two variants, it is already a problem.
Because every business key gets a different row in the hub, and this customer now exists multiple times!

DV-S01E05-zeros(No acting here, this is how I look sometimes)

Solution

If you are familiar with Data Vault, you might already think of same-as-links to solve this.
But I think the solution should be implemented earlier, to avoid having multiple hub rows.
Simply always remove leading zeros when the sourcecolumn is (part of) a business key (either primary or foreign key) and seems a number or ID but is delivered as a string/varchar. In this way 1806 will always be 1806! And I think it is pretty impossible that 001806 and 1806 would refer to two different customers.
Unless, of course, they would come from different source systems. But in that situation, depending on leading zeros would be a bad thing to do, because when then leading zeros dropped off, satellite rows of different customers (in different source systems) could end up as connected to the same hub row! In this situation, in a non-integrated Raw Vault, it would be better to prefix the business key with the source system code and remove the leading zeros, for instance, CRM.1806 and ERP.1806.
In all cases, you can still store the original value (with leading zeros) as an ordinary attribute in a satellite for auditing reasons.

How to implement the solution

There are many ways to remove leading zeros. When I was searching for this I had two requirements:

  • No casting from and to an integer may take place, otherwise all business keys need to be numeric, so this would make the solution less reliable.
  • No function, routine or assembly may be called, this could negatively impact performance. I was looking for an “inline” conversion.

After some research I found an expression that was the same for SQL and SSIS and quite okay (T-SQL version by Robin Hames, my credits for his work), but appeared to change a string with only one or more zeros to an empty string. And because a 0 can have a meaning – and is certainly different from an empty string – this is undesired behavior, IMHO.
So I had to add some logic to it: a SELECT CASE in T-SQL and an inline condition (format {condition} ? {true part} : {false part} ) to the SSIS expression.
Furthermore I came on a different method for T-SQL as well, using the PATINDEX function, which is more compact than the other solution.
For SSIS I still use the ‘Robin Hames’ method, because the PATINDEX function is not available in SSIS Expressions.
So .. this is what it has become:

T-SQL

Remove_leading_zeros.sql

SELECT
    example.[id_with_leading_zeros],
   CASE
      WHEN LTRIM(example.[id_with_leading_zeros]) = '' THEN ''
      WHEN PATINDEX( '%[^0 ]%', example.[id_with_leading_zeros]) = 0 THEN '0'
      ELSE SUBSTRING(example.[id_with_leading_zeros], PATINDEX('%[^0 ]%', example.[id_with_leading_zeros]), LEN(example.[id_with_leading_zeros]))
   END AS [id_without_zeros_method1],

   CASE
      WHEN LTRIM(example.[id_with_leading_zeros]) = '' THEN ''
      WHEN PATINDEX( '%[^0 ]%', example.[id_with_leading_zeros]) = 0 THEN '0'
      ELSE REPLACE(REPLACE(LTRIM(REPLACE(-- Robin Hames' method
            REPLACE(LTRIM(example.[id_with_leading_zeros]), ' ', '!#!') -- replace existing spaces with a string that does not occur in the column value, I have chosen '!#!'
            , '0', ' ') -- replace '0' with ' '
            ) -- end of LTRIM to remove leading '0's that have been changed to ' 's
            , ' ', '0') -- change ' ' back to '0'
            , '!#!', ' ') -- change '!#!' back to ' '
   END AS [id_without_zeros_method2]
FROM
    (
    SELECT
        TOP 1000 RIGHT('00000000000' + CONVERT(NVARCHAR(12), object_id), 14) AS [id_with_leading_zeros]
    FROM
        master.sys.objects
    UNION
    SELECT N' 00000 '
    UNION
    SELECT N'00'
    UNION
    SELECT N' '
    UNION
    SELECT ' 0099990 A '
    UNION
    SELECT '-5550'
    ) example

SSIS Expression (can be used in Derived Column)

(LTRIM(REPLACE(id_with_leading_zeros,"0", "")) == "" && LTRIM(id_with_leading_zeros) != "") ? "0" : REPLACE(REPLACE(LTRIM(REPLACE(REPLACE(LTRIM(id_with_leading_zeros)," ","!#!"),"0"," "))," ","0"),"!#!"," ")

DV-S01E05-151In a Derived Column Transformation this looks for instance like this

Conclusion / Wrap up

In this post I have motivated why I think you should remove leading zeros from business keys when data is loaded from source systems to a data warehouse.
This post also contains different ways to remove leading zeros, two for T-SQL and one for a SSIS expression.

(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.