Adding custom header and footer to csv files (SSIS Series)

Introduction

This article wasn’t planned. SSIS again? No, I wanted to write a post about something else.
Nevertheless, I got this question on one of my previous posts and thought I might as well create a blog post about it.

Problem

“Hi Hans,
I have created my ssis package for flat file destination. My problem is that I want to add header and footer in my flat file destination to send it to FTP location format will be

0|MMDDYYYY|HHMMSS
1|Row1
1|Row2
1|Row3
1|ROW4
9|6

The last line contains no of row count including header and trailer
Please let me know how to do that.. I am completely stuck with that and no good luck.. “

So to summarize, both the header and the footer line of the csv file need to contain dynamic content.
Also each row starts with a ‘row type’: 0 for the header, 1 for every detail row, and 9 for the footer.
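
To make the target layout concrete, here is a minimal standalone C# sketch (not part of the SSIS package) that builds these lines in memory for a handful of detail rows. The class name FileLayoutSketch and the method BuildLines are made up for this illustration. The footer count is the number of detail rows plus 2, for the header and the footer itself.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class FileLayoutSketch
{
    // Hypothetical helper: returns header, detail and footer lines for the given rows.
    public static List<string> BuildLines(DateTime timestamp, IEnumerable<string> detailRows)
    {
        var lines = new List<string>();

        // Header: row type 0, date as MMDDYYYY, time as HHMMSS.
        lines.Add("0|" + timestamp.ToString("MMddyyyy", CultureInfo.InvariantCulture)
                  + "|" + timestamp.ToString("HHmmss", CultureInfo.InvariantCulture));

        // Detail rows: row type 1.
        foreach (string row in detailRows)
        {
            lines.Add("1|" + row);
        }

        // Footer: row type 9, count of all lines including header and footer.
        lines.Add("9|" + (lines.Count + 1));
        return lines;
    }

    public static void Main()
    {
        List<string> lines = BuildLines(new DateTime(2016, 3, 5, 14, 7, 9),
                                        new[] { "Row1", "Row2", "Row3", "Row4" });
        foreach (string line in lines) Console.WriteLine(line);
    }
}
```

With four detail rows the last line is 9|6, as in the example from the question.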

Solution

As an example I use a simple Data Flow Task with an OLEDB Source connected to the table DimCurrency from the AdventureWorksDW database.
The data then flows to a Script Component used as a Destination. The Script Component will create the file.
Here are a few screenshots to demonstrate what I mean.
The package is made using SSIS 2014.

Developing the package

SSIS-S01E04-099
The demo package just contains one Data Flow Task

SSIS-S01E04-205
Design of the Data Flow Task

SSIS-S01E04-101
OLEDB Source: DimCurrency from the AdventureWorksDW database

SSIS-S01E04-102
OLEDB Source: select all three columns

SSIS-S01E04-208
When you drag the Script Component to the Data Flow, you will get this dialog. Use the Script Component as a Destination

SSIS-S01E04-207
Variables that the package contains.

SSIS-S01E04-209
In the Script Transformation Editor, add both variables User::HeaderTimestamp and User::TargetCsvPathAndFileName to ReadOnlyVariables

Now comes the most interesting part! We use the endless possibilities of C# to fix our problem.
In the Script Transformation Editor window, press Edit Script ….

Please note that you can download the entire script here, so you do not have to copy the (sometimes badly formatted) code from this webpage.

Then in the namespaces region add two namespaces, as indicated below.

Namespaces region
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
// Added namespaces:
using System.IO;
using System.Text;
#endregion

Above public override void PreExecute() add the copyright notice and the following two regions: Configuration and Private variables.
Please note that these regions do not exist yet, so you have to copy the code including the #region and #endregion lines.

New regions
  /*
  (c) Copyright 2016 - hansmichiels.com
 
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.
 
  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.
 
  You should have received a copy of the GNU General Public License
  along with this program. If not, see http://www.gnu.org/licenses/.
  */

 
  #region ***** Configuration *****

  // Here you can change the column delimiter and the text qualifier easily.
  private const string _delimiter = "|";
  private const string _textQualifier = "";
  // To set the text qualifier to " use:
  // private const string _textQualifier = "\""; // I recommend to use a text qualifier.
  
  // Every x rows, write the buffer to the file, and clear the buffer.
  // This is faster than writing every individual line to the file.
  // The higher the value of _bufferRows is, the more memory will be used and the less write actions are done to the csv file.
  // A higher value is a good idea if you have enough memory in your server.
  private const int _bufferRows = 100000;

  #endregion ***** Configuration *****

  #region Private variables

  // Variable to hold the path and filename
  private string _fileName;
  // Variable to count the rows
  private int _rowCount;
  // StringBuilder to hold a buffer of text that still needs to be written to the csv file.
  private StringBuilder _fileContentsBuffer;

  #endregion Private variables

PreExecute() is called before the data starts to flow through. Here we do some preparation and compose the header line of the file.

PreExecute()

  /// <summary>
  /// This method is called once, before rows begin to be processed in the data flow.
  /// </summary>
  public override void PreExecute()
  {
    base.PreExecute();

    // Initialize variables
    _rowCount = 0;
    _fileContentsBuffer = new StringBuilder();
    _fileName = Variables.TargetCsvPathAndFileName;
    DateTime headerTimestamp = Variables.HeaderTimestamp;

    // Append the header line to the buffer.
    // This will be written to file with the first detail rows.
    // If there are no detail rows, no file will be created.
    // The requested date layout is MMDDYYYY, which is MMddyyyy as a .NET custom format.
    _fileContentsBuffer.AppendFormat("{1}0{1}{0}{1}{2:MMddyyyy}{1}{0}{1}{2:HHmmss}{1}\r\n", _delimiter, _textQualifier, headerTimestamp);
  }
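
The format string can be tried outside SSIS. Below is a small sketch, assuming a fixed timestamp, that renders the header line with and without a text qualifier. In the format, {0} is the delimiter, {1} the text qualifier and {2} the timestamp, which is used twice. The class and method names are made up for this example. Note that the MMDDYYYY layout from the question corresponds to MMddyyyy in .NET; ddMMyyyy would put the day first.

```csharp
using System;
using System.Globalization;

public static class HeaderFormatSketch
{
    // {0} = delimiter, {1} = text qualifier, {2} = timestamp (used for date and time).
    private const string HeaderFormat = "{1}0{1}{0}{1}{2:MMddyyyy}{1}{0}{1}{2:HHmmss}{1}";

    // Hypothetical helper that renders the header line without the trailing line break.
    public static string BuildHeader(string delimiter, string textQualifier, DateTime timestamp)
    {
        return string.Format(CultureInfo.InvariantCulture, HeaderFormat,
                             delimiter, textQualifier, timestamp);
    }

    public static void Main()
    {
        DateTime ts = new DateTime(2016, 3, 5, 14, 7, 9);
        Console.WriteLine(BuildHeader("|", "", ts));   // 0|03052016|140709
        Console.WriteLine(BuildHeader(";", "\"", ts)); // "0";"03052016";"140709"
    }
}
```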

PostExecute() is called after the data has flowed through. Here we append the footer line to the last lines in the _fileContentsBuffer variable and write them to the file.

PostExecute()
  /// <summary>
  /// This method is called after all the rows have passed through this component.
  ///
  /// You can delete this method if you don't need to do anything here.
  /// </summary>
  public override void PostExecute()
  {
    base.PostExecute();
    // If no detail rows, bail out. No file will be created when there are no detail rows.
    if (_rowCount == 0) return;

    using (StreamWriter sw = File.AppendText(_fileName))
    {
      // Append a footer line with the rowcount, example value: 9|105
      // The rowcount is a count of the detail rows, but the footer count
      // needs to include header and footer, so we add 2 to the value of _rowCount.
      _fileContentsBuffer.AppendFormat("{1}9{1}{0}{1}{2}{1}", _delimiter, _textQualifier, _rowCount + 2);
      sw.Write(_fileContentsBuffer.ToString());
      _fileContentsBuffer.Clear();
    }
  }
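
One thing to be aware of: File.AppendText appends to an existing file and only creates a new file when none exists. If the package runs twice with the same target file name, the second run adds a new header, detail rows and footer behind the old content. The sketch below assumes that you want each run to start with a fresh file. The helper ResetTargetFile is hypothetical and not part of the downloadable script; in the Script Component a similar call could go into PreExecute() after _fileName has been set.

```csharp
using System;
using System.IO;

public static class TargetFileSketch
{
    // Hypothetical helper: removes a file left over from an earlier run.
    public static void ResetTargetFile(string fileName)
    {
        // File.Delete does not throw when the file does not exist.
        File.Delete(fileName);
    }

    public static void Main()
    {
        string fileName = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".csv");
        File.WriteAllText(fileName, "stale content from a previous run\r\n");

        ResetTargetFile(fileName);
        using (StreamWriter sw = File.AppendText(fileName))
        {
            sw.Write("0|03052016|140709\r\n");
        }

        // Only the newly written header line is in the file now.
        Console.Write(File.ReadAllText(fileName));
        File.Delete(fileName);
    }
}
```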

Input0_ProcessInputRow(Input0Buffer Row) is the place where all the data processing takes place. As this script is the Destination for the data, it has to write the rows itself; there is no Flat File Destination to do this for us.
So here we append every detail row to the _fileContentsBuffer variable. Every _bufferRows rows, we write the buffer to the file and clear the buffer again.

Input0_ProcessInputRow

  /// <summary>
  /// This method is called once for every row that passes through the component from Input0.
  /// </summary>
  /// <param name="Row">The row that is currently passing through the component</param>
  public override void Input0_ProcessInputRow(Input0Buffer Row)
  {
    // Increase the rowcounter
    _rowCount++;

    // Append the new line to the buffer.
    // The format needs to be like "{1}[middle part]{1}\r\n", so that the row starts and ends with the text qualifier.
    // Replace [middle part] with {1}{0}{1}{x} for every column that you want to append to the file, where x is 2 for column 1, 3 for column 2 and so on.
    _fileContentsBuffer.AppendFormat("{1}1{1}{0}{1}{2}{1}{0}{1}{3}{1}{0}{1}{4}{1}\r\n", _delimiter, _textQualifier,
      SafeValue(Row.CurrencyKey), // First column, {2} in the format
      SafeValue(Row.CurrencyAlternateKey), // Second column, {3} in the format
      SafeValue(Row.CurrencyName) // Third column, {4} in the format
             // etc.
      );
    
    // Every x rows, write the buffer to the file, and clear the buffer.
    if (_rowCount % _bufferRows == 0)
    {
      using (StreamWriter sw = File.AppendText(_fileName))
      {
        sw.Write(_fileContentsBuffer.ToString());
        _fileContentsBuffer.Clear();
      }
    }
  }
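
The buffering pattern can be shown in isolation. The following is a simplified sketch, not the script itself: the class BufferedLineWriterSketch and its members are invented for this example, and a tiny buffer size of 2 rows is used so the effect is visible. It collects lines in a StringBuilder, writes them to the file every bufferRows rows, and needs one final Flush() for the rows that are left over, which is the role PostExecute() plays in the package.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class BufferedLineWriterSketch
{
    private readonly string _fileName;
    private readonly int _bufferRows;
    private readonly StringBuilder _buffer = new StringBuilder();
    private int _rowCount;

    // Number of times the buffer was written to disk (for illustration only).
    public int FlushCount { get; private set; }

    public BufferedLineWriterSketch(string fileName, int bufferRows)
    {
        _fileName = fileName;
        _bufferRows = bufferRows;
    }

    public void AddLine(string line)
    {
        _rowCount++;
        _buffer.Append(line).Append("\r\n");
        // Every _bufferRows rows, write the buffer to the file and clear it.
        if (_rowCount % _bufferRows == 0) Flush();
    }

    // Writes whatever is still buffered; call once after the last row.
    public void Flush()
    {
        if (_buffer.Length == 0) return;
        using (StreamWriter sw = File.AppendText(_fileName))
        {
            sw.Write(_buffer.ToString());
        }
        _buffer.Clear();
        FlushCount++;
    }

    public static void Main()
    {
        string fileName = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".csv");
        var writer = new BufferedLineWriterSketch(fileName, 2);
        for (int i = 1; i <= 5; i++) writer.AddLine("1|Row" + i);
        writer.Flush(); // writes the one remaining row
        Console.WriteLine(writer.FlushCount); // 3
        File.Delete(fileName);
    }
}
```

Five rows with a buffer of 2 rows lead to three writes: after row 2, after row 4, and the final flush for row 5.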

The SafeValue method has several overloads for different data types (string, int and DateTime; you can add more when needed). They escape text qualifiers in the column values when needed.

SafeValue method and overloads
  /// <summary>
  /// Escapes the text qualifier in a column value by doubling it.
  /// </summary>
  /// <param name="value">The column value to escape</param>
  /// <returns>The escaped value</returns>
  private string SafeValue(string value)
  {
    // Guard against null to avoid a NullReferenceException below.
    if (value == null) return string.Empty;

    // If _textQualifier is null or empty, just return the value.
    if (string.IsNullOrEmpty(_textQualifier)) return value;

    if (value.Contains(_textQualifier))
    {
      // If the value contains one or more text qualifiers,
      // escape them by replacing them with two text qualifiers.
      return value.Replace(_textQualifier, _textQualifier + _textQualifier);
    }
    return value;
  }

  /// <summary>
  /// Overloaded method for DateTime value (meant as an example)
  /// For other data types, create new overloaded methods,
  /// and in that overloaded method convert the value to string,
  /// before calling the 'SafeValue' method with a string parameter.
  /// </summary>
  /// <param name="value"></param>
  /// <returns></returns>
  private string SafeValue(DateTime value)
  {
    return SafeValue(value.ToString("yyyy-MM-dd HH:mm:ss.fff"));
  }

  /// <summary>
  /// Overloaded method for int value
  /// </summary>
  /// <param name="value"></param>
  /// <returns></returns>
  private string SafeValue(int value)
  {
    return SafeValue(value.ToString());
  }
}
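
To see what the escaping does, here is a small standalone variant of the string version, with the text qualifier passed in as a parameter so both configurations can be tried. The names SafeValueSketch and Escape are made up for this example, and the null check is an assumption of this sketch about how a missing value should be written.

```csharp
using System;

public static class SafeValueSketch
{
    // Doubles every occurrence of the text qualifier inside the value.
    public static string Escape(string value, string textQualifier)
    {
        // Assumption: a null value is written as an empty string.
        if (string.IsNullOrEmpty(value)) return string.Empty;
        // Without a text qualifier there is nothing to escape.
        if (string.IsNullOrEmpty(textQualifier)) return value;
        return value.Replace(textQualifier, textQualifier + textQualifier);
    }

    public static void Main()
    {
        Console.WriteLine(Escape("Dollar \"US\"", "\"")); // Dollar ""US""
        Console.WriteLine(Escape("Dollar \"US\"", ""));   // Dollar "US"
    }
}
```

Doubling the qualifier is the usual csv convention, so a reader that honours text qualifiers can turn the doubled quote back into a single one.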

When the script is ready, choose Rebuild from the BUILD menu in the VstaProjects – Microsoft Visual Studio window. Then close that window. You return to the Script Transformation Editor; close it by pressing OK.

Testing the package

I simply run the package in debug mode. 105 currency rows are added to the csv file.
Including the header and footer, the csv file contains 107 rows, and this is the number that is written to the footer line.

SSIS-S01E04-110
Running the package in debug mode

SSIS-S01E04-111
First and last part of the created csv file.

Please note that to change the delimiter and text qualifier you only have to make two changes in the ***** Configuration ***** region.

Configuration
  #region ***** Configuration *****

  // Here you can change the column delimiter and the text qualifier easily.
  private const string _delimiter = ";";
  private const string _textQualifier = "\"";
. . .

  #endregion ***** Configuration *****

With this change, the file looks as follows:
SSIS-S01E04-112
First and last part of the created csv file with a different delimiter and text qualifier.

I also did a test with a source query that returned over 1.1 million rows, and the csv file of 25 MB was created in just 3 seconds (on a Core i7 laptop with SSD).
So my conclusion is that the performance of this solution is very satisfactory.

Download the C# script for the script component here.

Conclusion / Wrap up

In this post I have demonstrated how to use SSIS 2014 to create a csv file with a dynamic header (with a timestamp) and footer line (including a rowcount) using a Script Component as a Destination in a Data Flow Task.


(c) 2016 hansmichiels.com – Do not steal the contents – spread the link instead – thank you.

Hans Michiels

Hans is an Independent Business Intelligence and Datawarehouse Consultant & Microsoft SQL Server Consultant, working in the Netherlands. He has been working in the software industry since 1996, with SQL Server since the year 2001, and since 2008 he has a primary focus on datawarehouse- and business intelligence projects using Microsoft technology, using a Datavault and Kimball architecture. He has a special interest in Datawarehouse Automation and Metadata driven solutions. * Certified in Data Vault Modeling: Certified Data Vault 2.0 Practitioner, CDVDM (aka Data Vault 1.0) * Certified in MS SQL Server: * MCSA (Microsoft Certified Solutions Associate) SQL Server 2012 - MCITP Business Intelligence Developer 2005/2008 - MCITP Database Developer 2005/2008 - MCITP Database Administrator 2005/2008


4 thoughts on “Adding custom header and footer to csv files (SSIS Series)”

  1. Hi Hans,

    Interesting post. I got the package running but end up with a file that only contains the starting character and delimiters, like:

    0;;;;;;
    1;;;;;;
    1;;;;;;
    etc

    It seems that the script doesn’t receive the values from the OLEDB source connection.

    Any idea what could be the problem?

    1. Hi Dirk,
      Pay attention to the method Input0_ProcessInputRow.
      In the blog post the following code appends column values from the Data source.
      Replace this with column names from your “Row” ( intellisense will show the column names when you type Row. )
      _fileContentsBuffer.AppendFormat("{1}1{1}{0}{1}{2}{1}{0}{1}{3}{1}{0}{1}{4}{1}\r\n", _delimiter, _textQualifier,
      SafeValue(Row.CurrencyKey), // First column, {2} in the format
      SafeValue(Row.CurrencyAlternateKey), // Second column, {3} in the format
      SafeValue(Row.CurrencyName)

      If you cannot get it working send me the code and I will have a look. Good luck!

  2. Hi Hans,
    I have to extract a csv file with header and footer. I have 8 columns, and I have to get the header for only two columns; the rest of the columns should be without header. Can you please help me with this?

    1. Hi Seema,
      Yes sure, I can give it a try. I will send you an email so you can reply on that. If you send me your SSIS Package (preferably a stripped version that runs on my machine as well) I will have a look.
      Best regards, Hans
