(This blog post was first published on 20-jun-2013 on my former blog)
The problem
For one of my projects I needed to add the file line number to the staging table, so that the raw data that was inserted into the database, would be tracable back to the source files exact line.
First I wanted to use the RowCount component in the Data Flow Task, it was easy to configure (only assign a variable) but it appeared that the variable only got the value of the current row after the Data Flow Task had finished, so you knew afterwards how many rows had gone through the dataflow. But that was not what I needed! I want an instant row number, telling me which row is flowing through right now!
The solution: a script component
So what we have to do is add a script component to the Data Flow Task.
But remember, if we work with a ReadWrite variable, we will have the same problem as with the RowCount component: the value of the variable is not updated until the dataflow task has finished, so in the flow it will have the initial value for every row.
To work around this, we have to add an output column to the script component as follows:
Now we have to edit the script. No ReadOnlyVariables or ReadWriteVariables need to be set, just press the <Edit Script> button.
As the script really is just a C# class, all we have to do is add a private variable, initialize it, increase it for every row and assign the value to the outputcolumn “RowNumber”.
I will show you how:
Simply add the parts in red boxes to the script:
That’s it. Simple, hey?
Now, there is one more thing to do. Open your OLE DB Destination and make a mapping for the RowNumber column. In my example the database field is also called RowNumber, but this is of course not required, you can give it any name you like.
Now you’re done, run the script and you will see that the RowNumber column in the destination table will increase for every row that is inserted!
carlos · February 12, 2016 at 7:37 pm ·
its not more easy with row_number window function ? https://msdn.microsoft.com/en-us/library/ms186734.aspx
Hi Carlos, you can use the row_number window function if your source is a SQL database. If your source is something different, for example a csv file, you cannot use it, and then the script task as described above can do the job.
Hi,
When i add the code parts in C# , Input0Buffer isn’t recognized.
Do i have to add some namespaces ? and how can i do this ?
Thanks.
Hi Erik, it should work out of the box. If you open the C# script task you should see the method Input0_ProcessInputRow and it should compile without adding any namespaces.
Which version of SSIS are you using?
Hi Hans,
I’m getting an “PipelineBuffer has encountered an invalid row index value” error.
We are using VS 2015.
Any thoughts are appreciated.
Hi Peter,
I never had to deal with this error.
It is difficult to tell what the reason could be by the information provided.
Which version of SSIS are you using?
If you can send me the SSIS Package I can have a look, but I can’t promise a solution.
For this I will send my email address to you by email.
Best regards,
Hans
Hi Hans,
I’m getting the same issue as Peter… “PipelineBuffer has encountered an invalid row index value”.
Please send me an e-mail and I will send you a screenshot of my code.
I’m using VS 2010.
Thanks!
-Wesley
I figured it out! You have to make sure to use the correct override method. By default, SSIS C# has the “Input0_ProcessInput” method when adding a transformation script task; however, your process specifies the “Input0_ProcessInputRow” method. If you rename the existing method to the correct method, you will get a prompt asking you if you want to “rename an inherited method” or something to that effect and you want to say “Yes” and then save changes. Re-open the script task, open the script, close it and save it and it will not have the “red X” showing. I re-ran my package and it worked!
Hello Wesley,
very good that you found it! Sorry I didn’t reply earlier, I was busy and just saw your two comments.
Works like a charm!
Good to hear! Thanks for the feedback.
Hi,
I am trying to get the same functionality to work via BIML, but I keep getting errors.
Do you have an example of how to implement this using BIML?
No, Sorry, I do not have that.
Best regards,
Hans
I am new to scripting, but I have your script working to create a key, the problem is that sometimes I need to use it to raise the starting seed a bit, I modified this portion to go from a 0 to
m_rowNumber = 200000;
expecting the Row Number to start at 200,000 and then go up from there, but it still starts at 1. What am I doing wrong. I am using this with SQL 2017
Hello Dave,
Sorry I missed your comment some months ago.
But to be honest I do not understand what could have gone wrong. If you put the assignment of the value 200000 in the PreExecute() method I would – just as you – expect it to start counting at 200000.
Best regards,
Hans
Thanks so much Hans, your elegant solution helped a lot. Can now output row ID direct to target table as part of the key. Much cleaner than posthumous ROW_NUMBER OVER update to heap table. Microsoft should add as a dedicated script component and hire or reward you 🙂
Hi Andrew,
Good to hear that my blog post helped you, and thank you for the compliment.
Best regards,
Hans
2020 and this post still rocks! Thank you! 🙂