Home SSIS Validating Data using SSIS

Validating Data using SSIS

by Prathy Kamasani January 23, 2017 0 comment

I cannot believe it took me almost three months to recover from the house move. At last, I feel like I am back to routine. Now I have my home office; I am quite proud of it, it involved much DIY work. Wish I am one of those YouTube stars, who show off their interior decorating skills 😉 I know none of this excites you, so let’s look at the actual blog post…

 

 

Validating Data using SSIS:

Recently I have got a requirement to validate flat files using SSIS. Then I realised, why I never thought of doing this before. It would have saved hours I wasted in emails or checking data to understand the problem. We face problems with dirty data very often, it could be because of invalid characters or lookup errors or something else. Based on the ETL process you identify this data problem at various stages. However, having a data validation task enables you to accept or reject the data at the very initial stage. I have to thank my client for giving me this task. Because it is something I have not done so far and I used script component (Which I do not use very often, so I def need to write this blog post before I forget it.)

The goal of our SSIS package is to find all the validation failures. And the validations we want to check are:

  1. Only Yellow and Orange colour products are valid
  2. Product Column is mandatory
  3. Product Name must be capital
  4. Amount is a mandatory field
  5. Only two decimals values are allowed in the Amount field
  6. No comma’s (,) allowed in the Amount field
  7. No special characters in the Product column
  8. No 0’s allowed in the Amount Filed
  9. No NULL values in the Amount Field
  10. No duplicated products with the same column

I used Script component as a transformation in the Data Flow task. This is where all the logic resides. In the script component, I added a new output called Validation with two new output columns called FailedColumn and ValidationDescription. I also included all the source columns.

In the script component under Input0_ProcessInputRow, I included my code. Mainly I was using If condition as per my validation, As I wanted to capture all validation errors, so I added validation error rows to Validation Buffer.

Sample Script:

I populated ValidationDescription and Failed column with relevant information. This enables me to understand exactly which column failed and why. And my output looks like this.

Now I can easily see where and what the issues are. Also I can just redirect these rows to a table or send them in an email or just ignore these rows and load rest or abort the ETL process and notify the relevant person to correct the file.

Prathy

You can download the SSIS project here  or the complete code of the Script Component for all above mentioned validations:

 

0 comment

You may also like

%d bloggers like this:
Skip to toolbar