Monday, September 20, 2010

Passing variables in SSIS task to Script Task

Passing variables in SSIS task to Script Task

I have set up a package with three String variables called
* FileName, which has a data type of String and an initial value of “Import.txt”
* FolderName, which has a data type of String and an initial value of “c:\”
* FullPath, which has a data type of String and no initial value
And a Script task.

I am executing this task in Sql Server 2008, which supports C# in script task. Sql Server 2005 does not support C# code it supports only VB.
One thing to note when you set up variables (choose SSIS-VARIABLES from the top menu) is to make sure you have clicked on the package and not a task when you create or add a variable. If you create a variable while being clicked on a task (therefore with task scope) then the variable will disappear from the list when you click up to the package.

The task I took is, I am storing the name of folder i.e. raghu in variable FolderName. I am assigning filename i.e. import.txt to the variable FileName. In the script task we are reading these two variables and concatenating the result to the third variable i.e. FullPath.

Wednesday, September 15, 2010

Fuzzy Lookup in SSIS with example


The Fuzzy Lookup Transformation

A lookup becomes Fuzzy when it can match to records that are similar, but not identical to, the lookup key. For example, it can match “Jon Smith” to “John Smith” – this component is very useful for helping consoled client data such as names and addresses from multiple systems.
An important thing to get to grips with is that when the Fuzzy Lookup outputs its matches, it also outputs a Similarity and Confidence score for the match. The Similarity score is easy to understand – it’s a measure on a 0-1 scale of how similar the matched item is to the lookup key. So for example, “John Smith” and “Jon Smith” rate a 0.89, whereas “John Smith” and “James Smith” rate a lower 0.62. The Confidence score is on the same scale but not fixed for a given match – essentially the higher it is, the lower the number of alternative matches it has found. So from my example, “Johnathan Fruitbat” has just two matches, and the highest Confidence is 0.38. In contrast “John James Walliams” has 15 matches and the highest Confidence is just 0.0

Let us take an example. Consider two tables employee table and department table. Create an Employee table as below



Create another table Dept as below
In the Employee table just watch for the data that is entered with slight mistakes,



The Fuzzy Lookup transformation performs data cleaning tasks such as standardizing data, correcting data, and providing missing values.
This transformation differs from the Lookup transformation in its use of fuzzy matching. The Lookup transformation uses an equi-join to locate matching records in the reference table. It returns either an exact match or nothing from the reference table. In contrast, the Fuzzy Lookup transformation uses fuzzy matching to return one or more close matches from the reference table.
A Fuzzy Lookup transformation frequently follows a Lookup transformation in a package data flow. First, the Lookup transformation tries to find an exact match. If it fails, the Fuzzy Lookup transformation provides close matches from the reference table.

Let us use SOURCE as  OLEDB and the final result will be populated to EXCEL DESTINATION. So the control flow inside dataflow will look as follows with FUZZY LOOKUP as dataflow transformation

In OLEDB source Employee table emp is added which contains four columns including dept_name. In Fuzzy Lookup we are taking department table dept contain two columns dept_name and dept_id

In the above diagram we are taking dept table as reference table. We are using emp_dept in emp table and dept_name in dept table as to reference values .We have to select the look up value i.e. dept_id.

The next step is  Data Flow Destination as excel destination. After configuring the Destination editor will look as follows.

Before mapping as above we have to enter the fields in top row of excel so that user can know the fields to which it belongs to.
Mapping at destination editor is done as above in figure i.e. all fields including source and lookup are mapped to fields in excel destination. Now we finished configuring and mapping.
After executing task though there are mismatch in spelling in dept_name we can see that all rows are transformed successfully.

(See the word PRODUCSION which actually must be PRODUCTION)

(All the rows executed successfully).
In the excel sheet we can see the dept_id for all the departments including those which are with error.

So that about simple Fuzzy Lookup with an exanmple