The Fuzzy Lookup Transformation
A lookup becomes Fuzzy when it can match to records that are similar, but not identical to, the lookup key. For example, it can match “Jon Smith” to “John Smith” – this component is very useful for helping consoled client data such as names and addresses from multiple systems.
An important thing to get to grips with is that when the Fuzzy Lookup outputs its matches, it also outputs a Similarity and Confidence score for the match. The Similarity score is easy to understand – it’s a measure on a 0-1 scale of how similar the matched item is to the lookup key. So for example, “John Smith” and “Jon Smith” rate a 0.89, whereas “John Smith” and “James Smith” rate a lower 0.62. The Confidence score is on the same scale but not fixed for a given match – essentially the higher it is, the lower the number of alternative matches it has found. So from my example, “Johnathan Fruitbat” has just two matches, and the highest Confidence is 0.38. In contrast “John James Walliams” has 15 matches and the highest Confidence is just 0.0
Let us take an example. Consider two tables employee table and department table. Create an Employee table as below
Create another table Dept as below
In the Employee table just watch for the data that is entered with slight mistakes,
This transformation differs from the Lookup transformation in its use of fuzzy matching. The Lookup transformation uses an equi-join to locate matching records in the reference table. It returns either an exact match or nothing from the reference table. In contrast, the Fuzzy Lookup transformation uses fuzzy matching to return one or more close matches from the reference table.
A Fuzzy Lookup transformation frequently follows a Lookup transformation in a package data flow. First, the Lookup transformation tries to find an exact match. If it fails, the Fuzzy Lookup transformation provides close matches from the reference table.
Let us use SOURCE as OLEDB and the final result will be populated to EXCEL DESTINATION. So the control flow inside dataflow will look as follows with FUZZY LOOKUP as dataflow transformation
In OLEDB source Employee table emp is added which contains four columns including dept_name. In Fuzzy Lookup we are taking department table dept contain two columns dept_name and dept_id
In the above diagram we are taking dept table as reference table. We are using emp_dept in emp table and dept_name in dept table as to reference values .We have to select the look up value i.e. dept_id.
The next step is Data Flow Destination as excel destination. After configuring the Destination editor will look as follows.
Before mapping as above we have to enter the fields in top row of excel so that user can know the fields to which it belongs to.
Mapping at destination editor is done as above in figure i.e. all fields including source and lookup are mapped to fields in excel destination. Now we finished configuring and mapping.
After executing task though there are mismatch in spelling in dept_name we can see that all rows are transformed successfully.
(See the word PRODUCSION which actually must be PRODUCTION)
(All the rows executed successfully).
In the excel sheet we can see the dept_id for all the departments including those which are with error.
So that about simple Fuzzy Lookup with an exanmple
nice
ReplyDeleteThanks for sharing. It would be better to have '
ReplyDeleteUnderstanding on the Fuzzy Lookup Transformations