If you’ve ever pulled data into Microsoft Excel from multiple sources, you’ll probably have had to deal with lots of minor discrepancies. It could be differences in how information is collated, varying ways of presenting the data, or simple spelling errors – but all of this can make it difficult to scrutinize large amounts of data, or to give accurate results based upon it.
To resolve this issue, Microsoft has created Fuzzy Lookup, which is effectively a broad match version of the immensely popular VLOOKUP tool.
Though not a standard Excel function, Fuzzy Lookup can be downloaded as an add-in directly from Microsoft. Once you have downloaded and installed the file, it will automatically appear in a new tab at the end of the ribbon. You’re then ready to start performing searches.
First, you’ll need to prepare the data by formatting it into tables, and include names so the Fuzzy Lookup function can work on them.
Now, head into Fuzzy Lookup where you’ll be presented with the option to select columns to merge from your left and right table. So, if both contain a ‘Sales Person’ tab, for example, you select these from each table so they’ll merge into one. What happens then is Fuzzy Lookup identifies similar results (Nick Williams and Nicholas Williams, say) and combines the data for the two.
Before generating this new combined column, however, you also have the option to select a similarity threshold. Worked out as a percentage, this tells Fuzzy Lookup how strict you want to be with the data merging. 100% is effectively the same as VLOOKUP, whereas 0% is much broader.
Click ‘Go’ and Fuzzy Lookup will generate a new table, where all the ‘Sales Person’ data has been merged, and the other fields (Sales Targets or Sales Achieved, for example) are populated alongside. Provided all the information can be ported over, it’ll all appear in the right place on the newly generated table. If there are any gaps, Fuzzy Lookup will leave the fields black for you to manually complete later.
You can also set the number of matches you want to come out of the data, which can be used alongside the similarity threshold to get the data set returned just right.
A lot of advanced maths and logic has gone into creating this intelligent software – including Jaccard similarity, tokenization of records, and transformations. If you’re interested in discovering more, there’s a PDF that comes with the download which has further information.
This article is from our Spring 2021 SoundBytes Newsletter. To read the other articles from the newsletter, please click on a links below: