How to clean ๐งน Dirty Data
If you ever worked with real-life data, you probably know that it is hardly ever clean. This means it must be cleansed before we can apply any Data Visualization to it to draw business insights.
Where do we even start?
Let’s try to define Dirty Data.
Dirty Data could be incorrect, inaccurate, incomplete, or inconsistent data. To make it easier, let’s just say that dirty data is the kind of data we cannot use AS IS in our reporting without any cleansing exercise.
Which gives us two options to choose from. Either apply human resources, in a form of Data Conditioner or Data Janitor, or find an algorithm that can help us to clean raw data.
Here comes Fuzzy Merge…
Fuzzy Merge or Fuzzy Matching allows us in Power BI, or to be more specific in Power Query, to lookup values based on similarity. And this is a huge step compared to any kind of EXACT matching method.
This is a huge step in cleaning datasets, especially compared to SQL’s joins and Excel’s VLOOKUP. Those only allow exact matching.
With Fuzzy Merge, it is up to us to define how similar our data should be, and fine-tune some other aspects of the Jaccard similarity algorithm as well.
Final resort
Additionally, if everything fails, we can use a so-called Transformation Table which allows us to go in hard with the cleansing.
If you would like to learn how to use Fuzzy Merge in Power Query, check out my video below where I show you the ins and outs of this cool feature.
Roland
๐Thanks for watching and if you liked this video please hit the ๐ button.
๐ข Also be sure to subscribe to the channel so you never miss a new video.
0 Comments