HIGHLIGHTING DUPLICATES BY TEXT SIMILARITY
Introduction to Highlighting Duplicates
Flookup provides tools to identify and highlight duplicate or similar data in Google Sheets. You can analyse a single column for duplicates or compare data across multiple columns. Duplicates are identified based on either percentage similarity or sound-based matching and are visually marked.
To begin, open the tool by navigating to Extensions > Flookup Data Wrangler > Highlight duplicates in your Google Sheets menu.
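For readers unfamiliar with sound-based matching, the sketch below uses classic Soundex as a stand-in. This is an assumption: this page does not say which phonetic algorithm Flookup uses for "By sound", and the function name `soundex` is hypothetical. The idea is that two values count as duplicates when their phonetic codes are equal.

```python
# Classic (American) Soundex, used here only as an illustrative stand-in.
# Assumption: Flookup's actual "By sound" algorithm may differ.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the 4-character Soundex code of word, or "" if it has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Smith"), soundex("Smyth"))    # S530 S530
```

Under this sketch, "Robert" and "Rupert" would be treated as sound-alike duplicates because both produce the same code.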
Highlighting Duplicates by Percentage or Sound Similarity
- Select the function to run
Click the menu item labelled "By percentage" or "By sound".
- Select the highlight mode
Select Highlight all duplicates or Skip first occurrence, depending on how you want your results to appear.
- Select the data range to analyse
Select a range with one or more columns. The width of this range determines how many cells are highlighted on each row that contains a duplicate. For example, if you select the range A2:D500 and duplicates are identified in cells B10, B20 and B50, then A10:D10, A20:D20 and A50:D50 will be highlighted.
- Specify the column of data to analyse
Specify the Left_column index. If you leave it blank, the first column of the selected range will be analysed.
- Specify the level of similarity
If you selected "By percentage" in the first step, specify the Threshold value. If you leave it blank, the default threshold of 0.8 will be used.
- Highlight duplicates
Click Highlight to execute the function.
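The A2:D500 example in the steps above can be sketched in a few lines of Python. This only illustrates how the selected range maps to the highlighted cells; it is not Flookup's code, and the helper name `highlight_ranges` is hypothetical.

```python
import re


def highlight_ranges(selection, duplicate_rows):
    """Return the A1 range that would be highlighted on each duplicate row."""
    match = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", selection)
    if match is None:
        raise ValueError(f"not a simple A1 range: {selection!r}")
    first_col, first_row, last_col, last_row = match.groups()
    ranges = []
    for row in duplicate_rows:
        # Only rows inside the selection can be highlighted.
        if int(first_row) <= row <= int(last_row):
            ranges.append(f"{first_col}{row}:{last_col}{row}")
    return ranges


print(highlight_ranges("A2:D500", [10, 20, 50]))
# ['A10:D10', 'A20:D20', 'A50:D50']
```

Selecting a narrower range, such as B2:B500, would highlight only the cells in column B on those rows.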
Notes on Highlighting Data in a Single Column
- The number of columns you select determines the number of cells that will be highlighted in each row.
- The Left_column value is the column index, within your selection, that will be analysed.
- If you are identifying duplicates "By percentage", then duplicates will be values in the Left_column that have a level of similarity equal to or higher than the Threshold value.
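As a rough illustration of the notes above, the following sketch flags values in one column whose similarity meets the threshold. It makes several assumptions: Python's `difflib.SequenceMatcher` stands in for Flookup's similarity measure (which is not described here), Left_column is treated as a 1-based index within the selection, and the function and parameter names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity between 0 and 1 (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(rows, left_column=1, threshold=0.8, skip_first=False):
    """Return 0-based positions of rows whose Left_column value is a duplicate."""
    values = [row[left_column - 1] for row in rows]
    flagged = set()
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if similarity(values[i], values[j]) >= threshold:
                flagged.add(j)      # the later value is always a duplicate
                if not skip_first:
                    flagged.add(i)  # "Highlight all duplicates" mode
    return sorted(flagged)


rows = [
    ["Acme Ltd", "London"],
    ["Globex", "Paris"],
    ["Acme Ltd.", "London"],
    ["ACME LTD", "Leeds"],
]
print(find_duplicates(rows))                   # [0, 2, 3]
print(find_duplicates(rows, skip_first=True))  # [2, 3]
```

With skip_first=True, the earliest row of each group is left alone, which mirrors the Skip first occurrence mode.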
How to Highlight Duplicates Across Two Different Columns
- Select the function to run
Click the menu item labelled "By percentage" or "By sound".
- Select the highlight mode
Select Highlight all duplicates or Skip first occurrence, depending on how you want your results to appear.
- Select the comparison mode
Click the option labelled Compare two different columns.
- Select the data to compare
Select a range with two or more columns.
- Specify the column indexes to analyse
Specify your Left_column and Right_column indexes. These are the two columns that will be compared with each other.
- Set the level of similarity
If you selected "By percentage" in the first step, adjust the Threshold value to match your needs. If you leave it blank, the default threshold of 0.8 will be used.
- Highlight duplicates
Click Highlight to execute the function.
Notes on Highlighting Data Across Two Columns
- If you are identifying duplicates "By percentage", then duplicates will be values in the Left_column that exist in the Right_column and have a level of similarity equal to or higher than the Threshold value.
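The two-column rule above can be sketched as follows. As before, this is an illustration under assumptions: `difflib.SequenceMatcher` is a stand-in for Flookup's similarity measure, column indexes are treated as 1-based within the selection, and `cross_column_duplicates` is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity between 0 and 1 (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cross_column_duplicates(rows, left_column, right_column, threshold=0.8):
    """Return 0-based positions of rows whose left value resembles any right value."""
    right_values = [row[right_column - 1] for row in rows if row[right_column - 1]]
    flagged = []
    for index, row in enumerate(rows):
        left = row[left_column - 1]
        # A left value is a duplicate if at least one right value is close enough.
        if left and any(similarity(left, right) >= threshold for right in right_values):
            flagged.append(index)
    return flagged


rows = [
    ["Jon Smith", "Mary Jones"],
    ["Mary Jone", "Peter Pan"],
    ["Alice Wu", "John Smith"],
]
print(cross_column_duplicates(rows, left_column=1, right_column=2))  # [0, 1]
```

Here "Jon Smith" and "Mary Jone" are flagged because close matches exist somewhere in the right-hand column, not necessarily on the same row.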
How to View Duplicate Clusters
- Select Trace duplicate clusters
In the second drop-down menu, select Trace duplicate clusters.
- Specify the Trace_row index
Specify your Trace_row index. This is the row for which you would like to see related duplicates.
- Click the Trace button
Scroll to the bottom and click Trace.
Notes on Tracing Highlighted Data
- The duplicate clusters for any particular row index will be highlighted in a distinct peach colour. To revert to the original highlight colour, use the Undo button in Google Sheets.
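One way to picture a duplicate cluster is as the set of rows linked to the traced row through chains of similar values. The sketch below follows that reading, which is an assumption: this page does not define how Flookup builds clusters. `difflib.SequenceMatcher` is again a stand-in similarity measure, and `trace_cluster` and its parameters are hypothetical names.

```python
from collections import deque
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity between 0 and 1 (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def trace_cluster(values, trace_row, first_row=1, threshold=0.8):
    """Return sheet row numbers linked to trace_row by chains of similar values."""
    start = trace_row - first_row
    if not 0 <= start < len(values):
        raise IndexError(f"row {trace_row} is outside the analysed range")
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first walk over "is similar to" links
        current = queue.popleft()
        for other in range(len(values)):
            if other not in seen and similarity(values[current], values[other]) >= threshold:
                seen.add(other)
                queue.append(other)
    return sorted(index + first_row for index in seen)


# Values from rows 2 to 6 of a sheet.
values = ["Acme Ltd", "Globex", "Acme Ltd.", "Initech", "Globex."]
print(trace_cluster(values, trace_row=4, first_row=2))  # [2, 4]
print(trace_cluster(values, trace_row=3, first_row=2))  # [3, 6]
```

In this sketch the returned rows correspond to the cells that would receive the distinct trace colour.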
For the Visual Learners
Labels might differ slightly but the steps remain the same.