STANDARDISING DATA
Introduction to Standardising Data
Leverage the power of clean data with Flookup's "Standardise Data" feature, an essential tool for effective data standardisation and text normalisation within Google Sheets. This feature provides precise control over specific data cleaning operations, allowing you to preprocess datasets by removing punctuation, eliminating unwanted words, stripping diacritics or normalising URLs. These functions streamline your Google Sheets data cleaning efforts by modifying your original dataset in place or creating a new, standardised version.
Classic Mode Functions
To access these functions, navigate to Extensions > Flookup Data Wrangler > Preprocess data > Standardize data in your spreadsheet menu.
- Remove punctuation marks: Removes all specified punctuation marks from the selected text entries.
- Remove unwanted words: Removes all specified words from the selected text entries.
- Remove diacritical marks: Removes all diacritical marks from text entries.
- Keep URL domain only: Strips a URL of everything except its root domain, for example "www.example.com" to "example".
- Keep URL path only: Strips a URL of everything except its path, for example "www.example.com/new/users" to "/new/users".
Removing Punctuation or Unwanted Words
-
Open the function sidebar
Navigate to Extensions > Flookup Data Wrangler > Preprocess data > Standardize data in your spreadsheet menu. - Select the mode to run
Choose the mode for the function from the drop down menu. - Select the entries to standardise
Highlight the column of text entries to standardise and click Grab selected range. - Select the entries to remove
Highlight the cell containing stop words (comma separated) or punctuation marks (no separation) and click Grab selected range. - Specify the result location
Click an empty cell to indicate where the results should be displayed. This cell can be in any sheet within the same workbook. - Standardise the dataset
Click the Standardise text entries button.
Removing Diacritical Marks, Keeping URL Domain or Path
-
Open the function sidebar
Navigate to Extensions > Flookup Data Wrangler > Preprocess data > Standardize data in your spreadsheet menu. - Select the mode to run
Choose the mode for the function from the drop down menu. - Select the entries to standardise
Highlight the column of entries to standardise and click Grab selected range. - Standardise the text entries
Click the Standardise text entries button.
Important Notes on Data Standardisation
- The Threshold parameter is only active when removing unwanted words. You should adjust it at step #5 above. It defines the minimum level of similarity between words to be removed from each text entry in your dataset and the predefined list of unwanted words.
- When removing punctuation marks, the range of entries to be removed should contain values that are not separated by any delimiter, for example ?,.@/.
- When removing unwanted words, the range of entries to be removed should contain comma separated values, for example company, limited, Mr, Mrs.
- This function processes data in the leftmost column of the range if you select more than one column. To standardise text entries from any column within a range, you must select a single column.
Visual Learning Resources
Labels might differ slightly but the steps remain the same.