Fuzzy Grouping


The Fuzzy Grouping transform groups records based on how similar their values are in one or more selected columns.

For example, two records whose values differ only by a likely misspelling can be grouped together for further analysis, or the duplicates that are found can be removed. The sensitivity to differences between values is adjustable.
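This help page does not document the similarity measure the transform uses internally. As a rough illustration only, the sketch below uses the ratio from Python's standard `difflib.SequenceMatcher` as a stand-in similarity score. That choice is an assumption and is not necessarily the product's algorithm. The sketch shows how a misspelled value can score close to, but below, an exact match, and how a sensitivity threshold decides whether two values count as a match. The helper names `similarity` and `is_fuzzy_match` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score in [0.0, 1.0]; 1.0 means the two strings are identical.

    Assumption: difflib's ratio is used as a stand-in similarity measure.
    """
    return SequenceMatcher(None, a, b).ratio()


def is_fuzzy_match(a: str, b: str, threshold: float) -> bool:
    """Hypothetical helper: two values match when their score reaches the threshold."""
    return similarity(a, b) >= threshold


print(round(similarity("Toronto", "Toronot"), 3))  # 0.857 (one transposed pair of letters)
print(is_fuzzy_match("Toronto", "Toronot", 0.8))   # True: lenient threshold
print(is_fuzzy_match("Toronto", "Toronot", 1.0))   # False: 1.0 demands an exact match
```

Raising the threshold makes matching stricter. At 1.0, only identical values are treated as the same.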

Transform - Fuzzy Grouping

1. Input

The Fuzzy Grouping transform requires one input transform that has at least one column.

Consider the following input as an example:

Example data

2. Add the transform

Click the connector link between two transforms to select it.

Select a link

In the toolbar, choose Insert Other, then Fuzzy Grouping.

Toolbar option

To configure the transform, select it and choose Configure in the toolbar.

Configure the transform

3. Configure

Steps to configure the Fuzzy Grouping transform:

Fuzzy Grouping transform configuration

  1. Uncheck any columns that should be excluded from the output.
  2. Drag the columns you want to group on from the Input list to Grouping Columns.
  3. Enter the Probability Threshold. Valid values range from 0.0001 to 1.0. A value of 1.0 groups records only when their values match exactly, and lower values allow less similar values to be grouped together.
  4. Select Ignore String Case to perform a case-insensitive match.
  5. Select Output Top Level Records Only to omit the records identified as duplicates from the output.
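To make the interplay of these settings concrete, here is a minimal Python sketch that mirrors each configuration option. It is a simplified model built on several assumptions, and it is not Dundas's implementation. Values are scored with `difflib.SequenceMatcher`, and a record's score is the average over the grouping columns. Grouping is greedy: each record joins the first group whose top-level record scores at or above the threshold, and otherwise starts a new group. The function `fuzzy_group`, its parameters, and the `GroupId`/`IsTopLevel` output fields are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Sequence

Record = Dict[str, str]


def record_similarity(a: Record, b: Record, columns: Sequence[str], ignore_case: bool) -> float:
    """Average per-column similarity over the grouping columns (assumed scoring scheme)."""
    total = 0.0
    for col in columns:
        x, y = a[col], b[col]
        if ignore_case:  # mirrors "Ignore String Case"
            x, y = x.casefold(), y.casefold()
        total += SequenceMatcher(None, x, y).ratio()
    return total / len(columns)


def fuzzy_group(records: Sequence[Record], grouping_columns: Sequence[str],
                threshold: float = 0.8, ignore_case: bool = False,
                top_level_only: bool = False,
                output_columns: Optional[Sequence[str]] = None) -> List[Dict[str, object]]:
    """Greedy fuzzy grouping under the assumptions stated above."""
    if not grouping_columns:
        raise ValueError("at least one grouping column is required")
    if not 0.0001 <= threshold <= 1.0:  # valid Probability Threshold range
        raise ValueError("threshold must be between 0.0001 and 1.0")
    leaders: List[Record] = []  # the top-level record of each group
    result: List[Dict[str, object]] = []
    for rec in records:
        group_id = next((gid for gid, leader in enumerate(leaders)
                         if record_similarity(rec, leader, grouping_columns, ignore_case) >= threshold),
                        None)
        is_top_level = group_id is None
        if group_id is None:  # no similar group yet: this record starts one
            group_id = len(leaders)
            leaders.append(rec)
        if top_level_only and not is_top_level:
            continue  # mirrors "Output Top Level Records Only": drop duplicates
        cols = output_columns if output_columns is not None else list(rec)
        row: Dict[str, object] = {c: rec[c] for c in cols}  # unchecked columns are left out
        row["GroupId"] = group_id          # hypothetical output field
        row["IsTopLevel"] = is_top_level   # hypothetical output field
        result.append(row)
    return result


# Hypothetical sample input, not the data from this page's example.
records = [
    {"Name": "Toronto", "Country": "Canada"},
    {"Name": "toronto", "Country": "Canada"},
    {"Name": "Toronot", "Country": "Canada"},
    {"Name": "Ottawa", "Country": "Canada"},
]

rows = fuzzy_group(records, ["Name"], threshold=0.8, ignore_case=True)
print([r["GroupId"] for r in rows])  # [0, 0, 0, 1]
top = fuzzy_group(records, ["Name"], threshold=0.8, ignore_case=True, top_level_only=True)
print([r["Name"] for r in top])      # ['Toronto', 'Ottawa']
```

With `threshold=1.0`, each spelling of "Toronto" would land in its own group unless `ignore_case=True`, in which case only "Toronto" and "toronto" are merged. The greedy, first-match strategy is used here only because it is simple. A real implementation may cluster records differently.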

4. Output

The figure below illustrates the output for our example.

5. See also

Dundas Data Visualization, Inc.
400-15 Gervais Drive
Toronto, ON, Canada
M3C 1Y8

North America: 1.800.463.1492
International: 1.416.467.5100

Dundas Support Hours:
Phone: 9am-6pm, ET, Mon-Fri
Email: 7am-6pm, ET, Mon-Fri