De-Duplicate Dataset
Automatically de-duplicate your messy data
1. Create a Dataset
Before you can start a job, you need to create a dataset. You can do that in the App by clicking the New Dataset button. Right now the application only accepts CSV files.
Add a dataset
Your dataset should only include the fields relevant to the task (this can be multiple fields). You can either edit the dataset before uploading or enter the specific Columns in the upload form.
Select Columns
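If you would rather trim the file before uploading, the step can be scripted. The following is a minimal sketch using Python's standard csv module; the function name keep_columns, the sample data, and the column names are hypothetical and only illustrate the idea.

```python
import csv
import io


def keep_columns(src, dst, columns):
    """Copy CSV rows from src to dst, keeping only the named columns."""
    reader = csv.DictReader(src)
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in CSV header: {missing}")
    # extrasaction="ignore" silently drops every field not listed in columns.
    writer = csv.DictWriter(
        dst, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Hypothetical sample data: internal_notes is not relevant for matching.
raw = io.StringIO(
    "id,name,email,internal_notes\n"
    "1,Ann Lee,ann@example.com,vip\n"
    "2,Anne Lee,ann@example.com,\n"
)
trimmed = io.StringIO()
keep_columns(raw, trimmed, ["id", "name", "email"])
print(trimmed.getvalue())
```

For real files, pass file objects opened with newline="" (for example open("customers.csv", newline="")) in place of the in-memory buffers. Keeping an identifier column in the upload is an assumption here; it makes the exported duplicate ids easier to trace back to your original records.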
2. Start a Match Job
Once your dataset has been created, it's time to kick off a job. You can do that in the App by clicking the Start Job button.
Start Match Job
- Select New Job
- Select Match
- Enter a name for your job - this can be anything you want
- Select the dataset you added in the previous step from the dropdown for both Base Dataset and Match Dataset
- If you want to give the app specific guidance, click Edit on the Instructions box. While not necessary, this can help guide the app to de-duplicate your data more accurately.
3. Export Your Duplicate IDs
A job can take a while depending on the amount of data. You can follow its progress on the job’s page, which updates automatically every 30 seconds. You will also receive an email when the job is complete, so feel free to navigate away from the page if needed.
View Job Progress
Once the match job is complete, you can click the Export Data button to get the duplicate record IDs in a CSV file.
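Once you have the export, you may want to apply it to your original file. The sketch below shows one way to do that with Python's standard csv module. This guide does not describe the layout of the exported file, so the sketch assumes a single column named id that lists the records to remove; check your actual export and adjust dup_column before relying on it. The function name drop_duplicates and the sample data are hypothetical.

```python
import csv
import io


def drop_duplicates(records, duplicates, out, id_column="id", dup_column="id"):
    """Write every row of records whose ID is not listed in duplicates.

    Assumes (not confirmed by the app) that the export lists one ID per
    row, under dup_column, for each record that should be removed.
    Returns the number of rows kept.
    """
    dup_ids = {row[dup_column] for row in csv.DictReader(duplicates)}
    reader = csv.DictReader(records)
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        if row[id_column] not in dup_ids:
            writer.writerow(row)
            kept += 1
    return kept


# Hypothetical original dataset and hypothetical export contents.
original = io.StringIO("id,name\n1,Ann Lee\n2,Anne Lee\n3,Bob Ray\n")
exported = io.StringIO("id\n2\n")
cleaned = io.StringIO()
kept = drop_duplicates(original, exported, cleaned)
print(kept)
print(cleaned.getvalue())
```

IDs are compared as strings, so an ID written as 2 in one file and 02 in the other would not match. Review the exported list before deleting anything from a source of record.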