Cluster Dataset
Automatically cluster your messy data
1. Create a Dataset
Before you can start a job, you need to create a dataset. You can do that in the App by clicking the New Dataset button. Currently, the application only accepts CSV files.
Add a dataset
Your dataset should include only the fields relevant to the task (this can be multiple fields). You can either edit the dataset before uploading or enter the specific Columns in the upload form.
Select Columns
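If you prefer to edit the dataset before uploading, the trimming step can be scripted. The following is a minimal sketch using only Python's standard library. The function name `keep_columns` and the sample column names are illustrative assumptions, not part of the App.

```python
import csv
import io


def keep_columns(source, destination, columns):
    """Copy CSV rows from `source` to `destination`, keeping only `columns`.

    `source` and `destination` are open text files (or file-like objects).
    `keep_columns` is a hypothetical helper, not something the App provides.
    """
    reader = csv.DictReader(source)
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"columns not found in CSV: {missing}")

    # extrasaction="ignore" silently drops every field not listed in `columns`.
    writer = csv.DictWriter(
        destination, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# In-memory example; the column names here are made up for illustration.
raw = io.StringIO(
    "id,title,description,internal_notes\n"
    "1,Red mug,Ceramic mug,check stock\n"
)
trimmed = io.StringIO()
keep_columns(raw, trimmed, ["title", "description"])
print(trimmed.getvalue())
```

To work with real files, open them with `open(path, newline="")` for reading and `open(path, "w", newline="")` for writing, then pass the file objects to the function.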
2. Start a Categorize Job
Once your dataset has been created, it's time to kick off a job. You can do that in the App by clicking the Start Job button.
Start Categorize Job
- Select New Job
- Select Categorize
- Enter a name for your job (this can be anything you want)
- Select the Dataset you added in the previous step from the dropdown
- In the Categories section, click the Auto Determine button. This will open up a panel where you can provide instructions and the system can automatically generate clusters for you. Alternatively, you can enter the specific clusters for your job in the Categories input field. You can enter multiple clusters at once by separating them with a | character.
- If you want to give the app specific guidance when determining which record belongs in which cluster(s), click Edit on the Instructions box. While not strictly necessary, this can help guide the app to cluster your data more accurately.
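If you keep your cluster names in a list, the |-separated value for the Categories field can be assembled programmatically. This is only a sketch: the helper name `build_categories_field` and the sample cluster names are assumptions, and whether the App tolerates spaces around the | character is not stated here, so the sketch strips them.

```python
def build_categories_field(clusters):
    """Join cluster names into a single |-separated string.

    Hypothetical helper for preparing the Categories input; it trims
    whitespace, skips empty names, and drops duplicates while keeping order.
    """
    cleaned = []
    for name in clusters:
        name = name.strip()
        if not name:
            continue
        if "|" in name:
            # A | inside a name would be read as a separator.
            raise ValueError(f"cluster name must not contain '|': {name!r}")
        if name not in cleaned:
            cleaned.append(name)
    return "|".join(cleaned)


# Sample cluster names are made up for illustration.
categories = build_categories_field(["Billing", " Shipping ", "Returns", "Billing"])
print(categories)  # Billing|Shipping|Returns
```

The resulting string can be pasted into the Categories input field as-is.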
3. Exporting Your Clusters
A job can take a while depending on the amount of data. You can follow its progress on the job's page, which updates automatically every 30 seconds. You will also receive an email when the job is complete, so feel free to navigate away from the page if needed.
View Job Progress
Once the categorize job is complete, you can click the Export Data button to get the corresponding cluster values in a CSV file.
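Once you have the exported CSV, a quick sanity check is to count how many records landed in each cluster. The sketch below assumes the export has one column holding the cluster value; the column name `cluster` and the sample rows are hypothetical, so check the header of your own export and adjust accordingly.

```python
import csv
import io
from collections import Counter


def count_by_cluster(export_file, cluster_column):
    """Count rows per cluster value in an exported CSV.

    `cluster_column` is an assumption: use whatever header your export
    actually contains. Each cell is treated as a single label.
    """
    reader = csv.DictReader(export_file)
    if cluster_column not in (reader.fieldnames or []):
        raise ValueError(f"column {cluster_column!r} not found in export")

    counts = Counter()
    for row in reader:
        label = (row.get(cluster_column) or "").strip()
        if label:  # skip rows with no cluster value
            counts[label] += 1
    return counts


# In-memory stand-in for an exported file; the data is made up.
export = io.StringIO(
    "title,cluster\n"
    "Red mug,Kitchen\n"
    "Blue mug,Kitchen\n"
    "Desk lamp,Lighting\n"
)
counts = count_by_cluster(export, "cluster")
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

A cluster with far more or far fewer records than expected can be a hint that the clusters or the instructions need adjusting before the job is rerun.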