What Does It Mean to Enrich a Dataset?

Enriching a dataset means adding synthetic datapoints to increase its size and diversity. This can make your dataset more representative of real-world scenarios, which in turn can improve model training. For advanced use cases and practical examples, see our Enrichment Guide.

Quick Start: How to Enrich a Dataset

Step 1: Open Enrich

Click the ✨ Enrich button to open the enrichment window.
GIF of clicking the enrich dataset button
If your dataset hasn’t been split yet, you’ll see a reminder about creating a validation set to avoid data leakage. This is not a blocker. For details, check our Best Practices.
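If you prepare your data outside the app, the idea behind that reminder can be sketched as follows. This is a minimal illustration, not the product's own splitting logic: the function name `split_before_enrichment`, the 20% validation fraction, and the list-of-strings data format are all assumptions made for the example. The point is that the validation set is held out before any synthetic datapoints are generated, so nothing derived from validation examples ends up in training.

```python
import random


def split_before_enrichment(datapoints, validation_fraction=0.2, seed=0):
    """Hold out a validation set before enrichment (hypothetical helper)."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(datapoints)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one datapoint for validation.
    n_val = max(1, int(len(shuffled) * validation_fraction))
    # Only the first returned list should be enriched.
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder data for illustration only.
originals = [f"example {i}" for i in range(10)]
train, validation = split_before_enrichment(originals)
```

Only `train` would then be enriched; `validation` stays as original, untouched data.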

Step 2: Choose Number of Datapoints

Use the slider to set how many additional datapoints you want to generate (from 10 to 10,000).

Step 3: Advanced Settings (Optional)

You can guide the enrichment process with additional parameters:
  • Creativity (Temperature)
    • Higher values → more diverse but less predictable results.
    • Lower values → more consistent and relevant results.
  • User Instructions – Add custom instructions to control style, tone, or constraints.
GIF of adjusting creativity and instructions
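To build intuition for the Creativity setting, here is a small sketch of how temperature is commonly applied when sampling from a generative model. It is not a description of how this product implements the setting, and the scores and temperature values are made up for illustration. Dividing the scores by the temperature before normalizing makes the distribution sharper at low values and flatter at high values.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, scaled by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    # Subtract the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative scores for three candidate outputs.
scores = [2.0, 1.0, 0.5]
low = softmax_with_temperature(scores, 0.5)   # concentrates on the top candidate
high = softmax_with_temperature(scores, 1.5)  # spreads probability more evenly
```

With the lower temperature the top candidate dominates, which matches the "more consistent" behavior described above; with the higher temperature the less likely candidates are picked more often, which gives more diverse results.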

Writing Effective Instructions

For best results:
  1. Be specific about output format and structure.
  2. Provide examples of desired outputs.
  3. Define acceptable boundaries and tone.
  4. Include domain-specific terminology.
  5. State the purpose of augmentation clearly.
  6. Indicate diversity needs (e.g., vary sentence structure).
  7. Set limits on length, complexity, or style.
  8. Explain how to handle edge cases.
Example
Instead of: “Generate more customer service responses”
Try: “Generate professional customer service responses about shipping delays, using a sympathetic tone, offering specific solutions, and keeping responses 50–75 words long.”
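If you write instructions for many datasets, it can help to assemble them from the same pieces each time. The sketch below is one possible way to do that. The helper `build_instructions` and its parameters are hypothetical and not part of the product; the resulting text is simply pasted into the User Instructions field.

```python
def build_instructions(task, tone, requirements, min_words, max_words):
    """Assemble a specific instruction string (hypothetical helper)."""
    if min_words > max_words:
        raise ValueError("min_words must not exceed max_words")
    parts = [f"Generate {task}", f"using a {tone} tone"]
    # Each requirement is a short phrase such as "offering specific solutions".
    parts.extend(requirements)
    length_rule = f"keeping responses {min_words} to {max_words} words long"
    return ", ".join(parts) + ", and " + length_rule + "."


instructions = build_instructions(
    task="professional customer service responses about shipping delays",
    tone="sympathetic",
    requirements=["offering specific solutions"],
    min_words=50,
    max_words=75,
)
```

Keeping the task, tone, and length limits as separate arguments makes it harder to forget one of them, which is the main weakness of a vague instruction.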
For deeper use cases, refer to the Enrichment Guide.

Step 4: Review Results

Once enrichment is complete, review the new datapoints that were generated.
GIF of reviewing enrichment results
Make sure the outputs meet your expectations before moving on.
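Manual review can be complemented by a few automated checks. The sketch below assumes you have the generated datapoints available as plain strings, which this page does not confirm. The function `review_datapoints` and its word limits are hypothetical. It flags datapoints that duplicate existing ones and datapoints whose length falls outside the range requested in the instructions.

```python
def review_datapoints(generated, existing, min_words=50, max_words=75):
    """Return (index, reason) pairs for datapoints worth a closer look."""
    # Compare case-insensitively and ignore surrounding whitespace.
    seen = {text.strip().lower() for text in existing}
    issues = []
    for index, text in enumerate(generated):
        normalized = text.strip().lower()
        word_count = len(text.split())
        if normalized in seen:
            issues.append((index, "duplicate"))
        elif not min_words <= word_count <= max_words:
            issues.append((index, f"word count {word_count}"))
        # Later repeats of this datapoint also count as duplicates.
        seen.add(normalized)
    return issues


# Placeholder data for illustration only.
existing = ["Your order is delayed."]
generated = ["your order is delayed.", "Sorry!", " ".join(["word"] * 60)]
issues = review_datapoints(generated, existing)
```

Here the first generated datapoint is flagged as a duplicate and the second as too short, while the third passes both checks. Checks like these only catch mechanical problems, so tone and factual accuracy still need a human read.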

Next Steps

Now that your dataset is larger and more diverse, you can use it to train or prepare your model further.