🧠 Why This Matters
A small dataset can limit a model’s ability to generalize and perform well on unseen data, especially in domain-specific tasks. With Prem Studio, you can enrich your dataset using synthetic data generation strategies. This allows you to:
- 📈 Expand your dataset size with low human effort
- 🧠 Introduce diversity into training examples
- 🚀 Improve model performance and generalization during fine-tuning
- 🧪 Prototype quickly, even with minimal real data
💼 Use Case: Fine-Tuning a Customer Support Chatbot
Imagine you’re building a domain-specific chatbot to answer product-related queries for an e-commerce company. You’ve collected only 50 QA pairs from past support tickets, which is not enough for robust fine-tuning. Instead of manually creating more data, you can enrich your dataset using synthetic generation in Prem Studio. This guide shows you how.
⚙️ Step-by-Step: Enrich Your Dataset with Synthetic Data
1
Select Your Dataset and Perform a 50/50 Split

It’s not mandatory, but we highly recommend splitting your dataset into training and validation sets before running enrichment. This helps avoid data leakage and ensures more reliable evaluations.
See our Dataset Best Practices Guide for more.
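If you prefer to prepare the split yourself before uploading, the idea can be sketched in a few lines of Python. This is only an illustration of a seeded 50/50 split, not how Prem Studio performs it internally; the `split_dataset` helper, the `question`/`answer` field names, and the placeholder seed pairs are all assumptions made for the example.

```python
import random


def split_dataset(pairs, train_fraction=0.5, seed=42):
    """Shuffle a copy of `pairs` and split it into (train, validation)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(pairs)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical seed data: 50 QA pairs standing in for real support tickets.
seed_pairs = [
    {"question": f"Question {i}?", "answer": f"Answer {i}."} for i in range(50)
]

train, validation = split_dataset(seed_pairs, train_fraction=0.5)
print(len(train), len(validation))  # 25 25
```

Fixing the seed means the same examples land in validation every time, so the held-out set stays stable across enrichment runs.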
2
Launch the Enrichment Workflow

3
Define Enrichment Settings and (Optional) Instructions

- New pairs to generate: 500
- Creativity: 0.1 (lower creativity = safer, more consistent results)
- Instructions (optional):
4
Review and Approve Synthetic Examples
Click Generate. Review the 500 synthetic examples.
Approve the ones you’d like to keep.
5
Add Synthetic Datapoints to the Training Bucket

Synthetic datapoints are added to the training bucket automatically when you use the Autosplit functionality, as long as “Allow synthetic data in Validation” is not selected. This holds even if you apply a split like 80/20 or 70/30.
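To make the routing rule concrete, here is a minimal Python sketch of how such a split could behave. It is not Prem Studio's implementation: the `autosplit` function, the `synthetic` flag on each datapoint, and the choice to apply the ratio to original datapoints only are assumptions for illustration. The behavior when synthetic data is allowed in validation is also a guess.

```python
import random


def autosplit(datapoints, train_fraction=0.8,
              allow_synthetic_in_validation=False, seed=0):
    """Split datapoints into (train, validation).

    Assumption: unless synthetic data is allowed in validation, only the
    original datapoints are split by ratio and every synthetic datapoint
    is sent to training.
    """
    if allow_synthetic_in_validation:
        pool, train_only = list(datapoints), []
    else:
        pool = [d for d in datapoints if not d["synthetic"]]
        train_only = [d for d in datapoints if d["synthetic"]]
    random.Random(seed).shuffle(pool)
    cut = round(len(pool) * train_fraction)
    return pool[:cut] + train_only, pool[cut:]


# Hypothetical records: 50 originals plus 500 approved synthetic examples.
originals = [{"id": f"orig-{i}", "synthetic": False} for i in range(50)]
synthetic = [{"id": f"syn-{i}", "synthetic": True} for i in range(500)]

train, validation = autosplit(originals + synthetic, train_fraction=0.8)
print(len(train), len(validation))
print(any(d["synthetic"] for d in validation))
```

Under these assumptions the validation bucket contains only original examples, whatever ratio you pick.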
6
(Optional) Further Enrich Using Textual Documents

When documents are provided, the enrichment engine combines both seed examples and content from your uploaded files to create new, high-quality datapoints that better reflect your domain language and topics.
📊 Example Before and After Enrichment
Type | Question | Answer |
---|---|---|
Original | How can I track my order? | You can track it using the link in your confirmation email. |
Synthetic | Where do I check the status of my shipment? | Use the tracking link in the confirmation email we sent you. |
Synthetic | Can I know where my package is? | Yes, the tracking link in your confirmation email shows real-time updates. |
📊 Dataset Size: Before vs After
Started with 50 original examples.
- Split: 25 training / 25 validation
- After enrichment: +500 synthetic datapoints → 550 total (525 in training + 25 in validation)
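The arithmetic behind these numbers can be checked with a short Python sketch. The `enriched_sizes` helper is hypothetical and assumes that every synthetic datapoint goes to training while validation keeps only original examples.

```python
def enriched_sizes(n_original, n_synthetic, train_fraction=0.5):
    """Return bucket sizes after enrichment, with synthetic data in training only."""
    n_train_original = round(n_original * train_fraction)
    n_validation = n_original - n_train_original
    n_train = n_train_original + n_synthetic
    return {
        "train": n_train,
        "validation": n_validation,
        "total": n_train + n_validation,
    }


print(enriched_sizes(n_original=50, n_synthetic=500))
# {'train': 525, 'validation': 25, 'total': 550}
```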
📦 What’s Next?
With your enriched dataset, you can now:
- Fine-tune a model with higher data diversity (Fine-Tuning Guide)
- Evaluate model generalization with agentic evaluation
💡 Pro Tips
- Always enrich after splitting to avoid data leakage.
- Use instructions to control output tone, complexity, topic, or QA structure.
- Review synthetic data for consistency — quality > quantity.
- Avoid over-relying on synthetic examples for evaluation.
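As a quick sanity check on the first and third tips, you could scan approved synthetic examples for questions that duplicate something in your validation set. The Python sketch below is one possible approach, not a Prem Studio feature: `normalize`, `find_leaks`, and the `question`/`answer` field names are hypothetical. It only catches matches that are identical after lowercasing and stripping punctuation, so paraphrases still need a manual review.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def find_leaks(synthetic, validation):
    """Return synthetic pairs whose normalized question also appears in validation."""
    held_out = {normalize(p["question"]) for p in validation}
    return [p for p in synthetic if normalize(p["question"]) in held_out]


validation = [
    {"question": "How can I track my order?",
     "answer": "You can track it using the link in your confirmation email."},
]
synthetic = [
    {"question": "how can I track my order",
     "answer": "Use the tracking link in the confirmation email we sent you."},
    {"question": "Where do I check the status of my shipment?",
     "answer": "Use the tracking link in the confirmation email we sent you."},
]

leaks = find_leaks(synthetic, validation)
print(len(leaks))  # 1
```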