🧠 Why This Matters

Don't have a ready-made QA dataset? You don't need one anymore. Whether you're fine-tuning a model or building a knowledge assistant, having QA data is essential. But not everyone has the time or resources to create it from scratch. This feature lets you:
  • ✅ Convert any textual content into usable QA pairs.
  • 🚀 Instantly scale your training data, even from a few documents.
  • 🔄 Reuse your documentation, whitepapers, or reports to fuel model training.
  • 🛠️ Prepare data for model fine-tuning, which is natively supported in Prem Studio.

What You'll Need

To get started, you only need:
  • A set of text-based input files (e.g. .pdf, .txt, .docx, .html)
  • A Prem Studio account
  • Optionally: Custom instructions to guide the QA generation process
In this example, we'll use 5 input files about crypto concepts and trends.
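
Before uploading, it can help to check which files in a local folder match the input types listed above. The following is a minimal sketch, not part of Prem Studio: the function name `collect_input_files` is hypothetical, the extension set simply mirrors the examples given here (which may not be exhaustive), and case-insensitive matching is an assumption.

```python
from pathlib import Path

# Extensions mirror the examples listed above; the real list of accepted
# types may be longer. Case-insensitive matching is an assumption.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".html"}


def collect_input_files(folder):
    """Return (supported, skipped) lists of files found directly in `folder`."""
    supported, skipped = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # ignore subfolders
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            skipped.append(path)
    return supported, skipped
```

Calling `collect_input_files("my_docs")` on a local folder returns the files that look ready to upload, plus any that would likely need converting first.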

⚙️ Step-by-Step: Using QA Generation in Prem Studio

1. Create Dataset from the Datasets Section

From the left sidebar, click Datasets, then + Create dataset (top right), and select the Synthetic Data option.

2. Upload Your Input Files

Drag and drop your documents (PDF, DOCX, TXT, HTML). In our example, we use 5 crypto-related files.

3. (Optional) Add Custom Instructions

Tailor the output with extra instructions, for example:
"Only create questions related to blockchain, DeFi, and crypto tokens. Avoid generic or repetitive questions. Keep answers short and factual."

4. Generate Your Dataset

Click Generate Dataset and let Prem Studio do the rest. In our example, we generate 100 QA pairs from 5 documents in under a minute.

5. Review Your Synthetic Dataset

Once the process is complete, you'll be able to preview the generated dataset, edit any entries, and use it directly in Prem Studio's fine-tuning and evaluation tools.

Example outputs:

| Question | Answer |
| --- | --- |
| What is a blockchain? | A decentralized, distributed ledger that records transactions. |
| What is the main purpose of tokens in DeFi? | To facilitate decentralized financial operations. |
| How do smart contracts work? | They execute predefined actions automatically based on conditions. |
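
If you also keep a local copy of the reviewed pairs, a quick check for empty fields and repeated questions can complement the manual review. The sketch below is illustrative only: the `question`/`answer` field names, the helpers `find_issues` and `to_jsonl`, and the JSON Lines layout are assumptions, not a documented Prem Studio export format.

```python
import json

# Example pairs copied from the example outputs. The "question"/"answer" keys
# are illustrative field names, not a documented Prem Studio schema.
qa_pairs = [
    {"question": "What is a blockchain?",
     "answer": "A decentralized, distributed ledger that records transactions."},
    {"question": "What is the main purpose of tokens in DeFi?",
     "answer": "To facilitate decentralized financial operations."},
    {"question": "How do smart contracts work?",
     "answer": "They execute predefined actions automatically based on conditions."},
]


def find_issues(pairs):
    """Return (index, message) tuples for empty fields and repeated questions."""
    issues = []
    seen = {}  # normalized question -> index of first occurrence
    for i, pair in enumerate(pairs):
        question = pair.get("question", "").strip()
        answer = pair.get("answer", "").strip()
        if not question or not answer:
            issues.append((i, "empty question or answer"))
            continue
        key = " ".join(question.lower().split())  # normalize case and whitespace
        if key in seen:
            issues.append((i, f"duplicate of entry {seen[key]}"))
        else:
            seen[key] = i
    return issues


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(pair, ensure_ascii=False) for pair in pairs)
```

Here `find_issues(qa_pairs)` returns an empty list for the three example pairs. Exact-match duplicate detection is a deliberately simple choice; it will not catch paraphrased questions.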

📦 What's Next?

You can now fine-tune a model on this dataset or use it for evaluation in Prem Studio.

💡 Pro Tips

  • Upload diverse content to get a wide range of QA pairs.
  • Use domain-specific instructions to control the output style.
  • Start small (5–10 files) to experiment, then scale up.