Datasets Overview
Learn how to create and use datasets to fine-tune your models.
Check out the Get Started guide to learn how to create and use datasets.
What Are Datasets?
A Dataset is a collection of datapoints that we will use to train a model how to respond to specifric types of inputs.
Understanding JSONL Datasets for Fine-Tuning
A dataset for fine-tuning is a collection of examples in JSONL format (JSON Lines), where each line represents a single conversation example.
Dataset Structure
Each line in your JSONL file contains a JSON object with a single field called βmessagesβ. This field holds an array of 3 message objects, each with:
-
A
"role"
field (identifying who is speaking) -
A
"content"
field (containing the actual text)
The Three Roles
-
"system"
: Provides context and instructions that guide the modelβs behavior -
"user"
: Represents what a human user would say or ask -
"assistant"
: Contains the ideal response you want the model to learn to generate
Example Format
This format teaches the model that when given the system instruction to act as a customer service agent and asked about returns, it should respond with the specific return process information.
Download this test dataset here to give it a try.
Next Step: Create a Dataset
Create a Dataset
Click here to learn how to create a dataset.