This project demonstrates efficient supervised fine-tuning of the GPT-OSS-20B model using the OpenAI GSM8K dataset. We leverage LoRA (Low-Rank Adaptation) with Unsloth to make fine-tuning large models practical on limited GPU resources.
The goal is to enhance mathematical reasoning and step-by-step problem solving in large language models, without requiring full-scale retraining.
- 🧮 Dataset: GSM8K – 7.4k high-quality grade-school math problems
- ⚡ Model: GPT-OSS-20B with 4-bit quantization for memory efficiency
- 🧠 Fine-tuning: LoRA adapters applied to key transformer layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- 🛠️ Training Framework: TRL SFTTrainer + Hugging Face Datasets
- 💾 Memory Optimization: Gradient checkpointing (`unsloth` mode) and 8-bit optimizer
- 🎯 Objective: Supervised Fine-Tuning (SFT) for improved reasoning and problem-solving
- Import PyTorch and Unsloth to handle model loading and optimization.
- Load GPT-OSS-20B with:
  - 4-bit quantization (saves memory)
  - max sequence length = 1024 tokens
  - LoRA adapters applied to key transformer layers (`q_proj`, `k_proj`, `v_proj`, etc.)
  - Gradient checkpointing (`unsloth` mode) for efficient training
- Load the GSM8K train split (7.4k math word problems).
- Convert the data into ShareGPT-style conversations (`user` → question, `assistant` → answer); a sample record is sketched below.
- Apply the model's chat template to convert conversations into the proper training text format.
- Use TRL's SFTTrainer for supervised fine-tuning.
- Configure training with:
  - Batch size = 1 (with gradient accumulation)
  - Optimizer = AdamW (8-bit)
  - Learning rate = 2e-4
  - Training steps = 30 (demo run)
- Run training with `trainer.train()`.
- Save the fine-tuned model + tokenizer into the `outputs/` folder.
- Print training logs (loss, metrics).
- Format a user query with `apply_chat_template`.
- Generate a response with sampling (`temperature=0.7`, `top_p=0.9`).
- Decode and print the model's prediction.
✅ In summary:
- Load GPT-OSS-20B + LoRA
- Prepare GSM8K dataset in chat format
- Fine-tune using SFTTrainer
- Save model & tokenizer
- Run inference on math questions
```python
import torch
from unsloth import FastLanguageModel

max_seq_length = 1024  # maximum context length used for training
dtype = None           # None = auto-detect (bfloat16 on supported GPUs)

# Load GPT-OSS-20B in 4-bit precision so it fits on a single GPU
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    dtype=dtype,
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    full_finetuning=False,  # we train LoRA adapters, not the full model
)

# Attach LoRA adapters to the attention and MLP projection layers
model = FastLanguageModel.get_peft_model(
    model,
    r=8,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth's memory-efficient checkpointing
    random_state=3407,
)
```
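As a quick sanity check (not part of the original steps), the underlying PEFT model exposes `print_trainable_parameters()`, which should confirm that only a tiny fraction of the 20B weights are trainable:

```python
# Sanity check: only the LoRA adapter weights should be trainable.
# print_trainable_parameters() is provided by the underlying PEFT model.
model.print_trainable_parameters()
# Expect a few million trainable parameters against ~20B total (well under 1%).
```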
We load the GSM8K dataset and convert it into ShareGPT conversation format.
```python
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

# Load the GSM8K training split
ds = load_dataset("openai/gsm8k", "main")["train"]

# Convert each (question, answer) pair into a ShareGPT-style conversation
def convert_to_sharegpt(example):
    return {
        "conversations": [
            {"from": "user", "value": example["question"]},
            {"from": "assistant", "value": example["answer"]},
        ]
    }

ds = ds.map(convert_to_sharegpt)
ds = standardize_sharegpt(ds)  # normalize roles/keys to the standard schema

# Render each conversation into a single training string via the chat template
def formatting_prompts_func(examples):
    texts = [
        tokenizer.apply_chat_template(
            convo, tokenize=False, add_generation_prompt=False
        )
        for convo in examples["conversations"]
    ]
    return {"text": texts}

ds = ds.map(formatting_prompts_func, batched=True)
```
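Before training, it is worth printing one rendered example to confirm the chat template was applied as expected (a quick check, not one of the original steps):

```python
# Inspect the first formatted training example (truncated for readability)
print(ds[0]["text"][:500])
```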
We configure supervised fine-tuning with TRL's SFTTrainer.
```python
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    args=SFTConfig(
        per_device_train_batch_size=1,   # keep per-GPU memory low
        gradient_accumulation_steps=4,   # effective batch size = 4
        warmup_steps=5,
        max_steps=30,                    # short demo run; raise for real training
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",              # 8-bit AdamW to save optimizer memory
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)

# Start training
train_result = trainer.train()
```
We save the fine-tuned model and tokenizer into the `outputs/` directory.
```python
# Save model and tokenizer
trainer.save_model("outputs")
tokenizer.save_pretrained("outputs")

# Optional: print the per-step training logs (loss, learning rate, etc.)
metrics = trainer.state.log_history
print(metrics)
```
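If only the loss curve is of interest, the `loss` entries can be filtered out of `log_history` (a small convenience snippet; `trainer` is assumed to still be in scope):

```python
# Extract just the per-step training loss values from the log history
losses = [entry["loss"] for entry in trainer.state.log_history if "loss" in entry]
print("Final training loss:", losses[-1] if losses else "n/a")
```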
We test the fine-tuned model with a sample math reasoning question.
```python
messages = [
    {"role": "user", "content": "If you have 3 apples and eat 1, how many remain?"}
]

# add_generation_prompt=True appends the assistant turn marker so the
# model knows to start answering
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
Example usage after training:
```python
from unsloth import FastLanguageModel

# Reload the fine-tuned adapter + tokenizer from the outputs/ directory
model, tokenizer = FastLanguageModel.from_pretrained("outputs", load_in_4bit=True)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

messages = [
    {"role": "user", "content": "If you have 3 apples and eat 1, how many remain?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print("Model:", response)
```
- LoRA + quantization make training 20B models feasible on a single GPU (see the rough memory math below)
- GSM8K is an excellent benchmark for reasoning-focused fine-tuning
- Unsloth greatly simplifies efficient fine-tuning with minimal code changes
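A rough back-of-the-envelope check on that first point: storing 20B parameters at 4 bits each takes about 20 × 10⁹ × 0.5 bytes ≈ 10 GB of weight memory, versus roughly 40 GB in fp16. The r=8 LoRA adapters add only a few million trainable parameters, and the 8-bit optimizer keeps their state small, so the setup fits on a single high-memory GPU (approximate figures; actual usage also depends on sequence length and activation memory).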
- Unsloth for efficient LLM training
- Hugging Face ecosystem for datasets + transformers
- OpenAI GSM8K dataset for high-quality math reasoning tasks
✅ Successfully fine-tuned GPT-OSS-20B on GSM8K using LoRA. 📈 Expected improvements in step-by-step reasoning for math problems. 🔮 Future Work:
- Train for full epochs instead of demo steps
- Evaluate on the GSM8K test set (a simple evaluation sketch follows this list)
- Experiment with higher LoRA ranks (`r=16` or `r=32`)
- Compare with baseline models (GPT-3.5, LLaMA, etc.)
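As a starting point for that evaluation, here is a minimal sketch (not from the original notebook). It assumes the GSM8K convention that gold answers end with `#### <number>`, uses greedy decoding for reproducibility, and assumes `model` and `tokenizer` are already loaded as above:

```python
import re
from datasets import load_dataset

# Minimal exact-match evaluation on a small GSM8K test subset
test_ds = load_dataset("openai/gsm8k", "main")["test"].select(range(50))

def extract_answer(text):
    # GSM8K gold answers end with "#### <number>"; fall back to the last number
    match = re.search(r"####\s*([-\d,\.]+)", text)
    if match:
        return match.group(1).replace(",", "")
    numbers = re.findall(r"-?\d[\d,]*\.?\d*", text)
    return numbers[-1].replace(",", "") if numbers else None

correct = 0
for example in test_ds:
    messages = [{"role": "user", "content": example["question"]}]
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
    prediction = tokenizer.decode(
        outputs[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    if extract_answer(prediction) == extract_answer(example["answer"]):
        correct += 1

print(f"Accuracy on {len(test_ds)} test questions: {correct / len(test_ds):.1%}")
```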