This project demonstrates efficient supervised fine-tuning of the GPT-OSS-20B model using the OpenAI GSM8K dataset. We leverage LoRA (Low-Rank Adaptation) with Unsloth to make fine-tuning large models practical on limited GPU resources.
The goal is to enhance mathematical reasoning and step-by-step problem solving in large language models, without requiring full-scale retraining.
- 🧮 Dataset: GSM8K – 7.4k high-quality grade-school math problems
- ⚡ Model: GPT-OSS-20B with 4-bit quantization for memory efficiency
- 🧠 Fine-tuning: LoRA adapters applied to key transformer layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- 🛠️ Training Framework: TRL SFTTrainer + Hugging Face Datasets
- 💾 Memory Optimization: Gradient checkpointing (`unsloth` mode) and 8-bit optimizer
- 🎯 Objective: Supervised Fine-Tuning (SFT) for improved reasoning and problem-solving
- Import PyTorch and Unsloth to handle model loading and optimization.
- Load GPT-OSS-20B with:
  - 4-bit quantization (saves memory)
  - max sequence length = 1024 tokens
  - LoRA adapters applied to key transformer layers (`q_proj`, `k_proj`, `v_proj`, etc.)
  - Gradient checkpointing (`unsloth` mode) for efficient training
- Load the GSM8K train split (7.4k math word problems).
- Convert the data into ShareGPT-style conversations (`user` → question, `assistant` → answer); a sample record is sketched below.
- Apply the model's chat template to convert conversations into the proper training text format.
- Use TRL's SFTTrainer for supervised fine-tuning.
- Configure training with:
  - Batch size = 1 (with gradient accumulation)
  - Optimizer = AdamW (8-bit)
  - Learning rate = 2e-4
  - Training steps = 30 (demo run)
- Run training with `trainer.train()`.
- Save the fine-tuned model + tokenizer into the `outputs/` folder.
- Print training logs (loss, metrics).
- Format a user query with `apply_chat_template`.
- Generate a response with sampling (`temperature=0.7`, `top_p=0.9`).
- Decode and print the model's prediction.
✅ In summary:
- Load GPT-OSS-20B + LoRA
- Prepare GSM8K dataset in chat format
- Fine-tune using SFTTrainer
- Save model & tokenizer
- Run inference on math questions
```python
import torch
from unsloth import FastLanguageModel

max_seq_length = 1024  # maximum context length used for training
dtype = None           # None = auto-detect (bfloat16 on supported GPUs)

# Load GPT-OSS-20B in 4-bit precision so it fits on a single GPU
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    dtype=dtype,
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    full_finetuning=False,  # we train LoRA adapters, not the full model
)

# Attach LoRA adapters to the attention and MLP projection layers
model = FastLanguageModel.get_peft_model(
    model,
    r=8,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth's memory-efficient checkpointing
    random_state=3407,
)
```
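As a quick sanity check (not part of the original steps), the underlying PEFT model exposes `print_trainable_parameters()`, which should confirm that only a tiny fraction of the 20B weights are trainable:

```python
# Sanity check: only the LoRA adapter weights should be trainable.
# print_trainable_parameters() is provided by the underlying PEFT model.
model.print_trainable_parameters()
# Expect a few million trainable parameters against ~20B total (well under 1%).
```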
We load the GSM8K dataset and convert it into ShareGPT conversation format.
```python
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

# Load the GSM8K training split
ds = load_dataset("openai/gsm8k", "main")["train"]

# Convert each (question, answer) pair into a ShareGPT-style conversation
def convert_to_sharegpt(example):
    return {
        "conversations": [
            {"from": "user", "value": example["question"]},
            {"from": "assistant", "value": example["answer"]},
        ]
    }

ds = ds.map(convert_to_sharegpt)
ds = standardize_sharegpt(ds)  # normalize roles/keys to the standard schema

# Render each conversation into a single training string via the chat template
def formatting_prompts_func(examples):
    texts = [
        tokenizer.apply_chat_template(
            convo, tokenize=False, add_generation_prompt=False
        )
        for convo in examples["conversations"]
    ]
    return {"text": texts}

ds = ds.map(formatting_prompts_func, batched=True)
```
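Before training, it is worth printing one rendered example to confirm the chat template was applied as expected (a quick check, not one of the original steps):

```python
# Inspect the first formatted training example (truncated for readability)
print(ds[0]["text"][:500])
```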
We configure supervised fine-tuning with TRL's SFTTrainer.
```python
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    args=SFTConfig(
        per_device_train_batch_size=1,   # keep per-GPU memory low
        gradient_accumulation_steps=4,   # effective batch size = 4
        warmup_steps=5,
        max_steps=30,                    # short demo run; raise for real training
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",              # 8-bit AdamW to save optimizer memory
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)

# Start training
train_result = trainer.train()
```
We save the fine-tuned model and tokenizer into the `outputs/` directory.
```python
# Save model and tokenizer
trainer.save_model("outputs")
tokenizer.save_pretrained("outputs")

# Optional: print the per-step training logs (loss, learning rate, etc.)
metrics = trainer.state.log_history
print(metrics)
```
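If only the loss curve is of interest, the `loss` entries can be filtered out of `log_history` (a small convenience snippet; `trainer` is assumed to still be in scope):

```python
# Extract just the per-step training loss values from the log history
losses = [entry["loss"] for entry in trainer.state.log_history if "loss" in entry]
print("Final training loss:", losses[-1] if losses else "n/a")
```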
We test the fine-tuned model with a sample math reasoning question.
```python
messages = [
    {"role": "user", "content": "If you have 3 apples and eat 1, how many remain?"}
]

# add_generation_prompt=True appends the assistant turn marker so the
# model knows to start answering
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
Example usage after training:
```python
from unsloth import FastLanguageModel

# Reload the fine-tuned adapter + tokenizer from the outputs/ directory
model, tokenizer = FastLanguageModel.from_pretrained("outputs", load_in_4bit=True)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

messages = [
    {"role": "user", "content": "If you have 3 apples and eat 1, how many remain?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print("Model:", response)
```
- LoRA + quantization make training 20B models feasible on a single GPU (see the rough memory math below)
- GSM8K is an excellent benchmark for reasoning-focused fine-tuning
- Unsloth greatly simplifies efficient fine-tuning with minimal code changes
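A rough back-of-the-envelope check on that first point: storing 20B parameters at 4 bits each takes about 20 × 10⁹ × 0.5 bytes ≈ 10 GB of weight memory, versus roughly 40 GB in fp16. The r=8 LoRA adapters add only a few million trainable parameters, and the 8-bit optimizer keeps their state small, so the setup fits on a single high-memory GPU (approximate figures; actual usage also depends on sequence length and activation memory).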
- Unsloth for efficient LLM training
- Hugging Face ecosystem for datasets + transformers
- OpenAI GSM8K dataset for high-quality math reasoning tasks
✅ Successfully fine-tuned GPT-OSS-20B on GSM8K using LoRA. 📈 Expected improvements in step-by-step reasoning for math problems. 🔮 Future Work:
- Train for full epochs instead of demo steps
- Evaluate on the GSM8K test set (a simple evaluation sketch follows this list)
- Experiment with higher LoRA ranks (`r=16` or `r=32`)
- Compare with baseline models (GPT-3.5, LLaMA, etc.)
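As a starting point for that evaluation, here is a minimal sketch (not from the original notebook). It assumes the GSM8K convention that gold answers end with `#### <number>`, uses greedy decoding for reproducibility, and assumes `model` and `tokenizer` are already loaded as above:

```python
import re
from datasets import load_dataset

# Minimal exact-match evaluation on a small GSM8K test subset
test_ds = load_dataset("openai/gsm8k", "main")["test"].select(range(50))

def extract_answer(text):
    # GSM8K gold answers end with "#### <number>"; fall back to the last number
    match = re.search(r"####\s*([-\d,\.]+)", text)
    if match:
        return match.group(1).replace(",", "")
    numbers = re.findall(r"-?\d[\d,]*\.?\d*", text)
    return numbers[-1].replace(",", "") if numbers else None

correct = 0
for example in test_ds:
    messages = [{"role": "user", "content": example["question"]}]
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
    prediction = tokenizer.decode(
        outputs[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    if extract_answer(prediction) == extract_answer(example["answer"]):
        correct += 1

print(f"Accuracy on {len(test_ds)} test questions: {correct / len(test_ds):.1%}")
```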