
Pavansomisetty21/Supervised-Fine-Tuning-of-GPT-OSS-20B-on-OpenAI-s-gsm8k-reasoning-with-LoRA


Supervised-Fine-Tuning-of-GPT-OSS-20B-on-OpenAI-s-gsm8k-reasoning-with-LoRA

📌 Overview

This project demonstrates efficient supervised fine-tuning of the GPT-OSS-20B model using the OpenAI GSM8K dataset. We leverage LoRA (Low-Rank Adaptation) with Unsloth to make fine-tuning large models practical on limited GPU resources.

The goal is to enhance mathematical reasoning and step-by-step problem solving in large language models, without requiring full-scale retraining.
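To make the "without full-scale retraining" point concrete, here is a rough sketch of how few weights a LoRA adapter adds to a single linear layer. The 4096 x 4096 layer shape is a hypothetical value chosen for illustration, not the actual GPT-OSS-20B dimensions; the rank r=8 matches the setting used later in the notebook.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA learns two small matrices, A (r x d_in) and B (d_out x r),
    while the original weight stays frozen.
    """
    return r * d_in + d_out * r


# Hypothetical layer shape, for illustration only.
d_in, d_out, r = 4096, 4096, 8

full = d_in * d_out
lora = lora_param_count(d_in, d_out, r)
print(f"full layer: {full:,} weights")
print(f"LoRA (r={r}): {lora:,} trainable weights ({lora / full:.2%} of the layer)")
```

Doubling the rank doubles the adapter size, which is the trade-off behind experimenting with higher ranks.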


✨ Features

  • 🧮 Dataset: GSM8K, about 7.4k high-quality grade-school math problems (train split)
  • ⚡ Model: GPT-OSS-20B with 4-bit quantization for memory efficiency
  • 🔧 Fine-tuning: LoRA adapters applied to key transformer layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
  • 🛠️ Training Framework: TRL SFTTrainer + Hugging Face Datasets
  • 🔋 Memory Optimization: Gradient checkpointing (unsloth mode) and 8-bit optimizer
  • 🎯 Objective: Supervised Fine-Tuning (SFT) for improved reasoning and problem-solving
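As a back-of-envelope illustration of why 4-bit quantization matters here, the sketch below estimates the memory needed for the weights alone at different precisions. It assumes a round 20 billion parameters and ignores quantization metadata, activations, optimizer state, and the LoRA weights themselves, so treat the numbers as rough lower bounds rather than measurements.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 20e9  # assumption: a round 20 billion parameters

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
```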

📖 Notebook Walkthrough

1. Importing Libraries & Model Setup

  • Import PyTorch and Unsloth to handle model loading and optimization.

  • Load GPT-OSS-20B with:

    • 4-bit quantization (saves memory)
    • max sequence length = 1024 tokens
    • LoRA adapters applied on key transformer layers (q_proj, k_proj, v_proj, etc.)
    • Gradient checkpointing (unsloth) for efficient training.

2. Loading the GSM8K Dataset

  • Load GSM8K train split (7.4k math word problems).
  • Convert data into ShareGPT-style conversations (user → question, assistant → answer).
  • Apply the model's chat template to convert into proper training text format.
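The conversion step can be illustrated on a single record without loading the dataset. In the sketch below, `to_role_content` is a simplified, hypothetical stand-in for the standardization step (it only renames `from`/`value` to `role`/`content`), and the sample record is made up in the GSM8K layout rather than taken from the dataset.

```python
def convert_to_sharegpt(example: dict) -> dict:
    """Wrap one question/answer pair as a two-turn conversation."""
    return {
        "conversations": [
            {"from": "user", "value": example["question"]},
            {"from": "assistant", "value": example["answer"]},
        ]
    }


def to_role_content(conversation: list[dict]) -> list[dict]:
    """Simplified stand-in for standardization: rename from/value to role/content."""
    return [{"role": turn["from"], "content": turn["value"]} for turn in conversation]


# Made-up record in the GSM8K layout (fields: question, answer).
example = {
    "question": "Tom has 3 apples and buys 2 more. How many apples does he have?",
    "answer": "Tom has 3 + 2 = <<3+2=5>>5 apples.\n#### 5",
}

messages = to_role_content(convert_to_sharegpt(example)["conversations"])
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list of `role`/`content` messages is the shape that a tokenizer chat template expects as input.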

3. Training Setup with TRL

  • Use TRL's SFTTrainer for supervised fine-tuning.

  • Configure training with:

    • Batch size = 1 (with gradient accumulation)
    • Optimizer = AdamW (8-bit)
    • Learning rate = 2e-4
    • Training steps = 30 (demo run)
  • Run training with trainer.train().
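To put the 30-step demo run in perspective, the arithmetic below works out how much data the model actually sees. It assumes a single GPU, the gradient accumulation value of 4 used in the notebook's SFTConfig, and a train split of 7,473 examples; adjust these if your setup differs.

```python
per_device_batch_size = 1
gradient_accumulation_steps = 4   # value used in the notebook's SFTConfig
max_steps = 30
train_examples = 7_473            # assumed size of the GSM8K train split

effective_batch_size = per_device_batch_size * gradient_accumulation_steps
examples_seen = effective_batch_size * max_steps
epoch_fraction = examples_seen / train_examples
steps_per_epoch = -(-train_examples // effective_batch_size)  # ceiling division

print(f"effective batch size: {effective_batch_size}")
print(f"examples seen in {max_steps} steps: {examples_seen}")
print(f"fraction of one epoch: {epoch_fraction:.1%}")
print(f"optimizer steps for a full epoch: {steps_per_epoch}")
```

Under these assumptions the demo covers only a small slice of the data, so it is best read as a smoke test of the pipeline rather than a full training run.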

4. Saving the Model

  • Save fine-tuned model + tokenizer into outputs/ folder.
  • Print training logs (loss, metrics).

5. Running Inference

  • Format a user query with apply_chat_template.
  • Generate a response with sampling (temperature=0.7, top_p=0.9).
  • Decode and print the model's prediction.
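For readers unfamiliar with the two sampling knobs, here is a simplified, library-free sketch of what temperature and top_p do to a next-token distribution. The token scores are hypothetical, and real implementations operate on tensors over the full vocabulary and may differ in edge cases.

```python
import math
import random


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most-likely tokens whose total mass reaches top_p."""
    kept, mass = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    return {tok: p / mass for tok, p in kept.items()}  # renormalize to sum to 1


# Hypothetical next-token scores, for illustration only.
logits = {"2": 4.0, "two": 2.5, "3": 1.0, "banana": -2.0}

probs = softmax_with_temperature(logits, temperature=0.7)
nucleus = top_p_filter(probs, top_p=0.9)
rng = random.Random(0)
choice = rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]
print("candidates:", sorted(nucleus), "sampled:", choice)
```

With top_p=0.9 the unlikely tail is cut off before sampling, which keeps some variety in the output while avoiding very improbable tokens.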

✅ In summary:

  1. Load GPT-OSS-20B + LoRA
  2. Prepare GSM8K dataset in chat format
  3. Fine-tune using SFTTrainer
  4. Save model & tokenizer
  5. Run inference on math questions

📖 Notebook Walkthrough: Code

1. Importing Libraries & Model Setup

import torch
from unsloth import FastLanguageModel

max_seq_length = 1024
dtype = None

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    dtype=dtype,
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    full_finetuning=False,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
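The `r` and `lora_alpha` arguments control the size and strength of the adapter. As a toy illustration (not Unsloth's implementation), the sketch below computes a LoRA forward pass, W x + (lora_alpha / r) * B (A x), on tiny hand-written matrices. The shapes and the "trained" values are made up; the scale factor of 2 matches the notebook's lora_alpha=16 with r=8.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Compute W x + (lora_alpha / r) * B (A x), with W kept frozen."""
    scale = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# Toy shapes: a 2x3 frozen weight with a rank-1 adapter (the notebook uses r=8).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, 0.5, 0.5]]          # r x d_in
B_init = [[0.0], [0.0]]        # d_out x r, zero at initialization
B_trained = [[1.0], [-1.0]]    # made-up values standing in for a trained adapter
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_init, x, r=1, lora_alpha=2))     # equals W x
print(lora_forward(W, A, B_trained, x, r=1, lora_alpha=2))  # shifted by the adapter
```

Because B starts at zero, the adapted model initially behaves exactly like the base model, and training only moves the small A and B matrices.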

2. Loading the GSM8K Dataset

We load the GSM8K dataset and convert it into ShareGPT conversation format.

from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

# Load dataset
ds = load_dataset("openai/gsm8k", "main")["train"]

# Convert format
def convert_to_sharegpt(example):
    return {
        "conversations": [
            {"from": "user", "value": example["question"]},
            {"from": "assistant", "value": example["answer"]}
        ]
    }

ds = ds.map(convert_to_sharegpt)
ds = standardize_sharegpt(ds)

# Apply tokenizer template
def formatting_prompts_func(examples):
    texts = [
        tokenizer.apply_chat_template(
            convo, tokenize=False, add_generation_prompt=False
        )
        for convo in examples["conversations"]
    ]
    return {"text": texts}

ds = ds.map(formatting_prompts_func, batched=True)

3. Training Setup with TRL

We configure supervised fine-tuning with TRL's SFTTrainer.

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    args=SFTConfig(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=30,
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)

# Start training
train_result = trainer.train()
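With warmup_steps=5, max_steps=30, and a linear scheduler, the learning rate ramps up and then decays over the run. The function below is a minimal sketch of that shape, written from the usual definition of a linear schedule with warmup; the exact step indexing inside the trainer may differ by one.

```python
def linear_schedule_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Values taken from the SFTConfig in this section.
BASE_LR, WARMUP, TOTAL = 2e-4, 5, 30

for step in (0, 1, 5, 15, 30):
    print(f"step {step:>2}: lr = {linear_schedule_lr(step, BASE_LR, WARMUP, TOTAL):.2e}")
```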

4. Saving the Model

We save the fine-tuned model and tokenizer into the outputs/ directory.

# Save model and tokenizer
trainer.save_model("outputs")
tokenizer.save_pretrained("outputs")

# Optional: print metrics
metrics = trainer.state.log_history
print(metrics)
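Printing the raw log history can be noisy. A small helper like the hypothetical `summarize_losses` below condenses it, assuming the history is a list of dicts in which per-step entries carry a "loss" key and the final summary entry does not. The sample entries are made up purely to exercise the function and are NOT results from this run.

```python
def summarize_losses(log_history: list[dict]) -> dict:
    """Pull per-step losses out of a Trainer-style log history."""
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    if not losses:
        return {"logged_steps": 0}
    return {
        "logged_steps": len(losses),
        "first_loss": losses[0],
        "last_loss": losses[-1],
        "mean_loss": sum(losses) / len(losses),
    }


# Made-up entries in the same shape, NOT real results from this run.
sample_history = [
    {"step": 1, "loss": 2.0},
    {"step": 2, "loss": 1.5},
    {"step": 3, "loss": 1.0},
    {"train_runtime": 12.3, "step": 3},  # final summary entry has no "loss" key
]

print(summarize_losses(sample_history))
```

In the notebook, the same helper could be called on `trainer.state.log_history` after training finishes.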

5. Running Inference

We test the fine-tuned model with a sample math reasoning question.

messages = [
    {"role": "user", "content": "If you have 3 apples and eat 1, how many remain?"}
]

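# add_generation_prompt=True appends the assistant header so the model answers
# instead of continuing the user turn
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,  # temperature and top_p only take effect when sampling is enabled
    temperature=0.7,
    top_p=0.9
)

# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)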


🤖 Inference

Example usage after training:

from unsloth import FastLanguageModel

# Reload the fine-tuned model and tokenizer saved in outputs/
model, tokenizer = FastLanguageModel.from_pretrained("outputs", load_in_4bit=True)

messages = [
    {"role": "user", "content": "If you have 3 apples and eat 1, how many remain?"}
]

# add_generation_prompt=True appends the assistant header so the model starts its answer
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,  # required for temperature and top_p to have any effect
    temperature=0.7,
    top_p=0.9
)

response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print("Model:", response)

🧩 Key Takeaways

  • LoRA combined with 4-bit quantization makes fine-tuning a 20B-parameter model feasible on a single GPU
  • GSM8K is an excellent dataset for reasoning-focused fine-tuning
  • Unsloth greatly simplifies efficient fine-tuning with minimal code changes


📊 Results & Next Steps

✅ Successfully fine-tuned GPT-OSS-20B on GSM8K using LoRA (30-step demo run).

📈 Expected improvements in step-by-step reasoning for math problems; these have not yet been measured.

🔜 Future Work:

  • Train for full epochs instead of demo steps
  • Evaluate on GSM8K test set
  • Experiment with higher LoRA ranks (r=16 or 32)
  • Compare with baseline models (GPT-3.5, LLaMA, etc.)
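For the evaluation item, one common approach is exact-match accuracy on the final number. GSM8K reference answers end with a line of the form `#### <number>`; the sketch below assumes the fine-tuned model reproduces that marker, which is not guaranteed. The function names and the sample strings are hypothetical and only exercise the logic.

```python
import re


def extract_final_answer(text: str):
    """Return the number after the last '####' marker, or None if absent."""
    matches = re.findall(r"####\s*(-?[\d,]*\.?\d+)", text)
    if not matches:
        return None
    return matches[-1].replace(",", "")  # drop thousands separators


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions whose final answer string equals the reference's."""
    hits = 0
    for pred, ref in zip(predictions, references):
        gold = extract_final_answer(ref)
        if gold is not None and extract_final_answer(pred) == gold:
            hits += 1
    return hits / len(references) if references else 0.0


# Made-up model outputs and references, for illustration only.
predictions = [
    "3 + 2 = 5\n#### 5",
    "So the total is 1250.\n#### 1,250",
    "I think it is 7.\n#### 7",
    "No marker here",
]
references = ["#### 5", "#### 1250", "#### 8", "#### 4"]

print(f"exact match: {exact_match_accuracy(predictions, references):.2f}")
```

The comparison is done on strings, so "5" and "5.0" would count as a mismatch; a more forgiving evaluator could compare parsed numbers instead.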
