GPT-OSS 120B Installation Guide: Deploy OpenAI Open-Weight Models

Welcome to the GPT-OSS series - OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. This guide covers installation and deployment of both GPT-OSS 120B and 20B models.

Model Overview

OpenAI released two variants of these open-weight models:

GPT-OSS 120B

  • Parameters: 117B total (5.1B active)
  • Use Case: Production, general purpose, high reasoning
  • Hardware: Fits into a single H100 GPU
  • Best For: Enterprise applications, complex reasoning tasks

GPT-OSS 20B

  • Parameters: 21B total (3.6B active)
  • Use Case: Lower latency, local or specialized
  • Hardware: Runs within 16GB of memory
  • Best For: Local deployment, consumer hardware

Important: Both models were trained on the harmony response format and should only be used with the harmony format as they will not work correctly otherwise.
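To give a feel for what the harmony format looks like on the wire, here is a simplified sketch that frames plain messages with harmony special tokens. It is an illustration only: the helper name render_harmony_prompt is hypothetical, and the sketch leaves out channels, tool definitions, and other parts of the full format. In practice, rely on the chat template rather than hand-building prompts.

```python
# Simplified illustration of harmony-style message framing.
# Assumption: only plain system/user turns are handled; the full format also
# defines channels and tool-call messages, which are omitted here.

def render_harmony_prompt(messages):
    """Frame each message with harmony special tokens and open an assistant turn."""
    parts = []
    for msg in messages:
        parts.append(f"<|start|>{msg['role']}<|message|>{msg['content']}<|end|>")
    # A trailing assistant header asks the model to generate the next turn.
    parts.append("<|start|>assistant")
    return "".join(parts)


prompt = render_harmony_prompt([
    {"role": "system", "content": "Reasoning: low"},
    {"role": "user", "content": "What is 2 + 2?"},
])
print(prompt)
```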

Key Features & Highlights

Core Features

  • ✓Permissive Apache 2.0 license: Build freely without copyleft restrictions
  • ✓Configurable reasoning effort: Adjust reasoning (low, medium, high)
  • ✓Full chain-of-thought: Complete access to reasoning process
  • ✓Fine-tunable: Customize models for specific use cases

Advanced Capabilities

  • ✓Agentic capabilities: Function calling, web browsing, Python execution
  • ✓Native MXFP4 quantization: Optimized for memory efficiency
  • ✓Structured Outputs: JSON and schema-based responses
  • ✓Production ready: Ideal for commercial deployment

Installation Methods

Choose the installation method that best fits your needs and hardware setup:

  • Transformers: Python library integration
  • vLLM: High-performance serving
  • Ollama: Consumer hardware
  • LM Studio: GUI interface

Transformers Installation

You can use GPT-OSS 120B and 20B with Transformers. The chat template automatically applies the harmony response format.

Prerequisites

Install the necessary dependencies to set up your environment:

Bash
pip install -U transformers kernels torch

Basic Usage

Run the model with the following Python code:

Python
from transformers import pipeline
import torch

model_id = "openai/gpt-oss-120b"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

Transformers Serve

Spin up an OpenAI-compatible web server, then connect to it with the chat client:

Bash
transformers serve
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-120b

vLLM Deployment

vLLM provides high-performance serving for self-hosted LLM deployments. vLLM recommends using uv for Python dependency management.

Installation

Bash
uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match

Start Server

The following command will automatically download the model and start the server:

Bash
vllm serve openai/gpt-oss-120b
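
Once the server is up, it can be queried through its OpenAI-compatible chat completions endpoint. The sketch below uses only the Python standard library and assumes the server listens on localhost:8000 (the vLLM default); the helper names build_chat_request and chat are hypothetical.

```python
import json
import urllib.request

# Assumption: the server listens on localhost:8000 (the vLLM default).
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(model, messages, max_tokens=256, base_url=BASE_URL):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model, messages, **kwargs):
    """Send the request and return the assistant's reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(model, messages, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "openai/gpt-oss-120b",
    [{"role": "user", "content": "Explain quantum mechanics clearly and concisely."}],
)
print(request.full_url)
```

Calling chat("openai/gpt-oss-120b", messages) sends the request; building the request separately makes it easy to inspect the payload before anything goes over the network.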

Ollama Installation

If you want to run GPT-OSS on consumer hardware, Ollama is a convenient option for local machines.

Install Ollama

First, install Ollama from ollama.ai

Download and Run Models

Bash
# GPT-OSS 120B
ollama pull gpt-oss:120b
ollama run gpt-oss:120b

# GPT-OSS 20B (for lower-end hardware)
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
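
Besides the interactive prompt, a running Ollama instance can be called from code through its local REST API. The sketch below assumes Ollama is listening on its default address (localhost:11434) and that the gpt-oss:20b model has already been pulled; the helper names build_ollama_request and ask are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_ollama_request(model, prompt):
    """Build a non-streaming chat request for the local Ollama API."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model, prompt):
    """Send the request and return the reply text (needs a running Ollama instance)."""
    with urllib.request.urlopen(build_ollama_request(model, prompt)) as resp:
        return json.load(resp)["message"]["content"]


ollama_request = build_ollama_request("gpt-oss:20b", "Summarize MXFP4 in one sentence.")
print(ollama_request.full_url)
```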

LM Studio Setup

LM Studio provides a user-friendly GUI for running GPT-OSS models locally.

Download Model

Use the LM Studio command-line tool (lms) to download a model:

Bash
# GPT-OSS 120B
lms get openai/gpt-oss-120b

# GPT-OSS 20B
lms get openai/gpt-oss-20b

Direct Model Download

You can download the model weights directly from Hugging Face:

Bash
# Using Hugging Face CLI
huggingface-cli download openai/gpt-oss-120b --include "original/*" --local-dir gpt-oss-120b/

# Install GPT-OSS package
pip install gpt-oss

# Run the model from the downloaded checkpoint directory
python -m gpt_oss.chat gpt-oss-120b/original/

Reasoning Levels Configuration

Three reasoning levels are available; pick the one that suits your task:

  • Low: Fast responses for general dialogue
  • Medium: Balanced speed and detail
  • High: Deep and detailed analysis

The reasoning level can be set in the system prompt:

Python
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Your question here"}
]
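
A small helper can keep the reasoning level consistent across calls and catch typos early. This is a minimal sketch; the function name with_reasoning and the default of "medium" are choices made for illustration, not part of any library.

```python
VALID_EFFORTS = ("low", "medium", "high")


def with_reasoning(user_content, effort="medium"):
    """Return a message list whose system prompt sets the reasoning level."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return [
        {"role": "system", "content": f"Reasoning: {effort}"},
        {"role": "user", "content": user_content},
    ]


messages = with_reasoning("Prove that the square root of 2 is irrational.", effort="high")
print(messages[0]["content"])
```

The resulting list can be passed to the Transformers pipeline or to any OpenAI-compatible endpoint in place of a hand-written messages list.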

Tool Use & Agentic Capabilities

The GPT-OSS models excel at:

  • Web browsing: Using built-in browsing tools for real-time information
  • Function calling: Executing functions with defined schemas
  • Python code execution: Running and debugging Python code
  • Agentic operations: Complex multi-step browser tasks
  • Structured outputs: JSON and schema-based responses
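
As an illustration of function calling, the sketch below defines one tool in the JSON-schema style used by OpenAI-compatible chat completions endpoints and dispatches a tool call to a local Python function. The get_weather tool, its stubbed result, and the sample tool call are all hypothetical; a real application would pass the tool definition to the server and receive the tool call in the model's response.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city):
    """Stub implementation; a real tool would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# Map tool names to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(tool_call):
    """Run the function named in a tool call and return its result as a JSON string."""
    fn = tool_call["function"]
    handler = REGISTRY[fn["name"]]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON-encoded string
    return json.dumps(handler(**args))


# Made-up tool call, shaped like one returned by a chat completions response.
sample_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}
result = dispatch_tool_call(sample_call)
print(result)
```

The JSON string returned by dispatch_tool_call would then be sent back to the model as a tool message so that it can compose its final answer.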

Fine-tuning Guide

Both GPT-OSS models can be fine-tuned for specialized use cases:

  • GPT-OSS 120B: Can be fine-tuned on a single H100 node
  • GPT-OSS 20B: Can be fine-tuned on consumer hardware

Hardware Requirements

Model | Memory Required | Recommended GPU | Use Case
GPT-OSS 120B | 80GB+ VRAM | H100, A100 80GB | Production, Enterprise
GPT-OSS 20B | 16GB+ RAM | RTX 4090, RTX 3090 | Local, Development
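
A back-of-the-envelope calculation shows why these memory figures are plausible. MXFP4 stores 4-bit values with one shared 8-bit scale per block of 32 elements, or about 4.25 bits per parameter. The sketch below makes the simplifying assumption that every parameter is stored in MXFP4; in practice some weights are kept at higher precision, and activations and the KV cache need additional room, so treat the result as a lower bound.

```python
# MXFP4: 4-bit elements plus one shared 8-bit scale per 32-element block.
MXFP4_BLOCK_SIZE = 32
MXFP4_BITS_PER_PARAM = 4 + 8 / MXFP4_BLOCK_SIZE  # 4.25 bits


def weight_memory_gb(total_params, bits_per_param=MXFP4_BITS_PER_PARAM):
    """Estimate weight storage in GB (10^9 bytes).

    Assumption: all parameters use the same bit width, which understates
    the real footprint.
    """
    return total_params * bits_per_param / 8 / 1e9


for name, params in [("GPT-OSS 120B", 117e9), ("GPT-OSS 20B", 21e9)]:
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB of weights")
```

Under these assumptions the weights alone come to roughly 62 GB for the 120B model and 11 GB for the 20B model, which leaves headroom within 80 GB and 16 GB respectively.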

Performance Optimization Tips

  • Use MXFP4 quantization for memory efficiency
  • Enable GPU acceleration when available
  • Configure appropriate batch sizes for your hardware
  • Monitor memory usage with nvidia-smi
  • Use gradient checkpointing for fine-tuning on limited memory
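
Memory monitoring with nvidia-smi can also be scripted. The sketch below runs nvidia-smi in CSV query mode and parses per-GPU memory usage in MiB; the function names are hypothetical, and the sample text is made up so the parser can be tried without a GPU.

```python
import subprocess

# nvidia-smi query mode: one CSV row per GPU, values in MiB, no header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse rows of 'index, used, total' into a list of dictionaries."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append({"index": int(index), "used_mib": int(used), "total_mib": int(total)})
    return rows


def read_gpu_memory():
    """Query the local GPUs (requires nvidia-smi on the PATH)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


# Made-up sample output for two GPUs, used here instead of a live query.
sample = "0, 61234, 81559\n1, 512, 81559\n"
for gpu in parse_gpu_memory(sample):
    pct = 100 * gpu["used_mib"] / gpu["total_mib"]
    print(f"GPU {gpu['index']}: {gpu['used_mib']} / {gpu['total_mib']} MiB ({pct:.0f}%)")
```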

Next Steps

Now that you have GPT-OSS running, explore our other resources: