GPT-OSS 120B Installation Guide: Deploy OpenAI Open-Weight Models
Welcome to the GPT-OSS series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. This guide covers installing and deploying both the GPT-OSS 120B and 20B models.
Model Overview
OpenAI released these open-weight models in two sizes:
GPT-OSS 120B
- Parameters: 117B total (5.1B active)
- Use Case: Production, general purpose, high reasoning
- Hardware: Fits on a single 80GB GPU (e.g., NVIDIA H100)
- Best For: Enterprise applications, complex reasoning tasks
GPT-OSS 20B
- Parameters: 21B total (3.6B active)
- Use Case: Lower latency, local or specialized
- Hardware: Runs within 16GB of memory
- Best For: Local deployment, consumer hardware
Important: Both models were trained on the harmony response format and should only be used with it; they will not work correctly otherwise.
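The sketch below shows, in simplified form, how a conversation is laid out with harmony's special tokens. It uses only the <|start|>, <|message|>, and <|end|> delimiters, plus an open <|start|>assistant header that the model completes. The real format also carries channels and system-message metadata such as the knowledge cutoff and valid channels. In practice, let the Transformers chat template or OpenAI's openai-harmony library render prompts for you. The helper render_harmony_prompt is hypothetical.

```python
def render_harmony_prompt(messages):
    """Render chat messages into a simplified harmony-style prompt string.

    Hypothetical helper for illustration only: it emits
    <|start|>{role}<|message|>{content}<|end|> for each message and then
    opens an assistant turn for the model to complete. It omits the
    channel and metadata fields that the real harmony format uses.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|start|>{msg['role']}<|message|>{msg['content']}<|end|>")
    # Leave an open assistant header so the model generates the reply.
    parts.append("<|start|>assistant")
    return "".join(parts)


prompt = render_harmony_prompt([
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "What is 2 + 2?"},
])
print(prompt)
```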
Key Features & Highlights
Core Features
- Permissive Apache 2.0 license: Build freely without copyleft restrictions
- Configurable reasoning effort: Adjust reasoning (low, medium, high)
- Full chain-of-thought: Complete access to the reasoning process
- Fine-tunable: Customize models for specific use cases
Advanced Capabilities
- Agentic capabilities: Function calling, web browsing, Python execution
- Native MXFP4 quantization: Optimized for memory efficiency
- Structured outputs: JSON and schema-based responses
- Production ready: Ideal for commercial deployment
Installation Methods
Choose the installation method that best fits your needs and hardware setup:
- Transformers: Python library integration
- vLLM: High-performance serving
- Ollama: Consumer hardware
- LM Studio: GUI interface
Transformers Installation
You can use GPT-OSS 120B and 20B with Transformers. The chat template automatically applies the harmony response format.
Prerequisites
Install the necessary dependencies to set up your environment:
```bash
pip install -U transformers kernels torch
```
Basic Usage
Run the model with the following Python code:
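```python
from transformers import pipeline
import torch

model_id = "openai/gpt-oss-120b"

# torch_dtype="auto" loads weights in the dtype recorded in the model config;
# device_map="auto" places the weights on the available devices.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

# The chat template converts these messages into the harmony format.
messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
# generated_text holds the conversation; its last entry is the model's reply.
print(outputs[0]["generated_text"][-1])
```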
Transformers Serve
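Spin up an OpenAI-compatible web server, then chat with it from a second terminal:
```bash
# Start the server
transformers serve
# In another terminal, open a chat session against the server at localhost:8000
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-120b
```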
vLLM Deployment
vLLM provides high-performance serving for self-hosted LLM deployments. vLLM recommends using uv for Python dependency management.
Installation
```bash
uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match
```
Start Server
The following command will automatically download the model and start the server:
```bash
vllm serve openai/gpt-oss-120b
```
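vLLM and the transformers serve command both expose an OpenAI-compatible API, so any OpenAI-style client can talk to them. The following is a minimal standard-library sketch. It assumes the server is reachable at http://localhost:8000/v1 and accepts the usual /chat/completions request. The helper names build_chat_request and chat are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the local OpenAI-compatible server.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(model, messages, max_tokens=256):
    """Build the JSON body for an OpenAI-style /chat/completions call."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}


def chat(model, messages, base_url=BASE_URL, timeout=120):
    """POST a chat request and return the assistant's reply text."""
    body = json.dumps(build_chat_request(model, messages)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    # OpenAI-style responses put the reply in choices[0].message.content.
    return data["choices"][0]["message"]["content"]
```

With the server running, chat("openai/gpt-oss-120b", [{"role": "user", "content": "Hello!"}]) returns the reply text. Using urllib keeps the example dependency-free. The official openai Python package, pointed at the same base URL, is a common alternative.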
Ollama Installation
Ollama is a convenient way to run GPT-OSS on consumer hardware such as a local workstation or laptop.
Install Ollama
First, install Ollama from ollama.ai
Download and Run Models
```bash
# GPT-OSS 120B
ollama pull gpt-oss:120b
ollama run gpt-oss:120b
# GPT-OSS 20B (for lower-end hardware)
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
```
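Besides the interactive ollama run prompt, Ollama serves a local REST API, by default on port 11434. Below is a minimal standard-library sketch of a non-streaming request to its /api/chat endpoint. It assumes the Ollama server is running and the model has already been pulled. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address


def build_ollama_request(model, prompt):
    # stream=False asks Ollama for a single JSON object instead of chunks.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def parse_ollama_reply(data):
    # Non-streaming /api/chat responses carry the reply under message.content.
    return data["message"]["content"]


def ask_ollama(model, prompt, url=OLLAMA_URL, timeout=300):
    body = json.dumps(build_ollama_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_ollama_reply(json.load(resp))
```

For example, ask_ollama("gpt-oss:20b", "Summarize MXFP4 in one sentence.") sends one request and returns the reply text.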
LM Studio Setup
LM Studio provides a user-friendly GUI for running open-weight models such as GPT-OSS locally.
Download Model
Download the models with LM Studio's lms command-line tool:
```bash
# GPT-OSS 120B
lms get openai/gpt-oss-120b
# GPT-OSS 20B
lms get openai/gpt-oss-20b
```
Direct Model Download
You can download the model weights directly from Hugging Face:
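```bash
# Download the original checkpoint using the Hugging Face CLI
huggingface-cli download openai/gpt-oss-120b --include "original/*" --local-dir gpt-oss-120b/
# Install the gpt-oss package
pip install gpt-oss
# Start a chat session, pointing at the downloaded checkpoint directory
python -m gpt_oss.chat gpt-oss-120b/original/
```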
Reasoning Levels Configuration
You can choose one of three reasoning levels to suit your task:
- Low: Fast responses for general dialogue
- Medium: Balanced speed and detail
- High: Deep and detailed analysis
Set the reasoning level in the system prompt, for example:
```python
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Your question here"},
]
```
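If you switch levels often, a small helper keeps the system prompt consistent. This sketch follows the "Reasoning: <level>" convention above and rejects anything other than low, medium, or high. with_reasoning is a hypothetical name. The returned list can be passed directly to the pipe from the Transformers Basic Usage example.

```python
VALID_LEVELS = ("low", "medium", "high")


def with_reasoning(level, user_content):
    """Build a message list that sets the reasoning level via the system prompt."""
    level = level.lower()
    if level not in VALID_LEVELS:
        raise ValueError(f"reasoning level must be one of {VALID_LEVELS}, got {level!r}")
    return [
        {"role": "system", "content": f"Reasoning: {level}"},
        {"role": "user", "content": user_content},
    ]


messages = with_reasoning("high", "Prove that the square root of 2 is irrational.")
```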
Tool Use & Agentic Capabilities
The GPT-OSS models are designed for:
- Web browsing: Using built-in browsing tools for real-time information
- Function calling: Executing functions with defined schemas
- Python code execution: Running and debugging Python code
- Agentic operations: Complex multi-step browser tasks
- Structured outputs: JSON and schema-based responses
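Function calling generally works in two halves. You describe each function with a JSON Schema, and your code executes whatever call the model emits, then returns the result as a tool message. The sketch below uses the OpenAI chat-completions tools format accepted by OpenAI-compatible servers. Whether a given server returns parsed tool calls for GPT-OSS depends on how it is configured. get_weather and its stubbed result are hypothetical.

```python
import json

# Hypothetical tool, described in the OpenAI-style "tools" JSON Schema format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current temperature for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city):
    # Stub implementation; a real tool would query a weather service.
    return {"city": city, "temperature_c": 21}


REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call):
    """Execute one OpenAI-style tool call and build the tool-result message."""
    name = tool_call["function"]["name"]
    # Arguments arrive as a JSON-encoded string, not a dict.
    args = json.loads(tool_call["function"]["arguments"])
    result = REGISTRY[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }
```

To use it, send TOOLS in the tools field of a chat-completions request. When the reply contains message.tool_calls, pass each entry to run_tool_call and append the returned message to the conversation before asking the model to continue.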
Fine-tuning Guide
Both GPT-OSS models can be fine-tuned for specialized use cases:
- GPT-OSS 120B: Can be fine-tuned on a single H100 node
- GPT-OSS 20B: Can be fine-tuned on consumer hardware
Hardware Requirements
| Model | Memory Required | Recommended GPU | Use Case |
|---|---|---|---|
| GPT-OSS 120B | 80GB+ VRAM | H100, A100 80GB | Production, Enterprise |
| GPT-OSS 20B | 16GB+ RAM | RTX 4090, RTX 3090 | Local, Development |
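A back-of-the-envelope calculation shows where these memory figures come from. This sketch counts weights only and assumes every parameter is stored in MXFP4. MXFP4 packs blocks of 32 four-bit values with one shared 8-bit scale, which works out to 4.25 bits per parameter. In the released checkpoints only the mixture-of-experts weights use MXFP4, and the KV cache and activations need additional memory, so treat the results as lower bounds.

```python
# Bits per parameter: MXFP4 packs 32 four-bit values with one shared 8-bit scale.
MXFP4_BITS = 4 + 8 / 32  # 4.25
BF16_BITS = 16


def weight_gb(num_params, bits_per_param):
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for name, params in [("gpt-oss-120b", 117e9), ("gpt-oss-20b", 21e9)]:
    print(f"{name}: MXFP4 ~{weight_gb(params, MXFP4_BITS):.1f} GB, "
          f"BF16 ~{weight_gb(params, BF16_BITS):.1f} GB")
```

This gives roughly 62 GB for the 120B weights in MXFP4, versus about 234 GB in BF16. The 20B weights come to about 11 GB, versus 42 GB in BF16. Both are consistent with the 80GB and 16GB figures above.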
Performance Optimization Tips
- Keep the models' native MXFP4 weights rather than upcasting them, to save memory
- Enable GPU acceleration when available
- Configure batch sizes that fit your hardware's memory
- Monitor memory usage with nvidia-smi
- Use gradient checkpointing when fine-tuning on limited memory
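For scripted monitoring, nvidia-smi can print per-GPU memory as CSV. The sketch below uses the --query-gpu=index,memory.used,memory.total and --format=csv,noheader,nounits options, with values reported in MiB. parse_memory_csv and query_gpu_memory are hypothetical helpers. The parser is kept separate so it can be tested without a GPU.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse 'index, used, total' lines (MiB) into dicts with a usage fraction."""
    gpus = []
    for line in text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({
            "index": index,
            "used_mib": used,
            "total_mib": total,
            "fraction": used / total,
        })
    return gpus


def query_gpu_memory():
    # Requires an NVIDIA driver with nvidia-smi on PATH.
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory_csv(out)
```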