GPT-OSS 120B Installation Guide: Deploy OpenAI Open-Weight Models

Welcome to the GPT-OSS series - OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. This guide covers installation and deployment of both GPT-OSS 120B and 20B models.

Model Overview

The series comes in two flavors of open-weight models:

GPT-OSS 120B

Parameters: 117B total (5.1B active)
Use Case: Production, general purpose, high reasoning
Hardware: Fits on a single 80GB GPU such as the NVIDIA H100
Best For: Enterprise applications, complex reasoning tasks

GPT-OSS 20B

Parameters: 21B total (3.6B active)
Use Case: Lower latency, local or specialized
Hardware: Runs within 16GB of memory
Best For: Local deployment, consumer hardware
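
The gap between total and active parameters comes from the mixture-of-experts design: only some of the experts run for each token. As a rough illustration using the figures listed above (the helper name `active_fraction` is ours, not part of any library), the share of parameters used per token can be worked out like this:

```python
# Parameter counts in billions, taken from the model overview above.
MODELS = {
    "gpt-oss-120b": {"total_b": 117.0, "active_b": 5.1},
    "gpt-oss-20b": {"total_b": 21.0, "active_b": 3.6},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters used per token (hypothetical helper)."""
    return active_b / total_b


for name, p in MODELS.items():
    share = active_fraction(p["total_b"], p["active_b"])
    print(f"{name}: {share:.1%} of parameters active per token")
```

The small active share is why the larger model can keep per-token compute low even though all weights still have to fit in memory.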

Important: Both models were trained on the harmony response format and should only be used with the harmony format as they will not work correctly otherwise.

Key Features & Highlights

Core Features

✓ Permissive Apache 2.0 license: Build freely without copyleft restrictions
✓ Configurable reasoning effort: Adjust reasoning (low, medium, high)
✓ Full chain-of-thought: Complete access to reasoning process
✓ Fine-tunable: Customize models for specific use cases

Advanced Capabilities

✓ Agentic capabilities: Function calling, web browsing, Python execution
✓ Native MXFP4 quantization: Optimized for memory efficiency
✓ Structured Outputs: JSON and schema-based responses
✓ Production ready: Ideal for commercial deployment

Installation Methods

Choose the installation method that best fits your needs and hardware setup:

Transformers

Python library integration

vLLM

High-performance serving

Ollama

Consumer hardware

LM Studio

GUI interface

Transformers Installation

You can use GPT-OSS 120B and 20B with Transformers. The chat template automatically applies the harmony response format.

Prerequisites

Install the necessary dependencies to set up your environment:

Bash

pip install -U transformers kernels torch

Basic Usage

Run the model with the following Python code:

Python

from transformers import pipeline
import torch

model_id = "openai/gpt-oss-120b"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
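
With chat-style input, the pipeline returns the whole conversation, and the last entry is the newly generated assistant message as a dictionary. The sketch below pulls out only the text. Note that `sample_outputs` is a hand-written stand-in for real pipeline output and `last_message_text` is a hypothetical helper, so treat this as an illustration of the data shape, not of actual model output.

```python
def last_message_text(outputs: list) -> str:
    """Return the content of the final message in the first generation."""
    conversation = outputs[0]["generated_text"]
    final_message = conversation[-1]
    return final_message["content"]


# Hand-written stand-in mirroring the structure returned for chat input.
sample_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
            {"role": "assistant", "content": "(model reply would appear here)"},
        ]
    }
]

print(last_message_text(sample_outputs))
```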

Transformers Serve

Spin up an OpenAI-compatible webserver:

Bash

# Start the server
transformers serve

# In a second terminal, open a chat session against it
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-120b

vLLM Deployment

vLLM provides high-performance serving for self-hosted LLM deployments. vLLM recommends using uv for Python dependency management.

Installation

Bash

uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match

Start Server

The following command will automatically download the model and start the server:

Bash

vllm serve openai/gpt-oss-120b
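
Once the server is up, it can be queried over HTTP like any OpenAI-compatible endpoint. The following is a minimal sketch using only the standard library. It assumes the server is reachable at `http://localhost:8000/v1` (adjust if you changed the host or port), and the helper names `build_chat_request` and `send_chat_request` are ours.

```python
import json
import urllib.request

# Assumption: the server listens on the default local address.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(model: str, messages: list, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "openai/gpt-oss-120b",
    [{"role": "user", "content": "Explain quantum mechanics clearly and concisely."}],
)
print(request.full_url)
# With the server running: print(send_chat_request(request))
```

Building and sending are kept separate so the request can be inspected without a running server.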

Ollama Installation

If you want to run GPT-OSS on consumer hardware, you can use Ollama, which downloads and runs the models locally.

Install Ollama

First, install Ollama from ollama.ai

Download and Run Models

Bash

# GPT-OSS 120B
ollama pull gpt-oss:120b
ollama run gpt-oss:120b

# GPT-OSS 20B (for lower-end hardware)
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
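
Besides the interactive `ollama run` session, a pulled model can be called from code through Ollama's local HTTP API. This is a sketch under the assumption that Ollama is running on its default port 11434. The helper names are hypothetical, and the prompt is only an example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_ollama_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming chat payload for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream
    }


def ask_ollama(model: str, prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_ollama_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["message"]["content"]


payload = build_ollama_payload("gpt-oss:20b", "Why is the sky blue?")
print(json.dumps(payload, indent=2))
# With Ollama running: print(ask_ollama("gpt-oss:20b", "Why is the sky blue?"))
```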

LM Studio Setup

LM Studio provides a user-friendly GUI for running open-weight models such as GPT-OSS locally.

Download Model

Use the following commands with LM Studio's lms command-line tool:

Bash

# GPT-OSS 120B
lms get openai/gpt-oss-120b

# GPT-OSS 20B
lms get openai/gpt-oss-20b

Direct Model Download

You can download the model weights directly from Hugging Face:

Bash

# Using Hugging Face CLI
huggingface-cli download openai/gpt-oss-120b --include "original/*" --local-dir gpt-oss-120b/

# Install GPT-OSS package
pip install gpt-oss

# Run the model from the downloaded weights
python -m gpt_oss.chat gpt-oss-120b/original/

Reasoning Levels Configuration

Choose one of three reasoning levels to suit your task:

Low

Fast responses for general dialogue

Medium

Balanced speed and detail

High

Deep and detailed analysis

The reasoning level is set in the system prompt:

Python

messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Your question here"}
]
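
To avoid typos in the level name, the system message can be generated by a small helper. The sketch below is one way to do it. `with_reasoning` is a hypothetical function, and it assumes the incoming message list does not already contain a system message.

```python
REASONING_LEVELS = ("low", "medium", "high")


def with_reasoning(messages: list, level: str) -> list:
    """Return a new list with a 'Reasoning: <level>' system message first."""
    if level not in REASONING_LEVELS:
        raise ValueError(f"level must be one of {REASONING_LEVELS}, got {level!r}")
    system_message = {"role": "system", "content": f"Reasoning: {level}"}
    # Build a new list so the caller's messages are left unchanged.
    return [system_message] + list(messages)


messages = with_reasoning(
    [{"role": "user", "content": "Your question here"}],
    level="high",
)
print(messages[0])
```

The resulting list can be passed to the pipeline or to a chat endpoint in place of a hand-written one.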

Tool Use & Agentic Capabilities

The GPT-OSS models excel at:

Web browsing: Using built-in browsing tools for real-time information
Function calling: Executing functions with defined schemas
Python code execution: Running and debugging Python code
Agentic operations: Complex multi-step browser tasks
Structured outputs: JSON and schema-based responses
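
Function calling generally works in two halves: you describe each function with a schema, and your code executes whatever call the model requests. The sketch below illustrates the second half with a hand-written tool call. The schema layout follows the style used by OpenAI-compatible chat APIs, which we assume your serving stack exposes. `get_weather` and `run_tool_call` are hypothetical names, and the weather result is a placeholder.

```python
import json

# A tool definition in the JSON-schema style used by OpenAI-compatible chat APIs.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> str:
    """Hypothetical tool: a real one would query a weather service."""
    return f"Sunny in {city}"


# Map tool names to the Python functions that implement them.
TOOLS = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> str:
    """Look up the requested function and call it with the decoded arguments."""
    function = tool_call["function"]
    handler = TOOLS[function["name"]]
    arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
    return handler(**arguments)


# Hand-written example of the kind of tool call a model might emit.
example_call = {
    "function": {"name": "get_weather", "arguments": json.dumps({"city": "Paris"})}
}
print(run_tool_call(example_call))
```

In a real loop, the returned string would be sent back to the model as a tool result so it can compose the final answer.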

Fine-tuning Guide

Both GPT-OSS models can be fine-tuned for specialized use cases:

GPT-OSS 120B

Can be fine-tuned on a single H100 node

GPT-OSS 20B

Can be fine-tuned on consumer hardware

Hardware Requirements

Model	Memory Required	Recommended GPU	Use Case
GPT-OSS 120B	80GB+ VRAM	H100, A100 80GB	Production, Enterprise
GPT-OSS 20B	16GB+ RAM	RTX 4090, RTX 3090	Local, Development
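
A back-of-the-envelope estimate shows why these memory figures are plausible. The sketch assumes, as a simplification, that every weight is stored in MXFP4 at 4.25 bits per parameter (4-bit values plus one shared 8-bit scale per block of 32). Real checkpoints may keep some tensors in higher precision, and the estimate leaves out the KV cache and activations, so read the result as a lower bound. `estimate_weight_gb` is a hypothetical helper.

```python
# Assumption: every weight is stored in MXFP4, which uses 4-bit values plus
# one shared 8-bit scale per block of 32 values (4 + 8/32 = 4.25 bits each).
BITS_PER_PARAM_MXFP4 = 4 + 8 / 32


def estimate_weight_gb(total_params: float, bits_per_param: float = BITS_PER_PARAM_MXFP4) -> float:
    """Estimate weight storage in gigabytes (10**9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


for name, params in [("GPT-OSS 120B", 117e9), ("GPT-OSS 20B", 21e9)]:
    print(f"{name}: about {estimate_weight_gb(params):.1f} GB of weights")
```

Under these assumptions the weights come to roughly 62 GB and 11 GB, which leaves some headroom within 80GB and 16GB respectively.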

Performance Optimization Tips

Use MXFP4 quantization for memory efficiency
Enable GPU acceleration when available
Configure appropriate batch sizes for your hardware
Monitor memory usage with nvidia-smi
Use gradient checkpointing for fine-tuning on limited memory
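
Memory monitoring with nvidia-smi can also be scripted. The sketch below parses the tool's CSV query output into numbers. The `sample` string is made up for illustration, and the helper names are ours. `read_gpu_memory` is defined but not called here, since it needs a machine with an NVIDIA driver installed.

```python
import subprocess

# Query used and total memory per GPU as bare CSV numbers (values in MiB).
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv,noheader,nounits"]


def parse_memory_csv(text: str) -> list:
    """Parse 'used, total' lines (in MiB) into one (used, total) tuple per GPU."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(value) for value in line.split(","))
        rows.append((used, total))
    return rows


def read_gpu_memory() -> list:
    """Run nvidia-smi and return per-GPU memory usage; needs an NVIDIA driver."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)


# Made-up sample output for a single GPU, used here instead of a live query.
sample = "61234, 81559\n"
for index, (used, total) in enumerate(parse_memory_csv(sample)):
    print(f"GPU {index}: {used} / {total} MiB ({used / total:.0%})")
```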

Next Steps

Now that you have GPT-OSS running, explore our other resources:

Additional Resources

Related Articles

Open Source ChatGPT Alternatives: Complete Guide 2025

GPT-5 Overlooked Details: The Crucial Information You Need to Know

ChatGPT OSS vs OpenAI: Complete Comparison Guide