NVIDIA and OpenAI Unveil Lightning-Fast Open Reasoning Models
NVIDIA and OpenAI have announced the release of their latest AI models, gpt-oss-120b and gpt-oss-20b, marking a significant step forward in their collaboration. Powered by NVIDIA's Blackwell architecture and the 4-bit MXFP4 precision format used for the models' weights, this release feels less like a launch and more like a turning point in open AI development.
Trained on NVIDIA's H100 GPUs, these models combine advanced hardware and software for efficient, open-weight reasoning. They are optimized for inference on the latest NVIDIA Blackwell architecture, specifically the NVIDIA GB200 NVL72 rack-scale system, where they reach aggregate inference speeds of up to 1.5 million tokens per second. They also run efficiently on more accessible hardware, including NVIDIA RTX consumer GPUs such as the GeForce RTX 5090, delivering around 256 tokens per second for desktop and workstation use.
The models use a Mixture-of-Experts (MoE) architecture with SwiGLU activations to optimize performance and reasoning capacity. The attention mechanism uses Rotary Position Embeddings (RoPE) with extended context windows of up to 128K tokens, allowing the models to handle very long inputs efficiently. On the software side, the models are integrated with frameworks such as Hugging Face Transformers, Ollama, vLLM, and NVIDIA's own TensorRT-LLM, which provide highly efficient inference kernels and ease of integration.
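To make the RoPE mechanism mentioned above concrete, here is a minimal NumPy sketch of rotary position embeddings. This is an illustrative toy, not the production kernels used by these frameworks; the pairing convention and dimensions are assumptions for the example. The key property it demonstrates is that the attention score between a rotated query and key depends only on their relative offset, which is what lets RoPE generalize across long contexts.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to a vector x at position pos.

    Illustrative sketch: pairs (x[2i], x[2i+1]) are rotated by an
    angle pos * base**(-2i/d). Real implementations vary in how they
    pair dimensions, but the relative-position property is the same.
    """
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)   # theta_i = base^(-2i/d)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

# The attention score between a query at position m and a key at
# position n depends only on the offset m - n, not on m and n themselves.
rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
s1 = rope(q, 5) @ rope(k, 3)       # offset 2
s2 = rope(q, 102) @ rope(k, 100)   # offset 2, shifted by 97 positions
print(np.isclose(s1, s2))  # → True
```

Because each rotation is orthogonal, the embedding also preserves vector norms, so it perturbs attention logits only through relative position.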
These combined hardware-software innovations enable efficient large-scale reasoning on open-weight LLMs, allowing deployment from large data centers to edge PCs. The models support advanced capabilities like chain-of-thought reasoning, instruction following, and tool use, enabling flexible, dynamic AI applications.
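The MoE architecture described above keeps inference efficient by activating only a small subset of expert feed-forward networks per token. The following is a minimal NumPy sketch of top-k expert routing with SwiGLU experts, under assumed toy dimensions; function names, shapes, and the routing details are illustrative, not the models' actual implementation.

```python
import numpy as np

def swiglu(x, W_gate, W_up, W_down):
    """SwiGLU feed-forward block: down( silu(x @ W_gate) * (x @ W_up) )."""
    silu = lambda z: z / (1.0 + np.exp(-z))
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

def moe_forward(x, router_W, experts, top_k=2):
    """Route token vector x to its top_k experts and mix their outputs
    with softmax-normalized router scores. Hypothetical sketch only."""
    logits = x @ router_W                 # one score per expert
    top = np.argsort(logits)[-top_k:]     # indices of the chosen experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()
    out = np.zeros(x.shape[-1])
    for w, i in zip(weights, top):        # only top_k experts run
        W_gate, W_up, W_down = experts[i]
        out += w * swiglu(x, W_gate, W_up, W_down)
    return out

# Toy configuration: 8 experts, model width 16, expert hidden width 32.
rng = np.random.default_rng(0)
d, hidden, num_experts = 16, 32, 8
experts = [(rng.normal(size=(d, hidden)),
            rng.normal(size=(d, hidden)),
            rng.normal(size=(hidden, d))) for _ in range(num_experts)]
router_W = rng.normal(size=(d, num_experts))
x = rng.normal(size=d)
y = moe_forward(x, router_W, experts, top_k=2)
print(y.shape)  # → (16,)
```

The compute saving is the point: with top_k=2 of 8 experts, each token pays for only a quarter of the total feed-forward parameters, which is how a 120B-parameter model can serve tokens at data-center and even desktop speeds.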
The models are also packaged as NVIDIA NIM inference microservices, making them faster and easier to deploy. If you already use popular AI tools such as Hugging Face Transformers or llama.cpp, these models plug right in. The relationship between NVIDIA and OpenAI, which dates back to the delivery of the first DGX-1 system to OpenAI, is instrumental in this release.
More than 4 million developers build on OpenAI's platform, and the new gpt-oss-120b and gpt-oss-20b models are designed to be usable by anyone, from startups to universities. Combined with NVIDIA's developer base of more than 6.5 million, these models are poised to make AI more accessible and to foster innovation.
This release brings NVIDIA's hardware, software, and services together at a scale rarely seen. Compared with earlier releases, these models demand orders of magnitude more computing power, polish, and operational readiness. The gpt-oss series represents a significant step forward in the NVIDIA-OpenAI collaboration.