NVIDIA and OpenAI unveil the fastest open-source models for AI reasoning
NVIDIA and OpenAI have unveiled their latest models, gpt-oss-120b and gpt-oss-20b. The release reads less like a routine product launch and more like a significant turning point for open AI models.
The collaboration between NVIDIA and OpenAI, dating back to the first DGX-1, has led to the gpt-oss series. The new models use a Mixture of Experts (MoE) architecture with SwiGLU activations, a design that allows large total parameter counts while keeping the number of active parameters per token small. They also incorporate Rotary Position Embedding (RoPE) with context lengths up to 128k tokens, alternating full-context attention layers with a sliding 128-token attention window.
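To make the routing idea concrete, here is a minimal sketch of a top-k MoE layer with SwiGLU experts. The dimensions, expert count, and top-k value are illustrative placeholders rather than the actual gpt-oss configuration, and production systems replace the Python loop with fused batched kernels.

```python
# Minimal top-k Mixture-of-Experts routing with SwiGLU experts (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    """One expert: a SwiGLU feed-forward block."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        # SwiGLU: silu(gate(x)) * up(x), projected back to d_model
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

class MoELayer(nn.Module):
    """A router picks the top-k experts per token; only those experts run."""
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [SwiGLUExpert(d_model, d_ff) for _ in range(num_experts)])
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)       # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e        # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(MoELayer()(torch.randn(10, 64)).shape)    # torch.Size([10, 64])
```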
These models run in FP4 precision, a low-precision data format natively supported by NVIDIA’s Blackwell GPU architecture. That is what lets the 120B model fit on a single 80GB GPU for inference while maintaining accuracy.
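The memory arithmetic behind that claim is simple. Ignoring per-block scale factors and any tensors kept at higher precision, 120 billion parameters at 4 bits each come to roughly 56 GiB, leaving headroom on an 80GB card for activations and the KV cache:

```python
# Back-of-the-envelope: why FP4 weights for ~120B parameters fit on one 80 GB GPU.
params = 120e9
bytes_per_param = 0.5  # 4 bits; ignores block scale factors and mixed-precision layers
print(f"~{params * bytes_per_param / 2**30:.0f} GiB of weights")  # ~56 GiB
```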
Training these models required NVIDIA H100 Tensor Core GPUs, with the 120B model alone consuming roughly 2.1 million GPU hours. For inference, NVIDIA optimized the models via the TensorRT-LLM backend with improved CUDA kernels for MoE layers, achieving up to 1.5 million tokens per second of throughput on NVIDIA GB200 NVL72 systems. Performance is further boosted by parallelism strategies such as Tensor Parallelism and Expert Parallelism, which shard the model across multiple GPUs for large deployments.
The core technology relies on several ingredients working together: MoE routing that activates only a subset of experts per token, reducing computation while expanding model capacity; efficient attention with RoPE over very long context windows; low-precision FP4 computation enabled by Blackwell GPUs; and highly optimized software stacks spanning TensorRT-LLM and frameworks such as Hugging Face Transformers, Ollama, and vLLM.
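One concrete way to see those pieces combine is multi-GPU serving. The sketch below uses vLLM, one of the frameworks named above; the repository id and GPU count are assumptions to adjust for your deployment.

```python
# Hedged sketch: serving a gpt-oss checkpoint with tensor parallelism in vLLM.
# "openai/gpt-oss-120b" and tensor_parallel_size=8 are illustrative values.
from vllm import LLM, SamplingParams

llm = LLM(
    model="openai/gpt-oss-120b",  # assumed Hugging Face repository id
    tensor_parallel_size=8,       # shard the weights across 8 GPUs
)
params = SamplingParams(max_tokens=256, temperature=0.7)
outputs = llm.generate(["Explain mixture-of-experts routing briefly."], params)
print(outputs[0].outputs[0].text)
```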
This combination allows NVIDIA to deliver these advanced LLMs with high token throughput, low latency, and efficient hardware utilization from cloud GPUs down to desktop RTX GPUs.
NVIDIA also packages the models as NIM inference microservices: prebuilt, optimized containers that make deployment faster and simpler, particularly for teams already running CUDA infrastructure. Given the reach of the NVIDIA and OpenAI ecosystems, that packaging tends to translate into rapid adoption.
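In practice, NIM microservices for LLMs expose an OpenAI-compatible HTTP API, so a deployed container can be queried with the standard openai client. The base URL, port, and served model name below are assumptions about a local deployment:

```python
# Hedged sketch: querying a locally deployed gpt-oss NIM container through
# its OpenAI-compatible endpoint. URL, port, and model name are assumed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")
resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed name of the served model
    messages=[{"role": "user", "content": "Summarize RoPE in one sentence."}],
)
print(resp.choices[0].message.content)
```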
Over 4 million developers build on OpenAI's platform, and more than 6.5 million use NVIDIA's software tools. The models are optimized to run well across a range of hardware, from large-scale cloud systems to desktop machines with NVIDIA RTX cards. For anyone already using common AI tooling such as Hugging Face Transformers or llama.cpp, the models should slot in with little friction, as the sketch below illustrates.
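A minimal Transformers example: the repository id openai/gpt-oss-20b is assumed here, and device_map="auto" simply lets the library place weights on whatever accelerators are available.

```python
# Hedged sketch: loading a gpt-oss checkpoint with Hugging Face Transformers.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # assumed Hugging Face repository id
    device_map="auto",           # place layers on available GPU(s) automatically
)
prompt = "Explain sliding-window attention in one sentence."
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```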
That said, the gpt-oss models demand substantially more compute and operational maturity to run well than earlier, smaller models. The open-weight release also opens the models to contributions from startups, universities, and other organizations.
The NVIDIA and OpenAI collaboration behind the gpt-oss series has now been running for nearly a decade, and co-design across hardware, software, and services at this depth remains unusual. The efficiency of gpt-oss-120b and gpt-oss-20b comes from that combination: NVIDIA's H100 training hardware paired with software tuned specifically for the models.
A key enabler is NVIDIA's Blackwell architecture, specifically its NVFP4 4-bit floating-point format. It lets models run faster and more efficiently on lower-precision numbers without a meaningful loss of accuracy.
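To make "lower-precision numbers" concrete: a 4-bit E2M1 float, the element format used by FP4 variants such as NVFP4, can represent only 15 distinct values. Real FP4 schemes recover dynamic range by attaching a shared scale factor to each small block of values; the toy rounding function below omits that step.

```python
# Illustrative only: the 15 values representable by a 4-bit E2M1 float,
# and nearest-neighbor rounding to them. Per-block scale factors omitted.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_VALUES = sorted({s * m for m in E2M1_MAGNITUDES for s in (1, -1)})

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable FP4 (E2M1) value."""
    return min(FP4_VALUES, key=lambda v: abs(v - x))

print(FP4_VALUES)         # 15 values from -6.0 to 6.0
print(quantize_fp4(2.4))  # 2.0 (nearest representable neighbor)
```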
In summary, the efficiency and speed of the gpt-oss models come from the Mixture of Experts design, FP4 precision on NVIDIA Blackwell GPUs, attention optimizations with RoPE over 128k-token contexts, and TensorRT-LLM kernels tuned for these models. Together, these advances enable faster AI reasoning and tool use at scale.