oLLM Library Enables Execution of 8B-20B Models on NVIDIA GPUs

oLLM brings large language models to consumer GPUs. It's not perfect, but it's a significant step forward for offline and batch workloads.

oLLM, a new library built on Hugging Face Transformers and PyTorch, has been released. It makes running 8B-20B models, and even an 80B MoE, practical on single NVIDIA GPUs, with some limitations.

oLLM's key features include KV cache read/writes that bypass host RAM, DiskCache support for Qwen3-Next-80B, Llama-3 FlashAttention-2 for stability, and GPT-OSS memory reductions via 'flash-attention-like' kernels and chunked MLP. It targets offline, single-GPU workloads, using FP16/BF16 weights with FlashAttention-2 and disk-backed KV caching.
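To make the disk-backed KV cache idea concrete, here is a minimal conceptual sketch. It is not oLLM's actual API: the class and method names (DiskKVCache, append, load) are hypothetical, and this simplified version stages tensors through host RAM, whereas the library reportedly bypasses it with direct GPU-to-disk reads and writes.

```python
# Hypothetical sketch of a disk-backed KV cache: one file per layer chunk,
# so VRAM only holds the chunk currently needed for attention.
import os
import torch


class DiskKVCache:
    def __init__(self, cache_dir: str):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
        self.chunk_counts = {}  # layer index -> number of chunks written

    def append(self, layer: int, keys: torch.Tensor, values: torch.Tensor) -> None:
        # Persist the newest KV chunk to SSD instead of keeping it in VRAM.
        # (Simplification: this copy goes through CPU memory via torch.save.)
        idx = self.chunk_counts.get(layer, 0)
        path = os.path.join(self.cache_dir, f"layer{layer}_chunk{idx}.pt")
        torch.save({"k": keys.cpu(), "v": values.cpu()}, path)
        self.chunk_counts[layer] = idx + 1

    def load(self, layer: int, device: str = "cuda"):
        # Stream all chunks for one layer back to the GPU and concatenate
        # along the sequence dimension (assuming [batch, heads, seq, head_dim]).
        ks, vs = [], []
        for idx in range(self.chunk_counts.get(layer, 0)):
            path = os.path.join(self.cache_dir, f"layer{layer}_chunk{idx}.pt")
            chunk = torch.load(path, map_location=device)
            ks.append(chunk["k"])
            vs.append(chunk["v"])
        return torch.cat(ks, dim=-2), torch.cat(vs, dim=-2)
```

The trade-off is the same one the library makes: every attention step pays SSD latency instead of VRAM capacity, which is why throughput lands in the batch/offline range rather than interactive speeds.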

The library reports a throughput of ~0.5 tok/s for Qwen3-Next-80B at 50K context on an RTX 3060 Ti, suitable for batch/offline analytics but not interactive chat. It streams layer weights directly from SSD into the GPU, offloads the attention KV cache to SSD, and optionally offloads layers to CPU. Out of the box, oLLM supports Llama-3 (1B/3B/8B), GPT-OSS-20B, and Qwen3-Next-80B models, targeting NVIDIA Ampere, Ada, and Hopper GPUs.
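The layer-streaming loop can be pictured roughly as follows. This is a sketch of the control flow only, under the assumption that each decoder layer's weights were pre-saved to a separate file; the helper names (build_layer, layer_file) are hypothetical, and oLLM's real loader presumably adds async prefetch and pinned buffers.

```python
# Conceptual sketch: run a forward pass while holding only one decoder
# layer's weights in VRAM at a time.
import torch


@torch.inference_mode()
def forward_streaming(hidden_states, num_layers, build_layer, layer_file, device="cuda"):
    for i in range(num_layers):
        layer = build_layer(i)                                   # construct the layer module (CPU)
        state = torch.load(layer_file(i), map_location="cpu")    # read this layer's weights from SSD
        layer.load_state_dict(state)
        layer = layer.to(device)                                 # only this layer occupies VRAM
        hidden_states = layer(hidden_states)                     # compute, then discard the weights
        del layer, state
        torch.cuda.empty_cache()
    return hidden_states
```

Because every layer is re-read from storage on every pass, end-to-end speed is bounded by SSD bandwidth rather than GPU compute, which matches the reported ~0.5 tok/s figure.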

oLLM can handle up to ~100K tokens of context while keeping VRAM within 8-10 GB. Running Qwen3-Next-80B on consumer hardware with oLLM is feasible but still storage-bound and workload-specific. The library is lightweight and built for pragmatic use, offering a practical way to run large models on single NVIDIA GPUs.
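A quick back-of-envelope calculation shows why offloading is unavoidable at that context length. Using Llama-3-8B's published attention configuration (32 layers, 8 grouped-query KV heads, head dimension 128) as the example:

```python
# KV cache size at long context, FP16, Llama-3-8B attention config.
num_layers   = 32       # decoder layers
num_kv_heads = 8        # grouped-query attention KV heads
head_dim     = 128
bytes_fp16   = 2
context      = 100_000  # tokens

# Per token: one K and one V vector for every layer and KV head.
bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_fp16
kv_cache_gib = bytes_per_token * context / 1024**3

print(f"{bytes_per_token / 1024:.0f} KiB per token, {kv_cache_gib:.1f} GiB total")
# -> 128 KiB per token, ~12.2 GiB of KV cache alone, before counting the
#    ~16 GB of FP16 weights for an 8B model.
```

The KV cache by itself already exceeds an 8 GB card, so keeping it on SSD (and streaming weights layer by layer) is what makes the 8-10 GB VRAM budget possible.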
