Making Softmax More Efficient with NVIDIA Blackwell Ultra

• LLM context lengths are exploding, and architectures are moving toward complex attention schemes like Multi-Head Latent Attention (MLA) and Grouped Query Attention (GQA) • As a r

Using NVFP4 Low-Precision Model Training for Higher Throughput Without Losing Accuracy

• As the sizes of AI models and datasets continue to increase, relying only on higher-precision BF16 training is no longer sufficient. • Key challenges such as training throughput

Accelerating Data Processing with NVIDIA Multi-Instance GPU and NUMA Node Localization

• NVIDIA flagship data center GPUs in the NVIDIA Ampere, NVIDIA Hopper, and NVIDIA Blackwell families all feature non-uniform memory access (NUMA) behaviors, but expose a single me

Unlocking Ultralow Latency Video for Tethered Drones with Semtech's BlueRiver® Audio Video Processor Technology

• Tethered drones are rapidly expanding across surveillance, industr

Unlock Massive Token Throughput with GPU Fractioning in NVIDIA Run:ai

• As AI workloads scale, achieving high throughput, efficient resource usage, and predictable latency becomes essential. • NVIDIA Run:ai addresses these challenges through intellig

Topping the GPU MODE Kernel Leaderboard with NVIDIA cuda.compute

• The leaderboard scores how fast users’ custom GPU kernels solve a set of standard problems like vector addition,

How NVIDIA Extreme Hardware-Software Co-Design Delivered a Large Inference Boost for Sarvam AI's Sovereign Models

• As global AI adoption accelerates, developers face a growing challenge: delivering large language model (LLM) performance that meets real-world latency and cost requirements. • R

Build AI-Ready Knowledge Systems Using 5 Essential Multimodal RAG Capabilities

• Enterprise data is inherently complex: real-world documents are multimodal, spanning text, tables, charts and graphs, images, diagrams, scanned pages, forms, and embedded metadat

LoRa® and LoRaWAN® for Industrial Sensors: Enabling Smart, Connected Operations

• The industrial sector is experiencing a significant transformation as organizations strive to enha

R²D²: Scaling Multimodal Robot Learning with NVIDIA Isaac Lab

• Building robust, intelligent robots requires testing them in complex environments. • However, gathering data in the physical world is expensive, slow, and often dangerous. • It i

Using Accelerated Computing to Live-Steer Scientific Experiments at Massive Research Facilities

• Scientists and engineers who design and build unique scientific research facilities face similar challenges. • These include managing massive data rates that exceed current compu

Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy

• NVIDIA TensorRT LLM enables developers to build high-performance inference engines for large language models (LLMs), but deploying a new architecture traditionally requires signi

3 Ways NVFP4 Accelerates AI Training and Inference

• The latest AI models continue to grow in size and complexity, demanding increasing amounts of compute performance for

How to Build License-Compliant Synthetic Data Pipelines for AI Model Distillation

• Specialized AI models are built to perform specific tasks or solve particular problems. • But if you’ve ever tried to fine-tune or distill a domain-specific model, you’ve probabl

How Painkiller RTX Uses Generative AI to Modernize Game Assets at Scale

• Painkiller RTX sets a new standard for how small teams can balance massive visual ambition with limited resources by integrating generative AI. • By upscaling thousands of legacy

Build with Kimi K2.5 Multimodal VLM Using NVIDIA GPU-Accelerated Endpoints

• Kimi K2.5 is a multimodal vision‑language model trained with Megatron‑LM. • It contains 1 trillion parameters, 384 experts, a single dense layer, and 3.2% activation per token. •

How to Build a Document Processing Pipeline for RAG with Nemotron

• What if your AI agent could instantly parse complex PDFs, extract nested tables, and ‘see’ data within charts as easily as reading a text file? • With NVIDIA Nemotron RAG, you ca

Accelerating Long-Context Model Training in JAX and XLA

• Large language models (LLMs) are rapidly expanding their context windows, with recent models supporting sequences of 128K tokens, 256K tokens, and beyond. • However, training the

Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel

• In LLM training, Expert Parallel (EP) communication for hyperscale mixture-of-experts (MoE) models is challenging. • EP communication is essentially all-to-all, but due to its dy

Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton

• NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. • One of the great things about CUDA Tile is t

Establishing a Scalable Sparse Ecosystem with the Universal Sparse Tensor

• UST decouples tensor sparsity from memory representation, enabling flexible storage formats. • Developers describe storage via a DSL, focusing solely on sparsity patterns. • Comp

Practical Security Guidance for Sandboxing Agentic Workflows and Managing Execution Risk

• AI coding agents enable developers to work faster by streamlining tasks and driving automated, test-driven development. • However, they also introduce a significant, often overlo

Ensuring Balanced GPU Allocation in Kubernetes Clusters with Time-Based Fairshare

• NVIDIA Run:ai v2.24 introduces time-based fairshare scheduling for Kubernetes GPU clusters. • Scheduler tracks historical GPU usage, adjusting queue scores to balance long-term r

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

• This post introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core used for LLM post-training or DiT pre-training. • It dynamically sele

Updating Classifier Evasion for Vision Language Models

• Advances in AI architectures have unlocked multimodal functionality, enabling transformer models to process multiple forms of data in the same context. • For instance, vision lan

LoRa® Technology Revolutionizing Drone Communications

• LoRa’s Chirp Spread Spectrum delivers >10 km range, surpassing conventional RC links for drones. • Low power consumption keeps drone batteries light, extending flight times and m

CES 2026: Semtech Showcases Next-Generation IoT Innovation with LoRa® and Edge AI Solutions

• Semtech unveiled its next-gen LoRa® Gen 4 platform at CES 2026, boosting throughput to 2.6 Mbps. • Multi‑protocol connectivity via LoRa Plus™ supports LoRaWAN®, Amazon Sidewalk,

Racing to 6G: Key Takeaways from the MOPA Alliance Webinar on Optical Standardization

• Industry leaders from Ericsson, Nokia, and Semtech recently discussed why mobile optical sta

LoRaWAN® Takes Center Stage at Enlit Europe: A Smart Metering Milestone

• If there was any doubt that LoRaWAN® has secured its place in smart metering, Enlit Europe 2025 put that t

LoRa® Holiday Blog 2025: Wrapping Up a Year of Innovation and Collaboration

• Expanding the Boundaries of Internet of Things (IoT) Connectivity - Because Innovation Doesn’t Take a

One-Channel Hub: A Compact LoRaWAN® Access Point for Cost-Effective Internet of Things (IoT)

• Semtech’s One-Channel Hub reference design enables affordable LoRaWAN® sensor networks,

Advanced Surge Protection for Industrial 24V DC Systems

• Protecting the 24-Volt Direct Current (DC) System - The Backbone of Industry 4.0 • Industrial automation is rapidly evolving

LoRaWAN® Reaches Critical Mass: 125 Million Devices and Accelerating Growth

• The Internet of Things (IoT) landscape is witnessing a significant milestone. • LoRaWAN®, the low-powe