Making Softmax More Efficient with NVIDIA Blackwell Ultra
• LLM context lengths are exploding, and architectures are moving toward complex attention schemes like Multi-Head Latent Attention (MLA) and Grouped Query Attention (GQA) • As a r
• As the sizes of AI models and datasets continue to increase, relying only on higher-precision BF16 training is no longer sufficient. • Key challenges such as training throughput
• NVIDIA flagship data center GPUs in the NVIDIA Ampere, NVIDIA Hopper, and NVIDIA Blackwell families all feature non-uniform memory access (NUMA) behaviors, but expose a single me
• Unlocking Ultralow Latency Video for Tethered Drones with Semtech’s BlueRiver® Audio Video Processor Technology Tethered drones are rapidly expanding across surveillance, industr
• As AI workloads scale, achieving high throughput, efficient resource usage, and predictable latency becomes essential. • NVIDIA Run:ai addresses these challenges through intellig
• Topping the GPU MODE Kernel Leaderboard with NVIDIA cuda.compute The leaderboard scores how fast users’ custom GPU kernels solve a set of standard problems like vector addition,
• As global AI adoption accelerates, developers face a growing challenge: delivering large language model (LLM) performance that meets real-world latency and cost requirements. • R
• Enterprise data is inherently complex: real-world documents are multimodal, spanning text, tables, charts and graphs, images, diagrams, scanned pages, forms, and embedded metadat
• LoRa® and LoRaWAN® for Industrial Sensors: Enabling Smart, Connected Operations The industrial sector is experiencing a significant transformation as organizations strive to enha
• Building robust, intelligent robots requires testing them in complex environments. • However, gathering data in the physical world is expensive, slow, and often dangerous. • It i
• Scientists and engineers who design and build unique scientific research facilities face similar challenges. • These include managing massive data rates that exceed current compu
• NVIDIA TensorRT LLM enables developers to build high-performance inference engines for large language models (LLMs), but deploying a new architecture traditionally requires signi
• 3 Ways NVFP4 Accelerates AI Training and Inference The latest AI models continue to grow in size and complexity, demanding increasing amounts of compute performance for
• Specialized AI models are built to perform specific tasks or solve particular problems. • But if you’ve ever tried to fine-tune or distill a domain-specific model, you’ve probabl
• Painkiller RTX sets a new standard for how small teams can balance massive visual ambition with limited resources by integrating generative AI. • By upscaling thousands of legacy
• Kimi K2.5 is a multimodal vision‑language model trained with Megatron‑LM. • It contains 1 trillion parameters, 384 experts, a single dense layer, and 3.2% activation per token. •
• What if your AI agent could instantly parse complex PDFs, extract nested tables, and ‘see’ data within charts as easily as reading a text file? • With NVIDIA Nemotron RAG, you ca
• Large language models (LLMs) are rapidly expanding their context windows, with recent models supporting sequences of 128K tokens, 256K tokens, and beyond. • However, training the
• In LLM training, Expert Parallel (EP) communication for hyperscale mixture-of-experts (MoE) models is challenging. • EP communication is essentially all-to-all, but due to its dy
• NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. • One of the great things about CUDA Tile is t
• UST decouples tensor sparsity from memory representation, enabling flexible storage formats. • Developers describe storage via a DSL, focusing solely on sparsity patterns. • Comp
• AI coding agents enable developers to work faster by streamlining tasks and driving automated, test-driven development. • However, they also introduce a significant, often overlo
• NVIDIA Run:ai v2.24 introduces time-based fairshare scheduling for Kubernetes GPU clusters. • Scheduler tracks historical GPU usage, adjusting queue scores to balance long-term r
• This post introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core used for LLM post-training or DiT pre-training. • It dynamically sele
• Advances in AI architectures have unlocked multimodal functionality, enabling transformer models to process multiple forms of data in the same context. • For instance, vision lan
• LoRa’s Chirp Spread Spectrum delivers >10 km range, surpassing conventional RC links for drones. • Low power consumption keeps drone batteries light, extending flight times and m
• Semtech unveiled its next-gen LoRa® Gen 4 platform at CES 2026, boosting throughput to 2.6 Mbps. • Multi‑protocol connectivity via LoRa Plus™ supports LoRaWAN®, Amazon Sidewalk,
• Racing to 6G: Key Takeaways from the MOPA Alliance Webinar on Optical Standardization Industry leaders from Ericsson, Nokia, and Semtech recently discussed why mobile optical sta
• LoRaWAN® Takes Center Stage at Enlit Europe: A Smart Metering Milestone If there was any doubt that LoRaWAN® has secured its place in smart metering, Enlit Europe 2025 put that t
• LoRa® Holiday Blog 2025: Wrapping Up a Year of Innovation and Collaboration Expanding the Boundaries of Internet of Things (IoT) Connectivity - Because Innovation Doesn’t Take a
• One-Channel Hub: A Compact LoRaWAN® Access Point for Cost-Effective Internet of Things (IoT) Semtech’s One-Channel Hub reference design enables affordable LoRaWAN® sensor networks,
• Advanced Surge Protection for Industrial 24V DC Systems Protecting the 24-Volt Direct Current (DC) System - The Backbone of Industry 4.0 Industrial automation is rapidly evolving
• LoRaWAN® Reaches Critical Mass: 125 Million Devices and Accelerating Growth The Internet of Things (IoT) landscape is witnessing a significant milestone. • LoRaWAN®, the low-powe