<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>GPU on Tenu Tech Brief</title>
    <link>https://cluster-site.onrender.com/tags/gpu/</link>
    <description>Recent content in GPU on Tenu Tech Brief</description>
    <generator>Hugo -- 0.146.0</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 26 Feb 2026 01:41:33 +0000</lastBuildDate>
    <atom:link href="https://cluster-site.onrender.com/tags/gpu/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Nvidia delivers first Vera Rubin AI GPU samples to customers - 88-core Vera CPU paired with Rubin GPUs with 288 GB of HBM4 memory apiece</title>
      <link>https://cluster-site.onrender.com/posts/nvidia-delivers-first-vera-rubin-ai-gpu-samples-to-customers-88-core-vera-cpu-paired-with-rubin-gpus-with-288-gb-of-hbm4-memory-apiece/</link>
      <pubDate>Thu, 26 Feb 2026 01:14:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/nvidia-delivers-first-vera-rubin-ai-gpu-samples-to-customers-88-core-vera-cpu-paired-with-rubin-gpus-with-288-gb-of-hbm4-memory-apiece/</guid>
      <description>• Nvidia delivers first Vera Rubin AI GPU samples to customers - 88-core Vera CPU paired with Rubin GPUs with 288 GB of HBM4 memory apiece. On track for 2H 2026.</description>
    </item>
    <item>
      <title>RCCLX: Innovating GPU communications on AMD platforms</title>
      <link>https://cluster-site.onrender.com/posts/rcclx-innovating-gpu-communications-on-amd-platforms/</link>
      <pubDate>Tue, 24 Feb 2026 21:30:54 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/rcclx-innovating-gpu-communications-on-amd-platforms/</guid>
      <description>• We are open-sourcing the initial version of RCCLX - an enhanced version of RCCL that we developed and tested on Meta&amp;rsquo;s internal workloads. • RCCLX is fully integrated with Torchc</description>
    </item>
    <item>
      <title>tiny-gpu-compiler: An educational MLIR-based compiler targeting open-source GPU hardware</title>
      <link>https://cluster-site.onrender.com/posts/tiny-gpu-compiler-an-educational-mlir-based-compiler-targeting-open-source-gpu-hardware/</link>
      <pubDate>Tue, 24 Feb 2026 06:01:49 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/tiny-gpu-compiler-an-educational-mlir-based-compiler-targeting-open-source-gpu-hardware/</guid>
      <description>• tiny-gpu-compiler: An educational MLIR-based compiler targeting open-source GPU hardware. I built an open-source compiler that uses MLIR to compile a C-like GPU kernel language dow</description>
    </item>
    <item>
      <title>BiScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS</title>
      <link>https://cluster-site.onrender.com/posts/biscale-energy-efficient-disaggregated-llm-serving-via-phase-aware-placement-and-dvfs/</link>
      <pubDate>Tue, 24 Feb 2026 05:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/biscale-energy-efficient-disaggregated-llm-serving-via-phase-aware-placement-and-dvfs/</guid>
      <description>• Prefill/decode disaggregation improves latency-throughput tradeoff for large language model serving. • Energy consumption remains high; autoscaling is too coarse-grained for rapi</description>
    </item>
    <item>
      <title>GPU-Resident Gaussian Process Regression Leveraging Asynchronous Tasks with HPX</title>
      <link>https://cluster-site.onrender.com/posts/gpu-resident-gaussian-process-regression-leveraging-asynchronous-tasks-with-hpx/</link>
      <pubDate>Tue, 24 Feb 2026 05:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/gpu-resident-gaussian-process-regression-leveraging-asynchronous-tasks-with-hpx/</guid>
      <description>• GPRat library extended to a fully GPU-resident Gaussian Process prediction pipeline. • Combines HPX task‑based parallelism with an intuitive Python API for seamless integration.</description>
    </item>
    <item>
      <title>The Landscape of GPU-Centric Communication</title>
      <link>https://cluster-site.onrender.com/posts/the-landscape-of-gpu-centric-communication/</link>
      <pubDate>Tue, 24 Feb 2026 05:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/the-landscape-of-gpu-centric-communication/</guid>
      <description>• GPUs dominate HPC/ML workloads, yet inter‑GPU communication remains a scalability bottleneck. • Traditional CPU‑centric communication is being challenged by GPU‑centric models th</description>
    </item>
    <item>
      <title>ucTrace: A Multi-Layer Profiling Tool for UCX-driven Communication</title>
      <link>https://cluster-site.onrender.com/posts/uctrace-a-multi-layer-profiling-tool-for-ucx-driven-communication/</link>
      <pubDate>Tue, 24 Feb 2026 05:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/uctrace-a-multi-layer-profiling-tool-for-ucx-driven-communication/</guid>
      <description>• ucTrace delivers fine‑grained UCX communication traces, filling gaps left by existing MPI profilers. • It maps UCX operations back to originating MPI calls, linking host‑to‑devic</description>
    </item>
    <item>
      <title>GPU Memory and Utilization Estimation for Training-Aware Resource Management: Opportunities and Limitations</title>
      <link>https://cluster-site.onrender.com/posts/gpu-memory-and-utilization-estimation-for-training-aware-resource-management-opportunities-and-limitations/</link>
      <pubDate>Mon, 23 Feb 2026 05:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/gpu-memory-and-utilization-estimation-for-training-aware-resource-management-opportunities-and-limitations/</guid>
      <description>• Submitted on 19 Feb 2026. Title: GPU Memory and Utilization Estimation for Training-Aware Resource Management: Opp</description>
    </item>
    <item>
      <title>AI craze leaves only one Nvidia RTX 50-series GPU at MSRP - RTX 5060 Ti 8GB makes the final stand, as even the RTX 5050 falls</title>
      <link>https://cluster-site.onrender.com/posts/ai-craze-leaves-only-one-nvidia-rtx-50-series-gpu-at-msrp-rtx-5060-ti-8gb-makes-the-final-stand-as-even-the-rtx-5050-falls/</link>
      <pubDate>Fri, 20 Feb 2026 18:57:34 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/ai-craze-leaves-only-one-nvidia-rtx-50-series-gpu-at-msrp-rtx-5060-ti-8gb-makes-the-final-stand-as-even-the-rtx-5050-falls/</guid>
      <description>• AI craze leaves only one Nvidia RTX 50-series GPU at MSRP - RTX 5060 Ti 8GB makes the final stand, as even the RTX 5050 falls.</description>
    </item>
    <item>
      <title>Intel Hiring More Linux Developers - Including For GPU Drivers / Linux Gaming Stack</title>
      <link>https://cluster-site.onrender.com/posts/intel-hiring-more-linux-developers-including-for-gpu-drivers-/-linux-gaming-stack/</link>
      <pubDate>Fri, 20 Feb 2026 18:49:07 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/intel-hiring-more-linux-developers-including-for-gpu-drivers-/-linux-gaming-stack/</guid>
      <description>• Intel Hiring More Linux Developers - Including For GPU Drivers / Linux Gaming Stack As some good news out of Intel today on the Linux/open-source side following last year&amp;rsquo;s layof</description>
    </item>
    <item>
      <title>The great Bench GPU retest begins - how we&#39;re testing for our GPU Hierarchy in 2026, and why upscaling and framegen are still out</title>
      <link>https://cluster-site.onrender.com/posts/the-great-bench-gpu-retest-begins-how-were-testing-for-our-gpu-hierarchy-in-2026-and-why-upscaling-and-framegen-are-still-out/</link>
      <pubDate>Fri, 20 Feb 2026 18:18:41 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/the-great-bench-gpu-retest-begins-how-were-testing-for-our-gpu-hierarchy-in-2026-and-why-upscaling-and-framegen-are-still-out/</guid>
      <description>• The great Bench GPU retest begins - how we&amp;rsquo;re testing for our GPU Hierarchy in 2026, and why upscaling and framegen are still out. It&amp;rsquo;s time to test. • Here&amp;rsquo;s how the sausage is m</description>
    </item>
    <item>
      <title>Accelerating Data Processing with NVIDIA Multi-Instance GPU and NUMA Node Localization</title>
      <link>https://cluster-site.onrender.com/posts/accelerating-data-processing-with-nvidia-multi-instance-gpu-and-numa-node-localization/</link>
      <pubDate>Thu, 19 Feb 2026 17:30:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/accelerating-data-processing-with-nvidia-multi-instance-gpu-and-numa-node-localization/</guid>
      <description>• NVIDIA flagship data center GPUs in the NVIDIA Ampere, NVIDIA Hopper, and NVIDIA Blackwell families all feature non-uniform memory access (NUMA) behaviors, but expose a single me</description>
    </item>
    <item>
      <title>DigitalOcean Gradient™ AI GPU Droplets Optimized for Inference: Increasing Throughput at Lower Cost</title>
      <link>https://cluster-site.onrender.com/posts/digitalocean-gradient-ai-gpu-droplets-optimized-for-inference-increasing-throughput-at-lower-the-cost/</link>
      <pubDate>Thu, 19 Feb 2026 14:42:18 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/digitalocean-gradient-ai-gpu-droplets-optimized-for-inference-increasing-throughput-at-lower-the-cost/</guid>
      <description>• By Jason Peng and Hemasumanth Rasineni. Production-grade LLM inference demands more than just access to GPUs; it requires deep optimization across the entire serving stack, from q</description>
    </item>
    <item>
      <title>Expanding our Agentic Inference Cloud: Introducing GPU Droplets Powered by AMD Instinct™ MI350X GPUs</title>
      <link>https://cluster-site.onrender.com/posts/expanding-our-agentic-inference-cloud-introducing-gpu-droplets-powered-by-amd-instinct-mi350x-gpus/</link>
      <pubDate>Thu, 19 Feb 2026 12:30:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/expanding-our-agentic-inference-cloud-introducing-gpu-droplets-powered-by-amd-instinct-mi350x-gpus/</guid>
      <description>• Expanding our Agentic Inference Cloud: Introducing GPU Droplets Powered by AMD Instinct™ MI350X GPUs. By Waverly Swinton. Published: February 19, 2026. As our Agentic Infer</description>
    </item>
    <item>
      <title>Unlock Massive Token Throughput with GPU Fractioning in NVIDIA Run:ai</title>
      <link>https://cluster-site.onrender.com/posts/unlock-massive-token-throughput-with-gpu-fractioning-in-nvidia-runai/</link>
      <pubDate>Wed, 18 Feb 2026 18:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/unlock-massive-token-throughput-with-gpu-fractioning-in-nvidia-runai/</guid>
      <description>• As AI workloads scale, achieving high throughput, efficient resource usage, and predictable latency becomes essential. • NVIDIA Run:ai addresses these challenges through intellig</description>
    </item>
    <item>
      <title>Topping the GPU MODE Kernel Leaderboard with NVIDIA cuda.compute</title>
      <link>https://cluster-site.onrender.com/posts/topping-the-gpu-mode-kernel-leaderboard-with-nvidia-cuda.compute/</link>
      <pubDate>Wed, 18 Feb 2026 17:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/topping-the-gpu-mode-kernel-leaderboard-with-nvidia-cuda.compute/</guid>
      <description>• Topping the GPU MODE Kernel Leaderboard with NVIDIA cuda.compute The leaderboard scores how fast users&amp;rsquo; custom GPU kernels solve a set of standard problems like vector addition,</description>
    </item>
    <item>
      <title>Bruteforcing Accidental Antenna Designs</title>
      <link>https://cluster-site.onrender.com/posts/bruteforcing-accidental-antenna-designs/</link>
      <pubDate>Wed, 18 Feb 2026 03:00:45 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/bruteforcing-accidental-antenna-designs/</guid>
      <description>• Antenna design often seen as black art, but brute-force GPU approach explored. • Janne, novice, used VNA and GPU-based FDTD to simulate and optimize antennas. • Leveraged LLMs to</description>
    </item>
    <item>
      <title>Warnings in GPU to NVVM pipeline</title>
      <link>https://cluster-site.onrender.com/posts/warnings-in-gpu-to-nvvm-pipeline/</link>
      <pubDate>Tue, 17 Feb 2026 14:06:43 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/warnings-in-gpu-to-nvvm-pipeline/</guid>
      <description>• Warnings in GPU to NVVM pipeline Hi, I&amp;rsquo;m currently in the process of trying to understand the conversion from the GPU dialect to LLVM via the NVVM dialect and GPU code-generation</description>
    </item>
    <item>
      <title>Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization</title>
      <link>https://cluster-site.onrender.com/posts/parallel-track-transformers-enabling-fast-gpu-inference-with-reduced-synchronization/</link>
      <pubDate>Tue, 10 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/parallel-track-transformers-enabling-fast-gpu-inference-with-reduced-synchronization/</guid>
      <description>• Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization. Author</description>
    </item>
    <item>
      <title>Build with Kimi K2.5 Multimodal VLM Using NVIDIA GPU-Accelerated Endpoints</title>
      <link>https://cluster-site.onrender.com/posts/build-with-kimi-k2.5-multimodal-vlm-using-nvidia-gpu-accelerated-endpoints/</link>
      <pubDate>Wed, 04 Feb 2026 19:46:33 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/build-with-kimi-k2.5-multimodal-vlm-using-nvidia-gpu-accelerated-endpoints/</guid>
      <description>• Kimi K2.5 is a multimodal vision‑language model trained with Megatron‑LM. • It contains 1 trillion parameters, 384 experts, a single dense layer, and 3.2% activation per token. •</description>
    </item>
    <item>
      <title>Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton</title>
      <link>https://cluster-site.onrender.com/posts/advancing-gpu-programming-with-the-cuda-tile-ir-backend-for-openai-triton/</link>
      <pubDate>Fri, 30 Jan 2026 20:01:47 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/advancing-gpu-programming-with-the-cuda-tile-ir-backend-for-openai-triton/</guid>
      <description>• NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. • One of the great things about CUDA Tile is t</description>
    </item>
    <item>
      <title>Ensuring Balanced GPU Allocation in Kubernetes Clusters with Time-Based Fairshare</title>
      <link>https://cluster-site.onrender.com/posts/ensuring-balanced-gpu-allocation-in-kubernetes-clusters-with-time-based-fairshare/</link>
      <pubDate>Wed, 28 Jan 2026 17:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/ensuring-balanced-gpu-allocation-in-kubernetes-clusters-with-time-based-fairshare/</guid>
      <description>• NVIDIA Run:ai v2.24 introduces time-based fairshare scheduling for Kubernetes GPU clusters. • Scheduler tracks historical GPU usage, adjusting queue scores to balance long-term r</description>
    </item>
    <item>
      <title>AWS Weekly Roundup: Amazon EC2 G7e instances, Amazon Corretto updates, and more (January 26, 2026)</title>
      <link>https://cluster-site.onrender.com/posts/aws-weekly-roundup-amazon-ec2-g7e-instances-amazon-corretto-updates-and-more-january-26-2026/</link>
      <pubDate>Mon, 26 Jan 2026 16:25:46 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/aws-weekly-roundup-amazon-ec2-g7e-instances-amazon-corretto-updates-and-more-january-26-2026/</guid>
      <description>• Amazon EC2 G7e instances GA, NVIDIA RTX PRO 6000 Blackwell GPUs, 2.3× better inference than G6e. • G7e offers up to 8 GPUs, 768GB total GPU memory, supports FP8 precision, ideal</description>
    </item>
    <item>
      <title>Announcing Amazon EC2 G7e instances accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs</title>
      <link>https://cluster-site.onrender.com/posts/announcing-amazon-ec2-g7e-instances-accelerated-by-nvidia-rtx-pro-6000-blackwell-server-edition-gpus/</link>
      <pubDate>Tue, 20 Jan 2026 21:22:56 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/announcing-amazon-ec2-g7e-instances-accelerated-by-nvidia-rtx-pro-6000-blackwell-server-edition-gpus/</guid>
      <description>• Amazon EC2 G7e instances launched, powered by NVIDIA RTX PRO 6000 Blackwell GPUs. • Deliver up to 2.3× inference performance over G6e, ideal for generative AI and graphics worklo</description>
    </item>
    <item>
      <title>Reddit&#39;s Home Feed on GPU: Unlock ML Growth and Efficiency</title>
      <link>https://cluster-site.onrender.com/posts/reddits-home-feed-on-gpu-unlock-ml-growth-and-efficiency/</link>
      <pubDate>Mon, 10 Nov 2025 19:15:56 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/reddits-home-feed-on-gpu-unlock-ml-growth-and-efficiency/</guid>
      <description>• Author: Cedric Blondeau. TL;DR: We migrated Reddit&amp;rsquo;s Home Feed Ranker from CPU to GPU to unlock scalability, efficiency, and enable further growth with new architectures like Trans</description>
    </item>
    <item>
      <title>Hack Week 2025: How these engineers liquid-cooled a GPU server</title>
      <link>https://cluster-site.onrender.com/posts/hack-week-2025-how-these-engineers-liquid-cooled-a-gpu-server/</link>
      <pubDate>Wed, 27 Aug 2025 15:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/hack-week-2025-how-these-engineers-liquid-cooled-a-gpu-server/</guid>
      <description>• Hack Week 2025: How these engineers liquid-cooled a GPU server Hack Week 2025 at Dropbox centered on the theme &amp;lsquo;Keep It Simple,&amp;rsquo; offering opportunities for innovation, experiment</description>
    </item>
    <item>
      <title>Arm Unveils 2024 Compute Platform: 3nm, Cortex-X925, Cortex-A725, Immortalis-G925</title>
      <link>https://cluster-site.onrender.com/posts/arm-unveils-2024-compute-platform-3nm-cortex-x925-cortex-a725-immortalis-g925/</link>
      <pubDate>Wed, 29 May 2024 15:00:22 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/arm-unveils-2024-compute-platform-3nm-cortex-x925-cortex-a725-immortalis-g925/</guid>
      <description>• Arm launches 2024 Client Compute Subsystem (CSS) featuring 3nm process and new Cortex cores. • Cortex-X925 delivers highest single‑thread performance for demanding workloads. • C</description>
    </item>
    <item>
      <title>Arm Launches Next-Gen Flagship Cortex-X925</title>
      <link>https://cluster-site.onrender.com/posts/arm-launches-next-gen-flagship-cortex-x925/</link>
      <pubDate>Wed, 29 May 2024 15:00:18 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/arm-launches-next-gen-flagship-cortex-x925/</guid>
      <description>• Arm unveils Cortex‑X925, 5th‑gen flagship core, boosting performance and power efficiency. • Core part of 2024 Client Compute Subsystems, paired with DSU‑120 and Immortalis‑G925</description>
    </item>
    <item>
      <title>Leveraging Spark 3 and NVIDIA&#39;s GPUs to Reduce Cloud Cost by up to 70% for Big Data Pipelines</title>
      <link>https://cluster-site.onrender.com/posts/leveraging-spark-3-and-nvidias-gpus-to-reduce-cloud-cost-by-up-to-70-for-big-data-pipelines/</link>
      <pubDate>Wed, 21 Feb 2024 16:42:14 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/leveraging-spark-3-and-nvidias-gpus-to-reduce-cloud-cost-by-up-to-70-for-big-data-pipelines/</guid>
      <description>• PayPal runs hundreds of thousands of Spark jobs hourly, processing petabytes of data. • Upgrading to Spark 3 and adopting NVIDIA RAPIDS GPUs cuts cloud costs up to 70%. • GPUs of</description>
    </item>
  </channel>
</rss>
