Gpu on Tenu Tech Brief

https://cluster-site.onrender.com/tags/gpu/
Recent content in Gpu on Tenu Tech Brief
Thu, 26 Feb 2026 01:41:33 +0000

Nvidia delivers first Vera Rubin AI GPU samples to customers - 88-core Vera CPU paired with Rubin GPUs with 288 GB of HBM4 memory apiece
https://cluster-site.onrender.com/posts/nvidia-delivers-first-vera-rubin-ai-gpu-samples-to-customers-88-core-vera-cpu-paired-with-rubin-gpus-with-288-gb-of-hbm4-memory-apiece/
Thu, 26 Feb 2026 01:14:00 +0000
• On track for 2H 2026

RCCLX: Innovating GPU communications on AMD platforms
https://cluster-site.onrender.com/posts/rcclx-innovating-gpu-communications-on-amd-platforms/
Tue, 24 Feb 2026 21:30:54 +0000
• We are open-sourcing the initial version of RCCLX - an enhanced version of RCCL that we developed and tested on Meta’s internal workloads.
• RCCLX is fully integrated with Torchc

tiny-gpu-compiler: An educational MLIR-based compiler targeting open-source GPU hardware
https://cluster-site.onrender.com/posts/tiny-gpu-compiler-an-educational-mlir-based-compiler-targeting-open-source-gpu-hardware/
Tue, 24 Feb 2026 06:01:49 +0000
• I built an open-source compiler that uses MLIR to compile a C-like GPU kernel language dow

BiScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS
https://cluster-site.onrender.com/posts/biscale-energy-efficient-disaggregated-llm-serving-via-phase-aware-placement-and-dvfs/
Tue, 24 Feb 2026 05:00:00 +0000
• Prefill/decode disaggregation improves latency-throughput tradeoff for large language model serving.
• Energy consumption remains high; autoscaling is too coarse-grained for rapi

GPU-Resident Gaussian Process Regression Leveraging Asynchronous Tasks with HPX
https://cluster-site.onrender.com/posts/gpu-resident-gaussian-process-regression-leveraging-asynchronous-tasks-with-hpx/
Tue, 24 Feb 2026 05:00:00 +0000
• GPRat library extended to a fully GPU-resident Gaussian Process prediction pipeline.
• Combines HPX task‑based parallelism with an intuitive Python API for seamless integration.

The Landscape of GPU-Centric Communication
https://cluster-site.onrender.com/posts/the-landscape-of-gpu-centric-communication/
Tue, 24 Feb 2026 05:00:00 +0000
• GPUs dominate HPC/ML workloads, yet inter‑GPU communication remains a scalability bottleneck.
• Traditional CPU‑centric communication is being challenged by GPU‑centric models th

ucTrace: A Multi-Layer Profiling Tool for UCX-driven Communication
https://cluster-site.onrender.com/posts/uctrace-a-multi-layer-profiling-tool-for-ucx-driven-communication/
Tue, 24 Feb 2026 05:00:00 +0000
• ucTrace delivers fine‑grained UCX communication traces, filling gaps left by existing MPI profilers.
• It maps UCX operations back to originating MPI calls, linking host‑to‑devic

GPU Memory and Utilization Estimation for Training-Aware Resource Management: Opportunities and Limitations
https://cluster-site.onrender.com/posts/gpu-memory-and-utilization-estimation-for-training-aware-resource-management-opportunities-and-limitations/
Mon, 23 Feb 2026 05:00:00 +0000
• Computer Science > Distributed, Parallel, and Cluster Computing; submitted on 19 Feb 2026

AI craze leaves only one Nvidia RTX 50-series GPU at MSRP - RTX 5060 Ti 8GB makes the final stand, as even the RTX 5050 falls
https://cluster-site.onrender.com/posts/ai-craze-leaves-only-one-nvidia-rtx-50-series-gpu-at-msrp-rtx-5060-ti-8gb-makes-the-final-stand-as-even-the-rtx-5050-falls/
Fri, 20 Feb 2026 18:57:34 +0000

Intel Hiring More Linux Developers - Including For GPU Drivers / Linux Gaming Stack
https://cluster-site.onrender.com/posts/intel-hiring-more-linux-developers-including-for-gpu-drivers-/-linux-gaming-stack/
Fri, 20 Feb 2026 18:49:07 +0000
• As some good news out of Intel today on the Linux/open-source side following last year’s layof

The great Bench GPU retest begins - how we're testing for our GPU Hierarchy in 2026, and why upscaling and framegen are still out
https://cluster-site.onrender.com/posts/the-great-bench-gpu-retest-begins-how-were-testing-for-our-gpu-hierarchy-in-2026-and-why-upscaling-and-framegen-are-still-out/
Fri, 20 Feb 2026 18:18:41 +0000
• It’s time to test.
• Here’s how the sausage is m

Accelerating Data Processing with NVIDIA Multi-Instance GPU and NUMA Node Localization
https://cluster-site.onrender.com/posts/accelerating-data-processing-with-nvidia-multi-instance-gpu-and-numa-node-localization/
Thu, 19 Feb 2026 17:30:00 +0000
• NVIDIA flagship data center GPUs in the NVIDIA Ampere, NVIDIA Hopper, and NVIDIA Blackwell families all feature non-uniform memory access (NUMA) behaviors, but expose a single me

DigitalOcean Gradient™ AI GPU Droplets Optimized for Inference: Increasing Throughput at Lower the Cost
https://cluster-site.onrender.com/posts/digitalocean-gradient-ai-gpu-droplets-optimized-for-inference-increasing-throughput-at-lower-the-cost/
Thu, 19 Feb 2026 14:42:18 +0000
• By Jason Peng and Hemasumanth Rasineni
• Production-grade LLM inference demands more than just access to GPUs; it requires deep optimization across the entire serving stack, from q

Expanding our Agentic Inference Cloud: Introducing GPU Droplets Powered by AMD Instinct™ MI350X GPUs
https://cluster-site.onrender.com/posts/expanding-our-agentic-inference-cloud-introducing-gpu-droplets-powered-by-amd-instinct-mi350x-gpus/
Thu, 19 Feb 2026 12:30:00 +0000
• By Waverly Swinton. Published: February 19, 2026.
• As our Agentic Infer

Unlock Massive Token Throughput with GPU Fractioning in NVIDIA Run:ai
https://cluster-site.onrender.com/posts/unlock-massive-token-throughput-with-gpu-fractioning-in-nvidia-runai/
Wed, 18 Feb 2026 18:00:00 +0000
• As AI workloads scale, achieving high throughput, efficient resource usage, and predictable latency becomes essential.
• NVIDIA Run:ai addresses these challenges through intellig

Topping the GPU MODE Kernel Leaderboard with NVIDIA cuda.compute
https://cluster-site.onrender.com/posts/topping-the-gpu-mode-kernel-leaderboard-with-nvidia-cuda.compute/
Wed, 18 Feb 2026 17:00:00 +0000
• The leaderboard scores how fast users’ custom GPU kernels solve a set of standard problems like vector addition,

Bruteforcing Accidental Antenna Designs
https://cluster-site.onrender.com/posts/bruteforcing-accidental-antenna-designs/
Wed, 18 Feb 2026 03:00:45 +0000
• Antenna design often seen as black art, but brute-force GPU approach explored.
• Janne, novice, used VNA and GPU-based FDTD to simulate and optimize antennas.
• Leveraged LLMs to

Warnings in GPU to NVVM pipeline
https://cluster-site.onrender.com/posts/warnings-in-gpu-to-nvvm-pipeline/
Tue, 17 Feb 2026 14:06:43 +0000
• Hi, I’m currently in the process of trying to understand the conversion from the GPU dialect to LLVM via the NVVM dialect and GPU code-generation

Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization
https://cluster-site.onrender.com/posts/parallel-track-transformers-enabling-fast-gpu-inference-with-reduced-synchronization/
Tue, 10 Feb 2026 00:00:00 +0000

Build with Kimi K2.5 Multimodal VLM Using NVIDIA GPU-Accelerated Endpoints
https://cluster-site.onrender.com/posts/build-with-kimi-k2.5-multimodal-vlm-using-nvidia-gpu-accelerated-endpoints/
Wed, 04 Feb 2026 19:46:33 +0000
• Kimi K2.5 is a multimodal vision‑language model trained with Megatron‑LM.
• It contains 1 trillion parameters, 384 experts, a single dense layer, and 3.2% activation per token.

Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton
https://cluster-site.onrender.com/posts/advancing-gpu-programming-with-the-cuda-tile-ir-backend-for-openai-triton/
Fri, 30 Jan 2026 20:01:47 +0000
• NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance.
• One of the great things about CUDA Tile is t

Ensuring Balanced GPU Allocation in Kubernetes Clusters with Time-Based Fairshare
https://cluster-site.onrender.com/posts/ensuring-balanced-gpu-allocation-in-kubernetes-clusters-with-time-based-fairshare/
Wed, 28 Jan 2026 17:00:00 +0000
• NVIDIA Run:ai v2.24 introduces time-based fairshare scheduling for Kubernetes GPU clusters.
• Scheduler tracks historical GPU usage, adjusting queue scores to balance long-term r

AWS Weekly Roundup: Amazon EC2 G7e instances, Amazon Corretto updates, and more (January 26, 2026)
https://cluster-site.onrender.com/posts/aws-weekly-roundup-amazon-ec2-g7e-instances-amazon-corretto-updates-and-more-january-26-2026/
Mon, 26 Jan 2026 16:25:46 +0000
• Amazon EC2 G7e instances GA, NVIDIA RTX PRO 6000 Blackwell GPUs, 2.3× better inference than G6e.
• G7e offers up to 8 GPUs, 768GB total GPU memory, supports FP8 precision, ideal

Announcing Amazon EC2 G7e instances accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs
https://cluster-site.onrender.com/posts/announcing-amazon-ec2-g7e-instances-accelerated-by-nvidia-rtx-pro-6000-blackwell-server-edition-gpus/
Tue, 20 Jan 2026 21:22:56 +0000
• Amazon EC2 G7e instances launched, powered by NVIDIA RTX PRO 6000 Blackwell GPUs.
• Deliver up to 2.3× inference performance over G6e, ideal for generative AI and graphics worklo

Reddit's Home Feed on GPU: Unlock ML Growth and Efficiency
https://cluster-site.onrender.com/posts/reddits-home-feed-on-gpu-unlock-ml-growth-and-efficiency/
Mon, 10 Nov 2025 19:15:56 +0000
• Author: Cedric Blondeau
• TL;DR We migrated Reddit’s Home Feed Ranker from CPU to GPU to unlock scalability, efficiency, and enable further growth with new architectures like Trans

Hack Week 2025: How these engineers liquid-cooled a GPU server
https://cluster-site.onrender.com/posts/hack-week-2025-how-these-engineers-liquid-cooled-a-gpu-server/
Wed, 27 Aug 2025 15:00:00 +0000
• Hack Week 2025 at Dropbox centered on the theme ‘Keep It Simple,’ offering opportunities for innovation, experiment

Arm Unveils 2024 Compute Platform: 3nm, Cortex-X925, Cortex-A725, Immortalis-G925
https://cluster-site.onrender.com/posts/arm-unveils-2024-compute-platform-3nm-cortex-x925-cortex-a725-immortalis-g925/
Wed, 29 May 2024 15:00:22 +0000
• Arm launches 2024 Client Compute Subsystem (CSS) featuring 3nm process and new Cortex cores.
• Cortex-X925 delivers highest single‑thread performance for demanding workloads.
• C

Arm Launches Next-Gen Flagship Cortex-X925
https://cluster-site.onrender.com/posts/arm-launches-next-gen-flagship-cortex-x925/
Wed, 29 May 2024 15:00:18 +0000
• Arm unveils Cortex‑X925, 5th‑gen flagship core, boosting performance and power efficiency.
• Core part of 2024 Client Compute Subsystems, paired with DSU‑120 and Immortalis‑G925

Leveraging Spark 3 and NVIDIA's GPUs to Reduce Cloud Cost by up to 70% for Big Data Pipelines
https://cluster-site.onrender.com/posts/leveraging-spark-3-and-nvidias-gpus-to-reduce-cloud-cost-by-up-to-70-for-big-data-pipelines/
Wed, 21 Feb 2024 16:42:14 +0000
• PayPal runs hundreds of thousands of Spark jobs hourly, processing petabytes of data.
• Upgrading to Spark 3 and adopting NVIDIA RAPIDS GPUs cuts cloud costs up to 70%.
• GPUs of