BeamVLM for Low-altitude Economy: Generative Beam Prediction via Vision-language Models

• BeamVLM introduces a vision-language model (VLM) framework for UAV-to-base-station beam prediction in the low-altitude economy (LAE).
• Treats beam prediction as a visual question-answering task, projecting visual patches into the language domain.
• Uses instructional prompts to combine UAV trajectories with environmental context for joint reasoning.
• Outperforms state-of-the-art deep learning methods on real-world datasets.
• Demonstrates superior generalization to vehicle-to-infrastructure (V2I) beam prediction scenarios.
• Addresses the lack of high-level semantic understanding in existing models.
• Bridges the gap between the generalization of large language models (LLMs) and fine-grained spatial perception.
• Provides an end-to-end generative framework for efficient, accurate beam alignment with high-mobility UAVs.
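Projecting visual patches into the language domain can be illustrated with a minimal sketch. It assumes a ViT-style front end: the image is cut into non-overlapping patches, each patch is flattened, and a linear map lifts it to the language model's embedding width so it can sit alongside text tokens. The functions `patchify` and `project`, the single-channel image, and the purely linear projector are hypothetical. The summary does not say which vision encoder or projector BeamVLM uses, and many VLMs run a pretrained vision encoder before a learned projector rather than mapping raw pixels directly.

```python
from typing import List

Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Split an H x W single-channel image into non-overlapping patch x patch
    blocks, scanned row by row, and flatten each block into a vector."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by patch")
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append([image[r + i][c + j]
                            for i in range(patch) for j in range(patch)])
    return patches


def project(patches: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Map each flattened patch p to a 'visual token' weight @ p + bias whose
    length equals the (hypothetical) language-model embedding width."""
    return [[sum(w_k * p_k for w_k, p_k in zip(row, p)) + b
             for row, b in zip(weight, bias)]
            for p in patches]


# Toy example: a 4 x 4 image, 2 x 2 patches, embedding width 2.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
weight = [[0.25, 0.25, 0.25, 0.25],  # token dim 0: mean of the patch
          [1.0, 0.0, 0.0, 0.0]]      # token dim 1: top-left pixel
bias = [0.0, 0.0]
tokens = project(patchify(image, 2), weight, bias)
print(tokens)  # [[2.5, 0.0], [4.5, 2.0], [10.5, 8.0], [12.5, 10.0]]
```

In a trained model the weights are learned, and the resulting tokens are concatenated with the embedded text prompt before being fed to the language model.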

Article Summaries:

BeamVLM is a new end-to-end generative framework that applies vision-language models (VLMs) to beam prediction in low-altitude economy (LAE) networks. The system reframes beam selection as a visual question-answering task: it projects raw visual patches from the UAV's environment into the language domain and pairs them with a carefully designed instructional prompt. This lets the VLM reason jointly about UAV trajectories and fine-grained spatial semantics, overcoming both the limited generalization of prior deep learning methods and the limited fine-grained spatial perception of approaches based on large language models (LLMs). Experiments on real-world datasets show that BeamVLM outperforms state-of-the-art methods in accuracy and generalizes well to vehicle-to-infrastructure (V2I) scenarios.
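How an instructional prompt might pair the image with the UAV trajectory, and how a generated answer could be mapped back to a beam index, is sketched below under stated assumptions. The prompt wording, the `<image>` placeholder, the 64-beam codebook (`NUM_BEAMS`), and the helpers `build_prompt`, `parse_beam_index`, and `top_k_accuracy` are all hypothetical, not BeamVLM's actual template or evaluation code. Top-k accuracy is included only because it is a metric commonly reported for beam prediction.

```python
import re
from typing import List, Optional, Sequence, Tuple

NUM_BEAMS = 64  # hypothetical codebook size, not stated in the summary


def build_prompt(trajectory: Sequence[Tuple[float, float, float]],
                 num_beams: int = NUM_BEAMS) -> str:
    """Build a hypothetical instruction prompt: an <image> placeholder for the
    projected visual tokens, then the UAV's past positions, oldest first."""
    n = len(trajectory)
    history = "; ".join(f"t-{n - k}: ({x:.1f}, {y:.1f}, {z:.1f})"
                        for k, (x, y, z) in enumerate(trajectory))
    return ("<image>\n"
            "You assist a base station with beam alignment for a UAV.\n"
            f"Past UAV positions (x, y, altitude): {history}.\n"
            "Using the image and the trajectory, answer with the index of the "
            f"best beam, an integer from 0 to {num_beams - 1}.")


def parse_beam_index(answer: str, num_beams: int = NUM_BEAMS) -> Optional[int]:
    """Take the first run of digits in the generated text as the beam index;
    return None if there is none or it falls outside the codebook."""
    match = re.search(r"\d+", answer)
    if match is None:
        return None
    idx = int(match.group())
    return idx if 0 <= idx < num_beams else None


def top_k_accuracy(ranked: List[List[int]], truth: List[int], k: int) -> float:
    """Fraction of samples whose true beam appears among the first k ranked
    candidates (e.g. beams ordered by the model's answer likelihood)."""
    hits = sum(t in r[:k] for r, t in zip(ranked, truth))
    return hits / len(truth)


print(build_prompt([(0, 0, 50), (1.5, 0, 50)]))
print(parse_beam_index("The best beam is 17."))  # 17
print(parse_beam_index("Beam 70"))               # None (outside 0..63)
print(top_k_accuracy([[3, 5, 7], [1, 2, 4]], [5, 9], k=2))  # 0.5
```

Parsing and range-checking the generated text matters because a generative model can emit malformed or out-of-codebook answers; returning `None` lets the caller count those as misses or fall back to a default beam.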

Sources:

https://arxiv.org/abs/2602.19929