Computer Science > Computation and Language [Submitted on 21 Jan 2026]
Title: Language Model Representations for Efficient Few-Shot Tabular Classification
Abstract: The Web is a rich source of structured data in the form of tables, from product catalogs and knowledge bases to scientific datasets. However, the heterogeneity of the structure and semantics of these tables makes it challenging to build a unified method that can effectively leverage the information they contain. Meanwhile, large language models (LLMs) are becoming an increasingly integral component of web infrastructure for tasks like semantic search. This raises a crucial question: can we leverage these already-deployed LLMs to classify structured data in web-native tables (e.g., product catalogs, knowledge base exports, scientific data portals), avoiding the need for specialized models or extensive retraining? This work investigates a lightweight paradigm, Table Representation with Language Model (TaRL), for few-shot tabular classification that directly utilizes semantic embeddings of individual table rows. We first show that naive application of these embeddings underperforms specialized tabular models.
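As a rough illustration of the naive approach the abstract describes (embed each table row as text, then classify directly from the embeddings), the sketch below serializes a row into a `column: value` string and labels query rows by cosine similarity to the per-class centroids of the few labelled examples. The serialization format, the nearest-centroid classifier, and the names `serialize_row` and `nearest_centroid_predict` are assumptions for illustration, not the paper's method. The embedding step itself (any text-embedding model applied to the serialized rows) is left out, so the classifier takes precomputed embedding arrays.

```python
import numpy as np


def serialize_row(row: dict) -> str:
    """Turn one table row into a 'column: value' string for a text embedder.

    Hypothetical format: the paper's exact serialization is not given here.
    """
    return "; ".join(f"{col}: {val}" for col, val in row.items())


def nearest_centroid_predict(support: np.ndarray, labels: np.ndarray,
                             queries: np.ndarray) -> np.ndarray:
    """Naive few-shot baseline (an assumed stand-in, not TaRL itself).

    support: (n_shots, d) embeddings of the labelled rows
    labels:  (n_shots,) class label of each labelled row
    queries: (n_queries, d) embeddings of the rows to classify
    """
    classes = np.unique(labels)
    # One centroid per class: the mean embedding of that class's support rows
    centroids = np.stack([support[labels == c].mean(axis=0) for c in classes])
    # L2-normalise so the dot product equals cosine similarity
    centroids = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    # Pick the class whose centroid is most similar to each query
    return classes[np.argmax(q @ centroids.T, axis=1)]
```

For example, `serialize_row({"name": "USB-C cable", "price": 9.99})` yields `"name: USB-C cable; price: 9.99"`, which would then be passed to whatever embedding model is already deployed.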
Article Summaries:
- Summary
A new study introduces TaRL (Table Representation with Language Model), a lightweight method that repurposes large language model (LLM) embeddings for few‑shot classification of web‑native tables. While raw LLM embeddings initially underperform specialized tabular models, the authors enhance them by removing a common component shared across rows and by calibrating the softmax temperature. A simple meta‑learner predicts the optimal temperature from handcrafted features. In low‑data regimes (≤32 examples), TaRL matches state‑of‑the‑art performance on semantically rich tables, demonstrating that existing LLM infrastructure can be efficiently reused for web table understanding.
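The two fixes summarized above (removing a component shared across rows, then calibrating the softmax temperature) and the temperature meta-learner can be sketched as follows. This is a minimal sketch under stated assumptions: the "common component" is taken to be the mean embedding plus the top principal direction(s), and the meta-learner is a linear least-squares fit of log-temperature on per-task features. The function names and the choice of features are hypothetical, not the paper's.

```python
import numpy as np


def remove_common_component(emb: np.ndarray, n_directions: int = 1) -> np.ndarray:
    """Subtract the mean embedding shared by all rows, then project out the
    top `n_directions` principal directions of the centered embeddings.

    Assumption: the exact component TaRL removes may differ from this.
    """
    centered = emb - emb.mean(axis=0, keepdims=True)
    if n_directions > 0:
        # Rows of vt are unit-norm principal directions, largest first
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        top = vt[:n_directions]
        centered = centered - (centered @ top.T) @ top
    return centered


def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Row-wise softmax of logits / temperature, shifted for numerical stability."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def _with_bias(features: np.ndarray) -> np.ndarray:
    # Append a constant column so the linear model has an intercept
    return np.hstack([features, np.ones((len(features), 1))])


def fit_log_temperature(features: np.ndarray, best_temps: np.ndarray) -> np.ndarray:
    """Hypothetical meta-learner: least-squares fit of log(temperature) on
    handcrafted per-task features (e.g. number of shots or classes)."""
    w, *_ = np.linalg.lstsq(_with_bias(features), np.log(best_temps), rcond=None)
    return w


def predict_temperature(w: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Map per-task features to a predicted (always positive) temperature."""
    return np.exp(_with_bias(features) @ w)
```

In use, one would apply `remove_common_component` to the support and query embeddings together, compute cosine similarities between queries and class centroids as logits, and pass them to `softmax_with_temperature` with the temperature returned by `predict_temperature`. Fitting log-temperature rather than temperature keeps every prediction positive.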