Memes-as-Replies: Can Models Select Humorous Manga Panel Responses?

Computer Science > Machine Learning [Submitted on 21 Jan 2026]

Abstract: Memes are a popular element of modern web communication, used not only as static artifacts but also as interactive replies within conversations. While computational research has focused on analyzing the intrinsic properties of memes, the dynamic and contextual use of memes to create humor remains an understudied area of web science. To address this gap, we introduce the Meme Reply Selection task and present MaMe-Re (Manga Meme Reply Benchmark), a benchmark of 100,000 human-annotated pairs (500,000 total annotations from 2,325 unique annotators) consisting of openly licensed Japanese manga panels and social media posts. Our analysis reveals three key insights: (1) large language models (LLMs) show preliminary evidence of capturing complex social cues such as exaggeration, moving beyond surface-level semantic matching; (2) the inclusion of visual information does not improve performance, revealing a gap between understanding visual content and effectively using it for contextual humor; (3) while LLMs can match human judgments in controlled settings, they struggle to distinguish subtle differences in wit among semantically similar candidates. These findings suggest that selecting contextually humorous replies remains an open challenge for current models.

Article Summaries:

A new study introduces the Meme Reply Selection task, which asks whether models can choose a humorous manga panel as a reply to a social-media post. The authors built MaMe-Re (Manga Meme Reply Benchmark), a benchmark of 100,000 human-annotated pairs (500,000 annotations from 2,325 unique annotators) that pair openly licensed Japanese manga panels with social-media posts. Experiments show that large language models give preliminary evidence of capturing complex social cues such as exaggeration, going beyond surface-level semantic matching, but adding visual information does not improve performance, which points to a gap between understanding visual content and using it for contextual humor. While the models can match human judgments in controlled settings, they struggle to distinguish subtle differences in wit among semantically similar candidates, so selecting contextually humorous replies remains an open challenge.
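
To make the task framing concrete, here is a minimal Python sketch of reply selection scored against human annotations. It is an illustration under stated assumptions, not the authors' evaluation protocol: the data layout, the binary "funny / not funny" ratings, the five ratings per pair (the average implied by 500,000 annotations over 100,000 pairs), the toy posts and panel texts, and the `word_overlap` scorer are all hypothetical. The scorer stands in for the kind of surface-level semantic matching the abstract contrasts with capturing social cues.

```python
from statistics import mean


def select_reply(post, candidates, score_fn):
    """Return the candidate panel that score_fn rates highest for the post."""
    return max(candidates, key=lambda panel: score_fn(post, panel))


def human_choice(ratings):
    """Return the panel with the highest mean human rating."""
    return max(ratings, key=lambda panel: mean(ratings[panel]))


def agreement_rate(examples, score_fn):
    """Fraction of posts where the scorer's pick equals the human pick."""
    hits = 0
    for ex in examples:
        model_pick = select_reply(ex["post"], list(ex["ratings"]), score_fn)
        hits += model_pick == human_choice(ex["ratings"])
    return hits / len(examples)


def word_overlap(post, panel_text):
    """Hypothetical surface-level scorer: Jaccard overlap of lowercase words."""
    a, b = set(post.lower().split()), set(panel_text.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


# Toy data (invented for illustration). Each panel is represented by its
# dialogue text and carries five assumed binary ratings (1 = funny reply).
EXAMPLES = [
    {
        "post": "I studied all night for the exam",
        "ratings": {
            "I studied nothing and regret everything": [1, 1, 0, 1, 1],
            "The exam is tomorrow": [0, 0, 1, 0, 0],
        },
    },
    {
        "post": "my cat ignores me",
        "ratings": {
            "my cat rules this house": [1, 1, 1, 0, 1],
            "rain again today": [0, 0, 0, 1, 0],
        },
    },
]

if __name__ == "__main__":
    print(agreement_rate(EXAMPLES, word_overlap))
```

In this toy data the overlap scorer agrees with the human pick on the second post but not the first, where the panel sharing more words with the post is not the one rated funnier. Any other scorer, such as a wrapper around an LLM or a vision-language model, could be passed in place of `word_overlap` as long as it takes a post and a candidate and returns a number.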

Sources:

https://arxiv.org/abs/2602.15842