When large language models are reliable for judging empathic communication
Abstract
Large language models (LLMs) excel at generating empathic responses in text-based conversations. But how reliably do they judge the nuances of empathic communication?