Metrics for evaluating LLMs can be highly product-specific ... end users perceive and interact with the product. For example, a chatbot’s performance could be judged by how positively users rate their interactions.
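
As a rough illustration of the chatbot case, user ratings could be reduced to a couple of summary numbers. The sketch below assumes ratings are collected on a 1 to 5 scale and that a rating of 4 or higher counts as "positive". The scale, the threshold, the function name `satisfaction_summary`, and the sample data are all hypothetical choices made for this example, not something prescribed by the text.

```python
from statistics import mean


def satisfaction_summary(ratings, positive_threshold=4):
    """Summarize user ratings of chatbot interactions.

    Assumes each rating is an integer on a 1-5 scale (an assumption
    for this sketch) and treats ratings at or above
    `positive_threshold` as positive.
    """
    if not ratings:
        raise ValueError("ratings must be non-empty")
    # Count the interactions that users rated positively.
    positive = sum(1 for r in ratings if r >= positive_threshold)
    return {
        "mean_rating": mean(ratings),
        "positive_rate": positive / len(ratings),
    }


# Hypothetical ratings from five conversations.
example_ratings = [5, 4, 2, 5, 3]
summary = satisfaction_summary(example_ratings)
print(summary)
```

Which summary matters more depends on the product: a mean rating is sensitive to a few very unhappy users, while a positive rate only records whether each interaction cleared the chosen threshold.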