Probabilistic Report Cards: LLM Evaluation Metrics
From N-Grams to LLM-as-a-Judge: A deep dive into the evolution of evaluation metrics.