When Single Answer Is Not Enough: Rethinking Single-Step Retrosynthesis Benchmarks for LLMs