Magic Theater
HOME
54 Tags | 0 Categories | 31 Posts
benchmark
2025 (1)
AAAR-1.0: Assessing AI's Potential to Assist Research
2024 (2)
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering