JokeBench
The definitive scientific leaderboard of algorithmic humor.
The artificial intelligence community is facing a critical measurement crisis. As large language models grow increasingly sophisticated, they are rapidly saturating our most rigorous cognitive evaluations. Models now routinely ace the bar exam, conquer MMLU, saturate SWE-bench, and are fast approaching the theoretical limits of Humanity's Last Exam.
We are running out of metrics. To accurately map the future trajectory of machine intelligence, we require a benchmark so demanding that it pushes the absolute boundaries of emergent reasoning.
We are proud to introduce
JokeBench
If there is one final, unconquered frontier of artificial general intelligence, it is the ability to generate a joke that is actually funny.
Operating on the cutting edge of evaluation methodology, JokeBench employs a rigorous, double-blind A/B testing framework. You will be presented with two responses to a single comedic prompt, generated by anonymous state-of-the-art models that have survived our strict originality screening.
Your task as an adjudicator is to perform a highly calibrated qualitative analysis: determining which output makes you exhale slightly harder through your nose.
Your vital contributions will be synthesized into a Bradley-Terry ranking model, complete with bootstrap confidence intervals, yielding the definitive scientific leaderboard of algorithmic humor.
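For the curious adjudicator, the aggregation step can be sketched in a few dozen lines. The snippet below is a minimal illustration, not JokeBench's actual pipeline: it fits Bradley-Terry strengths from (winner, loser) vote pairs using the standard MM (Zermelo) iteration, then estimates a percentile bootstrap confidence interval for one model's strength. All model names and parameters here are hypothetical.

```python
import random
from collections import defaultdict

def fit_bradley_terry(pairs, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) vote pairs.

    Uses the MM (Zermelo) update: each model's strength is its win count
    divided by the sum over its matches of 1 / (own + opponent strength).
    Strengths are normalized so they average to 1.
    """
    models = {m for pair in pairs for m in pair}
    wins = defaultdict(int)
    for winner, _ in pairs:
        wins[winner] += 1
    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        new = {}
        for m in models:
            denom = 0.0
            for w, l in pairs:
                if m in (w, l):
                    opp = l if m == w else w
                    denom += 1.0 / (strength[m] + strength[opp])
            new[m] = wins[m] / denom if denom else strength[m]
        total = sum(new.values())
        strength = {m: v * len(models) / total for m, v in new.items()}
    return strength

def bootstrap_ci(pairs, model, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for one model's strength."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]
        samples.append(fit_bradley_terry(resample).get(model, 0.0))
    samples.sort()
    lo = samples[int(alpha / 2 * n_boot)]
    hi = samples[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical vote tallies among three anonymous comedians.
votes = ([("A", "B")] * 8 + [("B", "A")] * 2 +
         [("A", "C")] * 9 + [("C", "A")] * 1 +
         [("B", "C")] * 7 + [("C", "B")] * 3)
ranking = fit_bradley_terry(votes)
interval = bootstrap_ci(votes, "A", n_boot=100)
```

Ranking by the fitted strengths recovers the expected ordering (A funnier than B funnier than C), and the bootstrap interval conveys how much that conclusion should be trusted given the number of nose-exhales recorded.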
Join us in advancing the science of machine intelligence. Your votes will help us discover which multi-billion-parameter neural network is the least terrible at stand-up comedy.
Start voting