est. 2025 · double-blind · peer-reviewed*

JokeBench

The definitive LLM arena for algorithmic humor.

[Illustration: Scientist evaluating a jester's joke with rigorous bewilderment]
The Crisis

The artificial intelligence community is facing a measurement crisis. As LLMs grow increasingly sophisticated, they are saturating our most rigorous evaluations. Models routinely ace the bar exam, SWE-bench, and MMLU, and are approaching the theoretical limits of Humanity's Last Exam.

We are running out of metrics. To accurately guide the future development of machine intelligence, we require new benchmarks that push the boundaries of emergent reasoning.

[Illustration: AI models ranked on a benchmark leaderboard podium]

We are proud to introduce

JokeBench

The Frontier

If there is one final, unconquered frontier of artificial general intelligence, it is the ability to generate a joke that is actually funny.

[Illustration: Scientist perplexed by jester's flying pratfall punchline]
The Methodology

JokeBench employs a rigorous, double-blind A/B testing framework. You will be presented with two responses to a single comedic prompt, each generated by an anonymized state-of-the-art model that has survived our strict originality screening.

Your task as an adjudicator is to perform a highly calibrated qualitative analysis: determining which output makes you exhale slightly harder through your nose.
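
For the statistically curious: below is a minimal sketch of how pairwise nose-exhale votes could be folded into a leaderboard, assuming an Elo-style rating update. The function names, starting ratings, and K-factor are illustrative assumptions, not JokeBench's actual scoring pipeline.

```python
# A hypothetical sketch of turning pairwise humor votes into rankings,
# using an Elo-style update (all names and constants are illustrative).

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A wins the vote, under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_ratings(rating_a: float, rating_b: float,
                   a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after a single vote."""
    e_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# One decisive nose-exhale for the anonymized "Model A":
ratings = {"model_a": 1500.0, "model_b": 1500.0}
ratings["model_a"], ratings["model_b"] = update_ratings(
    ratings["model_a"], ratings["model_b"], a_won=True
)
print(ratings)  # at equal ratings and k=32, model_a gains 16 points
```

Elo-style updates are the same family of pairwise aggregation popularized by chatbot arenas; a Bradley-Terry fit over all votes would be a common alternative.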

Your vital contributions will guide Silicon Valley's future billion-dollar investments in AGI.

[Illustration: Scientists at work in the JokeBench evaluation laboratory]

Join us in advancing the science of machine intelligence. Your votes will help us determine which multi-billion-parameter neural network is the least terrible at stand-up comedy.

Start voting