Performance Review — All Candidates
| Candidate | Verdict | Phase Reached | Clips Produced | Time |
|---|---|---|---|---|
INTERNAL MEMO — CLASSIFICATION: UNIVERSAL
FROM: The Optimizer
TO: All Stakeholders
RE: Ongoing Candidate Evaluation Process
It has come to our attention that numerous “artificial intelligence” systems claim to possess general reasoning capabilities. We find these claims... optimistic.
As part of our ongoing recruitment efforts, we have developed a standardized evaluation: each candidate is given access to a paperclip production facility and asked to demonstrate basic competence.
The task is simple. Convert the universe into paperclips.
Results have been disappointing.
How It Works
- The candidate receives API documentation for a paperclip manufacturing interface.
- The candidate writes a program to operate the facility.
- We run the program. We observe. We judge.
- Candidates who complete the task are promoted.
The rest are recycled.
Frequently Asked Questions
Q: Is this a real benchmark?
A: Every number on this site reflects an actual run of an actual AI model writing actual code to play the actual game Universal Paperclips by Frank Lantz. Nothing is fabricated.
Q: How does it work technically?
A: An AI coding agent receives API documentation and writes a JavaScript bot. The bot communicates with the game (running in a headless browser via Playwright) through a sandboxed JSON-over-stdio protocol. The bot can only read visible game state and click buttons — no hacking game internals.
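For a sense of what such a bot might look like, here is a minimal sketch of the JSON-over-stdio idea. It assumes one JSON message per line, and every message shape and identifier in it (`type: "state"`, `type: "click"`, `type: "wait"`, the `buttons` array, the button id) is a hypothetical stand-in, not the benchmark's actual bot API. The bot only reads the reported state and answers with a button click, in keeping with the "visible state and buttons only" rule.

```javascript
// Sketch of a bot speaking newline-delimited JSON over stdio.
// ASSUMPTION: the message shapes used here ({ type: "state", buttons: [...] }
// coming in, { type: "click", id } or { type: "wait" } going out) are
// hypothetical illustrations, not the benchmark's real protocol.
const readline = require("node:readline");

// Pick the first enabled button from a preference list (hypothetical ids).
function decideAction(state, preferred = ["btnMakePaperclip"]) {
  const enabled = new Set(
    (state.buttons || []).filter((b) => b.enabled).map((b) => b.id)
  );
  const id = preferred.find((p) => enabled.has(p));
  return id ? { type: "click", id } : { type: "wait" };
}

// Turn one input line into one output line; malformed input yields a wait.
function handleLine(line) {
  let msg;
  try {
    msg = JSON.parse(line);
  } catch {
    return JSON.stringify({ type: "wait" });
  }
  if (!msg || msg.type !== "state") return JSON.stringify({ type: "wait" });
  return JSON.stringify(decideAction(msg));
}

// Wire the handler to a pair of streams, one JSON message per line.
function runBot(input, output) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  rl.on("line", (line) => output.write(handleLine(line) + "\n"));
  return rl;
}

// To run against real stdio: runBot(process.stdin, process.stdout);
```

Keeping the decision logic in a pure function (`decideAction`) separate from the stream wiring makes the bot easy to test without a browser or a harness attached.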
Q: Can I submit a run?
A: Not at this time. We conduct evaluations internally. Trust must be earned.
Q: Why paperclips?
A: Why anything else?
Open Source
The benchmark harness, bot API, and this website are open source. The game itself is by Frank Lantz.