It’s become a predictable pattern. Every few weeks, a new AI headline drops and the breathless commentary ensues. Human workers outperformed. Knowledge work is dead.

The reaction is understandable enough. Anyone who has spent any time working with modern AI systems knows that the capability of these tools is unprecedented. But extrapolating out to the end of all knowledge work is a big leap; in spite of the clickbait headlines, an imminent job apocalypse is not a foregone conclusion.
Perhaps this is just “cope,” or a lack of imagination on my part. But even if we take the end of all human knowledge work as a given, it’s still worth having a detailed mental model of what the path between here and there might look like.
Consider OpenAI’s release of GPT 5.4 in March of this year. The company, with plenty of help from the press and online influencers eager for clicks, touted that the model outperforms 83% of knowledge workers. If you stopped reading there, you could be forgiven for throwing your hands up in despair. But what’s behind this number? How do they define knowledge work? And what is their method for measuring performance?
The benchmark used in this case is GDPval, and its stated goal is to “evaluate AI model capabilities on real-world economically valuable tasks.” It’s a rigorous evaluation, and the methodology behind it is available online for anyone to review. I’ve spent some time doing just that and concluded that it’s worth taking seriously, but only with caveats.
The most important thing to understand about OpenAI’s claim is the specific types of tasks they’re evaluating. Quoting from the white paper, “For GDPval, we provide the full context of the task in the prompt.” Essentially, the eval is based strictly on well-defined, well-understood, and perfectly documented tasks. Actual examples from the dataset include:
- Concierge: Create a week-long luxury Bahamas itinerary for a family of four
- Order Clerk: Audit pricing inconsistencies in a given set of purchase orders
- Real Estate Agent: Design a sales brochure for a specific property
The prompts provided to the models to assist them in these tasks are highly detailed, providing thorough instructions along with examples and supporting resources.
The paper goes on to acknowledge the limits of the methodology: “…in real life, it often takes effort to figure out the full context of a task and understand what to work on.” They even provide a section on performance in “under-contextualized” domains, showing that the win rate of the models over humans drops from 83% to 43% when the model is given a little less than half of the context.
The bottom line, at least for now, is that AI models do outperform humans, but only under certain conditions, namely where the scope of the tasks is well-defined, the desired outcomes are perfectly understood, and all the necessary context is thoroughly documented. In other words, conditions that look nothing like the reality that most knowledge workers actually operate in. Real work is about more than executing isolated tasks. It’s about navigating ambiguity, building and leveraging relationships, exercising discretion, and using judgment to fill in gaps, all skills that evaluations like GDPval cannot easily measure and that large language models will struggle to replicate.
The point of this isn’t to dismiss the potential for job disruption that AI presents; it’s to help you build your own intuition for how exactly it will play out in your specific domain. Being concerned about your ability to stay relevant and provide for yourself and your family in the age of AI isn’t just understandable, it’s morally responsible. But if you really want to grasp the implications of this technology, you have to go beyond cherry-picked statistics and clickbait headlines.
Only by appreciating the nuances can you leverage AI as a force multiplier where it makes sense, while steadily shifting your own contribution toward the kinds of work that humans are still uniquely positioned to do.