For two years, "AI" meant "a thing you type at." A chat box. A prompt. A response. The interaction was a conversation, and the only entity doing actual work in the world was you.
That's over.
The shape of AI in 2025 is the agent: software that takes a goal, figures out the steps, picks the tools, runs the workflow, evaluates whether it worked, and tries again. You don't talk to it. You hand it a task. This is the biggest interface shift since the touchscreen, and it's happening in roughly eighteen months instead of a decade.
What an agent actually is
The shortest defensible definition: a system that takes a goal, plans a sequence of steps to reach it, takes actions in the world (usually by calling tools or APIs), observes the results, and adjusts.
The minimum required parts:
- A model that can reason. The brain.
- A set of tools or actions it can invoke. The hands.
- A loop that lets it iterate — try, observe, adjust, try again. The persistence.
- Some form of memory — short-term context, long-term store, episodic record of what it did.
A chatbot has the brain and, at most, one conversation's worth of context. It doesn't have the hands, the loop, or any memory that outlasts the session. That's the whole difference.
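To make the four parts concrete, here is a minimal sketch of the loop in Python. Everything in it is hypothetical: `toy_model` stands in for a real reasoning model, `add` stands in for a real tool or API, and the goal is just "count up to a number." It only illustrates how brain, hands, loop, and memory fit together, not how any particular product is built.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


# The hands: a hypothetical tool. A real agent would call an API here.
def add(a: float, b: float) -> float:
    return a + b


TOOLS: Dict[str, Callable[..., float]] = {"add": add}


# The memory: an episodic record of what the agent did.
@dataclass
class Memory:
    steps: List[dict] = field(default_factory=list)


# The brain: a stand-in for a model call. It looks at the goal and the
# history, then decides on the next action.
def toy_model(goal: float, memory: Memory) -> dict:
    current = memory.steps[-1]["result"] if memory.steps else 0.0
    if current >= goal:
        return {"action": "finish", "answer": current}
    return {"action": "add", "args": {"a": current, "b": 1.0}}


# The loop: plan, act, observe, record, repeat, with a step budget.
def run_agent(goal: float, max_steps: int = 10) -> Tuple[Optional[float], Memory]:
    memory = Memory()
    for _ in range(max_steps):
        decision = toy_model(goal, memory)           # plan the next step
        if decision["action"] == "finish":
            return decision["answer"], memory
        tool = TOOLS[decision["action"]]             # pick the tool
        result = tool(**decision["args"])            # act
        memory.steps.append(                         # observe and remember
            {"action": decision["action"], "args": decision["args"], "result": result}
        )
    return None, memory                              # budget exhausted


answer, memory = run_agent(goal=3.0)
print(answer, len(memory.steps))
```

Strip out `TOOLS`, the `for` loop, and `Memory`, and what remains is a single call to the model, which is the chatbot case.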
The spectrum
"Agent" is a slippery word because it spans four very different products:
- Single-turn assistant. Answers a question, no follow-up. Most ChatGPT usage in 2023.
- Multi-turn copilot. Suggests one step at a time, waits for you to approve. GitHub Copilot. Most 2024 "AI features."
- Task-completing agent. Given a goal, figures out the path and executes it. Cursor's agent mode, Devin, Claude Code, browser-use agents.
- Long-horizon autonomous agent. Operates over hours, days, or weeks. Sets its own sub-goals. Reports back. Early production deployments now.
The difference between an assistant and an agent is whether the user is in the loop on every step. The difference between an agent and an autonomous agent is whether the user is in the loop at all.
What they're already doing
Concrete production examples, not demos:
- Code. Take a GitHub issue, read the codebase, write the fix, run the tests, open a PR. (Cursor, Devin, Claude Code, GitHub Copilot Workspace.)
- Research. Assemble a literature review, synthesize sources, surface contradictions, write the brief. (Elicit, Perplexity, ChatGPT Deep Research.)
- Browser & computer use. Log into sites, click through, fill forms, scrape data, take screenshots. (Anthropic's computer use, OpenAI Operator, Google's Project Mariner.)
- Operations. Process invoices, reconcile accounts, run recruiting pipelines, triage support tickets.
- Sales. Research a prospect, draft a personalized outreach, send it, follow up.
- Personal. Book travel. Plan an event. Fight a parking ticket.
None of these are speculative. All of them have running production systems handling real volume in 2025.
Why now
Three things converged in late 2024 and broke open in 2025:
01 Model reliability crossed a threshold
GPT-4-class and Claude-3.5-class models can follow multi-step instructions reliably enough that an agent finishes most tasks without falling over. Below that bar, agents are demos. Above it, they're products.
02 Tool-use got standardized
MCP, function-calling APIs, structured outputs: the plumbing for connecting a model to the outside world got common enough that you can build a working agent in an afternoon. Nearly everyone now builds on the same primitive, a model emitting a structured call that the host program executes, and the standards around it are converging quickly.
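A rough sketch of that primitive, in Python: the tool is described with a JSON-Schema-style spec, the model emits a structured call as JSON, and the host program validates and runs it. The tool name `get_weather`, its stubbed result, and the exact shape of the call are assumptions made for illustration; real function-calling APIs and MCP servers each define their own wire format.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical tool description, in the JSON-Schema style that
# function-calling interfaces commonly use.
GET_WEATHER_SPEC = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> dict:
    # Stub. A real tool would call a weather service here.
    return {"city": city, "temp_c": 21}


REGISTRY: Dict[str, Tuple[dict, Callable[..., dict]]] = {
    "get_weather": (GET_WEATHER_SPEC, get_weather),
}


def dispatch(model_output: str) -> dict:
    """Parse a structured tool call emitted by a model and execute it."""
    call = json.loads(model_output)
    spec, fn = REGISTRY[call["name"]]
    args = call.get("arguments", {})
    params = spec["parameters"]
    missing = [k for k in params["required"] if k not in args]
    unknown = [k for k in args if k not in params["properties"]]
    if missing or unknown:
        raise ValueError(f"bad arguments: missing={missing}, unknown={unknown}")
    return fn(**args)


# Pretend the model produced this string as its structured output.
print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
```

The validation step is the part that matters: the model only proposes a call, and the host decides whether it is well-formed enough to run.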
03 Cost dropped an order of magnitude
Running a fifty-step agent task used to cost around ten dollars. It now costs cents. The economics of "let the agent try" flipped from extravagant to obvious.
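The arithmetic behind that shift is simple enough to sketch. The token counts and per-million-token prices below are illustrative placeholders chosen to land in the same ballpark as the figures above; they are not quotes of any vendor's actual rates, and a real agent's context usually grows from step to step rather than staying flat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    usd_per_million_input: float
    usd_per_million_output: float


def task_cost(steps: int, input_tokens_per_step: int,
              output_tokens_per_step: int, pricing: Pricing) -> float:
    """Cost in USD of one agent task, assuming flat token usage per step."""
    total_in = steps * input_tokens_per_step
    total_out = steps * output_tokens_per_step
    return (total_in * pricing.usd_per_million_input
            + total_out * pricing.usd_per_million_output) / 1_000_000


# Placeholder prices (assumptions, not real rate cards).
OLDER = Pricing(usd_per_million_input=30.0, usd_per_million_output=60.0)
NEWER = Pricing(usd_per_million_input=0.15, usd_per_million_output=0.60)

# A fifty-step task with an assumed 4,000 input and 500 output tokens per step.
older_cost = task_cost(50, 4_000, 500, OLDER)
newer_cost = task_cost(50, 4_000, 500, NEWER)
print(f"older: ${older_cost:.2f}  newer: ${newer_cost:.3f}")
```

Under these assumptions the same task goes from roughly $7.50 to about 4.5 cents. Swap in your own token counts and prices to see where a specific workload lands.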
What's still hard
This is where the marketing and the engineering reality diverge.
Reliability. Agents fail at unexpected places. A 95% success rate looks fantastic in a demo. In production, it means one in twenty runs needs human cleanup.
Cost at scale. Long-running agents are expensive. The token economics that work for a single user collapse for an enterprise running thousands of agent runs per day.
Permissions. How much should an agent be allowed to do without asking? "Reply to an email" feels safe. "Send money" doesn't. The line between the two is unclear and contested.
Evaluation. Measuring whether an agent did the task correctly is much harder than measuring whether a model produced a good answer. Eval frameworks for agents are being built in real time, in public, with everyone watching.
Trust calibration. Users have to learn when to trust an agent and when to verify. Most haven't built that intuition yet — and the agents themselves aren't great at signaling their own uncertainty.
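The reliability and evaluation points can be put in numbers. The sketch below is a toy: `flaky_agent` is a made-up stand-in that fails on every twentieth input, each test case pairs a task with its own pass/fail check, and the 2,000 runs per day is an assumed volume, not a measurement.

```python
from typing import Callable, List, Optional, Tuple


def flaky_agent(x: int) -> Optional[int]:
    """Toy stand-in for an agent: doubles x, but fails on every 20th input."""
    if x % 20 == 0:
        return None
    return 2 * x


Case = Tuple[int, Callable[[Optional[int]], bool]]


def evaluate(agent: Callable[[int], Optional[int]], cases: List[Case]) -> float:
    """Fraction of cases where the agent's output passes that case's check."""
    passed = sum(1 for task, check in cases if check(agent(task)))
    return passed / len(cases)


def expected_cleanups(runs_per_day: int, success_rate: float) -> float:
    """Expected number of runs per day that need a human to step in."""
    return runs_per_day * (1 - success_rate)


# One hundred tasks, each with an explicit success criterion.
cases: List[Case] = [(x, lambda out, x=x: out == 2 * x) for x in range(1, 101)]

rate = evaluate(flaky_agent, cases)
cleanups = expected_cleanups(2_000, rate)
print(f"success rate: {rate:.0%}, expected cleanups per day: {cleanups:.0f}")
```

With a 95% success rate and an assumed 2,000 runs a day, that is about 100 runs a day landing on a human. The hard part in practice is the `check` function: for real tasks, deciding what counts as "done correctly" is where most of the evaluation work goes.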
How the world is reorganizing
Six shifts already visible:
01 The interface stops being a chat box
The chat box becomes one tool the agent uses. The agent itself runs invisibly — inside an app you used to use manually, or in a background process that surfaces only when it needs you.
02 Vertical agents per industry
Generic agents are giving way to specialized ones: legal research agents, clinical documentation agents, recruiting agents, financial reconciliation agents. Each trained on a domain's workflows and evaluated against the domain's standards of correctness.
03 Agent-to-agent protocols
Standards are emerging for agents to authenticate each other, negotiate, and exchange tasks. MCP, which standardizes how a model connects to tools and data sources, is one building block. Agent identity standards are another. This year is the agent-internet equivalent of 1995 HTTP: primitive but unmistakable.
04 New economic models
Per-seat SaaS is being supplemented by per-task and per-outcome pricing. "Pay for the result, not the software." Whether this scales is one of the open questions of the next two years.
05 Identity and accountability
Whose agent did this? Who's liable when it goes wrong? Cryptographic agent identity is becoming a real engineering problem, not a thought experiment.
06 The labor conversation gets sharper
It's no longer "AI will help your job." It's "an agent might do most of the work you used to do." Some roles are restructured. Some are eliminated. The conversation that was abstract in 2023 is now operational.
The skill gap, again
We wrote in the last Insights piece about the five-part global AI skill gap. Agents make all five worse — especially productive use and domain integration.
Using AI well in 2023 meant writing good prompts. Using AI well in 2025 means:
- Scoping a task at the right granularity for an agent to execute
- Defining clear success criteria — and evals the agent can be measured against
- Setting permission boundaries — what can the agent touch, what does it need to ask first
- Watching the agent run and intervening when it goes off-track
- Auditing what it actually did, after the fact
Almost none of this is in the existing AI literacy curriculum. The curriculum is two model generations behind.
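Several of the skills in that list (scoping, success criteria, permission boundaries, auditing) can be written down as data rather than left as intuition. The sketch below is one hypothetical way to do it in Python. The names `TaskSpec`, `AuditLog`, and `authorize`, and the example actions, are invented for illustration and do not come from any particular agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class TaskSpec:
    goal: str                       # scoped to something one agent run can finish
    success_criteria: List[str]     # what "done" means, stated up front
    allowed_actions: Set[str]       # the agent may do these without asking
    ask_first_actions: Set[str]     # these need a human sign-off


@dataclass
class AuditLog:
    entries: List[Tuple[str, str]] = field(default_factory=list)


def authorize(spec: TaskSpec, action: str, log: AuditLog,
              approve: Callable[[str], bool] = lambda action: False) -> bool:
    """Decide whether an action may run, and record the decision either way."""
    if action in spec.allowed_actions:
        verdict = "allowed"
    elif action in spec.ask_first_actions and approve(action):
        verdict = "approved"
    else:
        verdict = "blocked"         # default-deny anything unlisted or unapproved
    log.entries.append((action, verdict))
    return verdict != "blocked"


spec = TaskSpec(
    goal="Clear today's vendor inbox",
    success_criteria=["every vendor email has a reply or an escalation"],
    allowed_actions={"reply_to_email"},
    ask_first_actions={"send_money"},
)
log = AuditLog()
authorize(spec, "reply_to_email", log)
authorize(spec, "send_money", log)  # no approver supplied, so it is blocked
print(log.entries)
```

The design choice worth noticing is default-deny: anything not explicitly listed is blocked, and every decision lands in the log so the run can be audited after the fact.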
The bet
If chatbots were a new feature, agents are a new form factor. The companies, professionals, and educators who treat them as the next chat interface will lose ground over the next twenty-four months to the ones who reorganize around a simpler question: what does our work look like when most of it is being done by software you can hire by the task?
The transition won't be uniform. It won't be fair. The five gaps the world has on AI literacy carry forward to agents — and likely get wider before they close.
That's the work we're building toward at the AI Impact Foundation. The curriculum is being rewritten to train people not just to use AI, but to direct it.
That's the work. Come help us do it.