A demo proves an agent can work once. Production proves it can work ten thousand times, on a bad day, with inputs nobody anticipated. The gap between the two is where most of the engineering lives.
We have now put custom AI agents into live operations across manufacturing, financial services, and real estate. The use cases differ wildly, but the lessons that decided success or failure were almost always the same.
Narrow scope beats broad ambition
Every agent that succeeded did one job well. Every agent that struggled was asked to do everything. The instinct to build a single assistant that handles the whole department is the fastest route to something unreliable. Pick one high-volume task, make it excellent, then expand.
Guardrails are the product
In a demo the interesting part is what the agent can do. In production the important part is what it must never do. Hard limits, validation on every output, and clear escape routes to a human are not features you add later. They are the foundation that earns the trust an agent needs to be used at all.
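To make "validation on every output" concrete, here is a minimal sketch of a guardrail layer. It is not taken from any of our deployments: the refund scenario, the `MAX_REFUND` limit, the `ALLOWED_ACTIONS` set, and the `validate` function are all hypothetical names chosen for illustration. The idea it shows is that the agent only ever proposes an action, and anything malformed or out of bounds becomes an escalation to a human instead of being executed.

```python
from dataclasses import dataclass

# Hypothetical hard limits for an illustrative refund-handling agent.
MAX_REFUND = 500.00
ALLOWED_ACTIONS = {"refund", "reply", "escalate"}


@dataclass
class Decision:
    action: str
    amount: float = 0.0
    reason: str = ""


def validate(proposed: dict) -> Decision:
    """Check an agent's proposed action against the hard limits.

    The agent never executes anything directly. Whatever fails a check
    is converted into an escalation, which is the escape route to a human.
    """
    action = proposed.get("action")
    if action not in ALLOWED_ACTIONS:
        return Decision("escalate", reason=f"unknown action: {action!r}")

    if action == "refund":
        amount = proposed.get("amount")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            return Decision("escalate", reason="amount is not a number")
        # Written as a positive range check so that NaN also fails it.
        if not (0 < amount <= MAX_REFUND):
            return Decision("escalate", reason="amount outside hard limit")
        return Decision("refund", amount=float(amount))

    return Decision(action)
```

Calling `validate({"action": "refund", "amount": 5000})` returns an escalation rather than a refund. The design choice worth noting is that the default path is refusal: an action has to pass every check to go through, so a new kind of bad output fails safe.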
"Users do not abandon an agent because it is occasionally wrong. They abandon it because it is wrong in ways they cannot predict or catch."
Keep a human in the loop where it counts
The best deployments are honest about confidence. When the agent is sure, it acts. When it is not, it routes to a person with everything that person needs to decide quickly. That single design choice turns a risky system into a dependable one and gives the team a reason to keep using it.
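The routing rule described above can be sketched in a few lines. This assumes the agent exposes a confidence score between 0 and 1 that is reasonably calibrated, which is a significant assumption in practice. The `AgentResult` type, the `route` function, and the `CONFIDENCE_THRESHOLD` value are hypothetical and would need tuning against real evaluation data.

```python
from dataclasses import dataclass, field

# Hypothetical threshold; a real value would come from evaluation data.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class AgentResult:
    answer: str
    confidence: float  # assumed to be a calibrated score in [0, 1]
    evidence: list[str] = field(default_factory=list)


def route(result: AgentResult) -> dict:
    """Act automatically when confident, otherwise hand off to a person."""
    if result.confidence >= CONFIDENCE_THRESHOLD:
        return {"route": "auto", "answer": result.answer}
    # The handoff carries the suggestion and its supporting evidence,
    # so the reviewer can decide without redoing the agent's work.
    return {
        "route": "human",
        "suggested_answer": result.answer,
        "confidence": result.confidence,
        "evidence": result.evidence,
    }
```

The part that matters is the second branch: a handoff that arrives with the suggested answer and the evidence behind it is what lets the person decide quickly.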
What we now do on every build
- Define the one task and the hard boundaries before any prompting.
- Build evaluation against real historical cases, not invented ones.
- Instrument everything so failures are visible, not silent.
- Track token and inference cost from day one, before volume scales it up.
- Ship to a small group, learn, then widen.
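Several items on this checklist can live in one small harness. The sketch below is an illustration under stated assumptions, not our tooling: it assumes the agent is a callable that returns its output plus a token count, that historical cases are stored as (input, expected output) pairs, and that exact match is a good enough pass criterion. The `evaluate` function and the `COST_PER_1K_TOKENS` price are hypothetical.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent_eval")

# Hypothetical price, for illustration only.
COST_PER_1K_TOKENS = 0.002


def evaluate(
    agent: Callable[[str], tuple[str, int]],
    cases: list[tuple[str, str]],
) -> dict:
    """Run the agent over historical (input, expected) pairs.

    Every failure is logged and collected so nothing fails silently,
    and token usage is totalled so cost is visible from the first run.
    """
    passed = 0
    total_tokens = 0
    failures = []
    for input_text, expected in cases:
        try:
            output, tokens = agent(input_text)
        except Exception:
            log.exception("agent raised on input %r", input_text)
            failures.append(input_text)
            continue
        total_tokens += tokens
        if output == expected:
            passed += 1
        else:
            log.warning("mismatch on %r: got %r, expected %r",
                        input_text, output, expected)
            failures.append(input_text)
    return {
        "accuracy": passed / len(cases) if cases else 0.0,
        "failures": failures,
        "tokens": total_tokens,
        "estimated_cost": total_tokens / 1000 * COST_PER_1K_TOKENS,
    }
```

Running this against the same case set before every release gives a comparable accuracy and cost figure over time. Exact-match scoring is the simplest possible criterion; many tasks will need a looser comparison.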
Agents in production are less about clever models and more about disciplined engineering and honest design. The teams that internalise that are the ones still running their agents a year later.