Enterprise adoption of AI from a product manager's point of view

Michael Huang
May 31

Six months. That is usually how long it takes.

You build a clean demo. You walk the stakeholders through it. They are impressed. You leave the room with momentum and a follow-up meeting scheduled. Six months later, the product sits untouched. Not because it broke. Because nobody trusted it enough to keep using it.

Bhavna opened her talk at Coworking Friday this way, and I recognised the room in her description. Most of us had watched this happen. Several of us from inside it.

Bhavna is the co-founder of Kind AI. Her team ships agentic AI for debt collections across Indonesia and the Philippines. Regulated, operationally complex, high-stakes. When she talks about the distance between a demo and actual adoption, she is drawing from a year of working in that gap directly.

The failures your logs will never show

Every engineering failure surfaces somewhere. Hallucinations. Crashes. API timeouts. These show up in audit logs, and builders fix them. Bhavna calls these table stakes. You do not get to launch with engineering failures.

Soft failures are different. They are the off-tone response the agent delivers to an impatient borrower. The missing next step that leaves the user staring at a screen. The reply that technically answers the question and solves nothing. No dashboard catches these. No error log flags them. They accumulate quietly over months and destroy adoption before you have any data to explain why.

The gap between a demo and a product is almost entirely soft failures. A demo is a controlled environment with a perfect user, a clean prompt, and one intended path. Real usage is none of those things. Variable users. Edge cases nobody designed for. Intents that sit slightly outside the training distribution. The agent drifts into responses that are technically correct and deeply unhelpful. Trust erodes. Users stop trying.

Measuring what actually breaks

The SaaS metrics most of us were trained on do not carry over cleanly to agentic products.

Daily active users, session length, feature click-through: these tell you about usage patterns. They do not tell you whether the agent resolved anything.

Bhavna draws a clean line here. You need to measure what she calls intent resolution.

You track override rates: how often a user rejects what the agent produced and does the task themselves.
You watch escalation frequency: how often the agent pulls a human into the loop to complete a task it should handle on its own.

Override rates, specifically, are a signal worth sitting with. If users override regularly, the agent is producing output. It is not producing anything they trust.
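
As a rough illustration of the two signals above, here is a minimal Python sketch. The `Interaction` record and its field names are hypothetical, not something Bhavna showed, and it assumes you already log whether each interaction was overridden or escalated.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One logged agent interaction. Field names are hypothetical."""
    overridden: bool  # user rejected the agent's output and did the task themselves
    escalated: bool   # agent pulled a human into the loop to finish the task


def override_rate(interactions):
    """Share of interactions where the user rejected the agent's output."""
    if not interactions:
        return 0.0
    return sum(i.overridden for i in interactions) / len(interactions)


def escalation_rate(interactions):
    """Share of interactions where the agent handed off to a human."""
    if not interactions:
        return 0.0
    return sum(i.escalated for i in interactions) / len(interactions)


# Toy log: four interactions, two overridden, one escalated.
log = [
    Interaction(overridden=False, escalated=False),
    Interaction(overridden=True, escalated=False),
    Interaction(overridden=False, escalated=True),
    Interaction(overridden=True, escalated=False),
]

print(f"override rate: {override_rate(log):.0%}")      # 50%
print(f"escalation rate: {escalation_rate(log):.0%}")  # 25%
```

Neither number means much on its own. The useful view is the trend over the life of a pilot, tracked next to the usage metrics rather than in place of them.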

Define good before you write a line of code

Normal software engineering gives you a feedback loop. You ship, you instrument, you fix. Agentic AI does not work this way, because you cannot predict how the agent will behave the way you can predict how a deterministic function will behave.

For AI products, Bhavna believes we have to flip the process entirely. Before touching any code, you define what good looks like. For a debt collection voice agent, that means: was the message clear? Was it actionable? Did it respect compliance guardrails? Did the borrower take a next step? These are rubrics, and you set them before you prototype.

Then you run the prototype against real conditions and grade each output as "ship it", "needs edits", or "unacceptable". You iterate until the agent behaves within those rubrics. Then, and only then, you build.
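
One way to picture the rubric step is the Python sketch below. The four criteria come from the debt collection example above. The `Review` type and the rule that maps a review to a grade are assumptions made for illustration, not Bhavna's actual scoring scheme.

```python
from dataclasses import dataclass

# Rubric criteria from the debt collection example in the text.
CRITERIA = ("clear", "actionable", "compliant", "next_step_taken")


@dataclass
class Review:
    """A human reviewer's verdict on one agent output (hypothetical shape)."""
    clear: bool
    actionable: bool
    compliant: bool
    next_step_taken: bool


def grade(review):
    """Map a rubric review to one of the three labels.

    Assumed rule, for illustration only: a compliance failure is always
    unacceptable, passing every criterion is "ship it", and anything
    in between is "needs edits".
    """
    if not review.compliant:
        return "unacceptable"
    if all(getattr(review, name) for name in CRITERIA):
        return "ship it"
    return "needs edits"


reviews = [
    Review(clear=True, actionable=True, compliant=True, next_step_taken=True),
    Review(clear=True, actionable=False, compliant=True, next_step_taken=False),
    Review(clear=True, actionable=True, compliant=False, next_step_taken=True),
]

print([grade(r) for r in reviews])
# ['ship it', 'needs edits', 'unacceptable']
```

The point of writing the rule down is that the definition of good exists before the prototype does. The exact thresholds are a product decision and will differ by domain.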

A builder in the room asked about the transition period for enterprise customers. He is working on a supply inspection product where the OCR struggles with handwritten forms and the human override rate will be high, at least initially. Bhavna put a number on it.

For enterprise pilots, she benchmarks six to twelve weeks. That is the window where the agent is learning, the customer is watching, and the human-in-the-loop is not an admission of weakness. It is the value proposition. You are giving a regulated buyer confidence while the model earns it.

Twenty customers, two countries, one sharp focus

Kind AI does not have a universal product. Bhavna is adamant about this.

They have mapped twenty enterprise customers across Indonesia and the Philippines. That is their market. Not Southeast Asia broadly, not APAC, but only twenty companies in two countries where they have tested the problem and know the product fits. If she took the same product to the US market tomorrow, she says she would have zero product-market fit, because it is not designed for that market.

Customising agentic AI is expensive. Building for a narrow buyer with a known problem is the only path that does not financially destroy an early-stage team. You go deep into the business before you sell. You read the annual report. You map the stakeholders. You arrive already knowing the pain point.

Someone in the room added to this. In enterprise sales, the builders who land deals often set the demo aside early. Demos matter, no doubt, but understanding what the enterprise is trying to do strategically matters more. It makes the people sitting across the table feel understood. That trust comes before product trust. And once you have it, product trust is far easier to build.

An operator who had sold payroll software added a layer. Enterprise deals typically have two audiences in the room: the stakeholder who owns the budget and the operator whose daily life changes if this ships. They care about completely different things. Getting both in the room and speaking both languages is what separates the pitches that land from the ones that get a polite follow-up and nothing else.

The moment a product earns trust without explaining itself

We spent a while talking about the products that do this well.

Someone pointed to Cursor and the way it narrates what it is doing in the sidebar in real time. Transparency builds trust more reliably than accuracy. If the agent shows its work, users develop a mental model of when to rely on it and when to check it. That mental model is what keeps someone in the product.

Then came the Pendo story. Years ago, a builder in the room had been searching for in-app guidance software. Three vendors showed him polished demos. Pendo pulled up his actual application and pushed a button. A heat map appeared over his page, showing exactly where his users were clicking. Nothing more needed to be said to close the sale. The vendor had proved they understood his job: not his product, but the thing he was responsible for every day. That was the moment. Every conversation after it started from a different place.

Bhavna wrapped with a simple check. Before shipping any agent, ask three questions.

Did it resolve the intent or just output a response? When it failed, did it fail gracefully or make things worse? And does the human using it actually trust it, or are they quietly overriding it every other session?

The last question is not a metric. It is the thing all the metrics eventually point toward.

The SQ Collective is a community of operators and builders in Singapore. We meet on Fridays to cowork, share context, and help each other ship. If you are working on something and want a room where someone has likely already made your next mistake, come find us. Details for our next Coworking Friday are below.

Missed out last week? Don't worry, these conversations happen every Friday at SQ Collective.

Usually over laptops. Sometimes over pizza.

You're welcome to join the next one.