
Building AI Workflows That Non-Technical Staff Actually Use

Nobody wanted to use my Telegram bot, Rex ( -> https://rex.sq-collective.com/).


I built it. I was proud of it. Rex could generate invoices, book calendars, check for double bookings. It could create Luma events, generate cover art, send drip invites, update descriptions.


It even did Luma event invites based on "interest level" through the embedding database I built for Kai ( -> https://kai.sq-collective.com/)


My staff opened it once, got a confusing response, and went back to doing things manually. Canva was less vexing than chatting endlessly with a bot that delivered 70% of what you asked for. At least Canva does exactly what you tell it.


These past two weeks I have been trying my hand at building delightful AI products, with Kai and with Rex, and the feedback has been brutal. But it was what I needed.


Here are some takeaways, fresh off the press.



70% is not 70%. It's 0%.


This is the thing nobody tells you about AI workflows: the standard isn't "almost correct." In its current form, either it works reliably or it's useless.


There's no middle ground when you're building for non-technical users.


An iPhone with a keyboard that glitches sometimes is NOT an iPhone. It's junk.

A developer (me) will debug. My admin won't. When something comes out wrong (wrong image, wrong description, wrong event), a developer thinks "I'll fix it." An admin thinks "Wth, after all that prompting I still have to do it manually... I'll just do it myself next time." And they will, and the product goes in the drawer.


The 30% failure rate doesn't cost you 30% of adoption. It costs you all of it.


This changed how I now think about every workflow I build. If I am not prepared to walk it through to 100%, I don't build it.


The standard is no longer "does it work?" but "does it work reliably enough that a busy person will trust it with their actual job?"



The critique loop that changed everything


I learnt this building Kai, but here's another, more relatable example: my Luma page's event cover art.


I wanted to generate event images with AI. I tried leaving the AI to do it by itself. A total mess. I mean, look at this Frankenstein of an image. It is SO literal: a philosopher looking at a complex machine.




Thankfully, our designer friend Gray graciously wrote me a prompt document called "Design Critique", which specified designer standards such as spacing between fonts, font sizes, color contrast, etc.


I dropped this "critique prompt" into the workflow: before the image ships to anyone, a second AI pass evaluates it against a spec. Is the visual semantically connected to the event topic? Does it look editorial or like a stock photo? Does it fit the brand?


Scores improved significantly. Not because the generation got better (all were made with Gemini), but because we added a feedback loop/judge in the middle before the output reached a human. The critic/judge catches what the worker AI would have passed off as good, just because the worker wanted to finish its work quickly.
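To make the shape of this concrete, here's a minimal sketch of the generate-critique-revise loop. The `generate` and `critique` functions stand in for real model calls (in my case, Gemini passes); they're stubbed here so the control flow runs on its own, and the spec criteria are paraphrased from the questions above.

```python
# Sketch of the generate -> critique -> revise loop described above.
# generate() and critique() are STUBS standing in for real model calls;
# only the control flow (critic gates the worker) is the point.

SPEC = [
    "visual is semantically connected to the event topic",
    "looks editorial, not like a stock photo",
    "fits the brand",
]

def generate(topic, feedback=None):
    # Placeholder for an image-generation call; critic feedback is
    # folded into the next prompt.
    notes = f" (revised per: {'; '.join(feedback)})" if feedback else ""
    return f"cover art for {topic!r}{notes}"

def critique(image):
    # Placeholder for a second model pass that checks the image
    # against the design spec and returns the failed criteria.
    return [] if "revised" in image else SPEC[1:2]

def make_cover(topic, max_rounds=3):
    image = generate(topic)
    for _ in range(max_rounds):
        failures = critique(image)
        if not failures:      # critic signs off: ship it
            return image
        image = generate(topic, feedback=failures)
    return image              # best effort after max_rounds
```

The design choice that matters: the critic runs between generation and delivery, so a bad image costs one extra model call instead of a human's trust.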


It's important that it's an additional step in between, as opposed to folding the critique into the generation prompt itself.


Check out the beautiful cover images we have now:


Now this is delightful automation. You just tell the AI, "MongoDB Community Event", and Baam!



The broader principle: your first output is rarely your best output. Adding critique, even automated critique, between generation and delivery is one of the highest-leverage things you can do. It costs one extra step. It dramatically raises the floor.


P.S. Please, ask your designer friends to write the spec.



Design for the actual human, not the demo


I used to build workflows by thinking about what the AI would do. What commands to run. What the output should look like. It made for great demos...but terrible daily use.


The real design process starts with a named person doing a specific thing in a specific context. This IS user-centric design.


Sheina is our community executive. She's not technical. She gets a WhatsApp message from a speaker and needs to update the Luma event description. That's the scenario. Walk through it message by message, including every place it can go wrong.


Does she format the message first? Where does she paste it? What if the speaker doesn't include their job title? What if there are two upcoming events and the bot has to pick one? What if the copy-paste cuts off mid-sentence?


Every one of those is a place the workflow fails.


But don't worry: you can rely on the AI to run the design-thinking interview for you.


Here's a prompt you can plug straight into your workflow (no thanks needed):


Before writing or updating any skill, do the following, no shortcuts:
 1. Restate intent — what is this skill actually trying to do?
 2. Map the user journey — named persona, specific channel, message by message
 3. Surface edge cases — what can go wrong at each step?
 4. Write tests first — what does success look like before writing code?
 5. Read back to user and get confirmation.
 6. Only then write the skill

I also had to be honest about the state of my current setup: the notification messages are messy. The instructions aren't clear. The workspace is littered with test messages from things I was trying. A non-technical user opening this bot for the first time would have no idea what was happening or what to do.


That's not an AI problem. That's a product problem. And AI workflows are products.



The bot needs to think out loud


Non-technical users don't debug. When something's wrong, they don't know why. They just know it didn't work. So the bot can't silently produce output and wait for approval.


It has to show its work.


For the speaker intake: the bot extracts information from whatever the speaker wrote, then tells Sheina exactly what it found: name, talk title, bio, content. She can see immediately if something was misread. She corrects it. Then the bot shows her exactly what will appear on Luma before anything is changed.
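Here's a minimal sketch of that extract-preview-confirm-apply gate. The extraction pass and the Luma update are hypothetical (stubbed out); what the sketch shows is the rule that nothing posts until a human approves the preview.

```python
# Sketch of the extract -> preview -> confirm -> apply flow.
# The actual LLM extraction and Luma API call are NOT shown; the
# confirmation gate is the point.
from dataclasses import dataclass, asdict

@dataclass
class SpeakerInfo:
    name: str
    talk_title: str
    bio: str

def render_preview(info):
    # Echo back exactly what the bot extracted, field by field,
    # so a misread is visible before anything is changed.
    return "\n".join(f"{k}: {v}" for k, v in asdict(info).items())

def apply_update(info, confirm):
    # `confirm` is whatever asks the human (a Telegram reply,
    # a button press). Post only after approval.
    if not confirm(render_preview(info)):
        return False          # rejected: change nothing
    # post_to_luma(info)      # real side effect would go here
    return True
```

In practice `confirm` is the back-and-forth in the chat: the bot sends the preview, Sheina replies yes or corrects a field, and only then does anything touch the live event page.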


That back-and-forth is not friction. It's the feature.


The alternative is horrific:

Bot processes silently, produces output, posts immediately. This looks clean in a demo. In production, the speaker's title is wrong and nobody notices until the event page is live and you're fixing it manually at 11pm, which is exactly the problem you built the bot to avoid.



Edge cases are the job


Happy path thinking kills workflows. You build for the normal case, demo it, it looks great, you ship it. Then real use starts and everything breaks slightly.


Edge cases I've had to plan for in the last week alone: what if the Luma event is on someone else's calendar and I don't have edit access? What if the speaker gives a LinkedIn URL instead of typing their title? What if the topic is too vague to generate a meaningful image? What if the AI critique disagrees with itself?


None of these are exotic. They all happen in normal use.


The difference between a workflow that gets adopted and one that gets abandoned is whether it handles the edges gracefully:

1) either solving them or

2) failing clearly and asking for help.


Producing confident wrong output is worse than crashing.
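One pattern that's served me well here, sketched with a hypothetical event-picking step (the "two upcoming events" edge case from earlier): when the bot can't proceed confidently, it raises and asks, rather than guessing and editing the wrong page.

```python
# Sketch of "fail clearly and ask for help" instead of guessing.
# pick_event() is a hypothetical step, not a real Luma API call.

class NeedsHuman(Exception):
    """Raised when the bot cannot proceed confidently."""

def pick_event(events, hint):
    # If zero or multiple upcoming events match the hint, stop and
    # ask -- never silently pick one and edit the wrong page.
    matches = [e for e in events if hint.lower() in e["title"].lower()]
    if len(matches) != 1:
        raise NeedsHuman(
            f"Found {len(matches)} events matching {hint!r}; "
            "please tell me which one you mean."
        )
    return matches[0]
```

The exception message goes straight back to the user as a plain question, which is the "failing clearly" half of the bargain above.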



Most of this isn't about AI


Looking back at why my bot failed, most of the reasons had nothing to do with the AI:


- Messy notifications nobody asked for

- Unclear instructions about what the bot could even do

- Test artifacts left in the workspace

- No confidence that the output was right before it did something irreversible


These are product problems. They are design problems. Problems that would exist with any tool, AI or not. The AI is actually the easy part (in fact, it's what made this possible in the span of ONE week).


The hard part is making something a real person, with a real job, under real time pressure, will trust enough to use instead of just opening Canva.


That bar is higher than it sounds. But it's the only bar that matters.



*Building in public from SQ Collective, Singapore.*

 
 
 
