
How to Design Products Better With Claude and Paper.design

When AI can ship code in minutes, the bottleneck becomes thinking. Gang Rui opens there.


The problem with vibe coding isn't speed, but direction. "You're steering a rocket ship," he says. Nudge it slightly off course early and the whole product drifts.


The real leverage moves upstream


Gang's argument is simple. If AI makes building faster, then the highest-leverage work is no longer implementation or code. It's defining the end state clearly enough that the machine can build towards something coherent.


He breaks this into three questions.

  1. Are we solving the right problem?

  2. What are the core building blocks?

  3. Is this the right experience for the user?


Start with building blocks, not screens


One of the most interesting parts of the talk is how he frames product thinking in terms of building blocks. Notion has blocks and databases. Linear has issues, projects, and phases. A product gets easier to design once its core units are clear.


If you skip that step, the AI still gives you output. Plenty of it. But the pieces don't share a vocabulary. Features arrive as isolated answers instead of parts of one system.


Gang's fix is to write what he calls a mental model document: part PRD, part shared language layer. In this document, he defines what a skill is, what a test case is, what an assertion is, what an iteration is. Then he uses that vocabulary again and again while working with Claude.


The point isn't documentation for its own sake. It's to create alignment between AI and the human. If the product and the AI are both reasoning with the same nouns, the prompts get sharper.
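As an illustration only: the talk describes a prose document, not code, but the vocabulary he names could be sketched as a handful of shared types. Every name here (Skill, TestCase, Assertion, Iteration) simply mirrors the nouns from his mental model document; none of this is his actual tooling.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the nouns defined in a mental model document.
# The names mirror the talk's vocabulary; the fields are assumptions.

@dataclass
class Assertion:
    """A human-vetted check that a skill's output must satisfy."""
    description: str

@dataclass
class TestCase:
    """One input plus the assertions its output is judged against."""
    prompt: str
    assertions: list[Assertion] = field(default_factory=list)

@dataclass
class Skill:
    """An agent capability being evaluated."""
    name: str
    test_cases: list[TestCase] = field(default_factory=list)

@dataclass
class Iteration:
    """One attempt at improving a skill, with its measured pass rate."""
    skill: Skill
    pass_rate: float
```

The value isn't the code itself; it's that once these nouns are pinned down, every later prompt can use them without re-explaining the system.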


Take-away for me: this is the hidden tax in most AI product builds. We blame the model when the real issue is that we never named the system properly.


His own test case: an eval tool for agent skills


To make it concrete, Gang walks through a product he's building for himself: a local tool for evaluating agent skills. He likes Anthropic's skill tooling, but thinks the eval side has gaps. In particular, he doesn't trust fully automated evaluation when the judgement depends on taste.


His example is great.


Imagine you ask an LLM for a "simple recipe". What does simple mean? Simple for whom? Does simple mean quick, easy-to-find ingredients, beginner-friendly, or just low clean-up?


Instead of letting the model define quality on its own, he builds a workflow around golden datasets and human-vetted assertions. First, a calibration phase. Then an auto-improve phase that hill-climbs until the skill reaches a threshold like 75 percent pass rate.


That process becomes much more believable once the evaluation criteria are reviewed by the builder instead of hallucinated by the evaluator.
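A minimal sketch of that calibrate-then-hill-climb loop, under loud assumptions: the golden dataset, the assertion checks, and the `mutate` step (asking the model to revise the skill) are all stand-ins for whatever his tool actually does.

```python
# Hedged sketch of the auto-improve loop described in the talk:
# evaluate a skill against a golden dataset of human-vetted assertions,
# keep only revisions that raise the pass rate, and stop at a threshold
# like 75 percent. All names here are illustrative, not the real API.

def pass_rate(skill, dataset, golden):
    """Fraction of cases whose output satisfies its human-vetted assertion."""
    outputs = [skill(case) for case in dataset]
    passed = sum(1 for out, check in zip(outputs, golden) if check(out))
    return passed / len(golden)

def auto_improve(skill, dataset, golden, mutate, threshold=0.75, max_iters=20):
    """Hill-climb: propose revisions, accept only improvements,
    stop once the pass-rate threshold is reached."""
    best, best_score = skill, pass_rate(skill, dataset, golden)
    for _ in range(max_iters):
        if best_score >= threshold:
            break
        candidate = mutate(best)  # e.g. ask the model to revise the skill
        score = pass_rate(candidate, dataset, golden)
        if score > best_score:    # accept only strict improvements
            best, best_score = candidate, score
    return best, best_score
```

The key design choice is that the `golden` checks are written or vetted by the builder during calibration, so the loop hill-climbs toward criteria a human has already signed off on.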


Form factor comes after the model


Once the mental model is stable, he writes a second document: the form factor.


This is where he maps the experience. Entry point. Next action. Back and forth movement. Core screens. Edge cases. Again, the choice of document matters. He likes working in prose because both he and the AI can read it, critique it, and refine it.


Someone in the room asks whether the building blocks and the end state are defined in the same place. Gang clarifies that the building blocks are more conceptual, while the form factor translates them into structure and flow.


He also points out something practical about Claude. It naturally sketches ASCII diagrams, tables, and visual structures, which makes the thinking easier to inspect. That's one reason he still prefers it over other models for this stage.


Why he codes before he polishes the canvas


A good question comes from the floor: if tools like Paper can generate UI so quickly, why not jump there first?


Gang says the sequence matters.


He codes an early version before doing the polished canvas work because he wants to validate assumptions first. Can the CLI flow work? Does the architecture hold? Will the underlying mechanism behave reliably?


Only after that does he move into visual refinement.


His version of the stack looks like this:

1. Define the mental model

2. Map the form factor

3. Build a rough working layer

4. Explore and refine visuals in Paper

5. Bring the best version back into code

6. Polish responsiveness, interaction, and motion last


Paper.design beats Figma for this workflow


The most opinionated part of the session is his comparison between Figma and Paper.design. He feels Paper wins for AI-assisted product work.


The reason is technical and practical at once. Paper uses HTML under the hood, so the translation between code and canvas is much tighter. He says he's seeing around 95 percent code-to-canvas accuracy, while Figma lands closer to 70 percent and often needs cleanup.


Gang shows examples where Figma's output feels slightly wonky, with spacing and structure that still need manual repair. Paper, by contrast, stays closer to the intended layout and produces cleaner auto-layout behaviour.


He also likes the taste. "I don't know why," he says, "but I think the folks at Paper have prompted it quite well."


How it looks versus how it feels


This might be the clearest framework from the talk.


Gang separates product design into two layers: how it looks and how it feels.


How it looks includes layout, hierarchy, spacing, and styling. Better handled in a canvas.


How it feels includes animation, click direction, error states, and interaction rhythm. Better handled in code.


That distinction explains a lot of dead-end design workflows. If you're trying to perfect motion inside a static canvas, or trying to rough out layout purely through code, you're using the wrong medium.


He gives a small example from his own app: a bottom bar animation that took him over an hour to get right. The AI could get him to 80 percent. The remaining 20 percent was judgement. Easing. Timing. Edge cases. The exact feel of the transition.


That's where the craft still lives.


The planning phase now takes longer than the build


Near the end, someone asks how his time breaks down across thinking, Paper, and code.


His answer is telling. The majority of time now goes into planning and clarification. He spends days in the conversational phase with the model before moving into implementation.


For this product, he says he spent three to four days just in the chatting phase.


That doesn't sound inefficient. It sounds like the new shape of good product work.


Because once the thinking is sharp, the build phase can be partially automated. But if the thinking is fuzzy, speed just creates mess faster.


He even describes asking Claude to "grill me" with 40 or 50 questions before drafting the core doc. The point is to pressure-test the boundaries of the product until the shape becomes obvious.


That boundary-setting theme comes up again and again. If the boundaries are fuzzy, the product wanders.


Three questions for reviewing product experience


He closes with a practical lens from Scott Belsky. When reviewing a UI, ask:


1. How did I get here?

2. What do I do now?

3. Where do I go next?


If a screen can't answer those three questions, the product isn't ready. It's a good reminder that even in an AI-heavy workflow, interface quality still comes down to whether people can orient themselves.


What stayed with the room


What makes this session work is that it isn't really about Claude, Paper, or Figma.


It's about where human judgement still matters when the machine can already generate most of the surface area.


Not in typing faster. Not in producing more screens. In setting the end state. Naming the building blocks. Deciding where polish matters. Knowing when the output is off, even if it looks impressive.


Around the room, people push on sequencing, responsiveness, design systems, and whether these tools can absorb a richer visual language than flat minimalism. The questions are specific. The energy feels practical. Builders comparing notes, not spectators watching a demo.


That's probably why this talk lands. It gives a workflow, yes. But more importantly, it gives permission to treat thinking as the main craft again.



That Friday at SQ Collective, the conversation didn't stay stuck at "which tool is best." Tony pushed on reuse, sequencing, and mobile responsiveness. Faye asked where the time really goes. Others compared Figma, Paper, Framer, Pencil, and even direct-manipulation tools like Agentation. You could feel the room trying to sort out a new working style in real time - one where builders talk to models all day, but still need sharper taste, clearer boundaries, and better questions.


Missed out last week? Don't worry, these conversations happen every Friday at SQ Collective.


Usually over laptops. Sometimes over pizza.

