The anatomy of a wait

The previous three essays were about why perception matters. The next four are about what to do about it. Before we get to specific patterns and demos, it is worth slowing down for a moment on the structure of every wait — because every wait has the same anatomy, and most teams optimise the wrong stage of it.

I'd argue every wait you can describe in a UI breaks down into four phases:

Pre-action signal — the moment the user reaches to do something.
Response — the moment the system acknowledges the input.
Animation — the transition from old state to new state.
Completion — the moment the new state is fully settled.

Each phase has its own dominant time band, its own perception challenge, and its own toolkit. Understanding the anatomy is what makes it possible to spend perception budget where it matters and stop spending it where it does not.

Pre-action signal: the moment of intent

The first phase is the easiest to ignore because nothing has technically happened yet. The user has reached for an action — they have not yet committed. The hand is on the button, the cursor is descending toward a link, the finger is hovering over a touch target. The system has done nothing.

This is also the cheapest phase to make faster, because the system has absolute control over when "intent" turns into "action."

Two free wins live here:

mousedown over click. A click event fires when the mouse button is released, which is ~100–150 ms after it was pressed (Fitch's Mechanical Turk study put the median in that range). If you start loading on mousedown instead, you have already done ~100 ms of work by the time the user releases the button. The wait is shorter without the system having to be faster.
Predictive preloading on hover or mouse-deceleration. If the cursor is decelerating toward a button, the user's intent is predictable before the click ever happens. Start loading on hover. The Future Link pattern that Fitch documents can buy 600 ms or more of head start in the right conditions. You have probably seen it at work in real life: Gmail uses this technique.

A touch caveat sits underneath the mousedown win. On touch surfaces, pointerdown fires the same head-start, but it also fires when the user is starting a scroll — finger on screen, no click intended. The fix is a 6 px movement threshold: if the contact moves more than ~6 px before lift, cancel the preload. The threshold matches Fitch's own touch handling and is small enough to ignore for taps, large enough to catch every real scroll. The combined rule: start work on pointerdown, cancel on pointermove past 6 px, commit on pointerup only if the contact stayed inside the target.
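The combined rule can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not Fitch's implementation: `PressTracker`, `PressCallbacks`, `PointerPos`, and the `start` / `cancel` / `commit` callback names are all hypothetical, and the tracker takes plain coordinates so the logic stays independent of the DOM.

```typescript
// Hypothetical helper: start on pointerdown, cancel past 6 px, commit on pointerup.
interface PointerPos { x: number; y: number }

interface PressCallbacks {
  start: () => void;  // begin reversible work, e.g. a prefetch
  cancel: () => void; // abandon it: the contact scrolled or slid off the target
  commit: () => void; // the press was a real tap inside the target
}

const MOVE_THRESHOLD_PX = 6; // the ~6 px threshold from the text

class PressTracker {
  private pressOrigin: PointerPos | null = null;
  private callbacks: PressCallbacks;

  constructor(callbacks: PressCallbacks) {
    this.callbacks = callbacks;
  }

  // Call from a pointerdown handler.
  down(p: PointerPos): void {
    this.pressOrigin = p;
    this.callbacks.start();
  }

  // Call from a pointermove handler.
  move(p: PointerPos): void {
    if (this.pressOrigin === null) return; // no active press, or already cancelled
    const travelled = Math.hypot(p.x - this.pressOrigin.x, p.y - this.pressOrigin.y);
    if (travelled > MOVE_THRESHOLD_PX) {
      this.pressOrigin = null;
      this.callbacks.cancel();
    }
  }

  // Call from a pointerup handler; insideTarget is the caller's own hit test.
  up(insideTarget: boolean): void {
    if (this.pressOrigin === null) return;
    this.pressOrigin = null;
    if (insideTarget) this.callbacks.commit();
    else this.callbacks.cancel();
  }
}
```

In a browser, `down`, `move`, and `up` would be called from `pointerdown`, `pointermove`, and `pointerup` listeners using `event.clientX` and `event.clientY`. Treating `pointercancel` like a slide-off is a reasonable addition.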

The Future Link pattern deepens with the cursor-deceleration signal. A plain hover trigger buys ~200 ms of head-start (the time between hover and click). Layering a cursor-velocity check on top — fire the preload only when the cursor is decelerating toward the target — buys closer to ~600 ms, because deceleration starts earlier than the hover does. The trade is a small increase in false positives (cursor decelerates, user veers away) against a meaningful gain in true-positive head-start. Below a ~10 % false-positive budget, the trade is almost always worth it.
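One way to read the cursor-velocity check is as a test over the most recent pointer samples. The sketch below is an illustration rather than the Future Link implementation: `CursorSample`, `cursorSpeed`, and `isDeceleratingToward` are hypothetical names. The three-sample window is an assumption chosen for brevity; a production version would likely smooth over more samples to keep false positives inside the budget.

```typescript
// Hypothetical deceleration check over the three most recent cursor samples.
interface CursorSample { x: number; y: number; t: number } // t in milliseconds

// Average speed in px/ms between two samples; 0 if the timestamps do not advance.
function cursorSpeed(a: CursorSample, b: CursorSample): number {
  const dt = b.t - a.t;
  return dt > 0 ? Math.hypot(b.x - a.x, b.y - a.y) / dt : 0;
}

function distanceTo(s: CursorSample, target: { x: number; y: number }): number {
  return Math.hypot(target.x - s.x, target.y - s.y);
}

// True when the cursor is still moving, slowing down, and getting closer to the target.
function isDeceleratingToward(
  samples: CursorSample[],
  target: { x: number; y: number },
): boolean {
  if (samples.length < 3) return false;
  const [a, b, c] = samples.slice(-3);
  const earlier = cursorSpeed(a, b);
  const latest = cursorSpeed(b, c);
  const slowing = latest > 0 && latest < earlier;
  const approaching = distanceTo(c, target) < distanceTo(b, target);
  return slowing && approaching;
}
```

The preload fires only when both conditions hold, which is what separates a cursor heading for the target from one that merely slows down somewhere else on the page.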

The predictive moves cascade. Route prefetching has a four-trigger ladder, from cheap to expensive: viewport entry (broadest, fires for every link on screen), hover (narrower, fires when the user has noticed a link), deceleration (narrower still, fires when the user is heading toward one), and pointerdown (narrowest, the head-start before click). A serious perception layer uses all four, with the trigger calibrated to how expensive the preload is. Static HTML and small JS bundles get viewport-level prefetch; bigger fetches with auth requirements get pinned to hover or later.
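The calibration can be expressed as a lookup from preload cost to the earliest trigger allowed to start it. The following is a sketch under assumptions: the byte cut-offs are invented for illustration, pinning very large fetches to deceleration is one possible reading of "hover or later", and `PreloadCost`, `earliestTrigger`, and `shouldPreload` are hypothetical names.

```typescript
// The four-trigger ladder from the text, broadest trigger first.
type PrefetchTrigger = "viewport" | "hover" | "deceleration" | "pointerdown";
const TRIGGER_LADDER: PrefetchTrigger[] = ["viewport", "hover", "deceleration", "pointerdown"];

interface PreloadCost {
  bytes: number;         // approximate transfer size
  requiresAuth: boolean; // true if the fetch hits an authenticated endpoint
}

// Hypothetical cut-offs; calibrate against your own bundles and traffic.
const SMALL_BYTES = 50 * 1024;
const LARGE_BYTES = 500 * 1024;

// Earliest rung at which a preload of this cost is allowed to start.
function earliestTrigger(cost: PreloadCost): PrefetchTrigger {
  if (cost.bytes > LARGE_BYTES) return "deceleration";
  if (cost.requiresAuth || cost.bytes > SMALL_BYTES) return "hover";
  return "viewport";
}

// A fired trigger may start the preload if it sits at or after the allowed rung.
function shouldPreload(fired: PrefetchTrigger, cost: PreloadCost): boolean {
  return TRIGGER_LADDER.indexOf(fired) >= TRIGGER_LADDER.indexOf(earliestTrigger(cost));
}
```

With these cut-offs, a 10 KB public bundle is fetched as soon as its link enters the viewport, while the same bundle behind auth waits for hover.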

The risk in this phase is acting on intent that does not pan out — preloading a route the user never navigates to, firing a side-effect on mousedown that should not happen if they slide off the button. The mitigation is to keep preloads idempotent and reversible. Loading a JavaScript bundle is fine even if unused; charging a credit card is not. A good Design Engineer thinks about everything.

Response: the system says "yes, I heard you"

The second phase is where most teams are weakest. The user has clicked. Something is happening. The UI needs to confirm receipt.

If this acknowledgment lands within ~50 ms, the click feels caused; the ~100 ms perceptual frame from Card, Moran & Newell 1983 is the upper bound. Past 50 ms or so, you start seeing the eye twitch toward "did it register?" Past 200 ms, the user is reaching for the button again. Past 1 s, they have entered passive mode and you have lost the active window.

The response can be small. It does not need to be the actual operation — that takes longer — it needs to be a confirmation that input was received. The cheapest forms:

:active pseudo-class on the button (a CSS-only acknowledgment).
A tiny press animation on the affordance.
A focus state moving forward.
A cursor change.
A loading indicator, but only if the wait is going to last longer than one second (more on this below).

Fitch measured a sweet spot of ~200 ms for the active-state animation duration: long enough to feel substantial, short enough not to feel sluggish. The user holds the button down ~50 ms longer when there is feedback to watch, which incidentally extends your free-budget window.

The press-state needs enough contrast to register. WCAG 1.4.11 Non-Text Contrast requires a 3:1 ratio between the active-state colour and the surrounding interface. Below that, the change is technically there and perceptually absent — the user's eye does not register the flip, the wait stops feeling caused. It is one of the few accessibility rules that improves perception for non-disabled users too: the contrast that helps a low-vision user notice the button pressed is the same contrast that helps every user feel the click landed.
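The 3:1 check is easy to automate. The sketch below computes the contrast ratio from the WCAG 2.x relative-luminance formula. Accepting only `#rrggbb` strings is a simplification made for this example, and `pressStateIsVisible` is a hypothetical helper name.

```typescript
// Contrast ratio per the WCAG 2.x relative-luminance definition.
// Colours must be "#rrggbb" strings; that restriction is a simplification of this sketch.

// Converts one 8-bit sRGB channel to its linear-light value.
function channelToLinear(value8bit: number): number {
  const c = value8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const r = parseInt(hex.slice(1, 3), 16);
  const g = parseInt(hex.slice(3, 5), 16);
  const b = parseInt(hex.slice(5, 7), 16);
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Lighter colour over darker colour, each offset by 0.05; ranges from 1 to 21.
function contrastRatio(hexA: string, hexB: string): number {
  const la = relativeLuminance(hexA);
  const lb = relativeLuminance(hexB);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// 3:1 is the threshold the text cites from WCAG 1.4.11.
function pressStateIsVisible(activeHex: string, surroundingHex: string): boolean {
  return contrastRatio(activeHex, surroundingHex) >= 3;
}
```

Running a check like this over a design system's press-state tokens catches the "technically there, perceptually absent" case before it ships.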

There is no "when not to use" case for response feedback. It costs essentially nothing, fires on every input regardless of what happens next, and prevents the failure mode below. Treat it as platform plumbing, not as a per-feature decision.

The failure mode here is the "did the click work?" double-tap. Users who do not get response feedback within ~200 ms will press again. Sometimes their second click registers as a separate action. This is how forms get submitted twice and likes get unliked. The Response phase is where that gets prevented.
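Fast feedback is the primary fix, but a cheap backstop helps when a second click arrives anyway. The sketch below is an assumption-laden illustration, not a pattern from the cited sources: `guardInFlight` is a hypothetical wrapper that makes repeat calls reuse the pending operation instead of starting a new one.

```typescript
// Hypothetical guard: while one call is pending, repeat calls reuse it instead of firing again.
function guardInFlight<T>(action: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending !== null) return pending; // second click during the wait: no new action
    const started = action();
    pending = started;
    const release = () => { pending = null; };
    started.then(release, release); // allow a fresh action once this one settles
    return started;
  };
}
```

Wrapping a submit handler this way means a double-tap yields one request and one result. It complements the press feedback rather than replacing it: the user still needs to see that the first click landed.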

This immediate feedback is especially crucial when the user is taking a high-stakes action. My team learned this the hard way. The payment system in our product was a third-party solution embedded in our environment, which made any styling or behavioural change especially hard. We eventually found a systemic workaround, but the lack of a quick response was costly in user experience and, ultimately, revenue.

Animation: from old state to new state

The third phase is the bridge. The system has the new state. The UI needs to get from the old state to the new state without jarring the user.

Card, Robertson & Mackinlay (Card et al. 1991) recommended ~10 Hz refresh during transitions to maintain object permanence, which works out to a ~100-ms-per-frame budget. Modern animation libraries default to similar values. The View Transitions API spec uses ~150–250 ms cross-fades. Tailwind's default transition lands at ~150 ms.

The interesting work in this phase is shape. Linear motion from A to B feels mechanical; eased motion feels alive; over-eased motion feels theatrical. A 200 ms ease-out is a sensible default for state changes — slides, fades, and scale changes all benefit.
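To make "shape" concrete, here is a sketch of an eased interpolation. It uses a cubic ease-out as a stand-in. That curve is a common choice but is not identical to the CSS `ease-out` keyword, and `easeOutCubic` and `easedValue` are hypothetical helper names.

```typescript
// Cubic ease-out: fast start, gentle landing. Input and output are both in [0, 1].
function easeOutCubic(progress: number): number {
  const clamped = Math.min(1, Math.max(0, progress));
  return 1 - Math.pow(1 - clamped, 3);
}

const DEFAULT_DURATION_MS = 200; // the default suggested in the text

// Value of an animated property elapsedMs after the transition started.
function easedValue(
  from: number,
  to: number,
  elapsedMs: number,
  durationMs: number = DEFAULT_DURATION_MS,
): number {
  return from + (to - from) * easeOutCubic(elapsedMs / durationMs);
}
```

Halfway through a 200 ms transition from 0 to 100, this curve has already covered 87.5 of the distance, where linear motion would be at 50. Most of the movement happens early, while the user is still looking, and the tail settles quietly.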

Two additional tools live here:

Skeleton screens. The animation between "nothing" and "content" goes through a skeleton intermediate. The user's eye lands on roughly the right place by the time the real content arrives, so the perceived jump is smaller. This is a transition disguised as a loading state. A skeleton also makes it feel as though loading has already partly happened, which raises the perceived performance of the app.
Progressive image loading (LQIP, blur-up). A low-quality placeholder shows the rough shape and colour first, then sharpens to the full image. The perceptual transition is "seen this image gradually become itself" rather than "stared at a grey box, then the image popped in."

A third tool lives at the framework level rather than the component level: streaming server-rendered pages. Instead of waiting for the full HTML response before painting, the browser receives the page in chunks as the server resolves them: layout shell first, above-the-fold content next, slower components last. The Animation phase here is the page assembling itself, not a single element morphing. React Suspense and the streaming primitives in the Next.js App Router make this composable per-region. The perception payoff is large because the user sees the page shape before any individual fetch resolves.

The failure mode in this phase is animation that competes with the user's next action. If a button is animating its press response while the user is already moving the cursor toward the next thing, the animation is in the way. Keep transitions non-blocking and never lock the UI for the duration of the animation.

Completion: the new state is settled

The fourth phase is the easiest to handle and the easiest to forget. The new state is rendered, the animation has finished, the system has nothing pending. The question is whether the user knows.

If the new state is obviously different from the old (say a navigation to a different page), the user's eyes do most of the work. If the new state is subtly different (a "Save" that updated a single field), the user may not register it — and may save again. The fix is a confirmation that lasts long enough to be perceived but not so long that it becomes noise.

A few tools:

Confirmation toast. A small element appearing for ~2–4 s. Long enough to read, short enough to dismiss itself.
Inline confirmation. A small "Saved" indicator near the affordance, fading out over ~2 s.
Focus management. Moving focus to the changed area (or a stable landmark) so keyboard users register the change.

The failure mode here is over-confirmation. If every save action triggers a toast, the toasts become noise. The user starts ignoring them. You then have a UI that is technically confirming everything and effectively confirming nothing.

The tip-the-hand rule

One rule cuts across all four phases and is worth surfacing on its own.

If the wait between phases is going to resolve in under one second, do not show a loader. The finer-grained taxonomy in Miller 1968 already implied this: many transactions sit comfortably in the "no feedback needed" range. Showing a loader for a sub-1-second wait does two things:

It tells the user "you are now waiting" — pulling them out of the active mode they were happily in.
It creates a visual noise pattern (loader appears, loader disappears, content appears) that registers as a longer wait than the single appear-content event would have.

Spinners belong to the 1–2 s window. Skeletons belong to the 1–10 s window. Engagement belongs to the 10 s+ window. Below 1 s, the right answer is nothing — let the system resolve and update the UI directly. The :active press feedback you put in the Response phase is enough.
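These windows can be written down as a single lookup. The sketch below is illustrative only: `indicatorFor` and `WaitIndicator` are hypothetical names. The spinner and skeleton windows in the text overlap between 1 s and 2 s, so choosing the spinner there is an assumption of this sketch, not a rule from the sources.

```typescript
type WaitIndicator = "none" | "spinner" | "skeleton" | "engagement";

// Maps an expected wait to the indicator tier described in the text.
// Below 2 s the overlap between spinner and skeleton is resolved in favour
// of the spinner; that tie-break is an assumption.
function indicatorFor(expectedWaitMs: number): WaitIndicator {
  if (expectedWaitMs < 1000) return "none";
  if (expectedWaitMs < 2000) return "spinner";
  if (expectedWaitMs < 10000) return "skeleton";
  return "engagement";
}
```

When the duration cannot be estimated up front, a common alternative is to start a one-second timer alongside the request and show the indicator only if the timer fires before the work finishes.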

This is the rule that distinguishes a polished product from a noisy one. Most apps that feel slow are not actually slow; they are over-loadered. Still, once again, research your users. How people accomplish tasks, how they behave in specific situations, and how much waiting they will tolerate all depend on your audience. A useful rule of thumb: the younger they are, the shorter their attention span tends to be, and the shorter and more fragile their patience tends to become.

What to do with this

Three takeaways before the next essay:

Audit your UI by phase. Walk through one common interaction — pre-action, response, animation, completion. The weakest phase is probably where your perception budget is being burned.
Move work upstream when you can. Anything you can do in the pre-action phase (preload, prefetch, head-start on mousedown) shrinks the visible wait without making the system faster.
Default to quiet. No spinner under 1 s, no over-confirming, no animations that block the next click. The user feels speed by not feeling friction; quieter is faster.

References

Fitch
Fitch, E. Perceived Performance: The Only Kind That Really Matters (conference talk). Source for the mousedown / pointerdown head start (~100–150 ms median hold time, n ≈ 100 via Mechanical Turk), the 200 ms active-state sweet spot, and the predictive-preloading patterns referenced in this essay.
Card et al. 1991
Card, S. K., Robertson, G. G., & Mackinlay, J. D. (1991). The information visualizer, an information workspace. Proceedings of CHI '91, 181–188. ~10 Hz / 100-ms-per-frame animation refresh recommendation underlying the modern transition-timing defaults referenced in the Animation phase.
Card, Moran & Newell 1983
Card, S. K., Moran, T. P., & Newell, A. (1983). The Psychology of Human-Computer Interaction. Lawrence Erlbaum. ~100 ms perceptual processing frame; the upper bound for the Response phase to feel caused by the user.
Miller 1968
Miller, R. B. (1968). Response time in man-computer conversational transactions. Proceedings of the AFIPS Fall Joint Computer Conference, 33(I), 267–277. The 17-transaction taxonomy whose finer-grained tiers underlie the 'tip-the-hand' rule about sub-1-second loaders.