AI · Long compute / Inference
The user submits a request that is going to take 30 seconds or more. Image generation. A complex analytical query. A long-context reasoning task. The user has crossed the unit-task boundary; static skeletons no longer carry their weight; engagement is the last move before they leave.
This scenario sits in the 10 s+ band. Block & Zakay's 1997 meta-analysis frames the trade-off — engagement compresses prospective duration while expanding retrospective duration; the design decision is whether you have made that trade deliberately. Fitch's Slack and FIFA examples are the canonical references; Myers 1985 is the determinate-progress fallback where the inference reports phases.
AI · Streaming response
A chat-style assistant returns a ~200-character answer. Naive: total wait, then the full response drops in. Tuned: ~600 ms thinking state, then tokens stream at a natural reading pace.
What is happening
The ai-streaming demo stands in. For a real long inference, the tuned flow stacks more layers:
- Thinking state during the time-to-first-token gap — the same dots-and-cursor pattern from ai-chat-streaming-response.
- Streaming render as soon as the first token arrives, paced to a natural reading rhythm.
- Tool-call transparency if the inference involves visible tool calls — narrate them ("Searching…", "Reading…", "Reasoning…").
- Determinate progress if the inference can report phases ("Step 3 of 7"); fall back to engagement otherwise.
- Cancellation always available — the stop button must respond inside the perceptual frame even if the abort takes longer.
- Background fallback — past 30–60 seconds, offer "do this in the background and notify me" so the user can leave the surface.
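The streaming and cancellation layers above can be sketched together. This is a minimal sketch, not a specific library's API: it assumes the model response arrives as an async iterable of tokens (the source iterator is hypothetical), re-paces them to a reading rhythm, and checks an `AbortSignal` on every token so the stop button takes effect immediately even if the underlying abort takes longer.

```typescript
// Re-emit tokens at a reading pace, honouring a stop button's AbortSignal.
// `source` is any async iterable of tokens (assumption: the inference
// client exposes one); `msPerToken` is an illustrative pacing default.
async function* paceTokens(
  source: AsyncIterable<string>,
  signal: AbortSignal,
  msPerToken = 30,
): AsyncGenerator<string> {
  for await (const token of source) {
    if (signal.aborted) return; // stop button: exit the stream at once
    yield token;
    await new Promise((resolve) => setTimeout(resolve, msPerToken));
  }
}
```

The UI shows the thinking state until the first yielded token, then switches to the streamed render; calling `abort()` on the controller ends the loop on the very next token boundary.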
What to tune
- Pre-action — submit button echo within ~50 ms; thinking dots cover the time-to-first-token gap.
- First 1 s — thinking state in place. No spinner, no skeleton over content the model will produce.
- 1–10 s — token streaming where text is the output. Tool-call transparency where the work is visible.
- Past 10 s — engaging copy where applicable; determinate progress where the inference reports phases. Cancellation always visible.
- Past 30–60 s — hand-off to background sync with notification. The foreground is no longer the right surface.
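The escalation ladder above can be written as a pure function of elapsed time. A minimal sketch with the thresholds from the list (the `Treatment` names are illustrative, not from any framework); in practice the arrival of the first token switches the UI to the streamed render regardless of which band the clock is in.

```typescript
// Which wait treatment to show, by elapsed time since submit.
type Treatment =
  | "echo"        // pre-action: button acknowledged within ~50 ms
  | "thinking"    // first 1 s: thinking dots cover the TTFT gap
  | "streaming"   // 1–10 s: token streaming where text is the output
  | "engaged"     // past 10 s: engagement or determinate progress
  | "background"; // past 60 s: hand off and notify

function treatmentFor(elapsedMs: number): Treatment {
  if (elapsedMs < 50) return "echo";
  if (elapsedMs < 1_000) return "thinking";
  if (elapsedMs < 10_000) return "streaming";
  if (elapsedMs < 60_000) return "engaged";
  return "background";
}
```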
When perceived performance hurts you here
The engagement-vs-retrospective-duration trade is the central trap. A 30-second inference with rich engagement feels short while it runs and long in retrospect — the user remembers it as taking forever even when their session went smoothly. Slack and FIFA accept this; for inference where the user repeats the action many times in a session, the retrospective cost compounds.
The cleaner answer for repeat-use AI inference: ship determinate progress where measurable, tool-call transparency where applicable (the user is learning during the wait), and background sync past 60 s. Generic engaging content (motivational quotes, mini-games) belongs only on rare or one-off inferences.
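The determinate-vs-fallback choice can be sketched as a label function. The `phase`/`totalPhases` fields are hypothetical — stand-ins for whatever phase reporting your inference backend actually exposes; when they are absent the UI falls back to an indeterminate label rather than faking a percentage.

```typescript
// Hypothetical shape of a status payload from the inference backend.
interface InferenceStatus {
  phase?: number;       // current phase, if the backend reports one
  totalPhases?: number; // total phases, if known up front
}

// Prefer determinate progress (Myers 1985) when phases are measurable;
// otherwise fall back to an honest indeterminate label.
function progressLabel(status: InferenceStatus): string {
  if (status.phase !== undefined && status.totalPhases !== undefined) {
    return `Step ${status.phase} of ${status.totalPhases}`;
  }
  return "Working…";
}
```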
Accessibility
- `aria-live="polite"` on the streaming output and on tool-call narration.
- `aria-busy="true"` during the inference; flip it on completion.
- `prefers-reduced-motion: reduce` — replace cross-fades and pulse animations with static states.
- Always provide a way out — visible cancellation, visible "do in background", visible re-attempt on failure.
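The ARIA wiring can be expressed as a framework-agnostic attribute map. A minimal sketch; the `role="log"` choice is an assumption on my part (a real ARIA role suited to append-only output, with implicit polite live-region semantics), not something the text mandates.

```typescript
// Attributes for the streaming output region, keyed by request state.
function streamRegionAttrs(inFlight: boolean): Record<string, string> {
  return {
    "aria-live": "polite", // announce streamed tokens without interrupting
    "aria-busy": inFlight ? "true" : "false", // flip to "false" on completion
    role: "log", // assumption: append-only output region
  };
}
```

Spread the result onto the output container in whatever view layer you use, recomputing it when the request completes or is cancelled.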
References
- Block & Zakay 1997
Block, R. A., & Zakay, D. (1997). Prospective and retrospective duration judgments: A meta-analytic review. Psychonomic Bulletin & Review, 4(2), 184–197. The trade-off engagement makes during long inference waits.
- Fitch
Fitch, E. Perceived Performance: The Only Kind That Really Matters (conference talk). Engaging-loading examples (Slack, FIFA) that map onto long AI inference waits.
- Myers 1985
Myers, B. A. (1985). The importance of percent-done progress indicators for computer-human interfaces. Proceedings of CHI '85, 11–17. Determinate progress where measurable.