The canonical thresholds

If you have read anything about response time and UX, you have read about the 0.1, 1, and 10-second limits. They appear in design textbooks, on every NN/g blog, in every "designing for performance" workshop, and they get cited in production-planning meetings as if they were physics. They are not physics. They are an observation, an interpretation of an interpretation, and a piece of editorial work — and it is worth knowing which is which, because the citation chain that gives them their authority is more complicated than the trichotomy suggests.

I'd argue every designer working on performance should be able to say, with a straight face, what each of the four canonical sources actually claimed. The literature is older than most designers realise, and the credit is often misattributed. Let's go through them in roughly chronological order.

Miller 1968: the seventeen response-time tiers

The original is Robert B. Miller's 1968 paper "Response time in man-computer conversational transactions" (Miller 1968), delivered at the AFIPS Fall Joint Computer Conference. It is the source of the tiered-response-time idea. It does not contain the clean three-tier model.

What Miller actually did was identify seventeen different transaction types between humans and computers, including:

keystroke echo
request for service
complex visual recall
error feedback
identification of an item
system response to a control input
error correction

For each, he proposed an acceptable response time, ranging from "system reflection of action" (~0.1 s) up to "complex inquiry and response" (~10 s) and longer.

The reason this matters in practice: Miller's table is more granular than the 0.1 / 1 / 10 framing. He had separate thresholds for keystroke acknowledgment, simple inquiry, complex visual recall, and error feedback. Some of these are below 1 s, some are at 2 s, some at 5 s, some at 15 s. The clean trichotomy is not in the paper; it is a later distillation.

If you are arguing about response budgets and somebody cites "the 0.1-second rule from Miller," you are dealing with a paraphrase. The rule from Miller is "0.1 s for keystroke echo, but other thresholds for other transactions." It is not wrong to cite the popular version, but it is worth knowing where it came from.
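
One way to keep the per-transaction framing visible in code is a lookup keyed by transaction type rather than a single global budget. The sketch below is illustrative only: the type names and the `withinBudget` helper are hypothetical, and it carries just the three values this essay quotes from Miller (keystroke echo, simple inquiry, complex inquiry), not his full table.

```typescript
// Hypothetical transaction categories. Only the three values quoted in this
// essay are included, not Miller's full seventeen-row table.
type TransactionType = "keystrokeEcho" | "simpleInquiry" | "complexInquiry";

// Acceptable response time per transaction type, in milliseconds.
const millerBudgetMs: Record<TransactionType, number> = {
  keystrokeEcho: 100,
  simpleInquiry: 2000,
  complexInquiry: 10000,
};

// True when a measured response time fits the budget for that transaction type.
function withinBudget(kind: TransactionType, measuredMs: number): boolean {
  return measuredMs <= millerBudgetMs[kind];
}

console.log(withinBudget("keystrokeEcho", 250)); // false
console.log(withinBudget("simpleInquiry", 250)); // true
```

The same 250 ms measurement fails one budget and passes another, which is the point of the per-transaction view.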

Card, Moran & Newell 1983: the human processor model

Card, Moran, and Newell's 1983 book The Psychology of Human-Computer Interaction (Card, Moran & Newell 1983) is where the underlying cognitive timing comes from. They modelled the human processor in terms of a hierarchy of time constants:

Perceptual processing — ~100 ms — the time for sensory input to register.
Immediate response — ~1 s — the time for a single conscious response.
Unit task — ~10 s — the time for a small, complete cognitive task.

These are not response-time recommendations. They are descriptions of what the human cognitive system actually does. The leap from "a perceptual frame is ~100 ms" to "your UI should respond in ~100 ms" is an inference: if the UI responds slower than perceptual processing, the user notices the gap; if it responds within perceptual processing, the response feels caused by the user's own action.

This is the underlying psychology behind the 0.1-second tier of the trichotomy. Card et al. did not write the trichotomy itself, but they wrote the cognitive constants that justify it.

Card, Robertson & Mackinlay 1991: animation timing

The same Card, with Robertson and Mackinlay, returned to this in their 1991 CHI paper "The Information Visualizer" (Card et al. 1991). The paper's main contribution is an architecture for interactive visualisations, but it is also the source of the recommendation that animation refresh at ~10 Hz (~100 ms per frame) to maintain object permanence during transitions. That recommendation is why the View Transitions API spec uses ~150–250 ms cross-fades, why Tailwind's default transition lands at ~150 ms, and why this project's own motion timings hover around 200 ms.

If somebody cites "the 100-ms rule for transitions," this is the paper to cite back.

The 100 ms frame also implies a small but useful three-tier family of animation durations that modern motion systems converge on without quite saying why:

~100 ms — the perceptual-frame floor. Reserved for micro-feedback: hover-state transitions, focus-ring moves, button-press acknowledgements. Any shorter and the user does not register the change; any longer and the response stops feeling caused.
~200 ms — the comfortable default for state changes. View transitions, fades, slides, scale-on-hover, opening menus. Long enough to read as motion, short enough to stay out of the user's way. Tailwind defaults to ~150 ms; the View Transitions spec lands at ~200 ms; both are inside this band.
~400 ms — the upper edge for non-blocking transitions. Page-level fades, large layout shifts, choreographed multi-element entrances. Above ~400 ms the transition starts competing with the user's next action and the perceived performance cost outweighs the visual benefit.

The durations are not magic numbers. They are the 100 ms frame and its first two doublings, which is what you would expect from a system whose underlying clock is the perceptual frame itself.
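
The three tiers above can be written as multiples of a single base value, which makes the relationship explicit in a design-token file. This is a minimal sketch: the token names (`micro`, `standard`, `page`) and the `cssDuration` helper are made up for illustration and do not come from any particular motion system.

```typescript
// Assumed base unit: the ~100 ms perceptual frame discussed above.
const PERCEPTUAL_FRAME_MS = 100;

// Hypothetical token names. Each tier is the frame multiplied by 1, 2, or 4.
const motionDurationMs = {
  micro: PERCEPTUAL_FRAME_MS * 1, // hover states, focus rings, press feedback
  standard: PERCEPTUAL_FRAME_MS * 2, // menus, fades, slides
  page: PERCEPTUAL_FRAME_MS * 4, // page-level and multi-element transitions
};

type MotionTier = keyof typeof motionDurationMs;

// Formats a tier as a CSS time value, e.g. for a transition-duration declaration.
function cssDuration(tier: MotionTier): string {
  return `${motionDurationMs[tier]}ms`;
}

console.log(cssDuration("standard")); // "200ms"
```

Deriving the tiers from one constant means a team that disagrees with the 100 ms base can change it in one place and keep the ratios.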

Doherty & Thadani 1982: the 400 ms productivity cliff

The most empirical of the four. Walter Doherty and Ahrvind Thadani's 1982 IBM technical report (Doherty 1982) measured actual productivity for terminal users at varying response times. They found a non-linear curve: as response times dropped under one second, productivity rose disproportionately, and the curve broke sharply at around 400 ms.

The numbers, for context: programmer transactions per hour rose from ~180 at a 3-second response time to ~371 at 0.3 seconds — a 106 % increase. Component-forecaster productivity rose ~339 % moving from over 5 seconds to sub-second response.
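
The 106 % figure is plain relative-change arithmetic on the two transaction rates quoted above. A small sketch, with `percentIncrease` as a hypothetical helper name:

```typescript
// Relative change from a baseline, expressed as a percentage.
function percentIncrease(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Figures quoted above: ~180 transactions/hour at 3 s, ~371 at 0.3 s.
const gain = percentIncrease(180, 371);
console.log(gain.toFixed(1)); // "106.1"
```

In other words, throughput slightly more than doubled.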

This is what the modern web-vitals community calls the "Doherty threshold." It is the empirical basis for treating 400 ms as a flow-state boundary: the motivation behind the Interaction to Next Paint (INP) metric and behind every "sub-second response time" goal you have ever seen in a sprint planning meeting.

It is also the most-quoted, least-read paper in this set. The original technical report is a niche IBM artifact; nearly every modern reference is to a secondary source. The numbers are real. The threshold is real. The "Doherty threshold" framing is later attribution.

Nielsen 1993: the trichotomy

And finally, the source of the version everybody actually cites. Jakob Nielsen's 1993 book Usability Engineering (Nielsen 1993; Chapter 5, later excerpted as the NN/g article "Response Times: The 3 Important Limits") synthesised Miller, Card-Moran-Newell, and the field's empirical work into a clean three-tier model:

0.1 second — the limit for the system to feel instantaneous; no feedback needed.
1.0 second — the limit for uninterrupted flow of thought; the user notices the delay but stays in control.
10 seconds — the limit for keeping attention on the dialogue; beyond this, give a progress indicator.

This is the version every designer has memorised. It is a synthesis, not a primary finding. It is well-supported — Miller's transaction data, Card's perceptual constants, and the wider empirical record all back the boundaries — but it is Nielsen's editorial framing, not Miller's claim.
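
As a rough sketch, the three limits can be expressed as a classifier that maps a measured delay to the tier it falls in. The tier names and the `classifyResponse` function are hypothetical, and treating each limit as inclusive is an assumption made here, not something the sources specify.

```typescript
// Hypothetical tier labels for the three limits described above.
type ResponseTier = "instant" | "flow" | "attention" | "needsProgress";

// Maps a delay in milliseconds to a tier. Boundaries are treated as
// inclusive, which is an assumption of this sketch.
function classifyResponse(delayMs: number): ResponseTier {
  if (delayMs <= 100) return "instant"; // feels instantaneous, no feedback needed
  if (delayMs <= 1000) return "flow"; // noticed, but flow of thought holds
  if (delayMs <= 10000) return "attention"; // attention stays on the dialogue
  return "needsProgress"; // beyond 10 s: show a progress indicator
}

console.log(classifyResponse(80)); // "instant"
console.log(classifyResponse(400)); // "flow"
console.log(classifyResponse(15000)); // "needsProgress"
```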

The reason this matters: when you cite "the 0.1 / 1 / 10 rule," you are citing Nielsen 1993, who is citing Miller 1968 and Card-Moran-Newell 1983. The chain is short, it is correct, and it is rarely surfaced. Half the literature attributes the trichotomy to Miller alone, which is partly a courtesy to the older paper and partly a folkloric edit. Use Nielsen for the popular framing. Use Miller and Card et al. for the underlying research. Use Doherty for the productivity numbers.

Where the modern web fits

These four papers are the foundation under every response-time graph you have ever seen, every Web Vitals threshold, every conference talk on speed. They do not disagree. They cover slightly different ground:

Card et al. 1991 sets the ~100 ms perceptual frame — the floor under which UI responses feel caused by the user.
Doherty & Thadani 1982 sets the ~400 ms productivity cliff — the point under which sub-second response converts to real productivity gains.
Miller 1968 populates the table between ~1 s and ~10 s with finer-grained transaction-typed thresholds.
Nielsen 1993 distils the whole thing into 0.1 / 1 / 10 for daily use.

When a Web Vitals threshold (First Input Delay under 100 ms, Interaction to Next Paint under 200 ms, Largest Contentful Paint under 2.5 s) lines up with one of these papers, that is not coincidence. The Vitals team is sitting on top of forty years of cognitive timing data. The boundaries hold up because the underlying cognition has not changed.
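
For illustration, the three thresholds named above can be checked with a small lookup. This sketch does not use any Web Vitals library: `goodThresholdMs` and `isGood` are hypothetical names, and the values are only the ones quoted in the paragraph above, treated as inclusive upper bounds.

```typescript
// Metric names as abbreviated in the text above.
type VitalName = "FID" | "INP" | "LCP";

// "Good" thresholds in milliseconds, taken from the paragraph above.
const goodThresholdMs: Record<VitalName, number> = {
  FID: 100,
  INP: 200,
  LCP: 2500,
};

// True when a measured value is at or under the "good" threshold.
function isGood(metric: VitalName, valueMs: number): boolean {
  return valueMs <= goodThresholdMs[metric];
}

console.log(isGood("INP", 180)); // true
console.log(isGood("LCP", 3200)); // false
```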

What to do with this

Three takeaways before the next essay:

Cite the right paper for the right claim. "0.1 s feels instant" is Nielsen distilling Card-Moran-Newell. "Sub-second response is more productive" is Doherty. "Keystroke echo at 0.1 s, simple inquiry at 2 s, complex inquiry at 10 s" is Miller. The folkloric "Miller said 0.1 / 1 / 10" is wrong; the correct version is "Nielsen 1993, after Miller 1968 and Card et al. 1983."
Treat the boundaries as cognitive transitions, not budgets. They are not "you must respond under X" — they are "the user's mental state changes at X." Some products will live happily at 2 s; some die at 500 ms. The boundaries are the shape of the curve, not the curve.
Use the four sources to argue, not to decide. Performance budgets get set by your product team and your CFO. The papers tell you which budgets cost you what. They do not tell you which budgets to set.

References

Miller 1968
Miller, R. B. (1968). Response time in man-computer conversational transactions. Proceedings of the AFIPS Fall Joint Computer Conference, 33(I), 267–277. The 17-transaction taxonomy that the popularised 0.1 / 1 / 10-second trichotomy is later distilled from.
Card, Moran & Newell 1983
Card, S. K., Moran, T. P., & Newell, A. (1983). The Psychology of Human-Computer Interaction. Lawrence Erlbaum. Source of the human-processor time constants — perceptual processing ~100 ms, immediate response ~1 s, unit task ~10 s.
Card et al. 1991
Card, S. K., Robertson, G. G., & Mackinlay, J. D. (1991). The information visualizer, an information workspace. Proceedings of CHI '91, 181–188. Source of the ~10 Hz / 100-ms-per-frame animation refresh recommendation that underlies most modern transition timing defaults.
Doherty 1982
Doherty, W. J., & Thadani, A. J. (1982). The Economic Value of Rapid Response Time. IBM Technical Report GE20-0752-0. Empirical productivity-vs-response-time curve breaking sharply at ~400 ms — the basis of the 'Doherty threshold' framing.
Nielsen 1993
Nielsen, J. (1993). Usability Engineering, Ch. 5. Morgan Kaufmann. Later excerpted as "Response Times: The 3 Important Limits." The synthesis that distilled Miller, Card-Moran-Newell, and the field's empirical work into the clean 0.1 / 1 / 10-second trichotomy.