AI Taxonomy — Product Design Framework

01 — Taxonomy & When to Use

01.A

Large Language Models

Systems that generate, reason over, or transform natural language using foundation models. Probabilistic by nature: output varies between runs, needs calibration, and cannot be fully predicted.

Generative · Non-deterministic · Context-aware

Use when

The task requires flexible language generation and output will vary by context — drafting, summarizing, reasoning through open-ended content.

Next move

Define what a good output looks like before writing a prompt.

Visual treatment

Stream output progressively and expose a regenerate action. Streaming signals that the system is generating text, not retrieving a stored answer, which sets accurate expectations before the user reads a single word.

01.B

Machine Learning

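Models trained on historical data to predict, classify, or rank. Typically deterministic at inference time: the same input produces the same score. Supervised models require labeled data, and any trained model degrades when the input distribution shifts away from the training data.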

Predictive · Data-dependent · Trainable

Use when

The system needs to score, rank, or flag based on patterns in historical data — and consistency across inputs matters more than flexibility.

Next move

The conversation to have is about data quality and model confidence, not UI.

Visual treatment

Show scores with explicit thresholds and model attribution. Users act on predictions — hiding the confidence level removes their ability to calibrate trust. Transparency here is a functional requirement, not a design flourish.
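One way to read "explicit thresholds and model attribution" is sketched below. It is a minimal illustration, not a prescribed implementation: the `Prediction` type, the threshold values, and the model name are all hypothetical, and real cut-offs would come from validation data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    score: float        # model output in [0, 1]
    model_name: str     # hypothetical identifier shown to the user
    model_version: str


# Hypothetical thresholds for illustration only.
REVIEW_THRESHOLD = 0.50
HIGH_RISK_THRESHOLD = 0.80


def band(score: float) -> str:
    """Map a raw score to the band the user is expected to act on."""
    if score >= HIGH_RISK_THRESHOLD:
        return "High"
    if score >= REVIEW_THRESHOLD:
        return "Review"
    return "Low"


def render(p: Prediction) -> str:
    """Show the band, the raw score, the thresholds, and the model behind it."""
    return (
        f"{band(p.score)} risk: {p.score:.2f} "
        f"(review at {REVIEW_THRESHOLD:.2f}, high at {HIGH_RISK_THRESHOLD:.2f}) "
        f"| source: {p.model_name} v{p.model_version}"
    )
```

Printing the thresholds next to the score lets a user see how close a record sits to a boundary, which a bare "High" label would hide.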

01.C

Automation

Rule-based logic that executes defined sequences without inference. Predictable, auditable, zero ambiguity. Commonly misattributed to AI.

Rule-based · Deterministic · Auditable

Use when

A defined trigger should execute a defined action with no variability. No inference required. Do not label this AI.

Next move

Spec the trigger and the action. This doesn't need a model.

Visual treatment

Use plain status language: "sent automatically," not "AI sent." Attaching AI language to deterministic behavior trains users to expect intelligence that isn't there.
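A trigger-and-action spec with an audit trail and plain status language can be sketched in a few lines. Everything named here (`Rule`, `AuditLog`, `receipt_rule`, the event shape) is a hypothetical illustration, and the action only returns a status string where a real system would call a mail service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    trigger: Callable[[dict], bool]  # defined condition on an incoming event
    action: Callable[[dict], str]    # defined action; returns a plain status line


@dataclass
class AuditLog:
    entries: List[dict] = field(default_factory=list)

    def record(self, rule_name: str, event: dict, status: str) -> None:
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "rule": rule_name,
            "event": event,
            "status": status,
        })


def run(rule: Rule, event: dict, log: AuditLog) -> Optional[str]:
    """Fire the action only when the trigger matches; log every execution."""
    if not rule.trigger(event):
        return None
    status = rule.action(event)
    log.record(rule.name, event, status)
    return status


def send_receipt(event: dict) -> str:
    # A real system would send mail here; this sketch only reports status.
    return f"Receipt sent automatically to {event['email']}"


# Hypothetical rule: a successful payment triggers a receipt.
receipt_rule = Rule(
    name="receipt_on_payment",
    trigger=lambda e: e.get("type") == "payment_succeeded",
    action=send_receipt,
)
```

The status line says what happened and that it happened automatically. No model is involved, so nothing in the wording suggests one.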

01.D

UX Debt

Friction that appears to be an AI problem but is a design or information architecture problem. Adding AI here treats symptoms, not cause.

Structural · Pre-AI · Resolvable

Recognize it when

Users are confused, can't find things, or misuse features — and the proposed fix is to add an AI layer on top.

Next move

Map what users are actually trying to do before proposing any solution.

Visual treatment

No AI surface warranted — fix the navigation or hierarchy first. An AI assistant over broken IA compounds unreliability with unpredictability. Resolve the structure; the surface follows.
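The four categories in this section can be read as an ordered set of questions. The sketch below is one possible ordering, not part of the framework itself: the `Request` fields and the order of the checks are assumptions. It asks about structural problems first and reaches for a model last.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    UX_DEBT = "UX Debt"
    AUTOMATION = "Automation"
    ML = "Machine Learning"
    LLM = "Large Language Models"


@dataclass(frozen=True)
class Request:
    root_cause_is_structural: bool       # confusion or findability issues from broken IA
    fixed_trigger_and_action: bool       # defined trigger, defined action, no inference
    predicts_from_historical_data: bool  # score, rank, or flag; consistency matters
    needs_flexible_language: bool        # drafting, summarizing, open-ended content


def classify(r: Request) -> Optional[Category]:
    """Return the first category that fits, checking the cheapest fix first."""
    if r.root_cause_is_structural:
        return Category.UX_DEBT
    if r.fixed_trigger_and_action:
        return Category.AUTOMATION
    if r.predicts_from_historical_data:
        return Category.ML
    if r.needs_flexible_language:
        return Category.LLM
    return None  # nothing matched: the request needs more definition
```

Under this ordering, a request such as "use AI to explain the dashboard" stops at the first check whenever the root cause is an unclear layout, regardless of how much language generation it also seems to need.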

02 — Cost of Misclassification

LLM — if misapplied

Calling automation or ML "AI-powered" sets expectations of reasoning the system can't meet. Users attempt edge cases it can't handle. Trust erodes faster than it was built.

Using an LLM for something deterministic burns tokens on every call. At scale that's a real line item — you're paying inference cost for a lookup that should be a query.

Primary failure: Expectation gap → trust collapse + unnecessary compute cost
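The compute cost can be estimated with simple arithmetic before any build starts. The function below is a rough sketch. Every number in the usage example is a made-up placeholder, not real vendor pricing, and should be replaced with actual traffic and rates.

```python
def monthly_inference_cost(
    calls_per_day: int,
    tokens_per_call: int,
    price_per_1k_tokens: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend for routing every call through an LLM."""
    return calls_per_day * days * (tokens_per_call / 1000) * price_per_1k_tokens


# Placeholder inputs for illustration only.
estimate = monthly_inference_cost(
    calls_per_day=50_000,
    tokens_per_call=800,
    price_per_1k_tokens=0.002,
)
print(f"Estimated monthly cost: {estimate:,.2f}")
```

With these placeholder inputs the estimate comes to roughly 2,400 per month in whatever currency the price is quoted in. If the same answers could come from a database query, that figure is the recurring cost of the misclassification.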

ML — if misapplied

Using an LLM where a trained model is needed introduces unpredictability into decisions that require consistency. Risk scores that vary by phrasing are not risk scores — they're opinions.

You're also paying for variability you didn't want. A trained model runs inference cheaply and consistently. An LLM doing the same job costs more per call and produces less stable output.

Primary failure: Inconsistency → liability + inflated inference cost

Automation — if misapplied

Labeling rule-based logic as AI inflates perceived value in the short term. When users discover the system is a conditional, not a model, they recalibrate downward and discount the product's other AI claims as well.

The cost here is in over-engineering. Teams build LLM scaffolding, evaluation pipelines, and prompt infrastructure around something that should be an if/then statement. That's weeks of work and ongoing maintenance on a problem that didn't need a model.

Primary failure: Credibility debt + over-engineering cost

UX Debt — if ignored

Layering AI onto unresolved structural problems compounds complexity. Every AI interaction that fails because the underlying IA is broken trains users to distrust AI rather than the design decision that preceded it.

You've also now funded two problems instead of one. The IA still needs fixing, and you've added an AI layer that will need its own maintenance, evaluation, and iteration — on top of a foundation that was never sound.

Primary failure: Compounding confusion + compounding cost

03 — Anti-Patterns

What was said	What it actually was	Why it matters	Correct label
"AI-powered notifications"	Scheduled sends triggered by user actions — no model involved	Positions a deterministic feature as intelligent. When the notification misfires, users blame the AI rather than the rule.	Automation
"Smart search"	Keyword matching on a poorly structured data model — users can't find things because the taxonomy is broken	Solving a findability problem with a search label doesn't fix the underlying IA. Adding "smart" adds expectation without capability.	UX Debt
"AI recommendations"	A ranked list sorted by a static scoring formula last updated 18 months ago	Static ranking presented as adaptive intelligence. Users adjust behavior expecting the system to learn — it doesn't. Misplaced trust.	Automation
"Use AI to explain the dashboard"	The dashboard has too many unlabeled metrics and no clear hierarchy — users don't know what to act on	An LLM narrating a confusing UI doesn't make the UI less confusing. It adds a layer of text to a layout problem.	UX Debt
"Add AI to generate buyer insights"	A request to use an LLM without defining what a good insight contains, what data it draws from, or what decision it supports	LLMs produce output that matches the shape of the request. Without a defined output standard, the feature ships as plausible-sounding text with no evaluable quality. Prompt design is a design problem, not an engineering one.	LLM

04 — Human Decision Governance

LLM

Output is probabilistic. Human review required before any external send or irreversible action.

ML

Predictions affect decisions. Human decision required when scores influence risk classification or a financial outcome.

Automation

Deterministic by definition. Human decision required at rule-authoring time, not execution time.

UX Debt

No AI governance applicable. Requires a design decision, not an oversight model.

Requires human decision

Any output that affects a financial transaction or commitment
Content that will be sent externally under the user's name
Recommendations that affect risk classification of a record
Actions that are irreversible or difficult to audit after the fact
Conflicts between model output and user-provided context

AI may act autonomously

Drafting, summarizing, or reformatting — when output is reviewable before use
Sorting, ranking, filtering within a clearly defined and auditable ruleset
Surfacing suggestions when the user retains explicit accept/reject control
Routine automation with full audit trail available
Low-stakes personalization with an accessible override
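The "Requires human decision" list above can be encoded as a gate that runs before any action executes. This is a minimal sketch under stated assumptions: the `ProposedAction` flags are hypothetical names, and deciding how each flag is set for a given feature remains a product decision. A clear gate only means none of the listed conditions applies. The conditions in the autonomous list, such as reviewability, an audit trail, or an accessible override, still have to hold separately.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProposedAction:
    affects_financial_commitment: bool = False
    sent_externally_under_user_name: bool = False
    affects_risk_classification: bool = False
    irreversible_or_hard_to_audit: bool = False
    conflicts_with_user_context: bool = False


# Each flag maps to one item in the "Requires human decision" list.
_CHECKS = [
    ("affects_financial_commitment", "affects a financial transaction or commitment"),
    ("sent_externally_under_user_name", "will be sent externally under the user's name"),
    ("affects_risk_classification", "affects risk classification of a record"),
    ("irreversible_or_hard_to_audit", "is irreversible or difficult to audit"),
    ("conflicts_with_user_context", "conflicts with user-provided context"),
]


def human_decision_reasons(action: ProposedAction) -> List[str]:
    """Return every reason this action must wait for a person."""
    return [reason for attr, reason in _CHECKS if getattr(action, attr)]


def requires_human_decision(action: ProposedAction) -> bool:
    return bool(human_decision_reasons(action))
```

Returning the reasons, not just a boolean, gives the interface something concrete to show when it asks the user to confirm.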

Before shipping any LLM feature — define the output

What signals matter

Which data inputs should shape the output? What context is required vs. optional? Outputs are only as good as the inputs they're allowed to use.

What decision it supports

What should the user be able to do after reading this output that they couldn't before? If that's not defined, the output has no evaluable purpose.

What a bad output looks like

Vague, hedged, or plausible-but-wrong outputs are the default failure mode. Define what bad looks like before launch — not after users start ignoring the feature.

Who owns the definition

Prompt design is a design problem, not an engineering one. Output quality criteria should be set before implementation begins — not discovered during QA.
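The four questions above can be captured as a short checklist object that blocks prompt work until each one has an answer. The sketch below is illustrative only: `OutputDefinition` and its field names are hypothetical, and the check confirms that an answer exists, not that it is a good one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OutputDefinition:
    decision_supported: str = ""   # what the user can do after reading the output
    owner: str = ""                # who owns the quality criteria
    required_signals: List[str] = field(default_factory=list)
    optional_signals: List[str] = field(default_factory=list)
    bad_output_examples: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """List the questions that still have no answer."""
        gaps = []
        if not self.required_signals:
            gaps.append("required_signals")
        if not self.decision_supported.strip():
            gaps.append("decision_supported")
        if not self.bad_output_examples:
            gaps.append("bad_output_examples")
        if not self.owner.strip():
            gaps.append("owner")
        return gaps

    def ready_for_prompt_work(self) -> bool:
        return not self.missing()
```

A request like "add AI to generate buyer insights" would start with every field empty, and `missing()` would name all four gaps before anyone writes a prompt.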