Product Design Framework · v1.0
AI Taxonomy & Governance
Esther Yang · Senior Product Designer · Highway.AI · Internal Reference
01.A
Large Language Models
Systems that generate, reason over, or transform natural language using foundation models. Probabilistic by nature — output varies, requires calibration, and cannot be fully predicted.
Generative · Non-deterministic · Context-aware
Use when
The task requires flexible language generation and output will vary by context — drafting, summarizing, reasoning through open-ended content.
Next move
Define what a good output looks like before writing a prompt.
Visual treatment
Stream output progressively and expose a regenerate action. Streaming signals that the system is reasoning, not retrieving — it sets accurate expectations before the user reads a single word.
01.B
Machine Learning
Models trained on historical data to predict, classify, or rank. Deterministic at inference time: the same input produces the same output. Typically requires labeled data, and degrades when the input distribution shifts.
Predictive · Data-dependent · Trainable
Use when
The system needs to score, rank, or flag based on patterns in historical data — and consistency across inputs matters more than flexibility.
Next move
The conversation to have is about data quality and model confidence, not UI.
Visual treatment
Show scores with explicit thresholds and model attribution. Users act on predictions — hiding the confidence level removes their ability to calibrate trust. Transparency here is a functional requirement, not a design flourish.
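A minimal sketch of what a score display could carry so that the threshold and the model attribution travel with every score. The `ScoreDisplay` record, its field names, the 0.70 threshold, and the `risk-model-v3` identifier are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreDisplay:
    """Hypothetical payload for showing one ML prediction to a user."""
    score: float      # model output, assumed to be in [0, 1]
    threshold: float  # agreed decision cut-off, shown alongside the score
    model_id: str     # which model and version produced the score

    @property
    def flagged(self) -> bool:
        # Explicit, inspectable rule: at or above the threshold means flagged.
        return self.score >= self.threshold

    def label(self) -> str:
        # The score never renders without its threshold and its source model.
        outcome = "flagged" if self.flagged else "not flagged"
        return (
            f"Score {self.score:.2f} vs. threshold {self.threshold:.2f}: "
            f"{outcome} · model: {self.model_id}"
        )


if __name__ == "__main__":
    print(ScoreDisplay(score=0.82, threshold=0.70, model_id="risk-model-v3").label())
    # Score 0.82 vs. threshold 0.70: flagged · model: risk-model-v3
```

Deriving `flagged` from the stored threshold, rather than recomputing it separately in the UI, keeps the displayed decision and the displayed rule from drifting apart.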
01.C
Automation
Rule-based logic that executes defined sequences without inference. Predictable, auditable, zero ambiguity. Commonly misattributed to AI.
Rule-based · Deterministic · Auditable
Use when
A defined trigger should execute a defined action with no variability. No inference required. Do not label this as AI.
Next move
Spec the trigger and the action. This doesn't need a model.
Visual treatment
Use plain status language: "sent automatically," not "AI sent." Attaching AI language to deterministic behavior trains users to expect intelligence that isn't there.
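Speccing the trigger and the action, with plain status language attached, can be very small. The sketch assumes a hypothetical `Rule` record and `run_rules` helper, and the `invoice_overdue` event is made up for illustration; the point is that every step is an exact match and a fixed action, with nothing to infer.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """Hypothetical trigger -> action rule. No model and no inference involved."""
    trigger: str                    # event name that fires the rule
    action: Callable[[dict], None]  # side effect to run with the event payload
    status_text: str                # plain status language shown to the user


def run_rules(rules: list[Rule], event: str, payload: dict) -> list[str]:
    """Run every rule whose trigger matches the event; return user-facing status lines."""
    statuses = []
    for rule in rules:
        if rule.trigger == event:  # exact match: the same event always does the same thing
            rule.action(payload)
            statuses.append(rule.status_text)
    return statuses


if __name__ == "__main__":
    sent_reminders: list[int] = []
    rules = [
        Rule(
            trigger="invoice_overdue",
            action=lambda p: sent_reminders.append(p["invoice_id"]),
            status_text="Reminder sent automatically",
        ),
    ]
    print(run_rules(rules, "invoice_overdue", {"invoice_id": 42}))  # ['Reminder sent automatically']
    print(sent_reminders)  # [42]
```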
01.D
UX Debt
Friction that looks like an AI problem but is actually a design or information architecture (IA) problem. Adding AI here treats the symptom, not the cause.
Structural · Pre-AI · Resolvable
Recognize it when
Users are confused, can't find things, or misuse features — and the proposed fix is to add an AI layer on top.
Next move
Map what users are actually trying to do before proposing any solution.
Visual treatment
No AI surface warranted — fix the navigation or hierarchy first. An AI assistant over broken IA compounds unreliability with unpredictability. Resolve the structure; the surface follows.
LLM — if misapplied
Calling automation or ML "AI-powered" sets expectations of reasoning the system can't meet. Users attempt edge cases it can't handle. Trust erodes faster than it was built.
Using an LLM for something deterministic burns tokens on every call. At scale that's a real line item — you're paying inference cost for a lookup that should be a query.
Primary failure: Expectation gap → trust collapse + unnecessary compute cost
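The cost argument is simple arithmetic: calls per day × tokens per call × days × price per token. A rough sketch, assuming a single blended per-token price; the traffic and price figures in the example are placeholders, not real usage or any provider's actual rates.

```python
def monthly_inference_cost(calls_per_day: int, tokens_per_call: int,
                           usd_per_million_tokens: float, days: int = 30) -> float:
    """Rough monthly spend when every request goes through a model.

    Uses a single blended token price; real pricing often differs for
    input and output tokens.
    """
    tokens_per_month = calls_per_day * tokens_per_call * days
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Placeholder figures, not real traffic or real pricing.
    print(monthly_inference_cost(calls_per_day=100_000, tokens_per_call=500,
                                 usd_per_million_tokens=1.00))  # 1500.0
```

A deterministic query answering the same lookup has no per-token charge at all, so in this framing the whole figure is avoidable spend.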
ML — if misapplied
Using an LLM where a trained model is needed introduces unpredictability into decisions that require consistency. Risk scores that vary by phrasing are not risk scores — they're opinions.
You're also paying for variability you didn't want. A trained model runs inference cheaply and consistently. An LLM doing the same job costs more per call and produces less stable output.
Primary failure: Inconsistency → liability + inflated inference cost
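One way to test the claim that risk scores should not vary by phrasing is to score several descriptions of the same case and check that they all agree. The sketch assumes a hypothetical `scores_agree` helper; `structured_score` stands in for a trained model reading structured fields, and `phrasing_sensitive_score` is a deliberately crude stand-in for any scorer whose output depends on wording. Neither is a real model.

```python
from typing import Callable, Hashable, Iterable


def scores_agree(score_fn: Callable[[dict], Hashable], equivalent_inputs: Iterable[dict]) -> bool:
    """True if score_fn gives one identical score across inputs that describe the same case."""
    return len({score_fn(record) for record in equivalent_inputs}) == 1


def structured_score(record: dict) -> float:
    # Stand-in for a trained model: reads only structured fields.
    return round(0.3 * record["late_payments"] + 0.1, 2)


def phrasing_sensitive_score(record: dict) -> float:
    # Crude stand-in for a scorer that reads free text: output depends on wording.
    return round(len(record["notes"]) / 100, 2)


# Two descriptions of the same hypothetical case, phrased differently.
same_case = [
    {"late_payments": 2, "notes": "Paid late twice."},
    {"late_payments": 2, "notes": "Two late payments on file."},
]

if __name__ == "__main__":
    print(scores_agree(structured_score, same_case))          # True
    print(scores_agree(phrasing_sensitive_score, same_case))  # False
```

Passing this check on a handful of paraphrases does not prove a scorer is stable, but failing it is a clear sign the output is an opinion rather than a score.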
Automation — if misapplied
Labeling rule-based logic as AI inflates perceived value in the short term. When users discover the system is a conditional rather than a model, they recalibrate downward, and their trust in the product's other AI claims drops with it.
The cost here is in over-engineering. Teams build LLM scaffolding, evaluation pipelines, and prompt infrastructure around something that should be an if/then statement. That's weeks of work and ongoing maintenance on a problem that didn't need a model.
Primary failure: Credibility debt + over-engineering cost
UX Debt — if ignored
Layering AI onto unresolved structural problems compounds complexity. Every AI interaction that fails because the underlying IA is broken trains users to distrust AI rather than the design decision that preceded it.
You've also now funded two problems instead of one. The IA still needs fixing, and you've added an AI layer that will need its own maintenance, evaluation, and iteration — on top of a foundation that was never sound.
Primary failure: Compounding confusion + compounding cost
What was said: "AI-powered notifications"
What it actually was: Scheduled sends triggered by user actions, with no model involved
Why it matters: Positions a deterministic feature as intelligent. When the notification misfires, users blame the AI rather than the rule.
Correct label: Automation

What was said: "Smart search"
What it actually was: Keyword matching on a poorly structured data model. Users can't find things because the taxonomy is broken.
Why it matters: Solving a findability problem with a search label doesn't fix the underlying IA. Adding "smart" adds expectation without capability.
Correct label: UX Debt

What was said: "AI recommendations"
What it actually was: A ranked list sorted by a static scoring formula last updated 18 months ago
Why it matters: Static ranking presented as adaptive intelligence. Users adjust their behavior expecting the system to learn, and it doesn't. The result is misplaced trust.
Correct label: Automation

What was said: "Use AI to explain the dashboard"
What it actually was: The dashboard has too many unlabeled metrics and no clear hierarchy, so users don't know what to act on
Why it matters: An LLM narrating a confusing UI doesn't make the UI less confusing. It adds a layer of text to a layout problem.
Correct label: UX Debt

What was said: "Add AI to generate buyer insights"
What it actually was: A request to use an LLM without defining what a good insight contains, what data it draws from, or what decision it supports
Why it matters: LLMs produce output that matches the shape of the request. Without a defined output standard, the feature ships as plausible-sounding text with no evaluable quality. Prompt design is a design problem, not an engineering one.
Correct label: LLM
LLM
Output is probabilistic. Human review required before any external send or irreversible action.
ML
Predictions affect decisions. Human review required when scores influence risk classification or financial outcomes.
Automation
Deterministic by definition. Human decision required at rule-authoring time, not execution time.
UX Debt
No AI governance applicable. Requires a design decision, not an oversight model.
Requires human decision
  • Any output that affects a financial transaction or commitment
  • Content that will be sent externally under the user's name
  • Recommendations that affect risk classification of a record
  • Actions that are irreversible or difficult to audit after the fact
  • Conflicts between model output and user-provided context
AI may act autonomously
  • Drafting, summarizing, or reformatting — when output is reviewable before use
  • Sorting, ranking, filtering within a clearly defined and auditable ruleset
  • Surfacing suggestions when the user retains explicit accept/reject control
  • Routine automation with full audit trail available
  • Low-stakes personalization with an accessible override
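The "Requires human decision" list can be read as a gate: if any one condition holds, the action waits for a person. A minimal sketch, assuming a hypothetical `ProposedAction` record whose flags mirror that list one-to-one. It encodes only the first list; the autonomy conditions (reviewable output, accept/reject control, audit trail, accessible override) still have to be designed into the feature itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    """Hypothetical description of something an AI feature wants to do."""
    affects_financial_commitment: bool = False   # financial transaction or commitment
    sent_externally_as_user: bool = False        # external content under the user's name
    changes_risk_classification: bool = False    # affects a record's risk classification
    irreversible_or_hard_to_audit: bool = False  # cannot be undone or audited afterwards
    conflicts_with_user_context: bool = False    # model output disagrees with user input


def requires_human_decision(action: ProposedAction) -> bool:
    """Any single listed condition is enough to route the action to a person."""
    return any((
        action.affects_financial_commitment,
        action.sent_externally_as_user,
        action.changes_risk_classification,
        action.irreversible_or_hard_to_audit,
        action.conflicts_with_user_context,
    ))


if __name__ == "__main__":
    print(requires_human_decision(ProposedAction()))                              # False
    print(requires_human_decision(ProposedAction(sent_externally_as_user=True)))  # True
```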
Before shipping any LLM feature — define the output
What signals matter
Which data inputs should shape the output? What context is required vs. optional? Outputs are only as good as the inputs they're allowed to use.
What decision it supports
What should the user be able to do after reading this output that they couldn't before? If that's not defined, the output has no evaluable purpose.
What a bad output looks like
Vague, hedged, or plausible-but-wrong outputs are the default failure mode. Define what bad looks like before launch — not after users start ignoring the feature.
Who owns the definition
Prompt design is a design problem, not an engineering one. Output quality criteria should be set before implementation begins — not discovered during QA.
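The four questions can double as a pre-launch check that blocks shipping until each has an answer. A sketch under that assumption: the `OutputDefinition` class, its field names, and the example signals are hypothetical, and a non-empty field only shows that a question was answered, not that the answer is good.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OutputDefinition:
    """Hypothetical pre-launch spec for one LLM feature, one field per question."""
    signals: list[str] = field(default_factory=list)              # data inputs that shape the output
    decision_supported: str = ""                                  # what the user can do afterwards
    bad_output_examples: list[str] = field(default_factory=list)  # what failure looks like
    owner: str = ""                                               # who owns the definition

    def missing(self) -> list[str]:
        """Names of the questions that still have no answer."""
        gaps = []
        if not self.signals:
            gaps.append("signals")
        if not self.decision_supported.strip():
            gaps.append("decision_supported")
        if not self.bad_output_examples:
            gaps.append("bad_output_examples")
        if not self.owner.strip():
            gaps.append("owner")
        return gaps

    def ready_to_ship(self) -> bool:
        # Shipping is blocked until every question has some answer.
        return not self.missing()


if __name__ == "__main__":
    spec = OutputDefinition(signals=["deal stage", "last contact date"])
    print(spec.missing())  # ['decision_supported', 'bad_output_examples', 'owner']
```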