Overview of common approaches to AI intent recognition — how they work, their tradeoffs, and when to use each.

1. Rule-Based / Pattern Matching

Match user input against hand-written regex patterns and keyword dictionaries.

rules = {
    "book_flight": ["book.*flight", "buy.*ticket", "fly.*to"],
    "check_weather": ["weather", "temperature", "raining"],
}

Pros: Fully explainable, zero training data, sub-millisecond latency, fully controllable.

Cons: Low coverage, maintenance cost explodes with scale, poor generalization.

2. Traditional ML Classification (TF-IDF + SVM/LR)

Convert text to TF-IDF vectors, then train a multi-class classifier (SVM, Logistic Regression, Naive Bayes).

from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfVectorizer

model = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])
model.fit(X_train, y_train)

Pros: Fast to train, interpretable, works with modest data.

Cons: No semantic understanding, poor handling of synonyms and ambiguity, heavy feature engineering needed.

3. Deep Learning Classification (BiLSTM / TextCNN)

Feed word embeddings into a BiLSTM or CNN encoder, then a classification head.

Pros: Better semantic capture than TF-IDF, end-to-end training.

Cons: Needs thousands of labeled examples, outclassed by Transformer-based models, largely obsolete now.

4. Pre-trained Model Fine-tuning (BERT / RoBERTa / ERNIE) ⭐ mainstream

Fine-tune a BERT-family model on domain data. The [CLS] token representation feeds into a classification head.

from transformers import BertForSequenceClassification, Trainer

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=N)
trainer = Trainer(model=model, train_dataset=dataset)
trainer.train()

Pros: High accuracy, strong generalization, language-specific variants available (ERNIE and MacBERT for Chinese).

Cons: Inference latency 50–200ms, compute-heavy, needs hundreds of labeled examples minimum.

5. Sentence Embedding + Similarity Matching (Zero/Few-shot)

Encode user input and intent examples into a shared embedding space, then pick the closest intent by cosine similarity.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")
intent_examples = {"book_flight": "I want to book a flight to New York"}

intent_names = list(intent_examples)
intent_embs = model.encode(list(intent_examples.values()))
query_emb = model.encode(user_query)
scores = util.cos_sim(query_emb, intent_embs)[0]  # cosine similarity per intent
best_intent = intent_names[int(scores.argmax())]

Pros: Minimal labeling needed, new intents can be added without retraining, cold-start friendly.

Cons: Struggles to distinguish similar intents, threshold tuning required, inconsistent recall.
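The "threshold tuning required" tradeoff can be sketched in plain Python: if the best cosine score falls below a cutoff, return a fallback instead of guessing (the 0.6 cutoff and the `"fallback"` label here are illustrative assumptions, not recommended values):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def pick_intent(query_emb, intent_embs, threshold=0.6):
    # intent_embs: {intent_name: example embedding}
    best, best_score = None, -1.0
    for name, emb in intent_embs.items():
        score = cosine(query_emb, emb)
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= threshold else "fallback"
```

Tuning means sweeping the threshold on held-out queries: too low and dissimilar inputs get forced into an intent, too high and valid paraphrases fall through to the fallback.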

6. LLM Prompt-based (GPT / Claude / local models)

Prompt an LLM directly to classify intent, returning structured output.

prompt = """
User input: "{user_input}"
Choose the best matching intent from the list below and return JSON.
Intents: {intents}
Output format: {{"intent": "xxx", "confidence": 0.9, "slots": {{}}}}
"""

Pros: Zero labeling, handles complex semantics, can extract entities/slots simultaneously, easy to extend.

Cons: High latency (500ms+), API cost, non-deterministic output, requires prompt engineering.

Few-Shot Prompting for LLM Intent Classification

Zero-shot (just listing intents) works but is inconsistent — the model may return "book_flight", "Book Flight", or "booking a flight" for the same input. Few-shot prompting anchors the output format with examples.

Zero-shot (fragile):

prompt = """
Classify the user input into one of these intents: book_flight, check_weather, set_alarm
User: "fly me to Tokyo"
Intent:
"""

Few-shot + structured output (production):

import json
from anthropic import Anthropic

client = Anthropic()

INTENTS = ["book_flight", "check_weather", "set_alarm", "play_music", "send_message"]

FEW_SHOT_EXAMPLES = [
    ("I want a ticket to London",  "book_flight",   0.97),
    ("will it rain tomorrow?",     "check_weather", 0.95),
    ("wake me up at 7am",          "set_alarm",     0.98),
    ("play some jazz music",       "play_music",    0.96),
    ("text John I'll be late",     "send_message",  0.94),
]

def build_prompt(user_input: str) -> str:
    examples = "\n".join(
        f'User: "{text}" → {{"intent": "{intent}", "confidence": {conf}}}'
        for text, intent, conf in FEW_SHOT_EXAMPLES
    )
    intent_list = ", ".join(INTENTS)
    return f"""Classify user input into exactly one intent from: {intent_list}

Examples:
{examples}

Now classify:
User: "{user_input}"
Output JSON only:"""

def classify_intent(user_input: str) -> dict:
    response = client.messages.create(
        model="claude-haiku-3-5",  # cheapest/fastest — sufficient for classification
        max_tokens=60,
        messages=[{"role": "user", "content": build_prompt(user_input)}]
    )
    return json.loads(response.content[0].text)

# {"intent": "book_flight", "confidence": 0.96}
result = classify_intent("fly me to Beijing next Monday")
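`json.loads` will raise if the model wraps the JSON in any prose, which the "non-deterministic output" con makes a real risk. A production wrapper typically extracts and validates the JSON before trusting it (a sketch; `safe_parse` and the `"unknown"` fallback are assumptions, not SDK features):

```python
import json
import re

VALID_INTENTS = {"book_flight", "check_weather", "set_alarm", "play_music", "send_message"}
UNKNOWN = {"intent": "unknown", "confidence": 0.0}

def safe_parse(raw: str) -> dict:
    # Grab the first {...} span in case the model added surrounding text
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return UNKNOWN
    try:
        parsed = json.loads(match.group())
    except json.JSONDecodeError:
        return UNKNOWN
    # Reject hallucinated labels outside the allowed set
    if parsed.get("intent") not in VALID_INTENTS:
        return UNKNOWN
    return parsed
```

Routing `"unknown"` results to a fallback (or a retry with a stricter prompt) keeps one malformed response from crashing the pipeline.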

Dynamic Few-Shot (for 20+ intents)

Stuffing all examples into every prompt wastes tokens. Instead, retrieve the most relevant examples per query using semantic similarity:

from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

example_bank = [
    {"text": "book me a flight to London", "intent": "book_flight"},
    {"text": "I need a plane ticket",      "intent": "book_flight"},
    {"text": "what's the weather today",   "intent": "check_weather"},
    {"text": "will it snow tomorrow",      "intent": "check_weather"},
    # ... 2-3 examples per intent
]

# normalize so the dot product below equals cosine similarity
bank_embeddings = embedder.encode(
    [e["text"] for e in example_bank], normalize_embeddings=True
)

def get_top_k_examples(query: str, k: int = 3) -> list:
    query_emb = embedder.encode(query, normalize_embeddings=True)
    scores = np.dot(bank_embeddings, query_emb)
    top_k = np.argsort(scores)[::-1][:k]
    return [example_bank[i] for i in top_k]

def classify_with_dynamic_fewshot(user_input: str) -> dict:
    examples = get_top_k_examples(user_input, k=3)
    example_str = "\n".join(
        f'User: "{e["text"]}" → {e["intent"]}' for e in examples
    )
    # only the 3 most relevant examples go into the prompt
    prompt = f"""...\nExamples:\n{example_str}\n\nUser: "{user_input}"\nIntent:"""
    # ... call LLM

When to Use What

| Technique | When | Why |
| --- | --- | --- |
| Zero-shot | Prototyping, very small intent sets | Simplest, no examples needed |
| Static few-shot | <20 intents, stable labels | Reliable output format, cheap |
| Dynamic few-shot | 20+ intents, large example bank | Stays within context window, higher accuracy |
| Structured JSON output | Production, always | Parseable, no format drift |

Use a small/fast model (claude-haiku, gpt-4o-mini) — classification doesn’t need a large model.

7. Hybrid Architecture (production recommendation)

Layer the approaches by cost and confidence:

User input
  │
  ├─ High-confidence rule match ──→ return immediately (<1ms)
  │
  ├─ BERT classifier (primary) ───→ confidence above threshold → return
  │
  └─ LLM fallback ────────────────→ low confidence / complex semantics

This gives you speed on the common path, accuracy on the hard cases, and control over cost.
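The tiered flow above reduces to a small routing function; this is a skeleton under stated assumptions (the rule table, the 0.85 confidence cutoff, and the classifier callables are all placeholders you would swap for real components):

```python
import re

RULES = {"check_weather": re.compile(r"\bweather\b", re.IGNORECASE)}
BERT_THRESHOLD = 0.85  # assumed confidence cutoff, tune on held-out data

def route(user_input, bert_classify, llm_classify):
    """bert_classify(text) -> (intent, confidence); llm_classify(text) -> intent."""
    # Layer 1: cheap rule match returns immediately
    for intent, pattern in RULES.items():
        if pattern.search(user_input):
            return intent, "rules"
    # Layer 2: primary classifier, accepted only when confident
    intent, confidence = bert_classify(user_input)
    if confidence >= BERT_THRESHOLD:
        return intent, "bert"
    # Layer 3: LLM fallback for low-confidence / complex inputs
    return llm_classify(user_input), "llm"
```

Returning the layer name alongside the intent makes it easy to log how much traffic each tier absorbs, which is what you tune the thresholds against.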

Comparison

| Approach | Data needed | Latency | Accuracy | Controllability | Best for |
| --- | --- | --- | --- | --- | --- |
| Rule matching | None | <1ms | Low | Highest | High-frequency fixed intents |
| TF-IDF + SVM | Thousands | <10ms | Medium | High | Rapid prototype |
| BERT fine-tune | 100–1000+ | 50–200ms | High | Medium | Production primary |
| Embedding similarity | Very few | <50ms | Medium | Medium | Cold start / new intents |
| LLM prompt | None | 500ms+ | High | Low | Complex semantics / fallback |
| Hybrid | 100+ | Tiered | Highest | High | Production (recommended) |

Choosing an Approach

  • <20 intents, sufficient labeled data → BERT fine-tune
  • Cold start or frequently changing intents → Embedding similarity + few examples
  • Complex dialogue / multi-turn understanding → LLM (or hybrid)
  • Strict latency requirements → Rules + lightweight classifier

BERT Fine-tuning in Practice

Labels and Training Data Volume

num_labels = number of intents. For 20 intents: num_labels=20.

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=20
)

How much data per intent:

| Quality bar | Samples per intent | Total (20 intents) |
| --- | --- | --- |
| Minimum viable | 50 | 1,000 |
| Decent production | 100–200 | 2,000–4,000 |
| Comfortable | 500+ | 10,000+ |

BERT transfers well — far less data needed than training from scratch.

What Training Data Looks Like

Each sample = one user utterance + one intent label. An utterance is a single thing the user says — one sentence, question, or command.

Raw CSV:

text,intent
"book me a flight to London","book_flight"
"I want to fly to Tokyo next Friday","book_flight"
"can you get me a ticket to Paris","book_flight"
"what's the weather like today","check_weather"
"will it rain tomorrow in Shanghai","check_weather"
"set an alarm for 7am","set_alarm"

As a HuggingFace Dataset:

from datasets import Dataset
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

intent2id = {
    "book_flight": 0,
    "check_weather": 1,
    "set_alarm": 2,
    # ... 17 more
}

data = [
    {"text": "book me a flight to London",        "label": 0},
    {"text": "I want to fly to Tokyo next Friday", "label": 0},
    {"text": "what's the weather like today",      "label": 1},
    {"text": "will it rain tomorrow",              "label": 1},
    {"text": "set an alarm for 7am",               "label": 2},
]

dataset = Dataset.from_list(data)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)
dataset = dataset.train_test_split(test_size=0.1)

Training loop:

from transformers import TrainingArguments, Trainer
import numpy as np

args = TrainingArguments(
    output_dir="./intent-model",
    num_train_epochs=5,
    per_device_train_batch_size=32,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": (preds == labels).mean()}

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    compute_metrics=compute_metrics,
)
trainer.train()

What happens inside:

Input:  "fly me to Beijing"
         ↓ tokenize
        [CLS] fly me to beijing [SEP]
         ↓ BERT encoder (12 layers)
        [CLS] embedding  ← 768-dim vector
         ↓ linear layer (768 → 20)
        logits: [-1.2, 3.8, 0.1, ...]
         ↓ softmax → argmax
        predicted intent: "book_flight"

Key things to watch:

  • Variance matters — each intent needs lexically diverse examples, not 100 paraphrases of one sentence
  • Class balance — keep sample counts roughly equal across intents, or use weighted loss
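For the weighted-loss option, a common recipe is inverse-frequency class weights, the same heuristic as scikit-learn's `class_weight="balanced"` (a stdlib sketch; the label values below are made up):

```python
from collections import Counter

def class_weights(labels: list) -> dict:
    # weight_c = total / (num_classes * count_c): rare classes get larger weights
    counts = Counter(labels)
    total = len(labels)
    n = len(counts)
    return {c: total / (n * k) for c, k in counts.items()}
```

The resulting per-class weights would be passed to the loss function (e.g. a weighted cross-entropy) so under-represented intents are not drowned out.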

Hardware Requirements

A GPU helps a lot, but requirements are nowhere near LLM scale — CPU-only training works, just slowly.

| Hardware | Time (~2,000 samples, 5 epochs) | Cost |
| --- | --- | --- |
| CPU only | 2–8 hours | free (just slow) |
| RTX 3060 (12GB VRAM) | ~5–10 min | consumer GPU |
| RTX 4090 (24GB VRAM) | ~2–5 min | prosumer |
| Google Colab free (T4) | ~10–15 min | free |

Why not like LLM training:

| | LLM pre-training | BERT fine-tuning |
| --- | --- | --- |
| What you’re doing | Learning language from scratch on trillions of tokens | Adapting the classifier head on your small dataset |
| Parameters | 175B+ | 110M (BERT-base), converges fast |
| Data | Terabytes | ~2,000 rows |
| GPU needed | Hundreds of A100s | 1 consumer GPU or free Colab |
| Time | Weeks/months | Minutes |
| Cost | Millions of $ | ~$0 |

You’re not pre-training BERT — that’s already done. You’re fine-tuning the last classification layer and slightly adjusting the rest. Google Colab free tier (T4) is sufficient for 20 intents.