The HuggingFace transformers library has become the de facto standard toolkit for working with pretrained models. The pipeline() abstraction lets you run sentiment analysis, named entity recognition, text generation, question answering, and translation in about three lines of code. The tokenizer + model pair gives you full control when you need it. Knowing when to use each layer of the API (pipeline for prototyping, AutoTokenizer/AutoModel for production, Trainer for fine-tuning) means you can go from idea to working prototype in minutes and from prototype to production without rewriting everything.
HuggingFace exposes three layers of control. pipeline() hides everything behind a task string and is best for prototyping. AutoTokenizer + AutoModel expose token IDs and raw logits, which you need for custom heads or your own batching logic. Trainer wraps the training loop for fine-tuning. Understanding which layer to reach for, and what each one hides, lets you go from a working prototype to a production inference service without rewriting the core logic.
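As a rough illustration of the last step that pipeline() hides, the sketch below turns raw logits into a label and score using only the standard library. It assumes a two-class, single-label model whose label order is 0 = NEGATIVE, 1 = POSITIVE (as in the SST-2 DistilBERT checkpoint). The logit values and the postprocess helper are hypothetical, not part of the library.

```python
import math

# Assumption: label order of a two-class sentiment model
# (index 0 = NEGATIVE, index 1 = POSITIVE).
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, id2label=ID2LABEL):
    """Hypothetical helper: one row of logits -> {'label', 'score'} dict."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Made-up logits for two inputs; a real model would produce these.
example_logits = [[-2.0, 3.0], [1.5, -0.5]]
for row in example_logits:
    print(postprocess(row))
```

Dropping down to AutoTokenizer + AutoModel means writing this step yourself, which is also where you would change it (for example, returning all class probabilities instead of only the top one).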
Try pipeline('ner') (named entity recognition). Run it on 'Elon Musk founded SpaceX in 2002 in El Segundo.' Print each entity and its label. How does HuggingFace know which model to download?
Try pipeline('zero-shot-classification') on a sentence with candidate_labels=['technology', 'sports', 'politics']. This works with no fine-tuning: the model has never seen your labels.
Tokenize 'Transformers are great' and print both the token IDs and the decoded tokens (call tokenizer.convert_ids_to_tokens(ids)). Notice that 'Transformers' may be split into subword pieces.
Load AutoModel.from_pretrained('bert-base-uncased') (base BERT, not fine-tuned). Extract the [CLS] embedding for two sentences. Compute cosine similarity between them using torch.nn.functional.cosine_similarity. Semantically similar sentences should score higher.

Use these three in order. Each builds on the one before.
In one paragraph, explain what `pipeline('sentiment-analysis')` is doing under the hood: what model gets downloaded, what does the tokenizer do, what does the model output, and how does HuggingFace map logits to a label+score dict?
Walk me through the three levels of the HuggingFace API (pipeline, AutoTokenizer+AutoModel, Trainer). For each: what it abstracts away, what it exposes, and when you'd drop down to the next layer.
I'm building a production sentiment analysis service that needs to handle 1000 requests/sec, with p95 latency < 50ms, on a server with 2 NVIDIA A10G GPUs. Walk me through the choices: model size (DistilBERT vs BERT-base vs RoBERTa-large), batching strategy, quantization (int8 with ONNX Runtime or bitsandbytes), and serving framework (TorchServe, Triton, custom FastAPI). What's the expected throughput for each combination?
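Before asking about specific models, it can help to work out the budget the numbers in the last prompt imply. The sketch below is back-of-envelope arithmetic only, not a benchmark: it assumes load is split evenly across GPUs, that each GPU collects requests for a fixed batching window before running them, and that network and queueing overhead are ignored. The function name and the 10 ms window are illustrative assumptions.

```python
def batching_budget(total_rps, num_gpus, p95_budget_ms, batch_window_ms):
    """Hypothetical helper: rough per-GPU batching numbers for a latency target."""
    # Even split of incoming traffic across GPUs (assumption).
    per_gpu_rps = total_rps / num_gpus
    # Average number of requests that arrive during one batching window.
    avg_batch_size = per_gpu_rps * batch_window_ms / 1000.0
    # A request may wait the whole window, so inference gets what is left.
    inference_budget_ms = p95_budget_ms - batch_window_ms
    return {
        "per_gpu_rps": per_gpu_rps,
        "avg_batch_size": avg_batch_size,
        "inference_budget_ms": inference_budget_ms,
    }


# 1000 requests/sec, 2 GPUs, p95 < 50 ms, assumed 10 ms batching window.
budget = batching_budget(1000, 2, 50, 10)
print(budget)
```

Under these assumptions each GPU sees 500 requests/sec, batches average about 5 requests, and a batch has at most 40 ms to finish. With one batch in flight per GPU, it also has to finish within the 10 ms window on average, or the queue grows. Whether a given model meets that is something to measure, not assume.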
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
import torch

# Layer 1: pipeline, the fastest path from text to prediction
classifier = pipeline("sentiment-analysis")  # defaults to a DistilBERT SST-2 checkpoint (~250MB, downloaded once)
results = classifier([
    "This course is fantastic!",
    "I don't understand half of it.",
    "It's fine, I guess.",
])
for r in results:
    print(f"{r['label']:8s} {r['score']:.3f}")

# Layer 2, step 1: AutoTokenizer converts text to token IDs
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
encoded = tokenizer(["Great course!", "Not great."], padding=True, return_tensors="pt")
print("\nToken IDs shape:", encoded["input_ids"].shape)  # (2, max_len)
print("Decoded[0]:", tokenizer.decode(encoded["input_ids"][0]))

# Layer 2, step 2: the model returns raw logits for full control
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)
with torch.no_grad():  # inference only, no gradients needed
    logits = model(**encoded).logits
probs = logits.softmax(dim=-1)
print("\nProbabilities (neg/pos):", probs.numpy().round(3))

python3 main.py
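The demo passes padding=True so that sentences of different lengths fit into one (2, max_len) tensor. A minimal sketch of what that padding amounts to, in plain Python: shorter sequences are right-padded with the pad id, and an attention mask marks real tokens with 1 and padding with 0. The pad_batch helper is hypothetical. 101 and 102 are the [CLS] and [SEP] ids in the uncased BERT vocabulary, while the ids between them are placeholders rather than real tokenizer output.

```python
PAD_ID = 0  # [PAD] id in the uncased BERT/DistilBERT vocabularies


def pad_batch(sequences, pad_id=PAD_ID):
    """Hypothetical helper: right-pad id lists and build the attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [
        [1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Placeholder ids: only 101 ([CLS]) and 102 ([SEP]) are real vocabulary ids.
batch = pad_batch([
    [101, 2001, 2002, 102],
    [101, 2003, 2004, 2005, 2006, 102],
])
for ids, mask in zip(batch["input_ids"], batch["attention_mask"]):
    print(ids, mask)
```

The model uses the attention mask to ignore padded positions, which is why the demo unpacks the whole encoded dict with model(**encoded) instead of passing input_ids alone.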