Mastering HuggingFace Models - Comprehensive Guide (Full Tutorial)
Author: AYI-NEDJIMI | AI & Cybersecurity Consultant
This tutorial covers everything you need to know about HuggingFace models: searching, loading, inference, text generation, classification, computer vision, audio, embeddings, Inference API, uploading, and best practices.
1. Browse, Search, and Filter Models
The HuggingFace Hub hosts over 500,000 models. Knowing how to find them efficiently is essential.
1.1 Main Filters
| Filter | Description | Examples |
|---|---|---|
| pipeline_tag | Model task | text-generation, text-classification, image-classification |
| language | Supported language | en, fr, ar, zh |
| license | License | mit, apache-2.0, cc-by-4.0 |
| library | Framework | pytorch, tensorflow, jax, onnx |
| model_name | Search by name | llama, mistral, gpt |
1.2 Programmatic Search
from huggingface_hub import HfApi

api = HfApi()

# Text generation models, sorted by popularity
models = api.list_models(
    filter="text-generation",
    sort="downloads",
    direction=-1,
    limit=20
)
for m in models:
    print(f"{m.id:50s} | {m.downloads:>12,} DL | {m.likes:>6} likes")

# Search by author
meta_models = api.list_models(author="meta-llama", limit=10)
for m in meta_models:
    print(f"  {m.id}")

# Text search
results = api.list_models(search="cybersecurity", limit=10)
for m in results:
    print(f"  {m.id}")
For a detailed comparison of the best open-source LLMs, check: Open-Source LLM Comparison 2026
2. Load Any Model with AutoModel and Pipeline
2.1 The Pipeline Method (Recommended for Beginners)
from transformers import pipeline

# Pipeline auto-detects the model and tokenizer
classifier = pipeline("sentiment-analysis")
result = classifier("I love HuggingFace, it's amazing!")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9998}]

# Specify a particular model
classifier = pipeline(
    "text-classification",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
    device=0  # GPU 0 (or -1 for CPU)
)
results = classifier(["Excellent product!", "Very disappointed, poor quality."])
for r in results:
    print(f"  {r['label']} ({r['score']:.4f})")
2.2 The AutoModel Method (More Control)
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Manual tokenization
inputs = tokenizer("This restaurant is fantastic!", return_tensors="pt", padding=True, truncation=True)

# Inference
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1)
print(f"Predicted class: {predicted_class.item() + 1} stars")
print(f"Probabilities: {predictions[0].tolist()}")
2.3 Available Auto Classes
| Class | Task |
|---|---|
| AutoModelForCausalLM | Text generation (GPT, Llama) |
| AutoModelForSeq2SeqLM | Translation, summarization (T5, BART) |
| AutoModelForSequenceClassification | Text classification |
| AutoModelForTokenClassification | NER, POS tagging |
| AutoModelForQuestionAnswering | Question answering |
| AutoModelForImageClassification | Image classification |
| AutoModelForObjectDetection | Object detection |
| AutoModelForSpeechSeq2Seq | Speech recognition |
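To make the task-to-class pairing concrete, here is a small lookup helper. The mapping is a convenience sketch of my own for this tutorial, not part of the transformers API:

```python
# Illustrative task -> Auto class lookup (a convenience sketch, not a transformers API)
TASK_TO_AUTOCLASS = {
    "text-generation": "AutoModelForCausalLM",
    "translation": "AutoModelForSeq2SeqLM",
    "summarization": "AutoModelForSeq2SeqLM",
    "text-classification": "AutoModelForSequenceClassification",
    "ner": "AutoModelForTokenClassification",
    "question-answering": "AutoModelForQuestionAnswering",
    "image-classification": "AutoModelForImageClassification",
    "object-detection": "AutoModelForObjectDetection",
    "automatic-speech-recognition": "AutoModelForSpeechSeq2Seq",
}

def autoclass_for(task: str) -> str:
    """Return the Auto class name for a pipeline task string."""
    if task not in TASK_TO_AUTOCLASS:
        raise ValueError(f"No Auto class mapped for task: {task!r}")
    return TASK_TO_AUTOCLASS[task]

print(autoclass_for("summarization"))  # AutoModelForSeq2SeqLM
```

Note that several tasks share one class: translation and summarization are both sequence-to-sequence, so both map to AutoModelForSeq2SeqLM.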
3. Text Generation (LLMs)
3.1 Popular Models
The main LLMs available on the Hub:
- Meta Llama 3.1 (8B, 70B, 405B) - among the strongest open-weight model families
- Mistral / Mixtral - excellent quality-to-size ratio
- Qwen 2.5 - strong multilingual performance
- Google Gemma 2 - compact and efficient
- Microsoft Phi-3 - small but capable
3.2 Generation with Pipeline
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="gpt2",
    device=-1  # CPU
)

# Simple generation
output = generator(
    "The future of cybersecurity involves",
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    num_return_sequences=1
)
print(output[0]['generated_text'])
3.3 Generation with AutoModelForCausalLM
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Advanced generation parameters
input_text = "Cybersecurity in the age of AI"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    temperature=0.8,         # creativity (lower = more deterministic)
    top_k=50,                # top-k sampling
    top_p=0.95,              # nucleus sampling
    repetition_penalty=1.2,  # penalize repeated tokens
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
3.4 Chat Templates (Conversational Models)
from transformers import AutoTokenizer

# Modern instruct models ship a chat template with their tokenizer.
# Note: meta-llama checkpoints are gated; request access on the Hub and log in first.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a cybersecurity expert."},
    {"role": "user", "content": "What is a zero-day vulnerability?"}
]

# Apply the chat template
formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(formatted)
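The template itself lives in the model's tokenizer_config.json, which is why you should always call apply_chat_template rather than format prompts by hand. Purely to show what such a template produces, here is a hand-rolled sketch of a Llama-3-style format; treat it as a teaching approximation, not the model's exact template:

```python
# Hand-rolled Llama-3-style chat formatting, purely illustrative:
# the real template ships with each model's tokenizer and
# tokenizer.apply_chat_template() should always be preferred in practice.
def render_llama3_chat(messages, add_generation_prompt=True):
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n{msg['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Cue the model to produce the assistant's turn next
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a cybersecurity expert."},
    {"role": "user", "content": "What is a zero-day vulnerability?"},
]
print(render_llama3_chat(messages))
```

Each turn is wrapped in role headers and terminated by an end-of-turn token, and the generation prompt leaves the assistant header open so the model continues from there.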
4. Text Classification
from transformers import pipeline

# Sentiment analysis
sentiment = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment-latest")
results = sentiment([
    "I love this new AI tool!",
    "This is terrible, worst experience ever.",
    "The weather is okay today."
])
for r in results:
    print(f"  {r['label']:10s} ({r['score']:.4f})")

# Zero-shot classification (no task-specific training needed)
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "The new security update fixes a critical vulnerability in the authentication system",
    candidate_labels=["cybersecurity", "development", "marketing", "finance"]
)
for label, score in zip(result['labels'], result['scores']):
    print(f"  {label:20s}: {score:.4f}")
5. Named Entity Recognition (NER)
from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
text = "Elon Musk visited the HuggingFace headquarters in New York on January 15, 2026."
entities = ner(text)
for entity in entities:
    print(f"  {entity['entity_group']:5s} | {entity['word']:25s} | score: {entity['score']:.4f}")
6. Text Summarization
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = (
    "HuggingFace is a technology company that develops tools for building applications "
    "using machine learning. The company is most notable for its Transformers library "
    "built for natural language processing applications and its platform that allows "
    "users to share machine learning models and datasets. Founded in 2016, the company "
    "has grown to become one of the most important players in the AI ecosystem, with "
    "over 500,000 models hosted on its platform. The company provides free hosting for "
    "ML demos through Spaces and offers enterprise solutions for production deployments."
)
summary = summarizer(text, max_length=80, min_length=20, do_sample=False)
print(summary[0]['summary_text'])
7. Computer Vision
7.1 Image Classification
from transformers import pipeline

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
result = classifier("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")
for r in result[:3]:
    print(f"  {r['label']:30s}: {r['score']:.4f}")
7.2 Object Detection
from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")
result = detector("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")
for obj in result:
    print(f"  {obj['label']:15s} (score: {obj['score']:.4f}) | box: {obj['box']}")
7.3 Image Segmentation
from transformers import pipeline

segmenter = pipeline("image-segmentation", model="facebook/maskformer-swin-base-coco")
# result = segmenter("image.jpg")
# Returns a list of masks with labels and scores
8. Audio and Speech
8.1 Speech Recognition (ASR)
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
# result = asr("audio_sample.wav")
# print(result['text'])

# Whisper supports 99+ languages
asr_multilingual = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
# result = asr_multilingual("french_audio.wav", generate_kwargs={"language": "french"})
8.2 Text-to-Speech (TTS)
from transformers import pipeline

tts = pipeline("text-to-speech", model="microsoft/speecht5_tts")
# audio = tts("Hello, welcome to HuggingFace!")
# Save with scipy or soundfile
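To actually persist the waveform, write it to a WAV file. The sketch below uses Python's standard wave module, with a synthetic sine wave standing in for the pipeline output; it assumes the result is a dict carrying a float audio array and a sampling rate, which is the shape the text-to-speech pipeline returns:

```python
import wave
import numpy as np

def save_tts_output(tts_result, path):
    """Write {'audio': float array in [-1, 1], 'sampling_rate': int} to a 16-bit mono WAV."""
    audio = np.asarray(tts_result["audio"], dtype=np.float32).squeeze()
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)  # float -> 16-bit PCM
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(tts_result["sampling_rate"])
        wf.writeframes(pcm.tobytes())

# Synthetic stand-in for a real pipeline result: one second of a 440 Hz tone
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
save_tts_output({"audio": 0.3 * np.sin(2 * np.pi * 440 * t), "sampling_rate": sr}, "hello.wav")
```

With a real pipeline, pass its return value directly: save_tts_output(tts("Hello!"), "hello.wav").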
9. Embeddings and Sentence-Transformers
Embeddings are essential for RAG (Retrieval-Augmented Generation) and semantic search.
For more on RAG, check: RAG Guide - Retrieval Augmented Generation
from transformers import AutoTokenizer, AutoModel
import torch

# Load an embedding model
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def get_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling over token embeddings, weighted by the attention mask
    attention_mask = inputs['attention_mask']
    token_embeddings = outputs.last_hidden_state
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    embedding = torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return embedding[0]

# Compute similarity
emb1 = get_embedding("Cybersecurity is important for businesses")
emb2 = get_embedding("IT security protects organizations")
emb3 = get_embedding("I love Italian cooking")

sim_12 = torch.cosine_similarity(emb1.unsqueeze(0), emb2.unsqueeze(0))
sim_13 = torch.cosine_similarity(emb1.unsqueeze(0), emb3.unsqueeze(0))
print(f"Similarity (cybersec/IT security): {sim_12.item():.4f}")  # high
print(f"Similarity (cybersec/cooking): {sim_13.item():.4f}")  # low
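The same cosine measure extends directly to semantic search: embed every document once, then rank documents by similarity to the query embedding. Here is a minimal sketch in plain NumPy, shown with toy 3-d vectors in place of real model embeddings:

```python
import numpy as np

def cosine_sim(a, b):
    """cos(a, b) = a.b / (||a|| * ||b||), the cosine of the angle between the vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query_emb, doc_embs, top_k=3):
    """Return (index, score) pairs for the top_k documents most similar to the query."""
    scores = [(i, cosine_sim(query_emb, d)) for i, d in enumerate(doc_embs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]

# Toy 3-d "embeddings" standing in for real model output
docs = [
    np.array([1.0, 0.0, 0.0]),  # doc 0
    np.array([0.9, 0.1, 0.0]),  # doc 1: close to doc 0
    np.array([0.0, 0.0, 1.0]),  # doc 2: orthogonal to the query
]
query = np.array([1.0, 0.05, 0.0])
for idx, score in semantic_search(query, docs, top_k=2):
    print(f"doc {idx}: {score:.4f}")
```

With real embeddings, you would replace the toy vectors with the output of get_embedding above (or a sentence-transformers encoder) and keep the ranking logic unchanged.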
10. Inference API (Free Tier)
The Inference API lets you use models without downloading them:
import requests

API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": "Bearer hf_your_token"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

result = query({"inputs": "The future of cybersecurity is"})
print(result)
Free Tier Limits
- Rate limiting (a few requests/minute)
- Models loaded on demand (cold start)
- No dedicated GPU
- Timeout on large models
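Because of cold starts, the free endpoint often answers HTTP 503 while a model loads; retrying with exponential backoff smooths this over. A sketch (the URL, token, and retry schedule below are illustrative choices, not prescribed by the API):

```python
import time
import requests

def backoff_delays(retries, base=2.0):
    """Exponential backoff schedule in seconds: base, base*2, base*4, ..."""
    return [base * (2 ** i) for i in range(retries)]

def query_with_retry(api_url, headers, payload, retries=4):
    """POST to the Inference API, retrying on 503 (model still loading)."""
    for delay in backoff_delays(retries):
        response = requests.post(api_url, headers=headers, json=payload, timeout=60)
        if response.status_code != 503:
            response.raise_for_status()  # surface real errors (401, 429, ...)
            return response.json()
        time.sleep(delay)  # model is loading: wait, then retry
    raise TimeoutError(f"Model did not load after {retries} retries")

# Usage (placeholder token):
# result = query_with_retry(
#     "https://api-inference.huggingface.co/models/gpt2",
#     {"Authorization": "Bearer hf_your_token"},
#     {"inputs": "The future of cybersecurity is"},
# )
```

Keeping the delay schedule in its own function makes the retry policy easy to tune and test independently of any network call.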
11. Dedicated Inference Endpoints (Production)
For production, Inference Endpoints offer:
- Dedicated GPU: T4, L4, A10G, A100
- Auto-scaling: 0 to N replicas
- SLA: 99.9% availability
- Security: VPC, authentication
- Monitoring: real-time metrics
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    token="hf_your_token"
)

# Text generation
response = client.text_generation(
    "Explain cybersecurity best practices:",
    max_new_tokens=200,
    temperature=0.7
)
print(response)

# Chat completion
response = client.chat_completion(
    messages=[
        {"role": "system", "content": "You are a cybersecurity expert."},
        {"role": "user", "content": "What is a zero-day vulnerability?"}
    ],
    max_tokens=300
)
print(response.choices[0].message.content)
12. Model Cards Best Practices
A good model card is essential for transparency:
---
language: en
license: mit
pipeline_tag: text-classification
tags:
- cybersecurity
- english
- bert
datasets:
- my-cybersec-dataset
metrics:
- accuracy
- f1
model-index:
- name: my-cybersec-model
  results:
  - task:
      type: text-classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.95
---
# My CyberSec Model
## Description
This model classifies cybersecurity texts into categories.
## Usage
(code examples)
## Training
(dataset details, hyperparameters)
## Limitations
(known biases, edge cases)
13. Upload Your Own Model
from huggingface_hub import HfApi

api = HfApi(token="hf_your_token")

# Method 1: push_to_hub from transformers
# model.push_to_hub("my-username/my-model")
# tokenizer.push_to_hub("my-username/my-model")

# Method 2: upload_folder
api.upload_folder(
    folder_path="./my_model",
    repo_id="my-username/my-model",
    repo_type="model"
)

# Method 3: upload_file
api.upload_file(
    path_or_fileobj="./model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="my-username/my-model",
    repo_type="model"
)
Conclusion
HuggingFace models cover virtually every mainstream AI task. Whether you need text generation, classification, NER, vision, audio, or embeddings, the Hub almost certainly has a model for you. The key is choosing the right one for your task, language, and deployment constraints.
Explore our CyberSec AI collection: CyberSec AI Portfolio