Mastering HuggingFace Models - Comprehensive Guide (Full Tutorial)


Author: AYI-NEDJIMI | AI & Cybersecurity Consultant

This tutorial covers everything you need to know about HuggingFace models: searching, loading, inference, text generation, classification, computer vision, audio, embeddings, Inference API, uploading, and best practices.


1. Browse, Search, and Filter Models

The HuggingFace Hub hosts over 500,000 models. Knowing how to find them efficiently is essential.

1.1 Main Filters

Filter       | Description        | Examples
-------------|--------------------|------------------------------------------------------------
pipeline_tag | Model task         | text-generation, text-classification, image-classification
language     | Supported language | en, fr, ar, zh
license      | License            | mit, apache-2.0, cc-by-4.0
library      | Framework          | pytorch, tensorflow, jax, onnx
model_name   | Search by name     | llama, mistral, gpt

1.2 Programmatic Search

from huggingface_hub import HfApi

api = HfApi()

# Text generation models, sorted by popularity
models = api.list_models(
    filter="text-generation",
    sort="downloads",
    direction=-1,
    limit=20
)

for m in models:
    print(f"{m.id:50s} | {m.downloads:>12,} DL | {m.likes:>6} likes")

# Search by author
meta_models = api.list_models(author="meta-llama", limit=10)
for m in meta_models:
    print(f"  {m.id}")

# Text search
results = api.list_models(search="cybersecurity", limit=10)
for m in results:
    print(f"  {m.id}")

For a detailed comparison of the best open-source LLMs, check: Open-Source LLM Comparison 2026


2. Load Any Model with AutoModel and Pipeline

2.1 The Pipeline Method (Recommended for Beginners)

from transformers import pipeline

# Pipeline auto-detects the model and tokenizer
classifier = pipeline("sentiment-analysis")
result = classifier("I love HuggingFace, it's amazing!")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9998}]

# Specify a particular model
classifier = pipeline(
    "text-classification",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
    device=0  # GPU 0 (or -1 for CPU)
)
results = classifier(["Excellent product!", "Very disappointed, poor quality."])
for r in results:
    print(f"  {r['label']} ({r['score']:.4f})")

2.2 The AutoModel Method (More Control)

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Manual tokenization
inputs = tokenizer("This restaurant is fantastic!", return_tensors="pt", padding=True, truncation=True)

# Inference
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1)

print(f"Predicted class: {predicted_class.item() + 1} stars")
print(f"Probabilities: {predictions[0].tolist()}")

2.3 Available Auto Classes

Class                              | Task
-----------------------------------|---------------------------------------
AutoModelForCausalLM               | Text generation (GPT, Llama)
AutoModelForSeq2SeqLM              | Translation, summarization (T5, BART)
AutoModelForSequenceClassification | Text classification
AutoModelForTokenClassification    | NER, POS tagging
AutoModelForQuestionAnswering      | Question answering
AutoModelForImageClassification    | Image classification
AutoModelForObjectDetection        | Object detection
AutoModelForSpeechSeq2Seq          | Speech recognition

3. Text Generation (LLMs)

3.1 Popular Models

The main LLMs available on the Hub:

  • Meta Llama 3.1 (8B, 70B, 405B) - among the strongest open-source models
  • Mistral / Mixtral - excellent quality-to-size ratio
  • Qwen 2.5 - strong multilingual performance
  • Google Gemma 2 - compact and efficient
  • Microsoft Phi-3 - small but powerful

3.2 Generation with Pipeline

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="gpt2",
    device=-1  # CPU
)

# Simple generation
output = generator(
    "The future of cybersecurity involves",
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    num_return_sequences=1
)
print(output[0]['generated_text'])

3.3 Generation with AutoModelForCausalLM

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Advanced generation parameters
input_text = "Cybersecurity in the age of AI"
inputs = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    temperature=0.8,        # Creativity (lower = closer to greedy, higher = more random)
    top_k=50,               # Top-K sampling
    top_p=0.95,             # Nucleus sampling
    repetition_penalty=1.2, # Repetition penalty
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
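These sampling knobs can be illustrated without a model. A minimal plain-Python sketch of temperature scaling and nucleus (top-p) filtering over a toy distribution (the logit values are made up):

```python
import math

def softmax(logits, temperature=1.0):
    # Dividing logits by temperature < 1 sharpens the distribution
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=0.9):
    # Keep the smallest set of tokens whose cumulative probability >= top_p,
    # then renormalize over the kept tokens
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.5, -1.0]
print(softmax(logits, temperature=0.7))       # sharper than temperature=1.0
print(top_p_filter(softmax(logits), top_p=0.9))
```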

3.4 Chat Templates (Conversational Models)

from transformers import AutoTokenizer

# Modern models use chat templates
# (meta-llama repos are gated: accept the license and log in first)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a cybersecurity expert."},
    {"role": "user", "content": "What is a zero-day vulnerability?"}
]

# Apply the chat template (add_generation_prompt appends the assistant header)
# formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# print(formatted)
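To see what apply_chat_template produces, here is a hand-rolled sketch of the Llama 3 layout (header and eot markers). This is illustrative only: the authoritative template ships inside the tokenizer config, and other model families use different markers.

```python
def format_llama3(messages):
    # Roughly mirrors the Llama 3 chat layout; for real use,
    # always rely on tokenizer.apply_chat_template instead
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n{msg['content']}<|eot_id|>"
        )
    # Leave the assistant header open so the model generates the reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a cybersecurity expert."},
    {"role": "user", "content": "What is a zero-day vulnerability?"},
]
print(format_llama3(messages))
```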

4. Text Classification

from transformers import pipeline

# Sentiment analysis
sentiment = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment-latest")
results = sentiment([
    "I love this new AI tool!",
    "This is terrible, worst experience ever.",
    "The weather is okay today."
])
for r in results:
    print(f"  {r['label']:10s} ({r['score']:.4f})")

# Zero-shot classification (no specific training needed)
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "The new security update fixes a critical vulnerability in the authentication system",
    candidate_labels=["cybersecurity", "development", "marketing", "finance"]
)
for label, score in zip(result['labels'], result['scores']):
    print(f"  {label:20s}: {score:.4f}")

5. Named Entity Recognition (NER)

from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
text = "Elon Musk visited the HuggingFace headquarters in New York on January 15, 2026."
entities = ner(text)

for entity in entities:
    print(f"  {entity['entity_group']:5s} | {entity['word']:25s} | score: {entity['score']:.4f}")

6. Text Summarization

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = (
    "HuggingFace is a technology company that develops tools for building applications "
    "using machine learning. The company is most notable for its Transformers library "
    "built for natural language processing applications and its platform that allows "
    "users to share machine learning models and datasets. Founded in 2016, the company "
    "has grown to become one of the most important players in the AI ecosystem, with "
    "over 500,000 models hosted on its platform. The company provides free hosting for "
    "ML demos through Spaces and offers enterprise solutions for production deployments."
)

summary = summarizer(text, max_length=80, min_length=20, do_sample=False)
print(summary[0]['summary_text'])

7. Computer Vision

7.1 Image Classification

from transformers import pipeline

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
result = classifier("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")
for r in result[:3]:
    print(f"  {r['label']:30s}: {r['score']:.4f}")

7.2 Object Detection

from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")
result = detector("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")
for obj in result:
    print(f"  {obj['label']:15s} (score: {obj['score']:.4f}) | box: {obj['box']}")

7.3 Image Segmentation

from transformers import pipeline

segmenter = pipeline("image-segmentation", model="facebook/maskformer-swin-base-coco")
# result = segmenter("image.jpg")
# Returns masks with labels and scores

8. Audio and Speech

8.1 Speech Recognition (ASR)

from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
# result = asr("audio_sample.wav")
# print(result['text'])

# Whisper supports 99+ languages
asr_multilingual = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
# result = asr_multilingual("french_audio.wav", generate_kwargs={"language": "french"})

8.2 Text-to-Speech (TTS)

from transformers import pipeline

tts = pipeline("text-to-speech", model="microsoft/speecht5_tts")
# audio = tts("Hello, welcome to HuggingFace!")
# Save with scipy or soundfile
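TTS pipelines typically return a float waveform plus a sampling rate. If you don't want the scipy or soundfile dependency, the standard-library wave module can write 16-bit PCM; a sketch where a 440 Hz sine stands in for real model output:

```python
import math
import struct
import wave

sampling_rate = 16000
# Stand-in for the float array a TTS pipeline would return
samples = [math.sin(2 * math.pi * 440 * t / sampling_rate) for t in range(sampling_rate)]

with wave.open("speech.wav", "wb") as f:
    f.setnchannels(1)               # mono
    f.setsampwidth(2)               # 16-bit PCM
    f.setframerate(sampling_rate)
    f.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))

print("Wrote speech.wav (1 second, mono, 16 kHz)")
```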

9. Embeddings and Sentence-Transformers

Embeddings are essential for RAG (Retrieval-Augmented Generation) and semantic search.

For more on RAG, check: RAG Guide - Retrieval Augmented Generation

from transformers import AutoTokenizer, AutoModel
import torch

# Load an embedding model
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def get_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling
    attention_mask = inputs['attention_mask']
    token_embeddings = outputs.last_hidden_state
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    embedding = torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return embedding[0]

# Compute similarity
emb1 = get_embedding("Cybersecurity is important for businesses")
emb2 = get_embedding("IT security protects organizations")
emb3 = get_embedding("I love Italian cooking")

sim_12 = torch.cosine_similarity(emb1.unsqueeze(0), emb2.unsqueeze(0))
sim_13 = torch.cosine_similarity(emb1.unsqueeze(0), emb3.unsqueeze(0))

print(f"Similarity (cybersec/IT security): {sim_12.item():.4f}")  # High
print(f"Similarity (cybersec/cooking):     {sim_13.item():.4f}")  # Low

10. Inference API (Free Tier)

The Inference API lets you use models without downloading them:

import requests

API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": "Bearer hf_your_token"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

result = query({"inputs": "The future of cybersecurity is"})
print(result)

Free Tier Limits

  • Rate limiting (a few requests/minute)
  • Models loaded on demand (cold start)
  • No dedicated GPU
  • Timeout on large models
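Cold starts typically surface as HTTP 503 while the model loads, so production callers should retry with backoff. A small sketch where the transport is injected as a callable so it stays testable; with requests you would pass something like lambda: requests.post(API_URL, headers=headers, json=payload) adapted to return (status, body):

```python
import time

def query_with_retry(send, max_retries=5, base_delay=2.0):
    """Retry on 503 (model loading) with exponential backoff.

    `send` is any zero-argument callable returning (status_code, body),
    e.g. a wrapper around requests.post against the Inference API.
    """
    for attempt in range(max_retries):
        status, body = send()
        if status != 503:
            return body
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Model still loading after {max_retries} attempts")

# Fake transport: 503 twice (cold start), then success
responses = iter([
    (503, {"error": "loading"}),
    (503, {"error": "loading"}),
    (200, [{"generated_text": "..."}]),
])
result = query_with_retry(lambda: next(responses), base_delay=0.01)
print(result)
```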

11. Dedicated Inference Endpoints (Production)

For production, Inference Endpoints offer:

  • Dedicated GPU: T4, L4, A10G, A100
  • Auto-scaling: 0 to N replicas
  • SLA: 99.9% availability
  • Security: VPC, authentication
  • Monitoring: real-time metrics

Example with the InferenceClient:

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    token="hf_your_token"
)

# Text generation
response = client.text_generation(
    "Explain cybersecurity best practices:",
    max_new_tokens=200,
    temperature=0.7
)
print(response)

# Chat completion
response = client.chat_completion(
    messages=[
        {"role": "system", "content": "You are a cybersecurity expert."},
        {"role": "user", "content": "What is a zero-day vulnerability?"}
    ],
    max_tokens=300
)
print(response.choices[0].message.content)

12. Model Cards Best Practices

A good model card is essential for transparency:

---
language: en
license: mit
pipeline_tag: text-classification
tags:
  - cybersecurity
  - english
  - bert
datasets:
  - my-cybersec-dataset
metrics:
  - accuracy
  - f1
model-index:
  - name: my-cybersec-model
    results:
      - task:
          type: text-classification
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.95
---

# My CyberSec Model

## Description
This model classifies cybersecurity texts into categories.

## Usage
(code examples)

## Training
(dataset details, hyperparameters)

## Limitations
(known biases, edge cases)

13. Upload Your Own Model

from huggingface_hub import HfApi

api = HfApi(token="hf_your_token")

# Method 1: push_to_hub from transformers
# model.push_to_hub("my-username/my-model")
# tokenizer.push_to_hub("my-username/my-model")

# Method 2: upload_folder
api.upload_folder(
    folder_path="./my_model",
    repo_id="my-username/my-model",
    repo_type="model"
)

# Method 3: upload_file
api.upload_file(
    path_or_fileobj="./model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="my-username/my-model",
    repo_type="model"
)
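Before calling upload_folder, a quick pre-flight check that the folder contains the usual transformers artifacts can save a broken upload. A hedged sketch (the exact required set depends on your model type; the file names below are the common defaults):

```python
import tempfile
from pathlib import Path

EXPECTED = {"config.json"}
WEIGHT_FILES = {"model.safetensors", "pytorch_model.bin"}

def preflight(folder):
    """Raise if the folder lacks the artifacts a model repo usually needs."""
    present = {p.name for p in Path(folder).iterdir() if p.is_file()}
    missing = EXPECTED - present
    if missing:
        raise FileNotFoundError(f"Missing required files: {sorted(missing)}")
    if not present & WEIGHT_FILES:
        raise FileNotFoundError("No weight file (*.safetensors or pytorch_model.bin)")
    return sorted(present)

# Demo on a throwaway folder; in real use: preflight("./my_model")
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "config.json").write_text("{}")
    (Path(d) / "model.safetensors").write_bytes(b"")
    print(preflight(d))
```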

Conclusion

HuggingFace models cover every imaginable AI task. Whether you need text generation, classification, NER, vision, audio, or embeddings, the Hub has the model for you. The key is choosing the right model based on the task, language, and deployment constraints.

Explore our CyberSec AI collection: CyberSec AI Portfolio


Tutorial written by AYI-NEDJIMI - AI & Cybersecurity Consultant
