YOLOv8n Face Detection - Bengali Video Caption Pipeline

Model Summary

Fine-tuned YOLOv8n for face detection in Bengali video frames.
Part of a Bengali video auto-captioning system that combines:

  • This model → detects visible faces in frames
  • Tmanna/whisper-bengali-final → Bengali speech-to-text
  • Rule-based visibility filter → shows the caption only when a face is on screen

Pipeline Architecture

Video
  ↓
Audio ──→ Whisper (Tmanna/whisper-bengali-final) ──→ Bengali caption text
  ↓
Frames ──→ THIS MODEL (YOLOv8 Face Detection)
  ↓
Visibility Filter (rule-based: len(boxes) > 0)
  ↓
IF face visible → overlay Bengali caption
ELSE            → skip caption for this frame
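The last two steps of the diagram amount to a per-frame join between Whisper's timed caption segments and the visibility filter's output. The sketch below shows one way to do that join in plain Python. It assumes, hypothetically, that captions arrive as sorted, non-overlapping (start_sec, end_sec, text) tuples and that one visibility boolean per frame is already available; neither format is defined by this card, and captions_per_frame is an illustrative name.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

# Hypothetical caption format: (start_sec, end_sec, text), sorted by start_sec
# and non-overlapping. Adapt this from whatever your Whisper wrapper returns.
Segment = Tuple[float, float, str]


def captions_per_frame(
    segments: Sequence[Segment],
    face_visible: Sequence[bool],
    fps: float,
) -> List[Optional[str]]:
    """Return the caption to overlay on each frame, or None to skip it.

    A frame gets a caption only if (a) a speech segment covers the frame's
    timestamp and (b) the visibility filter reported a face on that frame.
    """
    starts = [start for start, _, _ in segments]
    out: List[Optional[str]] = []
    for idx, visible in enumerate(face_visible):
        t = idx / fps  # timestamp of this frame in seconds
        # Index of the last segment that starts at or before t (-1 if none).
        k = bisect_right(starts, t) - 1
        text: Optional[str] = None
        if k >= 0:
            _, end, seg_text = segments[k]
            if t < end:  # segment interval is treated as [start, end)
                text = seg_text
        # Rule-based filter: drop the caption when no face is on screen.
        out.append(text if visible else None)
    return out
```

Each list entry lines up with a frame index, so a renderer can draw entry i on frame i and draw nothing where the entry is None.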

Training Details

Parameter        Value
Base model       yolov8n.pt (Ultralytics)
Dataset          lylmsc/wider-face-for-yolo-training
Classes          1 (face)
Epochs           50
Image size       640 × 640
Batch size       32
Hardware         2× NVIDIA T4 (Kaggle)
Optimizer        AdamW + cosine LR decay
Early stopping   patience=10
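The table maps fairly directly onto Ultralytics training arguments. Below is a minimal sketch, assuming the settings above correspond to the epochs, imgsz, batch, device, optimizer, cos_lr and patience arguments of YOLO.train; it is not the author's training script, and the dataset YAML path passed to train is hypothetical.

```python
# Hyperparameters copied from the Training Details table. The mapping to
# Ultralytics argument names is an assumption, not the author's script.
TRAIN_ARGS = {
    "epochs": 50,
    "imgsz": 640,
    "batch": 32,
    "device": [0, 1],      # 2x NVIDIA T4 -> two CUDA devices
    "optimizer": "AdamW",
    "cos_lr": True,        # cosine learning-rate decay
    "patience": 10,        # early stopping after 10 epochs without improvement
}


def train(data_yaml: str):
    """Fine-tune yolov8n.pt on a single-class (face) dataset.

    data_yaml is a hypothetical path to a YOLO dataset config built from
    lylmsc/wider-face-for-yolo-training.
    """
    # Imported lazily so TRAIN_ARGS can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # pretrained base weights
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Calling train("wider_face.yaml") with a suitable dataset config would start a two-GPU run; keeping the hyperparameters in one dict makes it easy to check a run against the table.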

Evaluation Results

Metric          Score
mAP@0.5         0.6994
mAP@0.5:0.95    0.3665
Precision       0.8556
Recall          0.6104
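Two quantities follow directly from the table: the F1 score (the harmonic mean of precision and recall) and the share of ground-truth faces the detector misses. The miss rate matters for the visibility filter, because a missed face on a frame where it is the only face hides a caption that should have been shown. The card does not state the confidence threshold these metrics were measured at, so treat the numbers as indicative.

```python
# Precision and recall as reported in the Evaluation Results table.
precision = 0.8556
recall = 0.6104

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Fraction of ground-truth faces the detector does not recover.
miss_rate = 1 - recall

print(f"F1 = {f1:.3f}, miss rate = {miss_rate:.1%}")
# prints: F1 = 0.712, miss rate = 39.0%
```

On frames with several faces, the filter needs only one detection, so the per-frame rate of wrongly hidden captions is likely lower than the per-face miss rate.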

Usage

from ultralytics import YOLO

model = YOLO("Tmanna/yolov8-face-bengali-video")

# On a video frame (numpy array or image path)
results = model("frame.jpg", conf=0.25)

faces = results[0].boxes
if len(faces) > 0:
    print("Face visible → show Bengali caption")
else:
    print("No face β†’ skip caption")

Visibility Filter (no extra model needed)

def should_show_caption(frame_path, face_model, conf=0.25):
    results = face_model(frame_path, conf=conf, verbose=False)
    return len(results[0].boxes) > 0   # True = show, False = hide
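should_show_caption works on one frame at a time. For a whole clip, Ultralytics models also accept a video path with stream=True, which yields one Results object per frame instead of collecting every result in memory. The sketch below applies the same len(boxes) > 0 rule to every frame. face_visibility is a hypothetical helper name, and it takes the model as an argument so it works with any object that behaves like a YOLO model.

```python
from typing import Any, List


def face_visibility(face_model: Any, source: str, conf: float = 0.25) -> List[bool]:
    """Apply the rule-based visibility filter to every frame of a video.

    face_model is expected to behave like an Ultralytics YOLO model:
    calling it with stream=True yields one Results object per frame.
    """
    visible: List[bool] = []
    # stream=True returns a generator, so frames are processed one at a time.
    for result in face_model(source, stream=True, conf=conf, verbose=False):
        # Same rule as should_show_caption: at least one face box -> show.
        visible.append(len(result.boxes) > 0)
    return visible
```

With the model loaded as in the Usage section, face_visibility(model, "clip.mp4") returns one boolean per frame; dividing a frame's index by the video's frame rate gives its timestamp for matching against Whisper segments.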

What This Model Does NOT Handle

Module                     Status                                       Reason
Bengali STT                ✅ Handled by Tmanna/whisper-bengali-final    Pre-existing model
Translation                ❌ Not needed                                 English→Bengali not required
Speaker Diarization        ⚠️ Optional                                   Not in core pipeline
Face Tracking (DeepSORT)   ⚠️ Optional                                   Not in core pipeline
Active Speaker Detection   🔴 Hard/Optional                              Not in core pipeline