---
title: Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI
emoji: 💊
colorFrom: blue
colorTo: gray
sdk: docker
app_port: 8000
pinned: true
---
> Note: This repo contains only deployment/demo files.
> For full source, notebooks, and complete code, see Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI.
# Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI
This project addresses a real-world computer vision challenge: detecting and localizing defects on medicinal capsules via image classification and segmentation.
The aim is to deliver a complete pipeline (data preprocessing, model training and evaluation, and deployment), demonstrating practical ML engineering from scratch to API.
## Main Repo
This is a minimal clone with only the necessary files from the main repo.
For full source, notebooks, and complete code, see Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI.
## Project Overview
End-to-end defect detection and localization using the Capsule class from the MVTec AD dataset.
Key steps include:
- Data preprocessing, formatting, and augmentation
- Model design (pre-trained backbone + custom heads)
- Training, evaluation, and hyperparameter tuning
- Dockerized FastAPI deployment for inference
This is a portfolio project to showcase the full ML workflow and engineering practice.
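The backbone-plus-heads design above can be sketched in Keras roughly as follows. Layer sizes, head depths, and the class count (5 defect types + 'good') are illustrative assumptions, not the exact trained architecture; a full U-Net would also add encoder skip connections:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(input_shape=(224, 224, 3), num_classes=6, weights="imagenet"):
    """Pre-trained ConvNeXt backbone with a classification head and a
    U-Net-style upsampling segmentation head (illustrative sizes)."""
    backbone = tf.keras.applications.ConvNeXtTiny(
        include_top=False, weights=weights, input_shape=input_shape
    )
    backbone.trainable = False  # freeze for the initial training phase

    inputs = layers.Input(shape=input_shape)
    features = backbone(inputs)  # (7, 7, 768) for a 224x224 input

    # Classification head: defect type, including the 'good' class
    cls = layers.GlobalAveragePooling2D()(features)
    cls = layers.Dense(256, activation="relu")(cls)
    cls_out = layers.Dense(num_classes, activation="softmax", name="cls")(cls)

    # Segmentation head: upsample 7 -> 224 with stride-2 transposed convs
    x = features
    for filters in (256, 128, 64, 32, 16):
        x = layers.Conv2DTranspose(
            filters, 3, strides=2, padding="same", activation="relu"
        )(x)
    seg_out = layers.Conv2D(1, 1, activation="sigmoid", name="seg")(x)

    return tf.keras.Model(inputs, [cls_out, seg_out])
```

The two heads share one backbone forward pass, so classification and segmentation come from a single inference call.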
## Key Results
- Evaluation dataset: MVTec AD 'capsule' class, 70/15/15 train/val/test split
- Quantitative results on the test set:
  - Classification accuracy: 83%
  - Classification accuracy (defect-only): 75%
  - Defect presence accuracy: 91%
  - Segmentation quality (mIoU / Dice): 0.79 / 0.73
  - Segmentation quality, defect-only (mIoU / Dice): 0.70 / 0.55
- Model artifacts:
  - Original model size (.keras / SavedModel): 345 MB
  - Converted TFLite size, raw (.tflite): 119 MB
  - Converted TFLite size, optimized (.tflite): 31 MB (dynamic range quantization applied)
- Container / runtime:
  - Docker image size: 317 MB
  - Runtime used: tflite-runtime + Uvicorn/FastAPI
  - Avg. inference latency (inference only: set tensor + invoke): 239 ms
  - Avg. end-to-end latency (single POST request, measured): 271 ms
  - Avg. memory usage during inference: 321 MB
  - Startup time (local): 72 ms
- Observations:
  - The app returns the expected visualizations and class labels for MVTec-style test images.
  - POST latency was measured locally; expect higher latency in real use due to network delays.
  - Given the small, highly imbalanced dataset (351 samples: 242 'good' and 109 defective across 5 defect types, ~22 per type) and the nature of the samples (the defect is the only distinctive feature, and it is usually small and varied in shape), performance is not as strong as desired, and the results lack the statistical confidence needed for real-world use. Without more data, a meaningful improvement would be difficult.
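For reference, the dynamic range quantization step and the "set tensor + invoke" timing above can be sketched as follows. A tiny stand-in network is used so the snippet is self-contained; the real pipeline converts the trained ConvNeXt+U-Net model instead:

```python
import time
import numpy as np
import tensorflow as tf

# Stand-in model; in the real pipeline this is the trained ConvNeXt+U-Net network.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(1),
])

# Dynamic range quantization: weights are stored as int8, activations stay float32.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

# Time "inference only" the same way as in the results: set tensor + invoke.
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.random.rand(1, 32).astype(np.float32)
start = time.perf_counter()
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
y = interpreter.get_tensor(out["index"])
latency_ms = (time.perf_counter() - start) * 1000
print(f"output shape {y.shape}, latency {latency_ms:.2f} ms")
```

Dynamic range quantization needs no calibration dataset, which makes it the simplest post-training option; here it cut the converted model from 119 MB to 31 MB.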
## Dataset
- Capsule class from MVTec AD dataset
- License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- Dataset folder contains license file
- Usage is strictly non-commercial/educational
## Tech Stack
- Python
- TensorFlow
- Scikit-Learn
- Numpy / Pandas
- OpenCV / Pillow
- Ray Tune (Hyperparameter tuning)
- OmegaConf (Config management)
- Docker, FastAPI, Uvicorn (Deployment)
## Folder Structure

```
data/     # Dataset and annotations
app/      # Inference and deployment code and files
models/   # Saved trained models and training logs
```
## How to Run
Build the image for deployment:

- Requirements:
  - `models/final_model/final_model.tflite` (included)
  - `app/` folder and contents (included)
  - `Dockerfile` (included)
  - `.dockerignore` (included)
- From the project root, build and run the Docker image:

```shell
docker build -t cv-app .
docker run -p 8000:8000 cv-app
```
- Open http://localhost:8000 in your browser to access the demo UI
Note: For the full source code and steps on how to recreate the model, visit the full repo (see "Main Repo" section near the top)
## Citations & References
Backbone architectures:
- EfficientNetV2: EfficientNetV2: Smaller Models and Faster Training (Mingxing Tan, Quoc V. Le. ICML 2021)
- MobileNetV3: Searching for MobileNetV3 (Andrew Howard et al. ICCV 2019)
- ConvNeXt: A ConvNet for the 2020s (Zhuang Liu et al. CVPR 2022)
Output head architectures (not directly implemented, but inspired by):
- FCN: Fully Convolutional Networks for Semantic Segmentation (Jonathan Long, Evan Shelhamer, Trevor Darrell. CVPR 2015)
- U-Net: U-Net: Convolutional Networks for Biomedical Image Segmentation (Olaf Ronneberger, Philipp Fischer, Thomas Brox. MICCAI 2015)
## Contact
For questions, reach out via GitHub (Kev-HL).