metadata
license: apache-2.0
tags:
  - vision-language
  - image-captioning
  - medical-imaging
  - chest-xray
  - blip
  - finetuned
library_name: transformers

Fine-tuned BLIP on Chest X-rays (Indiana University)

This repository contains a BLIP (Bootstrapping Language-Image Pre-training) model fine-tuned on the Chest X-rays (Indiana University) dataset.
The model is adapted for vision–language tasks in medical imaging, particularly chest X-ray understanding.


🧠 Model Description

  • Base model: BLIP (Bootstrapping Language-Image Pre-training)
  • Fine-tuning domain: Medical imaging
  • Modality: Vision–Language (Image + Text)
  • Target data: Chest X-ray images (frontal & lateral views)

Fine-tuning adapts BLIP to the visual patterns of chest radiographs and to the text associated with them, which differ substantially from the natural images and captions used in pretraining.
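
A minimal inference sketch follows, under several assumptions: that the checkpoint loads with the `transformers` captioning classes `BlipProcessor` and `BlipForConditionalGeneration` (inferred only from the `image-captioning` tag above, not confirmed by this card), that PyTorch and Pillow are installed, and that `repo_id` is replaced by this repository's actual identifier. The function name `caption_chest_xray` and the `max_new_tokens` default are illustrative.

```python
def caption_chest_xray(image_path: str, repo_id: str, max_new_tokens: int = 64) -> str:
    """Generate a caption for one chest X-ray image (illustrative sketch)."""
    # Heavy dependencies are imported inside the function so that it can be
    # defined without loading torch or transformers.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Assumption: the repository ships both processor and model files.
    processor = BlipProcessor.from_pretrained(repo_id)
    model = BlipForConditionalGeneration.from_pretrained(repo_id)

    # X-ray PNGs are usually single-channel; convert to the 3-channel
    # RGB input that the BLIP image processor expects.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Calling `caption_chest_xray("xray.png", repo_id)` with a local PNG and this repository's identifier would return the generated text. If the checkpoint was saved with a different head, the model class would need to change accordingly.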


📊 Dataset Information

The model is fine-tuned using the Chest X-rays (Indiana University) dataset.

Dataset Source

Image Preprocessing Pipeline

Original images were provided in raw DICOM format. Each image was converted to PNG with the following preprocessing steps:

  1. Outlier clipping

    • The top and bottom 0.5% of DICOM pixel values were clipped
    • Purpose: eliminate extremely dark or bright pixel outliers
  2. Intensity normalization

    • DICOM pixel values were linearly scaled to the 0–255 range
  3. Resizing

    • Images were resized so that the shorter side is 2048 pixels
    • This was done to comply with Kaggle dataset size limits
  4. View classification

    • Each image was manually classified into:
      • Frontal chest X-ray
      • Lateral chest X-ray
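
The first three steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the dataset author's actual script: the function names `clip_and_scale` and `shorter_side_size` are hypothetical, the pixel array is assumed to be already decoded from DICOM (for example with a DICOM reader), and the resampling itself is left to an image library such as Pillow. Rounding to the nearest integer is also an assumption.

```python
import numpy as np


def clip_and_scale(pixels: np.ndarray, tail_percent: float = 0.5) -> np.ndarray:
    """Clip both intensity tails, then scale linearly to 0-255 (uint8)."""
    pixels = pixels.astype(np.float64)

    # Step 1: clip the bottom and top `tail_percent` percent of pixel values.
    low, high = np.percentile(pixels, [tail_percent, 100.0 - tail_percent])
    clipped = np.clip(pixels, low, high)

    # Step 2: map the clipped range [low, high] linearly onto [0, 255].
    if high == low:  # constant image: avoid division by zero
        return np.zeros(pixels.shape, dtype=np.uint8)
    scaled = (clipped - low) / (high - low) * 255.0
    return np.round(scaled).astype(np.uint8)


def shorter_side_size(width: int, height: int, target: int = 2048) -> tuple[int, int]:
    """Step 3: return (width, height) with the shorter side equal to `target`."""
    if width <= height:
        return target, round(height * target / width)
    return round(width * target / height), target


# Small synthetic example standing in for a decoded 12-bit DICOM pixel array.
rng = np.random.default_rng(0)
raw = rng.integers(0, 4096, size=(64, 48)).astype(np.uint16)
png_ready = clip_and_scale(raw)
print(png_ready.dtype, png_ready.shape)
print(shorter_side_size(2500, 3000))
```

Clipping before scaling keeps a handful of extreme pixels from compressing the useful intensity range into a few gray levels. The fourth step, view classification, was manual and has no code counterpart here.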