MedAI_Processing / docs /REQUEST.md
LiamKhoaLe's picture
Upd local setups with dynamic mode setter
a89888b
# 📑 MedAI Processing – Request Examples
Base URL of the Space:
**`https://binkhoale1812-medai-processing.hf.space`**
This Space processes medical datasets into a centralised fine-tuning format (JSONL + CSV) with optional augmentations such as **paraphrasing**, **back-translation**, **style standardisation**, **de-identification**, and **deduplication**.
---
## 🔹 1. Process HealthCareMagic
```bash
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"augment": {
"paraphrase_ratio": 0.1,
"backtranslate_ratio": 0.05,
"paraphrase_outputs": false,
"style_standardize": true,
"deidentify": true,
"dedupe": true,
"max_chars": 5000
},
"sample_limit": 2000,
"seed": 42
}' \
https://binkhoale1812-medai-processing.hf.space/process/healthcaremagic
````
---
## 🔹 2. Process iCliniq
```bash
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"augment": {
"paraphrase_ratio": 0.2,
"backtranslate_ratio": 0.1,
"paraphrase_outputs": true,
"style_standardize": true,
"deidentify": true,
"dedupe": true,
"max_chars": 5000
},
"sample_limit": 1500,
"seed": 123
}' \
https://binkhoale1812-medai-processing.hf.space/process/icliniq
```
---
## 🔹 3. Process PubMedQA (Labelled)
```bash
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"augment": {
"paraphrase_ratio": 0.05,
"backtranslate_ratio": 0.02,
"paraphrase_outputs": false,
"style_standardize": true,
"deidentify": false,
"dedupe": true,
"max_chars": 8000
},
"sample_limit": 1000,
"seed": 99
}' \
https://binkhoale1812-medai-processing.hf.space/process/pubmedqa_l
```
---
## 🔹 4. Process PubMedQA (Unlabelled)
```bash
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"augment": {
"paraphrase_ratio": 0.05,
"backtranslate_ratio": 0.05,
"paraphrase_outputs": false,
"style_standardize": true,
"deidentify": true,
"dedupe": true,
"max_chars": 7000,
"consistency_check_ratio": 0.01,
"distill_fraction": 0.1
},
"sample_limit": 500,
"seed": 7
}' \
https://binkhoale1812-medai-processing.hf.space/process/pubmedqa_u
```
---
## 🔹 5. Process PubMedQA (Map)
```bash
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"augment": {
"paraphrase_ratio": 0.1,
"backtranslate_ratio": 0.05,
"paraphrase_outputs": true,
"style_standardize": true,
"deidentify": true,
"dedupe": true,
"max_chars": 6000
},
"sample_limit": 1200,
"seed": 2024
}' \
https://binkhoale1812-medai-processing.hf.space/process/pubmedqa_map
```
---
## 🔹 6. Check Current Job Status
```bash
curl https://binkhoale1812-medai-processing.hf.space/status
```
---
## 🔹 7. List Generated Artifacts
```bash
curl https://binkhoale1812-medai-processing.hf.space/files
```
---
# ✅ Notes
* Each run outputs both `.jsonl` and `.csv` in `cache/outputs/` and also uploads them to Google Drive folder ID:
`1JvW7its63E58fLxurH8ZdhxzdpcMrMbt`
* `augment` options can be adjusted per dataset:
* `paraphrase_ratio` – % of rows paraphrased (0–1)
* `backtranslate_ratio` – % of rows back-translated
* `paraphrase_outputs` – whether to also augment model answers
* `style_standardize` – enforce neutral, clinical style
* `deidentify` – redact PHI (emails, phones, URLs, IPs)
* `dedupe` – skip duplicate pairs
* `consistency_check_ratio` – run lightweight QA sanity check
* `distill_fraction` – generate pseudo-labels for unlabelled data