Spaces:
Sleeping
Sleeping
📑 MedAI Processing – Request Examples
Base URL of the Space:https://binkhoale1812-medai-processing.hf.space
This Space processes medical datasets into a centralised fine-tuning format (JSONL + CSV) with optional augmentations such as paraphrasing, back-translation, style standardisation, de-identification, and deduplication.
🔹 1. Process HealthCareMagic
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"augment": {
"paraphrase_ratio": 0.1,
"backtranslate_ratio": 0.05,
"paraphrase_outputs": false,
"style_standardize": true,
"deidentify": true,
"dedupe": true,
"max_chars": 5000
},
"sample_limit": 2000,
"seed": 42
}' \
https://binkhoale1812-medai-processing.hf.space/process/healthcaremagic
🔹 2. Process iCliniq
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"augment": {
"paraphrase_ratio": 0.2,
"backtranslate_ratio": 0.1,
"paraphrase_outputs": true,
"style_standardize": true,
"deidentify": true,
"dedupe": true,
"max_chars": 5000
},
"sample_limit": 1500,
"seed": 123
}' \
https://binkhoale1812-medai-processing.hf.space/process/icliniq
🔹 3. Process PubMedQA (Labelled)
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"augment": {
"paraphrase_ratio": 0.05,
"backtranslate_ratio": 0.02,
"paraphrase_outputs": false,
"style_standardize": true,
"deidentify": false,
"dedupe": true,
"max_chars": 8000
},
"sample_limit": 1000,
"seed": 99
}' \
https://binkhoale1812-medai-processing.hf.space/process/pubmedqa_l
🔹 4. Process PubMedQA (Unlabelled)
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"augment": {
"paraphrase_ratio": 0.05,
"backtranslate_ratio": 0.05,
"paraphrase_outputs": false,
"style_standardize": true,
"deidentify": true,
"dedupe": true,
"max_chars": 7000,
"consistency_check_ratio": 0.01,
"distill_fraction": 0.1
},
"sample_limit": 500,
"seed": 7
}' \
https://binkhoale1812-medai-processing.hf.space/process/pubmedqa_u
🔹 5. Process PubMedQA (Map)
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"augment": {
"paraphrase_ratio": 0.1,
"backtranslate_ratio": 0.05,
"paraphrase_outputs": true,
"style_standardize": true,
"deidentify": true,
"dedupe": true,
"max_chars": 6000
},
"sample_limit": 1200,
"seed": 2024
}' \
https://binkhoale1812-medai-processing.hf.space/process/pubmedqa_map
🔹 6. Check Current Job Status
curl https://binkhoale1812-medai-processing.hf.space/status
🔹 7. List Generated Artifacts
curl https://binkhoale1812-medai-processing.hf.space/files
✅ Notes
Each run outputs both
.jsonland.csvincache/outputs/and also uploads them to Google Drive folder ID:1JvW7its63E58fLxurH8ZdhxzdpcMrMbtaugmentoptions can be adjusted per dataset:paraphrase_ratio– % of rows paraphrased (0–1)backtranslate_ratio– % of rows back-translatedparaphrase_outputs– whether to also augment model answersstyle_standardize– enforce neutral, clinical styledeidentify– redact PHI (emails, phones, URLs, IPs)dedupe– skip duplicate pairsconsistency_check_ratio– run lightweight QA sanity checkdistill_fraction– generate pseudo-labels for unlabelled data