# P0 CRITICAL BUGS - Why DeepCritical Produces Garbage Results
**Date:** November 27, 2025
**Status:** CRITICAL - App is functionally useless
**Severity:** P0 (Blocker)
## TL;DR
The app produces garbage because:
1. **BioRxiv search doesn't work** - returns random papers
2. **Free tier LLM is too dumb** - can't identify drugs
3. **Query construction is naive** - no optimization for PubMed/CT.gov syntax
4. **Loop terminates too early** - 5 iterations isn't enough
---
## P0-001: BioRxiv Search is Fundamentally Broken
**File:** `src/tools/biorxiv.py:248-286`
**The Problem:**
The bioRxiv API **DOES NOT SUPPORT KEYWORD SEARCH**.
The code does this:
```python
# Fetch recent papers (last 90 days, first 100 papers)
url = f"{self.BASE_URL}/{self.server}/{interval}/0/json"
# Then filter client-side for keywords
```
**What Actually Happens:**
1. Fetches the first 100 papers from medRxiv in the last 90 days (chronological order)
2. Filters those 100 random papers for query keywords
3. Returns whatever garbage matches
**Result:** For "Long COVID medications", you get random papers like:
- "Calf muscle structure-function adaptations"
- "Work-Life Balance of Ophthalmologists During COVID"
These papers contain "COVID" somewhere but have NOTHING to do with Long COVID treatments.
**Root Cause:** The `/0/json` endpoint returns only 100 papers per request. You'd need to paginate through ALL papers (thousands) to do proper keyword filtering.
**Fix Options:**
1. **Remove BioRxiv entirely** - It's unusable without proper search API
2. **Use a different preprint aggregator** - Europe PMC has preprints WITH search
3. **Add pagination** - Fetch all papers (slow, expensive)
4. **Use Semantic Scholar API** - Has preprints and proper search
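Option 2 is the least invasive replacement: Europe PMC indexes bioRxiv/medRxiv preprints and supports real server-side keyword search. A minimal sketch against its public REST endpoint (function names are mine; the `SRC:PPR` filter restricts results to preprints):

```python
import json
import urllib.parse
import urllib.request

EUROPE_PMC = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def build_preprint_query(query: str, page_size: int = 25) -> str:
    """Build a Europe PMC search URL; SRC:PPR restricts to preprints."""
    params = urllib.parse.urlencode({
        "query": f"({query}) AND SRC:PPR",
        "format": "json",
        "pageSize": page_size,
    })
    return f"{EUROPE_PMC}?{params}"

def search_preprints(query: str, page_size: int = 25) -> list[dict]:
    """Run the search (network call) and return the raw result records."""
    url = build_preprint_query(query, page_size)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["resultList"]["result"]
```

Unlike the bioRxiv `/0/json` fetch, the query here is evaluated server-side over the whole corpus, so "Long COVID medications" matches on relevance instead of recency.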
---
## P0-002: Free Tier LLM Cannot Perform Drug Identification
**File:** `src/agent_factory/judges.py:153-211`
**The Problem:**
Without an API key, the app uses `HFInferenceJudgeHandler` with:
- Llama 3.1 8B Instruct
- Mistral 7B Instruct
These are **7-8 billion parameter models**. They cannot:
- Reliably parse complex biomedical abstracts
- Identify drug candidates from scientific text
- Generate structured JSON output consistently
- Reason about mechanism of action
**Evidence of Failure:**
```python
# From MockJudgeHandler - the honest fallback when LLM fails
drug_candidates=[
"Drug identification requires AI analysis",
"Enter API key above for full results",
]
```
The team KNEW the free tier can't identify drugs and added this message.
**Root Cause:** Drug repurposing requires understanding:
- Drug mechanisms
- Disease pathophysiology
- Clinical trial phases
- Statistical significance
This requires GPT-4 / Claude Sonnet class models (100B+ parameters).
**Fix Options:**
1. **Require API key** - No free tier, be honest
2. **Use larger HF models** - Llama 70B or Mixtral 8x7B (expensive on free tier)
3. **Hybrid approach** - Use free tier for search, require paid for synthesis
---
## P0-003: PubMed Query Not Optimized
**File:** `src/tools/pubmed.py:54-71`
**The Problem:**
The query is passed directly to PubMed without optimization:
```python
search_params = self._build_params(
db="pubmed",
term=query, # Raw user query!
retmax=max_results,
sort="relevance",
)
```
**What User Enters:** "What medications show promise for Long COVID?"
**What PubMed Receives:** `What medications show promise for Long COVID?`
**What PubMed Should Receive:**
```
("long covid"[Title/Abstract] OR "post-COVID"[Title/Abstract] OR "PASC"[Title/Abstract])
AND (drug[Title/Abstract] OR treatment[Title/Abstract] OR medication[Title/Abstract] OR therapy[Title/Abstract])
AND (clinical trial[Publication Type] OR randomized[Title/Abstract])
```
**Root Cause:** No query preprocessing or medical term expansion.
**Fix Options:**
1. **Add query preprocessor** - Extract medical entities, expand synonyms
2. **Use MeSH terms** - PubMed's controlled vocabulary for better recall
3. **LLM query generation** - Use LLM to generate optimized PubMed query
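Option 1 can start as a small lookup table. A sketch of what could run before `_build_params` — the synonym table is a hardcoded stand-in for real MeSH/UMLS expansion, and the key names are mine:

```python
# Hardcoded stand-in for MeSH/UMLS expansion; keys are the entities a
# preprocessor would extract from the raw user question.
SYNONYMS = {
    "long covid": ["long covid", "post-COVID", "PASC"],
    "medication": ["drug", "treatment", "medication", "therapy"],
}

def expand_clause(terms: list[str]) -> str:
    """OR together Title/Abstract field searches for a synonym set."""
    return "(" + " OR ".join(f'"{t}"[Title/Abstract]' for t in terms) + ")"

def build_pubmed_query(disease_key: str, intent_key: str) -> str:
    """Assemble a fielded PubMed query from extracted entities."""
    return " AND ".join([
        expand_clause(SYNONYMS[disease_key]),
        expand_clause(SYNONYMS[intent_key]),
        "(clinical trial[Publication Type] OR randomized[Title/Abstract])",
    ])
```

Even this toy version produces the fielded query shown above instead of shipping the raw question to ESearch.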
---
## P0-004: Loop Terminates Too Early
**File:** `src/app.py:42-45` and `src/utils/models.py`
**The Problem:**
```python
config = OrchestratorConfig(
max_iterations=5,
max_results_per_tool=10,
)
```
5 iterations is not enough to:
1. Search multiple variations of the query
2. Gather enough evidence for the Judge to synthesize
3. Refine queries based on initial results
**Evidence:** The user's output shows "Max Iterations Reached" with only 6 sources.
**Root Cause:** Conservative defaults to avoid API costs, but makes app useless.
**Fix Options:**
1. **Increase default to 10-15** - More iterations = better results
2. **Dynamic termination** - Stop when confidence > threshold, not iteration count
3. **Parallel query expansion** - Run more queries per iteration
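Option 2, dynamic termination, is a small change to the loop shape. A sketch, assuming the Judge already returns some confidence signal — the `JudgeVerdict` shape and function signatures here are illustrative, not the project's actual models:

```python
from dataclasses import dataclass

@dataclass
class JudgeVerdict:
    confidence: float   # 0.0-1.0, Judge's confidence in the synthesis
    sufficient: bool    # does the Judge think evidence is adequate?

def run_loop(search_fn, judge_fn, max_iterations=15,
             confidence_threshold=0.8):
    """Iterate until the Judge is confident, not until a fixed count."""
    evidence = []
    for i in range(max_iterations):
        evidence.extend(search_fn(i))      # gather more evidence
        verdict = judge_fn(evidence)       # re-judge the growing pool
        if verdict.sufficient and verdict.confidence >= confidence_threshold:
            return evidence, verdict, i + 1
    return evidence, verdict, max_iterations
```

`max_iterations` becomes a safety cap rather than the normal exit path, so raising it to 15 no longer burns API calls on queries that converge early.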
---
## P0-005: No Query Understanding Layer
**Files:** `src/orchestrator.py`, `src/tools/search_handler.py`
**The Problem:**
There's no NLU (Natural Language Understanding) layer. The system:
1. Takes raw user query
2. Passes directly to search tools
3. No entity extraction
4. No intent classification
5. No query expansion
For drug repurposing, you need to extract:
- **Disease:** "Long COVID" → [Long COVID, PASC, Post-COVID syndrome, chronic COVID]
- **Drug intent:** "medications" → [drugs, treatments, therapeutics, interventions]
- **Evidence type:** "show promise" → [clinical trials, efficacy, RCT]
**Root Cause:** No preprocessing pipeline between user input and search execution.
**Fix Options:**
1. **Add entity extraction** - Use BioBERT or PubMedBERT for medical NER
2. **Add query expansion** - Use medical ontologies (UMLS, MeSH)
3. **LLM preprocessing** - Use LLM to generate search strategy before searching
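A rule-based stopgap for the extraction step can ship before any BioBERT integration. The lexicons below are illustrative stand-ins for a real NER model plus a UMLS lookup:

```python
from dataclasses import dataclass, field

# Hypothetical lexicons; a production system would use BioBERT + UMLS.
DISEASE_SYNONYMS = {
    "long covid": ["Long COVID", "PASC", "Post-COVID syndrome", "chronic COVID"],
}
INTENT_SYNONYMS = {
    "medications": ["drugs", "treatments", "therapeutics", "interventions"],
}
EVIDENCE_SYNONYMS = {
    "show promise": ["clinical trials", "efficacy", "RCT"],
}

@dataclass
class SearchStrategy:
    diseases: list = field(default_factory=list)
    drug_terms: list = field(default_factory=list)
    evidence_terms: list = field(default_factory=list)

def understand_query(query: str) -> SearchStrategy:
    """Substring-match known entities and expand them into search terms."""
    q = query.lower()
    strategy = SearchStrategy()
    for key, syns in DISEASE_SYNONYMS.items():
        if key in q:
            strategy.diseases.extend(syns)
    for key, syns in INTENT_SYNONYMS.items():
        if key in q:
            strategy.drug_terms.extend(syns)
    for key, syns in EVIDENCE_SYNONYMS.items():
        if key in q:
            strategy.evidence_terms.extend(syns)
    return strategy
```

The point is the pipeline shape (query → structured strategy → per-tool query builders), not the matching logic, which is trivially swappable for a model later.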
---
## P0-006: ClinicalTrials.gov Results Not Filtered
**File:** `src/tools/clinicaltrials.py`
**The Problem:**
ClinicalTrials.gov returns ALL matching trials including:
- Withdrawn trials
- Terminated trials
- Not yet recruiting
- Observational studies (not interventional)
For drug repurposing, you want:
- Interventional studies
- Phase 2+ (has safety/efficacy data)
- Completed or with results
**Root Cause:** No filtering of trial metadata.
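A client-side filter is enough to start. The field paths below mirror the ClinicalTrials.gov v2 JSON (`protocolSection.designModule` etc.), but that shape is an assumption and should be checked against whatever `src/tools/clinicaltrials.py` already parses:

```python
# Assumed enum values from the ClinicalTrials.gov v2 API.
REPURPOSING_PHASES = {"PHASE2", "PHASE3", "PHASE4"}
USEFUL_STATUSES = {"COMPLETED", "ACTIVE_NOT_RECRUITING", "RECRUITING"}

def is_repurposing_candidate(trial: dict) -> bool:
    """Keep only interventional, Phase 2+ trials that aren't dead ends."""
    section = trial.get("protocolSection", {})
    design = section.get("designModule", {})
    status = section.get("statusModule", {})
    if design.get("studyType") != "INTERVENTIONAL":
        return False  # drop observational studies
    if not REPURPOSING_PHASES.intersection(design.get("phases", [])):
        return False  # drop Phase 1 / unphased trials
    return status.get("overallStatus") in USEFUL_STATUSES
```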
---
## Summary: Why This App Produces Garbage
```
User Query: "What medications show promise for Long COVID?"
        │
        ▼
┌───────────────────────────────────────────────────────────────┐
│                    NO QUERY PREPROCESSING                     │
│  - No entity extraction                                       │
│  - No synonym expansion                                       │
│  - No medical term normalization                              │
└───────────────────────────────────────────────────────────────┘
        │
        ▼
┌───────────────────────────────────────────────────────────────┐
│                      BROKEN SEARCH LAYER                      │
│  - PubMed: Raw query, no MeSH, gets 1 result                  │
│  - BioRxiv: Returns random papers (API doesn't support search)│
│  - ClinicalTrials: Returns all trials, no filtering           │
└───────────────────────────────────────────────────────────────┘
        │
        ▼
┌───────────────────────────────────────────────────────────────┐
│                       GARBAGE EVIDENCE                        │
│  - 6 papers, most irrelevant                                  │
│  - "Calf muscle adaptations" (mentions COVID once)            │
│  - "Ophthalmologist work-life balance"                        │
└───────────────────────────────────────────────────────────────┘
        │
        ▼
┌───────────────────────────────────────────────────────────────┐
│                    DUMB JUDGE (Free Tier)                     │
│  - Llama 8B can't identify drugs from garbage                 │
│  - JSON parsing fails                                         │
│  - Falls back to "Drug identification requires AI analysis"   │
└───────────────────────────────────────────────────────────────┘
        │
        ▼
┌───────────────────────────────────────────────────────────────┐
│                 LOOP HITS MAX (5 iterations)                  │
│  - Never finds enough good evidence                           │
│  - Never synthesizes anything useful                          │
└───────────────────────────────────────────────────────────────┘
        │
        ▼
  GARBAGE OUTPUT
```
---
## What Would Make This Actually Work
### Minimum Viable Fix (1-2 days)
1. **Remove BioRxiv** - It doesn't work
2. **Require API key** - Be honest that free tier is useless
3. **Add basic query preprocessing** - Strip question words, expand COVID synonyms
4. **Increase iterations to 10**
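Item 3 can literally be a dozen lines. A throwaway sketch — the synonym list is hardcoded on purpose; that's the "minimum viable" part:

```python
import re

# Quick-and-dirty preprocessing; real synonym lists would come from
# MeSH/UMLS rather than being hardcoded.
QUESTION_WORDS = re.compile(r"^(what|which|how|do|does|are|is|can)\b\s*")
LONG_COVID_SYNONYMS = ['"long covid"', '"post-COVID"', '"PASC"']

def quick_preprocess(query: str) -> str:
    """Strip leading question words/punctuation; OR-expand Long COVID."""
    q = QUESTION_WORDS.sub("", query.strip().lower()).strip(" ?")
    if "long covid" in q:
        expansion = "(" + " OR ".join(LONG_COVID_SYNONYMS) + ")"
        q = q.replace("long covid", expansion)
    return q
```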
### Proper Fix (1-2 weeks)
1. **Query Understanding Layer**
- Medical NER (BioBERT/SciBERT)
- Query expansion with MeSH/UMLS
- Intent classification (drug discovery vs mechanism vs safety)
2. **Optimized Search**
- PubMed: Proper query syntax with MeSH terms
- ClinicalTrials: Filter by phase, status, intervention type
- Replace BioRxiv with Europe PMC (has preprints + search)
3. **Evidence Ranking**
- Score by publication type (RCT > cohort > case report)
- Score by journal impact factor
- Score by recency
- Score by citation count
4. **Proper LLM Pipeline**
- Use GPT-4 / Claude for synthesis
- Structured extraction of: drug, mechanism, evidence level, effect size
- Multi-step reasoning: identify → validate → rank → synthesize
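The ranking criteria from item 3 combine naturally into a single score. A sketch with illustrative weights — the hierarchy and weightings are mine, not tuned against anything:

```python
from datetime import date

# Illustrative publication-type hierarchy: RCT > cohort > case report.
PUB_TYPE_SCORE = {"rct": 3.0, "cohort": 2.0, "case report": 1.0}

def score_evidence(pub_type: str, year: int, citations: int,
                   impact_factor: float) -> float:
    """Combine publication type, recency, citations, and journal impact
    into one relevance score. Weights are illustrative, not tuned."""
    type_score = PUB_TYPE_SCORE.get(pub_type.lower(), 0.5)
    recency = max(0.0, 1.0 - (date.today().year - year) / 10)
    citation_score = min(citations / 100, 1.0)  # cap runaway classics
    return 2.0 * type_score + recency + citation_score + 0.1 * impact_factor
```

Sorting gathered evidence by this score before it reaches the Judge means the LLM synthesizes from RCTs first instead of from whatever the search layer returned.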
---
## The Hard Truth
Building a drug repurposing agent that works is HARD. The state of the art is:
- **Drug2Disease (IBM)** - Uses knowledge graphs + ML
- **COVID-KG (Stanford)** - Dedicated COVID knowledge graph
- **Literature Mining at scale (PubMed)** - Millions of papers, not 10
This hackathon project is fundamentally a **search wrapper with an LLM prompt**. That's not enough.
To make it useful:
1. Either scope it down (e.g., "find clinical trials for X disease")
2. Or invest serious engineering in the NLU + search + ranking pipeline