	# P0 CRITICAL BUGS - Why DeepCritical Produces Garbage Results

	Date: November 27, 2025
	Status: CRITICAL - App is functionally useless
	Severity: P0 (Blocker)

	## TL;DR

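The app produces garbage because:
1. BioRxiv search doesn't work - returns recent papers regardless of the query
2. Free tier LLM is too dumb - can't identify drugs
3. Query construction is naive - no optimization for PubMed/CT.gov syntax
4. Loop terminates too early - 5 iterations isn't enough
5. No query understanding layer - no entity extraction, intent classification, or synonym expansion
6. ClinicalTrials.gov results aren't filtered - withdrawn, terminated, and observational trials get through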

	---

	## P0-001: BioRxiv Search is Fundamentally Broken

	File: `src/tools/biorxiv.py:248-286`

	The Problem:
	The bioRxiv API DOES NOT SUPPORT KEYWORD SEARCH.

	The code does this:
	```python
	# Fetch recent papers (last 90 days, first 100 papers)
	url = f"{self.BASE_URL}/{self.server}/{interval}/0/json"
	# Then filter client-side for keywords
	```

	What Actually Happens:
	1. Fetches the first 100 papers from medRxiv in the last 90 days (chronological order)
2. Filters those 100 papers (selected by date, not by relevance) for query keywords
	3. Returns whatever garbage matches

	Result: For "Long COVID medications", you get random papers like:
	- "Calf muscle structure-function adaptations"
	- "Work-Life Balance of Ophthalmologists During COVID"

	These papers contain "COVID" somewhere but have NOTHING to do with Long COVID treatments.
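The sketch below shows why a mention of "COVID" is enough to pass. It assumes the client-side filter keeps a paper whenever any query word appears as a substring of its title or abstract. The real logic in `biorxiv.py` may differ, and `naive_filter` and the toy records are hypothetical.

```python
def naive_filter(papers: list[dict], query: str) -> list[dict]:
    """Hypothetical any-keyword filter: keep a paper if ANY query word
    appears (as a case-insensitive substring) in its title or abstract."""
    keywords = [word.lower() for word in query.split()]
    kept = []
    for paper in papers:
        text = f"{paper['title']} {paper['abstract']}".lower()
        if any(keyword in text for keyword in keywords):
            kept.append(paper)
    return kept


# Invented records standing in for one date-ordered page of results.
page = [
    {"title": "Toy paper A: clinician burnout during COVID", "abstract": ""},
    {"title": "Toy paper B: muscle adaptation",
     "abstract": "Recruitment paused during COVID lockdowns."},
    {"title": "Toy paper C: influenza vaccine uptake",
     "abstract": "Vaccination rates by region."},
]

for paper in naive_filter(page, "Long COVID medications"):
    print(paper["title"])
```

Papers A and B both pass, even though neither is about Long COVID or any treatment. Nothing checks for the phrase "long covid" or for a drug term.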

Root Cause: The `0` in `/0/json` is the pagination cursor, and each page returns at most 100 papers. Proper keyword filtering would require paginating through ALL papers in the date window (thousands) before filtering.
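Here is a rough sketch of what full pagination costs. It assumes the base URL follows the public bioRxiv/medRxiv `details` pattern `{base}/{server}/{interval}/{cursor}/json`, where the cursor is a record offset that advances in steps of 100. The `"90d"` interval string and the `page_urls` helper are assumptions; the original code only shows `self.BASE_URL` and `{interval}`.

```python
import math

# Assumption: public "details" endpoint; the original only shows self.BASE_URL.
BASE_URL = "https://api.biorxiv.org/details"
PAGE_SIZE = 100  # records returned per cursor page


def page_urls(server: str, interval: str, total: int) -> list[str]:
    """Build one URL per 100-record page by advancing the cursor offset."""
    pages = math.ceil(total / PAGE_SIZE)
    return [
        f"{BASE_URL}/{server}/{interval}/{page * PAGE_SIZE}/json"
        for page in range(pages)
    ]


urls = page_urls("medrxiv", "90d", total=250)
print(len(urls))  # 3
print(urls[-1])
```

A window of, say, 5,000 papers would need 50 sequential requests before any filtering could start. That is the "slow, expensive" cost behind fix option 3.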

	Fix Options:
	1. Remove BioRxiv entirely - It's unusable without proper search API
	2. Use a different preprint aggregator - Europe PMC has preprints WITH search
	3. Add pagination - Fetch all papers (slow, expensive)
	4. Use Semantic Scholar API - Has preprints and proper search
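For option 2, the search happens on the server. The sketch below builds a Europe PMC REST search URL restricted to preprints with the `SRC:PPR` source filter. The endpoint and the `query`/`format`/`pageSize` parameters are Europe PMC's public REST API as we understand it, so check them against the current docs. The helper name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Europe PMC REST search endpoint; SRC:PPR limits results to preprints.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def europe_pmc_preprint_url(query: str, page_size: int = 25) -> str:
    """Build a server-side keyword search URL limited to preprints."""
    params = {
        "query": f"({query}) AND SRC:PPR",
        "format": "json",
        "pageSize": page_size,
    }
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


url = europe_pmc_preprint_url('"long covid" AND treatment')
print(parse_qs(urlsplit(url).query)["query"])
```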

	---

	## P0-002: Free Tier LLM Cannot Perform Drug Identification

	File: `src/agent_factory/judges.py:153-211`

	The Problem:
	Without an API key, the app uses `HFInferenceJudgeHandler` with:
	- Llama 3.1 8B Instruct
	- Mistral 7B Instruct

These are 7-8 billion parameter models. They cannot reliably:
- Parse complex biomedical abstracts
	- Identify drug candidates from scientific text
	- Generate structured JSON output consistently
	- Reason about mechanism of action

	Evidence of Failure:
```python
# From MockJudgeHandler - the honest fallback when LLM fails
drug_candidates=[
    "Drug identification requires AI analysis",
    "Enter API key above for full results",
]
```

	The team KNEW the free tier can't identify drugs and added this message.
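Below is one way the fallback path can get triggered when a small model's JSON is unusable. Everything in this sketch is hypothetical and is not the code in `judges.py`: the function name, the fence-stripping heuristic, and the check for a non-empty `drug_candidates` list of strings.

```python
import json

# Mirrors the MockJudgeHandler placeholder quoted above.
FALLBACK_CANDIDATES = [
    "Drug identification requires AI analysis",
    "Enter API key above for full results",
]


def parse_drug_candidates(raw: str) -> list[str]:
    """Hypothetical parser: accept the model's JSON only if it holds a
    non-empty list of strings under "drug_candidates"; else fall back."""
    text = raw.strip()
    # Small models often wrap JSON in a Markdown fence; strip it if present.
    if text.startswith("```"):
        text = text.strip("`").removeprefix("json").strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return list(FALLBACK_CANDIDATES)
    candidates = data.get("drug_candidates") if isinstance(data, dict) else None
    if (
        isinstance(candidates, list)
        and candidates
        and all(isinstance(c, str) for c in candidates)
    ):
        return candidates
    return list(FALLBACK_CANDIDATES)


print(parse_drug_candidates('{"drug_candidates": ["drug-x"]}'))
print(parse_drug_candidates("Sure! Here are some promising drugs..."))
```

With an 8B model, the second case (chatty prose instead of JSON) is the common one. In that case the user sees the placeholder.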

	Root Cause: Drug repurposing requires understanding:
	- Drug mechanisms
	- Disease pathophysiology
	- Clinical trial phases
	- Statistical significance

This requires frontier models in the GPT-4 / Claude Sonnet class (their exact parameter counts are unpublished, but far beyond 8B).

	Fix Options:
	1. Require API key - No free tier, be honest
	2. Use larger HF models - Llama 70B or Mixtral 8x7B (expensive on free tier)
	3. Hybrid approach - Use free tier for search, require paid for synthesis
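Option 3 boils down to a gating decision made once per session. The sketch below only decides which stages run and calls no model. `Plan` and `plan_for` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Which pipeline stages run for a session (hypothetical config object)."""
    search: bool
    synthesis: bool
    message: str


def plan_for(api_key: Optional[str]) -> Plan:
    """Hybrid policy: search is always free; synthesis needs a paid-model key."""
    if api_key:
        return Plan(search=True, synthesis=True, message="Full analysis enabled.")
    return Plan(
        search=True,
        synthesis=False,
        message="Showing search results only. Enter an API key for drug analysis.",
    )


print(plan_for(None).message)
```

The free tier then returns raw search results honestly, instead of asking a small model to synthesize something it can't.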

	---

	## P0-003: PubMed Query Not Optimized

	File: `src/tools/pubmed.py:54-71`

	The Problem:
	The query is passed directly to PubMed without optimization:
```python
search_params = self._build_params(
    db="pubmed",
    term=query,  # Raw user query!
    retmax=max_results,
    sort="relevance",
)
```

	What User Enters: "What medications show promise for Long COVID?"

	What PubMed Receives: `What medications show promise for Long COVID?`

	What PubMed Should Receive:
	```
	("long covid"[Title/Abstract] OR "post-COVID"[Title/Abstract] OR "PASC"[Title/Abstract])
	AND (drug[Title/Abstract] OR treatment[Title/Abstract] OR medication[Title/Abstract] OR therapy[Title/Abstract])
	AND (clinical trial[Publication Type] OR randomized[Title/Abstract])
	```

	Root Cause: No query preprocessing or medical term expansion.

	Fix Options:
	1. Add query preprocessor - Extract medical entities, expand synonyms
	2. Use MeSH terms - PubMed's controlled vocabulary for better recall
	3. LLM query generation - Use LLM to generate optimized PubMed query
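A minimal sketch of option 1 follows. It builds the Boolean query above from hand-written synonym lists: each concept becomes an OR group with a `[Title/Abstract]` tag, and the groups are ANDed together. The helper names are hypothetical. The request parameters mirror the `_build_params` call above, sent to NCBI's public E-utilities `esearch` endpoint. Quoting only multi-word or hyphenated terms is a simplification.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def or_group(terms: list[str], field: str = "Title/Abstract") -> str:
    """Join synonyms with OR, quoting multi-word or hyphenated terms."""
    tagged = []
    for term in terms:
        phrase = f'"{term}"' if " " in term or "-" in term else term
        tagged.append(f"{phrase}[{field}]")
    return "(" + " OR ".join(tagged) + ")"


def build_pubmed_query(disease: list[str], intervention: list[str]) -> str:
    """AND together a disease concept group and an intervention concept group."""
    return f"{or_group(disease)} AND {or_group(intervention)}"


query = build_pubmed_query(
    disease=["long covid", "post-COVID", "PASC"],
    intervention=["drug", "treatment", "medication", "therapy"],
)
params = {"db": "pubmed", "term": query, "retmax": 10, "sort": "relevance"}
print(query)
print(f"{ESEARCH_URL}?{urlencode(params)}")
```

A publication-type group such as `(clinical trial[Publication Type] OR randomized[Title/Abstract])` can be ANDed on the same way.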

	---

	## P0-004: Loop Terminates Too Early

	File: `src/app.py:42-45` and `src/utils/models.py`

	The Problem:
```python
config = OrchestratorConfig(
    max_iterations=5,
    max_results_per_tool=10,
)
```

	5 iterations is not enough to:
	1. Search multiple variations of the query
	2. Gather enough evidence for the Judge to synthesize
	3. Refine queries based on initial results

	Evidence: The user's output shows "Max Iterations Reached" with only 6 sources.

Root Cause: Conservative defaults chosen to limit API costs, at the price of making the app useless.

	Fix Options:
1. Increase default to 10-15 - More iterations give the loop more chances to gather and refine evidence
	2. Dynamic termination - Stop when confidence > threshold, not iteration count
	3. Parallel query expansion - Run more queries per iteration
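Option 2 can be sketched as a loop that stops on confidence and keeps the iteration count only as a safety cap. This assumes each search + judge round can report a confidence in [0, 1], which the current judge may not expose. The `step` callable, the 0.8 threshold, and the function name are assumptions.

```python
from typing import Callable


def run_until_confident(
    step: Callable[[int], float],
    threshold: float = 0.8,
    max_iterations: int = 15,
) -> tuple[int, float]:
    """Call `step` (one search + judge round returning confidence in [0, 1])
    until confidence reaches `threshold` or the iteration cap is hit."""
    confidence = 0.0
    for iteration in range(1, max_iterations + 1):
        confidence = step(iteration)
        if confidence >= threshold:
            return iteration, confidence
    return max_iterations, confidence


fake_confidences = [0.2, 0.5, 0.85, 0.9]  # invented per-round scores
print(run_until_confident(lambda i: fake_confidences[i - 1]))  # (3, 0.85)
```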

	---

	## P0-005: No Query Understanding Layer

	Files: `src/orchestrator.py`, `src/tools/search_handler.py`

	The Problem:
There's no NLU (Natural Language Understanding) layer. The system takes the raw user query and passes it directly to the search tools, with:
1. No entity extraction
2. No intent classification
3. No query expansion

	For drug repurposing, you need to extract:
	- Disease: "Long COVID" → [Long COVID, PASC, Post-COVID syndrome, chronic COVID]
	- Drug intent: "medications" → [drugs, treatments, therapeutics, interventions]
	- Evidence type: "show promise" → [clinical trials, efficacy, RCT]
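Here is a toy version of the mapping above. A hand-written lookup table stands in for the BioBERT/PubMedBERT NER and UMLS/MeSH expansion named in the fix options; it is a placeholder, not an implementation of them. `CONCEPTS` and `extract_concepts` are hypothetical.

```python
# Hypothetical synonym table standing in for UMLS/MeSH lookups.
CONCEPTS = {
    "disease": {
        "long covid": ["Long COVID", "PASC", "Post-COVID syndrome", "chronic COVID"],
    },
    "drug_intent": {
        "medications": ["drugs", "treatments", "therapeutics", "interventions"],
    },
    "evidence_type": {
        "show promise": ["clinical trials", "efficacy", "RCT"],
    },
}


def extract_concepts(query: str) -> dict[str, list[str]]:
    """Return, per slot, the expansions of every trigger phrase in the query."""
    text = query.lower()
    found: dict[str, list[str]] = {}
    for slot, triggers in CONCEPTS.items():
        for trigger, expansions in triggers.items():
            if trigger in text:
                found.setdefault(slot, []).extend(expansions)
    return found


print(extract_concepts("What medications show promise for Long COVID?"))
```

Each slot's list can then feed one OR group of a PubMed query.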

	Root Cause: No preprocessing pipeline between user input and search execution.

	Fix Options:
	1. Add entity extraction - Use BioBERT or PubMedBERT for medical NER
	2. Add query expansion - Use medical ontologies (UMLS, MeSH)
	3. LLM preprocessing - Use LLM to generate search strategy before searching

	---

	## P0-006: ClinicalTrials.gov Results Not Filtered

	File: `src/tools/clinicaltrials.py`

	The Problem:
	ClinicalTrials.gov returns ALL matching trials including:
	- Withdrawn trials
	- Terminated trials
- Not-yet-recruiting trials
	- Observational studies (not interventional)

	For drug repurposing, you want:
	- Interventional studies
	- Phase 2+ (has safety/efficacy data)
	- Completed or with results

	Root Cause: No filtering of trial metadata.
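A client-side filter for the criteria above could look like the sketch below. It assumes records follow the ClinicalTrials.gov API v2 JSON layout (`protocolSection.statusModule.overallStatus`, `designModule.studyType` and `phases`, top-level `hasResults`) with enum values like `"PHASE2"` and `"INTERVENTIONAL"`. Verify those field names against the API docs before relying on them. `is_useful_trial` and `toy` are hypothetical.

```python
# Assumption: records follow the ClinicalTrials.gov API v2 JSON layout.
EXCLUDED_STATUSES = {"WITHDRAWN", "TERMINATED", "NOT_YET_RECRUITING"}
PHASE_2_PLUS = {"PHASE2", "PHASE3", "PHASE4"}


def is_useful_trial(study: dict) -> bool:
    """Keep interventional, Phase 2+ trials that are completed or have results."""
    protocol = study.get("protocolSection", {})
    status = protocol.get("statusModule", {}).get("overallStatus", "")
    design = protocol.get("designModule", {})
    if status in EXCLUDED_STATUSES:
        return False
    if design.get("studyType") != "INTERVENTIONAL":
        return False
    if not PHASE_2_PLUS.intersection(design.get("phases", [])):
        return False
    return status == "COMPLETED" or bool(study.get("hasResults"))


def toy(status: str, study_type: str, phases: list[str], has_results: bool = False) -> dict:
    """Build a minimal invented record in the assumed v2 shape."""
    return {
        "protocolSection": {
            "statusModule": {"overallStatus": status},
            "designModule": {"studyType": study_type, "phases": phases},
        },
        "hasResults": has_results,
    }


print(is_useful_trial(toy("COMPLETED", "INTERVENTIONAL", ["PHASE2"])))  # True
print(is_useful_trial(toy("WITHDRAWN", "INTERVENTIONAL", ["PHASE3"])))  # False
```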

	---

	## Summary: Why This App Produces Garbage

```
User Query: "What medications show promise for Long COVID?"
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ NO QUERY PREPROCESSING                                        │
│ - No entity extraction                                        │
│ - No synonym expansion                                        │
│ - No medical term normalization                               │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ BROKEN SEARCH LAYER                                           │
│ - PubMed: Raw query, no MeSH, gets 1 result                   │
│ - BioRxiv: Returns random papers (API doesn't support search) │
│ - ClinicalTrials: Returns all trials, no filtering            │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ GARBAGE EVIDENCE                                              │
│ - 6 papers, most irrelevant                                   │
│ - "Calf muscle adaptations" (mentions COVID once)             │
│ - "Ophthalmologist work-life balance"                         │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ DUMB JUDGE (Free Tier)                                        │
│ - Llama 8B can't identify drugs from garbage                  │
│ - JSON parsing fails                                          │
│ - Falls back to "Drug identification requires AI analysis"    │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ LOOP HITS MAX (5 iterations)                                  │
│ - Never finds enough good evidence                            │
│ - Never synthesizes anything useful                           │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
                         GARBAGE OUTPUT
```

	---

	## What Would Make This Actually Work

	### Minimum Viable Fix (1-2 days)

	1. Remove BioRxiv - It doesn't work
	2. Require API key - Be honest that free tier is useless
	3. Add basic query preprocessing - Strip question words, expand COVID synonyms
	4. Increase iterations to 10
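Step 3's "strip question words" can start as small as the sketch below. The stop list is a hypothetical choice of interrogatives and function words. It deliberately leaves "show promise" in place, because that phrase carries evidence-type intent.

```python
import re

# Hypothetical stop list: question scaffolding that adds noise to keyword search.
QUESTION_WORDS = {
    "what", "which", "who", "how", "why", "does", "do", "is", "are",
    "can", "for", "the", "a", "an", "of", "to", "in",
}


def strip_question_words(query: str) -> str:
    """Lowercase, drop punctuation, and remove question/function words."""
    words = re.findall(r"[a-z0-9-]+", query.lower())
    return " ".join(w for w in words if w not in QUESTION_WORDS)


print(strip_question_words("What medications show promise for Long COVID?"))
# medications show promise long covid
```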

	### Proper Fix (1-2 weeks)

1. Query Understanding Layer
   - Medical NER (BioBERT/SciBERT)
   - Query expansion with MeSH/UMLS
   - Intent classification (drug discovery vs mechanism vs safety)

2. Optimized Search
   - PubMed: Proper query syntax with MeSH terms
   - ClinicalTrials: Filter by phase, status, intervention type
   - Replace BioRxiv with Europe PMC (has preprints + search)

3. Evidence Ranking
   - Score by publication type (RCT > cohort > case report)
   - Score by journal impact factor
   - Score by recency
   - Score by citation count

4. Proper LLM Pipeline
   - Use GPT-4 / Claude for synthesis
   - Structured extraction of: drug, mechanism, evidence level, effect size
   - Multi-step reasoning: identify → validate → rank → synthesize
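The evidence-ranking step could begin as a weighted sum like the sketch below. The plan lists the signals but not how to combine them, so every weight here is a hypothetical placeholder to tune. Journal impact factor is left out because it needs an external data source.

```python
from dataclasses import dataclass

# Hypothetical weights reflecting RCT > cohort > case report.
PUB_TYPE_WEIGHT = {"rct": 1.0, "cohort": 0.6, "case_report": 0.2}


@dataclass
class Evidence:
    title: str
    pub_type: str
    year: int
    citations: int


def score(e: Evidence, current_year: int = 2025) -> float:
    """Combine publication type, recency, and citations into one score."""
    type_score = PUB_TYPE_WEIGHT.get(e.pub_type, 0.1)
    recency = max(0.0, 1.0 - (current_year - e.year) / 10)  # linear decay over 10 years
    citation = min(e.citations, 100) / 100  # cap so one classic paper can't dominate
    return 0.6 * type_score + 0.25 * recency + 0.15 * citation


papers = [  # invented records
    Evidence("Toy case report", "case_report", 2025, 5),
    Evidence("Toy RCT", "rct", 2022, 40),
    Evidence("Toy cohort", "cohort", 2024, 10),
]
ranked = sorted(papers, key=score, reverse=True)
print([p.title for p in ranked])
```

Publication type gets the largest weight here, so an older RCT still outranks a brand-new case report.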

	---

	## The Hard Truth

	Building a drug repurposing agent that works is HARD. The state of the art is:

	- Drug2Disease (IBM) - Uses knowledge graphs + ML
	- COVID-KG (Stanford) - Dedicated COVID knowledge graph
	- Literature Mining at scale (PubMed) - Millions of papers, not 10

	This hackathon project is fundamentally a search wrapper with an LLM prompt. That's not enough.

	To make it useful:
	1. Either scope it down (e.g., "find clinical trials for X disease")
	2. Or invest serious engineering in the NLU + search + ranking pipeline