🧬 Hydro Expert LLM: Domain-Specific QLoRA Adapter for Hydrology
A QLoRA fine-tuned adapter for Mistral-7B-Instruct-v0.3, trained on 41,958 hydrology Q&A pairs generated from the abstracts of 29,654 scientific papers. It turns a general-purpose LLM into a hydrology domain expert that gives concise, technically accurate answers, with an average response time about 2.8x lower than the base model (6.7 s vs. 18.5 s).
Key Results
| Metric | Base Mistral-7B | Fine-tuned |
|---|---|---|
| Avg Response Time | 18.5s | 6.7s |
| Answer Style | Generic, textbook | Concise, domain-expert |
| Loss | 2.176 | 1.232 (-43%) |
| Token Accuracy | 59.7% | 70.4% |
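
The headline figures follow directly from the table. The short check below recomputes them; the perplexity lines rest on an assumption the card does not state, namely that the reported loss is a mean per-token cross-entropy in nats.

```python
import math

# Figures copied from the Key Results table.
base_time_s, tuned_time_s = 18.5, 6.7
base_loss, tuned_loss = 2.176, 1.232
base_acc_pct, tuned_acc_pct = 59.7, 70.4

# Ratio of average response times (the "2.8x" claim).
speedup = base_time_s / tuned_time_s

# Relative change in loss, in percent (the "-43%" entry).
loss_change_pct = (tuned_loss - base_loss) / base_loss * 100

# Token accuracy gain in percentage points.
acc_gain_points = tuned_acc_pct - base_acc_pct

# ASSUMPTION: loss is mean cross-entropy in nats, so exp(loss) is perplexity.
base_ppl = math.exp(base_loss)
tuned_ppl = math.exp(tuned_loss)

print(f"speedup: {speedup:.1f}x")
print(f"loss change: {loss_change_pct:.0f}%")
print(f"accuracy gain: {acc_gain_points:.1f} points")
print(f"perplexity (if loss is in nats): {base_ppl:.2f} -> {tuned_ppl:.2f}")
```

Under that assumption, the drop in loss corresponds to perplexity falling from roughly 8.8 to roughly 3.4.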
Quick Start
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization for the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized base model, then attach the adapter weights
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Ejokhan/hydro-expert-llm")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# Wrap the question in the model's chat template
question = "What machine learning models are most effective for streamflow prediction?"
messages = [{"role": "user", "content": question}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=300, temperature=0.3, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
answer = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(answer)
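
The card does not describe how average response time was measured. One possible approach, sketched here, is to time each call with a wall clock and average the results. `average_latency`, `echo_model`, and the sample questions are hypothetical names and data for illustration only; in practice `generate_fn` would wrap the tokenizer and `model.generate` calls from the Quick Start.

```python
import time
from statistics import mean
from typing import Callable, Iterable


def average_latency(generate_fn: Callable[[str], str], questions: Iterable[str]) -> float:
    """Return the mean wall-clock seconds per call of generate_fn."""
    durations = []
    for question in questions:
        start = time.perf_counter()
        generate_fn(question)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def echo_model(question: str) -> str:
    """Stand-in generator so the sketch runs without a GPU (hypothetical)."""
    return f"(answer to: {question})"


# Illustrative questions, not taken from the evaluation set.
questions = [
    "What drives baseflow recession?",
    "How is flood frequency estimated?",
]
latency = average_latency(echo_model, questions)
print(f"{latency:.6f} s per question")
```

Timing the base and fine-tuned models on the same questions with the same `max_new_tokens` keeps the comparison fair, since shorter answers finish sooner.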
Training Details
| Parameter | Value |
|---|---|
| Base Model | Mistral-7B-Instruct-v0.3 |
| Method | QLoRA (4-bit NF4 + LoRA) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q, k, v, o, gate, up, down projections |
| Trainable Params | 41.9M / 3.8B (1.1%) |
| Batch Size | 16 effective (4 per step x 4 gradient accumulation steps) |
| Learning Rate | 5e-5 (cosine schedule) |
| Epochs | 3 |
| Precision | bf16 |
| Training Time | 4h 10m on A100 40GB |
| Adapter Size | 87.6 MB |
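
The trainable parameter count can be reproduced from the LoRA rank and the shapes of the targeted projections. The sketch below assumes the layer dimensions of the public Mistral-7B-v0.3 configuration (hidden size 4096, key/value width 1024, MLP width 14336, 32 layers, vocabulary 32768); none of these appear in the card. The last part is a guess at why the denominator reads 3.8B rather than about 7.2B: 4-bit storage packs two weights per byte, which would roughly halve the reported count for quantized layers.

```python
# ASSUMED Mistral-7B-v0.3 dimensions (from the public model config, not this card).
HIDDEN = 4096
KV_DIM = 1024          # 8 key/value heads x 128 head dim
INTERMEDIATE = 14336
NUM_LAYERS = 32
VOCAB = 32768

# From the Training Details table.
RANK = 16
ALPHA = 32

# (in_features, out_features) of each targeted projection in one decoder layer.
projections = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (rank x in) and B (out x rank) for every adapted matrix.
lora_per_layer = sum(RANK * (n_in + n_out) for n_in, n_out in projections.values())
lora_total = lora_per_layer * NUM_LAYERS

# LoRA updates are scaled by alpha / rank.
scaling = ALPHA / RANK

# ASSUMPTION: quantized linear layers are counted at half size (two 4-bit
# weights per byte); embeddings and the output head stay unquantized.
dense_linear = sum(n_in * n_out for n_in, n_out in projections.values()) * NUM_LAYERS
approx_reported_total = dense_linear // 2 + 2 * VOCAB * HIDDEN + lora_total
trainable_pct = lora_total / approx_reported_total * 100

print(f"LoRA parameters: {lora_total:,}")
print(f"LoRA scaling (alpha / rank): {scaling}")
print(f"approx. reported total: {approx_reported_total / 1e9:.1f}B ({trainable_pct:.1f}% trainable)")
```

With these assumptions the count comes to 41,943,040 LoRA parameters, matching the 41.9M in the table.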
Training Data
41,958 Q&A pairs generated from 29,654 scientific paper abstracts using Qwen2.5-3B on GPU:
| Source | Papers |
|---|---|
| PubMed | 20,292 |
| ArXiv | 8,165 |
| Semantic Scholar | 1,182 |
| EarthArXiv | 62 |
| USGS | 15 |
Topics: streamflow prediction, flood forecasting, water quality monitoring, anomaly detection, hydrological modeling, rainfall-runoff, groundwater, climate science.
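
A quick tally of the source table gives the share of each source and the average number of Q&A pairs per paper. The per-source counts sum to 29,716, which is 62 more than the stated total of 29,654; the card does not explain the difference, so the snippet only reports it.

```python
# Counts copied from the Training Data table.
papers_by_source = {
    "PubMed": 20292,
    "ArXiv": 8165,
    "Semantic Scholar": 1182,
    "EarthArXiv": 62,
    "USGS": 15,
}
reported_papers = 29654
qa_pairs = 41958

source_total = sum(papers_by_source.values())
difference = source_total - reported_papers

# Share of each source, relative to the per-source sum.
shares = {name: count / source_total for name, count in papers_by_source.items()}

# Average Q&A pairs generated per paper, using the stated paper total.
pairs_per_paper = qa_pairs / reported_papers

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"sum of sources: {source_total:,} (stated total: {reported_papers:,}, difference: {difference})")
print(f"Q&A pairs per paper: {pairs_per_paper:.2f}")
```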
Example Comparison
Q: "Compare random forest and gradient boosting for water quality classification"
Base: "Random Forest and Gradient Boosting are two popular machine learning algorithms used for classification problems... Each decision tree in the forest..." (generic textbook)
Fine-tuned: "Random forest performed better than gradient boosting, achieving accuracy of 98.8%." (specific, concise, cites metrics)
Related Projects
- HydroRAG: RAG system benchmarking 15 retrieval configurations over 8,618 papers (live demo)
- Hydro Expert LLM: this project
- HydroAgent (planned): agentic AI for autonomous hydrological analysis
Infrastructure
Trained on TACC Lonestar6 (NVIDIA A100 40GB) through the NSF NAIRR Pilot program.
Author
Ijaz Ul Haq, Ph.D., University of Vermont
License
MIT