Changelog
All notable changes to Thought-Retriever will be documented in this file.
[2.0.0] - 2025-05-19
Added
🇨🇳 Chinese Embedding Engine: jieba tokenization + TF-IDF embedder (_JiebaTfidfEmbedder)
- Auto-detected as priority backend after sentence-transformers
- 3-5x better semantic similarity for Chinese text vs character n-gram TF-IDF
- Fully offline, no model download required
- Vocabulary caching for consistent embedding dimensions
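The vocabulary caching described above can be illustrated with a minimal sketch. It is not the package's _JiebaTfidfEmbedder: the class name CachedTfidfEmbedder, the smoothed IDF formula, and the whitespace fallback tokenizer are all assumptions made for illustration. The point it shows is that freezing the vocabulary on the first encode call keeps every later vector the same length.

```python
import math
from collections import Counter

try:
    import jieba  # Chinese word segmentation, used when installed

    def tokenize(text):
        return [t for t in jieba.lcut(text) if t.strip()]
except ImportError:
    def tokenize(text):
        # Fallback for this sketch only: plain whitespace tokens.
        return text.split()


class CachedTfidfEmbedder:
    """Hypothetical stand-in for a jieba + TF-IDF embedder (illustration only)."""

    def __init__(self, tokenizer=tokenize):
        self.tokenizer = tokenizer
        self.vocab = None  # token -> column index, frozen after the first encode
        self.idf = None    # token -> inverse document frequency

    def _fit(self, docs):
        doc_freq = Counter()
        for tokens in docs:
            doc_freq.update(set(tokens))
        self.vocab = {tok: i for i, tok in enumerate(sorted(doc_freq))}
        n = len(docs)
        # Smoothed IDF (an assumed formula, chosen so weights stay positive).
        self.idf = {tok: math.log((1 + n) / (1 + df)) + 1.0
                    for tok, df in doc_freq.items()}

    def encode(self, texts):
        docs = [self.tokenizer(t) for t in texts]
        if self.vocab is None:  # build the vocabulary once, then reuse it
            self._fit(docs)
        vectors = []
        for tokens in docs:
            vec = [0.0] * len(self.vocab)
            for tok, count in Counter(tokens).items():
                idx = self.vocab.get(tok)
                if idx is not None:  # out-of-vocabulary tokens are ignored
                    vec[idx] = count * self.idf[tok]
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors
```

Because the vocabulary is fixed after the first call, tokens first seen later are dropped rather than added as new dimensions. That trades some recall for vectors that stay comparable across calls.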
🇨🇳 Chinese Prompt Templates: thought_confidence_prompt_zh() and answer_generation_prompt_zh()
- Auto language detection based on Chinese character ratio
- Configurable via ThoughtConfig(language="zh")
🧹 Smart Message Filtering: should_generate_thought() utility
- Skip short/meaningless messages (e.g., "嗯", "好", "ok")
- 7 regex patterns for filtering noise
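A filter of this kind might look roughly like the sketch below. The release does not list its seven patterns, so the four patterns here, the min_chars threshold, and the function name should_generate_thought_sketch are hypothetical and chosen only to show the shape of the check.

```python
import re

# Hypothetical noise patterns; the actual seven patterns are not listed in this changelog.
_NOISE_PATTERNS = [
    re.compile(r"^\s*$"),                                 # empty or whitespace only
    re.compile(r"^(ok|okay|yes|no|thanks)[.!]*$", re.I),  # short English acknowledgements
    re.compile(r"^[嗯好哦啊]+[。！!.]*$"),                   # short Chinese acknowledgements
    re.compile(r"^[\W_]+$"),                              # punctuation or symbols only
]


def should_generate_thought_sketch(message: str, min_chars: int = 4) -> bool:
    """Return False for messages too short or too generic to yield a useful thought."""
    text = message.strip()
    if len(text) < min_chars:  # assumed length cutoff
        return False
    return not any(p.match(text) for p in _NOISE_PATTERNS)
```

Running the length check before the regex pass keeps the common case (one- or two-character replies) cheap.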
🧹 Thought Content Cleaning: Auto-remove LLM output labels
- Clean [知识点]： ("knowledge point"), [提炼的知识点] ("extracted knowledge point") etc. from thought content
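As a rough illustration of the label cleaning, the sketch below strips the two labels named above from the start of each line. The helper name clean_thought and the exact regular expression are assumptions. The package may match more labels or use a different rule.

```python
import re

# Labels taken from the changelog entry; extend the tuple to cover other labels.
_LABELS = ("知识点", "提炼的知识点")

# Matches "[label]" at the start of a line, with an optional ASCII or full-width colon.
_LABEL_RE = re.compile(
    r"^[ \t]*\[(?:%s)\][ \t]*[:：]?[ \t]*" % "|".join(map(re.escape, _LABELS)),
    re.MULTILINE,
)


def clean_thought(text: str) -> str:
    """Remove known leading labels from each line of an LLM-generated thought."""
    return _LABEL_RE.sub("", text).strip()
```

Matching an explicit label list, rather than any bracketed prefix, avoids deleting legitimate content such as a leading "[1]" citation marker.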
🌐 Language Configuration: ThoughtConfig.language and ThoughtConfig.thought_prompt_lang
- auto: Auto-detect based on Chinese character ratio
- zh: Force Chinese prompts
- en: Force English prompts
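The three modes above could be resolved along the lines of the following sketch. The function names, the CJK range used (U+4E00 to U+9FFF), and the 0.3 threshold are assumptions. The changelog states only that detection is based on the ratio of Chinese characters.

```python
def chinese_ratio(text: str) -> float:
    """Share of non-whitespace characters that fall in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def resolve_prompt_language(text: str, language: str = "auto",
                            threshold: float = 0.3) -> str:
    """Return "zh" or "en"; explicit settings win over auto-detection."""
    if language in ("zh", "en"):
        return language
    # "auto": the threshold is an assumed value, not taken from the package.
    return "zh" if chinese_ratio(text) >= threshold else "en"
```

For example, resolve_prompt_language("你好，世界") selects Chinese prompts, while passing language="en" forces English regardless of the input text.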
🔧 Public API Extensions:
- ThoughtStore.clear_knowledge() public method
- Export ThoughtStore, EmbeddingEngine, generate_id, timestamp_now, chunk_text from package
🛡 Robust Response Parsing: Enhanced _parse_thought_response()
- Support Chinese "是/否/有效/无效" (yes/no/valid/invalid) in addition to "1/0"
- Fallback: treat multi-line content as valid if ≥5 chars
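The parsing rules above might be sketched as follows. The response layout assumed here (thought text followed by a verdict on the last line) and the name parse_thought_response_sketch are illustrative assumptions. The actual _parse_thought_response() may expect a different format.

```python
_POSITIVE = {"1", "是", "有效"}  # yes / valid
_NEGATIVE = {"0", "否", "无效"}  # no / invalid


def parse_thought_response_sketch(raw: str):
    """Return (thought_text, is_valid) from an LLM reply.

    Assumed layout: thought text, then a verdict token on the last line.
    """
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        return "", False
    verdict = lines[-1].strip(" 。.!！")
    body = "\n".join(lines[:-1])
    if verdict in _POSITIVE:
        return body, bool(body)  # a verdict without any thought text is not useful
    if verdict in _NEGATIVE:
        return body, False
    # Fallback: no recognizable verdict; accept multi-line content of at least 5 characters.
    if len(lines) > 1:
        content = "\n".join(lines)
        return content, len(content) >= 5
    return lines[0], False
```

Treating unlabeled multi-line replies as valid favors keeping a possibly useful thought over discarding it because the model ignored the output format.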
Changed
- Embedding Backend Priority: sentence-transformers → jieba_tfidf → tfidf → hash
- JiebaTfidfEmbedder: Vocabulary and IDF cached after first encode, reused in subsequent calls
- setup.py: Added jieba>=0.42.1 to core dependencies
- Minimum Python version: 3.8 (unchanged)
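The backend priority listed in this section can be pictured as a first-match probe over installed modules. This is a sketch under assumptions: the function names are hypothetical, and the guess that the plain tfidf backend depends on scikit-learn is not confirmed by the changelog.

```python
import importlib.util


def _available(module_name: str) -> bool:
    """True if a top-level module can be imported, without actually importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_backend(is_available=_available) -> str:
    """Return the first backend, in priority order, whose requirements are installed."""
    candidates = [
        ("sentence-transformers", ("sentence_transformers",)),
        ("jieba_tfidf", ("jieba",)),
        ("tfidf", ("sklearn",)),  # assumption: the tfidf backend needs scikit-learn
        ("hash", ()),             # no dependencies, always usable
    ]
    for name, required in candidates:
        if all(is_available(module) for module in required):
            return name
    return "hash"
```

Passing is_available as a parameter keeps the selection logic testable without installing or uninstalling any packages.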
Fixed
- Fixed inconsistent embedding dimensions caused by jieba_tfidf rebuilding its vocabulary on every call
- Fixed _parse_thought_response failing on Chinese LLM output formats
- Fixed ThoughtMemory.clear() directly accessing private _knowledge attribute
[1.0.0] - 2025-05-15
Added
- Initial release based on TMLR 2026 paper
- 5-step pipeline: Retrieval → Answer → Thought → Merge → Update
- Dual filtering: confidence (ci) + redundancy (si)
- 3-tier embedding fallback: sentence-transformers → TF-IDF → hash
- JSON file persistence
- Trae AI IDE Skill integration
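The dual filtering listed above can be sketched as two gates applied before a thought is stored. This is one plausible reading, not the package's implementation: it assumes the confidence signal is a boolean from the LLM, that redundancy is the maximum cosine similarity to already stored thoughts, and that 0.85 is a reasonable cutoff. The names accept_thought and max_similarity are hypothetical.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def accept_thought(confident: bool, embedding, stored_embeddings,
                   max_similarity: float = 0.85) -> bool:
    """Keep a thought only if it passes both the confidence and the redundancy gate."""
    if not confident:  # confidence filter (ci)
        return False
    # Redundancy score (si): closest match among stored thoughts, 0.0 if none exist.
    s_i = max((cosine(embedding, e) for e in stored_embeddings), default=0.0)
    return s_i < max_similarity  # assumed threshold
```

Checking confidence first skips the similarity scan for thoughts that would be rejected anyway.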