# Changelog

All notable changes to Thought-Retriever will be documented in this file.

## [2.0.0] - 2025-05-19

### Added

- 🇨🇳 **Chinese Embedding Engine**: jieba tokenization + TF-IDF embedder (`_JiebaTfidfEmbedder`)
  - Auto-detected as priority backend after sentence-transformers
  - 3-5x better semantic similarity for Chinese text vs character n-gram TF-IDF
  - Fully offline, no model download required
  - Vocabulary caching for consistent embedding dimensions
- 🇨🇳 **Chinese Prompt Templates**: `thought_confidence_prompt_zh()` and `answer_generation_prompt_zh()`
  - Auto language detection based on Chinese character ratio
  - Configurable via `ThoughtConfig(language="zh")`
- 🧹 **Smart Message Filtering**: `should_generate_thought()` utility
  - Skip short/meaningless messages (e.g., "嗯", "好", "ok")
  - 7 regex patterns for filtering noise
- 🧹 **Thought Content Cleaning**: Auto-remove LLM output labels
  - Clean `[知识点]：`, `[提炼的知识点]` etc. from thought content
- 📊 **Language Configuration**: `ThoughtConfig.language` and `ThoughtConfig.thought_prompt_lang`
  - `auto`: Auto-detect based on Chinese character ratio
  - `zh`: Force Chinese prompts
  - `en`: Force English prompts
- 🔧 **Public API Extensions**:
  - `ThoughtStore.clear_knowledge()` public method
  - Export `ThoughtStore`, `EmbeddingEngine`, `generate_id`, `timestamp_now`, `chunk_text` from package
- 🛡 **Robust Response Parsing**: Enhanced `_parse_thought_response()`
  - Support Chinese "是/否/有效/无效" in addition to "1/0"
  - Fallback: treat multi-line content as valid if ≥5 chars

### Changed

- **Embedding Backend Priority**: sentence-transformers → jieba_tfidf → tfidf → hash
- **JiebaTfidfEmbedder**: Vocabulary and IDF cached after first encode, reused in subsequent calls
- **setup.py**: Added `jieba>=0.42.1` to core dependencies
- **Minimum Python version**: 3.8 (unchanged)

### Fixed

- Fixed embedding dimension inconsistency when jieba_tfidf rebuilds vocabulary each call
- Fixed `_parse_thought_response` failing on Chinese LLM output formats
- Fixed `ThoughtMemory.clear()` directly accessing private `_knowledge` attribute

## [1.0.0] - 2025-05-15

### Added

- Initial release based on TMLR 2026 paper
- 5-step pipeline: Retrieval → Answer → Thought → Merge → Update
- Dual filtering: confidence (ci) + redundancy (si)
- 3-tier embedding fallback: sentence-transformers → TF-IDF → hash
- JSON file persistence
- Trae AI IDE Skill integration
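
For readers unfamiliar with the `auto` language setting introduced in 2.0.0, the idea of detecting language from the Chinese character ratio can be sketched roughly as below. This is only an illustration, not the package's code: the function names `chinese_char_ratio` and `detect_language`, the 0.3 threshold, and the choice to count only the CJK Unified Ideographs block are all assumptions.

```python
def chinese_char_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are CJK Unified Ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # Assumption: only the basic CJK block U+4E00..U+9FFF counts as "Chinese".
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def detect_language(text: str, threshold: float = 0.3) -> str:
    """Return "zh" if the Chinese character ratio reaches the threshold, else "en".

    The threshold value is hypothetical; the changelog does not state one.
    """
    return "zh" if chinese_char_ratio(text) >= threshold else "en"
```

Under these assumptions, a mostly Chinese message would select the `zh` prompts and anything else would fall back to `en`, which matches the behavior the `auto` option describes.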