refactor(json_parser): unify JSON parsing of LLM responses, simplifying the code and improving the parse success rate
216
docs/development/json_parser_unification.md
Normal file
@@ -0,0 +1,216 @@
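# JSON Parsing Unification: Improvement Notes

## Goal

Unify the JSON parsing of every LLM response in the project around the `json_repair` library and a single shared parsing utility, which simplifies the code and raises the parse success rate.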

## New utility module

### `src/utils/json_parser.py`
Provides the shared JSON parsing functions:

#### Main functions:
1. **`extract_and_parse_json(response, *, strict=False)`**
   - Extracts and parses JSON from an LLM response
   - Strips Markdown code-fence markers automatically
   - Repairs malformed JSON with json_repair
   - Supports a strict mode and a tolerant mode (`strict` is keyword-only)

2. **`safe_parse_json(json_str, default=None)`**
   - Parses JSON safely and returns the default value on failure

3. **`extract_json_field(response, field_name, default=None)`**
   - Extracts the value of one field from an LLM response
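
#### Processing strategy:
1. Strip Markdown code-fence markers (```json and ```)
2. Extract the JSON object or array (using a stack-style bracket-matching algorithm)
3. Try to parse it directly
4. If that fails, repair it with json_repair and parse again
5. In tolerant mode, fall back to an empty dict or an empty list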
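
The five steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the module's actual code: `parse_llm_json` and `first_balanced` are hypothetical names, `json` stands in for `orjson`, and the optional `repair` callable is a placeholder for a repair step such as `json_repair.repair_json`.

```python
import json
import re
from typing import Any, Callable, Optional

# Matches a fenced block with or without a "json" language tag.
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.IGNORECASE | re.DOTALL)


def first_balanced(text: str) -> Optional[str]:
    """Return the first balanced {...} or [...] span, or None if there is none."""
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    if not starts:
        return None
    start = min(starts)
    stack = []  # closing characters we still expect, innermost last
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            if not stack or stack.pop() != ch:
                return None  # mismatched closing character
            if not stack:
                return text[start : i + 1]
    return None  # ran out of text before the structure closed


def parse_llm_json(
    response: str,
    repair: Optional[Callable[[str], str]] = None,  # hypothetical repair hook
    strict: bool = False,
) -> Any:
    if not response:
        return None
    fenced = FENCE_RE.search(response)
    body = fenced.group(1).strip() if fenced else response.strip()  # step 1
    candidate = first_balanced(body) or body  # step 2
    try:
        return json.loads(candidate)  # step 3
    except json.JSONDecodeError:
        pass
    if repair is not None:  # step 4
        try:
            return json.loads(repair(candidate))
        except Exception:
            pass
    if strict:
        return None
    return [] if candidate.lstrip().startswith("[") else {}  # step 5
```

Passing the repair step in as a callable keeps the sketch free of third-party imports; in the real module that role is played by json_repair.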

## Files modified

### 1. `src/chat/memory_system/memory_query_planner.py` ✅
- Removed the custom `_extract_json_payload` method
- Replaced the previous parsing logic with `extract_and_parse_json`
- Simplified the code and made it easier to maintain

**Before:**
```python
payload = self._extract_json_payload(response)
if not payload:
    return self._default_plan(query_text)
try:
    data = orjson.loads(payload)
except orjson.JSONDecodeError as exc:
    ...
```

**After:**
```python
data = extract_and_parse_json(response, strict=False)
if not data or not isinstance(data, dict):
    return self._default_plan(query_text)
```

### 2. `src/chat/memory_system/memory_system.py` ✅
- Removed the custom `_extract_json_payload` method
- Switched the `_evaluate_information_value` method to the shared parsing utility
- Simplified the error handling

### 3. `src/chat/interest_system/bot_interest_manager.py` ✅
- Removed the custom `_clean_llm_response` method
- Parses interest-tag data with `extract_and_parse_json`
- Improved error handling and log output

### 4. `src/plugins/built_in/affinity_flow_chatter/chat_stream_impression_tool.py` ✅
- Marked `_clean_llm_json_response` as deprecated
- Parses chat-stream impression data with `extract_and_parse_json`
- Added type checks and error handling

## Files still to be modified

### Other files that need similar changes:
1. `src/plugins/built_in/affinity_flow_chatter/proactive_thinking_executor.py`
   - Contains custom JSON cleanup logic

2. `src/plugins/built_in/affinity_flow_chatter/user_profile_tool.py`
   - Contains custom JSON cleanup logic

3. Any other file that contains custom JSON parsing logic

## Results

### 1. Simpler code
- Eliminated duplicated JSON extraction and cleanup code
- Reduced line count and maintenance cost
- Unified the error-handling pattern

### 2. Higher parse success rate
- json_repair automatically fixes common JSON formatting problems
- Supports several ways of wrapping JSON (code fences, plain text, and so on)
- More tolerant failure handling

### 3. Better maintainability
- JSON parsing logic is managed in one place
- New parsing strategies are easy to add
- Debugging and logging are easier

### 4. Better consistency
- Every LLM response goes through the same parsing flow
- Uniform log output format
- Consistent error handling

## Usage examples

### Basic usage:
```python
from src.utils.json_parser import extract_and_parse_json

# The LLM response may contain a Markdown code fence or other text
llm_response = '```json\n{"key": "value"}\n```'

# Extract and parse automatically
data = extract_and_parse_json(llm_response, strict=False)
# Returns: {'key': 'value'}

# If parsing fails, tolerant mode (strict=False) returns an empty dict or an empty list
# Strict mode returns None; an empty response returns None in either mode
```

### Extracting a specific field:
```python
from src.utils.json_parser import extract_json_field

llm_response = '{"score": 0.85, "reason": "Good quality"}'
score = extract_json_field(llm_response, "score", default=0.0)
# Returns: 0.85
```
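
Field extraction only makes sense when the parsed value is a dict; for a list, or for unparseable text, the default comes back. The sketch below illustrates that rule under simplifying assumptions: `get_field` is a hypothetical name, and plain `json.loads` stands in for the shared parser, so fenced or malformed input is not handled here.

```python
import json
from typing import Any


def get_field(raw: str, field_name: str, default: Any = None) -> Any:
    """Return raw[field_name] when raw parses to a dict, else the default."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return default
    if isinstance(parsed, dict):
        return parsed.get(field_name, default)
    # A list (or scalar) has no named fields to look up.
    return default


print(get_field('{"score": 0.85, "reason": "Good quality"}', "score", 0.0))  # 0.85
print(get_field("[1, 2, 3]", "score", 0.0))  # 0.0
```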

## Testing suggestions

1. **Unit tests**:
   - Cover the various JSON formats (with and without code-fence markers)
   - Cover malformed JSON (to verify what json_repair can fix)
   - Cover nested JSON structures
   - Cover empty and invalid responses

2. **Integration tests**:
   - Test in real LLM call scenarios
   - Verify compatibility with the response formats of different models
   - Test error handling and log output

3. **Performance tests**:
   - Measure parsing performance on large JSON payloads
   - Validate caching and optimization strategies
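
A table-driven layout is one way to organise the unit-test cases listed above. The following is only a sketch: `parse_under_test` is a hypothetical stand-in built on `json.JSONDecoder.raw_decode` so the file runs on its own, and in the project it would be replaced by an import of `extract_and_parse_json`. Cases that depend on json_repair fixing malformed JSON are left out because the stand-in cannot repair anything.

```python
import json
import re
import unittest
from typing import Any


def parse_under_test(response: str) -> Any:
    """Stand-in parser (an assumption); swap in the real function when available."""
    if not response:
        return None
    fenced = re.search(r"```(?:json)?\s*(.*?)```", response, re.IGNORECASE | re.DOTALL)
    text = fenced.group(1) if fenced else response
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    if not starts:
        return {}
    start = min(starts)
    try:
        # raw_decode parses one JSON value beginning at `start` and ignores the rest.
        value, _ = json.JSONDecoder().raw_decode(text, start)
        return value
    except json.JSONDecodeError:
        return [] if text[start] == "[" else {}


# (case name, raw LLM response, expected parse result)
CASES = [
    ("fenced object", '```json\n{"key": "value"}\n```', {"key": "value"}),
    ("bare object", '{"key": "value"}', {"key": "value"}),
    ("object inside prose", 'Result: {"ok": true} thanks', {"ok": True}),
    ("nested structure", '{"a": {"b": [1, 2, {"c": null}]}}', {"a": {"b": [1, 2, {"c": None}]}}),
    ("top-level array", '[{"a": 1}, {"b": 2}]', [{"a": 1}, {"b": 2}]),
    ("empty response", "", None),
    ("no JSON at all", "sorry, I cannot help", {}),
    ("truncated array", "[1, 2", []),
]


class ParserCases(unittest.TestCase):
    def test_cases(self):
        for name, raw, expected in CASES:
            with self.subTest(name):
                self.assertEqual(parse_under_test(raw), expected)
```

Adding a new response format then means adding one row to `CASES` rather than writing a new test method.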

## Migration guide

### Old code pattern:
```python
# Old custom parsing logic
def _extract_json(self, response: str) -> str | None:
    stripped = response.strip()
    code_block_match = re.search(r"```(?:json)?\s*(.*?)```", stripped, re.DOTALL)
    if code_block_match:
        return code_block_match.group(1)
    # ... more custom logic

# Usage
payload = self._extract_json(response)
if payload:
    data = orjson.loads(payload)
```

### New code pattern:
```python
# Use the shared utility
from src.utils.json_parser import extract_and_parse_json

# Parse directly
data = extract_and_parse_json(response, strict=False)
if data and isinstance(data, dict):
    # Use the data
    pass
```

## Notes

1. **Import statement**: make sure the correct import is added
   ```python
   from src.utils.json_parser import extract_and_parse_json
   ```

2. **Error handling**: the shared utility already handles errors, so no extra try-except is needed
   ```python
   # Not needed
   try:
       data = extract_and_parse_json(response)
   except Exception:
       ...

   # Do this instead
   data = extract_and_parse_json(response, strict=False)
   if not data:
       # Handle the failure case
       pass
   ```

3. **Type checking**: always verify the type of the return value
   ```python
   data = extract_and_parse_json(response)
   if isinstance(data, dict):
       ...  # Handle a dict
   elif isinstance(data, list):
       ...  # Handle a list
   ```
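
Because the parser can return a dict, a list, or `None`, call sites that expect one shape may find a small narrowing helper convenient. The helpers below are hypothetical (`as_dict` and `as_list` are not part of `src/utils/json_parser.py`); they only illustrate the type check recommended in note 3.

```python
from __future__ import annotations

from typing import Any


def as_dict(parsed: Any) -> dict[str, Any]:
    """Return the value unchanged if it is a dict, otherwise an empty dict."""
    return parsed if isinstance(parsed, dict) else {}


def as_list(parsed: Any) -> list[Any]:
    """Return the value unchanged if it is a list, otherwise an empty list."""
    return parsed if isinstance(parsed, list) else []


# A caller that needs a dict can read fields without a further type check.
plan = as_dict([{"unexpected": "list"}])
print(plan.get("action", "unknown"))  # unknown
```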

## Follow-up work

1. Finish migrating the remaining files
2. Add complete unit tests
3. Update the related documentation
4. Consider adding performance monitoring and statistics

## Date

November 2, 2025
--- a/src/chat/interest_system/bot_interest_manager.py
+++ b/src/chat/interest_system/bot_interest_manager.py
@@ -15,6 +15,7 @@ from src.common.config_helpers import resolve_embedding_dimension
 from src.common.data_models.bot_interest_data_model import BotInterestTag, BotPersonalityInterests, InterestMatchResult
 from src.common.logger import get_logger
 from src.config.config import global_config
+from src.utils.json_parser import extract_and_parse_json
 
 logger = get_logger("bot_interest_manager")
 
@@ -194,7 +195,10 @@ class BotInterestManager:
                 raise RuntimeError("❌ The LLM did not return a valid response")
 
             logger.info("✅ LLM responded successfully, parsing interest tags...")
-            interests_data = orjson.loads(response)
+            # Use the shared JSON parsing utility
+            interests_data = extract_and_parse_json(response, strict=False)
+            if not interests_data or not isinstance(interests_data, dict):
+                raise RuntimeError("❌ Failed to parse the LLM response: no valid JSON data obtained")
 
             bot_interests = BotPersonalityInterests(
                 personality_id=personality_id, personality_description=personality_description
@@ -225,9 +229,6 @@ class BotInterestManager:
             logger.info("✅ Interest tags generated")
             return bot_interests
 
-        except orjson.JSONDecodeError as e:
-            logger.error(f"❌ Failed to parse the LLM response as JSON: {e}")
-            raise
         except Exception as e:
             logger.error(f"❌ Failed to generate interest tags from the persona: {e}")
             traceback.print_exc()
@@ -270,9 +271,8 @@ class BotInterestManager:
                 if reasoning_content:
                     logger.debug(f"🧠 Reasoning content: {reasoning_content[:100]}...")
 
-                # Clean the response and strip any code-fence markers
-                cleaned_response = self._clean_llm_response(response)
-                return cleaned_response
+                # Return the raw response; the shared JSON parsing utility handles it later
+                return response
             else:
                 logger.warning("⚠️ The LLM returned an empty response or the call failed")
                 return None
@@ -283,25 +283,6 @@ class BotInterestManager:
             traceback.print_exc()
             return None
 
-    def _clean_llm_response(self, response: str) -> str:
-        """Clean the LLM response by stripping code-fence markers and other non-JSON content"""
-        import re
-
-        # Strip the ```json and ``` markers
-        cleaned = re.sub(r"```json\s*", "", response)
-        cleaned = re.sub(r"\s*```", "", cleaned)
-
-        # Strip any surrounding whitespace and newlines
-        cleaned = cleaned.strip()
-
-        # Try to extract the JSON object (in case the response contains other text)
-        json_match = re.search(r"\{.*\}", cleaned, re.DOTALL)
-        if json_match:
-            cleaned = json_match.group(0)
-
-        logger.debug(f"🧹 Cleaned response: {cleaned[:200]}..." if len(cleaned) > 200 else f"🧹 Cleaned response: {cleaned}")
-        return cleaned
-
     async def _generate_embeddings_for_tags(self, interests: BotPersonalityInterests):
         """Generate an embedding for every interest tag"""
         if not hasattr(self, "embedding_request"):
--- a/src/chat/memory_system/memory_query_planner.py
+++ b/src/chat/memory_system/memory_query_planner.py
@@ -11,6 +11,7 @@ import orjson
 from src.chat.memory_system.memory_chunk import MemoryType
 from src.common.logger import get_logger
 from src.llm_models.utils_model import LLMRequest
+from src.utils.json_parser import extract_and_parse_json
 
 logger = get_logger(__name__)
 
@@ -58,16 +59,10 @@ class MemoryQueryPlanner:
 
         try:
             response, _ = await self.model.generate_response_async(prompt, temperature=0.2)
-            payload = self._extract_json_payload(response)
-            if not payload:
-                logger.debug("The query-planning model returned no structured result; using the default plan")
-                return self._default_plan(query_text)
-
-            try:
-                data = orjson.loads(payload)
-            except orjson.JSONDecodeError as exc:
-                preview = payload[:200]
-                logger.warning("Failed to parse the query-plan JSON: %s, snippet: %s", exc, preview)
+            # Use the shared JSON parsing utility
+            data = extract_and_parse_json(response, strict=False)
+            if not data or not isinstance(data, dict):
+                logger.debug("The query-planning model returned no valid structured result; using the default plan")
                 return self._default_plan(query_text)
 
             plan = self._parse_plan_dict(data, query_text)
@@ -205,24 +200,6 @@ class MemoryQueryPlanner:
 Output a JSON object that meets the requirements directly; do not add extra text or a Markdown code fence.
 """
 
-    def _extract_json_payload(self, response: str) -> str | None:
-        if not response:
-            return None
-
-        stripped = response.strip()
-        code_block_match = re.search(r"```(?:json)?\s*(.*?)```", stripped, re.IGNORECASE | re.DOTALL)
-        if code_block_match:
-            candidate = code_block_match.group(1).strip()
-            if candidate:
-                return candidate
-
-        start = stripped.find("{")
-        end = stripped.rfind("}")
-        if start != -1 and end != -1 and end > start:
-            return stripped[start : end + 1]
-
-        return stripped if stripped.startswith("{") and stripped.endswith("}") else None
-
     @staticmethod
     def _safe_str(value: Any) -> str:
         if isinstance(value, str):
--- a/src/chat/memory_system/memory_system.py
+++ b/src/chat/memory_system/memory_system.py
@@ -19,6 +19,7 @@ from src.chat.memory_system.memory_builder import MemoryBuilder, MemoryExtractio
 from src.chat.memory_system.memory_chunk import MemoryChunk
 from src.chat.memory_system.memory_fusion import MemoryFusionEngine
 from src.chat.memory_system.memory_query_planner import MemoryQueryPlanner
+from src.utils.json_parser import extract_and_parse_json
 
 # Global set of background tasks
 _background_tasks = set()
@@ -987,28 +988,7 @@ class MemorySystem:
 
         return [chunk]
 
-    @staticmethod
-    def _extract_json_payload(response: str) -> str | None:
-        """Extract the JSON part of a model response; handles Markdown code fences and similar formats"""
-        if not response:
-            return None
-
-        stripped = response.strip()
-
-        # Handle the Markdown code-fence form ```json ... ``` first
-        code_block_match = re.search(r"```(?:json)?\s*(.*?)```", stripped, re.IGNORECASE | re.DOTALL)
-        if code_block_match:
-            candidate = code_block_match.group(1).strip()
-            if candidate:
-                return candidate
-
-        # Fall back to the brace range of the first JSON object
-        start = stripped.find("{")
-        end = stripped.rfind("}")
-        if start != -1 and end != -1 and end > start:
-            return stripped[start : end + 1].strip()
-
-        return stripped if stripped.startswith("{") and stripped.endswith("}") else None
+    # Removed the custom _extract_json_payload method; src.utils.json_parser.extract_and_parse_json is used everywhere instead
 
     def _normalize_context(
         self, raw_context: dict[str, Any] | None, user_id: str | None, timestamp: float | None
@@ -1416,13 +1396,13 @@ class MemorySystem:
                 return 0.5
             response, _ = await self.value_assessment_model.generate_response_async(prompt, temperature=0.3)
 
-            # Parse the response
-            try:
-                payload = self._extract_json_payload(response)
-                if not payload:
-                    raise ValueError("No valid JSON payload found in the response")
+            # Parse the response with the shared JSON parsing utility
+            result = extract_and_parse_json(response, strict=False)
+            if not result or not isinstance(result, dict):
+                logger.warning(f"Failed to parse the value-assessment response, snippet: {response[:200]}")
+                return 0.5  # Default: medium value
 
-                result = orjson.loads(payload)
+            try:
                 value_score = float(result.get("value_score", 0.0))
                 reasoning = result.get("reasoning", "")
                 key_factors = result.get("key_factors", [])
@@ -1433,9 +1413,8 @@ class MemorySystem:
 
                 return max(0.0, min(1.0, value_score))
 
-            except (orjson.JSONDecodeError, ValueError) as e:
-                preview = response[:200].replace("\n", " ")
-                logger.warning(f"Failed to parse the value-assessment response: {e}, snippet: {preview}")
+            except (ValueError, TypeError) as e:
+                logger.warning(f"Failed to parse the value-assessment score: {e}")
                 return 0.5  # Default: medium value
 
         except Exception as e:
--- a/src/plugins/built_in/affinity_flow_chatter/chat_stream_impression_tool.py
+++ b/src/plugins/built_in/affinity_flow_chatter/chat_stream_impression_tool.py
@@ -13,6 +13,7 @@ from src.common.logger import get_logger
 from src.config.config import model_config
 from src.llm_models.utils_model import LLMRequest
 from src.plugin_system import BaseTool, ToolParamType
+from src.utils.json_parser import extract_and_parse_json
 
 logger = get_logger("chat_stream_impression_tool")
 
@@ -290,9 +291,11 @@ class ChatStreamImpressionTool(BaseTool):
                 logger.warning("The LLM did not return a valid response")
                 return None
 
-            # Clean and parse the response
-            cleaned_response = self._clean_llm_json_response(llm_response)
-            response_data = json.loads(cleaned_response)
+            # Use the shared JSON parsing utility
+            response_data = extract_and_parse_json(llm_response, strict=False)
+            if not response_data or not isinstance(response_data, dict):
+                logger.warning("Failed to parse the LLM response")
+                return None
 
             # Extract the data for the final decision
             final_impression = {
@@ -373,35 +376,18 @@ class ChatStreamImpressionTool(BaseTool):
             logger.error(f"Failed to update the chat-stream impression in the database: {e}", exc_info=True)
             raise
 
-    def _clean_llm_json_response(self, response: str) -> str:
-        """Clean the LLM response by stripping any JSON formatting markers
-
-        Args:
-            response: the raw LLM response
-
-        Returns:
-            str: the cleaned JSON string
+    # Removed the custom _clean_llm_json_response method; src.utils.json_parser.extract_and_parse_json is used everywhere instead
+
+    def _clean_llm_json_response_deprecated(self, response: str) -> str:
+        """Deprecated; kept only for compatibility
+
+        Use src.utils.json_parser.extract_and_parse_json instead
         """
+        from src.utils.json_parser import extract_and_parse_json
         try:
-            import re
-
-            cleaned = response.strip()
-
-            # Strip markers such as ```json or ```
-            cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned, flags=re.MULTILINE | re.IGNORECASE)
-            cleaned = re.sub(r"\s*```$", "", cleaned, flags=re.MULTILINE)
-
-            # Try to find the start and end of the JSON object
-            json_start = cleaned.find("{")
-            json_end = cleaned.rfind("}")
-
-            if json_start != -1 and json_end != -1 and json_end > json_start:
-                cleaned = cleaned[json_start : json_end + 1]
-
-            cleaned = cleaned.strip()
-
-            return cleaned
-
+            import json
+            result = extract_and_parse_json(response, strict=False)
+            return json.dumps(result) if result else response
         except Exception as e:
             logger.warning(f"Failed to clean the LLM response: {e}")
             return response
--- a/src/plugins/built_in/affinity_flow_chatter/proactive_thinking_executor.py
+++ b/src/plugins/built_in/affinity_flow_chatter/proactive_thinking_executor.py
@@ -17,6 +17,7 @@ from src.config.config import global_config, model_config
 from src.individuality.individuality import Individuality
 from src.llm_models.utils_model import LLMRequest
 from src.plugin_system.apis import message_api, send_api
+from src.utils.json_parser import extract_and_parse_json
 
 logger = get_logger("proactive_thinking_executor")
 
@@ -339,19 +340,17 @@ class ProactiveThinkingPlanner:
                 logger.warning("The LLM did not return a valid response")
                 return None
 
-            # Clean and parse the JSON response
-            cleaned_response = self._clean_json_response(response)
-            decision = json.loads(cleaned_response)
+            # Use the shared JSON parsing utility
+            decision = extract_and_parse_json(response, strict=False)
+            if not decision or not isinstance(decision, dict):
+                logger.error("Failed to parse the decision JSON")
+                if response:
+                    logger.debug(f"Raw response: {response[:500]}")
+                return None
 
             logger.info(f"Decision: {decision.get('action', 'unknown')} - {decision.get('reasoning', 'no reason given')}")
 
             return decision
-
-        except json.JSONDecodeError as e:
-            logger.error(f"Failed to parse the decision JSON: {e}")
-            if response:
-                logger.debug(f"Raw response: {response}")
-            return None
         except Exception as e:
             logger.error(f"Decision process failed: {e}", exc_info=True)
             return None
@@ -539,21 +538,7 @@ class ProactiveThinkingPlanner:
             logger.warning(f"Failed to get expression styles: {e}")
             return ""
 
-    def _clean_json_response(self, response: str) -> str:
-        """Strip JSON formatting markers from the LLM response"""
-        import re
-
-        cleaned = response.strip()
-        cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned, flags=re.MULTILINE | re.IGNORECASE)
-        cleaned = re.sub(r"\s*```$", "", cleaned, flags=re.MULTILINE)
-
-        json_start = cleaned.find("{")
-        json_end = cleaned.rfind("}")
-
-        if json_start != -1 and json_end != -1 and json_end > json_start:
-            cleaned = cleaned[json_start : json_end + 1]
-
-        return cleaned.strip()
+    # Removed the custom _clean_json_response method; src.utils.json_parser.extract_and_parse_json is used everywhere instead
 
 
 # Global planner instance
--- a/src/plugins/built_in/affinity_flow_chatter/user_profile_tool.py
+++ b/src/plugins/built_in/affinity_flow_chatter/user_profile_tool.py
@@ -16,6 +16,7 @@ from src.common.logger import get_logger
 from src.config.config import global_config, model_config
 from src.llm_models.utils_model import LLMRequest
 from src.plugin_system import BaseTool, ToolParamType
+from src.utils.json_parser import extract_and_parse_json
 
 logger = get_logger("user_profile_tool")
 
@@ -269,9 +270,12 @@ class UserProfileTool(BaseTool):
                 logger.warning("The LLM did not return a valid response")
                 return None
 
-            # Clean and parse the response
-            cleaned_response = self._clean_llm_json_response(llm_response)
-            response_data = orjson.loads(cleaned_response)
+            # Use the shared JSON parsing utility
+            response_data = extract_and_parse_json(llm_response, strict=False)
+            if not response_data or not isinstance(response_data, dict):
+                logger.error("Failed to parse the LLM response as JSON")
+                logger.debug(f"Raw LLM response: {llm_response[:500] if llm_response else 'N/A'}")
+                return None
 
             # Extract the data for the final decision
             final_profile = {
@@ -285,11 +289,6 @@ class UserProfileTool(BaseTool):
             logger.debug(f"Decision reasoning: {response_data.get('reasoning', 'none')}")
 
             return final_profile
-
-        except orjson.JSONDecodeError as e:
-            logger.error(f"Failed to parse the LLM response as JSON: {e}")
-            logger.debug(f"Raw LLM response: {llm_response if 'llm_response' in locals() else 'N/A'}")
-            return None
         except Exception as e:
             logger.error(f"LLM decision failed: {e}", exc_info=True)
             return None
@@ -336,35 +335,4 @@ class UserProfileTool(BaseTool):
             logger.error(f"Failed to update the user profile in the database: {e}", exc_info=True)
             raise
 
-    def _clean_llm_json_response(self, response: str) -> str:
-        """Clean the LLM response by stripping any JSON formatting markers
-
-        Args:
-            response: the raw LLM response
-
-        Returns:
-            str: the cleaned JSON string
-        """
-        try:
-            import re
-
-            cleaned = response.strip()
-
-            # Strip markers such as ```json or ```
-            cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned, flags=re.MULTILINE | re.IGNORECASE)
-            cleaned = re.sub(r"\s*```$", "", cleaned, flags=re.MULTILINE)
-
-            # Try to find the start and end of the JSON object
-            json_start = cleaned.find("{")
-            json_end = cleaned.rfind("}")
-
-            if json_start != -1 and json_end != -1 and json_end > json_start:
-                cleaned = cleaned[json_start:json_end + 1]
-
-            cleaned = cleaned.strip()
-
-            return cleaned
-
-        except Exception as e:
-            logger.warning(f"Failed to clean the LLM response: {e}")
-            return response
+    # Removed the custom _clean_llm_json_response method; src.utils.json_parser.extract_and_parse_json is used everywhere instead
240
src/utils/json_parser.py
Normal file
@@ -0,0 +1,240 @@
"""
Shared JSON parsing utility module

Provides a single JSON parsing path for LLM responses, repaired with the json_repair library,
to simplify the code and improve the parse success rate.
"""

import re
from typing import Any

import orjson
from json_repair import repair_json

from src.common.logger import get_logger

logger = get_logger(__name__)


def extract_and_parse_json(response: str, *, strict: bool = False) -> dict[str, Any] | list | None:
    """
    Extract and parse JSON from an LLM response

    Processing strategy:
    1. Strip Markdown code-fence markers (```json and ```)
    2. Extract the JSON object or array
    3. Repair malformed JSON with json_repair
    4. Parse into a Python object

    Args:
        response: the LLM response string
        strict: strict mode; if True, return None when parsing fails, otherwise fall back tolerantly

    Returns:
        The parsed dict or list. On failure: None in strict mode or for an empty response, otherwise an empty dict or list

    Examples:
        >>> extract_and_parse_json('```json\\n{"key": "value"}\\n```')
        {'key': 'value'}

        >>> extract_and_parse_json('Some text {"key": "value"} more text')
        {'key': 'value'}

        >>> extract_and_parse_json('[{"a": 1}, {"b": 2}]')
        [{'a': 1}, {'b': 2}]
    """
    if not response:
        logger.debug("Empty response; cannot parse JSON")
        return None

    try:
        # Step 1: clean the response
        cleaned = _clean_llm_response(response)

        if not cleaned:
            logger.warning("The response is empty after cleaning")
            return None

        # Step 2: try parsing directly
        try:
            result = orjson.loads(cleaned)
            logger.debug(f"✅ Parsed JSON directly, type: {type(result).__name__}")
            return result
        except Exception as direct_error:
            logger.debug(f"Direct parsing failed: {type(direct_error).__name__}: {direct_error}")

        # Step 3: repair with json_repair, then parse
        try:
            repaired = repair_json(cleaned)

            # repair_json may return a string or an already-parsed object
            if isinstance(repaired, str):
                result = orjson.loads(repaired)
                logger.debug(f"✅ Parsed JSON after repair (string mode), type: {type(result).__name__}")
            else:
                result = repaired
                logger.debug(f"✅ Parsed JSON after repair (object mode), type: {type(result).__name__}")

            return result

        except Exception as repair_error:
            logger.warning(f"JSON repair failed: {type(repair_error).__name__}: {repair_error}")

            if strict:
                logger.error(f"Parsing failed in strict mode, response snippet: {cleaned[:200]}")
                return None

            # Last-resort fallback: return an empty dict or an empty list
            if cleaned.strip().startswith("["):
                logger.warning("Returning an empty list as a fallback")
                return []
            else:
                logger.warning("Returning an empty dict as a fallback")
                return {}

    except Exception as e:
        logger.error(f"❌ Unexpected error while parsing JSON: {type(e).__name__}: {e}")
        if strict:
            return None
        return {} if not response.strip().startswith("[") else []


def _clean_llm_response(response: str) -> str:
    """
    Clean an LLM response and extract the JSON part

    Steps:
    1. Strip Markdown code-fence markers (```json and ```)
    2. Extract the first complete JSON object {...} or array [...]
    3. Trim surrounding whitespace and newlines

    Args:
        response: the raw LLM response

    Returns:
        The cleaned JSON string
    """
    if not response:
        return ""

    cleaned = response.strip()

    # Strip Markdown code-fence markers
    # Matches ```json ... ``` or ``` ... ```
    code_block_patterns = [
        r"```json\s*(.*?)```",  # ```json ... ```
        r"```\s*(.*?)```",  # ``` ... ```
    ]

    for pattern in code_block_patterns:
        match = re.search(pattern, cleaned, re.IGNORECASE | re.DOTALL)
        if match:
            cleaned = match.group(1).strip()
            logger.debug(f"Extracted content from a Markdown code fence, length: {len(cleaned)}")
            break

    # Extract the JSON object or array
    # Start from whichever of "{" or "[" appears first; always trying "{" first
    # would cut a top-level array of objects down to its first element
    candidates = [(cleaned.find(s), s, e) for s, e in (("{", "}"), ("[", "]"))]
    for start_idx, start_char, end_char in sorted(c for c in candidates if c[0] != -1):
        # Use bracket matching to find the corresponding closing character
        extracted = _extract_balanced_json(cleaned, start_idx, start_char, end_char)
        if extracted:
            logger.debug(f"Extracted a {start_char}...{end_char} structure, length: {len(extracted)}")
            return extracted

    # No clear JSON structure found; return the cleaned original content
    logger.debug("No clear JSON structure found; returning the cleaned original content")
    return cleaned


def _extract_balanced_json(text: str, start_idx: int, start_char: str, end_char: str) -> str | None:
    """
    Extract a balanced JSON structure starting at the given position

    Uses stack-style (depth-counting) matching to find the corresponding closing character, handling nesting and special characters inside strings

    Args:
        text: the source text
        start_idx: index of the opening character
        start_char: the opening character ({ or [)
        end_char: the closing character (} or ])

    Returns:
        The extracted JSON string, or None on failure
    """
    depth = 0
    in_string = False
    escape_next = False

    for i in range(start_idx, len(text)):
        char = text[i]

        # Handle escape characters
        if escape_next:
            escape_next = False
            continue

        if char == "\\":
            escape_next = True
            continue

        # Handle strings
        if char == '"':
            in_string = not in_string
            continue

        # Only count brackets outside strings
        if not in_string:
            if char == start_char:
                depth += 1
            elif char == end_char:
                depth -= 1
                if depth == 0:
                    # Found the matching closing character
                    return text[start_idx : i + 1].strip()

    # No matching closing character found
    logger.debug(f"No matching {end_char} found, depth: {depth}")
    return None


def safe_parse_json(json_str: str, default: Any = None) -> Any:
    """
    Parse JSON safely, returning a default value on failure

    Args:
        json_str: the JSON string
        default: the value to return when parsing fails

    Returns:
        The parsed result, or the default value
    """
    try:
        # strict=True so that a failed parse yields None (and therefore `default`)
        # rather than the tolerant-mode {} / [] fallback
        result = extract_and_parse_json(json_str, strict=True)
        return result if result is not None else default
    except Exception as e:
        logger.warning(f"Safe JSON parsing failed: {e}")
        return default


def extract_json_field(response: str, field_name: str, default: Any = None) -> Any:
    """
    Extract the value of a specific field from an LLM response

    Args:
        response: the LLM response
        field_name: the field name
        default: the value to return when the field is missing

    Returns:
        The field value, or the default value
    """
    parsed = extract_and_parse_json(response, strict=False)

    if isinstance(parsed, dict):
        return parsed.get(field_name, default)

    logger.warning(f"The parsed result is not a dict; cannot extract field '{field_name}'")
    return default