refactor(json_parser): unify JSON parsing of LLM responses, simplifying the code and improving the parse success rate

2025-11-02 12:18:53 +08:00
parent 7235c681d8
commit 0e024d30c2
8 changed files with 511 additions and 179 deletions
--- a/docs/development/json_parser_unification.md
+++ b/docs/development/json_parser_unification.md
@@ -0,0 +1,216 @@
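+# JSON Parsing Unification: Improvement Notes
+
+## Goal
+Unify the JSON parsing logic for all LLM responses in the project on top of the `json_repair` library and a shared parsing utility, simplifying the code and improving the parse success rate.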
+
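+## New utility module
+
+### `src/utils/json_parser.py`
+Provides a single entry point for JSON parsing:
+
+#### Main functions:
+1. **`extract_and_parse_json(response, *, strict=False)`**
+   - Extracts and parses JSON from an LLM response
+   - Handles Markdown code fence markers automatically
+   - Uses json_repair to fix formatting problems
+   - Supports a strict mode and a lenient mode (`strict` is keyword-only)
+
+2. **`safe_parse_json(json_str, default=None)`**
+   - Parses JSON safely, returning the default value on failure
+
+3. **`extract_json_field(response, field_name, default=None)`**
+   - Extracts the value of a specific field from an LLM response
+
+#### Processing strategy:
+1. Strip Markdown code fence markers (```json and ```)
+2. Extract the JSON object or array (using a bracket-matching scan)
+3. Try to parse directly
+4. If that fails, repair with json_repair and parse again
+5. In lenient mode, fall back to an empty dict or empty list; in strict mode, return `None`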
+
+## Files modified
+
+### 1. `src/chat/memory_system/memory_query_planner.py` ✅
+- Removed the custom `_extract_json_payload` method
+- Replaced the previous parsing logic with `extract_and_parse_json`
+- Simplified the code and improved maintainability
+
+**Before:**
+```python
+payload = self._extract_json_payload(response)
+if not payload:
+    return self._default_plan(query_text)
+try:
+    data = orjson.loads(payload)
+except orjson.JSONDecodeError as exc:
+    ...
+```
+
+**After:**
+```python
+data = extract_and_parse_json(response, strict=False)
+if not data or not isinstance(data, dict):
+    return self._default_plan(query_text)
+```
+
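+### 2. `src/chat/memory_system/memory_system.py` ✅
+- Removed the custom `_extract_json_payload` method
+- Switched the `_evaluate_information_value` method to the shared parsing utility
+- Simplified the error handling logic
+
+### 3. `src/chat/interest_system/bot_interest_manager.py` ✅
+- Removed the custom `_clean_llm_response` method
+- Parses the interest tag data with `extract_and_parse_json`
+- Improved error handling and log output
+
+### 4. `src/plugins/built_in/affinity_flow_chatter/chat_stream_impression_tool.py` ✅
+- Marked `_clean_llm_json_response` as deprecated (renamed to `_clean_llm_json_response_deprecated`)
+- Parses the chat stream impression data with `extract_and_parse_json`
+- Added type checks and error handling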
+
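+## Files still to be modified
+
+### Other files that need similar changes:
+1. `src/plugins/built_in/affinity_flow_chatter/proactive_thinking_executor.py`
+   - Contained custom JSON cleanup logic (`_clean_json_response`); already migrated in this commit
+
+2. `src/plugins/built_in/affinity_flow_chatter/user_profile_tool.py`
+   - Contained custom JSON cleanup logic (`_clean_llm_json_response`); already migrated in this commit
+
+3. Any other files that contain custom JSON parsing logic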
+
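+## Results
+
+### 1. Simpler code
+- Eliminates duplicated JSON extraction and cleanup code
+- Reduces line count and maintenance cost
+- Unifies the error handling pattern
+
+### 2. Higher parse success rate
+- json_repair automatically fixes common JSON formatting problems
+- Supports several ways of wrapping JSON (code fences, plain text, and so on)
+- More tolerant handling of malformed output
+
+### 3. Better maintainability
+- JSON parsing logic is managed in one place
+- New parsing strategies are easy to add
+- Debugging and logging are easier
+
+### 4. Better consistency
+- Every LLM response goes through the same parsing flow
+- Uniform log output format
+- Consistent error handling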
+
+## Usage examples
+
+### Basic usage:
+```python
+from src.utils.json_parser import extract_and_parse_json
+
+# The LLM response may contain a Markdown code fence or other text
+llm_response = '```json\n{"key": "value"}\n```'
+
+# Extract and parse automatically
+data = extract_and_parse_json(llm_response, strict=False)
+# Returns: {'key': 'value'}
+
+# On parse failure, non-strict mode returns an empty dict (or an empty list)
+# Strict mode returns None
+```
+
+### Extracting a specific field:
+```python
+from src.utils.json_parser import extract_json_field
+
+llm_response = '{"score": 0.85, "reason": "Good quality"}'
+score = extract_json_field(llm_response, "score", default=0.0)
+# Returns: 0.85
+```
+
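+## Testing suggestions
+
+1. **Unit tests**:
+   - Test various JSON formats (with and without code fence markers)
+   - Test malformed JSON (to verify what json_repair can fix)
+   - Test nested JSON structures
+   - Test empty and invalid responses
+
+2. **Integration tests**:
+   - Test in real LLM call scenarios
+   - Verify compatibility with the response formats of different models
+   - Test error handling and log output
+
+3. **Performance tests**:
+   - Test parsing performance on large JSON payloads
+   - Verify caching and optimization strategies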
+
+## Migration guide
+
+### Old pattern:
+```python
+# Old custom parsing logic
+def _extract_json(self, response: str) -> str | None:
+    stripped = response.strip()
+    code_block_match = re.search(r"```(?:json)?\s*(.*?)```", stripped, re.DOTALL)
+    if code_block_match:
+        return code_block_match.group(1)
+    # ... more custom logic
+
+# Usage
+payload = self._extract_json(response)
+if payload:
+    data = orjson.loads(payload)
+```
+
+### New pattern:
+```python
+# Use the shared utility
+from src.utils.json_parser import extract_and_parse_json
+
+# Parse directly
+data = extract_and_parse_json(response, strict=False)
+if data and isinstance(data, dict):
+    # Use the data
+    pass
+```
+
+## Notes
+
+1. **Import statement**: make sure the correct import is added
+   ```python
+   from src.utils.json_parser import extract_and_parse_json
+   ```
+
+2. **Error handling**: the shared utility already handles errors, so no extra try-except is needed
+   ```python
+   # Not needed
+   try:
+       data = extract_and_parse_json(response)
+   except Exception:
+       ...
+
+   # Preferred
+   data = extract_and_parse_json(response, strict=False)
+   if not data:
+       # Handle the failure case
+       pass
+   ```
+
+3. **Type checks**: always verify the type of the return value
+   ```python
+   data = extract_and_parse_json(response)
+   if isinstance(data, dict):
+       ...  # handle a dict
+   elif isinstance(data, list):
+       ...  # handle a list
+   ```
+
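+## Follow-up work
+
+1. Finish migrating the remaining files
+2. Add complete unit tests
+3. Update the related documentation
+4. Consider adding performance monitoring and statistics
+
+## Date
+November 2, 2025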
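The processing strategy described in the document above (strip the Markdown fence, try a direct parse, attempt a repair, then fall back to an empty container in lenient mode) can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the code in `src/utils/json_parser.py`: `parse_llm_json` and `_naive_repair` are hypothetical names, and the trailing-comma fix merely stands in for the much broader repairs that `json_repair` performs.

```python
import json
import re
from typing import Any

# Matches ```json ... ``` or a bare ``` ... ``` fence.
FENCE = re.compile(r"```(?:json)?\s*(.*?)```", re.IGNORECASE | re.DOTALL)


def _naive_repair(text: str) -> str:
    """Hypothetical stand-in for json_repair: only drops trailing commas."""
    return re.sub(r",\s*([}\]])", r"\1", text)


def parse_llm_json(response: str, *, strict: bool = False) -> Any:
    """Sketch of the fence-strip -> direct parse -> repair -> fallback flow."""
    if not response:
        return None
    match = FENCE.search(response)
    cleaned = (match.group(1) if match else response).strip()
    # Try the text as-is first, then the repaired variant.
    for candidate in (cleaned, _naive_repair(cleaned)):
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    if strict:
        return None
    # Lenient mode: hand back an empty container of the apparent type.
    return [] if cleaned.startswith("[") else {}
```

Callers that expect an object would still check `isinstance(result, dict)` afterwards, as the migrated call sites in this commit do.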
--- a/src/chat/interest_system/bot_interest_manager.py
+++ b/src/chat/interest_system/bot_interest_manager.py
@@ -15,6 +15,7 @@ from src.common.config_helpers import resolve_embedding_dimension
 from src.common.data_models.bot_interest_data_model import BotInterestTag, BotPersonalityInterests, InterestMatchResult
 from src.common.logger import get_logger
 from src.config.config import global_config
+from src.utils.json_parser import extract_and_parse_json

 logger = get_logger("bot_interest_manager")

@@ -194,7 +195,10 @@ class BotInterestManager:
                raise RuntimeError("❌ LLM未返回有效响应")

            logger.info("✅ LLM响应成功，开始解析兴趣标签...")
-            interests_data = orjson.loads(response)
+            # Use the shared JSON parsing utility
+            interests_data = extract_and_parse_json(response, strict=False)
+            if not interests_data or not isinstance(interests_data, dict):
+                raise RuntimeError("❌ 解析LLM响应失败，未获取到有效的JSON数据")

            bot_interests = BotPersonalityInterests(
                personality_id=personality_id, personality_description=personality_description
@@ -225,9 +229,6 @@ class BotInterestManager:
            logger.info("✅ 兴趣标签生成完成")
            return bot_interests

-        except orjson.JSONDecodeError as e:
-            logger.error(f"❌ 解析LLM响应JSON失败: {e}")
-            raise
        except Exception as e:
            logger.error(f"❌ 根据人设生成兴趣标签失败: {e}")
            traceback.print_exc()
@@ -270,9 +271,8 @@ class BotInterestManager:
                if reasoning_content:
                    logger.debug(f"🧠 推理内容: {reasoning_content[:100]}...")

-                # 清理响应内容，移除可能的代码块标记
-                cleaned_response = self._clean_llm_response(response)
-                return cleaned_response
+                # Return the raw response as-is; the shared JSON parsing utility handles it downstream
+                return response
            else:
                logger.warning("⚠️ LLM返回空响应或调用失败")
                return None
@@ -283,25 +283,6 @@ class BotInterestManager:
            traceback.print_exc()
            return None

-    def _clean_llm_response(self, response: str) -> str:
-        """清理LLM响应，移除代码块标记和其他非JSON内容"""
-        import re
-
-        # 移除 ```json 和 ``` 标记
-        cleaned = re.sub(r"```json\s*", "", response)
-        cleaned = re.sub(r"\s*```", "", cleaned)
-
-        # 移除可能的多余空格和换行
-        cleaned = cleaned.strip()
-
-        # 尝试提取JSON对象（如果响应中有其他文本）
-        json_match = re.search(r"\{.*\}", cleaned, re.DOTALL)
-        if json_match:
-            cleaned = json_match.group(0)
-
-        logger.debug(f"🧹 清理后的响应: {cleaned[:200]}..." if len(cleaned) > 200 else f"🧹 清理后的响应: {cleaned}")
-        return cleaned
-
    async def _generate_embeddings_for_tags(self, interests: BotPersonalityInterests):
        """为所有兴趣标签生成embedding"""
        if not hasattr(self, "embedding_request"):
--- a/src/chat/memory_system/memory_query_planner.py
+++ b/src/chat/memory_system/memory_query_planner.py
@@ -11,6 +11,7 @@ import orjson
 from src.chat.memory_system.memory_chunk import MemoryType
 from src.common.logger import get_logger
 from src.llm_models.utils_model import LLMRequest
+from src.utils.json_parser import extract_and_parse_json

 logger = get_logger(__name__)

@@ -58,16 +59,10 @@ class MemoryQueryPlanner:

        try:
            response, _ = await self.model.generate_response_async(prompt, temperature=0.2)
-            payload = self._extract_json_payload(response)
-            if not payload:
-                logger.debug("查询规划模型未返回结构化结果，使用默认规划")
-                return self._default_plan(query_text)
-
-            try:
-                data = orjson.loads(payload)
-            except orjson.JSONDecodeError as exc:
-                preview = payload[:200]
-                logger.warning("解析查询规划JSON失败: %s，片段: %s", exc, preview)
+            # Use the shared JSON parsing utility
+            data = extract_and_parse_json(response, strict=False)
+            if not data or not isinstance(data, dict):
+                logger.debug("查询规划模型未返回有效的结构化结果，使用默认规划")
                return self._default_plan(query_text)

            plan = self._parse_plan_dict(data, query_text)
@@ -205,24 +200,6 @@ class MemoryQueryPlanner:
 请直接输出符合要求的 JSON 对象，禁止添加额外文本或 Markdown 代码块。
 """

-    def _extract_json_payload(self, response: str) -> str | None:
-        if not response:
-            return None
-
-        stripped = response.strip()
-        code_block_match = re.search(r"```(?:json)?\s*(.*?)```", stripped, re.IGNORECASE | re.DOTALL)
-        if code_block_match:
-            candidate = code_block_match.group(1).strip()
-            if candidate:
-                return candidate
-
-        start = stripped.find("{")
-        end = stripped.rfind("}")
-        if start != -1 and end != -1 and end > start:
-            return stripped[start : end + 1]
-
-        return stripped if stripped.startswith("{") and stripped.endswith("}") else None
-
    @staticmethod
    def _safe_str(value: Any) -> str:
        if isinstance(value, str):
--- a/src/chat/memory_system/memory_system.py
+++ b/src/chat/memory_system/memory_system.py
@@ -19,6 +19,7 @@ from src.chat.memory_system.memory_builder import MemoryBuilder, MemoryExtractio
 from src.chat.memory_system.memory_chunk import MemoryChunk
 from src.chat.memory_system.memory_fusion import MemoryFusionEngine
 from src.chat.memory_system.memory_query_planner import MemoryQueryPlanner
+from src.utils.json_parser import extract_and_parse_json

 # 全局背景任务集合
 _background_tasks = set()
@@ -987,28 +988,7 @@ class MemorySystem:

        return [chunk]

-    @staticmethod
-    def _extract_json_payload(response: str) -> str | None:
-        """从模型响应中提取JSON部分，兼容Markdown代码块等格式"""
-        if not response:
-            return None
-
-        stripped = response.strip()
-
-        # 优先处理Markdown代码块格式 ```json ... ```
-        code_block_match = re.search(r"```(?:json)?\s*(.*?)```", stripped, re.IGNORECASE | re.DOTALL)
-        if code_block_match:
-            candidate = code_block_match.group(1).strip()
-            if candidate:
-                return candidate
-
-        # 回退到查找第一个 JSON 对象的大括号范围
-        start = stripped.find("{")
-        end = stripped.rfind("}")
-        if start != -1 and end != -1 and end > start:
-            return stripped[start : end + 1].strip()
-
-        return stripped if stripped.startswith("{") and stripped.endswith("}") else None
+    # The custom _extract_json_payload method was removed; use src.utils.json_parser.extract_and_parse_json instead

    def _normalize_context(
        self, raw_context: dict[str, Any] | None, user_id: str | None, timestamp: float | None
@@ -1416,13 +1396,13 @@ class MemorySystem:
                return 0.5
            response, _ = await self.value_assessment_model.generate_response_async(prompt, temperature=0.3)

-            # 解析响应
-            try:
-                payload = self._extract_json_payload(response)
-                if not payload:
-                    raise ValueError("未在响应中找到有效的JSON负载")
+            # Parse the response with the shared JSON parsing utility
+            result = extract_and_parse_json(response, strict=False)
+            if not result or not isinstance(result, dict):
+                logger.warning(f"解析价值评估响应失败，响应片段: {response[:200]}")
+                return 0.5  # 默认中等价值

-                result = orjson.loads(payload)
+            try:
                value_score = float(result.get("value_score", 0.0))
                reasoning = result.get("reasoning", "")
                key_factors = result.get("key_factors", [])
@@ -1433,9 +1413,8 @@ class MemorySystem:

                return max(0.0, min(1.0, value_score))

-            except (orjson.JSONDecodeError, ValueError) as e:
-                preview = response[:200].replace("\n", " ")
-                logger.warning(f"解析价值评估响应失败: {e}, 响应片段: {preview}")
+            except (ValueError, TypeError) as e:
+                logger.warning(f"解析价值评估数值失败: {e}")
                return 0.5  # 默认中等价值

        except Exception as e:
--- a/src/plugins/built_in/affinity_flow_chatter/chat_stream_impression_tool.py
+++ b/src/plugins/built_in/affinity_flow_chatter/chat_stream_impression_tool.py
@@ -13,6 +13,7 @@ from src.common.logger import get_logger
 from src.config.config import model_config
 from src.llm_models.utils_model import LLMRequest
 from src.plugin_system import BaseTool, ToolParamType
+from src.utils.json_parser import extract_and_parse_json

 logger = get_logger("chat_stream_impression_tool")

@@ -290,9 +291,11 @@ class ChatStreamImpressionTool(BaseTool):
                logger.warning("LLM未返回有效响应")
                return None

-            # 清理并解析响应
-            cleaned_response = self._clean_llm_json_response(llm_response)
-            response_data = json.loads(cleaned_response)
+            # Use the shared JSON parsing utility
+            response_data = extract_and_parse_json(llm_response, strict=False)
+            if not response_data or not isinstance(response_data, dict):
+                logger.warning("解析LLM响应失败")
+                return None

            # 提取最终决定的数据
            final_impression = {
@@ -373,35 +376,18 @@ class ChatStreamImpressionTool(BaseTool):
            logger.error(f"更新聊天流印象到数据库失败: {e}", exc_info=True)
            raise

-    def _clean_llm_json_response(self, response: str) -> str:
-        """清理LLM响应，移除可能的JSON格式标记
+    # The custom _clean_llm_json_response method was removed; use src.utils.json_parser.extract_and_parse_json instead
    
-        Args:
-            response: LLM原始响应
+    def _clean_llm_json_response_deprecated(self, response: str) -> str:
+        """Deprecated; kept only for backward compatibility.
        
-        Returns:
-            str: 清理后的JSON字符串
+        Use src.utils.json_parser.extract_and_parse_json instead.
        """
+        from src.utils.json_parser import extract_and_parse_json
        try:
-            import re
-
-            cleaned = response.strip()
-
-            # 移除 ```json 或 ``` 等标记
-            cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned, flags=re.MULTILINE | re.IGNORECASE)
-            cleaned = re.sub(r"\s*```$", "", cleaned, flags=re.MULTILINE)
-
-            # 尝试找到JSON对象的开始和结束
-            json_start = cleaned.find("{")
-            json_end = cleaned.rfind("}")
-
-            if json_start != -1 and json_end != -1 and json_end > json_start:
-                cleaned = cleaned[json_start : json_end + 1]
-
-            cleaned = cleaned.strip()
-
-            return cleaned
-
+            import json
+            result = extract_and_parse_json(response, strict=False)
+            return json.dumps(result) if result else response
        except Exception as e:
            logger.warning(f"清理LLM响应失败: {e}")
            return response
--- a/src/plugins/built_in/affinity_flow_chatter/proactive_thinking_executor.py
+++ b/src/plugins/built_in/affinity_flow_chatter/proactive_thinking_executor.py
@@ -17,6 +17,7 @@ from src.config.config import global_config, model_config
 from src.individuality.individuality import Individuality
 from src.llm_models.utils_model import LLMRequest
 from src.plugin_system.apis import message_api, send_api
+from src.utils.json_parser import extract_and_parse_json

 logger = get_logger("proactive_thinking_executor")

@@ -339,19 +340,17 @@ class ProactiveThinkingPlanner:
                logger.warning("LLM未返回有效响应")
                return None

-            # 清理并解析JSON响应
-            cleaned_response = self._clean_json_response(response)
-            decision = json.loads(cleaned_response)
+            # Use the shared JSON parsing utility
+            decision = extract_and_parse_json(response, strict=False)
+            if not decision or not isinstance(decision, dict):
+                logger.error("解析决策JSON失败")
+                if response:
+                    logger.debug(f"原始响应: {response[:500]}")
+                return None

            logger.info(f"决策结果: {decision.get('action', 'unknown')} - {decision.get('reasoning', '无理由')}")

            return decision
-
-        except json.JSONDecodeError as e:
-            logger.error(f"解析决策JSON失败: {e}")
-            if response:
-                logger.debug(f"原始响应: {response}")
-            return None
        except Exception as e:
            logger.error(f"决策过程失败: {e}", exc_info=True)
            return None
@@ -539,21 +538,7 @@ class ProactiveThinkingPlanner:
            logger.warning(f"获取表达方式失败: {e}")
            return ""

-    def _clean_json_response(self, response: str) -> str:
-        """清理LLM响应中的JSON格式标记"""
-        import re
-
-        cleaned = response.strip()
-        cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned, flags=re.MULTILINE | re.IGNORECASE)
-        cleaned = re.sub(r"\s*```$", "", cleaned, flags=re.MULTILINE)
-
-        json_start = cleaned.find("{")
-        json_end = cleaned.rfind("}")
-
-        if json_start != -1 and json_end != -1 and json_end > json_start:
-            cleaned = cleaned[json_start : json_end + 1]
-
-        return cleaned.strip()
+    # The custom _clean_json_response method was removed; use src.utils.json_parser.extract_and_parse_json instead


 # 全局规划器实例
--- a/src/plugins/built_in/affinity_flow_chatter/user_profile_tool.py
+++ b/src/plugins/built_in/affinity_flow_chatter/user_profile_tool.py
@@ -16,6 +16,7 @@ from src.common.logger import get_logger
 from src.config.config import global_config, model_config
 from src.llm_models.utils_model import LLMRequest
 from src.plugin_system import BaseTool, ToolParamType
+from src.utils.json_parser import extract_and_parse_json

 logger = get_logger("user_profile_tool")

@@ -269,9 +270,12 @@ class UserProfileTool(BaseTool):
                logger.warning("LLM未返回有效响应")
                return None

-            # 清理并解析响应
-            cleaned_response = self._clean_llm_json_response(llm_response)
-            response_data = orjson.loads(cleaned_response)
+            # Use the shared JSON parsing utility
+            response_data = extract_and_parse_json(llm_response, strict=False)
+            if not response_data or not isinstance(response_data, dict):
+                logger.error("LLM响应JSON解析失败")
+                logger.debug(f"LLM原始响应: {llm_response[:500] if llm_response else 'N/A'}")
+                return None

            # 提取最终决定的数据
            final_profile = {
@@ -285,11 +289,6 @@ class UserProfileTool(BaseTool):
            logger.debug(f"决策理由: {response_data.get('reasoning', '无')}")

            return final_profile
-
-        except orjson.JSONDecodeError as e:
-            logger.error(f"LLM响应JSON解析失败: {e}")
-            logger.debug(f"LLM原始响应: {llm_response if 'llm_response' in locals() else 'N/A'}")
-            return None
        except Exception as e:
            logger.error(f"LLM决策失败: {e}", exc_info=True)
            return None
@@ -336,35 +335,4 @@ class UserProfileTool(BaseTool):
            logger.error(f"更新用户画像到数据库失败: {e}", exc_info=True)
            raise

-    def _clean_llm_json_response(self, response: str) -> str:
-        """清理LLM响应，移除可能的JSON格式标记
-
-        Args:
-            response: LLM原始响应
-
-        Returns:
-            str: 清理后的JSON字符串
-        """
-        try:
-            import re
-
-            cleaned = response.strip()
-
-            # 移除 ```json 或 ``` 等标记
-            cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned, flags=re.MULTILINE | re.IGNORECASE)
-            cleaned = re.sub(r"\s*```$", "", cleaned, flags=re.MULTILINE)
-
-            # 尝试找到JSON对象的开始和结束
-            json_start = cleaned.find("{")
-            json_end = cleaned.rfind("}")
-
-            if json_start != -1 and json_end != -1 and json_end > json_start:
-                cleaned = cleaned[json_start:json_end + 1]
-
-            cleaned = cleaned.strip()
-
-            return cleaned
-
-        except Exception as e:
-            logger.warning(f"清理LLM响应失败: {e}")
-            return response
+    # The custom _clean_llm_json_response method was removed; use src.utils.json_parser.extract_and_parse_json instead
--- a/src/utils/json_parser.py
+++ b/src/utils/json_parser.py
@@ -0,0 +1,240 @@
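+"""
+Shared JSON parsing utility module.
+
+Provides a single way to parse JSON out of LLM responses, using the
+json_repair library for repairs, to simplify the code and improve the
+parse success rate.
+"""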
+
+import re
+from typing import Any
+
+import orjson
+from json_repair import repair_json
+
+from src.common.logger import get_logger
+
+logger = get_logger(__name__)
+
+
+def extract_and_parse_json(response: str, *, strict: bool = False) -> dict[str, Any] | list | None:
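+    """
+    Extract and parse JSON from an LLM response.
+
+    Processing strategy:
+    1. Strip Markdown code fence markers (```json and ```)
+    2. Extract the JSON object or array
+    3. Try to parse directly; if that fails, repair with json_repair
+    4. Parse into a Python object
+
+    Args:
+        response: The LLM response string.
+        strict: Strict mode. If True, return None when parsing fails;
+            otherwise fall back to an empty dict or empty list.
+
+    Returns:
+        The parsed dict or list. Returns None for an empty response, or
+        when parsing fails in strict mode.
+
+    Examples:
+        >>> extract_and_parse_json('```json\\n{"key": "value"}\\n```')
+        {'key': 'value'}
+
+        >>> extract_and_parse_json('Some text {"key": "value"} more text')
+        {'key': 'value'}
+
+        >>> extract_and_parse_json('[1, 2, 3]')
+        [1, 2, 3]
+    """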
+    if not response:
+        logger.debug("空响应，无法解析 JSON")
+        return None
+    
+    try:
+        # Step 1: clean the response
+        cleaned = _clean_llm_response(response)
+        
+        if not cleaned:
+            logger.warning("清理后的响应为空")
+            return None
+        
+        # Step 2: try to parse directly
+        try:
+            result = orjson.loads(cleaned)
+            logger.debug(f"✅ JSON 直接解析成功，类型: {type(result).__name__}")
+            return result
+        except Exception as direct_error:
+            logger.debug(f"直接解析失败: {type(direct_error).__name__}: {direct_error}")
+        
+        # Step 3: repair with json_repair, then parse
+        try:
+            repaired = repair_json(cleaned)
+            
+            # repair_json may return a string or an already-parsed object
+            if isinstance(repaired, str):
+                result = orjson.loads(repaired)
+                logger.debug(f"✅ JSON 修复后解析成功（字符串模式），类型: {type(result).__name__}")
+            else:
+                result = repaired
+                logger.debug(f"✅ JSON 修复后解析成功（对象模式），类型: {type(result).__name__}")
+            
+            return result
+            
+        except Exception as repair_error:
+            logger.warning(f"JSON 修复失败: {type(repair_error).__name__}: {repair_error}")
+            
+            if strict:
+                logger.error(f"严格模式下解析失败，响应片段: {cleaned[:200]}")
+                return None
+            
+            # Last-resort fallback: return an empty dict or an empty list
+            if cleaned.strip().startswith("["):
+                logger.warning("返回空列表作为容错")
+                return []
+            else:
+                logger.warning("返回空字典作为容错")
+                return {}
+                
+    except Exception as e:
+        logger.error(f"❌ JSON 解析过程出现异常: {type(e).__name__}: {e}")
+        if strict:
+            return None
+        return {} if not response.strip().startswith("[") else []
+
+
+def _clean_llm_response(response: str) -> str:
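+    """
+    Clean an LLM response and extract its JSON part.
+
+    Steps:
+    1. Strip Markdown code fence markers (```json and ```)
+    2. Extract the first complete JSON object {...} or array [...]
+    3. Trim surrounding whitespace and newlines
+
+    Args:
+        response: The raw LLM response.
+
+    Returns:
+        The cleaned JSON string.
+    """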
+    if not response:
+        return ""
+    
+    cleaned = response.strip()
+    
+    # Strip Markdown code fence markers
+    # Matches ```json ... ``` or ``` ... ```
+    code_block_patterns = [
+        r"```json\s*(.*?)```",  # ```json ... ```
+        r"```\s*(.*?)```",      # ``` ... ```
+    ]
+    
+    for pattern in code_block_patterns:
+        match = re.search(pattern, cleaned, re.IGNORECASE | re.DOTALL)
+        if match:
+            cleaned = match.group(1).strip()
+            logger.debug(f"从 Markdown 代码块中提取内容，长度: {len(cleaned)}")
+            break
+    
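+    # Extract the JSON object or array.
+    # Objects {...} are preferred, except when the whole text is bracketed as
+    # an array: then the array is tried first, so that a top-level array such
+    # as [{"a": 1}, {"b": 2}] is not truncated to its first inner object.
+    bracket_pairs = [("{", "}"), ("[", "]")]
+    if cleaned.startswith("[") and cleaned.endswith("]"):
+        bracket_pairs.reverse()
+    for start_char, end_char in bracket_pairs:
+        start_idx = cleaned.find(start_char)
+        if start_idx != -1:
+            # Scan with bracket-depth matching to find the closing character
+            extracted = _extract_balanced_json(cleaned, start_idx, start_char, end_char)
+            if extracted:
+                logger.debug(f"提取到 {start_char}...{end_char} 结构，长度: {len(extracted)}")
+                return extracted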
+    
+    # No clear JSON structure was found; return the cleaned original content
+    logger.debug("未找到明确的 JSON 结构，返回清理后的原始内容")
+    return cleaned
+
+
+def _extract_balanced_json(text: str, start_idx: int, start_char: str, end_char: str) -> str | None:
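+    """
+    Extract a balanced JSON structure starting at the given position.
+
+    Scans forward while tracking bracket depth to find the matching closing
+    character, handling nesting and special characters inside strings.
+
+    Args:
+        text: The source text.
+        start_idx: Index of the opening character.
+        start_char: The opening character ({ or [).
+        end_char: The closing character (} or ]).
+
+    Returns:
+        The extracted JSON string, or None on failure.
+    """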
+    depth = 0
+    in_string = False
+    escape_next = False
+    
+    for i in range(start_idx, len(text)):
+        char = text[i]
+        
+        # Handle escape characters
+        if escape_next:
+            escape_next = False
+            continue
+        
+        if char == "\\":
+            escape_next = True
+            continue
+        
+        # Track whether we are inside a string
+        if char == '"':
+            in_string = not in_string
+            continue
+        
+        # Only count brackets outside of strings
+        if not in_string:
+            if char == start_char:
+                depth += 1
+            elif char == end_char:
+                depth -= 1
+                if depth == 0:
+                    # Found the matching closing character
+                    return text[start_idx : i + 1].strip()
+    
+    # No matching closing character was found
+    logger.debug(f"未找到匹配的 {end_char}，深度: {depth}")
+    return None
+
+
+def safe_parse_json(json_str: str, default: Any = None) -> Any:
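+    """
+    Parse JSON safely, returning a default value on failure.
+
+    Args:
+        json_str: The JSON string.
+        default: The value to return when parsing fails.
+
+    Returns:
+        The parsed result, or the default value.
+    """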
+    try:
+        # Use strict mode so that a failed parse yields None and `default` is actually returned
+        result = extract_and_parse_json(json_str, strict=True)
+        return result if result is not None else default
+    except Exception as e:
+        logger.warning(f"安全解析 JSON 失败: {e}")
+        return default
+
+
+def extract_json_field(response: str, field_name: str, default: Any = None) -> Any:
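+    """
+    Extract the value of a specific field from an LLM response.
+
+    Args:
+        response: The LLM response.
+        field_name: The field name.
+        default: The value to return when the field does not exist.
+
+    Returns:
+        The field value, or the default value.
+    """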
+    parsed = extract_and_parse_json(response, strict=False)
+    
+    if isinstance(parsed, dict):
+        return parsed.get(field_name, default)
+    
+    logger.warning(f"解析结果不是字典，无法提取字段 '{field_name}'")
+    return default
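The bracket matching used by `_extract_balanced_json` above can be tried in isolation. The sketch below is a simplified standalone variant, not the function from this commit: the name `balanced_span` is hypothetical, it infers the closing character from the opener, and it only treats a backslash as an escape while inside a string.

```python
from typing import Optional


def balanced_span(text: str, start: int) -> Optional[str]:
    """Return the balanced {...} or [...] span that begins at text[start]."""
    pairs = {"{": "}", "[": "]"}
    opener = text[start]
    if opener not in pairs:
        return None
    closer = pairs[opener]
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Inside a string: brackets are ignored, escapes are honoured.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    # The structure was never closed (for example, truncated output).
    return None
```

Because only one bracket kind is counted per call, an array of objects is returned whole when the scan starts at its `[`, and a brace inside a quoted value does not end the span early.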