DIV-45 committed on
Commit cf33275 · 0 parents

Initial commit: Incident Copilot MCP
.gitignore ADDED
@@ -0,0 +1,31 @@
+ # Python bytecode and cache
+ __pycache__/
+ *.py[cod]
+ *.pyo
+
+ # Virtual environments
+ .venv/
+ venv/
+ .envrc
+
+ # Environment & secret files
+ .env
+ .env.*
+ *.env
+ *.env.*
+
+ # Editor / IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+
+ # OS files
+ .DS_Store
+ Thumbs.db
+
+ # Logs
+ *.log
+
+ # Misc
+ .python-version
README.md ADDED
@@ -0,0 +1,201 @@
+ # Incident & Error Copilot (MCP + Gradio)
+
+ Incident & Error Copilot is a small incident-triage assistant built for the MCP hackathon.
+ It uses **Model Context Protocol (MCP)** tools to orchestrate:
+
+ - Logs & metrics (Neon Postgres or synthetic data)
+ - Nebius Token Factory (LLM incident summarization)
+ - Modal (deep log analysis)
+ - Blaxel (sandbox diagnostics)
+ - ElevenLabs (text-to-speech voice summary)
+
+ The UI is a single Gradio app; all tools are exposed via a local MCP HTTP gateway.
+
+ ---
+
+ ## Architecture
+
+ - **Gradio app**: `app.py`
+   - Chat interface for incident descriptions / stack traces
+   - Buttons for:
+     - Play Voice Summary (ElevenLabs)
+     - Deep Log Analysis (Modal)
+     - Sandbox Health Check (Blaxel)
+     - Auto Triage Incident (agent flow combining all tools)
+   - Talks to MCP tools over HTTP using the official `mcp` Python client.
+
+ - **MCP gateway**: `run_gateway.py` + `mcp_servers/gateway.py`
+   - `mcp_servers/gateway.py` mounts 5 FastMCP servers under different paths:
+     - `/logs` → `mcp_servers/logs_server.py`
+     - `/voice` → `mcp_servers/voice_elevenlabs_server.py`
+     - `/nebius` → `mcp_servers/nebius_server.py`
+     - `/modal` → `mcp_servers/modal_server.py`
+     - `/blaxel` → `mcp_servers/blaxel_server.py`
+   - Each FastMCP server exposes a Streamable HTTP endpoint at `/mcp`.
+   - `run_gateway.py` starts a local uvicorn server on `http://127.0.0.1:8004` and
+     mounts the gateway app. The Gradio app calls:
+     - `http://127.0.0.1:8004/logs/mcp`
+     - `http://127.0.0.1:8004/voice/mcp`
+     - `http://127.0.0.1:8004/nebius/mcp`
+     - `http://127.0.0.1:8004/modal/mcp`
+     - `http://127.0.0.1:8004/blaxel/mcp`
+
+ - **Individual MCP servers** (all FastMCP, Python):
+   - `mcp_servers/logs_server.py`
+     - Tools: `get_logs`, `summarize_logs`
+     - Uses Neon Postgres when `NEON_DATABASE_URL` / `DATABASE_URL` is set; otherwise
+       falls back to a small in-memory synthetic log store.
+   - `mcp_servers/nebius_server.py`
+     - Tool: `nebius_incident_summary`
+     - Calls Nebius Token Factory (OpenAI-compatible) to produce a structured incident
+       summary (title, severity, impact, root cause, actions).
+   - `mcp_servers/modal_server.py`
+     - Tool: `deep_log_analysis`
+     - Sends logs to a Modal web endpoint specified by `MODAL_DEEP_ANALYSIS_URL` for
+       heavy / statistical analysis.
+   - `mcp_servers/voice_elevenlabs_server.py`
+     - Tools: `list_voices`, `generate_incident_summary_audio`
+     - Wraps the ElevenLabs TTS API and returns base64-encoded MP3 data.
+   - `mcp_servers/blaxel_server.py`
+     - Tool: `run_simple_diagnostic`
+     - Uses the Blaxel Python SDK to create/reuse a sandbox and run a small shell
+       command, returning stdout/stderr.
+
+ > Note: there is also a Blaxel-hosted unified MCP (`mcp-sgllk`) that mirrors these
+ > tools, but the demo flow is designed to work entirely with the local MCP gateway.
+
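Because every tool server shares one origin and differs only in its path prefix, the client-side endpoint URLs can be derived mechanically. A tiny, dependency-free sketch of that derivation (the base URL and mount names mirror `run_gateway.py` / `app.py` in this repo; the helper function itself is illustrative):

```python
# Derive the Streamable HTTP endpoint for each mounted FastMCP server.
# Base URL and mount names come from this repo; the helper is a sketch.
GATEWAY_BASE = "http://127.0.0.1:8004"
MOUNTS = ["logs", "voice", "nebius", "modal", "blaxel"]

def mcp_endpoint(name: str, base: str = GATEWAY_BASE) -> str:
    # Each server is mounted under /<name> and serves Streamable HTTP at /mcp
    return f"{base}/{name}/mcp"

for mount in MOUNTS:
    print(mcp_endpoint(mount))
```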
+ ---
+
+ ## Environment & dependencies
+
+ - **Python**: 3.11+
+ - **Dependencies**: see `requirements.txt` (Gradio, mcp, httpx, openai, modal,
+   blaxel, psycopg, uvicorn, python-dotenv, etc.).
+
+ Secrets & configuration are loaded via standard environment variables. For local
+ runs, they are typically provided via `.env.blaxel`:
+
+ - `NEON_DATABASE_URL` – Postgres/Neon connection string for logs (optional; if
+   absent, synthetic logs are used).
+ - `NEBIUS_API_KEY`, `NEBIUS_MODEL_ID`, `NEBIUS_BASE_URL` – Nebius Token Factory
+   API key and model config.
+ - `ELEVENLABS_API_KEY` – ElevenLabs TTS key.
+ - `MODAL_DEEP_ANALYSIS_URL` – Modal web endpoint for deep analysis.
+ - `MODAL_AUTH_TOKEN` – optional bearer token for Modal.
+ - `BL_API_KEY`, `BL_WORKSPACE` – Blaxel credentials (used by the Blaxel SDK and
+   also forwarded as headers in MCP requests; safe to omit for a pure local demo
+   if the SDK is already authenticated via the CLI).
+
+ An example (redacted) `.env.blaxel` is already present in this repo for reference –
+ replace it with your own keys if you fork.
+
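The Neon-vs-synthetic fallback described above boils down to a simple precedence check over the two connection-string variables; a minimal sketch (the helper name is illustrative, not the actual function in `logs_server.py`):

```python
import os

def resolve_database_url(env=None):
    """Prefer NEON_DATABASE_URL, then fall back to DATABASE_URL.

    Returning None means the logs server should use its in-memory
    synthetic log store instead of Postgres. Illustrative helper only.
    """
    env = os.environ if env is None else env
    return env.get("NEON_DATABASE_URL") or env.get("DATABASE_URL") or None

print(resolve_database_url({"DATABASE_URL": "postgresql://example/db"}))
```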
+ ---
+
+ ## Running locally (demo flow)
+
+ 1. **Install dependencies**
+
+    ```bash
+    cd mcp-action
+    pip install -r requirements.txt
+    ```
+
+ 2. **Populate secrets**
+
+    - Copy `.env.blaxel` (already present) or create your own with the variables
+      described above.
+
+ 3. **Start the MCP gateway** (Terminal 1)
+
+    ```bash
+    cd mcp-action
+    python run_gateway.py
+    ```
+
+    This runs uvicorn on `http://127.0.0.1:8004` and mounts all MCP servers.
+
+ 4. **Start the Gradio app** (Terminal 2)
+
+    ```bash
+    cd mcp-action
+    python app.py
+    ```
+
+ 5. **Open the UI**
+
+    - Visit the URL printed by Gradio (typically `http://127.0.0.1:7860`).
+
+ 6. **Exercise the tools**
+
+    - Paste an incident description or stack trace into the chat.
+    - Click:
+      - **Play Voice Summary (ElevenLabs)** – generates and plays an audio recap.
+      - **Deep Log Analysis (Modal)** – calls the deep analysis MCP and shows a
+        JSON summary plus a human-readable headline.
+      - **Sandbox Health Check (Blaxel)** – runs a simple command in a Blaxel
+        sandbox and prints stdout/stderr.
+      - **Auto Triage Incident (Agent)** – orchestrates logs → Nebius → Modal →
+        Blaxel and produces an end-to-end triage report.
+
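Every tool response from the gateway follows the same shape: FastMCP (with `json_response`) nests the tool's return value under a top-level `result` key, and `app.py` falls back to parsing the first text block as JSON when `structuredContent` is empty. That unwrapping logic, extracted as a standalone function (an illustrative helper, not part of the repo):

```python
import json
from typing import Any, Dict

def unwrap_tool_result(structured: Any, text: str = "") -> Dict[str, Any]:
    """Mirror app.py's fallback: prefer structured content, else parse a
    JSON text block; unwrap FastMCP's top-level "result" key if present."""
    data = structured if isinstance(structured, dict) else {}
    if not data and text:
        try:
            parsed = json.loads(text)
            data = parsed if isinstance(parsed, dict) else {}
        except ValueError:
            data = {}
    inner = data.get("result")
    return inner if isinstance(inner, dict) else data

print(unwrap_tool_result({"result": {"log_count": 3}}))  # → {'log_count': 3}
```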
+ ---
+
+ ## Deploying to Hugging Face Spaces (one-process, non-Docker)
+
+ Hugging Face Spaces can run both the MCP gateway and the Gradio app inside the
+ same container. One simple pattern is:
+
+ 1. Create a new **Gradio** Space and point it at this repository.
+ 2. In the Space **Settings → App** (or the "Advanced" start command field), set
+    the start command to:
+
+    ```bash
+    python run_gateway.py & python app.py
+    ```
+
+    This starts the uvicorn MCP gateway in the background, then launches the
+    Gradio UI in the foreground.
+
+ 3. Add the same `.env.blaxel` variables to the Space **Secrets / Environment
+    variables** tab so the MCP servers can reach Neon, Nebius, Modal, Blaxel and
+    ElevenLabs.
+
+ 4. Once the Space builds, open it and exercise the same buttons as in local
+    dev. The health of the MCP gateway can be seen in the Space logs.
+
+ > If you prefer Docker-based Spaces or other hosting, the same pattern applies:
+ > run `run_gateway.py` and `app.py` in the same container, expose only the
+ > Gradio port, and keep the MCP gateway internal to the container.
+
+ ---
+
+ ## Suggested demo script
+
+ A short flow that shows everything end-to-end:
+
+ 1. **Intro**
+    - Briefly explain the goal: MCP-powered incident triage across multiple infra
+      providers.
+
+ 2. **Chat-based incident summary**
+    - Paste a realistic stack trace / incident description.
+    - Show the Nebius-generated structured summary (severity, impact, root cause,
+      recommended actions).
+
+ 3. **Voice recap**
+    - Click **Play Voice Summary (ElevenLabs)** and play a few seconds of the
+      generated MP3.
+
+ 4. **Deep log analysis**
+    - Click **Deep Log Analysis (Modal)**.
+    - Highlight the log counts, severity distribution, latest error, etc.
+
+ 5. **Sandbox diagnostics**
+    - Click **Sandbox Health Check (Blaxel)**.
+    - Show the sandbox command output (e.g. `uname -a` and diagnostic text).
+
+ 6. **Agentic auto-triage**
+    - Click **Auto Triage Incident (Agent)**.
+    - Scroll through the markdown report, calling out how it orchestrates Logs →
+      Nebius → Modal → Blaxel and produces a concise, judge-friendly summary.
+
+ This is the flow intended for the hackathon submission and demo video.
app.py ADDED
@@ -0,0 +1,615 @@
+ import ast
+ import base64
+ import json
+ import os
+ import tempfile
+ from typing import Any, Dict, List
+
+ from dotenv import load_dotenv
+ import gradio as gr
+ from mcp import ClientSession, types as mcp_types
+ from mcp.client.streamable_http import streamablehttp_client
+
+
+ # Load local environment (for dev) including optional .env.blaxel with Blaxel creds
+ load_dotenv()
+ load_dotenv(".env.blaxel", override=False)
+
+
+ # All tools now talk to a local MCP gateway (uvicorn + mcp_servers.gateway)
+ _MCP_GATEWAY_BASE_URL = "http://127.0.0.1:8004"
+
+ _BL_API_KEY = os.getenv("BL_API_KEY")
+ _BL_WORKSPACE = os.getenv("BL_WORKSPACE")
+
+ _MCP_HEADERS: Dict[str, str] = {}
+ if _BL_API_KEY:
+     # Support both generic Authorization and Blaxel-specific header names (harmless for local)
+     _MCP_HEADERS["Authorization"] = f"Bearer {_BL_API_KEY}"
+     _MCP_HEADERS["X-Blaxel-Authorization"] = f"Bearer {_BL_API_KEY}"
+ if _BL_WORKSPACE:
+     _MCP_HEADERS["X-Blaxel-Workspace"] = _BL_WORKSPACE
+
+ LOGS_SERVER_URL = f"{_MCP_GATEWAY_BASE_URL}/logs/mcp"
+ VOICE_SERVER_URL = f"{_MCP_GATEWAY_BASE_URL}/voice/mcp"
+ NEBIUS_SERVER_URL = f"{_MCP_GATEWAY_BASE_URL}/nebius/mcp"
+ MODAL_SERVER_URL = f"{_MCP_GATEWAY_BASE_URL}/modal/mcp"
+ BLAXEL_SERVER_URL = f"{_MCP_GATEWAY_BASE_URL}/blaxel/mcp"
+
+
+ def _prepare_spoken_text(markdown: str) -> str:
+     """Convert the rich incident markdown into a concise, TTS-friendly summary.
+
+     - Strips markdown syntax like **, ``, and leading list markers.
+     - Collapses newlines into sentences.
+     """
+
+     # Remove bold/inline code markers
+     text = markdown.replace("**", "").replace("`", "")
+
+     # Remove leading list markers like "- "
+     lines = []
+     for line in text.splitlines():
+         stripped = line.lstrip()
+         if stripped.startswith("- "):
+             stripped = stripped[2:]
+         lines.append(stripped)
+
+     text = " ".join(lines)
+     # Normalize whitespace
+     text = " ".join(text.split())
+
+     # Add a small preface so it sounds more natural
+     return f"Here is a short incident recap: {text}".strip()
+
+
+ async def _summarize_logs_via_mcp(service: str = "recs-api", env: str = "prod") -> Dict[str, Any]:
+     async with streamablehttp_client(LOGS_SERVER_URL, headers=_MCP_HEADERS) as (read, write, _):
+         async with ClientSession(read, write) as session:
+             await session.initialize()
+             result = await session.call_tool(
+                 "summarize_logs",
+                 arguments={"service": service, "env": env},
+             )
+
+             # Prefer structured content but fall back to parsing JSON text
+             data: Dict[str, Any] = result.structuredContent or {}
+
+             if not data and getattr(result, "content", None):
+                 first_block = result.content[0]
+                 if isinstance(first_block, mcp_types.TextContent):
+                     try:
+                         data = json.loads(first_block.text)
+                     except Exception:
+                         data = {}
+
+             if not isinstance(data, dict):
+                 return {}
+             return data
+
+
+ async def _generate_voice_summary_via_mcp(text: str) -> str:
+     if not text.strip():
+         raise ValueError("No content available to synthesize.")
+
+     async with streamablehttp_client(VOICE_SERVER_URL, headers=_MCP_HEADERS) as (read, write, _):
+         async with ClientSession(read, write) as session:
+             await session.initialize()
+
+             voices_result = await session.call_tool("list_voices", arguments={})
+
+             # Prefer structuredContent but fall back to parsing unstructured JSON
+             data: Dict[str, Any] = voices_result.structuredContent or {}
+
+             if not data and getattr(voices_result, "content", None):
+                 first_block = voices_result.content[0]
+                 if isinstance(first_block, mcp_types.TextContent):
+                     try:
+                         data = json.loads(first_block.text)
+                     except Exception:
+                         data = {}
+
+             voices_list = data.get("voices") or data.get("data") or []
+             voice_id = None
+             if isinstance(voices_list, list) and voices_list:
+                 first_voice = voices_list[0]
+                 if isinstance(first_voice, dict):
+                     voice_id = first_voice.get("voice_id") or first_voice.get("id")
+             if not voice_id:
+                 raise RuntimeError("No ElevenLabs voices available from MCP server.")
+
+             audio_result = await session.call_tool(
+                 "generate_incident_summary_audio",
+                 arguments={"text": text, "voice_id": voice_id},
+             )
+
+             # Prefer structured JSON but fall back to parsing text blocks
+             audio_data: Dict[str, Any] = audio_result.structuredContent or {}
+             if not audio_data and getattr(audio_result, "content", None):
+                 first_block = audio_result.content[0]
+                 if isinstance(first_block, mcp_types.TextContent):
+                     try:
+                         audio_data = json.loads(first_block.text)
+                     except Exception:
+                         audio_data = {}
+
+             audio_b64 = audio_data.get("audio_base64")
+             if not audio_b64:
+                 # Try one more time to extract from text content (JSON or Python literal)
+                 if getattr(audio_result, "content", None):
+                     first_block = audio_result.content[0]
+                     if isinstance(first_block, mcp_types.TextContent):
+                         raw_text = first_block.text
+                         parsed: Dict[str, Any] = {}
+                         try:
+                             parsed = json.loads(raw_text)
+                         except Exception:
+                             try:
+                                 parsed_obj = ast.literal_eval(raw_text)
+                                 if isinstance(parsed_obj, dict):
+                                     parsed = parsed_obj
+                             except Exception:
+                                 parsed = {}
+
+                         if isinstance(parsed, dict):
+                             audio_b64 = parsed.get("audio_base64") or audio_b64
+                             if audio_b64:
+                                 audio_data = parsed
+
+             if not audio_b64:
+                 # Try to surface any error message returned by the MCP tool
+                 error_details: Dict[str, Any] = {}
+                 if audio_data:
+                     error_details["structured"] = audio_data
+
+                 if getattr(audio_result, "content", None):
+                     first_block = audio_result.content[0]
+                     if isinstance(first_block, mcp_types.TextContent):
+                         error_details["text"] = first_block.text
+
+                 raise RuntimeError(
+                     "No audio_base64 field returned from ElevenLabs MCP server. "
+                     f"Details: {error_details}"
+                 )
+
+             audio_bytes = base64.b64decode(audio_b64)
+             with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp:
+                 tmp.write(audio_bytes)
+                 path = tmp.name
+
+             return path
+
+
+ async def _get_logs_via_mcp(service: str = "recs-api", env: str = "prod") -> List[Dict[str, Any]]:
+     """Fetch raw logs via MCP (from Neon-backed or synthetic store)."""
+
+     async with streamablehttp_client(LOGS_SERVER_URL, headers=_MCP_HEADERS) as (read, write, _):
+         async with ClientSession(read, write) as session:
+             await session.initialize()
+             result = await session.call_tool(
+                 "get_logs",
+                 arguments={"service": service, "env": env},
+             )
+
+             raw = result.structuredContent
+
+             # FastMCP wraps json_response results under a "result" key.
+             logs: List[Dict[str, Any]] = []
+             if isinstance(raw, list):
+                 logs = raw
+             elif isinstance(raw, dict):
+                 maybe_list = raw.get("result") or raw.get("logs")
+                 if isinstance(maybe_list, list):
+                     logs = maybe_list
+
+             # If structuredContent is empty, fall back to parsing text (if present).
+             if not logs and getattr(result, "content", None):
+                 first_block = result.content[0]
+                 if isinstance(first_block, mcp_types.TextContent):
+                     try:
+                         parsed = json.loads(first_block.text)
+                         if isinstance(parsed, list):
+                             logs = parsed
+                         elif isinstance(parsed, dict):
+                             maybe_list = parsed.get("result") or parsed.get("logs")
+                             if isinstance(maybe_list, list):
+                                 logs = maybe_list
+                     except Exception:
+                         logs = []
+
+             return logs
+
+
+ async def _nebius_incident_summary_via_mcp(
+     user_message: str,
+     logs_summary_text: str,
+ ) -> Dict[str, Any]:
+     async with streamablehttp_client(NEBIUS_SERVER_URL, headers=_MCP_HEADERS) as (read, write, _):
+         async with ClientSession(read, write) as session:
+             await session.initialize()
+             result = await session.call_tool(
+                 "nebius_incident_summary",
+                 arguments={
+                     "user_description": user_message,
+                     "logs_summary": logs_summary_text,
+                 },
+             )
+             return result.structuredContent or {}
+
+
+ async def _modal_deep_analysis_via_mcp() -> str:
+     """Call the Modal MCP server for deep log analysis and pretty-print the result."""
+
+     logs = await _get_logs_via_mcp(service="recs-api", env="prod")
+
+     async with streamablehttp_client(MODAL_SERVER_URL, headers=_MCP_HEADERS) as (read, write, _):
+         async with ClientSession(read, write) as session:
+             await session.initialize()
+             result = await session.call_tool(
+                 "deep_log_analysis",
+                 arguments={"service": "recs-api", "env": "prod", "logs": logs},
+             )
+             raw = result.structuredContent or {}
+
+     # FastMCP with json_response=True wraps tool return values under
+     # a top-level "result" key. Unwrap to get the actual analysis dict.
+     if isinstance(raw, dict) and "result" in raw and isinstance(raw["result"], dict):
+         data: Dict[str, Any] = raw["result"]
+     else:
+         data = raw if isinstance(raw, dict) else {}
+
+     if not isinstance(data, dict):
+         return f"Deep analysis (Modal) returned non-dict result: {raw!r}"
+
+     # Build a concise human-readable headline from the analysis payload.
+     log_count = data.get("log_count")
+     sev_counts = data.get("severity_counts") or {}
+     top_region = data.get("top_region")
+     latest_error = data.get("latest_error") or {}
+
+     sev_parts = []
+     if isinstance(sev_counts, dict):
+         for sev, cnt in sev_counts.items():
+             sev_parts.append(f"{sev}={cnt}")
+     sev_str = ", ".join(sev_parts) if sev_parts else "no severity distribution available"
+
+     latest_err_msg = latest_error.get("message") if isinstance(latest_error, dict) else None
+     latest_err_ts = latest_error.get("timestamp") if isinstance(latest_error, dict) else None
+
+     headline_parts = []
+     if isinstance(log_count, int):
+         headline_parts.append(f"Analyzed {log_count} logs")
+     if sev_parts:
+         headline_parts.append(f"severity mix: {sev_str}")
+     if top_region:
+         headline_parts.append(f"top region: {top_region}")
+     headline = "; ".join(headline_parts) if headline_parts else "Deep log analysis summary"
+
+     if latest_err_msg:
+         if latest_err_ts:
+             headline += f". Latest error at {latest_err_ts}: {latest_err_msg}"
+         else:
+             headline += f". Latest error: {latest_err_msg}"
+
+     pretty = json.dumps(data, indent=2)
+     return (
+         f"**Deep Analysis (Modal)**\n\n{headline}\n\n"
+         f"```json\n{pretty}\n```"
+     )
+
+
+ async def _blaxel_run_diagnostic_via_mcp() -> str:
+     """Run a simple sandbox diagnostic via Blaxel MCP server."""
+
+     async with streamablehttp_client(BLAXEL_SERVER_URL, headers=_MCP_HEADERS) as (read, write, _):
+         async with ClientSession(read, write) as session:
+             await session.initialize()
+             result = await session.call_tool(
+                 "run_simple_diagnostic",
+                 arguments={
+                     # A slightly richer default command that demonstrates the
+                     # sandbox is actually running OS-level commands.
+                     "command": (
+                         "echo '[sandbox] incident diagnostics start' && "
+                         "uname -a && echo 'sandbox diagnostics ok'"
+                     ),
+                 },
+             )
+             raw = result.structuredContent
+
+     # FastMCP with json_response=True wraps tool return values under
+     # a top-level "result" key. Unwrap that so we get the actual
+     # diagnostics dict from the Blaxel MCP server.
+     if isinstance(raw, dict) and "result" in raw and isinstance(raw["result"], dict):
+         data: Dict[str, Any] = raw["result"]
+     else:
+         data = raw if isinstance(raw, dict) else {}
+
+     # If we still don't have structured data, surface any textual error
+     # returned by the MCP tool so the user can see what went wrong
+     # (e.g. auth issues, quota, etc.).
+     if not data and getattr(result, "content", None):
+         first_block = result.content[0]
+         if isinstance(first_block, mcp_types.TextContent):
+             text = first_block.text.strip()
+             return (
+                 "**Sandbox Diagnostics (Blaxel)**\n"
+                 "Blaxel MCP did not return structured diagnostics data. Raw response:\n\n"
+                 f"```text\n{text}\n```"
+             )
+
+     if not isinstance(data, dict):
+         return f"Diagnostics (Blaxel) returned non-dict result: {raw!r}"
+
+     stdout = str(data.get("stdout", "")).strip()
+     stderr = str(data.get("stderr", "")).strip()
+     exit_code = data.get("exit_code")
+
+     parts = ["**Sandbox Diagnostics (Blaxel)**"]
+     parts.append(f"Exit code: {exit_code}")
+     if stdout:
+         parts.append("\n**stdout:**\n")
+         parts.append(f"```\n{stdout}\n```")
+     if stderr:
+         parts.append("\n**stderr:**\n")
+         parts.append(f"```\n{stderr}\n```")
+     return "\n".join(parts)
+
+
+ async def _auto_triage_incident(history: List[Dict[str, Any]]) -> str:
+     """Agentic triage flow that orchestrates logs, Nebius, Modal, and Blaxel.
+
+     It reads the latest user incident description, pulls logs via the Logs MCP
+     (Neon-backed), runs Nebius for a structured incident summary, then runs
+     Modal deep analysis and Blaxel sandbox diagnostics. The result is a
+     markdown report describing the steps taken and the findings.
+     """
+
+     # Find the latest user message to triage
+     last_user = None
+     for msg in reversed(history or []):
+         if msg.get("role") == "user":
+             last_user = msg
+             break
+
+     if not last_user:
+         return (
+             "**Agent Triage Report**\n\n"
+             "No user incident description found yet. "
+             "Please describe an incident in the chat first."
+         )
+
+     user_message = str(last_user.get("content", ""))
+
+     steps: List[str] = []
+     steps.append("1. Read your latest incident description.")
+
+     logs_summary_text = "No logs summary available."
+     logs_data: Dict[str, Any] = {}
+     nebius_data: Dict[str, Any] = {}
+
+     # Step 2: Pull logs and build a brief summary
+     try:
+         logs_data = await _summarize_logs_via_mcp(service="recs-api", env="prod")
+         logs_summary_text = logs_data.get("summary", logs_summary_text)
+         steps.append(
+             "2. Pulled recent logs for `recs-api` (prod) from Neon via the Logs MCP server."
+         )
+     except Exception as exc:  # pragma: no cover - defensive
+         logs_summary_text = f"(Error fetching logs summary from MCP: {exc})"
+         steps.append("2. Attempted to pull logs from Neon but hit an error.")
+
+     # Step 3: Nebius incident summary
+     try:
+         nebius_data = await _nebius_incident_summary_via_mcp(
+             user_message=user_message,
+             logs_summary_text=logs_summary_text,
+         )
+         severity = nebius_data.get("severity", "Unknown")
+         steps.append(
+             f"3. Generated a structured incident summary using Nebius (severity: {severity})."
+         )
+     except Exception as exc:  # pragma: no cover - defensive
+         nebius_data = {}
+         steps.append(f"3. Nebius incident summarization failed: {exc}.")
+
+     # Step 4: Modal deep analysis
+     try:
+         modal_section = await _modal_deep_analysis_via_mcp()
+         steps.append(
+             "4. Ran deep log analysis with Modal over the same Neon-backed logs."
+         )
+     except Exception as exc:  # pragma: no cover - defensive
+         modal_section = f"Deep analysis (Modal) failed: {exc}"
+         steps.append("4. Attempted deep log analysis with Modal but hit an error.")
+
+     # Step 5: Blaxel sandbox diagnostics
+     try:
+         blaxel_section = await _blaxel_run_diagnostic_via_mcp()
+         steps.append(
+             "5. Executed sandbox diagnostics in a Blaxel VM to validate basic system health."
+         )
+     except Exception as exc:  # pragma: no cover - defensive
+         blaxel_section = f"Sandbox diagnostics (Blaxel) failed: {exc}"
+         steps.append("5. Attempted sandbox diagnostics with Blaxel but hit an error.")
+
+     # Format the Nebius summary section
+     if nebius_data:
+         title = nebius_data.get("title", "Incident Summary")
+         severity = nebius_data.get("severity", "Unknown")
+         impact = nebius_data.get("impact", "Not specified")
+         root_cause = nebius_data.get("root_cause", "Not specified")
+         actions = nebius_data.get("actions", [])
+         if isinstance(actions, list):
+             actions_text = "\n".join(f"- {a}" for a in actions)
+         else:
+             actions_text = str(actions)
+
+         nebius_section = (
+             f"**{title}** (severity: {severity})\n\n"
+             f"**Impact:** {impact}\n\n"
+             f"**Probable root cause:** {root_cause}\n\n"
+             "**Log summary (recs-api, prod):**\n"
+             f"{logs_summary_text}\n\n"
+             "**Recommended actions:**\n"
+             f"{actions_text}\n"
+         )
+     else:
+         nebius_section = (
+             "**Incident Summary:** Nebius summarization was not available.\n\n"
+             "**Log summary (recs-api, prod):**\n"
+             f"{logs_summary_text}\n"
+         )
+
+     steps_md = "\n".join(f"- {s}" for s in steps)
+
+     report = (
+         "**Agent Triage Report**\n\n"
+         "**Steps taken:**\n"
+         f"{steps_md}\n\n"
+         "---\n\n"
+         "### Incident Summary (Nebius)\n\n"
+         f"{nebius_section}\n\n"
+         "### Deep Log Analysis (Modal)\n\n"
+         f"{modal_section}\n\n"
+         "### Sandbox Diagnostics (Blaxel)\n\n"
+         f"{blaxel_section}\n"
+     )
+     return report
+
480
+
481
+ async def _chat_fn(message: str, history: List[Dict[str, Any]]) -> str:
482
+ try:
483
+ logs_data = await _summarize_logs_via_mcp(service="recs-api", env="prod")
484
+ logs_summary_text = logs_data.get("summary", "No logs summary available.")
485
+ except Exception as exc:
486
+ logs_summary_text = f"(Error fetching logs summary from MCP: {exc})"
487
+
488
+ try:
489
+ nebius_data = await _nebius_incident_summary_via_mcp(
490
+ user_message=message,
491
+ logs_summary_text=logs_summary_text,
492
+ )
493
+ except Exception as exc:
494
+ # Fall back to a simpler message if Nebius is unavailable
495
+ return (
496
+ "Thanks for the incident description.\n\n"
497
+ "Here is a synthetic log summary for service `recs-api` (prod):\n"
498
+ f"{logs_summary_text}\n\n"
499
+ f"(Nebius incident summary call failed: {exc})"
500
+ )
501
+
502
+ title = nebius_data.get("title", "Incident Summary")
503
+ severity = nebius_data.get("severity", "Unknown")
504
+ impact = nebius_data.get("impact", "Not specified")
505
+ root_cause = nebius_data.get("root_cause", "Not specified")
506
+ actions = nebius_data.get("actions", [])
507
+
508
+ if isinstance(actions, list):
509
+ actions_text = "\n".join(f"- {a}" for a in actions)
510
+ else:
511
+ actions_text = str(actions)
512
+
513
+ reply = (
514
+ f"**{title}** (severity: {severity})\n\n"
515
+ f"**Impact:** {impact}\n\n"
516
+ f"**Probable root cause:** {root_cause}\n\n"
517
+ "**Log summary (recs-api, prod):**\n"
518
+ f"{logs_summary_text}\n\n"
519
+ "**Recommended actions:**\n"
520
+ f"{actions_text}\n\n"
521
+ "This summary was generated via Nebius Token Factory. "
522
+ "You can click **Generate Voice Summary (ElevenLabs)** to hear an audio recap."
523
+ )
524
+ return reply
525
+
526
+
527
+ async def _voice_from_history(history: List[Dict[str, Any]]) -> str:
528
+ last_assistant = None
529
+ for msg in reversed(history or []):
530
+ if msg.get("role") == "assistant":
531
+ last_assistant = msg
532
+ break
533
+
534
    if not last_assistant:
        raise ValueError("No assistant messages found to synthesize.")

    content = str(last_assistant.get("content", ""))
    spoken_text = _prepare_spoken_text(content)
    return await _generate_voice_summary_via_mcp(spoken_text)


def build_interface() -> gr.Blocks:
    with gr.Blocks(
        title="Incident & Error Copilot",
    ) as demo:
        gr.Markdown("# Incident & Error Copilot", elem_classes=["incident-header"])
        gr.Markdown(
            "Enterprise incident assistant powered by MCP servers (logs, Nebius, Modal, Blaxel, ElevenLabs).",
            elem_classes=["incident-subheader"],
        )

        with gr.Row():
            with gr.Column(scale=3):
                chat = gr.ChatInterface(
                    fn=_chat_fn,
                    textbox=gr.Textbox(
                        placeholder="Describe an incident or paste an error stack trace...",
                        label="Incident description",
                    ),
                    title="Incident Copilot",
                )

            with gr.Column(scale=2):
                gr.Markdown("### Incident Tools")
                gr.Markdown(
                    "Use these MCP-backed tools to go deeper on the current incident.",
                )

                # Voice summary panel
                with gr.Group(elem_classes=["tool-panel"]):
                    gr.Markdown(
                        "**Play Voice Summary (ElevenLabs)** — listen to the latest assistant summary as audio.",
                    )
                    voice_button = gr.Button("Play Voice Summary (ElevenLabs)")
                    audio_player = gr.Audio(
                        label="Incident Voice Summary",
                        type="filepath",
                        interactive=False,
                    )

                # Agentic auto-triage panel
                with gr.Group(elem_classes=["tool-panel"]):
                    gr.Markdown(
                        "**Auto Triage Incident (Agent)** — runs logs, Nebius, Modal and Blaxel in sequence and produces a triage report.",
                    )
                    auto_button = gr.Button("Auto Triage Incident (Agent)")
                    auto_output = gr.Markdown(label="Agent Triage Report")

                # Modal deep analysis panel
                with gr.Group(elem_classes=["tool-panel"]):
                    gr.Markdown(
                        "**Deep Log Analysis (Modal)** — calls a Modal function for detailed log statistics and latest error context.",
                    )
                    modal_button = gr.Button("Deep Log Analysis (Modal)")
                    modal_output = gr.Markdown(label="Deep Analysis Result")

                # Blaxel sandbox diagnostics panel
                with gr.Group(elem_classes=["tool-panel"]):
                    gr.Markdown(
                        "**Sandbox Health Check (Blaxel)** — runs a lightweight command inside a Blaxel sandbox and shows its output.",
                    )
                    blaxel_button = gr.Button("Sandbox Health Check (Blaxel)")
                    blaxel_output = gr.Markdown(label="Diagnostics Output")

        voice_button.click(_voice_from_history, inputs=[chat.chatbot], outputs=[audio_player])
        auto_button.click(_auto_triage_incident, inputs=[chat.chatbot], outputs=[auto_output])
        modal_button.click(_modal_deep_analysis_via_mcp, inputs=None, outputs=[modal_output])
        blaxel_button.click(_blaxel_run_diagnostic_via_mcp, inputs=None, outputs=[blaxel_output])

    return demo


if __name__ == "__main__":
    demo = build_interface()
    demo.launch()  # For local dev; Hugging Face Spaces runs app.py directly, so this launches there too
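The `_voice_from_history` handler above keys off the last assistant turn in Gradio's "messages"-style chat history. A standalone sketch of that selection logic (the function name and sample history here are illustrative, not from app.py):

```python
from typing import Any, Dict, List

def last_assistant_content(history: List[Dict[str, Any]]) -> str:
    """Return the content of the most recent assistant turn."""
    last = None
    for msg in history:
        if msg.get("role") == "assistant":
            last = msg
    if last is None:
        raise ValueError("No assistant messages found to synthesize.")
    return str(last.get("content", ""))

history = [
    {"role": "user", "content": "DB timeouts in prod"},
    {"role": "assistant", "content": "Likely connection pool exhaustion."},
    {"role": "user", "content": "What next?"},
]
print(last_assistant_content(history))  # Likely connection pool exhaustion.
```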
mcp_servers/__init__.py ADDED
@@ -0,0 +1,7 @@
"""MCP servers for the Incident & Error Copilot.

Servers:
- logs_server: synthetic or Neon-backed application logs
- nebius_server: incident summarization via Nebius Token Factory
- modal_server: deep log analysis via a Modal web endpoint
- blaxel_server: sandboxed diagnostics via the Blaxel SDK
- voice_elevenlabs_server: audio summaries via ElevenLabs
Additional servers (status, runbooks) may be added here.
"""
mcp_servers/blaxel_server.py ADDED
@@ -0,0 +1,93 @@
"""Blaxel Sandboxed Diagnostics MCP server.

This MCP server uses the official Blaxel Python SDK to trigger simple
sandbox-based diagnostics. It expects that you have authenticated locally
(using `bl login`) or via environment variables like BL_API_KEY / BL_WORKSPACE.

Environment variables (one of these auth methods must be configured, as per
the Blaxel docs):
- BL_API_KEY and BL_WORKSPACE (recommended for remote deployment), or
- local Blaxel CLI login config.

Tools:
- run_simple_diagnostic: create or reuse a sandbox and run a shell command,
  returning stdout/stderr.

This provides a concrete, real integration point with Blaxel's infrastructure
without mocking.
"""

from __future__ import annotations

import os
from typing import Any, Dict

from blaxel.core import SandboxInstance
from mcp.server.fastmcp import FastMCP


mcp = FastMCP("BlaxelDiagnostics", json_response=True)


async def _get_or_create_sandbox(name: str = "incident-diagnostics") -> SandboxInstance:
    """Get an existing sandbox by name or create a new one.

    Uses the default image and region; tune this later via the Blaxel UI
    or by editing this server.
    """

    try:
        sandbox = await SandboxInstance.get(name)
    except Exception:
        # Per the Blaxel SDK docs, create() expects a single dict payload
        # with fields like name, image, memory, ports, region, etc.
        sandbox = await SandboxInstance.create(
            {
                "name": name,
                "image": "blaxel/base-image:latest",
                "memory": 2048,  # MB
                "ports": [],
            }
        )
    return sandbox


@mcp.tool()
async def run_simple_diagnostic(
    command: str = "echo 'diagnostic ok'",
    name: str = "incident-diagnostics",
) -> Dict[str, Any]:
    """Run a simple shell command inside a Blaxel sandbox.

    This is a generic diagnostic hook. In practice, you can pass commands
    like curl checks against upstream services, small Python scripts, etc.
    """

    sandbox = await _get_or_create_sandbox(name)

    # Execute the command in the sandbox using the Blaxel SDK's
    # process.exec API. We request waitForCompletion so that logs are
    # available directly on the returned process object.
    process = await sandbox.process.exec(
        {
            "name": "incident-diagnostics",
            "command": command,
            "waitForCompletion": True,
        }
    )

    # According to the docs, logs will contain stdout/stderr for the
    # completed process. For the purposes of this demo we surface logs
    # as stdout and assume exit_code 0 if no exception was raised.
    logs = getattr(process, "logs", "")

    return {
        "sandbox_name": name,
        "exit_code": 0,
        "stdout": logs,
        "stderr": "",
    }


if __name__ == "__main__":
    # The official SDK's FastMCP.run() takes no host/port kwargs; it reads
    # them from the server settings instead.
    mcp.settings.host = os.getenv("BL_SERVER_HOST", "0.0.0.0")
    mcp.settings.port = int(os.getenv("BL_SERVER_PORT", "8000"))
    mcp.run(transport="streamable-http")
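The tool above always normalizes the SDK's process object into a fixed result shape. That shaping step in isolation, with a plain object standing in for the Blaxel process (`to_diagnostic_result` and `FakeProcess` are illustrative names, not part of the server):

```python
from typing import Any, Dict

def to_diagnostic_result(name: str, process: Any) -> Dict[str, Any]:
    # exit_code 0 mirrors the demo assumption in run_simple_diagnostic:
    # no exception from the SDK means success.
    return {
        "sandbox_name": name,
        "exit_code": 0,
        "stdout": getattr(process, "logs", ""),
        "stderr": "",
    }

class FakeProcess:
    logs = "diagnostic ok\n"

result = to_diagnostic_result("incident-diagnostics", FakeProcess())
print(result["stdout"])  # diagnostic ok
```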
mcp_servers/gateway.py ADDED
@@ -0,0 +1,35 @@
from __future__ import annotations

import contextlib

from starlette.applications import Starlette
from starlette.routing import Mount

from mcp_servers.logs_server import mcp as logs_mcp
from mcp_servers.voice_elevenlabs_server import mcp as voice_mcp
from mcp_servers.nebius_server import mcp as nebius_mcp
from mcp_servers.modal_server import mcp as modal_mcp
from mcp_servers.blaxel_server import mcp as blaxel_mcp


@contextlib.asynccontextmanager
async def lifespan(app: Starlette):
    async with contextlib.AsyncExitStack() as stack:
        await stack.enter_async_context(logs_mcp.session_manager.run())
        await stack.enter_async_context(voice_mcp.session_manager.run())
        await stack.enter_async_context(nebius_mcp.session_manager.run())
        await stack.enter_async_context(modal_mcp.session_manager.run())
        await stack.enter_async_context(blaxel_mcp.session_manager.run())
        yield


app = Starlette(
    routes=[
        Mount("/logs", logs_mcp.streamable_http_app()),
        Mount("/voice", voice_mcp.streamable_http_app()),
        Mount("/nebius", nebius_mcp.streamable_http_app()),
        Mount("/modal", modal_mcp.streamable_http_app()),
        Mount("/blaxel", blaxel_mcp.streamable_http_app()),
    ],
    lifespan=lifespan,
)
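A client reaches any of these mounted servers over streamable HTTP. A minimal sketch with the official `mcp` client, assuming the gateway's default local port from run_gateway.py and FastMCP's default `/mcp` path under each mount (the helper names are illustrative):

```python
import asyncio

GATEWAY = "http://127.0.0.1:8004"

def endpoint(mount: str) -> str:
    # Each FastMCP streamable-HTTP app serves under <mount>/mcp.
    return f"{GATEWAY}{mount}/mcp"

async def list_log_tools() -> list[str]:
    # Imported lazily so the sketch can be read without the server running.
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    async with streamablehttp_client(endpoint("/logs")) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            return [tool.name for tool in result.tools]

# With the gateway up (python run_gateway.py):
# asyncio.run(list_log_tools())
```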
mcp_servers/logs_server.py ADDED
@@ -0,0 +1,192 @@
"""Logs MCP server exposing synthetic incident logs via MCP tools.

Run locally, for example:
    python -m mcp_servers.logs_server
Then connect from an MCP-compatible client using the HTTP transport.
"""

from __future__ import annotations

import os
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Literal, Optional

import psycopg
from mcp.server.fastmcp import FastMCP


mcp = FastMCP("IncidentLogs", json_response=True)


@dataclass
class LogEntry:
    timestamp: str
    service: str
    env: Literal["prod", "staging", "dev"]
    severity: Literal["DEBUG", "INFO", "WARN", "ERROR", "CRITICAL"]
    message: str
    region: str


_NEON_DSN = os.getenv("NEON_DATABASE_URL") or os.getenv("DATABASE_URL")
_neon_conn: Optional[psycopg.Connection] = None


def _get_neon_conn() -> Optional[psycopg.Connection]:
    """Return a cached Neon/Postgres connection if a DSN is configured.

    If neither NEON_DATABASE_URL nor DATABASE_URL is set, returns None and
    the synthetic in-memory log store is used instead.
    """

    global _neon_conn
    if not _NEON_DSN:
        return None
    if _neon_conn is None or _neon_conn.closed:
        _neon_conn = psycopg.connect(_NEON_DSN)
    return _neon_conn


# Very small synthetic log store for now; we can expand later.
_LOGS: List[LogEntry] = [
    LogEntry(
        timestamp="2025-11-25T22:10:00Z",
        service="recs-api",
        env="prod",
        severity="ERROR",
        message="Timeout while calling upstream model backend (eu-west-1)",
        region="eu-west-1",
    ),
    LogEntry(
        timestamp="2025-11-25T22:11:30Z",
        service="recs-api",
        env="prod",
        severity="WARN",
        message="Latency spike detected: p95=9800ms for /predict",
        region="eu-west-1",
    ),
    LogEntry(
        timestamp="2025-11-25T22:12:10Z",
        service="recs-api",
        env="prod",
        severity="INFO",
        message="Auto-scaler requested 2 additional model replicas",
        region="eu-west-1",
    ),
]


@mcp.tool()
def get_logs(
    service: str,
    env: Literal["prod", "staging", "dev"] = "prod",
    severity: Optional[Literal["DEBUG", "INFO", "WARN", "ERROR", "CRITICAL"]] = None,
    region: Optional[str] = None,
) -> list[dict]:
    """Fetch recent logs for a given service/environment.

    If a Neon/Postgres DSN is configured, this queries the `incident_logs`
    table; otherwise it falls back to the in-memory synthetic store.
    """

    conn = _get_neon_conn()
    if conn is not None:
        where_clauses = ["service = %s", "env = %s"]
        params: List[Any] = [service, env]

        if severity is not None:
            where_clauses.append("severity = %s")
            params.append(severity)
        if region is not None:
            where_clauses.append("region = %s")
            params.append(region)

        where_sql = " AND ".join(where_clauses)
        sql = (
            "SELECT timestamp, service, env, severity, message, region "
            "FROM incident_logs "
            f"WHERE {where_sql} "
            "ORDER BY timestamp DESC LIMIT 200"
        )

        with conn.cursor() as cur:
            cur.execute(sql, params)
            rows = cur.fetchall()

        results: List[Dict[str, Any]] = []
        for ts, svc, env_val, sev, msg, reg in rows:
            if isinstance(ts, datetime):
                ts_val = ts.isoformat()
            else:
                ts_val = str(ts)
            results.append(
                {
                    "timestamp": ts_val,
                    "service": svc,
                    "env": env_val,
                    "severity": sev,
                    "message": msg,
                    "region": reg,
                }
            )
        return results

    # Fallback: synthetic in-memory log store
    def _matches(entry: LogEntry) -> bool:
        if entry.service != service:
            return False
        if entry.env != env:
            return False
        if severity is not None and entry.severity != severity:
            return False
        if region is not None and entry.region != region:
            return False
        return True

    return [e.__dict__ for e in _LOGS if _matches(e)]


@mcp.tool()
def summarize_logs(service: str, env: Literal["prod", "staging", "dev"] = "prod") -> dict:
    """Summarize recent logs for a service.

    Returns counts by severity and a short human-readable summary.
    """

    conn = _get_neon_conn()
    if conn is not None:
        # Reuse get_logs to unify the result shape.
        relevant_dicts = get_logs(service=service, env=env)
    else:
        relevant_dicts = [e.__dict__ for e in _LOGS if e.service == service and e.env == env]

    counts: dict[str, int] = {}
    for e in relevant_dicts:
        sev = e.get("severity")
        if sev is None:
            continue
        counts[sev] = counts.get(sev, 0) + 1

    if not relevant_dicts:
        summary = "No logs found for this service/env in the demo store."
    else:
        latest = max(relevant_dicts, key=lambda e: e.get("timestamp", ""))
        summary = (
            f"Latest log at {latest.get('timestamp')}: [{latest.get('severity')}] "
            f"{latest.get('message')} (region={latest.get('region')})."
        )

    return {"counts": counts, "summary": summary}


if __name__ == "__main__":
    # Expose as a streamable HTTP MCP server.
    # When running on Blaxel MCP hosting, BL_SERVER_HOST/BL_SERVER_PORT
    # are provided by the platform. Locally we default to 0.0.0.0:8000.
    # FastMCP reads host/port from its settings rather than run() kwargs.
    mcp.settings.host = os.getenv("BL_SERVER_HOST", "0.0.0.0")
    mcp.settings.port = int(os.getenv("BL_SERVER_PORT", "8000"))
    mcp.run(transport="streamable-http")
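The Neon branch of `get_logs` builds its WHERE clause incrementally so that optional filters only add a clause and a bind parameter when provided, keeping user values out of the SQL string. That assembly in isolation (`build_logs_query` is an illustrative standalone name):

```python
from typing import Any, List, Optional, Tuple

def build_logs_query(
    service: str,
    env: str = "prod",
    severity: Optional[str] = None,
    region: Optional[str] = None,
) -> Tuple[str, List[Any]]:
    clauses = ["service = %s", "env = %s"]
    params: List[Any] = [service, env]
    if severity is not None:
        clauses.append("severity = %s")
        params.append(severity)
    if region is not None:
        clauses.append("region = %s")
        params.append(region)
    sql = (
        "SELECT timestamp, service, env, severity, message, region "
        "FROM incident_logs WHERE " + " AND ".join(clauses) +
        " ORDER BY timestamp DESC LIMIT 200"
    )
    return sql, params

sql, params = build_logs_query("recs-api", severity="ERROR")
print(params)  # ['recs-api', 'prod', 'ERROR']
```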
mcp_servers/modal_server.py ADDED
@@ -0,0 +1,84 @@
"""Modal Deep Analysis MCP server.

This server forwards deep log analysis requests to a real Modal web endpoint
that you deploy.

Environment variables:
- MODAL_DEEP_ANALYSIS_URL: required. HTTPS URL of a Modal web endpoint
  (e.g. https://modal-labs-example--deep-analysis.modal.run).
- MODAL_AUTH_TOKEN: optional. If set, sent as an
  `Authorization: Bearer <MODAL_AUTH_TOKEN>` header.

The Modal endpoint is expected to accept JSON like:

    {
        "service": "recs-api",
        "env": "prod",
        "logs": [ ... ]
    }

and return a JSON object with deep analysis results, which this MCP server
returns directly to clients.
"""

from __future__ import annotations

import os
from typing import Any, Dict, List, Optional

import httpx
from mcp.server.fastmcp import FastMCP


mcp = FastMCP("ModalDeepAnalysis", json_response=True)


def _get_modal_url() -> str:
    url = os.getenv("MODAL_DEEP_ANALYSIS_URL")
    if not url:
        raise RuntimeError(
            "MODAL_DEEP_ANALYSIS_URL is not set. Set it to your Modal web "
            "endpoint URL for deep log analysis."
        )
    return url


@mcp.tool()
async def deep_log_analysis(
    service: str,
    env: str = "prod",
    logs: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Run deep log analysis using a deployed Modal function.

    This calls the Modal web endpoint specified by MODAL_DEEP_ANALYSIS_URL.
    The endpoint should perform heavy/statistical analysis and return a JSON
    object summarizing findings (spikes, time windows, correlations, etc.).
    """

    url = _get_modal_url()
    headers: Dict[str, str] = {"Content-Type": "application/json"}
    token = os.getenv("MODAL_AUTH_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"

    payload = {
        "service": service,
        "env": env,
        "logs": logs or [],
    }

    async with httpx.AsyncClient() as client:
        resp = await client.post(url, json=payload, headers=headers, timeout=60)
        resp.raise_for_status()
        data = resp.json()

    if not isinstance(data, dict):
        # Normalize non-dict responses
        return {"result": data}
    return data


if __name__ == "__main__":
    mcp.settings.host = os.getenv("BL_SERVER_HOST", "0.0.0.0")
    mcp.settings.port = int(os.getenv("BL_SERVER_PORT", "8000"))
    mcp.run(transport="streamable-http")
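The bearer token handling above only attaches an `Authorization` header when MODAL_AUTH_TOKEN is present. The same assembly as a pure function, easy to test without a network (`modal_headers` is an illustrative name):

```python
from typing import Dict, Optional

def modal_headers(token: Optional[str]) -> Dict[str, str]:
    headers = {"Content-Type": "application/json"}
    if token:
        # Only authenticated deployments get the bearer header.
        headers["Authorization"] = f"Bearer {token}"
    return headers

print(modal_headers(None))       # {'Content-Type': 'application/json'}
print(modal_headers("tf_demo"))  # adds 'Authorization': 'Bearer tf_demo'
```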
mcp_servers/nebius_server.py ADDED
@@ -0,0 +1,116 @@
"""Nebius Token Factory MCP server.

Provides tools that call the real Nebius Token Factory OpenAI-compatible API
for incident reasoning.

Environment variables:
- NEBIUS_API_KEY: required. Nebius Token Factory API key.
- NEBIUS_MODEL_ID: optional. Model identifier (default: deepseek-ai/DeepSeek-R1-0528).
- NEBIUS_BASE_URL: optional. Override base URL (default: https://api.tokenfactory.nebius.com/v1/).
"""

from __future__ import annotations

import json
import os
from typing import Any, Dict, Optional

from mcp.server.fastmcp import FastMCP
from openai import AsyncOpenAI


mcp = FastMCP("NebiusIncident", json_response=True)


def _get_client() -> AsyncOpenAI:
    api_key = os.getenv("NEBIUS_API_KEY")
    if not api_key:
        raise RuntimeError(
            "NEBIUS_API_KEY environment variable is not set; "
            "set it to a valid Nebius Token Factory API key."
        )

    base_url = os.getenv("NEBIUS_BASE_URL", "https://api.tokenfactory.nebius.com/v1/")
    return AsyncOpenAI(api_key=api_key, base_url=base_url)


def _get_model_id() -> str:
    return os.getenv("NEBIUS_MODEL_ID", "deepseek-ai/DeepSeek-R1-0528")


@mcp.tool()
async def nebius_incident_summary(
    user_description: str,
    logs_summary: Optional[str] = None,
) -> Dict[str, Any]:
    """Call the Nebius LLM to generate a structured incident summary.

    Returns a JSON object with fields like:
    - title
    - severity
    - impact
    - root_cause
    - actions (list of recommended steps)
    """

    client = _get_client()
    model = _get_model_id()

    system_prompt = (
        "You are an SRE / incident management assistant. "
        "Given an incident description and optional logs summary, "
        "produce a concise, structured JSON incident report."
    )

    user_content = {
        "user_description": user_description,
        "logs_summary": logs_summary,
    }

    messages = [
        {
            "role": "system",
            "content": system_prompt,
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": (
                        "Respond ONLY with a JSON object of the form "
                        "{\"title\": str, \"severity\": str, \"impact\": str, "
                        "\"root_cause\": str, \"actions\": [str]} based on this data: "
                        f"{json.dumps(user_content)}"
                    ),
                }
            ],
        },
    ]

    response = await client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.2,
        response_format={"type": "json_object"},
    )

    # In JSON mode, message.content should be a JSON string.
    content = response.choices[0].message.content
    if isinstance(content, str):
        try:
            data = json.loads(content)
        except json.JSONDecodeError:
            # Fall back to wrapping raw content if decoding fails.
            data = {"raw": content}
    else:
        # For safety if the SDK returns a different structure.
        data = {"raw": str(content)}

    return data


if __name__ == "__main__":
    mcp.settings.host = os.getenv("BL_SERVER_HOST", "0.0.0.0")
    mcp.settings.port = int(os.getenv("BL_SERVER_PORT", "8000"))
    mcp.run(transport="streamable-http")
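The parsing fallback at the end of `nebius_incident_summary` guarantees the tool always returns a dict even when the model ignores JSON mode. The same logic factored into a small testable function (`parse_llm_json` is an illustrative name, not part of the server):

```python
import json
from typing import Any, Dict

def parse_llm_json(content: Any) -> Dict[str, Any]:
    """Parse an LLM response as JSON, wrapping unparseable output under 'raw'."""
    if isinstance(content, str):
        try:
            return json.loads(content)
        except json.JSONDecodeError:
            return {"raw": content}
    # Non-string content (unexpected SDK shapes) is stringified defensively.
    return {"raw": str(content)}

print(parse_llm_json('{"title": "DB outage", "severity": "high"}'))
print(parse_llm_json("not json at all"))  # {'raw': 'not json at all'}
```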
mcp_servers/voice_elevenlabs_server.py ADDED
@@ -0,0 +1,107 @@
"""ElevenLabs Voice MCP server.

Exposes tools to list voices and generate audio summaries for incidents.

Requires the ELEVENLABS_API_KEY environment variable to be set.
"""

from __future__ import annotations

import base64
import os
from typing import Any, Dict

import httpx
from mcp.server.fastmcp import FastMCP


mcp = FastMCP("ElevenLabsVoice", json_response=True)

_BASE_URL = "https://api.elevenlabs.io/v1"


def _get_api_key() -> str:
    api_key = os.getenv("ELEVENLABS_API_KEY")
    if not api_key:
        raise RuntimeError(
            "ELEVENLABS_API_KEY environment variable is not set; "
            "set it before using the ElevenLabs MCP server."
        )
    return api_key


@mcp.tool()
async def list_voices() -> dict:
    """List available ElevenLabs voices.

    Wraps GET /v1/voices and returns the JSON payload.
    """

    api_key = _get_api_key()
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"{_BASE_URL}/voices",
            headers={"xi-api-key": api_key},
            timeout=20,
        )
        resp.raise_for_status()
        data = resp.json()

    # Ensure we always return a plain dict for structured MCP output
    if isinstance(data, dict):
        return data
    return {"voices": data}


@mcp.tool()
async def generate_incident_summary_audio(
    text: str,
    voice_id: str,
    model_id: str = "eleven_turbo_v2",
    bitrate: str = "128k",
) -> Dict[str, Any]:
    """Generate an audio summary for an incident description or report.

    Returns base64-encoded MP3 data plus basic metadata. The client app
    (e.g. the Gradio UI) can decode this into an audio player.

    Note: `bitrate` is currently informational only; it is echoed in the
    result but not passed to the ElevenLabs API.
    """

    api_key = _get_api_key()
    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.7,
        },
    }

    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{_BASE_URL}/text-to-speech/{voice_id}",
            headers={
                "xi-api-key": api_key,
                "Accept": "audio/mpeg",
                "Content-Type": "application/json",
            },
            json=payload,
            timeout=60,
        )
        resp.raise_for_status()
        audio_bytes = resp.content

    audio_b64 = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "audio_base64": audio_b64,
        "format": "mp3",
        "model_id": model_id,
        "voice_id": voice_id,
        "bitrate": bitrate,
    }


if __name__ == "__main__":
    # Expose as a streamable HTTP MCP server.
    mcp.settings.host = os.getenv("BL_SERVER_HOST", "0.0.0.0")
    mcp.settings.port = int(os.getenv("BL_SERVER_PORT", "8000"))
    mcp.run(transport="streamable-http")
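On the client side, the tool's `audio_base64` field has to be decoded back into bytes before Gradio's filepath-based `gr.Audio` can play it. A sketch of that decoding step; the helper name and the fake payload are illustrative:

```python
import base64
import tempfile

def save_audio(result: dict) -> str:
    """Decode the tool's base64 payload into a temp file and return its path."""
    audio_bytes = base64.b64decode(result["audio_base64"])
    suffix = "." + result.get("format", "mp3")
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(audio_bytes)
        return f.name

# Fake payload standing in for a real tool result.
fake = {"audio_base64": base64.b64encode(b"ID3demo").decode("ascii"), "format": "mp3"}
path = save_audio(fake)
print(path.endswith(".mp3"))  # True
```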
modal_deep_analysis_app.py ADDED
@@ -0,0 +1,92 @@
"""Modal app exposing a deep_log_analysis web endpoint.

This is called by the Modal MCP server (mcp_servers/modal_server.py).
It expects a JSON body of the form:

    {
        "service": "recs-api",
        "env": "prod",
        "logs": [
            {"timestamp": "...", "service": "...", "env": "...",
             "severity": "ERROR", "message": "...", "region": "..."},
            ...
        ]
    }

and returns a JSON object with some aggregate stats and a short summary.
"""

from __future__ import annotations

from collections import Counter
from typing import Any, Dict, List

import modal


# Web endpoints using modal.fastapi_endpoint require FastAPI to be installed
# explicitly in the container image.
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

app = modal.App("incident-deep-analysis", image=image)


@app.function()
@modal.fastapi_endpoint(method="POST", docs=True)
def deep_log_analysis(payload: Dict[str, Any]) -> Dict[str, Any]:
    service = payload.get("service")
    env = payload.get("env")
    logs: List[Dict[str, Any]] = payload.get("logs") or []

    # Basic stats over the logs we received
    severity_counts: Counter[str] = Counter()
    regions: Counter[str] = Counter()
    latest_error: Dict[str, Any] | None = None

    for entry in logs:
        sev = str(entry.get("severity", "UNKNOWN"))
        severity_counts[sev] += 1
        region = str(entry.get("region", "unknown"))
        regions[region] += 1

        if sev in {"ERROR", "CRITICAL"}:
            # Track the last high-severity entry we encounter while iterating.
            latest_error = entry

    top_region, top_region_count = (None, 0)
    if regions:
        top_region, top_region_count = regions.most_common(1)[0]

    summary_lines = [
        f"Deep analysis for service '{service}' in env '{env}' over {len(logs)} log entries."
    ]

    if severity_counts:
        parts = [f"{sev}={count}" for sev, count in severity_counts.items()]
        summary_lines.append("Severity distribution: " + ", ".join(parts) + ".")

    if latest_error is not None:
        summary_lines.append(
            "Latest high-severity event: "
            f"[{latest_error.get('severity')}] {latest_error.get('message')} "
            f"at {latest_error.get('timestamp')} (region={latest_error.get('region')})."
        )

    if top_region is not None:
        summary_lines.append(
            f"Region with most activity: {top_region} ({top_region_count} events)."
        )

    summary = " ".join(summary_lines)

    return {
        "service": service,
        "env": env,
        "log_count": len(logs),
        "severity_counts": dict(severity_counts),
        "top_region": top_region,
        "top_region_count": top_region_count,
        "latest_error": latest_error,
        "summary": summary,
    }
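The core of the endpoint above is just two `Counter` passes over the submitted entries. The aggregation on its own, with a small sample payload (the sample data is illustrative):

```python
from collections import Counter

logs = [
    {"severity": "ERROR", "region": "eu-west-1"},
    {"severity": "WARN", "region": "eu-west-1"},
    {"severity": "ERROR", "region": "us-east-1"},
]

severity_counts = Counter(str(e.get("severity", "UNKNOWN")) for e in logs)
regions = Counter(str(e.get("region", "unknown")) for e in logs)
top_region, top_region_count = regions.most_common(1)[0]

print(dict(severity_counts))        # {'ERROR': 2, 'WARN': 1}
print(top_region, top_region_count) # eu-west-1 2
```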
requirements.txt ADDED
@@ -0,0 +1,10 @@
gradio
mcp
httpx
pydantic
python-dotenv
openai
modal
blaxel
psycopg[binary]
starlette
uvicorn[standard]
run_gateway.py ADDED
@@ -0,0 +1,18 @@
import os

from dotenv import load_dotenv

# Load local env and .env.blaxel BEFORE importing the gateway: the MCP
# server modules read configuration (e.g. NEON_DATABASE_URL) at import time,
# so loading afterwards would leave them unconfigured.
load_dotenv()
load_dotenv(".env.blaxel", override=False)

import uvicorn

from mcp_servers.gateway import app


if __name__ == "__main__":
    host = os.getenv("MCP_GATEWAY_HOST", "127.0.0.1")
    # Default to 8004 locally to avoid conflicts with other services
    port = int(os.getenv("MCP_GATEWAY_PORT", "8004"))

    uvicorn.run(app, host=host, port=port)
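The host/port resolution above follows a simple env-with-defaults pattern. The same logic as a pure function over an explicit mapping, which makes the defaults easy to verify (`gateway_addr` is an illustrative name):

```python
def gateway_addr(env: dict) -> tuple:
    """Resolve gateway host/port from an env mapping, with local defaults."""
    host = env.get("MCP_GATEWAY_HOST", "127.0.0.1")
    port = int(env.get("MCP_GATEWAY_PORT", "8004"))
    return host, port

print(gateway_addr({}))                             # ('127.0.0.1', 8004)
print(gateway_addr({"MCP_GATEWAY_PORT": "9000"}))   # ('127.0.0.1', 9000)
```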