NVIDIA · namitdhameja · Apr 6, 2026 · Apr 17, 2026 · Apr 18, 2026 · Apr 18, 2026
diff --git a/docs/source/fault_tolerance/usage_guide.rst b/docs/source/fault_tolerance/usage_guide.rst
@@ -109,33 +109,91 @@ Validation behavior:
   - Other existing types (e.g., devices/symlinks): performs ``stat`` access
 
 
-Attribution service integration
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Attribution integration
+^^^^^^^^^^^^^^^^^^^^^^^
 
-Enable artifact analysis (e.g., logs) during rendezvous health checks by pointing to a running attribution service.
-The feature is enabled by specifying both host and port.
+Enable artifact analysis (e.g., logs) during rendezvous to make RESTART/STOP decisions.
+You can configure **one or more backends** (e.g., ``mcp`` for LogSage + flight-recorder (FR) analysis via MCP, plus an HTTP URL for a third-party service). The workload is stopped (not restarted) if **any** backend reports that it should not be restarted.
 
-* CLI:
+Use ``--ft-attribution-backend`` (repeatable) and/or YAML ``attribution_backends``.
+
+* ``mcp``: Log analysis via MCP subprocess (``nvrx-mcp-analysis``).
+* **HTTP URL** (no separate keyword): pass the URL as the flag value, e.g.
+  ``--ft-attribution-backend http://127.0.0.1:8000`` or ``--ft-attribution-backend host:port``
+  (``http://`` is added when you use ``host:port`` form).
 
-  - ``--ft-attrsvc-host <HOST>`` (alias: ``--ft_attrsvc_host``)
-  - ``--ft-attrsvc-port <PORT>`` (alias: ``--ft_attrsvc_port``)
+* CLI:
 
-  Example:
+  - ``--ft-attribution-backend`` (alias: ``--ft_attribution_backend``): Add one backend; repeat for multiple.
+    Each value is ``mcp`` or an HTTP URL. Combined with YAML ``attribution_backends``.
+  - ``--ft-attribution-timeout`` (alias: ``--ft_attribution_timeout``): Time to wait for an attribution result, in seconds;
+    the result is skipped if the timeout is exceeded (default: 60).
+  - ``--ft-attribution-dry-run`` (alias: ``--ft_attribution_dry_run``): Dry run. Run the full
+    attribution chain (log analysis, Slack, dataflow) but do not apply the restart/stop decision.
+    Log what would happen instead. Useful for validating the pipeline without affecting behavior.
+  - ``--ft-slack-token-file`` (alias: ``--ft_slack_token_file``): Path to file containing Slack bot token.
+    When not set, uses ``SLACK_BOT_TOKEN`` or ``SLACK_BOT_TOKEN_FILE`` env vars.
+  - ``--ft-slack-channel`` (alias: ``--ft_slack_channel``): Slack channel for alerts.
+    When not set, uses ``SLACK_CHANNEL`` env var.
+  - ``--ft-dataflow-index`` (alias: ``--ft_dataflow_index``): Elasticsearch/dataflow index for posting
+    attribution results (mcp/URL). Requires ``nvdataflow`` (install via ``pip install nvidia-resiliency-ext[dataflow]``).
+    When not set, dataflow posting is disabled.
+  - ``--ft-llm-api-key-file`` (alias: ``--ft_llm_api_key_file``): Path to a file containing the LLM API key.
+    Sets ``LLM_API_KEY_FILE`` in the process before MCP attribution starts. Overrides YAML ``llm_api_key_file`` when both are set.
+
+  Examples:
 
   .. code-block:: bash
 
-     ft_launcher \
-       --ft-attrsvc-host 127.0.0.1 \
-       --ft-attrsvc-port 8000 \
-       train.py
+     # MCP: log analysis via nvrx-mcp-analysis
+     ft_launcher --ft-attribution-backend mcp train.py
+
+     # URL mode (HTTP attribution service)
+     ft_launcher --ft-attribution-backend http://127.0.0.1:8000 train.py
+
+     # Service with custom timeout
+     ft_launcher --ft-attribution-backend http://127.0.0.1:8000 --ft-attribution-timeout 90 train.py
+
+     # MCP with Slack alerts (token from file; channel from the SLACK_CHANNEL env var)
+     ft_launcher --ft-attribution-backend mcp --ft-slack-token-file /etc/secrets/slack-token train.py
+
+     # MCP with explicit Slack channel and dataflow index
+     ft_launcher --ft-attribution-backend mcp \
+       --ft-slack-token-file /etc/secrets/slack-token --ft-slack-channel "#alerts" \
+       --ft-dataflow-index my-attribution-index train.py
+
+     # Dry run: exercise full attribution chain without applying restart/stop decision
+     ft_launcher --ft-attribution-backend mcp --ft-attribution-dry-run train.py
+
+     # Multiple backends: MCP plus third-party HTTP service
+     ft_launcher --ft-attribution-backend mcp --ft-attribution-backend http://127.0.0.1:8000 train.py
 
-* YAML: under the ``fault_tolerance`` section
+* YAML: under the ``fault_tolerance`` section use ``attribution_backends`` (list of ``mcp`` and/or URLs),
+  ``attribution_timeout_seconds``, ``slack``, ``dataflow_index``, and optional ``llm_api_key_file``:
 
   .. code-block:: yaml
 
      fault_tolerance:
-       attrsvc_host: "127.0.0.1"
-       attrsvc_port: 8000
+       # Prefer explicit list for multiple backends:
+       attribution_backends:
+         - "mcp"
+         - "http://127.0.0.1:8000"
+       attribution_timeout_seconds: 60
+       attribution_dry_run: false           # true = run chain but don't apply action; log only
+       slack:
+         bot_token_file: "/etc/secrets/slack-token"  # or bot_token for inline (less secure)
+         channel: "#alerts"
+       dataflow_index: "my-attribution-index"       # optional; requires nvdataflow
+       llm_api_key_file: "/etc/secrets/llm-api-key"  # optional; sets LLM_API_KEY_FILE for MCP
+
+* Environment (fallback when CLI/YAML not set):
+
+  - ``SLACK_BOT_TOKEN`` or ``SLACK_BOT_TOKEN_FILE``: Slack bot token for mcp/URL alerts.
+  - ``SLACK_CHANNEL``: Slack channel for alerts.
+  - **LLM / LogSage API key** (MCP backend): ``LLM_API_KEY`` or ``LLM_API_KEY_FILE``, or default files
+    ``~/.llm_api_key`` / ``~/.config/nvrx/llm_api_key`` (see ``load_llm_api_key`` in
+    ``nvidia_resiliency_ext.attribution.api_keys``). For ``ft_launcher``, use YAML ``llm_api_key_file`` or
+    ``--ft-llm-api-key-file``.
 
 GPU Memory Reclaim
 ^^^^^^^^^^^^^^^^^^

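The backend rules described in the usage guide above (``mcp`` or an HTTP URL, ``http://`` added for the ``host:port`` form, CLI values combined with YAML, and a stop if any backend says not to restart) can be sketched as follows. This is a minimal illustration, not the launcher's implementation: `Verdict`, `normalize_backend`, `merge_backends`, and `combine` are hypothetical names, and the YAML-first ordering, de-duplication, and the default of RESTART when every result was skipped are assumptions.

```python
from enum import Enum
from typing import Iterable, List, Optional


class Verdict(Enum):
    """Hypothetical per-backend outcome; not a type from the library."""

    RESTART = "restart"
    STOP = "stop"


def normalize_backend(value: str) -> str:
    """Return 'mcp' unchanged, or an HTTP URL that carries a scheme.

    A bare ``host:port`` value gets ``http://`` prepended, as the guide states.
    """
    value = value.strip()
    if value == "mcp" or value.startswith(("http://", "https://")):
        return value
    return f"http://{value}"


def merge_backends(cli: Iterable[str], yaml_cfg: Iterable[str]) -> List[str]:
    """Combine YAML and CLI backends.

    Assumption: YAML entries come first and duplicates are dropped.
    """
    merged: List[str] = []
    for raw in [*yaml_cfg, *cli]:
        backend = normalize_backend(raw)
        if backend not in merged:
            merged.append(backend)
    return merged


def combine(verdicts: Iterable[Optional[Verdict]]) -> Verdict:
    """STOP if any backend reports STOP.

    ``None`` stands for a result skipped after the timeout. Assumption: with
    no STOP among the results, the workload is allowed to restart.
    """
    if any(v is Verdict.STOP for v in verdicts):
        return Verdict.STOP
    return Verdict.RESTART
```

For example, `merge_backends(["127.0.0.1:8000"], ["mcp"])` yields `["mcp", "http://127.0.0.1:8000"]`, and a single `Verdict.STOP` among the results is enough for `combine` to return `Verdict.STOP`.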
diff --git a/pyproject.toml b/pyproject.toml
@@ -52,6 +52,7 @@ setproctitle = ">=1.3.0"
 logsage = ">=0.1.7"
 grpcio = "^1.76.0"
 grpcio-tools = "^1.76.0"
+httpx = ">=0.24.0"
 protobuf = ">=4.22.0"
 
 [tool.poetry.scripts]

diff --git a/services/nvrx_attrsvc/ATTRSVC_SPEC.md b/services/nvrx_attrsvc/ATTRSVC_SPEC.md
@@ -94,10 +94,10 @@ Two layers: **library** (`nvidia_resiliency_ext.attribution`) and **service**
 **3.1 Environment variables** — Full table and defaults: **README.md** (source of truth).
 
 Summary:
-- Prefix **`NVRX_ATTRSVC_`** for service settings (see README for exceptions: NVIDIA
-  API key, Slack tokens, optional `NVIDIA_API_KEY_FILE` / file paths in `api_keys.py`).
-- **`NVIDIA_API_KEY`**: required for attribution; loaded in `config.setup()` after
-  logging — **empty/missing → log error and process exit**. Slack is optional.
+- Prefix **`NVRX_ATTRSVC_`** for service settings (see README for exceptions: LLM
+  API key, Slack tokens, optional `LLM_API_KEY_FILE` / file paths in `api_keys.py`).
+- **`LLM_API_KEY`** / **`LLM_API_KEY_FILE`**: required for attribution (or default key files);
+  loaded in `config.setup()` after logging — **empty/missing → log error and process exit**. Slack is optional.
 - LLM-related env vars are optional; unset → library defaults (`LogAnalyzerConfig`).
 - Rate limits: slowapi, `RATE_LIMIT_SUBMIT` / `RATE_LIMIT_ANALYZE` / `RATE_LIMIT_PREVIEW`.
 
@@ -144,7 +144,7 @@ Patterns tried in order (scheduler-agnostic where possible): `_(\d+)_date_`,
 --------------------------------------------------------------------------------
 
 **Startup (conceptual)**  
-Load `Settings` → configure logging → **require non-empty NVIDIA API key** → wire
+Load `Settings` → configure logging → **require non-empty LLM API key** → wire
 postprocessing (`configure`, poster, dataflow index, Slack) → construct
 `AttributionService` / **`Analyzer`** → background poll → Uvicorn. Optional cache
 import.

diff --git a/services/nvrx_attrsvc/README.md b/services/nvrx_attrsvc/README.md
@@ -24,8 +24,8 @@ pip install -e .
 
 # Run
 export NVRX_ATTRSVC_ALLOWED_ROOT=/path/to/logs
-# API key: set env var OR create ~/.nvidia_api_key file
-export NVIDIA_API_KEY=nvapi-...
+# API key: set env var OR create ~/.llm_api_key file
+export LLM_API_KEY=your-llm-api-key-here
 nvrx-attrsvc
 ```
 
@@ -57,11 +57,11 @@ Environment variables (prefix: `NVRX_ATTRSVC_`):
 | `NVRX_ATTRSVC_COMPUTE_TIMEOUT`  | Timeout for analysis in seconds                                                                                                                                                                               |
 | `NVRX_ATTRSVC_ANALYSIS_BACKEND` | `mcp` (subprocess MCP, default) or `lib` (in-process LogSage and flight-recorder analysis). Same setting for both; library behavior: **ARCHITECTURE.md §7**. Legacy env: `NVRX_ATTRSVC_LOG_ANALYSIS_BACKEND`. |
 
-**NVIDIA API Key** (required, checked in order):
-1. `NVIDIA_API_KEY` environment variable
-2. `NVIDIA_API_KEY_FILE` environment variable (path to file)
-3. `~/.nvidia_api_key` file
-4. `~/.config/nvrx/nvidia_api_key` file
+**LLM API Key** (required, checked in order — see `api_keys.load_llm_api_key`):
+1. `LLM_API_KEY` environment variable
+2. `LLM_API_KEY_FILE` environment variable (path to file)
+3. `~/.llm_api_key` file
+4. `~/.config/nvrx/llm_api_key` file
 
 **Slack Notifications** (optional; no `NVRX_ATTRSVC_` prefix):
 

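The four-step lookup order listed in the README hunk above can be sketched in Python as below. This is an illustration only and not the body of `api_keys.load_llm_api_key`: the function name `find_llm_api_key`, its `env` and `home` parameters (added so the lookup can be exercised without touching the real environment), and the choice to fall through to the default files when `LLM_API_KEY_FILE` points at an unreadable or empty file are all assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def _read_key(path: Path) -> Optional[str]:
    """Return the stripped file contents, or None if unreadable or empty."""
    try:
        text = path.read_text(encoding="utf-8").strip()
    except OSError:
        return None
    return text or None


def find_llm_api_key(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Optional[str]:
    """Hypothetical helper mirroring the documented lookup order."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    # 1. Key given directly in the environment.
    key = env.get("LLM_API_KEY", "").strip()
    if key:
        return key

    # 2. Path to a key file given in the environment.
    key_file = env.get("LLM_API_KEY_FILE", "").strip()
    if key_file:
        key = _read_key(Path(key_file).expanduser())
        if key:
            return key

    # 3. and 4. Default key files, in the documented order.
    for default in (home / ".llm_api_key", home / ".config" / "nvrx" / "llm_api_key"):
        key = _read_key(default)
        if key:
            return key
    return None
```

A caller that needs the fail-fast behavior described in the spec would treat a `None` result as fatal, as `config.setup()` does with `raise SystemExit(1)`.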
diff --git a/services/nvrx_attrsvc/config.py b/services/nvrx_attrsvc/config.py
@@ -262,14 +262,14 @@ def setup() -> Settings:
     logging.getLogger("nvidia_resiliency_ext.attribution.mcp_integration").setLevel(_root_lvl)
     logging.getLogger("uvicorn.access").setLevel(logging.WARNING)
 
-    from nvidia_resiliency_ext.attribution.api_keys import load_nvidia_api_key, load_slack_bot_token
+    from nvidia_resiliency_ext.attribution.api_keys import load_llm_api_key, load_slack_bot_token
 
-    nvidia_key = load_nvidia_api_key()
-    if not nvidia_key:
+    llm_key = load_llm_api_key()
+    if not llm_key:
         logger.error(
-            "NVIDIA API key not found or empty. Attribution requires a key. Set NVIDIA_API_KEY "
-            "or NVIDIA_API_KEY_FILE, or place a key in ~/.nvidia_api_key or "
-            "~/.config/nvrx/nvidia_api_key. Slack notifications remain optional (SLACK_BOT_TOKEN)."
+            "LLM API key not found or empty. Attribution requires a key. Set LLM_API_KEY or "
+            "LLM_API_KEY_FILE, or place a key in ~/.llm_api_key or ~/.config/nvrx/llm_api_key. "
+            "Slack notifications remain optional (SLACK_BOT_TOKEN)."
         )
         raise SystemExit(1)
 

diff --git a/services/nvrx_attrsvc/deploy/Dockerfile b/services/nvrx_attrsvc/deploy/Dockerfile
@@ -7,7 +7,7 @@
 #   docker run -d \
 #       -p 8000:8000 \
 #       -e NVRX_ATTRSVC_ALLOWED_ROOT=/data/logs \
-#       -e NVIDIA_API_KEY=nvapi-... \
+#       -e LLM_API_KEY=your-llm-api-key-here \
 #       -v /path/to/logs:/data/logs:ro \
 #       nvrx-attrsvc
 #

diff --git a/services/nvrx_attrsvc/deploy/kubernetes.yaml b/services/nvrx_attrsvc/deploy/kubernetes.yaml
@@ -4,7 +4,7 @@
 #   kubectl apply -f services/nvrx_attrsvc/deploy/kubernetes.yaml
 #
 # Prerequisites:
-#   - Create secret: kubectl create secret generic nvidia-api-key --from-literal=api-key=nvapi-...
+#   - Create secret: kubectl create secret generic llm-api-key --from-literal=api-key=your-llm-api-key-here
 #   - Ensure log volume is accessible (update hostPath as needed)
 #
 # Deployment considerations:
@@ -54,10 +54,10 @@ spec:
             - configMapRef:
                 name: nvrx-attrsvc-config
           env:
-            - name: NVIDIA_API_KEY
+            - name: LLM_API_KEY
               valueFrom:
                 secretKeyRef:
-                  name: nvidia-api-key
+                  name: llm-api-key
                   key: api-key
           volumeMounts:
             - name: logs

diff --git a/services/nvrx_attrsvc/deploy/nvrx-attrsvc.service b/services/nvrx_attrsvc/deploy/nvrx-attrsvc.service
@@ -7,7 +7,7 @@
 # Manual installation:
 #   1. Create venv:     python3 -m venv /opt/nvrx/venv
 #   2. Install:         /opt/nvrx/venv/bin/pip install -e services
-#   3. Create API key:  echo "nvapi-xxx" | sudo tee /etc/nvrx/nvidia_api_key
+#   3. Create API key:  echo "your-llm-api-key-here" | sudo tee /etc/nvrx/llm_api_key
 #   4. Copy service:    sudo cp nvrx-attrsvc.service /etc/systemd/system/
 #   5. Reload:          sudo systemctl daemon-reload
 #   6. Enable:          sudo systemctl enable nvrx-attrsvc

diff --git a/services/nvrx_attrsvc/deploy/run_attrsvc.sh b/services/nvrx_attrsvc/deploy/run_attrsvc.sh
@@ -6,7 +6,7 @@
 #
 # Required environment variables:
 #   NVRX_ATTRSVC_ALLOWED_ROOT - Root path for log files to analyze
-#   NVIDIA_API_KEY            - API key for LLM (or NVIDIA_API_KEY_FILE)
+#   LLM_API_KEY               - API key for LLM (or LLM_API_KEY_FILE)
 #
 # Optional environment variables:
 #   NVRX_ATTRSVC_PORT         - Listen port (default: 8000)
@@ -17,7 +17,7 @@
 #
 # Example:
 #   export NVRX_ATTRSVC_ALLOWED_ROOT=/lustre/logs
-#   export NVIDIA_API_KEY=nvapi-...
+#   export LLM_API_KEY=your-llm-api-key-here
 #   ./run_attrsvc.sh ~/nvrx_logs
 
 set -e
@@ -38,7 +38,7 @@ PID_FILE="${OUTPUT_DIR}/${PREFIX}_attrsvc.pid"
 validate_attrsvc_allowed_root || exit 1
 
 # Setup API key
-setup_nvidia_api_key || exit 1
+setup_llm_api_key || exit 1
 
 # Create output directory
 ensure_directory "${OUTPUT_DIR}" "logs directory" || exit 1

diff --git a/services/nvrx_attrsvc/deploy/slurm.sbatch b/services/nvrx_attrsvc/deploy/slurm.sbatch
@@ -18,9 +18,9 @@
 #   NVRX_ATTRSVC_ALLOWED_ROOT - Root path for log files to analyze
 #
 # API Key Options (in priority order):
-#   1. NVIDIA_API_KEY env var (direct key)
-#   2. NVIDIA_API_KEY_FILE env var (path to file containing key)
-#   3. Default: ~/.nvidia_api_key
+#   1. LLM_API_KEY env var (direct key)
+#   2. LLM_API_KEY_FILE env var (path to file containing key)
+#   3. Default: ~/.llm_api_key or ~/.config/nvrx/llm_api_key
 #
 # Example:
 #   NVRX_ATTRSVC_ALLOWED_ROOT=/lustre/logs sbatch --account=myaccount slurm.sbatch
@@ -48,7 +48,7 @@ export NVRX_ATTRSVC_NVDATAFLOW_PROJECT="${NVRX_ATTRSVC_NVDATAFLOW_PROJECT:-}"
 export NVRX_ATTRSVC_CLUSTER_NAME="${NVRX_ATTRSVC_CLUSTER_NAME:-${SLURM_CLUSTER_NAME:-unknown}}"
 
 # Setup API key
-setup_nvidia_api_key || exit 1
+setup_llm_api_key || exit 1
 
 # Install packages
 install_nvrx_packages "attrsvc"

diff --git a/services/scripts/README.md b/services/scripts/README.md
@@ -20,8 +20,8 @@ Shared shell scripts for deployment and monitoring.
 ```bash
 # Set required environment
 export NVRX_ATTRSVC_ALLOWED_ROOT=/lustre/logs
-# API key: set env var OR create ~/.nvidia_api_key file
-export NVIDIA_API_KEY=nvapi-...
+# API key: set env var OR create ~/.llm_api_key file
+export LLM_API_KEY=your-llm-api-key-here
 
 # Install, start, and manage
 ./scripts/run_services.sh install   # Install packages
@@ -50,10 +50,10 @@ sudo ./scripts/setup_systemd.sh start
 ### API Key
 
 The API key can be provided in multiple ways (checked in order):
-1. `NVIDIA_API_KEY` environment variable
-2. `NVIDIA_API_KEY_FILE` environment variable (path to key file)
-3. `~/.nvidia_api_key` file
-4. `~/.config/nvrx/nvidia_api_key` file
+1. `LLM_API_KEY` environment variable
+2. `LLM_API_KEY_FILE` environment variable (path to key file)
+3. `~/.llm_api_key` file
+4. `~/.config/nvrx/llm_api_key` file
 
 **Output files** (in `~/nvrx_logs/` by default):
 - `<timestamp>_attrsvc.log` - Attribution service stdout/stderr
@@ -129,7 +129,7 @@ Shared functions sourced by other scripts:
 
 | Function | Description |
 |----------|-------------|
-| `setup_nvidia_api_key` | Load API key from env, file, or default location |
+| `setup_llm_api_key` | Load LLM API key from env, file, or default location |
 | `install_nvrx_packages` | Install NVRX packages from local repo |
 | `validate_commands` | Check required commands exist |
 

diff --git a/services/scripts/build_enroot_image.sh b/services/scripts/build_enroot_image.sh
@@ -14,7 +14,7 @@
 #   # Run attribution service
 #   srun --container-image=/path/to/nvrx-services.sqsh \
 #        --container-env=NVRX_ATTRSVC_ALLOWED_ROOT=/data \
-#        --container-env=NVIDIA_API_KEY=${NVIDIA_API_KEY} \
+#        --container-env=LLM_API_KEY=${LLM_API_KEY} \
 #        --container-mounts=/path/to/logs:/data:ro \
 #        nvrx-attrsvc
 #
@@ -150,7 +150,7 @@ echo ""
 echo "  # Attribution service"
 echo "  srun --container-image=${OUTPUT_PATH} \\"
 echo "       --container-env=NVRX_ATTRSVC_ALLOWED_ROOT=/data \\"
-echo "       --container-env=NVIDIA_API_KEY=\${NVIDIA_API_KEY} \\"
+echo "       --container-env=LLM_API_KEY=\${LLM_API_KEY} \\"
 echo "       --container-mounts=/path/to/logs:/data:ro \\"
 echo "       nvrx-attrsvc"
 echo ""