From 5db0d891632688423e1079e6e9b3f310d333b100 Mon Sep 17 00:00:00 2001
From: Yolanda Robla
Date: Tue, 14 Apr 2026 15:13:31 +0200
Subject: [PATCH] Explain proxyrunner session-aware backend pod routing

Users scaling backendReplicas > 1 had no explanation of why Redis is needed
or how the proxy runner routes sessions to backend pods.

- Add a "Session routing for backend replicas" subsection explaining that
  the proxy runner uses Redis to store session-to-pod mappings, what happens
  when a backend pod restarts, and why client-IP affinity alone is unreliable
  behind NAT
- Absorb the standalone SessionStorageWarning note into the new subsection
  so the backendReplicas implication is explicit
- Update Redis link to point directly to the horizontal scaling anchor

Closes #708

Co-Authored-By: Claude Sonnet 4.6
---
 docs/toolhive/guides-k8s/run-mcp-k8s.mdx | 41 ++++++++++++++++--------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/docs/toolhive/guides-k8s/run-mcp-k8s.mdx b/docs/toolhive/guides-k8s/run-mcp-k8s.mdx
index e0154331..5254899a 100644
--- a/docs/toolhive/guides-k8s/run-mcp-k8s.mdx
+++ b/docs/toolhive/guides-k8s/run-mcp-k8s.mdx
@@ -453,17 +453,40 @@ The proxy runner handles authentication, MCP protocol framing, and session
 management; it is stateless with respect to tool execution. The backend runs
 the actual MCP server and executes tools.
 
+### Session routing for backend replicas
+
+MCP connections are stateful: once a client establishes a session with a
+specific backend pod, all subsequent requests in that session must reach the
+same pod. When `backendReplicas > 1`, the proxy runner uses Redis to store a
+session-to-pod mapping so every proxy runner replica knows which backend pod
+owns each session.
+
+Without Redis, the proxy runner falls back to Kubernetes client-IP session
+affinity on the backend Service, which is unreliable behind NAT or shared egress
+IPs.
+If a backend pod is restarted or replaced, its entry in the Redis routing
+table is invalidated and the next request reconnects to an available pod;
+sessions are not automatically migrated between pods.
+
+:::note
+
+The `SessionStorageWarning` condition fires only when `spec.replicas > 1`
+(multiple proxy runner pods). It does not fire when only `backendReplicas > 1`,
+but Redis session storage is still strongly recommended in that case to ensure
+reliable per-session pod routing.
+
+:::
+
 Common configurations:
 
 - **Scale only the proxy** (`replicas: N`, omit `backendReplicas`): useful
   when auth and connection overhead is the bottleneck with a single backend.
 - **Scale only the backend** (omit `replicas`, `backendReplicas: M`): useful
-  when tool execution is CPU/memory-bound and the proxy is not a bottleneck. The
-  backend Deployment uses client-IP session affinity to route repeated
-  connections to the same pod - subject to the same NAT limitations as
-  proxy-level affinity.
+  when tool execution is CPU/memory-bound and the proxy is not a bottleneck.
+  Configure Redis session storage so the proxy runner can route requests to the
+  correct backend pod.
 - **Scale both** (`replicas: N`, `backendReplicas: M`): full horizontal scale.
-  Redis session storage is required when `replicas > 1`.
+  Redis session storage is required for reliable operation when `replicas > 1`,
+  and strongly recommended when `backendReplicas > 1`.
 
 ```yaml title="MCPServer resource"
 spec:
@@ -484,14 +507,6 @@ When running multiple replicas, configure
 across pods. If you omit `replicas` or `backendReplicas`, the operator defers
 replica management to an HPA or other external controller.
 
-:::note
-
-The `SessionStorageWarning` condition fires only when `spec.replicas > 1`.
-Scaling only the backend (`backendReplicas > 1`) does not trigger a warning, but
-backend client-IP affinity is still unreliable behind NAT or shared egress IPs.
-
-:::
-
 :::note[Connection draining on scale-down]
 
 When a proxy runner pod is terminated (scale-in, rolling update, or node