…recovery Distinguish high availability (single-site clustering) from disaster recovery (multi-site failover), clarify that Mattermost supports active/passive DR only and does not support active/active deployments, and rename the "High Availability deployment" section to "Active/passive DR deployment" for accuracy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract the AWS-specific active/passive DR deployment steps from backup-disaster-recovery.rst into a new disaster-recovery-aws.rst subpage. The main page now links to it via toctree, keeping the overview page concise and making room for future platform-specific guides. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
📝 Walkthrough
Documentation restructured: the main disaster recovery guide was simplified to distinguish HA vs DR and introduce an active/passive DR approach; detailed AWS-specific active/passive DR implementation moved into a new platform-specific guide with end-to-end replication and failover steps.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant DNS as DNS
participant Primary as PrimaryRegion\n(App, RDS, S3, OpenSearch)
participant Secondary as SecondaryRegion\n(App replica, RDS replica, S3 replica, OpenSearch replica)
participant JobServer as JobServer
User->>DNS: Resolve app endpoint
DNS->>Primary: Direct traffic to Primary App nodes
User->>Primary: App requests (reads/writes)
Primary->>RDS: DB writes/reads
Primary->>S3: Object writes (replicated)
Primary->>OpenSearch: Index writes (replicated)
Note over Primary,Secondary: Continuous replication configured\n(RDS global cluster, S3 replication, OpenSearch CCR)
%% Failover initiation
alt Primary failure detected
Admin->>DNS: Switch endpoint to Secondary
DNS->>Secondary: Route users to Secondary App nodes
Admin->>RDS: Promote secondary as writer
Admin->>JobServer: Disable scheduler on Secondary until failover complete
Admin->>OpenSearch: Reverse replication direction, make indices writable
Secondary->>S3: Accept replicated objects / sync
Admin->>JobServer: Enable scheduler on Secondary
end
Actionable comments posted: 3
🧹 Nitpick comments (2)
source/deployment-guide/disaster-recovery-aws.rst (2)
95-117: Use `json` instead of `sh` for the IAM policy block.
This block is a JSON policy document, not a shell command. Language labelling should match content.
Suggested minimal diff
- .. code-block:: sh
+ .. code-block:: json
As per coding guidelines, "Require code fences or code directives to identify the language when practical."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/deployment-guide/disaster-recovery-aws.rst` around lines 95 - 117, The code block showing the IAM policy is labeled as a shell snippet ("code-block:: sh") but contains JSON; update the directive to "code-block:: json" so the IAM policy document is correctly identified and syntax-highlighted; locate the block that currently begins with code-block:: sh and change that directive to code-block:: json (the JSON policy object with keys "Version" and "Statement") to match the content.
7-10: Add a short prerequisites block before procedural steps.
This page jumps into execution quickly. A compact prerequisites list (AWS account access, region pair selected, existing Mattermost primary deployment, DNS ownership, OpenSearch/RDS permissions) would reduce operator error for novice admins.
As per coding guidelines, "List prerequisites clearly at the top of documentation sections."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/deployment-guide/disaster-recovery-aws.rst` around lines 7 - 10, Add a short "Prerequisites" block at the top of the Mattermost AWS disaster recovery guide (before the procedural steps that start with the current introductory paragraphs) listing required items: AWS account access and IAM permissions, chosen region pair for failover, an existing Mattermost primary deployment, control/ownership of DNS for failover updates, required OpenSearch/RDS permissions and backups, and any tooling/CLI versions; ensure the block uses a clear bullet list and a brief note about verifying backups and network connectivity so novice operators see these checks before executing the procedure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@source/deployment-guide/disaster-recovery-aws.rst`:
- Line 13: Fix the typo in the cross-reference sentence by replacing
"documenation" with "documentation" in the sentence that references the
Upgrading Mattermost in Kubernetes and High Availability Environments doc (the
string containing ":doc:`Upgrading Mattermost in Kubernetes and High
Availability Environments
</administration-guide/upgrade/upgrade-mattermost-kubernetes-ha>`"). Ensure the
corrected sentence reads "...see the ... documentation." and keep the rest of
the cross-reference unchanged.
- Around line 235-237: Duplicate curl command checking the _status for
posts_<DATE> appears twice; remove the redundant line so only one curl -H
'Content-Type: application/json' -u '<USERNAME>:<PASSWORD>'
'https://<HOSTNAME>/_plugins/_replication/posts_<DATE>/_status?pretty' remains,
preserving the Sample output line that follows and keeping steps atomic and
numbered.
- Line 175: Replace the incorrect curl credential separator and clarify the host
placeholder: change the curl -u argument from "username/password" to the
required "username:password" format and update the URL placeholder (e.g., use a
clearer <hostname[:port]> or <elasticsearch-host>) in the example command string
shown in the diff so readers can substitute a real host.
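As a minimal sketch of the corrected credential format, the following Python builds the same replication status request that the curl command issues, without sending it. The helper name `build_status_request` and any host, index, or credential values passed to it are hypothetical; only the `_plugins/_replication/<index>/_status?pretty` path and the `username:password` separator come from the comments above.

```python
import base64
import urllib.request


def build_status_request(hostname, index, username, password):
    """Build (but do not send) a cross-cluster replication status request.

    Hypothetical helper mirroring the curl command quoted in the review.
    """
    url = f"https://{hostname}/_plugins/_replication/{index}/_status?pretty"
    # HTTP Basic auth joins user and password with a colon, not a slash.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        url,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the actual check; the sketch stops short of that so it can be inspected without a reachable cluster.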
Newest code from mattermost has been published to preview environment for Git SHA 7fd420e
- Fix typo: "documenation" → "documentation"
- Fix curl credentials: "username/password" → "<USERNAME>:<PASSWORD>" and empty host placeholder
- Remove duplicate posts_<DATE> status curl command
- Change IAM policy code block language from sh to json
- Add prerequisites section to disaster-recovery-aws.rst
- Wrap HA vs DR explanation in a note directive

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Newest code from mattermost has been published to preview environment for Git SHA dde9aab |
Change the three SSO failover sub-sections from ~~~~ to ^^^^^ so they render as children of "Failover from Single Sign-On outage" in the sidebar TOC rather than at the same level. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Newest code from mattermost has been published to preview environment for Git SHA 995626c |
cc @mrckndt
| Set up in one data center
| --------------------------
|
| As a first step, set up Mattermost in a single data center. At a very basic high level, this would be something like below:
The description "At a very basic high level, this would be something like below" reads a little awkward, imo. Maybe something like "The following diagram illustrates a basic single data center architecture:" It reads a bit cleaner.
| .. tip::
|
|   All you need is a recent OpenSearch version with fine-grained access control enabled. Node-to-node encryption is automatically enabled once you enable fine-grained access control.
"All you need is a recent OpenSearch version with fine-grained access control enabled." This reads a bit weird because the bullet just above it also lists Elasticsearch 7.10 as supported. The tip should either acknowledge both or be scoped appropriately. I get what you're trying to do, and it's probably not a big deal. Maybe reword it like "if you are already running OpenSearch 2.x, all you need to do is turn on fine-grained access control and node-to-node encryption will enable automatically" so it reads a little more tip-like?
| For simplicity, let's say ``site1`` is primary, and ``site2`` is secondary. Therefore, OS in ``site1`` is the leader domain, and in ``site2`` is the follower. The follower pulls from the leader. To switch the direction where ``site2`` becomes leader, and ``site1`` becomes follower.
|
| 1. Remove the rule from ``site1`` > ``site 2`` in AWS Console. This will auto-pause the replication, but the indices in ``site2`` will still be read-only. Remove the replication rules for that.
"site 2" here versus "site2" everywhere else.
| S3 bucket is auto-replicated both ways
| ----------------------------------------
|
| There's nothing you need to do to ensure the S3 bucket is auto-replicating both ways.
Having a whole header for just one sentence is not wrong but, I don't know, it just seems kind of weird. Maybe make it a Note or Tip style section instead?
| .. tip::
|   Websockets will still point to the old data center even if you have switched DNS. You need to roll over each app node gradually to move those connections to the new data center. If all your nodes are down, no action is necessary and the clients will automatically re-connect to the new data center.
|
| The S3 bucket is replicated bi-directionally while the database and ES/OS is replicated uni-directionally.
Should we add a note or section here at the end on what to do when the disaster event or whatever is over? Even if it's just "perform these operations the same way to restore functionality back to the primary data center".
| ----------------------
|
| If the job scheduler is left running in the secondary region, it will pick up jobs and start running them. Therefore, set ``JobSettings.RunScheduler`` to ``false`` on all nodes in the secondary region. When a failover happens, you need to enable it for the new primary region, and deactivate it for the new secondary region.
Should we tell them how to do this, or are we assuming if they are this far they probably know how to change settings?
Summary
The old page was 90% an AWS guide. I've split the page into two: one general overview page on how we do DR at Mattermost and a sub page for the AWS guide. I've also added clarifications to the main page on what is supported and what isn't.
AI Summary
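- Extracts the AWS-specific active/passive DR deployment steps from `backup-disaster-recovery.rst` into a new `disaster-recovery-aws.rst` subpage
- Links the main page to it via `toctree`, keeping the overview concise and making room for future platform-specific guides
- … `master`)

Preview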
http://mattermost-docs-preview-pulls.s3-website-us-east-1.amazonaws.com/8893/deployment-guide/backup-disaster-recovery.html#
🤖 Generated with Claude Code