
Add helper to install ceph cluster #13

Open
viktor-karpochev wants to merge 16 commits into main from vkarpochev/csi-ceph-testkit

Conversation


@viktor-karpochev (Contributor) commented Apr 23, 2026

Summary

  • Move reusable Rook/Ceph provisioning helpers into pkg/testkit so downstream module repositories can create a working Ceph-backed StorageClass without duplicating setup code.
  • Add Kubernetes helpers for Rook CephCluster / CephBlockPool, rook config overrides, Ceph credentials, csi-ceph connection/auth resources, CephStorageClass, and VolumeSnapshotClass support.
  • Keep storage-e2e focused on shared Ceph testkit utilities and update docs after the main documentation restructure.

Test Plan

  • go test ./pkg/...

Add Kubernetes helpers for CephCluster, CephBlockPool, rook-config-override,
Ceph credentials, CephClusterConnection/Authentication, CephStorageClass,
VolumeSnapshotClass, and OSD backing StorageClass resolution.

Add testkit.EnsureCephStorageClass, which orchestrates everything from module
enablement through to a working csi-ceph StorageClass, plus a csi-ceph e2e
test package.
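
As a rough usage sketch (hedged: the call shape and config fields below are
assumptions drawn from this description, not the actual API):

    // Hypothetical call shape; the real config fields and signature may differ.
    var cfg testkit.CephStorageClassConfig
    if err := testkit.EnsureCephStorageClass(ctx, cfg); err != nil {
        t.Fatalf("provisioning csi-ceph StorageClass: %v", err)
    }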

Signed-off-by: Viktor Karpochev <viktor.karpochev@flant.com>
Made-with: Cursor
@viktor-karpochev force-pushed the vkarpochev/csi-ceph-testkit branch from 4431dc7 to e4d5466 on April 23, 2026 at 09:51
viktor-karpochev and others added 14 commits on April 28, 2026 at 17:03
Move reusable Rook/Ceph provisioning and CRC toggling into storage-e2e so csi-ceph e2e can consume the shared testkit instead of carrying duplicated setup code.

Signed-off-by: Viktor Karpochev <viktor.karpochev@flant.com>
Made-with: Cursor
Keep storage-e2e focused on reusable Ceph testkit helpers while the csi-ceph repository owns its module-specific e2e suite.

Signed-off-by: Viktor Karpochev <viktor.karpochev@flant.com>
Made-with: Cursor
Keep the public Ceph helper comments aligned with the 10Gi OSD default and avoid referring to the old full 2x2 CRC matrix.

Signed-off-by: Viktor Karpochev <viktor.karpochev@flant.com>
Made-with: Cursor
Resolve the documentation restructure conflict while keeping the Ceph testkit helper docs aligned with the current tree.

Made-with: Cursor
Extend storage-e2e so callers can provision a CephFS-backed
CephStorageClass alongside the existing RBD path.

* New pkg/kubernetes/cephfilesystem.go with idempotent
  CreateCephFilesystem / WaitForCephFilesystemReady /
  DeleteCephFilesystem helpers (single replicated metadata pool +
  one replicated data pool, configurable failure domain and MDS
  active count). WaitForCephFilesystemReady accepts both
  status.phase=Ready and status.conditions[Ready]=True so it works
  across Rook revisions. Adds CephFSDataPoolFullName helper that
  encodes Rook's <fsName>-<dataPoolName> pool naming convention so
  callers can feed the right value into CephStorageClass.spec.cephFS.pool.

* pkg/testkit/ceph.go: CephStorageClassConfig grows a Type field
  ("RBD" default / "CephFS") plus CephFSName, CephFSDataPoolName,
  CephFS{Metadata,Data}Replicas, CephFSActiveMDSCount and
  CephFilesystemReadyTimeout knobs. EnsureCephStorageClass step 5
  now branches on Type to create the matching pool primitive, and
  step 8 wires the resulting CephStorageClass with rbd.pool or
  cephFS.{fsName,pool} accordingly. TeardownCephStorageClass deletes
  the right Rook primitive based on Type.

* New SkipClusterTeardown flag on CephStorageClassConfig: when
  several StorageClasses share one CephCluster, every teardown
  except the last one sets it to true so only the owning call
  removes the underlying CephCluster and rook-config-override.

* Re-export CephStorageClassTypeRBD / CephStorageClassTypeCephFS
  from the testkit package so suites don't have to import
  pkg/kubernetes just to set cfg.Type.

* docs/FUNCTIONS_GLOSSARY.md: documents the new CephFilesystem
  helpers, the CephFS branch of EnsureCephStorageClass, and the
  TeardownCephStorageClass + SkipClusterTeardown semantics.
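
A minimal sketch of the CephFS path, using only names introduced above
(values and the call shape are illustrative, not the real defaults):

    // Rook names the data pool "<fsName>-<dataPoolName>", which
    // CephFSDataPoolFullName encodes for CephStorageClass.spec.cephFS.pool.
    cfg := testkit.CephStorageClassConfig{
        Type:                 testkit.CephStorageClassTypeCephFS,
        CephFSName:           "e2e-fs", // illustrative
        CephFSDataPoolName:   "data0",  // pool becomes "e2e-fs-data0"
        CephFSActiveMDSCount: 1,
        SkipClusterTeardown:  true, // another StorageClass owns the CephCluster
    }
    if err := testkit.EnsureCephStorageClass(ctx, cfg); err != nil {
        t.Fatal(err)
    }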

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
…mplating

This bundles four related fixes that surfaced during csi-ceph e2e diagnosis,
all aimed at the same failure mode: a flapping Wi-Fi or unreliable bootstrap
network silently breaking a 50-minute test run.

1. modulePullOverride env templating
   - internal/config/overrides.go (+_test.go): ExpandEnvInModulePullOverride
     resolves ${VAR} placeholders in cluster_config.yml at config load time.
     CI sets one MODULE_IMAGE_TAG (e.g. "pr131" / "mr131") and points multiple
     modules at it without per-run YAML edits. Missing env fails fast with
     an explicit message so the wrong-image-pull confusion is gone.
   - Hooks in internal/cluster/cluster.go::LoadClusterConfig and
     pkg/cluster/cluster.go::loadClusterConfigFromPath after yaml.Unmarshal.
   - README.md documents the new ${VAR} form.
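
The expansion itself can be a thin wrapper over os.Expand; a sketch of the
fail-fast behaviour (the real ExpandEnvInModulePullOverride may match only
the ${VAR} form, which plain os.Expand does not enforce):

    func expandEnv(raw string) (string, error) {
        var missing []string
        out := os.Expand(raw, func(key string) string {
            v, ok := os.LookupEnv(key)
            if !ok {
                missing = append(missing, key)
            }
            return v
        })
        if len(missing) > 0 {
            // Fail fast with an explicit message instead of pulling the wrong image.
            return "", fmt.Errorf("modulePullOverride references unset env: %s",
                strings.Join(missing, ", "))
        }
        return out, nil
    }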

2. Bootstrap robustness on developer laptops
   - pkg/cluster/setup.go: pass FORCE_NO_PRIVATE_KEYS=true and
     USE_AGENT_WITH_NO_PRIVATE_KEYS=true into the dhctl install:main
     container so lib-connection stops trying to open /root/.ssh/id_rsa and
     authenticates only via the mounted ssh-agent socket. Fixes
     "extract config: Failed to read private keys from flags" with a
     passphrase-protected key.
   - pkg/cluster/vms.go: cloud-init now pins apt at mirror.yandex.ru and
     forces IPv4 so package_update + Docker install stop stalling on egress
     paths where archive.ubuntu.com is partially unreachable.
   - internal/config/env.go: extracted ApplyDefaults() out of
     ValidateEnvironment so suites that skip validation still get defaults
     for SSH_VM_USER / SSH_PRIVATE_KEY / etc.
   - pkg/cluster/cluster.go::CreateTestCluster now calls ApplyDefaults() and
     falls back to YAMLConfigFilenameDefaultValue on empty arg.
   - internal/cluster/cluster.go::GetKubeconfig falls back to clientcmd
     default loading rules (KUBECONFIG / ~/.kube/config, minified to the
     current context) when SSH retrieval fails and KUBE_CONFIG_PATH is
     unset.

3. SSH tunnel auto-reconnect
   - internal/infrastructure/ssh/client.go: both (*client).StartTunnel and
     (*jumpHostClient).StartTunnel now share runTunnelLoop driven by a
     tunnelDialer struct. When the underlying SSH session dies, dial fails
     with EOF; the loop emits a WARN, calls the existing reconnect() (which
     already has retry + exponential backoff), and retries the dial once
     with the rebuilt session. Without this a Wi-Fi flap killed the tunnel
     and every client-go GET silently returned EOF until the parent
     readiness timeout fired.
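
In sketch form (tunnelDialer's real fields and the forwarding plumbing are
elided; names follow the description above):

    func runTunnelLoop(d *tunnelDialer) error {
        for {
            conn, err := d.dial()
            if errors.Is(err, io.EOF) { // underlying SSH session died
                log.Printf("WARN: tunnel dial failed (%v); reconnecting", err)
                if rErr := d.reconnect(); rErr != nil { // existing retry + backoff
                    return rErr
                }
                conn, err = d.dial() // exactly one retry on the rebuilt session
            }
            if err != nil {
                return err
            }
            go d.forward(conn)
        }
    }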

4. Per-call deadline + visible WARN in Ceph readiness pollers
   - pkg/kubernetes/poll.go (new): pollResourceUntilReady centralizes our
     Wait*Ready loops. Each Get is bounded by PollGetTimeout (30s) so a hung
     TCP connect surfaces in seconds, and consecutive Get failures escalate
     to WARN once they cross 3 so the user sees the cluster connection is
     dying instead of waiting for the readyTimeout.
   - pkg/kubernetes/{cephcluster,cephblockpool,cephfilesystem}.go:
     WaitForCephClusterReady / WaitForCephBlockPoolReady /
     WaitForCephFilesystemReady migrated. Public signatures unchanged.
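
The shared poller reduces to roughly this (interval and logging are
simplified; the real helper's signature may differ):

    func pollResourceUntilReady(ctx context.Context, readyTimeout time.Duration,
        check func(ctx context.Context) (bool, error)) error {
        const pollGetTimeout = 30 * time.Second
        deadline := time.After(readyTimeout)
        failures := 0
        for {
            getCtx, cancel := context.WithTimeout(ctx, pollGetTimeout)
            ready, err := check(getCtx) // a hung TCP connect now fails in seconds
            cancel()
            switch {
            case err != nil:
                failures++
                if failures > 3 {
                    log.Printf("WARN: %d consecutive Get failures: %v", failures, err)
                }
            case ready:
                return nil
            default:
                failures = 0
            }
            select {
            case <-deadline:
                return fmt.Errorf("resource not ready after %s", readyTimeout)
            case <-time.After(5 * time.Second):
            case <-ctx.Done():
                return ctx.Err()
            }
        }
    }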

Docs:
- docs/WORKLOG.md: 2026-05-05 entries.
- docs/FUNCTIONS_GLOSSARY.md: updated descriptions for the three Wait*Ready
  helpers.
- docs/ARCHITECTURE.md: poll.go and cephfilesystem.go added to the package
  tree (Sections 1.1 and 3.6); overrides.go in Section 3.1.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
GetKubeconfig used to log a single info-level line when SSH retrieval of
admin.conf failed and we silently dropped to the developer's local
kubeconfig. In practice that hid a class of nasty bugs where tests were
acquiring stale locks on unrelated SAN clusters or installing modules
against the wrong stand because $KUBECONFIG happened to point elsewhere.

Make the fallback obvious:

* Tag every kubeconfig source path with a short label
  (SSH(...), KUBE_CONFIG_PATH=..., LOCAL_FALLBACK(...)).
* Promote the fallback message to logger.Warn, include the resolved
  current-context and cluster server URL, and tell the user how to
  fail-fast (unset KUBECONFIG, drop ~/.kube/config) if that behaviour
  is undesirable.
* Always print a final "Loaded kubeconfig (source=..., current-context=...,
  server=...)" line so the actual cluster is visible in test logs
  regardless of which resolution path fired.

The new kubeconfigContextSummary helper parses the serialized kubeconfig
through clientcmd.Load and degrades to "<unknown>" on any error so the
surrounding log line stays safe to print.
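
With clientcmd.Load from client-go, the helper is roughly (the body is an
assumption consistent with the description; only the name comes from this
commit):

    func kubeconfigContextSummary(raw []byte) (contextName, server string) {
        contextName, server = "<unknown>", "<unknown>"
        cfg, err := clientcmd.Load(raw) // k8s.io/client-go/tools/clientcmd
        if err != nil {
            return // degrade instead of breaking the surrounding log line
        }
        if cfg.CurrentContext != "" {
            contextName = cfg.CurrentContext
        }
        if kctx, ok := cfg.Contexts[cfg.CurrentContext]; ok {
            if cluster, ok := cfg.Clusters[kctx.Cluster]; ok {
                server = cluster.Server
            }
        }
        return
    }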

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
TeardownCephStorageClass now waits for each CR to be GC'd before
deleting its parent. Without that synchronization the parent
CephCluster could be deleted while a child CephBlockPool /
CephFilesystem is still alive, leaving Rook stuck with
DeletionIsBlocked / ObjectHasDependents and the cluster in
phase=Deleting indefinitely.

Adds:
- pollResourceUntilGone helper with periodic deletionTimestamp /
  finalizers progress logging, so a stuck finalizer surfaces
  immediately instead of after a silent timeout.
- WaitFor*Gone helpers for CephCluster, CephBlockPool,
  CephFilesystem, CephClusterAuthentication, CephClusterConnection,
  CephStorageClass with sensible per-CR default budgets.
- errIfTerminating guard in every Create* helper so an Ensure*
  call finds a Terminating CR and fails fast instead of issuing a
  silent no-op Update and trapping WaitFor*Ready for 15-20m.
- pollResourceUntilReady fail-fast on deletionTimestamp != nil for
  the same reason.

Fail-fast policy on Wait*Gone timeouts: errors are aggregated and
returned, no auto-strip of finalizers — that would mask real Rook
bugs. Operator must investigate the cluster manually before
re-running.
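
The gone-poller is the mirror image of pollResourceUntilReady; a sketch over
a generic getter (the real signature may differ):

    func pollResourceUntilGone(ctx context.Context, budget time.Duration,
        get func(ctx context.Context) (metav1.Object, error)) error {
        deadline := time.After(budget)
        for {
            obj, err := get(ctx)
            if apierrors.IsNotFound(err) {
                return nil // GC finished; safe to delete the parent
            }
            if err == nil {
                // Progress log: a stuck finalizer surfaces immediately.
                log.Printf("still terminating: deletionTimestamp=%v finalizers=%v",
                    obj.GetDeletionTimestamp(), obj.GetFinalizers())
            }
            select {
            case <-deadline:
                return fmt.Errorf("not gone after %s (finalizers are never auto-stripped)", budget)
            case <-time.After(5 * time.Second):
            case <-ctx.Done():
                return ctx.Err()
            }
        }
    }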

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
…l containers)

storage-e2e had no pod-exec helpers at all (pkg/kubernetes/pod.go
only covers WaitFor*Ready). Each downstream test suite was forced to
roll its own — see csi-ceph/e2e/tests/e2e_shared_test.go::execInPod
which wraps remotecommand.NewSPDYExecutor and only works on
containers that have cat (i.e. test probe pods, not the actual
distroless csi-controllers).

This commit lifts pod exec into the shared testkit so any module's
e2e suite can reuse it.

New file: pkg/kubernetes/pod_exec.go

- ExecInPod(ctx, kubeconfig, ns, pod, container, cmd) (stdout,
  stderr string, error). General SPDY exec on /pods/<name>/exec.
  Returns stdout/stderr SEPARATELY (the csi-ceph copy concatenates
  them and loses signal).
- ReadFileFromPod(...) — ExecInPod + cat <path>. For containers
  that ship a real userland.
- ReadFileFromDistrolessPod(..., opts ReadFileOptions) — adds a
  short-lived ephemeral container with TargetContainerName set,
  polls until it goes Running, then cat /proc/1/root<path>.
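
The exec core is the standard client-go SPDY pattern; a sketch taking a
clientset and rest.Config directly (the real helper builds both from its
kubeconfig argument):

    func execInPod(ctx context.Context, cs kubernetes.Interface, cfg *rest.Config,
        ns, pod, container string, cmd []string) (string, string, error) {
        req := cs.CoreV1().RESTClient().Post().
            Resource("pods").Namespace(ns).Name(pod).SubResource("exec").
            VersionedParams(&corev1.PodExecOptions{
                Container: container,
                Command:   cmd,
                Stdout:    true,
                Stderr:    true,
            }, scheme.ParameterCodec)
        exec, err := remotecommand.NewSPDYExecutor(cfg, "POST", req.URL())
        if err != nil {
            return "", "", err
        }
        var stdout, stderr bytes.Buffer // separate buffers: no concatenation
        err = exec.StreamWithContext(ctx, remotecommand.StreamOptions{
            Stdout: &stdout,
            Stderr: &stderr,
        })
        return stdout.String(), stderr.String(), err
    }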

The distroless path leans on Kubernetes Ephemeral Containers (GA
since 1.25). They're added through the dedicated
/pods/<name>/ephemeralcontainers subresource — NOT via the regular
pod PUT/PATCH path, which is why the apiserver explicitly allows
this mutation on a running pod and existing containers do NOT
restart. metadata.generation, spec.containers, pod sandbox UID
and ReplicaSet/DaemonSet observation all stay intact, so e2e
suites that subsequently assert on checksum/... annotations or
rollout state see a clean signal — the FS read does not
contaminate it.
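
The subresource call via client-go looks like this (image and container name
are illustrative, not the helper's actual defaults; cs, ns, podName and
targetContainer are assumed in scope):

    pod, err := cs.CoreV1().Pods(ns).Get(ctx, podName, metav1.GetOptions{})
    if err != nil {
        return err
    }
    pod.Spec.EphemeralContainers = append(pod.Spec.EphemeralContainers,
        corev1.EphemeralContainer{
            EphemeralContainerCommon: corev1.EphemeralContainerCommon{
                Name:    "e2e-reader",   // illustrative
                Image:   "busybox:1.36", // any image that ships cat
                Command: []string{"sleep", "60"},
            },
            // Join the target's PID namespace so /proc/1/root is its root FS.
            TargetContainerName: targetContainer,
        })
    // Dedicated subresource; a plain pod Update adding this would be rejected.
    _, err = cs.CoreV1().Pods(ns).UpdateEphemeralContainers(ctx, podName, pod,
        metav1.UpdateOptions{})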

Caveat documented in the doc-comment: ephemeral containers cannot
be removed once added; sleep 60 lets the cat process exit on its
own. For long-running test suites the entry just stays as
Terminated in pod.status.ephemeralContainerStatuses until the next
rollout recycles the pod.

docs/FUNCTIONS_GLOSSARY.md gets a new entry under the Pod
subsection listing the three primitives with selection guidance
(which to pick for distroless vs. shell-bearing containers).

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
ReadFileFromDistrolessPod was designed as one-shot: every call injects
a fresh ephemeral container, waits for the kubelet to launch it, runs
cat once, and exits. That's fine for diagnostics, but makes
Eventually-style polling loops painfully slow — each iteration pays
the full ephemeral-container cold-start cost (~10-20 s for kubelet to
launch a new container in the existing pod sandbox), so a "predicate
matches in 30 s" case can spend 2+ minutes inside the loop. A real
trace from the msCrcData matrix shows ~127 s for an rbd FS-poll that
should have settled in well under a minute.

This commit splits the helper into a session API:

- OpenDistrolessReader(...) injects ONE ephemeral container with a
  long sleep (default 30 minutes via opts.SessionTTL), waits for it
  to go Running, and returns a DistrolessReader bound to that
  ephemeral container.
- DistrolessReader.ReadFile(ctx, path) is just a pods/exec round-trip
  into the already-running ephemeral container — sub-second.
- ReadFileFromDistrolessPod is now a thin wrapper (open + read) for
  one-shot callers. Behaviour is unchanged from their perspective,
  but ReadFileOptions grows a SessionTTL field used by the session
  path.

Reader API is what callers running poll loops should use; the
single-shot helper stays for the one-shot diagnostics case.
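
Poll-loop usage then looks roughly like this (argument and return shapes are
assumptions from the description above):

    // Open once, pay the ephemeral-container cold start once.
    reader, err := kubernetes.OpenDistrolessReader(ctx, kubeconfig, ns, podName,
        targetContainer, kubernetes.ReadFileOptions{SessionTTL: 30 * time.Minute})
    if err != nil {
        t.Fatal(err)
    }
    // Each iteration is now a sub-second pods/exec round-trip.
    deadline := time.Now().Add(30 * time.Second)
    for time.Now().Before(deadline) {
        out, err := reader.ReadFile(ctx, "/etc/ceph/ceph.conf")
        if err == nil && strings.Contains(out, "ms_crc_data = true") {
            return
        }
        time.Sleep(time.Second)
    }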

The reader cannot outlive its target pod — there's no Close() because
Kubernetes does not allow removing an ephemeral container, and a pod
recycle (rollout) drops the entry along with the rest of the pod
status. Callers that need fresh sessions across pod identities should
re-open against the new pod (DistrolessReader.PodName helps detect
this).

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
…system.Ready

RestartCephDaemons used to rolling-restart only mon/mgr/osd, which
left two classes of state stuck on the pre-flip ms_crc_data:

  1. rook-ceph-mds: a CephFS daemon that talks to mons over the same
     messenger that ms_crc_data toggles CRC for. With mons on the new
     value and MDS still on the old one, the MDS↔mon channel silently
     desynchronises, CephFS goes degraded, and any csi-cephfs PVC
     hangs in Pending until somebody bounces MDS by hand. Reproduced
     reliably in the msCrcData matrix on cell `protocol=cephfs
     server=off client=off -> Bound`: PVC stuck for ~2 minutes,
     unstuck only after kubectl rollout restart of the d8-sds-elastic
     namespace.

  2. The rook-operator pod: itself a Ceph admin client that uses an
     in-pod ceph.conf rendered at startup. Without a pod restart it
     keeps using the stale ms_crc_data and can't talk to the freshly-
     bounced mons, surfacing as cephcluster CR phase=Ready /
     state=Error / `failed to get status. . timed out` until the next
     reconcile after operator pod recycle.

Fix:

  * Extend RestartCephDaemons selector to mon/mgr/osd/mds/rgw. rgw is
    pre-included for forward-compat with future S3 tests; absence is
    not an error.
  * Add RestartRookOperator helper that bounces the rook-operator
    Deployment and waits for Ready. Operator-Deployment name is
    derived from the namespace by stripping the leading `d8-` prefix
    (`d8-sds-elastic` → `sds-elastic`), matching how Deckhouse
    packages the operator binary as a per-module Helm release.
    Vanilla Rook (`rook-ceph-operator` in `rook-ceph` namespace) is
    not supported — storage-e2e targets the Deckhouse flavor
    exclusively. Returns a descriptive error if the namespace doesn't
    have the expected prefix or the derived Deployment isn't there.
  * Wire RestartRookOperator into SetMsCrcDataOnServer (after the
    daemon restart so the operator boots against fresh-config mons).
  * Gate the whole flip on every CephFilesystem in the namespace
    reaching Ready before returning. Catches the MDS-stuck-on-old-CRC
    class of bug at the source instead of letting it surface as a PVC
    timeout downstream. RBD-only clusters are a no-op (no
    CephFilesystem CRs to wait for).
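
The name derivation itself is small enough to pin down in a sketch (function
name and error text are illustrative):

    // "d8-sds-elastic" -> "sds-elastic"; vanilla Rook namespaces fail fast.
    func operatorDeploymentName(namespace string) (string, error) {
        name, ok := strings.CutPrefix(namespace, "d8-")
        if !ok {
            return "", fmt.Errorf("namespace %q has no d8- prefix: only the Deckhouse Rook flavor is supported", namespace)
        }
        return name, nil
    }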

Net cost: ~30s extra per flip (mds + operator restart). In return:
no manual kubectl rollout restart between matrix cells, no spurious
HEALTH_ERR on cephcluster CR, and CephFS PVCs stop hanging in Pending
when CRC flips back to a matched state.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
When SSH retrieval of /etc/kubernetes/{super-admin,admin}.conf from the
master fails and KUBE_CONFIG_PATH is not set, GetKubeconfig now fails fast
again instead of silently loading the developer's $KUBECONFIG /
~/.kube/config via clientcmd.NewDefaultClientConfigLoadingRules.

The fallback (added in e3d4e8d) was convenient on dev laptops but too risky
in CI and on machines whose `kubectl` already targets an unrelated cluster:
tests would silently deploy modules to / acquire cluster locks on the wrong
stand. Reverting preserves the original fail-fast contract that downstream
suites already relied on.

- internal/cluster/cluster.go: replace the default switch branch with a
  descriptive error pointing at KUBE_CONFIG_PATH and embedding the SSH
  error; drop loadDefaultKubeconfig and the now-unused clientcmdapi import.
- docs/WORKLOG.md: rewrite the 2026-05-05 GetKubeconfig bullet to reflect
  the final fail-fast behavior.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
Backfill the documentation that earlier commits in this branch should
have updated as they landed. No code changes.

FUNCTIONS_GLOSSARY.md:
- Pod section: documented OpenDistrolessReader and the three
  *DistrolessReader methods (PodName, EphemeralName, ReadFile) added
  alongside the single-shot ReadFileFromDistrolessPod helper.
- New sections: "Ceph CRC (Testkit)" (EnableServerCRC / DisableServerCRC
  / ResetServerCRCToDefault / SetMsCrcDataOnServer / RestartCephDaemons
  / RestartRookOperator) and "VolumeSnapshotClass"
  (CreateVolumeSnapshotClass / WaitForVolumeSnapshotClass).
- StorageClass section: documented CreateStorageClass (in
  pkg/kubernetes/storageclass_manage.go).
- Rook Config Override section: documented RenderCephGlobalConfig.
- Table of Contents: added missing entries for "Ceph Cluster (Testkit)
  - no csi-ceph wiring", "VolumeSnapshotClass", and "Ceph CRC (Testkit)".

ARCHITECTURE.md:
- Section 1.1 (Package Structure): added internal/config/overrides.go
  (was only listed in 3.1) and pkg/kubernetes/pod_exec.go.
- Section 3.6 (Public API): added pkg/kubernetes/pod_exec.go.
- Section 7 (Environment Variables): documented the new fail-fast
  KUBE_CONFIG_PATH semantics and the generic ${VAR} expansion in
  modulePullOverride (e.g. MODULE_IMAGE_TAG).

WORKLOG.md:
- 2026-05-05: backfilled entries for pod_exec.go, DistrolessReader,
  the WaitFor*Gone family + Create-time deletionTimestamp guards,
  TeardownCephStorageClass rewrite, RestartCephDaemons selector
  extension (mds/rgw), RestartRookOperator, SetMsCrcDataOnServer
  rework. The GetKubeconfig revert (which actually landed today) was
  hoisted out of 2026-05-05 into a new 2026-05-06 heading.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
When the SSH-side kubeconfig fetch fails and KUBE_CONFIG_PATH is unset,
the default switch branch in GetKubeconfig used to return a single
generic "command failed: Process exited with status 1" wrapped into a
vague suggestion to "fix SSH credentials so passwordless sudo works on
the master". That left the operator guessing.

The default branch now runs two cheap probe commands against the master
to classify the failure:

  1) test -f /etc/kubernetes/{super-admin,admin}.conf
     -> at least one kubeconfig file exists on the host
  2) sudo -n -l /bin/cat <path-from-1>
     -> a NOPASSWD rule that matches the cat command actually applies

and returns a multi-line, actionable error tailored to the detected
cause. The "sudo password required" branch embeds a ready-to-paste
/etc/sudoers.d/e2e-kubeconfig snippet (with the actual SSH user baked
in), the "kubeconfig missing" branch points at SSH_HOST/SSH_JUMP_HOST
misconfig, and the unknown branch lists all three remedies.
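
For illustration, the embedded snippet plausibly looks like this ("ubuntu" is
a placeholder; the helper bakes in the actual SSH user):

    # /etc/sudoers.d/e2e-kubeconfig
    ubuntu ALL=(root) NOPASSWD: /bin/cat /etc/kubernetes/super-admin.conf, /bin/cat /etc/kubernetes/admin.conf

Both files are spelled out because sudoers does not perform shell brace
expansion.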

While here, fix a self-inflicted source of the same failure: the SSH
command used to read the kubeconfig was

    sudo -n sh -c 'if [ -f .../super-admin.conf ]; then cat ...; ...'

so the privileged binary as far as sudoers was concerned was /bin/sh,
NOT /bin/cat. The fine-grained NOPASSWD rule the new error message
recommends ("NOPASSWD: /bin/cat /etc/kubernetes/{super-admin,admin}.conf")
therefore did not match and sudo asked for a password — exactly the
situation the error message tells the user to fix. The command is now

    sudo -n /bin/cat /etc/kubernetes/super-admin.conf 2>/dev/null \
      || sudo -A -n /bin/cat /etc/kubernetes/admin.conf

which works with the recommended minimal rule. The classifier probe was
moved off "sudo -n true" for the same reason: on hosts that grant
"NOPASSWD: ALL" the probe returned 0 even when the per-file rule was
absent, which would mask the real cause. "sudo -n -l /bin/cat <path>"
asks sudo whether THAT specific command is allowed without a password.

Contract preserved: still fail-fast (no silent ~/.kube/config fallback),
still wraps the original ssh exit error via %w so callers' errors.Is /
errors.As keep working. Probes are best-effort -- any error from a probe
is treated as "unknown" rather than masking the original sshErr.

Out of scope: actual SUDO_PASSWORD plumbing via 'sudo -S' (requires
extending SSHClient to forward stdin and adding secret-redaction in
command logs). Documented as a follow-up.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
@AleksZimin force-pushed the vkarpochev/csi-ceph-testkit branch from 75f4eae to 1d636a5 on May 7, 2026 at 07:50
Conflicts:
- go.mod: dropped mxk/go-flowrate and openshift/api indirect deps;
  go mod tidy confirms they are unused.
- internal/config/env.go: kept main's EffectiveVirtualMachineClassName
  and our ApplyDefaults() extraction; ValidateEnvironment now calls
  ApplyDefaults first.
- pkg/cluster/setup.go: took main's --connection-config approach for
  passphrase-protected dhctl bootstrap; main's solution subsumes our
  earlier FORCE_NO_PRIVATE_KEYS / USE_AGENT_WITH_NO_PRIVATE_KEYS hack.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>