diff --git a/README.md b/README.md index 0f6b6ac..177c1d1 100644 --- a/README.md +++ b/README.md @@ -100,6 +100,7 @@ See [docs/FUNCTIONS_GLOSSARY.md](docs/FUNCTIONS_GLOSSARY.md) for a full list of - `TEST_CLUSTER_CLEANUP` -- Set to `true` to remove the test cluster after tests complete. Default: `false` - `TEST_CLUSTER_RESUME` -- Set to `true` to continue from a previous failed run (only for `alwaysCreateNew`). If the test failed in the middle of cluster creation, re-run with `TEST_CLUSTER_RESUME=true`; the framework will load saved state from `/tmp/e2e/cluster-state.json` (written after step 6), restore VM hostnames, and run the remaining steps (connect to first master, add nodes, enable modules). Requires that step 6 (VMs created, VM info gathered) completed before the failure. - `TEST_CLUSTER_NAMESPACE` -- Namespace for DKP cluster deployment. Default: `e2e-test-cluster` +- `TEST_CLUSTER_VIRTUAL_MACHINE_CLASS_NAME` -- `VirtualMachine.spec.virtualMachineClassName` for VMs created on the **base** cluster (`alwaysCreateNew`). Default: `generic` (no extra objects created). If you set a **different** name and no `VirtualMachineClass` with that name exists, the framework creates one by cloning the **`generic`** class: it keeps **`spec.sizingPolicies`** (and other non-placement fields from the template) but sets **`spec.cpu.type` to `Host`** and **clears `spec.nodeSelector` and `spec.tolerations`** so scheduling is not inherited from `generic` (Host CPU requires a consistent instruction-set pool for live migration; see [Deckhouse VM classes](https://deckhouse.io/products/virtualization-platform/documentation/admin/platform-management/virtualization/virtual-machine-classes.html)). The created resource has label **`storage-e2e.deckhouse.io/auto-created=true`** and is **not** deleted during test cleanup (remove manually if needed). The value must be a valid Kubernetes DNS-1123 subdomain when not `generic` (matches cluster-scoped resource name constraints). 
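The clone-from-`generic` semantics described in the bullet above (keep `spec.sizingPolicies` and other non-placement fields, force `spec.cpu.type: Host`, clear `spec.nodeSelector`/`spec.tolerations`, label the result) can be sketched with plain maps standing in for the `VirtualMachineClass` spec. This is an illustration only: `cloneFromGeneric` is not a function from this repo, and the real code works with typed `v1alpha3` objects.

```go
package main

import "fmt"

// cloneFromGeneric sketches the auto-create semantics: start from the
// generic class spec, keep non-placement fields such as sizingPolicies,
// force Host CPU, drop inherited nodeSelector/tolerations, and label the
// result so the auto-created class is identifiable (it is never deleted
// by test cleanup).
func cloneFromGeneric(genericSpec map[string]any) (map[string]any, map[string]string) {
	spec := make(map[string]any, len(genericSpec))
	for k, v := range genericSpec {
		spec[k] = v // shallow copy; fine for a sketch
	}
	spec["cpu"] = map[string]any{"type": "Host"} // Host CPU for the clone
	delete(spec, "nodeSelector")                 // do not inherit scheduling from generic
	delete(spec, "tolerations")
	labels := map[string]string{"storage-e2e.deckhouse.io/auto-created": "true"}
	return spec, labels
}

func main() {
	generic := map[string]any{
		"cpu":            map[string]any{"type": "Model", "model": "Nehalem"},
		"nodeSelector":   map[string]any{"kubernetes.io/os": "linux"},
		"tolerations":    []any{map[string]any{"key": "dedicated"}},
		"sizingPolicies": []any{map[string]any{"cores": map[string]any{"min": 1, "max": 4}}},
	}
	spec, labels := cloneFromGeneric(generic)
	fmt.Println(spec["cpu"], labels)
}
```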
- `KUBE_CONFIG_PATH` -- Path to a kubeconfig file. Used as fallback if SSH-based kubeconfig retrieval fails - `KUBE_INSECURE_SKIP_TLS_VERIFY` -- Set to `true` to skip TLS certificate verification for the Kubernetes API (e.g. self-signed certs or tunnel to 127.0.0.1). Default: not set (verify TLS) - `IMAGE_PULL_POLICY` -- Image pull policy for ClusterVirtualImages: `Always` or `IfNotExists`. Default: `IfNotExists` diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index d40a4be..4d06918 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -441,7 +441,7 @@ infrastructure/ssh/ - SSH key handling - Port forwarding (e.g., for Kubernetes API access) - Remote command execution -- File transfer operations +- File transfer operations (including UploadPrivate: chmod-before-data for sensitive payloads) **Key Features**: - Support for password and key-based authentication @@ -727,6 +727,7 @@ logger.Error("Failed to create resource: %v", err) | `SSH_PUBLIC_KEY` | `~/.ssh/id_rsa.pub` | SSH public key path | | `SSH_VM_USER` | `cloud` | SSH user for VMs | | `TEST_CLUSTER_NAMESPACE` | `e2e-test-cluster` | Test namespace name | +| `TEST_CLUSTER_VIRTUAL_MACHINE_CLASS_NAME` | `generic` | VM class for VMs on the base cluster in `alwaysCreateNew`. 
If set to another name (DNS-1123 subdomain) and the class does not exist, it is created from `generic` with `spec.cpu.type: Host`, **`spec.nodeSelector` / `spec.tolerations` cleared**, sizing policies retained from template, labeled `storage-e2e.deckhouse.io/auto-created=true`, and left after cleanup | | `TEST_CLUSTER_CLEANUP` | `false` | Cleanup cluster after tests | | `LOG_LEVEL` | `debug` | Log level (debug/info/warn/error) | | `KUBE_CONFIG_PATH` | - | Fallback kubeconfig path | @@ -770,6 +771,7 @@ logger.Error("Failed to create resource: %v", err) - Set `TEST_CLUSTER_CLEANUP=false` for debugging - Bootstrap node always cleaned up - Test VMs cleaned up only if cleanup enabled +- `VirtualMachineClass` resources auto-created by the framework (custom class name with clone-from-generic logic) are **not** removed during cleanup; they remain cluster-scoped on the base cluster for idempotent re-runs ### 8.4 Using Existing Cluster Mode diff --git a/docs/FUNCTIONS_GLOSSARY.md b/docs/FUNCTIONS_GLOSSARY.md index cdf1bf7..64d4b4a 100644 --- a/docs/FUNCTIONS_GLOSSARY.md +++ b/docs/FUNCTIONS_GLOSSARY.md @@ -67,7 +67,7 @@ All exported functions available in the `pkg/` directory, grouped by resource. `pkg/cluster/vms.go` -- `CreateVirtualMachines(ctx, virtClient, clusterDef)` — Creates all VMs from cluster definition in parallel. Handles name conflicts and returns VM names and resource tracking info. +- `CreateVirtualMachines(ctx, virtClient, clusterDef)` — Ensures configured `VirtualMachineClass` exists (auto-create from `generic` with Host CPU when missing; clears inherited `nodeSelector`/`tolerations`; keeps sizing policies), creates CVIs/VMs in parallel, handles name conflicts, returns VM names and resource tracking info. - `RemoveAllVMs(ctx, resources)` — Forcefully stops and deletes VMs, virtual disks, and virtual images. - `RemoveVM(ctx, virtClient, namespace, vmName)` — Removes a single VM and its associated VirtualDisks and ClusterVirtualImage (if unused). 
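The DNS-1123 name constraint mentioned above is enforced in the patch via `validation.IsDNS1123Subdomain` from `k8s.io/apimachinery`. A self-contained stdlib approximation of that check is shown below; the regexp mirrors the published Kubernetes subdomain pattern, and `isValidClassName` is an illustrative name, not a helper from this repo.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Subdomain mirrors the pattern behind k8s.io/apimachinery's
// validation.IsDNS1123Subdomain: lowercase alphanumeric labels that may
// contain '-', separated by dots, with a 253-character total limit.
var dns1123Subdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

func isValidClassName(name string) bool {
	return len(name) <= 253 && dns1123Subdomain.MatchString(name)
}

func main() {
	for _, n := range []string{"generic", "my-host-class", "Bad_Name", "-leading"} {
		fmt.Printf("%-14s valid=%v\n", n, isValidClassName(n))
	}
}
```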
- `GetSetupNode(clusterDef)` — Returns the setup (bootstrap) VM node from ClusterDefinition. diff --git a/docs/WORKLOG.md b/docs/WORKLOG.md index 5ff70b2..e7d995f 100644 --- a/docs/WORKLOG.md +++ b/docs/WORKLOG.md @@ -4,6 +4,36 @@ All notable changes to this repository are documented here. New entries are appe --- +## 2026-05-06 + +- **Add** `UploadPrivate` on `ssh.SSHClient` (`internal/infrastructure/ssh`): SFTP `Chmod` immediately after `Create`, before payload copy; `uploadOverSFTPOnce`, `uploadWithSFTPRetries`, `jumpUploadWithSFTPRetries`; the `SSH_PASSPHRASE` flow in `BootstrapCluster` uses it with `install -d -m 0700` staging (`pkg/cluster/setup.go`); ARCHITECTURE.md now mentions the hardened SSH upload path +- **Bugfix** `ensureVirtualMachineClassForClusterVMs` (`pkg/cluster/vms.go`): GET + wait Ready for configured class including default `generic`; explicit error if default missing; Host CPU auto-clone still clears `nodeSelector`/`tolerations` from template +- **Update** `ValidateEnvironment` (`internal/config/env.go`): non-`generic` `TEST_CLUSTER_VIRTUAL_MACHINE_CLASS_NAME` validated with `IsDNS1123Subdomain`; README, ARCHITECTURE §7, FUNCTIONS_GLOSSARY aligned (names + auto-created class semantics) + +--- + +## 2026-05-04 + +- **Bugfix** `BootstrapCluster` in `pkg/cluster/setup.go`: drop dhctl-in-Docker flow via `SSH_AUTH_SOCK`/ssh-agent; bind-mount the setup-node key (from `UploadBootstrapFiles`) to `/root/.ssh/id_rsa` and pass `--ssh-agent-private-keys` — aligns with dhctl/lib-connection `ExtractConfig` reading key paths early ([deckhouse#19063](https://github.com/deckhouse/deckhouse/pull/19063)) +- **Add** when `SSH_PASSPHRASE` is set: build dhctl connection-config (`SSHConfig` + `SSHHost`, `dhctl.deckhouse.io/v1`) with inline PEM and passphrase, upload to the setup node, run bootstrap with `--connection-config` only (dhctl disallows mixing with `--ssh-*`) +- **Add** `buildDHCTLSSHConnectionConfig` and YAML manifest structs (`dhctlSSHConfigManifest`, etc.) 
in `pkg/cluster/setup.go` + +--- + +## 2026-04-30 + +- **Add** `TEST_CLUSTER_VIRTUAL_MACHINE_CLASS_NAME` in `internal/config/env.go`: configurable `VirtualMachineClassName` for base-cluster VMs (default `generic`), DNS-1123 validation for non-generic names +- **Add** `EffectiveVirtualMachineClassName()` and `VirtualMachineClassReadinessTimeout` (`internal/config/config.go`) +- **Add** `VirtualMachineClass` client (`internal/kubernetes/virtualization/virtual_machine_class.go`) and `Client.VirtualMachineClasses()` in `client.go` +- **Add** `ensureVirtualMachineClassForClusterVMs` / readiness wait in `pkg/cluster/vms.go`: if named class is missing, clone from `generic` with `spec.cpu.type` Host, label `storage-e2e.deckhouse.io/auto-created=true`; no deletion on e2e cleanup +- **Update** `CreateVirtualMachines` to call ensure before CVMI creation; `createVM` uses effective class name +- **Update** env dumps in `pkg/cluster/cluster.go`, `tests/test-template/template_test.go`, and `tests/csi-all-stress-tests/csi_all_stress_tests_test.go` +- **Update** `docs/FUNCTIONS_GLOSSARY.md`: `CreateVirtualMachines` description (ensure VM class) +- **Bugfix** `ValidateEnvironment` in `internal/config/env.go`: align error strings with staticcheck ST1005 (no trailing punctuation; semicolons in multi-part messages) +- **Update** `github.com/deckhouse/virtualization/api` to v1.8.0: register `core/v1alpha3` scheme in virtualization client; `VirtualMachineClass` CRUD uses `v1alpha3` (preferred API; `spec.cpu.discovery` is `*CpuDiscovery`, so Host CPU serializes without empty discovery object) + +--- + ## 2026-03-25 - **Refactor** `WaitForLocalStorageClassCreated` in `pkg/kubernetes/localstorageclass.go`: replaced manual deadline + `time.Sleep` with idiomatic `context.WithTimeout` + `time.NewTicker` + `select` pattern diff --git a/go.mod b/go.mod index e8935c3..764c44d 100644 --- a/go.mod +++ b/go.mod @@ -5,7 +5,7 @@ go 1.26.0 require ( github.com/deckhouse/deckhouse v1.74.0 
github.com/deckhouse/sds-node-configurator/api v0.0.0-20260114125558-7fd7152586ff - github.com/deckhouse/virtualization/api v1.0.0 + github.com/deckhouse/virtualization/api v1.8.0 github.com/go-logr/logr v1.4.3 github.com/onsi/ginkgo/v2 v2.23.3 github.com/onsi/gomega v1.37.0 @@ -13,9 +13,9 @@ require ( golang.org/x/crypto v0.46.0 golang.org/x/term v0.38.0 gopkg.in/yaml.v3 v3.0.1 - k8s.io/api v0.34.1 - k8s.io/apimachinery v0.34.1 - k8s.io/client-go v0.34.1 + k8s.io/api v0.34.2 + k8s.io/apimachinery v0.34.2 + k8s.io/client-go v0.34.2 sigs.k8s.io/controller-runtime v0.22.4 ) @@ -42,7 +42,6 @@ require ( github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee // indirect github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect - github.com/openshift/api v0.0.0-20230503133300-8bbcb7ca7183 // indirect github.com/openshift/custom-resource-status v1.1.2 // indirect github.com/pkg/errors v0.9.1 // indirect github.com/spf13/pflag v1.0.7 // indirect @@ -58,12 +57,12 @@ require ( google.golang.org/protobuf v1.36.6 // indirect gopkg.in/evanphx/json-patch.v4 v4.12.0 // indirect gopkg.in/inf.v0 v0.9.1 // indirect - k8s.io/apiextensions-apiserver v0.34.1 // indirect + k8s.io/apiextensions-apiserver v0.34.2 // indirect k8s.io/klog/v2 v2.130.1 // indirect k8s.io/kube-openapi v0.0.0-20250710124328-f3f2b991d03b // indirect k8s.io/utils v0.0.0-20250604170112-4c0f3b243397 // indirect - kubevirt.io/api v1.3.1 // indirect - kubevirt.io/containerized-data-importer-api v1.57.0-alpha1 // indirect + kubevirt.io/api v1.6.2 // indirect + kubevirt.io/containerized-data-importer-api v1.60.3-0.20241105012228-50fbed985de9 // indirect kubevirt.io/controller-lifecycle-operator-sdk/api v0.0.0-20220329064328-f3cc58c6ed90 // indirect sigs.k8s.io/json v0.0.0-20241014173422-cfa47c3a1cc8 // indirect sigs.k8s.io/randfill v1.0.0 // indirect diff --git a/go.sum b/go.sum index eb9ac25..e68a8bc 100644 --- 
a/go.sum +++ b/go.sum @@ -24,8 +24,8 @@ github.com/deckhouse/deckhouse v1.74.0 h1:a/gEuLKutoV6ReWaBWMDJ+VLlOkkCwS4VMvR/s github.com/deckhouse/deckhouse v1.74.0/go.mod h1:qMuvDbP8AYghXkWmDjoFPc6r1w9uw/cWxl/hmvA0BzA= github.com/deckhouse/sds-node-configurator/api v0.0.0-20260114125558-7fd7152586ff h1:G6H7rkm/AvL6xWwbNO14gyistC3p48weL0sLCvpJnyI= github.com/deckhouse/sds-node-configurator/api v0.0.0-20260114125558-7fd7152586ff/go.mod h1:X5ftUa4MrSXMKiwQYa4lwFuGtrs+HoCNa8Zl6TPrGo8= -github.com/deckhouse/virtualization/api v1.0.0 h1:q4TvC74tpjk25k0byXJCYP4HjvRexBSeI0cC8QeCMTQ= -github.com/deckhouse/virtualization/api v1.0.0/go.mod h1:meTeGulR+xwnvt0pTGsoI14YhGe0lHUVyAfhZsoQyeQ= +github.com/deckhouse/virtualization/api v1.8.0 h1:wR4Ivcg56OWJRGWrZjEL+0mQrHFEG0gKn0xrq1yzjy0= +github.com/deckhouse/virtualization/api v1.8.0/go.mod h1:jqKdfrs7bhU5kbn6JTJUix8N180UkugJIa3TnOTqdmA= github.com/docopt/docopt-go v0.0.0-20180111231733-ee0de3bc6815/go.mod h1:WwZ+bS3ebgob9U8Nd0kOddGdZWjyMGR8Wziv+TBNwSE= github.com/elazarl/goproxy v0.0.0-20180725130230-947c36da3153/go.mod h1:/Zj4wYkgs4iZTTu3o/KG3Itv/qCCa8VVMlb3i9OVuzc= github.com/emicklei/go-restful v0.0.0-20170410110728-ff4f55a20633/go.mod h1:otzb+WCGbkyDHkqmQmT5YD2WR4BBwUdeQoFo8l/7tVs= @@ -162,8 +162,6 @@ github.com/onsi/gomega v1.17.0/go.mod h1:HnhC7FXeEQY45zxNK3PPoIUhzk/80Xly9PcubAl github.com/onsi/gomega v1.18.1/go.mod h1:0q+aL8jAiMXy9hbwj2mr5GziHiwhAIQpFmmtT5hitRs= github.com/onsi/gomega v1.37.0 h1:CdEG8g0S133B4OswTDC/5XPSzE1OeP29QOioj2PID2Y= github.com/onsi/gomega v1.37.0/go.mod h1:8D9+Txp43QWKhM24yyOBEdpkzN8FvJyAwecBgsU4KU0= -github.com/openshift/api v0.0.0-20230503133300-8bbcb7ca7183 h1:t/CahSnpqY46sQR01SoS+Jt0jtjgmhgE6lFmRnO4q70= -github.com/openshift/api v0.0.0-20230503133300-8bbcb7ca7183/go.mod h1:4VWG+W22wrB4HfBL88P40DxLEpSOaiBVxUnfalfJo9k= github.com/openshift/custom-resource-status v1.1.2 h1:C3DL44LEbvlbItfd8mT5jWrqPfHnSOQoQf/sypqA6A4= github.com/openshift/custom-resource-status v1.1.2/go.mod 
h1:DB/Mf2oTeiAmVVX1gN+NEqweonAPY0TKUwADizj8+ZA= github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4= @@ -362,15 +360,15 @@ gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= k8s.io/api v0.23.3/go.mod h1:w258XdGyvCmnBj/vGzQMj6kzdufJZVUwEM1U2fRJwSQ= -k8s.io/api v0.34.1 h1:jC+153630BMdlFukegoEL8E/yT7aLyQkIVuwhmwDgJM= -k8s.io/api v0.34.1/go.mod h1:SB80FxFtXn5/gwzCoN6QCtPD7Vbu5w2n1S0J5gFfTYk= -k8s.io/apiextensions-apiserver v0.34.1 h1:NNPBva8FNAPt1iSVwIE0FsdrVriRXMsaWFMqJbII2CI= -k8s.io/apiextensions-apiserver v0.34.1/go.mod h1:hP9Rld3zF5Ay2Of3BeEpLAToP+l4s5UlxiHfqRaRcMc= +k8s.io/api v0.34.2 h1:fsSUNZhV+bnL6Aqrp6O7lMTy6o5x2C4XLjnh//8SLYY= +k8s.io/api v0.34.2/go.mod h1:MMBPaWlED2a8w4RSeanD76f7opUoypY8TFYkSM+3XHw= +k8s.io/apiextensions-apiserver v0.34.2 h1:WStKftnGeoKP4AZRz/BaAAEJvYp4mlZGN0UCv+uvsqo= +k8s.io/apiextensions-apiserver v0.34.2/go.mod h1:398CJrsgXF1wytdaanynDpJ67zG4Xq7yj91GrmYN2SE= k8s.io/apimachinery v0.23.3/go.mod h1:BEuFMMBaIbcOqVIJqNZJXGFTP4W6AycEpb5+m/97hrM= -k8s.io/apimachinery v0.34.1 h1:dTlxFls/eikpJxmAC7MVE8oOeP1zryV7iRyIjB0gky4= -k8s.io/apimachinery v0.34.1/go.mod h1:/GwIlEcWuTX9zKIg2mbw0LRFIsXwrfoVxn+ef0X13lw= -k8s.io/client-go v0.34.1 h1:ZUPJKgXsnKwVwmKKdPfw4tB58+7/Ik3CrjOEhsiZ7mY= -k8s.io/client-go v0.34.1/go.mod h1:kA8v0FP+tk6sZA0yKLRG67LWjqufAoSHA2xVGKw9Of8= +k8s.io/apimachinery v0.34.2 h1:zQ12Uk3eMHPxrsbUJgNF8bTauTVR2WgqJsTmwTE/NW4= +k8s.io/apimachinery v0.34.2/go.mod h1:/GwIlEcWuTX9zKIg2mbw0LRFIsXwrfoVxn+ef0X13lw= +k8s.io/client-go v0.34.2 h1:Co6XiknN+uUZqiddlfAjT68184/37PS4QAzYvQvDR8M= +k8s.io/client-go v0.34.2/go.mod h1:2VYDl1XXJsdcAxw7BenFslRQX28Dxz91U9MWKjX97fE= k8s.io/code-generator v0.23.3/go.mod h1:S0Q1JVA+kSzTI1oUvbKAxZY/DYbA/ZUb4Uknog12ETk= k8s.io/gengo 
v0.0.0-20210813121822-485abfe95c7c/go.mod h1:FiNAH4ZV3gBg2Kwh89tzAEV2be7d5xI0vBa/VySYy3E= k8s.io/gengo v0.0.0-20211129171323-c02415ce4185/go.mod h1:FiNAH4ZV3gBg2Kwh89tzAEV2be7d5xI0vBa/VySYy3E= @@ -388,10 +386,10 @@ k8s.io/utils v0.0.0-20210802155522-efc7438f0176/go.mod h1:jPW/WVKK9YHAvNhRxK0md/ k8s.io/utils v0.0.0-20211116205334-6203023598ed/go.mod h1:jPW/WVKK9YHAvNhRxK0md/EJ228hCsBRufyofKtW8HA= k8s.io/utils v0.0.0-20250604170112-4c0f3b243397 h1:hwvWFiBzdWw1FhfY1FooPn3kzWuJ8tmbZBHi4zVsl1Y= k8s.io/utils v0.0.0-20250604170112-4c0f3b243397/go.mod h1:OLgZIPagt7ERELqWJFomSt595RzquPNLL48iOWgYOg0= -kubevirt.io/api v1.3.1 h1:MoTNo/zvDlZ44c2ocXLPln8XTaQOeUodiYbEKrTCqv4= -kubevirt.io/api v1.3.1/go.mod h1:tCn7VAZktEvymk490iPSMPCmKM9UjbbfH2OsFR/IOLU= -kubevirt.io/containerized-data-importer-api v1.57.0-alpha1 h1:IWo12+ei3jltSN5jQN1xjgakfvRSF3G3Rr4GXVOOy2I= -kubevirt.io/containerized-data-importer-api v1.57.0-alpha1/go.mod h1:Y/8ETgHS1GjO89bl682DPtQOYEU/1ctPFBz6Sjxm4DM= +kubevirt.io/api v1.6.2 h1:aoqZ4KsbOyDjLnuDw7H9wEgE/YTd/q5BBmYeQjJNizc= +kubevirt.io/api v1.6.2/go.mod h1:p66fEy/g79x7VpgUwrkUgOoG2lYs5LQq37WM6JXMwj4= +kubevirt.io/containerized-data-importer-api v1.60.3-0.20241105012228-50fbed985de9 h1:KTb8wO1Lxj220DX7d2Rdo9xovvlyWWNo3AVm2ua+1nY= +kubevirt.io/containerized-data-importer-api v1.60.3-0.20241105012228-50fbed985de9/go.mod h1:SDJjLGhbPyayDqAqawcGmVNapBp0KodOQvhKPLVGCQU= kubevirt.io/controller-lifecycle-operator-sdk/api v0.0.0-20220329064328-f3cc58c6ed90 h1:QMrd0nKP0BGbnxTqakhDZAUhGKxPiPiN5gSDqKUmGGc= kubevirt.io/controller-lifecycle-operator-sdk/api v0.0.0-20220329064328-f3cc58c6ed90/go.mod h1:018lASpFYBsYN6XwmA2TIrPCx6e0gviTd/ZNtSitKgc= sigs.k8s.io/controller-runtime v0.22.4 h1:GEjV7KV3TY8e+tJ2LCTxUTanW4z/FmNB7l327UfMq9A= diff --git a/internal/config/config.go b/internal/config/config.go index 5dde700..9b19b18 100644 --- a/internal/config/config.go +++ b/internal/config/config.go @@ -41,6 +41,9 @@ const ( VMInfoTimeout = 30 * time.Second // Timeout for gathering VM 
information ClusterVirtualImageReadinessTimeout = 15 * time.Minute // Timeout for waiting for ClusterVirtualImage to become provisioned (Ready) + // VirtualMachineClassReadinessTimeout is how long to wait for the configured VirtualMachineClass (pre-existing or auto-created) to reach the Ready phase. + VirtualMachineClassReadinessTimeout = 15 * time.Minute + // Node operations NodesReadyTimeout = 15 * time.Minute // Timeout for waiting for nodes to become Ready diff --git a/internal/config/env.go b/internal/config/env.go index 2ff087a..23b3a0a 100644 --- a/internal/config/env.go +++ b/internal/config/env.go @@ -5,6 +5,9 @@ package config import ( "fmt" "os" + "strings" + + "k8s.io/apimachinery/pkg/util/validation" ) const ( @@ -101,6 +104,12 @@ var ( TestClusterStorageClass = os.Getenv("TEST_CLUSTER_STORAGE_CLASS") //TestClusterStorageClassDefaultValue = "rsc-test-r2-local" + // TestClusterVirtualMachineClassName is spec.virtualMachineClassName for VirtualMachines created on the base cluster. + // Empty means use TestClusterVirtualMachineClassNameDefaultValue ("generic"). Values other than generic must satisfy + // Kubernetes DNS-1123 subdomain rules for object names (cluster-scoped VirtualMachineClass). + TestClusterVirtualMachineClassName = os.Getenv("TEST_CLUSTER_VIRTUAL_MACHINE_CLASS_NAME") + TestClusterVirtualMachineClassNameDefaultValue = "generic" + // DKPLicenseKey specifies the DKP license key for cluster deployment DKPLicenseKey = os.Getenv("DKP_LICENSE_KEY") @@ -224,6 +233,15 @@ var ( LogTimetampsEnabledDefaultValue = "true" ) +// EffectiveVirtualMachineClassName returns the VM class name used for test VMs (defaults to generic when unset). 
+func EffectiveVirtualMachineClassName() string { + n := strings.TrimSpace(TestClusterVirtualMachineClassName) + if n == "" { + return TestClusterVirtualMachineClassNameDefaultValue + } + return n +} + func ValidateEnvironment() error { // Default values for environment variables if YAMLConfigFilename == "" { @@ -247,25 +265,36 @@ func ValidateEnvironment() error { TestClusterNamespace = TestClusterNamespaceDefaultValue } + TestClusterVirtualMachineClassName = strings.TrimSpace(TestClusterVirtualMachineClassName) + if TestClusterVirtualMachineClassName == "" { + TestClusterVirtualMachineClassName = TestClusterVirtualMachineClassNameDefaultValue + } + if TestClusterVirtualMachineClassName != TestClusterVirtualMachineClassNameDefaultValue { + if errs := validation.IsDNS1123Subdomain(TestClusterVirtualMachineClassName); len(errs) > 0 { + return fmt.Errorf("TEST_CLUSTER_VIRTUAL_MACHINE_CLASS_NAME %q is not a valid Kubernetes DNS-1123 subdomain name: %v", + TestClusterVirtualMachineClassName, errs) + } + } + // There are no default values for these variables and they must be set! Otherwise, the test will fail. if SSHUser == "" { - return fmt.Errorf("SSH_USER environment variable is required but not set.") + return fmt.Errorf("SSH_USER environment variable is required but not set") } if SSHHost == "" { - return fmt.Errorf("SSH_HOST environment variable is required but not set.") + return fmt.Errorf("SSH_HOST environment variable is required but not set") } if TestClusterStorageClass == "" { - return fmt.Errorf("TEST_CLUSTER_STORAGE_CLASS environment variable is required but not set.") + return fmt.Errorf("TEST_CLUSTER_STORAGE_CLASS environment variable is required but not set") } if DKPLicenseKey == "" { - return fmt.Errorf("DKP_LICENSE_KEY environment variable is required but not set. 
") + return fmt.Errorf("DKP_LICENSE_KEY environment variable is required but not set") } if RegistryDockerCfg == "" { - return fmt.Errorf("REGISTRY_DOCKER_CFG environment variable is required but not set.") + return fmt.Errorf("REGISTRY_DOCKER_CFG environment variable is required but not set") } if ImagePullPolicy == "" { @@ -273,22 +302,22 @@ func ValidateEnvironment() error { } if ImagePullPolicy != ImagePullPolicyAlways && ImagePullPolicy != ImagePullPolicyIfNotExists { - return fmt.Errorf("IMAGE_PULL_POLICY has invalid value '%s'. "+ - "Must be either '%s' or '%s'", + return fmt.Errorf("IMAGE_PULL_POLICY has invalid value '%s'; "+ + "must be either '%s' or '%s'", ImagePullPolicy, ImagePullPolicyAlways, ImagePullPolicyIfNotExists) } if TestClusterCreateMode == "" { - return fmt.Errorf("TEST_CLUSTER_CREATE_MODE environment variable is required but not set. "+ - "Please set it to '%s', '%s', or '%s'", + return fmt.Errorf("TEST_CLUSTER_CREATE_MODE environment variable is required but not set; "+ + "please set it to '%s', '%s', or '%s'", ClusterCreateModeAlwaysUseExisting, ClusterCreateModeAlwaysCreateNew, ClusterCreateModeCommander) } if TestClusterCreateMode != ClusterCreateModeAlwaysUseExisting && TestClusterCreateMode != ClusterCreateModeAlwaysCreateNew && TestClusterCreateMode != ClusterCreateModeCommander { - return fmt.Errorf("TEST_CLUSTER_CREATE_MODE has invalid value '%s'. "+ - "Must be '%s', '%s', or '%s'", + return fmt.Errorf("TEST_CLUSTER_CREATE_MODE has invalid value '%s'; "+ + "must be '%s', '%s', or '%s'", TestClusterCreateMode, ClusterCreateModeAlwaysUseExisting, ClusterCreateModeAlwaysCreateNew, ClusterCreateModeCommander) } @@ -320,8 +349,8 @@ func ValidateEnvironment() error { } if LogLevel != LogLevelDebug && LogLevel != LogLevelInfo && LogLevel != LogLevelWarn && LogLevel != LogLevelError { - return fmt.Errorf("LOG_LEVEL has invalid value '%s'. 
"+ - "Must be either '%s' or '%s' or '%s' or '%s'", + return fmt.Errorf("LOG_LEVEL has invalid value '%s'; "+ + "must be either '%s' or '%s' or '%s' or '%s'", LogLevel, LogLevelDebug, LogLevelInfo, LogLevelWarn, LogLevelError) } @@ -330,8 +359,8 @@ func ValidateEnvironment() error { } if LogTimetampsEnabled != "true" && LogTimetampsEnabled != "false" { - return fmt.Errorf("LOG_TIMESTAMPS_ENABLED has invalid value '%s'. "+ - "Must be either '%s' or '%s'", + return fmt.Errorf("LOG_TIMESTAMPS_ENABLED has invalid value '%s'; "+ + "must be either '%s' or '%s'", LogTimetampsEnabled, "true", "false") } diff --git a/internal/infrastructure/ssh/client.go b/internal/infrastructure/ssh/client.go index 61736d2..e2a24d1 100644 --- a/internal/infrastructure/ssh/client.go +++ b/internal/infrastructure/ssh/client.go @@ -101,6 +101,33 @@ func copyWithContext(ctx context.Context, dst io.Writer, src io.Reader) (written return written, err } +// uploadOverSFTPOnce creates remotePath via SFTP, optionally chmods it before streaming bytes (avoids a wide-permission window during transfer), then copies localPath. 
+func uploadOverSFTPOnce(ctx context.Context, sftpClient *sftp.Client, localPath, remotePath string, chmodBeforeCopy *os.FileMode) error { + localFile, err := os.Open(localPath) + if err != nil { + return fmt.Errorf("failed to open local file %s: %w", localPath, err) + } + defer localFile.Close() + + remoteFile, err := sftpClient.Create(remotePath) + if err != nil { + return fmt.Errorf("failed to create remote file %s: %w", remotePath, err) + } + defer remoteFile.Close() + + if chmodBeforeCopy != nil { + if err := sftpClient.Chmod(remotePath, *chmodBeforeCopy); err != nil { + _ = sftpClient.Remove(remotePath) + return fmt.Errorf("chmod remote file before transfer: %w", err) + } + } + + if _, err := copyWithContext(ctx, remoteFile, localFile); err != nil { + return fmt.Errorf("failed to copy file: %w", err) + } + return nil +} + // readPassword reads a password from the terminal func readPassword(prompt string) ([]byte, error) { fmt.Fprint(os.Stderr, prompt) @@ -538,12 +565,10 @@ func (c *client) ExecFatal(ctx context.Context, cmd string) string { return output } -// Upload uploads a local file to the remote host with automatic retry and reconnection -func (c *client) Upload(ctx context.Context, localPath, remotePath string) error { +func (c *client) uploadWithSFTPRetries(ctx context.Context, localPath, remotePath string, chmodBeforeCopy *os.FileMode) error { var lastErr error for attempt := 0; attempt < config.SSHRetryCount; attempt++ { - // Check context before starting if err := ctx.Err(); err != nil { return fmt.Errorf("context error before upload: %w", err) } @@ -564,41 +589,18 @@ func (c *client) Upload(ctx context.Context, localPath, remotePath string) error return lastErr } - localFile, err := os.Open(localPath) - if err != nil { - sftpClient.Close() - return fmt.Errorf("failed to open local file %s: %w", localPath, err) - } - - remoteFile, err := sftpClient.Create(remotePath) - if err != nil { - localFile.Close() - sftpClient.Close() - lastErr = 
fmt.Errorf("failed to create remote file %s: %w", remotePath, err) - if isConnectionError(err) { - if reconnErr := c.reconnect(ctx); reconnErr != nil { - return fmt.Errorf("remote file creation failed and reconnection failed: %w (original: %v)", reconnErr, lastErr) - } - continue - } - return lastErr - } - - // Use context-aware copy - _, err = copyWithContext(ctx, remoteFile, localFile) - remoteFile.Close() - localFile.Close() + err = uploadOverSFTPOnce(ctx, sftpClient, localPath, remotePath, chmodBeforeCopy) sftpClient.Close() if err != nil { - lastErr = fmt.Errorf("failed to copy file: %w", err) + lastErr = err if isConnectionError(err) { if reconnErr := c.reconnect(ctx); reconnErr != nil { - return fmt.Errorf("file copy failed and reconnection failed: %w (original: %v)", reconnErr, lastErr) + return fmt.Errorf("SFTP upload failed and reconnection failed: %w (original: %v)", reconnErr, lastErr) } continue } - return lastErr + return err } return nil @@ -607,6 +609,17 @@ func (c *client) Upload(ctx context.Context, localPath, remotePath string) error return fmt.Errorf("SSH upload failed after %d attempts: %w", config.SSHRetryCount, lastErr) } +// Upload uploads a local file to the remote host with automatic retry and reconnection +func (c *client) Upload(ctx context.Context, localPath, remotePath string) error { + return c.uploadWithSFTPRetries(ctx, localPath, remotePath, nil) +} + +// UploadPrivate uploads like Upload but applies remotePerm to remotePath over SFTP immediately after create and before copying payload, avoiding a world-readable window during transfer (CWE-732). 
+func (c *client) UploadPrivate(ctx context.Context, localPath, remotePath string, remotePerm os.FileMode) error { + p := remotePerm & os.ModePerm + return c.uploadWithSFTPRetries(ctx, localPath, remotePath, &p) +} + // Close closes the SSH connection func (c *client) Close() error { // Stop keepalive goroutine @@ -1064,12 +1077,10 @@ func (c *jumpHostClient) ExecFatal(ctx context.Context, cmd string) string { return output } -// Upload uploads a local file to the remote host with automatic retry and reconnection -func (c *jumpHostClient) Upload(ctx context.Context, localPath, remotePath string) error { +func (c *jumpHostClient) jumpUploadWithSFTPRetries(ctx context.Context, localPath, remotePath string, chmodBeforeCopy *os.FileMode) error { var lastErr error for attempt := 0; attempt < config.SSHRetryCount; attempt++ { - // Check context before starting if err := ctx.Err(); err != nil { return fmt.Errorf("context error before upload: %w", err) } @@ -1090,41 +1101,18 @@ func (c *jumpHostClient) Upload(ctx context.Context, localPath, remotePath strin return lastErr } - localFile, err := os.Open(localPath) - if err != nil { - sftpClient.Close() - return fmt.Errorf("failed to open local file %s: %w", localPath, err) - } - - remoteFile, err := sftpClient.Create(remotePath) - if err != nil { - localFile.Close() - sftpClient.Close() - lastErr = fmt.Errorf("failed to create remote file %s: %w", remotePath, err) - if isConnectionError(err) { - if reconnErr := c.reconnect(ctx); reconnErr != nil { - return fmt.Errorf("remote file creation failed and reconnection failed: %w (original: %v)", reconnErr, lastErr) - } - continue - } - return lastErr - } - - // Use context-aware copy - _, err = copyWithContext(ctx, remoteFile, localFile) - remoteFile.Close() - localFile.Close() + err = uploadOverSFTPOnce(ctx, sftpClient, localPath, remotePath, chmodBeforeCopy) sftpClient.Close() if err != nil { - lastErr = fmt.Errorf("failed to copy file: %w", err) + lastErr = err if 
isConnectionError(err) { if reconnErr := c.reconnect(ctx); reconnErr != nil { - return fmt.Errorf("file copy failed and reconnection failed: %w (original: %v)", reconnErr, lastErr) + return fmt.Errorf("SFTP upload failed and reconnection failed: %w (original: %v)", reconnErr, lastErr) } continue } - return lastErr + return err } return nil @@ -1133,6 +1121,17 @@ func (c *jumpHostClient) Upload(ctx context.Context, localPath, remotePath strin return fmt.Errorf("SSH upload failed after %d attempts: %w", config.SSHRetryCount, lastErr) } +// Upload uploads a local file to the remote host with automatic retry and reconnection +func (c *jumpHostClient) Upload(ctx context.Context, localPath, remotePath string) error { + return c.jumpUploadWithSFTPRetries(ctx, localPath, remotePath, nil) +} + +// UploadPrivate uploads like Upload but applies remotePerm to remotePath over SFTP immediately after create and before copying payload, avoiding a world-readable window during transfer (CWE-732). +func (c *jumpHostClient) UploadPrivate(ctx context.Context, localPath, remotePath string, remotePerm os.FileMode) error { + p := remotePerm & os.ModePerm + return c.jumpUploadWithSFTPRetries(ctx, localPath, remotePath, &p) +} + // Close closes both SSH connections func (c *jumpHostClient) Close() error { // Stop keepalive goroutines first diff --git a/internal/infrastructure/ssh/interface.go b/internal/infrastructure/ssh/interface.go index d2ba326..d61a249 100644 --- a/internal/infrastructure/ssh/interface.go +++ b/internal/infrastructure/ssh/interface.go @@ -16,7 +16,10 @@ limitations under the License. 
package ssh -import "context" +import ( + "context" + "os" +) // SSHClient provides SSH operations type SSHClient interface { @@ -36,6 +39,9 @@ type SSHClient interface { // Uploads a local file to the remote host Upload(ctx context.Context, localPath, remotePath string) error + // UploadPrivate uploads like Upload but sets remotePerm on the remote path via SFTP after create and before copying payload (reduces permission race on secrets). + UploadPrivate(ctx context.Context, localPath, remotePath string, remotePerm os.FileMode) error + // Close closes the SSH connection Close() error } diff --git a/internal/kubernetes/virtualization/client.go b/internal/kubernetes/virtualization/client.go index 5e88e47..59a15d0 100644 --- a/internal/kubernetes/virtualization/client.go +++ b/internal/kubernetes/virtualization/client.go @@ -26,6 +26,7 @@ import ( "github.com/deckhouse/storage-e2e/pkg/retry" "github.com/deckhouse/virtualization/api/core/v1alpha2" + "github.com/deckhouse/virtualization/api/core/v1alpha3" ) // Client provides access to virtualization resources @@ -44,7 +45,10 @@ func NewClient(ctx context.Context, config *rest.Config) (*Client, error) { // Register virtualization API types with the scheme if err := v1alpha2.SchemeBuilder.AddToScheme(scheme); err != nil { - return nil, fmt.Errorf("failed to add virtualization scheme: %w", err) + return nil, fmt.Errorf("failed to add virtualization v1alpha2 scheme: %w", err) + } + if err := v1alpha3.SchemeBuilder.AddToScheme(scheme); err != nil { + return nil, fmt.Errorf("failed to add virtualization v1alpha3 scheme: %w", err) } cl, err := client.New(config, client.Options{Scheme: scheme}) @@ -71,6 +75,11 @@ func (c *Client) ClusterVirtualImages() *ClusterVirtualImageClient { return &ClusterVirtualImageClient{client: c.client} } +// VirtualMachineClasses returns a VirtualMachineClass client (cluster-scoped). 
+func (c *Client) VirtualMachineClasses() *VirtualMachineClassClient { + return &VirtualMachineClassClient{client: c.client} +} + // VirtualImages returns a VirtualImage client func (c *Client) VirtualImages() *VirtualImageClient { return &VirtualImageClient{client: c.client} diff --git a/internal/kubernetes/virtualization/virtual_machine_class.go b/internal/kubernetes/virtualization/virtual_machine_class.go new file mode 100644 index 0000000..e208a87 --- /dev/null +++ b/internal/kubernetes/virtualization/virtual_machine_class.go @@ -0,0 +1,73 @@ +/* +Copyright 2025 Flant JSC + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ + +package virtualization + +import ( + "context" + "fmt" + + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "sigs.k8s.io/controller-runtime/pkg/client" + + "github.com/deckhouse/virtualization/api/core/v1alpha3" +) + +// VirtualMachineClassClient provides operations on VirtualMachineClass resources (cluster-scoped). +// Uses API version v1alpha3 (storage/preferred); v1alpha2.VirtualMachineClass is deprecated upstream. +type VirtualMachineClassClient struct { + client client.Client +} + +// Get retrieves a VirtualMachineClass by name. 
+func (c *VirtualMachineClassClient) Get(ctx context.Context, name string) (*v1alpha3.VirtualMachineClass, error) { + vmc := &v1alpha3.VirtualMachineClass{} + key := client.ObjectKey{Name: name} + if err := c.client.Get(ctx, key, vmc); err != nil { + return nil, fmt.Errorf("failed to get VirtualMachineClass %s: %w", name, err) + } + return vmc, nil +} + +// List lists all VirtualMachineClasses. +func (c *VirtualMachineClassClient) List(ctx context.Context) ([]v1alpha3.VirtualMachineClass, error) { + list := &v1alpha3.VirtualMachineClassList{} + if err := c.client.List(ctx, list); err != nil { + return nil, fmt.Errorf("failed to list VirtualMachineClasses: %w", err) + } + return list.Items, nil +} + +// Create creates a new VirtualMachineClass. +func (c *VirtualMachineClassClient) Create(ctx context.Context, vmc *v1alpha3.VirtualMachineClass) error { + if err := c.client.Create(ctx, vmc); err != nil { + return fmt.Errorf("failed to create VirtualMachineClass %s: %w", vmc.Name, err) + } + return nil +} + +// Delete deletes a VirtualMachineClass by name. 
+func (c *VirtualMachineClassClient) Delete(ctx context.Context, name string) error { + vmc := &v1alpha3.VirtualMachineClass{ + ObjectMeta: metav1.ObjectMeta{ + Name: name, + }, + } + if err := c.client.Delete(ctx, vmc); err != nil { + return fmt.Errorf("failed to delete VirtualMachineClass %s: %w", name, err) + } + return nil +} diff --git a/pkg/cluster/cluster.go b/pkg/cluster/cluster.go index 86ae335..b57f334 100644 --- a/pkg/cluster/cluster.go +++ b/pkg/cluster/cluster.go @@ -2221,6 +2221,8 @@ func OutputEnvironmentVariables() { GinkgoWriter.Printf(" TEST_CLUSTER_STORAGE_CLASS: %s\n", config.TestClusterStorageClass) } + GinkgoWriter.Printf(" TEST_CLUSTER_VIRTUAL_MACHINE_CLASS_NAME: %s\n", config.EffectiveVirtualMachineClassName()) + // SSH_HOST - no masking if config.SSHHost != "" { GinkgoWriter.Printf(" SSH_HOST: %s\n", config.SSHHost) diff --git a/pkg/cluster/setup.go b/pkg/cluster/setup.go index ea4ffa5..9989639 100644 --- a/pkg/cluster/setup.go +++ b/pkg/cluster/setup.go @@ -426,6 +426,54 @@ func getDevBranchFromConfig(configPath string) (string, error) { return "", fmt.Errorf("devBranch not found in config file %s", configPath) } +// dhctlSSHConfigManifest and dhctlSSHHostManifest match dhctl OpenAPI kinds under candi/openapi/dhctl (dhctl.deckhouse.io/v1). 
+type dhctlSSHConfigManifest struct { + APIVersion string `yaml:"apiVersion"` + Kind string `yaml:"kind"` + SSHUser string `yaml:"sshUser"` + SSHPort int32 `yaml:"sshPort"` + SSHAgentPrivateKeys []dhctlSSHAgentPrivateKey `yaml:"sshAgentPrivateKeys"` +} + +type dhctlSSHAgentPrivateKey struct { + Key string `yaml:"key"` + Passphrase string `yaml:"passphrase,omitempty"` +} + +type dhctlSSHHostManifest struct { + APIVersion string `yaml:"apiVersion"` + Kind string `yaml:"kind"` + Host string `yaml:"host"` +} + +func buildDHCTLSSHConnectionConfig(pemKey, sshUser, masterHost, passphrase string) ([]byte, error) { + cfg := dhctlSSHConfigManifest{ + APIVersion: "dhctl.deckhouse.io/v1", + Kind: "SSHConfig", + SSHUser: sshUser, + SSHPort: 22, + SSHAgentPrivateKeys: []dhctlSSHAgentPrivateKey{{ + Key: strings.TrimSpace(pemKey) + "\n", + Passphrase: passphrase, + }}, + } + hostDoc := dhctlSSHHostManifest{ + APIVersion: "dhctl.deckhouse.io/v1", + Kind: "SSHHost", + Host: masterHost, + } + cfgBytes, err := yaml.Marshal(&cfg) + if err != nil { + return nil, fmt.Errorf("marshal SSHConfig: %w", err) + } + hostBytes, err := yaml.Marshal(&hostDoc) + if err != nil { + return nil, fmt.Errorf("marshal SSHHost: %w", err) + } + doc := "---\n" + strings.TrimSuffix(string(cfgBytes), "\n") + "\n---\n" + strings.TrimSuffix(string(hostBytes), "\n") + "\n" + return []byte(doc), nil +} + // BootstrapCluster bootstraps a Kubernetes cluster from the setup node to the first master node. // It performs the following steps: // 1. 
Logs into the Docker registry using DKP_LICENSE_KEY from config @@ -483,81 +531,133 @@ func BootstrapCluster(ctx context.Context, sshClient ssh.SSHClient, clusterDef * } logFilePath := filepath.Join(config.E2ETempDir, fmt.Sprintf("bootstrap-%s.log", time.Now().Format("2006-01-02_15-04-05"))) - remoteLogPath := fmt.Sprintf("/tmp/bootstrap-%d.log", os.Getpid()) // Use unique name to avoid conflicts - agentSocketPath := fmt.Sprintf("/tmp/ssh-agent-%d.sock", os.Getpid()) // Unique agent socket path - - // Step 2: Setup ssh-agent and add the SSH key - // Create a temporary askpass script to provide the passphrase non-interactively - askpassScriptPath := fmt.Sprintf("/tmp/ssh-askpass-%d.sh", os.Getpid()) - askpassScript := fmt.Sprintf(`#!/bin/bash -echo "%s" -`, config.SSHPassphrase) - - // Create the askpass script file on the remote host - createAskpassCmd := fmt.Sprintf("sudo -u %s bash -c 'cat > %s << \"ASKPASS_EOF\"\n%sASKPASS_EOF\nchmod +x %s'", config.VMSSHUser, askpassScriptPath, askpassScript, askpassScriptPath) - _, err = sshClient.Exec(ctx, createAskpassCmd) - if err != nil { - return fmt.Errorf("failed to create askpass script: %w", err) - } - - // Setup ssh-agent and add the key - setupAgentScript := fmt.Sprintf(` - # Start ssh-agent with specified socket path - eval $(ssh-agent -a %s) > /dev/null 2>&1 - export SSH_AUTH_SOCK=%s - export SSH_AGENT_PID=$SSH_AGENT_PID - - # Add the SSH key to the agent using the askpass script - if [ -n "%s" ]; then - DISPLAY=:0 SSH_ASKPASS=%s ssh-add /home/%s/.ssh/id_rsa &1 - else - ssh-add /home/%s/.ssh/id_rsa &1 - fi - - # Output the agent socket path for use in docker command - echo $SSH_AUTH_SOCK - `, agentSocketPath, agentSocketPath, config.SSHPassphrase, askpassScriptPath, config.VMSSHUser, config.VMSSHUser) - - // Run the agent setup script - agentOutput, err := sshClient.Exec(ctx, fmt.Sprintf("sudo -u %s bash -c %s", config.VMSSHUser, fmt.Sprintf("'%s'", setupAgentScript))) - if err != nil { - // Clean up askpass 
script on error - _, _ = sshClient.Exec(ctx, fmt.Sprintf("sudo rm -f %s", askpassScriptPath)) - return fmt.Errorf("failed to setup ssh-agent: %w\nOutput: %s", err, agentOutput) + remoteLogPath := fmt.Sprintf("/tmp/bootstrap-%d.log", os.Getpid()) // Use unique name to avoid conflicts + + // Bootstrap previously mounted SSH_AUTH_SOCK into the dhctl container so authentication went through ssh-agent. + // After Deckhouse PR https://github.com/deckhouse/deckhouse/pull/19063, dhctl resolves SSH settings via lib-connection + // ExtractConfig early in bootstrap; that path reads private key files from disk using paths derived from flags (default + // ~/.ssh/id_rsa → /root/.ssh/id_rsa inside the install image where HOME is /root). Mounting only the agent socket is then + // too late and fails with errors like "extract config: Failed to read private keys from flags". We bind-mount the same + // key already placed on the VM by UploadBootstrapFiles and pass --ssh-agent-private-keys explicitly. + // + // When SSH_PASSPHRASE is set, dhctl cannot prompt inside the non-interactive container. dhctl also forbids combining + // --connection-config with other SSH flags, so we upload a small dhctl connection manifest (SSHConfig + SSHHost) with inline + // key PEM and passphrase; dhctl copies that into temp key files and fills PrivateKeysToPassPhrasesFromConfig (see dhctl + // pkg/config/connection.go ParseConnectionConfigFromFile). 
+ const dhctlContainerSSHKeyPath = "/root/.ssh/id_rsa" + remoteSSHPrivateKey := filepath.Join("/home", config.VMSSHUser, ".ssh", "id_rsa") + + var dockerVolFlags, dhctlSSHArgs string + var remoteConnYAMLPath string // passphrase-only; removed ASAP after docker run (avoid long-lived secrets in /tmp) + + removeRemoteConnYAML := func() { + if remoteConnYAMLPath == "" { + return + } + cleanupCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second) + defer cancel() + _, _ = sshClient.Exec(cleanupCtx, fmt.Sprintf("sudo rm -f %s", remoteConnYAMLPath)) + remoteConnYAMLPath = "" } - // Extract the actual SSH_AUTH_SOCK path from output (last line) - agentSocketLines := strings.Split(strings.TrimSpace(agentOutput), "\n") - actualAgentSocket := agentSocketPath // Default to our specified path - if len(agentSocketLines) > 0 { - lastLine := strings.TrimSpace(agentSocketLines[len(agentSocketLines)-1]) - if lastLine != "" && strings.HasPrefix(lastLine, "/") { - actualAgentSocket = lastLine + if config.SSHPassphrase != "" { + stageDir := filepath.Join("/home", config.VMSSHUser, ".config", "storage-e2e") + if _, prepErr := sshClient.Exec(ctx, fmt.Sprintf("sudo install -d -m 0700 -o %s -g %s -- %q", config.VMSSHUser, config.VMSSHUser, stageDir)); prepErr != nil { + return fmt.Errorf("prepare setup-node dir for dhctl connection-config: %w", prepErr) + } + + tmpPattern := filepath.Join(stageDir, "dhctl-bootstrap-connection.XXXXXX.yaml") + mktempOut, mktempErr := sshClient.Exec(ctx, fmt.Sprintf("sudo -u %s mktemp %q", config.VMSSHUser, tmpPattern)) + if mktempErr != nil { + return fmt.Errorf("create temp path for dhctl connection-config on setup node: %w", mktempErr) + } + remoteConnYAMLPath = strings.TrimSpace(strings.Split(strings.TrimSpace(mktempOut), "\n")[0]) + if remoteConnYAMLPath == "" { + return fmt.Errorf("mktemp returned empty path for dhctl connection-config") } - } - // Make the socket readable by root (needed when docker runs with sudo) - // This allows the 
docker process (running as root) to access the socket - chmodCmd := fmt.Sprintf("sudo chmod 666 %s 2>/dev/null || true", actualAgentSocket) - _, _ = sshClient.Exec(ctx, chmodCmd) + if _, probeErr := sshClient.Exec(ctx, fmt.Sprintf("sudo -u %s test -r %s", config.VMSSHUser, remoteSSHPrivateKey)); probeErr != nil { + removeRemoteConnYAML() + return fmt.Errorf("SSH private key not readable at %s on setup node: %w", remoteSSHPrivateKey, probeErr) + } - // Step 3: Run dhctl bootstrap command with ssh-agent - // Mount SSH_AUTH_SOCK into the container and use it for authentication - // Note: We don't use --ssh-agent-private-keys anymore, dhctl will use SSH_AUTH_SOCK - // Docker needs to run with sudo for access to docker socket + pemOut, pemErr := sshClient.Exec(ctx, fmt.Sprintf("sudo -u %s cat %s", config.VMSSHUser, remoteSSHPrivateKey)) + if pemErr != nil { + removeRemoteConnYAML() + // Do not include pemOut in the error: Exec uses CombinedOutput and stdout may already contain key material. + return fmt.Errorf("read bootstrap SSH private key from setup node for connection-config: %w", pemErr) + } + if strings.TrimSpace(pemOut) == "" { + removeRemoteConnYAML() + return fmt.Errorf("empty SSH private key at %s on setup node", remoteSSHPrivateKey) + } + + connYAML, connErr := buildDHCTLSSHConnectionConfig(pemOut, config.VMSSHUser, masterIP, config.SSHPassphrase) + if connErr != nil { + removeRemoteConnYAML() + return fmt.Errorf("build dhctl connection-config: %w", connErr) + } + + localConnFile, tmpErr := os.CreateTemp("", "dhctl-bootstrap-connection-*.yaml") + if tmpErr != nil { + removeRemoteConnYAML() + return fmt.Errorf("create temp connection-config: %w", tmpErr) + } + localConnPath := localConnFile.Name() + defer func() { _ = os.Remove(localConnPath) }() + + if chmodErr := os.Chmod(localConnPath, 0600); chmodErr != nil { + _ = localConnFile.Close() + removeRemoteConnYAML() + return fmt.Errorf("chmod temp connection-config: %w", chmodErr) + } + if _, writeErr := 
localConnFile.Write(connYAML); writeErr != nil { + _ = localConnFile.Close() + removeRemoteConnYAML() + return fmt.Errorf("write temp connection-config: %w", writeErr) + } + if closeErr := localConnFile.Close(); closeErr != nil { + removeRemoteConnYAML() + return fmt.Errorf("close temp connection-config: %w", closeErr) + } + + if upErr := sshClient.UploadPrivate(ctx, localConnPath, remoteConnYAMLPath, 0600); upErr != nil { + removeRemoteConnYAML() + return fmt.Errorf("upload dhctl connection-config to setup node: %w", upErr) + } + + dockerVolFlags = fmt.Sprintf( + `-v "/home/%s/config.yml:/config.yml" -v "%s:/dhctl-connection.yaml:ro"`, + config.VMSSHUser, remoteConnYAMLPath, + ) + dhctlSSHArgs = "--connection-config=/dhctl-connection.yaml --config=/config.yml" + } else { + dockerVolFlags = fmt.Sprintf( + `-v "/home/%s/config.yml:/config.yml" -v "%s:%s:ro"`, + config.VMSSHUser, remoteSSHPrivateKey, dhctlContainerSSHKeyPath, + ) + dhctlSSHArgs = fmt.Sprintf( + "--ssh-host=%s --ssh-user=%s --ssh-agent-private-keys=%s --config=/config.yml", + masterIP, config.VMSSHUser, dhctlContainerSSHKeyPath, + ) + } + + // Step 2: Run dhctl bootstrap (Docker needs sudo for access to docker socket) installImage := fmt.Sprintf("%s/install:%s", registryRepo, devBranch) bootstrapCmd := fmt.Sprintf( - "sudo -u %s bash -c 'export SSH_AUTH_SOCK=%s; sudo docker run --network=host --pull=always -v \"/home/%s/config.yml:/config.yml\" -v \"%s:/tmp/ssh-agent.sock\" -e SSH_AUTH_SOCK=/tmp/ssh-agent.sock %s dhctl bootstrap --ssh-host=%s --ssh-user=%s --config=/config.yml > %s 2>&1'", - config.VMSSHUser, actualAgentSocket, config.VMSSHUser, actualAgentSocket, installImage, masterIP, config.VMSSHUser, remoteLogPath, + `sudo -u %s bash -c 'sudo docker run --network=host --pull=always %s %s dhctl bootstrap %s > %s 2>&1'`, + config.VMSSHUser, + dockerVolFlags, + installImage, + dhctlSSHArgs, + remoteLogPath, ) // Run the bootstrap command // Output is redirected to remote log file, so output variable 
will be empty output, err = sshClient.Exec(ctx, bootstrapCmd) - // Clean up ssh-agent and askpass script after bootstrap (whether success or failure) - cleanupAgentCmd := fmt.Sprintf("sudo -u %s bash -c 'SSH_AUTH_SOCK=%s ssh-agent -k 2>/dev/null || true; rm -f %s %s 2>/dev/null || true'", config.VMSSHUser, actualAgentSocket, actualAgentSocket, askpassScriptPath) - _, _ = sshClient.Exec(ctx, cleanupAgentCmd) + removeRemoteConnYAML() // Always download log file from remote host (whether success or failure) // Use sudo cat since the log file was created with sudo @@ -889,7 +989,6 @@ func WaitForAllNodesReady(ctx context.Context, kubeconfig *rest.Config, clusterD return nil } - // GetSSHPublicKeyContent returns the SSH public key content as a string. // If SSHPublicKey is a file path, it reads and returns the file content. // If SSHPublicKey is a plain-text string, it returns it directly. diff --git a/pkg/cluster/vms.go b/pkg/cluster/vms.go index 72f5cd1..61c4e3f 100644 --- a/pkg/cluster/vms.go +++ b/pkg/cluster/vms.go @@ -31,6 +31,12 @@ import ( "github.com/deckhouse/storage-e2e/internal/kubernetes/virtualization" "github.com/deckhouse/storage-e2e/internal/logger" "github.com/deckhouse/virtualization/api/core/v1alpha2" + "github.com/deckhouse/virtualization/api/core/v1alpha3" +) + +const ( + vmClassAutoCreatedLabelKey = "storage-e2e.deckhouse.io/auto-created" + vmClassAutoCreatedLabelValue = "true" ) // VMResources tracks VM-related resources created for a test cluster @@ -91,6 +97,10 @@ func CreateVirtualMachines(ctx context.Context, virtClient *virtualization.Clien return nil, nil, fmt.Errorf("the following VM-related resources already exist (CLUSTER_CREATE_MODE=%s): %s", config.TestClusterCreateMode, strings.Join(conflictMessages, ", ")) } + if err := ensureVirtualMachineClassForClusterVMs(ctx, virtClient); err != nil { + return nil, nil, err + } + // Create all CVMIs first (with waiting for Ready) storageClass := config.TestClusterStorageClass var wg 
sync.WaitGroup @@ -243,6 +253,100 @@ func checkResourceConflicts(ctx context.Context, virtClient *virtualization.Clie return conflicts, nil } +// ensureVirtualMachineClassForClusterVMs ensures the configured VirtualMachineClass exists on the base cluster and reaches Ready. +// For every configured name (including the default generic), it fetches the class and waits for Ready when present. +// If the default generic class is missing, it fails fast with an actionable error instead of failing later during VM creation. +// When the name is not generic and the class is missing, it creates one by cloning the spec from the built-in "generic" +// class and setting spec.cpu.type to Host. Sizing policies and other inherited fields are kept; spec.nodeSelector and +// spec.tolerations are cleared because Host CPU pins the instruction set to the node; keeping generic placement rules +// could allow heterogeneous nodes and break live migration (see Deckhouse VirtualMachineClass CPU type Host docs). +// The new object is labeled for identification and is never deleted by e2e cleanup. 
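The clone-and-override step described in the comment above can be sketched with simplified stand-in types; the real `v1alpha3` structs live in the virtualization API module, so the field names and shapes below are assumptions for illustration only:

```go
package main

import "fmt"

// Simplified stand-ins for the v1alpha3 VirtualMachineClass spec fields this
// diff touches (hypothetical shapes, not the real API types).
type CPU struct{ Type string }

type VMClassSpec struct {
	CPU            CPU
	NodeSelector   map[string]string
	Tolerations    []string
	SizingPolicies []string // inherited from the template untouched
}

// cloneForHostCPU mirrors the logic of ensureVirtualMachineClassForClusterVMs:
// keep sizing policies, force cpu.type=Host, and drop placement fields so
// scheduling constraints are not inherited from the generic class.
func cloneForHostCPU(template VMClassSpec) VMClassSpec {
	cloned := template // a value copy plays the role of DeepCopy in this sketch
	cloned.CPU = CPU{Type: "Host"}
	cloned.NodeSelector = nil
	cloned.Tolerations = nil
	return cloned
}

func main() {
	generic := VMClassSpec{
		CPU:            CPU{Type: "Discovery"},
		NodeSelector:   map[string]string{"node-role": "worker"},
		SizingPolicies: []string{"small", "large"},
	}
	out := cloneForHostCPU(generic)
	fmt.Println(out.CPU.Type, out.NodeSelector == nil, len(out.SizingPolicies))
	// → Host true 2
}
```

The real implementation uses `template.Spec.DeepCopy()` rather than a value copy, since the API types contain shared slices and maps.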
+func ensureVirtualMachineClassForClusterVMs(ctx context.Context, virtClient *virtualization.Client) error { + className := config.EffectiveVirtualMachineClassName() + vmcClient := virtClient.VirtualMachineClasses() + + _, err := vmcClient.Get(ctx, className) + if err == nil { + return waitForVirtualMachineClassReady(ctx, virtClient, className) + } + if !errors.IsNotFound(err) { + return fmt.Errorf("VirtualMachineClass %q: %w", className, err) + } + + if className == config.TestClusterVirtualMachineClassNameDefaultValue { + return fmt.Errorf("VirtualMachineClass %q not found on the base cluster; enable the virtualization module and ensure this VirtualMachineClass exists before running tests", className) + } + + template, err := vmcClient.Get(ctx, config.TestClusterVirtualMachineClassNameDefaultValue) + if err != nil { + if errors.IsNotFound(err) { + return fmt.Errorf("VirtualMachineClass %q is missing and template class %q was not found on the cluster; cannot auto-create the class", + className, config.TestClusterVirtualMachineClassNameDefaultValue) + } + return fmt.Errorf("get template VirtualMachineClass %q: %w", config.TestClusterVirtualMachineClassNameDefaultValue, err) + } + + cloned := template.Spec.DeepCopy() + cloned.CPU = v1alpha3.CPU{Type: v1alpha3.CPUTypeHost} + cloned.NodeSelector = v1alpha3.NodeSelector{} + cloned.Tolerations = nil + + vmc := &v1alpha3.VirtualMachineClass{ + ObjectMeta: metav1.ObjectMeta{ + Name: className, + Labels: map[string]string{ + vmClassAutoCreatedLabelKey: vmClassAutoCreatedLabelValue, + }, + }, + Spec: *cloned, + } + + if err := vmcClient.Create(ctx, vmc); err != nil { + if errors.IsAlreadyExists(err) { + return waitForVirtualMachineClassReady(ctx, virtClient, className) + } + return fmt.Errorf("create VirtualMachineClass %q: %w", className, err) + } + + logger.Info("Created VirtualMachineClass %s (from generic template, cpu.type=Host, cleared nodeSelector/tolerations, label %s=%s)", + className, vmClassAutoCreatedLabelKey, 
vmClassAutoCreatedLabelValue) + return waitForVirtualMachineClassReady(ctx, virtClient, className) +} + +func waitForVirtualMachineClassReady(ctx context.Context, virtClient *virtualization.Client, name string) error { + waitCtx, cancel := context.WithTimeout(ctx, config.VirtualMachineClassReadinessTimeout) + defer cancel() + + ticker := time.NewTicker(5 * time.Second) + defer ticker.Stop() + lastLog := time.Now() + + for { + select { + case <-waitCtx.Done(): + return fmt.Errorf("timeout waiting for VirtualMachineClass %s to reach Ready (phase still not Ready after %v)", + name, config.VirtualMachineClassReadinessTimeout) + case <-ticker.C: + vmc, err := virtClient.VirtualMachineClasses().Get(waitCtx, name) + if err != nil { + return fmt.Errorf("get VirtualMachineClass %s: %w", name, err) + } + switch vmc.Status.Phase { + case v1alpha3.ClassPhaseReady: + logger.Debug("VirtualMachineClass %s is Ready", name) + return nil + case v1alpha3.ClassPhaseTerminating: + return fmt.Errorf("VirtualMachineClass %s is Terminating", name) + default: + if time.Since(lastLog) >= 30*time.Second { + logger.Debug("VirtualMachineClass %s phase: %s", name, vmc.Status.Phase) + lastLog = time.Now() + } + } + } + } +} + // getVMNodes extracts all VM nodes from cluster definition func getVMNodes(clusterDef *config.ClusterDefinition) []config.ClusterNode { var vmNodes []config.ClusterNode @@ -385,7 +489,7 @@ func createVM(ctx context.Context, virtClient *virtualization.Client, namespace Labels: map[string]string{"vm": "linux", "service": "v1"}, }, Spec: v1alpha2.VirtualMachineSpec{ - VirtualMachineClassName: "generic", + VirtualMachineClassName: config.EffectiveVirtualMachineClassName(), EnableParavirtualization: true, RunPolicy: v1alpha2.RunPolicy("AlwaysOn"), OsType: v1alpha2.OsType("Generic"), diff --git a/tests/csi-all-stress-tests/csi_all_stress_tests_test.go b/tests/csi-all-stress-tests/csi_all_stress_tests_test.go index 938e61e..2aea4a5 100644 --- 
a/tests/csi-all-stress-tests/csi_all_stress_tests_test.go +++ b/tests/csi-all-stress-tests/csi_all_stress_tests_test.go @@ -121,6 +121,8 @@ var _ = Describe("All CSIs Stress Tests", Ordered, func() { GinkgoWriter.Printf(" TEST_CLUSTER_STORAGE_CLASS: %s\n", config.TestClusterStorageClass) } + GinkgoWriter.Printf(" TEST_CLUSTER_VIRTUAL_MACHINE_CLASS_NAME: %s\n", config.EffectiveVirtualMachineClassName()) + // SSH_HOST - no masking if config.SSHHost != "" { GinkgoWriter.Printf(" SSH_HOST: %s\n", config.SSHHost) diff --git a/tests/test-template/template_test.go b/tests/test-template/template_test.go index 5518ecf..6b9a775 100644 --- a/tests/test-template/template_test.go +++ b/tests/test-template/template_test.go @@ -74,6 +74,8 @@ var _ = Describe("Template Test", Ordered, func() { GinkgoWriter.Printf(" TEST_CLUSTER_STORAGE_CLASS: %s\n", config.TestClusterStorageClass) } + GinkgoWriter.Printf(" TEST_CLUSTER_VIRTUAL_MACHINE_CLASS_NAME: %s\n", config.EffectiveVirtualMachineClassName()) + // SSH_HOST - no masking if config.SSHHost != "" { GinkgoWriter.Printf(" SSH_HOST: %s\n", config.SSHHost)