Skip to content

Swap encryption pr1 shared infra#6776

Open
DevVegeta wants to merge 15 commits into
GoogleCloudPlatform:masterfrom
ashishsuneja:swap-encryption-pr1-shared-infra_new
Open

Swap encryption pr1 shared infra#6776
DevVegeta wants to merge 15 commits into
GoogleCloudPlatform:masterfrom
ashishsuneja:swap-encryption-pr1-shared-infra_new

Conversation

@DevVegeta

@DevVegeta DevVegeta commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

feat(swap_encryption/pr1): shared infra — SwapDaemonSet BaseResource + PKB framework swap config

Stacked PR 1/5. Adds the infrastructure all later PRs build on.

New files:

  • resources/container_service/swap_daemonset.pySwapDaemonSet(BaseResource): full DaemonSet lifecycle (_Create, _Delete, WaitForPod, PodExec, RecoverPod). Added to spec.resources for auto-teardown.
  • resources/container_service/swap_config.pyGkeSwapConfig(BaseResource): writes linuxConfig.swapConfig YAML for --system-config-from-file. EksSwapConfig(BaseResource): nodeadm stub (deferred to PR Add swap_config to node-pool spec #6780).

PKB framework changes:

  • configs/container_spec.pySwapConfigSpec(BaseSpec) + _SwapConfigDecoder; swap_config field on NodepoolSpec (mirrors SandboxSpec pattern)
  • resources/container_service/container.pyswap_config attr on BaseNodePoolConfig
  • resources/container_service/container_cluster.py — propagate swap_config in _InitializeNodePool()
  • providers/gcp/google_kubernetes_engine.py_AddNodeParamsToCmd() reads nodepool_config.swap_config: applies --system-config-from-file (linuxConfig.swapConfig + sysctls), sets UBUNTU_CONTAINERD, --no-enable-autorepair, boot-disk IOPS/throughput

Thin benchmark (swap_encryption_benchmark.py):

  • BENCHMARK_CONFIG declares benchmark nodepool with swap_config — GKE cluster creation handles provisioning automatically, no separate nodepool management needed
  • Prepare(): deploy SwapDaemonSet + _delete_default_pool()
  • Run(): verify swap_active, swap_encrypted; report swap_cipher + swap_total_kb
  • Cleanup(): empty — PKB auto-deletes spec.resources

srikant-patil-epam and others added 5 commits June 19, 2025 10:46
…il.IssueCommand

Replace all raw ['gcloud', ...] list + vm_util.IssueCommand calls in
swap_encryption_benchmark.py with PKB's existing GcloudCommand infrastructure:

- _create_benchmark_node_pool: cluster._GcloudCommand() + cmd.flags + cmd.Issue
- _delete_default_node_pool: cluster._GcloudCommand() + cmd.Issue
- _attach_swap_disk: gcp_util.GcloudCommand(_GcpZonalResource) for create+attach
- _delete_disk_by_name: gcp_util.GcloudCommand for describe/detach/delete

Add _GcpZonalResource shim: pins zone for gcloud compute operations.
GcloudCommand auto-injects --project and --zone/--region, handles auth
token refresh -- matching PKB standards.
…fix imports

Replace manual temp-file + kubectl apply in _deploy_daemonset() with
PKB's kubernetes_commands.ApplyManifest():

- Remove _daemonset_yaml() helper
- _deploy_daemonset() delegates to kubernetes_commands.ApplyManifest(
    'cluster/swap_encryption_daemonset.yaml.j2', **kwargs)
- Add kubernetes_commands import; remove vm_util import (now unused)
- Fix import order: providers.gcp before resources.container_service
… remove cgroup hack

Address Ajay review comments on PR GoogleCloudPlatform#6776:

Comment #r3457877984 (linuxConfig.swapConfig):
Extend --system-config-from-file YAML with linuxConfig blocks:
  linuxConfig.swapConfig.enabled: true -- GKE sets up node-level swap
  dedicatedLocalSsdProfile.diskCount: N -- LSSD: use local NVMe for swap
  linuxConfig.sysctl: vm.swappiness=100, vm.min_free_kbytes=200,
    vm.watermark_scale_factor=500
Ref: https://cloud.google.com/kubernetes-engine/docs/how-to/node-memory-swap

Comment #r3457928855 (cgroup hack):
Remove memory.swap.max=max loop from swap_encryption_daemonset.yaml.j2.
With kubeletConfig.memorySwapBehavior=LimitedSwap the kubelet manages
per-container swap allocation; the cgroup hack is unnecessary.
…5); manifest moved to data/cluster and rendered via vm_util
@ajaysundark

Copy link
Copy Markdown

/cc @ajaysundark

DevVegeta and others added 2 commits June 19, 2026 13:18
Per Ajay's review comment on PR GoogleCloudPlatform#6758:
- Add _GKE_KUBELET_MEMORY_SWAP flag (default LimitedSwap) so the
  benchmark nodepool is created with kubeletConfig.memorySwapBehavior
  set via --system-config-from-file, enabling pod-level swap usage.
- Wrap gcloud IssueCommand in try/finally to clean up the temp YAML.
- Update nodepool creation log to include kubelet_swap value.
make install 2>&1) > /tmp/pkb_memtier_build.log 2>&1 || \\
echo "[pkb] WARNING: memtier_benchmark build failed (see /tmp/pkb_memtier_build.log); redis-benchmark fallback will be used"
fi
if command -v memtier_benchmark >/dev/null 2>&1; then

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR calls out this change as only the infra layer. Is this still required to hard-code memtier / other setup here?

The model we discussed with "swap as a base resource" is that by passing a swap config to the vm, we could plug this into other kubernetes benchmarks.

eg:

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ajaysundark , The DaemonSet was stripped to only measurement tools (fio, cryptsetup, mdadm, sysstat, nvme-cli). Workload benchmarks (redis, memtier, opensearch) run in separate PKB pods. Fix committed in the lean DaemonSet PR1 commit and in PR4's DaemonSet slim-down.

echo "[pkb] WARNING: kernel source extraction failed" >&2
fi
echo "[pkb] Unlocking container cgroup swap limits..."
# GKE cgroup v2 sets memory.swap.max=0 per-container, which

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubernetes sets memory.swap.max=0 unless swap config is enabled.

For GKE, this would be already handled if you use --system-config-from-file to handle the setup.

cmd.flags['system-config-from-file'] = FLAGS.gke_node_system_config

linuxConfig:
  swapConfig:
    enabled: true
    dedicatedLocalSsdProfile:
      diskCount: 1
  sysctl:
    vm.min_free_kbytes: 200
    vm.watermark_scale_factor: 500
    vm.swappiness: 100

More reference: https://docs.cloud.google.com/kubernetes-engine/docs/how-to/node-memory-swap#enable

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If swap provisioning is handled via DaemonSet (for EKS), it needs to also handle this step: https://kubernetes.io/docs/tutorials/cluster-management/provision-swap-memory/#set-up-kubelet-configuration.

To add kubelet configuratoin and perform a kubelet service restart at the end to pickup swap configuration

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively, if a nodeadm script is used instead of DaemonSet in EKS, the restart will not be needed as the setup is handled before kubelet start.

@DevVegeta DevVegeta Jun 24, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubernetes sets memory.swap.max=0 unless swap config is enabled.

For GKE, this would be already handled if you use --system-config-from-file to handle the setup.

cmd.flags['system-config-from-file'] = FLAGS.gke_node_system_config

linuxConfig:
  swapConfig:
    enabled: true
    dedicatedLocalSsdProfile:
      diskCount: 1
  sysctl:
    vm.min_free_kbytes: 200
    vm.watermark_scale_factor: 500
    vm.swappiness: 100

More reference: https://docs.cloud.google.com/kubernetes-engine/docs/how-to/node-memory-swap#enable

Hi @ajaysundark ,
Fixed, The --system-config-from-file YAML now includes the full linuxConfig block alongside kubeletConfig.memorySwapBehavior:
yamlkubeletConfig:
memorySwapBehavior: LimitedSwap
linuxConfig:
swapConfig:
enabled: true
dedicatedLocalSsdProfile:
diskCount: 1 # LSSD path; omitted for standard/hyperdisk
sysctl:
vm.min_free_kbytes: 200
vm.watermark_scale_factor: 500
vm.swappiness: 100
For non-LSSD machine types the dedicatedLocalSsdProfile block is omitted. The _LSSD_COUNT flag controls diskCount when LSDs are present.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively, if a nodeadm script is used instead of DaemonSet in EKS, the restart will not be needed as the setup is handled before kubelet start.

Hi @ajaysundark , ACKNOWLEDGED in stub. The _configure_eks_kubelet_swap() docstring explicitly documents the nodeadm bootstrap approach (apiVersion: node.eks.aws/v1alpha1 / Kind: NodeConfig / spec.kubelet.config.memorySwapBehavior: LimitedSwap) as the intended implementation path. Deferred to PR #6780.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If swap provisioning is handled via DaemonSet (for EKS), it needs to also handle this step: https://kubernetes.io/docs/tutorials/cluster-management/provision-swap-memory/#set-up-kubelet-configuration.

To add kubelet configuratoin and perform a kubelet service restart at the end to pickup swap configuration

Hi @ajaysundark , _configure_eks_kubelet_swap() is present in all branches as a documented stub, explaining the nodeadm approach and the PR #6780 dependency. The actual kubelet config write + restart is not implemented — it is explicitly deferred pending PR #6780 (SwapConfigSpec) merging.

fi
echo "[pkb] Unlocking container cgroup swap limits..."
# GKE cgroup v2 sets memory.swap.max=0 per-container, which
# prevents swap usage even when the node has a swap device and

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hacking memory.swap.max=max across kubepods in the manifest is not required when swap kubelet configuration is set as LimitedSwap, as kubelet will allocate swap limits to the container.

You could still set sysctls as shared in the above comment to have a longer swapping window.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ajaysundark ,

Fixed. The memory.swap.max=max loop across kubepods has been removed from the DaemonSet manifest. With kubeletConfig.memorySwapBehavior=LimitedSwap set via --system-config-from-file, the kubelet manages per-container swap allocation directly — the manual cgroup write is not needed and was superseded by the kubelet config.

'swap_encryption_gke_kubelet_memory_swap',
'LimitedSwap',
'Value for kubeletConfig.memorySwapBehavior injected via '
'--system-config-from-file when creating the GKE benchmark nodepool. '

@ajaysundark ajaysundark Jun 23, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This correctly calls out the config, but I think probably a miss that the script still calls the daemonset for GKE as well ( _deploy_daemonset at :469)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ajaysundark , The DaemonSet is still deployed on GKE. In the current code, it serves as the fio execution vehicle for Phase 1 measurements (all phases run via kubectl exec into the pod) — it's no longer used for swap setup on GKE (that's done natively via linuxConfig.swapConfig)

@DevVegeta DevVegeta force-pushed the swap-encryption-pr1-shared-infra_new branch from 2b0b687 to 2c7a677 Compare June 24, 2026 05:42
Comment on lines +992 to +993
# Per Ajay's review (go/pkb-swap-encryption-pr1 #r3457877984):
# kubeletConfig.memorySwapBehavior: kubelet allocates swap to pods.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont remember suggesting explicit kubelet configuration needed for
memorySwapBehavior:LimitedSwap in GKE.
This will be handled by the swapConfig automatically. you only need to pass
the swapConfig and sysctl in system-config yaml file.

@DevVegeta DevVegeta Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ajaysundark , Done — removed kubeletConfig.memorySwapBehavior from the system-config YAML. The --system-config-from-file now only passes linuxConfig.swapConfig + linuxConfig.sysctl, which is sufficient since GKE automatically enables memorySwapBehavior=LimitedSwap when swapConfig.enabled=true. The _GKE_KUBELET_MEMORY_SWAP flag has also been removed. The YAML passed at nodepool creation is now: ```
linuxConfig:
swapConfig:
enabled: true # (+ dedicatedLocalSsdProfile for LSSD)
sysctl:
vm.min_free_kbytes: 200
vm.watermark_scale_factor: 500
vm.swappiness: 100

GKE vs. EKS swap encryption and LSSD performance comparison.
Two-step nodepool setup: PKB provisions a minimal cluster with a cheap
default nodepool (Step 1), then Prepare() adds the real benchmark
nodepool (n4-highmem-32 / c4-*-lssd, COS_CONTAINERD, 80k IOPS) with a

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this and other comment references
on COS_CONTAINERD accurate? Seems like we are working with UBUNTU_CONTAINERD
(I'm fine with picking either for now), Does it need an update?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ajaysundark ,

Fixed — updated BENCHMARK_CONFIG description, Prepare() docstring, and _create_benchmark_node_pool() docstring to say UBUNTU_CONTAINERD consistently, matching the actual flag default. The one remaining COS_CONTAINERD mention in the _NODE_IMAGE_TYPE flag description ("Use COS_CONTAINERD only when dm-crypt is disabled") is intentional — it documents a valid but non-default option for plain-swap baseline runs.

@DevVegeta DevVegeta force-pushed the swap-encryption-pr1-shared-infra_new branch from bec6422 to 57c4884 Compare June 25, 2026 15:03
- Add SwapDaemonSet(resource.BaseResource) in resources/container_service/swap_daemonset.py
  - _Create(): apply Jinja2 manifest + wait for Running + /tmp/pkb_ready
  - _Delete(): in-pod swapoff/dmsetup/losetup/pkill teardown; kubectl delete
  - PodExec(): transient-reset retry, rc=137 OOM detection, pod recovery
- Add SwapNodePool(resource.BaseResource) in resources/container_service/swap_nodepool.py
  - _Create(): gcloud node-pools create with linuxConfig.swapConfig + optional swap disk
  - _Delete(): detach+delete disk; delete nodepool
  - DeleteDefaultPool(): remove dummy e2-medium pool after DaemonSet pod Running
- Rewrite benchmark to thin pattern: Prepare() uses resource.Create() + spec.resources
  - Cleanup() is empty - PKB framework auto-deletes spec.resources
  - Run() uses daemonset.PodExec() throughout
- Addresses Zac review: resources pattern, no infra code in benchmark file
- Fix COS_CONTAINERD -> UBUNTU_CONTAINERD (r3472549985)
- swapConfig auto-enables memorySwapBehavior=LimitedSwap (r3472513706)
… NodepoolSpec field

BREAKING: replaces SwapNodePool (standalone nodepool lifecycle) with the
correct PKB pattern: swap configuration declared in BENCHMARK_CONFIG and
applied by the existing GKE cluster creation flow.

New files:
- resources/container_service/swap_config.py
  - GkeSwapConfig(BaseResource): WriteLinuxConfigYaml(), ValidHyperdiskThroughput()
  - EksSwapConfig(BaseResource): stub for nodeadm config (deferred to PR GoogleCloudPlatform#6780)

Core framework changes:
- configs/container_spec.py: add SwapConfigSpec(BaseSpec) + _SwapConfigDecoder
  + swap_config field on NodepoolSpec
- resources/container_service/container.py: add swap_config attr to BaseNodePoolConfig
- resources/container_service/container_cluster.py: propagate swap_config in
  _InitializeNodePool() (mirrors sandbox_config pattern)
- providers/gcp/google_kubernetes_engine.py: _AddNodeParamsToCmd() reads
  nodepool_config.swap_config - applies --system-config-from-file,
  UBUNTU_CONTAINERD, --no-enable-autorepair, boot-disk-provisioned-iops/throughput

Thin benchmark:
- BENCHMARK_CONFIG declares benchmark nodepool with swap_config (no separate
  nodepool create needed - GKE cluster creation handles it)
- Prepare(): deploy SwapDaemonSet + delete default-pool
- Run(): verify swap_active + swap_encrypted; report samples
- Cleanup(): empty (PKB auto-deletes spec.resources)

Addresses Ajay reviews:
- r3457826290: swap as base resource plugged into GKE cluster creation flow
- r3457877984: linuxConfig.swapConfig via --system-config-from-file (GkeSwapConfig)
- r3457928855: removed memory.swap.max hack
- r3457964593: UBUNTU_CONTAINERD set per-nodepool in _AddNodeParamsToCmd
- r3472513706: swapConfig auto-enables memorySwapBehavior=LimitedSwap
- r3472549985: UBUNTU_CONTAINERD required for dm-crypt
… NodepoolSpec field

BREAKING: replaces SwapNodePool (standalone nodepool lifecycle) with the
correct PKB pattern: swap configuration declared in BENCHMARK_CONFIG and
applied by the existing GKE cluster creation flow.

New files:
- resources/container_service/swap_config.py
  - GkeSwapConfig(BaseResource): WriteLinuxConfigYaml(), ValidHyperdiskThroughput()
  - EksSwapConfig(BaseResource): stub for nodeadm config (deferred to PR GoogleCloudPlatform#6780)

Core framework changes:
- configs/container_spec.py: add SwapConfigSpec(BaseSpec) + _SwapConfigDecoder
  + swap_config field on NodepoolSpec
- resources/container_service/container.py: add swap_config attr to BaseNodePoolConfig
- resources/container_service/container_cluster.py: propagate swap_config in
  _InitializeNodePool() (mirrors sandbox_config pattern)
- providers/gcp/google_kubernetes_engine.py: _AddNodeParamsToCmd() reads
  nodepool_config.swap_config - applies --system-config-from-file,
  UBUNTU_CONTAINERD, --no-enable-autorepair, boot-disk-provisioned-iops/throughput

Thin benchmark:
- BENCHMARK_CONFIG declares benchmark nodepool with swap_config (no separate
  nodepool create needed - GKE cluster creation handles it)
- Prepare(): deploy SwapDaemonSet + delete default-pool
- Run(): verify swap_active + swap_encrypted; report samples
- Cleanup(): empty (PKB auto-deletes spec.resources)

Addresses Ajay reviews:
- r3457826290: swap as base resource plugged into GKE cluster creation flow
- r3457877984: linuxConfig.swapConfig via --system-config-from-file (GkeSwapConfig)
- r3457928855: removed memory.swap.max hack
- r3457964593: UBUNTU_CONTAINERD set per-nodepool in _AddNodeParamsToCmd
- r3472513706: swapConfig auto-enables memorySwapBehavior=LimitedSwap
- r3472549985: UBUNTU_CONTAINERD required for dm-crypt
GkeSwapConfig and EksSwapConfig now both inherit from BaseSwapConfig(BaseResource).
Common sysctl attrs (swappiness, min_free_kbytes, watermark_scale_factor) live in
the base class. Cloud-specific attrs remain in each subclass.

Addresses Zac review: GkeSwapConfig & EksSwapConfig should inherit from BaseSwapConfig.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants