feat(disruption): Memory pressure. #1040

Open
Zenithar wants to merge 4 commits into main from zenithar/chaos-controller/memory_pressure_injection

Zenithar commented Feb 17, 2026

[Disruption] Add memory pressure injection

Summary

What: Adds a new memoryPressure disruption kind that allows injecting controlled memory pressure on targeted pods by allocating memory inside the target's cgroup until a specified percentage of the memory limit is consumed.

Why: The chaos-controller already supports CPU pressure, disk pressure, and disk failure disruptions, but it had no way to simulate memory pressure. Memory pressure injection is needed to test application resilience to OOM conditions, memory-constrained environments, and memory-based autoscaling behaviors.

How: Memory is consumed by joining the target's cgroup and using mmap(2) with MAP_POPULATE to allocate anonymous memory pages that are immediately backed by physical memory. Allocation can be instantaneous or gradually ramped over a configurable duration. A two-tier architecture (pressure injector + stress subprocess) mirrors the existing CPU pressure pattern.

Changes

Domain Logic (api/v1beta1/):

  • New MemoryPressureSpec type with targetPercent (required, 1-100%) and rampDuration (optional) fields
  • Validation: percentage range, non-negative ramp duration, incompatibility with container-specific targeting
  • GenerateArgs() and Explain() methods for CLI arg generation and human-readable explanations
  • ParseTargetPercent() helper accepting both "76%" and "76" formats
  • DisruptionSpec extended with MemoryPressure field, wired into all validation rules (exclusivity with NodeFailure/ContainerFailure/PodReplacement, onInit incompatibility, container specificity check, pulse compatibility)
  • DisruptionKindPicker, DisruptionCount, and Explain updated for the new kind
  • Deepcopy generated (zz_generated.deepcopy.go)
  • CRD manifests regenerated (disruptions, disruptioncrons, disruptionrollouts)

Injector (injector/):

  • memoryPressureInjector — orchestrator that parses the spec, spawns a background stress process, and manages its lifecycle (inject/clean)
  • memoryStressInjector — in-process stress worker that joins the target cgroup, reads memory.max/memory.current (cgroupv2) or memory.limit_in_bytes/memory.usage_in_bytes (cgroupv1), computes the allocation delta, and consumes memory via mmap in configurable ramp steps
  • Platform-specific mmap: memory_alloc_linux.go (real syscall.Mmap with MAP_POPULATE) and memory_alloc_other.go (stub returning error)
  • MemoryStressArgsBuilder interface + mock for testability
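
The allocation-delta computation mentioned for memoryStressInjector could look roughly like the sketch below. All names here (parseCgroupValue, allocationDelta, rampSteps) are hypothetical, and how the PR derives the number of ramp steps from rampDuration is not stated in this description, so the step count is simply a parameter. One fact the sketch relies on: cgroup v2 writes the literal string "max" to memory.max when no limit is set.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCgroupValue parses the content of a cgroup memory file such as
// memory.max or memory.current. cgroup v2 reports "max" when the
// cgroup has no memory limit.
func parseCgroupValue(raw string) (value uint64, unlimited bool, err error) {
	s := strings.TrimSpace(raw)
	if s == "max" {
		return 0, true, nil
	}
	value, err = strconv.ParseUint(s, 10, 64)
	return value, false, err
}

// allocationDelta returns how many bytes must still be allocated for
// usage to reach percent of limit. Dividing before multiplying avoids
// overflow on very large limits at the cost of rounding down slightly.
func allocationDelta(limit, current uint64, percent int) uint64 {
	target := limit / 100 * uint64(percent)
	if current >= target {
		return 0 // already at or above the target, nothing to allocate
	}
	return target - current
}

// rampSteps splits delta into the given number of chunks; the last
// chunk absorbs the remainder so the sizes sum to delta.
func rampSteps(delta uint64, steps int) []uint64 {
	if steps < 1 {
		steps = 1
	}
	chunk := delta / uint64(steps)
	sizes := make([]uint64, steps)
	for i := range sizes {
		sizes[i] = chunk
	}
	sizes[steps-1] += delta % uint64(steps)
	return sizes
}

func main() {
	// Sample values standing in for memory.max and memory.current.
	limit, unlimited, err := parseCgroupValue("1000000000\n")
	if err != nil || unlimited {
		fmt.Println("no usable memory limit")
		return
	}
	delta := allocationDelta(limit, 250000000, 76)
	fmt.Println("bytes to allocate:", delta)
	fmt.Println("per-step sizes:", rampSteps(delta, 4))
}
```

An instantaneous injection corresponds to a single step; a ramped one would allocate each chunk in turn with a pause between them.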

CLI (cli/injector/):

  • memory-pressure cobra subcommand with --target-percent and --ramp-duration flags
  • memory-stress cobra subcommand (subprocess) with --target-percent and --ramp-duration flags
  • Registered in injector's main.go

Types (types/):

  • New DisruptionKindMemoryPressure and DisruptionKindMemoryStress constants
  • DisruptionKindMemoryPressure added to DisruptionKindNames slice

Safemode (safemode/):

  • New Memory safemode struct implementing the Safemode interface
  • Registered in AddAllSafemodeObjects when MemoryPressure is present

E2E Tests (controllers/):

  • New controllers/memory_pressure_test.go covering the following scenarios:
    • Basic injection and status lifecycle
    • Ramp duration injection
    • Combined with CPU pressure (multi-disruption)
    • Pulse mode with active/dormant cycling
    • Targeted container stop resilience (SIGTERM and SIGKILL)
  • Refactored controllers/cpu_pressure_test.go:
    • Removed ExecuteRemoteCommand helper (and remotecommand / spdystream dependencies)
    • Replaced remote exec-based stress verification with pod readiness checks
    • Un-skipped the data race test (CHAOSPLT-212)

Unit Tests:

  • api/v1beta1/memory_pressure_test.go — Validate, GenerateArgs, Explain
  • injector/memory_pressure_test.go — Inject (success, invalid percent, background error), Clean (no-op, after inject)

Documentation:

  • docs/memory_disruption.md — Full user-facing documentation (spec, examples, cgroup behavior, troubleshooting)
  • docs/disruption_catalogue.md — Comprehensive disruption catalogue
  • docs/README.md — Updated with memory pressure link
  • CLAUDE.md — Project-level AI coding guidelines

Vendor / Dependencies:

  • Removed github.com/moby/spdystream and github.com/mxk/go-flowrate from go.mod (no longer needed after removing remotecommand usage in tests)
  • Removed vendored packages: moby/spdystream, mxk/go-flowrate, x/net/websocket, k8s.io/apimachinery/pkg/util/httpstream, k8s.io/apimachinery/pkg/util/portforward, k8s.io/apimachinery/pkg/util/proxy, k8s.io/apimachinery/pkg/util/remotecommand, k8s.io/client-go/tools/remotecommand, k8s.io/client-go/transport/spdy, k8s.io/client-go/transport/websocket, k8s.io/client-go/util/exec

Testing

Unit Tests:

  • MemoryPressureSpec.Validate() — valid percentages, edge cases (0%, 101%, "abc"), ramp duration
  • MemoryPressureSpec.GenerateArgs() — with and without ramp duration
  • MemoryPressureSpec.Explain() — immediate vs ramped descriptions
  • memoryPressureInjector.Inject() — successful injection, invalid percent, background process errors
  • memoryPressureInjector.Clean() — no-op when no process, proper stop after injection

E2E Tests:

  • Memory pressure injection status lifecycle
  • Ramp duration gradual allocation
  • Multi-disruption (memory + CPU pressure)
  • Pulse mode active/dormant cycling
  • Container restart resilience (SIGTERM / SIGKILL)

Breaking Changes

No breaking changes. memoryPressure is a new optional field on DisruptionSpec.

Dependencies

  • Removed github.com/moby/spdystream v0.5.0 (indirect, no longer needed)
  • Removed github.com/mxk/go-flowrate (indirect, no longer needed)
  • No new dependencies added

Checklist

  • Code follows Go idioms and project conventions
  • Platform-specific code gated with build tags (linux / !linux)
  • CRD manifests regenerated
  • Deepcopy code regenerated
  • License headers present on all new files
  • Safemode integration wired
  • Validation rules updated for all exclusivity/compatibility checks
  • No sensitive information in logs
  • Context propagation implemented correctly

Zenithar self-assigned this Feb 17, 2026

datadog-official bot commented Feb 17, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 29.43%
Overall Coverage: 39.07% (-0.22%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 5e73238

Zenithar force-pushed the zenithar/chaos-controller/memory_pressure_injection branch 3 times, most recently from 713b891 to 90f2dfe, on February 17, 2026 16:19
Zenithar force-pushed the zenithar/chaos-controller/memory_pressure_injection branch from 90f2dfe to e0350aa on February 17, 2026 17:07
Zenithar marked this pull request as ready for review on February 18, 2026 14:57
Zenithar requested a review from a team as a code owner on February 18, 2026 14:57