
EFS mount fails on Bottlerocket: efs-proxy cannot bind to localhost #323

@mike-ainsel


What happened?

EFS mounts fail on EKS 1.34 with Bottlerocket nodes. The efs-proxy component starts but immediately panics when trying to bind to localhost, causing mount failures with DeadlineExceeded errors.

When run manually inside the CSI node container, efs-proxy crashes with:

thread 'main' (127) panicked at src/controller.rs:89:13:
Failed to bind 127.0.0.1:20381
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
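The panic message does not include the underlying OS error. A plain bind to the same address from inside the CSI node container can show which errno the kernel returns. Below is a minimal Python sketch of that check. It is not part of efs-utils, and check_bind is a hypothetical helper.

- EADDRINUSE would suggest that another process, for example a leftover efs-proxy from an earlier mount attempt, already holds the port.
- EADDRNOTAVAIL or EACCES would instead point at the network namespace or permissions.

```python
import errno
import socket


def check_bind(host: str, port: int) -> str:
    """Try to bind and listen on host:port (hypothetical diagnostic helper).

    Returns "ok" on success, otherwise the symbolic errno name,
    for example "EADDRINUSE" or "EADDRNOTAVAIL".
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return "ok"
    except OSError as exc:
        # Translate the numeric errno into its name so the cause is readable.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    # 20381 is the accept port from the stunnel config in this report.
    print("127.0.0.1:20381 ->", check_bind("127.0.0.1", 20381))
```

Running it once while no mount is in progress and once during a mount attempt can help tell a port conflict apart from a loopback problem.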

The mount.log shows:

2026-01-14 16:12:01 UTC - INFO - Starting efs-proxy: "/sbin/efs-proxy /var/run/efs/stunnel-config... --tls"
2026-01-14 16:12:01 UTC - INFO - Started efs-proxy, pid: 26
2026-01-14 16:12:01 UTC - WARNING - Error connecting to 127.0.0.1:20241, [Errno 111] Connection refused
2026-01-14 16:12:16 UTC - ERROR - Mounting ... failed due to timeout after 15 sec
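The log shows the following sequence:

1. mount.efs starts efs-proxy.
2. A connection attempt to 127.0.0.1:20241 is refused (errno 111, ECONNREFUSED on Linux).
3. The NFS mount times out after 15 seconds, consistent with nothing ever listening on the proxy port.

The following is a hypothetical polling helper, not mount.efs's actual code. It repeats that readiness check so you can confirm from inside the container whether anything ever starts listening on a given port.

```python
import socket
import time


def wait_for_listener(host: str, port: int, timeout: float, interval: float = 0.1) -> bool:
    """Poll host:port until a TCP connect succeeds or `timeout` seconds pass.

    Hypothetical helper: a refused or timed-out connect means nothing is
    accepting on the port yet, so the loop sleeps and retries.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # ECONNREFUSED (or a timeout): no listener yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_listener("127.0.0.1", 20241, timeout=15) mirrors the 15-second window in the log above.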

What you expected to happen?

EFS should mount successfully. The efs-proxy should be able to bind to localhost and proxy NFS traffic over TLS.

How to reproduce it (as minimally and precisely as possible)?

  1. Deploy an EKS 1.34 cluster with Bottlerocket nodes (tested with Bottlerocket OS 1.52.0)
  2. Install EFS CSI driver v2.2.0 as an EKS add-on
  3. Create an EFS file system with mount targets in the node subnets
  4. Create a StorageClass, PVC, and Pod:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-XXXXXXXXX
  directoryPerms: "755"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: efs-test
spec:
  containers:
  - name: app
    image: amazonlinux:2023
    command: ["/bin/sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: efs
      mountPath: /data
  volumes:
  - name: efs
    persistentVolumeClaim:
      claimName: efs-pvc
  5. The Pod stays in ContainerCreating with FailedMount events.

Anything else we need to know?

  • Plain NFS mount works (without TLS/efs-proxy):

    mount -t nfs4 -o nfsvers=4.1 fs-xxx.efs.region.amazonaws.com:/ /mnt/efs
  • A direct TLS connection to EFS works (tested with openssl s_client)

  • The CSI node container has the expected settings:

    • privileged: true
    • hostNetwork: true
    • The loopback interface exists and shows traffic
  • The generated stunnel config includes socket = a:SO_BINDTODEVICE=lo, which applies SO_BINDTODEVICE=lo to the accepting socket and may cause problems in containerized environments

  • Encrypted (KMS) and unencrypted EFS file systems fail the same way

  • The failure also reproduces on an Amazon Linux 2023 EC2 instance with efs-utils v2.4.1

  • Warning in logs: Could not start amazon-efs-mount-watchdog, unrecognized init system "aws-efs-csi-dri"
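To test the SO_BINDTODEVICE hypothesis on its own, the sketch below applies the same socket option to a TCP socket and then binds it to loopback. It reports which of the two steps fails. try_bind_to_device is a hypothetical helper, not efs-utils code. It also assumes the option is set before bind, which is a guess about how stunnel/efs-proxy order these calls. The numeric fallback 25 is the Linux asm-generic value, used only if the Python build does not expose the constant.

```python
import errno
import socket

# Assumption: fall back to the asm-generic Linux value (25) if the
# Python build does not expose socket.SO_BINDTODEVICE.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def try_bind_to_device(ifname: str, host: str = "127.0.0.1", port: int = 0) -> str:
    """Set SO_BINDTODEVICE=ifname on a TCP socket, then bind and listen.

    Returns "ok", or "setsockopt:<ERRNO>" / "bind:<ERRNO>" naming the
    step that failed.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        try:
            # The option value is the interface name as bytes, e.g. b"lo".
            sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, ifname.encode())
        except OSError as exc:
            return "setsockopt:" + errno.errorcode.get(exc.errno, str(exc.errno))
        try:
            sock.bind((host, port))
            sock.listen(1)
        except OSError as exc:
            return "bind:" + errno.errorcode.get(exc.errno, str(exc.errno))
        return "ok"
    finally:
        sock.close()
```

If try_bind_to_device("lo") returns "ok" inside the CSI node container, the device binding alone is probably not what breaks efs-proxy there.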

Environment

  • Kubernetes version (use kubectl version): v1.34 (EKS platform eks.9)
  • Driver version: v2.2.0-eksbuild.1 (EKS Add-on)
  • Node OS: Bottlerocket OS 1.52.0 (aws-k8s-1.34)
  • Kernel: 6.12.58
  • Container Runtime: containerd://2.1.5+bottlerocket
  • Region: eu-central-1

Please also attach debug logs to help us better diagnose

EFS CSI Node Pod logs:

I0114 16:09:14.552835       1 config_dir.go:88] Creating symlink from '/etc/amazon/efs' to '/var/amazon/efs'
I0114 16:09:14.568524       1 driver.go:131] Registering Node Server
I0114 16:09:14.568583       1 driver.go:133] Registering Controller Server
I0114 16:09:14.568623       1 driver.go:136] Starting efs-utils watchdog
I0114 16:09:14.569367       1 driver.go:151] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0114 16:12:00.975226       1 mount_linux.go:285] Detected OS without systemd

Mount log from inside container (/var/log/amazon/efs/mount.log):

2026-01-14 16:12:01 UTC - INFO - version=2.4.1 options={'rw': None, 'accesspoint': 'fsap-xxx', 'tls': None}
2026-01-14 16:12:01 UTC - INFO - binding 20241
2026-01-14 16:12:01 UTC - WARNING - Could not start amazon-efs-mount-watchdog, unrecognized init system "aws-efs-csi-dri"
2026-01-14 16:12:01 UTC - INFO - Starting efs-proxy: "/sbin/efs-proxy /var/run/efs/stunnel-config.fs-xxx... --tls"
2026-01-14 16:12:01 UTC - INFO - Started efs-proxy, pid: 26
2026-01-14 16:12:01 UTC - WARNING - Error connecting to 127.0.0.1:20241, [Errno 111] Connection refused
2026-01-14 16:12:01 UTC - INFO - Executing: "/sbin/mount.nfs4 127.0.0.1:/ /var/lib/kubelet/pods/.../mount -o rw,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,port=20241" with 15 sec time limit.
2026-01-14 16:12:16 UTC - ERROR - Mounting fs-xxx.efs.eu-central-1.amazonaws.com to ... failed due to timeout after 15 sec, mount attempt 1/3
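The mount.nfs4 command targets 127.0.0.1:/ with port=20241. The kernel NFS client therefore connects to the local efs-proxy rather than to EFS directly, so if efs-proxy never listens on that port, the mount can only time out.

The manual efs-proxy run below used a config whose accept port is 20381, which differs from the 20241 expected by this mount attempt. A small hypothetical helper (nfs_options) that pulls the port out of a logged option string makes it easier to check which port a given attempt expects.

```python
def nfs_options(option_string: str) -> dict:
    """Split an NFS mount option string into a dict (hypothetical helper).

    Flags without "=" (for example "hard") map to None.
    """
    opts = {}
    for item in option_string.split(","):
        key, sep, value = item.partition("=")
        opts[key] = value if sep else None
    return opts


# Option string copied from the mount.log line above.
mount_opts = nfs_options(
    "rw,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,"
    "retrans=2,noresvport,port=20241"
)
# NFS traffic for this attempt is sent to 127.0.0.1:<proxy_port>.
proxy_port = int(mount_opts["port"])
```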

Manual efs-proxy execution (from inside CSI node container):

$ /sbin/efs-proxy "/var/run/efs/stunnel-config.fs-xxx..." --tls
thread 'main' (127) panicked at src/controller.rs:89:13:
Failed to bind 127.0.0.1:20381
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Stunnel config content:

fips = no
foreground = yes
socket = l:SO_REUSEADDR=yes
socket = a:SO_BINDTODEVICE=lo
pid = /var/run/efs/.../stunnel.pid
[efs]
client = yes
accept = 127.0.0.1:20381
connect = fs-xxx.efs.eu-central-1.amazonaws.com:2049
sslVersion = TLSv1.2
renegotiation = no
TIMEOUTbusy = 20
TIMEOUTclose = 0
TIMEOUTidle = 70
delay = yes
verify = 2
CAfile = /etc/amazon/efs/efs-utils.crt
cert = /var/run/efs/.../certificate.pem
key = /etc/amazon/efs/privateKey.pem
checkHost = fs-xxx.efs.eu-central-1.amazonaws.com
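efs-proxy is started with this stunnel-style config file. The values most relevant here are the global socket options and the accept address in the [efs] section.

Below is a minimal parser for just the keys shown above. It is a hypothetical sketch, not the parser efs-proxy uses, and it does not handle the full stunnel grammar. It extracts those values so the accept port can be compared with the port in the mount options.

```python
def parse_stunnel_config(text: str) -> dict:
    """Parse the subset of stunnel config syntax seen in this report.

    Returns the global `socket` options and the accept/connect
    (host, port) pairs from the first service section.
    """
    result = {"socket": [], "accept": None, "connect": None}
    section = None  # None while still in the global section
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith((";", "#")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        # Split on the first "=" only: "socket = a:SO_BINDTODEVICE=lo".
        key, _, value = (part.strip() for part in line.partition("="))
        if key == "socket" and section is None:
            result["socket"].append(value)
        elif key in ("accept", "connect") and section is not None and result[key] is None:
            host, _, port = value.rpartition(":")
            result[key] = (host, int(port))
    return result
```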
