50 changes: 37 additions & 13 deletions hack/e2e/README.md
@@ -1,7 +1,8 @@
# AKS Flex Node E2E Tests

End-to-end tests that provision an AKS cluster and two Ubuntu VMs in Azure, join
them as flex nodes (one via MSI, one via bootstrap token), and run smoke tests.
End-to-end tests that provision an AKS cluster and three Ubuntu VMs in Azure,
join them as flex nodes (one via MSI, one via bootstrap token, one via kubeadm
join using `apply -f`), and run smoke tests.

## Prerequisites

@@ -27,8 +28,8 @@ export E2E_LOCATION=westus2
make e2e
```

This will build the agent binary, deploy infrastructure via Bicep, join both
nodes, run validations, collect logs, and tear everything down.
This will build the agent binary, deploy infrastructure via Bicep, join all
three nodes, run validations, collect logs, and tear everything down.

## Commands

@@ -38,10 +39,11 @@ omitted it defaults to `all`.
| Command | Description |
|---------|-------------|
| `all` | Full flow: build, infra, join, validate, cleanup (default) |
| `infra` | Deploy AKS cluster + 2 VMs via Bicep |
| `join` | Join both nodes to the cluster |
| `infra` | Deploy AKS cluster + 3 VMs via Bicep |
| `join` | Join all nodes to the cluster |
| `join-msi` | Join only the MSI-authenticated node |
| `join-token` | Join only the bootstrap-token node |
| `join-kubeadm` | Join only the kubeadm node (`apply -f` with `KubeadmNodeJoin`) |
| `validate` | Verify nodes joined and run smoke tests |
| `smoke` | Run smoke tests only (nginx pods on flex nodes) |
| `logs` | Collect logs from VMs |
@@ -72,6 +74,21 @@ Additional environment variables:
| `AZURE_SUBSCRIPTION_ID` | (auto-detected) | Azure subscription |
| `AZURE_TENANT_ID` | (auto-detected) | Azure tenant |

## Node Join Modes

The E2E suite tests three node join methods:

| VM | Auth Mode | Join Method |
|----|-----------|-------------|
| `vm-e2e-msi-*` | Managed Identity (MSI) | `aks-flex-node agent --config config.json` |
| `vm-e2e-token-*` | Bootstrap Token | `aks-flex-node agent --config config.json` |
| `vm-e2e-kubeadm-*` | Bootstrap Token | `aks-flex-node apply -f kubeadm-join.json` |

The kubeadm VM uses the `apply -f` command with a JSON action file that
contains a sequence of component actions (configure OS, download CRI/kube/CNI
binaries, start containerd, then `KubeadmNodeJoin`) to join the cluster using
the kubeadm join flow.
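
To illustrate the sequential, fail-fast behavior described above, here is a minimal Bash sketch. It is not the agent's implementation and does not reflect the real action file schema, which this diff does not show. The step names, `run_step`, `apply_steps`, and `FAIL_AT` are all hypothetical and exist only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical step names that mirror the sequence described in the README;
# the real action names inside kubeadm-join.json are not shown in this diff.
STEPS=(configure-os download-binaries start-containerd kubeadm-node-join)

# run_step is a stand-in for executing a single action. It fails when the
# step name equals $FAIL_AT so that the fail-fast behavior can be observed.
run_step() {
  local step="$1"
  [[ "$step" != "${FAIL_AT:-}" ]]
}

# apply_steps runs every step in order and stops at the first failure.
# Steps that finished successfully are recorded in COMPLETED.
apply_steps() {
  COMPLETED=()
  local step
  for step in "${STEPS[@]}"; do
    if ! run_step "$step"; then
      echo "step failed: $step" >&2
      return 1
    fi
    COMPLETED+=("$step")
  done
}

# Simulate a failure while starting containerd: the join step never runs.
FAIL_AT=start-containerd
apply_steps || echo "completed before failure: ${COMPLETED[*]}"
```

In this sketch a failure in `start-containerd` means `kubeadm-node-join` is never attempted, which is why the last step reported as failed in the logs is usually the place to start debugging.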

## Iterative Development

The subcommands make it easy to deploy infrastructure once and iterate on the
@@ -84,6 +101,7 @@ join or validation steps without re-provisioning every time.
# Iterate on the join logic
./hack/e2e/run.sh join-msi
./hack/e2e/run.sh join-token
./hack/e2e/run.sh join-kubeadm

# Run validation
./hack/e2e/run.sh validate
@@ -114,13 +132,16 @@ make e2e-cleanup # Tear down resources
hack/e2e/
run.sh Main entry point / orchestrator
infra/
main.bicep Bicep template (AKS + VNet + NSG + 2 VMs + role assignments)
main.bicep Bicep template (AKS + VNet + NSG + 3 VMs + role assignments)
lib/
common.sh Logging, prereqs, config, state management, SSH helpers
infra.sh Bicep deployment, output extraction, kubeconfig fetch
node-join.sh MSI and token node join logic
validate.sh Node-ready checks and smoke tests (nginx pods)
cleanup.sh Log collection and Azure resource teardown
common.sh Logging, prereqs, config, state management, SSH helpers
infra.sh Bicep deployment, output extraction, kubeconfig fetch
node-join.sh Shared helper (_deploy_and_start_agent) + node_join_all orchestration
node-join-msi.sh MSI auth node join (node_join_msi)
node-join-token.sh Bootstrap token node join (node_join_token)
node-join-kubeadm.sh Kubeadm apply -f node join (node_join_kubeadm)
validate.sh Node-ready checks and smoke tests (nginx pods)
cleanup.sh Log collection and Azure resource teardown
```

## State File
@@ -135,7 +156,10 @@ previous one left off. Use `run.sh status` to inspect it.
your SSH key is available (defaults to `~/.ssh/id_rsa.pub`). Check the state
file for the correct VM public IPs with `run.sh status`.
- **Node not joining**: Run `run.sh logs` to pull `journalctl` and agent logs
from both VMs. Logs are saved to `$E2E_WORK_DIR/logs/`.
from all VMs. Logs are saved to `$E2E_WORK_DIR/logs/`.
- **Kubeadm join failures**: Check `kubeadm-agent-journal.log` and
`kubeadm-kubelet.log` in the logs directory. The `apply -f` approach runs
sequentially; each action step must succeed before the next one starts.
- **Timeouts**: Adjust `E2E_SSH_WAIT_TIMEOUT`, `E2E_NODE_JOIN_TIMEOUT`, or
`E2E_POD_READY_TIMEOUT` environment variables (in seconds).
- **Leftover resources**: If a previous run didn't clean up, run
217 changes: 56 additions & 161 deletions hack/e2e/infra/main.bicep
@@ -5,8 +5,10 @@
// - AKS cluster (1-node control plane)
// - VM with system-assigned managed identity (MSI auth mode)
// - VM without managed identity (bootstrap token auth mode)
// - VM without managed identity (bootstrap token auth, joined via kubeadm apply -f)
//
// Both VMs run Ubuntu 22.04 LTS, have public IPs, and allow SSH ingress.
// All flex-node VMs run Ubuntu 22.04 LTS, have public IPs, and allow SSH
// ingress. VM creation is delegated to the reusable modules/vm.bicep module.
// =============================================================================

@description('Azure region for all resources.')
@@ -34,11 +36,12 @@ param tags object = {}
// ---------------------------------------------------------------------------
// Variables
// ---------------------------------------------------------------------------
var clusterName = 'aks-e2e-${nameSuffix}'
var msiVmName = 'vm-e2e-msi-${nameSuffix}'
var tokenVmName = 'vm-e2e-token-${nameSuffix}'
var vnetName = 'vnet-e2e-${nameSuffix}'
var nsgName = 'nsg-e2e-${nameSuffix}'
var clusterName = 'aks-e2e-${nameSuffix}'
var msiVmName = 'vm-e2e-msi-${nameSuffix}'
var tokenVmName = 'vm-e2e-token-${nameSuffix}'
var kubeadmVmName = 'vm-e2e-kubeadm-${nameSuffix}'
var vnetName = 'vnet-e2e-${nameSuffix}'
var nsgName = 'nsg-e2e-${nameSuffix}'

var subnetAksName = 'snet-aks'
var subnetVmName = 'snet-vm'
@@ -136,159 +139,47 @@ resource aksCluster 'Microsoft.ContainerService/managedClusters@2024-01-01' = {
}

// ---------------------------------------------------------------------------
// Public IPs for VMs
// Flex-node VMs (via reusable module)
// ---------------------------------------------------------------------------
resource pipMsi 'Microsoft.Network/publicIPAddresses@2023-11-01' = {
name: '${msiVmName}-pip'
location: location
tags: tags
sku: { name: 'Standard' }
properties: {
publicIPAllocationMethod: 'Static'
}
}

resource pipToken 'Microsoft.Network/publicIPAddresses@2023-11-01' = {
name: '${tokenVmName}-pip'
location: location
tags: tags
sku: { name: 'Standard' }
properties: {
publicIPAllocationMethod: 'Static'
}
}

// ---------------------------------------------------------------------------
// NICs
// ---------------------------------------------------------------------------
resource nicMsi 'Microsoft.Network/networkInterfaces@2023-11-01' = {
name: '${msiVmName}-nic'
location: location
tags: tags
properties: {
ipConfigurations: [
{
name: 'ipconfig1'
properties: {
subnet: {
id: vnet.properties.subnets[1].id
}
publicIPAddress: {
id: pipMsi.id
}
privateIPAllocationMethod: 'Dynamic'
}
}
]
}
}

resource nicToken 'Microsoft.Network/networkInterfaces@2023-11-01' = {
name: '${tokenVmName}-nic'
location: location
tags: tags
properties: {
ipConfigurations: [
{
name: 'ipconfig1'
properties: {
subnet: {
id: vnet.properties.subnets[1].id
}
publicIPAddress: {
id: pipToken.id
}
privateIPAllocationMethod: 'Dynamic'
}
}
]
module vmMsi 'modules/vm.bicep' = {
name: 'deploy-vm-msi'
params: {
location: location
vmName: msiVmName
vmSize: vmSize
adminUsername: adminUsername
sshPublicKey: sshPublicKey
subnetId: vnet.properties.subnets[1].id
assignManagedIdentity: true
tags: tags
}
}

// ---------------------------------------------------------------------------
// VM: MSI (system-assigned managed identity)
// ---------------------------------------------------------------------------
resource vmMsi 'Microsoft.Compute/virtualMachines@2024-03-01' = {
name: msiVmName
location: location
tags: tags
identity: {
type: 'SystemAssigned'
}
properties: {
hardwareProfile: { vmSize: vmSize }
osProfile: {
computerName: msiVmName
adminUsername: adminUsername
linuxConfiguration: {
disablePasswordAuthentication: true
ssh: {
publicKeys: [
{
path: '/home/${adminUsername}/.ssh/authorized_keys'
keyData: sshPublicKey
}
]
}
}
}
storageProfile: {
imageReference: {
publisher: 'Canonical'
offer: '0001-com-ubuntu-server-jammy'
sku: '22_04-lts-gen2'
version: 'latest'
}
osDisk: {
createOption: 'FromImage'
managedDisk: { storageAccountType: 'StandardSSD_LRS' }
}
}
networkProfile: {
networkInterfaces: [ { id: nicMsi.id } ]
}
module vmToken 'modules/vm.bicep' = {
name: 'deploy-vm-token'
params: {
location: location
vmName: tokenVmName
vmSize: vmSize
adminUsername: adminUsername
sshPublicKey: sshPublicKey
subnetId: vnet.properties.subnets[1].id
assignManagedIdentity: false
tags: tags
}
}

// ---------------------------------------------------------------------------
// VM: Token (no managed identity)
// ---------------------------------------------------------------------------
resource vmToken 'Microsoft.Compute/virtualMachines@2024-03-01' = {
name: tokenVmName
location: location
tags: tags
properties: {
hardwareProfile: { vmSize: vmSize }
osProfile: {
computerName: tokenVmName
adminUsername: adminUsername
linuxConfiguration: {
disablePasswordAuthentication: true
ssh: {
publicKeys: [
{
path: '/home/${adminUsername}/.ssh/authorized_keys'
keyData: sshPublicKey
}
]
}
}
}
storageProfile: {
imageReference: {
publisher: 'Canonical'
offer: '0001-com-ubuntu-server-jammy'
sku: '22_04-lts-gen2'
version: 'latest'
}
osDisk: {
createOption: 'FromImage'
managedDisk: { storageAccountType: 'StandardSSD_LRS' }
}
}
networkProfile: {
networkInterfaces: [ { id: nicToken.id } ]
}
module vmKubeadm 'modules/vm.bicep' = {
name: 'deploy-vm-kubeadm'
params: {
location: location
vmName: kubeadmVmName
vmSize: vmSize
adminUsername: adminUsername
sshPublicKey: sshPublicKey
subnetId: vnet.properties.subnets[1].id
assignManagedIdentity: false
tags: tags
}
}

@@ -297,21 +188,21 @@ resource vmToken 'Microsoft.Compute/virtualMachines@2024-03-01' = {
// ---------------------------------------------------------------------------
// Azure Kubernetes Service Cluster Admin Role
resource roleClusterAdmin 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(aksCluster.id, vmMsi.id, 'aks-cluster-admin')
name: guid(aksCluster.id, msiVmName, 'aks-cluster-admin')
scope: aksCluster
properties: {
principalId: vmMsi.identity.principalId
principalId: vmMsi.outputs.principalId
principalType: 'ServicePrincipal'
roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '0ab0b1a8-8aac-4efd-b8c2-3ee1fb270be8')
}
}

// Azure Kubernetes Service RBAC Cluster Admin
resource roleRbacAdmin 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(aksCluster.id, vmMsi.id, 'aks-rbac-cluster-admin')
name: guid(aksCluster.id, msiVmName, 'aks-rbac-cluster-admin')
scope: aksCluster
properties: {
principalId: vmMsi.identity.principalId
principalId: vmMsi.outputs.principalId
principalType: 'ServicePrincipal'
roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'b1ff04bb-8a4e-4dc4-8eb5-8693973ce19b')
}
@@ -324,11 +215,15 @@ output clusterName string = aksCluster.name
output clusterId string = aksCluster.id
output clusterFqdn string = aksCluster.properties.fqdn

output msiVmName string = vmMsi.name
output msiVmIp string = pipMsi.properties.ipAddress
output msiVmPrincipalId string = vmMsi.identity.principalId
output msiVmName string = vmMsi.outputs.vmName
output msiVmIp string = vmMsi.outputs.publicIpAddress
output msiVmPrincipalId string = vmMsi.outputs.principalId

output tokenVmName string = vmToken.name
output tokenVmIp string = pipToken.properties.ipAddress
output tokenVmName string = vmToken.outputs.vmName
output tokenVmIp string = vmToken.outputs.publicIpAddress

output kubeadmVmName string = vmKubeadm.outputs.vmName
output kubeadmVmIp string = vmKubeadm.outputs.publicIpAddress

output adminUsername string = adminUsername
