Add E2E test cases for apply -f with kubeadm join flow#106

Open
Copilot wants to merge 6 commits into main from copilot/add-e2e-tests-apply-f-usage

Copilot AI commented Feb 28, 2026

  • Add a third VM (vm-e2e-kubeadm-*) to hack/e2e/infra/main.bicep for the kubeadm join test
  • Update hack/e2e/lib/infra.sh to extract and store kubeadm VM outputs, and wait for SSH on the new VM
  • Add node_join_kubeadm function to hack/e2e/lib/node-join-kubeadm.sh (uses apply -f with KubeadmNodeJoin action)
  • Update node_join_all in node-join.sh to also join the kubeadm VM in parallel
  • Update hack/e2e/lib/validate.sh to include kubeadm VM in node validation and smoke tests
  • Update hack/e2e/lib/cleanup.sh to collect logs from and clean up the kubeadm VM
  • Update hack/e2e/run.sh to add join-kubeadm command
  • Update hack/e2e/README.md to document the new kubeadm test flow and the three node join modes
  • Refactor hack/e2e/infra/main.bicep to use reusable modules/vm.bicep for all flex-node VMs
  • Replace systemd-run with direct aks-flex-node apply -f invocation in node_join_kubeadm
  • Expand kubeadm bootstrap setup in node_join_kubeadm: add kubeadm Roles/RoleBindings (nodes-kubeadm-config, kubelet-config), ClusterRole/ClusterRoleBinding (get-nodes), and ConfigMaps (cluster-info, kubeadm-config, kubelet-config)
  • Add imagePublisher, imageOffer, imageSku, imageVersion parameters to modules/vm.bicep with Ubuntu 24.04 LTS defaults
  • Fix BCP120 Bicep error: replace vmMsi.outputs.principalId (runtime value) with msiVmName var (deployment-start value) in roleAssignments name fields
  • Split node-join.sh into category-specific files:
    • node-join.sh — shared _deploy_and_start_agent helper + node_join_all orchestration + sources sub-files
    • node-join-msi.sh: node_join_msi
    • node-join-token.sh: node_join_token
    • node-join-kubeadm.sh: node_join_kubeadm
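
The parallel join described in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the three node_join_* functions are replaced by stubs, and the failure messages and the "fail if any mode fails" policy are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: stubs stand in for the real functions that live in
# node-join-msi.sh, node-join-token.sh, and node-join-kubeadm.sh.
node_join_msi()     { echo "msi: joined"; }
node_join_token()   { echo "token: joined"; }
node_join_kubeadm() { echo "kubeadm: joined"; }

# Start all three joins in the background, then wait on each PID so that
# a failure in any single mode makes the whole step fail.
node_join_all() {
  local rc=0 pid_msi pid_token pid_kubeadm

  node_join_msi &     pid_msi=$!
  node_join_token &   pid_token=$!
  node_join_kubeadm & pid_kubeadm=$!

  wait "$pid_msi"     || { echo "msi join failed" >&2; rc=1; }
  wait "$pid_token"   || { echo "token join failed" >&2; rc=1; }
  wait "$pid_kubeadm" || { echo "kubeadm join failed" >&2; rc=1; }

  return "$rc"
}
```

Waiting on each PID individually, rather than calling a bare wait, is what lets the function report which mode failed.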
Original prompt

This section details the original issue you should resolve

<issue_title>E2E test cases for apply -f usage</issue_title>
<issue_description>We should follow the existing e2e tests under hack/e2e to create and join a node using apply -f. We will use the kubeadm join flow for this case to extend test coverage to the kubeadm component.</issue_description>

Comments on the Issue (you are @copilot in this section)


Co-authored-by: bcho <1975118+bcho@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add E2E test cases for apply -f usage" to "Add E2E test cases for apply -f with kubeadm join flow" Feb 28, 2026
Member

bcho left a comment

}
}

resource pipKubeadm 'Microsoft.Network/publicIPAddresses@2023-11-01' = {

Member

do a refactor to make azure vm creation as a reusable bicep module

Author

Refactored in 949e7bc. Extracted pip + NIC + VM into hack/e2e/infra/modules/vm.bicep with an assignManagedIdentity flag. main.bicep now uses three module declarations and references outputs via vmMsi.outputs.principalId etc.

sudo mkdir -p /etc/aks-flex-node /var/log/aks-flex-node
sudo cp /tmp/kubeadm-join.json /etc/aks-flex-node/

sudo systemd-run \

Member

don't use systemd-run, just invoke the apply command

Author

Fixed in 949e7bc. The systemd-run wrapper is gone; the remote script now calls sudo /usr/local/bin/aks-flex-node apply --no-prettyui -f /etc/aks-flex-node/kubeadm-join.json directly and pipes its output through tee to /var/log/aks-flex-node/aks-flex-node.log.
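
For illustration, one way to run the command directly while keeping both a log and the command's exit status is sketched below. The helper name run_and_log is hypothetical and not part of this PR, and appending to the log (tee -a) is an assumption. The point of the sketch is that a plain pipe into tee would otherwise report tee's exit status rather than that of apply.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from this PR): run a command, mirror its combined
# stdout/stderr to a log file, and return the command's own exit status
# instead of tee's.
run_and_log() {
  local log_file="$1"
  shift
  mkdir -p "$(dirname "$log_file")"
  "$@" 2>&1 | tee -a "$log_file"
  return "${PIPESTATUS[0]}"
}

# Possible use on the VM, with the paths and flags quoted in the comment
# above (writing under /var/log would additionally need root):
#   run_and_log /var/log/aks-flex-node/aks-flex-node.log \
#     sudo /usr/local/bin/aks-flex-node apply --no-prettyui \
#     -f /etc/aks-flex-node/kubeadm-join.json
```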

…rect apply invocation

Co-authored-by: bcho <1975118+bcho@users.noreply.github.com>
Member

bcho left a comment

// =============================================================================
// modules/vm.bicep - Reusable Ubuntu flex-node VM module
//
// Creates a public IP, NIC, and Ubuntu 22.04 VM in the given subnet.

Member

allow customizing the vhd version. Use 24.04 by default

Author

Done in 836de59. Added imagePublisher, imageOffer, imageSku, and imageVersion parameters to modules/vm.bicep. Defaults are now Ubuntu 24.04 LTS (ubuntu-24_04-lts / server / latest). Callers can override any of these to pin a specific image.

…p module

Co-authored-by: bcho <1975118+bcho@users.noreply.github.com>
Member

bcho commented Feb 28, 2026

[06:36:53] [INFO] Checking prerequisites...
[06:36:53] [OK] All prerequisites satisfied
[06:36:53] [INFO] Configuration loaded:
[06:36:53] [INFO]   Resource Group:   ***
[06:36:53] [INFO]   Location:         ***
[06:36:53] [INFO]   Subscription:     ***
[06:36:53] [INFO]   Name Suffix:      1772260613
[06:36:53] [INFO]   Skip Cleanup:     1
==================================================================
  AKS Flex Node E2E Test - Full Run
==================================================================
[06:36:53] [INFO] Building aks-flex-node binary for linux/amd64...
[06:36:56] [OK] Binary built in 3s -> /tmp/aks-flex-node-e2e/aks-flex-node
==================================================================
  Deploying Infrastructure (Bicep)
==================================================================
[06:36:57] [INFO] Resource group already exists: ***
[06:36:57] [INFO] Deploying Bicep template (this may take 5-10 minutes)...
ERROR: /home/github-runner/actions-runner/_work/AKSFlexNode/AKSFlexNode/hack/e2e/infra/main.bicep(191,29) : Error BCP120: This expression is being used in an assignment to the "name" property of the "Microsoft.Authorization/roleAssignments" type, which requires a value that can be calculated at the start of the deployment. Properties of vmMsi which can be calculated at the start include "name". [https://aka.ms/bicep/core-diagnostics#BCP120]
/home/github-runner/actions-runner/_work/AKSFlexNode/AKSFlexNode/hack/e2e/infra/main.bicep(202,29) : Error BCP120: This expression is being used in an assignment to the "name" property of the "Microsoft.Authorization/roleAssignments" type, which requires a value that can be calculated at the start of the deployment. Properties of vmMsi which can be calculated at the start include "name". [https://aka.ms/bicep/core-diagnostics#BCP120]
Error: Process completed with exit code 1.

test failed @copilot

…ent names

Co-authored-by: bcho <1975118+bcho@users.noreply.github.com>
Author

Copilot AI commented Feb 28, 2026

[06:36:53] [INFO] Checking prerequisites...
[06:36:53] [OK] All prerequisites satisfied

...

Fixed in 9c59f4c. The BCP120 error occurred because the name property of Microsoft.Authorization/roleAssignments was seeded with vmMsi.outputs.principalId, which is a runtime-only module output. Replaced both occurrences with msiVmName (a var known at deployment start) — the principalId is still used correctly in properties.principalId.

Member

bcho left a comment

# node_join_msi - Install Azure CLI + MSI auth, deploy binary, run agent
# node_join_token - Create bootstrap token/RBAC, deploy binary, run agent
# node_join_all - Join both nodes (MSI first, then token)
# node_join_msi - Install Azure CLI + MSI auth, deploy binary, run agent

Member

split node-join.sh by category into smaller files

Author

Done in eba1003. node-join.sh now only contains the shared _deploy_and_start_agent helper, the three source statements, and node_join_all. Each join mode has its own file:

  • node-join-msi.sh: node_join_msi
  • node-join-token.sh: node_join_token
  • node-join-kubeadm.sh: node_join_kubeadm

run.sh still sources only node-join.sh, so the public API is unchanged.
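
As a rough picture of that layout, the sketch below shows how an entry-point file could pull in one file per join mode from a given directory. The load_join_modes function and its missing-file check are hypothetical; the actual node-join.sh is described above as using three plain source statements, and the loop here is only for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical loader illustrating the split: one file per join mode,
# all sourced from the same lib directory.
load_join_modes() {
  local lib_dir="$1" mode file
  for mode in msi token kubeadm; do
    file="${lib_dir}/node-join-${mode}.sh"
    if [ ! -f "$file" ]; then
      echo "missing join mode file: $file" >&2
      return 1
    fi
    # shellcheck source=/dev/null
    source "$file"
  done
}

# In node-join.sh this could be called with the script's own directory:
#   load_join_modes "$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
```

Because functions defined by a sourced file are global, callers that source only the entry point still see node_join_msi, node_join_token, and node_join_kubeadm.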

…in-kubeadm.sh

Co-authored-by: bcho <1975118+bcho@users.noreply.github.com>
bcho marked this pull request as ready for review February 28, 2026 22:23

Development

Successfully merging this pull request may close these issues.

E2E test cases for apply -f usage
