Node Join Failure
⏱️ Estimated time required: 30 minutes
Background
Corporation XYZ's e-commerce platform has been steadily growing, and the engineering team has decided to expand the EKS cluster to handle the increased workload. The team plans to create a new subnet in the us-west-2 region and provision a new managed node group under this subnet.
Sam, an experienced DevOps engineer, has been tasked with executing this expansion plan. Sam begins by creating a new VPC subnet in the us-west-2 region, with a new CIDR block. The goal is to have the new managed node group run the application workloads in this new subnet, separate from the existing node groups.
After creating the new subnet, Sam configures the new managed node group new_nodegroup_2 in the EKS cluster. During node group creation, Sam notices that the new nodes are not joining the cluster and do not appear in it.
Step 1: Verify Node Status
Let's first verify whether the new nodes from node group new_nodegroup_2 are visible in the cluster:
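A minimal sketch of this check, assuming kubectl is already configured for the eks-workshop cluster. The helper name list_nodegroup_nodes is hypothetical; the selector relies on the eks.amazonaws.com/nodegroup label that EKS applies to nodes in a managed node group.

```shell
# List only the nodes that carry the managed node group label.
# $1: node group name (for example new_nodegroup_2)
list_nodegroup_nodes() {
  kubectl get nodes --selector="eks.amazonaws.com/nodegroup=$1"
}
```

Calling `list_nodegroup_nodes new_nodegroup_2` prints the matching nodes, or a "No resources found" message when none have joined.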
Output:
No resources found
Step 2: Check Managed Node Group Status
Let's examine the EKS managed node group to verify its status and configuration:
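One way to retrieve this information is the AWS CLI describe-nodegroup call, sketched here under the assumption that the AWS CLI is installed and has credentials for the account. The wrapper name describe_nodegroup is hypothetical.

```shell
# Print the full description of a managed node group as JSON.
# $1: cluster name, $2: node group name
describe_nodegroup() {
  aws eks describe-nodegroup \
    --cluster-name "$1" \
    --nodegroup-name "$2"
}
```

For this scenario the call would be `describe_nodegroup eks-workshop new_nodegroup_2`.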
Output:
{
"nodegroup": {
"nodegroupName": "new_nodegroup_2",
"nodegroupArn": "arn:aws:eks:us-west-2:1234567890:nodegroup/eks-workshop/new_nodegroup_2/abcd1234-1234-abcd-1234-1234abcd1234",
"clusterName": "eks-workshop",
...
"status": "ACTIVE",
"capacityType": "ON_DEMAND",
"scalingConfig": {
"minSize": 0,
"maxSize": 1,
"desiredSize": 1
},
...
"health": {
"issues": []
}
}
}
Alternatively, you can check the same information in the EKS console, on the cluster's Compute tab.
Key observations from the output:
- Node group status is ACTIVE
- Desired capacity is 1
- No health issues reported
- Scaling configuration is correct
Step 3: Investigate Auto Scaling Group
Let's check the ASG activities to understand the instance launch status:
3.1. Identify the Node Group's Auto Scaling Group Name
Capture the node group's Auto Scaling group name in the environment variable NEW_NODEGROUP_2_ASG_NAME.
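A possible way to capture the name, assuming the AWS CLI is available: the describe-nodegroup response lists the backing Auto Scaling groups under nodegroup.resources.autoScalingGroups, so a --query expression can extract the first one. The function name get_nodegroup_asg_name is hypothetical.

```shell
# Print the name of the first Auto Scaling group backing a managed node group.
# $1: cluster name, $2: node group name
get_nodegroup_asg_name() {
  aws eks describe-nodegroup \
    --cluster-name "$1" \
    --nodegroup-name "$2" \
    --query 'nodegroup.resources.autoScalingGroups[0].name' \
    --output text
}
```

The result can then be stored with `NEW_NODEGROUP_2_ASG_NAME=$(get_nodegroup_asg_name eks-workshop new_nodegroup_2)`.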
3.2. Check the Auto Scaling Activities
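The activity history can be listed with the AWS CLI, sketched here assuming NEW_NODEGROUP_2_ASG_NAME already holds the Auto Scaling group name. The wrapper name describe_asg_activities is hypothetical.

```shell
# List the scaling activities recorded for an Auto Scaling group.
# $1: Auto Scaling group name
describe_asg_activities() {
  aws autoscaling describe-scaling-activities \
    --auto-scaling-group-name "$1"
}
```

For example: `describe_asg_activities "$NEW_NODEGROUP_2_ASG_NAME"`.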
Output:
{
"Activities": [
{
"ActivityId": "1234abcd-1234-abcd-1234-1234abcd1234",
"AutoScalingGroupName": "eks-new_nodegroup_2-abcd1234-1234-abcd-1234-1234abcd1234",
--->>> "Description": "Launching a new EC2 instance: i-1234abcd1234abcd1",
"Cause": "At 2024-10-09T14:59:26Z a user request update of AutoScalingGroup constraints to min: 0, max: 2, desired: 1 changing the desired capacity from 0 to 1. At 2024-10-09T14:59:36Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 1.",
...
--->>> "StatusCode": "Successful",
...
}
]
}
You can also check this in the EKS console: on the cluster's node group tab, click the Auto Scaling group name to open the ASG console and view its activity.
Key findings:
- Instance launch was successful
- ASG reports normal operation
- Desired capacity changes were processed
Step 4: Examine EC2 Instance Configuration
Let's inspect the launched EC2 instance configuration:
Note: For your convenience, the instance ID is available in the environment variable $NEW_NODEGROUP_2_INSTANCE_ID.
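A sketch of an inspection command, assuming the AWS CLI is available. The --query expression is one plausible projection, chosen to keep only the state, subnet, VPC, instance profile, and security groups; the function name describe_node_instance is hypothetical.

```shell
# Show the state, network placement, instance profile, and security groups
# of a single EC2 instance.
# $1: EC2 instance ID
describe_node_instance() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[*].Instances[*].{InstanceState: State.Name, SubnetId: SubnetId, VpcId: VpcId, InstanceProfile: IamInstanceProfile, SecurityGroups: SecurityGroups}'
}
```

For example: `describe_node_instance "$NEW_NODEGROUP_2_INSTANCE_ID"`.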
Output:
[
[
{
"InstanceState": "running",
"SubnetId": "subnet-1234abcd1234abcd1",
"VpcId": "vpc-1234abcd1234abcd1",
"InstanceProfile": {
"Arn": "arn:aws:iam::1234567890:instance-profile/eks-abcd1234-1234-abcd-1234-1234abcd1234",
"Id": "ABCDEFGHIJK1LMNOP2QRS"
},
"SecurityGroups": [
{
"GroupName": "eks-cluster-sg-eks-workshop-123456789",
"GroupId": "sg-1234abcd1234abcd1"
}
]
}
]
]
Important aspects to verify:
- Instance state is "running"
- Instance profile and IAM role assignments
- Security group configurations
To use the console instead, open the EC2 console and look up the instance there.
Step 5: Analyze Network Configuration
Let's examine the subnet and routing configuration:
Note: For your convenience, the subnet ID is available in the environment variable $NEW_NODEGROUP_2_SUBNET_ID.
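One way to look at the subnet and its routing with the AWS CLI is sketched below. This is an assumption about how the check could be done, not necessarily the exact command this step uses; the function names describe_node_subnet and describe_subnet_route_tables are hypothetical.

```shell
# Show the subnet's own settings (CIDR block, availability zone, and so on).
# $1: subnet ID
describe_node_subnet() {
  aws ec2 describe-subnets --subnet-ids "$1"
}

# Show the route tables explicitly associated with the subnet.
# $1: subnet ID
describe_subnet_route_tables() {
  aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$1"
}
```

For example: `describe_subnet_route_tables "$NEW_NODEGROUP_2_SUBNET_ID"`. An empty result from this filter means the subnet has no explicit association, in which case it uses the VPC's main route table.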