Sonobuoy Conformance on EKS-A on Baremetal shows failures. #3423
Hey @elamaran11, thanks for reporting this. I'm not seeing any failures testing on bare metal hardware. Would you mind sharing the details of your cluster creation, please (cluster spec, hardware CSV, etc.)? Are you following the Equinix guide here? I will test on Equinix Metal to see about reproducing. Thanks again for the report!
@jacobweinstock Yes, I'm following the exact Equinix guide. Please provide us an update as soon as you can reproduce and fix the issue. Here is my
Hey @elamaran11. Here are the results from my conformance test. I wasn't able to reproduce the failures you posted. I did get one failure, but only because my cluster had a single worker node. One thing that did stand out was the Bottlerocket and Kubernetes versions. @elamaran11, would you mind doing another run on your side? If you have only one worker node, Sonobuoy will throw the following error: [sig-apps] Daemon set [Serial] should rollback without unnecessary restarts [Conformance] -- ref

I followed the guide from https://github.com/equinix-labs/terraform-equinix-metal-eks-anywhere to set up the cluster:

```shell
cd terraform-equinix-metal-eks-anywhere/examples/deploy
terraform init
terraform apply
```

Then, on the admin node, I ran the conformance test with `sonobuoy run --wait`.
Results:

Plugin: e2e
Status: failed
Total: 7050
Passed: 343
Failed: 1
Skipped: 6706
Failed tests:
[sig-apps] Daemon set [Serial] should rollback without unnecessary restarts [Conformance]
Plugin: systemd-logs
Status: passed
Total: 2
Passed: 2
Failed: 0
Skipped: 0
Run Details:
API Server version: v1.23.9-eks-68c1cba
Node health: 2/2 (100%)
Pods health: 35/36 (97%)
Details for failed pods:
sonobuoy/sonobuoy-e2e-job-62d8ed75dd74406a Ready:False: ContainersNotReady: containers with unready status: [e2e sonobuoy-worker]
Errors detected in files:
Errors:
1705 podlogs/kube-system/cilium-jdlsz/logs/cilium-agent.txt
1347 podlogs/kube-system/kube-controller-manager-139.178.68.19/logs/kube-controller-manager.txt
588 podlogs/sonobuoy/sonobuoy-e2e-job-62d8ed75dd74406a/logs/e2e.txt
107 podlogs/kube-system/kube-apiserver-139.178.68.19/logs/kube-apiserver.txt
70 podlogs/kube-system/kube-scheduler-139.178.68.19/logs/kube-scheduler.txt
8 podlogs/kube-system/kube-proxy-vkptp/logs/kube-proxy.txt
8 podlogs/kube-system/kube-proxy-tp52s/logs/kube-proxy.txt
5 podlogs/kube-system/cilium-6thdk/logs/cilium-agent.txt
1 podlogs/kube-system/etcd-139.178.68.19/logs/etcd.txt
1 podlogs/kube-system/kube-vip-139.178.68.19/logs/kube-vip.txt
Warnings:
486 podlogs/kube-system/kube-controller-manager-139.178.68.19/logs/kube-controller-manager.txt
379 podlogs/kube-system/cilium-jdlsz/logs/cilium-agent.txt
103 podlogs/kube-system/kube-apiserver-139.178.68.19/logs/kube-apiserver.txt
37 podlogs/kube-system/kube-scheduler-139.178.68.19/logs/kube-scheduler.txt
14 podlogs/sonobuoy/sonobuoy-e2e-job-62d8ed75dd74406a/logs/e2e.txt
10 podlogs/kube-system/cilium-6thdk/logs/cilium-agent.txt
4 podlogs/kube-system/etcd-139.178.68.19/logs/etcd.txt
2 podlogs/sonobuoy/sonobuoy/logs/kube-sonobuoy.txt

Here is the final generated EKS-A cluster config:

apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
name: my-eksa-cluster
spec:
clusterNetwork:
cniConfig:
cilium: {}
pods:
cidrBlocks:
- 192.168.0.0/16
services:
cidrBlocks:
- 10.96.0.0/12
controlPlaneConfiguration:
count: 1
endpoint:
host: "139.178.68.30"
machineGroupRef:
kind: TinkerbellMachineConfig
name: my-eksa-cluster-cp
datacenterRef:
kind: TinkerbellDatacenterConfig
name: my-eksa-cluster
kubernetesVersion: "1.23"
managementCluster:
name: my-eksa-cluster
workerNodeGroupConfigurations:
- count: 1
machineGroupRef:
kind: TinkerbellMachineConfig
name: my-eksa-cluster
name: md-0
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellDatacenterConfig
metadata:
name: my-eksa-cluster
spec:
tinkerbellIP: "139.178.68.29"
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
name: my-eksa-cluster-cp
spec:
hardwareSelector:
type: cp
osFamily: bottlerocket
templateRef:
kind: TinkerbellTemplateConfig
name: cp-my-eksa-cluster-m3-small-x86
users:
- name: ec2-user
sshAuthorizedKeys:
- ssh-rsa AA...
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
name: my-eksa-cluster
spec:
hardwareSelector:
type: dp
osFamily: bottlerocket
templateRef:
kind: TinkerbellTemplateConfig
name: dp-my-eksa-cluster-m3-small-x86
users:
- name: ec2-user
sshAuthorizedKeys:
- ssh-rsa AA...
---
{}
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellTemplateConfig
metadata:
name: cp-my-eksa-cluster-m3-small-x86
spec:
template:
global_timeout: 6000
id: ""
name: cp-my-eksa-cluster-m3-small-x86
tasks:
- actions:
- environment:
COMPRESSED: "true"
DEST_DISK: /dev/sda
IMG_URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/17/artifacts/raw/1-23/bottlerocket-v1.23.9-eks-d-1-23-5-eks-a-17-amd64.img.gz
image: public.ecr.aws/eks-anywhere/tinkerbell/hub/image2disk:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-17
name: stream-image
timeout: 600
- environment:
CONTENTS: |
# Version is required, it will change as we support
# additional settings
version = 1
# "eno1" is the interface name
# Users may turn on dhcp4 and dhcp6 via boolean
[enp1s0f0np0]
dhcp4 = true
dhcp6 = false
# Define this interface as the "primary" interface
# for the system. This IP is what kubelet will use
# as the node IP. If none of the interfaces has
# "primary" set, we choose the first interface in
# the file
primary = true
DEST_DISK: /dev/sda12
DEST_PATH: /net.toml
DIRMODE: "0755"
FS_TYPE: ext4
GID: "0"
MODE: "0644"
UID: "0"
image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-17
name: write-netplan
pid: host
timeout: 90
- environment:
BOOTCONFIG_CONTENTS: |
kernel {
console = "ttyS1,115200n8"
}
DEST_DISK: /dev/sda12
DEST_PATH: /bootconfig.data
DIRMODE: "0700"
FS_TYPE: ext4
GID: "0"
MODE: "0644"
UID: "0"
image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-17
name: write-bootconfig
pid: host
timeout: 90
- environment:
DEST_DISK: /dev/sda12
DEST_PATH: /user-data.toml
DIRMODE: "0700"
FS_TYPE: ext4
GID: "0"
HEGEL_URLS: http://139.178.68.18:50061,http://139.178.68.29:50061
MODE: "0644"
UID: "0"
image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-17
name: write-user-data
pid: host
timeout: 90
- image: public.ecr.aws/eks-anywhere/tinkerbell/hub/reboot:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-17
name: reboot-image
pid: host
timeout: 90
volumes:
- /worker:/worker
name: cp-my-eksa-cluster-m3-small-x86
volumes:
- /dev:/dev
- /dev/console:/dev/console
- /lib/firmware:/lib/firmware:ro
worker: '{{.device_1}}'
version: "0.1"
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellTemplateConfig
metadata:
name: dp-my-eksa-cluster-m3-small-x86
spec:
template:
global_timeout: 6000
id: ""
name: dp-my-eksa-cluster-m3-small-x86
tasks:
- actions:
- environment:
COMPRESSED: "true"
DEST_DISK: /dev/sda
IMG_URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/17/artifacts/raw/1-23/bottlerocket-v1.23.9-eks-d-1-23-5-eks-a-17-amd64.img.gz
image: public.ecr.aws/eks-anywhere/tinkerbell/hub/image2disk:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-17
name: stream-image
timeout: 600
- environment:
CONTENTS: |
# Version is required, it will change as we support
# additional settings
version = 1
# "eno1" is the interface name
# Users may turn on dhcp4 and dhcp6 via boolean
[enp1s0f0np0]
dhcp4 = true
dhcp6 = false
# Define this interface as the "primary" interface
# for the system. This IP is what kubelet will use
# as the node IP. If none of the interfaces has
# "primary" set, we choose the first interface in
# the file
primary = true
DEST_DISK: /dev/sda12
DEST_PATH: /net.toml
DIRMODE: "0755"
FS_TYPE: ext4
GID: "0"
MODE: "0644"
UID: "0"
image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-17
name: write-netplan
pid: host
timeout: 90
- environment:
BOOTCONFIG_CONTENTS: |
kernel {
console = "ttyS1,115200n8"
}
DEST_DISK: /dev/sda12
DEST_PATH: /bootconfig.data
DIRMODE: "0700"
FS_TYPE: ext4
GID: "0"
MODE: "0644"
UID: "0"
image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-17
name: write-bootconfig
pid: host
timeout: 90
- environment:
DEST_DISK: /dev/sda12
DEST_PATH: /user-data.toml
DIRMODE: "0700"
FS_TYPE: ext4
GID: "0"
HEGEL_URLS: http://139.178.68.18:50061,http://139.178.68.29:50061
MODE: "0644"
UID: "0"
image: public.ecr.aws/eks-anywhere/tinkerbell/hub/writefile:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-17
name: write-user-data
pid: host
timeout: 90
- image: public.ecr.aws/eks-anywhere/tinkerbell/hub/reboot:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-17
name: reboot-image
pid: host
timeout: 90
volumes:
- /worker:/worker
name: dp-my-eksa-cluster-m3-small-x86
volumes:
- /dev:/dev
- /dev/console:/dev/console
- /lib/firmware:/lib/firmware:ro
worker: '{{.device_1}}'
version: "0.1"

Here is the hardware CSV:

hostname,vendor,mac,ip_address,gateway,netmask,nameservers,disk,labels
eksa-gi3g9q-node-cp-001,Equinix,10:70:fd:7f:99:a2,139.178.68.19,139.178.68.17,255.255.255.240,8.8.8.8,/dev/sda,type=cp
eksa-gi3g9q-node-dp-001,Equinix,10:70:fd:86:ee:aa,139.178.68.20,139.178.68.17,255.255.255.240,8.8.8.8,/dev/sda,type=dp
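For what it's worth, rows in this hardware CSV format can be sanity-checked before cluster creation. The snippet below is a minimal sketch (not part of the EKS-A tooling; the `parse_hardware_csv` helper is hypothetical) that parses rows with the header shown above and flags any empty fields:

```python
import csv
import io

# Hardware CSV in the format shown above (EKS-A on bare metal).
CSV_TEXT = """hostname,vendor,mac,ip_address,gateway,netmask,nameservers,disk,labels
eksa-gi3g9q-node-cp-001,Equinix,10:70:fd:7f:99:a2,139.178.68.19,139.178.68.17,255.255.255.240,8.8.8.8,/dev/sda,type=cp
eksa-gi3g9q-node-dp-001,Equinix,10:70:fd:86:ee:aa,139.178.68.20,139.178.68.17,255.255.255.240,8.8.8.8,/dev/sda,type=dp
"""

def parse_hardware_csv(text):
    """Parse the hardware CSV and return one dict per machine."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        # Every column must be present and non-empty.
        missing = [k for k, v in row.items() if not v]
        if missing:
            raise ValueError(f"{row['hostname']}: missing fields {missing}")
    return rows

machines = parse_hardware_csv(CSV_TEXT)
print(len(machines), machines[0]["labels"], machines[1]["labels"])
# → 2 type=cp type=dp
```

The `labels` column (`type=cp` / `type=dp`) is what the `hardwareSelector` stanzas in the TinkerbellMachineConfig specs above match against.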
Hey @displague and @cprivitere, would either of you, by chance, have any thoughts or insights on this?
Looks like this was the only failed test and, as you pointed out, it failed because of the limited cluster size. What does it test?

We've released v0.3.2, but I can't think of any significant changes relative to the previous builds that you'd encounter.
Hey @displague, thanks for the response. Any insight into @elamaran11's original failures at the very top, by chance?
Team,

I was trying to run a conformance test using Sonobuoy on EKS-A deployed on bare metal with partner hardware from Equinix. The Sonobuoy validation failed with the following errors when I ran it with Sonobuoy v0.56.10. When I ran it with Sonobuoy v0.50.0, the validation never made any progress, which is another error I want to report. I'd appreciate it if you could take a look into these failures and let us know whether they are good to go or whether there will be bug fixes to the EKS-A version. Thanks in advance.
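As a quick aid for reading a Sonobuoy summary like the one posted earlier in this thread: the large "Skipped" count is expected for a conformance-focused run, so the meaningful pass rate is over executed tests only. A small sketch (the numbers are taken from the e2e plugin summary above; the `pass_rate` helper is hypothetical, not a Sonobuoy API):

```python
def pass_rate(total, passed, failed, skipped):
    """Pass rate over executed tests only (skipped tests are excluded)."""
    executed = total - skipped
    assert executed == passed + failed, "counts should be consistent"
    return 100.0 * passed / executed

# Numbers from the e2e plugin summary reported in this thread:
# Total: 7050, Passed: 343, Failed: 1, Skipped: 6706.
rate = pass_rate(total=7050, passed=343, failed=1, skipped=6706)
print(f"{rate:.2f}% of executed conformance tests passed")
# → 99.71% of executed conformance tests passed
```

In other words, 344 tests actually ran and only the single-worker-node DaemonSet rollback test failed.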