
kube-proxy fails with "--bind-address" flag: x is not a valid IP address seen when not using IP address as node address and cloud provider is configured #1725

Closed
superseb opened this issue Oct 29, 2019 · 6 comments

@superseb (Contributor) commented Oct 29, 2019

RKE version:
v0.3.2

cluster.yml file:

nodes:
- address: shortname
  user: ubuntu
  role: [ "controlplane", "etcd", "worker" ]

cloud_provider:
  name: aws

Steps to Reproduce:
Run rke up with the cluster.yml shown above (address or internal_address set to a hostname rather than an IP address, and cloud_provider configured).

Results:
kube-proxy fails to start:

FATA[0354] [workerPlane] Failed to bring up Worker Plane: [Failed to verify healthcheck: Failed to check http://localhost:10256/healthz for service [kube-proxy] on host [shortname]: Get http://localhost:10256/healthz: Unable to access the service on localhost:10256. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), log: invalid argument "shortname" for "--bind-address" flag: "shortname" is not a valid IP address] 

This was implemented because kube-proxy fails to start on k8s 1.16 if an IP is not provided to identify the node when a cloud provider is configured (AWS, and possibly OpenStack).

The main problem is that an IP address still needs to be provided, so we should probably error out during validation or find another way to make this work. We already concluded that RKE should not do DNS resolution and use the resolved address.

The workaround is to use IP addresses rather than a hostname/FQDN as the address.
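For illustration, a cluster.yml following this workaround could look like the sketch below (1.2.3.4 is a placeholder for the node's real IP, not a value taken from this report):

# Workaround sketch: use the node IP directly so kube-proxy gets a valid
# value for --bind-address when a cloud provider is configured.
nodes:
- address: 1.2.3.4
  user: ubuntu
  role: [ "controlplane", "etcd", "worker" ]

cloud_provider:
  name: aws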

@nagonzalez

I ran into this issue trying to go from 0.3.1 to 0.3.2 on an RKE cluster in AWS. I reverted back to 0.3.1 and everything seems OK.

Can I replace DNS hostname with IP value and retry the upgrade or will that mess things up even more?

@nagonzalez

Putting an IP value for internal_address allowed me to upgrade to 0.3.2 in AWS, i.e.:

nodes:
- address: shortname
  internal_address: 1.2.3.4
  user: ubuntu
  role: [ "controlplane", "etcd", "worker" ]

cloud_provider:
  name: aws

@vterdunov

I faced the same problem. After adding internal_address, RKE added the nodes under different names:

k8s-etcd1                   Ready      etcd           15m   v1.15.5
k8s-etcd1.lab.vi.local      NotReady   etcd           21h   v1.15.5
k8s-ingress1                Ready      worker         15m   v1.15.5
k8s-ingress1.lab.vi.local   NotReady   worker         21h   v1.15.5
k8s-master1                 Ready      controlplane   15m   v1.15.5
k8s-master1.lab.vi.local    NotReady   controlplane   21h   v1.15.5
k8s-node1                   Ready      worker         15m   v1.15.5
k8s-node1.lab.vi.local      NotReady   worker         21h   v1.15.5
k8s-node2                   Ready      worker         15m   v1.15.5
k8s-node2.lab.vi.local      NotReady   worker         21h   v1.15.5
k8s-node3                   Ready      worker         15m   v1.15.5
k8s-node3.lab.vi.local      NotReady   worker         21h   v1.15.5

@remche commented Nov 21, 2019

It's a concern, because when using a non-resolvable hostname for internal_address, kube-proxy will start with nodeIP 127.0.0.1, causing LoadBalancer services with externalTrafficPolicy: Local to fail (at least with MetalLB): metallb/metallb#287
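For context, here is a minimal sketch (not from the issue; name, selector, and ports are placeholders) of the kind of Service that breaks when kube-proxy reports nodeIP 127.0.0.1:

# Sketch of a Service affected by a wrong kube-proxy node IP:
# with externalTrafficPolicy: Local, traffic is only delivered to endpoints
# on the node that received it, so an incorrect node IP breaks routing.
apiVersion: v1
kind: Service
metadata:
  name: example-lb
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: example
  ports:
  - port: 80
    targetPort: 8080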

@iTaybb commented Dec 13, 2019

I saw this issue in Rancher 2.3.3. It does not happen in 2.3.2.

@github-actions bot (Contributor) commented Aug 3, 2022

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.
