Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Etcd snapshot trying too frequently #153

Merged

Conversation

jakefhyde
Copy link
Contributor

@jakefhyde jakefhyde commented Jun 10, 2022

rancher/rancher#37639

  • Added retry flags for backups and s3, so that rke can control the number of retries. This fixes scenarios where rancher will continuously retry (previously 5 times), and rke-tools would retry separately (4 times per snapshot), which would end up choking rancher, rke, the downstream etcd nodes, and the s3 endpoint.
  • Check whether the s3 bucket has versioning enabled, as in this case subsequent uploads to the s3 bucket would be useless. Although this versioning is not useful from a rancher perspective (since during the first restore we download the latest version and plow over the existing snapshot on all the etcd nodes), it does allow us to save on network traffic which has been the source of other issues while trying to restore from snapshots taken during this sort of network saturation, because taking snapshots (including uploading to the s3 bucket) is done node by node.
  • Fixed an issue where failure to create an s3 client (such as a bad token, or the bucket doesn't exist) would cause the snapshot on the node to be deleted. Failure to upload a snapshot to s3 should not constitute a cluster with no backups ever.
  • Removed using the v1 s3 API when the endpoint is storage.googleapis.com. I tested this with a bucket in GCP, and it worked fine with the v2 s3 API, so whatever requirement drove that is no longer valid.

@jakefhyde jakefhyde force-pushed the 37639-etcd-snapshot-trying-too-frequently branch 2 times, most recently from 5c80e10 to a2cf7c5 Compare June 10, 2022 05:20
@jakefhyde jakefhyde requested review from a team and snasovich June 10, 2022 22:30
@jakefhyde jakefhyde marked this pull request as ready for review June 10, 2022 22:41
Copy link
Member

@jiaqiluo jiaqiluo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM.

main.go Outdated
return client, nil
}

func CreateBackup(backupName, etcdCACert, etcdCert, etcdKey, endpoints string, backupRetries, s3Retries int) (compressedFilePath string, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the parameters3Retries is not used in the function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good catch!

@jiaqiluo jiaqiluo requested a review from a team June 13, 2022 23:42
@jakefhyde jakefhyde force-pushed the 37639-etcd-snapshot-trying-too-frequently branch from a2cf7c5 to a62e5ca Compare June 14, 2022 16:08
@jakefhyde jakefhyde requested a review from jiaqiluo June 14, 2022 16:11
jiaqiluo
jiaqiluo previously approved these changes Jun 14, 2022
@jiaqiluo jiaqiluo requested a review from a team June 14, 2022 16:14
@jakefhyde jakefhyde force-pushed the 37639-etcd-snapshot-trying-too-frequently branch from a62e5ca to a9ba50d Compare June 14, 2022 16:23
@jakefhyde jakefhyde requested a review from jiaqiluo June 14, 2022 17:21
main.go Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
main.go Show resolved Hide resolved
Added optimistic checking of s3 bucket for existing snapshot and
versioning enabled before attempting multiple uploads. Update minio client to v7
@jakefhyde jakefhyde force-pushed the 37639-etcd-snapshot-trying-too-frequently branch from a9ba50d to 5eb3ae8 Compare June 16, 2022 21:02
@jakefhyde jakefhyde requested a review from a-blender June 16, 2022 21:09
@a-blender
Copy link

Also force pushing for this big of a PR is pretty confusing for reviewers. I know we have the compare button, but an extra commit with changes would be helpful

Copy link

@a-blender a-blender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment

@a-blender a-blender self-requested a review June 16, 2022 22:14
@jakefhyde jakefhyde requested a review from a team June 16, 2022 22:56
main.go Show resolved Hide resolved
@jiaqiluo jiaqiluo self-requested a review June 21, 2022 16:25
@jakefhyde jakefhyde merged commit 8a2ab96 into rancher:master Jun 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants