Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added validation for prometheusRule size (#6478) #6606

Merged
merged 11 commits into from
Sep 16, 2024

Conversation

yp969803
Copy link
Contributor

Description

fixes: #6478
Reject PrometheusRule objects that generate > 1MB configmap

Type of change

What type of changes does your code introduce to the Prometheus operator? Put an x in the box that apply.

  • CHANGE (fix or feature that would cause existing functionality to not work as expected)
  • FEATURE (non-breaking change which adds functionality)
  • BUGFIX (non-breaking change which fixes an issue)
  • ENHANCEMENT (non-breaking change which improves existing functionality)
  • NONE (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

Verification

added unit test

Changelog entry


@yp969803 yp969803 requested a review from a team as a code owner May 18, 2024 21:56
@yp969803
Copy link
Contributor Author

@simonpasquier can u review the pr

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact the original issue is about the internal hardcoded limit of the operator which is 512KiB (1MiB/2):

// The maximum `Data` size of a ConfigMap seems to differ between
// environments. This is probably due to different meta data sizes which count
// into the overall maximum size of a ConfigMap. Thereby lets leave a
// large buffer.
var maxConfigMapDataSize = int(float64(v1.MaxSecretSize) * 0.5)

IMHO the first things to do would be

  • better advertise this limit in the API documentation.
  • have a descriptive error message stating the operator limit and the actual value.

I'd also rather see the check in ValidateRule() since it would help users that don't run the admission webhook.

Eventually we may want to make the 512MB limit configurable and/or increase the default limit. Unfortunately #1591 has little details on why 50% of 1MiB was picked up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this specific case, I'd avoid storing a large file in the Git repository.

@yp969803
Copy link
Contributor Author

@simonpasquier done the above changes

@github-actions github-actions bot added the stale label Jul 27, 2024
@yp969803
Copy link
Contributor Author

yp969803 commented Aug 26, 2024

@simonpasquier @ArthurSens @slashpai the pr is ready to merge.

@github-actions github-actions bot removed the stale label Aug 27, 2024
Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the size check from each operator because it becomes redundant with ValidateRule(). The (minor) change would be a generated rule file could be slightly above the limit in case the namespace label enforcer is enabled but it should be safe (the 512KB limit is arbitrary anyway).

//check if none of the rule files is too large for a single ConfigMap
for filename, file := range ruleFiles {
if len(file) > maxConfigMapDataSize {
return nil, fmt.Errorf(
"rule file '%v' is too large for a single Kubernetes ConfigMap",
filename,
)
}
}

@@ -233,6 +233,14 @@ func (a *Admission) validatePrometheusRules(ar v1.AdmissionReview) *v1.Admission
return toAdmissionResponseFailure(errUnmarshalRules, prometheusRuleResource, []error{err})
}

// Check if the size of prometheusRule exceeds 512KB (1048576 Bytes).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValidateRule() which is called below will also check the total size hence we don't have to do it explicitly here (bonus point: it would be based on the spec size, not the raw payload).

maxSize := 524288
promRuleSize := len(content)
if promRuleSize > maxSize {
return []error{fmt.Errorf("prometheusRules exceeded by %d bytes, maximum limit is 512KB", promRuleSize-maxSize)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather print the current size and max limit.

Suggested change
return []error{fmt.Errorf("prometheusRules exceeded by %d bytes, maximum limit is 512KB", promRuleSize-maxSize)}
return []error{fmt.Errorf("the length of rendered Prometheus Rule is %d bytes which is above the maximum limit of %d bytes", promRuleSize, maxSize)}

@@ -158,6 +158,14 @@ func ValidateRule(promRuleSpec monitoringv1.PrometheusRuleSpec) []error {
if err != nil {
return []error{fmt.Errorf("failed to marshal content: %w", err)}
}

// Check if the size of prometheusRule exceeds 512KB (1048576 Bytes).
maxSize := 524288
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move the maxConfigMapDataSize consts from the pkg/prometheus/server and pkg/thanos packages to pkg/operator and use it here.

@pull-request-size pull-request-size bot added size/M and removed size/S labels Aug 29, 2024
@yp969803
Copy link
Contributor Author

@simonpasquier done the changes

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd need an e2e test but it can be added later.

@@ -158,6 +164,13 @@ func ValidateRule(promRuleSpec monitoringv1.PrometheusRuleSpec) []error {
if err != nil {
return []error{fmt.Errorf("failed to marshal content: %w", err)}
}

// Check if the size of prometheusRule exceeds 512KB (1048576 Bytes).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Check if the size of prometheusRule exceeds 512KB (1048576 Bytes).
// Check if the serialized rules exceed our internal limit.

@@ -206,7 +200,7 @@ func (c *Operator) selectRuleNamespaces(p *monitoringv1.Prometheus) ([]string, e
func makeRulesConfigMaps(p *monitoringv1.Prometheus, ruleFiles map[string]string, opts ...operator.ObjectOption) ([]v1.ConfigMap, error) {
//check if none of the rule files is too large for a single ConfigMap
for filename, file := range ruleFiles {
if len(file) > maxConfigMapDataSize {
if len(file) > operator.MaxConfigMapDataSize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the block shouldn't be needed anymore since the operator should have called ValidateRule() before.

Suggested change
if len(file) > operator.MaxConfigMapDataSize {

@yp969803
Copy link
Contributor Author

@simonpasquier done the changes

pkg/prometheus/server/rules_test.go Show resolved Hide resolved
@@ -207,7 +201,7 @@ func (o *Operator) selectRuleNamespaces(p *monitoringv1.ThanosRuler) ([]string,
func makeRulesConfigMaps(t *monitoringv1.ThanosRuler, ruleFiles map[string]string, opts ...operator.ObjectOption) ([]v1.ConfigMap, error) {
//check if none of the rule files is too large for a single ConfigMap
for filename, file := range ruleFiles {
if len(file) > maxConfigMapDataSize {
if len(file) > operator.MaxConfigMapDataSize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same remark than before: promRuleSelector.Select() will prevent "too big" rules from reaching here so we can remove this check.

@yp969803
Copy link
Contributor Author

@simonpasquier done the changes

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@simonpasquier simonpasquier merged commit 4f36476 into prometheus-operator:main Sep 16, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reject PrometheusRule objects that generate > 512KiB configmap
2 participants