-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: added validation for prometheusRule size (#6478) #6606
Conversation
@simonpasquier can u review the pr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact the original issue is about the internal hardcoded limit of the operator which is 512KiB (1MiB/2):
prometheus-operator/pkg/prometheus/server/rules.go
Lines 36 to 40 in 7e18b82
// The maximum `Data` size of a ConfigMap seems to differ between | |
// environments. This is probably due to different meta data sizes which count | |
// into the overall maximum size of a ConfigMap. Thereby lets leave a | |
// large buffer. | |
var maxConfigMapDataSize = int(float64(v1.MaxSecretSize) * 0.5) |
IMHO the first things to do would be
- better advertise this limit in the API documentation.
- have a descriptive error message stating the operator limit and the actual value.
I'd also rather see the check in ValidateRule()
since it would help users that don't run the admission webhook.
Eventually we may want to make the 512MB limit configurable and/or increase the default limit. Unfortunately #1591 has little details on why 50% of 1MiB was picked up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this specific case, I'd avoid storing a large file in the Git repository.
@simonpasquier done the above changes |
@simonpasquier @ArthurSens @slashpai the pr is ready to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove the size check from each operator because it becomes redundant with ValidateRule(). The (minor) change would be a generated rule file could be slightly above the limit in case the namespace label enforcer is enabled but it should be safe (the 512KB limit is arbitrary anyway).
prometheus-operator/pkg/prometheus/server/rules.go
Lines 206 to 214 in 0cf01e5
//check if none of the rule files is too large for a single ConfigMap | |
for filename, file := range ruleFiles { | |
if len(file) > maxConfigMapDataSize { | |
return nil, fmt.Errorf( | |
"rule file '%v' is too large for a single Kubernetes ConfigMap", | |
filename, | |
) | |
} | |
} |
pkg/admission/admission.go
Outdated
@@ -233,6 +233,14 @@ func (a *Admission) validatePrometheusRules(ar v1.AdmissionReview) *v1.Admission | |||
return toAdmissionResponseFailure(errUnmarshalRules, prometheusRuleResource, []error{err}) | |||
} | |||
|
|||
// Check if the size of prometheusRule exceeds 512KB (1048576 Bytes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValidateRule() which is called below will also check the total size hence we don't have to do it explicitly here (bonus point: it would be based on the spec size, not the raw payload).
pkg/operator/rules.go
Outdated
maxSize := 524288 | ||
promRuleSize := len(content) | ||
if promRuleSize > maxSize { | ||
return []error{fmt.Errorf("prometheusRules exceeded by %d bytes, maximum limit is 512KB", promRuleSize-maxSize)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather print the current size and max limit.
return []error{fmt.Errorf("prometheusRules exceeded by %d bytes, maximum limit is 512KB", promRuleSize-maxSize)} | |
return []error{fmt.Errorf("the length of rendered Prometheus Rule is %d bytes which is above the maximum limit of %d bytes", promRuleSize, maxSize)} |
pkg/operator/rules.go
Outdated
@@ -158,6 +158,14 @@ func ValidateRule(promRuleSpec monitoringv1.PrometheusRuleSpec) []error { | |||
if err != nil { | |||
return []error{fmt.Errorf("failed to marshal content: %w", err)} | |||
} | |||
|
|||
// Check if the size of prometheusRule exceeds 512KB (1048576 Bytes). | |||
maxSize := 524288 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move the maxConfigMapDataSize
consts from the pkg/prometheus/server and pkg/thanos packages to pkg/operator and use it here.
@simonpasquier done the changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we'd need an e2e test but it can be added later.
pkg/operator/rules.go
Outdated
@@ -158,6 +164,13 @@ func ValidateRule(promRuleSpec monitoringv1.PrometheusRuleSpec) []error { | |||
if err != nil { | |||
return []error{fmt.Errorf("failed to marshal content: %w", err)} | |||
} | |||
|
|||
// Check if the size of prometheusRule exceeds 512KB (1048576 Bytes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Check if the size of prometheusRule exceeds 512KB (1048576 Bytes). | |
// Check if the serialized rules exceed our internal limit. |
pkg/prometheus/server/rules.go
Outdated
@@ -206,7 +200,7 @@ func (c *Operator) selectRuleNamespaces(p *monitoringv1.Prometheus) ([]string, e | |||
func makeRulesConfigMaps(p *monitoringv1.Prometheus, ruleFiles map[string]string, opts ...operator.ObjectOption) ([]v1.ConfigMap, error) { | |||
//check if none of the rule files is too large for a single ConfigMap | |||
for filename, file := range ruleFiles { | |||
if len(file) > maxConfigMapDataSize { | |||
if len(file) > operator.MaxConfigMapDataSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the block shouldn't be needed anymore since the operator should have called ValidateRule() before.
if len(file) > operator.MaxConfigMapDataSize { |
@simonpasquier done the changes |
pkg/thanos/rules.go
Outdated
@@ -207,7 +201,7 @@ func (o *Operator) selectRuleNamespaces(p *monitoringv1.ThanosRuler) ([]string, | |||
func makeRulesConfigMaps(t *monitoringv1.ThanosRuler, ruleFiles map[string]string, opts ...operator.ObjectOption) ([]v1.ConfigMap, error) { | |||
//check if none of the rule files is too large for a single ConfigMap | |||
for filename, file := range ruleFiles { | |||
if len(file) > maxConfigMapDataSize { | |||
if len(file) > operator.MaxConfigMapDataSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same remark than before: promRuleSelector.Select()
will prevent "too big" rules from reaching here so we can remove this check.
@simonpasquier done the changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Description
fixes: #6478
Reject PrometheusRule objects that generate > 1MB configmap
Type of change
What type of changes does your code introduce to the Prometheus operator? Put an
x
in the box that apply.CHANGE
(fix or feature that would cause existing functionality to not work as expected)FEATURE
(non-breaking change which adds functionality)BUGFIX
(non-breaking change which fixes an issue)ENHANCEMENT
(non-breaking change which improves existing functionality)NONE
(if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)Verification
added unit test
Changelog entry