We use Dynatrace at our company and are currently testing using the new OTEL configuration it provides with Istio 1.22, since OpenTracing was removed in Envoy 1.30, which is what Istio 1.22 uses. We have it all working but are seeing this message nonstop in our logs, to the point it's very difficult to see actual traffic logs without some sort of filtering done:
2024-09-05T18:28:10.787073363Z 2024-09-05T18:28:10.786914Z error envoy tracing external/envoy/source/extensions/tracers/opentelemetry/http_trace_exporter.cc:86 OTLP HTTP exporter received a non-success status code: 503 while exporting the OTLP message thread=23
It looks like Istio hardcodes a sample rate of 100 when using a custom sampler:
	// If the TracingProvider has a custom sampler (OTel Sampler)
	// the sampling percentage is set to 100% so all spans arrive at the sampler for its decision.
	sampling = 100
} else if spec.RandomSamplingPercentage != nil {
	sampling = *spec.RandomSamplingPercentage
} else {
	// gracefully fallback to MeshConfig configuration. It will act as an implicit
	// parent configuration during transition period.
	sampling = proxyConfigSamplingValue(proxyCfg)
}
Thus, we have deduced with Dynatrace support that Dynatrace is just rejecting the majority of the traces due to sampling rate limits on its side.
The comment says "the sampling percentage is set to 100% so all spans arrive at the sampler for its decision" but why limit it in this way? I don't want to sample EVERY call, and send EVERY call to Dynatrace just for them to throw away 95% of them; isn't that incredibly wasteful? Not to mention the very point of this issue: my istio-proxy logs are filled nonstop with 503s.
I suggest making the sample rate configurable, just like it is with a non-custom sampler, so consumers can choose what works best for them.
Affected product area (please put an X in all that apply)
[ ] Ambient
[ ] Docs
[ ] Dual Stack
[ ] Installation
[ ] Networking
[ ] Performance and Scalability
[ X ] Extensions and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
Affected features (please put an X in all that apply)
[ ] Multi Cluster
[ ] Virtual Machine
[ ] Multi Control Plane
Additional context
Not an expert here but I think there is some confusion. First, the logs you are seeing (OTLP HTTP exporter received a non-success status code: 503 while exporting the OTLP message) have nothing to do with sampling. They mean that we are failing to report to Dynatrace. This is orthogonal to how much we sample; it seems like your setup is completely broken and not able to export to Dynatrace.
Note that a single one of those calls could be (attempting to be) exporting thousands of spans at once.
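To make the point about export calls concrete, here is a minimal Go sketch of an exporter that POSTs one batch per request and reports any non-2xx status. It is not Envoy's implementation (that is C++ in http_trace_exporter.cc); `exportBatch`, `newBackend`, and the placeholder payload are hypothetical names made up for illustration. The point is only that the error is raised once per export request, however many spans the batch carries.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// exportBatch POSTs one already-serialized batch of spans and returns an
// error for any non-2xx response. Hypothetical helper: the error is per
// export request, not per span, so the batch size does not matter here.
func exportBatch(client *http.Client, url string, batch []byte) error {
	resp, err := client.Post(url, "application/x-protobuf", bytes.NewReader(batch))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("exporter received a non-success status code: %d", resp.StatusCode)
	}
	return nil
}

// newBackend starts a local stand-in for a tracing backend that answers
// every request with the given status code (an assumption for the demo).
func newBackend(status int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	}))
}

func main() {
	srv := newBackend(http.StatusServiceUnavailable)
	defer srv.Close()

	// The payload is a placeholder; a real batch could hold many spans.
	err := exportBatch(srv.Client(), srv.URL, []byte("placeholder batch"))
	fmt.Println(err)
}
```

In this sketch a 503 from the backend fails the whole request regardless of which sampling decisions produced the batch, which is why the log line by itself says nothing about the sampling rate.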
Now on the 100% tracing: it does not mean 100% of traces will be sampled. It means you are using a custom sampler which is not (just) percentage based. If we were to set this to something else (say 1%), then we would prefilter down to 1% of spans before we even let the custom sampler see them. Dynatrace's sampler (and others) expect to see all spans (locally!) so they can decide which ones to sample using more complex algorithms.
Interesting.....ok then, let me take this back to our own monitoring SMEs and Dynatrace and see what they say. Interestingly enough, we see traces in Dynatrace, so at least SOME of them are getting over there. But then we also see these 503s, so it almost seems like it's working sporadically, which makes no sense to me. We are using the exact setup steps in their docs. ¯_(ツ)_/¯
Source of the quoted snippet: istio/pilot/pkg/networking/core/tracing.go, lines 119 to 130 in b8197f4