Add a matcher for partitioning services #3224
Hi, I'm quite new to the Service Fabric world so excuse my candor here. Why can't you use the existing matchers in your use case? Like, a simple regex-based matcher? To me, it looks like you're trying to implement a load balancing rule in the matcher. Before taking any action, we need to fully understand your use case. More specifically, what is a partitioned service? Can you give us some pointers, a diagram along with a use case? Thanks for your help.
@geraldcroes A partitioned service is typically a stateful service that has been broken into partitions (like shards for databases). Imagine you have customers A, B, C, D, and you decide to have 3 partitions. The client only knows the customer; it does not know how A, B, C, D are allocated to partitions, but this information is required to route the request correctly. Without @lawrencegripper's feature, either the client must call somewhere else to get the partition, or a separate proxy service has to be set up. Also see https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-concepts-partitioning. FYI, the reason I am replying here is that we are starting to use Traefik in Service Fabric to replace the Azure APIM in our setup, and this is a bit of an annoyance (so there is a real-world "customer" here :)
Thank you for your pointers, I'll read them right away. I still don't get why the partitioning logic has to exist both in the stateful service and in Traefik (and why Traefik can't contact some kind of master that would handle the routing). Also, one of my questions was: why do you need to hash/compute the value rather than use the value "as is" with a regexp?
The SF partitions have no knowledge of their fellow partitions and need to be individually load balanced across - hence the appropriate partition endpoint must be resolved before the request is then load balanced over the partition's instances. There is an SF API that can resolve the request, but this would require a lookup from a piece of custom middleware for each request destined for a stateful partition (pick your poison?). The need for a hash is to support range-based matching rather than direct string matching. Hashing ensures an even distribution across the partitions, avoiding hot spots and making effective use of the underlying resources.
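To make that hash-to-range idea concrete, here is a minimal Go sketch (not Traefik code) of mapping a request key onto an evenly distributed bucket range. FNV-1a is just an illustrative choice of hash; any hash with a reasonably uniform distribution would do, and the 0-300 key space matches the example below.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashToRange maps an arbitrary key onto [0, rangeSize) using FNV-1a.
func hashToRange(key string, rangeSize uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % rangeSize
}

func main() {
	// Customers from the earlier example spread across a 0-300 key space;
	// each partition would own a contiguous slice of that space.
	for _, customer := range []string{"A", "B", "C", "D"} {
		fmt.Printf("customer %s -> bucket %d\n", customer, hashToRange(customer, 300))
	}
}
```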
A similar approach is used by the Metaparticle project to handle this in Kubernetes - good doc with diagram. The doc explains how this approach is used, hopefully demonstrating that the label is useful both in SF and in other orchestrators.
A quick update -- I wasn't able to work on it last week but am now setting up an environment so I can test it and move forward.
Another update -- I've dived into Service Fabric and now have a better grasp of the problem at hand. To be completely honest, I was not familiar with the stateful services approach. Until now, I've always preferred the stateless one (a computing unit with an external persistence store). That being said, I understand its value and the fact that it is an important feature (even if I have questions that I cannot yet find the answers to). Allow us a bit more time to discuss it. We'll come back to you soon.
Thanks for taking a look into the Service Fabric use case. I think the matcher also has broader usability for other systems too, as partitioning can be used for both scale and A/B testing.
For scale in large deployments: the Metaparticle link I shared provides a good example: https://metaparticle.io/tutorials/dotnet-sharding/
For A/B testing: say you want to A/B test a new UI change. You want to expose the new version to a low number of users initially to understand how it affects engagement or error rates. To do this you treat deployments as immutable, keeping the old version deployed alongside the new version and sending a percentage of requests to the new version. The problem is that, once a user has the new UI, you don't want them to jump randomly between the new and old versions between requests or devices. With the HashedRange matcher you could hash a stable user identifier (the x-userid header below) and route a fixed slice of the hash range to the new version.
This would ensure that 5% of users are always directed to the new version, even if they disconnect/reconnect, log in on a different device, use another browser, etc. The same 5% of users (identified through the x-userid header) will always see the new deployment. This gives users a consistent experience during an A/B test and gives you a consistent test group. This method may be preferable to sticky sessions (cookies) as, even if the user disconnects/reconnects, flushes cookies, or uses incognito or a different browser, they will always see the new deployment.
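As a rough, non-Traefik illustration of why a fixed slice of the hash range gives a stable cohort: the snippet below uses a made-up 0-300 key space, a 0-15 slice (~5%), and FNV purely as an example hash, and shows that roughly 5% of synthetic user ids land in the slice, and always the same ones.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucket maps a user id onto a 0-300 key space, as in the examples above.
func bucket(userID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32() % 300
}

func main() {
	// Route the 0-15 slice (~5% of the 0-300 key space) to the new version.
	inNewVersion := 0
	for i := 0; i < 10000; i++ {
		if bucket(fmt.Sprintf("user-%d", i)) < 15 {
			inNewVersion++
		}
	}
	fmt.Printf("%.1f%% of synthetic users see the new version\n", float64(inNewVersion)/100)

	// The assignment is deterministic: a given user id always hashes to the
	// same bucket, so the cohort stays stable across requests and devices.
	fmt.Println("user-42 bucket:", bucket("user-42"))
}
```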
Another update -- We've heavily discussed the proposal, and there is still some debate about whether Traefik should or should not embed this feature. For my part, after having investigated the issue and its use cases, I'm convinced that it should be (at some point) included. There are some cons though.
So for now, even if the team seems interested in the feature, it doesn't fully agree (yet) on the proposal. Still, in the foreseeable future, Traefik will provide a feature that should enable users to customise and introduce the behaviour you're asking for. In the meantime, I'll let the maintainers take over and move forward.
Hi @geraldcroes, thanks for taking a look, and thanks to the wider team for the discussions - I appreciate the time and effort taken and agree with a number of the cons listed. In the interest of exploring all options, do any of the following give us a way forward?
In the proposal I tried to make the … We would need a way to add a …
We ruled it out as go-plugin still doesn't support Windows. Would you be open to using something like hashicorp/go-plugin? I'd be happy to POC creating an extension point with it, allowing plugins to register custom matchers. Let me know your thoughts.
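As a sketch of what such an extension point could look like, the snippet below follows hashicorp/go-plugin's standard net/rpc plugin pattern. The Matcher interface and every type name here are hypothetical illustrations, not an existing Traefik API; a real setup would also call plugin.Serve with a handshake config on the plugin side.

```go
package matcherplugin

import (
	"net/rpc"

	"github.com/hashicorp/go-plugin"
)

// Matcher is a hypothetical interface a matcher plugin would implement.
type Matcher interface {
	Match(headerValue string) bool
}

// MatcherRPC is the host-side stub that forwards calls to the plugin process.
type MatcherRPC struct{ client *rpc.Client }

func (m *MatcherRPC) Match(headerValue string) bool {
	var matched bool
	if err := m.client.Call("Plugin.Match", headerValue, &matched); err != nil {
		return false
	}
	return matched
}

// MatcherRPCServer wraps the real implementation inside the plugin process.
type MatcherRPCServer struct{ Impl Matcher }

func (s *MatcherRPCServer) Match(headerValue string, matched *bool) error {
	*matched = s.Impl.Match(headerValue)
	return nil
}

// MatcherPlugin glues both halves together for go-plugin.
type MatcherPlugin struct{ Impl Matcher }

func (p *MatcherPlugin) Server(*plugin.MuxBroker) (interface{}, error) {
	return &MatcherRPCServer{Impl: p.Impl}, nil
}

func (p *MatcherPlugin) Client(b *plugin.MuxBroker, c *rpc.Client) (interface{}, error) {
	return &MatcherRPC{client: c}, nil
}
```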
@geraldcroes I definitely agree that it makes sense to support multiple sharding algorithms, but I am not sure why it would be considered so much more complicated than the other matchers, or why it would have such a significant performance impact. You mention that it is not a much-requested feature, but it is certainly a feature we would like where I work. Would it make any difference if we were a paying customer (I noticed you introduced commercial support)?
@lawrencegripper We're discussing options; I'll keep you updated as soon as I can.
Moving the code under the SF provider doesn't look appealing because it would make it stand apart (even more than it currently does). One of our goals is to offer a cohesive and straightforward API, whatever provider the users have chosen, and we don't welcome the idea of proposing features here but not there. Once again, we understand that the feature would be welcomed by the Service Fabric community, but unfortunately we're not yet ready to include it as is. This is not the first time that plugin systems (or others) have come up in the discussion (see below for references), but even if we're working toward solutions that would make it possible, we're not ready yet, and by "yet", I mean that we're actively working on it. It's never an easy thing to answer with "sorry, not yet," but this is all I can do for now. @petertiedemann Whether you're a paying customer or not has not come up once in the debate. The only reason why we're postponing the proposal is that we truly are not ready yet. We thank you once again for the proposal, which we'll keep open, and regret to have to close the current pull request.
https://twitter.com/solomonstre/status/715277134978113536?lang=en
@ldez Really appreciate the response - thanks for taking a look at the alternatives I proposed and for all the help with the SF provider 👍 Let me know when/how things go with the plugin model; I look forward to taking another crack at this in the future.
@ldez I only brought the support thing up because @geraldcroes said this functionality was not a much-requested feature and not supported by a large community, thinking that having paying customers using the feature might help justify having to support it. Without this feature we will either have to use a fork of Traefik or write stateless proxies for our stateful services (luckily we only have a few). I haven't explored how paid support would work if using a fork, but I doubt it would work out well. You guys really need plugin support :)
So I recently came across an embeddable JS interpreter for Go. It would, in theory, allow us to have simple JS functions defining matching rules/middleware. These could be base64 encoded and set as labels on the services, then loaded and run dynamically, or provided to Traefik in the TOML. We would need to run some tests to understand the impact on performance; my hope is that basic rules would be faster than out-of-process RPC-style plugin models. @ldez If this sounds of interest I'd be happy to look at running some benchmarks.
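Purely as an illustration of the idea (using goja as a stand-in for whichever embeddable JS interpreter is meant), a base64-encoded rule could be decoded and run as a match function roughly like this; the rule, the function name, and the header values are all made up.

```go
package main

import (
	"encoding/base64"
	"fmt"

	"github.com/dop251/goja"
)

func main() {
	// A matching rule written as a JS function, base64 encoded as it might
	// be when stored in a service label (the rule itself is hypothetical).
	rule := base64.StdEncoding.EncodeToString([]byte(
		`function match(header) { return header.indexOf("partition-a") === 0; }`))

	src, err := base64.StdEncoding.DecodeString(rule)
	if err != nil {
		panic(err)
	}

	// Load the function into the embedded JS interpreter.
	vm := goja.New()
	if _, err := vm.RunString(string(src)); err != nil {
		panic(err)
	}
	var match func(string) bool
	if err := vm.ExportTo(vm.Get("match"), &match); err != nil {
		panic(err)
	}

	fmt.Println(match("partition-a:customer-42")) // true
	fmt.Println(match("partition-b:customer-7"))  // false
}
```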
@lawrencegripper I also want a dynamic matcher, for A/B testing or service chaining, and …
I'm unclear on the perf as I haven't run any benchmarks, but I'd be happy to do some testing if this is something that the Traefik team would consider merging, assuming it can meet performance goals.
As a potentially interested party seeking alternatives to Azure APIM and custom-coded API gateways, I was wondering whether any progress has been made on supporting stateful services in Service Fabric with Traefik, since the original PR almost a year ago and more than 9 months since the "no, not yet" response. Is there another path under development by the Traefik maintainers, such as the JS-functions-as-matching-rules/middleware approach @lawrencegripper mentioned? This is important to Traefik's integration with the SF platform, since stateful service support is a major differentiator of the SF platform. In other words, without it, folks in my situation will likely look elsewhere.
There hasn't been any progress on this that I'm aware of, as it's blocked by the availability of a plugin model to move this out of the tree. @ldez's comment is a good summary of the situation. I understand that managing an OSS project which has lots of different users and supported platforms means some will not get everything they want. On a related note, building OSS is hard, and people can expect a lot and sometimes unintentionally not appear grateful for the hard work of others. Please keep in mind that @ldez and the Traefik team have taken a lot of time to review, improve and maintain the Service Fabric provider.
I appreciate the response and status update. I agree that successful OSS projects are built upon lots of hard work. Contributions of ideas and code from the community help further that success. I appreciate the Traefik team's specific vision for the right way to evolve the project. It's understandable that there has been much interest, for a long time, in an extensibility model to allow additional functionality or to leverage platform capabilities (e.g. Service Fabric and others). I think we're all just trying to move things forward and address significant functional requirements / use cases.
For what it's worth, I've been prototyping various embeddable interpreters like go-lua and gojo, etc. So far their performance hasn't been great (worse than the previous attempts with gRPC or HashiCorp go-plugin). I have had good results with https://github.com/d5/tengo, so I will try to put together a PR for the Traefik folks to consider.
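To give a feel for what a Tengo-based rule could look like, here is a rough, self-contained sketch; the script, the variable names, and the way a pre-computed hash is passed in are all assumptions rather than a worked-out Traefik integration.

```go
package main

import (
	"fmt"

	"github.com/d5/tengo/v2"
)

func main() {
	// A matching rule expressed as a Tengo script: match if the
	// (pre-computed) hash of the request key falls in the 0-100 slice
	// of a 0-300 key space.
	script := tengo.NewScript([]byte(`matched := hash % 300 < 100`))

	// The host would compute the hash of the header/URL value and
	// expose it to the script before each evaluation.
	if err := script.Add("hash", int64(142)); err != nil {
		panic(err)
	}

	compiled, err := script.Run()
	if err != nil {
		panic(err)
	}
	fmt.Println("matched:", compiled.Get("matched").Bool())
}
```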
Hello. This proposal targets Traefik v1, which is not supported anymore. We'll re-open it later if necessary.
Do you want to request a feature or report a bug?
Feature (I'll write it)
What did you do?
Service Fabric uses partitioning of services to improve scalability. I would like to add a matching rule which allows requests to be partitioned. A frontend would be created per partition, and the matching rule would ensure requests are matched to the correct frontend based on the value of a hash function, allowing you to distribute requests evenly across n partitions. This would be useful to other providers too, for example allowing requests to be partitioned across multiple container instances or services in Kubernetes.
Additional discussion: jjcollinge/traefik-on-service-fabric#45
Proposal
Add an additional matching rule to Traefik which enables a hashed range match, for example:
HashedRange: type:header value:x-partitionheader match:0-100 range:0-300
It would take an input and use a hashing algorithm to convert it to an int with even distribution in a range. In this case the full range would be 0-300, and this rule would match if the hashed result of the header x-partitionheader fell in the range 0-100.
This could be used to create 3 partitions with KeyMin=0 and KeyMax=300, for example, and distribute load between them:
HashedRange: type:header value:x-partitionheader match:0-100 range:0-300
HashedRange: type:header value:x-partitionheader match:100-200 range:0-300
HashedRange: type:header value:x-partitionheader match:200-300 range:0-300
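As a minimal, non-Traefik sketch of how one of these rules could be evaluated against a request (the type name, the half-open match boundary, and the choice of FNV as the hash are all assumptions):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"net/http"
)

// hashedRangeRule mirrors the proposed
// "HashedRange: type:header value:x-partitionheader match:0-100 range:0-300" syntax.
type hashedRangeRule struct {
	header             string
	matchMin, matchMax uint32 // slice owned by this frontend, [matchMin, matchMax)
	rangeMax           uint32 // full key space, e.g. 300
}

func (r hashedRangeRule) matches(req *http.Request) bool {
	h := fnv.New32a()
	h.Write([]byte(req.Header.Get(r.header)))
	bucket := h.Sum32() % r.rangeMax
	return bucket >= r.matchMin && bucket < r.matchMax
}

func main() {
	// The three partition frontends from the example above.
	rules := []hashedRangeRule{
		{"x-partitionheader", 0, 100, 300},
		{"x-partitionheader", 100, 200, 300},
		{"x-partitionheader", 200, 300, 300},
	}

	req, _ := http.NewRequest(http.MethodGet, "http://example.com/", nil)
	req.Header.Set("x-partitionheader", "customer-42")

	// Exactly one rule (partition frontend) should match any given key.
	for i, rule := range rules {
		fmt.Printf("partition %d matches: %v\n", i, rule.matches(req))
	}
}
```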
In addition to the type:header option I would also look to add url-regex, which would match a section of the URL to hash. I can think of more types, but I think these two cover most use cases.
Example of url-regex type:
URL: http://example.com/bob/?customerid=jamesnesbit
HashedRange: type:url-regex value:[=].* match:0-100 range:0-300
This would hash jamesnesbit and match if the result was in the range 0-100.
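A rough sketch of the url-regex flavour (again not Traefik code; how the captured value is cleaned up before hashing, such as trimming the leading '=', is an assumption):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"regexp"
	"strings"
)

func main() {
	url := "http://example.com/bob/?customerid=jamesnesbit"

	// value:[=].* from the rule above: capture everything from '=' onwards,
	// then drop the '=' itself before hashing.
	re := regexp.MustCompile(`[=].*`)
	value := strings.TrimPrefix(re.FindString(url), "=") // "jamesnesbit"

	h := fnv.New32a()
	h.Write([]byte(value))
	bucket := h.Sum32() % 300

	// match:0-100 - the rule matches if the hashed value lands in 0-100.
	fmt.Printf("%s -> bucket %d, matches 0-100: %v\n", value, bucket, bucket < 100)
}
```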
What did you expect to see?
The Service Fabric provider would query stateful services and create a frontend for each partition with the appropriate hashedrange matcher. Requests would then match the correct partition based on the value of their header or url-regex rule.
What did you see instead?
I don't believe it's currently possible to achieve this behavior in Traefik
CC: @jjcollinge