Workflows with CosmosDB throwing 412 errors #8004
I have seen #7162, but this is all new greenfield work; I have never used those old versions of Dapr.
This error means that two (or more) concurrent operations are trying to mutate the same state in your Cosmos collection, and one was rejected due to record versioning (optimistic concurrency). This is a retriable error that you can safely retry from your code. However, note that we completely revamped the actor reminder system in Dapr 1.14; it would be great if you could upgrade to 1.14.1 and enable the Scheduler service with the following configuration:
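The original configuration snippet did not survive here. As a reference, a minimal sketch of a Dapr `Configuration` resource enabling the Scheduler-based reminder system; the feature name `SchedulerReminders` and the resource name `schedulerreminders` are taken from the Dapr 1.14 preview-features documentation and should be verified against your version:

```yaml
# Sketch: Dapr Configuration enabling the Scheduler-based reminder system.
# The feature name SchedulerReminders comes from the Dapr 1.14 docs;
# the metadata name is an arbitrary example.
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: schedulerreminders
spec:
  features:
    - name: SchedulerReminders
      enabled: true
```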
Then apply the configuration to your app with the following annotation. You should not only see these errors resolved but also get improved performance. Note that existing reminder data will not be migrated.
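The annotation snippet is also missing from this copy of the thread. A sketch of what the pod-template annotations might look like, assuming a Dapr `Configuration` named `schedulerreminders` exists in the cluster (the app-id is taken from the log output in this issue; the configuration name is an assumption):

```yaml
# Sketch: pod-template annotations on a Deployment referencing the
# Dapr Configuration by name. "schedulerreminders" is an assumed name.
annotations:
  dapr.io/enabled: "true"
  dapr.io/app-id: "my-service-workflow"
  dapr.io/config: "schedulerreminders"
```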
Yes, I am anxiously awaiting 1.14 hitting the AKS Dapr extension; it is not quite available yet. Great job on the work there! As for retrying this error, I am not sure it will help. I am not seeing intermittent failures; it fails every time, on every workflow. I too thought there must be concurrency somewhere, but I can't find it. There appears to be only one instance of this workflow running, and nothing else is touching that state.
Can you reach out to me on Discord? My handle is
It seems this was due to the Cosmos consistency level, with multiple regions enabled: Session consistency was the cause. We switched to Bounded Staleness and that resolved it, although there is a cost associated with that, so it would be good if Session consistency could be supported. One question I have: with the new reminder scheduling, if we delete and recreate our AKS cluster, will reminders be lost?
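For anyone hitting the same symptom, the account-level consistency change described here can be made with the Azure CLI. A sketch, where the account and resource-group names are placeholders and the staleness bounds are illustrative examples, not recommendations:

```
# Sketch: switch a Cosmos DB account's default consistency level to
# Bounded Staleness. Names are placeholders; bounds are example values.
az cosmosdb update \
  --name my-cosmos-account \
  --resource-group my-resource-group \
  --default-consistency-level BoundedStaleness \
  --max-interval 300 \
  --max-staleness-prefix 100000
```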
Yes, all existing reminder data will be lost.
This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.
What version of Dapr?
Expected Behavior
Workflows should function correctly
Actual Behavior
{"app_id":"my-service-workflow","instance":"my-service-workflow-c89bdf786-xtvjj","level":"warning","msg":"Workflow actor '56631279-491c-422f-8ac3-ef885ad5a448': execution failed with a recoverable error and will be retried later: 'failed to invoke activity actor '56631279-491c-422f-8ac3-ef885ad5a448::1::1' to execute 'GetSomeData': error from internal actor: error saving reminders partition and metadata: transaction failed due to operation 1 which failed with status code 412'","scope":"dapr.wfengine.backend.actors","time":"2024-08-15T20:54:58.986427073Z","type":"log","ver":"1.13.5"}
Steps to Reproduce the Problem
Not quite sure. This same workflow works in my DEV and QA environments but not in my UAT environment, which appears to be identical. Looking for help determining what could cause this.
I am running in an Azure AKS cluster, Kubernetes 1.29.7, with Dapr 1.13.5-msft.1 in HA mode. The Cosmos collection has the correct partition key, and I see workflow records in the collection. I have only one instance of my workflow deployed.