Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VAULT-28677: Fix dangling entity-aliases in MemDB after invalidation #27750

Merged
merged 10 commits into from
Jul 25, 2024

Conversation

marcboudreau
Copy link
Contributor

Description

This change corrects a regression that was introduced by #27184.

When an entity has been modified in a storage bucket such that one or more aliases has been removed, those removed aliases were not being deleted from the MemDB table containing them. This change corrects this by scanning all associated aliases with entities that have been determined to be modified in the storage bucket, and deleting any associated aliases from MemDB that are no longer associated with the entity in the storage bucket.

TODO only if you're a HashiCorp employee

  • Labels: If this PR is the CE portion of an ENT change, and that ENT change is
    getting backported to N-2, use the new style backport/ent/x.x.x+ent labels
    instead of the old style backport/x.x.x labels.
  • Labels: If this PR is a CE only change, it can only be backported to N, so use
    the normal backport/x.x.x label (there should be only 1).
  • ENT Breakage: If this PR either 1) removes a public function OR 2) changes the signature
    of a public function, even if that change is in a CE file, double check that
    applying the patch for this PR to the ENT repo and running tests doesn't
    break any tests. Sometimes ENT only tests rely on public functions in CE
    files.
  • Jira: If this change has an associated Jira, it's referenced either
    in the PR description, commit message, or branch name.
  • RFC: If this change has an associated RFC, please link it in the description.
  • ENT PR: If this change has an associated ENT PR, please link it in the
    description. Also, make sure the changelog is in this PR, not in your ENT PR.

@marcboudreau marcboudreau added backport/ent/1.16.x+ent Changes are backported to 1.16.x+ent backport/1.17.x labels Jul 10, 2024
@marcboudreau marcboudreau added this to the 1.17.3 milestone Jul 10, 2024
@github-actions github-actions bot added the hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed label Jul 10, 2024
Copy link

github-actions bot commented Jul 10, 2024

CI Results:
All Go tests succeeded! ✅

Copy link

github-actions bot commented Jul 10, 2024

Build Results:
All builds succeeded! ✅

}

// If the entity exists in MemDB it must differ from the entity in
// the storage bucket because of above test. Go through all of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the entity exists and it differs, are we assuming that the difference between them is always going to be the aliases? Why don't we simply delete the entity from memdb and add it again like the previous implementation did?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the MemDB entity differs from its corresponding storage bucket entity, it may or may not be the aliases. But this makes me think we could detect if the Aliases slices are the same but that would entail walking the entire set and that's what the current algorithm is doing. I could add a test case where there are no changes to the Aliases to really make sure it works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this makes me think we could detect if the Aliases slices are the same but that would entail walking the entire set and that's what the current algorithm is doing.

Wouldn't be easy to just detect if there are changes and replace one entity with the other? Deleting from memdb and adding without having to walk a slice and detect changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to what Bianca is saying. Just chiming in so I can follow along 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a change that follows this suggestion.

// function does not delete those aliases, it only creates missing
// ones.
if memDBEntity != nil {
if err := i.deleteAliasesInEntityInTxn(txn, memDBEntity, memDBEntity.Aliases); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcboudreau @elliesterner brought up a valid point while we were talking about entity merge prevention. Do you think not deleting the entity from memdb might cause an automatic merge to be triggered? if you could write a test for that, that would be awesome. we would like to prevent further merges from happening.

Copy link
Contributor Author

@marcboudreau marcboudreau Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From looking at the code in (*IdentityStore).upsertEntityInTxn, there are 2 circumstances that lead to (*IdentityStore).mergeEntityAsPartOfUpsert being called:

  1. previousEntity is not nil and entity has an alias in its Aliases field that exists in MemDB and whose CanonicalID field is set to the value of previousEntity.ID
  2. entity has an alias in its Aliases field that exists in MemDB and whose CanonicalID field is set to a value that is different than entity.ID.

In the (*IdentityStore).invalidateEntityBucket function, when upsertEntityInTxn is called, the previousEntity argument is always nil, so that rules out circumstance 1.

And by pre-deleting the aliases, we ensure that circumstance 2 cannot happen either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of clarity, pre-deleting the entity from MemDB and then calling (*IdentityStore).upsertEntityInTxn won't prevent an entity merge from happening, since the logic that decides that doesn't take into account whether the entity exists in MemDB or not. I think the only way to prevent an entity merge from happening, would be to scan each of the aliases associated with the entity (instead of pre-deleted them) and search for any alias in MemDB with a matching alias name and mount accessor and delete those. That would make it impossible for circumstance 2 to happen.

Copy link
Contributor

@elliesterner elliesterner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great!!

@marcboudreau marcboudreau merged commit a41c21b into main Jul 25, 2024
84 checks passed
@marcboudreau marcboudreau deleted the marcboudreau/fix-merge-entities-on-invalidate branch July 25, 2024 19:36
Comment on lines +758 to +767
// If this is a performance secondary, the entity created on
// this node would have been cached in a local cache based on
// the result of the CreateEntity RPC call to the primary
// cluster. Since this invalidation is signaling that the
// entity is now in the primary cluster's storage, the locally
// cached entry can be removed.
if i.localNode.ReplicationState().HasState(consts.ReplicationPerformanceSecondary) && i.localNode.HAState() == consts.Active {
if err := i.localAliasPacker.DeleteItem(ctx, bucketEntity.ID+tmpSuffix); err != nil {
i.logger.Error("failed to clear local alias entity cache", "error", err, "entity_id", bucketEntity.ID)
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole dance is fascinating. I'm a little curious how you discovered it here - it seems like this being missing is an unrelated bug to the regression right?

@biazmoreira you probably know all about this already, is this another place that breaks the mental model of all global writes go to primary because we updated memdb with a global thing outside of replication? I think we saw places like that with standbys but this was new to me that we have perf secondaries updating their memdb outside of replication for replicated state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anyone who reads this later, I realised this wasn't new code - just moved down from further up (see lines 739 and on in the before part of this diff).

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/ent/1.16.x+ent Changes are backported to 1.16.x+ent backport/1.17.x hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants