Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw: support new zone pools #14715

Merged
merged 1 commit into from
Oct 22, 2024
Merged

Conversation

arttor
Copy link
Contributor

@arttor arttor commented Sep 11, 2024

closes #14708

@@ -916,11 +916,14 @@ func adjustZoneDefaultPools(zone map[string]interface{}, spec cephv1.ObjectShare
"user_uid_pool": ".meta.users.uid",
"otp_pool": ".otp",
"notif_pool": ".log.notif",
"topics_pool": ".meta.topics", // introduced in Ceph v19
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we try to set these new properties in v18, does it fail? I assume that's why you're checking if the properties exist like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, if pool is not presented in source json then we just ignore it.

}
for pool, nsSuffix := range zonePoolNSSuffix {
// replace rgw internal index pools with namespaced metadata pool
namespacedPool := defaultMetaPool + nsSuffix
prev, err := setObjProperty(zone, namespacedPool, pool)
prev, err := setObjPropertyIfExists(zone, namespacedPool, pool)
Copy link
Member

@travisn travisn Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to the "IfExists" approach, is to check the ceph version. Something like this:

if cephVersion.IsAtLeast(cephver.CephVersion{Major: 19, Minor: 0, Extra: 0}) {
  // add new properties
}

But if we can just check if the property exists like you already implemented, that may certainly keep things simpler in the long term when other properties are added.

During an upgraded cluster (e.g. from v18 --> v19), the new properties will be found to exist and then we'll add them? Just wanted to confirm if you already tested that scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that Ceph version check approach is better for code readability. But, as you said, with "IfExists" approach we can just add any new pools and don't care about ceph version.
I personally more like "IfExists" approach because it makes RGW responsible for pools structure and rook just tries to apply namespaces to all existing pools.
I can also suggest iterating through all root string fields ending with _pool and log warning if pool is unknown. Or try to apply some namespace anyway. What do you think?

I've upgaded my local rook-ceph from v18.something to v19.1.1-0. All daemon pods been restared and using v19 images, but when i run radosgw-admin zone get i don't see any new pools in json after update:

{
    "id": "e522c205-dbd2-43e1-8ef8-a81a55d44ef1",
    "name": "pp-store",
    "domain_root": "meta-pool:pp-store.meta.root",
    "control_pool": "meta-pool:pp-store.control",
    "gc_pool": "meta-pool:pp-store.log.gc",
    "lc_pool": "meta-pool:pp-store.log.lc",
    "log_pool": "meta-pool:pp-store.log",
    "intent_log_pool": "meta-pool:pp-store.log.intent",
    "usage_log_pool": "meta-pool:pp-store.log.usage",
    "roles_pool": "meta-pool:pp-store.meta.roles",
    "reshard_pool": "meta-pool:pp-store.log.reshard",
    "user_keys_pool": "meta-pool:pp-store.meta.users.keys",
    "user_email_pool": "meta-pool:pp-store.meta.users.email",
    "user_swift_pool": "meta-pool:pp-store.meta.users.swift",
    "user_uid_pool": "meta-pool:pp-store.meta.users.uid",
    "otp_pool": "meta-pool:pp-store.otp",
    "system_key": {
        "access_key": "",
        "secret_key": ""
    },
    "placement_pools": [...],
    "realm_id": "ceb20ae5-3e69-40c4-af9f-fc9bab7904db",
    "notif_pool": "meta-pool:pp-store.log.notif"
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also suggest iterating through all root string fields ending with _pool and log warning if pool is unknown. Or try to apply some namespace anyway. What do you think?

That sounds great to log a warning if we find an extra pool, until we are more confident with that approach.

I've upgaded my local rook-ceph from v18.something to v19.1.1-0. All daemon pods been restared and using v19 images, but when i run radosgw-admin zone get i don't see any new pools in json after update:

This is what I worried about, the schema may not be updated until some zone update is triggered. So that means we may need to use the approach to check the ceph version instead of the "IfExists" approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great to log a warning if we find an extra pool, until we are more confident with that approach.

I really like this idea as well.


Conceptually, I much prefer the IfExists approach, but it does seem like we have an edge condition on upgrade we need figure out how to deal with properly.

If we upgrade from v17/v18 -> v19, my expectation would be that RGW (upon upgrade) is going to create a new pool before we Rook has a chance to see if a new pool needs configured. I'm wondering what bad would happen if the RGW did that and started putting stuff into that pool before Rook has a chance to configure it. But it also seems like my expectation is not what happens IRL, which is also concerning.

I think it's best to ask @cbodley what his recommendation is for handling the upgrade case when new fields are introduced to the zone config. How does RGW operate on upgrade when the zone spec is missing configs for default pools? Does RGW create the new default pools and use them implicitly, or does it just not use the new default pools? Is there a good (preferably idempotent) way of checking to see if the zone config is missing fields it needs after an RGW upgrade? Are there other things around this we're missing?

I think the best-case scenario for Rook would be if RGW doesn't init new default pools on upgrade and we have a recommendation for detecting what pools RGW wants to be configured that we can use on an if-exists basis.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've upgaded my local rook-ceph from v18.something to v19.1.1-0. All daemon pods been restared and using v19 images, but when i run radosgw-admin zone get i don't see any new pools in json after update:

@arttor that makes it sound like the radosgw-admin binary was still v18

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, yea. You are correct, i was using old radosgw-admin 🤦 . After toolbox update i can see new pools.
For existing rgw with shared pools, new pools are using <instance_name>.rgw.meta pool:

{
    "notif_pool": "meta-pool:pp-store.log.notif", // existing is using shared metadata pool "meta-pool"
    "topics_pool": "pp-store.rgw.meta:topics", // new pool using default "pp-store.rgw.meta"
    "account_pool": "pp-store.rgw.meta:accounts", 
    "group_pool": "pp-store.rgw.meta:groups",
...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if rook wants to prevent this, it would have to use an upgraded radosgw-admin zone set command to update those pool names before any other radosgws/radosgw-admin commands get upgraded

@BlaineEXE how can we achive this? It would be very easy if we could add new pools with old radosgw-admin. Or it is ok to ask users to upgrade rook to the latest version before upgrading Ceph? If it is ok, then we can just always force set new pools to zone and it should work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know about the pools, can we attempt to always set them regardless of the Ceph version? That way, they will be ready on any upgrade.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes it sound like the radosgw-admin binary was still v18

That's a good point, and this is accurate. Because Ceph v19 is still pre-release, Rook is using the latest v18 image for its base image. That means that the Ceph CLI and all tools a re v18 tools, including radosgw-admin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, added WARN log message for unknown pools in zone json

pkg/operator/ceph/object/json_helpers.go Outdated Show resolved Hide resolved
}
for pool, nsSuffix := range zonePoolNSSuffix {
// replace rgw internal index pools with namespaced metadata pool
namespacedPool := defaultMetaPool + nsSuffix
prev, err := setObjProperty(zone, namespacedPool, pool)
prev, err := setObjPropertyIfExists(zone, namespacedPool, pool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great to log a warning if we find an extra pool, until we are more confident with that approach.

I really like this idea as well.


Conceptually, I much prefer the IfExists approach, but it does seem like we have an edge condition on upgrade we need figure out how to deal with properly.

If we upgrade from v17/v18 -> v19, my expectation would be that RGW (upon upgrade) is going to create a new pool before we Rook has a chance to see if a new pool needs configured. I'm wondering what bad would happen if the RGW did that and started putting stuff into that pool before Rook has a chance to configure it. But it also seems like my expectation is not what happens IRL, which is also concerning.

I think it's best to ask @cbodley what his recommendation is for handling the upgrade case when new fields are introduced to the zone config. How does RGW operate on upgrade when the zone spec is missing configs for default pools? Does RGW create the new default pools and use them implicitly, or does it just not use the new default pools? Is there a good (preferably idempotent) way of checking to see if the zone config is missing fields it needs after an RGW upgrade? Are there other things around this we're missing?

I think the best-case scenario for Rook would be if RGW doesn't init new default pools on upgrade and we have a recommendation for detecting what pools RGW wants to be configured that we can use on an if-exists basis.

@arttor arttor force-pushed the rgw_new_shared_pools branch 2 times, most recently from bfb3c9a to 304c189 Compare September 24, 2024 12:32
Copy link
Contributor Author

@arttor arttor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BlaineEXE @travisn

i've done some more testing locally. Tried 4 possible combinations with v18/v19 ceph-daemons and rook-ceph-dependencies. I was not able to find any errors in rgw or operator logs for a simple scenarios like put/get/head bucket and object. I've also tested bucket notifications for v19 rook and v18 rgw combination and it worked.

I propose to use setIfExistApproach approach for zone pools. Otherwise there will be unnecesary attempt of zone update for rook v18 case because new pools won't be saved to zone json.

pkg/operator/ceph/object/json_helpers.go Outdated Show resolved Hide resolved
}
for pool, nsSuffix := range zonePoolNSSuffix {
// replace rgw internal index pools with namespaced metadata pool
namespacedPool := defaultMetaPool + nsSuffix
prev, err := setObjProperty(zone, namespacedPool, pool)
prev, err := setObjPropertyIfExists(zone, namespacedPool, pool)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, added WARN log message for unknown pools in zone json

Copy link
Member

@BlaineEXE BlaineEXE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really just a small nit around naming as far as the code is written.

Comment on lines 73 to 61
if shouldExist && !ok {
// not exists:
return prev, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect a bool called shouldExist to cause the function to return an error if the field doesn't exist. Is there a better naming for this? Maybe setIfExists?

Copy link
Member

@BlaineEXE BlaineEXE Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arttor this is also bringing up some related questions for me, and I'd like to clarify my understanding.

From reading the overall code, it seems like there is no error returned if the field Rook tries to set doesn't exist. Maybe I am misreading the code; if so let me know. If I'm reading correctly, is this for a specific reason?

IMO, it's probably safest for users for Rook to return an error if it tries to set a field using setObjProperty() if the field doesn't exist. Presumably, that case would happen if (1) RGW removes a field in the future, which we want to make sure we handle properly in code if needed, or (2) a dev makes a typo/mistake in a future Rook update.

Being able to catch (2) would be good, though not critical. Being able to catch (1) could be good to help make sure Rook's handling of RGW zone configs is always aware of RGW changes. I don't expect (1) to happen often, but catching it could be important.


Also, I don't think these questions are intrinsic to this PR. We can follow up after your PTO if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I was thinking about returning an error when field is missing to log it in caller. But in this case we should implement error handling on caller side: check error type to not fail method in case of missing field. Because for v18 it is expected that some pools will be missing. I was not able to find existing "domain" error types in Rook and thought that i should not introduce error types just for this funciton.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. now it retruns error if pool not exists and the error is logged with INFO level.

Comment on lines 54 to 61
return setObjPropertyCommon(false, obj, val, path...)
}

// setObjPropertyIfExists - helper function to manipulate JSON Objects.
// sets new value to json object nested field only if it is already exists in json and returns previous value.
func setObjPropertyIfExists[T string | []string | map[string]interface{} | []interface{}](obj map[string]interface{}, val T, path ...string) (T, error) {
return setObjPropertyCommon(true, obj, val, path...)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these switched from what they should be?

I think shouldExist=false when using setObjPropertyIfExists() and shouldExist=true when using setObjProperty() makes sense right?

Suggested change
return setObjPropertyCommon(false, obj, val, path...)
}
// setObjPropertyIfExists - helper function to manipulate JSON Objects.
// sets new value to json object nested field only if it is already exists in json and returns previous value.
func setObjPropertyIfExists[T string | []string | map[string]interface{} | []interface{}](obj map[string]interface{}, val T, path ...string) (T, error) {
return setObjPropertyCommon(true, obj, val, path...)
}
return setObjPropertyCommon(true, obj, val, path...)
}
// setObjPropertyIfExists - helper function to manipulate JSON Objects.
// sets new value to json object nested field only if it is already exists in json and returns previous value.
func setObjPropertyIfExists[T string | []string | map[string]interface{} | []interface{}](obj map[string]interface{}, val T, path ...string) (T, error) {
return setObjPropertyCommon(false, obj, val, path...)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it is correct. setObjPropertyIfExists() means that shouldExist=true. Is it makes sense to change namig?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the confusion I have is the use of shouldExist. I am interpreting this to be the same as mustExist, effectively failIfNotExist.

Perhaps the parameter shouldExist could be renamed to allowNotExists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You interpreting it correctly. It acts like mustExist and will return error if property is not exists

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read through the unit tests, and I think I understand where my confusion is coming from.

In the unit tests, I see that setObjProperty() will inject a property into the json if the property doesn't already exist. Based on the name, I have been assuming that the setObjProperty() function would error out if the property it is operating on doesn't exist because I interpret "set" to be equivalent to "update".

I also see that setObjPropertyIfExists() returns an error if the property doesn't exist. Based on the name, I have been assuming that the setObjPropertyIfExists() doesn't need to return an error if the property is non-existent because of "if exists".

I have some follow-ups thoughts to try to get/create clarity:

Does Rook need to be able to add fields to the json struct for Ceph?
If yes, renaming setObjProperty() to setOrAddObjProperty() (or addOrUpdateObjProperty()) could be a good clarifying change. Following from that, setObjPropertyIfExists() might be better named as setObjProperty() (or updateObjProperty()).

If Rook doesn't need to add fields to the json struct, maybe that feature could be removed from the methods? This case probably needs more discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarification, now i also see the problem. You are right, we use this method only for zone and zonegroup JSONs. And these JSONs are returned from radosgw-admin and should always have all properties. This means that at least for now we can use setObjPropertyIfExists() for all cases.
I will remove setObjProperty() method and rename setObjPropertyIfExists() to updateObjProperty().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 923 to 913
prev, err := setObjProperty(zone, namespacedPool, pool)
prev, err := setObjPropertyIfExists(zone, namespacedPool, pool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the pools added here, I think setObjPropertyIfExists() makes sense, but for existing pools, should we use setObjProperty() so that Rook can error out if the property is missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that we already have described behaviour with setObjPropertyIfExists(). it will return error if pool is missing.
Old pools should always be presented. Even it is missing in ceph, radosgw-admin will set default values to json anyway. If it will be missing in json then we will see warning in rook logs.

add rgw shared pools rados ns to topics_pool, account_pool, and group_pool

Signed-off-by: Artem Torubarov <artem.torubarov@clyso.com>
Copy link
Member

@BlaineEXE BlaineEXE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. Thanks @arttor for taking the time to do this and working to make sure the code is as readable and maintainable as possible going forward!

I'll let @travisn have another look if he wants before merging

@travisn travisn merged commit 615086c into rook:master Oct 22, 2024
56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RGW: shared pools: extra <storename>.rgw.meta pool created
4 participants