rgw: support new zone pools #14715

arttor · 2024-09-11T15:53:11Z

travisn · 2024-09-11T19:27:15Z

pkg/operator/ceph/object/objectstore.go

@@ -916,11 +916,14 @@ func adjustZoneDefaultPools(zone map[string]interface{}, spec cephv1.ObjectShare
 		"user_uid_pool":   ".meta.users.uid",
 		"otp_pool":        ".otp",
 		"notif_pool":      ".log.notif",
+		"topics_pool":     ".meta.topics",  // introduced in Ceph v19


If we try to set these new properties in v18, does it fail? I assume that's why you're checking if the properties exist like this.

Correct, if pool is not presented in source json then we just ignore it.

travisn · 2024-09-11T19:40:26Z

pkg/operator/ceph/object/objectstore.go

 	}
 	for pool, nsSuffix := range zonePoolNSSuffix {
 		// replace rgw internal index pools with namespaced metadata pool
 		namespacedPool := defaultMetaPool + nsSuffix
-		prev, err := setObjProperty(zone, namespacedPool, pool)
+		prev, err := setObjPropertyIfExists(zone, namespacedPool, pool)


An alternative to the "IfExists" approach, is to check the ceph version. Something like this:

if cephVersion.IsAtLeast(cephver.CephVersion{Major: 19, Minor: 0, Extra: 0}) { // add new properties }

But if we can just check if the property exists like you already implemented, that may certainly keep things simpler in the long term when other properties are added.

During an upgraded cluster (e.g. from v18 --> v19), the new properties will be found to exist and then we'll add them? Just wanted to confirm if you already tested that scenario.

I think that Ceph version check approach is better for code readability. But, as you said, with "IfExists" approach we can just add any new pools and don't care about ceph version.
I personally more like "IfExists" approach because it makes RGW responsible for pools structure and rook just tries to apply namespaces to all existing pools.
I can also suggest iterating through all root string fields ending with _pool and log warning if pool is unknown. Or try to apply some namespace anyway. What do you think?

I've upgaded my local rook-ceph from v18.something to v19.1.1-0. All daemon pods been restared and using v19 images, but when i run radosgw-admin zone get i don't see any new pools in json after update:

{ "id": "e522c205-dbd2-43e1-8ef8-a81a55d44ef1", "name": "pp-store", "domain_root": "meta-pool:pp-store.meta.root", "control_pool": "meta-pool:pp-store.control", "gc_pool": "meta-pool:pp-store.log.gc", "lc_pool": "meta-pool:pp-store.log.lc", "log_pool": "meta-pool:pp-store.log", "intent_log_pool": "meta-pool:pp-store.log.intent", "usage_log_pool": "meta-pool:pp-store.log.usage", "roles_pool": "meta-pool:pp-store.meta.roles", "reshard_pool": "meta-pool:pp-store.log.reshard", "user_keys_pool": "meta-pool:pp-store.meta.users.keys", "user_email_pool": "meta-pool:pp-store.meta.users.email", "user_swift_pool": "meta-pool:pp-store.meta.users.swift", "user_uid_pool": "meta-pool:pp-store.meta.users.uid", "otp_pool": "meta-pool:pp-store.otp", "system_key": { "access_key": "", "secret_key": "" }, "placement_pools": [...], "realm_id": "ceb20ae5-3e69-40c4-af9f-fc9bab7904db", "notif_pool": "meta-pool:pp-store.log.notif" }

I can also suggest iterating through all root string fields ending with _pool and log warning if pool is unknown. Or try to apply some namespace anyway. What do you think?

That sounds great to log a warning if we find an extra pool, until we are more confident with that approach.

I've upgaded my local rook-ceph from v18.something to v19.1.1-0. All daemon pods been restared and using v19 images, but when i run radosgw-admin zone get i don't see any new pools in json after update:

This is what I worried about, the schema may not be updated until some zone update is triggered. So that means we may need to use the approach to check the ceph version instead of the "IfExists" approach.

That sounds great to log a warning if we find an extra pool, until we are more confident with that approach.

I really like this idea as well.

Conceptually, I much prefer the IfExists approach, but it does seem like we have an edge condition on upgrade we need figure out how to deal with properly.

If we upgrade from v17/v18 -> v19, my expectation would be that RGW (upon upgrade) is going to create a new pool before we Rook has a chance to see if a new pool needs configured. I'm wondering what bad would happen if the RGW did that and started putting stuff into that pool before Rook has a chance to configure it. But it also seems like my expectation is not what happens IRL, which is also concerning.

I think it's best to ask @cbodley what his recommendation is for handling the upgrade case when new fields are introduced to the zone config. How does RGW operate on upgrade when the zone spec is missing configs for default pools? Does RGW create the new default pools and use them implicitly, or does it just not use the new default pools? Is there a good (preferably idempotent) way of checking to see if the zone config is missing fields it needs after an RGW upgrade? Are there other things around this we're missing?

I think the best-case scenario for Rook would be if RGW doesn't init new default pools on upgrade and we have a recommendation for detecting what pools RGW wants to be configured that we can use on an if-exists basis.

I've upgaded my local rook-ceph from v18.something to v19.1.1-0. All daemon pods been restared and using v19 images, but when i run radosgw-admin zone get i don't see any new pools in json after update:

@arttor that makes it sound like the radosgw-admin binary was still v18

oh, yea. You are correct, i was using old radosgw-admin 🤦 . After toolbox update i can see new pools.
For existing rgw with shared pools, new pools are using <instance_name>.rgw.meta pool:

{ "notif_pool": "meta-pool:pp-store.log.notif", // existing is using shared metadata pool "meta-pool" "topics_pool": "pp-store.rgw.meta:topics", // new pool using default "pp-store.rgw.meta" "account_pool": "pp-store.rgw.meta:accounts", "group_pool": "pp-store.rgw.meta:groups", ... }

if rook wants to prevent this, it would have to use an upgraded radosgw-admin zone set command to update those pool names before any other radosgws/radosgw-admin commands get upgraded

@BlaineEXE how can we achive this? It would be very easy if we could add new pools with old radosgw-admin. Or it is ok to ask users to upgrade rook to the latest version before upgrading Ceph? If it is ok, then we can just always force set new pools to zone and it should work.

Since we know about the pools, can we attempt to always set them regardless of the Ceph version? That way, they will be ready on any upgrade.

that makes it sound like the radosgw-admin binary was still v18

That's a good point, and this is accurate. Because Ceph v19 is still pre-release, Rook is using the latest v18 image for its base image. That means that the Ceph CLI and all tools a re v18 tools, including radosgw-admin.

as discussed, added WARN log message for unknown pools in zone json

pkg/operator/ceph/object/json_helpers.go

BlaineEXE · 2024-09-12T18:39:29Z

pkg/operator/ceph/object/objectstore.go

 	}
 	for pool, nsSuffix := range zonePoolNSSuffix {
 		// replace rgw internal index pools with namespaced metadata pool
 		namespacedPool := defaultMetaPool + nsSuffix
-		prev, err := setObjProperty(zone, namespacedPool, pool)
+		prev, err := setObjPropertyIfExists(zone, namespacedPool, pool)


That sounds great to log a warning if we find an extra pool, until we are more confident with that approach.

I really like this idea as well.

Conceptually, I much prefer the IfExists approach, but it does seem like we have an edge condition on upgrade we need figure out how to deal with properly.

If we upgrade from v17/v18 -> v19, my expectation would be that RGW (upon upgrade) is going to create a new pool before we Rook has a chance to see if a new pool needs configured. I'm wondering what bad would happen if the RGW did that and started putting stuff into that pool before Rook has a chance to configure it. But it also seems like my expectation is not what happens IRL, which is also concerning.

I think it's best to ask @cbodley what his recommendation is for handling the upgrade case when new fields are introduced to the zone config. How does RGW operate on upgrade when the zone spec is missing configs for default pools? Does RGW create the new default pools and use them implicitly, or does it just not use the new default pools? Is there a good (preferably idempotent) way of checking to see if the zone config is missing fields it needs after an RGW upgrade? Are there other things around this we're missing?

I think the best-case scenario for Rook would be if RGW doesn't init new default pools on upgrade and we have a recommendation for detecting what pools RGW wants to be configured that we can use on an if-exists basis.

arttor

@BlaineEXE @travisn

i've done some more testing locally. Tried 4 possible combinations with v18/v19 ceph-daemons and rook-ceph-dependencies. I was not able to find any errors in rgw or operator logs for a simple scenarios like put/get/head bucket and object. I've also tested bucket notifications for v19 rook and v18 rgw combination and it worked.

I propose to use setIfExistApproach approach for zone pools. Otherwise there will be unnecesary attempt of zone update for rook v18 case because new pools won't be saved to zone json.

pkg/operator/ceph/object/json_helpers.go

arttor · 2024-09-24T12:42:38Z

pkg/operator/ceph/object/objectstore.go

 	}
 	for pool, nsSuffix := range zonePoolNSSuffix {
 		// replace rgw internal index pools with namespaced metadata pool
 		namespacedPool := defaultMetaPool + nsSuffix
-		prev, err := setObjProperty(zone, namespacedPool, pool)
+		prev, err := setObjPropertyIfExists(zone, namespacedPool, pool)


as discussed, added WARN log message for unknown pools in zone json

BlaineEXE

Really just a small nit around naming as far as the code is written.

BlaineEXE · 2024-09-24T16:16:53Z

pkg/operator/ceph/object/json_helpers.go

+			if shouldExist && !ok {
+				// not exists:
+				return prev, nil
+			}


I would expect a bool called shouldExist to cause the function to return an error if the field doesn't exist. Is there a better naming for this? Maybe setIfExists?

@arttor this is also bringing up some related questions for me, and I'd like to clarify my understanding.

From reading the overall code, it seems like there is no error returned if the field Rook tries to set doesn't exist. Maybe I am misreading the code; if so let me know. If I'm reading correctly, is this for a specific reason?

IMO, it's probably safest for users for Rook to return an error if it tries to set a field using setObjProperty() if the field doesn't exist. Presumably, that case would happen if (1) RGW removes a field in the future, which we want to make sure we handle properly in code if needed, or (2) a dev makes a typo/mistake in a future Rook update.

Being able to catch (2) would be good, though not critical. Being able to catch (1) could be good to help make sure Rook's handling of RGW zone configs is always aware of RGW changes. I don't expect (1) to happen often, but catching it could be important.

Also, I don't think these questions are intrinsic to this PR. We can follow up after your PTO if needed.

Good point. I was thinking about returning an error when field is missing to log it in caller. But in this case we should implement error handling on caller side: check error type to not fail method in case of missing field. Because for v18 it is expected that some pools will be missing. I was not able to find existing "domain" error types in Rook and thought that i should not introduce error types just for this funciton.

done. now it retruns error if pool not exists and the error is logged with INFO level.

BlaineEXE · 2024-09-26T18:26:49Z

pkg/operator/ceph/object/json_helpers.go

+	return setObjPropertyCommon(false, obj, val, path...)
+}
+
+// setObjPropertyIfExists - helper function to manipulate JSON Objects.
+// sets new value to json object nested field only if it is already exists in json and returns previous value.
+func setObjPropertyIfExists[T string | []string | map[string]interface{} | []interface{}](obj map[string]interface{}, val T, path ...string) (T, error) {
+	return setObjPropertyCommon(true, obj, val, path...)
+}


Are these switched from what they should be?

I think shouldExist=false when using setObjPropertyIfExists() and shouldExist=true when using setObjProperty() makes sense right?

Suggested change

return setObjPropertyCommon(false, obj, val, path...)

}

// setObjPropertyIfExists - helper function to manipulate JSON Objects.

// sets new value to json object nested field only if it is already exists in json and returns previous value.

func setObjPropertyIfExists[T string | []string | map[string]interface{} | []interface{}](obj map[string]interface{}, val T, path ...string) (T, error) {

return setObjPropertyCommon(true, obj, val, path...)

}

return setObjPropertyCommon(true, obj, val, path...)

}

// setObjPropertyIfExists - helper function to manipulate JSON Objects.

// sets new value to json object nested field only if it is already exists in json and returns previous value.

func setObjPropertyIfExists[T string | []string | map[string]interface{} | []interface{}](obj map[string]interface{}, val T, path ...string) (T, error) {

return setObjPropertyCommon(false, obj, val, path...)

}

no, it is correct. setObjPropertyIfExists() means that shouldExist=true. Is it makes sense to change namig?

I think the confusion I have is the use of shouldExist. I am interpreting this to be the same as mustExist, effectively failIfNotExist.

Perhaps the parameter shouldExist could be renamed to allowNotExists?

You interpreting it correctly. It acts like mustExist and will return error if property is not exists

I read through the unit tests, and I think I understand where my confusion is coming from.

In the unit tests, I see that setObjProperty() will inject a property into the json if the property doesn't already exist. Based on the name, I have been assuming that the setObjProperty() function would error out if the property it is operating on doesn't exist because I interpret "set" to be equivalent to "update".

I also see that setObjPropertyIfExists() returns an error if the property doesn't exist. Based on the name, I have been assuming that the setObjPropertyIfExists() doesn't need to return an error if the property is non-existent because of "if exists".

I have some follow-ups thoughts to try to get/create clarity:

Does Rook need to be able to add fields to the json struct for Ceph?
If yes, renaming setObjProperty() to setOrAddObjProperty() (or addOrUpdateObjProperty()) could be a good clarifying change. Following from that, setObjPropertyIfExists() might be better named as setObjProperty() (or updateObjProperty()).

If Rook doesn't need to add fields to the json struct, maybe that feature could be removed from the methods? This case probably needs more discussion.

Thanks for clarification, now i also see the problem. You are right, we use this method only for zone and zonegroup JSONs. And these JSONs are returned from radosgw-admin and should always have all properties. This means that at least for now we can use setObjPropertyIfExists() for all cases.
I will remove setObjProperty() method and rename setObjPropertyIfExists() to updateObjProperty().

BlaineEXE · 2024-09-26T18:28:44Z

pkg/operator/ceph/object/objectstore.go

-		prev, err := setObjProperty(zone, namespacedPool, pool)
+		prev, err := setObjPropertyIfExists(zone, namespacedPool, pool)


For the pools added here, I think setObjPropertyIfExists() makes sense, but for existing pools, should we use setObjProperty() so that Rook can error out if the property is missing?

i think that we already have described behaviour with setObjPropertyIfExists(). it will return error if pool is missing.
Old pools should always be presented. Even it is missing in ceph, radosgw-admin will set default values to json anyway. If it will be missing in json then we will see warning in rook logs.

add rgw shared pools rados ns to topics_pool, account_pool, and group_pool Signed-off-by: Artem Torubarov <artem.torubarov@clyso.com>

BlaineEXE

I think this looks good. Thanks @arttor for taking the time to do this and working to make sure the code is as readable and maintainable as possible going forward!

I'll let @travisn have another look if he wants before merging

arttor requested review from travisn and BlaineEXE September 11, 2024 15:53

arttor force-pushed the rgw_new_shared_pools branch from 1854f01 to d83f7bc Compare September 11, 2024 15:57

travisn reviewed Sep 11, 2024

View reviewed changes

BlaineEXE requested changes Sep 12, 2024

View reviewed changes

arttor force-pushed the rgw_new_shared_pools branch from d83f7bc to c53d5ae Compare September 13, 2024 14:28

arttor force-pushed the rgw_new_shared_pools branch 2 times, most recently from bfb3c9a to 304c189 Compare September 24, 2024 12:32

arttor commented Sep 24, 2024

View reviewed changes

BlaineEXE requested changes Sep 24, 2024

View reviewed changes

arttor force-pushed the rgw_new_shared_pools branch from 304c189 to 23ff820 Compare September 26, 2024 07:26

BlaineEXE reviewed Sep 26, 2024

View reviewed changes

arttor force-pushed the rgw_new_shared_pools branch from 23ff820 to 5d44698 Compare October 14, 2024 08:42

rgw: support v19 zone pools

f6a25b0

add rgw shared pools rados ns to topics_pool, account_pool, and group_pool Signed-off-by: Artem Torubarov <artem.torubarov@clyso.com>

arttor force-pushed the rgw_new_shared_pools branch from 5d44698 to f6a25b0 Compare October 22, 2024 09:24

BlaineEXE approved these changes Oct 22, 2024

View reviewed changes

travisn approved these changes Oct 22, 2024

View reviewed changes

travisn merged commit 615086c into rook:master Oct 22, 2024
56 checks passed

arttor mentioned this pull request Oct 25, 2024

rgw: support custom name for default pool placement #14884

Merged

arttor added the backport-release-1.15 label Oct 29, 2024

mergify bot mentioned this pull request Oct 29, 2024

rgw: support new zone pools (backport #14715) #14931

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rgw: support new zone pools #14715

rgw: support new zone pools #14715

arttor commented Sep 11, 2024

travisn Sep 11, 2024

arttor Sep 12, 2024

travisn Sep 11, 2024 •

edited

Loading

arttor Sep 12, 2024

travisn Sep 12, 2024

BlaineEXE Sep 12, 2024

cbodley Sep 12, 2024

arttor Sep 13, 2024

arttor Sep 13, 2024

BlaineEXE Sep 13, 2024

BlaineEXE Sep 13, 2024

arttor Sep 24, 2024

BlaineEXE Sep 12, 2024

arttor left a comment

arttor Sep 24, 2024

BlaineEXE left a comment

BlaineEXE Sep 24, 2024

BlaineEXE Sep 24, 2024 •

edited

Loading

arttor Sep 24, 2024

arttor Sep 26, 2024

BlaineEXE Sep 26, 2024

arttor Oct 11, 2024

BlaineEXE Oct 21, 2024

arttor Oct 21, 2024

BlaineEXE Oct 21, 2024

arttor Oct 22, 2024

arttor Oct 22, 2024

BlaineEXE Sep 26, 2024

arttor Oct 11, 2024

BlaineEXE left a comment

		prev, err := setObjProperty(zone, namespacedPool, pool)
		prev, err := setObjPropertyIfExists(zone, namespacedPool, pool)

rgw: support new zone pools #14715

rgw: support new zone pools #14715

Conversation

arttor commented Sep 11, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

travisn Sep 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arttor left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BlaineEXE left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BlaineEXE Sep 24, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BlaineEXE left a comment

Choose a reason for hiding this comment

travisn Sep 11, 2024 •

edited

Loading

BlaineEXE Sep 24, 2024 •

edited

Loading