
rbdmirror: enable rbd rados namespace mirroring #14701

Merged
merged 1 commit from rbd-mirror-rados into rook:master on Oct 10, 2024

Conversation

parth-gr
Member

@parth-gr parth-gr commented Sep 9, 2024

Modify the CR to allow mirroring of a rados namespace
to a differently named namespace on the remote cluster.

Tasks:

  1. Add the new mirroring API
  2. Include snapschedule with mirroring
  3. Enable and disable of mirroring

Pending work, possibly for a future PR:

  1. CI
  2. Add CR monitoring for mirroring
  3. Update the block pool CR with latest code added in rbdmirror
  4. rbdmirror for external mode

Open question:
If we disable mirroring on cluster1, how can we notify cluster2 to disable it as well?
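As a rough sketch (not the final API), the mirroring section of the CephBlockPoolRadosNamespace CR might look like the following. Field names such as `remoteNamespace` and `snapshotSchedules` follow the docs discussed in this PR; the surrounding structure and all values are illustrative only:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPoolRadosNamespace
metadata:
  name: namespace-a        # illustrative names
  namespace: rook-ceph
spec:
  blockPoolName: replicapool
  mirroring:
    mode: image
    # rados namespace on the peer cluster; may differ from this namespace's name
    remoteNamespace: namespace-b
    snapshotSchedules:
      - interval: 24h
```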

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@parth-gr
Member Author

parth-gr commented Sep 9, 2024

Two points still need suggestions:

  1. Does Rook need validation for the remote namespace?

  2. For the rbd-mirror daemon, do we want to create a bootstrap mirroring secret per rados namespace?

@parth-gr parth-gr changed the title rbd: enable rbd rados namesapce mirroring rbdmirror: enable rbd rados namesapce mirroring Sep 9, 2024
Member

@Madhu-1 Madhu-1 left a comment

Can we update the documentation and E2E testing for this, and also add it to PendingReleaseNotes since it's a new feature for 1.16?

pkg/apis/ceph.rook.io/v1/types.go Outdated Show resolved Hide resolved
pkg/daemon/ceph/client/radosnamespace.go Outdated Show resolved Hide resolved
pkg/operator/ceph/pool/radosnamespace/controller.go Outdated Show resolved Hide resolved
pkg/operator/ceph/pool/radosnamespace/controller.go Outdated Show resolved Hide resolved
pkg/operator/ceph/pool/radosnamespace/controller.go Outdated Show resolved Hide resolved
pkg/operator/ceph/pool/radosnamespace/controller.go Outdated Show resolved Hide resolved
pkg/apis/ceph.rook.io/v1/types.go Outdated Show resolved Hide resolved
pkg/apis/ceph.rook.io/v1/types.go Outdated Show resolved Hide resolved
pkg/apis/ceph.rook.io/v1/types.go Outdated Show resolved Hide resolved
pkg/apis/ceph.rook.io/v1/types.go Outdated Show resolved Hide resolved
pkg/apis/ceph.rook.io/v1/types.go Outdated Show resolved Hide resolved
pkg/daemon/ceph/client/radosnamespace.go Outdated Show resolved Hide resolved
pkg/operator/ceph/pool/radosnamespace/controller.go Outdated Show resolved Hide resolved
@parth-gr parth-gr marked this pull request as draft September 10, 2024 11:23
@travisn travisn changed the title rbdmirror: enable rbd rados namesapce mirroring rbdmirror: enable rbd rados namespace mirroring Sep 10, 2024
@parth-gr
Member Author

Testing:

sh-5.1$ rbd namespace list replicapool 
NAME
r11 
sh-5.1$ 
sh-5.1$ 
sh-5.1$ rbd mirror pool enable replicapool/r11 --remote-namesapce r12
rbd: unrecognised option '--remote-namesapce'
sh-5.1$ 
sh-5.1$ rbd mirror pool enable replicapool/r11 --remote-namespace r12
rbd: must specify 'image' or 'pool' mode.
sh-5.1$ 
sh-5.1$ rbd mirror pool enable replicapool/r11 image --remote-namespace r12
 DBG: Found remote namespace : r12
sh-5.1$ 
sh-5.1$ exit
exit
runner@fv-az530-782:~/work/rook/rook$ kubectl get pods -nrook-ceph-secondary 
NAME                                               READY   STATUS      RESTARTS   AGE
rook-ceph-exporter-fv-az530-782-658c6b578c-5bz2l   1/1     Running     0          10m
rook-ceph-mgr-a-674cd7598b-vk247                   1/1     Running     0          10m
rook-ceph-mon-a-7d54d66d8d-cmrjd                   1/1     Running     0          11m
rook-ceph-osd-0-58fb76959f-6s28d                   1/1     Running     0          10m
rook-ceph-osd-prepare-fv-az530-782-6drk6           0/1     Completed   0          10m
rook-ceph-rbd-mirror-a-748db5d8b4-p6krp            1/1     Running     0          10m
rook-ceph-tools-5997889b95-tg9f5                   1/1     Running     0          13m
runner@fv-az530-782:~/work/rook/rook$ kubectl  exec rook-ceph-tools-5997889b95-tg9f5 -nrook-ceph-secondary -it -- sh 
sh-5.1$ rbd namespace create replicapool/r12
sh-5.1$ rbd namespace list replicapool 
NAME
r12 
sh-5.1$ rbd mirror pool enable replicapool/r12 image --remote-namespace r11
 DBG: Found remote namespace : r11
sh-5.1$ 
sh-5.1$ exit
exit
runner@fv-az530-782:~/work/rook/rook$ kubectl  exec rook-ceph-tools-5997889b95-l4g56 -nrook-ceph -it -- sh 
sh-5.1$ rbd create replicapool/r11/test1 -s 1G
sh-5.1$ 
sh-5.1$ rbd ls replicapool/r11
test
test1
sh-5.1$ rbd mirror image enable replicapool/r11/test1 snapshot
Mirroring enabled
sh-5.1$ rbd info replicapool/r11/test1      
rbd image 'test1':
        size 1 GiB in 256 objects
        order 22 (4 MiB objects)
        snapshot_count: 1
        id: 158471be72eb
        block_name_prefix: rbd_data.158471be72eb
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        op_features: 
        flags: 
        create_timestamp: Wed Sep 11 09:44:39 2024
        access_timestamp: Wed Sep 11 09:44:39 2024
        modify_timestamp: Wed Sep 11 09:44:39 2024
        mirroring state: enabled
        mirroring mode: snapshot
        mirroring global id: c160ff1e-fd32-40ae-ad6e-10baf889d3a4
        mirroring primary: true
sh-5.1$ exit
exit
runner@fv-az530-782:~/work/rook/rook$ kubectl  exec rook-ceph-tools-5997889b95-tg9f5 -nrook-ceph-secondary -it -- sh 
sh-5.1$ rbd ls replicapool
test
sh-5.1$ 
sh-5.1$ rbd ls replicapool/r12
test
test1
sh-5.1$ 

Disable

runner@fv-az530-782:~/work/rook/rook$ kubectl  exec rook-ceph-tools-5997889b95-l4g56 -nrook-ceph -it -- sh 
sh-5.1$ 
sh-5.1$ rbd mirror image disable replicapool/r11/test1
Mirroring disabled
sh-5.1$ 
sh-5.1$ exit
exit
runner@fv-az530-782:~/work/rook/rook$ kubectl  exec rook-ceph-tools-5997889b95-tg9f5 -nrook-ceph-secondary -it -- sh 
sh-5.1$ rbd ls replicapool/r12
test

@parth-gr parth-gr marked this pull request as ready for review September 11, 2024 14:34
@parth-gr parth-gr force-pushed the rbd-mirror-rados branch 2 times, most recently from d0fa69f to a22b0f8 Compare September 11, 2024 15:07
pkg/apis/ceph.rook.io/v1/types.go Outdated Show resolved Hide resolved
pkg/operator/ceph/pool/radosnamespace/controller.go Outdated Show resolved Hide resolved
}

// Schedule snapshots
if len(cephBlockPoolRadosNamespace.Spec.Mirroring.SnapshotSchedules) > 0 {
Member

I have concerns about how this conditional interacts with the EnableSnapshotSchedules() call to removeSnapshotSchedules(). It seems to me that if a user removes the last snapshot schedule, Rook might not remove it.

Generally, operators can avoid many bugs by always processing logic regardless of whether lists have items or not.
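A minimal sketch of that pattern (hypothetical helper, not the actual Rook controller code): diff the desired schedules against the current state unconditionally, so that removals still run when the desired list becomes empty.

```go
package main

import "fmt"

// reconcileSchedules compares the schedules in the CR spec (desired) with
// what currently exists in Ceph (current) and returns what to add and what
// to remove. The removal pass runs even when desired is empty, so deleting
// the last snapshot schedule from the CR still cleans it up.
func reconcileSchedules(desired, current []string) (toAdd, toRemove []string) {
	want := make(map[string]bool, len(desired))
	for _, d := range desired {
		want[d] = true
	}
	have := make(map[string]bool, len(current))
	for _, c := range current {
		have[c] = true
		if !want[c] {
			toRemove = append(toRemove, c)
		}
	}
	for _, d := range desired {
		if !have[d] {
			toAdd = append(toAdd, d)
		}
	}
	return toAdd, toRemove
}

func main() {
	// User removed the last schedule: desired is empty, removal still happens.
	add, remove := reconcileSchedules(nil, []string{"24h"})
	fmt.Println(add, remove) // prints: [] [24h]
}
```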

Member Author

Makes sense. We have `if c.poolSpec.Mirroring.SnapshotSchedulesEnabled() {` all over the code.

Should we refactor it?

pkg/daemon/ceph/client/radosnamespace.go Outdated Show resolved Hide resolved
pkg/operator/ceph/pool/radosnamespace/controller.go Outdated Show resolved Hide resolved
PendingReleaseNotes.md Outdated Show resolved Hide resolved
pkg/daemon/ceph/client/radosnamespace.go Outdated Show resolved Hide resolved
pkg/operator/ceph/pool/radosnamespace/controller.go Outdated Show resolved Hide resolved
@parth-gr
Copy link
Member Author

parth-gr commented Sep 12, 2024

Should we rename all the function parameters where poolName is written?
(example: `func GetPoolMirroringInfo(context *clusterd.Context, clusterInfo *ClusterInfo, poolName string) (*cephv1.PoolMirroringInfo, error) {`)

Currently the pool name or rados namespace name is passed to the existing functions through the `poolName` parameter.
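One lightweight alternative to renaming every parameter (a hypothetical helper for illustration, not existing Rook code) is to centralize the `pool/namespace` formatting and keep passing the result through the existing `poolName` parameters:

```go
package main

import "fmt"

// poolAndNamespace builds the "pool/namespace" string that rbd accepts
// wherever a bare pool name is expected. Hypothetical helper.
func poolAndNamespace(pool, namespace string) string {
	if namespace == "" {
		return pool
	}
	return fmt.Sprintf("%s/%s", pool, namespace)
}

func main() {
	fmt.Println(poolAndNamespace("replicapool", "r11")) // prints: replicapool/r11
	fmt.Println(poolAndNamespace("replicapool", ""))    // prints: replicapool
}
```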

pkg/daemon/ceph/client/radosnamespace.go Outdated Show resolved Hide resolved
pkg/operator/ceph/pool/radosnamespace/controller.go Outdated Show resolved Hide resolved
pkg/operator/ceph/pool/radosnamespace/controller.go Outdated Show resolved Hide resolved
@BlaineEXE
Member

Squash the commits?

Comment on lines 138 to 140
!!! note
And Disabling the cephblockpool namespace would require to disable mirroring on all the
rados namespace present underneath.
Member

I'm having trouble interpreting what this means, and I think users will also be confused. Can we add clarity by specifying the CRD names where appropriate? We could add field names also if needed.

I can make this suggestion, but TBH I am still confused about what the note means in terms of what actions I should take to disable mirroring.

Suggested change
!!! note
And Disabling the cephblockpool namespace would require to disable mirroring on all the
rados namespace present underneath.
!!! note
Disabling the CephBlockPool namespace requires disabling mirroring on all
CephBlockPoolRadosNamespaces present underneath.

What does "Disabling the CephBlockPool namespace" mean? Should this be "Disabling mirroring for the CephBlockPool"? Something else?

I am also wondering whether users should be informed of the inverse. This talks about disabling the feature, but what about enabling the feature? Are there considerations there that need to be noted?

Member Author

No, for enabling we don't need to specify it; that is not restricted or an exception.

Comment on lines 56 to 58
- `remoteNamespace`: Name of the rados namespace on the peer cluster where the namespace should get mirrored.
- `snapshotSchedules`: schedule(s) snapshot at the **rados namespace** level. One or more schedules are supported.
- `interval`: frequency of the snapshots. The interval can be specified in days, hours, or minutes using d, h, m suffix respectively.
- `startTime`: optional, determines at what time the snapshot process starts, specified using the ISO 8601 time format.
Member

Two things:

  1. An example interval and start time would be helpful for users.
  2. I don't see any note about required/optional on 3 of these fields, where that is noted above.

Member Author

An example interval and start time would be helpful for users.

I have explained it in the example, under ### Mirroring.

I don't see any note about required/optional on 3 of these fields, where that is noted above.

All fields are optional; I'm not sure we note that anywhere else.
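For reference, a schedule entry carrying both fields might look like this (values are purely illustrative; `startTime` uses the ISO 8601 time format mentioned above):

```yaml
snapshotSchedules:
  - interval: 24h            # one snapshot per day
    startTime: 14:00:00-05:00
```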

- `enabled`: whether mirroring is enabled on that rados namespace (default: false)
- `mode`: mirroring mode to run, possible values are "pool" or "image" (required). Refer to the [mirroring modes Ceph documentation](https://docs.ceph.com/docs/master/rbd/rbd-mirroring/#enable-mirroring) for more details
- `remoteNamespace`: Name of the rados namespace on the peer cluster where the namespace should get mirrored.
- `snapshotSchedules`: schedule(s) snapshot at the **rados namespace** level. One or more schedules are supported.
Member

Something that is probably important is to note that this is an array/list.

Documentation/CRDs/Block-Storage/ceph-rbd-mirror-crd.md Outdated Show resolved Hide resolved
deploy/examples/radosnamesapce-mirrored.yaml Outdated Show resolved Hide resolved
pkg/apis/ceph.rook.io/v1/types.go Show resolved Hide resolved
pkg/apis/ceph.rook.io/v1/types.go Outdated Show resolved Hide resolved
pkg/apis/ceph.rook.io/v1/types.go Outdated Show resolved Hide resolved
pkg/apis/ceph.rook.io/v1/types.go Outdated Show resolved Hide resolved
@parth-gr
Member Author

parth-gr commented Oct 3, 2024

Squash the commits?

Actually, if you go through the commit messages, I have kept:

  1. 1st commit as a base commit
  2. 2nd -> exception from Ceph for enable and disable, which can be reverted in future releases
  3. 3rd -> version check, easily reverted later when the min version is updated
  4. 4th -> unit tests

@BlaineEXE
Member

BlaineEXE commented Oct 3, 2024

Squash the commits?

Actually, if you go through the commit messages, I have kept: 1st commit as a base commit; 2nd -> exception from Ceph for enable and disable, which can be reverted in future releases; 3rd -> version check, easily reverted when the min version is updated; 4th -> unit tests

Unit tests should never be a separate commit from functionality. No exception.

IMO, commit 2 is sufficiently important for the correctness of commit 1, so it can be squashed.
IMO, commit 3 is also sufficiently important for the correctness of commits 1+2, so it can also be squashed.

Member

@travisn travisn left a comment

some final suggestions on docs and examples...

mode: image
# specify the schedule(s) on which snapshots should be taken
# snapshotSchedules:
# - interval: 24h # daily snapshots
Member

What is the default snapshot schedule? Is it daily like this example? If so, how about uncommenting it? Even though it's the default, it's more readable that way in the example.

Member Author

Not sure about the default

pkg/daemon/ceph/client/mirror.go Outdated Show resolved Hide resolved
Member

@travisn travisn left a comment

just one more small update

Member

@BlaineEXE BlaineEXE left a comment

Approving with a couple doc update requests

Documentation/CRDs/Block-Storage/ceph-block-pool-crd.md Outdated Show resolved Hide resolved
pkg/apis/ceph.rook.io/v1/types.go Outdated Show resolved Hide resolved
Modify the CR to allow mirroring of a rados namespace
to a differently named namespace on the remote cluster

1) enable rados namespace mirroring only
if the blockpool mirroring is enabled

2) disable blockpool mirroring only if
all the namespace mirroring is disabled

if rbd mirroring fails and the ceph version is not supported,
provide an error message with supported version details
and the reason for the failure

Signed-off-by: parth-gr <partharora1010@gmail.com>
Member

@Madhu-1 Madhu-1 left a comment

LGTM, small nits for better debugging

pkg/daemon/ceph/client/radosnamespace.go Show resolved Hide resolved
@parth-gr parth-gr requested a review from Madhu-1 October 10, 2024 09:45
@travisn travisn merged commit 0fa2196 into rook:master Oct 10, 2024
54 of 56 checks passed
6 participants