Problem with OSD Prepare #14754

Open
pomland-94 opened this issue Sep 22, 2024 · 9 comments
@pomland-94

When I try to install Rook Ceph as described in the Quickstart Guide, I get an error while the OSDs are being prepared. None of the config files (operator.yaml, crd.yaml, common.yaml) were modified.

I use Kubernetes 1.30.4 on Debian 12 (ARM64). These are the pod logs from one of the osd-prepare pods:

[2024-09-21 20:33:50,685][ceph_volume.util.disk][INFO  ] opening device /dev/sdb to check for BlueStore label
[2024-09-21 20:33:50,686][ceph_volume.process][INFO  ] Running command: /usr/sbin/udevadm info --query=property /dev/sdb
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout DEVPATH=/devices/pci0000:00/0000:00:02.5/0000:06:00.0/virtio4/host0/target0:0:0/0:0:0:2/block/sdb
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout DEVNAME=/dev/sdb
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout DEVTYPE=disk
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout DISKSEQ=12
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout MAJOR=8
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout MINOR=16
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout SUBSYSTEM=block
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout USEC_INITIALIZED=6312815107
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout ID_SCSI=1
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout ID_VENDOR=HC
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout ID_VENDOR_ENC=HC\x20\x20\x20\x20\x20\x20
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout ID_MODEL=Volume
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout ID_MODEL_ENC=Volume\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout ID_REVISION=2.5+
[2024-09-21 20:33:50,696][ceph_volume.process][INFO  ] stdout ID_TYPE=disk
[2024-09-21 20:33:50,697][ceph_volume.process][INFO  ] stdout ID_SERIAL=0HC_Volume_101330090
[2024-09-21 20:33:50,697][ceph_volume.process][INFO  ] stdout ID_SERIAL_SHORT=101330090
[2024-09-21 20:33:50,697][ceph_volume.process][INFO  ] stdout ID_SCSI_SERIAL=101330090
[2024-09-21 20:33:50,697][ceph_volume.process][INFO  ] stdout ID_BUS=scsi
[2024-09-21 20:33:50,697][ceph_volume.process][INFO  ] stdout ID_PATH=pci-0000:06:00.0-scsi-0:0:0:2
[2024-09-21 20:33:50,697][ceph_volume.process][INFO  ] stdout ID_PATH_TAG=pci-0000_06_00_0-scsi-0_0_0_2
[2024-09-21 20:33:50,697][ceph_volume.process][INFO  ] stdout DEVLINKS=/dev/disk/by-id/scsi-0HC_Volume_101330090 /dev/disk/by-path/pci-0000:06:00.0-scsi-0:0:0:2 /dev/disk/by-diskseq/12
[2024-09-21 20:33:50,697][ceph_volume.process][INFO  ] stdout TAGS=:systemd:
[2024-09-21 20:33:50,697][ceph_volume.process][INFO  ] stdout CURRENT_TAGS=:systemd:
[2024-09-21 20:33:50,697][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-bluestore-tool show-label --dev /dev/sdb
[2024-09-21 20:33:50,729][ceph_volume.process][INFO  ] stderr unable to read label for /dev/sdb: (2) No such file or directory
[2024-09-21 20:33:50,730][ceph_volume.process][INFO  ] stderr 2024-09-21T20:33:50.721+0000 ffffaa316040 -1 bluestore(/dev/sdb) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
[2024-09-21 20:33:50,730][ceph_volume.process][INFO  ] Running command: /usr/sbin/blkid -c /dev/null -p /dev/sdb
[2024-09-21 20:33:50,755][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-authtool --gen-print-key
[2024-09-21 20:33:50,778][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 8c3272ec-5983-4709-bcc7-69b83fa1bbc0
[2024-09-21 20:33:51,198][ceph_volume.process][INFO  ] stdout 0
[2024-09-21 20:33:51,198][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-authtool --gen-print-key
[2024-09-21 20:33:51,229][ceph_volume.process][INFO  ] Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
[2024-09-21 20:33:51,233][ceph_volume.util.system][INFO  ] CEPH_VOLUME_SKIP_RESTORECON environ is set, will not call restorecon
[2024-09-21 20:33:51,234][ceph_volume.process][INFO  ] Running command: /usr/bin/chown -R ceph:ceph /dev/sdb
[2024-09-21 20:33:51,239][ceph_volume.process][INFO  ] Running command: /usr/bin/ln -s /dev/sdb /var/lib/ceph/osd/ceph-0/block
[2024-09-21 20:33:51,244][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
[2024-09-21 20:33:51,678][ceph_volume.process][INFO  ] stderr got monmap epoch 3
[2024-09-21 20:33:51,705][ceph_volume.util.prepare][INFO  ] Creating keyring file for osd.0
[2024-09-21 20:33:51,705][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-0/keyring --create-keyring --name osd.0 --add-key AQCuLe9mQ6QxLhAA4cEJJ6+xclcEmE0vMc3TGA==
[2024-09-21 20:33:51,742][ceph_volume.process][INFO  ] Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
[2024-09-21 20:33:51,747][ceph_volume.process][INFO  ] Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
[2024-09-21 20:33:51,751][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 8c3272ec-5983-4709-bcc7-69b83fa1bbc0 --setuser ceph --setgroup ceph
[2024-09-21 20:33:51,785][ceph_volume.devices.raw.prepare][ERROR ] raw prepare was unable to complete
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 80, in safe_prepare
    self.prepare()
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 118, in prepare
    prepare_bluestore(
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 53, in prepare_bluestore
    prepare_utils.osd_mkfs_bluestore(
  File "/usr/lib/python3.9/site-packages/ceph_volume/util/prepare.py", line 459, in osd_mkfs_bluestore
    raise RuntimeError('Command failed with exit code %s: %s' % (returncode, ' '.join(command)))
RuntimeError: Command failed with exit code -11: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 8c3272ec-5983-4709-bcc7-69b83fa1bbc0 --setuser ceph --setgroup ceph
[2024-09-21 20:33:51,786][ceph_volume.devices.raw.prepare][INFO  ] will rollback OSD ID creation
[2024-09-21 20:33:51,787][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
[2024-09-21 20:33:52,173][ceph_volume.process][INFO  ] stderr purged osd.0
[2024-09-21 20:33:52,199][ceph_volume.process][INFO  ] Running command: /usr/bin/systemctl is-active ceph-osd@0
[2024-09-21 20:33:52,209][ceph_volume.process][INFO  ] stderr System has not been booted with systemd as init system (PID 1). Can't operate.
[2024-09-21 20:33:52,209][ceph_volume.process][INFO  ] stderr Failed to connect to bus: Host is down
[2024-09-21 20:33:52,214][ceph_volume.util.system][INFO  ] Executable lvs found on the host, will use /sbin/lvs
[2024-09-21 20:33:52,214][ceph_volume.process][INFO  ] Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S tags={ceph.osd_id=0} -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2024-09-21 20:33:52,289][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 80, in safe_prepare
    self.prepare()
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 118, in prepare
    prepare_bluestore(
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 53, in prepare_bluestore
    prepare_utils.osd_mkfs_bluestore(
  File "/usr/lib/python3.9/site-packages/ceph_volume/util/prepare.py", line 459, in osd_mkfs_bluestore
    raise RuntimeError('Command failed with exit code %s: %s' % (returncode, ' '.join(command)))
RuntimeError: Command failed with exit code -11: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 8c3272ec-5983-4709-bcc7-69b83fa1bbc0 --setuser ceph --setgroup ceph

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/main.py", line 32, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 160, in main
    self.safe_prepare(self.args)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 84, in safe_prepare
    rollback_osd(self.args, self.osd_id)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/common.py", line 35, in rollback_osd
    Zap(['--destroy', '--osd-id', osd_id]).main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 407, in main
    self.zap_osd()
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 305, in zap_osd
    devices = find_associated_devices(self.args.osd_id, self.args.osd_fsid)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 88, in find_associated_devices
    raise RuntimeError('Unable to find any LV for zapping OSD: '
RuntimeError: Unable to find any LV for zapping OSD: 0
2024-09-21 20:33:52.371758 C | rookcmd: failed to configure devices: failed to initialize osd: failed to run ceph-volume raw command. Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 8c3272ec-5983-4709-bcc7-69b83fa1bbc0
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/chown -R ceph:ceph /dev/sdb
Running command: /usr/bin/ln -s /dev/sdb /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
 stderr: got monmap epoch 3
--> Creating keyring file for osd.0
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 8c3272ec-5983-4709-bcc7-69b83fa1bbc0 --setuser ceph --setgroup ceph
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
 stderr: purged osd.0
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 80, in safe_prepare
    self.prepare()
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 118, in prepare
    prepare_bluestore(
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 53, in prepare_bluestore
    prepare_utils.osd_mkfs_bluestore(
  File "/usr/lib/python3.9/site-packages/ceph_volume/util/prepare.py", line 459, in osd_mkfs_bluestore
    raise RuntimeError('Command failed with exit code %s: %s' % (returncode, ' '.join(command)))
RuntimeError: Command failed with exit code -11: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 8c3272ec-5983-4709-bcc7-69b83fa1bbc0 --setuser ceph --setgroup ceph

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/sbin/ceph-volume", line 33, in <module>
    sys.exit(load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')())
  File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 41, in __init__
    self.main(self.argv)
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/main.py", line 32, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 160, in main
    self.safe_prepare(self.args)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 84, in safe_prepare
    rollback_osd(self.args, self.osd_id)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/common.py", line 35, in rollback_osd
    Zap(['--destroy', '--osd-id', osd_id]).main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 407, in main
    self.zap_osd()
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 305, in zap_osd
    devices = find_associated_devices(self.args.osd_id, self.args.osd_fsid)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 88, in find_associated_devices
    raise RuntimeError('Unable to find any LV for zapping OSD: '
RuntimeError: Unable to find any LV for zapping OSD: 0: exit status 1
pomland-94 added the bug label Sep 22, 2024
@BlaineEXE
Member

@pomland-94 the issue template requests versions for Rook and Ceph as well as other fields. Please fill in all fields so we can help triage more effectively.

<!-- **Are you in the right place?**
1. For issues or feature requests, please create an issue in this repository.
2. For general technical and non-technical questions, we are happy to help you on our [Rook.io Slack](https://slack.rook.io/).
3. Did you already search the existing open issues for anything similar? -->

**Is this a bug report or feature request?**
* Bug Report

**Deviation from expected behavior:**

**Expected behavior:**

**How to reproduce it (minimal and precise):**
<!-- Please let us know any circumstances for reproduction of your bug. -->

**File(s) to submit**:

* Cluster CR (custom resource), typically called `cluster.yaml`, if necessary

**Logs to submit**:

* Operator's logs, if necessary
* Crashing pod(s) logs, if necessary

  To get logs, use `kubectl -n <namespace> logs <pod name>`
  When pasting logs, always surround them with backticks or use the `insert code` button from the Github UI.
  Read [GitHub documentation if you need help](https://help.github.com/en/articles/creating-and-highlighting-code-blocks).

**Cluster Status to submit**:

* Output of kubectl commands, if necessary

  To get the health of the cluster, use `kubectl rook-ceph health`
  To get the status of the cluster, use `kubectl rook-ceph ceph status`
  For more details, see the [Rook kubectl Plugin](https://rook.io/docs/rook/latest-release/Troubleshooting/kubectl-plugin)

**Environment**:
* OS (e.g. from /etc/os-release):
* Kernel (e.g. `uname -a`):
* Cloud provider or hardware configuration:
* Rook version (use `rook version` inside of a Rook Pod):
* Storage backend version (e.g. for ceph do `ceph -v`):
* Kubernetes version (use `kubectl version`):
* Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
* Storage backend status (e.g. for Ceph use `ceph health` in the [Rook Ceph toolbox](https://rook.io/docs/rook/latest-release/Troubleshooting/ceph-toolbox/#interactive-toolbox)):

@pomland-94
Author

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

Expected behavior:
OSDs should be provisioned without error

How to reproduce it (minimal and precise):
I do exactly what the Quickstart Guide says: https://rook.io/docs/rook/latest-release/Getting-Started/quickstart/
I have not changed any configs/deployment files.
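
For reference, a minimal sketch of the quickstart flow being followed (file names as in the Rook repo's deploy/examples directory for the release under test; this is a paraphrase of the guide, not a verbatim copy):

kubectl create -f crds.yaml -f common.yaml -f operator.yaml
kubectl create -f cluster.yaml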

File(s) to submit:

(Same osd-prepare pod log as in the issue description above.)

Environment:

  • OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Kernel (e.g. uname -a):
Linux admingw-01 6.1.0-25-arm64 #1 SMP Debian 6.1.106-3 (2024-08-26) aarch64 GNU/Linux
  • Cloud provider or hardware configuration: On Premise Hardware

  • Rook version (use rook version inside of a Rook Pod):

  • Storage backend version (e.g. for ceph do ceph -v):

  • Kubernetes version (use kubectl version):

Client Version: v1.30.1
Server Version: v1.30.1
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): On Premise, Plain Kubernetes
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):

@BlaineEXE
Member

BlaineEXE commented Sep 25, 2024

We don't have any known ceph/rook issues with the current 1.15 manifests. The error code being returned is 11, which is EAGAIN (try again). For disk operations, this usually means that the disk is in use by something, possibly a filesystem, or that the disk has some internal error.

Check to make sure that /dev/sdb is wiped with sgdisk --zap or an equivalent, and that it isn't mounted to fstab or anything like that.
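
As a minimal sketch of that check on the node (assuming /dev/sdb as in the log above; these commands destroy all data on the disk):

lsblk -f /dev/sdb               # confirm there is no filesystem or partition signature on the device
findmnt --source /dev/sdb       # confirm nothing has it mounted
sgdisk --zap-all /dev/sdb       # wipe GPT/MBR structures
wipefs --all /dev/sdb           # wipe remaining filesystem signatures
dd if=/dev/zero of=/dev/sdb bs=1M count=100 oflag=direct,dsync   # zero the start of the disk, including any stale BlueStore label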

@pomland-94
Author

My disks are completely cleaned with sgdisk --zap-all /dev/sdb and are not in use.

Here are my Operator logs:
https://gist.github.com/pomland-94/f9a9e2f6ba6477796c797d6eef9f66fc

@pomland-94
Author

Ok, I tried the Rook installation with an older release, version 1.14.5, and everything works fine, so this seems to be a problem with version 1.15 that I have to check.

I will set up some test clusters and go through it version by version; when I have any news I will comment here.

@rajha-korithrien

rajha-korithrien commented Sep 26, 2024

I can confirm encountering a similar error, but when installing on top of PVC.

We install the Rook operator via Helm.
We install the Ceph cluster via Helm.

We install on top of PVC so the underlying storage is certain to be clean.

Our helm answers for installing the ceph cluster are:

cephClusterSpec:
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
    volumeClaimTemplate:
      spec:
        storageClassName: openebs-lvm
        resources:
          requests:
            storage: 1Gi
  storage:
    storageClassDeviceSets:
    - name: set1
      count: 3
      portable: false
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 2Gi
          storageClassName: openebs-lvm
          volumeMode: Block
          accessModes:
            - ReadWriteOnce
    onlyApplyOSDPlacement: false
cephBlockPools:
  - name: cephblock0
    spec:
      replicated:
        size: 3
    storageClass:
      isDefault: false
cephFileSystems: []
cephObjectStores: []

Note: the small sizes above are because we are trying to iterate quickly to discover which version of Rook introduced the failure.

The method to reproduce is:

helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph --version v1.12.1

to install the operator, and then

helm install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster -f rook-ceph-cluster-answers.yaml --version v1.12.1

to install the Ceph cluster. We have been testing in VMs so we can iterate quickly over Rook versions.
Between installs we delete /var/lib/rook on each node and also remove the CRDs the Helm chart installs.
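
A rough sketch of that cleanup (release names as in the install commands above; the CRD filter assumes the ceph.rook.io API group used by Rook's CRDs):

helm -n rook-ceph uninstall rook-ceph-cluster
helm -n rook-ceph uninstall rook-ceph
kubectl get crd -o name | grep ceph.rook.io | xargs -r kubectl delete
rm -rf /var/lib/rook    # on every node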

It works fine for 1.12.1, 1.13.8, 1.14.0, 1.14.6, 1.14.7, and 1.14.8, but fails on 1.14.9, 1.14.11, 1.15.1, and 1.15.2 (those are the versions we tested).

I have added the complete output of the OSD prepare pod from 1.14.11. This is the prepare pod that runs against the "set" specified in the Ceph cluster, for example rook-ceph-osd-prepare-set1-data-0dnk6q-vj9n8.

rook-ceph-osd-prepare-set1-data-0dnk6q-vj9n8.log

I find this part interesting:

[2024-09-25 22:59:35,711][ceph_volume.util.disk][INFO  ] opening device /mnt/set1-data-0dnk6q to check for BlueStore label
[2024-09-25 22:59:35,711][ceph_volume.process][INFO  ] Running command: /usr/sbin/udevadm info --query=property /mnt/set1-data-0dnk6q
[2024-09-25 22:59:35,715][ceph_volume.process][INFO  ] stderr Unknown device "/mnt/set1-data-0dnk6q": No such device
[2024-09-25 22:59:35,716][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-bluestore-tool show-label --dev /mnt/set1-data-0dnk6q
[2024-09-25 22:59:35,739][ceph_volume.process][INFO  ] stderr unable to read label for /mnt/set1-data-0dnk6q: (2) No such file or directory
[2024-09-25 22:59:35,739][ceph_volume.process][INFO  ] Running command: /usr/sbin/blkid -c /dev/null -p /mnt/set1-data-0dnk6q

because earlier in the log it finds /mnt/set1-data-0dnk6q correctly, i.e.:

2024-09-25 23:01:07.227149 D | exec: Running command: lsblk /mnt/set1-data-0dnk6q --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2024-09-25 23:01:07.228585 D | sys: lsblk output: "SIZE=\"2147483648\" ROTA=\"0\" RO=\"0\" TYPE=\"lvm\" PKNAME=\"\" NAME=\"/dev/mapper/k8s-pvc--e77d9e81--cae7--4815--a795--14821dbd8f5b\" KNAME=\"/dev/dm-5\" MOUNTPOINT=\"\" FSTYPE=\"\""
2024-09-25 23:01:07.228604 D | exec: Running command: udevadm info --query=property /dev/dm-5

There are very similar, if not identical, logs in 1.14.11, 1.15.1, and 1.15.2.

An example of a working setup (from 1.14.0) can be found in:
rook-ceph-osd-prepare-set1-data-0qtkh8-zpnd4.log

Are these similar enough to be in this same issue, or should I open a new issue?

@pomland-94
Author

When I use Rook version 1.14.8 everything works fine; when I use Rook version 1.14.9 and above I get the errors inside the OSD prepare pods. So it seems that there is a change in version 1.14.9 that is responsible for the problem; the question is what change, and why.

@travisn
Member

travisn commented Sep 27, 2024

I see now that you're running on arm64. There is a known issue with Ceph v18.2.4 on arm64 (see #14502), and Rook enabled v18.2.4 by default in v1.14.9. You'll need to use Ceph v18.2.2 until that is resolved.
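
For the quickstart path, a minimal sketch of pinning the Ceph image is to set spec.cephVersion.image to quay.io/ceph/ceph:v18.2.2 in cluster.yaml before creating the cluster, or to patch an already-created cluster (assuming the default CephCluster name and namespace rook-ceph from the quickstart manifests):

kubectl -n rook-ceph patch cephcluster rook-ceph --type merge \
  -p '{"spec":{"cephVersion":{"image":"quay.io/ceph/ceph:v18.2.2"}}}'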

@rajha-korithrien

I can confirm that changing my helm answers to cause Rook to deploy Ceph v18.2.2 works as expected with the latest version of the Rook Operator.
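
For the Helm-based install above, the equivalent is overriding the image in the cluster chart values; a sketch, assuming the cephClusterSpec layout shown earlier in this thread:

helm upgrade --namespace rook-ceph rook-ceph-cluster rook-release/rook-ceph-cluster \
  --reuse-values --set cephClusterSpec.cephVersion.image=quay.io/ceph/ceph:v18.2.2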
