Problem with OSD Prepare #14754
@pomland-94, the issue template requests versions for Rook and Ceph, as well as other fields. Please fill in all fields so we can triage more effectively.
Is this a bug report or feature request?
Deviation from expected behavior:
Expected behavior:
How to reproduce it (minimal and precise):
File(s) to submit:
[2024-09-21 20:33:50,685][ceph_volume.util.disk][INFO ] opening device /dev/sdb to check for BlueStore label
[2024-09-21 20:33:50,686][ceph_volume.process][INFO ] Running command: /usr/sbin/udevadm info --query=property /dev/sdb
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout DEVPATH=/devices/pci0000:00/0000:00:02.5/0000:06:00.0/virtio4/host0/target0:0:0/0:0:0:2/block/sdb
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout DEVNAME=/dev/sdb
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout DEVTYPE=disk
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout DISKSEQ=12
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout MAJOR=8
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout MINOR=16
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout SUBSYSTEM=block
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout USEC_INITIALIZED=6312815107
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout ID_SCSI=1
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout ID_VENDOR=HC
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout ID_VENDOR_ENC=HC\x20\x20\x20\x20\x20\x20
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout ID_MODEL=Volume
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout ID_MODEL_ENC=Volume\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout ID_REVISION=2.5+
[2024-09-21 20:33:50,696][ceph_volume.process][INFO ] stdout ID_TYPE=disk
[2024-09-21 20:33:50,697][ceph_volume.process][INFO ] stdout ID_SERIAL=0HC_Volume_101330090
[2024-09-21 20:33:50,697][ceph_volume.process][INFO ] stdout ID_SERIAL_SHORT=101330090
[2024-09-21 20:33:50,697][ceph_volume.process][INFO ] stdout ID_SCSI_SERIAL=101330090
[2024-09-21 20:33:50,697][ceph_volume.process][INFO ] stdout ID_BUS=scsi
[2024-09-21 20:33:50,697][ceph_volume.process][INFO ] stdout ID_PATH=pci-0000:06:00.0-scsi-0:0:0:2
[2024-09-21 20:33:50,697][ceph_volume.process][INFO ] stdout ID_PATH_TAG=pci-0000_06_00_0-scsi-0_0_0_2
[2024-09-21 20:33:50,697][ceph_volume.process][INFO ] stdout DEVLINKS=/dev/disk/by-id/scsi-0HC_Volume_101330090 /dev/disk/by-path/pci-0000:06:00.0-scsi-0:0:0:2 /dev/disk/by-diskseq/12
[2024-09-21 20:33:50,697][ceph_volume.process][INFO ] stdout TAGS=:systemd:
[2024-09-21 20:33:50,697][ceph_volume.process][INFO ] stdout CURRENT_TAGS=:systemd:
[2024-09-21 20:33:50,697][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-bluestore-tool show-label --dev /dev/sdb
[2024-09-21 20:33:50,729][ceph_volume.process][INFO ] stderr unable to read label for /dev/sdb: (2) No such file or directory
[2024-09-21 20:33:50,730][ceph_volume.process][INFO ] stderr 2024-09-21T20:33:50.721+0000 ffffaa316040 -1 bluestore(/dev/sdb) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
[2024-09-21 20:33:50,730][ceph_volume.process][INFO ] Running command: /usr/sbin/blkid -c /dev/null -p /dev/sdb
[2024-09-21 20:33:50,755][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-authtool --gen-print-key
[2024-09-21 20:33:50,778][ceph_volume.process][INFO ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 8c3272ec-5983-4709-bcc7-69b83fa1bbc0
[2024-09-21 20:33:51,198][ceph_volume.process][INFO ] stdout 0
[2024-09-21 20:33:51,198][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-authtool --gen-print-key
[2024-09-21 20:33:51,229][ceph_volume.process][INFO ] Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
[2024-09-21 20:33:51,233][ceph_volume.util.system][INFO ] CEPH_VOLUME_SKIP_RESTORECON environ is set, will not call restorecon
[2024-09-21 20:33:51,234][ceph_volume.process][INFO ] Running command: /usr/bin/chown -R ceph:ceph /dev/sdb
[2024-09-21 20:33:51,239][ceph_volume.process][INFO ] Running command: /usr/bin/ln -s /dev/sdb /var/lib/ceph/osd/ceph-0/block
[2024-09-21 20:33:51,244][ceph_volume.process][INFO ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
[2024-09-21 20:33:51,678][ceph_volume.process][INFO ] stderr got monmap epoch 3
[2024-09-21 20:33:51,705][ceph_volume.util.prepare][INFO ] Creating keyring file for osd.0
[2024-09-21 20:33:51,705][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-0/keyring --create-keyring --name osd.0 --add-key AQCuLe9mQ6QxLhAA4cEJJ6+xclcEmE0vMc3TGA==
[2024-09-21 20:33:51,742][ceph_volume.process][INFO ] Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
[2024-09-21 20:33:51,747][ceph_volume.process][INFO ] Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
[2024-09-21 20:33:51,751][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 8c3272ec-5983-4709-bcc7-69b83fa1bbc0 --setuser ceph --setgroup ceph
[2024-09-21 20:33:51,785][ceph_volume.devices.raw.prepare][ERROR ] raw prepare was unable to complete
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 80, in safe_prepare
    self.prepare()
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 118, in prepare
    prepare_bluestore(
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 53, in prepare_bluestore
    prepare_utils.osd_mkfs_bluestore(
  File "/usr/lib/python3.9/site-packages/ceph_volume/util/prepare.py", line 459, in osd_mkfs_bluestore
    raise RuntimeError('Command failed with exit code %s: %s' % (returncode, ' '.join(command)))
RuntimeError: Command failed with exit code -11: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 8c3272ec-5983-4709-bcc7-69b83fa1bbc0 --setuser ceph --setgroup ceph
[2024-09-21 20:33:51,786][ceph_volume.devices.raw.prepare][INFO ] will rollback OSD ID creation
[2024-09-21 20:33:51,787][ceph_volume.process][INFO ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
[2024-09-21 20:33:52,173][ceph_volume.process][INFO ] stderr purged osd.0
[2024-09-21 20:33:52,199][ceph_volume.process][INFO ] Running command: /usr/bin/systemctl is-active ceph-osd@0
[2024-09-21 20:33:52,209][ceph_volume.process][INFO ] stderr System has not been booted with systemd as init system (PID 1). Can't operate.
[2024-09-21 20:33:52,209][ceph_volume.process][INFO ] stderr Failed to connect to bus: Host is down
[2024-09-21 20:33:52,214][ceph_volume.util.system][INFO ] Executable lvs found on the host, will use /sbin/lvs
[2024-09-21 20:33:52,214][ceph_volume.process][INFO ] Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S tags={ceph.osd_id=0} -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2024-09-21 20:33:52,289][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 80, in safe_prepare
    self.prepare()
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 118, in prepare
    prepare_bluestore(
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 53, in prepare_bluestore
    prepare_utils.osd_mkfs_bluestore(
  File "/usr/lib/python3.9/site-packages/ceph_volume/util/prepare.py", line 459, in osd_mkfs_bluestore
    raise RuntimeError('Command failed with exit code %s: %s' % (returncode, ' '.join(command)))
RuntimeError: Command failed with exit code -11: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 8c3272ec-5983-4709-bcc7-69b83fa1bbc0 --setuser ceph --setgroup ceph
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/main.py", line 32, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 160, in main
    self.safe_prepare(self.args)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 84, in safe_prepare
    rollback_osd(self.args, self.osd_id)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/common.py", line 35, in rollback_osd
    Zap(['--destroy', '--osd-id', osd_id]).main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 407, in main
    self.zap_osd()
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 305, in zap_osd
    devices = find_associated_devices(self.args.osd_id, self.args.osd_fsid)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 88, in find_associated_devices
    raise RuntimeError('Unable to find any LV for zapping OSD: '
RuntimeError: Unable to find any LV for zapping OSD: 0
2024-09-21 20:33:52.371758 C | rookcmd: failed to configure devices: failed to initialize osd: failed to run ceph-volume raw command.
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 8c3272ec-5983-4709-bcc7-69b83fa1bbc0
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/chown -R ceph:ceph /dev/sdb
Running command: /usr/bin/ln -s /dev/sdb /var/lib/ceph/osd/ceph-0/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
stderr: got monmap epoch 3
--> Creating keyring file for osd.0
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 8c3272ec-5983-4709-bcc7-69b83fa1bbc0 --setuser ceph --setgroup ceph
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
stderr: purged osd.0
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 80, in safe_prepare
    self.prepare()
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 118, in prepare
    prepare_bluestore(
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 53, in prepare_bluestore
    prepare_utils.osd_mkfs_bluestore(
  File "/usr/lib/python3.9/site-packages/ceph_volume/util/prepare.py", line 459, in osd_mkfs_bluestore
    raise RuntimeError('Command failed with exit code %s: %s' % (returncode, ' '.join(command)))
RuntimeError: Command failed with exit code -11: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 8c3272ec-5983-4709-bcc7-69b83fa1bbc0 --setuser ceph --setgroup ceph
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/sbin/ceph-volume", line 33, in <module>
    sys.exit(load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')())
  File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 41, in __init__
    self.main(self.argv)
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/main.py", line 32, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 160, in main
    self.safe_prepare(self.args)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/raw/prepare.py", line 84, in safe_prepare
    rollback_osd(self.args, self.osd_id)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/common.py", line 35, in rollback_osd
    Zap(['--destroy', '--osd-id', osd_id]).main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 407, in main
    self.zap_osd()
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 305, in zap_osd
    devices = find_associated_devices(self.args.osd_id, self.args.osd_fsid)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 88, in find_associated_devices
    raise RuntimeError('Unable to find any LV for zapping OSD: '
RuntimeError: Unable to find any LV for zapping OSD: 0: exit status 1
Environment:
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Linux admingw-01 6.1.0-25-arm64 #1 SMP Debian 6.1.106-3 (2024-08-26) aarch64 GNU/Linux
Client Version: v1.30.1
Server Version: v1.30.1
We don't have any known Ceph/Rook issues with the current 1.15 manifests. The error code being returned is -11, which is signal 11 (SIGSEGV). Check to make sure that the disk is clean and not in use.
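For context on that number: ceph-volume runs ceph-osd as a subprocess, and a return code of -11 means the child process was killed by signal 11 (SIGSEGV) rather than exiting with an error code of its own. A minimal shell sketch of the same encoding, using a throwaway sleep process as a stand-in for the crashing ceph-osd:

    # Hypothetical demo: kill a throwaway process with SIGSEGV and inspect its exit status.
    sleep 60 &
    kill -SEGV $!
    wait $!
    echo $?    # prints 139, i.e. 128 + 11: the process died on signal 11 (SIGSEGV)

Python's subprocess module, which ceph-volume is built on, reports the same condition as returncode -11, which is the value quoted in the tracebacks above; it indicates that ceph-osd crashed rather than failed with its own error.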
My disk is completely cleaned with sgdisk --zap-all /dev/sdb and is not in use. Here are my operator logs:
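As an aside for anyone debugging the same symptom: the commands below are one way to confirm that a device really carries no leftover signatures before the prepare job runs. They are generic cleanup/inspection commands (destructive to the named disk), not taken from this issue, and the dd step mirrors the wipe that Rook's cleanup docs suggest; adjust the device name to your own.

    DISK=/dev/sdb                                   # the device handed to Rook; everything on it is destroyed
    sgdisk --zap-all "$DISK"                        # clear GPT/MBR structures
    wipefs -a "$DISK"                               # clear remaining filesystem/LVM signatures
    dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync   # zero the first 100 MiB
    lsblk -f "$DISK"                                # should show no FSTYPE and no child partitions
    ceph-bluestore-tool show-label --dev "$DISK"    # should fail with "unable to read label", as in the log above

(ceph-bluestore-tool is typically only available inside the Ceph containers, e.g. the toolbox pod, rather than on the host.)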
OK, I tried the Rook installation from an older release, version 1.14.5, and everything works fine, so this seems to be a problem with version 1.15 that I have to check. I will set up some test clusters and go through it version by version; when I have any news I will comment here.
I can confirm encountering a similar error, but when installing on top of PVCs. We install the Rook operator via Helm. We install on top of PVCs, so the underlying storage is certain to be clean. Our Helm answers for installing the Ceph cluster are:
Note: the small sizes in the above are because we are trying to quickly iterate to discover which version of Rook introduced the failure. The method to reproduce is:
to install the operator, and then
to install the Ceph cluster. We have been testing in VMs to quickly iterate over Rook versions. It works fine for 1.12.1, 1.13.8, 1.14.0, 1.14.6, 1.14.7, and 1.14.8, but fails on 1.14.9, 1.14.11, 1.15.1, and 1.15.2 (those are the versions we tested). I have added the complete output of the OSD prepare pod as rook-ceph-osd-prepare-set1-data-0dnk6q-vj9n8.log. I find this part interesting:
because earlier in the log, it finds
There are very similar, if not identical, logs in 1.14.11, 1.15.1, and 1.15.2. An example of a working setup (from 1.14.0) can be found in:
Are these similar enough to belong in this same issue, or should I open a new issue?
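(For readers following along: the Helm commands elided in the comment above would look roughly like the documented install below. The chart repository and release names come from the standard Rook Helm docs; the version and values file are placeholders, not the reporter's exact configuration.)

    # Generic sketch of the documented Helm install, not the reporter's exact commands.
    helm repo add rook-release https://charts.rook.io/release
    helm install --create-namespace -n rook-ceph rook-ceph rook-release/rook-ceph --version v1.15.2   # swap in whichever Rook version you are testing
    # ...and then the cluster chart, with your own values/answers file:
    helm install -n rook-ceph rook-ceph-cluster rook-release/rook-ceph-cluster --version v1.15.2 -f my-values.yaml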
When I use Rook version 1.14.8 everything works fine; when I use Rook version 1.14.9 or above I get the errors inside the OSD prepare pods. So it seems there is a change in version 1.14.9 that is responsible for the problem; the question is which change, and why.
I can confirm that changing my Helm answers so that Rook deploys Ceph v18.2.2 works as expected with the latest version of the Rook operator.
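For anyone wanting to try the same workaround, pinning the Ceph image in the CephCluster spec is the usual way to do it. The snippets below are a sketch that assumes the standard rook-ceph-cluster chart layout and the default CephCluster name rook-ceph; v18.2.2 is simply the version reported to work here. Note that for an already-running cluster this amounts to a Ceph downgrade, which is generally unsupported, so it is mainly useful on a fresh install.

    # Via the cluster chart values (assumes the standard rook-ceph-cluster chart):
    helm upgrade -n rook-ceph rook-ceph-cluster rook-release/rook-ceph-cluster \
      --reuse-values --set cephClusterSpec.cephVersion.image=quay.io/ceph/ceph:v18.2.2
    # Or by patching an existing CephCluster resource directly:
    kubectl -n rook-ceph patch cephcluster rook-ceph --type merge \
      -p '{"spec":{"cephVersion":{"image":"quay.io/ceph/ceph:v18.2.2"}}}'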
When I try to install Rook Ceph as described in the Quickstart guide, I get an error when it prepares the OSDs. None of the config files (operator.yaml, crd.yaml, common.yaml) were modified.
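For reference, the Quickstart steps being followed are roughly the ones below; the file names are the defaults shipped in the Rook repository's deploy/examples directory, not a record of the exact commands used here.

    # Default Quickstart flow from deploy/examples.
    cd rook/deploy/examples
    kubectl create -f crds.yaml -f common.yaml -f operator.yaml
    kubectl create -f cluster.yaml
    kubectl -n rook-ceph get pods    # the failing rook-ceph-osd-prepare-* pods show up here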
I use Kubernetes 1.30.4 on Debian 12 (ARM64); these are the pod logs from one of the OSD prepare pods: