ceph-exporters report wrong metrics on two rook/ceph clusters in one k8s environment #14790
Comments
@avanthakkar please take a look into this. This is an issue for multi-cluster users. I wonder if this is a limitation of the exporter itself, or just in how it is configured for Rook.
@llamerada-jp is the issue resolved if the 2 clusters use different `dataDirHostPath` values?
The exporter host path is dependent on the `dataDirHostPath` of the cluster.
@BlaineEXE @travisn
Our clusters are PVC-based clusters. Then rook doesn't use `dataDirHostPath` for OSD data.
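For illustration, a minimal sketch of the workaround the commits below describe: give each CephCluster its own `dataDirHostPath`. The namespaces are taken from this report; the paths and omitted settings are assumptions, not a definitive configuration.

```yaml
# Two CephCluster resources with distinct dataDirHostPath values, so the
# exporter socket directory (<dataDirHostPath>/exporter) is not shared
# between the clusters on a node.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: ceph-canary-block
  namespace: ceph-canary-block
spec:
  dataDirHostPath: /var/lib/rook/ceph-canary-block   # assumed path
  # ... remaining cluster settings ...
---
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: ceph-canary-object-store
  namespace: ceph-canary-object-store
spec:
  dataDirHostPath: /var/lib/rook/ceph-canary-object-store   # assumed path
  # ... remaining cluster settings ...
```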
If there are multiple clusters, `dataDirHostPath` must be unique for each cluster. However, it's not documented yet. related issue: rook#14790 Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
If there are multiple clusters, `dataDirHostPath` must be unique for each cluster. However, it's not documented yet. In addition, `ceph-teardown.md` was also updated because the original document assumes that `dataDirHostPath` is `/var/lib/rook` and the files for a cluster `rook-ceph` is under `/var/lib/rook/rook-ceph`. related issue: rook#14790 Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
If there are multiple clusters, `dataDirHostPath` must be unique for each cluster. However, it's not documented yet. In addition, `ceph-teardown.md` was also updated because the original document assumes that `dataDirHostPath` is `/var/lib/rook` and the files for a cluster `rook-ceph` is under `/var/lib/rook/rook-ceph`. related issue: #14790 Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com> (cherry picked from commit 62d9933)
Is this a bug report or feature request?
Bug Report
Deviation from expected behavior:
I found that metrics from ceph-exporters are wrong in an environment with two rook/ceph clusters. Both ceph-exporters report metrics of OSDs belonging to both rook/ceph clusters. I guess it's because the hostPath, `/var/lib/rook/exporter`, is shared by all clusters.

Expected behavior:
A ceph-exporter pod only reports metrics of the cluster to which the pod belongs.
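One way to check this guess is to compare the hostPath volumes mounted by the exporter pods in the two namespaces. A sketch, assuming the exporter pods carry the `app=rook-ceph-exporter` label (the label is an assumption; the namespaces are from this report):

```console
# Print the volumes of one ceph-exporter pod per cluster namespace.
# With a shared dataDirHostPath, both commands show the same
# hostPath, /var/lib/rook/exporter.
$ kubectl -n ceph-canary-block get pod -l app=rook-ceph-exporter \
    -o jsonpath='{.items[0].spec.volumes}'
$ kubectl -n ceph-canary-object-store get pod -l app=rook-ceph-exporter \
    -o jsonpath='{.items[0].spec.volumes}'
```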
How to reproduce it (minimal and precise):
File(s) to submit:
Logs to submit:
I attached the screenshot of Grafana which shows the metric `ceph_osd_stat_bytes_used`. I have two clusters, `ceph-canary-object-store` and `ceph-canary-block`; osd.2 of both clusters exists on the same node. A query, `ceph_osd_stat_bytes_used{namespace="ceph-canary-object-store", ceph_daemon="osd.2"}`, reports two metrics, corresponding to the green line and the yellow line. However, only one line should be shown here because ceph-canary-object-store has only one "osd.2". The yellow line corresponds to the metrics of osd.2 of ceph-canary-block. The fact that the yellow line matches the blue line, which corresponds to `ceph_osd_stat_bytes_used{namespace="ceph-canary-block", ceph_daemon="osd.2"}`, implies this behavior.

Cluster Status to submit:

The `ceph health` command indicates `HEALTH_OK`.

Environment:
OS: Flatcar Container Linux 3815.2.5
Kernel: Linux 6.1.96-flatcar
Cloud provider or hardware configuration: On-premise hardware
Rook version (use `rook version` inside of a Rook Pod):
2024/10/01 08:16:59 maxprocs: Leaving GOMAXPROCS=56: CPU quota undefined
rook: v1.15.1
go: go1.22.7
Storage backend version (e.g. for ceph do `ceph -v`): ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)
Kubernetes version (use `kubectl version`):
Client Version: v1.29.7
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.7
Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): On-premise [neco](https://github.com/cybozu-go/neco)
Storage backend status (e.g. for Ceph use `ceph health` in the [Rook Ceph toolbox](https://rook.io/docs/rook/latest-release/Troubleshooting/ceph-toolbox/#interactive-toolbox)): HEALTH_OK