
ceph-exporters are wrong on two rook/ceph clusters in one k8s environment #14790

Closed
llamerada-jp opened this issue Oct 2, 2024 · 4 comments

@llamerada-jp
Contributor

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

I found that the metrics reported by ceph-exporter are wrong in an environment with two rook/ceph clusters. Each cluster's ceph-exporter reports metrics for OSDs belonging to both rook/ceph clusters.
I guess this is because the hostPath, /var/lib/rook/exporter, is shared by all clusters, as sketched below.
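For illustration only (not taken from this issue or from Rook's generated manifests), the relevant part of an exporter pod spec would look roughly like this when both clusters use the default dataDirHostPath; the volume name is a placeholder:

```yaml
# Hypothetical excerpt of an exporter pod spec. With the default
# dataDirHostPath (/var/lib/rook), this hostPath is identical for the
# exporter pods of both clusters, so each exporter can also see the
# admin sockets written by the other cluster's daemons.
volumes:
  - name: ceph-daemons-sock-dir        # illustrative name
    hostPath:
      path: /var/lib/rook/exporter     # same host directory for cluster A and cluster B
      type: DirectoryOrCreate
```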

Expected behavior:

A ceph-exporter pod should report only the metrics of the cluster to which the pod belongs.

How to reproduce it (minimal and precise):

  1. Create two rook/ceph clusters, Cluster A and Cluster B, with Ceph version >= 18 so that ceph-exporter is enabled (a rough config sketch follows this list).
  2. Deploy at least one OSD for each cluster on the same node. Assume OSD A belongs to Cluster A and OSD B belongs to Cluster B.
  3. Scrape OSD metrics from the ceph-exporter of both clusters.
  4. Check the OSD metrics of both clusters. Both clusters report the metrics of OSD A and OSD B.
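A rough sketch of two such cluster definitions (purely illustrative; names, namespaces, and image tags are placeholders, and the storage section is simplified):

```yaml
# Hypothetical repro sketch: two CephCluster CRs in different namespaces that
# both leave dataDirHostPath at the same default, so their exporters end up
# sharing /var/lib/rook/exporter on every node they run on.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: cluster-a
  namespace: cluster-a
spec:
  dataDirHostPath: /var/lib/rook        # same path as cluster-b below
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.4    # Ceph >= 18 so ceph-exporter is deployed
  storage:
    useAllNodes: true
    useAllDevices: true
---
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: cluster-b
  namespace: cluster-b
spec:
  dataDirHostPath: /var/lib/rook        # shared with cluster-a
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.4
  storage:
    useAllNodes: true
    useAllDevices: true
```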

File(s) to submit:

Logs to submit:

I attached a Grafana screenshot showing the metric ceph_osd_stat_bytes_used.
[screenshot: Grafana graph of ceph_osd_stat_bytes_used per cluster]
I have two clusters, ceph-canary-object-store and ceph-canary-block. The osd.2 of both clusters exists on the same node.
The query `ceph_osd_stat_bytes_used{namespace="ceph-canary-object-store", ceph_daemon="osd.2"}` returns two series, corresponding to the green line and the yellow line.
However, only one line should be shown here, because ceph-canary-object-store has only one "osd.2". The yellow line corresponds to the metrics of osd.2 of ceph-canary-block.
The fact that the yellow line matches the blue line, which corresponds to `ceph_osd_stat_bytes_used{namespace="ceph-canary-block", ceph_daemon="osd.2"}`, supports this interpretation.

Cluster Status to submit:

The ceph health command indicates HEALTH_OK.

Environment:

  • OS: Flatcar Container Linux 3815.2.5

  • Kernel: Linux 6.1.96-flatcar

  • Cloud provider or hardware configuration: On premise hardware

  • Rook version (use rook version inside of a Rook Pod):
    2024/10/01 08:16:59 maxprocs: Leaving GOMAXPROCS=56: CPU quota undefined
    rook: v1.15.1
    go: go1.22.7

  • Storage backend version (e.g. for ceph do ceph -v):
    ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)

  • Kubernetes version (use kubectl version):
    Client Version: v1.29.7
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.29.7

  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): On premise [neco](https://github.com/cybozu-go/neco)

  • Storage backend status (e.g. for Ceph use ceph health in the [Rook Ceph toolbox](https://rook.io/docs/rook/latest-release/Troubleshooting/ceph-toolbox/#interactive-toolbox)): HEALTH_OK

@BlaineEXE
Member

@avanthakkar please take a look at this. This is an issue for multi-cluster users. I wonder whether this is a limitation of the exporter itself, or just of how it is configured for Rook.

@BlaineEXE
Member

@llamerada-jp is the issue resolved if the two clusters use different dataDirHostPath values? That might point us toward a documentation fix, or a code change if one is possible.

@travisn
Member

travisn commented Oct 2, 2024

The exporter host path is dependent on the dataDirHostPath (see here), so that should fix it. I'm also surprised you haven't seen other issues if the clusters are using the same dataDirHostPath. For example, the mon host path is also directly in the dataDirHostPath.
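For illustration, separating the paths might look roughly like this (cluster names and paths below are placeholders, not taken from this issue):

```yaml
# Hypothetical fix sketch: give each cluster its own dataDirHostPath so the
# exporter host paths (<dataDirHostPath>/exporter) no longer overlap.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: cluster-a
  namespace: cluster-a
spec:
  dataDirHostPath: /var/lib/rook-cluster-a   # exporter dir: /var/lib/rook-cluster-a/exporter
---
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: cluster-b
  namespace: cluster-b
spec:
  dataDirHostPath: /var/lib/rook-cluster-b   # exporter dir: /var/lib/rook-cluster-b/exporter
```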

@llamerada-jp
Contributor Author

@BlaineEXE @travisn
Thank you for your reply.

I'm also surprised you haven't seen other issues if the clusters are using the same dataDirHostPath. For example, the mon host path is also directly in the dataDirHostPath.

Our clusters are PVC-based, so rook doesn't use /var/lib/rook/ for the mons; that is probably why our clusters have worked fine so far.
We are considering changing the dataDirHostPath setting based on your reply. I'll close this ticket.

satoru-takeuchi added a commit to cybozu-go/rook that referenced this issue Oct 30, 2024
If there are multiple clusters, `dataDirHostPath` must be unique
for each cluster. However, it's not documented yet.

related issue:
rook#14790

Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
satoru-takeuchi added a commit to cybozu-go/rook that referenced this issue Oct 31, 2024
If there are multiple clusters, `dataDirHostPath` must be unique
for each cluster. However, it's not documented yet.
In addition, `ceph-teardown.md` was also updated because
the original document assumes that `dataDirHostPath` is
`/var/lib/rook` and the files for a cluster `rook-ceph` is
under `/var/lib/rook/rook-ceph`.

related issue:
rook#14790

Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
mergify bot pushed a commit that referenced this issue Oct 31, 2024
If there are multiple clusters, `dataDirHostPath` must be unique
for each cluster. However, it's not documented yet.
In addition, `ceph-teardown.md` was also updated because
the original document assumes that `dataDirHostPath` is
`/var/lib/rook` and the files for a cluster `rook-ceph` is
under `/var/lib/rook/rook-ceph`.

related issue:
#14790

Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
(cherry picked from commit 62d9933)