
client: Fix libcephfs aio metadata corruption. #59987

Merged · 6 commits · Oct 10, 2024

Conversation

@kotreshhr (Contributor) commented Sep 26, 2024

Problem:
With cephfs nfs-ganesha, the following asserts were hit while writing to a file:

  1. FAILED ceph_assert((bool)_front == (bool)_size)
  2. FAILED ceph_assert(cap_refs[c] > 0)

Cause:
In the aio path, the client_lock was not being held in the internal callback that runs after the io completes, where it is expected to be held, leading to corruption.

Fix:
Take the client_lock in the callback.

Fixes: https://tracker.ceph.com/issues/68146
Fixes: https://tracker.ceph.com/issues/68308
Fixes: https://tracker.ceph.com/issues/68309

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@github-actions github-actions bot added the cephfs Ceph File System label Sep 26, 2024
@kotreshhr (author):
Need to add test

@kotreshhr (author):
@vshankar @gregsfortytwo
This solution is incorrect and needs change. I will work on it and refresh the PR.
The client behaves differently with the objectcacher enabled and disabled. Async io can go through two interfaces, the objectcacher and the filer.

  1. objectcacher
    With the objectcacher, client_lock handling is correct: a reference to the lock is held from objectcacher creation, and client_lock is taken at the start of the objectcacher's context finishers.
  2. filer
    With the filer interface, the context finishers have no knowledge of client_lock, hence the failures with nfs-ganesha.
    I am not sure whether we can enable the objectcacher (client_oc) with nfs-ganesha; that would only fix the issue for nfs-ganesha temporarily.

This is also why our libcephfs nonblocking io tests are not catching this issue: they use the objectcacher, and everything works fine with it.

@dparmar18 (Contributor):
> I am not sure whether we can enable 'objectcacher' (client_oc) with nfsganesha and that should just fix the issue for nfsganesha temporarily ?

https://github.com/nfs-ganesha/nfs-ganesha/blob/d8a169612a04ec1541f1dbd8ba014936f34b3f75/src/doc/man/ganesha-ceph-config.rst#ceph-

nfs-ganesha runs without the objectcacher intentionally.

> This is also the reason that our libcephfs nonblocking io tests are not catching this issue.

I can add some test cases in #54435 (which is yet to be merged).

@kotreshhr (author):
> nfs-ganesha runs without the objectcacher intentionally.
>
> I can add some test cases in #54435 (which is yet to be merged).

For now I have added a test to run the existing tests with the objectcacher disabled.

@kotreshhr (author):
jenkins test make check

1 similar comment:
@kotreshhr (author):
jenkins test make check

```cpp
// Adjust cap_ref - do get_cap_ref again if get_caps fails. Otherwise,
// the cap_ref would go negative when C_Read_Finisher::finish_io does
// the final put_cap_ref
clnt->get_cap_ref(in, CEPH_CAP_FILE_RD);
```
Contributor:

When we reach here, we have already done a put_cap_ref() and are now incrementing the cap reference again. So if put_cap_ref() is placed below this block, the get_cap_ref() isn't really required. Am I reading/understanding this right?

@kotreshhr (author):

Yes, you are right. I thought about it too. I think the cap is given out and requested again because this involves a network call (an actual osd call). So, to avoid starving other requesters, it is dropped just before the network call and acquired again?

In the objectcacher path, the cap is taken only once; it is not dropped and reacquired between multiple reads.

Contributor:

> I think the cap is given out and requested again because this involves a network call (an actual osd call). So, to avoid starving other requesters, it is dropped just before the network call and acquired again?

Maybe, yes, since put_cap_ref() would flush cap snaps. But I think it suffices here to not put and get again.

@kotreshhr (author):

OK, shall I go ahead and remove the put_cap_ref and get_caps here?

Contributor:

Yes, please.

@kotreshhr (author):

Done

@kotreshhr kotreshhr requested a review from a team October 3, 2024 08:41
@kotreshhr (author):
jenkins test make check

@kotreshhr kotreshhr force-pushed the licephfs-aio-nfsganesha branch 2 times, most recently from 7e54d44 to 1c68510 Compare October 3, 2024 11:11
@kotreshhr (author):

Rebased and made a small nfs test fix in test_nfs.py.

@vshankar (Contributor) commented Oct 3, 2024:

jenkins retest this please

@kotreshhr (author):
Fixed test_nfs

@vshankar commented Oct 4, 2024:

jenkins test make check

@vshankar commented Oct 4, 2024:

This PR is under test in https://tracker.ceph.com/issues/68382.

@kotreshhr (author):
jenkins test make check

@kotreshhr (author):
jenkins test make check

@kotreshhr (author):

@vshankar The make check failure is because of a flake8 error in test_nfs.py. I will fix it and rebase.

./tasks/cephfs/test_nfs.py:369:9: F841 local variable 'e' is assigned to but never used

Fixes: https://tracker.ceph.com/issues/68146
Signed-off-by: Kotresh HR <khiremat@redhat.com>
The same bufferlist is used for multiple calls without being cleaned. The test 'LlreadvLlwritev' used to fail because of this. Fixed the same.

Fixes: https://tracker.ceph.com/issues/68146
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Problem:
With cephfs nfs-ganesha, the following asserts were hit while writing to a file:

1. FAILED ceph_assert((bool)_front == (bool)_size)
2. FAILED ceph_assert(cap_refs[c] > 0)

Cause:
In the aio path, the client_lock was not being held in the internal callback that runs after the io completes, where it is expected to be held, leading to corruption.

Fix:
Take the client_lock in the callback.

Fixes: https://tracker.ceph.com/issues/68146
Signed-off-by: Kotresh HR <khiremat@redhat.com>
When the libcephfs aio tests (src/test/client) are run with the objectcacher disabled (ceph_test_client --client_oc=false), TestClient.LlreadvLlwritev fails and core dumps. The client hits the assert 'ceph_assert(cap_refs[c] > 0)'.

This patch fixes the same. There is no need to give out the cap_ref and take it again between the multiple reads caused by short reads. In some cases, get_caps used to fail in C_Read_Sync_NonBlocking::finish, causing cap_ref to go negative when put_cap_ref is finally done in C_Read_Finisher::finish_io.

Fixes: https://tracker.ceph.com/issues/68308
Signed-off-by: Kotresh HR <khiremat@redhat.com>
The following test fails when run with objectcacher
disabled.

TestClient.LlreadvLlwritevZeroBytes Failure - nonblocking.cc

ceph/src/osdc/Striper.cc: 186: FAILED ceph_assert(len > 0)

Traceback:
 ceph version Development (no_version) squid (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x125) [0x7fc0a340aafe]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7fc0a340ad20]
 3: (Striper::file_to_extents(ceph::common::CephContext*, file_layout_t const*, ...)+0x184) [0x562727e13ab4]
 4: (Striper::file_to_extents(ceph::common::CephContext*, char const*, ...)+0x97) [0x562727e145d1]
 5: (Striper::file_to_extents(ceph::common::CephContext*, inodeno_t, ...)+0x75) [0x562727d29520]
 6: (Filer::read_trunc(inodeno_t, file_layout_t const*, snapid_t, ...)+0x61) [0x562727d66ea5]
 7: (Client::C_Read_Sync_NonBlocking::retry()+0x10c) [0x562727cd8a8e]
 8: (Client::_read(Fh*, long, unsigned long, ceph::buffer::v15_2_0::list*, Context*)+0x578) [0x562727d10cb6]
 9: (Client::_preadv_pwritev_locked(Fh*, iovec const*, int, long, bool, ...)+0x3a7) [0x562727d18159]
 10: (Client::ll_preadv_pwritev(Fh*, iovec const*, int, long, bool, ...)+0x179) [0x562727d18b99]
 11: (TestClient_LlreadvLlwritevZeroBytes_Test::TestBody()+0x592) [0x562727ca5352]
 12: (void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, ...)+0x1b) [0x562727d9dea3]
 13: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, ...)+0x80) [0x562727da2b26]
 14: (testing::Test::Run()+0xb4) [0x562727d927ae]
 15: (testing::TestInfo::Run()+0x104) [0x562727d92988]
 16: (testing::TestSuite::Run()+0xb2) [0x562727d92b34]
 17: (testing::internal::UnitTestImpl::RunAllTests()+0x36b) [0x562727d95303]
 18: (bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, ...)(), char const*)+0x1b) [0x562727d9e15f]
 19: (bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, ...)+0x80) [0x562727da3083]
 20: (testing::UnitTest::Run()+0x63) [0x562727d92813]
 21: (RUN_ALL_TESTS()+0x11) [0x562727c828d9]
 22: main()

The patch fixes the same.

Fixes: https://tracker.ceph.com/issues/68309
Signed-off-by: Kotresh HR <khiremat@redhat.com>
@kotreshhr (author):
> @vshankar The make check failure is because of the flake8 error on the test_nfs.py. I will fix this and rebase.

Done

@vshankar commented Oct 4, 2024:

> The make check failure is because of the flake8 error on the test_nfs.py. I will fix this and rebase.

Never mind. I built the change locally before building packages in shaman. It's just that when running tests I'll point to a custom qa suite branch (saves time).

@kotreshhr (author):
jenkins test make check arm64

vshankar added a commit to vshankar/ceph that referenced this pull request Oct 7, 2024
* refs/pull/59987/head:
	client: Fix aio zerobyte file read
	client: Fix caps_ref[c]<0 assert
	client: Fix libcephfs aio metadata corruption.
	test/client: Fix aio nonblocking test
	qa: Add libcephfs client test with objectcacher disabled
	qa: Add data read/write test for nfs-ganesha
@vshankar left a comment:
This is good to merge. Preparing run wiki now - will merge soon.

@vshankar vshankar merged commit 4301208 into ceph:main Oct 10, 2024
11 checks passed
Labels: cephfs (Ceph File System), nfs, tests