
client: Fix libcephfs aio metadata corruption. #59987

Merged · 6 commits · Oct 10, 2024

Conversation

@kotreshhr (Contributor) commented Sep 26, 2024

Problem:
With cephfs nfs-ganesha, the following asserts were hit while writing to a file:

  1. FAILED ceph_assert((bool)_front == (bool)_size)
  2. FAILED ceph_assert(cap_refs[c] > 0)

Cause:
In the aio path, the client_lock was not being held in the internal callback that runs after the io completes, where it is expected to be held, leading to corruption.

Fix:
Take the client_lock in the callback.

Fixes: https://tracker.ceph.com/issues/68146
Fixes: https://tracker.ceph.com/issues/68308
Fixes: https://tracker.ceph.com/issues/68309

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@github-actions github-actions bot added the cephfs Ceph File System label Sep 26, 2024
@kotreshhr (author):
Need to add test

@kotreshhr (author):
@vshankar @gregsfortytwo
This solution is incorrect and needs change. I will work on it and refresh the PR.
The client behaves differently with the objectcacher enabled and disabled. Async io can go through two interfaces, the objectcacher and the filer.

  1. objectcacher
    With the objectcacher, client_lock handling is correct: a reference to the lock is held from objectcacher creation, and client_lock is taken at the start of the objectcacher's context finishers.
  2. filer
    With the filer interface, the context finishers have no knowledge of client_lock, hence the failures with nfs-ganesha.
    I am not sure whether we can enable the objectcacher (client_oc) with nfs-ganesha; that would only fix the issue for nfs-ganesha temporarily.

This is also why our libcephfs nonblocking io tests are not catching this issue: they use the objectcacher, and everything works fine with it.

@dparmar18 (Contributor):
> I am not sure whether we can enable 'objectcacher' (client_oc) with nfsganesha and that should just fix the issue for nfsganesha temporarily ?

https://github.com/nfs-ganesha/nfs-ganesha/blob/d8a169612a04ec1541f1dbd8ba014936f34b3f75/src/doc/man/ganesha-ceph-config.rst#ceph-

nfs-ganesha runs without the objectcacher intentionally.

> This is also the reason that our libcephfs nonblocking io tests are not catching this issue.

I can add some test cases in #54435 (which is yet to be merged).

@kotreshhr (author):
> nfs-ganesha runs without the objectcacher intentionally.
>
> I can add some test cases in #54435 (which is yet to be merged).

For now I have added a test to run the existing tests with the objectcacher disabled.

@kotreshhr (author):
jenkins test make check

1 similar comment:
@kotreshhr (author):
jenkins test make check

```cpp
// Adjust cap_ref - do get_cap_ref again if get_caps fails. Otherwise,
// the cap_ref would go negative when C_Read_Finisher::finish_io does
// the final put_cap_ref
clnt->get_cap_ref(in, CEPH_CAP_FILE_RD);
```
Contributor:

When we reach here, we have already done a put_cap_ref() and are now incrementing the cap reference again. So if put_cap_ref() is placed below this block, the get_cap_ref() isn't really required. Am I reading/understanding this right?

@kotreshhr (author):

Yes, you are right. I thought about it too. I think the cap is given out and requested again because this involves a network call (an actual osd call). So, to avoid starving other requesters, it is dropped just before the network call and acquired again?

In the objectcacher path, the cap is taken only once; it is not dropped and reacquired between multiple reads.

Contributor:

> I think the cap is given out and requested again because this involves a network call (an actual osd call). So, to avoid starving other requesters, it is dropped just before the network call and acquired again?

Maybe, yes, since put_cap_ref() would flush cap snaps. But I think it suffices here to not put and get again.

@kotreshhr (author):

OK, shall I go ahead and remove the put_cap_ref and get_caps here?

Contributor:

Yes, please.

@kotreshhr (author):

Done

@kotreshhr kotreshhr requested a review from a team October 3, 2024 08:41
@kotreshhr (author):
jenkins test make check

@kotreshhr kotreshhr force-pushed the licephfs-aio-nfsganesha branch 2 times, most recently from 7e54d44 to 1c68510 Compare October 3, 2024 11:11
@kotreshhr (author):

Rebased and made a small nfs test fix in test_nfs.py.

@vshankar (Contributor) commented Oct 3, 2024:

jenkins retest this please

@kotreshhr (author):
Fixed test_nfs

@vshankar commented Oct 4, 2024:

jenkins test make check

@vshankar commented Oct 4, 2024:

This PR is under test in https://tracker.ceph.com/issues/68382.

@kotreshhr (author):
jenkins test make check

@kotreshhr (author):
jenkins test make check

@kotreshhr (author):

@vshankar The make check failure is because of a flake8 error in test_nfs.py. I will fix it and rebase.

./tasks/cephfs/test_nfs.py:369:9: F841 local variable 'e' is assigned to but never used

Fixes: https://tracker.ceph.com/issues/68146
Signed-off-by: Kotresh HR <khiremat@redhat.com>
The same bufferlist is used for multiple calls without being cleaned. The test 'LlreadvLlwritev' used to fail because of this. Fixed the same.

Fixes: https://tracker.ceph.com/issues/68146
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Problem:
With cephfs nfs-ganesha, the following asserts were hit while writing to a file:

1. FAILED ceph_assert((bool)_front == (bool)_size)
2. FAILED ceph_assert(cap_refs[c] > 0)

Cause:
In the aio path, the client_lock was not being held in the internal callback that runs after the io completes, where it is expected to be held, leading to corruption.

Fix:
Take the client_lock in the callback.

Fixes: https://tracker.ceph.com/issues/68146
Signed-off-by: Kotresh HR <khiremat@redhat.com>
When the libcephfs aio tests (src/test/client) are run with the objectcacher disabled (ceph_test_client --client_oc=false), TestClient.LlreadvLlwritev fails and core dumps. The client hits the assert 'ceph_assert(cap_refs[c] > 0)'.

This patch fixes the same. There is no need to give out the cap_ref and take it again between the multiple reads caused by short reads. In some cases, get_caps used to fail in C_Read_Sync_NonBlocking::finish, causing cap_ref to go negative when put_cap_ref is finally done in C_Read_Finisher::finish_io.

Fixes: https://tracker.ceph.com/issues/68308
Signed-off-by: Kotresh HR <khiremat@redhat.com>
The following test fails when run with objectcacher
disabled.

TestClient.LlreadvLlwritevZeroBytes Failure - nonblocking.cc

ceph/src/osdc/Striper.cc: 186: FAILED ceph_assert(len > 0)

Traceback:
 ceph version Development (no_version) squid (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x125) [0x7fc0a340aafe]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7fc0a340ad20]
 3: (Striper::file_to_extents(ceph::common::CephContext*, file_layout_t const*, ...)+0x184) [0x562727e13ab4]
 4: (Striper::file_to_extents(ceph::common::CephContext*, char const*, ...)+0x97) [0x562727e145d1]
 5: (Striper::file_to_extents(ceph::common::CephContext*, inodeno_t, ...)+0x75) [0x562727d29520]
 6: (Filer::read_trunc(inodeno_t, file_layout_t const*, snapid_t, ...)+0x61) [0x562727d66ea5]
 7: (Client::C_Read_Sync_NonBlocking::retry()+0x10c) [0x562727cd8a8e]
 8: (Client::_read(Fh*, long, unsigned long, ceph::buffer::v15_2_0::list*, Context*)+0x578) [0x562727d10cb6]
 9: (Client::_preadv_pwritev_locked(Fh*, iovec const*, int, long, bool, ...)+0x3a7) [0x562727d18159]
 10: (Client::ll_preadv_pwritev(Fh*, iovec const*, int, long, bool, ...)+0x179) [0x562727d18b99]
 11: (TestClient_LlreadvLlwritevZeroBytes_Test::TestBody()+0x592) [0x562727ca5352]
 12: (void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, ...)+0x1b) [0x562727d9dea3]
 13: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, ...)+0x80) [0x562727da2b26]
 14: (testing::Test::Run()+0xb4) [0x562727d927ae]
 15: (testing::TestInfo::Run()+0x104) [0x562727d92988]
 16: (testing::TestSuite::Run()+0xb2) [0x562727d92b34]
 17: (testing::internal::UnitTestImpl::RunAllTests()+0x36b) [0x562727d95303]
 18: (bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, ...)(), char const*)+0x1b) [0x562727d9e15f]
 19: (bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, ...)+0x80) [0x562727da3083]
 20: (testing::UnitTest::Run()+0x63) [0x562727d92813]
 21: (RUN_ALL_TESTS()+0x11) [0x562727c828d9]
 22: main()

The patch fixes the same.

Fixes: https://tracker.ceph.com/issues/68309
Signed-off-by: Kotresh HR <khiremat@redhat.com>
@kotreshhr (author):
> @vshankar The make check failure is because of the flake8 error on the test_nfs.py. I will fix this and rebase.

Done

@vshankar commented Oct 4, 2024:

> The make check failure is because of the flake8 error on the test_nfs.py. I will fix this and rebase.

Never mind. I built the change locally before building packages in shaman. It's just that when running tests I'll point to a custom qa suite branch (saves time).

@kotreshhr (author):
jenkins test make check arm64

vshankar added a commit to vshankar/ceph that referenced this pull request Oct 7, 2024
* refs/pull/59987/head:
	client: Fix aio zerobyte file read
	client: Fix caps_ref[c]<0 assert
	client: Fix libcephfs aio metadata corruption.
	test/client: Fix aio nonblocking test
	qa: Add libcephfs client test with objectcacher disabled
	qa: Add data read/write test for nfs-ganesha
@vshankar left a comment:
This is good to merge. Preparing run wiki now - will merge soon.

@vshankar vshankar merged commit 4301208 into ceph:main Oct 10, 2024
11 checks passed
Labels: cephfs (Ceph File System), nfs, tests