Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

squid: client: Fix libcephfs aio metadata corruption. #60245

Open
wants to merge 6 commits into
base: squid
Choose a base branch
from

Conversation

vshankar
Copy link
Contributor

backport tracker: https://tracker.ceph.com/issues/68495


backport of #59987
parent tracker: https://tracker.ceph.com/issues/68146

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

Fixes: https://tracker.ceph.com/issues/68146
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 6d8f610)
Fixes: https://tracker.ceph.com/issues/68146
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 59b996f)
The same bufferlist is used without cleaning
for multiple calls. The test 'LlreadvLlwritev'
used to fail because of it. Fixed the same.

Fixes: https://tracker.ceph.com/issues/68146
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit b5af1c1)
Problem:
With cephfs nfs-ganesha, there were following
asserts hit while doing write on a file.

1. FAILED ceph_assert((bool)_front == (bool)_size)
2. FAILED ceph_assert(cap_refs[c] > 0)

Cause:
In aio path, the client_lock was not being held
in the internal callback after the io is done where
it's expected to be taken leading to corruption.

Fix:
Take client_lock in the callback

Fixes: https://tracker.ceph.com/issues/68146
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 3ebe974)
When libcephfs aio tests (src/test/client) are run
with objectcacher disabled (ceph_test_client --client_oc=false),
the TestClient.LlreadvLlwritev fails and core dumps. The client
hits the assert 'caps_ref[c]<0'.

This patch fixes the same. There is no need to give out cap_ref
and take it again between multiple read because of short reads.
In some cases, the get_caps used to fail in C_Read_Sync_NonBlocking::finish
causing cap_ref to go negative when put_cap_ref is done at last in
C_Read_Finish::finish_io

Fixes: https://tracker.ceph.com/issues/68308
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 10c8330)
The following test fails when run with objectcacher
disabled.

TestClient.LlreadvLlwritevZeroBytes Failure - nonblocking.cc

ceph/src/osdc/Striper.cc: 186: FAILED ceph_assert(len > 0)

Traceback:
 ceph version Development (no_version) squid (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x125) [0x7fc0a340aafe]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7fc0a340ad20]
 3: (Striper::file_to_extents(ceph::common::CephContext*, file_layout_t const*, ...)+0x184) [0x562727e13ab4]
 4: (Striper::file_to_extents(ceph::common::CephContext*, char const*, ...)+0x97) [0x562727e145d1]
 5: (Striper::file_to_extents(ceph::common::CephContext*, inodeno_t, ...)+0x75) [0x562727d29520]
 6: (Filer::read_trunc(inodeno_t, file_layout_t const*, snapid_t, ...)+0x61) [0x562727d66ea5]
 7: (Client::C_Read_Sync_NonBlocking::retry()+0x10c) [0x562727cd8a8e]
 8: (Client::_read(Fh*, long, unsigned long, ceph::buffer::v15_2_0::list*, Context*)+0x578) [0x562727d10cb6]
 9: (Client::_preadv_pwritev_locked(Fh*, iovec const*, int, long, bool, ...)+0x3a7) [0x562727d18159]
 10: (Client::ll_preadv_pwritev(Fh*, iovec const*, int, long, bool, ...)+0x179) [0x562727d18b99]
 11: (TestClient_LlreadvLlwritevZeroBytes_Test::TestBody()+0x592) [0x562727ca5352]
 12: (void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, ...)+0x1b) [0x562727d9dea3]
 13: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, ...)+0x80) [0x562727da2b26]
 14: (testing::Test::Run()+0xb4) [0x562727d927ae]
 15: (testing::TestInfo::Run()+0x104) [0x562727d92988]
 16: (testing::TestSuite::Run()+0xb2) [0x562727d92b34]
 17: (testing::internal::UnitTestImpl::RunAllTests()+0x36b) [0x562727d95303]
 18: (bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, ...)(), char const*)+0x1b) [0x562727d9e15f]
 19: (bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, ...)+0x80) [0x562727da3083]
 20: (testing::UnitTest::Run()+0x63) [0x562727d92813]
 21: (RUN_ALL_TESTS()+0x11) [0x562727c828d9]
 22: main()

The patch fixes the same.

Fixes: https://tracker.ceph.com/issues/68309
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 942474c)
@vshankar vshankar added this to the squid milestone Oct 10, 2024
@vshankar vshankar added the cephfs Ceph File System label Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cephfs Ceph File System nfs tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants