Bug 13860 - CTDB restarts failed NFS RPC services by hand, which is incompatible with systemd
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: CTDB
Version: 4.8.9
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-28 05:26 UTC by Martin Schwenke
Modified: 2019-04-15 07:27 UTC

Attachments
Patch for 4.9 and 4.10 (25.25 KB, patch)
2019-04-03 06:04 UTC, Martin Schwenke
Patch for 4.9 and 4.10 (23.58 KB, patch)
2019-04-09 06:33 UTC, Martin Schwenke
amitay: review+

Description Martin Schwenke 2019-03-28 05:26:10 UTC
60.nfs.script uses the checks in nfs-checks.d/ to detect that NFS RPC services have failed.  For rpc.mountd, rpc.rquotad or rpc.statd, it kills the relevant process via killall and restarts it by constructing a command line for the relevant RPC daemon and running it directly.

systemd later fails to start such hard-restarted services because they are already running.  systemd tracks these services individually via PID files, rather than stopping them with a generic kill command.

When systemd is in use and is tracking these services, they should only ever be stopped and started via systemd.
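
To illustrate the difference, here is a minimal sketch in Python (the real CTDB event scripts are shell) of how a restart could be dispatched. The function name `restart_commands`, the `SYSTEMD_UNITS` table and the unit names in it are assumptions made for this example; unit names vary by distribution, and the daemon command line that 60.nfs.script constructs is reduced here to the bare daemon name.

```python
# Hypothetical mapping from RPC daemon to systemd unit name.
# These unit names are illustrative assumptions, not taken from CTDB.
SYSTEMD_UNITS = {
    "rpc.mountd": "nfs-mountd.service",
    "rpc.rquotad": "rpc-rquotad.service",
    "rpc.statd": "rpc-statd.service",
}


def restart_commands(daemon, use_systemd):
    """Return the list of command lines that would restart `daemon`."""
    if use_systemd:
        # Let systemd stop and start the unit so that its own process
        # tracking stays consistent with what is actually running.
        return [["systemctl", "restart", SYSTEMD_UNITS[daemon]]]
    # By-hand restart as described above: kill the process by name,
    # then run the daemon directly (daemon options omitted here).
    return [["killall", daemon], [daemon]]
```

Calling `restart_commands("rpc.statd", use_systemd=True)` yields a single `systemctl restart` command, whereas the by-hand variant yields a `killall` followed by a direct invocation that systemd knows nothing about.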
Comment 1 Martin Schwenke 2019-03-28 05:35:12 UTC
Additionally, the code that "corrects" the nfsd thread count when it is not at the expected level also does so when the thread count is 0.  This is almost certainly a mistake because it can cause the general problem described above in the bug description.
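
A minimal sketch of the intended check, written in Python for illustration only (the real code lives in a shell event script). The function name `should_correct_thread_count` is hypothetical, and treating a count of 0 as "leave alone" reflects the reasoning in this comment rather than any particular released version.

```python
def should_correct_thread_count(current, expected):
    """Decide whether the nfsd thread count should be adjusted."""
    if current == 0:
        # A count of 0 is left alone: adjusting it would start nfsd
        # threads by hand, outside the service manager's control.
        return False
    # Otherwise adjust only when the count differs from the expected one.
    return current != expected
```

With this check, a count of 0 is never touched, a count equal to the expected value is left as-is, and any other mismatch is corrected.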
Comment 2 Martin Schwenke 2019-04-03 06:04:23 UTC
Created attachment 15041
Patch for 4.9 and 4.10

This is the patch set that went into master, *minus* the patch that changes the default to systemd-redhat.
Comment 3 Martin Schwenke 2019-04-04 07:18:12 UTC
Comment on attachment 15041
Patch for 4.9 and 4.10

I'm going to remove this patch for now and try testing for a couple of days without the last commit.  I think there's something else going on...
Comment 4 Martin Schwenke 2019-04-09 06:33:38 UTC
Created attachment 15052
Patch for 4.9 and 4.10

This is the patch set that went into master, *minus*:

* The patch that changes the default to systemd-redhat

  We don't want to change the default for released versions, but we do want to
  give people the ability to edit the call-back to take advantage of the changes.

* The final patch, which avoids changing the thread count when it is 0

  I originally thought this was being triggered and causing problems in
  cluster testing.  However, I have since fixed the test environment and
  tested many times with the patch reverted... and have seen no problems.
Comment 5 Amitay Isaacs 2019-04-11 06:12:31 UTC
Hi Karolin,

This is ready for v4-9 and v4-10.
Comment 6 Karolin Seeger 2019-04-11 11:34:57 UTC
(In reply to Amitay Isaacs from comment #5)
Pushed to autobuild-v4-{10,9}-test.
Comment 7 Karolin Seeger 2019-04-15 07:27:50 UTC
(In reply to Karolin Seeger from comment #6)
Pushed to both branches.
Closing out bug report.

Thanks!