13860 – CTDB restarts failed NFS RPC services by hand, which is incompatible with systemd

Status:	RESOLVED FIXED

Alias:	None

Product:	Samba 4.1 and newer
Classification:	Unclassified
Component:	CTDB
Version:	4.8.9
Hardware:	All All

Importance:	P5 normal
Target Milestone:	---
Assignee:	Karolin Seeger
QA Contact:	Samba QA Contact

Reported:	2019-03-28 05:26 UTC by Martin Schwenke
Modified:	2019-04-15 07:27 UTC
CC List:	1 user

Attachments
Patch for 4.9 and 4.10 (25.25 KB, patch) 2019-04-03 06:04 UTC, Martin Schwenke	no flags
Patch for 4.9 and 4.10 (23.58 KB, patch) 2019-04-09 06:33 UTC, Martin Schwenke	amitay: review+

Description Martin Schwenke 2019-03-28 05:26:10 UTC

60.nfs.script detects that NFS RPC services have failed via nfs-checks.d/.  For rpc.mountd, rpc.rquotad or rpc.statd it kills the relevant process via killall and restarts it by constructing a command-line for the relevant RPC daemon and running it directly.

systemd later fails to start services that were restarted this way because they are already running.  It tracks each of these services individually via PID files rather than using a generic kill command to stop them.

When systemd is in use and is tracking these services, they should only ever be stopped and started via systemd.
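
As a rough illustration of the point above, a restart helper could hand the job to systemd instead of using killall and a hand-built command line. This is only a sketch, not the code from the patch set: the function names, the SYSTEMCTL override variable and the unit names are assumptions (unit names differ between distributions).

```shell
#!/bin/sh

# Hypothetical mapping from RPC daemon name to systemd unit name.
# The unit names below are assumptions and vary between distributions.
rpc_service_unit ()
{
	case "$1" in
	rpc.mountd)  echo "nfs-mountd.service" ;;
	rpc.rquotad) echo "rpc-rquotad.service" ;;
	rpc.statd)   echo "rpc-statd.service" ;;
	*) return 1 ;;
	esac
}

# Restart an RPC service via systemd, so that systemd's own tracking of
# the service stays accurate.  SYSTEMCTL is a hypothetical override that
# allows a dry run, e.g. SYSTEMCTL="echo systemctl".
rpc_service_restart ()
{
	_unit=$(rpc_service_unit "$1") || {
		echo "unknown RPC service: $1" >&2
		return 1
	}
	${SYSTEMCTL:-systemctl} restart "$_unit"
}
```

Running with SYSTEMCTL="echo systemctl" prints the command that would be executed, which is a cheap way to check the mapping before touching a live node.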

Comment 1 Martin Schwenke 2019-03-28 05:35:12 UTC

Additionally, the code that "corrects" the nfsd thread count when it is not at the expected level also does so when the thread count is 0.  This is almost certainly a mistake because it can cause the general problem described above in the bug description.
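
A check that leaves a thread count of 0 alone might look roughly like the following. It is a sketch of the behaviour suggested in this comment, not the actual patch; the function name and its arguments are hypothetical.

```shell
#!/bin/sh

# Decide whether the nfsd thread count should be "corrected".
# $1 = expected thread count, $2 = thread count currently reported
# Returns 0 (true) only when the count is non-zero and wrong.
nfsd_threads_need_fixing ()
{
	_expected="$1"
	_running="$2"

	# A count of 0 is deliberately left alone: changing it here would
	# amount to starting nfsd behind the service manager's back.
	[ "$_running" -ne 0 ] && [ "$_running" -ne "$_expected" ]
}
```

With an expected count of 8, a reported count of 4 would be corrected, while counts of 0 or 8 would not.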

Comment 2 Martin Schwenke 2019-04-03 06:04:23 UTC

Created attachment 15041
Patch for 4.9 and 4.10

This is the patch set that went into master, *minus* the patch that changes the default to systemd-redhat.

Comment 3 Martin Schwenke 2019-04-04 07:18:12 UTC

Comment on attachment 15041
Patch for 4.9 and 4.10

I'm going to remove this patch for now and try testing for a couple of days without the last commit.  I think there's something else going on...

Comment 4 Martin Schwenke 2019-04-09 06:33:38 UTC

Created attachment 15052
Patch for 4.9 and 4.10

This is the patch set that went into master, *minus*:

* The patch that changes the default to systemd-redhat

  We don't want to change the default for released versions, but we do want to
  give people the ability to edit the call-back to take advantage of the changes.

* The final patch, which avoids changing the thread count when it is 0

  I originally thought this was being triggered and causing problems in
  cluster testing.  However, I have since fixed the test environment and
  tested many times with the patch reverted... and have seen no problems.

Comment 5 Amitay Isaacs 2019-04-11 06:12:31 UTC

Hi Karolin,

This is ready for v4-9 and v4-10.

Comment 6 Karolin Seeger 2019-04-11 11:34:57 UTC

(In reply to Amitay Isaacs from comment #5)
Pushed to autobuild-v4-{10,9}-test.

Comment 7 Karolin Seeger 2019-04-15 07:27:50 UTC

(In reply to Karolin Seeger from comment #6)
Pushed to both branches.
Closing out bug report.

Thanks!