The main issue here is that if a node starts taking the recovery lock and then loses an election, it can continue to hold the recovery lock. This means that the new master will be unable to take the lock. Another issue is that the cluster mutex child effectively drops SIGTERM until after the desired helper is exec()ed, so it cannot be terminated during that window.
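To illustrate the SIGTERM point, here is a minimal sketch in Python (not the ctdb code, and not necessarily how the patch addresses it). It assumes the parent has a SIGTERM handler installed that only makes sense in the parent, that the child inherits it across fork(), and that a short sleep stands in for the work done before exec(). The name run_child and the window parameter are hypothetical.

```python
import os
import signal
import time


def run_child(reset_sigterm: bool, window: float = 0.3):
    """Fork a child, send it SIGTERM during its simulated pre-exec window.

    Returns ("signaled", signum) if the child was killed by a signal,
    or ("exited", code) if it ran to completion.
    """
    # Stand-in for a daemon's own SIGTERM handler (assumption for
    # illustration): it returns without terminating the process.
    previous = signal.signal(signal.SIGTERM, lambda signum, frame: None)
    ready_r, ready_w = os.pipe()
    try:
        pid = os.fork()
        if pid == 0:
            # Child: the handler installed above was inherited across fork().
            try:
                os.close(ready_r)
                if reset_sigterm:
                    # Restore the default action so SIGTERM terminates us.
                    signal.signal(signal.SIGTERM, signal.SIG_DFL)
                os.write(ready_w, b"x")  # tell the parent we are set up
                time.sleep(window)       # simulated work before exec()
            finally:
                os._exit(0)
        # Parent: wait until the child has chosen its SIGTERM disposition.
        os.close(ready_w)
        os.read(ready_r, 1)
        os.close(ready_r)
        os.kill(pid, signal.SIGTERM)
        _, status = os.waitpid(pid, 0)
    finally:
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("exited", os.WEXITSTATUS(status))
```

With reset_sigterm=False the child swallows the signal and runs to the end of the window. With reset_sigterm=True it is killed as soon as the signal is delivered. exec() resets caught signals to their default action, which would explain why the signal only starts working once the helper is running.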
Created attachment 14492
Patch for 4.8 and 4.9

These commits cherry-picked cleanly into 4.9. The resulting patch applies cleanly to 4.8 using "git am". Smoke tested both branches using relevant simple tests. This code hasn't changed in a long time, so I would not expect anything unusual when backporting to these recent releases.

This patch also applies to 4.7, but that branch is now security-only. However, it is worth mentioning that it applies and should go there if there is another bug fix release... ;-)
Hi Karolin, This is ready for v4-9 and v4-8.
Pushed to autobuild-v4-{8,9}-test.
Pushed to both branches. Closing out bug report. Thanks!