When we are waiting on a pending byte range lock, another smbd might exit uncleanly without notifying us that its lock has been removed, so our pending lock is never retried. We currently cope with this by calling message_send_all() in the SIGCHLD and cluster reconfigure handlers to send a MSG_SMB_UNLOCK to all smbd processes. That generates O(N^2) work when a large number of clients disconnect at once (such as on a network outage), which can leave the whole system unusable for a very long time (many minutes, or even longer).
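To put rough numbers on the O(N^2) claim, here is a minimal sketch in C. It is a toy model, not Samba code: broadcast_messages() is a hypothetical helper, and it assumes that each unclean exit results in one MSG_SMB_UNLOCK being sent to every smbd still alive at that moment.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model (not Samba code): count the messages sent when
 * `exits` out of `total` processes die one after another and every
 * death is followed by a broadcast to all processes still alive.
 */
uint64_t broadcast_messages(uint64_t total, uint64_t exits)
{
    uint64_t alive = total;
    uint64_t sent = 0;

    /* Cannot lose more processes than exist. */
    assert(exits <= total);

    for (uint64_t i = 0; i < exits; i++) {
        alive--;        /* one process has exited */
        sent += alive;  /* one message per surviving process */
    }
    return sent;
}
```

Under these assumptions, 1000 processes all exiting together produce 499500 messages, and each recipient may then also retry its pending locks, so the real cost can be higher than the message count alone suggests.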
Created attachment 5580: Patches by Tridge that fix this bug. This is a set of patches by Tridge that have already gone into master and the clustered samba branches. They fix the problem by replacing the sending of an UNLOCK message to all smbd processes at unclean shutdown with a regular cleanup.
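The patches themselves are not quoted in this report, so the following is only one plausible shape for a "regular cleanup", sketched in C under stated assumptions. The struct lock_rec layout and the prune_dead_locks() helper are hypothetical and do not come from the Samba source; the only real API used is the POSIX kill(pid, 0) existence probe.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical lock record; the real Samba structures differ. */
struct lock_rec {
    pid_t owner;          /* process holding the lock */
    unsigned long start;  /* first byte of the range */
    unsigned long size;   /* length of the range */
};

/* kill(pid, 0) sends no signal; it only checks that the pid exists.
 * EPERM means the process exists but we may not signal it. */
static int process_exists(pid_t pid)
{
    return kill(pid, 0) == 0 || errno == EPERM;
}

/*
 * Compact the array in place, dropping records whose owner is gone.
 * Returns the number of records kept. Meant to be called from a
 * periodic timer, so the cost per run is linear in the lock count.
 */
size_t prune_dead_locks(struct lock_rec *locks, size_t n)
{
    size_t kept = 0;

    assert(locks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (process_exists(locks[i].owner)) {
            locks[kept++] = locks[i];
        }
    }
    return kept;
}
```

The point of the sketch is the cost profile: each periodic run is linear in the number of lock records, regardless of how many processes died since the last run, instead of one broadcast per death.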
Comment on attachment 5580: Patches by Tridge that fix this bug. Looks good.
Pushed to v3-5-test. Closing out bug report. Thanks!