Bug 15346 - 2-3min delays at reconnect with smb2_validate_sequence_number: bad message_id 2
Summary: 2-3min delays at reconnect with smb2_validate_sequence_number: bad message_id 2
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.18.0
Hardware: x64 Windows 10
: P5 normal (vote)
Target Milestone: ---
Assignee: Jule Anger
QA Contact: Samba QA Contact
URL: https://gitlab.com/samba-team/samba/-...
Keywords:
Depends on:
Blocks: 15498
  Show dependency treegraph
 
Reported: 2023-03-29 08:24 UTC by Markus Wolff
Modified: 2023-10-17 02:24 UTC (History)
4 users (show)

See Also:


Attachments
my_smb.conf_global (2.20 KB, text/plain)
2023-03-29 08:24 UTC, Markus Wolff
no flags Details
tcpdump_server_p445.log (97.88 KB, text/x-log)
2023-05-16 16:01 UTC, Markus Wolff
no flags Details
smb_machine.log related to tcpdump (3.43 KB, text/x-log)
2023-05-16 16:10 UTC, Markus Wolff
no flags Details
Attachment for my comment from 2023-05-16 21:09:16 UTC (332.92 MB, application/x-bzip2)
2023-05-17 00:32 UTC, Mike Silva
no flags Details
Not stupidly large non-debug server smbd log file. (124.67 KB, text/plain)
2023-05-17 09:15 UTC, Mike Silva
no flags Details
5-17-23 Test Run 1 with debug (45.34 MB, application/x-bzip2)
2023-05-18 02:21 UTC, Mike Silva
no flags Details
Patches for v4-19-test (27.86 KB, patch)
2023-08-08 14:04 UTC, Stefan Metzmacher
asn: review+
Details
Patches for v4-18-test (27.86 KB, patch)
2023-08-08 14:05 UTC, Stefan Metzmacher
asn: review+
Details
Patches for v4-17-test (27.86 KB, patch)
2023-08-08 14:05 UTC, Stefan Metzmacher
asn: review+
Details
Patches for v4-17-test (+ 1 additional backport) (39.51 KB, text/plain)
2023-08-11 15:14 UTC, Stefan Metzmacher
asn: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Markus Wolff 2023-03-29 08:24:55 UTC
Created attachment 17846 [details]
my_smb.conf_global

Dear Samba Supporter, 
thank you for the great work and letting me present my problem here.

I observed some strange Issue since samba 4.17.x in an AD setup of about 50 Users.

It is reported having a delay of ~2-3min, when contacting (usually by Explorer) the Samba Server for the first time of the day or after a couple of hours.
Afterwards it is fluid again.  It also seems to not affect all Users/clients.

I have googled around and found:
https://www.spinics.net/lists/samba/msg177542.html
which looks similar.

related with the above issues reports appearing, about trouble with open MSOffice Documents, which i expect to be a side effect of the delays. 
I tried to adjust byte locking config parameters, which didn't change anything.


so i was thinking:
- authentication
- protocol negotiation

The logfiles show:

[2023/03/27 07:54:45.282614,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 07:54:45.282840,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 07:54:45.284370,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 12:58:35.824159,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 12:58:36.240109,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 12:58:45.519484,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 12:58:49.465573,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 12:59:40.079221,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 12:59:40.082962,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 13:00:02.495754,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
Comment 1 Douglas Bagnall 2023-04-04 22:31:28 UTC
I shifted this to "File services" because "source3/smbd/smb2_server.c", but I notice all the affected people say they are using Samba as an AD.

That is:

* this report, 
* https://lists.samba.org/archive/samba/2023-January/243564.html ( == the spinics link above)
* https://lists.samba.org/archive/samba/2023-January/243483.html which gets to https://lists.samba.org/archive/samba/2023-January/243501.html
Comment 2 Markus Wolff 2023-05-03 12:27:51 UTC
I still observing this with samba 4.18.0 

Is there anything i can do to get this rolling?
Comment 3 Volker Lendecke 2023-05-03 12:41:39 UTC
Please upload a network trace of such a failure. You can find information on how to create network traces under https://wiki.samba.org/index.php/Capture_Packets.
Comment 4 Mike Silva 2023-05-16 08:34:34 UTC
I'm seeing this when connecting from macOS (latest Monterey 12.6.5 or Ventura) to an Ubuntu 23.04 host running Samba 4.17.7 not acting as a DC. The Ubuntu server is setup as a standalone Ubuntu server.

I've also configured my Mac to only negotiate SMB2 & SMB3 connections, while the server will only accept SMB3.
Comment 5 Markus Wolff 2023-05-16 16:01:39 UTC
Created attachment 17884 [details]
tcpdump_server_p445.log

tcpdump around the error attached
Comment 6 Markus Wolff 2023-05-16 16:10:02 UTC
Created attachment 17885 [details]
smb_machine.log related to tcpdump

I noticed the error can be provoked by trying to access the security tab in a folders properties. 

Sometimes, not always, the win10 desktop is redrawed by what looks like a explorer crash.



problem appears at:    [2023/05/16 17:38:23.190667
... 
problem disappears at: [2023/05/16 17:39:24.878964
Comment 7 Mike Silva 2023-05-16 18:36:57 UTC
One additional note, in my case, my Mac is automounting 4 shares simultaneously, so I'm wondering if this isn't related to some sort of race condition.

I'll try to capture some enhanced logging today, but at the base level I just see the above message in the original bug repeated over and over until the mounts finally happen.
Comment 8 Mike Silva 2023-05-16 21:09:16 UTC
I reproduced the bug, configuring my server per the instructions here:
https://www.samba.org/~asn/reporting_samba_bugs.txt

Note that if it is a race condition, setting debug level to 10 seemed to have changed timing enough to change behavior slightly. Typically, my mounts eventually mount and stay mounted, even if it takes up to 40 minutes of them banging away, perhaps with one or two repeatedly mounting and unmounting in that time. But, in this test run some mounts momentarily seemed to mount, then dismounted, with none of them staying mounted for about an hour.

Before the test run, I further simplified my Mac client configuration to only try to negotiate a SMB3 connection, to eliminate a negotiation variable. Before changing the log level, my mounts did successfully mount with negotiating SMB3 alone, albeit slowly thanks to this issue we are investigating.

You'll find smb_mount_problem.tar.gz attached, which contains the server's /var/log/samba, /var/log/, /var/lib/samba, /etc/samba/ directories along with a tcpdump pcap file for the test run. Note that I did delete the contents of these directories before starting up samba for the test run, so the files contained therein are for the test run alone.

Server Details
--------------
uname -a
Linux Vancouver 6.2.0-1004-raspi #5-Ubuntu SMP PREEMPT Mon Apr  3 11:15:14 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 23.04
Release:	23.04
Codename:	lunar

samba --version
Version 4.17.7-Ubuntu

Client Details
---------------
uname -a
Darwin Mikes-MacBook-Pro.local 21.6.0 Darwin Kernel Version 21.6.0: Thu Mar  9 20:08:59 PST 2023; root:xnu-8020.240.18.700.8~1/RELEASE_X86_64 x86_64

That is, macOS Monterey 12.6.5
Comment 9 Mike Silva 2023-05-17 00:32:08 UTC
Created attachment 17886 [details]
Attachment for my comment from 2023-05-16 21:09:16 UTC
Comment 10 Markus Wolff 2023-05-17 08:05:22 UTC
Mike, 
thanks for adding information.
your log.smbd shows a somewhat different issue

yours:
[2023/05/16 12:37:11.924264, 10, pid=3970, effective(0, 0), real(0, 0), class=smb2_credits] ../../source3/smbd/smb2_server.c:718(smb2_validate_sequence_number)
  smb2_validate_sequence_number: smb2_validate_sequence_number: clearing id 0 (position 0) from bitmap
 
mine:
[2023/05/17 05:10:50.223869,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
  smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)


They may lead to the same symptoms.
Comment 11 Mike Silva 2023-05-17 09:14:17 UTC
It's not easy to see, because I *cough* *cough* 'helpfully' tarred up and attached a 11GB log.smbd like an idiot without noticing. But, there are similar references to line 677 in smb2_server.c, in that gargantuan log.smbd. And, indeed, that same message has been the one recurring over and over in my smbd logs for the few months[?] this has been an issue.

I believe the difference may be because of the chatty debug log level I was running my test at. Or, because I'm configuring my Mac to only negotiate an SMB3 connection.

After I get some sleep, I'll re-run some test with chatty logging tomorrow morning, for shorter periods of time, producing some output that's actually easier to search and interpret. Hopefully, nailing this down a bit more.

For now, I'll attach one of my typical server logs from a few days ago, when the server was operating at default logging level. As you can see, lot's of mention of line 677 in smb2_server.c, very similar to your log.
Comment 12 Mike Silva 2023-05-17 09:15:12 UTC
Created attachment 17890 [details]
Not stupidly large non-debug server smbd log file.
Comment 13 Mike Silva 2023-05-18 02:21:21 UTC
Created attachment 17891 [details]
5-17-23 Test Run 1 with debug

I've completed a reasonable duration test run, with debug level 10 and a tcpdump, which captures the typical behavior I see with the slow share mounts seemingly due to samba negotiation problems.

Relevant Time Stamps in 5-17-23 Test Run 1
- 17:52 Test run starts
- Shares on the server seem to mount immediately.[All shares in log are on the same server.]
- 17:55 All shares dismount and start trying to mount again.
- 18:00 Another round of all shares on the server dismounting and trying to re-mount.
- 18:01 Judging from atop running on the server, smb seems to stop, and restarts.
- 18:03 Test run ended.
Comment 14 Matthias Leopold 2023-08-03 10:27:31 UTC
Is something happening here? I'm seeing this with Samba AD DC 4.17.10. Since people also report it for 4.18 it seems problem is not solved.

Thank you
Comment 15 Ralph Böhme 2023-08-03 13:19:56 UTC
We think we (metze) have identified this problem coincidentally while debugging a problem at a customer site.

It's likely related to our smb3 multi channel code. If you set

  server multi channel support = no

The problems should go away.

We're working on a fix...
Comment 16 Stefan Metzmacher 2023-08-07 14:34:13 UTC
(In reply to Ralph Böhme from comment #15)

https://gitlab.com/samba-team/samba/-/merge_requests/3209
has a reproducer and a fix, if someone wants to try it
Comment 17 Samba QA Contact 2023-08-08 14:00:10 UTC
This bug was referenced in samba master:

ade663ee6ca1a2813b203ea667d933f4dab9e7b7
dc5a500f0a76720b2a5cb5b1142cf4c35cb6bdea
2b93058be3f6e5eaee239ad3b0e707c62089d18e
4028d6582907cf582730ceec56872d8584ad02e6
50d61e5300250922bf36bb699306f82dff6a00b9
f348b84fbcf203ab1ba92840cf7aecd55dbf9aa0
Comment 18 Stefan Metzmacher 2023-08-08 14:04:29 UTC
Created attachment 18037 [details]
Patches for v4-19-test
Comment 19 Stefan Metzmacher 2023-08-08 14:05:08 UTC
Created attachment 18038 [details]
Patches for v4-18-test
Comment 20 Stefan Metzmacher 2023-08-08 14:05:45 UTC
Created attachment 18039 [details]
Patches for v4-17-test
Comment 21 Jule Anger 2023-08-11 07:52:09 UTC
Pushed to autobuild-v4-{19,18,17}-test.
Comment 22 Samba QA Contact 2023-08-11 09:02:07 UTC
This bug was referenced in samba v4-19-test:

cf84e662db5f75f09f855a8b511cf91d7b143b7a
a2bc1a936080682e6c59b165ea275a2c9147da69
fba1440069655a13d8b6b81262a86bb71372363f
c52d4274277fff5d69adbbba363216feb927474f
6caab1022b6e279f6215f890542c8b69ef42460f
7d786ccb6f7f38d957dd89b29eef5af7416c6416
Comment 23 Samba QA Contact 2023-08-11 09:50:04 UTC
This bug was referenced in samba v4-18-test:

aec49671a3186f920a3fa359caff46710bcb0cbc
208bece139b69631602f07bd950422fdb74cf899
76c6234a0c9fcb6cbd9b0b5e86e48e493ece0591
28e68be96127f9a0398a24f324d05e0378d442f0
4dcefc01054d046080b85a71b822ab67e60a98b2
cd866f5c4ce5a85ce7076f6beb9b32aacb25fb5a
Comment 24 Jule Anger 2023-08-11 12:04:33 UTC
autobuild-v4-17-test failed several times for task samba-h5l-build:

../../source4/torture/smb2/multichannel.c:2389:11: error: too many arguments to function ‘smbXcli_negprot_recv’
../../source4/torture/smb2/multichannel.c:2544:16: error: too many arguments to function ‘smbXcli_negprot_send’
Comment 25 Stefan Metzmacher 2023-08-11 15:14:18 UTC
Created attachment 18041 [details]
Patches for v4-17-test (+ 1 additional backport)
Comment 26 Jule Anger 2023-08-15 07:29:58 UTC
Pushed to autobuild-v4-17-test.
Comment 27 Samba QA Contact 2023-08-15 09:01:04 UTC
This bug was referenced in samba v4-17-test:

d167b80dc729f8c8caf7052a3494433f9764b72c
65b05090ee4153ffd407e53a076a60706911f0ef
8c727eef9e304b9cca6ed6baa718c2e5bc4c0490
3c23c7f36c8c14e2eba5489339bbeb3fb37f298f
534f13630333644db8e19dc23b6bd38fd6ec24f3
f3d5e3add54184bc34d57866a7b566b64e54117c
8738efc4042430cc123c04234bf6d9bfa4ae6f67
Comment 28 Jule Anger 2023-08-15 09:12:26 UTC
Closing out bug report.

Thanks!
Comment 29 Samba QA Contact 2023-08-16 16:57:29 UTC
This bug was referenced in samba v4-18-stable (Release samba-4.18.6):

aec49671a3186f920a3fa359caff46710bcb0cbc
208bece139b69631602f07bd950422fdb74cf899
76c6234a0c9fcb6cbd9b0b5e86e48e493ece0591
28e68be96127f9a0398a24f324d05e0378d442f0
4dcefc01054d046080b85a71b822ab67e60a98b2
cd866f5c4ce5a85ce7076f6beb9b32aacb25fb5a
Comment 30 Samba QA Contact 2023-08-18 11:21:13 UTC
This bug was referenced in samba v4-19-stable (Release samba-4.19.0rc3):

cf84e662db5f75f09f855a8b511cf91d7b143b7a
a2bc1a936080682e6c59b165ea275a2c9147da69
fba1440069655a13d8b6b81262a86bb71372363f
c52d4274277fff5d69adbbba363216feb927474f
6caab1022b6e279f6215f890542c8b69ef42460f
7d786ccb6f7f38d957dd89b29eef5af7416c6416
Comment 31 Samba QA Contact 2023-09-07 09:02:34 UTC
This bug was referenced in samba v4-17-stable (Release samba-4.17.11):

d167b80dc729f8c8caf7052a3494433f9764b72c
65b05090ee4153ffd407e53a076a60706911f0ef
8c727eef9e304b9cca6ed6baa718c2e5bc4c0490
3c23c7f36c8c14e2eba5489339bbeb3fb37f298f
534f13630333644db8e19dc23b6bd38fd6ec24f3
f3d5e3add54184bc34d57866a7b566b64e54117c
8738efc4042430cc123c04234bf6d9bfa4ae6f67