Bug 15346 - 2-3min delays at reconnect with smb2_validate_sequence_number: bad message_id 2
Summary: 2-3min delays at reconnect with smb2_validate_sequence_number: bad message_id 2
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.18.0
Hardware: x64 Windows 10
: P5 normal (vote)
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-29 08:24 UTC by Markus Wolff
Modified: 2023-05-18 02:21 UTC (History)
1 user (show)

See Also:


Attachments
my_smb.conf_global (2.20 KB, text/plain)
2023-03-29 08:24 UTC, Markus Wolff
no flags Details
tcpdump_server_p445.log (97.88 KB, text/x-log)
2023-05-16 16:01 UTC, Markus Wolff
no flags Details
smb_machine.log related to tcpdump (3.43 KB, text/x-log)
2023-05-16 16:10 UTC, Markus Wolff
no flags Details
Attachment for my comment from 2023-05-16 21:09:16 UTC (332.92 MB, application/x-bzip2)
2023-05-17 00:32 UTC, Mike Silva
no flags Details
Not stupidly large non-debug server smbd log file. (124.67 KB, text/plain)
2023-05-17 09:15 UTC, Mike Silva
no flags Details
5-17-23 Test Run 1 with debug (45.34 MB, application/x-bzip2)
2023-05-18 02:21 UTC, Mike Silva
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Markus Wolff 2023-03-29 08:24:55 UTC
Created attachment 17846 [details]
my_smb.conf_global

Dear Samba Supporter, 
thank you for the great work and letting me present my problem here.

I observed some strange Issue since samba 4.17.x in an AD setup of about 50 Users.

It is reported having a delay of ~2-3min, when contacting (usually by Explorer) the Samba Server for the first time of the day or after a couple of hours.
Afterwards it is fluid again.  It also seems to not affect all Users/clients.

I have googled around and found:
https://www.spinics.net/lists/samba/msg177542.html
which looks similar.

related with the above issues reports appearing, about trouble with open MSOffice Documents, which i expect to be a side effect of the delays. 
I tried to adjust byte locking config parameters, which didn't change anything.


so i was thinking:
- authentication
- protocol negotiation

The logfiles show:

[2023/03/27 07:54:45.282614,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 07:54:45.282840,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 07:54:45.284370,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 12:58:35.824159,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 12:58:36.240109,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 12:58:45.519484,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 12:58:49.465573,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 12:59:40.079221,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 12:59:40.082962,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
[2023/03/27 13:00:02.495754,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
 smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)
Comment 1 Douglas Bagnall 2023-04-04 22:31:28 UTC
I shifted this to "File services" because "source3/smbd/smb2_server.c", but I notice all the affected people say they are using Samba as an AD.

That is:

* this report, 
* https://lists.samba.org/archive/samba/2023-January/243564.html ( == the spinics link above)
* https://lists.samba.org/archive/samba/2023-January/243483.html which gets to https://lists.samba.org/archive/samba/2023-January/243501.html
Comment 2 Markus Wolff 2023-05-03 12:27:51 UTC
I still observing this with samba 4.18.0 

Is there anything i can do to get this rolling?
Comment 3 Volker Lendecke 2023-05-03 12:41:39 UTC
Please upload a network trace of such a failure. You can find information on how to create network traces under https://wiki.samba.org/index.php/Capture_Packets.
Comment 4 Mike Silva 2023-05-16 08:34:34 UTC
I'm seeing this when connecting from macOS (latest Monterey 12.6.5 or Ventura) to an Ubuntu 23.04 host running Samba 4.17.7 not acting as a DC. The Ubuntu server is setup as a standalone Ubuntu server.

I've also configured my Mac to only negotiate SMB2 & SMB3 connections, while the server will only accept SMB3.
Comment 5 Markus Wolff 2023-05-16 16:01:39 UTC
Created attachment 17884 [details]
tcpdump_server_p445.log

tcpdump around the error attached
Comment 6 Markus Wolff 2023-05-16 16:10:02 UTC
Created attachment 17885 [details]
smb_machine.log related to tcpdump

I noticed the error can be provoked by trying to access the security tab in a folders properties. 

Sometimes, not always, the win10 desktop is redrawed by what looks like a explorer crash.



problem appears at:    [2023/05/16 17:38:23.190667
... 
problem disappears at: [2023/05/16 17:39:24.878964
Comment 7 Mike Silva 2023-05-16 18:36:57 UTC
One additional note, in my case, my Mac is automounting 4 shares simultaneously, so I'm wondering if this isn't related to some sort of race condition.

I'll try to capture some enhanced logging today, but at the base level I just see the above message in the original bug repeated over and over until the mounts finally happen.
Comment 8 Mike Silva 2023-05-16 21:09:16 UTC
I reproduced the bug, configuring my server per the instructions here:
https://www.samba.org/~asn/reporting_samba_bugs.txt

Note that if it is a race condition, setting debug level to 10 seemed to have changed timing enough to change behavior slightly. Typically, my mounts eventually mount and stay mounted, even if it takes up to 40 minutes of them banging away, perhaps with one or two repeatedly mounting and unmounting in that time. But, in this test run some mounts momentarily seemed to mount, then dismounted, with none of them staying mounted for about an hour.

Before the test run, I further simplified my Mac client configuration to only try to negotiate a SMB3 connection, to eliminate a negotiation variable. Before changing the log level, my mounts did successfully mount with negotiating SMB3 alone, albeit slowly thanks to this issue we are investigating.

You'll find smb_mount_problem.tar.gz attached, which contains the server's /var/log/samba, /var/log/, /var/lib/samba, /etc/samba/ directories along with a tcpdump pcap file for the test run. Note that I did delete the contents of these directories before starting up samba for the test run, so the files contained therein are for the test run alone.

Server Details
--------------
uname -a
Linux Vancouver 6.2.0-1004-raspi #5-Ubuntu SMP PREEMPT Mon Apr  3 11:15:14 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 23.04
Release:	23.04
Codename:	lunar

samba --version
Version 4.17.7-Ubuntu

Client Details
---------------
uname -a
Darwin Mikes-MacBook-Pro.local 21.6.0 Darwin Kernel Version 21.6.0: Thu Mar  9 20:08:59 PST 2023; root:xnu-8020.240.18.700.8~1/RELEASE_X86_64 x86_64

That is, macOS Monterey 12.6.5
Comment 9 Mike Silva 2023-05-17 00:32:08 UTC
Created attachment 17886 [details]
Attachment for my comment from 2023-05-16 21:09:16 UTC
Comment 10 Markus Wolff 2023-05-17 08:05:22 UTC
Mike, 
thanks for adding information.
your log.smbd shows a somewhat different issue

yours:
[2023/05/16 12:37:11.924264, 10, pid=3970, effective(0, 0), real(0, 0), class=smb2_credits] ../../source3/smbd/smb2_server.c:718(smb2_validate_sequence_number)
  smb2_validate_sequence_number: smb2_validate_sequence_number: clearing id 0 (position 0) from bitmap
 
mine:
[2023/05/17 05:10:50.223869,  0] ../../source3/smbd/smb2_server.c:677(smb2_validate_sequence_number)
  smb2_validate_sequence_number: smb2_validate_sequence_number: bad message_id 2 (sequence id 2) (granted = 1, low = 1, range = 1)


They may lead to the same symptoms.
Comment 11 Mike Silva 2023-05-17 09:14:17 UTC
It's not easy to see, because I *cough* *cough* 'helpfully' tarred up and attached a 11GB log.smbd like an idiot without noticing. But, there are similar references to line 677 in smb2_server.c, in that gargantuan log.smbd. And, indeed, that same message has been the one recurring over and over in my smbd logs for the few months[?] this has been an issue.

I believe the difference may be because of the chatty debug log level I was running my test at. Or, because I'm configuring my Mac to only negotiate an SMB3 connection.

After I get some sleep, I'll re-run some test with chatty logging tomorrow morning, for shorter periods of time, producing some output that's actually easier to search and interpret. Hopefully, nailing this down a bit more.

For now, I'll attach one of my typical server logs from a few days ago, when the server was operating at default logging level. As you can see, lot's of mention of line 677 in smb2_server.c, very similar to your log.
Comment 12 Mike Silva 2023-05-17 09:15:12 UTC
Created attachment 17890 [details]
Not stupidly large non-debug server smbd log file.
Comment 13 Mike Silva 2023-05-18 02:21:21 UTC
Created attachment 17891 [details]
5-17-23 Test Run 1 with debug

I've completed a reasonable duration test run, with debug level 10 and a tcpdump, which captures the typical behavior I see with the slow share mounts seemingly due to samba negotiation problems.

Relevant Time Stamps in 5-17-23 Test Run 1
- 17:52 Test run starts
- Shares on the server seem to mount immediately.[All shares in log are on the same server.]
- 17:55 All shares dismount and start trying to mount again.
- 18:00 Another round of all shares on the server dismounting and trying to re-mount.
- 18:01 Judging from atop running on the server, smb seems to stop, and restarts.
- 18:03 Test run ended.