Bug 10839 - High CPU Load samba 4.1.12 possibly because of Notify Request problem w/ AutoCAD
High CPU Load samba 4.1.12 possibly because of Notify Request problem w/ AutoCAD
Status: ASSIGNED
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
4.1.12
x64 Linux
: P5 critical
: ---
Assigned To: Jeremy Allison
Samba QA Contact
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-09-25 11:00 UTC by jkoop
Modified: 2016-10-20 00:40 UTC (History)
2 users (show)

See Also:


Attachments
wireshark dump of the looping requests (53.35 KB, application/x-gzip)
2014-09-25 11:00 UTC, jkoop
no flags Details
Fix from Volker (modified). (3.67 KB, patch)
2014-09-25 23:42 UTC, Jeremy Allison
no flags Details
Correct version.. (2.70 KB, patch)
2014-09-25 23:43 UTC, Jeremy Allison
no flags Details
configurable patch for 4.1 (5.34 KB, patch)
2015-02-06 16:02 UTC, Volker Lendecke
no flags Details
configurable patch for master (5.24 KB, patch)
2015-02-06 16:04 UTC, Volker Lendecke
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description jkoop 2014-09-25 11:00:36 UTC
Created attachment 10305 [details]
wireshark dump of the looping requests

The CPU load on the server is very high, processes that are causing it are nslcd or since I changed to it sssd and the smbd processes themselves.

Top looks like this:
---------------------------------
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
2216 root      20   0  200m 5820 3024 R 31.2  0.1  44:47.46 sssd_nss
3631 XXXXX  20   0  430m  10m 8472 R 17.3  0.3  12:37.12 smbd
2870 XXXXX  20   0  434m  17m  11m R 16.3  0.5  21:22.22 smbd
3603 XXXXX  20   0  430m 9.8m 7492 R 16.3  0.3  14:19.45 smbd
3250 XXXXX  20   0  432m  14m 9384 R 15.9  0.4  12:17.09 smbd
3616 XXXXX  20   0  430m  11m 8452 R 15.9  0.3  13:37.91 smbd
2842 XXXXX  20   0  435m  21m  13m R 15.6  0.6  22:24.76 smbd
3246 XXXXX  20   0  430m  12m 9684 S 15.6  0.3  15:35.65 smbd
3267 XXXXX  20   0  433m  16m  11m S 15.6  0.4  15:04.27 smbd
1637 XXXXX  20   0  425m  15m  13m S 14.9  0.4  13:32.95 smbd
2879 XXXXX  20   0  435m  18m  10m R 14.9  0.5  19:01.03 smbd
2899 XXXXX  20   0  434m  17m  11m R 14.9  0.5  17:24.44 smbd
2838 XXXXX  20   0  433m  17m  11m R 14.3  0.5  11:46.12 smbd
1657 XXXXX  20   0  429m  19m  12m R 13.9  0.5  15:17.44 smbd
2841 XXXXX  20   0  434m  17m  10m R 13.9  0.5  14:31.61 smbd
2221 XXXXX  20   0  429m  15m 9972 R 13.6  0.4  14:51.52 smbd
2495 XXXXX  20   0  425m  12m 9.8m R 13.6  0.3  12:17.84 smbd
2837 XXXXX  20   0  434m  20m  13m R 13.6  0.5  17:46.70 smbd
2869 XXXXX  20   0  432m  12m 7920 R 13.6  0.3  19:38.08 smbd
3605 XXXXX  20   0  433m  15m 9.9m R 13.6  0.4  11:32.85 smbd
2277 XXXXX    20   0  429m  13m  10m R 13.3  0.4   2:53.38 smbd
2872 XXXXX   20   0  434m  17m  10m R 13.3  0.5  18:19.79 smbd
2215 root      20   0  212m 4900 3756 R 11.0  0.1  16:57.29 sssd_be
2340 root      20   0  435m  22m  13m R  4.0  0.6   6:42.97 smbd
2377 root      20   0  436m  20m  11m R  4.0  0.5   4:11.66 smbd
…
…
---------------------------------

When I tshark the network traffic I always see these packets:
---------------------------------
Seconds   packet type
1.507312 SMB2 258 Create Request File: a\directory
1.508001 SMB2 242 Create Response File: a\directory
1.509120 SMB2 154 Notify Request File: a\directory
1.509786 SMB2 131 Notify Response, Error: STATUS_INVALID_PARAMETER
1.510548 SMB2 258 Create Request File: a\directory 
<starting over and over and over>
---------------------------------
A debug level of 10 on such an smbd gives me:
---------------------------------
[2014/09/24 16:15:04.429169, 10, pid=2495, effective(547, 500), real(547, 0)] ../source3/smbd/smb2_server.c:874(smb2_set_operation_credit)  smb2_set_operation_credit: requested 1, charge 1, granted 1, current possible/max 482/512, total granted/max/low/range 31/8192/3287320/31
[2014/09/24 16:15:04.433078, 10, pid=2495, effective(547, 500), real(547, 0)] ../source3/smbd/smb2_server.c:3257(smbd_smb2_io_handler)  smbd_smb2_request idx[1] of 5 vectors
[2014/09/24 16:15:04.433153, 10, pid=2495, effective(547, 500), real(547, 0)] ../source3/smbd/smb2_server.c:621(smb2_validate_sequence_number)  smb2_validate_sequence_number: clearing id 3287320 (position 2328) from bitmap
[2014/09/24 16:15:04.433242, 10, pid=2495, effective(547, 500), real(547, 0)] ../source3/smbd/smb2_server.c:1878(smbd_smb2_request_dispatch)  smbd_smb2_request_dispatch: opcode[SMB2_OP_NOTIFY] mid = 3287320
[2014/09/24 16:15:04.433285,  4, pid=2495, effective(547, 500), real(547, 0)] ../source3/smbd/uid.c:384(change_to_user)  Skipping user change - already user
[2014/09/24 16:15:04.433325, 10, pid=2495, effective(547, 500), real(547, 0)] ../source3/smbd/smb2_server.c:1780(smbd_smb2_request_verify_creditcharge)  mid 3287320, CreditCharge: 1, NeededCharge: 2
[2014/09/24 16:15:04.433393,  2, pid=2495, effective(547, 500), real(547, 0)] ../source3/smbd/smb2_server.c:1784(smbd_smb2_request_verify_creditcharge)  CreditCharge too low, given 1, needed 2
[2014/09/24 16:15:04.433435, 10, pid=2495, effective(547, 500), real(547, 0)] ../source3/smbd/smb2_server.c:2598(smbd_smb2_request_error_ex)  smbd_smb2_request_error_ex: idx[1] status[NT_STATUS_INVALID_PARAMETER] || at ../source3/smbd/smb2_notify.c:85
[2014/09/24 16:15:04.433471, 10, pid=2495, effective(547, 500), real(547, 0)] ../source3/smbd/smb2_server.c:2499(smbd_smb2_request_done_ex)  smbd_smb2_request_done_ex: idx[1] status[NT_STATUS_INVALID_PARAMETER] body[8] dyn[yes:1] at ../source3/smbd/smb2_server.c:2651
[2014/09/24 16:15:04.433516, 10, pid=2495, effective(547, 500), real(547, 0)] ../source3/smbd/smb2_server.c:874(smb2_set_operation_credit)  smb2_set_operation_credit: requested 1, charge 1, granted 1, current possible/max 482/512, total granted/max/low/range 31/8192/3287321/31
---------------------------------

-The clients are Windows 7 pro x64
-apparently the application causing this is AutoCAD
-Server Centos 6.5 / Samba 4.1.12 / 
-Domain member to a samba 3.x domain
-Users in openLDAP RFC2307 directory

-Global config section and relevant share:
---------------------------------
[global]
   workgroup = XXXX
   security = domain
   idmap config * : range = 16777216-33554431
   template shell = /sbin/nologin
   winbind use default domain = false
   winbind offline logon = false
        server string = %h SERVER Samba %v
        map untrusted to domain = yes
        encrypt passwords = Yes
        map to guest = Bad Password
        passwd program = /usr/bin/passwd %u
        log level = 0
        log file = /var/log/samba/log.smb.%m
        max log size = 1000
        unix charset = UTF8
        os level = 2
        preferred master = No
        dns proxy = No
        wins server = 192.168.X.X
        preload = backup
        interfaces = 192.168.X.X/xx

[X]
       comment = xxxxxx
        path = /xxx/x
        writeable = yes
        create mask = 0775
        directory mask = 0775
        veto files = /*.vlx/autorun.inf/*.fas/*.lsp/*.dvb/
        invalid users = @xxx @xxx xxx xxx
        vfs objects = full_audit
        full_audit:prefix = %u|%I|%m|%S
        full_audit:success = unlink rmdir rename
        full_audit:failure = none
        full_audit:facility = local6
        full_audit:priority = NOTICE
---------------------------------
Comment 1 Jeremy Allison 2014-09-25 17:49:04 UTC
Investigating. I need to see what Win2K12 does with a Notify request with creditcharge of 1, and a return buffer size of 131072 (which according to MS-SMB2 is invalid).
Comment 2 Stefan Metzmacher 2014-09-25 21:10:00 UTC
(In reply to comment #0)
> Created attachment 10305 [details]
> wireshark dump of the looping requests
> 
> The CPU load on the server is very high, processes that are causing it are
> nslcd or since I changed to it sssd and the smbd processes themselves.
> 
> Top looks like this:
> ---------------------------------
> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 2216 root      20   0  200m 5820 3024 R 31.2  0.1  44:47.46 sssd_nss
> 3631 XXXXX  20   0  430m  10m 8472 R 17.3  0.3  12:37.12 smbd
> 2870 XXXXX  20   0  434m  17m  11m R 16.3  0.5  21:22.22 smbd
> 3603 XXXXX  20   0  430m 9.8m 7492 R 16.3  0.3  14:19.45 smbd
> 3250 XXXXX  20   0  432m  14m 9384 R 15.9  0.4  12:17.09 smbd
> 3616 XXXXX  20   0  430m  11m 8452 R 15.9  0.3  13:37.91 smbd
> 2842 XXXXX  20   0  435m  21m  13m R 15.6  0.6  22:24.76 smbd
> 3246 XXXXX  20   0  430m  12m 9684 S 15.6  0.3  15:35.65 smbd
> 3267 XXXXX  20   0  433m  16m  11m S 15.6  0.4  15:04.27 smbd
> 1637 XXXXX  20   0  425m  15m  13m S 14.9  0.4  13:32.95 smbd
> 2879 XXXXX  20   0  435m  18m  10m R 14.9  0.5  19:01.03 smbd
> 2899 XXXXX  20   0  434m  17m  11m R 14.9  0.5  17:24.44 smbd
> 2838 XXXXX  20   0  433m  17m  11m R 14.3  0.5  11:46.12 smbd
> 1657 XXXXX  20   0  429m  19m  12m R 13.9  0.5  15:17.44 smbd
> 2841 XXXXX  20   0  434m  17m  10m R 13.9  0.5  14:31.61 smbd
> 2221 XXXXX  20   0  429m  15m 9972 R 13.6  0.4  14:51.52 smbd
> 2495 XXXXX  20   0  425m  12m 9.8m R 13.6  0.3  12:17.84 smbd
> 2837 XXXXX  20   0  434m  20m  13m R 13.6  0.5  17:46.70 smbd
> 2869 XXXXX  20   0  432m  12m 7920 R 13.6  0.3  19:38.08 smbd
> 3605 XXXXX  20   0  433m  15m 9.9m R 13.6  0.4  11:32.85 smbd
> 2277 XXXXX    20   0  429m  13m  10m R 13.3  0.4   2:53.38 smbd
> 2872 XXXXX   20   0  434m  17m  10m R 13.3  0.5  18:19.79 smbd
> 2215 root      20   0  212m 4900 3756 R 11.0  0.1  16:57.29 sssd_be
> 2340 root      20   0  435m  22m  13m R  4.0  0.6   6:42.97 smbd
> 2377 root      20   0  436m  20m  11m R  4.0  0.5   4:11.66 smbd
> …
> …
> ---------------------------------
> 
> When I tshark the network traffic I always see these packets:
> ---------------------------------
> Seconds   packet type
> 1.507312 SMB2 258 Create Request File: a\directory
> 1.508001 SMB2 242 Create Response File: a\directory
> 1.509120 SMB2 154 Notify Request File: a\directory
> 1.509786 SMB2 131 Notify Response, Error: STATUS_INVALID_PARAMETER
> 1.510548 SMB2 258 Create Request File: a\directory 
> <starting over and over and over>
> ---------------------------------
> A debug level of 10 on such an smbd gives me:
> ---------------------------------
> [2014/09/24 16:15:04.429169, 10, pid=2495, effective(547, 500), real(547, 0)]
> ../source3/smbd/smb2_server.c:874(smb2_set_operation_credit) 
> smb2_set_operation_credit: requested 1, charge 1, granted 1, current
> possible/max 482/512, total granted/max/low/range 31/8192/3287320/31
> [2014/09/24 16:15:04.433078, 10, pid=2495, effective(547, 500), real(547, 0)]
> ../source3/smbd/smb2_server.c:3257(smbd_smb2_io_handler)  smbd_smb2_request
> idx[1] of 5 vectors
> [2014/09/24 16:15:04.433153, 10, pid=2495, effective(547, 500), real(547, 0)]
> ../source3/smbd/smb2_server.c:621(smb2_validate_sequence_number) 
> smb2_validate_sequence_number: clearing id 3287320 (position 2328) from bitmap
> [2014/09/24 16:15:04.433242, 10, pid=2495, effective(547, 500), real(547, 0)]
> ../source3/smbd/smb2_server.c:1878(smbd_smb2_request_dispatch) 
> smbd_smb2_request_dispatch: opcode[SMB2_OP_NOTIFY] mid = 3287320
> [2014/09/24 16:15:04.433285,  4, pid=2495, effective(547, 500), real(547, 0)]
> ../source3/smbd/uid.c:384(change_to_user)  Skipping user change - already user
> [2014/09/24 16:15:04.433325, 10, pid=2495, effective(547, 500), real(547, 0)]
> ../source3/smbd/smb2_server.c:1780(smbd_smb2_request_verify_creditcharge)  mid
> 3287320, CreditCharge: 1, NeededCharge: 2
> [2014/09/24 16:15:04.433393,  2, pid=2495, effective(547, 500), real(547, 0)]
> ../source3/smbd/smb2_server.c:1784(smbd_smb2_request_verify_creditcharge) 
> CreditCharge too low, given 1, needed 2
> [2014/09/24 16:15:04.433435, 10, pid=2495, effective(547, 500), real(547, 0)]
> ../source3/smbd/smb2_server.c:2598(smbd_smb2_request_error_ex) 
> smbd_smb2_request_error_ex: idx[1] status[NT_STATUS_INVALID_PARAMETER] || at
> ../source3/smbd/smb2_notify.c:85
> [2014/09/24 16:15:04.433471, 10, pid=2495, effective(547, 500), real(547, 0)]
> ../source3/smbd/smb2_server.c:2499(smbd_smb2_request_done_ex) 
> smbd_smb2_request_done_ex: idx[1] status[NT_STATUS_INVALID_PARAMETER] body[8]
> dyn[yes:1] at ../source3/smbd/smb2_server.c:2651
> [2014/09/24 16:15:04.433516, 10, pid=2495, effective(547, 500), real(547, 0)]
> ../source3/smbd/smb2_server.c:874(smb2_set_operation_credit) 
> smb2_set_operation_credit: requested 1, charge 1, granted 1, current
> possible/max 482/512, total granted/max/low/range 31/8192/3287321/31
> ---------------------------------
> 
> -The clients are Windows 7 pro x64

Did you set DisableLargeMtu = 0 under 
HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters ?

See http://blogs.technet.com/b/josebda/archive/2010/08/27/performance-tuning-guidelines-for-windows-server-2008-r2.aspx for details about this.

In order to debug this it would be good to know how the client was configured.

We you used the default, then DisableLargeMtu = 0 might change things.
Comment 3 Stefan Metzmacher 2014-09-25 21:19:21 UTC
(In reply to comment #2)
> (In reply to comment #0)
> > Created attachment 10305 [details] [details]
> > wireshark dump of the looping requests
> > 
> > The CPU load on the server is very high, processes that are causing it are
> > nslcd or since I changed to it sssd and the smbd processes themselves.
> > 
> > Top looks like this:
> > ---------------------------------
> > PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > 2216 root      20   0  200m 5820 3024 R 31.2  0.1  44:47.46 sssd_nss
> > 3631 XXXXX  20   0  430m  10m 8472 R 17.3  0.3  12:37.12 smbd
> > 2870 XXXXX  20   0  434m  17m  11m R 16.3  0.5  21:22.22 smbd
> > 3603 XXXXX  20   0  430m 9.8m 7492 R 16.3  0.3  14:19.45 smbd
> > 3250 XXXXX  20   0  432m  14m 9384 R 15.9  0.4  12:17.09 smbd
> > 3616 XXXXX  20   0  430m  11m 8452 R 15.9  0.3  13:37.91 smbd
> > 2842 XXXXX  20   0  435m  21m  13m R 15.6  0.6  22:24.76 smbd
> > 3246 XXXXX  20   0  430m  12m 9684 S 15.6  0.3  15:35.65 smbd
> > 3267 XXXXX  20   0  433m  16m  11m S 15.6  0.4  15:04.27 smbd
> > 1637 XXXXX  20   0  425m  15m  13m S 14.9  0.4  13:32.95 smbd
> > 2879 XXXXX  20   0  435m  18m  10m R 14.9  0.5  19:01.03 smbd
> > 2899 XXXXX  20   0  434m  17m  11m R 14.9  0.5  17:24.44 smbd
> > 2838 XXXXX  20   0  433m  17m  11m R 14.3  0.5  11:46.12 smbd
> > 1657 XXXXX  20   0  429m  19m  12m R 13.9  0.5  15:17.44 smbd
> > 2841 XXXXX  20   0  434m  17m  10m R 13.9  0.5  14:31.61 smbd
> > 2221 XXXXX  20   0  429m  15m 9972 R 13.6  0.4  14:51.52 smbd
> > 2495 XXXXX  20   0  425m  12m 9.8m R 13.6  0.3  12:17.84 smbd
> > 2837 XXXXX  20   0  434m  20m  13m R 13.6  0.5  17:46.70 smbd
> > 2869 XXXXX  20   0  432m  12m 7920 R 13.6  0.3  19:38.08 smbd
> > 3605 XXXXX  20   0  433m  15m 9.9m R 13.6  0.4  11:32.85 smbd
> > 2277 XXXXX    20   0  429m  13m  10m R 13.3  0.4   2:53.38 smbd
> > 2872 XXXXX   20   0  434m  17m  10m R 13.3  0.5  18:19.79 smbd
> > 2215 root      20   0  212m 4900 3756 R 11.0  0.1  16:57.29 sssd_be
> > 2340 root      20   0  435m  22m  13m R  4.0  0.6   6:42.97 smbd
> > 2377 root      20   0  436m  20m  11m R  4.0  0.5   4:11.66 smbd
> > …
> > …
> > ---------------------------------
> > 
> > When I tshark the network traffic I always see these packets:
> > ---------------------------------
> > Seconds   packet type
> > 1.507312 SMB2 258 Create Request File: a\directory
> > 1.508001 SMB2 242 Create Response File: a\directory
> > 1.509120 SMB2 154 Notify Request File: a\directory
> > 1.509786 SMB2 131 Notify Response, Error: STATUS_INVALID_PARAMETER
> > 1.510548 SMB2 258 Create Request File: a\directory 
> > <starting over and over and over>
> > ---------------------------------
> > A debug level of 10 on such an smbd gives me:
> > ---------------------------------
> > [2014/09/24 16:15:04.429169, 10, pid=2495, effective(547, 500), real(547, 0)]
> > ../source3/smbd/smb2_server.c:874(smb2_set_operation_credit) 
> > smb2_set_operation_credit: requested 1, charge 1, granted 1, current
> > possible/max 482/512, total granted/max/low/range 31/8192/3287320/31
> > [2014/09/24 16:15:04.433078, 10, pid=2495, effective(547, 500), real(547, 0)]
> > ../source3/smbd/smb2_server.c:3257(smbd_smb2_io_handler)  smbd_smb2_request
> > idx[1] of 5 vectors
> > [2014/09/24 16:15:04.433153, 10, pid=2495, effective(547, 500), real(547, 0)]
> > ../source3/smbd/smb2_server.c:621(smb2_validate_sequence_number) 
> > smb2_validate_sequence_number: clearing id 3287320 (position 2328) from bitmap
> > [2014/09/24 16:15:04.433242, 10, pid=2495, effective(547, 500), real(547, 0)]
> > ../source3/smbd/smb2_server.c:1878(smbd_smb2_request_dispatch) 
> > smbd_smb2_request_dispatch: opcode[SMB2_OP_NOTIFY] mid = 3287320
> > [2014/09/24 16:15:04.433285,  4, pid=2495, effective(547, 500), real(547, 0)]
> > ../source3/smbd/uid.c:384(change_to_user)  Skipping user change - already user
> > [2014/09/24 16:15:04.433325, 10, pid=2495, effective(547, 500), real(547, 0)]
> > ../source3/smbd/smb2_server.c:1780(smbd_smb2_request_verify_creditcharge)  mid
> > 3287320, CreditCharge: 1, NeededCharge: 2
> > [2014/09/24 16:15:04.433393,  2, pid=2495, effective(547, 500), real(547, 0)]
> > ../source3/smbd/smb2_server.c:1784(smbd_smb2_request_verify_creditcharge) 
> > CreditCharge too low, given 1, needed 2
> > [2014/09/24 16:15:04.433435, 10, pid=2495, effective(547, 500), real(547, 0)]
> > ../source3/smbd/smb2_server.c:2598(smbd_smb2_request_error_ex) 
> > smbd_smb2_request_error_ex: idx[1] status[NT_STATUS_INVALID_PARAMETER] || at
> > ../source3/smbd/smb2_notify.c:85
> > [2014/09/24 16:15:04.433471, 10, pid=2495, effective(547, 500), real(547, 0)]
> > ../source3/smbd/smb2_server.c:2499(smbd_smb2_request_done_ex) 
> > smbd_smb2_request_done_ex: idx[1] status[NT_STATUS_INVALID_PARAMETER] body[8]
> > dyn[yes:1] at ../source3/smbd/smb2_server.c:2651
> > [2014/09/24 16:15:04.433516, 10, pid=2495, effective(547, 500), real(547, 0)]
> > ../source3/smbd/smb2_server.c:874(smb2_set_operation_credit) 
> > smb2_set_operation_credit: requested 1, charge 1, granted 1, current
> > possible/max 482/512, total granted/max/low/range 31/8192/3287321/31
> > ---------------------------------
> > 
> > -The clients are Windows 7 pro x64
> 
> Did you set DisableLargeMtu = 0 under 
> HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters ?
> 
> See
> http://blogs.technet.com/b/josebda/archive/2010/08/27/performance-tuning-guidelines-for-windows-server-2008-r2.aspx
> for details about this.
> 
> In order to debug this it would be good to know how the client was configured.
> 
> We you used the default, then DisableLargeMtu = 0 might change things.

In either way please also try the oposite of the current setting.

Thanks!
Comment 4 Jeremy Allison 2014-09-25 23:42:44 UTC
Created attachment 10307 [details]
Fix from Volker (modified).

Can you try this raw patch (originally from Volker, modified by me) ?

Neither we nor Microsoft think this is the right fix :-), but if it gets you going temporarily at least it'll stop disruption to your file services.

Cheers,

Jeremy.
Comment 5 Jeremy Allison 2014-09-25 23:43:55 UTC
Created attachment 10308 [details]
Correct version..

Sorry, posted the wrong version. Here is the version to try.
Comment 6 Volker Lendecke 2014-09-26 02:47:13 UTC
jkoop, if you're in the position to do so, please file a bug with Microsoft. We have very solid confirmation that this *is* a bug in Windows 7 that will be fixed in whatever the next version of Windows will be. Backporting however requires some customer to complain. It's extremely likely that this will also happen against a Windows server.
Comment 7 Volker Lendecke 2014-09-26 02:48:18 UTC
(In reply to comment #6)
> jkoop, if you're in the position to do so, please file a bug with Microsoft. We
> have very solid confirmation that this *is* a bug in Windows 7 that will be
> fixed in whatever the next version of Windows will be. Backporting however
> requires some customer to complain. It's extremely likely that this will also
> happen against a Windows server.

By the way, AutoCAD also has their share in this bug probably. They should not retry this call in an endless loop. You should see some CPU load on the client as well.
Comment 8 jkoop 2014-09-26 09:39:38 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #0)
> > > Created attachment 10305 [details] [details] [details]
> > > wireshark dump of the looping requests
> > > 
> > > The CPU load on the server is very high, processes that are causing it are
> > > nslcd or since I changed to it sssd and the smbd processes themselves.
> > > 
> > > Top looks like this:
> > > ---------------------------------
> > > PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > > 2216 root      20   0  200m 5820 3024 R 31.2  0.1  44:47.46 sssd_nss
> > > 3631 XXXXX  20   0  430m  10m 8472 R 17.3  0.3  12:37.12 smbd
> > > 2870 XXXXX  20   0  434m  17m  11m R 16.3  0.5  21:22.22 smbd
> > > 3603 XXXXX  20   0  430m 9.8m 7492 R 16.3  0.3  14:19.45 smbd
> > > 3250 XXXXX  20   0  432m  14m 9384 R 15.9  0.4  12:17.09 smbd
> > > 3616 XXXXX  20   0  430m  11m 8452 R 15.9  0.3  13:37.91 smbd
> > > 2842 XXXXX  20   0  435m  21m  13m R 15.6  0.6  22:24.76 smbd
> > > 3246 XXXXX  20   0  430m  12m 9684 S 15.6  0.3  15:35.65 smbd
> > > 3267 XXXXX  20   0  433m  16m  11m S 15.6  0.4  15:04.27 smbd
> > > 1637 XXXXX  20   0  425m  15m  13m S 14.9  0.4  13:32.95 smbd
> > > 2879 XXXXX  20   0  435m  18m  10m R 14.9  0.5  19:01.03 smbd
> > > 2899 XXXXX  20   0  434m  17m  11m R 14.9  0.5  17:24.44 smbd
> > > 2838 XXXXX  20   0  433m  17m  11m R 14.3  0.5  11:46.12 smbd
> > > 1657 XXXXX  20   0  429m  19m  12m R 13.9  0.5  15:17.44 smbd
> > > 2841 XXXXX  20   0  434m  17m  10m R 13.9  0.5  14:31.61 smbd
> > > 2221 XXXXX  20   0  429m  15m 9972 R 13.6  0.4  14:51.52 smbd
> > > 2495 XXXXX  20   0  425m  12m 9.8m R 13.6  0.3  12:17.84 smbd
> > > 2837 XXXXX  20   0  434m  20m  13m R 13.6  0.5  17:46.70 smbd
> > > 2869 XXXXX  20   0  432m  12m 7920 R 13.6  0.3  19:38.08 smbd
> > > 3605 XXXXX  20   0  433m  15m 9.9m R 13.6  0.4  11:32.85 smbd
> > > 2277 XXXXX    20   0  429m  13m  10m R 13.3  0.4   2:53.38 smbd
> > > 2872 XXXXX   20   0  434m  17m  10m R 13.3  0.5  18:19.79 smbd
> > > 2215 root      20   0  212m 4900 3756 R 11.0  0.1  16:57.29 sssd_be
> > > 2340 root      20   0  435m  22m  13m R  4.0  0.6   6:42.97 smbd
> > > 2377 root      20   0  436m  20m  11m R  4.0  0.5   4:11.66 smbd
> > > …
> > > …
> > > ---------------------------------
> > > 
> > > When I tshark the network traffic I always see these packets:
> > > ---------------------------------
> > > Seconds   packet type
> > > 1.507312 SMB2 258 Create Request File: a\directory
> > > 1.508001 SMB2 242 Create Response File: a\directory
> > > 1.509120 SMB2 154 Notify Request File: a\directory
> > > 1.509786 SMB2 131 Notify Response, Error: STATUS_INVALID_PARAMETER
> > > 1.510548 SMB2 258 Create Request File: a\directory 
> > > <starting over and over and over>
> > > ---------------------------------
> > > A debug level of 10 on such an smbd gives me:
> > > ---------------------------------
> > > [2014/09/24 16:15:04.429169, 10, pid=2495, effective(547, 500), real(547, 0)]
> > > ../source3/smbd/smb2_server.c:874(smb2_set_operation_credit) 
> > > smb2_set_operation_credit: requested 1, charge 1, granted 1, current
> > > possible/max 482/512, total granted/max/low/range 31/8192/3287320/31
> > > [2014/09/24 16:15:04.433078, 10, pid=2495, effective(547, 500), real(547, 0)]
> > > ../source3/smbd/smb2_server.c:3257(smbd_smb2_io_handler)  smbd_smb2_request
> > > idx[1] of 5 vectors
> > > [2014/09/24 16:15:04.433153, 10, pid=2495, effective(547, 500), real(547, 0)]
> > > ../source3/smbd/smb2_server.c:621(smb2_validate_sequence_number) 
> > > smb2_validate_sequence_number: clearing id 3287320 (position 2328) from bitmap
> > > [2014/09/24 16:15:04.433242, 10, pid=2495, effective(547, 500), real(547, 0)]
> > > ../source3/smbd/smb2_server.c:1878(smbd_smb2_request_dispatch) 
> > > smbd_smb2_request_dispatch: opcode[SMB2_OP_NOTIFY] mid = 3287320
> > > [2014/09/24 16:15:04.433285,  4, pid=2495, effective(547, 500), real(547, 0)]
> > > ../source3/smbd/uid.c:384(change_to_user)  Skipping user change - already user
> > > [2014/09/24 16:15:04.433325, 10, pid=2495, effective(547, 500), real(547, 0)]
> > > ../source3/smbd/smb2_server.c:1780(smbd_smb2_request_verify_creditcharge)  mid
> > > 3287320, CreditCharge: 1, NeededCharge: 2
> > > [2014/09/24 16:15:04.433393,  2, pid=2495, effective(547, 500), real(547, 0)]
> > > ../source3/smbd/smb2_server.c:1784(smbd_smb2_request_verify_creditcharge) 
> > > CreditCharge too low, given 1, needed 2
> > > [2014/09/24 16:15:04.433435, 10, pid=2495, effective(547, 500), real(547, 0)]
> > > ../source3/smbd/smb2_server.c:2598(smbd_smb2_request_error_ex) 
> > > smbd_smb2_request_error_ex: idx[1] status[NT_STATUS_INVALID_PARAMETER] || at
> > > ../source3/smbd/smb2_notify.c:85
> > > [2014/09/24 16:15:04.433471, 10, pid=2495, effective(547, 500), real(547, 0)]
> > > ../source3/smbd/smb2_server.c:2499(smbd_smb2_request_done_ex) 
> > > smbd_smb2_request_done_ex: idx[1] status[NT_STATUS_INVALID_PARAMETER] body[8]
> > > dyn[yes:1] at ../source3/smbd/smb2_server.c:2651
> > > [2014/09/24 16:15:04.433516, 10, pid=2495, effective(547, 500), real(547, 0)]
> > > ../source3/smbd/smb2_server.c:874(smb2_set_operation_credit) 
> > > smb2_set_operation_credit: requested 1, charge 1, granted 1, current
> > > possible/max 482/512, total granted/max/low/range 31/8192/3287321/31
> > > ---------------------------------
> > > 
> > > -The clients are Windows 7 pro x64
> > 
> > Did you set DisableLargeMtu = 0 under 
> > HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters ?
> > 
> > See
> > http://blogs.technet.com/b/josebda/archive/2010/08/27/performance-tuning-guidelines-for-windows-server-2008-r2.aspx
> > for details about this.
> > 
> > In order to debug this it would be good to know how the client was configured.
> > 
> > We you used the default, then DisableLargeMtu = 0 might change things.
> 
> In either way please also try the oposite of the current setting.
> 
> Thanks!

Hi,
I tried both settings, 0 and 1 to DisableLargeMtu with rebooting in between. The result was the same.
Comment 9 jkoop 2014-09-26 11:37:36 UTC
Since I saw that on our machines running Samba 3.6 this problem doesn't occur with the same client machines I just disabled SMB2 and SMB3 as a temporary workaround on our production servers.
I didn't want to try "change notify = no" on production, so I put
server max protocol = NT1
instead of down-grading to 3.6.
Now the behaviour is gone along with all SMB2 / 3 Features ;), but it eliminates the pressure. All file-servers with SMB2 enabled samba hosting AutoCAD 2011 drawings will receive that option.

I will try Volker's patch in a test environment.
I have to check if we can file a bug report with Microsoft, I've never done that before.
Comment 10 jkoop 2014-11-14 09:30:33 UTC
Hello,

it took me a while to implement but I can confirm that the strange behavior is gone after the patch!!! No more endless loops, no more high CPU load and excessive LDAP queries.

I tried the patch on 4.1.12 and 4.1.13 on multiple systems and have 4.1.13 in production.

For me this works better than disabling SMB2. Although I know it is a workaround for a windows client bug and the original behavior is altered, I'd love to see this as a configurable option in a future release.

As a side note, I received a report that some printing machine had troubles connecting to a smb share for dumping scan data after the update from 3.6 to 4.1. This was gone after the patch, but I wasn't able to track that down reliably, but it might be related.

Cheers,
Jan
Comment 11 Volker Lendecke 2015-02-06 16:02:49 UTC
Created attachment 10708 [details]
configurable patch for 4.1
Comment 12 Volker Lendecke 2015-02-06 16:04:06 UTC
Created attachment 10709 [details]
configurable patch for master

This is a version of this patch with a config option for master. As it's unlikely that Microsoft or Autodesk will fix their software and we want to host Autocad files, I'd love to see this patch in
Comment 13 Richard Sharpe 2015-02-25 21:19:37 UTC
I think we saw this at Panzura with Samba 3.6.12 as well.

We fixed it by silently allowing it, I think, but forgot to file a bug report.
Comment 14 Richard Sharpe 2015-02-25 21:22:48 UTC
(In reply to Richard Sharpe from comment #13)

Actually, not the same problem in Samba 3.6.x. Windows was asking for more data than we could put in a response. I think it was crashing Samba 3.6.x.