Bug 1944 - NTBackup fails under XP Home SP2 after a few minutes
Status: CLOSED FIXED
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.7
Hardware: x86 Linux
Importance: P3 major
Target Milestone: none
Assigned To: Samba Bugzilla Account
QA Contact: Samba QA Contact
Depends on:
Blocks:
Reported: 2004-10-16 23:54 UTC by Paul Johnson
Modified: 2005-08-24 10:21 UTC
CC List: 0 users

See Also:


Description Paul Johnson 2004-10-16 23:54:33 UTC
Linux environment: Samba 3.0.7 on Fedora Core 2, kernel 2.6.8-1.521smp running
on a P4.

Windows environment: XP Home SP2 running on a P4.  NT Backup utility version
unknown, but it was installed from the XP CD.  I don't know if MS Update has
changed it.

How to reproduce: configure a share "backup" on the Linux box.  Tell NT Backup
on the XP box to write a backup file to the "backup" share.  The backup proceeds
at around 3 MB/s for a few minutes (a few hundred MB), then hangs.  After a
minute or so the backup terminates with the following message in the log:

   Error: The device reported an error on a request to write data to media.
   Error reported: Unknown Error

Relevant parts of smb.conf:
-----------------8<--------------8<--------------
# Global parameters
[global]
        workgroup = HOUSE
        security = share
        server string = Paul's Linux box
        null passwords = Yes
        log file = /var/log/samba/%m.log
        max log size = 50
        name resolve order = wins lmhosts bcast
        socket options = TCP_NODELAY IPTOS_LOWDELAY
        preferred master = No
        domain master = No
        large readwrite = yes
        dns proxy = No
        ldap ssl = no
        hosts allow = 192.168.1., 192.168.2., 192.168.123., 127.

[backup]
        comment = Backup disk
        path = /mnt/backup
        read only = No
        guest ok = Yes
-----------------8<--------------8<--------------

I ran tcpdump during one run and captured the following at the point of
failure.  However, tcpdump was only capturing about half the packets.  If you
can't replicate this, let me know and I'll see if I can get better data.

-----------------8<--------------8<--------------
21:30:51.344526 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293353638:293355098(1460) ack 530506 win 64325 NBT Packet
21:30:51.344655 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293355098:293356558(1460) ack 530506 win 64325 NBT Packet
21:30:51.344680 IP 192.168.123.200.netbios-ssn > 192.168.123.101.1267: . ack 293356558 win 32767
21:30:51.344779 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293356558:293358018(1460) ack 530506 win 64325 NBT Packet
21:30:51.344904 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293359478:293360938(1460) ack 530506 win 64325 NBT Packet
21:30:51.344932 IP 192.168.123.200.netbios-ssn > 192.168.123.101.1267: . ack 293358018 win 32767 <nop,nop,sack sack 1 {293359478:293360938} >
21:30:51.345033 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293362398:293363858(1460) ack 530506 win 64325 NBT Packet
21:30:51.345054 IP 192.168.123.200.netbios-ssn > 192.168.123.101.1267: . ack 293358018 win 32767 <nop,nop,sack sack 2 {293362398:293363858}{293359478:293360938} >
21:30:51.345158 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293365318:293366778(1460) ack 530506 win 64325 NBT Packet
21:30:51.345179 IP 192.168.123.200.netbios-ssn > 192.168.123.101.1267: . ack 293358018 win 32767 <nop,nop,sack sack 3 {293365318:293366778}{293362398:293363858}{293359478:293360938} >
21:30:51.345301 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293366778:293368238(1460) ack 530506 win 64325 NBT Packet
21:30:51.345411 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293369698:293371158(1460) ack 530506 win 64325 NBT Packet
21:31:17.325761 IP 192.168.123.200.netbios-dgm > 192.168.123.255.netbios-dgm: NBT UDP PACKET(138)
21:31:17.325852 IP 192.168.123.200.netbios-ns > 192.168.123.255.netbios-ns: NBT UDP PACKET(137): QUERY; REQUEST; BROADCAST
21:31:17.325968 IP 192.168.123.101.netbios-ns > 192.168.123.200.netbios-ns: NBT UDP PACKET(137): QUERY; POSITIVE; RESPONSE; UNICAST
21:31:17.326068 IP 192.168.123.200.32819 > 192.168.123.254.domain:  42896+ PTR? 255.123.168.192.in-addr.arpa. (46)
21:31:17.363154 IP 192.168.123.254.domain > 192.168.123.200.32819:  42896 NXDomain* 0/1/0 (114)
21:31:45.437128 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: P 294001672:294001725(53) ack 531567 win 64729 NBT Packet
21:31:45.437158 IP 192.168.123.200.netbios-ssn > 192.168.123.101.1267: . ack 294001725 win 32767

-----------------8<--------------8<--------------
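The SACK options in the trace can be decoded into the byte ranges the server still lacked at that moment. Below is a minimal Python sketch, which is not part of the original report. The helper `sack_holes` is hypothetical. It assumes the absolute sequence numbers tcpdump printed, ignores sequence-number wraparound, and is fed the cumulative ACK and SACK blocks from the 21:30:51.345179 packet.

```python
def sack_holes(cum_ack, sack_blocks):
    """Return the (start, end) byte ranges the receiver has not yet received.

    cum_ack: cumulative ACK number (all bytes below it have arrived).
    sack_blocks: (left, right) pairs from the SACK option; right is exclusive.
    Hypothetical helper for illustration; ignores sequence-number wraparound.
    """
    holes = []
    expected = cum_ack
    for left, right in sorted(sack_blocks):
        if left > expected:
            # Bytes between what we expected and this SACKed block are missing.
            holes.append((expected, left))
        expected = max(expected, right)
    return holes


# Values copied from the 21:30:51.345179 ACK sent by 192.168.123.200.
cum_ack = 293358018
blocks = [(293365318, 293366778), (293362398, 293363858), (293359478, 293360938)]

for start, end in sack_holes(cum_ack, blocks):
    print(f"missing {start}:{end} ({end - start} bytes)")
```

With these values each hole is exactly one 1460-byte segment, so alternate full-sized segments had not yet arrived. Because the SACKs are sent by 192.168.123.200, they describe the server's own receive state rather than only packets the capture missed.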

The log file in /var/log/samba contains the following entries, which may be
relevant, although the times do not seem to coincide.

-----------------8<--------------8<--------------
[2004/10/16 21:22:05, 1] smbd/service.c:make_connection_snum(648)
  treetop (192.168.123.101) connect to service backup initially as user nobody (uid=99, gid=99) (pid 3964)
[2004/10/16 21:32:19, 1] smbd/service.c:make_connection_snum(648)
  treetop (192.168.123.101) connect to service backup initially as user nobody (uid=99, gid=99) (pid 4057)
[2004/10/16 21:37:58, 1] smbd/service.c:close_cnum(837)
  treetop (192.168.123.101) closed connection to service backup
[2004/10/16 22:44:33, 1] smbd/service.c:make_connection_snum(648)
  treetop (192.168.123.101) connect to service backup initially as user nobody (uid=99, gid=99) (pid 4596)
[2004/10/16 22:48:24, 1] smbd/service.c:close_cnum(837)
  treetop (192.168.123.101) closed connection to service backup
-----------------8<--------------8<--------------
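Lining up the timestamps from the trace and the log puts a number on the stall. The following minimal Python sketch is not from the report. It assumes tcpdump ran on the Samba server, so that the trace and the smbd log share one clock, although the report does not say where the capture was taken. The log also has only one-second resolution.

```python
from datetime import datetime

# Timestamps copied from the tcpdump trace and the Samba log above.
# Assumption: both come from the same clock. The log entry is padded to
# microseconds because smbd logs whole seconds only.
EVENTS = [
    ("21:30:51.345411", "last captured data segment from the client"),
    ("21:31:45.437128", "next captured SMB packet from the client"),
    ("21:32:19.000000", "Samba log: new connection to [backup] (pid 4057)"),
]


def elapsed(events):
    """Yield (description, seconds since the previous event) for each event after the first."""
    times = [datetime.strptime(ts, "%H:%M:%S.%f") for ts, _ in events]
    for (_, what), prev, cur in zip(events[1:], times, times[1:]):
        yield what, (cur - prev).total_seconds()


for what, secs in elapsed(EVENTS):
    print(f"+{secs:6.1f} s  {what}")
```

For these values, about 54 seconds pass between the last captured data segment and the next SMB packet.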

Hope this helps.
Comment 1 Viktor Mihajlovski 2005-01-10 07:26:31 UTC
I'd like to confirm Paul's observation with a newer version of Samba (3.0.9)
on Fedora Core 3, kernel 2.6.9, as the server and WinXP Home SP2 as the client.
Both machines are AMD K7s on SIS476-based motherboards with onboard SIS900 NICs,
connected via the switch of a DLINK DI604 WAN router.

The problem occurs with backups larger than a few hundred MB.

What happens is that, once in a while, the "worker" smbd process goes into the
"D" state (uninterruptible sleep) while a considerable amount of data is waiting
in its socket's receive queue. Depending on the length of the sleep (usually
around 30 seconds), NTBACKUP may decide to bail out, which it regularly does.
The Samba log files contain varying error entries; "connection reset by peer" is
the most common one.
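On Linux, the condition described here (an smbd in state D with data queued on its socket) can be spotted from /proc. The following minimal Python sketch is only an illustration, since the comment does not say which tools were used. The helper names and the choice of ports 139 and 445 are assumptions. The sketch reads the state field of /proc/<pid>/stat and the rx_queue column of /proc/net/tcp, which covers IPv4 sockets only.

```python
import os

SMB_PORTS = {139, 445}  # assumption: netbios-ssn and direct-hosted SMB


def parse_stat(line):
    """Parse a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may contain spaces or ')', so use
    # the first '(' and the *last* ')' to delimit it.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 2]  # single character, e.g. "S", "R", "D"
    return pid, comm, state


def smb_rx_queues(tcp_table):
    """Yield (local_port, rx_queue_bytes) for SMB sockets in /proc/net/tcp text."""
    for row in tcp_table.splitlines()[1:]:  # first line is the column header
        fields = row.split()
        if len(fields) < 5:
            continue
        local_port = int(fields[1].split(":")[1], 16)  # "ADDR:PORT" in hex
        rx_queue = int(fields[4].split(":")[1], 16)    # "tx_queue:rx_queue" in hex
        if local_port in SMB_PORTS:
            yield local_port, rx_queue


def snapshot(name="smbd"):
    """Print smbd processes in state D and SMB sockets with unread input (Linux only)."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if comm == name and state == "D":
            print(f"{comm} pid {pid} is in uninterruptible sleep (D)")
    with open("/proc/net/tcp") as f:
        for port, queued in smb_rx_queues(f.read()):
            if queued:
                print(f"port {port}: {queued} bytes waiting in the receive queue")
```

Calling `snapshot()` repeatedly on the server during a large backup or copy would show whether the two conditions occur together.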

Note that the smbd processes fall asleep not only with NTBACKUP but also when
copying large files (>1 GB) to the Samba server. The file copy fails less often,
but when it does, the result is even worse: the copy on the server has the
expected file size, which creates the impression that the operation succeeded,
while its content is totally corrupted.
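A matching file size is no evidence of a good copy here, so catching this case requires comparing the file contents. The following minimal Python sketch is not from the report. It hashes both files with SHA-256 (any strong checksum would do) and assumes both the original and the copy are reachable as local paths, for example through a mounted share. `verify_copy` is a hypothetical helper.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source, copy):
    """Return a short verdict comparing a file with its copy.

    Sizes are checked first because that is cheap, but equal sizes prove
    nothing on their own, so the contents are hashed as well.
    """
    if os.path.getsize(source) != os.path.getsize(copy):
        return "size mismatch"
    if sha256_of(source) != sha256_of(copy):
        return "same size, different content"
    return "identical"
```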

Copying arbitrarily large files from the server to the client does not trigger
this behaviour.

I would have liked to raise the severity to critical, as the failure results in
data loss, effectively breaking my backup strategy.
Comment 2 Gerald (Jerry) Carter 2005-02-08 07:24:43 UTC
Please retest against 3.0.11 and reopen if the bug still exists.  Thanks.
Comment 3 Gerald (Jerry) Carter 2005-08-24 10:21:29 UTC
Sorry for the spam; cleaning up the database to prevent unnecessary reopens of bugs.