Linux environment: Samba 3.0.7 on Fedora Core 2, kernel 2.6.8-1.521smp, running on a P4.
Windows environment: XP Home SP2 running on a P4. NT Backup utility version unknown, but it was installed from the XP CD. I don't know whether MS Update has changed it.

How to reproduce: configure a share "backup" on the Linux box. Tell NT Backup on the XP box to write a backup file to the "backup" share. The backup proceeds at around 3 MB/s for a few minutes and a few hundred MB, then hangs. After a minute or so the backup terminates with the following message in the log:

Error: The device reported an error on a request to write data to media.
Error reported: Unknown Error

Relevant parts of smb.conf:

-----------------8<--------------8<--------------
# Global parameters
[global]
	workgroup = HOUSE
	security = share
	server string = Paul's Linux box
	null passwords = Yes
	log file = /var/log/samba/%m.log
	max log size = 50
	name resolve order = wins lmhosts bcast
	socket options = TCP_NODELAY IPTOS_LOWDELAY
	preferred master = No
	domain master = No
	large readwrite = yes
	dns proxy = No
	ldap ssl = no
	hosts allow = 192.168.1., 192.168.2., 192.168.123., 127.

[backup]
	comment = Backup disk
	path = /mnt/backup
	read only = No
	guest ok = Yes
-----------------8<--------------8<--------------

I ran tcpdump on one run and got the following at the point of failure. However, tcpdump was only capturing about half the packets. If you can't replicate this, let me know and I'll see if I can get better data.

-----------------8<--------------8<--------------
21:30:51.344526 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293353638:293355098(1460) ack 530506 win 64325 NBT Packet
21:30:51.344655 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293355098:293356558(1460) ack 530506 win 64325 NBT Packet
21:30:51.344680 IP 192.168.123.200.netbios-ssn > 192.168.123.101.1267: . ack 293356558 win 32767
21:30:51.344779 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293356558:293358018(1460) ack 530506 win 64325 NBT Packet
21:30:51.344904 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293359478:293360938(1460) ack 530506 win 64325 NBT Packet
21:30:51.344932 IP 192.168.123.200.netbios-ssn > 192.168.123.101.1267: . ack 293358018 win 32767 <nop,nop,sack sack 1 {293359478:293360938} >
21:30:51.345033 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293362398:293363858(1460) ack 530506 win 64325 NBT Packet
21:30:51.345054 IP 192.168.123.200.netbios-ssn > 192.168.123.101.1267: . ack 293358018 win 32767 <nop,nop,sack sack 2 {293362398:293363858}{293359478:293360938} >
21:30:51.345158 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293365318:293366778(1460) ack 530506 win 64325 NBT Packet
21:30:51.345179 IP 192.168.123.200.netbios-ssn > 192.168.123.101.1267: . ack 293358018 win 32767 <nop,nop,sack sack 3 {293365318:293366778}{293362398:293363858}{293359478:293360938} >
21:30:51.345301 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293366778:293368238(1460) ack 530506 win 64325 NBT Packet
21:30:51.345411 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: . 293369698:293371158(1460) ack 530506 win 64325 NBT Packet
21:31:17.325761 IP 192.168.123.200.netbios-dgm > 192.168.123.255.netbios-dgm: NBT UDP PACKET(138)
21:31:17.325852 IP 192.168.123.200.netbios-ns > 192.168.123.255.netbios-ns: NBT UDP PACKET(137): QUERY; REQUEST; BROADCAST
21:31:17.325968 IP 192.168.123.101.netbios-ns > 192.168.123.200.netbios-ns: NBT UDP PACKET(137): QUERY; POSITIVE; RESPONSE; UNICAST
21:31:17.326068 IP 192.168.123.200.32819 > 192.168.123.254.domain: 42896+ PTR? 255.123.168.192.in-addr.arpa. (46)
21:31:17.363154 IP 192.168.123.254.domain > 192.168.123.200.32819: 42896 NXDomain* 0/1/0 (114)
21:31:45.437128 IP 192.168.123.101.1267 > 192.168.123.200.netbios-ssn: P 294001672:294001725(53) ack 531567 win 64729 NBT Packet
21:31:45.437158 IP 192.168.123.200.netbios-ssn > 192.168.123.101.1267: . ack 294001725 win 32767
-----------------8<--------------8<--------------

The /var/log/samba log file contains the following entries that may be relevant. However, the times do not seem to coincide.

-----------------8<--------------8<--------------
[2004/10/16 21:22:05, 1] smbd/service.c:make_connection_snum(648)
  treetop (192.168.123.101) connect to service backup initially as user nobody ( uid=99, gid=99) (pid 3964)
[2004/10/16 21:32:19, 1] smbd/service.c:make_connection_snum(648)
  treetop (192.168.123.101) connect to service backup initially as user nobody ( uid=99, gid=99) (pid 4057)
[2004/10/16 21:37:58, 1] smbd/service.c:close_cnum(837)
  treetop (192.168.123.101) closed connection to service backup
[2004/10/16 22:44:33, 1] smbd/service.c:make_connection_snum(648)
  treetop (192.168.123.101) connect to service backup initially as user nobody ( uid=99, gid=99) (pid 4596)
[2004/10/16 22:48:24, 1] smbd/service.c:close_cnum(837)
  treetop (192.168.123.101) closed connection to service backup
-----------------8<--------------8<--------------

Hope this helps.
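In the capture, the server keeps sending a cumulative ACK of 293358018 while its SACK blocks report later segments as received. From the server's own point of view, this means some data between those points had not yet arrived. The capture itself cannot show whether those gaps were later retransmitted, because tcpdump dropped about half the packets. The minimal sketch below computes the holes from the cumulative ACK and SACK blocks. It assumes the values are copied by hand from the last server ACK before the stall (21:30:51.345179). `sack_holes` is a hypothetical helper, and sequence-number wraparound is ignored.

```python
from typing import List, Tuple

# A range is half-open [start, end) in TCP sequence space, as tcpdump prints it.
Range = Tuple[int, int]


def sack_holes(cum_ack: int, sack_blocks: List[Range]) -> List[Range]:
    """Return the sequence ranges not yet received between the cumulative
    ACK and the highest SACKed byte (wraparound ignored)."""
    holes: List[Range] = []
    expected = cum_ack  # first byte the receiver still needs
    for start, end in sorted(sack_blocks):
        if start > expected:
            holes.append((expected, start))  # gap before this SACK block
        expected = max(expected, end)
    return holes


# Values from the server ACK at 21:30:51.345179 in the capture above.
cum_ack = 293358018
sacks = [(293365318, 293366778), (293362398, 293363858), (293359478, 293360938)]

for start, end in sack_holes(cum_ack, sacks):
    print(f"missing {start}:{end} ({end - start} bytes)")
```

For that ACK, the sketch reports three 1460-byte holes, starting at 293358018, 293360938 and 293363858.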
I'd like to confirm Paul's observation with a newer version of Samba (3.0.9) under Fedora Core 3, kernel 2.6.9, as the server and WinXP Home SP2 as the client. Both machines are AMD K7s on SIS476-based motherboards with onboard SIS900 NICs, connected via the switch of a DLINK DI604 WAN router. The problem occurs when doing backups larger than a few hundred MB.

What happens is that every so often the "worker" smbd goes into the "D" state (uninterruptible sleep) while a considerable amount of data is waiting in the receive queue of its socket. Depending on the length of the sleep (usually around 30 seconds), NTBACKUP may decide to bail out, which it regularly does. The Samba log files contain varying error entries; "connection reset by peer" is the most common one.

Note that the smbds fall asleep not only with NTBACKUP but also when copying big files (>1 GB) to the Samba server. The file copy fails less often, but when it does the result is even worse: the copy on the server has the expected file size, which gives the impression that the operation succeeded, while its content is completely corrupted. Copying arbitrarily large files from the server to the client does not show this behaviour.

I would have liked to raise the severity to critical, as the failure results in data loss, effectively breaking my backup strategy.
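One way to catch the condition described above on the server is to poll the worker smbd's process state together with the receive queues of SMB sockets. The minimal sketch below relies on two facts about Linux procfs. First, in /proc/<pid>/stat the state letter is the first field after the parenthesised command name. Second, /proc/net/tcp shows `tx_queue:rx_queue` in hex in its fifth column, with state `01` meaning ESTABLISHED. The helper names are hypothetical. /proc/net/tcp lists every TCP socket on the host rather than only those owned by the given PID, so the queues are filtered by local port (139/445) instead of by process.

```python
import time
from typing import Iterable, List, Tuple

SMB_PORTS = {139, 445}   # netbios-ssn and SMB directly over TCP
TCP_ESTABLISHED = "01"   # connection state code used in /proc/net/tcp


def process_state(stat_line: str) -> str:
    """Return the one-letter state (R, S, D, ...) from a /proc/<pid>/stat line.
    The command name may contain spaces, so parse after the last ')'."""
    return stat_line[stat_line.rindex(")") + 1:].split()[0]


def smb_rx_queues(tcp_lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (local_port, rx_queue_bytes) for established SMB sockets
    found in the lines of /proc/net/tcp."""
    result: List[Tuple[int, int]] = []
    for line in tcp_lines:
        fields = line.split()
        if len(fields) < 5 or fields[0] == "sl":
            continue  # header or malformed line
        if fields[3] != TCP_ESTABLISHED:
            continue  # skip listening and closing sockets
        local_port = int(fields[1].split(":")[1], 16)
        if local_port not in SMB_PORTS:
            continue
        rx_queue = int(fields[4].split(":")[1], 16)
        result.append((local_port, rx_queue))
    return result


def snapshot(pid: int) -> Tuple[str, List[Tuple[int, int]]]:
    """Read the current state of `pid` and the SMB receive queues once."""
    with open(f"/proc/{pid}/stat") as f:
        state = process_state(f.read())
    with open("/proc/net/tcp") as f:
        queues = smb_rx_queues(f)
    return state, queues


def watch(pid: int, interval: float = 1.0) -> None:
    """Poll forever; print a line whenever the process is in D state."""
    while True:
        state, queues = snapshot(pid)
        if state == "D":
            print(time.strftime("%H:%M:%S"), f"pid {pid} in D state, SMB Recv-Q:", queues)
        time.sleep(interval)
```

Usage: call `watch(<pid>)` with the PID of the worker smbd (for example, as shown by `ps`) while the backup is running. `netstat -tn` shows the same Recv-Q column if a one-off look is enough.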
Please retest against 3.0.11 and reopen if the bug still exists. Thanks.
Sorry for the spam; cleaning up the database to prevent unnecessary reopens of bugs.