Upgraded to Samba 3.2.6, and immediately printing went into coma. Observed that printers lose their configurations (driver models, locations, etc). And if I reset the printer drivers on each share, eventually they will be lost again. Looking at the logs we are getting panics like this: --- [2008/12/16 05:12:51, 0] lib/util_tdb.c:tdb_log(682) tdb(/var/lib/samba/printing/CanonC1.tdb): tdb_reopen: file dev/inode has changed! [2008/12/16 05:12:51, 0] lib/util.c:reinit_after_fork(1050) tdb_reopen_all failed. [2008/12/16 05:12:51, 0] printing/print_cups.c:cups_pcap_load_async(392) cups_pcap_load_async: reinit_after_fork() failed [2008/12/16 05:12:51, 0] lib/util.c:smb_panic(1663) PANIC (pid 2590): cups_pcap_load_async: reinit_after_fork() failed [2008/12/16 05:12:51, 0] lib/util.c:log_stack_trace(1767) BACKTRACE: 11 stack frames: #0 smbd(log_stack_trace+0x2d) [0x479c2f] #1 smbd(smb_panic+0x84) [0x479d71] #2 smbd(cups_cache_reload+0x299) [0x440d2c] #3 smbd(pcap_cache_reload+0x107) [0x43d189] #4 smbd(reload_printers+0x31) [0x6c477a] #5 smbd(check_reload+0x130) [0x2c6510] #6 smbd [0x2c655f] #7 smbd(smbd_process+0xc92) [0x2c8a76] #8 smbd(main+0x212c) [0x6c70ec] #9 /lib/libc.so.6(__libc_start_main+0xdc) [0xdd9dec] #10 smbd [0x24c061] [2008/12/16 05:12:51, 0] lib/fault.c:dump_core(201) dumping core in /var/log/samba/cores/smbd --- They come in pairs, and always relate to one of the printers tdb file at /var/lib/samba/printing/ . I've already checked the tdb files for corruption and cleaned some of them (the ones I can), with no luck.
Should add we were upgrading from Samba 3.2.3, and configuration hasn't been changed since 3.2.1. Also, we are running it in Centos 5.2 fully updated.
> This message is the key : > >> tdb(/var/lib/samba/printing/CanonC1.tdb): tdb_reopen: file dev/inode >> has >> changed! >> [2008/12/16 05:12:51, 0] lib/util.c:reinit_after_fork(1050) >> tdb_reopen_all failed. > > which causes a deliberate panic in the code here : > > /* Child. */ > if (!reinit_after_fork(smbd_messaging_context(), true)) { > DEBUG(0,("cups_pcap_load_async: reinit_after_fork() > failed\n")); > smb_panic("cups_pcap_load_async: reinit_after_fork() > failed"); > } > > How many printers do you have ? Do you have some sort of a batch > job running that may delete printing tdb's ? > > Jeremy. > There are five printers, four of which with p&p drivers. And no, no batch running, nothing that even touches the tdb files. A ten minuet session of lsof shows only smbd accesses those tdb files. I would be surprised if it was otherwise, since only Windows clients ever hit this server. /Pedro
This : tdb(/var/lib/samba/printing/CanonC1.tdb): tdb_reopen: file dev/inode has changed! is the line that confuses me. On a reopen, how can the dev/inode have changed unless the file has been deleted ? What filesystem type are you using for var/lib/samba/printing/CanonC1.tdb ? Jeremy.
(In reply to comment #3) > This : > > tdb(/var/lib/samba/printing/CanonC1.tdb): tdb_reopen: file dev/inode has > changed! > > is the line that confuses me. On a reopen, how can the dev/inode have changed > unless the file has been deleted ? What filesystem type are you using for > var/lib/samba/printing/CanonC1.tdb ? > > Jeremy. > It's an ext3 filesystem: /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw,noatime) Output from df: Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/VolGroup00-LogVol00 16188328 2653988 12698756 18% / Could some kind of filesystem corruption do this? /Pedro
No, it shouldn't be filesystem corruption. Does this always happen to a <printername>.tdb file when someone is printing to that printer ? How reproducible is this (and exactly what do you do to reproduce it) ? Jeremy.
(In reply to comment #3) > This : > > tdb(/var/lib/samba/printing/CanonC1.tdb): tdb_reopen: file dev/inode has > changed! > > is the line that confuses me. On a reopen, how can the dev/inode have changed > unless the file has been deleted ? What filesystem type are you using for > var/lib/samba/printing/CanonC1.tdb ? > > Jeremy. > It's an ext3 filesystem: /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw,noatime) Output from df: Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/VolGroup00-LogVol00 16188328 2653988 12698756 18% / Could some kind of filesystem corruption do this? /Pedro (In reply to comment #5) > No, it shouldn't be filesystem corruption. Does this always happen to a > <printername>.tdb file when someone is printing to that printer ? How > reproducible is this (and exactly what do you do to reproduce it) ? > > Jeremy. > I don't need to reproduce it. In the last 24 hours we got 394 "tdb_reopen: file dev/inode has changed!" messages, with a corresponding panic and core dump. It is happening as I am writing this. It happens more frequently during the day , of course. At night there is very little if any activity on the server. I assume any activity is because users leave their workstations on. /Pedro
(In reply to comment #3) > This : > > tdb(/var/lib/samba/printing/CanonC1.tdb): tdb_reopen: file dev/inode has > changed! > > is the line that confuses me. On a reopen, how can the dev/inode have changed > unless the file has been deleted ? What filesystem type are you using for > var/lib/samba/printing/CanonC1.tdb ? > > Jeremy. > It's an ext3 filesystem: /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw,noatime) Output from df: Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/VolGroup00-LogVol00 16188328 2653988 12698756 18% / Could some kind of filesystem corruption do this? /Pedro (In reply to comment #5) > No, it shouldn't be filesystem corruption. Does this always happen to a > <printername>.tdb file when someone is printing to that printer ? How > reproducible is this (and exactly what do you do to reproduce it) ? > > Jeremy. > I don't need to reproduce it. In the last 24 hours we got 394 "tdb_reopen: file dev/inode has changed!" messages, with a corresponding panic and core dump. It is happening as I am writing this. It happens more frequently during the day , of course. At night there is very little if any activity on the server. I assume any activity is because users leave their workstations on. /Pedro(In reply to comment #6) > (In reply to comment #3) > > This : > > > > tdb(/var/lib/samba/printing/CanonC1.tdb): tdb_reopen: file dev/inode has > > changed! > > > > is the line that confuses me. On a reopen, how can the dev/inode have changed > > unless the file has been deleted ? What filesystem type are you using for > > var/lib/samba/printing/CanonC1.tdb ? > > > > Jeremy. > > > > It's an ext3 filesystem: > > /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw,noatime) > > Output from df: > > Filesystem 1K-blocks Used Available Use% Mounted on > /dev/mapper/VolGroup00-LogVol00 > 16188328 2653988 12698756 18% / > > Could some kind of filesystem corruption do this? > > /Pedro > > > > > (In reply to comment #5) > > No, it shouldn't be filesystem corruption. Does this always happen to a > > <printername>.tdb file when someone is printing to that printer ? How > > reproducible is this (and exactly what do you do to reproduce it) ? > > > > Jeremy. > > > > I don't need to reproduce it. In the last 24 hours we got 394 "tdb_reopen: file > dev/inode has changed!" messages, with a corresponding panic and core dump. It > is happening as I am writing this. > > It happens more frequently during the day , of course. At night there is very > little if any activity on the server. I assume any activity is because users > leave their workstations on. > > /Pedro > (In reply to comment #5) > No, it shouldn't be filesystem corruption. Does this always happen to a > <printername>.tdb file when someone is printing to that printer ? How > reproducible is this (and exactly what do you do to reproduce it) ? > > Jeremy. > Oh, and no one needs to be printing, also. We are getting these now, and I am sure there is no one working at that office at this time. /Pedro
Having the same issue as Pedro Rodrigues explains. We applied the patch from commit 101ef64e3181335b66524296af08f7df04391b9a . The panics stopped, but we keep losing printer settings. It seems to be that samba is corrupting the pdb files. Also it seems that not all threads contain the same information about printers. I've experienced that windows clients will show all the correct settings when displaying the printer properties page for the server printer, but when trying to connect windows will say the driver is missing and then trying to connect again will succeed without errors. If you you have a possible solution I'll be happy to test it on at least one of our servers.
That is tdb files of course, not pdb.
I have tried 101ef64e3181335b66524296af08f7df04391b9a but for me that did resolve the crashing issue. Currently runing 3.2.7 + http://samba.org/~vlendec/0001-Ensure-null-termination.patch This seems so fix the issue and my issue with rinter settings unsetting. I will post again on in some days if it the fix still holds (no crashes or printers unsetting)
Looking closer, this definitely is another workaround for a deeper problem. The memory overflow that this fixes just "can't happen" from looking at the code. So there's deeper problems. I've spent the last hours staring at that code, I hope I'll find something over the weekend. Volker
Took the better part of today, but I think why my patch fixes a valgrind error. In 3.2.7, after a fork, smbd_event_context() is not correctly re-initialized. Thus if we have just forked a cups printcap updater and connect while that is still active, we have two processes trying to read from the pipe that is only intended for the parent process which has forked the updater. Thus we get garbage in the binary pipe protocol. I have no clue yet how this can affect the tdb dev/ino thing, but this particular one is fixed in v3-2-test where we have a proper event_context_reinit. Volker
Ok both my machines ( one 3.2.7 + http://samba.org/~vlendec/0001-Ensure-null-termination.patch) ( and the other 3.2.7 + http://samba.org/~vlendec/0001-Ensure-null-termination.patch + 7648b8f94c0709f39b873b872bc1273a6568ad3b + e72978ae81580b1ec74d3b21df82dd0d49551c64) Have not had crashed smbd processes but they have both unset one or more printer drivers overnight. rpcclient enumprinters description:[\\atlas\ToshibaESTD2500c,TOSHIBA e-STUDIO3510cSeries PS3,ToshibaESTD2500c] turned into: description:[\\atlas\ToshibaESTD2500c,,ToshibaESTD2500c] without any users doing anything. Last person left the office yesterday and then at about 20:00 it was still ok this morning it was broken again. I think I will have to donwgrade to 3.0 since users are not always willing guinea pigs. Part of my log not sure if this is helpfull: Jan 23 20:44:58 [smbd] [2009/01/23 20:44:58, 0] lib/substitute.c:alloc_sub_basic(556)_ Jan 23 20:44:58 [smbd] alloc_sub_basic: NULL source string! This should not happen_
See my latest comment. I'll have a patch that fixes the problem I mentioned in a few hours. I have a theory why this can also cause the tdb reopen problems. I'm very certain that the v3-2-test git tree fixes this, but I'll provide a minimal patch latest by sunday. Volker
Created attachment 3891 [details] proposed patch Ok, I think this is it. It does not go upstream, v3-2-test has already fixed this in another way. Volker
Wow - thanks for tracking that one down. I would *never* have found that :-). Great detective work. Jeremy.
Closing out bug report. Seems to be fixed. Thanks for reporting!