On one server I administer (out of dozens), a few months ago I started seeing weird behaviour every few weeks, always the same: rsyslogd and smbd CPU time goes through the roof, and the smb logs fill up with:

[2010/07/20 13:51:30, 0] lib/events.c:287(s3_event_debug)
  s3_event: sys_select() failed: 9:Bad file descriptor

to the tune of about 16000 a second (all identical)! System wait time goes through the roof as syslog writes them all to disk, then the 500GB disk fills up and the system blows up in the usual way when root is at 0% free. While this is happening, smbd appears to be serving everything up just fine, albeit slower and slower. Very strange! I must kill -9 smbd or reboot to get things sane again.

The logs just before this begins look normal: just the usual million complaints about IPv6 reverse lookups, but only one every few seconds, so harmless.

The last time it happened (today) I got an strace of the running smbd:

time(NULL) = 1282050865
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2865, ...}) = 0
time(NULL) = 1282050865
send(24, "<27>Aug 17 08:14:25 smbd[8204]: "..., 91, MSG_NOSIGNAL) = 91
geteuid32() = 0
write(337, "[2010/08/17 08:14:25, 0] lib/ev"..., 59) = 59
time(NULL) = 1282050865
send(24, "<27>Aug 17 08:14:25 smbd[8204]: "..., 87, MSG_NOSIGNAL) = 87
geteuid32() = 0
write(337, " s3_event: sys_select() failed:"..., 55) = 55
gettimeofday({1282050865, 684949}, NULL) = 0
gettimeofday({1282050865, 684975}, NULL) = 0
select(1515, [8 490 1024 1025 1026 1027 1032 1033 1034 1037 1088 1092 1093 1096 1097 1099 1103 1105 1107 1109 1110 1114 1115 1118 1120 1121 1122 1123 1125 1127 1128 1129 1132 1133 1134 1137 1139 1153 1155 1157 1158 1159 1160 1162 1282 1284 1285 1286 1287 1288 1289 1290 1291 1293 1296 1299 1301 1302 1303 1315 1317 1318 1319 1320 1322 1323 1327 1333 1334 1335 1336 1347 1348 1352 1353 1355 1356 1358 1360 1361 1362 1364 1366 1367 1379 1381 1382 1389 1390 1394 1395 1399 1400 1401 1402 1403 1404 1405 1407 1411 1417 1420 1424 1425 1427 1428 1431 1443 1445 1446 1447 1448 1450 1451 1455 1461 1462 1463 1464 1475 1476 1480 1481 1483 1484 1486 1488 1489 1490 1492 1494 1495 1505 1514], [1032 1514], NULL, {9999, 0}) = -1 EBADF (Bad file descriptor)

That just seems to repeat a zillion times a second.

This computer was perfectly reliable before. The problems may have begun when I upgraded from Fedora 10 to 12, but I couldn't guarantee that. I run a dozen other almost identical servers and haven't seen this problem elsewhere (the workload and office environments differ). The box is a <1 year old Xeon with ECC RAM.

It would be nice to either fix this bug, or mitigate its effects by having Samba rate-limit how many errors it spits to syslog (16k/sec is too many). Since I'm sure this will hit again in 7-21 days, any suggestions on what I can run at the time to debug it further are welcome. However, restarting smbd during the problem is sometimes tricky (heavily used file server).
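For what it's worth, the select(2) man page says EBADF means an invalid (e.g. already-closed) file descriptor was present in one of the sets. The following tiny standalone program (just an illustration I put together, nothing from Samba's code) reproduces the exact same error by closing a descriptor but leaving it in the read set:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/select.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) == -1) {
        perror("pipe");
        return 1;
    }

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fds[0], &readfds);

    /* Close the descriptor but "forget" to remove it from the set,
     * the same kind of stale-fd situation suspected in this smbd. */
    close(fds[0]);

    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    int ret = select(fds[0] + 1, &readfds, NULL, NULL, &tv);
    if (ret == -1)
        fprintf(stderr, "select: %s\n", strerror(errno));  /* Bad file descriptor */

    close(fds[1]);
    return 0;
}

With a stale descriptor in the set, select() returns immediately with EBADF instead of ever blocking, which would explain why the same call spins a zillion times a second.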
Can you look in /proc/<pid>/fd to see what those file descriptors are? And, can you try to run a longer "strace -p <pid> -f -o /tmp/smbd.strace" on an smbd while it is operating normally and upload that strace file here? Please also upload your smb.conf file.

Thanks, Volker
Here's smb.conf and a 10 sec strace on the working, loaded server. The fd lookup I'll have to do when it goes crazy again, right?
Created attachment 5908 [details]
smb.conf file

This file almost never changes; I've used this config on 12+ servers for a long while now (years).
Created attachment 5909 [details] strace of working (non-wonky) server
That looks completely normal, although I *NEVER* understand why people completely cripple their performance by setting SO_SNDBUF/RCVBUF. Probably you have other stuff with higher priority running on your network, because otherwise you would not artificially slow down Samba so severely.

Also, fd 3 on that process is not clear to me. What do we read those 8 bytes from?

To really nail this, we need the strace up to the point where those file descriptors pile up.

Volker
Re SO_SNDBUF/RCVBUF: I did that because about 10 different web articles, magazine pieces, and book excerpts recommend it for performance. Google "samba performance tuning" or similar and I'm sure you'll find it as a top recommendation. However, if a Samba dev says it hurts, then I'll try taking it out. Thanks!

> Also, fd 3 on that process is not clear to me. What do we read those 8 bytes from?

Anything I can do there to help? Regarding your mention of /proc/*/fd: do you mean to just cat or less one of those and try to decipher what file it is, or is there a way to get the filename of the fd from the proc fs? Is it safe to cat one of those files without disturbing Samba?

> To really nail this, we need the strace up to the point where those file descriptors pile up.

Do you mean leave strace running at all times until the bug hits? Wouldn't that kill performance and fill the disks fast? I haven't been able to guess at a trigger yet, other than just time, so that may be hard to accomplish. I'm open to ideas and I can have a cheat-sheet ready to try for the next time it happens.
ls -l /proc/<pid-number>/fd/ never hurts
:-) Funny, but in all my time in Linux, I've never seen/used that.

lr-x------. 1 root root 64 Aug 17 10:57 3 -> /dev/urandom

3 is urandom; that explains the small byte reads.
Hmm, what if I write a script to keep an eye on the /proc/X/fd dir and, when it gets too large, start logging changes to it? That should capture the growth of the open fds when the bug hits. Useful, or no? Something along the lines of the rough sketch below.
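Here's roughly what I have in mind (an untested sketch; the pid is passed on the command line, and the 1000-fd threshold and 60-second interval are just placeholder values I picked):

#include <stdio.h>
#include <limits.h>
#include <dirent.h>
#include <unistd.h>

#define FD_THRESHOLD 1000   /* arbitrary example threshold */

/* Count entries in /proc/<pid>/fd; if dump is non-zero, also print
 * each descriptor's target via readlink(). */
static int scan_fds(const char *fd_dir, int dump)
{
    DIR *dir = opendir(fd_dir);
    if (dir == NULL) {
        perror("opendir");
        return -1;
    }
    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;
        count++;
        if (dump) {
            char link_path[PATH_MAX], target[PATH_MAX];
            snprintf(link_path, sizeof(link_path), "%s/%s", fd_dir, ent->d_name);
            ssize_t len = readlink(link_path, target, sizeof(target) - 1);
            if (len >= 0) {
                target[len] = '\0';
                printf("fd %s -> %s\n", ent->d_name, target);
            }
        }
    }
    closedir(dir);
    return count;
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <smbd-pid>\n", argv[0]);
        return 1;
    }
    char fd_dir[64];
    snprintf(fd_dir, sizeof(fd_dir), "/proc/%s/fd", argv[1]);

    for (;;) {
        int count = scan_fds(fd_dir, 0);
        if (count > FD_THRESHOLD) {
            printf("pid %s has %d open fds, dumping:\n", argv[1], count);
            scan_fds(fd_dir, 1);
        }
        sleep(60);  /* check once a minute */
    }
}

I'd compile it to something like fdwatch (name is just for illustration) and redirect its output to a log file.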
Bug just hit again. 200G+ in log files. Unfortunately it hit in the middle of a busy session and I had to reboot quickly (no time to run diags). I did save some log output but it just looks the same as before. I upgraded the kernel so we'll see if that helps any. Will report back...
Bug just hit again, same story: busy day, had to clear logs and restart quickly. I had to clear the logs as the 1TB disk ran out of space from the 500GB of log files.

Samba this time was: samba-3.4.8-59.fc12.i686
I am updating today to: 3.4.9-60.fc12

Maybe that will help.
Very odd, the bug just hit again, within days of the last one. But this time it was different (for the first time?). The strace of the 100% CPU process shows non-stop:

read(8, 0xbfa3828f, 1) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1288045077, 873111}, NULL) = 0
gettimeofday({1288045077, 873141}, NULL) = 0
gettimeofday({1288045077, 873164}, NULL) = 0
select(1159, [8 1024 1025 1026 1027 1032 1033 1034 1037 1088 1090 1092 1097 1105 1106 1110 1111 1114 1115 1118 1122 1123 1126 1127 1129 1132 1134 1136 1138 1139 1153 1154], [1032], NULL, {9999, 0}) = 32 (in [8 1025 1026 1027 1032 1033 1034 1037 1056 1058 1059 1060 1061 1065 1070 1072 1073 1074 1075 1088 1090 1092 1097 1105 1106 1110 1111 1114 1115 1118 1122 1123 1126 1127 1129 1132 1134 1136 1138 1139 1153 1154], out [1032], left {9998, 999997})
read(8, 0xbfa3828f, 1) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1288045077, 873438}, NULL) = 0
gettimeofday({1288045077, 873461}, NULL) = 0
gettimeofday({1288045077, 873484}, NULL) = 0
select(1159, [8 1024 1025 1026 1027 1032 1033 1034 1037 1088 1090 1092 1097 1105 1106 1110 1111 1114 1115 1118 1122 1123 1130 1132 1134 1136 1138 1139 1153 1154], [1032], NULL, {9999, 0}) = 30 (in [8 1025 1026 1027 1032 1033 1034 1037 1056 1058 1059 1060 1061 1065 1070 1072 1073 1074 1075 1088 1090 1092 1097 1105 1106 1110 1111 1114 1115 1118 1122 1123 1130 1132 1134 1136 1138 1139 1153 1154], out [1032], left {9998, 999997})

That's different from the usual bug manifestation. And it didn't seem to be filling up syslog this time; in fact it doesn't seem to have been logged anywhere?

Not sure if this helps, but the parent smbd process strace showed lots of "child exited":

select(25, [6 22 23 24], [], NULL, {9998, 721920}) = 1 (in [23], left {9998, 3929})
gettimeofday({1288045120, 542212}, NULL) = 0
accept(23, {sa_family=AF_INET6, sin6_port=htons(40679), inet_pton(AF_INET6, "::ffff:192.168.101.2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 26
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb78947c8) = 18092
close(26) = 0
gettimeofday({1288045120, 542482}, NULL) = 0
gettimeofday({1288045120, 542507}, NULL) = 0
select(25, [6 22 23 24], [], NULL, {9999, 0}) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
write(7, "\0", 1) = 1
sigreturn() = ? (mask now [FPE USR2 PIPE])
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 18092
waitpid(-1, 0xbfa392fc, WNOHANG) = 0
gettimeofday({1288045120, 563828}, NULL) = 0
gettimeofday({1288045120, 563854}, NULL) = 0
select(25, [6 22 23 24], [], NULL, {9999, 0}) = 1 (in [6], left {9998, 999998})
gettimeofday({1288045120, 563918}, NULL) = 0
read(6, "\0", 16) = 1
gettimeofday({1288045120, 563978}, NULL) = 0
gettimeofday({1288045120, 564003}, NULL) = 0
select(25, [6 22 23 24], [], NULL, {9999, 0}) = 1 (in [23], left {9998, 21821})
gettimeofday({1288045121, 542257}, NULL) = 0
accept(23, {sa_family=AF_INET6, sin6_port=htons(40681), inet_pton(AF_INET6, "::ffff:192.168.101.2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 26
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb78947c8) = 18093
close(26) = 0
gettimeofday({1288045121, 542504}, NULL) = 0
gettimeofday({1288045121, 542529}, NULL) = 0
select(25, [6 22 23 24], [], NULL, {9999, 0}) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
Bug just hit again (the normal fd one, not the new variation). Disk filled up with logs. Very strange, that's 3 problems in about 1 week, which is way more often than normal. I can't think of anything that has changed recently though, except for a new version of the Windows program that this file server feeds.

Captured some more info this time. The logs were filled with:

[2010/10/27 09:22:02, 0] lib/events.c:287(s3_event_debug)
  s3_event: sys_select() failed: 9:Bad file descriptor

I had shut down Samba (service smb stop) but the one runaway process was still running (this always happens) and I remembered to check its FDs. There were over 1200 of them! Is this normal? I'll attach the output. At the moment I captured it, FD 9 was:

l-wx------ 1 root root 64 Oct 27 09:42 9 -> /var/run/smbd.pid (deleted)

I wish I had captured it before I did smb stop, as it would have been interesting to see if that file had been deleted before the stop. I'll try to catch the closed FD before I stop next time. I'm not sure: should I look at the parent smbd process or the runaway one when getting the list of FDs?
Created attachment 6037 [details] output of ll /proc/<pid>/fd while ps was running away
Another strange occurrence, and maybe a clue? This was the first time the problem appears to have fixed itself without my intervention.

The bug appears to have hit and filled the logs from

[2010/11/01 16:07:59, 0] lib/events.c:287(s3_event_debug)
  s3_event: sys_select() failed: 9:Bad file descriptor

to

[2010/11/02 06:40:47, 0] lib/events.c:287(s3_event_debug)
  s3_event: sys_select() failed: 9:Bad file descriptor

and then magically disappeared! I checked and the parent smbd had not been restarted:

root 855 1 0 Nov01 ? 00:00:41 smbd -D

I can see by the logs it was last restarted 2010-11-01 04:00:14. So the bug hit, ran, and eventually resolved itself before the disk was full of logs (the 1TB disk was 97% full when I checked today).

I checked all running smbds' fds to see what fd 9 was. They are all smbd.pid:

l-wx------ 1 root root 64 Nov 3 03:03 9 -> /var/run/smbd.pid

What a strange file to be causing fd errors!? I've confirmed that's the file a few times now; it appears that with my setup fd 9 is always smbd.pid. Checking that file now I see:

#ll /var/run/smbd.pid
-rw-r--r-- 1 root root 4 Nov 1 04:00 /var/run/smbd.pid

Here's an idea: what if this bug is caused by a wonky Windows workstation that is screwing up its smbd connection, and that one smbd child goes berserk? Then in this recent occurrence the workstation disconnected (rebooted, whatever), the wonky smbd process died, and the bug went away? What else could be happening?
Happened again, but I think I have some useful data this time, and a new theory.

The CPU was 99% on process 27981:

ps -ef |grep smbd
root 12484 24308 0 07:17 ? 00:00:00 smbd -D
root 14114 24308 0 07:24 ? 00:00:00 smbd -D
root 14210 24308 0 07:24 ? 00:00:00 smbd -D
root 24308 1 0 04:00 ? 00:00:02 smbd -D
root 24320 24308 0 04:00 ? 00:00:00 smbd -D
root 24385 24308 0 04:00 ? 00:00:00 smbd -D
root 24952 24308 0 04:00 ? 00:00:00 smbd -D
samba 24954 24308 0 04:00 ? 00:00:01 smbd -D
samba 27981 1 92 Nov16 ? 21:42:01 smbd -D

Notice how the runaway ps is the "old" one. I have a script that sometimes restarts smbd at 4am (service smbd restart on a Fedora/RH box). There is no reason that I can see why a Nov16 "leftover" would still be running after a Nov17 4am restart. I've seen hints of this before. I think this is the gist of the problem.

More info:

strace -p 27981
gettimeofday({1290000838, 735285}, NULL) = 0
gettimeofday({1290000838, 735305}, NULL) = 0
gettimeofday({1290000838, 735325}, NULL) = 0
select(1082, [8 1024 1025 1026 1027 1032 1033 1034 1037 1081], [1032], NULL, {9999, 33554432}) = 10 (in [8 1028 1029 1032 1033 1034 1037 1057 1058 1059 1060 106$
read(8, 0xbfe6d0ef, 1) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1290000838, 735558}, NULL) = 0
gettimeofday({1290000838, 735579}, NULL) = 0
gettimeofday({1290000838, 735599}, NULL) = 0
select(1082, [8 1024 1025 1026 1027 1032 1033 1034 1037 1081], [1032], NULL, {9999, 33554432}) = 10 (in [8 1028 1029 1032 1033 1034 1037 1057 1058 1059 1060 106$
read(8, 0xbfe6d0ef, 1) = -1 EAGAIN (Resource temporarily unavailable)
... ad infinitum

#ll /proc/27981/fd/|wc
1095 12033 76640
#ll /proc/27981/fd/8
lr-x------ 1 root root 64 Nov 17 07:34 /proc/27981/fd/8 -> pipe:[51587366]

Also, I wrote a script to check for runaway processes and log them. From this I can see the runaway started Nov 16 10am. Here's the ps -ef |grep {sn}mbd and "top |head -1" equivalent at that time:

Tue Nov 16 10:07:21 CST 2010
samba 343 29217 0 08:15 ? 00:00:04 smbd -D
samba 1577 29217 0 08:21 ? 00:00:02 smbd -D
samba 5944 29217 0 08:37 ? 00:00:00 smbd -D
samba 14300 29217 0 09:08 ? 00:00:02 smbd -D
root 21468 29217 0 09:34 ? 00:00:01 smbd -D
samba 27981 29217 13 07:58 ? 00:17:08 smbd -D
samba 28236 29217 0 10:00 ? 00:00:00 smbd -D
root 29217 1 0 04:00 ? 00:00:05 smbd -D
root 29229 29217 0 04:00 ? 00:00:00 smbd -D
root 29243 1 0 04:00 ? 00:00:00 nmbd -D
root 29244 29243 0 04:00 ? 00:00:00 nmbd -D
samba 29295 29217 0 04:00 ? 00:00:13 smbd -D
samba 31171 29217 0 08:09 ? 00:00:07 smbd -D
samba 31174 29217 0 08:09 ? 00:00:05 smbd -D
samba 32681 29217 0 08:15 ? 00:00:07 smbd -D

samba 27981 13.3 0.1 25764 5608 ? R 07:58 17:08 smbd -D
15 mins later:
samba 27981 22.3 0.1 25764 5608 ? R 07:58 32:07 smbd -D
15 mins later:
samba 27981 29.6 0.1 25764 5608 ? R 07:58 47:07 smbd -D

Every 15 mins 27981 ramps up a few more CPU:

top ps cpu: 35.7
top ps cpu: 40.8
top ps cpu: 45.1
top ps cpu: 48.9
top ps cpu: 52.2
top ps cpu: 55.0
top ps cpu: 57.6

It was at 91.9% when I killed it. As usual, only kill -9 works on it. Obviously the process is going mental. Once it has gone mental, "service smbd restart" does NOT kill the mental process, so you're left with a child smbd process without a parent, and I'm guessing that it freaks out even more once its parent's resources are gone/closed.

I've managed to stop the disk-filling logs by heavily customizing the Samba logging features and rsyslog. What else can I do to help with this bug?
Created attachment 6359 [details]
Make FD_SETSIZE tuneable for select(2) with more FDs

Executing select(2) and friends (e.g. the FD_SET macro) with an fd value that is equal to or larger than FD_SETSIZE results in undefined behavior. The default value for FD_SETSIZE is 1024 on major OSes.

This is a patch for the v3-5-test branch to make FD_SETSIZE tuneable so that select(2) can be used with more FDs. Try it as follows:

$ gzip -dc samba-3.5.8.tar.gz |tar xf -
$ cd samba-3.5.8
$ patch -p1 </path/to/samba-3.5.8-FD_SETSIZE.patch
$ cd source3
$ CPPFLAGS="-DSAMBA_FD_SETSIZE=16384" ./configure --YOUR-CONFIGURE-OPTIONS
$ make
$ su
# make install
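To illustrate the limit the patch works around (this snippet is just an illustration of mine, not code from the patch or from Samba): fd_set only has room for FD_SETSIZE descriptors, so select()-based code has to bounds-check every fd before calling FD_SET, otherwise the macro writes outside the structure:

#include <stdio.h>
#include <sys/select.h>

/* Returns 0 on success, -1 if fd cannot safely be watched with select(). */
static int safe_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        fprintf(stderr, "fd %d exceeds FD_SETSIZE (%d), cannot use select()\n",
                fd, FD_SETSIZE);
        return -1;
    }
    FD_SET(fd, set);
    return 0;
}

int main(void)
{
    fd_set readfds;
    FD_ZERO(&readfds);

    printf("FD_SETSIZE on this system: %d\n", FD_SETSIZE);

    /* The straces earlier in this report show smbd descriptors past 1500,
     * well over the default limit, so a plain FD_SET on them would already
     * be out of bounds on an unpatched build. */
    safe_fd_set(1514, &readfds);   /* rejected with the default FD_SETSIZE */
    safe_fd_set(0, &readfds);      /* stdin fits easily */
    return 0;
}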
Unfortunately, increasing FD_SETSIZE is not the full fix. For 3.6 we've converted all of Samba's select calls to poll(2), which is the real fix. Please be aware that even that is not a complete fix, as there are libraries that still call select, but at this point everything Samba itself can do has been done.

Closing as fixed.

Volker
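For readers who land here later, a minimal standalone sketch (not the actual Samba change) of why poll(2) avoids this class of problem: it takes an array of struct pollfd rather than a fixed-size bitmap, so descriptor numbers have no FD_SETSIZE ceiling:

#include <stdio.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>
#include <sys/resource.h>

int main(void)
{
    /* Raise the soft fd limit to its maximum so we can create a
     * descriptor number well above the traditional 1024 limit. */
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
        rl.rlim_cur = rl.rlim_max;
        setrlimit(RLIMIT_NOFILE, &rl);
    }

    int fd = open("/dev/urandom", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    /* Duplicate the descriptor to a number far above FD_SETSIZE;
     * select()/FD_SET could not cope with this, poll() does not care. */
    int high_fd = fcntl(fd, F_DUPFD, 2000);
    if (high_fd == -1) {
        perror("fcntl");  /* can still fail if the hard fd limit is low */
        close(fd);
        return 1;
    }

    struct pollfd pfd = { .fd = high_fd, .events = POLLIN };
    int ret = poll(&pfd, 1, 1000);  /* wait up to one second */
    if (ret > 0 && (pfd.revents & POLLIN))
        printf("fd %d is readable via poll(); no FD_SETSIZE involved\n", high_fd);
    else if (ret == -1)
        perror("poll");

    close(high_fd);
    close(fd);
    return 0;
}

The same program written around select()/FD_SET would already be undefined behavior once the duplicated descriptor number passes 1023, which matches the descriptor numbers seen in the straces above.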