7624 – logs fill up with s3_event: sys_select() failed: 9:Bad file descriptor

Bug 7624 - logs fill up with s3_event: sys_select() failed: 9:Bad file descriptor

Summary: logs fill up with s3_event: sys_select() failed: 9:Bad file descriptor

Status:	RESOLVED FIXED

Alias:	None

Product:	Samba 3.4
Classification:	Unclassified
Component:	File services (show other bugs)
Version:	3.4.8
Hardware:	Other Linux

Importance:	P3 critical
Target Milestone:	---
Assignee:	Volker Lendecke
QA Contact:	Samba QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2010-08-17 08:43 UTC by Trevor Cordes
Modified:	2011-05-14 09:19 UTC (History)
CC List:	1 user (show)

See Also:

Attachments
smb.conf file (2.86 KB, text/plain) 2010-08-17 10:23 UTC, Trevor Cordes	no flags	Details
strace of working (non-wonky) server (87.72 KB, text/plain) 2010-08-17 10:23 UTC, Trevor Cordes	no flags	Details
output of ll /proc/<pid>/fd while ps was running away (83.60 KB, text/plain) 2010-10-27 10:09 UTC, Trevor Cordes	no flags	Details
Make FD_SETSIZE tuneable for select(2) with more FDs (1.24 KB, patch) 2011-03-30 07:13 UTC, SATOH Fumiyasu	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Trevor Cordes 2010-08-17 08:43:53 UTC

On 1 server I administer (out of dozens), a few months ago I started seeing weird behaviour every few weeks, always the same.  rsyslod and smbd CPU time goes through the roof.  The smb logs fill up with:

[2010/07/20 13:51:30,  0] lib/events.c:287(s3_event_debug)
  s3_event: sys_select() failed: 9:Bad file descriptor

to the tune of about 16000 a second (all identical)!  System wait time goes through the as syslog writes them all to disk, then the 500GB disk fills up and the system blows up in the usual way when root is at 0% free.

While this is happening, smbd appears to be serving everything up just fine, albeit slower and slower.  Very strange!  Must kill -9 smbd or reboot to get things sane again.

The logs just before this begins look normal.  Just the normal million complaints (but only 1 every few seconds, so harmless) about IPv6 reverse lookups.

This last time it happened (today) I got an strace of the running smbd:

time(NULL)                              = 1282050865
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2865, ...}) = 0
time(NULL)                              = 1282050865
send(24, "<27>Aug 17 08:14:25 smbd[8204]: "..., 91, MSG_NOSIGNAL) = 91
geteuid32()                             = 0
write(337, "[2010/08/17 08:14:25,  0] lib/ev"..., 59) = 59
time(NULL)                              = 1282050865
send(24, "<27>Aug 17 08:14:25 smbd[8204]: "..., 87, MSG_NOSIGNAL) = 87
geteuid32()                             = 0
write(337, "  s3_event: sys_select() failed:"..., 55) = 55
gettimeofday({1282050865, 684949}, NULL) = 0
gettimeofday({1282050865, 684975}, NULL) = 0
select(1515, [8 490 1024 1025 1026 1027 1032 1033 1034 1037 1088 1092 1093 1096 1097 1099 1103 1105 1107 1109 1110 1114 1115 1118 1120 1121 1122 1123 1125 1127 1128 1129 1132 1133 1134 1137 1139 1153 1155 1157 1158 1159 1160 1162 1282 1284 1285 1286 1287 1288 1289 1290 1291 1293 1296 1299 1301 1302 1303 1315 1317 1318 1319 1320 1322 1323 1327 1333 1334 1335 1336 1347 1348 1352 1353 1355 1356 1358 1360 1361 1362 1364 1366 1367 1379 1381 1382 1389 1390 1394 1395 1399 1400 1401 1402 1403 1404 1405 1407 1411 1417 1420 1424 1425 1427 1428 1431 1443 1445 1446 1447 1448 1450 1451 1455 1461 1462 1463 1464 1475 1476 1480 1481 1483 1484 1486 1488 1489 1490 1492 1494 1495 1505 1514], [1032 1514], NULL, {9999, 0}) = -1 EBADF (Bad file descriptor)

That just seems to repeat a zillion times a second.

This computer was perfectly reliable.  The problems may have begun when I upgraded from Fedora 10 to 12, but I couldn't guarantee that.  I run a dozen other almost identical servers and haven't seen this problem elsewhere (the workload and office environments differ).

The box is a <1 year old Xeon with ECC RAM.

It would be nice to either fix this bug, or mitigate its effects by having samba rate-limit how many errors it spits to syslog (16k/sec is too many).

Since I'm sure this will hit again in 7-21 days, any ideas on what to run at the time to better debug I can do.  However, restarting smbd during the problem is sometimes tricky (heavily used file server).

Comment 1 Volker Lendecke 2010-08-17 08:48:19 UTC

Can you look in /proc/<pid>/fd what those file descriptors are? And, can you try to run a longer "strace -p <pid> -f -o /tmp/smbd.strace" on an smbd while it is operating normal and upload that strace file here?

Please also upload your smb.conf file.

Thanks,

Volker

Comment 2 Trevor Cordes 2010-08-17 10:22:13 UTC

Here's smb.conf and a 10 sec strace on the working, loaded server.

The fd lookup I'll have to do when it goes crazy again, right?

Comment 3 Trevor Cordes 2010-08-17 10:23:09 UTC

Created attachment 5908 [details]
smb.conf file

This file almost never changes, I've used this config on 12+ servers for a long while now (years).

Comment 4 Trevor Cordes 2010-08-17 10:23:50 UTC

Created attachment 5909 [details]
strace of working (non-wonky) server

Comment 5 Volker Lendecke 2010-08-17 10:36:35 UTC

That looks completely normal. Although I *NEVER* understand why people completely cripple their performance by setting SO_SNDBUF/RCVBUF. Probably you have other stuff with higher prio running in your network, because otherwise you would not artificially slow down Samba so severely.

Also, fd 3 on that process is not clear to me. What do we read those 8 bytes from?

To really nail this, we need the strace up to the point where those file descriptors pile up.

Volker

Comment 6 Trevor Cordes 2010-08-17 11:05:11 UTC

Re SO_SNDBUF/RCVBUF: did that because of about 10 different web articles / mag / book bits recommending that for performance.  Google "samba performance tuning" or similar and I'm sure you'll find that as a top recommendation.  However, if a samba dev says it hurts, then I'll try taking them out.  Thanks!

> Also, fd 3 on that process is not clear to me. What do we read those 8 bytes
from?

Anything I can do there to help?  Your mention of /proc/*/fd things, do you mean to just cat or less one of those and try to decipher what file it is, or is there a way to get the filename of the fd from the proc fs?  Is it safe to cat one of those files without disturbing samba?

> To really nail this, we need the strace up to the point where those file
descriptors pile up.

Do you mean leave strace running at all times until the bug hits?  Would that kill performance and fill the disks fast?  I haven't been able to guess at a trigger yet, other than just time, so that may be hard to accomplish.

I'm open to ideas and I can have a cheat-sheet ready to try for the next time it happens.

Comment 7 Volker Lendecke 2010-08-17 11:06:52 UTC

ls -l /proc/<pid-number/fd/

never hurts

Comment 8 Trevor Cordes 2010-08-17 11:10:37 UTC

:-)  Funny, but in all my time in linux, I've never seen/used that.

lr-x------. 1 root root 64 Aug 17 10:57 3 -> /dev/urandom

3 is urandom, that explains the small byte reads.

Comment 9 Trevor Cordes 2010-08-17 11:12:39 UTC

Hmm, what if I write a script to keep an eye on the /proc/X/fd dir and when it gets too large start logging changes to it?  That should capture the changing/growth of the open fd's when the bug hits?  Useful? or no?

Comment 10 Trevor Cordes 2010-09-14 07:04:27 UTC

Bug just hit again.  200G+ in log files.  Unfortunately it hit in the middle of a busy session and I had to reboot quickly (no time to run diags).  I did save some log output but it just looks the same as before.  I upgraded the kernel so we'll see if that helps any.  Will report back...

Comment 11 Trevor Cordes 2010-10-20 05:52:47 UTC

Bug just hit again, same story, busy day, had to clear logs and restart quickly.  I had to clear the logs as the 1TB disk ran out of space from the 500GB log files.

Samba this time was:
samba-3.4.8-59.fc12.i686

I am updating today to:
3.4.9-60.fc12

Maybe that will help.

Comment 12 Trevor Cordes 2010-10-25 17:34:57 UTC

Very odd, the bug just hit again, within days of the last one.  But this bug was different (for the first time?).

The strace of the 100% CPU processshows non-stop:

read(8, 0xbfa3828f, 1)                  = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1288045077, 873111}, NULL) = 0
gettimeofday({1288045077, 873141}, NULL) = 0
gettimeofday({1288045077, 873164}, NULL) = 0
select(1159, [8 1024 1025 1026 1027 1032 1033 1034 1037 1088 1090 1092 1097 1105 1106 1110 1111 1114 1115 1118 1122 1123 1126 1127 1129 1132 1134 1136 1138 1139 1153 1154], [1032], NULL, {9999, 0}) = 32 (in [8 1025 1026 1027 1032 1033 1034 1037 1056 1058 1059 1060 1061 1065 1070 1072 1073 1074 1075 1088 1090 1092 1097 1105 1106 1110 1111 1114 1115 1118 1122 1123 1126 1127 1129 1132 1134 1136 1138 1139 1153 1154], out [1032], left {9998, 999997})
read(8, 0xbfa3828f, 1)                  = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1288045077, 873438}, NULL) = 0
gettimeofday({1288045077, 873461}, NULL) = 0
gettimeofday({1288045077, 873484}, NULL) = 0
select(1159, [8 1024 1025 1026 1027 1032 1033 1034 1037 1088 1090 1092 1097 1105 1106 1110 1111 1114 1115 1118 1122 1123 1130 1132 1134 1136 1138 1139 1153 1154], [1032], NULL, {9999, 0}) = 30 (in [8 1025 1026 1027 1032 1033 1034 1037 1056 1058 1059 1060 1061 1065 1070 1072 1073 1074 1075 1088 1090 1092 1097 1105 1106 1110 1111 1114 1115 1118 1122 1123 1130 1132 1134 1136 1138 1139 1153 1154], out [1032], left {9998, 999997})


That's different from the usual bug manifestation.  And it didn't seem to be filling up syslog this time, in fact it doesn't seem to have been logged anywhere?

Not sure if this helps, the parent smb process strace showed lots of child exited:

select(25, [6 22 23 24], [], NULL, {9998, 721920}) = 1 (in [23], left {9998, 3929})
gettimeofday({1288045120, 542212}, NULL) = 0
accept(23, {sa_family=AF_INET6, sin6_port=htons(40679), inet_pton(AF_INET6, "::ffff:192.168.101.2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 26
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb78947c8) = 18092
close(26)                               = 0
gettimeofday({1288045120, 542482}, NULL) = 0
gettimeofday({1288045120, 542507}, NULL) = 0
select(25, [6 22 23 24], [], NULL, {9999, 0}) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
write(7, "\0", 1)                       = 1
sigreturn()                             = ? (mask now [FPE USR2 PIPE])
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 18092
waitpid(-1, 0xbfa392fc, WNOHANG)        = 0
gettimeofday({1288045120, 563828}, NULL) = 0
gettimeofday({1288045120, 563854}, NULL) = 0
select(25, [6 22 23 24], [], NULL, {9999, 0}) = 1 (in [6], left {9998, 999998})
gettimeofday({1288045120, 563918}, NULL) = 0
read(6, "\0", 16)                       = 1
gettimeofday({1288045120, 563978}, NULL) = 0
gettimeofday({1288045120, 564003}, NULL) = 0
select(25, [6 22 23 24], [], NULL, {9999, 0}) = 1 (in [23], left {9998, 21821})
gettimeofday({1288045121, 542257}, NULL) = 0
accept(23, {sa_family=AF_INET6, sin6_port=htons(40681), inet_pton(AF_INET6, "::ffff:192.168.101.2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 26
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb78947c8) = 18093
close(26)                               = 0
gettimeofday({1288045121, 542504}, NULL) = 0
gettimeofday({1288045121, 542529}, NULL) = 0
select(25, [6 22 23 24], [], NULL, {9999, 0}) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---

Comment 13 Trevor Cordes 2010-10-27 10:07:12 UTC

Bug just hit again (the normal fd one, not the new variation).  Disk filled up with logs.  Very strange, that's 3 problems in about 1 week, which is way more often than normal.  Can't think of anything that has changed recently though, except for a new version of the Windows program that this file server feeds.

Captured some more info this time.

The logs were filled with:
[2010/10/27 09:22:02,  0] lib/events.c:287(s3_event_debug)
  s3_event: sys_select() failed: 9:Bad file descriptor

I had shut down samba (service smb stop) but the 1 runaway process was still running (this always happens) and I remembered to check its FD's.  There were over 1200 of them!  Is this normal?  I'll attach the output.  At the moment I captured it, FD 9 was:
l-wx------ 1 root  root  64 Oct 27 09:42 9 -> /var/run/smbd.pid (deleted)

I wish I had captured it before I did smb stop as it would have been interesting to see if that file had been deleted before the stop.  I'll try to catch the closed FD before I stop next time.

I'm not sure, should I look at the parent smb process or the runaway one when getting the list of FD's?

Comment 14 Trevor Cordes 2010-10-27 10:09:23 UTC

Created attachment 6037 [details]
output of ll /proc/<pid>/fd while ps was running away

Comment 15 Trevor Cordes 2010-11-03 03:12:42 UTC

Another strange occurrence and maybe a clue?  This was the first time the problem appears to have fixed itself without my intervention.

Bug appears to have hit and filled the logs from
[2010/11/01 16:07:59,  0] lib/events.c:287(s3_event_debug)
  s3_event: sys_select() failed: 9:Bad file descriptor
to
[2010/11/02 06:40:47,  0] lib/events.c:287(s3_event_debug)
  s3_event: sys_select() failed: 9:Bad file descriptor

And then magically disappeared!  I checked and the parent smbd had not been restarted:
root       855     1  0 Nov01 ?        00:00:41 smbd -D

I can see by the logs it was last restarted 2010-11-01 04:00:14

So the bug hit, ran and eventually resolved itself before the disk was full of logs (97% full 1TB when I checked today).

I checked all running smbd's fd's to check what fd 9 was.  They are all smbd.pid:
l-wx------ 1 root  root  64 Nov  3 03:03 9 -> /var/run/smbd.pid

What a strange file to be causing fd errors!?  That's been confirmed the file a few times now.  It appears that with my setup fd 9 is always smbd.pid.

Checking that file now I see:
#ll /var/run/smbd.pid 
-rw-r--r-- 1 root root 4 Nov  1 04:00 /var/run/smbd.pid

Here's an idea: what if this bug is caused by a wonky Windows workstation that is screwing up its smbd connection and the one smbd child goes berserk.  Then in this recent occurrence the workstation is disconnecting (rebooting, whatever) and the wonky smbd ps dies and the bug goes away?  What else could be happening?

Comment 16 Trevor Cordes 2010-11-17 08:03:44 UTC

Happened again, but I think I have some useful data this time, and a new theory.

The CPU was 99% on process 27981

ps -ef |grep smbd
root     12484 24308  0 07:17 ?        00:00:00 smbd -D
root     14114 24308  0 07:24 ?        00:00:00 smbd -D
root     14210 24308  0 07:24 ?        00:00:00 smbd -D
root     24308     1  0 04:00 ?        00:00:02 smbd -D
root     24320 24308  0 04:00 ?        00:00:00 smbd -D
root     24385 24308  0 04:00 ?        00:00:00 smbd -D
root     24952 24308  0 04:00 ?        00:00:00 smbd -D
samba    24954 24308  0 04:00 ?        00:00:01 smbd -D
samba    27981     1 92 Nov16 ?        21:42:01 smbd -D

Notice how there runaway ps is the "old" one.  I have a script that sometimes restarts smbd at 4am (service smbd restart on a Fedora/RH box).  There is no reason that I can see why a Nov16 "leftover" would still be running after a Nov17 4am restart.  I've seen hints of this before.  I think this is the gist of the problem.

More info:
strace -p 27981
gettimeofday({1290000838, 735285}, NULL) = 0
gettimeofday({1290000838, 735305}, NULL) = 0
gettimeofday({1290000838, 735325}, NULL) = 0
select(1082, [8 1024 1025 1026 1027 1032 1033 1034 1037 1081], [1032], NULL, {9999, 33554432}) = 10 (in [8 1028 1029 1032 1033 1034 1037 1057 1058 1059 1060 106$
read(8, 0xbfe6d0ef, 1)                  = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1290000838, 735558}, NULL) = 0
gettimeofday({1290000838, 735579}, NULL) = 0
gettimeofday({1290000838, 735599}, NULL) = 0
select(1082, [8 1024 1025 1026 1027 1032 1033 1034 1037 1081], [1032], NULL, {9999, 33554432}) = 10 (in [8 1028 1029 1032 1033 1034 1037 1057 1058 1059 1060 106$
read(8, 0xbfe6d0ef, 1)                  = -1 EAGAIN (Resource temporarily unavailable)

... ad infinitum

#ll /proc/27981/fd/|wc
   1095   12033   76640

#ll /proc/27981/fd/8
lr-x------ 1 root root 64 Nov 17 07:34 /proc/27981/fd/8 -> pipe:[51587366]


Also, I wrote a script to check for runaway ps's and log it.  From this I can see the runaway started Nov 16 10am.  Here's the ps -ef |grep {sn}mbd and "top |head-1" equivalent at that time:

Tue Nov 16 10:07:21 CST 2010
samba      343 29217  0 08:15 ?        00:00:04 smbd -D
samba     1577 29217  0 08:21 ?        00:00:02 smbd -D
samba     5944 29217  0 08:37 ?        00:00:00 smbd -D
samba    14300 29217  0 09:08 ?        00:00:02 smbd -D
root     21468 29217  0 09:34 ?        00:00:01 smbd -D
samba    27981 29217 13 07:58 ?        00:17:08 smbd -D
samba    28236 29217  0 10:00 ?        00:00:00 smbd -D
root     29217     1  0 04:00 ?        00:00:05 smbd -D
root     29229 29217  0 04:00 ?        00:00:00 smbd -D
root     29243     1  0 04:00 ?        00:00:00 nmbd -D
root     29244 29243  0 04:00 ?        00:00:00 nmbd -D
samba    29295 29217  0 04:00 ?        00:00:13 smbd -D
samba    31171 29217  0 08:09 ?        00:00:07 smbd -D
samba    31174 29217  0 08:09 ?        00:00:05 smbd -D
samba    32681 29217  0 08:15 ?        00:00:07 smbd -D

samba    27981 13.3  0.1  25764  5608 ?        R    07:58  17:08 smbd -D

15 mins later:
samba    27981 22.3  0.1  25764  5608 ?        R    07:58  32:07 smbd -D

15 mins later:
samba    27981 29.6  0.1  25764  5608 ?        R    07:58  47:07 smbd -D

Every 15 mins 27981 ramps up a few more CPU:
top ps cpu: 35.7
top ps cpu: 40.8
top ps cpu: 45.1
top ps cpu: 48.9
top ps cpu: 52.2
top ps cpu: 55.0
top ps cpu: 57.6

It was at 91.9% when I killed it.

As usual, only kill -9 works on it.

Obviously the process is going mental.  Once gone mental, service smbd restarts do NOT kill the mental process so you're left with a child smbd process without a parent and I'm guessing that it freaks out even more that its parent's resources are gone/closed.

I've managed to stop the disk-filling logs by heavily customizing the samba logging features and rsyslog.

What else can I do to help with this bug?

Comment 17 SATOH Fumiyasu 2011-03-30 07:13:32 UTC

Created attachment 6359 [details]
Make FD_SETSIZE tuneable for select(2) with more FDs

Executing select(2) and friends (e.g. FD_SET macro) with a value of
fd that is equal to or larger than FD_SETSIZE will result in undefined
behavior. The default value for FD_SETSIZE is 1024 on major OSes.

This is a patch for v3-5-test branch to make FD_SETSIZE tuneable for
select(2) with more FDs.

Try as the following:

  $ gzip -dc samba-3.5.8.tar.gz |tar xf -
  $ cd samba-3.5.8
  $ patch -p1 </path/to/samba-3.5.8-FD_SETSIZE.patch
  $ cd source3
  $ CPPFLAGS="-DSAMBA_FD_SETSIZE=16384" ./configure --YOUR-CONFIUGRE-OPTIONS
  $ make
  # su
  # make install

Comment 18 Volker Lendecke 2011-05-14 09:19:46 UTC

Increasing FD_SETSIZE is not the full fix unfortunately. For 3.6 we've converted all samba calls to select to poll(2), which is the real fix. Please be aware that this is neither the full fix, as there are libraries that still call select. But all that Samba can do is done at this point. Closing as fixed.

Volker