Hi,

In a test environment with two machines, one providing GPFS and the other acting as a single ctdb node, I noticed that smbd keeps forking until the machine dies with about 790 processes (about 460 of them smbd). I tried this a few times and it should be reproducible if you have 2 machines.

Machine A provides GPFS, which is mounted by machine B. Machine B runs a single samba 3.2.4 (built with --with-clustering-support) configured as a ctdb node. Machine B provides one share, which is the mounted gpfs. This samba share is mounted locally on machine B. Running the iozone benchmark with the following parameters "worked well" to trigger the problem:

iozone -s 10G -r 256k -r 512k -r 1024k -r 2048k -r 4096k -r 8096k -f /mnt/iozone

After some time you should see errors in /var/log/samba/log.smbd that look like:

[2008/12/02 14:23:29, 0] lib/util_sock.c:write_data(1059)
[2008/12/02 14:23:29, 0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2008/12/02 14:23:29, 0] smbd/process.c:srv_send_smb(74)
  Error writing 101 bytes to client. -1. (Transport endpoint is not connected)
[2008/12/02 14:23:29, 0] lib/util_sock.c:write_data(1059)
[2008/12/02 14:23:29, 0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe

My guess would be that the I/O wait gets too high at some point and samba thinks the gpfs is gone. I retried this three times now with the same result.
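For completeness, the local mount and the benchmark run look roughly like this (share name, user and mount point are just the values from my test setup, so treat this as a sketch):

  # mount the samba share exported by machine B back onto machine B itself
  mount -t cifs //localhost/gpfs_fast /mnt -o username=testuser
  # then let iozone write its test file onto that cifs mount
  iozone -s 10G -r 256k -r 512k -r 1024k -r 2048k -r 4096k -r 8096k -f /mnt/iozone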
Hi,

First a general comment: the cluster support in Samba 3.2 is still incomplete. For better testing, you might want to use the samba-ctdb branch, which is based on Samba 3.2 and contains the latest cluster enhancements (or try the 3.3 release candidate).

The samba-ctdb branch is located here on the web:
http://gitweb.samba.org/?p=obnox/samba-ctdb.git;a=summary
or as a git repository:
git://git.samba.org/obnox/samba-ctdb.git

Packages for RHEL can be found here:
http://ctdb.samba.org/packages/

Cheers - Michael
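P.S. In case it saves you some time, building from that branch looks roughly like this (a sketch only; the name of the source subdirectory and the exact configure switch may differ on your checkout, so check ./configure --help):

  git clone git://git.samba.org/obnox/samba-ctdb.git
  cd samba-ctdb/source          # may be called source3 in newer trees
  ./autogen.sh
  ./configure --with-cluster-support
  make && make install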
Now for your concrete test case:

(In reply to comment #0)
> In a test environment with two machines, one providing GPFS and the other
> acting as a single ctdb node, I noticed that smbd keeps forking until the
> machine dies with about 790 processes (about 460 of them smbd).
>
> Machine A provides GPFS, which is mounted by machine B. Machine B runs a
> single samba 3.2.4 (built with --with-clustering-support) configured as a
> ctdb node.

I don't completely understand your setup: Only machine A has gpfs? Then how is that mounted on machine B? What operating system are you running?

The usual setup looks like this (a minimal sketch of the ctdb side follows below):
* We have a gpfs cluster with several nodes. It should be irrelevant where the storage comes from, since this is visible as a plain gpfs file system to the nodes.
* ctdb runs on all the nodes (or some of them). ctdb needs the common cluster storage for communication via locks, so we need gpfs running on those nodes where we want to run ctdb.
* samba can run on (all or some of) the nodes on top of ctdb + gpfs.

I need to understand your setup first before I can comment any further.

Cheers - Michael
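P.S. To illustrate the usual setup: on each node that should run ctdb (and samba on top of it), the ctdb side boils down to something like the following. The addresses, the lock file path and the sysconfig location are only placeholders for illustration; adjust them to your cluster.

  # /etc/ctdb/nodes -- one private IP per cluster node
  10.0.0.1
  10.0.0.2

  # /etc/sysconfig/ctdb (location may differ per distribution)
  CTDB_RECOVERY_LOCK=/gpfs/fs1/ctdb/recovery.lock   # must live on the shared gpfs
  CTDB_NODES=/etc/ctdb/nodes
  CTDB_MANAGES_SAMBA=yes                            # let ctdb start/stop smbd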
I upgraded samba to 3.3RC1; the same issue is happening, but it seems to build up somewhat more slowly this time.

> Only machine A has gpfs?
> Then how is that mounted on machine B?
No, both machines have gpfs, but the storage (i.e. the storage pool) is in my case on machine A, so no data is actually written to the gpfs on B.

The samba config is:

[root@kempes ~]# net conf list
[global]
        clustering = yes
        vfs objects = gpfs fileid
        idmap backend = tdb2
        private dir = /gpfs/fs1/ctdb/
        use mmap = no
        gpfs:sharemodes = No
        force unknown acl user = yes
        fileid:mapping = fsname

[gpfs_fast]
        comment = Share for GPFS fast fileset
        path = /gpfs/fs1/fast
        read only = no

gpfs_fast is a dataOnly fileset with currently one machine as a target.

> What operating system are you running?
CentOS 5.2

> I need to understand your setup first, before I can comment any further.
It is basically a testing environment; I'm going to add more machines later.

I changed the log level to 10 this morning so I could provide some more information. As the file was about 4 MB in size, I tried to cut it down; I hope it still helps that way.
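For the record, this is roughly how I raised the log level (commands from memory, so treat them as a sketch):

  # persist the setting in the registry configuration shown above
  net conf setparm global "log level" "10"
  # and/or bump all currently running smbd processes without a restart
  smbcontrol smbd debug 10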
Created attachment 3781
Part of the log file with log level 10
Some first thoughts.

(In reply to comment #3)
> I upgraded samba to 3.3RC1; the same issue is happening, but it seems to
> build up somewhat more slowly this time.
>
> > Only machine A has gpfs?
> > Then how is that mounted on machine B?
> No, both machines have gpfs, but the storage (i.e. the storage pool) is in
> my case on machine A, so no data is actually written to the gpfs on B.

This should be irrelevant for samba/ctdb. They only use the mounted storage.

> The samba config is:
>
> [root@kempes ~]# net conf list
> [global]
>         clustering = yes
>         vfs objects = gpfs fileid
>         idmap backend = tdb2

ok

>         private dir = /gpfs/fs1/ctdb/

You should not (need to) do this. ctdb handles the tdbs stored in the private dir. Putting the private tdbs on cluster storage was useful before ctdb started supporting persistent tdbs.

>         use mmap = no
>         gpfs:sharemodes = No
>         force unknown acl user = yes
>         fileid:mapping = fsname

When you are using NFS acls on GPFS, you should also set the following (a sketch of applying these via net conf follows below):

force unknown acl user = yes
nfs4: mode = special
nfs4: chown = yes
nfs4: acedup = merge

See http://wiki.samba.org/index.php/CTDB_Setup

I still need to look at the log file you have provided.

Cheers - Michael
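P.S. Since you are using the registry-based configuration anyway, applying the changes above would look roughly like this (a sketch; double-check the parameter spellings against the wiki page):

  # drop the private dir override and let ctdb manage the private tdbs
  net conf delparm global "private dir"
  # recommended settings for NFSv4 ACLs on GPFS
  net conf setparm global "nfs4: mode" "special"
  net conf setparm global "nfs4: chown" "yes"
  net conf setparm global "nfs4: acedup" "merge"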