13751 – Samba 4.9.3 smbd coredumps in AIX

Status:	RESOLVED INVALID

Alias:	None

Product:	TDB
Classification:	Unclassified
Component:	libtdb
Version:	unspecified
Hardware:	PPC AIX

Importance:	P5 normal
Target Milestone:	---
Assignee:	Samba QA Contact
QA Contact:	Samba QA Contact

Reported:	2019-01-16 15:48 UTC by Ayappan
Modified:	2020-11-12 22:49 UTC
CC List:	3 users

Attachments
smbd-debug10 (28.78 KB, text/plain) 2019-01-18 15:07 UTC, Ayappan

Description Ayappan 2019-01-16 15:48:14 UTC

Running smbd (Samba 4.9.3) on AIX dumps core immediately on startup.
The stack trace shows that the fault (signal 10) occurs inside tdb_write.

Details below:

# /opt/freeware/sbin/smbd --interactive
smbd version 4.9.3 started.
Copyright Andrew Tridgell and the Samba Team 1992-2018
===============================================================
INTERNAL ERROR: Signal 10 in pid 11141180 (4.9.3)
Please read the Trouble-Shooting section of the Samba HOWTO
===============================================================
PANIC (pid 11141180): internal error
unable to produce a stack trace on this platform
dumping core in /var/log/samba/cores/smbd
IOT/Abort trap(coredump)



# dbx /opt/freeware/sbin/smbd core
Type 'help' for help.
warning: The core file is not a fullcore. Some info may
not be available.
[using memory image in core]
reading symbolic information ...warning: Unable to access the stab file. Some info may not be available


IOT/Abort trap in pthread_kill at 0xd0521f14
0xd0521f14 (pthread_kill+0xb4) 80410014         lwz   r2,0x14(r1)
(dbx) where
pthread_kill(??, ??) at 0xd0521f14
_p_raise(??) at 0xd0521348
raise.raise(??) at 0xd011f9c0
abort() at 0xd01af584
dump_core(), line 338 in "dumpcore.c"
smb_panic_s3(why = "internal error"), line 839 in "util.c"
smb_panic(why = "internal error"), line 170 in "fault.c"
fault_report(sig = 10), line 84 in "fault.c"
sig_fault(sig = 10), line 95 in "fault.c"
.() at 0xf014
tdb_write(tdb = 0x3003b618, off = 16380, buf = 0x2ff223f0, len = 4), line 223 in "io.c"
tdb_ofs_write(tdb = 0x3003b618, offset = 16380, d = 0x2ff22440), line 674 in "io.c"
update_tailer(tdb = 0x3003b618, offset = 696, rec = 0x2ff224e0), line 96 in "freelist.c"
tdb_free(tdb = 0x3003b618, offset = 696, rec = 0x2ff224e0), line 316 in "freelist.c"
tdb_expand(tdb = 0x3003b618, size = 15688), line 655 in "io.c"
tdb_allocate_from_freelist(tdb = 0x3003b618, length = 108, rec = 0x2ff22650), line 577 in "freelist.c"
tdb_allocate(tdb = 0x3003b618, hash = 1167830340, length = 83, rec = 0x2ff22650), line 664 in "freelist.c"
tdb._tdb_storev(tdb = 0x3003b618, key = (...), dbufs = 0x2ff227dc, num_dbufs = 1, flag = 1, hash = 1167830340), line 591 in "tdb.c"
tdb_storev(tdb = 0x3003b618, key = (...), dbufs = 0x2ff227dc, num_dbufs = 1, flag = 1), line 700 in "tdb.c"
db_tdb_storev(rec = 0x3003b418, dbufs = 0x2ff227dc, num_dbufs = 1, flag = 1), line 298 in "dbwrap_tdb.c"
dbwrap_record_storev(rec = 0x3003b418, dbufs = 0x2ff227dc, num_dbufs = 1, flags = 1), line 90 in "dbwrap.c"
dbwrap_record_store(rec = 0x3003b418, data = (...), flags = 1), line 99 in "dbwrap.c"
smbXsrv_version_global_init(server_id = 0x2ff22ac0), line 240 in "smbXsrv_version.c"
main(0x2, 0x2ff22c74) at 0x10003700
(dbx) quit

Comment 1 Ayappan 2019-01-17 10:02:29 UTC

Any update on this?

Comment 2 Ayappan 2019-01-17 14:28:19 UTC

Running truss on smbd shows the following:

statx("/var/locks", 0x2FF22688, 128, 011)       = 0
kopen("/var/locks/smbXsrv_version_global.tdb", O_RDWR|O_CREAT|O_LARGEFILE, S_IRUSR|S_IWUSR) = 13
kfcntl(13, F_GETFD, 0x00000000)                 = 0
kfcntl(13, F_SETFD, 0x00000001)                 = 0
kfcntl(13, 13, 0x2FF22250)                      = 0
kfcntl(13, 12, 0x2FF22250)                      = 0
kfcntl(13, 13, 0x2FF222B0)                      = 0
klseek(13, 0, 0, 0x00000000)                    = 0
kftruncate(13, 0x0000000000000000)              = 0
kwrite(13, " T D B   f i l e\n\0\0\0".., 696)   = 696
kfcntl(13, 13, 0x2FF222C0)                      = 0
klseek(13, 0, 0, 0x00000000)                    = 0
kread(13, " T D B   f i l e\n\0\0\0".., 168)    = 168
fstatx(13, 0x2FF22438, 128, 010)                = 0
fstatx(13, 0x2FF222A0, 128, 010)                = 0
kmmap(0x00000000, 696, 3, 1, 13, 0x00000000, 0x00000000) = 0xB006C000
kfcntl(13, 13, 0x2FF22250)                      = 0
kfcntl(13, 13, 0x2FF22250)                      = 0
kfcntl(13, 13, 0x2FF22250)                      = 0
fstatx(13, 0x2FF22658, 128, 010)                = 0
kfcntl(13, 13, 0x2FF22510)                      = 0
kfcntl(13, 13, 0x2FF22430)                      = 0
fstatx(13, 0x2FF22410, 128, 010)                = 0
munmap(0xB006C000, 696)                         = 0
kmmap(0x00000000, 696, 3, 1, 13, 0x00000000, 0x00000000) = 0xB006C000
fstatx(13, 0x2FF20338, 128, 010)                = 0
kioctl(13, -2147195273, 0x2FF20300, 0x00000000) = 1
kioctl(13, -2147195273, 0x2FF20300, 0x00000000) = 0
munmap(0xB006C000, 696)                         = 0
kmmap(0x00000000, 16384, 3, 1, 13, 0x00000000, 0x00000000) = 0xB006C000
    Received signal #10, SIGBUS [caught]

Comment 3 Ayappan 2019-01-18 14:48:43 UTC

I cleaned up the system and started fresh, removing the secrets.tdb file as well.

Now I am getting an error with the secrets.tdb file:

# /opt/freeware/sbin/smbd -i
smbd version 4.9.3 started.
Copyright Andrew Tridgell and the Samba Team 1992-2018
tdb(/var/lib/samba/private/secrets.tdb): tdb_oob len 16408 beyond eof at 696
tdb(/var/lib/samba/private/secrets.tdb): tdb_transaction_recover: failed to read recovery record
Failed to open /var/lib/samba/private/secrets.tdb
tdb(/var/lib/samba/private/secrets.tdb): tdb_oob len 16408 beyond eof at 696
tdb(/var/lib/samba/private/secrets.tdb): tdb_transaction_recover: failed to read recovery record
Failed to open /var/lib/samba/private/secrets.tdb
exit_daemon: STATUS=daemon failed to start: smbd can not open secrets.tdb, error code 13


Any ideas will be really helpful.

Comment 4 Ayappan 2019-01-18 15:07:35 UTC

Created attachment 14789
smbd-debug10

Attaching the output of smbd -i with debug=10

Comment 5 Ayappan 2019-01-18 15:55:24 UTC

Removing the file "/var/lib/samba/private/secrets.tdb" again and running smbd -i -d10 results in the error below.

Attempting to register passdb backend tdbsam
Successfully added passdb backend 'tdbsam'
Found pdb backend tdbsam
pdb backend tdbsam has a valid init
tdb(/var/lib/samba/private/secrets.tdb): tdb_transaction_start: nesting 1
dbwrap_lock_order_lock: check lock order 1 for /var/lib/samba/private/secrets.tdb
lock order:  1:/var/lib/samba/private/secrets.tdb 2:<none> 3:<none>
dbwrap_lock_order_unlock: release lock order 1 for /var/lib/samba/private/secrets.tdb
tdb(/var/lib/samba/private/secrets.tdb): tdb_transaction_start: nesting 1
tdb(/var/lib/samba/private/secrets.tdb): tdb_transaction_setup_recovery: transaction data over new region boundary
tdb(/var/lib/samba/private/secrets.tdb): tdb_transaction_prepare_commit: failed to setup recovery data
PANIC (pid 56951290): could not start commit secrets db
unable to produce a stack trace on this platform
dumping core in /var/log/samba/cores/smbd
IOT/Abort trap(coredump)


# dbx /opt/freeware/sbin/smbd core
Type 'help' for help.
[using memory image in core]
reading symbolic information ...warning: Unable to access the stab file. Some info may not be available


IOT/Abort trap in pthread_kill at 0xd05833ec ($t1)
0xd05833ec (pthread_kill+0xac) 80410014            lwz   r2,0x14(r1)
(dbx) where
pthread_kill(??, ??) at 0xd05833ec
_p_raise(??) at 0xd05827c8
raise.raise(??) at 0xd01234a4
abort() at 0xd0189a18
dump_core(), line 338 in "dumpcore.c"
smb_panic_s3(why = "could not start commit secrets db"), line 839 in "util.c"
smb_panic(why = "could not start commit secrets db"), line 170 in "fault.c"
get_global_sam_sid(), line 217 in "machine_sid.c"
main(0x3, 0x2ff22ae0) at 0x10003694

Comment 6 Ayappan 2019-01-23 11:03:15 UTC

After some debugging, it looks like posix_fallocate is broken on AIX 6.1, 7.1 and 7.2.
File: lib/tdb/common/io.c

#if HAVE_POSIX_FALLOCATE
        ret = tdb_posix_fallocate(tdb, size, addition);
        if (ret == 0) {
                return 0;
        }
        if (ret == ENOSPC) {


#if HAVE_POSIX_FALLOCATE
static int tdb_posix_fallocate(struct tdb_context *tdb, off_t offset,
                               off_t len)
{
        ssize_t ret;

        if (!tdb_adjust_offset(tdb, &offset)) {
                return -1;
        }

        do {
                ret = posix_fallocate(tdb->fd, offset, len);
        } while ((ret == -1) && (errno == EINTR));


The call to posix_fallocate returns zero, but the size of the file (secrets.tdb) does not increase. Wrapping the code above in #if !defined(_AIX) (effectively disabling it) makes the daemon run again.

I see there is one more place where posix_fallocate is called.

File: source3/lib/system.c

int sys_posix_fallocate(int fd, off_t offset, off_t len)
{
#if defined(HAVE_POSIX_FALLOCATE)
        return posix_fallocate(fd, offset, len);
#elif defined(F_RESVSP64)
        /* this handles XFS on IRIX */
        struct flock64 fl;

I am not sure what impact disabling this call in the same way would have.
Maybe the community can shed some light here.

Comment 7 Björn Jacke 2019-01-23 11:07:18 UTC

If you are confident that this is an AIX bug, can you please file an upstream AIX bug for it and reference it here, so that the result can be followed?

Comment 8 Ayappan 2019-01-23 11:15:40 UTC

I am from the IBM AIX Toolbox development team.
This will be an internal defect, so there won't be a public URL to reference.

Comment 9 Björn Jacke 2019-02-19 11:50:52 UTC

We are not encountering this issue in the SerNet samba+ packages.

Comment 10 Andrew Bartlett 2019-04-23 05:19:32 UTC

(In reply to Ayappan from comment #8)
If this can be reproduced at will then a configure test could be written to detect this behaviour and then to blacklist the posix_fallocate() call.

Comment 11 Ayappan 2019-04-23 06:43:10 UTC

Thanks for the update.

A simple sample program using posix_fallocate works on AIX. This needs more analysis.

Comment 12 SATOH Fumiyasu 2019-04-23 07:06:46 UTC

My Samba 4.10.2 on AIX 7.2 with posix_fallocate support has no problem.

```
# file /usr/local/sbin/smbd
/usr/local/sbin/smbd: executable (RISC System/6000 V3.1) or obj module not stripped
# /usr/local/sbin/smbd -b |grep -i posix_fallocate
   HAVE_POSIX_FALLOCATE
   _POSIX_FALLOCATE_CAPABLE_LIBC
```

What filesystem are you using for /var/lib/samba/private/secrets.tdb?
`mount |grep /var`

Is your Samba a 32-bit or 64-bit binary?
`file /opt/freeware/sbin/smbd`

Comment 13 Ayappan 2019-04-23 07:28:54 UTC

Interesting!!

# mount |grep /var
         /dev/hd9var      /var             jfs2   Apr 23 12:10 rw,log=/dev/hd8
         /dev/livedump    /var/adm/ras/livedump jfs2   Apr 23 12:10 rw,log=/dev/hd8

The Samba build is 32-bit.

Which AIX level are you using (output of oslevel -s)? Mine is 7200-03-00-0000.

Can you run the command below on the tdb library and paste the output here?

# dump -Tov libtdb.so | grep posix
[32]    0x00000000    undef      IMP     DS EXTref   libc.a(shr.o) posix_fallocate

Comment 14 SATOH Fumiyasu 2019-04-23 07:47:59 UTC

(In reply to Ayappan from comment #13)

```
$ oslevel -s
7200-00-04-1717
$ dump -Tov libtdb.so |grep posix
[34]  0x00000000    undef      IMP     DS EXTref   libc.a(shr.o) posix_fallocate64
```

I've compiled Samba on AIX 7.2 with GCC 4.8.5, CPPFLAGS="$CPPFLAGS -D_LARGE_FILES -DHAVE_BROKEN_READLINK -D_UINTPTR_T_DEFINED=1" and the patches from bug #9557 and bug #10270. Without _LARGE_FILES and _UINTPTR_T_DEFINED in CPPFLAGS, the build fails.

Comment 15 Ayappan 2019-04-23 08:54:26 UTC

Thanks for the info.

I see "posix_fallocate64" in your case. AIX 7.2 provides posix_fallocate64 when "_LARGE_FILES" is defined, whereas AIX 6.1 does not. My build is on AIX 6.1.

It looks like the implementation could be wrong in AIX 6.1.
I need to check with the AIX core team.

Comment 16 Björn Jacke 2020-08-13 08:28:49 UTC

(In reply to Ayappan from comment #15)
> Looks like the implementation could be wrong in AIX 6.1 
> Need to check with AIX core team.

Just for completeness: what was the final outcome of this? Can you say which AIX versions and OS levels have a broken posix_fallocate64 implementation and which ones are fixed?