Created attachment 17864
Example smb.conf file that triggers the error
OS: Debian 11 Bullseye container under Proxmox 7.4.1
Admin Tool: Cockpit with the 45drives "File Sharing" plugin
I am setting up a new Proxmox-based home server with Samba running on Debian in an LXC container.
While configuring the smb.conf file (which the File Sharing web UI loads into the Samba registry) I ran into an issue with the "copy = [servicename]" directive causing errors, but only for some uses and under very specific circumstances.
The error is repeatable, but very hard to pin down to a specific cause. For this reason I have not been able to produce an anonymised test case because almost any change (e.g. to the service names) can cause the bug to disappear.
The example file is therefore essentially the top half of my real smb.conf file.
In smb.conf I set up one service ("Alex") as a template for the home shares for all users. I then "copy =" this service and just change the Path and any permissions tweaks.
In the example (see attachment) the copy directive works correctly 3 times, but then fails on the 4th invocation with the error-
Unable to copy service - source not found: Alex
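For reference, the smb.conf man page notes that the service named in "copy =" must occur earlier in the file than the section doing the copying, so it is worth ruling out that ordinary cause first. A minimal sketch of such a check follows. The function name check_copy_sources and the paths in the demo are illustrative assumptions, not part of Samba, and the parser only handles a simple one-directive-per-line layout.

```shell
#!/bin/sh
# check_copy_sources FILE
# Prints one line for every "copy =" directive whose source section has
# not been defined earlier in FILE. Prints nothing if all sources resolve.
# (Hypothetical helper, not a Samba tool.)
check_copy_sources() {
    awk '
        # Skip comment lines (smb.conf accepts # and ; as comment markers).
        /^[ \t]*[#;]/ { next }
        # Section header: remember its name, compared case-insensitively.
        /^[ \t]*\[[^]]*\]/ {
            name = $0
            sub(/^[ \t]*\[/, "", name)
            sub(/\].*$/, "", name)
            current = name
            seen[tolower(name)] = 1
            next
        }
        # "copy = source" directive: the source must already be in seen[].
        tolower($0) ~ /^[ \t]*copy[ \t]*=/ {
            src = $0
            sub(/^[^=]*=[ \t]*/, "", src)
            sub(/[ \t]+$/, "", src)
            if (!(tolower(src) in seen))
                printf "[%s] copies undefined or later-defined section: %s\n", current, src
        }
    ' "$1"
}

# Demo on a hypothetical two-share file (paths are placeholders).
demo=$(mktemp)
cat > "$demo" <<'EOF'
[Alex]
    path = /srv/homes/alex
    valid users = alex

[Colin]
    copy = Alex
    path = /srv/homes/colin
EOF
check_copy_sources "$demo"   # prints nothing when every source resolves
rm -f "$demo"
```

If this reports nothing for the attached file, the ordering rule is satisfied and the "source not found" message has some other cause.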
Extending the Example
You will see in the example smb.conf there is a home share for user "Shirley" commented out. If I uncomment this share (so there are 6 shares in total) the "unable to copy service" error occurs twice, once each for the [Shirley] and [Colin] service definitions.
1) The "File Sharing" web tool reloads the Samba config whenever a config change is made, but I have sometimes found it necessary to also-
a) Manually run "smbcontrol all reload-config"
b) Re-access at least one of the shares from a (Win10 Pro) client
to start seeing the errors pop up in the logs.
2) Any change to the smb.conf text can change the behaviour of the bug. For example I attempted to anonymise the file by changing the usernames to "User1,User2..." but even this small change seemed to prevent the error from occurring.
3) The error is not simply caused by using "copy =" 4 or more times on one template. I have used the same technique to define 5 other services from a single template "bulk share" service later in the full smb.conf file, and those shares work correctly with no errors.
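Because the errors only show up in the logs after a reload and a client access, it can help to count them per source service rather than eyeballing the log. A rough sketch, assuming the message text quoted earlier in this report; the function name and the log path mentioned afterwards are assumptions.

```shell
#!/bin/sh
# summarise_copy_errors LOGFILE
# Prints "<source service> <count>" for every source named in an
# "Unable to copy service - source not found" message in LOGFILE.
# (Hypothetical helper, not a Samba tool.)
summarise_copy_errors() {
    awk -F'source not found: ' '
        /Unable to copy service - source not found: / { count[$2]++ }
        END { for (s in count) printf "%s %d\n", s, count[s] }
    ' "$1" | sort
}
```

After running "smbcontrol all reload-config" and re-accessing a share, this could be pointed at the smbd log (commonly /var/log/samba/log.smbd on Debian, but check the "log file" setting) to see whether the count changes between edits of smb.conf.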
I used the "copy =" feature partly for convenience (a smaller file) and partly to keep the service definitions consistent.
For the moment I have had to revert to specifying most of the users' home service definitions in full, without the "copy =" directive.
Created attachment 17865
Screenshot of error log
Can you run smbd under valgrind and reproduce this ? This seems as though it might be a memory corruption error somehow.
I'd need a pointer to some instructions on how to run Samba under valgrind. It's not something I'm familiar with.
valgrind --trace-children=yes --num-callers=100 /usr/sbin/smbd
I had a quick look at the valgrind docs. Apparently it needs a version of the application compiled with debug symbols enabled and all the optimisations turned off (-O0) in order to provide accurate information. It's beyond my level of knowledge to start creating custom builds of the Samba suite from source, I'm afraid.
If there's a debug-friendly version of the smbd executable I can download and use as a drop-in replacement for the production version then I can try that, but that's about my limit.
I don't need accurate information, I just need to know if it's a memory access error. If you can confirm that, it's worth taking the time to try to reproduce it.
Ok. So the process I followed (as root) was-
1) Install valgrind via apt
2) cd to /etc/samba
3) Stop smbd with "smbcontrol smbd shutdown"
4) Create the "breaking" config file as smb_bug_example_failed.conf
5) Run smbd as-
valgrind --trace-children=yes --num-callers=100 /usr/sbin/smbd -s smb_bug_example_failed.conf
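Since smbd forks, the reports from several processes end up interleaved on the terminal. One possible variation, sketched below, is to add valgrind's --log-file option with the %p placeholder so each process writes its own file. The wrapper name valgrind_smbd, the DRY_RUN switch and the default log directory are assumptions made for this sketch; the other options are the ones used above.

```shell
#!/bin/sh
# valgrind_smbd CONF [LOGDIR]
# Runs smbd under valgrind with one log file per process.
# With DRY_RUN set, only prints the command instead of running it.
# (Hypothetical wrapper, not shipped with Samba or valgrind.)
valgrind_smbd() {
    conf=$1
    logdir=${2:-/tmp}
    # valgrind expands %p to the PID of each traced process.
    set -- valgrind --trace-children=yes --num-callers=100 \
        --log-file="$logdir/valgrind-smbd.%p.log" \
        /usr/sbin/smbd -s "$conf"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running it with DRY_RUN=1 first shows the exact command line before anything is started.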
I believe the config change "took" as accessing the server from a Windows client showed only the 5 uncommented services present in the conf file.
No errors from the "copy =" line appeared in the logs, unfortunately (I did say this error was pretty mercurial).
Initially the valgrind output was showing 0 errors, however after accessing the shares from a client a non-zero leak value did appear (full output in the next comment).
root@fileserver:/etc/samba# valgrind --trace-children=yes --num-callers=100 /usr/sbin/smbd -s smb_bug_example_failed.conf
==1278== Memcheck, a memory error detector
==1278== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1278== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==1278== Command: /usr/sbin/smbd -s smb_bug_example_failed.conf
==1278== HEAP SUMMARY:
==1278== in use at exit: 184,727 bytes in 1,023 blocks
==1278== total heap usage: 2,046 allocs, 1,023 frees, 400,709 bytes allocated
==1278== LEAK SUMMARY:
==1278== definitely lost: 0 bytes in 0 blocks
==1278== indirectly lost: 0 bytes in 0 blocks
==1278== possibly lost: 88,677 bytes in 310 blocks
==1278== still reachable: 96,050 bytes in 713 blocks
==1278== suppressed: 0 bytes in 0 blocks
==1278== Rerun with --leak-check=full to see details of leaked memory
==1278== For lists of detected and suppressed errors, rerun with: -s
==1278== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==1280== HEAP SUMMARY:
==1280== in use at exit: 190,937 bytes in 1,054 blocks
==1280== total heap usage: 2,106 allocs, 1,052 frees, 411,079 bytes allocated
==1280== LEAK SUMMARY:
==1280== definitely lost: 0 bytes in 0 blocks
==1280== indirectly lost: 0 bytes in 0 blocks
==1280== possibly lost: 94,550 bytes in 327 blocks
==1280== still reachable: 96,387 bytes in 727 blocks
==1280== suppressed: 0 bytes in 0 blocks
==1280== Rerun with --leak-check=full to see details of leaked memory
==1280== For lists of detected and suppressed errors, rerun with: -s
==1280== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==1284== Warning: invalid file descriptor -1 in syscall close()
==1284== HEAP SUMMARY:
==1284== in use at exit: 264,655 bytes in 1,495 blocks
==1284== total heap usage: 8,140 allocs, 6,645 frees, 1,713,943 bytes allocated
==1284== LEAK SUMMARY:
==1284== definitely lost: 34 bytes in 1 blocks
==1284== indirectly lost: 0 bytes in 0 blocks
==1284== possibly lost: 156,576 bytes in 708 blocks
==1284== still reachable: 108,045 bytes in 786 blocks
==1284== suppressed: 0 bytes in 0 blocks
==1284== Rerun with --leak-check=full to see details of leaked memory
==1284== For lists of detected and suppressed errors, rerun with: -s
==1284== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==1300== Warning: invalid file descriptor -1 in syscall close()
==1300== HEAP SUMMARY:
==1300== in use at exit: 264,655 bytes in 1,495 blocks
==1300== total heap usage: 8,156 allocs, 6,661 frees, 1,715,522 bytes allocated
==1300== LEAK SUMMARY:
==1300== definitely lost: 34 bytes in 1 blocks
==1300== indirectly lost: 0 bytes in 0 blocks
==1300== possibly lost: 156,576 bytes in 708 blocks
==1300== still reachable: 108,045 bytes in 786 blocks
==1300== suppressed: 0 bytes in 0 blocks
==1300== Rerun with --leak-check=full to see details of leaked memory
==1300== For lists of detected and suppressed errors, rerun with: -s
==1300== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==1302== HEAP SUMMARY:
==1302== in use at exit: 268,379 bytes in 1,557 blocks
==1302== total heap usage: 8,336 allocs, 6,779 frees, 1,736,797 bytes allocated
==1302== LEAK SUMMARY:
==1302== definitely lost: 0 bytes in 0 blocks
==1302== indirectly lost: 0 bytes in 0 blocks
==1302== possibly lost: 155,728 bytes in 696 blocks
==1302== still reachable: 112,651 bytes in 861 blocks
==1302== suppressed: 0 bytes in 0 blocks
==1302== Rerun with --leak-check=full to see details of leaked memory
==1302== For lists of detected and suppressed errors, rerun with: -s
==1302== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
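The per-process reports above are easier to compare once reduced to one line per PID. A small sketch, assuming the valgrind output has been saved to a file; the function name is made up for illustration.

```shell
#!/bin/sh
# summarise_valgrind FILE
# Prints one line per process: its PID, the "definitely lost" byte count
# and the error count from the ERROR SUMMARY line.
# (Hypothetical helper; it relies only on the line layout shown above.)
summarise_valgrind() {
    awk '
        /definitely lost:/ { pid = $1; gsub(/=/, "", pid); lost[pid] = $4 }
        /ERROR SUMMARY:/ {
            pid = $1; gsub(/=/, "", pid)
            errs[pid] = $4
            order[++n] = pid      # keep PIDs in the order they finished
        }
        END {
            for (i = 1; i <= n; i++)
                printf "pid=%s definitely_lost=%s errors=%s\n",
                       order[i], lost[order[i]], errs[order[i]]
        }
    ' "$1"
}
```

Applied to the output above, this would show 34 bytes definitely lost for PIDs 1284 and 1300, and zero errors for all five processes.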
Hmm. After running smbd under valgrind I returned the server to normal operation (as I thought) by rebooting the host (a Debian LXC container) which should have restarted Samba "clean" with the original config.
I also rebooted my client PC to clear out any cached view of the server.
However I am seeing bizarre behaviour. Even though the 45drives web UI ("File Sharing") shows the original config with a total of 12 file shares (plus [printers]) defined, my Windows client sees only a single share, the one called [Colin].
Attempting to access other shares by name (e.g. \\fileserver\Alex) fails, so they really aren't there; it's not just that they are not browsable.
I have reloaded the original smb.conf file into the Samba registry via the web UI, and also dumped out the registry config in smb.conf format, and everything looks fine, but 90% of the shares are missing.
I'm guessing, but could this be some kind of Samba registry corruption problem, and if so how would one diagnose that?
Not sure if this is a consequence of the original "copy=" problem or a new issue. If you would prefer it logged as a new bug please let me know.
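One way to narrow this down might be to compare the share names in the registry dump against what the server actually offers. The sketch below only extracts section names from smb.conf-format text on standard input; the function name is an assumption, and the parsing ignores anything beyond plain [section] header lines.

```shell
#!/bin/sh
# list_sections
# Reads smb.conf-format text on stdin and prints the section names,
# one per line, sorted. Commented-out headers are ignored.
# (Hypothetical helper, not a Samba tool.)
list_sections() {
    awk '
        /^[ \t]*[#;]/ { next }
        /^[ \t]*\[[^]]*\]/ {
            name = $0
            sub(/^[ \t]*\[/, "", name)
            sub(/\].*$/, "", name)
            print name
        }
    ' | sort
}
```

Piping the output of "net conf list" through list_sections gives the shares the registry defines, which can then be checked against the list a client sees (for example from "smbclient -L") to tell whether the registry contents or the running smbd is at fault.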
Interesting. It required a full reboot of the Proxmox host that was running the LXC container that was itself hosting Samba to get the config back running as expected with all the defined shares visible.
The LXC container should be largely isolated from the host OS, other than in my case I have the file space for the Samba shares set up in the Proxmox host and then bind-mounted into the container so Samba can see them.
However LXC containers do share the host kernel. Could there be some interaction happening with the fact that Samba config is now stored in the registry?