Bug 2681 - Filling disk (or quota) while saving MS-Word corrupts document
Status: CLOSED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.14a
Hardware: x86 Linux
Importance: P3 major
Target Milestone: none
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-05-05 08:07 UTC by Tom Schaefer
Modified: 2018-07-03 23:24 UTC

See Also:


Attachments
Proposed patch (2.48 KB, patch)
2005-05-16 18:02 UTC, Jeremy Allison

Description Tom Schaefer 2005-05-05 08:07:22 UTC
I work at a university.  Each day hundreds if not thousands of students use our 
computer labs and have a Samba drive mapped for their own personal use.  They 
are given a 20 Megabyte quota.

What has been happening is a student with their disk quota near full capacity 
will open a MS-Word document and add a lot of text and/or images and then go to 
save their document filling their disk quota in the process.  What happens next 
is very very ugly.  The save to the Samba share fails with Windows XP reporting 
a "delayed write failed" at the lower right of the desktop, the original 
document on the Samba share is now corrupt and the document is also corrupted 
in Word in memory.  It's actually quite the sight to behold, all the text on the 
screen in Word just transforms to garbage right before your eyes.

The production set up we use is Samba running on Solaris/Sparc.  The student 
directories are served to the Samba server from a NetApp NFS server and then 
re-exported as Samba shares.  However, I've been chipping away at this issue 
through much experimentation.  It's not an NFS problem, or quota problem, or 
Solaris specific problem.  I can replicate it on x86 Red Hat Enterprise Linux 4 
without quotas, just using a nearly full local disk partition.

The same scenario saving to Microsoft servers always results in Word just 
popping up a friendly disk full error, not corrupting the document on disk or 
in memory, and allowing you to save elsewhere, or delete some stuff to make 
room and then save again.  It's been tested saving to MS Server 2003, Win 2K, 
and XP Pro.  

To reproduce:

Fill an entire disk partition that is being served out by Samba to just under 
capacity.  About 1 Megabyte free is perfect.  

On your Windows XP or Win 2K system accessing that Samba share, if there isn't 
one there already put a smallish MS Word document onto it.  The test document 
I've been using is 29 kilobytes.  Open the document in MS Word 2002 or 2003.  
Now we have to make the document too large to be saved.  I do this by 
browsing my hard drive to a several hundred kilobyte jpg file, right click and 
select copy, then paste into Word 2 or 3 or however many times is necessary so 
the document will be too large to save.  Attempt to save the document.  Get a 
delayed write failure error.  Watch all the text on the screen in Word corrupt 
to garbage before your eyes.  Try to open the original document from the Samba 
share, it's completely corrupt as well.  Not only have you lost your new edits, 
you've lost the entire original document as well, very nasty!
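
For anyone repeating these steps, here is a rough Python sketch for filling the served partition down to about 1 Megabyte free. It is only an illustration: the helper names filler_size and fill_partition and the file name filler.bin are made up here, and filesystem metadata overhead means the remaining free space will only be approximately the requested amount. It is meant to be run on the server against the directory backing the share.

```python
import os
import shutil

MEGABYTE = 1024 * 1024


def filler_size(free_bytes, leave_free=MEGABYTE):
    """Bytes of filler needed so that roughly `leave_free` bytes stay free."""
    return max(0, free_bytes - leave_free)


def fill_partition(share_dir, leave_free=MEGABYTE, chunk=MEGABYTE):
    """Write real (non-sparse) data into filler.bin under share_dir."""
    remaining = filler_size(shutil.disk_usage(share_dir).free, leave_free)
    path = os.path.join(share_dir, "filler.bin")  # hypothetical file name
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(b"\xff" * n)  # non-zero bytes, so blocks are really used
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reached the disk
    return path
```

Deleting filler.bin afterwards returns the partition to its previous state.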
Comment 1 Jeremy Allison 2005-05-06 07:43:31 UTC
Can you reproduce this problem if you set the smb.conf parameter :

strict allocate = true 

in the [global] or share-specific part of the smb.conf ? If you can I'd like to
see an ethereal capture trace of the problem. I'll try and reproduce this once I
get back to the USA.
Jeremy.
Comment 2 Tom Schaefer 2005-05-06 09:51:26 UTC
(In reply to comment #1)

Wow!

Hi Jeremy, I didn't expect to hear from you so soon if ever, seeing as how
you're employed by Novell now.  They are lucky to have you.

You may find this amusing: We used to be big Novell users around here for many
years.  Then, about 4 years ago I demoed Samba for my boss's boss.  He fell in
love with it immediately.  The rest is history as they say.  Over the course of
the next 20 odd months the entire campus was migrated off Novell and onto Samba.  :)

Back to the issue at hand, yes I do have "strict allocate = yes" enabled in the
global section.  I came up with that one on my own in the course of trying to
solve this.  It didn't.

I could do some packet captures.  For it to be of any use I guess I'd have to do
two, one to a Samba server and one to a Windows server and you could try and see
what's different.

Honestly that would be somewhat of a hassle though.  If you are going to be back
from Germany relatively soon I'd probably just as soon wait.  I do think you'll
be able to replicate this without much trouble.  When are you expecting to come
back?  If you are going to be over there a while I'll see about gathering up
some traces to send your way.

Thank you,
Tom Schaefer
Comment 3 Jeremy Allison 2005-05-10 10:40:32 UTC
Tom, what would really help is to see a trace from a Windows client to a Windows
server returning the disk full error. I need to see exactly when Windows returns
this so we can match the call.
Can you get me tcpdump or ethereal full traces of this between Windows -> Windows ?
A Windows -> Samba would also help but isn't as important. With nfs we're
returning disk full on close as that's when we find out (when the client code in
the kernel flushes the write onto the NFS server). This was a bug fixed for
Intel in Roseville about 5 years ago so I don't want to just ignore the error on
close here, I'd rather find out where Windows detects the disk full problem and
ensure the same.
As a test (although this will damage performance), try setting :

strict sync = yes
sync always = yes

in the [global] section of the smb.conf file. This will force a flush onto disk
on every write. If a write returns full it will force it to be detected on the
write call, not the close call. Performance will suck though, but I'll be
interested to see if it fixes the problem.
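
The idea of flushing on every write can be pictured with a small Python sketch. This is not Samba's code, and the function name write_and_flush is invented for illustration; it only shows why an fsync straight after the write moves an out-of-space error from close time to write time.

```python
import os


def write_and_flush(fd, data):
    """Write data and fsync immediately.

    If the disk (or quota, or NFS server) is full, the OSError is raised
    here, on the write path, rather than later when the file is closed.
    """
    written = os.write(fd, data)
    os.fsync(fd)  # push the data to stable storage before returning
    return written
```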

Jeremy.
Comment 4 Tom Schaefer 2005-05-10 15:48:03 UTC
Jeremy,

I enabled strict sync = yes and sync always = yes and it did not solve the
problem.  In fact I had temporarily enabled those parameters myself trying to
come up with a solution before I ever even filed this bug report.

I'll capture some traffic with windump (Windows port of tcpdump) and get it to
you as soon as I can.  Probably tomorrow.  

Also, I need to tell you that this problem isn't completely black and white. 
It's not reproducible 100% of the time.  Well it is and it isn't.  Let me explain:

In some cases attempting to save the Word document to a Samba server that would
result in running out of disk space is handled flawlessly by the Windows/Word
client, a friendly error is popped up telling you the disk is full and that is
that.  No "delayed write failed" errors.  No document corruption.  That
generally seems to be the case if the document you are attempting to save is
going to put you vastly over your quota.  

In some cases it all goes berserk as outlined in the original bug report above. 
Those cases seem to be easily caused by following my guidelines above, about 1
Meg free on the disk to begin with and then add a little more than 1 Meg of new
material to your document and attempt to save it.

So Jeremy, you may have to take a few cracks at it before being able to
reproduce the problem but then once you do come up with a bug producing evil
magic formula of free disk space, original document size, and additions to the
document before resaving you'll be able to replicate the bug every single
attempt without fail.

In other words, if I can copy document1.doc from my C: drive onto a Samba share,
then open it from the Samba share, paste a 600kb jpeg file into it twice and
then attempt to save it and see all the text turn to garbage then I could do it
10 times in a row by just repeating this exact same procedure.

As far as the document corruption goes, that doesn't really seem to be a great
mystery.  When saving a document from Word it seems to go through the routine of
save to a temp file ~wrl0001.tmp or something like that, then deleting the
original document and then renaming ~wrl0001.tmp to the name of the original
document.  What I've observed is that even when these saves out to the Samba
share fail Word will still delete the original document and rename ~wrl0001.tmp
(the file it was writing when it ran out of space) to the name of the original
document and then I guess Word reads it back into memory from disk at that point
 causing the corruption of the document in memory.
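
The save sequence described above can be modelled with a short Python sketch. It is a hypothetical model of the observed behaviour, not Word's actual logic: the function name save_like_word is invented, and the capacity argument simply stands in for the free space left on the share.

```python
import os


def save_like_word(doc_path, new_bytes, capacity):
    """Model the observed sequence: temp file, delete original, rename.

    Only `capacity` bytes of the new content fit on the "disk". The
    failure is noted but not acted on, which is what loses the original.
    """
    tmp_path = os.path.join(os.path.dirname(doc_path), "~wrl0001.tmp")
    with open(tmp_path, "wb") as tmp:
        tmp.write(new_bytes[:capacity])  # truncated write
    write_ok = capacity >= len(new_bytes)
    os.remove(doc_path)            # original is deleted regardless
    os.rename(tmp_path, doc_path)  # truncated temp file takes its name
    return write_ok
```

A safer sequence would check write_ok before touching the original, leaving the old document in place when the temp file could not be written in full.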

Now I'm going to share with you a feeling I've had as to what's maybe going on
here, please discard without hesitation if this sounds ludicrous which to you it
very well might... I've been getting the sense this might be related to the
whole business of sparse files.  strict allocate = yes makes total sense to me
as a potential solution.  Is it at all possible that even with that parameter
enabled Samba is still creating sparse files?  I ask because
#1) what we have discovered is that in the end the Word document displays a size
that shouldn't be possible given the quota or disk limitation in place.  Say I
start with 1 Meg disk free, and try to save what would result in a 2 Megabyte
Word document and it gets corrupted as I've described - the resulting corrupt
file actually does list as a 2 Megabyte file and acts that way too - I can't
move the corrupt 2 Megabyte Word file from the Samba share to my c: drive and
back onto the Samba share, I just get an error that there's not enough disk space.
#2) It just kind of seems like it would fit - like that's how Windows would check
if there's space available - try to grow the file and see if it succeeds
#3) I do think strict allocate = yes helped.  As I was saying it's not a black
and white situation and it seems with strict allocate enabled I have to work
harder at coming up with a scenario to trigger the problem.
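
The sparse-file suspicion is easy to check on a POSIX system. The following Python sketch (the function name probe_write is made up) writes one byte at a high offset and compares the listed size with the space actually allocated; it assumes a filesystem that supports sparse files and a platform where st_blocks is available.

```python
import os


def probe_write(path, offset):
    """Write a single byte at `offset` and report (listed size, allocated bytes)."""
    with open(path, "wb") as f:
        f.seek(offset)   # skip over the hole without writing anything
        f.write(b"\0")   # the only byte actually written
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems
    return st.st_size, st.st_blocks * 512
```

On a filesystem with sparse-file support, a probe at offset 2097151 lists as a 2 Megabyte file while allocating far less than that, which would match the symptom in #1.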

Thank you,
Tom Schaefer
Comment 5 Tom Schaefer 2005-05-12 07:08:57 UTC
Jeremy,

I made some Ethereal traces yesterday and had to e-mail them to you directly 
because they were too large to attach on Bugzilla.  Hopefully you got them.

Anyhow, I stared at them some more last night and I think I've probably determined 
where the problem is.

I've analyzed other traces besides the ones I've sent you and what I'm seeing 
holds true for all of them.

If you were to look at the server2003 trace I sent, you would see in packets 89 
through 216 a big series of requests and responses where the Windows 
client is telling the server to write 1 byte in a file, incrementing the 
offset by about 32K each time.  This goes on successfully in 
packets 89 through 178.  At packet 179 the offset is up to 994815 which 
apparently makes the file too large to remain under quota so in packets 179 
through 216 the Windows client keeps requesting these 1 byte writes and upping 
the offset by about 32K and keeps getting told over and over with each of those 
requests STATUS_DISK_FULL ... 

     89 0.321558    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 11305
     90 0.321954    134.124.18.221        134.124.42.203        TCP      netbios-ssn > 2724 [ACK] Seq=4401 Ack=4788 Win=64284 Len=0
     91 0.322118    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 1 byte
     92 0.322206    134.124.42.203        134.124.18.221        SMB      Trans2 Request, QUERY_FILE_INFO, FID: 0x0007, Query File Standard Info
     93 0.323763    134.124.18.221        134.124.42.203        SMB      Trans2 Response, QUERY_FILE_INFO
     94 0.324293    134.124.42.203        134.124.18.221        SMB      Trans2 Request, SET_FILE_INFO, FID: 0x0007
     95 0.324876    134.124.18.221        134.124.42.203        SMB      Trans2 Response, SET_FILE_INFO
     96 0.324918    134.124.18.221        134.124.42.203        SMB      NT Trans Response, NT NOTIFY
     97 0.324974    134.124.42.203        134.124.18.221        TCP      2724 > netbios-ssn [ACK] Seq=4960 Ack=4680 Win=65256 [TCP CHECKSUM INCORRECT] Len=0
     98 0.324986    134.124.18.221        134.124.42.203        SMB      NT Trans Response, NT NOTIFY
     99 0.325132    134.124.42.203        134.124.18.221        SMB      Trans2 Request, SET_FILE_INFO, FID: 0x0007
    100 0.325353    134.124.42.203        134.124.18.221        SMB      NT Trans Request, NT NOTIFY, FID: 0xc00f
    101 0.325420    134.124.18.221        134.124.42.203        SMB      Trans2 Response, SET_FILE_INFO
    102 0.325539    134.124.18.221        134.124.42.203        SMB      NT Trans Response, NT NOTIFY
    103 0.325573    134.124.42.203        134.124.18.221        TCP      2724 > netbios-ssn [ACK] Seq=5136 Ack=4896 Win=65040 [TCP CHECKSUM INCORRECT] Len=0
    104 0.326267    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 44543
    105 0.326505    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 1 byte

...

    179 0.399282    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 994815
    180 0.399615    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    181 0.400564    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1027583
    182 0.401010    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    183 0.401489    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1060351
    184 0.402523    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    185 0.402946    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1093119
    186 0.403338    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    187 0.403757    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1125887
    188 0.404074    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    189 0.404508    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1158655
    190 0.404786    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    191 0.405216    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1191423
    192 0.405559    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    193 0.405972    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1224191
    194 0.406280    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    195 0.406683    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1256959
    196 0.407601    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    197 0.408056    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1289727
    198 0.408650    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    199 0.409067    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1322495
    200 0.410005    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    201 0.410358    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1355263
    202 0.411128    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    203 0.411531    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1388031
    204 0.412383    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    205 0.412786    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1420799
    206 0.413405    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    207 0.413840    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1453567
    208 0.415027    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    209 0.415447    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1486335
    210 0.415853    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    211 0.416263    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1519103
    212 0.417208    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    213 0.417560    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 1543057
    214 0.418340    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
    215 0.418721    134.124.42.203        134.124.18.221        SMB      Write AndX Request, FID: 0x0007, 1 byte at offset 970239
    216 0.419224    134.124.18.221        134.124.42.203        SMB      Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL
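
Finding the first rejected probe by eye gets tedious in a long capture, so here is a small Python sketch that does it from the Ethereal summary text. It assumes each packet summary sits on one line (wrapped lines need rejoining first) and that every Write AndX response answers the most recent Write AndX request; the function name first_failed_offset is made up for illustration.

```python
import re

REQUEST = re.compile(r"Write AndX Request, FID: \w+, \d+ bytes? at offset (\d+)")
RESPONSE = re.compile(r"Write AndX Response, FID: \w+, \d+ bytes?(?:, Error: (\w+))?")


def first_failed_offset(lines):
    """Return the offset of the first write answered with STATUS_DISK_FULL."""
    pending = None  # offset of the request still waiting for its response
    for line in lines:
        req = REQUEST.search(line)
        if req:
            pending = int(req.group(1))
            continue
        resp = RESPONSE.search(line)
        if resp and pending is not None:
            if resp.group(1) == "STATUS_DISK_FULL":
                return pending
            pending = None
    return None  # no write was rejected


trace = [
    "104 0.326267 134.124.42.203 134.124.18.221 SMB Write AndX Request, FID: 0x0007, 1 byte at offset 44543",
    "105 0.326505 134.124.18.221 134.124.42.203 SMB Write AndX Response, FID: 0x0007, 1 byte",
    "179 0.399282 134.124.42.203 134.124.18.221 SMB Write AndX Request, FID: 0x0007, 1 byte at offset 994815",
    "180 0.399615 134.124.18.221 134.124.42.203 SMB Write AndX Response, FID: 0x0007, 0 bytes, Error: STATUS_DISK_FULL",
]
print(first_failed_offset(trace))
```

Run against a capture where no write is ever rejected, the function returns None.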

Now we jump over to the same situation saving to the Samba server.  Samba 
never returns a STATUS_DISK_FULL even though the series of 1 byte writes and 
incrementing offsets creates a file on the Samba server (it must be a sparse 
file) that is too large to fit if all the space were actually allocated to the 
file.  This is with strict allocate, strict sync, and sync always all enabled.  
The following is from the Samba Ethereal trace I sent you.  There will be some 
clusters of Windows probing around a bit with these 1 byte writes and then 
actually writing some big chunks out to the file and then doing some more 
probing.  The first probe is at packet 58, a 1 byte write at offset 11305.  
Throughout the capture there are many of these 1 byte writes with offsets well 
over 1 Megabyte yet there was only about 1 Meg free on the disk partition.  It 
culminates at packet 2509 writing 1 byte with an offset of 1575423.  Unlike the 
trace of the Windows Server 2003 system none of these 1 byte writes ever fail 
with STATUS_DISK_FULL, they always return STATUS_SUCCESS.  I end up with a 
corrupt Word document on the disk that, if I do ls -l on it in Linux or look at 
its properties in Windows Explorer, is supposedly 1575424 bytes.  In my 
understanding of the way the world operates that has to be a sparse file 
because there was only about 1 Meg free on the disk to begin with and as I said 
yesterday I can't move that file from the Samba share to my C: drive and then 
back to the Samba share, I'll get an insufficient space error.

     58 0.282850    134.124.42.203        134.124.48.126        SMB      Write AndX Request, FID: 0x31d7, 1 byte at offset 11305
     59 0.319457    134.124.48.126        134.124.42.203        TCP      netbios-ssn > 2517 [ACK] Seq=3640 Ack=2938 Win=10220 Len=0
     60 0.336431    134.124.48.126        134.124.42.203        SMB      Write AndX Response, FID: 0x31d7, 1 byte

...

   2509 8.567376    134.124.42.203        134.124.48.126        SMB      Write AndX Request, FID: 0x31db, 1 byte at offset 1575423
   2510 8.599248    134.124.48.126        134.124.42.203        TCP      netbios-ssn > 2517 [ACK] Seq=38848 Ack=1781719 Win=10440 Len=0
   2511 8.664959    134.124.48.126        134.124.42.203        SMB      Write AndX Response, FID: 0x31db, 1 byte

There you go Jeremy, I'll be eagerly waiting to hear from you.

Thanks again,
Tom Schaefer
Comment 6 Jeremy Allison 2005-05-16 18:02:36 UTC
Created attachment 1231
Proposed patch

Ok, please test the attached patch. You will need to set "strict allocate =
yes" in order to execute the new code and it will run slower than before, as
each write beyond EOF will cause smbd to zero-fill the previously sparse area.
It should catch the situation you describe in your (excellent) bug report
though. I.e. all the 1 byte writes should force a DISK_FULL error return. It
applies (with a little fuzz factor) to the 3.0.14a source code.
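
The zero-fill idea can be sketched in a few lines of Python. This is an illustration of the technique only, not the attached patch: the function name write_filling_gap and the 64K chunk size are invented, and it relies on os.pwrite, which is only available on Unix-like systems.

```python
import os


def write_filling_gap(fd, offset, data, chunk=64 * 1024):
    """Zero-fill from the current EOF up to `offset`, then write `data`.

    Real zero bytes consume real blocks, so a full disk or quota raises
    OSError (ENOSPC or EDQUOT) here instead of leaving a sparse hole.
    """
    pos = os.fstat(fd).st_size  # current end of file
    while pos < offset:
        n = min(chunk, offset - pos)
        pos += os.pwrite(fd, b"\0" * n, pos)  # advance by bytes actually written
    return os.pwrite(fd, data, offset)
```

The cost is visible in the loop: a 1 byte probe far beyond EOF turns into a write of the entire gap, which is why this path is slower than before.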
Cheers,
Jeremy.
Comment 7 Tom Schaefer 2005-05-23 13:56:01 UTC
Hi Jeremy,

I finally got to test the patch over the weekend and today.  The whole issue of
Word corrupting documents seems to be eliminated.  So very awesome!!  Thank you,
this will work.

However, and this is pretty much just an FYI, the patch doesn’t seem to be a
"perfect" solution.  Windows still gets a "Delayed Write Failed" error from the
OS when saving the document as well as the normal error from Word whereas
attempting to save too large of a document to an actual Windows server only
results in the one error message from Word, the Windows client OS never
complains. 

In other words the user saving too large of a document has to click OK on two
errors instead of one.  I don’t care if you don’t.  Or, if you are interested in
pursuing what that’s about I’ll go down that road with you.  Whatever you want.  

I’ve actually been looking at it already with Ethereal.  What I’ve noticed now
is that as Windows/Word is probing around on the server for available space it
will write 1 byte to the new temporary file and then if that succeeds it
immediately does a “Trans2 Request, Query File Info” on it.  The Trans2
Response, Query File Info includes the information End of File and
Allocation Size.  On a Windows server if the 1 byte was written at offset 44543
the Trans2 Response, Query File Info will then show End of File as 44544 and the
allocation size will always be just a little greater than EOF say like about
46000.  Then the next 1 byte probe will be made at a location a little bit
higher than the allocation size so say about 47000.  And so on.  And the
Windows/Word client kind of methodically stair steps its way up in small
increments to where STATUS_DISK_FULL is reached and thus the Windows/Word client
has a pretty accurate knowledge of just exactly how much space is available on
the Windows server.

Contrast this with the situation where it’s a Samba server, what I’ve found
there is that Windows/Word will just do a couple 1 byte probes at low locations
like 44543 and then when it does its Trans2 Query and gets the Response the Samba
server tells it that EOF is at 44544 and the Allocation Size is 1048576.  So
then the next Windows 1 byte probe will be at an offset a little higher than
1048576  say 1060000.  If that succeeds then the next Trans2 File Query/Response
will show Samba reporting that exactly 2 Megabytes is the Allocation Size and 3
Megabytes allocated as soon as something is written over 2 Megabytes and so
forth.  In other words the Samba server always responds with the allocation size
in rounded up whole Megabytes only and so the Windows client never can really
get a fine grained picture of just how much space is available like it can
talking to a Windows server.  When talking to the Samba server I think the
Windows client can only get as accurate a picture of how much space is available
rounded up to the next whole Megabyte.  If there is actually 1.2 Megabytes
available Windows comes away from its one byte probes thinking there is exactly
2 Megabytes available.
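
The numbers in the Samba trace are what a plain round-up to the next whole Megabyte would produce. A minimal Python sketch of that arithmetic follows; the function name reported_allocation is made up, and the 1048576 unit is inferred from the values seen in the trace rather than taken from the Samba source.

```python
def reported_allocation(eof, roundup=1048576):
    """Round `eof` up to the next multiple of `roundup`; exact multiples stay put."""
    return (eof + roundup - 1) // roundup * roundup


print(reported_allocation(44544))    # EOF after the probe at offset 44543
print(reported_allocation(1060001))  # EOF after a probe at offset 1060000
```

With these inputs the two results are 1048576 and 2097152, the same whole-Megabyte steps seen in the Trans2 responses.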

Now as far as the Delayed Write Fails go.  What I see happening when talking to
a Windows server the Windows/Word client will determine that there should be say
at least 1.6 Megabytes available on the server and then it will actually start
writing the contents of the Word document into the file on the server and halt
itself just short of 1.6 Megabytes.

Now when saving out to the Samba server that say has 1.6 Megabytes available the
Windows/Word client thinks there is exactly 2 Megabytes available and starts
writing the Word document out to the Samba share intending to halt itself just
short of 2 Megabytes.  But beyond 1.6 Megabytes of writing Samba starts
responding with STATUS_DISK_FULL and I believe that is the source of the Delayed
Write Failed errors.  Windows is getting an error writing out to a file position
that it had previously determined via the 1 byte probes "should be" available
for writing.
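
To make that reasoning concrete, here is a toy Python model of the probing as I understand it. Everything in it is an assumption for illustration: the name estimate_free, the first probe at 44543, the step of 11424 bytes past the reported allocation size, and the simplification that a probe succeeds whenever its offset is below the free space.

```python
def estimate_free(free_bytes, roundup, first_probe=44543, step=11424):
    """Return the space a probing client would end up believing is available."""
    believed = 0
    offset = first_probe
    while offset < free_bytes:  # the 1 byte probe fits, so it succeeds
        eof = offset + 1
        # allocation size the server reports for this EOF
        believed = (eof + roundup - 1) // roundup * roundup
        offset = believed + step  # next probe lands just past it
    return believed


print(estimate_free(1600000, roundup=1048576))  # coarse, whole-Megabyte reporting
print(estimate_free(1600000, roundup=4096))     # hypothetical fine-grained reporting
```

With whole-Megabyte reporting the model settles on 2097152 even though only 1600000 bytes are free, so later writes hit STATUS_DISK_FULL inside space the client thought was safe. With the finer unit the estimate stays within a few kilobytes of the real figure.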

I’m guesstimating the whole 1 byte probes thing is the Windows client OS figuring
out how much space is available, then it lets the Word application start writing
out its file and just short of where the server disk would fill the Windows
client OS tells the Word application itself that there is no more space so that
Word itself doesn’t have to deal with talking directly to a network disk server.

I got to wondering if there was less than 1 Megabyte available on the Samba
server say like 900k and I tried to save a very small like 30k document onto the
share would I be told the disk is full.  So I tried it and sure enough that's the
case.  I can't save any Word documents to a Samba share unless there is at least
1 Meg free even if it's just a little tiny document that should fit onto the
share no problem.  I'm told by Word that there isn't disk space available and do
not get any Delayed Write Failed errors.

I think if Samba could be made to round down on the whole Megabyte allocation
sizes it reports it would probably entirely solve the Delayed Write Failed
errors.  Or better yet if Samba could be made to report those allocation sizes
in much smaller than whole 1 Megabyte increments that would be even better.

Anyhow, again Jeremy the real crisis has been solved.  If you’ve got time and/or
an interest in solving the now cosmetic issue of these Delayed Write Failed
errors I'm certainly willing to try and help.  If nothing else though, if you've
got at least a comment or two about my above analysis I'd love to hear it.

Once again, the important problem, corrupt Word documents is solved.

Thanks again,
Tom Schaefer
Comment 8 Jeremy Allison 2005-05-23 14:36:32 UTC
We already have the capability to change this. When a client asks for the
"allocation size" we round up to the nearest :

"allocation roundup size"

parameter, which is a per-share parameter set to be 0x100000 bytes (1mb). If you
can tell me what the roundup size is that Windows servers use when allocating
space then you can test the theory by setting "allocation roundup size =
<windows value>" and this should behave exactly the same as Windows. From your
mail it looks like the allocation size that Windows uses might be somewhere
between 1k - 8k. 
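
One way to narrow down the Windows value from a trace is to collect the allocation sizes from several Query File Info responses and take their greatest common divisor. A minimal Python sketch follows; the function name candidate_roundup is made up and the sample sizes are hypothetical, not taken from any capture in this report. The true roundup unit must divide every reported allocation size, so the result is an upper bound that tightens as more samples are added.

```python
from functools import reduce
from math import gcd


def candidate_roundup(allocation_sizes):
    """Largest unit that divides every observed allocation size."""
    return reduce(gcd, allocation_sizes)


# Hypothetical allocation sizes, for illustration only
print(candidate_roundup([45056, 49152, 61440]))
```

For the made-up sample above the candidate is 4096, which could then be tried as the "allocation roundup size" value.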

We don't actually allocate this on the disk (as there is no way to do this using
the POSIX API's), we just round up to the given value. The reason we use 1mb is
a cheat :-). Someone discovered that Windows clients use the allocation size as
a disk cache tuning parameter, so if we set it very large then they cache much
better to a Samba server than to a Windows one :-).

But this large allocation roundup can cause problems (specifically with visual
studio) so we made it a tunable parameter. If you can work out the right value
for it this should fix your problem.

In the meantime I'm going to close this one out as the data corruption bug is
fixed - if you discover the correct allocation size just add it to this bug
report for future knowledge.

Thanks,

Jeremy.


Comment 9 Gerald (Jerry) Carter (dead mail address) 2005-08-24 10:25:04 UTC
sorry for the same, cleaning up the database to prevent unnecessary reopens of bugs.