Bug 13302 - smbd does not support AESNI on AMD platforms
smbd does not support AESNI on AMD platforms
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
x86 All
: P5 normal
: 4.7
Assigned To: Jeremy Allison
Samba QA Contact
Depends on:
  Show dependency treegraph
Reported: 2018-02-27 07:17 UTC by eric.vannier
Modified: 2018-03-23 04:36 UTC (History)
3 users (show)

See Also:

Fixes bug (adds AESNI support for AMD processors) (2.40 KB, patch)
2018-03-22 07:07 UTC, eric.vannier
bjacke: review-
git-am fix for master. (1.56 KB, patch)
2018-03-22 23:49 UTC, Jeremy Allison
no flags Details
Revised git-am fix for master. (1.39 KB, patch)
2018-03-22 23:54 UTC, Jeremy Allison
jra: review? (bjacke)
Proposed patch for all processors (3.57 KB, patch)
2018-03-23 04:36 UTC, eric.vannier
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description eric.vannier 2018-02-27 07:17:34 UTC
AESNI acceleration is not enabled on any AMD processors (and hence on Ryzens). As a result, when configuring samba with encryption the performance is very poor.

Looking at the code, I can see that the problems like in lib/crypto/aes.c where the code checks for cpuid reg0 (has_intel_aes_instructions) (pseudo code):
if (cpu_results != "IntelGenuine")
  return false

Obviously on AMD processors, the value is AuthenticAMD, and therefore the code fails to detect the acceleration. I changed the code to recognize the processor, and I was then able to use AESNI (or at least performance went up, CPU usage down, and perf record shows the aesni assembly being invoked ;-)) on either my Ivy Bridge (where it was working before the patch), and Ryzen machines.

I am not sure why the code is specifically checking for Intel instead of accepting AMD and Intel (or for that matter any other clone), so I might be missing something, but either removing the if on genuineintel, or or'ing with authenticamd should work just fine (I did not see such checks in the linux kernel, but I might have missed it).
One concern with my patch is that I do have access to an Ivy Bridge and Ryzen, but I do not have access to a battery of various processors to see if they exhibit a weird behavior that the previous code was handling.

related to bug 13008 which added support for Intel processors (but not amd's).
Comment 1 Ralph Böhme 2018-02-27 14:04:11 UTC
*scratches head*
Comment 2 eric.vannier 2018-02-28 06:12:24 UTC
Sorry, my employer has not agreed yet to my contributing the patch. Though, the patch is trivial, so should somebody have the time to write the couple of lines that I suggested, it might get done before my employer approves my contribution. Thanks for understanding.
Comment 3 eric.vannier 2018-03-22 07:07:45 UTC
Created attachment 14067 [details]
Fixes bug (adds AESNI support for AMD processors)
Comment 4 Björn Jacke 2018-03-22 07:58:15 UTC
Comment on attachment 14067 [details]
Fixes bug (adds AESNI support for AMD processors)

you already mentioned that the check for the vendor should be removed or extended. As the cpuid flag 25 is always the aes instruction set on x86, the check should just be removed actually here.
Comment 5 Jeremy Allison 2018-03-22 23:44:08 UTC
Björn, as the __cpuid(cpuid_results, 1) seem to be standard, as 


show cpuid(1) with is supported on a pentium or above. Looks like we could just remove the check for vendor and always just directly check for bit 25 and if so assume AES instructions - correct ?
Comment 6 Jeremy Allison 2018-03-22 23:49:09 UTC
Created attachment 14076 [details]
git-am fix for master.

Bjorn and Eric, does this patch work for you ?
Comment 7 Jeremy Allison 2018-03-22 23:54:56 UTC
Created attachment 14077 [details]
Revised git-am fix for master.

Slightly better fix that doesn't zero out cpuid_results[] each time we check the has_aes_instructions bool, but only once on initial check.
Comment 8 eric.vannier 2018-03-23 04:36:58 UTC
Created attachment 14078 [details]
Proposed patch for all processors


lgtm. Though, I am not sure the zero'ing is necessary: cpuid asm will store into the four locations (maybe I am missing something).
Maybe, as well, rename has_intel ... to has_aes since this is no longer about intel (and the instructions are called AESNI) ?
See proposed patch.