TY - GEN
T1 - Efficient ring-LWE encryption on 8-bit AVR processors
AU - Liu, Zhe
AU - Seo, Hwajeong
AU - Roy, Sujoy Sinha
AU - Großschädl, Johann
AU - Kim, Howon
AU - Verbauwhede, Ingrid
PY - 2015/9/1
Y1 - 2015/9/1
N2 - Public-key cryptography based on the “ring-variant” of the Learning with Errors (ring-LWE) problem is both efficient and believed to remain secure in a post-quantum world. In this paper, we introduce a carefully-optimized implementation of a ring-LWE encryption scheme for 8-bit AVR processors like the ATxmega128. Our research contributions include several optimizations for the Number Theoretic Transform (NTT) used for polynomial multiplication. More concretely, we describe the Move-and-Add (MA) and the Shift-Add-Multiply-Subtract-Subtract (SAMS2) technique to speed up the performance-critical multiplication and modular reduction of coefficients, respectively. We take advantage of incompletely-reduced intermediate results to minimize the total number of reduction operations and use a special coefficient-storage method to decrease the RAM footprint of NTT multiplications. In addition, we propose a byte-wise scanning strategy to improve the performance of a discrete Gaussian sampler based on the Knuth-Yao random walk algorithm. For medium-term security, our ring-LWE implementation needs 590 k, 672 k, and 276 k clock cycles for key-generation, encryption, and decryption, respectively. On the other hand, for long-term security, the execution time of key-generation, encryption, and decryption amount to 2. 2M, 2. 6 M, and 686 k cycles, respectively. These results set new speed records for ring-LWE encryption on an 8-bit processor and outperform related RSA and ECC implementations by an order of magnitude.
AB - Public-key cryptography based on the “ring-variant” of the Learning with Errors (ring-LWE) problem is both efficient and believed to remain secure in a post-quantum world. In this paper, we introduce a carefully-optimized implementation of a ring-LWE encryption scheme for 8-bit AVR processors like the ATxmega128. Our research contributions include several optimizations for the Number Theoretic Transform (NTT) used for polynomial multiplication. More concretely, we describe the Move-and-Add (MA) and the Shift-Add-Multiply-Subtract-Subtract (SAMS2) technique to speed up the performance-critical multiplication and modular reduction of coefficients, respectively. We take advantage of incompletely-reduced intermediate results to minimize the total number of reduction operations and use a special coefficient-storage method to decrease the RAM footprint of NTT multiplications. In addition, we propose a byte-wise scanning strategy to improve the performance of a discrete Gaussian sampler based on the Knuth-Yao random walk algorithm. For medium-term security, our ring-LWE implementation needs 590 k, 672 k, and 276 k clock cycles for key-generation, encryption, and decryption, respectively. On the other hand, for long-term security, the execution time of key-generation, encryption, and decryption amount to 2. 2M, 2. 6 M, and 686 k cycles, respectively. These results set new speed records for ring-LWE encryption on an 8-bit processor and outperform related RSA and ECC implementations by an order of magnitude.
KW - Discrete Gaussian sampling
KW - Number-theoretic transform
KW - Public-key encryption
KW - Ring learning with errors (Ring-LWE)
UR - http://www.scopus.com/inward/record.url?scp=84946024060&partnerID=8YFLogxK
U2 - 10.1007/978-3-662-48324-4_33
DO - 10.1007/978-3-662-48324-4_33
M3 - Conference contribution
AN - SCOPUS:84946024060
SN - 9783662483237
T3 - Lecture Notes in Computer Science
SP - 663
EP - 682
BT - Cryptographic Hardware and Embedded Systems - CHES 2015
A2 - Güneysu, Tim
A2 - Handschuh, Helena
PB - Springer Verlag
T2 - International Workshop on Cryptographic Hardware and Embedded Systems, CHES 2015
Y2 - 13 September 2015 through 16 September 2015
ER -