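Benchmarking Memory Usage in wolfCrypt

When we demonstrated the wolfCrypt benchmark at trade shows around the world, we were frequently asked about memory usage, especially for post-quantum algorithms.

As it happened, we were already working on a feature that provides exactly what was being asked for. The example presented here is already in the wolfSSL GitHub repository and is scheduled to be included in the next wolfSSL release.

Overview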

To enable memory and stack tracking when building wolfSSL, add the following ./configure options:

--enable-memory
--enable-trackmemory=verbose
--enable-stacksize=verbose

or the equivalent macro definitions:

#define WOLFSSL_TRACK_MEMORY
#define WOLFSSL_TRACK_MEMORY_VERBOSE
#define HAVE_STACK_SIZE
#define HAVE_STACK_SIZE_VERBOSE

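These options enable both heap allocation tracking and detailed stack usage reporting within the wolfCrypt benchmark.

With them, the benchmark can print the peak and total memory usage during each algorithm's benchmark and, at the end, the totals for the whole application.

Unlike ordinary performance benchmarking, this mode also tracks memory during each algorithm's setup phase, so you can see how much RAM is used across both initialization and execution.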

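Example Run

We compiled and ran the benchmark on an STM32U585 with hardware acceleration enabled for RNG, AES, and SHA-256. The run shows how memory tracking behaves on a resource-constrained embedded system. The post-quantum algorithms were built in the small memory mode.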

wolfCrypt Benchmark (block bytes 1024, min 1.0 sec each)
RNG                        425 KiB took 1.039 seconds,  409.047 KiB/s [heap 494 bytes (8 allocs), stack 1448 bytes]
AES-128-CBC-enc              9 MiB took 1.000 seconds,    8.521 MiB/s [heap 312 bytes (1 allocs), stack 736 bytes]
AES-128-CBC-dec              8 MiB took 1.000 seconds,    8.472 MiB/s [heap 312 bytes (1 allocs), stack 736 bytes]
AES-256-CBC-enc              8 MiB took 1.000 seconds,    7.910 MiB/s [heap 312 bytes (1 allocs), stack 736 bytes]
AES-256-CBC-dec              8 MiB took 1.000 seconds,    7.861 MiB/s [heap 312 bytes (1 allocs), stack 736 bytes]
AES-128-GCM-enc              8 MiB took 1.000 seconds,    7.935 MiB/s [heap 344 bytes (3 allocs), stack 992 bytes]
AES-128-GCM-dec              8 MiB took 1.000 seconds,    7.886 MiB/s [heap 312 bytes (1 allocs), stack 984 bytes]
AES-256-GCM-enc              7 MiB took 1.000 seconds,    7.397 MiB/s [heap 344 bytes (3 allocs), stack 976 bytes]
AES-256-GCM-dec              7 MiB took 1.000 seconds,    7.349 MiB/s [heap 312 bytes (1 allocs), stack 984 bytes]
AES-128-GCM-enc-no_AAD       8 MiB took 1.000 seconds,    7.983 MiB/s [heap 344 bytes (3 allocs), stack 976 bytes]
AES-128-GCM-dec-no_AAD       8 MiB took 1.000 seconds,    7.935 MiB/s [heap 312 bytes (1 allocs), stack 944 bytes]
AES-256-GCM-enc-no_AAD       7 MiB took 1.000 seconds,    7.422 MiB/s [heap 344 bytes (3 allocs), stack 976 bytes]
AES-256-GCM-dec-no_AAD       7 MiB took 1.000 seconds,    7.397 MiB/s [heap 312 bytes (1 allocs), stack 944 bytes]
GMAC Small                  14 MiB took 1.000 seconds,   14.154 MiB/s [heap 0 bytes (0 allocs), stack 1536 bytes]
CHACHA                       6 MiB took 1.000 seconds,    5.688 MiB/s [heap 68 bytes (1 allocs), stack 624 bytes]
CHA-POLY                     4 MiB took 1.004 seconds,    3.623 MiB/s [heap 232 bytes (4 allocs), stack 672 bytes]
POLY1305                    16 MiB took 1.000 seconds,   15.918 MiB/s [heap 40 bytes (1 allocs), stack 800 bytes]
SHA-256                     14 MiB took 1.000 seconds,   14.429 MiB/s [heap 344 bytes (2 allocs), stack 624 bytes]
SHA-384                      1 MiB took 1.012 seconds,    1.158 MiB/s [heap 400 bytes (3 allocs), stack 624 bytes]
SHA-512                      1 MiB took 1.016 seconds,    1.153 MiB/s [heap 416 bytes (3 allocs), stack 658 bytes]
SHA-512/224                  1 MiB took 1.012 seconds,    1.158 MiB/s [heap 380 bytes (3 allocs), stack 624 bytes]
SHA-512/256                  1 MiB took 1.012 seconds,    1.158 MiB/s [heap 384 bytes (3 allocs), stack 624 bytes]
SHA3-224                     1 MiB took 1.003 seconds,    1.290 MiB/s [heap 436 bytes (2 allocs), stack 656 bytes]
SHA3-256                     1 MiB took 1.000 seconds,    1.221 MiB/s [heap 440 bytes (2 allocs), stack 656 bytes]
SHA3-384                   975 KiB took 1.016 seconds,  959.646 KiB/s [heap 456 bytes (2 allocs), stack 656 bytes]
SHA3-512                   675 KiB took 1.008 seconds,  669.643 KiB/s [heap 472 bytes (2 allocs), stack 656 bytes]
SHAKE128                     2 MiB took 1.012 seconds,    1.496 MiB/s [heap 576 bytes (2 allocs), stack 672 bytes]
SHAKE256                     1 MiB took 1.003 seconds,    1.217 MiB/s [heap 544 bytes (2 allocs), stack 656 bytes]
HMAC-SHA256                 14 MiB took 1.000 seconds,   14.014 MiB/s [heap 768 bytes (1 allocs), stack 784 bytes]
HMAC-SHA384                  1 MiB took 1.008 seconds,    1.138 MiB/s [heap 896 bytes (2 allocs), stack 840 bytes]
HMAC-SHA512                  1 MiB took 1.008 seconds,    1.138 MiB/s [heap 896 bytes (2 allocs), stack 784 bytes]
RSA     2048   public        58 ops took 1.000 sec, avg 17.241 ms, 58.000 ops/sec [heap 6725 bytes (6 allocs), stack 1040 bytes]
RSA     2048  private         2 ops took 2.047 sec, avg 1023.500 ms, 0.977 ops/sec [heap 2860 bytes (4 allocs), stack 1096 bytes]
DH      2048  key gen         3 ops took 1.278 sec, avg 426.000 ms, 2.347 ops/sec [heap 7752 bytes (10 allocs), stack 1072 bytes]
DH      2048    agree         4 ops took 1.706 sec, avg 426.500 ms, 2.345 ops/sec [heap 10428 bytes (9 allocs), stack 1376 bytes]
ML-KEM 512    128  key gen       290 ops took 1.004 sec, avg 3.462 ms, 288.845 ops/sec[heap 1530 bytes (2 allocs), stack 1096 bytes]
ML-KEM 512    128    encap       278 ops took 1.004 sec, avg 3.612 ms, 276.892 ops/sec[heap 3578 bytes (2 allocs), stack 1088 bytes]
ML-KEM 512    128    decap       206 ops took 1.000 sec, avg 4.854 ms, 206.000 ops/sec[heap 4346 bytes (3 allocs), stack 1088 bytes]
ML-KEM 768    192  key gen       176 ops took 1.000 sec, avg 5.682 ms, 176.000 ops/sec[heap 2042 bytes (2 allocs), stack 1096 bytes]
ML-KEM 768    192    encap       164 ops took 1.008 sec, avg 6.146 ms, 162.698 ops/sec[heap 5114 bytes (2 allocs), stack 1792 bytes]
ML-KEM 768    192    decap       128 ops took 1.012 sec, avg 7.906 ms, 126.482 ops/sec[heap 6202 bytes (3 allocs), stack 1792 bytes]
ML-KEM 1024   256  key gen       108 ops took 1.004 sec, avg 9.296 ms, 107.570 ops/sec[heap 2554 bytes (2 allocs), stack 1096 bytes]
ML-KEM 1024   256    encap       102 ops took 1.008 sec, avg 9.882 ms, 101.190 ops/sec[heap 6650 bytes (2 allocs), stack 1792 bytes]
ML-KEM 1024   256    decap        84 ops took 1.019 sec, avg 12.131 ms, 82.434 ops/sec[heap 8218 bytes (3 allocs), stack 1792 bytes]
ECC   [      SECP256R1]   256  key gen        12 ops took 1.008 sec, avg 84.000 ms, 11.905 ops/sec [heap 4628 bytes (6 allocs), stack 1080 bytes]
ECDHE [      SECP256R1]   256    agree        12 ops took 1.004 sec, avg 83.667 ms, 11.952 ops/sec [heap 9393 bytes (15 allocs), stack 1416 bytes]
ECDSA [      SECP256R1]   256     sign        58 ops took 1.023 sec, avg 17.638 ms, 56.696 ops/sec [heap 308 bytes (5 allocs), stack 1112 bytes]
ECDSA [      SECP256R1]   256   verify        54 ops took 1.000 sec, avg 18.519 ms, 54.000 ops/sec [heap 152 bytes (2 allocs), stack 1432 bytes]
CURVE  25519  key gen         3 ops took 1.086 sec, avg 362.000 ms, 2.762 ops/sec [heap 119 bytes (3 allocs), stack 1000 bytes]
CURVE  25519    agree         4 ops took 1.447 sec, avg 361.750 ms, 2.764 ops/sec [heap 119 bytes (3 allocs), stack 1768 bytes]
ED     25519  key gen         3 ops took 1.102 sec, avg 367.333 ms, 2.722 ops/sec [heap 128 bytes (3 allocs), stack 1136 bytes]
ED     25519     sign         4 ops took 1.494 sec, avg 373.500 ms, 2.677 ops/sec [heap 256 bytes (4 allocs), stack 1792 bytes]
ED     25519   verify         2 ops took 1.538 sec, avg 769.000 ms, 1.300 ops/sec [heap 128 bytes (1 allocs), stack 1792 bytes]
ML-DSA    44  key gen        66 ops took 1.000 sec, avg 15.152 ms, 66.000 ops/sec [heap 26531 bytes (6 allocs), stack 1072 bytes]
ML-DSA    44     sign        16 ops took 1.051 sec, avg 65.688 ms, 15.224 ops/sec [heap 15528 bytes (3 allocs), stack 1416 bytes]
ML-DSA    44   verify        62 ops took 1.027 sec, avg 16.565 ms, 60.370 ops/sec [heap 8104 bytes (1 allocs), stack 1416 bytes]
ML-DSA    65  key gen        38 ops took 1.016 sec, avg 26.737 ms, 37.402 ops/sec [heap 31651 bytes (6 allocs), stack 1040 bytes]
ML-DSA    65     sign         8 ops took 1.008 sec, avg 126.000 ms, 7.937 ops/sec [heap 20648 bytes (3 allocs), stack 1416 bytes]
ML-DSA    65   verify        38 ops took 1.039 sec, avg 27.342 ms, 36.574 ops/sec [heap 9128 bytes (1 allocs), stack 1416 bytes]
ML-DSA    87  key gen        24 ops took 1.079 sec, avg 44.958 ms, 22.243 ops/sec [heap 37795 bytes (6 allocs), stack 1072 bytes]
ML-DSA    87     sign         6 ops took 1.146 sec, avg 191.000 ms, 5.236 ops/sec [heap 26792 bytes (3 allocs), stack 1416 bytes]
ML-DSA    87   verify        22 ops took 1.024 sec, avg 46.545 ms, 21.484 ops/sec [heap 11432 bytes (1 allocs), stack 1416 bytes]
Benchmark complete
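
Every result line in the log above ends with the same bracketed memory suffix, so the numbers are easy to extract. The following is a minimal Python sketch, assuming the line layout shown in this log. The names `MemStats` and `parse_line` are hypothetical helpers for illustration and are not part of wolfSSL.

```python
import re
from typing import NamedTuple, Optional

# Memory suffix appended to each result line, for example:
#   [heap 494 bytes (8 allocs), stack 1448 bytes]
# \s* before "allocs" tolerates a missing space (assumption about the log format).
MEM_SUFFIX = re.compile(
    r"\[heap\s+(\d+)\s+bytes\s+\((\d+)\s*allocs\),\s*stack\s+(\d+)\s+bytes\]\s*$"
)

# The label is everything before "<count> KiB|MiB|GiB took" (symmetric
# algorithms) or "<count> ops took" (asymmetric algorithms).
LABEL = re.compile(r"^(.*?)\s+\d+\s+(?:KiB|MiB|GiB|ops)\s+took\b")


class MemStats(NamedTuple):
    label: str
    heap_bytes: int
    heap_allocs: int
    stack_bytes: int


def parse_line(line: str) -> Optional[MemStats]:
    """Return the memory figures of one result line, or None for other lines."""
    mem = MEM_SUFFIX.search(line)
    label = LABEL.match(line)
    if mem is None or label is None:
        return None
    heap, allocs, stack = (int(g) for g in mem.groups())
    # Collapse the column padding inside the label to single spaces.
    return MemStats(" ".join(label.group(1).split()), heap, allocs, stack)


if __name__ == "__main__":
    sample = [
        "wolfCrypt Benchmark (block bytes 1024, min 1.0 sec each)",
        "RNG                        425 KiB took 1.039 seconds,  409.047 KiB/s [heap 494 bytes (8 allocs), stack 1448 bytes]",
        "ML-KEM 512    128  key gen       290 ops took 1.004 sec, avg 3.462 ms, 288.845 ops/sec[heap 1530 bytes (2 allocs), stack 1096 bytes]",
    ]
    for stats in filter(None, map(parse_line, sample)):
        print(stats)
```

Lines without a memory suffix, such as the banner and "Benchmark complete", return `None` and can simply be filtered out.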

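Making the Output Easier to Read

This output is well suited to parsing by tools, but it may not be very easy on human eyes.
Here are two additional options.

The first is to configure the wolfCrypt benchmark to output CSV data. The additional memory tracking data also appears in the CSV results.

The second is a Python script that renders the benchmark data as a clean table. For now, the Python rendering script targets the STM32U585 demo, but it can easily be adapted to other platforms on request.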
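
To give a sense of what such a rendering step involves, here is a small Python sketch that lays out memory figures as an aligned plain-text table. It is not the wolfSSL script itself. The function name `render_table`, the row layout, and the column headers are assumptions made for illustration, and the sample values are copied from the log above.

```python
from typing import Iterable, List, Tuple

# Hypothetical row layout: (label, heap bytes, heap allocs, stack bytes).
Row = Tuple[str, int, int, int]

HEADERS = ("Algorithm", "Heap (bytes)", "Allocs", "Stack (bytes)")


def render_table(rows: Iterable[Row]) -> str:
    """Render rows as a plain-text table with one header line and a rule."""
    cells: List[Tuple[str, ...]] = [HEADERS]
    cells += [tuple(str(value) for value in row) for row in rows]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in cells) for i in range(len(HEADERS))]

    def fmt(row: Tuple[str, ...]) -> str:
        # Left-align the label, right-align the numeric columns.
        first = row[0].ljust(widths[0])
        rest = [cell.rjust(width) for cell, width in zip(row[1:], widths[1:])]
        return " | ".join([first] + rest)

    rule = "-+-".join("-" * width for width in widths)
    return "\n".join([fmt(cells[0]), rule] + [fmt(row) for row in cells[1:]])


if __name__ == "__main__":
    print(render_table([
        ("ML-KEM 768 192 encap", 5114, 2, 1792),
        ("ECDHE [SECP256R1] 256 agree", 9393, 15, 1416),
        ("ML-DSA 65 sign", 20648, 3, 1416),
    ]))
```

Right-aligning the numeric columns makes it easier to compare heap and stack sizes across algorithms at a glance.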

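Summary

With this enhancement, wolfCrypt users can now quantify both performance and memory footprint under different compile-time configurations. This is especially useful for post-quantum cryptographic algorithms and embedded use cases, where memory efficiency matters.

It currently works on bare-metal and POSIX systems. We plan to update it to work on other operating systems, such as RTOSes.

If you have any questions, please contact us at info@wolfssl.jp.

Original article: https://www.wolfssl.com/benchmarking-memory-usage-in-wolfcrypt-bench-new-heap-and-stack-tracking-support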