Starting this issue so I have a place to jot down notes as I come across relevant information in the literature.
Basically, various passes will eventually need to start making decisions about whether to choose one type of homomorphic operation or another. We should have a cost model (or multiple models) that helps quantify these trade-offs. Eventually I think these will boil down to more precisely quantifiable costs per hardware target, but for high-level passes it will still be useful to have a relative notion of cost.
To start, the HyPHEN paper Kim-Park-Kim-Kim-Ahn 2023 (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10376063) has a table of benchmark CKKS operation times, measured with 64 threads across two 2.35 GHz CPUs, using RNS-CKKS HEaaN with N=2^16 (see Section IV.A of the linked paper).
| Op | Runtime (ms) |
| --- | --- |
| Add plaintext | 0.169 |
| Add ciphertext | 0.202 |
| Mul plaintext | 0.506 |
| Mul ciphertext | 17.3 |
| Rescale | 3.90 |
| Plaintext rotate | 0.102 |
| Ciphertext rotate | 15.5 |
| Bootstrap | 2160 |
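As a rough illustration of what a first-cut relative cost model could look like, here is a minimal sketch that seeds a lookup table with the HyPHEN numbers above and sums costs over a straight-line sequence of ops. All the names (`CkksOp`, `estimateCostMs`, etc.) are invented for illustration and don't correspond to anything in the codebase; a real pass would query something like this when comparing alternative lowerings.

```cpp
// Hypothetical per-target cost table seeded with the HyPHEN CKKS numbers
// above. On a different target only the entries would change; the ratios
// are what high-level passes would care about.
#include <iostream>
#include <map>
#include <vector>

enum class CkksOp {
  AddPlain, AddCipher, MulPlain, MulCipher,
  Rescale, RotatePlain, RotateCipher, Bootstrap
};

// Costs in milliseconds from the HyPHEN benchmark table above.
const std::map<CkksOp, double> kHyphenCostMs = {
    {CkksOp::AddPlain, 0.169},    {CkksOp::AddCipher, 0.202},
    {CkksOp::MulPlain, 0.506},    {CkksOp::MulCipher, 17.3},
    {CkksOp::Rescale, 3.90},      {CkksOp::RotatePlain, 0.102},
    {CkksOp::RotateCipher, 15.5}, {CkksOp::Bootstrap, 2160.0}};

// Estimate the cost of a straight-line sequence of ops.
double estimateCostMs(const std::vector<CkksOp>& ops) {
  double total = 0.0;
  for (CkksOp op : ops) total += kHyphenCostMs.at(op);
  return total;
}

int main() {
  // Example: compare rotating the ciphertext vs rotating the plaintext
  // operand for a hypothetical kernel that needs 8 rotations.
  std::vector<CkksOp> cipherRot(8, CkksOp::RotateCipher);
  std::vector<CkksOp> plainRot(8, CkksOp::RotatePlain);
  std::cout << "ciphertext rotations: " << estimateCostMs(cipherRot) << " ms\n";
  std::cout << "plaintext rotations:  " << estimateCostMs(plainRot) << " ms\n";
}
```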
This may be relevant: https://eprint.iacr.org/2024/1284.pdf
If I understand correctly, the main idea is to convert a batch of RLWE ciphertexts to a ‘multi-secret’ variant to reduce the cost of plaintext-ciphertext matrix multiplication, and to embed that multiplication in CKKS bootstrapping.
Reported runtimes for bootstrap + plaintext-ciphertext matrix multiplication (Table 6) on an Intel Xeon Gold 6242 CPU running at 2.80 GHz (single thread) using the HEaaN library (I think ‘cipher’ is used as a synonym for ‘65536 elements’ here, but I'm not entirely sure):
It's odd that most benchmark papers report just a single runtime per operation per parameter set, since there's a significant difference between the first multiplication of two fresh ciphertexts (say, 10+ RNS limbs) and the final multiplication after many modswitch/rescale operations (possibly down to 2 RNS limbs).
I've seen papers go to the trouble of differentiating between ctxt mul and ctxt square due to the slightly lower overhead of the latter, but very rarely do I see people consider the cost as a function of depth.
EDIT: Oh, and I guess for our purposes, we very much don't want to consider "multiplication" as "mul + relin" as the HyPHEN paper seems to do, since we already know we'll usually be decoupling those operations.
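To make that concrete, here is a minimal sketch of what a depth-aware cost entry might look like, with mul and relinearize tracked as separate ops per the EDIT above. The scaling exponents (linear in limb count for the tensor product, quadratic for key switching) and all the calibration numbers are assumptions for illustration only, not benchmark data; a real model would be fit per target.

```cpp
// Hypothetical depth-aware refinement: per-op cost becomes a function of
// the current number of RNS limbs L, and mul / relinearize are modeled
// separately. Exponents and numbers below are assumptions, not measurements.
#include <initializer_list>
#include <iostream>

struct DepthAwareCostModel {
  // Calibration point: cost at L = maxLimbs limbs (a fresh ciphertext).
  int maxLimbs;
  double mulAtMaxMs;    // ct-ct tensor product only, no relin
  double relinAtMaxMs;  // key switching / relinearization

  // Tensor product: assumed roughly linear in the remaining limb count.
  double mulCost(int limbs) const {
    return mulAtMaxMs * static_cast<double>(limbs) / maxLimbs;
  }
  // Key switching: assumed roughly quadratic in the remaining limb count.
  double relinCost(int limbs) const {
    double r = static_cast<double>(limbs) / maxLimbs;
    return relinAtMaxMs * r * r;
  }
};

int main() {
  // Invented calibration numbers for illustration only.
  DepthAwareCostModel model{/*maxLimbs=*/12, /*mulAtMaxMs=*/2.0,
                            /*relinAtMaxMs=*/15.0};
  for (int limbs : {12, 6, 2}) {
    std::cout << "L=" << limbs
              << "  mul=" << model.mulCost(limbs) << " ms"
              << "  relin=" << model.relinCost(limbs) << " ms\n";
  }
}
```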