idnits 2.17.00 (12 Aug 2021) /tmp/idnits51574/draft-dss-star-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document date (7 March 2022) is 68 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'OPRF' is defined on line 332, but no explicit reference was found in the text ** Downref: Normative reference to an Informational draft: draft-irtf-cfrg-voprf (ref. 'OPRF') == Outdated reference: A later version (-01) exists of draft-gpew-priv-ppm-00 Summary: 1 error (**), 0 flaws (~~), 2 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group A. Davidson 3 Internet-Draft S. K. Sahib 4 Intended status: Standards Track P. Snyder 5 Expires: 8 September 2022 Brave Software 6 7 March 2022 8 STAR: Distributed Secret Sharing for Private Threshold Aggregation 9 Reporting 10 draft-dss-star-00 12 Abstract 14 Servers often need to collect data from clients that can be privacy- 15 sensitive if the server is able to associate the collected data with 16 a particular user. In this document we describe STAR, an efficient 17 and secure threshold aggregation protocol for collecting measurements 18 from clients by an untrusted aggregation server, while maintaining 19 K-anonymity guarantees. 21 Status of This Memo 23 This Internet-Draft is submitted in full conformance with the 24 provisions of BCP 78 and BCP 79. 26 Internet-Drafts are working documents of the Internet Engineering 27 Task Force (IETF). Note that other groups may also distribute 28 working documents as Internet-Drafts. The list of current Internet- 29 Drafts is at https://datatracker.ietf.org/drafts/current/. 31 Internet-Drafts are draft documents valid for a maximum of six months 32 and may be updated, replaced, or obsoleted by other documents at any 33 time. It is inappropriate to use Internet-Drafts as reference 34 material or to cite them other than as "work in progress." 36 This Internet-Draft will expire on 8 September 2022. 38 Copyright Notice 40 Copyright (c) 2022 IETF Trust and the persons identified as the 41 document authors. All rights reserved. 43 This document is subject to BCP 78 and the IETF Trust's Legal 44 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 45 license-info) in effect on the date of publication of this document. 46 Please review these documents carefully, as they describe your rights 47 and restrictions with respect to this document. Code Components 48 extracted from this document must include Revised BSD License text as 49 described in Section 4.e of the Trust Legal Provisions and are 50 provided without warranty as described in the Revised BSD License. 52 Table of Contents 54 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 55 2. Conventions and Definitions . . . . . . . . . . . . . . . . . 3 56 3. System Overview . . . . . . . . . . . . . . . . . . . . . . . 3 57 3.1. Objective . . . . . . . . . . . . . . . . . . . . . . . . 3 58 3.2. System Architecture . . . . . . . . . . . . . . . . . . . 3 59 3.3. Randomness sampling . . . . . . . . . . . . . . . . . . . 4 60 3.4. Measurement Encryption . . . . . . . . . . . . . . . . . 4 61 3.5. Server Aggregation . . . . . . . . . . . . . . . . . . . 5 62 4. Comparisons with other Systems . . . . . . . . . . . . . . . 6 63 4.1. Private Heavy-Hitter Discovery . . . . . . . . . . . . . 6 64 4.2. General Aggregation . . . . . . . . . . . . . . . . . . . 6 65 5. Security Considerations . . . . . . . . . . . . . . . . . . . 6 66 5.1. Randomness Sampling . . . . . . . . . . . . . . . . . . . 6 67 5.2. Cryptographic Choices . . . . . . . . . . . . . . . . . . 7 68 5.3. Oblivious Submission . . . . . . . . . . . . . . . . . . 7 69 5.4. Leakage . . . . . . . . . . . . . . . . . . . . . . . . . 7 70 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 71 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 72 7.1. Normative References . . . . . . . . . . . . . . . . . . 8 73 7.2. Informative References . . . . . . . . . . . . . . . . . 8 74 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 9 75 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 9 77 1. Introduction 79 Collecting user data is often fraught with privacy issues because 80 without adequate protections it is trivial for the server to learn 81 sensitive information about the client contributing data. Even when 82 the client's identity is separated from the data (for e.g. if the 83 client is using the [Tor] network or [OHTTP], it's possible for the 84 collected data to be unique enough that the user's identity is 85 leaked. A common solution to this problem of the measurement being 86 user-identifying/sensitive is to make sure that the measurement is 87 only revealed to the server if there are at least K clients that have 88 contributed the same data, thus providing K-anonymity to 89 participating clients. Such privacy-preserving systems are referred 90 to as threshold aggregation systems. 92 In this document we describe one such system, namely Distributed 93 Secret Sharing for Private Threshold Aggregation Reporting (STAR) 94 [STAR], that is currently deployed in production by the [Brave] 95 browser. 97 2. Conventions and Definitions 99 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 100 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 101 "OPTIONAL" in this document are to be interpreted as described in 102 BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all 103 capitals, as shown here. 105 The following terms are used: 107 Aggregation Server: An entity that provides some tool/software, that 108 would like to learn aggregated data points from their user-base. 110 Client: The entity that uses the tool. 112 Measurement: The unencrypted, potentially-sensitive data point that 113 the client is asked to report. 115 Message: The encrypted measurement being sent by the client. 117 Auxiliary Data: Arbitrary data that clients may send as part of 118 their message, but which is not included in any security measures. 120 3. System Overview 122 3.1. Objective 124 In STAR, clients generate encrypted measurements, that they send to a 125 single untrusted aggregation server. In a given amount of time, if 126 the aggregation server receives the same encrypted value from K 127 clients (i.e. K values), the server is able to decrypt the value. 128 This ensures that clients only have their measurements revealed if 129 they are part of a larger crowd. This allows the client to maintain 130 K-anonymity, when paired with mechanisms for removing client- 131 identifying information from their requests. 133 3.2. System Architecture 135 The overall system architecture is shown in Figure 1, where x is the 136 measurement and aux is auxiliary data. 138 Client (x, aux) Aggregation Server 139 | | 140 | | 141 | | 142 sample_rand(x, epoch) => rand | 143 | | 144 | | 145 | | 146 encrypt(x, aux; rand) => msg | 147 | | 148 | | 149 | | 150 |--------------- msg ------------> | 151 | | 152 | | 153 | If Kth instance of msg, 154 | decrypt(msg) => (x, aux) 155 | | 156 | | 157 | | 158 | | 160 Figure 1: System Architecture 162 The main goal in the STAR protocol is to have the aggregation 163 performed by a single untrusted server, without requiring 164 communication with any other non-colluding entities. In order for 165 the aggregation to succeed, clients must send messages that are 166 consistent with other client messages. This requires sampling 167 randomness that is equivalent when clients share the same 168 measurement. 170 3.3. Randomness sampling 172 The randomness rand sampled for each message MUST be a deterministic 173 function of the measurement. Either the client MAY sample the 174 randomness directly by computing a randomness extractor over their 175 measurement, or they MAY sample it as the output of an exchange with 176 a separate server that implements a partially oblivious pseudorandom 177 function protocol [OPRF]}. We discuss both cases more throughly in 178 Section 5.1. 180 3.4. Measurement Encryption 182 The client measurement encryption process involves the following 183 steps. 185 * Sample 48-bytes of randomness rand deterministically from their 186 measurement x (as described in Section 5.1). 188 * The client parses rand as three 16-byte chunks: r1, r2, and r3. 190 * The client samples a share s of r1 from a K-out-of-N secret 191 sharing scheme based on Lagrange interpolation, such as [ADSS]. 192 This process involves r2 as consistent randomness for generating 193 the coefficients for the polynomial. The client must then use 194 independent local randomness for determining the point at which to 195 evaluate the polynomial, and generate their share. 197 * The client derives a symmetric encryption key, key, from r1. 199 * The client encrypts the concatenation of x and aux into a 200 ciphertext c. 202 * The client then generates the message to send to the server as the 203 tuple (c,s,r3). 205 3.5. Server Aggregation 207 The server computes the output of the aggregation by performing the 208 following steps. 210 * Group client messages together depending on whether they share the 211 same value r3. 213 * For any subset of client messages greater that is smaller than K: 215 - Abort. 217 * Otherwise: 219 - Run secret share recovery on the set of client-received shares 220 s to reveal r1. 222 - Derive key from r1. 224 - Decrypt each ciphertext c to retrieve x and aux. 226 - Check that each decrypted x value is equivalent. 228 - Output x and the set of all auxiliary data. 230 4. Comparisons with other Systems 232 (for information/discussion: consider removing before publication) 234 4.1. Private Heavy-Hitter Discovery 236 STAR is similar in nature to private heavy-hitter discovery 237 protocols, such as Poplar [Poplar]. In such systems, the aggregation 238 server reveals the set of client measurements that are shared by at 239 least K clients. The STAR protocol is orders of magnitude more 240 efficient than the Poplar approach, with respect to computational, 241 network-usage, and financial metrics. Therefore, STAR scales much 242 better for large numbers of client submissions. Moreover, STAR 243 allows a single untrusted server to perform the aggregation process, 244 as opposed to Poplar which requires two non-colluding servers. 246 4.2. General Aggregation 248 In comparison to general aggregation protocols like Prio [Prio], the 249 STAR protocol provides a more constrained set of functionality. 250 However, STAR is significantly more efficient for the threshold 251 aggregation functionality, requires only a single aggregation server, 252 and is not limited to only processing numerical data types. 254 5. Security Considerations 256 5.1. Randomness Sampling 258 If clients sample randomness from their measurement directly, then 259 security of the encryption process is dependent on the amount of 260 entropy in the measurement input space. In other words, it is 261 crucial for the privacy guarantees provided by this protocol that the 262 aggregation server cannot simply iterate over all possible encrypted 263 values and generate the K values needed to decrypt a given client's 264 measurement. If this requirement does not hold, then the server can 265 do this easily by locally evaluating the randomness derivation 266 process on multiple measurements. 268 For better security guarantees, it is RECOMMENDED that clients sample 269 their randomness as part of an interaction with an independent entity 270 (AKA randomness server) running a partially oblivious pseudorandom 271 function protocol. In such an exchange, the client submits their 272 measurement as input, and learns rand = POPRF(sk,x;t) as the 273 randomness, where sk is the POPRF secret key, and t is public 274 metadata that dictates the current epoch. Sampling randomness in 275 this way restricts the aggregation server to only being able to run 276 the previous attack as an online interaction with the randomness 277 server. 279 For further security enhancements, clients MAY sample their 280 randomness in epoch t and then send it to the aggregation server in 281 t+1 (after the randomness server has rotated their secret key). This 282 prevents the aggregation server from being after receiving the client 283 messages, which shortens the window of the attack. In addition, the 284 original STAR paper [STAR] details potential constructions of POPRF 285 protocols that allow puncturing epoch metadata tags, which prevents 286 the need for the randomness server to perform a full key rotation. 288 5.2. Cryptographic Choices 290 * All encryption operations MUST be carried out using a secure 291 symmetric authenticated encryption scheme. 293 * The secret sharing scheme MUST be information-theoretically 294 secure, and SHOULD based upon traditional K-out-of-N Shamir secret 295 sharing. 297 * For functionality reasons, secret sharing operations SHOULD be 298 implemented in a finite field where collisions are unlikely (e.g. 299 of size 128-bits). This is to ensure that clients do not sample 300 identical shares of the same value. 302 * Client messages MUST be sent over a secure, authenticated channel, 303 such as TLS. 305 5.3. Oblivious Submission 307 Clients SHOULD ensure that their message submission is detached from 308 their identity. This is to ensure that the aggregation server does 309 not learn exactly what each client submits, in the event that their 310 measurement is revealed. This can be achieved by having the clients 311 submit their messages via an [OHTTP] proxy. Note that the OHTTP 312 proxy and randomness server can be combined into a single entity, 313 since client messages are protected by a TLS connection between the 314 client and the aggregation server. 316 5.4. Leakage 318 Client messages immediately leak the size of the anonymity set for 319 each received measurement, even if the measurement is not revealed. 320 As long as client messages are sent via an [OHTTP] proxy, then the 321 leakage derived from the anonymity sets themselves is significantly 322 reduced. 324 6. IANA Considerations 326 This document has no IANA actions. 328 7. References 330 7.1. Normative References 332 [OPRF] Davidson, A., Faz-Hernandez, A., Sullivan, N., and C. A. 333 Wood, "Oblivious Pseudorandom Functions (OPRFs) using 334 Prime-Order Groups", Work in Progress, Internet-Draft, 335 draft-irtf-cfrg-voprf-09, 8 February 2022, 336 . 339 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 340 Requirement Levels", BCP 14, RFC 2119, 341 DOI 10.17487/RFC2119, March 1997, 342 . 344 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 345 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 346 May 2017, . 348 7.2. Informative References 350 [ADSS] Bellare, M., Dai, W., and P. Rogaway, "Reimagining Secret 351 Sharing: Creating a Safer and More Versatile Primitive by 352 Adding Authenticity, Correcting Errors, and Reducing 353 Randomness Requirements", 27 June 2020, 354 . 356 [Brave] "Brave Browser", n.d., . 358 [OHTTP] Thomson, M. and C. A. Wood, "Oblivious HTTP", Work in 359 Progress, Internet-Draft, draft-thomson-http-oblivious-02, 360 24 August 2021, . 363 [Poplar] Boneh, D., Boyle, E., Corrigan-Gibbs, H., Gilboa, N., and 364 Y. Ishai, "Lightweight Techniques for Private Heavy 365 Hitters", 4 January 2022, 366 . 368 [Prio] Geoghegan, T., Patton, C., Rescorla, E., and C. A. Wood, 369 "Privacy Preserving Measurement", Work in Progress, 370 Internet-Draft, draft-gpew-priv-ppm-00, 25 October 2021, 371 . 374 [STAR] Davidson, A., Snyder, P., Quirk, E., Genereux, J., and B. 375 Livshits, "STAR: Distributed Secret Sharing for Private 376 Threshold Aggregation Reporting", 8 December 2021, 377 . 379 [Tor] Dingledine, R., Mathewson, N., and P. Syverson, "Tor: The 380 Second-Generation Onion Router", 2004, . 384 Acknowledgments 386 The authors would like to thank the authors of the original [STAR] 387 paper, which forms the basis for this document. 389 Authors' Addresses 391 Alex Davidson 392 Brave Software 393 Email: alex.davidson92@gmail.com 395 Shivan Kaul Sahib 396 Brave Software 397 Email: shivankaulsahib@gmail.com 399 Peter Snyder 400 Brave Software 401 Email: pes@brave.com