Network Working Group                                          M. Hansen
Internet-Draft                                                  ULD Kiel
Intended status: Informational                             H. Tschofenig
Expires: July 13, 2012                            Nokia Siemens Networks
                                                                R. Smith
                                                               JANET(UK)
                                                        January 10, 2012

                           Privacy Terminology
                   draft-iab-privacy-terminology-00.txt

Abstract

   Privacy is a concept that has been debated and argued throughout the
   last few millennia by all manner of people.  Its most striking
   feature is that nobody seems able to agree upon a precise definition
   of what it actually is.
   In order to discuss privacy in any meaningful way, a tightly defined
   context needs to be established.  The specific context of privacy
   used within this document is that of "personal data": any
   information relating to a data subject, where a data subject is an
   identified natural person or a natural person who can be identified,
   directly or indirectly.  This context is highly relevant since a lot
   of work within the IETF involves defining protocols that can
   potentially transport (either explicitly or implicitly) personal
   data.

   This document aims to establish a basic lexicon around privacy so
   that IETF contributors who wish to discuss privacy considerations
   within their work can do so using terminology consistent across the
   area.

   Note: This document is discussed at
   https://www.ietf.org/mailman/listinfo/ietf-privacy

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 13, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Anonymity
   3.  Unlinkability
   4.  Undetectability
   5.  Pseudonymity
   6.  Acknowledgments
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References

1.  Introduction

   Privacy is a concept that has been debated and argued throughout the
   last few millennia by all manner of people, including philosophers,
   psychologists, lawyers, and, more recently, computer scientists.
   Its most striking feature is that nobody seems able to agree upon a
   precise definition of what it actually is.  Every individual, every
   group, and every culture has its own views and preconceptions about
   the concept - some mutually complementary, some distinctly
   different.  However, it is generally (but not unanimously!) agreed
   that the protection of privacy is "A Good Thing", and people often
   only realize what it was once they feel that they have lost it.

   In order to discuss privacy in any meaningful way, a tightly defined
   context needs to be established.
   The specific context of privacy used within this document is that of
   "personal data": any information relating to a data subject, where a
   (data) subject is an identified natural person or a natural person
   who can be identified, directly or indirectly.  A lot of work within
   the IETF involves defining protocols that can potentially transport
   personal data; depending on the design decisions made when creating
   them, such protocols can either enable privacy protection or result
   in privacy breaches.  Identifiers and data elements communicated in
   protocols often do not assume a specific association with the human
   using the software.  Nevertheless, a protocol may help or simplify
   the re-identification of a natural person through the choice of its
   identifiers and the other state that is established and
   communicated, particularly when information from various sources is
   combined and analyzed together.

   Work in the area of privacy and privacy protection over the last few
   decades has centered on the idea of data minimization; it uses
   terminology such as anonymity, unlinkability, unobservability, and
   pseudonymity.  These terms are often used in discussions about the
   privacy properties of systems.

   The core principle of data minimization is that the ability of
   others to collect personal data should be removed, or at least
   minimized, when such collection is either not desirable or cannot be
   entirely prevented.

   Data minimization is the only generic strategy for enhancing
   individual privacy in cases where valid personal information is
   used, since all valid personal data inherently provides some
   linkability.
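   The principle can be sketched in code.  The following is a minimal,
   hypothetical illustration (the record layout and field choices are
   assumptions made for this sketch, not anything defined by this
   document): before personal data is stored or shared, direct
   identifiers are dropped and quasi-identifiers are coarsened so that
   less linkable data remains.

```python
from datetime import date

def minimize_record(record):
    """Return a copy of a hypothetical user record with direct
    identifiers removed and quasi-identifiers coarsened."""
    minimized = {}
    # Drop direct identifiers entirely.
    minimized["name"] = None
    minimized["email"] = None
    # Truncate the IPv4 address to its /24 prefix: still useful for
    # coarse grouping or abuse handling, but far less linkable.
    octets = record["ip"].split(".")
    minimized["ip_prefix"] = ".".join(octets[:3]) + ".0/24"
    # Coarsen the birth date to a year: enough for age-based logic,
    # harder to link to a single natural person.
    minimized["birth_year"] = record["birth_date"].year
    return minimized

record = {"name": "Alice Example",
          "email": "alice@example.com",
          "ip": "192.0.2.33",
          "birth_date": date(1980, 5, 17)}
print(minimize_record(record))
```

   Note that even the coarsened fields retain some linkability, which
   is exactly the point made above: minimization reduces, but does not
   eliminate, the ability to link valid personal data.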
   Other techniques have been proposed and implemented that aim to
   enhance privacy by providing misinformation (inaccurate or erroneous
   information, usually provided without a conscious effort to mislead
   or deceive) or disinformation (deliberately false or distorted
   information provided in order to mislead or deceive).  However,
   these techniques are out of scope for this document.

   We use the term "attacker" in this document to refer to an entity
   that violates the privacy expectations of a data subject.  The
   attacker is not necessarily external to the system; in many cases
   the attacker is actually one of the communication partners.  When
   necessary, we use the terms "initiator" and "responder" to refer to
   the two ends of a protocol interaction.  This terminology highlights
   that many protocols utilize bidirectional communication in which
   both ends send and receive data.  We assume that the attacker uses
   all information available to infer (probabilities of) his items of
   interest (IOIs).  These IOIs may be attributes (and their values) of
   personal data, or may be actions such as who sent, or who received,
   which messages.

   This document aims to establish a basic lexicon around privacy so
   that IETF contributors who wish to discuss privacy considerations
   within their work (see [I-D.iab-privacy-considerations]) can do so
   using terminology that is consistent across areas.  Note that it
   does not attempt to define all aspects of privacy terminology;
   rather, it establishes terms for some of the most common ideas and
   concepts.

2.  Anonymity

   Definition:  Anonymity of a subject means that the attacker cannot
      sufficiently identify the subject within a set of subjects, the
      anonymity set.

   To enable anonymity of a subject, there always has to be an
   appropriate set of subjects with potentially the same attributes.
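   As a toy sketch (the subjects and attributes below are hypothetical,
   chosen purely for illustration), the set of subjects that remain
   plausible candidates given everything the attacker has observed can
   be computed as follows; the larger this set, the stronger the
   anonymity:

```python
def candidates(subjects, observed):
    """Return the subjects consistent with all observed attributes."""
    return [s["id"] for s in subjects
            if all(s.get(k) == v for k, v in observed.items())]

# Hypothetical subjects with attributes an attacker might observe.
subjects = [
    {"id": "alice", "city": "Kiel",  "browser": "X"},
    {"id": "bob",   "city": "Kiel",  "browser": "Y"},
    {"id": "carol", "city": "Espoo", "browser": "X"},
]

# Knowing only the city leaves two plausible subjects, so each is
# anonymous within a set of size 2.
print(candidates(subjects, {"city": "Kiel"}))

# A stronger attacker who also observes the browser narrows the set
# to a single subject, and anonymity is lost.
print(candidates(subjects, {"city": "Kiel", "browser": "X"}))
```

   The two calls illustrate that the same subject can be anonymous
   against one attacker and fully identified against a better-informed
   one.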
   The set of all possible subjects is known as the anonymity set, and
   membership of this set may vary over time.

   The set of possible subjects depends on the knowledge of the
   attacker; anonymity is therefore relative with respect to the
   attacker.  An initiator may be anonymous (initiator anonymity) only
   within a set of potential initiators - their initiator anonymity set
   - which itself may be a subset of all subjects that may send a
   message.  Conversely, a responder may be anonymous (responder
   anonymity) only within a set of potential responders - their
   responder anonymity set.  The two anonymity sets may be disjoint,
   may overlap, or may be identical.

   As an example, consider RFC 3325 (P-Asserted-Identity, PAI)
   [RFC3325], an extension for the Session Initiation Protocol (SIP)
   that allows subjects, such as a VoIP caller, to instruct an
   intermediary they trust not to populate the SIP From header field
   with their authenticated and verified identity.  The recipient of
   the call, as well as any other entity outside the data subject's
   trust domain, would therefore only learn that the SIP message
   (typically a SIP INVITE) was sent with a header field
   'From: "Anonymous" ' rather than the subject's address-of-record,
   which is typically thought of as the "public address" of the user,
   i.e., the data subject.  When PAI is used, the subject becomes
   anonymous within the initiator anonymity set that is populated by
   every subject making use of that specific intermediary.

   Note that this example assumes that other personal data cannot be
   inferred from the other SIP protocol payloads; this is a useful
   assumption to make in the analysis of one specific protocol
   extension, but not in the analysis of an entire architecture.

3.  Unlinkability

   Definition:  Unlinkability of two or more Items Of Interest (e.g.,
      subjects, messages, actions, ...)
      means that within a particular set of information, the attacker
      cannot distinguish whether these IOIs are related or not (with a
      high enough degree of probability to be useful).

   Unlinkability of two (or more) messages may of course depend on
   whether their content is protected against the attacker.  Where it
   is not, messages may only be unlinkable if we assume that the
   attacker cannot infer information about the initiator or responder
   from the message content itself.  It is worth noting that even if
   the content itself does not explicitly betray linkable information,
   deep semantic analysis of a message sequence can often detect
   characteristics that link the messages together, e.g., similarities
   in structure or style, the use of certain words or phrases, the
   consistent appearance of particular grammatical errors, etc.

   The unlinkability property can be considered a more "fine-grained"
   version of anonymity, since there are many more relations where
   unlinkability might be an issue than just the "anonymity" relation
   between subjects and IOIs.  As such, it may sometimes be necessary
   to state explicitly which attributes anonymity refers to (beyond the
   subject-to-IOI relationship).  An attacker might learn information
   about the linkability of various messages without necessarily
   reducing the anonymity of a particular subject.  As an example, an
   attacker who is able to link all encrypted messages in a set of
   transactions still does not learn the identity of the subject who is
   the source of those transactions.

   There are several items of terminology heavily related to
   unlinkability:

   Definition:  We use the term "profiling" to mean learning
      information about a particular subject while that subject remains
      anonymous to the attacker.
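   Such profiling can be sketched as follows; in this toy illustration
   (the observation format and the opaque identifier are assumptions
   made for the sketch), observed actions accumulate under an
   identifier without the attacker ever learning who the identifier
   denotes:

```python
from collections import defaultdict

def build_profiles(observations):
    """Group observed actions by an opaque identifier: the attacker
    learns what each subject does, not who each subject is."""
    profiles = defaultdict(list)
    for opaque_id, action in observations:
        profiles[opaque_id].append(action)
    return dict(profiles)

# Hypothetical observations of (opaque identifier, action) pairs.
observations = [
    ("user-7f3a", "plays game G"),
    ("user-7f3a", "reads article A"),
    ("user-91c2", "uploads video V"),
    ("user-7f3a", "uploads video W"),
]

profiles = build_profiles(observations)
# "user-7f3a" now carries a rich activity profile even though the
# attacker cannot tell which natural person it denotes.
print(profiles["user-7f3a"])
```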
   For example, if an attacker concludes that a subject plays a
   specific computer game, reads specific news articles on a website,
   and uploads certain videos, then the subject's activities have been
   profiled, even if the attacker is unable to identify that specific
   subject.

   Definition:  "Relationship anonymity" of a pair of subjects means
      that sender and recipient (or each recipient, in the case of
      multicast) are unlinkable.  The classical MIX-net [Chau81]
      without dummy traffic is one implementation with just this
      property: the attacker sees who sends messages when, and who
      receives messages when, but cannot figure out who is sending
      messages to whom.

   Definition:  The term "unlinkable session" refers to the ability of
      a system to render a set of actions by a subject unlinkable from
      one another over a sequence of protocol runs (sessions).  This
      term is useful for cases where the application logic requires a
      sequence of interactions between an initiator and a responder
      rather than a single-shot message; we refer to such a sequence as
      a session.  When doing an analysis with respect to unlinkability,
      we compare this session to a sequence of sessions to determine
      linkability.

   Definition:  We use the term "fingerprinting" to refer to any
      parameter (or set of parameters) that an attacker can observe for
      the purpose of re-identification.  Fingerprinting is a form of
      tracking that associates the activities of communication software
      at different times, potentially with different communication
      partners, but without explicitly sharing state information (as is
      the case with cookies [RFC6265]).  For example, the Panopticlick
      project by the Electronic Frontier Foundation uses parameters
      that an HTTP-based Web browser shares with the sites it visits to
      determine the uniqueness of the browser [panopticlick].

4.  Undetectability

   Definition:  Undetectability of an item of interest (IOI) means that
      the attacker cannot sufficiently distinguish whether it exists or
      not.

   In contrast to anonymity and unlinkability, where the IOI is
   protected indirectly through protection of the IOI's relationship to
   a subject or other IOI, undetectability is the direct protection of
   an IOI.  For example, undetectability can be regarded as a possible
   and desirable property of steganographic systems.

   If we consider messages as IOIs, then undetectability means that
   messages are not sufficiently discernible from, e.g., "random
   noise".

5.  Pseudonymity

   Definition:  A pseudonym is an identifier of a subject other than
      one of the subject's real names.  An identifier, as defined in
      [id], is "a lexical token that names entities".

   In the context of IETF protocols, almost all identifiers are
   pseudonyms, since there is typically no requirement to use real
   names in protocols.  However, in certain scenarios it is reasonable
   to assume that real names will be used, and it is worthwhile to
   point out this circumstance.

   For Internet protocols it is important whether protocols allow
   identifiers to be recycled dynamically, what the lifetimes of the
   pseudonyms are, to whom they get exposed, how subjects are able to
   control disclosure, and how often they can be changed over time (and
   what the consequences are when they are regularly changed).  These
   aspects are described in [I-D.iab-privacy-considerations].

   Achieving anonymity, unlinkability, and perhaps undetectability may
   enable the ideal of data minimization.  Unfortunately, it would also
   prevent a certain class of useful two-way communication scenarios.
   Therefore, for many applications, we need to accept a certain amount
   of linkability and detectability while attempting to retain
   unlinkability between the subject and their transactions.
   This is achieved through appropriate kinds of pseudonymous
   identifiers.  These identifiers are then often used to refer to
   established state or are used for access control purposes; see
   [I-D.iab-identifier-comparison].

   The term "real name" is the antonym of "pseudonym".  There may be
   multiple real names over a lifetime - in particular, legal names.
   For example, a human being may possess the names that appear on
   their birth certificate or on other official identity documents
   issued by the state; for a legal person, the real name is the name
   under which it operates and which is registered in official
   registers (e.g., a commercial register or a register of
   associations).  A human being's real name typically comprises their
   given name and a family name.  Note that from a mere technological
   perspective it cannot always be determined whether an identifier of
   a subject is a pseudonym or a real name.

   Additional useful terms are:

   Definition:  The "holder" of a pseudonym is the subject to whom the
      pseudonym refers.

   Definition:  A subject is "pseudonymous" if a pseudonym is used as
      an identifier instead of one of its real names.

   Definition:  Pseudonymity is the state of remaining pseudonymous
      through the use of pseudonyms as identifiers.

   Sender pseudonymity is defined as the sender being pseudonymous;
   recipient pseudonymity is defined as the recipient being
   pseudonymous.

   Anonymity through the use of pseudonyms is stronger:

   o  the less personal data of the pseudonym holder can be linked to
      the pseudonym;

   o  the less often and the less context-spanning pseudonyms are used,
      and therefore the less data about the holder can be linked; and

   o  the more often independently chosen pseudonyms are used for new
      actions (i.e., making them, from an observer's perspective,
      unlinkable).

6.  Acknowledgments

   Parts of this document utilize content from [anon_terminology],
   which has a long history starting in 2000 and whose quality was
   improved thanks to feedback from a number of people.  The authors
   would like to thank Andreas Pfitzmann for his work on an earlier
   draft version of this document.

   Within the IETF, a number of persons have provided their feedback on
   this document.  We would like to thank Scott Brim, Marc Linsner,
   Bryan McLaughlin, Nick Mathewson, Eric Rescorla, Alissa Cooper,
   Scott Bradner, Nat Sakimura, Bjoern Hoehrmann, David Singer, Dean
   Willis, Christine Runnegar, Lucy Lynch, Trent Adams, Mark Lizar,
   Martin Thomson, Josh Howlett, and Mischa Tuffield.

7.  Security Considerations

   This document introduces terminology for talking about privacy
   within IETF specifications.  Since privacy protection often relies
   on security mechanisms, this document is also related to security in
   its broader context.

8.  IANA Considerations

   This document does not require actions by IANA.

9.  References

9.1.  Normative References

   [I-D.iab-privacy-considerations]
              Cooper, A., Tschofenig, H., Aboba, B., Peterson, J., and
              J. Morris, "Privacy Considerations for Internet
              Protocols", draft-iab-privacy-considerations-01 (work in
              progress), October 2011.

   [id]       "Identifier - Wikipedia", Wikipedia,
              URL: http://en.wikipedia.org/wiki/Identifier, 2011.

9.2.  Informative References

   [Chau81]   Chaum, D., "Untraceable Electronic Mail, Return
              Addresses, and Digital Pseudonyms", Communications of the
              ACM, 24/2, 84-88, 1981.

   [I-D.iab-identifier-comparison]
              Thaler, D., "Issues in Identifier Comparison for Security
              Purposes", draft-iab-identifier-comparison-00 (work in
              progress), July 2011.

   [RFC3325]  Jennings, C., Peterson, J., and M.
              Watson, "Private Extensions to the Session Initiation
              Protocol (SIP) for Asserted Identity within Trusted
              Networks", RFC 3325, November 2002.

   [RFC6265]  Barth, A., "HTTP State Management Mechanism", RFC 6265,
              April 2011.

   [anon_terminology]
              Pfitzmann, A. and M. Hansen, "A terminology for talking
              about privacy by data minimization: Anonymity,
              Unlinkability, Undetectability, Unobservability,
              Pseudonymity, and Identity Management", v0.34,
              URL: http://dud.inf.tu-dresden.de/literatur/
              Anon_Terminology_v0.34.pdf, 2010.

   [panopticlick]
              Eckersley, P., "How Unique Is Your Web Browser?",
              Electronic Frontier Foundation,
              URL: https://panopticlick.eff.org/browser-uniqueness.pdf,
              2009.

Authors' Addresses

   Marit Hansen
   ULD Kiel

   EMail: marit.hansen@datenschutzzentrum.de

   Hannes Tschofenig
   Nokia Siemens Networks
   Linnoitustie 6
   Espoo 02600
   Finland

   Phone: +358 (50) 4871445
   EMail: Hannes.Tschofenig@gmx.net
   URI:   http://www.tschofenig.priv.at

   Rhys Smith
   JANET(UK)

   EMail: rhys.smith@ja.net