Network Working Group                                          M. Hansen
Internet-Draft                                                  ULD Kiel
Intended status: Informational                             H. Tschofenig
Expires: September 13, 2012                       Nokia Siemens Networks
                                                                R. Smith
                                                               JANET(UK)
                                                               A. Cooper
                                                                     CDT
                                                          March 12, 2012


                    Privacy Terminology and Concepts
                  draft-iab-privacy-terminology-01.txt

Abstract

   Privacy is a concept that has been debated and argued throughout the
   last few millennia. Its most striking feature is the difficulty that
   disparate parties encounter when they attempt to precisely define it.
   In order to discuss privacy in a meaningful way, a tightly defined
   context is necessary. The specific context of privacy used within
   this document is that of personal data in Internet protocols.
   Personal data is any information relating to a data subject, where a
   data subject is an identified natural person or a natural person who
   can be identified, directly or indirectly.

   A lot of work within the IETF involves defining protocols that can
   potentially transport (either explicitly or implicitly) personal
   data. This document aims to establish a consistent lexicon around
   privacy for IETF contributors to use when discussing privacy
   considerations within their work.

   Note: This document is discussed at
   https://www.ietf.org/mailman/listinfo/ietf-privacy

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 13, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Basic Terms
   3. Identifiability
      3.1. Anonymity
      3.2. Pseudonymity
      3.3. Identity Confidentiality
      3.4. Identity Management
   4. Unlinkability
   5. Undetectability
   6. Example
   7. Acknowledgments
   8. Security Considerations
   9. IANA Considerations
   10. References
      10.1. Normative References
      10.2. Informative References

1. Introduction

   Privacy is a concept that has been debated and argued throughout the
   last few millennia by all manner of people, including philosophers,
   psychologists, lawyers, and more recently, computer scientists. Its
   most striking feature is the difficulty that disparate parties
   encounter when they attempt to precisely define it. Each individual,
   group, and culture has its own views and preconceptions about
   privacy, some of which are mutually complementary and some of which
   diverge. However, it is generally (but not unanimously) agreed that
   the protection of privacy is "A Good Thing." People often do not
   realize how much they value privacy until they lose it.

   In order to discuss privacy in a meaningful way, a tightly defined
   context is necessary. The specific context of privacy used within
   this document is that of "personal data" in Internet protocols.
   Personal data is any information relating to a data subject, where a
   data subject is an identified natural person or a natural person who
   can be identified, directly or indirectly.

   A lot of work within the IETF involves defining protocols that can
   potentially transport personal data. Protocols are therefore capable
   of enabling both privacy protections and privacy breaches. Protocol
   architects often do not assume a specific relationship between the
   identifiers and data elements communicated in protocols and the
   humans using the software running the protocols. However, a protocol
   may facilitate the identification of a natural person depending on
   how protocol identifiers and other state are created and
   communicated.

   One commonly held privacy objective is that of data minimization --
   eliminating the potential for personal data to be collected.
   Often, however, the collection of personal data cannot be prevented
   entirely, in which case the goal is to minimize the amount of
   personal data that can be collected for a given purpose and to offer
   ways to control the dissemination of personal data. This document
   focuses on introducing terms used to describe privacy properties that
   support data minimization.

   Other techniques have been proposed and implemented that aim to
   enhance privacy by providing misinformation (inaccurate or erroneous
   information, provided usually without conscious effort to mislead or
   deceive) or disinformation (deliberately false or distorted
   information provided in order to mislead or deceive). These
   techniques are out of scope for this document.

   This document aims to establish a basic lexicon around privacy so
   that IETF contributors who wish to discuss privacy considerations
   within their work (see [I-D.iab-privacy-considerations]) can do so
   using terminology consistent across areas. Note that it does not
   attempt to define all aspects of privacy terminology; rather, it
   discusses terms describing the most common ideas and concepts.

2. Basic Terms

   Personal data: Any information relating to a data subject.

   Data subject: An identified natural person or a natural person who
      can be identified, directly or indirectly.

   Item of Interest (IOI): Any data item that an observer or attacker
      might be interested in. This includes attributes, identifiers,
      communication actions (such as sending data to or receiving data
      from certain communication partners), etc.

   Initiator: The protocol entity that starts a communication
      interaction with a recipient.
      The term "initiator" is used rather than "sender" to highlight
      the fact that many protocols use bidirectional communication
      where both ends send and receive data.

   Recipient: A protocol entity that receives communications from an
      initiator.

   Attacker: An entity that intentionally works against some protection
      goal. It is assumed that an attacker uses all information
      available to infer information about its items of interest.

   Observer: A protocol entity that is authorized to receive and handle
      data from an initiator and thereby is able to observe and collect
      information, potentially posing privacy threats depending on the
      context. These entities are not generally considered
      "attackers" in the security sense, but they are still capable of
      privacy invasion.

3. Identifiability

   Identity: Any subset of a data subject's attributes that identifies
      the data subject within a given context. Data subjects usually
      have multiple identities for use in different contexts.

   Identifier: A data object that represents a specific identity of a
      protocol entity or data subject. See [RFC4949].

   Identifiability: The extent to which a data subject is identifiable.

   Identification: The linking of information to a particular data
      subject to infer the subject's identity.

   The following sub-sections define terms related to different ways of
   reducing identifiability.

3.1. Anonymity

   Anonymous: A property of a data subject in which an observer or
      attacker cannot identify the data subject within a set of other
      subjects (the anonymity set).

   Anonymity: The state of being anonymous.

   To enable anonymity of a data subject, there must exist a set of data
   subjects with potentially the same attributes, i.e., to the attacker
   or the observer these data subjects must appear indistinguishable
   from each other.
   The set of all such data subjects is known as the anonymity set,
   and membership of this set may vary over time.

   The composition of the anonymity set depends on the knowledge of the
   observer or attacker. Thus anonymity is relative with respect to the
   observer or attacker. An initiator may be anonymous only within a
   set of potential initiators -- its initiator anonymity set -- which
   itself may be a subset of all data subjects that may initiate
   communications. Conversely, a recipient may be anonymous only within
   a set of potential recipients -- its recipient anonymity set. Both
   anonymity sets may be disjoint, may overlap, or may be the same.

   As an example, consider RFC 3325 (P-Asserted-Identity, PAI)
   [RFC3325], an extension for the Session Initiation Protocol (SIP),
   that allows a data subject, such as a VoIP caller, to instruct an
   intermediary that he or she trusts not to populate the SIP From
   header field with the subject's authenticated and verified identity.
   The recipient of the call, as well as any other entity outside of the
   data subject's trust domain, would therefore only learn that the SIP
   message (typically a SIP INVITE) was sent with a header field
   'From: "Anonymous" <sip:anonymous@anonymous.invalid>' rather than
   the subject's address-of-record, which is typically thought of as
   the "public address" of the user (the data subject). When PAI is
   used, the data subject becomes anonymous within the initiator
   anonymity set that is populated by every data subject making use of
   that specific intermediary.

   Note: This example ignores the fact that other personal data may be
   inferred from the other SIP protocol payloads. This caveat makes the
   analysis of the specific protocol extension easier but cannot be
   assumed when conducting analysis of an entire architecture.

3.2. Pseudonymity

   Pseudonym: An identifier of a subject other than one of the
      subject's real names.

   Real name: The opposite of a pseudonym. For example, a natural
      person may possess the names that appear on his or her birth
      certificate or on other official identity documents issued by the
      state. A natural person's real name typically comprises his or
      her given names and a family name. A data subject may have
      multiple real names over a lifetime, including legal names. Note
      that from a technological perspective it cannot always be
      determined whether an identifier of a data subject is a pseudonym
      or a real name.

   Pseudonymous: A property of a data subject in which the subject is
      identified by a pseudonym.

   Pseudonymity: The state of being pseudonymous.

   In the context of IETF protocols, almost all identifiers are
   pseudonyms since there is typically no requirement to use real names
   in protocols. However, in certain scenarios it is reasonable to
   assume that real names will be used (with vCard [RFC6350], for
   example).

   Pseudonymity is strengthened when less personal data can be linked to
   the pseudonym; when the same pseudonym is used less often and across
   fewer contexts; and when independently chosen pseudonyms are more
   frequently used for new actions (making them, from an observer's or
   attacker's perspective, unlinkable).

   For Internet protocols, the important questions are whether
   protocols allow pseudonyms to be changed without human interaction,
   what the default length of pseudonym lifetimes is, to whom
   pseudonyms are exposed, how data subjects are able to control
   disclosure, how often pseudonyms can be changed, and what the
   consequences of changing them are. These aspects are described in
   [I-D.iab-privacy-considerations].

3.3. Identity Confidentiality

   Identity confidentiality: A property of a data subject wherein any
      party other than the recipient cannot sufficiently identify the
      data subject within the anonymity set.
      In comparison to anonymity and pseudonymity, identity
      confidentiality is concerned with eavesdroppers and
      intermediaries.

   As an example, consider the network access authentication procedures
   utilizing the Extensible Authentication Protocol (EAP) [RFC3748].
   EAP includes an identity exchange where the Identity Response is
   primarily used for routing purposes and selecting which EAP method to
   use. Since EAP Identity Requests and Responses are sent in
   cleartext, eavesdroppers and intermediaries along the communication
   path between the EAP peer and the EAP server can snoop on the
   identity. To address this threat, as discussed in RFC 4282
   [RFC4282], the user's identity can be hidden from these observers
   using the cryptographic support offered by EAP methods. Identity
   confidentiality has become a recommended design criterion for EAP
   (see [RFC4017]). EAP-AKA [RFC4187], for example, protects the EAP
   peer's identity against passive adversaries by utilizing temporary
   identities. EAP-IKEv2 [RFC5106] is an example of an EAP method that
   offers protection against active observers with regard to the data
   subject's identity.

3.4. Identity Management

   Identity Provider (IdP): An entity (usually an organization) that
      has a relationship with a data subject and is responsible for
      providing authentication and authorization information to relying
      parties (see below). To facilitate the provision of
      authentication and authorization, an IdP will usually go through a
      process of verifying the data subject's identity and issuing the
      subject a set of credentials. Each function that the IdP performs
      -- identity verification, credential issuing, providing
      authentication assertions, providing authorization assertions, and
      so forth -- may be performed by separate entities, but for the
      purposes of this document, it is assumed that a single entity is
      performing all of them.

   Relying Party (RP): An entity that relies on authentication and
      authorization of a data subject provided by an identity provider,
      typically to process a transaction or grant access to information
      or a system.

4. Unlinkability

   Unlinkability: Within a particular set of information, a state in
      which an observer or attacker cannot distinguish whether two items
      of interest are related or not (with a high enough degree of
      probability to be useful to the observer or attacker).

   Unlinkability of two or more messages may depend on whether their
   content is protected against the observer or attacker. In the cases
   where this is not true, messages may only be unlinkable if it is
   assumed that the observer or attacker is not able to infer
   information about the initiator or recipient from the message
   content itself. It is worth noting that even if the content itself
   does not betray linkable information explicitly, deep semantic
   analysis of a message sequence can often detect certain
   characteristics that link them together, including similarities in
   structure, style, use of particular words or phrases, consistent
   appearance of certain grammatical errors, and so forth.

   There are several items of terminology highly related to
   unlinkability:

   Correlation: The combination of various pieces of information about
      a data subject. For example, if an observer or attacker concludes
      that a data subject plays a specific computer game, reads a
      specific news article on a website, and uploads specific videos,
      then the data subject's activities have been correlated, even if
      the observer or attacker is unable to identify the specific data
      subject.

   Relationship anonymity: When an initiator and recipient (or each
      recipient in the case of multicast) are unlinkable.
      The classical MIX-net [Chau81] without dummy traffic is one
      implementation with this property: the observer sees who sends
      and receives messages and when they are sent and received, but it
      cannot figure out who is sending messages to whom.

   Unlinkable protocol interaction: When one protocol interaction is
      not linkable to another protocol interaction of the same protocol.

      Examples of mechanisms that do not provide this property are
      Transport Layer Security (TLS) session resumption [RFC5246] and
      TLS session resumption without server-side state [RFC5077]. In
      RFC 5246 [RFC5246], a server provides the client with a
      session_id in the ServerHello message and caches the
      master_secret for later exchanges. When the client initiates a
      new connection with the server, it re-uses the previously
      obtained session_id in its ClientHello message. The server
      agrees to resume the session by using the same session_id and the
      previously stored master_secret for the generation of the TLS
      Record Layer security association. RFC 5077 [RFC5077] borrows
      from the session resumption design idea, but the server
      encapsulates all state information into a ticket instead of
      caching it. An attacker who is able to observe the protocol
      exchanges between the TLS client and the TLS server is able to
      link the initial exchange to subsequently resumed TLS sessions
      when the session_id and the ticket are exchanged in the clear
      (which is the case for data exchanged in the initial handshake
      messages).

   Fingerprinting: The process of an observer or attacker partially or
      fully identifying a device, application, or initiator based on
      multiple information elements communicated to the observer or
      attacker.
      For example, the Panopticlick project by the Electronic Frontier
      Foundation uses parameters an HTTP-based Web browser shares with
      sites it visits to determine the uniqueness of the browser
      [panopticlick].

5. Undetectability

   Undetectability: The state in which an observer or attacker cannot
      sufficiently distinguish whether an item of interest exists or
      not.

   In contrast to anonymity and unlinkability, where the IOI is
   protected indirectly through protection of the IOI's relationship to
   a subject or other IOI, undetectability means the IOI is directly
   protected. For example, undetectability is a desirable property of
   steganographic systems.

   If we consider the case where an IOI is a message, then
   undetectability means that the message is not sufficiently
   discernible from other messages (from, e.g., random noise).

   Achieving anonymity, unlinkability, and undetectability may enable
   extreme data minimization. Unfortunately, this would also prevent a
   certain class of useful two-way communication scenarios. Therefore,
   for many applications, a certain amount of linkability and
   detectability is usually accepted while attempting to retain
   unlinkability between the data subject and his or her transactions.
   This is achieved through the use of appropriate kinds of pseudonymous
   identifiers. These identifiers are then often used to refer to
   established state or are used for access control purposes; see
   [I-D.iab-identifier-comparison].

6. Example

   [To be provided in a future version once the guidance is settled.]

7. Acknowledgments

   Parts of this document utilize content from [anon_terminology],
   which had a long history starting in 2000 and whose quality was
   improved due to the feedback from a number of people. The authors
   would like to thank Andreas Pfitzmann for his work on an earlier
   draft version of this document.

   Within the IETF, a number of persons have provided their feedback on
   this document. We would like to thank Scott Brim, Marc Linsner,
   Bryan McLaughlin, Nick Mathewson, Eric Rescorla, Scott Bradner, Nat
   Sakimura, Bjoern Hoehrmann, David Singer, Dean Willis, Christine
   Runnegar, Lucy Lynch, Trent Adams, Mark Lizar, Martin Thomson, Josh
   Howlett, Mischa Tuffield, S. Moonesamy, Ted Hardie, Zhou Sujing,
   Claudia Diaz, Leif Johansson, and Klaas Wierenga.

8. Security Considerations

   This document introduces terminology for talking about privacy within
   IETF specifications. Since privacy protection often relies on
   security mechanisms, this document is also related to security in
   its broader context.

9. IANA Considerations

   This document does not require actions by IANA.

10. References

10.1. Normative References

   [I-D.iab-privacy-considerations]
              Cooper, A., Tschofenig, H., Aboba, B., Peterson, J., and
              J. Morris, "Privacy Considerations for Internet
              Protocols", draft-iab-privacy-considerations-01 (work in
              progress), October 2011.

   [id]       "Identifier - Wikipedia", Wikipedia, URL:
              http://en.wikipedia.org/wiki/Identifier, Dec 2011.

10.2. Informative References

   [Chau81]   Chaum, D., "Untraceable Electronic Mail, Return
              Addresses, and Digital Pseudonyms", Communications of the
              ACM, 24/2, 84-88, 1981.

   [I-D.iab-identifier-comparison]
              Thaler, D., "Issues in Identifier Comparison for Security
              Purposes", draft-iab-identifier-comparison-00 (work in
              progress), July 2011.

   [RFC3325]  Jennings, C., Peterson, J., and M. Watson, "Private
              Extensions to the Session Initiation Protocol (SIP) for
              Asserted Identity within Trusted Networks", RFC 3325,
              November 2002.

   [RFC3748]  Aboba, B., Blunk, L., Vollbrecht, J., Carlson, J., and H.
              Levkowetz, "Extensible Authentication Protocol (EAP)",
              RFC 3748, June 2004.

   [RFC4017]  Stanley, D., Walker, J., and B. Aboba, "Extensible
              Authentication Protocol (EAP) Method Requirements for
              Wireless LANs", RFC 4017, March 2005.

   [RFC4187]  Arkko, J. and H. Haverinen, "Extensible Authentication
              Protocol Method for 3rd Generation Authentication and Key
              Agreement (EAP-AKA)", RFC 4187, January 2006.

   [RFC4282]  Aboba, B., Beadles, M., Arkko, J., and P. Eronen, "The
              Network Access Identifier", RFC 4282, December 2005.

   [RFC4949]  Shirey, R., "Internet Security Glossary, Version 2",
              RFC 4949, August 2007.

   [RFC5077]  Salowey, J., Zhou, H., Eronen, P., and H. Tschofenig,
              "Transport Layer Security (TLS) Session Resumption
              without Server-Side State", RFC 5077, January 2008.

   [RFC5106]  Tschofenig, H., Kroeselberg, D., Pashalidis, A., Ohba,
              Y., and F. Bersani, "The Extensible Authentication
              Protocol-Internet Key Exchange Protocol version 2
              (EAP-IKEv2) Method", RFC 5106, February 2008.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

   [RFC6265]  Barth, A., "HTTP State Management Mechanism", RFC 6265,
              April 2011.

   [RFC6350]  Perreault, S., "vCard Format Specification", RFC 6350,
              August 2011.

   [anon_terminology]
              Pfitzmann, A. and M. Hansen, "A terminology for talking
              about privacy by data minimization: Anonymity,
              Unlinkability, Undetectability, Unobservability,
              Pseudonymity, and Identity Management", URL:
              http://dud.inf.tu-dresden.de/literatur/
              Anon_Terminology_v0.34.pdf, version 0.34, 2010.

   [panopticlick]
              Eckersley, P., "How Unique Is Your Web Browser?",
              Electronic Frontier Foundation, URL:
              https://panopticlick.eff.org/browser-uniqueness.pdf,
              2009.

Authors' Addresses

   Marit Hansen
   ULD Kiel

   EMail: marit.hansen@datenschutzzentrum.de


   Hannes Tschofenig
   Nokia Siemens Networks
   Linnoitustie 6
   Espoo 02600
   Finland

   Phone: +358 (50) 4871445
   EMail: Hannes.Tschofenig@gmx.net
   URI:   http://www.tschofenig.priv.at


   Rhys Smith
   JANET(UK)

   EMail: rhys.smith@ja.net


   Alissa Cooper
   CDT

   EMail: acooper@cdt.org