2 Network Working Group M. Hansen, Ed. 3 Internet-Draft ULD Kiel 4 Intended status: Informational H. Tschofenig 5 Expires: September 15, 2011 Nokia Siemens Networks 6 March 14, 2011 8 Terminology for Talking about Privacy by Data Minimization: Anonymity, 9 Unlinkability, Undetectability, Unobservability, Pseudonymity, and 10 Identity Management 11 draft-hansen-privacy-terminology-02.txt 13 Abstract 15 This document is an attempt to consolidate terminology in the field 16 of privacy by data minimization.
It motivates and develops definitions 17 for anonymity/identifiability, (un)linkability, (un)detectability, 18 (un)observability, pseudonymity, identity, partial identity, digital 19 identity and identity management. Starting the definitions from the 20 anonymity and unlinkability perspective reveals some deeper 21 structures in this field. 23 Note: This document is discussed at 24 https://www.ietf.org/mailman/listinfo/ietf-privacy 26 Status of This Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at http://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on September 15, 2011. 43 Copyright Notice 45 Copyright (c) 2011 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (http://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 Table of Contents 60 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 61 2. Anonymity . . . . . . . . . . . . . . . . . . . . . . . 
. . . 3 62 3. Unlinkability . . . . . . . . . . . . . . . . . . . . . . . . 6 63 4. Anonymity in Terms of Unlinkability . . . . . . . . . . . . . 8 64 5. Undetectability and Unobservability . . . . . . . . . . . . . 10 65 6. Pseudonymity . . . . . . . . . . . . . . . . . . . . . . . . . 13 66 7. Identity Management . . . . . . . . . . . . . . . . . . . . . 19 67 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 20 68 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 21 69 10. Security Considerations . . . . . . . . . . . . . . . . . . . 21 70 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 71 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 21 72 12.1. Normative References . . . . . . . . . . . . . . . . . . 21 73 12.2. Informative References . . . . . . . . . . . . . . . . . 21 74 Appendix A. Overview of Main Definitions and their Opposites . . 22 75 Appendix B. Relationships between Terms . . . . . . . . . . . . . 23 77 1. Introduction 79 Early papers from the 1980s about privacy by data minimization 80 already deal with anonymity, unlinkability, unobservability, and 81 pseudonymity. These terms are often used in discussions about 82 privacy properties of systems. 84 Data minimization means that, first of all, the possibility for others to 85 collect personal data should be minimized. Often, however, the 86 collection of personal data cannot be prevented entirely. In 87 such a case, the amount of personal data collected should be minimized. 88 Finally, the time for which collected personal data is stored should be 89 minimized.
91 Data minimization is the only generic strategy to enable anonymity, 92 since all correct personal data help to identify the subject if we exclude 93 providing misinformation (inaccurate or erroneous information, 94 provided usually without conscious effort at misleading, deceiving, 95 or persuading one way or another) or disinformation (deliberately 96 false or distorted information given out in order to mislead or 97 deceive). 99 Furthermore, data minimization is the only generic strategy to enable 100 unlinkability, since all correct personal data provide some 101 linkability if we exclude providing misinformation or disinformation. 103 This document does not aim to collect all terms used in the area of 104 privacy. Even the definition of the term 'privacy' itself is difficult 105 due to its contextual nature; the understanding of privacy has 106 changed over time. For the purpose of this document we refer to one 107 fairly well-established definition by Alan Westin from 1967 [West67]: 109 "Privacy is the claim of individuals, groups, or institutions to 110 determine for themselves when, how, and to what extent information 111 about them is communicated to others. Viewed in terms of the 112 relation of the individual to social participation, privacy is the 113 voluntary and temporary withdrawal of a person from the general 114 society through physical or psychological means, either in a state 115 of solitude or small-group intimacy or, when among larger groups, 116 in a condition of anonymity or reserve.", see page 7 of [West67]. 118 2. Anonymity 120 To enable anonymity of a subject, there always has to be an 121 appropriate set of subjects with potentially the same attributes. 123 Definition: Anonymity of a subject means that the subject is not 124 identifiable within a set of subjects, the anonymity set.
126 Note: 128 "not identifiable within the anonymity set" means that only using 129 the information the attacker has at his disposal, the subject is 130 not distinguishable from the other subjects within the anonymity 131 set. 133 In order to underline that there is a possibility to quantify 134 anonymity for some applications (instead of treating it purely as 135 a binary value), it is possible to use the following variation of 136 the previous definition: "Anonymity of a subject from an 137 attacker's perspective means that the attacker cannot sufficiently 138 identify the subject within a set of subjects, the anonymity set." 140 The anonymity set is the set of all possible subjects. The set of 141 possible subjects depends on the knowledge of the attacker. Thus, 142 anonymity is relative with respect to the attacker. With respect to 143 actors, the anonymity set consists of the subjects who might cause an 144 action. With respect to actees, the anonymity set consists of the 145 subjects who might be acted upon. Therefore, a sender may be 146 anonymous (sender anonymity) only within a set of potential senders, 147 his/her sender anonymity set, which itself may be a subset of all 148 subjects who may send a message. Analogously, a recipient may be 149 anonymous (recipient anonymity) only within a 150 set of potential recipients, his/her recipient anonymity set. Both 151 anonymity sets may be disjoint, be the same, or they may overlap. 152 The anonymity sets may vary over time. Since we assume that the 153 attacker does not forget anything he knows, the anonymity set cannot 154 increase w.r.t. a particular item of interest (IOI). In particular, subjects joining the 155 system at a later stage do not belong to the anonymity set from the 156 point of view of an attacker observing the system in an earlier 157 stage.
(Please note that if the attacker cannot decide whether the 158 joining subjects were present earlier, the anonymity set does not 159 increase either: It just stays the same.) Due to linkability, cf. 160 below, the anonymity set normally can only decrease. 162 Anonymity of a set of subjects within an anonymity set means that all 163 these individual subjects are not identifiable within this anonymity 164 set. In this definition, "set of subjects" is just taken to describe 165 that the anonymity property holds for all elements of the set. 166 Another possible definition would be to consider the anonymity 167 property for the set as a whole. Then a semantically quite different 168 definition could read: Anonymity of a set S of subjects within a 169 larger anonymity set A means that it is not distinguishable whether 170 the subject whose anonymity is at stake (and which clearly is 171 within A) is within S or not. 173 Anonymity in general as well as the anonymity of each particular 174 subject is a concept which is very much context dependent (e.g., on the 175 subject population, attributes, time frame, etc.). In order to 176 quantify anonymity within concrete situations, one would have to 177 describe the system in sufficient detail, which is not always practically 178 possible for large open systems. Besides the quantity of 179 anonymity provided within a particular setting, there is another 180 aspect of anonymity: its robustness. Robustness of anonymity 181 characterizes how stable the quantity of anonymity is against changes 182 in the particular setting, e.g., a stronger attacker or different 183 probability distributions. We might use quality of anonymity as a 184 term comprising both quantity and robustness of anonymity. To keep 185 this text as simple as possible, we will mainly discuss the quantity 186 of anonymity in the following, using the wording "strength of 187 anonymity".
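The quantification deliberately left open above can be illustrated with one common measure from the anonymity literature (not defined or mandated by this document): the entropy of the attacker's probability distribution over the members of the anonymity set. The function name and interface below are a hypothetical sketch, not a normative definition:

```python
import math

def anonymity_quantity(distribution):
    """Entropy (in bits) of the attacker's probability distribution over
    the members of the anonymity set.  It is maximal when the attacker
    considers every subject equally likely to be the one at stake, and
    zero when one subject is identified with certainty."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

# A uniform distribution over 8 possible senders: 3 bits of anonymity.
uniform = {s: 1 / 8 for s in range(8)}
# After some observations the attacker has narrowed the set to 2 subjects.
narrowed = {0: 0.5, 1: 0.5}

print(anonymity_quantity(uniform))   # 3.0
print(anonymity_quantity(narrowed))  # 1.0
```

Under this illustrative measure, the "anonymity delta" discussed below is simply the a-posteriori entropy minus the a-priori entropy (here 1.0 - 3.0 = -2.0 bits), which is never positive for a non-forgetting attacker.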
189 The above definitions of anonymity and the mentioned measures of 190 quantifying anonymity are suitable for characterizing the status of a 191 subject in the world as it is. If we want to describe changes to the 192 anonymity of a subject if the world is changed somewhat, e.g., the 193 subject uses the communication network differently or uses a modified 194 communication network, we need another definition of anonymity 195 capturing the delta. The simplest way to express this delta is by 196 the observations of "the" attacker. 198 Definition: An anonymity delta (regarding a subject's anonymity) 199 from an attacker's perspective specifies the difference between 200 the subject's anonymity taking into account the attacker's 201 observations (i.e., the attacker's a-posteriori knowledge) and the 202 subject's anonymity given the attacker's a-priori knowledge only. 204 Note: 206 In some publications, the a-priori knowledge of the attacker is 207 called "background knowledge" and the a-posteriori knowledge of 208 the attacker is called "new knowledge". 210 As we can quantify anonymity in concrete situations, so we can 211 quantify the anonymity delta. This can be done by just defining: 212 quantity(anonymity delta) := quantity(anonymity_a-posteriori) - 213 quantity(anonymity_a-priori) 215 If anonymity_a-posteriori and anonymity_a-priori are the same, their 216 quantification is the same and therefore the difference of these 217 quantifications is 0. If anonymity can only decrease (which usually 218 is quite a reasonable assumption), the maximum of quantity(anonymity 219 delta) is 0. 221 Since anonymity cannot increase, the anonymity delta can never be 222 positive. Having an anonymity delta of zero means that anonymity 223 stays the same. This means that if the attacker has no a-priori 224 knowledge about the particular subject, having no anonymity delta 225 implies anonymity.
But if the attacker has a-priori knowledge 226 covering all actions of the particular subject, having no anonymity 227 delta does not imply any anonymity at all. If there is no anonymity 228 from the very beginning, even preserving it completely does not yield 229 any anonymity. To be able to express this conveniently, we use 230 wordings like "perfect preservation of a subject's anonymity". It 231 might be worthwhile to generalize "preservation of anonymity of 232 single subjects" to "preservation of anonymity of sets of subjects", 233 in the limiting case all subjects in an anonymity set. An important 234 special case is that the "set of subjects" is the set of subjects 235 having one or several attribute values A in common. Then the meaning 236 of "preservation of anonymity of this set of subjects" is that 237 knowing A does not decrease anonymity. Having a negative anonymity 238 delta means that anonymity is decreased. 240 3. Unlinkability 242 Definition: Unlinkability of two or more items of interest (IOIs, 243 e.g., subjects, messages, actions, ...) from an attacker's 244 perspective means that within the system (comprising these and 245 possibly other items), the attacker cannot sufficiently 246 distinguish whether these IOIs are related or not. 248 Linkability is the negation of unlinkability: 250 Definition: Linkability of two or more items of interest (IOIs, 251 e.g., subjects, messages, actions, ...) from an attacker's 252 perspective means that within the system (comprising these and 253 possibly other items), the attacker can sufficiently distinguish 254 whether these IOIs are related or not. 256 For example, in a scenario with at least two senders, two messages 257 sent by subjects within the same anonymity set are unlinkable for an 258 attacker if, for him, the probability that these two messages are sent 259 by the same sender is sufficiently close to 1/(number of senders).
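The sender example above can be expressed as a simple check that the attacker's probability is sufficiently close to the baseline 1/(number of senders). This is an illustrative sketch only; the function name and the tolerance parameter are our assumptions, not part of the terminology:

```python
def unlinkable(p_same_sender, num_senders, tolerance=0.01):
    """Two messages are unlinkable for an attacker if his probability
    that they were sent by the same sender is sufficiently close to the
    baseline 1/(number of senders), i.e., to what he would assign with
    no observations at all.  'Sufficiently close' is modeled here by a
    fixed tolerance."""
    baseline = 1 / num_senders
    return abs(p_same_sender - baseline) <= tolerance

# With 10 potential senders the baseline is 0.1.
print(unlinkable(0.10, 10))  # True: the attacker has learned nothing
print(unlinkable(0.85, 10))  # False: observations link the messages
```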
261 Definition: An unlinkability delta of two or more items of interest 262 (IOIs, e.g., subjects, messages, actions, ...) from an attacker's 263 perspective specifies the difference between the unlinkability of 264 these IOIs taking into account the attacker's observations and the 265 unlinkability of these IOIs given the attacker's a-priori 266 knowledge only. 268 Since we assume that the attacker does not forget anything, 269 unlinkability cannot increase. Normally, the attacker's knowledge 270 cannot decrease (analogously to Shannon's definition of "perfect 271 secrecy"). An exception to this rule is the scenario where the use 272 of misinformation (inaccurate or erroneous information, provided 273 usually without conscious effort at misleading, deceiving, or 274 persuading one way or another [Wils93]) or disinformation 275 (deliberately false or distorted information given out in order to 276 mislead or deceive [Wils93]) leads to a growing uncertainty of the 277 attacker about which information is correct. A related, but different 278 aspect is that information may become wrong (i.e., outdated) simply 279 because the state of the world changes over time. Since privacy is 280 not only about protecting the current state, but also the past and history 281 of a data subject, we will not make use of this different 282 aspect in the rest of this document. Therefore, the unlinkability 283 delta can never be positive. Having an unlinkability delta of zero 284 means that the probability of those items being related from the 285 attacker's perspective stays exactly the same before (a-priori 286 knowledge) and after the attacker's observations (a-posteriori 287 knowledge of the attacker). If the attacker has no a-priori 288 knowledge about the particular IOIs, having an unlinkability delta of 289 zero implies unlinkability.
But if the attacker has a-priori 290 knowledge covering the relationships of all IOIs, having an 291 unlinkability delta of zero does not imply any unlinkability at all. 292 If there is no unlinkability from the very beginning, even preserving 293 it completely does not yield any unlinkability. To be able to 294 express this conveniently, we use wordings like "perfect preservation 295 of unlinkability w.r.t. specific items" to express that the 296 unlinkability delta is zero. It might be worthwhile to generalize 297 "preservation of unlinkability of two IOIs" to "preservation of 298 unlinkability of sets of IOIs", in the limiting case all IOIs in the 299 system. 301 For example, the unlinkability delta of two messages is sufficiently 302 small (zero) for an attacker if the probability describing his 303 a-posteriori knowledge that these two messages are sent by the same 304 sender and/or received by the same recipient is sufficiently 305 (exactly) the same as the probability imposed by his a-priori 306 knowledge. Please note that unlinkability of two (or more) messages 307 of course may depend on whether their content is protected against 308 the attacker considered. In particular, messages may be unlinkable 309 if we assume that the attacker is not able to get information on the 310 sender or recipient from the message content. Yet with access to 311 their content, even without deep semantic analysis, the attacker can 312 notice certain characteristics which link them together - e.g., 313 similarities in structure, style, use of some words or phrases, 314 consistent appearance of some grammatical errors, etc. In a sense, 315 the content of messages may play the role of a "side channel" in a similar 316 way as in cryptanalysis - i.e., the content of messages may leak some 317 information on their linkability.
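The "side channel" role of message content can be illustrated with a toy similarity measure over word sets. A real attacker would use far more sophisticated stylometry; everything below (names, messages, the Jaccard measure itself) is a hypothetical sketch, not something this terminology prescribes:

```python
def content_linkability(msg_a, msg_b):
    """Toy content 'side channel': the Jaccard similarity of the word
    sets of two messages.  Shared distinctive vocabulary (including a
    recurring misspelling) raises the score and may link the messages
    even when the transport layer reveals nothing about their senders."""
    words_a = set(msg_a.lower().split())
    words_b = set(msg_b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

# The recurring misspelling "teh" links m1 and m2 more strongly than m3.
m1 = "teh meeting is moved to thursday"
m2 = "teh budget review is moved again"
m3 = "please submit your expense report"
print(content_linkability(m1, m2) > content_linkability(m1, m3))  # True
```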
319 Roughly speaking, no unlinkability delta of items means that the 320 ability of the attacker to relate these items does not increase by 321 observing the system or by possibly interacting with it. 323 The definitions of unlinkability, linkability and unlinkability delta 324 do not mention any particular set of IOIs they are restricted to. 325 Therefore, the definitions of unlinkability and unlinkability delta 326 are very strong, since they cover the whole system. We could weaken 327 the definitions by restricting them to part of the system: 328 "Unlinkability of two or more IOIs from an attacker's perspective 329 means that within an unlinkability set of IOIs (comprising these and 330 possibly other items), the attacker cannot sufficiently distinguish 331 whether these IOIs are related or not." 333 4. Anonymity in Terms of Unlinkability 335 To describe anonymity in terms of unlinkability, we have to augment 336 the definitions of anonymity given in Section 2 by making explicit 337 the attributes anonymity relates to. For example, if we choose the 338 attribute "having sent a message" then we can define: 340 A sender s sends a set of messages M anonymously, iff s is anonymous 341 within the set of potential senders of M, the sender anonymity set of 342 M. 344 If the attacker's focus is not on the sender, but on the message, we 345 can define: 347 A set of messages M is sent anonymously, iff M can have been sent by 348 each set of potential senders, i.e., by any set of subjects within 349 the cross product of the sender anonymity sets of each message m 350 within M. 352 When considering the sending and receiving of messages as attributes, the 353 items of interest (IOIs) are "who has sent or received which 354 message"; anonymity of a subject w.r.t. an attribute may then be 355 defined as unlinkability of this subject and this attribute.
In the 356 wording of the definition of unlinkability: a subject s is related to 357 the attribute value "has sent message m" if s has sent message m. s 358 is not related to that attribute value if s has not sent message m. 359 The same holds for receiving. Unlinkability is a sufficient condition of 360 anonymity, but it is not a necessary condition. Thus, failing 361 unlinkability w.r.t. some attribute value(s) does not necessarily 362 eliminate anonymity as defined in Section 2; in specific cases (i.e., 363 depending on the attribute value(s)) even the strength of anonymity 364 may not be affected. 366 Definition: Sender anonymity of a subject means that to this 367 potentially sending subject, each message is unlinkable. 369 Note: 371 The property unlinkability might be more "fine-grained" than 372 anonymity, since there are many more relations where unlinkability 373 might be an issue than just the relation "anonymity" between 374 subjects and IOIs. Therefore, the attacker might get to know 375 information on linkability while not necessarily reducing 376 anonymity of the particular subject - depending on the defined 377 measures. An example might be that the attacker, in spite of 378 being able to link, e.g., by timing, all encrypted messages of a 379 transaction, does not learn who is doing this transaction. 381 Correspondingly, recipient anonymity of a subject means that to this 382 potentially receiving subject, each message is unlinkable. 384 Relationship anonymity of a pair of subjects, the potentially sending 385 subject and the potentially receiving subject, means that to this 386 potentially communicating pair of subjects, each message is 387 unlinkable. In other words, sender and recipient (or each recipient 388 in case of multicast) are unlinkable. As sender anonymity of a 389 message cannot hold against the sender of this message himself nor 390 can recipient anonymity hold against any of the recipients w.r.t.
391 himself, relationship anonymity is considered w.r.t. outsiders only, 392 i.e., attackers being neither the sender nor one of the recipients of 393 the messages under consideration. 395 Thus, relationship anonymity is a weaker property than each of sender 396 anonymity and recipient anonymity: The attacker might know who sends 397 which messages or he might know who receives which messages (and in 398 some cases even who sends which messages and who receives which 399 messages). But as long as for the attacker each message sent and 400 each message received are unlinkable, he cannot link the respective 401 senders to recipients and vice versa, i.e., relationship anonymity 402 holds. The relationship anonymity set can be defined to be the cross 403 product of two potentially distinct sets, the set of potential 404 senders and the set of potential recipients or - if it is possible to 405 exclude some of these pairs - a subset of this cross product. So the 406 relationship anonymity set is the set of all possible sender- 407 recipient(s)-pairs. In case of multicast, the set of potential 408 recipients is the power set of all potential recipients. If we take 409 the perspective of a subject sending (or receiving) a particular 410 message, the relationship anonymity set becomes the set of all 411 potential recipients (senders) of that particular message. So fixing 412 one factor of the cross product gives a recipient anonymity set or a 413 sender anonymity set. 415 Note: 417 The following is an explanation of the statement made in the 418 previous paragraph regarding relationship anonymity: For all 419 attackers it holds that sender anonymity implies relationship 420 anonymity, and recipient anonymity implies relationship anonymity. 421 This is true if anonymity is taken as a binary property: Either it 422 holds or it does not hold. 
If we consider quantities of 423 anonymity, the validity of the implication possibly depends on the 424 particular definitions of how to quantify sender anonymity and 425 recipient anonymity on the one hand, and how to quantify 426 relationship anonymity on the other. There exists at least one 427 attacker model where relationship anonymity implies neither 428 sender anonymity nor recipient anonymity. Consider an attacker 429 who controls neither any senders nor any recipients of messages, 430 but all lines and - maybe - some other stations. If relationship 431 anonymity holds w.r.t. this attacker, one can argue neither that 432 sender anonymity holds against him nor that recipient anonymity 433 holds. The classical MIX-net [Chau81] without dummy traffic is 434 one implementation with just this property: The attacker sees who 435 sends messages when and who receives messages when, but cannot 436 figure out who sends messages to whom. 438 5. Undetectability and Unobservability 440 In contrast to anonymity and unlinkability, where not the IOI, but 441 only its relationship to subjects or other IOIs is protected, for 442 undetectability, the IOIs are protected as such. Undetectability can 443 be regarded as a possible and desirable property of steganographic 444 systems. Therefore it matches the information hiding terminology 445 (see [Pfit96], [ZFKP98]). In contrast, anonymity, dealing with the 446 relationship of discernible IOIs to subjects, does not directly fit 447 into that terminology, but independently represents a different 448 dimension of properties. 450 Definition: Undetectability of an item of interest (IOI) from an 451 attacker's perspective means that the attacker cannot sufficiently 452 distinguish whether it exists or not. 454 If we consider messages as IOIs, this means that messages are not 455 sufficiently discernible from, e.g., "random noise".
A slightly more 456 precise formulation might be that messages are not discernible from 457 no message. A quantification of this property might measure the 458 number of indistinguishable IOIs and/or the probabilities of 459 distinguishing these IOIs. 461 Undetectability is maximal iff whether an IOI exists or not is 462 completely indistinguishable. We call this perfect undetectability. 464 Definition: An undetectability delta of an item of interest (IOI) 465 from an attacker's perspective specifies the difference between 466 the undetectability of the IOI taking into account the attacker's 467 observations and the undetectability of the IOI given the 468 attacker's a-priori knowledge only. 470 The undetectability delta is zero iff whether an IOI exists or not is 471 indistinguishable to exactly the same degree whether or not the attacker 472 takes his observations into account. We call this "perfect 473 preservation of undetectability". 475 Undetectability of an IOI clearly is only possible w.r.t. subjects 476 not involved in the IOI (i.e., neither being the sender nor one 477 of the recipients of a message). Therefore, if we just speak about 478 undetectability without spelling out a set of IOIs, it goes without 479 saying that this is a statement comprising only those IOIs the 480 attacker is not involved in. 482 As the definition of undetectability stands, it has nothing to do 483 with anonymity - it does not mention any relationship between IOIs 484 and subjects. Moreover, for subjects involved in an IOI, 485 undetectability of this IOI is clearly impossible. Therefore, early 486 papers describing new mechanisms for undetectability designed the 487 mechanisms in a way that if a subject necessarily could detect an 488 IOI, the other subject(s) involved in that IOI at least enjoyed 489 anonymity.
The rationale for this is to strive for data minimization: No 490 subject should get to know any (potentially personal) data - except 491 where this is absolutely necessary. This means that 493 1. Subjects not involved in the IOI get to know absolutely 494 nothing. 496 2. Subjects involved in the IOI only get to know the IOI, but 497 not the other subjects involved - the other subjects may stay 498 anonymous. 500 If the attributes "sending a message" or "receiving a message" are the 501 only kinds of attributes considered, 1. and 2. together provide data 502 minimization in this setting in an absolute sense. Undetectability 503 by uninvolved subjects together with anonymity even if IOIs can 504 necessarily be detected by the involved subjects has been called 505 unobservability: 507 Definition: Unobservability of an item of interest (IOI) means 509 * undetectability of the IOI against all subjects uninvolved in 510 it and 512 * anonymity of the subject(s) involved in the IOI even against 513 the other subject(s) involved in that IOI. 515 As we had anonymity sets of subjects with respect to anonymity, we 516 have unobservability sets of subjects with respect to 517 unobservability. Mainly, unobservability deals with IOIs instead of 518 subjects only. Though, like anonymity sets, unobservability sets 519 consist of all subjects who might possibly cause these IOIs, i.e., 520 send and/or receive messages. 522 Sender unobservability then means that it is sufficiently 523 undetectable whether any sender within the unobservability set sends. 524 Sender unobservability is perfect iff it is completely undetectable 525 whether any sender within the unobservability set sends. 527 Recipient unobservability then means that it is sufficiently 528 undetectable whether any recipient within the unobservability set 529 receives. Recipient unobservability is perfect iff it is completely 530 undetectable whether any recipient within the unobservability set 531 receives.
533 Relationship unobservability then means that it is sufficiently 534 undetectable whether anything is sent out of a set of could-be 535 senders to a set of could-be recipients. In other words, it is 536 sufficiently undetectable whether within the relationship 537 unobservability set of all possible sender-recipient(s)-pairs, a 538 message is sent in any relationship. Relationship unobservability is 539 perfect iff it is completely undetectable whether anything is sent 540 out of a set of could-be senders to a set of could-be recipients. 542 All other things being equal, the larger the respective 543 unobservability set is, the stronger the unobservability. 545 Definition: An unobservability delta of an item of interest (IOI) 546 means 548 * undetectability delta of the IOI against all subjects 549 uninvolved in it and 551 * anonymity delta of the subject(s) involved in the IOI even 552 against the other subject(s) involved in that IOI. 554 Since we assume that the attacker does not forget anything, 555 unobservability cannot increase. Therefore, the unobservability 556 delta can never be positive. Having an unobservability delta of zero 557 w.r.t. an IOI means an undetectability delta of zero of the IOI 558 against all subjects uninvolved in the IOI and an anonymity delta of 559 zero against those subjects involved in the IOI. To be able to 560 express this conveniently, we use wordings like "perfect preservation 561 of unobservability" to express that the unobservability delta is 562 zero. 564 6. Pseudonymity 566 Having anonymity of human beings, unlinkability, and maybe 567 unobservability is superb w.r.t. data minimization, but would prevent 568 any useful two-way communication. For many applications, we need 569 appropriate kinds of identifiers: 571 Definition: A pseudonym is an identifier of a subject other than one 572 of the subject's real names. 574 Note: 576 An identifier is defined in [id] as "a lexical token that names 577 entities".
In our setting 'subject' means sender or recipient. 581 The term 'real name' is the antonym of "pseudonym". There may be 582 multiple real names over a lifetime, in particular the legal names, 583 i.e., for a human being the names which appear on the birth 584 certificate or on other official identity documents issued by the 585 State; for a legal person the name under which it operates and 586 which is registered in official registers (e.g., commercial 587 register or register of associations). A human being's real name 588 typically comprises their given name and a family name. In the 589 realm of identifiers, it is tempting to define anonymity as "the 590 attacker cannot sufficiently determine a real name of the 591 subject". But despite the simplicity of this definition, it is 592 severely restricted: It can only deal with subjects which have at 593 least one real name. It presumes that it is clear who is 594 authorized to attach real names to subjects. It fails to work if 595 the relation to real names is irrelevant for the application at 596 hand. Therefore, we stick to the definitions given in Section 2. 597 Note that from a mere technological perspective it cannot always 598 be determined whether an identifier of a subject is a pseudonym or 599 a real name. 601 Additional useful terms are: 603 Definition: The subject which the pseudonym refers to is the holder 604 of the pseudonym. 606 Definition: A subject is pseudonymous if a pseudonym is used as 607 identifier instead of one of its real names. 609 Definition: Pseudonymity is the use of pseudonyms as identifiers. 611 So sender pseudonymity is defined as the sender being pseudonymous, 612 and recipient pseudonymity is defined as the recipient being 613 pseudonymous. 615 To be useful in the context of Internet communication, we introduce 616 the term digital pseudonym: a pseudonym that is 617 suitable for authenticating the holder's IOIs.
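A digital pseudonym is commonly realized as a key suitable for testing the authenticity of the holder's messages. The sketch below is a self-contained toy only: a real system would use an asymmetric signature scheme, with the public verification key acting as the pseudonym, whereas here an HMAC over a holder-held secret merely stands in for signing; all class and variable names are our illustrative assumptions:

```python
import hashlib
import hmac

class DigitalPseudonym:
    """Toy model of a digital pseudonym: an identifier other than a real
    name, under which the holder can authenticate messages (IOIs).
    HMAC is symmetric, so this sketch only illustrates the interface; a
    real digital pseudonym would allow anyone to verify without being
    able to forge."""

    def __init__(self, name, secret):
        self.name = name        # the pseudonym itself, not a real name
        self._secret = secret   # known only to the pseudonym's holder

    def sign(self, message):
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()

    def verify(self, message, tag):
        return hmac.compare_digest(tag, self.sign(message))

nym = DigitalPseudonym("nym:4711", b"holder-only-secret")
tag = nym.sign(b"hello")
print(nym.verify(b"hello", tag))     # True: the holder's IOI authenticates
print(nym.verify(b"tampered", tag))  # False: a forged IOI does not
```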
619 Defining the process of preparing for the use of pseudonyms, e.g., by 620 establishing certain rules for how and under which conditions civil 621 identities of holders of pseudonyms will be disclosed by so-called 622 identity brokers or how to prevent uncovered claims by so-called 623 liability brokers, leads to the more general notion of pseudonymity, 624 as defined below. 626 Note: 628 Identity brokers know, for each pseudonym they act as identity 629 broker for, who its respective holder is. 630 Therefore, identity brokers can be implemented as a special kind 631 of certification authority for pseudonyms. Since anonymity can 632 be described as a particular kind of unlinkability, cf. Section 4, 633 the concept of identity broker can be generalized to linkability 634 broker. A linkability broker is a (trusted) third party that, 635 adhering to agreed rules, enables linking IOIs for those entities 636 being entitled to get to know the linking. 638 Authenticating IOIs relative to pseudonyms is usually not enough to 639 achieve accountability for IOIs. 641 Therefore, in many situations, it might make sense to let identity 642 brokers authenticate digital pseudonyms (i.e., check the civil 643 identity of the holder of the pseudonym and then issue a digitally 644 signed statement that this particular identity broker has proof of 645 the identity of the holder of this digital pseudonym and is willing 646 to divulge that proof under well-defined circumstances). 648 Note: 650 If the holder of the pseudonym is a natural person or a legal 651 person, civil identity has the usual meaning, i.e., the identity 652 attributed to that person by a State (e.g., a natural person being 653 represented by the social security number or the combination of 654 name, date of birth, and location of birth, etc.). If the holder 655 is, e.g., a computer, it remains to be defined what "civil 656 identity" should mean.
It could mean, for example, the exact type and 657 serial number of the computer (or essential components of it) or 658 even include the natural person or legal person responsible for 659 its operation. 661 If the digitally signed statement of a trusted identity broker is 662 checked before entering into a transaction with the holder of that 663 pseudonym, accountability can be realized in spite of anonymity. 665 Whereas anonymity and accountability are the extremes with respect to 666 linkability to subjects, pseudonymity is the entire field between and 667 including these extremes. Thus, pseudonymity comprises all degrees 668 of linkability to a subject. Ongoing use of the same pseudonym 669 allows the holder to establish or consolidate a reputation. 670 Establishing and/or consolidating a reputation under a pseudonym is, 671 of course, insecure if the pseudonym does not make it possible to authenticate 672 messages, i.e., if the pseudonym is not a digital pseudonym. Then, 673 at any moment, another subject might use this pseudonym, possibly 674 invalidating the reputation, both for the holder of the pseudonym and 675 all others having to do with this pseudonym. Some kinds of 676 pseudonyms enable dealing with claims in case of abuse of 677 unlinkability to holders: Firstly, third parties (identity brokers) 678 may have the possibility to reveal the civil identity of the holder 679 in order to provide means for investigation or prosecution. To 680 improve the robustness of anonymity, chains of identity brokers may 681 be used [Chau81]. Secondly, third parties may act as liability 682 brokers of the holder to clear a debt or settle a claim. [BuPf90] 683 presents the particular case of value brokers. 685 There are many properties of pseudonyms which may be of importance in 686 specific application contexts.
In order to describe the properties 687 of pseudonyms with respect to anonymity, we limit our view to two 688 aspects and give some typical examples: 690 The knowledge of the linking may not be constant, but may change over 691 time for some or even all people. Normally, for non-transferable 692 pseudonyms the knowledge of the linking cannot decrease (with the 693 exception of misinformation or disinformation, which may blur the 694 attacker's knowledge). Typical kinds of such pseudonyms are: 696 Public Pseudonym: The linking between a public pseudonym and its 697 holder may be publicly known even from the very beginning. E.g., 698 the linking could be listed in public directories such as the 699 entry of a phone number in combination with its owner. 701 Initially Non-public Pseudonym: The linking between an initially 702 non-public pseudonym and its holder may be known by certain 703 parties, but is not public at least initially. E.g., a bank 704 account where the bank can look up the linking may serve as a 705 non-public pseudonym. For some specific non-public pseudonyms, 706 certification authorities acting as identity brokers could reveal 707 the civil identity of the holder in case of abuse. 709 Initially Unlinked Pseudonym: The linking between an initially 710 unlinked pseudonym and its holder is - at least initially - not 711 known to anybody with the possible exception of the holder 712 himself/herself. Examples of unlinked pseudonyms are 713 (non-public) biometrics like DNA information unless stored in databases 714 including the linking to the holders. 716 Public pseudonyms and initially unlinked pseudonyms can be seen as 717 extremes of the described pseudonym aspect whereas initially 718 non-public pseudonyms characterize the continuum in between. 720 Anonymity is the stronger, the less is known about the linking to a 721 subject. The strength of anonymity decreases with increasing 722 knowledge of the pseudonym linking.
In particular, under the 723 assumption that no gained knowledge on the linking of a pseudonym 724 will be forgotten and that the pseudonym cannot be transferred to 725 other subjects, a public pseudonym can never become an unlinked 726 pseudonym. In each specific case, the strength of anonymity depends 727 on the knowledge of certain parties about the linking relative to the 728 chosen attacker model. 730 If the pseudonym is transferable, the linking to its holder can 731 change. Considering an unobserved transfer of a pseudonym to another 732 subject, a formerly public pseudonym can become non-public again. 734 With respect to the degree of linkability, various kinds of 735 pseudonyms may be distinguished according to the kind of context for 736 their usage: 738 Person pseudonym: A person pseudonym is a substitute for the 739 holder's name which is regarded as a representation of the holder's 740 civil identity. It may be used in many different contexts, e.g., 741 the number of an identity card, the social security number, DNA, a 742 nickname, the pseudonym of an actor, or a mobile phone number. 744 Role pseudonym: The use of role pseudonyms is limited to specific 745 roles, e.g., a customer pseudonym or an Internet account used for 746 many instantiations of the same role "Internet user". The same 747 role pseudonym may be used with different communication partners. 748 Roles might be assigned by other parties, e.g., a company, but 749 they might be chosen by the subject himself/herself as well. 751 Relationship pseudonym: For each communication partner, a different 752 relationship pseudonym is used. The same relationship pseudonym 753 may be used in different roles for communicating with the same 754 partner. Examples are distinct nicknames for each communication 755 partner. In case of group communication, the relationship 756 pseudonyms may be used between more than two partners.
758 Role-relationship pseudonym: For each role and for each 759 communication partner, a different role-relationship pseudonym is 760 used. This means that the communication partner does not 761 necessarily know whether two pseudonyms used in different roles 762 belong to the same holder. On the other hand, two different 763 communication partners who interact with a user in the same role 764 do not know from the pseudonym alone whether it is the same user. 765 As with relationship pseudonyms, in case of group communication, 766 the role-relationship pseudonyms may be used between more than two 767 partners. 769 Transaction pseudonym: Apart from "transaction pseudonym", some 770 authors employ the term "one-time-use pseudonym", taking the naming from 771 "one-time pad". For each transaction, a transaction pseudonym 772 unlinkable to any other transaction pseudonyms and at least 773 initially unlinkable to any other IOI is used, e.g., randomly 774 generated transaction numbers for online banking. Therefore, 775 transaction pseudonyms can be used to realize anonymity that is as 776 strong as possible. In fact, the strongest anonymity is given when there 777 is no identifying information at all, i.e., no information that would 778 allow linking of anonymous entities, thus transforming the 779 anonymous transaction into a pseudonymous one. If the transaction 780 pseudonym is used exactly once, we have the same strength of 781 anonymity as if no pseudonym is used at all. Another possibility 782 to achieve strong anonymity is to prove holdership of the 783 pseudonym or specific attribute values (e.g., with zero-knowledge 784 proofs) without revealing the information about the pseudonym or 785 more detailed attribute values themselves. Then, no identifiable 786 or linkable information is disclosed. 788 Linkability across different contexts due to the use of these 789 pseudonyms can be represented as the lattice that is illustrated in 790 the following diagram, see Figure 1.
The arrows point in the direction 791 of increasing unlinkability, i.e., A -> B stands for "B enables 792 stronger unlinkability than A". Note that "->" is not the same as 793 "=>" of Appendix B, which stands for the implication concerning 794 anonymity and unobservability. 796 linkable 798 +-----------------+ * 799 Person | | * 800 / Pseudonym \ | decreasing | * 801 // \\ | linkability | * 802 / \ | across | * 803 / \-+ | contexts | * 804 +-/ v | | * 805 v Role Relationship | | * 806 Pseudonym Pseudonym | | * 807 -- -- | | * 808 -- --- | | * 809 --- ---- | | * 810 --+ +--- | | * 811 v v | | * 812 Role-Relationship | | |* 813 Pseudonym | | * 814 | | | * 815 | | | * 816 | | | * 817 | | | * 818 | | | * 819 v | | * 820 Transaction | * 821 Pseudonym | v 823 unlinkable 825 Figure 1: Lattice of pseudonyms according to their use across 826 different contexts 828 In general, unlinkability of both role pseudonyms and relationship 829 pseudonyms is stronger than unlinkability of person pseudonyms. The 830 strength of unlinkability increases with the application of 831 role-relationship pseudonyms, the use of which is restricted to both the 832 same role and the same relationship. If a role-relationship 833 pseudonym is used for roles comprising many kinds of activities, the 834 danger arises that after a while, it becomes a person pseudonym in 835 the sense of: "A person pseudonym is a substitute for the holder's 836 name which is regarded as a representation of the holder's civil 837 identity." This holds even more for role pseudonyms and 838 relationship pseudonyms. Ultimate strength of unlinkability is 839 obtained with transaction pseudonyms, provided that no other 840 information, e.g., from the context or from the pseudonym itself, 841 enabling linking is available. 843 Anonymity is the stronger, ...
845 o the less personal data of the pseudonym holder can be linked to 846 the pseudonym; 848 o the less often and the less context-spanning pseudonyms are used 849 and therefore the less data about the holder can be linked; 851 o the more often independently chosen, i.e., from an observer's 852 perspective unlinkable, pseudonyms are used for new actions. 854 The amount of information in linked data can be reduced by different 855 subjects using the same pseudonym (e.g., one after the other when 856 pseudonyms are transferred or simultaneously with specifically 857 created group pseudonyms) or by misinformation or disinformation. 858 The group of pseudonym holders acts as an inner anonymity set within 859 an outer anonymity set that, depending on context information, is 860 potentially even larger. 862 7. Identity Management 864 Identity can be explained as an exclusive perception of life, 865 integration into a social group, and continuity, which is bound to a 866 body and - at least to some degree - shaped by society. This concept 867 of identity distinguishes between "I" and "Me" [Mead34]: "I" is the 868 instance that is accessible only by the individual self, perceived as 869 an instance of liberty and initiative. "Me" is supposed to stand for 870 the social attributes, defining a human identity that is accessible 871 by communications and that is an inner instance of control and 872 consistency (see [ICPP03] for more information). In this 873 terminology, we are interested in identity as communicated to others 874 and seen by them. Therefore, we concentrate on the "Me". 876 Motivated by identity as an exclusive perception of life, i.e., a 877 psychological perspective, but using terms defined from a computer 878 science, i.e., a mathematical perspective (as we did in the sections 879 before), identity can be explained and defined as a property of an 880 entity in terms of the opposite of anonymity and the opposite of 881 unlinkability.
In a positive wording, identity enables one both to be 882 identifiable and to link IOIs because of some continuity of 883 life. Here we have the opposite of anonymity (identifiability) and 884 the opposite of unlinkability (linkability) as positive properties. 885 So the perspective changes: what was the aim of an attacker w.r.t. 887 anonymity now becomes the aim of the subject under consideration, and the 888 attacker's perspective becomes the perspective of the subject. And 889 again, another attacker (attacker2) might be considered working 890 against identifiability and/or linkability. I.e., attacker2 might 891 try to mask different attributes of subjects to provide for some kind 892 of anonymity, or attacker2 might spoof some messages to interfere with 893 the continuity of the subject's life. 895 Definition: An identity is any subset of attribute values of an 896 individual person which sufficiently identifies this individual 897 person within any set of persons. So usually there is no such 898 thing as "the identity", but several of them. 900 Definition: Identity management means managing various identities 901 (usually denoted by pseudonyms) of an individual person, i.e., 902 administration of identity attributes including the development 903 and choice of the partial identity and pseudonym to be (re-)used 904 in a specific context or role. Establishment of reputation is 905 possible when the individual person re-uses partial identities. A 906 prerequisite to choose the appropriate partial identity is to 907 recognize the situation the person is acting in. 909 Of course, attribute values or even attributes themselves may change 910 over time. Therefore, if the attacker has no access to the change 911 history of each particular attribute, whether a particular 912 subset of attribute values of an individual person is an identity 913 may change over time as well.
If the attacker has access to the 914 change history of each particular attribute, any subset forming an 915 identity will form an identity from his perspective irrespective of how 916 attribute values change. Any reasonable attacker will not just try 917 to figure out attribute values per se, but also the point in time (or even 918 the time frame) in which they are valid, since this change history helps 919 a lot in linking and thus inferring further attribute values. 920 Therefore, it may clarify matters to define each "attribute" in such a 921 way that its value cannot become invalid. So instead of the attribute 922 "location" of a particular individual person, take the set of 923 attributes "location at time x". Depending on the inferences you are 924 interested in, refining that set as a list ordered by 925 "location" or "time" may be helpful. 927 Identities may of course comprise particular attribute values like 928 names, identifiers, digital pseudonyms, and addresses - but they 929 don't have to. 931 8. Contributors 933 The authors would like to thank Andreas Pfitzmann for all his work on 934 this document. 936 9. Acknowledgments 938 Before this document was submitted to the IETF, it already had a long 939 history starting in 2000, and a number of people helped to improve the 940 quality of the document with their feedback. A number of persons 941 contributed to the original writeup and they are acknowledged in 942 http://dud.inf.tu-dresden.de/Anon_Terminology.shtml. 944 10. Security Considerations 946 This document introduces terminology for talking about privacy by 947 data minimization. Since privacy protection relies on security 948 mechanisms, this document is also related to security in a broader 949 context. 951 11. IANA Considerations 953 This document does not require actions by IANA. 955 12. References 957 12.1. Normative References 959 12.2. Informative References 961 [BuPf90] Buerk, H. and A.
Pfitzmann, "Value Exchange Systems 962 Enabling Security and Unobservability", Computers & 963 Security, 9/8, 715-721, January 1990. 965 [Chau81] Chaum, D., "Untraceable Electronic Mail, Return Addresses, 966 and Digital Pseudonyms", Communications of the ACM, 24/2, 967 84-88, 1981. 969 [ICPP03] Independent Centre for Privacy Protection & Studio Notarile 970 Genghini, "Identity Management Systems (IMS): 971 Identification and Comparison Study", Study commissioned by 972 the Joint Research Centre Seville, Spain, http:// 973 www.datenschutzzentrum.de/projekte/idmanage/study.htm, 974 September 2003. 976 [Mead34] Mead, G., "Mind, Self and Society", University of Chicago Press, 1934. 978 [Pfit96] Pfitzmann, B., "Information Hiding Terminology -- Results 979 of an informal plenary meeting and additional proposals", 980 Information Hiding, LNCS 1174, Springer, Berlin 1996, 981 347-350, 1996. 983 [ReRu98] Reiter, M. and A. Rubin, "Crowds: Anonymity for Web 984 Transactions", ACM Transactions on Information and System 985 Security, 1(1), 66-92, November 1998. 987 [West67] Westin, A., "Privacy and Freedom", Atheneum, New York, 988 1967. 990 [Wils93] Wilson, K., "The Columbia Guide to Standard American 991 English", Columbia University Press, New York, 1993. 993 [ZFKP98] Zoellner, J., Federrath, H., Klimant, H., Pfitzmann, A., 994 Piotraschke, R., Westfeld, A., Wicke, G., and G. Wolf, 995 "Modeling the security of steganographic systems", 2nd 996 Workshop on Information Hiding, LNCS 1525, Springer, 997 Berlin 1998, 345-355, 1998. 999 [id] "Identifier - Wikipedia", Wikipedia, 2011. 1001 Appendix A.
Overview of Main Definitions and their Opposites 1007 +---------------------------------+---------------------------------+ 1008 | Definition | Negation | 1009 +---------------------------------+---------------------------------+ 1010 | Anonymity of a subject from an | Identifiability of a subject | 1011 | attacker's perspective means | from an attacker's perspective | 1012 | that the attacker cannot | means that the attacker can | 1013 | sufficiently identify the | sufficiently identify the | 1014 | subject within a set of | subject within a set of | 1015 | subjects, the anonymity set. | subjects, the identifiability | 1016 | | set. | 1017 | ------------------------------- | ------------------------------- | 1018 | Unlinkability of two or more | Linkability of two or more | 1019 | items of interest (IOIs, e.g., | items of interest (IOIs, e.g., | 1020 | subjects, messages, actions, | subjects, messages, actions, | 1021 | ...) from an attacker's | ...) from an attacker's | 1022 | perspective means that within | perspective means that within | 1023 | the system (comprising these | the system (comprising these | 1024 | and possibly other items), the | and possibly other items), the | 1025 | attacker cannot sufficiently | attacker can sufficiently | 1026 | distinguish whether these IOIs | distinguish whether these IOIs | 1027 | are related or not. | are related or not. | 1028 | ------------------------------- | ------------------------------- | 1029 | Undetectability of an item of | Detectability of an item of | 1030 | interest (IOI) from an | interest (IOI) from an | 1031 | attacker's perspective means | attacker's perspective means | 1032 | that the attacker cannot | that the attacker can | 1033 | sufficiently distinguish | sufficiently distinguish | 1034 | whether it exists or not. | whether it exists or not.
| 1035 | ------------------------------- | ------------------------------- | 1036 | Unobservability of an item of | Observability of an item of | 1037 | interest (IOI) means | interest (IOI) means "many | 1038 | undetectability of the IOI | possibilities to define the | 1039 | against all subjects uninvolved | semantics". | 1040 | in it and anonymity of the | | 1041 | subject(s) involved in the IOI | | 1042 | even against the other | | 1043 | subject(s) involved in that | | 1044 | IOI. | | 1045 +---------------------------------+---------------------------------+ 1047 Appendix B. Relationships between Terms 1049 With respect to the same attacker, unobservability always reveals 1050 only a subset of the information anonymity reveals. [ReRu98] propose 1051 a continuum for describing the strength of anonymity. They give the 1052 following names: "absolute privacy" (the attacker cannot perceive the presence 1053 of communication, i.e., unobservability) - "beyond suspicion" - 1054 "probable innocence" - "possible innocence" - "exposed" - "provably 1055 exposed" (the attacker can prove the sender, recipient, or their 1056 relationship to others). Although we think that the terms "privacy" 1057 and "innocence" are misleading, the spectrum is quite useful. We 1058 might use the shorthand notation 1060 unobservability => anonymity 1062 for that (=> reads "implies").
Using the same argument and notation, 1063 we have 1065 sender unobservability => sender anonymity 1067 recipient unobservability => recipient anonymity 1069 relationship unobservability => relationship anonymity 1071 As noted above, we have 1073 sender anonymity => relationship anonymity 1075 recipient anonymity => relationship anonymity 1076 sender unobservability => relationship unobservability 1078 recipient unobservability => relationship unobservability 1080 With respect to the same attacker, unobservability always reveals 1081 only a subset of the information undetectability reveals: 1083 unobservability => undetectability 1085 Authors' Addresses 1087 Marit Hansen (editor) 1088 ULD Kiel 1090 EMail: marit.hansen@datenschutzzentrum.de 1092 Hannes Tschofenig 1093 Nokia Siemens Networks 1094 Linnoitustie 6 1095 Espoo 02600 1096 Finland 1098 Phone: +358 (50) 4871445 1099 EMail: Hannes.Tschofenig@gmx.net 1100 URI: http://www.tschofenig.priv.at