idnits 2.17.00 (12 Aug 2021) /tmp/idnits54543/draft-bonatti-generic-antispam-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2022-05-20) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. 
== The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 871 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) == There are 12 instances of lines with non-RFC2606-compliant FQDNs in the document. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (12 May 2004) is 6582 days in the past. Is this intentional? Checking references for intended status: Experimental ---------------------------------------------------------------------------- == Unused Reference: 'REPORT' is defined on line 843, but no explicit reference was found in the text == Unused Reference: 'MIME3' is defined on line 847, but no explicit reference was found in the text == Unused Reference: 'MSGFMT' is defined on line 851, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2298 (ref. 'MDN') (Obsoleted by RFC 3798) -- Obsolete informational reference (is this intentional?): RFC 1892 (ref. 'REPORT') (Obsoleted by RFC 3462) -- Obsolete informational reference (is this intentional?): RFC 2822 (ref. 
'MSGFMT') (Obsoleted by RFC 5322) Summary: 8 errors (**), 0 flaws (~~), 5 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Draft Chris Bonatti 2 Document: draft-bonatti-generic-antispam-00 IECA, Inc. 3 Proposed Category: Experimental 12 May 2004 5 A Generalized Mechanism for Control of 6 Unwanted Application Communications 8 STATUS OF THIS MEMO 10 This document is an Internet-Draft and is in full conformance 11 with all provisions of section 10 of RFC 2026. 13 Internet-Drafts are working documents of the Internet 14 Engineering Task Force (IETF), its areas, and its working 15 groups. Note that other groups may also distribute working 16 documents as Internet-Drafts. 18 Internet-Drafts are draft documents valid for a maximum of six 19 months and may be updated, replaced, or obsoleted by other 20 documents at any time. It is inappropriate to use 21 Internet-Drafts as reference material or to cite them other 22 than as "work in progress". 24 To learn the current status of any Internet-Draft, please 25 check the "1id-abstracts.txt" listing contained in the 26 Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), 27 nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), 28 ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast). 30 ABSTRACT 32 This draft describes a new anti-spam technique that could be 33 applied to e-mail or (in principle) any push-mode application. 34 It includes a discussion of problem background, a description 35 of the proposed technique, and an analysis of the 36 effectiveness of the approach. 38 1. 
INTRODUCTION 40 This draft describes a generic mechanism that can be incorporated into 41 Internet applications to allow application user agents (UAs) to 42 automatically separate legitimate from fraudulent communications for the 43 purpose of facilitating selective filtering mechanisms. This mechanism 44 might, for example, be incorporated in electronic mail (e-mail) UAs or 45 domain gateways to aid in the rejection of spam. If used on a widespread 46 basis, this technique has the potential to dramatically reduce the 47 volume of spam reaching users. Thus deprived of recipients, spammers 48 will shift to other, more profitable means of advertising. This 49 mechanism could likewise be applied to other push-mode applications 50 (e.g., instant messaging, VoIP) to prevent undesirable communications. 52 1.1 Problem Description 54 Unwanted bulk e-mail, or spam, is regarded as the Internet plague of the 55 early 21st century. To date, the e-mail industry has dealt rather poorly 56 with the spam threat. Many halfway measures have been instituted that 57 have been largely ineffective at stemming the tide, but which have 58 caused a lot of pain and angst among users. If you've ever tried 59 traveling and plugging your laptop's e-mail into local service 60 providers, you know something of this pain. Internet mailing lists are 61 now frequently moderated, and have controlled submission because of 62 spam. This adds dramatically to the effort required to maintain a list, 63 and detracts from its functionality. Filters applied either at receiving 64 UAs or at Message Transfer Agents (MTAs) provide some spam relief, but 65 are often unreliable because of the frequent occurrence of invalid or 66 inaccurate header information. Newer filters based on the content of the 67 message offer some promise, but these have resulted in a sort of "arms 68 race" between filter vendors and spammers, with each trying to gain the 69 upper hand. 71 Open relay was never the problem. 
Mailing lists were never the problem. 72 Yet we took steps to hobble both. Filters are never going to be wholly 73 effective because they are trying to analyze ever-changing fraudulent 74 headers and body data. Establishing control over the set of originators 75 from which recipient domains will elect to receive mail is the real 76 problem. Only by addressing this problem directly will we manage to curb 77 spam. 79 Furthermore, the definition of what is spam lies solely with the user. 80 As has been occasionally noted, one man's spam is another man's ham. 81 However, the average user today does not take advantage of even the 82 limited control they might have over the problem, via receive-side 83 filters etc. Most just want the problem to "go away". Another way to 84 express this is that they want service providers to block spam without 85 the added complexity that user control implies. However, most 86 server-side filtering leads to significant rates of false positive and 87 false negative spam detection. So any realistic solution must operate by 88 default without much user input, and yield a reduced rate of false spam 89 detection. 91 Certainly there are many existing techniques that would facilitate 92 giving the recipient better control over the originators from which 93 messages will be received. The Secure Multipurpose Internet Mail 94 Extensions (S/MIME) standards, as well as Open Pretty Good Privacy 95 (OpenPGP), are capable of establishing strong authentication of the 96 actual originator. Other technologies such as Transport Layer Security 97 (TLS) and Internet Protocol Security (IPsec) are capable of providing 98 strong authentication between application layer entities. These could be 99 used to indirectly provide assurance of the originator identity and 100 return path. However, all of these techniques require deployment of 101 strong cryptography and some form of Public Key Infrastructure (PKI). 
102 Years of PKI deployment history suggest that deploying any of these 103 technologies in a ubiquitous enough manner to support anti-spam measures 104 is virtually impossible. A simpler, more self-contained solution is 105 required to achieve the widespread degree of implementation necessary. 107 Several characteristics emerge as requirements for a prospective simple 108 and self-contained spam-blocking mechanism. Such a solution must enable 109 recipients or recipient domains to reliably reject unsolicited messages 110 if they so choose without breaking the existing e-mail infrastructure. 111 The solution may assume that most users have relatively small sets of 112 partners with whom they exchange e-mail on a regular basis. It may 113 assume that most users do not have a frequent requirement to receive 114 unsolicited e-mail from unknown parties. Most importantly, the solution 115 may assume that spammers will not be able to access messages sent to 116 spoofed originator addresses. This represents the Achilles heel of most 117 spam schemes. 119 1.2 Architectural Context 121 The key architectural advantage that the Internet has exhibited over the 122 years is based on the principle of clustering complexity at the edge of 123 the system, while keeping the core infrastructure as simple as possible. 124 This "simple core" principle offers advantages in scalability and 125 interoperability. The principle has proven its value in the deployment 126 of the TCP/IP suite, and the widespread deployment of SMTP. However, most of 127 the anti-spam measures to date have attacked the problem by modifying 128 and complicating the e-mail core system. This leads to challenges in 129 policing the uniform deployment of features, and leads to more complex 130 sets of failure modes. Any effective anti-spam technology must embrace 131 the "simple core" principle by pushing the complexity as far outside the 132 core infrastructure as possible.
134 In the case where anti-spam filtering takes place in the recipient UA, 135 the complexity is as close to the system edge as possible. In this case, 136 the mechanism must be implemented locally to the UA and benefits only a 137 single user. This configuration is shown in the figure below.

139    Originating Domain                        Receiving Domain
140  +------+-+   +-------+                   +-------+   +-+------+
141  |      |X|   |       |                   |       |   |X|      |
142  |  UA  |X|---|  MTA  |\                 /|  MTA  |---|X|  UA  |
143  |      |X|   |       | \               / |       |   |X|      |
144  +------+-+   +-------+  \             /  +-------+   +-+------+
145          .                \           /                .
146          .                 \         /                 .
147          .                  \       /                  .
148      Anti-spam               \     /               Anti-spam
149    Auto-responder    +-------+   +-------+       Auto-responder
150                      |       |   |       |
151                      |  MTA  |---|  MTA  |
152                      |       |   |       |
153                      +-------+   +-------+
154                   Infrastructure MTAs at ISPs
155                           (optional)

157            Figure 1 - Individualized Spam Protection

159 In the case where anti-spam filtering serves an entire recipient domain, 160 the complexity affects the gateway or MTA components of the recipient 161 domain. The mechanism has the capability to provide benefit to the 162 entire receiving domain. However, the originating UAs will need to 163 implement any aspects of the mechanism individually in order to maintain 164 individual level authentication. This configuration is shown in the 165 figure below.

167    Originating Domain                        Receiving Domain
168  +------+-+   +-------+                   +---------+     +----+
169  |      |X|   |       |                   |   MTA   |-----| UA |
170  |  UA  |X|---|  MTA  |\                 /|  +---+  |\_   +----+
171  |      |X|   |       | \               / |  |XXX|  |\ \_
172  +------+-+   +-------+  \             /  +--+---+--+ \  \+----+
173          .                \           /        .       \  | UA |
174          .                 \         /     Anti-spam    \ +----+
175          .                  \       /   Auto-responder   \
176      Anti-spam               \     /                      +----+
177    Auto-responder    +-------+   +-------+                | UA |
178                      |       |   |       |                +----+
179                      |  MTA  |---|  MTA  |                  .
180                      |       |   |       |                  .
181                      +-------+   +-------+               Multiple
182                   Infrastructure MTAs at ISPs             Users
183                           (optional)

185              Figure 2 - Domain Spam Protection

187 1.3 Threat Environment 189 The extent of the threat against any potential anti-spam technology is 190 increasingly high.
Offshore mass e-mailing firms are reputed to be 191 retaining freelance hackers and crackers to enhance the capability of 192 their messages to penetrate filters. These potential attackers are able 193 to bring a high level of analytical sophistication to bear in attacks 194 upon any anti-spam technologies. For example, recent efforts to limit 195 spam through deployment of "Bayesian" smart content filters have been 196 defeated by spammers using a combination of statistical modeling and 197 inert keyword padding. This level of sophistication is fueled by a 198 strong profit motive. Regardless of how many users are offended by spam, 199 a finite number of recipients will respond. Given a sufficiently large 200 recipient list, this is sufficient to justify moderate expenditure on the 201 part of the spammers to preserve their "advertising" revenue stream. 203 Attacks that might be mounted by spammers are multifold. Not only is the 204 spammers' main product a form of attack, but domains and organizations 205 perceived to be acting against the interests of hackers and crackers 206 have been specifically targeted for Internet Protocol (IP) Denial of 207 Service (DoS) and other network layer attacks. In this paper, however, 208 we will constrain our concern to variants of attack via the main threat 209 vector, namely unwanted application communications. The main attack 210 variants within this set include: 212 . Impersonation of an invalid source address - This is the 213 most common class of communications, where the indication of 214 originator is set to some invalid value merely to mask the 215 true originator's identity. 217 . Impersonation of a valid, but unknown source address - 218 This is also a fairly common attack, whereby spammers will 219 randomly employ valid but incorrect values for the originator 220 based on previously harvested addresses. This will enable 221 the originator to pass a validity check in the DNS. 223 .
Impersonation of a valid, and known source address - Same 224 as above, except that the address used is known to the 225 target. This may enable the attack to pass a list-based 226 filter mechanism. 228 . Impersonation of the recipient's own address - This is a 229 blind spot for many filter mechanisms, but such messages are usually 230 readily detectable by the user. 232 . Targeted non-delivery notifications - In this technique 233 the spammer sends a message to an invalid address in a valid 234 domain, and impersonates the true target of the attack as 235 the originator. This results in a non-delivery notification 236 being sent from a valid server to the target, often 237 containing the spammer's original message. 239 . Spam beacons - Many unwanted communications contain 240 executable code or hyperlinks that can alert the attacker to 241 the successful communication, or attempt to gain access to 242 other information. 244 . Malicious code dissemination - Malicious code 245 dissemination is often commingled with other unsolicited 246 communications, compounding the detection problem. 248 . Malformed protocols - Keywords of header fields or HTML 249 tags are sometimes deliberately malformed in order to avoid 250 detection yet elicit a predictable behavior by the receiving 251 system. 253 . Keyword obfuscation - Keywords in the content of the 254 communication are misspelled, thereby evading filter 255 mechanisms. 257 . Inert keyword padding - Inert (e.g., frequently invisible) 258 text includes lists of keywords specifically formulated to 259 make the communication fit the profile of a legitimate 260 communication, thereby defeating statistical analysis 261 filters. 263 While the spammers' revenue stream provides the source of their 264 analytical sophistication, it is also a key weakness that can be turned 265 against them.
Spammers are able to milk a relatively healthy revenue 266 stream from their clients because the cost of their operations is 267 underwritten by the vast infrastructure of the Internet. Internet 268 Service Providers (ISPs) bear a particularly heavy portion of that 269 burden. However, like traditional advertisers, spammers must demonstrate 270 to their clients a certain level of return for their fees. If spam 271 filters can sufficiently reduce the size of the audience for a spammer, 272 the reduction in the spammer's level of return will cause the revenue 273 stream to dry up and make the enterprise unprofitable. This means that 274 even though spam filtering at the edge is not effective in blocking 275 spam traffic in the infrastructure, it should result in a reduction in 276 the level of spam traffic via feedback effects. 278 Despite their sophistication, spammers suffer from a relative scarcity of 279 resources. Their profit margins are entirely based on low overhead costs, 280 so they generally lack the "big iron" necessary to attack cryptographic 281 systems. 283 However, the "Achilles heel" of spam is the desire of the perpetrators 284 to maintain their anonymity. This forces them to spoof the originator 285 address, making deliberate attempts at reverse communication fail. This 286 common denominator to the attacks can be exploited to formulate a 287 solution. 289 2. THE SOLUTION 291 The solution to this situation, from an architectural standpoint, is to 292 embed an access control decision function in the application code to 293 automatically manage whether or not delivery of each communication will 294 be permitted. This aspect of the solution is not unique, but resembles 295 the e-mail filtering capability already embedded in many UAs.
However, 296 as spam's Achilles heel is the spoofing of the originator address and 297 other e-mail headers, we can dramatically improve the effectiveness of 298 this access control function by incorporating a rudimentary handshake 299 process. This handshake process must have the following properties: 301 . It must result in a rate of erroneous denials as 302 close as possible to zero. 304 . It can assume that the spammer does not have access to 305 legitimate users' mailboxes. 307 . It must be sufficiently strong to resist moderate attack 308 from cryptographically savvy programmers. 310 . It must not require a large infrastructure to support its 311 operation. 313 . It must pass the "grandmother test", in that it requires 314 sufficiently little attention that anyone can operate it. 316 2.1 Handshake Procedure 318 The proposed solution offers a simple handshake that satisfies all of 319 these conditions. It will allow recipients (or receiving domains) to 320 require the presence of a hashed token in messages addressed to them. The solution 321 would work like this: 323 1. Unsolicited e-mail from unknown@foo.com arrives at the mail 324 server in domain xyz.abc.com. 326 2. xyz.abc.com blocks delivery of the message, and sends 327 back a specially formatted message (as described in section 328 2.2 below) containing an eXtensible Markup Language (XML) 329 form soliciting the hashed token, and including a randomly 330 generated secret key for this sender and the message-ID of 331 the original unsolicited e-mail message. 333 3. xyz.abc.com retains a copy of the key sent to unknown@foo.com 334 in its Originator Key Database (OKD) indexed under 335 unknown@foo.com. This record is retained for a finite period 336 unless validated. The retention period is defined by the 337 Sender Access Policy (SAP). 339 4. If, and only if, unknown@foo.com proves to be the 340 sender's accurate address, they will receive the XML form 341 containing the key.
If the XML form is not received and 342 processed within the retention period of xyz.abc.com, then 343 the original unsolicited message was properly denied access, 344 and the prospective user unknown@foo.com must begin the 345 process anew. 347 5. The UA software for unknown@foo.com decodes the XML form 348 and stores the key from domain xyz.abc.com in its Recipient 349 Key Database (RKD) indexed under domain xyz.abc.com. 351 6. unknown@foo.com looks up the original unsolicited e-mail 352 according to the message-ID included in the received XML 353 form. If the message has been deleted or cannot be located, 354 then the equivalent of a non-receipt notification should be 355 presented to the user. 357 7. unknown@foo.com employs the newly received key in the RKD 358 to generate a hashed token (as described in section 2.3 359 below) and resends the original unsolicited message amended 360 to include the token in a new RFC-822 heading extension. 362 8. On receipt of this resent message, xyz.abc.com will 363 detect the token extension, look up the key for 364 unknown@foo.com in the OKD, and either grant or deny delivery 365 depending upon whether the token value is correct. 367 9. unknown@foo.com may employ the existing key in its RKD in 368 future messages to generate the hashed token extension. 370 Variations in this procedure are possible to provide additional 371 functionality depending on the requirements of the user. If a 372 prospective recipient requires exclusion of messages generated by 373 automated processes, then step (2) can include part of the key in a 374 distorted image to make automated parsing difficult. This feature consists of 375 existing technology employed by web servers today. If xyz.abc.com 376 receives some critical number of unsolicited messages from 377 unknown@foo.com without the token extension, it could add 378 unknown@foo.com to a local blacklist and cease responding to future 379 requests.
This prevents the OKD from growing without bound in a denial 380 of service (DoS) attack. Another variation would be to allow a facility 381 for unknown@foo.com to send a different XML form to xyz.abc.com at a 382 future time to change their key in the OKD. Alternatively, xyz.abc.com 383 could issue new keys to unknown@foo.com at regular 384 intervals defined by the SAP. 386 A key factor in the procedure is the handling of incoming messages 387 containing the token extension, but not employing the proper key. In 388 this event, step (8) dictates that the delivery of the message would be 389 denied. However, consideration must also be given to issuing a new key 390 to unknown@foo.com. The conditions under which a new key should be 391 issued may be subject to the SAP. 393 2.2 Secret Key Transmission 395 The response message to an unsolicited e-mail message (as outlined in 396 clause 2.1 step 2 above) will consist of a Message Disposition 397 Notification (MDN) prepared in accordance with [MDN]. The MDN will 398 include a new extension field named Identity-Key that will convey the 399 originator address of the unsolicited message, and a new base64 encoded 400 random secret key. The secret key will be stored in the OKD indexed by 401 the originator address. A notional example of such an MDN is shown 402 below.

404  Reporting-UA: somebody@xyz.abc.com
405  Arrival-Date: Fri, 27 Feb 2004 04:00:59 -0500 (EST)

407  Original-Recipient: rfc822;somebody@xyz.abc.com
408  Final-Recipient: rfc822; somebody@xyz.abc.com
409  Disposition: automatic-action/MDN-sent-automatically; denied
410  Original-Message-ID: <200402272301.23456@foo.com>
411  Identity-Key: ; WIwMXUxL2llY2F3ZWIvcHViGaWxlAxNzo1NS

413 Some variation in the MDN fields used is expected to accommodate local 414 implementation needs. Note that the MDN extension field Identity-Key 415 shown above would require formal registration by the Internet Assigned 416 Numbers Authority (IANA).
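As a rough illustration of the key issuance described in this section, the following Python sketch generates a random key, records it in an in-memory stand-in for the OKD, and formats an Identity-Key field. The names issue_key, purge_expired, KEY_SIZE_BYTES and RESPONSE_DELAY are hypothetical, as are the 128-byte key size and the two-day retention period. The field layout (originator address in angle brackets, a semicolon, then the base64 key) is likewise an assumption, since the draft gives only a notional example.

```python
import base64
import secrets
from datetime import datetime, timedelta, timezone

# Hypothetical SAP settings; the draft leaves both values to local policy.
KEY_SIZE_BYTES = 128
RESPONSE_DELAY = timedelta(days=2)

# In-memory stand-in for the OKD: originator address -> record.
okd = {}


def issue_key(originator, now):
    """Create a secret key for an unknown originator and return the
    Identity-Key field that would be carried in the MDN."""
    key_b64 = base64.b64encode(secrets.token_bytes(KEY_SIZE_BYTES)).decode("ascii")
    okd[originator] = {"key": key_b64, "respond_by": now + RESPONSE_DELAY}
    # Assumed layout: "<address>; base64-key".
    return "Identity-Key: <{}>; {}".format(originator, key_b64)


def purge_expired(now):
    """Drop records whose response deadline has passed without validation."""
    expired = [addr for addr, rec in okd.items()
               if rec["respond_by"] is not None and rec["respond_by"] < now]
    for addr in expired:
        del okd[addr]
    return expired


now = datetime(2004, 2, 27, 9, 0, 59, tzinfo=timezone.utc)
field = issue_key("unknown@foo.com", now)
```

In this sketch a record whose respond_by value has been set to None is never purged, which is one way to model a key that has been validated.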
418 The MDN response shall be generated automatically only if 419 indicated in the SAP. In accordance with [MDN] clause 2.1, the MDN shall not be generated automatically if there are 420 multiple Return-Path headers, the Return-Path header is absent, or the 421 Return-Path header differs from the address in the 422 Disposition-Notification-To header. 424 The size of the key to be issued by the MDN is somewhat arbitrary, since 425 it is not used for any cryptographic operation per se. The key only 426 provides a secret value for use in later proving the identity of an 427 originator. The key size should be established by the user as part of 428 the SAP. 430 The UA may choose to issue new keys to existing originators 431 represented in the OKD on a periodic basis. Whether this occurs and how 432 often should be defined by the SAP. 434 MDNs containing the Identity-Key extension should not be routinely 435 presented to users of UAs that support the extension. This MDN is 436 intended to facilitate key transfer and signal that this spam control 437 technique is in use, and offers few if any benefits to the user. For UAs 438 that do not support the extension, formatting the key transfer as an MDN 439 has the benefit that refusal of the message by the spam filter can be 440 properly indicated. Visibility of these MDNs in properly cooperating 441 systems may cause user confusion in conflict with the "grandmother 442 test", because the message in question is to be automatically 443 retransmitted. 445 2.3 Token 447 Future messages from unknown@foo.com will be granted access to pass 448 through the receive filter at xyz.abc.com provided that they contain an 449 instance of the Identity-Token heading extension that matches their 450 address and key. The Identity-Token extension will consist of the 451 recipient address, a timestamp to provide a measure of liveness, and a 452 hash generated over these two values and the originator's secret key.
453 Note that a random number might also need to be included in this value 454 to provide sufficient entropy depending on the size of the key used. The 455 hash will employ the Secure Hash Algorithm (SHA-1) defined in [SHA-1]. 456 The originator will locate the proper key by searching for the recipient 457 address in the RKD. A notional example of an Identity-Token extension is 458 shown below.

460  Identity-Token: <somebody@xyz.abc.com>; Fri, 27 Feb 2004 04:00:59
461     -0500 (EST); MjAwMS4wOS4yNSAxCIEU6XFxERUwwNS1GaWxlcy5

463 Note that the MIME header shown above will require formal registration 464 by IANA. 466 Canonicalization of the hashed information shall consist of 467 encoding exactly the characters presented in the recipient address 468 portion and the date fields delimited by exactly one space character 469 (i.e., ASCII 32 decimal, 20 hex). No line terminators (i.e., carriage 470 return or line feed) or other whitespace shall be included in the hash. 471 The bytes to be hashed based on the above example would consist of the 472 following. The "*" characters indicate 128 bytes of the secret key.

474         ----------------------- BYTE OFFSET ---------------------------
475                             1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
476         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
477         ---------------------------------------------------------------
478  (+0)   < s o m e b o d y @ x y z . a b c . c o m > ;   F r i ,   2 7
479  (+32)  F e b   2 0 0 4   0 4 : 0 0 : 5 9   - 0 5 0 0   ( E S T ) ;   *
480  (+64)  * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
481  (+96)  * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
482  (+128) * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
483  (+160) * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
484  (+192)
485         ---------------------------------------------------------------

487         Figure 3 - Canonicalization of the Hash Contents

489 The Identity-Token extension will be multi-valued.
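The canonicalization in Figure 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names are hypothetical, the key is hashed as raw bytes, and the SHA-1 digest is base64 encoded for transport. The draft does not specify either of the last two points.

```python
import base64
import hashlib
import hmac


def canonical_bytes(recipient, timestamp, key):
    """Return '<recipient>; <timestamp>; ' followed by the raw key bytes,
    mirroring the byte layout in Figure 3."""
    prefix = "<{}>; {}; ".format(recipient, timestamp)
    return prefix.encode("ascii") + key


def make_token(recipient, timestamp, key):
    """SHA-1 over the canonical bytes; base64 output is an assumption."""
    digest = hashlib.sha1(canonical_bytes(recipient, timestamp, key)).digest()
    return base64.b64encode(digest).decode("ascii")


def identity_token_field(recipient, timestamp, key):
    """Format one Identity-Token field for a single recipient."""
    return "Identity-Token: <{}>; {}; {}".format(
        recipient, timestamp, make_token(recipient, timestamp, key))


def verify_token(recipient, timestamp, presented, key):
    """Recompute the hash and compare it to the presented value in constant time."""
    return hmac.compare_digest(make_token(recipient, timestamp, key), presented)


key = b"*" * 128  # placeholder standing in for a 128-byte secret key
stamp = "Fri, 27 Feb 2004 04:00:59 -0500 (EST)"
data = canonical_bytes("somebody@xyz.abc.com", stamp, key)
token = make_token("somebody@xyz.abc.com", stamp, key)
```

For the example values the canonical input is 191 bytes long with the key starting at offset 63, which agrees with the offsets in Figure 3.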
In the event that a 490 message is being sent to multiple recipients that require this 491 spam-control mechanism, an instance of the Identity-Token extension 492 should be included for each such recipient. 494 When received, at least one instance of an Identity-Token extension 495 should validate correctly for the purported message originator as a 496 condition for the message to be displayed to the user. The recipient UA 497 should search the tokens present in the message for its own address. If 498 a token with the proper address is not found, the message should be 499 treated as if no token were present. If more than one token contains the 500 proper address, the recipient UA shall process exactly one such token. 501 Selection of which token to process shall be a local matter. To validate 502 the token, the recipient UA shall regenerate the hash as per the 503 canonicalization described above using the address and time stamp as 504 provided in the extension and the key associated with the originator 505 that has been retrieved from the OKD. Once generated, the hash should be 506 compared to the hash presented in the Identity-Token extension. If the 507 two hashes match exactly, the token shall be considered validated and 508 the message shall be displayed to the user. If the hashes fail to match, 509 the token fails validation. In the event that the token fails to validate, the 510 message shall not be presented to the user. The SAP shall specify 511 whether or not a new key shall be sent to an originator in the 512 case of a failed token validation. 514 The Identity-Token extension should not be routinely presented to the 515 recipient user. The token serves solely to facilitate automated access 516 control, and offers few if any benefits to the user. Visibility of these 517 tokens may cause user confusion in conflict with the "grandmother test". 519 2.4 Databases 521 The OKD is the foundation in the UA for recognizing previously validated 522 originators.
The OKD consists of a simple database indexing keys 523 previously issued in MDNs according to the originator addresses to which 524 they were issued. When a message is delivered to a UA, the purported 525 originator address is looked up in the OKD. If the address is not found, the 526 message is blocked and an MDN is sent as per clause 2.2. This will 527 result in a new key being added to the OKD with the response field 528 marked with the date by which a response from that originator is 529 required (according to the SAP). If the originator address is contained 530 in the OKD, then the associated key is used to validate any received 531 token. Any time a particular key entry is used, the response field is 532 cleared. The OKD may be purged periodically to remove any records for 533 which the response date has passed. An illustration of the OKD structure 534 is shown below.

536  Originator Address   Originator Secret Key   Response
537  ------------------   ---------------------   --------

539  dingbat@foo.com      b21wYXEtUFBDMjAwMi1Vc
540                       GdyYWRlLUNvdXBvbi5wZG

542  unknown@foo.com      cy56aXANCjIwMDEuMDkuM   29-Feb
543                       jUgMTc6NTUgQiBFOlxcRE
544          .                      .                 .
545          .                      .                 .
546          .                      .                 .
547  bill@another.net     Y2EuY29tL3RlbXA0NTYgM
548                       DExMjEzLUNvbXBhcS1QUE

550  ted@nameless.edu     cGFxLVBQQzIwMDItVXBnc   3-Mar
551                       mFkZS1SZWNlaXB0LnBkZi

553 The RKD is the foundation in the UA for identifying the proper key(s) to 554 use for token generation on a given message. The RKD is similar to the 555 OKD, but is indexed by the address of the prospective recipient from which a 556 key was received in a prior MDN. When a user is sending a message, they 557 will look up each prospective recipient in the RKD. For each recipient 558 found in the RKD, a separate Identity-Token extension will be generated 559 and added to the message. If a recipient is not in the RKD, it may 560 indicate that they have not yet provided a key, or that they do not 561 support this mechanism for spam control.
An illustration of the RKD structure is shown below.

Recipient Address    Recipient Secret Key
-----------------    --------------------

mary@scots.edu       b21wYXEtUFBDMjAwMi1Vc
                     GdyYWRlLUNvdXBvbi5wZG

xyz.abc.com          cy56aXANCjIwMDEuMDkuM
                     jUgMTc6NTUgQiBFOlxcRE
       .                       .
       .                       .
       .                       .
t1431@mamma.net      Y2EuY29tL3RlbXA0NTYgM
                     DExMjEzLUNvbXBhcS1QUE

dave@umich.edu       cGFxLVBQQzIwMDItVXBnc
                     mFkZS1SZWNlaXB0LnBkZi

Both the OKD and RKD might reasonably be implemented as part of a local
address book or directory service. While the content of the databases is
sensitive, the degree of protection that must be afforded to the
database is relatively limited. It is only necessary to prevent
disclosure of the key values to prospective spammers. In many
circumstances, localizing the data to the user's home domain or account
is sufficient protection. Since the key values in the RKD are assigned
on a per-user basis, the user-association of the information must be
preserved. The same is true for the OKD, except that the OKD may be used
to support spam filtering at the domain level.

2.5 Sender Access Policy

The SAP defines a number of operational characteristics that affect
whether the sender's message will be granted permission to be delivered.
The SAP is entirely under the control of the receiving UA or, in the
case of filtering for an entire domain, the receiving MTA. This puts
the receiving party in control of what sort of messages are acceptable.
Characteristics that would be defined by the SAP include the following.

   .  Response Delay - The period of time that a new originator
      key will be retained in the OKD before a response is
      required

   .  Originator Rekey - An indication whether an originator may
      submit an XML form to change their own key in the OKD

   .  Key Size - Defines the size in bytes of the key to be
      issued for new originators

   .  Rekey Period - Defines the period of time after which new
      keys will be issued to prospective originators

   .  Automation Exclusion - Defines whether or not to exclude
      messages generated by automated processes

   .  Blacklist Exclusion Count - Indicates how many unsolicited
      messages without the token extension will be tolerated from
      a given originator, after which point that originator will
      be added to a local blacklist and the UA will cease to
      respond to future requests from that address

   .  Blacklist Purge Period - Indicates how long entries should
      remain in the local blacklist

   .  Whitelist Users - Allows the user to manually configure
      the system to admit messages appearing to be from certain
      users without employing the challenge and response
      mechanism. This will allow for interoperability with users
      whose UAs do not support the mechanism.

   .  Reissue on Bad Key - Indicates whether a new key should be
      sent in response to an incoming message containing the
      token extension, but not employing the proper key

   .  Automatic Response - Indicates whether the MDN containing
      the originator's key shall be generated automatically, or
      whether user confirmation shall be sought

For each of these operational characteristics, the recipient user shall
be given control. However, in the interest of passing the "grandmother
test" it is necessary to establish reasonable default settings for each
of these. Customization of these parameters might be hidden behind an
"advanced options" button in the SAP controls. The default values should
provide reasonable performance in spam rejection without causing
operational problems. The following default settings are proposed.

SAP Parameter                Default Value
-------------                -------------

Response Delay               7 days

Originator Rekey             No

Key Size                     128 bytes (1024 bits)

Rekey Period                 12 months

Automation Exclusion         No

Blacklist Exclusion Count    Yes

Blacklist Purge Period       1 month

Whitelist Users              (empty)

Reissue on Bad Key           Yes

Automatic Response           Yes

3. ANALYSIS OF APPROACH

In order for it to be considered worthwhile to conduct experiments with
the candidate protocol extensions, a certain amount of analysis is
required to provide confidence that they will perform as expected and
stand up to attack in the proposed operational environment. This section
identifies the operational characteristics that are advantageous, those
that are disadvantageous, and possible weaknesses that could be
exploited by spammers or their hacker allies.

3.1 Operational Advantages

This proposed solution should reject spam from non-existent addresses
because the MDNs containing the key will not reach the spammer. It
should reject mail from valid but usurped addresses because the usurped
user won't respond to the XML MDN. The solution has the capacity to
reject mail from automated systems if coupled with other existing
technologies for ensuring human users. It also has the potential to
dramatically reduce the level of false positive spam detections because
known communication partners will employ the correct key in preparing
their messages.

The proposed mechanism incorporates the concept of a flexible SAP under
recipient user (or organization) control. This is important as it
preserves the principle of keeping complexity at the edge. The default
policy recommended should address the needs of a broad user community.

The recipient portion of the anti-spam system can be implemented
entirely on the server side. This allows the implementation to provide
anti-spam protection to an entire organization or site.
It also may
facilitate roll-out of the mechanism in heterogeneous domains employing
a variety of different e-mail UAs. The originator portion of the system
could also be implemented entirely on the server side to facilitate
roll-out, but this configuration is not recommended (see clause 3.2).

The solution can operate relatively autonomously according to the
default SAP to provide anti-spam protection even to relatively
unsophisticated users. This is important not only because it helps to
satisfy the "grandmother test" condition, but because it allows the
solution to block spam for a wide range of users who cannot (or will
not) use less turnkey technology. Widespread blocking of spam is the key
to reducing the level of spam by undercutting the spammers' economic
model.

The cryptography employed in the proposed solution is relatively simple,
so that implementation is not likely to be a barrier to the average
implementer. Similarly, the RKD could be easily integrated into most
existing address book implementations, something already quite common in
e-mail UAs.

The proposed solution requires no additional infrastructure. This
maintains the principle of a simple core, and thereby allows incremental
deployment, good scalability, and ultimately improved interoperability.

The proposed mechanism can help to achieve a much lower rate of false
rejections in spam filtering. This can have very positive impacts on
user acceptance, especially in business environments where reliable
e-mail might be considered crucial. It also contributes to satisfying
the "grandmother test".

3.2 Operational Disadvantages

E-mail UAs that employ filtering based on this proposed mechanism will
not interoperate well with e-mail UAs that do not support the proposed
extensions.
The ability to configure a whitelist in the SAP will
mitigate this to some extent, but maintaining a large whitelist has
disadvantages. First, each address in the whitelist represents an
address that might be exploited by a spammer. Second, management of a
large whitelist may be overly onerous for the user.

The sizes of both the OKD and RKD scale linearly in proportion to the
number of parties with which the user communicates. This may create a
scalability issue for users who communicate with large numbers of other
users. However, since most users have relatively small sets of partners
with whom they exchange e-mail, this may not be a serious problem. Also,
"power users" may well have more capable platforms on which to run.

Repeated attempts to penetrate the filter mechanism can result in rapid
expansion of the OKD. Users who receive large volumes of spam might
experience OKD scalability issues. This can be managed to some extent by
shortening the response delay in the SAP. However, this comes at the
expense of requiring a faster response from legitimate users.

Spammers' attempts to impersonate a known communication partner might
result in that partner being automatically blacklisted. If this occurs,
then future communications from that partner would be blocked,
constituting false positive spam detections.

Implementation of the originator portion of the anti-spam system can
introduce weaknesses to the system. If the RKD and token generation are
handled by a proxy agent, such as a local mail server, then all a
spammer in that local domain must do is impersonate a different local
user in order to employ that user's set of keys. Since the feasibility
of identity spoofing with SMTP has been amply demonstrated, this seems a
likely attack to anticipate.

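As a purely illustrative sketch (not part of the proposed extensions),
the trade-off between the Response Delay and OKD growth discussed in
this clause might be modeled as follows. The names Okd, OkdEntry and
simulate, the in-memory dictionary, the lower-casing of addresses, and
the use of Python's secrets module to generate keys are all assumptions
made for this example. The draft does not prescribe a storage format or
a key-generation method.

```python
import secrets
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional


@dataclass
class OkdEntry:
    key: bytes                   # secret key issued to this originator
    respond_by: Optional[date]   # response field; None once the key is used


class Okd:
    """Hypothetical in-memory Originator Key Database."""

    def __init__(self, response_delay_days: int = 7, key_size: int = 128):
        self.response_delay = timedelta(days=response_delay_days)
        self.key_size = key_size
        self.entries: Dict[str, OkdEntry] = {}

    def issue_key(self, originator: str, today: date) -> bytes:
        """Add a new key and mark the date by which a response is required."""
        key = secrets.token_bytes(self.key_size)
        self.entries[originator.lower()] = OkdEntry(
            key, today + self.response_delay)
        return key

    def mark_used(self, originator: str) -> None:
        """Clear the response field once the key entry has been used."""
        self.entries[originator.lower()].respond_by = None

    def purge(self, today: date) -> int:
        """Remove records whose response date has passed; return the count."""
        expired = [addr for addr, entry in self.entries.items()
                   if entry.respond_by is not None
                   and entry.respond_by < today]
        for addr in expired:
            del self.entries[addr]
        return len(expired)


def simulate(delay_days: int, per_day: int = 100, days: int = 30) -> int:
    """Issue keys to per_day forged originators each day, purging daily.

    Returns the OKD size after the final purge.
    """
    okd = Okd(response_delay_days=delay_days)
    start = date(2004, 5, 1)
    for n in range(days):
        today = start + timedelta(days=n)
        for i in range(per_day):
            okd.issue_key(f"forged{n}-{i}@example.com", today)
        okd.purge(today)
    return len(okd.entries)


if __name__ == "__main__":
    for delay in (7, 2):
        print(delay, "day delay ->", simulate(delay), "entries retained")
```

Under these assumptions, a shorter Response Delay leaves fewer
unanswered entries in the OKD after each purge, at the cost of requiring
a faster response from legitimate originators.
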
3.3 Possible Weaknesses and Vulnerabilities

The spammer has the option of trying to attack this mechanism by sending
a seemingly legitimate message with an originator or reply-to address
that corresponds to a mailbox that is accessible to them. In this event,
the spammer would automatically receive a key that would allow them to
get messages through to the target. However, the key would only function
for messages seeming to come from that address, so subsequent attempts
to use that address to spam the target could be dealt with by adding the
address to the blacklist. Also, since mailbox access is required to
obtain the key in the first place, it is perhaps possible to identify
the spammer via their service provider.

The spammer might intercept or otherwise observe the MDN returned to a
legitimate user, thereby learning their key and enabling subsequent
spamming of the target. In this event, the spammer could impersonate
that user and successfully spam the target user. However, since the key
is pair-wise between those two users, the spammer would need to repeat
this process for every target. Assuming that the spammer could gain
access to working address/key combinations for every target user, the
odds of the address being identical for any of the targets are poor. So
the spammer would need to vary the spoofed originator on a per-target
basis, and maintain a very large RKD. Of course, none of this would
prevent the target users from blacklisting the address in question,
rendering the whole exercise for naught.

The spammer might impersonate a legitimate user and generate tokens for
spam messages to conduct a brute force attack on the key. This is
impractical because the repeated attempts would stand out, allowing the
target filter to add the purported address to the blacklist.
Furthermore, since the spammer could not be assured of a response when
the correct key was used, the odds of the correct key going undetected
are high.

Excessive use of the whitelist feature in the SAP can introduce
weaknesses in the spam protection capabilities of the system. Each
address in the whitelist is vulnerable to impersonation by spammers. Of
course, since the spammer has no way of knowing what addresses the
target has in their whitelist, exploiting this weakness is somewhat
problematic.

Spammers might bombard the target user with large numbers of messages
that do not contain the proposed token in an attempted DoS attack. While
this may result in blacklisting, the main protection from this attack is
the lack of profit motive on the part of the spammers. In other words,
this attack falls outside the scope of what we term spam.

4. CONCLUSION

This mechanism would give recipient users or domains a powerful tool to
reject mail from non-existent addresses, valid but usurped addresses,
and messages from automated systems. The approach supports commonly
desired policy constraints. The recipient half of the system can be
implemented entirely on the server side. The cryptography used does not
have to be extreme. The approach appears simple, yet offers many
advantages.

A program of simulation is recommended, followed by a limited
implementation as a plug-in for one or more e-mail UAs. If testing shows
this mechanism to be effective in blocking unwanted e-mail communication
and achieving a low rate of false rejections, then a derivative of this
technique should be considered for Standards Track. Use of a similar
technique for applications other than e-mail (e.g., instant messaging,
chat) should also be explored.

5. REFERENCES

5.1 Normative References

[MDN]     RFC 2298: An Extensible Message Format for Message
          Disposition Notifications, R. Fajman, March 1998.

[SHA-1]   FIPS PUB 180-1: Secure Hash Standard, National Institute of
          Standards and Technology, 17 April 1995.

5.2 Informative References

[REPORT]  RFC 1892: The Multipart/Report Content Type for the
          Reporting of Mail System Administrative Messages, G.
          Vaudreuil, January 1996.

[MIME3]   RFC 2047: MIME (Multipurpose Internet Mail Extensions) Part
          Three: Message Header Extensions for Non-ASCII Text, K.
          Moore, November 1996.

[MSGFMT]  RFC 2822: Internet Message Format, P. Resnick, April 2001.

6. AUTHOR'S ADDRESS

Christopher Bonatti
IECA, Inc.
15309 Turkey Foot Road
Darnestown, MD 20878-3640
BonattiC@ieca.com