Diameter Maintenance and Extensions (DIME)              J. Korhonen, Ed.
Internet-Draft                                                  Broadcom
Intended status: Standards Track                         S. Donovan, Ed.
Expires: July 12, 2015                                       B. Campbell
                                                                  Oracle
                                                               L. Morand
                                                             Orange Labs
                                                         January 8, 2015

                Diameter Overload Indication Conveyance
                      draft-ietf-dime-ovli-06.txt

Abstract

This specification defines a base solution for Diameter overload control, referred to as Diameter Overload Indication Conveyance (DOIC).

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on July 12, 2015.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology and Abbreviations
   3.  Conventions Used in This Document
   4.  Solution Overview
     4.1.  Piggybacking
     4.2.  DOIC Capability Announcement
     4.3.  DOIC Overload Condition Reporting
     4.4.  DOIC Extensibility
     4.5.  Simplified Example Architecture
   5.  Solution Procedures
     5.1.  Capability Announcement
       5.1.1.  Reacting Node Behavior
       5.1.2.  Reporting Node Behavior
       5.1.3.  Agent Behavior
     5.2.  Overload Report Processing
       5.2.1.  Overload Control State
       5.2.2.  Reacting Node Behavior
       5.2.3.  Reporting Node Behavior
     5.3.  Protocol Extensibility
   6.  Loss Algorithm
     6.1.  Overview
     6.2.  Reporting Node Behavior
     6.3.  Reacting Node Behavior
   7.  Attribute Value Pairs
     7.1.  OC-Supported-Features AVP
     7.2.  OC-Feature-Vector AVP
     7.3.  OC-OLR AVP
     7.4.  OC-Sequence-Number AVP
     7.5.  OC-Validity-Duration AVP
     7.6.  OC-Report-Type AVP
     7.7.  OC-Reduction-Percentage AVP
     7.8.  Attribute Value Pair flag rules
   8.  Error Response Codes
   9.  IANA Considerations
     9.1.  AVP codes
     9.2.  New registries
   10. Security Considerations
     10.1.  Potential Threat Modes
     10.2.  Denial of Service Attacks
     10.3.  Non-Compliant Nodes
     10.4.  End-to-End Security Issues
   11. Contributors
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Issues left for future specifications
     A.1.  Additional traffic abatement algorithms
     A.2.  Agent Overload
     A.3.  New Error Diagnostic AVP
   Appendix B.  Deployment Considerations
   Appendix C.  Requirements Conformance Analysis
     C.1.  Deferred Requirements
     C.2.  Detection of non-supporting Intermediaries
     C.3.  Implicit Application Indication
     C.4.  Stateless Operation
     C.5.  No New Vulnerabilities
     C.6.  Detailed Requirements
       C.6.1.  General
       C.6.2.  Performance
       C.6.3.  Heterogeneous Support for Solution
       C.6.4.  Granular Control
       C.6.5.  Priority and Policy
       C.6.6.  Security
       C.6.7.  Flexibility and Extensibility
   Appendix D.  Considerations for Applications Integrating the DOIC Solution
     D.1.  Application Classification
     D.2.  Application Type Overload Implications
     D.3.  Request Transaction Classification
     D.4.  Request Type Overload Implications
   Authors' Addresses

1.  Introduction

This specification defines a base solution for Diameter overload control, referred to as Diameter Overload Indication Conveyance (DOIC), based on the requirements identified in [RFC7068].

This specification addresses Diameter overload control between Diameter nodes that support the DOIC solution. The solution, which is designed to apply to existing and future Diameter applications, requires no changes to the Diameter base protocol [RFC6733] and is deployable in environments where some Diameter nodes do not implement the Diameter overload control solution defined in this specification.

A new application specification can incorporate the overload control mechanism specified in this document by making it mandatory to implement for the application and referencing this specification normatively. It is the responsibility of the Diameter application designers to define how overload control mechanisms work for that application.

Note that the overload control solution defined in this specification does not address all the requirements listed in [RFC7068]. A number of overload control related features are left for future specifications. See Appendix A for a list of extensions that are currently being considered. See Appendix C for an analysis of conformance to the requirements specified in [RFC7068].

2.  Terminology and Abbreviations

Abatement

   Reaction to receipt of an overload report resulting in a reduction in traffic sent to the reporting node. Abatement actions include diversion and throttling.

Abatement Algorithm

   An extensible mechanism requested by reporting nodes and used by reacting nodes to reduce the amount of traffic sent during an occurrence of overload control.

Diversion

   An overload abatement mechanism, where the reacting node selects alternate destinations or paths for requests.

Host-Routed Requests

   Requests that a reacting node knows will be served by a particular host, either due to the presence of a Destination-Host AVP, or by some other local knowledge on the part of the reacting node.

Overload Control State (OCS)

   Internal state maintained by a reporting or reacting node describing occurrences of overload control.

Overload Report (OLR)

   Overload control information for a particular overload occurrence sent by a reporting node.

Reacting Node

   A Diameter node that acts upon an overload report.

Realm-Routed Requests

   Requests for which a reacting node does not know which host will service the request.

Reporting Node

   A Diameter node that generates an overload report. (This may or may not be the overloaded node.)

Throttling

   A mechanism for overload abatement that limits the number of requests sent by the DOIC reacting node. Throttling can include a Diameter Client choosing to not send requests, or a Diameter Agent or Server rejecting requests with appropriate error responses. In both cases the result of the throttling is a permanent rejection of the transaction.

3.  Conventions Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

4.  Solution Overview

The Diameter Overload Indication Conveyance (DOIC) solution allows Diameter nodes to request other Diameter nodes to perform overload abatement actions, that is, actions to reduce the load offered to the overloaded node or realm.

A Diameter node that supports DOIC is known as a "DOIC node". Any Diameter node can act as a DOIC node, including Diameter Clients, Diameter Servers, and Diameter Agents. DOIC nodes are further divided into "Reporting Nodes" and "Reacting Nodes." A reporting node requests overload abatement by sending Overload Reports (OLR).

A reacting node acts upon OLRs, and performs whatever actions are needed to fulfill the abatement requests included in the OLRs. A reporting node may report overload on its own behalf, or on behalf of other nodes. Likewise, a reacting node may perform overload abatement on its own behalf, or on behalf of other nodes.

A Diameter node's role as a DOIC node is independent of its Diameter role. For example, Diameter Agents may act as DOIC nodes, even though they are not endpoints in the Diameter sense. Since Diameter enables bi-directional applications, where Diameter Servers can send requests towards Diameter Clients, a given Diameter node can simultaneously act as both a reporting node and a reacting node.

Likewise, a Diameter Agent may act as a reacting node from the perspective of upstream nodes, and a reporting node from the perspective of downstream nodes.

DOIC nodes do not generate new messages to carry DOIC related information. Rather, they "piggyback" DOIC information over existing Diameter messages by inserting new AVPs into existing Diameter requests and responses. Nodes indicate support for DOIC, and any needed DOIC parameters, by inserting an OC-Supported-Features AVP (Section 7.1) into existing requests and responses. Reporting nodes send OLRs by inserting OC-OLR AVPs (Section 7.3).

A given OLR applies to the Diameter realm and application of the Diameter message that carries it. If a reporting node supports more than one realm and/or application, it reports independently for each combination of realm and application. Similarly, the OC-Supported-Features AVP applies to the realm and application of the enclosing message. This implies that a node may support DOIC for one application and/or realm, but not another, and may indicate different DOIC parameters for each application and realm for which it supports DOIC.

Reacting nodes perform overload abatement according to an agreed-upon abatement algorithm. An abatement algorithm defines the meaning of some of the parameters of an OLR and the procedures required for overload abatement. An overload abatement algorithm separates Diameter requests into two sets. The first set contains the requests that are to undergo overload abatement treatment of either throttling or diversion. The second set contains the requests that are to be given normal routing treatment. This document specifies a single must-support algorithm, namely the "loss" algorithm (Section 6). Future specifications may introduce new algorithms.

Overload conditions may vary in scope. For example, a single Diameter node may be overloaded, in which case reacting nodes may attempt to send requests to other destinations. On the other hand, an entire Diameter realm may be overloaded, in which case such attempts would do harm. DOIC OLRs have a concept of "report type" (Section 7.6), where the type defines such behaviors. Report types are extensible. This document defines report types for overload of a specific host, and for overload of an entire realm.

DOIC works through non-supporting Diameter Agents that properly pass unknown AVPs unchanged.

4.1.  Piggybacking

There is no new Diameter application defined to carry overload related AVPs. The overload control AVPs defined in this specification have been designed to be piggybacked on top of existing application messages. This is made possible by adding the optional overload control AVPs OC-OLR and OC-Supported-Features into existing commands.

Reacting nodes indicate support for DOIC by including the OC-Supported-Features AVP in all request messages originated or relayed by the reacting node.

Reporting nodes indicate support for DOIC by including the OC-Supported-Features AVP in all answer messages originated or relayed by the reporting node that are in response to a request that contained the OC-Supported-Features AVP. Reporting nodes may include overload reports using the OC-OLR AVP in answer messages.

Note that the overload control solution does not have fixed server and client roles. The DOIC node role is determined based on the message type: whether the message is a request (i.e., sent by a "reacting node") or an answer (i.e., sent by a "reporting node"). Therefore, in a typical "client-server" deployment, the Diameter Client may report its overload condition to the Diameter Server for any Diameter Server initiated message exchange. An example of such is the Diameter Server requesting a re-authentication from a Diameter Client.

4.2.  DOIC Capability Announcement

The DOIC solution supports the ability for Diameter nodes to determine if other nodes in the path of a request support the solution. This capability is referred to as DOIC Capability Announcement (DCA) and is separate from Diameter Capability Exchange.

The DCA mechanism uses the OC-Supported-Features AVPs to indicate the Diameter overload features supported.
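
The piggybacking rules of Section 4.1 can be illustrated with a minimal sketch. It is not part of the specification: messages are modeled as plain Python dictionaries keyed by AVP name, and the helper names prepare_request and prepare_answer are hypothetical. It only shows that a reacting node announces DOIC support on every request, and that a reporting node echoes support (and optionally an OLR) only when the request carried the announcement.

```python
OC_SUPPORTED_FEATURES = "OC-Supported-Features"
OC_OLR = "OC-OLR"


def prepare_request(avps):
    """Reacting node (hypothetical helper): announce DOIC on every request."""
    out = dict(avps)
    # An empty grouped AVP stands in for "DOIC supported, default features".
    out.setdefault(OC_SUPPORTED_FEATURES, {})
    return out


def prepare_answer(request_avps, answer_avps, olr=None):
    """Reporting node (hypothetical helper): piggyback DOIC AVPs on an answer.

    DOIC AVPs are added only if the request contained OC-Supported-Features;
    otherwise there is no reacting node for the transaction.
    """
    out = dict(answer_avps)
    if OC_SUPPORTED_FEATURES not in request_avps:
        return out
    out[OC_SUPPORTED_FEATURES] = {}
    if olr is not None:
        out[OC_OLR] = olr
    return out


if __name__ == "__main__":
    request = prepare_request({"Session-Id": "s1"})
    answer = prepare_answer(request, {"Result-Code": 2001},
                            olr={"OC-Sequence-Number": 1})
    print(sorted(answer))
```

Modeling AVPs as dictionary entries keeps the sketch short; a real implementation would encode grouped AVPs according to the Diameter base protocol.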

The first node in the path of a Diameter request that supports the DOIC solution inserts the OC-Supported-Features AVP in the request message.

The individual features supported by the DOIC nodes are indicated in the OC-Feature-Vector AVP. Any semantics associated with the features will be defined in extension specifications that introduce the features.

   Note: As discussed elsewhere in the document, agents in the path of the request can modify the OC-Supported-Features AVP.

   Note: The DOIC solution must support deployments where Diameter Clients and/or Diameter Servers do not support the DOIC solution. In this scenario, Diameter Agents that support the DOIC solution may handle overload abatement for the non-supporting Diameter nodes. In this case the DOIC agent will insert the OC-Supported-Features AVP in requests that do not already contain one, telling the reporting node that there is a DOIC node that will handle overload abatement. For transactions where there was an OC-Supported-Features AVP in the request, the agent will insert the OC-Supported-Features AVP in answers, telling the reacting node that there is a reporting node.

The OC-Feature-Vector AVP will always contain an indication of support for the loss overload abatement algorithm defined in this specification (see Section 6). This ensures that a reporting node always supports at least one of the advertised abatement algorithms received in a request message.

The reporting node inserts the OC-Supported-Features AVP in all answer messages to requests that contained the OC-Supported-Features AVP. The contents of the reporting node's OC-Supported-Features AVP indicate the set of Diameter overload features supported by the reporting node. This specification defines one exception - the reporting node only includes an indication of support for one overload abatement algorithm, independent of the number of overload abatement algorithms actually supported by the reacting node. The overload abatement algorithm indicated is the algorithm that the reporting node intends to use should it enter an overload condition. Reacting nodes can use the indicated overload abatement algorithm to prepare for possible overload reports and must use the indicated overload abatement algorithm if traffic reduction is actually requested.

   Note that the loss algorithm defined in this document is a stateless abatement algorithm. As a result it does not require any actions by reacting nodes prior to the receipt of an overload report. Stateful abatement algorithms that base the abatement logic on a history of request messages sent might require reacting nodes to maintain state in advance of receiving an overload report to ensure that the overload reports can be properly handled.

Reporting nodes can change the overload abatement algorithm indicated in the OC-Feature-Vector AVP if the reporting node is not currently in an overload condition and sending overload reports. The reporting node is not allowed to change the overload abatement algorithm while the reporting node is in an overload condition.

The DCA mechanism must also allow the scenario where the set of features supported by the sender of a request and by agents in the path of a request differ. In this case, the agent can update the OC-Supported-Features AVP to reflect the mixture of the two sets of supported features.

   Note: The logic to determine if the content of the OC-Supported-Features AVP should be changed is out-of-scope for this document, as is the logic to determine the content of a modified OC-Supported-Features AVP. These are left to implementation decisions. Care must be taken not to introduce interoperability issues for downstream or upstream DOIC nodes.

4.3.  DOIC Overload Condition Reporting

As with DOIC capability announcement, overload condition reporting uses new AVPs (Section 7.3) to indicate an overload condition.

The OC-OLR AVP is referred to as an overload report. The OC-OLR AVP includes the type of report, a sequence number, the length of time that the report is valid and abatement algorithm specific AVPs.

Two types of overload reports are defined in this document: host reports and realm reports.

A report of type "HOST_REPORT" is sent to indicate the overload of a specific host, identified by the Origin-Host AVP of the message containing the OLR, for the application-id indicated in the transaction. When receiving an OLR of type "HOST_REPORT", a reacting node applies overload abatement treatment to the host-routed requests identified by the overload abatement algorithm (see definition in Section 2) sent for this application to the overloaded host.

A report of type "REALM_REPORT" is sent to indicate the overload of a realm for the application-id indicated in the transaction. The overloaded realm is identified by the Destination-Realm AVP of the message containing the OLR. When receiving an OLR of type "REALM_REPORT", a reacting node applies overload abatement treatment to realm-routed requests identified by the overload abatement algorithm (see definition in Section 2) sent for this application to the overloaded realm.

This document assumes that there is a single source for realm-reports for a given realm, or that if multiple nodes can send realm reports, that each such node has full knowledge of the overload state of the entire realm. A reacting node cannot distinguish between receiving realm-reports from a single node, or from multiple nodes.
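
The scoping of the two report types can be sketched as follows. This is an illustrative model rather than normative behavior: the Request and Report classes and the report_applies function are hypothetical names, report types are represented as strings instead of encoded AVP values, and a request is treated as host-routed only when it carries a destination host (the definition in Section 2 also allows other local knowledge). The function only identifies candidate requests; the abatement algorithm still decides which of them actually receive abatement treatment.

```python
from dataclasses import dataclass
from typing import Optional

HOST_REPORT = "HOST_REPORT"
REALM_REPORT = "REALM_REPORT"


@dataclass(frozen=True)
class Request:
    application_id: int
    destination_realm: str
    # None models a realm-routed request (assumption: no other local
    # knowledge about the serving host is available).
    destination_host: Optional[str] = None


@dataclass(frozen=True)
class Report:
    report_type: str
    application_id: int
    # Overloaded host for HOST_REPORT, overloaded realm for REALM_REPORT.
    target: str


def report_applies(report: Report, request: Request) -> bool:
    """Return True if the request is in scope of the overload report."""
    if report.application_id != request.application_id:
        return False
    if report.report_type == HOST_REPORT:
        return request.destination_host == report.target
    if report.report_type == REALM_REPORT:
        return (request.destination_host is None
                and request.destination_realm == report.target)
    return False  # unknown report types are ignored in this sketch


if __name__ == "__main__":
    olr = Report(HOST_REPORT, 1, "srv-a.example.com")
    req = Request(1, "example.com", "srv-a.example.com")
    print(report_applies(olr, req))
```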
433 Note: Known issues exist if multiple sources for overload reports 434 which apply to the same Diameter entity exist. Reacting nodes 435 have no way of determining the source and, as such, will treat 436 them as coming from a single source. Variance in sequence numbers 437 between the two sources can then cause incorrect overload 438 abatement treatment to be applied for indeterminate periods of 439 time. 441 Reporting nodes are responsible for determining the need for a 442 reduction of traffic. The method for making this determination is 443 implementation specific and depend on the type of overload report 444 being generated. A host-report might be generated by tracking use of 445 resources required by the host to handle transactions for the 446 Diameter application. A realm-report generally impacts the traffic 447 sent to multiple hosts and, as such, requires tracking the capacity 448 of all servers able to handle realm- routed requests for the 449 application and realm. 451 Once a reporting node determines the need for a reduction in traffic, 452 it uses the DOIC defined AVPs to report on the condition. These AVPs 453 are included in answer messages sent or relayed by the reporting 454 node. The reporting node indicates the overload abatement algorithm 455 that is to be used to handle the traffic reduction in the OC- 456 Supported-Features AVP. The OC-OLR AVP is used to communicate 457 information about the requested reduction. 459 Reacting nodes, upon receipt of an overload report, applying the 460 overload abatement algorithm to traffic impacted by the overload 461 report. The method used to determine the requests that are to 462 receive overload abatement treatment is dependent on the abatement 463 algorithm. The loss abatement algorithm is defined in this document 464 (Section 6). Other abatement algorithms can be defined in extensions 465 to the DOIC solutions. 467 Two types of overload abatement treatment are defined, diversion and 468 throttling. 
Reacting nodes are responsible for determining which 469 treatment is appropriate for individual requests. 471 As the conditions that lead to the generation of the overload report 472 change the reporting node can send new overload reports requesting 473 greater reduction if the condition gets worse or less reduction if 474 the condition improves. The reporting node sends an overload report 475 with a duration of zero to indicate that the overload condition has 476 ended and abatement is no longer needed. 478 The reacting node also determines when the overload report expires 479 based on the OC-Validity-Duration AVP in the overload report and 480 stops applying the abatement algorithm when the report expires. 482 4.4. DOIC Extensibility 484 The DOIC solution is designed to be extensible. This extensibility 485 is based on existing Diameter based extensibility mechanisms, along 486 with the DOIC capability announcement mechanism. 488 There are multiple categories of extensions that are expected. This 489 includes the definition of new overload abatement algorithms, the 490 definition of new report types and the definition of new scopes of 491 messages impacted by an overload report. 493 A DOIC node communicates supported features by including them in the 494 OC-Feature-Vector AVP, as a sub-AVP of OC-Supported-Features. Any 495 non-backwards compatible DOIC extensions define new values for the 496 OC-Feature-Vector AVP. DOIC extensions also have the ability to add 497 new AVPs to the OC-Supported-Features AVP, if additional information 498 about the new feature is required. 500 Overload reports can be also extended by adding new sub-AVPs to the 501 OC-OLR AVP, allowing reporting nodes to communicate additional 502 information about handling an overload condition. 504 If necessary, new extensions can also define new AVPs that are not 505 part of the OC-Supported-Features and OC-OLR group AVPs. 
It is, 506 however, recommended that DOIC extensions use the OC-Supported- 507 Features AVP and OC-OLR AVP to carry all DOIC related AVPs. 509 4.5. Simplified Example Architecture 511 Figure 1 illustrates the simplified architecture for Diameter 512 overload information conveyance. 514 Realm X Same or other Realms 515 <--------------------------------------> <----------------------> 517 +--^-----+ : (optional) : 518 |Diameter| : : 519 |Server A|--+ .--. : +---^----+ : .--. 520 +--------+ | _( `. : |Diameter| : _( `. +---^----+ 521 +--( )--:-| Agent |-:--( )--|Diameter| 522 +--------+ | ( ` . ) ) : +-----^--+ : ( ` . ) ) | Client | 523 |Diameter|--+ `--(___.-' : : `--(___.-' +-----^--+ 524 |Server B| : : 525 +---^----+ : : 527 End-to-end Overload Indication 528 1) <-----------------------------------------------> 529 Diameter Application Y 531 Overload Indication A Overload Indication A' 532 2) <----------------------> <----------------------> 533 Diameter Application Y Diameter Application Y 535 Figure 1: Simplified architecture choices for overload indication 536 delivery 538 In Figure 1, the Diameter overload indication can be conveyed (1) 539 end-to-end between servers and clients or (2) between servers and 540 Diameter agent inside the realm and then between the Diameter agent 541 and the clients. 543 5. Solution Procedures 545 This section outlines the normative behavior for the DOIC solution. 547 5.1. Capability Announcement 549 This section defines DOIC Capability Announcement (DCA) behavior. 551 Note: This specification assumes that changes in DOIC node 552 capabilities are relatively rare events that occur as a result of 553 administrative action. Reacting nodes ought to minimize changes 554 that force the reporting node to change the features being used, 555 especially during active overload conditions. But even if 556 reacting nodes avoid such changes, reporting nodes still have to 557 be prepared for them to occur. 
For example, differing 558 capabilities between multiple reacting nodes may still force a 559 reporting node to select different features on a per-transaction 560 basis. 562 5.1.1. Reacting Node Behavior 564 A reacting node MUST include the OC-Supported-Features AVP in all 565 requests. It MAY include the OC-Feature-Vector AVP, as a sub-avp of 566 OC-Supported-Features. If it does so, it MUST indicate support for 567 the "loss" algorithm. If the reacting node is configured to support 568 features (including other algorithms) in addition to the loss 569 algorithm, it MUST indicate such support in an OC-Feature-Vector AVP. 571 An OC-Supported-Features AVP in answer messages indicates there is a 572 reporting node for the transaction. The reacting node MAY take 573 action, for example creating state for some stateful abatement 574 algorithm, based on the features indicated in the OC-Feature-Vector 575 AVP. 577 Note: The loss abatement algorithm does not require stateful 578 behavior when there is no active overload report. 580 5.1.2. Reporting Node Behavior 582 Upon receipt of a request message, a reporting node determines if 583 there is a reacting node for the transaction based on the presence of 584 the OC-Supported-Features AVP in the request message. 586 If the request message contains an OC-Supported-Features AVP then a 587 reporting node MUST include the OC-Supported-Features AVP in the 588 answer message for that transaction. 590 Note: Capability announcement is done on a per transaction basis. 591 The reporting node cannot assume that the capabilities announced 592 by a reacting node will be the same between transactions. 594 A reporting node MUST NOT include the OC-Supported-Features AVP, OC- 595 OLR AVP or any other overload control AVPs defined in extension 596 drafts in response messages for transactions where the request 597 message does not include the OC-Supported-Features AVP. 
   Lack of the OC-Supported-Features AVP in the request message
   indicates that there is no reacting node for the transaction.

   A reporting node knows what overload control functionality is
   supported by the reacting node based on the content or absence of the
   OC-Feature-Vector AVP within the OC-Supported-Features AVP in the
   request message.

   A reporting node MUST indicate support for one and only one abatement
   algorithm in the OC-Feature-Vector AVP. The abatement algorithm
   selected MUST indicate the abatement algorithm the reporting node
   wants the reacting node to use when the reporting node enters an
   overload condition.

   The abatement algorithm selected MUST be from the set of abatement
   algorithms contained in the request message's OC-Feature-Vector AVP.

   A reporting node that selects the loss algorithm may do so by
   including the OC-Feature-Vector AVP with an explicit indication of
   the loss algorithm, or it MAY omit OC-Feature-Vector. If it selects
   a different algorithm, it MUST include the OC-Feature-Vector AVP with
   an explicit indication of the selected algorithm.

   A reporting node MUST NOT change the selected algorithm during the
   period of time that starts when entering an overload condition and
   ends when the associated OCS becomes invalid in all reacting nodes.

   The reporting node MAY change the overload abatement algorithm
   indicated in the OC-Supported-Features AVP at any time as long as no
   previously sent OLRs may be active.

   The reporting node SHOULD indicate support for other DOIC features
   defined in extension drafts that it supports and that apply to the
   transaction. It does so using the OC-Feature-Vector AVP.

      Note: Not all DOIC features will apply to all Diameter
      applications or deployment scenarios. The features included in
      the OC-Feature-Vector AVP are based on local reporting node
      policy.

5.1.3. Agent Behavior

   Diameter Agents that support DOIC MAY ensure that all messages
   relayed by the agent contain the OC-Supported-Features AVP.

   A Diameter Agent SHOULD take on reacting node behavior for Diameter
   endpoints that do not support the DOIC solution. A Diameter Agent
   detects that a Diameter endpoint does not support DOIC reacting node
   behavior when there is no OC-Supported-Features AVP in a request
   message.

   For a Diameter Agent to be a reacting node for a non-supporting
   Diameter endpoint, the Diameter Agent MUST include the
   OC-Supported-Features AVP in request messages it relays that do not
   contain the OC-Supported-Features AVP.

   A Diameter Agent MAY take on reporting node behavior for Diameter
   endpoints that do not support the DOIC solution. The Diameter Agent
   MUST have visibility to all traffic destined for the non-supporting
   host in order to become the reporting node for the Diameter endpoint.
   A Diameter Agent detects that a Diameter endpoint does not support
   DOIC reporting node behavior when there is no OC-Supported-Features
   AVP in an answer message for a transaction that contained the
   OC-Supported-Features AVP in the request message.

   If a request already has the OC-Supported-Features AVP, a Diameter
   agent MAY modify it to reflect the features appropriate for the
   transaction. Otherwise, the agent relays the OC-Supported-Features
   AVP without change.

      For instance, if the agent supports a superset of the features
      reported by the reacting node then the agent might choose, based
      on local policy, to advertise that superset of features to the
      reporting node.

   If the Diameter Agent changes the OC-Supported-Features AVP in a
   request message then it is likely it will also need to modify the
   OC-Supported-Features AVP in the answer message for the transaction.
   A Diameter Agent MAY modify the OC-Supported-Features AVP carried in
   answer messages.

   When making changes to the OC-Supported-Features or OC-OLR AVPs, the
   Diameter Agent needs to ensure consistency in its behavior with both
   upstream and downstream DOIC nodes.

5.2. Overload Report Processing

5.2.1. Overload Control State

   Both reacting and reporting nodes maintain Overload Control State
   (OCS) for active overload conditions. The following sections define
   behavior associated with that OCS.

5.2.1.1. Overload Control State for Reacting Nodes

   A reacting node SHOULD maintain the following OCS per supported
   Diameter application:

   o  A host-type OCS entry for each Destination-Host to which it sends
      host-type requests and

   o  A realm-type OCS entry for each Destination-Realm to which it
      sends realm-type requests.

   A host-type OCS entry is identified by the pair of application-id and
   the node's DiameterIdentity.

   A realm-type OCS entry is identified by the pair of application-id
   and realm.

   The host-type and realm-type OCS entries MAY include the following
   information (the actual information stored is an implementation
   decision):

   o  Sequence number (as received in OC-OLR)

   o  Time of expiry (derived from OC-Validity-Duration AVP received in
      the OC-OLR AVP and time of reception of the message carrying
      OC-OLR AVP)

   o  Selected Abatement Algorithm (as received in the
      OC-Supported-Features AVP)

   o  Abatement Algorithm specific input data (as received in the OC-OLR
      AVP, for example, OC-Reduction-Percentage for the Loss abatement
      algorithm)

5.2.1.2. Overload Control State for Reporting Nodes

   A reporting node SHOULD maintain OCS entries per supported Diameter
   application, per supported (and eventually selected) Abatement
   Algorithm and per report-type.

   An OCS entry is identified by the tuple of Application-Id,
   Report-Type and Abatement Algorithm and MAY include the following
   information (the actual information stored is an implementation
   decision):

   o  Sequence number

   o  Validity Duration

   o  Expiration Time

   o  Algorithm specific input data (for example, the Reduction
      Percentage for the Loss Abatement Algorithm)

5.2.1.3. Reacting Node Maintenance of Overload Control State

   When a reacting node receives an OC-OLR AVP, it MUST determine if it
   is for an existing or new overload condition.

      Note: For the remainder of this section the term OLR refers to the
      combination of the contents of the received OC-OLR AVP and the
      abatement algorithm indicated in the received
      OC-Supported-Features AVP.

   When receiving an answer message with multiple OLRs of different
   supported report types, a reacting node MUST process each received
   OLR.

   When receiving an answer message in which multiple OLRs are of the
   same supported report type, a reacting node SHOULD ignore the
   duplicated OLRs.

   A reacting node SHOULD ignore an OC-OLR with an OC-Report-Type AVP
   that contains an unrecognized value.

   The OLR is for an existing overload condition if a reacting node has
   an OCS that matches the received OLR.

      For a host-report this means it matches the application-id and the
      host's DiameterIdentity in an existing host OCS entry.

      For a realm-report this means it matches the application-id and
      the realm in an existing realm OCS entry.

   If the OLR is for an existing overload condition then a reacting node
   MUST determine if the OLR is a retransmission or an update to the
   existing OLR.

   If the sequence number for the received OLR is greater than the
   sequence number stored in the matching OCS entry then a reacting node
   MUST update the matching OCS entry.
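
   The matching and sequence-number rules of this section can be
   sketched in Python as follows. This is an illustrative sketch only
   and not part of the specification: the names OcsEntry and
   process_olr, the key layout, and the use of a plain dictionary are
   assumptions made for the example. The sketch also covers the
   handling of older sequence numbers, new overload conditions and a
   zero validity duration, which are specified in the remainder of
   this section.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical key: (application-id, "host" or "realm", DiameterIdentity or realm).
OcsKey = Tuple[int, str, str]


@dataclass
class OcsEntry:
    sequence_number: int
    expiry: float               # absolute expiry time, in seconds
    algorithm: str              # selected abatement algorithm, e.g. "loss"
    reduction_percentage: int   # loss-specific input data


def process_olr(table: Dict[OcsKey, OcsEntry], key: OcsKey, seq: int,
                validity: int, algorithm: str, reduction: int,
                now: float) -> str:
    """Apply one received OLR to a reacting node's OCS table.

    Returns "created", "updated" or "ignored".
    """
    entry = table.get(key)
    if entry is None:
        # No matching OCS: this is a new overload condition.
        table[key] = OcsEntry(seq, now + validity, algorithm, reduction)
        return "created"
    if seq <= entry.sequence_number:
        # Retransmission or stale report: the entry is left untouched.
        return "ignored"
    # Newer report: update the entry. A validity duration of zero makes
    # the entry expire immediately; the entry itself is not deleted.
    entry.sequence_number = seq
    entry.expiry = now + validity
    entry.algorithm = algorithm
    entry.reduction_percentage = reduction
    return "updated"
```

   Keeping the expired entry in the table, rather than deleting it,
   leaves room for the gradual return to full traffic that this
   section recommends.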

   If the sequence number for the received OLR is less than or equal to
   the sequence number in the matching OCS entry then a reacting node
   MUST silently ignore the received OLR. The matching OCS MUST NOT be
   updated in this case.

   If the received OLR is for a new overload condition then a reacting
   node MUST generate a new OCS entry for the overload condition.

      For a host-report this means a reacting node creates an OCS entry
      with the application-id in the received message and
      DiameterIdentity of the Origin-Host in the received message.

      Note: This solution assumes that the Origin-Host AVP in the answer
      message included by the reporting node is not changed along the
      path to the reacting node.

      For a realm-report this means a reacting node creates an OCS entry
      with the application-id in the received message and realm of the
      Origin-Realm in the received message.

   If the received OLR contains a validity duration of zero ("0") then a
   reacting node MUST update the OCS entry as being expired.

      Note: It is not necessarily appropriate to delete the OCS entry,
      as there is recommended behavior that the reacting node slowly
      returns to full traffic when ending an overload abatement period.

   The reacting node does not delete an OCS when receiving an answer
   message that does not contain an OC-OLR AVP (i.e. absence of OLR
   means "no change").

5.2.1.4. Reporting Node Maintenance of Overload Control State

   A reporting node SHOULD create a new OCS entry when entering an
   overload condition.

      Note: If a reporting node knows through absence of the
      OC-Supported-Features AVP in received messages that there are no
      reacting nodes supporting DOIC then the reporting node can choose
      to not create OCS entries.

   When generating a new OCS entry the sequence number SHOULD be set to
   zero ("0").

   When generating sequence numbers for new overload conditions, the new
   sequence number MUST be greater than any sequence number in an active
   (unexpired) overload report for the same application and report-type
   previously sent by the reporting node. This property MUST hold over
   a reboot of the reporting node.

      Note: One way of addressing this over a reboot of a reporting node
      is to use a time stamp for the first overload condition that
      occurs after the reboot and to start using sequence numbers of
      zero for subsequent overload conditions.

   A reporting node MUST update an OCS entry when it needs to adjust the
   validity duration of the overload condition at reacting nodes.

      For instance, a reporting node might wish to instruct reacting
      nodes to continue overload abatement for a longer period of time
      than originally communicated. This also applies if the reporting
      node wishes to shorten the period of time that overload abatement
      is to continue.

   A reporting node MUST NOT update the abatement algorithm in an active
   OCS entry.

   A reporting node MUST update an OCS entry when it wishes to adjust
   any abatement algorithm specific parameters, including, for example,
   the reduction percentage used for the Loss abatement algorithm.

      For instance, if a reporting node wishes to change the reduction
      percentage either higher, if the overload condition has worsened,
      or lower, if the overload condition has improved, then the
      reporting node would update the appropriate OCS entry.

   A reporting node MUST increment the sequence number associated with
   the OCS entry any time the contents of the OCS entry are changed.
   This will result in a new sequence number being sent to reacting
   nodes, instructing reacting nodes to process the OC-OLR AVP.

   A reporting node SHOULD update an OCS entry with a validity duration
   of zero ("0") when the overload condition ends.
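
   The update rules above for a reporting node's OCS entry can be
   sketched as follows. This is a hedged illustration, not a
   normative data model: the class name ReportingOcs and its methods
   are hypothetical, the 30 second default is the default value of
   the OC-Validity-Duration AVP, and persistence of sequence numbers
   across a reboot is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ReportingOcs:
    """One OCS entry kept by a reporting node (illustrative only)."""
    algorithm: str                 # fixed while the entry is active
    sequence_number: int = 0       # new entries start at zero
    validity_duration: int = 30    # seconds
    reduction_percentage: int = 0  # loss-specific input data

    def update(self, reduction_percentage=None, validity_duration=None):
        """Change report contents; a real change bumps the sequence number.

        There is intentionally no way to change the algorithm here.
        Returns True if the entry changed.
        """
        changed = False
        if (reduction_percentage is not None
                and reduction_percentage != self.reduction_percentage):
            self.reduction_percentage = reduction_percentage
            changed = True
        if (validity_duration is not None
                and validity_duration != self.validity_duration):
            self.validity_duration = validity_duration
            changed = True
        if changed:
            self.sequence_number += 1
        return changed

    def end_overload(self):
        """Signal the end of the overload condition (validity of zero)."""
        return self.update(validity_duration=0)
```

   Incrementing the sequence number only when the contents actually
   change means that repeated reports of an unchanged state are
   treated as retransmissions by reacting nodes.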

      Note: If a reporting node knows that the OCS entries in the
      reacting nodes are near expiration then the reporting node might
      decide not to send an OLR with a validity duration of zero.

   A reporting node MUST keep an OCS entry with a validity duration of
   zero ("0") for a period of time long enough to ensure that any
   non-expired OCS entry created in a reacting node as a result of the
   overload condition in the reporting node is deleted.

5.2.2. Reacting Node Behavior

   When a reacting node sends a request it MUST determine if that
   request matches an active OCS.

   If the request matches an active OCS then the reacting node MUST use
   the overload abatement algorithm indicated in the OCS to determine if
   the request is to receive overload abatement treatment.

   For the Loss abatement algorithm defined in this specification, see
   Section 6 for the overload abatement algorithm logic applied.

   If the overload abatement algorithm selects the request for overload
   abatement treatment then the reacting node MUST apply overload
   abatement treatment on the request. The abatement treatment applied
   depends on the context of the request.

   If diversion abatement treatment is possible (i.e. a different path
   for the request can be selected where the overloaded node is not part
   of the different path), then the reacting node SHOULD apply diversion
   abatement treatment to the request. Otherwise the reacting node
   SHOULD apply throttling abatement treatment to the request.

   If the overload abatement treatment results in throttling of the
   request and if the reacting node is an agent then the agent MUST send
   an appropriate error as defined in Section 8.

   Diameter endpoints that throttle requests need to do so according to
   the rules of the client application. Those rules will vary by
   application, and are beyond the scope of this document.
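
   The per-request decision described above can be sketched as
   follows for the loss algorithm. The random draw between 1 and 100
   is the approach suggested in Section 6.1; the function names, the
   string return values and the can_divert flag are assumptions made
   for this example, and how a node decides that diversion is
   possible is outside the sketch.

```python
import random


def select_for_abatement(reduction_percentage, rng=random):
    """Loss algorithm draw: True if the request gets abatement treatment."""
    # randint is inclusive at both ends, so the draw is in 1..100.
    return rng.randint(1, 100) <= reduction_percentage


def choose_treatment(ocs_active, reduction_percentage, can_divert, rng=random):
    """Return "normal", "divert" or "throttle" for one outgoing request."""
    if not ocs_active:
        # No active OCS matches the request: route it normally.
        return "normal"
    if not select_for_abatement(reduction_percentage, rng):
        return "normal"
    # Diversion is preferred when another path avoids the overloaded node.
    return "divert" if can_divert else "throttle"
```

   Because each request is decided independently, the node keeps no
   history of earlier requests, which matches the stateless nature
   of the loss algorithm.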

   In the case that the OCS entry indicated no traffic was to be sent to
   the overloaded entity and the validity duration expires then overload
   abatement associated with the overload report MUST be ended in a
   controlled fashion.

5.2.3. Reporting Node Behavior

   If there is an active OCS entry then a reporting node SHOULD include
   the OC-OLR AVP in all answers to requests that contain the
   OC-Supported-Features AVP and that match the active OCS entry.

      Note: A request matches if the application-id in the request
      matches the application-id in any active OCS entry and if the
      report-type in the OCS entry matches a report-type supported by
      the reporting node as indicated in the OC-Supported-Features AVP.

   The contents of the OC-OLR AVP depend on the selected algorithm.

   A reporting node MAY choose to not resend an overload report to a
   reacting node if it can guarantee that this overload report is
   already active in the reacting node.

      Note: In some cases (e.g. when there are one or more agents in the
      path between reporting and reacting nodes, or when overload
      reports are discarded by reacting nodes) a reporting node may not
      be able to guarantee that the reacting node has received the
      report.

   A reporting node MUST NOT send overload reports of a type that has
   not been advertised as supported by the reacting node.

      Note: A reacting node implicitly advertises support for the host
      and realm report types by including the OC-Supported-Features AVP
      in the request. Support for other report types will be explicitly
      indicated by new feature bits in the OC-Feature-Vector AVP.

   A reporting node SHOULD explicitly indicate the end of an overload
   occurrence by sending a new OLR with OC-Validity-Duration set to a
   value of zero ("0"). The reporting node SHOULD ensure that all
   reacting nodes receive the updated overload report.
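
   How a reporting node might decide which overload control AVPs to
   place in an answer can be sketched as follows. This is a
   simplified illustration under stated assumptions: AVPs are
   modeled as plain dictionaries, active OCS entries are keyed by
   application-id only (report-type matching is omitted), an empty
   OC-Supported-Features value stands for selection of the loss
   algorithm, and the function name is hypothetical.

```python
def overload_avps_for_answer(request_avps, request_app_id, active_ocs):
    """Return the overload control AVPs to add to an answer message.

    request_avps: dict of AVP name -> value found in the request.
    active_ocs:   dict of application-id -> OLR contents (active only).
    """
    if "OC-Supported-Features" not in request_avps:
        # No reacting node for this transaction: send no DOIC AVPs.
        return {}
    # Always announce DOIC support when the request did so.
    avps = {"OC-Supported-Features": {}}
    olr = active_ocs.get(request_app_id)
    if olr is not None:
        # Piggyback the active overload report on the answer.
        avps["OC-OLR"] = dict(olr)
    return avps
```

   Sending the report on every matching answer is the simple
   default; a node that can guarantee the report is already active
   at the reacting node could skip the OC-OLR AVP.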

   A reporting node MAY rely on the OC-Validity-Duration AVP values for
   the implicit overload control state cleanup on the reacting node.

      Note: All OLRs sent have an expiration time calculated by adding
      the validity-duration contained in the OLR to the time the message
      was sent. Transit time for the OLR can be safely ignored. The
      reporting node can ensure that all reacting nodes have received
      the OLR by continuing to send it in answer messages until the
      expiration times of all OLRs sent for that overload condition
      have passed.

   When a reporting node sends an OLR, it effectively delegates any
   necessary throttling to downstream nodes. If the reporting node also
   locally throttles the same set of messages, the overall number of
   throttled requests may be higher than intended. Therefore, before
   applying local message throttling, a reporting node needs to check if
   these messages match existing OCS entries, indicating that these
   messages have survived throttling applied by downstream nodes that
   have received the related OLR.

   However, even if the set of messages matches existing OCS entries,
   the reporting node can still apply other abatement methods such as
   diversion. The reporting node might also need to throttle requests
   for reasons other than overload. For example, an agent or server
   might have a configured rate limit for each client, and throttle
   requests that exceed that limit, even if such requests had already
   been candidates for throttling by downstream nodes. The reporting
   node also has the option to send new OLRs requesting greater
   reductions in traffic, reducing the need for local throttling.

   A reporting node SHOULD decrease requested overload abatement
   treatment in a controlled fashion to avoid oscillations in traffic.

      For example, it might wait some period of time after overload ends
      before terminating the OLR, or it might send a series of OLRs
      indicating progressively less overload severity.

5.3. Protocol Extensibility

   The DOIC solution can be extended. Types of potential extensions
   include new traffic abatement algorithms, new report types or other
   new functionality.

   When defining a new extension that requires new normative behavior,
   the specification MUST define a new feature for the
   OC-Feature-Vector. This feature bit is used to communicate support
   for the new feature.

   The extension MAY define new AVPs for use in DOIC Capability
   Announcement and for use in DOIC Overload reporting. These new AVPs
   SHOULD be defined to be extensions to the OC-Supported-Features or
   OC-OLR AVPs defined in this document.

   The Grouped AVP extension mechanisms defined in [RFC6733] apply.
   This allows, for example, defining a new feature that is mandatory
   to be understood even when piggybacked on an existing application.

   When defining new report type values, the corresponding specification
   MUST define the semantics of the new report types and how they affect
   the OC-OLR AVP handling.

   The OC-OLR AVP can be expanded with optional sub-AVPs only if a
   legacy DOIC implementation can safely ignore them without breaking
   backward compatibility for the given OC-Report-Type AVP value.

   Documents that introduce new report types MUST describe any
   limitations on their use across non-supporting agents.

   As with any Diameter specification, [RFC6733] requires all new AVPs
   to be registered with IANA. See Section 9 for the required
   procedures. New features (feature bits in the OC-Feature-Vector AVP)
   and report types (in the OC-Report-Type AVP) MUST be registered with
   IANA.

6. Loss Algorithm

   This section documents the Diameter overload loss abatement
   algorithm.

6.1. Overview

   The DOIC specification supports the ability for multiple overload
   abatement algorithms to be specified. The abatement algorithm used
   for any instance of overload is determined by the Diameter Overload
   Capability Announcement process documented in Section 5.1.

   The loss algorithm described in this section is the default algorithm
   that must be supported by all Diameter nodes that support DOIC.

   The loss algorithm is designed to be a straightforward and stateless
   overload abatement algorithm. It is used by reporting nodes to
   request a percentage reduction in the amount of traffic sent. The
   traffic impacted by the requested reduction depends on the type of
   overload report.

   Reporting nodes request the stateless reduction of the number of
   requests by an indicated percentage. This percentage reduction is in
   comparison to the number of messages the node otherwise would send,
   regardless of how many requests the node might have sent in the past.

   At a conceptual level, the logic at the reacting node could be
   outlined as follows.

   1. An overload report is received and the associated OCS is either
      saved or updated (if required) by the reacting node.

   2. A new Diameter request is generated by the application running on
      the reacting node.

   3. The reacting node determines that an active overload report
      applies to the request, as indicated by the corresponding OCS
      entry.

   4. The reacting node determines if overload abatement treatment
      should be applied to the request. One approach that could be
      taken for each request is to select a random number between 1 and
      100. If the random number is less than or equal to the indicated
      reduction percentage then the request is given abatement
      treatment, otherwise the request is given normal routing
      treatment.

6.2. Reporting Node Behavior

   The method a reporting node uses to determine the amount of traffic
   reduction required to address an overload condition is an
   implementation decision.

   When a reporting node that has selected the loss abatement algorithm
   determines the need to request a reduction in traffic, it includes an
   OC-OLR AVP in answer messages as described in Section 5.2.3.

   When sending the OC-OLR AVP, the reporting node MUST indicate a
   percentage reduction in the OC-Reduction-Percentage AVP.

   The reporting node MAY change the reduction percentage in subsequent
   overload reports. When doing so the reporting node must conform to
   overload report handling specified in Section 5.2.3.

6.3. Reacting Node Behavior

   The method a reacting node uses to determine which request messages
   are given abatement treatment is an implementation decision.

   When receiving an OC-OLR in an answer message where the algorithm
   indicated in the OC-Supported-Features AVP is the loss algorithm, the
   reacting node MUST apply abatement treatment to the requested
   percentage of request messages sent.

      Note: The loss algorithm is a stateless algorithm. As a result,
      the reacting node does not guarantee that there will be an
      absolute reduction in traffic sent. Rather, it guarantees that
      the requested percentage of new requests will be given abatement
      treatment.

   When applying overload abatement treatment for the loss abatement
   algorithm, the reacting node MUST abate the requested percentage of
   requests that would have otherwise been sent to the reporting host or
   realm.

   If the reacting node comes out of the 100 percent traffic reduction
   as a result of the overload report timing out, the following
   procedures are RECOMMENDED to be applied.
   The reacting node sending the traffic
   should be conservative and, for example, first send "probe" messages
   to learn the overload condition of the overloaded node before
   converging to any traffic amount/rate decided by the sender. Similar
   concerns apply in all cases when the overload report times out unless
   the previous overload report stated 0 percent reduction.

      The goal of this behavior is to reduce the probability of overload
      condition thrashing where an immediate transition from 100%
      reduction to 0% reduction results in the reporting node moving
      quickly back into an overload condition.

   If the reacting node does not receive an OLR in answers received from
   the formerly overloaded node then the reacting node SHOULD slowly
   increase the rate of traffic sent to the overloaded node.

7. Attribute Value Pairs

   This section describes the encoding and semantics of the Diameter
   Overload Indication Attribute Value Pairs (AVPs) defined in this
   document.

7.1. OC-Supported-Features AVP

   The OC-Supported-Features AVP (AVP code TBD1) is of type Grouped and
   serves two purposes. First, it announces a node's support for the
   DOIC solution in general. Second, it contains the description of the
   supported DOIC features of the sending node. The
   OC-Supported-Features AVP MUST be included in every Diameter request
   message a DOIC supporting node sends.

      OC-Supported-Features ::= < AVP Header: TBD1 >
                                [ OC-Feature-Vector ]
                              * [ AVP ]

7.2. OC-Feature-Vector AVP

   The OC-Feature-Vector AVP (AVP code TBD2) is of type Unsigned64 and
   contains a 64 bit flags field of announced capabilities of a DOIC
   node. The value of zero (0) is reserved.

   The OC-Feature-Vector sub-AVP is used to announce the DOIC features
   supported by the DOIC node, in the form of a flag-bits field in which
   each bit announces one feature or capability supported by the node.
   The absence of the OC-Feature-Vector AVP in request messages
   indicates that only the default traffic abatement algorithm described
   in this specification is supported. The absence of the
   OC-Feature-Vector AVP in answer messages indicates that the default
   traffic abatement algorithm described in this specification is
   selected (while other traffic abatement algorithms may be supported),
   and no features other than abatement algorithms are supported.

   The following capabilities are defined in this document:

   OLR_DEFAULT_ALGO (0x0000000000000001)

      When this flag is set by a DOIC reacting node it means that
      the default traffic abatement (loss) algorithm is supported. When
      this flag is set by a DOIC reporting node it means that the loss
      algorithm will be used for requested overload abatement.

7.3. OC-OLR AVP

   The OC-OLR AVP (AVP code TBD3) is of type Grouped and contains the
   information necessary to convey an overload report on an overload
   condition at the reporting node. The application the OC-OLR AVP
   applies to is the same as the Application-Id found in the Diameter
   message header. The host or realm the OC-OLR AVP concerns is
   determined from the Origin-Host AVP and/or Origin-Realm AVP found in
   the encapsulating Diameter command. The OC-OLR AVP is intended to be
   sent only by a reporting node.

      OC-OLR ::= < AVP Header: TBD3 >
                 < OC-Sequence-Number >
                 < OC-Report-Type >
                 [ OC-Reduction-Percentage ]
                 [ OC-Validity-Duration ]
               * [ AVP ]

7.4. OC-Sequence-Number AVP

   The OC-Sequence-Number AVP (AVP code TBD4) is of type Unsigned64.
   Its usage in the context of overload control is described in
   Section 5.2.

   From the functionality point of view, the OC-Sequence-Number AVP is
   used as a non-volatile increasing counter for a sequence of overload
   reports between two DOIC nodes for the same overload occurrence.
   Sequence numbers are treated in a uni-directional manner, i.e. the
   two sequence numbers, one in each direction between two DOIC nodes,
   are not related or correlated.

7.5. OC-Validity-Duration AVP

   The OC-Validity-Duration AVP (AVP code TBD5) is of type Unsigned32
   and indicates in seconds the validity time of the overload report.
   The number of seconds is measured after reception of the first
   OC-OLR AVP with a given value of OC-Sequence-Number AVP. The default
   value for the OC-Validity-Duration AVP is 30 seconds. When the
   OC-Validity-Duration AVP is not present in the OC-OLR AVP, the
   default value applies. The maximum value for the
   OC-Validity-Duration AVP is 86,400 seconds (24 hours).

7.6. OC-Report-Type AVP

   The OC-Report-Type AVP (AVP code TBD6) is of type Enumerated. The
   value of the AVP describes what the overload report concerns. The
   following values are initially defined:

   HOST_REPORT 0  The overload report is for a host. Overload abatement
      treatment applies to host-routed requests.

   REALM_REPORT 1  The overload report is for a realm. Overload
      abatement treatment applies to realm-routed requests.

7.7. OC-Reduction-Percentage AVP

   The OC-Reduction-Percentage AVP (AVP code TBD7) is of type Unsigned32
   and describes the percentage of the traffic that the sender is
   requested to reduce, compared to what it otherwise would send. The
   OC-Reduction-Percentage AVP applies to the default (loss) algorithm
   specified in this specification. However, the AVP can be reused for
   future abatement algorithms, if its semantics fit into the new
   algorithm.

   The value of the OC-Reduction-Percentage AVP is between zero (0) and
   one hundred (100). Values greater than 100 are ignored. The value
   of 100 means that all traffic is to be throttled, i.e. the reporting
   node is under a severe load and ceases to process any new messages.
   The value of 0 means that the reporting node is in a stable state and
   has no need for the reacting node to apply any traffic abatement.

7.8. Attribute Value Pair flag rules

                                                      +---------+
                                                      |AVP flag |
                                                      |rules    |
                                                      +----+----+
                          AVP   Section               |    |MUST|
    Attribute Name        Code  Defined Value Type    |MUST| NOT|
   +--------------------------------------------------+----+----+
   |OC-Supported-Features TBD1  7.1     Grouped       |    | V  |
   +--------------------------------------------------+----+----+
   |OC-Feature-Vector     TBD2  7.2     Unsigned64    |    | V  |
   +--------------------------------------------------+----+----+
   |OC-OLR                TBD3  7.3     Grouped       |    | V  |
   +--------------------------------------------------+----+----+
   |OC-Sequence-Number    TBD4  7.4     Unsigned64    |    | V  |
   +--------------------------------------------------+----+----+
   |OC-Validity-Duration  TBD5  7.5     Unsigned32    |    | V  |
   +--------------------------------------------------+----+----+
   |OC-Report-Type        TBD6  7.6     Enumerated    |    | V  |
   +--------------------------------------------------+----+----+
   |OC-Reduction                                      |    |    |
   |  -Percentage         TBD7  7.7     Unsigned32    |    | V  |
   +--------------------------------------------------+----+----+

   As described in the Diameter base protocol [RFC6733], the M-bit usage
   for a given AVP in a given command may be defined by the application.

8. Error Response Codes

   When a DOIC node rejects a Diameter request due to overload, the DOIC
   node MUST select an appropriate error response code. This
   determination is made based on the probability of the request
   succeeding if retried on a different path.

   A reporting node rejecting a Diameter request due to an overload
   condition SHOULD send a DIAMETER_TOO_BUSY error response, if it can
   assume that the same request may succeed on a different path.
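
   The choice of error response code discussed in this section can
   be sketched as follows. The numeric values are the Result-Code
   values assigned to these errors in [RFC6733]. The function name
   is hypothetical, and treating "the request names this node in its
   Destination-Host AVP" as the only sign that a retry cannot
   succeed elsewhere is a simplifying assumption; a real node may
   use additional knowledge.

```python
from typing import Optional

DIAMETER_TOO_BUSY = 3004          # Result-Code value from RFC 6733
DIAMETER_UNABLE_TO_COMPLY = 5012  # Result-Code value from RFC 6733


def overload_result_code(destination_host: Optional[str],
                         own_identity: str) -> int:
    """Pick the error code for a request rejected due to overload."""
    if destination_host == own_identity:
        # A retry would come back to this same node, so tell the
        # sender not to try another path.
        return DIAMETER_UNABLE_TO_COMPLY
    # No Destination-Host (or another host): an alternative node may
    # be able to serve the request.
    return DIAMETER_TOO_BUSY
```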

   If a reporting node knows or assumes that the same request will not
   succeed on a different path, a DIAMETER_UNABLE_TO_COMPLY error
   response SHOULD be used. Retrying would consume valuable resources
   during an occurrence of overload.

      For instance, if the request arrived at the reporting node without
      a Destination-Host AVP then the reporting node might determine
      that there is an alternative Diameter node that could successfully
      process the request and that retrying the transaction would not
      negatively impact the reporting node. DIAMETER_TOO_BUSY would be
      sent in this case.

      If the request arrived at the reporting node with a
      Destination-Host AVP populated with its own Diameter identity then
      the reporting node can assume that retrying the request would
      result in it coming to the same reporting node.
      DIAMETER_UNABLE_TO_COMPLY would be sent in this case.

      A second example is when an agent that supports the DOIC solution
      is performing the role of a reacting node for a non-supporting
      client. Requests that are rejected as a result of DOIC throttling
      by the agent in this scenario would generally be rejected with a
      DIAMETER_UNABLE_TO_COMPLY response code.

9. IANA Considerations

9.1. AVP codes

   New AVPs defined by this specification are listed in Section 7. All
   AVP codes are allocated from the 'Authentication, Authorization, and
   Accounting (AAA) Parameters' AVP Codes registry.

9.2. New registries

   Two new registries are needed under the 'Authentication,
   Authorization, and Accounting (AAA) Parameters' registry.

   A new "Overload Control Feature Vector" registry is required. The
   registry must contain the following:

      Feature Vector Value

      Specification - the specification that defines the new value.

   See Section 7.2 for the initial Feature Vector Value in the registry.
   This specification is the specification defining the value. New
   values can be added into the registry using the Specification
   Required policy [RFC5226].

   A new "Overload Report Type" registry is required. The registry must
   contain the following:

      Report Type Value

      Specification - the specification that defines the new value.

   See Section 7.6 for the initial assignment in the registry. New
   types can be added using the Specification Required policy [RFC5226].

10. Security Considerations

   DOIC gives Diameter nodes the ability to request that downstream
   nodes send fewer Diameter requests. Nodes do this by exchanging
   overload reports that directly effect this reduction. This exchange
   is potentially subject to multiple methods of attack, and has the
   potential to be used as a Denial-of-Service (DoS) attack vector.

   Overload reports may contain information about the topology and
   current status of a Diameter network. This information is
   potentially sensitive. Network operators may wish to control
   disclosure of overload reports to unauthorized parties to avoid its
   use for competitive intelligence or to target attacks.

   Diameter does not include features to provide end-to-end
   authentication, integrity protection, or confidentiality. This may
   cause complications when sending overload reports between
   non-adjacent nodes.

10.1. Potential Threat Modes

   The Diameter protocol involves transactions in the form of requests
   and answers exchanged between clients and servers. These clients and
   servers may be peers, that is, they may share a direct transport
   (e.g. TCP or SCTP) connection, or the messages may traverse one or
   more intermediaries, known as Diameter Agents. Diameter nodes use
   TLS, DTLS, or IPsec to authenticate peers, and to provide
   confidentiality and integrity protection of traffic between peers.
1366 Nodes can make authorization decisions based on the peer identities 1367 authenticated at the transport layer. 1369 When agents are involved, this presents an effectively transitive 1370 trust model. That is, a Diameter client or server can authorize an 1371 agent for certain actions, but it must trust that agent to make 1372 appropriate authorization decisions about its peers, and so on. 1373 Since confidentiality and integrity protection occurs at the 1374 transport layer, agents can read, and perhaps modify, any part of a 1375 Diameter message, including an overload report. 1377 There are several ways an attacker might attempt to exploit the 1378 overload control mechanism. An unauthorized third party might inject 1379 an overload report into the network. If this third party is upstream 1380 of an agent, and that agent fails to apply proper authorization 1381 policies, downstream nodes may mistakenly trust the report. This 1382 attack is at least partially mitigated by the assumption that nodes 1383 include overload reports in Diameter answers but not in requests. 1384 This requires an attacker to have knowledge of the original request 1385 in order to construct an answer. Such an answer would also need to 1386 arrive at a Diameter node via a protected transport connection. 1387 Therefore, implementations MUST validate that an answer containing an 1388 overload report is a properly constructed response to a pending 1389 request prior to acting on the overload report, and that the answer 1390 was received via an appropriate transport connection. 1392 A similar attack involves a compromised but otherwise authorized node 1393 that sends an inappropriate overload report. For example, a server 1394 for the realm "example.com" might send an overload report indicating 1395 that a competitor's realm "example.net" is overloaded. 
If other 1396 nodes act on the report, they may falsely believe that "example.net" 1397 is overloaded, effectively reducing that realm's capacity. 1398 Therefore, it's critical that nodes validate that an overload report 1399 received from a peer actually falls within that peer's responsibility 1400 before acting on the report or forwarding the report to other peers. 1401 For example, an overload report from a peer that applies to a realm 1402 not handled by that peer is suspect. 1404 This attack is partially mitigated by the fact that the 1405 application, as well as host and realm, for a given OLR is 1406 determined implicitly by respective AVPs in the enclosing answer. 1407 If a reporting node modifies any of those AVPs, the enclosing 1408 transaction will also be affected. 1410 10.2. Denial of Service Attacks 1412 Diameter overload reports, especially realm-reports, can cause a node 1413 to cease sending some or all Diameter requests for an extended 1414 period. This makes them a tempting vector for DoS attacks. 1415 Furthermore, since Diameter is almost always used in support of other 1416 protocols, a DoS attack on Diameter is likely to impact those 1417 protocols as well. Therefore, Diameter nodes MUST NOT honor or 1418 forward OLRs received from peers that are not trusted to send them. 1420 An attacker might use the information in an OLR to assist in DoS 1421 attacks. For example, an attacker could use information about 1422 current overload conditions to time an attack for maximum effect, or 1423 use subsequent overload reports as a feedback mechanism to learn the 1424 results of a previous or ongoing attack. Operators need the ability 1425 to ensure that OLRs are not leaked to untrusted parties. 1427 10.3. 
Non-Compliant Nodes 1429 In the absence of an overload control mechanism, Diameter nodes need 1430 to implement strategies to protect themselves from floods of 1431 requests, and to make sure that a disproportionate load from one 1432 source does not prevent other sources from receiving service. For 1433 example, a Diameter server might throttle a certain percentage of 1434 requests from sources that exceed certain limits. Overload control 1435 can be thought of as an optimization for such strategies, where 1436 downstream nodes never send the excess requests in the first place. 1437 However, the presence of an overload control mechanism does not 1438 remove the need for these other protection strategies. 1440 When a Diameter node sends an overload report, it cannot assume that 1441 all nodes will comply, even if they indicate support for DOIC. A 1442 non-compliant node might continue to send requests with no reduction 1443 in load. Such non-compliance could be accidental, or 1444 malicious in order to gain an unfair advantage over compliant nodes. 1445 Requirement 28 [RFC7068] indicates that the overload control solution 1446 cannot assume that all Diameter nodes in a network are trusted, and 1447 that malicious nodes must not be allowed to take advantage of the overload 1448 control mechanism to get more than their fair share of service. 1450 10.4. End-to-End Security Issues 1452 The lack of end-to-end integrity features makes it difficult to 1453 establish trust in overload reports received from non-adjacent nodes. 1454 Any agents in the message path may insert or modify overload reports. 1455 Nodes must trust that their adjacent peers perform proper checks on 1456 overload reports from their peers, and so on, creating a transitive- 1457 trust requirement extending for potentially long chains of nodes. 1459 Network operators must determine if this transitive trust requirement 1460 is acceptable for their deployments.
Nodes supporting Diameter 1461 overload control MUST give operators the ability to select which 1462 peers are trusted to deliver overload reports, and whether they are 1463 trusted to forward overload reports from non-adjacent nodes. DOIC 1464 nodes MUST strip DOIC AVPs from messages received from peers that are 1465 not trusted for DOIC purposes. 1467 The lack of end-to-end confidentiality protection means that any 1468 Diameter agent in the path of an overload report can view the 1469 contents of that report. In addition to the requirement to select 1470 which peers are trusted to send overload reports, operators MUST be 1471 able to select which peers are authorized to receive reports. A node 1472 MUST NOT send an overload report to a peer not authorized to receive 1473 it. Furthermore, an agent MUST remove any overload reports that 1474 might have been inserted by other nodes before forwarding a Diameter 1475 message to a peer that is not authorized to receive overload reports. 1477 A DOIC node cannot always automatically detect that a peer also 1478 supports DOIC. For example, a node might have a peer that is a 1479 non-supporting agent. If nodes on the other side of that agent 1480 send OC-Supported-Features AVPs, the agent is likely to forward 1481 them as unknown AVPs. Messages received across the non-supporting 1482 agent may be indistinguishable from messages received across a 1483 DOIC supporting agent, giving the false impression that the non- 1484 supporting agent actually supports DOIC. This complicates the 1485 transitive-trust nature of DOIC. Operators need to be careful to 1486 avoid situations where a non-supporting agent is mistakenly 1487 trusted to enforce DOIC related authorization policies. 1489 At the time of this writing, the DIME working group is studying 1490 requirements for adding end-to-end security features 1491 [I-D.ietf-dime-e2e-sec-req] to Diameter. 
These features, when they 1492 become available, might make it easier to establish trust in non- 1493 adjacent nodes for overload control purposes. Readers are 1494 reminded, however, that the overload control mechanism encourages 1495 Diameter agents to modify AVPs in, or insert additional AVPs into, 1496 existing messages that are originated by other nodes. If end-to-end 1497 security is enabled, there is a risk that such modification could 1498 violate integrity protection. The details of using any future 1499 Diameter end-to-end security mechanism with overload control will 1500 require careful consideration, and are beyond the scope of this 1501 document. 1503 11. Contributors 1505 The following people contributed substantial ideas, feedback, and 1506 discussion to this document: 1508 o Eric McMurry 1510 o Hannes Tschofenig 1512 o Ulrich Wiehe 1514 o Jean-Jacques Trottin 1516 o Maria Cruz Bartolome 1518 o Martin Dolly 1520 o Nirav Salot 1522 o Susan Shishufeng 1524 12. References 1526 12.1. Normative References 1528 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1529 Requirement Levels", BCP 14, RFC 2119, March 1997. 1531 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an 1532 IANA Considerations Section in RFCs", BCP 26, RFC 5226, 1533 May 2008. 1535 [RFC6733] Fajardo, V., Arkko, J., Loughney, J., and G. Zorn, 1536 "Diameter Base Protocol", RFC 6733, October 2012. 1538 12.2. Informative References 1540 [Cx] 3GPP, "ETSI TS 129 229 V11.4.0", August 2013. 1542 [I-D.ietf-dime-e2e-sec-req] 1543 Tschofenig, H., Korhonen, J., Zorn, G., and K. Pillay, 1544 "Diameter AVP Level Security: Scenarios and Requirements", 1545 draft-ietf-dime-e2e-sec-req-01 (work in progress), October 1546 2013. 1548 [PCC] 3GPP, "ETSI TS 123 203 V11.12.0", December 2013. 1550 [RFC4006] Hakala, H., Mattila, L., Koskinen, J-P., Stura, M., and J. 1551 Loughney, "Diameter Credit-Control Application", RFC 4006, 1552 August 2005. 1554 [RFC7068] McMurry, E.
and B. Campbell, "Diameter Overload Control 1555 Requirements", RFC 7068, November 2013. 1557 [S13] 3GPP, "ETSI TS 129 272 V11.9.0", December 2012. 1559 Appendix A. Issues left for future specifications 1561 The base solution for overload control does not cover all 1562 possible use cases. A number of solution aspects were intentionally 1563 left for future specification and protocol work. The following sub- 1564 sections describe some of the potential extensions to the DOIC 1565 solution. 1567 A.1. Additional traffic abatement algorithms 1569 This specification describes only a simple loss-based 1570 algorithm. Future algorithms can be added using the 1571 solution's extension mechanism. The new algorithms need to be 1572 registered with IANA. See Sections 7.1 and 9 for the required IANA 1573 steps. 1575 A.2. Agent Overload 1577 This specification focuses on Diameter endpoint (server or client) 1578 overload. A separate extension will be required to describe the 1579 handling of agent overload. 1581 A.3. New Error Diagnostic AVP 1583 This specification indicates the use of existing error messages when 1584 nodes reject requests due to overload. The DIME working group is 1585 considering defining additional error codes or AVPs to indicate that 1586 overload was the reason for the rejection of the message. 1588 Appendix B. Deployment Considerations 1590 Non-Supporting Agents 1592 Because realm-routed requests in Diameter networks have their 1593 server selected by an 1594 agent, network operators should first enable DOIC at the agents that perform 1595 server selection. 1597 Topology Hiding Interactions 1598 There exist proxies that implement what is referred to as Topology 1599 Hiding. This can include cases where the agent modifies the 1600 Origin-Host in answer messages. The behavior of the DOIC solution 1601 is not well understood when this happens.
As such, the DOIC 1602 solution does not address this scenario. 1604 Appendix C. Requirements Conformance Analysis 1606 This section contains the result of an analysis of the DOIC solution's 1607 conformance to the requirements defined in [RFC7068]. 1609 C.1. Deferred Requirements 1611 The 3GPP has adopted an early version of this document as a normative 1612 reference in various Diameter-related specifications to support the 1613 overload control mechanism in their release 12 framework. The DIME 1614 working group has therefore decided to defer certain requirements in 1615 order to complete the design of an extensible, generic solution 1616 before the deadline scheduled by the 3GPP for the completion of the 1617 release 12 protocol work by the end of 2014. The deferred work 1618 includes the following: 1620 o Agent Overload - The ability for an agent to report an overload 1621 condition of the agent itself. 1623 o Load Information - The ability for a node to report its load level 1624 when not overloaded. 1626 At the time of this writing, DIME has begun separate work efforts for 1627 these requirements. 1629 C.2. Detection of non-supporting Intermediaries 1631 The DOIC mechanism as currently defined does not allow supporting 1632 nodes to automatically determine whether OC-Supported-Features or OC- 1633 OLR AVPs are originated by a peer node, or by a non-peer node and 1634 sent across a non-supporting peer. This makes it impossible to 1635 detect the presence of non-supporting nodes between supporting nodes, 1636 except by configuration. The working group determined that such a 1637 configuration requirement is acceptable. 1639 This prevents full compliance with certain requirements related to 1640 minimizing new configuration, deployment in environments with 1641 mixed support, operating across non-supporting agents, and 1642 authorization. 1644 C.3.
Implicit Application Indication 1646 The working group elected to determine the application for an 1647 overload report from that of the enclosing message. This prevents 1648 sending an OLR for an application when there are no transactions for 1649 that application. 1651 As a consequence, DOIC does not comply with the requirement to be 1652 able to report overload information across quiescent connections. 1653 DOIC does not fully comply with requirements to operate on up-to-date 1654 information, since if an OLR causes all transactions to stop for an 1655 application, the only way traffic will resume is for the OLR to 1656 expire. 1658 C.4. Stateless Operation 1660 RFC7068 explicitly discourages the sending of OLRs in every answer 1661 message, as part of the requirement to avoid additional work for 1662 overloaded nodes. DOIC recommends exactly that behavior during 1663 active overload conditions. The working group determined that doing 1664 otherwise would reduce reliability and increase statefulness. (Note 1665 that DOIC does allow nodes to avoid sending OLRs in every answer if 1666 they have some other method of ensuring that OLRs get to all relevant 1667 reacting nodes.) 1669 C.5. No New Vulnerabilities 1671 The working group believes that DOIC is compliant with the 1672 requirement to avoid introducing new vulnerabilities. However, this 1673 requirement may warrant an early security expert review. 1675 C.6. Detailed Requirements 1677 [RFC Editor: Please remove this section and subsections prior to 1678 publication as an RFC.] 1680 C.6.1. General 1682 REQ 1: The solution MUST provide a communication method for Diameter 1683 nodes to exchange load and overload information. 1685 *Partially Compliant*. The mechanism uses new AVPs 1686 piggybacked on existing Diameter messages to exchange 1687 overload information. It does not currently support "load" 1688 information or the ability to report overload of an agent. 1689 These have been left for future extensions. 
1691 REQ 2: The solution MUST allow Diameter nodes to support overload 1692 control regardless of which Diameter applications they 1693 support. Diameter clients and agents must be able to use the 1694 received load and overload information to support graceful 1695 behavior during an overload condition. Graceful behavior 1696 under overload conditions is best described by REQ 3. 1698 *Partially Compliant*. The DOIC AVPs can be used in any 1699 application that allows the extension of AVPs. However, 1700 "load" information is not currently supported. 1702 REQ 3: The solution MUST limit the impact of overload on the overall 1703 useful throughput of a Diameter server, even when the 1704 incoming load on the network is far in excess of its 1705 capacity. The overall useful throughput under load is the 1706 ultimate measure of the value of a solution. 1708 *Compliant*. DOIC provides information that nodes can use to 1709 reduce the impact of overload. 1711 REQ 4: Diameter allows requests to be sent from either side of a 1712 connection, and either side of a connection may have need to 1713 provide its overload status. The solution MUST allow each 1714 side of a connection to independently inform the other of its 1715 overload status. 1717 *Compliant*. DOIC AVPs can be included regardless of 1718 transaction "direction". 1720 REQ 5: Diameter allows nodes to determine their peers via dynamic 1721 discovery or manual configuration. The solution MUST work 1722 consistently without regard to how peers are determined. 1724 *Compliant*. DOIC makes no assumptions about how peers are 1725 discovered. 1727 REQ 6: The solution designers SHOULD seek to minimize the amount of 1728 new configuration required in order to work. For example, it 1729 is better to allow peers to advertise or negotiate support 1730 for the solution, rather than to require that this knowledge 1731 to be configured at each node. 1733 *Partially Compliant*.
Most DOIC parameters are advertised 1734 using the DOIC capability announcement mechanism. However, 1735 there are some situations where configuration is required. 1736 For example, a DOIC node cannot detect, without configuration, that a peer does not 1737 support DOIC when nodes on the other side of the non- 1738 supporting peer do support DOIC. 1740 C.6.2. Performance 1742 REQ 7: The solution and any associated default algorithm(s) MUST 1743 ensure that the system remains stable. At some point after 1744 an overload condition has ended, the solution MUST enable 1745 capacity to stabilize and become equal to what it would be in 1746 the absence of an overload condition. Note that this also 1747 requires that the solution MUST allow nodes to shed load 1748 without introducing non-converging oscillations during or 1749 after an overload condition. 1751 *Compliant*. The specification advises 1752 implementations to apply hysteresis when recovering from 1753 overload and to avoid sudden ramp-ups in offered load. 1756 REQ 8: Supporting nodes MUST be able to distinguish current overload 1757 information from stale information. 1759 *Partially Compliant*. DOIC overload reports are "soft 1760 state"; that is, they expire after an indicated period. DOIC 1761 nodes may also send reports that end existing overload 1762 conditions. DOIC requires reporting nodes to ensure that all 1763 relevant reacting nodes receive overload reports. 1765 However, since DOIC does not allow reporting nodes to send 1766 OLRs in watchdog messages, if an overload condition results 1767 in zero offered load, the reporting node cannot update the 1768 condition until the expiration of the original OLR. 1770 REQ 9: The solution MUST function across fully loaded as well as 1771 quiescent transport connections. This is partially derived 1772 from the requirement for stability in REQ 7. 1774 *Not Compliant*.
DOIC does not allow OLRs to be sent over 1775 quiescent transport connections. This is because 1776 OLRs cannot be sent outside of the application to which 1777 they apply. 1779 REQ 10: Consumers of overload information MUST be able to determine 1780 when the overload condition improves or ends. 1782 *Partially Compliant*. (See the responses to the previous two 1783 requirements.) 1785 REQ 11: The solution MUST be able to operate in networks of different 1786 sizes. 1788 *Compliant*. DOIC makes no assumptions about the size of the 1789 network. DOIC can operate purely between clients and 1790 servers, or across agents. 1792 REQ 12: When a single network node fails, goes into overload, or 1793 suffers from reduced processing capacity, the solution MUST 1794 make it possible to limit the impact of the affected node on 1795 other nodes in the network. This helps to prevent a small- 1796 scale failure from becoming a widespread outage. 1798 *Partially Compliant*. DOIC allows overload reports for an 1799 entire realm, where abated traffic will not be redirected 1800 towards another server. But in situations where nodes choose 1801 to divert traffic to other nodes, DOIC offers no way of 1802 knowing whether the new recipients can handle the traffic if 1803 they have not already indicated overload. This may be 1804 mitigated with the use of a future "load" extension, or with 1805 the use of proprietary dynamic load-balancing mechanisms. 1807 REQ 13: The solution MUST NOT introduce substantial additional work 1808 for a node in an overloaded state. For example, a 1809 requirement for an overloaded node to send overload 1810 information every time it received a new request would 1811 introduce substantial work. 1813 *Not Compliant*. DOIC does in fact encourage an overloaded 1814 node to send an OLR in every response. The working group 1815 determined that other mechanisms to ensure that every relevant node 1816 receives an OLR would create even more work.
[Note: This 1817 needs discussion.] 1819 REQ 14: Some scenarios that result in overload involve a rapid 1820 increase of traffic with little time between normal levels 1821 and levels that induce overload. The solution SHOULD provide 1822 for rapid feedback when traffic levels increase. 1824 *Compliant*. The piggyback mechanism allows OLRs to be sent 1825 at the same rate as application traffic. 1827 REQ 15: The solution MUST NOT interfere with the congestion control 1828 mechanisms of underlying transport protocols. For example, a 1829 solution that opened additional TCP connections when the 1830 network is congested would reduce the effectiveness of the 1831 underlying congestion control mechanisms. 1833 *Compliant*. DOIC does not require or recommend changes in 1834 the handling of transport protocols or connections. 1836 C.6.3. Heterogeneous Support for Solution 1838 REQ 16: The solution is likely to be deployed incrementally. The 1839 solution MUST support a mixed environment where some, but not 1840 all, nodes implement it. 1842 *Partially Compliant*. DOIC works with most mixed-deployment 1843 scenarios. However, it cannot work across a non-supporting 1844 proxy that modifies Origin-Host AVPs in answer messages. 1845 DOIC will have limited impact in networks where the nodes 1846 that perform server selection do not support the mechanism. 1848 REQ 17: In a mixed environment with nodes that support the solution 1849 and nodes that do not, the solution MUST NOT result in 1850 materially less useful throughput during overload as would 1851 have resulted if the solution were not present. It SHOULD 1852 result in less severe overload in this environment. 1854 *Compliant*. In most mixed-support deployments, DOIC will 1855 offer at least some value, and will not make things worse.
1857 REQ 18: In a mixed environment of nodes that support the solution and 1858 nodes that do not, the solution MUST NOT preclude elements 1859 that support overload control from treating elements that do 1860 not support overload control in an equitable fashion relative 1861 to those that do. Users and operators of nodes that do not 1862 support the solution MUST NOT unfairly benefit from the 1863 solution. The solution specification SHOULD provide guidance 1864 to implementers for dealing with elements not supporting 1865 overload control. 1867 *Compliant*. DOIC provides mechanisms to abate load from non- 1868 supporting sources. Furthermore, it recommends that 1869 reporting nodes remain able to apply whatever 1870 protections they would ordinarily apply if DOIC were not in 1871 use. 1873 REQ 19: It MUST be possible to use the solution between nodes in 1874 different realms and in different administrative domains. 1876 *Partially Compliant*. DOIC allows sending OLRs across 1877 administrative domains, and potentially to nodes in other 1878 realms. However, an OLR cannot indicate overload for realms 1879 other than the one in the Origin-Realm AVP of the containing 1880 answer. 1882 REQ 20: Any explicit overload indication MUST be clearly 1883 distinguishable from other errors reported via Diameter. 1885 *Compliant*. DOIC sends explicit overload indications in 1886 overload reports. It does not depend on error result codes. 1888 REQ 21: In cases where a network node fails, is so overloaded that it 1889 cannot process messages, or cannot communicate due to a 1890 network failure, it may not be able to provide explicit 1891 indications of the nature of the failure or its levels of 1892 overload. The solution MUST result in at least as much 1893 useful throughput as would have resulted if the solution were 1894 not in place. 1896 *Compliant*. DOIC overload reports have the primary effect of 1897 suppressing message retries in overload conditions.
DOIC 1898 recommends that messages never be silently dropped if at all 1899 possible. 1901 C.6.4. Granular Control 1903 REQ 22: The solution MUST provide a way for a node to throttle the 1904 amount of traffic it receives from a peer node. This 1905 throttling SHOULD be graded so that it can be applied 1906 gradually as offered load increases. Overload is not a 1907 binary state; there may be degrees of overload. 1909 *Compliant*. The "loss" algorithm expresses a percentage 1910 reduction. 1912 REQ 23: The solution MUST provide sufficient information to enable a 1913 load-balancing node to divert messages that are rejected or 1914 otherwise throttled by an overloaded upstream node to other 1915 upstream nodes that are the most likely to have sufficient 1916 capacity to process them. 1918 *Not Compliant*. DOIC provides no built in mechanism to 1919 determine the best place to divert messages that would 1920 otherwise be throttled. This can be accomplished with a 1921 future "load" extension, or with proprietary load balancing 1922 mechanisms. 1924 REQ 24: The solution MUST provide a mechanism for indicating load 1925 levels, even when not in an overload condition, to assist 1926 nodes in making decisions to prevent overload conditions from 1927 occurring. 1929 *Not Compliant*. "Load" information has been left for a 1930 future extension. 1932 C.6.5. Priority and Policy 1934 REQ 25: The base specification for the solution SHOULD offer general 1935 guidance on which message types might be desirable to send or 1936 process over others during times of overload, based on 1937 application-specific considerations. For example, it may be 1938 more beneficial to process messages for existing sessions 1939 ahead of new sessions. Some networks may have a requirement 1940 to give priority to requests associated with emergency 1941 sessions. 
Any normative or otherwise detailed definition of 1942 the relative priorities of message types during an overload 1943 condition will be the responsibility of the application 1944 specification. 1946 *Compliant*. The specification offers guidance on how 1947 requests might be prioritized for different types of 1948 applications. 1950 REQ 26: The solution MUST NOT prevent a node from prioritizing 1951 requests based on any local policy, so that certain requests 1952 are given preferential treatment, given additional 1953 retransmission, not throttled, or processed ahead of others. 1955 *Compliant*. Nothing in the specification prevents 1956 application-specific, implementation-specific, or local 1957 policies. 1959 C.6.6. Security 1961 REQ 27: The solution MUST NOT provide new vulnerabilities to 1962 malicious attack or increase the severity of any existing 1963 vulnerabilities. This includes vulnerabilities to DoS and 1964 DDoS attacks as well as replay and man-in-the-middle attacks. 1965 Note that the Diameter base specification [RFC6733] lacks 1966 end-to-end security and this must be considered (see the 1967 Security Considerations in [RFC7068]). Note that this 1968 requirement was expressed at a high level so as to not 1969 preclude any particular solution. It is expected that the 1970 solution will address this in more detail. 1972 *Compliant*. The working group is not aware of any such 1973 vulnerabilities. [This may need further analysis.] 1975 REQ 28: The solution MUST NOT depend on being deployed in 1976 environments where all Diameter nodes are completely trusted. 1977 It SHOULD operate as effectively as possible in environments 1978 where other nodes are malicious; this includes preventing 1979 malicious nodes from obtaining more than a fair share of 1980 service. Note that this does not imply any responsibility on 1981 the solution to detect, or take countermeasures against, 1982 malicious nodes. 1984 *Partially Compliant*. 
Since all Diameter security is 1985 currently at the transport layer, nodes must trust immediate 1986 peers to enforce trust policies. However, there are 1987 situations where a DOIC node cannot determine if an immediate 1988 peer supports DOIC. The authors recommend an expert security 1989 review. 1991 REQ 29: It MUST be possible for a supporting node to make 1992 authorization decisions about what information will be sent 1993 to peer nodes based on the identity of those nodes. This 1994 allows a domain administrator who considers the load of their 1995 nodes to be sensitive information to restrict access to that 1996 information. Of course, in such cases, there is no 1997 expectation that the solution itself will help prevent 1998 overload from that peer node. 2000 *Partially Compliant*. (See response to previous 2001 requirement.) 2003 REQ 30: The solution MUST NOT interfere with any Diameter-compliant 2004 method that a node may use to protect itself from overload 2005 from non-supporting nodes or from denial-of-service attacks. 2007 *Compliant*. The specification recommends that any such 2008 protection mechanism needed without DOIC should continue to 2009 be employed with DOIC. 2011 C.6.7. Flexibility and Extensibility 2013 REQ 31: There are multiple situations where a Diameter node may be 2014 overloaded for some purposes but not others. For example, 2015 this can happen to an agent or server that supports multiple 2016 applications, or when a server depends on multiple external 2017 resources, some of which may become overloaded while others 2018 are fully available. The solution MUST allow Diameter nodes 2019 to indicate overload with sufficient granularity to allow 2020 clients to take action based on the overloaded resources 2021 without unreasonably forcing available capacity to go unused. 
2022 The solution MUST support specification of overload 2023 information with granularities of at least "Diameter node", 2024 "realm", and "Diameter application" and MUST allow 2025 extensibility for others to be added in the future. 2027 *Partially Compliant*. All DOIC overload reports are scoped 2028 to a specific application and realm. Inside that scope, 2029 overload can be reported for a specific server or for the whole 2030 realm. As currently specified, DOIC cannot indicate 2031 local overload for an agent. At the time of this writing, 2032 the DIME working group has plans to work on an agent-overload 2033 extension. 2035 DOIC allows new "scopes" through the use of extended report 2036 types. 2038 REQ 32: The solution MUST provide a method for extending the 2039 information communicated and the algorithms used for overload 2040 control. 2042 *Compliant*. DOIC allows new report types and abatement 2043 algorithms to be created. These may be indicated using the 2044 OC-Supported-Features AVP. 2046 REQ 33: The solution MUST provide a default algorithm that is 2047 mandatory to implement. 2049 *Compliant*. The "loss" algorithm is mandatory to implement. 2051 REQ 34: The solution SHOULD provide a method for exchanging overload 2052 and load information between elements that are connected by 2053 intermediaries that do not support the solution. 2055 *Partially Compliant*. DOIC information can traverse non- 2056 supporting agents, as long as those agents do not modify 2057 certain AVPs (e.g., Origin-Host). DOIC does not provide a 2058 way for supporting nodes to detect such modification. 2060 Appendix D. Considerations for Applications Integrating the DOIC 2061 Solution 2063 This section outlines considerations to be taken into account when 2064 integrating the DOIC solution into Diameter applications. 2066 D.1. Application Classification 2068 The following is a classification of Diameter applications and 2069 request types.
   This discussion is meant to document factors that play into
   decisions made by the Diameter identity responsible for handling
   overload reports.

   Section 8.1 of [RFC6733] defines two state machines that imply two
   types of applications: session-less and session-based applications.
   The primary difference between these types of applications is the
   lifetime of Session-Ids.

   For session-based applications, the Session-Id is used to tie
   multiple requests into a single session.

   The Credit-Control application defined in [RFC4006] is an example of
   a Diameter session-based application.

   In session-less applications, the lifetime of the Session-Id is a
   single Diameter transaction, i.e., the session is implicitly
   terminated after a single Diameter transaction and a new Session-Id
   is generated for each Diameter request.

   For the purposes of this discussion, session-less applications are
   further divided into two types of applications:

   Stateless Applications:

      Requests within a stateless application have no relationship to
      each other. The 3GPP-defined S13 application is an example of a
      stateless application [S13], where only a Diameter command is
      defined between a client and a server and no state is maintained
      between two consecutive transactions.

   Pseudo-Session Applications:

      Applications that do not rely on the Session-Id AVP for
      correlation of application messages related to the same session
      but use other session-related information in the Diameter
      requests for this purpose. The 3GPP-defined Cx application [Cx]
      is an example of a pseudo-session application.

   The handling of overload reports must take the type of application
   into consideration, as discussed in Appendix D.2.

D.2. Application Type Overload Implications

   This section discusses considerations for mitigating overload
   reported by a Diameter entity.
   This discussion focuses on the type of application. Appendix D.3
   discusses considerations for handling various request types when the
   target server is known to be in an overloaded state.

   These discussions assume that the strategy for mitigating the
   reported overload is to reduce the overall workload sent to the
   overloaded entity. The concept of applying overload treatment to
   requests targeted for an overloaded Diameter entity is inherent to
   this discussion. The method used to reduce offered load is not
   specified here but could include routing requests to another
   Diameter entity known to be able to handle them, or it could mean
   rejecting certain requests. For a Diameter agent, rejecting requests
   will usually mean generating appropriate Diameter error responses.
   For a Diameter client, rejecting requests will depend upon the
   application. For example, it could mean giving an indication to the
   entity requesting the Diameter service that the network is busy and
   to try again later.

   Stateless Applications:

      By definition there is no relationship between individual
      requests in a stateless application. As a result, when a request
      is sent or relayed to an overloaded Diameter entity - either a
      Diameter Server or a Diameter Agent - the sending or relaying
      entity can choose to apply the overload treatment to any request
      targeted for the overloaded entity.

   Pseudo-Session Applications:

      For pseudo-session applications, there is an implied ordering of
      requests. As a result, decisions about which requests towards an
      overloaded entity to reject could take the command code of the
      request into consideration. This generally means that
      transactions later in the sequence of transactions should be
      given more favorable treatment than messages earlier in the
      sequence.
      This is because more work has already been done by the Diameter
      network for those transactions that occur later in the sequence.
      Rejecting them could result in increasing the load on the network
      as the transactions earlier in the sequence might also need to be
      repeated.

   Session-Based Applications:

      Overload handling for session-based applications must take into
      consideration the workload associated with setting up and
      maintaining a session. As such, the entity sending requests
      towards an overloaded Diameter entity for a session-based
      application might tend to reject new session requests prior to
      rejecting intra-session requests. In addition, session-ending
      requests might be given a lower probability of being rejected, as
      rejecting session-ending requests could result in session status
      being out of sync between the Diameter clients and servers.
      Application designers that decide to reject mid-session requests
      will need to consider whether the rejection invalidates the
      session and any resulting session cleanup procedures.

D.3. Request Transaction Classification

   Independent Request:

      An independent request is not correlated to any other requests
      and, as such, the lifetime of the Session-Id is constrained to an
      individual transaction.

   Session-Initiating Request:

      A session-initiating request is the initial message that
      establishes a Diameter session. The ACR message defined in
      [RFC6733] is an example of a session-initiating request.

   Correlated Session-Initiating Request:

      There are cases when multiple session-initiating requests must be
      correlated and managed by the same Diameter server. This is
      notably the case in the 3GPP PCC architecture [PCC], where
      multiple apparently independent Diameter application sessions are
      actually correlated and must be handled by the same Diameter
      server.

   Intra-Session Request:

      An intra-session request is a request that uses the same
      Session-Id as the one used in a previous request. An
      intra-session request generally needs to be delivered to the
      server that handled the session-creating request for the session.
      The STR message defined in [RFC6733] is an example of an
      intra-session request.

   Pseudo-Session Requests:

      Pseudo-session requests are independent requests and do not use
      the same Session-Id but are correlated by other session-related
      information contained in the request. There exist Diameter
      applications that define an expected ordering of transactions.
      This sequencing of independent transactions results in a pseudo
      session. The AIR, MAR and SAR requests in the 3GPP-defined Cx
      [Cx] application are examples of pseudo-session requests.

D.4. Request Type Overload Implications

   The request classes identified in Appendix D.3 have implications on
   decisions about which requests should be throttled first. The
   following list of request treatments regarding throttling is
   provided as a set of guidelines for application designers when
   implementing the Diameter overload control mechanism described in
   this document. The exact behavior regarding throttling is a matter
   of local policy, unless specifically defined for the application.

   Independent Requests:

      Independent requests can generally be given equal treatment when
      making throttling decisions, unless otherwise indicated by
      application requirements or local policy.

   Session-Initiating Requests:

      Session-initiating requests often represent more work than
      independent or intra-session requests. Moreover,
      session-initiating requests are typically followed by other
      session-related requests.
      Since the main objective of the overload control is to reduce the
      total number of requests sent to the overloaded entity,
      throttling decisions might favor allowing intra-session requests
      over session-initiating requests. In the absence of local
      policies or application-specific requirements to the contrary,
      individual session-initiating requests can be given equal
      treatment when making throttling decisions.

   Correlated Session-Initiating Requests:

      A request that results in a new binding, where the binding is
      used for routing of subsequent session-initiating requests to the
      same server, represents more workload than other requests. As
      such, these requests might be throttled more frequently than
      other request types.

   Pseudo-Session Requests:

      Throttling decisions for pseudo-session requests can take into
      consideration where individual requests fit into the overall
      sequence of requests within the pseudo session. Requests that are
      earlier in the sequence might be throttled more aggressively than
      requests that occur later in the sequence.

   Intra-Session Requests:

      There are two types of intra-session requests: requests that
      terminate a session and all other intra-session requests.
      Implementers and operators may choose to throttle
      session-terminating requests less aggressively in order to
      gracefully terminate sessions, allow cleanup of the related
      resources (e.g., session state) and avoid the need for additional
      intra-session requests. Favoring session-termination requests may
      reduce the session management impact on the overloaded entity.
      The default handling of other intra-session requests might be to
      treat them equally when making throttling decisions. There might
      also be application-level considerations as to whether some
      request types are favored over others.
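
   As an illustration only, the guidelines in this appendix could be
   sketched in Python as shown below. This is not part of the DOIC
   specification. The names RequestClass, THROTTLE_WEIGHT,
   throttle_probability and should_throttle, the numeric weights, and
   the sequence_position parameter are all hypothetical and stand in
   for local policy. The sketch assumes that the reacting node has
   been asked to reduce traffic by some percentage, and it biases
   per-request throttling by request class.

```python
import random
from enum import Enum


class RequestClass(Enum):
    """Request classes from Appendix D.3, with intra-session requests
    split into session-terminating and other requests as in D.4."""
    INDEPENDENT = "independent"
    SESSION_INITIATING = "session-initiating"
    CORRELATED_SESSION_INITIATING = "correlated-session-initiating"
    PSEUDO_SESSION = "pseudo-session"
    INTRA_SESSION = "intra-session"
    SESSION_TERMINATING = "session-terminating"


# Hypothetical local-policy weights (not defined by DOIC). A weight
# above 1.0 throttles a class more aggressively than the baseline,
# a weight below 1.0 less aggressively.
THROTTLE_WEIGHT = {
    RequestClass.CORRELATED_SESSION_INITIATING: 1.5,
    RequestClass.SESSION_INITIATING: 1.25,
    RequestClass.INDEPENDENT: 1.0,
    RequestClass.PSEUDO_SESSION: 1.0,
    RequestClass.INTRA_SESSION: 0.75,
    RequestClass.SESSION_TERMINATING: 0.25,
}


def throttle_probability(request_class, reduction_percentage,
                         sequence_position=0.0):
    """Return the probability in [0, 1] of throttling one request.

    reduction_percentage: requested traffic reduction, 0 to 100.
    sequence_position: used for pseudo-session requests only; 0.0 for
    the first transaction in the expected sequence, 1.0 for the last.
    """
    if not 0 <= reduction_percentage <= 100:
        raise ValueError("reduction_percentage must be in 0..100")
    if not 0.0 <= sequence_position <= 1.0:
        raise ValueError("sequence_position must be in 0.0..1.0")
    weight = THROTTLE_WEIGHT[request_class]
    if request_class is RequestClass.PSEUDO_SESSION:
        # Later transactions represent more completed work, so they
        # are throttled less aggressively than earlier ones.
        weight *= 1.0 - 0.5 * sequence_position
    return min(1.0, (reduction_percentage / 100.0) * weight)


def should_throttle(request_class, reduction_percentage,
                    sequence_position=0.0, rng=random.random):
    """Randomly decide whether to throttle a single request."""
    probability = throttle_probability(
        request_class, reduction_percentage, sequence_position)
    return rng() < probability
```

   A node using such a policy would call should_throttle() once per
   outgoing request. Note that class-specific weights of this kind do
   not by themselves guarantee that the aggregate reduction matches
   the requested percentage; a real implementation would need to
   account for the actual traffic mix.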

Authors' Addresses

   Jouni Korhonen (editor)
   Broadcom
   Porkkalankatu 24
   Helsinki FIN-00180
   Finland

   Email: jouni.nospam@gmail.com


   Steve Donovan (editor)
   Oracle
   7460 Warren Parkway
   Frisco, Texas 75034
   United States

   Email: srdonovan@usdonovans.com


   Ben Campbell
   Oracle
   7460 Warren Parkway
   Frisco, Texas 75034
   United States

   Email: ben@nostrum.com


   Lionel Morand
   Orange Labs
   38/40 rue du General Leclerc
   Issy-Les-Moulineaux Cedex 9 92794
   France

   Phone: +33145296257
   Email: lionel.morand@orange.com