idnits 2.17.00 (12 Aug 2021) /tmp/idnits10391/draft-lu-fn-transport-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 24) being 61 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There is 1 instance of too long lines in the document, the longest one being 1 character in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document doesn't use any RFC 2119 keywords, yet seems to have RFC 2119 boilerplate text. -- The document date (March 14, 2011) is 4086 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'OSPF' is mentioned on line 437, but not defined == Missing Reference: 'RFC2154' is mentioned on line 499, but not defined == Unused Reference: 'RFC2328' is defined on line 566, but no explicit reference was found in the text Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group W. Lu 3 Internet-Draft S. Kini 4 Intended status: Standards Track A. Csaszar 5 Expires: September 15, 2011 G. Enyedi 6 J. Tantsura 7 A. Tian 8 Ericsson 9 March 14, 2011 11 Transport of Fast Notification Messages 12 draft-lu-fn-transport-01 14 Abstract 16 This document specifies a fast, light-weight event notification 17 protocol, called Fast Notification (FN) protocol. The draft 18 discusses the design goals, the message container and options for 19 delivering the notifications to all routers within a routing 20 area. 22 Status of this Memo 24 This Internet-Draft is submitted in full conformance with the 25 provisions of BCP 78 and BCP 79. 27 Internet-Drafts are working documents of the Internet Engineering 28 Task Force (IETF). Note that other groups may also distribute 29 working documents as Internet-Drafts. The list of current Internet- 30 Drafts is at http://datatracker.ietf.org/drafts/current/. 32 Internet-Drafts are draft documents valid for a maximum of six months 33 and may be updated, replaced, or obsoleted by other documents at any 34 time. It is inappropriate to use Internet-Drafts as reference 35 material or to cite them other than as "work in progress." 37 This Internet-Draft will expire on September 15, 2011. 39 Copyright Notice 41 Copyright (c) 2011 IETF Trust and the persons identified as the 42 document authors. All rights reserved. 44 This document is subject to BCP 78 and the IETF Trust's Legal 45 Provisions Relating to IETF Documents 46 (http://trustee.ietf.org/license-info) in effect on the date of 47 publication of this document. Please review these documents 48 carefully, as they describe your rights and restrictions with respect 49 to this document. Code Components extracted from this document must 50 include Simplified BSD License text as described in Section 4.e of 51 the Trust Legal Provisions and are provided without warranty as 52 described in the Simplified BSD License. 54 Table of Contents 56 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 57 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 58 1.2. Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . 3 59 2. Design Goals . . . . . . . . . . . . . . . . . . . . . . . . . 4 60 3. Transport Logic - Distribution of the Notifications . . . . . 4 61 3.1. Multicast Tree-based Transport . . . . . . . . . . . . . . 4 62 3.1.1. Fault Tolerance of a Single Distribution Tree . . . . 5 63 3.1.2. Pair of Redundant Trees . . . . . . . . . . . . . . . 5 64 4. Message Encoding . . . . . . . . . . . . . . . . . . . . . . . 7 65 4.1. Seamless Encapsulation . . . . . . . . . . . . . . . . . . 7 66 4.2. Dedicated FN Message . . . . . . . . . . . . . . . . . . . 7 67 4.2.1. Authentication . . . . . . . . . . . . . . . . . . . . 9 68 4.2.1.1. Areas-scoped and Link-scoped Authentication . . . 10 69 4.2.1.2. Simple Password Authentication . . . . . . . . . . 10 70 4.2.1.3. Cryptographic Authentication for FN . . . . . . . 10 71 4.2.1.4. MD5 . . . . . . . . . . . . . . . . . . . . . . . 11 72 4.2.1.5. Digital Signatures . . . . . . . . . . . . . . . . 12 73 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 12 74 6. Security Considerations . . . . . . . . . . . . . . . . . . . 12 75 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 76 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13 77 8.1. Normative References . . . . . . . . . . . . . . . . . . . 13 78 8.2. Informative References . . . . . . . . . . . . . . . . . . 13 79 Appendix A. Further Options for Transport Logic . . . . . . . . . 13 80 A.1. Unicast . . . . . . . . . . . . . . . . . . . . . . . . . 14 81 A.1.1. Method . . . . . . . . . . . . . . . . . . . . . . . . 14 82 A.1.2. Sample Operation . . . . . . . . . . . . . . . . . . . 15 83 A.2. Gated Multicast through RPF Check . . . . . . . . . . . . 15 84 A.2.1. Loop Prevention - RPF Check . . . . . . . . . . . . . 16 85 A.2.2. Operation . . . . . . . . . . . . . . . . . . . . . . 16 86 A.3. Further Multicast Tree based Transport Options . . . . . . 18 87 A.3.1. Source Specific Trees . . . . . . . . . . . . . . . . 18 88 A.3.2. A Single Bidirectional Shared Tree . . . . . . . . . . 18 89 A.4. Layer 2 Networks . . . . . . . . . . . . . . . . . . . . . 19 90 Appendix B. Computing maximally redundant trees . . . . . . . . . 19 91 B.1. Simple pair of maximally redundant trees in 92 2-connected networks . . . . . . . . . . . . . . . . . . . 19 93 B.2. Non-2-connected networks . . . . . . . . . . . . . . . . . 21 94 B.3. Finding maximally redundant trees in distributed 95 environment . . . . . . . . . . . . . . . . . . . . . . . 22 96 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 23 98 1. Introduction 100 Draft [fn-framework] describes the architectural framework to enable 101 fast dissemination of a network event to routers in a limited area. 102 Existing use cases involve new approaches for IP Fast ReRoute such as 103 [ipfrr-fn], and faster dissemination of link state information for 104 routing protocols [ospf-fn] in order to speed up convergence. 106 A hop by hop control plane based flooding mechanism is used widely 107 today in link state routing protocols such as OSPF and ISIS to 108 propagate routing information throughout an area. In this mechanism, 109 the information is processed in the control plane at each hop before 110 being forwarded to the next. The extra processing, scheduling, and 111 communications overhead causes unnecessary delays in the 112 dissemination of the information. 114 This draft proposes a generic fast notification (FN) protocol as a 115 separate transport layer, which focuses on delivering notifications 116 quickly in a secure manner. It can be used by many existing 117 applications to enhance the performance of those applications, as 118 well as to enable new services in the network. 120 1.1. Requirements Language 122 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 123 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 124 document are to be interpreted as described in RFC 2119 [RFC2119]. 126 1.2. Acronyms 128 FN - Fast Notification 130 RPF - Reverse Path Forwarding 132 IGP - Interior Gateway Protocol 134 SPT - Shortest Path Tree 136 STP - Spanning Tree Protocol 138 OSPF - Open Shortest Path First 140 IS-IS - Intermediate System to Intermediate System 142 MD5 - Message Digest 5 144 2. Design Goals 146 A light-weight event notification mechanism that could be used to 147 facilitate quick dissemination of information in a limited area 148 should have the following properties. 150 1. The mechanism should be fast. It should provide low end to end 151 propagation delay for the notifications. 153 2. The signaling mechanism should offer a high degree of reliability 154 under network failure conditions. 156 3. The mechanism should be secure; that is, it should provide means 157 to verify the authenticity of the notifications. 159 4. The new protocol should not be dependent upon routing protocol 160 flooding procedures. 162 5. The mechanism should have low processing overhead 164 These design goals present a trade-off. Proper balance needs to be 165 found that offers good authentication and reliability while keeping 166 processing complexity sufficiently low. This draft proposes 167 solutions that take the above goals and trade-offs into 168 considerations. 170 3. Transport Logic - Distribution of the Notifications 172 The distribution of a notification to multiple receivers can be 173 implemented in many ways. The main body of this draft describes one 174 such option: dual redundant trees. This option allows each 175 notification to be delivered to any node in the area in case of 176 single node or link failure. 178 3.1. Multicast Tree-based Transport 180 One way of transporting an identical piece of information to several 181 receivers at the same time is to use multicast distribution trees. A 182 tree based transport solution is beneficial since multicast support 183 is already implemented in all forwarding entities, so it is possible 184 to use existing implementations. 186 With multicast or tree based transport, the Fast Notification (FN) 187 packet can be recognized by a pre-configured or well known 188 destination IP address, denoted by MC-FN in the following, which is 189 the group address of the FN service. 191 If the FN service is triggered to send out a notification, the 192 notification will be encapsulated in a new IP packet, where the 193 destination IP address is set to MC-FN. 195 3.1.1. Fault Tolerance of a Single Distribution Tree 197 Several solutions described in this draft use a single tree to 198 disseminate a notification from one given source. 200 The single tree solution is simple, however it is not redundant: a 201 single failure may partition the tree, which will prevent 202 notifications from reaching some nodes in the area. 204 Different applications may have different needs for reliability. For 205 example, when we use fast notification to disseminate network failure 206 information, all nodes surrounding the failure can detect and 207 originate the failure notifications independently. Any one of these 208 notifications (or a subset of them) may be sufficient for the 209 application to make the right decision. This draft provides several 210 different transport options from which an applications can choose. 212 3.1.2. Pair of Redundant Trees 214 If an FN application needs the exact same data to be distributed in 215 the case of any single node or any single link failure, the FN 216 service should be run in "redundant tree mode". 218 A pair of "redundant trees" ensures that at each single node or link 219 failure each node still reaches the common root of the trees through 220 at least one of the trees. A redundant tree pair is a known prior- 221 art graph-theoretical object that is possible to find on any 2-node 222 connected network. Even better, it is even possible to find 223 maximally redundant trees in networks where the 2-node connected 224 criterion does not "fully" hold (e.g. there are a few cut vertices) 225 [Eny2009]. 227 Note that the referenced algorithm(s) build a pair of trees 228 considering a specific root. The root can be selected in different 229 ways, the only thing that is important that each node makes the same 230 selection, consistently. For instance, the node with the highest or 231 lowest router ID can be used. 233 #1 tree #2 tree 234 +---+ +---+ +---+ +---+ 235 | B |=======| | | B |=======| | 236 +---+ +---+ +---+ +---+ 237 // \\ // \ 238 // \\ // \ 239 +---+ +---+ +---+ +---+ 240 | A |---------------------| R | | A |=====================| R | 241 +---+ +---+ +---+ +---+ 242 \ // \\ / 243 \ // \\ / 244 +---+ +---+ +---+ +---+ 245 | |=======| | | |=======| | 246 +---+ +---+ +---+ +---+ 248 Figure 1: Example: a pair of redundant trees (double lines) of a 249 common root R 251 There is one special constraint in building the redundant trees. A 252 (maximally) redundant tree pair is needed, where in one of the trees 253 the root has only one child in order to protect against the failure 254 of the root itself. Algorithms presented in [Eny2009] produce such 255 trees. The algorithm is also described in Appendix B in this 256 specification. 258 In redundant-tree mode, each node multicasts the requested 259 notification on both trees, if it is possible, but at least along one 260 of the trees. Redundant trees require two multicast group addresses. 261 MC-FN identifies one of the trees, and MC-FN-2 identifies the other 262 tree. 264 Each node multicast forwards the received notification packet (on the 265 same tree). The root node performs as every other node but in 266 addition it also multicast the notification on the other tree! I.e. 267 it forwards a replica of the incoming notification in which it 268 replaces the destination address identifying the other multicast 269 distribution tree. 271 When the network remains connected and the root remains operable 272 after a single failure, the root will be reached on at least one of 273 the trees. Thus, since the root can reach every node along at least 274 one of the trees, all the notifications will reach each node. 275 However, when the root or the link to the root fails, that tree, in 276 which the root has only one child, remains connected (the root is a 277 leaf there), thus, all the nodes can be reached along that tree. 279 For example, let us consider that in Figure 1 FN is used to 280 disseminate failure information. If link A-B fails, the 281 notifications originating from node B (e.g. reporting that the 282 connectivity from B to A is lost) will reach R on tree #1. 283 Notifications originating from A (e.g. reporting that the 284 connectivity from A to B is lost) will reach R on tree #2. From R, 285 each node is reachable through one of the trees, so each node will be 286 notified about both events. 288 4. Message Encoding 290 4.1. Seamless Encapsulation 292 An application may define its own message for FN to distribute 293 quickly. In this case, only the special destination address (e.g. 294 MC-FN) shows that the message was sent using the FN service. 296 In this case, the entire payload of the IP packet is determined by 297 the application including sequence numbering and authentication. The 298 IP packet's protocol field can also be set by the application. 300 4.2. Dedicated FN Message 302 An alternative option is for the FN messages to be distributed in UDP 303 datagrams with well-known port values in the UDP header that need to 304 be allocated by IANA. 306 The FN packet format inside a UDP datagram is the following: 308 0 1 2 3 309 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 310 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 311 | | 312 +- -+ 313 | IP Header | 314 +- +-------------+ -+ 315 | | Protocol=UDP| | 316 +- +-------------+ -+ 317 | | 318 +- -+ 319 | | 320 +---------------------------------------------------------------+ 321 | UDP Source Port = FN | UDP Destination Port = FN | 322 +---------------------------------------------------------------+ 323 | UDP Header cont'd | 324 +---------------------------------------------------------------+ 325 | FN Header | 326 +---------------------------------------------------------------+ 327 | ... | 328 . . 329 . FN Payload . 330 . . 331 | ... | 332 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 333 | ... | 334 . . 335 . Authentication (optional) . 336 . . 337 | ... | 338 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 340 Figure 2: FN packet format as a UDP datagram 342 The encoding of the FN Header is as follows: 344 0 1 2 3 345 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 346 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 347 | FN Length | FN App Type | AuType|unused | 348 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 350 Figure 3: FN Header encoding 352 FN Length (16 bits) 353 The length of the FN message in bytes including the FN Header and 354 the FN Payload. The authentication data optionally appended to 355 the FN packet is not considered part of the FN message: the 356 authentication data is not included in the FN Length field, 357 although it is included in the length field of the packet's IP 358 header. 360 FN App Type (8 bits) 361 Identifies the application which should be the receiver of the 362 notification. A value for each application needs to be assigned 363 by IANA. 365 AuType 366 Identifies the authentication procedure to be used for the packet. 367 Authentication options are discussed in Section 4.2.1. of the 368 specification. 370 4.2.1. Authentication 372 Fast Notification intends to provide a trustable service option, so 373 that receivers of FN packets are able to verify that the packet is 374 sent by an authentic source. Simple password authentication and MD5 375 authentication is described in the following subsections. 377 If AuType is set to 0x0, then the FN packet is not carrying an 378 Authentication field at the end of the packet. Note that even in 379 this case the FN application in the payload may still use its own 380 authentication mechanism. 382 If AuType is non null, an Authentication field must be appended after 383 the FN message. The encoding of this field is as follows: 385 0 1 2 3 386 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 387 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 388 | AuLength | ... Authentication Data ... | 389 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 390 | ... | 392 Figure 4: Authentication field in FN packets 394 AuLength 395 Describes the length of the entire Authentication field in bytes. 397 4.2.1.1. Areas-scoped and Link-scoped Authentication 399 Since FN is a solution to disseminate an event notification from one 400 source to a whole area of nodes, the simplest approach would be to 401 use per-area authentication, for example. a common password, a common 402 pre- shared key among all nodes in the area as described in the 403 following sub-sections, or digital signatures. 405 Carriers may, however, prefer per-link authentication. In order not 406 to lose the speed (simple per-hop processing, fast forwarding 407 property) of FN, link-scoped authentication is suggested only if the 408 forwarding plane supports it, i.e. if there is hardware support to 409 verify and re-generate authentication hop-by-hop. In such cases, the 410 operator may need to configure a common pre-shared key only on 411 routers connected by the same link. It is even possible that there 412 is no authentication on some links considered safe. 414 4.2.1.2. Simple Password Authentication 416 Simple password authentication guards against routers inadvertently 417 joining the routing area; each router must first be configured with a 418 password before it can participate in Fast Notification. 420 The password is stored in the Authentication Data field. AuLength is 421 set to the length of the password in bytes plus 1. Two AuType values 422 for simple password authentication need to be allocated by IANA: one 423 for area-scope and another for link-scoped. 425 With per-link authentication mode, the Authentication field must be 426 stripped and regenerated hop-by-hop. 428 Simple password authentication, however, can be easily compromised as 429 anyone with physical access to the network can read the password. 431 4.2.1.3. Cryptographic Authentication for FN 433 Using this authentication type, a secret key is used to generate/ 434 verify a "message digest" that is appended to the end of the FN 435 packet. The message digest is a one-way function of the FN packet 436 and the secret key. This authentication mechanism resembles the 437 cryptographic authentication mechanism of [OSPF]. 439 4.2.1.4. MD5 441 The packet signature is created by an MD5 hash performed on an object 442 which is the concatenation of the FN message, including the FN 443 header, and the pre-shared secret key. The resulting 16 byte MD5 444 message digest is appended to the FN message into the Authentication 445 field as shown below. 447 The AuType in the FN header is set to indicate cryptographic 448 authentication, the specific value is to be assigned by IANA both for 449 area-scoped and for link-scoped versions. 451 0 1 2 3 452 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 453 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 454 | AuLength | Key ID | Unused | 455 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 456 | Message Digest (bytes 1-4) | 457 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 458 | Message Digest (bytes 5-8) | 459 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 460 | Message Digest (bytes 9-12) | 461 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 462 | Message Digest (bytes 13-16) | 463 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 465 Figure 5: Authentication field in FN packets with MD5 cryptographic 466 authentication. 468 AuLength 469 AuLength is set to 20 bytes. 471 Key ID 472 This field identifies the algorithm and secret key used to create 473 the message digest appended to the FN packet. This field allows 474 that multiple pre-shared keys may exist in parallel. 476 Message Digest 477 The 16 byte long MD5 hash performed on an object which is the 478 concatenation of the FN message, including the FN header, and the 479 pre-shared secret key identified by Key ID. 481 When receiving an FN message, if the FN header indicates MD5 482 authentication, then the last 20 bytes of the FN message are set 483 aside. The recipient forwarding plane element calculates a new MD5 484 digest of the remainder of the FN message to which it appends its own 485 known secret key identified by Key ID. The calculated and received 486 digests are compared. In case of mismatch, the FN message is 487 discarded. 489 In per-link authentication mode, the Authentication field must be 490 regenerated hop-by-hop using the key of the outgoing link. 492 4.2.1.5. Digital Signatures 494 A router may choose to use public key cryptography to digitally sign 495 the notification to provide certification of authenticity. This 496 mechanism can avoid shared secret that is required for other 497 authentication mechanisms described in this document. This 498 authentication mechanism resembles the authentication mechanism of 499 OSPF with digital signatures as defined in [RFC2154]. 501 5. Acknowledgements 503 The authors owe thanks to Acee Lindem, Joel Halpern and Jakob Heitz 504 for their review and comments. 506 6. Security Considerations 508 This draft has described basic optional procedures for 509 authentication. The mechanism, however, does not protect against 510 replay attacks. 512 If an application of FN require protection against replay attacks, 513 then these applications should provide their own specific sequence 514 numbering within the FN payload. Recipient applications should 515 accept FN messages only if the included sequence number is valid. 517 Since the message digest of cryptographic authentication also covers 518 the payload, even if an attacker knew how to construct the new 519 sequence number, it would not be able to generate a correct message 520 digest without the pre shared key. This way, a sequence number in 521 the payload combined with FN's cryptographic authentication offers 522 sufficient protection against replay attacks. 524 7. IANA Considerations 526 An IP protocol value needs to be assigned by IANA for FN. IANA also 527 needs to maintain values for FN App Type as applications are being 528 proposed. 530 Multicast addresses used for the distribution trees are either 531 allocated by IANA or they can be a configuration parameter within the 532 local domain. 534 8. References 536 8.1. Normative References 538 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 539 Requirement Levels", BCP 14, RFC 2119, March 1997. 541 [BIDIR-PIM] Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano, 542 "Bidirectional Protocol Independent Multicast (BIDIR- 543 PIM)", RFC 5015, October 2007. 545 8.2. Informative References 547 [Eny2009] Enyedi, G., Retvari, G., and A. Csaszar, "On Finding 548 Maximally Redundant Trees in Strictly Linear Time, IEEE 549 Symposium on Computers and Communications (ISCC)", 2009. 551 [ipfrr-fn] 552 Kini, S., Csaszar, A., and G. Envedi, "IP Fast Re-Route 553 with Fast Notification", draft-csaszar-ipfrr-fn-00 (work 554 in progress), March 2011. 556 [ospf-fn] 557 Kini, S., Lu, W., and A. Tian, "OSPF Fast Notification", 558 draft-kini-ospf-fast-notification-01 (work in progress), 559 March 2011. 561 [fn-framework] 562 Lu, W., Tian, A., and S. Kini, "Fast Notification 563 Framework", draft-lu-fast-notification-framework-00 (work 564 in progress), October 2010. 566 [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998. 568 Appendix A. Further Options for Transport Logic 570 The options described in this appendix represent alternative 571 solutions to the redundant tree based approach described in Section 572 3.1.2. 574 It is left for WG discussion and further evaluation to decide whether 575 any of these options should potentially be preferred instead of 576 redundant trees. 578 A.1. Unicast 580 This method addresses the need in a unique way. It has the following 581 properties: 583 Plain simple, without the need of any forwarding plane change or 584 cooperation; 586 Short turnaround time (i.e. ready for next hit); 588 100% link break coverage (may not work in certain node failure 589 cases); 591 Little change to OSPF (need encapsulation for IS-IS). 593 A.1.1. Method 595 The method is simple in design, easy to implement and quick to 596 deploy. It requires no topology changes or specific configurations. 597 It adds little overhead to the overall system. 599 The method sends the event message to every router in the area in an 600 IP packet. This appears burdensome to the sending router which has 601 to duplicate the packet sending effort many times. Practical 602 experience has shown, however, that the amount of effort is not a big 603 concern in reasonable sized networks. 605 Normal flooding (regular or fast) process requires a router to 606 duplicate the packet to all flooding eligible interfaces. All 607 routers have to be fast-flooding-aware. This implies new code to 608 every router in control plane and/or forwarding plane. 610 The method uses a different approach. It takes advantage of the 611 given routing/forwarding table in each router in the IP domain. The 612 originating router of the flooding information simply sends multiple 613 copies of the packet to each and every router in the domain. These 614 packets are forwarded to the destination routers at forwarding plane 615 speed, 617 just like the way the regular IP data traffic is handled. No special 618 handling in any other routers is needed. 620 This small delay on the sender can be minimized by pre-downloading 621 the link-broken message packets to the forwarding plane. Since the 622 forwarding plane already has the list of all routers which are part 623 of the IGP routing table, the forwarding plane can dispatch the 624 packet directly. 626 In essence, the flooding in this method is tree based, just like a 627 multicast tree. The key is that no special tree is generated for 628 this purpose; the normal routing table which is an SPF tree (SPT) 629 plays a role of the flooding tree. This logic guarantees that the 630 flooding follows the shortest path and no flooding loop is created. 632 A.1.2. Sample Operation 634 Figure 6 depicts a scenario where router A wants to flood its message 635 to all other routers in the domain using the unicast flooding method. 637 Instead of sending one packet to each of its neighbor, and letting 638 the neighbor flood the packet further, router A directly send the 639 same packet to each router in the domain, one at a time. In this 640 sample network, router A sends out 5 packets. 642 A---B---C---D 643 \ 644 --E---F 646 1. Packet(A->B); 647 2. Packet(A->C); 648 3. Packet(A->D); 649 4. Packet(A->E); 650 5. Packet(A->F). 652 Figure 6: Multiple Unicast Packets 654 The unicast flooding procedure is solely controlled by the sending 655 router. No action is needed from other routers other than their 656 normal forwarding functionalities. This method is extremely simple 657 and useful for quick prototyping and deployment. 659 A.2. Gated Multicast through RPF Check 661 This method fulfills the purpose with the following characters: 663 1. No need to build the multicast tree. It is the same as the SPT 664 computed by the IGP routing process; 666 2. Flooding loops are prevented by RPF Check. 668 The method has all the benefits of multicast flooding. It, however, 669 does not require running multicast protocol to setup the multicast 670 tree. The unicast shortest path tree is used as a multicast tree. 672 A.2.1. Loop Prevention - RPF Check 674 In this mechanism, the distribution tree is not explicitly built. 675 Rather, each node will first do a Reverse Path Forwarding (RPF) check 676 before it floods the notification to other links. 678 A special multicast address is defined and is subject to IANA 679 approval. This address is used to qualify the notification packet 680 for fast flooding. When a notification packet arrives, the receiving 681 node will perform an IP unicast routing table lookup for the 682 originator IP address of the notification and find the outgoing 683 interface. Only when the arriving interface of the notification is 684 the same as the outgoing interface leading towards the originator IP 685 address, will the notification be flooded to other interfaces. 687 IP Multicast forwarding with RPF check is available on most of the 688 routing/switching platforms. To support flooding with RPF check, a 689 special IP multicast group must be used. A bi-directional IP 690 multicast forwarding entry is created that consists of all interfaces 691 within the flooding scope, typically an IGP area. 693 A.2.2. Operation 695 The Gated flooding operation is illustrated in Figure 7. 697 All Routers, IGP Process: 698 if (SPT ready) { 699 duplicate the SPT as Bidir_Multicast_tree; 700 download the multicast_tree to forwarding plane; 701 } 702 add FNF_multicast_group_addr; 704 Sender of the FNF notification: 705 if (breakage detected) { 706 pack the notification in a packet; 707 send the packet to the FNF_multicast_group_addr; 708 } 710 Receiver of the FNF notification: 711 if (notification received) { 712 if (RPC_interface == incoming_interface) { 713 multicast the notification to all other interfaces; 714 } 715 forward the notification to IGP for processing; 716 } 718 Figure 7: Gated flooding operation 720 Figure 8 shows a sample operation on a four-router mesh network. The 721 left figure is the topology. The right figure is the shortest path 722 tree rooted at A. 724 Router A initiates the flooding. But the downstream routers B, C, 725 and D will drop all messages except the ones that come from their 726 shortest path parent node. For example, A's message to C via B is 727 dropped by C, because C knows that its reverse path forwarding (RPF) 728 nexthop is A. 730 A A 731 /|\ / \ 732 B---C B C 733 \|/ \ 734 D D 736 Figure 8: Loop Prevention through the RPF check 738 A.3. Further Multicast Tree based Transport Options 740 A.3.1. Source Specific Trees 742 One implementation option is to rely on source specific multicast. 743 This means that even though there is only a single multicast group 744 address (MC-FN) allocated to the FN service, the FIB of each router 745 is configured with forwarding information for as many trees as many 746 FN sources (nodes) there are in the routing area, i.e. to each 747 (S_i,MC-FN) pair. 749 A.3.2. A Single Bidirectional Shared Tree 751 In the previous solution each source specific tree is a spanning 752 tree. It is possible to reduce the complexity of managing and 753 configuring n spanning trees in the area by using bidirectional 754 shared trees. By building a bidirectional shared tree, all nodes on 755 the tree can send and receive traffic using that single tree. Each 756 sent packet from any source is multicasted on the tree to all other 757 receivers. 759 The tree must be consistently computed at all routers. For this, the 760 following rules may be given: 762 The tree can be computed as a shortest path tree rooted at e.g. the 763 highest router-id. When multiple paths are available, the 764 neighbouring node in the graph e.g. with highest router-id can be 765 picked. When multiple paths are available through multiple 766 interfaces to a neighbouring node, e.g. a numbered interface may be 767 preferred over an unnumbered interface. A higher IP address may be 768 preferred among numbered interfaces and a higher ifIndex may be 769 preferred among unnumbered interfaces. 771 Note, however, that the important point is that the rules are 772 consistent among nodes. That is, a router may pick the lower router 773 IDs if it is ensured that ALL routers will do the same to ensure 774 consistency. 776 Multicast forwarding state is installed using such a tree as a bi- 777 directional tree. Each router on the tree can send packets to all 778 other routers on that tree. 780 Note that the multicast spanning tree can be built using [BIDIR-PIM] 781 so that each router within an area subscribes to the same multicast 782 group address. Using BIDIR-PIM in such a way will eventually build a 783 multicast spanning tree among all routers within the area. (BIDIR- 784 PIM is normally used to build a shared, bidirectional multicast tree 785 among multiple sources and receivers.) 787 A.4. Layer 2 Networks 789 Layer 2 (e.g. Ethernet) networks offer further options for 790 distributing the notification (e.g. using spanning trees offered by 791 STP). Definition of these is being considered and will be included 792 in a future revision of this draft. 794 Appendix B. Computing maximally redundant trees 796 Here we describe a possible, not optimal, way of computing maximally 797 redundant trees. First, we suppose that the network is 2-connected 798 and that we have a central processor computing the trees, then we 799 lift these assumptions. 801 B.1. Simple pair of maximally redundant trees in 2-connected networks 803 Finding a simple pair of maximally redundant trees in a 2-connected 804 network is quite simple. We call a node "ready", if it was already 805 added to the trees. Initially, the only ready node is the common 806 root (node r in the sequel). 808 When we have at least one node x in the network, which is not ready, 809 find two node-disjoint paths from x either to r or to two distinct 810 ready nodes. Since the network is 2-connected, there are always two 811 node-disjoint paths from x to r. It is possible that one or both of 812 these paths reaches another ready node sooner than r, in which case 813 we have the two node-disjoint paths to distinct nodes. Combining the 814 undirected links of these paths makes up an *ear*. 816 x---a 817 \ 818 b 819 | 820 c 821 / 822 y---d 824 Figure 9: An *ear* connected to node x and y (x and y are ready) 826 Let x and y be the two ready endpoints of an ear, and first suppose 827 that they are different nodes and none of them is r. Note that both 828 x and y are in the two trees (since they are "ready") and if x is an 829 ancestor of y in the first tree (x is on the path from y to r), then 830 x cannot be the ancestor of y along the second tree at the same time. 831 Thus, it is safe to connect the nodes of the freshly found ear to x 832 in the first tree and to y in the second tree, if either x is an 833 ancestor of y in the first tree, or y is an ancestor of x in the 834 second tree. Considering the example in Figure 9, this means that 835 links d-c-b-a-x should be added to the first tree, and a-b-c-d-y 836 should be added to the second one. 838 In the case, when either x=r or y=r or when neither x is an ancestor 839 of y nor y is an ancestor of x in any of the trees, the endpoints are 840 not firmly bound to one of the trees, it is only important to put the 841 links to one endpoint in one of the trees and put the links towards 842 the other endpoint to the other tree. In our example this means that 843 either d-c-b-a-x or a-b-c-d-y could be added to the first tree. 844 Naturally, then the other endpoint must be selected for the second 845 tree. 847 In order to protect against the failure of the root r, we need to 848 construct (maximally) redundant trees, where there is only one edge 849 entering to the root on one of the trees. This makes the root a leaf 850 in that tree. To achieve this, we should add the ear to the second 851 tree through r only if both endpoints are r. Moreover, we need to 852 select an ear with different endpoints when it is possible. 854 Finding an ear is relatively simple and can be done in different 855 ways. Probably the simplest way is to find a ready node q (q is not 856 the root) with a non-ready neighbor w, (virtually) remove q from the 857 topology, and to find a path from w to r; since the network is 2- 858 connected, such a path either reaches r, or reach another ready node. 859 Moreover, when only r is ready such a node q does not exist, so we 860 select one of r's neighbors as w, and remove not r but the link 861 between them. 863 e---d 864 / / \ 865 r---c f 866 \ \ / 867 a---b 869 Figure 10: A 2-connected network 870 e---d e---d 871 / / \ 872 r c f r---c f 873 \ \ / \ 874 a---b a---b 876 Figure 11: The two maximally redundant trees found in the network 878 Now, a simple example is in order. Consider the network depicted in 879 Figure 10, and suppose that the common root is node r. We have only 880 r in the trees, so we select one of its neighbors, let it be a, 881 remove the link between them, and select a path (let it be the 882 shortest one) from a to r; this path is a-b-c-r, so the ear is 883 r-a-b-c-r. Since both endpoints of the ear are r, selecting the 884 right tree is not important, e.g., we can add c-b-a-r to the first 885 tree, and a-b-c-r to the second one (Figure 11). This way, r, a, b 886 and c form the set of "ready" nodes. From the ready set, c and d are 887 not the root and have non-ready neighbors. Let us select, e.g., c. 888 The shortest path from d to r when c is removed is d-e-r, so we have 889 ear c-d-e-r, we add d-e-r to the first tree and e-d-c to the second 890 one (recall that we do not want to create a new neighbor for r in the 891 second tree). Finally, the last non-ready node is f, and the ear is 892 b-f-d. Since neither is b an ancestor of d nor is d an ancestor of b 893 in any of the trees, we can connect f to the trees in both ways. 894 E.g., add f-b to the first tree, and f-d to the second one. 896 B.2. Non-2-connected networks 898 When, however, the network is not 2-connected, it is not always 899 possible to find a pair of node-disjoint paths from any node x to 900 root r, which makes our previous algorithm unable to find the trees. 901 However, while the network is connected, it is made up by 2-connected 902 components bordered by "cut-vertices" (naturally, some of these 903 components may contain only one node). A node is a cut-vertex, if 904 removing that node splits the network into two. 906 A simple algorithm to find the components and the cut-vertices can be 907 to (virtually) remove each vertex one by one, and check connectivity 908 with BFS or DFS. Moreover, nodes a and b are in the same 2-connected 909 component, if a remains reachable from b after removing any single 910 node. Note that linear time algorithms do exist that find both the 911 2- connected components and the cut-vertices. 913 Now, we can build up redundant trees in each component. In 914 components containing r, the root of such trees must be r. 915 Otherwise, in the remaining components the root must be the last node 916 in the component along a path to the root. Recall, that this must be 917 a cut-vertex, so it is the same for each path emanating from that 918 component. 920 At this point, we are ready, if there is no cut-edge in the network. 921 However, if some 2-connected components are connected by a cut-edge, 922 we must add that edge to both of the trees. 924 e---d i 925 / / \ /| 926 r---c f--g | 927 \ \ / \| 928 a---b j 930 Figure 12: Non-2-connected network 932 e---d i e---d i 933 / /| / \ | 934 r c f--g | r---c f--g | 935 \ \ / | \ \| 936 a---b j a---b j 938 Figure 13: The two maximally redundant trees found in the network 940 As an example consider the network depicted in Figure 12. Observe 941 that now we have two 2-connected components, one contains r, a, b, c, 942 d, e, f and the other contains g, i, j. Moreover, these components 943 have no common node, they are connected with a cut-edge. 945 Finding the trees in the component containing r is simple, these 946 trees are the same as previously. Moreover, the other component is a 947 cycle, so it will be covered by a single ear. Finally we must add 948 link f-g to both of the trees, to get the trees depicted in 949 Figure 13. 951 B.3. Finding maximally redundant trees in distributed environment 953 If we need to compute exactly the same maximally redundant trees at 954 each of the routers, consistency needs to be ensured by tie-breaking 955 mechanisms. Observe that the previous algorithm has multiple choices 956 when it selects how to connect nodes to the trees when only r is 957 ready, how to select ready node q and non-ready node w for a later 958 ear and when neither of the endpoints is an ancestor of the other 959 one. 961 All the previous problems can easily be handled. E.g., the first ear 962 should be connected in such a way, that the neighbor of r with the 963 lowest ID must be directly connected to r in the first tree. 965 Moreover, later we should choose ready router with non-ready neighbor 966 as q and its non-ready neighbor with the lowest ID as w. Finally, 967 when neither of the endpoint is an ancestor of the other one, connect 968 the ear to the endpoint with the lower ID in the first tree. 970 Authors' Addresses 972 Wenhu Lu 973 Ericsson 974 300 Holger Way 975 San Jose, California 95134 976 USA 978 Email: Wenhu.Lu@ericsson.com 980 Sriganesh Kini 981 Ericsson 982 300 Holger Way 983 San Jose, California 95134 984 USA 986 Email: Sriganesh.Kini@ericsson.com 988 Andras Csaszar 989 Ericsson 990 Irinyi utca 4-10 991 Budapest 1117 992 Hungary 994 Email: Andras.Csaszar@ericsson.com 996 Gabor Sandor Enyedi 997 Ericsson 998 Irinyi utca 4-10 999 Budapest 1117 1000 Hungary 1002 Email: Gabor.Sandor.Enyedi@ericsson.com 1003 Jeff Tantsura 1004 Ericsson 1005 300 Holger Way 1006 San Jose, California 95134 1007 USA 1009 Email: Jeff.Tantsura@ericsson.com 1011 Albert Tian 1012 Ericsson 1013 300 Holger Way 1014 San Jose, California 95134 1015 USA 1017 Email: Albert.Tian@ericsson.com