idnits 2.17.00 (12 Aug 2021) /tmp/idnits12526/draft-lu-fn-transport-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (March 7, 2011) is 4093 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) No issues found here. Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group W. Lu 2 Internet Draft S. Kini 3 Intended status: Standards Track A. Csaszar 4 Expires: September 7, 2011 G. Enyedi 5 J. Tantsura 6 A. Tian 7 March 7, 2011 9 Transport of Fast Notification Messages 10 draft-lu-fn-transport-00.txt 12 Status of this Memo 14 This Internet-Draft is submitted in full conformance with the 15 provisions of BCP 78 and BCP 79. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html 33 This Internet-Draft will expire on September 7, 2011. 35 Copyright Notice 37 Copyright (c) 2011 IETF Trust and the persons identified as the 38 document authors. All rights reserved. 40 This document is subject to BCP 78 and the IETF Trust's Legal 41 Provisions Relating to IETF Documents 42 (http://trustee.ietf.org/license-info) in effect on the date of 43 publication of this document. Please review these documents 44 carefully, as they describe your rights and restrictions with respect 45 to this document. 47 Abstract 49 This document specifies a fast, dataplane-based low-overhead event 50 notification protocol, called Fast Notification (FN) protocol. The 51 draft discusses the design goals, the message container and options 52 for delivering the notifications to all routers within a routing 53 area. 55 Table of Contents 57 1. Introduction................................................3 58 2. Design Goals................................................3 59 3. Transport Logic - Distribution of the Notifications...........4 60 3.1. Multicast Tree-based Transport..........................5 61 3.1.1. Fault Tolerance of a Single Distribution Tree.......5 62 3.1.2. Pair of Redundant Trees............................5 63 4. Message Encoding............................................7 64 4.1. Seamless Encapsulation..................................7 65 4.2. Dedicated FN Message....................................7 66 4.2.1. Authentication.....................................9 67 4.2.1.1. Areas-scoped and Link-scoped Authentication...10 68 4.2.1.2. Simple Password Authentication...............10 69 4.2.1.3. Cryptographic Authentication for FN..........10 70 5. Security Considerations.....................................12 71 6. IANA Considerations........................................12 72 7. References.................................................12 73 7.1. Normative References...................................12 74 7.2. Informative References.................................12 75 8. Acknowledgments............................................13 76 Appendix A. Further Options for Transport Logic................14 77 A.1. Unicast................................................14 78 A.1.1. Method...........................................14 79 A.1.2. Sample Operation..................................15 80 A.2. Gated Multicast through RPF Check......................15 81 A.2.1. Loop Prevention - RPF Check.......................16 82 A.2.2. Operation........................................16 83 A.3. Further Multicast Tree based Transport Options..........18 84 A.3.1. Source Specific Trees.............................18 85 A.3.2. A Single Bidirectional Shared Tree................18 86 A.4. Layer 2 Networks.......................................19 87 Appendix B. Computing maximally redundant trees................20 88 A.5. Simple pair of maximally redundant trees in 2-connected 89 networks...................................................20 90 A.6. Non-2-connected networks...............................22 91 A.7. Finding maximally redundant trees in distributed environment 92 ...........................................................23 94 1. Introduction 96 Draft [fn-framework] describes the architectural framework to enable 97 fast dissemination of a network event to routers in a limited domain. 98 Existing use cases involve new approaches for IP Fast ReRoute such as 99 [ipfrr-fn], and faster dissemination of link state information of 100 routing protocols [ospf-fn] e.g. in order to speed up convergence. 102 Existing link state routing protocols such as OSPF and ISIS rely on a 103 flooding mechanism to advertise information. These flooding 104 mechanisms are implemented in the control plane, i.e. processed at 105 the control plane in each hop before forwarding to the next hop. A 106 new fast notification protocol should be implemented with low hop-by- 107 hop processing overhead. 109 This draft proposes a fast notification (FN) protocol as a separate 110 transport layer, whereby applications are proposed in separate 111 drafts. 113 The functions offered by the FN transport layer include but are not 114 limited to: 116 o authentication 118 o forwarding (lookup, policy, etc.) 120 o topology setup (statically or dynamically configurable) 122 2. Design Goals 124 A low-overhead event notification mechanism which should be used to 125 facilitate extremely quick dissemination of information in a limited 126 area should have the following properties. 128 1. The mechanism should be fast in the sense that getting the 129 notification packet to remote nodes through possible multiple hops 130 should not require (considerably) more processing at each hop than 131 plain fast path packet forwarding. Latency due to involving the 132 control plane in each hop is not preferable for some applications. 134 2. The signaling mechanism should be resilient or, at least, it 135 should offer a delivery option which can disseminate event 136 notifications to all interested nodes even in a network where a 137 single link or a node is down. 139 3. The mechanism should be trustable; that is, it should provide 140 means to verify the authenticity of the notifications. 142 4. No fate sharing with control plane, allowing easy graceful 143 restart. 145 5. The mechanism should be applicable in nodes where control plane is 146 separated from data plane (such as ForCES or OpenFlow forwarding 147 elements). 149 6. The new protocol should not be dependent upon routing protocol 150 flooding procedures, and should work with either IS-IS or OSPF. 152 The above reasons point towards a dataplane implementation with the 153 following design goal: 155 7. The mechanism should involve simple and efficient processing to be 156 feasible for implementation in the dataplane. 158 The last goal manifests itself in the following ways: 160 o Origination of notification should very easy, e.g. creating a 161 simple IP packet, the payload of which can be filled easily by an 162 application. 164 o When receiving the packet, it should be easy to recognize by 165 dataplane linecards so that processing can commence after 166 forwarding. 168 o No complex operations should be required in order to extract the 169 information from the packet. 171 o Duplication of notification packets should be either strictly 172 bounded or handled without significant dataplane processing 173 burden. 175 These design goals present a trade-off. A proper balance needs to be 176 found that offers good enough authentication and reliability while 177 keeping processing complexity sufficiently low to be feasible for 178 data plane implementation. This draft attempts to propose such a 179 solution. 181 3. Transport Logic - Distribution of the Notifications 183 The distribution of a notification to multiple receivers can be 184 implemented in various ways. The main body of this draft describes 185 one such option that is especially advantageous: dual redundant 186 trees. This option is perfect to guarantee that each notification 187 object reaches each node in the domain even with single link or 188 single node failures. 190 3.1. Multicast Tree-based Transport 192 The optimal way of transporting an identical piece of information to 193 several receivers at the same time is to use multicast distribution 194 trees. A tree based transport solution is beneficial since multicast 195 support is already implemented in all forwarding entities, so it is 196 possible to leverage on existing implementations. 198 With multicast or tree based transport, the Fast Notification (FN) 199 packet can be recognized by a pre-configured or well known 200 destination IP address, denoted by MC-FN in the following, which is 201 the group address of the FN service. 203 If the FN service is triggered to send out a notification, the 204 notification will originated in a new IP packet, where the 205 destination IP address is set to MC-FN. 207 3.1.1. Fault Tolerance of a Single Distribution Tree 209 The previous solutions and those listed in Appendix A use a single 210 logical or explicit tree for disseminating a notification from one 211 given source. 213 If an application requires that the FN transport is tolerant to 214 single link or node failures, a single distribution tree can be 215 questionable, as a single tree is not redundant: a single failure may 216 cut the tree into two halves, which means that until reconfiguration, 217 a specific notification originating in one half of the tree will not 218 reach nodes in the other part. 220 For those applications that disseminate failure notifications (as 221 e.g. in [ospf-fn] or [ipfrr-fn]) in many failure conditions (e.g. 222 link failure) a failure notification may not be necessary from each 223 of the nodes first detecting the failure as the failure notification 224 from one node may be sufficient to deduce the notification from the 225 other node detecting the failure. This draft lets the applications 226 choose their preferred transport options. 228 3.1.2. Pair of Redundant Trees 230 If an FPN application needs that the exact same data is distributed 231 in the case of any single node or any single link failure, the FPN 232 service should be run in "redundant tree mode". 234 A pair of "redundant trees" ensures that at each single node or link 235 failure each node still reaches the common root of the trees through 236 at least one of the trees. A redundant tree pair is a known prior-art 237 graph-theoretical object that is possible to find on any 2-node 238 connected network. Even better, it is even possible to find maximally 239 redundant trees in networks where the 2-node connected criterion does 240 not "fully" hold (e.g. there are a few cut vertices) [Eny2009]. 242 Note that the referenced algorithm(s) build a pair of trees 243 considering a specific root. The root can be selected in different 244 ways, the only thing that is important that each node makes the same 245 selection, consistently. For instance, the node with the highest or 246 lowest router ID can be used. 248 #1 tree #2 tree 249 +---+ +---+ +---+ +---+ 250 | B |=======| | | B |=======| | 251 +---+ +---+ +---+ +---+ 252 // \\ // \ 253 // \\ // \ 254 +---+ +---+ +---+ +---+ 255 | A |---------------------| R | | A |=====================| R | 256 +---+ +---+ +---+ +---+ 257 \ // \\ / 258 \ // \\ / 259 +---+ +---+ +---+ +---+ 260 | |=======| | | |=======| | 261 +---+ +---+ +---+ +---+ 263 Figure 1 Example: a pair of redundant trees (double lines) of a 264 common root R 266 There is one special constraint in building the redundant trees. A 267 (maximally) redundant tree pair is needed, where in one of the trees 268 the root has only one child in order to protect against the failure 269 of the root itself. Algorithms presented in [Eny2009] produce such 270 trees. The algorithm is also described in Appendix B in this 271 specification. 273 In redundant-tree mode, each node multicasts the requested 274 notification on both trees, if it is possible, but at least along one 275 of the trees. Redundant trees require two multicast group addresses. 276 MC-FN identifies one of the trees, and MC-FN-2 identifies the other 277 tree. 279 Each node multicast forwards the received notification packet (on the 280 same tree). The root node performs as every other node but in 281 addition it also multicast the notification on the other tree! I.e. 282 it forwards a replica of the incoming notification in which it 283 replaces the destination address identifying the other multicast 284 distribution tree. 286 When the network remains connected and the root remains operable 287 after a single failure, the root will be reached on at least one of 288 the trees. Thus, since the root can reach every node along at least 289 one of the trees, all the notifications will reach each node. 290 However, when the root or the link to the root fails, that tree, in 291 which the root has only one child, remains connected (the root is a 292 leaf there), thus, all the nodes can be reached along that tree. 294 For example, let us consider that in Figure 1 FN is used to 295 disseminate failure information. If link A-B fails, the notifications 296 originating from node B (e.g. reporting that the connectivity from B 297 to A is lost) will reach R on tree #1. Notifications originating from 298 A (e.g. reporting that the connectivity from A to B is lost) will 299 reach R on tree #2. From R, each node is reachable through one of the 300 trees, so each node will be notified about both events. 302 4. Message Encoding 304 4.1. Seamless Encapsulation 306 An application may define its own message to distribute quickly. In 307 this case, only the special destination address (e.g. MC-FN) shows 308 that the message was sent using the FN service. 310 In this case, the entire payload of the IP packet is determined by 311 the application including sequence numbering and authentication. The 312 IP packet's protocol field is also set by the application. The draft 313 [ospf-fn] presents a use case, where the IP protocol field is set to 314 "OSPF" and the payload is entirely handled by OSPF. 316 4.2. Dedicated FN Message 318 An alternative option is that the FN messages are distributed in UDP 319 datagrams with well-known port values in the UDP header that need to 320 be allocated by IANA. 322 The FN packet format inside a UDP datagram is the following: 324 0 1 2 3 325 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 326 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 327 | | 328 +- -+ 329 | IP Header | 330 +- +-------------+ -+ 331 | | Protocol=UDP| | 332 +- +-------------+ -+ 333 | | 334 +- -+ 335 | | 336 +---------------------------------------------------------------+ 337 | UDP Source Port = FN | UDP Destination Port = FN | 338 +---------------------------------------------------------------+ 339 | UDP Header cont'd | 340 +---------------------------------------------------------------+ 341 | FN Header | 342 +---------------------------------------------------------------+ 343 | ... | 344 . . 345 . FN Payload . 346 . . 347 | ... | 348 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 349 | ... | 350 . . 351 . Authentication (optional) . 352 . . 353 | ... | 354 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 356 Figure 2 FN packet format as a UDP datagram 358 The encoding of the FN Header is as follows: 360 0 1 2 3 361 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 362 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 363 | FN Length | FN App Type | AuType|unused | 364 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 366 Figure 3 FN Header encoding 367 FN Length (16 bits) 368 The length of the FN message in bytes including the FN Header and 369 the FN Payload. The authentication data optionally appended to the 370 FN packet is not actually considered part of the FN message: the 371 authentication data is not included in the FN Length field, 372 although it is included in the length field of the packet's IP 373 header. 375 FN App Type (8 bits) 376 Identifies the application which should be the receiver of the 377 notification. A value for each application needs to be assigned by 378 IANA. 380 AuType 381 Identifies the authentication procedure to be used for the packet. 382 Authentication options are discussed in Section 4.2.1. of the 383 specification. Note 385 4.2.1. Authentication 387 Fast Notification intends to provide a trustable service option, so 388 that receivers of FN packets are able to verify that the packet is 389 sent by an authentic source. Simple password authentication and MD5 390 authentication is described in the following subsections. 392 If AuType is set to 0x0, then the FN packet is not carrying an 393 Authentication field at the end of the packet. Note that even in this 394 case the FN application in the payload may still use its own 395 authentication mechanism. 397 If AuType is non null, an Authentication field must be appended after 398 the FN message. The encoding of this field is as follows: 400 0 1 2 3 401 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 402 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 403 | AuLength | ... Authentication Data ... | 404 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 405 | ... | 407 Figure 4 Authentication field in FN packets 409 AuLength 410 Describes the length of the entire Authentication field in bytes. 412 4.2.1.1. Areas-scoped and Link-scoped Authentication 414 Since FN is a solution to disseminate an event notification from one 415 source to a whole area of nodes, the simplest approach would be to 416 use per-area authentication, i.e. a common password, a common pre- 417 shared key among all nodes in the area as described in the following 418 sub-sections. 420 Carriers may, however, prefer per-link authentication. In order not 421 to lose the speed (simple per-hop processing, fast forwarding 422 property) of FN, link-scoped authentication is suggested only if the 423 dataplane supports it, i.e. if there is hardware support to verify 424 and re-generate authentication hop-by-hop. In such cases, the 425 operator may need to configure a common pre-shared key only on 426 routers connected by the same link. It is even possible that there is 427 no authentication on some links considered safe. 429 4.2.1.2. Simple Password Authentication 431 Simple password authentication guards against routers inadvertently 432 joining the routing area; each router must first be configured with a 433 password before it can participate in Fast Notification. 435 The password is stored in the Authentication Data field. AuLength is 436 set to the length of the password in bytes plus 1. Two AuType values 437 for simple password authentication need to be allocated by IANA: one 438 for area-scope and another for link-scoped. 440 With per-link authentication mode, the Authentication field must be 441 stripped and regenerated hop-by-hop. 443 Simple password authentication, however, can be easily compromised as 444 anyone with physical access to the network can read the password. 446 4.2.1.3. Cryptographic Authentication for FN 448 Using this authentication type, a shared secret key is configured in 449 all routers subscribed to the FN service. The key is used to 450 generate/verify a "message digest" that is appended to the end of the 451 FN packet. The message digest is a one-way function of the FN packet 452 and the secret key. This authentication mechanism resembles the 453 cryptographic authentication mechanism of [OSPF]. 455 The packet signature is created by an MD5 hash performed on an object 456 which is the concatenation of the FN message, including the FN 457 header, and the pre-shared secret key. The resulting 16 byte MD5 458 message digest is appended to the FN message into the Authentication 459 field as shown below. 461 The AuType in the FN header is set to indicate cryptographic 462 authentication, the specific value is to be assigned by IANA both for 463 area-scoped and for link-scoped versions. 465 0 1 2 3 466 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 467 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 468 | AuLength | Key ID | Unused | 469 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 470 | Message Digest (bytes 1-4) | 471 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 472 | Message Digest (bytes 5-8) | 473 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 474 | Message Digest (bytes 9-12) | 475 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 476 | Message Digest (bytes 13-16) | 477 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 479 Figure 5 Authentication field in FN packets with MD5 cryptographic 480 authentication. 482 AuLength 483 AuLength is set to 20 bytes. 485 Key ID 486 This field identifies the algorithm and secret key used to create 487 the message digest appended to the FN packet. This field allows 488 that multiple pre-shared keys may exist in parallel. 490 Message Digest 491 The 16 byte long MD5 hash performed on an object which is the 492 concatenation of the FN message, including the FN header, and the 493 pre-shared secret key identified by Key ID. 495 When receiving an FN message, if the FN header indicates MD5 496 authentication, then the last 20 bytes of the FN message are set 497 aside. The recipient data plane element calculates a new MD5 digest 498 of the remainder of the FN message to which it appends its own known 499 secret key identified by Key ID. The calculated and received digests 500 are compared. In case of mismatch, the FN message is discarded. 502 In per-link authentication mode, the Authentication field must be 503 stripped and regenerated hop-by-hop using the key of the outgoing 504 link. 506 5. Security Considerations 508 This draft has described basic optional procedures for 509 authentication. The mechanism, however, does not protect against 510 replay attacks. 512 If applications of FN require protection against replay attacks, then 513 these applications should provide their own specific sequence 514 numbering within the FN payload. Recipient applications should accept 515 FN messages only if the included sequence number is what they 516 expected. 518 Since the message digest of cryptographic authentication also covers 519 the payload, even if an attacker knew how to construct the new 520 sequence number, it would not be able to generate a correct message 521 digest without the pre shared key. This way, a sequence number in the 522 payload combined with FN's cryptographic authentication offers 523 sufficient protection against replay attacks. 525 6. IANA Considerations 527 An IP protocol value needs to be assigned by IANA for FN. IANA also 528 needs to maintain values for FN App Type as applications are being 529 proposed. 531 Multicast addresses used for the distribution trees are either 532 allocated by IANA or they can be a configuration parameter within 533 the local domain. 535 7. References 537 7.1. Normative References 539 [BIDIR-PIM] M. Handley et al., "Bidirectional Protocol Independent 540 Multicast (BIDIR-PIM)", RFC 5015, IETF, 2007 542 7.2. Informative References 544 [Eny2009] Gabor Enyedi, Gabor Retvari, Andras Csaszar, "On Finding 545 Maximally Redundant Trees in Strictly Linear Time", IEEE 546 Symposium on Computers and Communications (ISCC), 2009. 548 [fn-framework] W. Lu, A. Tian, S. Kini, "Fast Notification 549 Framework", draft-lu-fast-notification-framework-00 551 [ipfrr-fn] A. Csaszar, G. Enyedi, S. Kini, "IP Fast Re-Route with 552 Fast Notification", draft-csaszar-ipfrr-fn-00 554 [OSPF] J. Moy, "OSPF Version 2", IETF RFC 2328, 1998 556 [ospf-fn] S. Kini, W. Lu, A. Tian, "OSPF Fast Notifications", draft- 557 kini-ospf-fast-notification-00 559 8. Acknowledgments 561 The authors owe thanks to Acee Lindem and Joel Harpern for their 562 review and comments. 564 This document was prepared using 2-Word-v2.0.template.dot. 566 Appendix A. Further Options for Transport Logic 568 The options described in this appendix represent alternative 569 solutions to the redundant tree based approach described in 570 Section 3.1.2. 572 It is left for WG discussion and further evaluation to decide whether 573 any of these options should potentially be preferred instead of 574 redundant trees. 576 A.1. Unicast 578 This method addresses the need in a unique way. It has the following 579 properties: 581 o Extremely simple, without the need of any data plane change or 582 cooperation; 584 o Short turnaround time (i.e. ready for next hit); 586 o 100% link break coverage (may not work in certain node failure 587 cases); 589 o Little change to OSPF (need encapsulation for IS-IS). 591 A.1.1. Method 593 The method is simple in design, easy to implement and quick to 594 deploy. It requires no topology changes or specific configurations. 595 It adds little overhead to the overall system. 597 The method is literally sending the event message to every router in 598 the domain in form of IP packet. This appears burdensome to the 599 sending router which has to duplicate the packet sending effort many 600 times. Practical experience has shown, however, that the amount of 601 effort is not a big concern in reasonable sized networks. 603 Normal flooding (regular or fast) process requires a router to 604 duplicate the packet to all flooding eligible interfaces. All routers 605 have to be fast-flooding-aware. This implies new code to every router 606 in control plane and/or data plane. 608 The method uses a different approach. It takes advantage of the given 609 routing/forwarding table in each router in the IP domain. The 610 originating router of the flooding information simply sends multiple 611 copies of the packet to each and every router in the domain. These 612 packets are forwarded to the destination routers at data plane speed, 613 just like the way the regular IP data traffic is handled. No special 614 handling in any other routers is needed. 616 This small delay on the sender can be minimized by pre-downloading 617 the link-broken message packets to the data plane. Since the data 618 plane already has the list of all routers which are part of the IGP 619 routing table, the data plane can dispatch the packet directly. 621 By essence, the flooding in this method is tree based, just like a 622 multicast tree. The key is that no special tree is generated for this 623 purpose; the normal routing table which is an SPF tree (SPT) plays a 624 role of the flooding tree. This logic guarantees that the flooding 625 follows the shortest path and no flooding loop is created. 627 A.1.2. Sample Operation 629 1.1. depicts a scenario where router A wants to flood its message to 630 all other routers in the domain using the unicast flooding method. 632 Instead of sending one packet to each of its neighbor, and let the 633 neighbor to flood the packet further, router A directly send the same 634 packet to each router in the domain, one at a time. In this sample 635 network, router A literally sends out 5 packets. 637 A---B---C---D 638 \ 639 --E---F 641 1. Packet(A->B); 642 2. Packet(A->C); 643 3. Packet(A->D); 644 4. Packet(A->E); 645 5. Packet(A->F). 647 Figure 1 Multiple Unicast Packets 649 The unicast flooding procedure is solely controlled by the sending 650 router. No action is needed from other routers other than their 651 normal forwarding functionalities. This method is extremely simple 652 and useful for quick prototyping and deployment. 654 A.2. Gated Multicast through RPF Check 656 This method fulfills the purpose with the following characters: 658 1. No need to build the multicast tree. It is the same as the SPT 659 computed by the IGP routing process; 661 2. The flooding loop is prevented through RPF Check. 663 The method has all the benefit of multicast flooding. It, however, 664 does not require running multicast protocol to setup the multicast 665 tree. The unicast shortest path tree is leveraged as a multicast 666 tree. 668 A.2.1. Loop Prevention - RPF Check 670 In this mechanism, the distribution tree is not explicitly built; 671 rather each node will first do a Reverse Path Forwarding (RPF) check 672 before it floods the notification to other links. 674 A special multicast address is defined and is subject to IANA 675 approval. This address is used to qualify the notification packet for 676 fast flooding. When a notification packet arrives, the receiving node 677 will perform an IP unicast routing table lookup for the originator IP 678 address of the notification and find the outgoing interface. Only 679 when the arriving interface of the notification is the same as the 680 outgoing interface leading towards the originator IP address, the 681 notification will be flooded to other interfaces. 683 IP Multicast forwarding with RPF check is available on most of the 684 routing/switch platforms. To support flooding with RPF check, a 685 special IP multicast group must be used. A bi-directional IP 686 multicast forwarding entry is created that consists of all interfaces 687 with in the flooding scope, typically an IGP area. 689 A.2.2. Operation 691 The Gated flooding operation is illustrated in 1.1. . 693 All Routers, IGP Process: 694 if (SPT ready) { 695 duplicate the SPT as Bidir_Multicast_tree; 696 download the multicast_tree to data plane; 697 } 698 add FNF_multicast_group_addr; 700 Sender of the FNF notification: 701 if (breakage detected) { 702 pack the notification in a packet; 703 send the packet to the FNF_multicast_group_addr; 704 } 706 Receiver of the FNF notification: 707 if (notification received) { 708 if (RPC_interface == incoming_interface) { 709 multicast the notification to all other interfaces; 710 } 711 forward the notification to IGP for processing; 712 } 713 Figure 2 Gated flooding operation 715 1.1. shows a sample operation on a four-router mesh network. The 716 left figure is the topology. The right figure is the shortest path 717 tree rooted at A. 719 Router A initiates the flooding. But the downstream routers B, C, and 720 D will drop all messages except the ones that come from their 721 shortest path parent node. For example, A's message to C via B is 722 dropped by C, because C knows that its reverse path forwarding (RPF) 723 nexthop is A. 725 A A 726 /|\ / \ 727 B---C B C 728 \|/ \ 729 D D 731 Figure 3 Loop Prevention through the RPF check 733 A.3. Further Multicast Tree based Transport Options 735 A.3.1. Source Specific Trees 737 One implementation option is to rely on source specific multicast. 738 This means, that even though there is only a single multicast group 739 address (MC-FN) allocated to the FN service, the FIB of each router 740 is configured with forwarding information for as many trees as many 741 FN sources (nodes) there are in the routing area, i.e. to each 742 (S_i,MC-FN) pair. 744 A.3.2. A Single Bidirectional Shared Tree 746 In the previous solution each source specific tree is a spanning 747 tree. It is possible to reduce the complexity of managing and 748 configuring n spanning trees in the area by using bidirectional 749 shared trees. By building a bidirectional shared tree, all nodes on 750 the tree can send and receive traffic using that single tree. Each 751 sent packet from any source is multicasted on the tree to all other 752 receivers. 754 The tree must be consistently computed at all routers. For this, the 755 following rules may be given: 757 The tree can be computed as a shortest path tree rooted at e.g. the 758 highest router-id. When multiple paths are available, the 759 neighbouring node in the graph e.g. with highest router-id can be 760 picked. When multiple paths are available through multiple interfaces 761 to a neighbouring node, e.g. a numbered interface may be preferred 762 over an unnumbered interface. A higher IP address may be preferred 763 among numbered interfaces and a higher ifIndex may be preferred among 764 unnumbered interfaces. 766 Note, however, that the important point is that the rules are 767 consistent among nodes. That is, a router may pick the lower router 768 IDs if it is ensured that ALL routers will do the same to ensure 769 consistency. 771 Multicast forwarding state is installed using such a tree as a bi- 772 directional tree. Each router on the tree can send packets to all 773 other routers on that tree. 775 Note that the multicast spanning tree can be built using [BIDIR-PIM] 776 so that each router within an area subscribes to the same multicast 777 group address. Using BIDIR-PIM in such a way will eventually build a 778 multicast spanning tree among all routers within the area. (BIDIR-PIM 779 is normally used to build a shared, bidirectional multicast tree 780 among multiple sources and receivers.) 782 A.4. Layer 2 Networks 784 Layer 2 (e.g. Ethernet) networks offer further options for 785 distributing the notification (e.g. using spanning trees offered by 786 STP). Definition of these is being considered and will be included in 787 a future revision of this draft. 789 Appendix B. Computing maximally redundant trees 791 Here we describe a possible, not optimal, way of computing maximally 792 redundant trees. First, we suppose that the network is 2-connected 793 and that we have a central processor computing the trees, then we 794 lift these assumptions. 796 A.5. Simple pair of maximally redundant trees in 2-connected networks 798 Finding a simple pair of maximally redundant trees in a 2-connected 799 network is quite simple. We call a node "ready", if it was already 800 added to the trees. Initially, the only ready node is the common root 801 (node r in the sequel). 803 When we have at least one node x in the network, which is not ready, 804 find two node-disjoint paths from x either to r or to two distinct 805 ready nodes. Since the network is 2-connected, there are always two 806 node-disjoint paths from x to r. It is possible that one or both of 807 these paths reaches another ready node sooner than r, in which case 808 we have the two node-disjoint paths to distinct nodes. Combining the 809 undirected links of these paths makes up an *ear*. 811 x---a 812 \ 813 b 814 | 815 c 816 / 817 y---d 819 Figure 6 An *ear* connected to node x and y (x and y are ready) 821 Let x and y be the two ready endpoints of an ear, and first suppose 822 that they are different nodes and none of them is r. Note that both x 823 and y are in the two trees (since they are "ready") and if x is an 824 ancestor of y in the first tree (x is on the path from y to r), then 825 x cannot be the ancestor of y along the second tree at the same time. 826 Thus, it is safe to connect the nodes of the freshly found ear to x 827 in the first tree and to y in the second tree, if either x is an 828 ancestor of y in the first tree, or y is an ancestor of x in the 829 second tree. Considering the example in Figure 6, this means that 830 links d-c-b-a-x should be added to the first tree, and a-b-c-d-y 831 should be added to the second one. 833 In the case, when either x=r or y=r or when neither x is an ancestor 834 of y nor y is an ancestor of x in any of the trees, the endpoints are 835 not firmly bound to one of the trees, it is only important to put the 836 links to one endpoint in one of the trees and put the links towards 837 the other endpoint to the other tree. In our example this means that 838 either d-c-b-a-x or a-b-c-d-y could be added to the first tree. 839 Naturally, then the other endpoint must be selected for the second 840 tree. 842 In order to protect against the failure of the root r, we need to 843 construct (maximally) redundant trees, where there is only one edge 844 entering to the root on one of the trees. This makes the root a leaf 845 in that tree. To achieve this, we should add the ear to the second 846 tree through r only if both endpoints are r. Moreover, we need to 847 select an ear with different endpoints when it is possible. 849 Finding an ear is relatively simple and can be done in different 850 ways. Probably the simplest way is to find a ready node q (q is not 851 the root) with a non-ready neighbor w, (virtually) remove q from the 852 topology, and to find a path from w to r; since the network is 2- 853 connected, such a path either reaches r, or reach another ready node. 854 Moreover, when only r is ready such a node q does not exist, so we 855 select one of r's neighbors as w, and remove not r but the link 856 between them. 858 e---d 859 / / \ 860 r---c f 861 \ \ / 862 a---b 864 Figure 7 A 2-connected network 866 e---d e---d 867 / / \ 868 r c f r---c f 869 \ \ / \ 870 a---b a---b 872 Figure 8 The two maximally redundant trees found in the network 873 depicted in Figure 7. 875 Now, a simple example is in order. Consider the network depicted in 876 Figure 7, and suppose that the common root is node r. We have only r 877 in the trees, so we select one of its neighbors, let it be a, remove 878 the link between them, and select a path (let it be the shortest one) 879 from a to r; this path is a-b-c-r, so the ear is r-a-b-c-r. Since 880 both endpoints of the ear are r, selecting the right tree is not 881 important, e.g., we can add c-b-a-r to the first tree, and a-b-c-r to 882 the second one (Figure 8). This way, r, a, b and c form the set of 883 "ready" nodes. From the ready set, c and d are not the root and have 884 non-ready neighbors. Let us select, e.g., c. The shortest path from d 885 to r when c is removed is d-e-r, so we have ear c-d-e-r, we add d-e-r 886 to the first tree and e-d-c to the second one (recall that we do not 887 want to create a new neighbor for r in the second tree). Finally, the 888 last non-ready node is f, and the ear is b-f-d. Since neither is b an 889 ancestor of d nor is d an ancestor of b in any of the trees, we can 890 connect f to the trees in both ways. E.g., add f-b to the first tree, 891 and f-d to the second one. 893 A.6. Non-2-connected networks 895 When, however, the network is not 2-connected, it is not always 896 possible to find a pair of node-disjoint paths from any node x to 897 root r, which makes our previous algorithm unable to find the trees. 898 However, while the network is connected, it is made up by 2-connected 899 components bordered by "cut-vertices" (naturally, some of these 900 components may contain only one node). A node is a cut-vertex, if 901 removing that node splits the network into two. 903 A simple algorithm to find the components and the cut-vertices can be 904 to (virtually) remove each vertex one by one, and check connectivity 905 with BFS or DFS. Moreover, nodes a and b are in the same 2-connected 906 component, if a remains reachable from b after removing any single 907 node. Note that linear time algorithms do exist that find both the 2- 908 connected components and the cut-vertices. 910 Now, we can build up redundant trees in each component. In components 911 containing r, the root of such trees must be r. Otherwise, in the 912 remaining components the root must be the last node in the component 913 along a path to the root. Recall, that this must be a cut-vertex, so 914 it is the same for each path emanating from that component. 916 At this point, we are ready, if there is no cut-edge in the network. 917 However, if some 2-connected components are connected by a cut-edge, 918 we must add that edge to both of the trees. 920 e---d i 921 / / \ /| 922 r---c f--g | 923 \ \ / \| 924 a---b j 926 Figure 9 Non-2-connected network 928 e---d i e---d i 929 / /| / \ | 930 r c f--g | r---c f--g | 931 \ \ / | \ \| 932 a---b j a---b j 934 Figure 10 The two maximally redundant trees found in the network 935 depicted Figure 9. 937 As an example consider the network depicted in Figure 9. Observe that 938 now we have two 2-connected components, one contains r, a, b, c, d, 939 e, f and the other contains g, i, j. Moreover, these components have 940 no common node, they are connected with a cut-edge. 942 Finding the trees in the component containing r is simple, these 943 trees are the same as previously. Moreover, the other component is a 944 cycle, so it will be covered by a single ear. Finally we must add 945 link f-g to both of the trees, to get the trees depicted in Figure 946 10. 948 A.7. Finding maximally redundant trees in distributed environment 950 If we need to compute exactly the same maximally redundant trees at 951 each of the routers, consistency needs to be ensured by tie-breaking 952 mechanisms. Observe that the previous algorithm has multiple choices 953 when it selects how to connect nodes to the trees when only r is 954 ready, how to select ready node q and non-ready node w for a later 955 ear and when neither of the endpoints is an ancestor of the other 956 one. 958 All the previous problems can easily be handled. E.g., the first ear 959 should be connected in such a way, that the neighbor of r with the 960 lowest ID must be directly connected to r in the first tree. 961 Moreover, later we should choose ready router with non-ready neighbor 962 as q and its non-ready neighbor with the lowest ID as w. Finally, 963 when neither of the endpoint is an ancestor of the other one, connect 964 the ear to the endpoint with the lower ID in the first tree. 966 Authors' Addresses 968 Wenhu Lu 969 Ericsson 970 300 Holger Way, San Jose, CA 95134 971 Email: wenhu.lu@ericsson.com 973 Sriganesh Kini 974 Ericsson 975 300 Holger Way, San Jose, CA 95134 976 Email: sriganesh.kini@ericsson.com 978 Andras Csaszar 979 Ericsson 980 Irinyi utca 4-10, Budapest, Hungary, 1117 981 Email: Andras.Csaszar@ericsson.com 983 Gabor Sandor Enyedi 984 Ericsson 985 Irinyi utca 4-10, Budapest, Hungary, 1117 986 Email: Gabor.Sandor.Enyedi@ericsson.com 988 Jeff Tantsura 989 Ericsson 990 300 Holger Way, San Jose, CA 95134 991 Email: jeff.tantsura@ericsson.com 993 Albert Tian 994 Ericsson 995 300 Holger Way, San Jose, CA 95134 996 Email: albert.tian@ericsson.com