idnits 2.17.00 (12 Aug 2021) /tmp/idnits25464/draft-ietf-idmr-gum-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2022-05-14) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. 
** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 2) being 109 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 43 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There is 1 instance of too long lines in the document, the longest one being 3 characters in excess of 72. ** The abstract seems to contain references ([MASC]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 1387 has weird spacing: '...is less than...' == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'BR32' is mentioned on line 216, but not defined == Missing Reference: 'BR41' is mentioned on line 216, but not defined == Missing Reference: 'BR31' is mentioned on line 218, but not defined == Missing Reference: 'BR42' is mentioned on line 218, but not defined == Missing Reference: 'BR43' is mentioned on line 218, but not defined == Missing Reference: 'BR22' is mentioned on line 220, but not defined == Missing Reference: 'BR52' is mentioned on line 220, but not defined == Missing Reference: 'BR53' is mentioned on line 220, but not defined == Missing Reference: 'BR21' is mentioned on line 225, but not defined == Missing Reference: 'BR51' is mentioned on line 225, but not defined == Missing Reference: 'BR12' is mentioned on line 227, but not defined == Missing Reference: 'BR61' is mentioned on line 227, but not defined == Missing Reference: 'BR13' is mentioned on line 229, but not defined == Missing Reference: 'BR71' is mentioned on line 233, but not defined == Missing Reference: 'BR81' is mentioned on line 233, but not defined == Missing Reference: 'BRXY' is mentioned on line 237, but not defined == Missing Reference: 'HPIM' is mentioned on line 256, but not defined == Missing Reference: 'PIM-SM' is mentioned on line 256, but not defined == Unused Reference: 'DVMRP' is defined on line 1848, but no explicit reference was found in the text == Unused Reference: 'MOSPF' is defined on line 1873, but no explicit reference was found in the text == Unused Reference: 'PIMDM' is defined on line 1877, but no explicit reference was 
found in the text ** Obsolete normative reference: RFC 1771 (ref. 'BGP') (Obsoleted by RFC 4271) ** Obsolete normative reference: RFC 2283 (ref. 'MBGP') (Obsoleted by RFC 2858) -- Possible downref: Non-RFC (?) normative reference: ref. 'CBT' == Outdated reference: A later version (-02) exists of draft-ietf-idmr-cbt-br-spec-00 -- Possible downref: Normative reference to a draft: ref. 'CBTDM' == Outdated reference: A later version (-11) exists of draft-ietf-idmr-dvmrp-v3-05 ** Downref: Normative reference to an Informational draft: draft-ietf-idmr-dvmrp-v3 (ref. 'DVMRP') -- Possible downref: Non-RFC (?) normative reference: ref. 'DWR' == Outdated reference: draft-thaler-multicast-interop has been published as RFC 2715 ** Downref: Normative reference to an Informational draft: draft-thaler-multicast-interop (ref. 'INTEROP') == Outdated reference: draft-ietf-ipngwg-multicast-assgn has been published as RFC 2375 ** Downref: Normative reference to an Informational draft: draft-ietf-ipngwg-multicast-assgn (ref. 'IPv6MAA') == Outdated reference: A later version (-03) exists of draft-ietf-mboned-imrp-some-issues-02 -- Possible downref: Normative reference to a draft: ref. 'ISSUES' -- Possible downref: Non-RFC (?) normative reference: ref. 'MASC' ** Downref: Normative reference to an Historic RFC: RFC 1584 (ref. 'MOSPF') == Outdated reference: A later version (-08) exists of draft-ietf-idmr-pim-dm-spec-05 -- Possible downref: Normative reference to a draft: ref. 'PIMDM' ** Obsolete normative reference: RFC 2117 (ref. 'PIMSM') (Obsoleted by RFC 2362) ** Obsolete normative reference: RFC 1966 (ref. 'REFLECT') (Obsoleted by RFC 4456) ** Obsolete normative reference: RFC 1700 (Obsoleted by RFC 3232) -- Duplicate reference: RFC1771, mentioned in 'RFC1771', was also mentioned in 'BGP'. ** Obsolete normative reference: RFC 1771 (Obsoleted by RFC 4271) Summary: 20 errors (**), 0 flaws (~~), 32 warnings (==), 9 comments (--). 
Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 IDMR Working Group D. Thaler 2 Internet Engineering Task Force Microsoft 3 INTERNET-DRAFT D. Estrin 4 November 17, 1998 USC/ISI 5 Expires May 1999 D. Meyer 6 Cisco 7 Editors 9 Border Gateway Multicast Protocol (BGMP): 10 Protocol Specification 11 13 Status of this Memo 15 This document is an Internet Draft. Internet Drafts are working 16 documents of the Internet Engineering Task Force (IETF), its Areas, and 17 its Working Groups. Note that other groups may also distribute working 18 documents as Internet Drafts. 20 Internet Drafts are valid for a maximum of six months and may be 21 updated, replaced, or obsoleted by other documents at any time. It is 22 inappropriate to use Internet Drafts as reference material or to cite 23 them other than as a "work in progress". 25 Abstract 27 This document describes BGMP, a protocol for inter-domain multicast 28 routing. BGMP builds shared trees for active multicast groups, and 29 allows receiver domains to build source-specific, inter-domain, 30 distribution branches where needed. Building upon concepts from CBT and 31 PIM-SM, BGMP requires that each multicast group be associated with a 32 single root (in BGMP it is referred to as the root domain). BGMP 33 assumes that at any point in time, different ranges of the class D space 34 are associated (e.g., with MASC [MASC]) with different domains. Each of 35 these domains then becomes the root of the shared domain-trees for all 36 groups in its range. Multicast participants will generally receive 37 better multicast service if the session initiator's address allocator 38 selects addresses from its own domain's part of the space, thereby 40 Draft BGMP November 1998 42 causing the root domain to be local to at least one of the session 43 participants. 45 1. 
Acknowledgements 47 In addition to the editors, the following individuals have 48 contributed to the design of BGMP: Cengiz Alaettinoglu, Tony 49 Ballardie, Steve Casner, Steve Deering, Dino Farinacci, Bill Fenner, 50 Mark Handley, Ahmed Helmy, Van Jacobson, and Satish Kumar. 52 This document is the product of the IETF IDMR Working Group with Dave 53 Thaler, Deborah Estrin, and David Meyer as editors. 55 Rusty Eddy also provided valuable feedback on this document. 57 2. Purpose 59 It has been suggested that inter-domain multicast is better supported 60 with a rendezvous mechanism whereby members receive source's data 61 packets without any sort of global broadcast (e.g., DVMRP and PIM-DM 62 broadcast initial data packets and MOSPF broadcasts membership 63 information). CBT [CBT] and PIM-SM [PIMSM] use a shared group-tree, 64 to which all members join and thereby hear from all sources (and to 65 which non-members do not join and thereby hear from no sources). 67 This document describes BGMP, a protocol for inter-domain multicast 68 routing. BGMP builds shared trees for active multicast groups, and 69 allows domains to build source-specific, inter-domain, distribution 70 branches where needed. Building upon concepts from CBT and PIM-SM, 71 BGMP requires that each global multicast group be associated with a 72 single root. However, in BGMP, the root is an entire exchange or 73 domain, rather than a single router. 75 BGMP assumes that ranges of the class D space have been associated 76 (e.g., with MASC [MASC]) with selected domains. Each such domain then 77 becomes the root of the shared domain-trees for all groups in its 78 range. An address allocator will generally achieve better 79 distribution trees if it takes its multicast addresses from its own 80 domain's part of the space, thereby causing the root domain to be 81 local. 83 BGMP uses TCP as its transport protocol. 
This eliminates the need to 84 implement message fragmentation, retransmission, acknowledgement, and 88 sequencing. BGMP uses TCP port 264 for establishing its connections. 89 This port is distinct from BGP's port to provide protocol 90 independence, and to facilitate distinguishing between protocol 91 packets (e.g., by packet classifiers, diagnostic utilities, etc.). 93 Two BGMP peers form a TCP connection between one another, and 94 exchange messages to open and confirm the connection parameters. 95 They then send incremental Join/Prune Updates as group memberships 96 change. BGMP does not require periodic refresh of individual 97 entries. KeepAlive messages are sent periodically to ensure the 98 liveness of the connection. Notification messages are sent in 99 response to errors or special conditions. If a connection encounters 100 an error condition, a notification message is sent and the connection 101 is closed. 103 3. Terminology 105 This document uses the following technical terms: 107 Domain: 108 A set of one or more contiguous links and zero or more routers 109 surrounded by one or more multicast border routers. Note that this 110 loose definition of domain also applies to an external link between 111 two domains, as well as an exchange. 113 Root Domain: 114 When constructing a shared tree of domains for some group, one 115 domain will be the "root" of the tree. The root domain receives 116 data from each sender to the group, and functions as a rendezvous 117 domain toward which member domains can send inter-domain joins, and 118 to which sender domains can send data. 120 Multicast RIB: 121 The Routing Information Base, or routing table, used to calculate 122 the "next-hop" towards a particular address for multicast traffic. 124 Multicast IGP (M-IGP): 125 A generic term for any multicast routing protocol used for tree 126 construction within a domain. Typical examples of M-IGPs are: 127 DVMRP, PIM-DM, PIM-SM, CBT, and MOSPF.
131 EGP: A generic term for the interdomain unicast routing protocol in use. 132 Typically, this will be some version of BGP which can support a 133 Multicast RIB, such as MBGP [MBGP], containing both unicast and 134 multicast address prefixes. 136 Component: 137 The portion of a border router associated with (and logically 138 inside) a particular domain that runs the multicast IGP (M-IGP) for 139 that domain, if any. Each border router thus has zero or more 140 components inside routing domains. In addition, each border router 141 with external links that do not fall inside any routing domain will 142 have an inter-domain component that runs BGMP. 144 External peer: 145 A border router in another multicast AS (autonomous system, as used 146 in BGP), to which a BGMP TCP-connection is open. Assuming MBGP is 147 being used, a separate "eBGP" TCP-connection will also be open to 148 the same peer. 150 Internal peer: 151 Another border router of the same multicast AS. A border router 152 either speaks iBGP ("internal" BGP) directly to internal peers in a 153 full mesh, or indirectly through a route reflector [REFLECT]. A 154 border router is only required to establish a BGMP TCP-connection 155 to an internal peer when one border router acts as a data 156 injector for another. 158 Next-hop peer: 159 The next-hop peer towards a given IP address is the next EGP router 160 on the path to the given address, according to multicast RIB routes 161 in the EGP's routing table (e.g., in MBGP, routes whose Subsequent 162 Address Family Identifier field indicates that the route is valid 163 for multicast traffic). 165 Target: 166 Either an EGP peer, or an M-IGP component. 168 Tree State Table: 169 This is a table of (S-prefix,G-prefix) entries (including (*,G- 170 prefix) entries) that have been explicitly joined by a set of 171 targets.
Each entry has, in addition to the source and group 172 addresses and masks, a list of targets that have explicitly 173 requested data (on behalf of directly connected hosts or downstream 174 routers). (S,G) entries also have an "SPT" bit. 178 The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in 179 this document are to be interpreted as described in [RFC2119]. 181 4. Protocol Overview 183 BGMP maintains group-prefix state in response to messages from BGMP 184 peers and notifications from M-IGP components. Group-shared trees are 185 rooted at the domain advertising the group prefix covering those 186 groups. When a receiver joins a specific group address, the border 187 router towards the root domain generates a group-specific Join 188 message, which is then forwarded Border-Router-by-Border-Router 189 towards the root domain (see Figure 1). BGMP Join and Prune messages 190 are sent over TCP connections between BGMP peers, and BGMP protocol 191 state is refreshed by KEEPALIVE messages periodically sent over TCP. 193 BGMP routers build group-specific bidirectional forwarding state as 194 they process the BGMP Join messages. Bidirectional forwarding state 195 means that packets received from any target are forwarded to all 196 other targets in the target list without any RPF checks. No group- 197 specific state or traffic exists in parts of the network where there 198 are no members of that group. 200 BGMP routers build source-specific unidirectional forwarding state, 201 only where needed, to be compatible with source-specific trees (SPTs) 202 used by some M-IGPs (e.g., DVMRP, PIM-DM, or PIM-SM). A domain that 203 uses an SPT-based M-IGP may need to inject multicast packets from 204 external sources via different border routers (to be compatible with 205 the M-IGP RPF checks) which thus act as "surrogates". For example, in 206 the Transit_1 domain, data from Src_A arrives at BR12, but must be 207 injected by BR11.
A surrogate router may create a source-specific 208 BGMP branch if no shared tree state exists. Note: stub domains with 209 a single border router, such as Rcvr_Stub_7 in Figure 1, receive all 210 multicast data packets through that router, to which all RPF checks 211 point. Therefore, stub domains never build source-specific state.

213                       Root_Domain
214             [BR91]--------------------------\
215                |                            |
216             [BR32]                       [BR41]
217             Transit_3                    Transit_4
218             [BR31]                       [BR42]   [BR43]
219                |                            |        |
220             [BR22]                       [BR52]   [BR53]
221             Transit_2                    Transit_5
225             [BR21]                       [BR51]
226                |                            |
227             [BR12]                       [BR61]
228             Transit_1[BR11]----------[BR62]Stub_6
229             [BR13]                       (Src_A)
230                |                         (Rcvr_D)
231       -------------------
232       |                 |
233    [BR71]            [BR81]
234 Rcvr_Stub_7      Src_only_Stub_8
235   (Rcvr_C)           (Src_B)

237 Figure 1: Example inter-domain topology. [BRXY] represents a BGMP border 238 router. Transit_X is a transit domain network. *_Stub_X is a stub 239 domain network.

241 Data packets are forwarded based on a combination of BGMP and M-IGP 242 rules. The router forwards to a set of targets according to a 243 matching (S,G) BGMP tree state entry if it exists. If not found, the 244 router checks for a matching (*,G) BGMP tree state entry. If neither 245 is found, then the packet is sent natively to the next-hop EGP peer 246 for G, according to the Multicast RIB (for example, in the case of a 247 non-member sender such as Src_B in Figure 1). If a matching entry was 248 found, the packet is forwarded to all other targets in the target 249 list. In this way BGMP trees forward data in a bidirectional manner. 250 If a target is an M-IGP component then forwarding is subject to the 251 rules of that M-IGP protocol. 253 4.1. Design Rationale 255 Several other protocols, or protocol proposals, build shared trees 256 within domains [CBT, HPIM, PIM-SM]. The design choices made for BGMP 257 result from our focus on Inter-Domain multicast in particular.
The 258 design choices made by CBT and PIM-SM are better suited to the wide- 259 area intra-domain case. There are three major differences between 260 BGMP and other shared-tree protocols: 262 (1) Unidirectional vs. Bidirectional trees 264 Bidirectional trees (using bidirectional forwarding state as 265 described above) minimize third party dependence which is essential 266 in the inter-domain context. For example, in Figure 1, stub domains 7 267 and 8 would like to exchange multicast packets without being 271 dependent on the quality of connectivity of the root domain. 272 However, unidirectional shared trees (i.e., those using RPF checks) 273 have more aggressive loop prevention and share the same processing 274 rules as source-specific entries which are inherently unidirectional. 276 The lack of third party dependence concerns in the INTRA domain case 277 reduces the incentive to employ bidirectional trees. BGMP supports 278 bidirectional trees because it has to, and because it can without 279 excessive cost. 281 (2) Source-specific distribution trees/branches 283 In a departure from other shared tree protocols, source-specific BGMP 284 state is built ONLY where (a) it is needed to pull the multicast 285 traffic down to a BGMP router that has source-specific (S,G) state, 286 and (b) that router is NOT already on the shared tree (i.e., has no 287 (*,G) state), and (c) that router does not want to receive packets 288 via encapsulation from a router which is on the shared tree. 289 BGMP provides source-specific branches because most M-IGP protocols 290 in use today build source-specific trees. BGMP's source-specific 291 branches eliminate the unnecessary overhead of encapsulations for 292 high data rate sources from the shared tree's ingress router to the 293 surrogate injector (e.g., from BR12 to BR11 in Figure 1). Moreover, 294 cases in which shared paths are significantly longer than SPT paths 295 will also benefit.
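As a rough illustration only (it is not part of the protocol specification), conditions (a) through (c) above can be read as a single predicate evaluated per router and per (S,G) pair. The sketch below is written in Python; the RouterView type, its field names, and the function name are hypothetical and were introduced purely for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterView:
    """Hypothetical snapshot of one BGMP router's state for a single (S,G) pair."""

    has_sg_state: bool           # (a) router holds source-specific (S,G) state
    has_star_g_state: bool       # (b) router is on the shared tree, i.e. has (*,G) state
    accepts_encapsulation: bool  # (c) router is content to receive S's packets
                                 #     encapsulated from a router on the shared tree


def needs_source_specific_branch(view: RouterView) -> bool:
    """Return True only when conditions (a), (b) and (c) all hold."""
    return (
        view.has_sg_state
        and not view.has_star_g_state
        and not view.accepts_encapsulation
    )
```

Under these assumptions, a surrogate injector such as BR11 in Figure 1 that has (S,G) state, no (*,G) state, and does not want encapsulated delivery would satisfy the predicate. A router that already holds (*,G) state would not, which matches the rule that shared tree state takes precedence.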
297 However, we do not build source-specific inter-domain trees in 298 general because (a) inter-domain connectivity is generally less rich 299 than intra-domain connectivity, so shared distribution trees should 300 have more acceptable path length and traffic concentration properties 301 in the inter-domain context than in the intra-domain case, and (b) 302 by having the shared tree state always take precedence over source- 303 specific tree state, we avoid ambiguities that can otherwise arise. 305 In summary, BGMP trees are, in a sense, a hybrid between CBT and 306 PIM-SM trees. 308 (3) Method of choosing root of group shared tree 310 The choice of a group's shared-tree-root has implications for 311 performance and policy. In the intra-domain case it can be assumed 312 that all potential shared-tree roots (RPs/Cores) within the domain 313 are equally suited to be the root for a group that is initiated 314 within that domain. In the INTER-domain case, there is far more 315 opportunity for unacceptably poor locality, and administrative 319 control of a group's shared-tree root. Therefore in the intra-domain 320 case, other protocols treat all candidate roots (RPs or Cores) as 321 equivalent and emphasize load sharing and stability to maximize 322 performance. In the Inter-Domain case, all roots are not equivalent, 323 and we adopt an approach whereby a group's root domain is not random 324 but is subject to administrative and performance input. 326 5. Protocol Details 328 In this section, we describe the detailed protocol that border 329 routers perform. We assume that each border router conforms to the 330 component-based model described in [INTEROP]. 332 5.1. Interaction with the EGP 334 A fundamental requirement imposed by BGMP on the design of an EGP is 335 that it be able to carry multicast prefixes.
For example, a multi- 336 protocol BGP (MBGP) must be able to carry a multicast prefix in the 337 Unicast Network Layer Reachability Information (NLRI) field of the 338 UPDATE message (i.e., either an IPv4 class D prefix or an IPv6 prefix 339 with high-order octet equal to FF [IPv6MAA]). This capability is 340 required by BGMP in the implementation of bi-directional trees; BGMP 341 must be able to forward data and control packets to the next hop 342 towards either a unicast source S or a multicast group G (see section 343 5.2). It is also required that the path attributes defined in 344 [RFC1771] have the same semantics whether they accompany unicast 345 or multicast NLRI. 347 MBGP [MBGP] satisfies the requirement described above. [MBGP] defines 348 the optional non-transitive attributes Multiprotocol Reachable NLRI 349 (MP_REACH_NLRI) and Multiprotocol Unreachable NLRI (MP_UNREACH_NLRI) to 350 carry sets of reachable or unreachable destinations, and the 351 appropriate next hop in the case of MP_REACH_NLRI. These attributes 352 contain an Address Family Information field [RFC1700] which indicates 353 the type of NLRI carried in the attribute. In addition, the attribute 354 carries another field, the Subsequent Address Family Identifier, or 355 SAFI, which can be used to provide additional information about the 356 type of NLRI. For example, SAFI value two indicates that the NLRI is 357 valid for multicast forwarding. BGMP's requirement can be satisfied 358 by allowing the NLRI field of the MP_REACH_NLRI (or MP_UNREACH_NLRI) 359 to carry a multicast prefix in the Prefix field of the NLRI encoding. 361 Finally, while not required for correct BGMP operation, the design of 365 an EGP should also provide a mechanism that allows discrimination 366 between NLRI that is to be used for unicast forwarding and NLRI to be 367 used for multicast forwarding. This property is required to support 368 multicast-specific policy.
As mentioned above, MBGP [MBGP] has this 369 capability. 371 5.2. Multicast Data Packet Processing 373 For BGMP rules to be applied, an incoming packet must first be 374 "accepted": 376 o If the packet arrived on an interface owned by an M-IGP, the M-IGP 377 component determines whether the packet should be accepted or 378 dropped according to its rules. If the packet is accepted, the 379 packet is forwarded (or not forwarded) out any other interfaces 380 owned by the same component, as specified by the M-IGP. 382 o If the packet was received over a point-to-point interface owned 383 by BGMP, the packet is accepted. 385 o If the packet arrived on a multiaccess network interface owned by 386 BGMP, the packet is accepted if the router is the designated forwarder for 387 the longest matching route for S, if it is receiving data on a 388 source-specific branch, or for the longest matching route for G. 390 If the packet is accepted, then the router checks the tree state 391 table for a matching (S,G) entry. If one is found, but the packet 392 was not received from the next hop target towards S (if the entry's 393 SPT bit is True), or was not received from the next hop target 394 towards G (if the entry's SPT bit is False) then the packet is 395 dropped and no further actions are taken. If no (S,G) entry was 396 found, the router then checks for a matching (*,G) entry. 398 If neither is found, then the packet is forwarded towards the next- 399 hop peer for G, according to the Multicast RIB. If a matching entry 400 was found, the packet is forwarded to all other targets in the target 401 list. 403 Forwarding to a target which is an M-IGP component means that the 404 packet is forwarded out any interfaces owned by that component 405 according to that component's multicast forwarding rules. 409 5.3. BGMP processing of Join and Prune messages and notifications 411 5.3.1.
Receiving Joins 413 When the BGMP component receives a (*,G) or (S,G) Join alert from 414 another component, or a BGMP (S,G) or (*,G) Join message from an 415 external peer, it searches the tree state table for a matching entry. 416 If an entry is found, and that peer is already listed in the target 417 list, then no further actions are taken. 419 Otherwise, if no (*,G) or (S,G) entry was found, one is created. In 420 the case of a (*,G), the target list is initialized to contain the 421 next-hop peer towards G, if it is an external peer. If the peer is 422 internal, the target list is initialized to contain the M-IGP 423 component owning the next-hop interface. If there is no next-hop 424 peer (because G is inside the domain), then the target list is 425 initialized to contain the next-hop component. If an (S,G) entry 426 exists for the same G for which the (*,G) Join is being processed, 427 and the next-hop peers toward S and G are different, the BGMP router 428 must first send a (S,G) Prune message toward the source and clear the 429 SPT bit on the (S,G) entry, before activating the (*,G) entry. 431 The target from which the Join was received is then added to the 432 target list. The router then looks up S or G in the Multicast RIB to 433 find the next-hop EGP peer. If the target list, not including the 434 next-hop target towards G for a (*,G) entry, becomes non-null as a 435 result, the next-hop EGP peer must be notified as follows: 437 a) If the next-hop peer towards G (for a (*,G) entry) is an external 438 peer, a BGMP (*,G) Join message is unicast to the external peer. 439 If the next-hop peer towards S (for an (S,G) entry) is an external 440 peer, and the router does NOT have any active (*,G) state for that 441 group address G, a BGMP (S,G) Join message is unicast to the 442 external peer. A BGMP (S,G) Join message is never sent to an 443 external peer by a router that also contains active (*,G) state 444 for the same group. 
If the next-hop peer towards S (for an (S,G) 445 entry) is an external peer and the router DOES have active (*,G) 446 state for that group G, the SPT bit is always set to False. 448 b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Join 449 alert is sent to the M-IGP component owning the next-hop 450 interface. 452 c) If there is no next-hop peer, a (*,G) or (S,G) Join alert is sent 453 to the M-IGP component owning the next-hop interface. 457 5.3.2. Receiving Prune Notifications 459 When the BGMP component receives a (*,G) or (S,G) Prune alert from 460 another component, or a BGMP (*,G) or (S,G) Prune message from an 461 external peer, it searches the tree state table for a matching entry. 462 If no (S,G) entry was found for an (S,G) Prune, but (*,G) state 463 exists, an (S,G) entry is created, with the target list copied from 464 the (*,G) entry. If no matching entry exists, or if the component or 465 peer is not listed in the target list, no further actions are taken. 467 Otherwise, the component or peer is removed from the target list. If 468 the target list becomes null as a result, the next-hop peer towards G 469 (for a (*,G) entry), or towards S (for an (S,G) entry if and only if 470 the BGMP router does NOT have any corresponding (*,G) entry), must be 471 notified as follows. 473 a) If the peer is an external peer, a BGMP (*,G) or (S,G) Prune 474 message is unicast to it. 476 b) If the next-hop peer is an internal peer, a (*,G) or (S,G) Prune 477 alert is sent to the M-IGP component owning the next-hop 478 interface. 480 c) If there is no next-hop peer, a (*,G) or (S,G) Prune alert is sent 481 to the M-IGP component owning the next-hop interface. 483 5.3.3. Receiving Route Change Notifications 485 When a border router receives a route for a new prefix in the 486 multicast RIB, or an existing route for a prefix is withdrawn, a route 487 change notification for that prefix must be sent to the BGMP 488 component.
In addition, when the next hop peer (according to the 489 multicast RIB) changes, a route change notification for that prefix 490 must be sent to the BGMP component. 492 In addition, an internal route for each class-D prefix associated 493 with the domain (if any) MUST be injected into the multicast RIB in 494 the EGP by the domain's border routers. 496 When a route for a new group prefix is learned, or an existing route 497 for a group prefix is withdrawn, or the next-hop peer for a group 498 prefix changes, a BGMP router updates all affected (*,G) target 499 lists. The router sends a (*,G) Join to the new next-hop target, and 503 a (*,G) Prune to the old next-hop target, as appropriate. 505 When an existing route for a source prefix is withdrawn, or the 506 next-hop peer for a source prefix changes, a BGMP router updates all 507 affected (S,G) target lists. The router sends a (S,G) Join to the 508 new next-hop target, and a (S,G) Prune to the old next-hop target, as 509 appropriate. 511 5.4. Interaction with M-IGP components 513 When an M-IGP component on a border router first learns that there 514 are internally-reached members for a group G (whose scope is larger 515 than that domain), a (*,G) Join alert is sent to the BGMP component. 516 Similarly, when an M-IGP component on a border router learns that 517 there are no longer internally-reached members for a group G (whose 518 scope is larger than a single domain), a (*,G) Prune alert is sent to 519 the BGMP component. 521 At any time, any M-IGP domain MAY decide to join a source-specific 522 branch for some external source S and group G. When the M-IGP 523 component in the border router that is the next-hop router for a 524 particular source S learns that a receiver wishes to receive data 525 from S on a source-specific path, an (S,G) Join alert is sent to the 526 BGMP component.
   When it is learned that such receivers no longer exist, an (S,G)
   Prune alert is sent to the BGMP component.  Recall that the BGMP
   component will generate external source-specific Joins only where the
   source-specific branch does not coincide with the shared distribution
   tree for that group.

   Finally, we will require that the border router that is the next-hop
   internal peer for a particular address S or G be able to forward data
   for a matching tree state table entry to all members within the
   domain.  This requirement has implications on specific M-IGPs as
   follows.

5.4.1.  Interaction with DVMRP and PIM-DM

   DVMRP and PIM-DM are both "broadcast and prune" protocols in which
   every data packet must pass an RPF check against the packet's source
   address, or be dropped.  If the border router receiving packets from
   an external source is the only BR to inject the route for the source
   into the domain, then there are no problems.  For example, this will
   always be true for stub domains with a single border router (see
   Figure 1).  Otherwise, the border router receiving packets externally
   is responsible for encapsulating the data to any other border routers
   that must inject the data into the domain for RPF checks to succeed.
   Although peering sessions to internal peers are normally not
   required, in this situation, BGMP TCP-connections must exist between
   such internal peers, and the "virtual" interfaces used for
   encapsulation are owned by BGMP.

   When an intended border router injector for a source receives
   encapsulated packets from another border router in its domain, it
   should create source-specific (S,G) BGMP state.  Note that the border
   router may be configured to do this on a data-rate triggered basis so
   that the state is not created for very low data-rate/intermittent
   sources.
   If source-specific state is created, then its incoming interface
   points to the virtual encapsulation interface from the border router
   that forwarded the packet, and it has an SPT flag that is initialized
   to be False.

   When the (S,G) BGMP state is created, the BGMP component will in turn
   send a BGMP (S,G) Join message to the next-hop external peer towards
   S if there is no (*,G) state for that same group, G.  The (S,G) BGMP
   state will have the SPT bit set to False if (*,G) BGMP state is
   present.

   When the first data packet from S arrives from the external peer and
   matches on the BGMP (S,G) state, and IF there is no (*,G) state, the
   router sets the SPT flag to True, resets the incoming interface to
   point to the external peer, and sends a BGMP (S,G) Prune message to
   the border router that was encapsulating the packets (e.g., in Figure
   1, BR11 sends the (Src_A,G) Prune to BR12).  When the border router
   with (*,G) state receives the prune for (S,G), it then deletes that
   border router from its list of targets.

   PIM-DM and DVMRP present an additional problem, i.e., no protocol
   mechanism exists for joining and pruning entire groups; only joins
   and prunes for individual sources are available.  We therefore
   require that some form of Domain-Wide Reports (DWRs) [DWR] be
   available within such domains.  Such messages provide the ability to
   join and prune an entire group across the domain.  One simple
   heuristic to approximate DWRs is to assume that if there are any
   internally-reached members, then at least one of them is a sender.
   With this heuristic, the presence of any M-IGP (S,G) state for
   internally-reached sources can be used instead.  Sending a data
   packet to a group is then equivalent to sending a DWR for the group.

5.4.2.  Interaction with PIM-SM

   Protocols such as PIM-SM build unidirectional shared and
   source-specific trees.
   As with DVMRP and PIM-DM, every data packet must pass an RPF check
   against some group-specific or source-specific address.

   The fewest encapsulations/decapsulations will be done when the
   intra-domain tree is rooted at the next-hop internal peer towards G
   (which becomes the RP), since in general that router will receive the
   most packets from external sources.  To achieve this, each BGMP
   border router to a PIM-SM domain should send
   Candidate-RP-Advertisements within the domain for those groups for
   which it is the shared-domain tree ingress router.  When the border
   router that is the RP for a group G receives an external data packet,
   it forwards the packet according to the M-IGP (i.e., PIM-SM)
   shared-tree outgoing interface list.

   Other border routers will receive data packets from external sources
   that are farther down the bidirectional tree of domains.  When a
   border router that is not the RP receives an external packet for
   which it does not have a source-specific entry, the border router
   treats it like a local source by creating (S,G) state with a Register
   flag set, based on normal PIM-SM rules; the border router then
   encapsulates the data packets in PIM-SM Registers and unicasts them
   to the RP for the group.  As explained above, the RP for the
   inter-domain group will be one of the other border routers of the
   domain.

   If a source's data rate is high enough, DRs within the PIM-SM domain
   may switch to the shortest path tree.  If the shortest path to an
   external source is via the group's ingress router for the shared
   tree, the new (S,G) state in the BGMP border router will not cause
   BGMP (S,G) Joins because that border router will already have (*,G)
   state.  If, however, the shortest path to an external source is via
   some other border router, that border router will create (S,G) BGMP
   state in response to the M-IGP (S,G) Join alert.
   In this case, because there is no local (*,G) state to suppress it,
   the border router will send a BGMP (S,G) Join to the next-hop
   external peer towards S, in order to pull the data down directly.
   (See BR11 in Figure 1.)  As in normal PIM-SM operation, those PIM-SM
   routers that have (*,G) and (S,G) state pointing to different
   incoming interfaces will prune that source off the shared tree.
   Therefore, all internal interfaces may be eventually pruned off the
   internal shared tree.

5.4.3.  Interaction with CBT

   CBT builds bidirectional shared trees but must address two points of
   compatibility with BGMP.  First, CBT cannot accommodate more than
   one border router injecting a packet.  Therefore, if a CBT domain
   does have multiple external connections, the M-IGP components of the
   border routers are responsible for ensuring that only one of them
   will inject data from any given source.  This mechanism is provided
   in [CBTDM].

   Second, CBT cannot process source-specific Joins or Prunes.  Two
   options thus exist for each CBT domain:

   Option A:
      The CBT component interprets an (S,G) Join alert as if it were a
      (*,G) Join alert, as described in [INTEROP].  That is, if it is
      not already on the core-tree for G, then it sends a CBT (*,G)
      JOIN-REQUEST message towards the core for G.  Similarly, when the
      CBT component receives an (S,G) Prune alert, and the child
      interface list for a group is NULL, then it sends a (*,G)
      QUIT_NOTIFICATION towards the core for G.  This option has the
      disadvantage of pulling all data for the group G down to the CBT
      domain when no members exist.

   Option B:
      The CBT domain does not propagate any source routes (i.e.,
      non-class D routes) to its external peers for the Multicast RIB
      unless it is known that no other path exists to that prefix (e.g.,
      routes for prefixes internal to the domain or in a singly-homed
      customer's domain may be propagated).  This ensures that
      source-specific joins are never received unless the source's data
      already passes through the domain on the shared tree, in which
      case the (S,G) Join need not be propagated anyway.  BGMP border
      routers will only send source-specific Joins or Prunes to an
      external peer if that external peer advertises source-prefixes in
      the EGP.  If a BGMP-CBT border router does receive an (S,G) Join
      or Prune, that border router should ignore the message.

   To minimize en/de-capsulations, CBTv2 BRs may follow the same scheme
   as described under PIM-SM above, in which each BR sends
   Candidate-Core advertisements for those groups for which it is the
   shared-tree ingress router.

5.4.4.  Interaction with MOSPF

   As with CBTv2, MOSPF cannot process source-specific Joins or Prunes,
   and the same two options are available.  Therefore, an MOSPF domain
   may either:

   Option A:
      send a Group-Membership-LSA for all of G in response to an (S,G)
      Join alert, and "prematurely age" it out (when no other downstream
      members exist) in response to an (S,G) Prune alert, OR

   Option B:
      not propagate any source routes (i.e., non-class D routes) to its
      external peers for the Multicast RIB unless it is known that no
      other path exists to that prefix (e.g., routes for prefixes
      internal to the domain or in a singly-homed customer's domain may
      be propagated).

5.5.  Operation over Multi-access Networks

   Multiaccess links require special handling to prevent duplicates.
   The following mechanism enables BGMP to operate over multiaccess
   links which do not run an M-IGP.  This avoids broadcast-and-prune
   behavior and does not require (S,G) state.

   To elect a designated forwarder per prefix, BGMP uses a FWDR_PREF
   message to exchange "forwarder preference" values for each prefix.
   The peer with the highest forwarder preference becomes the designated
   forwarder, with ties broken by lowest BGMP Identifier.  The
   designated forwarder is the router responsible for forwarding packets
   up the tree, and is the peer to which joins will be sent.

   When BGMP first learns that a route exists in the multicast RIB whose
   next-hop interface is NOT the multiaccess link, the BGMP router sends
   a BGMP FWDR_PREF message for the prefix, to all BGMP peers on the
   LAN.  The FWDR_PREF message contains a "forwarder preference value"
   for the local router, and the same value MUST be sent to all peers on
   the LAN.  Likewise, when the prefix is no longer reachable, a
   FWDR_PREF of 0 is sent to all peers on the LAN.

   Whenever a BGMP router calculates the next-hop peer towards a
   particular address, and that peer is reached over a BGMP-owned
   multiaccess LAN, the designated forwarder is used instead.

   When a BGMP router receives a FWDR_PREF message from a peer, it looks
   up the matching route in its multicast RIB, and calculates the new
   designated forwarder.  If the router has tree state entries whose
   parent target was the old forwarder, it sends Joins to the new
   forwarder and Prunes to the old forwarder.

   When a BGMP router which is NOT the designated forwarder receives a
   packet on the multiaccess link, it is silently dropped.

   Finally, this mechanism prevents duplicates where full peering exists
   on a "logical" link.
   Where full peering does not exist, steps must be taken (outside of
   BGMP) to present separate logical interfaces to BGMP, each of which
   is a link with full peering.  This might entail, for example, using
   different link-layer address mappings, doing encapsulation, or
   changing the physical media.

6.  Interaction with address allocation

6.1.  Requirements for BGMP components

   Each border router must be able to determine (e.g., from MASC [MASC])
   which class-D prefixes (if any) belong to each domain in which an
   M-IGP component resides, so that it can inject routes for them into
   the routing table.

7.  Transition Strategy

   There have been significant barriers to multicast deployment in
   Internet backbones.  While many of the problems with the current
   DVMRP backbone (MBONE) have been documented in [ISSUES], most of
   these problems require longer term engineering solutions.  However,
   there is much that can be done with existing technologies to enable
   deployment and put in place an architecture that will enable a smooth
   transition to the next generation of inter-domain multicast routing
   protocols (i.e., BGMP).  This section proposes a near-term transition
   strategy and architecture that is designed to be simple,
   risk-neutral, and provide a smooth, incremental transition path to
   BGMP.  In addition, the transition architecture provides for improved
   convergence properties, some initial policy control, and the
   opportunity for providers to run either native or tunneled multicast
   backbones and exchanges.

   The transition strategy proposed here is to initially use MBGP [MBGP]
   to provide the desired convergence and policy control properties, and
   PIM-DM for multicast data forwarding.  Once this architecture is in
   place, backbones and exchanges can incrementally transition to BGMP
   and domains running other M-IGPs may be incorporated more fully.

   Since the current MBone uses a broadcast-and-prune backbone running
   DVMRP, BGMP may view the entire MBone as a single multi-homed stub
   domain (with a new AS number).  The members-are-senders heuristic can
   then be used initially to provide membership notifications within
   this stub domain.

   A BGMP backbone can then be formed by designating one or more neutral
   PIM-DM domains (say, exchanges) as initial BGMP backbones.  Each
   exchange is then associated with a group prefix which is injected
   into the Multicast RIB by all MBGP/BGMP border routers on that
   exchange.

   Any domain which meets the following constraints may then transition
   from a normal MBone-connected domain to one running BGMP:

   (1) Must peer with another BGMP domain and participate in MBGP to
       propagate routes in the Multicast RIB.

   (2) Must establish an internal (to the MBone AS) EGP (e.g., iBGP)
       peer relationship with other border routers of the MBone "stub"
       domain, as is done with unicast routing.  We expect this to
       eventually involve the use of one or more route reflectors
       [REFLECT] inside the MBone domain.

   (3) If the transition will partition the MBone "stub" domain, then it
       must be ensured that the MBone domain will be administratively
       split into multiple domains, each with a different multicast AS
       number.

7.1.  Preventing transit through the MBone stub

   We desire that two ASes which are mutually reachable through BGMP use
   paths which do not pass through the MBone stub domain.  This is
   illustrated in Figure 2, where the MBone stub is AS 5, which is
   multi-homed to both AS 3 and AS 4.  Paths between sources and
   destinations which have already transitioned to MBGP/BGMP should not
   use AS 5 as transit unless no other path exists.

   ----------------------\     /----------------------------
                         |     |
      DVMRP       /----\ |     | /----\    IGP/iBGP
    ..............| BR |+++++++++| BR |-----------
                  \----/ |  E  | \----/
                     +   |  B  |     +           AS 3
      MBone          +   |  G  |     +
                     +   |  P  \-----+----------------------
      AS 5      iBGP +   |           + eBGP
                     +   |     /-----+----------------------
                     +   |     |     +
                     +   |     |     +
      DVMRP       /----\ |     | /----\    IGP/iBGP
    ..............| BR |+++++++++| BR |-----------
                  \----/ |     | \----/
                         |     |                 AS 4
                         |     |
   ----------------------/     \----------------------------

           Figure 2: Preventing Transit through MBone Stub

   This requirement is easily met using standard BGP policy mechanisms.
   The MBone border routers should prefer EGP routes to DVMRP routes,
   since DVMRP cannot tag routes as being external.  Thus, external
   routes may appear in the DVMRP routing table, but will not be
   imported into the EGP since they will be overridden by iBGP routes.

   Other EGP routers should prefer routes whose ASpath does not contain
   the well-known MBone AS number.  This will ensure that the route
   through the MBone stub is not used unless no other path exists.  For
   safety, routes whose ASpath begins with the MBone AS should receive
   the worst preference.

8.  Message Formats

   This section describes message formats used by BGMP.

   Messages are sent over a reliable transport protocol connection.  A
   message is processed only after it is entirely received.  The maximum
   message size is 4096 octets.  All implementations are required to
   support this maximum message size.

   All fields labelled "Reserved" below must be transmitted as 0, and
   ignored upon receipt.

8.1.  Message Header Format

   Each message has a fixed-size (4-byte) header.  There may or may not
   be a data portion following the header, depending on the message
   type.
   The layout of these fields is shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |      Type     |    Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Length:
      This 2-octet unsigned integer indicates the total length of the
      message, including the header, in octets.  Thus, e.g., it allows
      one to locate in the transport-level stream the start of the next
      message.  The value of the Length field must always be at least 4
      and no greater than 4096, and may be further constrained,
      depending on the message type.  No "padding" of extra data after
      the message is allowed, so the Length field must have the smallest
      value required given the rest of the message.

   Type:
      This 1-octet unsigned integer indicates the type code of the
      message.  The following type codes are defined:

         1 - OPEN
         2 - UPDATE
         3 - NOTIFICATION
         4 - KEEPALIVE

8.2.  OPEN Message Format

   After a transport protocol connection is established, the first
   message sent by each side is an OPEN message.  If the OPEN message is
   acceptable, a KEEPALIVE message confirming the OPEN is sent back.
   Once the OPEN is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION
   messages may be exchanged.
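
   As an illustration of the fixed header described in Section 8.1, the
   following Python sketch decodes the Length and Type fields and splits
   a received byte stream into complete messages.  It is not part of the
   protocol specification.  Network byte order is an assumption, and the
   names parse_header, split_messages, and HeaderError are hypothetical.

```python
import struct

# Type codes listed in Section 8.1.
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
HEADER_LEN = 4          # fixed-size header, in octets
MAX_MESSAGE_LEN = 4096  # maximum message size, in octets


class HeaderError(ValueError):
    """Hypothetical error type for a header that Section 8.1 disallows."""


def parse_header(buf):
    """Return (length, type_name) for the message at the start of buf."""
    if len(buf) < HEADER_LEN:
        raise HeaderError("need at least 4 octets")
    # 2-octet Length, 1-octet Type, 1-octet Reserved.
    # Assumption: fields are in network byte order.
    length, msg_type, _reserved = struct.unpack("!HBB", buf[:HEADER_LEN])
    # Reserved is ignored upon receipt, as required for Reserved fields.
    if not HEADER_LEN <= length <= MAX_MESSAGE_LEN:
        raise HeaderError("Bad Message Length")
    if msg_type not in MESSAGE_TYPES:
        raise HeaderError("Bad Message Type")
    return length, MESSAGE_TYPES[msg_type]


def split_messages(stream):
    """Split a byte stream into complete messages.

    Returns (messages, leftover); leftover holds a trailing partial
    message, since a message is processed only once entirely received.
    """
    messages = []
    offset = 0
    while len(stream) - offset >= HEADER_LEN:
        length, _ = parse_header(stream[offset:])
        if len(stream) - offset < length:
            break  # wait for the rest of this message
        messages.append(stream[offset:offset + length])
        offset += length
    return messages, stream[offset:]
```

   Because the Length field covers the whole message, the start of the
   next message is found by advancing exactly Length octets; no padding
   needs to be skipped.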

   In addition to the fixed-size BGMP header, the OPEN message contains
   the following fields:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |   Reserved    |           Hold Time           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        BGMP Identifier                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                     (Optional Parameters)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Version:
      This 1-octet unsigned integer indicates the protocol version
      number of the message.  The current BGMP version number is 1.

   Hold Time:
      This 2-octet unsigned integer indicates the number of seconds that
      the sender proposes for the value of the Hold Timer.  Upon receipt
      of an OPEN message, a BGMP speaker MUST calculate the value of the
      Hold Timer by using the smaller of its configured Hold Time and
      the Hold Time received in the OPEN message.  The Hold Time MUST be
      either zero or at least three seconds.  An implementation may
      reject connections on the basis of the Hold Time.  The calculated
      value indicates the maximum number of seconds that may elapse
      between the receipt of successive KEEPALIVE and/or UPDATE messages
      by the sender.

   BGMP Identifier:
      This 4-octet unsigned integer indicates the BGMP Identifier of the
      sender.  A given BGMP speaker sets the value of its BGMP
      Identifier to a globally-unique value assigned to that BGMP
      speaker (e.g., an IPv4 address).  The value of the BGMP Identifier
      is determined on startup and is the same for every BGMP session
      opened.

   Optional Parameters:
      This field may contain a list of optional parameters, where each
      parameter is encoded as a <Parameter Type, Parameter Length,
      Parameter Value> triplet.
      The combined length of all optional parameters can be derived from
      the Length field in the message header.

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
      |  Parm. Type   | Parm. Length  |  Parameter Value (variable)
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...

      Parameter Type is a one octet field that unambiguously identifies
      individual parameters.  Parameter Length is a one octet field that
      contains the length of the Parameter Value field in octets.
      Parameter Value is a variable length field that is interpreted
      according to the value of the Parameter Type field.

   This document defines the following Optional Parameters:

   a) Authentication Information (Parameter Type 1):
      This optional parameter may be used to authenticate a BGMP peer.
      The Parameter Value field contains a 1-octet Authentication Code
      followed by variable-length Authentication Data.

       0 1 2 3 4 5 6 7
      +-+-+-+-+-+-+-+-+
      |  Auth. Code   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                     |
      |                Authentication Data                  |
      |                                                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Authentication Code:

         This 1-octet unsigned integer indicates the authentication
         mechanism being used.  Whenever an authentication mechanism is
         specified for use within BGMP, three things must be included in
         the specification:

         - the value of the Authentication Code which indicates use of
           the mechanism,
         - the form and meaning of the Authentication Data, and
         - the algorithm for computing values of Marker fields.

         Note that a separate authentication mechanism may be used in
         establishing the transport level connection.

      Authentication Data:

         The form and meaning of this variable-length field depend on
         the Authentication Code.

   The minimum length of the OPEN message is 12 octets (including
   message header).

   b) Capability Information (Parameter Type 2):
      This is an Optional Parameter that is used by a BGMP-speaker to
      convey to its peer the list of capabilities supported by the
      speaker.  The parameter contains one or more triples <Capability
      Code, Capability Length, Capability Value>, where each triple is
      encoded as shown below:

         +------------------------------+
         | Capability Code (1 octet)    |
         +------------------------------+
         | Capability Length (1 octet)  |
         +------------------------------+
         | Capability Value (variable)  |
         +------------------------------+

      Capability Code:

         Capability Code is a one octet field that unambiguously
         identifies individual capabilities.

      Capability Length:

         Capability Length is a one octet field that contains the length
         of the Capability Value field in octets.

      Capability Value:

         Capability Value is a variable length field that is interpreted
         according to the value of the Capability Code field.

      A particular capability, as identified by its Capability Code, may
      occur more than once within the Optional Parameter.

      This document reserves Capability Codes 128-255 for
      vendor-specific applications.

      This document reserves Capability Code 0.

      Capability Codes (other than those reserved for vendor-specific
      use) are assigned only by the IETF consensus process and IESG
      approval.

8.3.  UPDATE Message Format

   UPDATE messages are used to transfer Join/Prune/FwdrPref information
   between BGMP peers.  The UPDATE message always includes the
   fixed-size BGMP header, and one or more attributes as described
   below.

   The message format below allows compact encoding of (*,G) Joins and
   Prunes, while allowing the flexibility needed to do other updates
   such as (S,G) Joins and Prunes towards sources as well as on the
   shared tree.
   In the discussion below, an Encoded-Address-Prefix is of the form:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+
   |EnTyp| AddrFam |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Address (variable length)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Mask (variable length)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   EnTyp:
      0 - All 1's Mask.  The Mask field is 0 bytes long.
      1 - Mask length included.  The Mask field is 4 bytes long, and
          contains the mask length, in bits.
      2 - Full Mask included.  The Mask field is the same length
          as the Address field, and contains the full bitmask.

   AddrFam:
      The IANA-assigned address family number of the encoded prefix.
      These include (among others):

         Number    Description
         ------    -----------
            1      IP (IP version 4)
            2      IPv6 (IP version 6)

   Address:
      The address associated with the given prefix to be encoded.  The
      length is determined based on the Address Family.

   Mask:
      The mask associated with the given prefix.  The format (or
      absence) of this field is determined by the EnTyp field.

   Each attribute is of the form:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |     Type      |  Data ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   All attributes are 4-byte aligned.

   Length:
      The Length is the length of the entire attribute, including the
      length, type, and data fields.  If other attributes are nested
      within the data field, the length includes the size of all such
      nested attributes.

   Type:

      Types 128-255 are reserved for "optional" attributes.
      If a required attribute is unrecognized, a NOTIFICATION will be
      sent and the connection will be closed.  Unrecognized optional
      attributes are simply ignored.

         0 - JOIN
         1 - PRUNE
         2 - GROUP
         3 - SOURCE
         4 - FWDR_PREF

   a) JOIN (Type Code 0)

      The JOIN attribute indicates that all GROUP or SOURCE attributes
      nested immediately within the JOIN attribute should be joined.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=0     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested
      within a JOIN attribute.

   b) PRUNE (Type Code 1)

      The PRUNE attribute indicates that all GROUP or SOURCE attributes
      nested immediately within the PRUNE attribute should be pruned.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=1     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      No JOIN, PRUNE, or FWDR_PREF attributes may be immediately nested
      within a PRUNE attribute.

   c) GROUP (Type Code 2)

      The GROUP attribute identifies a given group-prefix.  In addition,
      any attributes nested immediately within the GROUP attribute also
      apply to the given group-prefix.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=2     |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                                                               |
   |                    Encoded-Address-Prefix                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes (optional) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Encoded-Address-Prefix
         The multicast group prefix to be joined or pruned, in the
         format described above.

      Nested Attributes
         No GROUP, SOURCE, or FWDR_PREF attributes may be immediately
         nested within a GROUP attribute.

   d) SOURCE (Type Code 3):

      The SOURCE attribute identifies a given source-prefix.  In
      addition, any attributes nested immediately within the SOURCE
      attribute also apply to the given source-prefix.

      The SOURCE attribute has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=3     |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                                                               |
   |                    Encoded-Address-Prefix                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes (optional) ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Encoded-Address-Prefix
         The Source-prefix in the format described above.

      Nested Attributes
         No GROUP, SOURCE, or FWDR_PREF attributes may be immediately
         nested within a SOURCE attribute.

   e) FWDR_PREF (Type Code 4)

      The FWDR_PREF attribute provides a forwarder preference value for
      all GROUP or SOURCE attributes nested immediately within the
      FWDR_PREF attribute.
      It is used by a BGMP speaker to inform other BGMP speakers of the
      originating speaker's degree of preference for a given group or
      source prefix.  Usage of this attribute is described in Section
      5.5.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    Type=4     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Preference Value                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Nested Attributes ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Preference Value
         A 32-bit non-negative integer.

      Nested Attributes
         No JOIN, PRUNE, or FWDR_PREF attributes may be immediately
         nested within a FWDR_PREF attribute.

8.4.  Encoding examples

   Below are enumerated examples of how various updates are built using
   nested attributes, where A ( B ) denotes that attribute B is nested
   within attribute A.

      (*,G-prefix) Join:  JOIN ( GROUP )
      (*,G-prefix) Prune:  PRUNE ( GROUP )
      (S,G) Join towards S:  GROUP ( JOIN ( SOURCE ) )
      (S,G) Join cancelling prune towards G:  GROUP ( JOIN ( SOURCE ) )
      (S,G) Prune towards S:  GROUP ( PRUNE ( SOURCE ) )
      (S,G) Prune towards G:  GROUP ( PRUNE ( SOURCE ) )
      Switch from (*,G) to (S,G):  PRUNE ( GROUP ( JOIN ( SOURCE ) ) )
      Switch from (S,G) to (*,G):  JOIN ( GROUP )
      Initial (*,G) Join with S pruned:
         JOIN ( GROUP ( PRUNE ( SOURCE ) ) )
      Forwarder preference announcement for G-prefix:
         FWDR_PREF ( GROUP )
      Forwarder preference announcement for S-prefix:
         FWDR_PREF ( SOURCE )

8.5.  KEEPALIVE Message Format

   BGMP does not use any transport protocol-based keep-alive mechanism
   to determine if peers are reachable.  Instead, KEEPALIVE messages are
   exchanged between peers often enough as not to cause the Hold Timer
   to expire.
   A reasonable maximum time between the last KEEPALIVE or UPDATE
   message sent, and the time at which a KEEPALIVE message is sent,
   would be one third of the Hold Time interval.  KEEPALIVE messages
   MUST NOT be sent more frequently than one per second.  An
   implementation MAY adjust the rate at which it sends KEEPALIVE
   messages as a function of the Hold Time interval.

   If the negotiated Hold Time interval is zero, then periodic KEEPALIVE
   messages MUST NOT be sent.

   A KEEPALIVE message consists of only a message header, and has a
   length of 4 octets.

8.6.  NOTIFICATION Message Format

   A NOTIFICATION message is sent when an error condition is detected.
   The BGMP connection is closed immediately after sending it.

   In addition to the fixed-size BGMP header, the NOTIFICATION message
   contains the following fields:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Error code   | Error subcode |             Data              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Error Code:

      This 1-octet unsigned integer indicates the type of NOTIFICATION.
      The following Error Codes have been defined:

         Error Code    Symbolic Name                 Reference

             1         Message Header Error          Section 9.1
             2         OPEN Message Error            Section 9.2
             3         UPDATE Message Error          Section 9.3
             4         Hold Timer Expired            Section 9.5
             5         Finite State Machine Error    Section 9.6
             6         Cease                         Section 9.7

   Error subcode:

      This 1-octet unsigned integer provides more specific information
      about the nature of the reported error.  Each Error Code may have
      one or more Error Subcodes associated with it.  If no appropriate
      Error Subcode is defined, then a zero (Unspecific) value is used
      for the Error Subcode field.

   Message Header Error subcodes:

      2 - Bad Message Length
      3 - Bad Message Type

   OPEN Message Error subcodes:

      1 - Unsupported Version Number
      4 - Unsupported Optional Parameter
      5 - Authentication Failure
      6 - Unacceptable Hold Time
      7 - Unsupported Capability

   UPDATE Message Error subcodes:

       1 - Malformed Attribute List
       2 - Unrecognized Well-known Attribute
       5 - Attribute Length Error
      10 - Invalid Prefix Field

Data:

   This variable-length field is used to diagnose the reason for the
   NOTIFICATION. The contents of the Data field depend upon the
   Error Code and Error Subcode. See Section 9 below for more
   details.

   Note that the length of the Data field can be determined from the
   message Length field by the formula:

      Message Length = 6 + Data Length

   The minimum length of the NOTIFICATION message is 6 octets
   (including message header).

9. BGMP Error Handling

This section describes actions to be taken when errors are detected
while processing BGMP messages. BGMP Error Handling is similar to
that of BGP [BGP].

When any of the conditions described here are detected, a
NOTIFICATION message with the indicated Error Code, Error Subcode,
and Data fields is sent, and the BGMP connection is closed. If no
Error Subcode is specified, then a zero must be used.

The phrase "the BGMP connection is closed" means that the transport
protocol connection has been closed and that all resources for that
BGMP connection have been deallocated. The remote peer is removed
from the target list of all tree state entries.

Unless specified explicitly, the Data field of the NOTIFICATION
message that is sent to indicate an error is empty.

9.1. Message Header error handling

All errors detected while processing the Message Header are indicated
by sending the NOTIFICATION message with Error Code Message Header
Error. The Error Subcode elaborates on the specific nature of the
error.

If the Length field of the message header is less than 4 or greater
than 4096, or if the Length field of an OPEN message is less than
the minimum length of the OPEN message, or if the Length field of an
UPDATE message is less than the minimum length of the UPDATE message,
or if the Length field of a KEEPALIVE message is not equal to 4, then
the Error Subcode is set to Bad Message Length. The Data field
contains the erroneous Length field.

If the Type field of the message header is not recognized, then the
Error Subcode is set to Bad Message Type. The Data field contains
the erroneous Type field.

9.2. OPEN message error handling

All errors detected while processing the OPEN message are indicated
by sending the NOTIFICATION message with Error Code OPEN Message
Error. The Error Subcode elaborates on the specific nature of the
error.

If the version number contained in the Version field of the received
OPEN message is not supported, then the Error Subcode is set to
Unsupported Version Number. The Data field is a 2-octet unsigned
integer, which indicates the largest locally supported version number
less than the version the remote BGMP peer bid (as indicated in the
received OPEN message).

If the Hold Time field of the OPEN message is unacceptable, then the
Error Subcode MUST be set to Unacceptable Hold Time. An
implementation MUST reject Hold Time values of one or two seconds.
An implementation MAY reject any proposed Hold Time. An
implementation which accepts a Hold Time MUST use the negotiated
value for the Hold Time.
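
As an illustration of the Hold Time rules above and of the KEEPALIVE
spacing suggested in Section 8.5, the following is a minimal Python
sketch. The function names and the local_minimum policy setting are
hypothetical and are not defined by this document. Accepting a Hold
Time of zero is also an assumption, since an implementation MAY
reject any proposed value.

```python
from typing import Optional, Tuple

OPEN_MESSAGE_ERROR = 2       # Error Code (Section 8.6)
UNACCEPTABLE_HOLD_TIME = 6   # OPEN Message Error subcode (Section 8.6)


def check_hold_time(hold_time: int,
                    local_minimum: int = 3) -> Optional[Tuple[int, int]]:
    """Return (error code, error subcode) if rejected, else None.

    local_minimum is a hypothetical local policy setting. Values below
    3 are raised to 3 so that one and two seconds are always rejected.
    """
    if hold_time == 0:
        # Assumption: zero is accepted and disables periodic KEEPALIVEs.
        return None
    if hold_time < max(3, local_minimum):
        return (OPEN_MESSAGE_ERROR, UNACCEPTABLE_HOLD_TIME)
    return None


def keepalive_interval(hold_time: int) -> Optional[float]:
    """Suggested seconds between KEEPALIVEs, or None when none are sent."""
    if hold_time == 0:
        return None
    # One third of the Hold Time, but never more often than once a second.
    return max(1.0, hold_time / 3)
```

A caller would send a NOTIFICATION with the returned code and subcode
when check_hold_time() returns a tuple, and otherwise arm its
KeepAlive timer with keepalive_interval().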

If one of the Optional Parameters in the OPEN message is not
recognized, then the Error Subcode is set to Unsupported Optional
Parameter.

If the OPEN message carries Authentication Information (as an
Optional Parameter), then the corresponding authentication procedure
is invoked. If the authentication procedure (based on Authentication
Code and Authentication Data) fails, then the Error Subcode is set to
Authentication Failure.

If the OPEN message indicates that the peer does not support a
capability which the receiver requires, the receiver may send a
NOTIFICATION message to the peer, and terminate peering. The Error
Subcode in the message is set to Unsupported Capability. The Data
field in the NOTIFICATION message lists the set of capabilities that
cause the speaker to send the message. Each such capability is
encoded the same way as it was encoded in the received OPEN message.

9.3. UPDATE message error handling

All errors detected while processing the UPDATE message are indicated
by sending the NOTIFICATION message with Error Code UPDATE Message
Error. The Error Subcode elaborates on the specific nature of the
error.

If any recognized attribute has an Attribute Length that conflicts
with the expected length (based on the attribute type code), then the
Error Subcode is set to Attribute Length Error. The Data field
contains the erroneous attribute (type, length and value).

If the Encoded-Address-Prefix field in some attribute is
syntactically incorrect, then the Error Subcode is set to Invalid
Prefix Field.

If any other error is encountered when processing attributes (such
as invalid nestings), then the Error Subcode is set to Malformed
Attribute List, and the problematic attribute is included in the Data
field.

9.4. NOTIFICATION message error handling

If a peer sends a NOTIFICATION message, and there is an error in that
message, there is unfortunately no means of reporting this error via
a subsequent NOTIFICATION message. Any such error, such as an
unrecognized Error Code or Error Subcode, should be noticed, logged
locally, and brought to the attention of the administration of the
peer. The means to do this, however, lies outside the scope of this
document.

9.5. Hold Timer Expired error handling

If a system does not receive successive KEEPALIVE and/or UPDATE
and/or NOTIFICATION messages within the period specified in the Hold
Time field of the OPEN message, then the NOTIFICATION message with
Hold Timer Expired Error Code must be sent and the BGMP connection
closed.

9.6. Finite State Machine error handling

Any error detected by the BGMP Finite State Machine (e.g., receipt of
an unexpected event) is indicated by sending the NOTIFICATION message
with Error Code Finite State Machine Error.

9.7. Cease

In the absence of any fatal errors (that are indicated in this
section), a BGMP peer may choose at any given time to close its BGMP
connection by sending the NOTIFICATION message with Error Code Cease.
However, the Cease NOTIFICATION message must not be used when a fatal
error indicated by this section does exist.

9.8. Connection collision detection

If a pair of BGMP speakers try simultaneously to establish a TCP
connection to each other, then two parallel connections between this
pair of speakers might well be formed. We refer to this situation as
connection collision. Clearly, one of these connections must be
closed.

Based on the value of the BGMP Identifier a convention is established
for detecting which BGMP connection is to be preserved when a
collision does occur.
The convention is to compare the BGMP Identifiers of the peers
involved in the collision and to retain only the connection
initiated by the BGMP speaker with the higher-valued BGMP
Identifier.

Upon receipt of an OPEN message, the local system must examine all of
its connections that are in the OpenConfirm state. A BGMP speaker
may also examine connections in an OpenSent state if it knows the
BGMP Identifier of the peer by means outside of the protocol. If
among these connections there is a connection to a remote BGMP
speaker whose BGMP Identifier equals the one in the OPEN message,
then the local system performs the following collision resolution
procedure:

1. The BGMP Identifier of the local system is compared to the BGMP
   Identifier of the remote system (as specified in the OPEN
   message).

2. If the value of the local BGMP Identifier is less than the remote
   one, the local system closes the BGMP connection that already
   exists (the one that is already in the OpenConfirm state), and
   accepts the BGMP connection initiated by the remote system.

3. Otherwise, the local system closes the newly created BGMP
   connection (the one associated with the newly received OPEN
   message), and continues to use the existing one (the one that is
   already in the OpenConfirm state).

Comparing BGMP Identifiers is done by treating them as (4-octet long)
unsigned integers.

A connection collision with an existing BGMP connection that is in
the Established state causes unconditional closing of the newly
created connection. Note that a connection collision cannot be
detected with connections that are in Idle, or Connect, or Active
states.

Closing the BGMP connection (that results from the collision
resolution procedure) is accomplished by sending the NOTIFICATION
message with the Error Code Cease.

10. BGMP Version Negotiation

BGMP speakers may negotiate the version of the protocol by making
multiple attempts to open a BGMP connection, starting with the
highest version number each supports. If an open attempt fails with
an Error Code OPEN Message Error, and an Error Subcode Unsupported
Version Number, then the BGMP speaker has available the version
number it tried, the version number its peer tried, the version
number passed by its peer in the NOTIFICATION message, and the
version numbers that it supports. If the two peers do support one or
more common versions, then this will allow them to rapidly determine
the highest common version. In order to support BGMP version
negotiation, future versions of BGMP must retain the format of the
OPEN and NOTIFICATION messages.

10.1. BGMP Capability Negotiation

When a BGMP speaker sends an OPEN message to its BGMP peer, the
message may include an Optional Parameter, called Capabilities. The
parameter lists the capabilities supported by the speaker.

A BGMP speaker may use a particular capability when peering with
another speaker only if both speakers support that capability. A
BGMP speaker determines the capabilities supported by its peer by
examining the list of capabilities present in the Capabilities
Optional Parameter carried by the OPEN message that the speaker
receives from the peer.

11. BGMP Finite State machine

This section specifies BGMP operation in terms of a Finite State
Machine (FSM). Following is a brief summary and overview of BGMP
operations by state as determined by this FSM.

Initially BGMP is in the Idle state.

Idle state:

   In this state BGMP refuses all incoming BGMP connections. No
   resources are allocated to the peer.
   In response to the Start event (initiated by either system or
   operator) the local system initializes all BGMP resources, starts
   the ConnectRetry timer, initiates a transport connection to the
   other BGMP peer, while listening for a connection that may be
   initiated by the remote BGMP peer, and changes its state to
   Connect. The exact value of the ConnectRetry timer is a local
   matter, but should be sufficiently large to allow TCP
   initialization.

   If a BGMP speaker detects an error, it shuts down the connection
   and changes its state to Idle. Getting out of the Idle state
   requires generation of the Start event. If such an event is
   generated automatically, then persistent BGMP errors may result in
   persistent flapping of the speaker. To avoid such a condition it
   is recommended that Start events should not be generated
   immediately for a peer that was previously transitioned to Idle
   due to an error. For a peer that was previously transitioned to
   Idle due to an error, the time between consecutive automatically
   generated Start events shall increase exponentially. The value of
   the initial timer shall be 60 seconds. The time shall be doubled
   for each consecutive retry.

   Any other event received in the Idle state is ignored.

Connect state:

   In this state BGMP is waiting for the transport protocol
   connection to be completed.

   If the transport protocol connection succeeds, the local system
   clears the ConnectRetry timer, completes initialization, sends an
   OPEN message to its peer, and changes its state to OpenSent. If
   the transport protocol connection fails (e.g., retransmission
   timeout), the local system restarts the ConnectRetry timer,
   continues to listen for a connection that may be initiated by the
   remote BGMP peer, and changes its state to Active.
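
The back-off rule for automatically generated Start events, given
under the Idle state above, can be sketched in Python as follows.
This is only an illustration: the function name and the max_delay cap
are hypothetical, and this document does not define any upper bound
on the delay.

```python
from typing import Optional

INITIAL_START_DELAY = 60  # seconds, the initial timer value


def start_delay(consecutive_errors: int,
                max_delay: Optional[int] = None) -> int:
    """Seconds to wait before the next automatic Start event.

    consecutive_errors is 1 for the first retry after an error.
    max_delay is a hypothetical cap not defined by this document.
    """
    if consecutive_errors < 1:
        return 0  # no preceding error, so no back-off applies
    # The delay doubles for each consecutive retry.
    delay = INITIAL_START_DELAY * 2 ** (consecutive_errors - 1)
    return delay if max_delay is None else min(delay, max_delay)
```

With these definitions the first four retries wait 60, 120, 240 and
480 seconds. An implementation would presumably reset the counter
once a connection has been established successfully.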

   In response to the ConnectRetry timer expired event, the local
   system restarts the ConnectRetry timer, initiates a transport
   connection to the other BGMP peer, continues to listen for a
   connection that may be initiated by the remote BGMP peer, and
   stays in the Connect state.

   The Start event is ignored in the Connect state.

   In response to any other event (initiated by either system or
   operator), the local system releases all BGMP resources associated
   with this connection and changes its state to Idle.

Active state:

   In this state BGMP is trying to acquire a peer by initiating a
   transport protocol connection.

   If the transport protocol connection succeeds, the local system
   clears the ConnectRetry timer, completes initialization, sends an
   OPEN message to its peer, sets its Hold Timer to a large value,
   and changes its state to OpenSent. A Hold Timer value of 4
   minutes is suggested.

   In response to the ConnectRetry timer expired event, the local
   system restarts the ConnectRetry timer, initiates a transport
   connection to the other BGMP peer, continues to listen for a
   connection that may be initiated by the remote BGMP peer, and
   changes its state to Connect.

   If the local system detects that a remote peer is trying to
   establish a BGMP connection to it, and the IP address of the
   remote peer is not an expected one, the local system restarts the
   ConnectRetry timer, rejects the attempted connection, continues to
   listen for a connection that may be initiated by the remote BGMP
   peer, and stays in the Active state.

   The Start event is ignored in the Active state.

   In response to any other event (initiated by either system or
   operator), the local system releases all BGMP resources associated
   with this connection and changes its state to Idle.
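
The state changes described so far for the Idle, Connect and Active
states can be summarised as a small transition table. The Python
sketch below covers only the state changes, not the timer and
resource actions. The event names are hypothetical, since this
document describes the events only in prose.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"


# Hypothetical event names for the events described in the text.
START = "Start"
TRANSPORT_OPEN = "TransportConnectionOpen"
TRANSPORT_FAILED = "TransportConnectionFailed"
RETRY_EXPIRED = "ConnectRetryTimerExpired"
UNEXPECTED_PEER = "UnexpectedPeerAddress"

TRANSITIONS = {
    (State.IDLE, START): State.CONNECT,
    (State.CONNECT, TRANSPORT_OPEN): State.OPEN_SENT,
    (State.CONNECT, TRANSPORT_FAILED): State.ACTIVE,
    (State.CONNECT, RETRY_EXPIRED): State.CONNECT,
    (State.ACTIVE, TRANSPORT_OPEN): State.OPEN_SENT,
    (State.ACTIVE, RETRY_EXPIRED): State.CONNECT,
    (State.ACTIVE, UNEXPECTED_PEER): State.ACTIVE,
}


def next_state(state: State, event: str) -> State:
    """Next state for the Idle, Connect and Active states only."""
    if state not in (State.IDLE, State.CONNECT, State.ACTIVE):
        raise ValueError("state not covered by this sketch")
    if (state, event) in TRANSITIONS:
        return TRANSITIONS[(state, event)]
    if state is State.IDLE or event == START:
        return state       # ignored: other events in Idle, Start elsewhere
    return State.IDLE      # any other event in Connect or Active
```

Keeping the explicit transitions in a table and the "ignored" and
"any other event" rules in the lookup function mirrors how the text
states them.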

OpenSent state:

   In this state BGMP waits for an OPEN message from its peer. When
   an OPEN message is received, all fields are checked for
   correctness. If the BGMP message header checking or OPEN message
   checking detects an error (see Section 9.2), or a connection
   collision (see Section 9.8) the local system sends a NOTIFICATION
   message and changes its state to Idle.

   If there are no errors in the OPEN message, BGMP sends a KEEPALIVE
   message and sets a KeepAlive timer. The Hold Timer, which was
   originally set to a large value (see above), is replaced with the
   negotiated Hold Time value (see Section 8.2). If the negotiated
   Hold Time value is zero, then the Hold Time timer and KeepAlive
   timers are not started. If the value of the Autonomous System
   field is the same as the local Autonomous System number, then the
   connection is an "internal" connection; otherwise, it is
   "external". Finally, the state is changed to OpenConfirm.

   If a disconnect notification is received from the underlying
   transport protocol, the local system closes the BGMP connection,
   restarts the ConnectRetry timer, continues to listen for a
   connection that may be initiated by the remote BGMP peer, and goes
   into the Active state.

   If the Hold Timer expires, the local system sends a NOTIFICATION
   message with Error Code Hold Timer Expired and changes its state
   to Idle.

   In response to the Stop event (initiated by either system or
   operator) the local system sends a NOTIFICATION message with Error
   Code Cease and changes its state to Idle.

   The Start event is ignored in the OpenSent state.

   In response to any other event the local system sends a
   NOTIFICATION message with Error Code Finite State Machine Error
   and changes its state to Idle.
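
The connection collision check referred to above follows the
procedure of the "Connection collision detection" section. A minimal
Python sketch of that decision is given below. The function name,
the string state names and the return values are hypothetical.

```python
def resolve_collision(local_id: int, remote_id: int,
                      existing_state: str) -> str:
    """Return which connection to close: "existing" or "new".

    local_id and remote_id are BGMP Identifiers treated as 4-octet
    unsigned integers. existing_state is the state of the connection
    already held to the same peer: "OpenConfirm" or "Established"
    (or "OpenSent" when the peer's identifier is known by means
    outside of the protocol).
    """
    for ident in (local_id, remote_id):
        if not 0 <= ident <= 0xFFFFFFFF:
            raise ValueError("BGMP Identifier must fit in 4 octets")
    if existing_state == "Established":
        return "new"  # the new connection is closed unconditionally
    if existing_state not in ("OpenConfirm", "OpenSent"):
        raise ValueError("no collision can be detected in this state")
    # Keep the connection initiated by the higher-valued identifier.
    return "existing" if local_id < remote_id else "new"
```

Whichever connection is selected is then closed by sending a
NOTIFICATION message with Error Code Cease.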

   Whenever BGMP changes its state from OpenSent to Idle, it closes
   the BGMP (and transport-level) connection and releases all
   resources associated with that connection.

OpenConfirm state:

   In this state BGMP waits for a KEEPALIVE or NOTIFICATION message.

   If the local system receives a KEEPALIVE message, it changes its
   state to Established.

   If the Hold Timer expires before a KEEPALIVE message is received,
   the local system sends a NOTIFICATION message with Error Code Hold
   Timer Expired and changes its state to Idle.

   If the local system receives a NOTIFICATION message, it changes
   its state to Idle.

   If the KeepAlive timer expires, the local system sends a KEEPALIVE
   message and restarts its KeepAlive timer.

   If a disconnect notification is received from the underlying
   transport protocol, the local system changes its state to Idle.

   In response to the Stop event (initiated by either system or
   operator) the local system sends a NOTIFICATION message with Error
   Code Cease and changes its state to Idle.

   The Start event is ignored in the OpenConfirm state.

   In response to any other event the local system sends a
   NOTIFICATION message with Error Code Finite State Machine Error
   and changes its state to Idle.

   Whenever BGMP changes its state from OpenConfirm to Idle, it
   closes the BGMP (and transport-level) connection and releases all
   resources associated with that connection.

Established state:

   In the Established state BGMP can exchange UPDATE, NOTIFICATION,
   and KEEPALIVE messages with its peer.

   If the local system receives an UPDATE or KEEPALIVE message, it
   restarts its Hold Timer, if the negotiated Hold Time value is
   non-zero.

   If the local system receives a NOTIFICATION message, it changes
   its state to Idle.
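
Before changing state on a received NOTIFICATION, an implementation
will usually decode and log its contents. The Python sketch below
handles only the part of the message that follows the BGMP header,
as laid out in Section 8.6. The function names are hypothetical, and
the layout of the 4-octet header itself is deliberately left out.

```python
import struct
from typing import Tuple

HEADER_LENGTH = 4  # a KEEPALIVE is header-only and 4 octets long

ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}


def encode_notification_body(code: int, subcode: int = 0,
                             data: bytes = b"") -> bytes:
    """Error Code and Error Subcode octets followed by the Data field."""
    return struct.pack("!BB", code, subcode) + data


def notification_length(data: bytes) -> int:
    """Message Length = 6 + Data Length (message header included)."""
    return HEADER_LENGTH + 2 + len(data)


def decode_notification_body(body: bytes) -> Tuple[int, int, bytes]:
    """Split a NOTIFICATION body (header removed) into its fields."""
    if len(body) < 2:
        raise ValueError("NOTIFICATION body shorter than 2 octets")
    code, subcode = struct.unpack("!BB", body[:2])
    return code, subcode, body[2:]
```

An unrecognized Error Code can be detected with a lookup in
ERROR_CODES and, as Section 9.4 notes, can only be logged locally.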

   If the local system receives an UPDATE message and the UPDATE
   message error handling procedure (see Section 9.3) detects an
   error, the local system sends a NOTIFICATION message and changes
   its state to Idle.

   If a disconnect notification is received from the underlying
   transport protocol, the local system changes its state to Idle.

   If the Hold Timer expires, the local system sends a NOTIFICATION
   message with Error Code Hold Timer Expired and changes its state
   to Idle.

   If the KeepAlive timer expires, the local system sends a KEEPALIVE
   message and restarts its KeepAlive timer.

   Each time the local system sends a KEEPALIVE or UPDATE message, it
   restarts its KeepAlive timer, unless the negotiated Hold Time
   value is zero.

   In response to the Stop event (initiated by either system or
   operator), the local system sends a NOTIFICATION message with
   Error Code Cease and changes its state to Idle.

   The Start event is ignored in the Established state.

   In response to any other event, the local system sends a
   NOTIFICATION message with Error Code Finite State Machine Error
   and changes its state to Idle.

   Whenever BGMP changes its state from Established to Idle, it
   closes the BGMP (and transport-level) connection, releases all
   resources associated with that connection, and deletes all routes
   derived from that connection.

12. Security Considerations

Security issues are not discussed in this memo.

13. Authors' Addresses

   Dave Thaler
   Department of Electrical Engineering and Computer Science
   Microsoft
   One Microsoft Way
   Redmond, WA 98052
   EMail: dthaler@microsoft.com

   Deborah Estrin
   Computer Science Dept./ISI
   University of Southern California
   Los Angeles, CA 90089
   EMail: estrin@usc.edu

   David Meyer
   Cisco Systems
   San Jose, CA
   EMail: dmm@cisco.com

14. References

[BGP]
   Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC
   1771, March 1995.

[MBGP]
   Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol
   Extensions for BGP-4", RFC 2283, February 1998.

[CBT]
   Ballardie, A. J., "Core Based Trees (CBT) Multicast: Architectural
   Overview and Specification", University College London, November
   1994.

[CBTDM]
   Ballardie, A., "Core Based Tree (CBT) Multicast Border Router
   Specification", draft-ietf-idmr-cbt-br-spec-00.txt, October 1997.

[DVMRP]
   Pusateri, T., "Distance Vector Multicast Routing Protocol",
   draft-ietf-idmr-dvmrp-v3-05.txt, October 1997.

[DWR]
   Fenner, W., "Domain-Wide Reports", Work in progress.

[INTEROP]
   Thaler, D., "Interoperability Rules for Multicast Routing
   Protocols", draft-thaler-multicast-interop-01.txt, March 1997.

[IPv6MAA]
   Hinden, R., and S. Deering, "IPv6 Multicast Address Assignments",
   draft-ietf-ipngwg-multicast-assgn-04.txt, July 1997.

[ISSUES]
   Meyer, D., "Some Issues for an Inter-domain Multicast Routing
   Protocol", draft-ietf-mboned-imrp-some-issues-02.txt, June 1997.

[MASC]
   Estrin, D., Handley, M., and D. Thaler, "Multicast-Address-Set
   advertisement and Claim mechanism", Work in Progress, June 1997.

[MOSPF]
   Moy, J., "Multicast Extensions to OSPF", RFC 1584, Proteon, March
   1994.

[PIMDM]
   Estrin, et al., "Protocol Independent Multicast-Dense Mode
   (PIM-DM): Protocol Specification",
   draft-ietf-idmr-pim-dm-spec-05.txt, May 1997.

[PIMSM]
   Estrin, et al., "Protocol Independent Multicast-Sparse Mode
   (PIM-SM): Protocol Specification", RFC 2117, June 1997.

[REFLECT]
   Bates, T., and R. Chandra, "BGP Route Reflection: An alternative to
   full mesh IBGP", RFC 1966, June 1996.

[RFC1700]
   Reynolds, J., and J. Postel, "ASSIGNED NUMBERS", RFC 1700, October
   1994.

[RFC1771]
   Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC
   1771, March 1995.

[RFC2119]
   Bradner, S., "Key words for use in RFCs to Indicate Requirement
   Levels", BCP 14, RFC 2119, March 1997.

Table of Contents

1 Acknowledgements
2 Purpose
3 Terminology
4 Protocol Overview
4.1 Design Rationale
5 Protocol Details
5.1 Interaction with the EGP
5.2 Multicast Data Packet Processing
5.3 BGMP processing of Join and Prune messages and notifications
5.3.1 Receiving Joins
5.3.2 Receiving Prune Notifications
5.3.3 Receiving Route Change Notifications
5.4 Interaction with M-IGP components
5.4.1 Interaction with DVMRP and PIM-DM
5.4.2 Interaction with PIM-SM
5.4.3 Interaction with CBT
5.4.4 Interaction with MOSPF
5.5 Operation over Multi-access Networks
6 Interaction with address allocation
6.1 Requirements for BGMP components
7 Transition Strategy
7.1 Preventing transit through the MBone stub
8 Message Formats
8.1 Message Header Format
8.2 OPEN Message Format
8.3 UPDATE Message Format
8.4 Encoding examples
8.5 KEEPALIVE Message Format
8.6 NOTIFICATION Message Format
9 BGMP Error Handling
9.1 Message Header error handling
9.2 OPEN message error handling
9.3 UPDATE message error handling
9.4 NOTIFICATION message error handling
9.5 Hold Timer Expired error handling
9.6 Finite State Machine error handling
9.7 Cease
9.8 Connection collision detection
10 BGMP Version Negotiation
10.1 BGMP Capability Negotiation
11 BGMP Finite State machine
12 Security Considerations
13 Authors' Addresses
14 References