idnits 2.17.00 (12 Aug 2021)

/tmp/idnits43851/draft-zzhang-rift-multicast-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There are 2 instances of too long lines in the document, the longest one
     being 5 characters in excess of 72.

  == There are 1 instance of lines with multicast IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use the 233.252.0.x range defined in RFC 5771

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 177 has weird spacing: '...mIDType flo...'

  -- The document date (July 13, 2020) is 670 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC8174' is mentioned on line 25, but not defined

  == Unused Reference: 'I-D.ietf-rift-rift' is defined on line 274, but no
     explicit reference was found in the text

  == Unused Reference: 'I-D.zzhang-pim-pds' is defined on line 286, but no
     explicit reference was found in the text

  == Outdated reference: A later version (-15) exists of
     draft-ietf-rift-rift-12

     Summary: 2 errors (**), 0 flaws (~~), 7 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    RIFT                                                            Z. Zhang
3    Internet-Draft                                          Juniper Networks
4    Intended status: Standards Track                              P. Thubert
5    Expires: January 14, 2021                                          Cisco
6                                                               July 13, 2020

8                         Multicast Routing In Fat Trees
9                         draft-zzhang-rift-multicast-01

11   Abstract

13      This document specifies multicast procedures with RIFT.  Multicast in
14      RIFT is similar to Bidirectional Protocol Independent Multicast (PIM-
15      Bidir), with the Rendezvous Point Link (RP-Link) simulated by a
16      spanning tree of some Top of Fabric (TOF) nodes and sub-TOF nodes.

18   Requirements Language

20      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
21      "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
22      "OPTIONAL" in this document are to be interpreted as described in BCP
23      14 [RFC2119] [RFC8174] when, and only when, they appear in all
24      capitals, as shown here.

26   Status of This Memo

28      This Internet-Draft is submitted in full conformance with the
29      provisions of BCP 78 and BCP 79.

31      Internet-Drafts are working documents of the Internet Engineering
32      Task Force (IETF).
        Note that other groups may also distribute
33      working documents as Internet-Drafts.  The list of current Internet-
34      Drafts is at https://datatracker.ietf.org/drafts/current/.

36      Internet-Drafts are draft documents valid for a maximum of six months
37      and may be updated, replaced, or obsoleted by other documents at any
38      time.  It is inappropriate to use Internet-Drafts as reference
39      material or to cite them other than as "work in progress."

41      This Internet-Draft will expire on January 14, 2021.

43   Copyright Notice

45      Copyright (c) 2020 IETF Trust and the persons identified as the
46      document authors.  All rights reserved.

48      This document is subject to BCP 78 and the IETF Trust's Legal
49      Provisions Relating to IETF Documents
50      (https://trustee.ietf.org/license-info) in effect on the date of
51      publication of this document.  Please review these documents
52      carefully, as they describe your rights and restrictions with respect
53      to this document.  Code Components extracted from this document must
54      include Simplified BSD License text as described in Section 4.e of
55      the Trust Legal Provisions and are provided without warranty as
56      described in the Simplified BSD License.

58   Table of Contents

60      1.  Introduction
61      2.  Specifications
62        2.1.  Multicast Capability
63        2.2.  Optional Per-neighbor Flooding Scope
64        2.3.  Multicast TIE
65        2.4.  Building Spanning Tree among TOFs and sub-TOFs
66      3.  Security Considerations
67      4.  Acknowledgements
68      5.  References
69        5.1.  Normative References
70        5.2.  Informative References
71      Authors' Addresses

73   1.  Introduction

75      Because of the simple, regular north-south topology of Fat Tree
76      networks, the PIM-Bidir [RFC5015] solution is extended for multicast
77      in RIFT (referred to as MRIFT in this document).  The following is a
78      summary of the changes and adaptations compared to PIM-Bidir.

80      With PIM-Bidir, PIM joins are sent towards a Rendezvous Point Address
81      (RPA), which could be an address not belonging to any router.  The
82      RPA does belong to an RP Link (RPL), which could be attached to a
83      single router or to multiple routers (e.g., when the RPL is a LAN).
84      With MRIFT, there is no concept of an RPA any more; joins are simply
85      sent northbound.  The joins terminate on some sub-TOF nodes, and the
86      RPL is simulated by a spanning tree among some TOF and sub-TOF nodes.

88      Instead of the (*,G) trees of PIM-Bidir, MRIFT uses (*,G-prefix)
89      trees, where the G-prefix could be *, G, or anything in between
90      (e.g., 225.1.1.0/24).  Light flows could simply follow the (*,*)
91      tree.  For heavy flows, individual (*,G) trees could be built.  For
92      medium flows, some (*,G-prefix) trees could be shared.  All the First
93      Hop Routers (FHRs, connecting to sources) and the Last Hop Routers
94      (LHRs, connecting to receivers) of a particular (*,G) flow must agree
95      on whether a (*,*) or (*,G) or (*,G-prefix) tree is used for the flow
96      so that they all join the same tree.  This is done via out-of-band
97      control that is outside the scope of this document.

99      Because of the rich connectivity in Fat Trees, a router has to choose
100     one of its many north neighbors to send its join to.  This is done
101     through hashing.  The hashing algorithm should lead several, but not
102     too many, routers to choose the same north neighbor, so that fewer
103     routers are involved in multicast traffic forwarding, yet none of
104     those routers is overburdened by replicating to too many downstream
105     neighbors.

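        The draft does not specify the hashing algorithm.  Purely as an
        illustration, the Python sketch below shows one way a router
        could pick a north neighbor so that several, but not all,
        routers converge on the same one.  The function name, the
        "spread" parameter, the bucket scheme, and the use of SHA-256
        are assumptions made for this example and are not part of
        MRIFT.

```python
import hashlib


def choose_north_neighbor(system_id, group_prefix, north_neighbors, spread=2):
    """Pick one north neighbor for a (*, group_prefix) join.

    Hypothetical scheme: routers are split into `spread` buckets by
    system ID, and every router in a bucket ranks the candidate
    neighbors identically (highest-random-weight hashing), so routers
    that share a bucket and a neighbor set pick the same neighbor.
    """
    if not north_neighbors:
        raise ValueError("no north neighbors available")
    bucket = system_id % spread

    def score(neighbor_id):
        # Hash the (prefix, bucket, neighbor) triple to a 64-bit weight.
        key = f"{group_prefix}|{bucket}|{neighbor_id}".encode()
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

    # Break the (very unlikely) score ties by neighbor ID so the result
    # never depends on the order of `north_neighbors`.
    return max(north_neighbors, key=lambda n: (score(n), n))


if __name__ == "__main__":
    # Example system IDs are made up for illustration.
    print(choose_north_neighbor(10, "225.1.1.0/24", [211, 212, 213, 214]))
```

        With a larger "spread", fewer routers share each north
        neighbor, which lowers the replication load per neighbor but
        draws more routers into forwarding; a smaller value has the
        opposite effect.
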
107     Instead of PIM messages, RIFT's own TIEs are used, similar to the
108     concept in [draft-zzhang-pim-pds].  This introduces the concept of
109     neighbor-scoped flooding: a multicast TIE is sent only to a chosen
110     north neighbor, which consumes it and then originates a new TIE for
111     its own chosen north neighbor.

113     When a join reaches a sub-TOF node, the normal join process stops.
114     This forms a sub-tree rooted at this sub-TOF node.  Multiple sub-
115     trees of the same tree may be joined by a single TOF node, or they
116     may have to be connected by a spanning tree serving as the RPL.  For
117     example, in the following topology, in normal situations the two sub-
118     tree roots for the two pods, say Spine111 and Spine121, may be joined
119     by TOF21, but if the TOF21-Spine121 link is down, then TOF22 may be
120     used, and if the TOF22-Spine111 link is also down, then Spine111 and
121     Spine121 will have to be joined via
122     Spine111-TOF21-Spine112-TOF22-Spine121.

124     .                +--------+          +--------+          ^ N
125     .                |TOF   21|          |TOF   22|          |
126     .Level 2         ++-+--+-++          ++-+--+-++        <-*-> E/W
127     .                 | |  | |            | |  | |           |
128     .             P111/2|  |P121          | |  | |         S v
129     .                 ^ ^  ^ ^            | |  | |
130     .                 | |  | |            | |  | |
131     .  +--------------+ |  +-----------+  | |  | +---------------+
132     .  |                |    |         |  | |  |                 |
133     . South +-----------------------------+ |  |                 ^
134     .  |    |           |    |         |    |  |              All TIEs
135     .  0/0  0/0        0/0   +-----------------------------+     |
136     .  v    v           v              |    |  |           |     |
137     .  |    |           +-+    +<-0/0----------+           |     |
138     .  |    |             |    |       |    |              |     |
139     .+-+----++ optional +-+----++     ++----+-+           ++-----++
140     .|       | E/W link |       |     |       |           |       |
141     .|Spin111+----------+Spin112|     |Spin121|           |Spin122|
142     .+-+---+-+          ++----+-+     +-+---+-+           ++---+--+
143     .  |   |             |   South      |   |              |   |
144     .  |   +---0/0--->-----+ 0/0        |   +----------------+ |
145     . 0/0                | |  |         |                  | | |
146     .  |   +---<-0/0-----+ |  v         |   +--------------+ | |
147     .  v   |               |  |         |   |                | |
148     .+-+---+-+          +--+--+-+     +-+---+-+          +---+-+-+
149     .|       |  (L2L)   |       |     |       |  Level 0 |       |
150     .|Leaf111~~~~~~~~~~~~Leaf112|     |Leaf121|          |Leaf122|
151     .+-+-----+          +-+---+-+     +--+--+-+          +-+-----+
152     .  +                  +    \        /   +              +
153     .  Prefix111   Prefix112    \      /   Prefix121    Prefix122
154     .                          multi-homed
155     .                            Prefix
156     .+---------- Pod 1 ---------+     +---------- Pod 2 ---------+

158  2.  Specifications

160  2.1.  Multicast Capability

162     A new optional field is added to the NodeCapabilities to indicate
163     that the node is enabled for multicast:

165       struct NodeCapabilities {
166         ...
167         4: optional bool multicast_enabled;
168       }

170  2.2.  Optional Per-neighbor Flooding Scope

172     This document introduces an optional per-neighbor flooding scope for
173     TIEs:

175       struct TIEHeader {
176         ...
177         13: optional common.SystemIDType flooding_scope_neighbor;
178       }

180     When a node originates a TIE with a per-neighbor flooding scope, it
181     is sent to the specified neighbor only.  When a node receives a TIE
182     with per-neighbor flooding scope, it is accepted only if the node is
183     the specified neighbor, and it is not reflooded any further.

185  2.3.  Multicast TIE

187     Currently, multicast TIEs are only N-TIEs with per-neighbor flooding
188     scope, except on TOFs and sub-TOFs.  If a multicast TIE is received
189     from a node south of the sub-TOFs without the per-neighbor flooding
190     scope specified, it MUST be discarded.

192       /** TIE for multicast */
193       struct IPMulticastTIEElement {
194         /** Multicast TIEs are for (*, group-prefix) joins.
195             The '*' is not encoded in the TIE. */
196         1: required common.IPPrefixType group_prefix;

198         /** fields used by TOFs and sub-TOFs to build spanning tree RPL */
199         2: optional common.SystemIDType chosen_or_highest_parent;
200         3: optional list<common.SystemIDType> sub_tof_children;
201       }

203       /** Type of TIE.
204           ...
205       */
206       enum TIETypeType {
207         ...
208         TIETypeIPMulticast = 11,
209         TIETypeMaxValue = 12,
210       }

212       /** Single element in a TIE.
213           ...
214       */
215       union TIEElement {
216         ...
217         /** IP multicast elements. */
218         10: optional IPMulticastTIEElement ip_multicast;
219       }

221  2.4.  Building Spanning Tree among TOFs and sub-TOFs

223     Note: this procedure is still subject to further discussion and may
224     be replaced by another scheme.

226     If a sub-TOF node is the root of a sub-tree for a (*, G-prefix) tree,
227     it selects a TOF neighbor by hashing as its parent for the tree and
228     originates a corresponding multicast N-TIE without the per-neighbor
229     flooding scope, i.e., flooded to all its north TOF neighbors.  The
230     chosen_or_highest_parent field is set to the chosen TOF neighbor.

232     A receiving TOF node originates a corresponding S-TIE without the
233     per-neighbor flooding scope.  The chosen_or_highest_parent field is
234     set to the highest chosen_or_highest_parent of all received N-TIEs
235     and S-TIEs for the tree, identifying the root of all sub-trees from
236     that TOF node's point of view.  The sub_tof_children field lists all
237     sub-TOF nodes that have chosen the root as parent.

239     If a sub-TOF node that is the root of a sub-tree receives from its
240     TOF neighbors S-TIEs for the same tree but with different
241     chosen_or_highest_parent values, it chooses, from all its TOF
242     neighbors that are recorded as a chosen_or_highest_parent, the one
243     with the highest system-id and (re)parents to that neighbor if that
244     neighbor is not already its parent.

246     After the above steps, if a TOF node remains the chosen parent of
247     some sub-TOF nodes but its system-id does not match the highest
248     chosen_or_highest_parent of all N-TIEs and S-TIEs (i.e., the root),
249     the TOF node needs to join towards the root through some intermediate
250     sub-TOF and TOF nodes.  If it has a sub-TOF neighbor listed in the
251     sub_tof_children of the root, it originates an S-TIE with the per-
252     neighbor flooding scope set to that sub-TOF neighbor; that is, the
253     sub-TOF neighbor becomes the parent of the TOF node (which is itself
254     a parent of some other sub-TOF nodes).

256     If the TOF node does not have a neighbor listed in the
257     sub_tof_children of the S-TIE for the root, further study is needed.
258     The topology may be so partitioned that a spanning tree cannot be
259     built.

261  3.  Security Considerations

263     To be provided.

265  4.  Acknowledgements

267     The authors thank Bruno Rijsman and Antoni Przygienda for their
268     review and suggestions.

270  5.  References

272  5.1.  Normative References

274     [I-D.ietf-rift-rift]
275                Przygienda, T., Sharma, A., Thubert, P., Rijsman, B., and
276                D. Afanasiev, "RIFT: Routing in Fat Trees", draft-ietf-
277                rift-rift-12 (work in progress), May 2020.

279     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
280                Requirement Levels", BCP 14, RFC 2119,
281                DOI 10.17487/RFC2119, March 1997,
282                <https://www.rfc-editor.org/info/rfc2119>.

284  5.2.  Informative References

286     [I-D.zzhang-pim-pds]
287                Zhang, J. and K. Patel, "Protocol Dependent Multicast
288                Signaling", draft-zzhang-pim-pds-00 (work in progress),
289                October 2015.

291     [RFC5015]  Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano,
292                "Bidirectional Protocol Independent Multicast (BIDIR-
293                PIM)", RFC 5015, DOI 10.17487/RFC5015, October 2007,
294                <https://www.rfc-editor.org/info/rfc5015>.

296  Authors' Addresses

298     Zhaohui Zhang
299     Juniper Networks

301     EMail: zzhang@juniper.net

303     Pascal Thubert
304     Cisco Systems, Inc

306     EMail: pthubert@cisco.com