RIFT                                                            Z. Zhang
Internet-Draft                                          Juniper Networks
Intended status: Standards Track                              P. Thubert
Expires: January 9, 2020                                           Cisco
                                                            July 8, 2019


                     Multicast Routing In Fat Trees
                     draft-zzhang-rift-multicast-00

Abstract

   This document specifies multicast procedures with RIFT.  Multicast in
   RIFT is similar to Bidirectional Protocol Independent Multicast
   (PIM-Bidir), with the Rendezvous Point Link (RP-Link) simulated by a
   spanning tree of some Top of Fabric (ToF) nodes and sub-ToF nodes.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Specifications
   3.  Security Considerations
   4.  Acknowledgements
   5.  References
     5.1.  Normative References
     5.2.  Informative References
   Authors' Addresses

1.  Introduction

   Because of the simple north-south regular topology in Fat Tree
   networks, the PIM-Bidir [RFC5015] solution is extended for multicast
   in RIFT (referred to as MRIFT in this document).  The following is a
   summary of the changes and adaptations compared to PIM-Bidir.

   With PIM-Bidir, PIM joins are sent towards a Rendezvous Point Address
   (RPA), which could be an address not belonging to any router.  The
   RPA does belong to an RP Link (RPL), which could be attached to a
   single router or to multiple routers (e.g., when the RPL is a LAN).
   With MRIFT, there is no longer any concept of an RPA; joins are
   simply sent northbound.  The joins are terminated on some sub-ToF
   nodes, and the RPL is simulated by a spanning tree among some ToF and
   sub-ToF nodes.

   Instead of the (*,G) trees of PIM-Bidir, MRIFT uses (*,G-prefix)
   trees, where the G-prefix could be *, G, or anything in between
   (e.g., 225.1.1.0/24).  Light flows could simply follow the (*,*)
   tree.  For heavy flows, individual (*,G) trees could be built.  For
   medium flows, some (*,G-prefix) trees could be shared.  All the First
   Hop Routers (FHRs, connecting to sources) and the Last Hop Routers
   (LHRs, connecting to receivers) of a particular (*,G) flow must agree
   on whether a (*,*), (*,G), or (*,G-prefix) tree is used for the flow
   so that they all join the same tree.  This is done via out-of-band
   control, which is outside the scope of this document.

   Because of the rich connections in Fat Trees, a router has to choose
   one of its many north neighbors to send a join to.  This is done
   through hashing.  The hashing algorithm should lead to several, but
   not too many, routers choosing the same north neighbor, so that fewer
   routers are involved in multicast traffic forwarding, yet none of
   those routers is overburdened by replicating to too many downstream
   neighbors.

   Instead of PIM messages, RIFT's own TIEs are used.  This is similar
   to the concept in [draft-zzhang-pim-pds].  Specifically, RIFT Policy
   Guided Prefixes (PGP) [draft-atlas-rift-pgp] are used.  The TIEs are
   consumed and processed at each hop and then regenerated for the next
   hop.

   When a join reaches a sub-ToF node, the normal join process stops.
   This forms a sub-tree rooted at this sub-ToF node.  Multiple
   sub-trees of the same tree may be joined by a single ToF node, or
   they may have to be connected by a spanning tree serving as the RPL.
   For example, in the following topology, under normal conditions the
   two sub-tree roots for the two pods, say Spine111 and Spine121, may
   be joined by ToF21.  If the ToF21-Spine121 link is down, then ToF22
   may be used instead.  If the ToF22-Spine111 link is also down, then
   Spine111 and Spine121 will have to be joined via
   Spine111-ToF21-Spine112-ToF22-Spine121.

.                +--------+          +--------+          ^ N
.                |ToF   21|          |ToF   22|          |
.Level 2         ++-+--+-++          ++-+--+-++        <-*-> E/W
.                 | |  | |            | |  | |           |
.             P111/2|  |P121          | |  | |         S v
.                 ^ ^  ^ ^            | |  | |
.                 | |  | |            | |  | |
.  +--------------+ |  +-----------+  | |  | +---------------+
.  |                |    |         |  | |  |                 |
. South +-----------------------------+ |  |                 ^
.  |    |           |    |         |    |  |              All TIEs
. 0/0  0/0         0/0   +-----------------------------+     |
.  v    v           v              |    |  |           |     |
.  |    |           +-+    +<-0/0----------+           |     |
.  |    |             |    |       |    |              |     |
.+-+----++ optional +-+----++     ++----+-+           ++-----++
.|       | E/W link |       |     |       |           |       |
.|Spin111+----------+Spin112|     |Spin121|           |Spin122|
.+-+---+-+          ++----+-+     +-+---+-+           ++---+--+
.  |   |             |   South      |   |              |   |
.  |   +---0/0--->-----+ 0/0        |   +----------------+ |
. 0/0                | |  |         |                  | | |
.  |   +---<-0/0-----+ |  v         |   +--------------+ | |
.  v   |               |  |         |   |                | |
.+-+---+-+          +--+--+-+     +-+---+-+          +---+-+-+
.|       |  (L2L)   |       |     |       |  Level 0 |       |
.|Leaf111~~~~~~~~~~~~Leaf112|     |Leaf121|          |Leaf122|
.+-+-----+          +-+---+-+     +--+--+-+          +-+-----+
.  +                  +    \        /   +              +
.  Prefix111   Prefix112    \      /   Prefix121    Prefix122
.                          multi-homed
.                            Prefix
.+---------- Pod 1 ---------+     +---------- Pod 2 ---------+

   The following algorithm is used to form the spanning tree.

   1.  Each sub-tree root (a sub-ToF node) hashes to a ToF neighbor as
       its parent and advertises the parent's SystemID in an N-TIE for
       the tree.  This allows different trees to have different RPLs for
       load balancing.  In the above example, suppose Spine111
       advertises its choice of ToF21, and Spine121 advertises its
       choice of ToF22.

   2.  Each ToF node advertises, in its S-TIE for a tree, the highest
       SystemID among all the ToF nodes chosen and advertised by sub-ToF
       nodes for the same tree.  The S-TIE also includes the SystemIDs
       of the sub-ToF nodes that made the choice.  A ToF node knows the
       choices either because it is a neighbor of a sub-ToF node that
       made a choice (e.g., ToF21 knows Spine121's choice is ToF22
       because of Spine121's N-TIE), or because it received another ToF
       node's S-TIE reflected by a common south neighbor (e.g., if the
       ToF21-Spine121 link is down, ToF21 still knows ToF22 was chosen
       by Spine121 because of ToF22's S-TIE for the tree reflected by
       Spine122).

   3.  If a sub-ToF node sees ToF nodes with higher SystemIDs (than that
       of its own chosen parent) advertised for the tree, it reparents
       to the one that is its neighbor and has the highest SystemID, and
       re-advertises the new parent.  In the above example, Spine111
       will reparent to ToF22, assuming ToF22 has a higher SystemID than
       ToF21.

   4.  A ToF parent (with remaining sub-ToF children that could not
       reparent) joins towards the ToF parent with the highest SystemID
       (as determined in step 2) via a south neighbor.  It does so by
       including in its S-TIE for the tree the identity of that south
       neighbor, which either advertised its own choice of the ToF
       parent with the highest SystemID or reflected a ToF node's S-TIE
       about a sub-ToF node's choice of that parent.  In the above
       example, if the ToF22-Spine111 link is down, ToF21 will join
       ToF22 via either Spine112 or Spine122.
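
   The four steps can be illustrated with a small, non-normative Python
   sketch.  It is not part of the specification: the function names
   (hash_pick, settle_parents, join_via), the use of SHA-256 as the
   hash, and the integer SystemIDs 21 and 22 are all assumptions made
   for illustration.  The sketch also simplifies step 2 by treating
   every chosen ToF SystemID as visible to every sub-tree root, as if
   S-TIE reflection always succeeded.

```python
import hashlib


def hash_pick(node, tree, candidates):
    """Step 1, with an assumed hash: pick one candidate deterministically."""
    ordered = sorted(candidates)
    digest = hashlib.sha256(f"{node}/{tree}".encode()).digest()
    return ordered[int.from_bytes(digest[:4], "big") % len(ordered)]


def settle_parents(tof_neighbors, initial_choice):
    """Steps 2 and 3: reparent until no sub-tree root can move higher.

    tof_neighbors:  sub-ToF name -> set of ToF SystemIDs on live links.
    initial_choice: sub-tree root name -> ToF SystemID picked in step 1.
    """
    parent = dict(initial_choice)
    changed = True
    while changed:
        changed = False
        advertised = set(parent.values())  # simplified view of step 2
        for node in parent:
            reachable = advertised & tof_neighbors[node]
            best = max(reachable, default=parent[node])
            if best > parent[node]:  # step 3: move to a higher neighbor
                parent[node] = best
                changed = True
    return parent


def join_via(tof_neighbors, parent):
    """Step 4: south neighbors linking each lower ToF parent to the root."""
    root = max(parent.values())
    vias = {}
    for tof in set(parent.values()) - {root}:
        vias[tof] = sorted(
            name for name, tofs in tof_neighbors.items()
            if tof in tofs and root in tofs
        )
    return root, vias


# Topology example with the ToF21-Spine121 and ToF22-Spine111 links down.
links = {
    "Spine111": {21},
    "Spine112": {21, 22},
    "Spine121": {22},
    "Spine122": {21, 22},
}
parents = settle_parents(links, {"Spine111": 21, "Spine121": 22})
print(parents, join_via(links, parents))
```

   In this run neither root can reparent, so ToF21 remains a parent and
   reaches ToF22 through Spine112 or Spine122, in line with the example
   in step 4.  With all links up, the same code moves Spine111 to ToF22,
   as in step 3.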

   Steps 1 through 4 may repeat multiple times before the spanning tree
   settles; unless the connections among ToF and sub-ToF nodes are badly
   broken, the process should be fairly simple.

2.  Specifications

   More details will be specified in future revisions.

3.  Security Considerations

   To be provided.

4.  Acknowledgements

   The authors thank Bruno Rijsman and Antoni Przygienda for their
   review and suggestions.

5.  References

5.1.  Normative References

   [I-D.ietf-rift-rift]
              Team, T., "RIFT: Routing in Fat Trees",
              draft-ietf-rift-rift-06 (work in progress), June 2019.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

5.2.  Informative References

   [I-D.zzhang-pim-pds]
              Zhang, J. and K. Patel, "Protocol Dependent Multicast
              Signaling", draft-zzhang-pim-pds-00 (work in progress),
              October 2015.

   [RFC5015]  Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano,
              "Bidirectional Protocol Independent Multicast
              (BIDIR-PIM)", RFC 5015, DOI 10.17487/RFC5015,
              October 2007.

Authors' Addresses

   Zhaohui Zhang
   Juniper Networks

   EMail: zzhang@juniper.net


   Pascal Thubert
   Cisco Systems, Inc

   EMail: pthubert@cisco.com