Network Working Group                                     T. Eckert, Ed.
Internet-Draft                                                 Futurewei
Intended status: Standards Track                              G. Cauchie
Expires: August 22, 2020                                Bouygues Telecom
                                                                M. Menth
                                                 University of Tuebingen
                                                       February 19, 2020


      Path Engineering for Bit Index Explicit Replication (BIER-TE)
                       draft-ietf-bier-te-arch-06

Abstract

   This memo introduces per-packet stateless strict and loose path
   engineered replication and forwarding for Bit Index Explicit
   Replication (BIER) packets (RFC8279).  This is called BIER-TE.

   BIER-TE leverages RFC8279 and extends it with new semantics for bits
   in the bitstring.  BIER-TE can leverage BIER forwarding engines with
   little or no changes.

   In BIER, the BitPositions (BP) of the packet's bitstring indicate
   BIER Forwarding Egress Routers (BFER), and hop-by-hop forwarding uses
   a Routing Underlay such as an IGP.

   In BIER-TE, BitPositions indicate adjacencies.  The BIFT of each BFR
   is only populated with BPs that are adjacent to the BFR in the
   BIER-TE topology.  The BIER-TE topology can consist of layer 2 or
   remote (routed) adjacencies.  The BFR then replicates and forwards
   BIER packets to those adjacencies.  This results in the
   aforementioned strict and loose path forwarding.

   BIER-TE can co-exist with BIER forwarding in the same domain, for
   example by using separate sub-domains.  In the absence of routed
   adjacencies, BIER-TE does not require a BIER routing underlay, and
   can then be operated without requiring an IGP routing protocol.

   BIER-TE operates without explicit in-network tree-building and
   carries the multicast distribution tree in the packet header.  It can
   therefore be a good fit to support multicast path steering in Segment
   Routing (SR) networks.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 22, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  BIER-TE and Traffic Engineering
     1.2.  Basic Examples
     1.3.  BIER-TE Topology and adjacencies
     1.4.  Comparison with BIER
     1.5.  Requirements Language
   2.  Components
     2.1.  The Multicast Flow Overlay
     2.2.  The BIER-TE Controller
       2.2.1.  Assignment of BitPositions to adjacencies of the
               network topology
       2.2.2.  Changes in the network topology
       2.2.3.  Set up per-multicast flow BIER-TE state
       2.2.4.  Link/Node Failures and Recovery
     2.3.  The BIER-TE Forwarding Layer
     2.4.  The Routing Underlay
   3.  BIER-TE Forwarding
     3.1.  The Bit Index Forwarding Table (BIFT)
     3.2.  Adjacency Types
       3.2.1.  Forward Connected
       3.2.2.  Forward Routed
       3.2.3.  ECMP
       3.2.4.  Local Decap
     3.3.  Encapsulation considerations
     3.4.  Basic BIER-TE Forwarding Example
     3.5.  Forwarding comparison with BIER
     3.6.  Requirements
   4.  BIER-TE Controller BitPosition Assignments
     4.1.  P2P Links
     4.2.  BFER
     4.3.  Leaf BFERs
     4.4.  LANs
     4.5.  Hub and Spoke
     4.6.  Rings
     4.7.  Equal Cost MultiPath (ECMP)
     4.8.  Routed adjacencies
       4.8.1.  Reducing BitPositions
       4.8.2.  Supporting nodes without BIER-TE
     4.9.  Reuse of BitPositions (without DNR)
     4.10. Summary of BP optimizations
   5.  Avoiding loops and duplicates
     5.1.  Loops
     5.2.  Duplicates
   6.  BIER-TE Forwarding Pseudocode
   7.  Managing SI, subdomains and BFR-ids
     7.1.  Why SI and sub-domains
     7.2.  Bit assignment comparison BIER and BIER-TE
     7.3.  Using BFR-id with BIER-TE
     7.4.  Assigning BFR-ids for BIER-TE
     7.5.  Example bit allocations
       7.5.1.  With BIER
       7.5.2.  With BIER-TE
     7.6.  Summary
   8.  BIER-TE and Segment Routing
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgements
   12. Change log [RFC Editor: Please remove]
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses

1.  Introduction

   BIER-TE shares architecture, terminology and packet formats with BIER
   as described in [RFC8279] and [RFC8296].  This document describes
   BIER-TE in the expectation that the reader is familiar with these two
   documents.

   In BIER-TE, BitPositions (BP) indicate adjacencies.  The BIFT of each
   BFR is only populated with BPs that are adjacent to the BFR in the
   BIER-TE Topology.  Other BPs are left without adjacency.  The BFR
   replicates and forwards BIER packets to adjacent BPs that are set in
   the packet.  BPs are normally also reset upon forwarding to avoid
   duplicates and loops.
   This is detailed further below.

   Note that related work, [I-D.ietf-roll-ccast], uses Bloom filters to
   represent leaves or edges of the intended delivery tree.  Bloom
   filters in general can support larger trees/topologies with fewer
   addressing bits than explicit bitstrings, but they introduce the
   heuristic risk of false positives and cannot reset bits in the
   bitstring during forwarding to avoid loops.  For these reasons,
   BIER-TE uses explicit bitstrings like BIER.  The explicit bitstrings
   of BIER-TE can also be seen as a special type of Bloom filter, and
   this is how related work [ICC] describes it.

1.1.  BIER-TE and Traffic Engineering

   BIER-TE is not a standalone, complete traffic engineering signaling
   solution like RSVP with RSVP-TE extensions ([RFC2205], [RFC3209]).
   Instead, it is a BIER-derived architecture and forwarding plane that
   allows signaling of "source-routed" engineered paths and replication
   points without per-path/replication-point state on the transit
   nodes.  It is therefore more similar to Segment Routing (SR,
   [RFC8402]) than to RSVP-TE, except that SR does not provide
   stateless replication point and receiver set signaling in its packet
   header.  See Section 8 for a more detailed discussion of BIER-TE and
   SR.

   BIER-TE can be used alone in use cases not requiring bandwidth or
   buffer resource reservations, such as highly resilient services
   through dual transmission with engineered path diversity, or
   optimization of network capacity utilization through engineered
   paths/trees ("load balancing across non-ECMP paths").  Due to its
   stateless nature, BIER-TE is intended to scale better with the
   number of multicast flows in these use cases than traditional IP
   multicast combined with other stateful path engineering mechanisms.

   BIER-TE could be combined with transit-node stateless bandwidth
   admission control (AC) mechanisms to provide path engineered
   multicast traffic with bandwidth reservations.  In these use cases,
   the AC function is expected to be integrated into the BIER-TE
   Controller described in Section 2.

   BIER-TE could be combined with transit-node stateless buffer
   management such as [I-D.qiang-detnet-large-scale-detnet] to provide
   path engineered multicast traffic with guaranteed bounded latency.
   Note that bounded latency solutions also require bandwidth
   reservations as explained above.

   BIER-TE could be combined with transit-node stateful bandwidth and
   buffer management mechanisms such as per-hop/per-flow shaping used in
   Guaranteed Services ([RFC2212]), but scalability may not be as good
   as in a completely transit-node stateless combination.  Nevertheless,
   BIER-TE still avoids the need for per-flow replication state, which
   typically has scaling limits separate from those of shaping state.
   BIER-TE also continues to provide the benefits of path engineering
   with per-packet selection of subsets of destinations and no need for
   in-network reconvergence of per-flow replication state.

   Mechanisms to combine bandwidth and/or buffer reservation mechanisms
   with BIER-TE are outside the scope of this document.

1.2.  Basic Examples

   BIER-TE forwarding is best introduced with simple examples.

   BIER-TE Topology:

      Diagram:

                           p5  p6
                         --- BFR3 ---
                     p3/     p13      \p7
        BFR1 ---- BFR2                 BFR5 ----- BFR6
          p1    p2   p4\     p14      /p10 p11    p12
                         --- BFR4 ---
                           p8  p9

      (simplified) BIER-TE Bit Index Forwarding Tables (BIFT):

      BFR1:  p1  -> local_decap
             p2  -> forward_connected to BFR2

      BFR2:  p1  -> forward_connected to BFR1
             p5  -> forward_connected to BFR3
             p8  -> forward_connected to BFR4

      BFR3:  p3  -> forward_connected to BFR2
             p7  -> forward_connected to BFR5
             p13 -> local_decap

      BFR4:  p4  -> forward_connected to BFR2
             p10 -> forward_connected to BFR5
             p14 -> local_decap

      BFR5:  p6  -> forward_connected to BFR3
             p9  -> forward_connected to BFR4
             p12 -> forward_connected to BFR6

      BFR6:  p11 -> forward_connected to BFR5
             p12 -> local_decap

                   Figure 1: BIER-TE basic example

   Consider the simple network with 6 BFRs in the above BIER-TE example
   picture.  p1...p14 are the BitPositions (BP) used.  All BFRs can act
   as ingress BFR (BFIR); BFR1, BFR3, BFR4 and BFR6 can also act as
   egress BFR (BFER).  Forward_connected is the name for adjacencies
   that represent subnet adjacencies of the network.  Local_decap is
   the name of the adjacency to decapsulate BIER-TE packets and pass
   their payload to higher layer processing.

   Assume a packet from BFR1 should be sent via BFR4 to BFR6.  This
   requires a bitstring (p2,p8,p10,p12).  When this packet is examined
   by BIER-TE on BFR1, the only BitPosition from the bitstring that is
   also set in the BIFT is p2.  This will cause BFR1 to send the only
   copy of the packet to BFR2.  Similarly, BFR2 will forward to BFR4
   because of p8, BFR4 to BFR5 because of p10 and BFR5 to BFR6 because
   of p12.  p12 also makes BFR6 receive and decapsulate the packet.

   To additionally send a copy to BFR3 (besides the copy to BFR6 via
   BFR4), the bitstring needs to be (p2,p5,p8,p10,p12,p13).
   When this packet is examined by BFR2, p5 causes one copy to be sent
   to BFR3 and p8 one copy to BFR4.  When BFR3 receives the packet, p13
   will cause it to receive and decapsulate the packet.

   If instead the bitstring was (p2,p6,p8,p10,p12,p13), the packet would
   be copied to BFR3 by BFR5 because of p6, instead of by BFR2 because
   of p5 as in the prior case.  This shows the ability of the BIER-TE
   Topology to make the traffic pass across any possible path and be
   replicated where desired.

   BIER-TE has various options to minimize BP assignments, many of which
   are based on assumptions about the required multicast traffic paths
   and bandwidth consumption in the network.

   The following picture shows a modified example, in which Rtr2 and
   Rtr5 are assumed not to support BIER-TE, so traffic has to be unicast
   encapsulated across them.  Unicast tunneling of BIER-TE packets can
   leverage any feasible mechanism such as MPLS or IP; these
   encapsulations are out of scope of this document.  To emphasize
   non-native forwarding of BIER-TE packets, these adjacencies are
   called "forward_routed", but otherwise there is no difference in
   their processing compared to the aforementioned "forward_connected"
   adjacencies.

   In addition, bits are saved in the following example by assuming
   that BFR1 only needs to be BFIR but not BFER or transit BFR.

   BIER-TE Topology:

      Diagram:

                    p1        p3       p7
                   ....> BFR3 <....         p5
                ........          ........>
           BFR1       (Rtr2)   (Rtr5)       BFR6
                ........          ........>
                   ....> BFR4 <....         p6
                    p2        p4       p8

      (simplified) BIER-TE Bit Index Forwarding Tables (BIFT):

      BFR1:  p1 -> forward_routed to BFR3
             p2 -> forward_routed to BFR4

      BFR3:  p3 -> local_decap
             p5 -> forward_routed to BFR6

      BFR4:  p4 -> local_decap
             p6 -> forward_routed to BFR6

      BFR6:  p5 -> local_decap
             p6 -> local_decap
             p7 -> forward_routed to BFR3
             p8 -> forward_routed to BFR4

               Figure 2: BIER-TE basic overlay example

   To send a BIER-TE packet from BFR1 via BFR3 to BFR6, the bitstring is
   (p1,p5).  From BFR1 via BFR4 to BFR6, it is (p2,p6).  A packet from
   BFR1 to BFR3, BFR4 and BFR6 can use (p1,p2,p3,p4,p5) or
   (p1,p2,p3,p4,p6), or, when BFR3 or BFR4 is reached via BFR6,
   (p2,p3,p4,p6,p7) or (p1,p3,p4,p5,p8).

1.3.  BIER-TE Topology and adjacencies

   As shown in these two examples, the BIER-TE topology is the key new
   component in BIER-TE for controlling where replication can or should
   happen and for minimizing the number of BPs required for path
   segments.

   The BIER-TE Topology effectively consists of the BIFTs of all the
   BFRs and can also be expressed in a diagram as a graph where the
   edges are the adjacencies between the BFRs.  Adjacencies are
   naturally unidirectional.  A BP can be reused across multiple
   adjacencies as long as this does not lead to undesired duplicates or
   loops, as explained further down in the text.

   If the BIER-TE topology represents the underlying (layer 2) topology
   of the network, this is called "native" BIER-TE, as shown in the
   first example.  This can be freely mixed with "overlay" BIER-TE, in
   which "forward_routed" adjacencies are used.

1.4.  Comparison with BIER

   The key differences from BIER are:

   o  BIER-TE replaces in-network autonomous path calculation with
      explicit paths calculated off-path by the BIER-TE Controller.

   o  In BIER-TE, every BitPosition of the BitString of a BIER-TE packet
      indicates one or more adjacencies, instead of a BFER as in BIER.

   o  In BIER-TE, each BFR has no routing table but only a BIER-TE
      Forwarding Table (BIFT) indexed by SI:BitPosition and populated
      with only those adjacencies to which the BFR should replicate
      packets.

   BIER-TE headers use the same format as BIER headers.

   BIER-TE forwarding does not require or use the BFIR-ID.  The BFIR-ID
   can still be useful though for coordinated BFIR/BFER functions, such
   as the context for upstream assigned labels for MPLS payloads in
   MVPN over BIER-TE.

   If the BIER-TE domain is also running BIER, then the BFIR-ID in
   BIER-TE packets can be set to the same BFIR-ID as used with BIER
   packets.

   If the BIER-TE domain is not running full BIER, or wants to avoid
   allocating bits in BIER bitstrings for BFIR-ID values, then the
   allocation of BFIR-ID values in BIER-TE packets can be done through
   other mechanisms outside the scope of this document, as long as this
   is appropriately agreed upon between all BFIRs/BFERs.

1.5.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Components

   End-to-end BIER-TE operation consists of four major components: the
   "Multicast Flow Overlay", the "BIER-TE control plane" consisting of
   the "BIER-TE Controller" and its signaling channels to the BFRs, the
   "Routing Underlay", and the "BIER-TE forwarding layer".  The BIER-TE
   Controller is the new architectural component in BIER-TE compared to
   BIER.

   Picture 2: Components of BIER-TE

                      <------BGP/PIM----->
   |<-IGMP/PIM->      multicast flow      <-PIM/IGMP->|
                          overlay

             [BIER-TE Controller] <=> [BIER-TE Topology]
               BIER-TE control plane

                   ^        ^       ^
                  /         |        \   BIER-TE control protocol
                   |        |       |    e.g. Netconf/Restconf/Yang
                   v        v       v
   Src -> Rtr1 -> BFIR-----BFR-----BFER -> Rtr2 -> Rcvr

                  |<----------------->|
                BIER-TE forwarding layer

                  |<- BIER-TE domain->|

              |<--------------------->|
                  Routing underlay

                   Figure 3: BIER-TE architecture

2.1.  The Multicast Flow Overlay

   The Multicast Flow Overlay operates as in BIER.  See [RFC8279].
   Instead of interacting with the BIER forwarding layer (as in BIER),
   it interacts with the BIER-TE Controller.

2.2.  The BIER-TE Controller

   The BIER-TE Controller represents the control plane of BIER-TE.  It
   communicates two sets of information with BFRs:

   During initial provisioning or modifications of the network topology,
   the BIER-TE Controller discovers the network topology and creates the
   BIER-TE topology from it: it determines which adjacencies are
   required/desired and assigns BitPositions to them.  Then it signals
   the resulting BitPositions and their adjacencies to each BFR to set
   up their BIER-TE BIFTs.

   During day-to-day operations of the network, the BIER-TE Controller
   signals to BFIRs which multicast flows are mapped to which
   BitStrings.

   Communication between the BIER-TE Controller and BFRs is ideally via
   standardized protocols and data models such as Netconf/Restconf/Yang.
   This is currently outside the scope of this document.  Vendor-
   specific CLI on the BFRs is also a possible stopgap option (as in
   many other SDN solutions lacking definition of a standardized data
   model).

   For simplicity, the procedures of the BIER-TE Controller are
   described in this document as if it is a single, centralized
   automated entity, such as an SDN controller.  It could equally be an
   operator setting up CLI on the BFRs.  Distribution of the functions
   of the BIER-TE Controller is currently outside the scope of this
   document.

2.2.1.  Assignment of BitPositions to adjacencies of the network
        topology

   The BIER-TE Controller tracks the BFR topology of the BIER-TE domain.
   It determines which adjacencies require BitPositions so that BIER-TE
   explicit paths can be built through them as desired by operator
   policy.

   The BIER-TE Controller then pushes the BitPositions/adjacencies to
   the BIFTs of the BFRs, populating the BIFT of each BFR only with
   those SI:BitPositions for adjacencies connecting to that BFR, i.e.,
   the adjacencies to which that BFR should be able to send packets.

2.2.2.  Changes in the network topology

   If the network topology changes (not failure based) so that
   adjacencies that are assigned to BitPositions are no longer needed,
   the BIER-TE Controller can reuse those BitPositions for new
   adjacencies.  First, these BitPositions need to be removed from any
   BFIR flow state and BFR BIFT state; then they can be repopulated,
   first into the BIFTs and then into the BFIRs.

2.2.3.  Set up per-multicast flow BIER-TE state

   The BIER-TE Controller interacts with the multicast flow overlay to
   determine which multicast flow needs to be sent by a BFIR to which
   set of BFERs.  It calculates the desired distribution tree across the
   BIER-TE domain based on algorithms outside the scope of this document
   (e.g., CSPF, Steiner Tree, ...).  It then pushes the calculated
   BitString into the BFIR.

   See [I-D.ietf-bier-multicast-http-response] for a solution describing
   this interaction.

2.2.4.  Link/Node Failures and Recovery

   When links or nodes fail or recover in the topology, BIER-TE can
   quickly respond with the optional FRR procedures described in
   [I-D.eckert-bier-te-frr].  It can also react more slowly by
   recalculating the BitStrings of affected multicast flows.
   This reaction is slower than the FRR procedure because the BIER-TE
   Controller needs to receive link/node up/down indications,
   recalculate the desired BitStrings and push them down into the BFIRs.
   With FRR, this is all performed locally on a BFR receiving the
   adjacency up/down notification.

2.3.  The BIER-TE Forwarding Layer

   When the BIER-TE Forwarding Layer receives a packet, it simply looks
   up the BitPositions that are set in the BitString of the packet in
   the Bit Index Forwarding Table (BIFT) that was populated by the
   BIER-TE Controller.  For every BP that is set in the BitString and
   that has one or more adjacencies in the BIFT, a copy is made
   according to the type of adjacencies for that BP in the BIFT.
   Before sending any copy, the BFR resets all BPs in the BitString of
   the packet for which the BFR has one or more adjacencies in the BIFT,
   except when the adjacency indicates "DoNotReset" (DNR, see
   Section 3.2.1).  This is done to prevent packets from looping.

2.4.  The Routing Underlay

   BIER-TE sends BIER packets to directly connected BIER-TE neighbors as
   L2 (unicast) BIER packets without requiring a routing underlay.
   BIER-TE forwarding uses the Routing Underlay for forward_routed
   adjacencies, which copy BIER-TE packets to non-directly-connected
   BFRs (see below for adjacency definitions).

   If the BFR intends to support FRR for BIER-TE, then the BIER-TE
   forwarding plane needs to receive fast adjacency up/down
   notifications: link up/down or neighbor up/down, e.g., from BFD.
   Providing these notifications is considered to be part of the
   routing underlay in this document.

3.  BIER-TE Forwarding

3.1.  The Bit Index Forwarding Table (BIFT)

   The Bit Index Forwarding Table (BIFT) exists in every BFR.  For every
   subdomain in use, it is a table indexed by SI:BitPosition and is
   populated by the BIER-TE control plane.
   Each index can be empty or contain a list of one or more adjacencies.

   BIER-TE can support multiple subdomains like BIER, each one with a
   separate BIFT.

   In the BIER architecture, indices into the BIFT are explained to be
   both BFR-id and SI:BitString (BitPosition).  This is because there is
   a 1:1 relationship between BFR-id and SI:BitString: every bit in
   every SI is/can be assigned to a BFIR/BFER.  In BIER-TE, there are
   more bits used in each BitString than there are BFIRs/BFERs assigned
   to the BitString.  This is because of the bits required to express
   the engineered path through the topology.  The BIER-TE forwarding
   definitions therefore do not use the term BFR-id at all.  Instead,
   BFR-ids are only used as required by the routing underlay, the flow
   overlay, or BIER headers.  Please refer to Section 7 for explanations
   of how to deal with SI, subdomains and BFR-id in BIER-TE.

   -----------------------------------------------------------------
   | Index:          | Adjacencies:                                |
   | SI:BitPosition  | empty, or one or more per entry             |
   =================================================================
   | 0:1             | forward_connected(interface,neighbor{,DNR}) |
   -----------------------------------------------------------------
   | 0:2             | forward_connected(interface,neighbor{,DNR}) |
   |                 | forward_connected(interface,neighbor{,DNR}) |
   -----------------------------------------------------------------
   | 0:3             | local_decap({VRF})                          |
   -----------------------------------------------------------------
   | 0:4             | forward_routed({VRF,}l3-neighbor)           |
   -----------------------------------------------------------------
   | 0:5             |                                             |
   -----------------------------------------------------------------
   | 0:6             | ECMP({adjacency1,...adjacencyN}, seed)      |
   -----------------------------------------------------------------
   ...
   | BitStringLength | ...                                         |
   -----------------------------------------------------------------
                      Bit Index Forwarding Table

                     Figure 4: BIFT adjacencies

   The BIFT is programmed into the data plane of BFRs by the BIER-TE
   Controller and used to forward packets, according to the rules
   specified in the BIER-TE Forwarding Procedures.

   When the BIER-TE Controller populates the same BP in more than one
   BFR, that BP does not have to have the same adjacencies on each BFR.
   This is up to the BIER-TE Controller.  BPs for p2p links are one case
   (see below).

3.2.  Adjacency Types

3.2.1.  Forward Connected

   A "forward_connected" adjacency is towards a directly connected BFR
   neighbor using an interface address of that BFR on the connecting
   interface.  A forward_connected adjacency does not route packets but
   only L2 forwards them to the neighbor.

   Packets sent to an adjacency with "DoNotReset" (DNR) set in the BIFT
   will not have the BitPosition for that adjacency reset when the BFR
   creates a copy for it.  The BitPosition will still be reset for
   copies of the packet made towards other adjacencies.  This can be
   used, for example, in ring topologies as explained below.

3.2.2.  Forward Routed

   A "forward_routed" adjacency is an adjacency towards a BFR that is
   not a forward_connected adjacency: towards a loopback address of a
   BFR or towards an interface address that is not directly connected.
   Forward_routed packets are forwarded via the Routing Underlay.

   If the Routing Underlay has multiple paths for a forward_routed
   adjacency, it will perform ECMP independent of BIER-TE for packets
   forwarded across a forward_routed adjacency.  This is independent of
   the BIER-TE ECMP described in Section 3.2.3.

   If the Routing Underlay has FRR, it will perform FRR independent of
   BIER-TE for packets forwarded across a forward_routed adjacency.

3.2.3.  ECMP

   The ECMP mechanisms in BIER are tied to the BIER BIFT and are
   therefore not directly usable with BIER-TE.  The following procedures
   describe ECMP for BIER-TE that we consider to be lightweight but also
   well manageable.  It leverages the existing entropy parameter in the
   BIER header to keep packets of the same flow on the same path, and it
   introduces a "seed" parameter to allow engineering traffic to be
   polarized or randomized across multiple hops.

   An "Equal Cost Multipath" (ECMP) adjacency has a list of two or more
   adjacencies included in it.  It copies the BIER-TE packet to one of
   those adjacencies based on the ECMP hash calculation.  The BIER-TE
   ECMP hash algorithm must select the same adjacency from that list for
   all packets with the same "entropy" value in the BIER-TE header if
   the same number of adjacencies and the same seed are given as
   parameters.  Further use of the seed parameter is explained below.

3.2.4.  Local Decap

   A "local_decap" adjacency passes a copy of the payload of the BIER-TE
   packet to the packet's NextProto within the BFR (IPv4/IPv6,
   Ethernet, ...).  A local_decap adjacency turns the BFR into a BFER
   for matching packets.  Local_decap adjacencies require the BFER to
   support routing or switching for NextProto to determine how to
   further process the packet.

3.3.  Encapsulation considerations

   Specifications for BIER-TE encapsulation are outside the scope of
   this document.  This section gives explanations and guidelines.

   Because a BFR needs to interpret the BitString of a BIER-TE packet
   differently from a BIER packet, it is necessary to distinguish BIER
   from BIER-TE packets.  This is subject to definitions in BIER
   encapsulation specifications.

   MPLS encapsulation [RFC8296], for example, assigns one label for
   every (SI,subdomain) combination, by which BFRs recognize BIER
   packets.
   If it is desirable that every subdomain can forward only BIER or BIER-TE packets, then the label allocation could stay the same, and only the forwarding model (BIER/BIER-TE) would have to be defined per subdomain. If it is desirable to support both BIER and BIER-TE forwarding in the same subdomain, then additional labels would need to be assigned for BIER-TE forwarding.

   "forward_routed" requires an encapsulation that permits unicasting BIER-TE packets to a specific interface address on a target BFR. With MPLS encapsulation, this can simply be done via a label stack with the label for that address as the top label, followed by the label assigned to the (SI,subdomain) combination or, if necessary (see above), the label assigned to BIER-TE forwarding for that combination. With non-MPLS encapsulation, some form of IP tunneling (IP in IP, LISP, GRE) would be required.

   The encapsulation used for "forward_routed" adjacencies can equally support existing advanced adjacency information such as "loose source routes", for example via MPLS label stacks or appropriate header extensions (e.g., for IPv6).

3.4.  Basic BIER-TE Forwarding Example

   [RFC Editor: remove this section.]

   THIS SECTION TO BE REMOVED IN RFC BECAUSE IT WAS SUPERSEDED BY THE EXAMPLE IN SECTION 1.1, UNLESS REVIEWERS CHIME IN AND EXPRESS THE DESIRE TO KEEP THIS ADDITIONAL EXAMPLE SECTION.

   This is a step-by-step example of basic BIER-TE forwarding. It does not use ECMP or forward_routed adjacencies, nor does it try to minimize the number of BitPositions required for the topology.

                [BIER-TE Controller]
                  /      |       \
                 v       v        v

             |  p13  p1 |
             +- BFIR2 --+                      |
             |          |  p2   p6             |  LAN2
             |          +-- BFR3 --+           |
             |          |          |  p7   p11 |
        Src -+                     +-- BFER1 --+
             |          |  p3   p8 |           |
             |          +-- BFR4 --+           +-- Rcv1
             |          |          |           |
             |          |
             |  p14  p4 |
             +- BFIR1 --+                      |
             |          +-- BFR5 --+  p10  p12 |
        LAN1 |             p5   p9 +-- BFER2 --+
                                   |           +-- Rcv2
                                               |
                                             LAN3

         IP  |........ BIER-TE network ........|  IP

                   Figure 5: BIER-TE Forwarding Example

   pXX denotes the BitPosition number XX assigned by the BIER-TE Controller to an adjacency in the BIER-TE topology. For example, p9 is the adjacency towards BFR5 on the LAN connecting to BFER2.

      BIFT BFIR2:
          p13: local_decap()
          p2:  forward_connected(BFR3)

      BIFT BFR3:
          p1:  forward_connected(BFIR2)
          p7:  forward_connected(BFER1)
          p8:  forward_connected(BFR4)

      BIFT BFER1:
          p11: local_decap()
          p6:  forward_connected(BFR3)
          p8:  forward_connected(BFR4)

            Figure 6: BIER-TE Forwarding Example Adjacencies

   ...and so on.

   As an example, assume that some multicast traffic seen on LAN1 needs to be sent via BIER-TE by BFIR2 towards Rcv1 and Rcv2. The BIER-TE Controller determines that this traffic should pass across the following paths:

                          -> BFER1 ---------------> Rcv1
          BFIR2 -> BFR3
                          -> BFR4 -> BFR5 -> BFER2 -> Rcv2

               Figure 7: BIER-TE Forwarding Example Paths

   These paths correspond to the following BitString: p2, p5, p7, p8, p10, p11, p12.

   BFIR2 assigns this BitString to the example multicast traffic received from LAN1.

   BFIR2 then forwards this multicast traffic with BIER-TE based on that BitString. The BIFT of BFIR2 has only p2 and p13 populated. Only p2 is in the BitString, and it is an adjacency towards BFR3. BFIR2 therefore resets p2 in the BitString and sends a copy towards BFR3.

   BFR3 sees a BitString of p5, p7, p8, p10, p11, p12. It is only interested in p1, p7, and p8. It creates a copy of the packet to BFER1 (due to p7) and one to BFR4 (due to p8). It resets p7 and p8 before sending.

   BFER1 sees a BitString of p5, p10, p11, p12. It is only interested in p6, p8, and p11 and therefore considers only p11. p11 is a "local_decap" adjacency installed by the BIER-TE Controller because BFER1 should pass packets to IP multicast.
   The local_decap adjacency instructs BFER1 to create a copy, decapsulate it from the BIER header, and pass it on to the NextProtocol, in this example IP multicast. IP multicast will then forward the packet out to LAN2 because it has received PIM or IGMP joins for the traffic on LAN2.

   The packet is processed accordingly in BFR4, BFR5, and BFER2.

3.5.  Forwarding comparison with BIER

   BIER-TE forwarding is designed so that forwarding hardware can be shared with BIER. In fact, one of the main goals of this document is to encourage the building of forwarding hardware that supports not only BIER but also BIER-TE, to allow experimentation with BIER-TE and to support the development of BIER-TE control plane code.

   The pseudocode in Section 6 shows how existing BIER/BIFT forwarding can be amended to support basic BIER-TE forwarding by using the F-BM of the BIER BIFT. Only the masking of bits to avoid duplicates must be skipped when forwarding is for BIER-TE.

   Whether to use BIER or BIER-TE forwarding can simply be a configured choice per subdomain, set up accordingly by a BIER-TE Controller. The BIER packet encapsulation [RFC8296] can also be reused without changes; the BIER-TE ECMP adjacency uses its entropy field as input to the ECMP hash (Section 3.2.3).

3.6.  Requirements

   Basic BIER-TE forwarding MUST support configuring subdomains to use basic BIER-TE forwarding rules (instead of BIER). With basic BIER-TE forwarding, every bit MUST support having zero or one adjacency. It MUST support the adjacency types forward_connected without the DNR flag, forward_routed, and local_decap. All other BIER-TE forwarding features are optional. These basic BIER-TE requirements make BIER-TE forwarding exactly the same as BIER forwarding, with the exception of skipping the aforementioned F-BM masking on egress.
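   The basic requirements above can be illustrated with a small C sketch of a basic BIFT entry: zero or one adjacency per bit, restricted to the mandatory adjacency types. The enum, struct, and function names are hypothetical illustrations, not data structures defined by this document or by [RFC8279].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of the BIER-TE adjacency types of Section 3.2. */
enum adj_type {
    ADJ_NONE,              /* no adjacency for this bit on this BFR */
    ADJ_FORWARD_CONNECTED, /* L2 forwarding to a directly connected BFR */
    ADJ_FORWARD_ROUTED,    /* unicast via the Routing Underlay */
    ADJ_LOCAL_DECAP,       /* decapsulate and pass to NextProto */
    ADJ_ECMP               /* optional: pick one of several adjacencies */
};

struct adjacency {
    enum adj_type type;
    bool dnr;              /* DoNotReset, meaningful for forward_connected */
    uint32_t neighbor;     /* hypothetical neighbor/interface identifier */
};

/* Hypothetical BIFT entry for one BitPosition. */
struct bift_entry {
    unsigned n_adj;        /* number of adjacencies on this bit */
    struct adjacency adj;  /* valid only if n_adj == 1 */
};

/* True if the entry only uses features that basic BIER-TE forwarding
 * MUST support (Section 3.6). */
static bool is_basic_entry(const struct bift_entry *e)
{
    assert(e != NULL);
    if (e->n_adj == 0)
        return true;               /* unpopulated bit */
    if (e->n_adj > 1)
        return false;              /* several adjacencies per bit: MAY */
    switch (e->adj.type) {
    case ADJ_FORWARD_CONNECTED:
        return !e->adj.dnr;        /* DNR is a SHOULD, not basic */
    case ADJ_FORWARD_ROUTED:
    case ADJ_LOCAL_DECAP:
        return true;
    default:
        return false;              /* ECMP (MAY) or no valid type */
    }
}
```

   A single forward_connected adjacency without DNR passes the check. An entry using DNR or ECMP is reported as outside the basic feature set, which matches the SHOULD and MAY features described next.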
   BIER-TE forwarding SHOULD support the DNR flag, because it is highly useful for saving bits in rings (see Section 4.6).

   BIER-TE forwarding MAY support more than one adjacency on a bit, and ECMP adjacencies. The importance of ECMP adjacencies is unclear when traffic engineering is used, because it may be more desirable to explicitly steer traffic across non-ECMP paths to make per-path traffic calculation easier for BIER-TE Controllers. Having more than one adjacency for a bit allows further savings of bits in hub-and-spoke scenarios, but unlike in rings, it is less "natural" to flood traffic across multiple links unconditionally. Both ECMP and multiple adjacencies are forwarding plane features that should be possible to add later when needed, because they do not impact the basic BIER-TE replication loop. This is true because, unlike in BIER, there is no inter-copy dependency through the resetting of the F-BM.

4.  BIER-TE Controller BitPosition Assignments

   This section describes how the BIER-TE Controller can use the different BIER-TE adjacency types to define the BitPositions of a BIER-TE domain.

   Because the size of the BitString limits the size of the BIER-TE domain, many of the options described exist to support larger topologies with fewer BitPositions (Sections 4.1, 4.3, 4.4, 4.5, 4.6, 4.7, and 4.8).

4.1.  P2P Links

   Each p2p link in the BIER-TE domain is assigned one unique BitPosition, with a forward_connected adjacency on each end of the link pointing to the neighbor on the other end.

4.2.  BFER

   Every non-Leaf BFER is given a unique BitPosition with a local_decap adjacency.

4.3.  Leaf BFERs

      BFR1(P)       BFR2(P)           BFR1(P)      BFR2(P)
         |  \       /  |                 |            |
         |      X      |                 |            |
         |  /       \  |                 |            |
     BFER1(PE)     BFER2(PE)         BFER1(PE)----BFER2(PE)

          Leaf BFER /                   Non-Leaf BFER /
           PE-router                       PE-router

             Figure 8: Leaf vs. non-Leaf BFER Example

   Leaf BFERs are BFERs where incoming BIER-TE packets never need to be forwarded to another BFR but are only sent to the BFER to exit the BIER-TE domain. For example, in networks where PEs are spokes connected to P routers, those PEs are Leaf BFERs unless there is a U-turn between two PEs. Consider how redundant, disjoint copies of traffic can reach BFER1 and BFER2 in the above picture. When BFER1 and BFER2 are non-Leaf BFERs, as shown on the right-hand side, one traffic copy would be forwarded to BFER1 from BFR1, but the other one could only reach BFER1 via BFER2, which makes BFER2 a non-Leaf BFER. Likewise, BFER1 is a non-Leaf BFER when forwarding traffic to BFER2.

   Note that the BFERs in the left-hand picture are only guaranteed to be Leaf BFERs if a suitable routing configuration prevents transit traffic from passing through them, which is commonly applied in these topologies.

   All Leaf BFERs in a BIER-TE domain can share a single BitPosition. This is possible because the BitPosition for the adjacency used to reach the BFER can be used to distinguish whether or not packets should reach the BFER.

   This optimization will not work if an upstream interface of the BFER uses a BitPosition that is optimized as described in the following two sections (LAN, Hub and Spoke).

4.4.  LANs

   In a LAN, the adjacency to each neighboring BFR on the LAN is given a unique BitPosition. The adjacency of this BitPosition is a forward_connected adjacency towards that BFR, and this BitPosition is populated into the BIFT of all the other BFRs on that LAN.
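   The LAN population rule above can be sketched as follows. The sketch assumes a single BIFT:SI block with a BitStringLength of 64 and 1-based BitPositions; the function names are hypothetical. For one BFR on the LAN, it returns the mask of BPs that the BIER-TE Controller would populate in that BFR's BIFT, each with a forward_connected adjacency towards the BFR that owns the BP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mask bit for a 1-based BitPosition in a 64-bit BitString. */
static uint64_t bp_bit(unsigned bp)
{
    assert(bp >= 1 && bp <= 64);
    return UINT64_C(1) << (bp - 1);
}

/* lan_bps[i] is the unique BP of the adjacency towards BFR i on the LAN.
 * Returns the BPs to populate in the BIFT of BFR 'self': the BPs of all
 * other BFRs on the LAN, but never its own. */
static uint64_t lan_bps_for_bfr(const unsigned *lan_bps, size_t n_bfrs,
                                size_t self)
{
    assert(lan_bps != NULL && self < n_bfrs);
    uint64_t mask = 0;
    for (size_t i = 0; i < n_bfrs; i++) {
        if (i != self)
            mask |= bp_bit(lan_bps[i]);
    }
    return mask;
}
```

   With the BPs of Figure 9 (p1 for BFR1, p3 for BFR3, p4 for BFR4, and p2 for BFR7), the BIFT of BFR3 would be populated with p1, p2, and p4.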
              BFR1
               |p1
        LAN1-+-+---+-----+
           p3|   p4|   p2|
           BFR3  BFR4  BFR7

                 Figure 9: LAN Example

   If bandwidth on the LAN is not an issue and most BIER-TE traffic should be copied to all neighbors on a LAN, then BitPositions can be saved by assigning just a single BitPosition to the LAN and populating that BitPosition in the BIFT of each BFR on the LAN with a list of forward_connected adjacencies to all other neighbors on the LAN.

   This optimization does not work for BFRs that are redundantly connected to more than one LAN using this optimization, because these BFRs would receive duplicates and forward those duplicates into the other LANs. Adjacencies of such BFRs into their LANs still need a separate BitPosition.

4.5.  Hub and Spoke

   In a setup with a hub and multiple spokes connected via separate p2p links to the hub, all p2p links can share the same BitPosition. The BitPosition in the hub's BIFT is set up with a list of forward_connected adjacencies, one for each spoke.

   This option is similar to the BitPosition optimization in LANs: redundantly connected spokes need their own BitPositions.

   This type of optimized BP could be used, for example, when all traffic is "broadcast" traffic (very dense receiver sets) such as live TV or situation awareness (SA). This BP optimization can then be used to explicitly steer different traffic flows across different ECMP paths in data center or broadband aggregation networks with minimal use of BPs.

4.6.  Rings

   In L3 rings, instead of assigning a single BitPosition to every p2p link in the ring, it is possible to save BitPositions by setting the "Do Not Reset" (DNR) flag on forward_connected adjacencies.

   For the ring shown in the following picture, a single BitPosition suffices to forward traffic entering the ring at BFRa or BFRb all the way up to BFR1:

   On BFRa, BFRb, BFR30, ..., BFR3, the BitPosition is populated with a forward_connected adjacency pointing to the clockwise neighbor on the ring, with DNR set. On BFR2, the adjacency also points to the clockwise neighbor BFR1, but without DNR set.

   Handling DNR this way ensures that copies forwarded from any BFR in the ring to a BFR outside the ring will not have the ring BitPosition set, therefore minimizing the chance of creating loops.

                 v         v
                 |         |
          L1     |   L2    |   L3
      /-------- BFRa ---- BFRb --------------------\
      |                                            |
      \- BFR1 - BFR2 - BFR3 - ... - BFR29 - BFR30 -/
          |      |  L4  |             |
       p33|   p15|
        BFRd   BFRc

                    Figure 10: Ring Example

   Note that this example only permits packets to enter the ring at BFRa and BFRb, and that packets always travel clockwise. If packets should be allowed to enter the ring at any ring BFR, then two ring BitPositions are needed: one for clockwise and one for counterclockwise.

   Both would be set up to stop rotating on the same link, e.g., L1. When the ingress ring BFR creates the clockwise copy, it will reset the counterclockwise BitPosition, because the DNR bit only applies to the bit for which the replication is done. Likewise, it resets the clockwise BitPosition for the counterclockwise copy. As a result, the ring ingress BFR will send a copy in both directions, serving BFRs on either side of the ring up to L1.

4.7.  Equal Cost MultiPath (ECMP)

   The ECMP adjacency allows the use of just one BP per link bundle between two BFRs, instead of one BP for each p2p member link of that bundle. In the following picture, one BP is used across L1, L2, and L3.
                 --L1-----
           BFR1  --L2-----  BFR2
                 --L3-----

   BIFT entry in BFR1:
   ------------------------------------------------------------------
   | Index  | Adjacencies                                           |
   ==================================================================
   | 0:6    | ECMP({forward_connected(L1, BFR2),                    |
   |        |       forward_connected(L2, BFR2),                    |
   |        |       forward_connected(L3, BFR2)}, seed)             |
   ------------------------------------------------------------------

   BIFT entry in BFR2:
   ------------------------------------------------------------------
   | Index  | Adjacencies                                           |
   ==================================================================
   | 0:6    | ECMP({forward_connected(L1, BFR1),                    |
   |        |       forward_connected(L2, BFR1),                    |
   |        |       forward_connected(L3, BFR1)}, seed)             |
   ------------------------------------------------------------------

                       Figure 11: ECMP Example

   This document does not standardize any ECMP algorithm, because it is sufficient for implementations to document their freely chosen ECMP algorithm. This allows the BIER-TE Controller to calculate ECMP paths and seeds. The following picture shows an example ECMP algorithm:

      forward(packet, ECMP(adj(0), adj(1), ..., adj(N-1), seed)):
          i = (packet(bier-header-entropy) XOR seed) % N
          forward packet to adj(i)

                  Figure 12: ECMP algorithm Example

   In the following example, all traffic from BFR1 towards BFR10 is intended to be ECMP load-split equally across the topology. This example is not meant as a likely setup. Instead, it illustrates that ECMP can be used to share BPs across more than link bundles, and it explains the use of the seed parameter.
                       BFR1 (BFIR)
                      /L11      \L12
                     /            \
                 BFR2              BFR3
               /L21  \L22        /L31  \L32
              /       \         /       \
          BFR4        BFR5   BFR6        BFR7
             \        /         \        /
              \      /           \      /
                BFR8              BFR9
                    \            /
                     \          /
                      BFR10 (BFER)

   BIFT entry in BFR1:
   ------------------------------------------------------------------
   | 0:6    | ECMP({forward_connected(L11, BFR2),                   |
   |        |       forward_connected(L12, BFR3)}, seed1)           |
   ------------------------------------------------------------------

   BIFT entry in BFR2:
   ------------------------------------------------------------------
   | 0:7    | ECMP({forward_connected(L21, BFR4),                   |
   |        |       forward_connected(L22, BFR5)}, seed1)           |
   ------------------------------------------------------------------

   BIFT entry in BFR3:
   ------------------------------------------------------------------
   | 0:7    | ECMP({forward_connected(L31, BFR6),                   |
   |        |       forward_connected(L32, BFR7)}, seed1)           |
   ------------------------------------------------------------------

   BIFT entry in BFR4, BFR5:
   ------------------------------------------------------------------
   | 0:8    | forward_connected(Lxx, BFR8)  |xx differs on BFR4/BFR5|
   ------------------------------------------------------------------

   BIFT entry in BFR6, BFR7:
   ------------------------------------------------------------------
   | 0:8    | forward_connected(Lxx, BFR9)  |xx differs on BFR6/BFR7|
   ------------------------------------------------------------------

   BIFT entry in BFR8, BFR9:
   ------------------------------------------------------------------
   | 0:9    | forward_connected(Lxx, BFR10) |xx differs on BFR8/BFR9|
   ------------------------------------------------------------------

                   Figure 13: Polarization Example

   Note that for the following discussion of ECMP, only the BIFT ECMP adjacencies on BFR1, BFR2, and BFR3 are relevant. The reuse of BPs across BFRs in this example is explained further in Section 4.9.
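   The polarization effect discussed next can be reproduced with a small C simulation of the first two ECMP hops of Figure 13, using entropy values 0 to 1023 as stand-ins for flows. Note that with two adjacencies, the Figure 12 example (entropy XOR seed) % 2 depends only on the lowest bit of entropy XOR seed. Giving BFR1 a different seed therefore only changes which adjacency on BFR2 (and on BFR3) receives all of the traffic; the polarization remains. For comparison, the sketch also includes a hypothetical variant, ecmp_shift(), in which the seed selects which bits of the 20-bit BIER entropy field are used. That variant, and all function names, are assumptions for illustration and are not defined by this document.

```c
#include <assert.h>
#include <stdint.h>

#define ENTROPY_SAMPLES 1024u /* entropy values 0..1023 stand in for flows */

/* Example algorithm of Figure 12: i = (entropy XOR seed) % n. */
static unsigned ecmp_xor(uint32_t entropy, uint32_t seed, unsigned n)
{
    assert(n > 0);
    return (entropy ^ seed) % n;
}

/* Hypothetical variant (not defined by this document): the seed selects
 * which bits of the 20-bit BIER entropy field feed the modulo. */
static unsigned ecmp_shift(uint32_t entropy, uint32_t seed, unsigned n)
{
    assert(n > 0);
    return (entropy >> (seed % 20u)) % n;
}

typedef unsigned (*ecmp_fn)(uint32_t entropy, uint32_t seed, unsigned n);

/* Figure 13, first two ECMP hops: BFR1 picks BFR2 (0) or BFR3 (1), then
 * BFR2 picks L21/L22 or BFR3 picks L31/L32. On return, counts[] holds
 * the number of flows on L21, L22, L31, L32 (in that order). */
static void simulate(ecmp_fn ecmp, uint32_t seed_bfr1, uint32_t seed_bfr23,
                     unsigned counts[4])
{
    for (unsigned k = 0; k < 4; k++)
        counts[k] = 0;
    for (uint32_t e = 0; e < ENTROPY_SAMPLES; e++) {
        unsigned first = ecmp(e, seed_bfr1, 2);   /* hop on BFR1 */
        unsigned second = ecmp(e, seed_bfr23, 2); /* hop on BFR2 or BFR3 */
        counts[first * 2 + second]++;
    }
}
```

   With ecmp_xor and the same seed on all three BFRs, the counts are {512, 0, 0, 512}: L22 and L31 stay empty. Giving BFR1 seed 1 and BFR2/BFR3 seed 0 yields {0, 512, 512, 0} with ecmp_xor, but {256, 256, 256, 256} with ecmp_shift. The seed therefore only removes polarization if it changes the per-hop selection independently, and not merely inverts it.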
   With the ECMP setup in the above topology, traffic would not be load-split equally. Instead, links L22 and L31 would see no traffic at all. BFR2 only receives traffic from BFR1 for which the ECMP hash in BFR1 selected the first of the two adjacencies given as parameters to the ECMP adjacency, i.e., link L11 to BFR2. BFR2 performs ECMP again with two adjacencies on that subset of traffic, using the same seed1. It will therefore again select the first of its two adjacencies: L21 to BFR4. Therefore, L22 and BFR5 see no traffic. The same applies to L31 and BFR6.

   This issue in BFR2/BFR3 is called polarization. It results from the reuse of the same hash function across multiple consecutive hops in topologies like these. To resolve this issue, the ECMP adjacency on BFR1 can be set up with a different seed2 than the ECMP adjacencies on BFR2/BFR3. BFR2 and BFR3 can use the same hash because packets will not pass sequentially across both of them. Therefore, they can also use the same BP 0:7.

   Note that ECMP solutions outside of BIER often hide the seed by auto-selecting it from local entropy, such as unique local or next-hop identifiers. BIER-TE instead allows the BIER-TE Controller to set the seed explicitly. This maximizes the Controller's ability to choose the seed independently of such seed sources, which it may not be able to control well. It also allows the Controller to calculate optimized seeds for multi-hop cases.

4.8.  Routed adjacencies

4.8.1.  Reducing BitPositions

   Routed adjacencies can reduce the number of BitPositions required when the path engineering requirement is not hop-by-hop explicit path selection, but loose-hop selection. Routed adjacencies also allow BIER-TE to operate across intermediate routers that do not support BIER-TE.

                    .................
           ...BFR1--...           ...--L1-- BFR2...
                    ... .Routers. ...--L2--/
           ...BFR4--...           ...------ BFR3...
                    .................        |
                                             LO
                     Network Area 1

               Figure 14: Routed Adjacencies Example

   Assume that the requirement in the above picture is to explicitly steer traffic flows that have arrived at BFR1 or BFR4, via a shortest path in the routing underlay "Network Area 1", to one of the following three next segments: (1) BFR2 via link L1, (2) BFR2 via link L2, or (3) BFR3.

   To enable this, both BFR1 and BFR4 are set up with a forward_routed adjacency BitPosition towards an address of BFR2 on link L1, another forward_routed BitPosition towards an address of BFR2 on link L2, and a third forward_routed BitPosition towards a node address LO of BFR3.

4.8.2.  Supporting nodes without BIER-TE

   Routed adjacencies also enable incremental deployment of BIER-TE. Only the nodes through which BIER-TE traffic needs to be steered, with or without replication, need to support BIER-TE. Where these nodes are not directly connected to each other, forward_routed adjacencies are used to pass over nodes that are not BIER-TE enabled.

4.9.  Reuse of BitPositions (without DNR)

   BitPositions can be reused across multiple BFRs to minimize the number of BPs needed. This happens when adjacencies on multiple BFRs use the DNR flag, as described above, but it can also be done for non-DNR adjacencies. This section only discusses the non-DNR case.

   Because BPs are reset after passing a BFR with an adjacency for that BP, reusing BPs across multiple BFRs introduces no problems with duplicates or loops that would not also exist if every adjacency had a unique BP. Consider setting one BP in a BitString that is reused in N adjacencies. The result would be the same or worse if each of these adjacencies had a unique BP and all of them were set in the BitString.
   Instead, depending on the case, BPs can be reused without limitation, or their reuse reduces the available path engineering choices, or reuse does not work.

   A BP cannot be reused across two BFRs that would need to be passed sequentially on some path: the first BFR will reset the BP, so those paths cannot be built. A BP can be reused across BFRs that (A) only occur on different paths or (B) only occur on different branches of the same tree.

   An example of (A) was given in Figure 13, where BP 0:7, BP 0:8, and BP 0:9 are each reused across multiple BFRs, because a single packet/path would never be able to reach more than one BFR sharing the same BP.

   Now assume the example is changed: BFR1 has no ECMP adjacency for BP 0:6, but instead BP 0:5 with a forward_connected adjacency to BFR2 and BP 0:6 with a forward_connected adjacency to BFR3. Packets with both BP 0:5 and BP 0:6 set would now be able to reach both BFR2 and BFR3. The still-existing reuse of BP 0:7 between BFR2 and BFR3 is then a case of (B), where reuse of the BP is perfect because it does not limit the set of useful path choices:

   If, instead of reusing BP 0:7, BFR3 used a separate BP 0:10 for its ECMP adjacency, no useful additional path engineering would be enabled. If duplicates at BFR10 were undesirable, they would be avoided by not setting both BP 0:5 and BP 0:6 in the same packet. If the duplicates were desirable (e.g., for resilient transmission), the additional BP 0:10 would not provide any additional value either.

   Reuse may also save BPs in larger topologies. Consider the topology shown in Figure 17, with only the following assumptions: a BFIR/sender (e.g., a video headend) is attached to area 1, and areas 2 to 6 contain receivers/BFERs. Assume each area has a distribution ring, each with two BPs to indicate the direction (as explained in Section 4.6). These two BPs could be reused across the 5 areas.
   Packets would be replicated through other BPs to the desired subset of areas, and once a packet copy reaches the ring of an area, the two ring BPs come into play. This reuse is a case of (B), but it limits the topology choices: packets can only flow in the same direction around the rings of all areas. This may or may not be acceptable, depending on the desired path engineering. If resilient transmission is the path engineering goal, then it is likely a good optimization. If the bandwidth of each ring is to be optimized separately, this limitation would not be acceptable.

4.10.  Summary of BP optimizations

   This section reviewed a range of techniques by which a BIER-TE Controller can create a BIER-TE topology in a way that minimizes the number of necessary BPs.

   Without any optimization, a BIER-TE Controller would map the network subnet topology 1:1 into the BIER-TE topology. Every subnet-adjacent neighbor would then require a forward_connected BP, and every BFER would require a local_decap BP.

   The optimizations described are as follows:

   o  P2p links require only one BP (Section 4.1).

   o  All Leaf BFERs can share a single local_decap BP (Section 4.3).

   o  A LAN with N BFRs needs at most N BPs (one for each BFR). It only needs one BP for all those BFRs that are not redundantly connected to multiple LANs (Section 4.4).

   o  A hub with p2p connections to multiple non-Leaf BFER spokes can share one BP for all spokes if traffic can be flooded to all spokes, e.g., because of no bandwidth concerns or dense receiver sets (Section 4.5).

   o  Rings of BFRs can be built with just two BPs (one for each direction), except for BFRs with multiple ring connections, similar to LANs (Section 4.6).

   o  ECMP adjacencies to N neighbors can replace N BPs with 1 BP. Multihop ECMP can avoid polarization through different seeds of the ECMP algorithm (Section 4.7).

   o  Routed adjacencies allow "tunneling" across routers that are not BIER-TE capable, and across BIER-TE capable routers where no traffic steering or replication is required (Section 4.8).

   o  BPs can generally be reused across nodes that do not need to be consecutive in paths, but depending on the scenario, this may limit the feasible path engineering options (Section 4.9).

   Note that the described list of optimizations is not exhaustive. Especially when the set of required path engineering choices is limited, and the set of possible subsets of BFERs that should be able to receive traffic is limited, further BP optimizations are possible. The hub-and-spoke optimization is a simple example of such traffic-pattern-dependent optimizations.

5.  Avoiding loops and duplicates

5.1.  Loops

   Whenever BIER-TE creates a copy of a packet, the BitString of that copy has all BitPositions cleared that are associated with adjacencies on the BFR. This prevents packets from looping. The only exceptions are adjacencies with DNR set.

   With DNR set, looping can happen. Consider, in the ring example of Figure 10, that link L4 from BFR3 is plugged into the L1 interface of BFRa. This creates a loop in which the ring's clockwise BitPosition is never reset for copies of the packets traveling clockwise around the ring.

   To prevent looping in the face of such physical misconfiguration, only forward_connected adjacencies are permitted to have DNR set. The port-unique link-layer unicast destination address of the adjacency (e.g., a MAC address) then protects against closing the loop. Link layers without port-unique link-layer addresses should not be used with the DNR flag set.

5.2.  Duplicates

   Duplicates happen when the topology described by the BitString is not a tree but redundantly connects BFRs with each other. The BIER-TE Controller must therefore ensure that it only creates BitStrings that are trees in the topology.

   Duplicates can happen when links are incorrectly physically reconnected before the BIER-TE Controller updates the BitStrings in the BFIRs. Like loops, these can be prevented by link-layer addressing in forward_connected adjacencies.

   Duplicates can equally happen if interface or loopback addresses used in forward_routed adjacencies are moved from one BFR to another. Such re-addressing operations must be coordinated with the BIER-TE Controller.

6.  BIER-TE Forwarding Pseudocode

   The following simplified pseudocode for BIER-TE forwarding is based on the BIER forwarding pseudocode of [RFC8279], Section 6.5, with the one modification necessary to support basic BIER-TE forwarding. Like the BIER forwarding pseudocode, for simplicity it hides the details of the adjacency processing (forward_connected, forward_routed, or local_decap) inside PacketSend().

   void ForwardBitMaskPacket_withTE (Packet)
   {
       SI=GetPacketSI(Packet);
       Offset=SI*BitStringLength;
       for (Index = GetFirstBitPosition(Packet->BitString); Index ;
            Index = GetNextBitPosition(Packet->BitString, Index)) {
           F-BM = BIFT[Index+Offset]->F-BM;
           if (!F-BM) continue;
           BFR-NBR = BIFT[Index+Offset]->BFR-NBR;
           PacketCopy = Copy(Packet);
           PacketCopy->BitString &= ~F-BM;  [2]
           PacketSend(PacketCopy, BFR-NBR);
           // The following must not be done for BIER-TE:
           // Packet->BitString &= ~F-BM;   [1]
       }
   }

              Figure 15: Simplified BIER-TE Forwarding Pseudocode

   The difference is that in BIER-TE, step [1] must not be performed, but is replaced with [2] (when the forwarding plane algorithm is implemented verbatim as shown above).
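   The following C sketch makes the basic BIER-TE loop of Figure 15 executable for a single BIFT:SI block, with an assumed BitStringLength of 64. The F-BM is derived as explained in the following paragraphs: all BPs with an adjacency on this BFR, minus the BP itself if its adjacency has DNR set. All names are hypothetical. The sketch tests whether a BP has an adjacency rather than whether its F-BM is non-empty, so that a BFR whose only adjacency has DNR set still creates a copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BSL 64 /* BitStringLength assumed for this sketch (one SI) */

/* Hypothetical per-BP state of one BIFT:SI block on one BFR. */
struct bp_state {
    bool has_adj; /* at least one adjacency on this BFR for the BP */
    bool dnr;     /* that adjacency has DoNotReset set */
};

/* BIER-TE F-BM: all BPs with an adjacency on this BFR, except the BP
 * itself (0-based index idx) if its adjacency has DNR set. */
static uint64_t te_fbm(const struct bp_state st[BSL], unsigned idx)
{
    assert(idx < BSL);
    uint64_t fbm = 0;
    for (unsigned i = 0; i < BSL; i++)
        if (st[i].has_adj)
            fbm |= UINT64_C(1) << i;
    if (st[idx].dnr)
        fbm &= ~(UINT64_C(1) << idx);
    return fbm;
}

/* Figure 15 with step [2] and without step [1]. copies[k] receives the
 * BitString of the k-th copy, bp_of_copy[k] the 1-based BP it was made
 * for. Returns the number of copies. */
static size_t te_forward(const struct bp_state st[BSL], uint64_t bitstring,
                         uint64_t copies[BSL], unsigned bp_of_copy[BSL])
{
    size_t n = 0;
    for (unsigned i = 0; i < BSL; i++) {
        if (!(bitstring & (UINT64_C(1) << i)) || !st[i].has_adj)
            continue;                             /* not set or no adjacency */
        copies[n] = bitstring & ~te_fbm(st, i);   /* [2] */
        bp_of_copy[n] = i + 1;
        n++;
        /* [1] (bitstring &= ~fbm) is deliberately NOT done for BIER-TE */
    }
    return n;
}
```

   Consider a BFR with a DNR ring adjacency on BP 1 and plain adjacencies on BPs 2 and 3. A packet carrying BPs 1, 2, 3, and 5 then yields three copies. The copy for BP 1 keeps BPs 1 and 5, and the copies for BPs 2 and 3 keep only BP 5. Because step [1] is omitted, the result does not depend on the order in which the BPs are visited.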
   In BIER, the F-BM of a BP has all BPs set that are meant to be forwarded via the same neighbor. It is used to reset those BPs in the packet after the first copy to this neighbor has been made, to prevent multiple copies to the same neighbor.

   In BIER-TE, the F-BM of a particular BP with an adjacency is the set of all BPs with an adjacency on this BFR, except for the particular BP itself if it has an adjacency with the DNR bit set. The F-BM is used to reset the F-BM BPs in each copy before that copy is sent.

   In BIER, the order in which BPs are processed impacts the result of forwarding because of [1]. In BIER-TE, forwarding is not impacted by the order of BPs. It is therefore possible to optimize forwarding further than in BIER. For example, BIER-TE forwarding can be parallelized such that a parallel instance (such as an egress linecard) can process any subset of BPs without any consideration of the other BPs, and without any prior cross-BP shared processing.

   The pseudocode in Figure 16 elaborates further on the above simplified pseudocode:

   o  This pseudocode eliminates the per-bit F-BM, therefore reducing state by BitStringLength^2*SI and eliminating the need for a per-packet-copy masking operation, except for adjacencies with the DNR flag set:

      *  AdjacentBits[SI] are the bits with a non-empty list of adjacencies. These can be computed whenever the BIER-TE Controller updates the adjacencies.

      *  Only the AdjacentBits need to be examined in the loop for packet copies.

      *  On ingress, the AdjacentBits are reset in the packet's BitString to avoid packets looping.

   o  The code loops over the adjacencies, because there may be more than one adjacency for a bit.

   o  When an adjacency has the DNR bit set, the bit is set again in the packet copy (for example, to save bits in rings).

   o  The ECMP adjacency is shown. Its parameters are a ListOfAdjacencies from which one is picked.
   o  The forward_connected, forward_routed, and local_decap adjacencies are shown with their parameters.

   void ForwardBitMaskPacket_withTE (Packet)
   {
       SI=GetPacketSI(Packet);
       Offset=SI*BitStringLength;
       // Bits of this packet that have an adjacency on this BFR
       AdjacentBitstring = Packet->BitString & AdjacentBits[SI];
       // Reset all adjacent bits before any copy is made (avoids loops)
       Packet->BitString &= ~AdjacentBits[SI];
       for (Index = GetFirstBitPosition(AdjacentBitstring); Index ;
            Index = GetNextBitPosition(AdjacentBitstring, Index)) {
           foreach adjacency BIFT[Index+Offset] {
               if(adjacency == ECMP(ListOfAdjacencies, seed) ) {
                   I = ECMP_hash(sizeof(ListOfAdjacencies),
                                 Packet->Entropy, seed);
                   adjacency = ListOfAdjacencies[I];
               }
               PacketCopy = Copy(Packet);
               switch(adjacency) {
                   case forward_connected(interface,neighbor,DNR):
                       if(DNR)   // BitPositions start at 1
                           PacketCopy->BitString |= 1<<(Index-1);
                       SendToL2Unicast(PacketCopy,interface,neighbor);
                       break;
                   case forward_routed({VRF},l3-neighbor):
                       SendToL3(PacketCopy,{VRF,}l3-neighbor);
                       break;
                   case local_decap({VRF}):
                       DecapBierHeader(PacketCopy);
                       PassTo(PacketCopy,{VRF,}Packet->NextProto);
                       break;
               }
           }
       }
   }

                 Figure 16: BIER-TE Forwarding Pseudocode

7.  Managing SI, subdomains and BFR-ids

   When the number of bits required to represent the necessary hops in the topology and the BFERs exceeds the supported bitstring length, multiple SIs and/or subdomains must be used. This section discusses how.

   BIER-TE forwarding does not require the concept of BFR-id, but the routing underlay, flow overlay, and BIER headers may. This section also discusses how BFR-ids can be assigned to BFIRs/BFERs for BIER-TE.

7.1.  Why SI and sub-domains

   For BIER and BIER-TE forwarding, the most important result of using multiple SIs and/or subdomains is the same: packets that need to be sent to BFERs in different SIs or subdomains require different BIER packets, each one with a bitstring for a different (SI,subdomain) combination.
   Each such bitstring uses one bitstring-length-sized SI block in the
   BIFT of the subdomain.  We call this a BIFT:SI (block).

   For BIER and BIER-TE forwarding itself, it also makes no difference
   whether different SI and/or sub-domains are chosen, but SI and
   subdomain have different purposes in the BIER architecture shared by
   BIER-TE.  This affects how operators manage them and, especially, how
   flow overlays will likely use them.

   By default, every possible BFIR/BFER in a BIER network would likely
   be given a BFR-id in subdomain 0 (unless there are more than 64k
   BFIR/BFER).

   If there are different flow services (or service instances) requiring
   replication to different subsets of BFER, then it will likely not be
   possible to achieve the best replication efficiency for all of these
   service instances via subdomain 0.  Ideal replication efficiency for
   N BFER exists in a subdomain if they are split over not more than
   ceiling(N/bitstring-length) SI.

   If service instances justify additional BIFT:SI state in the network,
   additional subdomains will be used: BFIR/BFER are assigned BFR-ids in
   those subdomains, and each service instance is configured to use the
   most appropriate subdomain.  This results in improved replication
   efficiency for different services.

   Even if the creation of subdomains and the assignment of BFR-ids to
   BFIR/BFER in those subdomains is automated, individual service
   instances are not expected to be able to deal with BFER in different
   subdomains.  A service instance may support configuration of only a
   single subdomain on which it relies.
   To be able to easily reuse (and modify as little as possible)
   existing BIER procedures, including flow overlay and routing
   underlay, when BIER-TE forwarding is added, we therefore reuse SI and
   subdomain logically in the same way as they are used in BIER: all
   necessary BFIR/BFER for a service use a single BIER-TE BIFT and are
   split across as many SI as necessary (see below).  Different services
   may use different subdomains that primarily exist to provide more
   efficient replication (and, for BIER-TE, desirable path engineering)
   for different subsets of BFIR/BFER.

7.2.  Bit assignment comparison BIER and BIER-TE

   In BIER, bitstrings only need to carry bits for BFER, which leads to
   the model that BFR-ids map 1:1 to each bit in a bitstring.

   In BIER-TE, bitstrings need to carry bits to indicate not only the
   receiving BFER but also the intermediate hops/links across which the
   packet must be sent.  The maximum number of BFER that can be
   supported in a single bitstring or BIFT:SI depends on the number of
   bits necessary to represent the desired topology between them.

   The topology is "desired" because it depends not only on the physical
   topology but also on the choice of the operator: either to allow
   explicit path engineering across every single hop (which requires
   more bits), or to reduce the number of required bits by exploiting
   optimizations such as unicast (forward_routed), ECMP or flood (DNR)
   over "uninteresting" sub-parts of the topology, for example parts
   where different trees do not need to take different paths for
   traffic-engineering reasons.

   The share of bits needed to describe the topology, relative to the
   bits for the BFER in a BIFT:SI, can vary widely depending on the size
   of the topology and the number of alternative paths in it.
   The higher the percentage of topology bits, the higher the likelihood
   that those bits are not just BIER-TE overhead without additional
   benefit, but instead allow expressing desirable traffic-engineering
   path alternatives.

7.3.  Using BFR-id with BIER-TE

   Because there is no 1:1 mapping between bits in the bitstring and
   BFER, BIER-TE cannot simply rely on the BIER 1:1 mapping between bits
   in a bitstring and BFR-id.

   In BIER, automatic schemes could assign all possible BFR-ids
   sequentially to BFERs.  This will not work in BIER-TE.  In BIER-TE,
   the operator or BIER-TE Controller has to determine a BFR-id for each
   BFER in each required subdomain.  The BFR-id may or may not have a
   relationship with a bit in the bitstring.  Suggestions are detailed
   below.  Once determined, the BFR-id can then be configured on the
   BFER and used by flow overlay, routing underlay and the BIER header
   in almost the same way as the BFR-id in BIER.

   The one exception is application/flow overlays that automatically
   calculate the bitstring(s) of BIER packets by converting BFR-ids to
   bits.  In BIER-TE, this operation can be done in two ways:

   "Independent branches":  For a given application or (set of) trees,
      the branches from a BFIR to every BFER are independent of the
      branches to any other BFER.  For example, shortest path trees
      have independent branches.

   "Interdependent branches":  When a BFER is added to or deleted from a
      particular distribution tree, branches to other BFER still in the
      tree may need to change.  Steiner trees are examples of
      interdependent-branch trees.

   If "independent branches" are sufficient, the BIER-TE Controller can
   provide to such applications, for every BFR-id, a SI:bitstring with
   the BIER-TE bits for the branch towards that BFER.  The application can
The application can 1507 then independently calculate the SI:bitstring for all desired BFER by 1508 OR'ing their bitstrings. 1510 If "interdependent branches" are required, the application could call 1511 a BIER-TE Controller API with the list of required BFER-id and get 1512 the required bitstring back. Whenever the set of BFER-id changes, 1513 this is repeated. 1515 Note that in either case (unlike in BIER), the bits in BIER-TE may 1516 need to change upon link/node failure/recovery, network expansion and 1517 network load by other traffic (as part of traffic engineering goals). 1518 Interactions between such BFIR applications and the BIER-TE 1519 Controller do therefore need to support dynamic updates to the 1520 bitstrings. 1522 7.4. Assigning BFR-ids for BIER-TE 1524 For a non-leaf BFER, there is usually a single bit k for that BFER 1525 with a local_decap() adjacency on the BFER. The BFR-id for such a 1526 BFER is therefore most easily the one it would have in BIER: SI * 1527 bitstring-length + k. 1529 As explained earlier in the document, leaf BFERs do not need such a 1530 separate bit because the fact alone that the BIER-TE packet is 1531 forwarded to the leaf BFER indicates that the BFER should decapsulate 1532 it. Such a BFER will have one or more bits for the links leading 1533 only to it. The BFR-id could therefore most easily be the BFR-id 1534 derived from the lowest bit for those links. 1536 These two rules are only recommendations for the operator or BIER-TE 1537 Controller assigning the BFR-ids. Any allocation scheme can be used, 1538 the BFR-ids just need to be unique across BFRs in each subdomain. 1540 It is not currently determined if a single subdomain could or should 1541 be allowed to forward both BIER and BIER-TE packets. If this should 1542 be supported, there are two options: 1544 A. BIER and BIER-TE have different BFR-id in the same subdomain. 
       This allows higher replication efficiency for BIER because the
       BIER BFR-ids can be assigned sequentially, while the bitstrings
       for BIER-TE will also carry the additional bits for the topology.
       There is no relationship between the BIER BFR-id and the BIER-TE
       BFR-id of a BFR.

   B.  BIER and BIER-TE share the same BFR-id.  The BFR-ids are assigned
       as explained above for BIER-TE and simply reused for BIER.  The
       replication efficiency for BIER will be as low as that for
       BIER-TE in this approach: BIER can only use the same share of
       bits that BIER-TE can use for BFER, which, depending on the
       topology, may be anywhere from 20% to 80% of the bitstring.

7.5.  Example bit allocations

7.5.1.  With BIER

   Consider a network setup with a bitstring length of 256 for a network
   topology as shown in the picture below.  The network has 6 areas,
   each with ca. 170 BFR, connecting via a core with some larger (core)
   BFR.  To address all BFER with BIER, 4 SI are required.  To send a
   BIER packet to all BFER in the network, 4 copies need to be sent by
   the BFIR.  On the BFIR, it does not make a difference how the BFR-ids
   are allocated to BFER in the network, but for efficiency further down
   in the network it does make a difference.

         area1           area2           area3
      BFR1a  BFR1b    BFR2a  BFR2b    BFR3a  BFR3b
        |       \      /        \      /       |
      ............................................
      .                   Core                   .
      ............................................
        |       /      \        /      \       |
      BFR4a  BFR4b    BFR5a  BFR5b    BFR6a  BFR6b
         area4           area5           area6

                Figure 17: Scaling BIER-TE bits by reuse

   With random allocation of BFR-ids to BFER, each receiving area would
   (most likely) have to receive all 4 copies of the BIER packet because
   there would be BFR-ids for each of the 4 SI in each of the areas.
   Only further towards each BFER would this duplication subside, when
   each of the 4 trees runs out of branches.
   If BFR-ids are allocated intelligently, then all the BFER in an area
   would be given BFR-ids from as few different SIs as possible.  Each
   area would then only have to forward one or two packets instead of 4.

   Because networks grow, replication efficiency in an area can also
   easily decrease over time when BFR-ids are allocated sequentially
   network-wide as BFER are added.  An area that initially only has
   BFR-ids in one SI might end up with many SIs over a longer period of
   growth.  Initially allocating SIs to areas with sufficiently many
   spare bits for growth can help to alleviate this issue, as can
   renumbering BFR-ids after network expansion.  In this example, one
   may consider using 6 SI and assigning one to each area.

   This example shows that intelligent BFR-id allocation, at least
   within subdomain 0, can be helpful or even necessary in BIER.

7.5.2.  With BIER-TE

   In BIER-TE, one needs to determine a subset of the physical topology
   and attached BFER so that the "desired" representation of this
   topology and the BFER fit into a single bitstring.  This process
   needs to be repeated until the whole topology is covered.

   Once bits/SIs are assigned to topology and BFER, the BFR-id is just a
   derived set of identifiers from the operator/BIER-TE Controller, as
   explained above.

   Every time that different sub-topologies overlap, bits need to be
   repeated across the bitstrings, increasing the overall number of bits
   required across all bitstrings/SIs.  In the worst case, random
   subsets of BFER are assigned to different SI.  This is much worse
   than in BIER: it not only reduces replication efficiency with the
   same number of overall bits, but reduces it even further, because
   more bits are required due to the duplication of topology bits across
   multiple SI.  Intelligent BFER-to-SI assignment and selecting
   specific "desired" subtopologies can minimize this problem.
   To set up BIER-TE efficiently for the above topology, the following
   bit allocation method can be used.  This method can easily be
   expanded to other, similarly structured larger topologies.

   Each area is allocated one or more SI, depending on the number of
   expected future BFER and the number of bits required for the topology
   in the area.  In this example, 6 SI are used, one per area.

   In addition, we use 4 bits in each SI: bia, bib, bea, beb (bit
   ingress a, bit ingress b, bit egress a, bit egress b).  These bits
   will be used to pass BIER packets from any BFIR via any combination
   of ingress area a/b BFR and egress area a/b BFR into a specific
   target area.  These bits are then set up with the right
   forward_routed adjacencies on the BFIR and area edge BFR:

   On all BFIR in an area j, bia in each BIFT:SI is populated with the
   same forward_routed(BFRja), and bib with forward_routed(BFRjb).  On
   all area edge BFR, bea in BIFT:SI=k is populated with
   forward_routed(BFRka) and beb in BIFT:SI=k with
   forward_routed(BFRkb).

   For BIER-TE forwarding of a packet to some subset of BFER across all
   areas, a BFIR would create at most 6 copies, with SI=1...SI=6.  Each
   packet carries the bits for the topology of its target area and the
   BFER in it, plus the four bits that indicate whether to pass the
   packet via the ingress area a or b border BFR and via the egress area
   a or b border BFR, therefore allowing path engineering for those two
   "unicast" legs: 1) BFIR to ingress area edge and 2) core to egress
   area edge.  Replication only happens inside the egress areas.  For
   BFER in the same area as the BFIR, these four bits are not used.

7.6.  Summary

   Like BIER, BIER-TE can support multiple SI within a sub-domain, which
   allows reusing the concept of BFR-id and therefore minimizes
   BIER-TE-specific functions in underlay routing, flow overlay methods
   and BIER headers.
   The number of BFIR/BFER possible in a subdomain is smaller than in
   BIER because BIER-TE uses additional bits for topology.

   In BIER-TE, subdomains can be used as in BIER to create more
   efficient replication to known subsets of BFER.

   Assigning bits for BFER intelligently into the right SI is more
   important in BIER-TE than in BIER because of replication efficiency
   and the overall number of bits required.

8.  BIER-TE and Segment Routing

   SR aims to enable lightweight path engineering via loose source
   routing.  Compared to its more heavyweight predecessor RSVP-TE, SR
   does not, for example, require per-path signaling to each of the hops
   along a path.

   BIER-TE supports the same design philosophy for multicast.  Like SR,
   it relies on source routing, via the definition of a BitString.  Like
   SR, it only requires considering the "hops" on which either
   replication has to happen, or across which the traffic should be
   steered (even without replication).  Any other hops can be skipped
   via the use of routed adjacencies.

   BIER-TE BitPositions (BPs) can be understood as the BIER-TE
   equivalent of "forwarding segments" in SR, but they have a different
   scope than SR forwarding segments.  Whereas forwarding segments in SR
   are global or local, BPs in BIER-TE have a scope that is the group of
   BFR(s) that have adjacencies for this BP in their BIFT.  They can be
   called "adjacency"-scoped forwarding segments.

   Adjacency scope could be global, but then every BFR would need an
   adjacency for this BP, for example a forward_routed adjacency with
   encapsulation to the global SR SID of the destination.  Such a BP
   would always result in ingress replication, though: the first BFR
   encountering this BP would directly replicate to it.  Only by using
   non-global adjacency scope for BPs can traffic be steered and
   replicated on non-ingress BFR.
   SR can naturally be combined with BIER-TE and help to optimize it.
   For example, instead of defining BitPositions for non-replicating
   hops, it is equally possible to use segment routing encapsulations
   (e.g., MPLS label stacks) for the encapsulation of "forward_routed"
   adjacencies.

   Note that BIER itself can also be seen as similar to SR.  BIER BPs
   act as global destination Node-SIDs, and the BIER bitstring is simply
   a highly optimized mechanism to indicate multiple such SIDs and let
   the network take care of effectively replicating the packet hop by
   hop to each destination Node-SID.  What BIER does not allow is to
   indicate intermediate hops or, in terms of SR, to indicate a sequence
   of SIDs to reach the destination.  This is what BIER-TE and its
   adjacency-scoped BPs enable.

   Both BIER and BIER-TE allow BFIR to "opportunistically" copy packets
   to a set of desired BFER on a packet-by-packet basis.  In BIER, this
   is done by OR'ing the BPs for the desired BFER.  In BIER-TE, this can
   be done by OR'ing, for each desired BFER, a bitstring obtained with
   the "independent branches" approach described in Section 7.3, which
   therefore also indicates the engineered path towards each desired
   BFER.  This is the approach that
   [I-D.ietf-bier-multicast-http-response] relies on.

9.  Security Considerations

   The security considerations are the same as for BIER, with the
   following differences:

   BFR-ids and BFR-prefixes are not used in BIER-TE forwarding, nor are
   the procedures for their distribution, so these are not attack
   vectors against BIER-TE.

10.  IANA Considerations

   This document requests no action by IANA.

11.  Acknowledgements

   The authors would like to thank Greg Shepherd, Ijsbrand Wijnands,
   Neale Ranns, Dirk Trossen, Sandy Zheng and Jeffrey Zhang for their
   extensive review and suggestions.

12.  Change log [RFC Editor: Please remove]

   draft-ietf-bier-te-arch:

   06: Concern by Lou Berger re. BIER-TE as a full traffic engineering
   solution.

      Changed title "Traffic Engineering" to "Path Engineering".

      Added intro section on the relationship of BIER-TE to traffic
      engineering.

      Changed the term "traffic engineering" in the text to "path
      engineering", where appropriate.

   Other:

      Shortened "BIER-TE Controller Host" to "BIER-TE Controller".
      Fixed up all instances of controller to do this.

   05: Review Jeffrey Zhang.

   Part 2:

      4.3 Added note about leaf-BFER being also a property of routing
      setup.

      4.7 Added missing details from example to avoid confusion with
      routed adjacencies, also compressed explanatory text and better
      justification why seed is explicitly configured by controller.

      4.9 Added section discussing generic reuse of BP methods.

      4.10 Added section summarizing BP optimizations of section 4.

      6. Rewrote/compressed explanation of comparison BIER/BIER-TE
      forwarding difference.  Explained benefit of BIER-TE per-BP
      forwarding being independent of forwarding for other BPs.

   Part 1:

      Explicitly use forward_connected adjacency in ECMP adjacency
      examples to avoid confusion.

      4.3 Add picture as example for leaf vs. non-leaf BFR in topology.
      Improved description.

      4.5 Example for traffic that can be broadcast -> for single BP in
      hub&spoke.

      4.8.1 Simplified example picture for routed adjacency,
      explanatory text.

   Review from Dirk Trossen:

      Fixed up explanation of ICC paper vs. bloom filter.

   04: spell check run.

   Added remaining fixes for Sandy's (Zhang Zheng) review:

      4.7 Enhance ECMP explanations:

      example ECMP algorithm, highlight that doc does not standardize
      ECMP algorithm.

   Review from Dirk Trossen:

      1. Added mentioning of prior work for traffic engineered paths
      with bloom filters.

      2. Changed title from layers to components and added "BIER-TE
      control plane" to "BIER-TE Controller" to make it clearer what it
      does.

      2.2.3. Added reference to I-D.ietf-bier-multicast-http-response
      as an example solution.

      2.3. Clarified sentence about resetting BPs before sending copies
      (also forgot to mention DNR here).

      3.4. Added text saying this section will be removed unless IESG
      review finds enough redeeming value in this example given how -03
      introduced section 1.1 with basic examples.

      7.2. Removed explicit numbers 20%/80% for number of topology bits
      in BIER-TE, replaced with more vague (high/low) description,
      because we do not have good reference material.  Added text
      saying this section will be removed unless IESG review finds
      enough redeeming value in this example given how -03 introduced
      section 1.1 with basic examples.

      many typos fixed.  Thanks a lot.

   03: Last call textual changes by authors to improve readability:

      removed Wolfgang Braun as co-author (as requested).

      Improved abstract to be more explanatory.  Removed mentioning of
      FRR (not concluded on so far).

      Added new text into Introduction section because the text was too
      difficult to jump into (too many forward pointers).  This
      primarily consists of examples and the early introduction of the
      BIER-TE Topology concept enabled by these examples.

      Amended comparison to SR.

      Changed syntax from [VRF] to {VRF} to indicate it is optional and
      to make idnits happy.

      Split references into normative / informative, added references.

   02: Refresh after IETF104 discussion: changed intended status back
   to standard.  Reasoning:

      Tighter review of standards document == ensures arch will be
      better prepared for possible adoption by other WGs (e.g. DetNet)
      or std. bodies.

      Requirement against the degree of existing implementations is self
      defined by the WG.
      BIER WG seems to think it is not necessary to apply multiple
      interoperating implementations against an architecture-level
      document at this time to make it qualify to go to standards track.
      Also, the levels of support introduced in rev. -01 should allow
      all BIER forwarding engines to also be able to support the base
      level BIER-TE forwarding.

   01: Added note comparing BIER and SR to also hopefully clarify
   BIER-TE vs. BIER comparison re. SR.

      - added requirements section mandating only most basic BIER-TE
      forwarding features as MUST.

      - reworked comparison with BIER forwarding section to only
      summarize and point to pseudocode section.

      - reworked pseudocode section to have one pseudocode that mirrors
      the BIER forwarding pseudocode to make comparison easier and a
      second pseudocode that shows the complete set of BIER-TE
      forwarding options and simplification/optimization possible vs.
      BIER forwarding.  Removed MyBitsOfInterest (was pure
      optimization).

      - Added captions to pictures.

      - Part of review feedback from Sandy (Zhang Zheng) integrated.

   00: Changed target state to experimental (WG conclusion), updated
   references, mod auth association.

      - Source now on http://www.github.com/toerless/bier-te-arch

      - Please open issues on the github for change/improvement requests
      to the document, in addition to posting them on the list
      (bier@ietf.).  Thanks!

   draft-eckert-bier-te-arch:

   06: Added overview of forwarding differences between BIER and
   BIER-TE.

   05: Author affiliation change only.

   04: Added comparison to Live-Live and BFIR to FRR section (Eckert).

   04: Removed FRR content into the new FRR draft
   [I-D.eckert-bier-te-frr] (Braun).
      - Linked FRR information to new draft in Overview/Introduction

      - Removed BTAFT/FRR from "Changes in the network topology"

      - Linked new draft in "Link/Node Failures and Recovery"

      - Removed FRR from "The BIER-TE Forwarding Layer"

      - Moved FRR section to new draft

      - Moved FRR parts of Pseudocode into new draft

      - Left only non-FRR parts

      - removed FrrUpDown(..) and //FRR operations in
      ForwardBierTePacket(..)

      - New draft contains FrrUpDown(..) and ForwardBierTePacket(Packet)
      from bier-arch-03

      - Moved "BIER-TE and existing FRR" to new draft

      - Moved "BIER-TE and Segment Routing" section one level up

      - Thus, removed "Further considerations" that only contained this
      section

      - Added Changes for version 04

   03: Updated the FRR section.  Added examples for FRR key concepts.
   Added BIER-in-BIER tunneling as option for tunnels in backup paths.
   BIFT structure is expanded and contains an additional match field to
   support full node protection with BIER-TE FRR.

   03: Updated FRR section.  Explanation how BIER-in-BIER encapsulation
   provides P2MP protection for node failures even though the routing
   underlay does not provide P2MP.

   02: Changed the definition of BIFT to be more in line with BIER.  In
   revs. up to -01, the idea was that a BIFT has only entries for a
   single bitstring, and every SI and subdomain would be a separate
   BIFT.  In BIER, each BIFT covers all SI.  This is now also how we
   define it in BIER-TE.

   02: Added Section 7 to explain the use of SI, subdomains and BFR-id
   in BIER-TE and to give an example how to efficiently assign bits for
   a large topology requiring multiple SI.

   02: Added further details for rings: how to support input from all
   ring nodes.

   01: Fixed BFIR -> BFER for section 4.3.
   01: Added explanation of SI, difference to BIER ECMP, consideration
   for Segment Routing, unicast FRR, considerations for encapsulation,
   explanations of BIER-TE Controller and CLI.

   00: Initial version.

13.  References

13.1.  Normative References

   [RFC8279]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
              Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
              Explicit Replication (BIER)", RFC 8279,
              DOI 10.17487/RFC8279, November 2017.

   [RFC8296]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
              Tantsura, J., Aldrin, S., and I. Meilik, "Encapsulation
              for Bit Index Explicit Replication (BIER) in MPLS and
              Non-MPLS Networks", RFC 8296, DOI 10.17487/RFC8296,
              January 2018.

13.2.  Informative References

   [I-D.ietf-bier-multicast-http-response]
              Trossen, D., Rahman, A., Wang, C., and T. Eckert,
              "Applicability of BIER Multicast Overlay for Adaptive
              Streaming Services",
              draft-ietf-bier-multicast-http-response-03 (work in
              progress), February 2020.

   [I-D.ietf-roll-ccast]
              Bergmann, O., Bormann, C., Gerdes, S., and H. Chen,
              "Constrained-Cast: Source-Routed Multicast for RPL",
              draft-ietf-roll-ccast-01 (work in progress), October 2017.

   [I-D.qiang-detnet-large-scale-detnet]
              Qiang, L., Geng, X., Liu, B., Eckert, T., Geng, L., and G.
              Li, "Large-Scale Deterministic IP Network",
              draft-qiang-detnet-large-scale-detnet-05 (work in
              progress), September 2019.

   [ICC]      Reed, M., Al-Naday, M., Thomos, N., Trossen, D.,
              Petropoulos, G., and S. Spirou, "Stateless multicast
              switching in software defined networks", IEEE
              International Conference on Communications (ICC), Kuala
              Lumpur, Malaysia, 2016, May 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.
   [RFC2205]  Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S.
              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
              Functional Specification", RFC 2205, DOI 10.17487/RFC2205,
              September 1997.

   [RFC2212]  Shenker, S., Partridge, C., and R. Guerin, "Specification
              of Guaranteed Quality of Service", RFC 2212,
              DOI 10.17487/RFC2212, September 1997.

   [RFC3209]  Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
              and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
              Tunnels", RFC 3209, DOI 10.17487/RFC3209, December 2001.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018.

Authors' Addresses

   Toerless Eckert (editor)
   Futurewei Technologies Inc.
   2330 Central Expy
   Santa Clara  95050
   USA

   Email: tte+ietf@cs.fau.de

   Gregory Cauchie
   Bouygues Telecom

   Email: GCAUCHIE@bouyguestelecom.fr

   Michael Menth
   University of Tuebingen

   Email: menth@uni-tuebingen.de