idnits 2.17.00 (12 Aug 2021) /tmp/idnits47681/draft-ietf-lsvr-l3dl-08.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There is 1 instance of lines with non-ascii characters in the document. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (14 October 2021) is 219 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Outdated reference: draft-ietf-idr-bgp-ls-segment-routing-ext has been published as RFC 9085 == Outdated reference: draft-ietf-idr-bgpls-segment-routing-epe has been published as RFC 9086 == Outdated reference: A later version (-16) exists of draft-ietf-lsvr-bgp-spf-15 -- Possible downref: Non-RFC (?) normative reference: ref. 'IANA-PEN' -- Possible downref: Non-RFC (?) normative reference: ref. 'IEEE802-2014' ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126) Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group R. 
Bush 3 Internet-Draft Arrcus & Internet Initiative Japan 4 Intended status: Standards Track R. Austein 5 Expires: 17 April 2022 K. Patel 6 Arrcus 7 14 October 2021 9 Layer-3 Discovery and Liveness 10 draft-ietf-lsvr-l3dl-08 12 Abstract 14 In Massive Data Centers, BGP-SPF and similar routing protocols are 15 used to build topology and reachability databases. These protocols 16 need to discover IP Layer-3 attributes of links, such as neighbor IP 17 addressing, logical link IP encapsulation abilities, and link 18 liveness. This Layer-3 Discovery and Liveness protocol collects 19 these data, which may then be disseminated using BGP-SPF and similar 20 protocols. 22 Requirements Language 24 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 25 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 26 "OPTIONAL" in this document are to be interpreted as described in BCP 27 14 [RFC2119] [RFC8174] when, and only when, they appear in all 28 capitals, as shown here. 30 Status of This Memo 32 This Internet-Draft is submitted in full conformance with the 33 provisions of BCP 78 and BCP 79. 35 Internet-Drafts are working documents of the Internet Engineering 36 Task Force (IETF). Note that other groups may also distribute 37 working documents as Internet-Drafts. The list of current Internet- 38 Drafts is at https://datatracker.ietf.org/drafts/current/. 40 Internet-Drafts are draft documents valid for a maximum of six months 41 and may be updated, replaced, or obsoleted by other documents at any 42 time. It is inappropriate to use Internet-Drafts as reference 43 material or to cite them other than as "work in progress." 45 This Internet-Draft will expire on 17 April 2022. 47 Copyright Notice 49 Copyright (c) 2021 IETF Trust and the persons identified as the 50 document authors. All rights reserved. 
52 This document is subject to BCP 78 and the IETF Trust's Legal 53 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 54 license-info) in effect on the date of publication of this document. 55 Please review these documents carefully, as they describe your rights 56 and restrictions with respect to this document. Code Components 57 extracted from this document must include Simplified BSD License text 58 as described in Section 4.e of the Trust Legal Provisions and are 59 provided without warranty as described in the Simplified BSD License.
61 Table of Contents
63 1. Introduction
64 2. Terminology
65 3. Background
66 4. Top Level Overview
67 5. Inter-Link Protocol Overview
68 5.1. L3DL Ladder Diagram
69 6. Transport Layer
70 7. The Checksum
71 8. TLV PDUs
72 9. Logical Link Endpoint Identifier
73 10. HELLO
74 11. OPEN
75 12. ACK
76 12.1. Retransmission
77 13. The Encapsulations
78 13.1. The Encapsulation PDU Skeleton
79 13.2. Encapsulation Flags
80 13.3. IPv4 Encapsulation
81 13.4. IPv6 Encapsulation
82 13.5. MPLS Label List
83 13.6. 
MPLS IPv4 Encapsulation
84 13.7. MPLS IPv6 Encapsulation
85 14. VENDOR - Vendor Extensions
86 15. KEEPALIVE - Layer-2 Liveness
87 16. Layers-2.5 and 3 Liveness
88 17. The North/South Protocol
89 17.1. Use BGP-LS as Much as Possible
90 17.2. Extensions to BGP-LS
91 18. Discussion
92 18.1. HELLO Discussion
93 18.2. HELLO versus KEEPALIVE
94 19. VLANs/SVIs/Sub-interfaces
95 20. Implementation Considerations
96 21. Security Considerations
97 22. IANA Considerations
98 22.1. PDU Types
99 22.2. Signature Type
100 22.3. Flag Bits
101 22.4. Error Codes
102 23. IEEE Considerations
103 24. Acknowledgments
104 25. References
105 25.1. Normative References
106 25.2. Informative References
107 Authors' Addresses
109 1. Introduction
111 The Massive Data Center (MDC) environment presents unusual problems 112 of scale, e.g. O(10,000) forwarding devices, while its homogeneity 113 presents opportunities for simple approaches. 
Approaches such as 114 Jupiter Rising [JUPITER] use a central controller to deal with 115 scaling, while BGP-SPF [I-D.ietf-lsvr-bgp-spf] provides massive 116 scale-out without centralization using a tried and tested scalable 117 distributed control plane, offering a scalable routing solution in 118 Clos [Clos0][Clos1] and similar environments. But BGP-SPF and 119 similar higher level device-spanning protocols, e.g. 120 [I-D.malhotra-bess-evpn-lsoe], need logical link state and addressing 121 data from the network to build the routing topology. They also need 122 prompt but prudent reaction to (logical) link failure. 124 Layer-3 Discovery and Liveness (L3DL) provides brutally simple 125 mechanisms for devices to 127 * Discover each other's unique endpoint identification, 129 * Discover mutually supported layer-3 encapsulations, e.g. IP/MPLS, 131 * Discover Layer-3 IP and/or MPLS addressing of interfaces of the 132 encapsulations, 134 * Present these data, using a very restricted profile of a BGP-LS 135 [RFC7752] API, to BGP-SPF which computes the topology and builds 136 routing and forwarding tables, 138 * Enable Layer-3 link liveness such as BFD, 140 * Provide Layer-2 keep-alive messages for session continuity, and 141 finally 143 * Provide for authenticity verification of protocol messages. 145 In this document, the use case for L3DL is for point to point links 146 in a datacenter Clos in order to exchange the data needed for BGP-SPF 147 [I-D.ietf-lsvr-bgp-spf] bootstrap and continuity. Once layer-2 148 connectivity has been leveraged to get layer-3 addressability and 149 forwarding capabilities, normal layer-3 forwarding and routing can 150 take over. 152 L3DL might be found to be more widely applicable to a range of 153 routing and similar protocols which need layer-3 discovery and 154 characterisation. 156 2. Terminology 158 Even though it concentrates on the inter-device layer, this document 159 relies heavily on routing terminology. 
The following attempts to 160 clarify the use of some possibly confusing terms: 162 ASN: Autonomous System Number [RFC4271], a BGP identifier for 163 an originator of Layer-3 routes, particularly BGP 164 announcements. 166 BGP-LS: A mechanism by which link-state and TE information can be 167 collected from networks and shared with external 168 components using the BGP routing protocol. See [RFC7752]. 170 BGP-SPF: A hybrid protocol using BGP transport but a Dijkstra 171 Shortest Path First decision process. See 172 [I-D.ietf-lsvr-bgp-spf]. 174 Clos: A hierarchic subset of a crossbar switch topology commonly 175 used in data centers. 177 Datagram: The L3DL content of a single Layer-2 frame, sans Ethernet 178 framing. A full L3DL PDU may be packaged in multiple 179 Datagrams. 181 Encapsulation: Address Family Indicator and Subsequent Address 182 Family Indicator (AFI/SAFI). I.e. classes of layer-2.5 183 and 3 addresses such as IPv4, IPv6, MPLS, etc. 185 Frame: A Layer-2 Ethernet packet. 187 Link or Logical Link: A logical connection between two logical ports 188 on two devices. E.g. two VLANs between the same two ports 189 are two links. 191 LLEI: Logical Link Endpoint Identifier, the unique identifier of 192 one end of a logical link, see Section 9. 194 MAC Address: 48-bit Layer-2 addresses are assumed since they are 195 used by all widely deployed Layer-2 network technologies 196 of interest, especially Ethernet. See [IEEE.802_2001]. 198 MDC: Massive Data Center, commonly composed of thousands of Top 199 of Rack Switches (TORs). 201 MTU: Maximum Transmission Unit, the size in octets of the 202 largest packet that can be sent on a medium, see [RFC1122] 203 1.3.3. 205 PDU: Protocol Data Unit, an L3DL application layer message. A 206 PDU's content may need to be broken into multiple 207 Datagrams to make it through MTU or other restrictions. 209 RouterID: A 32-bit identifier unique in the current routing domain, 210 see [RFC6286]. 
212 Session: An established, via OPEN PDUs, session between two L3DL 213 capable link end-points. 215 SPF: Shortest Path First, an algorithm for finding the shortest 216 paths between nodes in a graph; AKA Dijkstra's algorithm. 218 System Identifier: An eight octet ISO System Identifier a la 219 [RFC1629] System ID. 221 TOR: Top Of Rack switch, aggregates the servers in a rack and 222 connects to aggregation layers of the Clos tree, AKA the 223 Clos spine. 225 ZTP: Zero Touch Provisioning gives devices initial addresses, 226 credentials, etc. on boot/restart. 228 3. Background 230 L3DL is primarily designed for a Clos type datacenter scale and 231 topology, but can accommodate richer topologies which contain 232 potential cycles. 234 While L3DL is designed for the MDC, there are no inherent reasons it 235 could not run on a WAN. The authentication and authorization needed 236 to run safely on a WAN need to be considered, and the appropriate 237 level of security options chosen. 239 L3DL assumes a new IEEE assigned EtherType (TBD). 241 The number of addresses of one Encapsulation type on an interface 242 link may be quite large given a TOR with tens of servers, each server 243 having a few hundred micro-services, resulting in an inordinate 244 number of addresses. And highly automated micro-service migration 245 can cause serious address prefix disaggregation, resulting in 246 interfaces with thousands of disaggregated prefixes. 248 Therefore the L3DL protocol is session oriented and uses incremental 249 announcement and withdrawal with session restart, a la BGP 250 ([RFC4271]). 252 4. 
Top Level Overview 254 * Devices discover each other on logical links 256 * Logical Link Endpoint Identifiers (LLEIs) are exchanged 258 * Layer-2 Liveness checks may be started 260 * Encapsulation data are exchanged and IP-Level Liveness checks 261 enabled 263 * A BGP-like upper layer protocol is assumed to use the identifiers 264 and encapsulation data to discover and build a topology database 266 +-------------------+ +-------------------+ +-------------------+ 267 | Device | | Device | | Device | 268 | | | | | | 269 |+-----------------+| |+-----------------+| |+-----------------+| 270 || || || || || || 271 || BGP-SPF <+---+> BGP-SPF <+---+> BGP-SPF || 272 || || || || || || 273 |+--------^--------+| |+--------^--------+| |+--------^--------+| 274 | | | | | | | | | 275 | | | | | | | | | 276 |+--------+--------+| |+--------+--------+| |+--------+--------+| 277 || Encapsulations || || Encapsulations || || Encapsulations || 278 || Addresses || || Addresses || || Addresses || 279 || L2 Liveness || || L2 Liveness || || L2 Liveness || 280 |+--------^--------+| |+--------^--------+| |+--------^--------+| 281 | | | | | | | | | 282 | | | | | | | | | 283 |+--------v--------+| |+--------v--------+| |+--------v--------+| 284 || || || || || || 285 ||Inter-Device PDUs<+---+>Inter-Device PDUs<+---+>Inter-Device PDUs|| 286 || || || || || || 287 |+-----------------+| |+-----------------+| |+-----------------+| 288 +-------------------+ +-------------------+ +-------------------+ 290 There are two protocols, the inter-device (left-right in the diagram) 291 per-link layer-3 discovery and the API to the upper level BGP-like 292 routing protocol (up-down in the above diagram): 294 * Inter-device PDUs are used to exchange device and logical link 295 identities and layer-2.5 (MPLS) and 3 identifiers (not payloads), 296 e.g. device IDs, port identities, VLAN IDs, Encapsulations, and IP 297 addresses. 
299 * A Link Layer to BGP API presents these data up the stack to a BGP 300 protocol or another device-spanning upper layer protocol, 301 presenting them using the BGP-LS BGP-like data format. 303 The upper layer BGP family routing protocols cross all the devices, 304 though they are not part of these L3DL protocols. 306 To simplify this document, Layer-2 framing is not shown. L3DL is 307 about layer-3. 309 5. Inter-Link Protocol Overview 311 Two devices discover each other and their respective identities by 312 sending multicast HELLO PDUs (Section 10). To assure discovery of 313 new devices coming up on a multi-link topology, devices on such a 314 topology, and only on a multi-link topology, send periodic HELLOs 315 forever, see Section 18.1. 317 Once a new device is recognized, both devices attempt to negotiate 318 and establish a session by sending unicast OPEN PDUs (Section 11) to 319 the source MAC addresses (plus VIDs if VLANs) of the received HELLOs. 320 Once a session is established through the OPEN exchange, the 321 Encapsulations (Section 13) configured on an end point may be 322 announced and modified. Note that these are only the encapsulation 323 and addresses configured on the announcing interface; though a 324 device's loopback and overlay interface(s) may also be announced. 325 When two devices on a link have compatible Encapsulations and 326 addresses, i.e. the same AFI/SAFI and the same subnet, the link is 327 announced via the BGP-LS API. 329 5.1. L3DL Ladder Diagram 331 The HELLO, Section 10, is a priming message sent on all configured 332 logical links. It is a small L3DL PDU encapsulated in an Ethernet 333 multicast frame with the simple goal of discovering the identities of 334 logical link endpoint(s) reachable from a Logical Link Endpoint, 335 Section 9. 
337 The HELLO and OPEN, Section 11, PDUs, which are used to discover and 338 exchange detailed Logical Link Endpoint Identifiers, LLEIs, and the 339 ACK/ERROR PDU, are mandatory; other PDUs are optional; though at 340 least one encapsulation SHOULD be agreed at some point. 342 The following is a ladder-style diagram of the L3DL protocol 343 exchanges: 345 | HELLO | Logical Link Peer discovery 346 |---------------------------->| 347 | HELLO | Mandatory 348 |<----------------------------| 349 | | 350 | | 351 | OPEN | MACs, IDs, etc. 352 |---------------------------->| 353 | ACK | 354 |<----------------------------| 355 | | 356 | OPEN | Mandatory 357 |<----------------------------| 358 | ACK | 359 |---------------------------->| 360 | | 361 | | 362 | Interface IPv4 Addresses | Interface IPv4 Addresses 363 |---------------------------->| Optional 364 | ACK | 365 |<----------------------------| 366 | | 367 | Interface IPv4 Addresses | 368 |<----------------------------| 369 | ACK | 370 |---------------------------->| 371 | | 372 | | 373 | Interface IPv6 Addresses | Interface IPv6 Addresses 374 |---------------------------->| Optional 375 | ACK | 376 |<----------------------------| 377 | | 378 | Interface IPv6 Addresses | 379 |<----------------------------| 380 | ACK | 381 |---------------------------->| 382 | | 383 | | 384 | Interface MPLSv4 Labels | Interface MPLSv4 Labels 385 |---------------------------->| Optional 386 | ACK | 387 |<----------------------------| 388 | | 389 | Interface MPLSv4 Labels | Interface MPLSv4 Labels 390 |<----------------------------| Optional 391 | ACK | 392 |---------------------------->| 393 | | 394 | | 395 | Interface MPLSv6 Labels | Interface MPLSv6 Labels 396 |---------------------------->| Optional 397 | ACK | 398 |<----------------------------| 399 | | 400 | Interface MPLSv6 Labels | Interface MPLSv6 Labels 401 |<----------------------------| Optional 402 | ACK | 403 |---------------------------->| 404 | | 405 | | 406 | L3DL 
KEEPALIVE | Layer-2 Liveness 407 |---------------------------->| Optional 408 | L3DL KEEPALIVE | 409 |<----------------------------| 411 6. Transport Layer 413 L3DL PDUs are carried by a simple transport layer which allows long 414 PDUs to occupy many Ethernet frames. The L3DL content of a single 415 Ethernet frame, exclusive of Ethernet framing data, is referred to as 416 a Datagram. 418 The L3DL Transport Layer encapsulates each Datagram using a common 419 transport header. 421 If a PDU does not fit in a single Datagram, it is broken into 422 multiple Datagrams and reassembled by the receiver a la [RFC0791] 423 Section 2.3 Fragmentation. 425 This is not classic 'fragmentation', but rather decomposition at the 426 origin to allow PDU payloads larger than the frame allows. There are 427 no intermediate devices capable of further fragmentation or 428 reassembly. 430 A PDU might need a large number of frames to be sent. As fragments 431 are not ACK paced (as PDUs are), to avoid overwhelming bursts, the 432 sender should pace fragments of a large PDU. 434 L3DL is carrying a relatively small amount of data on relatively high 435 bandwidth links, and at a time when the link is not active with other 436 data as it does not yet have layer-3 connectivity. So congestion is 437 not considered a sufficiently significant risk to warrant additional 438 complexity. 440 Should a PDU need to be retransmitted, it MUST be sent as the 441 identical Datagram set as the original transmission. The 442 Transmission Sequence Number informs the receiver that it is the same 443 PDU. 
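As a rough illustration of the decomposition described above, the following non-normative C sketch computes how a PDU would be cut into Datagrams. The names l3dl_slice_count, l3dl_slice, and struct dtgm_slice, and the max_payload parameter (the number of PDU octets one Datagram can carry on the local link), are assumptions made for this example and are not defined by this document. The 2^23 limit is taken from the 23-bit Datagram Number field of the transport header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit derived from the 23-bit Datagram Number field. */
#define L3DL_MAX_DTGMS ((size_t)1 << 23)

/* Hypothetical descriptor for one Datagram of a PDU. */
struct dtgm_slice {
    uint32_t number;  /* Datagram Number, starting at zero for each PDU */
    size_t   offset;  /* offset of this slice within the PDU */
    size_t   length;  /* PDU octets carried in this Datagram */
    int      last;    /* value for the L bit: 1 on the final Datagram */
};

/* Number of Datagrams needed, or 0 if the PDU cannot be sent
 * (empty PDU, zero capacity, or more than 2^23 Datagrams). */
size_t l3dl_slice_count(size_t pdu_len, size_t max_payload)
{
    size_t count;

    if (pdu_len == 0 || max_payload == 0)
        return 0;
    count = pdu_len / max_payload + (pdu_len % max_payload != 0);
    return count > L3DL_MAX_DTGMS ? 0 : count;
}

/* Describe Datagram number idx of the PDU; returns 0 on success
 * and -1 if idx is out of range. */
int l3dl_slice(size_t pdu_len, size_t max_payload, uint32_t idx,
               struct dtgm_slice *out)
{
    size_t count = l3dl_slice_count(pdu_len, max_payload);

    if (idx >= count)
        return -1;
    out->number = idx;
    out->offset = (size_t)idx * max_payload;
    out->last   = (idx == count - 1);
    out->length = out->last ? pdu_len - out->offset : max_payload;
    return 0;
}
```

Because the slicing depends only on the PDU length and max_payload, calling it again for a retransmission yields the identical Datagram set, provided max_payload has not changed in the meantime.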
445 0 1 2 3 446 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 447 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 448 | Version | Transmission Sequence Number |L| Dtgm Number ~ 449 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 450 ~ Datagram Number (contd) | Datagram Length | 451 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 452 | Checksum | 453 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 454 | Payload... | 455 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 457 The fields of the L3DL Transport Header are as follows: 459 Version: Eight-bit Version number of the protocol, currently 0. 460 Values other than 0 MUST be treated as an error. The protocol 461 version needs to be in one and only one place, so it is in the 462 datagram as opposed to, for example, the PDU header. 464 Transmission Sequence Number: A 16-bit strictly increasing unsigned 465 integer identifying this PDU, possibly across retransmissions, 466 that wraps from 2^16-1 to 0. The initial value is arbitrary. See 467 [RFC1982] on DNS Serial Number Arithmetic for too much detail on 468 comparing and incrementing a wrapping sequence number. 470 L: A bit that is set to one if this Datagram is the last Datagram of 471 the PDU. For a PDU which fits in only one Datagram, it is set to 472 one. Note that this is the inverse of the marking technique used 473 by [RFC0791]. 475 Datagram Number: A monotonically increasing 23-bit value which 476 starts at zero for each PDU. This is used to reassemble frames 477 into PDUs a la [RFC0791] Section 2.3. Note that this limits an 478 L3DL PDU to 2^23 frames. 480 Datagram Length: Total number of octets in the Datagram including 481 all payloads and fields. Note that this limits a Datagram to 2^16-1 482 octets; though Ethernet framing is likely to impose a smaller 483 limit. 485 Checksum: A 32 bit hash over the Datagram to detect bit flips, see 486 Section 7. 
488 If a Datagram fails checksum verification, the Datagram is invalid 489 and SHOULD be silently discarded. The sender will retransmit the 490 PDU, and the receiver can assemble it. 492 Payload: The PDU being transported or a fragment thereof. 494 To avoid the need for a receiver to reassemble two PDUs at the same 495 time, a sender MUST NOT send a subsequent PDU when a PDU is already 496 in flight and not yet acknowledged; assuming it is an ACKed PDU Type. 498 7. The Checksum 500 There is a reason conservative folk use a checksum in UDP. And as 501 many operators stretch to jumbo frames (over 1,500 octets) longer 502 checksums are the prudent approach. 504 For the purpose of computing a checksum, the checksum field itself is 505 assumed to be zero. 507 The following code describes a suggested algorithm. This 508 specification deliberately avoids mandatory-to-implement algorithms, algorithm agility, etc. 509 What matters is that the same algorithm is used consistently in any 510 deployment. 512 Sum up 32-bit unsigned ints in a 64-bit long, then take the high- 513 order section, shift it right filling on the left with zeros, rotate, 514 add it in, repeat until the high order 32 bits are all zero.
#include <stddef.h>
#include <stdint.h>

/* The F table from Skipjack, and it would work for the S-Box. 
 */
static const uint8_t sbox[256] = {
  0xa3,0xd7,0x09,0x83,0xf8,0x48,0xf6,0xf4,0xb3,0x21,0x15,0x78,
  0x99,0xb1,0xaf,0xf9,0xe7,0x2d,0x4d,0x8a,0xce,0x4c,0xca,0x2e,
  0x52,0x95,0xd9,0x1e,0x4e,0x38,0x44,0x28,0x0a,0xdf,0x02,0xa0,
  0x17,0xf1,0x60,0x68,0x12,0xb7,0x7a,0xc3,0xe9,0xfa,0x3d,0x53,
  0x96,0x84,0x6b,0xba,0xf2,0x63,0x9a,0x19,0x7c,0xae,0xe5,0xf5,
  0xf7,0x16,0x6a,0xa2,0x39,0xb6,0x7b,0x0f,0xc1,0x93,0x81,0x1b,
  0xee,0xb4,0x1a,0xea,0xd0,0x91,0x2f,0xb8,0x55,0xb9,0xda,0x85,
  0x3f,0x41,0xbf,0xe0,0x5a,0x58,0x80,0x5f,0x66,0x0b,0xd8,0x90,
  0x35,0xd5,0xc0,0xa7,0x33,0x06,0x65,0x69,0x45,0x00,0x94,0x56,
  0x6d,0x98,0x9b,0x76,0x97,0xfc,0xb2,0xc2,0xb0,0xfe,0xdb,0x20,
  0xe1,0xeb,0xd6,0xe4,0xdd,0x47,0x4a,0x1d,0x42,0xed,0x9e,0x6e,
  0x49,0x3c,0xcd,0x43,0x27,0xd2,0x07,0xd4,0xde,0xc7,0x67,0x18,
  0x89,0xcb,0x30,0x1f,0x8d,0xc6,0x8f,0xaa,0xc8,0x74,0xdc,0xc9,
  0x5d,0x5c,0x31,0xa4,0x70,0x88,0x61,0x2c,0x9f,0x0d,0x2b,0x87,
  0x50,0x82,0x54,0x64,0x26,0x7d,0x03,0x40,0x34,0x4b,0x1c,0x73,
  0xd1,0xc4,0xfd,0x3b,0xcc,0xfb,0x7f,0xab,0xe6,0x3e,0x5b,0xa5,
  0xad,0x04,0x23,0x9c,0x14,0x51,0x22,0xf0,0x29,0x79,0x71,0x7e,
  0xff,0x8c,0x0e,0xe2,0x0c,0xef,0xbc,0x72,0x75,0x6f,0x37,0xa1,
  0xec,0xd3,0x8e,0x62,0x8b,0x86,0x10,0xe8,0x08,0x77,0x11,0xbe,
  0x92,0x4f,0x24,0xc5,0x32,0x36,0x9d,0xcf,0xf3,0xa6,0xbb,0xac,
  0x5e,0x6c,0xa9,0x13,0x57,0x25,0xb5,0xe3,0xbd,0xa8,0x3a,0x01,
  0x05,0x59,0x2a,0x46
};

/* non-normative example C code, constant time even */

uint32_t sbox_checksum_32(const uint8_t *b, const size_t n)
{
  uint32_t sum[4] = {0, 0, 0, 0};
  uint64_t result = 0;
  /* Substitute each octet through the S-box and accumulate it into
     one of four lanes, selected by the octet's position modulo 4. */
  for (size_t i = 0; i < n; i++)
    sum[i & 3] += sbox[*b++];
  /* Combine the four lanes, shifting left by one octet per lane. */
  for (size_t i = 0; i < sizeof(sum)/sizeof(*sum); i++)
    result = (result << 8) + sum[i];
  /* Fold the high-order 32 bits into the low-order 32 bits; the
     second fold absorbs any carry produced by the first. */
  result = (result >> 32) + (result & 0xFFFFFFFFU);
  result = (result >> 32) + (result & 0xFFFFFFFFU);
  return (uint32_t) result;
}
562 8. 
TLV PDUs 564 The basic L3DL application layer PDU is a typical TLV (Type Length 565 Value) PDU. It includes a signature to provide optional integrity 566 and authentication. It may be broken into multiple Datagrams, see 567 Section 6. 569 0 1 2 3 570 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 571 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 572 | PDU Type | Payload Length ~ 573 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 574 ~ | Payload ... | 575 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 576 | Sig Type | Signature Length | ~ 577 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 578 ~ Signature ~ 579 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 581 The fields of the basic L3DL header are as follows: 583 PDU Type: An integer differentiating PDU payload types. See 584 Section 22.1. 586 Payload Length: Total number of octets in the Payload field. 588 Payload: The application layer content of the L3DL PDU. 590 Sig Type: The type of the Signature, see Section 22.2. Type 0, a 591 null signature, is defined in this document. 593 Sig Type 0 indicates a null Signature. For a trivial PDU such as 594 KEEPALIVE, the underlying Datagram checksum may be sufficient for 595 integrity, though it lacks authenticity. 597 Other Sig Types may be defined in other documents, cf. 598 [I-D.ymbk-lsvr-l3dl-signing]. 600 Signature Length: The length of the Signature, possibly including 601 padding, in octets. If Sig Type is 0, Signature Length MUST BE 0. 603 Signature: The result of running the signature algorithm specified 604 in Sig Type over all octets of the PDU except for the Signature 605 itself. 607 9. Logical Link Endpoint Identifier 609 L3DL discovers neighbors on logical links and establishes sessions 610 between the two ends of all consenting discovered logical links. A 611 logical link is described by a pair of Logical Link Endpoint 612 Identifiers, LLEIs. 
614 An LLEI is a variable length descriptor which could be an ASN, a 615 classic RouterID, a catenation of the two, an eight octet ISO System 616 Identifier [RFC1629], or any other identifier unique to a single 617 logical link endpoint in the topology. 619 An L3DL deployment will choose and define an LLEI which suits its 620 needs, simple or complex. Examples of two extremes follow: 622 A simplistic view of a link between two devices is two ports, 623 identified by unique MAC addresses, carrying a layer-3 protocol 624 conversation. In this case, the MAC addresses might suffice for the 625 LLEIs. 627 Unfortunately, things can get more complex. Multiple VLANs can run 628 between those two MAC addresses. In practice, many real devices use 629 the same MAC address on multiple ports and/or sub-interfaces. 631 Therefore, in the general circumstance, a fully described LLEI might 632 be as follows: 634 0 1 2 3 635 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 636 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 637 | | 638 + System Identifier + 639 | | 640 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 641 | ifIndex | 642 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 644 System Identifier, a la [RFC1629], is an eight octet identifier 645 unique in the entire operational space. Routers and switches usually 646 have internal MAC Addresses which can be padded with high order zeros 647 and used if no System ID exists on the device. If no unique 648 identifier is burned into a device, the local L3DL configuration 649 SHOULD create and assign a unique one, likely by configuration. 651 ifIndex is the SNMP identifier of the (sub-)interface, see [RFC1213]. 652 This uniquely identifies the port. 654 For a layer-3 tagged sub-interface or a VLAN/SVI interface, IfIndex 655 is that of the logical sub-interface, so no further disambiguation is 656 needed. 
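The fully described LLEI above can be serialized as in the following non-normative C sketch. It assumes the twelve-octet layout of the diagram (eight-octet System Identifier followed by a 32-bit ifIndex), network (big-endian) octet order, and a System Identifier built from a 48-bit MAC address padded with high order zeros. The names LLEI_LEN, llei_sysid_from_mac, and llei_encode are invented for this example and are not part of the protocol.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the fully described LLEI: 8-octet System
 * Identifier plus 4-octet ifIndex. */
#define LLEI_LEN 12

/* Build a System Identifier from a 48-bit MAC address by padding
 * it with two high-order zero octets. */
void llei_sysid_from_mac(const uint8_t mac[6], uint8_t sysid[8])
{
    sysid[0] = 0;
    sysid[1] = 0;
    for (size_t i = 0; i < 6; i++)
        sysid[2 + i] = mac[i];
}

/* Serialize System Identifier and ifIndex, most significant
 * octet first. */
void llei_encode(const uint8_t sysid[8], uint32_t ifindex,
                 uint8_t out[LLEI_LEN])
{
    for (size_t i = 0; i < 8; i++)
        out[i] = sysid[i];
    out[8]  = (uint8_t)(ifindex >> 24);
    out[9]  = (uint8_t)(ifindex >> 16);
    out[10] = (uint8_t)(ifindex >> 8);
    out[11] = (uint8_t)ifindex;
}
```

A deployment that chooses a different LLEI, for example an ASN catenated with a RouterID, would need a different encoder; only the choice of a value unique to each logical link endpoint matters.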
658 L3DL PDUs learned over VLAN-ports may be interpreted by upper layer-3 659 routing protocols as being learned on the corresponding layer-3 SVI 660 interface for the VLAN. 662 LLEIs are big-endian. 664 10. HELLO 666 The HELLO PDU is unique in that it is encapsulated in a multicast 667 Ethernet frame. It solicits response(s) from other LLEI(s) on the 668 link. See Section 18.1 for why multicast is used. The destination 669 multicast MAC Address to be used MUST be one of the following, see 670 Clause 9.2.2 of [IEEE802-2014]: 672 01-80-C2-00-00-0E: Nearest Bridge = Propagation constrained to a 673 single physical link; stopped by all types of bridges (including 674 MPRs (media converters)). This SHOULD be used when the link is 675 known to be a simple point to point link. 677 To Be Assigned: When a switch receives a frame with a multicast 678 destination MAC it does not recognize, it forwards to all ports. 679 This destination MAC SHOULD be sent when the interface is known to 680 be connected to a switch. See Section 23. This SHOULD be used 681 when the link may be a multi-point link. 683 All other L3DL PDUs are encapsulated in unicast frames, as the peer's 684 destination MAC address is known after the HELLO exchange. 686 When an interface is turned up on a device, it SHOULD issue a HELLO 687 if it is to participate in L3DL sessions. 689 If a constrained Nearest Bridge destination address has been 690 configured for a point-to-point interface, see above, then the HELLO 691 SHOULD NOT be repeated once a session has been created by an exchange 692 of OPENs. 694 If the configured destination address is one that is propagated by 695 switches, the HELLO SHOULD be repeated at a configured interval, with 696 a default of 60 seconds. This allows discovery by new devices which 697 come up on the layer-2 mesh. 
In this multi-link scenario, the 698 operator should be aware of the trade-off between timer tuning and 699 network noise and adjust the inter-HELLO timer accordingly. 701 0 1 2 3 702 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 703 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 704 | PDU Type = 0 | Payload Length = 0 ~ 705 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 706 ~ | Sig Type = 0 | Signature Length = 0 | 707 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 709 If more than one device responds, one adjacency is formed for each 710 unique source LLEI response. L3DL treats each adjacency as a 711 separate logical link. 713 When a HELLO is received from a source MAC address (plus VID if VLAN) 714 with which there is no established L3DL session, the receiver SHOULD 715 respond by sending an OPEN PDU to the source MAC address (plus VID). 716 The two devices establish an L3DL session by exchanging OPEN PDUs. 718 To ameliorate possible load spikes during bootstrap or event 719 recovery, there SHOULD be a jittered delay between receipt of a HELLO 720 and issue of the OPEN. The default delay range SHOULD be zero to 721 five seconds, and MUST be configurable. 723 If a HELLO is received from a MAC address with which there is an 724 established session, the HELLO should be dropped. 726 The Payload Length is zero as there is no payload. 728 HELLO PDUs cannot be signed as keying material has yet to be 729 exchanged. Hence the signature MUST always be the null type. 731 11. OPEN 733 Each device has learned the other's MAC Address from the HELLO 734 exchange, see Section 10. Therefore the OPEN and all subsequent PDUs 735 MUST be unicast, as opposed to the HELLO's multicast frame. 
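The step from a received HELLO to a unicast OPEN, as specified in Section 10, might be sketched as follows. This is a non-normative C illustration only: the function and constant names are invented for the example, the caller is assumed to supply a uniformly random 32-bit value, and the five second default comes from the text of Section 10.

```c
#include <stdint.h>

/* Default upper bound of the jittered delay, in milliseconds
 * (Section 10 gives a default range of zero to five seconds). */
#define L3DL_OPEN_JITTER_MAX_MS 5000u

/* Hypothetical outcome of processing a received HELLO. */
enum hello_action { HELLO_DROP, HELLO_SEND_OPEN };

/* A HELLO from a peer with an established session is dropped;
 * otherwise a unicast OPEN is sent to the HELLO's source MAC. */
enum hello_action l3dl_on_hello(int session_established)
{
    return session_established ? HELLO_DROP : HELLO_SEND_OPEN;
}

/* Map a uniformly random 32-bit value rnd onto a delay in the
 * range 0..max_ms inclusive, without using the modulo operator. */
uint32_t l3dl_open_delay_ms(uint32_t rnd, uint32_t max_ms)
{
    return (uint32_t)(((uint64_t)rnd * ((uint64_t)max_ms + 1)) >> 32);
}
```

The upper bound is a parameter rather than a fixed constant because the delay range is required to be configurable.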
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 1  |                Payload Length                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                     Nonce                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |  LLEI Length  |            My LLEI            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |   AttrCount   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Attribute List ...       |   Auth Type   |  Key Length   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                    Key ...                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Serial Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sig Type    |       Signature Length        | Signature ... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The Payload Length is the number of octets in all fields of the PDU
   from the Nonce through the Serial Number, not including the three
   final signature fields.

   The Nonce enables detection of a duplicate OPEN PDU.  It SHOULD be
   either a random number or a high resolution timestamp.  It is needed
   to prevent session closure due to a repeated OPEN caused by a race or
   a dropped or delayed ACK.

   My LLEI is the sender's LLEI, see Section 9.

   AttrCount is the number of attributes in the Attribute List.
   Attributes are single octets the semantics of which are
   operator-defined.

   A node may have zero or more operator-defined attributes, e.g.:
   spine, leaf, backbone, route reflector, arabica, ...

   Attribute syntax and semantics are local to an operator or
   datacenter; hence there is no global registry.  Nodes exchange their
   attributes only in the OPEN PDU.

   Auth Type is the Signature algorithm suite, see Section 8.
   Key Length is a 16-bit field denoting the length in octets of the Key
   itself, not including the Auth Type or the Key Length.  If the Auth
   Type is zero, then the Key Length MUST also be zero, and there MUST
   be no Key data.

   The Key is specific to the operational environment.  A failure to
   authenticate is a failure to start the L3DL session; an ERROR PDU
   MUST be sent (Error Code 3), and HELLOs MUST be restarted.

   Although delay and jitter in responding with an OPEN were specified
   above, beware of load created by long strings of authentication
   failures and retries.  A configurable failure count limit (default 8)
   SHOULD result in giving up on the connection attempt.

   The Serial Number is a monotonically increasing 32-bit value
   representing the sender's state at the time of sending the last PDU.
   It may be an integer, a timestamp, etc.  If incrementing the Serial
   Number would cause it to be zero, it should be incremented again.

   On session restart (new OPEN), a receiver MAY send the last received
   Serial Number to tell the sender to only send data with a Serial
   Number greater (in the [RFC1982] sense), or send a Serial Number of
   zero to request all data.

   The Serial Number supports session resumption in anticipation of
   peers having a very large amount of state they would prefer not to
   re-exchange because of some glitch.  The Serial Number is not
   expected to wrap for a considerable time, e.g. days or weeks.  But to
   address the rare case it does, [RFC1982] on DNS Serial Number
   Arithmetic should be used as it is in the Transmission Sequence
   Number.

   This allows a sender of an OPEN to tell the receiver that the sender
   would like to resume a session and that the receiver only needs to
   send data starting with the PDU with the lowest Serial Number greater
   (in the [RFC1982] sense) than the one sent in the OPEN.  If the
   sender is not trying to resume a dropped session, the Serial Number
   MUST be zero.

   If the receiver of an OPEN PDU with a non-zero Serial Number cannot
   resume from the requested point, it should return an ACK with an
   EType of 2 (session should not be continued; see Section 12).  The
   sender of the failing OPEN PDU SHOULD then send an OPEN PDU with a
   Serial Number of zero.

   The Signature fields are described in Section 8 and in an asymmetric
   key environment serve as a proof of possession of the signing auth
   data by the sender.

   Once two logical link endpoints know each other, and have ACKed each
   other's OPEN PDUs, Layer-2 KEEPALIVEs (see Section 15) MAY be started
   to ensure Layer-2 liveness and keep the session semantics alive.  The
   timing and acceptable drop of KEEPALIVE PDUs are discussed in
   Section 15.

   If the sender of an OPEN does not receive an ACK of the OPEN PDU,
   then it MUST resend the same OPEN PDU, with the same Nonce.
   Resending an unacknowledged OPEN PDU, like other ACKed PDUs, SHOULD
   use exponential back-off, see [RFC1122].

   If a properly authenticated OPEN arrives at L3DL speaker A with a new
   Nonce from an LLEI, speaker B, with which A believes it already has
   an L3DL session (OPENs have already been exchanged), and the Serial
   Number in the OPEN PDU is non-zero, speaker A SHOULD establish a new
   sending session by sending an OPEN with the Serial Number being the
   same as that of A's last sent and ACKed PDU.  A MUST resume sending
   encapsulations etc. subsequent to the requested Serial Number.  And
   B MUST retain all previously discovered encapsulation and other data
   received from A.
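   The Serial Number rules above (skip zero on increment, compare in the
   [RFC1982] sense, zero requests all data) can be sketched in Python as
   follows.  This is a non-normative illustration; the function names
   and the list-of-pairs representation of sent PDUs are assumptions of
   the sketch, not part of the protocol.

```python
from typing import List, Tuple

SERIAL_BITS = 32
MODULUS = 1 << SERIAL_BITS          # 2**32
HALF = 1 << (SERIAL_BITS - 1)       # 2**31


def next_serial(serial: int) -> int:
    """Increment a Serial Number, skipping zero on wrap."""
    nxt = (serial + 1) % MODULUS
    return nxt if nxt != 0 else 1


def serial_gt(a: int, b: int) -> bool:
    """True if a is greater than b in the RFC 1982 sense.

    When the two values are exactly 2**31 apart, RFC 1982 leaves the
    comparison undefined; this sketch returns False in that case.
    """
    return a != b and ((a - b) % MODULUS) < HALF


def pdus_to_resend(sent: List[Tuple[int, str]], requested: int) -> List[str]:
    """Select PDUs to send after an OPEN carrying Serial Number `requested`.

    `sent` is a hypothetical log of (serial, pdu) pairs in sending order.
    A requested Serial Number of zero asks for all data.
    """
    if requested == 0:
        return [pdu for _, pdu in sent]
    return [pdu for serial, pdu in sent if serial_gt(serial, requested)]
```

   Comparing with serial_gt() rather than a plain integer comparison
   keeps resumption correct across a wrap of the 32-bit value.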
   If a properly authenticated OPEN arrives with a new Nonce from an
   LLEI with which the receiving logical link endpoint believes it
   already has an L3DL session (OPENs have already been exchanged), and
   the Serial Number in the OPEN is zero, then the receiver MUST assume
   that the sending LLEI or entire device has been reset.  All
   previously discovered encapsulation data MUST NOT be kept and MUST be
   withdrawn via the BGP-LS API, and the recipient MUST respond with a
   new OPEN.

12.  ACK

   The ACK PDU acknowledges receipt of a PDU and reports any error
   condition which might have been raised.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 3  |              Payload Length = 5               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |   ACKed PDU   | EType |      Error Code       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Error Hint           |   Sig Type    |Signature Leng.~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                 Signature ...                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The ACK acknowledges receipt of an OPEN, Encapsulation, VENDOR PDU,
   etc.

   The ACKed PDU is the PDU Type of the PDU being acknowledged, e.g.,
   OPEN, one of the Encapsulations, etc.

   If there was an error processing the received PDU, then the EType is
   non-zero.  If the EType is zero, Error Code and Error Hint MUST also
   be zero.

   A non-zero EType is the receiver's way of telling the PDU's sender
   that the receiver had problems processing the PDU.  The Error Code
   and Error Hint will tell the sender more detail about the error.
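   As a non-normative illustration, the five-octet ACK payload can be
   packed and unpacked as in the Python sketch below.  The split of the
   second and third octets into a 4-bit EType and a 12-bit Error Code is
   read off the ACK diagram and is an assumption of this sketch, as are
   the function names.

```python
import struct
from typing import Tuple

# Assumed layout: ACKed PDU (1 octet), EType (4 bits) + Error Code (12 bits),
# Error Hint (2 octets); five octets in network byte order, no padding.
ACK_PAYLOAD_FORMAT = "!BHH"


def pack_ack_payload(acked_pdu: int, etype: int = 0,
                     error_code: int = 0, error_hint: int = 0) -> bytes:
    """Encode the ACK payload, enforcing the EType == 0 rule."""
    if not 0 <= etype <= 0xF:
        raise ValueError("EType must fit in 4 bits")
    if not 0 <= error_code <= 0xFFF:
        raise ValueError("Error Code must fit in 12 bits")
    if etype == 0 and (error_code != 0 or error_hint != 0):
        raise ValueError("EType 0 requires zero Error Code and Error Hint")
    return struct.pack(ACK_PAYLOAD_FORMAT, acked_pdu,
                       (etype << 12) | error_code, error_hint)


def unpack_ack_payload(payload: bytes) -> Tuple[int, int, int, int]:
    """Decode the payload into (acked_pdu, etype, error_code, error_hint)."""
    acked_pdu, etype_code, error_hint = struct.unpack(ACK_PAYLOAD_FORMAT,
                                                      payload)
    return acked_pdu, etype_code >> 12, etype_code & 0x0FFF, error_hint
```

   The packed length of five octets matches the Payload Length shown in
   the ACK diagram.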
   The decimal value of EType gives a strong hint how the receiver
   sending the ACK believes things should proceed:

   0 -  No Error, Error Code and Error Hint MUST be zero

   1 -  Warning, something not too serious happened, continue

   2 -  Session should not be continued, try to restart

   3 -  Restart is hopeless, call the operator

   4-15 -  Reserved

   The Error Codes, noting protocol failures, are listed in
   Section 22.4.  Someone stuck in the 1990s might see the
   concatenation of EType and Error Code as an echo of 0x1zzz, 0x2zzz,
   etc.  They might be right; or not.

   The Error Hint, an arbitrary 16 bits, is any additional data the
   sender of the error PDU thinks will help the recipient or the
   debugger with the particular error.

   The Signature fields are described in Section 8.

12.1.  Retransmission

   If a PDU sender expects an ACK, e.g. for an OPEN, an Encapsulation, a
   VENDOR PDU, etc., and does not receive the ACK for a configurable
   time (default one second), and the interface is live at layer-2, the
   sender resends the PDU using exponential back-off, see [RFC1122].
   This cycle MAY be repeated a configurable number of times (default
   three) before it is considered a failure.  The session MAY be
   considered closed in the case of such an ACK failure.

   If the link is broken at layer-2, retransmission MAY be retried when
   the link is restored.

13.  The Encapsulations

   Once the devices know each other's LLEIs, know each other's upper
   layer (L2.5 and L3) identities, have means to ensure link state,
   etc., the L3DL session is considered established, and the devices
   SHOULD exchange L3 interface encapsulations, L3 addresses, and L2.5
   labels.

   The Encapsulation types the peers exchange may be IPv4
   (Section 13.3), IPv6 (Section 13.4), MPLS IPv4 (Section 13.6), MPLS
   IPv6 (Section 13.7), and/or possibly others not defined here.
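   The retransmission behavior of Section 12.1 can be sketched in Python
   as follows.  This is a non-normative illustration: the doubling
   factor, the reading of "default three" as three transmissions in
   total, and the send and wait_for_ack callables are all assumptions of
   the sketch.

```python
from typing import Callable, Iterator


def backoff_intervals(initial: float = 1.0, attempts: int = 3,
                      factor: float = 2.0) -> Iterator[float]:
    """Yield the ACK wait time for each transmission attempt."""
    interval = initial
    for _ in range(attempts):
        yield interval
        interval *= factor


def send_with_retransmit(send: Callable[[], None],
                         wait_for_ack: Callable[[float], bool],
                         initial: float = 1.0, attempts: int = 3) -> bool:
    """Send a PDU and resend with exponential back-off until it is ACKed.

    `wait_for_ack(timeout)` is a hypothetical callable that blocks for up
    to `timeout` seconds and returns True if the ACK arrived.  Returns
    False when every attempt times out, which the caller may treat as a
    session failure.
    """
    for timeout in backoff_intervals(initial, attempts):
        send()
        if wait_for_ack(timeout):
            return True
    return False
```

   With the defaults, the sender waits one, two, and then four seconds
   before declaring the exchange failed.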
   The sender of an Encapsulation PDU MUST NOT assume that the peer is
   capable of the same Encapsulation Type.  An ACK (Section 12) merely
   acknowledges receipt.  Only if both peers have sent the same
   Encapsulation Type is it safe for Layer-3 protocols to assume that
   they are compatible for that type.

   A receiver of an encapsulation might recognize an addressing
   conflict, such as both ends of the link trying to use the same
   address.  In this case, the receiver SHOULD respond with an error
   (Error Code 2) ACK.  As there may be other usable addresses or
   encapsulations, the receiver might log this error and continue,
   letting an upper layer topology builder deal with what works.

   Further, to consider a logical link of a type to formally be
   established so that it may be pushed up to upper layer protocols, the
   addressing for the type must be compatible, e.g. on the same IP
   subnet.

13.1.  The Encapsulation PDU Skeleton

   The header for all encapsulation PDUs is as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   PDU Type    |                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                     Count                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Serial Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Encapsulation List...             |   Sig Type    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Signature Length        |         Signature ...         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   An Encapsulation PDU describes zero or more addresses of the
   encapsulation type.

   The 24-bit Count is the number of Encapsulations in the Encapsulation
   list.

   The Serial Number is a monotonically increasing 32-bit value
   representing the sender's state in time.  It may be an integer, a
   timestamp, etc.  On session restart (new OPEN), a receiver MAY send
   the last received Serial Number to tell the sender to only send
   newer data.

   If a sender has multiple links on the same interface, separate state
   (data, ACKs, etc.) must be kept for each peer session.

   Over time, multiple Encapsulation PDUs may be sent for an interface
   as configuration changes.

   If the length of an Encapsulation PDU exceeds the Datagram size limit
   on media, the PDU is broken into multiple Datagrams.  See Section 8.

   The Signature fields are described in Section 8.

   The receiver MUST acknowledge the Encapsulation PDU with an ACK PDU
   (Type = 3, see Section 12) whose ACKed PDU field is the PDU Type of
   the Encapsulation being acknowledged.

   If the sender does not receive an ACK in a configurable interval
   (default one second), and the interface is live at layer-2, it
   SHOULD retransmit.  After a user configurable number of failures
   (default three), the L3DL session should be considered dead and the
   OPEN process SHOULD be restarted.

   If the link is broken at layer-2, retransmission MAY be retried if
   data have not changed in the interim.

13.2.  Encapsulation Flags

   The Encapsulation Flags are a sequence of bit fields as follows:

         0            1            2            3         4 ... 7
   +------------+------------+------------+------------+------------+
   |  Ann/With  |  Primary   | Under/Over |  Loopback  | Reserved ..|
   +------------+------------+------------+------------+------------+

   Each encapsulation in an Encapsulation PDU of Type T may announce new
   and/or withdraw old encapsulations of Type T.  It indicates this with
   the Ann/With Encapsulation Flag, Announce == 1, Withdraw == 0.

   Each Encapsulation interface address in an Encapsulation PDU either
   announces a new encapsulation (Ann/With == 1) (yes, a la BGP) or
   requests that one be withdrawn (Ann/With == 0).  Adding an
   encapsulation which already exists SHOULD raise an Announce/Withdraw
   Error (see Section 22.4); the EType SHOULD be 2, suggesting a session
   restart (see Section 12) so that all encapsulations will be resent.

   If an LLEI has multiple addresses for an encapsulation type, one and
   only one address MAY be marked as primary (Primary Flag == 1) for
   that Encapsulation Type.

   An Encapsulation interface address in an Encapsulation PDU MAY be
   marked as a loopback, in which case the Loopback bit is set.
   Loopback addresses are generally not seen directly on an external
   interface.  One or more loopback addresses MAY be exposed by
   configuration on one or more L3DL speaking external interfaces, e.g.
   for iBGP peering.  They SHOULD be marked as such, Loopback Flag == 1.

   Each Encapsulation interface address in an Encapsulation PDU is that
   of the direct 'underlay' interface (Under/Over == 1), or an 'overlay'
   address (Under/Over == 0), likely that of a VM or container guest
   bridged or configured on to the interface already having an underlay
   address.

13.3.  IPv4 Encapsulation

   The IPv4 Encapsulation describes a device's ability to exchange IPv4
   packets on one or more subnets.  It does so by stating the
   interface's addresses and the corresponding prefix lengths.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 4  |                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                     Count                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Serial Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Encaps Flags  |                 IPv4 Address                  ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |   PrefixLen   |   more ...    |   Sig Type    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Signature Length        |         Signature ...         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 24-bit Count is the sum of the number of IPv4 Encapsulations
   being announced and/or withdrawn.

13.4.  IPv6 Encapsulation

   The IPv6 Encapsulation describes a logical link's ability to exchange
   IPv6 packets on one or more subnets.  It does so by stating the
   interface's addresses and the corresponding prefix lengths.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 5  |                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                     Count                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Serial Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Encaps Flags  |                                               |
   +-+-+-+-+-+-+-+-+                                               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                         IPv6 Address                          |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |   PrefixLen   |   more ...    |   Sig Type    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Signature Length        |         Signature ...         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 24-bit Count is the sum of the number of IPv6 Encapsulations
   being announced and/or withdrawn.

13.5.  MPLS Label List

   As an MPLS enabled interface may have a label stack, see [RFC3032], a
   variable length list of labels is needed.  These are the labels the
   sender will accept for the prefix to which the list is attached.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Label Count  |                 Label                 | Exp |S|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Label                 | Exp |S|   more ...    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   A Label Count of zero is an implicit withdraw of all labels for that
   prefix on that interface.

13.6.  MPLS IPv4 Encapsulation

   The MPLS IPv4 Encapsulation describes a logical link's ability to
   exchange labeled IPv4 packets on one or more subnets.  It does so by
   stating the interface's addresses, the corresponding prefix lengths,
   and the corresponding labels which will be accepted for each address.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 6  |                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                     Count                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Serial Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Encaps Flags  |      MPLS Label List ...      |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                 IPv4 Address                  |   PrefixLen   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   more ...    |   Sig Type    |       Signature Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Signature                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 24-bit Count is the sum of the number of MPLS IPv4
   Encapsulations being announced and/or withdrawn.

13.7.  MPLS IPv6 Encapsulation

   The MPLS IPv6 Encapsulation describes a logical link's ability to
   exchange labeled IPv6 packets on one or more subnets.  It does so by
   stating the interface's addresses, the corresponding prefix lengths,
   and the corresponding labels which will be accepted for each address.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 7  |                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                     Count                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Serial Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Encaps Flags  |      MPLS Label List ...      |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                         IPv6 Address                          |
   +                                               +-+-+-+-+-+-+-+-+
   |                                               |  Prefix Len   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   more ...    |   Sig Type    |       Signature Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Signature ...                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 24-bit Count is the sum of the number of MPLS IPv6
   Encapsulations being announced and/or withdrawn.

14.  VENDOR - Vendor Extensions

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 255|                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                 Serial Number                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |               Enterprise Number               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Ent Type    |              Enterprise Data ...              ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |   Sig Type    |       Signature Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Signature ...                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Vendors or enterprises may define TLVs beyond the scope of L3DL
   standards.  This is done using a Private Enterprise Number [IANA-PEN]
   followed by Enterprise Data in a format defined for that Enterprise
   Number and Ent Type.

   Ent Type allows a VENDOR PDU to be sub-typed in the event that the
   vendor/enterprise needs multiple PDU types.

   As with Encapsulation PDUs, a receiver of a VENDOR PDU MUST respond
   with an ACK or an ERROR PDU.  Similarly, a VENDOR PDU MUST only be
   sent over an open session.

15.  KEEPALIVE - Layer-2 Liveness

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 2  |              Payload Length = 0               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               | Sig Type = 0  |     Signature Length = 0      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   L3DL devices SHOULD beacon frequent Layer-2 KEEPALIVE PDUs to ensure
   session continuity.  The inter-KEEPALIVE interval is configurable,
   with a default of ten seconds.  A receiver may choose to ignore
   KEEPALIVE PDUs.

   An operational deployment MUST be configured whether or not to use
   KEEPALIVEs, either globally or at a granularity as fine as per link.
   Disagreement MAY result in repeated session failure and
   reestablishment.

   KEEPALIVEs SHOULD be beaconed at a configured frequency.  One per
   second is the default.  Layer-3 liveness, such as BFD, may be more
   (or less) aggressive.

   When a sender transmits a PDU which is not a KEEPALIVE, the sender
   SHOULD reset the KEEPALIVE timer.  I.e. sending any PDU acts as a
   keepalive.  Once the last fragment has been sent, the KEEPALIVE timer
   SHOULD be restarted.  Do not wait for the ACK.
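   The send-side KEEPALIVE timer described above can be sketched in
   Python as follows.  This is a non-normative illustration; the class
   and method names are assumptions of the sketch, and the interval is
   left to the caller's configuration rather than defaulted.

```python
class KeepaliveTimer:
    """Tracks when the next KEEPALIVE is due on one session.

    Times are plain floats in seconds (e.g. from time.monotonic());
    passing them in explicitly keeps the sketch deterministic.
    """

    def __init__(self, interval: float, now: float) -> None:
        self.interval = interval
        self.deadline = now + interval

    def pdu_sent(self, now: float) -> None:
        """Restart the timer when any PDU (or its last fragment) is sent.

        The timer is restarted on transmission, not on receipt of the ACK.
        """
        self.deadline = now + self.interval

    def due(self, now: float) -> bool:
        """True if a KEEPALIVE should be sent now."""
        return now >= self.deadline
```

   A sender's main loop would call pdu_sent() after every transmission,
   including KEEPALIVEs themselves, and send a KEEPALIVE whenever due()
   returns True.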
   If a KEEPALIVE or other PDUs have not been received from a peer with
   which a receiver has an open session for a configurable time (default
   30 seconds), the link SHOULD be presumed down.  The devices MAY keep
   configuration state and restore it without retransmission if no data
   have changed.  Otherwise, a new session SHOULD be established and new
   Encapsulation PDUs exchanged.

16.  Layers-2.5 and 3 Liveness

   Layer-2 liveness may be continuously tested by KEEPALIVE PDUs, see
   Section 15.  As layer-2.5 or layer-3 connectivity could still break,
   liveness above layer-2 MAY be frequently tested using BFD ([RFC5880])
   or a similar technique.

   This protocol assumes that one or more Encapsulation addresses may be
   used to ping, run BFD, or whatever the operator configures.

17.  The North/South Protocol

   Thus far, a one-hop point-to-point logical link discovery protocol
   has been defined.

   The devices know their unique LLEIs and know the unique peer LLEIs
   and Encapsulations on each logical link interface.

   Full topology discovery is not appropriate at the L3DL layer, so
   Dijkstra a la IS-IS etc. is assumed to be done by higher level
   protocols such as BGP-SPF.

   Therefore the LLEIs, link Encapsulations, and state changes are
   pushed North via a small subset of the BGP-LS API.  The upper layer
   routing protocol(s), e.g. BGP-SPF, learn and maintain the topology,
   run Dijkstra, and build the routing database(s).

   For example, if a neighbor's IPv4 Encapsulation address changes, the
   devices seeing the change push that change Northbound.

17.1.  Use BGP-LS as Much as Possible

   BGP-LS [RFC7752] defines BGP-like Datagrams describing logical link
   state (links, nodes, link prefixes, and many other things), and a new
   BGP path attribute providing Northbound transport, all of which can
   be ingested by upper layer protocols such as BGP-SPF; see Section 4
   of [I-D.ietf-lsvr-bgp-spf].

   For IPv4 links, TLVs 259 and 260 are used.  For IPv6 links, TLVs 261
   and 262 are used.  If there are multiple addresses on a link,
   multiple TLV pairs are pushed North, having the same ID pairs.

17.2.  Extensions to BGP-LS

   The Northbound protocol needs a few minor extensions to BGP-LS.
   Luckily, others have needed the same extensions.

   Similarly to BGP-SPF, the BGP protocol is used in the Protocol-ID
   field specified in table 1 of
   [I-D.ietf-idr-bgpls-segment-routing-epe].  The local and remote node
   descriptors for all NLRI are the IDs described in Section 11.  This
   is equivalent to an adjacency SID or a node SID if the address is a
   loopback address.

   Label Sub-TLVs from [I-D.ietf-idr-bgp-ls-segment-routing-ext]
   Section 2.1.1, are used to associate one or more MPLS Labels with a
   link.

18.  Discussion

   This section explores some trade-offs taken and some considerations.

18.1.  HELLO Discussion

   A device with multiple Layer-2 interfaces, traditionally called a
   switch, may be used to forward frames and therefore packets from
   multiple devices to one logical interface (LLEI), I, on an L3DL
   speaking device.  Interface I could discover a peer J across the
   switch.  Later, a prospective peer K could come up across the switch.
   If I was not still sending and listening for HELLOs, the potential
   peering with K could not be discovered.  Therefore, on multi-link
   interfaces, L3DL MUST continue to send HELLOs as long as they are
   turned up.

18.2.  HELLO versus KEEPALIVE

   Both HELLO and KEEPALIVE are periodic.  KEEPALIVE might be eliminated
   in favor of keeping only HELLOs.  But KEEPALIVEs are unicast, and
   thus less noisy on the network, especially if HELLO is configured to
   transit layer-2-only switches, see Section 18.1.

19.  VLANs/SVIs/Sub-interfaces

   One can think of the protocol as an instance (i.e. state machine)
   which runs on each logical link of a device.

   As the upper routing layer must view VLAN topologies as separate
   graphs, L3DL treats VLAN ports as separate links.

   L3DL PDUs learned over VLAN-ports may be interpreted by upper layer-3
   routing protocols as being learned on the corresponding layer-3 SVI
   interface for the VLAN.

   As Sub-Interfaces each have their own LLEIs, they act as separate
   interfaces, forming their own links.

20.  Implementation Considerations

   An implementation SHOULD provide the ability to configure each
   logical interface as L3DL speaking or not.

   An implementation SHOULD provide the ability to configure whether
   HELLOs on an L3DL enabled interface send Nearest Bridge or the MAC
   which is propagated by switches from that interface; see Section 10.

   An implementation SHOULD provide the ability to distribute one or
   more loopback addresses or interfaces into L3DL on an external L3DL
   speaking interface.

   An implementation SHOULD provide the ability to distribute one or
   more overlay and/or underlay addresses or interfaces into L3DL on an
   external L3DL speaking interface.

   An implementation SHOULD provide the ability to configure one of the
   addresses of an encapsulation as primary on an L3DL speaking
   interface.  If there is only one address for a particular
   encapsulation, the implementation MAY mark it as primary by default.

   An implementation MAY allow optional configuration which updates the
   local forwarding table with overlay and underlay data both learned
   from L3DL peers and configured locally.

21.  Security Considerations

   The protocol as is MUST NOT be used outside a datacenter or similarly
   closed environment without authentication and authorization
   mechanisms such as [I-D.ymbk-lsvr-l3dl-signing].

   Many MDC operators have a strange belief that physical walls and
   firewalls provide sufficient security.  This is not credible.  All
   MDC protocols need to be examined for exposure and attack surface.
   In the case of L3DL, Authentication and Integrity as provided in
   [I-D.ymbk-lsvr-l3dl-signing] is strongly recommended.

   It is generally unwise to assume that on the wire Layer-2 is secure.
   Strange/unauthorized devices may plug into a port.  Mis-wiring is
   very common in datacenter installations.  A poisoned laptop might be
   plugged into a device's port, form malicious sessions, etc. to
   divert, intercept, or drop traffic.

   Similarly, malicious nodes/devices could mis-announce addressing.

   If OPENs are not being authenticated, an attacker could forge an OPEN
   for an existing session and cause the session to be reset.

   For these reasons, the OPEN PDU's authentication data exchange SHOULD
   be used.

   If the KEEPALIVE PDU is not signed (as suggested in Section 8) to
   save computation, then a MITM could fake a session being alive.

22.  IANA Considerations

22.1.  PDU Types

   This document requests the IANA create a registry for L3DL PDU Type,
   which may range from 0 to 255.  The name of the registry should be
   L3DL-PDU-Type.  The policy for adding to the registry is RFC Required
   per [RFC5226], either standards track or experimental.  The initial
   entries should be the following:

           PDU
           Code      PDU Name
           ----      -------------------
           0         HELLO
           1         OPEN
           2         KEEPALIVE
           3         ACK
           4         IPv4 Announcement
           5         IPv6 Announcement
           6         MPLS IPv4 Announcement
           7         MPLS IPv6 Announcement
           8-254     Reserved
           255       VENDOR

22.2.  Signature Type

   This document requests the IANA create a registry for L3DL Signature
   Type, AKA Sig Type, which may range from 0 to 255.  The name of the
   registry should be L3DL-Signature-Type.  The policy for adding to the
   registry is RFC Required per [RFC5226], either standards track or
   experimental.  The initial entries should be the following:

           Number    Name
           ------    -------------------
           0         Null
           1-255     Reserved

22.3.  Flag Bits

   This document requests the IANA create a registry for L3DL PL Flag
   Bits, which may range from 0 to 7.  The name of the registry should
   be L3DL-PL-Flag-Bits.  The policy for adding to the registry is RFC
   Required per [RFC5226], either standards track or experimental.  The
   initial entries should be the following:

           Bit       Bit Name
           ----      -------------------
           0         Announce/Withdraw (ann == 0)
           1         Primary
           2         Underlay/Overlay (under == 0)
           3         Loopback
           4-7       Reserved

22.4.  Error Codes

   This document requests the IANA create a registry for L3DL Error
   Codes, a 16-bit integer.  The name of the registry should be
   L3DL-Error-Codes.  The policy for adding to the registry is RFC
   Required per [RFC5226], either standards track or experimental.  The
   initial entries should be the following:

           Error
           Code      Error Name
           ----      -------------------
           0         No Error
           1         Checksum Error
           2         Logical Link Addressing Conflict
           3         Authorization Failure
           4         Announce/Withdraw Error

23.  IEEE Considerations

   This document requires a new EtherType.

   This document requires a new multicast MAC address that will be
   broadcast through a switch.

24.  Acknowledgments

   The authors thank Cristel Pelsser for multiple reviews, Harsha Kovuru
   for comments during implementation, Jeff Haas for review and
   comments, Joerg Ott for an early but deep transport review, Joe
   Clarke for a useful review, John Scudder for deeply serious review
   and comments, Larry Kreeger for a lot of layer-2 clue, Martijn
   Schmidt for his contribution, Nalinaksh Pai for transport
   discussions, Neeraj Malhotra for review, Paul Congdon for Ethernet
   hints, Russ Housley for checksum discussion and sBox, and Steve
   Bellovin for checksum advice.

25.  References

25.1.  Normative References

   [I-D.ietf-idr-bgp-ls-segment-routing-ext]
              Previdi, S., Talaulikar, K., Filsfils, C., Gredler, H.,
              and M. Chen, "Border Gateway Protocol - Link State
              (BGP-LS) Extensions for Segment Routing", Work in
              Progress, Internet-Draft,
              draft-ietf-idr-bgp-ls-segment-routing-ext-18,
              15 April 2021.

   [I-D.ietf-idr-bgpls-segment-routing-epe]
              Previdi, S., Talaulikar, K., Filsfils, C., Patel, K., Ray,
              S., and J. Dong, "Border Gateway Protocol - Link State
              (BGP-LS) Extensions for Segment Routing BGP Egress Peer
              Engineering", Work in Progress, Internet-Draft,
              draft-ietf-idr-bgpls-segment-routing-epe-19, 16 May 2019.

   [I-D.ietf-lsvr-bgp-spf]
              Patel, K., Lindem, A., Zandi, S., and W. Henderickx, "BGP
              Link-State Shortest Path First (SPF) Routing", Work in
              Progress, Internet-Draft, draft-ietf-lsvr-bgp-spf-15,
              1 July 2021.

   [I-D.ymbk-lsvr-l3dl-signing]
              Bush, R. and R. Austein, "Layer 3 Discovery and Liveness
              Signing", Work in Progress, Internet-Draft,
              draft-ymbk-lsvr-l3dl-signing-01, 6 May 2020.

   [IANA-PEN] "IANA Private Enterprise Numbers".

   [IEEE.802_2001]
              IEEE, "IEEE Standard for Local and Metropolitan Area
              Networks: Overview and Architecture", IEEE 802-2001,
              DOI 10.1109/ieeestd.2002.93395, 27 July 2002.

   [IEEE802-2014]
              Institute of Electrical and Electronics Engineers, "Local
              and Metropolitan Area Networks: Overview and
              Architecture", IEEE Std 802-2014, 2014.

   [RFC1213]  McCloghrie, K. and M. Rose, "Management Information Base
              for Network Management of TCP/IP-based internets: MIB-II",
              STD 17, RFC 1213, DOI 10.17487/RFC1213, March 1991.

   [RFC1629]  Colella, R., Callon, R., Gardner, E., and Y. Rekhter,
              "Guidelines for OSI NSAP Allocation in the Internet",
              RFC 1629, DOI 10.17487/RFC1629, May 1994.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
              Encoding", RFC 3032, DOI 10.17487/RFC3032, January 2001.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", RFC 5226,
              DOI 10.17487/RFC5226, May 2008.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC6286]  Chen, E. and J. Yuan, "Autonomous-System-Wide Unique BGP
              Identifier for BGP-4", RFC 6286, DOI 10.17487/RFC6286,
              June 2011.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

25.2. Informative References

   [Clos0]    Clos, C., "A study of non-blocking switching networks
              [PAYWALLED]", Bell System Technical Journal 32 (2), pp
              406-424, March 1953.

   [Clos1]    "Clos Network".

   [I-D.malhotra-bess-evpn-lsoe]
              Malhotra, N., Patel, K., and J. Rabadan, "LSoE-based PE-CE
              Control Plane for EVPN", Work in Progress, Internet-Draft,
              draft-malhotra-bess-evpn-lsoe-00, 11 March 2019.

   [JUPITER]  Singh, A., Ong, J., Agarwal, A., Anderson, G., Armistead,
              A., Bannon, R., Boving, S., Desai, G., Felderman, B.,
              Germano, P., Kanagala, A., Liu, H., Provost, J., Simmons,
              J., Tanda, E., Wanderer, J., Hölzle, U., Stuart, S., and
              A. Vahdat, "Jupiter rising: a decade of clos topologies
              and centralized control in Google's datacenter network",
              Communications of the ACM Vol. 59, pp. 88-97,
              DOI 10.1145/2975159, August 2016.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122,
              DOI 10.17487/RFC1122, October 1989.

   [RFC1982]  Elz, R. and R. Bush, "Serial Number Arithmetic", RFC 1982,
              DOI 10.17487/RFC1982, August 1996.

Authors' Addresses

   Randy Bush
   Arrcus & Internet Initiative Japan
   5147 Crystal Springs
   Bainbridge Island, WA 98110
   United States of America

   Email: randy@psg.com

   Rob Austein
   Arrcus, Inc

   Email: sra@hactrn.net

   Keyur Patel
   Arrcus
   2077 Gateway Place, Suite #400
   San Jose, CA 95119
   United States of America

   Email: keyur@arrcus.com