Network Working Group                                            R. Bush
Internet-Draft                        Arrcus & Internet Initiative Japan
Intended status: Standards Track                              R. Housley
Expires: 6 October 2022                                   Vigil Security
                                                              R. Austein
                                                                  Arrcus
                                                                S. Hares
                                                 Hickory Hill Consulting
                                                                K. Patel
                                                                  Arrcus
                                                            4 April 2022


                       Layer-3 Neighbor Discovery
                         draft-ymbk-idr-l3nd-04

Abstract

   Data Centers where the topology is BGP-based need to discover
   neighbor IP addressing, IP Layer-3 BGP neighbors, etc.  This Layer-3
   Neighbor Discovery protocol identifies BGP neighbor candidates.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 6 October 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Revised
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Background
   4.  Inter-Link Protocol Overview
     4.1.  L3ND Ladder Diagram
   5.  TLV PDUs
   6.  HELLO
     6.1.  Transport
     6.2.  Flags
     6.3.  Port
   7.  TCP Set-Up
   8.  OPEN
   9.  ACK
     9.1.  Retransmission
   10. The Encapsulations
     10.1.  The Encapsulation PDU Skeleton
     10.2.  Encapsulation Flags
     10.3.  IPv4 Encapsulation
     10.4.  IPv6 Encapsulation
     10.5.  MPLS Label List
     10.6.  MPLS IPv4 Encapsulation
     10.7.  MPLS IPv6 Encapsulation
   11. VENDOR - Vendor Extensions
   12. Discussion
     12.1.  HELLO Discussion
   13. VLANs/SVIs/Sub-interfaces
   14. Implementation Considerations
   15. Security Considerations
   16. IANA Considerations
     16.1.  Link Local Layer-3 Addresses
     16.2.  Ports for TLS/TCP
     16.3.  PDU Types
     16.4.  Flag Bits
     16.5.  Error Codes
   17. Acknowledgments
   18. References
     18.1.  Normative References
     18.2.  Informative References
   Authors' Addresses

1.  Introduction

   The Massive Data Center (MDC) environment presents unusual problems
   of scale, e.g., O(10,000) forwarding devices, while its homogeneity
   presents opportunities for simple approaches.  Layer-3 Discovery and
   Liveness (L3DL) [I-D.ietf-lsvr-l3dl] provides neighbor discovery at
   Layer-2.  This document (set) provides a similar solution at
   Layer-3, attempting to be as similar to L3DL as is reasonable.

   Some guiding principles when dealing with datacenters with tens of
   thousands of devices are:

   *  Predictable Reliability,

   *  Security: Authorization and Integrity more than Confidentiality,
      and

   *  Massive Scalability.

   Layer-3 Neighbor Discovery (L3ND) provides brutally simple mechanisms
   for neighboring devices to:

   *  Discover each other's IP Addresses,

   *  Discover mutually supported layer-3 encapsulations, e.g.,
      IPv4/IPv6/MPLS,

   *  Discover Layer-3 IP and/or MPLS addressing of interfaces of the
      encapsulations,

   *  Provide authenticity, integrity, and verification of protocol
      messages, and

   *  Accommodate scaling needed for EVPN etc.

   L3ND is intended for use within single IP subnets (IP over Ethernet
   or other point-to-point or multi-point IP links) in order to exchange
   the data needed to bootstrap BGP-based peering, EVPN, etc.,
   especially in a datacenter Clos [Clos] environment.  Once IP
   connectivity has been leveraged to discover layer-3 addressability
   and forwarding capabilities, normal IP forwarding and routing can
   take over.

   L3ND might be more widely applicable to a range of routing and
   similar protocols which need Layer-3 neighbor discovery.

2.  Terminology

   Even though it concentrates on the inter-device layer, this document
   relies heavily on routing terminology.  The following attempts to
   clarify the use of some possibly confusing terms:

   Clos:  A hierarchic subset of a crossbar switch topology commonly
      used in data centers [Clos].

   Datagram:  The L3ND content of a single Layer-3 UDP Datagram.

   Encapsulation:  Address Family Identifier and Subsequent Address
      Family Identifier (AFI/SAFI), i.e., classes of Layer-2.5 and
      Layer-3 addresses such as IPv4, IPv6, and MPLS.

   Link or Logical Link:  A logical connection between two interfaces
      on two different devices.  For example, two VLANs between the
      same two ports are two links.

   MDC:  Massive Data Center, commonly composed of thousands of Top of
      Rack Switches (TORs).

   MTU:  Maximum Transmission Unit, the size in octets of the largest
      packet that can be sent on a medium; see Section 1.3.3 of
      [RFC1122].

   PDU:  Protocol Data Unit, an L3ND application layer message.

   Session:  A session between two L3ND-capable IP interfaces on a link,
      established via the exchange of OPEN PDUs.
   TOR Switch:  Top Of Rack switch; it aggregates the servers in a rack
      and connects to the aggregation layers of the Clos tree, AKA the
      Clos spine.

3.  Background

   L3ND is primarily designed for Clos-type datacenter scale and
   topology, but can accommodate richer topologies which contain
   potential cycles.

   While L3ND is designed for the MDC, there is no inherent reason it
   could not run on a WAN.  The authentication and authorization needed
   to run safely on a WAN need to be considered, and the appropriate
   level of security options chosen.

   The number of addresses of one Encapsulation type on an interface
   link may be quite large: a TOR switch with tens of servers, each
   server running a few hundred micro-services, results in an
   inordinate number of addresses.  And highly automated micro-service
   migration can cause serious address prefix disaggregation, resulting
   in interfaces with thousands of disaggregated prefixes.

   To meet such scaling needs, the L3ND protocol is session oriented and
   uses incremental announcement and withdrawal with session restart, a
   la BGP [RFC4271].

4.  Inter-Link Protocol Overview

   A device sends a Layer-3 multicast UDP datagram (HELLO) containing
   the port number on which it is willing to serve a TLS or raw TCP
   connection; that connection carries the data exchange of the rest of
   the protocol in a reliable and preferably authenticated manner.

   Another device on the link then establishes a TLS or raw TCP session
   in which inter-device PDUs are used to exchange device and logical
   link identities and Layer-2.5 (MPLS) and Layer-3 identifiers (not
   payloads), e.g., more IP Addresses, loopback addresses, port
   identities, and Encapsulations.

   To assure discovery of new devices coming up on a multi-link
   topology, devices on such a topology, and only on a multi-link
   topology, send periodic HELLOs forever; see Section 12.1.
   Once the TLS/TCP session is up and OPEN PDUs (Section 8) have been
   exchanged, the Encapsulations (Section 10) configured on an end point
   may be announced and modified.  Note that these are only the
   encapsulations and addresses configured on the announcing interface,
   though a device's loopback and overlay interface(s) may also be
   announced.

4.1.  L3ND Ladder Diagram

   The HELLO (Section 6) is a priming message sent on all logical links
   configured for L3ND.  It is a small L3ND multicast UDP PDU with the
   simple goal of advertising a TLS/TCP service available on an
   advertised port on the sending IP interface.

   The HELLO PDU is either IPv4 or IPv6, which selects the AFI to be
   used for the rest of the session(s) between end-points.  Two
   endpoints MAY establish a link for each AFI.

   An interface on the link receiving the HELLO PDU attempts to
   establish a session, TLS or raw TCP as specified by the HELLO, to the
   source IP address of the HELLO on the port advertised in the HELLO.

   The OPEN PDUs (Section 8), used to exchange details about the L3ND
   session, and the ACK/ERROR PDU (Section 9) are mandatory; other PDUs
   are optional, though at least one encapsulation SHOULD be agreed at
   some point.

   Like Multi-Protocol BGP [RFC4760], an L3ND session running over one
   AFI MAY carry encapsulations etc. of different AFIs.

   An L3ND extension, [I-D.ymbk-idr-l3nd-ulpc], describes the next upper
   layer protocol, which exchanges BGP parameter information.

   The following is a ladder-style diagram of the L3ND protocol
   exchanges:

      |            HELLO            |  Logical Link Peer discovery
      |---------------------------->|
      |          TCP OPEN           |  Mandatory
      |<----------------------------|
      |                             |
      |                             |
      |            OPEN             |  IDs, security, etc.
      |---------------------------->|
      |             ACK             |
      |<----------------------------|
      |                             |
      |            OPEN             |  Mandatory
      |<----------------------------|
      |             ACK             |
      |---------------------------->|
      |                             |
      |                             |
      |  Interface IPv4 Addresses   |  Interface IPv4 Addresses
      |---------------------------->|  Optional
      |             ACK             |
      |<----------------------------|
      |                             |
      |  Interface IPv4 Addresses   |
      |<----------------------------|
      |             ACK             |
      |---------------------------->|
      |                             |
      |                             |
      |  Interface IPv6 Addresses   |  Interface IPv6 Addresses
      |---------------------------->|  Optional
      |             ACK             |
      |<----------------------------|
      |                             |
      |  Interface IPv6 Addresses   |
      |<----------------------------|
      |             ACK             |
      |---------------------------->|
      |                             |
      |                             |
      |   Interface MPLSv4 Labels   |  Interface MPLSv4 Labels
      |---------------------------->|  Optional
      |             ACK             |
      |<----------------------------|
      |                             |
      |   Interface MPLSv4 Labels   |  Interface MPLSv4 Labels
      |<----------------------------|  Optional
      |             ACK             |
      |---------------------------->|
      |                             |
      |                             |
      |   Interface MPLSv6 Labels   |  Interface MPLSv6 Labels
      |---------------------------->|  Optional
      |             ACK             |
      |<----------------------------|
      |                             |
      |   Interface MPLSv6 Labels   |  Interface MPLSv6 Labels
      |<----------------------------|  Optional
      |             ACK             |
      |---------------------------->|

5.  TLV PDUs

   The basic L3ND application layer PDU is a typical TLV (Type Length
   Value) PDU.  As it is transported over TCP, reliable, in-order
   delivery is assured.  When it is transported over TLS, integrity and
   authenticity are also provided.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Version = 0  |    PDU Type   |        Payload Length         ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~                               |                               ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               ~
      ~                          Payload ...                          ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The fields of the basic L3ND header are as follows:

   Version:  An integer differentiating versions of the L3ND protocol.
      Currently, only Version 0 is specified.

   PDU Type:  An integer differentiating PDU payload types.  See
      Section 16.3.

   Payload Length:  The four-octet total number of octets in the
      Payload field.

   Payload:  The application layer content of the L3ND PDU.

6.  HELLO

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Version = 0  |  PDU Type = 0 |      Payload Length = 4       ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~                               |   Transport   |     Flags     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Port              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The Payload Length is 4 to cover the Transport, Flags, and Port
   fields.

   The IPv4 UDP packets are sent to the IPv4 link-local multicast
   address (TBD1), and the IPv6 UDP packets are sent to an IPv6
   link-local multicast address (TBD2).  See Section 12.1 for why
   multicast is used.

   The HELLO PDU solicits a unicast TLS/TCP session open request of the
   same AFI from other devices on the link.

   When a HELLO is received from a source IP address with which there is
   no established TLS/TCP L3ND session, and the receiver has the higher
   of the two IP addresses, the receiver SHOULD respond by sending a
   TLS/TCP client open request, using the same AFI, to the source IP
   address of the HELLO to establish an L3ND TLS/TCP session.

   All L3ND PDUs other than HELLO are sent via TLS/TCP, as the server's
   destination IP address is known after the HELLO.

   When an interface is turned up on a device, it SHOULD, if it is
   configured to participate in L3ND sessions, issue a HELLO and repeat
   HELLOs at a configured interval, with a default of 60 seconds.
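   As an illustration only, the HELLO layout of this section can be
   encoded and parsed as in the minimal Python sketch below; it is not a
   reference implementation.  It assumes network byte order, a
   four-octet Payload Length (as in Section 5), and the usual IETF
   convention that Flags bit 0 is the most significant bit.  All
   function and constant names are hypothetical, the TBD1/TBD2
   multicast addresses and TBD3 port are deliberately left unfilled,
   and any port used with it is arbitrary.  The should_open_session
   helper illustrates the higher-address rule stated above.

```python
import ipaddress
import struct

# Values from Sections 5 and 6; struct formats assume network byte order.
L3ND_VERSION = 0
PDU_TYPE_HELLO = 0
HEADER = struct.Struct("!BBI")      # Version, PDU Type, 4-octet Payload Length
HELLO_BODY = struct.Struct("!BBH")  # Transport, Flags, Port
FLAG_CLIENT_ONLY = 0x80             # Flags bit 0, assumed to be the MSB
MAX_TRANSPORT = 3                   # 4-255 are Reserved (Section 6.1)


def encode_hello(transport: int, port: int, client_only: bool = False) -> bytes:
    """Build a HELLO: six-octet header followed by a four-octet payload."""
    flags = FLAG_CLIENT_ONLY if client_only else 0
    body = HELLO_BODY.pack(transport, flags, port)
    return HEADER.pack(L3ND_VERSION, PDU_TYPE_HELLO, len(body)) + body


def decode_hello(data: bytes) -> dict:
    """Parse a HELLO, rejecting other versions, types, lengths, or transports."""
    version, pdu_type, length = HEADER.unpack_from(data)
    if version != L3ND_VERSION or pdu_type != PDU_TYPE_HELLO:
        raise ValueError("not a version 0 HELLO")
    if length != HELLO_BODY.size or len(data) != HEADER.size + length:
        raise ValueError("bad HELLO Payload Length")
    transport, flags, port = HELLO_BODY.unpack_from(data, HEADER.size)
    if transport > MAX_TRANSPORT:
        raise ValueError("reserved Transport value")
    return {"transport": transport,
            "client_only": bool(flags & FLAG_CLIENT_ONLY),
            "port": port}


def should_open_session(local: str, remote: str) -> bool:
    """True if this receiver holds the higher address and so opens the session."""
    mine, theirs = ipaddress.ip_address(local), ipaddress.ip_address(remote)
    if mine.version != theirs.version:
        raise ValueError("HELLO and receiver must share an AFI")
    return mine > theirs
```

   For example, should_open_session("10.0.0.2", "10.0.0.1") is true, so
   the receiver at 10.0.0.2 would open the session toward 10.0.0.1.
   Python's ipaddress objects compare numerically only within one
   address family, which is why the sketch refuses mixed AFIs.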
   If the configured multicast destination address is one that is
   propagated by switches, the HELLO SHOULD be repeated at a configured
   interval, with a default of 60 seconds.  This allows discovery by new
   devices which come up on the mesh.  In this multi-link scenario, the
   operator should be aware of the trade-off between timer tuning and
   network noise, and adjust the inter-HELLO timer accordingly.

   By default, GTSM [RFC5082] SHOULD be enabled to verify that a
   received HELLO originated on the local link, thus leaving no middle
   on which a monkey in the middle might stand.  It MAY be disabled by
   configuration.  GTSM check failures SHOULD be logged, though with
   rate limiting to keep from overwhelming logs.

   If more than one device responds, one adjacency is formed for each
   unique source IP address.  L3ND treats each adjacency as a separate
   logical link.

   To ameliorate possible load spikes during bootstrap or event
   recovery, there SHOULD be a jittered delay between receipt of a HELLO
   and the TLS/TCP open.  The default delay range SHOULD be zero to five
   seconds, and MUST be configurable.

   If a HELLO is received from an IP Address with which there is an
   established session for that AFI, the HELLO SHOULD be dropped.

   A device with a TLS/TCP listener SHOULD log or otherwise report
   repeated failed inbound attempts.

6.1.  Transport

   The Transport field signals the type of transport security for the
   session.

   The actual transport options are pre-configured in the devices by
   provisioning, as most require certificates etc.  It is best to think
   of this field as in-band signaling to confirm the correctness of the
   pre-configurations.  Any disagreement MUST be considered to indicate
   an error condition and brought to the attention of the operator.

   The Transport field is an enumeration with the following values:

   0:      Raw TCP: TLS is not used.
   1:      TLS TOFU (Trust On First Use): TLS using a self-signed server
           certificate.
   2:      TLS CA-NoIP: TLS using a CA-based server certificate, with no
           IP address extension.
   3:      TLS CA-WithIP: TLS using a CA-based server certificate, with
           the server's IP address in the subject alternative name
           extension (see Section 4.2.1.6 of [RFC5280]).
   4-255:  Reserved.

   If server certificates are to be used, they may be locally generated
   and then signed by a CA, or generated by the CA and loaded.  See
   [RFC8635].

6.2.  Flags

   Though the Working Group scope for this protocol is within a data
   center, an issue was raised that, on an Internet exchange with route
   server(s), a device would attempt to form adjacencies with all
   members of the exchange.  Hence a Flags field is provided to indicate
   that a device does not intend to field a TLS/TCP server on the
   announcing interface, but does seek one or more from peers.

   Currently, only one flag is defined:

   Bit 0:  Client Only: This interface does not provide a TLS/TCP
      server.

   Bits 1-7:  Reserved.

6.3.  Port

   The Port is the two-octet TCP Port Number (default is TBD3) on which
   the HELLO sender SHOULD have a waiting TLS/TCP server (as specified
   in the Transport field) listening, unless the Client Only Flag is
   set.  Though the IANA-assigned well-known port SHOULD be used, this
   field allows configuration of alternate ports.

7.  TCP Set-Up

   As the configured deployment of a data center is assumed to have
   compatible parameters on all devices, any disagreement over TLS/TCP
   or trust anchors MUST be logged, with rate limiting of the logging.

   By default, GTSM [RFC5082] SHOULD be enabled to ensure that a SYN
   received in response to a HELLO is on the local link.  It MAY be
   disabled by configuration.  GTSM check failures SHOULD be logged,
   though with rate limiting to keep from overwhelming logs.
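   The GTSM check and the rate-limited logging called for in Sections 6
   and 7 might look like the minimal Python sketch below.  Per
   [RFC5082], the sender transmits with a TTL (IPv4) or Hop Limit (IPv6)
   of 255, and a receiver expecting a directly connected peer accepts
   only 255.  Obtaining the received TTL is platform specific and is
   left out; the sketch takes it as a plain integer.  The token-bucket
   limiter, its rate and burst parameters, and all names are
   assumptions for illustration, as the draft does not specify a
   rate-limiting algorithm.

```python
import time

GTSM_TTL = 255  # RFC 5082: a directly connected sender uses TTL/Hop Limit 255


class RateLimitedLog:
    """Token bucket: at most `burst` lines at once, refilled at `rate` per second."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = float(burst)
        self.last = clock()
        self.suppressed = 0
        self.lines = []  # stands in for a real logging sink in this sketch

    def log(self, msg: str) -> bool:
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            if self.suppressed:
                msg += f" ({self.suppressed} similar messages suppressed)"
                self.suppressed = 0
            self.lines.append(msg)
            return True
        self.suppressed += 1  # counted, reported with the next emitted line
        return False


def gtsm_accept(ttl: int, peer: str, log: RateLimitedLog, enabled: bool = True) -> bool:
    """Accept a HELLO or SYN only if it arrived with TTL/Hop Limit 255."""
    if not enabled or ttl == GTSM_TTL:
        return True
    log.log(f"GTSM check failed from {peer}: TTL/Hop Limit {ttl}")
    return False
```

   Counting suppressed lines, rather than silently dropping them, keeps
   an operator aware of how noisy a misbehaving link is without letting
   it flood the log.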
   If the receiver of a HELLO agrees with the sender's choice of TLS/TCP
   and authentication, both sides have agreed on an AFI for the
   transport and on each other's IP address in that AFI.  This is
   sufficient to open a TCP session between them, which will allow for
   reliable transport of very large data PDUs while obviating the need
   to invent complex transports.

   The L3ND peer with the higher IP address MUST act as the TLS/TCP
   client and open the transport session (send SYN) toward the peer with
   the lower IP address.

   The server, the sender of the HELLO from the lower IP address,
   listens on the advertised port for the TLS/TCP session open.  The
   receiver of the acceptable HELLO, the TLS/TCP client, initiates a TLS
   or raw TCP session (preferably TLS), as advertised, toward the sender
   of the HELLO, the TLS/TCP server.  If TLS is used, the server has
   chosen and signaled either a self-signed certificate or one
   configured from the operational CA trusted by both parties, as
   negotiated in the HELLO exchange.

   Once the TLS/TCP session is established, if its interface is
   configured as point-to-point, the client side SHOULD stop listening
   on any port for which it has sent a HELLO.  The server, if configured
   as a point-to-point interface, SHOULD stop sending HELLOs.

   If the TLS/TCP open fails, then this SHOULD be logged and the parties
   MUST go back to the initial state and try HELLO.  Logging SHOULD be
   rate limited.

   Should an interface with an established TLS/TCP session be
   reconfigured in a way that changes the TLS/TCP parameters, the
   TLS/TCP session should be closed or torn down, and both parties
   should return to the HELLO state.

   Should the TLS/TCP session terminate for any reason, the devices
   SHOULD restart/resume HELLOs.
   When the new TLS/TCP session is started, the OPEN PDU SHOULD, if
   possible, try to resume the lost logical session by using the same
   nonce and resuming from the last Serial Number.

   Once the TLS/TCP session has been established, the two devices
   exchange L3ND PDUs, starting with OPENs.

8.  OPEN

   Each device has learned the other's IP Address from the HELLO
   exchange (see Section 6) and has established a TLS/TCP session over
   a particular AFI.

   The first PDU each sends MUST be an OPEN, and the other side MUST
   respond with an ACK PDU.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Version = 0  |  PDU Type = 1 |        Payload Length         ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~                               |          Session ID           ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~                               |         Serial Number         ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~                               |   AttrCount   |               ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
      ~                       Attribute List ...                      ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The four-octet Payload Length is the number of octets in all fields
   of the PDU from the Session ID through the end of the Attribute
   List.

   The four-octet Session ID is a nonce which uniquely identifies a
   session.  It enables detection of a duplicate OPEN PDU.  It SHOULD be
   either a random number or a high-resolution timestamp.  It is needed
   to prevent session closure due to a repeated OPEN caused by a race or
   by a dropped or delayed ACK.  It can also be used to resume a dropped
   logical session.

   The one-octet AttrCount is the number of attributes in the Attribute
   List.  A node may send zero or more attributes.

   Attributes are single octets whose semantics are operator-defined,
   e.g., spine, leaf, backbone, route reflector, arabica, ...
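   The OPEN layout can be sketched in Python as follows.  This is a
   minimal illustration, assuming network byte order and, consistent
   with Section 5, that the Payload Length counts every octet after the
   six-octet header.  Attribute values such as 1 for spine or 2 for leaf
   are hypothetical, since attribute semantics are operator-defined.
   The Session ID is drawn from Python's secrets module, matching the
   random-number option the text allows; the function names are
   illustrative only.

```python
import secrets
import struct

PDU_TYPE_OPEN = 1
HEADER = struct.Struct("!BBI")      # Version, PDU Type, Payload Length
OPEN_FIXED = struct.Struct("!IIB")  # Session ID, Serial Number, AttrCount


def new_session_id() -> int:
    """Random four-octet nonce (the text also permits a timestamp)."""
    return secrets.randbits(32)


def encode_open(session_id: int, serial: int, attributes: bytes = b"") -> bytes:
    """Build an OPEN; each attribute is one operator-defined octet."""
    if len(attributes) > 255:
        raise ValueError("AttrCount is a single octet")
    body = OPEN_FIXED.pack(session_id, serial, len(attributes)) + attributes
    # Assumption: Payload Length counts every octet after the header.
    return HEADER.pack(0, PDU_TYPE_OPEN, len(body)) + body


def decode_open(data: bytes) -> tuple:
    """Return (session_id, serial, attributes) from an OPEN PDU."""
    version, pdu_type, length = HEADER.unpack_from(data)
    if version != 0 or pdu_type != PDU_TYPE_OPEN:
        raise ValueError("not a version 0 OPEN")
    if len(data) != HEADER.size + length:
        raise ValueError("Payload Length mismatch")
    session_id, serial, count = OPEN_FIXED.unpack_from(data, HEADER.size)
    attributes = data[HEADER.size + OPEN_FIXED.size:]
    if len(attributes) != count:
        raise ValueError("AttrCount does not match Attribute List")
    return session_id, serial, attributes
```

   A fresh session would call encode_open(new_session_id(), 0, ...),
   while an attempt to resume would reuse the earlier Session ID
   together with a non-zero Serial Number.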
   Attribute syntax and semantics are local to an operator or
   datacenter; hence there is no global registry.  Nodes exchange their
   attributes only in the OPEN PDU.

   Unlike L3DL [I-D.ietf-lsvr-l3dl], there are no verifiable keys in the
   PDUs.  If the operator wants authentication, integrity, or
   confidentiality, then TLS MUST have been requested by the HELLO and
   agreed by the TLS session open.

   The Serial Number is a monotonically increasing four-octet value
   representing the sender's state at the time of sending the last PDU.
   It may be a non-negative integer, a timestamp, etc.  If incrementing
   the Serial Number would cause it to be zero, it should be incremented
   again.

   On session restart (new OPEN, same Session ID), a receiver MAY send
   the last received Serial Number to tell the sender to only send data
   with a greater Serial Number (in the [RFC1982] sense), or send a
   Serial Number of zero to request all data.

   This allows a sender of an OPEN to tell the receiver that the sender
   would like to resume a logical session and that the receiver of the
   OPEN PDU only needs to send data starting with the PDU with the
   lowest Serial Number greater (in the [RFC1982] sense) than the one
   sent in the OPEN.  If the sender is not trying to resume a dropped
   session, the Serial Number MUST be zero.

   If the receiver of an OPEN PDU with a non-zero Serial Number cannot
   resume from the requested point, it should return an ACK with an
   Error Code of 5, Session May Not Be Continued, and an EType of 1.
   The sender of the failing OPEN PDU SHOULD respond with an OPEN PDU
   with a Serial Number of zero.

   If a sender of an OPEN does not receive an ACK of the OPEN PDU within
   a configurable time (default 5 seconds), then it SHOULD close or
   otherwise terminate the TLS/TCP session and restart from the HELLO
   state.
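   Because Serial Numbers wrap, "greater" has to be computed with
   [RFC1982] serial number arithmetic rather than plain integer
   comparison.  A minimal Python sketch for 32-bit serials follows,
   including the rule that incrementing skips zero, since zero is
   reserved to request all data.  The resume_from helper is
   hypothetical: it assumes the sender keeps the Serial Numbers of its
   retained PDUs in send order and returns those a resuming peer still
   needs.

```python
SERIAL_BITS = 32
HALF = 1 << (SERIAL_BITS - 1)
MODULUS = 1 << SERIAL_BITS


def serial_gt(s1: int, s2: int) -> bool:
    """RFC 1982 Section 3.2 'greater than'; undefined pairs yield False."""
    return (s1 < s2 and s2 - s1 > HALF) or (s1 > s2 and s1 - s2 < HALF)


def next_serial(s: int) -> int:
    """Increment modulo 2**32, skipping zero (zero requests all data)."""
    n = (s + 1) % MODULUS
    return n if n != 0 else 1


def resume_from(sent_serials: list, requested: int) -> list:
    """Serials, in send order, that a peer resuming at `requested` still needs."""
    if requested == 0:
        return list(sent_serials)
    return [s for s in sent_serials if serial_gt(s, requested)]
```

   With this comparison, a peer that last saw 0xFFFFFFFF correctly
   receives the PDUs numbered 1 and 2 sent after the wrap, which a plain
   integer comparison would have skipped.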
   If an OPEN arrives at L3ND speaker A from B with which A believes it
   already has an L3ND session (i.e., OPENs have already been
   exchanged), and the Serial Number in B's OPEN PDU is non-zero,
   speaker A SHOULD establish a new sending session by sending an OPEN
   with the Serial Number being the same as that of A's last sent and
   ACKed PDU.  A MUST resume sending encapsulations etc. subsequent to
   the requested Serial Number.  And B MUST retain all previously
   discovered encapsulation and other data received from A.

   If an OPEN arrives at L3ND speaker A from B with which A believes it
   already has an L3ND session (i.e., OPENs have already been
   exchanged), and the Serial Number in B's OPEN is zero, then A MUST
   assume that B's internal state has been reset.  All previously
   discovered encapsulation data MUST be discarded, and A MUST respond
   with a new OPEN PDU with a Serial Number of zero.

   TCP KeepAlives should be configured and tuned to meet local
   operational needs.  Some defaults and recommendations are needed
   here.

9.  ACK

   The ACK PDU acknowledges receipt of a PDU and reports any error
   condition which might have been raised.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Version = 0  |  PDU Type = 3 |      Payload Length = 6       ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~                               |   ACKed PDU   |     EType     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          Error Code           |          Error Hint           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The ACK PDU acknowledges receipt of an OPEN, Encapsulation, Vendor
   PDU, etc., and is used to return error codes, if any.

   The one-octet ACKed PDU is the PDU Type of the PDU being
   acknowledged, e.g., OPEN, one of the Encapsulations, etc.
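   A minimal Python sketch of building and checking the ACK PDU follows,
   assuming network byte order.  It enforces the rule in this section
   that an EType of zero requires a zero Error Code and Error Hint, and
   maps the EType to the suggested action enumerated in this section.
   The function names and action strings are illustrative only.

```python
import struct

PDU_TYPE_ACK = 3
HEADER = struct.Struct("!BBI")     # Version, PDU Type, Payload Length
ACK_BODY = struct.Struct("!BBHH")  # ACKed PDU, EType, Error Code, Error Hint
ETYPE_ACTION = {0: "no error", 1: "warning, continue",
                2: "restart session", 3: "call the operator"}


def encode_ack(acked_type: int, etype: int = 0, code: int = 0, hint: int = 0) -> bytes:
    """Build an ACK for a received PDU of type `acked_type`."""
    if etype == 0 and (code or hint):
        raise ValueError("EType 0 requires zero Error Code and Error Hint")
    body = ACK_BODY.pack(acked_type, etype, code, hint)
    return HEADER.pack(0, PDU_TYPE_ACK, len(body)) + body


def decode_ack(data: bytes) -> tuple:
    """Return (acked_type, etype, code, hint, suggested_action)."""
    version, pdu_type, length = HEADER.unpack_from(data)
    if version != 0 or pdu_type != PDU_TYPE_ACK or length != ACK_BODY.size:
        raise ValueError("malformed ACK")
    acked, etype, code, hint = ACK_BODY.unpack_from(data, HEADER.size)
    if etype == 0 and (code or hint):
        raise ValueError("EType 0 with non-zero Error Code or Error Hint")
    return acked, etype, code, hint, ETYPE_ACTION.get(etype, "reserved")
```

   For instance, the response to an OPEN that cannot be resumed would be
   encode_ack(1, etype=1, code=5), i.e., Error Code 5 with EType 1 as
   described for the OPEN PDU.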
   If there was an error processing the received PDU, then the one-octet
   EType is non-zero.  If the EType is zero, Error Code and Error Hint
   MUST also be zero.

   A non-zero EType is the receiver's way of telling the PDU's sender
   that the receiver had problems processing the PDU.  The Error Code
   and Error Hint tell the sender more detail about the error.

   The decimal value of EType gives a strong hint as to how the receiver
   sending the ACK believes things should proceed:

   0 -     No Error; Error Code and Error Hint MUST be zero
   1 -     Warning: something not too serious happened; continue
   2 -     Session should not be continued; try to restart
   3 -     Restart is hopeless; call the operator
   4-15 -  Reserved

   The two-octet Error Codes, noting protocol failures, are listed in
   Section 16.5.  Someone stuck in the 1990s might think of the
   catenation of EType and Error Code as an echo of 0x1zzz, 0x2zzz, etc.
   They might be right; or not.

   The two-octet Error Hint is arbitrary additional data the sender of
   the error PDU thinks will help the recipient or the debugger with the
   particular error.

9.1.  Retransmission

   If a PDU sender expects an ACK, e.g., for an OPEN, an Encapsulation,
   or a Vendor PDU, and does not receive the ACK within a configurable
   time (default five seconds), the TLS/TCP session should be closed or
   dropped, and both sides revert to HELLO state.

10.  The Encapsulations

   Once the devices know each other's IP Addresses, have established a
   TLS/TCP session, and have successfully exchanged OPENs, the L3ND
   session is considered established, and the devices SHOULD exchange
   Layer-3 interface encapsulations, Layer-3 addresses, and Layer-2.5
   labels.

   Encapsulation data for any AFI/SAFI may be exchanged over a TLS/TCP
   session irrespective of the AFI/SAFI of the session transport.
   The Encapsulation types the peers exchange may be IPv4
   (Section 10.3), IPv6 (Section 10.4), MPLS IPv4 (Section 10.6), MPLS
   IPv6 (Section 10.7), and/or possibly others not defined here.

   The sender of an Encapsulation PDU MUST NOT assume that the receiver
   is capable of the same Encapsulation Type.  An ACK (Section 9) with
   an EType of 0 merely acknowledges receipt.  Only if both peers have
   sent the same Encapsulation Type is it safe for Layer-3 protocols to
   assume that they are compatible for that Encapsulation Type.

   A receiver of an encapsulation might recognize an addressing
   conflict, such as both ends of the link trying to use the same
   address.  In this case, the receiver MUST respond with an error ACK
   (Error Code 2, Logical Link Addressing Conflict).  As there may be
   other usable addresses or encapsulations, the recipient of this error
   might log it and continue, letting an upper layer topology builder
   deal with what works.

   Further, for a logical link of an Encapsulation Type to be considered
   formally established, so that it may be used by other protocols, the
   addressing for the type must be compatible, e.g., on the same IP
   subnet.

10.1.  The Encapsulation PDU Skeleton

   The header for all encapsulation PDUs is as follows:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Version = 0  |    PDU Type   |        Payload Length         ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~                               |             Count             ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~               |                 Serial Number                 ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~               |            Encapsulation List...              ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   An Encapsulation PDU describes zero or more addresses of the
   encapsulation type.
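   Packing and unpacking the skeleton header can be sketched in Python
   as below, using the three-octet Count and four-octet Serial Number
   described in this section.  The sketch assumes network byte order and
   that the Payload Length covers everything after the six-octet header;
   the per-type Encapsulation List entries (Sections 10.3 through 10.7)
   are treated as opaque bytes.  PDU Type values for the Encapsulations
   are assigned in Section 16.3 and are not repeated here, so any type
   value used with this sketch is a placeholder.

```python
import struct

HEADER = struct.Struct("!BBI")  # Version, PDU Type, Payload Length
MAX_COUNT = (1 << 24) - 1       # Count is three octets
FIXED = HEADER.size + 3 + 4     # header + Count + Serial Number = 13 octets


def encode_encap(pdu_type: int, count: int, serial: int, entries: bytes) -> bytes:
    """Wrap already-encoded Encapsulation List entries in the skeleton header."""
    if not 0 <= count <= MAX_COUNT:
        raise ValueError("Count must fit in three octets")
    body = count.to_bytes(3, "big") + serial.to_bytes(4, "big") + entries
    return HEADER.pack(0, pdu_type, len(body)) + body


def decode_encap(data: bytes) -> tuple:
    """Return (pdu_type, count, serial, raw Encapsulation List bytes)."""
    version, pdu_type, length = HEADER.unpack_from(data)
    if version != 0 or len(data) != HEADER.size + length or len(data) < FIXED:
        raise ValueError("malformed Encapsulation PDU")
    count = int.from_bytes(data[6:9], "big")
    serial = int.from_bytes(data[9:13], "big")
    return pdu_type, count, serial, data[FIXED:]
```

   Python's struct module has no three-octet integer format, which is
   why Count goes through int.to_bytes and int.from_bytes.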
   The three-octet Count is the number of Encapsulations in the
   Encapsulation List.

   The Serial Number is a monotonically increasing four-octet value
   representing the sender's state in time.  It may be an integer, a
   timestamp, etc.  On session restart (new OPEN), a receiver MAY send
   the last received Serial Number to request that the sender only send
   newer data.

   If a sender has multiple links on the same interface, separate state
   (data, ACKs, etc.) must be kept for each peer session.

   Over time, multiple Encapsulation PDUs may be sent for an interface
   in a session as its configuration changes.

   The receiver MUST acknowledge the Encapsulation PDU with an ACK PDU
   (Section 9) whose Type field is set to the Type of the Encapsulation
   PDU being acknowledged.

   If the sender does not receive an ACK within a configurable interval
   (default five seconds), it SHOULD retransmit.  After a
   user-configurable number of failures (default three), the L3ND
   session should be considered dead, the TLS/TCP session torn down,
   and the HELLO process SHOULD be restarted.

   If the link is broken below Layer-3, retransmission MAY be retried
   if the data have not changed in the interim and the TLS/TCP session
   is still alive.

   Should an Encapsulation in the Encapsulation List be syntactically
   invalid, e.g., an out-of-bounds prefix length, the entire
   Encapsulation PDU MUST be dropped and the sending party notified by
   an ACK PDU with an EType of 1 and an Error Code of 3, Encapsulation
   Error.

10.2.  Encapsulation Flags

   The one-octet Encapsulation Flags field is a sequence of one-bit
   fields as follows:

    0            1            2            3            4 ...      7
   +------------+------------+------------+------------+------------+
   |  Ann/With  |  Primary   | Under/Over |  Loopback  | Reserved ..|
   +------------+------------+------------+------------+------------+

   Each encapsulation in an Encapsulation PDU of Type T may announce a
   new, or withdraw an old, encapsulation of Type T.  It indicates this
   with the Ann/With Encapsulation Flag: Announce == 1, Withdraw == 0.

   Announcing an encapsulation which already exists SHOULD raise an
   Announce/Withdraw Error (see Section 16.5); the EType SHOULD be 2,
   suggesting a session restart (see Section 9) so that all
   encapsulations will be resent.

   If an interface on a link has multiple addresses for an
   Encapsulation Type, one and only one address MAY be marked as
   primary (Primary Flag == 1) for that Encapsulation Type.

   An interface address in an Encapsulation PDU MAY be marked as a
   loopback, in which case the Loopback bit is set.  Loopback addresses
   are generally not seen directly on an external interface.  One or
   more loopback addresses MAY be exposed by configuration on one or
   more L3ND-speaking external interfaces, e.g., for iBGP peering.
   They SHOULD be marked as such, Loopback Flag == 1.

   Each interface address in an Encapsulation PDU is either that of the
   direct 'underlay' interface (Under/Over == 1) or an 'overlay'
   address (Under/Over == 0), likely that of a VM or container guest
   bridged or configured on to the interface already having an underlay
   address.

10.3.  IPv4 Encapsulation

   The IPv4 Encapsulation describes a device's ability to exchange IPv4
   packets on one or more subnets.  It does so by stating the
   interface's addresses and the corresponding prefix lengths.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Version = 0  | PDU Type = 4  |        Payload Length         ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                               |             Count             ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                 Serial Number                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               | Encaps Flags  |         IPv4 Address          ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                               |   PrefixLen   |   more ...    ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The three-octet Count is the sum of the number of IPv4
   Encapsulations being announced and/or withdrawn.

10.4.  IPv6 Encapsulation

   The IPv6 Encapsulation describes a link's ability to exchange IPv6
   packets on one or more subnets.  It does so by stating the
   interface's addresses and the corresponding prefix lengths.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Version = 0  | PDU Type = 5  |        Payload Length         ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                               |             Count             ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                 Serial Number                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               | Encaps Flags  |                               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               ~
   +                                                               +
   ~                                                               ~
   +                                                               +
   ~                          IPv6 Prefix                          ~
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                               |   PrefixLen   |   more ...    ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The three-octet Count is the sum of the number of IPv6
   Encapsulations being announced and/or withdrawn.

10.5.  MPLS Label List

   As an MPLS-enabled interface may have a label stack (see [RFC3032]),
   a variable-length list of labels is needed.
   These are the labels the sender will accept for the prefix to which
   the list is attached.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Label Count  |                 Label                 | Exp |S|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                 Label                 | Exp |S|   more ...    ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   A one-octet Label Count of zero is an implicit withdraw of all
   labels for that prefix on that interface.

   The bottom-of-stack flag, S, MUST be set on one and only one label.
   Should this not be the case, the receiver of the erroneous PDU MUST
   respond with an ACK PDU of EType 1 and Error Code 1, MPLS Error.

10.6.  MPLS IPv4 Encapsulation

   The MPLS IPv4 Encapsulation describes a logical link's ability to
   exchange labeled IPv4 packets on one or more subnets.  It does so by
   stating the interface's addresses, the corresponding prefix lengths,
   and the corresponding labels which will be accepted for each
   address.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Version = 0  | PDU Type = 6  |        Payload Length         ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                               |             Count             ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                 Serial Number                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               | Encaps Flags  |      MPLS Label List ...      ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         IPv4 Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   PrefixLen   |   more ...                                    ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The three-octet Count is the sum of the number of MPLS IPv4
   Encapsulations being announced and/or withdrawn.

10.7.  MPLS IPv6 Encapsulation

   The MPLS IPv6 Encapsulation describes a logical link's ability to
   exchange labeled IPv6 packets on one or more subnets.  It does so by
   stating the interface's addresses, the corresponding prefix lengths,
   and the corresponding labels which will be accepted for each
   address.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Version = 0  | PDU Type = 7  |        Payload Length         ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                               |             Count             ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                 Serial Number                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               | Encaps Flags  |      MPLS Label List ...      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               ~
   +                                                               +
   ~                                                               ~
   +                         IPv6 Address                          +
   ~                                                               ~
   +                                                               +
   ~                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Prefix Len   |   more ...                                    ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The three-octet Count is the sum of the number of MPLS IPv6
   Encapsulations being announced and/or withdrawn.

11.  VENDOR - Vendor Extensions

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Version = 0  | PDU Type = 255|        Payload Length         ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                               |         Serial Number         ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                               |       Enterprise Number       ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |   Ent Type    |      Enterprise Data ...      ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Vendors or enterprises may define TLVs beyond the scope of L3ND
   standards.
   This is done using a Private Enterprise Number [IANA-PEN] followed
   by Enterprise Data in a format defined for that three-octet
   Enterprise Number and one-octet Ent Type.

   Ent Type allows a Vendor PDU to be sub-typed in the event that the
   vendor/enterprise needs multiple PDU types.

   As with Encapsulation PDUs, a receiver of a Vendor PDU MUST respond
   with an ACK PDU, possibly signalling an error.  Similarly, a Vendor
   PDU MUST only be sent over an open session.

12.  Discussion

   This section explores some of the trade-offs made and some other
   considerations.

12.1.  HELLO Discussion

   A device may send IP packets over a Layer-3 interface which
   transmits data over a single Layer-2 interface or over multiple
   Layer-2 interfaces.  A Layer-3 IP interface that sources packets
   over multiple Layer-2 interfaces could have many devices behind it,
   which might come up at various times; therefore, the configured
   HELLO PDU retransmit time SHOULD be set to a non-zero value, and
   periodic HELLOs should continue.  For packets transmitted on a
   single Layer-2 interface over a point-to-point (p2p) connection, the
   configuration value MAY be set to zero, as HELLOs are no longer
   desirable once the TLS/TCP session is up.

   A device with multiple Layer-2 interfaces, traditionally called a
   switch, may be used to forward packets from multiple devices to one
   Layer-3 interface, I, on an L3ND-speaking device.  Interface I could
   discover a peer J across the switch.  Later, a prospective peer K
   could come up across the switch.  If I were not still sending and
   listening for HELLOs, the potential peering with K could not be
   discovered.  Therefore, on multi-link interfaces, L3ND MUST continue
   to send HELLOs as long as they are turned up.

13.  VLANs/SVIs/Sub-interfaces

   One can think of the protocol as an instance (i.e., a state machine)
   which runs on each logical link of a device.
   As the upper routing layer must view VLAN topologies as separate
   graphs, L3ND treats VLAN ports as separate links.

   As sub-interfaces each have their own Layer-3 identities, they act
   as separate interfaces, forming their own links.

14.  Implementation Considerations

   An implementation SHOULD provide the ability to configure each
   logical interface as L3ND speaking or not.

   An implementation SHOULD provide the ability to distribute one or
   more loopback addresses or interfaces into L3ND on an external
   L3ND-speaking interface.

   An implementation SHOULD provide the ability to distribute one or
   more overlay and/or underlay addresses or interfaces into L3ND on an
   external L3ND-speaking interface.

   An implementation SHOULD provide the ability to configure one of the
   addresses of an encapsulation as primary on an L3ND-speaking
   interface.  If there is only one address for a particular
   encapsulation, the implementation MAY mark it as primary by default.

   An implementation MAY allow optional configuration which updates the
   local forwarding table with overlay and underlay data, both learned
   from L3ND peers and configured locally.

15.  Security Considerations

   For TLS, versions greater than 1.1 MUST be used.

   The protocol as is MUST NOT be used outside a datacenter or
   similarly closed environment without using TLS encapsulation which
   is based on a configured CA trust anchor.

   Many datacenter operators have a strange belief that physical walls
   and firewalls provide sufficient security.  This is not credible.
   All DC protocols need to be examined for exposure and attack
   surface.  In the case of L3ND, authentication and integrity as
   provided by TLS validated to a configured shared CA trust anchor are
   strongly RECOMMENDED.

   It is generally unwise to assume that Layer-3 on the wire is secure.
   Strange/unauthorized devices may plug into a port.
   Mis-wiring is very common in datacenter installations.  A poisoned
   laptop might be plugged into a device's port, form malicious
   sessions, etc., to divert, intercept, or drop traffic.

   Similarly, malicious nodes/devices could mis-announce addressing.

   If OPEN PDUs are not sent over validated TLS, an attacker could
   forge an OPEN for an existing session and cause the session to be
   reset.

16.  IANA Considerations

16.1.  Link Local Layer-3 Addresses

   IANA is requested to assign one address (TBD1) for L3DL-L3-LL from
   the IPv4 Multicast Address Space Registry from the Local Network
   Control Block (224.0.0.0 - 224.0.0.255 (224.0.0/24)).

   IANA is requested to assign one address (TBD2) for L3DL-L3-LL from
   the IPv6 Multicast Address Space Registry in the IPv6 Link-Local
   Scope Multicast address (TBD:2).

16.2.  Ports for TLS/TCP

   This document requests that IANA assign a well-known TCP Port Number
   (TBD3) to the Layer-3 Neighbor Discovery Protocol for the following
   (see Section 7):

      l3nd-server

16.3.  PDU Types

   This document requests that IANA create a registry for L3ND PDU
   Type, which may range from 0 to 255.  The name of the registry
   should be L3ND-PDU-Type.  The policy for adding to the registry is
   RFC Required per [RFC5226], either standards track or experimental.
   The initial entries should be the following:

      PDU
      Code     PDU Name
      -------  ----------------------
      0        HELLO
      1        Reserved
      2        OPEN
      3        ACK
      4        IPv4 Announcement
      5        IPv6 Announcement
      6        MPLS IPv4 Announcement
      7        MPLS IPv6 Announcement
      8-254    Reserved
      255      Vendor

16.4.  Flag Bits

   This document requests that IANA create a registry for L3ND PL Flag
   Bits, which may range from 0 to 7.  The name of the registry should
   be L3ND-PL-Flag-Bits.  The policy for adding to the registry is RFC
   Required per [RFC5226], either standards track or experimental.
   The initial entries should be the following:

      Bit      Bit Name
      -------  -----------------------------
      0        Announce/Withdraw (ann == 0)
      1        Primary
      2        Underlay/Overlay (under == 0)
      3        Loopback
      4-7      Reserved

16.5.  Error Codes

   This document requests that IANA create a registry for L3ND Error
   Codes, a 16-bit integer.  The name of the registry should be
   L3ND-Error-Codes.  The policy for adding to the registry is RFC
   Required per [RFC5226], either standards track or experimental.  The
   initial entries should be the following:

      Error
      Code     Error Name
      -------  --------------------------------
      0        No Error
      1        MPLS Error
      2        Logical Link Addressing Conflict
      3        Encapsulation Error
      4        Announce/Withdraw Error
      5        Session May Not Be Continued

17.  Acknowledgments

   The authors thank Ben Maddison and Jeff Haas.

18.  References

18.1.  Normative References

   [I-D.ietf-lsvr-l3dl]
              Bush, R., Austein, R., and K. Patel, "Layer-3 Discovery
              and Liveness", Work in Progress, Internet-Draft,
              draft-ietf-lsvr-l3dl-08, 14 October 2021.

   [IANA-PEN] "IANA Private Enterprise Numbers".

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
              Encoding", RFC 3032, DOI 10.17487/RFC3032, January 2001.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC5082]  Gill, V., Heasley, J., Meyer, D., Savola, P., Ed., and C.
              Pignataro, "The Generalized TTL Security Mechanism
              (GTSM)", RFC 5082, DOI 10.17487/RFC5082, October 2007.

   [RFC5226]  Narten, T. and H.
              Alvestrand, "Guidelines for Writing an IANA
              Considerations Section in RFCs", RFC 5226,
              DOI 10.17487/RFC5226, May 2008.

   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
              Housley, R., and W. Polk, "Internet X.509 Public Key
              Infrastructure Certificate and Certificate Revocation List
              (CRL) Profile", RFC 5280, DOI 10.17487/RFC5280, May 2008.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8635]  Bush, R., Turner, S., and K. Patel, "Router Keying for
              BGPsec", RFC 8635, DOI 10.17487/RFC8635, August 2019.

18.2.  Informative References

   [Clos]     "Clos Network".

   [I-D.ymbk-idr-l3nd-ulpc]
              Bush, R. and K. Patel, "L3ND Upper-Layer Protocol
              Configuration", Work in Progress, Internet-Draft,
              draft-ymbk-idr-l3nd-ulpc-04, 21 March 2022.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122,
              DOI 10.17487/RFC1122, October 1989.

   [RFC1982]  Elz, R. and R. Bush, "Serial Number Arithmetic", RFC 1982,
              DOI 10.17487/RFC1982, August 1996.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              DOI 10.17487/RFC4760, January 2007.
Authors' Addresses

   Randy Bush
   Arrcus & Internet Initiative Japan
   5147 Crystal Springs
   Bainbridge Island, WA 98110
   United States of America
   Email: randy@psg.com

   Russ Housley
   Vigil Security, LLC
   516 Dranesville Road
   Herndon, VA 20170
   United States of America
   Email: housley@vigilsec.com

   Rob Austein
   Arrcus, Inc
   Email: sra@hactrn.net

   Susan Hares
   Hickory Hill Consulting
   7453 Hickory Hill
   Saline, MI 48176
   United States of America
   Phone: +1-734-604-0332
   Email: shares@ndzh.com

   Keyur Patel
   Arrcus
   2077 Gateway Place, Suite #400
   San Jose, CA 95119
   United States of America
   Email: keyur@arrcus.com