idnits 2.17.00 (12 Aug 2021)

/tmp/idnits14921/draft-ietf-nvo3-encap-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 29, 2021) is 296 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

NVO3 Workgroup                                           S. Boutros, Ed.
Internet-Draft                                                     Ciena
Intended Status: Informational                          D. Eastlake, Ed.
                                                               Futurewei
Expires: January 28, 2022                                  July 29, 2021

                   NVO3 Encapsulation Considerations
                        draft-ietf-nvo3-encap-07

Abstract

   As communicated by the WG Chairs, the IETF NVO3 chairs and Routing
   Area director have chartered a design team to take forward the
   encapsulation discussion and see if there is potential to design a
   common encapsulation that addresses the various technical concerns.

   There are implications of different encapsulations in real
   environments consisting of both software and hardware implementations
   and spanning multiple data centers.
   For example, OAM functions such as path MTU discovery become
   challenging with multiple encapsulations along the data path.

   The design team recommends Geneve with a few modifications as the
   common encapsulation.  This document provides more details,
   particularly in Section 7.

Status of This Document

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Distribution of this document is unlimited.  Comments should be sent
   to the authors or the IDR Working Group mailing list.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   https://www.ietf.org/1id-abstracts.html.  The list of Internet-Draft
   Shadow Directories can be accessed at
   https://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Design Team Goals
   3. Terminology
   4. Abbreviations and Acronyms

   5. Issues with Current Encapsulations
   5.1. Geneve
   5.2. GUE (Generic UDP Encapsulation)
   5.3. VXLAN-GPE

   6. Common Encapsulation Considerations
   6.1. Current Encapsulations
   6.2. Useful Extensions Use Cases
   6.2.1. Telemetry Extensions
   6.2.2. Security/Integrity Extensions
   6.2.3 Group Based Policy
   6.3. Hardware Considerations
   6.4. Extension Size
   6.5. Ordering of Extension Headers
   6.6. TLV versus Bit Fields
   6.7. Control Plane Considerations
   6.8. Split NVE
   6.9. Larger VNI Considerations

   7. Design Team Recommendations

   8. Acknowledgements
   9. Security Considerations
   10. IANA Considerations

   11. References
   11.1 Normative References
   11.2 Informative References

   Appendix A: Encapsulations Comparison
   A.1. Overview
   A.2. Extensibility
   A.2.1. Native Extensibility Support
   A.2.2. Extension Parsing
   A.2.3. Critical Extensions
   A.2.4. Maximal Header Length
   A.3. Encapsulation Header
   A.3.1. Virtual Network Identifier (VNI)
   A.3.2. Next Protocol
   A.3.3. Other Header Fields
   A.4. Comparison Summary

   Contributors

1. Introduction

   As communicated by the WG Chairs, the NVO3 WG Charter states that it
   may produce requirements for network virtualization data planes based
   on encapsulation of virtual network traffic over an IP-based underlay
   data plane.  Such requirements should consider OAM and security.
   Based on these requirements the WG will select, extend, and/or
   develop one or more data plane encapsulation format(s).

   This has led to WG drafts and an RFC describing three encapsulations
   as follows:

   - [RFC8926] Geneve: Generic Network Virtualization Encapsulation

   - [I-D.ietf-intarea-gue] Generic UDP Encapsulation

   - [I-D.ietf-nvo3-vxlan-gpe] Generic Protocol Extension for VXLAN
     (VXLAN-GPE)

   Discussion on the list and in face-to-face meetings has identified a
   number of technical problems with each of these encapsulations.
   Furthermore, there was clear consensus at the 96th IETF meeting in
   Berlin that it is undesirable for the working group to progress more
   than one data plane encapsulation.  Although consensus could not be
   reached on the list, the overall consensus was for a single
   encapsulation ([RFC2418], Section 3.3).

   Nonetheless there has been resistance to converging on a single
   encapsulation format.

2. Design Team Goals

   As communicated by the WG Chairs, the design team (DT) should take
   one of the proposed encapsulations and enhance it to address the
   technical concerns.  The simple evolution of deployed networks as
   well as applicability to all locations in the NVO3 architecture are
   goals.  The DT should specifically avoid a design that is burdensome
   on hardware implementations but should allow future extensibility.
   The chosen design should also operate well with ICMP and in ECMP
   environments.  If further extensibility is required, then it should
   be done in such a manner that it does not require the consent of an
   entity outside of the IETF.

3. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

4. Abbreviations and Acronyms

   DT     NVO3 encapsulation Design Team

   EVPN   Ethernet VPN [RFC8365]

   GUE    Generic UDP Encapsulation [I-D.ietf-intarea-gue]

   NVO3   Network Virtualization Overlays over Layer 3

   OAM    Operations, Administration, and Maintenance

   TLV    Type, Length, and Value

   VNI    Virtual Network Identifier

   NVE    Network Virtualization Edge

   NVA    Network Virtualization Authority

   NIC    Network interface card

   TCAM   Ternary Content-Addressable Memory

   Transit device - Underlay network devices between NVE(s).

5. Issues with Current Encapsulations

   The following subsections describe issues with current encapsulations
   as summarized by the WG Chairs:

5.1. Geneve

   - Can't be implemented cost-effectively in all use cases because the
     variable length header and the order of the TLVs make it costly (in
     terms of number of gates) to implement in hardware.

   - Header doesn't fit into largest commonly available parse buffer
     (256 bytes in NIC).  Cannot justify doubling buffer size unless it
     is mandatory for hardware to process additional option fields.

5.2. GUE (Generic UDP Encapsulation)

   - There were a significant number of objections to GUE
     [I-D.ietf-intarea-gue] related to the complexity of implementation
     in hardware, similar to those noted for Geneve above.

5.3. VXLAN-GPE

   - GPE is not day-1 backwards compatible with VXLAN [RFC7348].
     Although the frame format is similar, it uses a different UDP port,
     so it would require changes to existing implementations even if the
     rest of the GPE frame is the same.

   - GPE is insufficiently extensible.  Numerous extensions and options
     have been designed for GUE and Geneve.  Note that these have not
     yet been validated by the WG.

   - Security, e.g., of the VNI, has not been addressed by GPE.
     Although a shim header could be used for security and other
     extensions, this has not been defined yet and its implications on
     offloading in NICs are not understood.

6. Common Encapsulation Considerations

6.1. Current Encapsulations

   Appendix A includes a detailed comparison between the three proposed
   encapsulations.  The comparison indicates several common properties
   but also three major differences among the encapsulations:

   - Extensibility: Geneve and GUE were defined with built-in
     extensibility, while VXLAN-GPE is not inherently extensible.  Note
     that any of the three encapsulations can be extended using the
     Network Service Header (NSH [RFC8300]).

   - Extension method: Geneve is extensible using Type/Length/Value
     (TLV) fields, while GUE uses a small set of possible extensions,
     and a set of flags that indicate which extensions are present.

   - Length field: Geneve and GUE include a Length field, indicating the
     length of the encapsulation header, while VXLAN-GPE does not
     include such a field.

6.2. Useful Extensions Use Cases

   Non-vendor-specific TLVs MUST follow the standardization process.
   The following use cases for extensions show that there is a strong
   requirement to support variable length extensions with possibly
   different subtypes.

6.2.1. Telemetry Extensions

   In several scenarios it is beneficial to make information about the
   path a packet took through the network or through a network device as
   well as associated telemetry information available to the operator.

   This includes not only tasks like debugging, troubleshooting, and
   network planning and optimization but also policy or service level
   agreement compliance checks.

   Packet scheduling algorithms, especially for balancing traffic across
   equal cost paths or links, often leverage information contained
   within the packet, such as protocol number, IP address, or MAC
   address.  Probe packets would thus either need to be sent between the
   exact same endpoints with the exact same parameters, or probe packets
   would need to be artificially constructed as "fake" packets and
   inserted along the path.  Both approaches are often not feasible from
   an operational perspective, be it that access to the end-system is
   not feasible, or that the diversity of parameters and associated
   probe packets to be created is simply too large.  An extension
   providing an in-band telemetry mechanism [I-D.ietf-ippm-ioam-data] is
   an alternative in those cases.

6.2.2. Security/Integrity Extensions

   Since the currently proposed NVO3 encapsulations do not protect their
   headers, a single bit corruption in the VNI field could deliver a
   packet to the wrong tenant.  Extension headers are needed to use any
   sophisticated security.
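
   The single-bit corruption risk described above can be illustrated
   with a short sketch.  It is not part of any NVO3 specification: the
   8-byte toy header, the pre-shared key, and the use of HMAC-SHA256 as
   the integrity value are all assumptions made for illustration only.

```python
import hashlib
import hmac
import struct


def build_header(vni: int) -> bytes:
    """Toy 8-byte header (assumed layout): 4 bytes of flags/protocol,
    then a 24-bit VNI followed by 8 reserved bits."""
    return struct.pack("!II", 0x00006558, vni << 8)


def read_vni(header: bytes) -> int:
    """Extract the 24-bit VNI from bytes 4..6 of the toy header."""
    return struct.unpack("!I", header[4:8])[0] >> 8


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (bit 0 is the MSB of byte 0)."""
    out = bytearray(data)
    out[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(out)


def integrity_tag(key: bytes, header: bytes) -> bytes:
    """Hypothetical integrity extension value: HMAC-SHA256 over the header."""
    return hmac.new(key, header, hashlib.sha256).digest()


key = b"shared-secret-between-NVEs"  # assumption: key is pre-shared
original = build_header(vni=0x00ABCD)
tag = integrity_tag(key, original)

# Flip the lowest-order bit of the VNI (bit 23 of the second 32-bit word).
corrupted = flip_bit(original, 4 * 8 + 23)

vni_before = read_vni(original)
vni_after = read_vni(corrupted)
# Without a tag the corrupted header parses cleanly as another tenant's VNI;
# with the tag, the receiver can detect the mismatch.
tag_matches = hmac.compare_digest(tag, integrity_tag(key, corrupted))
```

   In this sketch the corrupted header still parses as a well-formed
   header carrying a different VNI, and only the comparison of the
   integrity value reveals the change.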

   The possibility of VNI spoofing with an NVO3 protocol is exacerbated
   by using UDP.  Systems typically have no restrictions on applications
   being able to send to any UDP port, so an unprivileged application
   can trivially spoof VXLAN [RFC7348] packets, for instance, including
   using arbitrary VNIs.

   One can envision HMAC-like support in an NVO3 extension to
   authenticate the header and the outer IP addresses, thereby
   preventing attackers from injecting packets with spoofed VNIs.

   Another aspect of security is payload security.  Essentially this is
   to make packets that look like IP|UDP|NVO3 Encap|DTLS/IPSEC-ESP
   Extension|payload.  This is desirable since we still have the UDP
   header for ECMP, the NVO3 header is in plain text so it can be read
   by network elements, and different security or other payload
   transforms can be supported on a single UDP port (we don't need a
   separate UDP port for DTLS/IPSEC).

6.2.3 Group Based Policy

   Another use case would be to carry the Group Based Policy (GBP)
   source group information within an NVO3 header extension in a similar
   manner as has been implemented for VXLAN
   [I-D.smith-vxlan-group-policy].  This allows various forms of policy
   such as access control and QoS to be applied between abstract groups
   rather than coupled to specific endpoint addresses.

6.3. Hardware Considerations

   Hardware restrictions should be taken into consideration along with
   future hardware enhancements that may provide more flexible metadata
   processing.  However, the set of options that need to and will be
   implemented in hardware will be a subset of what is implemented in
   software, since software NVEs are likely to grow features, and hence
   option support, at a more rapid rate.

   We note that it is hard to predict which options will be implemented
   in which piece of hardware and when.
   That depends on whether the hardware will be in the form of a NIC
   providing increasing offload capabilities to software NVEs, or a
   switch chip being used as an NVE gateway towards non-NVO3 parts of
   the network, or even a transit device that participates in the NVO3
   dataplane, e.g., for OAM purposes.

   A result of this is that it doesn't look useful to prescribe some
   order of the options so that the ones that are likely to be
   implemented in hardware come first; we can't decide such an order
   when we define the options.  However, a control plane can enforce
   such an order for some hardware implementation.

   We do know that hardware needs to initially be able to efficiently
   skip over the NVO3 header to find the inner payload.  That is needed
   both for NICs implementing various TCP offload mechanisms and for
   transit devices and NVEs applying policy/ACLs to the inner payload.

6.4. Extension Size

   Extension header length has a significant impact on hardware and
   software implementations.  A total header length that is too small
   will unnecessarily constrain software flexibility.  A total header
   length that is too large will place a nontrivial cost on hardware
   implementations.  Thus, the DT recommends that there be a minimum and
   maximum total extension header length specified.  The maximum total
   header length is determined by the size of the bit field allocated
   for the total extension header length field.  The risk with this
   approach is that it may be difficult to extend the total header size
   in the future.  The minimum total header length is determined by a
   requirement in the specifications that all implementations must meet.
   The risk with this approach is that all implementations will only
   implement the minimum total header length which would then become the
   de facto maximum total header length.  The recommended minimum total
   header length is 64 bytes.
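
   As a worked illustration of how the length field bounds the maximum,
   the sketch below computes the largest encodable total extension
   length for a given field width.  The 6-bit width and 4-byte unit used
   in the example are the Geneve Opt Len parameters from [RFC8926]; the
   function names and the three-way classification are hypothetical and
   are not taken from any specification.

```python
RECOMMENDED_MIN_BYTES = 64  # minimum total length recommended in the text above


def max_total_length(field_bits: int, unit_bytes: int) -> int:
    """Largest total extension length a length field can encode."""
    return ((1 << field_bits) - 1) * unit_bytes


def classify(total_len: int, field_bits: int, unit_bytes: int) -> str:
    """Hypothetical classification of a sender's total extension length."""
    if total_len > max_total_length(field_bits, unit_bytes):
        return "unencodable"            # exceeds what the length field can express
    if total_len <= RECOMMENDED_MIN_BYTES:
        return "always-supported"       # within the minimum every receiver must handle
    return "implementation-dependent"   # legal, but a given receiver may not cope


# Geneve Opt Len per RFC 8926: 6 bits, counted in 4-byte multiples.
geneve_max = max_total_length(field_bits=6, unit_bytes=4)
```

   With these parameters the maximum is 252 bytes of options, so a
   sender that stays at or below 64 bytes is always safe, while anything
   between 64 and 252 bytes depends on what the receiver implements.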

   The size of an extension header should always be 4-byte aligned.

   The maximum length of a single option should be large enough to meet
   the different extension use case requirements, e.g., in-band
   telemetry and future use.

6.5. Ordering of Extension Headers

   To support hardware nodes at the target NVE or at a transit device
   that can process one or a few extension headers in TCAM, a control
   plane in such a deployment can signal a capability to ensure a
   specific extension header will always appear in a specific order, for
   example the first one in the packet.

   The order of the extension headers should be hardware friendly for
   both the sender and the receiver and possibly the transit device
   also.

   Transit devices don't participate in control plane communication
   between the end points and are not required to process the extension
   headers; however, if they do, they may need to process only a small
   subset of extension headers that will be consumed by target NVEs.

6.6. TLV versus Bit Fields

   If there is a well-known initial set of options that are likely to be
   implemented in software and in hardware, it can be efficient to use
   the bit fields approach as in GUE.  However, as described in section
   6.3, if options are added over time and different subsets of options
   are likely to be implemented in different pieces of hardware, then it
   would be hard for the IETF to specify which options should get the
   early bit fields.  TLVs are a lot more flexible, which avoids the
   need to determine the relative importance of different options.
   However, general TLVs with arbitrary order, size, and repetition are
   difficult to implement in hardware.
   A middle ground is to use TLVs with restrictions on their size and
   alignment, observing that individual TLVs can have a fixed length,
   and support via the control plane a method such that an NVE will only
   receive options that it needs and implements.  The control plane
   approach can potentially be used to control the order of the TLVs
   sent to a particular NVE.  Note that transit devices are not likely
   to participate in the control plane; hence, to the extent that they
   need to participate in option processing, some other method must be
   used.  Transit devices would have issues with future GUE bit fields
   being defined for future options as well.

   A benefit of TLVs from a hardware perspective is that they are self
   describing, i.e., all the information is in the TLV.  In a bit field
   approach, the hardware needs to look up the bit to determine the
   length of the data associated with the bit through some separate
   table, which would add hardware complexity.

   There are use cases where multiple modules of software are running on
   an NVE.  These can be modules such as a diagnostic module by one
   vendor that does packet sampling and another module from a different
   vendor that implements a firewall.  Using a TLV format, it is easier
   to have different software modules process different TLVs, which
   could be standard extensions or vendor specific extensions defined by
   the different vendors, without conflicting with each other.  This can
   help with hardware modularity as well.  There are some
   implementations with options that allow different software modules,
   like MAC learning and security, to process different options.

6.7. Control Plane Considerations

   Given that we want to allow considerable flexibility and
   extensibility for, e.g., software NVEs, yet be able to support
   important extensions in less flexible contexts such as hardware NVEs,
   it is useful to consider the control plane.  By control plane in this
   section we mean both protocols, such as EVPN [RFC8365] and others,
   and deployment specific configuration.

   If each NVE can express in the control plane that it only supports
   certain extensions (could be a single extension, or a few), and the
   source NVEs only include supported extensions in the NVO3 packets,
   then the target NVE can use a simpler parser (e.g., a TCAM might be
   usable to look for a single NVO3 extension) and the depth of the
   inner payload in the NVO3 packet will be minimized.  Furthermore, if
   the target NVE cares about a few extensions and can express in the
   control plane the desired order of those extensions in the NVO3
   packets, then it can provide useful functionality with simplified
   hardware requirements for the target NVE.

   Note that transit devices that are not aware of the NVO3 extensions
   somewhat benefit from such an approach, since the inner payload is
   less deep in the packet if no extraneous extension headers are
   included in the packet.  In general, a transit device is not likely
   to participate in the NVO3 control plane.  (However, configuration
   mechanisms can take into account limitations of the transit devices
   used in particular deployments.)

   Note that with this approach different NVEs could desire different
   extensions or sets of extensions, which means that the source NVE
   needs to be able to place different sets of extensions in different
   NVO3 packets, and perhaps in different order.  It also assumes that
   underlay multicast or replication servers are not used together with
   NVO3 extension headers.
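
   The per-target selection and ordering described above can be sketched
   as follows.  This is only an illustration: the capability table, the
   NVE names, and the extension names are hypothetical, and no control
   plane protocol currently defines such a table in this form.

```python
from typing import Dict, List, Tuple

# Hypothetical view of what each target NVE advertised through the control
# plane: the extensions it supports, in the order it wants to receive them.
advertised: Dict[str, List[str]] = {
    "nve-hw": ["integrity"],                      # e.g., a TCAM-based parser
    "nve-sw": ["telemetry", "integrity", "gbp"],  # e.g., a software NVE
}

# Extensions the source NVE is able to attach to a given packet.
available: Dict[str, bytes] = {
    "gbp": b"\x00\x2a",
    "integrity": b"\xaa\xbb\xcc\xdd",
    "telemetry": b"\x01\x02\x03\x04",
}


def select_extensions(target: str) -> List[Tuple[str, bytes]]:
    """Return only the extensions the target supports, in its advertised order.

    An unknown target gets no extensions, which keeps the inner payload
    as shallow as possible.
    """
    wanted = advertised.get(target, [])
    return [(name, available[name]) for name in wanted if name in available]


hw_names = [name for name, _ in select_extensions("nve-hw")]
sw_names = [name for name, _ in select_extensions("nve-sw")]
unknown_names = [name for name, _ in select_extensions("nve-unknown")]
```

   The same source NVE therefore emits different extension sets, in
   different orders, depending on the target NVE of each packet.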

   There is a need to consider mandatory extensions versus optional
   extensions.  Mandatory extensions require the receiver to drop the
   packet if the extension is unknown.  A control plane mechanism can
   prevent the need for dropping packets with unknown extensions, since
   such extensions would not be sent to target NVEs that do not support
   them.

   The control planes defined today need to add the ability to describe
   the different encapsulations.  Thus, perhaps EVPN [RFC8365] and any
   other control plane protocol that the IETF defines should have a way
   to indicate the supported NVO3 extensions and their order, for each
   of the encapsulations supported.

   The WG should consider developing a separate draft on guidance for
   option processing and control plane participation.  This should
   provide examples/guidance on a range of usage models and deployment
   scenarios for specific options and ordering that are relevant for
   that specific deployment.  This includes end points and middle boxes
   using the options.  So, having the control plane negotiate the
   constraints is the most appropriate and flexible way to address these
   requirements.

6.8. Split NVE

   If the working group sees a need for having the hosts send and
   receive options in a split NVE case [RFC8394], this is possible using
   any of the existing extensible encapsulations (Geneve, GUE, GPE+NSH)
   by defining a way to carry those over other transports.  NSH can
   already be used over different transports.

   If we need to do this with other encapsulations it can be done by
   defining an Ethertype for other encapsulations so that it can be
   carried over Ethernet and 802.1Q.

   If we need to carry other encapsulations over MPLS, it would require
   an EVPN control plane to signal that other encapsulation header +
   options will be present in front of the L2 packet.
   The VNI can be ignored in the header, and the MPLS label will be the
   one used to identify the EVPN L2 instance.

6.9. Larger VNI Considerations

   We discussed whether we should make the VNI 32 bits or larger.  The
   benefit of a 24-bit VNI would be to avoid unnecessary changes with
   existing proposals and implementations that are almost all, if not
   all, using 24-bit VNI.  If we need a larger VNI, an extension can be
   used to support that.

7. Design Team Recommendations

   We concluded that Geneve is most suitable as a starting point for a
   proposed standard for network virtualization, for the following
   reasons:

   1. We studied whether VNI should be in the base header or in an
   extension header and whether it should be a 24-bit or 32-bit field.
   The design team agreed that VNI is critical information for network
   virtualization and MUST be present in all packets.  The design team
   also agreed that a 24-bit VNI matches the existing widely used
   encapsulation formats, i.e., VXLAN [RFC7348] and NVGRE [RFC7637], and
   hence is more suitable to use going forward.

   2. The Geneve header has the total options length which allows
   skipping over the options for NIC offload operations and will allow
   transit devices to view flow information in the inner payload.

   3. We considered the option of using NSH [RFC8300] with VXLAN-GPE
   but given that NSH is targeted at service chaining and contains
   service chaining information, it is less suitable for the network
   virtualization use case.  The other downside for VXLAN-GPE was lack
   of a header length in VXLAN-GPE which makes skipping over the headers
   to process inner payload more difficult.  Total Option Length is
   present in Geneve.  It is not possible to skip any options in the
   middle with VXLAN-GPE.
   In principle a split between a base header and a header with options
   is interesting (whether that options header is NSH or some new header
   without ties to a service path).  We explored whether it would make
   sense to either use NSH for this, or define a new NVO3 options
   header.  However, we observed that this makes it slightly harder to
   find the inner payload since the length field is not in the NVO3
   header itself.  Thus, one more field would have to be extracted to
   compute the start of the inner payload.  Also, if the experience with
   IPv6 extension headers is a guide, there would be a risk that key
   pieces of hardware might not implement the options header, resulting
   in future calls to deprecate its use.  Making the options part of the
   base NVO3 header has fewer of those issues.  Even though the
   implementation of any particular option cannot be predicted ahead of
   time, the option mechanism and ability to skip the options is likely
   to be broadly implemented.

   4. We compared the TLV and bit field styles of extension.  Parsing
   either is expensive.  While bit fields may be simpler to parse, they
   are also more restrictive and require guessing which extensions will
   be widely implemented so that those extensions can get early bit
   assignments.  Given that half the bits are already assigned in GUE, a
   widely deployed extension may end up in a flag extension, which
   requires extra processing to dig the flag out of the flag extension
   and then look for the extension itself.  Also, bit fields are not
   flexible enough to address the requirements from OAM, telemetry, and
   security extensions for variable length options and different
   subtypes of the same option.  While TLVs are more flexible, a control
   plane can restrict the number of option TLVs as well as the order and
   size of the TLVs to make them simpler for a dataplane implementation
   to handle.

   5. We briefly discussed the multi-vendor NVE case, and the need to
   allow vendors to put their own extensions in the NVE header.  This is
   possible with TLVs.

   6. We also agreed that the C bit in Geneve is helpful because it
   allows a receiver NVE to easily decide whether or not to process
   options.  For example, an optional extension such as a UUID based
   packet trace can be ignored by a receiver NVE, which makes it easy
   for the NVE to skip over the options.  Thus, the C bit remains as
   defined in Geneve.

   7. There are already some extensions of varying sizes that are being
   discussed (see section 6.2).  By using Geneve options it is possible
   to get in-band parameters such as switch id, ingress port, egress
   port, internal delay, and queue from switches in a telemetry
   extension TLV.  It is also possible to add security extension TLVs
   like HMAC and DTLS/IPSEC to authenticate the Geneve packet header and
   secure the Geneve packet payload by software or hardware tunnel
   endpoints.  A Group Based Policy extension TLV can be carried as
   well.

   8. There are already implementations of Geneve options deployed in
   production networks as of this writing.  There is also new hardware
   supporting Geneve TLV parsing.  In addition, an In-band Telemetry
   [INT] specification is being developed by P4.org that illustrates the
   option of INT metadata carried over Geneve.  OVN/OVS have also
   defined some option TLV(s) for Geneve.

   9. The DT has addressed the usage models while considering the
   requirements and implementations in general, including software and
   hardware.

   There seems to be interest in standardizing some well-known secure
   option TLVs to secure the header and payload to guarantee
   encapsulation header integrity and tenant data privacy.  The design
   team recommends that the working group consider standardizing such
   option(s).
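
   Items 2 and 6 above rely on the total options length and the C bit.
   The sketch below shows how a receiver could use them, assuming the
   base header and option layout of [RFC8926] (2-bit version, 6-bit Opt
   Len in 4-byte multiples, O and C flags, 16-bit protocol type, 24-bit
   VNI; options with a 16-bit class, 8-bit type, and 5-bit length in
   4-byte multiples).  The function and type names are hypothetical, and
   the sketch omits the checks a production parser would need.

```python
import struct
from typing import Iterator, NamedTuple, Tuple

BASE_LEN = 8  # fixed Geneve base header size in bytes


class GeneveHeader(NamedTuple):
    version: int
    oam: bool
    critical: bool        # C bit: at least one critical option is present
    protocol: int
    vni: int
    payload_offset: int   # where the inner payload starts


def parse_base(packet: bytes) -> GeneveHeader:
    """Parse the base header and locate the inner payload via Opt Len."""
    if len(packet) < BASE_LEN:
        raise ValueError("truncated base header")
    first, flags, protocol, vni_rsvd = struct.unpack("!BBHI", packet[:BASE_LEN])
    payload_offset = BASE_LEN + (first & 0x3F) * 4  # Opt Len is in 4-byte units
    if len(packet) < payload_offset:
        raise ValueError("truncated options")
    return GeneveHeader(first >> 6, bool(flags & 0x80), bool(flags & 0x40),
                        protocol, vni_rsvd >> 8, payload_offset)


def iter_options(packet: bytes, hdr: GeneveHeader) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (class, type, data) per option; only needed if options are processed."""
    offset = BASE_LEN
    while offset < hdr.payload_offset:
        opt_class, opt_type, rsvd_len = struct.unpack("!HBB", packet[offset:offset + 4])
        end = offset + 4 + (rsvd_len & 0x1F) * 4
        if end > hdr.payload_offset:
            raise ValueError("option overruns total options length")
        yield opt_class, opt_type, packet[offset + 4:end]
        offset = end


# Example packet: Opt Len = 2 (8 bytes of options), C bit set, VNI 0x123456.
example = (struct.pack("!BBHI", 0x02, 0x40, 0x6558, 0x123456 << 8)
           + struct.pack("!HBB", 0x0102, 0x80, 0x01) + b"\xde\xad\xbe\xef"
           + b"inner")
hdr = parse_base(example)
inner = example[hdr.payload_offset:]  # found without walking the options
options = list(iter_options(example, hdr))
```

   A receiver that sees the C bit clear could use payload_offset alone
   and never call iter_options, which is the skip-over behavior that the
   design team considered important for NIC offload and transit devices.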
   We recommend the following enhancements to Geneve to make it more
   suitable for hardware while still providing flexibility for software:

      We would propose text such as: while TLVs are more flexible, a
      control plane can restrict the number of option TLVs as well as
      the order and size of the TLVs to make them simpler for a data
      plane implementation in software or hardware to handle. For
      example, there may be some critical information such as a secure
      hash that must be processed in a certain order at lowest latency.

      A control plane can negotiate a subset of option TLVs and a
      certain TLV ordering, and can limit the total number of option
      TLVs present in the packet, for example, to allow for hardware
      capable of processing fewer options. Hence, the control plane
      needs to have the ability to describe the supported TLV subset
      and its order.

      The Geneve draft should specify that the subset and order of
      option TLVs should be configurable for each remote NVE in the
      absence of a protocol control plane.

      We recommend that Geneve follow fragmentation recommendations in
      overlay services like PWE3 and the L2/L3 VPN recommendations to
      guarantee a larger MTU for the tunnel overhead ([RFC3985] Section
      5.3).

      We request that Geneve provide a recommendation for critical bit
      processing; the text could specify how critical bits can be used,
      with the control plane specifying the critical options.

      Given that there is a telemetry option use case for a length of
      256 bytes, we recommend that Geneve increase the single TLV
      option length to 256 bytes.

      We request that Geneve address the requirements for OAM
      considerations for alternate marking and for performance
      measurements that need a 2-bit field in the header, and clarify
      the need for the current OAM bit in the Geneve header.

      We recommend that the WG work on security options for Geneve.

8. Acknowledgements

   The authors would like to thank Tom Herbert for providing the
   motivation for the Security/Integrity extension and for his valuable
   comments; T. Sridhar for his valuable comments and feedback; and
   Anoop Ghanwani for his extensive comments.

9. Security Considerations

   This document does not introduce any additional security constraints.

10. IANA Considerations

   This document requires no IANA actions.

11. References

11.1 Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, DOI
             10.17487/RFC2119, March 1997.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119
             Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May
             2017.

11.2 Informative References

   [I-D.herbert-gue-extensions] Herbert, T., Yong, L., and F. Templin,
             "Extensions for Generic UDP Encapsulation",
             draft-herbert-gue-extensions-01 (work in progress), October
             2016.

   [I-D.ietf-intarea-gue] Herbert, T., Yong, L., and O. Zia, "Generic
             UDP Encapsulation", draft-ietf-intarea-gue (work in
             progress), October 2019.

   [I-D.ietf-ippm-ioam-data] Brockners, F., Bhandari, S., and T.
             Mizrahi, "Data Fields for In-situ OAM",
             draft-ietf-ippm-ioam-data (work in progress), June 2021.

   [I-D.ietf-nvo3-vxlan-gpe] Maino, F., Kreeger, L., and U. Elzur,
             "Generic Protocol Extension for VXLAN",
             draft-ietf-nvo3-vxlan-gpe (work in progress), March 2021.

   [I-D.smith-vxlan-group-policy] Smith, M. and L. Kreeger, "VXLAN Group
             Policy Option", draft-smith-vxlan-group-policy-05 (work in
             progress), October 2018.

   [INT]     P4.org, "In-band Network Telemetry (INT) Dataplane
             Specification", November 2020,
             https://p4.org/p4-spec/docs/INT_v2_1.pdf

   [RFC2418] Bradner, S., "IETF Working Group Guidelines and
             Procedures", BCP 25, RFC 2418, DOI 10.17487/RFC2418,
             September 1998.

   [RFC3985] Bryant, S., Ed. and P.
             Pate, Ed., "Pseudo Wire Emulation
             Edge-to-Edge (PWE3) Architecture", RFC 3985, DOI
             10.17487/RFC3985, March 2005.

   [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
             L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
             eXtensible Local Area Network (VXLAN): A Framework for
             Overlaying Virtualized Layer 2 Networks over Layer 3
             Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

   [RFC7637] Garg, P., Ed., and Y. Wang, Ed., "NVGRE: Network
             Virtualization Using Generic Routing Encapsulation", RFC
             7637, DOI 10.17487/RFC7637, September 2015.

   [RFC8300] Quinn, P., Ed., Elzur, U., Ed., and C. Pignataro, Ed.,
             "Network Service Header (NSH)", RFC 8300, DOI
             10.17487/RFC8300, January 2018.

   [RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R.,
             Uttaro, J., and W. Henderickx, "A Network Virtualization
             Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, DOI
             10.17487/RFC8365, March 2018.

   [RFC8394] Li, Y., Eastlake 3rd, D., Kreeger, L., Narten, T., and D.
             Black, "Split Network Virtualization Edge (Split-NVE)
             Control-Plane Requirements", RFC 8394, DOI
             10.17487/RFC8394, May 2018.

   [RFC8926] Gross, J., Ed., Ganga, I., Ed., and T. Sridhar, Ed.,
             "Geneve: Generic Network Virtualization Encapsulation", RFC
             8926, DOI 10.17487/RFC8926, November 2020.

Appendix A: Encapsulations Comparison

A.1. Overview

   This section presents a comparison of the three NVO3 encapsulation
   proposals, Geneve, GUE, and VXLAN-GPE. The three encapsulations use
   an outer UDP/IP transport. Geneve and VXLAN-GPE use an 8-octet
   header, while GUE uses a 4-octet header. In addition to the base
   header, optional extensions may be included in the encapsulation, as
   discussed in Section A.2 below.

A.2. Extensibility

A.2.1. Native Extensibility Support

   The Geneve and GUE encapsulations both enable optional headers to be
   incorporated at the end of the base encapsulation header.

   VXLAN-GPE does not provide native support for header extensions.
   However, as discussed in [I-D.ietf-nvo3-vxlan-gpe], extensibility can
   be attained to some extent if the Network Service Header (NSH)
   [RFC8300] is used immediately following the VXLAN-GPE header. NSH
   supports either a fixed-size extension (MD Type 1), or a variable-
   size TLV-based extension (MD Type 2). It should be noted that NSH-
   over-VXLAN-GPE implies an additional overhead of the 8-octet NSH
   header, in addition to the VXLAN-GPE header.

A.2.2. Extension Parsing

   The Geneve Variable Length Options are defined as Type/Length/Value
   (TLV) extensions. Similarly, VXLAN-GPE, when using NSH, can include
   NSH TLV-based extensions. In contrast, GUE defines a small set of
   possible extension fields (proposed in [I-D.herbert-gue-extensions]),
   and a set of flags in the GUE header that indicate for each extension
   type whether it is present or not.

   TLV-based extensions, as defined in Geneve, provide the flexibility
   for a large number of possible extension types. Similar behavior can
   be supported in NSH-over-VXLAN-GPE when using MD Type 2. The flag-
   based approach taken in GUE strives to simplify implementations by
   defining a small number of possible extensions used in a fixed order.

   The Geneve and GUE headers both include a length field, defining the
   total length of the encapsulation, including the optional extensions.

   The length field simplifies parsing by transit devices that skip
   the encapsulation header without parsing its extensions.

A.2.3. Critical Extensions

   The Geneve encapsulation header includes the 'C' field, which
   indicates whether the current Geneve header includes critical
   options, that is to say, options which must be parsed by the target
   NVE. If the endpoint is not able to process a critical option, the
   packet is discarded.

A.2.4. Maximal Header Length

   The maximal header length in Geneve, including options, is 260
   octets. GUE defines the maximal header to be 128 octets. VXLAN-GPE
   uses a fixed-length header of 8 octets, unless NSH-over-VXLAN-GPE is
   used, yielding an encapsulation header of up to 264 octets.

A.3. Encapsulation Header

A.3.1. Virtual Network Identifier (VNI)

   The Geneve and VXLAN-GPE headers both include a 24-bit VNI field.
   GUE, on the other hand, enables the use of a 32-bit field called
   VNID; this field is not included in the GUE header, but was defined
   as an optional extension in [I-D.herbert-gue-extensions].

   The VXLAN-GPE header includes the 'I' bit, indicating that the VNI
   field is valid in the current header. A similar indicator is defined
   as a flag in the GUE header [I-D.herbert-gue-extensions].

A.3.2. Next Protocol

   Each of the three encapsulation headers includes a field that
   specifies the type of the next protocol header, which resides after
   the NVO3 encapsulation header. The Geneve header includes a 16-bit
   field that uses the IEEE Ethertype convention. GUE uses an 8-bit
   field, which uses the IANA Internet protocol numbering. The
   VXLAN-GPE header incorporates an 8-bit Next Protocol field, using a
   VXLAN-GPE-specific registry, defined in [I-D.ietf-nvo3-vxlan-gpe].

   The VXLAN-GPE header also includes the 'P' bit, which explicitly
   indicates whether the Next Protocol field is present in the current
   header.

A.3.3. Other Header Fields

   The OAM bit, which is defined in Geneve and in VXLAN-GPE, indicates
   whether the current packet is an OAM packet.
   The GUE header includes
   a similar field, but uses different terminology; the GUE 'C-bit'
   specifies whether the current packet is a control packet. Note that
   the GUE control bit can potentially be used in a large set of
   protocols that are not OAM protocols. However, the control packet
   examples discussed in [I-D.ietf-intarea-gue] are OAM-related.

   Each of the three NVO3 encapsulation headers includes a 2-bit Version
   field, which is currently defined to be zero.

   The Geneve and VXLAN-GPE headers include reserved fields; 14 bits in
   the Geneve header, and 27 bits in the VXLAN-GPE header are reserved.

A.4. Comparison Summary

   The following table summarizes the comparison between the three NVO3
   encapsulations:

   +----------------+----------------+----------------+----------------+
   |                |     Geneve     |      GUE       |   VXLAN-GPE    |
   +----------------+----------------+----------------+----------------+
   | Outer transport|     UDP/IP     |     UDP/IP     |     UDP/IP     |
   +----------------+----------------+----------------+----------------+
   | Base header    |    8 octets    |    4 octets    |    8 octets    |
   | length         |                |                |   (16 octets   |
   |                |                |                |   using NSH)   |
   +----------------+----------------+----------------+----------------+
   | Extensibility  |Variable length |Extension fields|   No native    |
   |                |    options     |                | extensibility. |
   |                |                |                |   Extensible   |
   |                |                |                |   using NSH.   |
   +----------------+----------------+----------------+----------------+
   | Extension      |   TLV-based    |   Flag-based   |   TLV-based    |
   | parsing method |                |                |(using NSH with |
   |                |                |                |   MD Type 2)   |
   +----------------+----------------+----------------+----------------+
   | Extension      |    Variable    |     Fixed      |    Variable    |
   | order          |                |                |  (using NSH)   |
   +----------------+----------------+----------------+----------------+
   | Length field   |       +        |       +        |       -        |
   +----------------+----------------+----------------+----------------+
   | Max Header     |   260 octets   |   128 octets   |    8 octets    |
   | Length         |                |                |(264 using NSH) |
   +----------------+----------------+----------------+----------------+
   | Critical       |       +        |       -        |       -        |
   | extension bit  |                |                |                |
   +----------------+----------------+----------------+----------------+
   | VNI field size |    24 bits     |    32 bits     |    24 bits     |
   |                |                |  (extension)   |                |
   +----------------+----------------+----------------+----------------+
   | Next protocol  |    16 bits     |     8 bits     |     8 bits     |
   | field          |   Ethertype    |    Internet    |  New registry  |
   |                |    registry    |    protocol    |                |
   |                |                |    registry    |                |
   +----------------+----------------+----------------+----------------+
   | Next protocol  |       -        |       -        |       +        |
   | indicator      |                |                |                |
   +----------------+----------------+----------------+----------------+
   | OAM / control  |    OAM bit     |  Control bit   |    OAM bit     |
   | field          |                |                |                |
   +----------------+----------------+----------------+----------------+
   | Version field  |     2 bits     |     2 bits     |     2 bits     |
   +----------------+----------------+----------------+----------------+
   | Reserved bits  |    14 bits     |       -        |    27 bits     |
   +----------------+----------------+----------------+----------------+

              Figure 1: NVO3 Encapsulations Comparison

Contributors

   The following co-authors have contributed to this document:

   Ilango Ganga
   Intel
   Email: ilango.s.ganga@intel.com

   Pankaj Garg
   Microsoft
   Email: pankajg@microsoft.com

   Rajeev Manur
   Broadcom
   Email: rajeev.manur@broadcom.com

   Tal Mizrahi
   Marvell
   Email: talmi@marvell.com

   David Mozes
   Email: mosesster@gmail.com

   Erik Nordmark
   ZEDEDA
   Email: nordmark@sonic.net

   Michael Smith
   Cisco
   Email: michsmit@cisco.com

   Sam Aldrin
   Google
   Email: aldrin.ietf@gmail.com

   Ignas Bagdonas
   Equinix
   Email: ibagdona.ietf@gmail.com

Authors' Addresses

   Sami Boutros (editor)
   Ciena
   USA

   Email: sboutros@ciena.com

   Donald E. Eastlake, 3rd (editor)
   Futurewei Technologies
   2386 Panoramic Circle
   Apopka, FL 32703
   USA

   Tel: +1-508-333-2270
   Email: d3e3e3@gmail.com