idnits 2.17.00 (12 Aug 2021)

/tmp/idnits53150/draft-sajassi-l2vpn-evpn-segment-route-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([E-VPN]), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (July 16, 2012) is 3589 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'EVPN-REQ' is defined on line 373, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-04) exists of
     draft-raggarwa-sajassi-l2vpn-evpn-02

  == Outdated reference: A later version (-01) exists of
     draft-sajassi-raggarwa-l2vpn-evpn-req-00

     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

2    INTERNET-DRAFT                                               Ali Sajassi
3    Intended Status: Standards Track                             Samer Salam
4                                                                Sami Boutros
5                                                                 Keyur Patel
6                                                                       Cisco
7    Expires: January 17, 2013                                  July 16, 2012

9                        E-VPN Ethernet Segment Route
10              draft-sajassi-l2vpn-evpn-segment-route-01.txt

12   Status of this Memo

14      This Internet-Draft is submitted to IETF in full conformance with the
15      provisions of BCP 78 and BCP 79.

17      Internet-Drafts are working documents of the Internet Engineering
18      Task Force (IETF), its areas, and its working groups. Note that
19      other groups may also distribute working documents as
20      Internet-Drafts.

22      Internet-Drafts are draft documents valid for a maximum of six months
23      and may be updated, replaced, or obsoleted by other documents at any
24      time. It is inappropriate to use Internet-Drafts as reference
25      material or to cite them other than as "work in progress."

27      The list of current Internet-Drafts can be accessed at
28      http://www.ietf.org/1id-abstracts.html

30      The list of Internet-Draft Shadow Directories can be accessed at
31      http://www.ietf.org/shadow.html

33   Copyright and License Notice

35      Copyright (c) 2012 IETF Trust and the persons identified as the
36      document authors. All rights reserved.

38      This document is subject to BCP 78 and the IETF Trust's Legal
39      Provisions Relating to IETF Documents
40      (http://trustee.ietf.org/license-info) in effect on the date of
41      publication of this document. Please review these documents
42      carefully, as they describe your rights and restrictions with respect
43      to this document. Code Components extracted from this document must
44      include Simplified BSD License text as described in Section 4.e of
45      the Trust Legal Provisions and are provided without warranty as
46      described in the Simplified BSD License.

48   Abstract

50      [E-VPN] defines a solution and architecture for BGP MPLS-based
51      Ethernet VPNs.
        This document describes procedures and additional BGP
52      route attributes that enhance the multi-homing capabilities of the
53      solution. These are: the DF Election Attribute and the Inter-chassis
54      Communication Attribute. This draft describes their usage, advantages
55      and encoding.

57   Table of Contents

59      1  Introduction
60        1.1  Terminology
61      2  Motivation and Usage
62        2.1  Support of Multi-Chassis Ethernet Bundles
63        2.2  Avoiding Relearning of Subscriber/Session State
64        2.3  Preventing Transient Loops and Packet Duplication
65      3  BGP Encoding
66        3.1  DF Election Attribute
67        3.2  Inter-chassis Communication Attribute
68      4  DF Election with Paxos Algorithm
69      5  LACP State Synchronization
70      6  Subscriber/Session State Synchronization
71      7  Security Considerations
72      8  IANA Considerations
73      9  References
74        9.1  Normative References
75        9.2  Informative References
76      Author's Addresses

78   1  Introduction

80      [E-VPN] defines a solution and architecture for BGP MPLS-based
81      Ethernet L2VPN services with advanced multi-homing capabilities. In
82      this draft we define procedures and extensions that enhance the
83      multi-homing capabilities of the E-VPN solution in the following
84      areas:

86      - Preventing transient loops and packet duplication
87      - Support of multi-chassis Ethernet bundles
88      - Avoiding relearning of subscriber/session state

90      Two new BGP route attributes are defined: the DF Election attribute
91      and the Inter-chassis Communication attribute.

93      Section 2 discusses the motivation and usage of the new attributes.
94      Section 3 describes the BGP encoding.

96   1.1  Terminology

98      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
99      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
100     document are to be interpreted as described in RFC 2119 [RFC2119].

102  2  Motivation and Usage

104     This section explains the reasons for defining the two BGP
105     attributes and describes their usage in E-VPN.

107  2.1  Support of Multi-Chassis Ethernet Bundles

109     When a CE is multi-homed to a set of PE nodes using the [802.1AX]
110     Link Aggregation Control Protocol (LACP), the PEs must act as if they
111     were a single LACP speaker for the Ethernet links to form a bundle
112     and operate correctly as a Link Aggregation Group (LAG). To achieve
113     this, the PEs connected to the same multi-homed CE must synchronize
114     LACP configuration and operational data among themselves. The
115     synchronization is required for the following reasons:

117     - to determine if the links in the Ethernet bundle are to operate in
118     all-active or hot-standby resiliency mode

120     - to detect and handle CE misconfiguration when LACP Port Key is
121     configured on the PE

123     - to detect and handle miswiring between CE and PE when LACP Port
124     Key is configured on the PE
125     - to deterministically agree on which link(s) should join a bundle
126     based on port and system priorities, especially when the number of
127     links exceeds the aggregation capacity of the PEs, and the PE LACP
128     System Priority is higher than the CE's

130     - to detect and react to actor/partner churn where the LACP speakers
131     are not able to converge

133     Synchronization of LACP state between PEs is performed using the
134     Inter-chassis Communication attribute carried in the Ethernet Segment
135     route, as described in the 'LACP State Synchronization' section
136     below.

138  2.2  Avoiding Relearning of Subscriber/Session State

140     For certain applications, the PE builds and maintains per-subscriber
141     or per-session 'soft' state that is used either to optimize traffic
142     forwarding or to enforce security. Examples of such
143     per-subscriber/session state include:

145     - multicast state derived from IGMP or PIM snooping

147     - IP address to MAC address bindings gleaned from snooping ARP and/or
148     DHCP packets, and used to prevent address spoofing or masquerading

150     When a set of PE nodes provides multi-homed connectivity for an
151     Ethernet Segment, this 'soft' state is built on the active PE node
152     that forwards and snoops the relevant protocol packets. In case of a
153     link or node failure, the state must be reconstructed on the backup
154     PE (e.g. by waiting for the next IGMP query or ARP message or by
155     issuing unsolicited queries). This may cause traffic disruption and
156     affect the availability of the service.
        Alternatively, the state can
157     be synchronized among the PE nodes via BGP, which would speed up
158     the convergence of the service after a failure.

160     Synchronization of subscriber/session state between PE nodes is
161     performed using the Inter-chassis Communication attribute carried in
162     the Ethernet Segment route, as described in the 'Subscriber/Session
163     State Synchronization' section below.

165  2.3  Preventing Transient Loops and Packet Duplication

167     During routing transients, different PEs may end up electing
168     different DFs for the same Ethernet Segment due to inconsistent views
169     of the network. If the Ethernet Segment is a multi-homed device, this
170     may lead to transient packet duplication. If the Ethernet Segment is
171     a multi-homed network, the presence of multiple DFs may lead to
172     transient forwarding loops in addition to potential packet
173     duplication.

175     To eliminate these issues, an optional handshake mechanism is defined
176     to ensure that the PE nodes connected to the same Ethernet Segment
177     share a common view of the access network topology. This handshake is
178     performed using the DF Election attribute carried in the Ethernet
179     Segment route, as discussed in Section 4, 'DF Election with Paxos
180     Algorithm'.

182  3  BGP Encoding

184     This section defines the encoding of the BGP attributes.

186  3.1  DF Election Attribute

188        +---------------------------------------+
189        |  State (2 octets)                     |
190        +---------------------------------------+
191        |  Sequence No. (4 octets)              |
192        +---------------------------------------+
193        |  Local No. of links (2 octets)        |
194        +---------------------------------------+
195        |  Total No. of links (2 octets)        |
196        +---------------------------------------+
197        |  Flags (1 octet)                      |
198        +---------------------------------------+
199        |  No. of IP addresses (1 octet)        |
200        +---------------------------------------+
201        |  Ordered list of tuples:              |
202        |   [IP address Length (1 octet),       |
203        |    IP Address (4 or 16 octets)]       |
204        +---------------------------------------+

206     The State field can take one of the following values:

208        0x0000 Initializing
209        0x0001 Proposal Pending
210        0x0002 Promise Pending
211        0x0003 Active

213     The Flags field is encoded as follows:

215        Most significant 7 bits: reserved
216        Least significant bit: Protecting flag

218  3.2  Inter-chassis Communication Attribute
219        +---------------------------------------+
220        |  Type (2 octets)                      |
221        +---------------------------------------+
222        |  Length (1 or 2 octets)               |
223        +---------------------------------------+
224        |  Opaque (var)                         |
225        +---------------------------------------+

227  4  DF Election with Paxos Algorithm

229     The procedures in this section guarantee that all PE nodes in a given
230     redundancy group agree on a unique DF for a given Ethernet Segment.
231     This eliminates the problem of transient forwarding loops and
232     transient packet duplicates described above. The procedures can be
233     broken down into the following steps:

235     1. When a PE discovers the ESI of the attached Ethernet Segment, it
236     advertises an Ethernet Segment route with the associated ES-Import
237     extended community attribute and with the 'Initializing' code in the
238     State field of the DF Election attribute.

240     2. The PE then starts a timer to allow the reception of Ethernet
241     Segment routes from other PE nodes in the same redundancy group.

243     3. When the timer expires, each PE builds an ordered list of the IP
244     addresses of all the PE nodes connected to the Ethernet Segment
245     (including itself), in increasing numeric value.

247     4. The first PE in the ordered list then elects itself as the Arbiter
248     Node (AN). It initiates the handshake by sending an Ethernet Segment
249     route with 'Proposal Pending' code in the State field of the DF
250     Election attribute.

252     5. When a PE node receives an Ethernet Segment route with the
253     'Proposal Pending' code, it takes one of the following options:

255        a. If the receiving PE ranks the transmitting PE's IP address as
256        the top entry in its local ordered list, it acknowledges the
257        handshake by responding with an Ethernet Segment route with the
258        'Promise Pending' code in the State field of the DF Election
259        attribute. This includes the scenario where the receiving PE
260        forfeits the AN role to another advertising PE with a numerically
261        lower IP address.

263        b. If the receiving PE does not rank the transmitting PE's IP
264        address as the top entry in its local ordered list, and the
265        receiving PE had advertised an Ethernet Segment route with the
266        'Initializing' code or with the 'Proposal Pending' code, then the
267        PE takes no further action.

269     6. When the AN receives 'Promise Pending' from all of the PE nodes in
270     the ordered list, it sends an updated Ethernet Segment route with the
271     'Active' code in the DF Election attribute.

273     7. When the other PE nodes in the redundancy group receive the
274     'Active' code from the AN, they respond with an updated Ethernet
275     Segment route with the 'Active' code in the DF Election attribute.
276     This concludes the handshake.

278     In the case where the DF election is performed at the granularity of
279     an Ethernet Segment, i.e. there is a single DF for all VLANs on the
280     segment, the Arbiter Node is effectively the Designated Forwarder for
281     the segment. In Step 1, all PE nodes start with their ports that are
282     connected to the segment blocked for multi-destination traffic from
283     the core. In Step 6, the PE confirmed as the AN (i.e. the DF)
284     unblocks its port towards the Ethernet Segment. DF election at
285     the granularity of (Ethernet Segment, VLAN) is discussed in the "VLAN
286     Carving" section below.
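
        Steps 3 to 5 of the procedure, together with the field layout
        of Section 3.1, can be sketched as follows. This is a minimal
        illustration, not part of the specification. The names DFState,
        ordered_pe_list, arbiter, on_proposal and pack_df_election are
        hypothetical. The sketch assumes network byte order, assumes
        that the IP address Length is expressed in octets, and ranks
        addresses by their integer value. Timers and the BGP transport
        are left out.

```python
import ipaddress
import struct
from enum import IntEnum


class DFState(IntEnum):
    """State codes listed in Section 3.1."""
    INITIALIZING = 0x0000
    PROPOSAL_PENDING = 0x0001
    PROMISE_PENDING = 0x0002
    ACTIVE = 0x0003


def ordered_pe_list(addresses):
    """Step 3: order the PE addresses by increasing numeric value.

    Assumption: addresses are ranked by integer value; the draft does
    not say how a mix of IPv4 and IPv6 addresses would be ranked.
    """
    return sorted((ipaddress.ip_address(a) for a in set(addresses)), key=int)


def arbiter(addresses):
    """Step 4: the first PE in the ordered list is the Arbiter Node."""
    return ordered_pe_list(addresses)[0]


def on_proposal(local_view, sender):
    """Step 5: decide how to react to a 'Proposal Pending' route.

    Returns the state code to advertise (5a), or None when the PE takes
    no further action (5b). Assumption: any case the draft does not
    describe is also treated as "no action".
    """
    if ipaddress.ip_address(sender) == arbiter(local_view):
        return DFState.PROMISE_PENDING
    return None


def pack_df_election(state, seq, local_links, total_links, protecting, addresses):
    """Pack the Section 3.1 fields in the order drawn in the figure."""
    ordered = ordered_pe_list(addresses)
    flags = 0x01 if protecting else 0x00  # least significant bit: Protecting
    out = struct.pack("!HIHHBB", state, seq, local_links, total_links,
                      flags, len(ordered))
    for addr in ordered:
        # Assumed tuple layout: 1-octet length, then 4 or 16 address octets.
        out += struct.pack("!B", len(addr.packed)) + addr.packed
    return out
```

        With the redundancy group {192.0.2.1, 192.0.2.2, 192.0.2.3},
        every PE computes the same ordered list, so only a proposal
        from 192.0.2.1 is answered with 'Promise Pending'.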
288  5  LACP State Synchronization

290     To support CE multi-homing with multi-chassis Ethernet bundles, the
291     PE nodes connected to a given CE should synchronize [802.1AX] LACP
292     state among themselves. This includes the following LACP-specific
293     configuration parameters:

295     - System Identifier (MAC Address): uniquely identifies a LACP
296     speaker.
297     - System Priority: determines which LACP speaker's port priorities
298     are used in the Selection logic.
299     - Aggregator Identifier: uniquely identifies a bundle within a LACP
300     speaker.
301     - Aggregator MAC Address: identifies the MAC address of the bundle.
302     - Aggregator Key: used to determine which ports can join an
303     Aggregator.
304     - Port Number: uniquely identifies an interface within a LACP
305     speaker.
306     - Port Key: determines the set of ports that can be bundled.
307     - Port Priority: determines a port's precedence level to join a
308     bundle in case the number of eligible ports exceeds the maximum
309     number of links allowed in a bundle.

311     The above information must be synchronized between the PE nodes
312     wishing to form a multi-chassis bundle with a given CE, so that
313     they present a single LACP peer to that CE. This is required
314     for initial system bring-up and upon any configuration change.
315     Furthermore, the PEs must also synchronize operational (run-time)
316     data, in order for the LACP Selection logic state-machines to
317     execute. This operational data includes the following LACP
318     operational parameters, on a per-port basis:

320     - Partner System Identifier: the CE's System MAC address.
321     - Partner System Priority: the CE's LACP System Priority.
322     - Partner Port Number: the CE's AC port number.
323     - Partner Port Priority: the CE's AC Port Priority.
324     - Partner Key: the CE's key for this AC.
325     - Partner State: the CE's LACP State for the AC.
326     - Actor State: the PE's LACP State for the AC.
327     - Port State: the PE's AC port status.
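
        To illustrate why every PE needs the same view of these
        parameters, the following sketch selects the ports that join a
        bundle from the synchronized per-port records. It is not taken
        from the draft or from [802.1AX]. PortRecord and select_links
        are hypothetical names, and the selection rule (Port Key must
        equal the Aggregator Key, a numerically lower Port Priority
        wins, ties broken by port number and PE label) is an assumption
        made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortRecord:
    """Subset of the per-port data a PE would share with its peers."""
    pe: str              # hypothetical label for the advertising PE
    port_number: int
    port_key: int
    port_priority: int


def select_links(records, aggregator_key, max_links):
    """Pick the ports that join the bundle.

    The result depends only on the record contents, not on the order in
    which they were received, so PEs holding the same records reach the
    same answer. The ranking rule itself is an assumption (see lead-in).
    """
    eligible = [r for r in records if r.port_key == aggregator_key]
    ranked = sorted(eligible,
                    key=lambda r: (r.port_priority, r.port_number, r.pe))
    return ranked[:max_links]
```

        If one PE were missing a record, it could select a different
        set of links than its peers, which is the kind of disagreement
        the synchronization is meant to prevent.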
329     The above state needs to be communicated between PEs forming a
330     multi-chassis bundle during LACP initial bring-up, upon any
331     configuration change and upon the occurrence of a failure.

333     It should be noted that the above configuration and operational state
334     is localized in scope and is only relevant to PEs within a given
335     Redundancy Group, i.e. PEs that connect to the same Ethernet Segment
336     over a given Ethernet bundle. Furthermore, the communication of state
337     changes, upon failures, must occur with minimal latency, in order to
338     minimize the switchover time and consequent service disruption.

340     Without synchronization of the above parameters, the system is
341     subject to the issues outlined in Section 2.1 above.

343  6  Subscriber/Session State Synchronization

345     Synchronization of subscriber/session state between PE nodes is
346     performed using the Inter-chassis Communication attribute carried in
347     the Ethernet Segment route. The various applications are responsible
348     for the encoding and decoding of the relevant data, and this is
349     outside the scope of this draft. BGP provides a reliable transport
350     service in this case.

352  7  Security Considerations

354     There are no additional security aspects beyond those of VPLS/H-VPLS
355     that need to be considered.

357  8  IANA Considerations
358     To be added in a later revision.

360  9  References

362  9.1  Normative References

364     [RFC2119]  S. Bradner, "Key words for use in RFCs to Indicate
365                Requirement Levels", BCP 14, RFC 2119, March 1997.

367  9.2  Informative References

369     [E-VPN]    Aggarwal et al., "BGP MPLS Based Ethernet VPN",
370                draft-raggarwa-sajassi-l2vpn-evpn-02.txt, work in progress,
371                March, 2011.

373     [EVPN-REQ] Sajassi et al., "Requirements for Ethernet VPN (E-VPN)",
374                draft-sajassi-raggarwa-l2vpn-evpn-req-00.txt, work in
375                progress, October, 2010.
377  Author's Addresses

379     Ali Sajassi
380     Cisco
381     170 West Tasman Drive
382     San Jose, CA 95134, US
383     Email: sajassi@cisco.com

385     Samer Salam
386     Cisco
387     595 Burrard Street, Suite 2123
388     Vancouver, BC V7X 1J1, Canada
389     Email: ssalam@cisco.com

391     Sami Boutros
392     Cisco
393     170 West Tasman Drive
394     San Jose, CA 95134, US
395     Email: sboutros@cisco.com

397     Keyur Patel
398     Cisco
399     170 West Tasman Drive
400     San Jose, CA 95134, US
401     Email: keyupate@cisco.com