INTAREA                                                   R. Bonica, Ed.
Internet-Draft                                          Juniper Networks
Updates: 6296 (if approved)                                     F. Baker
Intended status: Experimental                              Cisco Systems
Expires: June 13, 2012                                      M. Wasserman
                                                       Painless Security
                                                               G. Miller
                                                                 Verizon
                                                               W. Kumari
                                                            Google, Inc.
                                                       December 11, 2011


     Multihoming with IPv6-to-IPv6 Network Prefix Translation (NPTv6)
                       draft-bonica-v6-multihome-02

Abstract

   This memo describes an architecture for sites that are homed to
   multiple upstream providers.  The architecture described herein uses
   IPv6-to-IPv6 Network Prefix Translation (NPTv6) to achieve
   redundancy, transport-layer survivability, load sharing and address
   independence.

   This memo updates Section 2.4 of RFC 6296.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 13, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  NPTv6 Deployment
     2.1.  Topology
     2.2.  Addressing
       2.2.1.  Upstream Provider Addressing
       2.2.2.  Site Addressing
     2.3.  Address Translation
     2.4.  Domain Name System (DNS)
     2.5.  Routing
     2.6.  Failure Detection and Recovery
     2.7.  Load Balancing
   3.  Discussion
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgements
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Authors' Addresses

1.  Introduction

   [RFC3582] establishes the following goals for IPv6 site multihoming:

      Redundancy - A site's ability to remain connected to the
      Internet, even when connectivity through one or more of its
      upstream providers fails.

      Transport-Layer Survivability - A site's ability to maintain
      transport-layer sessions across failover and restoration events.
      During a failover/restoration event, the transport-layer session
      may detect packet loss or reordering, but neither of these causes
      the transport-layer session to fail.

      Load Sharing - The ability of a site to distribute both inbound
      and outbound traffic across its upstream providers.
   [RFC3582] notes that a multihoming solution may require interactions
   with the routing subsystem.  However, multihoming solutions must be
   simple and scalable.  They must not require excessive operational
   effort and must not cause excessive routing table expansion.

   [RFC6296] explains how a site can achieve address independence using
   IPv6-to-IPv6 Network Prefix Translation (NPTv6).  In order to
   achieve address independence, the site assigns an inside address to
   each of its resources (e.g., hosts).  Nodes outside of the site
   identify those same resources using a corresponding Provider
   Allocated (PA) address.

   The site resolves this addressing dichotomy by deploying an NPTv6
   translator between itself and its upstream provider.  The NPTv6
   translator maintains a static, one-to-one mapping between each
   inside address and its corresponding PA address.  That mapping
   persists across flows and over time.

   If the site disconnects from one upstream provider and connects to
   another, it may lose its PA assignment.  However, the site will not
   need to renumber its resources.  It will only need to reconfigure
   the mapping rules on its local NPTv6 translator.

   Section 2.4 of [RFC6296] describes an NPTv6 architecture for sites
   that are homed to multiple upstream providers.  While that
   architecture fulfills many of the goals identified by [RFC3582], it
   does not achieve transport-layer survivability, because in that
   architecture a PA address is usable only while the multihomed site
   is directly connected to the allocating provider.

   This memo describes an alternative architecture for multihomed
   sites that require transport-layer survivability.  It updates
   Section 2.4 of [RFC6296].  In this architecture, PA addresses
   remain usable even when the multihomed site loses its direct
   connection to the allocating provider.
   The architecture described in this document can be deployed in
   sites that are served by two or more upstream providers.  For the
   purpose of example, this document demonstrates how the architecture
   can be deployed in a site that is served by two upstream providers.

1.1.  Terminology

   The following terms are used in this document:

      inbound packet - A packet that is destined for the multihomed
      site

      outbound packet - A packet that originates at the multihomed
      site and is destined for a point outside of the multihomed site

      NPTv6 inside interface - An interface that connects an NPTv6
      translator to the site

      NPTv6 outside interface - An interface that connects an NPTv6
      translator to an upstream provider

2.  NPTv6 Deployment

   This section demonstrates how NPTv6 can be deployed in order to
   achieve the goals of [RFC3582].

2.1.  Topology

          Upstream                        Upstream
         Provider #1                     Provider #2
         /        \                      /        \
        /          \                    /          \
       /        +------+            +------+        \
   +------+     |Backup|            |Backup|     +------+
   |  PE  |     |  PE  |            |  PE  |     |  PE  |
   |  #1  |     |  #1  |            |  #2  |     |  #2  |
   +------+     +------+            +------+     +------+
      |                                             |
      |                                             |
   +------+                                      +------+
   |NPTv6 |                                      |NPTv6 |
   |  #1  |                                      |  #2  |
   +------+                                      +------+
      |                                             |
      |                                             |
  ------------------------------------------------------
                    Internal Network

               Figure 1: NPTv6 Multihomed Topology

   In Figure 1, a site attaches all of its assets, including two NPTv6
   translators, to an Internal Network.  NPTv6 #1 is connected to
   Provider Edge (PE) Router #1, which is maintained by Upstream
   Provider #1.  Likewise, NPTv6 #2 is connected to PE Router #2,
   which is maintained by Upstream Provider #2.

   Each upstream provider also maintains a Backup PE Router.  A
   forwarding tunnel connects the loopback interface of Backup PE
   Router #1 to the outside interface of NPTv6 #2.  Likewise, another
   forwarding tunnel connects Backup PE Router #2 to NPTv6 #1.
Network 196 operators can select from many encapsulation techniques (e.g., GRE) 197 to realize the forwarding tunnel. Tunnels are not depicted in 198 Figure 1. 200 In the figure, NPTv6 #1 and NPTv6 #2 are depicted as separate boxes. 201 While vendors may produce a separate box to support the NPTv6 202 function, they may also integrate the NPTv6 function into a router. 204 During periods of normal operation, the Backup PE routers is very 205 lightly loaded. Therefore, a single Backup PE router may back up 206 multiple PE routers. Furthermore, the Backup PE router may be used 207 for other purposes (e.g., primary PE router for another customer). 209 2.2. Addressing 211 2.2.1. Upstream Provider Addressing 213 A Regional Internet Registry (RIR) allocates Provider Address Block 214 (PAB) #1 to Upstream Provider #1. From PAB #1, Upstream Provider #1 215 allocates two sub-blocks, using them as follows. 217 Upstream Provider #1 uses the first sub-block for its internal 218 address assignments. It also uses that sub-block for numbering both 219 ends of the interfaces between itself and its customers. 221 Upstream Provider #1 uses the second sub-block for address allocation 222 to its customers. We refer to a particular allocation from this sub- 223 block as a Customer Network Block (CNB). A CNB allocated for a 224 particular customer must be large enough to provide addressing for 225 the customer's entire Internal Network. In our example, Upstream 226 Provider #1 allocates a /60, called CNB #1, to its customer. 228 The customer configures translation rules that reference CNB #1 on 229 NPTv6 #1 and NPTv6 #2. This makes selected hosts that are connected 230 to the Internal Network accessible using CNB #1 addresses. See 231 Section 2.3 for details. 233 In a similar fashion, a Regional Internet Registry (RIR) allocates 234 PAB #2 to Upstream Provider #2. Upstream Provider #2, in turn, 235 allocates CNB #2 to the multihomed customer. 237 2.2.2. 
2.2.2.  Site Addressing

   The site obtains a Site Address Block (SAB), either from Unique
   Local Address (ULA) [RFC4193] space, or by some other means.  The
   SAB is as large as all of the site's CNBs combined.  In this
   example, because CNB #1 and CNB #2 are both /60's, the SAB is a
   /59.

   The site divides its SAB into smaller blocks, with each block being
   exactly as large as one CNB.  It then associates each of the
   resulting sub-blocks with one of its CNBs.  In this example, the
   site divides the SAB into a lower half and an upper half.  It
   associates the lower half of the SAB with CNB #1 and the upper half
   of the SAB with CNB #2.

   Finally, the site assigns one SAB address to each interface that is
   connected to the Internal Network, including the inside interfaces
   of the two NPTv6 translators.  The site also assigns a SAB address
   to the loopback interface of each NPTv6 translator.  During periods
   of normal operation, interfaces that are assigned addresses from
   the lower half of the SAB receive traffic through Upstream Provider
   #1.  Likewise, interfaces that are assigned addresses from the
   upper half of the SAB receive traffic through Upstream Provider #2.

   Selected interfaces, because they receive a great deal of traffic,
   must receive traffic through both upstream providers
   simultaneously.  Furthermore, those interfaces must control the
   portion of traffic arriving through each upstream provider.  The
   site assigns multiple addresses to those interfaces, some from the
   lower half and others from the upper half of the SAB.  For any
   interface, the ratio of upper-half to lower-half assignments
   roughly controls the portion of traffic arriving through each
   upstream provider.  See Section 2.3, Section 2.5 and Section 2.7
   for details.
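   The division of the SAB described above can be sketched with
   Python's standard ipaddress module.  This is a minimal
   illustration; the specific prefixes below are hypothetical
   examples, not values assigned in this memo:

```python
import ipaddress

# Hypothetical prefixes, for illustration only.
sab = ipaddress.ip_network("fd00:1:2::/59")        # Site Address Block (ULA)
cnb1 = ipaddress.ip_network("2001:db8:a:10::/60")  # CNB #1, from PAB #1
cnb2 = ipaddress.ip_network("2001:db8:b:20::/60")  # CNB #2, from PAB #2

# Divide the /59 SAB into two /60 halves, one per CNB.
lower_half, upper_half = sab.subnets(new_prefix=60)

# Associate the lower half with CNB #1 and the upper half with CNB #2.
association = {lower_half: cnb1, upper_half: cnb2}

for half, cnb in association.items():
    print(half, "->", cnb)
```

   A site served by more than two upstream providers would divide a
   correspondingly larger SAB into one sub-block per CNB in the same
   way.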
2.3.  Address Translation

   Both NPTv6 translators are configured with the following rules:

      For outbound packets, if the first 60 bits of the source address
      identify the lower half of the SAB, overwrite those 60 bits with
      the 60 bits that identify CNB #1.

      For outbound packets, if the first 60 bits of the source address
      identify the upper half of the SAB, overwrite those 60 bits with
      the 60 bits that identify CNB #2.

      For outbound packets, if none of the conditions above are met,
      either drop or pass the packet without translation, according to
      local security policy.

      For inbound packets, if the first 60 bits of the destination
      address identify CNB #1, overwrite those 60 bits with the 60
      bits that identify the lower half of the SAB.

      For inbound packets, if the first 60 bits of the destination
      address identify CNB #2, overwrite those 60 bits with the 60
      bits that identify the upper half of the SAB.

      For inbound packets, if none of the conditions above are met,
      either drop or pass the packet without translation, according to
      local security policy.

   Because the rules described above are static, NPTv6 translation is
   stateless.  Therefore, traffic flows do not need to be symmetric
   across NPTv6 translators.  Furthermore, a traffic flow can shift
   from one NPTv6 translator to another without causing transport-
   layer session failure.

2.4.  Domain Name System (DNS)

   In order to make all site resources reachable by domain name
   [RFC1034], the site publishes AAAA records [RFC3596] associating
   each resource with all of its CNB addresses.  While this DNS
   architecture is sufficient, it is suboptimal: traffic that both
   originates and terminates within the site traverses NPTv6
   translators needlessly.  Several optimizations are available.
   These optimizations are well understood and have been applied to
   [RFC1918] networks for many years.
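   The translation rules of Section 2.3 amount to a stateless rewrite
   of the first 60 bits of an address.  The sketch below (in Python,
   reusing the hypothetical prefixes from the Section 2.2.2
   illustration) shows only the prefix overwrite; the full NPTv6
   algorithm of [RFC6296] additionally adjusts bits within the address
   so that the mapping is checksum-neutral, which is omitted here:

```python
import ipaddress

# Hypothetical prefixes, for illustration only (see Section 2.2.2).
SAB_LOWER = ipaddress.ip_network("fd00:1:2::/60")     # maps to CNB #1
SAB_UPPER = ipaddress.ip_network("fd00:1:2:10::/60")  # maps to CNB #2
CNB1 = ipaddress.ip_network("2001:db8:a:10::/60")
CNB2 = ipaddress.ip_network("2001:db8:b:20::/60")

def _overwrite_prefix(addr, net):
    """Replace the first 60 bits of addr with those of net."""
    host_mask = (1 << (128 - 60)) - 1
    host_bits = int(addr) & host_mask
    return ipaddress.ip_address(int(net.network_address) | host_bits)

def translate_outbound(src: str) -> str:
    """Rewrite the source address of an outbound packet."""
    addr = ipaddress.ip_address(src)
    if addr in SAB_LOWER:
        return str(_overwrite_prefix(addr, CNB1))
    if addr in SAB_UPPER:
        return str(_overwrite_prefix(addr, CNB2))
    return src  # or drop, according to local security policy

def translate_inbound(dst: str) -> str:
    """Rewrite the destination address of an inbound packet."""
    addr = ipaddress.ip_address(dst)
    if addr in CNB1:
        return str(_overwrite_prefix(addr, SAB_LOWER))
    if addr in CNB2:
        return str(_overwrite_prefix(addr, SAB_UPPER))
    return dst  # or drop, according to local security policy
```

   Because the rules are static and identical on both translators,
   translating an address outbound and then inbound yields the
   original address, which is what allows a flow to shift between
   translators without breaking transport-layer sessions.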
2.5.  Routing

   Upstream Provider #1 uses an Interior Gateway Protocol to flood
   topology information throughout its domain.  It also uses BGP
   [RFC4271] to distribute customer and peer reachability information.

   PE #1 acquires a route to CNB #1 with NEXT-HOP equal to the outside
   interface of NPTv6 #1.  PE #1 can either learn this route from a
   single-hop eBGP session with NPTv6 #1, or acquire it through static
   configuration.  In either case, PE #1 overwrites the NEXT-HOP of
   this route with its own loopback address and distributes the route
   throughout Upstream Provider #1 using iBGP.  The LOCAL_PREF for
   this route is set high, so that the route will be preferred over
   alternative routes to CNB #1.  Upstream Provider #1 does not
   distribute this route to CNB #1 outside of its own borders, because
   it is part of the larger aggregate PAB #1, which is itself
   advertised.

   NPTv6 #1 acquires a default route with NEXT-HOP equal to the
   directly connected interface on PE #1.  NPTv6 #1 can either learn
   this route from a single-hop eBGP session with PE #1, or acquire it
   through static configuration.

   Similarly, Backup PE #1 acquires a route to CNB #1 with NEXT-HOP
   equal to the outside interface of NPTv6 #2.  Backup PE #1 can
   either learn this route from a multi-hop eBGP session with NPTv6
   #2, or acquire it through static configuration.  In either case,
   Backup PE #1 overwrites the NEXT-HOP of this route with its own
   loopback address and distributes the route throughout Upstream
   Provider #1 using iBGP.  Distribution procedures are defined in
   [I-D.ietf-idr-best-external].  The LOCAL_PREF for this route is set
   low, so that the route will not be preferred over alternative
   routes to CNB #1.  Upstream Provider #1 does not distribute this
   route to CNB #1 outside of its own borders.
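   The LOCAL_PREF scheme described above can be illustrated with a toy
   route-selection function.  This is a simplified sketch, not a model
   of the full BGP decision process, and the LOCAL_PREF values are
   hypothetical:

```python
# Toy illustration of route preference within Upstream Provider #1.
# LOCAL_PREF values are hypothetical; real BGP route selection has
# many more tie-breaking steps.

routes_to_cnb1 = [
    {"via": "PE #1 -> NPTv6 #1", "local_pref": 200},                  # primary
    {"via": "Backup PE #1 -> tunnel -> NPTv6 #2", "local_pref": 50},  # backup
]

def best_route(routes):
    """Prefer the route with the highest LOCAL_PREF."""
    return max(routes, key=lambda r: r["local_pref"]) if routes else None

# During normal operation, the high-LOCAL_PREF route via PE #1 wins.
print(best_route(routes_to_cnb1)["via"])

# After a failure, PE #1 withdraws its advertisement (Section 2.6) and
# only the backup route via the tunnel remains.
remaining = [r for r in routes_to_cnb1 if not r["via"].startswith("PE #1")]
print(best_route(remaining)["via"])
```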
   Even if Backup PE #1 maintains an eBGP session with NPTv6 #2, it
   does not advertise the default route through that eBGP session.
   Therefore, even during failures, Backup PE #1 does not attract
   outbound traffic to itself.

   Finally, the Autonomous System Border Routers (ASBRs) contained by
   Upstream Provider #1 maintain eBGP sessions with their peers.  The
   ASBRs advertise only PAB #1 through those eBGP sessions.  Upstream
   Provider #1 does not advertise any of the following to its eBGP
   peers:

      any prefix that is contained by PAB #1 (i.e., any more-specific
      prefix)

      PAB #2 or any part thereof

      the SAB or any part thereof

   Upstream Provider #2 is configured in a manner analogous to that
   described above.

2.6.  Failure Detection and Recovery

   When PE #1 loses its route to CNB #1, it withdraws its iBGP
   advertisement for that prefix from Upstream Provider #1.  The route
   advertised by Backup PE #1 remains, and Backup PE #1 attracts
   traffic bound for CNB #1 to itself.  Backup PE #1 forwards that
   traffic through the tunnel to NPTv6 #2.  NPTv6 #2 performs
   translations and delivers the traffic to the Internal Network.

   Likewise, when NPTv6 #1 loses its default route, it makes itself
   unavailable as a gateway for other hosts on the Internal Network.
   NPTv6 #2 attracts all outbound traffic to itself and forwards that
   traffic through Upstream Provider #2.  The mechanism by which NPTv6
   #1 makes itself unavailable as a gateway is beyond the scope of
   this document.

   If PE #1 maintains a single-hop eBGP session with NPTv6 #1, the
   failure of that eBGP session will cause both routes mentioned above
   to be lost.  Otherwise, another failure detection mechanism, such
   as BFD [RFC5881], is required.

   Regardless of the failure detection mechanism, inbound traffic
   traverses the tunnel only during failure periods, and outbound
   traffic never traverses the tunnel.  Furthermore, restoration is
   localized.
   As soon as the advertisement for CNB #1 is withdrawn throughout
   Upstream Provider #1, restoration is complete.

   Transport-layer connections survive failure/recovery events because
   both NPTv6 translators implement identical translation rules.  When
   a traffic flow shifts from one translator to another, neither the
   source address nor the destination address changes.

2.7.  Load Balancing

   In the architecture described above, site addressing determines
   load balancing.  If a host is numbered from the lower half of the
   SAB, its address is mapped to CNB #1, which is announced only by
   Upstream Provider #1 (as part of PAB #1).  Therefore, during
   periods of normal operation, all traffic bound for that host
   traverses Upstream Provider #1 and NPTv6 #1.  Likewise, if a host
   is numbered from the upper half of the SAB, its address is mapped
   to CNB #2, which is announced only by Upstream Provider #2 (as part
   of PAB #2).  Therefore, during periods of normal operation, all
   traffic bound for that host traverses Upstream Provider #2 and
   NPTv6 #2.

   Hosts that receive a great quantity of traffic can be assigned
   multiple addresses, some from the lower half and others from the
   upper half of the SAB.  The address chosen for any particular flow
   determines the path of inbound traffic for that flow.  For flows
   initiated outside of the Internal Network, the site influences the
   probability that a particular address will be used by manipulating
   the type and number of PAB addresses advertised in DNS.

3.  Discussion

   This section discusses the merits of the proposed architecture, as
   compared with other multihoming approaches [I-D.ietf-lisp]
   [I-D.rja-ilnp-intro].  The following are benefits of the proposed
   architecture:

      Address mapping information is required only at the NPTv6
      translator.  There is no need to distribute mapping information
      beyond the boundaries of the multihomed site.

      Because only a small number of mapping rules are required at
      each multihomed site, there is no need to cache these rules.

      During periods of normal operation, packets do not need to be
      encapsulated.  Inbound traffic traverses a tunnel only during
      failure periods, and outbound traffic never traverses a tunnel.

      The proposal can be realized using a wide variety of existing
      encapsulation methods.  It does not require a new encapsulation
      method.

      The failover/restoration mechanism is localized to a single
      autonomous system.  Once updated routing information has been
      distributed throughout the autonomous system, the failover/
      restoration event is complete.

      Benefit can be derived from incremental, partial, and even
      minimal deployment.

      The cost of the solution is borne by its beneficiaries (i.e.,
      primarily the multihomed site and secondarily the multihomed
      site's upstream providers).

   The following are disadvantages of the proposed architecture:

      By modifying IPv6 addresses, this architecture violates the
      end-to-end principle.

      The load balancing capabilities described in this memo may not
      suffice for all sites.  Those sites might be required to fall
      back upon other load balancing solutions (e.g., advertising
      multiple prefixes).

      The time required to redistribute traffic from one path to
      another is determined by the DNS TTL.

4.  IANA Considerations

   This document requires no IANA actions.

5.  Security Considerations

   As with any architecture that modifies source and destination
   addresses, the operation of access control lists, firewalls, and
   intrusion detection systems may be impacted.  Also, many users may
   confuse NPTv6 translation with a NAT.
   Two limitations of NAT are that (a) it does not support incoming
   connections without special configuration, and (b) it requires
   symmetric routing across the NAT device.  Many users understand
   these limitations to be security features.  Because NPTv6 has
   neither of these limitations, it also offers neither of these
   features.

6.  Acknowledgements

   Thanks to John Scudder, Yakov Rekhter, and Warren Kumari for their
   helpful comments, encouragement, and support.  Special thanks to
   Johann Jonsson, James Piper, Ravinder Wali, Ashte Collins, Inga
   Rollins, and an anonymous donor, without whom this memo would not
   have been written.

7.  References

7.1.  Normative References

   [RFC1034]  Mockapetris, P., "Domain names - concepts and
              facilities", STD 13, RFC 1034, November 1987.

   [RFC1918]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G.,
              and E. Lear, "Address Allocation for Private Internets",
              BCP 5, RFC 1918, February 1996.

   [RFC3582]  Abley, J., Black, B., and V. Gill, "Goals for IPv6
              Site-Multihoming Architectures", RFC 3582, August 2003.

   [RFC3596]  Thomson, S., Huitema, C., Ksinant, V., and M. Souissi,
              "DNS Extensions to Support IP Version 6", RFC 3596,
              October 2003.

   [RFC4193]  Hinden, R. and B. Haberman, "Unique Local IPv6 Unicast
              Addresses", RFC 4193, October 2005.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC5881]  Katz, D. and D. Ward, "Bidirectional Forwarding
              Detection (BFD) for IPv4 and IPv6 (Single Hop)",
              RFC 5881, June 2010.

   [RFC6296]  Wasserman, M. and F. Baker, "IPv6-to-IPv6 Network Prefix
              Translation", RFC 6296, June 2011.

7.2.  Informative References

   [I-D.ietf-idr-best-external]
              Marques, P., Fernando, R., Chen, E., Mohapatra, P., and
              H. Gredler, "Advertisement of the best external route in
              BGP", draft-ietf-idr-best-external-04 (work in
              progress), April 2011.
   [I-D.ietf-lisp]
              Farinacci, D., Fuller, V., Meyer, D., and D. Lewis,
              "Locator/ID Separation Protocol (LISP)",
              draft-ietf-lisp-17 (work in progress), December 2011.

   [I-D.rja-ilnp-intro]
              Atkinson, R., "ILNP Concept of Operations",
              draft-rja-ilnp-intro-11 (work in progress), July 2011.

Authors' Addresses

   Ron Bonica (editor)
   Juniper Networks
   Sterling, Virginia  20164
   USA

   Email: rbonica@juniper.net


   Fred Baker
   Cisco Systems
   Santa Barbara, California  93117
   USA

   Email: fred@cisco.com


   Margaret Wasserman
   Painless Security
   356 Abbott Street
   North Andover, Massachusetts  01845
   USA

   Phone: +1 781 405 7464
   Email: mrw@painless-security.com
   URI:   http://www.painless-security.com


   Gregory J. Miller
   Verizon
   Ashburn, Virginia  20147
   USA

   Email: gregory.j.miller@verizon.com


   Warren Kumari
   Google, Inc.
   Mountain View, California  94043

   Email: warren@kumari.net