INTAREA                                                   R. Bonica, Ed.
Internet-Draft                                          Juniper Networks
Updates: 6296 (if approved)                                     F. Baker
Intended status: Experimental                              Cisco Systems
Expires: June 11, 2012                                      M. Wasserman
                                                       Painless Security
                                                               G. Miller
                                                                 Verizon
                                                        December 9, 2011

    Multihoming with IPv6-to-IPv6 Network Prefix Translation (NPTv6)
                     draft-bonica-v6-multihome-01

Abstract

   This memo describes an architecture for sites that are homed to
   multiple upstream providers.  The architecture described herein uses
   IPv6-to-IPv6 Network Prefix Translation (NPTv6) to achieve
   redundancy, transport-layer survivability, load sharing and address
   independence.

   This memo updates Section 2.4 of RFC 6296.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 11, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  NPTv6 Deployment
     2.1.  Topology
     2.2.  Addressing
       2.2.1.  Upstream Provider Addressing
       2.2.2.  Site Addressing
     2.3.  Address Translation
     2.4.  Domain Name System (DNS)
     2.5.  Routing
     2.6.  Failure Detection and Recovery
     2.7.  Load Balancing
   3.  Discussion
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgements
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Authors' Addresses

1.  Introduction

   [RFC3582] establishes the following goals for IPv6 site multihoming:

      Redundancy - A site's ability to remain connected to the
      Internet, even when connectivity through one or more of its
      upstream providers fails.

      Transport-Layer Survivability - A site's ability to maintain
      transport-layer sessions across failover and restoration
      events.  During a failover/restoration event, the transport-
      layer session may detect packet loss or reordering, but neither
      of these causes the transport-layer session to fail.

      Load Sharing - The ability of a site to distribute both inbound
      and outbound traffic across its upstream providers.
   [RFC3582] notes that a multihoming solution may require interactions
   with the routing subsystem.  However, multihoming solutions must be
   simple and scalable.  They must not require excessive operational
   effort and must not cause excessive routing table expansion.

   [RFC6296] explains how a site can achieve address independence using
   IPv6-to-IPv6 Network Prefix Translation (NPTv6).  In order to achieve
   address independence, the site assigns an inside address to each of
   its resources (e.g., hosts).  Nodes outside of the site identify
   those same resources using a corresponding Provider Allocated (PA)
   address.

   The site resolves this addressing dichotomy by deploying an NPTv6
   translator between itself and its upstream provider.  The NPTv6
   translator maintains a static, one-to-one mapping between each inside
   address and its corresponding PA address.  That mapping persists
   across flows and over time.

   If the site disconnects from one upstream provider and connects to
   another, it may lose its PA assignment.  However, the site will not
   need to renumber its resources.  It will only need to reconfigure the
   mapping rules on its local NPTv6 translator.

   Section 2.4 of [RFC6296] describes an NPTv6 architecture for sites
   that are homed to multiple upstream providers.  While that
   architecture fulfils many of the goals identified by [RFC3582], it
   does not achieve transport-layer survivability.  Transport-layer
   survivability is not achieved because, in this architecture, a PA
   address is usable only when the multihomed site is directly
   connected to the allocating provider.

   This memo describes an alternative architecture for multihomed sites
   that require transport-layer survivability.  It updates Section 2.4
   of [RFC6296].  In this architecture, PA addresses remain usable, even
   when the multihomed site loses its direct connection to the
   allocating provider.
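   The static, one-to-one mapping described above can be sketched as a
   stateless prefix rewrite.  The following Python fragment is
   illustrative only: the prefixes are invented, and it omits the
   checksum-neutral adjustment that [RFC6296] requires of a real NPTv6
   translator.

```python
import ipaddress

def map_prefix(addr: str, from_prefix: str, to_prefix: str) -> str:
    """Rewrite the leading prefix bits of addr (stateless, one-to-one).

    Sketch only: real NPTv6 (RFC 6296) additionally adjusts a 16-bit
    word inside the address to keep transport checksums valid, which
    this illustration omits.
    """
    net_from = ipaddress.IPv6Network(from_prefix)
    net_to = ipaddress.IPv6Network(to_prefix)
    assert net_from.prefixlen == net_to.prefixlen, "prefixes must match in length"
    host_bits = 128 - net_from.prefixlen
    # Keep the host bits, swap the prefix bits.
    host = int(ipaddress.IPv6Address(addr)) & ((1 << host_bits) - 1)
    mapped = int(net_to.network_address) | host
    return str(ipaddress.IPv6Address(mapped))

# Hypothetical inside (ULA) prefix and PA prefix, both /60:
print(map_prefix("fd00:0:0:1::10", "fd00::/60", "2001:db8::/60"))
# -> 2001:db8:0:1::10
```

   Because the mapping is a pure function of the address, applying the
   inverse rule on inbound packets recovers the inside address exactly,
   which is what makes the translation stateless.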
   The architecture described in this document can be deployed in sites
   that are served by two or more upstream providers.  For the purpose
   of example, this document demonstrates how the architecture can be
   deployed in a site that is served by two upstream providers.

1.1.  Terminology

   The following terms are used in this document:

      inbound packet - A packet that is destined for the multihomed
      site

      outbound packet - A packet that originates at the multihomed site
      and is destined for a point outside of the multihomed site

      NPTv6 inside interface - An interface that connects an NPTv6
      translator to the site

      NPTv6 outside interface - An interface that connects an NPTv6
      translator to an upstream provider

2.  NPTv6 Deployment

   This section demonstrates how NPTv6 can be deployed in order to
   achieve the goals of [RFC3582].

2.1.  Topology

           Upstream                       Upstream
          Provider #1                    Provider #2
          /        \                    /         \
         /          \                  /           \
        /         +------+        +------+          \
    +------+      |Backup|        |Backup|       +------+
    |  PE  |      |  PE  |        |  PE  |       |  PE  |
    |  #1  |      |  #1  |        |  #2  |       |  #2  |
    +------+      +------+        +------+       +------+
       |                                            |
       |                                            |
    +------+                                     +------+
    |NPTv6 |                                     |NPTv6 |
    |  #1  |                                     |  #2  |
    +------+                                     +------+
       |                                            |
       |                                            |
    ------------------------------------------------------
                      Internal Network

                Figure 1: NPTv6 Multihomed Topology

   In Figure 1, a site attaches all of its assets, including two NPTv6
   translators, to an Internal Network.  NPTv6 #1 is connected to
   Provider Edge (PE) Router #1, which is maintained by Upstream
   Provider #1.  Likewise, NPTv6 #2 is connected to PE Router #2, which
   is maintained by Upstream Provider #2.

   Each upstream provider also maintains a Backup PE Router.  A
   forwarding tunnel connects the loopback interface of Backup PE Router
   #1 to the outside interface of NPTv6 #2.  Likewise, another
   forwarding tunnel connects Backup PE Router #2 to NPTv6 #1.  Network
   operators can select from many encapsulation techniques (e.g., GRE)
   to realize the forwarding tunnel.  Tunnels are not depicted in
   Figure 1.

   In the figure, NPTv6 #1 and NPTv6 #2 are depicted as separate boxes.
   While vendors may produce a separate box to support the NPTv6
   function, they may also integrate the NPTv6 function into a router.

   During periods of normal operation, each Backup PE router is very
   lightly loaded.  Therefore, a single Backup PE router may back up
   multiple PE routers.  Furthermore, the Backup PE router may be used
   for other purposes (e.g., as a primary PE router for another
   customer).

2.2.  Addressing

2.2.1.  Upstream Provider Addressing

   A Regional Internet Registry (RIR) allocates Provider Address Block
   (PAB) #1 to Upstream Provider #1.  From PAB #1, Upstream Provider #1
   allocates two sub-blocks, using them as follows.

   Upstream Provider #1 uses the first sub-block for its internal
   address assignments.  It also uses that sub-block for numbering both
   ends of the interfaces between itself and its customers.

   Upstream Provider #1 uses the second sub-block for address allocation
   to its customers.  We refer to a particular allocation from this sub-
   block as a Customer Network Block (CNB).  A CNB allocated for a
   particular customer must be large enough to provide addressing for
   the customer's entire Internal Network.  In our example, Upstream
   Provider #1 allocates a /60, called CNB #1, to its customer.

   The customer configures translation rules that reference CNB #1 on
   NPTv6 #1 and NPTv6 #2.  This makes selected hosts that are connected
   to the Internal Network accessible using CNB #1 addresses.  See
   Section 2.3 for details.

   In a similar fashion, a Regional Internet Registry (RIR) allocates
   PAB #2 to Upstream Provider #2.  Upstream Provider #2, in turn,
   allocates CNB #2 to the multihomed customer.

2.2.2.  Site Addressing

   The site obtains a Site Address Block (SAB), either from Unique Local
   Address (ULA) [RFC4193] space, or by some other means.  The SAB is as
   large as all of the site's CNBs, combined.  In this example, because
   CNB #1 and CNB #2 are both /60's, the SAB is a /59.

   The site divides its SAB into smaller blocks, with each block being
   exactly as large as one CNB.  It also associates each of the
   resulting sub-blocks with one of its CNBs.  In this example, the site
   divides the SAB into a lower half and an upper half.  It associates
   the lower half of the SAB with CNB #1 and the upper half of the SAB
   with CNB #2.

   Finally, the site assigns one SAB address to each interface that is
   connected to the Internal Network, including the inside interfaces of
   the two NPTv6 translators.  The site also assigns a SAB address to
   the loopback interface of each NPTv6 translator.  During periods of
   normal operation, interfaces that are assigned addresses from the
   lower half of the SAB receive traffic through Upstream Provider #1.
   Likewise, interfaces that are assigned addresses from the upper half
   of the SAB receive traffic through Upstream Provider #2.

   Selected interfaces, because they receive a great deal of traffic,
   must receive traffic through both upstream providers simultaneously.
   Furthermore, those interfaces must control the portion of traffic
   arriving through each upstream provider.  The site assigns multiple
   addresses to those interfaces, some from the lower half and others
   from the upper half of the SAB.  For any interface, the ratio of
   upper-half to lower-half assignments roughly controls the portion of
   traffic arriving through each upstream provider.  See Section 2.3,
   Section 2.5 and Section 2.7 for details.

2.3.  Address Translation

   Both NPTv6 translators are configured with the following rules:

      For outbound packets, if the first 60 bits of the source
      address identify the lower half of the SAB, overwrite those 60
      bits with the 60 bits that identify CNB #1.

      For outbound packets, if the first 60 bits of the source
      address identify the upper half of the SAB, overwrite those 60
      bits with the 60 bits that identify CNB #2.

      For outbound packets, if none of the conditions above are met,
      either drop or pass the packet without translation, according
      to local security policy.

      For inbound packets, if the first 60 bits of the destination
      address identify CNB #1, overwrite those 60 bits with the 60
      bits that identify the lower half of the SAB.

      For inbound packets, if the first 60 bits of the destination
      address identify CNB #2, overwrite those 60 bits with the 60
      bits that identify the upper half of the SAB.

      For inbound packets, if none of the conditions above are met,
      either drop or pass the packet without translation, according
      to local security policy.

   Due to the nature of the rules described above, NPTv6 translation is
   stateless.  Therefore, traffic flows do not need to be symmetric
   across NPTv6 translators.  Furthermore, a traffic flow can shift from
   one NPTv6 translator to another without causing transport-layer
   session failure.

2.4.  Domain Name System (DNS)

   In order to make all site resources reachable by domain name
   [RFC1034], the site publishes AAAA records [RFC3596] associating each
   resource with all of its CNB addresses.  While this DNS architecture
   is sufficient, it is suboptimal.  Traffic that both originates and
   terminates within the site traverses NPTv6 translators needlessly.
   Several optimizations are available.  These optimizations are well
   understood and have been applied to [RFC1918] networks for many
   years.

2.5.  Routing

   Upstream Provider #1 uses an Interior Gateway Protocol to flood
   topology information throughout its domain.  It also uses BGP
   [RFC4271] to distribute customer and peer reachability information.

   PE #1 acquires a route to CNB #1 with NEXT_HOP equal to the outside
   interface of NPTv6 #1.  PE #1 can either learn this route from a
   single-hop eBGP session with NPTv6 #1, or acquire it through static
   configuration.  In either case, PE #1 overwrites the NEXT_HOP of this
   route with its own loopback address and distributes the route
   throughout Upstream Provider #1 using iBGP.  The LOCAL_PREF for this
   route is set high, so that the route will be preferred to alternative
   routes to CNB #1.  Upstream Provider #1 does not distribute this
   route to CNB #1 outside of its own borders because it is part of the
   larger aggregate PAB #1, which is itself advertised.

   NPTv6 #1 acquires a default route with NEXT_HOP equal to the directly
   connected interface on PE #1.  NPTv6 #1 can either learn this route
   from a single-hop eBGP session with PE #1, or acquire it through
   static configuration.

   Similarly, Backup PE #1 acquires a route to CNB #1 with NEXT_HOP
   equal to the outside interface of NPTv6 #2.  Backup PE #1 can either
   learn this route from a multi-hop eBGP session with NPTv6 #2, or
   acquire it through static configuration.  In either case, Backup PE
   #1 overwrites the NEXT_HOP of this route with its own loopback
   address and distributes the route throughout Upstream Provider #1
   using iBGP.  Distribution procedures are defined in
   [I-D.ietf-idr-best-external].  The LOCAL_PREF for this route is set
   low, so that the route will not be preferred to alternative routes to
   CNB #1.  Upstream Provider #1 does not distribute this route to CNB
   #1 outside of its own borders.
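   The primary/backup preference above can be sketched as a toy
   best-route selection keyed on LOCAL_PREF.  The prefix name, next-hop
   labels and preference values below are invented for illustration; a
   real BGP implementation applies many further tie-breakers after
   LOCAL_PREF.

```python
# Toy RIB for Upstream Provider #1: candidate iBGP routes to CNB #1.
# Labels and LOCAL_PREF values are invented; only the comparison matters.
rib = {
    "CNB#1": [
        {"next_hop": "PE #1 loopback",        "local_pref": 200},  # primary
        {"next_hop": "Backup PE #1 loopback", "local_pref": 100},  # backup
    ],
}

def best_route(prefix):
    """Pick the highest-LOCAL_PREF candidate, as BGP's decision process
    does before any other tie-breaker."""
    candidates = rib.get(prefix, [])
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["local_pref"])["next_hop"]

print(best_route("CNB#1"))  # -> PE #1 loopback

# PE #1 loses its route and withdraws its iBGP advertisement; the
# low-preference route via Backup PE #1 now carries traffic for CNB #1.
rib["CNB#1"] = [r for r in rib["CNB#1"] if r["next_hop"] != "PE #1 loopback"]
print(best_route("CNB#1"))  # -> Backup PE #1 loopback
```

   Because the backup route is always present (merely less preferred),
   failover requires only that the primary advertisement be withdrawn
   inside Upstream Provider #1.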
   Even if Backup PE #1 maintains an eBGP session with NPTv6 #2, it does
   not advertise the default route through that eBGP session.
   Therefore, even during failures, Backup PE #1 does not attract
   outbound traffic to itself.

   Finally, the Autonomous System Border Routers (ASBRs) contained by
   Upstream Provider #1 maintain eBGP sessions with their peers.  The
   ASBRs advertise only PAB #1 through those eBGP sessions.  Upstream
   Provider #1 does not advertise any of the following to its eBGP
   peers:

      any prefix that is contained by PAB #1 (i.e., more specific)

      PAB #2 or any part thereof

      the SAB or any part thereof

   Upstream Provider #2 is configured in a manner analogous to that
   described above.

2.6.  Failure Detection and Recovery

   When PE #1 loses its route to CNB #1, it withdraws its iBGP
   advertisement for that prefix from Upstream Provider #1.  The route
   advertised by Backup PE #1 remains, and Backup PE #1 attracts traffic
   bound for CNB #1 to itself.  Backup PE #1 forwards that traffic
   through the tunnel to NPTv6 #2.  NPTv6 #2 performs translations and
   delivers the traffic to the Internal Network.

   Likewise, when NPTv6 #1 loses its default route, it makes itself
   unavailable as a gateway for other hosts on the Internal Network.
   NPTv6 #2 attracts all outbound traffic to itself and forwards that
   traffic through Upstream Provider #2.  The mechanism by which NPTv6
   #1 makes itself unavailable as a gateway is beyond the scope of this
   document.

   If PE #1 maintains a single-hop eBGP session with NPTv6 #1, the
   failure of that eBGP session will cause both routes mentioned above
   to be lost.  Otherwise, another failure detection mechanism, such as
   BFD [RFC5881], is required.

   Regardless of the failure detection mechanism, inbound traffic
   traverses the tunnel only during failure periods, and outbound
   traffic never traverses the tunnel.  Furthermore, restoration is
   localized.
   As soon as the advertisement for CNB #1 is withdrawn throughout
   Upstream Provider #1, restoration is complete.

   Transport-layer connections survive failure/recovery events because
   both NPTv6 translators implement identical translation rules.  When a
   traffic flow shifts from one translator to another, neither the
   source address nor the destination address changes.

2.7.  Load Balancing

   In the architecture described above, site addressing determines load
   balancing.  If a host is numbered from the lower half of the SAB, its
   address is mapped to CNB #1, which is announced only by Upstream
   Provider #1 (as part of PAB #1).  Therefore, during periods of normal
   operation, all traffic bound for that host traverses Upstream
   Provider #1 and NPTv6 #1.  Likewise, if a host is numbered from the
   upper half of the SAB, its address is mapped to CNB #2, which is
   announced only by Upstream Provider #2 (as part of PAB #2).
   Therefore, during periods of normal operation, all traffic bound for
   that host traverses Upstream Provider #2 and NPTv6 #2.

   Hosts that receive a great quantity of traffic can be assigned
   multiple addresses, with some from the lower half and others from the
   upper half of the SAB.  The address chosen for any particular flow
   determines the path of inbound traffic for that flow.  For flows
   initiated outside of the Internal Network, the site influences the
   probability that a particular address will be used by manipulating
   the type and number of PAB addresses advertised in DNS.

3.  Discussion

   This section discusses the merits of the proposed architecture, as
   compared with other multihoming approaches [I-D.ietf-lisp]
   [I-D.rja-ilnp-intro].  The following are benefits of the proposed
   architecture:

      Address mapping information is required only at the NPTv6
      translator.
      There is no need to distribute mapping information
      beyond the boundaries of the multihomed site.

      Because only a small number of mapping rules are required at
      each multihomed site, there is no need to cache these rules.

      During periods of normal operation, packets do not need to be
      encapsulated.  Inbound traffic traverses a tunnel only during
      failure periods, and outbound traffic never traverses a tunnel.

      The proposal can be realized using a wide variety of existing
      encapsulation methods.  It does not require a new encapsulation
      method.

      The failover/restoration mechanism is localized to a single
      autonomous system.  Once updated routing information has been
      distributed throughout the autonomous system, the failover/
      restoration event is complete.

      Benefit can be derived from incremental, partial and even
      minimal deployment.

      The cost of the solution is borne by its beneficiaries (i.e.,
      primarily the multihomed site and secondarily the multihomed
      site's upstream providers).

   The following are disadvantages of the proposed architecture:

      By modifying IPv6 addresses, this architecture violates the
      end-to-end principle.

      The load balancing capabilities described in this memo may not
      suffice for all sites.  Those sites might be required to fall
      back upon other load balancing solutions (e.g., advertising
      multiple prefixes).

      The time required to redistribute traffic from one path to
      another is determined by the DNS TTL.

4.  IANA Considerations

   This document requires no IANA actions.

5.  Security Considerations

   As with any architecture that modifies source and destination
   addresses, the operation of access control lists, firewalls and
   intrusion detection systems may be impacted.  Also, many users may
   confuse NPTv6 translation with a NAT.
   Two limitations of NAT are
   that a) it does not support incoming connections without special
   configuration, and b) it requires symmetric routing across the NAT
   device.  Many users understand these limitations to be security
   features.  Because NPTv6 has neither of these limitations, it also
   offers neither of these features.

6.  Acknowledgements

   Thanks to John Scudder, Yakov Rekhter and Warren Kumari for their
   helpful comments, encouragement and support.  Special thanks to
   Johann Jonsson, James Piper, Ravinder Wali, Ashte Collins, Inga
   Rollins and an anonymous donor, without whom this memo would not have
   been written.

7.  References

7.1.  Normative References

   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
              STD 13, RFC 1034, November 1987.

   [RFC1918]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and
              E. Lear, "Address Allocation for Private Internets",
              BCP 5, RFC 1918, February 1996.

   [RFC3582]  Abley, J., Black, B., and V. Gill, "Goals for IPv6 Site-
              Multihoming Architectures", RFC 3582, August 2003.

   [RFC3596]  Thomson, S., Huitema, C., Ksinant, V., and M. Souissi,
              "DNS Extensions to Support IP Version 6", RFC 3596,
              October 2003.

   [RFC4193]  Hinden, R. and B. Haberman, "Unique Local IPv6 Unicast
              Addresses", RFC 4193, October 2005.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC5881]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
              June 2010.

   [RFC6296]  Wasserman, M. and F. Baker, "IPv6-to-IPv6 Network Prefix
              Translation", RFC 6296, June 2011.

7.2.  Informative References

   [I-D.ietf-idr-best-external]
              Marques, P., Fernando, R., Chen, E., Mohapatra, P., and H.
              Gredler, "Advertisement of the best external route in
              BGP", draft-ietf-idr-best-external-04 (work in progress),
              April 2011.
   [I-D.ietf-lisp]
              Farinacci, D., Fuller, V., Meyer, D., and D. Lewis,
              "Locator/ID Separation Protocol (LISP)",
              draft-ietf-lisp-17 (work in progress), December 2011.

   [I-D.rja-ilnp-intro]
              Atkinson, R., "ILNP Concept of Operations",
              draft-rja-ilnp-intro-11 (work in progress), July 2011.

Authors' Addresses

   Ron Bonica (editor)
   Juniper Networks
   Sterling, Virginia  20164
   USA

   Email: rbonica@juniper.net


   Fred Baker
   Cisco Systems
   Santa Barbara, California  93117
   USA

   Email: fred@cisco.com


   Margaret Wasserman
   Painless Security
   356 Abbott Street
   North Andover, Massachusetts  01845
   USA

   Phone: +1 781 405 7464
   Email: mrw@painless-security.com
   URI:   http://www.painless-security.com


   Gregory J. Miller
   Verizon
   Ashburn, Virginia  20147
   USA

   Email: gregory.j.miller@verizon.com