2 INTAREA R. Bonica, Ed. 3 Internet-Draft Juniper Networks 4 Updates: 6296 (if approved) F. Baker 5 Intended status: Experimental Cisco Systems 6 Expires: April 6, 2012 M.
Wasserman 7 Painless Security 8 October 4, 2011 10 Multihoming with IPv6-to-IPv6 Network Prefix Translation (NPTv6) 11 draft-bonica-v6-multihome-00 13 Abstract 15 This memo describes an architecture for sites that are homed to 16 multiple upstream providers. The architecture described herein uses 17 IPv6-to-IPv6 Network Prefix Translation (NPTv6) to achieve 18 redundancy, transport-layer survivability, load sharing and address 19 independence. 21 This memo updates Section 2.4 of RFC 6296. 23 Status of this Memo 25 This Internet-Draft is submitted in full conformance with the 26 provisions of BCP 78 and BCP 79. 28 Internet-Drafts are working documents of the Internet Engineering 29 Task Force (IETF). Note that other groups may also distribute 30 working documents as Internet-Drafts. The list of current Internet-Drafts 31 is at http://datatracker.ietf.org/drafts/current/. 33 Internet-Drafts are draft documents valid for a maximum of six months 34 and may be updated, replaced, or obsoleted by other documents at any 35 time. It is inappropriate to use Internet-Drafts as reference 36 material or to cite them other than as "work in progress." 38 This Internet-Draft will expire on April 6, 2012. 40 Copyright Notice 42 Copyright (c) 2011 IETF Trust and the persons identified as the 43 document authors. All rights reserved. 45 This document is subject to BCP 78 and the IETF Trust's Legal 46 Provisions Relating to IETF Documents 47 (http://trustee.ietf.org/license-info) in effect on the date of 48 publication of this document. Please review these documents 49 carefully, as they describe your rights and restrictions with respect 50 to this document. Code Components extracted from this document must 51 include Simplified BSD License text as described in Section 4.e of 52 the Trust Legal Provisions and are provided without warranty as 53 described in the Simplified BSD License. 55 Table of Contents 57 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 58 2.
NPTv6 Deployment . . . . . . . . . . . . . . . . . . . . . . . 4 59 2.1. Topology . . . . . . . . . . . . . . . . . . . . . . . . . 4 60 2.2. Addressing . . . . . . . . . . . . . . . . . . . . . . . . 5 61 2.2.1. Upstream Provider Addressing . . . . . . . . . . . . . 5 62 2.2.2. Site Addressing . . . . . . . . . . . . . . . . . . . 5 63 2.3. Address Translation . . . . . . . . . . . . . . . . . . . 6 64 2.4. Domain Name System (DNS) . . . . . . . . . . . . . . . . . 7 65 2.5. Routing . . . . . . . . . . . . . . . . . . . . . . . . . 7 66 2.6. Failure Detection and Recovery . . . . . . . . . . . . . . 8 67 2.7. Load Balancing . . . . . . . . . . . . . . . . . . . . . . 9 68 3. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 9 69 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 70 5. Security Considerations . . . . . . . . . . . . . . . . . . . 10 71 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 72 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 73 7.1. Normative References . . . . . . . . . . . . . . . . . . . 11 74 7.2. Informative References . . . . . . . . . . . . . . . . . . 11 75 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 12 77 1. Introduction 79 [RFC3582] establishes the following goals for IPv6 site multihoming: 81 Redundancy - A site's ability to remain connected to the 82 Internet, even when connectivity through one or more of its 83 upstream providers fails. 85 Transport-Layer Survivability - A site's ability to maintain 86 transport-layer sessions across failover and restoration 87 events. During a failover/restoration event, the transport-layer 88 session may detect packet loss or reordering, but neither 89 of these causes the transport-layer session to fail. 91 Load Sharing - The ability of a site to distribute both inbound 92 and outbound traffic across its upstream providers.
94 [RFC3582] notes that a multihoming solution may require interactions 95 with the routing subsystem. However, multihoming solutions must be 96 simple and scalable. They must not require excessive operational 97 effort and must not cause excessive routing table expansion. 99 [RFC6296] explains how a site can achieve address independence using 100 IPv6-to-IPv6 Network Prefix Translation (NPTv6). In order to achieve 101 address independence, the site assigns an inside address to each of 102 its resources (e.g., hosts). Nodes outside of the site identify 103 those same resources using a corresponding Provider Allocated (PA) 104 address. 106 The site resolves this addressing dichotomy by deploying an NPTv6 107 translator between itself and its upstream provider. The NPTv6 108 device maintains a static, one-to-one mapping between each inside 109 address and its corresponding PA address. That mapping persists 110 across flows and over time. 112 If the site disconnects from one upstream provider and connects to 113 another, it may lose its PA assignment. However, the site will not 114 need to renumber its resources. It will only need to reconfigure the 115 mapping rules on its local NPTv6 device. 117 Section 2.4 of [RFC6296] describes an NPTv6 architecture for sites 118 that are homed to multiple upstream providers. While that 119 architecture fulfils many of the goals identified by [RFC3582], it 120 does not achieve transport-layer survivability. This memo describes 121 an alternative architecture for multihomed sites that require 122 transport-layer survivability. It updates Section 2.4 of [RFC6296]. 124 2. NPTv6 Deployment 126 This section demonstrates how NPTv6 can be deployed in order to 127 achieve the goals of [RFC3582]. 129 2.1. 
Topology 131 Upstream Upstream 132 Provider #1 Provider #2 133 / \ / \ 134 / \ / \ 135 / +------+ +------+ \ 136 +------+ |Backup| |Backup| +------+ 137 | PE | | PE | | PE | | PE | 138 | #1 | | #1 | | #2 | | #2 | 139 +------+ +------+ +------+ +------+ 140 | | 141 | | 142 +------+ +------+ 143 |NPTv6 | |NPTv6 | 144 | #1 | | #2 | 145 +------+ +------+ 146 | | 147 | | 148 ------------------------------------------------------ 149 Internal Network 151 Figure 1: NPTv6 Multihomed Topology 153 In Figure 1, a site attaches all of its assets, including two NPTv6 154 translators, to an Internal Network. NPTv6 #1 is connected to 155 Provider Edge (PE) Router #1, which is maintained by Upstream 156 Provider #1. Likewise, NPTv6 #2 is connected to PE Router #2, which 157 is maintained by Upstream Provider #2. 159 Each upstream provider also maintains a Backup PE Router. A 160 forwarding tunnel connects the loopback interface of Backup PE Router 161 #1 to the outside interface of NPTv6 #2. Likewise, another 162 forwarding tunnel connects Backup PE Router #2 to NPTv6 #1. Network 163 operators can select from many encapsulation techniques (e.g., GRE) 164 to realize the forwarding tunnel. Tunnels are not depicted in 165 Figure 1. 167 In the figure, NPTv6 #1 and NPTv6 #2 are depicted as separate boxes. 168 While vendors may produce a separate box to support the NPTv6 169 function, they may also integrate the NPTv6 function into a router. 171 During periods of normal operation, each Backup PE router is very 172 lightly loaded. Therefore, a single Backup PE router may back up 173 multiple PE routers. Furthermore, the Backup PE router may be used 174 for other purposes (e.g., primary PE router for another customer). 176 2.2. Addressing 178 2.2.1. Upstream Provider Addressing 180 A Regional Internet Registry (RIR) allocates Provider Address Block 181 (PAB) #1 to Upstream Provider #1. From PAB #1, Upstream Provider #1 182 allocates two sub-blocks, using them as follows.
184 Upstream Provider #1 uses the first sub-block for its internal 185 address assignments. It also uses that sub-block for numbering both 186 ends of the interfaces between itself and its customers. 188 Upstream Provider #1 uses the second sub-block for address allocation 189 to its customers. We refer to a particular allocation from this sub-block 190 as a Customer Network Block (CNB). A CNB allocated for a 191 particular customer must be large enough to provide addressing for 192 the customer's entire Internal Network. In our example, Upstream 193 Provider #1 allocates a /60, called CNB #1, to its customer. 195 The customer configures translation rules that reference CNB #1 on 196 NPTv6 #1 and NPTv6 #2. This makes selected hosts that are connected 197 to the Internal Network accessible using CNB #1 addresses. See 198 Section 2.3 for details. 200 In a similar fashion, a Regional Internet Registry (RIR) allocates 201 PAB #2 to Upstream Provider #2. Upstream Provider #2, in turn, 202 allocates CNB #2 to the multihomed customer. 204 2.2.2. Site Addressing 206 The site obtains a Site Address Block (SAB), either from Unique Local 207 Address (ULA) [RFC4193] space, or by some other means. The SAB is as 208 large as all of the site's CNBs, combined. In this example, because 209 CNB #1 and CNB #2 are both /60s, the SAB is a /59. 211 The site divides its SAB into smaller blocks, with each block being 212 exactly as large as one CNB. It also associates each of the 213 resulting sub-blocks with one of its CNBs. In this example, the site 214 divides the SAB into a lower half and an upper half. It associates 215 the lower half of the SAB with CNB #1 and the upper half of the SAB 216 with CNB #2. 218 Finally, the site assigns one SAB address to each interface that is 219 connected to the Internal Network, including the inside interfaces 220 of the two NPTv6 translators. The site also assigns a SAB 221 address to the loopback interface of each NPTv6 translator.
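The SAB division described above can be sketched with a short Python fragment. All prefixes below are hypothetical examples (a ULA-style SAB and documentation-range CNBs), not allocations taken from this memo:

```python
import ipaddress

# Hypothetical SAB drawn from ULA space [RFC4193]; a /59 is exactly
# two /60s, matching the two /60 CNBs in the example above.
sab = ipaddress.ip_network("fd00:db8:1::/59")
lower_half, upper_half = sab.subnets(new_prefix=60)

# Hypothetical PA blocks: CNB #1 from Upstream Provider #1,
# CNB #2 from Upstream Provider #2.
cnb1 = ipaddress.ip_network("2001:db8:1000::/60")
cnb2 = ipaddress.ip_network("2001:db8:2000::/60")

# The site associates the lower half of the SAB with CNB #1 and
# the upper half with CNB #2.
associations = {lower_half: cnb1, upper_half: cnb2}
```

Because each SAB half is exactly the size of its CNB, a one-to-one prefix mapping (Section 2.3) is possible.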
During 222 periods of normal operation, interfaces that are assigned addresses 223 from the lower half of the SAB receive traffic through Upstream 224 Provider #1. Likewise, interfaces that are assigned addresses from 225 the upper half of the SAB receive traffic through Upstream Provider 226 #2. 228 Selected interfaces, because they receive a great deal of traffic, 229 must receive traffic through both upstream providers simultaneously. 230 Furthermore, those interfaces must control the portion of traffic 231 arriving through each upstream provider. The site assigns multiple 232 addresses to those interfaces, some from the lower half and others 233 from the upper half of the SAB. For any interface, the ratio of 234 upper half to lower half assignments roughly controls the portion of 235 traffic arriving through each upstream provider. See Section 2.3 and 236 Section 2.5 for details. 238 2.3. Address Translation 240 Both NPTv6 translators are configured with the following rules: 242 For outbound packets, if the first 60 bits of the source 243 address identify the lower half of the SAB, overwrite those 60 244 bits with the 60 bits that identify CNB #1 246 For outbound packets, if the first 60 bits of the source 247 address identify the upper half of the SAB, overwrite those 60 248 bits with the 60 bits that identify CNB #2 250 For outbound packets, if none of the conditions above are met, 251 either drop or pass the packet without translation, according 252 to local security policy 254 For inbound packets, if the first 60 bits of the destination 255 address identify CNB #1, overwrite those 60 bits with the 60 256 bits that identify the lower half of the SAB 258 For inbound packets, if the first 60 bits of the destination 259 address identify CNB #2, overwrite those 60 bits with the 60 260 bits that identify the upper half of the SAB 262 For inbound packets, if none of the conditions above are met, 263 either drop or pass the packet without translation, according 264 
to local security policy 266 Due to the nature of the rules described above, NPTv6 translation is 267 stateless. Therefore, traffic flows do not need to be symmetric 268 across NPTv6 translators. Furthermore, a traffic flow can shift from 269 one NPTv6 translator to another without causing transport-layer 270 session failure. 272 2.4. Domain Name System (DNS) 274 In order to make all site resources reachable by domain name 275 [RFC1034], the site publishes AAAA records [RFC3596] associating each 276 resource with all of its CNB addresses. While this DNS architecture 277 is sufficient, it is suboptimal. Traffic that both originates and 278 terminates within the site traverses NPTv6 translators needlessly. 279 Several optimizations are available. These optimizations are well 280 understood and have been applied to [RFC1918] networks for many 281 years. 283 2.5. Routing 285 Upstream Provider #1 uses an Interior Gateway Protocol to flood 286 topology information throughout its domain. It also uses BGP 287 [RFC4271] to distribute customer and peer reachability information. 289 PE #1 acquires a route to CNB #1 with NEXT-HOP equal to the outside 290 interface of NPTv6 #1. PE #1 can either learn this route from a 291 single-hop eBGP session with NPTv6 #1, or acquire it through static 292 configuration. In either case, PE #1 overwrites the NEXT-HOP of this 293 route with its own loopback address and distributes the route 294 throughout Upstream Provider #1 using iBGP. The LOCAL PREF for this 295 route is set high, so that the route will be preferred to alternative 296 routes to CNB #1. Upstream Provider #1 does not distribute this 297 route to CNB #1 outside of its own borders. 299 NPTv6 #1 acquires a default route with NEXT-HOP equal to the directly 300 connected interface on PE #1. NPTv6 #1 can either learn this route 301 from a single-hop eBGP session with PE #1, or acquire it through 302 static configuration. 
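The rules of Section 2.3 reduce to a stateless overwrite of the first 60 bits of an address. A minimal sketch follows (prefixes are hypothetical; for brevity it omits the checksum-neutral adjustment that [RFC6296] requires of a real NPTv6 translator):

```python
import ipaddress

SAB_LOWER = ipaddress.ip_network("fd00:db8:1::/60")     # lower half of the SAB
SAB_UPPER = ipaddress.ip_network("fd00:db8:1:10::/60")  # upper half of the SAB
CNB1 = ipaddress.ip_network("2001:db8:1000::/60")       # hypothetical CNB #1
CNB2 = ipaddress.ip_network("2001:db8:2000::/60")       # hypothetical CNB #2

HOST_MASK = (1 << (128 - 60)) - 1  # the low 68 bits are never rewritten

def overwrite_prefix(addr, net):
    """Replace the first 60 bits of addr with the first 60 bits of net."""
    return ipaddress.ip_address(int(net.network_address) | (int(addr) & HOST_MASK))

def translate_outbound(src):
    if src in SAB_LOWER:
        return overwrite_prefix(src, CNB1)
    if src in SAB_UPPER:
        return overwrite_prefix(src, CNB2)
    return None  # drop or pass untranslated, per local security policy

def translate_inbound(dst):
    if dst in CNB1:
        return overwrite_prefix(dst, SAB_LOWER)
    if dst in CNB2:
        return overwrite_prefix(dst, SAB_UPPER)
    return None  # drop or pass untranslated, per local security policy
```

Because the mapping is a pure function of the address and is identical on both translators, a flow can shift between translators without either of its addresses changing.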
304 Similarly, Backup PE #1 acquires a route to CNB #1 with NEXT-HOP 305 equal to the outside interface of NPTv6 #2. Backup PE #1 can either 306 learn this route from a multi-hop eBGP session with NPTv6 #2, or 307 acquire it through static configuration. In either case, Backup PE 308 #1 overwrites the NEXT-HOP of this route with its own loopback 309 address and distributes the route throughout Upstream Provider #1 310 using iBGP. Distribution procedures are defined in 311 [I-D.ietf-idr-best-external]. The LOCAL PREF for this route is set 312 low, so that the route will not be preferred to alternative routes to 313 CNB #1. Upstream Provider #1 does not distribute this route to CNB 314 #1 outside of its own borders. 316 Even if Backup PE #1 maintains an eBGP session with NPTv6 #2, it does not 317 advertise the default route through that eBGP session. Therefore, even during 318 failures, Backup PE #1 does not attract outbound traffic to itself. 320 Finally, the Autonomous System Border Routers (ASBRs) contained by 321 Upstream Provider #1 maintain eBGP sessions with their peers. The 322 ASBRs advertise only PAB #1 through those eBGP sessions. Upstream 323 Provider #1 does not advertise any of the following to its eBGP 324 peers: 326 any prefix that is contained by PAB #1 (i.e., more specific) 328 PAB #2 or any part thereof 330 the SAB or any part thereof 332 Upstream Provider #2 is configured in a manner analogous to that 333 described above. 335 2.6. Failure Detection and Recovery 337 When PE #1 loses its route to CNB #1, it withdraws its iBGP 338 advertisement for that prefix from Upstream Provider #1. The route 339 advertised by Backup PE #1 remains, and Backup PE #1 attracts traffic 340 bound for CNB #1 to itself. Backup PE #1 forwards that traffic 341 through the tunnel to NPTv6 #2. NPTv6 #2 performs translations and 342 delivers the traffic to the Internal Network.
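The route preference that drives this failover can be modeled abstractly. The sketch below is a deliberately simplified view of BGP route selection (LOCAL PREF comparison only; all names and values are illustrative, not a full [RFC4271] decision process):

```python
# Among routes for CNB #1 inside Upstream Provider #1, the route with
# the highest LOCAL PREF wins. Traffic follows PE #1 while its route
# exists, and falls back to Backup PE #1 (the tunnel path) when the
# PE #1 advertisement is withdrawn.

def best_route(routes):
    """Pick the route with the highest LOCAL PREF (ties not modeled)."""
    return max(routes, key=lambda r: r["local_pref"]) if routes else None

pe1 = {"next_hop": "PE #1 loopback", "local_pref": 200}               # set high
backup_pe1 = {"next_hop": "Backup PE #1 loopback", "local_pref": 50}  # set low

rib = [pe1, backup_pe1]
assert best_route(rib) is pe1        # normal operation: via PE #1

rib.remove(pe1)                      # PE #1 withdraws its advertisement
assert best_route(rib) is backup_pe1 # failover: via Backup PE #1 and the tunnel
```

Restoration is the mirror image: once PE #1 re-advertises its high-preference route, traffic leaves the tunnel path again.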
344 Likewise, when NPTv6 #1 loses its default route, it makes itself 345 unavailable as a gateway for other hosts on the Internal Network. 346 NPTv6 #2 attracts all outbound traffic to itself and forwards that 347 traffic through Upstream Provider #2. The mechanism by which NPTv6 348 #1 makes itself unavailable as a gateway is beyond the scope of this 349 document. 351 If PE #1 maintains a single-hop eBGP session with NPTv6 #1, the 352 failure of that eBGP session will cause both routes mentioned above 353 to be lost. Otherwise, another failure detection mechanism such as 354 BFD [RFC5881] is required. 356 Regardless of the failure detection mechanism, inbound traffic 357 traverses the tunnel only during failure periods and outbound traffic 358 never traverses the tunnel. Furthermore, restoration is localized. 359 As soon as the advertisement for CNB #1 is withdrawn throughout 360 Upstream Provider #1, restoration is complete. 362 Transport-layer connections survive failure/recovery events because 363 both NPTv6 translators implement identical translation rules. When a 364 traffic flow shifts from one translator to another, neither the 365 source address nor the destination address changes. 367 2.7. Load Balancing 369 In the architecture described above, site addressing determines load 370 balancing. If a host is numbered from the lower half of the SAB, its 371 address is mapped to CNB #1, which is announced only by Upstream 372 Provider #1 (as part of PAB #1). Therefore, during periods of normal 373 operation, all traffic bound for that host traverses Upstream 374 Provider #1 and NPTv6 #1. Likewise, if a host is numbered from the 375 upper half of the SAB, its address is mapped to CNB #2, which is 376 announced only by Upstream Provider #2 (as part of PAB #2). 377 Therefore, during periods of normal operation, all traffic bound for 378 that host traverses Upstream Provider #2 and NPTv6 #2.
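Under the assumption that resolvers choose roughly uniformly among a host's published AAAA records, the inbound load split described above can be illustrated as follows (all prefixes and addresses are hypothetical):

```python
import ipaddress

CNB1 = ipaddress.ip_network("2001:db8:1000::/60")  # announced via Provider #1
CNB2 = ipaddress.ip_network("2001:db8:2000::/60")  # announced via Provider #2

def inbound_provider(dst):
    """Which upstream provider carries inbound traffic bound for dst."""
    if dst in CNB1:
        return 1
    if dst in CNB2:
        return 2
    raise ValueError("not a site CNB address")

# A heavily loaded host publishes three AAAA records: two mapped through
# CNB #1 and one through CNB #2, steering roughly 2/3 of new inbound
# flows through Upstream Provider #1.
aaaa = [ipaddress.ip_address(a) for a in
        ("2001:db8:1000::a", "2001:db8:1000::b", "2001:db8:2000::a")]
share_p1 = sum(inbound_provider(a) == 1 for a in aaaa) / len(aaaa)
```

The 2:1 record ratio here is the knob Section 2.2.2 describes: changing the mix of lower-half and upper-half addresses published for a host changes the expected inbound split.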
380 Hosts that receive a great quantity of traffic can be assigned 381 multiple addresses, with some from the lower half and others from the 382 upper half of the SAB. The address chosen for any particular flow 383 determines the path of inbound traffic for that flow. For flows 384 initiated outside of the Internal Network, the site influences the 385 probability that a particular address will be used by manipulating 386 the type and number of PAB addresses advertised in DNS. 388 3. Discussion 390 This section discusses the merits of the proposed architecture, as 391 compared with other multihoming approaches [I-D.ietf-lisp] 392 [I-D.rja-ilnp-intro]. The following are benefits of the proposed 393 architecture: 395 Address mapping information is required only at the NPTv6 396 translator. There is no need to distribute mapping information 397 beyond the boundaries of the multihomed site. 399 Because only a small number of mapping rules are required at 400 each multihomed site, there is no need to cache these rules. 402 During periods of normal operation, packets do not need to be 403 encapsulated. Inbound traffic traverses a tunnel only during 404 failure periods and outbound traffic never traverses a tunnel. 406 The proposal can be realized using a wide variety of existing 407 encapsulation methods. It does not require a new encapsulation 408 method. 410 The failover/restoration mechanism is localized to a single 411 autonomous system. Once updated routing information has been 412 distributed throughout the autonomous system, the failover/restoration 413 event is complete. 415 Benefit can be derived from incremental, partial and even 416 minimal deployment. 418 The cost of the solution is borne by its beneficiaries (i.e., 419 primarily the multihomed site and secondarily the multihomed 420 site's upstream provider). 422 The following are disadvantages of the proposed architecture: 424 By modifying IPv6 addresses, this architecture violates the 425 end-to-end principle.
427 The load balancing capabilities described in this memo may not 428 suffice for all sites. Those sites might be required to fall 429 back upon other load balancing solutions (e.g., advertising 430 multiple prefixes). 432 The time required to redistribute traffic from one path to 433 another is determined by DNS TTL. 435 4. IANA Considerations 437 This document requires no IANA actions. 439 5. Security Considerations 441 As with any architecture that modifies source and destination 442 addresses, the operation of access control lists, firewalls and 443 intrusion detection systems may be impacted. Also, many users may 444 confuse NPTv6 translation with a NAT. Two limitations of NAT are 445 that a) it does not support incoming connections without special 446 configuration and b) it requires symmetric routing across the NAT 447 device. Many users understand these limitations to be security 448 features. Because NPTv6 has neither of these limitations, it also 449 offers neither of these features. 451 6. Acknowledgements 453 Thanks to John Scudder and Yakov Rekhter for their helpful comments, 454 encouragement and support. Special thanks to Johann Jonsson, James 455 Piper, Ravinder Wali, Ashte Collins, Inga Rollins and an anonymous 456 donor, without whom this memo would not have been written. 458 7. References 460 7.1. Normative References 462 [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", 463 STD 13, RFC 1034, November 1987. 465 [RFC1918] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and 466 E. Lear, "Address Allocation for Private Internets", 467 BCP 5, RFC 1918, February 1996. 469 [RFC3582] Abley, J., Black, B., and V. Gill, "Goals for IPv6 470 Site-Multihoming Architectures", RFC 3582, August 2003. 472 [RFC3596] Thomson, S., Huitema, C., Ksinant, V., and M. Souissi, 473 "DNS Extensions to Support IP Version 6", RFC 3596, 474 October 2003. 476 [RFC4193] Hinden, R. and B.
Haberman, "Unique Local IPv6 Unicast 477 Addresses", RFC 4193, October 2005. 479 [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway 480 Protocol 4 (BGP-4)", RFC 4271, January 2006. 482 [RFC5881] Katz, D. and D. Ward, "Bidirectional Forwarding Detection 483 (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, 484 June 2010. 486 [RFC6296] Wasserman, M. and F. Baker, "IPv6-to-IPv6 Network Prefix 487 Translation", RFC 6296, June 2011. 489 7.2. Informative References 491 [I-D.ietf-idr-best-external] 492 Marques, P., Fernando, R., Chen, E., Mohapatra, P., and H. 493 Gredler, "Advertisement of the best external route in 494 BGP", draft-ietf-idr-best-external-04 (work in progress), 495 April 2011. 497 [I-D.ietf-lisp] 498 Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, 499 "Locator/ID Separation Protocol (LISP)", 500 draft-ietf-lisp-15 (work in progress), July 2011. 502 [I-D.rja-ilnp-intro] 503 Atkinson, R., "ILNP Concept of Operations", 504 draft-rja-ilnp-intro-11 (work in progress), July 2011. 506 Authors' Addresses 508 Ron Bonica (editor) 509 Juniper Networks 510 Sterling, Virginia 20164 511 USA 513 Email: rbonica@juniper.net 515 Fred Baker 516 Cisco Systems 517 Santa Barbara, California 93117 518 USA 520 Email: fred@cisco.com 522 Margaret Wasserman 523 Painless Security 524 356 Abbott Street 525 North Andover, MA 01845 526 USA 528 Phone: +1 781 405 7464 529 Email: mrw@painless-security.com 530 URI: http://www.painless-security.com