PCE Working Group                                           S. Litkowski
Internet-Draft                                                     Cisco
Intended status: Standards Track                            S. Sivabalan
Expires: 23 October 2022                               Ciena Corporation
                                                                   C. Li
                                                                H. Zheng
                                                     Huawei Technologies
                                                           21 April 2022


Inter Stateful Path Computation Element (PCE) Communication Procedures.
                      draft-ietf-pce-state-sync-02

Abstract

   The Path Computation Element Communication Protocol (PCEP) provides
   mechanisms for Path Computation Elements (PCEs) to perform path
   computation in response to a Path Computation Client (PCC) request.
   The Stateful PCE extensions allow stateful control of Multi-Protocol
   Label Switching (MPLS) Traffic Engineering (TE) Label Switched Paths
   (LSPs) using PCEP.

   A Path Computation Client (PCC) can synchronize LSP state
   information to a Stateful Path Computation Element (PCE).  A PCC can
   have multiple PCEP sessions towards multiple PCEs.  There are some
   use cases where inter-PCE stateful communication can bring
   additional resiliency to the design, for instance when a PCC-PCE
   session fails.

   This document describes the procedures to allow stateful
   communication between PCEs for various use cases and also the
   procedures to prevent computation loops.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 23 October 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Revised
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.

Table of Contents

   1.  Introduction and Problem Statement
     1.1.  Requirements Language
     1.2.  Reporting LSP Changes
     1.3.  Split-Brain
     1.4.  Applicability to H-PCE
   2.  Proposed solution
     2.1.  State-sync session
     2.2.  Primary/Secondary relationship between PCE
   3.  Procedures and Protocol Extensions
     3.1.  Opening a state-sync session
       3.1.1.  Capability Advertisement
     3.2.  State synchronization
     3.3.  Incremental updates and report forwarding rules
     3.4.  Maintaining LSP states from different sources
     3.5.  Computation priority between PCEs and sub-delegation
     3.6.  Passive stateful procedures
     3.7.  PCE initiation procedures
   4.  Examples
     4.1.  Example 1
     4.2.  Example 2
     4.3.  Example 3
   5.  Using Primary/Secondary Computation and State-sync Sessions to
       increase Scaling
   6.  PCEP-PATH-VECTOR TLV
   7.  Security Considerations
   8.  Acknowledgements
   9.  IANA Considerations
     9.1.  PCEP-Error Object
     9.2.  PCEP TLV Type Indicators
     9.3.  STATEFUL-PCE-CAPABILITY TLV
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Contributors
   Authors' Addresses

1.  Introduction and Problem Statement

   The Path Computation Element communication Protocol (PCEP) [RFC5440]
   provides mechanisms for Path Computation Elements (PCEs) to perform
   path computations in response to Path Computation Clients' (PCCs)
   requests.

   A stateful PCE [RFC8231] is capable of considering, for the purposes
   of path computation, not only the network state in terms of links and
   nodes (referred to as the Traffic Engineering Database or TED) but
   also the status of active services (previously computed paths and
   currently reserved resources, stored in the Label Switched Paths
   Database (LSP-DB)).

   [RFC8051] describes general considerations for a stateful PCE
   deployment and examines its applicability and benefits, as well as
   its challenges and limitations, through a number of use cases.

   A PCC can synchronize LSP state information to a Stateful PCE.
   The stateful PCE extension allows a redundancy scenario where a PCC
   can have redundant PCEP sessions towards multiple PCEs.  In such a
   case, a PCC gives control of an LSP to only a single PCE, and only
   that PCE is responsible for path computation for this delegated LSP.

   There are some use cases where inter-PCE stateful communication
   can bring additional resiliency to the design, for instance when a
   PCC-PCE session fails.  The inter-PCE stateful communication may also
   provide a faster update of the LSP states when such an event occurs.
   Finally, when, in a redundant PCE scenario, there is a need to
   compute a set of paths that are part of a group (so there is a
   dependency between the paths), there may be some cases where the
   computation of all paths in the group is not handled by the same PCE:
   this situation is called a split-brain.  This split-brain scenario
   may lead to computation loops between PCEs or suboptimal path
   computation.

   This document describes the procedures to allow stateful
   communication between PCEs for various use cases and also the
   procedures to prevent computation loops.

   Further, the examples in this section are for illustrative purposes,
   to showcase the need for inter-PCE stateful PCEP sessions.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

1.2.  Reporting LSP Changes

   When using a stateful PCE ([RFC8231]), a PCC can synchronize LSP
   state information to the stateful PCE.  If the PCC grants control
   of the LSP to the PCE (called delegation [RFC8231]), the PCE can
   update the LSP parameters at any time.

   In a multi-PCE deployment (redundancy, load balancing, ...), with the
   current specification defined in [RFC8231], when a PCE makes an
   update, it is the PCC that is in charge of reporting the LSP
   parameter change to all PCEs, which adds hops and delay in notifying
   the overall network of the change.

   This delay may affect the reaction time of the other PCEs if they
   need to take action after being notified of the LSP parameter change.

   Apart from the synchronization from the PCC, it is also useful to
   have a synchronization mechanism between the stateful PCEs.  As a
   stateful PCE makes changes to its delegated LSPs, these changes
   (pending LSPs and the sticky resources [RFC7399]) can be synchronized
   immediately to the other PCEs.

                         +----------+
                         |   PCC1   |  LSP1
                         +----------+
                           /      \
                          /        \
                  +---------+    +---------+
                  |  PCE1   |    |  PCE2   |
                  +---------+    +---------+
                          \        /
                           \      /
                         +----------+
                         |   PCC2   |  LSP2
                         +----------+

   In the figure above, we consider a load-balanced PCE architecture, so
   PCE1 is responsible for computing paths for PCC1 and PCE2 is
   responsible for computing paths for PCC2.  When PCE1 triggers an LSP
   update for LSP1, it sends a PCUpd message to PCC1 containing the new
   parameters for LSP1.  PCC1 will take the parameters into account and
   will send a PCRpt message to PCE1 and PCE2 reflecting the changes.
   PCE2 will thus be notified of the change only after receiving the
   PCRpt message from PCC1.

   Let's consider that the LSP1 parameters changed in such a way that
   LSP1 will take over resources from LSP2 with a higher priority.
   After receiving the report from PCC1, PCE2 will therefore try to find
   a new path for LSP2.
   If we consider that there is a round-trip delay of about 150
   milliseconds (ms) between the PCEs and PCC1 and a round-trip delay of
   10 ms between the two PCEs, it will take more than 150 ms for PCE2 to
   be notified of the change.

   Adding a PCEP session between PCE1 and PCE2 may reduce the
   synchronization time, so PCE2 can react more quickly by taking the
   pending LSPs and attached resources into account during path
   computation and re-optimization.

1.3.  Split-Brain

   In a resiliency case, a PCC has redundant PCEP sessions towards
   multiple PCEs.  In such a case, a PCC gives control of an LSP to a
   single PCE only, and only this PCE is responsible for the path
   computation for the delegated LSP: the PCC achieves this by setting
   the D flag only towards the active PCE [RFC8231] selected for
   delegation.  The election of the active PCE to delegate an LSP is
   controlled by each PCC.  The PCC usually elects the active PCE by a
   locally configured policy (by setting a priority).  Upon PCEP session
   failure, or active PCE failure, the PCC may decide to elect a new
   active PCE by sending a new PCRpt message with the D flag set to this
   new active PCE.  When the failed PCE or PCEP session comes back
   online, it is up to the implementation whether to do preemption.
   Preemption may disrupt the existing path if the two PCEs do not
   compute exactly the same path.  In a network with multiple PCCs and
   multiple stateful PCEs deployed for redundancy, there is no guarantee
   that, at any given time, all the PCCs delegate their LSPs to the same
   PCE.
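
   The election behaviour described above can be illustrated with a
   minimal sketch.  The Pcc class, its method names and the "preempt"
   switch are hypothetical and are not defined by PCEP; the sketch only
   assumes a locally configured priority list and no preemption, and
   shows how two PCCs with the same policy can end up delegating to
   different PCEs.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Pcc:
    """Hypothetical PCC that delegates to its most preferred reachable PCE."""
    name: str
    priority: list[str]                      # PCE names, most preferred first
    up: set[str] = field(default_factory=set)
    active: str | None = None                # PCE currently holding delegation
    preempt: bool = False                    # assumption: no preemption by default

    def session_up(self, pce: str) -> None:
        self.up.add(pce)
        # Re-elect only if nothing is delegated yet or preemption is enabled.
        if self.active is None or self.preempt:
            self._elect()

    def session_down(self, pce: str) -> None:
        self.up.discard(pce)
        if self.active == pce:
            self._elect()

    def _elect(self) -> None:
        candidates = [p for p in self.priority if p in self.up]
        self.active = candidates[0] if candidates else None


pcc1 = Pcc("PCC1", ["PCE1", "PCE2"])
pcc2 = Pcc("PCC2", ["PCE1", "PCE2"])
for pcc in (pcc1, pcc2):
    pcc.session_up("PCE1")
    pcc.session_up("PCE2")

pcc2.session_down("PCE1")   # PCC2 loses its session to PCE1
pcc2.session_up("PCE1")     # session returns, but PCC2 does not preempt

print(pcc1.active, pcc2.active)
```

   With these inputs PCC1 keeps delegating to PCE1 while PCC2 stays on
   PCE2 after the session recovers, which is the split delegation the
   following example describes.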

                         +----------+
                         |   PCC1   |  LSP1
                         +----------+
                           /      \
                          /        \
                  +---------+    +---------+
                  |  PCE1   |    |  PCE2   |
                  +---------+    +---------+
                          \        /
                   *fail*  \      /
                         +----------+
                         |   PCC2   |  LSP2
                         +----------+

   In the example above, we consider that, by configuration, both PCCs
   will first delegate their LSPs to PCE1.  So, PCE1 is responsible
   for computing a path for both LSP1 and LSP2.  If the PCEP session
   between PCC2 and PCE1 fails, PCC2 will delegate LSP2 to PCE2.  So
   PCE1 becomes responsible only for LSP1 path computation while PCE2 is
   responsible for the path computation of LSP2.  When the PCC2-PCE1
   session is back online, PCC2 will keep using PCE2 as the active PCE
   (we assume no preemption in this example).  The result is a
   permanent situation where each PCE is responsible for a subset of
   the path computations.

   This situation is called a split-brain scenario, as there are
   multiple computation brains running at the same time while a central
   computation unit was required in some deployments/use cases.

   Further, there are use cases where a particular LSP path computation
   is linked to another LSP path computation: the most common use case
   is path disjointness (see [RFC8800]).  The LSPs that depend on each
   other may start from different head-ends.

             _________________________________________
            /                                         \
           /        +------+            +------+       \
          |         | PCE1 |            | PCE2 |        |
          |         +------+            +------+        |
          |                                             |
          | +------+                           +------+ |
          | | PCC1 | ------------------------> | PCC2 | |
          | +------+                           +------+ |
          |                                             |
          |                                             |
          | +------+                           +------+ |
          | | PCC3 | ------------------------> | PCC4 | |
          | +------+                           +------+ |
          |                                             |
           \                                           /
            \_________________________________________/

             _________________________________________
            /                                         \
           /        +------+            +------+       \
          |         | PCE1 |            | PCE2 |        |
          |         +------+            +------+        |
          |                                             |
          | +------+           10              +------+ |
          | | PCC1 | ----- R1 ---- R2 -------- | PCC2 | |
          | +------+       |       |           +------+ |
          |                |       |                    |
          |                |       |                    |
          | +------+       |       |           +------+ |
          | | PCC3 | ----- R3 ---- R4 -------- | PCC4 | |
          | +------+                           +------+ |
          |                                             |
           \                                           /
            \_________________________________________/

   In the figure above, the requirement is to create two link-disjoint
   LSPs: PCC1->PCC2 and PCC3->PCC4.  In the topology, the cost metric of
   every link is set to 1 except for the link 'R1-R2', which has a
   metric of 10.  The PCEs are responsible for the path computation and
   PCE1 is the active primary PCE for all PCCs in the nominal case.

   Scenario 1:

   In the normal case (PCE1 as active primary PCE), consider that the
   PCC1->PCC2 LSP is configured first with the link disjointness
   constraint.  PCE1 sends a PCUpd message to PCC1 with the ERO:
   R1->R3->R4->R2->PCC2 (shortest path).  PCC1 signals and installs the
   path.  When PCC3->PCC4 is configured, PCE1 already knows the path
   of PCC1->PCC2 and can compute a link-disjoint path: the solution
   requires moving PCC1->PCC2 onto a new path to make room for the new
   LSP.  PCE1 sends a PCUpd message to PCC1 with the new ERO:
   R1->R2->PCC2 and a PCUpd to PCC3 with the following ERO:
   R3->R4->PCC4.  In the normal case, there is no issue for PCE1 to
   compute a link-disjoint path.
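
   The outcome of Scenario 1 can be checked with a small brute-force
   sketch.  The link list below is reconstructed from the figure and is
   an assumption, as are all function names; a real PCE would use a
   proper disjoint-path algorithm rather than path enumeration.

```python
from itertools import product

# Assumed topology from the figure: every metric is 1 except R1-R2 (10).
LINKS = {
    ("PCC1", "R1"): 1, ("R1", "R2"): 10, ("R2", "PCC2"): 1,
    ("PCC3", "R3"): 1, ("R3", "R4"): 1, ("R4", "PCC4"): 1,
    ("R1", "R3"): 1, ("R2", "R4"): 1,
}
COST = {frozenset(pair): metric for pair, metric in LINKS.items()}
NEIGHBORS = {}
for a, b in LINKS:
    NEIGHBORS.setdefault(a, set()).add(b)
    NEIGHBORS.setdefault(b, set()).add(a)


def simple_paths(src, dst, seen=()):
    """Yield every loop-free node sequence from src to dst."""
    seen = seen + (src,)
    if src == dst:
        yield seen
        return
    for nxt in sorted(NEIGHBORS[src]):
        if nxt not in seen:
            yield from simple_paths(nxt, dst, seen)


def links_of(path):
    """Return the set of undirected links traversed by a node sequence."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def cost(path):
    return sum(COST[link] for link in links_of(path))


def shortest(src, dst):
    return min(simple_paths(src, dst), key=cost)


def best_disjoint_pair(lsp_a, lsp_b):
    """Cheapest pair of link-disjoint paths for two (src, dst) LSPs."""
    pairs = [
        (pa, pb)
        for pa, pb in product(simple_paths(*lsp_a), simple_paths(*lsp_b))
        if not links_of(pa) & links_of(pb)
    ]
    return min(pairs, key=lambda pair: cost(pair[0]) + cost(pair[1]))


first = shortest("PCC1", "PCC2")
both = best_disjoint_pair(("PCC1", "PCC2"), ("PCC3", "PCC4"))
print(first)
print(both)
```

   Under these assumptions the first LSP alone follows
   PCC1, R1, R3, R4, R2, PCC2, and the only link-disjoint pair moves it
   onto R1-R2 so that PCC3->PCC4 can use R3-R4, matching the EROs given
   in the text.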

   Scenario 2:

   Consider that PCC1 has lost its PCEP session with PCE1 (all other
   PCEP sessions are UP).  PCC1 delegates its LSP to PCE2.

                         +----------+
                         |   PCC1   |  LSP: PCC1->PCC2
                         +----------+
                                  \
                                   \ D=1
                  +---------+    +---------+
                  |  PCE1   |    |  PCE2   |
                  +---------+    +---------+
                      D=1 \        / D=0
                           \      /
                         +----------+
                         |   PCC3   |  LSP: PCC3->PCC4
                         +----------+

   Consider that the PCC1->PCC2 LSP is configured first with the link
   disjointness constraint.  PCE2 (which is the new active primary PCE
   for PCC1) sends a PCUpd message to PCC1 with the ERO:
   R1->R3->R4->R2->PCC2 (shortest path).  When PCC3->PCC4 is configured,
   PCE1 is not aware of LSPs from PCC1 any more, so it cannot compute a
   disjoint path for PCC3->PCC4 and will send a PCUpd message to PCC3
   with the shortest path ERO: R3->R4->PCC4.  When the PCC3->PCC4 LSP is
   reported to PCE2 by PCC3, PCE2 will ensure disjointness
   computation and will correctly move PCC1->PCC2 (as it owns delegation
   for this LSP) onto the following path: R1->R2->PCC2.  With this
   sequence of events and these PCEP sessions, disjointness is ensured.

   Scenario 3:

                         +----------+
                         |   PCC1   |  LSP: PCC1->PCC2
                         +----------+
                           /      \
                      D=1 /        \ D=0
                  +---------+    +---------+
                  |  PCE1   |    |  PCE2   |
                  +---------+    +---------+
                                   / D=1
                                  /
                         +----------+
                         |   PCC3   |  LSP: PCC3->PCC4
                         +----------+

   Consider the above PCEP sessions, with the PCC1->PCC2 LSP configured
   first with the link disjointness constraint.  PCE1 computes the
   shortest path as it is the only LSP in the disjoint association group
   that it is aware of: R1->R3->R4->R2->PCC2 (shortest path).  When
   PCC3->PCC4 is configured, PCE2 must compute a disjoint path for this
   LSP.  The only solution found is to move the PCC1->PCC2 LSP onto
   another path, but PCE2 cannot do it as it does not have delegation
   for this LSP.  In this set-up, PCEs are not able to find a disjoint
   path.

   Scenario 4:

                         +----------+
                         |   PCC1   |  LSP: PCC1->PCC2
                         +----------+
                           /      \
                      D=1 /        \ D=0
                  +---------+    +---------+
                  |  PCE1   |    |  PCE2   |
                  +---------+    +---------+
                      D=0 \        / D=1
                           \      /
                         +----------+
                         |   PCC3   |  LSP: PCC3->PCC4
                         +----------+

   Consider the above PCEP sessions and that PCEs are configured to
   fall back to the shortest path if disjointness cannot be found, as
   described in [RFC8800].  The PCC1->PCC2 LSP is configured first.
   PCE1 computes the shortest path as it is the only LSP in the disjoint
   association group that it is aware of: R1->R3->R4->R2->PCC2 (shortest
   path).  When PCC3->PCC4 is configured, PCE2 must compute a disjoint
   path for this LSP.  The only solution found is to move the PCC1->PCC2
   LSP onto another path, but PCE2 cannot do it as it does not have
   delegation for this LSP.  PCE2 then provides the shortest path for
   PCC3->PCC4: R3->R4->PCC4.  When PCC3 receives the ERO, it reports it
   back to both PCEs.  When PCE1 becomes aware of the PCC3->PCC4 path,
   it reruns the constrained shortest path first (CSPF) algorithm and
   provides a new path for PCC1->PCC2: R1->R2->PCC2.  The new path is
   reported back to all PCEs by PCC1.  PCE2 also reruns CSPF to take
   into account the new reported path.  The new computation does not
   lead to any path update.

   Scenario 5:

             _____________________________________
            /                                     \
           /      +------+          +------+       \
          |       | PCE1 |          | PCE2 |        |
          |       +------+          +------+        |
          |                                         |
          | +------+          100          +------+ |
          | |      | --------------------- |      | |
          | | PCC1 | ----- R1 ------------ | PCC2 | |
          | +------+       |               +------+ |
          |    |           |                  |     |
          |  6 |           | 2                | 2   |
          |    |           |                  |     |
          | +------+       |               +------+ |
          | | PCC3 | ----- R3 ------------ | PCC4 | |
          | +------+              10       +------+ |
          |                                         |
           \                                       /
            \_____________________________________/

   Now, consider a new network topology with the same PCEP sessions as
   the previous example.
   Suppose that both LSPs are configured almost at the same time.  PCE1
   will compute a path for PCC1->PCC2 while PCE2 will compute a path for
   PCC3->PCC4.  As each PCE is not aware of the path of the second LSP
   in the association group (not reported yet), each PCE computes the
   shortest path for its LSP.  PCE1 computes ERO: R1->PCC2 for
   PCC1->PCC2 and PCE2 computes ERO: R3->R1->PCC2->PCC4 for PCC3->PCC4.
   When these shortest paths are reported to each PCE, each PCE
   recomputes disjointness.  PCE1 will provide a new path for PCC1->PCC2
   with ERO: PCC1->PCC2.  PCE2 will also provide a new path for
   PCC3->PCC4 with ERO: R3->PCC4.  When those new paths are reported to
   both PCEs, this will trigger CSPF again.  PCE1 will provide a new,
   more optimal path for PCC1->PCC2 with ERO: R1->PCC2 and PCE2 will
   also provide a more optimal path for PCC3->PCC4 with ERO:
   R3->R1->PCC2->PCC4.  So we come back to the initial state.  When
   those paths are reported to both PCEs, this will trigger CSPF again.
   An infinite loop of CSPF computation then occurs, with a permanent
   flap of paths, because of the split-brain situation.

   This permanent computation loop comes from the inconsistency between
   the state of the LSPs as seen by each PCE due to the split-brain:
   each PCE is trying to modify its delegated path at the same time,
   based on the last received path information, which de facto
   invalidates this received path information.
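
   The oscillation in Scenario 5 can be reproduced with a minimal
   sketch.  The candidate path lists below are taken from the EROs in
   the text, ordered from most to least preferred; the function names,
   the round-based timing model and the "simultaneous" switch are
   assumptions made purely for illustration.

```python
def link_set(*hops):
    """Set of undirected links along a hop sequence."""
    return frozenset(frozenset(pair) for pair in zip(hops, hops[1:]))


# Candidate paths per LSP, most preferred (shortest) first.
CANDIDATES = {
    "PCC1->PCC2": [link_set("PCC1", "R1", "PCC2"),
                   link_set("PCC1", "PCC2")],
    "PCC3->PCC4": [link_set("PCC3", "R3", "R1", "PCC2", "PCC4"),
                   link_set("PCC3", "R3", "PCC4")],
}


def recompute(lsp, other_path):
    """Most preferred candidate that shares no link with the other LSP."""
    for candidate in CANDIDATES[lsp]:
        if not candidate & other_path:
            return candidate
    return CANDIDATES[lsp][0]  # fall back to the shortest path


def run(simultaneous, max_rounds=10):
    """Return the sequence of (LSP1 path, LSP2 path) states, one per round.

    simultaneous=True models two PCEs that each react to the last
    reported path of the other LSP; False models a single PCE that
    computes both LSPs in turn.
    """
    a, b = "PCC1->PCC2", "PCC3->PCC4"
    path_a, path_b = CANDIDATES[a][0], CANDIDATES[b][0]
    history = [(path_a, path_b)]
    for _ in range(max_rounds):
        new_a = recompute(a, path_b)
        new_b = recompute(b, path_a if simultaneous else new_a)
        path_a, path_b = new_a, new_b
        history.append((path_a, path_b))
        if history[-1] in history[:-1]:
            break  # a state repeated: either stable or looping
    return history


def converged(history):
    """True when the final round left both paths unchanged."""
    return history[-1] == history[-2]


split_brain = run(simultaneous=True)
single_pce = run(simultaneous=False)
print(converged(split_brain), converged(single_pce))
```

   In this model the split-brain run returns to its initial state after
   two rounds and never settles, whereas the single-PCE run reaches a
   stable pair of disjoint paths.  The sketch only demonstrates
   stability; it does not claim that the stable placement is optimal.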

   Scenario 6: multi-domain

             Domain/Area 1            Domain/Area 2
            ________________        ________________
           /                \      /                \
          /      +------+    |    |    +------+      \
         |       | PCE1 |    |    |    | PCE3 |       |
         |       +------+    |    |    +------+       |
         |                   |    |                   |
         |       +------+    |    |    +------+       |
         |       | PCE2 |    |    |    | PCE4 |       |
         |       +------+    |    |    +------+       |
         |                   |    |                   |
         |       +------+    |    |    +------+       |
         |       | PCC1 |    |    |    | PCC2 |       |
         |       +------+    |    |    +------+       |
         |                   |    |                   |
         |                   |    |                   |
         |       +------+    |    |    +------+       |
         |       | PCC3 |    |    |    | PCC4 |       |
         |       +------+    |    |    +------+       |
          \                  |    |                  /
           \_________________/    \_________________/

   In the example above, suppose that the disjoint LSPs from PCC1 to
   PCC2 and from PCC4 to PCC3 are created.  All the PCEs have
   knowledge of both domain topologies (e.g. using BGP-LS [RFC7752]).
   For operation/management reasons, each domain uses its own group of
   redundant PCEs.  PCE1/PCE2 in domain 1 have PCEP sessions with PCC1
   and PCC3 while PCE3/PCE4 in domain 2 have PCEP sessions with PCC2 and
   PCC4.  As PCE1/2 do not know about LSPs from PCC2/4 and PCE3/4 do
   not know about LSPs from PCC1/3, there is no possibility to compute
   the disjointness constraint.  This scenario can also be seen as a
   split-brain scenario.  This multi-domain architecture (with multiple
   groups of PCEs) can also be used in a single domain, where an
   operator wants to limit the failure domain by creating multiple
   groups of PCEs, each maintaining a subset of PCCs.  As in the
   multi-domain example, there will be no possibility to compute
   disjoint paths starting from head-ends managed by different PCE
   groups.

   In this document, we propose a solution that makes it possible to
   compute LSP association based constraints (like disjointness) in
   split-brain scenarios while preventing computation loops.

1.4.  Applicability to H-PCE

   [RFC8751] describes general considerations and use cases for the
   deployment of Stateful PCE(s) using the Hierarchical PCE [RFC6805]
   architecture.  In this architecture, there is a clear need to
   communicate between a child stateful PCE and a parent stateful PCE.
   The procedures and extensions as described in Section 3 are equally
   applicable to the H-PCE scenario.

2.  Proposed solution

   Our solution is based on:

   *  The creation of the inter-PCE stateful PCEP session with specific
      procedures.

   *  A Primary/Secondary relationship between PCEs.

2.1.  State-sync session

   This document proposes to set up a PCEP session between the stateful
   PCEs.  Creating such a session is already authorized by multiple
   scenarios like the ones described in [RFC4655] (multiple PCEs that
   are handling part of the path computation) and [RFC6805]
   (hierarchical PCE), but these only focused on stateless PCEP
   sessions.  As a stateful PCE brings additional features (LSP state
   synchronization, path update, delegation, ...), some new behaviors
   need to be defined.

   This inter-PCE PCEP session will allow the exchange of LSP states
   between PCEs, which helps in scenarios where PCEP sessions are
   lost between PCC and PCE.  This inter-PCE PCEP session is henceforth
   called a state-sync session.

   For example, in the scenario below, there is no possibility to
   compute disjointness as there is no PCE that is aware of both LSPs.

                         +----------+
                         |   PCC1   |  LSP: PCC1->PCC2
                         +----------+
                           /
                      D=1 /
                  +---------+    +---------+
                  |  PCE1   |    |  PCE2   |
                  +---------+    +---------+
                                   / D=1
                                  /
                         +----------+
                         |   PCC3   |  LSP: PCC3->PCC4
                         +----------+

   If we add a state-sync session, PCE1 will be able to do state
   synchronization via PCRpt messages for its LSP to PCE2 and PCE2 will
   do the same.  All the PCEs will be aware of all LSPs even if a
   PCC->PCE session is down.
   PCEs will then be able to compute disjoint paths.

                         +----------+
                         |   PCC1   |  LSP : PCC1->PCC2
                         +----------+
                           /
                      D=1 /
                  +---------+ PCEP +---------+
                  |  PCE1   | ---- |  PCE2   |
                  +---------+      +---------+
                                     / D=1
                                    /
                         +----------+
                         |   PCC3   |  LSP : PCC3->PCC4
                         +----------+

   The procedures associated with this state-sync session are defined in
   Section 3.

   Adding this state-sync session alone does not ensure that a
   path with LSP association based constraints can always be computed
   and does not prevent the computation loop, but it increases
   resiliency and ensures that PCEs will have the state information for
   all LSPs.  Also, this session allows a PCE to update the
   other PCEs, providing a faster synchronization mechanism than relying
   on PCCs only.

2.2.  Primary/Secondary relationship between PCE

   As seen in Section 1, performing a path computation in a split-brain
   scenario (multiple PCEs responsible for computation) may provide a
   non-optimal LSP placement, no path, or computation loops.  To provide
   the best efficiency, an LSP association constraint-based computation
   requires that a single PCE performs the path computation for all LSPs
   in the association group.  Note that it could be all LSPs belonging
   to a particular association group, or all LSPs from a particular PCC,
   or all LSPs in the network that need to be delegated to a single PCE,
   depending on the deployment scenario.

   This document proposes to add a priority mechanism between PCEs to
   elect a single computing PCE.  Using this priority mechanism, PCEs
   can agree on the PCE that will be responsible for the computation for
   a particular association group, or set of LSPs.  The priority could
   be set per association, per PCC, or for all LSPs.  How this priority
   is set or advertised is out of the scope of this document.  The rest
   of the text considers the association group as an example.
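
   A minimal sketch of such an election follows.  Since the way the
   priority is set or advertised is out of scope, everything in it is
   an assumption: the function name, the rule that a larger number
   means a more preferred PCE, and the tie-break on the PCE name are
   hypothetical choices, not behaviour defined here.

```python
from __future__ import annotations


def elect_computing_pce(priorities: dict[str, int],
                        reachable: set[str]) -> str | None:
    """Pick the computing PCE for one association group.

    priorities maps PCE name to its priority for the group (assumed:
    higher is more preferred); reachable is the set of PCEs currently
    usable.  Ties are broken by name so that every PCE evaluating the
    same inputs reaches the same answer.
    """
    candidates = [name for name in priorities if name in reachable]
    if not candidates:
        return None
    return min(candidates, key=lambda name: (-priorities[name], name))


group_priorities = {"PCE1": 200, "PCE2": 100}
print(elect_computing_pce(group_priorities, {"PCE1", "PCE2"}))
print(elect_computing_pce(group_priorities, {"PCE2"}))
```

   The point of the deterministic tie-break is that each PCE can run
   the election locally and still agree on a single computing PCE.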

   When a single PCE is performing the computation for a particular
   association group, no computation loop can happen and an optimal
   placement will be provided.  The other PCEs will only act as state
   collectors and forwarders.

   In the scenario described in Section 2.1, PCE1 and PCE2 will decide
   that PCE1 will be responsible for the path computation of both LSPs.
   If we first configure PCC1->PCC2, PCE1 computes the shortest path as
   it is the only LSP in the disjoint-group that it is aware of:
   R1->R3->R4->R2->PCC2 (shortest path).  When PCC3->PCC4 is configured,
   PCE2 will not perform computation even if it has delegation but
   forwards the delegation via a PCRpt message to PCE1 through the
   state-sync session.  PCE1 will then perform disjointness computation,
   move PCC1->PCC2 onto R1->R2->PCC2 and provide an ERO to PCE2
   for PCC3->PCC4: R3->R4->PCC4.  PCE2 will then update PCC3
   with the new path.

3.  Procedures and Protocol Extensions

3.1.  Opening a state-sync session

3.1.1.  Capability Advertisement

   A PCE indicates its support of state-sync procedures during the PCEP
   Initialization phase [RFC5440].  The OPEN object in the Open message
   MUST contain the "Stateful PCE Capability" TLV defined in [RFC8231].
   A new P (INTER-PCE-CAPABILITY) flag is introduced to indicate the
   support of state-sync.

   This document adds a new bit in the Flags field:

   *  P (INTER-PCE-CAPABILITY - 1 bit - TBD4): If set to 1 by a PCEP
      Speaker, the PCEP Speaker indicates that the session MUST follow
      the state-sync procedures as described in this document.  The P
      bit MUST be set by both speakers: if a PCEP Speaker receives a
      STATEFUL-PCE-CAPABILITY TLV with P=0 while it advertised P=1, or
      if both set the P flag to 0, the session SHOULD be set up but the
      state-sync procedures MUST NOT be applied on this session.
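
   The negotiation outcome can be sketched as a small predicate.  The
   class and function names are hypothetical, and flags are modelled as
   booleans rather than bit positions because the P bit is still TBD4.
   The sketch also applies the U flag condition stated in this section:
   a P flag advertised without the U flag is treated as not set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatefulCapability:
    """Flags of a STATEFUL-PCE-CAPABILITY TLV, modelled as booleans."""
    u_flag: bool           # U flag [RFC8231]
    p_flag: bool           # INTER-PCE-CAPABILITY, bit position TBD4
    s_flag: bool = False   # optional, optimized synchronization [RFC8232]

    def effective_p(self) -> bool:
        # P without U is considered as if P were not set.
        return self.p_flag and self.u_flag


def state_sync_enabled(local: StatefulCapability,
                       remote: StatefulCapability) -> bool:
    """State-sync procedures apply only when both speakers set P.

    The PCEP session itself may still come up when this returns False;
    only the state-sync procedures are disabled.
    """
    return local.effective_p() and remote.effective_p()


both_set = state_sync_enabled(StatefulCapability(True, True),
                              StatefulCapability(True, True))
peer_without_p = state_sync_enabled(StatefulCapability(True, True),
                                    StatefulCapability(True, False))
p_without_u = state_sync_enabled(StatefulCapability(True, True),
                                 StatefulCapability(False, True))
print(both_set, peer_without_p, p_without_u)
```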

   The U flag [RFC8231] MUST be set when sending the
   STATEFUL-PCE-CAPABILITY TLV with the P flag set.  In case the U flag
   is not set along with the P flag, the state-sync capability is not
   enabled and it is considered as if the P flag is not set.  The S flag
   MAY be set if optimized synchronization is required as per [RFC8232].

3.2.  State synchronization

   When the state-sync capability has been negotiated between stateful
   PCEs, each PCEP Speaker will behave as a PCE and as a PCC at the same
   time regarding the state synchronization as defined in [RFC8231].
   This means that each PCEP Speaker:

   *  MUST send a PCRpt message towards its neighbor with the S flag set
      for each LSP in its LSP database learned from a PCC.  (PCC role)

   *  MUST send the End Of Synchronization Marker towards its neighbor
      when all LSPs have been reported.  (PCC role)

   *  MUST wait for the LSP synchronization from its neighbor to end
      (receiving an End Of Synchronization Marker).  (PCE role)

   The process of synchronization runs in parallel on each PCE (with no
   defined order).

   The optimized state synchronization procedures MAY be used, as
   defined in [RFC8232].

   When a PCEP Speaker sends a PCRpt on a state-sync session, it MUST
   add the SPEAKER-IDENTITY-TLV (defined in [RFC8232]) in the LSP
   Object; the value used refers to the 'owner' PCC of the LSP.  If
   a PCEP Speaker receives a PCRpt on a state-sync session without this
   TLV, it MUST discard the PCRpt message and it MUST reply with a PCErr
   message using error-type=6 (Mandatory Object missing) and
   error-value=TBD1 (SPEAKER-IDENTITY-TLV missing).

3.3.  Incremental updates and report forwarding rules

   During the life of an LSP, its state may change (path, constraints,
   operational state, ...) and a PCC will advertise a new PCRpt to the
   PCE for each such change.

   When propagating LSP state changes from a PCE to other PCEs, it is
   mandatory to ensure that a PCE always uses the freshest state coming
   from the PCC.

   When a PCE receives a new PCRpt from a PCC with the LSP-DB-VERSION,
   the PCE MUST forward the PCRpt to all its state-sync sessions and
   MUST add the appropriate SPEAKER-IDENTITY-TLV in the PCRpt.  In
   addition, it MUST add a new ORIGINAL-LSP-DB-VERSION TLV (described
   below).  The ORIGINAL-LSP-DB-VERSION contains the LSP-DB-VERSION
   coming from the PCC.

   When a PCE receives a new PCRpt from a PCC without the
   LSP-DB-VERSION, it SHOULD NOT forward the PCRpt on any state-sync
   session and SHOULD log the event on its first occurrence.

   When a PCE receives a new PCRpt from a PCC with the R flag (Remove)
   set and an LSP-DB-VERSION TLV, the PCE MUST forward the PCRpt to all
   its state-sync sessions keeping the R flag set (Remove) and MUST add
   the appropriate SPEAKER-IDENTITY-TLV and ORIGINAL-LSP-DB-VERSION TLV
   in the PCRpt message.

   When a PCE receives a PCRpt from a state-sync session, it MUST NOT
   forward the PCRpt to other state-sync sessions.  This helps to
   prevent message loops between PCEs.  As a consequence, a full mesh of
   PCEP sessions between PCEs is REQUIRED.

   When a PCRpt is forwarded, all the original objects and values are
   kept.  As an example, the PLSP-ID used in the forwarded PCRpt will be
   the same as the original one used by the PCC.  Thus an implementation
   supporting this document MUST consider the SPEAKER-IDENTITY-TLV and
   PLSP-ID together to uniquely identify an LSP on the state-sync
   session.
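
   The forwarding rules above can be summarized in a short sketch.  The
   PCRpt class is a hypothetical, heavily simplified model of the
   message (only the fields the rules mention), and the function names
   and the local_version parameter are illustrative assumptions rather
   than part of any PCEP implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PCRpt:
    """Simplified PCRpt carrying only the fields used by the rules."""
    plsp_id: int
    lsp_db_version: Optional[int] = None
    remove: bool = False                      # R flag
    speaker_identity: Optional[str] = None    # SPEAKER-IDENTITY-TLV
    original_lsp_db_version: Optional[int] = None


def forward_from_pcc(report: PCRpt, pcc_id: str,
                     local_version: int) -> Optional[PCRpt]:
    """Build the copy sent on every state-sync session, or None.

    A report without LSP-DB-VERSION is not forwarded.  Otherwise the
    copy keeps the PLSP-ID and R flag, names the owner PCC, and carries
    the PCC's version in ORIGINAL-LSP-DB-VERSION.  local_version is the
    forwarding PCE's own LSP-DB-VERSION (assumed to be in use).
    """
    if report.lsp_db_version is None:
        return None
    return PCRpt(
        plsp_id=report.plsp_id,
        lsp_db_version=local_version,
        remove=report.remove,
        speaker_identity=pcc_id,
        original_lsp_db_version=report.lsp_db_version,
    )


def forward_from_state_sync(report: PCRpt) -> None:
    """Reports learned on a state-sync session are never re-forwarded."""
    return None


def lsp_key(report: PCRpt) -> Tuple[Optional[str], int]:
    """An LSP on a state-sync session is identified by owner and PLSP-ID."""
    return (report.speaker_identity, report.plsp_id)


fwd = forward_from_pcc(PCRpt(plsp_id=7, lsp_db_version=42, remove=True),
                       pcc_id="PCC1", local_version=1001)
dropped = forward_from_pcc(PCRpt(plsp_id=7), pcc_id="PCC1",
                           local_version=1002)
print(fwd, dropped)
```

   Keying on the pair returned by lsp_key avoids collisions when two
   PCCs happen to use the same PLSP-ID.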

   The ORIGINAL-LSP-DB-VERSION TLV is encoded as follows and MUST always
   contain the LSP-DB-VERSION received from the owner PCC of the LSP:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type=TBD2           |            Length=8           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  LSP State DB Version Number                  |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Using the ORIGINAL-LSP-DB-VERSION TLV allows a PCE to keep using
   optimized synchronization ([RFC8232]) with another PCE.  In such a
   case, the PCE will send a PCRpt to another PCE with both the
   ORIGINAL-LSP-DB-VERSION TLV and the LSP-DB-VERSION TLV.  The
   ORIGINAL-LSP-DB-VERSION TLV will contain the version number as
   allocated by the PCC while the LSP-DB-VERSION will contain the
   version number allocated by the local PCE.

3.4.  Maintaining LSP states from different sources

   When a PCE receives a PCRpt on a state-sync session, it stores the
   LSP information into the original PCC address context (as the LSP
   belongs to the PCC).  A PCE SHOULD maintain a single state for a
   particular LSP and SHOULD maintain the list of sources it learned a
   particular state from.

   A PCEP speaker may receive state information for a particular LSP
   from different sources: the PCC that owns the LSP (through a regular
   PCEP session) and some PCEs (through PCEP state-sync sessions).  A
   PCEP speaker MUST always keep the freshest state in its LSP database,
   overriding the previously received information.

   A PCE, receiving a PCRpt from a PCC, updates the state of the LSP in
   its LSP-DB with the newly received information.  When receiving a
   PCRpt from another PCE, a PCE SHOULD update the LSP state only if the
   ORIGINAL-LSP-DB-VERSION present in the PCRpt is greater than the
   current ORIGINAL-LSP-DB-VERSION of the stored LSP state.
   This ensures that a PCE never tries to update its stored LSP state
   with outdated information.  Each time a PCE updates an LSP state in
   its LSP-DB, it SHOULD reset the source list associated with the LSP
   state and SHOULD add the source speaker address to the source list.
   When a PCE receives a PCRpt which has an ORIGINAL-LSP-DB-VERSION (if
   coming from a PCE) or an LSP-DB-VERSION (if coming from the PCC)
   equal to the current ORIGINAL-LSP-DB-VERSION of the stored LSP
   state, it SHOULD add the source speaker address to the source list.

   When a PCE receives a PCRpt requesting an LSP deletion from a
   particular source, it SHOULD remove this particular source from the
   list of sources associated with this LSP.

   When the list of sources becomes empty for a particular LSP, the LSP
   state MUST be removed.  This means that all the sources must send a
   PCRpt with R=1 for an LSP to make the PCE remove the LSP state.

3.5.  Computation priority between PCEs and sub-delegation

   A computation priority is necessary to ensure that a single PCE will
   perform the computation for all the LSPs in an association group:
   this will allow for a more optimized LSP placement and will prevent
   computation loops.

   All PCEs in the network that are handling LSPs in a common LSP
   association group SHOULD be aware of each other, including the
   computation priority of each PCE.  Note that there is no need for the
   PCC to be aware of this.  The computation priority is a number and
   the PCE having the highest priority SHOULD be responsible for the
   computation.  If several PCEs have the same priority value, their IP
   address SHOULD be used as a tie-breaker to provide a rank: the PCE
   with the highest IP address has the higher priority.  How PCEs become
   aware of each other's priority is out of the scope of this document,
   but, as an example, priorities could be learned through PCE discovery
   or local configuration.
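
   The selection of the computing PCE described above can be sketched
   as follows.  This is a minimal illustration, not a normative
   algorithm: the highest_priority_pce() helper and the dictionary
   layout are hypothetical, and the sketch assumes that all PCE
   addresses belong to the same address family so that comparing them
   numerically is meaningful.

```python
import ipaddress
from typing import Dict


def highest_priority_pce(priorities: Dict[str, int]) -> str:
    """Return the address of the PCE that should perform the computation.

    priorities maps a PCE IP address (as a string) to its computation
    priority.  The highest priority wins; on a tie, the numerically
    highest IP address wins.
    """
    return max(
        priorities,
        key=lambda ip: (priorities[ip], int(ipaddress.ip_address(ip))),
    )
```

   Comparing the addresses numerically rather than as strings matters
   for the tie-breaker: with equal priorities, 192.0.2.10 ranks above
   192.0.2.9, which a plain string comparison would get wrong.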

   The definition of the priority could be global, so that the highest
   priority PCE handles all path computations, or more granular, so
   that a PCE may have the highest priority for only a subset of LSPs or
   association groups.

   A PCEP speaker that receives a PCRpt from a PCC with the D flag set
   and that does not have the highest computation priority SHOULD
   forward the PCRpt on all state-sync sessions (as per Section 3.3) and
   SHOULD set the D flag on the state-sync session towards the highest
   priority PCE; the D flag is unset on all other state-sync sessions.
   This behavior is similar to the delegation behavior handled at the
   PCC side and is called a sub-delegation (the PCE sub-delegates the
   control of the LSP to another PCE).  When a PCEP speaker
   sub-delegates an LSP to another PCE, it loses control of the LSP and
   cannot update it anymore by its own decision.  When a PCE receives a
   PCRpt with the D flag set on a state-sync session, it is granted
   control over the LSP, as a regular PCE would be.

   If the highest priority PCE fails or if the state-sync session
   between the local PCE and the highest priority PCE fails, the local
   PCE MAY decide to delegate the LSP to the next highest priority PCE
   or to take back control of the LSP.  This is a local policy decision.

   When a PCE has the delegation for an LSP and needs to update this
   LSP, it MUST send a PCUpd message to all state-sync sessions and to
   the PCC session on which it received the delegation.  The D flag is
   unset in the PCUpd for state-sync sessions, whereas the D flag is
   set for the PCC.  In the case of sub-delegation, the computing PCE
   sends the PCUpd only to all state-sync sessions (as it has no direct
   delegation from a PCC).  The D flag is set for the state-sync session
   to the PCE that sub-delegated this LSP and the D flag is unset for
   other state-sync sessions.
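
   The D flag handling described above can be illustrated with a short
   sketch.  The function names and the use of plain strings to identify
   sessions are hypothetical; the sketch only computes which D flag
   value would be used on each session and does not build PCEP
   messages.

```python
from typing import Dict, Iterable, Optional


def pcrpt_d_flags(state_sync_peers: Iterable[str],
                  highest_priority_peer: str) -> Dict[str, bool]:
    """D flag per state-sync session when sub-delegating a PCRpt.

    The D flag is set only towards the highest priority PCE.
    """
    return {peer: peer == highest_priority_peer for peer in state_sync_peers}


def pcupd_d_flags(state_sync_peers: Iterable[str],
                  delegating_pcc: Optional[str] = None,
                  sub_delegating_peer: Optional[str] = None) -> Dict[str, bool]:
    """D flag per session when the computing PCE sends a PCUpd.

    Direct delegation: pass delegating_pcc; the D flag is set for the PCC
    and unset on every state-sync session.
    Sub-delegation: pass sub_delegating_peer; the D flag is set only on
    the state-sync session towards the PCE that sub-delegated the LSP.
    """
    flags = {peer: peer == sub_delegating_peer for peer in state_sync_peers}
    if delegating_pcc is not None:
        flags[delegating_pcc] = True
    return flags
```

   For instance, a PCE holding a direct delegation from PCC1 with
   state-sync sessions to PCE2 and PCE3 would call
   pcupd_d_flags(["PCE2", "PCE3"], delegating_pcc="PCC1") and obtain
   D=0 for both PCEs and D=1 for PCC1.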

   The PCUpd sent over a state-sync session MUST contain the
   SPEAKER-IDENTITY-TLV in the LSP Object (the value used must identify
   the target PCC).  The PLSP-ID used is the original PLSP-ID generated
   by the PCC and learned from the forwarded PCRpt.  If a PCE receives a
   PCUpd on a state-sync session without the SPEAKER-IDENTITY-TLV, it
   MUST discard the PCUpd and MUST reply with a PCErr message using
   error-type=6 (Mandatory Object missing) and error-value=TBD1
   (SPEAKER-IDENTITY-TLV missing).

   When a PCE receives a valid PCUpd on a state-sync session, it SHOULD
   forward the PCUpd to the appropriate PCC (identified based on the
   SPEAKER-IDENTITY-TLV value) that delegated the LSP originally and
   SHOULD remove the SPEAKER-IDENTITY-TLV from the LSP Object.  The
   acknowledgment of the PCUpd is done through a cascaded mechanism, and
   the PCC is solely responsible for triggering the acknowledgment:
   when the PCC receives the PCUpd from the local PCE, it acknowledges
   it with a PCRpt as per [RFC8231].  When receiving the new PCRpt from
   the PCC, the local PCE uses the defined forwarding rules on the
   state-sync session so the acknowledgment is relayed to the computing
   PCE.

   A PCE SHOULD NOT compute a path using an association-group constraint
   if it has delegation for only a subset of LSPs in the group.  In this
   case, an implementation MAY use a local policy on the PCE to decide
   whether the PCE computes no path at all for this set of LSPs or
   whether it computes a path by relaxing the association-group
   constraint.

3.6.  Passive stateful procedures

   In the passive stateful PCE architecture, the PCC is responsible for
   triggering a path computation request using a PCReq message to its
   PCE.
   As with the PCRpt message, whose handling remains unchanged in
   passive mode, if a PCE receives a PCReq for an LSP and finds that it
   does not have the highest computation priority for this LSP or its
   group, it MUST forward the PCReq message to the highest priority PCE
   over the state-sync session.  When the highest priority PCE
   receives the PCReq, it computes the path and generates a PCRep
   message towards the PCE that made the request.  This PCE will then
   forward the PCRep to the requesting PCC.  The handling of the LSP
   object and the SPEAKER-IDENTITY-TLV in PCReq and PCRep is similar to
   that in PCRpt/PCUpd messages.

3.7.  PCE initiation procedures

   It is possible that a PCE does not have a PCEP session with the
   headend to initiate an LSP as per [RFC8281].  A PCE could send the
   PCInitiate message on a state-sync session to another PCE to request
   that it create a PCE-Initiated LSP on its behalf.  If the receiving
   PCE is able to initiate the LSP, it would report it on the state-sync
   session via a PCRpt message.  If the receiving PCE does not have a
   session to the headend, it MUST send a PCErr message with
   Error-type=24 (LSP instantiation error) and Error-value=TBD5 (No PCEP
   session with the headend).  The requesting PCE could then try to
   initiate the LSP via another state-sync PCE, if available.

4.  Examples

   The examples in this section are for illustrative purposes and show
   the behavior of the state-sync inter-PCE sessions.

4.1.  Example 1

      _________________________________________
     /                                         \
    /        +------+          +------+         \
   |         | PCE1 |          | PCE2 |          |
   |         +------+          +------+          |
   |                                             |
   | +------+           10             +------+  |
   | | PCC1 | ----- R1 ---- R2 ------- | PCC2 |  |
   | +------+       |       |          +------+  |
   |                |       |                    |
   |                |       |                    |
   | +------+       |       |          +------+  |
   | | PCC3 | ----- R3 ---- R4 ------- | PCC4 |  |
   | +------+                          +------+  |
   |                                             |
    \                                           /
     \_________________________________________/

             +----------+
             |   PCC1   |  LSP : PCC1->PCC2
             +----------+
              /
        D=1  /
      +---------+    +---------+
      |  PCE1   |----|  PCE2   |
      +---------+    +---------+
                        /  D=1
                       /
             +----------+
             |   PCC3   |  LSP : PCC3->PCC4
             +----------+

   PCE1 computation priority 100
   PCE2 computation priority 200

   Consider the PCEP sessions as shown above, where the computation
   priority is global for all the LSPs and link disjointness between
   LSPs PCC1->PCC2 and PCC3->PCC4 is required.

   Consider that PCC1->PCC2 is configured first and PCC1 delegates the
   LSP to PCE1.  As PCE1 does not have the highest computation
   priority, it sub-delegates the LSP to PCE2 by sending a PCRpt with
   D=1 and including the SPEAKER-IDENTITY-TLV over the state-sync
   session.  PCE2 receives the PCRpt and, as it has delegation for this
   LSP, it computes the shortest path: R1->R3->R4->R2->PCC2.  It then
   sends a PCUpd to PCE1 (including the SPEAKER-IDENTITY-TLV) with the
   computed ERO.  PCE1 forwards the PCUpd to PCC1 (removing the
   SPEAKER-IDENTITY-TLV).  PCC1 acknowledges the PCUpd with a PCRpt to
   PCE1.  PCE1 forwards the PCRpt to PCE2.

   When PCC3->PCC4 is configured, PCC3 delegates the LSP to PCE2.  PCE2
   can compute a disjoint path as it has knowledge of both LSPs and also
   has delegation for both.  The only solution found is to move the
   PCC1->PCC2 LSP onto another path; PCE2 can move PCC1->PCC2 as it has
   sub-delegation for it.
   It creates a new PCUpd with a new ERO, R1->R2->PCC2, and sends it
   towards PCE1, which forwards it to PCC1.  PCE2 sends a PCUpd to PCC3
   with the path: R3->R4->PCC4.

   In this set-up, PCEs are able to find a disjoint path, which they
   could not do without state-sync and computation priority.

4.2.  Example 2

      _____________________________________
     /                                     \
    /      +------+          +------+       \
   |       | PCE1 |          | PCE2 |        |
   |       +------+          +------+        |
   |                                         |
   | +------+         100          +------+  |
   | |      | -------------------- |      |  |
   | | PCC1 | ----- R1 ----------- | PCC2 |  |
   | +------+       |              +------+  |
   |    |           |                 |      |
   |  6 |           | 2               | 2    |
   |    |           |                 |      |
   | +------+       |              +------+  |
   | | PCC3 | ----- R3 ----------- | PCC4 |  |
   | +------+          10          +------+  |
   |                                         |
    \                                       /
     \_____________________________________/

             +----------+
             |   PCC1   |  LSP : PCC1->PCC2
             +----------+
              /        \
        D=1  /          \ D=0
      +---------+    +---------+
      |  PCE1   |----|  PCE2   |
      +---------+    +---------+
        D=0  \          / D=1
              \        /
             +----------+
             |   PCC3   |  LSP : PCC3->PCC4
             +----------+

   PCE1 computation priority 200
   PCE2 computation priority 100

   In this example, suppose both LSPs are configured almost at the same
   time.  PCE1 sub-delegates PCC1->PCC2 to PCE2 while PCE2 keeps
   delegation for PCC3->PCC4.  PCE2 computes a path for PCC1->PCC2 and
   PCC3->PCC4 and can easily achieve disjointness.  No computation loop
   happens in this case.

4.3.  Example 3

      _________________________________________
     /                                         \
    /        +------+          +------+         \
   |         | PCE1 |          | PCE2 |          |
   |         +------+          +------+          |
   |                                             |
   | +------+           10             +------+  |
   | | PCC1 | ----- R1 ---- R2 ------- | PCC2 |  |
   | +------+       |       |          +------+  |
   |                |       |                    |
   |                |       |                    |
   | +------+       |       |          +------+  |
   | | PCC3 | ----- R3 ---- R4 ------- | PCC4 |  |
   | +------+                          +------+  |
   |                                             |
    \                                           /
     \_________________________________________/

             +----------+
             |   PCC1   |  LSP : PCC1->PCC2
             +----------+
              /
        D=1  /
      +---------+    +---------+    +---------+
      |  PCE1   |----|  PCE2   |----|  PCE3   |
      +---------+    +---------+    +---------+
                        /  D=1
                       /
             +----------+
             |   PCC3   |  LSP : PCC3->PCC4
             +----------+

   PCE1 computation priority 100
   PCE2 computation priority 200
   PCE3 computation priority 300

   With the PCEP sessions as shown above, consider the need to have link
   disjoint LSPs PCC1->PCC2 and PCC3->PCC4.

   Suppose PCC1->PCC2 is configured first.  PCC1 delegates the LSP to
   PCE1, but as PCE1 does not have the highest computation priority, it
   will sub-delegate the LSP to PCE2 (as it is not aware of PCE3 and has
   no way to reach it).  PCE2 cannot compute a path for PCC1->PCC2 as it
   does not have the highest priority and is not allowed to sub-delegate
   the LSP again towards PCE3 as per Section 3.

   When PCC3->PCC4 is configured, PCC3 delegates the LSP to PCE2, which
   performs sub-delegation to PCE3.  As PCE3 will have knowledge of only
   one LSP in the group, it cannot compute disjointness and can decide
   to fall back to a less constrained computation to provide a path for
   PCC3->PCC4.  In this case, it will send a PCUpd to PCE2 that will be
   forwarded to PCC3.

   Disjointness cannot be achieved in this scenario because of the lack
   of a state-sync session between PCE1 and PCE3, but no computation
   loop happens.
   Thus, it is advised that all PCEs that support state-sync have a
   full mesh of sessions between each other.

5.  Using Primary/Secondary Computation and State-sync Sessions to
    increase Scaling

   The Primary/Secondary computation and state-sync sessions
   architecture can be used to increase the scaling of the PCE
   architecture.  If the number of PCCs is very high, it may be too
   resource consuming for a single PCE to maintain all the PCEP sessions
   while at the same time performing all path computations.  Using
   primary/secondary computation and state-sync sessions makes it
   possible to create groups of PCEs that manage a subset of the PCCs
   and perform some or no path computations.  Decoupling PCEP session
   maintenance and computation will allow increasing the scaling of the
   PCE architecture.

                +----------+
                |  PCC500  |
              +----------+-+
              |   PCC1   |
              +----------+
               /        \
              /          \
        +---------+   +---------+
        |  PCE1   |---|  PCE2   |
        +---------+   +---------+
             |    \  /     |
             |     \/      |
             |     /\      |
             |    /  \     |
        +---------+   +---------+
        |  PCE3   |---|  PCE4   |
        +---------+   +---------+
              \          /
               \        /
                +----------+
                |  PCC501  |
              +----------+-+
              | PCC1000  |
              +----------+

   In the figure above, two groups of PCEs are created: PCE1/2 maintain
   PCEP sessions with PCC1 up to PCC500, while PCE3/4 maintain PCEP
   sessions with PCC501 up to PCC1000.  A granular primary/secondary
   policy is set up as follows to load-share computation between PCEs:

   *  PCE1 has priority 200 for association ID 1 up to 300, association
      source 0.0.0.0.  All other PCEs have a decreasing priority for
      those associations.

   *  PCE3 has priority 200 for association ID 301 up to 500,
      association source 0.0.0.0.  All other PCEs have a decreasing
      priority for those associations.
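
   As an illustration only, the granular policy listed above could be
   represented as a table mapping association ID ranges to the PCE
   holding the highest priority for them.  The PRIMARY_POLICY table and
   the computing_pce() helper are hypothetical names; the sketch covers
   only association source 0.0.0.0 and does not model the decreasing
   priorities of the other PCEs, whose values are not given here.

```python
from typing import List, Optional, Tuple

# Association ID range -> PCE with priority 200 for that range
# (association source 0.0.0.0 is assumed throughout).
PRIMARY_POLICY: List[Tuple[range, str]] = [
    (range(1, 301), "PCE1"),    # association IDs 1 up to 300
    (range(301, 501), "PCE3"),  # association IDs 301 up to 500
]


def computing_pce(association_id: int) -> Optional[str]:
    """Return the PCE responsible for computing this association, if any."""
    for id_range, pce in PRIMARY_POLICY:
        if association_id in id_range:
            return pce
    return None  # no granular policy entry for this association ID
```

   A PCE receiving a delegation would compare the result with its own
   identity and sub-delegate the LSP when it is not the computing PCE.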

   If some PCCs delegate LSPs with association ID 1 up to 300 and
   association source 0.0.0.0, the receiving PCE (if not PCE1) will
   sub-delegate the LSPs to PCE1.  PCE1 becomes responsible for the
   computation of these LSP associations while PCE3 is responsible for
   the computation of another set of associations.

   The procedures described in this document could help greatly in
   load-sharing between a group of stateful PCEs.

6.  PCEP-PATH-VECTOR TLV

   This document allows PCEP messages to be propagated among PCEP
   speakers.  It may be useful to track information about the
   propagation of the messages.  One of the use cases is a message loop
   detection mechanism, but other use cases, such as hop-by-hop
   information recording, may also be implemented.

   This document introduces the PCEP-PATH-VECTOR TLV (type TBD3) with
   the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type=TBD3           |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  PCEP-SPEAKER-INFORMATION#1                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  PCEP-SPEAKER-INFORMATION#n                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The TLV format and padding rules are as per [RFC5440].
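
   The loop detection use case named above could be sketched as
   follows.  This is only one possible approach and is not specified by
   this document: the extend_path_vector() helper is hypothetical, the
   ordered list of speakers is modeled as a list of identifier strings
   rather than as the TLV wire format, and the reaction to a detected
   loop is left to the caller.

```python
from typing import List, Optional


def extend_path_vector(path_vector: List[str],
                       local_id: str) -> Optional[List[str]]:
    """Append the local speaker to an ordered path vector.

    Returns the new ordered list of speakers, or None when the local
    speaker is already present, which indicates that the message has
    looped back to this speaker.
    """
    if local_id in path_vector:
        return None  # loop detected: this speaker already handled the message
    return path_vector + [local_id]  # order of traversal is preserved
```

   A speaker relaying a message would call the helper with the vector
   received from its peer and its own identifier, and would stop
   propagating the message when None is returned.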

   The PCEP-SPEAKER-INFORMATION field has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |           ID Length           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   //             Speaker Entity identity (variable)              //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   //                     SubTLVs (optional)                      //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   *  Length: defines the total length of the PCEP-SPEAKER-INFORMATION
      field.

   *  ID Length: defines the actual (non-padded) length of the Speaker
      Entity identity field.

   *  Speaker Entity identity: same possible values as the
      SPEAKER-IDENTITY-TLV.  Padded with trailing zeros to a 4-byte
      boundary.

   *  The PCEP-SPEAKER-INFORMATION may also carry some optional sub-TLVs
      so each PCEP speaker can add local information that could be
      recorded.  This document does not define any sub-TLV.

   The PCEP-PATH-VECTOR TLV MAY be carried in the LSP Object.  Its usage
   is purely optional.

   The list of speakers within the PCEP-PATH-VECTOR TLV MUST be ordered.
   When sending a PCEP message (PCRpt, PCUpd, or PCInitiate), a PCEP
   speaker MAY add the PCEP-PATH-VECTOR TLV with a
   PCEP-SPEAKER-INFORMATION containing its own information.  If the PCEP
   message sent is the result of a previously received PCEP message, and
   if the PCEP-PATH-VECTOR TLV was already present in the initial
   message, the PCEP speaker MAY append a new PCEP-SPEAKER-INFORMATION
   containing its own information.

7.  Security Considerations

   The security considerations described in [RFC8231] and [RFC5440]
   apply to the extensions described in this document as well.
   Additional considerations related to state synchronization and
   sub-delegation between stateful PCEs are introduced, as these
   mechanisms could be spoofed and used as an attack vector.  An
   attacker could attempt to create too much state in an attempt to
   load the PCEP peer.  The PCEP peer responds with a PCErr message as
   described in [RFC8231].  An attacker could impact LSP operations by
   creating bogus state.  Further, state synchronization between
   stateful PCEs could provide an adversary with the opportunity to
   eavesdrop on the network.  Thus, securing the PCEP session using
   Transport Layer Security (TLS) [RFC8253], as per the recommendations
   and best current practices in [RFC7525], is RECOMMENDED.

8.  Acknowledgements

   Thanks to [I-D.knodel-terminology] for urging better use of terms.

9.  IANA Considerations

   This document requests IANA actions to allocate code points for the
   protocol elements defined in this document.

9.1.  PCEP-Error Object

   IANA is requested to allocate new Error-values for Error-Types 6 and
   24.

   +============+============================+===========+
   | Error-Type | Meaning                    | Reference |
   +============+============================+===========+
   | 6          | Mandatory Object Missing   | [RFC5440] |
   +------------+----------------------------+-----------+
   |            | Error-value=TBD1: SPEAKER- | This      |
   |            | IDENTITY-TLV missing       | document  |
   +------------+----------------------------+-----------+
   | 24         | LSP instantiation error    | [RFC8281] |
   +------------+----------------------------+-----------+
   |            | Error-value=TBD5: No PCEP  | This      |
   |            | session with the headend   | document  |
   +------------+----------------------------+-----------+

                              Table 1

9.2.  PCEP TLV Type Indicators

   IANA is requested to allocate new TLV Type Indicator values within
   the "PCEP TLV Type Indicators" sub-registry of the PCEP Numbers
   registry, as follows:

   +=======+=============================+===============+
   | Value | Meaning                     | Reference     |
   +=======+=============================+===============+
   | TBD2  | ORIGINAL-LSP-DB-VERSION TLV | This document |
   +-------+-----------------------------+---------------+
   | TBD3  | PCEP-PATH-VECTOR TLV        | This document |
   +-------+-----------------------------+---------------+

                              Table 2

9.3.  STATEFUL-PCE-CAPABILITY TLV

   IANA is requested to allocate a new bit value in the
   STATEFUL-PCE-CAPABILITY TLV Flag Field sub-registry.

   +======+======================+===============+
   | Bit  | Description          | Reference     |
   +======+======================+===============+
   | TBD4 | INTER-PCE-CAPABILITY | This document |
   +------+----------------------+---------------+

                          Table 3

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5440]  Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation
              Element (PCE) Communication Protocol (PCEP)", RFC 5440,
              DOI 10.17487/RFC5440, March 2009.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8231]  Crabbe, E., Minei, I., Medved, J., and R. Varga, "Path
              Computation Element Communication Protocol (PCEP)
              Extensions for Stateful PCE", RFC 8231,
              DOI 10.17487/RFC8231, September 2017.

   [RFC8232]  Crabbe, E., Minei, I., Medved, J., Varga, R., Zhang, X.,
              and D.
              Dhody, "Optimizations of Label Switched Path State
              Synchronization Procedures for a Stateful PCE", RFC 8232,
              DOI 10.17487/RFC8232, September 2017.

   [RFC8253]  Lopez, D., Gonzalez de Dios, O., Wu, Q., and D. Dhody,
              "PCEPS: Usage of TLS to Provide a Secure Transport for the
              Path Computation Element Communication Protocol (PCEP)",
              RFC 8253, DOI 10.17487/RFC8253, October 2017.

10.2.  Informative References

   [I-D.knodel-terminology]
              Knodel, M. and N. T. Oever, "Terminology, Power, and
              Exclusionary Language in Internet-Drafts and RFCs", Work
              in Progress, Internet-Draft, draft-knodel-terminology-09,
              9 February 2022.

   [RFC4655]  Farrel, A., Vasseur, J.-P., and J. Ash, "A Path
              Computation Element (PCE)-Based Architecture", RFC 4655,
              DOI 10.17487/RFC4655, August 2006.

   [RFC6805]  King, D., Ed. and A. Farrel, Ed., "The Application of the
              Path Computation Element Architecture to the Determination
              of a Sequence of Domains in MPLS and GMPLS", RFC 6805,
              DOI 10.17487/RFC6805, November 2012.

   [RFC7399]  Farrel, A. and D. King, "Unanswered Questions in the Path
              Computation Element Architecture", RFC 7399,
              DOI 10.17487/RFC7399, October 2014.

   [RFC7525]  Sheffer, Y., Holz, R., and P. Saint-Andre,
              "Recommendations for Secure Use of Transport Layer
              Security (TLS) and Datagram Transport Layer Security
              (DTLS)", BCP 195, RFC 7525, DOI 10.17487/RFC7525, May
              2015.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

   [RFC8051]  Zhang, X., Ed. and I. Minei, Ed., "Applicability of a
              Stateful Path Computation Element (PCE)", RFC 8051,
              DOI 10.17487/RFC8051, January 2017.

   [RFC8281]  Crabbe, E., Minei, I., Sivabalan, S., and R.
              Varga, "Path
              Computation Element Communication Protocol (PCEP)
              Extensions for PCE-Initiated LSP Setup in a Stateful PCE
              Model", RFC 8281, DOI 10.17487/RFC8281, December 2017.

   [RFC8751]  Dhody, D., Lee, Y., Ceccarelli, D., Shin, J., and D. King,
              "Hierarchical Stateful Path Computation Element (PCE)",
              RFC 8751, DOI 10.17487/RFC8751, March 2020.

   [RFC8800]  Litkowski, S., Sivabalan, S., Barth, C., and M. Negi,
              "Path Computation Element Communication Protocol (PCEP)
              Extension for Label Switched Path (LSP) Diversity
              Constraint Signaling", RFC 8800, DOI 10.17487/RFC8800,
              July 2020.

Appendix A.  Contributors

   Dhruv Dhody
   Huawei Technologies
   Divyashree Techno Park, Whitefield
   Bangalore, Karnataka 560066
   India

   Email: dhruv.ietf@gmail.com

Authors' Addresses

   Stephane Litkowski
   Cisco
   Email: slitkows.ietf@gmail.com

   Siva Sivabalan
   Ciena Corporation
   Email: msiva282@gmail.com

   Cheng Li
   Huawei Technologies
   Huawei Campus, No. 156 Beiqing Rd.
   Beijing
   100095
   China
   Email: c.l@huawei.com

   Haomian Zheng
   Huawei Technologies
   H1, Huawei Xiliu Beipo Village, Songshan Lake
   Dongguan
   Guangdong, 523808
   China
   Email: zhenghaomian@huawei.com