idnits 2.17.00 (12 Aug 2021)

/tmp/idnits42545/draft-ietf-raw-oam-support-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document date (17 January 2022) is 124 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 0 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

RAW                                                         F. Theoleyre
Internet-Draft                                                      CNRS
Intended status: Informational                         G.Z. Papadopoulos
Expires: 21 July 2022                                     IMT Atlantique
                                                               G. Mirsky
                                                                Ericsson
                                                           CJ. Bernardos
                                                                    UC3M
                                                         17 January 2022

   Operations, Administration and Maintenance (OAM) features for RAW
                     draft-ietf-raw-oam-support-03

Abstract

   Some critical applications may use a wireless infrastructure.
   However, wireless networks exhibit a bandwidth several orders of
   magnitude lower than wired networks. Besides, wireless transmissions
   are lossy by nature; the probability that a packet cannot be decoded
   correctly by the receiver may be quite high. In these conditions,
   providing high reliability and a low delay is challenging.
   This document lists the requirements of the Operations,
   Administration, and Maintenance (OAM) features recommended to
   construct a predictable communication infrastructure on top of a
   collection of wireless segments. This document describes the
   benefits, problems, and trade-offs of using OAM in wireless networks
   to achieve Service Level Objectives (SLO).

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 21 July 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
     1.2.  Acronyms
     1.3.  Requirements Language
   2.  Role of OAM in RAW
     2.1.  Link concept and quality
     2.2.  Broadcast Transmissions
     2.3.  Complex Layer 2 Forwarding
     2.4.  End-to-end delay
   3.  Operation
     3.1.  Information Collection
     3.2.  Continuity Check
     3.3.  Connectivity Verification
     3.4.  Route Tracing
     3.5.  Fault Verification/detection
     3.6.  Fault Isolation/identification
   4.  Administration
     4.1.  Worst-case metrics
     4.2.  Efficient measurement retrieval (Passive OAM)
     4.3.  Reporting OAM packets to the source (Active OAM)
   5.  Maintenance
     5.1.  Soft transition after reconfiguration
     5.2.  Predictive maintenance
   6.  Requirements
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgments
   10. Informative References
   Authors' Addresses

1.  Introduction

   Reliable and Available Wireless (RAW) is an effort that extends
   DetNet to approach end-to-end deterministic performances over a
   network that includes scheduled wireless segments.
   In wired networks, many approaches try to enable Quality of Service
   (QoS) by implementing traffic differentiation so that routers handle
   each type of packet differently. However, this differentiated
   treatment was expensive for most applications.

   Deterministic Networking (DetNet) [RFC8655] has proposed to provide a
   bounded end-to-end latency on top of the network infrastructure,
   comprising both Layer 2 bridged and Layer 3 routed segments. This
   work encompasses the data plane, OAM, time synchronization,
   management, control, and security aspects.

   However, wireless networks create specific challenges. First of all,
   radio bandwidth is significantly lower than in wired networks. In
   these conditions, the volume of signaling messages has to be very
   limited. Even worse, wireless links are lossy: a Layer 2
   transmission may or may not be decoded correctly by the receiver,
   depending on a broad set of parameters. Thus, providing high
   reliability through wireless segments is particularly challenging.

   Wired networks rely on the concept of _links_. All the devices
   attached to a link receive any transmission. The concept of a link
   in wireless networks is somewhat different from what many are used to
   in wireline networks. A receiver may or may not receive a
   transmission, depending on the presence of a colliding transmission,
   the radio channel's quality, and the external interference. Besides,
   a wireless transmission is broadcast by nature: any _neighboring_
   device may be able to decode it. This document includes detailed
   information on the implications for the OAM features.

   Last but not least, radio links present volatile characteristics. If
   the wireless networks use an unlicensed band, packet losses are no
   longer temporally and spatially independent.
   Typically, links may exhibit very bursty losses, where several
   consecutive packets may be dropped because of, e.g., temporary
   external interference. Thus, providing availability and reliability
   on top of the wireless infrastructure requires specific Layer 3
   mechanisms to counteract these bursty losses.

   Operations, Administration, and Maintenance (OAM) Tools are of
   primary importance for IP networks [RFC7276]. They define a toolset
   for fault detection, isolation, and performance measurement.

   The primary purpose of this document is to detail the specific
   requirements of the OAM features recommended to construct a
   predictable communication infrastructure on top of a collection of
   wireless segments. This document describes the benefits, problems,
   and trade-offs of using OAM in wireless networks to provide
   availability and predictability.

1.1.  Terminology

   In this document, the term OAM will be used according to its
   definition specified in [RFC6291]. We expect to implement an OAM
   framework in RAW networks to maintain a real-time view of the network
   infrastructure, and its ability to respect the Service Level
   Objectives (SLO), such as delay and reliability, assigned to each
   data flow.

   We re-use here the same terminology as
   [I-D.ietf-detnet-oam-framework]:

   *  OAM entity: a data flow to be monitored for defects and/or its
      performance metrics measured;

   *  Test End Point (TEP): OAM devices crossed when entering/exiting
      the network. In RAW, it corresponds mostly to the source or
      destination of a data flow. OAM messages can be exchanged between
      two TEPs;

   *  Monitoring Endpoint (MonEP): an OAM system along the flow; a MonEP
      MAY respond to an OAM message generated by the TEP;

   *  control/management/data plane: the control and management planes
      are used to configure and control the network (long-term).
      The data plane takes the individual decisions. Relative to a data
      flow, the control and/or management plane can be out-of-band;

   *  Active measurement methods (as defined in [RFC7799]) modify a
      normal data flow by inserting novel fields or injecting specially
      constructed test packets [RFC2544]. It is critical for the
      quality of information obtained using an active method that
      generated test packets are in-band with the monitored data flow.
      In other words, a test packet is required to cross the same
      network nodes and links and receive the same Quality of Service
      (QoS) treatment as a data packet. Active methods may implement
      one of these two strategies:

      -  In-band: control information follows the same path as the data
         packets. In other words, a failure in the data plane may
         prevent the control information from reaching the destination
         (e.g., end-device or controller).

      -  Out-of-band: control information is sent separately from the
         data packets. Thus, the behavior of control vs. data packets
         may differ;

   *  Passive measurement methods [RFC7799] infer information by
      observing unmodified existing flows.

   We also adopt the following terminology, which is particularly
   relevant for RAW segments.

   *  piggybacking vs. dedicated control packets: control information
      may be encapsulated in specific (dedicated) control packets.
      Alternatively, it may be piggybacked in existing data packets,
      when the MTU is larger than the actual packet length.
      Piggybacking makes particular sense in wireless networks, as the
      cost (bandwidth and energy) is not linear with the packet size.

   *  route-over vs. mesh-under: a control packet is either forwarded
      directly to the layer-3 next hop (mesh-under) or handled hop-by-
      hop by each router (route-over).
      While the latter option consumes more resources, it allows
      collecting additional intermediary information, particularly
      relevant in wireless networks.

   *  Defect: a temporary change in the network (e.g., a radio link
      which is broken due to a mobile obstacle);

   *  Fault: a definite change which may affect the network performance,
      e.g., a node runs out of energy.

   *  End-to-end delay: the time between the packet generation and its
      reception by the destination.

1.2.  Acronyms

   OAM     Operations, Administration, and Maintenance

   DetNet  Deterministic Networking

   PSE     Path Selection Engine [I-D.pthubert-raw-architecture]

   QoS     Quality of Service

   RAW     Reliable and Available Wireless

   SLO     Service Level Objective

   SNMP    Simple Network Management Protocol

   SDN     Software-Defined Network

1.3.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Role of OAM in RAW

   RAW networks expect to make the communications reliable and
   predictable over a wireless network infrastructure. Most critical
   applications will define an SLO required for the data flows they
   generate. RAW considers network plane protocol elements such as OAM
   to improve the RAW operation at the service and the forwarding sub-
   layers.

   To respect strict guarantees, RAW relies on the Path Selection Engine
   (PSE), as defined in [I-D.pthubert-raw-architecture], to monitor and
   maintain the L3 network. An L2 scheduler may be used to allocate
   transmission opportunities, based on the radio link characteristics,
   the SLO of the flows, and the number of packets to forward.
   The PSE exploits the L2 resources reserved by the scheduler and
   organizes the L3 paths to introduce redundancy and fault tolerance
   and to create backup paths. OAM represents the core of the pre-
   provisioning process by supervising the network. It maintains a
   global view of the network resources to detect defects, faults,
   over-provisioning, and anomalies.

   Fault tolerance also assumes that multiple paths must be provisioned
   so that an end-to-end circuit remains operational regardless of the
   conditions. The Packet Replication and Elimination Function
   ([I-D.thubert-bier-replication-elimination]) on a node is typically
   controlled by the PSE. OAM mechanisms can be used to monitor that
   the Packet Replication, Elimination, and Ordering Functions (PREOF)
   [RFC8655] are working correctly on a node and within the domain.

   To be energy-efficient, out-of-band OAM SHOULD only be used to report
   aggregated statistics (e.g., counters, histograms) from the nodes
   using, e.g., SNMP or NETCONF/RESTCONF with YANG-based data models.
   The out-of-band OAM flow MAY use a control and management channel
   dedicated to this purpose.

   RAW supports both proactive and on-demand troubleshooting.
   Proactively, it is necessary to detect anomalies, report defects, or
   reduce over-provisioning if it is not required. However, on-demand
   troubleshooting may also be required to identify the cause of a
   specific defect. Indeed, some specific faults may only be detected
   with a global, detailed view of the network, which is too expensive
   to acquire in the normal operating mode.

   The specific characteristics of RAW are discussed below.

2.1.  Link concept and quality

   In wireless networks, a _link_ does not exist physically. A device
   has a set of *neighbors* that correspond to all the devices that have
   a non-zero probability of receiving its packets correctly. We make a
   distinction between:

   *  point-to-point (p2p) link with one transmitter and one receiver.
      These links are used to transmit unicast packets.

   *  point-to-multipoint (p2mp) link associates one transmitter and a
      collection of receivers. For instance, broadcast packets assume
      the existence of p2mp links to avoid duplicating a broadcast
      packet to reach each possible radio neighbor.

   In scheduled radio networks, p2mp and p2p links are commonly not
   scheduled simultaneously to save energy and/or to reduce the number
   of collisions. More precisely, only one part of the neighbors may
   wake up at a given instant.

   Anycast is used in p2mp links to improve reliability. A collection
   of receivers is scheduled to wake up simultaneously, so that the
   transmission fails only if none of the receivers can decode the
   packet.

   Each wireless link is associated with a link quality, often measured
   as the Packet Delivery Ratio (PDR), i.e., the probability that the
   receiver can decode the packet correctly. It is worth noting that
   this link quality depends on many criteria, such as the level of
   external interference, the presence of concurrent transmissions, or
   the radio channel state. This link quality is also time-variant.
   For p2mp links, consequently, we have a collection of PDRs (one value
   per receiver). Other more sophisticated, aggregated metrics exist
   for these p2mp links, such as those in [anycast-property].

2.2.  Broadcast Transmissions

   The unicast transmission is delivered exclusively to the destination
   in modern switching networks. Wireless networks are much closer to
   the traditional *shared access* networks. Practically, unicast and
   broadcast frames are handled similarly at the physical layer. The
   link layer is just in charge of filtering the frames to discard
   irrelevant receptions (e.g., different unicast MAC addresses).

   However, contrary to wired networks, we cannot ensure that a packet
   is received by *all* the devices attached to the Layer 2 segment.
   It depends on the radio channel state between the transmitter(s) and
   the receiver(s). In particular, concurrent transmissions may be
   possible or not, depending on the radio conditions (e.g., do the
   different transmitters use a different radio channel or are they
   sufficiently spatially separated?).

2.3.  Complex Layer 2 Forwarding

   Multiple neighbors may receive a transmission. Thus, anycast Layer 2
   forwarding helps to maximize reliability by assigning multiple
   receivers to a single transmission. That way, the packet is lost
   only if *none* of the receivers decode it. Practically, it has been
   proven that different neighbors may exhibit very different radio
   conditions, and that reception independence may hold for some of them
   [anycast-property].

2.4.  End-to-end delay

   In a wireless network, additional transmission opportunities are
   provisioned to accommodate packet losses. Thus, the end-to-end delay
   consists of:

   *  Transmission delay, which is fixed and depends mainly on the data
      rate, and the presence or absence of an acknowledgement.

   *  Residence time, which corresponds to the buffering delay and
      depends on the schedule. To account for retransmissions, the
      residence time is equal to the difference between the time of last
      reception from the previous hop (among all the retransmissions)
      and the time of emission of the last retransmission.

3.  Operation

   OAM features will enable robust RAW operation for both forwarding
   and routing purposes.

3.1.  Information Collection

   The model for exchanging information should be the same as for a
   DetNet network to ensure inter-operability. YANG may typically
   fulfill this objective.

   However, RAW networks imply specific constraints (e.g., low
   bandwidth, packet losses, cost of medium access) that may require
   minimizing the volume of information to collect.
   Thus, we discuss in Section 4.2 different ways to collect
   information, i.e., transfer the OAM information physically from the
   emitter to the receiver. This corresponds to passive OAM as defined
   in [RFC7799].

3.2.  Continuity Check

   Similarly to DetNet, we need to verify that the source and the
   destination are connected (at least one valid path exists).

3.3.  Connectivity Verification

   As in DetNet, we have to verify the absence of misconnection. We
   focus here on the RAW specificities.

   Because of radio transmissions' broadcast nature, several receivers
   may be active at the same time to enable anycast Layer 2 forwarding.
   Thus, the connectivity verification must test all the combinations.
   We also consider priority-based mechanisms for anycast forwarding,
   i.e., all the receivers have different probabilities of forwarding a
   packet. To verify a delay SLO for a given flow, we must also
   consider all the possible combinations, leading to a probability
   distribution function for end-to-end transmissions. If this
   verification is implemented naively, the number of combinations to
   test may be exponential and too costly for wireless networks with low
   bandwidth.

3.4.  Route Tracing

   Wireless networks are broadcast by nature: a radio transmission can
   be decoded by any radio neighbor. In multihop wireless networks,
   several paths exist between two endpoints. In hub networks, a device
   may be covered by several Access Points (APs). We should choose the
   most efficient path or AP, specifically with respect to reliability
   and delay.

   Thus, multipath routing / multi-attachment can be viewed as making
   the network more fault-tolerant. Even better, we can exploit the
   broadcast nature of wireless networks: we may have multiple
   Monitoring Endpoints (MonEP) for each of these kinds of hop.
   While it may be reasonable in the multi-attachment case, the
   complexity quickly increases with the path length. Indeed, each
   Maintenance Intermediate Endpoint has several possible next hops in
   the forwarding plane. Thus, all the possible paths between two
   maintenance endpoints should be retrieved, which may quickly become
   intractable if we apply a naive approach.

3.5.  Fault Verification/detection

   Wired networks tend to present stable performance. On the contrary,
   wireless networks are time-variant. We must consequently make a
   distinction between normal variations and malfunctions.

3.6.  Fault Isolation/identification

   The network has to isolate and identify the cause of the fault.
   While DetNet already expects to identify malfunctions, some problems
   are specific to wireless networks. We must consequently collect
   metrics and implement algorithms tailored for wireless networking.

   For instance, the decrease in the link quality may be caused by
   several factors: external interference, obstacles, multipath fading,
   or mobility. It is fundamental to be able to discriminate the
   different causes to make the right decision.

4.  Administration

   The RAW network has to expose a collection of metrics to support an
   operator making proper decisions, including:

   *  Packet losses: the time-window average and maximum values of the
      number of packet losses have to be measured. Many critical
      applications stop working if a few consecutive packets are
      dropped;

   *  Received Signal Strength Indicator (RSSI) is a very common metric
      in wireless to denote the link quality.
      The radio chipset is in charge of translating a received signal
      strength into a normalized quality indicator;

   *  Delay: the time elapsed between a packet generation / enqueuing
      and its reception by the next hop;

   *  Buffer occupancy: the number of packets present in the buffer, for
      each of the existing flows.

   *  Battery lifetime: the expected remaining battery lifetime of the
      device. Since many RAW devices might be battery-powered, this is
      an important metric for an operator to make proper decisions.

   *  Mobility: if a device is known to be mobile, this might be
      considered by an operator to make proper decisions.

   These metrics should be collected per device, virtual circuit, and
   path, as DetNet already does. However, in RAW, we have to deal with
   them at a finer granularity:

   *  per radio channel to measure, e.g., the level of external
      interference, and to be able to apply counter-measures (e.g.,
      blacklisting).

   *  per physical radio technology / interface, if a device has
      multiple NICs.

   *  per link to detect misbehaving links (asymmetrical link,
      fluctuating quality).

   *  per resource block: a collision in the schedule is particularly
      challenging to identify in radio networks with spectrum reuse. In
      particular, a collision may not be systematic (depending on the
      radio characteristics and the traffic profile).

4.1.  Worst-case metrics

   RAW inherits the same requirements as DetNet: we need to know the
   distribution of a collection of metrics. However, wireless networks
   are known to be highly variable. Changes may be frequent, and may
   exhibit a periodical pattern. Collecting and analyzing this amount
   of measurements is challenging.

   Wireless networks are known to be lossy, and RAW has to implement
   strategies to improve reliability on top of unreliable links.
   Reliability is typically achieved through Automatic Repeat Request
   (ARQ), and Forward Error Correction (FEC). Since the different flows
   don't have the same SLO, RAW must adjust the ARQ and FEC based on the
   link and path characteristics.

4.2.  Efficient measurement retrieval (Passive OAM)

   We have to minimize the number of statistics / measurements to
   exchange:

   *  energy efficiency: low-power devices have to limit the volume of
      monitoring information since every bit consumes energy.

   *  bandwidth: wireless networks exhibit a bandwidth significantly
      lower than wired, best-effort networks.

   *  per-packet cost: it is often more expensive to send several
      packets instead of combining them in a single link-layer frame.

   In conclusion, we have to take care of power and bandwidth
   consumption. The following techniques aim to reduce the cost of such
   maintenance:

   *  on-path collection: some control information is inserted in the
      data packets if this does not cause the packet to be fragmented
      (i.e., the MTU is not exceeded). Information Elements represent a
      standardized way to handle such information. IP hop-by-hop
      extension headers may help to collect metrics all along the path;

   *  flags/fields: we have to set up flags in the monitored packets to
      be able to monitor the forwarding process accurately. A sequence
      number field may help to detect packet losses. Similarly, path
      inference tools such as [ipath] insert additional information in
      the headers to identify the path followed by a packet a
      posteriori.

   *  hierarchical monitoring: localized and centralized mechanisms have
      to be combined. Typically, a local mechanism should continuously
      monitor a set of metrics and trigger remote OAM exchanges only
      when a fault is detected (but possibly not identified). For
      instance, local temporary defects must not trigger expensive OAM
      transmissions.
      Besides, the wireless segments often represent the weakest parts
      of a path: the volume of control information they produce has to
      be set accordingly.

   Several passive techniques can be combined. For instance, the DetNet
   forwarding sublayer MAY combine In-band Network Telemetry (INT) with
   P4, iOAM and iPath to compute and report different statistics in the
   track (e.g., number of link-layer retransmissions, link reliability).

4.3.  Reporting OAM packets to the source (Active OAM)

   The Test End Point will collect measurements from the OAM probes
   received in the monitored track. However, the aggregated statistics
   must then be reported to the other Test End Point that injected the
   probes. Unfortunately, the monitored track MAY be unidirectional.
   In this case, the statistics have to be reported out-of-band
   (through, e.g., a dedicated control or management channel).

   It is worth noting that Active OAM and Passive OAM techniques are not
   mutually exclusive. In particular, Active OAM is useful when a
   statistic cannot be acquired accurately passively.

5.  Maintenance

   Maintenance features need to facilitate repairs and upgrades. In
   wireless networks, repairs are expected to occur much more
   frequently, since the link quality may be highly time-variant.
   Thus, maintenance represents a key feature for RAW.

5.1.  Soft transition after reconfiguration

   Because of the wireless medium, the link quality may fluctuate, and
   the network needs to reconfigure itself continuously. During this
   transient state, flows may begin to be gradually re-forwarded,
   consuming resources in different parts of the network. OAM has to
   make a distinction between a metric that changed because of a
   legitimate network change (e.g., flow redirection) and an unexpected
   event (e.g., a fault).

5.2.  Predictive maintenance

   RAW needs to implement self-optimization features.
   While the network is configured to be fault-tolerant, a
   reconfiguration may be required to keep on respecting long-term
   objectives. Obviously, the network continues to respect the SLO
   after a node's crash, but a reconfiguration is required to handle
   future faults. In other words, the reconfiguration delay MUST be
   strictly smaller than the inter-fault time.

   The network must continuously retrieve the state of the network, to
   judge the relevance of a reconfiguration, quantifying:

   *  the cost of the sub-optimality: resources may not be used
      optimally (e.g., a better path exists);

   *  the reconfiguration cost: the controller needs to trigger some
      reconfigurations. For this transient period, resources may be
      reserved twice, and control packets have to be transmitted.

   Thus, reconfiguration may only be triggered if the gain is
   significant.

6.  Requirements

   This section lists requirements for OAM in a RAW domain:

   1.  Each Test and Monitoring Endpoint device MUST expose a list of
       available metrics per track. It MUST at least provide the end-
       to-end Packet Delivery Ratio, end-to-end latency, and Maximum
       Consecutive Failures (MCF).

   2.  PREOF functions MUST guarantee order preservation in the
       (sub)track.

   3.  OAM nodes MUST provide aggregated statistics to reduce the volume
       of traffic for measurements. They MAY send a compressed
       distribution of measurements, or MIN / MAX values over a time
       interval.

   4.  Monitoring Endpoints SHOULD support route tracing with passive
       OAM techniques.

7.  IANA Considerations

   This document has no actionable requirements for IANA. This section
   can be removed before the publication.

8.  Security Considerations

   This section will be expanded in future versions of the draft.

9.  Acknowledgments

   TBD

10.  Informative References

   [anycast-property]
              Teles Hermeto, R., Gallais, A., and F.
              Theoleyre, "Is Link-Layer Anycast Scheduling Relevant for
              IEEE 802.15.4-TSCH Networks?", 2019.

   [I-D.ietf-detnet-oam-framework]
              Mirsky, G., Theoleyre, F., Papadopoulos, G. Z., Bernardos,
              C. J., Varga, B., and J. Farkas, "Framework of Operations,
              Administration and Maintenance (OAM) for Deterministic
              Networking (DetNet)", Work in Progress, Internet-Draft,
              draft-ietf-detnet-oam-framework-05, 14 October 2021.

   [I-D.pthubert-raw-architecture]
              Thubert, P., Papadopoulos, G. Z., and L. Berger, "Reliable
              and Available Wireless Architecture/Framework", Work in
              Progress, Internet-Draft, draft-pthubert-raw-architecture-
              09, 7 July 2021.

   [I-D.thubert-bier-replication-elimination]
              Thubert, P., Eckert, T., Brodard, Z., and H. Jiang, "BIER-
              TE extensions for Packet Replication and Elimination
              Function (PREF) and OAM", Work in Progress, Internet-
              Draft, draft-thubert-bier-replication-elimination-03, 3
              March 2018.

   [ipath]    Gao, Y., Dong, W., Chen, C., Bu, J., Wu, W., and X. Liu,
              "iPath: path inference in wireless sensor networks.",
              2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544,
              DOI 10.17487/RFC2544, March 1999.

   [RFC6291]  Andersson, L., van Helvoort, H., Bonica, R., Romascanu,
              D., and S. Mansfield, "Guidelines for the Use of the "OAM"
              Acronym in the IETF", BCP 161, RFC 6291,
              DOI 10.17487/RFC6291, June 2011.

   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
              Weingarten, "An Overview of Operations, Administration,
              and Maintenance (OAM) Tools", RFC 7276,
              DOI 10.17487/RFC7276, June 2014.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
              May 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8655]  Finn, N., Thubert, P., Varga, B., and J. Farkas,
              "Deterministic Networking Architecture", RFC 8655,
              DOI 10.17487/RFC8655, October 2019.

Authors' Addresses

   Fabrice Theoleyre
   CNRS
   Building B
   300 boulevard Sebastien Brant - CS 10413
   67400 Illkirch - Strasbourg
   France

   Phone: +33 368 85 45 33
   Email: fabrice.theoleyre@cnrs.fr
   URI:   http://www.theoleyre.eu

   Georgios Z. Papadopoulos
   IMT Atlantique
   Office B00 - 102A
   2 Rue de la Chataigneraie
   35510 Cesson-Sevigne - Rennes
   France

   Phone: +33 299 12 70 04
   Email: georgios.papadopoulos@imt-atlantique.fr

   Greg Mirsky
   Ericsson

   Email: gregimirsky@gmail.com

   Carlos J. Bernardos
   Universidad Carlos III de Madrid
   Av. Universidad, 30
   28911 Leganes, Madrid
   Spain

   Phone: +34 91624 6236
   Email: cjbc@it.uc3m.es
   URI:   http://www.it.uc3m.es/cjbc/
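
Illustrative sketch (not part of the draft's requirements)

   The metrics named in Section 6 (Packet Delivery Ratio and Maximum
   Consecutive Failures) and the anycast behavior of Section 2.3 (a
   packet is lost only if none of the receivers decodes it) can be
   sketched as follows. This is a minimal illustration under stated
   assumptions, not a definitive implementation: the function names and
   the boolean delivery trace are hypothetical, and the anycast formula
   assumes independent receptions, which Section 2.3 says may hold only
   for some neighbors.

```python
from math import prod
from typing import Iterable, Sequence


def anycast_delivery_ratio(receiver_pdrs: Iterable[float]) -> float:
    """Probability that at least one anycast receiver decodes a frame.

    ASSUMPTION: receptions are independent across receivers. With
    positively correlated losses this value is an overestimate.
    """
    pdrs = list(receiver_pdrs)
    for p in pdrs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"PDR out of range: {p}")
    # The frame is lost only if every receiver fails to decode it.
    return 1.0 - prod(1.0 - p for p in pdrs)


def packet_delivery_ratio(trace: Sequence[bool]) -> float:
    """Fraction of packets delivered in a trace (True = delivered)."""
    if not trace:
        raise ValueError("empty trace")
    return sum(trace) / len(trace)


def max_consecutive_failures(trace: Iterable[bool]) -> int:
    """Length of the longest run of lost packets (False entries)."""
    longest = current = 0
    for delivered in trace:
        if delivered:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest
```

   For example, under the independence assumption, two receivers that
   each have a PDR of 0.5 give anycast_delivery_ratio([0.5, 0.5]) =
   0.75. Reporting the longest loss burst alongside the average PDR
   matters because, as Section 4 notes, many critical applications stop
   working after a few consecutive drops even when the average is high.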