Network Working Group                                   G. Fioccola, Ed.
Internet-Draft                                       Huawei Technologies
Obsoletes: 8321 (if approved)                                M. Cociglio
Intended status: Standards Track                          Telecom Italia
Expires: August 21, 2022                                       G. Mirsky
                                                                Ericsson
                                                              T. Mizrahi
                                                                 T. Zhou
                                                     Huawei Technologies
                                                                  X. Min
                                                               ZTE Corp.
                                                       February 17, 2022


                        Alternate-Marking Method
                      draft-fioccola-rfc8321bis-02

Abstract

   This document describes the Alternate-Marking technique to perform
   packet loss, delay, and jitter measurements on live traffic.  This
   technology can be applied in various situations and for different
   protocols.  It could be considered Passive or Hybrid depending on the
   application.  This document obsoletes RFC 8321.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 21, 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Overview of the Method
   3.  Detailed Description of the Method
     3.1.  Packet Loss Measurement
       3.1.1.  Coloring the Packets
       3.1.2.  Counting the Packets
       3.1.3.  Collecting Data and Calculating Packet Loss
     3.2.  Timing Aspects
     3.3.  One-Way Delay Measurement
       3.3.1.  Single-Marking Methodology
       3.3.2.  Double-Marking Methodology
     3.4.  Delay Variation Measurement
   4.  Considerations
     4.1.  Synchronization
     4.2.  Data Correlation
     4.3.  Packet Reordering
     4.4.  Packet Fragmentation
   5.  Results of the Alternate Marking Experiment
     5.1.  Controlled Domain requirement
   6.  Compliance with Guidelines from RFC 6390
   7.  IANA Considerations
   8.  Security Considerations
   9.  Contributors
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Changes Log
   Authors' Addresses

1.  Introduction

   Nowadays, most Service Providers' networks carry traffic with
   content that is highly sensitive to packet loss [RFC7680], delay
   [RFC7679], and jitter [RFC3393].

   In view of this scenario, Service Providers need methodologies and
   tools to monitor and measure network performance with adequate
   accuracy, in order to constantly control the quality of experience
   perceived by their customers.  Performance monitoring also provides
   useful information for improving network management (e.g., isolation
   of network problems and troubleshooting).

   A lot of work related to Operations, Administration, and Maintenance
   (OAM), which also includes performance monitoring techniques, has
   been done by Standards Developing Organizations (SDOs): [RFC7276]
   provides a good overview of existing OAM mechanisms defined in the
   IETF, ITU-T, and IEEE.  In the IETF, a lot of work has been done on
   fault detection and connectivity verification, while less effort has
   so far been dedicated to performance monitoring.  The IPPM WG has
   defined standard metrics to measure network performance; however,
   the methods developed in this WG mainly focus on Active measurement
   techniques.  More recently, the MPLS WG has defined mechanisms for
   measuring packet loss, one-way and two-way delay, and delay variation
   in MPLS networks [RFC6374], but their applicability to Passive
   measurements has some limitations, especially for pure
   connection-less networks.

   The lack of adequate tools to measure packet loss with the desired
   accuracy drove an effort to design a new method for the performance
   monitoring of live traffic, which is easy to implement and deploy.
   The effort led to the method described in this document: basically,
   it is a Passive performance monitoring technique, potentially
   applicable to any kind of packet-based traffic, including Ethernet,
   IP, and MPLS, both unicast and multicast.  The method primarily
   addresses packet loss measurement, but it can be easily extended to
   one-way or two-way delay and delay variation measurements as well.

   The method has been explicitly designed for Passive measurements, but
   it can also be used with Active probes.  Passive measurements are
   usually more easily understood by customers and provide much better
   accuracy, especially for packet loss measurements.

   RFC 7799 [RFC7799] defines Passive and Hybrid Methods of Measurement.
   In particular, Passive Methods of Measurement are based solely on
   observations of an undisturbed and unmodified packet stream of
   interest; Hybrid Methods are Methods of Measurement that use a
   combination of Active Methods and Passive Methods.

   Taking into consideration these definitions, the Alternate-Marking
   Method could be considered Hybrid or Passive, depending on the case.
   In the case where the marking is obtained by changing existing field
   values of the packets, the technique is Hybrid.  In the case where
   the marking field is dedicated, reserved, and included in the
   protocol specification, the Alternate-Marking technique can be
   considered Passive.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Overview of the Method

   In order to perform packet loss measurements on a production traffic
   flow, different approaches exist.
   The most intuitive one consists of
   numbering the packets so that each router that receives the flow can
   immediately detect a packet that is missing.  This approach, though
   very simple in theory, is not simple to achieve: it requires the
   insertion of a sequence number into each packet, and the devices must
   be able to extract the number and check it in real time.  Such a task
   can be difficult to implement on live traffic: if UDP is used as the
   transport protocol, no sequence number is available; on the other
   hand, if a higher-layer sequence number (e.g., in the RTP header) is
   used, extracting that information from each packet and processing it
   in real time could overload the device.

   An alternative approach is to count the number of packets sent on one
   end, count the number of packets received on the other end, and
   compare the two values.  This operation is much simpler to implement,
   but it requires the devices performing the measurement to be in sync:
   in order to compare two counters, it is required that they refer
   exactly to the same set of packets.  Since a flow is continuous and
   cannot be stopped when a counter has to be read, it can be difficult
   to determine exactly when to read the counter.  A possible solution
   to overcome this problem is to virtually split the flow into
   consecutive blocks by periodically inserting a delimiter so that each
   counter refers exactly to the same block of packets.  The delimiter
   could be, for example, a special packet inserted artificially into
   the flow.  However, delimiting the flow using specific packets has
   some limitations.  First, it requires generating additional packets
   within the flow and requires the equipment to be able to process
   those packets.  In addition, the method is vulnerable to out-of-order
   reception of delimiting packets and, to a lesser extent, to their
   loss.

   The method proposed in this document follows the second approach, but
   it doesn't use additional packets to virtually split the flow into
   blocks.  Instead, it "marks" the packets so that the packets
   belonging to the same block will have the same color, while
   consecutive blocks will have different colors.  Each change of color
   represents a sort of auto-synchronization signal that guarantees the
   consistency of measurements taken by different devices along the
   path.

   Figure 1 represents a very simple network and shows how the method
   can be used to measure packet loss on different network segments: by
   enabling the measurement on several interfaces along the path, it is
   possible to perform link monitoring, node monitoring, or end-to-end
   monitoring.  The method is flexible enough to measure packet loss on
   any segment of the network and can be used to isolate the faulty
   element.

                           Traffic Flow
      ========================================================>
        +------+       +------+       +------+       +------+
    ---<>  R1  <>-----<>  R2  <>-----<>  R3  <>-----<>  R4  <>---
        +------+       +------+       +------+       +------+
        .              .      .              .       .      .
        .              .      .              .       .      .
        .              <------>              <------->      .
        .          Node Packet Loss      Link Packet Loss   .
        .                                                   .
        <--------------------------------------------------->
                       End-to-End Packet Loss

                    Figure 1: Available Measurements

3.  Detailed Description of the Method

   This section describes, in detail, how the method operates.  A
   special emphasis is given to the measurement of packet loss, which
   represents the core application of the method, but applicability to
   delay and jitter measurements is also considered.

3.1.  Packet Loss Measurement

   The basic idea is to virtually split traffic flows into consecutive
   blocks: each block represents a measurable entity unambiguously
   recognizable by all network devices along the path.
   By counting the
   number of packets in each block and comparing the values measured by
   different network devices along the path, it is possible to measure
   whether packet loss occurred in any single block between any two
   points.

   As discussed in the previous section, a simple way to create the
   blocks is to "color" the traffic (two colors are sufficient), so that
   packets belonging to different consecutive blocks will have different
   colors.  Whenever the color changes, the previous block terminates
   and the new one begins.  Hence, all the packets belonging to the same
   block will have the same color and packets of different consecutive
   blocks will have different colors.  The number of packets in each
   block depends on the criterion used to create the blocks:

   o  if the color is switched after a fixed number of packets, then
      each block will contain the same number of packets (except for any
      losses); and

   o  if the color is switched according to a fixed timer, then the
      number of packets may be different in each block depending on the
      packet rate.

   The rest of the document assumes that the blocks are created
   according to a fixed timer.  Switching after a fixed number of
   packets is an additional possibility, but its detailed specification
   is out of scope.

   Figure 2 shows what a flow looks like when it is split into traffic
   blocks with colored packets.

   A: packet with A coloring
   B: packet with B coloring

          |           |           |           |           |
          |           |       Traffic Flow    |           |
   ------------------------------------------------------------------>
   BBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA
   ------------------------------------------------------------------>
   ...    |  Block 5  |  Block 4  |  Block 3  |  Block 2  |  Block 1
          |           |           |           |           |

                       Figure 2: Traffic Coloring

   Figure 3 shows how the method can be used to measure link packet loss
   between two adjacent nodes.

   Referring to the figure, let's assume we want to monitor the packet
   loss on the link between two routers: router R1 and router R2.
   According to the method, the traffic is colored alternately with two
   different colors: A and B.  Whenever the color changes, the
   transition generates a sort of square-wave signal, as depicted in the
   following figure.

   Color A   ----------+           +-----------+           +----------
                       |           |           |           |
   Color B             +-----------+           +-----------+
              Block n       ...       Block 3     Block 2     Block 1
            <---------> <---------> <---------> <---------> <--------->

                               Traffic Flow
            ===========================================================>
   Color ...AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA...
            ===========================================================>

                Figure 3: Computation of Link Packet Loss

   Traffic coloring can be done by R1 itself if the traffic is not
   already colored.  R1 needs two counters, C(A)R1 and C(B)R1, on its
   egress interface: C(A)R1 counts the packets with color A and C(B)R1
   counts those with color B.  As long as traffic is colored as A, only
   counter C(A)R1 will be incremented, while C(B)R1 is not incremented;
   conversely, when the traffic is colored as B, only C(B)R1 is
   incremented.  C(A)R1 and C(B)R1 can be used as reference values to
   determine the packet loss from R1 to any other measurement point
   further along the path.  Router R2, similarly, will need two counters
   on its ingress interface, C(A)R2 and C(B)R2, to count the packets
   received on that interface and colored with A and B, respectively.
   When an A block ends, it is possible to compare C(A)R1 and C(A)R2 and
   calculate the packet loss within the block; similarly, when the
   successive B block terminates, it is possible to compare C(B)R1 with
   C(B)R2, and so on, for every successive block.
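
   The two-counter scheme described above can be sketched in a few
   lines of Python.  This is only an illustration under stated
   assumptions: the packet lists, the dropped packet, and the helper
   name count_by_color are hypothetical and are not part of the method
   specification.

```python
from collections import Counter


def count_by_color(packets):
    """Count packets per color, playing the role of C(A) and C(B)
    on a single interface."""
    return Counter(packets)


# Hypothetical traffic: an A block of 5 packets, then a B block of 4.
sent_by_r1 = ["A"] * 5 + ["B"] * 4
# Hypothetical loss: one A-colored packet is dropped between R1 and R2.
received_by_r2 = ["A"] * 4 + ["B"] * 4

c_r1 = count_by_color(sent_by_r1)      # C(A)R1, C(B)R1
c_r2 = count_by_color(received_by_r2)  # C(A)R2, C(B)R2

# Per-color loss is the difference between the R1 and R2 counters.
loss_a = c_r1["A"] - c_r2["A"]
loss_b = c_r1["B"] - c_r2["B"]
print(loss_a, loss_b)  # 1 0
```

   Because each packet carries its own color, R2 needs no knowledge of
   when R1 switched color in order to attribute a packet to the right
   counter.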

   Likewise, by using two counters on the R2 egress interface, it is
   possible to count the packets sent out of the R2 interface and use
   them as reference values to calculate the packet loss from R2 to any
   measurement point downstream of R2.

   Using a fixed timer for color switching offers better control over
   the method: the (time) length of the blocks can be chosen large
   enough to simplify the collection and the comparison of measures
   taken by different network devices.  It's preferable not to read the
   value of the counters immediately after the color switch: some
   packets could arrive out of order and increment the counter
   associated with the previous block (color), so it is worth waiting
   for some time.  A safe choice is to wait L/2 time units (where L is
   the duration of each block) after the color switch before reading
   the still counter of the previous color, so the possibility of
   reading a running counter instead of a still one is minimized.  The
   drawback is that the longer the duration of the block, the less
   frequently the measurement can be taken.

   The table in Figure 4 shows how the counters can be used to calculate
   the packet loss between R1 and R2.  The first column lists the
   sequence of traffic blocks, while the other columns contain the
   counters of A-colored packets and B-colored packets for R1 and R2.
   In this example, we assume that the values of the counters are reset
   to zero whenever a block ends and its associated counter has been
   read: with this assumption, the table shows only relative values,
   that is, the exact number of packets of each color within each block.
   If the values of the counters were not reset, the table would contain
   cumulative values, but the relative values could be determined simply
   by the difference from the value of the previous block of the same
   color.

   The color is switched on the basis of a fixed timer (not shown in the
   table), so the number of packets differs from block to block.

   +-------+--------+--------+--------+--------+------+
   | Block | C(A)R1 | C(B)R1 | C(A)R2 | C(B)R2 | Loss |
   +-------+--------+--------+--------+--------+------+
   |   1   |  375   |   0    |  375   |   0    |  0   |
   |   2   |   0    |  388   |   0    |  388   |  0   |
   |   3   |  382   |   0    |  381   |   0    |  1   |
   |   4   |   0    |  377   |   0    |  374   |  3   |
   |  ...  |  ...   |  ...   |  ...   |  ...   | ...  |
   |   2n  |   0    |  387   |   0    |  387   |  0   |
   |  2n+1 |  379   |   0    |  377   |   0    |  2   |
   +-------+--------+--------+--------+--------+------+

      Figure 4: Evaluation of Counters for Packet Loss Measurements

   During an A block (blocks 1, 3, and 2n+1), all the packets are
   A-colored; therefore, the C(A) counters are incremented for every
   packet seen on the interface, while C(B) counters are zero.
   Conversely, during a B block (blocks 2, 4, and 2n), all the packets
   are B-colored: C(A) counters are zero, while C(B) counters are
   incremented.

   When a block ends (because of color switching), the relative counters
   stop incrementing; it is possible to read them, compare the values
   measured on routers R1 and R2, and calculate the packet loss within
   that block.

   For example, looking at the table above, during the first block
   (A-colored), C(A)R1 and C(A)R2 have the same value (375), which
   corresponds to the exact number of packets of the first block (no
   loss).  Also, during the second block (B-colored), R1 and R2 counters
   have the same value (388), which corresponds to the number of packets
   of the second block (no loss).  During the third and fourth blocks,
   R1 and R2 counters are different, meaning that some packets have been
   lost: in the example, one single packet (382-381) was lost during
   block three, and three packets (377-374) were lost during block four.
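
   The arithmetic behind the Loss column of Figure 4 can be reproduced
   with the short Python sketch below.  The counter values are copied
   from blocks 1 to 4 of the table; the function names and the
   cumulative readings in the last lines are illustrative assumptions,
   and the cumulative case assumes that the counter starts at zero and
   never wraps.

```python
# Per-block counter values from Figure 4: (block, color, R1, R2).
FIGURE_4_BLOCKS = [
    (1, "A", 375, 375),
    (2, "B", 388, 388),
    (3, "A", 382, 381),
    (4, "B", 377, 374),
]


def block_losses(blocks):
    """Return {block: packets lost between R1 and R2}."""
    return {block: r1 - r2 for block, _color, r1, r2 in blocks}


def relative_values(cumulative_reads):
    """Turn successive reads of a never-reset counter into per-block
    values by differencing against the previous read."""
    values, previous = [], 0
    for read in cumulative_reads:
        values.append(read - previous)
        previous = read
    return values


losses = block_losses(FIGURE_4_BLOCKS)
print(losses)  # {1: 0, 2: 0, 3: 1, 4: 3}

# Hypothetical cumulative reads of C(A)R1 after blocks 1 and 3
# (375, then 375 + 382).
print(relative_values([375, 757]))  # [375, 382]
```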

   The method applied to R1 and R2 can be extended to any other router
   and applied to more complex networks, as long as the measurement is
   enabled on the path followed by the traffic flow(s) being observed.

   It's worth mentioning two different strategies that can be used when
   implementing the method:

   o  flow-based: the flow-based strategy is used when only a limited
      number of traffic flows need to be monitored.  According to this
      strategy, only a subset of the flows is colored.  Counters for
      packet loss measurements can be instantiated for each single flow,
      or for the set as a whole, depending on the desired granularity.
      A relevant problem with this approach is the necessity to know in
      advance the path followed by flows that are subject to
      measurement.  Path rerouting and traffic load-balancing make the
      problem more complex, especially for unicast traffic.  The problem
      is easier to solve for multicast traffic, where load-balancing is
      seldom used and static joins are frequently used to force traffic
      forwarding and replication.

   o  link-based: measurements are performed on all the traffic on a
      link-by-link basis.  The link could be a physical link or a
      logical link.  Counters could be instantiated for the traffic as a
      whole or for each traffic class (in case it is desired to monitor
      each class separately), but in the second case, two counters are
      needed for each class.

   As mentioned, the flow-based measurement requires the identification
   of the flow to be monitored and the discovery of the path followed by
   the selected flow.  It is possible to monitor a single flow or
   multiple flows grouped together, but in this case, measurement is
   consistent only if all the flows in the group follow the same path.
   Moreover, if a measurement is performed by grouping many flows, it is
   not possible to determine exactly which flow was affected by packet
   loss.
   In order to have measures per single flow, it is necessary to
   configure counters for each specific flow.  Once the flow(s) to be
   monitored have been identified, it is necessary to configure the
   monitoring on the proper nodes.  Configuring the monitoring means
   configuring the rule to intercept the traffic and configuring the
   counters to count the packets.  To have just an end-to-end
   monitoring, it is sufficient to enable the monitoring on the first-
   and last-hop routers of the path: the mechanism is completely
   transparent to intermediate nodes and independent of the path
   followed by traffic flows.  Conversely, to monitor the flow on a
   hop-by-hop basis along its whole path, it is necessary to enable the
   monitoring on every node from the source to the destination.  In case
   the exact path followed by the flow is not known a priori (i.e., the
   flow has multiple paths to reach the destination), it is necessary to
   enable the monitoring system on every path: counters on interfaces
   traversed by the flow will report packet count, whereas counters on
   other interfaces will remain at zero.

3.1.1.  Coloring the Packets

   The coloring operation is fundamental in order to create packet
   blocks.  This implies choosing where to activate the coloring and how
   to color the packets.

   In case of flow-based measurements, the flow to monitor can be
   defined by a set of selection rules (e.g., header fields) used to
   match a subset of the packets; in this way, it is possible to control
   the number of involved nodes, the path followed by the packets, and
   the size of the flows.  It is possible, in general, to have multiple
   coloring nodes or a single coloring node that is easier to manage and
   doesn't raise any risk of conflict.
   Coloring in multiple nodes can
   be done, and the requirement is that the coloring must change
   periodically between the nodes according to the timing considerations
   in Section 3.2, so that every node that is designated as a
   measurement point along the path is able to unambiguously identify
   the colored packets.  Furthermore, [I-D.fioccola-rfc8889bis]
   generalizes the coloring for multipoint-to-multipoint flows.  In
   addition, it can be advantageous to color the flow as close as
   possible to the source because it allows an end-to-end measure if a
   measurement point is enabled on the last-hop router as well.

   For link-based measurements, all traffic needs to be colored when
   transmitted on the link.  If the traffic had already been colored,
   then it has to be re-colored because the color must be consistent on
   the link.  This means that each hop along the path must (re-)color
   the traffic; the color is not required to be consistent along
   different links.

   Traffic coloring can be implemented by setting a specific bit in the
   packet header and changing the value of that bit periodically.  How
   to choose the marking field depends on the application and is out of
   scope here.

3.1.2.  Counting the Packets

   For flow-based measurements, assuming that the coloring of the
   packets is performed only by the source nodes, the nodes between
   source and destination (included) have to count the colored packets
   that they receive and forward: this operation can be enabled on every
   router along the path or only on a subset, depending on which network
   segment is being monitored (a single link, a particular metro area,
   the backbone, or the whole path).  Since the color switches
   periodically between two values, two counters (one for each value)
   are needed: one counter for packets with color A and one counter for
   packets with color B.
   For each flow (or group of flows) being
   monitored and for every interface where the monitoring is active, two
   counters are needed.  For example, in order to separately monitor
   three flows on a router with four interfaces involved, 24 counters
   are needed (two counters for each of the three flows on each of the
   four interfaces).  Furthermore, [I-D.fioccola-rfc8889bis] generalizes
   the counting for multipoint-to-multipoint flows.

   In case of link-based measurements, the behavior is similar except
   that coloring and counting operations are performed on a link-by-link
   basis at each endpoint of the link.

   Another important aspect to take into consideration is when to read
   the counters: in order to count the exact number of packets of a
   block, the routers must perform this operation when that block has
   ended; in other words, the counter for color A must be read when the
   current block has color B, in order to be sure that the value of the
   counter is stable.  This task can be accomplished in two ways.  The
   general approach suggests reading the counters periodically, many
   times during a block duration, and comparing these successive
   readings: when the counter stops incrementing, it means that the
   current block has ended, and its value can be processed safely.
   Alternatively, if the coloring operation is performed on the basis of
   a fixed timer, it is possible to configure the reading of the
   counters according to that timer: for example, reading the counter
   for color A every period in the middle of the subsequent block with
   color B is a safe choice.  A sufficient margin should be considered
   between the end of a block and the reading of the counter, in order
   to take into account any out-of-order packets.

3.1.3.  Collecting Data and Calculating Packet Loss

   The nodes enabled to perform performance monitoring collect the value
   of the counters, but they are not able to directly use this
   information to measure packet loss, because they only have their own
   samples.  For this reason, an external Network Management System
   (NMS) can be used to collect and process data and to perform packet
   loss calculation.  The NMS compares the values of counters from
   different nodes and can calculate whether some packets were lost
   (even a single packet) and where those packets were lost.

   The value of the counters needs to be transmitted to the NMS as soon
   as it has been read.  This can be accomplished by using SNMP or FTP
   and can be done in Push Mode or Polling Mode.  In the first case,
   each router periodically sends the information to the NMS; in the
   latter case, it is the NMS that periodically polls routers to collect
   information.  In any case, the NMS has to collect all the relevant
   values from all the routers within one cycle of the timer.

   It would also be possible to use a protocol to exchange values of
   counters between the two endpoints in order to let them perform the
   packet loss calculation for each traffic direction.

3.2.  Timing Aspects

   This document introduces two color-switching methods: one is based on
   a fixed number of packets, and the other is based on a fixed timer.
   The method based on a fixed timer is preferable because it is more
   deterministic, and it is the one considered in this document.

   In general, clocks in network devices are not accurate and for this
   reason, there is a clock error between the measurement points R1 and
   R2.  But, to implement the methodology, they must be synchronized to
   the same clock reference with an accuracy of +/- L/2 time units,
   where L is the fixed time duration of the block.  So each colored
   packet can be assigned to the right batch by each router.
   This is
   because the minimum time distance between two packets of the same
   color that belong to different batches is L time units.

   In practice, in addition to clock errors, the delay between
   measurement points also affects the implementation of the methodology
   because each packet can be delayed differently, and this can produce
   out-of-order packets at batch boundaries.  This means that, without
   considering clock error, we wait L/2 after color switching to be sure
   to take a still counter.

   In summary, we need to take into account two contributions: clock
   error between network devices and the interval we need to wait to
   avoid packets being out of order because of network delay.

   The following figure explains both issues.

   ...BBBBBBBBB | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | BBBBBBBBB...
                |<======================================>|
                |                   L                    |
   ...=========>|<==================><==================>|<==========...
                |        L/2                 L/2         |
                |<===>|                            |<===>|
                   d  |                            |  d
                      |<==========================>|
                       available counting interval

                        Figure 5: Timing Aspects

   It is assumed that all network devices are synchronized to a common
   reference time with an accuracy of +/- A/2.  Thus, the difference
   between the clock values of any two network devices is bounded by A.

   The network delay between the network devices can be represented as a
   data set; assuming the samples are approximately normally
   distributed, 99.7% of them are within 3 standard deviations of the
   average.

   The guard band d is given by:

   d = A + D_avg + 3*D_stddev,

   where A is the clock accuracy, D_avg is the average value of the
   network delay between the network devices, and D_stddev is the
   standard deviation of the delay.

   The available counting interval is L - 2d, which must be greater
   than zero.

   The condition that must be satisfied, and that is a requirement on
   the synchronization accuracy, is:

   d < L/2.

3.3.  One-Way Delay Measurement

   The same principle used to measure packet loss can be applied also to
   one-way delay measurement.  There are three alternatives, as
   described hereinafter.

   Note that, for all the one-way delay alternatives described in the
   next sections, by summing the one-way delays of the two directions of
   a path, it is always possible to measure the two-way delay
   (round-trip "virtual" delay).

3.3.1.  Single-Marking Methodology

   The alternation of colors can be used as a time reference to
   calculate the delay.  Whenever the color changes (which means that a
   new block has started), a network device can store the timestamp of
   the first packet of the new block; that timestamp can be compared
   with the timestamp of the same packet on a second router to compute
   packet delay.  When looking at Figure 2, R1 stores the timestamp
   TS(A1)R1 when it sends the first packet of block 1 (A-colored), the
   timestamp TS(B2)R1 when it sends the first packet of block 2
   (B-colored), and so on for every other block.  R2 performs the same
   operation on the receiving side, recording TS(A1)R2, TS(B2)R2, and so
   on.  Since the timestamps refer to specific packets (the first packet
   of each block), we are sure that timestamps compared to compute delay
   refer to the same packets.  By comparing TS(A1)R1 with TS(A1)R2 (and
   similarly TS(B2)R1 with TS(B2)R2, and so on), it is possible to
   measure the delay between R1 and R2.  In order to have more
   measurements, it is possible to take and store more timestamps,
   referring to other packets within each block.

   In order to coherently compare timestamps collected on different
   routers, the clocks on the network nodes must be in sync.
634 Furthermore, a measurement is valid only if no packet loss occurs and 635 if packet misordering can be avoided; otherwise, the first packet of 636 a block on R1 could be different from the first packet of the same 637 block on R2 (for instance, if that packet is lost between R1 and R2 638 or it arrives after the next one). Since packet misordering is 639 generally undetectable, it is not possible to check whether the first 640 packet on R1 is also the first packet on R2, and this is part of the intrinsic 641 error in this measurement. 643 The following table shows how timestamps can be used to calculate the 644 delay between R1 and R2. The first column lists the sequence of 645 blocks, while other columns contain the timestamp referring to the 646 first packet of each block on R1 and R2. The delay is computed as a 647 difference between timestamps. For the sake of simplicity, all the 648 values are expressed in milliseconds.

    +-------+---------+---------+---------+---------+-------------+
    | Block | TS(A)R1 | TS(B)R1 | TS(A)R2 | TS(B)R2 | Delay R1-R2 |
    +-------+---------+---------+---------+---------+-------------+
    | 1     | 12.483  | -       | 15.591  | -       | 3.108       |
    | 2     | -       | 6.263   | -       | 9.288   | 3.025       |
    | 3     | 27.556  | -       | 30.512  | -       | 2.956       |
    | 4     | -       | 18.113  | -       | 21.269  | 3.156       |
    | ...   | ...     | ...     | ...     | ...     | ...         |
    | 2n    | 77.463  | -       | 80.501  | -       | 3.038       |
    | 2n+1  | -       | 24.333  | -       | 27.433  | 3.100       |
    +-------+---------+---------+---------+---------+-------------+

    Figure 6: Evaluation of Timestamps for Delay Measurements

664 The first row shows timestamps taken on R1 and R2, respectively, and 665 refers to the first packet of block 1 (which is A-colored). Delay 666 can be computed as the difference between the timestamp on R2 and the 667 timestamp on R1. Similarly, the second row shows timestamps (in 668 milliseconds) taken on R1 and R2 and refers to the first packet of 669 block 2 (which is B-colored).
By comparing timestamps taken on 670 different nodes in the network and referring to the same packets 671 (identified using the alternation of colors), it is possible to 672 measure delay on different network segments. 674 For the sake of simplicity, in the above example, a single 675 measurement is provided within a block, taking into account only the 676 first packet of each block. The number of measurements can be easily 677 increased by considering multiple packets in the block: for instance, 678 a timestamp could be taken every N packets, thus generating multiple 679 delay measurements. Taking this to the limit, in principle, the 680 delay could be measured for each packet by taking and comparing the 681 corresponding timestamps (possible but impractical from an 682 implementation point of view). 684 3.3.1.1. Mean Delay 686 As mentioned before, the method previously exposed for measuring the 687 delay is sensitive to out-of-order reception of packets. In order to 688 overcome this problem, a different approach has been considered: it 689 is based on the concept of mean delay. The mean delay is calculated 690 by considering the average arrival time of the packets within a 691 single block. The network device locally stores a timestamp for each 692 packet received within a single block: summing all the timestamps and 693 dividing by the total number of packets received, the average arrival 694 time for that block of packets can be calculated. By subtracting the 695 average arrival times of two adjacent devices, it is possible to 696 calculate the mean delay between those nodes. When computing the 697 mean delay, the measurement error could be augmented by accumulating 698 the measurement error of a lot of packets. This method is robust to 699 out-of-order packets and also to packet loss (only a small error is 700 introduced). 
Moreover, it greatly reduces the number of timestamps 701 (only one per block for each network device) that have to be 702 collected by the management system. On the other hand, it only gives 703 one measure for the duration of the block, and it doesn't give the 704 minimum, maximum, and median delay values [RFC6703]. This limitation 705 could be overcome by reducing the duration of the block (for 706 instance, from 5 minutes to a few seconds), which implies a highly 707 optimized implementation of the method. 709 3.3.2. Double-Marking Methodology 711 The Single-Marking methodology for one-way delay measurement is 712 sensitive to out-of-order reception of packets. The first approach 713 to overcome this problem has been described before and is based on 714 the concept of mean delay. But the limitation of mean delay is that 715 it doesn't give information about the delay value's distribution for 716 the duration of the block. Additionally, it may be useful to have 717 not only the mean delay but also the minimum, maximum, and median 718 delay values and, in wider terms, to know more about the statistic 719 distribution of delay values. So, in order to have more information 720 about the delay and to overcome out-of-order issues, a different 721 approach can be introduced; it is based on a Double-Marking 722 methodology. 724 Basically, the idea is to use the first marking to create the 725 alternate flow and, within this colored flow, a second marking to 726 select the packets for measuring delay/jitter. The first marking is 727 needed for packet loss and mean delay measurement. The second 728 marking creates a new set of marked packets that are fully identified 729 over the network, so that a network device can store the timestamps 730 of these packets; these timestamps can be compared with the 731 timestamps of the same packets on a second router to compute packet 732 delay values for each packet. 
The number of measurements can be 733 easily increased by changing the frequency of the second marking. 734 But the frequency of the second marking must not be too high in order 735 to avoid out-of-order issues. Between packets with the second 736 marking, there should be a security time gap (e.g., this gap could 737 be, at the minimum, the mean network delay calculated with the 738 previous methodology) to avoid out-of-order issues and also to have a 739 number of measurement packets that are rate independent. If a 740 second-marking packet is lost, the delay measurement for the 741 considered block is corrupted and should be discarded. 743 Mean delay is calculated on all the packets of a sample and is a 744 simple computation to be performed for a Single-Marking Method. In 745 some cases, the mean delay measure is not sufficient to characterize 746 the sample, and more statistics of delay extent data are needed, 747 e.g., percentiles, variance, and median delay values. The 748 conventional range (maximum-minimum) should be avoided for several 749 reasons, including stability of the maximum delay due to the 750 influence by outliers. RFC 5481 [RFC5481], Section 6.5 highlights 751 how the 99.9th percentile of delay and delay variation is more 752 helpful to performance planners. To overcome this drawback, the idea 753 is to couple the mean delay measure for the entire batch with a 754 Double-Marking Method, where a subset of batch packets is selected 755 for extensive delay calculation by using a second marking. In this 756 way, it is possible to perform a detailed analysis on these double- 757 marked packets. Please note that there are classic algorithms for 758 median and variance calculation, but they are out of the scope of 759 this document. 
The comparison between the mean delay for the entire 760 batch and the mean delay on these double-marked packets gives useful 761 information since it is possible to understand if the Double-Marking 762 measurements are actually representative of the delay trends. 764 3.4. Delay Variation Measurement 766 Similar to one-way delay measurement (both for Single Marking and 767 Double Marking), the method can also be used to measure the inter- 768 arrival jitter. We refer to the definition in RFC 3393 [RFC3393]. 769 The alternation of colors, for a Single-Marking Method, can be used 770 as a time reference to measure delay variations. In case of Double 771 Marking, the time reference is given by the second-marked packets. 772 Considering the example depicted in Figure 2, R1 stores the timestamp 773 TS(A)R1 whenever it sends the first packet of a block, and R2 stores 774 the timestamp TS(B)R2 whenever it receives the first packet of a 775 block. The inter-arrival jitter can be easily derived from one-way 776 delay measurement, by evaluating the delay variation of consecutive 777 samples. 779 The concept of mean delay can also be applied to delay variation, by 780 evaluating the average variation of the interval between consecutive 781 packets of the flow from R1 to R2. 783 4. Considerations 785 This section highlights some considerations about the methodology. 787 4.1. Synchronization 789 The Alternate-Marking technique does not require a strong 790 synchronization, especially for packet loss and two-way delay 791 measurement. Only one-way delay measurement requires network devices 792 to have synchronized clocks. 794 Color switching is the reference for all the network devices, and the 795 only requirement to be achieved is that all network devices have to 796 recognize the right batch along the path. 798 Section 3.2 specifies the level of synchronization accuracy so that 799 all network devices consistently match the color bit to the correct 800 block. 
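As an illustration of the accuracy condition that Section 3.2 places on the guard band, the following Python sketch computes d = A + D_avg + 3*D_stddev from a set of delay samples and checks d < L/2. The function names and sample values are hypothetical, all quantities are assumed to be in seconds, and the population standard deviation is used as one possible reading of "standard deviation".

```python
from statistics import mean, pstdev


def guard_band(clock_accuracy, delay_samples):
    """Guard band d = A + D_avg + 3*D_stddev, as given in Section 3.2."""
    return clock_accuracy + mean(delay_samples) + 3 * pstdev(delay_samples)


def counting_interval(block_duration, d):
    """Return the available counting interval L - 2d.

    Returns None when the condition d < L/2 is not satisfied, i.e. when
    no usable counting interval remains inside the block.
    """
    if not d < block_duration / 2:
        return None
    return block_duration - 2 * d


# Hypothetical inputs: clock accuracy A = 10 ms, block duration L = 1 s,
# and four measured network delays between the two devices.
A = 0.010
L = 1.0
delays = [0.003, 0.003, 0.005, 0.005]

d = guard_band(A, delays)
interval = counting_interval(L, d)
```

With these assumed inputs the guard band is about 17 ms, which leaves most of the one-second block available for reading the counter; a block of only 30 ms would fail the check.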
802 This synchronization requirement can be satisfied even with a 803 relatively inaccurate synchronization method. This is true for 804 packet loss and two-way delay measurement, but not for one-way delay 805 measurement, where clock synchronization must be accurate. 807 Therefore, a system that uses only packet loss and two-way delay 808 measurement does not require synchronization. This is because the 809 value of the clocks of network devices does not affect the 810 computation of the two-way delay measurement. 812 4.2. Data Correlation 814 Data correlation is the mechanism to compare counters and timestamps 815 for packet loss, delay, and delay variation calculation. It could be 816 performed in several ways depending on the Alternate-Marking 817 application and use case. Some possibilities are to: 819 o use a centralized solution using NMS to correlate data; and 821 o define a protocol-based distributed solution by introducing a new 822 protocol or by extending the existing protocols (e.g., see RFC 823 6374 [RFC6374] or the Two-Way Active Measurement Protocol (TWAMP) 824 as defined in RFC 5357 [RFC5357] or the One-Way Active Measurement 825 Protocol (OWAMP) as defined in RFC 4656 [RFC4656]) in order to 826 communicate the counters and timestamps between nodes. 828 In the following paragraphs, an example data correlation mechanism is 829 explained and could be used independently of the adopted solutions. 831 When data is collected on the upstream and downstream nodes, e.g., 832 packet counts for packet loss measurement or timestamps for packet 833 delay measurement, and is periodically reported to or pulled by other 834 nodes or an NMS, a certain data correlation mechanism SHOULD be in 835 use to help the nodes or NMS tell whether any two or more packet 836 counts are related to the same block of markers or if any two 837 timestamps are related to the same marked packet. 
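One possible centralized form of such a correlation mechanism is sketched below in Python. It assumes, hypothetically, that each node reports one record per block carrying a block identifier, the block color, a packet count, and the sum of the packet timestamps in milliseconds. The record layout and all names are illustrative and are not defined by this document. Records with the same block identifier and color are paired, the loss is the upstream count minus the downstream count, and the mean delay (Section 3.3.1.1) is the difference of the average timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockReport:
    """Hypothetical per-block record exported by a measurement node."""
    block_id: int   # identifier shared by both nodes for the same block
    color: str      # "A" or "B"
    packets: int    # packets counted for this block
    ts_sum: float   # sum of per-packet timestamps, in milliseconds


def correlate(upstream, downstream):
    """Pair reports by block and return {block_id: (loss, mean_delay)}."""
    down_by_block = {r.block_id: r for r in downstream}
    results = {}
    for up in upstream:
        down = down_by_block.get(up.block_id)
        # Skip blocks with no counterpart or with a color mismatch.
        if down is None or down.color != up.color:
            continue
        loss = up.packets - down.packets
        mean_delay = None
        if up.packets and down.packets:
            mean_delay = down.ts_sum / down.packets - up.ts_sum / up.packets
        results[up.block_id] = (loss, mean_delay)
    return results


# Hypothetical reports from an upstream node R1 and a downstream node R2.
r1 = [BlockReport(1, "A", 100, 1000.0), BlockReport(2, "B", 80, 1600.0)]
r2 = [BlockReport(1, "A", 100, 1300.0), BlockReport(2, "B", 78, 1794.0)]
result = correlate(r1, r2)
```

In this made-up data set, block 1 shows no loss and block 2 shows two lost packets. The mean delay for a block with loss carries a small error, because the two averages are taken over slightly different sets of packets.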
839 The Alternate-Marking Method described in this document 840 splits the packets of the measured flow into different measurement 841 blocks; in addition, a Block Number (BN) could be assigned to each 842 such measurement block. The BN is generated each time a node reads 843 the data (packet counts or timestamps) and is associated with each 844 packet count and timestamp reported to or pulled by other nodes or 845 NMSs. The value of a BN could be calculated as the modulo of the 846 local time (when the data are read) and the interval of the marking 847 time period. 849 When the nodes or NMS see, for example, the same BNs associated with 850 two packet counts from an upstream and a downstream node, 851 respectively, they consider that these two packet counts correspond to 852 the same block, i.e., these two packet counts belong to the same 853 block of markers from the upstream and downstream nodes. The 854 assumption of this BN mechanism is that the measurement nodes are 855 time synchronized. This requires the measurement nodes to have a 856 certain time synchronization capability (e.g., the Network Time 857 Protocol (NTP) [RFC5905] or the IEEE 1588 Precision Time Protocol 858 (PTP) [IEEE-1588]). Synchronization aspects are further discussed in 859 Section 3.2. 861 4.3. Packet Reordering 863 Due to Equal-Cost Multipath (ECMP), packet reordering is very common in an IP network. The 864 accuracy of a marking-based performance measurement (PM), especially packet loss measurement, 865 may be affected by packet reordering. Consider the following 866 example:

    Block   :    1    |    2    |    3    |    4    |    5    |...
    --------|---------|---------|---------|---------|---------|---
    Node R1 : AAAAAAA | BBBBBBB | AAAAAAA | BBBBBBB | AAAAAAA |...
    Node R2 : AAAAABB | AABBBBA | AAABAAA | BBBBBBA | ABAAABA |...

873 Figure 7: Packet Reordering 875 In Figure 7, the packet stream for Node R1 isn't being reordered and 876 can be safely assigned to interval blocks, but the packet stream for 877 Node R2 is being reordered; so, looking at the packet with the marker 878 of "B" in block 3, there is no safe way to tell whether the packet 879 belongs to block 2 or block 4. 881 In general, there is the need to assign packets with the marker of 882 "B" or "A" to the right interval blocks. Most of the packet 883 reordering occurs at the edge of adjacent blocks, and they are easy 884 to handle if the interval of each block is sufficiently large. Then, 885 it can be assumed that the packets with different markers belong to 886 the block that they are closer to. If the interval is small, it is 887 difficult and sometimes impossible to determine to which block a 888 packet belongs. 890 Section 3.2 provides a guidance on how to choose a proper interval 891 and mitigate packet reordering issues. 893 4.4. Packet Fragmentation 895 Fragmentation can be managed with the Alternate-Marking Method and in 896 particular it is possible to give the following guidance: 898 Marking nodes MUST mark all fragments if there are flag bits to 899 use (i.e. it is in the specific encapsulation), as if they were 900 separate packets. 902 Nodes that fragment packets within the measurement domain SHOULD, 903 if they have the capability to do so, ensure that only one 904 resulting fragment carries the marking bit(s) of the original 905 packet. Failure to do so can introduce errors into the 906 measurement. 908 Measurement points MAY simply ignore unmarked fragments and count 909 marked fragments as full packets. However, if resources allow, 910 measurement points MAY make note of both marked and unmarked 911 initial fragments and only increment the corresponding counter if 912 (a) other fragments are also marked, or (b) it observes all other 913 fragments and they are unmarked. 
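The simplest measurement-point behavior in the guidance above (ignore unmarked fragments and count each marked fragment as a full packet) might look like the following Python sketch. The Fragment type and its fields are hypothetical; in particular, an unmarked fragment is modeled here as having color None, which is an assumption of this example rather than something the document specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    """Hypothetical view of a fragment as seen by a measurement point."""
    packet_id: int
    color: Optional[str]  # "A", "B", or None when the fragment is unmarked


def count_marked(fragments):
    """Count marked fragments per color, ignoring unmarked fragments."""
    counters = {"A": 0, "B": 0}
    for frag in fragments:
        if frag.color is None:
            continue  # unmarked fragment: not counted
        counters[frag.color] += 1
    return counters


# Packet 7 was fragmented inside the domain and only its first fragment
# kept the marking, so it contributes a single count to color A.
observed = [
    Fragment(6, "A"),
    Fragment(7, "A"),
    Fragment(7, None),
    Fragment(7, None),
    Fragment(8, "B"),
]
counts = count_marked(observed)
```

This only works as intended when fragmenting nodes keep the marking on exactly one resulting fragment; otherwise a fragmented packet is counted more than once.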
915 The proposed approach allows the marking node to mark all the 916 fragments except in the case of fragmentation within the network 917 domain; in that event, it is suggested to mark only the first 918 fragment. In addition, the counters could be configured 919 to keep track of both marked and unmarked 920 fragments. 922 5. Results of the Alternate Marking Experiment 924 The methodology described in the previous sections can be applied to 925 various performance measurement problems, as explained in [RFC8321]. 926 The only requirement is to select and mark the flow to be monitored; 927 in this way, packets are batched by the sender, and each batch is 928 alternately marked such that it can be easily recognized by the 929 receiver. 931 Either one or two flag bits might be available for marking in 932 different deployments: 934 One flag: packet loss measurement SHOULD be done as described in 935 Section 3.1, while delay measurement MAY be done according to the 936 single-marking method described in Section 3.3.1. Mean delay 937 (Section 3.3.1.1) is NOT RECOMMENDED since it implies more 938 computational load. 940 Two flags: packet loss measurement SHOULD be done as described in 941 Section 3.1, while delay measurement SHOULD be done according to 942 the double-marking method described in Section 3.3.2. In this case, single-marking 943 MAY also be used in combination with double-marking; the two 944 approaches provide slightly different pieces of information that 945 can be combined to have a more robust data set.
947 The experiment with Alternate Marking methodologies confirmed the 948 following benefits: 950 o easy implementation: it can be implemented by using features 951 already available on major routing platforms, or by applying an 952 optimized implementation of the method for both legacy and newest 953 technologies; 955 o low computational effort: the additional load on processing is 956 negligible; 958 o accurate loss and delay measurements: single packet loss 959 granularity is achieved with a Passive measurement; 961 o potential applicability to any kind of packet-based or frame-based 962 traffic: Ethernet, IP, MPLS, etc., and both unicast and multicast; 964 o robustness: the method can easily tolerate out-of-order packets, 965 and it's not based on "special" packets whose loss could have a 966 negative impact; 968 o flexibility: all the timestamp formats are allowed, because they 969 are managed out of band. The format (the Network Time Protocol 970 (NTP) [RFC5905] or the IEEE 1588 Precision Time Protocol (PTP) 971 [IEEE-1588]) depends on the precision you want; and 973 o no interoperability issues: the features required are available on 974 all current routing platforms. Both a centralized or distributed 975 solution can be used to harvest data from the routers. 977 A deployment of the Alternate-Marking Method SHOULD also take into 978 account how to handle and recognize marked and unmarked traffic 979 depending on whether the technique is applied as Hybrid or Passive. 980 In the case where the marking method is applied by changing existing 981 fields of the packets, it is RECOMMENDED to use an additional flag or 982 some out-of-band signaling to indicate if the measurement is 983 activated or not in order to inform the measurement points. 
While, 984 in the case where the marking field is dedicated, reserved, and 985 included in a protocol extension, the measurement points can learn 986 whether the measurement is activated or not by checking if the 987 specific extension is included or not within the packets. 989 It is worth mentioning some related work: in particular 990 [IEEE-Network-PNPM] explains the Alternate-Marking method together 991 with new mechanisms based on hashing techniques as also further 992 described in [I-D.mizrahi-ippm-marking]; while 993 [I-D.zhou-ippm-enhanced-alternate-marking] extends the Alternate- 994 Marking Data Fields, to provide enhanced capabilities and allow 995 advanced functionalities. 997 5.1. Controlled Domain requirement 999 The Alternate Marking Method is an example of a solution limited to a 1000 controlled domain [RFC8799]. 1002 A controlled domain is a managed network that selects, monitors, and 1003 controls access by enforcing policies at the domain boundaries, in 1004 order to discard undesired external packets entering the domain and 1005 check internal packets leaving the domain. It does not necessarily 1006 mean that a controlled domain is a single administrative domain or a 1007 single organization. A controlled domain can correspond to a single 1008 administrative domain or multiple administrative domains under a 1009 defined network management. It must be possible to control the 1010 domain boundaries, and use specific precautions if traffic traverses 1011 the Internet. 1013 For security reasons, the Alternate Marking Method is RECOMMENDED 1014 only for controlled domains. 1016 6. Compliance with Guidelines from RFC 6390 1018 RFC 6390 [RFC6390] defines a framework and a process for developing 1019 Performance Metrics for protocols above and below the IP layer (such 1020 as IP-based applications that operate over reliable or datagram 1021 transport protocols). 
1023 This document doesn't aim to propose a new Performance Metric but 1024 rather a new Method of Measurement for a few Performance Metrics that 1025 have already been standardized. Nevertheless, it's worth applying 1026 guidelines from [RFC6390] to the present document, in order to 1027 provide a more complete and coherent description of the proposed 1028 method. We used a combination of the Performance Metric Definition 1029 template defined in Section 5.4 of [RFC6390] and the Dependencies 1030 laid out in Section 5.5 of that document. 1032 o Metric Name / Metric Description: as already stated, this document 1033 doesn't propose any new Performance Metrics. On the contrary, it 1034 describes a novel method for measuring packet loss [RFC7680]. The 1035 same concept, with small differences, can also be used to measure 1036 delay [RFC7679] and jitter [RFC3393]. The document mainly 1037 describes the applicability to packet loss measurement. 1039 o Method of Measurement or Calculation: according to the method 1040 described in the previous sections, the number of packets lost is 1041 calculated by subtracting the value of the counter on the destination 1042 node from the value of the counter on the source node. Both 1043 counters must refer to the same color. The calculation is 1044 performed when the value of the counters is in a steady state. 1045 The steady state is an intrinsic characteristic of the marking 1046 method counters because the alternation of color makes the 1047 counter associated with each color stand still, one at a time, for the 1048 duration of a marking period. 1050 o Units of Measurement: the method calculates and reports the exact 1051 number of packets sent by the source node and not received by the 1052 destination node.
1054 o Measurement Point(s) with Potential Measurement Domain: the 1055 measurement can be performed between adjacent nodes, on a per-link 1056 basis, or along a multi-hop path, provided that the traffic under 1057 measurement follows that path. In case of a multi-hop path, the 1058 measurements can be performed both end-to-end and hop-by-hop. 1060 o Measurement Timing: the method has a constraint on the frequency 1061 of measurements. This is detailed in Section 3.2, where it is 1062 specified that the marking period and the guard band interval are 1063 strictly related to each other to avoid out-of-order issues. That is 1064 because, in order to perform a measurement, the counter must be in 1065 a steady state, and this happens when the traffic is being colored 1066 with the alternate color. 1068 o Implementation: the method uses one or two marking bits to color 1069 the packets; this enables the use of policy configurations on the 1070 router to color the packets and accordingly configure the counter 1071 for each color. The path followed by traffic being measured 1072 should be known in advance in order to configure the counters 1073 along the path and be able to compare the correct values. 1075 o Verification: both in the lab and in the operational network, the 1076 methodology has been tested for packet loss and 1077 delay measurements by using traffic generators together with 1078 precision test instruments and network emulators. 1080 o Use and Applications: the method can be used to measure packet 1081 loss with high precision on live traffic; moreover, by combining 1082 end-to-end and per-link measurements, the method is useful to 1083 pinpoint the single link that is experiencing loss events.
1085 o Reporting Model: the value of the counters has to be sent to a 1086 centralized management system that performs the calculations; such 1087 samples must contain a reference to the time interval they refer 1088 to, so that the management system can perform the correct 1089 correlation; the samples have to be sent while the corresponding 1090 counter is in a steady state (within a time interval); otherwise, 1091 the value of the sample should be stored locally. 1093 o Dependencies: the values of the counters have to be correlated to 1094 the time interval they refer to. 1096 o Organization of Results: the Method of Measurement produces 1097 singletons. 1099 o Parameters: currently, the main parameter of the method is the 1100 time interval used to alternate the colors and read the counters. 1102 7. IANA Considerations 1104 This document has no IANA actions. 1106 8. Security Considerations 1108 This document specifies a method to perform measurements in the 1109 context of a Service Provider's network and has not been developed to 1110 conduct Internet measurements, so it does not directly affect 1111 Internet security nor applications that run on the Internet. 1112 However, implementation of this method must be mindful of security 1113 and privacy concerns. 1115 There are two types of security concerns: potential harm caused by 1116 the measurements and potential harm to the measurements. 1118 o Harm caused by the measurement: the measurements described in this 1119 document are Passive, so there are no new packets injected into 1120 the network causing potential harm to the network itself and to 1121 data traffic. Nevertheless, the method implies modifications on 1122 the fly to a header or encapsulation of the data packets: this 1123 must be performed in a way that doesn't alter the quality of 1124 service experienced by packets subject to measurements and that 1125 preserves stability and performance of routers doing the 1126 measurements. 
One of the main security threats in OAM protocols 1127 is network reconnaissance; an attacker can gather information 1128 about the network performance by passively eavesdropping on OAM 1129 messages. The advantage of the methods described in this document 1130 is that the marking bits are the only information that is 1131 exchanged between the network devices. Therefore, Passive 1132 eavesdropping on data-plane traffic does not allow attackers to 1133 gain information about the network performance. 1135 o Harm to the Measurement: the measurements could be harmed by 1136 routers altering the marking of the packets or by an attacker 1137 injecting artificial traffic. Authentication techniques, such as 1138 digital signatures, may be used where appropriate to guard against 1139 injected traffic attacks. Since the measurement itself may be 1140 affected by routers (or other network devices) along the path of 1141 IP packets intentionally altering the value of marking bits of 1142 packets, as mentioned above, the mechanism specified in this 1143 document can be applied just in the context of a controlled 1144 domain; thus, the routers (or other network devices) are locally 1145 administered and this type of attack can be avoided. In addition, 1146 an attacker can't gain information about network performance from 1147 a single monitoring point; it must use synchronized monitoring 1148 points at multiple points on the path, because they have to do the 1149 same kind of measurement and aggregation that Service Providers 1150 using Alternate Marking must do. 1152 Attacks on the data collection and reporting of the statistics 1153 between the monitoring points and the network management system can 1154 interfere with the proper functioning of the system. Hence, the 1155 channels used to report back flow statistics MUST be secured. 
1157 The privacy concerns of network measurement are limited because the 1158 method only relies on information contained in the header or 1159 encapsulation without any release of user data. Although information 1160 in the header or encapsulation is metadata that can be used to 1161 compromise the privacy of users, the limited marking technique in 1162 this document seems unlikely to substantially increase the existing 1163 privacy risks from header or encapsulation metadata. It might be 1164 theoretically possible to modulate the marking to serve as a covert 1165 channel, but it would have a very low data rate if it is to avoid 1166 adversely affecting the measurement systems that monitor the marking. 1168 Delay attacks are another potential threat in the context of this 1169 document. Delay measurement is performed using a specific packet in 1170 each block, marked by a dedicated color bit. Therefore, a 1171 man-in-the-middle attacker can selectively induce synthetic delay 1172 only to delay-colored packets, causing systematic error in the delay 1173 measurements. As discussed in previous sections, the methods 1174 described in this document rely on an underlying time synchronization 1175 protocol. Thus, by attacking the time protocol, an attacker can 1176 potentially compromise the integrity of the measurement. A detailed 1177 discussion about the threats against time protocols and how to 1178 mitigate them is presented in RFC 7384 [RFC7384]. 1180 9. Contributors 1182 Mach(Guoyi) Chen 1183 Huawei Technologies 1184 Email: mach.chen@huawei.com 1186 Alessandro Capello 1187 Telecom Italia 1188 Email: alessandro.capello@telecomitalia.it 1190 10. Acknowledgements 1192 The authors would like to thank Alberto Tempia Bonda, Luca 1193 Castaldelli and Lianshu Zheng for their contribution to the 1194 experimentation of the method. 1196 The authors would also like to thank Martin Duke and Tommy Pauly for their 1197 assistance and their detailed and valuable reviews. 1199 11.
References 1201 11.1. Normative References 1203 [IEEE-1588] 1204 IEEE, "IEEE Standard for a Precision Clock Synchronization 1205 Protocol for Networked Measurement and Control Systems", 1206 IEEE Std 1588-2008. 1208 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1209 Requirement Levels", BCP 14, RFC 2119, 1210 DOI 10.17487/RFC2119, March 1997, 1211 . 1213 [RFC5905] Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch, 1214 "Network Time Protocol Version 4: Protocol and Algorithms 1215 Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010, 1216 . 1218 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1219 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1220 May 2017, . 1222 11.2. Informative References 1224 [I-D.fioccola-rfc8889bis] 1225 Fioccola, G., Cociglio, M., Sapio, A., Sisto, R., and T. 1226 Zhou, "Multipoint Alternate-Marking Method", draft- 1227 fioccola-rfc8889bis-01 (work in progress), December 2021. 1229 [I-D.mizrahi-ippm-marking] 1230 Mizrahi, T., Fioccola, G., Cociglio, M., Chen, M., and G. 1231 Mirsky, "Marking Methods for Performance Measurement", 1232 draft-mizrahi-ippm-marking-00 (work in progress), October 1233 2021. 1235 [I-D.zhou-ippm-enhanced-alternate-marking] 1236 Zhou, T., Fioccola, G., Liu, Y., Lee, S., Cociglio, M., 1237 and W. Li, "Enhanced Alternate Marking Method", draft- 1238 zhou-ippm-enhanced-alternate-marking-08 (work in 1239 progress), January 2022. 1241 [IEEE-Network-PNPM] 1242 IEEE Network, "AM-PM: Efficient Network Telemetry using 1243 Alternate Marking", DOI 10.1109/MNET.2019.1800152, 2019. 1245 [RFC3393] Demichelis, C. and P. Chimento, "IP Packet Delay Variation 1246 Metric for IP Performance Metrics (IPPM)", RFC 3393, 1247 DOI 10.17487/RFC3393, November 2002, 1248 . 1250 [RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. 1251 Zekauskas, "A One-way Active Measurement Protocol 1252 (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006, 1253 . 
1255 [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. 1256 Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", 1257 RFC 5357, DOI 10.17487/RFC5357, October 2008, 1258 . 1260 [RFC5481] Morton, A. and B. Claise, "Packet Delay Variation 1261 Applicability Statement", RFC 5481, DOI 10.17487/RFC5481, 1262 March 2009, . 1264 [RFC6374] Frost, D. and S. Bryant, "Packet Loss and Delay 1265 Measurement for MPLS Networks", RFC 6374, 1266 DOI 10.17487/RFC6374, September 2011, 1267 . 1269 [RFC6390] Clark, A. and B. Claise, "Guidelines for Considering New 1270 Performance Metric Development", BCP 170, RFC 6390, 1271 DOI 10.17487/RFC6390, October 2011, 1272 . 1274 [RFC6703] Morton, A., Ramachandran, G., and G. Maguluri, "Reporting 1275 IP Network Performance Metrics: Different Points of View", 1276 RFC 6703, DOI 10.17487/RFC6703, August 2012, 1277 . 1279 [RFC7276] Mizrahi, T., Sprecher, N., Bellagamba, E., and Y. 1280 Weingarten, "An Overview of Operations, Administration, 1281 and Maintenance (OAM) Tools", RFC 7276, 1282 DOI 10.17487/RFC7276, June 2014, 1283 . 1285 [RFC7384] Mizrahi, T., "Security Requirements of Time Protocols in 1286 Packet Switched Networks", RFC 7384, DOI 10.17487/RFC7384, 1287 October 2014, . 1289 [RFC7679] Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton, 1290 Ed., "A One-Way Delay Metric for IP Performance Metrics 1291 (IPPM)", STD 81, RFC 7679, DOI 10.17487/RFC7679, January 1292 2016, . 1294 [RFC7680] Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton, 1295 Ed., "A One-Way Loss Metric for IP Performance Metrics 1296 (IPPM)", STD 82, RFC 7680, DOI 10.17487/RFC7680, January 1297 2016, . 1299 [RFC7799] Morton, A., "Active and Passive Metrics and Methods (with 1300 Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799, 1301 May 2016, . 1303 [RFC8321] Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli, 1304 L., Chen, M., Zheng, L., Mirsky, G., and T. 
Mizrahi, 1305 "Alternate-Marking Method for Passive and Hybrid 1306 Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321, 1307 January 2018, . 1309 [RFC8799] Carpenter, B. and B. Liu, "Limited Domains and Internet 1310 Protocols", RFC 8799, DOI 10.17487/RFC8799, July 2020, 1311 . 1313 Appendix A. Changes Log 1315 Changes from RFC 8321 include: 1317 o Minor editorial changes 1319 o Replacement of the section on "Applications, Implementation, and 1320 Deployment" with "Finding of the Alternate Marking Implementations 1321 and Deployments" 1323 o Moved advantages and benefits of the method from "Introduction" to 1324 the new section on "Finding of the Alternate Marking 1325 Implementations and Deployments" 1327 o Removed section on "Hybrid Measurement" 1329 Changes in v-(01) include: 1331 o Considerations on the reference: [IEEE-Network-PNPM] 1333 o Clarified that the method based on a fixed timer is specified in 1334 this document while the method based on a fixed number of packets 1335 is only mentioned but not detailed. 
1337 o Explanation of the intrinsic error in section 3.3.1 on 1338 "Single-Marking Methodology" 1340 o Deleted some parts in section 4 "Considerations" that no longer 1341 apply 1343 o New section on "Packet Fragmentation" 1345 Changes in v-(02) include: 1347 o Considerations on how to handle unmarked traffic in section 5 on 1348 "Results of the Alternate Marking Experiment" 1350 o Minor rewording in section 4.4 on "Packet Fragmentation" 1352 Authors' Addresses 1354 Giuseppe Fioccola (editor) 1355 Huawei Technologies 1356 Riesstrasse, 25 1357 Munich 80992 1358 Germany 1360 Email: giuseppe.fioccola@huawei.com 1361 Mauro Cociglio 1362 Telecom Italia 1363 Via Reiss Romoli, 274 1364 Torino 10148 1365 Italy 1367 Email: mauro.cociglio@telecomitalia.it 1369 Greg Mirsky 1370 Ericsson 1372 Email: gregimirsky@gmail.com 1374 Tal Mizrahi 1375 Huawei Technologies 1377 Email: tal.mizrahi.phd@gmail.com 1379 Tianran Zhou 1380 Huawei Technologies 1381 156 Beiqing Rd. 1382 Beijing 100095 1383 China 1385 Email: zhoutianran@huawei.com 1387 Xiao Min 1388 ZTE Corp. 1390 Email: xiao.min2@zte.com.cn