idnits 2.17.00 (12 Aug 2021)

/tmp/idnits58970/draft-he-netconf-adaptive-collection-usecases-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (20 March 2022) is 55 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC3688' is defined on line 371, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC6020' is defined on line 375, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC6242' is defined on line 380, but no explicit
     reference was found in the text

     Summary: 0 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    NETCONF Working Group                                              X. He
3    Internet-Draft                                                    D. Mao
4    Intended status: Standards Track                           China Telecom
5    Expires: 21 September 2022                                         Q. Ma
6                                                                     T. Zhou
7                                                                      Huawei
8                                                               20 March 2022

10     Problem Statement and Use Cases of Adaptive Traffic Data Collection
11               draft-he-netconf-adaptive-collection-usecases-00

13   Abstract

15      IP carrier networks need to provide real-time traffic visibility to
16      help network operators quickly and accurately locate network
17      congestion and packet loss, and make timely path adjustments for
18      deterministic services in order to avoid congestion.  It is essential
19      to explore an adaptive traffic data collection mechanism so as to
20      capture real-time network state at minimum resource consumption.

22      This document summarizes the problems currently faced by network
23      operators when attempting to provide timely traffic data collection
24      to satisfy the various scenarios that require real-time network state
25      and traffic visibility, and aggregates the requirements for an adaptive
26      traffic data collection mechanism from a variety of deployment scenarios.

28   Status of This Memo

30      This Internet-Draft is submitted in full conformance with the
31      provisions of BCP 78 and BCP 79.

33      Internet-Drafts are working documents of the Internet Engineering
34      Task Force (IETF).  Note that other groups may also distribute
35      working documents as Internet-Drafts.  The list of current Internet-
36      Drafts is at https://datatracker.ietf.org/drafts/current/.

38      Internet-Drafts are draft documents valid for a maximum of six months
39      and may be updated, replaced, or obsoleted by other documents at any
40      time.  It is inappropriate to use Internet-Drafts as reference
41      material or to cite them other than as "work in progress."

43      This Internet-Draft will expire on 21 September 2022.

45   Copyright Notice

47      Copyright (c) 2022 IETF Trust and the persons identified as the
48      document authors.  All rights reserved.

50      This document is subject to BCP 78 and the IETF Trust's Legal
51      Provisions Relating to IETF Documents (https://trustee.ietf.org/
52      license-info) in effect on the date of publication of this document.
53      Please review these documents carefully, as they describe your rights
54      and restrictions with respect to this document.  Code Components
55      extracted from this document must include Revised BSD License text as
56      described in Section 4.e of the Trust Legal Provisions and are
57      provided without warranty as described in the Revised BSD License.

59   Table of Contents

61      1.  Introduction
62        1.1.  Abbreviations
63      2.  Terminology
64      3.  Problem Statement
65      4.  Scenarios of Adaptive Traffic data collection
66        4.1.  Multi-dimensional real-time portrait of interface traffic
67              characteristic
68        4.2.  Microburst traffic detecting
69        4.3.  Congestion avoidance for deterministic services
70        4.4.  On-path telemetry based on adaptive traffic sampling
71      5.  IANA Considerations
72      6.  Security Considerations
73      7.  References
74        7.1.  Normative References
75        7.2.  Informative References
76      Authors' Addresses

78   1.  Introduction

80      With the advent of cloud computing, big data and AI, as well as the
81      large-scale deployment of 5G mobile communication technology, a large
82      number of uRLLC services such as AR/VR, industrial Internet and
83      computing power networks have emerged, which place higher
84      requirements on the service quality of IP carrier networks.  IP
85      carrier networks need to provide real-time traffic visibility to help
86      network operators quickly and accurately locate network congestion
87      and packet loss, and make timely path adjustments for
88      deterministic-delay services in order to avoid congested nodes and links.
89      For such business scenarios, the network needs to provide traffic
90      sampling at sub-second or even millisecond intervals so as to gain
91      real-time network state.

93      For decades, SNMP and MIBs have been widely deployed and have been the
94      de facto choice for many monitoring solutions, especially for collecting
95      interface traffic.  Arguably the biggest shortcoming of SNMP for
96      those applications is its reliance on periodic polling,
97      because polling introduces additional load on the network and devices,
98      and it is brittle if polling cycles are missed.  Therefore, SNMP
99      cannot realize real-time traffic sampling at the sub-second or
100     even millisecond level.  Telemetry, as a revolutionary data
101     acquisition technique based on a push mechanism that is able to
102     deliver object changes as they happen, overcomes the limitations of
103     SNMP such as "slow speed, low efficiency and more demands for
104     processing capacity".  Nevertheless, for the sake of capturing real-
105     time network state, persistent sampling of interface traffic at
106     millisecond intervals will generate a considerable amount of data,
107     which may claim too much transport bandwidth and overload the servers
108     used for data collection, storage, and analysis.  Increasing the data
109     handling capacity is technically feasible but expensive, and
110     difficult to deploy at large scale in operators' networks.
111     It is essential to explore an adaptive traffic data collection
112     mechanism so as to capture real-time network state at minimum
113     resource consumption.

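As a rough illustration of the data volume involved, the following Python
sketch counts the samples a single interface produces per day under the
5-minute SNMP cycle and under the 10-millisecond cycle discussed in Section 3.
The function name and the choice of one day as the observation window are
illustrative assumptions and are not defined by this document.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def samples_per_day(interval_ms: int) -> int:
    """Number of samples one interface yields per day at a fixed interval."""
    return MS_PER_DAY // interval_ms


snmp_interval_ms = 5 * 60 * 1000   # 5-minute SNMP polling cycle
telemetry_interval_ms = 10         # 10-millisecond telemetry sampling cycle

snmp_samples = samples_per_day(snmp_interval_ms)            # 288
telemetry_samples = samples_per_day(telemetry_interval_ms)  # 8640000
ratio = telemetry_samples // snmp_samples                   # 30000
```

The ratio depends only on the two intervals, so it is the same for any
observation window; the per-day totals simply make the absolute volume per
interface visible.
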
115     This document summarizes the problems currently faced by network
116     operators when attempting to provide timely traffic data collection
117     to satisfy the various requirements of the aforementioned new
118     scenarios that require real-time network state and traffic
119     visibility.  Also, this document aggregates the requirements for
120     an adaptive traffic data collection mechanism from a variety of
121     deployment scenarios.

123  1.1.  Abbreviations

125     AI: Artificial Intelligence

127     AR: Augmented Reality

129     VR: Virtual Reality

131     IP RAN: IP Radio Access Network

133     DetNet: Deterministic Networking

135     QoE: Quality of Experience

137     SLA: Service Level Agreement

138     uRLLC: ultra Reliable & Low Latency Communication

140     NMS: Network Management System

142     IDC: Internet Data Center

144     SNMP: Simple Network Management Protocol

146     MIB: Management Information Base

148  2.  Terminology

150     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
151     "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
152     "OPTIONAL" in this document are to be interpreted as described in BCP
153     14 [RFC2119] [RFC8174] when, and only when, they appear in all
154     capitals, as shown here.

156     The following term is defined in this document:

158     adaptive traffic data collection:  Allows servers to automatically
159        switch to a different telemetry sampling period to collect traffic
160        data according to threshold changes.

162  3.  Problem Statement

164     As is well known, IP networks, based on a statistical multiplexing
165     model, exhibit bursty traffic.  For a long time,
166     operators have obtained traffic visibility from the Network
167     Management System (NMS), and have been satisfied with 30~40% bandwidth
168     utilization.  In spite of such low link usage, many complaints have
169     still been received about poor QoE in delivering applications that
170     are sensitive to delay and packet loss.  The fundamental cause is
171     that the observed average network traffic masks the
172     traffic bursts, given that SNMP is widely employed in operators'
173     networks to collect network traffic at 5-minute intervals.

175     A large quantity of laboratory data and operational data indicates
176     that microbursts occur frequently in operators' carrier
177     networks, such as IP RAN, IP metropolitan networks, IP backbone
178     networks and IDCs.  The typical duration of such a microburst is tens
179     to hundreds of milliseconds, which easily causes instantaneous
180     congestion of the output queue.  Network congestion amplifies queuing
181     delay and jitter, and may even give rise to packet loss.  Solving the
182     problem of network congestion has always been a major challenge for IP
183     networks.  Microbursts are therefore harmful to deterministic-delay
184     applications.  It is difficult to eliminate microbursts, so
185     operators must attempt to avoid them.

187     Although the mechanism behind microbursts is not well understood,
188     this does not prevent us from detecting them.  Fortunately, telemetry
189     (e.g., YANG PUSH [RFC8639] [RFC8641], gNMI [gNMI]) has the capability
190     to collect interface traffic at a higher frequency, i.e., at
191     millisecond intervals.  So, by means of telemetry techniques, we can
192     capture the complete profile of a microburst.  However, it is
193     impractical to gain real-time traffic visibility at the cost of
194     persistent sampling at millisecond intervals.  For example, in order to
195     capture a microburst on an interface, a sampling cycle of 10
196     milliseconds or shorter is necessary.  Compared with today's widely
197     employed 5-minute sampling cycle based on SNMP, the number of samples,
198     and roughly the required resources, will increase by a factor of
        30,000 (300 seconds divided by 10 milliseconds).

200     It is essential to investigate an adaptive traffic data collection
201     mechanism so as to capture real-time network state at minimum
202     resource consumption.  That is, in normal non-congested network
203     conditions, which hold more than 95% of the time, a minutes-level
204     sampling cycle is sufficient.  But when a congestion
205     state or congestion trend is detected, the sampling period must be
206     promptly tuned to milliseconds to capture microburst traffic on the
207     interface.  A congestion state or congestion trend of an interface is
208     manifested by packet loss due to queue overflow, queue depth beyond a
209     threshold, or excessive link utilization, which can be defined as
210     event-triggered data.  Such data can be actively pushed through
211     subscription or passively polled through query.  Although microbursts
212     occur frequently, they are transient, and an on-line detection tool is
213     preferable to find them in time.  The traditional method of querying
214     the CPU on the main control board consumes processing resources, so
215     the network device must possess built-in hardware designed especially
216     to monitor microbursts.

218     In order to reduce the excessive consumption of resources caused by
219     millisecond-level collection of single data items, batch data such as
220     hundreds of sampled traffic data points from an interface can be
221     packaged into one telemetry packet and sent to the collector.  A
222     timestamp is required for every sampled data point so that the
223     collector can visualize the interface traffic trend, and the collector
224     must realize traffic visualization in real time so that
225     operators can observe it immediately.

227  4.  Scenarios of Adaptive Traffic data collection

229     This section presents several typical scenarios which require
230     adaptive traffic data collection to gain real-time network state and
231     traffic visibility at minimum resource consumption.

233  4.1.  Multi-dimensional real-time portrait of interface traffic
234        characteristic

236     Interface traffic data collection is one of the most important
237     functions of the NMS.  Today, more and more applications are latency-
238     sensitive and loss-sensitive, and real-time
239     traffic visibility can help operators better understand network
240     performance so as to achieve SLA guarantees.  On the other hand,
241     obtaining the holistic and genuine characteristics of interface
242     traffic is also a basic requirement for the statistical multiplexing
243     model of IP networks, which is of great significance for traffic
244     prediction, network planning, network capacity expansion, network
245     optimization, etc.  However, a traditional NMS based on SNMP cannot
246     depict the genuine characteristics of interface traffic, and
247     interface traffic data collection based on telemetry techniques is
248     preferable.

250     It is essential to exploit adaptive traffic data collection
251     techniques to depict a multi-dimensional real-time portrait of
252     interface traffic characteristics at minimum resource consumption.
253     That is to say, in normal non-congested network conditions, which
254     hold more than 95% of the time, a minutes-level sampling cycle is
255     sufficient.  But when a congestion state or congestion trend is
256     detected, the sampling cycle must be promptly tuned to milliseconds
257     to capture microburst traffic on the interface.  Such an adaptive
258     traffic data collection technique can not only reflect the coarse-
259     grained interface traffic characteristics, but also capture the
260     congestion state of the interface with finer time granularity.  It will
261     be an important tool for DetNet to obtain real-time network
262     performance.  Because of its lower cost, it can be deployed at large
263     scale in operators' networks.

265  4.2.  Microburst traffic detecting

267     Microburst traffic, as an instantaneous congestion phenomenon
268     occurring frequently in IP carrier networks, will cause critical delay
269     jitter and even packet loss, which will seriously affect the QoE of
270     latency-sensitive and loss-sensitive applications.  The ability to
271     detect microburst traffic on an interface will help network operators
272     quickly and accurately locate network congestion and packet loss, and
273     make timely path adjustments for deterministic-delay services in order
274     to avoid the congested nodes and links.

276     Because the typical duration of such a microburst is generally tens
277     to hundreds of milliseconds, a sampling cycle of 10 milliseconds or
278     shorter is necessary.  Although microbursts occur frequently,
279     they account for very little of the 24 hours in a day.  It is not a
280     good approach to observe them through a persistent millisecond sampling
281     period.  Preferably, we can capture a microburst as soon as it occurs
282     to ensure important diagnostic data will not be missed.  Because it is
283     transient, an on-line detection tool is required to find it
284     in time.  Triggered by promptly detected events such as packet loss or
285     queue depth beyond the threshold, the sampling period must
286     be promptly tuned to milliseconds to capture microburst traffic on the
287     interface.  In a word, it is of practical significance to explore
288     microburst detection technology that aims to minimize resource
289     consumption.

291  4.3.  Congestion avoidance for deterministic services

293     Network congestion will rapidly increase queuing delay and jitter, and
294     may even give rise to packet loss, which will seriously affect the
295     QoE of delay-sensitive and packet loss-sensitive applications.  The
296     goal of network optimization is to reduce the occurrence of network
297     congestion as much as possible.

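The event-triggered period switching that these scenarios rely on might be
sketched as follows.  This is a minimal illustration under stated assumptions,
not a mechanism defined by this document: the class name, the function names,
and the threshold values are hypothetical, and the two periods simply reuse
the 5-minute and 10-millisecond figures quoted in Section 3.

```python
from dataclasses import dataclass

NORMAL_PERIOD_MS = 5 * 60 * 1000   # minutes-level cycle for the non-congested state
BURST_PERIOD_MS = 10               # millisecond-level cycle for microburst capture


@dataclass
class InterfaceState:
    dropped_packets: int      # packets lost to queue overflow since the last sample
    queue_depth: int          # current output queue depth, in packets
    link_utilization: float   # fraction of link capacity in use, 0.0 to 1.0


def is_congestion_event(state: InterfaceState,
                        queue_threshold: int = 1000,
                        utilization_threshold: float = 0.9) -> bool:
    """True if any congestion indicator is present (thresholds are assumed values)."""
    return (
        state.dropped_packets > 0
        or state.queue_depth > queue_threshold
        or state.link_utilization > utilization_threshold
    )


def next_sampling_period_ms(state: InterfaceState) -> int:
    """Pick the short period while a congestion event is present, else the long one."""
    return BURST_PERIOD_MS if is_congestion_event(state) else NORMAL_PERIOD_MS


# Same average utilization in both cases; only the queue depth differs.
quiet = InterfaceState(dropped_packets=0, queue_depth=50, link_utilization=0.35)
bursty = InterfaceState(dropped_packets=0, queue_depth=4000, link_utilization=0.35)

quiet_period = next_sampling_period_ms(quiet)    # 300000
bursty_period = next_sampling_period_ms(bursty)  # 10
```

The two example states share the same 35% utilization, which mirrors the
observation in Section 3 that average utilization alone can mask a burst; in
this sketch it is the queue-depth indicator that triggers the switch.  A real
implementation would also need hysteresis or a hold time before returning to
the long period, which is left out here.
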
299     It is a complicated problem for network operators to accurately
300     predict the trend of network congestion and make network adjustments
301     in advance.  Real-time traffic visibility based on adaptive
302     traffic data collection techniques can accurately predict long-
303     term congestion, and quickly capture the instantaneous congestion
304     (i.e., microbursts) of an interface.  By means of real-time traffic
305     visibility, an automatic optimization tool (e.g., AI) can make
306     timely path adjustments for key traffic flows.  For example, based on
307     the real-time traffic visibility and microburst events (e.g., packet
308     loss, queue depth) collected, the controller can accurately predict
309     the congestion trend of an interface and promptly redirect traffic
310     to a non-congested interface for deterministic-delay applications.

312  4.4.  On-path telemetry based on adaptive traffic sampling

314     On-path telemetry is useful for application-aware networking
315     operations.  For example, it is critical for operators who offer
316     high-bandwidth, latency- and loss-sensitive services such as video
317     streaming and online gaming to closely monitor the relevant flows in
318     real time as the basis for any further optimizations.  Applying on-
319     path telemetry to all packets of selected flows can still be out of
320     reach.  A sampling rate should be set for these flows, with telemetry
321     enabled only on the sampled packets.  However, too high a rate would
322     exhaust network resources and even cause packet drops; an overly
323     low rate, on the contrary, would result in loss of information
324     and inaccurate measurements.

326     An adaptive approach can be used based on the network conditions to
327     dynamically adjust the sampling rate.  In the normal network state, a
328     low sampling rate is enough to reflect network performance.  But, in
329     case of network congestion, the controller becomes aware of it from the
330     real-time traffic visibility and event data collected (e.g., packet
331     loss, queue depth), and promptly raises the packet sampling rate to a
332     very high level.  Even all packets of selected flows may be sampled so
333     as to acquire real-time measurement data such as latency, jitter and
334     packet loss.

336  5.  IANA Considerations

338     This document does not include an IANA request.

340  6.  Security Considerations

342     This document discusses an adaptive telemetry mechanism to minimize
343     resource consumption.  The increased complexity of network
344     telemetry may give rise to some security concerns.  For example,
345     persistent traffic collection at a very high rate (e.g., at millisecond
346     intervals) induced by wrong configuration or spurious triggering might
347     exhaust resources of the forwarding plane and the control plane of
348     the network device as well as the collector; inappropriate threshold
349     settings should be avoided.  The telemetry data is highly sensitive,
350     as it exposes a lot of information about the network and its
351     configuration.  Some of that information can make designing attacks
352     against the network much easier (e.g., exact details of what software
353     and patches have been installed), and allows an attacker to determine
354     whether a device may be subject to unprotected security
355     vulnerabilities.

357     On the other hand, regarding the security of telemetry interfaces,
358     NETCONF or gNMI must provide authentication, data
359     integrity, confidentiality, and replay protection.  Further study
360     of the security issues will be required.

362  7.  References

364  7.1.  Normative References

366     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
367                Requirement Levels", BCP 14, RFC 2119,
368                DOI 10.17487/RFC2119, March 1997.

371     [RFC3688]  Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
372                DOI 10.17487/RFC3688, January 2004.

375     [RFC6020]  Bjorklund, M., Ed., "YANG - A Data Modeling Language for
376                the Network Configuration Protocol (NETCONF)", RFC 6020,
377                DOI 10.17487/RFC6020, October 2010.

380     [RFC6242]  Wasserman, M., "Using the NETCONF Protocol over Secure
381                Shell (SSH)", RFC 6242, DOI 10.17487/RFC6242, June 2011.

384     [RFC8639]  Voit, E., Clemm, A., Gonzalez Prieto, A., Nilsen-Nygaard,
385                E., and A. Tripathy, "Subscription to YANG Notifications",
386                RFC 8639, DOI 10.17487/RFC8639, September 2019.

389     [RFC8641]  Clemm, A. and E. Voit, "Subscription to YANG Notifications
390                for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641,
391                September 2019.

393  7.2.  Informative References

395     [gNMI]     "https://github.com/openconfig/gnmi".

397     [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
398                2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
399                May 2017.

401  Authors' Addresses

403     Xiaoming He
404     China Telecom
405     Email: hexm4@chinatelecom.cn

407     Dongfeng Mao
408     China Telecom
409     Email: maodf@chinatelecom.cn

411     Qiufang Ma
412     Huawei
413     101 Software Avenue, Yuhua District
414     Nanjing
415     Jiangsu, 210012
416     China
417     Email: maqiufang1@huawei.com

419     Tianran Zhou
420     Huawei
421     Email: zhoutianran@huawei.com