idnits 2.17.00 (12 Aug 2021)

/tmp/idnits48196/draft-mcbride-edge-data-discovery-overview-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (January 29, 2020) is 842 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-06) exists of
     draft-bernardos-intarea-vim-discovery-02

  ** Downref: Normative reference to an Experimental draft:
     draft-bernardos-intarea-vim-discovery (ref.
     'I-D.bernardos-intarea-vim-discovery')

  == Outdated reference: A later version (-07) exists of
     draft-bernardos-sfc-discovery-03

  ** Downref: Normative reference to an Experimental draft:
     draft-bernardos-sfc-discovery (ref. 'I-D.bernardos-sfc-discovery')

     Summary: 2 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

COINRG                                                        M. McBride
Internet-Draft                                                 Futurewei
Intended status: Standards Track                             D. Kutscher
Expires: August 1, 2020                                 Emden University
                                                             E. Schooler
                                                                   Intel
                                                           CJ. Bernardos
                                                                    UC3M
                                                        January 29, 2020


                      Edge Data Discovery for COIN
             draft-mcbride-edge-data-discovery-overview-03

Abstract

   This document describes the problem of distributed data discovery in
   edge computing, and in particular for computing-in-the-network
   (COIN), both the marshalling of data at the outset of a computation
   and the persistence of the resultant data after the computation.
   Although the data might originate at the network edge, as more and
   more distributed data is created, processed, and stored, it becomes
   increasingly dispersed throughout the network.  There needs to be a
   standard way to find it.  New and existing protocols will need to be
   developed to support distributed data discovery at the network edge
   and beyond.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 1, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Edge Data
     1.2.  Background
     1.3.  Requirements Language
     1.4.  Terminology
   2.  Edge Data Discovery Problem Scope
     2.1.  A Cloud-Edge Continuum
     2.2.  Types of Edge Data
       2.2.1.  Example Meta Data
   3.  Scenarios for Discovering Edge Data Resources
   4.  Edge Data Discovery
     4.1.  Types of Discovery
     4.2.  Naming the Data
   5.  Use Cases of edge data discovery
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgement
   9.  Normative References
   Authors' Addresses

1.  Introduction

   Edge computing is an architectural shift that migrates Cloud
   functionality (compute, storage, networking, control, data
   management, etc.) out of the back-end data center to be more
   proximate to the IoT data being generated and analyzed at the edges
   of the network.
   Edge computing provides local compute, storage and connectivity
   services, often required for latency- and bandwidth-sensitive
   applications.  Thus, Edge Computing plays a key role in verticals
   such as Energy, Manufacturing, Automotive, Video Surveillance,
   Retail, Gaming, Healthcare, Mining, Buildings and Smart Cities.

1.1.  Edge Data

   Edge computing is motivated at least in part by the sheer volume of
   data that is being created by endpoint devices (sensors, cameras,
   lights, vehicles, drones, wearables, etc.) at the very network edge
   and that flows upstream, in a direction for which the network was not
   originally designed.  In fact, in dense IoT deployments (e.g., many
   video cameras streaming high-definition video), where multiple data
   flows collect or converge at edge nodes, data is likely to need
   transformation (transcoding, subsampling, compression, analysis,
   annotation, combination, aggregation, etc.) to fit over the next-hop
   link, or even to fit in memory or storage.  Note also that the act of
   performing compute on the data creates yet another new data stream.
   Preservation of the original data streams is sometimes, but not
   always, needed.

   In addition, data may be cached, copied and/or stored at multiple
   locations in the network en route to its final destination.  With an
   increasing percentage of devices connecting to the Internet being
   mobile, support for in-the-network caching and replication is
   critical for continuous data availability, not to mention efficient
   network and battery usage for endpoint devices.

   Additionally, as mobile devices' memory/storage fills up, in an edge
   context they may have the ability to offload their data to other
   proximate devices or resources, leaving a bread crumb trail of data
   in their wake.
   Therefore, although data might originate at edge devices, as more and
   more data is continuously created, processed and stored, it becomes
   increasingly dispersed throughout the physical world (outside of or
   scattered across managed local data centers) and increasingly
   isolated in separate local edge clouds or data silos.  Thus there
   needs to be a standard way to find it.  New and existing protocols
   will need to be identified, developed, or enhanced for these
   purposes.  Being able to discover distributed data, at the edge or in
   the middle of the network, will be an important component of Edge
   computing.

1.2.  Background

   Several IETF T2T RG Edge Computing discussions have been held over
   the last couple of years: a comparative study on the definition of
   Edge computing was presented in multiple T2T RG sessions in 2018, and
   an Edge Computing I-D was submitted in early 2019.  An IETF BEC
   (beyond edge computing) effort has been evaluating potential gaps in
   existing edge computing architectures.  Edge Data Discovery is one
   potential gap that was identified and that needs evaluation and a
   solution.  The newly proposed COIN RG highlights the need for
   computations in the network to be able to marshal potentially
   distributed input data and to handle resultant output data, i.e., its
   placement, storage and/or possible migration strategy.

1.3.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.4.  Terminology

   o  Edge: The edge encompasses all entities not in the back-end cloud.
      The device edge is the boundary between digital and physical
      entities in the last mile network.  Sensors, gateways, and compute
      nodes are included.
      The infrastructure edge includes equipment on the network operator
      side of the last mile network, including cell towers, edge data
      centers, cable headends, POPs, etc.  See Figure 1 for other
      possible tiers of edge clouds between the device edge and the
      back-end cloud data center.

   o  Edge Computing: Distributed computation that is performed near the
      network edge, where nearness is determined by the system
      requirements.  This includes high performance compute, storage and
      network equipment on either the device or infrastructure edge.

   o  Edge Data Discovery: The process of finding required data from
      edge entities, i.e., from databases, file systems, and device
      memory that might be physically distributed in the network, and
      providing access to it logically as if it were a single unified
      source, perhaps through its namespace, that can be evaluated or
      searched.

   o  ICN: Information Centric Networking.  An ICN-enabled network
      routes data by name (vs address), caches content natively in the
      network, and employs data-centric security.  Data discovery may
      require that data be associated with a name or names, a series of
      descriptive attributes, and/or a unique identifier.

2.  Edge Data Discovery Problem Scope

   Our focus is on how to define and scope the edge data discovery
   problem.  This requires some discussion of the evolving definition of
   the edge as part of a cloud-to-edge continuum and in turn what is
   meant by edge data as well as the meta-data about edge data.

2.1.  A Cloud-Edge Continuum

   Although Edge Computing data typically originates at edge devices,
   there is nothing that precludes edge data from being created anywhere
   in the cloud-to-edge computing continuum (Figure 1).  New edge data
   may result as a byproduct of computation being performed on the data
   stream anywhere along its path in the network.
   For example, infrastructure edges may create new edge data when
   multiple data streams converge upon this aggregation point and
   require transformation (e.g., to fit within the available resources,
   to smooth raw measurements to eliminate high-frequency noise, to
   obfuscate data for privacy).

   An assumption is that all data will have associated policies
   (default, inherited or configured) that describe access control
   permissions.  Consequently, the discoverability of data will be a
   function of who or what has requested access.  In other words, the
   discoverable view into the available data will be limited to those
   who are authorized.  Discovering edge data that is exclusively
   private is out of scope of this document, the assumption being that
   there will be some edge clouds that do not expose or publish the
   availability of their data.  Although edge data may be sent to the
   back-end cloud as needed, there is nothing that precludes it from
   being discoverable if the cloud offers it as public.

   Initially our focus is on discovery of edge data that resides at the
   Device Edge and the Infrastructure Edge.

            +-------------------------------+
            |   Back-end Cloud Data Center  |
            +-------------------------------+
                           ***          Cloud
                          *   *         Interconnect
                           ***
            +-------------------------------+
            |        Core Data Center       |
            +-------------------------------+
                           ***          Backbone
                          *   *         Network
                           ***
            +-------------------------------+
            |      Regional Data Center     |
            +-------------------------------+
                           ***          Metropolitan
                          *   *         Network
                           ***
            +-------------------------------+
            |      Infrastructure Edge      |
            +-------------------------------+
                           ***          Access
                          *   *         Network
                           ***
            +-------------------------------+
            |          Device Edge          |
            +-------------------------------+

               Figure 1: Cloud-to-edge computing continuum

2.2.  Types of Edge Data

   Besides classically constrained IoT device sensor and measurement
   data accumulating throughout the edge computing infrastructure, edge
   data may also take the form of higher frequency and higher volume
   streaming data (from a continuous sensor or from a camera), meta data
   (about the data), control data (regarding an event that was
   triggered), and/or an executable that embodies a function, service,
   or any other piece of code or algorithm.  Edge data also could be
   created after multiple streams converge at an edge node and are
   processed, transformed, or aggregated together in some manner.

   Regardless of edge data type, a key problem in the Cloud-Edge
   continuum is that data is often kept in silos.  That is, data is
   often sequestered within the edge where it was created.  A goal of
   this discussion is to consider the prospect that different types of
   edge data will be made accessible across disparate edges, for example
   to enable richer multi-modal analytics.  But this will happen only if
   data can be described, searched and discovered across heterogeneous
   edges in a standard way.  Having a mechanism to enable granular edge
   data discovery is the problem that needs solving, either with
   existing or new protocols.  The mechanisms should not depend on which
   flavor of cloud or edge receives the request for data discovery.

2.2.1.  Example Meta Data

   SFC Data and meta-data discovery

   Service function chaining (SFC) allows the instantiation of an
   ordered set of service functions and subsequent "steering" of traffic
   through them.  Service functions (SFs) provide a specific treatment
   of received packets; therefore they need to be known so they can be
   used in a given service composition via SFC.  So far, how the SFs are
   discovered and composed has been out of the scope of discussions in
   the IETF.
   While there are some mechanisms that can be used and/or extended to
   provide this functionality, work needs to be done.  An example of
   this can be found in [I-D.bernardos-sfc-discovery].

   In an SFC environment deployed at the edge, the discovery protocol
   may also need to make available the following meta-data information
   per SF:

   o  Service Function Type, identifying the category of SF provided.

   o  SFC-aware: Yes/No.  Indicates if the SF is SFC-aware.

   o  Route Distinguisher (RD): IP address indicating the location of
      the SF(I).

   o  Pricing/cost details.

   o  Migration capabilities of the SF: whether a given function can be
      moved to another provider (potentially including information about
      compatible providers topologically close).

   o  Mobility of the device hosting the SF, with e.g. the following
      sub-options:

         Level: no, low, high; or a corresponding scale (e.g., 1 to 10).

         Current geographical area (e.g., GPS coordinates, post code).

         Target moving area (e.g., GPS coordinates, post code).

   o  Power source of the device hosting the SF, with e.g. the following
      sub-options:

         Battery: Yes/No.  If Yes, the following sub-options could be
         defined:

            Capacity of the battery (e.g., mWh).

            Charge status (e.g., %).

            Lifetime (e.g., minutes).

   Discovery of resources in an NFV environment: virtualized resources
   do not need to be limited to those available in traditional data
   centers, where the infrastructure is stable, static, typically
   homogeneous and managed by a single admin entity.  Computational
   capabilities are becoming more and more ubiquitous, with terminal
   devices getting extremely powerful, as well as other types of devices
   that are close to the end users at the edge (e.g., vehicular onboard
   devices for infotainment, micro data centers deployed at the edge,
   etc.).
   It is envisioned that these devices would be able to offer storage,
   computing and networking resources to nearby network infrastructure,
   devices and things (the fog paradigm).  These resources can be used
   to host functions, for example to offload/complement other resources
   available at traditional data centers, but also to reduce the
   end-to-end latency or to provide access to specialized information
   (e.g., context available at the edge) or hardware.  Similar to the
   discovery of functions, while there are mechanisms that can be
   reused/extended, there is no complete solution yet defined.  An
   example of work in this area is
   [I-D.bernardos-intarea-vim-discovery].

3.  Scenarios for Discovering Edge Data Resources

   1.  A set of data resources appears (e.g., a mobile node hosting data
       joins a network) and wants to be discovered by an existing but
       possibly virtualized and/or ephemeral data directory
       infrastructure.

   2.  A device wants to discover data resources available at or near
       its current location.  As some of these resources may be mobile,
       the available set of edge data may vary over time.

   3.  A device wants to discover where in the edge infrastructure it
       can best upload its data opportunistically, for example when a
       mobile device wants to offload its data to the infrastructure
       (for greater data availability, battery savings, etc.).

4.  Edge Data Discovery

   How can we discover data on the edge and make use of it?  There are
   proprietary implementations that collect data from various databases
   and consolidate it for evaluation.  We need a standard protocol set
   for doing this data discovery, on the device or infrastructure edge,
   in order to meet the requirements of many use cases.  We will have
   terabytes of data on the edge and need a way to identify its
   existence and find the desired data.
   A user needs to search for specific data in a data set and evaluate
   it using their own tools.  The tools are outside the scope of this
   document, but the discovery of that data is in scope.

4.1.  Types of Discovery

   There are many aspects of discovery and many different protocols that
   address each aspect.

   One aspect is the discovery of new devices added to an environment:
   discovering the devices automatically, discovering their
   capabilities/services in client/server environments, and then
   synchronizing the device inventory and configuration for edge
   services.  There are many existing protocols to help in this
   discovery: UPnP, mDNS, DNS-SD, SSDP, NFC, XMPP, W3C network service
   discovery, etc.

   Another aspect is edge devices discovering each other in a standard
   way.  DHCP, SNMP, SMS, CoAP, LLDP, and routing protocols such as OSPF
   can be used for devices to discover one another.

   A third aspect is the discovery of link state and traffic engineering
   data/services by external devices.  BGP-LS is one solution.

   The question is whether one or more of these protocols might be a
   suitable candidate to extend to support edge data discovery.

4.2.  Naming the Data

   Named Data Networking (NDN) is one of five research projects funded
   by the U.S. National Science Foundation under its Future Internet
   Architecture Program.  NDN has its roots in an earlier project,
   Content-Centric Networking (CCN), which Van Jacobson started at Xerox
   PARC around the time of his Google talk, to turn his architecture
   vision into a running prototype (see also his CoNEXT 2009 paper and
   especially Jacobson's ACM Queue interview).  The motivation is the
   mismatch between today's Internet architecture and its usage.  Today
   we build, support, and use Internet applications and services on top
   of an extremely capable architecture not designed to support them.
   What if we had an architecture designed to support them?
   Specifically, today's IP packets can name only endpoints of
   conversations (IP addresses) at the network layer.  What if we
   generalize this layer to name any information (or content), not just
   endpoints?  Doing so would make it easier to develop, manage, secure,
   and use our networks.  NDN can be applied to edge data discovery to
   make it much easier to extract data and meta-data by naming it.  If
   data were named, we would be able to discover the appropriate data
   simply by its name.

5.  Use Cases of edge data discovery

   1.  Autonomous Vehicles

   Autonomous vehicles rely on the processing of huge amounts of complex
   data in real-time for fast and accurate decisions.  These vehicles
   will rely on high performance compute, storage and network resources
   to process the volumes of data they produce in a low latency way.
   Various systems will need a standard way to discover the pertinent
   data for decision making.

   2.  Video Surveillance

   The majority of the video surveillance footage will remain at the
   edge infrastructure (not sent to the cloud data center).  This
   footage is coming from vehicles, factories, hotels, universities,
   farms, etc.  Much of the video footage will not be interesting to
   those evaluating the data.  A mechanism, perhaps a set of protocols,
   is needed to identify the interesting data at the edge.  What
   constitutes interesting will be context specific, e.g., video frames
   containing a car, a backyard nocturnal creature, a person, a
   bicyclist, etc.  Interesting video data may be stored longer in
   storage systems at the very edge of the network or in flight in
   networking equipment further away from the device edge.

   3.  Elevator Networks

   Elevators are one of many industrial applications of edge computing.
   Edge equipment receives data from hundreds of elevator sensors.  The
   data coming into the edge equipment includes vibration, temperature,
   speed, level, video, etc.
   We need the ability to identify where the data we need to evaluate is
   located.

6.  IANA Considerations

   N/A

7.  Security Considerations

   Security considerations will be a critical component of edge data
   discovery, particularly as intelligence is moved to the extreme edge
   where data is to be extracted.

8.  Acknowledgement

9.  Normative References

   [I-D.bernardos-intarea-vim-discovery]
              Bernardos, C. and A. Mourad, "IPv6-based discovery and
              association of Virtualization Infrastructure Manager (VIM)
              and Network Function Virtualization Orchestrator (NFVO)",
              draft-bernardos-intarea-vim-discovery-02 (work in
              progress), August 2019.

   [I-D.bernardos-sfc-discovery]
              Bernardos, C. and A. Mourad, "Service Function discovery
              in fog environments", draft-bernardos-sfc-discovery-03
              (work in progress), September 2019.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

Authors' Addresses

   Mike McBride
   Futurewei

   Email: michael.mcbride@futurewei.com


   Dirk Kutscher
   Emden University

   Email: ietf@dkutscher.net


   Eve Schooler
   Intel

   Email: eve.m.schooler@intel.com
   URI:   http://www.eveschooler.com


   Carlos J. Bernardos
   Universidad Carlos III de Madrid
   Av. Universidad, 30
   Leganes, Madrid  28911
   Spain

   Phone: +34 91624 6236
   Email: cjbc@it.uc3m.es
   URI:   http://www.it.uc3m.es/cjbc/
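
   Illustrative sketch (non-normative).  The per-SF meta-data listed in
   Section 2.2.1, the policy-limited discoverable view described in
   Section 2.1, and the idea of finding data by name from Section 4.2
   can be pictured together with a small in-memory model.  Everything
   below is an assumption made for illustration only: the class names
   ServiceFunctionMetadata and EdgeDirectory, the field names, the
   hierarchical name format, and the rule that an empty
   allowed_requesters set means "public" are hypothetical and are not
   defined by this document or by any referenced draft.  The addresses
   use the IPv6 documentation prefix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceFunctionMetadata:
    """Hypothetical record mirroring part of the per-SF meta-data list."""
    name: str                    # assumed hierarchical, NDN-like name
    sf_type: str                 # Service Function Type
    sfc_aware: bool              # SFC-aware: Yes/No
    location: str                # IP address indicating where the SF runs
    mobility_level: str = "no"   # "no", "low", or "high"
    battery_powered: bool = False
    battery_charge_pct: Optional[int] = None  # meaningful only on battery
    # Assumed policy model: an empty set means the record is public.
    allowed_requesters: frozenset = frozenset()


class EdgeDirectory:
    """Toy single-process directory; a real one would be distributed."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        """Scenario 1: a resource appears and announces itself."""
        self._records[record.name] = record

    def discover(self, requester, name_prefix="", sf_type=None):
        """Return the sorted names that `requester` is allowed to see."""
        matches = []
        for record in self._records.values():
            visible = (not record.allowed_requesters
                       or requester in record.allowed_requesters)
            if not visible:
                continue  # unauthorized requesters never see the record
            if not record.name.startswith(name_prefix):
                continue
            if sf_type is not None and record.sf_type != sf_type:
                continue
            matches.append(record.name)
        return sorted(matches)


directory = EdgeDirectory()
directory.register(ServiceFunctionMetadata(
    name="/edge/site-a/firewall", sf_type="firewall", sfc_aware=True,
    location="2001:db8::1"))
directory.register(ServiceFunctionMetadata(
    name="/edge/site-a/transcoder", sf_type="transcoder", sfc_aware=False,
    location="2001:db8::2", mobility_level="high", battery_powered=True,
    battery_charge_pct=80, allowed_requesters=frozenset({"operator"})))

# The same query yields a different view depending on who is asking.
print(directory.discover("guest", name_prefix="/edge/site-a/"))
print(directory.discover("operator", name_prefix="/edge/site-a/"))
```

   In this sketch the requester "guest" sees only the public firewall
   record, while "operator" also sees the restricted transcoder record.
   Filtering at discovery time is only one possible design; a deployment
   could equally enforce policy when the data itself is fetched.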