idnits 2.17.00 (12 Aug 2021) /tmp/idnits16311/draft-mcbride-edge-data-discovery-overview-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document doesn't use any RFC 2119 keywords, yet seems to have RFC 2119 boilerplate text. -- The document date (March 10, 2019) is 1167 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Outdated reference: A later version (-06) exists of draft-bernardos-intarea-vim-discovery-01 ** Downref: Normative reference to an Experimental draft: draft-bernardos-intarea-vim-discovery (ref. 'I-D.bernardos-intarea-vim-discovery') == Outdated reference: A later version (-07) exists of draft-bernardos-sfc-discovery-02 ** Downref: Normative reference to an Experimental draft: draft-bernardos-sfc-discovery (ref. 'I-D.bernardos-sfc-discovery') Summary: 2 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 T2TRG M. 
McBride 3 Internet-Draft Huawei 4 Intended status: Standards Track D. Kutscher 5 Expires: September 11, 2019 Emden University 6 E. Schooler 7 Intel 8 CJ. Bernardos 9 UC3M 10 March 10, 2019 12 Overview of Edge Data Discovery 13 draft-mcbride-edge-data-discovery-overview-01 15 Abstract 17 This document describes the problem of distributed data discovery in 18 edge computing. Increasing numbers of IoT devices and sensors are 19 generating a torrent of data that originates at the very edges of the 20 network and that flows upstream, if it flows at all. Sometimes that 21 data must be processed or transformed (transcoded, subsampled, 22 compressed, analyzed, annotated, combined, aggregated, etc.) on edge 23 equipment, particularly in places where multiple high bandwidth 24 streams converge and where resources are limited. Support for edge 25 data analysis is critical to make local, low-latency decisions (e.g., 26 regarding predictive maintenance, the dispatch of emergency services, 27 identity, authorization, etc.). In addition, (transformed) data may 28 be cached, copied and/or stored at multiple locations in the network 29 on route to its final destination. Although the data might originate 30 at the edge, for example in factories, automobiles, video cameras, 31 wind farms, etc., as more and more distributed data is created, 32 processed and stored, it becomes increasingly dispersed throughout 33 the network. There needs to be a standard way to find it. New and 34 existing protocols will need to be identified/developed/enhanced for 35 distributed data discovery at the network edge and beyond. 37 Status of This Memo 39 This Internet-Draft is submitted in full conformance with the 40 provisions of BCP 78 and BCP 79. 42 Internet-Drafts are working documents of the Internet Engineering 43 Task Force (IETF). Note that other groups may also distribute 44 working documents as Internet-Drafts. 
The list of current Internet- 45 Drafts is at https://datatracker.ietf.org/drafts/current/. 47 Internet-Drafts are draft documents valid for a maximum of six months 48 and may be updated, replaced, or obsoleted by other documents at any 49 time. It is inappropriate to use Internet-Drafts as reference 50 material or to cite them other than as "work in progress." 52 This Internet-Draft will expire on September 11, 2019. 54 Copyright Notice 56 Copyright (c) 2019 IETF Trust and the persons identified as the 57 document authors. All rights reserved. 59 This document is subject to BCP 78 and the IETF Trust's Legal 60 Provisions Relating to IETF Documents 61 (https://trustee.ietf.org/license-info) in effect on the date of 62 publication of this document. Please review these documents 63 carefully, as they describe your rights and restrictions with respect 64 to this document. Code Components extracted from this document must 65 include Simplified BSD License text as described in Section 4.e of 66 the Trust Legal Provisions and are provided without warranty as 67 described in the Simplified BSD License. 69 Table of Contents 71 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 72 1.1. Edge Data . . . . . . . . . . . . . . . . . . . . . . . . 3 73 1.2. Background . . . . . . . . . . . . . . . . . . . . . . . 3 74 1.3. Requirements Language . . . . . . . . . . . . . . . . . . 4 75 1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 76 2. The Edge Data Discovery Problem Scope . . . . . . . . . . . . 5 77 2.1. A Cloud-Edge Continuum . . . . . . . . . . . . . . . . . 5 78 2.2. Types of Edge Data . . . . . . . . . . . . . . . . . . . 6 79 3. Scenarios for Discovering Edge Data Resources . . . . . . . . 8 80 4. Edge Data Discovery . . . . . . . . . . . . . . . . . . . . . 8 81 4.1. Types of Discovery . . . . . . . . . . . . . . . . . . . 9 82 4.2. Naming the Data . . . . . . . . . . . . . . . . . . . . . 9 83 5. Use Cases of edge data discovery . . . . 
. . . . . . . . . . 10 84 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 85 7. Security Considerations . . . . . . . . . . . . . . . . . . . 10 86 8. Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . 10 87 9. Normative References . . . . . . . . . . . . . . . . . . . . 11 88 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 11 90 1. Introduction 92 Edge computing is an architectural shift that migrates Cloud 93 functionality (compute, storage, networking, control, data 94 management, etc.) out of the back-end data center to be more 95 proximate to the IoT data being generated and analyzed at the edges 96 of the network. Edge computing provides local compute, storage and 97 connectivity services, often required for latency- and bandwidth- 98 sensitive applications. Thus, Edge Computing plays a key role in 99 verticals such as Energy, Manufacturing, Automotive, Video Analytics, 100 Retail, Gaming, Healthcare, Mining, Buildings and Smart Cities. 102 1.1. Edge Data 104 Edge computing is motivated at least in part by the sheer volume of 105 data that is being created by IoT devices (sensors, cameras, lights, 106 vehicles, drones, wearables, etc.) at the very network edge and that 107 flows upstream, in a direction for which the network was not 108 originally provisioned. In fact, in dense IoT deployments (e.g., 109 many video cameras are streaming high definition video), where 110 multiple data flows collect or converge at edge nodes, data is likely 111 to need transformation (transcoded, subsampled, compressed, analyzed, 112 annotated, combined, aggregated, etc.) to fit over the next hop link, 113 or even to fit in memory or storage. Note also that the act of 114 performing compute on the data creates yet another new data stream! 116 In addition, data may be cached, copied and/or stored at multiple 117 locations in the network on route to its final destination. 
With an 118 increasing percentage of devices connecting to the Internet being 119 mobile, support for in-the-network caching and replication is 120 critical for continuous data availability, not to mention efficient 121 network and battery usage for endpoint devices. 123 Additionally, as mobile devices' memory/storage fills up, in an edge 124 context they may have the ability to offload their data to other 125 proximate devices or resources, leaving a bread crumb trail of data 126 in their wake. Therefore, although data might originate at edge 127 devices, as more and more data is continuously created, processed and 128 stored, it becomes increasingly dispersed throughout the physical 129 world (outside of or scattered across managed local data centers), 130 increasingly isolated in separate local edge clouds or data silos. 131 Thus there needs to be a standard way to find it. New and existing 132 protocols will need to be identified/developed/enhanced for these 133 purposes. Being able to discover distributed data at the edge or in 134 the middle of the network will be an important component of Edge 135 computing. 137 1.2. Background 139 An IETF T2T RG Edge discussion was held and a comparative study on 140 the definition of Edge computing was presented in multiple sessions 141 in T2T RG in 2018. An IETF BEC (beyond edge computing) effort has 142 been evaluating potential gaps in existing edge computing 143 architectures. Edge Data Discovery is one potential gap that needs 144 evaluation and a solution. 146 Businesses, such as industrial companies, are starting to understand 147 how valuable the data is that they've kept in silos. Once this data 148 is made accessible on edge computing platforms, they may be able to 149 monetize the value of the data. But this will happen only if data 150 can be discovered and searched among heterogeneous equipment in a 151 standard way.
Discovering the data that is most useful to a given 152 market segment will be extremely useful in building business 153 revenues. Having a mechanism to provide this granular discovery is 154 the problem that needs solving, with either existing or new 155 protocols. 157 1.3. Requirements Language 159 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 160 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 161 document are to be interpreted as described in RFC 2119 [RFC2119]. 163 1.4. Terminology 165 o Edge: The edge encompasses all entities not in the back-end cloud. 166 The device edge is the boundary between digital and physical 167 entities in the last mile network. Sensors, gateways, and compute 168 nodes are included. The infrastructure edge includes equipment on 169 the network operator side of the last mile network including cell 170 towers, edge data centers, cable headends, etc. See Figure 1 for 171 other possible tiers of edge clouds between the device edge and 172 the back-end cloud data center. 174 o Edge Computing: Distributed computation that is performed near the 175 edge, where nearness is determined by the system requirements. 176 This includes high performance compute, storage and network 177 equipment on either the device or infrastructure edge. 179 o Edge Data Discovery: The process of finding required data from 180 edge entities, i.e., from databases, file systems, and device memory 181 that might be physically distributed in the network, and 182 consolidating it or providing access to it logically as if it were 183 a single unified source, perhaps through its namespace, that can 184 be evaluated or searched. 186 o NDN: Named Data Networking. NDN routes data by name (vs address), 187 caches content natively in the network, and employs data-centric 188 security. Data discovery may require that data be associated with 189 a name or names, a series of descriptive attributes, and/or a 190 unique identifier. 192 2.
The Edge Data Discovery Problem Scope 194 Our focus is on how to define and scope the edge data discovery 195 problem. This requires some discussion of the evolving definition of 196 the edge and in turn what is meant by edge data. 198 2.1. A Cloud-Edge Continuum 200 Although Edge Computing data typically originates at edge devices, 201 there is nothing that precludes edge data from being created anywhere 202 in the cloud-to-edge computing continuum (Figure 1). New edge data 203 may result as a byproduct of computation being performed on the data 204 stream anywhere along its path in the network. For example, 205 infrastructure edges may create new edge data when multiple data 206 streams converge upon this aggregation point and require 207 transformation to fit within the available resources. Edge data also 208 may be sent to the back-end cloud as needed. Discovering data that 209 has been sent to the cloud is out of scope of this document, the 210 assumption being that the cloud boundary is one that does not expose 211 or publish the availability of its data. 213 +-------------------------------+ 214 | Back-end Cloud Data Center | 215 +-------------------------------+ 216 *** Cloud 217 * * Interconnect 218 *** 219 +-------------------------------+ 220 | Core Data Center | 221 +-------------------------------+ 222 *** Backbone 223 * * Network 224 *** 225 +-------------------------------+ 226 | Regional Data Center | 227 +-------------------------------+ 228 *** Metropolitan 229 * * Network 230 *** 231 +-------------------------------+ 232 | Infrastructure Edge | 233 +-------------------------------+ 234 *** Access 235 * * Network 236 *** 237 +-------------------------------+ 238 | Device Edge | 239 +-------------------------------+ 241 Figure 1: Cloud-to-edge computing continuum 243 Initially our focus is on discovery of edge data that resides at the 244 Device Edge and the Infrastructure Edge. 246 2.2.
Types of Edge Data 248 Besides sensor and measurement data accumulating throughout the edge 249 computing infrastructure, edge data may also take the form of 250 streaming data (from a camera), meta-data (about the data), control 251 data (regarding an event that was triggered), and/or an executable 252 that embodies a function, service, or any other piece of code or 253 algorithm. Edge data also could be created after multiple streams 254 converge at the edge node and are processed, transformed, or 255 aggregated together in some manner. 257 SFC Data and meta-data discovery 259 Service function chaining (SFC) allows the instantiation of an 260 ordered set of service functions and subsequent "steering" of traffic 261 through them. Service functions provide a specific treatment of 262 received packets; therefore, they need to be known so they can be used 263 in a given service composition via SFC. So far, how the SFs are 264 discovered and composed has been out of the scope of discussions in 265 the IETF. While there are some mechanisms that can be used and/or 266 extended to provide this functionality, work needs to be done. An 267 example of this can be found in [I-D.bernardos-sfc-discovery]. 269 In an SFC environment deployed at the edge, the discovery protocol 270 may also need to make available the following meta-data information 271 per SF: 273 o Service Function Type, identifying the category of SF provided. 275 o SFC-aware: Yes/No. Indicates if the SF is SFC-aware. 277 o Route Distinguisher (RD): IP address indicating the location of 278 the SF(I). 280 o Pricing/cost details. 282 o Migration capabilities of the SF: whether a given function can be 283 moved to another provider (potentially including information about 284 compatible providers topologically close). 286 o Mobility of the device hosting the SF, with e.g. the following 287 sub-options: 289 Level: no, low, high; or a corresponding scale (e.g., 1 to 10).
291 Current geographical area (e.g., GPS coordinates, post code). 293 Target moving area (e.g., GPS coordinates, post code). 295 o Power source of the device hosting the SF, with e.g. the following 296 sub-options: 298 Battery: Yes/No. If Yes, the following sub-options could be 299 defined: 301 Capacity of the battery (e.g., mWh). 303 Charge status (e.g., %). 305 Lifetime (e.g., minutes). 307 Discovery of resources in an NFV environment: virtualized resources 308 do not need to be limited to those available in traditional data 309 centers, where the infrastructure is stable, static, typically 310 homogeneous and managed by a single admin entity. Computational 311 capabilities are becoming more and more ubiquitous, with terminal 312 devices getting extremely powerful, as well as other types of devices 313 that are close to the end users at the edge (e.g., vehicular onboard 314 devices for infotainment, micro data centers deployed at the edge, 315 etc.). It is envisioned that these devices would be able to offer 316 storage, computing and networking resources to nearby network 317 infrastructure, devices and things (the fog paradigm). These 318 resources can be used to host functions, for example to offload/ 319 complement other resources available at traditional data centers, but 320 also to reduce the end-to-end latency or to provide access to 321 specialized information (e.g., context available at the edge) or 322 hardware. Similar to the discovery of functions, while there are 323 mechanisms that can be reused/extended, there is no complete solution 324 yet defined. An example of work in this area is 325 [I-D.bernardos-intarea-vim-discovery]. 327 3. Scenarios for Discovering Edge Data Resources 329 Mainly two types of situations need to be covered: 331 1.
A set of data resources appears (e.g., a mobile node hosting data 332 joins a network) and needs to be discovered by an existing 333 but possibly virtualized and/or ephemeral data directory 334 infrastructure. 336 2. A device wants to discover data resources available at or near 337 its current location. As some of these resources may be mobile, 338 the available set of edge data may vary over time. 340 4. Edge Data Discovery 342 How can we discover data on the edge and make use of it? There are 343 proprietary implementations that collect data from various databases 344 and consolidate it for evaluation. We need a standard protocol set 345 for doing this data discovery, on the device or infrastructure edge, 346 in order to meet the requirements of many use cases. We will have 347 terabytes of data on the edge and need a way to identify its 348 existence and find the desired data. A user needs to 349 search for specific data in a data set and evaluate it using their 350 own tools. The tools are outside the scope of this document, but the 351 discovery of that data is in scope. 353 4.1. Types of Discovery 355 There are many aspects of discovery and many different protocols that 356 address each aspect. 358 Discovery of new devices added to an environment. Discovery of their 359 capabilities/services in client/server environments. Discovery of 360 these new devices automatically. Discovering a device and then 361 synchronizing the device inventory and configuration for edge 362 services. There are many existing protocols to help in this 363 discovery: UPnP, mDNS, DNS-SD, SSDP, NFC, XMPP, W3C network service 364 discovery, etc. 366 Edge devices discover each other in a standard way. We can use DHCP, 367 SNMP, SMS, CoAP, LLDP, and routing protocols such as OSPF for devices 368 to discover one another. 370 Discovery of link state and traffic engineering data/services by 371 external devices. BGP-LS is one solution.
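
As an illustration only, the two scenarios of Section 3 can be sketched in Python with an in-memory directory. The names DataResource, DataDirectory, register, withdraw and discover, the hierarchical data names, and the planar coordinates are all assumptions made for this sketch; they are not defined by this document or by any of the protocols listed above.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class DataResource:
    """Hypothetical advertisement for one edge data resource."""
    name: str                      # illustrative hierarchical name
    host: str                      # node currently holding the data
    position: tuple[float, float]  # (x, y) in arbitrary units, sketch only
    mobile: bool = False


class DataDirectory:
    """In-memory stand-in for the data directory infrastructure."""

    def __init__(self) -> None:
        self._records: dict[str, DataResource] = {}

    def register(self, resource: DataResource) -> None:
        # Scenario 1: a data resource appears and announces itself.
        self._records[resource.name] = resource

    def withdraw(self, name: str) -> None:
        # A mobile resource leaves, so the available set varies over time.
        self._records.pop(name, None)

    def discover(self, position: tuple[float, float], radius: float,
                 prefix: str = "/") -> list[str]:
        # Scenario 2: a device asks for data at or near its location,
        # optionally narrowed by a name prefix.
        px, py = position
        return sorted(
            r.name
            for r in self._records.values()
            if r.name.startswith(prefix)
            and hypot(r.position[0] - px, r.position[1] - py) <= radius
        )


directory = DataDirectory()
directory.register(DataResource("/factory1/line3/vibration", "gw-a", (0.0, 0.0)))
directory.register(DataResource("/vehicle7/camera/front", "car-7", (3.0, 4.0), mobile=True))
directory.register(DataResource("/farm2/soil/moisture", "gw-b", (50.0, 50.0)))

nearby = directory.discover(position=(0.0, 0.0), radius=5.0)

# The vehicle drives away and withdraws its data.
directory.withdraw("/vehicle7/camera/front")
after_move = directory.discover(position=(0.0, 0.0), radius=5.0)
```

A real deployment would replace the dictionary with a distributed and possibly ephemeral directory, and the distance test with whatever notion of nearness the system requires; the sketch only shows the register and query roles.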
373 The question is whether one or more of these protocols might be a suitable 374 candidate to extend to support edge data discovery. 376 4.2. Naming the Data 378 Named Data Networking (NDN) is one of five research projects funded 379 by the U.S. National Science Foundation under its Future Internet 380 Architecture Program. NDN has its roots in an earlier project, 381 Content-Centric Networking (CCN), which Van Jacobson started at Xerox 382 PARC around the time of his Google talk, to turn his architecture 383 vision into a running prototype (see also his CoNEXT 2009 paper and 384 especially Jacobson's ACM Queue interview). The motivation is the 385 mismatch between today's Internet architecture and its usage. Today we 386 build, support, and use Internet applications and services on top of 387 an extremely capable architecture not designed to support them. What 388 if we had an architecture designed to support them? Specifically, 389 today's IP packets can name only endpoints of conversations (IP 390 addresses) at the network layer. What if we generalize this layer to 391 name any information (or content), not just endpoints? We make it 392 easier to develop, manage, secure, and use our networks. NDN can be 393 applied to edge data discovery to make it much easier to extract data 394 and meta-data by naming it. If data were named we would be able to 395 discover the appropriate data simply by its name. 397 5. Use Cases of edge data discovery 399 1. Autonomous Vehicles 401 Autonomous vehicles rely on the processing of huge amounts of complex 402 data in real-time for fast and accurate decisions. These vehicles 403 will rely on high performance compute, storage and network resources 404 to process the volumes of data they produce in a low latency way. 405 Various systems will need a standard way to discover the pertinent 406 data for decision making. 408 2.
Video Surveillance 410 The majority of the video surveillance footage will remain at the 411 edge infrastructure (not sent to the cloud data center). This 412 footage is coming from vehicles, factories, hotels, universities, 413 farms, etc. Much of the video footage will not be interesting to those 414 evaluating the data. A mechanism, a set of protocols perhaps, is 415 needed to identify the interesting data at the edge. What 416 constitutes interesting will be context specific, e.g., video frames 417 with a car, a backyard nocturnal creature, a person, or a 418 bicyclist in them. Interesting video data may be stored longer in 419 storage systems at the very edge of the network or in flight in 420 networking equipment further away from the device edge. 422 3. Elevator Networks 424 Elevators are one of many industrial applications of edge computing. 425 Edge equipment receives data from hundreds of elevator sensors. The 426 data coming into the edge equipment includes vibration, temperature, speed, 427 level, video, etc. We need the ability to identify where the data we 428 need to evaluate is located. 430 6. IANA Considerations 432 N/A 434 7. Security Considerations 436 Security considerations will be a critical component of edge data 437 discovery particularly as intelligence is moved to the extreme edge 438 where data is to be extracted. 440 8. Acknowledgement 441 9. Normative References 443 [I-D.bernardos-intarea-vim-discovery] 444 Bernardos, C. and A. Mourad, "IPv6-based discovery and 445 association of Virtualization Infrastructure Manager (VIM) 446 and Network Function Virtualization Orchestrator (NFVO)", 447 draft-bernardos-intarea-vim-discovery-01 (work in 448 progress), February 2019. 450 [I-D.bernardos-sfc-discovery] 451 Bernardos, C. and A. Mourad, "Service Function discovery 452 in fog environments", draft-bernardos-sfc-discovery-02 453 (work in progress), March 2019.
455 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 456 Requirement Levels", BCP 14, RFC 2119, 457 DOI 10.17487/RFC2119, March 1997, 458 . 460 Authors' Addresses 462 Mike McBride 463 Huawei 465 Email: michael.mcbride@huawei.com 467 Dirk Kutscher 468 Emden University 470 Email: ietf@dkutscher.net 472 Eve Schooler 473 Intel 475 Email: eve.m.schooler@intel.com 477 Carlos J. Bernardos 478 Universidad Carlos III de Madrid 479 Av. Universidad, 30 480 Leganes, Madrid 28911 481 Spain 483 Phone: +34 91624 6236 484 Email: cjbc@it.uc3m.es 485 URI: http://www.it.uc3m.es/cjbc/