ROLL                                                     T. Phinney, Ed.
Internet-Draft                                                consultant
Intended status: Informational                                P. Thubert
Expires: April 3, 2012                                             Cisco
                                                            RA. Assimiti
                                                                   Nivis
                                                         October 1, 2011


                RPL applicability in industrial networks
           draft-phinney-roll-rpl-industrial-applicability-00

Abstract

   The wide deployment of wireless devices, with their low installed
   cost (compared to wired devices), will significantly improve the
   productivity and safety of industrial plants, while simultaneously
   increasing the efficiency and safety of the plant's workers, by
   extending and making more timely the information set available about
   plant operations. The new Routing Protocol for Low Power and Lossy
   Networks (RPL) defines a Distance Vector protocol that is designed
   for such networks. The aim of this document is to analyze the
   applicability of that routing protocol in industrial LLNs of field
   devices.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 3, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Overview
     3.1.  Deployment scenarii
     3.2.  Applications and Traffic classes
     3.3.  RPL applicability matrix
   4.  Characterization of communication flows in IACS wireless
       networks
     4.1.  General
     4.2.  Source-sink (SS) communication paradigm
     4.3.  Publish-subscribe (PS, or pub/sub) communication paradigm
     4.4.  Peer-to-peer (P2P) communication paradigm
     4.5.  Peer-to-multipeer (P2MP) communication paradigm
     4.6.  Additional considerations: Duocast and N-cast
     4.7.  RPL applicability per communication paradigm
   5.  RPL profile
     5.1.  Use for process control
     5.2.  RPL features
       5.2.1.  Storing vs. non-storing mode
       5.2.2.  DAO policy
       5.2.3.  Path metrics
       5.2.4.  Objective functions
       5.2.5.  DODAG repair
       5.2.6.  Security
     5.3.  RPL options
     5.4.  Recommended configuration defaults and ranges
       5.4.1.  Trickle parameters
       5.4.2.  Other parameters
       5.4.3.  Additional configuration recommendations
   6.  Other related protocols
   7.  Manageability
   8.  IANA considerations
   9.  Security considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
     11.3.  External Informative References
   Authors' Addresses

1.  Introduction

   Information Technology (IT) is already, and increasingly will be,
   applied to Industrial Automation and Control System (IACS) technology
   in application areas where those IT technologies can be constrained
   sufficiently by Service Level Agreements (SLAs) or other modest
   changes that they are able to meet the operational needs of IACS.
   When that happens, the IACS benefits from the large intellectual,
   experiential and training investment that has already occurred in
   those IT precursors. One can conclude that future reuse of
   additional IT protocols for IACS will continue to occur due to the
   significant intellectual, experiential and training economies which
   result from that reuse.

   Following that logic, many vendors are already extending or replacing
   their local field-bus technology with Ethernet and IP-based
   solutions.
   Examples of this evolution include CIP EtherNet/IP,
   Modbus/TCP, Foundation Fieldbus HSE, PROFInet and Invensys/Foxboro
   FOXnet. At the same time, wireless, low power field devices are
   being introduced that facilitate a significant increase in the amount
   of information which industrial users can collect and the number of
   control points that can be remotely managed.

   IPv6 appears as a core technology at the conjunction of both trends,
   as illustrated by the current [ISA100.11a] industrial Wireless Sensor
   Networking (WSN) specification, where layers 1-4 technologies
   developed for end uses other than IACS - IEEE 802.15.4 PHY and MAC,
   6LoWPAN and IPv6, and UDP - are adapted to IACS use. But due to the
   lack of open standards for routing in Low Power and Lossy Networks
   (LLNs), even ISA100.11a leaves the routing operation to proprietary
   methods.

   The IETF ROLL Working Group has defined application-specific routing
   requirements for an LLN routing protocol, specified in:

      Routing Requirements for Urban LLNs [RFC5548],

      Industrial Routing Requirements in LLNs [RFC5673],

      Home Automation Routing Requirements in LLNs [RFC5826], and

      Building Automation Routing Requirements in LLNs [RFC5867].

   The Routing Protocol for Low Power and Lossy Networks (RPL)
   [I-D.ietf-roll-rpl] specification and its point to point
   extension/optimization [I-D.ietf-roll-p2p-rpl] define a generic
   Distance Vector protocol that is adapted to a variety of Low Power
   and Lossy Network (LLN) types by the application of specific
   Objective Functions (OFs).

   RPL forms Destination Oriented Directed Acyclic Graphs (DODAGs)
   within instances of the protocol, each instance being associated with
   an Objective Function to form a routing topology.

   A field device that belongs to an instance uses the OF to determine
   which DODAG and which Version of that DODAG the device should join.
   The device also uses the OF to select a number of routers within the
   current and subsequent Versions of the DODAG to serve as parents or
   as feasible successors. A new Version of the DODAG is periodically
   reconstructed to enable a global reoptimization of the graph.

   A RPL OF states the outcome of the process used by a RPL node to
   select and optimize routes within a RPL Instance based on the
   information objects available. The separation of OFs from the core
   protocol specification allows RPL to be adapted to meet the different
   optimization criteria required by the wide range of industrial
   classes of traffic and applications.

   This document provides information on how RPL can accommodate the
   industrial requirements for LLNs, in particular as specified in
   [RFC5673].

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [RFC2119].

   Additionally, this document uses terminology from
   [I-D.ietf-roll-terminology], and uses the usual terminology of the
   Process Control and Factory Automation industries, some of which is
   recapitulated below:

   FEC:  Forward error correction

   IACS:  Industrial automation and control systems

   RAND:  Reasonable and non-discriminatory (relative to licensing of
      patents)

3.  Overview

3.1.  Deployment scenarii

   [RFC5673] describes in detail the routing requirements for industrial
   LLNs. This document provides information on the varying deployment
   scenarios for such LLNs and how RPL assists in meeting those
   requirements.

   Large industrial plants, or major operating areas within such plants,
   repeatedly go through four major phases, each of which typically
   lasts from months to years:

      P1: Construction or major modification phase

      P2: Planned startup phase

      P3: Normal operation phase

      P4: Planned shutdown phase

   followed eventually by an (at least theoretical)

      P5: Plant decommissioning phase.

   After a major catastrophe at a plant, there is also likely to be a

      P6: Post-emergency recovery and repair phase.

   The deployment scenarios for wireless LLN devices may be different in
   each of these phases. In particular, during the Construction or
   major modification phase (P1), LLN devices may be installed months
   before the intended LLN can become usefully operational (because
   needed routers and infrastructure devices are not yet installed or
   active), and there are likely to be many personnel in whom the plant
   owner/operator has only limited trust, such as subcontractors and
   others in the plant area who have undergone only a cursory background
   investigation (if any at all). In general, during this phase, plant
   instrumentation is not yet operational, so a device could be removed
   and replaced by a Trojaned device without much likelihood of physical
   detection of the substitution. Thus physical security of LLN devices
   is generally a more significant risk factor during this phase than
   once the plant is operational, where simple replacement of device
   electronics is detectable.

   Extra LLN devices and even extra LLN subnets may be employed during
   Planned startup (P2) and Planned shutdown (P4) phases, in support of
   the task of transitioning the plant or plant area between operational
   and shutdown states. The extra devices typically provide extra
   monitoring as the plant transitions through infrequent activity
   states.
   (In many continuous process plants, up to 2x extra staff are
   employed at monitoring and control workstations during these two
   phases, precisely because the plant is undergoing extraordinary
   behavior as it transitions to or from its steady-state operational
   condition.)

   Similar transient devices and subnets may be used during an
   unscheduled Post-emergency recovery and repair phase (P6) of
   operation, but in that case the extra devices usually are routers
   substituting for plant LLN devices that have been damaged by the
   incident (such as a fire, explosion, flood, tornado or hurricane)
   that induced the emergency.

   The Planned startup (P2) and Planned shutdown (P4) phases are similar
   in many respects, but the LLN environment of the two can be quite
   different, since the Planned shutdown phase can assume that the
   stable LLN environment used for Normal operation (P3) is functional
   during shutdown, whereas that stable environment usually is still
   being established during startup.

   The Post-emergency recovery and repair phase (P6) typically operates
   in an LLN environment that is somewhere between that of the Planned
   startup (P2) and Normal operation (P3) phases, but with an
   indeterminate number of temporary routers placed to facilitate
   communication across and around the area affected by the catastrophe.

   Smaller industrial plants and sites may go through similar phases,
   but often commingle the phases because, in those smaller plants, the
   phases require less planning and structuring of personnel
   responsibilities and thus permit less formalization and partitioning
   of the operating scenarios.
   For example, it is much simpler, and
   usually requires much less planning, to bring new equipment on a skid
   into a plant, using a forklift, than to lay temporary railroad track
   or employ an extended-axle heavy haul tractor-trailer to deliver a
   multi-ton process vessel, and temporarily deploy and use very large
   heavy-lift cranes to install it. In the former case, nearby
   equipment usually can continue normal operation while the
   installation proceeds; in the latter case that is almost always
   impossible, due to safety and other concerns.

   The domain of applicability for the RPL protocol may include all
   phases but the Normal operation phase, where the bandwidth allocation
   and the routes are usually optimized by an external Path Computing
   Engine (PCE), e.g. an ISA100.11a System Manager.

   Additionally, RPL could be used in Normal operation provided that a
   new Objective Function is defined that actually interacts with the
   PCE in order to establish the reference topology. In that case RPL
   operations would only apply to emergency repair actions, when the
   reference topology becomes unusable because of some failure, and for
   as long as the problem persists.

3.2.  Applications and Traffic classes

   The industrial market classifies process applications into three
   broad categories and six classes.

   o  Safety

      *  Class 0: Emergency action - Always a critical function

   o  Control

      *  Class 1: Closed loop regulatory control - Often a critical
         function

      *  Class 2: Closed loop supervisory control - Usually non-critical
         function

      *  Class 3: Open loop control - Operator takes action and controls
         the actuator (human in the loop)

   o  Monitoring

      *  Class 4: Alerting - Short-term operational effect (for example
         event-based maintenance)

      *  Class 5: Logging and downloading / uploading - No immediate
         operational consequence (e.g., history collection, sequence-of-
         events, preventive maintenance)

   Safety critical functions affect the basic safety integrity of the
   plant. These normally dormant functions kick in only when process
   control systems, or their operators, have failed. By design and by
   regular interval inspection, they have a well-understood probability
   of failure on demand, typically in the range of once per 10-1000
   years.

   In-time delivery of messages becomes more relevant as the class
   number decreases.

   Note that for a control application, jitter is just as important as
   latency and has the potential to destabilize control algorithms.

   The domain of applicability for the RPL protocol probably matches the
   range of classes where industrial users are interested in deploying
   wireless networks. This domain includes monitoring classes (4 and
   5), and the non-critical portions of control classes (2 and 3). RPL
   might also be considered as an additional repair mechanism in all
   situations, independently of the flow classification and the medium
   type.

3.3.  RPL applicability matrix

   It appears from the above sections that whether and how RPL can be
   applied to a given flow depends both on the deployment scenario
   and on the class of application / traffic.
   At a high level, this can
   be summarized by the following matrix:

   +---------------------+------------------------------------------------+
   | Phase \ Class | 0 1 2 3 4 5 |
   +=====================+================================================+
   | Construction | X X X X |
   +---------------------+------------------------------------------------+
   | Planned startup | X X X X |
   +---------------------+------------------------------------------------+
   | Normal operation | ? ? ? |
   +---------------------+------------------------------------------------+
   | Planned shutdown | X X X X |
   +---------------------+------------------------------------------------+
   |Plant decommissioning| X X X X |
   +---------------------+------------------------------------------------+
   | Recovery and repair | X X X X X X |
   +---------------------+------------------------------------------------+

   ? : typically usable for all but higher-rate classes 0,1 PS traffic

                   Figure 1: RPL applicability matrix

4.  Characterization of communication flows in IACS wireless networks

4.1.  General

   In an IACS, high-rate communications flows (e.g., 1 Hz or 4 Hz for a
   traditional process automation network) typically are such that only
   a single wireless LLN hop separates the source device from an LLN
   Border Router (LBR) connected to a significantly higher data-rate
   backbone network, typically based on IEEE 802.3, IEEE 802.11, or
   IEEE 802.16, as illustrated in Figure 2.

           ---+------------------------
              |          Plant Network
              |
           +-----+
           |     | Gateway
           |     |
           +-----+
              |
              |      Backbone
        +--------------------+------------------+
        |                    |                  |
     +-----+              +-----+            +-----+
     |     | LLN border   |     | LLN border |     | LLN border
   o |     | router       |     | router     |     | router
     +-----+              +-----+            +-----+
        o             o              o              o
     o  o   o  o   o  o   o  o   o  o   o
                          LLN

   o : stationary wireless field device, seldom acting as an LLN router

         Figure 2: High-rate low-delay low-variance IACS topology

   For factory automation networks, the basic communications cycle for
   control is typically much faster, on the order of 100 Hz or more. In
   this case the LLN itself may be based on high-data-rate IEEE 802.11
   or a 100 Mbit/s or faster optical link, and the higher-rate network
   used by the LBRs to connect the LLN to superior automation equipment
   typically might be based on fiber-optic IEEE 802.3, with multiple
   LBRs around the periphery of the factory area, so that most high-rate
   communications again require only a single wireless LLN hop.

   Multi-hop LLN routing is used within the LLN portion of such networks
   to provide backup communications paths when primary single-hop LLN
   paths fail, or for lower repetition rate communications where longer
   LLN transit times and higher variance are not an issue. Typically,
   the majority of devices in an IACS can tolerate such higher-delay
   higher-variance paths, so routing choices often are driven by energy
   considerations for the affected devices, rather than simply by IACS
   performance requirements, as illustrated in Figure 3.

           ---+------------------------
              |          Plant Network
              |
           +-----+
           |     | Gateway
           |     |
           +-----+
              |
              |      Backbone
        +--------------------+------------------+
        |                    |                  |
     +-----+              +-----+            +-----+
     |     | Backbone     |     | Backbone   |     | Backbone
     |     | router       |     | router     |     | router
     +-----+              +-----+            +-----+
        o   o   o   o   o   o   o   o   o   o   o   o   o
     o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o
       o   o  o   o  o   o  o   o  o   o  o   M  o   o  o   o  o
        o  o   M   o  o   o  o   o  o   o  o   o  o   o  o   o
             o    o   o    o   o    o   o    o   o
                    o    o    o    o    o
                          LLN

   o : stationary wireless field device, often acting as an LLN router
   M : mobile wireless device

       Figure 3: Low-rate higher-delay higher-variance IACS topology

   Two decades of experience with digital fieldbuses have shown that
   four communications paradigms dominate in IACS:

      SS: Source-sink

      PS: Publish-subscribe

      P2P: Peer-to-peer

      P2MP: Peer-to-multipeer

4.2.  Source-sink (SS) communication paradigm

   In SS, the source-sink communication paradigm, each of many devices
   in one set, S1, sends UDP-like messages, usually infrequently and
   intermittently, to a second set of devices, S2, determined by a
   common multicast address. A typical example would be that all
   devices within a given process unit N are configured to send process
   alarm messages to the multicast address
   Receivers_of_process_alarms_for_unit_N. Receiving devices, typically
   on non-LLN networks accessed via LBRs, are configured to receive such
   multicast messages if their work assignment covers process unit N,
   and not otherwise.

   Timeliness of message delivery is a significant aspect of some SS
   communication. When the SS traffic conveys process alarms or device
   alerts, there is often a contractual requirement, and sometimes even
   a regulatory requirement, on the maximum end-to-end transit delay of
   the SS message, including both the LLN and non-LLN components of that
   delay.
   However, there is no requirement on relative jitter in the
   delivery of multiple SS messages from the same source, and message
   reordering during transit is irrelevant.

   Within the LLN, the SS paradigm simply requires that messages so
   addressed be forwarded to the responsible LBR (or set of equivalent
   LBRs) for further forwarding outside the LLN. Within the LLN such
   traffic typically is device-to-LBR or
   device-to-redundant-set-of-equivalent-LBRs. In general, SS traffic
   may be aggregated before forwarding when both the multicast
   destination address and other QoS attributes are identical. If
   information on the target delivery times for SS messages is available
   to the aggregating forwarding device, that device may intentionally
   delay forwarding somewhat to facilitate further aggregation, which
   can significantly reduce LLN alarm-reporting traffic during major
   plant upset events.

4.3.  Publish-subscribe (PS, or pub/sub) communication paradigm

   In PS, the publish-subscribe communication paradigm, a device sends
   UDP-like messages, usually periodically or cyclically (i.e.,
   repetitively but without fixed periodicity), to a single multicast
   address derived from or correlated with the device's own address. A
   typical example would be that each sensor and actuator device within
   a given process unit N is configured to send process state messages
   to the multicast address that designates its specific publications.
   In essence the derived multicast address for device D is
   Receivers_of_publications_by_device_D. Typically those receivers are
   in two categories: controllers (C) for control loops in which device
   D participates, and devices accessed via the LLN's LBRs that monitor
   and/or accumulate historical information about device D's status and
   outputs.

   If the controller(s) that receive device D's publication are all
   outside the LLN and accessed via LBRs, then within the LLN such
   traffic typically is device-to-LBR or
   device-to-redundant-set-of-equivalent-LBRs. But if a controller (Cn)
   is within the LLN, then a number of different LLN-local traffic
   patterns may be employed, depending on the capabilities of the
   underlying link technology and on configured performance requirements
   for such reporting. Typically in such a case, publication by device
   D is forwarded up a DODAG to an LLN router that is also on a downward
   DODAG to a destination controller Cn, then forwarded down that second
   DODAG to that destination controller Cn. Of course, if the LLN
   router (or even the LBR) is itself the intended destination
   controller, which will often be the case, then no downward forwarding
   occurs.

   Timeliness of message delivery is a critical aspect of PS
   communication. Individual messages can be lost without significant
   impact on the controlled physical process, but typically a sequence
   of four consecutive lost messages will trigger fallback behavior of
   the control algorithms, which is considered a system failure by most
   system owner/operators. (In general, and unless a local catastrophic
   event such as a major explosion or a tornado occurs in the plant,
   invocation of more than one instance of such fallback handling per
   year, per plant, is considered unacceptable.)

   Message loss, delay and jitter in delivery of PS messaging are a
   relative matter. PS messaging is used for transfer of process
   measurements and associated status from sensors to control
   computation elements, from control computation elements to actuators,
   and of current commanded position and status from actuators back to
   control computation elements.
   The actual time interval of interest
   is that which starts with sensing of the physical process (which
   necessarily occurs before the sensed value can be sent in the first
   message) and which ends when the computed control correction is
   applied to the physical process by the appropriate actuator (which
   cannot occur until after the second message containing the computed
   control output has been received by that actuator). With rare
   exception, the control algorithms used with PS messaging in the
   process automation industries - those managing continuous material
   flows - rely on fixed-period sampling, computation and transfer of
   outputs, while those in the factory automation industries - those
   managing discrete manufacturing operations - rely on bounded delay
   between sampling of inputs, control computation and transfer of
   outputs to physical actuators that affect the controlled process.

   Deliberately manipulated message delay and jitter in delivery of PS
   messaging have the potential to destabilize control loops. It is the
   responsibility of conveyed higher-level protocols to protect against
   such potential security attacks by detecting overly delayed or
   jittered messages at delivery, converting them into instances of
   message loss. Thus network and data-link protocols such as IPv6 and
   Ethernet need not themselves address such issues, although their
   selection and employment should take the existence (or lack) of such
   higher-layer protection mechanisms, and the resulting consequences
   due to excessive delay and jitter, into consideration in their
   parameterization.

   In general, PS traffic within the LLN is not aggregated before
   forwarding, to minimize message loss and delay in reception by any
   relevant controller(s) that are outside the LLN.
   However, if all
   intended destination controllers are within the LLN, and at least one
   of those intended controllers also serves as an LLN router on a DODAG
   to off-LLN destinations, none of which are controllers, then the
   router functions in that device may aggregate PS traffic before
   forwarding when the required routing and other QoS attributes are
   identical. If information on the target delivery times for PS
   messages to non-controller devices is available to the aggregating
   forwarding device, that device may intentionally delay forwarding
   somewhat to facilitate further aggregation.

   In some system architectures, message streams that use PS to convey
   current process measurements and status are compressed at the source
   through a 2-dimensional winnowing process that compares

   1)  the process measurement values and status of the about-to-be-sent
       message with that of the last actually-sent message, and

   2)  the current time vs. the queueing time for the last actually-sent
       message.

   If the interval since that last-sent message is less than a
   predefined maximum time, and the status is unchanged, and the process
   measurement(s) conveyed in the message is within predefined
   deadband(s) of the last-sent measurement value(s), then transmission
   of the new message is suppressed. Often this suppression takes the
   form of not queuing the new message for transmission, but in some
   protocols a brief placeholder message indicating "no significant
   change" is queued in its stead.

4.4.  Peer-to-peer (P2P) communication paradigm

   In P2P, the peer-to-peer communication paradigm, a device sends
   UDP-like or TCP-like messages from one device (D1) to a second device
   (D2), usually with bidirectional but asymmetric flow of application
   data, where the amount of data is significantly greater in one
   direction than the other.
   Typical examples are transfer of
   configuration information to or from a process field device, or
   transfer of captured process diagnostics (e.g., time-stamped noise
   signatures from a Coriolis flowmeter) to an off-LLN higher-level
   asset management system. Unicast addressing is used in both
   directions of data flow.

   In general, specific P2P traffic has only loose timeliness
   requirements, typically just those required so that response times to
   human-operator-initiated actions meet human factors requirements. As
   a consequence, in general, message aggregation is permitted, although
   few opportunities are likely to present themselves for such
   aggregation due to the sporadic nature of such messaging to a single
   destination, and/or due to the large message payloads that often
   occur in at least one direction of transmission.

4.5.  Peer-to-multipeer (P2MP) communication paradigm

   In P2MP, the peer-to-multipeer communication paradigm, a device sends
   UDP-like messages downward, from one device (D1) to a set of other
   devices (Dn). Typical examples are bulk downloads to a set of
   devices that use identical code image segments or
   identically-structured database segments; group commands to enable
   device state transitions that are quasi-synchronized across all or
   part of the local network (e.g., switching to the next set of
   point-to-point downloaded session keys, or notifying devices that the
   network is switching to an emergency repair and recovery mode); etc.
   Multicast addressing is used in the downward direction of data flow.

   Devices can be assigned to a number of multicast groups, for instance
   by device type. Then, if it becomes necessary to reflash all devices
   of a given type with a new load image, a multicast distribution
   mechanism can be leveraged to optimize the distribution operation.

   In general, P2MP traffic has only loose timeliness requirements.
As a consequence, in general, message aggregation is permitted, although few opportunities are likely to present themselves for such aggregation due to the sporadic nature of such messaging to a single multicast group destination, and/or due to the large message payloads that often occur when P2MP is used for group downloads. However, in general, message aggregation negatively impacts the delivery success rate for each of the aggregated messages, since the probability of error in a received message increases with message length. Together these considerations often lead to a policy of non-aggregation for P2MP messaging.

Note: Reliable group download protocols, such as the no-longer-published IEEE 802.1E (ISO/IEC 15802-4) system load protocol, and reliable multicast protocols based on the guidance of RFC2887, are instructive in how P2MP can be used for initial bulk download, followed by either P2MP or P2P selective retransmissions for missed download segments.

4.6. Additional considerations: Duocast and N-cast

In industrial automation systems, some traffic is from (relatively) high-rate monitoring and control loops, of Class 0 and Class 1 as described in [RFC5673]. In such systems, the wireless link protocol, which typically uses immediate in-band acknowledgement to confirm delivery (or, on failure, conclude that a retransmission is required), can be adapted to attempt simultaneous delivery to more than one receiving device, with separated, sequenced immediate in-band acknowledgement by each of those intended receivers. (This mechanism is known colloquially as "duocast" (for two intended receivers), or more generically as "N-cast" (for N intended receivers).)
Transmission is deemed successful if at least one such immediate acknowledgement is received by the sending device; otherwise the device queues the message for retransmission, up until the maximum configured number of retries has been attempted.

The logic behind duocast/N-cast is very simple: In wireless systems without FEC (forward error correction), the overall rate of success for transactions consisting of an initial transmission and an immediate acknowledgement is typically 95%. In other words, 5% of such transactions fail, either because the initial message of the transaction is not received correctly by the intended receiver, or because the immediate acknowledgement by that receiver is not received correctly by the transaction initiator.

In the generalized case of N-cast, where any received acknowledgement serves to complete the transaction, and where the N intended receivers are spatially diverse, physically separated from each other by multiple wavelengths, the probability that all such receivers fail to receive the initial message of the transaction, or that all generated immediate acknowledgements are not received by the transaction initiator, is typically approximately (5%)^N. Thus the expected success rate for a single transaction goes from 95% (1.0 - 0.05) for unicast to 99.75% (1.0 - 0.05^2) for duocast, to 99.9875% (1.0 - 0.05^3) when N=3, and even higher when N>3.

From the above analysis, it is obvious that the primary benefit of N-cast occurs when N goes from N=1 (unicast) to N=2 (duocast); the reduction in transaction loss rate for increasing N>2 is quite small, and for N>3 it is infinitesimal.
In the typical industrial automation environment of Class 1 process control loops, which typically repeat at a 1 Hz or 4 Hz rate, in a very large process plant with thousands of field devices reporting at that rate, the maximum number of transmission retries that must be planned, and for which capacity must be scheduled (within the requisite 250 ms or 1 s interval) is seven (7) retries for unicast PS reporting, but only three (3) retries with duocast PS reporting. (This is determined by the requirement to not miss four successive reports more than once per year, across the entire plant, as such a loss typically triggers fallback behavior in the controlled loop, which is considered a failure of the wireless system by the plant owner/operator.) In practice, the enormous reduction in both planned and used retransmission capacity provided by duocast/N-cast is what enables 4 Hz loops to be supported in large wireless systems.

When available, duocast/N-cast typically is used only for one-hop PS traffic on Class 1 and Class 0 control loops. It may also be employed for rapid, reliable one-hop delivery of Class 0 and sometimes Class 1 process alarms and device alerts, which use the SS paradigm. Because it requires scheduling of multiple receivers that are prepared to acknowledge the received message during the transaction, in general it is not appropriate for the other types of traffic in such systems - P2P and P2MP - and is not needed for other classes of control loops or other types of traffic, which do not have such stringent reporting requirements.
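
The success rates quoted in this section can be checked with a short calculation. The following Python sketch assumes, as the text does, a 5% per-transaction failure rate and independent failures across spatially diverse receivers. It additionally assumes that successive retries fail independently, which is a simplification not stated in this document. The function names are illustrative only.

```python
import math

P_FAIL = 0.05  # per-transaction failure rate for a single receiver, as cited in the text


def ncast_failure(n: int, p_fail: float = P_FAIL) -> float:
    """Probability that a transaction fails at all n receivers (independence assumed)."""
    return p_fail ** n


def residual_failure(n: int, retries: int, p_fail: float = P_FAIL) -> float:
    """Probability that the first attempt and every retry all fail
    (assumes independent attempts, a simplification)."""
    return ncast_failure(n, p_fail) ** (retries + 1)


# Single-transaction success rate for unicast, duocast and N=3.
for n in (1, 2, 3):
    print(f"N={n}: success rate {100 * (1 - ncast_failure(n)):.4f}%")

# Residual failure probability for the two retry budgets named in the text.
print("unicast, 7 retries:", residual_failure(1, 7))
print("duocast, 3 retries:", residual_failure(2, 3))
```

Under these assumptions, unicast with seven retries and duocast with three retries leave the same residual failure probability, 0.05^8, which is consistent with the retry counts cited above.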
Note: Although there are known patent applications for duocast and N-cast, at the time of this writing the patent assignee, Honeywell International, has offered to permit cost-free RAND use in those industrial wireless standards that have chosen to employ the technology, under a reciprocal licensing requirement relative to that use. Since duocast and N-cast provide performance and energy optimizations, they are not essential for use in wireless systems. However, in practice, their use makes it possible to support 4 Hz wireless loops and meet sub-second safety alarm reporting requirements in large plants, where that might otherwise be impractical without use of a wired network. When duocast/N-cast is not employed, the wireless retransmission capacity that is needed to support such fast loops often is excessive, typically over 100x that actually used for retransmission (i.e., providing for seven retries per transaction when the mean number used is only 0.06 retries).

4.7. RPL applicability per communication paradigm

To match the requirements above, RPL provides a number of RPL Modes of Operation (MOP):

No downward route: defined in [I-D.ietf-roll-rpl], section 6.3.1, MOP of 0. This mode allows only upward routing, that is from nodes (devices) that reside inside the RPL network toward the outside via the DODAG root.

Non-storing mode: defined in [I-D.ietf-roll-rpl], section 6.3.1, MOP of 1. This mode improves MOP 0 by adding the capability to use source routing from the root towards registered targets within the instance DODAG.

Storing mode without multicast support: defined in [I-D.ietf-roll-rpl], section 6.3.1, MOP of 2. This mode improves MOP 0 by adding the capability to use stateful routing from the root towards registered targets within the instance DODAG.
Storing mode with link-scope multicast DAO: defined in [I-D.ietf-roll-rpl] section 9.10, this mode improves MOP 2 by adding the capability to send Destination Advertisements to all nodes over a single Layer 2 link (e.g., a wireless hop) and enables line-of-sight direct communication.

Storing mode with multicast support: defined in [I-D.ietf-roll-rpl], Mode-of-operation (MOP) of 3. This mode improves MOP 2 by adding the capability to register multicast groups and perform multicast forwarding along the instance DODAG (or a spanning subtree within the DODAG).

Reactive: defined in [I-D.ietf-roll-p2p-rpl], the reactive mode creates on-demand additional DAGs that are used to reach a given node acting as DODAG root within a certain number of hops. This mode can typically be used for an ad-hoc closed-loop communication.

The RPL MOP that can be applied for a given flow depends on the communication paradigm. It must be noted that a DODAG that is used for PS traffic can also be used for SS traffic since MOP 2 extends MOP 0, and that a DODAG that is used for P2MP distribution can also be used for downward PS since MOP 3 extends MOP 2.

On the other hand, an Objective Function (OF) that optimizes metrics for a pure upwards DODAG might differ from the OF that optimizes a mixed upward and downward DODAG.

As a result, it can be expected that different RPL instances are installed with different OFs, different channel allocations, etc., that result in different routing and forwarding topologies, sometimes with differing delay vs. energy profiles, optimized separately for the different flows at hand.
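
The "improves MOP n" relationships among the numbered modes listed above can be captured in a small lookup. The following Python sketch is illustrative only: the names EXTENDS and can_carry are hypothetical and are not defined by RPL, and the sketch covers only the numbered modes (0 through 3), not the link-scope multicast DAO variant or the reactive mode.

```python
# Direct "improves" relationships as stated in this section:
# each key is a MOP, each value is the MOP it extends.
# Table and function names are illustrative assumptions, not RPL-defined.
EXTENDS = {1: 0, 2: 0, 3: 2}


def can_carry(dodag_mop: int, required_mop: int) -> bool:
    """Return True if a DODAG operating in dodag_mop offers at least the
    capabilities of required_mop, by following the extension chain."""
    mop = dodag_mop
    while mop != required_mop:
        if mop not in EXTENDS:
            return False
        mop = EXTENDS[mop]
    return True


print(can_carry(2, 0))  # storing DODAG reused for upward-only traffic
print(can_carry(3, 2))  # multicast-capable DODAG reused for storing-mode traffic
print(can_carry(1, 2))  # non-storing mode does not extend storing mode
```

Following the chain rather than comparing MOP numbers keeps MOP 1 and MOP 2 distinct: both extend MOP 0, but neither extends the other.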
This can be broadly summarized in the following table:

+---------------------+------------+-----------------------------------+
| Paradigm\RPL MOP    | RPL spec   | Mode of operation                 |
+=====================+============+===================================+
| Peer-to-peer        | RPL P2P    | reactive (on-demand)              |
+---------------------+------------+-----------------------------------+
| P2P line-of-sight   | RPL base   | 2 (storing) with multicast DAO    |
+---------------------+------------+-----------------------------------+
| P2MP distribution   | RPL base   | 3 (storing with multicast)        |
+---------------------+------------+-----------------------------------+
| Publish-subscribe   | RPL base   | 1 or 2 (non-storing or storing)   |
+---------------------+------------+-----------------------------------+
| Source-sink         | RPL base   | 0 (no downward route)             |
+---------------------+------------+-----------------------------------+
| N-cast publish      | RPL base   | 0 (no downward route)             |
+---------------------+------------+-----------------------------------+

Figure 4: RPL applicability per communication paradigm

5. RPL profile

5.1. Use for process control

This section outlines a RPL profile for a representative deployment in a process control application. Process monitoring without control is typically less demanding, so a subset of this profile generally will suffice.

5.2. RPL features

5.2.1. Storing vs. non-storing mode

RPL operation is defined for a single RPL instance. However, multiple RPL instances can be supported in multi-service networks where different applications may require the use of different routing metrics and constraints, e.g., a network carrying both safety and non-safety control and monitoring traffic.
In general, storing mode is required for high-reporting-rate devices (where "high rate" is with respect to the underlying link data conveyance capability). Such devices, in the absence of path failure, are typically only one hop from the LBR(s) that convey their messaging to other parts of the system. Fortunately, in such cases, the routing tables required by such nodes are small, even when they include information on DODAGs that are used as backup alternate routes.

In general, devices which communicate with LBRs through a chain of intermediary devices will use storing mode for their upward DODAGs, but will use non-storing mode for downward DODAGs for messaging that they route further into the LLN. However, routers that provide downward forwarding for PS messaging addressed to controllers within the LLN (which is expected to be a rare occurrence) will use storing mode for those forwarding paths, so that timely, destination-constrained forwarding of such recurring messaging does not overload the routing node(s) and their downstream subnets.

5.2.2. DAO policy

Two-way communication is a requirement in industrial automation systems. As a result, nodes SHOULD send DAO messages to establish downward paths from the root to themselves.

5.2.3. Path metrics

RPL relies on an Objective Function for selecting parents and computing path costs and rank. This objective function is decoupled from the core RPL mechanisms and also from the metrics in use in the network. Two objective functions for RPL have been defined at the time of this writing, OF0 and MRHOF, both of which define the selection of a preferred parent and backup parents, and are suitable for industrial automation network deployments.
Neither of the currently defined objective functions supports multiple metrics that might be required in heterogeneous industrial automation networks (e.g., networks composed of devices with different energy and timeliness-of-communication constraints). Additional objective functions specifically designed for such networks may be defined in companion RFCs.

5.2.4. Objective functions

5.2.5. DODAG repair

5.2.6. Security

Industrial automation network deployments typically operate in areas that provide limited physical security (relative to the risk of attack). For this reason, the link layer, transport layer and application layer technologies utilized within such networks typically provide security mechanisms to ensure authentication, confidentiality, integrity, timeliness and freshness. As a result, such deployments may not need to implement RPL's security mechanisms and could rely on link layer and higher layer security features.

5.3. RPL options

5.4. Recommended configuration defaults and ranges

5.4.1. Trickle parameters

Trickle was designed to be density-aware and perform well in networks characterized by a wide range of node densities. The combination of DIO packet suppression and adaptive timers for sending updates allows Trickle to perform well in both sparse and dense environments.

5.4.2. Other parameters

5.4.3. Additional configuration recommendations

6. Other related protocols

7. Manageability

Network manageability is a critical aspect of smart grid network deployment and operation. With millions of devices participating in the smart grid network, many requiring real-time reachability, automatic configuration, and lightweight network health monitoring and management are crucial for achieving network availability and efficient operation.
RPL enables automatic and consistent configuration of RPL routers through parameters specified by the DODAG root and disseminated through DIO packets. The use of Trickle for scheduling DIO transmissions ensures lightweight yet timely propagation of important network and parameter updates and allows network operators to choose the trade-off point they are comfortable with respect to overhead vs. reliability and timeliness of network updates.

The metrics in use in the network along with the Trickle Timer parameters used to control the frequency and redundancy of network updates can be dynamically varied by the root during the lifetime of the network. To that end, all DIO messages SHOULD contain a Metric Container option for disseminating the metrics and metric values used for DODAG setup. In addition, DIO messages SHOULD contain a DODAG Configuration option for disseminating the Trickle Timer parameters throughout the network.

The possibility of dynamically updating the metrics in use in the network as well as the frequency of network updates allows deployment characteristics (e.g., network density) to be discovered during network bring-up and to be used to tailor network parameters once the network is operational rather than having to rely on precise pre-configuration. This also allows the network parameters and the overall routing protocol behavior to evolve during the lifetime of the network.

RPL specifies a number of variables and events that can be tracked for purposes of network fault and performance monitoring of RPL routers. Depending on the memory and processing capabilities of each smart grid device, various subsets of these can be employed in the field.

8. IANA considerations

This specification has no requirement on IANA.

9. Security considerations

This document does not specify operations that could introduce new threats. Security considerations for RPL deployments are to be developed in accordance with recommendations laid out in, for example, [I-D.tsao-roll-security-framework].

Industrial automation networks are subject to stringent security requirements as they are considered a critical infrastructure component. At the same time, since they are composed of large numbers of resource-constrained devices inter-connected with limited-throughput links, many available security mechanisms are not practical for use in such networks. As a result, the choice of security mechanisms is highly dependent on the device and network capabilities characterizing a particular deployment.

In contrast to other types of LLNs, in industrial automation networks centralized administrative control and access to a permanent secure infrastructure is available. As a result, link-layer, transport-layer and/or application-layer security mechanisms are typically in place and may make use of RPL's secure mode unnecessary.

10. Acknowledgements

11. References

11.1. Normative References

[RFC2119]
   Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

11.2. Informative References

[I-D.ietf-roll-of0]
   Thubert, P., "RPL Objective Function Zero", draft-ietf-roll-of0-20 (work in progress), September 2011.

[I-D.ietf-roll-p2p-rpl]
   Goyal, M., Baccelli, E., Philipp, M., Brandt, A., Cragie, R., and J. Martocci, "Reactive Discovery of Point-to-Point Routes in Low Power and Lossy Networks", draft-ietf-roll-p2p-rpl-04 (work in progress), July 2011.

[I-D.ietf-roll-rpl]
   Winter, T., Thubert, P., Brandt, A., Clausen, T., Hui, J., Kelsey, R., Levis, P., Pister, K., Struik, R., and J.
   Vasseur, "RPL: IPv6 Routing Protocol for Low power and Lossy Networks", draft-ietf-roll-rpl-19 (work in progress), March 2011.

[I-D.ietf-roll-terminology]
   Vasseur, J., "Terminology in Low power And Lossy Networks", draft-ietf-roll-terminology-06 (work in progress), September 2011.

[I-D.tsao-roll-security-framework]
   Tsao, T., Alexander, R., Daza, V., and A. Lozano, "A Security Framework for Routing over Low Power and Lossy Networks", draft-tsao-roll-security-framework-02 (work in progress), March 2010.

[RFC5548]
   Dohler, M., Watteyne, T., Winter, T., and D. Barthel, "Routing Requirements for Urban Low-Power and Lossy Networks", RFC 5548, May 2009.

[RFC5673]
   Pister, K., Thubert, P., Dwars, S., and T. Phinney, "Industrial Routing Requirements in Low-Power and Lossy Networks", RFC 5673, October 2009.

[RFC5826]
   Brandt, A., Buron, J., and G. Porcu, "Home Automation Routing Requirements in Low-Power and Lossy Networks", RFC 5826, April 2010.

[RFC5867]
   Martocci, J., De Mil, P., Riou, N., and W. Vermeylen, "Building Automation Routing Requirements in Low-Power and Lossy Networks", RFC 5867, June 2010.

11.3. External Informative References

[HART]
   www.hartcomm.org, "Highway Addressable Remote Transducer, a group of specifications for industrial process and control devices administered by the HART Foundation".

[ISA100.11a]
   ISA, "ISA100, Wireless Systems for Automation", May 2008, <http://www.isa.org/Community/SP100WirelessSystemsforAutomation>.

Authors' Addresses

Tom Phinney (editor)
consultant
5012 W. Torrey Pines Circle
Glendale, AZ 85308-3221
USA

Phone: +1 602 938 3163
Email: tom.phinney@cox.net

Pascal Thubert
Cisco Systems
Village d'Entreprises Green Side
400, Avenue de Roumanille
Batiment T3
Biot - Sophia Antipolis 06410
FRANCE

Phone: +33 497 23 26 34
Email: pthubert@cisco.com

Robert Assimiti
Nivis
1000 Circle 75 Parkway SE, Ste 300
Atlanta, GA 30339
USA

Phone: +1 678 202 6859
Email: robert.assimiti@nivis.com