Internet Draft                                        David L. Black
Document: draft-black-rdma-concerns-00.txt                       EMC
Expires: November 2002                              Michael F. Speer
                                                                 Sun
                                                     John Wroclawski
                                                                 MIT
                                                           June 2002


                         DDP and RDMA Concerns


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."
   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This draft describes technical concerns that should be considered
   in the design of standardized RDMA and DDP protocols/mechanisms for
   use with Internet transport protocols.  This draft was written to
   provide input to the proposed new Remote Direct Data Placement
   (rddp) WG, and is not intended for eventual publication as an RFC.

Table of Contents

   1. Overview
   2. Conventions used in this document
   3. Architectural Concerns
      3.1 Buffer Management
      3.2 Reliability
   4. Memory is more general than Transport Buffers
      4.1 Overwrites
      4.2 Concurrent Operations to the Same Memory
      4.3 Completions and Ordering
      4.4 Transfer Granularity
   5. Security Considerations
   References
   Acknowledgements
   Authors' Addresses

1. Overview

   A new effort to standardize RDMA (Remote Direct Memory Access) and
   DDP (Direct Data Placement) protocols/mechanisms for Internet
   transport protocols will take place in the proposed IETF Remote
   Direct Data Placement (rddp) WG.  This draft describes technical
   concerns that should be addressed in the design and
   standardization of these protocols.
   A basic understanding of RDMA and DDP is assumed; although a brief
   introduction is included in this section, readers unfamiliar with
   these concepts may wish to refer to [Bailey-arch] and [Romanow-ps]
   for more background.

   Both Direct Data Placement (DDP) and Remote Direct Memory Access
   (RDMA) have the goal of eliminating copies between the protocol
   stack and application buffers at the receiver.  For example, when a
   4 kilobyte (kB) file or disk block is retrieved, most operating
   systems expect the resulting block to be in 4 kB of contiguous
   memory aligned to a 4 kB boundary, but most network interfaces do
   not deliver received data in this fashion.  As a result, a copy is
   required to produce an aligned 4 kB block of data from the data
   delivered by the network interface.  This copy has undesirable
   performance impacts; the goal of DDP and RDMA is to enable
   elimination of this copy in an application- and protocol-independent
   fashion.  The basic concept is that the sender identifies data to be
   placed directly into application buffers and transmits that
   identification with the data, so that the receiver can place the
   data directly into application buffers when it is received.

   DDP is envisioned to share network transport buffers with
   applications, but to use application-specified tags and offsets to
   select buffers for use on receive.  The primary purposes of this
   information are to separate application data from headers and to
   deal with applications that return data in unpredictable orders
   (e.g., the results of concurrent file and disk operations may be
   returned to the invoker in arbitrary order).  One way to view DDP
   on the wire is that it annotates (or "decorates") data that would
   have been sent anyway.

   RDMA uses DDP or a DDP-like mechanism to implement remote read and
   write operations on memory regions explicitly exported by end
   systems.
   A tag is used to designate a memory region, and an offset is used
   to indicate the address within that region.  RDMA differs from DDP
   in that it provides a memory abstraction rather than a transport
   buffer abstraction.  This raises concerns based on the ways in
   which transport buffers differ from memory in general (see Section
   4).  In addition, the system coupling over a potentially unreliable
   network implied by DDP and RDMA raises several architectural
   concerns (see Section 3).

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC 2119],
   although they are used here to describe requirements on protocol
   development and standardization rather than on protocol
   implementations.

3. Architectural Concerns

   Both DDP and RDMA expose memory resources on the receiver to one or
   more potentially untrustworthy senders over a potentially
   unreliable network.  This has a number of architectural
   implications, particularly for resource management.

3.1 Buffer Management

   Traditional network stacks use a pool of interchangeable (also
   known as anonymous) buffers to hold data received from the network.
   By using specific, identifiable application buffers, DDP and RDMA
   make the memory used for specific receive operations identifiable,
   and may cause protocols to devote more resources to the receive
   function than might otherwise be the case.  Where DDP and/or RDMA
   are used effectively, the actual resource demand on the system may
   be lessened (e.g., because applications only expose memory that is
   in their working set), but it is necessary to anticipate
   applications that use DDP and RDMA in ways that increase resource
   demands, and to take appropriate precautions to limit system
   degradation.
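   To make the tag/offset model and the resource concern above more
   concrete, the following is a minimal Python sketch of receiver-side
   placement.  It assumes a simple table that maps each tag to one
   exported region; the names RegionTable, export, unexport, place, and
   the per-owner quota_bytes limit are hypothetical illustrations, not
   part of any DDP or RDMA specification.  It shows two things: every
   placement is bounds-checked against the region its tag designates,
   and the total memory a single application may export is capped.

```python
from dataclasses import dataclass, field


class PlacementError(Exception):
    """Raised when received data cannot be placed safely."""


@dataclass
class Region:
    owner: str       # hypothetical identifier of the exporting application
    buf: bytearray   # the application memory exported under one tag


@dataclass
class RegionTable:
    quota_bytes: int                                # hypothetical per-owner export limit
    regions: dict = field(default_factory=dict)     # tag -> Region
    exported: dict = field(default_factory=dict)    # owner -> bytes currently exported
    next_tag: int = 1

    def export(self, owner: str, size: int) -> int:
        """Expose `size` bytes of `owner`'s memory; return the tag peers will use."""
        used = self.exported.get(owner, 0)
        if used + size > self.quota_bytes:
            # Section 3.1 concern: cap how much memory one application can tie up.
            raise PlacementError(f"{owner} would exceed its export quota")
        tag = self.next_tag
        self.next_tag += 1
        self.regions[tag] = Region(owner, bytearray(size))
        self.exported[owner] = used + size
        return tag

    def unexport(self, tag: int) -> None:
        """Withdraw a region; later data carrying this tag is rejected."""
        region = self.regions.pop(tag)
        self.exported[region.owner] -= len(region.buf)

    def place(self, tag: int, offset: int, data: bytes) -> None:
        """Write a received payload straight into the region named by `tag`."""
        region = self.regions.get(tag)
        if region is None:
            raise PlacementError(f"unknown or withdrawn tag {tag}")
        if offset < 0 or offset + len(data) > len(region.buf):
            # Confine every write to its own region (see also Section 4.1).
            raise PlacementError("placement falls outside the exported region")
        region.buf[offset:offset + len(data)] = data


table = RegionTable(quota_bytes=8192)
tag = table.export("app-a", 4096)      # e.g. one aligned 4 kB block
table.place(tag, 512, b"\xab" * 512)   # lands at bytes 512..1023 of the block
```

   Sequential tags are used here only for brevity; how tags are chosen
   and protected is a separate question, tied to the access control
   concerns in Section 5.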
3.2 Reliability

   RDMA is motivated by experience with both local DMA and transfers
   over reliable channels; this experience will not be completely
   applicable to RDMA over IP networks.  Local DMA provides an extreme
   example, in that a local DMA failure is usually caused by a
   hardware problem that often results in the hardware being
   considered to have failed.  In contrast, RDMA over IP must deal with
   a variety of "stupid IP network tricks" (e.g., packet loss,
   duplication, reordering, and variable delay) as part of its normal
   operation.  Channels are a less extreme example: channel controllers
   must expect occasional channel failures and be prepared to handle
   them, as multipathing software for disk storage access does.

   This set of concerns is roughly analogous to the reliability
   difference between local and remote procedure calls and its impact
   on distributed system design [need to add a reference here].  The
   impact of the difference in reliability between local DMA and/or
   channels and RDMA needs to be considered as part of any
   specification effort, but may be best addressed in applicability
   statements rather than in the core protocol specifications.

4. Memory is more general than Transport Buffers

   The following subsections describe concerns arising from the fact
   that memory that can be read and/or written is a more general and
   capable abstraction than a transport buffer.

4.1 Overwrites

   A transport buffer is written exactly once, when the data is
   received; in contrast, memory can be written multiple times.  This
   creates the opportunity for received DDP and RDMA data to overwrite
   other data, including previously received data (which may or may
   not have been transferred to the application(s)).
   DDP and RDMA specifications MUST contain mechanisms to prevent
   overwrites from impairing system integrity and to isolate the
   effect of overwrites so that interference among otherwise unrelated
   applications is prevented.

4.2 Concurrent Operations to the Same Memory

   If a remote (or local) write takes place concurrently with a read
   of the same memory, the read may return an arbitrary mix of the old
   and new contents of the memory.  If a remote (or local) write takes
   place concurrently with another write to the same memory, the
   resulting memory contents may be an arbitrary mix of the data from
   the two writes.  These results are generally considered undesirable
   and should be avoided.  DDP and RDMA specifications must consider
   how these situations are to be avoided (e.g., application-level
   synchronization may be required), so that at worst they occur only
   as the result of application errors in using DDP and RDMA.

4.3 Completions and Ordering

   RDMA Read and Write operations are asynchronous with respect to the
   protocol layers above RDMA; hence, completion mechanisms are
   necessary to enable applications to determine when RDMA operations
   have completed, although these mechanisms need not be invoked for
   every RDMA operation.  In addition, an RDMA specification MUST
   state which assumptions an application may and may not make about
   the state of "prior" RDMA operations based on observing the
   completion of a specific RDMA operation.  The word "prior" is in
   quotes because an RDMA specification will need to define it as part
   of specifying which inferences about the completion of "prior"
   operations are permissible; the definition is likely to involve a
   partial order.

   Fence and stream abstractions, which respectively enforce and
   prevent ordering, MAY be included in RDMA and DDP specifications,
   but are not required.
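   One way to picture the partial order mentioned above is the
   following Python sketch.  It is purely a hypothetical model, not a
   proposal from this draft: operations issued on the same stream are
   ordered, operations on different streams are not, and an operation
   issued with a fence is ordered after every operation issued before
   it.  Assuming completions are only reported in an order consistent
   with this model, observing the completion of one operation lets the
   application infer completion of exactly the operations that precede
   it.  The names OrderingModel, issue, and prior are illustrative
   assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class OrderingModel:
    """Hypothetical ordering model, for illustration only:
    - operations issued on the same stream are ordered;
    - operations on different streams are not ordered (streams prevent ordering);
    - a fenced operation is ordered after every operation issued before it.
    """
    preds: dict = field(default_factory=dict)           # op -> direct predecessors
    last_in_stream: dict = field(default_factory=dict)  # stream -> most recent op
    issued: list = field(default_factory=list)          # all ops, in issue order

    def issue(self, op: str, stream: int, fence: bool = False) -> None:
        # A fence makes every previously issued op a predecessor.
        direct = set(self.issued) if fence else set()
        # Within a stream, each op follows the previous op on that stream.
        if stream in self.last_in_stream:
            direct.add(self.last_in_stream[stream])
        self.preds[op] = direct
        self.last_in_stream[stream] = op
        self.issued.append(op)

    def prior(self, op: str) -> set:
        """Ops that precede `op` in the partial order (transitive closure)."""
        seen, todo = set(), list(self.preds[op])
        while todo:
            p = todo.pop()
            if p not in seen:
                seen.add(p)
                todo.extend(self.preds[p])
        return seen


m = OrderingModel()
m.issue("write-1", stream=0)
m.issue("write-2", stream=1)
m.issue("write-3", stream=0)
m.issue("read-1", stream=1, fence=True)
print(sorted(m.prior("write-3")))  # ['write-1']
print(sorted(m.prior("read-1")))   # ['write-1', 'write-2', 'write-3']
```

   In this model, completion of write-3 says nothing about write-2,
   because the two were issued on different streams; only the fence on
   read-1 ties all three earlier writes to it.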
4.4 Transfer Granularity

   IP transports include the functionality to bundle data so that a
   set of small user transfers is accomplished via a single larger
   transfer across the network and through the relevant portions of
   the protocol stacks.  By defining specific remote operations that
   an application may reasonably expect to complete in a timely
   fashion, RDMA may disrupt this behavior by requiring smaller
   transfers to be done promptly.  The potential inefficiencies of the
   resulting behavior for protocol stacks and networks have been known
   for a long time; see the discussion of the small-packet problem in
   [RFC 896].  Any RDMA specification MUST consider the ability to
   bundle operations and the potential performance impact of
   performing multiple smaller transfers in place of a single larger
   one.  This may also apply to DDP, but the first priority is that
   DDP SHOULD NOT cause major changes to the transmission behavior of
   any transport protocol to which it is applied, compared to the same
   stream without the DDP annotations (some degree of minor change is
   unavoidable due to the space consumed by the DDP annotations).

5. Security Considerations

   With the possible exception of the Completion and Ordering concerns
   described in Section 4.3, all of these concerns have security
   implications, in that failing to deal with them adequately may
   expose the system to attacks on its resources, correct operation,
   and/or integrity.

   When memory is accessible via the network, such access must be
   controlled, as allowing arbitrary access by untrusted entities
   discloses the contents of the memory (read access) and/or allows it
   to be corrupted (write access).  Specifically, it is necessary to
   provide mechanisms that enable applications to control RDMA and DDP
   access to their exported memory by both identity (RDMA and DDP) and
   type of access (read vs.
   write, for RDMA only); this inherently involves authentication of
   the principals granted access in order to distinguish authorized
   from unauthorized access.  Such authentication MAY be implemented
   outside the DDP and/or RDMA protocols (e.g., in the application or
   in a separate security protocol such as TLS or IPsec [citations]),
   provided that means are specified to securely couple the
   authorization of DDP and RDMA operations to the corresponding
   authentications.

References

   [Bailey-arch] Bailey, S., "The Architecture of Direct Data
                 Placement (DDP) And Remote Direct Memory Access
                 (RDMA) On Internet Protocols", Internet-Draft
                 draft-bailey-roi-ddp-rdma-arch-00.txt, Work in
                 Progress, February 2002.

   [Romanow-ps]  Romanow, A., Mogul, J., Talpey, T., and S. Bailey,
                 "RDMA over IP Problem Statement", Internet-Draft
                 draft-romanow-rdma-over-ip-problem-statement-00.txt,
                 Work in Progress, February 2002.

   [RFC 896]     Nagle, J., "Congestion Control in IP/TCP
                 Internetworks", RFC 896, January 1984.

   [RFC 2119]    Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

Acknowledgements

   This draft is based in part on a presentation and discussion at an
   end2end research group meeting at MIT in May 2002.  The authors
   thank the end2end RG for providing the opportunity and gratefully
   acknowledge the comments and suggestions of the participants.

Authors' Addresses

   David L. Black
   EMC Corporation
   42 South Street                      Phone: +1 (508) 249-6449
   Hopkinton, MA, 01748, USA            Email: black_david@emc.com

   Michael F. Speer
   Sun Microsystems, Inc.
   4150 Network Circle UMPK17-103       Phone: +1 (650) 786-6445
   Santa Clara, CA 95054                Email: michael.speer@sun.com

   John Wroclawski
   MIT Lab for Computer Science
   200 Technology Square                Phone: +1 (617) 253-7885
   Cambridge, MA 02139                  Email: jtw@lcs.mit.edu