idnits 2.17.00 (12 Aug 2021)

/tmp/idnits12735/draft-ietf-nfsv4-rfc5667bis-04.txt:

Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------

     No issues found here.

Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------

     No issues found here.

Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------

     No issues found here.

Miscellaneous warnings:
----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer.
     Otherwise, the disclaimer is needed and you can ignore this comment.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (January 20, 2017) is 1946 days in the past.  Is this
     intentional?

Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: draft-ietf-nfsv4-rfc5666bis has been published as
     RFC 8166

  == Outdated reference: draft-ietf-nfsv4-rpcrdma-bidirection has been
     published as RFC 8167

  ** Obsolete normative reference: RFC 5661 (Obsoleted by RFC 8881)

  == Outdated reference: draft-ietf-nfsv4-versioning has been published as
     RFC 8178

  -- Obsolete informational reference (is this intentional?): RFC 5667
     (Obsoleted by RFC 8267)


     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


Network File System Version 4                              C. Lever, Ed.
Internet-Draft                                                    Oracle
Obsoletes: 5667 (if approved)                           January 20, 2017
Intended status: Standards Track
Expires: July 24, 2017


     Network File System (NFS) Upper Layer Binding To RPC-Over-RDMA
                     draft-ietf-nfsv4-rfc5667bis-04

Abstract

   This document specifies Upper Layer Bindings of Network File System
   (NFS) protocol versions to RPC-over-RDMA.  Upper Layer Bindings are
   required to enable RPC-based protocols, such as NFS, to use Direct
   Data Placement on RPC-over-RDMA.  This document obsoletes RFC 5667.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 24, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Conveying NFS Operations On RPC-Over-RDMA
   3.  Upper Layer Binding For NFS Versions 2 And 3
   4.  Upper Layer Binding For NFS Version 4
   5.  Extending NFS Upper Layer Bindings
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
   Appendix A.  Changes Since RFC 5667
   Appendix B.  Acknowledgments
   Author's Address

1.  Introduction

   An RPC-over-RDMA transport, such as the one defined in
   [I-D.ietf-nfsv4-rfc5666bis], may employ direct data placement to
   convey data payloads associated with RPC transactions.  To enable
   successful interoperation, RPC client and server implementations must
   agree on which XDR data items in which RPC procedures are eligible
   for direct data placement (DDP).

   This document contains material required of Upper Layer Bindings, as
   specified in [I-D.ietf-nfsv4-rfc5666bis], for the following NFS
   protocol versions:

   o  NFS Version 2 [RFC1094]

   o  NFS Version 3 [RFC1813]

   o  NFS Version 4.0 [RFC7530]

   o  NFS Version 4.1 [RFC5661]

   o  NFS Version 4.2 [RFC7862]

   Upper Layer Bindings specified in this document apply to all versions
   of RPC-over-RDMA.

2.  Conveying NFS Operations On RPC-Over-RDMA

   Definitions of terminology and a general discussion of how
   RPC-over-RDMA is used to convey RPC transactions can be found in
   [I-D.ietf-nfsv4-rfc5666bis].  In this section, these general
   principles are applied in the context of conveying NFS procedures on
   RPC-over-RDMA.  Some issues common to all NFS protocol versions are
   introduced.

2.1.  The Read List

   The Read list in each RPC-over-RDMA transport header represents a set
   of memory regions containing DDP-eligible NFS argument data.  Large
   data items, such as the data payload of an NFS version 3 WRITE
   procedure, can be referenced by the Read list.  The NFS server pulls
   such payloads from the client and places them directly into its own
   memory.

   Exactly which XDR data items may be conveyed in this fashion is
   detailed later in this document.

2.2.  The Write List

   The Write list in each RPC-over-RDMA transport header represents a
   set of memory regions that can receive DDP-eligible NFS result data.
   Large data items, such as the payload of an NFS version 3 READ
   procedure, can be referenced by the Write list.  The NFS server
   pushes such payloads to the client, placing them directly into the
   client's memory.

   Each Write chunk corresponds to a specific XDR data item in an NFS
   reply.  This document specifies how NFS client and server
   implementations identify the correspondence between Write chunks and
   XDR results.

   Exactly which XDR data items may be conveyed in this fashion is
   detailed later in this document.

2.3.  Long Calls And Replies

   Small RPC messages are conveyed using RDMA Send operations, which are
   of limited size.  If an NFS request is too large to be conveyed
   within the NFS server's responder inline threshold, and there are no
   DDP-eligible data items that can be removed, an NFS client must send
   the request in the form of a Long Call.  The entire NFS request is
   sent in a special Read chunk called a Position Zero Read chunk.

   If an NFS client determines that the maximum size of an NFS reply
   could be too large to be conveyed within its own responder inline
   threshold, it provides a Reply chunk in the RPC-over-RDMA transport
   header conveying the NFS request.  The server places the entire NFS
   reply in the Reply chunk.
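   The decisions described above can be illustrated with a short Python
   sketch showing how an NFS client might choose among an inline Call,
   Read chunks, and a Long Call, and whether to provide a Reply chunk.
   It is a minimal sketch under stated assumptions, not part of this
   specification: the names plan_chunks and ChunkPlan are hypothetical,
   all sizes are byte counts supplied by the caller, and the
   RPC-over-RDMA transport header and chunk list overhead are ignored.
   Treating a Call that still exceeds the threshold after its
   DDP-eligible data has moved into Read chunks as a Long Call is also
   an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class ChunkPlan:
    call_mode: str     # "inline", "read_chunks", or "long_call"
    reply_chunk: bool  # True when the client provides a Reply chunk


def plan_chunks(call_size: int, ddp_bytes: int, server_threshold: int,
                max_reply_size: int, client_threshold: int) -> ChunkPlan:
    """Hypothetical helper; every size is a byte count.

    call_size        -- fully encoded NFS Call, DDP-eligible data included
    ddp_bytes        -- DDP-eligible argument bytes that may move to Read chunks
    server_threshold -- the NFS server's responder inline threshold
    max_reply_size   -- estimated largest Reply, excluding Write chunk data
    client_threshold -- the client's own responder inline threshold
    Transport header and chunk list overhead are ignored (an assumption).
    """
    if call_size <= server_threshold:
        call_mode = "inline"
    elif call_size - ddp_bytes <= server_threshold:
        # Moving the DDP-eligible items into Read chunks makes the Call fit.
        call_mode = "read_chunks"
    else:
        # Still too large: send the whole Call in a Position Zero Read
        # chunk, i.e., as a Long Call.
        call_mode = "long_call"
    # A Reply that might not fit inline needs a Reply chunk.
    reply_chunk = max_reply_size > client_threshold
    return ChunkPlan(call_mode, reply_chunk)


# Example: a 64 KiB WRITE payload plus 200 bytes of other Call data.
print(plan_chunks(65736, 65536, 1024, 200, 1024))
```

   Under these assumptions, the example WRITE Call fits a 1024-byte
   threshold once its payload moves into a Read chunk, so no Long Call
   is needed.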
   When the RPC authentication flavor requires that DDP-eligible data
   items are never removed from RPC messages, an NFS client can provide
   both a Position Zero Read chunk and a Reply chunk for the same RPC.

   These special chunks are discussed in further detail in
   [I-D.ietf-nfsv4-rfc5666bis].

2.4.  Scatter-Gather Considerations

   A chunk typically corresponds to exactly one XDR data item.  Each
   Read chunk is represented as a list of segments at the same XDR
   Position.  Each Write chunk is represented as an array of segments.
   An NFS client thus has the flexibility to advertise a set of
   discontiguous memory regions in which to convey a single DDP-eligible
   XDR data item.

2.5.  DDP Eligibility Violations

   To report a DDP-eligibility violation, an NFS server MUST return one
   of:

   o  An RPC-over-RDMA message of type RDMA_ERROR, with the rdma_xid
      field set to the XID of the matching NFS Call, and the rdma_error
      field set to ERR_CHUNK; or

   o  An RPC message (via an RDMA_MSG message) with the xid field set to
      the XID of the matching NFS Call, the mtype field set to REPLY,
      the stat field set to MSG_ACCEPTED, and the accept_stat field set
      to GARBAGE_ARGS.

   Subsequent sections of this document describe further considerations
   particular to specific NFS protocols or procedures.

2.6.  Reply Size Estimation

   During the construction of each RPC Call message, an NFS client is
   responsible for allocating appropriate resources for receiving the
   matching Reply message.  A Reply buffer overrun can result in
   corruption of the Reply message or termination of the transport
   connection.  Therefore, reliable reply size estimation is necessary
   to ensure successful interoperation.

   In many cases, the Upper Layer Protocol's XDR definition provides
   enough information to enable the client to make a reliable prediction
   of the maximum size of the expected Reply message.
   Even when the result contains variable-size data items, the maximum
   size of the RPC Reply message can usually be estimated reliably, for
   example when:

   o  The client requests only a specific portion of an object (for
      example, using the "count" and "offset" fields in an NFS READ).

   o  The client has already cached the size of the whole object it is
      about to request (say, via a previous NFS GETATTR request).

   It is occasionally not possible to determine the maximum Reply
   message size based solely on the above criteria.  NFS client
   implementers can choose to provide the largest possible Reply buffer
   in those cases, based on, for instance, the largest possible NFS READ
   or WRITE payload (which is negotiated at mount time).

   In rare cases, a client may encounter a reply for which no a priori
   determination of reply size bound is possible.  The client SHOULD
   expect a transport error to indicate that it must either terminate
   that RPC transaction or retry it with a larger Reply chunk.

   The use of NFS COMPOUND operations raises the possibility of
   non-idempotent requests that combine a non-idempotent operation with
   an operation whose reply size is uncertain.  This causes potential
   difficulties with retrying the transaction.  Note, however, that many
   operations normally considered non-idempotent (e.g., WRITE, SETATTR)
   are actually idempotent.  Truly non-idempotent operations are quite
   unusual in COMPOUNDs that include operations with uncertain reply
   sizes.

3.  Upper Layer Binding For NFS Versions 2 And 3

   This Upper Layer Binding specification applies to NFS Version 2
   [RFC1094] and NFS Version 3 [RFC1813].  For brevity, in this section
   a "legacy NFS client" refers to an NFS client using NFS version 2 or
   NFS version 3 to communicate with an NFS server.  Likewise, a "legacy
   NFS server" is an NFS server communicating with clients using NFS
   version 2 or NFS version 3.
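   For a legacy NFS client, the reply size criteria of Section 2.6 can
   be sketched for an NFS version 3 READ.  This Python fragment is
   illustrative only and rests on assumptions that are not part of this
   specification: the function names are hypothetical, the caller
   supplies an upper bound ("overhead") for everything in the Reply
   other than the opaque file data, and when the file data is returned
   via a Write chunk the whole data item is simply excluded from the
   inline estimate.

```python
from typing import Optional


def xdr_opaque_size(n: int) -> int:
    """Bytes needed to XDR-encode variable-length opaque data of n bytes:
    a 4-byte length word plus the data rounded up to a multiple of 4."""
    return 4 + ((n + 3) & ~3)


def max_read_reply(overhead: int, count: int, offset: int = 0,
                   cached_size: Optional[int] = None,
                   data_in_write_chunk: bool = False) -> int:
    """Hypothetical upper bound, in bytes, on the inline part of a READ Reply.

    overhead            -- caller-supplied bound on everything in the Reply
                           except the opaque file data (an assumption)
    count, offset       -- the READ arguments
    cached_size         -- file size the client has already cached, if any
    data_in_write_chunk -- True if the file data is returned via a Write chunk
    """
    if data_in_write_chunk:
        # Simplification: treat the whole data item as moved out of line.
        return overhead
    # First criterion: the server returns at most "count" bytes.
    payload = count
    if cached_size is not None:
        # Second criterion: nor more than the cached size allows past offset.
        payload = min(payload, max(cached_size - offset, 0))
    return overhead + xdr_opaque_size(payload)


def needs_reply_chunk(max_reply: int, client_threshold: int) -> bool:
    # Provide a Reply chunk only when the bound exceeds the client's
    # responder inline threshold.
    return max_reply > client_threshold
```

   The comparison in needs_reply_chunk() mirrors the Reply chunk rule
   for legacy clients given in this section; the second criterion lets a
   client that has cached a small file's size avoid a Reply chunk even
   when it asks for a large "count".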
   The following XDR data items in NFS versions 2 and 3 are
   DDP-eligible:

   o  The opaque file data argument in the NFS WRITE procedure

   o  The pathname argument in the NFS SYMLINK procedure

   o  The opaque file data result in the NFS READ procedure

   o  The pathname result in the NFS READLINK procedure

   All other argument or result data items in NFS versions 2 and 3 are
   not DDP-eligible.

   A legacy server's response to a DDP-eligibility violation (described
   in Section 2.5) does not indicate to legacy clients whether the
   server has processed the arguments of the RPC Call, or whether the
   server has accessed or modified client memory associated with that
   RPC.

   A legacy NFS client determines the maximum reply size for each
   operation using the basic criteria outlined in Section 2.6.  Such
   clients provide a Reply chunk when the maximum possible reply size,
   exclusive of any data items represented by Write chunks, is larger
   than the client's responder inline threshold.

3.1.  Auxiliary Protocols

   NFS versions 2 and 3 are typically deployed with several other
   protocols, sometimes referred to as "NFS auxiliary protocols."  These
   are separate RPC programs that define procedures which are not part
   of the NFS version 2 or version 3 RPC programs.  These include:

   o  The MOUNT and NLM protocols, introduced in an appendix of
      [RFC1813]

   o  The NSM protocol, described in Chapter 11 of [NSM]

   o  The NFSACL protocol, which does not have a public definition
      (NFSACL here is treated as a de facto standard as there are
      several interoperating implementations).

   RPC-over-RDMA considers these programs as distinct Upper Layer
   Protocols [I-D.ietf-nfsv4-rfc5666bis].  To enable the use of these
   ULPs on an RPC-over-RDMA transport, an Upper Layer Binding
   specification is provided here for each.

3.1.1.  MOUNT, NLM, And NSM Protocols

   Typically, MOUNT, NLM, and NSM are conveyed via TCP, even in
   deployments where NFS operations are conveyed on RPC-over-RDMA.  When
   a legacy server supports these programs on RPC-over-RDMA, it
   advertises the port address via the usual rpcbind service [RFC1833].

   No operation in these protocols conveys a significant data payload,
   and the size of RPC messages in these protocols is uniformly small.
   Therefore, no XDR data items in these protocols are DDP-eligible.
   The largest variable-length XDR data item is an xdr_netobj.  In most
   implementations, this data item is not larger than 1024 bytes, making
   reliable reply size estimation straightforward using the criteria
   outlined in Section 2.6.

3.1.2.  NFSACL Protocol

   Legacy clients and servers that support the NFSACL RPC program
   typically convey NFSACL procedures on the same connection as the NFS
   RPC program.  This obviates the need for separate rpcbind queries to
   discover server support for this RPC program.

   ACLs are typically small, but even large ACLs must be encoded and
   decoded to some degree.  Thus, no data item in this Upper Layer
   Protocol is DDP-eligible.

   For procedures whose replies do not include an ACL object, the size
   of a reply is determined directly from the NFSACL program's XDR
   definition.

   There is no protocol-wide size limit for NFS version 3 ACLs, and
   there is no mechanism in either the NFSACL or NFS programs for a
   legacy client to ascertain the largest ACL a legacy server can store.
   Legacy client implementations should choose a maximum size for ACLs
   based on their own internal limits.  A recommended lower bound for
   this maximum is 32,768 bytes, though a larger Reply chunk (up to the
   negotiated rsize setting) can be provided.

4.  Upper Layer Binding For NFS Version 4

   This Upper Layer Binding specification applies to all protocols
   defined in NFS Version 4.0 [RFC7530], NFS Version 4.1 [RFC5661], and
   NFS Version 4.2 [RFC7862].

4.1.  DDP-Eligibility

   Only the following XDR data items in the COMPOUND procedure of all
   NFS version 4 minor versions are DDP-eligible:

   o  The opaque data field in the WRITE4args structure

   o  The linkdata field of the NF4LNK arm in the createtype4 union

   o  The opaque data field in the READ4resok structure

   o  The linkdata field in the READLINK4resok structure

   o  In minor version 2 and newer, the rpc_data field of the
      read_plus_content union (further restrictions on the use of this
      data item follow below).

4.1.1.  READ_PLUS Replies

   The NFS version 4.2 READ_PLUS operation returns a complex data type
   [RFC7862].  The rpr_contents field in the result of this operation is
   an array of read_plus_content unions, one arm of which contains an
   opaque byte stream (d_data).

   The size of d_data is limited to the value of the rpa_count field,
   but the protocol does not bound the number of elements which can be
   returned in the rpr_contents array.  In order to make the size of
   READ_PLUS replies predictable by NFS version 4.2 clients, the
   following restrictions are placed on the use of the READ_PLUS
   operation on RPC-over-RDMA transports:

   o  An NFS version 4.2 client MUST NOT provide more than one Write
      chunk for any READ_PLUS operation.  When providing a Write chunk
      for a READ_PLUS operation, an NFS version 4.2 client MUST provide
      a Write chunk that is either empty (which forces all result data
      items for this operation to be returned inline) or large enough to
      receive rpa_count bytes in a single element of the rpr_contents
      array.
   o  If the Write chunk provided for a READ_PLUS operation by an NFS
      version 4.2 client is not empty, an NFS version 4.2 server MUST
      use that chunk for the first element of the rpr_contents array
      that has an rpc_data arm.

   o  An NFS version 4.2 server MUST NOT return more than two elements
      in the rpr_contents array of any READ_PLUS operation.  It returns
      as much of the requested byte range as it can fit within these two
      elements.  If the NFS version 4.2 server has not asserted rpr_eof
      in the reply, the NFS version 4.2 client SHOULD send additional
      READ_PLUS requests for any remaining bytes.

4.2.  NFS Version 4 Reply Size Estimation

   An NFS version 4 client provides a Reply chunk when the maximum
   possible reply size is larger than the client's responder inline
   threshold.

   However, clients cannot reliably estimate the size of certain NFS
   version 4 data items, because the protocol specifies no size limit
   for these structures.  These include:

   o  The attrlist4 field

   o  Fields containing ACLs, such as fattr4_acl, fattr4_dacl, and
      fattr4_sacl

   o  Fields in the fs_locations4 and fs_locations_info4 data structures

   o  Opaque fields that pertain to pNFS layout metadata, such as
      loc_body, loh_body, da_addr_body, lou_body, lrf_body,
      fattr_layout_types, and fs_layout_types

4.2.1.  Reply Size Estimation For Minor Version 0

   The items enumerated above in Section 4.2 make it difficult to
   predict the maximum size of GETATTR replies that interrogate
   variable-length attributes.  As discussed in Section 2.6, client
   implementations can rely on their own internal architectural limits
   to bound the reply size, but such limits are not guaranteed to be
   reliable.

   If a client implementation is equipped to recognize that a transport
   error could mean that it provisioned an inadequately sized Reply
   chunk, it can retry the operation with a larger Reply chunk.
   Otherwise, the client must terminate the RPC transaction.

   It is best to avoid issuing single COMPOUNDs that contain both
   non-idempotent operations and operations where the maximum reply size
   cannot be reliably predicted.

4.2.2.  Reply Size Estimation For Minor Version 1 And Newer

   In NFS version 4.1 and newer minor versions, the csa_fore_chan_attrs
   argument of the CREATE_SESSION operation contains a
   ca_maxresponsesize field.  The value in this field can be taken as
   the absolute maximum size of replies generated by a replying NFS
   version 4 server.

   This value can be used in cases where it is not possible to estimate
   a reply size upper bound precisely.  In practice, objects such as
   ACLs, named attributes, layout bodies, and security labels are much
   smaller than this maximum.

4.3.  NFS Version 4 COMPOUND Requests

   The NFS version 4 COMPOUND procedure allows the transmission of more
   than one DDP-eligible data item per Call and Reply message.  An NFS
   version 4 client provides XDR Position values in each Read chunk to
   disambiguate which chunk is associated with which argument data item.
   However, NFS version 4 server and client implementations must agree
   in advance on how to pair Write chunks with returned result data
   items.

   The mechanism specified in Section 4.3.2 of
   [I-D.ietf-nfsv4-rfc5666bis] is applied here, with additional
   restrictions that appear below.  In the following list, an "NFS Read"
   operation refers to any NFS version 4 operation that has a
   DDP-eligible result data item (i.e., a READ, READ_PLUS, or READLINK
   operation).

   o  If an NFS version 4 client wishes all DDP-eligible items in an NFS
      reply to be conveyed inline, it leaves the Write list empty.

   o  The first chunk in the Write list MUST be used by the first NFS
      Read operation in an NFS version 4 COMPOUND procedure.  The next
      Write chunk is used by the next NFS Read operation, and so on.
   o  If an NFS version 4 client has provided a matching non-empty Write
      chunk, then the corresponding NFS Read operation MUST return its
      DDP-eligible data item using that chunk.

   o  If an NFS version 4 client has provided an empty matching Write
      chunk, then the corresponding NFS Read operation MUST return all
      of its result data items inline.

   o  If an NFS Read operation returns a union arm that does not contain
      a DDP-eligible result, and the NFS version 4 client has provided a
      matching non-empty Write chunk, an NFS version 4 server MUST
      return an empty Write chunk in that Write list position.

   o  If there are more NFS Read operations than Write chunks, then
      remaining NFS Read operations in an NFS version 4 COMPOUND that
      have no matching Write chunk MUST return their results inline.

4.3.1.  NFS Version 4 COMPOUND Example

   The following example shows a Write list with three Write chunks, A,
   B, and C.  The NFS version 4 server consumes the provided Write
   chunks by writing the results of the designated operations in the
   compound request (READ and READLINK) back to each chunk.

   Write list:

      A --> B --> C

   NFS version 4 COMPOUND request:

      PUTFH LOOKUP READ PUTFH LOOKUP READLINK PUTFH LOOKUP READ
                    |                   |                   |
                    v                   v                   v
                    A                   B                   C

   If the NFS version 4 client does not want to have the READLINK result
   returned via RDMA, it provides an empty Write chunk for buffer B to
   indicate that the READLINK result must be returned inline.

4.4.  NFS Version 4 Callback

   The NFS version 4 protocols support server-initiated callbacks to
   notify clients of events such as recalled delegations.

4.4.1.  NFS Version 4.0 Callback

   NFS version 4.0 implementations typically employ a separate TCP
   connection to handle callback operations, even when the forward
   channel uses an RPC-over-RDMA transport.

   No operation in the NFS version 4.0 callback RPC program conveys a
   significant data payload.
   Therefore, no XDR data items in this RPC program are DDP-eligible.

   A CB_RECALL reply is small and fixed in size.  The CB_GETATTR reply
   contains a variable-length fattr4 data item.  See Section 4.2.1 for a
   discussion of reply size prediction for this data item.

   An NFS version 4.0 client advertises netids and ad hoc port addresses
   for contacting its NFS version 4.0 callback service using the
   SETCLIENTID operation.

4.4.2.  NFS Version 4.1 Callback

   In NFS version 4.1 and newer minor versions, callback operations may
   appear on the same connection as is used for NFS version 4 forward
   channel client requests.  NFS version 4 clients and servers MUST use
   the mechanism described in [I-D.ietf-nfsv4-rpcrdma-bidirection] when
   backchannel operations are conveyed on RPC-over-RDMA transports.

   The csa_back_chan_attrs argument of the CREATE_SESSION operation
   contains a ca_maxresponsesize field.  The value in this field can be
   taken as the absolute maximum size of backchannel replies generated
   by a replying NFS version 4 client.

   There are no DDP-eligible data items in callback procedures defined
   in NFS version 4.1 or NFS version 4.2.  However, some callback
   operations, such as messages that convey device ID information, can
   be large, in which case a Long Call or Reply might be required.

   When an NFS version 4.1 client reports a backchannel
   ca_maxrequestsize that is larger than the connection's inline
   thresholds, the NFS version 4 client can support Long Calls.
   Otherwise, an NFS version 4 server MUST use Short messages to convey
   backchannel operations.

4.5.  Session-Related Considerations

   Typically, the presence of an NFS session [RFC5661] has no effect on
   the operation of RPC-over-RDMA.  None of the operations introduced to
   support NFS sessions contain DDP-eligible data items.
   There is no need to match the number of session slots with the
   number of available RPC-over-RDMA credits.

   However, there are some rare error conditions that require special
   handling when an NFS session is operating on an RPC-over-RDMA
   transport.  For example, a requester might receive, in response to an
   RPC request, an RDMA_ERROR message with an rdma_err value of
   ERR_CHUNK, or an RDMA_MSG message containing an RPC reply whose
   accept_stat field is GARBAGE_ARGS.  Within RPC-over-RDMA Version One,
   this class of error can be generated for two different reasons:

   o  An XDR error was detected while parsing the RPC-over-RDMA headers.

   o  An error occurred while sending the response because, for example,
      a necessary Reply chunk was not provided or the one provided was
      of insufficient length.

   These two situations, which arise due to incorrect implementations or
   underestimation of reply size, have different implications with
   regard to Exactly-Once Semantics.  An XDR error in decoding the
   request precludes the execution of the request on the responder, but
   failure to send a reply indicates that some or all of the operations
   were executed.

   In both instances, the client SHOULD NOT retry the operation without
   addressing reply resource inadequacy.  Such a retry can result in the
   same sort of error seen previously.  Instead, it is best to consider
   the operation as completed unsuccessfully and report an error to the
   consumer who requested the RPC.

   In addition, within the error response, the requester does not have
   the result of the execution of the SEQUENCE operation, which
   identifies the session, slot, and sequence id for the request that
   has failed.  The xid associated with the request, obtained from the
   rdma_xid field of the RDMA_ERROR or RDMA_MSG message, must be used to
   determine the session and slot for the request that failed, and the
   slot must be properly retired.
   If this is not done, the slot could be rendered permanently
   unavailable.

4.6.  Connection Keep-Alive

   NFS version 4 client implementations often rely on a transport-layer
   keep-alive mechanism to detect when an NFS version 4 server has
   become unresponsive.  When an NFS server is no longer responsive,
   client-side keep-alive terminates the connection, which in turn
   triggers reconnection and RPC retransmission.

   Some RDMA transports (such as Reliable Connections on InfiniBand)
   have no keep-alive mechanism.  Without a disconnect or new RPC
   traffic, such connections can remain alive long after an NFS server
   has become unresponsive.  Once an NFS client has consumed all
   available RPC-over-RDMA credits on that transport connection, it
   waits indefinitely for a reply before it can send another RPC
   request.

   NFS version 4 clients SHOULD reserve one RPC-over-RDMA credit to use
   for periodic server or connection health assessment.  This credit can
   be used to drive an RPC request on an otherwise idle connection,
   triggering either a quick affirmative server response or immediate
   connection termination.

5.  Extending NFS Upper Layer Bindings

   RPC programs such as NFS are required to have an Upper Layer Binding
   specification to interoperate on RPC-over-RDMA transports
   [I-D.ietf-nfsv4-rfc5666bis].  Via standards action, the Upper Layer
   Binding specified in this document can be extended to cover versions
   of the NFS version 4 protocol specified after NFS version 4 minor
   version 2, or separately published extensions to an existing NFS
   version 4 minor version, as described in [I-D.ietf-nfsv4-versioning].

6.  IANA Considerations

   NFS use of direct data placement introduces a need for an additional
   NFS port number assignment for networks that share traditional UDP
   and TCP port spaces with RDMA services.  The iWARP [RFC5041]
   [RFC5040] protocol is such an example (InfiniBand is not).
   NFS servers for versions 2 and 3 [RFC1094] [RFC1813] traditionally
   listen for clients on UDP and TCP port 2049, and additionally, they
   register these with the portmapper and/or rpcbind [RFC1833] service.
   However, [RFC7530] requires NFS version 4 servers to listen on TCP
   port 2049, and they are not required to register.

   An NFS version 2 or version 3 server supporting RPC-over-RDMA on such
   a network and registering itself with the RPC portmapper MAY choose
   an arbitrary port, or MAY use the alternative well-known port number
   for its RPC-over-RDMA service.  The chosen port MAY be registered
   with the RPC portmapper under the netid assigned by the requirement
   in [I-D.ietf-nfsv4-rfc5666bis].

   An NFS version 4 server supporting RPC-over-RDMA on such a network
   MUST use the alternative well-known port number for its RPC-over-RDMA
   service.  Clients SHOULD connect to this well-known port without
   consulting the RPC portmapper (as for NFS version 4 on TCP
   transports).

   The port number assigned to an NFS service over an RPC-over-RDMA
   transport is available from the IANA port registry [RFC3232].

7.  Security Considerations

   RPC-over-RDMA supports all RPC security models, including RPCSEC_GSS
   security [RFC2203] and transport-level security.  The choice of RDMA
   Read and RDMA Write to convey RPC arguments and results does not
   affect this, since it changes only the method of data transfer.
   Specifically, the requirements of [I-D.ietf-nfsv4-rfc5666bis] ensure
   that this choice does not introduce new vulnerabilities.

   Because this document defines only the binding of the NFS protocols
   atop [I-D.ietf-nfsv4-rfc5666bis], all relevant security
   considerations are described at that layer.

8.  References

8.1.  Normative References

   [I-D.ietf-nfsv4-rfc5666bis]
              Lever, C., Simpson, W., and T.
              Talpey, "Remote Direct Memory Access Transport for Remote Procedure Call, Version One", draft-ietf-nfsv4-rfc5666bis-09 (work in progress), January 2017.

   [I-D.ietf-nfsv4-rpcrdma-bidirection]
              Lever, C., "Bi-directional Remote Procedure Call On RPC-over-RDMA Transports", draft-ietf-nfsv4-rpcrdma-bidirection-06 (work in progress), January 2017.

   [RFC1833]  Srinivasan, R., "Binding Protocols for ONC RPC Version 2", RFC 1833, DOI 10.17487/RFC1833, August 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC2203]  Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol Specification", RFC 2203, DOI 10.17487/RFC2203, September 1997.

   [RFC5661]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010.

   [RFC7530]  Haynes, T., Ed. and D. Noveck, Ed., "Network File System (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, March 2015.

   [RFC7862]  Haynes, T., "Network File System (NFS) Version 4 Minor Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862, November 2016.

8.2.  Informative References

   [I-D.ietf-nfsv4-versioning]
              Noveck, D., "Rules for NFSv4 Extensions and Minor Versions", draft-ietf-nfsv4-versioning-09 (work in progress), December 2016.

   [NSM]      The Open Group, "Protocols for Interworking: XNFS, Version 3W", February 1998.

   [RFC1094]  Nowicki, B., "NFS: Network File System Protocol specification", RFC 1094, DOI 10.17487/RFC1094, March 1989.

   [RFC1813]  Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 Protocol Specification", RFC 1813, DOI 10.17487/RFC1813, June 1995.

   [RFC3232]  Reynolds, J., Ed., "Assigned Numbers: RFC 1700 is Replaced by an On-line Database", RFC 3232, DOI 10.17487/RFC3232, January 2002.

   [RFC5040]  Recio, R., Metzler, B., Culley, P., Hilland, J., and D. Garcia, "A Remote Direct Memory Access Protocol Specification", RFC 5040, DOI 10.17487/RFC5040, October 2007.

   [RFC5041]  Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct Data Placement over Reliable Transports", RFC 5041, DOI 10.17487/RFC5041, October 2007.

   [RFC5667]  Talpey, T. and B. Callaghan, "Network File System (NFS) Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667, January 2010.

Appendix A.  Changes Since RFC 5667

   Corrections and updates made necessary by new language in [I-D.ietf-nfsv4-rfc5666bis] have been introduced. For example, references to deprecated features of RPC-over-RDMA Version One, such as RDMA_MSGP, and the use of the Read list for handling RPC replies, have been removed. The term "mapping" has been replaced with the term "binding" or "Upper Layer Binding" throughout the document. Some material that duplicates what is in [I-D.ietf-nfsv4-rfc5666bis] has been deleted.

   Material required by [I-D.ietf-nfsv4-rfc5666bis] for Upper Layer Bindings that was not present in [RFC5667] has been added, including discussion of how each NFS version properly estimates the maximum size of RPC replies.

   Technical corrections have been made. For example, mentions of 12KB and 36KB inline thresholds have been removed. The reference to a non-existent NFS version 4 SYMLINK operation has been replaced with NFS version 4 CREATE(NF4LNK).

   The discussion of NFS version 4 COMPOUND handling has been completed. Some changes were made to the algorithm for matching DDP-eligible results to Write chunks.

   Requirements to ignore extra Read or Write chunks have been removed from the NFS version 2 and 3 Upper Layer Binding, as they conflict with [I-D.ietf-nfsv4-rfc5666bis].

   A complete discussion of reply size estimation has been introduced for all protocols covered by the Upper Layer Bindings in this document.

   The following additional improvements have been made, relative to [RFC5667]:

   o  An explicit discussion of NFS version 4.0 and NFS version 4.1 backchannel operation has replaced the previous treatment of callback operations.

   o  A binding for NFS version 4.2 has been added that includes discussion of new data-bearing operations like READ_PLUS.

   o  A section suggesting a mechanism for periodically assessing connection health has been introduced.

   o  Language inconsistent with or contradictory to [I-D.ietf-nfsv4-rfc5666bis] has been removed from Sections 2 and 3, and both Sections have been combined into Section 2 in the present document.

   o  Ambiguous or erroneous uses of RFC 2119 terms have been corrected.

   o  References to obsolete RFCs have been updated.

   o  An IANA Considerations Section has replaced the "Port Usage Considerations" Section.

   o  Code excerpts have been removed, and figures have been modernized.

Appendix B.  Acknowledgments

   The author gratefully acknowledges the work of Brent Callaghan and Tom Talpey on the original NFS Direct Data Placement specification [RFC5667]. The author also wishes to thank Bill Baker and Greg Marsden for their support of this work.

   Dave Noveck provided excellent review, constructive suggestions, and consistent navigational guidance throughout the process of drafting this document. Dave also contributed the text of Section 4.5.

   Thanks to Karen Deitke for her sharp observations about idempotency and about the clarity of the discussion of NFS COMPOUNDs.

   Special thanks go to Transport Area Director Spencer Dawkins, nfsv4 Working Group Chair Spencer Shepler, and nfsv4 Working Group Secretary Thomas Haynes for their support.

Author's Address

   Charles Lever (editor)
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI  48104
   USA

   Phone: +1 248 816 6463
   Email: chuck.lever@oracle.com