NFSv4                                                     D. Noveck, Ed.
Internet-Draft                                                       EMC
Intended status: Informational                                 P. Shivam
Expires: July 19, 2012                                          C. Lever
                                                                B. Baker
                                                                  ORACLE
                                                        January 16, 2012


 NFSv4.0 migration: Implementation experience and spec issues to resolve
                draft-dnoveck-nfsv4-migration-issues-02

Abstract

   The migration feature of NFSv4 provides for moving responsibility for a single filesystem from one server to another, without disruption to clients. Recent implementation experience has shown problems in the existing specification for this feature. This document discusses the issues which have arisen and explores the options available for curing the issues via clarification and correction of the NFSv4.0 specification.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 19, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
   Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions
   3.  Implementation Experience
     3.1.  Implementation issues
       3.1.1.  Failure to free migrated state on client reboot
       3.1.2.  Server reboots resulting in a confused lease situation
       3.1.3.  Client complexity issues
     3.2.  Sources of Protocol difficulties
       3.2.1.  Issues with nfs_client_id4 generation and use
       3.2.2.  Issues with lease proliferation
   4.  Issues to be resolved
     4.1.  Possible changes to nfs_client_id4 client-string
     4.2.  Possible changes to handle differing nfs_client_id4 string values
     4.3.  Other issues within migration-state sections
     4.4.  Issues within other sections
   5.  Proposed resolution of protocol difficulties
     5.1.  Proposed changes: nfs_client_id4 client-string
     5.2.  Client-string Models (AS PROPOSED)
       5.2.1.  Non-Uniform Client-string Model
       5.2.2.  Uniform Client-string Model
     5.3.  Proposed changes: merged (vs. synchronized) leases
     5.4.  Other proposed changes to migration-state sections
       5.4.1.  Proposed changes: Client ID migration
       5.4.2.  Proposed changes: Callback re-establishment
       5.4.3.  Proposed changes: NFS4ERR_LEASE_MOVED rework
     5.5.  Proposed changes to other sections
       5.5.1.  Proposed changes: callback update
       5.5.2.  Proposed changes: clientid4 handling
     5.6.  Migration, Replication and State (AS PROPOSED)
       5.6.1.  Migration and State
       5.6.2.  Replication and State
       5.6.3.  Notification of Migrated Lease
       5.6.4.  Migration and the Lease_time Attribute
   6.  Results of proposed changes
     6.1.  Results: Failure to free migrated state on client reboot
     6.2.  Results: Server reboots resulting in confused lease situation
     6.3.  Results: Client complexity issues
     6.4.  Result summary
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document is in the informational category, and while the facts it reports may have normative implications, any such normative significance reflects the readers' preferences.
   For example, we may report that the reboot of a client with migrated state results in state not being promptly cleared, and that this will prevent the granting of conflicting lock requests for at least the lease time. That is a fact. While it is to be expected that client and server implementers will judge this to be a situation that is best avoided, the judgment as to how pressing this issue should be considered is one for the reader, and eventually the nfsv4 working group, to make.

   We do explore possible ways in which such issues can be avoided, with minimal negative effects, in the expectation that the working group will choose to address these issues, but the choice of exactly how to address them is best given effect in a working group document.

2.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

   In the context of this informational document, these normative keywords will always occur in the context of a quotation, most often direct but sometimes indirect. The context will make it clear whether the quotation is from:

   o  The current definitive definition of the NFSv4.0 protocol, whether that is the original NFSv4.0 specification [RFC3530], the current pending draft of RFC3530bis expected to become the definitive definition of NFSv4.0 once certain procedural steps are taken [cur-v4.0-bis], or an eventual RFC3530bis RFC, taking over the role of definitive definition of NFSv4.0 from RFC3530.

      As the identity of that document may change during the lifetime of this document, we will often refer to the current or pending definition of NFSv4.0 and quote from portions of the documents that are identical among all existing drafts.
      Given that RFC3530 and all RFC3530bis drafts agree as to the issues under discussion, this should not cause undue difficulty. Note that, to simplify document maintenance, section names rather than section numbers are used when referring to sections in existing documents, so that only minimal changes will be necessary as the identity of the document defining NFSv4.0 changes.

   o  A proposed or possible text to serve as a replacement for the current definitive document text. Sometimes, a number of possible alternative texts may be listed and the benefits and detriments of each examined in turn.

3.  Implementation Experience

3.1.  Implementation issues

   Note that the examples below reflect current experience, which arises from clients implementing the recommendation to use different nfs_client_id4 id strings for different server addresses, i.e., using what is later referred to herein as the "non-uniform client-string model".

   This is simply because that is the experience implementers have had. The reader should not assume that in all cases this practice is the source of the difficulty. It may be so in some cases, but clearly it is not in all cases.

3.1.1.  Failure to free migrated state on client reboot

   The following sort of situation has proved troublesome:

   o  A client C establishes a clientid4 C1 with server ABC specifying an nfs_client_id4 with "id" value "C-ABC" and verifier 0x111.

   o  The client begins to access files in filesystem F on server ABC, resulting in generating stateids S1, S2, etc. under the lease for clientid C1. It may also access files on other filesystems on the same server.

   o  The filesystem is migrated from ABC to server XYZ. When transparent state migration is in effect, stateids S1 and S2 and clientid4 C1 are now available for use by client C at server XYZ. So far, so good.

   o  Client C reboots and attempts to access data on server XYZ, whether in filesystem F or another. It does a SETCLIENTID with an nfs_client_id4 with "id" value "C-XYZ" and verifier 0x112. There is thus no occasion to free stateids S1 and S2, since they are associated with a different client name, and so lease expiration is the only way that they can be gotten rid of.

   Note here that while it seems clear to us in this example that C-XYZ and C-ABC are from the same client, the server has no way to determine the structure of the "opaque" id. In the protocol, it really is opaque. Only the client knows which nfs_client_id4 values designate the same client on a different server.

3.1.2.  Server reboots resulting in a confused lease situation

   Further problems arise from scenarios like the following.

   o  Client C talks to server ABC using an nfs_client_id4 id like "C-ABC" and verifier v1. As a result, a lease with clientid4 c.i is established: {v1, "C-ABC", c.i}.

   o  fs_a1 migrates from server ABC to server XYZ along with its state. Now server XYZ also has a lease: {v1, "C-ABC", c.i}.

   o  Server ABC reboots.

   o  Client C talks to server ABC using an nfs_client_id4 id like "C-ABC" and verifier v1. As a result, a lease with clientid4 c.j is established: {v1, "C-ABC", c.j}.

   o  fs_a2 migrates from server ABC to server XYZ. Now server XYZ also has a lease: {v1, "C-ABC", c.j}.

   o  Now server XYZ has two leases that match {v1, "C-ABC", *}, when the protocol clearly assumes there can be only one.

   Note that if the client used "C" (rather than "C-ABC") as the nfs_client_id4 id string, the exact same situation would arise.

   One of the first cases in which this sort of situation has resulted in difficulties is in connection with doing a SETCLIENTID for callback update.
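
   The sequence above can be replayed against a toy model of a server's confirmed client records. This is a sketch under stated assumptions, not any implementation's data structure: the names ClientRecord, Server, establish, accept_migrated_lease, reboot and lookup are hypothetical, SETCLIENTID and SETCLIENTID_CONFIRM are collapsed into one step, and a lease is reduced to its {verifier, id, clientid4} triple. It shows that, after both migrations, a lookup keyed only on the nfs_client_id4 finds two leases on server XYZ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClientRecord:
    """A confirmed client record (lease) as seen by one server."""
    verifier: str       # client boot verifier from the nfs_client_id4
    client_string: str  # the opaque nfs_client_id4 "id"
    clientid: str       # server-assigned clientid4


@dataclass
class Server:
    name: str
    records: list = field(default_factory=list)

    def establish(self, verifier, client_string, clientid):
        """Hypothetical stand-in for a completed SETCLIENTID/SETCLIENTID_CONFIRM."""
        record = ClientRecord(verifier, client_string, clientid)
        self.records.append(record)
        return record

    def accept_migrated_lease(self, record):
        """Transparent state migration brings the lease over unchanged."""
        self.records.append(record)

    def reboot(self):
        """A server reboot discards every lease that server holds."""
        self.records.clear()

    def lookup(self, verifier, client_string):
        """All records matching {verifier, id}: the only search available to a
        request that carries just an nfs_client_id4 (e.g. a callback update)."""
        return [r for r in self.records
                if r.verifier == verifier and r.client_string == client_string]


abc, xyz = Server("ABC"), Server("XYZ")

xyz.accept_migrated_lease(abc.establish("v1", "C-ABC", "c.i"))  # fs_a1 moves
abc.reboot()
xyz.accept_migrated_lease(abc.establish("v1", "C-ABC", "c.j"))  # fs_a2 moves

print([r.clientid for r in xyz.lookup("v1", "C-ABC")])  # ['c.i', 'c.j']
```

   Substituting "C" for "C-ABC" throughout yields the same two matches, consistent with the earlier note that a uniform id string alone does not prevent this situation.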

   The SETCLIENTID for callback update only includes the nfs_client_id4, assuming there can only be one such record with a given nfs_client_id4 value. If there are multiple confirmed client records with identical nfs_client_id4 values, there is no way to map the callback update request to the correct client record.

   One possible accommodation for this particular issue that has been used is to add a RENEW operation along with SETCLIENTID (on a callback update) to disambiguate the client.

   When the client updates the callback info to the destination, the client would, by convention, send a compound like this:

      { RENEW clientid4, SETCLIENTID nfs_client_id4,verf,cb }

   The presence of the clientid4 in the compound would allow the server to differentiate among the various leases that it knows of, all with the same nfs_client_id4 value.

   While this would be a reasonable patch for an isolated protocol weakness, interoperable clients and servers would require that the protocol truly be updated to allow such a situation, specifically that of multiple clientid4's with the same nfs_client_id4 value. The protocol is currently designed and implemented assuming this can't happen. We need to either prevent the situation from happening, or fully adapt to the possibilities which can arise. See Section 4 for a discussion of such issues.

3.1.3.  Client complexity issues

   Consider the following situation:

   o  There is a set of clients C1 through Cn accessing servers S1 through Sm. Each server manages some significant number of filesystems, with the filesystem count L being significantly greater than m.

   o  Each client Cx will access a subset of the servers and so will have up to m clientid's, which we will call Cxy for server Sy.

   o  Now assume that, for load-balancing or other operational reasons, numbers of filesystems are migrated among the servers.
      As a result, each client-server pair will have up to m clientid's and each client will have up to m**2 clientids. If we add the possibility of server reboot, the only bound on a client's clientid count is L.

   Now, instead of a clientid4 identifying a client-server pair, we have many more entities for the client to deal with. In addition, it isn't clear how new state is to be incorporated in this structure.

   The limitations of the migrated state (inability to be freed on reboot) would argue against adding more such state, but trying to avoid that would run into its own difficulties. For example, a single lockowner string presented under two different clientids would appear as two different entities.

   Thus we have to choose between:

   o  indefinite prolongation of foreign clientid's even after all transferred state is gone.

   o  having multiple requests for the same lockowner-string-named entity carried on in parallel by separate identically named lockowners under different clientid4's.

   o  adding serialization at the lock-owner string level, in addition to that at the lockowner level.

   In any case, we have gone (in adding migration as it was described) from a situation in which:

   o  Each client has a single clientid4/lease for each server it talks to.

   o  Each client has a single nfs_client_id4 for each server it talks to.

   o  Every stateid can be mapped to an associated lease based on the server it was obtained from.

   to one in which:

   o  Each client may have multiple clientid4's for a single server.

   o  For each stateid, the client must separately record the clientid4 that it is assigned to, or it must manage separate "state blobs" for each fsid and map those to clientid4's.

   o  Before doing an operation that can result in a stateid, the client must either find a "state blob" based on fsid or create a new one, possibly with a new clientid4.

   o  There may be multiple clientid4's all connected to the same server and using the same nfs_client_id4.

   This sort of additional client complexity is troublesome and needs to be eliminated.

3.2.  Sources of Protocol difficulties

3.2.1.  Issues with nfs_client_id4 generation and use

   The current definitive definition of the NFSv4.0 protocol [RFC3530] and the current pending draft of RFC3530bis [cur-v4.0-bis] agree on this point. The section entitled "Client ID" says:

      The second field, id is a variable length string that uniquely defines the client.

   There are two possible interpretations of the phrase "uniquely defines" in the above:

   o  The relation between strings and clients is a function from such strings to clients, so that each string designates a single client.

   o  The relation between strings and clients is a bijection between such strings and clients, so that each string designates a single client and each client is named by a single string.

   The first interpretation would make these client-strings like phone numbers (a single person can have several), while the second would make them like social security numbers.

   Endless debate about the true meaning of "uniquely defines" in this context is quite possible but not very helpful. The following points should be noted, though:

   o  The second interpretation is more consistent with the way "uniquely defines" is used elsewhere in the spec.

   o  The spec as now written intends the first interpretation (or is internally inconsistent). In fact, it recommends, although it doesn't "RECOMMEND", that a single client have at least as many client-strings as server addresses that it interacts with.
      It says, in the third bullet point regarding construction of the string (which we shall henceforth refer to as client-string-BP3):

         The string should be different for each server network address that the client accesses, rather than common to all server network addresses.

   o  If internode interactions are limited to those between a client and its servers, there is no occasion for servers to be concerned with the question of whether two client-strings designate the same client, so there is no occasion for the difference in interpretation to matter.

   o  When transparent migration of client state occurs between two servers, it becomes very important to determine whether state on two different servers is for the same client.

   Given the need for the server to be aware of client identity with regard to migrated state, either client-string construction rules will have to change or there will be a need to work around current issues, or perhaps a combination of these two will be required. Later sections will examine the options and propose a solution.

   One consideration suggesting that this cannot remain exactly as it is today is that the current explanation for this behavior is not correct. The current definitive definition of the NFSv4.0 protocol [RFC3530] and the current pending draft of RFC3530bis [cur-v4.0-bis] agree on this point as well. The section entitled "Client ID" says:

      The reason is that it may not be possible for the client to tell if the same server is listening on multiple network addresses. If the client issues SETCLIENTID with the same id string to each network address of such a server, the server will think it is the same client, and each successive SETCLIENTID will cause the server to begin the process of removing the client's previous leased state.
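
   A simplified model of how a server classifies an incoming SETCLIENTID makes the flaw in this rationale visible. It is only a sketch with hypothetical names (Effect, setclientid_effect): it considers just the id string and the verifier, and omits the principal checks, unconfirmed records and SETCLIENTID_CONFIRM step that the full NFSv4.0 procedure involves.

```python
from enum import Enum


class Effect(Enum):
    NEW_CLIENT = "no confirmed record with this id: treat as a new client"
    SAME_INSTANCE = "same id, same verifier: existing leased state is kept"
    NEW_INSTANCE = "same id, new verifier: client rebooted, prior state to be released"


def setclientid_effect(confirmed, client_string, verifier):
    """Classify an incoming SETCLIENTID (simplified, hypothetical model).

    `confirmed` maps each nfs_client_id4 id string to the verifier of the
    confirmed record holding it. The network address the request arrived
    on is not an input: it plays no part in the decision.
    """
    if client_string not in confirmed:
        return Effect.NEW_CLIENT
    if confirmed[client_string] == verifier:
        return Effect.SAME_INSTANCE
    return Effect.NEW_INSTANCE


# One server with two network addresses; the client already holds a lease
# established with id "C" and verifier 0x111 through the first address.
confirmed = {"C": 0x111}

# The same id and verifier sent to the second address is indistinguishable
# from a duplicate request on the first address: nothing is removed.
print(setclientid_effect(confirmed, "C", 0x111))  # Effect.SAME_INSTANCE

# Only a new client instance (changed verifier) leads to removal of state.
print(setclientid_effect(confirmed, "C", 0x112))  # Effect.NEW_INSTANCE
```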

   In point of fact, a "SETCLIENTID with the same id string" sent to multiple network addresses will be treated as all from the same client, but will not "cause the server to begin the process of removing the client's previous leased state" unless the server believes it is a newer instance of the same client, i.e., if the id is the same and there is a different verifier. If the client does not reboot, the verifier should not change. If it does reboot, the verifier will change, and the server should "begin the process of removing the client's previous leased state".

   The situation of multiple SETCLIENTID requests received by a server on multiple network addresses is exactly the same, from the protocol design point of view, as when multiple (i.e., duplicate) SETCLIENTID requests are received by the server on a single network address. The same protocol mechanisms that prevent erroneous state deletion in the latter case prevent it in the former case. There is no reason for special handling of the multiple-network-appearance case in this regard.

3.2.2.  Issues with lease proliferation

   Lease proliferation is often felt to be a consequence of the client-string construction issues, and it is certainly the case that the two are closely connected, in that non-uniform client-strings make it impossible for the server to appropriately combine leases from the same client. See Section 5.2.1 for a discussion of non-uniform client-strings.

   However, even where the server could combine leases from the same client, it needs to be clear how and when it will do so, so that the client will be prepared. These issues will have to be addressed at various places in the spec.

   Addressing these issues in the spec could be enough only if we are prepared to do away with the "should" recommending non-uniform client-strings and replace it with a "should not" or even a "SHOULD NOT".
   Current client implementation patterns make this an unpalatable choice for use as a general solution, but it is reasonable to "RECOMMEND" this choice for a well-defined subset of clients. One alternative would be to create a way for the server to infer from client behavior which leases are held by the same client, and to use this information to do appropriate lease mergers. Prototyping and detailed specification work has shown that this could be done, but the resulting complexity is such that a better choice is to "RECOMMEND" use of the uniform model for clients supporting the migration feature.

4.  Issues to be resolved

4.1.  Possible changes to nfs_client_id4 client-string

   The fact that the reason given in client-string-BP3 is not valid makes the existing "should" untenable. We can do neither of the following:

   o  Keep a reason we know is invalid.

   o  Keep saying "should" without giving a reason.

   What are often presented as reasons that motivate use of the non-uniform model always turn out to be cases in which, if the uniform model were used, the server would treat a client which accesses that server via two different IP addresses as part of a single client, as it in fact is. This may be disconcerting to a client unaware that the two IP addresses connect to the same server. This is thus not a reason to use the non-uniform model, but rather an illustration of the fact that those using the uniform model must use server behavior to determine whether any trunking of IP addresses exists, as is described in Section 5.2.2.

   It is always possible that a valid new reason will be found, but so far none has been proposed. Given the history, the burden of proof should be on those asserting the validity of a proposed new reason.

   So we will assume for now that the "should" will have to go. The question is what to replace it with.

   o  We can't say "MUST NOT", despite the problems the non-uniform model raises for migration, since this is pretty late in the day for such a change. Many currently operating clients obey the existing "should". Similar considerations would apply for "SHOULD NOT" or "should not".

   o  Dropping client-string-BP3 entirely is a possibility but, given the context and history, it would just be a confusing version of "SHOULD NOT".

   o  Using "MAY" would clearly specify that both ways of doing this are valid choices for clients and that servers will have to deal with clients that make either choice.

   o  This might be modified by a "SHOULD" (or even a "MUST") for particular groups of clients.

   o  There will have to be some text explaining why a client might make either choice but, except for the particular cases referred to above, we will have to make sure that it is truly descriptive and not slanted in either direction.

4.2.  Possible changes to handle differing nfs_client_id4 string values

   Given the difficulties caused by having different nfs_client_id4 client-string values for the same client, we have two choices:

   o  Deprecate the existing treatment and essentially say that a client following it is on its own when doing migration.

   o  Introduce a way of having the client provide client identity information to the server, if it can be done compatibly while staying within the bounds of v4.0.

4.3.  Other issues within migration-state sections

   There are a number of issues where the existing text is unclear and/or wrong and needs to be fixed in some way.

   o  Lack of clarity in the discussion of moving clientids (as well as stateids) as part of moving state for migration.

   o  The discussion of synchronized leases is wrong in that there is no way to determine (in the current spec) when leases are for the same client, and also wrong in suggesting a benefit from leases synchronized at the point of transfer. What is needed is merger of leases, which is necessary to keep client complexity requirements from getting out of hand.

   o  Lack of clarity in the discussion of LEASE_MOVED handling.

4.4.  Issues within other sections

   There are a number of cases in which certain sections, not specifically related to migration, require additional clarification. This is generally because text that is clear in a context in which leases and clientids are created in one place and live there forever may need further refinement in the more dynamic environment that arises as part of migration.

   Some examples:

   o  Some people are under the impression that updating callback endpoint information for an existing client, which is part of the client's handling of migration, may cause the destination server to free existing state. Additions are needed to clarify the situation.

   o  The handling of the sets of clientid4's maintained by each server needs to be clarified. In particular, the issue of how the client adapts to the presumably independent and uncoordinated clientid4 sets needs to be clearly addressed.

   o  Statements regarding handling of invalid clientid4's need to be clarified and/or refined in light of the possibilities that arise due to lease motion and merger.

5.  Proposed resolution of protocol difficulties

5.1.  Proposed changes: nfs_client_id4 client-string

   We propose replacing client-string-BP3 with the following text, and adding the proposed Section 5.2 to provide implementation guidance.

   o  The string MAY be different for each server network address that the client accesses, rather than common to all server network addresses. The considerations that might influence a client to use different strings for each are explained in Section 5.2.

   o  Despite the use of the word "string" for this identifier, and the fact that using strings will often be convenient, it should be understood that the protocol defines this as opaque data. In particular, those receiving such an id should not assume that it will be in UTF-8 format, nor should they reject it if it is not.

5.2.  Client-string Models (AS PROPOSED)

   One particular aspect of the construction of the nfs_client_id4 string has proved recurrently troublesome. The client has a choice of:

   o  Presenting the same id string to each server address accessed. This is referred to as the "uniform client-string model" and is discussed in Section 5.2.2.

   o  Presenting a different id string to each server address accessed. This is referred to as the "non-uniform client-string model" and is discussed in Section 5.2.1.

   Construction of the client-string has been a troublesome issue because of the way in which the NFS protocols have evolved.

   o  NFSv3, as a stateless protocol, had no need to identify the state shared by a particular client-server pair. Thus there was no occasion to consider the question of whether a set of requests come from the same client, or whether two server IP addresses are connected to the same server. As the environment was one in which the user supplied the target server IP address as part of incorporating the remote filesystem in the client's file name space, there was no occasion to take note of server trunking. Within a stateless protocol, the situation was symmetrical. The client has no server identity information and the server has no client identity information.
o NFSv4.1 is a stateful protocol with full support for client and server identity determination. This enables the server to be aware when two requests come from the same client (they are on sessions sharing a clientid4) and the client to be aware when two server IP addresses are connected to the same server (they return the same server name in responding to an EXCHANGE_ID).

NFSv4.0 is unfortunately halfway between these two. The two client-string models have arisen in attempts to deal with the changing requirements of the protocol as implementation has proceeded and features that were not very substantial in [RFC3530] became more substantial.

o In the absence of any implementation of the fs_locations-related features (replication, referral, and migration), the situation is very similar to that of NFSv3, with the addition of state but with no concern for providing accurate client and server identity determination. This is the situation that gave rise to the non-uniform client-string model.

o In the presence of replication and referrals, the client may have occasion to take advantage of knowledge of server trunking information. Even more important, migration, by transferring state among servers, causes difficulties for the non-uniform client-string model, in that the two different client-strings sent to different IP addresses may wind up on the same IP address, adding confusion.

Both models have to deal with the asymmetry in identity information between client and server. Each seeks to make the client's and the server's views match. In the process, each encounters some combination of inelegant protocol features and/or implementation difficulties. The choice of which to use is up to the client implementer, and the sections below try to give some useful guidance.

5.2.1. Non-Uniform Client-string Model

The non-uniform client-string model is an attempt to handle these matters in NFSv4.0 client implementations in as NFSv3-like a way as possible.

For a client using the non-uniform model, all internal recording of clientid4 values is to include, whether explicitly or implicitly, the server IP address, so that one always has an (IP-address, clientid4) pair. Two such pairs from different server addresses are always distinct even when the clientid4 values are the same, as they may occasionally be. In this model, such equality is always treated as simple happenstance.

Making the client-string different on different servers means that a server has no way of tying together information from the same client and so will treat a single client as multiple clients, with a separate lease for each server network address. Since there is no way in the protocol for the client to determine if two network addresses are connected to the same server, the resulting lack of knowledge is symmetrical and can result in simpler client implementations in which there is a single clientid/lease per server network address.

Support for migration, particularly with transparent state migration, is more complex in the case of non-uniform client-strings. For example, migration of a lease can result in multiple leases for the same client accessing the same server addresses, vitiating many of the advantages of this approach. Therefore, client implementations that support migration with transparent state migration SHOULD NOT use the non-uniform client-string model.

5.2.2. Uniform Client-string Model

When the client-string is kept uniform, the server has the basis to have a single clientid4/lease for each distinct client. The problem that has to be addressed is the lack of explicit server identity information, which NFSv4.1 provides but NFSv4.0 does not.
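The difference between the two models can be illustrated with a short Python sketch. It is illustrative only, under stated assumptions: the string format, the `host_id` parameter, and the `NonUniformLeases` class are hypothetical, and the id is built from UTF-8 text purely for convenience, since the protocol treats it as opaque data.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


def make_client_string(host_id: str, server_addr: str, uniform: bool) -> bytes:
    """Return a hypothetical id field for nfs_client_id4 (opaque bytes)."""
    if uniform:
        # Uniform model: the same string for every server address accessed.
        return f"nfs4-client/{host_id}".encode("utf-8")
    # Non-uniform model: a different string for each server address.
    return f"nfs4-client/{host_id}/{server_addr}".encode("utf-8")


@dataclass
class NonUniformLeases:
    """Hypothetical non-uniform bookkeeping: (IP address, clientid4) pairs."""
    leases: Set[Tuple[str, int]] = field(default_factory=set)

    def record(self, server_addr: str, clientid: int) -> None:
        # Equal clientid4 values from different addresses are simple
        # happenstance, so the address is always part of the key.
        self.leases.add((server_addr, clientid))


if __name__ == "__main__":
    table = NonUniformLeases()
    table.record("192.0.2.1", 7)
    table.record("192.0.2.2", 7)   # same clientid4, different address
    print(len(table.leases))       # prints 2
```

Under the uniform model, the same string goes to every address, so equal clientid4 values become evidence (though not proof) of trunking rather than happenstance.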
When the same client-string is given to multiple IP addresses, the client can determine, based on the server's behavior, whether two IP addresses correspond to a single server. This is the inverse of the strategy adopted for the non-uniform model, in which different server IP addresses are told about different clients, simply to prevent a server from manifesting behavior that is inconsistent with there being a single server for each IP address, in line with the traditions of NFS. To compare:

o In the non-uniform model, servers are told about different clients because, if the server were to use accurate information as to client identity, two IP addresses on the same server would behave as if they were talking to the same client, which might prove disconcerting to a client not expecting such behavior.

o In the uniform model, the servers are told that there is a single client, which is, after all, the truth. Then, when the server uses this information, two IP addresses on the same server will behave as if they are talking to the same client, and this behavior allows the client to infer the server IP address trunking configuration, even though NFSv4.0 does not explicitly provide this information.

The approach given below shows one example of how this might be done.

For a client using the uniform model, clientid4 values are treated as important information in determining server trunking patterns. For two different IP addresses to return the same clientid4 value is a necessary, though not a sufficient, condition for them to be considered as connected to the same server. As a result, when two different IP addresses return the same clientid4, the client needs to determine, using the procedure given below or otherwise, whether the IP addresses are connected to the same server. For such clients, all internal recording of clientid4 values needs to include, whether explicitly or implicitly, identification of the server from which the clientid4 was received, so that one always has a (server, clientid4) pair. Two such pairs from different servers are always considered distinct even when the clientid4 values are the same, as they may occasionally be.

In order to make this approach work, the client must have accessible, for each nfs_client_id4 used (only one in the uniform model), a list of all server IP addresses, together with the associated clientid4 values. The associated data structures should make it possible to mark a server IP address entry as having the same server as another and to mark an IP address as currently unresolved. One way to do this is to allow each such entry to point to another, with the pointer value being one of:

o A pointer to another entry for an IP address associated with the same server, where that IP address is the first one referenced to access that server.

o A pointer to the current entry if there is no earlier IP address associated with the same server, i.e., where the current IP address is the first one referenced to access that server. We'll refer to such an IP address as the lead IP address for a given server.

o The value NULL if the address's server identity is currently unresolved.

When a SETCLIENTID is done and a clientid4 returned, the data structure is searched for a matching clientid4 and processing depends on what is found. We will refer to the IP address on which this SETCLIENTID is done as X. The SETCLIENTID will use the common nfs_client_id4 and specify X as part of the callback parameters. We call the clientid4 and verifier returned by this operation XC and XV.

Note that at this point no SETCLIENTID_CONFIRM has yet been done. This is because we have either established a new clientid4 on a previously unknown server or changed the callback parameters on a clientid4 associated with some already known server. We don't want to confirm something that we are not sure we want to happen.

o If no matching clientid4 is found, the IP address X and clientid4 XC are added to the list and considered as having no existing known IP addresses trunked with it. The IP address is marked as a lead IP address for a new server. A SETCLIENTID_CONFIRM is done using XC and XV.

o If a matching clientid4 is found which is marked unresolved, processing on the new IP address is suspended. In order to simplify processing, there can only be one unresolved IP address for any given clientid4.

o If one or more matching clientid4's are found, none of which is marked unresolved, the new IP address is entered and marked unresolved. After the steps below have been applied to each of the lead IP addresses with a matching clientid4, the address will have been resolved: either it will be recognized as an additional IP address for an existing server, or it will be recognized as a new server. At the point at which this determination is made, the unresolved indication is cleared and any suspended SETCLIENTID processing is restarted.

So for each lead IP address IPn with a clientid4 matching XC, the following steps are done.

o If the server whose lead IP address is IPn has an associated stateid S, S is used in a request issued on address X. Whether S is recognized on X gives definitive information about X's server identity.

o If S is not recognized as valid on X, then X and IPn are recognized as distinct and we go on to the next IPn, until we run out of them.

o If S is recognized as valid on X, then X and IPn are recognized as connected to the same server and the entry for X is marked as associated with IPn. The entry is now resolved and processing can be restarted for IP addresses whose clientid4 matched XC and whose resolution had been deferred.

o If there is no such S for IPn, a different procedure is used. A SETCLIENTID is done on address X to update the callback parameters to reflect the possibility that X will be marked as associated with the server whose lead IP address is IPn. Assume that this SETCLIENTID returns verifier Vn.

o Note that we don't want this update to take effect if address X is not associated with this server. So we do a SETCLIENTID_CONFIRM on address IPn using verifier Vn.

o If the verifier generated on X is accepted on IPn, then X and IPn are recognized as connected to the same server and the entry for X is marked as associated with IPn. The entry is now resolved and processing can be restarted for IP addresses whose clientid4 matched XC but whose resolution had been deferred.

o If the verifier generated on X is not accepted on IPn, then X and IPn are distinct and the callback update will not be confirmed. So we go on to the next IPn, until we run out of them.

The procedure above has made no explicit mention of the possibility that a server reboot can occur at any time. To address this possibility, the client should periodically use the clientid4 XC in RENEW operations, directed to both the IP address X and the lead IP address currently being tested for identity.

o When XC becomes invalid on X, the resolution process should be terminated, subject to being redone later. Before redoing the resolution, XC should be checked on all the lead IP addresses on which it was valid. Once a new clientid4 is established on any servers on which XC became invalid, a new clientid4 can be established on X and the resolution process for X can be restarted.

o When XC does not become invalid on X, but becomes invalid on the current IPn being tested, it should be concluded that X and IPn do not match and that it is time to advance to the next IPn, if any.

o In the event of a reboot detected on any server lead IP address, the set of IP addresses associated with the server should not change, and state should be re-established for the lease as a whole, using all available connected server IP addresses. It is prudent, however, to verify connectivity by doing a RENEW using the new clientid4 on each such server address before using it.

If we have run out of IPn's without finding a matching server, X is considered as having no existing known IP addresses trunked with it. The IP address is marked as a lead IP address for a new server. A SETCLIENTID_CONFIRM is done using XC and XV.

The following are implementation advantages of using the uniform client-string model:

o Clients can take advantage of server trunking (and clustering with single-server-equivalent semantics) to increase bandwidth or reliability.

o There are advantages in state management: for example, we never have a delegation under one clientid revoked because of a reference to the same file from the same client under a different clientid.

o The uniform client-string model allows the server to do any necessary automatic lease merger in connection with migration, without requiring any client involvement. This consideration is of sufficient weight to cause us to RECOMMEND use of the uniform client-string model for clients supporting transparent state migration.

The following implementation considerations might cause issues for client implementations:
o This model is considerably different from the non-uniform model, which most client implementations have been following. Until substantial implementation experience is obtained with this model, reluctance to embrace something so new is to be expected.

o Mapping between server network addresses and leases is more complicated in that it is no longer a one-to-one mapping.

How to balance these considerations depends on implementation goals.

5.3. Proposed changes: merged (vs. synchronized) leases

The current definitive definition of the NFSv4.0 protocol [RFC3530] and the current pending draft of RFC3530bis [cur-v4.0-bis] agree on the relevant text. The section entitled "Migration and State" says:

   As part of the transfer of information between servers, leases would be transferred as well. The leases being transferred to the new server will typically have a different expiration time from those for the same client, previously on the old server. To maintain the property that all leases on a given server for a given client expire at the same time, the server should advance the expiration time to the later of the leases being transferred or the leases already present. This allows the client to maintain lease renewal of both classes without special effort.

There are a number of problems with this text, and any resolution of our difficulties must address them.

o The current v4.0 spec recommends that the client make it essentially impossible to determine when two leases are from "the same client".

o It is not appropriate to speak of "maintain[ing] the property that all leases on a given server for a given client expire at the same time", since this is not a property that holds even in the absence of migration. A server listening on multiple network addresses may have the same client appear as multiple clients with no way to recognize the client as the same.
o Even if the client identity issue could be resolved, advancing the lease time at the point of migration would not maintain the desired synchronization property. The leases would be synchronized until one of them was renewed, after which they would be unsynchronized again.

To avoid client complexity, we need to have no more than one lease between a single client and a single server. This requires merger of leases, since synchronizing them at a single instant does not really help.

For the uniform model, the destination server would simply merge leases as part of state transfer, since two leases with the same nfs_client_id4 values must be for the same client.

We have made the following decisions regarding proposed normative statements about state merger. They reflect the facts that we want to support migration fully in the simplest way possible and that we can't say MUST since we have older clients and servers to deal with.

o Clients SHOULD use the uniform client-string model in order to get good migration support.

o Servers SHOULD provide automatic lease merger during state migration so that clients using the uniform id model get the support automatically.

If the clients and the servers obey the SHOULDs, having more than a single lease for a given client-server pair will be a transient situation, cleaned up as part of adapting to use of migrated state.

Since clients and servers will be a mixture of old and new, and because nothing is a MUST, we have to ensure that no combination will show worse behavior than is exhibited by current (i.e., old) clients and servers.

5.4. Other proposed changes to migration-state sections

5.4.1. Proposed changes: Client ID migration

The current definitive definition of the NFSv4.0 protocol [RFC3530] and the current pending draft of RFC3530bis [cur-v4.0-bis] agree on the relevant text. The section entitled "Migration and State" says:

   In the case of migration, the servers involved in the migration of a filesystem SHOULD transfer all server state from the original to the new server. This must be done in a way that is transparent to the client. This state transfer will ease the client's transition when a filesystem migration occurs. If the servers are successful in transferring all state, the client will continue to use stateids assigned by the original server. Therefore the new server must recognize these stateids as valid. This holds true for the client ID as well. Since responsibility for an entire filesystem is transferred with a migration event, there is no possibility that conflicts will arise on the new server as a result of the transfer of locks.

This poses some difficulties, mostly because the part about "client ID" is not clear:

o It isn't clear what part of the paragraph the "this" in the statement "this holds true ..." is meant to signify.

o The phrase "the client ID" is ambiguous, possibly indicating the clientid4 and possibly indicating the nfs_client_id4.

o If the text means to suggest that the same clientid4 must be used, the logic is not clear, since the issue is not the same as for stateids, of which there might be many. Adapting to the change of a single clientid, as might happen as a part of lease migration, is relatively easy for the client.

We have decided to address this issue as follows, with the relevant changes all reflected in Section 5.6.

o Make it clear that both clientid4 and nfs_client_id4 are to be transferred.

o Indicate that the initial transfer will result in the same clientid4 after transfer, but that this is not guaranteed, since there may be a conflict with an existing clientid4 on the destination server and because lease merger can result in a change of the clientid4.

5.4.2. Proposed changes: Callback re-establishment

The current definitive definition of the NFSv4.0 protocol [RFC3530] and the current pending draft of RFC3530bis [cur-v4.0-bis] agree on the relevant text. The section entitled "Migration and State" says:

   A client SHOULD re-establish new callback information with the new server as soon as possible, according to sequences described in sections "Operation 35: SETCLIENTID - Negotiate Client ID" and "Operation 36: SETCLIENTID_CONFIRM - Confirm Client ID". This ensures that server operations are not blocked by the inability to recall delegations.

The above will need to be fixed to reflect the possibility of merging of leases; the text to do this appears as part of Section 5.6.

5.4.3. Proposed changes: NFS4ERR_LEASE_MOVED rework

The current definitive definition of the NFSv4.0 protocol [RFC3530] and the current pending draft of RFC3530bis [cur-v4.0-bis] agree on the relevant text. The section entitled "Notification of Migrated Lease" says:

   Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports filesystem migration MUST probe all filesystems from that server on which it holds open state. Once the client has successfully probed all those filesystems which are migrated, the server MUST resume normal handling of stateful requests from that client.

This text lacks clarity because it is ambiguous about what exactly probing is and what the interlock between client and server must be. This has led to some worry about the scalability of the probing process. Although the time required does scale linearly with the number of fs's for which the client may have state with respect to a given server, the actual process can be done efficiently.

To address these issues, we propose replacing the above with the text addressing NFS4ERR_LEASE_MOVED given in Section 5.6.3.

5.5. Proposed changes to other sections

5.5.1. Proposed changes: callback update

Some changes are necessary to reduce confusion about the process of callback information update and, in particular, to make it clear that no state is freed as a result:

o Make it clear that after migration there are confirmed entries for transferred clientid4/nfs_client_id4 pairs.

o Be explicit in the sections headed "otherwise," in the descriptions of SETCLIENTID and SETCLIENTID_CONFIRM, that these don't apply in the cases we are concerned about.

5.5.2. Proposed changes: clientid4 handling

To address both of the clientid4-related issues mentioned in Section 4.4, we propose replacing the last three paragraphs of the section entitled "Client ID" with the following:

   Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has successfully completed, the client uses the shorthand client identifier, of type clientid4, instead of the longer and less compact nfs_client_id4 structure. This shorthand client identifier (a client ID) is assigned by the server and should be chosen so that it will not conflict with a client ID previously assigned by the same server. This applies across server restarts or reboots.

   Distinct servers MAY assign clientid4's independently, and will generally do so. Therefore, a client has to be prepared to deal with multiple instances of the same clientid4 value received on distinct IP addresses, denoting separate entities. When trunking of server IP addresses is not a consideration, a client should keep track of (IP-address, clientid4) pairs, so that each pair is distinct. For a discussion of how to address the issue in the face of possible trunking of server IP addresses, see Section 5.2.
   When a clientid4 is presented to a server and that clientid4 is not recognized, the server will reject the request with the error NFS4ERR_STALE_CLIENTID. This can occur for a number of reasons:

   * A server reboot causing loss of the server's knowledge of the client.

   * Client error sending an incorrect clientid4 or a valid clientid4 to the wrong server.

   * Loss of lease state due to lease expiration.

   * Client or server error causing the server to believe that the client has rebooted (i.e., receiving a SETCLIENTID with an nfs_client_id4 which has a matching id and a non-matching verifier).

   * Migration of all state under the associated lease, causing its non-existence to be recognized on the source server.

   * Merger of state under the associated lease with another lease under a different clientid, causing the clientid4 serving as the source of the merge to cease being recognized on its server.

   In the event of a server reboot, or loss of lease state due to lease expiration, the client must obtain a new clientid4 by use of the SETCLIENTID operation and then proceed to any other necessary recovery for the server reboot case (see the section entitled "Server Failure and Recovery"). In cases of server or client error resulting in this error, use of SETCLIENTID to establish a new lease is desirable as well.

   In the last two cases, different recovery procedures are required. See Section 5.6 for details. Note that in cases in which there is any uncertainty about which sort of handling is applicable, the distinguishing characteristic is that in reboot-like cases, the clientid4 and all associated stateids cease to exist, while in migration-related cases, the clientid4 ceases to exist while the stateids are still valid.
   The client must also employ the SETCLIENTID operation when it receives an NFS4ERR_STALE_STATEID error using a stateid derived from its current clientid4, since this indicates a situation, such as a server reboot, which has invalidated the existing clientid4 and associated stateids (see the section entitled "lock-owner" for details).

   See the detailed descriptions of SETCLIENTID and SETCLIENTID_CONFIRM for a complete specification of the operations.

5.6. Migration, Replication and State (AS PROPOSED)

When responsibility for handling a given filesystem is transferred to a new server (migration) or the client chooses to use an alternate server (e.g., in response to server unresponsiveness) in the context of filesystem replication, the appropriate handling of state shared between the client and server (i.e., locks, leases, stateids, and client IDs) is as described below. The handling differs between migration and replication.

If a server replica or a server immigrating a filesystem agrees to, or is expected to, accept opaque values from the client that originated from another server, then it is a wise implementation practice for the servers to encode the "opaque" values in network byte order. When doing so, servers acting as replicas or immigrating filesystems will be able to parse values like stateids, directory cookies, filehandles, etc., even if their native byte order is different from that of other servers cooperating in the replication and migration of the filesystem.

5.6.1. Migration and State

In the case of migration, the servers involved in the migration of a filesystem SHOULD transfer all server state from the original to the new server. This must be done in a way that is transparent to the client. This state transfer will ease the client's transition when a filesystem migration occurs. If the servers are successful in transferring all state, the client will continue to use stateids assigned by the original server. Therefore, the new server must recognize these stateids as valid.

If transferring stateids from server to server would result in a conflict with a stateid that the destination server has already assigned to the existing client, transparent state migration MUST NOT happen for that client. Servers participating in transparent state migration should coordinate their stateid assignment policies to make this situation unlikely or impossible. The means by which this might be done, like all of the inter-server interactions for migration, are not specified by the NFS version 4.0 protocol.

Handling of clientid values is similar but not identical. The clientid4 and nfs_client_id4 information (id and verifier) will be transferred with the rest of the state information, and the destination server should use that information to determine appropriate clientid4 handling. Although the destination server may make state stored under an existing lease available under the clientid4 used on the source server, the client should not assume that this is always so. In particular:

o If there is an existing lease with an nfs_client_id4 that matches a migrated lease (same id and verifier), the server SHOULD merge the two, making the union of the sets of stateids available under the clientid4 for the existing lease. As part of the lease merger, the expiration time of the lease will reflect renewal done within either of the ancestor leases (and so will reflect the latest of the renewals).

o If there is an existing lease with an nfs_client_id4 that partially matches a migrated lease (same id and a different verifier), the server MUST eliminate one of the two, possibly invalidating one of the ancestor clientid4's. Since verifiers are not ordered, the lease with the later renewal time prevails.

When leases are not merged, the transfer of state should result in creation of a confirmed client record with empty callback information but matching the {v, x, c} for the transferred client information. This should enable establishment of new callback information using SETCLIENTID and SETCLIENTID_CONFIRM.

A client may determine the disposition of migrated state by using a stateid associated with the migrated state in an operation on the new server and by using the associated clientid4 in a RENEW on the new server.

o If the stateid is not valid and an error NFS4ERR_BAD_STATEID is received, either transparent state migration has not occurred or the state was purged due to verifier mismatch.

o If the stateid is valid and an error NFS4ERR_STALE_CLIENTID is received on the RENEW, transparent state migration has occurred and the lease has been merged with an existing lease on the destination server.

o If the stateid is valid and the clientid4 is valid, the lease has been transferred intact.

Since responsibility for an entire filesystem is transferred with a migration event, there is no possibility that conflicts will arise on the new server as a result of the transfer of locks.

The servers may choose not to transfer the state information upon migration. However, this choice is discouraged, except where specific issues such as stateid conflicts make it necessary. In the case of migration without state transfer, when the client presents state information from the original server (e.g., in a RENEW op or a READ op of zero length), the client must be prepared to receive either NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server. The client should then recover its state information as it normally would in response to a server failure. The new server must take care to allow for the recovery of state information as it would in the event of server restart.

When a lease is transferred to a new server (as opposed to being merged with a lease already on the new server), a client SHOULD re-establish new callback information with the new server as soon as possible, according to sequences described in sections "Operation 35: SETCLIENTID - Negotiate Client ID" and "Operation 36: SETCLIENTID_CONFIRM - Confirm Client ID". This ensures that server operations are not blocked by the inability to recall delegations.

5.6.2. Replication and State

Since client switch-over in the case of replication is not under server control, the handling of state is different. In this case, leases, stateids and client IDs do not have validity across a transition from one server to another. The client must re-establish its locks on the new server. This can be compared to the re-establishment of locks by means of reclaim-type requests after a server reboot. The difference is that the server has no provision to distinguish requests reclaiming locks from those obtaining new locks or to defer the latter. Thus, a client re-establishing a lock on the new server (by means of a LOCK or OPEN request) may have the requests denied due to a conflicting lock. Since replication is intended for read-only use of filesystems, such denial of locks should not pose large difficulties in practice. When an attempt to re-establish a lock on a new server is denied, the client should treat the situation as if its original lock had been revoked.

5.6.3. Notification of Migrated Lease

In the case of lease renewal, the client may not be submitting requests for a filesystem that has been migrated to another server. This can occur because of the implicit lease renewal mechanism: the client renews a lease containing state of multiple filesystems when submitting a request to any one filesystem at the server.

In order for the client to schedule renewal of leases that may have been relocated to the new server, the client must find out about lease relocation before those leases expire. To accomplish this, all operations which implicitly renew leases for a client (such as OPEN, CLOSE, READ, WRITE, RENEW, LOCK, and others) will return the error NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be renewed has been transferred to a new server. Note that when the transfer of responsibility leaves remaining state for that lease on the source server, the lease is renewed just as it would have been in the NFS4_OK case, despite returning the error. The transfer of responsibility happens when the server receives a GETATTR(fs_locations) from the client for each filesystem for which a lease has been moved to a new server. Normally, the client does this after receiving an NFS4ERR_MOVED for an access to the filesystem, but the server is not required to verify that this happens in order to terminate the return of NFS4ERR_LEASE_MOVED. By convention, the compounds containing GETATTR(fs_locations) SHOULD include an appended RENEW operation to permit the server to identify the client getting the information.

Note that the NFS4ERR_LEASE_MOVED error is only required when responsibility for at least one stateid has been transferred. In the case of a null lease, where the only associated state is a clientid, no NFS4ERR_LEASE_MOVED error need be generated.
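The source-server rules above can be modeled in a few lines of Python. This is a minimal, hypothetical sketch and not part of the proposed text: the `SourceLease` class, its fields, and the string-valued `Status` enum are assumptions made for illustration. It also folds in the fallback for legacy clients that this section goes on to describe, applying the two-lease-period timeout per lease for simplicity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Status(Enum):
    NFS4_OK = "NFS4_OK"
    NFS4ERR_LEASE_MOVED = "NFS4ERR_LEASE_MOVED"


@dataclass
class SourceLease:
    """Hypothetical per-client lease record kept by the source server."""
    lease_time: float                  # lease period, in seconds
    local_stateids: int = 0            # stateids still held on this server
    expires_at: float = 0.0
    # Migrated filesystems for which the client has not yet fetched
    # fs_locations.
    pending: Set[str] = field(default_factory=set)
    first_move: Optional[float] = None

    def note_migration(self, fsid: str, stateids_moved: int, now: float) -> None:
        # The error is only required when at least one stateid moved; a
        # null lease (clientid only) need not trigger it.
        if stateids_moved > 0:
            self.pending.add(fsid)
            if self.first_move is None:
                self.first_move = now

    def getattr_fs_locations(self, fsid: str) -> None:
        """The client has shown awareness of the migration of fsid."""
        self.pending.discard(fsid)
        if not self.pending:
            self.first_move = None

    def implicit_renew(self, now: float) -> Status:
        """Called for OPEN, CLOSE, READ, WRITE, RENEW, LOCK, and so on."""
        # Legacy-client fallback: resume normal handling after a wait of
        # at least two lease periods (exactly two in this sketch).
        if self.first_move is not None and now - self.first_move >= 2 * self.lease_time:
            self.pending.clear()
            self.first_move = None
        # State remaining on this server is renewed just as in the
        # NFS4_OK case, even when the error is returned.
        if self.local_stateids > 0:
            self.expires_at = now + self.lease_time
        return Status.NFS4ERR_LEASE_MOVED if self.pending else Status.NFS4_OK
```

Keeping a set of unacknowledged filesystems, rather than a single flag, mirrors the requirement that the error persist until the client has fetched fs_locations for every migrated filesystem on which it holds state.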

   Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports filesystem migration MUST perform the necessary GETATTR operation for each of the filesystems containing state that have been migrated, and so give the server evidence that it is aware of the migration of the filesystem. Once the client has done this for all migrated filesystems on which the client holds state, the server MUST resume normal handling of stateful requests from that client.

   One way in which clients can do this efficiently in the presence of large numbers of filesystems is described below. This approach divides the process into two phases, one devoted to finding the migrated filesystems and the second devoted to doing the necessary GETATTRs.

   The client can find the migrated filesystems by building and issuing one or more COMPOUND requests, each consisting of a set of PUTFH/GETFH pairs, each pair using an fh in one of the filesystems in question. All such COMPOUND requests can be done in parallel. The successful completion of such a request indicates that none of the filesystems interrogated have been migrated. Termination with NFS4ERR_MOVED indicates that the filesystem getting the error has migrated and that those interrogated before it in the same COMPOUND have not. Those whose interrogation follows the error remain in an uncertain state. They can be interrogated either by restarting the requests from after the point at which NFS4ERR_MOVED was returned, or by issuing a new set of COMPOUND requests for the filesystems which remain in an uncertain state.

   Once the migrated filesystems have been found, all that is needed is for the client to give evidence to the server that it is aware of the migrated status of the filesystems found by this process, by interrogating the fs_locations attribute for an fh in each of the migrated filesystems.
   The client can do this by building and issuing one or more COMPOUND requests, each of which consists of a set of PUTFH operations, each followed by a GETATTR of the fs_locations attribute. A RENEW follows to help tie the operations to the lease returning NFS4ERR_LEASE_MOVED. Once the client has done this for all migrated filesystems on which the client holds state, the server will resume normal handling of stateful requests from that client.

   In order to support legacy clients that do not handle the NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after a wait of at least two lease periods, at which time it will resume normal handling of stateful requests from all clients. If a client attempts to access the migrated files, the server MUST reply NFS4ERR_MOVED.

   When the client receives an NFS4ERR_MOVED error, the client can follow the normal process to obtain the new server information (through the fs_locations attribute) and perform renewal of those leases on the new server. If the server has not had state transferred to it transparently, the client will receive either NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server, as described above. The client can then recover state information as it does in the event of server failure.

   Aside from recovering from a migration, there are other reasons a client may wish to retrieve fs_locations information from a server. When a server becomes unresponsive, for example, a client may use cached fs_locations data to discover an alternate server hosting the same fs data. A client may also periodically request fs_locations data from a server in order to keep its cache of fs_locations data fresh.
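
   The two-phase procedure for acknowledging migrated filesystems can be sketched as below. This is a hypothetical, simplified Python model rather than real NFS client code. FakeServer, find_migrated, acknowledge_migrations and the batch parameter are names invented for illustration. Each file handle stands for one filesystem, operations are represented as tuples, and "GETATTR_FS_LOCATIONS" stands for a GETATTR of the fs_locations attribute. The model assumes, as the text does, that a COMPOUND stops at the first failing operation and that a GETFH on a handle in a migrated filesystem fails with NFS4ERR_MOVED. For simplicity, the batches are issued one after another here, although the text notes they can be issued in parallel.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_MOVED = "NFS4ERR_MOVED"


class FakeServer:
    """Hypothetical stand-in for the source server (not a real NFS API)."""

    def __init__(self, migrated_fhs):
        self.migrated_fhs = set(migrated_fhs)
        self.acknowledged = set()  # filesystems whose migration was signalled

    def compound(self, ops):
        """Run ops in order, stopping at the first error as COMPOUND does.

        Returns (status, index): index is the failing op, or len(ops).
        """
        current_fh = None
        for i, op in enumerate(ops):
            if op[0] == "PUTFH":
                current_fh = op[1]
            elif op[0] == "GETFH" and current_fh in self.migrated_fhs:
                return NFS4ERR_MOVED, i
            elif op[0] == "GETATTR_FS_LOCATIONS":
                self.acknowledged.add(current_fh)
            # RENEW (and a successful GETFH) need no modelling here.
        return NFS4_OK, len(ops)


def find_migrated(server, fhs, batch=4):
    """Phase 1: locate migrated filesystems with PUTFH/GETFH pairs."""
    migrated = []
    for start in range(0, len(fhs), batch):
        chunk = fhs[start:start + batch]
        while chunk:
            ops = []
            for fh in chunk:
                ops += [("PUTFH", fh), ("GETFH",)]
            status, index = server.compound(ops)
            if status == NFS4_OK:
                break  # none of the remaining filesystems has migrated
            pair = index // 2  # which PUTFH/GETFH pair failed
            migrated.append(chunk[pair])
            # Pairs after the failure are uncertain: restart from there.
            chunk = chunk[pair + 1:]
    return migrated


def acknowledge_migrations(server, migrated, clientid):
    """Phase 2: PUTFH + GETATTR(fs_locations) per fs, then one RENEW."""
    ops = []
    for fh in migrated:
        ops += [("PUTFH", fh), ("GETATTR_FS_LOCATIONS",)]
    ops.append(("RENEW", clientid))
    status, _ = server.compound(ops)
    return status
```

   Against a server on which fh2 and fh5 have migrated, find_migrated with batch=4 over fh1 through fh6 returns ["fh2", "fh5"]. The first batch fails at its second pair, and only the pairs that follow the failure are reissued. acknowledge_migrations then signals awareness of both migrations in a single COMPOUND ending in RENEW.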

   Since a GETATTR(fs_locations) operation would also be used for refreshing cached fs_locations data, a server could mistake such a request for an indication that the client has recognized an NFS4ERR_LEASE_MOVED condition. Therefore, a compound which is not intended to signal that a client has recognized a migrated lease SHOULD be prefixed with a guard operation which fails with NFS4ERR_MOVED if the file handle being queried is no longer present on the server. The guard can be as simple as a GETFH operation.

   Though unlikely, it is possible that the target of such a compound could be migrated after the guard operation is executed on the server but before the GETATTR(fs_locations) operation is encountered. When a client issues a GETATTR(fs_locations) operation as part of a compound not intended to signal recognition of a migrated lease, it SHOULD be prepared to process fs_locations data in the reply showing that the fs is no longer present at its current location.

5.6.4. Migration and the Lease_time Attribute

   In order that the client may appropriately manage its leases in the case of migration, the destination server must establish proper values for the lease_time attribute.

   When state is transferred transparently, that state should include the correct value of the lease_time attribute. The lease_time attribute on the destination server must never be less than that on the source, since this would result in premature expiration of leases granted by the source server. Upon migration in which state is transferred transparently, the client is under no obligation to re-fetch the lease_time attribute and may continue to use the value previously fetched (on the source server).

   In the case in which lease merger occurs as part of state transfer, the lease_time attribute of the destination lease remains in effect.
   The client can simply renew that lease with its existing lease_time attribute. State in the source lease is renewed at the time of transfer, so that it cannot expire as long as the destination lease is appropriately renewed.

   If state has not been transferred transparently (i.e., the client sees a real or simulated server reboot), the client should fetch the value of lease_time on the new (i.e., destination) server and use it for subsequent locking requests. However, the new server must respect a grace period at least as long as the lease_time on the source server, in order to ensure that clients have ample time to reclaim their locks before potentially conflicting non-reclaimed locks are granted. The means by which the new server obtains the value of lease_time on the old server is left to the server implementations. It is not specified by the NFS version 4.0 protocol.

6. Results of proposed changes

   The purpose of this section is to examine the troubling results reported in Section 3.1. We will look at the scenarios as they would be handled within the proposal.

   Because the choice of uniform vs. non-uniform nfs_client_id4 id strings is a "SHOULD" in these cases, we will designate clients that follow this recommendation as SHOULD-UF-CID.

   To better understand how the various merger-related "SHOULD" clauses address the issues seen, we also have to take account of them. We abbreviate these (collectively known as "SHOULD-merges") as follows:

   o  SHOULD-SVR-AM refers to the server obeying the SHOULD which RECOMMENDS that servers merge leases with identical nfs_client_id4 id strings and verifiers.

6.1. Results: Failure to free migrated state on client reboot

   Let's look at the troublesome situation cited in Section 3.1.1. We have already seen what happens when SHOULD-UF-CID does not hold.
   Now let's look at the situation in which SHOULD-UF-CID holds, whether SHOULD-SVR-AM is in effect or not.

   o  A client C establishes a clientid4 C1 with server ABC, specifying an nfs_client_id4 with "id" value "C" and verifier 0x111.

   o  The client begins to access files in filesystem F on server ABC, generating stateids S1, S2, etc. under the lease for clientid C1. It may also access files on other filesystems on the same server.

   o  The filesystem is migrated from ABC to server XYZ. When transparent state migration is in effect, stateids S1 and S2 and lease {0x111, "C", C1} are now available for use by client C at server XYZ. So far, so good.

   o  Client C reboots and attempts to access data on server XYZ, whether in filesystem F or another. It does a SETCLIENTID with an nfs_client_id4 with "id" value "C" and verifier 0x112. The state associated with lease {0x111, "C", C1} is deleted as part of creating {0x112, "C", C2}. No problem.

   The correctness signature for this issue is

      SHOULD-UF-CID

   so if you have clients and servers that obey the SHOULD clauses, the problem is gone regardless of the choice on the MAY.

6.2. Results: Server reboots resulting in confused lease situation

   Now let's consider the scenario given in Section 3.1.2. We have already seen what happens when SHOULD-UF-CID does not hold. Now let's look at what changes when SHOULD-SVR-AM holds, first for a client that does not follow SHOULD-UF-CID and so uses a per-server id string.

   o  Client C talks to server ABC using an nfs_client_id4 id like "C-ABC" and verifier v1. As a result, a lease with clientid4 c.i is established: {v1, "C-ABC", c.i}.

   o  fs_a1 migrates from server ABC to server XYZ along with its state. Now server XYZ also has a lease: {v1, "C-ABC", c.i}.

   o  Server ABC reboots.

   o  Client C talks to server ABC using an nfs_client_id4 id like "C-ABC" and verifier v1.
      As a result, a lease with clientid4 c.j is established: {v1, "C-ABC", c.j}.

   o  fs_a2 migrates from server ABC to server XYZ. As part of migration, the incoming lease is seen to denote the same nfs_client_id4 and so is merged with {v1, "C-ABC", c.i}.

   o  Now server XYZ has only one lease that matches {v1, "C-ABC", *}, so the problem is solved.

   Now let's consider the same scenario in the situation in which SHOULD-UF-CID holds and SHOULD-SVR-AM holds as well.

   o  Client C talks to server ABC using an nfs_client_id4 id like "C" and verifier v1. As a result, a lease with clientid4 c.i is established: {v1, "C", c.i}.

   o  fs_a1 migrates from server ABC to server XYZ along with its state. Now XYZ also has a lease: {v1, "C", c.i}.

   o  Server ABC reboots.

   o  Client C talks to server ABC using an nfs_client_id4 id like "C" and verifier v1. As a result, a lease with clientid4 c.j is established: {v1, "C", c.j}.

   o  fs_a2 migrates from server ABC to server XYZ. As part of migration, the incoming lease is seen to denote the same nfs_client_id4 and so is merged with {v1, "C", c.i}.

   o  Now server XYZ has only one lease that matches {v1, "C", *}, so the problem is solved.

   The correctness signature for this issue is

      SHOULD-SVR-AM

   so if you have clients and servers that obey the SHOULD clauses, the problem is gone regardless of the choice on the MAY.

6.3. Results: Client complexity issues

   Consider the following situation:

   o  There is a set of clients C1 through Cn accessing servers S1 through Sm. Each server manages some significant number of filesystems, with the filesystem count L being significantly greater than m.

   o  Each client Cx will access a subset of the servers and so will have up to m clientids, which we will call Cxy for server Sy.

   o  Now assume that for load-balancing or other operational reasons, numbers of filesystems are migrated among the servers. As a result, depending on how this is handled, the number of clientids may explode. See below.

   Now look at what will happen under various scenarios:

   o  We have previously (in Section 3.1.3) looked at this in the case of clients following the non-uniform client-string model. In that case, each client-server pair could have up to m clientids and each client will have up to m**2 clientids. If we add the possibility of server reboot, the only bound on a client's clientid count is L.

   o  If we look at this in the SHOULD-UF-CID case in which the SHOULD-SVR-AM condition does not hold, the situation is no different. Although the server has the client identity information that could enable same-client-same-server leases to be combined, it does not do so. We still have up to L clientids per client.

   o  On the other hand, if we look at the SHOULD-UF-CID case in which SHOULD-SVR-AM holds, the problem is gone. There can be no more than m clientids per client, and n clientids per server.

   The correctness signature for this issue is

      (SHOULD-UF-CID & SHOULD-SVR-AM)

   so if you have clients and servers that obey the SHOULD clauses, the problem is gone regardless of the choice on the MAY.

6.4. Result summary

   We have seen that (SHOULD-SVR-AM & SHOULD-UF-CID) are sufficient to solve the problems people have experienced.

7. Security Considerations

   The current definitive definition of the NFSv4.0 protocol [RFC3530] and the current pending draft of RFC3530bis [cur-v4.0-bis] agree on this point: the section entitled "Security Considerations" in each encourages clients to protect the integrity of the SECINFO operation, any GETATTR operation for the fs_locations attribute, and the operations SETCLIENTID/SETCLIENTID_CONFIRM.
   A migration recovery event can use any or all of these operations. We do not recommend any change here.

8. IANA Considerations

   This document does not require actions by IANA.

9. Acknowledgements

   The editor and authors of this document gratefully acknowledge the contributions of Trond Myklebust of NetApp and Robert Thurlow of Oracle. We also thank Tom Haynes of NetApp and Spencer Shepler of Microsoft for their guidance and suggestions.

   Special thanks go to members of the Oracle Solaris NFS team, especially Rick Mesta and James Wahlig, for their work implementing an NFSv4.0 migration prototype and identifying many of the issues documented here.

10. References

10.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3530]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, C., Eisler, M., and D. Noveck, "Network File System (NFS) version 4 Protocol", RFC 3530, April 2003.

10.2. Informative References

   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, January 2010.

   [cur-v4.0-bis]
              Haynes, T., Ed. and D. Noveck, Ed., "Network File System (NFS) Version 4 Protocol", 2011, .
              Work in progress.

Authors' Addresses

   David Noveck (editor)
   EMC Corporation
   228 South Street
   Hopkinton, MA 01748
   US

   Phone: +1 508 249 5748
   Email: david.noveck@emc.com

   Piyush Shivam
   Oracle Corporation
   5300 Riata Park Ct.
   Austin, TX 78727
   US

   Phone: +1 512 401 1019
   Email: piyush.shivam@oracle.com

   Charles Lever
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI 48104
   US

   Phone: +1 248 614 5091
   Email: chuck.lever@oracle.com

   Bill Baker
   Oracle Corporation
   5300 Riata Park Ct.
   Austin, TX 78727
   US

   Phone: +1 512 401 1081
   Email: bill.baker@oracle.com