Network Working Group                                          W. Kumari
Internet-Draft                                                    Google
Intended status: Informational                                 R. Arends
Expires: May 3, 2017                                             Nominet
                                                                S. Woolf

                                                              D. Migault
                                                                  Orange
                                                        October 30, 2016


        Highly Automated Method for Maintaining Expiring Records
                     draft-wkumari-dnsop-hammer-02

Abstract

   This document describes a simple DNS cache optimization which keeps
   the most popular records in the DNS cache: Highly Automated Method
   for Maintaining Expiring Records (HAMMER).
   The principle is that records in the cache are fetched, that is to
   say resolved again, before their TTL expires and the record is
   flushed from the cache. By fetching records before they are queried
   by an end user, HAMMER is expected to improve the quality of
   experience of end users and to optimize the resources involved in
   large DNSSEC resolving platforms.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 3, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements notation
   2.  Terminology
   3.  Motivations
     3.1.  Improving browsing Quality of Experience by reducing
           response time
     3.2.  Optimize the resources involved in large DNSSEC resolving
           platforms
   4.  Overview of Operation
   5.  Known implementations
     5.1.  Unbound (NLnet Labs)
     5.2.  OpenDNS
     5.3.  ISC BIND
   6.  An example / reference implementation
     6.1.  Variables
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10.  References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Changes / Author Notes.
   Authors' Addresses

1.  Introduction

   A recursive DNS resolver may cache a Resource Record (RR) for, at
   most, the Time To Live (TTL) associated with that record. While the
   TTL is greater than zero, the resolver may respond to queries from
   its cache; but once the TTL has reached zero, the resolver flushes
   the RR. When the resolver gets another query for that resource, it
   needs to initiate a new query. The response is then cached and
   returned to the querying client.
   This document discusses an optimization (Highly Automated Method
   for Maintaining Expiring Records (HAMMER), also known as "prefetch")
   to help keep popular responses in the cache, by fetching new
   responses before the TTL expires. This behavior is triggered by an
   incoming query that arrives only shortly before the cache entry is
   due to expire.

1.1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Terminology

   HAMMER resolver:  A DNS resolver that implements the HAMMER
      mechanism.

   HAMMER FQDN:  An FQDN that is a candidate for the HAMMER process.

   HAMMER TIME:  The remaining TTL below which the HAMMER mechanism is
      triggered.

3.  Motivations

   When a recursive resolver responds to a client, it either responds
   from cache, or it initiates an iterative query to resolve the answer,
   caches the answer and then responds with that answer.

3.1.  Improving browsing Quality of Experience by reducing response time

   Any end user querying a fetched FQDN will get the response from the
   cache of the resolver. This provides faster responses, thus
   improving the end user experience for browsing and other
   applications/activities.

   Popular FQDNs are queried frequently, and end users have high
   expectations in terms of application responsiveness for these FQDNs.
   With regular DNS rules, once the FQDN has been flushed from the
   cache, the resolver waits for the next end user to request the FQDN
   before initiating a resolution for this FQDN with iterative queries.
   This results in at least one end user waiting for this resolution to
   be performed over the Internet before the response is sent to them.
   This may provide a poor user experience, since DNS response times
   over the Internet are unpredictable at best and the resulting
   response time is longer than usual.

   In some cases, not only the first end user querying that FQDN may be
   impacted, but also other end users that request the FQDN between the
   time the FQDN TTL expires and the time the cache is again filled. In
   this case, the result is an impact on multiple end users and possibly
   unnecessary load on the platform. Note that this load is increased
   by the use of DNSSEC, since DNSSEC may involve additional
   resolutions, larger payloads, and signature checks.

3.2.  Optimize the resources involved in large DNSSEC resolving
      platforms

   Large resolving platforms are often composed of a set of independent
   resolving nodes. The traffic is usually load balanced based on the
   query source IP addresses. This results in most popular FQDNs being
   resolved independently by all nodes. This increases the number of
   end users who may experience unnecessary latency. Also, when DNSSEC
   is used, all nodes independently perform signature check operations,
   possibly resulting in high loads on the authoritative server.

   The challenge these large DNSSEC resolving platforms have to overcome
   is to distribute load uniformly across the nodes, given that end
   users and FQDNs do not consume resources uniformly. More
   specifically, FQDNs and end users usually present Zipf popularity
   distributions, which means that most of the traffic is generated by
   a small set of end users and concerns a small set of FQDNs.

   With plain DNS, large resolving platforms have achieved uniformly
   balanced traffic among the nodes. In fact, the resolving traffic on
   the Internet interface was rather small (at least in terms of CPU)
   compared to the traffic received from end users. DNSSEC changed
   this, as CPU is consumed in performing signature checks.
   One way to reduce the number of DNSSEC resolutions is to provision
   the nodes with the most popular FQDNs. This avoids parallel
   resolutions and overall reduces cost, because signature checks are
   not performed, while benefiting from the already existing load
   balancing architecture. This architecture takes advantage of the
   Zipf distribution of the FQDNs' popularity. In fact, caching a small
   number of FQDNs (a few thousand) can address most of the traffic (up
   to 70%).

   Note that to perform a single resolution for the global platform,
   nodes may be configured as forwarders for the most popular FQDNs.

4.  Overview of Operation

   When an incoming query is received, and the result is in the cache,
   the query is answered from the cache. If the remaining TTL of the
   record is below some threshold, the recursive server will also
   initiate a cache fill operation in the background to refresh the
   cache entry.

   The fact that the behavior is triggered by an incoming query (and not
   by periodically scanning the cache and refreshing all entries that
   are about to expire) allows unpopular names to age out of the cache
   naturally, while keeping popular entries in the cache.

5.  Known implementations

   [Ed: Well, this is kinda embarrassing. This idea occurred to us one
   day while sitting around a pool in New Hampshire. It then took a
   while before I wrote it down, mostly because I *really* wanted to get
   "Stop! Hammer Time!" into a draft. Anyway, we presented it in
   Berlin, and Wouter Wijngaards stood up and mentioned that Unbound
   already does this (they use a percentage of TTL, instead of a number
   of seconds). Then we heard from OpenDNS that they *also* implement
   something similar. Then we had a number of discussions, then got
   sidetracked into other things.
   Anyway, BIND as of 9.10 (around Feb 2014) now implements something
   like this
   (https://deepthought.isc.org/article/AA-01122/0/Early-refresh-of-cache-records-cache-prefetch-in-BIND-9.10.html),
   and enables it by default. Unfortunately, while BIND uses the
   time-based approach, they named their parameters "trigger" and
   "eligibility" - and shouting "Eligibility! Trigger time!" simply
   isn't funny (unless you have a very odd sense of humor...). So, we
   are now documenting implementations that existed before this was
   published and an implementation that we think was based on this. We
   think that this has value to the community. I'm also leaving in the
   HAMMER TIME bit, because it makes me giggle. The section below
   should be filled out with more detail, in collaboration with the
   implementors, but this is being written *just* before the draft
   cutoff.]

   A number of recursive resolvers implement techniques similar to the
   techniques described in this document. This section documents some
   of these and the tradeoffs they make in picking their techniques.

5.1.  Unbound (NLnet Labs)

   The Unbound validating, recursive, and caching DNS resolver
   implements a HAMMER-type feature, called "prefetch". This feature
   can be enabled or disabled through the configuration option
   "prefetch:". When enabled, Unbound will fetch expiring records when
   their remaining TTL is less than 10% of their original TTL.

   [Ed: Unbound's "prefetch" function was developed independently,
   before this draft was written. The authors were unaware of it when
   writing the document.]

5.2.  OpenDNS

   The public DNS resolver OpenDNS implements a prefetch-like solution.

   [Ed: Will work with OpenDNS to get more details.]

5.3.  ISC BIND

   As of version 9.10, Internet Systems Consortium's BIND implements the
   HAMMER functionality. This feature is enabled by default.

   The functionality is configured using the "prefetch" options
   statement, with two parameters:

   Trigger  This corresponds to the HAMMER_TIME parameter described
      below.

   Eligibility  This corresponds to the STOP parameter described
      below.

6.  An example / reference implementation

   When a recursive resolver that implements HAMMER receives a query for
   information that it has in the cache, it responds from the cache.

   If the queried FQDN is a HAMMER FQDN, the HAMMER resolver compares
   the remaining TTL to the HAMMER TIME and checks whether the FQDN has
   already been fetched.

   If the HAMMER FQDN has already been fetched (or provisioned), then
   nothing is done.

   If the HAMMER FQDN has not yet been fetched and the TTL is less than
   the HAMMER_TIME, the HAMMER resolver starts a resolution for the
   queried FQDN in order to fill the cache, just as if the TTL had
   expired. During this cache fill operation the resolver continues to
   respond from cache (until the TTL expires). When the cache fill
   query completes, the new response replaces the existing cached
   information. This ensures the cache has fresh data for subsequent
   queries.

   Since the cache fill query is initiated before the existing cached
   entry expires (and is flushed), responses will come from the cache
   more often. This decreases the client resolution latency and
   improves the user experience.

   The cache fill resolution is triggered by an incoming query (and only
   if that query arrives shortly before the record would expire anyway).

   This effectively keeps the most popular, uniformly queried data in
   the cache, without having to maintain counters in the cache or
   proactively resolve responses that are not likely to be needed as
   often.
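
   The decision made on each cache hit can be sketched as follows.
   This is only an illustration in Python, not taken from any of the
   implementations listed in Section 5. The CacheEntry type, its
   field names, and the two function names are hypothetical. The
   sketch also applies the STOP guard that this section introduces
   further on, and uses the default values given in Section 6.1.

```python
from dataclasses import dataclass

HAMMER_TIME = 2.0  # seconds before expiry; default from Section 6.1
STOP = 3           # multiplier; default from Section 6.1


@dataclass
class CacheEntry:
    """Hypothetical cache record; the field names are illustrative."""
    original_ttl: float            # TTL received with the response (s)
    expires_at: float              # absolute expiry time (s)
    refresh_started: bool = False  # True once a cache fill is in flight


def should_start_cache_fill(entry, now,
                            hammer_time=HAMMER_TIME, stop=STOP):
    """Return True if a cache hit at time `now` should trigger a refresh."""
    remaining = entry.expires_at - now
    if remaining <= 0:
        return False  # already expired: handled as a normal cache miss
    if entry.original_ttl < stop * hammer_time:
        return False  # TTL too short: the "Can't touch this" case
    if entry.refresh_started:
        return False  # already fetched: nothing is done
    return remaining < hammer_time


def on_cache_hit(entry, now, start_refresh):
    """Mark the entry and start a refresh when the check passes.

    The caller still answers the client from the cache either way.
    """
    if should_start_cache_fill(entry, now):
        entry.refresh_started = True
        start_refresh(entry)
        return True
    return False
```

   With these defaults, a record with a 300-second TTL that is queried
   1 second before it expires triggers exactly one refresh, while a
   record whose original TTL is 5 seconds never does, because 5 is
   less than STOP * HAMMER_TIME = 6.
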
   HAMMER is purely an implementation optimization: resolvers always
   have the option to cache records for less than the TTL (for example,
   when running low on cache space); HAMMER simply triggers a refresh
   of the RR before it expires.

   Note that non-uniformly queried FQDNs may be popular and yet not
   benefit from the HAMMER mechanism. For example, an FQDN may be
   heavily queried during the first 10 minutes of every hour and have a
   30-minute TTL. In that case, DNS queries are not expected to arrive
   between TTL - HAMMER_TIME and TTL.

   HAMMER FQDNs with a small TTL may generate a cache fill process even
   though they are not so popular. Suppose an end user is setting up a
   specific session which requires multiple DNS resolutions on a given
   FQDN. These resolutions are necessary for a short period of time,
   i.e., the time necessary to establish the session. If these FQDNs
   have been set with a small TTL (on the order of the session
   establishment time), the multiple queries to a HAMMER resolver may
   trigger an unnecessary resolution. As a result, HAMMER would not
   scale to thousands of these FQDNs. More generally, if the original
   TTL of the RR is less than (or close to) HAMMER_TIME, the described
   method could cause excessive cache fill queries to occur. In order
   to prevent this, an additional variable named STOP (described below)
   is introduced. If the original TTL of the RR is less than STOP *
   HAMMER_TIME, then the cache entry should be marked with a "Can't
   touch this" flag, and the described method should not be used.

6.1.  Variables

   These are the mandatory variables:

   HAMMER_TIME:  is the number of seconds before TTL expiration that a
      cache fill query should be initiated. This should be a user
      configurable value. A default of 2 seconds is RECOMMENDED.

   STOP:  is the multiplier applied to HAMMER_TIME to obtain the
      minimum original TTL for which the described method is used (see
      Section 6). This should be a user configurable variable. A
      default of 3 is recommended.

   Implementations may consider additional variables.
   These are not mandatory but would address specific uses of HAMMER.

   HAMMER_MATCH:  should be a user configurable variable. It defines
      the FQDNs to which HAMMER is applied (the HAMMER FQDNs). This
      rule can be expressed in different ways. It can be a list of
      FQDNs, or a number indicating how many of the most popular FQDNs
      need to be considered. How HAMMER_MATCH is expressed is
      implementation dependent. Some implementations can use a list of
      FQDNs; others can use a matching rule on the FQDNs or define the
      HAMMER FQDNs as the X most popular FQDNs.

   HAMMER_FORWARDER:  should be a user configurable variable. It is
      optional and designates the DNS server the resolver forwards
      the request to.

7.  IANA Considerations

   This document makes no request of the IANA.

8.  Security Considerations

   This technique leverages existing protocols, and should not introduce
   any new risks, other than a slight increase in traffic.

   By initiating cache fill queries before the existing RR has expired,
   this technique will slightly increase the number of queries seen by
   authoritative servers. This increase will be inversely proportional
   to the average TTL of the records that they serve.

   It is unlikely, but possible, that this increase could cause a denial
   of service condition.

9.  Acknowledgements

   The authors wish to thank Tony Finch and MC Hammer. We also wish to
   thank Brian Somers and Wouter Wijngaards for telling us that they
   already do this :-) (They should probably be co-authors, but I left
   this too close to the draft cutoff time to confirm with them that
   they are willing to have their names on this).

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

10.2.  Informative References

   [I-D.ietf-sidr-iana-objects]
              Manderson, T., Vegoda, L., and S. Kent, "RPKI Objects
              issued by IANA", draft-ietf-sidr-iana-objects-03 (work in
              progress), May 2011.

Appendix A.  Changes / Author Notes.

   [RFC Editor: Please remove this section before publication ]

   From -01 to -02:

   o  Readability / cleanup.

   o  Tried to make it more clear that most implementations now support
      this (although they call it "prefetch").

   From -00 to -01:

   o  Fairly large rewrite.

   o  Added text on the fact that there are implementations that do
      this.

   o  Added the "prefetch" name, cleaned up some readability.

   o  Daniel's text (Section 3.2) added.

   From -template to -00.

   o  Wrote some text.

   o  Changed the name.

Authors' Addresses

   Warren Kumari
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   US

   Email: warren@kumari.net


   Roy Arends
   Nominet
   Edmund Halley Road
   Oxford OX4 4DQ
   United Kingdom

   Email: roy@nominet.org.uk


   Suzanne Woolf
   39 Dodge St. #317
   Beverly, MA 01915
   US

   Email: suzworldwide@gmail.com


   Daniel Migault
   Orange
   38 rue du General Leclerc
   92794 Issy-les-Moulineaux Cedex 9
   France

   Phone: +33 1 45 29 60 52
   Email: daniel.migaultf@orange.com