template                                                       W. Kumari
Internet-Draft                                                    Google
Intended status: Informational                                 R. Arends
Expires: January 5, 2015                                         Nominet
                                                                S. Woolf

                                                              D. Migault
                                                                  Orange
                                                            July 4, 2014


        Highly Automated Method for Maintaining Expiring Records
                     draft-wkumari-dnsop-hammer-01

Abstract

   This document describes a simple DNS cache optimization which keeps
   the most popular records in the DNS cache: Highly Automated Method
   for Maintaining Expiring Records (HAMMER).
   The principle is that records in the cache are fetched, that is to
   say resolved, before their TTL expires and the record is flushed
   from the cache. By fetching records before they are queried by an
   end user, HAMMER is expected to improve the quality of experience of
   end users as well as to optimize the resources involved in large
   DNSSEC resolving platforms.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 5, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements notation
   2.  Terminology
   3.  Motivations
     3.1.  Improving browsing Quality of Experience by reducing
           response time
     3.2.  Optimize the resources involved in large DNSSEC resolving
           platforms
   4.  Overview of Operation
   5.  Known implementations
     5.1.  Unbound (NLNet Labs)
     5.2.  OpenDNS
     5.3.  ISC BIND
   6.  An example / reference implementation
     6.1.  Variables
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Changes / Author Notes.
   Authors' Addresses

1.  Introduction

   A recursive DNS resolver may cache a Resource Record (RR) for, at
   most, the Time To Live (TTL) associated with that record. While the
   TTL is greater than zero, the resolver may respond to queries from
   its cache, but once the TTL has reached zero, the resolver flushes
   the RR. When the resolver gets another query for that resource, it
   needs to initiate a new query. The answer is then cached and
   returned to the querying client.
   This document discusses an optimization (Highly Automated Method
   for Maintaining Expiring Records (HAMMER), also known as "prefetch")
   to help keep popular responses in the cache, by fetching new
   responses before the TTL expires. This behavior is triggered by an
   incoming query, and only shortly before the cache entry is due to
   expire.

1.1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Terminology

   - HAMMER resolver: A DNS resolver that implements the HAMMER
     mechanism.

   - HAMMER FQDN: An FQDN that is a candidate for the HAMMER process.

   - HAMMER TIME: The remaining TTL below which the HAMMER mechanism
     is triggered.

3.  Motivations

   When a recursive resolver responds to a client, it either responds
   from cache, or it initiates an iterative query to resolve the answer,
   caches the answer and then responds with that answer.

3.1.  Improving browsing Quality of Experience by reducing response time

   Any end user querying a fetched FQDN will get the response from the
   cache of the resolver. This provides faster responses, and thus
   improves end user experience for browsing and other applications
   and activities.

   Popular FQDNs are highly queried, and end users have high
   expectations in terms of application response for these FQDNs. With
   regular DNS rules, once the FQDN has been flushed from the cache,
   the resolver waits for the next end user to request the FQDN before
   initiating a resolution for this FQDN with iterative queries. This
   results in at least one end user waiting for this resolution to be
   performed over the Internet before the response is sent.
   This may result in a poor user experience, since DNS response times
   over the Internet are unpredictable at best and the response takes
   longer than usual.

   In some cases, not only the first end user querying that FQDN may be
   impacted, but also other end users that request the FQDN between the
   time the FQDN TTL expires and the time the cache is again filled. In
   this case, the result is an impact on multiple end users and
   possibly unnecessary load on the platform. Note that this load is
   increased by the use of DNSSEC, since DNSSEC may involve additional
   resolutions, larger payloads, and signature checks.

   DNS response time for a resolution over the Internet is highly
   unpredictable as it depends on network congestion and servers'
   availability. Links share their bandwidth, so heavily loaded links
   result in higher response time, regardless of whether the congestion
   occurs close to the resolver, close to the client, or close to the
   authoritative servers. Loaded switches or routers may result in
   packet drop, which requires the resolver to notice the packet has
   been dropped (usually with a time out) and restart the iterative
   resolution. These issues are made worse by the use of DNSSEC, which
   makes DNS packets larger. Similarly, loaded servers have longer
   response times.

3.2.  Optimize the resources involved in large DNSSEC resolving
      platforms

   Large resolving platforms are often composed of a set of independent
   resolving nodes. The traffic is usually load balanced based on the
   query source IP addresses. This results in the most popular FQDNs
   being resolved independently by all nodes. First, this increases the
   number of end users who may experience unnecessary latency. Also,
   when DNSSEC is used, all nodes independently perform signature check
   operations, possibly resulting in high loads on the authoritative
   server.

   The challenge these large DNSSEC resolving platforms have to overcome
   is to distribute load uniformly across the nodes, given that end
   users and FQDNs do not consume resources uniformly. More
   specifically, FQDNs and end users usually present Zipf popularity
   distributions, which means that most of the traffic is generated by
   a small set of end users and concerns a small set of FQDNs.

   With plain DNS, large resolving platforms achieved uniformly
   balanced traffic among the nodes. In fact the resolving traffic on
   the Internet interface was rather small (at least in terms of CPU)
   compared to traffic received from the end users. DNSSEC changed
   this, as CPUs are involved in performing signature checks. One way
   to reduce the number of DNSSEC resolutions is to fetch the most
   popular FQDNs into the nodes' caches. This avoids parallel
   resolutions and overall reduces cost, because signature checks are
   not performed, while benefiting from the already existing load
   balancing architecture. This architecture takes advantage of the
   Zipf distribution of the FQDNs' popularity. In fact, caching a small
   number of FQDNs (a few thousand) can address most of the traffic
   (up to 70%).

   Note that to perform a single resolution for the global platform,
   nodes may be configured as forwarders for the most popular FQDNs.

4.  Overview of Operation

   When an incoming query is received, and the result is in the cache,
   the query is answered from the cache. If the remaining TTL of the
   record is below some threshold, the recursive server will also
   initiate a cache fill operation in the background to refresh the
   cache entry.

   The fact that the behavior is triggered by an incoming query (and not
   by periodically scanning the cache and refreshing all entries that
   are about to expire) allows unpopular names to age out of the cache
   naturally, while keeping popular entries in the cache.
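
   The decision made on each cache hit can be sketched as follows. This
   is only an illustrative sketch in Python, not the behavior of any
   particular resolver: the CacheEntry type, its field names, and the
   handle_cached_query function are hypothetical, and the thresholds
   use the HAMMER_TIME and STOP variables and defaults given in
   Section 6.1.

```python
from dataclasses import dataclass

HAMMER_TIME = 2  # seconds before expiry; default suggested in Section 6.1
STOP = 3         # multiplier of HAMMER_TIME; default suggested in Section 6.1


@dataclass
class CacheEntry:
    """Hypothetical cache record; the field names are illustrative only."""
    original_ttl: int         # TTL in seconds when the answer was cached
    expires_at: float         # absolute expiry time, in seconds
    refreshing: bool = False  # True once a background fill has been started


def handle_cached_query(entry, now, hammer_time=HAMMER_TIME, stop=STOP):
    """Return (answer_from_cache, start_background_fill) for one query."""
    remaining = entry.expires_at - now
    if remaining <= 0:
        # Entry has expired: fall back to the normal resolution path.
        return (False, False)
    if entry.original_ttl < stop * hammer_time:
        # "Can't touch this": the original TTL is too short to prefetch.
        return (True, False)
    if entry.refreshing:
        # A cache fill is already under way; do not start another one.
        return (True, False)
    if remaining < hammer_time:
        # Query arrived shortly before expiry: answer from the cache and
        # refresh the entry in the background.
        entry.refreshing = True
        return (True, True)
    return (True, False)
```

   With these defaults, an entry cached with a 300 second TTL triggers
   at most one background fill, on the first query that arrives in its
   final 2 seconds. An entry with a 5 second TTL never triggers one,
   since 5 is less than STOP * HAMMER_TIME = 6.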

5.  Known implementations

   [Ed: Well, this is kinda embarrassing. This idea occurred to us one
   day while sitting around a pool in New Hampshire. It then took a
   while before I wrote it down, mostly because I *really* wanted to get
   "Stop! Hammer Time!" into a draft. Anyway, we presented it in
   Berlin, and Wouter Wijngaards stood up and mentioned that Unbound
   already does this (they use a percentage of TTL, instead of a number
   of seconds). Then we heard from OpenDNS that they *also* implement
   something similar. Then we had a number of discussions, then got
   sidetracked into other things. Anyway, BIND, as of 9.10 (around Feb
   2014), now implements something like this
   (https://deepthought.isc.org/article/AA-01122/0/Early-refresh-of-cache-records-cache-prefetch-in-BIND-9.10.html),
   and enables it by default. Unfortunately, while BIND uses the
   time-based approach, they named their parameters "trigger" and
   "eligibility" - and shouting "Eligibility! Trigger time!" simply
   isn't funny (unless you have a very odd sense of humor...). So, we
   are now documenting implementations that existed before this was
   published and an implementation that we think was based on this. We
   think that this has value to the community. I'm also leaving in the
   HAMMER TIME bit, because it makes me giggle. The section below
   should be filled out with more detail, in collaboration with the
   implementors, but this is being written *just* before the draft
   cutoff.]

   A number of recursive resolvers implement techniques similar to the
   techniques described in this document. This section documents some
   of these and the tradeoffs they make in picking their techniques.

5.1.  Unbound (NLNet Labs)

   The Unbound validating, recursive, and caching DNS resolver
   implements a HAMMER-type feature, called "prefetch". This feature
   can be enabled or disabled through the configuration option
   "prefetch: ".
   When enabled, Unbound will fetch expiring records when their
   remaining TTL is less than 10% of their original TTL.

   [Ed: Unbound's "prefetch" function was developed independently,
   before this draft was written. The authors were unaware of it when
   writing the document.]

5.2.  OpenDNS

   The public DNS resolver OpenDNS implements a prefetch-like solution.

   [Ed: Will work with OpenDNS to get more details.]

5.3.  ISC BIND

   As of version 9.10, Internet Systems Consortium's BIND implements the
   HAMMER functionality. This feature is enabled by default.

   The functionality is configured using the "prefetch" options
   statement, with two parameters:

   Trigger  This is equivalent to the HAMMER_TIME parameter described
      below.

   Eligibility  This is equivalent to the STOP parameter described
      below.

6.  An example / reference implementation

   When a recursive resolver that implements HAMMER receives a query for
   information that it has in the cache, it responds from the cache.

   If the queried FQDN is a HAMMER FQDN, the HAMMER resolver compares
   the remaining TTL to the HAMMER TIME, and checks whether the FQDN
   has already been fetched.

   If the HAMMER FQDN has already been fetched (or provisioned) then
   nothing is done.

   If the HAMMER FQDN has not yet been fetched and the TTL is less than
   the HAMMER_TIME, the HAMMER resolver starts a resolution for the
   queried FQDN in order to fill the cache, just as if the TTL had
   expired. During this cache fill operation the resolver continues to
   respond from cache (until the TTL expires). When the cache fill
   query completes, the new response replaces the existing cached
   information. This ensures the cache has fresh data for subsequent
   queries.

   Since the cache fill query is initiated before the existing cached
   entry expires (and is flushed), responses will come from the cache
   more often.
   This decreases the client resolution latency and improves the user
   experience.

   The cache fill resolution is triggered by an incoming query (and only
   if that query arrives shortly before the record would expire anyway).
   This effectively keeps the most popular, uniformly queried data in
   the cache, without having to maintain counters in the cache or
   proactively resolve responses that are not likely to be needed as
   often. This is purely an implementation optimization: resolvers
   always have the option to cache records for less than the TTL (for
   example, when running low on cache space); this simply triggers a
   refresh of the RR before it expires.

   Note that non-uniformly queried FQDNs may be popular and yet not
   benefit from the HAMMER mechanism. For example, an FQDN with a
   30 minute TTL may be heavily queried during the first 10 minutes of
   every hour. In that case DNS queries are not expected to arrive
   between TTL - HAMMER_TIME and TTL, so no cache fill is triggered.

   HAMMER FQDNs with a small TTL may generate a cache fill process even
   though they are not very popular. Suppose an end user is setting up
   a specific session which requires multiple DNS resolutions of a
   given FQDN. These resolutions are necessary for a short period of
   time, i.e., the time needed to establish the session. If these
   FQDNs have been set with a small TTL, on the order of the session
   establishment time, the multiple queries to a HAMMER resolver may
   trigger an unnecessary resolution. As a result, HAMMER would not
   scale to thousands of these FQDNs. More generally, if the original
   TTL of the RR is less than (or close to) HAMMER_TIME, the described
   method could cause excessive cache fill queries to occur. In order
   to prevent this, an additional variable named STOP (described below)
   is introduced.
   If the original TTL of the RR is less than STOP * HAMMER_TIME, then
   the cache entry should be marked with a "Can't touch this" flag, and
   the described method should not be used.

6.1.  Variables

   These are the mandatory variables:

   - HAMMER_TIME: is the number of seconds before TTL expiration that a
     cache fill query should be initiated. This should be a user
     configurable value. A default of 2 seconds is RECOMMENDED.

   - STOP: is the multiplier applied to HAMMER_TIME to obtain the
     minimum original TTL for which the described method is used (see
     Section 6). This should be a user configurable variable. A
     default of 3 is recommended.

   Implementations may consider additional variables. These are not
   mandatory but would address specific uses of HAMMER.

   - HAMMER_MATCH: should be a user configurable variable. It defines
     the FQDNs to which HAMMER is applied. How HAMMER_MATCH is
     expressed is implementation dependent: it can be a list of FQDNs,
     a matching rule on the FQDNs, or a number X designating the X most
     popular FQDNs as HAMMER FQDNs.

   - HAMMER_FORWARDER: should be a user configurable variable. It is
     optional and designates the DNS server the resolver forwards
     the request to.

7.  IANA Considerations

   This document makes no request of the IANA.

8.  Security Considerations

   This technique leverages existing protocols, and should not introduce
   any new risks, other than a slight increase in traffic.

   By initiating cache fill queries before the existing RR has expired,
   this technique will slightly increase the number of queries seen by
   authoritative servers. This increase will be inversely proportional
   to the average TTL of the records that they serve.

   It is unlikely, but possible, that this increase could cause a
   denial of service condition.

9.  Acknowledgements

   The authors wish to thank Tony Finch and MC Hammer. We also wish to
   thank Brian Somers and Wouter Wijngaards for telling us that they
   already do this :-) (They should probably be co-authors, but I left
   this too close to the draft cutoff time to confirm with them that
   they are willing to have their names on this).

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2.  Informative References

   [I-D.ietf-sidr-iana-objects]
              Manderson, T., Vegoda, L., and S. Kent, "RPKI Objects
              issued by IANA", draft-ietf-sidr-iana-objects-03 (work in
              progress), May 2011.

Appendix A.  Changes / Author Notes.

   [RFC Editor: Please remove this section before publication ]

   From -00 to 01:

   o  Fairly large rewrite.

   o  Added text on the fact that there are implementations that do
      this.

   o  Added the "prefetch" name, cleaned up some readability.

   o  Daniel's test (Section 3.2) added.

   From -template to -00.

   o  Wrote some text.

   o  Changed the name.

Authors' Addresses

   Warren Kumari
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   US

   Email: warren@kumari.net


   Roy Arends
   Nominet
   Edmund Halley Road
   Oxford OX4 4DQ
   United Kingdom

   Email: roy@nominet.org.uk


   Suzanne Woolf
   39 Dodge St. #317
   Beverly, MA 01915
   US

   Email: suzworldwide@gmail.com


   Daniel Migault
   Orange
   38 rue du General Leclerc
   92794 Issy-les-Moulineaux Cedex 9
   France

   Phone: +33 1 45 29 60 52
   Email: daniel.migaultf@orange.com