Network Working Group                                 D. Waltermire, Ed.
Internet-Draft                                                      NIST
Intended status: Informational                              May 16, 2012
Expires: November 17, 2012


           Automated XML Content Data Exchange and Management
                 draft-waltermire-content-repository-00

Abstract

   TBD...

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 17, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
     1.2.  Terms
       1.2.1.  Content
       1.2.2.  Security Automation Content
       1.2.3.  Content Producer
       1.2.4.  Content Consumer
       1.2.5.  Content Bundle
   2.  Key Concepts
     2.1.  The Content Metadata Model
     2.2.  Content federation
   3.  IANA Considerations
   4.  Security Considerations
   5.  References
     5.1.  Normative References
     5.2.  Informative References
   Appendix A.  Additional Stuff
   Author's Address

1.  Introduction

   Data-driven programming is a common paradigm in software engineering.
   In this approach, a program is written to process a series of data
   statements that describe the sequence of actions to be taken. These
   data statements, often referred to as content, give the user dynamic
   control over the behavior of the software without changing the
   program itself. In many cases, this approach leads to a
   proliferation of content. Without adequate content management and
   distribution capabilities, using that content can become impractical.

   It is common practice today to format content using the Extensible
   Markup Language (XML). While many content management solutions exist
   today, few are designed to support the management and distribution
   of XML-based content. Current solutions largely focus on exploiting
   either the raw XML syntax or a specific data model. Some solutions,
   such as XML databases, expose the raw XML syntax for querying using
   techniques like XQuery. Other solutions use specialized database
   schemas designed to support one or more specific data models
   represented in XML using XML Schema. These solutions are often
   brittle: they adapt poorly to revisions of the underlying data
   models, and they do not adequately represent the logical information
   components used within data-driven programs.

   XML-based data-driven content is produced by many organizations in a
   range of formats, covering many different information domains.
   Where content repositories exist to support this content, they often
   operate independently and vary in the data models and capabilities
   they support. These repositories rarely interact, and when they do,
   it is through proprietary interfaces. Content consumers often have to
   manually download the content they want to use with their tools. In
   many cases they also want to customize this content for local use,
   and must then manage updates to that content manually.

   One example of where data-driven programming is used is the IT
   Security Automation community. Standardized security automation
   content provides the instructions that security tools need to
   examine a computer's state in order to evaluate and report on its
   degree of compliance with configuration policies, to detect the
   presence of vulnerabilities, and to verify the installation state of
   patches. Other tools use data-driven content to collect and
   correlate digital events or to aggregate security information. Much
   of the focus in the security automation community has been on
   defining the standards and schemas for expressing security-related
   data in XML. Standardizing the methods for retrieving and exchanging
   security automation content has not been a primary area of focus.

   The content management challenges introduced by diverse data models,
   decentralized production and use of content, and the proprietary
   nature of today's content repositories create a need to define
   common content exchange requirements and mechanisms that complement
   the content specifications and XML schemas.

   This specification addresses the following challenges:

   Distribution

      In the absence of a standardized, automated distribution
      mechanism, content producers have no way to notify content
      consumers when new or updated content is available. Content
      consumers must manually import content at the point of use.
      This specification defines an automated notification mechanism
      that informs content consumers when new or updated content is
      available. The specification also defines the technical
      mechanisms used to exchange content between repositories,
      providing a standardized delivery mechanism that makes remotely
      published content available at the point of use.

   Reuse

      Without a standardized method to search for, retrieve, and use
      existing content, both content consumers and producers tend to
      recreate content. This duplication often causes content to
      become static or stale, introduces errors, and reduces the
      efficiency of content development. To make content more
      reusable, this specification provides mechanisms for querying
      content so that it can be searched for and gathered from many
      content providers. This allows organizations that develop
      content to leverage, extend, and customize existing content from
      a variety of sources. This specification also defines a stable
      method of identifying blocks of externally provided content,
      enabling that content to be referenced remotely. This approach
      supports reuse and reduces the need for manual duplication across
      repositories.

   Interoperability

      Content repositories may require proprietary clients or tools to
      access their content. This hampers a content consumer's ability
      to retrieve content from a variety of content sources using a
      single tool implementation. This specification standardizes the
      methods used to publish content to, and retrieve content from, a
      content repository, enabling standardized clients to be
      developed.

      Access to content repositories may be restricted or may require
      the use of various standard or proprietary communication
      protocols (e.g., HTTP, FTP). Content is often packaged using
      various file formats and compression algorithms, such as Zip,
      CAB, or GZIP.
      Variation in these approaches hampers interoperability. This
      specification promotes interoperability by standardizing the
      communication protocol and distribution formats used.

   Content Packaging

      XML-based content is exchanged as XML documents, also called
      instances. This document-centric view of information does not
      align well with how humans use information. Humans are more
      comfortable working with logical objects that represent a concept
      (e.g., a rule, an assessment check, a logical construct) than with
      XML syntax. While XML Schema enables these concepts to be
      modeled, XML is still represented as a collection of elements and
      attributes. This specification defines a metamodel that
      identifies the logical objects represented in XML-based content
      and their boundaries within the XML model, enabling content
      repositories to work with this conceptual view of the content.

      This technique enables XML instances to be treated as containers
      of conceptual constructs. These conceptual constructs can be
      exchanged individually and can be composed dynamically into new
      documents based on metadata rules. This specification will
      provide a metadata-based methodology for gathering and packaging
      content according to the needs or interests of the content
      consumer.

   Integrity

      Content consumers need assurance that the content they receive
      has not been modified during the exchange process. This
      specification defines the use of automated mechanisms for
      verifying the integrity of exchanged content.

   Confidentiality

      In some scenarios, it is necessary to secure the exchange of
      content or to restrict access to specific content. This
      specification will detail mechanisms for securing
      repository-to-repository and client-to-repository communications.
      Additionally, this specification will define authorization
      mechanisms that allow access to content to be restricted where
      needed.

   Content Version Management

      Content managed by content repositories often undergoes revision.
      When revisions occur, it is important to be able to query for a
      specific revision in order to maintain the integrity of content
      bundles. This specification provides a query method that
      retrieves either a specific revision or the latest revision. This
      approach also allows a remote reference to include both a content
      identifier and a specific revision.

   Model Revision Management

      Content repositories are often built around a specific revision of
      a data specification. With this approach, updating content
      repository software to support new specification revisions can be
      costly and time-consuming. Because of this burden, organizations
      maintaining content repositories may be reluctant to adopt new
      revisions or to continue supporting old ones. This makes it
      difficult for a tool to use content based on an older or newer
      model revision. This specification defines properties within the
      metadata model that indicate where content is backward and
      forward compatible. These properties are then used either to
      provide content that matches the required model revision or to
      drive proper error handling where content is incompatible.

      For example, the Open Vulnerability and Assessment Language
      (OVAL) versions content according to the major and minor revision
      of the OVAL XML schema. A repository of OVAL content may hold
      content ranging from OVAL 5.3 to 5.10. The differences between
      these model versions, while minor, could impair a security tool's
      ability to properly process content outside its expected range,
      causing tool errors or unexpected results.
      By using the model revision properties in the metamodel, the
      effective model revision of content returned from a content
      repository can be calculated from the maximum schema revision
      used. Alternatively, substitute content may be provided that
      supports a specific maximum schema revision given in the query.

   By addressing these challenges, content producers will be able to
   effectively manage and share the content they produce, and content
   consumers will be able to effectively use content from many
   different providers. By defining communication interfaces that
   leverage existing communication protocols, we can begin to automate
   content distribution among disparate systems and make content more
   readily available. By defining a federated data model, we can
   establish rules and relationships among data types that allow
   flexible content management, with support for dynamic methods of
   collecting and bundling content for consumers.

   Sections [...] of this document focus on:

   TBD

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Terms

1.2.1.  Content

1.2.2.  Security Automation Content

1.2.3.  Content Producer

1.2.4.  Content Consumer

1.2.5.  Content Bundle

2.  Key Concepts

   This section provides a high-level overview of key concepts
   introduced in this specification. The first subsection describes a
   content metamodel that provides the necessary level of abstraction
   over XML-based data models. The second subsection describes the
   federated content architectural approach defined within this
   specification.
   Together, these concepts make possible a robust, general-purpose,
   distributed content management system that supports automated
   content exchange between content consumers and producers.

2.1.  The Content Metadata Model

   To create a generalized approach to XML-based content management, it
   is necessary to generalize how the content system processes
   XML-based data. A variety of XML schema languages are used to define
   the syntax for expressing a data model in XML. While these languages
   provide rules that constrain XML instance data, they do not
   adequately describe the information objects that exist within the
   model or the relationships between information objects. An
   information object is a block of XML data that represents a specific
   concept, such as a policy definition, a configuration setting, or a
   scanning rule. Relationships represent cross-references or links
   between information objects. Information objects and relationships
   are concepts that humans use to conceptualize the data model
   primitives that exist within content. For a content management
   approach to be successful, a mechanism is needed that bridges the
   gap between the XML syntax understood by machines and the conceptual
   primitives that humans understand. The content metadata model
   provides this bridge.

   Within the content metamodel, an information object is represented
   as an entity definition.

   Complete this section...

2.2.  Content federation

   Complete this section...

   Discuss

      Use of namespaces within content identifiers for repository lookup
      using DNS SRV records. Discuss using external namespaces for other
      cases.

      Discuss authoritative content repositories vs. caching repository
      content.

      Discuss using an architectural model similar to DNS for content
      repositories (e.g., local, forwarding, caching).

3.  IANA Considerations

   This memo includes no request to IANA.

4.  Security Considerations

   All drafts are required to have a security considerations section.
   See RFC 3552 [RFC3552] for a guide.

5.  References

5.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

5.2.  Informative References

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              July 2003.

Appendix A.  Additional Stuff

   This becomes an Appendix if needed.

Author's Address

   David Waltermire (editor)
   National Institute of Standards and Technology
   100 Bureau Drive
   Gaithersburg, Maryland  20877
   USA

   Phone:
   Email: david.waltermire@nist.gov