Network Working Group                                      H. Alvestrand
Request for Comments: 3254                                 Cisco Systems
Category: Informational                                       April 2002

Definitions for talking about directories

Status of this Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract

When discussing systems for making information accessible through the Internet in standardized ways, it may be useful if the people who are discussing them have a common understanding of the terms they use.

For example, a reference to this document would give one the power to agree that the DNS (Domain Name System) is a global lookup repository with perimeter integrity and loose, converging consistency. On the other hand, an LDAP (Lightweight Directory Access Protocol) directory server is a local, centralized repository with both lookup and search capability.

This document discusses one group of such systems which is known under the term, "directories".

SEC].)

- Repository: An amount of data that is accessible through one or more access methods.
- Requester: Entity that may (try to) access data in a repository. Note that no assumption is made that the requester is animal, vegetable, or mineral.

- Maintainer: Entity that causes changes to the data in the repository. Usually, all maintainers are requesters, since they need to look at the data too; however, the roles are distinct.

- Access method: Well-defined series of operations that will cause data available from a repository to be obtained by the requester.

- Site: Entity that hosts all or part of a repository, and makes it available through one or more access methods. A site may in various contexts be a machine, a datacenter, a network of datacenters, or a single device.

This document is not intended to be either comprehensive or definitive, but is intended to give some aid in mutual comprehension when discussing information access methods to be incorporated into Internet Standards-Track documents.
- Replicated repository: A distributed repository where all sites have the same information.

- Cooperative repository: A distributed repository where not all sites have all the information, but where mechanisms exist to get the info to the requester, even when it is not available to the site originally asked.

Note: The term "global" is often a matter of social or legal context; for instance, the E.164 telephone numbering system is global by international treaty, while the debate about whether the Domain Name System is global in fact or just a local repository with ambitions has proved bait for too many discussions to enumerate. Some claim that globality is in the eye of the beholder; "everything is local to some context". When discussing technology, it may be wise to use "very widely deployed" instead.

Note: Locating the repositories changes with the scale of consideration. For instance, the global DNS system is considered a distributed cooperative repository, built out of zone repositories that themselves may be distributed, and are always replicated when distributed.
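The difference between a replicated and a cooperative repository can be illustrated with a small sketch. This is only an illustration under stated assumptions: the `Site` class, its `peers` list, and the `is_replicated` helper are hypothetical names invented here, and forwarding the request to peers is just one possible mechanism for getting the information to the requester.

```python
from typing import Dict, List, Optional, Set


class Site:
    """One site hosting part of a distributed repository (hypothetical model)."""

    def __init__(self, name: str, data: Dict[str, str]) -> None:
        self.name = name
        self.data = dict(data)
        self.peers: List["Site"] = []  # other sites this site can ask

    def lookup(self, key: str, _seen: Optional[Set[str]] = None) -> Optional[str]:
        """Answer locally if possible; otherwise ask peers for the requester."""
        seen = _seen if _seen is not None else set()
        seen.add(self.name)  # remember visited sites so peer cycles terminate
        if key in self.data:
            return self.data[key]
        for peer in self.peers:
            if peer.name not in seen:
                found = peer.lookup(key, seen)
                if found is not None:
                    return found
        return None


def is_replicated(sites: List[Site]) -> bool:
    """A replicated repository: every site holds the same information."""
    return all(site.data == sites[0].data for site in sites)
```

In this sketch, two sites that hold different keys but can reach each other behave as a cooperative repository: a request sent to either site is answered even when the data lives at the other one.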
In database terms, a lookup method corresponds to a query exactly matching a unique key on a table; all other database queries would be classified as "search" methods.

In general, repositories that offer more flexible search methods may also give room for ad-hoc queries, refinements from a previous query, approximate matching and other aids; this may lead to many different combinations of precision and recall.

One may define terms to enumerate what one gets out of these repositories:

. Precision is the degree to which what you asked for is what you wanted (no extraneous information)

. Recall is the ability to assure oneself that all relevant data from the repository is returned

. Type I errors occur when relevant data exists in the repository, but is not returned

. Type II errors occur when irrelevant data is returned with a query result

Note that these concepts can only be applied when the property "relevance" is well defined; that is, it depends on what the repository is used for. A further discussion of these topics is found in [KORFHAGE].

An orthogonal dimension has to do with time:

- Query repositories will answer a request with a response, and once that is over with, will do nothing more.

- Notify repositories will get a request from a user to have information returned at some later time when it becomes available, current or whatever, and will respond at that time with a notification that information is available.

- Subscription repositories are like notify repositories, but will transfer the actual information when available.
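The precision, recall and error terms defined earlier in this section can be made concrete with a short sketch. It assumes the conventional ratio forms of precision and recall, which this memo does not itself spell out, and it treats "relevance" as a set supplied by the caller. The function name `evaluate_result` and the handling of empty sets are assumptions made for illustration.

```python
from typing import Dict, Iterable


def evaluate_result(returned: Iterable[str], relevant: Iterable[str]) -> Dict[str, object]:
    """Score one query result against a caller-supplied set of relevant items."""
    returned_set, relevant_set = set(returned), set(relevant)
    hits = returned_set & relevant_set
    return {
        # Share of the returned items that were wanted (assumed 1.0 if nothing returned).
        "precision": len(hits) / len(returned_set) if returned_set else 1.0,
        # Share of the relevant items that came back (assumed 1.0 if nothing relevant).
        "recall": len(hits) / len(relevant_set) if relevant_set else 1.0,
        # Type I, as defined in this memo: relevant data exists but is not returned.
        "type_i_errors": relevant_set - returned_set,
        # Type II, as defined in this memo: irrelevant data is returned.
        "type_ii_errors": returned_set - relevant_set,
    }
```

A lookup on a unique key either returns the one relevant item or nothing, so in this sketch it can only score 1.0 or fail outright; the intermediate values arise with the more flexible search methods.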
defined as having the same request, using the same credentials, be answered with different data at different sites.

Distributed repositories may have:

- Strict consistency, where the problem above never arises. This is quite difficult; repositories that exhibit this property are usually quite constrained and/or quite expensive.

- Strict internal consistency, where the replies always reflect a consistent picture of the total repository, but some sites may reflect an earlier version of the repository than others.

- Loose, converging consistency, where different parts of the repository may be updated at different times as seen from a single site, but the process is designed in such a way that if one stops making changes to the repository, all sites will sooner or later present the same information.

- Inconsistency, where no guarantee can be made whatsoever.

One interesting variant is subset consistency, where the system is consistent (according to one of the definitions above), but not all questions will be answered at all sites; possibly because different sites have different policies on what they make available (NetNews), or because different sites only need different subsets of the "whole picture" (BGP).

SEC].

Some thoughts, though:

On trust in data: Why do we trust a piece of data to be correct?

- Because it's in the repository (and therefore must have been authorized). This is perimeter (or Eggshell) integrity.

- Because it contains internal integrity checks, usually involving digital signatures by verifiable identities. This is item integrity; the granularity of the integrity and the ability to do integrity checks on the relationships between objects are extremely important and extremely hard to get right, as is establishing the roots of the trust chains.

- Because it fits other available information, and causes the right things to happen when I use it. This is hopeful integrity.

Which integrity model to choose is a matter of evaluating the cost of implementing the integrity (cost), the value to you of integrity of the resource being protected (value), and the impact of cost on doing business (risk).

On access to information, the usual categories apply:

- Open access: Anyone can get the information.

- Property-based access: Access because of what you are, or where you are. For example, limited to "same network", "physically present", or "resolvable DNS name".

- Identity-based access: Access because of who you are (or successfully claim to be). (For example, username/password, personal certificates or other verifiable information.) These are then backed up by a layer specifying what the identity you have proven yourself to be has access to.

- Token-based access: Access because of what you have. Hardware tokens, smartcards, certificates, or capability keys. In this case, access is given to all who can present that credential, without caring about their identity.

The most common approaches are identity-based and open access; however, "what you have" access is commonly used informally in, for example, password-protected FTP or Web sites where the password is shared between all members of a group.
- Read-mostly repositories are designed based on a theory that reads will greatly outnumber updates; this may, for instance, be reflected in relatively slow consistency-updating protocols.

- Read-write repositories assume that the updates and the read operations are of the same order of magnitude.

- Write-mostly repositories are designed to store an incoming stream of data, and when needed reproduce a relevant piece of data from the stream. Typical examples are insurance company databases and audit logs.

DNS] is a global cooperative lookup repository with loose, converging consistency and query capability only. It is either strictly read-only or read-mostly (with Dynamic DNS), has an open access model, and mainly perimeter integrity (some would say hopeful integrity). DNSSEC [DNSSEC] aims to give it item integrity.

The DNS is built out of zone repositories that themselves may be distributed, and are always replicated when distributed.

Note that like many other systems, the DNS has some features that do not fit neatly in the classification; for instance, there is a (deprecated and not widely used) function called IQUERY, which allows a very limited query capability.
If one opens up the box and looks at the relationship between primary and secondary nameservers, that can be seen as a limited form of notify capability, but this is not available to end-users of the total system.

X500] was intended to be a global search repository with loose, converging consistency. It was intended to be read-mostly, perimeter secure and query-capable.

BGP1] is often viewed as a global read-write repository with loose, converging subset consistency (not all routes are carried everywhere) and very limited integrity control, mostly intended to be perimeter integrity based on "access control based on what you are".

One can argue that BGP [BGP2] is better viewed as a global mechanism for updating a set of local read/write repositories, since far from all routing information is carried everywhere, and the decision on what routes to accept is always considered a local policy matter. But from a security model perspective, a lot of the controls are applied at the periphery of the routing system, not at each local repository; this still makes it interesting to consider properties that apply to the BGP system as a whole.

NEWS] is a global read-write repository with loose (non-converging) subset consistency (not all sites carry all articles, and article retention times differ). Between sites it offers subscription capability; to users it offers both search and lookup functionality.

SNMP] agent can be thought of as a local, centralized repository offering lookup functionality. With SNMPv3, it offers all kinds of access models, but "access because of what you have" seems to be the most popular.
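The classifications used in these examples can also be written down as structured records, which makes it easier to compare systems along the same dimensions. The following is a minimal sketch, not part of any specification: the class and field names are hypothetical, only some of the dimensions are modelled, and the two profiles encode only what this memo states about the DNS and about an LDAP directory server (properties the memo leaves unstated are left as `None`).

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Scope(Enum):
    GLOBAL = "global"
    LOCAL = "local"


class Distribution(Enum):
    CENTRALIZED = "centralized"
    REPLICATED = "replicated"
    COOPERATIVE = "cooperative"


class Retrieval(Enum):
    LOOKUP = "lookup"
    SEARCH = "search"


class Consistency(Enum):
    STRICT = "strict"
    STRICT_INTERNAL = "strict internal"
    LOOSE_CONVERGING = "loose, converging"


class Integrity(Enum):
    PERIMETER = "perimeter"
    ITEM = "item"
    HOPEFUL = "hopeful"


@dataclass(frozen=True)
class RepositoryProfile:
    """Hypothetical record of where a repository sits on each dimension."""
    name: str
    scope: Scope
    distribution: Distribution
    retrieval: FrozenSet[Retrieval]
    consistency: Optional[Consistency] = None  # None: not stated in the text
    integrity: Optional[Integrity] = None      # None: not stated in the text

    def describe(self) -> str:
        kinds = " and ".join(sorted(r.value for r in self.retrieval))
        text = f"{self.name}: {self.scope.value}, {self.distribution.value} {kinds} repository"
        extras = []
        if self.consistency is not None:
            extras.append(f"{self.consistency.value} consistency")
        if self.integrity is not None:
            extras.append(f"{self.integrity.value} integrity")
        if extras:
            text += " with " + " and ".join(extras)
        return text


DNS = RepositoryProfile(
    "DNS", Scope.GLOBAL, Distribution.COOPERATIVE,
    frozenset({Retrieval.LOOKUP}),
    Consistency.LOOSE_CONVERGING, Integrity.PERIMETER,
)
LDAP_SERVER = RepositoryProfile(
    "LDAP server", Scope.LOCAL, Distribution.CENTRALIZED,
    frozenset({Retrieval.LOOKUP, Retrieval.SEARCH}),
)
```

Calling `DNS.describe()` or `LDAP_SERVER.describe()` renders each profile as a one-line summary in roughly the wording used in the Abstract.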
section 2.4 when designing repositories or repository access protocols.

[SEC]       Shirey, R., "Internet Security Glossary", FYI 36, RFC 2828, May 2000.

[DNS]       Mockapetris, P., "Domain names - concepts and facilities", STD 13, RFC 1034, November 1987.

[DNSSEC]    Eastlake, D., "Domain Name System Security Extensions", RFC 2535, March 1999.

[E164]      ITU-T Recommendation E.164/I.331 (05/97): The International Public Telecommunication Numbering Plan. 1997.

[BGP1]      "Analyzing the Internet's BGP Routing Table", published in "The Internet Protocol Journal", Volume 4, No 1, April 2001. At the time of writing, available at http://www.telstra.net/gih/papers/ipj/4-1-bgp.pdf

[BGP2]      Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC 1771, March 1995.

[NEWS]      Kantor, B. and P. Lapsley, "Network News Transfer Protocol", RFC 977, February 1986.

[SNMP]      Case, J., Mundy, R., Partain, D. and B. Stewart, "Introduction to Version 3 of the Internet-standard Network Management Framework", RFC 2570, April 1999.

[X500]      Weider, C. and J. Reynolds, "Executive Introduction to Directory Services Using the X.500 Protocol", FYI 13, RFC 1308, March 1992.

[KORFHAGE]  Korfhage, R., "Information Storage and Retrieval", Wiley, 1997. See page 194 for "precision" and "recall" definitions.
Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.