Network Working Group                                          J. Klensin
Request for Comments: 3467                                  February 2003
Category: Informational

                  Role of the Domain Name System (DNS)

Status of this Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.
Abstract

This document reviews the original function and purpose of the domain name system (DNS). It contrasts that history with some of the purposes for which the DNS has recently been applied and some of the newer demands being placed upon it or suggested for it. A framework for an alternative to placing these additional stresses on the DNS is then outlined. This document and that framework are not a proposed solution, only a strong suggestion that the time has come to begin thinking more broadly about the problems we are encountering and possible approaches to solving them.

Table of Contents

   1.  Introduction and History
       1.1  Context for DNS Development
       1.2  Review of the DNS and Its Role as Designed
       1.3  The Web and User-visible Domain Names
       1.4  Internet Applications Protocols and Their Evolution
   2.  Signs of DNS Overloading
   3.  Searching, Directories, and the DNS
       3.1  Overview
       3.2  Some Details and Comments
   4.  Internationalization
       4.1  ASCII Isn't Just Because of English
       4.2  The "ASCII Encoding" Approaches
       4.3  "Stringprep" and Its Complexities
       4.4  The Unicode Stability Problem
       4.5  Audiences, End Users, and the User Interface Problem
       4.6  Business Cards and Other Natural Uses of Natural Languages
       4.7  ASCII Encodings and the Roman Keyboard Assumption
       4.8  Intra-DNS Approaches for "Multilingual Names"
   5.  Search-based Systems: The Key Controversies
   6.  Security Considerations
   7.  References
       7.1  Normative References
       7.2  Explanatory and Informative References
   8.  Acknowledgements
   9.  Author's Address
   10. Full Copyright Statement

1. Introduction and History

The DNS was designed as a replacement for the older "host table" system. Both were intended to provide names for network resources at a more abstract level than network (IP) addresses (see, e.g., [RFC625], [RFC811], [RFC819], [RFC830], [RFC882]). In recent years, the DNS has become a database of convenience for the Internet, with many proposals to add new features. Only some of these proposals have been successful. Often the main (or only) motivation for using the DNS is because it exists and is widely deployed, not because its existing structure, facilities, and content are appropriate for the particular application of data involved.

This document reviews the history of the DNS, including examination of some of those newer applications. It then argues that the overloading process is often inappropriate. Instead, it suggests that the DNS should be supplemented by systems better matched to the intended applications and outlines a framework and rationale for one such system.

Several of the comments that follow are somewhat revisionist. Good design and engineering often requires a level of intuition by the designers about things that will be necessary in the future; the reasons for some of these design decisions are not made explicit at the time because no one is able to articulate them. The discussion below reconstructs some of the decisions about the Internet's primary namespace (the "Class=IN" DNS) in the light of subsequent development and experience.
In addition, the historical reasons for particular decisions about the Internet were often severely underdocumented contemporaneously and, not surprisingly, different participants have different recollections about what happened and what was considered important. Consequently, the quasi-historical story below is just one story. There may be (indeed, almost certainly are) other stories about how the DNS evolved to its present state, but those variants do not invalidate the inferences and conclusions. This document presumes a general understanding of the terminology of RFC 1034 [RFC1034] or of any good DNS tutorial (see, e.g., [Albitz]).
The pre-DNS network relied on a centrally-maintained table of host names and addresses, distributed as a file that every host was expected to retrieve and install ([RFC625], [RFC811], [RFC952]). The names themselves were restricted to a subset of ASCII [ASCII] chosen to avoid ambiguities in printed form, to permit interoperation with systems using other character codings (notably EBCDIC), and to avoid the "national use" code positions of ISO 646 [IS646]. These restrictions later became collectively known as the "LDH" rules for "letter-digit-hyphen", the permitted characters. The table was just a list with a common format that was eventually agreed upon; sites were expected to frequently obtain copies of, and install, new versions.

The host tables themselves were introduced to:

o  Eliminate the requirement for people to remember host numbers (addresses). Despite apparent experience to the contrary in the conventional telephone system, numeric numbering systems, including the numeric host number strategy, did not (and do not) work well for more than a (large) handful of hosts.

o  Provide stability when addresses changed. Since addresses -- to some degree in the ARPANET and more importantly in the contemporary Internet -- are a function of network topology and routing, they often had to be changed when connectivity or topology changed. The names could be kept stable even as addresses changed.

o  Provide the capability to have multiple addresses associated with a given host to reflect different types of connectivity and topology. Use of names, rather than explicit addresses, avoided the requirement that would otherwise exist for users and other hosts to track these multiple host numbers and addresses and the topological considerations for selecting one over others.

After several years of using the host table approach, the community concluded that model did not scale adequately and that it would not adequately support new service variations. A number of discussions and meetings were held which drew several ideas and incomplete proposals together. The DNS was the result of that effort.
It continued to evolve during the design and initial implementation period, with a number of documents recording the changes (see [RFC819], [RFC830], and [RFC1034]).
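The "LDH" restriction described above can be made concrete with a small validity check. The sketch below (in Python) is illustrative only; the `is_ldh_label` helper and its regular expression are constructions for this example, combining the letter-digit-hyphen character rules, the convention that labels neither begin nor end with a hyphen, and the 63-octet label limit discussed later in this document.

```python
import re

# LDH rule sketch: only ASCII letters, digits, and hyphens are permitted;
# a label must not begin or end with a hyphen; and a DNS label may be at
# most 63 octets long.
LDH_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

def is_ldh_label(label: str) -> bool:
    """Return True if `label` satisfies the letter-digit-hyphen rules."""
    return bool(LDH_LABEL.match(label))
```

Note that this checks a single label, not a full dotted domain name; the host table and DNS conventions apply the rules label by label.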
The goals for the DNS included:

o  Preservation of the capabilities of the host table arrangements (especially unique, unambiguous, host names),

o  Provision for addition of new services (e.g., the special record types for electronic mail routing which quickly followed introduction of the DNS), and

o  Creation of a robust, hierarchical, distributed, name lookup system to accomplish the other goals.

The DNS design also permitted distribution of name administration, rather than requiring that each host be entered into a single, central, table by a central administration.
Another assumption was that there would be a high ratio of physical hosts to second level domains and, more generally, that the system would be deeply hierarchical, with most systems (and names) at the third level or below and a very large percentage of the total names representing physical hosts. There are domains that follow this model: many university and corporate domains use fairly deep hierarchies, as do a few country-oriented top level domains ("ccTLDs"). Historically, the "US." domain has been an excellent example of the deeply hierarchical approach. However, by 1998, comparison of several efforts to survey the DNS showed a count of SOA records that approached (and may have passed) the number of distinct hosts. Looked at differently, we appear to be moving toward a situation in which the number of delegated domains on the Internet is approaching or exceeding the number of hosts, or at least the number of hosts able to provide services to others on the network. This presumably results from synonyms or aliases that map a great many names onto a smaller number of hosts.

While experience up to this time has shown that the DNS is robust enough -- given contemporary machines as servers and current bandwidth norms -- to be able to continue to operate reasonably well when those historical assumptions are not met (e.g., with a flat structure under ".COM" containing well over ten million delegated subdomains [COMSIZE]), it is still useful to remember that the system could have been designed to work optimally with a flat structure (and very large zones) rather than a deeply hierarchical one, and was not. Similarly, despite some early speculation about entering people's names and email addresses into the DNS directly (e.g., see [RFC1034]), electronic mail addresses in the Internet have preserved the original, pre-DNS, "user (or mailbox) at location" conceptual format rather than a flatter or strictly dot-separated one.
Location, in that instance, is a reference to a host. The sole exception, at least in the "IN" class, has been one field of the SOA record. Both the DNS architecture itself and the two-level (host name and mailbox name) provisions for email and similar functions (e.g., see the finger protocol [FINGER]), also anticipated a relatively high ratio of users to actual hosts. Despite the observation in RFC 1034 that the DNS was expected to grow to be proportional to the number of users (section 2.3), it has never been clear that the DNS was seriously designed for, or could, scale to the order of magnitude of number of users (or, more recently, products or document objects), rather than that of physical hosts. Just as was the case for the host table before it, the DNS provided critical uniqueness for names, and universal accessibility to them, as part of overall "single internet" and "end to end" models (cf.
[RFC2826]). However, there are many signs that, as new uses evolved and original assumptions were abused (if not violated outright), the system was being stretched to, or beyond, its practical limits.

The original design effort that led to the DNS included examination of the directory technologies available at the time. The design group concluded that the DNS design, with its simplifying assumptions and restricted capabilities, would be feasible to deploy and make adequately robust, which the more comprehensive directory approaches were not. At the same time, some of the participants feared that the limitations might cause future problems; this document essentially takes the position that they were probably correct. On the other hand, directory technology and implementations have evolved significantly in the ensuing years: it may be time to revisit the assumptions, either in the context of the two- (or more) level mechanism contemplated by the rest of this document or, even more radically, as a path toward a DNS replacement.

Meanwhile, the pressure to register names has been amplified by registrars [REGISTRAR] who have been anxious to "sell" additional names. And, of course, the perception that one needed a second-level (or even top-level) domain per product, rather than having names associated with a (usually organizational) collection of network resources, has led to a rapid acceleration in the number of names being registered. That acceleration has, in turn, clearly benefited registrars charging on a per-name basis, "cybersquatters", and others in the business of "selling" names, but it has not obviously benefited the Internet as a whole.

This emphasis on second-level domain names has also created a problem for the trademark community. Since the Internet is international, and names are being populated in a flat and unqualified space, similarly-named entities are in conflict even if there would ordinarily be no chance of confusing them in the marketplace. The problem appears to be unsolvable except by a choice between draconian measures.
These might include significant changes to the legislation and conventions that govern disputes over "names" and "marks". Or
they might result in a situation in which the "rights" to a name are typically not settled using the subtle and traditional product (or industry) type and geopolitical scope rules of the trademark system. Instead, they would depend largely on political or economic power, e.g., the organization with the greatest resources to invest in defending (or attacking) names will ultimately win out. The latter raises not only important issues of equity, but also the risk of backlash as the numerous small players are forced to relinquish names they find attractive and to adopt less-desirable naming conventions.

Independent of these sociopolitical problems, content distribution issues have made it clear that it should be possible for an organization to have copies of data it wishes to make available distributed around the network, with a user who asks for the information by name getting the topologically-closest copy. This is not possible with simple, as-designed, use of the DNS: DNS names identify target resources or, in the case of email "MX" records, a preferentially-ordered list of resources "closest" to a target (not to the source/user). Several technologies (and, in some cases, corresponding business models) have arisen to work around these problems, including intercepting and altering DNS requests so as to point to other locations. Additional implications are still being discovered and evaluated.

Approaches that involve interception of DNS queries and rewriting of DNS names (or otherwise altering the resolution process based on the topological location of the user) seem, however, to risk disrupting end-to-end applications in the general case and raise many of the issues discussed by the IAB in [IAB-OPES]. These problems occur even if the rewriting machinery is accompanied by additional workarounds for particular applications.
For example, security associations and applications that need to identify "the same host" often run into problems if DNS names or other references are changed in the network without participation of the applications that are trying to invoke the associated services.
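The point above about MX records -- that they give a preference ordering relative to the target domain, not to the requesting user -- can be sketched as follows. The record values and the `delivery_order` helper below are hypothetical illustrations in Python, not part of any DNS implementation.

```python
# Hypothetical MX record set for a domain, as (preference, exchange) pairs.
# Per the DNS design, the ordering is defined from the perspective of the
# *target* domain: a sending mail agent tries the lowest preference value
# first, regardless of which exchange is topologically closest to the sender.
mx_records = [
    (20, "backup-mail.example.net"),
    (10, "mail.example.net"),
    (30, "last-resort.example.org"),
]

def delivery_order(records):
    """Return exchanges in the order a sending MTA would try them."""
    return [host for _pref, host in sorted(records)]
```

Nothing in this resolution step consults the location of the user asking the question, which is exactly why content-distribution schemes resort to intercepting and rewriting queries instead.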
A widely-deployed protocol provides its own inertia: unless the design is so seriously faulty as to prevent effective use (or there is a widely-perceived sense of impending disaster unless the protocol is replaced), future developments must maintain backward compatibility and workarounds for problematic characteristics rather than benefiting from redesign in the light of experience. Applications that are "almost good enough" prevent development and deployment of high-quality replacements.

The DNS is both an illustration of, and an exception to, parts of this pessimistic interpretation. It was a second-generation development, with the host table system being seen as at the end of its useful life. There was a serious attempt made to reflect the computing state of the art at the time. However, deployment was much slower than expected (and very painful for many sites) and some fixed (although relaxed several times) deadlines from a central network administration were necessary for deployment to occur at all. Replacing it now, in order to add functionality, while it continues to perform its core functions at least reasonably well, would presumably be extremely difficult.

There are many, perhaps obvious, examples of this. Despite many known deficiencies and weaknesses of definition, the "finger" and "whois" [WHOIS] protocols have not been replaced (despite many efforts to update or replace the latter [WHOIS-UPDATE]). The Telnet protocol and its many options drove out the SUPDUP protocol [RFC734], which was arguably much better designed for a diverse collection of network hosts. A number of efforts to replace the email or file transfer protocols with models which their advocates considered much better have failed. And, more recently and below the applications level, there is some reason to believe that this resistance to change has been one of the factors impeding IPv6 deployment (see, e.g., section 2 of [RFC3439]).
For example:

o  While technical approaches such as larger and higher-powered servers and more bandwidth, and legal/political mechanisms such as dispute resolution policies, have arguably kept the problems from becoming critical, the DNS has not proven adequately responsive to business and individual needs to describe or identify things (such as product names and names of individuals) other than strict network resources.

o  While stacks have been modified to better handle multiple addresses on a physical interface and some protocols have been extended to include DNS names for determining context, the DNS does not deal especially well with many names associated with a given host (e.g., web hosting facilities with multiple domains on a server).

o  Efforts to add names deriving from languages or character sets based on other than simple ASCII and English-like names (see below), or even to utilize complex company or product names without the use of hierarchy, have created apparent requirements for names (labels) that are over 63 octets long. This requirement will undoubtedly increase over time; while there are workarounds to accommodate longer names, they impose their own restrictions and cause their own problems.

o  Increasing commercialization of the Internet, and visibility of domain names that are assumed to match names of companies or products, has turned the DNS and DNS names into a trademark battleground. The traditional trademark system in (at least) most countries makes careful distinctions about fields of applicability. When the space is flattened, without differentiation by either geography or industry sector, not only are there likely conflicts between "Joe's Pizza" (of Boston) and "Joe's Pizza" (of San Francisco) but between both and "Joe's Auto Repair" (of Los Angeles).
All three would like to control "Joes.com" (and would prefer, if it were permitted by DNS naming rules, to also spell it as "Joe's.com" and have both resolve the same way) and may claim trademark rights to do so, even though conflict or confusion would not occur with traditional trademark principles.

o  Many organizations wish to have different web sites under the same URL and domain name. Sometimes this is to create local variations -- the Widget Company might want to present different material to a UK user relative to a US one -- and sometimes it is to provide higher performance by supplying information from the server topologically closest to the user. If the name resolution mechanism is expected to provide this functionality, there are three possible models (which might be combined):

   -  supply information about multiple sites (or locations or references). Those sites would, in turn, provide information associated with the name and sufficient site-specific attributes to permit the application to make a sensible choice of destination, or

   -  accept client-site attributes and utilize them in the search process, or

   -  return different answers based on the location or identity of the requestor.

   While there are some tricks that can provide partial simulations of these types of function, DNS responses cannot be reliably conditioned in this way.

   These, and similar, issues of performance or content choices can, of course, be thought of as not involving the DNS at all. For example, the commonly-cited alternate approach of coupling these issues to HTTP content negotiation (cf. [RFC2295]) requires that an HTTP connection first be opened to some "common" or "primary" host so that preferences can be negotiated and then the client redirected or sent alternate data. At least from the standpoint of improving performance by accessing a "closer" location, both initially and thereafter, this approach sacrifices the desired result before the client initiates any action. It could even be argued that some of the characteristics of common content negotiation approaches are workarounds for the non-optimal use of the DNS in web URLs.

o  Many existing and proposed systems for "finding things on the Internet" require a true search capability in which near matches can be reported to the user (or to some user agent with an appropriate rule-set) and to which queries may be ambiguous or fuzzy. The DNS, by contrast, can accommodate only one set of (quite rigid) matching rules. Proposals to permit different rules in different localities (e.g., matching rules that are TLD- or zone-specific) help to identify the problem.
But they cannot be applied directly to the DNS without either abandoning the desired level of flexibility or isolating different parts of the Internet from each other (or both). Fuzzy or ambiguous searches are desirable for resolution of names that might have spelling variations and for names that can be resolved into different sets of glyphs depending on context. Especially when internationalization is considered, variant name problems go beyond simple differences in representation of a character or ordering of a string. Instead, avoiding user astonishment and confusion requires consideration of relationships such as languages that can be written with different alphabets, Kanji-Hiragana relationships, Simplified and Traditional Chinese, etc. See [Seng] for a discussion and suggestions for addressing a subset of these issues in the context of characters based on Chinese ones. But that document essentially illustrates the difficulty of providing the type of flexible matching that would be anticipated by users; instead, it tries to protect against the worst types of confusion (and opportunities for fraud).

o  The historical DNS, and applications that make assumptions about how it works, impose significant risk (or force technical kludges and consequent odd restrictions) when one considers adding mechanisms for use with various multi-character-set and multilingual "internationalization" systems. See the IAB's discussion of some of these issues [RFC2825] for more information.

o  In order to provide proper functionality to the Internet, the DNS must have a single unique root (the IAB provides more discussion of this issue [RFC2826]). There are many desires for local treatment of names or character sets that cannot be accommodated without either multiple roots (e.g., a separate root for multilingual names, proposed at various times by MINC [MINC] and others), or mechanisms that would have similar effects in terms of Internet fragmentation and isolation.

o  For some purposes, it is desirable to be able to search not only an index entry (labels or fully-qualified names in the DNS case), but their values or targets (DNS data). One might, for example, want to locate all of the host (and virtual host) names which cause mail to be directed to a given server via MX records. The DNS does not support this capability (see the discussion in [IQUERY]) and it can be simulated only by extracting all of the relevant records (perhaps by zone transfer if the source permits doing so, but that permission is becoming less frequently available) and then searching a file built from those records.

o  Finally, as additional types of personal or identifying information are added to the DNS, issues arise with protection of that information. There are increasing calls to make different information available based on the credentials and authorization of the source of the inquiry. As with information keyed to site locations or proximity (as discussed above), the DNS protocols make providing these differentiated services quite difficult if not impossible.
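The "extract all the records and then search them" workaround described above can be sketched in a few lines. The zone data and the `names_mailed_via` helper below are hypothetical illustrations in Python; a real deployment would first have to obtain the records, for example via a zone transfer where that is still permitted.

```python
# Hypothetical zone data, flattened to (owner, rtype, rdata) tuples as it
# might appear in a file built from extracted records.
zone_records = [
    ("www.example.com.",  "A",  "192.0.2.10"),
    ("example.com.",      "MX", "10 mail.example.com."),
    ("shop.example.com.", "MX", "10 mail.example.com."),
    ("blog.example.com.", "MX", "20 mx.example.net."),
]

def names_mailed_via(records, server):
    """Owners whose MX data points at `server` -- the inverted lookup the
    DNS protocol itself cannot answer (cf. [IQUERY])."""
    return sorted(owner for owner, rtype, rdata in records
                  if rtype == "MX" and rdata.split()[1] == server)
```

The search runs over a local copy of the data, entirely outside the DNS: precisely the kind of operation a directory layer would support natively.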
In each of these cases, it is, or might be, possible to devise ways to trick the DNS system into supporting mechanisms that were not designed into it. Several ingenious solutions have been proposed in many of these areas already, and some have been deployed into the marketplace with some success. But the price of each of these changes is added complexity and, with it, added risk of unexpected and destabilizing problems.

Several of the above problems are addressed well by a good directory system (supported by the LDAP protocol or some protocol more precisely suited to these specific applications) or searching environment (such as common web search engines), although not by the DNS. Given the difficulty of deploying new applications discussed above, an important question is whether the tricks and kludges are bad enough, or will become bad enough as usage grows, that new solutions are needed and can be deployed.

Such a search or directory layer would still need to map, in its final step, onto DNS names (see section 4), but all operations but the final one would involve searching other systems, rather than looking up identifiers in the DNS itself. As explained below, this would permit relaxation of several constraints, leading to a more capable and comprehensive overall system.

Ultimately, many of the issues with domain names arise as the result of efforts to use the DNS as a directory. While, at the time this document was written, sufficient pressure or demand had not occurred to justify a change, it was already quite clear that, as a directory system, the DNS is a good deal less than ideal. This document suggests that there actually is a requirement for a directory system, and that the right solution to a searchable system requirement is a searchable system, not a series of DNS patches, kludges, or workarounds.
The following points illustrate particular aspects of this conclusion.

o  A directory system would not require imposition of particular length limits on names.

o  A directory system could permit explicit association of attributes, e.g., language and country, with a name, without having to utilize trick encodings to incorporate that information in DNS labels (or creating artificial hierarchy for doing so).

o  There is considerable experience (albeit not much of it very successful) in doing fuzzy and "sonex" (similar-sounding) matching in directory systems. Moreover, it is plausible to think about different matching rules for different areas and sets of names so that these can be adapted to local cultural requirements. Specifically, it might be possible to have a single form of a name in a directory, but to have great flexibility about what queries matched that name (and even have different variations in different areas). Of course, the more flexibility that a system provides, the greater the possibility of real or imagined trademark conflicts. But the opportunity would exist to design a directory structure that dealt with those issues in an intelligent way, while DNS constraints almost certainly make a general and equitable DNS-only solution impossible.

o  If a directory system is used to translate to DNS names, and then DNS names are looked up in the normal fashion, it may be possible to relax several of the constraints that have been traditional (and perhaps necessary) with the DNS. For example, reverse-mapping of addresses to directory names may not be a requirement even if mapping of addresses to DNS names continues to be, since the DNS name(s) would (continue to) uniquely identify the host.
o  Solutions to multilingual transcription problems that are common in "normal life" (e.g., two-sided business cards to be sure that recipients trying to contact a person can access romanized spellings and numbers if the original language is not comprehensible to them) can be easily handled in a directory system by inserting both sets of entries.

o  A directory system could be designed that would return, not a single name, but a set of names paired with network-locational information or other context-establishing attributes. This type of information might be of considerable use in resolving the "nearest (or best) server for a particular named resource" problems that are a significant concern for organizations hosting web and other sites that are accessed from a wide range of locations and subnets.

o  Names bound to countries and languages might help to manage trademark realities, while, as discussed in section 1.3 above, use of the DNS in trademark-significant contexts tends to require worldwide "flattening" of the trademark system.

Many of these issues are a consequence of another property of the DNS: names must be unique across the Internet. The need to have a system of unique identifiers is fairly obvious (see [RFC2826]). However, if that requirement were to be eliminated in a search or directory system that was visible to users instead of the DNS, many difficult problems -- of both an engineering and a policy nature -- would be likely to vanish (see section 5).

If a single structure is not needed, then, unlike the DNS, there would be no requirement for a global organization to authorize or delegate operation of portions of the structure.
The "no single structure" concept could be taken further by moving away from simple "names" in favor of, e.g., multiattribute, multihierarchical, faceted systems in which most of the facets use restricted vocabularies. (These terms are fairly standard in the information retrieval and classification system literature; see, e.g., [IS5127].) Such systems could be designed to avoid the need for procedures to ensure uniqueness across, or even within, providers and databases of the faceted entities for which the search is to be performed. (See [DNS-Search] for further discussion.)

While the discussion above includes very general comments about attributes, it appears that only a very small number of attributes would be needed. The list would almost certainly include country and language for internationalization purposes. It might require "charset" if we cannot agree on a character set and encoding, although there are strong arguments for simply using ISO 10646 (also known as Unicode or "UCS", for Universal Character Set) coding in interchange ([UNICODE], [IS10646]). Trademark issues might motivate "commercial" and "non-commercial" (or other) attributes if they would be helpful in bypassing trademark problems. And applications to resource location, such as those contemplated for Uniform Resource Identifiers (URIs) [RFC2396, RFC3305] or the Service Location Protocol [RFC2608], might argue for a few other attributes (as outlined above).
Some of the lessons learned from increased understanding and the dissipation of naive beliefs should be taken as cautions by the wider community: the problems are not simple. Specifically, extracting small elements for solution rather than looking at whole systems may result in obscuring the problems but not solving any problem that is worth the trouble.

The restriction of host names to a small subset of ASCII was not adopted because the network's users all spoke English; it was chosen to maximize interoperability across diverse systems and character codings (see section 1.1). And ASCII isn't sufficient to completely represent English -- there are several words in the language that are correctly spelled only with characters or diacritical marks that do not appear in ASCII.

With a broader selection of scripts, in some examples, case mapping works from one case to the other but is not reversible. In others, there are conventions about alternate ways to represent characters (in the language, not [only] in character coding) that work most of the time, but not always. And there are issues in coding, with Unicode/10646 providing different ways to represent the same character ("character", rather than "glyph", is used deliberately here). And, in still others, there are questions as to whether two glyphs "match", which may be a distance-function question, not one with a binary answer. The IETF approach to these problems is to require pre-matching canonicalization (see the "stringprep" discussion below).

The IETF has resisted the temptations to either try to specify an entirely new coded character set, or to pick and choose Unicode/10646 characters on a per-character basis rather than by using well-defined blocks. While it may appear that a character set designed to meet Internet-specific needs would be very attractive, the IETF has never had the expertise, resources, and representation from critically-important communities to actually take on that job. Perhaps more important, a new effort might have chosen to make some of the many complex tradeoffs differently than the Unicode committee did, producing a code with somewhat different characteristics.
But there is no evidence that doing so would produce a code with fewer problems and side-effects. It is much more likely that making tradeoffs differently would simply result in a different set of problems, which would be equally or more difficult.
RFC2181]), some restrictions are imposed by the requirement that text be interpreted in a case-independent way ([RFC1034], [RFC1035]). More important, most internet applications assume the hostname-restricted "LDH" syntax that is specified in the host table RFCs and as "prudent" in RFC 1035. If those assumptions are not met, many conforming implementations of those applications may exhibit behavior that would surprise implementors and users. To avoid these potential problems, IETF internationalization work has focused on "ASCII-Compatible Encodings" (ACE). These encodings preserve the LDH conventions in the DNS itself. Implementations of applications that have not been upgraded utilize the encoded forms, while newer ones can be written to recognize the special codings and map them into non-ASCII characters. These approaches are, however, not problem-free even if human interface issues are ignored. Among other issues, they rely on what is ultimately a heuristic to determine whether a DNS label is to be considered as an internationalized name (i.e., encoded Unicode) or interpreted as an actual LDH name in its own right. And, while all determinations of whether a particular query matches a stored object are traditionally made by DNS servers, the ACE systems, when combined with the complexities of international scripts and names, require that much of the matching work be separated into a separate, client-side, canonicalization or "preparation" process before the DNS matching mechanisms are invoked [STRINGPREP].
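The ACE mechanics can be seen concretely in Python, whose standard library implements the encoding the IETF ultimately standardized (nameprep plus Punycode with an "xn--" prefix, RFCs 3490-3492, published shortly after this document). The label below is purely illustrative:

```python
# ACE illustration using Python's built-in "idna" codec.  The encoded
# form is pure LDH (letters, digits, hyphen), so non-upgraded software
# can carry it unchanged; upgraded clients recognize the "xn--" marker
# and map it back to non-ASCII characters.

label = "bücher"

ace = label.encode("idna")          # canonicalize, then ASCII-encode
assert ace == b"xn--bcher-kva"      # only LDH characters appear

assert ace.decode("idna") == label  # an upgraded client reverses it

# The "xn--" marker is exactly the heuristic discussed above: a plain
# LDH label that merely happens to begin with "xn--" would be
# misinterpreted as encoded Unicode.
```

Note that the canonicalization step (nameprep) runs in the client before the DNS is ever consulted, which is the client-side "preparation" process the text describes.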
treatment of code positions to more nearly match the distinctions made by Unicode with user perceptions about similarities and differences between characters. But there were intense temptations (and pressures) to incorporate language-specific or country-specific rules. Those temptations, even when resisted, were indicative of parts of the ongoing controversy or of the basic unsuitability of the DNS for fully internationalized names that are visible, comprehensible, and predictable for end users.

There have also been controversies about how far one should go in these processes of preparation and transformation and, ultimately, about the validity of various analogies. For example, each of the following operations has been claimed to be similar to case-mapping in ASCII:

o stripping of vowels in Arabic or Hebrew,

o matching of "look-alike" characters such as upper-case Alpha in Greek and upper-case A in Roman-based alphabets,

o matching of Traditional and Simplified Chinese characters that represent the same words, and

o matching of Serbo-Croatian words whether written in Roman-derived or Cyrillic characters.

A decision to support any of these operations would have implications for other scripts or languages and would increase the overall complexity of the process. For example, unless language-specific information is somehow available, performing matching between Traditional and Simplified Chinese has impacts on Japanese and Korean uses of the same "traditional" characters (e.g., it would not be appropriate to map Kanji into Simplified Chinese).

Even were the IDN-WG's other work to have been abandoned completely, or if it were to fail in the marketplace, the stringprep and nameprep work will continue to be extremely useful, both in identifying issues and problem code points and in providing a reasonable set of basic rules.
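Python's stringprep module exposes the RFC 3454 tables directly, so the "map, then normalize" core of a nameprep-like profile can be sketched in a few lines. This is a deliberately minimal, non-conforming sketch: a real profile must also reject prohibited output and apply the bidirectional-text checks.

```python
import stringprep
import unicodedata

def tiny_nameprep(label: str) -> str:
    """Minimal sketch of the stringprep mapping + normalization steps.

    Omits the prohibited-output and bidi rules a real profile requires.
    """
    mapped = "".join(
        ""                                # table B.1: map to nothing
        if stringprep.in_table_b1(ch)
        else stringprep.map_table_b2(ch)  # table B.2: case-fold for NFKC
        for ch in label
    )
    return unicodedata.normalize("NFKC", mapped)

assert tiny_nameprep("EXAMPLE") == "example"
assert tiny_nameprep("Straße") == "strasse"   # table B.2 maps ß to "ss"
```

Because the DNS demands an exact octet match, both the registering and the querying party must apply precisely the same preparation; any divergence turns into a silent lookup failure.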
Where problems remain, they are arguably not with nameprep, but with the DNS-imposed requirement that its results, as with all other parts of the matching and comparison process, yield a binary "match or no match" answer, rather than, e.g., a value on a similarity scale that can be evaluated by the user or by user-driven heuristic functions.
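The similarity-scale alternative mentioned above can be sketched with a generic string metric. SequenceMatcher is not a linguistically informed measure; it only illustrates the shape of a non-binary answer that a user or a user-driven heuristic could evaluate.

```python
from difflib import SequenceMatcher

# Sketch of a graded answer, in contrast to the DNS's strict
# match/no-match.  The metric here is generic; a real system would
# want language- and script-aware distance functions.

def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0.0, 1.0]."""
    return SequenceMatcher(None, a, b).ratio()

assert similarity("color", "color") == 1.0      # exact match
score = similarity("color", "colour")
assert 0.0 < score < 1.0                        # close, but not identical
# A client could rank candidates by score and let the user choose.
```

A DNS server cannot return such a score; a search or directory layer above the DNS could.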
UTR15]), some additional rules for canonicalization and comparison. Many of those rules and conventions have been factored into the "stringprep" and "nameprep" work, but it is not straightforward to make or define them in a fashion that is sufficiently precise and permanent to be relied on by the DNS. Perhaps more important, the discussions leading to nameprep also identified several areas in which the UTC definitions are inadequate, at least without additional information, to make matching precise and unambiguous. In some of these cases, the Unicode Standard permits several alternate approaches, none of which are an exact and obvious match to DNS needs. That has left these sensitive choices up to IETF, which lacks sufficient in-depth expertise, much less any mechanism for deciding to optimize one language at the expense of another. For example, it is tempting to define some rules on the basis of membership in particular scripts, or for punctuation characters, but there is no precise definition of what characters belong to which script or which ones are, or are not, punctuation. The existence of these areas of vagueness raises two issues: whether trying to do precise matching at the character set level is actually possible (addressed below) and whether driving toward more precision could create issues that cause instability in the implementation and resolution models for the DNS. The Unicode definition also evolves. Version 3.2 appeared shortly after work on this document was initiated. It added some characters and functionality and included a few minor incompatible code point changes. IETF has secured an agreement about constraints on future changes, but it remains to be seen how that agreement will work out in practice. 
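The alternate-representation problem and the UAX #15 normalization forms referenced above can be shown directly: é can be coded as a single code point (U+00E9) or as "e" plus a combining acute accent (U+0301).

```python
import unicodedata

# Two codings of the same character, and the normalization forms that
# relate them.  Raw comparison fails; precise matching depends on both
# sides normalizing the same way -- one of the alternate approaches the
# Unicode Standard leaves open, as discussed in the text.

composed = "\u00e9"        # é, precomposed
decomposed = "e\u0301"     # e + combining acute accent

assert composed != decomposed                                # raw octets differ
assert unicodedata.normalize("NFC", decomposed) == composed  # compose
assert unicodedata.normalize("NFD", composed) == decomposed  # decompose
```

Either form is a legitimate Unicode string, which is why a binary-matching system such as the DNS must mandate one normalization choice for everyone, permanently.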
The prognosis actually appears poor at this stage, since UTC chose to ballot a recent possible change which should have been prohibited by the agreement (the outcome of the ballot is not relevant, only that the ballot was issued rather than having the result be a foregone conclusion). However, some members of the community consider some of the changes between Unicode 3.0 and 3.1 and between 3.1 and 3.2, as well as this recent ballot, to be
evidence of instability and that these instabilities are better handled in a system that can be more flexible about handling of characters, scripts, and ancillary information than the DNS.

In addition, because the systems implications of internationalization are considered out of scope in SC2, ISO/IEC JTC1 has assigned some of those issues to its SC22/WG20 (the Internationalization working group within the subcommittee that deals with programming languages, systems, and environments). WG20 has historically dealt with internationalization issues thoughtfully and in depth, but its status has several times been in doubt in recent years. However, assignment of these matters to WG20 increases the risk of eventual ISO internationalization standards that specify different behavior than the UTC specifications.

figure out where it goes in order to have proof of concept of our solution. Unlike vendors whose sole [business] model is the selling or registering of names, the IETF must produce solutions that actually work, in the applications context as seen by the end user.

o The principle that "they will get used to our conventions and adapt" is fine if we are writing rules for programming languages or an API. But the conventions under discussion are not part of a semi-mathematical system; they are deeply ingrained in culture. No matter how often an English-speaking American is told that the Internet requires that the correct spelling of "colour" be used, he or she isn't going to be convinced. Getting a French-speaker in Lyon to use exactly the same lexical conventions as a French-
speaker in Quebec in order to accommodate the decisions of the IETF or of a registrar or registry is just not likely. "Montreal" is either a misspelling or an anglicization of a similar word with an acute accent mark over the "e" (i.e., using the Unicode character U+00E9 or one of its equivalents). But global agreement on a rule that will determine whether the two forms should match -- and that won't astonish end users and speakers of one language or the other -- is as unlikely as agreement on whether "misspelling" or "anglicization" is the greater travesty.

More generally, it is not clear that the outcome of any conceivable nameprep-like process is going to be good enough for practical, user-level, use. In the use of human languages by humans, there are many cases in which things that do not match are nonetheless interpreted as matching. The Norwegian/Danish character that appears in U+00F8 (visually, a lower-case 'o' overstruck with a forward slash) and the "o-umlaut" German character that appears in U+00F6 (visually, a lower-case 'o' with diaeresis (or umlaut)) are clearly different, and no matching program should yield an "equal" comparison. But they are more similar to each other than either of them is to, e.g., "e". Humans are able to mentally make the correction in context, and do so easily, and they can be surprised if computers cannot do so. Worse, there is a Swedish character whose appearance is identical to the German o-umlaut, and which shares code point U+00F6, but that, if the languages are known and the sounds of the letters or meanings of words including the character are considered, actually should match the Norwegian/Danish use of U+00F8.

This text uses examples in Roman scripts because it is being written in English and those examples are relatively easy to render.
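The ø/ö example can be stated in code points: the two characters are distinct, and no Unicode normalization form relates them, so every character-level comparison yields a hard "no match" -- even in the Swedish/Norwegian case where a human might expect a match.

```python
import unicodedata

# U+00F8 (Norwegian/Danish ø) versus U+00F6 (German/Swedish ö):
# character-level matching can never bring these together, because no
# normalization form maps one toward the other.

o_slash = "\u00f8"    # ø: 'o' with stroke
o_umlaut = "\u00f6"   # ö: 'o' with diaeresis

assert o_slash != o_umlaut
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert (unicodedata.normalize(form, o_slash)
            != unicodedata.normalize(form, o_umlaut))

# NFD decomposes ö into "o" + U+0308 (combining diaeresis), while ø has
# no decomposition at all, so the strings differ under every form.
assert unicodedata.normalize("NFD", o_umlaut) == "o\u0308"
assert unicodedata.normalize("NFD", o_slash) == o_slash
```

Deciding that these two characters *should* sometimes match requires language knowledge that is simply not present at the character-coding level -- the distance-function problem described earlier.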
But one of the important lessons of the discussions about domain name internationalization in recent years is that problems similar to those described above exist in almost every language and script. Each one has its idiosyncrasies, and each set of idiosyncrasies is tied to common usage and cultural issues that are very familiar in the relevant group, and often deeply held as cultural values. As long as a schoolchild in the US can get a bad grade on a spelling test for using a perfectly valid British spelling, or one in France or Germany can get a poor grade for leaving off a diacritical mark, there are issues with the relevant language. Similarly, if children in Egypt or Israel are taught that it is acceptable to write a word with or without vowels or stress marks, but that, if those marks are included, they must be the correct ones, or a user in Korea is potentially offended or astonished by out-of-order sequences of Jamo, systems based on character-at-a-time processing and simplistic matching, with no contextual information, are not going to satisfy user needs.
Users are demanding solutions that deal with language and culture. Systems of identifier symbol-strings that serve specialists or computers are, at best, a solution to a rather different (and, at the time this document was written, somewhat ill-defined) problem. The recent efforts have made it ever more clear that, if we ignore the distinction between the user requirements and narrowly-defined identifiers, we are solving an insufficient problem. And, conversely, the approaches that have been proposed to approximate solutions to the user requirement may be far more complex than simple identifiers require.
for example, fairly easy to imagine populations who use Arabic or Thai scripts and who do not have routine access to scripts or input devices based on Roman-derived alphabets.

RFC2671] or that long names are not really important; that Japanese and Chinese computer users (and others) will either give up their local or IS 2022-based character coding solutions (for which addition of a large fraction of a million new code points to Unicode is almost certainly a necessary, but probably not sufficient, condition) or build leakproof and completely accurate boundary conversion mechanisms; that out-of-band or contextual information will always be sufficient for the "map glyph onto script" problem; and so on. In each case, it is likely that about 80% or 90% of cases will work satisfactorily, but it is unlikely that such partial solutions will be good enough. For example, suppose someone can spell her name 90% correctly, or a company name is matched correctly 80% of the time but the other 20% of attempts identify a competitor: are either likely to be considered adequate?
directory with universal applicability, a single directory that supports locally-tailored search (and, most important, matching) functions, or multiple, locally-determined, directories. Each has its attractions. Any but the first would essentially prevent reverse-mapping (determination of the user-visible name of the host or resource from target information such as an address or DNS name). But reverse mapping has become less useful over the years -- at least to users -- as more and more names have been associated with many host addresses and as CIDR [CIDR] has proven problematic for mapping smaller address blocks to meaningful names.

Locally-tailored searches and mappings would permit national variations on interpretation of which strings matched which other ones, an arrangement that is especially important when different localities apply different rules to, e.g., matching of characters with and without diacriticals. But, of course, this implies that a URL may evaluate properly or not depending on either settings on a client machine or the network connectivity of the user. That is not, in general, a desirable situation, since it implies that users could not, in the general case, share URLs (or other host references) and that a particular user might not be able to carry references from one host or location to another.

And, of course, completely separate directories would permit translation and transliteration functions to be embedded in the directory, giving much of the Internet a different appearance depending on which directory was chosen. The attractions of this are obvious, but, unless things were very carefully designed to preserve uniqueness and precise identities at the right points (which may or may not be possible), such a system would have many of the difficulties associated with multiple DNS roots.
Finally, a system of separate directories and databases, if coupled with removal of the DNS-imposed requirement for unique names, would largely eliminate the need for a single worldwide authority to manage the top of the naming hierarchy.
getting back information, and then doing additional lookups potentially in different subtrees. That multistage lookup will often be the case with, e.g., NAPTR records [NAPTR] unless additional restrictions are imposed. But additional steps, systems, and databases almost certainly involve some additional risks of compromise.

References

[Albitz] Any of the editions of Albitz, P. and C. Liu, DNS and BIND, O'Reilly and Associates, 1992, 1997, 1998, 2001.

[ASCII] American National Standards Institute (formerly United States of America Standards Institute), X3.4, 1968, "USA Code for Information Interchange". ANSI X3.4-1968 has been replaced by newer versions with slight modifications, but the 1968 version remains definitive for the Internet. Some time after ASCII was first formulated as a standard, ISO adopted international standard 646, which uses ASCII as a base. IS 646 actually contained two code tables: an "International Reference Version" (often referenced as ISO 646-IRV), which was essentially identical to the ASCII of the time, and a "Basic Version" (ISO 646-BV), which designates a number of character positions for national use.

[CIDR] Fuller, V., Li, T., Yu, J. and K. Varadhan, "Classless Inter-Domain Routing (CIDR): an Address Assignment and Aggregation Strategy", RFC 1519, September 1993. Eidnes, H., de Groot, G. and P. Vixie, "Classless IN-ADDR.ARPA delegation", RFC 2317, March 1998.

[COM-SIZE] Size information supplied by Verisign Global Registry Services (the zone administrator, or "registry operator", for COM, see [REGISTRAR], below) to ICANN, third quarter 2002.

[DNS-Search] Klensin, J., "A Search-based access model for the DNS", Work in Progress.
[FINGER] Zimmerman, D., "The Finger User Information Protocol", RFC 1288, December 1991. Harrenstien, K., "NAME/FINGER Protocol", RFC 742, December 1977.

[IAB-OPES] Floyd, S. and L. Daigle, "IAB Architectural and Policy Considerations for Open Pluggable Edge Services", RFC 3238, January 2002.

[IQUERY] Lawrence, D., "Obsoleting IQUERY", RFC 3425, November 2002.

[IS646] ISO/IEC 646:1991, Information technology -- ISO 7-bit coded character set for information interchange.

[IS10646] ISO/IEC 10646-1:2000, Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane, and ISO/IEC 10646-2:2001, Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 2: Supplementary Planes.

[MINC] The Multilingual Internet Names Consortium, http://www.minc.org/, has been an early advocate for the importance of expansion of DNS names to accommodate non-ASCII characters. Some of their specific proposals, while helping people to understand the problems better, were not compatible with the design of the DNS.

[NAPTR] Mealling, M. and R. Daniel, "The Naming Authority Pointer (NAPTR) DNS Resource Record", RFC 2915, September 2000. Mealling, M., "Dynamic Delegation Discovery System (DDDS) Part One: The Comprehensive DDDS", RFC 3401, October 2002. Mealling, M., "Dynamic Delegation Discovery System (DDDS) Part Two: The Algorithm", RFC 3402, October 2002. Mealling, M., "Dynamic Delegation Discovery System (DDDS) Part Three: The Domain Name System (DNS) Database", RFC 3403, October 2002.
[REGISTRAR] In an early stage of the process that created the Internet Corporation for Assigned Names and Numbers (ICANN), a "Green Paper" was released by the US Government. That paper introduced new terminology and some concepts not needed by traditional DNS operations. The term "registry" was applied to the actual operator and database holder of a domain (typically at the top level, since the Green Paper was little concerned with anything else), while organizations that marketed names and made them available to "registrants" were known as "registrars". In the classic DNS model, the function of "zone administrator" encompassed both registry and registrar roles, although that model did not anticipate a commercial market in names.

[RFC625] Kudlick, M. and E. Feinler, "On-line hostnames service", RFC 625, March 1974.

[RFC734] Crispin, M., "SUPDUP Protocol", RFC 734, October 1977.

[RFC811] Harrenstien, K., White, V. and E. Feinler, "Hostnames Server", RFC 811, March 1982.

[RFC819] Su, Z. and J. Postel, "Domain naming convention for Internet user applications", RFC 819, August 1982.

[RFC830] Su, Z., "Distributed system for Internet name service", RFC 830, October 1982.

[RFC882] Mockapetris, P., "Domain names: Concepts and facilities", RFC 882, November 1983.

[RFC883] Mockapetris, P., "Domain names: Implementation specification", RFC 883, November 1983.

[RFC952] Harrenstien, K., Stahl, M. and E. Feinler, "DoD Internet host table specification", RFC 952, October 1985.

[RFC953] Harrenstien, K., Stahl, M. and E. Feinler, "HOSTNAME SERVER", RFC 953, October 1985.

[RFC1034] Mockapetris, P., "Domain names - concepts and facilities", STD 13, RFC 1034, November 1987.
[RFC1035] Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, November 1987.

[RFC1591] Postel, J., "Domain Name System Structure and Delegation", RFC 1591, March 1994.

[RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS Specification", RFC 2181, July 1997.

[RFC2295] Holtman, K. and A. Mutz, "Transparent Content Negotiation in HTTP", RFC 2295, March 1998.

[RFC2396] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 2396, August 1998.

[RFC2608] Guttman, E., Perkins, C., Veizades, J. and M. Day, "Service Location Protocol, Version 2", RFC 2608, June 1999.

[RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)", RFC 2671, August 1999.

[RFC2825] IAB, Daigle, L., Ed., "A Tangled Web: Issues of I18N, Domain Names, and the Other Internet protocols", RFC 2825, May 2000.

[RFC2826] IAB, "IAB Technical Comment on the Unique DNS Root", RFC 2826, May 2000.

[RFC2972] Popp, N., Mealling, M., Masinter, L. and K. Sollins, "Context and Goals for Common Name Resolution", RFC 2972, October 2000.

[RFC3305] Mealling, M. and R. Denenberg, Eds., "Report from the Joint W3C/IETF URI Planning Interest Group: Uniform Resource Identifiers (URIs), URLs, and Uniform Resource Names (URNs): Clarifications and Recommendations", RFC 3305, August 2002.

[RFC3439] Bush, R. and D. Meyer, "Some Internet Architectural Guidelines and Philosophy", RFC 3439, December 2002.

[Seng] Seng, J., et al., Eds., "Internationalized Domain Names: Registration and Administration Guideline for Chinese, Japanese, and Korean", Work in Progress.
[STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of Internationalized Strings (stringprep)", RFC 3454, December 2002. The particular profile used for placing internationalized strings in the DNS is called "nameprep", described in Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names", Work in Progress.

[TELNET] Postel, J. and J. Reynolds, "Telnet Protocol Specification", STD 8, RFC 854, May 1983. Postel, J. and J. Reynolds, "Telnet Option Specifications", STD 8, RFC 855, May 1983.

[UNICODE] The Unicode Consortium, The Unicode Standard, Version 3.0, Addison-Wesley: Reading, MA, 2000. Update to version 3.1, 2001. Update to version 3.2, 2002.

[UTR15] Davis, M. and M. Duerst, "Unicode Standard Annex #15: Unicode Normalization Forms", Unicode Consortium, March 2002. An integral part of The Unicode Standard, Version 3.1.1. Available at http://www.unicode.org/reports/tr15/tr15-21.html.

[WHOIS] Harrenstien, K., Stahl, M. and E. Feinler, "NICNAME/WHOIS", RFC 954, October 1985.

[WHOIS-UPDATE] Gargano, J. and K. Weiss, "Whois and Network Information Lookup Service, Whois++", RFC 1834, August 1995. Weider, C., Fullton, J. and S. Spero, "Architecture of the Whois++ Index Service", RFC 1913, February 1996. Williamson, S., Kosters, M., Blacka, D., Singh, J. and K. Zeilstra, "Referral Whois (RWhois) Protocol V1.5", RFC 2167, June 1997. Daigle, L. and P. Faltstrom, "The application/whoispp-query Content-Type", RFC 2957, October 2000. Daigle, L. and P. Faltstrom, "The application/whoispp-response Content-type", RFC 2958, October 2000.
[X29] International Telecommunications Union, "Recommendation X.29: Procedures for the exchange of control information and user data between a Packet Assembly/Disassembly (PAD) facility and a packet mode DTE or another PAD", December 1997.

See http://lists.elistx.com/archives/ for subscription and archival information.
Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.