4. Formation and Processing of Language Tags
This section addresses how to use the information in the registry
with the tag syntax to choose, form, and process language tags.
4.1. Choice of Language Tag
One is sometimes faced with the choice between several possible tags
for the same body of text.
Interoperability is best served when all users use the same language
tag in order to represent the same language. If an application has
requirements that make the rules here inapplicable, then that
application risks damaging interoperability. It is strongly
RECOMMENDED that users not define their own rules for language tag
choice.
Subtags SHOULD only be used where they add useful distinguishing
information; extraneous subtags interfere with the meaning,
understanding, and processing of language tags. In particular, users
and implementations SHOULD follow the 'Prefix' and 'Suppress-Script'
fields in the registry (defined in Section 3.1): these fields provide
guidance on when specific additional subtags SHOULD (and SHOULD NOT)
be used in a language tag.
Of particular note, many applications can benefit from the use of
script subtags in language tags, as long as the use is consistent for
a given context. Script subtags were not formally defined in RFC
3066 and their use can affect matching and subtag identification by
implementations of RFC 3066, as these subtags appear between the
primary language and region subtags. For example, if a user requests
content in an implementation of Section 2.5 of [RFC3066] using the
language range "en-US", content labeled "en-Latn-US" will not match
the request. Therefore, it is important to know when script subtags
will customarily be used and when they ought not be used. In the
registry, the Suppress-Script field helps ensure greater
compatibility between language tags generated according to the
rules in this document and the tags, tag processors, and consumers
based on RFC 3066: it defines when users SHOULD NOT include a script
subtag with a particular primary language subtag.
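To make the compatibility issue above concrete, here is a minimal Python sketch of the prefix matching described in Section 2.5 of [RFC3066]. The function name rfc3066_range_matches is hypothetical, and the sketch handles plain ranges only, with no wildcard support.

```python
def rfc3066_range_matches(language_range: str, tag: str) -> bool:
    """Prefix matching in the style of RFC 3066, Section 2.5 (a sketch).

    A range matches a tag if it equals the tag, or if it equals a prefix
    of the tag that is immediately followed by '-'. Comparison is
    case-insensitive.
    """
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


# "en-US" is not a prefix of "en-latn-us", so the request misses it.
print(rfc3066_range_matches("en-US", "en-Latn-US"))  # False
print(rfc3066_range_matches("en", "en-Latn-US"))     # True
print(rfc3066_range_matches("en-US", "en-us"))       # True
```

The script subtag sits between the primary language and region subtags, so only the shorter range "en" still reaches "en-Latn-US". This is why Suppress-Script steers users away from adding 'Latn' to 'en'.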
Extended language subtags (type 'extlang' in the registry; see
Section 3.1) also appear between the primary language and region
subtags and are reserved for future standardization. Applications
might benefit from their judicious use in forming language tags in
the future. Recommendations similar to those for script subtags are
expected to apply to their use.
Standards, protocols, and applications that reference this document
normatively but apply different rules from the ones given in this
section MUST specify how their procedure varies from the one given
in this section.
The choice of subtags used to form a language tag SHOULD be guided by
the following rules:
1. Use as precise a tag as possible, but no more specific than is
justified. Avoid using subtags that are not important for
distinguishing content in an application.
* For example, 'de' might suffice for tagging an email written
in German, while "de-CH-1996" is probably unnecessarily
precise for such a task.
2. The script subtag SHOULD NOT be used to form language tags unless
the script adds some distinguishing information to the tag. The
field 'Suppress-Script' in the primary language record in the
registry indicates which script subtags do not add distinguishing
information for most applications.
* For example, the subtag 'Latn' should not be used with the
primary language 'en' because nearly all English documents are
written in the Latin script and it adds no distinguishing
information. However, if a document were written in English
mixing Latin script with another script such as Braille
('Brai'), then it might be appropriate to choose to indicate
both scripts to aid in content selection, such as the
application of a style sheet.
3. If a tag or subtag has a 'Preferred-Value' field in its registry
entry, then the value of that field SHOULD be used to form the
language tag in preference to the tag or subtag in which the
preferred value appears.
* For example, use 'he' for Hebrew in preference to 'iw'.
4. The 'und' (Undetermined) primary language subtag SHOULD NOT be
used to label content, even if the language is unknown. Omitting
the language tag altogether is preferred to using a tag with a
primary language subtag of 'und'. The 'und' subtag MAY be useful
for protocols that require a language tag to be provided. The
'und' subtag MAY also be useful when matching language tags in
certain situations.
5. The 'mul' (Multiple) primary language subtag SHOULD NOT be used
whenever the protocol allows separate tags for multiple
languages, as is the case for the Content-Language header in
HTTP. The 'mul' subtag conveys little useful information:
content in multiple languages SHOULD individually tag the
languages where they appear or otherwise indicate the actual
language in preference to the 'mul' subtag.
6. The same variant subtag SHOULD NOT be used more than once within
a language tag.
* For example, do not use "de-DE-1901-1901".
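The rules above lend themselves to a simple advisory check. The Python sketch below is illustrative only. check_tag_choice is a hypothetical helper. The two lookup tables are assumed excerpts standing in for the registry's Suppress-Script and Preferred-Value fields, with 'en' and 'de' assumed to suppress 'Latn'. The helper understands only simple tags without extensions or private use subtags.

```python
from collections import Counter

# Hypothetical registry excerpts; a real checker would load the full
# IANA Language Subtag Registry instead.
SUPPRESS_SCRIPT = {"en": "Latn", "de": "Latn"}
PREFERRED_LANGUAGE = {"iw": "he"}


def check_tag_choice(tag: str) -> list[str]:
    """Return warnings for choices that rules 2-6 above advise against.

    Assumes a simple tag: language[-script][-region][-variant...].
    """
    warnings = []
    subtags = tag.split("-")
    lang = subtags[0].lower()
    if lang in PREFERRED_LANGUAGE:  # rule 3
        warnings.append(f"use '{PREFERRED_LANGUAGE[lang]}' instead of '{lang}'")
    if lang == "und":  # rule 4
        warnings.append("omit the tag rather than label content 'und'")
    if lang == "mul":  # rule 5
        warnings.append("tag each language separately instead of using 'mul'")
    # Rule 2: a four-letter subtag in second position is a script subtag.
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        script = subtags[1].title()
        if SUPPRESS_SCRIPT.get(lang) == script:
            warnings.append(f"script '{script}' is suppressed for '{lang}'")
    # Rule 6: variants are 5-8 characters, or 4 starting with a digit.
    variants = [s.lower() for s in subtags[1:]
                if len(s) >= 5 or (len(s) == 4 and s[0].isdigit())]
    for variant, count in Counter(variants).items():
        if count > 1:
            warnings.append(f"variant '{variant}' appears {count} times")
    return warnings


print(check_tag_choice("en-Latn-US"))
print(check_tag_choice("de-DE-1901-1901"))
print(check_tag_choice("iw"))
```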
To ensure consistent backward compatibility, this document contains
several provisions to account for potential instability in the
standards used to define the subtags that make up language tags.
These provisions mean that no language tag created under the rules in
this document will become obsolete.
4.2. Meaning of the Language Tag
The relationship between the tag and the information it relates to is
defined by the context in which the tag appears. Accordingly, this
section gives only possible examples of its usage.
o For a single information object, the associated language tags
might be interpreted as the set of languages that is necessary for
a complete comprehension of the complete object. Example: Plain
text documents.
o For an aggregation of information objects, the associated language
tags could be taken as the set of languages used inside components
of that aggregation. Examples: Document stores and libraries.
o For information objects whose purpose is to provide alternatives,
the associated language tags could be regarded as a hint that the
content is provided in several languages and that one has to
inspect each of the alternatives in order to find its language or
languages. In this case, the presence of multiple tags might not
mean that one needs to be multi-lingual to get complete
understanding of the document. Example: MIME multipart/alternative.
o In markup languages, such as HTML and XML, language information
can be added to each part of the document identified by the markup
structure (including the whole document itself). For example, one
could write <span lang="fr">C'est la vie.</span> inside a
Norwegian document; the Norwegian-speaking user could then access
a French-Norwegian dictionary to find out what the marked section
meant. If the user were listening to that document through a
speech synthesis interface, this information could be used to signal
the synthesizer to appropriately apply French text-to-speech
pronunciation rules to that span of text, instead of applying the
inappropriate Norwegian rules.
Language tags are related when they contain a similar sequence of
subtags. For example, if a language tag B contains language tag A as
a prefix, then B is typically "narrower" or "more specific" than A.
Thus, "zh-Hant-TW" is more specific than "zh-Hant".
This relationship is not guaranteed in all cases: specifically,
languages whose tags begin with the same sequence of subtags are NOT
guaranteed to be mutually intelligible, although they might be. For
example, the tag "az" shares a prefix with both "az-Latn"
(Azerbaijani written using the Latin script) and "az-Cyrl"
(Azerbaijani written using the Cyrillic script). A person fluent in
one script might not be able to read the other, even though the text
might be identical. Content tagged as "az" most probably is written
in just one script and thus might not be intelligible to a reader
familiar with the other script.
4.3. Length Considerations
[RFC3066] did not provide an upper limit on the size of language
tags. While RFC 3066 did define the semantics of particular subtags
in such a way that most language tags consisted of language and
region subtags with a combined total length of up to six characters,
larger registered tags were not only possible but were actually
registered.
Neither the language tag syntax nor other requirements in this
document impose a fixed upper limit on the number of subtags in a
language tag (and thus an upper bound on the size of a tag). The
language tag syntax suggests that, depending on the specific
language, more subtags (and thus a longer tag) are sometimes
necessary to completely identify the language for certain
applications; thus, it is possible to envision long or complex subtag
sequences.
4.3.1. Working with Limited Buffer Sizes
Some applications and protocols are forced to allocate fixed buffer
sizes or otherwise limit the length of a language tag. A conformant
implementation or specification MAY refuse to support the storage of
language tags that exceed a specified length. Any such limitation
SHOULD be clearly documented, and such documentation SHOULD include
what happens to longer tags (for example, whether an error value is
generated or the language tag is truncated). A protocol that allows
tags to be truncated at an arbitrary limit, without giving any
indication of what that limit is, has the potential for causing harm
by changing the meaning of tags in substantial ways.
In practice, most language tags do not require more than a few
subtags and will not approach reasonably sized buffer limitations;
see Section 4.1.
Some specifications or protocols have limits on tag length but do not
have a fixed length limitation. For example, [RFC2231] has no
explicit length limitation: the length available for the language tag
is constrained by the length of other header components (such as the
charset's name) coupled with the 76-character limit in [RFC2047].
Thus, the "limit" might be 50 or more characters, but it could
potentially be quite small.
The considerations for assigning a buffer limit are:
o Implementations SHOULD NOT truncate language tags unless the
meaning of the tag is purposefully being changed, or unless the
tag does not fit into a limited buffer size specified by a
protocol for storage or transmission.
o Implementations SHOULD warn the user when a tag is truncated, since
truncation changes the semantic meaning of the tag.
o Implementations of protocols or specifications that are space
constrained but do not have a fixed limit SHOULD use the longest
possible tag in preference to truncation.
o Protocols or specifications that specify limited buffer sizes for
language tags MUST allow for language tags of up to 33 characters.
o Protocols or specifications that specify limited buffer sizes for
language tags SHOULD allow for language tags of at least 42
characters.
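As a quick arithmetic check of the 42-character recommendation, the sketch below adds up the per-field widths used in the derivation that follows. It also builds a placeholder tag with that maximal shape. The subtag values in longest_shape are made up; only their lengths matter.

```python
# Widths from the derivation: every subtag after the first is counted
# together with its leading '-'.
FIELD_WIDTHS = {
    "language": 3,   # ISO 639-2 code
    "extlang1": 4,
    "extlang2": 4,
    "extlang3": 4,
    "script": 5,
    "region": 4,     # UN M.49 code such as "419"
    "variant1": 9,
    "variant2": 9,
}

# Placeholder subtags with the maximal shape (not a registered tag).
longest_shape = "abc-def-ghi-jkl-Latn-419-abcdefgh-ijklmnop"

print(sum(FIELD_WIDTHS.values()))  # 42
print(len(longest_shape))          # 42
```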
The following illustration shows how the 42-character recommendation
was derived. The combination of language and extended language
subtags was chosen for future compatibility. At up to 15 characters,
this combination is longer than the longest possible primary language
subtag (8 characters):
language = 3 (ISO 639-2; ISO 639-1 requires 2)
extlang1 = 4 (each subsequent subtag includes '-')
extlang2 = 4 (unlikely: needs prefix="language-extlang1")
extlang3 = 4 (extremely unlikely)
script = 5 (if not suppressed: see Section 4.1)
region = 4 (UN M.49; ISO 3166 requires 3)
variant1 = 9 (MUST have language as a prefix)
variant2 = 9 (MUST have language-variant1 as a prefix)
total = 42 characters
Figure 7: Derivation of the Limit on Tag Length
4.3.2. Truncation of Language Tags
Truncation of a language tag alters the meaning of the tag, and thus
SHOULD be avoided. However, truncation of language tags is sometimes
necessary due to limited buffer sizes. Such truncation MUST NOT
permit a subtag to be chopped off in the middle or the formation of
invalid tags (for example, one ending with the "-" character).
This means that applications or protocols that truncate tags MUST do
so by progressively removing subtags along with their preceding "-"
from the right side of the language tag until the tag is short enough
for the given buffer. If the resulting tag ends with a single-
character subtag, that subtag and its preceding "-" MUST also be
removed. For example:
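Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
1. zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
2. zh-Latn-CN-variant1-a-extend1-x-wadegile
3. zh-Latn-CN-variant1-a-extend1
4. zh-Latn-CN-variant1
5. zh-Latn-CN
6. zh-Latn
7. zh
Figure 8: Example of Tag Truncation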
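The truncation procedure above can be sketched in Python as follows. truncate_tag is a hypothetical helper. Two behaviors are design choices of this sketch, not requirements stated here: it raises ValueError when even the primary subtag does not fit, and it emits a warning on every truncation, following the recommendation in Section 4.3.1.

```python
import warnings


def truncate_tag(tag: str, limit: int) -> str:
    """Shorten `tag` to at most `limit` characters without breaking subtags.

    Whole subtags are removed from the right, each with its preceding '-'.
    If the shortened tag would end in a single-character subtag (such as
    the 'a' or 'x' that introduces an extension or private use sequence),
    that subtag is removed as well.
    """
    subtags = tag.split("-")
    while len("-".join(subtags)) > limit or len(subtags[-1]) == 1:
        if len(subtags) == 1:
            # Nothing meaningful is left; refuse rather than emit a
            # chopped or invalid tag.
            raise ValueError(f"{tag!r} cannot be truncated to {limit} characters")
        subtags.pop()
    result = "-".join(subtags)
    if result != tag:
        # Truncation changes the meaning of the tag, so report it.
        warnings.warn(f"language tag {tag!r} truncated to {result!r}")
    return result


tag = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
for limit in (40, 35, 25, 15, 8, 5):
    print(limit, truncate_tag(tag, limit))
```

With a limit of 35, for example, removing "private1" and "wadegile" leaves a trailing "x", which is then removed as well. The result is "zh-Latn-CN-variant1-a-extend1".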
4.4. Canonicalization of Language Tags
Since a particular language tag is sometimes used by many processes,
language tags SHOULD always be created or generated in a canonical
form.
A language tag is in canonical form when:
1. The tag is well-formed according to the rules in Section 2.1 and
2. Subtags of type 'Region' that have a Preferred-Value mapping in
the IANA registry (see Section 3.1) SHOULD be replaced with their
mapped value. Note: In rare cases, the mapped value will also
have a Preferred-Value.
3. Redundant or grandfathered tags that have a Preferred-Value
mapping in the IANA registry (see Section 3.1) MUST be replaced
with their mapped value. These items either are deprecated
mappings created before the adoption of this document (such as
the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are
the result of later registrations or additions to this document
(for example, "zh-guoyu" might be mapped to a language-extlang
combination such as "zh-cmn" by some future update of this
document).
4. Other subtags that have a Preferred-Value mapping in the IANA
registry (see Section 3.1) MUST be replaced with their mapped
value. These items consist entirely of clerical corrections to
ISO 639-1 in which the deprecated subtags have been maintained
for compatibility purposes.
5. If more than one extension subtag sequence exists, the extension
sequences are ordered into case-insensitive ASCII order by
singleton subtag.
Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical
form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in
canonical form.
Example: The language tag "en-BU" (English as used in Burma) is not
canonical because the 'BU' subtag has a canonical mapping to 'MM'
(Myanmar), although the tag "en-BU" maintains its validity.
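A partial canonicalizer, sketched in Python under stated assumptions. The two mapping tables are hypothetical excerpts of the registry's Preferred-Value data and use only mappings mentioned in this section. canonicalize is an illustrative name. The input is assumed to be well-formed already. Step 4 and any extension-specific subtag ordering are not handled.

```python
# Hypothetical registry excerpts; a real implementation would read every
# Preferred-Value from the IANA registry.
TAG_PREFERRED = {"i-klingon": "tlh", "no-nyn": "nn"}
REGION_PREFERRED = {"BU": "MM"}


def canonicalize(tag: str) -> str:
    """Apply steps 2, 3, and 5 above to a well-formed tag.

    Letter case is left as it was, since case regularization is optional.
    """
    # Step 3: whole-tag (grandfathered or redundant) mappings.
    mapped = TAG_PREFERRED.get(tag.lower())
    if mapped is not None:
        return mapped

    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    # Private use ('x' and everything after it) stays at the end, unsorted.
    cut = lowered.index("x") if "x" in lowered else len(subtags)
    subtags, private = subtags[:cut], subtags[cut:]

    # Everything before the first singleton is language/script/region/variant.
    first_singleton = next(
        (i for i in range(1, len(subtags)) if len(subtags[i]) == 1), len(subtags)
    )
    base, rest = subtags[:first_singleton], subtags[first_singleton:]

    # Step 2: a non-initial two-letter subtag here can only be a region.
    base = [
        REGION_PREFERRED.get(s.upper(), s) if i > 0 and len(s) == 2 else s
        for i, s in enumerate(base)
    ]

    # Step 5: group extension sequences by singleton, then sort them.
    sequences: list[list[str]] = []
    for s in rest:
        if len(s) == 1:
            sequences.append([s])
        else:
            sequences[-1].append(s)
    sequences.sort(key=lambda seq: seq[0].lower())

    return "-".join(base + [s for seq in sequences for s in seq] + private)


print(canonicalize("en-B-ccc-bbb-A-aaa-X-xyz"))  # en-A-aaa-B-ccc-bbb-X-xyz
print(canonicalize("en-BU"))                     # en-MM
print(canonicalize("i-klingon"))                 # tlh
```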
Canonicalization of language tags does not imply anything about the
use of upper or lowercase letters when processing or comparing
subtags (as described in Section 2.1). All comparisons MUST be
performed in a case-insensitive manner.
When performing canonicalization of language tags, processors MAY
regularize the case of the subtags (that is, this process is
OPTIONAL), following the case used in the registry. Note that this
corresponds to the following casing rules: uppercase all non-initial
two-letter subtags; titlecase all non-initial four-letter subtags;
lowercase everything else.
Note: Case folding of ASCII letters in certain locales, unless
carefully handled, sometimes produces non-ASCII character values.
The Unicode Character Database file "SpecialCasing.txt" defines the
specific cases that are known to cause problems with this. In
particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is
uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
Implementers SHOULD specify a locale-neutral casing operation to
ensure that case folding of subtags does not produce this value,
which is illegal in language tags. For example, if one were to
uppercase the region subtag 'in' using Turkish locale rules, the
sequence U+0130 U+004E would result instead of the expected 'IN'.
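A sketch of the optional case regularization, written in Python. It uses explicit ASCII-only translation tables, so the result cannot depend on locale rules such as the Turkish dotted capital I. Python's str.upper() does not consult the locale. Some other environments do, for example Java's no-argument String.toUpperCase(), which uses the default locale. regularize_case is a hypothetical name. It applies the casing rule above literally to every subtag, including subtags inside extensions, even though an extension may define its own canonical form.

```python
import string

# Explicit ASCII-only tables: 'i' always becomes 'I', never U+0130.
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def regularize_case(tag: str) -> str:
    """Uppercase non-initial two-letter subtags, titlecase non-initial
    four-letter subtags, and lowercase everything else."""
    if not tag.isascii():
        raise ValueError(f"{tag!r} contains non-ASCII characters")
    result = []
    for i, subtag in enumerate(tag.split("-")):
        if i > 0 and len(subtag) == 2:
            result.append(subtag.translate(_TO_UPPER))
        elif i > 0 and len(subtag) == 4:
            result.append(subtag[0].translate(_TO_UPPER)
                          + subtag[1:].translate(_TO_LOWER))
        else:
            result.append(subtag.translate(_TO_LOWER))
    return "-".join(result)


print(regularize_case("ZH-hant-tw"))  # zh-Hant-TW
print(regularize_case("en-in"))       # en-IN
```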
Note: if the field 'Deprecated' appears in a registry record without
an accompanying 'Preferred-Value' field, then that tag or subtag is
deprecated without a replacement. Validating processors SHOULD NOT
generate tags that include these values, although the values are
canonical when they appear in a language tag.
An extension MUST define any relationships that exist between the
various subtags in the extension and thus MAY define an alternate
canonicalization scheme for the extension's subtags. Extensions MAY
define how the order of the extension's subtags is interpreted. For
example, an extension could define that its subtags are in canonical
order when the subtags are placed into ASCII order: that is,
"en-a-aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension
might define that the order of the subtags influences their semantic
meaning (so that "en-b-ccc-bbb-aaa" has a different value from
"en-b-aaa-bbb-ccc"). However, extension specifications SHOULD be
designed so that they are tolerant of the typical processes described
in Section 3.7.
4.5. Considerations for Private Use Subtags
Private use subtags, like all other subtags, MUST conform to the
format and content constraints in the ABNF. Private use subtags have
no meaning outside the private agreement between the parties that
intend to use or exchange language tags that employ them. The same
subtags MAY be used with a different meaning under a separate private
agreement. They SHOULD NOT be used where alternatives exist and
SHOULD NOT be used in content or protocols intended for general use.
Private use subtags are simply useless for information exchange
without prior arrangement. The value and semantic meaning of private
use tags and of the subtags used within such a language tag are not
defined by this document.
Subtags defined in the IANA registry as having a specific private use
meaning convey more information than a purely private use tag
prefixed by the singleton subtag 'x'. For applications, this
additional information MAY be useful.
For example, the region subtags 'AA', 'ZZ', and those in the ranges
'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY
be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a
great deal of public, interchangeable information about the language
material (that it is Chinese in the simplified Chinese script and is
suitable for some geographic region 'XQ'). While the precise
geographic region is not known outside of private agreement, the tag
conveys far more information than an opaque tag such as "x-someLang",
which contains no information about the language subtag or script
subtag outside of the private agreement.
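The following is a small Python check for the private use region subtags listed above. is_private_use_region is a hypothetical helper, not part of any registry API.

```python
def is_private_use_region(subtag: str) -> bool:
    """True for 'AA', 'ZZ', 'QM' through 'QZ', or 'XA' through 'XZ'."""
    # Check ASCII letters before uppercasing, so non-ASCII input cannot
    # change length or map onto an ASCII code.
    if len(subtag) != 2 or not (subtag.isascii() and subtag.isalpha()):
        return False
    code = subtag.upper()
    return (
        code in ("AA", "ZZ")
        or (code[0] == "Q" and "M" <= code[1] <= "Z")
        or code[0] == "X"
    )


language, script, region = "zh-Hans-XQ".split("-")
print(is_private_use_region(region))  # True
print(is_private_use_region("US"))    # False
```

Only the region subtag of "zh-Hans-XQ" is private; the language and script subtags keep their public meanings.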
However, in some cases content tagged with private use subtags MAY
interact with other systems in a different and possibly unsuitable
manner compared to tags that use opaque, privately defined subtags,
so the choice of the best approach sometimes depends on the
particular domain in question.
5. IANA Considerations
This section deals with the processes and requirements necessary for
IANA to undertake to maintain the subtag and extension registries as
defined by this document and in accordance with the requirements of
The impact on the IANA maintainers of the two registries defined by
this document will be a small increase in the frequency of new
entries or updates.
5.1. Language Subtag Registry
Upon adoption of this document, the registry will be initialized by a
companion document: [RFC4645]. The criteria and process for
selecting the initial set of records are described in that document.
The initial set of records represents no impact on IANA, since the
work to create it will be performed externally.
The new registry MUST be listed under "Language Tags" at
<http://www.iana.org/numbers.html>, replacing the existing
registrations defined by [RFC3066]. The existing set of registration
forms and RFC 3066 registrations MUST be relabeled as "Language Tags
(Obsolete)" and maintained (but not added to or modified).
Future work on the Language Subtag Registry SHALL be limited to
inserting or replacing whole records preformatted for IANA by the
Language Subtag Reviewer as described in Section 3.3 of this document
and archiving the forwarded registration form.
Each record MUST be sent to email@example.com with a subject line
indicating whether the enclosed record is an insertion of a new
record (indicated by the word "INSERT" in the subject line) or a
replacement of an existing record (indicated by the word "MODIFY" in
the subject line). Records MUST NOT be deleted from the registry.
IANA MUST place any inserted or modified records into the appropriate
section of the language subtag registry, grouping the records by
their 'Type' field. Inserted records MAY be placed anywhere in the
appropriate section; there is no guarantee of the order of the
records beyond grouping them together by 'Type'. Modified records
MUST overwrite the record they replace.
Included in any request to insert or modify records MUST be a new
File-Date record. This record MUST be placed first in the registry.
In the event that the File-Date record present in the registry has a
later date than the record being inserted or modified, the existing
record MUST be preserved.
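The registry maintenance rules above can be modeled in a few lines. This toy Python model is for illustration only and does not describe IANA's actual tooling. SubtagRegistry, record_key, and apply_request are hypothetical names. Records are plain dictionaries keyed by their 'Type' and 'Subtag' (or 'Tag') fields. File-Date values are assumed to be full dates in YYYY-MM-DD form, so string comparison matches date order.

```python
from dataclasses import dataclass, field


@dataclass
class SubtagRegistry:
    """Toy model: a File-Date plus records grouped by 'Type'."""
    file_date: str  # YYYY-MM-DD, so string order equals date order
    records: list[dict[str, str]] = field(default_factory=list)


def record_key(record: dict[str, str]) -> tuple[str, str]:
    # Subtag records carry 'Subtag'; grandfathered/redundant ones carry 'Tag'.
    return record["Type"], record.get("Subtag") or record.get("Tag", "")


def apply_request(registry: SubtagRegistry, action: str,
                  record: dict[str, str], file_date: str) -> None:
    """Apply one INSERT or MODIFY request; records are never deleted."""
    key = record_key(record)
    index = next((i for i, r in enumerate(registry.records)
                  if record_key(r) == key), None)
    if action == "INSERT":
        if index is not None:
            raise ValueError(f"record {key} already exists")
        same_type = [i for i, r in enumerate(registry.records)
                     if r["Type"] == record["Type"]]
        # Any position within the Type group is allowed; append to it.
        position = same_type[-1] + 1 if same_type else len(registry.records)
        registry.records.insert(position, record)
    elif action == "MODIFY":
        if index is None:
            raise ValueError(f"record {key} does not exist")
        registry.records[index] = record  # overwrite the replaced record
    else:
        raise ValueError(f"unknown action {action!r}")
    # Keep the existing File-Date if it is later than the request's.
    if file_date > registry.file_date:
        registry.file_date = file_date


registry = SubtagRegistry("2006-10-01", [{"Type": "language", "Subtag": "en"}])
apply_request(registry, "INSERT", {"Type": "language", "Subtag": "fr"}, "2006-11-01")
print(registry.file_date, [r["Subtag"] for r in registry.records])
```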
5.2. Extensions Registry
The Language Tag Extensions Registry will also be generated and sent
to IANA as described in Section 3.7. This registry can contain at
most 35 records, and thus changes to this registry are expected to be
infrequent.
Future work by IANA on the Language Tag Extensions Registry is
limited to two cases. First, the IESG MAY request that new records
be inserted into this registry from time to time. These requests
MUST include the record to insert in the exact format described in
Section 3.7. In addition, there MAY be occasional requests from the
maintaining authority for a specific extension to update the contact
information or URLs in the record. These requests MUST include the
complete, updated record. IANA is not responsible for validating the
information provided, only for checking that it is properly formatted.
Such a request should reasonably be seen to come from the maintaining
authority named in the record present in the registry.
6. Security Considerations
Language tags used in content negotiation, like any other information
exchanged on the Internet, might be a source of concern because they
might be used to infer the nationality of the sender, and thus
identify potential targets for surveillance.
This is a special case of the general problem that anything sent is
visible to the receiving party and possibly to third parties as well.
It is useful to be aware that such concerns can exist in some cases.
The evaluation of the exact magnitude of the threat, and any possible
countermeasures, is left to each application protocol (see BCP 72
[RFC3552] for best current practice guidance on security threats and
defenses).
The language tag associated with a particular information item is of
no consequence whatsoever in determining whether that content might
contain possible homographs. The fact that a text is tagged as being
in one language or using a particular script subtag provides no
assurance whatsoever that it does not contain characters from scripts
other than the one(s) associated with or specified by that language
tag.
Since there is no limit to the number of variant, private use, and
extension subtags, and consequently no limit on the possible length
of a tag, implementations need to guard against buffer overflow
attacks. See Section 4.3 for details on language tag truncation,
which can occur as a consequence of defenses against buffer overflow.
Although the specification of valid subtags for an extension (see
Section 3.7) MUST be available over the Internet, implementations
SHOULD NOT mechanically depend on it being always accessible, to
prevent denial-of-service attacks.
7. Character Set Considerations
The syntax in this document requires that language tags use only the
characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
character sets, so the composition of language tags should not have
any character set issues.
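Because the permitted repertoire is so small, checking it is straightforward. Below is a minimal Python sketch; uses_only_tag_characters is a hypothetical helper. The sketch also shows that such a tag encodes to identical bytes in several ASCII-compatible character sets. It rejects the U+0130 sequence from the casing note in Section 4.4.

```python
import re

# A-Z, a-z, 0-9, and HYPHEN-MINUS, and nothing else.
TAG_CHARACTERS = re.compile(r"[A-Za-z0-9-]+")


def uses_only_tag_characters(tag: str) -> bool:
    return TAG_CHARACTERS.fullmatch(tag) is not None


tag = "de-CH-1996"
print(uses_only_tag_characters(tag))  # True
# These characters have the same code points in ASCII-compatible
# encodings, so the bytes come out the same in each.
print(tag.encode("ascii") == tag.encode("iso-8859-1") == tag.encode("utf-8"))  # True
print(uses_only_tag_characters("\u0130N"))  # False
```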
Rendering of characters based on the content of a language tag is not
addressed in this memo. Historically, some languages have relied on
the use of specific character sets or other information in order to
infer how a specific character should be rendered (notably this
applies to language- and culture-specific variations of Han
ideographs as used in Japanese, Chinese, and Korean). When language
tags are applied to spans of text, rendering engines sometimes use
that information in deciding which font to use in the absence of
other information, particularly where languages with distinct writing
traditions use the same characters.
8. Changes from RFC 3066
The main goals for this revision of language tags were the following:
*Compatibility.* All RFC 3066 language tags (including those in the
IANA registry) remain valid in this specification. The changes in
this document represent additional constraints on language tags.
That is, in no case is the syntax more permissive and processors
based on the ABNF and other provisions of RFC 3066 (such as those
described in [XMLSchema]) will be able to process the tags described
by this document. In addition, this document defines language tags
in such a way as to ensure future compatibility.
*Stability.* Because of changes in the past in the underlying ISO
standards, a valid RFC 3066 language tag could become invalid or have
its meaning change. This has the potential of invalidating content
that may have an extensive shelf-life. In this specification, once a
language tag is valid, it remains valid forever.
*Validity.* The structure of language tags defined by this document
makes it possible to determine if a particular tag is well-formed
without regard for the actual content or "meaning" of the tag as a
whole. This is important because the registry grows and underlying
standards change over time. In addition, it must be possible to
determine if a tag is valid (or not) for a given point in time in
order to provide reproducible, testable results. This process must
not be error-prone; otherwise implementations might give different
results. By having an authoritative registry with specific
versioning information, the validity of language tags at any point in
time can be precisely determined (instead of interpolating values
from many separate sources).
*Utility.* It is sometimes important to be able to differentiate
between written forms of a language -- for many implementations this
is more important than distinguishing between the spoken variants of
a language. Languages are written in a wide variety of different
scripts, so this document provides for the generative use of ISO
15924 script codes. Like the generative use of ISO language and
country codes in RFC 3066, this allows combinations to be produced
without resorting to the registration process. The addition of UN
M.49 codes provides for the generation of language tags with regional
scope, which is also required by some applications.
The recast of the registry from containing whole language tags to
subtags is a key part of this. An important feature of RFC 3066 was
that it allowed generative use of subtags. This allows people to
meaningfully use generated tags, without the delays in registering
whole tags or the need to register all of the combinations that might
be useful.
The choice of placing the extended language and script subtags
between the primary language and region subtags was widely debated.
This design was chosen because the prevalent matching and content
negotiation schemes rely on the subtags being arranged in order of
increasing specificity. That is, the subtags that mark a greater
barrier to mutual intelligibility appear left-most in a tag. For
example, when selecting content written in Azerbaijani, the script
(Arabic, Cyrillic, or Latin) represents a greater barrier to
understanding than any regional variations (those associated with
Azerbaijan or Iran, for example). Individuals who prefer documents
in a particular script, but can deal with the minor regional
differences, can therefore select appropriate content. Applications
that do not deal with written content will continue to omit these
subtags.
*Extensibility.* Because of the widespread use of language tags, it
is disruptive to have periodic revisions of the core specification,
even in the face of demonstrated need. The extension mechanism
provides for a way for independent RFCs to define extensions to
language tags. These extensions have a very constrained, well-
defined structure that prevents extensions from interfering with
implementations of language tags defined in this document.
The document also anticipates features of ISO 639-3 with the addition
of the extended language subtags, as well as the possibility of other
ISO 639 parts becoming useful for the formation of language tags in
the future.
The use and definition of private use tags have also been modified,
to allow people to use private use subtags to extend or modify
defined tags and to move as much information as possible out of
private use and into the regular structure.
The goal for each of these modifications is to reduce or eliminate
the need for future revisions of this document.
The specific changes in this document to meet these goals are:
o Defines the ABNF and rules for subtags so that the category of all
subtags can be determined without reference to the registry.
o Adds the concept of well-formed vs. validating processors,
defining the rules by which an implementation can claim to be one
or the other.
o Replaces the IANA language tag registry with a language subtag
registry that provides a complete list of valid subtags in the
IANA registry. This allows for robust implementation and ease of
maintenance. The language subtag registry becomes the canonical
source for forming language tags.
o Provides a process that guarantees stability of language tags, by
handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in
the event that they register a previously used value for a new
purpose.
o Allows ISO 15924 script code subtags and allows them to be used
generatively. Defines a method for indicating in the registry
when script subtags are necessary for a given language tag.
o Adds the concept of a variant subtag and allows variants to be
used generatively.
o Adds the ability to use a class of UN M.49 tags for supra-national
regions and to resolve conflicts in the assignment of ISO 3166
codes.
o Defines the private use tags in ISO 639, ISO 15924, and ISO 3166
as the mechanism for creating private use language, script, and
region subtags, respectively.
o Adds a well-defined extension mechanism.
o Defines an extended language subtag, possibly for use with certain
anticipated features of ISO 639-3.
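The first change listed above is that the category of every subtag can be determined from the syntax alone. A small classifier can illustrate this. The Python sketch below is a simplified reading of the ABNF in Section 2.1, written for this example. classify_subtags is a hypothetical name. Grandfathered tags are not handled, and malformed input raises ValueError.

```python
def _alpha(s: str, lo: int, hi: int) -> bool:
    return lo <= len(s) <= hi and s.isascii() and s.isalpha()


def _alnum(s: str, lo: int, hi: int) -> bool:
    return lo <= len(s) <= hi and s.isascii() and s.isalnum()


def classify_subtags(tag: str) -> list[tuple[str, str]]:
    """Label each subtag by category using only its shape and position."""
    subtags = tag.split("-")
    labels: list[tuple[str, str]] = []
    i = 0

    def take(label: str, test) -> bool:
        nonlocal i
        if i < len(subtags) and test(subtags[i]):
            labels.append((label, subtags[i]))
            i += 1
            return True
        return False

    if subtags[0].lower() != "x":
        if not take("language", lambda s: _alpha(s, 2, 8)):
            raise ValueError(f"bad primary language subtag in {tag!r}")
        if len(subtags[0]) <= 3:
            for _ in range(3):  # up to three extended language subtags
                if not take("extlang", lambda s: _alpha(s, 3, 3)):
                    break
        take("script", lambda s: _alpha(s, 4, 4))
        take("region", lambda s: _alpha(s, 2, 2)
             or (len(s) == 3 and s.isascii() and s.isdigit()))
        while take("variant", lambda s: _alnum(s, 5, 8)
                   or (_alnum(s, 4, 4) and s[0].isdigit())):
            pass
        # Extensions: a singleton other than 'x', then 2-8 character subtags.
        while take("singleton", lambda s: _alnum(s, 1, 1) and s.lower() != "x"):
            if not take("extension", lambda s: _alnum(s, 2, 8)):
                raise ValueError(f"empty extension in {tag!r}")
            while take("extension", lambda s: _alnum(s, 2, 8)):
                pass
    # Private use: 'x' followed by one or more 1-8 character subtags.
    if take("privateuse", lambda s: s.lower() == "x"):
        if not take("privateuse", lambda s: _alnum(s, 1, 8)):
            raise ValueError(f"empty private use sequence in {tag!r}")
        while take("privateuse", lambda s: _alnum(s, 1, 8)):
            pass
    if i != len(subtags):
        raise ValueError(f"{tag!r} is not well-formed")
    return labels


print(classify_subtags("zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"))
```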
9. References
9.1. Normative References
[ISO10646] International Organization for Standardization,
"ISO/IEC 10646:2003. Information technology --
Universal Multiple-Octet Coded Character Set (UCS)", 2003.
[ISO15924] International Organization for Standardization, "ISO
15924:2004. Information and documentation -- Codes for
the representation of names of scripts", January 2004.
[ISO3166-1] International Organization for Standardization, "ISO
3166-1:1997. Codes for the representation of names of
countries and their subdivisions -- Part 1: Country
codes", 1997.
[ISO639-1] International Organization for Standardization, "ISO
639-1:2002. Codes for the representation of names of
languages -- Part 1: Alpha-2 code", 2002.
[ISO639-2] International Organization for Standardization, "ISO
639-2:1998. Codes for the representation of names of
languages -- Part 2: Alpha-3 code, first edition",
1998.
[ISO646] International Organization for Standardization,
"ISO/IEC 646:1991, Information technology -- ISO 7-bit
coded character set for information interchange", 1991.
[RFC2026] Bradner, S., "The Internet Standards Process --
Revision 3", BCP 9, RFC 2026, October 1996.
[RFC2028] Hovey, R. and S. Bradner, "The Organizations Involved
in the IETF Standards Process", BCP 11, RFC 2028,
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing
an IANA Considerations Section in RFCs", BCP 26,
RFC 2434, October 1998.
[RFC2860] Carpenter, B., Baker, F., and M. Roberts, "Memorandum
of Understanding Concerning the Technical Work of the
Internet Assigned Numbers Authority", RFC 2860,
[RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the
Internet: Timestamps", RFC 3339, July 2002.
[RFC4234] Crocker, D., Ed. and P. Overell, "Augmented BNF for
Syntax Specifications: ABNF", RFC 4234, October 2005.
[UN_M.49] Statistics Division, United Nations, "Standard Country
or Area Codes for Statistical Use", UN Standard
Country or Area Codes for Statistical Use, Revision 4
(United Nations publication, Sales No. 98.XVII.9,
9.2. Informative References
[RFC1766] Alvestrand, H., "Tags for the Identification of
Languages", RFC 1766, March 1995.
[RFC2047] Moore, K., "MIME (Multipurpose Internet Mail
Extensions) Part Three: Message Header Extensions for
Non-ASCII Text", RFC 2047, November 1996.
[RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and
Encoded Word Extensions: Character Sets, Languages,
and Continuations", RFC 2231, November 1997.
[RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of
ISO 10646", RFC 2781, February 2000.
[RFC3066] Alvestrand, H., "Tags for the Identification of
Languages", BCP 47, RFC 3066, January 2001.
[RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing
RFC Text on Security Considerations", BCP 72,
RFC 3552, July 2003.
[RFC4645] Ewell, D., Ed., "Initial Language Subtag Registry",
RFC 4645, September 2006.
[RFC4647] Phillips, A., Ed. and M. Davis, Ed., "Matching of
Language Tags", BCP 47, RFC 4647, September 2006.
[Unicode] Unicode Consortium, "The Unicode Standard, Version
5.0", Boston, MA, Addison-Wesley, 2007. ISBN 0-321-
[XML10] Bray (et al), T., "Extensible Markup Language (XML)
1.0", 02 2004.
[XMLSchema] Biron, P., Ed. and A. Malhotra, Ed., "XML Schema Part
2: Datatypes Second Edition", 10 2004, <
[iso639.prin] ISO 639 Joint Advisory Committee, "ISO 639 Joint
Advisory Committee: Working principles for ISO 639
maintenance", March 2000, <http://www.loc.gov/
[record-jar] Raymond, E., "The Art of Unix Programming", 2003,
Appendix A. Acknowledgements
Any list of contributors is bound to be incomplete; please regard the
following as only a selection from the group of people who have
contributed to make this document what it is today.
The contributors to RFC 3066 and RFC 1766, the precursors of this
document, made enormous contributions directly or indirectly to this
document and are generally responsible for the success of language
tags.
The following people (in alphabetical order) contributed to this
document or to RFCs 1766 and 3066:
Glenn Adams, Harald Tveit Alvestrand, Tim Berners-Lee, Marc Blanchet,
Nathaniel Borenstein, Karen Broome, Eric Brunner, Sean M. Burke, M.T.
Carrasco Benitez, Jeremy Carroll, John Clews, Jim Conklin, Peter
Constable, John Cowan, Mark Crispin, Dave Crocker, Elwyn Davies,
Martin Duerst, Frank Ellerman, Michael Everson, Doug Ewell, Ned
Freed, Tim Goodwin, Dirk-Willem van Gulik, Marion Gunn, Joel Halpren,
Elliotte Rusty Harold, Paul Hoffman, Scott Hollenbeck, Richard
Ishida, Olle Jarnefors, Kent Karlsson, John Klensin, Erkki
Kolehmainen, Alain LaBonte, Eric Mader, Ira McDonald, Keith Moore,
Chris Newman, Masataka Ohta, Dylan Pierce, Randy Presuhn, George
Rhoten, Felix Sasaki, Markus Scherer, Keld Jorn Simonsen, Thierry
Sourbier, Otto Stolz, Tex Texin, Andrea Vine, Rhys Weatherley, Misha
Wolf, Francois Yergeau and many, many others.
Very special thanks must go to Harald Tveit Alvestrand, who
originated RFCs 1766 and 3066, and without whom this document would
not have been possible. Special thanks must go to Michael Everson,
who has served as Language Tag Reviewer for almost the complete
period since the publication of RFC 1766. Special thanks to Doug
Ewell for his production of the first complete subtag registry and
for his work in producing a test parser for verifying language tags.
Appendix B. Examples of Language Tags (Informative)
Simple language subtag:
i-enochian (example of a grandfathered tag)
Language subtag plus Script subtag:
zh-Hant (Chinese written using the Traditional Chinese script)
zh-Hans (Chinese written using the Simplified Chinese script)
sr-Cyrl (Serbian written using the Cyrillic script)
sr-Latn (Serbian written using the Latin script)
Language-Script-Region:
zh-Hans-CN (Chinese written using the Simplified script as used in
mainland China)
sr-Latn-CS (Serbian written using the Latin script as used in
Serbia and Montenegro)
Language-Variant:
sl-rozaj (Resian dialect of Slovenian)
sl-nedis (Nadiza dialect of Slovenian)
Language-Region-Variant:
de-CH-1901 (German as used in Switzerland using the 1901 variant
orthography)
sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)
Language-Script-Region-Variant:
sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the
Latin script as used in Italy. Note that this tag is NOT
RECOMMENDED because subtag 'sl' has a Suppress-Script value of
'Latn')
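The Suppress-Script note above can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical SUPPRESS_SCRIPT table holding a few hand-copied entries; a real tool would read the Suppress-Script fields from the registry instead. The function name has_suppressed_script is likewise hypothetical.

```python
# Hypothetical hand-copied excerpt of registry Suppress-Script fields;
# a real tool would load these values from the IANA registry file.
SUPPRESS_SCRIPT = {"de": "Latn", "en": "Latn", "sl": "Latn"}


def has_suppressed_script(tag):
    """Return True if the tag's script subtag repeats the Suppress-Script
    value registered for its primary language subtag."""
    subtags = tag.lower().split("-")  # subtags are case-insensitive
    rest = subtags[1:]
    # Skip any 3-letter extlang subtags between the language and the script.
    while rest and len(rest[0]) == 3 and rest[0].isalpha():
        rest = rest[1:]
    if not rest or len(rest[0]) != 4 or not rest[0].isalpha():
        return False  # no script subtag present
    return SUPPRESS_SCRIPT.get(subtags[0], "").lower() == rest[0]


for example in ["sl-Latn-IT-nedis", "sl-IT-nedis", "sr-Latn-CS"]:
    print(example, has_suppressed_script(example))
```

Only the first example is flagged: 'sr' has no Suppress-Script entry in this excerpt, so 'sr-Latn-CS' keeps its script subtag.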
Language-Region:
de-DE (German for Germany)
en-US (English as used in the United States)
es-419 (Spanish appropriate for the Latin America and Caribbean
region using the UN region code)
Private use subtags:
Extended language subtags (examples ONLY: extended languages MUST be
defined by revision or update to this document):
Private use registry values:
x-whatever (private use using the singleton 'x')
qaa-Qaaa-QM-x-southern (all private tags)
de-Qaaa (German, with a private script)
sr-Latn-QM (Serbian, Latin-script, private region)
sr-Qaaa-CS (Serbian, private script, for Serbia and Montenegro)
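The private use examples above rely on the subtag ranges reserved for private use: 'qaa' through 'qtz' for languages, 'Qaaa' through 'Qabx' for scripts, and 'AA', 'QM' through 'QZ', 'XA' through 'XZ', and 'ZZ' for regions. A minimal Python sketch of a range check follows, assuming the caller already knows which kind of subtag it holds; the names PRIVATE_USE and is_private_use are hypothetical.

```python
import re

# Private use ranges written as patterns over lowercased subtags.
PRIVATE_USE = {
    "language": re.compile(r"q[a-t][a-z]"),        # qaa..qtz
    "script": re.compile(r"qaa[a-z]|qab[a-x]"),    # Qaaa..Qabx
    "region": re.compile(r"aa|q[m-z]|x[a-z]|zz"),  # AA, QM..QZ, XA..XZ, ZZ
}


def is_private_use(kind, subtag):
    """Return True if a language, script, or region subtag falls in a
    range reserved for private use. Comparison is case-insensitive."""
    return PRIVATE_USE[kind].fullmatch(subtag.lower()) is not None


checks = [("language", "qaa"), ("script", "Qaaa"), ("region", "QM"),
          ("script", "Latn"), ("region", "CS")]
for kind, subtag in checks:
    print(kind, subtag, is_private_use(kind, subtag))
```

Writing each range as a character-class pattern keeps the check exact without enumerating every code in the range.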
Tags that use extensions (examples ONLY: extensions MUST be defined
by revision or update to this document or by RFC):
Some Invalid Tags:
de-419-DE (two region tags)
a-DE (use of a single-character subtag in primary position; note
that there are a few grandfathered tags that start with "i-" that
are valid)
ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter
prefix)
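The invalid tags above each break the tag syntax in a different way. As a rough illustration, the following Python sketch checks well-formedness only, not validity against the registry. It walks the subtags in the order the syntax requires (language, extlang, script, region, variants, extensions, private use), lowercasing first because tags are case-insensitive, and it rejects leftover subtags and repeated extension singletons. The names parse_tag and GRANDFATHERED are hypothetical, and GRANDFATHERED holds one sample entry rather than the registry's full list.

```python
import re

# Subtag shapes following the tag syntax, applied to a lowercased tag.
LANGUAGE = re.compile(r"[a-z]{2,8}")
EXTLANG = re.compile(r"[a-z]{3}")
SCRIPT = re.compile(r"[a-z]{4}")
REGION = re.compile(r"[a-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")
SINGLETON = re.compile(r"[a-wyz0-9]")  # any single alphanumeric except 'x'
EXTENSION_SUBTAG = re.compile(r"[a-z0-9]{2,8}")
PRIVATE_SUBTAG = re.compile(r"[a-z0-9]{1,8}")

# Illustrative excerpt only: the real list of grandfathered tags is in the registry.
GRANDFATHERED = {"i-enochian"}


def parse_tag(tag):
    """Split a tag into its parts; raise ValueError if it is not well-formed."""
    lowered = tag.lower()
    if lowered in GRANDFATHERED:
        return {"grandfathered": lowered}
    subtags = lowered.split("-")
    parts = {"language": None, "extlang": [], "script": None, "region": None,
             "variants": [], "extensions": {}, "privateuse": []}

    def matches(pattern, i):
        return i < len(subtags) and pattern.fullmatch(subtags[i]) is not None

    i = 0
    if subtags[0] != "x":  # a tag may consist of private use subtags only
        if not matches(LANGUAGE, 0):
            raise ValueError(f"{tag}: invalid primary subtag {subtags[0]!r}")
        parts["language"] = subtags[0]
        i = 1
        # Up to three extlang subtags may follow a 2- or 3-letter language.
        while len(subtags[0]) <= 3 and len(parts["extlang"]) < 3 and matches(EXTLANG, i):
            parts["extlang"].append(subtags[i])
            i += 1
        if matches(SCRIPT, i):
            parts["script"] = subtags[i]
            i += 1
        if matches(REGION, i):  # at most one region subtag
            parts["region"] = subtags[i]
            i += 1
        while matches(VARIANT, i):
            parts["variants"].append(subtags[i])
            i += 1
        while matches(SINGLETON, i):
            singleton = subtags[i]
            if singleton in parts["extensions"]:
                raise ValueError(f"{tag}: singleton {singleton!r} appears twice")
            i += 1
            values = []
            while matches(EXTENSION_SUBTAG, i):
                values.append(subtags[i])
                i += 1
            if not values:
                raise ValueError(f"{tag}: extension {singleton!r} has no subtags")
            parts["extensions"][singleton] = values
    if i < len(subtags) and subtags[i] == "x":
        i += 1
        while matches(PRIVATE_SUBTAG, i):
            parts["privateuse"].append(subtags[i])
            i += 1
        if not parts["privateuse"]:
            raise ValueError(f"{tag}: private use sequence is empty")
    if i != len(subtags):
        raise ValueError(f"{tag}: unexpected subtag {subtags[i]!r}")
    return parts


for example in ["sl-Latn-IT-nedis", "de-419-DE", "a-DE", "ar-a-aaa-b-bbb-a-ccc"]:
    try:
        print(example, parse_tag(example))
    except ValueError as err:
        print("rejected:", err)
```

Because the sketch only checks the shape of each subtag, a syntactically correct tag built from unregistered subtags would still pass; checking subtags against the registry is a separate step.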
Addison Phillips (Editor)
Mark Davis (Editor)
Full Copyright Statement
Copyright (C) The Internet Society (2006).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).