6. Security Considerations
Language tags used in content negotiation, like any other information
exchanged on the Internet, might be a source of concern because they
might be used to infer the nationality of the sender, and thus
identify potential targets for surveillance.
This is a special case of the general problem that anything sent is
visible to the receiving party and possibly to third parties as well.
It is useful to be aware that such concerns can exist in some cases.
The evaluation of the exact magnitude of the threat, and any possible
countermeasures, is left to each application protocol (see BCP 72
[RFC3552] for best current practice guidance on security threats and
defenses).
The language tag associated with a particular information item is of
no consequence whatsoever in determining whether that content might
contain possible homographs. The fact that a text is tagged as being
in one language or using a particular script subtag provides no
assurance whatsoever that it does not contain characters from scripts
other than the one(s) associated with or specified by that language
tag.
Since there is no limit to the number of variant, private use, and
extension subtags, and consequently no limit on the possible length
of a tag, implementations need to guard against buffer overflow
attacks. See Section 4.4 for details on language tag truncation,
which can occur as a consequence of defenses against buffer overflow.
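As a non-normative illustration of the truncation mentioned above, the following Python sketch shortens an over-long tag by removing subtags from the right, and then drops any single-character subtag that would otherwise be left dangling at the end of the tag. The function name truncate_tag and the max_len parameter are hypothetical and exist only for this example; Section 4.4 remains the authoritative description. Python itself is not subject to buffer overflows, but the same length budget logic applies to fixed-size buffers in other languages.

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Shorten `tag` to at most `max_len` characters.

    Subtags are removed from the right, one at a time. A single-character
    subtag (a singleton such as 'a' or 'x') is never left at the end,
    because it would introduce an extension or private use sequence that
    has no content. An empty string means nothing fits in the budget.
    """
    if len(tag) <= max_len:
        return tag
    subtags = tag.split("-")
    while subtags and len("-".join(subtags)) > max_len:
        subtags.pop()
        # Remove any trailing single-character subtag left behind.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)


if __name__ == "__main__":
    long_tag = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
    print(truncate_tag(long_tag, 30))
    print(truncate_tag(long_tag, 20))
```

A caller would typically apply such a limit before copying a tag into any fixed-size field, rather than trusting the sender to keep tags short.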
To prevent denial-of-service attacks, applications SHOULD NOT depend
on either the Language Subtag Registry or the Language Tag Extensions
Registry being always accessible. Additionally, although the
specification of valid subtags for an extension (see Section 3.7)
MUST be available over the Internet, implementations SHOULD NOT
mechanically depend on those sources being always accessible.
The registries specified in this document are not suitable for
frequent or real-time access to, or retrieval of, the full registry
contents. Most applications do not need registry data at all. For
others, being able to validate or canonicalize language tags as of a
particular registry date will be sufficient, as the registry contents
change only occasionally. Changes are announced to
<firstname.lastname@example.org>. This mailing list is
intended for interested organizations and individuals, not for bulk
subscription to trigger automatic software updates. The size of the
registry makes it unsuitable for automatic software updates.
Implementers considering integrating the Language Subtag Registry in
an automatic updating scheme are strongly advised to distribute only
suitably encoded differences, and only via their own infrastructure
-- not directly from IANA.
Changes, or the absence thereof, can also easily be detected by
looking at the 'File-Date' record at the start of the registry, or by
using features of the protocol used for downloading, without having
to download the full registry. At the time of publication of this
document, IANA is making the Language Subtag Registry available over
HTTP 1.1. The proper way to update a local copy of the Language
Subtag Registry using HTTP 1.1 is to use a conditional GET [RFC2616].
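The update approach described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed implementation: the registry URL shown is assumed to be the current IANA location, and the helper names (build_conditional_request, fetch_if_changed, parse_file_date) are hypothetical. The sketch sends a conditional GET using validators saved from a previous download, and reads the 'File-Date' record from the first record of a local copy.

```python
import re
import urllib.error
import urllib.request

# Assumption: location of the registry at the time of writing.
REGISTRY_URL = (
    "https://www.iana.org/assignments/"
    "language-subtag-registry/language-subtag-registry"
)


def build_conditional_request(url, last_modified=None, etag=None):
    """Build a GET request carrying validators from an earlier download."""
    req = urllib.request.Request(url)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    if etag:
        req.add_header("If-None-Match", etag)
    return req


def fetch_if_changed(url, last_modified=None, etag=None):
    """Return the registry text, or None if the server answers 304."""
    req = build_conditional_request(url, last_modified, etag)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the local copy
            return None
        raise


def parse_file_date(registry_text):
    """Return the File-Date value (YYYY-MM-DD) from the first record."""
    first_record = registry_text.split("%%", 1)[0]
    match = re.search(
        r"^File-Date:\s*(\d{4}-\d{2}-\d{2})\s*$", first_record, re.MULTILINE
    )
    return match.group(1) if match else None
```

An application would normally call fetch_if_changed infrequently, for example from a build or release step, and continue to use its local copy whenever the registry is unreachable.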
7. Character Set Considerations
The syntax in this document requires that language tags use only the
characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
character sets, so the composition of language tags shouldn't have
any character set issues.
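A minimal Python sketch of this character restriction follows. It checks only the character repertoire, not the syntax of the tag, and the function name uses_only_tag_characters is hypothetical.

```python
import re

# Only A-Z, a-z, 0-9, and HYPHEN-MINUS are permitted in a language tag.
_TAG_CHARACTERS = re.compile(r"[A-Za-z0-9-]+")


def uses_only_tag_characters(tag: str) -> bool:
    """Return True if every character of `tag` is in the permitted set."""
    return _TAG_CHARACTERS.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ("sr-Latn-RS", "fr_CA", "es-419"):
        print(candidate, uses_only_tag_characters(candidate))
```

Because the explicit ranges are spelled out and no case-insensitive flag is used, non-ASCII letters that merely resemble ASCII letters are rejected.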
The rendering of text based on the language tag is not addressed
here. Historically, some processes have relied on the use of
character set/encoding information (or other external information) in
order to infer how a specific string of characters should be
rendered. Notably, this applies to language- and culture-specific
variations of Han ideographs as used in Japanese, Chinese, and
Korean, where use of, for example, a Japanese character encoding such
as EUC-JP implies that the text itself is in Japanese. When language
tags are applied to spans of text, rendering engines might be able to
use that information to better select fonts or make other rendering
choices, particularly where languages with distinct writing
traditions use the same characters.
8. Changes from RFC 4646
The main goal for this revision of RFC 4646 was to incorporate two
new parts of ISO 639 (ISO 639-3 and ISO 639-5) and their attendant
sets of language codes into the IANA Language Subtag Registry. This
permits the identification of many more languages and language
collections than previously supported.
The specific changes in this document to meet these goals are:
o Defined the incorporation of ISO 639-3 and ISO 639-5 codes for use
as primary and extended language subtags. It also permanently
reserves and disallows the use of additional 'extlang' subtags.
The changes necessary to achieve this were:
* Modified the ABNF comments.
* Updated various registration and stability requirements
sections to reference ISO 639-3 and ISO 639-5 in addition to
ISO 639-1 and ISO 639-2.
* Edited the text to eliminate references to extended language
subtags where they are no longer used.
* Explained the change in the section on extended language
subtags.
o Changed the ABNF related to grandfathered tags. The irregular
tags are now listed. Well-formed grandfathered tags are now
described by the 'langtag' production, and the 'grandfathered'
production was removed as a result. Also added a description of
both types of grandfathered tags to Section 2.2.8.
o Added the paragraph on "collections" to Section 4.1.
o Changed the capitalization rules for 'Tag' fields in Section 3.1.
o Split Section 3.1 up into subsections.
o Modified Section 3.5 to allow 'Suppress-Script' fields to be
added, modified, or removed via the registration process. This
was an erratum from RFC 4646.
o Modified examples that used region code 'CS' (formerly Serbia and
Montenegro) to use 'RS' (Serbia) instead.
o Modified the rules for creating and maintaining record
'Description' fields to prevent duplicates, including inverted
duplicates.
o Removed the lengthy description of why RFC 4646 was created from
this section, which also caused the removal of the reference to
o Modified the text in Section 2.1 to place more emphasis on the
fact that language tags are not case sensitive.
o Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS"
and "az-Arab-IR" because "fr-Latn-CA" doesn't respect the
'Suppress-Script' on 'Latn' with 'fr'.
o Changed the requirements for well-formedness to make singleton
repetition checking optional (it is required for validity
checking) in Section 2.2.9.
o Changed the text in Section 2.2.9 referring to grandfathered
checking to note that the list is now included in the ABNF.
o Modified and added text to Section 3.2. The job description was
placed first. A note was added making clear that the Language
Subtag Reviewer may delegate various non-critical duties,
including list moderation. Finally, additional text was added to
make the appointment process clear and to clarify that decisions
and performance of the reviewer are appealable.
o Added text to Section 3.5 clarifying that the
email@example.com list is operated by whomever the IESG
appoints.
o Added text to Section 3.1.5 clarifying that the first Description
in a 'language' record matches the corresponding Reference Name
for the language in ISO 639-3.
o Modified Section 2.2.9 to define classes of conformance related to
specific tags (formerly 'well-formed' and 'valid' referred to
implementations). Notes were added about the removal of 'extlang'
from the ABNF provided in RFC 4646, allowing for well-formedness
using this older definition. Reference to RFC 3066 well-
formedness was also added.
o Added text to the end of Section 3.1.2 noting that future versions
of this document might add new field types to the registry format
and recommending that implementations ignore any unrecognized
fields.
o Added text about what the lack of a 'Suppress-Script' field means
in a record to Section 3.1.9.
o Added text allowing the correction of misspellings and typographic
errors to Section 3.1.5.
o Added text to Section 3.1.8 disallowing 'Prefix' field conflicts
(such as circular prefix references).
o Modified text in Section 3.5 to require the subtag reviewer to
announce his/her decision (or extension) following the two-week
period. Also clarified that any decision or failure to decide can
be appealed.
o Modified text in Section 4.1 to include the (heretofore anecdotal)
guiding principle of tag choice, and clarifying the non-use of
script subtags in non-written applications.
o Prohibited multiple use of the same variant in a tag (e.g.,
"de-1901-1901"). Previously, this was only a recommendation.
o Removed inappropriate [RFC2119] language from the illustration in
o Replaced the example of deprecating "zh-guoyu" with "zh-
hakka"->"hak" in Section 4.5, noting that it was this document
that caused the change.
o Replaced the section in Section 4.1 dealing with "mul"/"und" to
include the subtags 'zxx' and 'mis', as well as the tag
"i-default". A normative reference to RFC 2277 was added.
o Added text to Section 3.5 clarifying that any modifications of a
registration request must be sent to the <firstname.lastname@example.org>
list before submission to IANA.
o Changed the ABNF for the record-jar format from using the LWSP
production to use a folding whitespace production similar to obs-
FWS in [RFC5234]. This effectively prevents unintentional blank
lines inside a field.
o Clarified and revised text in Sections 3.3, 3.5, and 5.1 to
clarify that the Language Subtag Reviewer sends the complete
registration forms to IANA, that IANA extracts the record from the
form, and that the forms must also be archived separately from the
registry.
o Added text to Section 5 requiring IANA to send an announcement to
an ietf-languages-announcements list whenever the registry is
updated.
o Modified the registry to use UTF-8 as its character
encoding. This also entails additional instructions to IANA and
the Language Subtag Reviewer in the registration process.
o Modified the rules in Section 2.2.4 so that "exceptionally
reserved" ISO 3166-1 codes other than 'UK' are included in the
registry. In particular, this allows the code 'EU' (European
Union) to be used to form language tags or (more commonly) for
applications that use the registry for region codes to reference
o Modified the IANA considerations section (Section 5) to remove
unnecessary normative [RFC2119] language.
9. References
9.1. Normative References
[ISO15924] International Organization for Standardization, "ISO
15924:2004. Information and documentation -- Codes
for the representation of names of scripts",
[ISO3166-1] International Organization for Standardization, "ISO
3166-1:2006. Codes for the representation of names
of countries and their subdivisions -- Part 1:
Country codes", November 2006.
[ISO639-1] International Organization for Standardization, "ISO
639-1:2002. Codes for the representation of names
of languages -- Part 1: Alpha-2 code", July 2002.
[ISO639-2] International Organization for Standardization, "ISO
639-2:1998. Codes for the representation of names
of languages -- Part 2: Alpha-3 code", October 1998.
[ISO639-3] International Organization for Standardization, "ISO
639-3:2007. Codes for the representation of names
of languages - Part 3: Alpha-3 code for
comprehensive coverage of languages", February 2007.
[ISO639-5] International Organization for Standardization, "ISO
639-5:2008. Codes for the representation of names of
languages -- Part 5: Alpha-3 code for language
families and groups", May 2008.
[ISO646] International Organization for Standardization,
"ISO/IEC 646:1991, Information technology -- ISO
7-bit coded character set for information
[RFC2026] Bradner, S., "The Internet Standards Process --
Revision 3", BCP 9, RFC 2026, October 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
Languages", BCP 18, RFC 2277, January 1998.
[RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the
Internet: Timestamps", RFC 3339, July 2002.
[RFC4647] Phillips, A. and M. Davis, "Matching of Language
Tags", BCP 47, RFC 4647, September 2006.
[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for
Writing an IANA Considerations Section in RFCs",
BCP 26, RFC 5226, May 2008.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for
Syntax Specifications: ABNF", STD 68, RFC 5234,
[SpecialCasing] The Unicode Consortium, "Unicode Character Database,
Special Casing Properties", March 2008, <http://
[UAX14] Freitag, A., "Unicode Standard Annex #14: Line
Breaking Properties", August 2006,
[UN_M.49] Statistics Division, United Nations, "Standard
Country or Area Codes for Statistical Use", Revision
4 (United Nations publication, Sales No. 98.XVII.9,
[Unicode] Unicode Consortium, "The Unicode Consortium. The
Unicode Standard, Version 5.0, (Boston, MA, Addison-
Wesley, 2003. ISBN 0-321-49081-0)", January 2007.
9.2. Informative References
[CLDR] "The Common Locale Data Repository Project",
[RFC1766] Alvestrand, H., "Tags for the Identification of
Languages", RFC 1766, March 1995.
[RFC2028] Hovey, R. and S. Bradner, "The Organizations
Involved in the IETF Standards Process", BCP 11,
RFC 2028, October 1996.
[RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet
Mail Extensions (MIME) Part Two: Media Types",
RFC 2046, November 1996.
[RFC2047] Moore, K., "MIME (Multipurpose Internet Mail
Extensions) Part Three: Message Header Extensions
for Non-ASCII Text", RFC 2047, November 1996.
[RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and
Encoded Word Extensions:
Character Sets, Languages, and Continuations",
RFC 2231, November 1997.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee,
"Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616,
[RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of
ISO 10646", RFC 2781, February 2000.
[RFC3066] Alvestrand, H., "Tags for the Identification of
Languages", RFC 3066, January 2001.
[RFC3282] Alvestrand, H., "Content Language Headers",
RFC 3282, May 2002.
[RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing
RFC Text on Security Considerations", BCP 72,
RFC 3552, July 2003.
Appendix A. Examples of Language Tags (Informative)
Simple language subtag:
i-enochian (example of a grandfathered tag)
Language subtag plus Script subtag:
zh-Hant (Chinese written using the Traditional Chinese script)
zh-Hans (Chinese written using the Simplified Chinese script)
sr-Cyrl (Serbian written using the Cyrillic script)
sr-Latn (Serbian written using the Latin script)
Extended language subtags and their primary language subtag
zh-cmn-Hans-CN (Chinese, Mandarin, Simplified script, as used in
China)
cmn-Hans-CN (Mandarin Chinese, Simplified script, as used in
China)
zh-yue-HK (Chinese, Cantonese, as used in Hong Kong SAR)
yue-HK (Cantonese Chinese, as used in Hong Kong SAR)
zh-Hans-CN (Chinese written using the Simplified script as used in
mainland China)
sr-Latn-RS (Serbian written using the Latin script as used in
Serbia)
sl-rozaj (Resian dialect of Slovenian)
sl-rozaj-biske (San Giorgio dialect of Resian dialect of
Slovenian)
sl-nedis (Nadiza dialect of Slovenian)
de-CH-1901 (German as used in Switzerland using the 1901 variant
sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)
hy-Latn-IT-arevela (Eastern Armenian written in Latin script, as
used in Italy)
de-DE (German for Germany)
en-US (English as used in the United States)
es-419 (Spanish appropriate for the Latin America and Caribbean
region using the UN region code)
Private use subtags:
Private use registry values:
x-whatever (private use using the singleton 'x')
qaa-Qaaa-QM-x-southern (all private tags)
de-Qaaa (German, with a private script)
sr-Latn-QM (Serbian, Latin script, private region)
sr-Qaaa-RS (Serbian, private script, for Serbia)
Tags that use extensions (examples ONLY -- extensions MUST be defined
by revision or update to this document, or by RFC):
Some Invalid Tags:
de-419-DE (two region tags)
a-DE (use of a single-character subtag in primary position; note
that there are a few grandfathered tags that start with "i-" that
are valid)
ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter
prefix)
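The invalid tags listed above can be detected mechanically. The Python sketch below is an illustration only: the regular expression is one reading of the 'langtag' production and deliberately does not cover grandfathered tags or tags that consist solely of private use subtags, it does not consult the registry, and the function name check_tag and its return strings are hypothetical.

```python
import re

# One reading of the 'langtag' production. re.ASCII keeps [a-z] from
# matching non-ASCII letters when combined with re.IGNORECASE.
_LANGTAG = re.compile(
    r"""
    (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})   # language, extlang
    (?:-[a-z]{4})?                                          # script
    (?:-(?:[a-z]{2}|[0-9]{3}))?                             # region
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)  # variants
    (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)    # extensions
    (?:-x(?:-[a-z0-9]{1,8})+)?                              # private use
    """,
    re.VERBOSE | re.IGNORECASE | re.ASCII,
)


def check_tag(tag: str) -> str:
    """Classify `tag` using syntax-only checks (no registry lookup)."""
    match = _LANGTAG.fullmatch(tag)
    if match is None:
        return "not well-formed"
    variants = [v.lower() for v in match.group("variants").split("-") if v]
    if len(variants) != len(set(variants)):
        return "duplicate variant"
    # Extension subtags are 2 to 8 characters long, so every
    # one-character piece in this group is a singleton.
    singletons = [
        s.lower() for s in match.group("extensions").split("-") if len(s) == 1
    ]
    if len(singletons) != len(set(singletons)):
        return "duplicate singleton"
    return "ok"


if __name__ == "__main__":
    for tag in ("de-419-DE", "a-DE", "ar-a-aaa-b-bbb-a-ccc", "sr-Latn-RS"):
        print(tag, "->", check_tag(tag))
```

A result of "ok" here means only that the tag passed these syntactic checks; full validity checking also requires that each subtag appear in the registry.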
Appendix B. Examples of Registration Forms
LANGUAGE SUBTAG REGISTRATION FORM
1. Name of requester: Han Steenwijk
2. E-mail address of requester: han.steenwijk @ unipd.it
3. Record Requested:
Description: The San Giorgio dialect of Resian
Description: The Bila dialect of Resian
Comments: The dialect of San Giorgio/Bila is one of the
four major local dialects of Resian
4. Intended meaning of the subtag:
The local variety of Resian as spoken in San Giorgio/Bila
5. Reference to published description of the language (book or
-- Jan I.N. Baudouin de Courtenay - Opyt fonetiki rez'janskich
govorov, Varsava - Peterburg: Vende - Kozancikov, 1875.
LANGUAGE SUBTAG REGISTRATION FORM
1. Name of requester: Jaska Zedlik
2. E-mail address of requester: jz53 @ zedlik.com
3. Record Requested:
Description: Belarusian in Taraskievica orthography
Comments: The subtag represents Branislau Taraskievic's Belarusian
orthography as published in "Bielaruski klasycny pravapis" by
Juras Buslakou, Vincuk Viacorka, Zmicier Sanko, and Zmicier Sauka
4. Intended meaning of the subtag:
The subtag is intended to represent the Belarusian orthography as
published in "Bielaruski klasycny pravapis" by Juras Buslakou, Vincuk
Viacorka, Zmicier Sanko, and Zmicier Sauka (Vilnia-Miensk 2005).
5. Reference to published description of the language (book or
Taraskievic, Branislau. Bielaruskaja gramatyka dla skol. Vilnia: Vyd.
"Bielaruskaha kamitetu", 1929, 5th edition.
Buslakou, Juras; Viacorka, Vincuk; Sanko, Zmicier; Sauka, Zmicier.
Bielaruski klasycny pravapis. Vilnia-Miensk, 2005.
6. Any other relevant information:
Belarusian in Taraskievica orthography became widely used, especially
in Belarusian-speaking Internet segment, but besides this some books
and newspapers are also printed using this orthography of Belarusian.
Appendix C. Acknowledgements
Any list of contributors is bound to be incomplete; please regard the
following as only a selection from the group of people who have
contributed to make this document what it is today.
The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the
precursors of this document, made enormous contributions directly or
indirectly to this document and are generally responsible for the
success of language tags.
The following people contributed to this document:
Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan,
Martin Duerst, Frank Ellerman, Doug Ewell, Deborah Garside, Marion
Gunn, Alfred Hoenes, Kent Karlsson, Chris Newman, Randy Presuhn,
Stephen Silver, Shawn Steele, and many, many others.
Very special thanks must go to Harald Tveit Alvestrand, who
originated RFCs 1766 and 3066, and without whom this document would
not have been possible.
Special thanks go to Michael Everson, who served as the Language Tag
Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as
the Language Subtag Reviewer since the adoption of RFC 4646.
Special thanks also go to Doug Ewell, for his production of the first
complete subtag registry, his work to support and maintain new
registrations, and his careful editorship of both RFC 4645 and
Addison Phillips (editor)
Mark Davis (editor)