RFC 4646


Pages: 59

Tags for Identifying Languages


Obsoleted by:    5646
Obsoletes:    3066


Network Working Group                                   A. Phillips, Ed.
Request for Comments: 4646                                   Yahoo! Inc.
BCP: 47                                                    M. Davis, Ed.
Obsoletes: 3066                                                   Google
Category: Best Current Practice                           September 2006


                     Tags for Identifying Languages

Status of This Memo

   This document specifies an Internet Best Current Practice for the
   Internet Community, and requests discussion and suggestions for
   improvements.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This document describes the structure, content, construction, and
   semantics of language tags for use in cases where it is desirable to
   indicate the language used in an information object.  It also
   describes how to register values for use in language tags and the
   creation of user-defined extensions for private interchange.  This
   document, in combination with RFC 4647, replaces RFC 3066, which
   replaced RFC 1766.

Table of Contents

   1. Introduction ....................................................3
   2. The Language Tag ................................................4
      2.1. Syntax .....................................................4
      2.2. Language Subtag Sources and Interpretation .................7
           2.2.1. Primary Language Subtag .............................8
           2.2.2. Extended Language Subtags ..........................10
           2.2.3. Script Subtag ......................................11
           2.2.4. Region Subtag ......................................11
           2.2.5. Variant Subtags ....................................13
           2.2.6. Extension Subtags ..................................14
           2.2.7. Private Use Subtags ................................16
           2.2.8. Preexisting RFC 3066 Registrations .................16
           2.2.9. Classes of Conformance .............................17
   3. Registry Format and Maintenance ................................18
      3.1. Format of the IANA Language Subtag Registry ...............18
      3.2. Language Subtag Reviewer ..................................24
      3.3. Maintenance of the Registry ...............................24
      3.4. Stability of IANA Registry Entries ........................25
      3.5. Registration Procedure for Subtags ........................29
      3.6. Possibilities for Registration ............................32
      3.7. Extensions and Extensions Registry ........................34
      3.8. Initialization of the Registries ..........................37
   4. Formation and Processing of Language Tags ......................38
      4.1. Choice of Language Tag ....................................38
      4.2. Meaning of the Language Tag ...............................40
      4.3. Length Considerations .....................................41
           4.3.1. Working with Limited Buffer Sizes ..................42
           4.3.2. Truncation of Language Tags ........................43
      4.4. Canonicalization of Language Tags .........................44
      4.5. Considerations for Private Use Subtags ....................45
   5. IANA Considerations ............................................46
      5.1. Language Subtag Registry ..................................46
      5.2. Extensions Registry .......................................47
   6. Security Considerations ........................................48
   7. Character Set Considerations ...................................48
   8. Changes from RFC 3066 ..........................................49
   9. References .....................................................52
      9.1. Normative References ......................................52
      9.2. Informative References ....................................53
   Appendix A. Acknowledgements ......................................55
   Appendix B. Examples of Language Tags (Informative) ...............56

1.  Introduction

   Human beings on our planet have, past and present, used a number of
   languages.  There are many reasons why one would want to identify the
   language used when presenting or requesting information.

   A user's language preferences often need to be identified so that
   appropriate processing can be applied.  For example, the user's
   language preferences in a Web browser can be used to select Web pages
   appropriately.  Language preferences can also be used to select among
   tools (such as dictionaries) to assist in the processing or
   understanding of content in different languages.

   In addition, knowledge about the particular language used by some
   piece of information content might be useful or even required by some
   types of processing; for example, spell-checking, computer-
   synthesized speech, Braille transcription, or high-quality print
   renderings.

   One means of indicating the language used is by labeling the
   information content with an identifier or "tag".  These tags can be
   used to specify user preferences when selecting information content,
   or for labeling additional attributes of content and associated
   resources.

   Tags can also be used to indicate additional language attributes of
   content.  For example, indicating specific information about the
   dialect, writing system, or orthography used in a document or
   resource may enable the user to obtain information in a form that
   they can understand, or it can be important in processing or
   rendering the given content into an appropriate form or style.

   This document specifies a particular identifier mechanism (the
   language tag) and a registration function for values to be used to
   form tags.  It also defines a mechanism for private use values and
   future extension.

   This document, in combination with [RFC4647], replaces [RFC3066],
   which replaced [RFC1766].  For a list of changes in this document,
   see Section 8.

   The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  The Language Tag

   Language tags are used to help identify languages, whether spoken,
   written, signed, or otherwise signaled, for the purpose of
   communication.  This includes constructed and artificial languages,
   but excludes languages not intended primarily for human
   communication, such as programming languages.

2.1.  Syntax

   The language tag is composed of one or more parts, known as
   "subtags".  Each subtag consists of a sequence of alphanumeric
   characters.  Subtags are distinguished and separated from one another
   by a hyphen ("-", ABNF [RFC4234] %x2D).  A language tag consists of a
   "primary language" subtag and a (possibly empty) series of subsequent
   subtags, each of which refines or narrows the range of languages
   identified by the overall tag.

   Each type of subtag is distinguished by length, position in the tag,
   and content: subtags can be recognized solely by these features.  The
   only exception to this is a fixed list of grandfathered tags
   registered under RFC 3066 [RFC3066].  This makes
   it possible to construct a parser that can extract and assign some
   semantic information to the subtags, even if the specific subtag
   values are not recognized.  Thus, a parser need not have an up-to-
   date copy (or any copy at all) of the subtag registry to perform most
   searching and matching operations.

   The syntax of the language tag in ABNF [RFC4234] is:

   Language-Tag  = langtag
                 / privateuse             ; private use tag
                 / grandfathered          ; grandfathered registrations

   langtag       = (language
                    ["-" script]
                    ["-" region]
                    *("-" variant)
                    *("-" extension)
                    ["-" privateuse])

   language      = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
                 / 4ALPHA                 ; reserved for future use
                 / 5*8ALPHA               ; registered language subtag

   extlang       = *3("-" 3ALPHA)         ; reserved for future use

   script        = 4ALPHA                 ; ISO 15924 code

   region        = 2ALPHA                 ; ISO 3166 code
                 / 3DIGIT                 ; UN M.49 code

   variant       = 5*8alphanum            ; registered variants
                 / (DIGIT 3alphanum)

   extension     = singleton 1*("-" (2*8alphanum))

   singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
                 ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
                 ; Single letters: x/X is reserved for private use

   privateuse    = ("x"/"X") 1*("-" (1*8alphanum))

   grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum))
                   ; grandfathered registration
                   ; Note: i is the only singleton
                   ; that starts a grandfathered tag

   alphanum      = (ALPHA / DIGIT)       ; letters and numbers

                        Figure 1: Language Tag ABNF

   Note: There is a subtlety in the ABNF for 'variant': variants
   starting with a digit MAY be four characters long, while those
   starting with a letter MUST be at least five characters long.
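   As an illustration of how the productions in Figure 1 can be checked
   mechanically, the Python sketch below transcribes the 'langtag' and
   'privateuse' rules into a regular expression.  The names
   (is_well_formed and the pattern constants) are hypothetical.  It
   checks syntax only and does not consult the registry.  It also
   deliberately omits the 'grandfathered' production, which as written
   matches far more strings than the fixed list of grandfathered tags
   (it would accept "a-value", for instance); such tags are better
   looked up in the registry.

```python
import re

# One fragment per ABNF rule in Figure 1; matching is ASCII-only and
# case-insensitive, since case carries no meaning in language tags.
ALNUM = "[a-z0-9]"
LANGUAGE = "(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})"
SCRIPT = "[a-z]{4}"
REGION = "(?:[a-z]{2}|[0-9]{3})"
VARIANT = f"(?:{ALNUM}{{5,8}}|[0-9]{ALNUM}{{3}})"
SINGLETON = "[a-wy-z0-9]"  # any letter or digit except 'x'
EXTENSION = f"{SINGLETON}(?:-{ALNUM}{{2,8}})+"
PRIVATEUSE = f"x(?:-{ALNUM}{{1,8}})+"

LANGTAG = (
    f"{LANGUAGE}(?:-{SCRIPT})?(?:-{REGION})?"
    f"(?:-{VARIANT})*(?:-{EXTENSION})*(?:-{PRIVATEUSE})?"
)

# The 'grandfathered' production is intentionally left out (see above).
WELL_FORMED = re.compile(f"(?:{LANGTAG}|{PRIVATEUSE})", re.IGNORECASE | re.ASCII)


def is_well_formed(tag: str) -> bool:
    """Return True if tag matches the langtag or privateuse production."""
    return WELL_FORMED.fullmatch(tag) is not None


print(is_well_formed("en-Latn-GB-boont-r-extended-sequence-x-private"))  # True
print(is_well_formed("tlh-a-b-foo"))  # False: singleton 'a' has no subtag
```

   Rules that the ABNF does not express, such as the ban on repeated
   singletons in Section 2.2.6, still need separate checks.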

   All subtags have a maximum length of eight characters and whitespace
   is not permitted in a language tag.  For examples of language tags,
   see Appendix B.

   Note that although [RFC4234] refers to octets, the language tags
   described in this document are sequences of characters from the
   US-ASCII [ISO646] repertoire.  Language tags MAY be used in documents
   and applications that use other encodings, so long as these encompass
   the US-ASCII repertoire.  An example of this would be an XML document
   that uses the UTF-16LE [RFC2781] encoding of [Unicode].

   The tags and their subtags, including private use and extensions, are
   to be treated as case insensitive: there exist conventions for the
   capitalization of some of the subtags, but these MUST NOT be taken to
   carry meaning.

   For example:

   o  [ISO639-1] recommends that language codes be written in lowercase
      ('mn' Mongolian).

   o  [ISO3166-1] recommends that country codes be capitalized ('MN'
      Mongolia).

   o  [ISO15924] recommends that script codes use lowercase with the
      initial letter capitalized ('Cyrl' Cyrillic).

   However, in the tags defined by this document, the uppercase US-ASCII
   letters in the range 'A' through 'Z' are considered equivalent and
   mapped directly to their US-ASCII lowercase equivalents in the range
   'a' through 'z'.  Thus, the tag "mn-Cyrl-MN" is not distinct from
   "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of
   these variations conveys the same meaning: Mongolian written in the
   Cyrillic script as used in Mongolia.

   Although case distinctions do not carry meaning in language tags,
   consistent formatting and presentation of the tags will aid users.
   The format of the tags and subtags in the registry is RECOMMENDED.
   In this format, all non-initial two-letter subtags are uppercase, all
   non-initial four-letter subtags are titlecase, and all other subtags
   are lowercase.
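   A minimal Python sketch of the two points above: comparing tags with
   only the US-ASCII letters case-folded, and printing a tag in the
   casing used by the registry.  The function names are hypothetical,
   and the casing rule is applied literally as stated in the preceding
   paragraph, to every non-initial subtag.

```python
import string

_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def _is_ascii_alpha(s: str) -> bool:
    return s != "" and all(c in string.ascii_letters for c in s)


def tags_equal(a: str, b: str) -> bool:
    """Compare tags, folding only US-ASCII 'A'-'Z' to 'a'-'z'."""
    return a.translate(_TO_LOWER) == b.translate(_TO_LOWER)


def recommended_case(tag: str) -> str:
    """Non-initial two-letter subtags uppercase, non-initial four-letter
    subtags titlecase, all other subtags lowercase."""
    out = []
    for i, sub in enumerate(tag.split("-")):
        low = sub.translate(_TO_LOWER)
        if i > 0 and len(sub) == 2 and _is_ascii_alpha(sub):
            out.append(low.translate(_TO_UPPER))
        elif i > 0 and len(sub) == 4 and _is_ascii_alpha(sub):
            out.append(low[0].translate(_TO_UPPER) + low[1:])
        else:
            out.append(low)
    return "-".join(out)


print(tags_equal("mn-Cyrl-MN", "mN-cYrL-Mn"))  # True
print(recommended_case("MN-cYRL-mn"))          # mn-Cyrl-MN
```

   Because case carries no meaning, recommended_case only changes how a
   tag is presented, never what it identifies.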

2.2.  Language Subtag Sources and Interpretation

   The namespace of language tags and their subtags is administered by
   the Internet Assigned Numbers Authority (IANA) [RFC2860] according to
   the rules in Section 5 of this document.  The Language Subtag
   Registry maintained by IANA is the source for valid subtags: other
   standards referenced in this section provide the source material for
   that registry.

   Terminology in this section:

   o  Tag or tags refers to a complete language tag, such as
      "fr-Latn-CA".  Examples of tags in this document are enclosed in
      double-quotes ("en-US").

   o  Subtag refers to a specific section of a tag, delimited by hyphen,
      such as the subtag 'Latn' in "fr-Latn-CA".  Examples of subtags in
      this document are enclosed in single quotes ('Latn').

   o  Code or codes refers to values defined in external standards (and
      that are used as subtags in this document).  For example, 'Latn'
      is an [ISO15924] script code that was used to define the 'Latn'
      script subtag for use in a language tag.  Examples of codes in
      this document are enclosed in single quotes ('en', 'Latn').

   The definitions in this section apply to the various subtags within
   the language tags defined by this document, excepting those
   "grandfathered" tags defined in Section 2.2.8.

   Language tags are designed so that each subtag type has unique length
   and content restrictions.  These make identification of the subtag's
   type possible, even if the content of the subtag itself is
   unrecognized.  This allows tags to be parsed and processed without
   reference to the latest version of the underlying standards or the
   IANA registry and makes the associated exception handling when
   parsing tags simpler.
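   To illustrate recognition of subtag types without the registry, the
   Python sketch below labels each subtag using only its length,
   content, and position.  It is an illustration, not a validator: the
   name classify_subtags is hypothetical, the input is assumed to be a
   well-formed, non-grandfathered tag, and limits such as the maximum
   of three extended language subtags are not enforced.

```python
from __future__ import annotations


def classify_subtags(tag: str) -> list[tuple[str, str]]:
    """Label each subtag of a well-formed, non-grandfathered tag by type."""
    result = []
    stage = 0       # 0: language, 1: extlang, 2: script, 3: region, 4: variant
    mode = "core"   # switches to "extension" or "privateuse" after a singleton
    for pos, sub in enumerate(tag.split("-")):
        low = sub.lower()
        if mode == "privateuse":
            kind = "privateuse"
        elif len(low) == 1:
            kind = "singleton"
            mode = "privateuse" if low == "x" else "extension"
        elif mode == "extension":
            kind = "extension"
        elif pos == 0:
            kind = "language"
        elif stage <= 1 and len(low) == 3 and low.isalpha():
            kind, stage = "extlang", 1
        elif stage <= 1 and len(low) == 4 and low.isalpha():
            kind, stage = "script", 2
        elif stage <= 2 and ((len(low) == 2 and low.isalpha())
                             or (len(low) == 3 and low.isdigit())):
            kind, stage = "region", 3
        else:
            kind, stage = "variant", 4
        result.append((sub, kind))
    return result


print(classify_subtags("fr-a-Latn"))
# [('fr', 'language'), ('a', 'singleton'), ('Latn', 'extension')]
```

   In the example, 'Latn' is labelled as part of extension 'a' rather
   than as a script subtag, consistent with rule 10 of Section 2.2.6.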

   Subtags in the IANA registry that do not come from an underlying
   standard can only appear in specific positions in a tag.
   Specifically, they can only occur as primary language subtags or as
   variant subtags.

   Note that sequences of private use and extension subtags MUST occur
   at the end of the sequence of subtags and MUST NOT be interspersed
   with subtags defined elsewhere in this document.

   Single-letter and single-digit subtags are reserved for current or
   future use.  These include the following current uses:

   o  The single-letter subtag 'x' is reserved to introduce a sequence
      of private use subtags.  The interpretation of any private use
      subtags is defined solely by private agreement and is not defined
      by the rules in this section or in any standard or registry
      defined in this document.

   o  All other single-letter subtags are reserved to introduce
      standardized extension subtag sequences as described in
      Section 3.7.

   The single-letter subtag 'i' is used by some grandfathered tags, such
   as "i-enochian", where it always appears in the first position and
   cannot be confused with an extension.

2.2.1.  Primary Language Subtag

   The primary language subtag is the first subtag in a language tag
   (with the exception of private use and certain grandfathered tags)
   and cannot be omitted.  The following rules apply to the primary
   language subtag:

   1.  All two-character language subtags were defined in the IANA
       registry according to the assignments found in the standard ISO
       639 Part 1, "ISO 639-1:2002, Codes for the representation of
       names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using
       assignments subsequently made by the ISO 639 Part 1 maintenance
       agency or governing standardization bodies.

   2.  All three-character language subtags were defined in the IANA
       registry according to the assignments found in ISO 639 Part 2,
       "ISO 639-2:1998 - Codes for the representation of names of
       languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2], or
       assignments subsequently made by the ISO 639 Part 2 maintenance
       agency or governing standardization bodies.

   3.  The subtags in the range 'qaa' through 'qtz' are reserved for
       private use in language tags.  These subtags correspond to codes
       reserved by ISO 639-2 for private use.  These codes MAY be used
       for non-registered primary language subtags (instead of using
       private use subtags following 'x-').  Please refer to Section 4.5
       for more information on private use subtags.

   4.  All four-character language subtags are reserved for possible
       future standardization.

   5.  All language subtags of 5 to 8 characters in length in the IANA
       registry were defined via the registration process in Section 3.5
       and MAY be used to form the primary language subtag.  At the time
       this document was created, there were no examples of this kind of
       subtag and future registrations of this type will be discouraged:
       primary languages are strongly RECOMMENDED for registration with
       ISO 639, and proposals rejected by ISO 639/RA will be closely
       scrutinized before they are registered with IANA.

   6.  The single-character subtag 'x' as the primary subtag indicates
       that the language tag consists solely of subtags whose meaning is
       defined by private agreement.  For example, in the tag "x-fr-CH",
       the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the
       French language or the country of Switzerland (or any other value
       in the IANA registry) unless there is a private agreement in
       place to do so.  See Section 4.5.

   7.  The single-character subtag 'i' is used by some grandfathered
       tags (see Section 2.2.8) such as "i-klingon" and "i-bnn".  (Other
       grandfathered tags have a primary language subtag in their first
       position.)

   8.  Other values MUST NOT be assigned to the primary subtag except by
       revision or update of this document.
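   The numbered rules above amount to a classification of the primary
   subtag by length and range.  A minimal Python sketch follows; the
   function name primary_subtag_category and its category labels are
   invented for illustration.  It classifies by shape only: whether a
   given subtag is actually defined must still be checked against the
   IANA registry.

```python
import string


def primary_subtag_category(subtag: str) -> str:
    """Classify a primary language subtag by shape, following rules 1-8."""
    s = subtag.lower()
    if s == "x":
        return "private-use-tag"       # rule 6
    if s == "i":
        return "grandfathered"         # rule 7
    if not s or any(c not in string.ascii_lowercase for c in s):
        return "invalid"               # rule 8
    if len(s) == 2:
        return "iso639-1"              # rule 1
    if len(s) == 3:
        # rule 3: 'qaa' through 'qtz' are private use; otherwise rule 2
        return "private-use" if "qaa" <= s <= "qtz" else "iso639-2"
    if len(s) == 4:
        return "reserved"              # rule 4
    if len(s) <= 8:
        return "registered"            # rule 5 (5 to 8 characters)
    return "invalid"                   # longer than eight characters


for sub in ("en", "haw", "qab", "enochian"):
    print(sub, primary_subtag_category(sub))
```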

   Note: For languages that have both an ISO 639-1 two-character code
   and an ISO 639-2 three-character code, only the ISO 639-1 two-
   character code is defined in the IANA registry.

   Note: For languages that have no ISO 639-1 two-character code and for
   which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
   (Bibliographic) codes differ, only the Terminology code is defined in
   the IANA registry.  At the time this document was created, all
   languages that had both kinds of three-character code were also
   assigned a two-character code; it is not expected that future
   assignments of this nature will occur.

   Note: To avoid problems with versioning and subtag choice as
   experienced during the transition between RFC 1766 and RFC 3066, as
   well as the canonical nature of subtags defined by this document, the
   ISO 639 Registration Authority Joint Advisory Committee (ISO 639/
   RA-JAC) has included the following statement in [iso639.prin]:

   "A language code already in ISO 639-2 at the point of freezing ISO
   639-1 shall not later be added to ISO 639-1.  This is to ensure
   consistency in usage over time, since users are directed in Internet
   applications to employ the alpha-3 code when an alpha-2 code for that
   language is not available."

   In order to avoid instability in the canonical form of tags, if a
   two-character code is added to ISO 639-1 for a language for which a
   three-character code was already included in ISO 639-2, the two-
   character code MUST NOT be registered.  See Section 3.4.

   For example, if some content were tagged with 'haw' (Hawaiian), which
   currently has no two-character code, the tag would not be invalidated
   if ISO 639-1 were to assign a two-character code to the Hawaiian
   language at a later date.

   For example, one of the grandfathered IANA registrations is
   "i-enochian".  The subtag 'enochian' could be registered in the IANA
   registry as a primary language subtag (assuming that ISO 639 does not
   register this language first), making tags such as "enochian-AQ" and
   "enochian-Latn" valid.

2.2.2.  Extended Language Subtags

   The following rules apply to the extended language subtags:

   1.  Three-letter subtags immediately following the primary subtag are
       reserved for future standardization, anticipating work that is
       currently under way on ISO 639.

   2.  Extended language subtags MUST follow the primary subtag and
       precede any other subtags.

   3.  There MAY be up to three extended language subtags.

   4.  Extended language subtags MUST NOT be registered or used to form
       language tags.  Their syntax is described here so that
       implementations can be compatible with any future revision of
       this document that does provide for their registration.

   Extended language subtag records, once they appear in the registry,
   MUST include exactly one 'Prefix' field indicating an appropriate
   language subtag or sequence of subtags that MUST always appear as a
   prefix to the extended language subtag.

   Example: In a future revision or update of this document, the tag
   "zh-gan" (registered under RFC 3066) might become a valid non-
   grandfathered (that is, redundant) tag in which the subtag 'gan'
   might represent the Chinese dialect 'Gan'.

2.2.3.  Script Subtag

   Script subtags are used to indicate the script or writing system
   variations that distinguish the written forms of a language or its
   dialects.  The following rules apply to the script subtags:

   1.  All four-character subtags were defined according to
       [ISO15924]--"Codes for the representation of names of scripts":
       alpha-4 script codes, or subsequently assigned by the ISO 15924
       maintenance agency or governing standardization bodies, denoting
       the script or writing system used in conjunction with this
       language.

   2.  Script subtags MUST immediately follow the primary language
       subtag and all extended language subtags and MUST occur before
       any other type of subtag described below.

   3.  The script subtags 'Qaaa' through 'Qabx' are reserved for private
       use in language tags.  These subtags correspond to codes reserved
       by ISO 15924 for private use.  These codes MAY be used for non-
       registered script values.  Please refer to Section 4.5 for more
       information on private use subtags.

   4.  Script subtags MUST NOT be registered using the process in
       Section 3.5 of this document.  Variant subtags MAY be considered
       for registration for that purpose.

   5.  There MUST be at most one script subtag in a language tag, and
       the script subtag SHOULD be omitted when it adds no
       distinguishing value to the tag or when the primary language
       subtag's record includes a Suppress-Script field listing the
       applicable script subtag.
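   Rules 3 and 5 can be sketched in Python as follows.  The function
   names are hypothetical.  The Suppress-Script data is supplied by the
   caller as a mapping from primary language subtag to script subtag,
   standing in for a registry lookup (the mapping in the example is
   illustrative).  The sketch assumes no extended language subtags are
   present, as Section 2.2.2 forbids them, so a script subtag can only
   be the second subtag.

```python
from __future__ import annotations


def is_private_use_script(subtag: str) -> bool:
    """Rule 3: 'Qaaa' through 'Qabx' are reserved for private use."""
    s = subtag.lower()
    return len(s) == 4 and s.isascii() and s.isalpha() and "qaaa" <= s <= "qabx"


def drop_suppressed_script(tag: str, suppress_script: dict[str, str]) -> str:
    """Rule 5: omit a script subtag that the language's Suppress-Script lists."""
    subtags = tag.split("-")
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        expected = suppress_script.get(subtags[0].lower())
        if expected is not None and expected.lower() == subtags[1].lower():
            del subtags[1]
    return "-".join(subtags)


print(drop_suppressed_script("en-Latn-US", {"en": "Latn"}))  # en-US
```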

   Example: "sr-Latn" represents Serbian written using the Latin script.

2.2.4.  Region Subtag

   Region subtags are used to indicate linguistic variations associated
   with or appropriate to a specific country, territory, or region.
   Typically, a region subtag is used to indicate regional dialects or
   usage, or region-specific spelling conventions.  A region subtag can
   also be used to indicate that content is expressed in a way that is
   appropriate for use throughout a region, for instance, Spanish
   content tailored to be useful throughout Latin America.

   The following rules apply to the region subtags:

   1.  Region subtags MUST follow any language, extended language, or
       script subtags and MUST precede all other subtags.

   2.  All two-character subtags following the primary subtag were
       defined in the IANA registry according to the assignments found
       in [ISO3166-1] ("Codes for the representation of names of
       countries and their subdivisions -- Part 1: Country codes") using
       the list of alpha-2 country codes, or using assignments
       subsequently made by the ISO 3166 maintenance agency or governing
       standardization bodies.

   3.  All three-character subtags consisting of digit (numeric)
       characters following the primary subtag were defined in the IANA
       registry according to the assignments found in UN Standard
       Country or Area Codes for Statistical Use [UN_M.49] or
       assignments subsequently made by the governing standards body.
       Note that not all of the UN M.49 codes are defined in the IANA
       registry.  The following rules define which codes are entered
       into the registry as valid subtags:

       A.  UN numeric codes assigned to 'macro-geographical
           (continental)' or sub-regions MUST be registered in the
           registry.  These codes are not associated with an assigned
           ISO 3166 alpha-2 code and represent supra-national areas,
           usually covering more than one nation, state, province, or
           territory.

       B.  UN numeric codes for 'economic groupings' or 'other
           groupings' MUST NOT be registered in the IANA registry and
           MUST NOT be used to form language tags.

       C.  UN numeric codes for countries or areas with ambiguous ISO
           3166 alpha-2 codes, when entered into the registry, MUST be
           defined according to the rules in Section 3.4 and MUST be
           used to form language tags that represent the country or
           region for which they are defined.

       D.  UN numeric codes for countries or areas for which there is an
           associated ISO 3166 alpha-2 code in the registry MUST NOT be
           entered into the registry and MUST NOT be used to form
           language tags.  Note that the ISO 3166-based subtag in the
           registry MUST actually be associated with the UN M.49 code in
           question.

       E.  UN numeric codes and ISO 3166 alpha-2 codes for countries or
           areas listed as eligible for registration in [RFC4645] but
           not presently registered MAY be entered into the IANA
           registry via the process described in Section 3.5.  Once
           registered, these codes MAY be used to form language tags.

       F.  All other UN numeric codes for countries or areas that do not
           have an associated ISO 3166 alpha-2 code MUST NOT be entered
           into the registry and MUST NOT be used to form language tags.
           For more information about these codes, see Section 3.4.

   4.  Note: The alphanumeric codes in Appendix X of the UN document
       MUST NOT be entered into the registry and MUST NOT be used to
       form language tags.  (At the time this document was created,
       these values matched the ISO 3166 alpha-2 codes.)

   5.  There MUST be at most one region subtag in a language tag and the
       region subtag MAY be omitted, as when it adds no distinguishing
       value to the tag.

   6.  The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
       reserved for private use in language tags.  These subtags
       correspond to codes reserved by ISO 3166 for private use.  These
       codes MAY be used for private use region subtags (instead of
       using a private use subtag sequence).  Please refer to
       Section 4.5 for more information on private use subtags.
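   The shape-based part of these rules can be sketched in Python as
   below (the function name and labels are hypothetical).  Whether a
   particular UN M.49 code may be used (rules A through F), or whether
   an alpha-2 code is actually assigned, can only be answered from the
   registry, so the sketch reports shape and the private use ranges of
   rule 6 only.

```python
def region_subtag_category(subtag: str) -> str:
    """Classify a region subtag by shape (rules 2, 3 and 6)."""
    s = subtag.upper()
    if len(s) == 3 and s.isascii() and s.isdigit():
        return "un-m49"                # rule 3; rules A-F decide if allowed
    if len(s) == 2 and s.isascii() and s.isalpha():
        if s in ("AA", "ZZ") or "QM" <= s <= "QZ" or s.startswith("X"):
            return "private-use"       # rule 6
        return "iso3166-alpha2"        # rule 2
    return "not-a-region"


for sub in ("CH", "419", "qm", "XK", "Latn"):
    print(sub, region_subtag_category(sub))
```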

   "de-CH" represents German ('de') as used in Switzerland ('CH').

   "sr-Latn-CS" represents Serbian ('sr') written using Latin script
   ('Latn') as used in Serbia and Montenegro ('CS').

   "es-419" represents Spanish ('es') appropriate to the UN-defined
   Latin America and Caribbean region ('419').

2.2.5.  Variant Subtags

   Variant subtags are used to indicate additional, well-recognized
   variations that define a language or its dialects that are not
   covered by other available subtags.  The following rules apply to the
   variant subtags:

   1.  Variant subtags are not associated with any external standard.
       Variant subtags and their meanings are defined by the
       registration process defined in Section 3.5.

   2.  Variant subtags MUST follow all of the other defined subtags, but
       precede any extension or private use subtag sequences.

   3.  More than one variant MAY be used to form the language tag.

   4.  Variant subtags MUST be registered with IANA according to the
       rules in Section 3.5 of this document before being used to form
       language tags.  In order to distinguish variants from other types
       of subtags, registrations MUST meet the following length and
       content restrictions:

       1.  Variant subtags that begin with a letter (a-z, A-Z) MUST be
           at least five characters long.

       2.  Variant subtags that begin with a digit (0-9) MUST be at
           least four characters long.

   Variant subtag records in the language subtag registry MAY include
   one or more 'Prefix' fields, which indicate the language tag or tags
   that would make a suitable prefix (with other subtags, as
   appropriate) in forming a language tag with the variant.  For
   example, the subtag 'nedis' has a Prefix of "sl", making it suitable
   to form language tags such as "sl-nedis" and "sl-IT-nedis", but not
   suitable for use in a tag such as "zh-nedis" or "it-IT-nedis".
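   How a 'Prefix' field is matched "with other subtags, as appropriate"
   is not spelled out in detail here.  The Python sketch below assumes
   one plausible reading: a tag is acceptable if the subtags preceding
   the variant contain every subtag of at least one Prefix, in the same
   order, possibly with other subtags in between.  The function name and
   that matching rule are assumptions for illustration only.

```python
from __future__ import annotations


def prefix_allows(tag: str, variant: str, prefixes: list[str]) -> bool:
    """True if `variant` is in `tag` and the subtags before it contain,
    in order, all subtags of at least one entry in `prefixes`."""
    core = []
    for sub in tag.lower().split("-"):
        if len(sub) == 1:          # stop at an extension or private use singleton
            break
        core.append(sub)
    variant = variant.lower()
    if variant not in core:
        return False
    before = core[:core.index(variant)]
    for prefix in prefixes:
        remaining = iter(before)
        # 'p in remaining' consumes the iterator, so this checks order too
        if all(p in remaining for p in prefix.lower().split("-")):
            return True
    return False


print(prefix_allows("sl-IT-nedis", "nedis", ["sl"]))  # True
print(prefix_allows("zh-nedis", "nedis", ["sl"]))     # False
```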

   "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian.

   "de-CH-1996" represents German as used in Switzerland and as written
   using the spelling reform beginning in the year 1996 C.E.

   Most variants that share a prefix are mutually exclusive.  For
   example, the German orthographic variations '1996' and '1901' SHOULD
   NOT be used in the same tag, as they represent the dates of different
   spelling reforms.  A variant that can meaningfully be used in
   combination with another variant SHOULD include a 'Prefix' field in
   its registry record that lists that other variant.  For example, if
   another German variant 'example' were created that made sense to use
   with '1996', then 'example' should include two Prefix fields: "de"
   and "de-1996".

2.2.6.  Extension Subtags

   Extensions provide a mechanism for extending language tags for use in
   various applications.  See Section 3.7.  The following rules apply to
   extensions:

   1.   Extension subtags are separated from the other subtags defined
        in this document by a single-character subtag ("singleton").
        The singleton MUST be one allocated to a registration authority
        via the mechanism described in Section 3.7 and MUST NOT be the
        letter 'x', which is reserved for private use subtag sequences.

   2.   Note: Private use subtag sequences starting with the singleton
        subtag 'x' are described in Section 2.2.7 below.

   3.   An extension MUST follow at least a primary language subtag.
        That is, a language tag cannot begin with an extension.
        Extensions extend language tags, they do not override or replace
        them.  For example, "a-value" is not a well-formed language tag,
        while "de-a-value" is.

   4.   Each singleton subtag MUST appear at most one time in each tag
        (other than as a private use subtag).  That is, singleton
        subtags MUST NOT be repeated.  For example, the tag
        "en-a-bbb-a-ccc" is invalid because the subtag 'a' appears
        twice.  Note that the tag "en-a-bbb-x-a-ccc" is valid because
        the second appearance of the singleton 'a' is in a private use
        sequence.

   5.   Extension subtags MUST meet all of the requirements for the
        content and format of subtags defined in this document.

   6.   Extension subtags MUST meet whatever requirements are set by the
        document that defines their singleton prefix and whatever
        requirements are provided by the maintaining authority.

   7.   Each extension subtag MUST be from two to eight characters long
        and consist solely of letters or digits, with each subtag
        separated by a single '-'.

   8.   Each singleton MUST be followed by at least one extension
        subtag.  For example, the tag "tlh-a-b-foo" is invalid because
        the first singleton 'a' is followed immediately by another
        singleton 'b'.

   9.   Extension subtags MUST follow all language, extended language,
        script, region, and variant subtags in a tag.

   10.  All subtags following the singleton and before another singleton
        are part of the extension.  Example: In the tag "fr-a-Latn", the
        subtag 'Latn' does not represent the script subtag 'Latn'
        defined in the IANA Language Subtag Registry.  Its meaning is
        defined by the extension 'a'.

   11.  In the event that more than one extension appears in a single
        tag, the tag SHOULD be canonicalized as described in
        Section 4.4.

   For example, if the prefix singleton 'r' and the shown subtags were
   defined, then the following tag would be a valid example:
   "en-Latn-GB-boont-r-extended-sequence-x-private".

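   The structural extension rules above can be illustrated with a
   minimal Python sketch.  The helper name `parse_extensions` is
   hypothetical.  It covers rules 3, 4, 7, 8, and 10 and the reservation
   of 'x' in rule 1; it cannot tell whether a singleton has actually been
   allocated under Section 3.7, and it does not validate the language,
   script, region, or variant subtags that precede the first singleton.

```python
def parse_extensions(tag):
    """Split a tag into (base subtags, extensions, private use subtags).

    Hypothetical helper illustrating the structural rules of Section 2.2.6.
    Base subtags are collected but NOT validated, and singleton allocation
    is NOT checked. Raises ValueError when an extension rule is violated.
    """
    subtags = tag.split("-")
    base, extensions, private = [], {}, []
    current = None  # singleton of the extension currently being collected

    for i, sub in enumerate(subtags):
        low = sub.lower()
        if low == "x":
            # Everything after 'x' is private use, even a repeated 'a'.
            private = subtags[i + 1:]
            if not private:
                raise ValueError("'x' must be followed by a private use subtag")
            break
        if len(sub) == 1:
            if i == 0:
                raise ValueError("a tag cannot begin with an extension")
            if low in extensions:
                raise ValueError(f"singleton {low!r} appears more than once")
            if current is not None and not extensions[current]:
                raise ValueError(f"singleton {current!r} has no extension subtags")
            current = low
            extensions[current] = []
        elif current is None:
            base.append(sub)  # still before the first singleton
        elif 2 <= len(sub) <= 8 and sub.isascii() and sub.isalnum():
            extensions[current].append(sub)  # rule 10: part of this extension
        else:
            raise ValueError(f"bad extension subtag {sub!r}")

    if current is not None and not extensions[current]:
        raise ValueError(f"singleton {current!r} has no extension subtags")
    return base, extensions, private
```

   Applied to the example tag above, parse_extensions returns
   (['en', 'Latn', 'GB', 'boont'], {'r': ['extended', 'sequence']},
   ['private']), while "tlh-a-b-foo" and "en-a-bbb-a-ccc" raise
   ValueError.
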
2.2.7.  Private Use Subtags

   Private use subtags are used to indicate distinctions in language
   that are important in a given context, by private agreement.  The
   following rules apply to private use subtags:

   1.  Private use subtags are separated from the other subtags defined
       in this document by the reserved single-character subtag 'x'.

   2.  Private use subtags MUST conform to the format and content
       constraints defined in the ABNF for all subtags.

   3.  Private use subtags MUST follow all language, extended language,
       script, region, variant, and extension subtags in the tag.
       Another way of saying this is that all subtags following the
       singleton 'x' MUST be considered private use.  Example: The
       subtag 'US' in the tag "en-x-US" is a private use subtag.

   4.  A tag MAY consist entirely of private use subtags.

   5.  No source is defined for private use subtags.  Use of private use
       subtags is by private agreement only.

   6.  Private use subtags are NOT RECOMMENDED where alternatives exist
       or for general interchange.  See Section 4.5 for more information
       on private use subtag choice.

   For example: Users who wished to utilize codes from the Ethnologue
   publication of SIL International for language identification might
   agree to exchange tags such as "az-Arab-x-AZE-derbend".  This example
   contains two private use subtags.  The first is 'AZE' and the second
   is 'derbend'.
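
   As a minimal sketch of rule 3, the Python function below (a
   hypothetical helper, not part of any registry tooling) returns the
   subtags that follow the first 'x' singleton.  The 1-to-8 letter or
   digit shape it checks is our reading of the ABNF in Section 2.1 and
   should be confirmed there; the meaning of the returned subtags is, as
   rule 5 says, a matter of private agreement.

```python
def private_use_subtags(tag):
    """Return the private use subtags of a tag (hypothetical helper).

    Per rule 3, every subtag after the first 'x' singleton is private use.
    Each one is checked against an assumed 1-8 letter/digit shape.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" not in lowered:
        return []  # no private use sequence in this tag
    private = subtags[lowered.index("x") + 1:]
    if not private:
        raise ValueError("'x' must be followed by at least one subtag")
    for sub in private:
        if not (1 <= len(sub) <= 8 and sub.isascii() and sub.isalnum()):
            raise ValueError(f"bad private use subtag {sub!r}")
    return private
```

   For the Ethnologue example, private_use_subtags("az-Arab-x-AZE-derbend")
   returns ['AZE', 'derbend'], and for "en-x-US" it returns ['US'].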

2.2.8.  Preexisting RFC 3066 Registrations

   Existing IANA-registered language tags from RFC 1766 and/or RFC 3066
   maintain their validity.  These tags will be maintained in the
   registry in records of either the "grandfathered" or "redundant"
   type.  Grandfathered tags contain one or more subtags that are not
   defined in the Language Subtag Registry (see Section 3).  Redundant
   tags consist entirely of subtags defined above; their independent
   registration is superseded by this document.  For more information,
   see Section 3.8.

   It is important to note that all language tags formed under the
   guidelines in this document were either legal, well-formed tags or
   could have been registered under RFC 3066.

2.2.9.  Classes of Conformance

   Implementations sometimes need to describe their capabilities with
   regard to the rules and practices described in this document.  There
   are two classes of conforming implementations described by this
   document: "well-formed" processors and "validating" processors.
   Claims of conformance SHOULD explicitly reference one of these
   definitions.

   An implementation that claims to check for well-formed language tags
   MUST:

   o  Check that the tag and all of its subtags, including extension and
      private use subtags, conform to the ABNF or that the tag is on the
      list of grandfathered tags.

   o  Check that singleton subtags that identify extensions do not
      repeat.  For example, the tag "en-a-xx-b-yy-a-zz" is not well-
      formed.

   Well-formed processors are strongly encouraged to implement the
   canonicalization rules contained in Section 4.4.
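
   A well-formed check along these lines could look like the Python
   sketch below.  All names are hypothetical.  The regular expression is
   our transcription of the ABNF in Section 2.1 (not reproduced in this
   part) and should be checked against it; the grandfathered set is a
   small illustrative sample, not the registry's list; and no
   canonicalization is attempted.

```python
import re

# Assumed regex transcription of the Section 2.1 ABNF. re.ASCII keeps
# IGNORECASE from matching non-ASCII letters in the [a-z] ranges.
LANGUAGE = r"(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})"
SCRIPT = r"[a-z]{4}"
REGION = r"(?:[a-z]{2}|[0-9]{3})"
VARIANT = r"(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})"
SINGLETON = r"[0-9a-wyz]"  # any letter or digit except 'x'
EXTENSION = SINGLETON + r"(?:-[a-z0-9]{2,8})+"
PRIVATEUSE = r"x(?:-[a-z0-9]{1,8})+"
LANGTAG = (LANGUAGE + "(?:-" + SCRIPT + ")?(?:-" + REGION + ")?(?:-" + VARIANT
           + ")*(?:-" + EXTENSION + ")*(?:-" + PRIVATEUSE + ")?")
TAG_RE = re.compile("(?:" + LANGTAG + "|" + PRIVATEUSE + ")",
                    re.IGNORECASE | re.ASCII)

# Illustrative subset only; a real checker loads the registry's list.
SAMPLE_GRANDFATHERED = frozenset({"i-klingon", "art-lojban", "en-gb-oed"})


def is_well_formed(tag, grandfathered=SAMPLE_GRANDFATHERED):
    """Sketch of the two MUST checks for a well-formed processor."""
    if tag.lower() in grandfathered:
        return True
    if TAG_RE.fullmatch(tag) is None:
        return False
    # The regex above does not prevent repeated singletons, so check them
    # separately, stopping at 'x' because private use may repeat letters.
    seen = set()
    for sub in tag.lower().split("-"):
        if sub == "x":
            break
        if len(sub) == 1:
            if sub in seen:
                return False
            seen.add(sub)
    return True
```

   With this sketch, "en-a-xx-b-yy-a-zz" passes the regular expression
   but is rejected by the singleton check, while "en-a-bbb-x-a-ccc" is
   accepted.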

   An implementation that claims to be validating MUST:

   o  Check that the tag is well-formed.

   o  Specify the particular registry date for which the implementation
      performs validation of subtags.

   o  Check that either the tag is a grandfathered tag, or that all
      language, script, region, and variant subtags consist of valid
      codes for use in language tags according to the IANA registry as
      of the particular date specified by the implementation.

   o  Specify which, if any, extension RFCs as defined in Section 3.7
      are supported, including version, revision, and date.

   o  For any such extensions supported, check that all subtags used in
      that extension are valid.

   o  For variant and extended language subtags, if the registry
      contains one or more 'Prefix' fields for that subtag, check that
      the tag matches at least one prefix.  The tag matches if all the
      subtags in the 'Prefix' also appear in the tag.  For example, the
      prefix "es-CO" matches the tag "es-Latn-CO-x-private" because both
      the 'es' language subtag and 'CO' region subtag appear in the tag.
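
   The 'Prefix' test in the last item can be sketched in Python as a
   subset check; `matches_prefix` and `satisfies_prefixes` are
   hypothetical names.  Subtags are compared case-insensitively.  As an
   assumption beyond the text, only subtags before the first singleton
   (extension or 'x') are considered, since a subtag such as 'CO' after
   'x' is private use rather than a region subtag (Section 2.2.7,
   rule 3).

```python
def matches_prefix(tag, prefix):
    """True if every subtag of `prefix` appears in `tag` (case-insensitive).

    Assumption: only subtags before the first singleton count, so extension
    and private use subtags cannot satisfy a Prefix.
    """
    base = []
    for sub in tag.lower().split("-"):
        if len(sub) == 1:
            break  # start of an extension or private use sequence
        base.append(sub)
    return set(prefix.lower().split("-")) <= set(base)


def satisfies_prefixes(tag, prefixes):
    """A subtag with Prefix fields is acceptable if at least one matches."""
    return not prefixes or any(matches_prefix(tag, p) for p in prefixes)
```

   With the example above, matches_prefix("es-Latn-CO-x-private",
   "es-CO") returns True, whereas under the stated assumption
   "es-Latn-x-CO" does not match "es-CO".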


