ABNF for URI Generic Syntax -- RFC 3986
|
|
A Uniform Resource Identifier (URI) is a compact sequence of
characters that identifies an abstract or physical resource.
The URI generic syntax, as defined in RFC 3986, is a superset of the
syntax of all URI schemes. A parser of the generic URI syntax can parse
any URI reference into its five major components: scheme, authority,
path, query, and fragment. Once the scheme is determined, further
scheme-specific parsing can be performed on the components.
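The generic split described above can be sketched with the regular expression that RFC 3986 gives in its Appendix B. The function name `split_uri_reference` and the dictionary keys are illustrative choices, not part of the specification, and the sketch only separates components without validating them.

```python
import re

# Regular expression from RFC 3986, Appendix B, for breaking down a URI reference.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(ref: str) -> dict:
    """Split a URI reference into its five major components.

    Absent components are returned as None; the path is always present
    but may be the empty string.
    """
    m = URI_REFERENCE.match(ref)  # every sub-pattern is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


parts = split_uri_reference("http://www.ics.uci.edu/pub/ietf/uri/#Related")
# parts["scheme"] == "http", parts["authority"] == "www.ics.uci.edu",
# parts["path"] == "/pub/ietf/uri/", parts["query"] is None,
# parts["fragment"] == "Related"
```

Scheme-specific parsing would then operate on the individual values, for example on the path of a "mailto" URI.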
|
A URI-reference is either a URI or a relative reference. If the
URI-reference's prefix does not match the syntax of a scheme followed
by its colon separator, then the URI-reference is a relative
reference.
|
|
|
|
| URI-reference | = | URI / relative-ref |
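The prefix test described above might be sketched as follows. The scheme pattern is an assumption taken from RFC 3986, Section 3.1 (a letter followed by letters, digits, "+", "-", or "."), since the scheme rule itself does not appear in the text above; the function name is illustrative.

```python
import re

# Assumption: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ),
# as given in RFC 3986, Section 3.1, followed by its ":" separator.
SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def is_relative_reference(ref: str) -> bool:
    """Return True if `ref` does not start with a scheme and a colon."""
    return SCHEME_PREFIX.match(ref) is None


is_relative_reference("mailto:user@example.com")  # False: "mailto:" is a scheme prefix
is_relative_reference("//example.com/path")       # True: network-path reference
is_relative_reference("./this:that")              # True: "." cannot start a scheme
```

This check only classifies the reference; it does not verify that the rest of the string is well formed.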
|
Each URI begins with a scheme name that refers to a specification for
assigning identifiers within that scheme.
The process for registering new URI schemes is defined by RFC 4395
(since obsoleted by RFC 7595). The scheme registry
(http://www.iana.org/assignments/uri-schemes.html) maintains the mapping
between scheme names and their specifications.
|
Examples: "sip", "sips", "tel", "http", "https", "mailto", "pres", "ftp", "file", "rtsp", "msrp"
|
The authority component is preceded by a double slash ("//") and is
terminated by the next slash ("/"), question mark ("?"), or number
sign ("#") character, or by the end of the URI.
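The delimiting rule above could be applied as in the following minimal sketch. It assumes the input is an absolute URI, so that the scheme ends at the first colon; the function name `extract_authority` is hypothetical, and no validation of the authority's contents is attempted.

```python
from typing import Optional


def extract_authority(uri: str) -> Optional[str]:
    """Return the authority component, or None if "//" does not follow the scheme.

    Assumption: `uri` is an absolute URI, so the scheme ends at the first ":".
    """
    _, colon, rest = uri.partition(":")
    if not colon or not rest.startswith("//"):
        return None
    rest = rest[2:]  # drop the leading "//"
    end = len(rest)  # default: the authority runs to the end of the URI
    for delimiter in "/?#":
        pos = rest.find(delimiter)
        if pos != -1:
            end = min(end, pos)
    return rest[:end]


extract_authority("http://example.com:8042/over/there?name=ferret#nose")  # "example.com:8042"
extract_authority("mailto:John.Doe@example.com")                          # None
```

An empty string is a possible result, as in "file:///etc/hosts", where the authority is present but empty.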
|
| host | = | IP-literal / IPv4address / reg-name |
| IP-literal | = | "[" ( IPv6address / IPvFuture ) "]" |
| IPvFuture | = | "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" ) |
|
| IPv6address | = |                            6( h16 ":" ) ls32 |
|             | / |                       "::" 5( h16 ":" ) ls32 |
|             | / | [               h16 ] "::" 4( h16 ":" ) ls32 |
|             | / | [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32 |
|             | / | [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32 |
|             | / | [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32 |
|             | / | [ *4( h16 ":" ) h16 ] "::"              ls32 |
|             | / | [ *5( h16 ":" ) h16 ] "::"              h16  |
|             | / | [ *6( h16 ":" ) h16 ] "::"                   |
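One way to check the nine IPv6address alternatives is to translate each of them into a regular expression and join them. The sketch below does this under the assumption that h16, ls32, IPv4address, and dec-octet are as defined elsewhere in this section; the helper `_prefix` and the other names are illustrative only.

```python
import re

H16 = r"[0-9A-Fa-f]{1,4}"                                           # h16 = 1*4HEXDIG
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"   # dec-octet
IPV4 = rf"(?:{DEC_OCTET}\.){{3}}{DEC_OCTET}"                        # IPv4address
LS32 = rf"(?:{H16}:{H16}|{IPV4})"                                   # ls32


def _prefix(n: int) -> str:
    """Translate the optional group [ *n( h16 ":" ) h16 ]."""
    return rf"(?:(?:{H16}:){{0,{n}}}{H16})?"


# One entry per alternative of the IPv6address rule, in the same order.
ALTERNATIVES = [
    rf"(?:{H16}:){{6}}{LS32}",
    rf"::(?:{H16}:){{5}}{LS32}",
    rf"{_prefix(0)}::(?:{H16}:){{4}}{LS32}",
    rf"{_prefix(1)}::(?:{H16}:){{3}}{LS32}",
    rf"{_prefix(2)}::(?:{H16}:){{2}}{LS32}",
    rf"{_prefix(3)}::{H16}:{LS32}",
    rf"{_prefix(4)}::{LS32}",
    rf"{_prefix(5)}::{H16}",
    rf"{_prefix(6)}::",
]
IPV6ADDRESS = re.compile("|".join(f"(?:{alt})" for alt in ALTERNATIVES))


def is_ipv6address(text: str) -> bool:
    """Return True if `text` matches the IPv6address rule in full."""
    return IPV6ADDRESS.fullmatch(text) is not None


is_ipv6address("2001:db8::7")      # True
is_ipv6address("1:2:3:4:5:6:7")    # False: seven groups and no "::"
```

Note that this matches the bare address only; inside a URI the address appears within "[" and "]" as an IP-literal.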
|
| h16 | = | 1*4HEXDIG |
| ls32 | = | ( h16 ":" h16 ) / IPv4address |
| IPv4address | = | dec-octet "." dec-octet "." dec-octet "." dec-octet |
|
| dec-octet | = | DIGIT | ; 0-9 |
| | | / %x31-39 DIGIT | ; 10-99 |
| | | / "1" 2DIGIT | ; 100-199 |
| | | / "2" %x30-34 DIGIT | ; 200-249 |
| | | / "25" %x30-35 | ; 250-255 |
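The five dec-octet alternatives can also be followed directly in code, without a regular expression. The following is a minimal sketch; the function names are illustrative, and the length-based branching is just one possible way to mirror the rule.

```python
def is_dec_octet(text: str) -> bool:
    """Check `text` against the dec-octet rule, one alternative per branch."""
    if not (text.isascii() and text.isdigit()):
        return False
    if len(text) == 1:                      # DIGIT            ; 0-9
        return True
    if len(text) == 2:                      # %x31-39 DIGIT    ; 10-99
        return text[0] != "0"
    if len(text) == 3:
        if text[0] == "1":                  # "1" 2DIGIT       ; 100-199
            return True
        if text[0] == "2" and text[1] in "01234":   # "2" %x30-34 DIGIT ; 200-249
            return True
        return text[:2] == "25" and text[2] in "012345"  # "25" %x30-35 ; 250-255
    return False


def is_ipv4address(text: str) -> bool:
    """Check for four dec-octets separated by "."."""
    parts = text.split(".")
    return len(parts) == 4 and all(is_dec_octet(p) for p in parts)


is_ipv4address("192.0.2.235")   # True
is_ipv4address("256.0.0.1")     # False: 256 exceeds the 250-255 alternative
is_ipv4address("01.2.3.4")      # False: the rule admits no leading zeros
```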
|
The path component contains data, usually organized in hierarchical
form, that, along with data in the non-hierarchical query component,
serves to identify a resource within the scope of the
URI's scheme and naming authority (if any). The path is terminated
by the first question mark ("?") or number sign ("#") character, or
by the end of the URI.
|
|
|
|
| path | = | path-abempty | ; begins with "/" or is empty |
| | | / path-absolute | ; begins with "/" but not "//" |
| | | / path-noscheme | ; begins with a non-colon segment |
| | | / path-rootless | ; begins with a segment |
| | | / path-empty | ; zero characters |
|
| path-abempty | = | *( "/" segment ) |
| path-absolute | = | "/" [ segment-nz *( "/" segment ) ] |
| path-noscheme | = | segment-nz-nc *( "/" segment ) |
| path-rootless | = | segment-nz *( "/" segment ) |
| path-empty | = | 0<pchar> |
| segment | = | *pchar |
| segment-nz | = | 1*pchar |
| segment-nz-nc | = | 1*( unreserved / pct-encoded / sub-delims / "@" ) | ; non-zero-length segment without any colon ":" |
| pchar | = | unreserved / pct-encoded / sub-delims / ":" / "@" |
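To see how the five path alternatives overlap, the rules above can be translated into regular expressions and tried against sample paths. This is a sketch under two assumptions not shown in the table: unreserved is taken to be letters, digits, "-", ".", "_", and "~", and pct-encoded is taken to be "%" followed by two hexadecimal digits, as in RFC 3986. All Python names are illustrative.

```python
import re

UNRESERVED = r"[A-Za-z0-9._~-]"        # assumption: per RFC 3986, Section 2.3
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"       # assumption: per RFC 3986, Section 2.1
SUB_DELIMS = r"[!$&'()*+,;=]"
PCHAR = rf"(?:{UNRESERVED}|{PCT_ENCODED}|{SUB_DELIMS}|[:@])"
SEGMENT = rf"{PCHAR}*"
SEGMENT_NZ = rf"{PCHAR}+"
SEGMENT_NZ_NC = rf"(?:{UNRESERVED}|{PCT_ENCODED}|{SUB_DELIMS}|@)+"

PATH_RULES = {
    "path-abempty": rf"(?:/{SEGMENT})*",
    "path-absolute": rf"/(?:{SEGMENT_NZ}(?:/{SEGMENT})*)?",
    "path-noscheme": rf"{SEGMENT_NZ_NC}(?:/{SEGMENT})*",
    "path-rootless": rf"{SEGMENT_NZ}(?:/{SEGMENT})*",
    "path-empty": r"",
}


def matching_path_rules(path: str) -> list:
    """Return the names of all path rules that match `path` in full."""
    return [name for name, pattern in PATH_RULES.items()
            if re.fullmatch(pattern, path)]


matching_path_rules("/a/b")    # ["path-abempty", "path-absolute"]
matching_path_rules("a:b/c")   # ["path-rootless"]: the colon rules out path-noscheme
matching_path_rules("//a")     # ["path-abempty"]: path-absolute may not begin with "//"
```

Several rules can match the same string; which one applies in a given URI depends on whether an authority or scheme precedes the path.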
|
The query component contains non-hierarchical data that, along with
data in the path component, serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.
|
|
|
|
| query | = | *( pchar / "/" / "?" ) |
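The delimiting and character rules for the query might be sketched as below. The character class spells out pchar using the unreserved and pct-encoded definitions from RFC 3986, which is an assumption here because those rules are not shown in the table; the function names are hypothetical.

```python
import re
from typing import Optional

# query = *( pchar / "/" / "?" ), with pchar expanded into a character class
# plus the pct-encoded form "%" HEXDIG HEXDIG (assumed from RFC 3986).
QUERY = re.compile(r"(?:[A-Za-z0-9._~!$&'()*+,;=:@/?-]|%[0-9A-Fa-f]{2})*")


def extract_query(uri: str) -> Optional[str]:
    """Return the text between the first "?" and the "#" (or the end), else None."""
    without_fragment = uri.split("#", 1)[0]
    _, question_mark, query = without_fragment.partition("?")
    return query if question_mark else None


def is_valid_query(query: str) -> bool:
    """Return True if every character of `query` is allowed by the query rule."""
    return QUERY.fullmatch(query) is not None


extract_query("http://example.com/over/there?name=ferret#nose")  # "name=ferret"
extract_query("http://example.com/a#frag?not-a-query")           # None
is_valid_query("objectClass?one/two")                            # True: "?" and "/" are allowed
```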
|
The fragment identifier component of a URI allows indirect
identification of a secondary resource by reference to a primary
resource and additional identifying information. The identified
secondary resource may be some portion or subset of the primary
resource, some view on representations of the primary resource, or
some other resource defined or described by those representations. A
fragment identifier component is indicated by the presence of a
number sign ("#") character and terminated by the end of the URI.
|
|
|
|
| fragment | = | *( pchar / "/" / "?" ) |
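Separating the fragment from the rest of a URI can be sketched in a few lines. The helper `split_fragment` is a hypothetical name; `urldefrag` is the Python standard library function from `urllib.parse`, shown for comparison. The helper returns None when no "#" is present, whereas `urldefrag` returns an empty string in that case.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag


def split_fragment(uri: str) -> Tuple[str, Optional[str]]:
    """Split at the first "#"; the fragment runs to the end of the URI."""
    primary, number_sign, fragment = uri.partition("#")
    return primary, (fragment if number_sign else None)


split_fragment("http://www.example.org/foo.html#bar")
# ("http://www.example.org/foo.html", "bar")

primary, fragment = urldefrag("http://www.example.org/foo.html#bar")
# primary == "http://www.example.org/foo.html", fragment == "bar"
```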
|
A percent-encoding mechanism is used to represent a data octet in a
component when that octet's corresponding character is outside the
allowed set or is being used as a delimiter of, or within, the
component.
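As an illustration of the mechanism, Python's standard `urllib.parse` module provides `quote` and `unquote`, which encode and decode octets as "%" followed by two hexadecimal digits. The sample strings are arbitrary; passing `safe=""` is a choice made here so that "/" is encoded as well.

```python
from urllib.parse import quote, unquote

# A space is outside the allowed set, and "/" would act as a path delimiter,
# so both are percent-encoded when they appear as data.
encoded = quote("a/b c", safe="")
# encoded == "a%2Fb%20c"

# Decoding restores the original data octets.
decoded = unquote("a%2Fb%20c")
# decoded == "a/b c"
```

By default `quote` leaves "/" unencoded, which suits whole paths but not a single segment that contains a literal slash.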
|
Characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include uppercase and lowercase
letters, decimal digits, hyphen, period, underscore, and tilde.
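The unreserved set listed above can be written down as a small lookup, which makes it easy to see which characters never need encoding. This is only a sketch: the function name is hypothetical, and encoding every octet outside the set (as UTF-8) is a deliberately conservative choice rather than a requirement of the syntax.

```python
import string

# Uppercase and lowercase letters, decimal digits, hyphen, period,
# underscore, and tilde.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def encode_except_unreserved(text: str) -> str:
    """Percent-encode every UTF-8 octet that is not an unreserved character."""
    out = []
    for octet in text.encode("utf-8"):
        ch = chr(octet)
        out.append(ch if ch in UNRESERVED else f"%{octet:02X}")
    return "".join(out)


encode_except_unreserved("A-z_0.9~")   # "A-z_0.9~" (unchanged)
encode_except_unreserved("a b")        # "a%20b"
```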
|
URIs include components and subcomponents that are delimited by
characters in the "reserved" set. These characters are called
"reserved" because they may (or may not) be defined as delimiters by
the generic syntax, by each scheme-specific syntax, or by the
implementation-specific syntax of a URI's dereferencing algorithm.
If data for a URI component would conflict with a reserved
character's purpose as a delimiter, then the conflicting data must be
percent-encoded before the URI is formed.
|
|
|
|
| reserved | = | gen-delims / sub-delims |
| gen-delims | = | ":" / "/" / "?" / "#" / "[" / "]" / "@" |
| sub-delims | = | "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" |
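The following sketch shows how data that conflicts with a delimiter could be percent-encoded before a URI is formed. The "key=value" pairs joined by "&" are a common scheme-level convention assumed for illustration, not part of the generic syntax, and `encode_reserved` is a hypothetical helper that handles only the reserved characters (it leaves "%" and non-ASCII characters untouched).

```python
GEN_DELIMS = frozenset(":/?#[]@")
SUB_DELIMS = frozenset("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def encode_reserved(data: str) -> str:
    """Percent-encode every reserved character in `data`; leave the rest as-is."""
    return "".join(f"%{ord(ch):02X}" if ch in RESERVED else ch for ch in data)


# Assumed convention: "&" separates pairs and "=" separates key from value,
# so a value containing either character must be encoded first.
value = "a&b=c"
query = "q=" + encode_reserved(value)
# query == "q=a%26b%3Dc"
```

Encoding every reserved character is stricter than necessary in many components; a real encoder would encode only the characters that act as delimiters in the component being built.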