Content for TS 23.038 Word version: 18.0.0


6.2 Character sets and coding

This section provides the list of character sets and codings to be supported by SMS, CBS, USSD and IEs included in NAS messages as specified in TS 24.008 and TS 24.301. Implementation of the GSM 7 bit default alphabet is mandatory. Support of other character sets is optional.

It should be noted that support of Latin and non-Latin languages by the GSM 7 bit default alphabet is limited. It is therefore essential to introduce the UCS2 character set in mobile stations, SCs and systems handling SMSs, CBSs, USSDs, and IEs included in NAS messages.

6.2.1 GSM 7 bit Default Alphabet

Bits per character: 7

CBS/USSD/IE of NAS message pad character: CR

Character table:

				b7	0	0	0	0	1	1	1	1
				b6	0	0	1	1	0	0	1	1
				b5	0	1	0	1	0	1	0	1
b4	b3	b2	b1		0	1	2	3	4	5	6	7
0	0	0	0	0	@	Δ	SP	0	¡	P	¿	p
0	0	0	1	1	£	_	!	1	A	Q	a	q
0	0	1	0	2	$	Φ	"	2	B	R	b	r
0	0	1	1	3	¥	Γ	#	3	C	S	c	s
0	1	0	0	4	è	Λ	¤	4	D	T	d	t
0	1	0	1	5	é	Ω	%	5	E	U	e	u
0	1	1	0	6	ù	Π	&	6	F	V	f	v
0	1	1	1	7	ì	Ψ	'	7	G	W	g	w
1	0	0	0	8	ò	Σ	(	8	H	X	h	x
1	0	0	1	9	Ç	Θ	)	9	I	Y	i	y
1	0	1	0	10	LF	Ξ	*	:	J	Z	j	z
1	0	1	1	11	Ø	(1)	+	;	K	Ä	k	ä
1	1	0	0	12	ø	Æ	,	<	L	Ö	l	ö
1	1	0	1	13	CR	æ	-	=	M	Ñ	m	ñ
1	1	1	0	14	Å	ß	.	>	N	Ü	n	ü
1	1	1	1	15	å	É	/	?	O	§	o	à
NOTE 1: This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.
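The table lookup can be sketched in code. In this minimal illustration, the code value of a character is 16 times its column number (bits b7 b6 b5) plus its row number (bits b4 b3 b2 b1). The names `GSM7_DEFAULT`, `encode_basic` and `decode_basic` are hypothetical, and the decoder deliberately models only a receiver that does not understand the escape mechanism (NOTE 1).

```python
# GSM 7 bit default alphabet in code order 0x00..0x7F, one string per column.
# Position 0x1B holds the escape code; it is stored here as "\x1b" (assumption).
GSM7_DEFAULT = (
    "@£$¥èéùìòÇ\nØø\rÅå"       # column 0
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"      # column 1
    " !\"#¤%&'()*+,-./"        # column 2
    "0123456789:;<=>?"         # column 3
    "¡ABCDEFGHIJKLMNO"         # column 4
    "PQRSTUVWXYZÄÖÑÜ§"         # column 5
    "¿abcdefghijklmno"         # column 6
    "pqrstuvwxyzäöñüà"         # column 7
)
ESCAPE = 0x1B

# Reverse map: character -> 7-bit code.
_CODE_OF = {ch: code for code, ch in enumerate(GSM7_DEFAULT)}


def encode_basic(text):
    """Return the 7-bit codes for text; raises KeyError for characters
    that are not in the default alphabet."""
    return [_CODE_OF[ch] for ch in text]


def decode_basic(codes):
    """Decode with the default alphabet only. The escape code is shown
    as a space, as required of a receiver that does not understand it."""
    return "".join(" " if c == ESCAPE else GSM7_DEFAULT[c] for c in codes)


print(encode_basic("Hi@"))               # [72, 105, 0]
print(decode_basic([0x48, 0x1B, 0x69]))  # H i
```

This sketch works on unpacked 7-bit codes only; packing them into octets is outside its scope.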

6.2.1.1 GSM 7 bit default alphabet extension table

The Table below is reserved for symbols of international significance (e.g. currency symbols). It also contains a mechanism to permit escape (Note 1) to additional Tables for symbols of international significance in the event that the Table below becomes fully populated.

				b7	0	0	0	0	1	1	1	1
				b6	0	0	1	1	0	0	1	1
				b5	0	1	0	1	0	1	0	1
b4	b3	b2	b1		0	1	2	3	4	5	6	7
0	0	0	0	0					|
0	0	0	1	1
0	0	1	0	2
0	0	1	1	3
0	1	0	0	4		^
0	1	0	1	5							€
0	1	1	0	6
0	1	1	1	7
1	0	0	0	8			{
1	0	0	1	9			}
1	0	1	0	10	(3)
1	0	1	1	11		(1)
1	1	0	0	12				[
1	1	0	1	13				~
1	1	1	0	14				]
1	1	1	1	15			\
In the event that an MS receives a code where a symbol is not represented in the above Table then the MS shall display either the character shown in the main GSM 7 bit default alphabet table in subclause 6.2.1, or the character from the National Language Locking Shift Table in the case where the locking shift mechanism as defined in subclause 6.2.1.2.3 is used.

NOTE 1: This code is reserved for the extension to another extension Table. On receipt of this code, a receiving entity shall display a space until another extension Table is defined. It is not intended that this extension mechanism should be used as an alternative to UCS2 to enhance the 7 bit default alphabet character repertoire for national specific character sets.

NOTE 2: Void

NOTE 3: This code is defined as a Page Break character and may be used for example in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet Table extension mechanism will treat this character as Line Feed.
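The escape handling and the fallback rule for the extension table could be sketched as follows. The function name `decode_with_extension` and its `main_table` parameter are hypothetical. `main_table` stands for any mapping from 7-bit code to character, such as the default alphabet or a locking shift table. Representing Page Break as a form feed and showing a dangling escape at the end of the input as a space are assumptions of this sketch.

```python
ESCAPE = 0x1B

# Entries of the extension table above: 7-bit code -> character.
GSM7_EXTENSION = {
    0x0A: "\f",  # Page Break (NOTE 3); form feed is an assumed representation
    0x14: "^",
    0x28: "{",
    0x29: "}",
    0x2F: "\\",
    0x3C: "[",
    0x3D: "~",
    0x3E: "]",
    0x40: "|",
    0x65: "€",
}


def decode_with_extension(codes, main_table):
    """Decode 7-bit codes, resolving <escape><character> pairs."""
    out = []
    it = iter(codes)
    for code in it:
        if code != ESCAPE:
            out.append(main_table[code])
            continue
        nxt = next(it, None)
        if nxt is None or nxt == ESCAPE:
            # Second escape is reserved (NOTE 1): display a space.
            # A dangling escape is treated the same way (assumption).
            out.append(" ")
        elif nxt in GSM7_EXTENSION:
            out.append(GSM7_EXTENSION[nxt])
        else:
            # Symbol not represented in the extension table:
            # display the character from the main table instead.
            out.append(main_table[nxt])
    return "".join(out)


# Partial main table, just enough for the example.
main = {0x41: "A", 0x65: "e"}
print(decode_with_extension([0x41, 0x1B, 0x65, 0x1B, 0x41], main))  # A€A
```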

6.2.1.2 National Language Identifier

6.2.1.2.1 Introduction

The national language tables are used for adding the special characters of certain languages that cannot be expressed using the GSM default 7 bit alphabet.

The principle is to use the National Language Identifier to indicate to a receiving entity that the message has been encoded using a national language table. Both single shift and locking shift mechanisms are defined.

The single shift mechanism, as defined in subclause 6.2.1.2.2, applies to a single character and it replaces the GSM 7 bit default alphabet extension table defined in subclause 6.2.1.1 with a National Language Single Shift Table (see subclause A.2).

The locking shift mechanism, as defined in subclause 6.2.1.2.3, applies throughout the message, or the current segment in case of a concatenated message, and it replaces the GSM 7 bit default alphabet defined in subclause 6.2.1 with a National Language Locking Shift Table (see subclause A.3) that defines the whole character set needed for the language.

If several languages that require different national language tables are used, it is recommended to encode the message in UCS2. However, it is possible to use both single shift and locking shift with the corresponding tables in a single message.

Implementations based on older reference versions (so-called "legacy implementations") will use the fallback mechanisms that are defined in the earlier versions of the specification for handling of unknown characters.

6.2.1.2.2 Single shift mechanism

In the case where single shift is not combined with locking shift, single shift means that the receiving entity shall decode all characters in the message (or the current segment in case of a concatenated message) using the GSM 7 bit default alphabet unless the escape mechanism is used, i.e. <escape><character>, as defined in subclause 6.2.1.

The case where single shift and locking shift (which may be for the same or different languages) are combined is described in subclause 6.2.1.2.3.

If the escape mechanism is used then instead of the GSM 7 bit default alphabet extension Table in subclause 6.2.1.1 the receiving entity shall decode the subsequent character using the National Language Single Shift Table for the indicated language in Table 6.2.1.2.4.1. Each time a sending entity requires to send a character from the National Language Single Shift Table the sending entity shall encode this as <escape><character>, where the <character> is encoded using the indicated National Language Single Shift Table.

6.2.1.2.3 Locking shift mechanism

Locking Shift means that the receiving entity shall decode all characters in the message (or the current segment in case of a concatenated message) using the National Language Locking Shift Table unless the escape mechanism is used, i.e. <escape><character>, as defined in subclause 6.2.1.

If the escape mechanism is used and no National Language Single Shift Table is indicated (see subclause 6.2.1.2.4), the receiving entity shall decode the message (or the current segment in case of a concatenated message) using the GSM 7 bit default alphabet extension table as defined in subclause 6.2.1.1.

If the escape mechanism is used and a National Language Single Shift Table is indicated (see subclause 6.2.1.2.4), the receiving entity shall decode the message (or the current segment in case of a concatenated message) using the National Language Single Shift Table as defined in subclause 6.2.1.2.2.
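The way single shift and locking shift select the decoding tables might be sketched as follows. The locking shift table, when indicated, replaces the default alphabet for unescaped characters. The single shift table, when indicated, replaces the extension table for escaped characters. The function name `decode_shifted` is hypothetical, the tables in the usage example are toy placeholders (not the Annex A contents), and the fallback for an escaped code that is missing from the single shift table is an assumption modelled on the extension table rule.

```python
ESCAPE = 0x1B


def decode_shifted(codes, default, extension, locking=None, single=None):
    """Decode 7-bit codes with optional national language tables.

    Every table maps a 7-bit code to a character. `locking` and `single`
    are None when no corresponding table is indicated.
    """
    base = locking if locking is not None else default     # unescaped codes
    escaped = single if single is not None else extension  # <escape><character>
    out = []
    it = iter(codes)
    for code in it:
        if code != ESCAPE:
            out.append(base[code])
            continue
        nxt = next(it, None)
        if nxt is None or nxt == ESCAPE:
            out.append(" ")        # assumption: displayed as a space
        elif nxt in escaped:
            out.append(escaped[nxt])
        else:
            out.append(base[nxt])  # assumption: fall back to the base table
    return "".join(out)


# Toy tables for illustration only.
default = {0x41: "A", 0x42: "B"}
extension = {0x41: "^"}
locking = {0x41: "a", 0x42: "b"}
single = {0x41: "~"}

msg = [0x41, ESCAPE, 0x41, 0x42]
print(decode_shifted(msg, default, extension))                   # A^B
print(decode_shifted(msg, default, extension, locking=locking))  # a^b
print(decode_shifted(msg, default, extension, single=single))    # A~B
print(decode_shifted(msg, default, extension, locking, single))  # a~b
```

For a concatenated message, the same selection would be made separately for each segment, since the shift applies to the current segment only.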

6.2.1.2.4 National Language Identifier

A National Language Single Shift IE and a National Language Locking Shift IE can be included in the TP User Data Header, as defined in TS 23.040. The receiving entity shall decode using single shift or locking shift as applicable for the language indicated in the National Language Identifier within these IEs.

The National Language Identifier octet is encoded as shown in Table 6.2.1.2.4.1.

Table 6.2.1.2.4.1

Language code b7...b0	Language	National Language Single Shift Table	National Language Locking Shift Table
00000000	Reserved	n/a	n/a
00000001	Turkish	Subclause A.2.1	Subclause A.3.1
00000010	Spanish	Subclause A.2.2	Not defined - fallback to GSM 7 bit default alphabet (see subclause 6.2.1)
00000011	Portuguese	Subclause A.2.3	Subclause A.3.3
00000100	Bengali	Subclause A.2.4	Subclause A.3.4
00000101	Gujarati	Subclause A.2.5	Subclause A.3.5
00000110	Hindi	Subclause A.2.6	Subclause A.3.6
00000111	Kannada	Subclause A.2.7	Subclause A.3.7
00001000	Malayalam	Subclause A.2.8	Subclause A.3.8
00001001	Oriya	Subclause A.2.9	Subclause A.3.9
00001010	Punjabi	Subclause A.2.10	Subclause A.3.10
00001011	Tamil	Subclause A.2.11	Subclause A.3.11
00001100	Telugu	Subclause A.2.12	Subclause A.3.12
00001101	Urdu	Subclause A.2.13	Subclause A.3.13
00001110 to 11111111	Reserved	n/a	n/a
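Table 6.2.1.2.4.1 can be held as a simple lookup structure. In this sketch the names `NATIONAL_LANGUAGES` and `lookup_language` are hypothetical. A missing locking shift table is represented as `None`, and reserved codes return `None`.

```python
# Language code -> (language, single shift subclause, locking shift subclause).
# None for the locking shift entry means "not defined": fall back to the
# GSM 7 bit default alphabet.
NATIONAL_LANGUAGES = {
    0x01: ("Turkish", "A.2.1", "A.3.1"),
    0x02: ("Spanish", "A.2.2", None),
    0x03: ("Portuguese", "A.2.3", "A.3.3"),
    0x04: ("Bengali", "A.2.4", "A.3.4"),
    0x05: ("Gujarati", "A.2.5", "A.3.5"),
    0x06: ("Hindi", "A.2.6", "A.3.6"),
    0x07: ("Kannada", "A.2.7", "A.3.7"),
    0x08: ("Malayalam", "A.2.8", "A.3.8"),
    0x09: ("Oriya", "A.2.9", "A.3.9"),
    0x0A: ("Punjabi", "A.2.10", "A.3.10"),
    0x0B: ("Tamil", "A.2.11", "A.3.11"),
    0x0C: ("Telugu", "A.2.12", "A.3.12"),
    0x0D: ("Urdu", "A.2.13", "A.3.13"),
}


def lookup_language(code):
    """Return the table entry for a National Language Identifier octet,
    or None if the value is reserved (0x00 and 0x0E to 0xFF)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("National Language Identifier is a single octet")
    return NATIONAL_LANGUAGES.get(code)


print(lookup_language(0x01))  # ('Turkish', 'A.2.1', 'A.3.1')
print(lookup_language(0x0E))  # None
```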

6.2.1.2.5 Processing of national language characters

When supporting a specific national language, the sending entity shall support the encoding of messages using the corresponding National Language Identifier defined in subclause 6.2.1.2.4.

The receiving entity should be able to decode messages using the National Language Identifiers defined in subclause 6.2.1.2.4 for the languages that are supported by that entity.

If a message is received, containing a National Language Identifier indicating a reserved value or a value that is not supported by the receiving entity, the receiving entity shall ignore the IE (see TS 23.040) in which the National Language Identifier was indicated.

The receiving entity shall be capable of processing both single shift and locking shift within the same message.

It is an implementation option for the sending entity whether to use the single shift mechanism, the locking shift mechanism or both.
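The receiver-side rule for reserved or unsupported identifiers might look like the sketch below. It assumes the User Data Header has already been parsed into `(IEI, data)` pairs. The IEI values 0x24 (single shift) and 0x25 (locking shift) are not stated in this document; they are quoted from TS 23.040 and should be checked against it. The function name `shift_settings` and the `supported` parameter are hypothetical.

```python
# Information element identifiers, assumed from TS 23.040.
IEI_SINGLE_SHIFT = 0x24
IEI_LOCKING_SHIFT = 0x25


def shift_settings(ies, supported):
    """Return (single_shift_code, locking_shift_code) for a message.

    ies:       iterable of (iei, data) pairs from the User Data Header
    supported: set of language codes this receiver can decode
    An IE with a reserved or unsupported code is ignored, so the
    corresponding result stays None and the default tables apply.
    """
    single = locking = None
    for iei, data in ies:
        if iei not in (IEI_SINGLE_SHIFT, IEI_LOCKING_SHIFT):
            continue          # not a national language IE
        if len(data) != 1:
            continue          # malformed IE; ignoring it is an assumption
        code = data[0]
        if code not in supported:
            continue          # reserved or unsupported: ignore the IE
        if iei == IEI_SINGLE_SHIFT:
            single = code
        else:
            locking = code
    return single, locking


# Receiver supporting Turkish (0x01) and Portuguese (0x03).
header = [(IEI_SINGLE_SHIFT, b"\x01"), (IEI_LOCKING_SHIFT, b"\x0e")]
print(shift_settings(header, {0x01, 0x03}))  # (1, None)
```

Because reserved values are never placed in `supported`, a single membership test covers both the reserved and the unsupported case.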

6.2.2 8 bit data

8 bit data is user defined.

Padding: CR in the case of an 8 bit character set; otherwise user defined.

Character table: User specific.

6.2.3 UCS2

Bits per character: 16

CBS/USSD pad character: CR

Character table: ISO/IEC 10646 [10]
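As a minimal sketch, text can be converted to 16 bit UCS2 code units with Python's UTF-16 codec, which is identical to UCS2 for characters in the Basic Multilingual Plane. The function name `encode_ucs2` is hypothetical. The big-endian octet order and the rejection of characters above U+FFFF are assumptions of this sketch, not statements from this subclause.

```python
def encode_ucs2(text):
    """Encode text as UCS2, two octets per character, most significant
    octet first (assumed byte order)."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            # Outside the Basic Multilingual Plane: no single 16 bit code.
            raise ValueError(f"U+{ord(ch):04X} cannot be represented in UCS2")
    return text.encode("utf-16-be")


print(encode_ucs2("A€").hex())  # 004120ac
```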