'\"
'\" Copyright (c) 1998 by Scriptics Corporation.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
.so man.macros
.TH encoding n "8.1" Tcl "Tcl Built-In Commands"
.BS
.SH NAME
encoding \- Manipulate encodings
.SH SYNOPSIS
\fBencoding \fIoption\fR ?\fIarg arg ...\fR?
.BE
.SH INTRODUCTION
.PP
Strings in Tcl are logically sequences of 16-bit Unicode characters.
Operating system interfaces and applications may produce strings in
other encodings such as Shift-JIS. The \fBencoding\fR command
bridges the gap between Unicode and these other formats.
.SH DESCRIPTION
.PP
Performs one of several encoding-related operations, depending on
\fIoption\fR. The legal \fIoption\fRs are:
.TP
\fBencoding convertfrom\fR ?\fIencoding\fR? \fIdata\fR
Convert \fIdata\fR to Unicode from the specified \fIencoding\fR. The
characters in \fIdata\fR are treated as binary data: the low 8 bits
of each character are taken as a single byte. The resulting
sequence of bytes is treated as a string in the specified
\fIencoding\fR. If \fIencoding\fR is not specified, the current
system encoding is used.
.TP
\fBencoding convertto\fR ?\fIencoding\fR? \fIstring\fR
Convert \fIstring\fR from Unicode to the specified \fIencoding\fR.
The result is a sequence of bytes that represents the converted
string. Each byte is stored in the low 8 bits of a Unicode
character. If \fIencoding\fR is not specified, the current
system encoding is used.
.TP
\fBencoding dirs\fR ?\fIdirectoryList\fR?
.VS 8.5
Tcl can load encoding data files from the file system that describe
additional encodings for it to work with. This command sets the search
path for \fB*.enc\fR encoding data files to the list of directories
\fIdirectoryList\fR. If \fIdirectoryList\fR is omitted, the
command returns the current list of directories that make up the
search path. It is an error for \fIdirectoryList\fR not to be a valid
list. If an element of \fIdirectoryList\fR does not refer to a
readable, searchable directory at the time an encoding data file is
searched for, that element is ignored.
.VE 8.5
.TP
\fBencoding names\fR
Returns a list containing the names of all of the encodings that are
currently available.
.TP
\fBencoding system\fR ?\fIencoding\fR?
Set the system encoding to \fIencoding\fR. If \fIencoding\fR is
omitted, the command returns the current system encoding. The
system encoding is used whenever Tcl passes strings to system calls.
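.PP
The two conversion subcommands undo one another for characters that
the target encoding can represent. The following is a minimal sketch
rather than a definitive recipe. It assumes that the \fBeuc-jp\fR and
\fButf-8\fR encodings are available in the running interpreter, and the
variable names are illustrative only:
.CS
# U+306F is HIRAGANA LETTER HA
set ha "\eu306F"

# Unicode string -> euc-jp bytes, one byte per character of the result
set bytes [\fBencoding convertto\fR euc-jp $ha]
binary scan $bytes H* hex
puts $hex

# The same character takes three bytes in utf-8
puts [string length [\fBencoding convertto\fR utf-8 $ha]]

# euc-jp bytes -> Unicode string, which compares equal to the original
set back [\fBencoding convertfrom\fR euc-jp $bytes]
puts [string equal $back $ha]
.CE
The first \fBputs\fR prints the hexadecimal form of the two bytes that
euc-jp uses for this character, \fBa4cf\fR.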
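.PP
The \fBnames\fR, \fBsystem\fR and \fBdirs\fR subcommands, called
without a value to set, are useful for inspecting the interpreter
before relying on a particular encoding. The following sketch checks
that an encoding is known and reports the current configuration. The
variable \fBwanted\fR is an illustrative name, and \fBencoding dirs\fR
is assumed to be present (Tcl 8.5 or later):
.CS
set wanted euc-jp
if {[lsearch -exact [\fBencoding names\fR] $wanted] < 0} {
    puts "encoding $wanted is not available"
}
puts "system encoding: [\fBencoding system\fR]"
foreach dir [\fBencoding dirs\fR] {
    puts "searching: $dir"
}
.CE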
.SH EXAMPLE
.PP
It is common practice to write script files using a text editor that
produces output in the euc-jp encoding, which represents ASCII
characters as single bytes and Japanese characters as two bytes. This
makes it easy to embed literal strings that correspond to non-ASCII
characters by simply typing the strings in place in the script.
However, because the \fBsource\fR command always reads files using the
current system encoding, Tcl will only source such files correctly
when the system encoding matches the encoding used to write the file.
This tends not to be true in an internationalized setting. For example,
if such a file was sourced in North America (where the ISO8859-1
encoding is normally used), each byte in the file would be treated as a
separate character that maps to the 00 page in Unicode. The resulting
Tcl strings will not contain the expected Japanese characters. Instead,
they will contain a sequence of Latin-1 characters that correspond to
the bytes of the original string. The \fBencoding\fR command can be
used to convert this string to the expected Japanese Unicode
characters. For example,
.CS
set s [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"]
.CE
would return the Unicode string
.QW "\eu306F" ,
which is the Hiragana letter HA.
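.PP
When the euc-jp bytes have already been misread as ISO8859-1, as in
the scenario above, the original bytes can be recovered first, because
ISO8859-1 maps every byte to the Unicode character with the same code.
The following is a sketch of that repair, assuming both encodings are
available. The variable \fBmisread\fR is an illustrative name:
.CS
# What an ISO8859-1 reader produces from the two euc-jp bytes A4 CF
set misread "\eu00A4\eu00CF"

# Undo the wrong decoding, then apply the right one
set bytes [\fBencoding convertto\fR iso8859-1 $misread]
set s [\fBencoding convertfrom\fR euc-jp $bytes]
puts [string equal $s "\eu306F"]
.CE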
.SH "SEE ALSO"
Tcl_GetEncoding(3)
.SH KEYWORDS
encoding