NLS Architecture
Web Server/400 implements National Language Support (NLS) with the services
provided by OS/400, the AS/400 operating system. This means that Web
Server/400 has some of the same NLS strengths and weaknesses as OS/400.
This section is especially useful for non-U.S. installations. The
World Wide Web and the Internet are becoming more internationally aware, but
some inconveniences still exist for non-U.S. Web sites. This section provides
assistance in configuring a successful Web site for all national languages.
NLS Definitions
The following terms, which are used to discuss National Language Support (NLS),
may be unfamiliar to you:
- CCSID
  A number which identifies an encoding scheme (see ESID), and one or more
  code pages. This is all that is needed to correctly interpret text data.
- Character set
  A defined set of characters. No mapping between characters and values is
  assumed.
- Code page
  Specifies a mapping between values and characters for one or more character
  sets.
- Double-byte code page
  A code page in which each character is represented by two bytes.
- ESID
  Encoding scheme identifier. An encoding scheme specifies the way that text
  data is interpreted. It includes information such as whether the text is
  ASCII or EBCDIC.
- ISO character set
  The ISO mechanism for cataloging ways of interpreting text data. This has a
  correspondence to CCSID.
- Single-byte code page
  A code page in which each character is represented by one byte.
Serving Content
When reading a content file, the server must know the CCSID in which the file
is stored. It takes this from the File CCSID configuration value.
In addition to any valid CCSID, the File CCSID may be set to zero or to
NoConvert. Zero tells Web Server/400 to use the file's associated
code page as a CCSID. NoConvert tells Web Server/400 to serve the
file in binary mode; no conversion is performed.
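The three cases for the File CCSID value can be sketched as a small lookup. This is an illustrative sketch only, not Web Server/400 code: the function name, the string form of the setting, and the use of None to signal binary mode are all assumptions made for the example.

```python
from typing import Optional

NO_CONVERT = "NoConvert"  # assumed spelling of the setting, taken from the text


def resolve_source_ccsid(file_ccsid_setting: str, file_code_page: int) -> Optional[int]:
    """Return the CCSID to read a content file with, or None for binary mode.

    Hypothetical helper: file_ccsid_setting is the configured File CCSID value
    and file_code_page is the code page associated with the file itself.
    """
    if file_ccsid_setting == NO_CONVERT:
        return None  # binary mode: serve the bytes unchanged
    value = int(file_ccsid_setting)
    if value == 0:
        return file_code_page  # zero: use the file's associated code page
    return value  # any other value: an explicitly configured CCSID


print(resolve_source_ccsid("NoConvert", 37))
print(resolve_source_ccsid("0", 37))
print(resolve_source_ccsid("819", 37))
```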
Web Server/400 converts content files as it reads them. Depending on the
type of content file, it may first convert it to the server job CCSID, then
to the content CCSID, or it may convert the file directly to the content
CCSID. See NLS flow for more information.
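The difference between the direct path and the path through the server job CCSID can be illustrated with Python's built-in codecs. In this sketch the codec names cp037, cp500, and latin-1 are assumed stand-ins for a file CCSID, a server job CCSID, and a content CCSID; the function name is hypothetical, and the real conversions are performed by OS/400 routines, not by Python.

```python
from typing import Sequence


def convert_through(data: bytes, hops: Sequence[str]) -> bytes:
    """Convert data along a chain of encodings, one hop at a time."""
    for source, target in zip(hops, hops[1:]):
        # Each hop reinterprets the bytes in the source encoding
        # and re-encodes them in the target encoding.
        data = data.decode(source).encode(target)
    return data


stored = "Hello".encode("cp037")  # bytes as they might sit in an EBCDIC file

# Direct path: file encoding straight to content encoding.
direct = convert_through(stored, ["cp037", "latin-1"])

# Two-step path: file encoding, then job encoding, then content encoding.
via_job = convert_through(stored, ["cp037", "cp500", "latin-1"])

print(direct, via_job)
```

For text that every encoding in the chain can represent, both paths give the same bytes. They can differ when an intermediate encoding lacks a character that the final one has.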
Serving Webulator Screens
To serve double-byte screens, change the
Terminal Size parameter to DBCS.
No changes are needed for non-double-byte screens.
Reading Configuration Files
When reading a configuration file, Web Server/400 needs to know what CCSID
the file is in so that it can be converted correctly. In most cases, the
file's associated code page is used. Additionally, the CCSID for the
Directory based configuration file can be
explicitly entered. This makes it easier to enter mixed-byte
data in the directory based configuration file. The root file system does not
allow the specification of mixed-byte CCSIDs unless they are in QSYS.
Conversion Methods
OS/400 conversion routines are used to convert data. When converting data,
Web Server/400 will keep trying different methods until one succeeds or all
have been exhausted. Web Server/400 tries the following methods, in order: best
fit, enforced subset, round-trip. The three methods differ in their
purpose and in how they handle mismatched characters (characters that do not
convert correctly).
Best fit conversion attempts to find a close alternative for a mismatched
character. For example, when converting the letter o with an umlaut
(ö) to a CCSID that does not contain this character, the result might be
an o without an umlaut.
Enforced subset conversion deals with mismatched characters by replacing them
with a single substitution character. The substitution character depends on
the encoding scheme of the destination CCSID.
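The contrast between best fit and enforced subset can be approximated in Python. This is a sketch under stated assumptions, not the OS/400 algorithm: best fit is imitated by stripping accents from characters the target encoding lacks, and Python's "?" replacement stands in for the substitution character, which on OS/400 depends on the destination encoding scheme. Both function names are hypothetical.

```python
import unicodedata


def best_fit(text: str, encoding: str) -> bytes:
    """Encode text, replacing unencodable characters with a close alternative."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            # Decompose the character and drop combining marks (accents),
            # which leaves the base letter where one exists.
            base = "".join(
                c for c in unicodedata.normalize("NFD", ch)
                if not unicodedata.combining(c)
            )
            out += base.encode(encoding, errors="replace")
    return bytes(out)


def enforced_subset(text: str, encoding: str) -> bytes:
    """Encode text, replacing every unencodable character with one substitute."""
    return text.encode(encoding, errors="replace")


print(best_fit("Köln", "ascii"))         # b'Koln'
print(enforced_subset("Köln", "ascii"))  # b'K?ln'
```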
Round-trip conversion is meant to allow conversion from one CCSID to a
second, and then back to the first CCSID without a loss of information.
Because all Web Server/400 conversions are one-way, this method is the least
useful and is tried only as a last resort.
If a conversion from a single-byte CCSID to a mixed-byte CCSID fails, Web
Server/400 will also attempt a conversion from the single-byte CCSID to the
single-byte code page of the mixed-byte CCSID.
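The "try each method until one succeeds" behavior described in this section can be sketched as an ordered fallback loop. The converters here are placeholder functions, because the OS/400 conversion routines are not available from Python; the names ConversionError, convert_with_fallback, and the two demo converters are hypothetical.

```python
from typing import Callable, Iterable, Tuple

Converter = Callable[[bytes], bytes]


class ConversionError(Exception):
    """Raised by a stand-in converter when it cannot convert the data."""


def convert_with_fallback(
    data: bytes, methods: Iterable[Tuple[str, Converter]]
) -> Tuple[str, bytes]:
    """Try each conversion method in order; return the first that succeeds."""
    for name, convert in methods:
        try:
            return name, convert(data)
        except ConversionError:
            continue  # this method failed, move on to the next one
    raise ConversionError("all conversion methods have been exhausted")


# Placeholder converters, used only to exercise the loop.
def unsupported(data: bytes) -> bytes:
    raise ConversionError("no conversion available for this CCSID pair")


def passthrough(data: bytes) -> bytes:
    return data


methods = [
    ("best fit", unsupported),
    ("enforced subset", passthrough),
    ("round-trip", passthrough),
]
used, converted = convert_with_fallback(b"abc", methods)
print(used, converted)
```

The extra attempt for single-byte to mixed-byte conversions could be modeled as one more entry appended to the list.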
CCSIDs and ISO character sets
While OS/400 uses CCSIDs to identify the way text data is encoded, the
World Wide Web uses ISO character sets for the same purpose. The following
table shows some useful ISO character sets and their associated CCSIDs:
ISO character set    ASCII CCSID
-----------------    -----------
US-ASCII             367
ISO-8859-1           819
ISO-8859-2           912
ISO-8859-5           915
ISO-8859-7           813
ISO-8859-8           916
ISO-8859-9           920
ISO-2022-JP          5052
Note that ISO-8859-1 (CCSID 819) is the default character set for HTTP and is
the default value for the Content CCSID.
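The table can be expressed as a lookup in both directions. The sketch below uses only the values listed in the table; the helper names are hypothetical, and building a Content-Type value from the result is an assumption about how such a mapping might be used, not a description of what Web Server/400 does internally.

```python
from typing import Optional

# Values copied from the table in this section.
CHARSET_TO_CCSID = {
    "US-ASCII": 367,
    "ISO-8859-1": 819,
    "ISO-8859-2": 912,
    "ISO-8859-5": 915,
    "ISO-8859-7": 813,
    "ISO-8859-8": 916,
    "ISO-8859-9": 920,
    "ISO-2022-JP": 5052,
}
CCSID_TO_CHARSET = {ccsid: name for name, ccsid in CHARSET_TO_CCSID.items()}

DEFAULT_CONTENT_CCSID = 819  # ISO-8859-1, the default named in the text


def charset_for_ccsid(ccsid: int = DEFAULT_CONTENT_CCSID) -> Optional[str]:
    """Return the ISO character set name for a CCSID, or None if not listed."""
    return CCSID_TO_CHARSET.get(ccsid)


def content_type(media_type: str, ccsid: int = DEFAULT_CONTENT_CCSID) -> str:
    """Build a Content-Type value, adding charset only for listed CCSIDs."""
    charset = charset_for_ccsid(ccsid)
    return media_type if charset is None else f"{media_type}; charset={charset}"


print(content_type("text/html"))
print(content_type("text/html", 912))
```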
Limitations
- URLs
  URLs must be single-byte except for the query
  string, which can be mixed-byte. This is a limitation of the HTTP
  specification.
- CCSIDs
  CCSIDs with an encoding scheme (ESID) of 4403 and 5404 are not currently
  supported.
- File names
  All file names must be single byte. This includes configuration file names
  and content file names.
Related Documentation
The following IBM manuals contain information that may be useful to you:
Character Data Representation Architecture Level 1 SC09-1390-00
Character Data Representation Architecture Level 2 SC09-1390-01
AS/400 International Application Development SC41-3603-00
AS/400 National Language Support Planning Guide Version 2 GC41-9877-02