NLS Architecture

Web Server/400 implements National Language Support (NLS) with the services provided by OS/400, the AS/400 operating system. This means that Web Server/400 has some of the same NLS strengths and weaknesses as OS/400.

This section is especially useful for non-U.S. installations. The World Wide Web and the Internet are becoming more internationally aware, but some inconveniences remain for non-U.S. Web sites. This section helps you configure a successful Web site in any national language.

NLS Definitions

The following terms, which are used to discuss National Language Support (NLS), may be unfamiliar to you:
CCSID
Coded character set identifier. A number that identifies an encoding scheme (see ESID) and one or more code pages. This is all that is needed to correctly interpret text data.
Character set
A defined set of characters. No mapping between characters and values is assumed.
Code page
Specifies a mapping between values and characters for one or more character sets.
Double-byte code page
A code page in which each character is represented by two bytes.
ESID
Encoding scheme identifier. An encoding scheme specifies the way that text data is interpreted. It includes information such as whether the text is ASCII or EBCDIC.
ISO character set
The ISO mechanism for cataloging ways of interpreting text data. This has a correspondence to CCSID.
Single-byte code page
A code page in which each character is represented by one byte.

Serving Content

When reading a content file, the server must know the CCSID in which the file is stored. It determines this from the File CCSID configuration value. In addition to any valid CCSID, File CCSID may be set to zero or to NoConvert. Zero tells Web Server/400 to use the file's associated code page as the CCSID. NoConvert tells Web Server/400 to serve the file in binary mode, with no conversion performed.
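The three kinds of File CCSID value can be illustrated with a minimal sketch. The function name, the argument names, and the use of None to mean "binary mode" are assumptions made for this example; they are not part of Web Server/400.

```python
NO_CONVERT = "NoConvert"  # special File CCSID value described above


def resolve_source_ccsid(file_ccsid_setting, file_code_page):
    """Return the CCSID to assume for a content file, or None for binary mode.

    file_ccsid_setting -- the File CCSID configuration value (hypothetical name)
    file_code_page     -- the code page associated with the file itself
    """
    if file_ccsid_setting == NO_CONVERT:
        return None  # serve the bytes unchanged, no conversion
    ccsid = int(file_ccsid_setting)
    if ccsid == 0:
        return file_code_page  # zero: fall back to the file's own code page
    return ccsid  # any other value is taken as an explicit CCSID
```

For example, a setting of 0 on a file tagged with code page 37 resolves to 37, while a setting of 819 overrides the file's tag.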

Web Server/400 converts content files as it reads them. Depending on the type of content file, the server either converts the file first to the server job CCSID and then to the content CCSID, or converts it directly to the content CCSID. See NLS flow for more information.
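The direct and two-stage paths can be sketched as follows. This is an illustration only: it uses Python's built-in codecs as stand-ins for the OS/400 conversion routines, with cp037 and cp500 assumed to represent EBCDIC CCSIDs 37 and 500 and latin_1 assumed to represent CCSID 819. The function and parameter names are hypothetical.

```python
def convert_content(data, file_codec, content_codec, job_codec=None):
    """Convert stored bytes to the content encoding.

    If job_codec is given, the data passes through the server job encoding
    first (two-stage path); otherwise it is converted directly.
    """
    text = data.decode(file_codec)
    if job_codec is not None:
        # Stage 1: file encoding -> server job encoding
        job_bytes = text.encode(job_codec)
        text = job_bytes.decode(job_codec)
    # Final stage: -> content encoding sent to the browser
    return text.encode(content_codec)


stored = "Grüße".encode("cp037")  # file stored in EBCDIC
direct = convert_content(stored, "cp037", "latin_1")
staged = convert_content(stored, "cp037", "latin_1", job_codec="cp500")
```

When every character exists in all of the encodings involved, both paths produce the same bytes. They can differ only when the intermediate encoding lacks a character.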

Serving Webulator Screens

To serve double-byte screens, change the Terminal Size parameter to DBCS. No changes are needed for non-double-byte screens.

Reading Configuration Files

When reading a configuration file, Web Server/400 needs to know the file's CCSID so that the file can be converted correctly. In most cases, the file's associated code page is used. In addition, the CCSID for the directory-based configuration file can be entered explicitly, which makes it easier to enter mixed-byte data in that file. The root file system does not allow the specification of mixed-byte CCSIDs unless they are in QSYS.

Conversion Methods

OS/400 conversion routines are used to convert data. When converting data, Web Server/400 keeps trying different methods until one succeeds or all have been exhausted. It tries the following methods, in order: best fit, enforced subset, round-trip. The three methods differ in their purpose and in how they handle mismatched characters (characters that do not convert correctly).

Best fit conversion attempts to find a close alternative for a mismatched character. For example, when the letter o with an umlaut (ö) is converted to a CCSID that does not contain this character, it might be converted to an o without an umlaut.
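The idea behind best fit can be approximated in a few lines. This sketch is not the OS/400 routine. It assumes that "close alternative" means the base letter left after accents are removed, and it uses Python's unicodedata module and built-in codecs to show the effect on a single-byte target.

```python
import unicodedata


def best_fit_encode(text, codec):
    """Encode text, replacing unmappable accented letters by their base letter.

    Works character by character, so it is only suitable for single-byte
    target codecs. Raises UnicodeEncodeError if no close alternative exists.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            # Decompose (for example, o-umlaut becomes o plus a combining
            # diaeresis) and drop the combining marks.
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            out += base.encode(codec)
    return bytes(out)


print(best_fit_encode("schön", "ascii"))  # the umlaut is dropped
```

A character that the target already contains is passed through unchanged, so converting the same text to ISO-8859-1 keeps the ö.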

Enforced subset conversion deals with mismatched characters by replacing them with a single substitution character. The substitution character depends on the encoding scheme of the destination CCSID.
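A rough sketch of substitution follows. It is an approximation built on Python codecs, not the OS/400 routine. The substitution bytes are assumptions for the example: the ASCII SUB control (hex 1A) for ASCII-based targets and the EBCDIC SUB control (hex 3F) for EBCDIC targets. The function name and the scheme labels are hypothetical.

```python
# Assumed substitution byte per encoding scheme family (illustrative only).
SUBSTITUTE = {
    "ascii": b"\x1a",   # SUB control in ASCII-based code pages
    "ebcdic": b"\x3f",  # SUB control in EBCDIC code pages
}


def enforced_subset_encode(text, codec, scheme):
    """Encode text, replacing every unmappable character with one SUB byte.

    Works character by character, so it is only suitable for single-byte
    target codecs.
    """
    sub = SUBSTITUTE[scheme]
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            out += sub  # mismatched character: substitute, never guess
    return bytes(out)


ascii_result = enforced_subset_encode("a€", "latin_1", "ascii")
ebcdic_result = enforced_subset_encode("a€", "cp037", "ebcdic")
```

Unlike best fit, the result never contains a look-alike character, so the receiver can tell that information was lost.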

Round-trip conversion is meant to allow conversion from one CCSID to a second, and then back to the first CCSID, without a loss of information. Because all Web Server/400 conversions are one-way, this method is the least useful and is tried only as a last resort.

If a conversion from a single-byte CCSID to a mixed-byte CCSID fails, Web Server/400 also attempts a conversion from the single-byte CCSID to the single-byte code page of the mixed-byte CCSID.
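The ordering of the three methods described in this section can be sketched as a simple fallback loop. Everything here is hypothetical scaffolding: the exception class, the function name, and the idea that a method signals failure by raising are assumptions for the example. The real conversions are performed by OS/400 routines.

```python
class ConversionFailed(Exception):
    """Raised by a method that cannot perform the requested conversion."""


# Order in which the methods are tried, as listed in this section.
METHOD_ORDER = ("best fit", "enforced subset", "round-trip")


def convert_with_fallback(data, methods):
    """Try each available method in METHOD_ORDER.

    methods maps a method name to a callable that returns converted data
    or raises ConversionFailed. Returns (method_name, converted_data).
    """
    for name in METHOD_ORDER:
        method = methods.get(name)
        if method is None:
            continue  # this method is not available for the CCSID pair
        try:
            return name, method(data)
        except ConversionFailed:
            continue  # fall through to the next method
    raise ConversionFailed("all conversion methods were exhausted")
```

A caller could apply the same pattern one level up for the mixed-byte case: if the whole chain fails for the mixed-byte target, run it again with the single-byte code page of that CCSID as the target.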

CCSIDs and ISO character sets

OS/400 uses CCSIDs to identify the way text data is encoded, whereas the World Wide Web uses ISO character sets. The following table shows some useful ISO character sets and their associated CCSIDs:
ISO character set   CCSID
-----------------   -----
US-ASCII              367
ISO-8859-1            819
ISO-8859-2            912
ISO-8859-5            915
ISO-8859-7            813
ISO-8859-8            916
ISO-8859-9            920
ISO-2022-JP          5052

Note that ISO-8859-1 (CCSID 819) is the default character set for HTTP and is the default value for the Content CCSID.
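The table lends itself to a simple lookup. The sketch below copies the table into a dictionary and shows how a content CCSID could be turned into the charset parameter of an HTTP Content-Type header. The function name and the decision to raise an error for an unlisted CCSID are assumptions made for the example.

```python
# ISO character set name -> CCSID, copied from the table in this section.
CHARSET_TO_CCSID = {
    "US-ASCII": 367,
    "ISO-8859-1": 819,
    "ISO-8859-2": 912,
    "ISO-8859-5": 915,
    "ISO-8859-7": 813,
    "ISO-8859-8": 916,
    "ISO-8859-9": 920,
    "ISO-2022-JP": 5052,
}
CCSID_TO_CHARSET = {ccsid: name for name, ccsid in CHARSET_TO_CCSID.items()}

DEFAULT_CONTENT_CCSID = 819  # ISO-8859-1, the default Content CCSID


def content_type_header(media_type, content_ccsid=DEFAULT_CONTENT_CCSID):
    """Build a Content-Type value such as 'text/html; charset=ISO-8859-1'."""
    charset = CCSID_TO_CHARSET.get(content_ccsid)
    if charset is None:
        raise ValueError(f"no ISO character set listed for CCSID {content_ccsid}")
    return f"{media_type}; charset={charset}"


print(content_type_header("text/html"))
```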


Restrictions

URLs
URLs must be single-byte except for the query string, which can be mixed-byte. This is a limitation of the HTTP specification.
CCSIDs
CCSIDs with an encoding scheme (ESID) of 4403 or 5404 are not currently supported.
File names
All file names must be single byte. This includes configuration file names and content file names.
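A pre-flight check for the URL and file name rules might look like the sketch below. It assumes that "single-byte" can be approximated as "every character is encodable in one single-byte code page", with ISO-8859-1 as the default. The helper names are hypothetical, and the check works on unescaped text rather than on percent-encoded URLs.

```python
from urllib.parse import urlsplit


def is_single_byte(text, codec="latin_1"):
    """Return True if every character of text fits the single-byte codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def url_path_is_single_byte(url, codec="latin_1"):
    """Check only the path: the query string is allowed to be mixed-byte."""
    return is_single_byte(urlsplit(url).path, codec)


print(url_path_is_single_byte("/docs/index.html?q=日本語"))  # query is ignored
print(is_single_byte("日本語.html"))                          # file name check
```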

Related Documentation

The following IBM manuals contain information that may be useful to you:
Character Data Representation Architecture Level 1         SC09-1390-00
Character Data Representation Architecture Level 2         SC09-1390-01
AS/400 International Application Development               SC41-3603-00
AS/400 National Language Support Planning Guide Version 2  GC41-9877-02