NLS Architecture


Webulator/400 implements National Language Support (NLS) with the services provided by OS/400, the AS/400 operating system. This means that Webulator/400 has some of the same NLS strengths and weaknesses as OS/400.

This section is especially useful for non-U.S. installations. The World Wide Web and the Internet are becoming more internationally aware, but some inconveniences still exist for non-U.S. Web sites. This section provides assistance in configuring a successful Web site for all national languages.

NLS Definitions

The following terms, which are used to discuss National Language Support (NLS), may be unfamiliar to you:
CCSID
A number that identifies an encoding scheme (see ESID) and one or more code pages. This is all that is needed to correctly interpret text data.
Character set
A defined set of characters. No mapping between characters and values is assumed.
Code page
Specifies a mapping between values and characters for one or more character sets.
Double-byte code page
A code page in which each character is represented by two bytes.
ESID
Encoding scheme identifier. An encoding scheme specifies the way that text data is interpreted. It includes information such as whether the text is ASCII or EBCDIC.
ISO character set
The ISO mechanism for cataloging ways of interpreting text data. This has a correspondence to CCSID.
Single-byte code page
A code page in which each character is represented by one byte.

Reading Configuration Files

When reading a configuration file, Webulator/400 needs to know the file's CCSID so that the contents can be converted correctly. In most cases, the code page associated with the file is used. In addition, the CCSID for the session-based configuration file can be entered explicitly. This is supported to make it easier to enter mixed-byte data in the session-based configuration file. The root file system does not allow mixed-byte CCSIDs to be specified unless the files are in QSYS.

Browser Encoding

Most browsers used to interact with the server have a setting that selects the browser's encoding method. The server's content CCSID configuration value should be set to a CCSID that is compatible with the browser's encoding setting. When the server receives text data from the browser, it converts the text from the content (ASCII) CCSID to the server job's (EBCDIC) CCSID; it performs the reverse conversion when returning text data. If the content CCSID does not match the browser's encoding setting, certain characters may appear corrupted. To support multiple encodings on the same AS/400, multiple servers must be configured.
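The two-way conversion described above can be illustrated with a minimal Python sketch. It is not Webulator/400 code: the function names are hypothetical, Python's iso8859_1 codec stands in for content CCSID 819, and the cp037 codec stands in for an assumed EBCDIC job CCSID of 37.

```python
# Assumption: content CCSID 819 (ISO-8859-1) and an EBCDIC job CCSID of 37.
CONTENT_CODEC = "iso8859_1"
JOB_CODEC = "cp037"

def browser_to_job(data: bytes) -> bytes:
    """Convert text received from the browser into the job's EBCDIC encoding."""
    return data.decode(CONTENT_CODEC).encode(JOB_CODEC)

def job_to_browser(data: bytes) -> bytes:
    """Convert EBCDIC text from the job back into the content encoding."""
    return data.decode(JOB_CODEC).encode(CONTENT_CODEC)

# The letter A is 0x41 in ASCII and 0xC1 in EBCDIC.
ebcdic = browser_to_job(b"A")
ascii_again = job_to_browser(ebcdic)

# A mismatch: the browser sends ISO-8859-2 text, but the server
# interprets the bytes as ISO-8859-1, so a different character results.
misread = "ł".encode("iso8859_2").decode(CONTENT_CODEC)
```

In this sketch, `misread` holds a superscript three rather than the Polish letter that was typed, which is the kind of corruption a mismatched content CCSID produces.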

Server CCSID

The CCSID of the server job(s) is controlled by setting the CCSID of the server user profile. Since V3R1, an AS/400 job always has a "real" CCSID, known as the default CCSID. The server converts data between the job's default CCSID and the configured content CCSID when processing browser requests. The default CCSID should be compatible with the content CCSID; otherwise, data may appear corrupted.

Serving Webulator Screens

By default, the server assumes that the Webulator interactive session's CCSID and the virtual device's code page match the server job's CCSID. If this is not the case, the virtual terminal job CCSID and/or virtual terminal device CCSID configuration values may need to be set. If the CCSIDs or code pages do not match, data may appear corrupted.

To serve double-byte screens, change the Terminal Size parameter to DBCS. No changes are needed for non-double-byte screens.

Conversion Methods

OS/400 conversion routines are used to convert data. When converting data, Webulator/400 keeps trying different methods until one succeeds or all have been exhausted. It tries the following methods, in order: best fit, enforced subset, round-trip. The three methods differ in purpose and in how they handle mismatched characters (characters that do not convert correctly).

Best fit conversion attempts to find a close alternative for a mismatched character. For example, if the letter o with an umlaut (ö) is converted to a CCSID that does not contain this character, it might be converted to an o without an umlaut.

Enforced subset conversion deals with mismatched characters by replacing them with a single substitution character. The substitution character depends on the encoding scheme of the destination CCSID.

Round-trip conversion is meant to allow conversion from one CCSID to a second, and then back to the first CCSID, without a loss of information. This method is the least useful for Webulator/400 because all of its conversions are one-way, so it is used only as a last resort.

If a conversion from a single-byte CCSID to a mixed-byte CCSID fails, Webulator/400 also attempts a conversion from the single-byte CCSID to the single-byte code page of the mixed-byte CCSID.
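The try-in-order behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not the OS/400 conversion routines: all function names are hypothetical, best fit is approximated by dropping accents, and enforced subset uses Python's own substitution character (?), which may differ from the one OS/400 selects for the destination encoding scheme. Round-trip conversion is omitted because Python's codecs offer no equivalent. The sketch is intended for single-byte target encodings only.

```python
import unicodedata

def best_fit(text: str, codec: str) -> bytes:
    """Encode text, replacing unmappable accented letters with their base letter."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            # Decompose the character and drop combining marks (ö becomes o).
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            if not base:
                raise ValueError(f"no close alternative for {ch!r}")
            out += base.encode(codec)  # still raises if the base is unmappable
    return bytes(out)

def enforced_subset(text: str, codec: str) -> bytes:
    """Encode text, replacing every unmappable character with a substitute."""
    return text.encode(codec, errors="replace")

# Methods are tried in this order until one succeeds.
METHODS = [("best fit", best_fit), ("enforced subset", enforced_subset)]

def convert(text: str, codec: str):
    """Return (converted bytes, name of the method that succeeded)."""
    for name, method in METHODS:
        try:
            return method(text, codec), name
        except ValueError:  # UnicodeEncodeError is a subclass of ValueError
            continue
    raise ValueError("all conversion methods exhausted")

umlaut_result = convert("ö", "ascii")
kanji_result = convert("日", "ascii")
```

Here `umlaut_result` is produced by the best fit step, while `kanji_result` falls through to enforced subset because no close alternative exists in the target encoding.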

CCSIDs and ISO character sets

OS/400 uses CCSIDs to identify the way text data is encoded, whereas the World Wide Web uses ISO character sets. The following table shows some useful ISO character sets and their associated CCSIDs:
                    ASCII
ISO character set   CCSID
-----------------   -----
US-ASCII              367
ISO-8859-1            819
ISO-8859-2            912
ISO-8859-5            915
ISO-8859-7            813
ISO-8859-8            916
ISO-8859-9            920
ISO-2022-JP          5052

Note that ISO-8859-1 (CCSID 819) is the default character set for HTTP and is the default value for the Content CCSID.
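For scripts that need to translate between the two naming schemes, the table above can be expressed as a simple lookup. The Python sketch below copies the values from the table; the function name and the idea of building a Content-Type header value from a CCSID are illustrative assumptions, not a description of how Webulator/400 works internally.

```python
# ISO character set names and their ASCII CCSIDs, taken from the table above.
CHARSET_TO_CCSID = {
    "US-ASCII": 367,
    "ISO-8859-1": 819,
    "ISO-8859-2": 912,
    "ISO-8859-5": 915,
    "ISO-8859-7": 813,
    "ISO-8859-8": 916,
    "ISO-8859-9": 920,
    "ISO-2022-JP": 5052,
}

# Reverse lookup: CCSID to ISO character set name.
CCSID_TO_CHARSET = {ccsid: name for name, ccsid in CHARSET_TO_CCSID.items()}

DEFAULT_CONTENT_CCSID = 819  # ISO-8859-1, the default Content CCSID

def content_type_for(ccsid: int = DEFAULT_CONTENT_CCSID,
                     media_type: str = "text/html") -> str:
    """Build a Content-Type header value (hypothetical helper) for a CCSID."""
    return f"{media_type}; charset={CCSID_TO_CHARSET[ccsid]}"

default_header = content_type_for()
greek_header = content_type_for(813)
```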

Limitations

URLs
URLs must be single-byte except for the query string, which can be mixed-byte. This is a limitation of the HTTP specification.
CCSIDs
CCSIDs with an encoding scheme (ESID) of 4403 and 5404 are not currently supported.
File names
All file names must be single-byte. This includes configuration file names and content file names.

Related Documentation

The following IBM manuals contain information that may be useful to you:
Character Data Representation Architecture Level 1         SC09-1390-00
Character Data Representation Architecture Level 2         SC09-1390-01
AS/400 International Application Development               SC41-3603-00
AS/400 National Language Support Planning Guide Version 2  GC41-9877-02