NLS Architecture
Webulator/400 implements National Language Support (NLS) with the services
provided by OS/400, the AS/400 operating system. This means that Webulator/400
has some of the same NLS strengths and weaknesses as OS/400.
This section is especially useful for non-U.S. installations. The
World Wide Web and the Internet are becoming more internationally aware, but
some inconveniences still exist for non-U.S. Web sites. This section provides
assistance in configuring a successful Web site for all national languages.
NLS Definitions
The following terms, which are used to discuss National Language Support (NLS),
may be unfamiliar to you:
- CCSID
  A number which identifies an encoding scheme (see ESID), and one or more
  code pages. This is all that is needed to correctly interpret text data.
- Character set
  A defined set of characters. No mapping between characters and values is
  assumed.
- Code page
  Specifies a mapping between values and characters for one or more character
  sets.
- Double-byte code page
  A code page in which each character is represented by two bytes.
- ESID
  Encoding scheme identifier. An encoding scheme specifies the way that text
  data is interpreted. It includes information such as whether the text is
  ASCII or EBCDIC.
- ISO character set
  The ISO mechanism for cataloging ways of interpreting text data. This has a
  correspondence to CCSID.
- Single-byte code page
  A code page in which each character is represented by one byte.
Reading Configuration Files
When reading a configuration file, Webulator/400 needs to know which CCSID
the file is encoded in so that the file can be converted correctly. In most
cases, the code page associated with the file is used. Additionally, the CCSID
for the session-based configuration file can be entered explicitly. This is
supported to make it easier to enter mixed-byte data in the session-based
configuration file. The root file system does not allow mixed-byte CCSIDs to
be specified for files unless the files are in QSYS.
Browser Encoding
Most browsers used to interact with the server have a setting that selects the
browser's encoding method. Set the server's content CCSID configuration value
to a CCSID that is compatible with the browser's encoding setting.
When the server receives text data from the browser, it converts the text from
the content (ASCII) CCSID to the server job's (EBCDIC) CCSID, and performs the
reverse conversion when returning text data. If the content CCSID does not
match the browser's encoding setting, certain characters may appear corrupted.
To support multiple types of encoding on the same AS/400, multiple servers would have to be
configured.
Server CCSID
The CCSID of the server job(s) is controlled by setting the CCSID of the
server user profile. Since V3R1, an AS/400 job always has a "real" CCSID,
known as the default CCSID. The server converts data between the job's
(default) CCSID and the configured content CCSID when processing browser
requests. The default CCSID should be compatible with the content CCSID,
or data may appear corrupted.
Serving Webulator Screens
By default, the server assumes that the Webulator interactive session's CCSID
and the virtual device's code page both match the server job's CCSID. If this
is not the case, the virtual terminal job CCSID and/or virtual terminal device
CCSID configuration values may need to be set. If the CCSIDs and code pages do
not match, data may appear corrupted.
To serve double-byte screens, change the
Terminal Size parameter to DBCS.
No changes are needed for non-double-byte screens.
Conversion Methods
OS/400 conversion routines are used to convert data. When converting data,
Webulator/400 keeps trying different methods until one succeeds or all have
been exhausted. Webulator/400 tries the following methods, in order: best
fit, enforced subset, round-trip. The three methods differ in their purpose
and in how they handle mismatched characters (characters that do not convert
correctly).
Best fit conversion attempts to find a close alternative for a mismatched
character. For example, when converting the letter o with an umlaut above it
(ö) to a CCSID that does not contain this character, the result might be
an o without an umlaut.
Enforced subset conversion deals with mismatched characters by replacing them
with a single substitution character. The substitution character depends on
the encoding scheme of the destination CCSID.
Round-trip conversion is meant to allow conversion from one CCSID to a
second, and then back to the first CCSID, without a loss of information. This
is the least useful method for Webulator/400, because all conversions are
one-way, and so it is used only as a last resort.
If a conversion from a single-byte CCSID to a mixed-byte CCSID fails,
Webulator/400 will also attempt a conversion from the single-byte CCSID to the
single-byte code page of the mixed-byte CCSID.
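The try-in-order behaviour described above can be sketched in Python. This is only an illustration, not the OS/400 conversion routines. It assumes that stripping accents through Unicode decomposition is an acceptable stand-in for best fit and that Python's `errors="replace"` mode is an acceptable stand-in for enforced subset. Round-trip conversion is left out because Python has no direct equivalent. All function names are hypothetical.

```python
import unicodedata


def best_fit(text: str, codec: str) -> bytes:
    """Swap each unmappable character for a close alternative (its base letter)."""
    pieces = []
    for ch in text:
        try:
            pieces.append(ch.encode(codec))
        except UnicodeEncodeError:
            # Decompose (for example "ö" -> "o" + combining diaeresis) and
            # drop the combining marks to get a plain base character.
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            # Raises UnicodeEncodeError if no close alternative exists.
            pieces.append(base.encode(codec))
    return b"".join(pieces)


def enforced_subset(text: str, codec: str) -> bytes:
    """Replace each unmappable character with one substitution character."""
    # Python substitutes "?"; on a real system the substitution character
    # depends on the encoding scheme of the destination.
    return text.encode(codec, errors="replace")


def convert(text: str, codec: str) -> bytes:
    """Try each method in order and return the first result that succeeds."""
    for method in (best_fit, enforced_subset):
        try:
            return method(text, codec)
        except UnicodeEncodeError:
            continue
    raise ValueError(f"no conversion method succeeded for codec {codec!r}")
```

Calling `convert("Köln", "ascii")` is handled by the best fit stand-in, while text with no close alternative in the target encoding, such as Japanese characters converted to ASCII, falls through to the substitution method.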
CCSIDs and ISO character sets
OS/400 uses CCSIDs to identify the way text data is encoded, whereas the
World Wide Web uses ISO character sets for the same purpose. The following
table shows some of the useful ISO character sets and their associated
CCSIDs:
ASCII
ISO character set   CCSID
-----------------   -----
US-ASCII            367
ISO-8859-1          819
ISO-8859-2          912
ISO-8859-5          915
ISO-8859-7          813
ISO-8859-8          916
ISO-8859-9          920
ISO-2022-JP         5052
Note that ISO-8859-1 (CCSID 819) is the default character set for HTTP and is
the default value for the Content CCSID.
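As a rough illustration of how the table can be used, the following Python sketch maps a content CCSID to the character set name a Web server might announce in a Content-Type header. The dictionary values are taken from the table above. The helper name and the header format are assumptions made for this example and do not describe how Webulator/400 builds its responses.

```python
# CCSID -> ISO character set name, taken from the table in this section.
CCSID_TO_CHARSET = {
    367: "US-ASCII",
    819: "ISO-8859-1",
    912: "ISO-8859-2",
    915: "ISO-8859-5",
    813: "ISO-8859-7",
    916: "ISO-8859-8",
    920: "ISO-8859-9",
    5052: "ISO-2022-JP",
}

# The default value of the Content CCSID, as noted in this section.
DEFAULT_CONTENT_CCSID = 819


def content_type_for(ccsid: int = DEFAULT_CONTENT_CCSID) -> str:
    """Build a text/html Content-Type value for a CCSID (hypothetical helper)."""
    try:
        charset = CCSID_TO_CHARSET[ccsid]
    except KeyError:
        raise ValueError(f"no ISO character set listed for CCSID {ccsid}") from None
    return f"text/html; charset={charset}"
```

A CCSID that is not in the table raises an error rather than falling back silently, since announcing the wrong character set would corrupt the text in the browser.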
Limitations
- URLs
  URLs must be single-byte except for the query string, which can be
  mixed-byte. This is a limitation of the HTTP specification.
- CCSIDs
  CCSIDs with an encoding scheme (ESID) of 4403 or 5404 are not currently
  supported.
- File names
  All file names must be single-byte. This includes configuration file names
  and content file names.
Related Documentation
The following IBM manuals contain information that may be useful to you:
- Character Data Representation Architecture Level 1, SC09-1390-00
- Character Data Representation Architecture Level 2, SC09-1390-01
- AS/400 International Application Development, SC41-3603-00
- AS/400 National Language Support Planning Guide Version 2, GC41-9877-02