I18n Proposal A

From CommonJS Spec Wiki

This proposal focuses on i18n for strings in JavaScript code and in HTML files. It may be better described as a module than as a standard.

Rationale

Basic internationalization belongs in the standard library/platform, as it does in Python and on the GNU platform. Internationalization is frequently an afterthought for software developers, so a well-defined and simple API can ensure that applications can be internationalized without major refactoring.

Philosophy

The i18n mechanism should work both in a client/server application and completely offline, without any server-side emulation. This is distinctly different from how i18n works in web frameworks such as Ruby on Rails and Django. Most web frameworks use server-side HTML templates that replace translatable strings before sending the page to the client. This scheme does not work for offline applications. It also relies on HTML markup that is either invalid or unrenderable prior to preprocessing, and it makes the application very dependent on a specific server-side handler and a particular web server configuration.

This document proposes an i18n mechanism that has no server-side dependencies and uses valid HTML5 markup. The mechanism is also meant to be as simple as possible and to be applicable incrementally to an existing application.

Applicability

This specification is applicable to:

  • Marking strings in JavaScript code for translation
  • Marking strings in HTML for translation

It is not applicable at this time to:

  • Locale data, such as the Common Locale Data Repository (CLDR) or POSIX locale definitions
  • Internationalizing CSS properties such as fonts and images

Basic Mechanisms

Method for Marking Strings in JavaScript Code

1. The standard GNU gettext method: _("A translatable string");

 Example:  document.write(_("Translate me!"));

2. The same method within E4X

 Example: var navigationBar = <nav><button>{_("Go Back")}</button><button>{_("Reset")}</button></nav> ;  
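At run time, _ only has to look a string up in the current locale's catalog. The sketch below assumes the translations have already been loaded into a plain object (the hypothetical catalog); like GNU gettext, it returns the original string when no translation exists, so an untranslated application keeps working.

```javascript
// Hypothetical in-memory catalog for one locale (Spanish), keyed by msgid.
// In practice it would be filled from the locale's .po file at run time.
const catalog = {
  "Translate me!": "¡Tradúceme!",
  "Go Back": "Atrás",
};

// Minimal gettext-style lookup: return the translation when one exists,
// otherwise fall back to the original string, as GNU gettext does.
function _(msgid) {
  const hasTranslation =
    Object.prototype.hasOwnProperty.call(catalog, msgid) && catalog[msgid] !== "";
  return hasTranslation ? catalog[msgid] : msgid;
}

console.log(_("Translate me!")); // "¡Tradúceme!"
console.log(_("Reset"));         // "Reset" (not translated yet)
```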


Method for Marking Strings for Translation in HTML

Use the author-defined data-* attributes of HTML5.

Example 1.

  <p data-_="true">To say hello in Spanish, say <span data-_="false">Hola</span></p>

Example 2.

  <button data-_="true" data-_context="research">Compile Articles</button>
  <!-- Somewhere else in the same application -->
  <button data-_="true" data-_context="computers">Compile Code</button>
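A browser-side sketch of applying translations to markup like the examples above. The function name translateMarked and the lookup callback (assumed to behave like pgettext, receiving the data-_context value or null) are hypothetical. It translates only a marked element's own text nodes and visits child elements separately, so the nested data-_="false" span in Example 1 is left alone; the data-_C and data-_I modes are not handled.

```javascript
// DOM node type constants (same values as Node.ELEMENT_NODE / Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper: translate the own text of every element marked with
// data-_="true" (or the long form data-translate="true"), then recurse.
// lookup(msgctxt, msgid) is assumed to behave like pgettext; msgctxt is null
// when the element has no data-_context attribute.
function translateMarked(element, lookup) {
  const marked =
    element.getAttribute("data-_") === "true" ||
    element.getAttribute("data-translate") === "true";
  const msgctxt = element.getAttribute("data-_context");
  for (const child of Array.from(element.childNodes)) {
    if (child.nodeType === TEXT_NODE) {
      const msgid = child.nodeValue.trim();
      if (marked && msgid !== "") {
        const translated = lookup(msgctxt, msgid);
        // Keep the surrounding whitespace; a replacer function avoids "$" patterns.
        child.nodeValue = child.nodeValue.replace(msgid, () => translated);
      }
    } else if (child.nodeType === ELEMENT_NODE) {
      // Child elements decide for themselves via their own data-_ attribute.
      translateMarked(child, lookup);
    }
  }
}
```

In a page it could be run as translateMarked(document.body, lookup) once the catalog has loaded. Because it relies only on getAttribute, childNodes, nodeType, and nodeValue, it can also be exercised outside a browser with plain objects exposing those members.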


Marking strings means that their translations will be fetched from the .po files at run time, and that the xgettext collection script can gather them for translation.

Storing and Retrieving Translations

Translations will be stored in .po files. PO (Portable Object) files are well supported by online translation tools such as Pootle.
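To fetch translations from .po files at run time, the client has to parse them. The following is a deliberately minimal parser sketch (parsePo is a hypothetical name): it handles msgctxt, msgid, msgid_plural, msgstr, msgstr[n], and continuation lines, skips comments, and assumes the file only uses escape sequences that JSON.parse also accepts.

```javascript
// Minimal .po parser sketch. Assumptions: well-formed input, and only escape
// sequences JSON.parse also understands (\" \\ \n \t ...). Plural entries keep
// their msgstr[n] keys as-is. Not a complete implementation of the format.
function parsePo(text) {
  const entries = [];
  let entry = null; // entry being built
  let field = null; // last keyword seen, e.g. "msgid" or "msgstr[1]"

  const flush = () => {
    if (entry !== null && entry.msgid !== undefined) entries.push(entry);
    entry = null;
    field = null;
  };

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") { flush(); continue; } // a blank line ends an entry
    if (line.startsWith("#")) continue;     // comments, flags, references
    const m = line.match(/^(msgctxt|msgid_plural|msgid|msgstr(?:\[\d+\])?)\s+(".*")$/);
    if (m !== null) {
      // msgctxt/msgid after a msgstr starts a new entry even without a blank line.
      if ((m[1] === "msgctxt" || m[1] === "msgid") && field !== null && field.startsWith("msgstr")) {
        flush();
      }
      if (entry === null) entry = {};
      field = m[1];
      entry[field] = JSON.parse(m[2]);
    } else if (line.startsWith('"') && field !== null) {
      entry[field] += JSON.parse(line); // continuation of the previous string
    }
  }
  flush();
  return entries;
}

console.log(parsePo('msgid "Hello world"\nmsgstr "Hola mundo"\n'));
```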

xgettext is the standard tool for extracting translatable strings from an application. CommonJS needs a JavaScript implementation of this tool.
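The core of a JavaScript xgettext is finding marked strings and writing them to a .pot template. This regex-based sketch (extractMessages, poQuote, and toPot are hypothetical names) assumes every message is marked as _("literal") with a single double-quoted literal; a real implementation would use a JavaScript parser so that single quotes, template literals, and E4X are handled correctly.

```javascript
// Hypothetical xgettext-like extractor for JavaScript source text.
// Assumptions: messages are marked as _("literal") with one double-quoted
// string literal that uses only escapes JSON also accepts.
function extractMessages(source, fileName) {
  const pattern = /\b_\(\s*("(?:[^"\\\n]|\\.)*")\s*\)/g;
  const messages = new Map(); // msgid -> ["file:line", ...]
  for (const match of source.matchAll(pattern)) {
    const msgid = JSON.parse(match[1]); // decode \" \\ \n and friends
    const line = source.slice(0, match.index).split("\n").length;
    if (!messages.has(msgid)) messages.set(msgid, []);
    messages.get(msgid).push(`${fileName}:${line}`);
  }
  return messages;
}

// Escape a string for use inside a double-quoted PO string.
function poQuote(s) {
  return '"' + s.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n") + '"';
}

// Render the collected messages as .pot entries with empty translations.
function toPot(messages) {
  const blocks = [];
  for (const [msgid, refs] of messages) {
    blocks.push(`#: ${refs.join(" ")}\nmsgid ${poQuote(msgid)}\nmsgstr ""`);
  }
  return blocks.join("\n\n") + "\n";
}

const src = 'document.write(_("Translate me!"));\nalert(_("Go Back"));\n';
console.log(toPot(extractMessages(src, "app.js")));
```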

Implementation

Here are the methods, attributes, global variable, and helper scripts I would like to see. The design is based primarily on Gettext.js.

Methods

Pretty much entirely copied from jsgettext

  • new Gettext (args)
  • textdomain( domain )
  • gettext( MSGID )
  • dgettext( TEXTDOMAIN, MSGID )
  • dcgettext( TEXTDOMAIN, MSGID, CATEGORY )
  • ngettext( MSGID, MSGID_PLURAL, COUNT )
  • dngettext( TEXTDOMAIN, MSGID, MSGID_PLURAL, COUNT )
  • dcngettext( TEXTDOMAIN, MSGID, MSGID_PLURAL, COUNT, CATEGORY )
  • pgettext( MSGCTXT, MSGID )
  • dpgettext( TEXTDOMAIN, MSGCTXT, MSGID )
  • dcpgettext( TEXTDOMAIN, MSGCTXT, MSGID, CATEGORY )
  • npgettext( MSGCTXT, MSGID, MSGID_PLURAL, COUNT )
  • dnpgettext( TEXTDOMAIN, MSGCTXT, MSGID, MSGID_PLURAL, COUNT )
  • dcnpgettext( TEXTDOMAIN, MSGCTXT, MSGID, MSGID_PLURAL, COUNT, CATEGORY )
  • strargs (string, argument_array)
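Most of these functions differ only in which of domain, context, and plural form they take into account. The sketch below implements four of them over an in-memory Map, assuming keys built the way GNU gettext .mo files build them (msgctxt, the \u0004 separator, then msgid) and the Plural-Forms rule plural=(n != 1). The Catalog class is hypothetical, not the jsgettext API; domains and categories are omitted.

```javascript
// Hypothetical catalog: values are arrays of plural forms (one form when the
// message has no plural). pluralRule maps a count to a form index.
class Catalog {
  constructor(messages, pluralRule = (n) => (n !== 1 ? 1 : 0)) {
    this.messages = messages;
    this.pluralRule = pluralRule;
  }

  // Shared lookup. Untranslated messages fall back the way GNU gettext does:
  // msgid for singular lookups, msgid or msgid_plural depending on n otherwise.
  lookup(msgctxt, msgid, msgidPlural, n) {
    const key = msgctxt === undefined ? msgid : `${msgctxt}\u0004${msgid}`;
    const forms = this.messages.get(key);
    const index = msgidPlural === undefined ? 0 : this.pluralRule(n);
    if (forms && forms[index]) return forms[index];
    if (msgidPlural === undefined) return msgid;
    return n === 1 ? msgid : msgidPlural;
  }

  gettext(msgid) { return this.lookup(undefined, msgid); }
  ngettext(msgid, msgidPlural, n) { return this.lookup(undefined, msgid, msgidPlural, n); }
  pgettext(msgctxt, msgid) { return this.lookup(msgctxt, msgid); }
  npgettext(msgctxt, msgid, msgidPlural, n) { return this.lookup(msgctxt, msgid, msgidPlural, n); }
}

const es = new Catalog(new Map([
  ["Compile", ["Compilar"]],
  ["research\u0004Compile", ["Recopilar"]],
  ["%1 file", ["%1 archivo", "%1 archivos"]],
]));
console.log(es.pgettext("research", "Compile"));    // "Recopilar"
console.log(es.gettext("Compile"));                  // "Compilar"
console.log(es.ngettext("%1 file", "%1 files", 3));  // "%1 archivos"
```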

strargs in particular will have to be modified to handle native numerals such as १ २ ३ ४ ५ (1, 2, 3, 4, 5 in Nepali). Notably, Arabic and Hindi use the same decimal (base-10) system but different characters to represent the digits.
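One way to add native numerals is to substitute digits after formatting. This sketch assumes jsgettext-style %1, %2, ... placeholders and that the caller supplies the code point of the locale's zero digit (0x0966 for Devanagari, 0x0660 for Arabic-Indic); toNativeDigits and strargsNative are hypothetical names. Only numeric arguments are converted, and %% escapes are not handled.

```javascript
// Replace each ASCII digit with the digit at the same offset from `zero`.
// Works for Unicode digit blocks laid out 0..9 consecutively.
function toNativeDigits(text, zero) {
  return text.replace(/[0-9]/g, (d) => String.fromCodePoint(zero + Number(d)));
}

// Hypothetical strargs variant: %1 refers to args[0], %2 to args[1], and so on.
// Numbers are rendered with native digits; strings are inserted unchanged.
function strargsNative(format, args, zero) {
  return format.replace(/%(\d+)/g, (placeholder, index) => {
    const value = args[Number(index) - 1];
    if (value === undefined) return placeholder; // leave unknown placeholders alone
    return typeof value === "number" ? toNativeDigits(String(value), zero) : String(value);
  });
}

const DEVANAGARI_ZERO = 0x0966;
console.log(toNativeDigits("1 2 3 4 5", DEVANAGARI_ZERO));        // "१ २ ३ ४ ५"
console.log(strargsNative("Score : %1", [42], DEVANAGARI_ZERO));  // "Score : ४२"
```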

HTML5 Attributes

For specifying that a string should be translated (HTML lowercases attribute names, so data-_C and data-_I are read as data-_c and data-_i):

  • data-translate="true|false": whether the element's text should be translated
  • data-_="true|false": short form of the above
  • data-_C="true|false": grab all text from this element and all of its children
  • data-_I="true|false": grab the text AND all inline markup (the whole innerHTML), so that translators can decide whether <i> or <strong> are semantically meaningful in their language
  • data-_comments="...": a comment that helps explain the meaning of the text to be translated
  example:   <button data-_="true" data-_comments="File is used as verb">File</button>
  • data-_context: context, to differentiate between different uses of the same word within the same document, particularly when those meanings are translated differently in another language
  example:  <button data-_="true" data-_context="research">Compile</button>   <button data-_="true" data-_context="computers">Compile</button>
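When xgettext collects marked HTML elements, data-_comments and data-_context map naturally onto extracted comments (#.) and msgctxt in the .pot file. This sketch assumes the extractor has already turned each element into an object with msgid, context, and comments fields (that shape and the function names are hypothetical) and shows the two "Compile" buttons becoming separate entries.

```javascript
// Hypothetical shape of one extracted HTML message: { msgid, context, comments },
// where context/comments hold the values of data-_context and data-_comments,
// or are undefined when the attribute is absent.
function quote(s) {
  return '"' + s.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n") + '"';
}

function potEntry({ msgid, context, comments }) {
  const lines = [];
  if (comments !== undefined) lines.push(`#. ${comments}`); // shown to translators
  if (context !== undefined) lines.push(`msgctxt ${quote(context)}`);
  lines.push(`msgid ${quote(msgid)}`, 'msgstr ""');
  return lines.join("\n");
}

// The same msgid with different contexts stays separate; exact repeats are merged.
function toPotFromHtml(items) {
  const seen = new Set();
  const blocks = [];
  for (const item of items) {
    const key = item.context === undefined ? item.msgid : `${item.context}\u0004${item.msgid}`;
    if (seen.has(key)) continue;
    seen.add(key);
    blocks.push(potEntry(item));
  }
  return blocks.join("\n\n") + "\n";
}

console.log(toPotFromHtml([
  { msgid: "File", comments: "File is used as verb" },
  { msgid: "Compile", context: "research" },
  { msgid: "Compile", context: "computers" },
]));
```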

Helper Functions

 xgettext: essentially the same options and behavior as GNU xgettext, but with at least one new switch, --report, to indicate what percentage of the application has been translated into other locales
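The proposed --report switch is not specified further, so the following is only a sketch of the arithmetic it might perform: the percentage of messages in each locale's .po file that have a non-empty translation. It assumes entries shaped like { msgid, msgstr } (msgstr may be an array of plural forms), skips the header entry (msgid ""), and leaves filtering of fuzzy entries to the caller; translatedPercent and report are hypothetical names.

```javascript
// Percentage of messages with every form translated, rounded to an integer.
function translatedPercent(entries) {
  const messages = entries.filter((e) => e.msgid !== ""); // skip the PO header
  if (messages.length === 0) return 100; // nothing to translate counts as complete
  const done = messages.filter((e) => {
    const forms = Array.isArray(e.msgstr) ? e.msgstr : [e.msgstr];
    return forms.length > 0 && forms.every((s) => typeof s === "string" && s !== "");
  }).length;
  return Math.round((100 * done) / messages.length);
}

// One line per locale, e.g. for each locale listed in ALL_LINGUAS.
function report(byLocale) {
  return Object.entries(byLocale)
    .map(([locale, entries]) => `${locale}: ${translatedPercent(entries)}%`)
    .join("\n");
}

console.log(report({
  fr: [
    { msgid: "Hello world", msgstr: "Bonjour le monde" },
    { msgid: "Go Back", msgstr: "" },
  ],
}));
```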

Environment Variables

ALL_LINGUAS = "de en fr"

This variable lists the locales for which the application has translations.
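An application can use the ALL_LINGUAS list to choose which translation to load for a user. This sketch parses the space-separated value and matches it against a preference list such as navigator.languages; the matching order (exact tag, then language only, then a fallback) and the function names are assumptions.

```javascript
// Split an ALL_LINGUAS-style value ("de en fr") into locale names.
function parseLinguas(value) {
  return value.trim().split(/\s+/).filter(Boolean);
}

// Pick the first available locale that matches the user's preferences:
// exact tag first ("de_AT" matches "de-AT"), then the bare language ("fr-CA" -> "fr").
function pickLocale(preferred, available, fallback = "en") {
  const normalized = available.map((l) => l.toLowerCase().replace(/_/g, "-"));
  for (const tag of preferred) {
    const t = tag.toLowerCase().replace(/_/g, "-");
    const exact = normalized.indexOf(t);
    if (exact !== -1) return available[exact];
    const partial = normalized.indexOf(t.split("-")[0]);
    if (partial !== -1) return available[partial];
  }
  return fallback;
}

const linguas = parseLinguas("de en fr");
console.log(pickLocale(["fr-CA", "en-US"], linguas)); // "fr"
```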

Relevant Files and Directories

po/: directory for the .po files that contain translations

  POTFILES.in            files that xgettext should extract strings from
  POTFILES.ignore        files that xgettext should ignore
  application_name.pot   template containing every translatable string, generated by xgettext
  locale_name.po         translation of the application for a given locale
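Assuming POTFILES.in and POTFILES.ignore use the GNU layout of one path per line, with blank lines and # comments allowed, the list of files to scan is the first list minus the second. The function names are hypothetical, and reading the files from disk is left to the caller.

```javascript
// Parse a POTFILES-style list: one path per line, "#" starts a comment line.
function parseFileList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"));
}

// Files listed in POTFILES.in that are not listed in POTFILES.ignore.
function filesToScan(potfilesIn, potfilesIgnore = "") {
  const ignored = new Set(parseFileList(potfilesIgnore));
  return parseFileList(potfilesIn).filter((path) => !ignored.has(path));
}

console.log(filesToScan("# sources\nlib/app.js\nlib/vendor.js\nindex.html\n", "lib/vendor.js\n"));
```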


Test Cases

Sample text (everything should be translated)

  Hello world

Sample text with a placeholder for dynamic data (the generated POT file should contain the tags as well)

  Score : 0

Sample text with context (everything should be translated; additionally, the generated POT file should contain two instances of "Compile Articles" with different msgctxt values)

  Compile Articles
  Compile Articles

Date and time (this should appear in the local format, e.g. শুক্র সেপ্টেম্বর 18 15:10:58 IST 2009, which is Bengali for "Fri September 18 15:10:58 IST 2009")
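In JavaScript, output like this can come from the standard Intl.DateTimeFormat API rather than from translation catalogs. The sketch formats the instant from the example for a Bengali (India) user in Indian Standard Time; the exact wording, ordering, and digits depend on the locale data shipped with the engine, so no output is claimed here.

```javascript
// 2009-09-18 15:10:58 IST is 09:40:58 UTC (IST is UTC+05:30). Months are 0-based.
const instant = new Date(Date.UTC(2009, 8, 18, 9, 40, 58));

// Requires an engine with full ICU locale data (recent Node.js and browsers);
// engines without Bengali data fall back to another locale.
const formatter = new Intl.DateTimeFormat("bn-IN", {
  weekday: "short",
  month: "long",
  day: "numeric",
  year: "numeric",
  hour: "2-digit",
  minute: "2-digit",
  second: "2-digit",
  timeZone: "Asia/Kolkata",
  timeZoneName: "short",
});

console.log(formatter.format(instant));
```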

Numbered list (the numerals should be in native digits)

  1. Hello world1
  2. Hello world2
  3. Hello world3

Right-to-left text: direction and justification (left|right) should be preserved. The Arabic sample below reads roughly "Error in the specified color variables."

       خطأ في مُغيّرات الألوان المحددة.
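Preserving direction usually means setting dir="rtl" on the document or on the element that holds the translated text. This sketch derives the direction from the locale's language subtag using a short, deliberately incomplete list of right-to-left languages; directionFor and RTL_LANGUAGES are hypothetical names.

```javascript
// Illustrative, not exhaustive: Arabic, Persian, Hebrew, Pashto, Urdu, Yiddish.
const RTL_LANGUAGES = new Set(["ar", "fa", "he", "ps", "ur", "yi"]);

// Accepts "ar-EG" or POSIX-style "ar_EG"; only the language subtag matters here.
function directionFor(locale) {
  const language = locale.toLowerCase().split(/[-_]/)[0];
  return RTL_LANGUAGES.has(language) ? "rtl" : "ltr";
}

console.log(directionFor("ar-EG")); // "rtl"
console.log(directionFor("ne_NP")); // "ltr"
// In a browser: document.documentElement.dir = directionFor(locale);
```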
