Recently, just out of curiosity, I Googled "open source translation tool". The results were disappointing. Of the 8 highest-ranking results (the number displayed in my browser window without scrolling), 5 had nothing to do with tools that help people translate software and documentation into different natural languages. Where are all the open source translation tools?

It looks like the translation tools used by most open source translators are emacs po-mode, KBabel and poedit. Strangely, all of these tools are built around one file format: gettext PO files. PO files are pretty simple, as the GNU gettext manual explains, but they’re heavily geared towards translating software UI messages. Earlier attempts at Sun to use PO files as the basis for a decent translation memory system failed: we just ended up hacking the format by adding comments that included special tags, so we’d get PO files looking something like:

# This is a normal comment from the source code author explaining what
# the translation is for (perhaps describing its context)
# @LC_TRANSLATION@ This is a translator's comment, explaining why they translated it this way
# @TM_TOOL@ total-words 5
# @TM_FUZZY-ID@ This is a blue apple
# @TM_FUZZY-STR@ Voici une pomme bleue
msgid "This is a red apple"
msgstr ""

This was completely horrendous: our tools were the only ones that understood the significance of the @FOO@ comments, so people were stuck with our tools if they wanted to take advantage of the work our TM system had done to count the words for translation and suggest fuzzy matches. Not good. The trouble is that gettext doesn’t allow for the sort of rich metadata that translators really need to do their job effectively. I’ve already harped on about translation standards, especially XLIFF, and why it’s the solution to this problem, so I’ll not bore you guys any further.
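To show why those tagged comments tie you to one tool, here is a minimal Python sketch of the sort of parsing a tool would need to do to get the metadata back out. The tag names come from the example above; the regular expression, the function name `parse_tagged_entry` and the shape of the returned dictionary are my own assumptions for illustration, not what our internal tools actually did. It also ignores multi-line strings and escaped quotes.

```python
import re

# Matches comment lines of the form "# @TAG@ value".
# The pattern is an assumption based on the sample entry, not a real spec.
TAG_LINE = re.compile(r"^#\s*@([A-Z_-]+)@\s*(.*)$")


def parse_tagged_entry(text):
    """Split one PO entry into plain comments, @TAG@ metadata, msgid and msgstr."""
    entry = {"comments": [], "tags": {}, "msgid": None, "msgstr": None}
    for raw in text.splitlines():
        line = raw.strip()
        tag = TAG_LINE.match(line)
        if tag:
            # Tool-specific metadata hidden inside a comment.
            entry["tags"][tag.group(1)] = tag.group(2)
        elif line.startswith("#"):
            # Ordinary comment that any PO tool can display.
            entry["comments"].append(line[1:].strip())
        elif line.startswith("msgid "):
            entry["msgid"] = line[len("msgid "):].strip('"')
        elif line.startswith("msgstr "):
            entry["msgstr"] = line[len("msgstr "):].strip('"')
    return entry


SAMPLE = """\
# This is a normal comment
# @TM_TOOL@ total-words 5
# @TM_FUZZY-ID@ This is a blue apple
msgid "This is a red apple"
msgstr ""
"""

entry = parse_tagged_entry(SAMPLE)
print(entry["tags"])
```

Any other PO editor just sees three comments here; only a tool that knows this private convention can recover the word count or the fuzzy match.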

Maybe the lack of interest in XLIFF (at least from the open source community) is down to the absence of a complete, free reference implementation of something that can manipulate XLIFF documents. The Okapi framework is a step in the right direction, in terms of getting more exposure for the format, but the project’s aims state that it’s not trying to be a replacement for any commercial or open-source implementation of a translation tool, so no joy there either. (I think getting a free reference implementation of new technologies is something standards groups should really strive for: cf. NFS v4 or J2EE 1.4.)
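To make the "rich metadata" point a bit more concrete, here is a rough Python sketch that builds the red apple example as an XLIFF-style trans-unit using the standard library's `xml.etree.ElementTree`. The element names (`trans-unit`, `source`, `target`, `alt-trans`, `note`) are the XLIFF 1.x ones as I understand them; the helper name `build_trans_unit` and its parameters are made up for illustration, the enclosing `xliff`/`file`/`body` elements are left out, and nothing here is validated against the schema.

```python
import xml.etree.ElementTree as ET


def build_trans_unit(unit_id, source, fuzzy_source, fuzzy_target, dev_note):
    """Build a simplified XLIFF 1.x style trans-unit (sketch, not schema-validated)."""
    unit = ET.Element("trans-unit", {"id": unit_id})
    ET.SubElement(unit, "source").text = source
    ET.SubElement(unit, "target")  # left empty: not yet translated

    # The fuzzy match gets its own element instead of hiding in a comment.
    alt = ET.SubElement(unit, "alt-trans")
    ET.SubElement(alt, "source").text = fuzzy_source
    ET.SubElement(alt, "target").text = fuzzy_target

    # The developer's comment is labelled with who it came from.
    note = ET.SubElement(unit, "note", {"from": "developer"})
    note.text = dev_note
    return unit


unit = build_trans_unit(
    "1",
    "This is a red apple",
    "This is a blue apple",
    "Voici une pomme bleue",
    "Explains what the translation is for",
)
print(ET.tostring(unit, encoding="unicode"))
```

The point is that any tool which reads the format can find the fuzzy match and the note, without needing to know about anybody's private comment conventions.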

Of the other open source translation tools out there, ForeignDesk, being win32-only, doesn’t seem to be getting much attention, and OmegaT is quite basic feature-wise (though the fact that they’ve got support is way cool!). In general though, there’s really not much out there that can help open source translators when translating documentation.

In some ways, I can understand that the first thing you want to do is produce your application’s interface in another language, to gain as many new users as possible. If the interface is good enough, perhaps your new-found users can live without documentation, but if you really want to give them a good user experience, a translated manual is the next step!

Attentive readers will, at this stage, start pointing fingers, perhaps at previous blog entries I’ve written on how to write a TM system, with a “so what are you going to do about it?” attitude. I wouldn’t blame them. I still think there’s a good chance we’ll be able to help out (if nothing else, having open source translation tools would probably help Project Looking Glass and those things you’ve been hearing about open-sourcing Solaris). Before that though, I’d be interested to hear of any other open source translation tools out there. Does anyone know of good ones I’ve missed?