Here’s my first “on-topic” post of 2005. I’ve been on vacation for the last while, hence the lack of strictly work-related posts. Not that I wasn’t thinking a bit about translation language and tools during the holidays; I just didn’t feel like blogging about it.

A while back, my brother pointed out a project called Rosetta that looks quite interesting. At the same time, Pootle was also announced (I’d talked to those cool guys about it before, and it seems they’ve actually released it now – way to go guys, I like it!)

These tools can roughly be classified as Translation Portals – or GMS (Globalisation Management Systems) if you want to be a bit more of a marketroid. Where the point of translation memory tools is to take the pain out of doing translations, the point of translation portals is to take the pain out of all of the ancillary tasks around translation, which should really be automated. To help you understand what they do, allow me to delve into some history:

Back when I started here at Sun, I was working as a localisation engineer on Solaris 2.6, HotJava, JDK 1.1 and a few other projects. Things were fairly primitive back in those days wrt. our translation process. Every time we got a new bunch of files for translation from the base teams (the people writing the software), I’d diff each message file against its previous version to determine whether or not it was worth sending out to translation. If there were enough changes, we’d count the number of new and changed words in each file and then FTP the files out to the various translation vendors that we had contracted to do the translations for us. At the same time, a project manager would update a spreadsheet with the financial details of each delivery, making sure that we would get billed by the vendors for the appropriate amount of translation (translators tend to charge per word translated).
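To give a flavour of that diff-and-count step, here is a minimal sketch in Python. It is not the script I actually used: it assumes each message file has already been parsed into a dictionary of message id to text, and the function names and the threshold of 50 words are made up for illustration.

```python
def count_words_to_translate(old, new):
    """Return (new_words, changed_words) for two versions of a message file.

    `old` and `new` map message ids to message text. Parsing the real
    file formats into these dicts is assumed to happen elsewhere.
    """
    new_words = 0
    changed_words = 0
    for msg_id, text in new.items():
        words = len(text.split())  # naive whitespace word count
        if msg_id not in old:
            new_words += words      # message did not exist before
        elif old[msg_id] != text:
            changed_words += words  # message text was edited
    return new_words, changed_words


def worth_sending(old, new, threshold=50):
    """Decide whether a file has enough changes to send out (hypothetical rule)."""
    new_words, changed_words = count_words_to_translate(old, new)
    return new_words + changed_words >= threshold


old_version = {"open": "Open file", "close": "Close"}
new_version = {"open": "Open the file", "close": "Close", "save": "Save as new"}
print(count_words_to_translate(old_version, new_version))
```

The word counts are what ends up in the project manager’s spreadsheet, since that is what the vendors bill against.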

Once each set of translations had been completed, I’d get an email from the translation vendor, dig up their FTP details again, connect to their server and download the files, which usually came back in a variety of different archive formats (tar, cpio, zip), directory structures, encodings, etc. I’d then have to check that all of the files were present, that they were encoded correctly, and that they were still valid files that hadn’t been corrupted during translation (I just checked them with msgfmt or gencat). Only at that stage could I actually start worrying about how to build them into packages to be added to the end-product.

Of course, as I did more of this work, I started to put together some ad-hoc scripts to make life a little less painful for me, but I was still far from a fully automated solution. Well, a translation portal is that fully automated solution I was striving for; I just didn’t know it at the time. A translation portal tends to have the following features (though implementations vary):

  1. some sort of workflow management
  2. integration with a TM system
  3. ability to deal with multiple content sources
  4. some sort of statistics gathering & display functionality

We’ve got one running inside Sun that we built a while back: it’s a bit flaky to be honest and I’d certainly not recommend it to my friends (it’s server-side Java from before J2EE came along, which has been retrofitted with EJBs here and there. Fairly crufty code.) That said, it’s still orders of magnitude better than doing everything manually!

Going into a little more detail on each feature:

workflow management: It’s nice to be able to keep track of the files you send out to translation, determine which files are part of which delivery, where they’ve been sent to, what their current translation status is, that sort of thing. Having it plug directly into a traditional source code management system may well have its advantages…

integration with a TM system: Obviously, you would like to automatically reuse translations from the previously translated version of the files you’re about to send out to translation, since it saves you time and money.
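As a rough sketch of what that reuse looks like, the Python below pre-fills any message whose source text already has a translation and leaves only the rest for the translators. It assumes the translation memory is just a dictionary of source text to translated text and only handles exact matches; real TM systems also do fuzzy matching, which is left out here. The function and variable names are invented for the example.

```python
def pretranslate(source_messages, memory):
    """Split messages into those the TM can fill and those still to translate.

    `source_messages` maps message ids to source text; `memory` maps
    previously translated source text to its translation (exact match only).
    """
    translated = {}
    remaining = {}
    for msg_id, text in source_messages.items():
        if text in memory:
            translated[msg_id] = memory[text]  # reuse the earlier translation
        else:
            remaining[msg_id] = text           # needs a human translator
    return translated, remaining


memory = {"Open file": "Ouvrir le fichier", "Close": "Fermer"}
messages = {"open": "Open file", "close": "Close", "save": "Save as new"}
done, todo = pretranslate(messages, memory)
print(done)
print(todo)
```

Only the messages left in `todo` need to go out to the vendor, which is where the saving in time and money comes from.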

multiple content sources: Of course, you’d like to be able to pick up files from a filesystem, but perhaps you’d also like to plug into a content management system of some sort as well, or perhaps just a flat database schema…

statistics: Being able to see at a glance how much translation is going on, and which translator is currently translating which file, is extremely useful. I quite like the statistics pages the GTP guys are able to display.

So that’s basically it – there are lots of commercial implementations of translation portals (or GMSs) but they all tend to have those features. Their key ability is process automation: reducing the number of chores (and thus improving reliability) in a localisation process. And of course, automation is good, isn’t it? Pootle and Rosetta are doing the right thing imho – starting off with a simple interface that allows people to translate po files online, but there’s no reason why they couldn’t start branching out into becoming the first open source Translation Portals. At the moment, GNOME, OpenOffice and Mozilla all have their own separate translation status/workflow management systems (well, CVS with bells on most of the time) but perhaps there’s a case to be made for adopting a common translation management system? You don’t want ours, honest!