Today, we’re integrating the file support that Charles and I worked on for our translation memory system. Hopefully, by this evening, we’ll be able to translate .java resource bundles, .properties files, .msg/.tmsg files and .po files (both Solaris and GNU variants) using the same system we already use to translate XML, HTML, plain text and Solbook (a subset of DocBook SGML).

For kicks, I also had a quick look at the size of our production database, and it’s grown quite a lot over the weekend (that’ll be the StarOffice translations being imported, I guess).

We now have 775,000 English segments and 3.9 million translated segments on the system. To put that in perspective, I took a random book off my shelf, Rendezvous With Rama, and made an approximate count of the segments in it. It works out at roughly 35 sentences per page over 243 pages, or about 8,505 sentences. Counting one sentence as one segment, the English side of the database alone is the equivalent of roughly 90 books of that size.
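For anyone who wants to check the sums, here is a quick back-of-the-envelope sketch. It uses only the figures quoted above and assumes, as I did when counting, that one sentence is roughly one segment; the variable names are just for illustration.

```python
# Figures quoted in this post
english_segments = 775_000
translated_segments = 3_900_000

# Rough estimate for one paperback (Rendezvous With Rama)
sentences_per_page = 35
pages = 243
sentences_per_book = sentences_per_page * pages

# Assumption: one sentence is roughly one segment
books_english = english_segments / sentences_per_book
books_translated = translated_segments / sentences_per_book

print(f"Sentences per book: {sentences_per_book}")
print(f"English segments, in books: {books_english:.0f}")
print(f"Translated segments, in books: {books_translated:.0f}")
```

So the translated segments, spread across all our target languages, come to something like 459 books' worth of sentences.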

Of course, our database doesn’t contain segments that are likely to be in Sir Arthur’s works, so we wouldn’t get much translation there, but on our own material it’s a different story. The more leverage we can get from the database, the faster we’ll be able to produce localised products!

I’m particularly excited about getting the software formats integrated with the server. They’ve been on the long finger for quite a while now: since their structure is pretty easy to parse, we already had tools that could do a simple leverage from one version of a file (and its translation) to another version of that same file, reusing the translations from before. The trouble was, you’d still have to go to some effort to make sure that the translations you did for the software messages matched the style of translations used in the accompanying documentation. Now that we can pull translations for both docs and software from the same database, my hope is that we’ll get more consistently translated products.
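To give a flavour of what that simple file-to-file leverage amounts to, here is a minimal sketch for .properties-style files. This is not our actual tooling: the function names parse_properties and leverage are made up for illustration, the parser ignores line continuations and escapes, and the German strings are invented sample data.

```python
def parse_properties(text):
    """Very simplified .properties parser: plain key=value lines only.

    Assumption: no line continuations, escapes or ':' separators.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def leverage(old_source, old_translation, new_source):
    """Reuse a translation only where the key and source text are unchanged.

    Anything new or modified maps to None, meaning "needs translating".
    """
    result = {}
    for key, text in new_source.items():
        if old_source.get(key) == text and key in old_translation:
            result[key] = old_translation[key]
        else:
            result[key] = None
    return result


# Invented sample data: previous English file, its translation, new English file
old_en = parse_properties("open.label=Open file\nsave.label=Save file\n")
old_de = parse_properties("open.label=Datei öffnen\nsave.label=Datei speichern\n")
new_en = parse_properties(
    "# new version\nopen.label=Open file\nsave.label=Save file as\nquit.label=Quit\n"
)

leveraged = leverage(old_en, old_de, new_en)
for key, translation in leveraged.items():
    print(key, "->", translation if translation is not None else "(needs translating)")
```

Note what this kind of tool can't do: it only ever looks at the previous version of the same file, so it has no way of knowing how the same phrase was translated in the documentation. That is the gap the shared database is meant to close.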

Some time, it would be interesting to see whether we have much material that’s been translated a few different ways, and generally to do some more data mining on the system, perhaps in my already ample free time?

Next stop: integrating the software-message highlighting support into our translation editor, and then rolling on to .mif and StarOffice formats. No rest for the wicked!