Today I guess we did the equivalent of what those kernel/OS guys call a “bringup”. Where they're interested in booting an OS or OS loader for the first time on a new system, we loaded into our translation editor a document produced from a file format it had never seen before.

At Sun, we produce software messages and documentation in several formats, the most prevalent being SGML and XML (which is what most of our published books use) and HTML. For software UIs, as I mentioned before, we have .po, .java resource bundles, .properties and .msg/.tmsg file formats. There are a few outlying formats that we don't support yet, though.

At the moment, we're working on Adobe FrameMaker documents. If you've never encountered it before, Frame is a pretty tricky file format to deal with. Instead of handling native .fm documents (which really would be biting off too much), we're using Adobe's intermediate file format, .mif (Maker Interchange Format). Based on the name, you'd think it would be relatively easy to deal with, but it's not. Invented long before XML, it has all sorts of nuances you need to worry about. Translated Frame documents need to have the correct font names embedded, otherwise you'll see gibberish in the output. There are cross-references, index markers and a whole new character-encoding method to deal with. Really, you couldn't ask for a more complex file format, except perhaps PDF, PostScript or some other PDL-type document; but at least there are plenty of converters out there that handle those already, and besides, they're seldom a source document format.
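
To give a flavour of those nuances: MIF keeps paragraph text in `String` statements whose literal opens with a backquote and closes with a straight quote (for example ``<String `Hello'>``), with backslash escapes for awkward characters. The Java below is a minimal sketch that pulls those strings out and undoes the escapes `\t`, `\>`, `\q`, `\Q` and `\\`, as I understand them from Adobe's MIF reference. `MifStrings` is a hypothetical helper, not part of any real Frame filter. It deliberately leaves `\xNN` character codes alone (they map through Frame's own character set, which is exactly the encoding headache mentioned above) and ignores fonts, markers and cross-references entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: extract and unescape the text of MIF String statements. */
final class MifStrings {
    // Backquote opens the literal, straight quote closes it; escaped pairs
    // such as \q, \> or \\ may appear anywhere in between.
    private static final Pattern STRING_STMT =
            Pattern.compile("<String\\s+`((?:[^'\\\\]|\\\\.)*)'\\s*>");

    /** Undo the common MIF string escapes; hex character codes are left as-is. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c);
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 't': out.append('\t'); break;
                case '>': out.append('>'); break;
                case 'q': out.append('\''); break;
                case 'Q': out.append('`'); break;
                case '\\': out.append('\\'); break;
                // Anything else (e.g. a hex character code) is kept verbatim,
                // because decoding it needs Frame's character-set table.
                default: out.append('\\').append(next);
            }
        }
        return out.toString();
    }

    /** Return the text of every String statement, in document order. */
    static List<String> extract(String mif) {
        List<String> texts = new ArrayList<>();
        Matcher m = STRING_STMT.matcher(mif);
        while (m.find()) {
            texts.add(unescape(m.group(1)));
        }
        return texts;
    }

    public static void main(String[] args) {
        String sample = "<ParaLine <String `Don\\qt panic \\> now'> >";
        System.out.println(extract(sample)); // [Don't panic > now]
    }
}
```

A regex is fine for a sketch like this, but a real filter would need a proper MIF tokenizer, since text is interleaved with font changes, markers and variables inside each `ParaLine`.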

So, weighing up all our options – we bravely chickened out.

It turns out there's a company out there with a really, really good Frame filter that, given enough prodding, can produce XML containing the translatable text plus another file containing the non-translatable text. So, since we already had an XML filter for our TM system, we decided a three-step process would do the trick:

  • Send the MIF file to their converter
  • Take the resulting XML and pass it to our XML filter/sentence segmenter
  • Take the XLIFF file that results and translate it as usual

(and of course, do similar steps in reverse with the translated XLIFF)
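
Step 2 is where the converter's XML becomes XLIFF. As a rough sketch only: the Java below uses the JDK's `java.text.BreakIterator` as a stand-in sentence segmenter (our real segmenter is part of the TM system and isn't shown) and wraps each sentence in an XLIFF 1.2 `trans-unit`. The class name `XliffSketch`, the sequential `id` scheme and `datatype="xml"` are my assumptions; a real filter would also preserve inline markup and keep a skeleton of the original so the reverse trip can rebuild the document.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch: segment text into sentences and emit minimal XLIFF 1.2. */
final class XliffSketch {
    /** Split text into sentences with the JDK's locale-aware BreakIterator. */
    static List<String> segment(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }

    /** Escape characters that are significant in XML text and attribute values. */
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Wrap each segment in its own trans-unit, numbered from 1. */
    static String toXliff(String original, String srcLang, List<String> segments) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<xliff version=\"1.2\" xmlns=\"urn:oasis:names:tc:xliff:document:1.2\">\n");
        sb.append("  <file original=\"").append(xmlEscape(original))
          .append("\" source-language=\"").append(xmlEscape(srcLang))
          .append("\" datatype=\"xml\">\n    <body>\n");
        for (int i = 0; i < segments.size(); i++) {
            sb.append("      <trans-unit id=\"").append(i + 1).append("\">\n")
              .append("        <source>").append(xmlEscape(segments.get(i))).append("</source>\n")
              .append("      </trans-unit>\n");
        }
        sb.append("    </body>\n  </file>\n</xliff>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Send the MIF file to the converter. Then translate the XLIFF.";
        System.out.print(toXliff("chapter1.xml", "en-US", segment(text, Locale.ENGLISH)));
    }
}
```

The point of the exercise is that everything downstream only ever sees `trans-unit` elements, whatever the original format was.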

Now, the only trouble is that our translation system is J2EE-based, running on a large Solaris server, while the Frame filter we're using is Win32-only (and porting it would be a pain). So, Webservices To The Rescue!

A few method calls later (to write the web service that sits on the Windows machine and makes the appropriate Runtime.exec() calls [1]) and a bit of network traffic, we're now able to translate Frame documents, so one of the last documentation formats we use at Sun is now supported (at least in theory) by our translation system. Thanks to XLIFF, there was no need to change the rest of the system when adding a new format: everything else just worked, more or less.

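
At its core, a service like the one on the Windows machine has to launch the converter and wait for it. A minimal sketch of that part, assuming Java 10 or later: it uses `ProcessBuilder`, the more configurable sibling of `Runtime.exec()`, drains the output on a separate thread so a chatty converter can't stall on a full pipe, and enforces a timeout. `ConverterRunner` is a hypothetical name, the web-service layer around it is omitted, and the real converter's command line isn't given here, so `main` runs `echo` as a placeholder.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Sketch: run an external converter and capture its exit code and output. */
final class ConverterRunner {
    static final class Result {
        final int exitCode;
        final String output;
        Result(int exitCode, String output) { this.exitCode = exitCode; this.output = output; }
    }

    static Result run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so one reader drains both
        Process p = pb.start();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        // Drain output concurrently so the child never blocks on a full pipe.
        Thread drainer = new Thread(() -> {
            try (InputStream in = p.getInputStream()) {
                in.transferTo(buf);
            } catch (IOException ignored) {
                // Stream closed because the process was killed; keep what we have.
            }
        });
        drainer.start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            drainer.join();
            throw new IOException("converter timed out: " + command);
        }
        drainer.join();
        // Assumes the tool writes UTF-8; a Win32 tool may need its own code page.
        return new Result(p.exitValue(), buf.toString(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder command; substitute the real converter and its arguments.
        Result r = run(List.of("echo", "converted"), 30);
        System.out.println(r.exitCode + ": " + r.output.trim());
    }
}
```

Passing the command as a list rather than a single string avoids the quoting surprises `Runtime.exec(String)` has with paths containing spaces, which is common on Windows.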
As with any bringup, there's lots more work to do, but just seeing the .mif document pass through our system, get magically routed to a Win32 machine in China and wing its way back to our machine was really quite a moment. It's times like that when I really like my job.

There are a few other formats we're working on: StarOffice/OpenOffice support would be really nice to have, along with some other internal formats, but at this stage I'd say we're close to covering 95% of the files that get sent out for translation. What's next?
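
Since every filter's job is to turn its source format into XLIFF, adding StarOffice/OpenOffice or another internal format should mostly mean plugging in one more filter. A minimal sketch of such a plug-in point, assuming a simple lookup by file extension; `FilterRegistry` and `XliffFilter` are hypothetical names for illustration, not our system's real API, and a production version would probably sniff file content rather than trust extensions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Sketch: map file extensions to format filters that produce XLIFF. */
final class FilterRegistry {
    /** Hypothetical filter contract: source document in, XLIFF text out. */
    interface XliffFilter {
        String toXliff(String fileName, byte[] content);
    }

    private final Map<String, XliffFilter> byExtension = new HashMap<>();

    /** Register a filter for one extension, e.g. "po", "properties", "mif". */
    void register(String extension, XliffFilter filter) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), filter);
    }

    /** Pick a filter from the file extension; empty if the format isn't supported yet. */
    Optional<XliffFilter> lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) return Optional.empty();
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(byExtension.get(ext));
    }

    public static void main(String[] args) {
        FilterRegistry reg = new FilterRegistry();
        // Stand-in filter; a real one would parse the file and emit trans-units.
        reg.register("po", (name, content) -> "<xliff version=\"1.2\"/>");
        System.out.println(reg.lookup("messages.po").isPresent()); // true
        System.out.println(reg.lookup("manual.odt").isPresent());  // false
    }
}
```

Because nothing after the lookup depends on the source format, the translation editor and TM side stay unchanged when a new entry is registered.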

[1] Yes, we were too lazy to write a proper JNI wrapper around the Frame conversion program, sorry!