In a comment on my last log entry, Lucas pointed me to OOXlate, a fairly nifty addon to OpenOffice that uses a webservice to look up a TM system. It helpfully included a WSDL of the webservice it was trying to call.

So, a few minutes later, I had loaded it into SunONE Studio (Netbeans with bells on) and had generated an EJB which implements the webservice. A few minutes later still, I had the bones of an implementation of the TranslationMemory.wsdl that can call our translation memory server – it’s not quite working yet, but it’s close.

The webservice defines two simple methods, which translate into the following Java:

public TranslationMemorySegment[] getSegmentMatches(java.lang.String project, java.lang.String filename, java.lang.String srclang, java.lang.String targetlang, java.lang.String segtype, int nummatches, java.lang.String source) throws java.rmi.RemoteException;

and

public int putSegment(java.lang.String project, java.lang.String filename, java.lang.String srclang, java.lang.String targetlang, java.lang.String segtype, java.lang.String source, java.lang.String translation, int segmentnumber) throws java.rmi.RemoteException;

These are pretty fine-grained. To translate a single file, you'd need to call getSegmentMatches(...) for every segment in the input file.
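
Just to make that concrete, here's a rough sketch of what a client would have to do - one remote call per segment. TranslationMemoryPort is just my name for whatever stub the WSDL tooling generates, and the language codes and segment type are made up:

import java.rmi.RemoteException;
import java.util.List;

public class SegmentBySegmentClient {

    // "port" stands in for the generated webservice stub.
    public void translateFile(TranslationMemoryPort port, String project,
                              String filename, List<String> segments)
            throws RemoteException {
        for (String source : segments) {
            // One network round trip for every single segment in the file.
            TranslationMemorySegment[] matches = port.getSegmentMatches(
                    project, filename, "en-US", "de-DE", "plaintext",
                    5 /* nummatches */, source);
            // ... show the fuzzy matches to the translator ...
        }
    }
}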

One of the big performance gains we got on our system came from translating entire files at a time instead of one segment at a time, cutting out all that shuffling of data back and forth between the server and the client. It also gave us a more global view of the input document: we could count the number of segment repetitions, so a translator could see how much of the document could be translated via cut-n-paste (or by more sophisticated means).
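
For what it's worth, the repetition counting is nothing fancy - something along these lines (an illustration, not our actual server code):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RepetitionCounter {

    // Count how often each segment occurs in a whole document, so a
    // translator can see how much of it is repeated text.
    public static Map<String, Integer> countRepetitions(List<String> segments) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String segment : segments) {
            Integer seen = counts.get(segment);
            counts.put(segment, seen == null ? 1 : seen + 1);
        }
        return counts;
    }
}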

The other big one (and I mean really big!) was that we'd allow the user to translate each file into multiple languages. This meant we could cache the fuzzy lookup of the source-language string, saving just the database ID of that segment and returning the localised version of the segment. The next time we were asked for a fuzzy lookup to return the same segment in a different language, we'd just pull the segment from our cache. A given fuzzy lookup takes about 2 seconds to pull a similar English segment from a 900,000-segment index, but only 6 ms to fetch the translated version of that segment, so we were getting a very large speedup by doing this.
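
The shape of that cache is roughly this (a sketch, not the real code - TranslationDatabase is a stand-in for our data-access layer):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FuzzyLookupCache {

    // Source-language string -> database ID of its best fuzzy match.
    private final Map<String, Long> fuzzyIdCache = new ConcurrentHashMap<String, Long>();

    private final TranslationDatabase db; // hypothetical data-access object

    public FuzzyLookupCache(TranslationDatabase db) {
        this.db = db;
    }

    public String lookup(String source, String targetLang) {
        Long segmentId = fuzzyIdCache.get(source);
        if (segmentId == null) {
            // Slow path (~2 s): fuzzy search over the 900,000-segment source index.
            segmentId = db.fuzzySearchSegmentId(source);
            fuzzyIdCache.put(source, segmentId);
        }
        // Fast path (~6 ms) for every other target language: fetch by ID.
        return db.getTranslation(segmentId, targetLang);
    }
}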

The trouble is, with a fine-grained call like the one used for this webservice, we wouldn't be able to do tricks like that - which is a shame, really. But it leads me to wonder: in a typical distributed system, how do you determine the granularity of your method calls? Anyone know? (I'd probably be worrying a lot about The Eight Fallacies of Distributed Systems.)
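
If I were designing it coarser, I'd probably want something like this instead - purely hypothetical, nothing like it is in the actual WSDL - where the whole file's segments travel in one call and the server gets the global view back:

public interface CoarseGrainedTranslationMemory {

    // One call per file: sourceSegments[i] gets its matches back in result[i].
    TranslationMemorySegment[][] getFileMatches(String project,
                                                String filename,
                                                String srclang,
                                                String targetlang,
                                                String segtype,
                                                int nummatches,
                                                String[] sourceSegments)
            throws java.rmi.RemoteException;
}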

Some folks at IBM had a different view of webservices in the translation industry - theirs being of much, much larger granularity. I don't know how far that went, but it seems very much industry-focused.

People working on open source translation would, I guess, have a different view of things; they're mostly interested in code-management issues - tracking how much of the project has been translated, and what progress is being made by the various translation teams. The folks working on the GTP have some great status pages that seem to do the trick.

What other things would be useful for a community of open source translators? Presumably a translation editor that embeds some peer-to-peer cleverness to allow connected groups of translators to share their work (that's been on my mind as a feature addition to our editor for a while now, though I haven't had a chance to work on it yet).

Can anyone think of other stuff that would be handy?
