Like Chris, I too am sitting at home having read my morning mail, waiting for the rain to ease off before I cycle into the office – I know, I’m a complete wuss. Anyway, this gives me a chance to talk about what I’ve been doing for the last few days.

Our editor has nearly been released for use by the OpenOffice.org guys, but the other day we found a few last-minute bugs that I’ve been fixing. Both were, unfortunately, a direct (or indirect) result of the somewhat chaotic internals of the code and of how it handles XLIFF <group> elements (which are used to group together logically related sections). We’re using these in the XLIFF sub-segmentation stuff I talked about before.

The first bug (and the easier one to fix) was in the XLIFF merging and splitting functionality we have in the editor. This feature was originally written to work around performance problems when opening large files. It lets the user split a large file into smaller files of a few hundred <trans-unit> elements each. John has long since fixed the performance problems, but splitting and merging is still handy because it lets several translators work on the same document at once. The trouble was that it sometimes split in the middle of a group, breaking the files. So it’s now more polite: instead of cutting after every N elements regardless, it waits until the current group closes before starting to write a new file.
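Here is a minimal sketch of group-aware splitting in Python, using the standard library's ElementTree. It isn't the editor's actual code. It assumes an XLIFF 1.2 document with a single <file> element and counts <trans-unit> elements, including those nested inside groups. It only starts a new chunk between top-level children of <body>. The names split_xliff, count_units and max_units are hypothetical, made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace; element names below follow that spec.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}
TRANS_UNIT = f"{{{XLIFF_NS}}}trans-unit"

# Serialise XLIFF elements with a default namespace instead of "ns0:".
ET.register_namespace("", XLIFF_NS)


def count_units(elem):
    """Count the <trans-unit> elements in elem's subtree (elem included)."""
    return sum(1 for _ in elem.iter(TRANS_UNIT))


def split_xliff(xml_text, max_units):
    """Hypothetical helper: split a single-<file> XLIFF 1.2 document.

    Cuts are only made between top-level children of <body>, so a <group>
    is always written whole; a chunk may therefore exceed max_units.
    """
    root = ET.fromstring(xml_text)
    body = root.find("x:file/x:body", NS)
    children = list(body)

    # Greedily fill chunks, closing one only after a whole child was added.
    chunks, current, count = [], [], 0
    for child in children:
        current.append(child)
        count += count_units(child)
        if count >= max_units:
            chunks.append(current)
            current, count = [], 0
    if current:
        chunks.append(current)

    # Empty the original body so each copy of the header starts clean.
    for child in children:
        body.remove(child)

    outputs = []
    for chunk in chunks:
        new_root = copy.deepcopy(root)
        new_body = new_root.find("x:file/x:body", NS)
        new_body.extend(chunk)
        outputs.append(ET.tostring(new_root, encoding="unicode"))
    return outputs
```

The trade-off is that one large group can make its chunk bigger than max_units. That seems preferable to cutting mid-group and producing a broken file, which is what the old behaviour did.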

The second bug was more complex. Basically, the editor has had many authors working on it in the past, and it hasn’t always been terribly well documented, so there’s been some confusion over how it works. In this case, it seems that <group> elements were being used to support the concept of 1:n alignments, that is, where one source language sentence is translated as several target language sentences. There’s also support for n:1 and m:n alignments, although the editor never actually lets translators create these complex sentence alignments directly.

On our system, such sentence alignments are only ever created by a sentence alignment program that we tend not to use all that often. The alignment program reads in two documents, a source language document and its translation. It segments the text and then tries to identify which source language sentences correspond to which target language sentences. This is difficult to do well, and nearly always needs some human post-editing to make sure the alignments are correct. Once these alignments have been identified, we can import them into our TM system and provide the matches via the editor.
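The post doesn't say how our alignment program works, so the following is only an illustration of the general idea. It is a simplified, length-based dynamic-programming aligner loosely in the spirit of Gale and Church's method, which scores candidate pairings ("beads") by how well the character lengths match. The cost function, the penalty values, the allowed bead shapes and the names align and bead_cost are all assumptions made for this sketch. They are not the real program's.

```python
import math

# Allowed bead shapes: (source sentences consumed, target sentences consumed).
# This assumed set covers 1:1, 1:n and n:1 pairings plus skipped sentences.
MOVES = [(1, 1), (1, 2), (2, 1), (1, 0), (0, 1)]


def bead_cost(src_len, tgt_len, di, dj, skip_penalty, merge_penalty):
    """Assumed cost: normalised length mismatch, plus penalties for non-1:1."""
    if di == 0 or dj == 0:
        return skip_penalty
    mismatch = abs(src_len - tgt_len) / math.sqrt(src_len + tgt_len + 1)
    return mismatch + (0.0 if (di, dj) == (1, 1) else merge_penalty)


def align(src, tgt, skip_penalty=10.0, merge_penalty=2.0):
    """Return a list of (source indices, target indices) beads."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward DP: every predecessor of (i, j) is visited before (i, j).
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                ls = sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + bead_cost(ls, lt, di, dj,
                                           skip_penalty, merge_penalty)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)
    # Walk the back-pointers from the end to recover the cheapest alignment.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads
```

For example, align(["I went home because it was late and I was tired."], ["Je suis rentré.", "Il était tard et j'étais fatigué."]) pairs the single English sentence with both French ones as a 1:2 bead. A real aligner would still produce mistakes on harder texts, which is why human post-editing remains necessary.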

Needless to say, adding TM system support for finding matches where several source language sentences map to one target language sentence was quite hard, and again, it’s rarely used.

So, changing the editor to use XLIFF groups for the sub-segmentation behaviour was a bit tricky, and it was causing some segments not to be saved when they should have been. At some stage it’d be really nice to revisit the internals of the editor and clean them up a bit. It’s just a case of finding some time and resources…

Oh, I see the rain has just stopped – better get on me bike!
