This is the third (and possibly last) post in this series on the question “How much translation do we need?” For the full background, read the earlier posts on on-the-fly dictionary lookups and word/phrase frequency in GNOME.

Remember, the point of all this is to see how best to use our translation resources. Given a small pile of resources (be it hard cash, pizza, free beer, whatever…) how do we use those resources most effectively to do translation? Note that this isn’t necessarily the most complete translation, nor the most accurate – I’m mainly interested in what’s “good enough” here.

I remember reading, a while back, some ghastly figures showing exactly how many of the features in a given needlessly bloated word-processing application (name withheld to protect the guilty!) typical users ever access, and being pretty amazed at the results. I can’t find the original article now, but the summary was that a huge number of features were never actually used!

So, the thing is – if there are loads of features that are never used, that in turn must mean there are loads of translations that are never used either! Can you see where this is going?

What I thought would be interesting would be to have a go at determining which strings are actually displayed in a given application, then compare that against the application’s message files and see what we come up with. Initially, I started wandering around the excellent GNOME Accessibility Project, thinking I should be able to easily capture an application’s displayed text that way. So far, my results have been mixed. Being of a Java persuasion, I immediately started messing about with the test programs JNav and TestAT. These work up to a point, but unfortunately not all displayed strings actually appear in their output (for example, they don’t seem to report tooltips to me) – I could be doing something wrong here, comments welcome!

I was a bit dismayed by this, thinking that I’d have to do backflips to get this information. Previously I had been using library interposers to intercept calls to gettext, which gives me some information all right, but a string that’s looked up via gettext isn’t necessarily the same as a string that’s displayed on screen.

But there’s more than one way to do it – DTrace to the rescue! A six-line script (sigh) seems to tell me what I’m after – at least for applications that draw strings using Pango, though of course we could do similar for any other text-rendering mechanism or function call:

#!/usr/sbin/dtrace -qs
/* fire on every string the target process hands to Pango for layout */
pid$target::pango_layout_set_text:entry
{
	printf("called \"%s\"\n", copyinstr(arg1));
}

Of course, this will give me everything that’s displayed, not just translatable strings, so it’ll include user output and other dynamically generated strings too (and indeed, the printf format strings from the message files will already have been replaced by their actual values), but I’m making progress. Next step: compare these against the message files, and find out which strings are actually displayed. Does this sound like fun?
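To make the comparison step concrete, here’s a rough sketch (not the author’s actual tooling) of how the captured strings might be matched against the msgids in a .po message file. The mini .po parser is an assumption for illustration – it only handles single-line msgids, and as noted above, strings containing printf placeholders won’t match their expanded on-screen forms:

```python
import re

def parse_msgids(po_text):
    """Extract single-line msgids from .po file content.
    (A real tool would also handle multi-line and c-format msgids.)"""
    msgids = set()
    for m in re.finditer(r'^msgid "(.*)"$', po_text, re.MULTILINE):
        if m.group(1):  # skip the empty msgid that holds the PO header
            msgids.add(m.group(1))
    return msgids

def split_strings(captured, msgids):
    """Split the message file's strings into displayed vs. never-seen."""
    displayed = msgids & captured
    never_seen = msgids - captured
    return displayed, never_seen

if __name__ == "__main__":
    # Hypothetical sample data: a tiny .po fragment and some strings
    # captured by the DTrace script above.
    po_text = '''msgid ""
msgstr ""

msgid "Open File"
msgstr "Ouvrir un fichier"

msgid "Quit"
msgstr "Quitter"
'''
    captured = {"Open File", "gedit 2.14", "Some user text"}
    displayed, never_seen = split_strings(captured, parse_msgids(po_text))
    print(sorted(displayed))    # translate these first
    print(sorted(never_seen))   # worry about these later
```

The interesting output is the second set: msgids that never showed up on screen during a typical session, which are exactly the translations we might be able to defer.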

Given any application, I’d love to be able to say to a translator: “at least translate these strings first, then worry about the others…”

More effective use of translation resources – I like it.