Simos has been thinking more about the possibly zany idea that I’ve been playing around with in my ample free time of late. He suggested that it might be interesting to see what sort of word distribution we have in the GNOME UI at the moment, so as to determine which words would be most beneficial in the bilingual dictionary that said zany idea uses.

I managed to dig up some sources for GNOME 2.10, and since I didn’t want to build all the POT files, I just took the pa.po translations (which were listed as 100% translated) and concentrated on the msgid strings.

I wrote a quick bit of Java (~150 lines) which, using the Open Language Tools PO parser, pulls the msgids, uses a Java BreakIterator to split them into words, blasts them to lower case, and writes out a frequency distribution. The program stdout is here, along with an OpenOffice doc containing the list of words and how often each appears in the UI.
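For anyone who wants to try something similar, here is a minimal sketch of the counting step. It is not the original program: the class and method names (WordFrequency, countWords, topWords) are made up for illustration, and it assumes the msgid strings have already been pulled out of the PO file into a list, so the Open Language Tools parser isn't used here at all. Tokens that don't start with a letter (whitespace, punctuation, numbers) are skipped, which is also my own assumption about what counts as a "word".

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; the names are illustrative, not from the original program.
class WordFrequency {

    // Split each msgid into words with a BreakIterator, lower-case them,
    // and count how often each word occurs.
    static Map<String, Integer> countWords(List<String> msgids) {
        Map<String, Integer> counts = new TreeMap<>();
        BreakIterator words = BreakIterator.getWordInstance(Locale.ENGLISH);
        for (String msgid : msgids) {
            words.setText(msgid);
            int start = words.first();
            for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
                String token = msgid.substring(start, end);
                // Assumption: only tokens starting with a letter count as words.
                if (Character.isLetter(token.codePointAt(0))) {
                    counts.merge(token.toLowerCase(Locale.ENGLISH), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Return the n most frequent words, highest count first (ties broken alphabetically).
    static List<Map.Entry<String, Integer>> topWords(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        // Made-up sample strings standing in for real msgids.
        List<String> msgids = Arrays.asList("Open the file", "Close the File", "The end.");
        for (Map.Entry<String, Integer> e : topWords(countWords(msgids), 3)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

The topWords helper is there because what we'd actually want out of this is a cut-off list, the "top x words", rather than the whole distribution.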

Now, if we got the top X words translated and put into a dict-formatted dictionary, then perhaps my idea of trying to bridge the digital divide by providing just enough translation wasn’t so zany after all?

As always, thoughts and comments welcome.

By the way, it’s nice to know the most common English word in the GNOME 2.10 UI is “the” – who’d have thunk ;-)

Update: of course, I should have said “… top X nouns translated …” above: translating other parts of speech probably wouldn’t help in this case.