While reading PlanetSun and my other daily links this morning, I came
across two things that I thought I’d talk about here. I wasn’t sure
whether this was worth putting in my “Translation, language & tools”
category or in “Other stuff” – I think it crosses the boundary, and I’ll
explain why in a minute.



The first was something on John Battelle’s Searchblog – talking about a
technology called Transparansee. Read the post here (then come back to
see what I have to say about it).



Still reading ? Good stuff ! From the article’s description, they’re
using fuzzy search techniques to bring clarity where there was
previously chaos. You know that niggly feeling you get when searching
through large amounts of data, thinking “I know it’s in there somewhere,
if only I could formulate the query properly…”? Well, this seems to be
trying to address that problem. The nearest analogy I can think of is
in graphics editors like GIMP or Photoshop. There’s an option called
“Feather Selection” that does something similar – you’ve got a
selection, but you want to consider things slightly outside your
criteria, and the further outside your criteria things are, the more
you want to filter them out. (Hmm, I bet that sort of graphical
technique would be a useful way to display the results from a search
engine – I wonder if anyone is doing that already ? But I’m straying off
the topic now…)
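
To make the “Feather Selection” analogy a bit more concrete, here is a
minimal Python sketch of feathered filtering on a single numeric field.
It is not how Transparansee works (the post doesn’t say); the function
names `feathered_score` and `feathered_search` and the linear fall-off
are assumptions for illustration only. Items inside the range score 1.0,
and items outside it fade linearly to 0.0 over a `feather` distance,
much as a feathered selection fades out at its edge.

```python
def feathered_score(value: float, lo: float, hi: float, feather: float) -> float:
    """Return 1.0 inside [lo, hi], fading linearly to 0.0 `feather` units outside.

    Hypothetical helper: a linear ramp is just the simplest "feather" shape.
    """
    if lo <= value <= hi:
        return 1.0
    # How far outside the hard criteria this value falls.
    distance = lo - value if value < lo else value - hi
    if feather <= 0 or distance >= feather:
        return 0.0
    return 1.0 - distance / feather


def feathered_search(items, key, lo, hi, feather):
    """Score every item, drop zero scores, and return (score, item) pairs, best first."""
    scored = [(feathered_score(key(item), lo, hi, feather), item) for item in items]
    return sorted(
        (pair for pair in scored if pair[0] > 0),
        key=lambda pair: pair[0],
        reverse=True,
    )
```

For example, searching for prices between 900 and 1000 with a feather of
100 keeps a 950 item at full score, keeps a 1040 item at a reduced
score, and drops a 1150 item entirely. A Gaussian or other smooth
fall-off would slot in the same way as the linear ramp.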



The second thing that I think is interesting is something that
Christopher wrote about: a research project being carried out by UCD
(my alma mater) and IBM. It’s good to see some research activities in
the UCD CompSci department getting some publicity. The topic here is
something that’s been on my mind for quite a while, and it’s in the area
of document classification.



How does this relate to translation tools ? Quite closely, I believe.
The application of fuzzy search in translation tools is pretty
straightforward, and I’ve already talked about that: obviously
translators want to reuse previous translations where possible, and if
a string has been translated that doesn’t quite match the new string,
then taking the old translation and changing it can be faster than
retranslating from scratch. But document classification – how does
that fit ? Well, one of the problems when writing translation tools is
how you select previous translations to suggest to translators – having
done the fuzzy search and found strings that look similar, how do you
then narrow that search to show only relevant strings ? Remember, the
fuzzy search only looks at the source language string – it doesn’t care
about the context of the source string or take into account the several
possible translations you may have (perhaps you decide to translate a
piece of text in a legal document slightly differently from the same
piece of text in some marketing material). Right now, we’ve got a simple
attribute-based system: when doing a search for old translations, we
can say “First look for translations of this particular product, then
look for translations from this product team, then…” etc.
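
An attribute-based lookup like that, combined with fuzzy matching, might
look something like the following Python sketch. It assumes a
hypothetical `TMEntry` record carrying `product` and `team` attributes,
and uses the standard library’s `difflib.SequenceMatcher.ratio()` as a
stand-in fuzzy score; real translation memory tools usually have their
own match metrics, and this is not our production system.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class TMEntry:
    """Hypothetical translation memory record with its attributes."""
    source: str
    target: str
    product: str
    team: str


def similarity(a: str, b: str) -> float:
    """Fuzzy match score in [0, 1], using difflib's ratio as a stand-in metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest(memory, new_source, product, team, threshold=0.7):
    """Search tier by tier: same product first, then same team, then everything.

    Returns (score, entry) pairs from the first tier that has any match at or
    above `threshold`, best first; later tiers are never consulted after that.
    """
    tiers = [
        lambda e: e.product == product,  # tier 1: this particular product
        lambda e: e.team == team,        # tier 2: this product team
        lambda e: True,                  # tier 3: anything in the memory
    ]
    for in_tier in tiers:
        matches = [(similarity(new_source, e.source), e) for e in memory if in_tier(e)]
        matches = [m for m in matches if m[0] >= threshold]
        if matches:
            return sorted(matches, key=lambda m: m[0], reverse=True)
    return []
```

Because the tiers are tried strictly in order, a weaker match from the
same product wins over a stronger match from elsewhere, which is the
weakness of manually specified attributes.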



The problem is that when we manually specify attributes, we could be
missing better translations that fall just outside our criteria – enter
document classification ! If we could classify a document
automatically, and then apply a fuzzy search mechanism to consider other
documents with similar classifications, this process would be a whole
lot easier and more efficient (and it would simplify our user interface
no end…). It’d be nice to have time to investigate this area, but I fear
we’re going to run into the same problem we’ve been having for a while:
limited resources and a requirement to support the production systems,
rather than sit on bean bags dreaming up new ways of improving the
tools. Oh well.
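
As a rough illustration of that idea, the Python sketch below ranks
candidate translations by blending the fuzzy string score with how
similar the two documents’ classifications are. It assumes each document
already has a hypothetical classification in the form of category
weights (say `{"legal": 0.9, "marketing": 0.1}`) from some classifier,
and uses cosine similarity to compare them; `class_weight` and
`threshold` are made-up tuning knobs, not values from any real system.

```python
import math
from difflib import SequenceMatcher
from typing import Dict


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse category-weight vectors."""
    dot = sum(weight * b.get(cat, 0.0) for cat, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(memory, new_source, new_classes, class_weight=0.5, threshold=0.6):
    """Rank (source, target, classes) entries by string match blended with context.

    Hypothetical scoring: the fuzzy score is scaled down when the old document's
    classification differs from the new one's, instead of being filtered by
    hand-picked attributes.
    """
    ranked = []
    for source, target, classes in memory:
        fuzzy = SequenceMatcher(None, new_source.lower(), source.lower()).ratio()
        if fuzzy < threshold:
            continue  # not even a plausible fuzzy match
        context = cosine(new_classes, classes)
        score = fuzzy * ((1 - class_weight) + class_weight * context)
        ranked.append((score, target))
    return sorted(ranked, key=lambda pair: pair[0], reverse=True)
```

With this scoring, an identical source string from a legal document
outranks the same string from marketing material when the new document
is itself classified as mostly legal, and nobody has to set up product
or team attributes by hand.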



Anyway, there you go – that’s my last vaguely “on-topic” post for a
while. We’re leaving for New Zealand tomorrow afternoon and I don’t know
how often I’ll be able to blog on our progress, but I probably won’t be
talking shop for a while. Cheerio !

