(Microsoft included)

Since I work on translation memory systems, a question I get a lot is “how do you actually search for translations?”. Well, simple: we use a neat search engine that wasn’t directly designed for translation memory applications, but turns out to work pretty well for us. Last year, I had the pleasure of integrating an engine that came from SunLabs into our TM system and, while doing so, got hooked on search engines and their potential for other applications.

Desktop search is one of the things I think is cool, though not everyone gets it (and more importantly, I’m not sure they get why it’s cool). Here’s a bit of a post from the Slashdot article today about Microsoft’s foray into the realm of desktop search:

Am I the only one who doesn’t get the point of this new-fangled “Desktop Search” idea? I mean, I tried installing the Google Desktop Search for awhile, but I never actually used it. In fact, I couldn’t even think of a use for it. Unless you’re hard drive is completely unorganized, or you’re on a multi-user computer, I don’t see the point of searching for things you should already know you have.

Advanced search isn’t just about searching for things you already know you have; it’s about your computer helping you to forge relationships between distinct pieces of data that you know you have. A nice side effect of indexing your information in order to use a search engine is that, if your index is good enough and your system is clever, you should be able to perform some data mining on your hard disk. That is, you can index and search your email (incoming and outgoing), all the people you’ve spoken to over IM, the webpages you’ve visited, the documents you’ve written and every combination of the above (like that web page your friend Niall pointed you to while you were chatting last week, and perhaps the fact that he also emailed you something else on the same day). Crucially, though, you can have your computer identify relationships in this mass of data.
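To make that a bit more concrete, here’s a rough sketch in Python of the kind of index I have in mind. It isn’t how any of the real desktop search products work, and every name in it (`Item`, `DesktopIndex`, `related`, the sample data and dates) is made up for illustration. It assumes each piece of data has already been boiled down to some text, a date and a set of people, and it treats “related” as nothing smarter than “shares a person or a day”.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
import re


@dataclass(frozen=True)
class Item:
    """One indexed thing on the disk (hypothetical, simplified record)."""
    id: str
    kind: str                      # e.g. "email", "im", "web", "doc"
    text: str
    day: date
    people: frozenset = frozenset()


class DesktopIndex:
    """Toy inverted index over mixed content types."""

    def __init__(self):
        self.items = {}
        self.by_term = defaultdict(set)    # word   -> item ids
        self.by_person = defaultdict(set)  # person -> item ids
        self.by_day = defaultdict(set)     # date   -> item ids

    def add(self, item):
        self.items[item.id] = item
        for term in re.findall(r"\w+", item.text.lower()):
            self.by_term[term].add(item.id)
        for person in item.people:
            self.by_person[person].add(item.id)
        self.by_day[item.day].add(item.id)

    def search(self, query):
        """Ids of items containing every word in the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        return set.intersection(*(self.by_term.get(t, set()) for t in terms))

    def related(self, item_id):
        """Other items sharing a person or a day, strongest link first."""
        item = self.items[item_id]
        scores = defaultdict(int)
        for person in item.people:
            for other in self.by_person[person]:
                scores[other] += 1
        for other in self.by_day[item.day]:
            scores[other] += 1
        scores.pop(item_id, None)  # an item is not related to itself
        return sorted(scores, key=lambda i: (-scores[i], i))


# Invented sample data: a chat, the page mentioned in it, an email, a document.
index = DesktopIndex()
index.add(Item("im-1", "im", "check out this page about search engines",
               date(2004, 12, 13), frozenset({"Niall"})))
index.add(Item("web-1", "web", "An introduction to search engines",
               date(2004, 12, 13)))
index.add(Item("mail-1", "email", "Photos from the weekend",
               date(2004, 12, 13), frozenset({"Niall"})))
index.add(Item("doc-1", "doc", "Quarterly report", date(2004, 11, 1)))

print(index.search("search engines"))  # the chat and the web page
print(index.related("im-1"))           # the email first, then the web page
```

The plain keyword search finds the chat and the web page, which is the bit people already expect. The `related` call is the interesting part: starting from the chat, it surfaces the email from the same person on the same day, even though the two share no words at all. A real system would want far better signals than this, but the idea is the same.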

Did you see that 3.8 GHz processor that was just released? You might as well do something to take advantage of its spare cycles: there’ll be lots of these. (Otherwise, doing protein research would be another excellent use of its spare time.)

There are a few approaches to the desktop search problem: Apple have one, Google have one and the folks working on Beagle have another. I’m not sure if any of them is the right approach, but they sure are interesting, and I can’t wait for this stuff to hit my desktop. Having said that, the systems we’re starting to see today are really just another implementation of what the PalmPilot had years ago: the ability to search quickly across several different content types.

The next step, I believe, will be the promise of a PDA that can actually organise your data for you, and inform you of relationships between your data that you haven’t yet spotted. Likewise, it should be able to provide meaningful summaries of the information you’ve collected, and perhaps spot trends in the data. Computers should use their extra cycles helping us out, not spinning around doing nothing. We’ve had bits and pieces of this vision for a while (that “What’s related” web-browser feature, predictive text input methods, that bloody paper clip in Microsoft Word), so we’re getting there. Until that happens though, people will still be writing complex database descriptions in SQL and worrying about database normalisation and suchlike. Ho hum.