A database schema

The previous two entries in this series were about segmentation, something fairly specific to translation tools. This entry is about something most programmers should be able to relate to. Writing a decent database schema takes a lot of thought and is difficult to get right. In particular, you need to know how you’re going to use the data, and you should be looking for ways to make the table layout as elegant as possible (since the elegant layout usually ends up being the one that performs best!).

For our purposes, here are the things we’d be interested in storing in a database:

  • Source language segments
  • Target language segments (linked to the above somehow)
  • Metadata about those translations
  • Metadata about the context of the original source language segment

That’s it: the four things that really matter when doing translation memory lookup and maintenance. Drilling down into more detail, there are other questions you need to ask yourself:

  • Do I want to allow different translations for a single source segment?
  • Do I want to allow duplicate source strings, or just have different contexts for a unique instance of that segment?
  • Do I want to provide version control of the translations (to see what other translations have been done in the past for the same source segment)?
  • Do I want to limit the metadata vocabulary, and how do I decide what metadata to apply? (This can open a whole new can of worms.)

We’ve run into all of these problems when designing our own system. The practical upshot is that you really need to understand how you’re going to use the data, but I suppose that’s why DB consultants get paid the big bucks :-)
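
To make those questions a bit more concrete, here is a minimal sketch of one possible layout, using Python’s built-in sqlite3 module. This is not the schema we ended up with; every table, column and function name below is made up for illustration. It assumes one particular set of answers: several translations per source segment are allowed, source strings are stored once and can have many contexts, old translations are kept as numbered versions, and metadata is free-form name/value pairs.

```python
import sqlite3

# Hypothetical layout: all table and column names are illustrative only.
SCHEMA = """
CREATE TABLE source_segment (
    id   INTEGER PRIMARY KEY,
    lang TEXT NOT NULL,
    text TEXT NOT NULL,
    UNIQUE (lang, text)               -- each source string is stored once
);
CREATE TABLE source_context (         -- where a source segment was seen
    id        INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES source_segment(id),
    document  TEXT NOT NULL,
    position  INTEGER NOT NULL
);
CREATE TABLE target_segment (         -- one row per translation version
    id        INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES source_segment(id),
    lang      TEXT NOT NULL,
    text      TEXT NOT NULL,
    version   INTEGER NOT NULL,
    UNIQUE (source_id, lang, version)
);
CREATE TABLE translation_meta (       -- free-form metadata vocabulary
    target_id INTEGER NOT NULL REFERENCES target_segment(id),
    name      TEXT NOT NULL,
    value     TEXT NOT NULL,
    PRIMARY KEY (target_id, name)
);
"""


def open_tm(path=":memory:"):
    """Open a database and create the sketch schema in it."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_source(conn, lang, text, document, position):
    """Store a source segment (once) and record the context it came from."""
    conn.execute(
        "INSERT OR IGNORE INTO source_segment (lang, text) VALUES (?, ?)",
        (lang, text),
    )
    source_id = conn.execute(
        "SELECT id FROM source_segment WHERE lang = ? AND text = ?",
        (lang, text),
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO source_context (source_id, document, position) "
        "VALUES (?, ?, ?)",
        (source_id, document, position),
    )
    return source_id


def add_translation(conn, source_id, lang, text, meta=None):
    """Add a translation as the next version; older versions are kept."""
    version = conn.execute(
        "SELECT COALESCE(MAX(version), 0) + 1 FROM target_segment "
        "WHERE source_id = ? AND lang = ?",
        (source_id, lang),
    ).fetchone()[0]
    target_id = conn.execute(
        "INSERT INTO target_segment (source_id, lang, text, version) "
        "VALUES (?, ?, ?, ?)",
        (source_id, lang, text, version),
    ).lastrowid
    for name, value in (meta or {}).items():
        conn.execute(
            "INSERT INTO translation_meta (target_id, name, value) "
            "VALUES (?, ?, ?)",
            (target_id, name, value),
        )
    return version


def lookup(conn, lang, text, target_lang):
    """Return (version, translation) pairs for an exact match, newest first."""
    return conn.execute(
        "SELECT t.version, t.text FROM target_segment t "
        "JOIN source_segment s ON s.id = t.source_id "
        "WHERE s.lang = ? AND s.text = ? AND t.lang = ? "
        "ORDER BY t.version DESC",
        (lang, text, target_lang),
    ).fetchall()
```

With this layout, adding the same source string from two different files gives you one source_segment row and two source_context rows, and calling add_translation twice for the same segment keeps both translations around as versions 1 and 2. Answer any of the questions differently (say, a controlled metadata vocabulary) and the tables change shape, which is rather the point.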