Following on from my bag of words post, folks suggested I also build n-gram word lists and look at their frequency distribution. I did that. I’m not sure the results are terribly interesting, but it wasn’t much extra work, so that’s okay. I suspect my sample size was too small …
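
For anyone who wants to try the same thing, here is a rough sketch of how n-gram counting can be done in Python. This is not the code I ran: the function names `ngrams` and `ngram_counts`, the whitespace tokenising, and the sample sentence are all made up for illustration.

```python
from collections import Counter


def ngrams(words, n):
    """Return an iterator over each run of n consecutive words, as tuples."""
    # zip over n copies of the list, each shifted by one more word
    return zip(*(words[i:] for i in range(n)))


def ngram_counts(text, n):
    """Count how often each n-gram occurs in text (hypothetical helper)."""
    # Assumption: lowercase and split on whitespace; real text would need
    # punctuation handling as well.
    words = text.lower().split()
    return Counter(ngrams(words, n))


# Made-up sample text, just to show the shape of the output
sample = "the cat sat on the mat and the cat slept"
counts = ngram_counts(sample, 2)

# Print the most frequent bigrams with their counts
for gram, freq in counts.most_common(3):
    print(" ".join(gram), freq)
```

With a text this short nearly every n-gram turns up exactly once, which is probably the same small-sample problem I ran into.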

The Wikipedia page on n-grams might be more interesting, perhaps?

(the post that started all this is back in the archives a bit)