This corpus was available from 2005 until October 2009. It allowed researchers to access 37,000,000 words in 2,200,000 quotations from the Oxford English Dictionary to study a wide range of changes in English. These included changes like the following:
Due to the advanced corpus architecture and interface, it was also possible to find, for example, all words that had a much higher frequency in one period than in another. Examples of this would be adverbs that increased in frequency from the 1700s to the 1800s, or adjectives occurring with "woman" that were much more frequent in the 1800s than in the 1900s. This was a huge improvement, of course, over the simple OED interface, which just indicates the first occurrence of a word, but gives little if any sense of its frequency over time, or of which words occur together.
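As a rough illustration of what such a period-to-period comparison involves, here is a minimal Python sketch. It is not the corpus's actual implementation: the function name `rising_words`, the `min_ratio` threshold, and the add-one smoothing are all assumptions made for this example, and the token lists stand in for the quotation text of each period.

```python
from collections import Counter

def rising_words(early_tokens, late_tokens, min_ratio=2.0):
    """Return (word, ratio) pairs for words whose per-million frequency in
    the later period is at least min_ratio times that of the earlier one.

    Hypothetical helper for illustration only. Add-one smoothing is applied
    to the counts so that words absent from one period do not cause a
    division by zero.
    """
    if not early_tokens or not late_tokens:
        raise ValueError("both periods need at least one token")

    early = Counter(early_tokens)
    late = Counter(late_tokens)
    results = []
    for word in late:
        # Normalize to occurrences per million words, since the two
        # periods will usually differ in size.
        early_rate = (early[word] + 1) * 1_000_000 / len(early_tokens)
        late_rate = (late[word] + 1) * 1_000_000 / len(late_tokens)
        ratio = late_rate / early_rate
        if ratio >= min_ratio:
            results.append((word, ratio))
    # Largest increases first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Toy data standing in for two periods of quotations.
period_1700s = ["slowly"] * 1 + ["the"] * 9
period_1800s = ["slowly"] * 5 + ["the"] * 5
increased = rising_words(period_1700s, period_1800s)
```

Restricting the comparison to adverbs, or to adjectives near a given noun, would additionally require part-of-speech tags and collocate windows, which this sketch leaves out.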
We tried for several years to collaborate with Oxford University Press on the use of this corpus. We even offered to give them free access to the frequency and collocates data from the corpus. In spite of our repeated efforts at collaboration, no one from the OUP ever responded -- ever. And then in October 2009, we received the following email:
I am the director responsible for the Oxford English Dictionary at Oxford University Press. It has just come to my attention that you have made available a large searchable set of quotation data derived from the OED on your corpus web site. I recollect that you were in touch with us on previous occasions about the possibility of extracting data from the OED for your research, but I do not recall any agreement being reached by us that you were entitled to download copyright OUP data systematically in this way, whether for research or any other purposes: such downloading is prohibited under the terms of access to the online edition of the OED. Nor do I recall any agreement that you should publish our copyright data.
I request that you immediately remove all means of electronic access to any OED dataset in your possession. This includes any form of access within your institution, as well as access on the worldwide web.
I shall be taking further legal advice on this matter. OUP is willing in certain circumstances to co-operate with linguistic researchers who wish to base specific research projects on aspects of OED or other Oxford dictionary data. However, any such use must be explicitly agreed by us in advance, and would not normally include any right to publish our copyright data.
Scholarly and General Reference
Needless to say, because we do not have access to the same range of legal and financial resources as the Oxford University Press -- and under threat of legal action from the OUP -- we are forced to withdraw access to this research tool.
For those who are interested in large historical corpora of English, however, we might mention the 400-million-word Corpus of Historical American English (COHA), which is now available.