corpora, size, queries = better resources, more insight


Updates (May 2016)

Related resources
   Full-text data
   Word frequency
   Academic vocabulary


In addition to the regular corpus interface, there is a wide range of other corpus-based resources, some of which let you download large amounts of data for offline use. (Compare these with the academic site license.)

Full-text data

Download 440 million words of full-text data from COCA (190,000 texts), 385 million words from COHA (115,000 texts), or 1.8 billion words from GloWbE (1,800,000 texts). With this data, you will have the texts from the corpora on your own computer, rather than having to use the web interface. The data comes in three formats: relational database, word/lemma/PoS (vertical format), or plain text (linear format).
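As a rough illustration of working with the vertical (word/lemma/PoS) format offline, the sketch below reads tab-separated lines and counts lemmas. It is a minimal sketch under stated assumptions, not the distributed file specification: it assumes one token per line with the last three tab-separated columns being word, lemma, and PoS, and the sample lines (IDs and CLAWS-style tags) are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    word: str
    lemma: str
    pos: str


def read_vertical(lines: Iterable[str]) -> Iterator[Token]:
    """Yield one Token per line of tab-separated vertical data.

    ASSUMPTION: the last three tab-separated columns are word, lemma and PoS;
    any leading columns (e.g. text or token IDs) are ignored.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines instead of failing
        word, lemma, pos = fields[-3:]
        yield Token(word, lemma, pos)


def lemma_frequencies(tokens: Iterable[Token]) -> Counter:
    """Count lemmas, so 'dog' and 'dogs' are counted together."""
    return Counter(t.lemma.lower() for t in tokens)


# HYPOTHETICAL sample: textID, tokenID, word, lemma, CLAWS-style PoS tag.
sample = [
    "1\t1\tThe\tthe\tat\n",
    "1\t2\tdogs\tdog\tnn2\n",
    "1\t3\tran\trun\tvvd\n",
    "1\t4\tand\tand\tcc\n",
    "1\t5\tthe\tthe\tat\n",
    "1\t6\tdog\tdog\tnn1\n",
]
freqs = lemma_frequencies(read_vertical(sample))
print(freqs.most_common(2))
```

Counting by lemma rather than word form is what makes "dogs" and "dog" add up; counting `(lemma, pos)` pairs instead would keep noun and verb uses apart.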

Word and Phrase
(analyze texts)

Enter entire texts to see detailed frequency information on the words they contain, and create word lists based on your text. Click any word to see detailed information on it. Highlight phrases in your text to search COCA for related phrases.
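The idea behind the text analyzer (count the words in your text, then look each one up in a large reference frequency list) can be sketched in a few lines. This is a hypothetical approximation, not the tool's actual code: `REFERENCE_RANKS` is a tiny made-up stand-in for a downloaded frequency list, the band cut-offs are illustrative, and words are matched as surface forms without lemmatization.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL reference ranks (word -> rank in a large frequency list).
# A real tool would load these from a downloaded frequency list.
REFERENCE_RANKS = {"the": 1, "of": 4, "study": 300, "data": 520, "corpus": 9000}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def band(rank: Optional[int]) -> str:
    """Map a reference rank to a coarse band; the cut-offs are illustrative."""
    if rank is None:
        return "unlisted"
    if rank <= 500:
        return "high"
    if rank <= 3000:
        return "mid"
    return "low"


def analyze(text: str) -> Dict[str, Tuple[int, str]]:
    """Return {word: (count in the text, frequency band)}, most frequent first."""
    counts = Counter(tokenize(text))
    return {w: (n, band(REFERENCE_RANKS.get(w))) for w, n in counts.most_common()}


report = analyze("The corpus data: the study of the corpus.")
for word, (count, level) in report.items():
    print(f"{word:10} {count:3} {level}")
```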

Word and Phrase
(frequency lists)

Search and browse the most complete frequency dictionary of English. See detailed information on one page: definition, frequency by genre, collocates (nearby words), concordance lines, synonyms, and WordNet-related words, all cross-linked so you can move from one resource to another.

Word Frequency
(100,000 list)

Download free lists, including the top 5,000 lemmas. Other lists show the frequency of the top 60,000 lemmas by genre (and sub-genre). You can also download an integrated list of 100,000 words based on COCA, COHA, the BNC, and SOAP: the largest corrected frequency list of English.
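Frequency "by genre" only makes sense when counts are normalized, because genres differ in size. The sketch below converts raw counts to occurrences per million words. The genre sizes and counts are hypothetical numbers chosen to keep the arithmetic easy; they are not the actual COCA figures.

```python
from typing import Dict


def per_million(raw_count: int, genre_size: int) -> float:
    """Normalize a raw count to occurrences per million words of that genre."""
    return raw_count * 1_000_000 / genre_size


def genre_profile(counts: Dict[str, int], sizes: Dict[str, int]) -> Dict[str, float]:
    """Per-million frequency of one lemma in every genre listed in `sizes`."""
    return {g: round(per_million(counts.get(g, 0), size), 2) for g, size in sizes.items()}


# HYPOTHETICAL genre sizes and counts, chosen only to make the arithmetic easy.
sizes = {"spoken": 100_000_000, "academic": 80_000_000, "fiction": 90_000_000}
counts = {"spoken": 5_000, "academic": 12_000}
profile = genre_profile(counts, sizes)
print(profile)
```

Here the raw academic count is 2.4 times the spoken count, but because the (hypothetical) academic section is smaller, the normalized rate is 3 times higher (150 vs. 50 per million).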


Collocates

Download lists of the top 200-300 collocates (nearby words) for each of 60,000 lemmas: 4,300,000 node/collocate pairs in all.
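To show what a node/collocate pair is, the sketch below counts the words that occur within a window around a node word and scores a pair with Mutual Information (MI). This is a hypothetical illustration, not the method used to build the downloadable lists: the window size (4 words either side by default) and the MI formula shown are one common formulation, assumed here for illustration.

```python
import math
from collections import Counter
from typing import List


def collocates(tokens: List[str], node: str, window: int = 4) -> Counter:
    """Count every word within `window` positions either side of each `node` token."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts


def mutual_information(pair: int, node_freq: int, coll_freq: int,
                       corpus_size: int, span: int) -> float:
    """MI = log2(pair * N / (f(node) * f(collocate) * span)); one common formulation."""
    return math.log2(pair * corpus_size / (node_freq * coll_freq * span))


tokens = "strong tea and strong coffee but weak tea".split()
print(collocates(tokens, "strong", window=1))
# span = 2 * window positions, e.g. 8 for a 4-word window on each side
print(round(mutual_information(10, 100, 100, 1_000_000, span=8), 3))
```

MI rewards pairs that co-occur more often than their individual frequencies predict, so it tends to favor specific, lower-frequency collocates; raw co-occurrence counts favor very common words.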


N-grams

Download free lists containing the top 1,000,000 2-grams (two-word sequences), 3-grams, 4-grams, and 5-grams in COCA. Other lists contain the frequency of all 2-, 3-, and 4-grams (up to 155 million rows of data).
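An n-gram list is essentially a count of every contiguous sequence of n tokens. The sketch below shows that idea on a toy sentence; it is an illustration only, and real lists would also involve tokenization and case decisions that are not described here.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def ngrams(tokens: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous n-token sequence (a sliding window of width n)."""
    return zip(*(tokens[i:] for i in range(n)))


def top_ngrams(tokens: List[str], n: int, k: int) -> List[Tuple[Tuple[str, ...], int]]:
    """Return the k most frequent n-grams with their counts."""
    return Counter(ngrams(tokens, n)).most_common(k)


tokens = "i do not know what i do not want".split()
print(top_ngrams(tokens, 2, 3))
print(top_ngrams(tokens, 3, 1))
```

The `zip` over shifted slices stops at the shortest slice, so a text shorter than n tokens simply yields no n-grams.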

Academic Vocabulary

Download free lists based on the 120 million words of COCA-Academic texts, including academic words grouped by word family, lists of "core" academic English, and "technical" word lists for the nine domains of COCA-Academic (e.g., Law, Medicine, or Business).
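One way to think about the difference between "core" and "technical" academic words is relative frequency plus spread across domains. The sketch below keeps lemmas that are markedly more frequent in academic text and appear in most of the nine domains. The criteria and both thresholds are hypothetical illustrations, not the published method behind these lists, and the per-million figures are made up.

```python
from typing import Dict, List


def core_academic(academic_pm: Dict[str, float], other_pm: Dict[str, float],
                  domains_present: Dict[str, int],
                  min_ratio: float = 1.5, min_domains: int = 7) -> List[str]:
    """Lemmas that are both (a) markedly more frequent per million words in
    academic text than elsewhere and (b) spread across most of the nine domains.

    HYPOTHETICAL thresholds: min_ratio and min_domains are illustrative defaults.
    """
    core = []
    for lemma, acad in academic_pm.items():
        other = other_pm.get(lemma, 0.0)
        ratio = acad / other if other > 0 else float("inf")
        if ratio >= min_ratio and domains_present.get(lemma, 0) >= min_domains:
            core.append(lemma)
    return sorted(core)


# HYPOTHETICAL per-million frequencies and domain counts (out of nine domains).
academic_pm = {"analysis": 300.0, "data": 400.0, "dog": 20.0, "tibia": 50.0}
other_pm = {"analysis": 60.0, "data": 100.0, "dog": 80.0, "tibia": 2.0}
domains = {"analysis": 9, "data": 9, "dog": 9, "tibia": 1}
print(core_academic(academic_pm, other_pm, domains))
```

In this toy data "tibia" is strongly academic but confined to one domain, so it fails the spread test; a word like that is a better fit for a domain-specific technical list (here, Medicine) than for the core list.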

Word and Phrase
(academic)

Similar to the two Word and Phrase resources above, but limited strictly to the 120 million words of COCA-Academic. Get detailed information on words and phrases, frequency by sub-genre, and concordances and collocates in the academic genre only. You can also analyze entire academic texts that you input.