corpus.byu.edu

corpora, size, queries = better resources, more insight


The corpora at this site were created by Mark Davies, Professor of Linguistics at Brigham Young University. They have many different uses, including: finding out how native speakers actually speak and write; looking at language variation and change; finding the frequency of words, phrases, and collocates; and designing authentic language teaching materials and resources.

The corpora are used by more than 170,000 people each month (more than 330,000 visits); COCA alone has 65,000 distinct users each month. This makes them perhaps the most widely used corpora currently available.

In addition to the nine corpora (and the Google Books (Advanced) interface), there are also many new COCA-based resources. The site www.WordAndPhrase.info allows you to enter and analyze entire texts, and to see extremely detailed corpus-based entries from a frequency listing of the top 60,000 words in English. The sites www.wordfrequency.info, www.collocates.info, and www.ngrams.info allow you to download large amounts of corpus data for offline use. Note especially the new integrated 100,000-word list from COCA, COHA, BNC, and SOAP, the largest corrected frequency list of English.