CORPUS.BYU.EDU

seven online corpora | 45 - 425 million words each

These corpora were created by Mark Davies, Professor of Linguistics at Brigham Young University. They have many different uses, including: finding out how native speakers actually speak and write; looking at language variation and change; finding the frequency of words, phrases, and collocates; and designing authentic language teaching materials and resources.

The corpora are used by more than 80,000 people each month (more than 200,000 visits), which makes them perhaps the most widely used corpora currently available. They also serve as the basis for an increasing number of publications by researchers throughout the world.

Corpus | # Words | Language / dialect | Time period | Compare to

English

Corpus of Contemporary American English (COCA) | 425 million | American English | 1990-2011 | Google, BNC, ANC, BoE
Corpus of Historical American English (COHA) | 400 million | American English | 1810-2009 | Google Books, small corpora
TIME Magazine Corpus of American English | 100 million | American English | 1923-2006 |
BYU-BNC: British National Corpus* | 100 million | British English | 1980s-1993 | COCA

Try the alternate COCA interface: www.wordandphrase.info. Frequency lists (1-60,000), integrated genre information, definitions, collocates, concordance lines, synonyms, and WordNet

N-grams

Google Book (American English) Corpus | 155 billion | American English | 1810-2009 | Google Books (Standard)

Other languages

Corpus del Español | 100 million | Spanish | 1200s-1900s | CORDE and CREA
Corpus do Português | 45 million | Portuguese | 1300s-1900s |

* Our architecture and interface to the BNC from OUP