These corpora were created by Mark Davies, Professor of Linguistics at Brigham Young University. They have many different uses, including finding out how native speakers actually speak and write; looking at language variation and change; finding the frequency of words, phrases, and collocates; and designing authentic language teaching materials and resources.

The corpora are used by more than 80,000 people each month (more than 200,000 visits), which makes them perhaps the most widely used corpora currently available. They also serve as the basis for an increasing number of publications by researchers from throughout the world.
| English | # Words | Language / dialect | Time period | Compare to |
| Corpus of Contemporary American English (COCA) | 425 million | American English | 1990-2011 | Google, BNC, ANC, BoE |
| Try the alternate COCA interface: www.wordandphrase.info. Frequency lists (1-60,000), integrated genre information, definitions, collocates, concordance lines, synonyms, and WordNet |
| Corpus of Historical American English (COHA) | 400 million | American English | 1810-2009 | Google Books, small corpora |
| TIME Magazine Corpus of American English | 100 million | American English | 1923-2006 | |
| BYU-BNC: British National Corpus* | 100 million | British English | 1980s-1993 | COCA |
| N-grams | | | | |
| Google Book (American English) Corpus | 155 billion | American English | 1810-2009 | Google Books (Standard) |
| Other languages | | | | |
| Corpus del Español | 100 million | Spanish | 1200s-1900s | CORDE and CREA |
| Corpus do Português | 45 million | Portuguese | 1300s-1900s | |
* Our architecture and interface to the BNC from OUP