CORPUS.BYU.EDU

The following are some of the corpora that have been created by Mark Davies, Professor of Corpus Linguistics at Brigham Young University.

English

| Name | Availability | Number of words | Dialect / time period | Content | Searches / architecture / interface |
| --- | --- | --- | --- | --- | --- |
| BYU Corpus of American English | Public, as of Feb 2008 | 360 million | American, 1990-present | 20 million words each year, 1990-present. Equally divided into spoken, fiction, popular magazine, newspaper, and academic. Will be updated at least two times a year. | Search by word, phrase, substring, part of speech, collocates, etc. Limit and compare by frequency in different genres and years (1990-present). |
| British National Corpus (BNC) | Public | 100 million | British, mainly 1980s-1993 | 90 million words written (fiction, newspaper, academic, etc.); 10 million spoken. [Website for the original BNC] | Re-engineered, relational database version of the original. Allows many types of searches not found in any other interface of the BNC. Note: was view.byu.edu. |
| TIME Magazine | Public | 100 million | American, 1923-present | More than 275,000 articles from TIME Magazine. Wide range of topics: news, sports, business, culture, health, entertainment, etc. | Search by word, phrase, substring, part of speech, collocates, etc. Limit and compare by frequency in different years and decades. |
Other languages

| Name | Availability | Number of words | Dialect / time period | Content | Searches / architecture / interface |
| --- | --- | --- | --- | --- | --- |
| Corpus del Español | Public | 100 million | Spanish, 1200s-1900s | 20 million words 1900s, 20m 1800s, 40m 1500s-1700s, 20m 1200s-1400s. | Search for words, phrases, substrings, part of speech, lemma. Limit and sort by frequency in different centuries and registers. |
| Corpus del Español: Registers | Public | 20 million | Spanish, 1900s | Enhanced version of the 1900s component of the Corpus del Español. Equally divided between spoken, fiction, non-fiction. | Compare frequency of 110+ grammatical constructions in twenty different registers. Re-tagged version of the texts from the Corpus del Español. |
| Corpus do Português | Public | 45 million | Portuguese, 1300s-1900s | 20 million words 1900s, including spoken, fiction, newspaper, and academic. Equally divided Brazil/Portugal. 10m 1800s, 15m 1300s-1700s. | Compare words, phrases, collocates, etc. in different historical periods, across genres/registers, and in different dialects. |
BYU-only (limited to on-campus use by BYU students and faculty)

| Name | Availability | Number of words | Dialect / time period | Content | Searches / architecture / interface |
| --- | --- | --- | --- | --- | --- |
| Oxford English Dictionary (OED) | BYU only | 37 million | Old English - 1900s | 2.2 million quotations in the Oxford English Dictionary. | Find the frequency of words, phrases, substrings, and constructions in each century since Old English. Can limit hits by frequency limits in any century. |
| EEBO / LION | BYU only | 700 million | 1500s-1900s | Early English Books Online (1500s-1600s; 350m words) and Literature Online (mainly 1700s-1800s; 350m words). | Basic interface to these corpora. Find the frequency by decade and century for words, phrases, and substrings. |
| LDS General Conferences | | 23 million | 1851-present | Every General Conference talk from 1851 to the current time. | Basic interface to these corpora. Find the frequency by decade for words, phrases, and substrings. |