corpus.byu.edu

corpora, size, queries = better resources, more insight


Updates (May 2016)
Created by Mark Davies, BYU. Overview, search types, looking at variation, corpus-based resources, updates.

The most widely used online corpora, used by more than 130,000 distinct researchers, teachers, and students each month.
 

English

Corpus | # words | Language/dialect | Time period
NOW Corpus (NEW) | 2.8 billion+ | 20 countries / Web | 2010-yesterday
Global Web-Based English (GloWbE) | 1.9 billion | 20 countries / Web | 2012-13
Wikipedia Corpus | 1.9 billion | English | -2014
Hansard Corpus (British Parliament) | 1.6 billion | British | 1803-2005
Corpus of Contemporary American English (COCA) | 520 million | American | 1990-2015
Corpus of Historical American English (COHA) | 400 million | American | 1810-2009
TIME Magazine Corpus | 100 million | American | 1923-2006
Corpus of American Soap Operas | 100 million | American | 2001-2012
British National Corpus (BYU-BNC)* | 100 million | British | 1980s-1993
Strathy Corpus (Canada) | 50 million | Canadian | 1970s-2000s
CORE Corpus (NEW) | 50 million | Web registers | -2014

Other languages

Corpus del Español | 100 million | Spanish | 1200s-1900s
Corpus do Português | 45 million | Portuguese | 1300s-1900s

N-grams

Google Books: American English | 155 billion | American | 1500s-2000s
Google Books: British English | 34 billion | British | 1500s-2000s
Google Books: One Million Books | 89 billion | Am/Br | 1500s-2000s
Google Books: Spanish | 45 billion | Spanish | 1500s-2000s