corpora, size, queries = better resources, more insight


Updates (May 2016)

Created by Mark Davies, BYU.

These are the most widely used online corpora, with more than 130,000 distinct researchers, teachers, and students using them each month.


corpus                                          # words        language/dialect      time period
News on the Web (NOW)                           4.91 billion+  20 countries / Web    2010-yesterday
Global Web-Based English (GloWbE)               1.9 billion    20 countries / Web    2012-13
Wikipedia Corpus                                1.9 billion    English               -2014
Hansard Corpus                                  1.6 billion    British (parliament)  1803-2005
Corpus of Contemporary American English (COCA)  520 million    American              1990-2015
Corpus of Historical American English (COHA)    400 million    American              1810-2009
Corpus of US Supreme Court Opinions             130 million    American (law)        1790s-present
TIME Magazine Corpus                            100 million    American              1923-2006
Corpus of American Soap Operas                  100 million    American              2001-2012
British National Corpus (BYU-BNC)*              100 million    British               1980s-1993
Strathy Corpus (Canada)                         50 million     Canadian              1970s-2000s
CORE Corpus                                     50 million     Web registers         -2014
Corpus of LDS General Conference talks          25 million     (Religious)           1851-present
Other languages
Corpus del Español                              2.1 billion    Spanish               1200s-1900s
Corpus do Português                             1.1 billion    Portuguese            1300s-1900s
Google Books: American English                  155 billion    American              1500s-2000s
Google Books: British English                   34 billion     British               1500s-2000s
Google Books: Spanish                           45 billion     Spanish               1500s-2000s