English-Corpora.org


Corpus   Size   Countries  Time         Genre
IWEB     13.9b  6          2017         Web
NOW      16.2b  20         2010-now     Web: News
CORONA   1.58b  20         2020-now     Web: News
GLOWBE   1.9b   20         2012-13      Web/blogs
WIKI     1.9b   (+)        2014         Wikipedia
COCA     1.0b   Am         1990-2019    Balanced
COHA     400m   Am         1810-2009    Balanced
TV       325m   6          1950-2018    TV shows
MOVIES   200m   6          1930-2018    Movies
SOAP     100m   Am         2001-2012    TV shows
HANSARD  1.6b   Br         1803-2005    Parliament
EEBO     755m   Br         1470s-1690s  Various
SUP CRT  130m   Am         1790s-2010s  Legal
TIME     100m   Am         1923-2006    Magazine
BNC      100m   Br         1980s-1993   Balanced
CAN      50m    Can        1970s-2000s  Balanced
CORE     50m    6          2014         Web
 


These are the most widely used online corpora. Teachers and researchers at universities throughout the world use them for many different purposes. In addition, the corpus data (e.g. full-text data, word frequency lists) has been used by a wide range of companies in many fields, especially technology and language learning.

The links below are for the free online interface. You can also purchase and download the corpora for use on your own computer.

Corpus                                          # words        Dialect       Time period        Genre(s)
News on the Web (NOW)                           18.8 billion+  20 countries  2010-yesterday     Web: News
iWeb: The Intelligent Web-based Corpus          14 billion     6 countries   2017               Web
Global Web-Based English (GloWbE)               1.9 billion    20 countries  2012-13            Web (incl blogs)
Wikipedia Corpus                                1.9 billion    (Various)     2014               Wikipedia
Coronavirus Corpus                              1.5 billion    20 countries  Jan 2020-Dec 2022  Web: News
Corpus of Contemporary American English (COCA)  1.0 billion    American      1990-2019          Balanced
Corpus of Historical American English (COHA)    475 million    American      1820-2019          Balanced
The TV Corpus                                   325 million    6 countries   1950-2018          TV shows
The Movie Corpus                                200 million    6 countries   1930-2018          Movies
Corpus of American Soap Operas                  100 million    American      2001-2012          TV shows

Hansard Corpus                                  1.6 billion    British       1803-2005          Parliament
Early English Books Online                      755 million    British       1470s-1690s        (Various)
Corpus of US Supreme Court Opinions             130 million    American      1790s-present      Legal opinions
TIME Magazine Corpus                            100 million    American      1923-2006          Magazine
British National Corpus (BNC) *                 100 million    British       1980s-1993         Balanced
Strathy Corpus (Canada)                         50 million     Canadian      1920s-2000s        Balanced
CORE Corpus                                     50 million     6 countries   2014               Web

From Google Books n-grams (compare)
American English                                155 billion    American      1500s-2000s        (Various)
British English                                 34 billion     British       1500s-2000         (Various)