This site contains downloadable, full-text corpus data from five large corpora of English (NOW, Wikipedia, COCA, COHA, and GloWbE), as well as the Corpus del Español. The data is being used at hundreds of universities throughout the world, as well as in a wide range of companies.

These corpora provide important insight into variation that is not available from other sources, and in many cases they are 50 to 100 times as large as comparable corpora. (More information on the strengths of each corpus...)

With this full-text data, you have the actual corpora on your computer, and you can search the data in any way that you'd like. The data for each corpus comes in three different formats: data for relational databases, word/lemma/PoS, and words (paragraph format). When you purchase the data, you purchase the rights to any and all of these formats.
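As a rough illustration of searching the words (paragraph format) data yourself, here is a minimal keyword-in-context sketch in Python. It assumes only what the samples on this page show: tokens separated by spaces and `<p>` marking paragraph breaks. The function names `split_paragraphs` and `kwic` are hypothetical, and the layout of the actual downloaded files may differ from this.

```python
# Text copied from the COCA newspaper sample on this page.
SAMPLE = (
    "The protesters here certainly know what they do n't like : war , "
    "globalization , capitalism , drug laws , immigrant detention centers , "
    "a high-speed train line and , inexplicably , the Olympic torch . <p> "
    "\" This is a discussion of war , \" said Claudio Robba , 25 , "
    "one of maybe 150 protesters at a piazza ,"
)


def split_paragraphs(text):
    """Split on <p> markers (assumed paragraph delimiter); return token lists."""
    return [part.split() for part in text.split("<p>") if part.strip()]


def kwic(text, keyword, width=3):
    """Return (left, hit, right) tuples for each case-insensitive token match."""
    hits = []
    target = keyword.lower()
    for tokens in split_paragraphs(text):
        for i, token in enumerate(tokens):
            if token.lower() == target:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


# Print each hit with its left context right-aligned.
for left, hit, right in kwic(SAMPLE, "war"):
    print(f"{left:>20} [{hit}] {right}")
```

Because punctuation and clitics such as "n't" are already separate tokens in the samples, a plain whitespace split is enough for this kind of search. Context windows here do not cross paragraph boundaries.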

See samples of each corpus (the samples are 1.7 to 3.6 million words each).

COCA: 440 million words | 190,000 texts | 1990-2012 | evenly divided (~88 million words each) into spoken, fiction, magazine, newspaper, academic

FICTION: Trees were swaying , though gently , and their leaves were rustling as if in applause to the change in the weather . <p> This had been going on for several days . The men and women who gauge the climate on television were exultant over the unusual run of good weather as if it was they who had brought it on .

MAGAZINE: The ability to approach every hunting situation with the same kind of open mind and without pre-conceived notions about " how it 's supposed to be done " is a trait all of the best deer hunters share . Plus , it 's a lot of fun to pull off-the-wall stunts that actually work in special situations .

NEWSPAPER: The protesters here certainly know what they do n't like : war , globalization , capitalism , drug laws , immigrant detention centers , a high-speed train line and , inexplicably , the Olympic torch . <p> " This is a discussion of war , " said Claudio Robba , 25 , one of maybe 150 protesters at a piazza , 

ACADEMIC: Synthesizing knowledge of the connections between above-surface and below-surface biodiversity was considered a priority to be addressed at a second workshop , since it would help to yield information on keystone species and interactions in ecosystem processes assess the extent of species 

SPOKEN: @SUMMIT It should be a C-note. @Mr._CARY_ANDERSON That's it. @Mr_ ANDERSON Oh, very good. See, you didn't have to get nervous, Mr. Cronick. You were really very good at it. @SUMMIT All right. @Mr._ANDERSON You -- you were coming fast and furious here. It was great. I could sleep. @

NOW: (Currently) 3.4 billion words | 6,000,000 texts | web pages | 20 different countries | Growing by 4-5 million words each day.

United States (Dec 2016, Gizmodo): Just when you thought it was safe to go back in the water ... it totally was. While certainly dramatic, an image of a breaching great white shark currently making the rounds on Twitter as National Geographic’s “Photo of the Year” has nothing to do with the magazine. In fact, it isn’t even a photo.

Great Britain (Dec 2016, Guardian): Akhtar says she is “sick of being told I’m not getting on with people who are not like me” and admits to deliberately going out of her way to change that perception. “You know what I’ve been doing recently?” she asks. “When I’m in a shopping centre, if a see a white person sat on a bench

Australia (Dec 2016, news.com.au): “It is soooooo heavy (this is just the top section) and made of recycled branches [and] our decorations are from @countryroad,” she wrote. Within 24 hours the photo has clocked up 10,000 likes and almost 300 comments, with many slamming the “tree” as an “epic fail”.

India (Dec 2016, Siasat.com): ”Naseer saab was not promoting the film and Arshad was not there for all the promotions. But I went everywhere in that synthetic sari, promoting the film. But I was happy that I was doing all I could to get as many people as possible into the theatres,” she said. Vidya Balan added, ”And then, much later, after several films,

Wikipedia: 1.8 billion words | 4.4 million texts

Toyota Camry: Coil spring independent suspension features by way of a MacPherson strut type with stabilizer and strut bar up front, and a MacPherson rear setup with parallel lower arms. Steering uses a rack and pinion design; braking hardware is made up of front ventilated discs and rear drums with a double proportioning valve to suppress lock-up.

Basilisk: The basilisk appears in Harry Potter and the Chamber of Secrets as the monster inside the Chamber of Secrets. Some characteristics of the beast are similar to the rest of the mythos: the basilisk is considered king of serpents, and its gaze kills. But the basilisk in JK Rowling's work is also said to be of gigantic size

Computational phylogenetics: Traditional phylogenetics relies on morphological data obtained by measuring and quantifying the phenotypic properties of representative organisms, while the more recent field of molecular phylogenetics uses nucleotide sequences encoding genes or amino acid sequences encoding proteins

Duke Ellington: Ellington had to increase from a six to eleven-piece group to meet the requirements of the Cotton Club's management for the audition, and the engagement finally began on December 4. With a weekly radio broadcast, the Cotton Club's exclusively white and wealthy clientele poured in nightly to see them. At the Cotton Club, 

Corpus del Español (Web/Dialects): 1.8 billion words | 1,800,000 texts | web pages | ~ 60% blogs | 21 Spanish-speaking countries

Argentina (blog): Que no me guste lo que hace en este momento es otra cosa. Otras cosas que hace, como su labor solidaria, las donaciones que hace en Bolívar, etc, son menos conocidas que el resto de las cosas que se publican, y son totalmente encomiables. Noooo Alice, no me banco a Tinelli. Me parece super falluto mal.

México (general): el chamaco tiene fuertes probabilidades de heredar algún trastorno. Pero, como dice ella, apenas lo detectemos, lo llevamos a el psicólogo y a el psquiatra y lo tratamos. Un abrazo. Hoy fui a el psiquiatra y me dio una de las noticias mas tristes, pero la mas sensata: Señora, usted NO debe tener más hijos, porque de seguro

Puerto Rico (blog):  nos detuvimos a pensar sobre el porvenir en un diálogo decisivo mi esposo me preguntó: ¿ Qué tú deseas realmente? ¿ Dar a luz o ser mamá? Yo me he preguntado lo mismo en muchas ocasiones durante todos estos años y he llegado a la conclusión de que quiero ser padre. No importa la forma, quiero ser papá.

España (general): all sistema operativo (impidiendo su ejecución normal) o de borrar completamente la información almacenada en el disco duro. Aunque estas acciones pueden hacer que su ordenador deje de funcionar correctamente, no representan un daño físico irreparable. Por último, tenga en cuenta que aunque en la actualidad

GloWbE: 1.8 billion words | 1,800,000 texts | web pages | ~ 60% blogs | United States, Great Britain, Australia, India, and 16 other countries

United States (blog): So , my last blog post was about going out with a guy who really " got " me . He was cool with all the projects I do . Well , I did see him again later that week and let 's just say : Things got a little creepy . <p> He plays softball , and was going to come over after a game . He also loves horror movies and

Great Britain (general): Returning in 2012 with his fourth artist album , " The Agony & The Ecstasy " , High Contrast is set to reinstate his reputation at the top table with this superb twelve-track long player . <p> High Contrast himself describes The Agony & The Ecstasy as more personal than any other album he 's made 

Australia (blog): i want build a little biz to be the place to ' get a little help as you build your little biz . ' <p> but i also want to send a strong message about the people i want to help . people who want to ... <p> be awesome . do what you love . kick ass . <p> i have these words posted clear above my desk as a reminder to myself too .

India (general): The word apavitra anna refers to food that is unacceptable for a Vaisnava . In other words , a Vaisnava can not accept any food offered by an avaisnava in the name of maha-prasada . This should be a principle for all Vaisnavas . When asked , " What is the behavior of a Vaisnava ? " Sri Caitanya Mahaprabhu replied , "

COHA: 385 million words | 115,000 texts | 1810-2009 | each decade has roughly the same balance of fiction, popular magazine, newspaper, and non-fiction books

1820s (fiction): whom he called his good genius , and his elevated character and rare purity entitled him to this distinction . The influence of his virtues and affection might , perhaps , have preserved Henry from the errors of his after life , but their opportunities of intercourse were rare and brief

1870s (non-fiction books): In which opinion all physiologists will join . As I have said , hysterical women certainly do exhibit a marked ability to go without both food and drink . I have had patients abstain from sometimes one , sometimes the other , and sometimes both , for periods varying from one day to eleven

1910s (newspaper): The Greenwich Equal Franchise League , composed entirely of New York society women , had taken part in the selection of candidates , and , deciding that one of their own sex could not be elected , advised their husbands and brothers whom to nominate .

1960s (magazine): " And when you have fifty bucks ' worth of candles , you have to light ' em , because they look so beautiful . " They both laughed . " The hippie thing was misinterpreted by the people of this country , " Guthrie said , a moment later . " The hippies were saying , ' Love everybody , ' but