The Corpus of Historical American English (COHA) is the largest structured
corpus of historical English. The corpus was created by Mark Davies of
Brigham Young University, with generous funding from the US National
Endowment for the Humanities. It is also related to other large corpora of
English that we have created.
COHA allows you to quickly and easily search more than 400 million words of
American English text from 1810 to 2009. You can see how words, phrases, and
grammatical constructions have increased or decreased in frequency, how words
have changed meaning over time, and how stylistic changes have taken place in
the language. It offers a lot more than just frequency charts for individual
words and phrases (as with Google Books / Culturomics), although those types
of searches can be done here as well, and they yield essentially the same
results as Google Books.
The following are just a small sample of the queries that are possible, but
they should give you some idea of what the corpus can do. As you click on the
links below, pay attention to how the search form on the left has been filled
out, and then feel free to modify the form to find what you are interested in.
- The overall frequency over time of words and phrases related to changes in
  society and culture, or to historical events, such as emancipation,
  steamship, telegraph, flapper*, fascis*, teenage*, communis*, and global
  warming.
- Changes in the language itself, such as the rise and fall of words and
  phrases like:
  - Decrease since the 1800s: bosom, grieved, bestow*, beauteous, fellow,
    sublime, lad, many a time, of no little, and for (conj)
  - An increase and then a decrease: anyhow, mustn't, naughty, as though to,
    don't know as (=that), far-out, swell (adj), and lousy
  - An increase to the present time: a lot of, guys, unleash, sexual, calm
    down, screw up, freak out, mommy, and frustrating
- You can also search for changes in grammatical constructions like end up
  V-ing, going to V, V PRON into V-ing (e.g. talked them into going), phrasal
  verbs with up (e.g. make up, show up), post-verbal negation with need
  (needn't mention), the get passive (get hired), sentence-initial hopefully,
  semi-modals like need to and have to, and the rise (and possible recent
  decrease) of the progressive passive (e.g. was being considered).
- You might also look for "stylistic constructions" (half lexical, half
  grammatical), which really do give the flavor of a different time period.
  Examples from the 1800s that have decreased since then include: so ADJ as
  to V (so good as to show me), be but (they are but the last examples), have
  quite V-ed (until she had quite finished), NOUN be that of (her dress was
  that of a beggar), and a most ADJ NOUN (a most helpful child). Only a very
  large corpus like COHA would have enough tokens of specific constructions
  like these.
- Parts of words (which show how roots, prefixes, and suffixes have been used
  in other words over time), such as -heart- (compare earlier and later),
  home- (earlier/later), -able adjectives (earlier/later), -ware
  (earlier/later), and -free (earlier/later).
- You can also have the corpus generate a list of words that were used more
  in one period than in another, even when you don't know in advance what
  those words might be. For example, you can compare verbs in the 1970s-2000s
  (left) to the 1930s-1960s (right), adjectives in the 1970s-2000s (left) to
  the 1930s-1960s (right), or -ly adverbs in the 1900s (left) to the 1800s
  (right).
- The corpus can also help show how the meaning or usage of words has changed
  over time, by looking at changes in collocates (co-occurring words). For
  example, the collocates of sexual, gay, chip, engine, and web have changed
  over time. Notice also how this can signal cultural changes, such as the
  nouns used with woman in the 1930s-50s compared to the 1960s-80s, or the
  nouns used with problem from 1920 to the present (left) compared to
  1810-1920 (right).
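The period comparisons and collocate searches described above rest on simple counting. The following is a minimal sketch, not COHA's actual implementation: it assumes tiny hypothetical token lists standing in for two periods, normalizes raw counts to frequency per million words so that periods of different sizes can be compared, ranks words by the ratio of those normalized frequencies, and counts collocates within a fixed window. The function names, the window size, and the smoothing constant are illustrative assumptions.

```python
from collections import Counter


def per_million(count, total_tokens):
    """Normalize a raw count to frequency per million words."""
    return count * 1_000_000 / total_tokens


def compare_periods(tokens_a, tokens_b, min_count=1, smoothing=0.5):
    """Rank words by how much more frequent they are in period A than in B.

    Returns (word, ratio) pairs, highest ratio first, ties broken
    alphabetically. `smoothing` (an assumption, not a documented COHA
    setting) is added to B's raw count so words absent from B do not
    cause division by zero.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    ratios = []
    for word, count_a in counts_a.items():
        if count_a < min_count:
            continue
        freq_a = per_million(count_a, len(tokens_a))
        freq_b = per_million(counts_b[word] + smoothing, len(tokens_b))
        ratios.append((word, freq_a / freq_b))
    ratios.sort(key=lambda pair: (-pair[1], pair[0]))
    return ratios


def collocates(tokens, node, window=4):
    """Count words within `window` tokens on either side of each `node`."""
    found = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                found[tokens[j]] += 1
    return found


if __name__ == "__main__":
    # Hypothetical toy "periods"; a real corpus would be far larger.
    early = "the lad did bestow a most kind gift upon the lad".split()
    late = "the guys did freak out and the guys did calm down".split()
    print(compare_periods(early, late)[:3])
    print(collocates(late, "guys", window=2).most_common(3))
```

Per-million normalization matters because the decades in a historical corpus differ in size; comparing raw counts would favor whichever period happens to contain more text.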
Please feel free to take a five-minute guided tour, which will show the major
features of the corpus. A simple click for each query will automatically fill
in the form for you, search through more than 400 million words of text, and
then display the results.