Exact number of words (tokens): 414,771,808

Statistics on types (unique word forms)
 
Tokens per type (>=)   Not case sensitive   Case sensitive
1                      2,805,451            3,298,943
2                        904,912            1,122,436
3                        605,712              759,136
4                        476,398              599,855
5                        401,786              507,058
6                        352,967              445,906
7                        316,952              400,970
8                        289,423              366,429
9                        267,095              338,330
10                       249,013              315,654
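
Each row of the table counts the types whose token frequency is at least the given threshold, once with case folded and once without. A minimal sketch of that computation follows. It is not the corpus's own pipeline: the whitespace tokenization, the use of lowercasing as the case-insensitive comparison, and the name `types_at_or_above` are all assumptions made for illustration.

```python
from collections import Counter


def types_at_or_above(tokens, thresholds, case_sensitive=True):
    """Return {k: number of types that occur at least k times}.

    Hypothetical helper: lowercasing stands in for whatever case
    folding the corpus actually applies.
    """
    if not case_sensitive:
        tokens = (t.lower() for t in tokens)
    freq = Counter(tokens)  # type -> number of tokens of that type
    return {k: sum(1 for c in freq.values() if c >= k) for k in thresholds}


# Toy input, split on whitespace (an assumption about tokenization).
sample = "The cat saw the other cat and The Cat ran".split()

case_sensitive_counts = types_at_or_above(sample, [1, 2, 3], case_sensitive=True)
case_insensitive_counts = types_at_or_above(sample, [1, 2, 3], case_sensitive=False)

print(case_sensitive_counts)    # {1: 8, 2: 2, 3: 0}
print(case_insensitive_counts)  # {1: 6, 2: 2, 3: 2}
```

At threshold 1 the result is simply the total number of types, so the case-sensitive figure can never be smaller there: forms such as "The" and "the" count separately. At higher thresholds folding case can push a merged type over the cutoff, as the toy sample shows at threshold 3.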

Download spreadsheet with number of types for tokens = 1-100, with chart

Download list of all words [types] that occur at least four times in the corpus, with part of speech

Download file with information for each text: tokens, types, avg. word length, # nouns, # tokens