corpus.byu.edu


SPEED

For very large corpora, Sketch Engine is just about the fastest corpus architecture available. Our architecture, however, is even faster: about six times as fast, on average, for "string searches" like those shown below. With GloWbE, for example, you might spend 5 minutes doing a series of searches, whereas the same searches would take about 30 minutes in total (25 more minutes of waiting for results) in a similar-sized corpus in Sketch Engine.

The following data is based on the 1.9 billion word GloWbE corpus and a 2.7 billion word corpus in Sketch Engine (enTenTen08 = 3.3 billion tokens, including punctuation, etc.). Since enTenTen08 is about 50% larger (2.7 vs 1.9 billion words), each search should take about 50% longer there. In fact, it takes much longer than that. For example, the first search shown below, [have] quite [vvn*], takes about 2.6 seconds in GloWbE. Allowing for the 50% larger size of enTenTen08, it should take about 3.9 seconds there. Instead, it takes about 25 seconds: 11 seconds for the concordance lines (SE1) plus 14 seconds to find and sort the node words (SE2). That is about 6-7 times as slow as GloWbE.

Note: click on any link on this page to see the corpus data, and then click on "RETURN" in the upper right-hand corner of the corpus to come back to this page.

 
GloWbE query                Sketch Engine query (enTenTen08)                                           GloWbE (s)  SE1 (s)  SE2 (s)  Faster (x)
[have] quite [vvn*]         [lemma = "have"] [word = "quite"] [tag = "VVN"]                            2.6         11       14       6.4
several [nn*]               [word = "several"] [tag = "NN."]                                           3.3         12       75       17.6
I [vv*] if                  [word = "I"] [tag = "VV."] [word = "if"]                                   5.7         24       29       6.2
just [vv*] [p*] [vv*] that  [word = "just"] [tag = "VV."] [tag = "PP$"] [tag = "VV."] [word = "that"]  5.5         36       5        5.0
[j*] places                 [tag = "AJ"] [word = "places"]                                             3.6         14       31       8.3
in no [nn*]                 [word = "in"] [word = "no"] [tag = "NN."]                                  4.9         14       7        2.9
to only [v*]                [word = "to"] [word = "only"] [tag = "VV."]                                5.0         21       5        3.5
[vv*] [p*] into [v?g*]      [tag = "VV."] [tag = "PP"] [word = "into"] [tag = "V.G"]                   5.0         30       8        5.1
[r*] [vv*] whether          [tag = "RB"] [tag = "VV."] [word = "whether"]                              3.0         26       14       8.9
[go] [j*]                   [lemma = "go"] [tag = "JJ"]                                                6.8         14       28       4.1

Average: 6.8 x
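
The "Faster (x)" figures can be checked with a few lines of Python. This is only a sketch. It assumes that each figure is the total Sketch Engine time (SE1 + SE2) divided by the GloWbE time scaled up by 50% for corpus size, as in the worked example for [have] quite [vvn*] (25 / 3.9 = 6.4). The names TIMINGS, SIZE_FACTOR and speedup are made up for this illustration and are not part of either corpus interface.

```python
# Timings in seconds, copied from the table above: (GloWbE, SE1, SE2).
TIMINGS = {
    "[have] quite [vvn*]": (2.6, 11, 14),
    "several [nn*]": (3.3, 12, 75),
    "I [vv*] if": (5.7, 24, 29),
    "just [vv*] [p*] [vv*] that": (5.5, 36, 5),
    "[j*] places": (3.6, 14, 31),
    "in no [nn*]": (4.9, 14, 7),
    "to only [v*]": (5.0, 21, 5),
    "[vv*] [p*] into [v?g*]": (5.0, 30, 8),
    "[r*] [vv*] whether": (3.0, 26, 14),
    "[go] [j*]": (6.8, 14, 28),
}

# Assumption: enTenTen08 is treated as about 50% larger than GloWbE.
SIZE_FACTOR = 1.5


def speedup(glowbe, se1, se2, size_factor=SIZE_FACTOR):
    """Total Sketch Engine time divided by the size-adjusted GloWbE time."""
    return (se1 + se2) / (glowbe * size_factor)


# One ratio per query, rounded to one decimal place as in the table.
ratios = {query: round(speedup(*times), 1) for query, times in TIMINGS.items()}
average = round(sum(ratios.values()) / len(ratios), 1)

for query, ratio in ratios.items():
    print(f"{query:<28}{ratio:>6.1f}")
print(f"{'Average':<28}{average:>6.1f}")
```

Under that assumption, the script reproduces each value in the "Faster (x)" column and the 6.8 x average.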