The NOW corpus currently contains about billion words of data, and it grows by about 5-6 million words (or about 10,000 articles) each day. This means that it is current as of yesterday, unlike other "stale" corpora that were created 10-20 years ago.

One of the best features of the corpus is that you can track the frequency of a word or phrase in each 10-day period since January 2010 (i.e. almost every week). Tools like Google Trends show how much people are searching for a word over time, but we're not aware of any other resource that shows how frequently a word or phrase actually occurs in the news over time.
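The corpus interface does this per-period counting for you. The sketch below only illustrates the general idea behind such a frequency chart, under several assumptions that are not taken from the NOW corpus itself: each article carries a publication date, the 10-day periods run from days 1-10, 11-20, and 21 to the end of each month, and counts are normalized per million words (a common corpus convention). The function names `ten_day_bin` and `frequency_per_million` are hypothetical.

```python
import re
from collections import defaultdict
from datetime import date


def ten_day_bin(d: date) -> tuple:
    """Hypothetical binning: (year, month, 0|1|2) for days 1-10, 11-20, 21-end."""
    # Days 21-31 all fall into bin 2, so the last period of a month
    # may be 8-11 days long (an assumption of this sketch).
    return (d.year, d.month, min((d.day - 1) // 10, 2))


def frequency_per_million(texts, phrase: str) -> dict:
    """texts: iterable of (date, text) pairs.

    Returns {bin: occurrences of `phrase` per million words in that bin}.
    Matching is case-insensitive and whole-word; words are counted by
    splitting on whitespace (a deliberately simple tokenizer).
    """
    pattern = re.compile(r"\b" + re.escape(phrase.lower()) + r"\b")
    hits = defaultdict(int)
    words = defaultdict(int)
    for d, text in texts:
        key = ten_day_bin(d)
        lowered = text.lower()
        words[key] += len(lowered.split())
        hits[key] += len(pattern.findall(lowered))
    # Skip bins with no words to avoid dividing by zero.
    return {k: hits[k] / words[k] * 1_000_000 for k in sorted(words) if words[k]}


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    articles = [
        (date(2016, 11, 5), "The polls closed and results came in."),
        (date(2016, 11, 14), "Critics blamed fake news for the result."),
        (date(2016, 11, 18), "More fake news stories were shared."),
    ]
    for period, freq in frequency_per_million(articles, "fake news").items():
        print(period, round(freq, 1))
```

Normalizing per million words, rather than reporting raw counts, keeps periods comparable even when the amount of text collected differs from one period to the next.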

One of the most interesting uses of the corpus is to track political topics and see how often they are being reported. For example, consider the term "fake news", which has been talked about a lot recently (at least in the United States).

But when exactly did newspapers and magazines start talking about "fake news"? If you search for fake news in the corpus, you'll see that it spiked in the second half of 2016 ("2016-B" in the chart). Click on 2016-B and you'll see the frequency in each 10-day period from July to December 2016. Interestingly, there is almost no mention of "fake news" through the first ten days of November (Nov 1-10 in the chart); then it explodes in Nov 11-20 and has stayed very high since then.

What happened around November 10 that might explain why people all of a sudden started talking about something that had barely been mentioned until then?

Of course, it was the US election, which was held on November 8, 2016.

In other words, no one was really concerned about supposed "fake news" -- it wasn't on anyone's radar at all -- until a day or two after the election, when people were desperately trying to come up with some explanation for why Donald Trump had (so unexpectedly) won. And then this topic -- which had hardly been covered by anyone while it was supposedly happening in summer 2016 -- suddenly became the main topic of conversation in the mainstream media.

(Note that some people claim that interest in "fake news" was started by Trump supporters before the election, but the actual data show that this is incorrect.)


Note: lest anyone think that the analysis above is politically motivated, I should say upfront that I'm not much of a fan of Donald Trump. But... I'm not much of a fan of the "mainstream media" either, which I think is often biased in its reporting (both linguistically and otherwise). A corpus like this one -- which gathers data as it happens -- can hopefully help discourage the media from trying to throw past events down the "memory hole".


If you find other examples like this (where the frequency of a word offers interesting insight into politics, and into what does and doesn't get reported), please let me know.