In August 2012 we re-tagged the BNC and updated the CLAWS tagset from CLAWS 5 (C5) to CLAWS 7 (C7), the same tagset that is used for COCA, COHA, TIME, and SOAP. A partial list of the more important changes is provided below, as well as links to conversion tables between the two tagsets.


Note that if you "re-use" any of your older queries (via the history function after logging in), they are automatically converted from C5 to C7, which should help to ease the transition. If you find any codes that aren't being mapped correctly, please let me know.


Why the change? Just as computer technology changes and new operating systems are introduced, tagsets also evolve. The C7 tagset has several advantages over the C5 tagset, which was created way back in the 1990s. More importantly, using the C7 tagset allows better comparisons between the corpora at this site, which is an important consideration for us. (For example, one click will re-run a BNC query in COCA, or vice versa, which allows powerful comparisons between the two corpora.)


If you absolutely need to use the older C5 tagset, you might consider any of the other interfaces to the BNC, all of which still use the older C5 tagset: BNCweb (aka CQPweb), Sketch Engine, VISL, Phrases in English, Just the Word, SARA, XAIRA, the actual BNC site, etc.

The main differences between the two tagsets are shown below. Please check the CLAWS 7 page for other PoS codes. You can also see a page showing correspondences between the two tagsets.

Old form (CLAWS5)   New form (CLAWS7)   Example
[aj*]               [j*]                green
[av*]               [r*]                quickly
[pn*]               [p*]                some
[pnp*]              [pp*]               she
[pr*]               [i*]                from
[dps*]              [app*]              my
[vvb*]              [vv0*]              walk
[crd*]              [mc*]               two
[ord*]              [md*]               second
[pu*]               [y*]                ,
[itj*]              [uh*]               umm
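
If you keep your own saved queries outside of this site, the correspondences in the table above can be applied with a simple lookup. The following is only a rough sketch in Python, not the conversion that this site actually runs: the names C5_TO_C7 and convert_query are made up for illustration, and the mapping covers only the wildcard forms listed above, so any other code is left unchanged.

```python
import re

# CLAWS5 -> CLAWS7 wildcard prefixes, copied from the table above.
# Assumption: only these eleven forms are handled; anything else passes through.
C5_TO_C7 = {
    "aj": "j",
    "av": "r",
    "pn": "p",
    "pnp": "pp",
    "pr": "i",
    "dps": "app",
    "vvb": "vv0",
    "crd": "mc",
    "ord": "md",
    "pu": "y",
    "itj": "uh",
}

# Matches bracketed wildcard tags such as [aj*] or [vvb*].
TAG_PATTERN = re.compile(r"\[([A-Za-z0-9]+)\*\]")


def _convert_tag(match):
    """Return the C7 form of one bracketed tag, or the original if unknown."""
    prefix = match.group(1).lower()
    new_prefix = C5_TO_C7.get(prefix)
    if new_prefix is None:
        return match.group(0)  # not in the table: leave the tag as it was
    return f"[{new_prefix}*]"


def convert_query(query):
    """Rewrite every listed C5 wildcard tag in a query string to its C7 form."""
    return TAG_PATTERN.sub(_convert_tag, query)


if __name__ == "__main__":
    print(convert_query("[aj*] [pnp*] walk"))  # prints: [j*] [pp*] walk
```

Because each bracketed prefix is looked up as a whole, [pnp*] maps to [pp*] rather than being caught by the shorter [pn*] rule. For anything not in the table, please check the full correspondence page rather than extending this sketch by guesswork.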