The Collins WordbanksOnline English corpus is composed of 56 million words of
contemporary written and spoken text. To get a flavour of the type of
linguistic data that a corpus like this can provide, you can type in some
simple queries here and get a display of concordance lines from the corpus.
The query syntax allows you to specify word combinations, wildcards,
part-of-speech tags, and so on.
Note that output from this demo facility will be restricted to 40 lines of
concordance. The lines to be displayed will be selected on an every-Nth basis.
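The every-Nth selection can be pictured with a short Python sketch. The page does not say how N is chosen, so the step calculation below is an assumption, and the function name every_nth is made up for illustration.

```python
import math

def every_nth(hits, limit=40):
    """Return at most `limit` hits, evenly spaced through the full list.

    Assumption: N is the smallest step that keeps the sample within
    `limit`. The demo's real rule for choosing N is not documented here.
    """
    hits = list(hits)
    if len(hits) <= limit:
        return hits
    step = math.ceil(len(hits) / limit)  # this is the "N" in every-Nth
    return hits[::step]

# 1000 matching lines are thinned to lines 0, 25, 50, ...
sample = every_nth(range(1000))
```

Sampling at a fixed step, rather than taking the first 40 hits, spreads the displayed lines across the whole corpus instead of favouring whatever texts happen to come first.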
Note that output from this demo facility will be restricted to 100 collocates.
These will be the statistically most significant ones according to the score
you have selected.
A query is made up of one or more terms concatenated with a + symbol.
E.g. hell+hole would search for the word "hell" immediately followed by the
word "hole".
Terms may be made up of simple
alphabetic strings, optionally modified with a trailing asterisk or 'at'-symbol,
concatenated and separated by vertical bars, or followed by an oblique stroke
and a part-of-speech tag.
The plus may be followed by a number to indicate the maximum number of
intervening words. E.g. dog+4bark will search for "dog" followed by "bark"
with up to 4 words intervening.
An at-sign (@) appended to a string of letters causes the software to expand
the wordform preceding the @ symbol into a set of inflected forms. For example,
the query blew@+away will search for the set of words blow, blows, blowing,
blew followed by the word away.
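The expansion step can be sketched in Python as a table lookup. The table LEMMA_FORMS and the function expand_at are hypothetical: the real software presumably draws on a full morphological lexicon, while this toy table holds only the forms listed in the example above.

```python
# Toy inflection table (an assumption for illustration only).
LEMMA_FORMS = {
    "blow": ["blow", "blows", "blowing", "blew"],
}

def expand_at(term):
    """Expand 'blew@' into every form that shares a table entry with 'blew'.

    Terms without a trailing '@' are returned unchanged, as a one-item list.
    """
    if not term.endswith("@"):
        return [term]
    word = term[:-1]
    for forms in LEMMA_FORMS.values():
        if word in forms:
            return list(forms)
    return [word]  # unknown word: fall back to the literal form

forms = expand_at("blew@")
```

Note that any member of the set can be used before the @ symbol: in this sketch blow@ and blew@ expand to the same list.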
An asterisk appended to a string of letters indicates a wildcard match for all
characters at the end of a word. Be careful with this feature: in a large
corpus there are a surprising number of matching words for any given prefix
string. Using cut* to get instances of "cut", "cuts" and "cutting" is probably
a bad idea, since it also matches words such as "cute" and "cutlery".
Words (or wildcard words) can be strung together with vertical bars to match
an explicit set of words. E.g. cut|cuts|cutting
The corpus has been tagged automatically with a statistical tagger. You can
specify a search on word/TAG combinations by appending an oblique stroke and a
part-of-speech tag. POS tags must be in uppercase. Here are some major POS
tags:
NOUN   a macro tag: stands for any noun tag
VERB   a macro tag: stands for any verb tag
NN     common noun
NNS    noun plural
JJ     adjective
AT     definite and indefinite article
RB     adverb
VB     base-form verb
VBN    past participle verb
VBG    -ing form verb
VBD    past tense verb
Word sets, wildcards and part-of-speech tags can be combined within a term.
The vertical bar binds more tightly than the oblique stroke, so that
fool|fools|fooling|fooled/VERB matches these four words when any of them
occurs as a verb.
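The binding rule can be made concrete with a small Python sketch that splits a term at the oblique stroke first and only then at the vertical bars, so that the tag applies to every alternative. The names parse_term, term_matches and MACRO_TAGS are hypothetical. The macro expansions cover only the tags listed above, which is an assumption, since the full tagset is not given here. Wildcards and the @ symbol are left out to keep the example short.

```python
# Assumed expansion of the macro tags, limited to the tags listed on this page.
MACRO_TAGS = {
    "NOUN": {"NN", "NNS"},
    "VERB": {"VB", "VBN", "VBG", "VBD"},
}

def parse_term(term):
    """Split 'fool|fools/VERB' into (['fool', 'fools'], 'VERB').

    The tag is None when the term has no oblique stroke.
    """
    words, stroke, tag = term.partition("/")
    return words.split("|"), (tag if stroke else None)

def term_matches(term, word, pos):
    """True if a corpus token (`word` tagged `pos`) satisfies `term`."""
    alternatives, tag = parse_term(term)
    if word.lower() not in alternatives:
        return False
    if tag is None:
        return True
    # A macro tag stands for a set of tags; any other tag stands for itself.
    return pos in MACRO_TAGS.get(tag, {tag})

term = "fool|fools|fooling|fooled/VERB"
as_verb = term_matches(term, "fooled", "VBD")  # a verb use of "fooled"
as_noun = term_matches(term, "fool", "NN")     # a noun use of "fool"
```

Had the oblique stroke bound more tightly, the tag would attach only to "fooled", and the noun use of "fool" would slip through.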