The Collins WordbanksOnline English corpus is composed of 56 million words of contemporary written and spoken text. To get a flavour of the type of linguistic data that a corpus like this can provide, you can type in some simple queries here and get a display of concordance lines from the corpus. The query syntax allows you to specify word combinations, wildcards, part-of-speech tags, and so on.
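Note that output from this demo facility is restricted to 40 concordance lines. When a query has more matches than that, the lines displayed are selected on an every-Nth basis, i.e. every Nth match is shown, so the sample is spread across all the matches rather than taken from the start.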
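One way to picture the every-Nth selection is the sketch below. It is an assumption, not the demo's actual code: the function name is hypothetical, and how the real system picks N when the number of matches is not an exact multiple of 40 is not stated here, so this version simply steps through the matches with a stride of total/40.

```python
def every_nth_sample(lines, limit=40):
    """Return at most `limit` lines spread evenly through `lines`."""
    total = len(lines)
    if total <= limit:
        return list(lines)
    # Stride of total/limit; when total is a multiple of limit this picks
    # exactly every Nth line, with N = total // limit.
    return [lines[i * total // limit] for i in range(limit)]


matches = [f"concordance line {n}" for n in range(1000)]
sample = every_nth_sample(matches)
print(len(sample), sample[:2])
```

With 1,000 matches this returns every 25th line starting from the first; with 40 or fewer matches it returns them all.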
Note that output from this demo facility is also restricted to 100 collocates: the statistically most significant ones according to the score you have selected.
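Choosing the 100 collocates to show amounts to a top-k selection on whichever score was selected. A minimal sketch, assuming the scores have already been computed into a dictionary mapping each collocate to its score; the function name and the toy scores are hypothetical, and the scoring itself is not shown.

```python
import heapq


def top_collocates(scores, limit=100):
    """Return the `limit` highest-scoring (collocate, score) pairs, best first."""
    # heapq.nlargest avoids fully sorting a long collocate list.
    return heapq.nlargest(limit, scores.items(), key=lambda item: item[1])


# Toy scores for illustration only; not corpus data.
toy_scores = {"a": 1.5, "b": 4.0, "c": 2.5}
print(top_collocates(toy_scores, limit=2))  # [('b', 4.0), ('c', 2.5)]
```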
A query is made up of one or more terms concatenated with a + symbol. E.g. hell+hole would search for the word "hell" immediately followed by the word "hole".
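The basic + operator can be sketched as a search for consecutive tokens. This assumes the text is already split into a list of word tokens and that matching is exact string comparison; the function name and sample sentence are illustrative only, and gaps, wildcards and tags are ignored here.

```python
def find_sequence(tokens, query):
    """Return start positions where the '+'-joined words occur consecutively."""
    terms = query.split("+")
    n = len(terms)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == terms]


tokens = "a hell hole of a place".split()
print(find_sequence(tokens, "hell+hole"))  # [1]
```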
Each term is made up of one or more simple alphabetic strings. Each string may be modified with a trailing asterisk (*) or at-sign (@); several strings may be joined by vertical bars (|); and the term as a whole may be followed by an oblique stroke (/) and a part-of-speech tag.
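The shape of a single term can be summarised as a regular expression. This is a sketch under stated assumptions: "alphabetic" is taken to mean ASCII letters only (no hyphens, apostrophes or digits), and tags are taken to be runs of uppercase letters; the real parser may accept more.

```python
import re

# One alternative: letters, optionally ending in * (wildcard) or @ (inflections).
_ALT = r"[A-Za-z]+[*@]?"
# A term: alternatives joined by |, optionally followed by /TAG in uppercase.
TERM_RE = re.compile(rf"{_ALT}(?:\|{_ALT})*(?:/[A-Z]+)?")


def is_valid_term(term):
    """True if the whole string has the shape of a single query term."""
    return TERM_RE.fullmatch(term) is not None


for t in ["blew@", "cut|cuts|cutting", "fooled/VERB", "cut/nn", "cut**"]:
    print(t, is_valid_term(t))
```

The last two are rejected: the tag is not in uppercase, and only one trailing modifier is allowed per string.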
The plus may be followed by a number indicating the maximum number of intervening words. E.g. dog+4bark will search for "dog" followed by "bark" with up to 4 words intervening.
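A numbered plus can be sketched as a bounded gap between consecutive terms, with a plain + treated as a gap of 0. This assumes exact word matching on a pre-tokenised list; the function names are hypothetical and the real engine's matching strategy is not described.

```python
import re


def parse_query(query):
    """Split 'dog+4bark' into terms and the maximum gap before each later term."""
    parts = re.split(r"\+(\d*)", query)  # captured digits land between the terms
    terms = parts[0::2]
    gaps = [int(g) if g else 0 for g in parts[1::2]]
    return terms, gaps


def match_at(tokens, terms, gaps, pos):
    """True if terms[0] is at pos and each later term follows within its gap."""
    if tokens[pos] != terms[0]:
        return False
    if len(terms) == 1:
        return True
    # The next term may start right after pos, or up to gaps[0] words later.
    for nxt in range(pos + 1, min(pos + 2 + gaps[0], len(tokens))):
        if match_at(tokens, terms[1:], gaps[1:], nxt):
            return True
    return False


def search(tokens, query):
    """Return start positions of all matches of query in tokens."""
    terms, gaps = parse_query(query)
    return [i for i in range(len(tokens)) if match_at(tokens, terms, gaps, i)]


tokens = "the dog began to bark loudly".split()
print(search(tokens, "dog+4bark"))  # [1]: two words intervene
print(search(tokens, "dog+1bark"))  # []: only one intervening word allowed
```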
An at-sign (@) appended to a string of letters causes the software to expand the word form preceding the @ into its set of inflected forms. For example, the query blew@+away will search for any of the words blow, blows, blowing and blew followed by the word away.
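The @ expansion can be sketched as a lookup in an inflection table. The table below is hypothetical and holds only the forms given in the example above; the lexicon the real software uses is not described, and falling back to the literal form for unknown words is an assumption.

```python
# Hypothetical inflection table built from the example in the text.
INFLECTIONS = {"blow": ["blow", "blows", "blowing", "blew"]}
# Reverse index so any form (e.g. "blew") can find its base form.
FORM_TO_BASE = {form: base for base, forms in INFLECTIONS.items() for form in forms}


def expand_at(term):
    """Expand 'word@' into every inflected form of word; other terms match only themselves."""
    if not term.endswith("@"):
        return {term}
    form = term[:-1]
    base = FORM_TO_BASE.get(form)
    if base is None:
        return {form}  # assumption: unknown words fall back to the literal form
    return set(INFLECTIONS[base])


print(sorted(expand_at("blew@")))  # ['blew', 'blow', 'blowing', 'blows']
```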
An asterisk (*) appended to a string of letters is a wildcard that matches any characters at the end of a word. Be careful with this feature: in a large corpus a surprising number of words match any given prefix. Using cut* to get instances of "cut", "cuts" and "cutting" is probably a bad idea, since it will also match words such as "cute", "cutlery" and "cuticle".
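The over-matching problem is easy to see with a prefix match over a vocabulary. A minimal sketch; the vocabulary is a small hypothetical sample, whereas a real corpus vocabulary is far larger and would produce many more matches.

```python
def expand_wildcard(term, vocabulary):
    """Return the vocabulary words a term matches; a trailing '*' matches any ending."""
    if term.endswith("*"):
        prefix = term[:-1]
        return sorted(w for w in vocabulary if w.startswith(prefix))
    return [term] if term in vocabulary else []


# Small hypothetical vocabulary for illustration.
VOCAB = {"cat", "cut", "cuts", "cutting", "cute", "cutlery", "cuticle", "cutback"}
print(expand_wildcard("cut*", VOCAB))
# ['cut', 'cutback', 'cute', 'cuticle', 'cutlery', 'cuts', 'cutting']
```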
Words (or wildcard words) can be strung together with vertical bars to match an explicit set of words, e.g. cut|cuts|cutting.
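Matching against a bar-separated set can be sketched as trying each alternative in turn, treating a trailing asterisk as a prefix match. The function name is hypothetical; the sketch shows why cut|cuts|cutting avoids the over-matching of cut*.

```python
def matches_alternatives(term, word):
    """True if word matches any '|'-separated alternative in term.

    An alternative ending in '*' matches by prefix; others must match exactly.
    """
    for alt in term.split("|"):
        if alt.endswith("*"):
            if word.startswith(alt[:-1]):
                return True
        elif word == alt:
            return True
    return False


for w in ["cut", "cuts", "cutting", "cute"]:
    print(w, matches_alternatives("cut|cuts|cutting", w))
```

Only "cute" is rejected, unlike the cut* query.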
The corpus has been tagged automatically with a statistical tagger. You can specify a search on word/TAG combinations by appending an oblique stroke and a part-of-speech tag. POS tags must be in uppercase. Here are some major POS tags:
NOUN   a macro tag: stands for any noun tag
VERB   a macro tag: stands for any verb tag
NN     common noun
NNS    noun plural
JJ     adjective
AT     definite and indefinite article
RB     adverb
VB     base-form verb
VBN    past participle verb
VBG    -ing form verb
VBD    past tense verb
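A word/TAG search, including the NOUN and VERB macro tags, can be sketched over a list of (word, tag) pairs. Assumptions: the macro sets below contain only the tags listed above (the list is described as "some major" tags, so the real macros probably cover more), and the tagged sentence is a hand-tagged toy example, not tagger output.

```python
# Hypothetical macro expansions limited to the tags listed above.
MACRO_TAGS = {
    "NOUN": {"NN", "NNS"},
    "VERB": {"VB", "VBN", "VBG", "VBD"},
}


def tag_matches(query_tag, token_tag):
    """A macro tag matches any of its members; other tags must match exactly."""
    return token_tag in MACRO_TAGS.get(query_tag, {query_tag})


def search_word_tag(tagged_tokens, term):
    """Return positions of (word, tag) tokens matching 'word' or 'word/TAG'."""
    word, _, tag = term.partition("/")
    return [
        i
        for i, (w, t) in enumerate(tagged_tokens)
        if w == word and (not tag or tag_matches(tag, t))
    ]


# Toy sentence, tagged by hand for illustration.
tagged = [("the", "AT"), ("cut", "NN"), ("was", "VBD"), ("cut", "VBN")]
print(search_word_tag(tagged, "cut/NOUN"))  # [1]
print(search_word_tag(tagged, "cut/VERB"))  # [3]
```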
Word sets, wildcards and
part-of-speech tags can be combined within a term. The vertical bar binds more
tightly than the oblique stroke, so that fool|fools|fooling|fooled/VERB matches these four words when
any of them occurs as a verb.
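The precedence rule can be sketched by splitting off the oblique stroke first and only then splitting on the bars, so the tag applies to the whole group. The function names are hypothetical, and the VERB macro is again limited to the verb tags listed above, which is an assumption.

```python
# Hypothetical: only the tags listed above; the real macros may cover more.
MACRO_TAGS = {"VERB": {"VB", "VBN", "VBG", "VBD"}, "NOUN": {"NN", "NNS"}}


def parse_term(term):
    """Split a term into (alternatives, tag).

    The stroke is split off first, so the tag applies to every
    bar-separated alternative: the bar binds more tightly than the stroke.
    """
    words, _, tag = term.partition("/")
    return words.split("|"), (tag or None)


def word_matches(alt, word):
    """Exact match, or prefix match if the alternative ends in '*'."""
    if alt.endswith("*"):
        return word.startswith(alt[:-1])
    return word == alt


def token_matches(term, word, tag):
    """True if a tagged token (word, tag) satisfies the term."""
    alternatives, wanted = parse_term(term)
    if not any(word_matches(alt, word) for alt in alternatives):
        return False
    if wanted is None:
        return True
    return tag in MACRO_TAGS.get(wanted, {wanted})


term = "fool|fools|fooling|fooled/VERB"
print(parse_term(term))  # (['fool', 'fools', 'fooling', 'fooled'], 'VERB')
print(token_matches(term, "fooling", "VBG"))  # True
print(token_matches(term, "fool", "NN"))      # False: a noun, not a verb
```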