A corpus is a large collection of texts which is used for studying a language.
For example, Google has scanned books in English published between 1500 and 2019. The data from these books can tell us how common certain words and phrases are now (and were in the past).
You can see this for yourself at Google Ngram Viewer. Perhaps you are not sure if the usual plural of corpus is corpora or corpuses. Enter corpora,corpuses in the search box and you will see which word is more common:
Corpora can also tell us which words often go together (collocations). For example, which of the four adjectives major, serious, big and large is best to use with problem? A comparison suggests major or serious. While big seems OK, large is so rare that it is probably wrong:
Another way is to ask for the most common adjectives used with problem:
See the sections on Google Ngram Viewer, SKELL and English-Corpora.org for instructions on using these online tools.
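If you would like to see the idea behind collocation counts, here is a minimal Python sketch. It counts which words come directly before problem in a few sentences. The sample text is invented for illustration only (it is not real corpus data), and the function name words_before is our own; a real corpus tool works on millions of sentences and also uses grammatical information.

```python
import re
from collections import Counter

# Invented sample text, for illustration only (not real corpus data).
text = (
    "This is a serious problem. A major problem remains. "
    "It was a serious problem for the team. We found a big problem."
)

def words_before(target, text):
    """Count the words that appear immediately before `target`."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair each token with the one that follows it, keep pairs ending in target.
    return Counter(prev for prev, word in zip(tokens, tokens[1:]) if word == target)

counts = words_before("problem", text)
print(counts.most_common())
```

In this tiny sample, serious appears twice before problem, while major and big appear once each. The online tools do the same kind of counting on a much larger scale.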
Exploring corpora with Google Ngram Viewer, SKELL and English-Corpora.org
Background music is ‘Effects of Elevation’ from the album Effects of Elevation by Revolution Void, licensed under an Attribution Licence.
Google Ngram Viewer
To compare the frequency of words or phrases, enter them in the search box, separated by commas.
Which of these words is more common: undeniable or indubitable? (We have set the years to 2000-2019.)
Clearly undeniable is more common.
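If you want to save or share a comparison, the search can also be written as a web address. The Python sketch below builds one for the search above. It assumes that the Ngram Viewer accepts the parameters content, year_start and year_end in the address, which is how its addresses look at the time of writing but may change; the function name ngram_url is our own.

```python
from urllib.parse import urlencode

# Assumed address of the Ngram Viewer graph page.
NGRAM_BASE = "https://books.google.com/ngrams/graph"

def ngram_url(terms, year_start, year_end):
    """Return an Ngram Viewer address comparing the given words or phrases."""
    query = {
        # Words or phrases are separated by commas, as in the search box.
        "content": ",".join(terms),
        "year_start": year_start,
        "year_end": year_end,
    }
    # urlencode escapes commas and spaces so the address is valid.
    return f"{NGRAM_BASE}?{urlencode(query)}"

url = ngram_url(["undeniable", "indubitable"], year_start=2000, year_end=2019)
print(url)
```

Opening the printed address in a browser should show the same graph as typing undeniable,indubitable in the search box with the years set to 2000-2019.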
Which of these phrases is most common: Third World, developing country, least developed country, Global South?
Third World is still the most common, though less so than before. Global South has overtaken developing country.
Comparing collocations: 1
Which of these three adjectives is most common with outcome?
Most common is positive outcome, followed by good outcome. Great outcome is rare.
Comparing collocations: 2
We are not sure whether bored with or bored of is correct, so we compare the two:
Bored with is still the winner, but bored of is catching up.
Finding the most common collocations
To find the top ten collocations, use * (a wildcard) in place of a word. For example, which words commonly follow positive?
We see from this that the most common words after positive are ‘and’ and ‘or’.
Restricting collocations to nouns, verbs, adjectives, etc.
Let’s try to limit the words after positive to nouns. We do this by adding a part-of-speech tag (in this case _NOUN):
For more information on part-of-speech tags, see the help page.
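The two kinds of search in this section differ only in what you type after the word. As a small sketch, the Python function below (the name wildcard_query is our own) builds both search strings. It assumes that the tag is attached directly to the wildcard, as in *_NOUN; check the help page if a search does not work as expected.

```python
def wildcard_query(word, pos_tag=None):
    """Build a search string for the most common words that follow `word`."""
    # A bare * matches any word; *_NOUN limits the matches to nouns.
    wildcard = "*" if pos_tag is None else f"*_{pos_tag}"
    return f"{word} {wildcard}"

any_word = wildcard_query("positive")
nouns_only = wildcard_query("positive", "NOUN")
print(any_word)    # positive *
print(nouns_only)  # positive *_NOUN
```

Type either string into the search box to see the top ten words that follow positive.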
Under the graph you may see the heading Search in Google Books and some date ranges. Click on one to see examples of your word in sentences and book titles. (You will probably find better examples through SKELL or English-Corpora.org.)
SKELL
SKELL is free to use, with no registration or login.
Enter a word or phrase in the search box.
For example, enter evaluate and, under the Examples tab, you will see sentences that contain evaluate, evaluates, evaluated or evaluating.
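Searching for one word and getting all of its forms can be imitated in a simple way. The Python sketch below uses a pattern that lists the four endings by hand. This is only an illustration and not how SKELL itself works; the example sentences are invented.

```python
import re

# Matches evaluate, evaluates, evaluated and evaluating as whole words.
FORMS = re.compile(r"\bevaluat(?:e|es|ed|ing)\b", re.IGNORECASE)

# Invented example sentences, for illustration only.
sentences = [
    "We evaluate the results each year.",
    "The committee evaluated three proposals.",
    "An evaluation was published later.",
    "She is evaluating the new method.",
]

# Keep only the sentences that contain one of the verb forms.
matches = [s for s in sentences if FORMS.search(s)]
for sentence in matches:
    print(sentence)
```

The sentence with evaluation is not printed, because evaluation is a noun and not one of the four verb forms in the pattern.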
Switch to the Word sketch tab and you will see examples of collocations under various headings, such as:
- subject of evaluate
- object of evaluate
- adjectives with evaluate
Switch to the Similar words tab and you will see words with similar meanings to evaluate.
‘The word cloud shows how similar each word is. The words in the centre are the most similar. The size indicates how frequent the word is.’
English-Corpora.org
English-Corpora.org is free to use with a UEA login. You will need to register.
The website has several corpora. Probably the most useful for academic English is the Corpus of Contemporary American English (COCA).
Click on Word and enter your word in the search box:
You will then see detailed information on this word:
Note: Topics are words found in the same texts as your word. For example, if you search for vampire, topics include demon, creepy, corpse, curse, werewolf, zombie and twilight.