Corpus insider #2: Frequency & typicality
Corpora are
really great for checking collocations: words that are typically used together.
Collocation's a really important aspect of language and a vital part of
language teaching if we want to help students avoid 'doing' obvious mistakes.
As expert speakers, we generally have a feel for an individual word's most
typical collocates, but when you're writing materials, it's easy to get a
particular combination stuck in your head or to start doubting your intuitions
- do we say get a bus or take a bus? The more you say it to yourself, the
sillier each one starts to sound. A bit of outside evidence can be really
helpful.
If you want
to use a corpus to check out collocations though, it's important to understand
a few basics about the statistics behind what the corpus tools are showing you
and what type of collocations might be appropriate for the materials you're
writing.
Frequent vs Typical
The most
important distinction to get to grips with is the difference between frequent
collocations and typical or significant or strong collocations. Most corpus
tools will show you which words most commonly co-occur just based on raw
frequency, but some tools will also have an option to rank collocates by
strength of attraction, shown as a score. That is, the software will take into
account not just how often two words occur together, but how likely that
combination is based on the relative frequency of the two items. The chances
of two very frequent words occurring together are quite high, so those combinations are often
fairly predictable and uninteresting. If you look, for example, at the raw
frequencies for words which modify the noun car, you'll come across a whole
load of very common adjectives - new car, old car, small car, first car, other
cars, etc. That doesn't really tell you an awful lot about language. Most
students could probably guess these combinations. But if you rearrange the
collocates by significance, combinations like electric car, sports car, rental
car and police car start rising to the top, along with some cars that aren't even
cars, like cable car. They're clearly much more interesting from a linguistic
perspective, much less predictable and much more what we think of when we talk
about teaching collocation. See this Sketch Engine blog post for more about
this and more examples (although, I kind of disagree with its conclusions re.
language teaching!).
Ranked by frequency (the underlined number). Source: Sketch Engine, English Web 2013 corpus
Ranked by score (the number on the right). Source: Sketch Engine, English Web 2013 corpus
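To make the difference concrete, here's a minimal sketch in Python. It assumes logDice as the association score, which is one commonly used measure; the tool you're using may calculate its score differently. All the counts are invented purely for illustration and don't come from any real corpus.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: how often the two words occur together
    f_x, f_y: how often each word occurs overall
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Invented counts for illustration only, not taken from any real corpus.
NODE_FREQ = 100_000  # hypothetical overall frequency of "car"
collocates = {
    # collocate: (co-occurrences with "car", overall frequency of the collocate)
    "new": (5_000, 2_000_000),
    "old": (2_000, 800_000),
    "electric": (1_500, 60_000),
    "rental": (900, 20_000),
}

# Ranking 1: raw frequency of the combination.
by_frequency = sorted(collocates, key=lambda w: collocates[w][0], reverse=True)

# Ranking 2: strength of attraction, which also takes into account
# how common each word is on its own.
by_score = sorted(
    collocates,
    key=lambda w: log_dice(collocates[w][0], NODE_FREQ, collocates[w][1]),
    reverse=True,
)

print("By frequency:", by_frequency)
print("By score:    ", by_score)
```

With these made-up numbers, new and old top the frequency list, while electric and rental rise to the top once you rank by score, because they occur far less often away from car.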
When you want typical
I started
off using corpora as a lexicographer working on learner's dictionaries. In a
dictionary, you want to show the range of a word and its usage, so looking at
typical collocates is a great starting point for getting a feel for a word. It
helps you to tease out different senses - like the AmE sense of car meaning
carriage, as in rail car, train car, freight car, etc. - to identify possible
compounds, phrases and idioms - car park, car pool, get car sick - and to pick out
some of the most significant collocates you might want to exemplify and perhaps
highlight.
The less
obvious but typical collocations are important in teaching materials too,
especially when an unpredictable collocation is also very frequent, like catch
a bus or board a plane, which score highly on both types of measure. The
typical collocations aren't, however, always what we want to focus on.
When you want vanilla
Many
dictionary entries, especially for more frequent words, will start with what's
known as a 'vanilla' example. That is a simple example that illustrates the
basic meaning of the word in a context that's authentic but doesn't contain
other elements that distract from the word being exemplified. Information about
less obvious collocations, phrases or colligational patterns will come later.
So the Cambridge Dictionaries entry for car has the following example
sentences:
They don't have a car. (the 'vanilla' example - 'have' is actually one of the top collocating verbs by raw frequency, but it's unremarkable)
Where did you park the car? ('park' is a more interesting collocate)
It's quicker by car.
a car chase/accident/factory
The same
principle holds for many other teaching contexts.
When you're
introducing potentially new vocabulary items, you want students to focus on
those new words. Of course, you want to present them in a realistic context
with appropriate collocates, but you don't want to overwhelm the student with
extra information and especially not with collocates that are well above the
level of the original target word. So if I was, say, teaching car for the first
time, I probably wouldn't throw in sports car or rental car, but it might be
appropriate to add a bit of variety to the material with simple combinations
like new car or small car. Only later when car was a familiar vocabulary item
might I want to extend students' range to talk about other types of cars as
appropriate contexts cropped up.
When frequent isn't necessarily obvious
A
particularly tricky case in English is the set of 'delexical' verbs (make, do,
take, get, have, put, give, etc.) which are all incredibly frequent, but for a
learner of English, not at all obvious in terms of which to choose. If we go
back to what we do with buses, by far the most frequent collocating verb is
take. If you look at collocates by frequency, it's right at the top for most
corpora. If you switch to order collocates by significance though, because it's
a very common verb, it drops way down the order to be replaced by board, ride,
catch, park and drive. Obviously, that doesn't mean that we don't need to teach take the bus because it'll be obvious to our students … because it won't!
Weighing up the numbers
So what
does all this mean? Which statistics should we be looking at? Well, the answer
is probably both. When I'm researching the collocates of a word, I'll flick
between both types of ranking to get an overall picture of how the word works,
then make my choices based on the teaching context.
- If I'm looking for a natural example for a new vocab item, I'll probably look at raw frequencies to find a collocate that's common but not distracting.
- If a collocate, like catch a bus, is high on both scores, it's probably worth teaching, and maybe highlighting, early on.
- If I'm looking to extend students' range and get them to use familiar words in more varied ways, then I'll investigate the more interesting collocates that come up when ranked by score.
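As a rough sketch, those three rules of thumb could be written as a small Python function. The function name, the top_n cut-off and the labels are all my own assumptions for illustration. The frequency ordering is invented (apart from take being at the top), and the fallback label for collocates that aren't high on either list isn't one of the rules above.

```python
def suggest_use(word, by_frequency, by_score, top_n=3):
    """Suggest how a collocate might be used, based on where it sits in both rankings."""
    high_freq = word in by_frequency[:top_n]
    high_score = word in by_score[:top_n]
    if high_freq and high_score:
        return "teach early"        # high on both measures
    if high_freq:
        return "natural example"    # common but not distracting
    if high_score:
        return "extend range"       # more interesting, for later
    return "lower priority"         # my own fallback, not from the rules above


# Verbs that collocate with "bus".
# Hypothetical frequency order: only "take" at the top reflects what I found.
bus_by_frequency = ["take", "catch", "have", "board", "ride"]
# Order by score, as described earlier in this post.
bus_by_score = ["board", "ride", "catch", "park", "drive"]

for verb in ["take", "catch", "board", "park"]:
    print(verb, "->", suggest_use(verb, bus_by_frequency, bus_by_score))
```

It's only a starting point, of course. The final choice still depends on the teaching context and the level of the students.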
A note about data
Finally, as
ever with corpora, it’s also important to know what data you’re looking at. As
I mentioned in my last corpus insider post, most corpora are made up of predominantly written
data and, of course, that’s going to affect the type of results you get back.
So, going back to my query at the start of this post about get the bus vs. take
the bus, most of the corpora I looked at listed take as a top collocate by
frequency, but get, which felt more natural to me, was much further down the
lists (both by score and raw frequency). When I looked at the Spoken BNC2014 (a
corpus of contemporary spoken British English) though, suddenly get the bus
rocketed to the top, suggesting it's something we say, but maybe write slightly less often.
Labels: collocation, corpus insider, corpus research, materials writing