Like searching for an idiom in the proverbial haystack
Recently, I've been doing quite a bit of research into idioms. It's lots of fun, just because idioms are the fun end of language, but it's also quite challenging from a corpus perspective, because idioms are slippery suckers!
In general, idioms pose two key problems for a corpus researcher:
1 Separating the figurative from the literal: so, for example, when you try to get stats on how common the idiom 'an own goal' is – as in 'The PM scored a bit of a political own goal yesterday' – you realize you also have a whole load of cites from football reporting about actual own goals. There's no real way of separating the two apart from trawling through a sample of corpus lines to make a rough judgement about the percentage of figurative vs literal uses.
2 Dealing with variation: while a few idioms are completely fixed, most allow for a bit of variation and some are so variable as to be almost impossible to pin down. For example, you might start off with "frighten the life out of someone" … then you realize that the verb scare is common too and actually there are some examples of terrify … then you look some more and find examples for frighten/scare the (living/absolute) shit/crap/hell/fuck/heck/daylights/piss/bejesus* out of someone! (*various spellings) All of which I only uncovered by trying out different search patterns, allowing for alternative verbs and gaps for things that get scared out of you.
[lemma="frighten|scare|terrify"][word="the"][]{1,2}[word="out"][word="of"]
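If you only have plain, untagged text to hand rather than a proper corpus tool, the same idea can be roughly approximated with a regular expression. This is just a minimal sketch in Python, not what the corpus software does: a regex has no lemma information, so I've assumed a hand-written list of inflected verb forms in its place, and the sample sentences (and the name find_gap_fillers) are invented purely for illustration.

```python
import re

# Inflected forms stand in for the lemma match in the corpus query;
# a plain regex has no access to lemma annotation (assumption: these
# spellings cover the forms we care about).
VERB = r"(?:frighten(?:s|ed|ing)?|scar(?:e|es|ed|ing)|terrif(?:y|ies|ied|ying))"

# verb + "the" + one or two arbitrary words + "out of"
PATTERN = re.compile(
    rf"\b{VERB}\s+the\s+(\w+(?:\s+\w+)?)\s+out\s+of\b",
    re.IGNORECASE,
)


def find_gap_fillers(text):
    """Return whatever fills the gap between 'the' and 'out of'."""
    return [m.group(1).lower() for m in PATTERN.finditer(text)]


# Invented sample sentences, not real corpus lines.
sample = (
    "That film scared the living daylights out of me. "
    "You frightened the life out of your mother! "
    "The noise terrified the dog out of its basket."
)

print(find_gap_fillers(sample))
# → ['living daylights', 'life', 'dog']
```

The last hit is the kind of 'noise' a flexible pattern lets in: it fits the shape of the search but isn't really the idiom.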
Of course though, the more flexible you make your search, the more 'noise' you get – i.e. examples that aren't of the target idiom – so it's a bit of a balancing act with lots of trial and error.
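The 'trawl through a sample' approach to the first problem (and to weeding out noise) can be made a little more systematic. The sketch below assumes you've exported your hits as a list of lines, pull a random sample, and label each one by hand as "fig" or "lit". It then gives the figurative share with a rough margin of error using a simple normal approximation, which is only one of several ways to do it and gets unreliable for small samples. The function names and the labels in the example are made up for illustration.

```python
import math
import random


def draw_sample(lines, n, seed=0):
    """Pick up to n corpus lines at random for hand-labelling."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return rng.sample(lines, min(n, len(lines)))


def figurative_share(labels, z=1.96):
    """Estimate the share of figurative uses from hand labels.

    labels: list of "fig" / "lit" judgements, one per sampled line.
    Returns (share, low, high), where low/high are an approximate
    95% interval from the normal approximation (z=1.96).
    """
    n = len(labels)
    p = sum(1 for label in labels if label == "fig") / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Made-up judgements: 30 figurative and 70 literal out of 100 lines.
labels = ["fig"] * 30 + ["lit"] * 70
share, low, high = figurative_share(labels)
print(f"figurative: {share:.0%} (roughly {low:.0%} to {high:.0%})")
```

It's still a rough judgement, but at least you can see how rough: the bigger the sample you label, the narrower the range gets.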
Then yesterday, a chance comment in a TV programme threw up a whole new issue that I'd never considered – the use of the term 'the proverbial' which is kind of an idiom within an idiom! I scurried off to a corpus to check it out and found that:
It's mostly used before or within a complete idiom (often before a key noun). And notice it doesn't have to be what we'd typically think of as a proverb; it can go with any fixed, idiomatic expression, I think as a way of the speaker acknowledging that what they're saying is a bit of a cliché.
Perhaps more interestingly though, it can also be used to replace a key word within an idiom. This often seems to be a way for the speaker to avoid a taboo word (shown in red) – and so be polite – but not always (words in green):
It's a fabulous linguistic quirk and lots of fun to play around with, but wow, how the proverbial do you go about explaining that one to a poor learner?!