Tuesday, February 23, 2021

Like searching for an idiom in the proverbial haystack

Recently, I've been doing quite a bit of research into idioms. It's lots of fun, just because idioms are the fun end of language, but it's also quite challenging from a corpus perspective, because idioms are slippery suckers!

In general, idioms pose two key problems for a corpus researcher:

1 Separating the figurative from the literal: so, for example, if you try to get stats on how common the idiom 'an own goal' is – as in "The PM scored a bit of a political own goal yesterday" – you realize you also have a whole load of cites from football reporting about actual own goals. There's no real way of separating the two apart from trawling through a sample of corpus lines to make a rough judgement about the percentage of figurative vs literal uses.

2 Dealing with variation: while a few idioms are completely fixed, most allow for a bit of variation and some are so variable as to be almost impossible to pin down. For example, you might start off with "frighten the life out of someone" … then you realize that the verb scare is common too and actually there are some examples of terrify … then you look some more and find examples for frighten/scare the (living/absolute) shit/crap/hell/fuck/heck/daylights/piss/bejesus* out of someone! (*various spellings) All of which I only uncovered by trying out different search patterns, allowing for alternative verbs and gaps for things that get scared out of you.
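
To give a rough idea of what such a search pattern can look like, here is a minimal sketch using an ordinary Python regular expression. It is not the query syntax of any particular corpus tool, and the pattern, the function name find_variants and the sample sentences are all invented for illustration. It allows three alternative verbs, an optional intensifier and an open slot for whatever gets scared out of you.

```python
import re

# Hypothetical pattern: three alternative verbs (with inflections),
# then "the", an optional intensifier, an open noun slot, then "out of".
PATTERN = re.compile(
    r"\b(?P<verb>frighten(?:s|ed|ing)?|scar(?:e|es|ed|ing)|terrif(?:y|ies|ied|ying))"
    r"\s+the\s+(?:(?P<intensifier>\w+)\s+)?(?P<noun>\w+)\s+out\s+of\b",
    re.IGNORECASE,
)

def find_variants(lines):
    """Return (verb, intensifier, noun) for every match in the given lines."""
    hits = []
    for line in lines:
        for m in PATTERN.finditer(line):
            hits.append((
                m.group("verb").lower(),
                (m.group("intensifier") or "").lower(),  # empty if no intensifier
                m.group("noun").lower(),
            ))
    return hits

# Invented example lines, not real corpus data.
SAMPLE = [
    "That film frightened the life out of me.",
    "You scared the living daylights out of us!",
    "The noise terrified the bejesus out of the cat.",
    "The fireworks scared the dog out of the garden.",  # matches, but not the idiom
    "She was scared of the dark.",                      # no match
]

for hit in find_variants(SAMPLE):
    print(hit)
```

The open noun slot is what picks up variants you hadn't thought of, but it is also what lets in lines like the one about the dog, so every hit still needs a human eye.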


Of course though, the more flexible you make your search, the more 'noise' you get – i.e. examples that aren't of the target idiom – so it's a bit of a balancing act with lots of trial and error.
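
Whether you're weeding out noise or splitting figurative from literal uses, the trial and error comes down to hand-judging a random sample of hits and scaling up. The sketch below shows one way that arithmetic might go. The function names, the sample size and the labels are all made up for illustration, and the margin of error uses a simple normal approximation, so treat it as a ballpark rather than a precise statistic.

```python
import math
import random

def draw_sample(hits, size, seed=0):
    """Pick a reproducible random sample of corpus lines to judge by hand."""
    rng = random.Random(seed)
    return rng.sample(hits, min(size, len(hits)))

def estimate_share(labels, target="figurative"):
    """Return the share of `target` labels and a rough 95% margin of error."""
    n = len(labels)
    p = sum(1 for label in labels if label == target) / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)  # normal approximation
    return p, margin

# Invented judgements: suppose 30 of 100 sampled lines were figurative.
labels = ["figurative"] * 30 + ["literal"] * 70
share, margin = estimate_share(labels)
print(f"figurative: {share:.0%} (give or take {margin:.0%})")
```

A bigger sample narrows the margin, but only slowly, which is why a rough judgement is usually the realistic goal.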

Then yesterday, a chance comment in a TV programme threw up a whole new issue that I'd never considered – the use of the term 'the proverbial' which is kind of an idiom within an idiom! I scurried off to a corpus to check it out and found that:

It's mostly used before or within a complete idiom (often before a key noun). And notice that it doesn't have to be what we'd typically think of as a proverb; it can go with any fixed, idiomatic expression, I think as a way of the speaker acknowledging that what they're saying is a bit of a cliché.


Perhaps more interestingly though, it can also be used to replace a key word within an idiom. This often seems to be a way for the speaker to avoid a taboo word (shown in red) – and so be polite – but not always (words in green):


It's a fabulous linguistic quirk and lots of fun to play around with, but wow, how the proverbial do you go about explaining that one to a poor learner?!

