The occasional ramblings of a freelance lexicographer

Tuesday, February 23, 2021

Like searching for an idiom in the proverbial haystack

Recently, I've been doing quite a bit of research into idioms. It's lots of fun, just because idioms are the fun end of language, but it's also quite challenging from a corpus perspective, because idioms are slippery suckers!

In general, idioms pose two key problems for a corpus researcher:

1 Separating the figurative from the literal: so, for example, when trying to get stats on how common the idiom 'an own goal' is – as in "The PM scored a bit of a political own goal yesterday" – you realize you also have a whole load of cites from football reporting about actual own goals. There's no real way of separating the two apart from trawling through a sample of corpus lines to make a rough judgement about the percentage of figurative vs literal uses.

2 Dealing with variation: while a few idioms are completely fixed, most allow for a bit of variation and some are so variable as to be almost impossible to pin down. For example, you might start off with "frighten the life out of someone" … then you realize that the verb scare is common too and actually there are some examples of terrify … then you look some more and find examples for frighten/scare the (living/absolute) shit/crap/hell/fuck/heck/daylights/piss/bejesus* out of someone! (*various spellings) All of which I only uncovered by trying out different search patterns, allowing for alternative verbs and gaps for things that get scared out of you.


Of course though, the more flexible you make your search, the more 'noise' you get – i.e. examples that aren't of the target idiom – so it's a bit of a balancing act with lots of trial and error.
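To make that balancing act concrete, here's a rough sketch in Python of the kind of flexible search pattern described above. It isn't the query syntax of any particular corpus tool; the pattern, the function name find_variants and the sample lines are all invented for illustration. It allows for the three verbs, an optional modifier and a one-word gap for whatever gets scared out of you, and it also shows how a loose pattern lets literal 'noise' through.

```python
import re
from collections import Counter

# Assumed verb forms for illustration: frighten, scare, terrify and their inflections.
VERB = r"(?:frighten(?:s|ed|ing)?|scar(?:e|es|ed|ing)|terrif(?:y|ies|ied|ying))"

# verb + "the" + optional modifier (e.g. "living") + one-word gap + "out of"
PATTERN = re.compile(
    rf"\b{VERB}\s+the\s+(?:(?P<modifier>\w+)\s+)?(?P<noun>\w+)\s+out\s+of\b",
    re.IGNORECASE,
)

# Invented sample lines standing in for corpus cites.
SAMPLE_LINES = [
    "That film frightened the life out of me.",
    "You scared the living daylights out of us!",
    "The noise terrifies the hell out of the dog.",
    "She scared the cat out of the garden.",  # literal use: noise
    "He was frightened of the dark.",         # no match
]


def find_variants(lines):
    """Return (modifier, noun) pairs for every match of the flexible pattern."""
    hits = []
    for line in lines:
        for match in PATTERN.finditer(line):
            modifier = match.group("modifier")
            hits.append((
                modifier.lower() if modifier else None,
                match.group("noun").lower(),
            ))
    return hits


if __name__ == "__main__":
    # Count which words fill the gap; the literal "cat" example shows up too.
    counts = Counter(noun for _, noun in find_variants(SAMPLE_LINES))
    for noun, freq in counts.most_common():
        print(noun, freq)
```

The pattern can't tell the idiom from the cat being chased out of the garden, so the results would still need checking by eye.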

Then yesterday, a chance comment in a TV programme threw up a whole new issue that I'd never considered – the use of the term 'the proverbial' which is kind of an idiom within an idiom! I scurried off to a corpus to check it out and found that:

It's mostly used before or within a complete idiom (often before a key noun). And notice it doesn't have to be what we'd typically think of as a proverb; it can go with any fixed, idiomatic expression, I think as a way of the speaker acknowledging that what they're saying is a bit of a cliché.


Perhaps more interestingly though, it can also be used to replace a key word within an idiom. This often seems to be a way for the speaker to avoid a taboo word (shown in red) – and so be polite – but not always (words in green):


It's a fabulous linguistic quirk and lots of fun to play around with, but wow, how the proverbial do you go about explaining that one to a poor learner?!


Friday, February 12, 2021

Writing rhythms

On most ELT writing projects, the work (and your life for the duration of the project!) gets divided up into units. For a students' book, that might be 10-15 quite large units, but for many of the sort of self-study, language practice type materials I work on, there can be anywhere between 20 and 50 short units which may only be 2-4 pages each.

At the start of a new project, you spend a bit of time getting to grips with the brief and playing around with the first unit or two to establish how they're going to work. Often, the format's already quite fixed in the brief; sometimes you have a bit of leeway to play with. Then once everyone's happy, you get your head down and start ploughing through unit-by-unit.

What interests me is how different people go about tackling each unit. Do they sketch out the whole thing then go back and fill in the details? Do they do it on paper or straight into a Word doc? Do they start from the beginning and work through each activity in turn? Or do they start with a core component, such as a reading text, then work outwards from it? A lot, of course, depends on the type and scope of the material, but even within that there's quite a bit of room for variation.

For the past couple of months, I've been working on some self-study vocab practice materials. There are 50 units altogether (across two linked projects) which is kind of daunting, but also quite nice as it means I've settled into a rhythm of roughly a unit a day. For each unit, I already have a (more-or-less) predetermined set of vocab items to practise across a number of activities. It's heavily corpus-informed, so I'm researching the vocab items to pick out features to highlight (typical usage and context, collocations, typical colligational patterns, etc.) and also using and adapting corpus examples in the activities. For the first few units, this was my approach:


The major downside of this was that I found myself running the same corpus searches numerous times. So, I'd explore vocab item A extensively in the initial research stage, then I'd find myself searching for it again several times to source examples for each exercise. I revised my approach after a few units so that I still did my research stage as before, but then sketched out a rough plan of the different exercises, e.g. exercise 1 focus on noun collocations, exercise 2 focus on following prepositions, etc. Then I ran a corpus search for each vocab item and added examples to several of the exercises at the same time. This seemed more efficient and I settled into it as a way of working for the first 15 units or so.

As is so often the case though, totting up my hours regularly as I went along, I realized I was spending much longer on the work than I'd budgeted for up-front. That meant that because the project is for a fixed fee, my hourly rate was nose-diving. It also meant I was getting behind schedule. After a bit of a review and discussion, it turned out that a lot of the extra work was just down to there being more involved in the project than I'd originally bargained for – isn't it always the case?! With no more budget available though, I had to try and rein in my hours regardless. So I came up with a new way of working.

On the plus side, it is much quicker because I'm only researching each vocab item once, then just reshuffling the results to create the exercises. On the downside, I'm not able to wait until I've researched all the items to see how the unit's going to shape up. So, if you like, the whole process is slightly less data-led. In some units, it works out fine and the examples I've selected shuffle neatly into nice, coherent exercises. Other times, I find that a feature or exercise type starts to suggest itself towards the end of the vocab list and I realize I haven't noted relevant examples for some of the earlier items. Then I either have to squeeze the material I have into exercises which aren't a great fit or I have to go back and look for better examples for some items. For some units, I use up most of the examples I've collected; for others, I'm left with a whole page of unused material at the end.

So often, ELT writing is a balancing act between how you'd like to work and what the time and budget allow. In this case, the hurry-up initially felt a bit uncomfortable, but as I go on, I think I'm settling into my rhythm again and making it work.


Friday, February 05, 2021

Motivation, mindset and guessing vocabulary from context

When it comes to ELT vocabulary, the idea of guessing the meaning of unknown words from context seems intuitively a useful strategy. However, its effectiveness both in terms of reading comprehension and how well it helps learners retain new vocabulary has been questioned – see this blogpost from Philip Kerr for a summary of some of the arguments. I just went back to reread it after something that happened yesterday.

Since my partner recently took Swiss citizenship (via his father), we've been receiving regular piles of paperwork from the canton in which he's registered. It's mostly to do with voting, either in referenda or local elections and is all in French. We both speak some French, but far from fluently.

A pile arrived yesterday and I was flicking through it, mostly just to practise my French as I waited for the kettle to boil on a tea break. There were three referenda questions, two of which I understood quite easily; the third I hesitated over.


Image of red booklet with referendum question

It read: "l'interdiction de se dissimuler le visage." Which I read as "Prohibition/Ban on [reflexive verb which I don't recognize] the face." My first thought was it might be something to do with banning facial recognition software or something similar – it was about banning something to do with people's faces and based on my current world knowledge, that seemed like a logical guess. I read the first paragraph and it initially seemed to fit – it talked about the ban applying in public places such as in the street, on public transport, in sports stadiums, etc.

I still wasn't quite sure though, so I scanned through a bit more of the text. Then I came across a section about the arguments in favour of the ban and it said that "[the noun from the unknown verb] of the face in public spaces symbolises the oppression of women and is against the liberal spirit of living together/community cohesion". Aha! It was at that point that I realized that dissimuler means to conceal or hide or cover and that the question was about face-coverings – presumably in the sense of a niqab rather than a medical face mask (the irony of the timing wasn't lost on me!).

That aha moment was incredibly satisfying – perhaps an under-rated motivator in language learning? Or is that just me? It often strikes me that teachers and linguists, who are inherently fascinated by language for its own sake, may not be the best people to judge what works and what doesn't for the average language learner for whom learning a language may just be a means to an end. Does the average learner get that same sense of achievement from working out meaning? Would they have bothered to form a hypothesis then read on to check it in the way that I did or would they have just given up? It's hard to say and I'm sure it would differ enormously from student to student.

And of course, now I'm not going to be able to fairly judge whether my experience of guessing from context is going to help me retain the new vocab item either. Chances are I will just because I've done the diligent language-learner thing of processing and working with my new word by writing a blog post about it!
