The occasional ramblings of a freelance lexicographer

Tuesday, March 15, 2022

Researching phrasal verbs: showing my workings

This is kind of the post behind the post. I recently wrote a post for the Collins ELT blog about how we updated the new edition of Work on Your Phrasal Verbs. I was asked to make it a short, simple post for a wide audience. You can read the post here - it summarizes the key changes in terms of updating the phrasal verbs we included to reflect current usage (we changed around 10% of the PV list) and also how we reworked the design to include more space for practice activities.  What the post doesn’t have space for is explaining how we did that. So, I thought I’d write this post-behind-the-post to give those of you who like that kind of thing a few more of the nerdy details.

Updating the PV list was a fairly straightforward case of checking the current corpus frequencies of the PVs in the first edition and highlighting any that had declined in usage. I say “straightforward” – actually checking the frequency of phrasal verbs is far from straightforward, but I’m not going to get into that here! Then the Collins corpus team generated a list of possible new additions based on the highest-frequency PVs that weren’t included in the first edition. I checked the most likely candidates manually, chose suitable options, and we worked out how to shuffle things around to fit the new additions into the themed units to replace those we’d dropped.
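For the code-minded, the "highlight anything that has declined" step can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the per-million frequencies are invented, the 50% threshold is arbitrary, and the function name flag_declining is hypothetical. It also sidesteps the genuinely hard part, which is counting phrasal verbs reliably in the first place.

```python
# Hypothetical per-million-word frequencies; the numbers are invented for illustration.
old_freq = {"carry on": 42.0, "ring up": 6.0, "call out": 3.5, "step down": 9.1}
new_freq = {"carry on": 40.5, "ring up": 2.1, "call out": 11.8, "step down": 9.4}


def flag_declining(old, new, threshold=0.5):
    """Return PVs whose new frequency is below `threshold` times the old one."""
    flagged = []
    for pv, old_value in old.items():
        new_value = new.get(pv, 0.0)  # a PV missing from the new data counts as zero
        if old_value > 0 and new_value / old_value < threshold:
            flagged.append(pv)
    return sorted(flagged)


declining = flag_declining(old_freq, new_freq)
print(declining)
```

Anything flagged this way would still need a manual look before being dropped, since a ratio says nothing about why usage has changed.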

When we were initially reviewing the first edition, I'd highlighted the fact that most of the existing practice activities focused on the basic meaning of the PVs, so a largely receptive focus. With an extra page per unit for new activities, I suggested we could use the space to build on the receptive/meaning-focused exercises by adding new ones that looked more at the kind of things learners need to know to use PVs productively. For me, it was this bit that turned out to be the most interesting part of the project as I got to do original corpus research, then put it directly into practice.

In my Collins post, I pick out four main areas we focused on – and I also made some pretty graphics around them for an Instagram post which I’m going to unapologetically reuse!

Typical Collocations:

This is a biggie because collocations are not only important for using language in a way that sounds natural – producing predictable combinations of words that readers and listeners will expect and not ‘trip over’ – but collocations such as typical subjects and objects also tell you a huge amount about how and where the target PV is typically used. Is it used to talk about serious or fairly frivolous topics? Are the subjects positive or negative things? What types of people do this thing? Is it informal and conversational or something more likely to crop up in business communications or journalism?

Here are some of the pages and pages of notes I made as I researched each PV.

You can see very clearly that, in terms of typical objects, we drop off both objects and people:


For intransitive PVs, the subjects are obviously more interesting, like here at (not) add up:


Or here you can see what kinds of people tend to step down:


For ditransitive verbs, like remind sb of sth, I looked at both the direct and indirect objects:

And sometimes I noted down both subjects and objects as worth highlighting, such as here, where you can see who lays off who:


But it’s not just about the nouns. There are the adverbs too, which you can see above with step down immediately/voluntarily and not really add up. Or even the quantifiers: lay off a lot of/hundreds of.

I used all this mass of information to create activities that focus specifically on collocation, like the one above, but I also tried to include the strongest collocates of each PV throughout the unit, including them in examples where they weren’t the main focus.
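As a rough illustration of the kind of counting that sits behind collocation notes like these (not the corpus tool or method actually used for the book), here is a minimal Python sketch. The sample sentences are invented stand-ins for concordance lines, and right_collocates is a hypothetical helper name.

```python
import re
from collections import Counter

# Invented sample lines standing in for real concordance lines.
lines = [
    "The minister agreed to step down immediately after the vote.",
    "She will step down voluntarily at the end of the year.",
    "The chairman decided to step down immediately.",
    "He refused to step down despite the pressure.",
]


def right_collocates(lines, phrase, window=1):
    """Count the words found up to `window` tokens after `phrase`."""
    counts = Counter()
    phrase_tokens = phrase.lower().split()
    n = len(phrase_tokens)
    for line in lines:
        tokens = re.findall(r"[a-z']+", line.lower())
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == phrase_tokens:
                # Collect the tokens immediately to the right of the phrase.
                counts.update(tokens[i + n:i + n + window])
    return counts


collocates = right_collocates(lines, "step down")
print(collocates.most_common(3))
```

This only matches the exact form given, so stepped down or stepping down would be missed. A real corpus query would work on lemmas and would separate subjects, objects and adverbs rather than just counting neighbouring words.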

Colligation patterns:

These are the grammatical patterns that PVs tend to be used in. At the simplest level, do you carry on read, carry on to read or carry on reading? While scrolling through corpus lines, I often found myself noting down following patterns, some of them simple, like a following -ing form or a wh- clause as above. Some, like at make up for, were more complex, taking in direct objects, -ing forms, wh- clauses and also a passive plus preposition combo (be made up for in …) and a preceding phrase (more than made up for …).


Other common preceding patterns I noted included modals and other introductory verbs (do they have a name?), like try to, fail to, go and.

Again, some of these made their way into explicit exercises highlighting the patterns, often matching sentence halves, while others just loitered in general examples, building the picture for students of typical usage in the background.


Dependent prepositions:

Slightly confusingly for learners, phrasal verbs often co-occur with specific prepositions that aren’t strictly part of the phrasal verb itself, because they’re optional or vary depending on what follows. They’re at the niggly, detailed end of language learning, but they can make a real difference to how language flows and to how listeners/readers are able to process a sentence. Imagine you read “They fell out with …”: you expect what follows to be a person, not an issue, and if it isn’t, you hesitate, maybe reread, and wonder if you’ve understood correctly.

Word order:

All of the above categories apply to almost any type of word, but this last issue is uniquely phrasal-verby. ELT materials commonly teach about the difference between separable and inseparable phrasal verbs, often in a one-off section, but once students have grasped the concept, they need to know how it applies to specific PVs they come across. Does the object always appear between the verb and the particle or always after the particle? If both are possible, is there a tendency one way or the other? Can a PV only be split by a pronoun or are there other pronoun-like words that can go in-between, like everyone, things, etc.?

As you can see, I had hours of nerdy fun researching all this stuff which I then tried to cram into a few short pages. I just hope that students get out at least some of what went in!

More about the book here.


Friday, February 18, 2022

Different perspectives & the frequency illusion

Recently, for various reasons, I’ve been looking at some relatively low-frequency words. Yesterday, I had paranormally to deal with and it sparked a couple of thoughts about the way I work with language.

Most of the time when you’re researching and compiling dictionary entries, your feelings about the words you’re dealing with are fairly neutral. Sometimes, you get a fun word just because it expresses a fun idea, because it’s pleasingly onomatopoeic, or because it’s rude. Sometimes, words are awkward because they sprawl across tricky-to-split senses or have lots of confusing variants. And sometimes words are just hard to get your head around – see my recent post about hard words. Then there are words that you realize you’re biased against because they express ideas you feel uncomfortable about or, in this case, sceptical of.

Initially, I wasn’t sure that ‘paranormally’ even felt like a feasible word. Of course, ‘paranormal’ as an adjective and a noun are pretty frequent, but I couldn’t imagine how it’d be used as an adverb. So, I looked at the corpus evidence and, although it wasn’t super-frequent, there was enough evidence to investigate further. I scanned through the cites and found myself immediately drawn to the jokey, ironic, and yes, sceptical uses. When you’re compiling dictionary entries and looking for good example sentences, it’s tempting to plump for the ones you can hear yourself saying, because they feel more natural. But then you have to check yourself and remember that other people who will be looking these words up may be coming from very different perspectives.

When I come across words that for whatever reason don’t fit with my own beliefs and opinions, I try to think of someone I know who would have a different perspective. As a firm atheist, I always feel a bit squeamish about words with religious connections or connotations, but I have some more religious friends whose shoes I try to imagine myself into. In this case, I immediately thought of an old childhood friend who I know has an interest in the paranormal, as well as a whole host of other ideas I wouldn’t normally pay much attention to. So, I approached the evidence again through her eyes and gave her a couple of examples to pick to balance my more sceptical choices.

Later in the day, I was scrolling through my Facebook feed and came across this post from her: 


I was especially chuffed to see I’d correctly picked out paranormally active as a strong collocation! I also marvelled for the umpteenth time at the phenomenon of the frequency illusion. Even when I’ve been researching the most obscure of words that I swear I’ve never come across before, they always seem to have a knack of then cropping up in context somewhere the very next day!



Monday, January 17, 2022

Lexicography FAQs: hard words

The other night I dreamt that I had an app on my phone that kept pinging obscure words at me that I had to define at the same time as I was trying to pack up to move house. 


Having been deep into a lexicography project for the past few months, I’m loving being paid to play with words all day, but at the same time, it’s clearly messing with my head!

It made me think of one of the questions that people often ask when I say I’m a lexicographer - how I know what the ‘hard’ words mean. My first response is usually that as someone who works primarily on ELT dictionaries, I tend not to get very many ‘hard’ words to deal with. Learner’s dictionaries focus more on high-frequency words because they’ll be the ones learners are most likely to come across and need to know. Back in the days of print dictionaries, space constraints largely dictated which words would be included. Crudely speaking, we started with the most frequent words and worked our way down the frequency list until we’d filled the number of pages we had space for. Of course, in a more digital age, those constraints no longer apply and the distinction between learner’s dictionaries and those aimed at L1 English speakers is becoming more blurred.
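The "work down the frequency list until the pages are full" idea is simple enough to sketch in code. This is a deliberately crude Python illustration, not how any publisher actually plans a word list: the frequencies and entry lengths are invented, and select_headwords is a hypothetical name.

```python
def select_headwords(freq_list, lines_available):
    """Walk down a frequency-ranked list, adding entries until space runs out.

    freq_list: (word, frequency, estimated_entry_length_in_lines) tuples.
    """
    ranked = sorted(freq_list, key=lambda item: item[1], reverse=True)
    chosen, used = [], 0
    for word, _freq, length in ranked:
        if used + length > lines_available:
            break  # the space budget is full; everything rarer is left out
        chosen.append(word)
        used += length
    return chosen


# Invented figures purely for illustration.
candidates = [
    ("get", 9000, 40),
    ("house", 1200, 10),
    ("paranormal", 15, 5),
    ("premeiotic", 1, 3),
]
included = select_headwords(candidates, lines_available=52)
print(included)
```

The loop stops at the first entry that doesn't fit rather than squeezing in shorter, rarer words, which mirrors the "most frequent first" logic described above.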

Frequency notwithstanding though, at the top end of advanced learner’s dictionaries, you do sometimes come across words you’re not familiar with and over the years, I’ve worked on a number of projects that have gone beyond the confines of general ELT, such as working on the Oxford Learner’s Dictionary of Academic English and on non-ELT projects, including the Oxford Dictionary of English. So, yes, I do have to deal with words that are either right on the edges of my knowledge or that I just don’t have a clue about.

Connections and derivatives

At the easier end of the scale, you’re sometimes faced with words that are related to those already covered in the dictionary you’re working on. A lot of my work on ODE, for example, was adding or upgrading run-ons – words that are tagged onto the end of other entries but don’t get a full entry and definition of their own. So, you might find contentedly and contentedness as run-ons at the end of the entry for contented.

Oxford Dictionary of English (3e)

If you’re upgrading a run-on to a full entry, it’s sometimes as simple as adapting the definition from the existing entry to the new part of speech and adding some examples. You still need to do a corpus check though – firstly to source the examples, but also to make sure the word does just mirror its root word in terms of usage. For example, looking at corpus cites for the word deportable recently, it was immediately clear that it broadly means “can be deported”, but a closer look revealed that a person can be deportable – a deportable immigrant/foreign national – but there are also deportable crimes/offences, i.e. ones that someone can be deported for. So, both of these uses needed to be reflected in the entry.

Checking other dictionaries

Another question I often get asked is whether we just copy stuff from other dictionaries. The answer is “No, but …” Dictionaries all have their own style guidelines which dictate how definitions are worded. Learner’s dictionaries have a defining vocabulary: a list of words you’re allowed to use in definitions. There are pages of guidelines about all the other little details such as register labels, regional and spelling variants, and how grammatical information, collocations, etc. are shown. And examples are all drawn from the specific publisher’s corpus, along with guidelines about how many examples to include, how they should be edited, presented, etc.

That said, I do often consult other dictionaries to check my intuitions. I’ll always look at the corpus data first and try to form my own impression of a word, even if in the case of completely unknown words, it’s a bit of a vague one. Then I’ll look at other dictionaries, often 3 or 4 others, to see whether my impressions were on track and then go back to the corpus evidence again. Sometimes, the other dictionaries confirm quite straightforwardly what I was thinking, sometimes they highlight a usage I hadn’t spotted, so I’ll go back to the data to see whether I can find examples. Sometimes different dictionaries disagree, and very occasionally, they just don’t cover the word at all.

Other reference sources:

When it comes to specialized, technical and academic words, I often find that I need to dig a bit further. Sometimes I find myself looking at definitions from other dictionaries but feeling that I still don’t really understand the concept. That’s not to say those entries are lacking: dictionary definitions are about trying to capture the essence of a word as concisely as possible, not about explaining complex concepts in endless detail. In those cases, I seek out other reference sources, often specialist technical or academic websites that have fuller explanations. It’s one of my favourite bits of the job but can be challenging too. I’ll usually try and look at a mix of sources including some nice simple undergraduate guides if I can find them, as well as the proper technical stuff. Then when I think I’ve got my head around a concept, I’ll try and synthesize it all into an appropriate definition. When it’s an idea within the humanities and social sciences, I often feel like I’ve pinned it down well. In the hard sciences and computing though, it can feel like a half-understood approximation! I was pleased to fall back on my A level in Statistics recently when trying to understand probability density function, but found myself stumped by multi-phase and premeiotic … which is why we have:

Spec checks:

Thankfully, as lexicographers, we aren’t expected to be omni-experts! Anything we’re unsure about, we can flag up for a specialist check so it can be referred to a subject specialist … who’ll probably laugh at our lay attempts to define a techy term before they completely rewrite it!

Monday, December 27, 2021

Patterns that go unnoticed

It’s turning into a bit of a negative end to the year … not because of anything bad happening, just because I find myself deep in a flurry of words beginning with un-. In my last post, I looked at what I’ve now discovered is called litotes: the use of two negatives together to express either irony or a subtle distinction between two absolutes (not uncommon, not unpleasant, etc.). Recently, I’ve been seeing another pattern with un- prefixed words: a kind of passive construction with go + un- + past participle:


It seems to describe events that no one sees or does anything about, things that are missed or ignored. In terms of form, it’s a bit like the get passive that we’re all familiar with, but this time the focus is on the lack of an agent doing anything. Although like most passives, we can add a by to say who didn’t notice or act … and maybe should have.
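If you wanted to go hunting for this pattern in a pile of text yourself, a regular expression gets you a surprisingly long way. The Python sketch below is a rough heuristic under stated assumptions: it has no part-of-speech information, it guesses at past participles from their endings, and the names GO_FORMS, PATTERN and find_go_un are all hypothetical.

```python
import re

GO_FORMS = r"(?:go|goes|going|went|gone)"
# A form of "go", an optional -ly adverb (e.g. "largely"), then a word that
# starts with un- and has a typical past participle ending.
PATTERN = re.compile(
    rf"\b{GO_FORMS}\s+(?:\w+ly\s+)?(un\w+(?:ed|en|d))\b",
    re.IGNORECASE,
)


def find_go_un(text):
    """Return the un- words that follow a form of 'go' in `text`."""
    return [match.group(1).lower() for match in PATTERN.finditer(text)]


print(find_go_un("The error went unnoticed for years."))
print(find_go_un("It's the kind of entry that goes largely unread."))
```

Because it relies on spelling alone, it will also pick up false positives such as go understand, and it misses the non-un- verbs with a negative meaning, so the hits would still need checking by eye.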

It’s also another pattern that can be used with a negative – back to litotes again. This seems to work in two ways – to talk about negative actions and events which won’t escape notice or shouldn't escape punishment:

But also to talk about positive actions and events that will be acknowledged or rewarded:


Digging a bit deeper, I realized that as well as the obvious negative verbs beginning with un-, the pattern also occurs with a handful of other verbs that have a negative meaning:

Flicking through the ELT grammar reference books on my shelves, it seems to go unmentioned. If you dig deep enough, it does feature in many dictionary definitions for go, like these from Cambridge and Macmillan, but I'm guessing it's the kind of entry that goes largely unread ...

Cambridge Dictionary

Macmillan Dictionary

Tuesday, December 14, 2021

Not such an uncommon pattern

I was recently researching the word unproblematic. Before I started looking at the corpus evidence, I expected that it was used to describe something that’s simple, straightforward, and uncontroversial, something that doesn’t throw up any problems. And it is, but …

As I scrolled down the concordance lines, sorting left and right as I often do when I first look at a word, I noticed a chunk of not unproblematic examples. It was a significant, but not huge proportion, so I made a mental note to investigate further when I’d dealt with the more obvious examples. As I started to look in more detail, it soon became clear that those straightforward examples, although they were there – The whole process was simple and unproblematic – were actually in the minority. What I did find was:

not unproblematic

However, the category of 'climate refugees' is not unproblematic.
However, comparing evidence from different surveys is not unproblematic.
This definition is not unproblematic, as it seems to rest on circular reasoning.
However, this approach is not unproblematic, since site reactions can cause distress to patients …

not an unproblematic + noun

Given related debates this is not an unproblematic option either.
Of course, Twitter is not an unproblematic representation of the population.
However, this is not an unproblematic undertaking.

not + (a/an) + adverb + unproblematic

balancing school and extracurricular activities is not always unproblematic
So student visas are not completely unproblematic from this point of view.
it's not an entirely unproblematic development from an editorial standpoint
The study was, indeed not wholly unproblematic
I also concluded that the idea is not so unproblematic as it might appear on first glance.
This popular mixed-mode design is not altogether unproblematic from a measurement error perspective

Miscellaneous other negatives

However, the process has not been unproblematic and has led to controversies
there can be cases where the merger cannot be characterised as unproblematic in advance.
The situation should not be thought of as unproblematic, though
That doesn't make it unproblematic
Princess Jellyfish isn't what you'd call unproblematic, but I really enjoy most of it so far

[All examples from English Web 2020 (enTenTen20) corpus via SketchEngine]

What all of these examples seem to have in common is the idea that something isn’t as simple as you might expect or as it might seem, and that in fact there may be some problems with it.
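The sorting into groups above was done by eye, but the same groupings can be roughed out in code. Here is a minimal Python sketch, assuming a hand-written list of adverbs taken from the examples above and a very crude catch-all for other negatives. PATTERNS, ADVERBS and classify are hypothetical names, and the sample lines are a handful of the corpus examples quoted above.

```python
import re
from collections import Counter

# Adverbs seen in the examples above; a real query would not hard-code these.
ADVERBS = r"(?:always|completely|entirely|wholly|so|altogether)"

# Checked in order, so the more specific patterns come first.
PATTERNS = [
    ("not unproblematic", re.compile(r"\bnot unproblematic\b")),
    ("not a/an unproblematic + noun", re.compile(r"\bnot an? unproblematic\b")),
    ("not (a/an) + adverb + unproblematic",
     re.compile(rf"\bnot (?:an? )?{ADVERBS} unproblematic\b")),
    # Crude catch-all: any negative word anywhere in the line.
    ("other negative", re.compile(r"\b(?:not|cannot|never|no)\b|n't\b")),
]


def classify(line):
    """Return the label of the first pattern that matches, else 'no negative'."""
    text = line.lower()
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return "no negative"


sample = [
    "The whole process was simple and unproblematic",
    "However, this is not an unproblematic undertaking.",
    "The study was, indeed not wholly unproblematic",
    "This definition is not unproblematic, as it seems to rest on circular reasoning.",
    "That doesn't make it unproblematic",
    "it's not an entirely unproblematic development from an editorial standpoint",
]
counts = Counter(classify(line) for line in sample)
print(counts)
```

Tallying labels like this over a whole set of concordance lines would give a quick sense of how much of the usage is negated, though the catch-all will mislabel lines where the negative has nothing to do with unproblematic.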

Delving further into other un- words that come after negatives, I found lots of similar patterns. Here are some of the most frequent combinations:


Although they don’t all work in exactly the same way and you come across different nuances of meaning in different combinations or specific examples, there does seem to be a common generalizable meaning. What many of the not + un- patterns seem to be trying to convey is:

  • a middle ground between the two antonyms – where something isn’t very problematic, common, expected, etc. but neither is it straightforward, rare, unexpected. Where that point along a scale between the two lies varies depending on context, although my feeling is it’s usually nearer to the un-


  • often the idea that something is not quite what you might expect or what it might seem. It may seem unproblematic, uncommon, unexpected, but maybe it’s not quite as much as you’d think.

And of course negatives aren’t limited to un- words – not dissimilar springs to mind – so I’m sure there’s more here to explore.

This all raises the question: have you ever seen this pattern taught, even at advanced levels? I don’t think I’ve seen it, at least not explicitly highlighted. Given it’s clearly not altogether uncommon, it’s certainly been added to my ongoing list of features to get a mention next time I’m writing something relevant.


And for those who're interested:


Monday, November 29, 2021

Words of an Odd Year

At this time of year, dictionaries announce their Words of the Year. It’s a bit of a publicity exercise, to be honest, and not something to set too much store by, but still fun to see what gets chosen. This year’s selections have been a bit of a mixed bag and many of them have left me thinking “hmm, odd choice” - but then maybe that appropriately reflects what an odd, discombobulating sort of year it’s been.


Different dictionary publishers use different criteria to choose their Words of the Year, some of which are clearly stated, some less so. Cambridge’s choice is based (largely) on the most popular word that people have looked up on their dictionary website and this year was announced as perseverance - which at first sight seems an odd choice. Its popularity was linked to the NASA Mars rover called Perseverance that landed back in February. It saw a huge spike in lookups, likely from two sources - English learners who wanted to know what the word meant and also L1 English speakers who wanted to check the spelling. To me, it feels like a slightly odd choice, regardless of the stats, just because it refers to such a specific moment, quite early in the year, but I guess it does also chime with the perseverance we’ve all had to demonstrate in living through a second year with Covid.

[Graphic divided into four squares with a word in each square: 1. Oxford: vax; 2. Cambridge: perseverance; 3. Collins: NFT (non-fungible token); 4. Australian NDC: strollout]

Collins, on the other hand, have gone down the new coinages route, choosing novel words and terms that have appeared, or at least gained a foothold, this year. Their shortlist was topped by NFT or non-fungible token - yes, exactly, neither do I! Again, it feels a slightly left-field choice, but their shortlist more generally does reflect some of the themes of the year with several tech-related words (NFT, metaverse and crypto), some pandemic words (double-vaxxed, pingdemic and hybrid working) and miscellaneous others - as I said, it has been a miscellaneous sort of a year, so maybe that’s appropriate.

Oxford went for perhaps the most obvious choice, vax, with an accompanying report into the language of vaccines. American dictionary Merriam-Webster similarly went for vaccine. Probably for many of us, it is the word that best reflects the year, but then it doesn’t provoke much debate, does it, or make you read on to find out why.

I think my favourite WOTY comes from the Australian National Dictionary Centre who plumped for strollout - apparently a term to describe the slow pace of the vaccination rollout in Australia. Yes, it’s one of those gimmicky buzzwords that was probably coined by a headline writer, but it does definitely tell you something about a time and a place.

Words of My Year

From a professional point of view, I’ve worked on a mix of projects this year that have had me delving into different types of vocabulary. I started off the year researching idioms and phrasal verbs for new editions of two books - Work on Your Idioms and Work on Your Phrasal Verbs (both for Collins). We were focusing on the most frequently-used items in each group, so not necessarily touching on low-frequency trending words. However, we did add call out to the unit on Reporting in the media, which I think has proved to be quite key this year, with unacceptable behaviour being called out in all kinds of areas of life. Here are a few key collocates I found from recent corpus data.

Examples of usage of the phrasal verb call out, without key collocates highlighted: Women are too afraid to call out bad behaviour for fear of losing a job. It came only after the company was publicly called out by several people on [social media]. This has been rightly called out as hypocritical. This behaviour must be challenged and called out.

Recently, I’ve spent more time than you would think is feasible researching prefixed words for another project. I’m not sure that any of them would be candidates for WOTY, but to continue the ‘odd year’ theme, they’ve definitely sent me off in some peculiar directions, including getting to grips with the philosophical concept behind antirationalism and trying to understand the physics of multipole.

I’ve also spoken about language change - and its relevance to ELT - at a number of events, both this year and last. What struck me when I was putting together my most recent session for TESOL France was the degree to which I needed to update my examples of coronavirus-related vocabulary. Words that had sprung up in the early days of the pandemic when we were all coming to terms with lockdowns - like coronadodging (trying to avoid people on the pavement to maintain social distance) and quarantinis (quarantine cocktails, sometimes shared with friends via Zoom) - already feel quite passé and have instead been replaced by terms that reflect the place of Covid as a mundane reality in our everyday lives - like corona-related and covid-appropriate.

On a more personal note, I think one word I’ve used a lot in 2021 has been hermity - as in, I’m getting quite hermity. (Yes, it’s a made-up word. Apparently, hermitic or hermitical is the adjective from hermit, but it doesn’t feel quite the same.) After so long staying at home, avoiding crowded places and barely travelling, I’ve definitely got used to a more isolated sort of existence and my re-entry is proving to be a slow one. Although there have been few official restrictions in the UK since the summer, I’ve felt wary about getting back to normal activities and have continued to mostly stay at home - partly out of caution and a sense of social responsibility, but if I’m honest, as much out of habit. I’m still feeling that life is very much 'on pause', so for 2022, I’m hoping that something will prompt me to 'press play' again.

