The occasional ramblings of a freelance lexicographer

Wednesday, June 15, 2022

Lexicography FAQs: how it looks from here

Yesterday, I was chatting to a couple of ELT friends and they were asking me what a lexicographer actually does day-to-day and how we create dictionary entries. It's a question I get asked a lot in different forms. So here's a very quick run-down of what my work routine actually looks like.

📃 I get sent a list of words to work on in a spreadsheet. It will depend on the project as to whether they're words that need completely new entries created or are words with existing entries to be revised or added to.

🔍 If I'm creating new entries for words that aren't already in the dictionary I'm working on, for whatever reason, I start off with a corpus search. I use the (publisher's own) corpus to research how the word is used and pin down the meaning - or meanings. For some words, the corpus data will give me enough information to come up with a definition; for more "difficult" words, I might refer to other resources to get my head around them - see my post here about hard words.

For reasons of confidentiality, this is not a screenshot of a publisher's corpus - it's the Timestamped JSI web corpus accessed using my personal account - but it uses the same Sketch Engine software we generally work with

⌨️ Then I go into an app called Entry Editor where I actually create the entry. This is a mock-up entry I just created. As you can see, there are several panels. The one on the right is where I type in the various bits of information needed to create a dictionary entry, all structured within tags. So, you can see at the top, the headword, the word that appears at the top of the entry. Below that is all the stuff about part of speech, grammar labels, regional and usage labels, spelling variants, etc. There are also spaces for the IPA pronunciations, which I don't fill in; these get added later by a pron specialist. Then underneath that you have the definition and then example sentences.


✏️ The definitions in learner's dictionaries are written using what's known as a DV or defining vocabulary. This is a list of (high frequency) words that as lexicographers we're "allowed" to use in definitions in order to make sure that the defs are as clear and simple as possible. Each project will have a slightly different DV (put together by the publisher) and slightly different rules about how to deal with words that really can't be defined without using non-DV words.
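For anyone who likes to see this kind of rule spelled out, here is a minimal Python sketch of a DV check. It is only an illustration: the tiny word list and the `non_dv_words` helper are made up for this example, and a real check would also need to handle inflected forms, which this exact-match version ignores.

```python
import re

# Hypothetical mini defining vocabulary; a real DV is far longer
# and is put together by the publisher.
DEFINING_VOCABULARY = {
    "a", "an", "the", "of", "to", "or", "that", "is", "not",
    "person", "who", "writes", "book", "gives", "list", "words", "meanings",
}

def non_dv_words(definition, dv=DEFINING_VOCABULARY):
    """Return the words in a definition that are not in the defining vocabulary."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", definition.lower())
    flagged = []
    for token in tokens:
        # Keep the order of first appearance and skip repeats.
        if token not in dv and token not in flagged:
            flagged.append(token)
    return flagged

print(non_dv_words("a person who writes or compiles a dictionary"))
# ['compiles', 'dictionary']
```

Any words it flags would then need rewording, or handling under whatever rules the project has for unavoidable non-DV words.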

🔍 After I've composed and entered the definition, then I'll fill in the other information using the corpus. Can I see that the word's mostly used in American sources? If it's a noun, is it used countably, uncountably or both? Is it usually one word or two words or hyphenated? (I'll use corpus stats to decide and add spelling variants as necessary.) Then I look at the patterns the word's typically used in - collocations, typical subjects/objects, colligation, etc. - and try to choose corpus examples that reflect the most typical uses. How many examples get added will depend to a degree on the frequency of the word (common words will typically get more examples than low-frequency ones) and the guidelines for the particular project. And example sentences will also get slightly edited to comply with corpus permissions (i.e. to make them unidentifiable) and to make them work as stand-alone sentences of an appropriate length. Then I'll add any extra bits as needed for the particular project, such as synonyms, antonyms, cross-references, usage notes, etc.
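The spelling-variant decision can be sketched in a few lines of Python. This is a rough illustration rather than how the corpus software does it: the `variant_shares` function is a made-up helper, and the sample lines are invented for the example, not taken from any corpus.

```python
from collections import Counter
import re

def variant_shares(lines, variants):
    """Count each spelling variant and return its share of all variant hits."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for variant in variants:
            # Match the variant only as a whole item, not inside a longer word.
            pattern = r"(?<![\w-])" + re.escape(variant) + r"(?![\w-])"
            counts[variant] += len(re.findall(pattern, lowered))
    total = sum(counts.values())
    return {v: (counts[v] / total if total else 0.0) for v in variants}

# Invented sample lines, standing in for concordance lines.
sample = [
    "Exercise is good for your wellbeing.",
    "The report focuses on staff well-being.",
    "Wellbeing at work matters.",
    "Her well-being comes first.",
    "They promote mental wellbeing.",
]
shares = variant_shares(sample, ["wellbeing", "well-being", "well being"])
print(max(shares, key=shares.get))  # wellbeing
```

The most frequent form would become the main spelling, with any others that have a meaningful share added as variants.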

📄 As I add text to the right-hand panel, it comes up in WYSIWYG form in the centre panel. This is useful to read through the entry in a slightly more readable form to check that everything looks okay and reads well before I save it and upload it to the publisher's database.

✅ And that's it. When I've finished one entry, I upload it and move on to the next word on my list. Harmless drudgery, but often fascinating and sometimes head-scratchingly challenging in a way that keeps a word nerd like me happy.

Footnote: scribblynonsense will, perhaps sadly, not be appearing in a dictionary anytime soon. I did check to see if I could find any corpus evidence for it, but I couldn't find a single example, so I deleted the entry without uploading it.


Tuesday, May 31, 2022

Let's jump on a call ... and other downplaying verbs

With my partner working from home a couple of days a week nowadays, I can’t help but overhear a lot of the “office speak” that I usually miss out on. He spends a lot of his time chatting to colleagues via Teams and several times I’ve heard him and others talking about “jumping on a call”, to mean having a video call. It caught my attention, not least because it seems to be another potential addition to a set of ‘downplaying’ verbs that interest me. 

So, I did a corpus search (using the Timestamped JSI web corpus 2014-2021 via SketchEngine) for “jump on” and found an interesting collection of direct object collocations that seem to fall into fairly clear sets: 

Heading: jump on. Image of four circles. In the top one, the text reads: physical leap, jump on a trampoline, a bed, the table. Downward arrows lead to two circles. In the one directly below, the text reads: quick movement, jump on a train/bus, a bike, a plane/flight. In another circle slightly to the right, the text reads: seize a chance, jump on the bandwagon, opportunity/chance, an idea. A downward arrow in line with 'quick movement' leads to the bottom circle, where the text reads: communications, jump on Twitter/Instagram/a call, social media. A final text box with arrows to the quick movement and communications circles reads: downplaying, quick, easy, no effort.

Starting with the most literal sense, there were quite a few examples of people (and animals) physically jumping on(to) things: 

children playing outside and jumping on the trampoline
My son broke his elbow a couple of months ago after jumping on his bed
Look who learned to jump on the table!
[of a cat] 

As an offshoot, there’s a whole set of expressions to do with seizing an opportunity, the most frequent of which is the idiom jump on the bandwagon. You can jump on an opportunity or chance – take it while it’s available. And journalists, politicians and media commentators generally sometimes jump on a story or some news – they eagerly take the opportunity to talk/write about it because it's interesting or controversial, etc. But for today, I want to leave this offshoot to one side. 

Getting back to movement, there was an overlap between the literal and slightly more metaphorical when it comes to transport. If we say that someone jumps on a bus or a train or a flight, we probably don’t quite visualize them leaping. Instead, the use of the verb jump here, rather than the more neutral get or catch, suggests that the journey was easy and quick and no trouble at all. It’s the kind of verb you use when you’re trying to downplay the effort or inconvenience involved. For example, if someone offers you a lift home and you tell them not to worry, you’ll just jump on the bus. 

Jumping on a train to the airport is a given in many cities
You know, not everyone can jump on a plane and come to New York
I could jump on my bike and be in the city in two-and-a-half minutes. 

The communications contexts seem to follow on from this idea of doing something quickly and easily. When it’s a video call, it seems to have the connotation that it will be easy to set up, won’t take up too much time, and will perhaps be much easier all round than arranging face-to-face meetings used to be. 

If you have questions […] we're happy to jump on a Zoom call
Then I jump on work calls at about 7.15am.
Shall we jump on a quick video call

Both the transport and video call senses emphasize ease and downplay effort or inconvenience. Which is what makes me think that jump (on) may be a candidate for a group of verbs that include nip, pop and grab, that we use to downplay actions. 

I’m just … nipping out/popping to the shop/going to grab a coffee.
Can you just … nip down to reception/pop your PIN number in/grab me some sugar? 

Interestingly, if you jump on social media, there seems to be an overlap between ease/speed/convenience and the journalistic sense of jumping on a news story, in that you’re quick to voice your (often critical) opinion: 

On Friday she jumped on Twitter to insist her words had been taken "out of context"
Chelsea fans were quick to jump on social media to condemn […] for his stamp last weekend.
She jumped on Instagram this afternoon to deliver an update on the project. 

Right, I’m just going to pop this post on my blog and then jump on social media to share it. Feel free to pop any thoughts in the comments.

Tuesday, March 15, 2022

Researching phrasal verbs: showing my workings

This is kind of the post behind the post. I recently wrote a post for the Collins ELT blog about how we updated the new edition of Work on Your Phrasal Verbs. I was asked to make it a short, simple post for a wide audience. You can read the post here - it summarizes the key changes in terms of updating the phrasal verbs we included to reflect current usage (we changed around 10% of the PV list) and also how we reworked the design to include more space for practice activities.  What the post doesn’t have space for is explaining how we did that. So, I thought I’d write this post-behind-the-post to give those of you who like that kind of thing a few more of the nerdy details.

Updating the PV list was a fairly straightforward case of checking the current corpus frequencies of the PVs in the first edition and highlighting any that had declined in usage. I say “straightforward” – actually checking the frequency of phrasal verbs is far from straightforward, but I’m not going to get into that here! Then the Collins corpus team generated a list of possible new additions based on the most high-frequency PVs that weren’t included in the first edition. I checked the most likely candidates manually, chose suitable options, and we worked out how to shuffle things around to fit the new additions into the themed units to replace those we’d dropped.

When we were initially reviewing the first edition, I'd highlighted the fact that most of the existing practice activities focused on the basic meaning of the PVs, so a largely receptive focus. With an extra page per unit for new activities, I suggested we could use the space to build on the receptive/meaning-focused exercises by adding new ones that looked more at the kind of things learners need to know to use PVs productively. For me, it was this bit that turned out to be the most interesting part of the project as I got to do original corpus research, then put it directly into practice.

In my Collins post, I pick out four main areas we focused on – and I also made some pretty graphics around them for an Instagram post, which I’m going to unapologetically reuse!

Typical Collocations:

This is a biggie because collocations are not only important for using language in a way that sounds natural - and produces predictable combinations of words that readers and listeners will expect and not ‘trip over’ – but collocations such as typical subjects and objects also tell you a huge amount about how and where the target PV is typically used. Is it used to talk about serious or fairly frivolous topics? Are the subjects positive or negative things? What types of people do this thing? Is it informal and conversational or something more likely to crop up in business communications or journalism?

Here are some of the pages and pages of notes I made as I researched each PV.

You can see very clearly that, in terms of typical objects, we drop off both objects and people:


For intransitive PVs, the subjects are obviously more interesting, like here at (not) add up:


Or here you can see what kinds of people tend to step down:


For ditransitive verbs, like remind sb of sth, I looked at both the direct and indirect objects:

And sometimes I noted down both subjects and objects as worth highlighting, such as here you can see who lays off who:


But it’s not just about the nouns. There are the adverbs too that you can see above with step down immediately/voluntarily and not really add up. Or even the quantifiers, lay off a lot of/hundreds of.

I used all this mass of information to create activities that focus specifically on collocation, like the one above, but I also tried to include the strongest collocates of each PV throughout the unit, including them in examples where they weren’t the main focus.

Colligation patterns:

These are the grammatical patterns that PVs tend to be used in. At the simplest level, do you carry on read, carry on to read or carry on reading? While scrolling through corpus lines, I often found myself noting down following patterns, some of them simple, like a following -ing form or a wh- clause as above. Some, like those at make up for, were more complex, taking in direct objects, -ing forms, wh- clauses and also a passive plus preposition combo (be made up for in …) and a preceding phrase (more than made up for …).


Other common preceding patterns I noted included modals and other introductory verbs (do they have a name?), like try to, fail to, go and

Again, some of these made their way into explicit exercises highlighting the patterns, often matching sentence halves, while others just loitered in general examples, building the picture for students of typical usage in the background.


Slightly confusingly for learners, phrasal verbs often co-occur with specific prepositions that aren’t strictly part of the phrasal verb itself, because they’re optional or vary depending on what follows. They’re at the niggly, detailed end of language learning, but they can make a real difference to how language flows and to how listeners/readers are able to process a sentence. Imagine you read “They fell out with …”: you expect what follows to be a person, not an issue, and if it isn’t, you hesitate, maybe reread, and wonder if you’ve understood correctly.

Word order:

All of the above categories apply to almost any type of word, but this last issue is uniquely phrasal-verby. ELT materials commonly teach about the difference between separable and inseparable phrasal verbs, often in a one-off section, but once students have grasped the concept, they need to know how it applies to specific PVs they come across. Does the object always appear between the verb and the particle or always after the particle? If both are possible, is there a tendency one way or the other? Can a PV only be split by a pronoun or are there other pronoun-like words that can go in-between, like everyone, things, etc.?

As you can see, I had hours of nerdy fun researching all this stuff which I then tried to cram into a few short pages. I just hope that students get out at least some of what went in!

More about the book here.



Friday, February 18, 2022

Different perspectives & the frequency illusion

Recently, for various reasons, I’ve been looking at some relatively low-frequency words. Yesterday, I had paranormally to deal with and it sparked a couple of thoughts about the way I work with language.

Most of the time when you’re researching and compiling dictionary entries, your feelings about the words you’re dealing with are fairly neutral. Sometimes, you get a fun word just because it expresses a fun idea, because it’s pleasingly onomatopoeic, or because it’s rude. Sometimes, words are awkward because they sprawl across tricky-to-split senses or have lots of confusing variants. And sometimes words are just hard to get your head around – see my recent post about hard words. Then there are words that you realize you’re biased against because they express ideas you feel uncomfortable about or, in this case, sceptical of.

Initially, I wasn’t sure that ‘paranormally’ even felt like a feasible word. Of course, ‘paranormal’ as an adjective and a noun are pretty frequent, but I couldn’t imagine how it’d be used as an adverb. So, I looked at the corpus evidence and, although it wasn’t super-frequent, there was enough evidence to investigate further. I scanned through the cites and found myself immediately drawn to the jokey, ironic, and yes, sceptical uses. When you’re compiling dictionary entries and looking for good example sentences, it’s tempting to plump for the ones you can hear yourself saying, because they feel more natural. But then you have to check yourself and remember that other people who will be looking these words up may be coming from very different perspectives.

When I come across words that for whatever reason don’t fit with my own beliefs and opinions, I try to think of someone I know who would have a different perspective. As a firm atheist, I always feel a bit squeamish about words with religious connections or connotations, but I have some more religious friends whose shoes I try to imagine myself into. In this case, I immediately thought of an old childhood friend who I know has an interest in the paranormal, as well as a whole host of other ideas I wouldn’t normally pay much attention to. So, I approached the evidence again through her eyes and gave her a couple of examples to pick to balance my more sceptical choices.

Later in the day, I was scrolling through my Facebook feed and came across this post from her: 


I was especially chuffed to see I’d correctly picked out paranormally active as a strong collocation! I also marvelled for the umpteenth time at the phenomenon of the frequency illusion. Even when I’ve been researching the most obscure of words, that I swear I’ve never come across before, they always seem to have a knack of then cropping up in context somewhere the very next day!




Monday, January 17, 2022

Lexicography FAQs: hard words

The other night I dreamt that I had an app on my phone that kept pinging obscure words at me that I had to define at the same time as I was trying to pack up to move house. 


Having been deep into a lexicography project for the past few months, I’m loving being paid to play with words all day, but at the same time, it’s clearly messing with my head!

It made me think of one of the questions that people often ask when I say I’m a lexicographer - how I know what the ‘hard’ words mean. My first response is usually that as someone who works primarily on ELT dictionaries, we tend not to get very many ‘hard’ words to deal with. Learner’s dictionaries focus more on high frequency words because they’ll be the ones learners are most likely to come across and need to know. Back in the days of print dictionaries, space constraints largely dictated which words would be included. Crudely speaking, we started with the most frequent words and worked our way down the frequency list until we’d filled the number of pages we had space for. Of course, in a more digital age, those constraints no longer apply and the distinction between learner’s dictionaries and those aimed at L1 English speakers is becoming more blurred.

Frequency notwithstanding though, at the top end of advanced learner’s dictionaries, you do sometimes come across words you’re not familiar with and over the years, I’ve worked on a number of projects that have gone beyond the confines of general ELT, such as working on the Oxford Learner’s Dictionary of Academic English and on non-ELT projects, including the Oxford Dictionary of English. So, yes, I do have to deal with words that are either right on the edges of my knowledge or that I just don’t have a clue about.

Connections and derivatives

At the easier end of the scale, you’re sometimes faced with words that are related to those already covered in the dictionary you’re working on. A lot of my work on ODE, for example, was adding or upgrading run-ons – words that are tagged onto the end of other entries but don’t get a full entry and definition of their own. So, you might find contentedly and contentedness as run-ons at the end of the entry for contented.

Oxford Dictionary of English (3e)

If you’re upgrading a run-on to a full entry, it’s sometimes as simple as adapting the definition from the existing entry to the new part of speech and adding some examples. You still need to do a corpus check though – firstly to source the examples, but also to make sure the word does just mirror its root word in terms of usage. For example, looking at corpus cites for the word deportable recently, it was immediately clear that it broadly means “can be deported”, but a closer look revealed that a person can be deportable – a deportable immigrant/foreign national – but there are also deportable crimes/offences, i.e. ones that someone can be deported for. So, both of these uses needed to be reflected in the entry.

Checking other dictionaries

Another question I often get asked is whether we just copy stuff from other dictionaries. The answer is “No, but …” Dictionaries all have their own style guidelines which dictate how definitions are worded. Learner’s dictionaries have a defining vocabulary: a list of words you’re allowed to use in definitions. There are pages of guidelines about all the other little details such as register labels, regional and spelling variants, and how grammatical information, collocations, etc. are shown. And examples are all drawn from the specific publisher’s corpus, along with guidelines about how many examples to include, how they should be edited, presented, etc.

That said, I do often consult other dictionaries to check my intuitions. I’ll always look at the corpus data first and try to form my own impression of a word, even if in the case of completely unknown words, it’s a bit of a vague one. Then I’ll look at other dictionaries, often 3 or 4 others, to see whether my impressions were on track and then go back to the corpus evidence again. Sometimes, the other dictionaries confirm quite straightforwardly what I was thinking, sometimes they highlight a usage I hadn’t spotted, so I’ll go back to the data to see whether I can find examples. Sometimes different dictionaries disagree, and very occasionally, they just don’t cover the word at all.

Other reference sources:

When it comes to specialized, technical and academic words, I often find that I need to dig a bit further. Sometimes I find myself looking at definitions from other dictionaries but feeling that I still don’t really understand the concept. That’s not to say those entries are lacking; dictionary definitions are about trying to capture the essence of a word as concisely as possible, not about explaining complex concepts in endless detail. In those cases, I seek out other reference sources, often specialist technical or academic websites that have fuller explanations. It’s one of my favourite bits of the job but can be challenging too. I’ll usually try and look at a mix of sources including some nice simple undergraduate guides if I can find them, as well as the proper technical stuff. Then when I think I’ve got my head around a concept, I’ll try and synthesize it all into an appropriate definition. When it’s an idea within the humanities and social sciences, I often feel like I’ve pinned it down well. In the hard sciences and computing though, it can feel like a half-understood approximation! I was pleased to fall back on my A level in Statistics recently when trying to understand probability density function, but found myself stumped by multi-phase and premeiotic … which is why we have:

Spec checks:

Thankfully, as lexicographers, we aren’t expected to be omni-experts! Anything we’re unsure about, we can flag up for a specialist check so it can be referred to a subject specialist … who’ll probably laugh at our lay attempts to define a techy term before they completely rewrite it!


Monday, December 27, 2021

Patterns that go unnoticed

It’s turning into a bit of a negative end to the year … not because of anything bad happening, just because I find myself deep in a flurry of words beginning with un-. In my last post, I looked at what I’ve now discovered is called litotes: the use of two negatives together to express either irony or a subtle distinction between two absolutes (not uncommon, not unpleasant, etc.). Recently, I’ve been seeing another pattern with un- prefixed words, a kind of passive construction with go + un- + past participle:


It seems to describe events that no one sees or does anything about, things that are missed or ignored. In terms of form, it’s a bit like the get passive that we’re all familiar with, but this time the focus is on the lack of an agent doing anything. Although like most passives, we can add a by to say who didn’t notice or act … and maybe should have.
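If you wanted to hunt for this pattern in plain text rather than in a corpus tool, a rough Python sketch might look like the one below. The `find_go_un` helper and its regular expression are assumptions made for illustration only. It looks for a form of go, an optional -ly adverb, and then any word beginning with un-, so it will also pick up false hits such as go under, which a proper corpus query using part-of-speech tags would filter out.

```python
import re

# A form of "go", an optional -ly adverb, then a word starting with "un".
GO_UN = re.compile(
    r"\b(go|goes|going|went|gone)\s+(?:(\w+ly)\s+)?(un\w+)\b",
    re.IGNORECASE,
)

def find_go_un(text):
    """Return (form of go, adverb or None, un- word) for each match in the text."""
    return [
        (m.group(1).lower(), m.group(2), m.group(3).lower())
        for m in GO_UN.finditer(text)
    ]

print(find_go_un("it seems to go unmentioned"))
# [('go', None, 'unmentioned')]
print(find_go_un("the kind of entry that goes largely unread"))
# [('goes', 'largely', 'unread')]
```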

It’s also another pattern that can be used with a negative – back to litotes again. This seems to work in two ways – to talk about negative actions and events which won’t escape notice or shouldn't escape punishment:

But also to talk about positive actions and events that will be acknowledged or rewarded:


Digging a bit deeper, I realized that as well as the obvious negative verbs beginning with un-, the pattern also occurs with a handful of other verbs that have a negative meaning:

Flicking through the ELT grammar reference books on my shelves, it seems to go unmentioned. If you dig deep enough, it does feature in many dictionary definitions for go, like these from Cambridge and Macmillan, but I'm guessing it's the kind of entry that goes largely unread ...

Cambridge Dictionary

Macmillan Dictionary


Tuesday, December 14, 2021

Not such an uncommon pattern

I was recently researching the word unproblematic. Before I started looking at the corpus evidence, I expected that it was used to describe something that’s simple, straightforward, and uncontroversial, something that doesn’t throw up any problems. And it is, but …

As I scrolled down the concordance lines, sorting left and right as I often do when I first look at a word, I noticed a chunk of not unproblematic examples. It was a significant, but not huge proportion, so I made a mental note to investigate further when I’d dealt with the more obvious examples. As I started to look in more detail, it soon became clear that those straightforward examples, although they were there – The whole process was simple and unproblematic – were actually in the minority. What I did find was:

not unproblematic

However, the category of 'climate refugees' is not unproblematic.
However, comparing evidence from different surveys is not unproblematic.
This definition is not unproblematic, as it seems to rest on circular reasoning.
However, this approach is not unproblematic, since site reactions can cause distress to patients …

not an unproblematic + noun

Given related debates this is not an unproblematic option either.
Of course, Twitter is not an unproblematic representation of the population.
However, this is not an unproblematic undertaking.

not + (a/an) + adverb + unproblematic

balancing school and extracurricular activities is not always unproblematic
So student visas are not completely unproblematic from this point of view.
it's not an entirely unproblematic development from an editorial standpoint
The study was, indeed not wholly unproblematic
I also concluded that the idea is not so unproblematic as it might appear on first glance.
This popular mixed-mode design is not altogether unproblematic from a measurement error perspective

Miscellaneous other negatives

However, the process has not been unproblematic and has led to controversies
there can be cases where the merger cannot be characterised as unproblematic in advance.
The situation should not be thought of as unproblematic, though
That doesn't make it unproblematic
Princess Jellyfish isn't what you'd call unproblematic, but I really enjoy most of it so far

[All examples from English Web 2020 (enTenTen20) corpus via SketchEngine]

What all of these examples seem to have in common is the idea that something isn’t as simple as you might expect or as it might seem, and that in fact there may be some problems with it.
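For those who like to see the sorting done mechanically, here is a minimal Python sketch that splits concordance lines into negated and plain uses. It is an approximation under stated assumptions: the `split_by_negation` helper and its regular expression are invented for this illustration, and they only cover the first three patterns above (not unproblematic, not an unproblematic + noun, and not + (a/an) + one adverb + unproblematic). The miscellaneous negatives, like doesn't make it unproblematic, slip through as plain uses and would still need checking by eye.

```python
import re

# "not", an optional a/an, an optional single word, then "unproblematic".
NEGATED = re.compile(
    r"\bnot\s+(?:an?\s+)?(?:\w+\s+)?unproblematic\b",
    re.IGNORECASE,
)

def split_by_negation(lines):
    """Sort concordance lines into negated and plain uses of 'unproblematic'."""
    negated, plain = [], []
    for line in lines:
        (negated if NEGATED.search(line) else plain).append(line)
    return negated, plain

# Lines taken from the examples in this post.
lines = [
    "The whole process was simple and unproblematic",
    "However, this is not an unproblematic undertaking.",
    "So student visas are not completely unproblematic from this point of view.",
    "it's not an entirely unproblematic development from an editorial standpoint",
    "That doesn't make it unproblematic",
]
negated, plain = split_by_negation(lines)
print(len(negated), len(plain))  # 3 2
```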

Delving further into other un- words that come after negatives, I found lots of similar patterns. Here are some of the most frequent combinations:


Although they don’t all work in exactly the same way and you come across different nuances of meaning in different combinations or specific examples, there does seem to be a common generalizable meaning. What many of the not + un- patterns seem to be trying to convey is:

  • a middle ground between the two antonyms – where something isn’t very problematic, common, expected, etc. but neither is it straightforward, rare, unexpected. Where that point along a scale between the two lies varies depending on context, although my feeling is it’s usually nearer to the un-


  • often the idea that something is not quite what you might expect or what it might seem. It may seem unproblematic, uncommon, unexpected, but maybe it’s not quite as much as you’d think.

And of course negatives aren’t limited to un- words – not dissimilar springs to mind – so I’m sure there’s more here to explore.

This all raises the question: have you ever seen this pattern taught, even at advanced levels? I don’t think I’ve seen it, at least not explicitly highlighted. Given it’s clearly not altogether uncommon, it’s certainly been added to my ongoing list of features to get a mention next time I’m writing something relevant.


And for those who're interested:

