The occasional ramblings of a freelance lexicographer

Monday, July 19, 2021

Coronavocab: the pingdemic

I haven’t posted about coronavocab for a while, largely because the corpus I was using to track new words last year hasn’t kept up – and currently only has data up to Jan 2021. I should say that the folks who’ve been putting together the Timestamped JSI Web Corpus did an amazing job keeping pace last year, so it’s really no surprise that they’ve finally had to slow down.

Nevertheless, I couldn’t resist posting about this latest new coinage, just because it’s such a great creation. So, this post is based on anecdotal evidence and some googling. To explain the pingdemic though, I need to take a step back first.

be/get pinged (verb, usually passive) 

In the UK, the National Health Service (NHS) introduced a phone app last year that’s designed to track coronavirus infections. Apparently, it’s been downloaded by millions of people. It works in two main ways. Firstly, it uses Bluetooth to detect whether you’ve been in close contact with someone who later tests positive for Covid-19, i.e. you’ve been close to them for more than about 15 minutes.

You also use it to check in to places like restaurants by scanning a QR code, so that if there’s an outbreak connected to the venue, you can be alerted.


And if you receive a notification via either of these methods, you get pinged to tell you to self-isolate for 10 days.

If I get pinged, there is only me - so the shop would have to close. (BBC)

More than half a million people in England were "pinged" by the NHS Covid-19 app in a single week (ITV news)

Employers report staff shortages as thousands of workers pinged (Personnel Today)

What do I do if I am pinged by the NHS Covid-19 app? (The Times)

More people in Leeds were 'pinged' by the NHS Covid app than anywhere else in the country last week (Yorkshire Evening Post)


In recent weeks, the number of infections in the UK has increased dramatically – although thankfully, with many people now vaccinated, those cases have largely been mild or asymptomatic. Along with that, thousands of people are getting pinged every day resulting in a pingdemic where increasing numbers of people are being told to self-isolate, leading to many businesses being short-staffed or even forced to close.

NHS Test and Trace is penetrating through walls and forcing neighbours to self-isolate in the latest sign of a ‘pingdemic’. (Metro)

Amid both the pingdemic and face mask rule confusion, we asked our readers what they are most worried about this summer (Telegraph)

Bin collections have also been hit by the pingdemic in Wyre council in Lancashire. (Guardian)

The cause appears to be the so-called “pingdemic”, with essential staff being told to self-isolate because they have been in contact with a coronavirus case. (Independent)

Inevitably, this is going to be a fairly short-lived new term, but I still find it oddly pleasing – what with the light-hearted nature of the word ping which brings a bit of fun amongst the gloom and also because it just mirrors the word it’s based on so nicely. 

double-jabbed and fully-vaxxed (adjectives)

One reason why the pingdemic might not be with us for too much longer is that in a few weeks, the guidelines are due to change so that people who’ve been double-jabbed and fully-vaxxed will no longer have to isolate and will instead just be asked to take a Covid test.

From 16 August, double jabbed individuals and under 18s will no longer need to self-isolate if they are identified as a close contact of someone with COVID-19 (UK government website)

Double-jabbed Britons have been given the green light to jet to holiday hotspots like Greece and Italy this summer without quarantine. (Evening Standard)

People who have not been double jabbed will have to test negative for COVID-19 within 24 hours of arrival in France if they are travelling from the UK, Spain, Portugal, Cyprus, the Netherlands and Greece. (Sky News)

The phenomenon of the double x in vaxx (short for vaccine/vaccinate) is an odd one. Other verbs ending in x don’t double their final consonant: taxed, faxed, relaxed. It’s not a new coinage as it’s been used in the context of anti-vaxxers (people opposed to vaccination) for several years – and it may have originated in the controversial anti-vaccine film “Vaxxed”. Whatever its origins, in the past few months it’s blossomed - and freed itself of those connotations.

Two-thirds of adults in the UK are now double-vaxxed (City AM)

Canada may let fully vaxxed Americans visit next month (The Suburban)

Work ongoing to understand the profile of fully vaxxed people with severe outcomes. (Metro)

Can anyone out there confirm I have this right for what is needed to travel from UK to Majorca next week please - both adults double vaxxed last jab a month ago. (Mumsnet)

Interestingly, that last example, a comment from an online forum, shows how while pingdemic is probably more one for journalists and headline writers, being pinged, jabbed and vaxxed have all passed into people's everyday vocabulary.

We’ll have to wait and see how long the pingdemic lasts and what new coronavocab coinages continue to emerge as the situation develops …


Tuesday, June 29, 2021

“Then we’ll write the dictionary”: underestimating the lexicographic task

Last year, I became a member of the expert panel for the AS Hornby Trust Dictionary Research Awards (ASHDRA). The awards are designed to fund dictionary-related research – that might include research into dictionary usage or research aimed at developing new resources, for example in areas not covered by conventional dictionaries or for under-resourced contexts.

Over the past few weeks, I’ve had a flurry of Zoom meetings with my fellow panel members to discuss this year’s applications and decide which projects to fund. It was fascinating to read all the different proposals that came in from around the world and to discuss their merits and drawbacks. There are a whole range of criteria used to assess the proposals – which I don’t plan to go into here – but this year, one issue seemed to come up across a number of the applications. In projects that had some kind of resource as an end result – not necessarily a full-scale dictionary, but often a vocabulary reference for a specific context – there was an underestimate of how much time, work and expertise goes into producing a good lexicographic resource – on whatever scale.

Time and again, I found myself reading proposals that started off with an interesting aim, a solid foundation in existing research and theory, and a strong proposal for the initial research stages – involving collecting data, reviewing existing resources, maybe creating corpora, conducting interviews/questionnaires with stakeholders (such as teachers), analysing data to create word lists, etc. But then when it came to producing the actual resource, there was often just a couple of sentences which amounted to not much more than “and then we’ll write the dictionary”. Having worked as a lexicographer and materials writer for more than 20 years, my reaction was often “Woah! Hang on a moment – do you realize just how much goes into compiling a dictionary?”

It often seemed to be the case that little detailed thought had gone into the design and format of the resource that would result from all the research. And perhaps of even more concern, there was rarely any mention of plans to pilot the resource with learners to see if it was something they could and would use. Some of the kinds of questions that sprang to my mind included:

Questions:
1. What about design and format? What will an entry actually look like on the page/screen?
2. How much information will you include in each entry? Too much may be confusing, not enough is unhelpful.
3. How will you make the information clear and accessible to learners? There’s no point including details which users don’t understand or notice and so ignore.
4. How will you pitch the content appropriately to your target audience? What’s right for university students won’t be the same for young learners. Lower-level learners will need a different approach to higher levels.

More questions:
5. Remember that what seems clear and obvious to an academic linguist caught up in language research may not be so appealing to your average learner, who probably just wants a quick and simple answer to their look-up.
6. Will your format work equally for different types of words (function words, concrete/abstract, phrases, multi-sense words …)? Can you find a format that’s consistent but flexible enough to deal with these differences?
7. Will you use a defining vocabulary? What about your defining style (traditional, full sentence or a pragmatic mix)? Will you create a style guide?
8. What about images – will you commission illustrations or use photos? Where from? Remember commissioned artwork and stock photos both cost money. And don’t forget about copyright issues!

I could go on and on. As I looked at the specific challenges of different projects, different issues sprang to mind. Creating a useful reference resource isn’t as simple as throwing the results of research down on paper.

So, how could applicants have got around this issue? In discussing cases where someone had a really promising idea but underestimated the lexicographic part of the project, one potential solution we came up with was a more scaled-back proposal that could effectively become a pilot study. In the same way that a commercial publisher would usually start off with a sample to be reviewed and piloted, researchers could put together just a small number of entries of their planned resource to pilot with students and teachers in order to work through some of the issues above, to try out different designs and formats, and hopefully, come up with something that really works for their target learners.  At the end of this process, they would come out with a solid sample that they could use as a proof of concept to move forward and seek further funding for a full-scale project. This would also, hopefully, give them a clearer idea in terms of where to focus their research efforts to create the final resource and so, to a degree, avoid wasted effort.

From my perspective, the process of assessing and discussing the proposals has been an interesting opportunity to reflect on my own accumulated knowledge as a lexicographer; all those things you absorb over the years and start to take for granted as an ‘obvious’ part of the process of creating a vocabulary resource, but which perhaps aren’t so obvious after all.



Monday, June 07, 2021

Coronashift: working (and earning) through a pandemic

It's that time of year when I sit down to do my accounts - the UK tax year ends at the start of April so I usually get round to totting everything up to submit my tax return around now. Before I get down to the serious book-keeping though, I just spent a couple of hours making myself some graphs to see how work's panned out over the past year.

Most years, I make a graph for myself to see how my different sources of income break down. It's partly just out of curiosity, but it's also useful for tracking where the main focus of my work has been and assessing whether it's the kind of balance I'm after. This year, of course, has been a bit different with the coronavirus pandemic breaking out just before the start of the 2020-21 tax year.

So, below are the graphs for April 2019-2020 - to give a pre-pandemic comparison - and then for April 2020-2021:

The main points to come out seem to be:

Grants: I had a long patch of 4-5 months last summer with almost no work at all as publishers cancelled or paused projects. I was luckily able to claim government grants for the self-employed. So these made up nearly a quarter of the year's overall income.

Talks & training: Around 10% of my income in an average year is generally made up of talks and training in some form; at conferences, events, workshops, etc. This year, for obvious reasons, that dropped off a cliff and made up less than 1% of my income (for a single paid webinar). 

Royalties: These were down both as a percentage of my income and in real terms. With everything going on, some teaching cancelled and the rest shifting online, people haven't been buying new ELT books. Publishers' reps haven't been able to get out to chat to teachers and schools, bookshops have been closed, some publishers have even struggled at points to get books printed or moved around the world. So, royalties for writers have dropped and because they're paid in arrears, I suspect they'll continue to go down before they start to recover.

Writing vs. consulting: In terms of the kind of projects I worked on, it looks like there was quite a big shift from lots of consulting to more writing. 'Consulting' is a bit of a 'miscellaneous' category for work I do for publishers which isn't really materials writing. It might involve reviewing, giving input on syllabus or word lists or the like. One project that slightly skewed the 2019-20 figures was my work on the Oxford 3000 word list and the position paper I wrote for OUP. I've lumped it all in as consulting, even though the final bit involved writing the paper for publication, just because it was all part of one project. Over the past year or so, the extra writing has come from four main writing projects - creating writing workshops for the Oxford Discover Futures student books (levels 5 and 6), plus two forthcoming projects which I'll post more about when they're published.

The first couple of months of the 2021-2022 tax year have been very quiet so far with only a few odd bits and pieces of work; a couple of online talks, some blog posts and quite a bit of (unpaid) work in my role with the Hornby Trust. Fingers crossed though there's a new project in the pipeline which might see a whole new category added to next year's chart and hopefully, a bounce back in the talks and training category whether that's online or maybe even in-person.


Tuesday, May 04, 2021

Text checkers: an overview

I’ve been mulling over a post about text analysis tools for ages but kept putting it off because I felt like I should research all the different tools out there thoroughly first. A recent post by Pete Clements though has forced my hand, so here’s my thoughts on the tools I have seen and tried. I should also say that I’m just focusing on the vocab aspect of the tools, not any other analysis features they have such as readability scores and the like.

So, what is a text analyser? Basically, it’s an online tool that allows you to cut and paste a text that you’d like to use with students into a box. You hit a button and it comes back with stats about the text. In particular, what most ELT materials writers are interested in is the level of the vocab. We’re usually looking for a breakdown by CEFR level to tell us whether the text is suitable for a particular class/level and which words might be “above level”.

Before you use any kind of text analysis tool though, here are some basics to bear in mind:


It’s really important to understand how the tool you use is making those judgements about level. Most tools use some kind of word list that’s been developed to peg individual words to CEFR levels. It goes without saying that this in itself is fraught with problems – my blog post here looks at some of them. But if we’re accepting the basic premise of using a word list, then you need to know which one. If you can’t find out which list a tool is using, then I’d probably say, don’t use it because you can’t know what it’s showing you.

A number of tools use Cambridge’s English Vocabulary Profile (EVP) list – the key thing to understand about EVP is that it ranks words (largely) by productive level – so the level at which you might typically expect a student to be using a word themselves. Given the way we acquire vocab, that might be a level (or two) after students recognize and can understand the same word receptively, i.e. if they read it. The Oxford 3000, on the other hand, ranks vocab more by receptive level, so the point at which students will typically be able to read and understand a word.
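To make the word-list idea concrete, here is a minimal sketch of the kind of look-up a text checker might do. It is not how any of the tools below actually work: the function name, the tiny `TOY_WORD_LIST` and its layout are assumptions for illustration, and the two levels in it are simply the ones quoted later in this post.

```python
import re
from collections import Counter

# Toy word list for illustration only. A real tool would load a published
# list (EVP, Oxford 3000, ...); these two entries use levels quoted in this post.
TOY_WORD_LIST = {"feel": "A1", "contract": "B1"}

CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

def profile(text, word_list, target_level):
    """Count tokens per CEFR level and collect words above target_level."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    above = []
    limit = CEFR_ORDER.index(target_level)
    for tok in tokens:
        level = word_list.get(tok)  # None when the word is not on the list
        counts[level or "unlisted"] += 1
        if level is not None and CEFR_ORDER.index(level) > limit:
            above.append(tok)
    return counts, above

sample = ("Some people who contract this virus can feel very poorly "
          "for three to four weeks.")
counts, above = profile(sample, TOY_WORD_LIST, "A2")
```

Everything the sketch reports depends on which list is plugged in, which is why not knowing the list behind a tool is such a problem.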

Text analysers use clever algorithms to analyse the text you input, but these have a number of shortcomings it’s really important to be aware of:


Starting at the most basic level, the tools don’t always correctly identify the part of speech of a word, especially words that have the same forms across parts of speech. So, weather is most frequently a noun but can also be a verb, and national is mostly an adjective but can be a noun (a foreign national). Most tools will opt for the most common form and label its level accordingly. I put in the sentence “Some people who contract this virus can feel very poorly for three to four weeks.” and most of the tools identified contract here as a noun and labelled it as around B1, when in fact it’s a verb and EVP pegs it at C2.

Text Inspector: Out of all of the tools, the only one I’ve found to really deal with this issue is Text Inspector which allows you to click on any word in a text that looks like it’s been tagged incorrectly and choose the correct use (meaning and part of speech) from a drop-down menu.  Of course, that means you have to spot the incorrectly tagged words, but it’s better than most. 


Oxford Text Checker: If you hover over a word in your text with more than one possible part of speech, the Oxford Text Checker shows a box giving the CEFR level of each one, e.g. v = A2, n = B1, although it only shows the level for the most basic meaning of the word (see multi-sense words below).

Many tools also fail to label certain words, especially function words. So, in the sentence “Others end up in hospital needing oxygen.”, many tools left others without a level because they just weren’t sure what to do with it grammatically. Contractions (they’ve, she’d, who’s) also tend to go unlabelled, but are rarely a big issue.
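The part-of-speech problem can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary layout, `MOST_COMMON_POS` and both function names are hypothetical, and the levels are the ones quoted in this post (contract from the EVP-based tools, feel from the Oxford Text Checker), mixed together purely for the example.

```python
# Levels quoted in this post; the data layout itself is an assumption.
POS_LEVELS = {
    "contract": {"noun": "B1", "verb": "C2"},
    "feel": {"verb": "A1", "noun": "B2"},
}

# The part of speech a tool falls back on when it does not tag the sentence
# (assumed here for the sketch).
MOST_COMMON_POS = {"contract": "noun", "feel": "verb"}

def naive_level(word):
    """Return the level of the most common part of speech only."""
    entry = POS_LEVELS.get(word)
    if entry is None:
        return None  # words missing from the list simply go unlabelled
    return entry[MOST_COMMON_POS[word]]

def pos_aware_level(word, pos):
    """Return the level for a specific part of speech, if it is listed."""
    return POS_LEVELS.get(word, {}).get(pos)
```

With this toy data, `naive_level("contract")` gives B1 while `pos_aware_level("contract", "verb")` gives C2, which is the gap described above, and a word like others that isn't in the list comes back as `None`.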


English is a highly polysemous language; lots of words have multiple meanings which students are likely to come across and recognize at different levels. Most published word lists take this into account and assign different level labels to different meanings. Most text analysers though just opt for the most frequent (and usually lowest level) sense. We’ve already seen that to an extent with contract above, but even without the part of speech issue, if you put in a sentence like “There are links in the table below.”, table will be shown as A1 (the piece of furniture) rather than A2/B1 (a graphic).

Text Inspector: As we saw above, Text Inspector gets around this by offering drop-downs for any words you suspect may be used in a less obvious sense.
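As a rough sketch of the difference between a default sense and a manually chosen one, the snippet below stores a level per sense and lets the user override the default. It is not Text Inspector's implementation: the sense labels and function names are hypothetical, "lowest level" stands in for "most frequent sense" as a simplification, and the graphic sense of table is set to A2 arbitrarily (the range given above is A2/B1).

```python
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

# One level per sense; sense labels are made up for the sketch.
SENSES = {
    "table": {"furniture": "A1", "graphic": "A2"},
}

def default_level(word):
    """Return the lowest level among a word's senses, as most tools do."""
    senses = SENSES.get(word)
    if not senses:
        return None
    return min(senses.values(), key=CEFR_ORDER.index)

def level_with_override(word, chosen_sense=None):
    """Use a manually chosen sense when given, else fall back to the default."""
    if chosen_sense is not None:
        return SENSES[word][chosen_sense]
    return default_level(word)
```

The override only helps if someone notices that the default sense is wrong in the first place, which is the same caveat as with the drop-downs.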


For me, the biggest issue to look out for with text analysers is that they mostly treat words individually and ignore the fact that a large proportion of most texts (30-50% by some estimates) is made up of chunks; phrasal verbs (end up, carry on), mundane phrases (of course, as usual, a lot of) and idioms (under the weather, have no idea). And of course, a phrase is often going to have a very different level from the sum of its parts.

Unlike with the more glaring mis-tags of part of speech and meaning, I think multi-word items are far more difficult to spot because as expert speakers, we tend to read through them without noticing, but for students, an unknown phrasal verb can be a real stumbling block. It takes a keen eye to spot every phrase and phrasal verb in a text when it hasn’t been tagged.

Text Inspector: Again, the only tool to rate a bit better here is Text Inspector. It does at least manage to identify some phrases. In the sample text that I’ve been using for this post, it correctly picked out the phrasal verbs end up and carry on and the phrase have no idea.

It didn’t recognize pass it on (presumably because of the object in-between), but you can click on pass and choose the phrasal verb sense from the drop-down. Similarly, it didn’t pick up under the weather, but again, you can click on weather and select the idiom and it changes weather from A1 to C2. It doesn’t allow you to neatly link up the whole phrase (I don’t think), but it’s a reasonable compromise.
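A simple way to see why some phrases get caught and others don't is to match a phrase list against the text as contiguous word sequences. The sketch below assumes that approach; I don't know how any of the tools actually detect phrases, and the `PHRASES` list and function name are made up from examples in this post.

```python
import re

# Phrases mentioned in this post; a real tool would need a far longer list.
PHRASES = ["end up", "carry on", "have no idea", "under the weather", "pass on"]

def find_phrases(text, phrases=PHRASES):
    """Return listed phrases that occur as contiguous word sequences."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Try longer phrases first so a long match is not cut short.
    by_length = sorted((p.split() for p in phrases), key=len, reverse=True)
    found = []
    i = 0
    while i < len(tokens):
        for words in by_length:
            if tokens[i:i + len(words)] == words:
                found.append(" ".join(words))
                i += len(words)  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts here, move on one word
    return found
```

A matcher like this finds end up in “Others end up in hospital needing oxygen.” but returns nothing for “You could pass it on.”, because the object splits the phrasal verb. It also has no handling of inflected forms, so ended up would be missed too.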



You’ll probably have gathered by this point that Text Inspector is very clearly out in front when it comes to analysing vocab from an ELT perspective. I subscribe to the paid version which gives you full functionality. You’ll find a link to a free version in Pete’s post which does much the same, but I’m not going to reshare it because, well, I think we should be paying for the good stuff and it’s a very small amount to invest for a really useful resource.

Here’s a brief overview of some of what’s out there though:

Text Inspector

Free version has limited functionality and doesn’t give CEFR analysis. Sign up for the paid version to get everything via: https://textinspector.com/

Word lists: The paid version allows you to analyse the vocab in a text in terms of EVP, AWL (the Academic Word List), BNC and COCA – these last two are corpora and it shows you the frequency of words as they appear in each corpus – useful if you’re into corpora.

Comments: By far the best I’ve seen in terms of at least trying to take into account the factors above.

Oxford Text Checker

Free via: https://www.oxfordlearnersdictionaries.com/text-checker/ (If you get to the main dictionary home page, click on Resources to find it)

Word lists: Based on the Oxford 3000 & 5000.

Comments: Easy to use and colour codes words by CEFR level. However, it always opts for the most common form/meaning of a word and doesn’t recognize phrases. If you hover over a word, it does at least show different CEFR options for different parts of speech, e.g. hovering over feel, you get a box showing v=A1 n=B2. You can also double-click on any of the words in your text to go direct to the dictionary entry which is useful for quickly checking the CEFR label against different meanings. It also has options to create word lists and activities from texts, but given the shortcomings, I wouldn’t be inclined to use them without heavy editing.



VocabKitchen

Free via: https://www.vocabkitchen.com/profile

Word lists: It shows words on the AWL and NAWL (New Academic Word List). It also claims to show words by CEFR level, but I can’t find out what word list it’s using which for me is a bit of a red flag.

Comments: It’s intuitive and easy to use, but again doesn’t account for different meanings or phrases. I believe it has more options if you register and sign in which I haven’t tried out.

EDIA Papyrus

Free but you need to register via: https://papyrus.edia.nl/

Word lists: This site is based on a mix of experts’/teachers’ assessments of the level of texts and AI.

Comments: Quite a nice interface, but it seems to skip quite a few words in your input text - not just function words; it also completely ignores the phrasal verb end up. And as above, it doesn’t deal with different meanings or phrases.



Lextutor (VocabProfile)

Free via: https://www.lextutor.ca/vp/

Word Lists: Originally designed for corpus geeks, the main focus for this tool is around corpus frequencies and the AWL. It does now include a CEFR option, but reading through the blurb, the CEFR levels seem to be based on some very old (1990) word lists published by Cambridge way back before this became a properly researched area, so I’m not sure how useful they are.

Comments: A horrible user interface, still really for geeks only. It's so messy, I couldn't even get a meaningful screenshot.

Pearson/GSE Text Analyzer

Free via: https://www.english.com/gse/teacher-toolkit/user/textanalyzer

Word lists: based on Pearson’s own Global Scale of English (GSE) lists

Comments: I hesitated to even include this as it’s just plain weird – unless I’ve missed something. It calculates an overall level for your text but doesn’t show the level of individual words. It does highlight words that it judges to be ‘above level’, but the choices seem to be a bit random. It pegged my sample text at B1+, then picked out poorly and passing as above level, ignoring asymptomatic.


