The occasional ramblings of a freelance lexicographer

Sunday, November 12, 2023

The cost of speaking: ELT conferences as a freelancer

Recently, I’ve seen a number of posts and calls for papers for upcoming ELT conferences and had a few online exchanges with colleagues trying to encourage me to sign up for events coming up in 2024.

I love going to ELT conferences. As a freelancer who spends most of the year working alone at home, it’s a really good opportunity to catch up with ELT colleagues on both a social and professional level. It’s a good way to get a feel for what’s going on in the industry – sometimes just confirming what I’ve been seeing on social media, but occasionally throwing up new perspectives and information as well. And it’s a really important networking opportunity – maintaining contacts, making new ones, finding out who’s moved where in the ELT publishing merry-go-round, and talking to the right people about potential new projects.

Speaking at an event boosts all of those things. If your name’s in the programme, it flags to people you’re there and they’re more likely to seek you out, whether or not they come along to your session. It raises your profile and reminds people of your area of expertise. It also gives you a focus for the event.

However … attending and speaking at a conference is really expensive, not just in terms of the obvious costs of registration and travel, but also the indirect costs, which are often far more significant.

The surface costs: Registration fees can vary enormously depending on the event. Sometimes speakers get free registration, but often they don’t. There’s travel and accommodation. Then you can add on all the miscellaneous costs of things like food and endless cups of coffee!

The prep: For me, this is the biggest ‘cost’ of speaking at an event. To prepare a new talk from scratch takes a lot of time:

· initial time to come up with an idea and sketch out notes

· putting together a proposal. Proposal forms vary – some are quite simple and just require a title and summary, others are more long-winded with abstracts and summaries and various other questions to be answered, all within strict word limits.

· putting together your actual session, deciding on a structure, doing your research, designing your slides and deciding what you’ll say to go with each one.

· rehearsing and working out your timings, then rehearsing again until you’re confident in what you’re going to say.

Nowadays, there’s often a requirement to do some form of promotion for your session. Increasingly, that’s a video which you’re told “won’t take long”, but in my experience, usually swallows up at least a whole morning – deciding what to say, getting set up, doing a few takes, uploading and doing a bit of editing, then posting or sending on the final video.

How long all this takes depends, in part, on whether it’s a completely new talk or a new version of a talk you’ve given before. Occasionally, I’ve been able to reuse talks several times, but that’s dependent on whether the session fits in with the theme of an event, whether it’s still up-to-date and relevant, and whether a sponsor is happy for you to repeat it. (I’ll come on to the issue of sponsors in a bit.) But I’d say anywhere between 15 and 25 hours’ prep is what I’d expect.

As someone managing a chronic health condition, I have to strictly limit the hours I spend at my desk each week, so prep for a conference can’t just be added on top of my regular working hours. Of course, that time should count as ‘proper work’ whatever your circumstances, but in my case, it literally replaces paid hours on other jobs. So, if it takes me a week or, more likely, a week-and-a-half’s worth of hours, I lose that much paid income.

The time out: As a freelancer, I don’t get paid for the days I’m actually at the event, both the days there and the travel days. So that’s more lost income to factor into the cost.

Add all of that together, counting the prep hours and the days out at my usual hourly rate, and it can easily come to the equivalent of a whole month’s income for a long event like IATEFL. The same goes for a foreign trip, with the extra travel and accommodation costs. And that’s not some theoretical calculation, it’s real lost income. If I give an unsponsored talk at IATEFL, that effectively means no income for April that year. And yes, I know it’s an investment in my business, but that’s a big hit when you’re already trying to support yourself on a part-time income.
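For anyone who wants to run the numbers for their own situation, the sum can be sketched in a few lines of Python. This is only a rough illustration: the function name and every figure below (hourly rate, hours per day, days away, direct costs) are hypothetical placeholders rather than my actual numbers. Only the 15 to 25 hours of prep comes from the estimate given earlier in this post.

```python
def speaking_cost(prep_hours, days_away, hours_per_day, hourly_rate,
                  direct_costs=0.0, sponsor_contribution=0.0):
    """Estimate the net cost of speaking at an event.

    Lost income = (prep hours + working hours missed while away) x hourly rate.
    Direct costs (registration, travel, accommodation, food) are added on top,
    and anything a sponsor pays is subtracted.
    """
    lost_income = (prep_hours + days_away * hours_per_day) * hourly_rate
    return lost_income + direct_costs - sponsor_contribution


# Hypothetical placeholder figures, for illustration only.
low = speaking_cost(prep_hours=15, days_away=5, hours_per_day=5,
                    hourly_rate=30, direct_costs=600)
high = speaking_cost(prep_hours=25, days_away=5, hours_per_day=5,
                     hourly_rate=30, direct_costs=600)
print(f"Estimated cost: between {low:.0f} and {high:.0f}")
```

Swapping in your own rate and hours, and adding a sponsor contribution where there is one, gives a quick sense of how much of the gap a sponsor actually closes.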

Sponsors: All of which is why, where possible, I try to speak on behalf of publishers. That doesn’t necessarily cover all the costs – amounts offered for prep rarely cover all the time and they usually only cover one night’s accommodation and one day’s ‘income’ – but it makes it more manageable. Speaking on behalf of a publisher, though, is contingent on you having recently worked on a project that they want to promote and on the talk fitting in with their marketing plans. Flagship new coursebook series are always going to attract more marketing budget than the kind of smaller, niche vocab materials I tend to work on!

Speaking for a sponsor also relies on the event accepting proposals for ‘promotional’ talks. I was looking at one event this week which specifically rules them out – which I understand, but also rules me out as a speaker.

So, yes, I’d love to come along to all those great events out there, but budgets for 2024 will likely mean just IATEFL and maybe one more.


Monday, October 23, 2023

Diversity and inclusion: what you exclude

In recent years, the concepts of diversity and inclusion have become increasingly spoken about in the context of ELT. As a lexicographer working on learner’s dictionaries, I often find there’s less scope to put these ideas into practice than in other ELT materials – we don’t have illustrations or discussion topics or characters taking part in dialogues, etc. We do, of course, have to deal with vocabulary that relates to a whole range of topics – no PARSNIPS in dictionaries! – so sensitivity is required in how we deal with those entries. One area where I’m especially aware of who I include and how is in the selection of example sentences.

Dictionary examples and inclusion:

Each numbered sense of each entry in a learner’s dictionary has a number of example sentences to illustrate how the word or phrase is used. They aim to back up the definition in showing what the word means and also to exemplify how it’s most commonly used – its typical context(s), genre(s), collocations, grammatical forms and patterns, etc.

Dictionary examples are, by necessity, short and as a result, tend to lack context. That can make it difficult to show very much about the people or groups who get mentioned. A recent social media post highlighted the example below from the Cambridge Dictionary at the entry for “compliment” which does manage to use pronouns to subtly include a same-sex relationship.


From: Cambridge Dictionary

Besides indicating gender through pronouns, though, it’s actually quite difficult to include specific groups or characteristics in short, decontextualized example sentences. Characteristics such as race or disability are hard to show without becoming clunky and unnatural. I recently, for example, came across a really nice corpus example to illustrate a phrase that was about a non-verbal child, who may have been neurodiverse or had a learning disability. By the time I’d edited it down to fit within what was needed, however, that context had been rather lost, and I doubt that someone reading the example would have picked up on who it was about.

Holding a mirror up to society … and choosing to exclude:

What I’m perhaps more conscious of is what I leave out. Scrolling through hundreds of corpus lines illustrating a particular word or phrase is rather like holding a mirror up to society. And, to be honest, what you find reflected back isn’t always what you’d like to see. Although the corpora that publishers use to compile dictionaries are kept up to date, they do, necessarily, also include texts going back over time and from a wide range of sources.

One bias that I’ve been aware of since I first started out in lexicography some 25 years ago is the inherent sexism in language. Going back to those pronouns again, time and again I’ll sort a screenful of corpus lines and scroll down to find predominantly he/his or she/her coming up alongside a particular word or phrase. The stereotypes really jump out.
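To give a feel for that kind of check, here is a minimal Python sketch that counts gendered pronouns appearing near a target word in a set of concordance lines. It is not how any publisher's corpus software works. The function name, the window size and the three sample lines are all invented for illustration, and real corpus tools offer far richer sorting and filtering.

```python
import re
from collections import Counter

FEMALE = {"she", "her", "hers", "herself"}
MALE = {"he", "him", "his", "himself"}


def pronoun_balance(lines, target, window=5):
    """Count female and male pronouns within `window` tokens of `target`."""
    counts = Counter()
    for line in lines:
        # Very crude tokenization: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", line.lower())
        for i, token in enumerate(tokens):
            if token != target:
                continue
            nearby = tokens[max(0, i - window): i + window + 1]
            counts["female"] += sum(t in FEMALE for t in nearby)
            counts["male"] += sum(t in MALE for t in nearby)
    return counts


# Invented sample lines, not real corpus data.
sample = [
    "She is a nurse and her shift starts at six.",
    "He asked the nurse for his results.",
    "The nurse said she would call back.",
]
print(pronoun_balance(sample, "nurse"))
```

A count like this only flags a skew. Deciding whether that skew reflects genuine usage that a learner needs to know about, or a stereotype that shouldn't be passed on, is still a job for the lexicographer.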

Now in some ways, stereotypes are a useful shortcut when you’re trying to get across an idea quickly and succinctly. For example, if I was trying to exemplify a verb such as drive or reverse or stall, it would make sense to use a car as the object of the verb, rather than, say, a juggernaut or ambulance or camper van. In an example sentence, we want to get the key idea across as simply as possible, keeping focused on the target vocab without unnecessary distractions or ambiguity.

Stereotypes involving people, however, are a different kettle of fish and at almost every word I deal with, I have to make myself stop and think about what I’m seeing in the corpus and whether I’m perpetuating unfair and unhelpful stereotypes. It’s relatively easy to make sure you get a gender balance where the word in question can quite reasonably be applied to any gender, and I’ve been putting in examples of female professors and footballers and male nurses and dancers for many years. Where things get trickier is where a word or phrase is (almost) always used about either men or women. As a descriptivist, I don’t want to distort language and give dictionary users an unrepresentative idea of how a word is actually used. That would be misleading and could potentially land a learner in hot water.

Two phrases I came across and hesitated over recently were “leaves nothing to the imagination” (to describe tight-fitting or skimpy clothes) and “brazen hussy”. A first look at the corpus showed both applied mainly to women and although most cites were fairly humorous and light-hearted, I felt both had uncomfortably sexist connotations. With a bit more digging, I found some examples of the first expression used about both men and women and I managed to come up with examples that weren’t overly stereotypical.  Hussy, however, brazen or otherwise, is just used as an offensive term for a woman or girl, so it went the way of other offensive terms, clearly labelled as such and defined but receiving minimal treatment in terms of examples.

Gradual progress:

It's important to realize that dictionaries are huge and sprawling projects containing many thousands of entries and tens of thousands of example sentences, compiled and edited over decades. In many cases, dictionary departments have very limited budgets and although work continues to keep dictionary content up to date, the idea that a publisher could go through all the examples in a particular dictionary to check for DEI is just unfeasible, especially in the light of rapidly shifting norms. So, progress is slow. During any update, lexicographers will be on the look-out for examples that look dated or no longer feel acceptable. They make changes and tweaks where they can - the compliment example above was her not his in an earlier edition. They include a more diverse range of people and contexts where they can and exclude more examples of harmful and offensive stereotypes. But it's not an overnight fix, so bear with us.

Footnote 1: Brazen Hussy is also a type of flower, so named because its bright yellow flowers set against dark purple leaves stand out so brazenly when it blooms in early spring:


Footnote 2: AI doesn’t do this – it just takes the data it’s fed at face value and replicates the stereotypes and biases. Just saying …

Monday, September 25, 2023

Language data and permissions: AI vs corpora

Like everyone else lately, I haven’t been able to avoid hearing about generative AI, including a session at last week’s Freelancers’ AwayDay. One point in the debates around it, though, has particularly jumped out at me. The language data collected to train the large language models that power the likes of ChatGPT is scraped from the internet with no attempt to get permission from the original creators of those texts. This has raised alarm bells with writers and with organizations such as the US Authors Guild, which is taking action on the issue.

As a lexicographer, I work with very large collections of language data every day: corpora. Working with publishers’ corpora on materials for publication, I’m very aware that permission and copyright are issues we absolutely don’t ignore. The multi-billion-word corpora collected and held by dictionary publishers contain material that has been added with the permission of the copyright holders. This is generally the publishers of the texts rather than the individual writers, allowing for the collection of large quantities of data, such as all the newspapers published by a particular media group or all the academic journals from a certain academic publisher.

That permission also comes with restrictions on how the data can be used. Specific agreements vary between corpora, but they usually include limitations on the length of excerpts that can be used, not generally a big problem when you need short dictionary examples. We are also generally required to edit examples drawn from a corpus so that they’re not obviously identifiable. Very ‘vanilla’ examples that could have come from anywhere – She left and closed the door behind her. – can safely be copied as they are, but any references to real people, places, events, or organizations will usually be removed, and often replaced with the minister, a company spokesperson, in the region, etc. Incidentally, this also has the advantage that the examples are less likely to date and will be accessible to a wider audience because they rely less on culturally-specific references.

As lexicographers, we do occasionally turn to other sources to research language, especially when we’re looking at new or niche uses for which we may have scant corpus evidence. In these cases, our editors are even more insistent about the need for caution. Ideally, we’ll refer to online sources to confirm how a word is being used, then try to make use of the few corpus examples we do have, informed by what we’ve seen elsewhere, to come up with appropriate example sentences.

I should note that I’m specifically talking about publishers’ corpora here. There are, of course, plenty of corpora out there, including numerous web corpora, where the issues around permission and copyright are very different. Many of these were originally collected for academic purposes, i.e. principally to research language usage rather than to publish commercial materials. They should also come along with notes about how they can be used – although I wonder how many users actually read the small print.

As writers ourselves, in whatever form, I think we should be especially aware of how the content we create is being used, and potentially abused, with and without our permission, and crucially, in turn, have respect for how we use the intellectual property of others.


Wednesday, July 19, 2023

Mundane language change

When I talk about language change and the need to keep dictionaries up-to-date with current language usage, people tend to immediately start talking about new words, and especially trending new coinages that they may have come across in the media, the likes of wokefishing or quiet quitting. But actually a huge amount of language change is much subtler and much more mundane.

Yesterday, I was looking into the noun form, in the sense of an application form or an entry form for a competition. Looking at different learner's dictionary definitions, I found a split between those which still describe it as a piece of paper to be written on, what we might now call a hard copy (a retronym) and those that have shifted to the more neutral description of a document which implies that it could be a piece of paper or in a digital format, maybe online. And of course, the word document itself has shifted and stretched in the same way.

You probably hadn't even noticed that the concept of a form, which always used to be a piece of paper, has slowly morphed to encompass digital and online formats too without us feeling the need for a new word - unlike, for example, the way we distinguish between a letter and an email.

A lot of language change is similarly undramatic. Words slowly shift from one usage to something slightly different or stretch seamlessly to encompass new concepts. As lexicographers, we have to be alert to these shifts, to gently tweak definitions to keep them current, and edit examples to reflect contemporary usage - in this case, likely showing examples that refer to both paper forms and digital ones.


Thursday, July 06, 2023

Lexicography FAQs: messy entries

Last week, I was speaking at the BAAL Vocab SIG conference about the process of compiling an entry for a learner's dictionary. I talked about some of the questions that you end up asking as you carry out your corpus research, and the variety of challenges and choices you're faced with: from how many variant forms of a word to show, to what constitutes a separate part of speech, to how finely to split out different senses of a word, and what uses and patterns to exemplify.

I mentioned how entries can range in length from very simple, single-sense words to the mammoth entry for run, the longest entry in most contemporary learner's dictionaries, running to 120 numbered senses in the Oxford Advanced Learner's Dictionary (see what I did there?!).

This week, I've been thinking about how some entries are really simple and straightforward to compile, while others turn out to be messy and entangled. A couple of medical-related entries I've dealt with recently exemplify that nicely. The entry for cyanosis, despite being a fairly specialized medical term, turned out to be a really simple one to compile. It only has a single, clearly-defined meaning and it's one that can be explained easily within a defining vocabulary.

CCU, on the other hand, turned out to be a complicated mess. Abbreviations can be tricky for a number of reasons. Firstly, they're hard to search for in the corpus because the same abbreviation often gets used to refer to lots of different things, some of them things you wouldn't put in the dictionary, like names of companies or products or local sports clubs, etc., but also sometimes more than one generally-used concept that's relatively high frequency and that learners might reasonably look up. Then there's the question of whether to have full entries for both the abbreviation and full form or maybe just a cross-reference at the abbreviation pointing to the full form. In the days of print dictionaries when space was at a premium, x-refs would be widely used, but online, it seems unnecessary to send a user round in circles when you could just give a full definition at both. Different publishers and projects will have detailed policies for these kinds of things set out in the styleguide, but sometimes decisions are still left, in part, to the discretion of the lexicographer, considering things such as overall frequency of the term and the relative frequencies of the abbreviation and full form. CCU, as you can see below, led me down a whole rabbit hole of different questions and choices both about the abbreviation itself and other possible variants and inclusions!

So, it seems that CCU can be an abbreviation for coronary care unit or cardiac care unit, which are both the same thing. However, such units are also sometimes called just coronary units or cardiac units - in which case, the abbreviation wouldn't be CCU. CCU can also refer to a critical care unit, which is something different, but mostly synonymous with intensive care unit, for which the abbreviation is ICU ... are you still following?!

And as I mentioned in my session last week, all those decisions about what to show, where and how have to be filtered through the lens of what will be most helpful for the user. You're always balancing wanting a learner to find the meaning or form of the word (or abbreviation) they've come across, which leans towards "include everything", but at the same time, you know that they also want simple, concise answers rather than a confusing mess of too much information. Because, TL;DR!


Monday, June 05, 2023

Phrasal verbs: delivering on a trend

A couple of years ago, I worked on two phrasal verb projects for Collins, a new edition of the Collins COBUILD Phrasal Verbs Dictionary and Work on your Phrasal Verbs (2e), with my friend and frequent collaborator, Penny Hands. We ended up having quite a few discussions about the increasing trend for phrasal verbs and the reasons behind it. Penny wrote a post about it on the Collins ELT blog in which she discusses not just the completely new phrasal verbs that have come into use, but also the trend to add particles after verbs more often.

Since then, it’s something I can’t help noticing, both in everyday life and when I’m researching language for other projects. Last week, I was looking at the verb deliver and came across another phrasal verb trend that seems to have built over the past few years, deliver on sth.



It isn’t a completely new combination, of course. Looking back at the old BNC compiled in the mid-1990s, I can find examples of the classic collocation, deliver on a promise, plus just 2 or 3 similar objects:


Looking at more recent corpus data though, it’s clear that classic collocation has expanded to include a much greater range of objects in recent years:


And it’s been extended from people delivering on promises, to things, especially products, delivering on what you’d hoped for:



Wednesday, March 29, 2023

Gruesome plurals

When you work on ELT materials, you can end up researching all kinds of random topics and in lexicography, that mix can be even more unpredictable. Over recent months, I’ve been working on a lot of medical terminology; researching terms with the help of input from a specialist medical editor, then checking corpus evidence, finding usable examples, and putting together dictionary entries. It’s been fascinating learning the names for all kinds of body parts and working out how things fit together. Google image searches have been particularly useful for visualizing exactly where your unciform bone or suprachiasmatic nucleus are!

One thing that’s very noticeable is the amount of terminology that originates from Latin and Greek and has irregular plural forms that have to be checked and included. I’ve learnt that the plural of stroma is stromata, that more than one fimbria can be described as fimbriae, and that you have one tragus on each ear giving you a pair of tragi … amongst many many others.

However, there are many anatomical features which you only have one of, so while they’re technically countable nouns, they’re overwhelmingly used in the singular. And actually, many things which we have pairs of are predominantly referred to in the singular too: The tragus is a small piece of cartilage on the inner side of the external ear. So, when checking for plural forms, I often have to do quite a bit of searching.

I’ve gradually come to realize though that plural body parts only tend to crop up in medical research. Sometimes it’s a study involving several patients, which is okay. More often than not though, the plurals appear in rather gruesome animal experiments – in contexts that really ought to have trigger warnings for the unsuspecting! Thankfully, I’m mostly just checking that the irregular plural is used (and hasn’t been anglicized to stromas or traguses, sometimes the case for very common terms) and I don’t have to include any gruesome examples.
