Lexicoblog

The occasional ramblings of a freelance lexicographer

Wednesday, March 13, 2024

ASHDRA: five years of ELT dictionary research

2023 was a busy year for ASHDRA and we’re looking forward to lots of new developments in 2024, as we mark 5 years since ASHDRA was launched.

Completed research:

This year we’ve published several new reports/summaries from completed ASHDRA projects:

Dr Aisling O’Boyle, from Queen’s University Belfast, investigated the use of dictionaries by refugees in Northern Ireland. Dr Thomai Dalpanagioti, from Aristotle University of Thessaloniki, Greece, explored a different kind of lexicographic resource to encourage her university-level EFL learners to develop their use of metaphor as part of a writing course. Dr R Vennela, from the National Institute of Technology, Warangal, India, developed and piloted sample bilingual English-Telugu picture dictionary entries with classes in a rural village in Andhra Pradesh. And Professor Amy Chi, from Hong Kong University of Science and Technology, developed and trialled a Teaching Pack aimed at helping EFL teachers to improve their knowledge of contemporary learner’s dictionaries.

You can, of course, read the reports of completed ASHDRA projects, along with shorter, one-page summaries here.

News, events, & publications:

In 2023, ASHDRA awardees past and present took part in a number of events and conferences. In April, Chinh Ngan Nguyen Le presented her Semi-Med project, which developed a novel reference resource focused on semi-technical vocabulary for medical students, at the IATEFL Conference in Harrogate. In June, ASHDRA researchers, Lorna Morris, Thomai Dalpanagioti, and Priya Mathew presented their research (virtually) at the eLex Conference and again in September at the AFRILEX Conference. You can find a video of their presentations here.

In other alumni news, past ASHDRA researchers, Yan Yan Yeung and Chinh Ngan Nguyen Le have both been awarded their PhDs – congratulations to them both! And Agus Riadi has received a scholarship to pursue a PhD at Coventry University.

The ASHDRA panel members were also busy. Michael Rundell delivered a keynote at the ASIALEX Conference in Seoul in June, looking at the potential impact on dictionary creation of Large Language Models such as ChatGPT. Michael also published an article in HLT magazine about ASHDRA, which you can read here. I spoke at IATEFL in April about the role of CEFR labels in learner’s dictionaries and gave a keynote to the BAAL VocabSIG conference in Nottingham in June about the role of vocabulary research in practical lexicography.

Coming up in 2024:

Two ASHDRA research projects are due for completion this year – so watch out for new reports on the website in the autumn. I’m looking forward to speaking about the fascinating range of research that’s already come out of ASHDRA projects over the past 5 years at the IATEFL Conference in Brighton on 16 April. Amy Chi will be repeating her LexTeach workshop for Japanese school teachers in September as part of the ASIALEX conference in Tokyo. And ASHDRA researchers will be reporting on their projects at the Euralex Conference in Croatia in October.

2024 Call for Proposals OPEN

Finally, this year’s call for proposals is currently open. If you, or one of your colleagues, might be interested in applying, there’s still time to get your application in before the 14 April deadline. Visit the website for more details and an application form.

Labels: ASHDRA, research

Sunday, January 21, 2024

Is there enough ‘research’ in your research proposal?

The AS Hornby Dictionary Research Awards (ASHDRA) are entering their fifth year in 2024. As new batches of dictionary-related research proposals come in each year, new issues crop up for us to discuss on the ASHDRA panel. I’ve previously written about over-ambitious dictionary development projects here.

Recently, we met up ahead of the 2024 call for proposals going out at the start of February and one of the talking points that came out of last year’s submissions was the question of what constitutes ‘research’.

We welcome any research angle that relates to dictionaries in English language teaching. One popular theme, though, involves developing new lexicographic resources in some form, especially to target the needs of a particular group of learners who are currently under-served. Targeting a need is a valid aim for a piece of research, but in some cases, we found that proposals were really materials development projects rather than research, i.e. the applicant had essentially already decided what kind of resource they wanted to create and just wanted funding to develop it. And that’s just not what the awards are about.

So, what do we mean by ‘research’? According to the call for proposals, we define research as “original investigation undertaken in order to gain and extend knowledge and understanding” – but what does that look like in practice?

Researching the need:

Many applicants already have a clear idea of what gap they think needs to be filled, often based on their own teaching experience. That makes a great starting point, but is it enough? Could you find out more about exactly what issues your learners have with existing dictionary resources? Are they actually turning to dictionaries at all or just googling their language queries? What kinds of things do they find problematic? What would they find more helpful?

Is your planned resource going to target learners exactly like your own students (same age, same level, same educational context, same L1/country/culture) or do you want to appeal to a wider audience? If so, how much do you know about the needs of that wider group of learners? If, for example, you work mainly in university-level EAP, you may be used to higher-level learners or those who are majoring in English, so are linguistically-minded and motivated. They will have a very different profile (and needs and skills) from the average high-school student who may not be interested in language at all and just wants quick fixes to get them through. Can you approach other teachers to draw on their experiences?

By investigating these kinds of questions (through questionnaires, interviews, focus groups or activities to find out about dictionary usage behaviours and needs), you might find that some of your initial intuitions were correct, but you might also turn up things you hadn’t considered and that might significantly shape what you do next.

Also, before you jump in feet first, have you thoroughly investigated what other published resources are available? Does the resource you’re planning actually already exist or perhaps, something similar to it? There’s no point in reinventing the wheel when you could learn from others. Make sure you thoroughly research and review what’s already available. Even if what you find isn’t exactly right for your target group, there may be elements you can draw on, especially if you don’t have past lexicography experience.

⚠️ A word of warning here: although it’s helpful to draw on already published dictionaries for inspiration (for style, layout, what information to include, etc.):

· Remember that published dictionaries are subject to copyright, so while they can provide a helpful guide, you shouldn’t just copy definitions for your own use without permission.

· Make sure you draw on a relevant resource for your audience. Don’t imitate the style of definitions from a reference dictionary aimed at L1 speakers of English, especially not a dictionary of record such as the OED or Merriam-Webster, if you’re writing for EFL.

Research during your initial design phase:

Developing any dictionary-style resource will likely involve some form of corpus research – investigating the meaning and usage of lexical items, sourcing authentic example sentences – these things are a lexicographer’s bread and butter.

But will you be using corpus tools in a novel way to research the language going into your dictionary entries? Will you build your own corpus, perhaps in a specialized area? Maybe, like previous ASHDRA researcher, Chinh Ngan Nguyen Le, you’ll use a novel approach to analysis. She used collocation searches to establish the key senses of polysemous semi-technical medical terms and used the results to create visual representations of each term.
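
To give a rough idea of what a collocation search involves, here is a minimal Python sketch. It is only an illustration, not the method used in the Semi-Med project: the tiny corpus, the stopword list, the `collocates` function and the three-word window are all assumptions made up for this example. It counts the words that appear near a node word, which is the raw material for spotting that a term such as culture keeps different company in different senses.

```python
import re
from collections import Counter

# Hypothetical mini-corpus, invented for illustration only.
# A real study would draw on a large specialised corpus.
CORPUS = [
    "The blood culture was positive for bacteria.",
    "A blood culture was taken from the patient.",
    "Hospital culture affects how staff report errors.",
    "The safety culture in the hospital has improved.",
]

# Very small, hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "was", "for", "in", "from", "has", "how"}


def collocates(node, sentences, window=3):
    """Count non-stopwords within `window` tokens either side of `node`."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token != node:
                continue
            start = max(0, i - window)
            # Tokens before the node plus tokens after it, excluding the node.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(w for w in neighbours if w not in STOPWORDS)
    return counts


if __name__ == "__main__":
    for word, freq in collocates("culture", CORPUS).most_common(5):
        print(word, freq)
```

On these invented sentences, blood and hospital come out as the most frequent neighbours of culture, hinting at two separate senses. Real corpus tools add statistical measures of association on top of raw counts like these.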

When you’ve created your first few draft entries, you could get some informal pre-trial feedback, perhaps from a handful of colleagues. Is there anything they think you’ve missed out? Anything that’s unclear or potentially confusing? You could mock up a couple of different entry layouts to find out which format they prefer. It’s easier to iron out any problems early on before you’ve done too much work.

(Action) Research and development:

Once you have a set of draft entries, then an action research approach works well in this context to trial your material.

That might involve recruiting groups of learners (your target end users) or it might be more appropriate for teachers to trial the resource in the classroom.

One of the keys to effective trialling is making sure you’re going to get enough detailed feedback that you can then act on to improve on your first draft. A handful of general feedback questions might just elicit rather vague, polite responses and give you little to work with. It’s important to design your trial carefully, to think about how to collect feedback and also make sure you leave enough time for it. Again, you could use questionnaires, interviews or focus groups. If your materials are being trialled in class, is it possible to observe or video the lesson to see for yourself not just how the students react to the materials, but also how the teachers make use of them? In one recent ASHDRA project, for example, the researcher found that despite training sessions with teachers ahead of the trial, in practice, the teachers tended to revert to their familiar teaching style and didn’t fully make use of the new resource as planned.

Once you’ve got your feedback and used it to redesign and improve your resource – or possibly any accompanying teacher’s guide – then you ideally need to go through the same process again. A second trial will tell you whether you’ve managed to get your resource spot-on or whether there’s still some tweaking needed.

Where your main research elements come will, of course, come down to the exact nature of your project, but the key thing to bear in mind is that just putting your existing ideas into practice doesn’t constitute research on its own. You need to investigate something and to come up with results and findings at the end of the process. And remember, it’s often the things that didn’t work that tell us more than those that went smoothly.

Labels: ASHDRA, dictionaries, research

Sunday, November 12, 2023

The cost of speaking: ELT conferences as a freelancer

Recently, I’ve seen a number of posts and calls for papers for upcoming ELT conferences and had a few online exchanges with colleagues trying to encourage me to sign up for events coming up in 2024.

I love going to ELT conferences. As a freelancer who spends most of the year working alone at home, it’s a really good opportunity to catch up with ELT colleagues on both a social and professional level. It’s a good way to get a feel for what’s going on in the industry – sometimes just confirming what I’ve been seeing on social media, but occasionally throwing up new perspectives and information as well. And it’s a really important networking opportunity – maintaining contacts, making new ones, finding out who’s moved where in the ELT publishing merry-go-round, and talking to the right people about potential new projects.

Speaking at an event boosts all of those things. If your name’s in the programme, it flags to people you’re there and they’re more likely to seek you out, whether or not they come along to your session. It raises your profile and reminds people of your area of expertise. It also gives you a focus for the event.

However … attending and speaking at a conference is really expensive, not just in terms of the obvious costs of registration and travel, but the indirect costs too which are often far more significant.

The surface costs: The costs of registration can range enormously depending on the event. Sometimes speakers get free registration, but often they don’t. There’s travel and accommodation. Then you can add on all the miscellaneous costs of things like food and endless cups of coffee!

The prep: For me, this is the biggest ‘cost’ of speaking at an event. To prepare a new talk from scratch takes a lot of time:

· initial time to come up with an idea and sketch out notes

· putting together a proposal. Proposal forms vary – some are quite simple and just require a title and summary, others are more long-winded with abstracts and summaries and various other questions to be answered, all within strict word limits.

· putting together your actual session, deciding on a structure, doing your research, designing your slides and deciding what you’ll say to go with each one.

· rehearsing and working out your timings, then rehearsing again until you’re confident in what you’re going to say.

Nowadays, there’s often a requirement to do some form of promotion for your session. Increasingly, that’s a video which you’re told “won’t take long”, but in my experience, usually swallows up at least a whole morning – deciding what to say, getting set up, doing a few takes, uploading and doing a bit of editing, then posting or sending on the final video.

How long all this takes depends, in part, on whether it’s a completely new talk or a new version of a talk you’ve given before. Occasionally, I’ve been able to reuse talks several times, but that’s dependent on whether the session fits in with the theme of an event, whether it’s still up-to-date and relevant, and whether a sponsor is happy for you to repeat it. (I’ll come onto the issue of sponsors in a bit.) But I’d say anywhere between 15 and 25 hours’ prep is what I’d expect.

As someone managing a chronic health condition, I have to strictly limit the hours I spend at my desk each week, so prep for a conference can’t just be added on top of my regular working hours. Of course, that time should count as ‘proper work’ whatever your circumstances, but in my case, it literally replaces paid hours on other jobs. So, if it takes me a week or more likely, a week-and-a-half’s worth of hours, I lose that much paid income.

The time out: As a freelancer, I don’t get paid for the days I’m actually at the event, both the days there and the travel days. So that’s more lost income to factor into the cost.

Adding all of that together, counting the prep hours and the days out at my usual hourly rate, can easily come to the equivalent of a whole month’s income for a long event like IATEFL. Similarly, for a foreign trip with the extra travel and accommodation costs. And that’s not some theoretical calculation, it’s real lost income. If I give an unsponsored talk at IATEFL, that effectively means no income for April that year. And yes, I know it’s an investment in my business, but that’s a big hit when you’re already trying to support yourself on a part-time income.

Sponsors: All of which is why, where possible I try to speak on behalf of publishers. That doesn’t necessarily cover all the costs – amounts offered for prep rarely cover all the time and they usually only cover one night’s accommodation and one day’s ‘income’ – but it makes it more manageable. Speaking on behalf of a publisher though is contingent on you having recently worked on a project that they want to promote and on the talk fitting in with their marketing plans. Flagship new coursebook series are always going to attract more marketing budget than the kind of smaller, niche vocab materials I tend to work on!

Speaking for a sponsor also relies on the event accepting proposals for ‘promotional’ talks. I was looking at one event this week which specifically rules them out – which I understand, but also rules me out as a speaker.

So, yes, I’d love to come along to all those great events out there, but budgets for 2024 will likely mean just IATEFL and maybe one more.

Labels: conferences, costs, freelancing, IATEFL

Monday, October 23, 2023

Diversity and inclusion: what you exclude

In recent years, the concepts of diversity and inclusion have become increasingly spoken about in the context of ELT. As a lexicographer working on learner’s dictionaries, I often find there’s less scope to put these ideas into practice than in other ELT materials – we don’t have illustrations or discussion topics or characters taking part in dialogues, etc. We do, of course, have to deal with vocabulary that relates to a whole range of topics – no PARSNIPS in dictionaries! – so sensitivity is required in how we deal with those entries. One area where I’m especially aware of who I include and how is in the selection of example sentences.

Dictionary examples and inclusion:

Each numbered sense of each entry in a learner’s dictionary has a number of example sentences to illustrate how the word or phrase is used. They aim to back up the definition in showing what the word means and also to exemplify how it’s most commonly used – its typical context(s), genre(s), collocations, grammatical forms and patterns, etc.

Dictionary examples are, by necessity, short and as a result, tend to lack context. That can make it difficult to show very much about the people or groups who get mentioned. A recent social media post highlighted the example below from the Cambridge Dictionary at the entry for “compliment” which does manage to use pronouns to subtly include a same-sex relationship.

From: Cambridge Dictionary

Besides indicating gender through pronouns, though, it’s actually quite difficult to include specific groups or characteristics in short, decontextualized example sentences. Characteristics such as race or disability are hard to show without becoming clunky and unnatural. I recently, for example, came across a really nice corpus example to illustrate a phrase that was about a non-verbal child, who may have been neurodiverse or had a learning disability. By the time I’d edited it down to fit within what was needed, however, that context had been rather lost, and I doubt that someone reading the example would have picked up on who it was about.

Holding a mirror up to society … and choosing to exclude:

What I’m perhaps more conscious of is what I leave out. Scrolling through hundreds of corpus lines illustrating a particular word or phrase is rather like holding a mirror up to society. And, to be honest, what you find reflected back isn’t always what you’d like to see. Although the corpora that publishers use to compile dictionaries are kept up to date, they do, necessarily, also include texts going back over time and from a wide range of sources.

One bias that I’ve been aware of since I first started out in lexicography some 25 years ago is the inherent sexism in language. Going back to those pronouns again, time and again I’ll sort a screenful of corpus lines and scroll down to find predominantly he/his or she/her coming up alongside a particular word or phrase. The stereotypes really jump out.
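
That kind of skew can also be tallied crudely in code. The Python sketch below is a minimal illustration, assuming you have a handful of corpus lines as plain text: the `pronoun_counts` function, the pronoun sets and the sample lines are all invented for this example and don't come from any publisher's corpus tools. It simply counts masculine and feminine pronouns across the lines, so it says nothing about who is doing what in each sentence.

```python
import re
from collections import Counter

# Simple pronoun sets (an assumption; reflexives and other forms are ignored).
MALE = {"he", "him", "his"}
FEMALE = {"she", "her", "hers"}


def pronoun_counts(lines):
    """Tally masculine and feminine pronouns across a list of corpus lines."""
    counts = Counter()
    for line in lines:
        for token in re.findall(r"[a-z]+", line.lower()):
            if token in MALE:
                counts["male"] += 1
            elif token in FEMALE:
                counts["female"] += 1
    return counts


# Invented example lines standing in for a screenful of corpus hits.
LINES = [
    "She told him the meeting had been moved.",
    "His manager said he could leave early.",
    "The results surprised her.",
]

if __name__ == "__main__":
    print(pronoun_counts(LINES))
```

A lopsided tally for a particular word is only a prompt to look more closely at the lines themselves; the judgement about whether an example perpetuates a stereotype still has to be made by a person.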

Now in some ways, stereotypes are a useful shortcut when you’re trying to get across an idea quickly and succinctly. For example, if I was trying to exemplify a verb such as drive or reverse or stall, it would make sense to use a car as the object of the verb, rather than, say, a juggernaut or ambulance or camper van. In an example sentence, we want to get the key idea across as simply as possible, keeping focused on the target vocab without unnecessary distractions or ambiguity.

Stereotypes involving people, however, are a different kettle of fish and at almost every word I deal with, I have to make myself stop and think about what I’m seeing in the corpus and whether I’m perpetuating unfair and unhelpful stereotypes. It’s relatively easy to make sure you get a gender balance where the word in question can quite reasonably be applied to any gender, and I’ve been putting in examples of female professors and footballers and male nurses and dancers for many years. Where things get trickier is where a word or phrase is (almost) always used about either men or women. As a descriptivist, I don’t want to distort language and give dictionary users an unrepresentative idea of how a word is actually used. That would be misleading and could potentially land a learner in hot water.

Two phrases I came across and hesitated over recently were “leaves nothing to the imagination” (to describe tight-fitting or skimpy clothes) and “brazen hussy”. A first look at the corpus showed both applied mainly to women and although most cites were fairly humorous and light-hearted, I felt both had uncomfortably sexist connotations. With a bit more digging, I found some examples of the first expression used about both men and women and I managed to come up with examples that weren’t overly stereotypical. Hussy, however, brazen or otherwise, is just used as an offensive term for a woman or girl, so it went the way of other offensive terms, clearly labelled as such and defined but receiving minimal treatment in terms of examples.

Gradual progress:

It's important to realize that dictionaries are huge and sprawling projects containing many thousands of entries and tens of thousands of example sentences, compiled and edited over decades. In many cases, dictionary departments have very limited budgets and although work continues to keep dictionary content up to date, the idea that a publisher could go through all the examples in a particular dictionary to check for DEI is just unfeasible, especially in the light of rapidly shifting norms. So, progress is slow. During any update, lexicographers will be on the look-out for examples that look dated or no longer feel acceptable. They make changes and tweaks where they can - the compliment example above was her not his in an earlier edition. They include a more diverse range of people and contexts where they can and exclude more examples of harmful and offensive stereotypes. But it's not an overnight fix, so bear with us.

Footnote 1: Brazen Hussy is also a type of flower, so named because its bright yellow flowers set against dark purple leaves stand out so brazenly when it blooms in early spring.

Footnote 2: AI doesn’t do this – it just takes the data it’s fed at face value and replicates the stereotypes and biases. Just saying …

Monday, September 25, 2023

Language data and permissions: AI vs corpora

Like everyone else lately, I haven’t been able to avoid hearing about generative AI, including a session at last week’s Freelancers’ AwayDay. One point in the debates around it though has jumped out at me, in particular. The language data collected to train the large language models that power the likes of ChatGPT is scraped from the internet with no attempt to get permission from the original creators of those texts. This has raised alarm bells with writers such as the US Authors Guild who are taking action on the issue.

As a lexicographer, I work with very large collections of language data every day: corpora. Working with publishers’ corpora on materials for publication I’m very aware that permission and copyright are issues we absolutely don’t ignore. The multi-billion-word corpora collected and held by dictionary publishers contain material that has been added with the permission of the copyright holders. This is generally the publishers of the texts rather than the individual writers allowing for the collection of large quantities of data, such as all the newspapers published by a particular media group or all the academic journals from a certain academic publisher.

That permission also comes with restrictions on how the data can be used. Specific agreements vary between corpora, but they usually include limitations about the length of excerpts that can be used, not generally a big problem when you need short dictionary examples. We are also generally required to edit examples drawn from a corpus so that they’re not obviously identifiable. Very ‘vanilla’ examples that could have come from anywhere – She left and closed the door behind her. – can safely be copied as they are, but any references to real people, places, events, or organizations will usually be removed, and often replaced with the minister, a company spokesperson, in the region, etc. Incidentally, this also has the advantage that the examples are less likely to date and will be accessible to a wider audience because they rely less on culturally-specific references.

As lexicographers, we do occasionally turn to other sources to research language, especially when we’re looking at new or niche uses for which we may have scant corpus evidence. In these cases, our editors are even more insistent about the need for caution. Ideally, we’ll refer to online sources to confirm how a word is being used, then try to make use of the few corpus examples we do have, informed by what we’ve seen elsewhere, to come up with appropriate example sentences.

I should note that I’m specifically talking about publishers’ corpora here. There are, of course, plenty of corpora out there, including numerous web corpora, where the issues around permission and copyright are very different. Many of these were originally collected for academic purposes, i.e. principally to research language usage rather than to publish commercial materials. They should also come along with notes about how they can be used – although I wonder how many users actually read the small print.

As writers ourselves, in whatever form, I think we should be especially aware of how the content we create is being used, and potentially abused, with and without our permission, and crucially, in turn, have respect for how we use the intellectual property of others.

Labels: AI, corpora, lexicography

Wednesday, July 19, 2023

Mundane language change

When I talk about language change and the need to keep dictionaries up-to-date with current language usage, people tend to immediately start talking about new words, and especially trending new coinages that they may have come across in the media, the likes of wokefishing or quiet quitting. But actually a huge amount of language change is much subtler and much more mundane.

Yesterday, I was looking into the noun form, in the sense of an application form or an entry form for a competition. Looking at different learner's dictionary definitions, I found a split between those which still describe it as a piece of paper to be written on, what we might now call a hard copy (a retronym) and those that have shifted to the more neutral description of a document which implies that it could be a piece of paper or in a digital format, maybe online. And of course, the word document itself has shifted and stretched in the same way.

You probably hadn't even noticed that the concept of a form, that always used to be a piece of paper, has slowly morphed to encompass digital and online formats too without us feeling the need for a new word - in the same way, for example, that we distinguish between a letter and an email.

A lot of language change is similarly undramatic. Words slowly shift from one usage to something slightly different or stretch seamlessly to encompass new concepts. As lexicographers, we have to be alert to these shifts, to gently tweak definitions to keep them current, and edit examples to reflect contemporary usage - in this case, likely showing examples that refer to both paper forms and digital ones.

Labels: dictionaries, language change, lexicography

Thursday, July 06, 2023

Lexicography FAQs: messy entries

Last week, I was speaking at the BAAL Vocab SIG conference about the process of compiling an entry for a learner's dictionary. I talked about some of the questions that you end up asking as you carry out your corpus research, and the variety of challenges and choices you're faced with: from how many variant forms of a word to show, to what constitutes a separate part of speech, to how finely to split out different senses of a word, and what uses and patterns to exemplify.

I mentioned how entries can range in length from very simple, single-sense words to the mammoth entry for run, the longest entry in most contemporary learner's dictionaries, running to 120 numbered senses in the Oxford Advanced Learner's Dictionary (see what I did there?!).

This week, I've been thinking about how some entries are really simple and straightforward to compile, while others turn out to be messy and entangled. A couple of medical-related entries I've dealt with recently exemplify that nicely. The entry for cyanosis, despite being a fairly specialized medical term, turned out to be a really simple one to compile. It only has a single, clearly-defined meaning and it's one that can be explained easily within a defining vocabulary.

CCU, on the other hand, turned out to be a complicated mess. Abbreviations can be tricky for a number of reasons. Firstly, they're hard to search for in the corpus because the same abbreviation often gets used to refer to lots of different things, some of them things you wouldn't put in the dictionary, like names of companies or products or local sports clubs, etc., but also sometimes more than one generally-used concept that's relatively high frequency and that learners might reasonably look up. Then there's the question of whether to have full entries for both the abbreviation and full form or maybe just a cross-reference at the abbreviation pointing to the full form. In the days of print dictionaries when space was at a premium, x-refs would be widely used, but online, it seems unnecessary to send a user round in circles when you could just give a full definition at both. Different publishers and projects will have detailed policies for these kinds of things set out in the style guide, but sometimes decisions are still left, in part, to the discretion of the lexicographer, considering things such as overall frequency of the term and the relative frequencies of the abbreviation and full form. CCU, as you can see below, led me down a whole rabbit hole of different questions and choices both about the abbreviation itself and other possible variants and inclusions!

So, it seems that CCU can be an abbreviation for coronary care unit or cardiac care unit, which are both the same thing. However, such units are also sometimes called just coronary units or cardiac units - in which case, the abbreviation wouldn't be CCU. CCU can also refer to a critical care unit, which is something different, but mostly synonymous with intensive care unit, for which the abbreviation is ICU ... are you still following?!

And as I mentioned in my session last week, all those decisions about what to show, where and how have to be filtered through the lens of what will be most helpful for the user. You're always balancing wanting a learner to find the meaning or form of the word (or abbreviation) they've come across, which leans towards "include everything", but at the same time, you know that they also want simple, concise answers rather than a confusing mess of too much information. Because, TL;DR!

Labels: abbreviations, corpus research, lexicography
