Last week, I was speaking at the BAAL Vocab SIG conference about the process of compiling an entry for a learner's dictionary. I talked about some of the questions that you end up asking as you carry out your corpus research, and the variety of challenges and choices you're faced with: from how many variant forms of a word to show, to what constitutes a separate part of speech, to how finely to split out different senses of a word, and what uses and patterns to exemplify.
I mentioned how entries can range in length from very simple, single-sense words to the mammoth entry for run, the longest entry in most contemporary learner's dictionaries, running to 120 numbered senses in the Oxford Advanced Learner's Dictionary (see what I did there?!).
This week, I've been thinking about how some entries are really simple and straightforward to compile, while others turn out to be messy and entangled. A couple of medical entries I've dealt with recently illustrate that nicely. The entry for cyanosis, despite being a fairly specialized medical term, turned out to be a really simple one to compile. It has only a single, clearly defined meaning, and it's one that can be explained easily within a defining vocabulary.
CCU, on the other hand, turned out to be a complicated mess. Abbreviations can be tricky for a number of reasons. Firstly, they're hard to search for in the corpus because the same abbreviation often gets used for lots of different things. Some of those are things you wouldn't put in the dictionary, like the names of companies, products or local sports clubs, but sometimes there's also more than one general concept that's relatively high frequency and that learners might reasonably look up. Then there's the question of whether to have full entries for both the abbreviation and the full form, or just a cross-reference at the abbreviation pointing to the full form. In the days of print dictionaries, when space was at a premium, x-refs were widely used, but online it seems unnecessary to send a user round in circles when you could just give a full definition at both. Different publishers and projects will have detailed policies for these kinds of things set out in the style guide, but some decisions are still left, in part, to the discretion of the lexicographer, taking into account things such as the overall frequency of the term and the relative frequencies of the abbreviation and full form. CCU, as you can see below, led me down a whole rabbit hole of different questions and choices, both about the abbreviation itself and about other possible variants and inclusions!
So, it seems that CCU can be an abbreviation for coronary care unit or cardiac care unit, which both refer to the same thing. However, such units are also sometimes called just coronary units or cardiac units, in which case the abbreviation wouldn't be CCU. CCU can also refer to a critical care unit, which is something different but mostly synonymous with intensive care unit, for which the abbreviation is ICU ... are you still following?!
And as I mentioned in my session last week, all those decisions about what to show, where and how, have to be filtered through the lens of what will be most helpful for the user. You're always balancing the wish to help a learner find the meaning or form of the word (or abbreviation) they've come across, which leans towards "include everything", against the knowledge that they also want simple, concise answers rather than a confusing mess of too much information. Because, TL;DR!