Lexicoblog

The occasional ramblings of a freelance lexicographer

Tuesday, September 27, 2022

Lexicography FAQs: spelling variants

English spelling isn’t officially standardized. Unlike French, with its Académie Française to make pronouncements on correct (and incorrect) spellings, English spelling is largely governed by consensus. Whatever spelling of a word is most widely used by a particular speech community becomes the norm.

Spelling variation in English

Of course, the vast majority of common words have a single agreed spelling; we all spell house and rabbit and oxygen and establishment in the same way. There is a wonderful degree of variation too, though. Some words have different spellings in different places, the most obvious examples being British and American regional spellings of words like colour/color, catalogue/catalog, tyre/tire. Other variants have regional leanings but are actually less clearly divided: encyclopedia/encyclopaedia, foetus/fetus. Then there’s the question of whether to hyphenate or not (a topic I’ve written about before). Sometimes that’s about the evolution of a word: E-mail became e-mail and has now mostly become email. Sometimes there are grammatical reasons for a difference – a part-time job versus working part time – if you’re especially fussed about that kind of thing. And sometimes it’s down to no more than personal preference. I’m more of a co-ordinate kinda person just because I think the double O in coordinate looks funny. FYI, a quick corpus search shows that I’m very much in the minority on that.

Dictionaries and spelling variants

But surely you can check the “correct” spelling of a word in the dictionary, I hear you cry. Well, first off, and for the umpteenth time, there’s no such thing as the dictionary. There are many different dictionaries, which all have slightly different policies about spelling variants, and which will occasionally – shock, horror – show different things. And that’s because, as lexicographers, we don’t decide what the correct spelling of a word should be; instead, we use corpus data to help us reflect how a word is spelled out there in the wild. How much of that variation we reflect will depend on the policy of the dictionary. So, for example, a learner’s dictionary is more likely to keep things simple, only showing the most frequent spelling, with maybe one variant spelling if it’s very common. Larger reference dictionaries are more likely to show a wider range of variants.

And of course, different dictionary publishers use different corpora. Especially where it’s a close call, one corpus might rank one spelling as more frequent, so a lexicographer might make it the headword (the word in bold at the top of the entry) and make another the variant (usually shown below, perhaps in brackets and labelled as ALSO). Another corpus might come up with the opposite stats and result in a subtly different entry.

The curious case of dogtooth

One challenge for lexicographers is how far to go with variants. Last week, I started researching the word dogtooth. I was initially looking for the anatomical sense to refer to a human tooth, also known as a canine. As I looked at the corpus though, I came across lots of examples referring to a fabric pattern:

 

And to a type of violet also called an Erythronium – which is rather beautiful, and which I happen to have in my garden (see below).

So far, so good, until I realized that, at least for the latter two contexts, I’d probably want to add a possessive S in the middle. So, I went back and widened my search criteria. I won’t share the stats from the publishers’ corpus (for reasons of confidentiality), but I replicated the same search using the Timestamped JSI Web Corpus 2014-2021 (via Sketch Engine) and, interestingly, came up with almost the same frequency order:

dogtooth*        532 examples
dog-tooth        115
dog’s tooth**     85
dogstooth         30
dogs-tooth         1

* I ignored capitalized forms because they were frequently proper names.
** Some of these were references to teeth belonging to actual dogs. So, I ignored examples of dog’s teeth, which were mostly about our furry friends’ gnashers, and only counted the singular dog’s tooth, which was more likely to be one of the other uses, although not always.
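
For anyone curious what a rough version of this kind of count might look like outside a corpus tool, here is a minimal Python sketch. It is not how Sketch Engine or any publisher's corpus software works; the regular expression, the function name count_variants, and the sample sentence are all made up for illustration. The pattern is case-sensitive, so capitalized forms are skipped, and it only matches the singular tooth, mirroring the two footnotes above.

```python
import re
from collections import Counter

# Hypothetical pattern covering the five spellings in the list above:
# dogtooth, dog-tooth, dog's tooth, dogstooth, dogs-tooth.
# Lowercase only, so capitalized forms (often proper names) are ignored;
# "tooth" only, so the plural "dog's teeth" is ignored too.
VARIANT_PATTERN = re.compile(r"\bdog(?:'s |s-|s|-)?tooth\b")


def count_variants(text: str) -> Counter:
    """Count each spelling variant found in a piece of text."""
    # Treat the curly apostrophe and the straight one as the same form.
    normalized = text.replace("\u2019", "'")
    return Counter(VARIANT_PATTERN.findall(normalized))


# Invented sample text, purely for demonstration.
sample = (
    "A dogtooth check jacket, a dog-tooth violet and a dog\u2019s tooth violet. "
    "Dogtooth Ltd sells dogstooth scarves. My dog's teeth are fine. "
    "More dogtooth fabric."
)

counts = count_variants(sample)
for variant, frequency in counts.most_common():
    print(f"{variant:<12} {frequency}")
```

A count like this says nothing about which sense each example belongs to (tooth, fabric or flower); that part still means reading the examples.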

Then the question is how many of those variants I include in my entry. If I throw them all in, will it just look messy and confusing?

 

And do I trawl through all the corpus evidence to see which variants apply to which uses and try to show that?

Or do I just pick the most frequent forms? I can probably quite safely dump dogs-tooth, but where do I draw the line with the others? I don’t want people to look up, say, dogstooth because they found it in a text somewhere, not find it, and conclude that it doesn’t exist or is “incorrect”, because it clearly isn’t. People are using it and, presumably, understanding each other, so the descriptivist in me says that’s fine.

I won’t give away how I finally tackled the entry – back to confidentiality again – but next time you’re looking something up in a dictionary, take a look to see if there are spelling variants. Some poor lexicographer has probably agonized over them so they’re worth at least a glance.

 


Saturday, September 17, 2022

Why isn’t the Wordle word in the dictionary?

Yesterday, I had a rare fail on Wordle … and, judging by the uproar on Twitter, so did a lot of other people.

Interestingly, I was slightly less indignant than most as the correct answer, parer, seemed like a perfectly reasonable word to me. In fact, and you’ll just have to believe me on this, it was one I’d thought of trying. It’s a tool you use to pare fruit, vegetables, or cheese, usually a small knife or a peeler. What surprised me more was that it doesn’t appear as an entry in any of the major online dictionaries – if you type it in, you either get nothing or get redirected to the entry for the verb, pare. I also notice, as I type this post, that my spellcheck is underlining it as an unrecognized word too.

Intrigued, and slightly doubting my own intuitions, I made a corpus search my next port of call. It turns out the word isn’t exceptionally common, and it’s difficult to get exact stats because of noise (names, foreign words, etc.), but there are certainly plenty of examples of usage out there, mostly in recipes and adverts for knife sets.


So why isn’t the word in any of the dictionaries? How did we miss it? Dictionary editors do regularly check for new words to add to a dictionary, especially when they’re compiling new editions. It isn’t an exact science though. New coinages and buzzwords jump out and get noted down. Sometimes words get noticed and noted by lexicographers when they’re working on an entry for another word. Several online dictionaries also have facilities for users to suggest new words. Old but relatively rare words, though, are easily missed.

But has parer actually been missed?

Learner’s dictionaries, of course, focus on the more high-frequency words that are likely to be most useful to learners of the language, so parer would be unlikely to feature anyway. And the free online versions of general reference dictionaries aren’t always based on the most comprehensive version of that publisher’s dictionaries. So, you won’t, for example, find the full OED available for free online; it’s a subscription service. In fact, though, you don’t even have to go to the full OED to find parer. The entry below, in which parer appears at least as a run-on if not a headword, is from the Oxford Dictionary of English, a slightly less weighty, single-volume dictionary. Similarly, I found it in an old print version of The American Heritage College Dictionary I have on my shelves.

 

Oxford Dictionary of English, 3e

So, I went back to online sources and found that if you scroll down to the second entry for pare on dictionary.com, which has recently taken on data from Oxford Dictionaries, you do find it hidden away there too. 


So, before you exclaim that a word isn’t in the dictionary, perhaps you need to consider which dictionary you’re looking in and how much you can reasonably expect to get for free online.
