Lexicography FAQs: spelling variants
English spelling isn’t officially standardized. Unlike French, with its Académie Française to make pronouncements on correct (and incorrect) spellings, English spelling is largely governed by consensus. Whatever spelling of a word is most widely used by a particular speech community becomes the norm.
Spelling variation in English
Of course, the vast majority of common words tend to have a single agreed spelling; we all spell house and rabbit and oxygen and establishment in the same way. There is a wonderful degree of variation too, though. Some words have different spellings in different places, the most obvious examples being British and American regional spellings of words like colour/color, catalogue/catalog and tyre/tire. Other variants have regional leanings but are less clearly divided: encyclopedia/encyclopaedia, foetus/fetus. Then there’s the question of whether to hyphenate or not (a topic I’ve written about before). Sometimes that’s about the evolution of a word: E-mail became e-mail and has now mostly become email. Sometimes there are grammatical reasons for a difference – a part-time job versus working part time – if you’re especially fussed about that kind of thing. And sometimes it’s down to no more than personal preference. I’m more of a co-ordinate kinda person just because I think the double O in coordinate looks funny. FYI, a quick corpus search shows that I’m very much in the minority on that.
Dictionaries and spelling variants
But surely you can check the “correct” spelling of a word in the dictionary, I hear you cry. Well, first off, and for the umpteenth time, there’s no such thing as the dictionary. There are many different dictionaries, which will all have slightly different policies about spelling variants and which will occasionally – shock, horror – show different things. And that’s because, as lexicographers, we don’t decide what the correct spelling of a word should be; instead, we use corpus data to help us reflect how a word is spelled out there in the wild. How much of that variation we reflect will depend on the policy of the dictionary. So, for example, a learner’s dictionary is more likely to keep things simple, only showing the most frequent spelling, with maybe one variant spelling if it’s very common. Larger reference dictionaries are more likely to show a wider range of variants.
And of course, different dictionary publishers use different corpora. So, especially where it’s a close call, one corpus might rank one spelling as more frequent, and a lexicographer might make it the headword (the word in bold at the top of the entry) and make another the variant (usually shown below, perhaps in brackets and labelled as ALSO). Another corpus might come up with the opposite stats and result in a subtly different entry.
The curious case of dogtooth
One challenge for lexicographers is how far to go with variants. Last week, I started researching the word dogtooth. I was initially looking for the anatomical sense, referring to a human tooth, also known as a canine. As I looked at the corpus, though, I came across lots of examples referring to a fabric pattern.
There were also examples referring to a type of violet, also called an Erythronium, which is rather beautiful and which I happen to have in my garden.
So far, so good, until I realized that, at least for the latter two contexts, I’d probably want to add a possessive S in the middle. So I went back and widened my search criteria. I won’t share the stats from the publishers’ corpus (for reasons of confidentiality), but I replicated the same search using the Timestamped JSI Web Corpus 2014-2021 (via SketchEngine) and, interestingly, came up with almost the same frequency order:
Spelling          Examples
dogtooth*         532
dog-tooth         115
dog’s tooth**     85
dogstooth         30
dogs-tooth        1
* I ignored capitalized forms because they were frequently proper names.
** Some of these were references to teeth belonging to actual dogs. So I ignored examples of dog’s teeth, which were mostly about our furry friends’ gnashers, and only counted the singular dog’s tooth, which was more likely to be one of the other uses, although not always.
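To see how close (or not) the call is, the counts above can be turned into shares of the total. The Python sketch below is only an illustration, not how the entry was actually decided: it assumes the most frequent form becomes the headword, and it uses a purely hypothetical 1% cut-off (MIN_SHARE) to sort the other spellings into listed and dropped variants. The function name rank_variants is likewise made up for the example.

```python
# Counts from the JSI Web Corpus search above (capitalized forms and
# plural "dog's teeth" already excluded, as described in the footnotes).
counts = {
    "dogtooth": 532,
    "dog-tooth": 115,
    "dog's tooth": 85,
    "dogstooth": 30,
    "dogs-tooth": 1,
}

# HYPOTHETICAL cut-off for illustration only: a variant must account for
# at least this share of all hits to be listed. Not a real dictionary policy.
MIN_SHARE = 0.01


def rank_variants(counts, min_share=MIN_SHARE):
    """Return (headword, kept, dropped) for a dict of spelling -> count.

    The most frequent spelling becomes the headword. Every other spelling
    is paired with its share of the total (rounded to 3 places) and put in
    `kept` if that share meets `min_share`, otherwise in `dropped`.
    """
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    headword = ranked[0][0]
    kept, dropped = [], []
    for spelling, n in ranked[1:]:
        share = n / total
        target = kept if share >= min_share else dropped
        target.append((spelling, round(share, 3)))
    return headword, kept, dropped


if __name__ == "__main__":
    headword, kept, dropped = rank_variants(counts)
    print(headword)  # dogtooth
    print(kept)      # [('dog-tooth', 0.151), ("dog's tooth", 0.111), ('dogstooth', 0.039)]
    print(dropped)   # [('dogs-tooth', 0.001)]
```

On these figures dogtooth accounts for about 70% of the 763 hits, and a 1% cut-off would drop only dogs-tooth. Whether a dictionary uses a threshold like this at all, and where it would sit, is exactly the kind of judgement call that follows.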
Then the question is how many of those variants I should include in my entry. If I throw them all in, will it just look messy and confusing?
And do I trawl through all the corpus evidence to see which variants apply to which uses and try to show that?
Or do I just pick the most frequent forms? I can probably quite safely dump dogs-tooth, but where do I draw the line with the others? If someone comes across, say, dogstooth in a text somewhere and looks it up, I don’t want them to conclude that it doesn’t exist or is “incorrect”, because it clearly isn’t. People are using it and, presumably, understanding each other, so the descriptivist in me says that’s fine.
I won’t give away how I finally tackled the entry – back to confidentiality again – but next time you’re looking something up in a dictionary, take a look to see if there are spelling variants. Some poor lexicographer has probably agonized over them, so they’re worth at least a glance.
Labels: corpus research, language variation, lexicography, spelling, variants