Lexicoblog

The occasional ramblings of a freelance lexicographer

Tuesday, September 27, 2022

Lexicography FAQs: spelling variants

English spelling isn’t officially standardized. Unlike French, with its Académie Française to make pronouncements on correct (and incorrect) spellings, English spelling is largely governed by consensus. Whatever spelling of a word is most widely used by a particular speech community becomes the norm.

Spelling variation in English

Of course, the vast majority of common words tend to have a single agreed spelling; we all spell house and rabbit and oxygen and establishment in the same way. But there is a wonderful degree of variation too. Some words have different spellings in different places, the most obvious examples being British and American regional spellings of words like colour/color, catalogue/catalog, tyre/tire. Other variants have regional leanings but are actually less clearly divided: encyclopedia/encyclopaedia, foetus/fetus. Then there’s the question of whether to hyphenate or not (a topic I’ve written about before). Sometimes that’s about the evolution of a word: E-mail became e-mail and has now mostly become email. Sometimes there are grammatical reasons for a difference – a part-time job versus working part time – if you’re especially fussed about that kind of thing. And sometimes it’s down to no more than personal preference. I’m more of a co-ordinate kinda person just because I think the double O in coordinate looks funny. FYI, a quick corpus search shows that I'm very much in the minority on that.

Dictionaries and spelling variants

But surely you can check the “correct” spelling of a word in the dictionary, I hear you cry. Well, first off, and for the umpteenth time, there’s no such thing as “the dictionary”. There are many different dictionaries which will all have slightly different policies about spelling variants, and which will occasionally – shock, horror – show different things. And that’s because as lexicographers, we don’t decide what the correct spelling of a word should be; instead, we use corpus data to help us reflect how a word is spelled out there in the wild. How much of that variation we reflect will depend on the policy of the dictionary. So, for example, a learner’s dictionary is more likely to keep things simple, only showing the most frequent spelling, with maybe one variant spelling if it’s very common. Larger reference dictionaries are more likely to show a wider range of variants.

And of course, different dictionary publishers use different corpora. Especially where it’s a close call, one corpus might rank one spelling as more frequent, so a lexicographer might make it the headword (the word in bold at the top of the entry) and make the other the variant (usually shown below, perhaps in brackets and labelled as ALSO). Another corpus might come up with the opposite stats and result in a subtly different entry.
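
For the programmatically minded, the headword/variant decision can be sketched in a few lines of Python. This is only an illustration of the general idea, not any publisher's actual policy: the function name build_entry, the 5% cut-off (min_share) and the frequency counts for the two "corpora" are all invented for the example.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    headword: str
    variants: list[str]


def build_entry(counts: dict[str, int], min_share: float = 0.05) -> Entry:
    """Make the most frequent spelling the headword and keep any other
    spelling as a variant if it reaches min_share of all occurrences.

    min_share is a hypothetical policy setting: a learner's dictionary
    might set it high, a large reference dictionary much lower.
    """
    if not counts:
        raise ValueError("need at least one spelling")
    total = sum(counts.values())
    # Rank spellings from most to least frequent.
    ranked = sorted(counts, key=lambda spelling: counts[spelling], reverse=True)
    headword, rest = ranked[0], ranked[1:]
    variants = [s for s in rest if counts[s] / total >= min_share]
    return Entry(headword, variants)


# Invented counts for two imaginary corpora: a close call that goes
# one way in the first corpus and the other way in the second.
corpus_a = {"fetus": 600, "foetus": 400}
corpus_b = {"fetus": 450, "foetus": 550}

print(build_entry(corpus_a))
print(build_entry(corpus_b))
```

With these made-up numbers, the two corpora produce mirror-image entries, which is exactly the kind of subtle difference you might spot between two real dictionaries.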

The curious case of dogtooth

One challenge for lexicographers is how far to go with variants. Last week, I started researching the word dogtooth. I was initially looking for the anatomical sense to refer to a human tooth, also known as a canine. As I looked at the corpus though, I came across lots of examples referring to a fabric pattern:

 

And to a type of violet also called an Erythronium – which is rather beautiful, and which I happen to have in my garden (see below).

So far, so good, until I realized that at least for the latter two contexts, I’d probably want to add a possessive S in the middle. So, I went back and widened my search criteria. I won’t share the stats from the publishers’ corpus (for reasons of confidentiality), but I replicated the same search using the Timestamped JSI Web Corpus 2014-2021 (via Sketch Engine) and interestingly came up with almost the same frequency order:

dogtooth*        532 examples
dog-tooth        115
dog’s tooth**    85
dogstooth        30
dogs-tooth       1

*I ignored capitalized forms because they were frequently proper names.
**Some of these were references to teeth belonging to actual dogs. So, I ignored examples of dog’s teeth, which were mostly about our furry friends’ gnashers, and only counted the singular dog’s tooth, which was more likely to be one of the other uses, although not always.
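
If you'd like to try a rough version of this kind of count on your own text, here is a minimal Python sketch. To be clear, it is not how the search above was run (that was done in a corpus tool); it is a plain regular-expression count over a string, and the function name count_variants and the sample sentence are invented for illustration. It follows the two rules in the footnotes: the match is case-sensitive, so capitalized forms are skipped, and only the singular dog’s tooth is counted, never dog’s teeth.

```python
import re
from collections import Counter

# Case-sensitive on purpose, so capitalized forms (often proper names) are skipped.
# Only "tooth" appears in the pattern, so "dog's teeth" never matches.
# Both curly and straight apostrophes are accepted.
VARIANT_RE = re.compile(
    r"\b(dogtooth|dog-tooth|dog[’']s tooth|dogstooth|dogs-tooth)\b"
)


def count_variants(text: str) -> Counter:
    """Count each lower-case spelling variant of 'dogtooth' in text."""
    counts = Counter()
    for match in VARIANT_RE.finditer(text):
        # Treat straight and curly apostrophes as the same spelling.
        form = match.group(1).replace("'", "’")
        counts[form] += 1
    return counts


# Invented sample text, just to exercise each rule.
sample = (
    "She wore a dogtooth jacket with a dog-tooth scarf. "
    "Dogtooth is also a film title. "
    "The dog’s teeth were sharp. "
    "A dog's tooth violet and a dogstooth check."
)

for form, n in count_variants(sample).most_common():
    print(form, n)
```

A count like this can't tell a fabric pattern from a violet or an actual tooth, of course; sorting the examples by sense still means reading the lines.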

Then the question is how many of those variants do I include in my entry. If I throw them all in, will it just look messy and confusing? 

 

And do I trawl through all the corpus evidence to see which variants apply to which uses and try to show that?

Or do I just pick the most frequent forms? I can probably quite safely dump dogs-tooth, but where do I draw the line with the others? I don’t want people to look up, say, dogstooth, because they found it in a text somewhere, and think that it doesn’t exist or is “incorrect”, because it clearly isn’t. People are using it and, presumably, understanding each other, so the descriptivist in me says that’s fine.

I won’t give away how I finally tackled the entry – back to confidentiality again – but next time you’re looking something up in a dictionary, take a look to see if there are spelling variants. Some poor lexicographer has probably agonized over them so they’re worth at least a glance.

 


Friday, September 24, 2021

Are you hyphen-hesitant?

At a recent webinar on EAP vocab, the topic of prefixes had come up and someone asked me the best way to teach students about when to hyphenate words with prefixes.

My first answer – which I wasn’t quick enough to give at the time – is I’d tell them to really not worry about it!

My second answer – and the one which I gave – is if they’re not sure, check in a dictionary. Although, to be honest, good luck with that …

Recently, I’ve been spending a lot of my time researching prefixed words and one of the things I’ve been checking is which are typically hyphenated and which are closed (no hyphen). And the results are really messy. It seems that a corpus search for almost any prefixed word will throw up a mix of both options.

Dictionaries: Dictionaries seem to vary a bit in their approach to showing prefixed words. Most, though, show the most frequent spelling as the headword, then the alternative as a variant.

 


Although of course, different dictionaries use different corpora and will come up with different balances of hyphenated/closed examples. So, if you look up the same word in several dictionaries, don’t be surprised if you find different answers.

Some trends: Although it’s an area where there aren’t firm rules, there are a few tendencies:

Fixedness: Typically, newly created prefixed words tend to start off with a hyphen – perhaps to make the original root word clearer and the novelty less confusing. Then over time, as the combination becomes more familiar and fixed, the hyphen often gets dropped – think e-mail > email, on-line > online.

US vs UK: Also as a general rule of thumb, Brits tend to use more hyphens than Americans.

Double vowels: Where a prefix ends in a vowel and the root word starts with a vowel, hyphens are more common to avoid possible confusion over the pronunciation of the double vowel sound. So, co-opt is quite clearly /kəʊˈɒpt/ (two separate O sounds) whereas coopt not only looks a bit weird, but could potentially be pronounced /kuːpt/!

Looking at a few random corpus examples of prefixed words with double vowels, it’s clear that there’s a mix of factors in play. The mispronouncability of the closed compound may be one – try antiageing without a hyphen. And some more familiar, fixed combinations, like preempt and preamble, are less likely to be hyphenated compared with more novel ones that err towards a hyphen – semiindependent is clearly a step too far for most people!
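
The double-vowel tendency is simple enough to write down as a toy Python rule. Treat this strictly as a sketch of the tendency, not a spelling rule: the function names has_vowel_clash and suggest_spelling are made up for the example, and the check only looks at letters, not sounds or familiarity.

```python
VOWELS = set("aeiou")


def has_vowel_clash(prefix: str, root: str) -> bool:
    """True if the prefix ends in a vowel letter and the root starts with one."""
    return prefix[-1].lower() in VOWELS and root[0].lower() in VOWELS


def suggest_spelling(prefix: str, root: str) -> str:
    """Hyphenate only where two vowel letters would otherwise meet."""
    if has_vowel_clash(prefix, root):
        return f"{prefix}-{root}"
    return prefix + root


for prefix, root in [("co", "opt"), ("anti", "ageing"), ("semi", "independent")]:
    print(suggest_spelling(prefix, root))
```

The rule knows nothing about how familiar a word is, so it would also hyphenate pre + empt and pre + amble, which is precisely where real usage goes its own way.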


Note: All my corpus stats here are very rough and ready – this isn’t a careful academic analysis – but I think they probably represent the general trends. With this kind of thing, you also need to bear in mind all kinds of different factors, like the influence of style guides – so, for example, a major publisher or media group might decide at a certain point to shift from using e-mail to email and that could have quite an impact on corpus stats. As ever, language gets pushed and pulled by all kinds of sometimes mysterious forces!


Thursday, November 17, 2011

Google search spellcheck

As I've admitted before, my spelling's not great. I generally know when I can't spell a word though, so I check it before committing it to print, and with a whole shelf of dictionaries within reach of my desk, there's really no excuse. Recently though, I've noticed that I'm increasingly just starting to type a word I'm unsure about into the Google search box at the top of my browser and waiting for it to come up with suggestions. It's a technique I started using mainly to check the spellings of proper nouns (names of people and places or foreign dishes - does moussaka usually have one s or two?), but I've found that I'm now using it more and more, especially if I'm on my laptop away from my desk.

The lexicographer in me wants to object that it's not a reliable or authoritative source, but then arguably, that's not always what you need. For many words, I recognise the correct spelling when I see it; it's just a quick double check when I'm having a bit of a mental blank. And after all, dictionary work is all about frequency-based corpus research, and what's Google if it's not a massive corpus?


Thursday, March 31, 2011

Spelling shame

Today, I had a chiropodist's appointment. When I arrived I was asked to fill in a form with a few bits of basic personal information. No problem, until I came to 'allergies'. I have an allergy to a common antibiotic, but I can't for the life of me ever remember how to spell it! My pen hovered for a moment, then I plumped for "penecillin" ... Sat in the chiropodist's chair, the polite conversation inevitably turned to:

"So what do you do?"
"Erm, I'm a writer. I mostly write dictionaries."
"Oh really, dictionaries?! You must be very good at English."
"Well, no, not really. My spelling's terrible actually. I've probably spelt penicillin wrong on that form, haven't I?"
(Looks at the form) "Yes. ... Oh well, we're all human, aren't we?"

I feel as if I ought to be able to spell better, but there's just something about certain words that I have a mental block with. I've always been the same - at school I was always being nagged by teachers about my spelling. I think it's something to do with spelling being something you just have to learn - I've never been good at learning things by heart - I have to have some kind of logic or reasoning behind something. I can explain complex grammatical rules, but I can't remember whether it's an e or an i in the middle of words like penicillin!

Thankfully, I know which words I can't spell, so in most circumstances, I just check - I've got a pile of dictionaries on my desk, after all! I've even got my most frequent problem word on a little post-it note on my monitor ...

