Lexicoblog

The occasional ramblings of a freelance lexicographer

Monday, January 07, 2013

The future of dictionaries (2): lexicographer versus computer



Some 20-odd years ago, as a young Linguistics undergraduate, I became interested in the concept of computers ‘understanding’ human language. I did my undergraduate dissertation on Natural Language Processing (NLP), considering how far computers might go in really understanding language in all its subtle, complex, nuanced detail, and holding up the talking computer HAL, from Kubrick’s 1968 film 2001, as what I then suggested was an unobtainable goal. I went on to start a Master’s course in Computer Speech and Language Processing. I only lasted a term – mainly because I discovered I really hated all the computer programming involved, but also because I was disappointed to find that most of the course seemed to revolve around the speech processing side (i.e. voice recognition) and the language processing component came down to rather vague theoretical discussions that didn’t go much beyond my basic undergraduate research. Okay, that may be a bit of a distorted recollection of the actual course content, but I was only 21 and that’s how you see things when you’re barely out of your teens!

Obviously, in the intervening decades, technology has come on in leaps and bounds. Speech recognition has improved immeasurably - I'm actually dictating this blog post using speech recognition software and while it's not perfect, it's considerably more impressive than my early efforts at programming! I have to hold up my hands here and admit that I haven't kept up to date with developments in NLP, but I suspect progress has been much slower; we're still an incredibly long way from communicating with our technology in the same fluent way we can chat to our friends.

So, what’s any of this got to do with dictionaries? Well, let me try and explain my train of thought, triggered by the announcement by Macmillan back at the start of November that they are to stop printing paper dictionaries and focus on their online content:

  • If publishers aren't actually selling paper dictionaries but are mostly focusing on a free online service, how much are they going to be prepared to spend on the time-consuming and labour-intensive work of lexicography?
  • Of course, they'll be looking into other related income streams, selling dictionary data for other uses, and online advertising, but without a tangible, on-the-shelf product, will that justify quite the same budget?
  • Reduced budgets often suggest a drive towards more automation, something we've already seen with the emergence of developments such as ‘TickBox lexicography’.
  • Will more automation and “more efficient” ways of working inevitably lead to a drop in standards?

Clever developments in making the dictionary compilation process more automated do supposedly speed it up, for example, by automatically selecting ‘good’ dictionary examples from a corpus, to save a human lexicographer having to trawl through by hand. But any lexicographer who's worked with them will know that they only work to a degree and only speed things up to a certain extent … probably not quite compensating for the increased rate expected of said lexicographer without a drop in quality.

And then there's the whole established process for keeping dictionaries up-to-date. Currently, most dictionaries undergo a revision and a new edition every five years or so. This is a long, slow, and labour-intensive process that involves a team of lexicographers (mostly freelancers nowadays) going through the whole A-Z, looking at each entry and checking whether it needs updating. This doesn’t just involve adding trendy new buzzwords like ‘omnishambles’ or whatever – which are rarely of much use, or interest, to the average foreign learner anyway. There are all kinds of more subtle changes in the usage of existing words, sometimes due to linguistic trends and sometimes just as a result of changes in the real world. As one commenter on the Macmillan dictionaries blog pointed out, MED still contains an entry for Inland Revenue as the name of the UK tax authority, even though it changed its name to HMRC in 2005. And having done a quick search myself, I found it also has a couple of example sentences that rather unhelpfully in a digital age refer to cassettes (She slotted another tape into the cassette player. @ slot into, He quickly undid the screws that held the cassette together. @ undo).
And I’m not just trying to pick holes in Macmillan here; all dictionaries naturally date as language and usage change. Thus the need for new editions. And there are changes in style and presentation too as different aspects of language come to the fore within language research and teaching. More information about collocations has become de rigueur over recent years, for example. And whilst corpora are wonderful tools for researching collocational information, it still needs a team of lexicographers to trawl through each entry and decide where it’s worth adding a bolded collocate, or in some cases, whether a particularly strong collocation should actually be shown as a phrase or an idiom.

Which comes back to where I started … computer technology can do lots of wonderful things, but for me, when it comes to language, there still needs to be a human drudge working their way through that data to make intelligent decisions about what to present in a dictionary and how. In a world of online-only dictionaries, will dictionary departments have the clout to take on a team of lexicographers to do those regular sweeps through the database or will they just have a couple of people on the lookout for interesting, newsworthy nuggets that give the appearance of being “up-to-date”?


4 Comments:

Anonymous Liz said...

Some very pertinent points here Julie, especially about who is going to want to pay for the revision and updating of material that is free at the point of use. I can't help feeling we have lived through - and come to the end of - a golden age of lexicography. I just counted the number of people in my photo of the Longman dictionaries team back in 1990; there were 50 of us! I doubt if that many people are working in UK lexicography as a whole today.

4:32 pm  
Blogger Tyson Seburn said...

I'm not convinced that moving away from paper dictionaries to online suggests that the quality will go down. Maybe I don't understand that the source of income is significantly less to afford the team of lexicographers. If dictionaries are the business, the money to keep up the quality must come from alternative sources.

One aspect that I continue to grapple with in favour of non-paper versions comes back to your point that language changes and that it needs updating. Most people who buy a paper dictionary will almost invariably keep that dictionary as their precious go-to for potentially decades, longer than its value remains, well, valuable. Online versions don't require a 5-year process for updating. Can't a couple of smaller teams of lexicographers be assigned to the beginning and middle of the alphabets and then continuously cycle through the alphabet in less time? Besides, the cost of printing and publishing the dictionaries also goes down, so the extra money may be there. Bottom line is that consumers will benefit from current usage instead of being clingy to the past.

I'm going on about an aspect I know little about really, but just some thoughts.

2:43 am  
Blogger The Toblerone Twins said...

Hi Tyson,

I completely understand your thoughts - from a user perspective, an online dictionary does have potentially huge advantages. It can be constantly updated, contain all manner of exciting new features (pictures, sound, video), there are no space restrictions, etc. etc.

The key point though is that most online dictionaries are free, so where does the income come from to pay for all this?

As Liz said, (ELT) lexicography has been gradually squeezed over the past decade or so. When I started, I was part of a team of in-house lexicographers working at a publisher, fully trained up by them and working on exciting new products. When I first went freelance, I was working as part of large teams, who'd get together for regular meetings and training, on long-term projects with good numbers of hours spent (and decent pay). Now projects are few and far between, dictionary departments have largely shrunk to a couple of in-house editors (or disappeared altogether) and projects tend to be "quick and dirty" - do as much as you can in limited time, no meetings, no training, no new lexicographers coming through.

So from an insider's point of view, this move away from a tangible print product feels very much like publishers (in squeezed times) trying to hang onto something for nothing.

9:13 am  
Blogger Tyson Seburn said...

All good points--insight into the backend of a sector I haven't had a look into before. I wonder if even though the online dictionary is free (though perhaps abridged), a different model needs to be developed for sustainability. Obviously over time if they are not well-attended, which takes money, users will wise up and discontinue using them.

Of course, it's something that we can probably align to the print publishing industry in general. If consumers demand e-versions because that's all they'll use, everyone must adapt.

5:00 am  
