The future of dictionaries (2): lexicographer versus computer
Some 20-odd years ago, as a young Linguistics undergraduate, I
became interested in the concept of computers ‘understanding’ human language. I
did my undergraduate dissertation on Natural Language Processing (NLP),
considering how far computers might go in really understanding language in all
its subtle, complex, nuanced detail, and holding up the talking computer HAL,
from Kubrick’s 1968 film 2001: A Space Odyssey, as
what I then suggested was an unobtainable goal. I went on to start a Master’s
course in Computer Speech and Language
Processing. I only lasted a term – mainly because I discovered I really
hated all the computer programming involved, but also because I was
disappointed to find that most of the course seemed to revolve around the
speech processing side (i.e. voice recognition) and the language processing
component came down to rather vague theoretical discussions that didn’t go much
beyond my basic undergraduate research. Okay, that may be a bit of a distorted
recollection of the actual course content, but I was only 21 and that’s how you
see things when you’re barely out of your teens!
Obviously, in the intervening decades, technology has come
on in leaps and bounds. Speech recognition has improved immeasurably – I'm
actually dictating this blog post using speech recognition software and while
it's not perfect, it's considerably more impressive than my early efforts at
programming! I have to hold up my hands here and admit that I haven't kept up to date
with developments in NLP, but I suspect progress has been much slower; we're
still an incredibly long way from communicating with our technology in the same
fluent way we can chat to our friends.
So, what’s any of this got to do with dictionaries? Well,
let me try and explain my train of thought, triggered by the announcement by Macmillan
back at the start of November that they are to stop printing paper dictionaries
and focus on their online content:
- If publishers aren't actually selling paper dictionaries but are mostly focusing on a free online service, how much are they going to be prepared to spend on the time-consuming and labour-intensive work of lexicography?
- Of course, they'll be looking into other related income streams, selling dictionary data for other uses, and online advertising, but without a tangible, on-the-shelf product, will that justify quite the same budget?
- Reduced budgets often suggest a drive towards more automation, something we've already seen with the emergence of developments such as ‘TickBox lexicography’.
- Will more automation and “more efficient” ways of working inevitably lead to a drop in standards?
Clever developments in making the dictionary compilation process more
automated do supposedly speed it up, for example, by automatically selecting
‘good’ dictionary examples from a corpus, to save a human lexicographer having
to trawl through by hand. But any lexicographer who's worked with them will
know that they only work to a degree and only speed things up to a certain
extent … probably not enough to make up for the faster work rate now expected of
said lexicographer, at least not without a drop in quality.
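For anyone curious what "automatically selecting ‘good’ examples" might look like in practice, here's a very rough sketch. I should stress that it isn't how any real dictionary tool works: the scoring rules, the weights, the tiny COMMON_WORDS list and the function names are all made up for illustration, and it only matches the exact form of the headword (so it would miss "undid" for "undo").

```python
import re

# Hypothetical stand-in for a learner-friendly core vocabulary list.
COMMON_WORDS = {
    "the", "a", "an", "is", "in", "on", "of", "to", "and", "i",
    "put", "box", "new", "she", "he", "it", "into", "another",
}

# Sentence-initial words that suggest the sentence depends on outside context.
CONTEXT_WORDS = {"he", "she", "it", "they", "this", "that"}


def score_example(sentence, headword, min_len=5, max_len=15):
    """Return a rough 0-1 score for a candidate example sentence (assumed heuristics)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    head = headword.lower()
    if head not in tokens:
        return 0.0  # the sentence doesn't show the exact headword at all
    score = 1.0
    if not (min_len <= len(tokens) <= max_len):
        score -= 0.4  # too short or too long to be a handy example
    if tokens[0] in CONTEXT_WORDS:
        score -= 0.2  # probably relies on an earlier sentence
    rare = [t for t in tokens if t != head and t not in COMMON_WORDS]
    score -= 0.1 * len(rare)  # harder vocabulary makes for a harder example
    return max(score, 0.0)


def rank_examples(sentences, headword, top_n=3):
    """Return up to top_n candidate sentences, best score first."""
    scored = [(score_example(s, headword), s) for s in sentences]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_n]]
```

Even with far cleverer rules than these, all you get is a shortlist; someone still has to read the candidates and decide which one actually illustrates the meaning.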
And then there's the whole established process for keeping
dictionaries up to date. Currently, most dictionaries undergo a revision and a
new edition every five years or so. This is a long, slow, and labour-intensive
process that involves a team of lexicographers (mostly freelancers nowadays)
going through the whole A-Z, looking at each entry and checking whether it
needs updating. This doesn’t just involve adding trendy new buzzwords like
‘omnishambles’ or whatever – which are rarely of much use, or interest, to the
average foreign learner anyway. There are all kinds of more subtle changes in the
usage of existing words, sometimes due to linguistic trends and sometimes just
as a result of changes in the real world. As one commenter on the Macmillan
dictionaries blog pointed out, MED still contains an entry for Inland Revenue as the name of the UK tax
authority, even though it was merged into HMRC in 2005. And having done a quick search myself, I found it
also has a couple of example sentences that, rather unhelpfully in a digital age,
refer to cassettes (‘She slotted another tape into the cassette player.’ at slot into; ‘He
quickly undid the screws that held the cassette together.’ at undo).
And I’m not just trying to pick holes in Macmillan here; all
dictionaries naturally date as language and usage change. Thus the need for new
editions. And there are changes in style and presentation too as different
aspects of language come to the fore within language research and teaching.
More information about collocations has become de rigueur over recent years,
for example. And whilst corpora are wonderful tools for researching
collocational information, it still takes a team of lexicographers to trawl
through each entry and decide where it’s worth adding a bolded collocate, or in
some cases, whether a particularly
strong collocation should actually be shown as a phrase or an idiom.
Which comes back to where I started … computer technology
can do lots of wonderful things, but for me, when it comes to language, there
still needs to be a human drudge working their way through that data to make
intelligent decisions about what to present in a dictionary and how. In a world
of online-only dictionaries, will dictionary departments have the clout to take
on a team of lexicographers to do those regular sweeps through the database or
will they just have a couple of people on the lookout for interesting,
newsworthy nuggets that give the appearance of being “up-to-date”?