Lexicoblog

The occasional ramblings of a freelance lexicographer

Friday, December 13, 2019

Lexical layers 1: register and genre


One of my pet hates when it comes to vocabulary teaching is seeing materials that don't venture beyond the basic denotational meaning of a word or phrase – that is, the thing or idea it refers to in the real world. So, we could say that purchase is another word for buy, that wonga is money or that built like a brick shithouse means large. But that's really only the first layer of meaning peeled off and, in many cases, it's not enough to really understand what the speaker intends by their choice of word(s), and nowhere near enough to know when and how you can use it yourself.

I especially come across this on social media with posts from all kinds of sources (trusted and less so) offering fun words of the day or sets of useful phrases or lists of synonyms, all aimed directly at learners but invariably explaining nothing at all about when or where or to whom the words would be appropriate. And, to be honest, conventional published materials don't always fare much better either, with the teaching of idioms being an area liable to see me sinking my face into my hands in despair.

In ELT, it's an issue that tends to get increasingly relevant as students work their way up the levels. A lot of the vocabulary we teach at the lowest levels is the very high-frequency words. Many of these tend to be fairly neutral; there's not much more to say about table, pencil, car, walk or blue. As students expand their vocabulary beyond the basics though, the picture gets less clear. Yes, there are still plenty of neutral words, especially simple concrete nouns like tunnel, stadium, fennel or sieve, but there are many more words with multiple layers of meaning that we really need to be getting across to students so that they properly understand the language they read and hear, and perhaps more importantly, so they don't go around inadvertently giving the wrong impression or making dreadful faux pas.

Register and genre

The concept of register in language teaching, if it gets covered at all, tends to get reduced to 'formal' and 'informal'. You might come across a lesson on formal and informal messages in which Yours faithfully is labelled as formal and love from as informal. Register, though, goes much further than that. Here's a definition from Oxford Dictionaries:

register: linguistics
A variety of a language or a level of usage, as determined by degree of formality and choice of vocabulary, pronunciation, and syntax, according to the communicative purpose, social context, and standing of the user.

Let's start off by just focusing on words that are typically used in particular contexts or genres (types of text or speech). The most obvious of these are labelled in learner's dictionaries. So, you'll find that purchase is labelled as formal because it's not a word we typically use in everyday, informal conversation. At the other end of the scale, wonga is likely to be labelled as informal or slang. How far learner's dictionaries go with other register labels varies, but you might come across a word like herewith labelled as law/legal, kinetic might appear as specialized or technical or have a science label, and elegiac will likely be shown as literary. Of course, you could dig around in any genre and turn up a whole host of typical vocab that would seem odd used elsewhere:

Business jargon: core values, scalable, going forward, think outside the box
Tabloid journalism: mum-to-be, blonde bombshell, love rat, (jobs) axed
Football commentary: play to the whistle, against the run of play, hit the woodwork, handbags
Academia: epistemological, existential, ibid, give rise to, allude to
Official announcements: Kindly refrain from …, Bags must be stowed …, Please proceed to …, Alight here for …


As proficient speakers of English, we mostly don't notice these choices until there's an obvious mismatch. My favourite example of this (apologies if you've seen me quote this before!) is from a television advert from a few years ago for a job search website. A primary school teacher is seen speaking to a class of five-year-olds … name that register!

I put it to you that on the morning of the 17th you did enter the Story Time Corner and with malice aforethought you did inflict grievous injury upon one Mr Boo-Boo Bananas.

Then there's the distinction between language in current use and words or phrases that are falling out of use or have become 'marked' because they no longer feel contemporary. In a dictionary, you might find labels indicating language that's dated (used within living memory, but not current: phone box, discotheque, groovy), words that are old-fashioned (the fair sex, gramophone, wedlock) and old use (only really found in literature from centuries past: thou, smite).

Of course, these are very broad distinctions which any proficient English speaker could refine into scales of formality, of datedness or of specialization. And exactly where the boundaries lie is a grey area that will vary between speakers – a point I'll return to in a later post.

In the classroom, I think the thing to remember is that context is key. If you come across a new word or phrase in a reading or listening text, by all means look at the (denotational) meaning to help students understand the text, but don't then take it out of context and slot it into a productive activity or add it to a words-to-learn list without considering any restrictions on its use. Encourage students to note who used it and where it came from, to look out for it in future and, again, to notice the context. Help them avoid rushing to use newly-learned vocabulary where it doesn't really fit. To take a recent example, I came across a post online offering alternatives to please for asking for things politely, but just adding kindly to a request probably isn't going to go down well. As most learner's dictionaries note, it's either used in very formal, usually official instructions – We kindly request you read the following information carefully – or it's actually a tetchy, passive-aggressive show of annoyance – Kindly move your car immediately!

In my next post, I'll be looking at another lexical layer: connotation.


Monday, March 05, 2018

Corpus insider #1: Representativeness



As I was putting together my talk for the ELT Freelancers’ Awayday and the follow-up blog post, I realized that over 20 years of using corpora, there are a whole host of factors I’ve learnt to take into account. I touched very briefly on a few of them in my talk, but I thought some might be worth exploring further. So, this is the first in a series of posts about things you might need to bear in mind if you want to use corpus tools to inform your work on ELT materials.

When I explain to people what a corpus is, I usually start off by saying that it’s a large collection of language that we use to represent the way English is used as a whole. It seems a simple premise, but when you dig a bit deeper, it gets more complicated. As with any research, the validity of your results is dependent on the quality of your data. The data your chosen corpus contains will determine exactly what kind of language it can actually be said to represent and so how useful it is for your purpose. To take a simple example, if you were writing for a specifically British English market, using a corpus that contained only American data wouldn’t be very useful. Similarly, if you were working on speaking materials, looking at usage in a corpus of entirely written data wouldn’t really tell you much about how people normally speak. Understanding a bit about the corpus you plan to use, the data it contains, and what that might represent is absolutely essential before you start doing any corpus research.

Corpus types:
There are two main types of corpus: those which contain data drawn from one type of source or genre and those which are said to be ‘balanced’ and contain data from a wide variety of different genres. The first type includes purely spoken corpora (like the Spoken BNC2014) and corpora of academic writing (of either published texts, like the academic part of COCA, or student writing, like BAWE or MICUSP). Many corpora are also composed largely of journalism, because it’s one of the simplest sources of data to collect, especially for those corpora that rely on web-based content (e.g. Monco, NOW, etc.).

Large balanced corpora, containing written and spoken data from a wide range of sources, are much more difficult to put together. For this reason, they’re mainly owned and maintained by large publishers, especially those who produce dictionaries, and aren’t publicly available. The British National Corpus (BNC) is a balanced corpus that’s freely available, but it’s relatively small by modern standards and, perhaps more importantly, it’s becoming increasingly out of date (with data from the 1980s and 90s). The Corpus of Contemporary American English (COCA) sits in a mid-ground with data from spoken sources (although all radio transcripts rather than everyday conversation), fiction, popular magazines, newspapers and academic texts. It’s reasonably balanced, although all American, as the name suggests.

The trouble with media hype:
Data from newspapers, magazines and blogs is very easy to collect and makes up a large proportion of many corpora. It can provide lots of interesting information about language used to talk about a wide range of topics, but it’s important to remember that journalism as a genre has its own quite marked features that don’t necessarily reflect the way that ordinary people use language day to day. It may seem obvious to say that journalists report news, but that means they’re generally writing about what’s new, surprising, shocking or problematic. They also want to draw their readers in and keep their attention with colourful language choices and hyperbole. For my recent talk, I demonstrated an example of a query about the language of social media and in particular, which verbs collocate with the noun ‘newsfeed’. I used the Monco corpus, because I was interested in up-to-date usage, and came up with the following verbs:

scroll through your newsfeed
pop up on your newsfeed
fill/flood/dominate/clog up your newsfeed


The first two feel like expressions you might use in conversation. The others, however, are clearly journalistic in style, bemoaning the way that a particular trend is overtaking our online lives. Searching a couple of other news-dominated corpora came up with similar results (enTenTen: spam/clutter/bombard/clog your newsfeed; NOW: scroll through/appear on/pop up on/tweak/flood/clog your newsfeed). They’re all interesting collocations, but they’re probably not the first ones you’d choose to teach an intermediate learner who wants to talk about the way they use social media themselves. That’s not to say you shouldn’t use these corpora when you're researching ideas for ELT materials, but knowing a corpus contains only or largely data from journalistic sources means that you can be on the lookout for this type of language and be selective about what you use as appropriate for the learners you’re writing for.
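If you can export your search results along with their sources, one rough way to stay "on the lookout" is to tally collocates by the genre they came from. Here's a minimal sketch in Python. The hits below are invented toy data, not real output from Monco, enTenTen or NOW, and the function name collocates_by_genre is hypothetical; it only shows the idea of checking where a collocation comes from before deciding to teach it.

```python
from collections import Counter, defaultdict

# Invented toy concordance hits: (source genre, verb found with "newsfeed").
# These are placeholders for illustration, not real corpus results.
hits = [
    ("news", "clog up"),
    ("news", "flood"),
    ("news", "flood"),
    ("news", "scroll through"),
    ("conversation", "scroll through"),
    ("conversation", "pop up on"),
]


def collocates_by_genre(hits):
    """Count how often each collocate appears within each source genre."""
    table = defaultdict(Counter)
    for genre, verb in hits:
        table[genre][verb] += 1
    return table


table = collocates_by_genre(hits)
for genre, counts in table.items():
    # most_common() lists collocates from most to least frequent.
    print(genre, counts.most_common())
```

A collocate that only ever turns up under one genre (here, flood under news) is a hint that it may be a feature of that genre rather than of everyday usage.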


Professional and lay writers:
Unsurprisingly, the majority of written corpus data comes from published sources and, as such, it’s written by people who are professional writers: authors, journalists, copy-writers. As we saw with journalism, above, this can mean the language is more colourful and probably more varied than the average lay person typically tends to use. This came out very clearly in a recent study into academic vocabulary (Durrant, 2016*) which looked at how many of the words on the Academic Vocabulary List (based on a corpus of published academic writing) were actually used regularly by student writers (using a corpus of university-level student writing). It turned out that the student essays contained a vastly narrower range of vocabulary than the published academic texts, written by experienced (and edited!) academics. That’s not to say the student writing was in some way lacking – all the papers had got high marks – it’s just a different genre with different expectations. 

When you’re using a corpus to search for ideas, it’s all too easy to pick out examples and patterns that are elegant or appealing, but I think it’s always important to ask yourself how typical they are of what the average person might say or write. Is it a writerly flourish? Is it helpful as a model for your target learners?

I’m not saying that as ELT writers and editors we should reject all corpus evidence as flawed and unhelpful. Far from it. I think corpus tools can be incredibly helpful in backing up our intuitions and uncovering patterns of usage we might not have thought of, but they are just that, ‘tools’, and should be used with an element of caution. It's all too easy to be drawn in by a corpus that's new or especially large or has a nice interface and nifty tools, but making sure you know what your corpus represents is vital. If a collocation or pattern feels unlikely or overly fancy, then ask yourself why. Don’t just accept the first results that pop up: click through to the examples, scroll down to see where they come from and understand exactly what’s going on.


* There's a good summary of Durrant's study on ELT Research Bites.


Tuesday, February 02, 2016

Semi-academic sources in EAP: an interview with a New Scientist journalist (2)



Part two: Language & vocabulary

My last post featured the first part of an interview with Dr Alison George, an editor for New Scientist magazine. She talked about how scientific papers are restructured and presented in a more appealing way for the more general, New Scientist readership. In this post, she talks about how the actual language used is different.

Dr Alison George: “A journalist will try to avoid the "jargon heavy" language used in scientific papers and adopt a simpler approach to conveying information.  A case in point is my PhD thesis, for which I gave the title:  "The biodegradation of anionic surfactants in the estuarine environment".  In hindsight, I realise that I went out of my way to use long words to make it sound serious.  This is typical of scientific papers. However, if I was explaining my thesis to a friend, I'd say that my research was about whether the chemicals found in shampoos and detergents are biodegradable.” 

I ask whether the use of long words in academic papers is really just about ‘sounding serious’ and, on reflection, Alison admits that isn’t always strictly true. “For example, to use "detergents" instead of "anionic surfactants" would have made it easier to understand for the lay person, but is technically inaccurate.”

Vocabulary differences: a specific case

To further illustrate her point about language differences, Alison gave me an example of an article she’d written for New Scientist about penguins and for comparison, the two academic papers on which it was based. 

She picks out a couple of phrases that were reworded to make them more accessible. “The first paper used the words "synoptic survey" in the opening sentence and title. The words "synoptic survey" would not be used in New Scientist, instead we might say, "a survey of the entire coastline of Antarctica using satellite images".

“The second paper uses terms such as "analysis of coupled demographic and climate models". Again, we would avoid using this term in New Scientist because it's hard to work out what it means. Instead, we might say something like: "predictions of future numbers of Emperor penguins based on forecasts of the Antarctic climate".

“The bottom line is this: although a scientific research paper and an article in New Scientist might tackle the same topic, and both might deal with some tricky concepts, the style they are written in is different. In New Scientist, we make strenuous efforts to translate technical terminology and jargon into words that an educated reader, without any specialist knowledge of the subject, should understand.”

Lost vocabulary:

What exactly constitutes ‘technical terminology’ though? The two examples above are clearly very specialist and arguably not very useful for the average EAP student to spend time on, but what about the rest of the language? If we compare the New Scientist article with the first of the academic articles in terms of overall vocabulary, we see an interesting difference:


                               New Scientist article   Original scientific paper
Top 2000 most frequent words   83%                     74.5%
AWL* words                     5%                      14.5%
Other words                    12%                     11%

* Academic Word List

These stats are very broad-brush, but they do show that as well as cutting the most specialist terminology, the New Scientist article also loses a lot of the general academic vocabulary (here based on the AWL), which is probably exactly what EAP students do need. Just some of the vocabulary that gets lost in the edit here includes words like: assess, consistent, distribution, establish, evidence, factor, function, indicate, occur, variation; all recognizably useful core academic words.
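For anyone curious how a broad-brush profile like this can be put together, here's a minimal sketch in Python. It isn't the tool used to produce the figures above. The two word lists are tiny made-up stand-ins for a real top-2000 list and the AWL, the function name vocab_profile is hypothetical, and it matches exact word forms rather than word families, so treat it as an illustration of the idea only.

```python
import re
from collections import Counter


def vocab_profile(text, high_freq, academic):
    """Return the percentage of tokens falling into each of three bands."""
    # Lowercase and keep alphabetic tokens (allowing internal apostrophes/hyphens).
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    bands = Counter()
    for token in tokens:
        if token in high_freq:
            bands["high_freq"] += 1
        elif token in academic:
            bands["academic"] += 1
        else:
            bands["other"] += 1
    total = len(tokens)
    if total == 0:
        return {"high_freq": 0.0, "academic": 0.0, "other": 0.0}
    return {name: round(100 * bands[name] / total, 1)
            for name in ("high_freq", "academic", "other")}


# Placeholder lists for illustration only, NOT the real top-2000 list or the AWL.
HIGH_FREQ = {"the", "of", "is", "a"}
ACADEMIC = {"distribution", "evidence"}

sample = "The distribution of penguins is evidence of a colony"
profile = vocab_profile(sample, HIGH_FREQ, ACADEMIC)
print(profile)
```

Running the same function over two texts on the same topic, with proper word lists swapped in, would give a side-by-side comparison along the lines of the table above.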

If so many EAP materials focus on teaching this core academic vocabulary, it seems somewhat counterproductive to be using texts that quite consciously feature significantly less of it.

Idiom and hyperbole:

So what is it that replaces the academic vocabulary in the New Scientist article? Well, it does contain a higher proportion of high frequency words, which should make it more accessible to the average non-native speaker student. This is good news, of course, if you’re looking for input for a speaking lesson, say. However, there are a couple of linguistic features which could work against its usefulness in an EAP context.

Because New Scientist articles are essentially targeted at a native speaker readership, they draw on idiomatic language and cultural references to appeal to that audience. Take these two short extracts:

“Fast-forward a few decades, and many colonies will be on the road to extinction. Are we witnessing the last march of the emperor penguins?” (> tricky idioms in ‘fast-forward a few decades’ and ‘on the road to extinction’, plus the cultural reference to the documentary film ‘March of the Penguins’, which gets another mention later in the piece)

“This extraordinary lifestyle has made the emperors famous. They have even been held up as role models by evangelical Christians.” (> again, the cultural reference here might take quite a bit of explaining to students from some backgrounds!)

These types of issue might be a fun distraction in a General English class, but are they really an effective use of class time for students preparing for academic study? Again, I guess that’s down to context and the amount of class time available, as well as the interests and priorities of your students.

Perhaps of more concern, I think, for students trying to get to grips with an academic style of writing is the type of language used to give the story more impact for a general audience. The New Scientist article is littered with words like impossible, blockbuster, breath-taking, catastrophic, disastrous, extraordinary, demise and vanish. This is exactly the type of language that academic writers are careful to avoid, unless it’s very carefully hedged (with seemingly, apparently, potentially, etc.). It comes back to the point Alison made above about the need to be completely accurate in academic writing. As EAP tutors, we warn our students to avoid exaggeration and overgeneralization in their writing, because we can foresee the comments which will come back from their subject tutors.

This raises the question of whether it’s actually misleading to present this type of text to students as an example of academic writing. How will they know just what’s appropriate to use in their own writing and what’s not? Yes, we can make mention of the differences, we can do a bit of genre analysis even, but will students be able to make all those distinctions for themselves? Will they realize just what’s transferable and what isn’t?

So having looked in a bit more detail at the genre, is it helpful to use articles from consumer magazines aimed at a general readership in an EAP context?  As Swales (2016) puts it: “Genres are defined in terms of their communicative purposes” and from what we’ve seen, the communicative purposes of these articles versus the kind of academic texts that students will need to read as part of their studies are clearly not the same. So, once again, I think, it comes back to the aims of the lesson; these articles are clearly more fun and engaging than most academic texts and because they’re aimed at a non-specialist audience, they’re more suited to a mixed-discipline EAP class. However, if the aim is to prepare students for the type of reading texts and language they’re going to need for their future studies, not only are these articles unhelpful, but they could actually prove a hindrance.

With special thanks to Alison George for taking the time to answer my questions, for being so enthusiastic about the topic and for providing some fascinating insights into the workings of New Scientist.

References:
Fretwell PT, LaRue MA, Morin P, Kooyman GL, Wienecke B, Ratcliffe N, et al. (2012) An Emperor Penguin Population Estimate: The First Global, Synoptic Survey of a Species from Space. PLoS ONE 7(4): e33751. doi:10.1371/journal.pone.0033751
George A (2012) The last march of the emperor penguins. New Scientist.
Jenouvrier S, Holland M, Stroeve J, Barbraud J, Weimerskirch H, Caswell H (2012) Effects of climate change on an emperor penguin population: analysis of coupled demographic and climate models. Global Change Biology 18(9): 2756–2770.
Swales J (2016) Genre & English for Academic Purposes. Video: https://www.youtube.com/watch?v=W--C4AzvwiU&feature=youtu.be
 
