Lexicoblog

The occasional ramblings of a freelance lexicographer

Wednesday, July 04, 2018

Sticking to your lexical guns: 4 principles for writing vocabulary materials


I was inspired to put together a talk for the recent MaWSIG/Oxford Brookes event by a number of posts by Katherine Bilsborough about materials writers’ principles. It’s a topic I’ve been pondering for a while and one I decided to give a vocabulary slant. After lots of thought, I came up with four broad principles. And with the title of the event being about challenges and opportunities, I combined my four principles with some of the challenges I face in sticking to them when they seem to conflict with the brief I’m working to.

Principle 1: Have a clear aim in mind for every activity
This may seem obvious and a bit of a universal truth when writing any kind of materials, but I think that all too often, vocab activities get tagged on – to a grammar syllabus, to reading lessons – without any real thought about what they aim to achieve other than “teach some words” ... which isn’t really a realistic aim, is it? Because vocabulary learning isn’t as simple as “doing” a word once and then it’s known. Yet it often gets pushed into a standard PPP model:

Pack some words into a text (whether they’re ever used together or not!)
Pop them into a gap-fill (because that’s what you do with vocab, isn’t it?)
Prod students into using them (because they ‘know’ them now, right?)

In fact, most research into vocab acquisition suggests that learning vocab is a gradual process in which students get to grips with words over a period of time via repeated exposures. That suggests an approach to teaching vocab along the lines below might be more appropriate.
So, in an ideal world, at the first encounter with a word (in context), the focus of any activities would be on comprehension or receptive knowledge, that is, recognizing the form of the word (spelling and pronunciation) and understanding its meaning in the current context ... quite enough for a first meeting. Then, as the same item pops up again and again, the focus shifts from reception and understanding how the word’s used in different contexts to controlled production. As a word becomes more familiar, students should be encouraged to look at how they can use it – what collocations is it used in, what register is it, what are its grammatical features and what patterns does it typically appear in? Then eventually, somewhere down the line, they’ll hopefully be ready to start using it for freer production. How long that process takes will depend on the type of word and also the stage the learner’s at. And crucially, at each stage, the aims of any vocab activities will be quite different.

The Challenge: “We certainly can’t have a student see a word twice!” – as Dorothy Zemach so succinctly highlighted in her IATEFL plenary earlier this year, within the world of ELT publishing, there’s very much a focus on providing a constant stream of fresh, new vocabulary, and repetition of items is actively discouraged – “you can’t have that word, it was covered at the previous level”. To be fair, this isn’t just coming from publishers. Students and teachers naturally want to feel like they’re making progress and in many people’s minds, understandably, that’s about increasing their vocab. They want to see new words to learn as they work their way through a course.

The Work-around: Exposing students to vocabulary doesn’t have to be confined to specific vocabulary sections. As a writer, if you keep a record of newly introduced vocab, you can sneak it in all over the place. A new word might pop up first in an explicit vocab activity, but it can easily be recycled in later units in different sections. If you’ve got a grammar activity to write, it’s relatively easy to look back at target vocab from previous units to include in example sentences. You could even slip some work on collocation into a revision of the present perfect, for example (this is a quickie, made-up exercise just to demo the point):

Complete the gaps using the best verb from the box in the present perfect.

do   give   make 

Jack ______ his homework, but he _____ a lot of mistakes. The teacher isn’t very happy and she _____ him a low mark. …
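The record-keeping part of this work-around lends itself to a simple script. Here’s a minimal sketch in Python – the unit numbers and word lists are entirely invented for illustration:

```python
# Toy tracker for recycling target vocab across units.
# All unit numbers and word lists here are invented for illustration.
introduced = {
    1: ["homework", "mistake", "mark"],
    2: ["deadline", "revise"],
}

def recyclable(current_unit):
    """Target words introduced in any earlier unit."""
    return sorted(
        word
        for unit, words in introduced.items()
        if unit < current_unit
        for word in words
    )

def check_recycling(draft_text, current_unit):
    """Which earlier target words does this draft actually reuse?"""
    tokens = set(draft_text.lower().split())
    return [w for w in recyclable(current_unit) if w in tokens]

print(check_recycling("Jack has done his homework but missed the deadline", 3))
# ['deadline', 'homework']
```

Note that the naive split won’t catch inflected forms (‘mistakes’ wouldn’t match ‘mistake’), so a real version would want to lemmatize the draft text first – but even something this crude makes it easy to see at a glance which earlier target items a new unit could be sneaking in.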

Principle 2: Create reasonable and memorable lexical sets
Following on from the pressure to constantly provide students with lots of juicy new vocab, a lot of ELT materials seem to regularly throw long lists of semantically-similar words at students in the hope that they’ll stick. The trickle-down of research into how useful (or not) it is to teach vocabulary in traditional lexical sets is patchy at best. From my understanding of the research (such as it is), things to avoid in materials include long lists of very similar words introduced as new vocabulary, including near synonyms, easily substitutable items and synforms (words which look very similar). The reason is that students easily get them confused and so find them more difficult to learn – an issue known in the literature as interference (Nation, 2000).

From a writing perspective, that means avoiding confusables in new vocab as much as possible. So, if you’re trying to write an activity and you’re struggling to come up with items that have unambiguous answers because more than one word in the set can fit in a gap, it probably means you have items that are too close together and you might want to consider tweaking your set.

Note, however, that the problem of interference is largely around new vocabulary. Researchers suggest that once words are fairly well-established, then bringing similar words together is actually beneficial. Arguably, at higher levels, it’s essential for students to understand how synonyms overlap and in what ways they differ, so bringing them together at this stage is a necessary part of learning.

The Challenge:  Especially if you’re working on a large, publisher-led project, you’re likely to have a scope and sequence document that includes vocab sets based largely on topics and often those will include sets of worryingly similar words.

The Work-around: Whilst sets of overly similar items can lead to confusion, that doesn’t mean you can’t have thematic sets which include a range of different vocab to talk about a topic. One really simple way to mix up a vocab set is to include different parts of speech. So, for example, if you’re doing an A2 unit on ‘feelings’ you might have a suggested vocab set that contains exclusively adjectives … *heart sinks*:
But of course, we don’t just use adjectives to describe our feelings and there are plenty of other on-level verbs and nouns you could include as well. This might involve bending the brief a little, but it’s quite possible without straying too far from the original plan. And not only does this make the items less easily confused by students, it also encourages more variety of expression – which has got to be a good thing.

Principle 3: Use research tools with a large dose of common sense
There are lots of tools that we use as writers to help us make choices about what vocabulary to include and what to leave out – wordlists (like the AWL), ‘vocab level’ lists (like EVP), text analysis tools (like Text Inspector), corpora, readability measures, etc. They all provide valuable input to complement our own experience and intuition, but they do need to be treated with care. Most importantly, I think it’s essential to fully understand any tool you use, to know what it’s based on, what it tells you and, crucially, what its limitations are.

I won’t go into the ins and outs of wordlists again here, but will refer you back to my previous blog on the topic. A quick further note on text analysis tools though – whilst they provide a really useful guide, they are automated and can’t always be relied on to get it right for every word in your text. Make sure you check for each word whether the tool has chosen the appropriate sense (the full version of Text Inspector allows you to choose the sense from a drop-down), whether it’s got the correct part of speech (a percentage of words in any text will usually be wrong here) and, finally, whether the words are part of a phrase or chunk, as this won’t usually be recognized by the tool and may change the level significantly. When I put a short section of the abstract for my talk through Text Inspector, for example, I’d estimate that roughly 25% of the words were initially labelled incorrectly for level.

The Challenge: Your editor insists that you can’t use a B2 word in a reading text in a B1 book, because they’re sticking rigidly to a level list and, if they’re using EVP, have failed to understand (as many people do) that the level labels signal productive usage by students at that level, which is very different from when a student might first encounter a word receptively.

The Work-around: If you properly understand the list (or other tool) you’re being asked to use, then you’ll be in a much stronger position to argue your case – in this case, explaining that if a word is tagged as B2 for production, it’s perfectly reasonable to introduce it receptively in a B1 reading text. Of course, if you’re just one writer on a large multi-author project that’s already well in motion, then you’re not always going to win your case, but certainly for smaller projects, showing your understanding up-front might help steer things in a more informed direction.

Principle 4: Work beyond the level of the word
Research suggests that somewhere between 30% and 50% of any text is made up of phrases, idioms and other chunks of language (depending on the type of text and how you count). So understanding these chunks is vital for any language learner, as is getting to grips with how to use them themselves. One piece of research (Millar, 2011) found that atypical collocations used by learners slowed readers down significantly and made reading a text overall much harder work. Not something anyone wants when they’re trying to communicate. Yet, most vocab materials still focus on lists of individual words.

Again, students (and teachers) like to see lists of vocabulary and once you start trying to include more than the simplest of phrases, trying to compile a list gets messy. Short phrases – ‘at least’, ‘as a result’ – work okay on lists, but ‘in sb’s safe/capable, etc. hands’ or ‘know better (than that/than to do sth)’ probably won’t fit neatly on a single line and, let’s be honest, do look a bit confusing.

The Challenge: Your brief states that there needs to be a list of key vocab at the start of each unit/section and the vocab list you’re working from contains largely single words.

The Work-around: Just because you’re highlighting individual key words in your headline list doesn’t mean you can’t work phrases, collocations and other chunks of language into your activities. For example, a simple activity in which students have to match sentence halves can work for checking comprehension of the key words (one per sentence), but can also involve students noticing a collocation in the other half of the sentence. This might be explicit – mentioning the collocations in the rubric and even getting students to underline the pairs of words – or if your editor’s not keen, just leaving in the collocation element quietly for students to absorb implicitly.

So those are my four broad principles, some of the challenges I regularly face in trying to stick to them and just a few of the work-arounds I use to argue my case, to bend a brief or, if all else fails, to sneak things in under the radar.

A couple of references:

Martinez, R. & Schmitt, N. (2012) A Phrasal Expressions List, Applied Linguistics 33/3
Millar, N. (2011) The Processing of Malformed Formulaic Language, Applied Linguistics 32/2
Nation, P. (2000) Learning Vocabulary in Lexical Sets: Dangers and Guidelines, TESOL Journal




Thursday, June 07, 2018

Learner corpus research: mixed methods and detective work


In a blog post at the end of last year, one of my resolutions for 2018 was to focus more on the areas that especially interest me and one of those was corpus research. So far, so good. I’ve spent the past couple of months researching Spanish learner errors for a writing project and next week, I’ll be presenting at my first corpus linguistics conference, IVACS in Valletta, Malta. Coincidentally, I’m going to be talking about my work with the Cambridge Learner Corpus (CLC) researching errors by Spanish students to feed into ELT materials for the Spanish market.

Although this will be my first time speaking at a corpus linguistics conference, it’s far from the first time I’ve spoken about my work with the CLC. In fact, my first presentation at a major conference, at IATEFL in Dublin back in 2000, was about the work I’d been doing using the CLC to write common error notes for the Cambridge Learner’s Dictionary.

So what is the Cambridge Learner Corpus?

The CLC is a collection of exam scripts written by learners taking Cambridge exams, including everything from PET and KET, through First, CAE and Proficiency, to IELTS and the Business English exams. It includes data from over 250 000 students from 173 countries. You can either search through the raw data or use a coded version of the corpus in which errors have been classified and corrections suggested, much as they would be by a teacher marking a student essay, allowing you to search for specific error types. So, the example below shows how a verb inflection error would be coded.

We are proud that you have <#IV> choosen | chosen </#IV> our company.

You can also search by CEFR level, by specific exams, by country and by L1.
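The coded format is regular enough to pull apart with a short script. Here’s a minimal sketch in Python – only the <#CODE> wrong | corrected </#CODE> tag shape comes from the example above; the regex and function name are my own:

```python
import re

# Pull apart CLC-style error coding of the form:
#   <#CODE> learner form | suggested correction </#CODE>
# Only the tag shape comes from the example above; the regex and
# function name are my own sketch.
ERROR_TAG = re.compile(r"<#(\w+)>\s*(.*?)\s*\|\s*(.*?)\s*</#\1>")

def extract_errors(coded_script):
    """Return (error_code, learner_form, correction) tuples."""
    return ERROR_TAG.findall(coded_script)

sentence = "We are proud that you have <#IV> choosen | chosen </#IV> our company."
print(extract_errors(sentence))  # [('IV', 'choosen', 'chosen')]
```

Run over a batch of scripts, a function like this would let you tally learner forms and corrections by error code – which is essentially what searching the coded corpus by error type gives you.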

Research for exams:

When I gave my talk in Dublin all those years ago, one of the first questions at the end came from the redoubtable Mario Rinvolucri who was sitting right in the middle of the front row. He was concerned that the corpus didn’t include any spoken data, so wasn’t really representative of student language in general. And he was right. One of the major drawbacks of the CLC is that it only reflects students’ written performance and then, only in an exam writing context. That means it doesn’t pick up issues that are specifically related to spoken language and the data is rather skewed by the topics and genres of exam writing tasks (largely emails and essays).

That does, however, make it perfect for informing exam practice materials. Over the years, I’ve carried out learner corpus research to feed into a whole range of exam materials. This has largely involved searching for the most frequent errors that match different parts of a coursebook syllabus in order to provide the writers with evidence and examples to help them better target particular problem areas. It also led to the Common Mistakes at … series, most of which I was involved in researching and two of which I wrote.


Mixed methods and detective work:

One of the things I enjoy most about working with the learner corpus, even after more than 18 years, is that I’m constantly finding new ways to get the best from the data. It’s easy to start out by searching for the most frequent errors by type – say, the top ten spelling errors or the most commonly confused nouns (trip/travel, job/work, etc.). But the stats don’t tell the whole story.

Firstly, there’s the issue I’ve already mentioned about the skewing of the data by particular exam questions and topics. So, for example, one of the top noun confusion errors amongst Spanish learners in the corpus is to use ‘jail’ instead of ‘cage’; a lot of animals are locked up in jails. It is a legitimate false friend error (cage is jaula in Spanish), but it’s only so prominent in the data because of a classic FCE essay question about keeping animals in zoos. Does it merit highlighting in materials? Probably not compared to other errors that involve more high-frequency nouns and crop up across a range of contexts. There’s a balancing act to achieve between looking at the corpus stats, delving into the source of the errors (Were all the examples of an error prompted by a single exam question?), understanding the likely needs of students (Is it a high frequency word or likely to come up again in an exam?) and understanding what’ll work (and won’t work!) in published materials. I think it’s when it comes to these last two that, as a former teacher and full-time materials writer, I probably have the edge over a purely academic researcher.

Then there’s the tagging of errors to consider. Many learner corpus researchers are wary of error tagging because it can push the data into fixed categories that aren’t always appropriate. Any teacher who’s ever marked a student essay will know that some errors are very straightforward to mark (a spelling error or a wrong choice of preposition, for example), while others are messy. There can be several things going on in a single sentence that contribute to it not working and sometimes it’s difficult to pin down the root cause. Not to mention those chunks of text that are so garbled, you’re just not sure what the student intended and you don’t know how to go about correcting them. That means that while the coders who mark up the CLC data do a fantastic job, there are always instances that are open to interpretation. 

When I find an error that looks worth highlighting within a particular error category, I’ll often do a more general search for the word (or form or chunk) to see whether it crops up elsewhere with different tags. Sometimes, I’ll go back to the untagged data too to see how students are using the word more generally. This can help me pin down issues that stray across error categories. Then, if the error isn’t a straightforward one, I’ll flick over to a native-speaker corpus to check how the word or structure in question is typically used – after looking at too much learner data, you start to question your intuitions! – and check a few reference sources to help me pinpoint exactly where the mismatch lies and try to come up with a clear explanation for the students.

It’s this multi-layered detective work to understand where things are going wrong and figure out the best way to help learners understand and hopefully, overcome language issues that I find so satisfying.

At the IVACS conference, I’ll be talking about delving into the issues specific to Spanish learners at 16.00 on Thursday, 14 June for anyone who’s going to be joining me in Valletta.


Monday, April 23, 2018

IATEFL2018: Vocabulary lists: snog, marry, avoid?


At the recent IATEFL conference in Brighton, I gave a talk as part of the MaWSIG showcase about the way wordlists are used (and misused), especially in writing ELT materials and some of the issues that writers need to be aware of.


Below is an overview of my key points and also links to some of the references and tools I mentioned. I've embedded links in the post, but also repeated them all at the end, so if you came to the talk and just want the links, feel free to scroll down.


What do I mean by a wordlist?
My talk was about the kind of standardized wordlists that have been put together according to some criteria (typically frequency and usefulness for learners) and then published with the aim of being used as a basis for deciding which vocabulary to prioritize in teaching. There are loads of wordlists out there, but I mentioned just a few of the most well-known:
Specialist lists: Academic Word List (AWL), Academic Vocabulary List (AVL), New AWL, discipline specific lists (e.g. for Engineering, Medicine, etc.)
Vocabulary level tools: These approach the task from a slightly different perspective. Instead of providing a limited list of target vocab, they instead classify items from a learner's dictionary according to the level at which learners are most likely to start using/need each item. I'm especially familiar with English Vocabulary Profile, EVP (from Cambridge) and there's also the Global Scale of English, GSE, vocab tool (from Pearson). Both online tools allow you to look up an item and check its suggested level based on the CEFR scale (A1, A2, B1 etc.)


Why are wordlists popular?
Given the huge variety of English vocabulary, it's not surprising that anything that gives teachers and materials writers a starting point and a guide to which items might be most useful to teach first is popular. Wordlists provide a principled basis for planning a vocab syllabus, backing up our intuitions about which words are most frequent and saving us from reinventing the wheel by having to research the frequency of each word as we go along. For publishers, they also help to ensure a consistent approach to vocab across a coursebook series, across different titles or between a group of writers all working on the same project; they provide a single lexical hymn-sheet for everyone to sing from, if you like.

Why you need to understand your list:
Whilst wordlists have an obvious appeal, especially for writers, I think it's really important to understand any list you plan to use before you get started. Understanding how a list was put together, what the aims of the list compilers were, what criteria they used to select items and what data they used is vital. To take the academic wordlist (AWL) as an example:

  • It aims to identify general academic vocab, so it excludes items that only appear in specific disciplines, such as science or medicine, and focuses on words common across a range of disciplines. So if you're teaching ESP/ESAP, you'll need to supplement it with relevant subject-specific vocab.
  • It's based on data from published academic writing, not from student writing. That means it provides a good guide to the vocab students might need to know receptively (i.e. for reading), which might not necessarily be quite the same as what they need productively, for their own writing. See Durrant (2016) for an interesting look at what proportion of an academic wordlist student writers actually need.
  • The AWL excludes items on the GSL based on the premise that EAP students will have already 'learnt' this core general vocabulary. That doesn't, however, take into account any gaps in students' general vocab knowledge, or the fact that many of those general words are absolutely vital for academic writing and are often used in ways that might not be entirely predictable and that students might not have already encountered. That's not necessarily a criticism of the list (you've got to draw the line somewhere), but it does mean that as a writer, you might want to include some of that off-list vocab in your syllabus.

And it's not just the AWL this applies to: all wordlists have their own quirks and limitations, and unless you understand what these are, you're not going to get the best out of the list or understand what gaps you might need to fill. See the links at the bottom of this post for some places you can learn more about different lists.

User beware:


Issue 1: The nature of English
One issue with trying to chivvy words into a nice, neat list is that English is a messy beast and words are slippery little suckers! 

Multiple meanings: English is a highly polysemous language, that is, many words have multiple meanings. For example, a table can be a piece of furniture (very much an elementary word) or it can be a graphic representation of data in rows and columns (definitely a less frequent sense). Most lists don't differentiate between senses, leaving the user to guess which sense is the core one that should be taught and whether they should stretch to other senses or not. Lists such as EVP and GSE do give levels for different senses (so EVP has table=furniture as A1 and table=chart as B1), but if you put your text through a text-checking tool such as Text Inspector or VocabKitchen, it'll show the level for the basic, most frequent sense only. So in the phrase "the data in the table above", table would be highlighted as A1.
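The limitation is easy to see in miniature. Below is a toy text-checker keyed on word forms alone – the mini level list is invented, apart from the two EVP levels for 'table' mentioned above:

```python
# A toy text-checker keyed on word forms alone. The mini level list is
# invented, except table = A1 (furniture) vs B1 (chart), as per EVP above.
levels = {"the": "A1", "data": "B1", "in": "A1", "table": "A1", "above": "A2"}

def label_text(text):
    return [(word, levels.get(word, "off-list")) for word in text.lower().split()]

print(label_text("the data in the table above"))
# ('table', 'A1') -- the form-based lookup has no way to tell that
# the 'chart' sense in play here is B1, not A1.
```

Any checker that maps one form to one level has to behave this way, however big its list – which is exactly why you need to sense-check the output by hand.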

What is a word: Most lists deal in lemmas, that is a single part of speech and its associated inflections (so speak, speaks, spoke, spoken, speaking is one lemma). Some lists, such as the AWL, take the word family as their basic unit, which takes in all the words from a single root, including different parts of speech and prefixes (develop, development, developing, developmental, underdeveloped, etc.). This makes sense in an EAP context where being able to switch between parts of speech is a key skill for student writers, but deciding which members of a word family to focus on also requires a bit of common sense. You might, for example, decide to skip disestablishment as part of the establish word family!

Chunks: Most frequency-based wordlists tend to focus on individual words, simply because even the most common phrases or formulaic expressions (at least, in the first place, etc.) just don't make it in on frequency criteria alone. However, language chunks make up somewhere between 30% and 50% of any text, so they're clearly a really important part of vocabulary learning. This has two implications for writers (and teachers); firstly, you may want to supplement your wordlist with some useful chunks (such as those on the phrasal expressions list or just collocations to go with your key words) and again, you need to take chunks into account if you're using text-checkers – the chunk 'in the first place' will be shown as a sequence of A1 single words rather than being recognized as a fixed expression (ranked as B2 on EVP).
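A chunk-aware checker needs to try multi-word matches before falling back to single words. Here's a minimal sketch of that idea – the word and chunk lists are toy ones again, apart from 'in the first place' at B2, as per EVP:

```python
# A chunk-aware checker: try multi-word matches before single words.
# Toy lists, except 'in the first place' = B2, as per EVP above.
word_levels = {"in": "A1", "the": "A1", "first": "A1", "place": "A1"}
chunk_levels = {("in", "the", "first", "place"): "B2"}
MAX_CHUNK = max(len(chunk) for chunk in chunk_levels)

def label(tokens):
    out, i = [], 0
    while i < len(tokens):
        # Greedily try the longest possible chunk starting at position i.
        for n in range(min(MAX_CHUNK, len(tokens) - i), 1, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in chunk_levels:
                out.append((" ".join(chunk), chunk_levels[chunk]))
                i += n
                break
        else:
            out.append((tokens[i], word_levels.get(tokens[i], "off-list")))
            i += 1
    return out

print(label("in the first place".split()))  # [('in the first place', 'B2')]
```

Longest-match-first is the standard trick here: without it, the checker would happily label the four A1 words and never notice the B2 expression sitting on top of them.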



Issue 2: The nature of language learning
Similarly, language learning is a messy, non-linear sort of process, that isn't as simple as ticking words off a list and declaring them 'learnt'. Wordlists make it all too easy to fall into this trap though. Many's the time I've been told by an editor that I can't include a word in a vocab activity because it's already been 'covered' at a previous level ... and as Dorothy Zemach put it so brilliantly in her plenary "We can't have a student see a word twice!". Most research agrees that vocab learning requires repeated exposures to a word. Of course, I understand where my editors are coming from and there are other ways of recycling vocabulary without having to have the same words pop up as the vocab focus time and again, but it's still an important factor to bear in mind.

There's also the issue of whether a word is going to be most useful for a student at any particular stage for receptive purposes (i.e. we just want them to recognize and understand it when they come across it) or whether we expect them to be able to use it productively. A lot of words will start off in a student's receptive vocab and then gradually shift into their productive repertoire. Some words will get stuck in reception even though we'd like them to move on. And others can quite happily stay as receptive only ... I know plenty of words that I understand but will probably never feel the need to use. Again, understanding whether a list is suggesting words for receptive or productive use at a particular level is vital. So, EVP, for example, aims to describe vocab that students are using productively at certain levels (based in large part on what students are writing in Cambridge exams). So if a word is labelled B1, then B1 students are already confident enough to use it in their exam writing. That means they probably became familiar with it receptively quite some time before. And if I want to include a word in a reading text in a B1 book, as receptive vocab, choosing an item marked as B2 can be entirely appropriate.

Issue 3: The nature of learners
Finally, learners don't form the single homogeneous audience that universal wordlists seem to assume, each with the same number and range of vocab learning gaps to be filled.

L1 plays an important role in vocab learning, with learners from L1s that share a history with English (Romance languages, Greek, Germanic languages) having a head start when it comes to certain words because they're close cognates in their first language. For example, a word like diurnal may seem 'difficult', but if you're an Italian, French, Spanish, Portuguese or Romanian student of animal behaviour, you'll probably recognize it right away, whereas your German-speaking peer will probably have to look it up to find its translation (tagaktiv).

Age, interests, location and language needs will also play a role in exactly which vocab items are relevant to any given student. Yes, they'll probably all find a common core useful, but they'll want words to describe the things that are important or helpful to them and their context too. When I was learning French at school, I wanted to know all the cool teenage slang; nowadays, I'd be more likely to want vocab to describe my garden. Anyone using English in an ESP context is likely to need apparently low-frequency, specialist terms, sometimes quite early on in the language learning process.

Language level makes a difference too. Whilst most linguists agree on a common core of the more frequent couple of thousand words or so which might sustain a learner up to, say, intermediate level, beyond that, frequency statistics become less reliable and less useful. As you start to investigate lower frequency words, the range of similar-frequency items suddenly explodes and exactly which words you choose to teach will inevitably have to be guided more by usefulness for particular groups of learners than by simple frequency, making wordlists a much less reliable guide for higher level learners.
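You can see this flattening-out in miniature by counting word frequencies in any scrap of text (the passage below is made up): a couple of items stand clear at the top, while the bottom is one big band of words all tied on a count of one.

```python
from collections import Counter
import re

# Count word frequencies in a (made-up) scrap of text, then ask how many
# distinct words share each frequency.
passage = ("the cat sat on the mat and the dog sat by the door while a bird "
           "watched a mouse creep under an old wooden bench near a quiet pond")
counts = Counter(re.findall(r"[a-z]+", passage))
band_sizes = Counter(counts.values())  # frequency -> number of words with it

print(band_sizes[4], band_sizes[1])  # 1 word occurs 4 times; 20 occur just once
```

Scale the passage up to a whole corpus and the same shape holds: the top frequencies each pick out a handful of clearly useful words, while thousands of words sit tied in the low-frequency bands, which is exactly why frequency alone stops discriminating between them at higher levels.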

Wordlists: snog, marry, avoid?
So, if wordlists are so flawed, should we be bothering with them at all? Well, personally, I'm not going to be dumping them just yet because they are still undoubtedly an incredibly useful tool. But they're just that, a tool, to be used like any other reference resource we might turn to, as just one part of the mix, with full knowledge of their idiosyncratic quirks, taking into account all the factors I've mentioned here and always applying a solid dose of common sense.

Links:

Wordlists:
These are the most useful links I've found for each list. Most give the background to the list and the list itself.
General Service List (West, 1953)
New GSL (Browne et al., 2013)
Academic Word List (Coxhead, 2000)
Academic Vocabulary List  (Gardner & Davies, 2013)
New AWL (2013)
Phrasal Expressions List (Martinez & Schmitt, 2012)
Phrasal Verbs List (Garnier & Schmitt, 2015) 
Global Scale of English vocab tool (Pearson) - for background to the vocab tool, click on Developing the GSE Vocabulary on the Research & Expertise page 
See also Mura Nava's excellent list of wordlists for many more lists and links, including many of the specialist ESP lists. 

Text analysis tools:
Text Inspector - a paid tool with several analysis options (including EVP and AWL)
VocabKitchen - a free tool with CEFR and AWL options
Lextutor - a free tool with several analysis options, but not the most user-friendly interface  

Other references:

Durrant, P. (2016) To what extent is the Academic Vocabulary List relevant to university student writing? English for Specific Purposes 43
Working with wordlists - a blog post I wrote for the MaWSIG blog a couple of years ago

