Lexicoblog: #IVACS2018: learner corpus research & ELT materials for Spanish learners

Last week, I spoke at the IVACS (Inter-Varietal Applied Corpus Studies) conference in Malta about my work using the Cambridge Learner Corpus (CLC) to help develop ELT materials targeted at Spanish learners of English. So, following on from my last post about my work with the learner corpus more generally, here's a brief summary of my talk.

Photo from Niall Curry via Twitter

ELT: a global market

From the perspective of a large ELT publisher investing in a major coursebook series, with several levels each with multiple components, it makes economic sense to sell it to the widest possible global market. This one-size-fits-all approach, however, ignores the fact that different learners have different needs. One of the factors that differentiates learners is the influence of their first language, or L1. It's well established that friction between a learner's L1 and the target language, in this case English, can result in language transfer issues or interference, a factor that materials written for a global audience don't account for. In recent years, I've worked on a number of projects for CUP (Cambridge University Press) that have involved localizing materials to target them more effectively at Spanish learners. More specifically, I've used the CLC to investigate errors by Spanish learners to feed into English for Spanish Speakers (ESS) versions of a number of books.

For more about the CLC see my previous post.

Error types:

When you start looking at learner data for a specific L1 group, three broad error types emerge. First, there are global errors, that is, errors that are common across learners more or less regardless of L1. These can be described as developmental or intralingual errors: they result from the inherent quirks and irregularities of English that trip everyone up. Then there are interlingual errors, where the learner's L1 rubs up against English in a way that creates friction and interference. Some of these are common across a language group, such as errors frequent among all Romance language speakers learning English, while others are L1-specific, peculiar to, say, Spanish speakers.

In my session, I took an example of each error type to show how I went about investigating the error and then incorporating activities to target the issue into classroom materials.

Global errors:

One classic example of a global, developmental error involves past simple and past participle verb forms. Below is a list of the most common past simple/past participle verb inflection errors across the whole learner corpus. As you'd expect, some involve irregular verbs (pay, choose, rise, hear), while in most of the others the spelling rules around whether or not to double the final consonant cause difficulties.

1 occured; 2 happend; 3 payed; 4 choosen; 5 prefered; 6 planed; 7 rised; 8 developped; 9 heared; 10 stoped

If we then look at the top ten lists for Spanish and French speakers for comparison, we see a lot of overlap.

Spanish: 1 choosen; 2 prefered; 3 payed; 4 teached; 5 refered; 6 planed; 7 occured; 8 heart; 9 writen; 10 tryed

French: 1 developped; 2 mentionned; 3 occured; 4 prefered; 5 choosen; 6 planed; 7 rised; 8 red; 9 enroled; 10 stoped

There are a few interesting differences, though. The Spanish use of 'heart' as the past form of 'hear' doesn't follow the pattern you'd expect, seen in 'heared' in the global list, of simply adding -ed to the irregular verb. This can be put down to pronunciation: Spanish speakers tend not to pronounce voiced consonants at the end of words, so a /d/ sound often becomes a /t/ (or is sometimes lost altogether), and this seems to spill over into the spelling. In the French list, we see extra double letters in 'developped' and 'mentionned', this time because these verbs have French cognates (développer and mentionner) spelt with a double consonant that then creeps into the English. So whilst all learners need help and reminders about similar verb inflections, local factors may come into play too.
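As a rough way to put a number on "a lot of overlap", the three top ten lists can be compared with simple set logic. The Python sketch below copies the lists from above; the helper names (`overlap`, `only_in`) are hypothetical illustrations, not part of any CLC tool. It only compares membership of each top ten, not error frequencies, so treat it as a quick sanity check rather than an analysis.

```python
# Top-ten past-form errors as listed in this post (global, Spanish, French).
GLOBAL = ["occured", "happend", "payed", "choosen", "prefered",
          "planed", "rised", "developped", "heared", "stoped"]
SPANISH = ["choosen", "prefered", "payed", "teached", "refered",
           "planed", "occured", "heart", "writen", "tryed"]
FRENCH = ["developped", "mentionned", "occured", "prefered", "choosen",
          "planed", "rised", "red", "enroled", "stoped"]


def overlap(a, b):
    """Return the error forms that appear in both lists, in the order of list a."""
    b_set = set(b)
    return [form for form in a if form in b_set]


def only_in(a, *others):
    """Return the forms in list a that appear in none of the other lists."""
    seen = set().union(*others)
    return [form for form in a if form not in seen]


if __name__ == "__main__":
    print("Spanish & global:", overlap(SPANISH, GLOBAL))
    print("French & global:", overlap(FRENCH, GLOBAL))
    print("Spanish & French:", overlap(SPANISH, FRENCH))
    print("Spanish only:", only_in(SPANISH, GLOBAL, FRENCH))
    print("French only:", only_in(FRENCH, GLOBAL, SPANISH))
```

Run as-is, it shows five of the Spanish top ten and seven of the French top ten also appearing in the global list. The leftovers (teached, refered, heart, writen, tryed for Spanish; mentionned, red, enroled for French) are a natural place to start looking for L1 influence, as with 'heart' and 'mentionned' above.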

Language group errors and the issue of 'below-level' mistakes:

The error I looked at here involves students adding an unnecessary 's' inflection to adjectives to agree with a plural noun, as in "differents reasons", "two news friends" or "interestings questions". Of course, many languages have adjective inflections that agree in number with the noun they modify. Errors of this kind are also particularly simple to search for using the coded version of the corpus, where errors are tagged by type. Interestingly, though, the corpus data suggests that this particular error is especially prevalent amongst Romance language speakers (Spanish, French, Italian, Portuguese).

What's perhaps more interesting from a materials writer's perspective is that these errors crop up across levels, with examples right up to proficiency level in the data, even though students will likely learn the basic rules about adjectives in English in their beginner class. So these aren't really 'errors' in the strict sense, because the learners clearly know the rules around adjectives in English. Instead, they're mistakes: inadvertent slips. Learner corpus data reveals a lot of these, and the pattern of such mistakes can often be described as something of a bell curve. Learners make few mistakes when they first learn a new language form, partly just because they're cautious and don't use it very much. Then, as they progress, they start to make many more mistakes with forms they learnt at previous levels as they experiment and become more adventurous. You could say that by B1 or B2 they take their eye off the ball with adjectives because they're more concerned with complex sentence constructions or whether to use the past perfect simple, for example. Eventually, the mistakes start to tail off as learners become more proficient, their language becomes more automatic and they have the cognitive capacity to tidy up.

This presents a problem for me as a corpus researcher trying to feed into classroom materials. The data tells me that these mistakes are significant at mid-levels and probably worth highlighting, but how do I convince editors, teachers and students that they need to focus on simple adjective forms at A2 or even B1 level without the materials seeming 'dumbed down' and 'below level'? The approach I took in one book, illustrated below (Empower, A2, CUP 2016), was to:

Make it clear that this is revision. The note starts with the word 'remember' to acknowledge that students probably already know this, and the explanation is short and simple: they don't need the 'rules' explained in detail all over again.
Combine several errors around adjectives. An activity that just practised adjectives with singular and plural nouns would be pretty pointless at this level. Once the issue had been highlighted, students would find any follow-up activity mechanical and wouldn't engage with the point. Combining a number of issues gives them more to think about and ups the challenge. A proofreading activity of this kind is also an authentic task, mirroring what students need to do with their own writing to reduce the number of mistakes that slip through.

Going beyond error codes:

The third point in the box above is also worth a bit more attention from a research perspective. The first two errors here jump out of the coded data (they're tagged as adjective inflection and word order errors), but the issue with the word 'colour' was less obvious. As I was looking through adjective examples, I started to notice various instances of awkward phrasing which had been tackled in the coded data in different ways.

I bought it in <#MD> | a </#MD> green colour . (KET, A2)

It's blue and white <#MT> | in </#MT> colour . (KET, A2)

It only cost 20€ and <#DD> it | its </#DD> colours are red and black. (KET, A2)

<#UP> It's | Its </#UP> colour <#MV> | is </#MV> black. (KET, A2)

I like it because <#MA> | it </#MA> is very small and <#MA> | it </#MA> is <#UN> colour | </#UN> black. (KET, A2)
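The coded examples above can also be handled mechanically. The Python sketch below assumes the inline format visible in these sentences, `<#CODE> learner's form | correction </#CODE>`, with either side possibly empty; it does not try to interpret what the individual codes mean, and it does not handle nested tags. The function names are hypothetical. It pulls out each tagged error, rebuilds both the learner's sentence and the corrected version, and then runs a plain word search on the learner's own text, which is one way to catch an issue like 'colour' that the coding handles inconsistently.

```python
import re
from collections import Counter

# ASSUMPTION: inline error format as it appears in the examples in this post:
#   <#CODE> learner's form | suggested correction </#CODE>
# Either side may be empty (a missing or an unnecessary word).
# Nested tags are not handled in this sketch.
ERROR_TAG = re.compile(r"<#(\w+)>\s*(.*?)\s*\|\s*(.*?)\s*</#\1>")

EXAMPLES = [
    "I bought it in <#MD> | a </#MD> green colour .",
    "It's blue and white <#MT> | in </#MT> colour .",
    "It only cost 20€ and <#DD> it | its </#DD> colours are red and black.",
    "<#UP> It's | Its </#UP> colour <#MV> | is </#MV> black.",
    "I like it because <#MA> | it </#MA> is very small and "
    "<#MA> | it </#MA> is <#UN> colour | </#UN> black.",
]


def extract_errors(sentence):
    """Return (code, learner_form, correction) triples for each tagged error."""
    return [m.groups() for m in ERROR_TAG.finditer(sentence)]


def _rebuild(sentence, group):
    # Replace every tag with one side of it, then collapse doubled spaces.
    text = ERROR_TAG.sub(lambda m: m.group(group), sentence)
    return " ".join(text.split())


def to_learner_text(sentence):
    """The sentence as the learner wrote it."""
    return _rebuild(sentence, 2)


def to_corrected_text(sentence):
    """The sentence with the coder's suggested corrections applied."""
    return _rebuild(sentence, 3)


def mentions(sentence, pattern):
    """Plain word search on the learner's own text, ignoring the error codes."""
    return re.search(pattern, to_learner_text(sentence)) is not None


if __name__ == "__main__":
    codes = Counter(code for s in EXAMPLES for code, _, _ in extract_errors(s))
    print("Codes used:", dict(codes))
    colour_hits = [s for s in EXAMPLES if mentions(s, r"\bcolours?\b")]
    print(f"{len(colour_hits)} of {len(EXAMPLES)} sentences mention 'colour'")
    for s in colour_hits:
        print("  learner:  ", to_learner_text(s))
        print("  corrected:", to_corrected_text(s))
```

On these five sentences the tags are spread across seven different codes, yet the word search on the learner text matches all five, which is exactly the kind of pattern that looking at the codes alone would hide.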

Anyone who's ever marked student writing will know that there's more than one way to correct an oddly worded sentence. The suggestions in the coding above are all legitimate, but somehow they didn't quite ring true to me. It struck me that in each case, the best solution would actually be to drop the word 'colour' altogether. You might have noticed that all the examples are very similar, and they were indeed all written in response to the same question, which asked students to describe a new mobile phone, including what colour it was. Hmm, so was this just a case of the wording of the question skewing the data? Was it just task effect? That prompted me to search more widely, and I found that although I had a lot of examples from this one question, the same issue was cropping up at other levels amongst the Spanish learner data in response to completely different tasks. And from what I understand (I should confess at this point that I'm not a Spanish speaker!), it's possible to say something along the lines of "a dress of colour blue" in Spanish. It's not a major error, but 'colour' is a high-frequency word, and I think the point fits nicely here and, hopefully, gives students (and teachers) pause for thought over something they may not have considered before.

L1-specific errors and classic false friends:

Finally, some of the most satisfying errors are the ones you track down which are clearly examples of L1 interference. And perhaps the most fun are the simple 'false friends'; the English words which seem to be a near equivalent to something in Spanish, but turn out to mean something different. I note these down as I work through the learner data, then try to collect them together into thematic sets which I can tie in with the coursebook syllabus. Below are a few around the theme of 'information' that I was looking at recently for some B2 material, shown along with the Spanish 'false friend' in brackets.

I am writing to you to reply to your <#RN> announcement | advertisement </#RN> in the newspaper. (anuncio)

It is really complicated to talk about a <#RN> theme | subject </#RN> as controversial as the cruelty of keeping animals in zoos. (tema)

What <#UD> a | </#UD> great <#RN> notice | news </#RN>! (noticia)

We would like to know if you will be able to come, and give a <#RN> conference | talk </#RN>. (conferencia)

In some of these, the meaning of the Spanish word simply doesn't match its English near equivalent, although the two are often in the same semantic ballpark (announcement/advertisement, notice/news). Others are more about range of usage. So the Spanish 'tema' seems more widely applicable than the English word 'theme' and gets used by students where 'subject' or 'topic' would fit better in English. And 'conferencia' in Spanish can describe both a conference and an individual talk or lecture. Activities for these are about raising students' awareness, drawing attention to the differences and, where relevant, provoking some discussion.

As a side note, when looking for example sentences for practice activities, although the learner corpus is great for getting a feel for level, you have to be careful not to transfer subtly awkward phrasing and atypical constructions ('learnerese', if you like) into materials. Especially at higher levels and with subtle differences, such as the theme/subject distinction, I'll often browse native-speaker (NS) corpus data for example sentences. That way, learners get an authentic model, and it's also good to raise the level of the language just a little to provide a sense of challenge and progress, even when essentially revising.

Research into practice:

Hopefully, this handful of examples gives a taste of the work I've been doing and the way I use the learner corpus, both through the error tags and by going beyond them to explore less obvious errors. I've also tried to show some of the issues that emerge when translating the results of that analysis into materials that fit in with the coursebook syllabus, that focus on significant but apparently below-level mistakes in a way that's appropriately challenging and engaging, and that draw learners' attention to language points especially relevant to them rather than just part of a generic global syllabus.

Labels: Cambridge learner corpus, corpus research, IVACS, Malta, materials writing


Wednesday, June 20, 2018
