#IVACS2018: learner corpus research & ELT materials for Spanish learners
Last week,
I spoke at the IVACS (Inter-Varietal Applied Corpus Studies) conference in
Malta about my work using the Cambridge Learner Corpus (CLC) to help develop
ELT materials targeted at Spanish learners of English. So, following on from my
last post about my work generally using the learner corpus, here's a brief
summary of my talk.
Photo from Naill Curry via Twitter
ELT: a global market
From the
perspective of a large ELT publisher, if they're to invest in producing a major
coursebook series - over several levels each with multiple components - it
makes economic sense to sell it to the widest possible global market. This
one-size-fits-all approach, however, ignores the fact that different learners
have different needs. Just one of the factors that differentiates learners is
the influence of their first language, their L1. It's well established that
friction between a learner's L1 and target language, in this case English, can
result in language transfer issues or interference, a factor not accounted for
in materials for a global audience. In recent years, I've worked on a number of
projects for CUP that have involved localizing materials to target them more
effectively at Spanish learners. More specifically, I've used the CLC to
investigate errors by Spanish learners to feed into English for Spanish
Speakers (ESS) versions of a number of books.
For more
about the CLC see my previous post.
Error types:
When you
start looking at learner data for a specific L1 group, three broad error types
emerge. There are global errors, that is errors that are common across learners
more-or-less regardless of L1. These can be described as developmental or
intralingual errors that are a result of the inherent quirks and irregularities
in English that trip everyone up. Then there are interlingual errors where the
learner's L1 rubs up against English in a way that creates friction and
interference. Some of these are common across a language group, such as errors frequent
among all Romance language speakers learning English, while others are L1
specific, so peculiar to, say, Spanish speakers.
In my
session, I took an example of each error type to show how I went about
investigating the error and then incorporating activities to target the issue
into classroom materials.
Global errors:
One classic
example of a global, developmental error is with irregular verbs. Below is a
list of the most common past simple/past participle verb inflection errors
across the whole learner corpus. As you'd expect, there are some irregular
verbs (pay, choose, rise, hear) and others where the spelling rules around
whether or not to double the final consonant cause difficulties.
1 occured;
2 happend; 3 payed; 4 choosen; 5 prefered; 6 planed; 7 rised; 8 developped; 9
heared; 10 stoped
If we then
look at the top tens for Spanish and French speakers for comparison, we see a
lot of overlap.
Spanish: 1
choosen; 2 prefered; 3 payed; 4 teached; 5 refered; 6 planed; 7 occured; 8
heart; 9 writen; 10 tryed
French: 1
developped; 2 mentionned; 3 occured; 4 prefered; 5 choosen; 6 planed; 7 rised;
8 red; 9 enroled; 10 stoped
There are a
few interesting differences though. The Spanish use of 'heart' as the past form
of 'hear' doesn't seem to follow the pattern you'd expect - as with 'heared' in
the global list. This can be put down to an issue of pronunciation; Spanish
speakers tend not to pronounce voiced consonants at the end of words, so that a
/d/ sound often becomes a /t/ (or is sometimes lost altogether) and this seems to
spill over into the spelling. In the French list, we see the extra double
letters in 'developped' and 'mentionned', this time because both verbs have
French cognates (développer and mentionner) that are spelt with a double
consonant, which then creeps into the English. So whilst all learners need help
and reminders about similar verb inflections, there are local factors that
might come into play too.
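For anyone who wants to check the overlap between the three top-ten lists above for themselves, here's a minimal Python sketch. The lists are copied straight from this post; the function names (`shared`, `unique_to`) are just my own labels for illustration and it isn't the tooling used with the CLC itself.

```python
# Top-ten verb inflection errors, copied from the lists in this post.
GLOBAL = ["occured", "happend", "payed", "choosen", "prefered",
          "planed", "rised", "developped", "heared", "stoped"]
SPANISH = ["choosen", "prefered", "payed", "teached", "refered",
           "planed", "occured", "heart", "writen", "tryed"]
FRENCH = ["developped", "mentionned", "occured", "prefered", "choosen",
          "planed", "rised", "red", "enroled", "stoped"]


def shared(a, b):
    """Return the items of list a that also appear in list b, in a's order."""
    in_b = set(b)
    return [word for word in a if word in in_b]


def unique_to(target, *others):
    """Return the items of target that appear in none of the other lists."""
    seen = set().union(*others)
    return [word for word in target if word not in seen]


print("Spanish and global:", shared(SPANISH, GLOBAL))
print("French and global: ", shared(FRENCH, GLOBAL))
print("Spanish only:      ", unique_to(SPANISH, GLOBAL, FRENCH))
print("French only:       ", unique_to(FRENCH, GLOBAL, SPANISH))
```

On these lists, five of the Spanish top ten and seven of the French top ten also appear in the global list, and the forms left over for each L1 ('heart' for Spanish, 'mentionned' and 'red' for French, among others) are the ones worth a closer look for local factors.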
Language group errors and the issue of 'below-level' mistakes:
The error I
looked at here is around students adding an unnecessary 's' inflection onto
adjectives to agree with a plural noun, so "differents reasons",
"two news friends", "interestings questions", etc. Of
course, many languages have adjective inflections that agree with the noun they
modify for number and these kinds of errors are particularly simple to search
for using the coded version of the corpus (where errors are tagged by type). Interestingly
though, the corpus data suggests that this particular error is especially
prevalent amongst Romance language speakers (Spanish, French, Italian,
Portuguese).
What's
perhaps more interesting here from a materials writer's perspective is that these
errors crop up across levels, with examples right up to proficiency level in
the data, even though students will likely learn the basic rules about
adjectives in English in their beginner class. So these aren't 'errors' in the
strict sense, because the learners clearly know the rules around adjectives in
English. Instead, they're mistakes, inadvertent slips. Looking at learner corpus data
reveals a lot of these and it shows that the pattern of these mistakes can
often be described as something of a bell curve, whereby learners make few
errors when they first learn a new language form, partly just because they're
cautious and don't use it very much. Then as they progress, they start to make
a lot more mistakes with the forms they learnt at previous levels as they experiment
and become more adventurous. You could say that they take their eye off the
ball with adjectives by B1 or B2 because they're more concerned about complex
sentence constructions and whether or not to use a past perfect simple verb
form, for example. Then eventually the mistakes start to tail off as learners
become more proficient, their language skills more automatic and they have the
cognitive capacity to tidy up.
This
presents a problem for me as a corpus researcher trying to feed into classroom
materials. On the one hand, the data is telling me that these mistakes are
significant at mid-levels and probably worth highlighting, but how do I
convince editors, teachers and students that they need to focus on simple
adjective forms at A2 or even B1 level without the materials seeming 'dumbed
down' and 'below level'? The approach I took in one book illustrated below
(Empower, A2, CUP 2016) was to:
- Make it clear that this is revision. The note starts with the word 'remember' to acknowledge that students probably already know this and the explanation is short and simple - they don't need the 'rules' explained in detail all over again.
- Combine several errors around adjectives. An activity just practising adjectives with singular and plural nouns would be pretty pointless at this level. Once the issue had been highlighted, students would find any follow-up activity mechanical and wouldn't engage with the point. By combining a number of issues, there's more to think about and you up the challenge. And a proof-reading activity of this kind is an authentic task type mirroring what students need to do with their own writing to reduce the number of mistakes that slip through.
Going beyond error codes:
The third
point in the box above is also worth a bit more attention from a research
perspective. The first two errors here jump out of the coded data (they're
tagged as adjective inflection and word order errors), but the issue with the
word 'colour' was less obvious. As I was looking through adjective examples, I
started to notice various instances of awkward phrasing which had been tackled
in the coded data in different ways.
I bought it
in <#MD> | a </#MD> green colour . (KET, A2)
It's blue
and white <#MT> | in </#MT> colour . (KET, A2)
It only
cost 20€ and <#DD> it | its </#DD> colours are red and
black. (KET, A2)
<#UP>
It's | Its </#UP> colour <#MV> | is </#MV> black.
(KET, A2)
I like it
because <#MA> | it </#MA> is very small and <#MA> | it
</#MA> is <#UN> colour | </#UN> black. (KET, A2)
Anyone
who's ever marked student writing will know that there's more than one way to
go about trying to correct an oddly-worded sentence and the suggestions in the
coding above are all legitimate, but somehow didn't quite ring true to me. It
struck me that in each case, the best solution would actually just be to drop
the word 'colour' altogether. You might have noticed that all the examples are
very similar and they were all indeed in response to the same question which
asked students to describe a new mobile phone, including what colour it was.
Hmm, so was this just a case of the wording of the question skewing the data?
Was it just task effect? It prompted me to search more widely and I found that
although I had a lot of examples from this one question, the same issue was
cropping up at other levels amongst the Spanish learner data in response to
completely different tasks. And from what I understand (I should confess at
this point that I'm not a Spanish speaker!), it’s possible to say something
along the lines of "a dress of colour blue" in Spanish. It's not a
really major error, but it's a high-frequency word and I think the point fits
nicely here and, hopefully, gives students (and teachers) pause for thought
over something they may not have considered before.
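To make the coding format in the examples above a bit more concrete, here's a rough Python sketch of how a line in the `<#CODE> learner text | correction </#CODE>` shape could be pulled apart. It's based only on the handful of examples shown in this post, so it assumes tags are never nested and always contain exactly one `|`; the function names are hypothetical and it isn't the software actually used to query the CLC.

```python
import re

# One coded span: <#CODE> learner text | suggested correction </#CODE>
# Either side of the "|" may be empty (something missing or something to drop).
TAG = re.compile(r"<#(\w+)>\s*(.*?)\s*\|\s*(.*?)\s*</#\1>")


def parse_errors(line):
    """Return a list of (code, learner_text, correction) for each tag."""
    return [(m.group(1), m.group(2), m.group(3)) for m in TAG.finditer(line)]


def _tidy(text):
    """Collapse repeated spaces and remove the space before punctuation."""
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"\s+([.,!?])", r"\1", text)


def learner_version(line):
    """Rebuild what the learner originally wrote."""
    return _tidy(TAG.sub(lambda m: m.group(2), line))


def corrected_version(line):
    """Rebuild the sentence with the coder's corrections applied."""
    return _tidy(TAG.sub(lambda m: m.group(3), line))


example = ("I like it because <#MA> | it </#MA> is very small and "
           "<#MA> | it </#MA> is <#UN> colour | </#UN> black.")

print(parse_errors(example))
print(learner_version(example))
print(corrected_version(example))
```

Rebuilding the learner's version alongside the corrected one makes it easier to read examples in bulk, which is when patterns that cut across the error codes, like the 'colour' issue, start to show up.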
L1-specific errors and classic false friends:
Finally,
some of the most satisfying errors are the ones you track down which are
clearly examples of L1 interference. And perhaps the most fun are the simple
'false friends': the English words which seem to be a near equivalent to
something in Spanish, but turn out to mean something different. I note these
down as I work through the learner data, then try to collect them together into
thematic sets which I can tie in with the coursebook syllabus. Below are a few
around the theme of 'information' that I was looking at recently for some B2
material, shown along with the Spanish 'false friend' in brackets.
I am writing to you to reply to your <#RN> announcement |
advertisement </#RN> in the newspaper. (anuncio)
It is really complicated to talk about a <#RN> theme | subject
</#RN> as controversial as the cruelty of keeping animals in zoos. (tema)
What <#UD> a | </#UD> great <#RN> notice | news
</#RN>! (noticia)
We would like to know if you will be able to come, and give a <#RN>
conference | talk </#RN>. (conferencia)
In some of
these, the meaning of the Spanish word simply doesn't match its English near
equivalent - although they're often in
the same semantic ballpark - announcement/advert, notice/news. Others are more
about range of usage. So, the Spanish 'tema' seems more widely applicable than
the English word 'theme' and gets used by students where 'subject' or 'topic'
would fit better in English. And 'conferencia' in Spanish can describe both a
conference and an individual talk or lecture. Activities for these are about
raising students' awareness, drawing attention to the differences and where
relevant, provoking some discussion.
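The noting-down step could be partly automated. As a hedged illustration only, the Python sketch below collects the word pairs from spans carrying the `RN` code that appears in the four examples above and counts how often each pair occurs. The sentences are the ones quoted in this post; the `replacement_pairs` function is a made-up name, and deciding which pairs are genuine false friends and grouping them into themes would still be a manual job.

```python
import re
from collections import Counter

# Matches only spans coded RN, as in the four examples quoted in this post.
RN_TAG = re.compile(r"<#RN>\s*(.*?)\s*\|\s*(.*?)\s*</#RN>")

SENTENCES = [
    "I am writing to you to reply to your <#RN> announcement | "
    "advertisement </#RN> in the newspaper.",
    "It is really complicated to talk about a <#RN> theme | subject </#RN> "
    "as controversial as the cruelty of keeping animals in zoos.",
    "What <#UD> a | </#UD> great <#RN> notice | news </#RN>!",
    "We would like to know if you will be able to come, and give a "
    "<#RN> conference | talk </#RN>.",
]


def replacement_pairs(lines):
    """Count each (learner word, suggested replacement) pair in RN spans."""
    counts = Counter()
    for line in lines:
        for wrong, right in RN_TAG.findall(line):
            counts[(wrong, right)] += 1
    return counts


for (wrong, right), n in replacement_pairs(SENTENCES).most_common():
    print(f"{wrong} -> {right}: {n}")
```

With only four sentences every pair occurs once, of course. Run over a larger set of coded lines, the most frequent pairs would give a shortlist to check against the Spanish.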
As a side
note here, when looking for example sentences for practice activities, although
the learner corpus is great for getting a feel for level, you have to be
careful not to transfer subtly awkward phrasing and atypical constructions,
‘learnerese’ if you like, into materials. Especially at higher levels and with
subtle differences, such as the theme/subject distinction, I’ll often have a
browse through native-speaker (NS) corpus data for example sentences. That way I’m ensuring
learners have an authentic model and it’s also good to up the level of the
language just a little to provide a sense of challenge and progress even when
essentially revising.
Research into practice:
Hopefully,
this handful of examples gives a taste of the work I've been doing, the way I
make use of the learner corpus, both by using the error tags and going beyond
the tags to explore less obvious errors. I've also tried to show just some of
the issues that emerge in trying to translate the results of that analysis into
materials that fit in with the coursebook syllabus, that focus on significant,
but apparently below-level mistakes in a way that's appropriately challenging
and engaging, and that draws learners' attention to language points that are
especially relevant to them rather than just part of a generic global syllabus.
Labels: Cambridge learner corpus, corpus research, IVACS, Malta, materials writing