Lexicoblog

The occasional ramblings of a freelance lexicographer

Wednesday, June 27, 2018

#IVACS2018: new corpora and new collaborations


I came away from the IVACS conference in Malta a couple of weeks ago with my head buzzing with ideas which still haven't quite settled themselves down into a coherent order for a blog post, but here goes anyway ...

Lots of corpora:
Almost every speaker seemed to start off by describing the new corpus they'd compiled for their research. To the outsider, this might seem slightly pointless, constantly reinventing the wheel ... why not just use an existing corpus? I guess the answer is that the English language isn't a single homogeneous whole and if you want to study a particular area of usage or group of users, you do need to look at relevant, representative data. A lot of the corpora I heard about were fascinating, but a couple I took note of as being especially useful for ELT - and crucially, available or almost available to use - were ...

EFCAMDAT: This is a corpus of learner writing collected from the language school chain EF’s online learning platform, based on responses to a variety of tasks at a range of levels. I had come across it before and tried to register to use it but never got the confirmation email through. (Tried again this week, but I seem to have got stuck at the same point, sadly). I think it's potentially a useful addition to the CLC because of its wider range of task types. Ute Römer talked about her research into verb patterns using EFCAMDAT and it looked like it was bringing up a nice range of examples. I’ll keep trying to register …

Growth in Grammar: Phil Durrant described his work compiling and analysing a corpus of children's language. The project is aimed at understanding how (native speaker) UK school children's language develops between the ages of 5 and 16, and the corpus consists of writing kids have done in school: not tasks set by the researchers, just their day-to-day schoolwork. The corpus is due to be made publicly available in the near future and it struck me as a potentially really useful resource for writers of ELT materials for young learners. A number of times lately, I've been asked about using corpora to inform YL materials and I've drawn a bit of a blank because most existing corpora consist firmly of grown-up data, which just doesn't seem a relevant point of comparison. When you're asked to look at the authenticity of the language in a story about a talking octopus, the BNC really doesn't offer very much!! So I'll be keeping an eye open for when Growth in Grammar becomes available for all my YL ELT writer colleagues.

Research into practice:
A number of speakers were talking about the potential implications of their research for teachers and learners. Again, loads of interesting stuff, but I just want to highlight a couple of things that have stuck with me ...

Native/non-native teachers, teacher talk and ELF: Perhaps my favourite session of the whole conference was Eric Nicaise's brilliant paper on teacher talk. He compiled a corpus of teacher talk in the ELT classroom by native and non-native speaker teachers (working in Belgium) to compare their use of language. It revealed some fascinating differences which Eric was keen to stress weren't about making any kind of judgment, but about raising both groups of teachers' awareness of the type of language they use, which might be providing a model for their students. Just two of the findings he had time to present were around the use of modals and phrasal verbs. He found that the NS teachers used a much wider range of modals in the classroom and used them for a wider range of functions. So, for example, while NS teachers would say things like “Would you like to … (turn to page …)?” and “Shall we … (look at …)?”, NNS teachers would often go for much more direct imperatives: “Turn to …” and “Repeat …” (examples from memory). And perhaps unsurprisingly, when it came to phrasal verbs, the NS teachers again used far more, scattered throughout their speech, whereas the NNS teachers used fewer, mainly when they were consciously teaching and exemplifying phrasal verbs.

For me, this set my head whizzing with all kinds of ideas. Eric mentioned the idea of language that's perhaps more suitable for teacher talk at lower levels, where NS teachers might want to consciously use simpler forms, and it also struck me that avoiding the trickier forms might apply to learners in ELF contexts ... why use confusing phrasal verbs if you're a French L1 speaker talking to an Italian and you'll both find Latinate verbs much easier?! Then from the opposite perspective, if learners are underexposed to a full range of modals, for example, they risk coming across to native speakers they communicate with (and expert NNSs too) as overly direct and even rude. This has implications for NNS teachers, who might want to try and model more of these forms in their teacher talk (TT), especially at higher levels, and for learners, who need to be aware of the effect their language choices can have on the impression they create.

Collaboration between researchers and materials writers: One of the key points I tried to make in my own session was about how we can feed insights from corpus research into classroom materials and this was a theme that cropped up again and again in other talks with various researchers talking about the importance of translating their research into practical applications. For some, with very specific research objectives, that meant applying it directly to their own teaching context, but several also expressed the need for more collaboration between researchers and mainstream materials writers. In the Q&A at the end of Ute Römer's great plenary, I ventured to suggest a direct collaboration between corpus researchers and MaWSIG (the IATEFL materials writing group) ... a suggestion that seemed to be received enthusiastically. 

Conference selfie via @uroemer

So often it seems to me that freelance ELT writers would like to keep up with the latest research but they come up against various barriers (a point picked up too by Clare Maas at the recent MaWSIG/Oxford Brookes event). Most academic research is behind paywalls, which makes it difficult to access if you're not attached to a university. I couldn't, for example, access the papers on which some of the talks mentioned above were based in order to recheck details. Of course, there are ways around this if you know what you're particularly interested in (I could have emailed the researchers directly), but without access to the kind of search facilities found in university libraries, how do you know what to look for? Finding out about relevant research becomes rather pot luck, down to happening across things shared on social media, reviewed in open access sources (such as ELT Research Bites) or mentioned at conferences, and for each person it's usually restricted to the highlights in their own specialist area. And then there's time ... as a freelancer, time is money and most of us simply can't afford to spend lots of unpaid hours ploughing through long academic articles in search of relevant insights that we might be able to make use of when we have writing deadlines to meet. It seems to me that more direct contact between researchers and writers would be really helpful in sharing knowledge in a way that's friendly and accessible. Hearing about research face-to-face, with the option to chat and ask questions, to clarify any unfamiliar academic terminology, to share ideas on applicability and to tease out any important caveats, would be really valuable. I don't know quite what form this might take or who we could invite, but I'll definitely be talking to the MaWSIG crew about it as a possibility.


Wednesday, June 20, 2018

#IVACS2018: learner corpus research & ELT materials for Spanish learners


Last week, I spoke at the IVACS (Inter-Varietal Applied Corpus Studies) conference in Malta about my work using the Cambridge Learner Corpus (CLC) to help develop ELT materials targeted at Spanish learners of English. So, following on from my last post about my work generally using the learner corpus, here's a brief summary of my talk.

Photo from Niall Curry via Twitter
ELT: a global market

From the perspective of a large ELT publisher, if they're to invest in producing a major coursebook series (over several levels, each with multiple components), it makes economic sense to sell it to the widest possible global market. This one-size-fits-all approach, however, ignores the fact that different learners have different needs. Just one of the factors that differentiates learners is the influence of their first language, their L1. It's well established that friction between a learner's L1 and the target language, in this case English, can result in language transfer issues or interference, a factor not accounted for in materials for a global audience. In recent years, I've worked on a number of projects for CUP that have involved localizing materials to target them more effectively at Spanish learners. More specifically, I've used the CLC to investigate errors by Spanish learners to feed into English for Spanish Speakers (ESS) versions of a number of books.

For more about the CLC see my previous post.

Error types:
When you start looking at learner data for a specific L1 group, three broad error types emerge. There are global errors, that is, errors that are common across learners more or less regardless of L1. These can be described as developmental or intralingual errors that result from the inherent quirks and irregularities in English that trip everyone up. Then there are interlingual errors, where the learner's L1 rubs up against English in a way that creates friction and interference. Some of these are common across a language group, such as errors frequent among all Romance language speakers learning English, while others are L1-specific, so peculiar to, say, Spanish speakers.

In my session, I took an example of each error type to show how I went about investigating the error and then incorporating activities to target the issue into classroom materials.

Global errors:
One classic example of a global, developmental error is with irregular verbs. Below is a list of the most common past simple/past participle verb inflection errors across the whole learner corpus. As you'd expect, there are some irregular verbs (pay, choose, rise, hear) and others where the spelling rules around whether or not to double the final consonant cause difficulties.

1 occured; 2 happend; 3 payed; 4 choosen; 5 prefered; 6 planed; 7 rised; 8 developped; 9 heared; 10 stoped

If we then look at the top tens for Spanish and French speakers for comparison, we see a lot of overlap.

Spanish: 1 choosen; 2 prefered; 3 payed; 4 teached; 5 refered; 6 planed; 7 occured; 8 heart; 9 writen; 10 tryed


French: 1 developped; 2 mentionned; 3 occured; 4 prefered; 5 choosen; 6 planed; 7 rised; 8 red; 9 enroled; 10 stoped
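Comparing top-ten lists like these is really just set arithmetic, so it can be made explicit in a few lines of Python. This is a minimal sketch that uses only the three lists quoted above; it ignores rank and raw frequency, which a fuller comparison of the underlying corpus counts would need to take into account.

```python
# Top-ten past simple/past participle errors as listed in this post.
GLOBAL = ["occured", "happend", "payed", "choosen", "prefered",
          "planed", "rised", "developped", "heared", "stoped"]
SPANISH = ["choosen", "prefered", "payed", "teached", "refered",
           "planed", "occured", "heart", "writen", "tryed"]
FRENCH = ["developped", "mentionned", "occured", "prefered", "choosen",
          "planed", "rised", "red", "enroled", "stoped"]


def shared(a, b):
    """Items found in both lists, kept in the rank order of the first list."""
    b_set = set(b)
    return [w for w in a if w in b_set]


def distinctive(target, *others):
    """Items in `target` that appear in none of the other lists."""
    seen = set().union(*others)
    return [w for w in target if w not in seen]


print("Spanish & global:", shared(SPANISH, GLOBAL))
print("French & global: ", shared(FRENCH, GLOBAL))
print("Only Spanish:    ", distinctive(SPANISH, GLOBAL, FRENCH))
print("Only French:     ", distinctive(FRENCH, GLOBAL, SPANISH))
```

Run on these lists, five of the Spanish top ten and seven of the French top ten also appear in the global list, which leaves teached, refered, heart, writen and tryed as the items peculiar to the Spanish list, and mentionned, red and enroled to the French one.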

There are a few interesting differences, though. The Spanish use of 'heart' as the past form of 'hear' doesn't follow the pattern you'd expect (compare 'heared' in the global list). This can be put down to pronunciation: Spanish speakers tend not to pronounce voiced consonants at the end of words, so a /d/ sound often becomes a /t/ (or is sometimes lost altogether), and this seems to spill over into the spelling. In the French list, we see extra double letters in 'developped' and 'mentionned', this time because these verbs have French cognates (développer and mentionner) spelt with a double consonant that then creeps into the English. So while all learners need help and reminders about similar verb inflections, there are local factors that might come into play too.
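Since so many of these errors turn on whether to double the final consonant, here's a rough Python sketch of the standard spelling rules for adding -ed to regular verbs. It's a simplification under some loud assumptions: stress can't be read off the spelling, so the caller has to say whether the final syllable is stressed; irregular verbs (pay, choose, hear, teach) aren't covered at all; and it follows the general rule rather than the British habit of doubling a final -l regardless of stress (travelled, cancelled).

```python
VOWELS = set("aeiou")


def regular_past(verb: str, final_stressed: bool = True) -> str:
    """Add -ed to a REGULAR verb using simplified English spelling rules.

    `final_stressed` is an assumption supplied by the caller, since stress
    isn't visible in the spelling; one-syllable verbs count as stressed,
    hence the default. Irregular verbs and the British final -l rule
    are deliberately not handled.
    """
    v = verb.lower()
    if v.endswith("e"):
        return v + "d"                      # like -> liked
    if len(v) >= 2 and v[-1] == "y" and v[-2] not in VOWELS:
        return v[:-1] + "ied"               # try -> tried (not *tryed)
    ends_cvc = (
        len(v) >= 3
        and v[-1] not in VOWELS and v[-1] not in "wxy"  # final consonant
        and v[-2] in VOWELS                             # single vowel
        and v[-3] not in VOWELS                         # after a consonant
    )
    if ends_cvc and final_stressed:
        return v + v[-1] + "ed"             # stop -> stopped, prefer -> preferred
    return v + "ed"                         # develop -> developed, happen -> happened


for verb, stressed in [("prefer", True), ("plan", True), ("develop", False),
                       ("happen", False), ("mention", False), ("try", True)]:
    print(verb, "->", regular_past(verb, final_stressed=stressed))
```

The stress flag is what separates 'preferred' from 'developed', which is exactly the distinction that neither the spelling nor, for French learners, the cognate offers any help with.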

Language Group errors and the issue of 'below-level' mistakes:
The error I looked at here is around students adding an unnecessary 's' inflection onto adjectives to agree with a plural noun, so "differents reasons", "two news friends", "interestings questions", etc. Of course, many languages have adjective inflections that agree with the noun they modify for number, and these kinds of errors are particularly simple to search for using the coded version of the corpus (where errors are tagged by type). Interestingly though, the corpus data suggests that this particular error is especially prevalent amongst Romance language speakers (Spanish, French, Italian, Portuguese).
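Searching uncoded text for the same error is much harder, which is one reason the error tags are so useful. Purely as an illustration (this isn't how the CLC is searched), here's a naive Python filter that flags any word made up of a known adjective plus 's'. The adjective list is a tiny hypothetical sample; a real search would need a part-of-speech tagger. Notice that it also flags 'news' in "What great news!", which is perfectly correct, exactly the kind of false positive that hand-coded tags avoid.

```python
import re

# Hypothetical mini-lexicon of adjectives, for illustration only;
# a real search would rely on a part-of-speech tagger instead.
ADJECTIVES = {"different", "new", "interesting", "big", "important"}


def plural_adjective_candidates(text: str) -> list[str]:
    """Return words that look like a known adjective with a plural -s added."""
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        if lower.endswith("s") and lower[:-1] in ADJECTIVES:
            hits.append(word)
    return hits


for phrase in ["differents reasons", "two news friends",
               "interestings questions", "What great news!"]:
    print(phrase, "->", plural_adjective_candidates(phrase))
```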

What's perhaps more interesting here from a materials writer's perspective is that these errors crop up across levels, with examples right up to proficiency level in the data, even though students will likely learn the basic rules about adjectives in English in their beginner class. So these aren't really 'errors' in the strict sense, because the learners clearly know the rules around adjectives in English. Instead, they're mistakes, inadvertent slips. Looking at learner corpus data reveals a lot of these, and the pattern of these mistakes can often be described as something of a bell curve. Learners make few errors when they first learn a new language form, partly just because they're cautious and don't use it very much. Then, as they progress, they start to make a lot more mistakes with the forms they learnt at previous levels as they experiment and become more adventurous. You could say that they take their eye off the ball with adjectives by B1 or B2 because they're more concerned about complex sentence constructions and whether or not to use a past perfect simple verb form, for example. Eventually, the mistakes start to tail off as learners become more proficient, their language skills become more automatic and they have the cognitive capacity to tidy up.


This presents a problem for me as a corpus researcher trying to feed into classroom materials. On the one hand, the data is telling me that these mistakes are significant at mid-levels and probably worth highlighting, but how do I convince editors, teachers and students that they need to focus on simple adjective forms at A2 or even B1 level without the materials seeming 'dumbed down' and 'below level'? The approach I took in one book illustrated below (Empower, A2, CUP 2016) was to:
  1. Make it clear that this is revision. The note starts with the word 'remember' to acknowledge that students probably already know this and the explanation is short and simple - they don't need the 'rules' explained in detail all over again.
  2. Combine several errors around adjectives. An activity just practising adjectives with singular and plural nouns would be pretty pointless at this level. Once the issue had been highlighted, students would find any follow-up activity mechanical and wouldn't engage with the point. By combining a number of issues, there's more to think about and you up the challenge. And a proof-reading activity of this kind is an authentic task type mirroring what students need to do with their own writing to reduce the number of mistakes that slip through.





Going beyond error codes:
The third point in the box above is also worth a bit more attention from a research perspective. The first two errors here jump out of the coded data (they're tagged as adjective inflection and word order errors), but the issue with the word 'colour' was less obvious. As I was looking through adjective examples, I started to notice various instances of awkward phrasing which had been tackled in the coded data in different ways.

I bought it in <#MD> | a </#MD> green colour .  (KET, A2)

It's blue and white <#MT> | in </#MT> colour . (KET, A2)

It only cost 20€ and <#DD> it | its </#DD> colours are red and black. (KET, A2)

<#UP> It's | Its </#UP> colour <#MV> | is </#MV> black. (KET, A2)

I like it because <#MA> | it </#MA> is very small and <#MA> | it </#MA> is <#UN> colour | </#UN> black. (KET, A2)
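The tagged examples above all follow the same pattern, <#CODE> learner's text | suggested correction </#CODE>, where either side of the bar can be empty. That makes it easy to pull out the codes and rebuild either the learner's original sentence or the corrected one. Here's a minimal Python sketch; it assumes only the inline notation as it appears in this post (the CLC's actual storage format may well differ) and doesn't handle nested tags.

```python
import re

# One error tag in the notation shown in this post:
#   <#CODE> learner's text | suggested correction </#CODE>
# Either side of the "|" may be empty. Nested tags are not handled.
TAG = re.compile(
    r"<#(?P<code>[A-Z]+)>\s*(?P<orig>[^|<]*?)\s*\|\s*(?P<corr>[^<]*?)\s*</#(?P=code)>"
)


def errors(line: str) -> list[tuple[str, str, str]]:
    """List (code, learner text, correction) for every tag in the line."""
    return [(m["code"], m["orig"], m["corr"]) for m in TAG.finditer(line)]


def rebuild(line: str, corrected: bool = True) -> str:
    """Rebuild the corrected (or original learner) version of a tagged line."""
    text = TAG.sub(lambda m: m["corr"] if corrected else m["orig"], line)
    return " ".join(text.split())  # tidy up double spaces left by empty slots


example = ("I like it because <#MA> | it </#MA> is very small and "
           "<#MA> | it </#MA> is <#UN> colour | </#UN> black.")
print(errors(example))
print(rebuild(example, corrected=False))  # what the learner wrote
print(rebuild(example))                   # the coder's corrected version
```

Pulling out every tag that touches 'colour' this way is only a first pass, though; as the next paragraph shows, spotting that the real fix is to drop the word altogether still needs a human reading of the examples.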

Anyone who's ever marked student writing will know that there's more than one way to go about trying to correct an oddly-worded sentence, and the suggestions in the coding above are all legitimate, but somehow didn't quite ring true to me. It struck me that in each case, the best solution would actually just be to drop the word 'colour' altogether. You might have noticed that all the examples are very similar, and they were indeed all in response to the same question, which asked students to describe a new mobile phone, including what colour it was. Hmm, so was this just a case of the wording of the question skewing the data? Was it just task effect? It prompted me to search more widely, and I found that although I had a lot of examples from this one question, the same issue was cropping up at other levels amongst the Spanish learner data in response to completely different tasks. And from what I understand (I should confess at this point that I'm not a Spanish speaker!), it's possible to say something along the lines of "a dress of colour blue" in Spanish. It's not a really major error, but it's a high-frequency word and I think the point fits nicely here and, hopefully, gives students (and teachers) pause for thought over something they may not have considered before.

L1-specific errors and classic false friends:
Finally, some of the most satisfying errors are the ones you track down which are clearly examples of L1 interference. And perhaps the most fun are the simple 'false friends'; the English words which seem to be a near equivalent to something in Spanish, but turn out to mean something different. I note these down as I work through the learner data, then try to collect them together into thematic sets which I can tie in with the coursebook syllabus. Below are a few around the theme of 'information' that I was looking at recently for some B2 material, shown along with the Spanish 'false friend' in brackets.

I am writing to you to reply to your <#RN> announcement | advertisement </#RN> in the newspaper. (anuncio)

It is really complicated to talk about a <#RN> theme | subject </#RN> as controversial as the cruelty of keeping animals in zoos. (tema)

What <#UD> a | </#UD> great <#RN> notice | news </#RN>!  (noticia)

We would like to know if you will be able to come, and give a <#RN> conference | talk </#RN>. (conferencia)
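When sets like this are being gathered from coded data, the tags can do some of the collecting: keeping only the tags where one noun has been swapped for another gives a ready-made list of candidate false friends to check. Here's a rough Python sketch using the four examples above. It assumes the same inline tag notation shown in this post, it assumes that RN is the code for these replaced-noun cases (as it is in all four examples), and the Spanish glosses are attached by hand rather than derived from the data.

```python
import re

# Same inline tag notation as above: <#CODE> learner's text | correction </#CODE>
TAG = re.compile(
    r"<#(?P<code>[A-Z]+)>\s*(?P<orig>[^|<]*?)\s*\|\s*(?P<corr>[^<]*?)\s*</#(?P=code)>"
)

# The tagged sentences above, each paired by hand with the Spanish 'false friend'.
EXAMPLES = [
    ("I am writing to you to reply to your <#RN> announcement | advertisement "
     "</#RN> in the newspaper.", "anuncio"),
    ("It is really complicated to talk about a <#RN> theme | subject </#RN> as "
     "controversial as the cruelty of keeping animals in zoos.", "tema"),
    ("What <#UD> a | </#UD> great <#RN> notice | news </#RN>!", "noticia"),
    ("We would like to know if you will be able to come, and give a "
     "<#RN> conference | talk </#RN>.", "conferencia"),
]


def substitutions(examples, code="RN"):
    """Collect (learner word, correction, L1 word) for tags with the given code."""
    rows = []
    for sentence, l1_word in examples:
        for m in TAG.finditer(sentence):
            if m["code"] == code:          # skip other codes, e.g. the UD tag
                rows.append((m["orig"], m["corr"], l1_word))
    return rows


for used, correct, spanish in substitutions(EXAMPLES):
    print(f"{used:<13} -> {correct:<13} (cf. Spanish '{spanish}')")
```

The output is only a list of candidates; deciding which pairs are genuine false friends, and how they group thematically, is still the writer's job.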


In some of these, the meaning of the Spanish word simply doesn't match its English near equivalent, although they're often in the same semantic ballpark (announcement/advertisement, notice/news). Others are more about range of usage. So, the Spanish 'tema' seems more widely applicable than the English word 'theme' and gets used by students where 'subject' or 'topic' would fit better in English. And 'conferencia' in Spanish can describe both a conference and an individual talk or lecture. Activities for these are about raising students' awareness, drawing attention to the differences and, where relevant, provoking some discussion.

As a side note here, when looking for example sentences for practice activities, although the learner corpus is great for getting a feel for level, you have to be careful not to transfer subtly awkward phrasing and atypical constructions (‘learnerese’, if you like) into materials. Especially at higher levels and with subtle differences, such as the theme/subject distinction, I’ll often have a browse through NS corpus data for example sentences. That way I’m ensuring learners have an authentic model, and it’s also good to up the level of the language just a little to provide a sense of challenge and progress, even when essentially revising.


Research into practice:
Hopefully, this handful of examples gives a taste of the work I've been doing and the way I make use of the learner corpus, both by using the error tags and by going beyond the tags to explore less obvious errors. I've also tried to show just some of the issues that emerge in trying to translate the results of that analysis into materials that fit in with the coursebook syllabus, that focus on significant but apparently below-level mistakes in a way that's appropriately challenging and engaging, and that draw learners' attention to language points that are especially relevant to them rather than just part of a generic global syllabus.
