Lexicoblog

The occasional ramblings of a freelance lexicographer

Monday, December 02, 2019

Missing grammar: parallel structure


I've been researching learner language using the Cambridge Learner Corpus for 20 years now, and there are certain issues that crop up again and again among learners at all levels. One that I pick up on regularly is illustrated in the examples below (made-up examples rather than real corpus data, but they illustrate the point):

At the weekend, he goes to the park and play football. (subject-verb agreement)
I like playing football and run. (verb + -ing form)
I'd love to visit Paris and seeing the Eiffel Tower. (verb + to do)
We went to the park and play football. (past simple verb form)
We can swim in the sea and playing volleyball on the beach. (modal + verb form)
I've tidied the kitchen and did the washing up. (present perfect/past participle form)
I was sitting on the train, chatted to my friend on the phone. (past continuous/-ing form)


Basically, students (usually) attempt to use a second verb form after a conjunction without repeating the subject, but they forget to match the verb form to the start of the sentence. In each of the examples above, the correct form would become clear(er) if we inserted the 'missing' subject (+verb/auxiliary/modal):

At the weekend, he goes to the park and [he] plays football.
I like playing football and [I like] running.
I'd love to visit Paris and [I'd love to] see the Eiffel Tower.
We went to the park and [we] played football.
We can swim in the sea and [we can] play volleyball on the beach.
I've tidied the kitchen and [I've] done the washing up.
I was sitting on the train [and I was] chatting to my friend on the phone.

It's something I've noted in countless corpus reports, but I've never been quite sure what to call it. That is, until last week, when I came across it for the first time in an ELT coursebook, where it was referred to as parallel structure. It was in a B2 book, in a section about academic writing style, and covered a wider range of structures than those above (not just verb phrases, but nouns, adjectives and full clauses too), but it still made me cheer out loud at my desk. It's long puzzled me why these incredibly common structures aren't explicitly addressed in most ELT materials when they cause so many issues for students.

I rarely get the chance to choose the grammar points I cover in the materials I work on, because they're mostly supplementary materials and the syllabus is already fixed by the time I get started. So I've never had the opportunity to cover this explicitly myself. I have tried to include examples in practice exercises, but they usually end up getting cut by editors who want all the items to fit on a single line and don't like the longer examples these structures often involve (grrr!).

So I'm making a case for this to be included explicitly in more ELT materials. It's relevant at every level and with almost every kind of verb structure we teach. It doesn't have to be a separate grammar point and it doesn't even have to have the label parallel structure. I think it's a great thing to bring up when you're revising a particular verb form as a slight variation on the usual practice activities, just to raise students' awareness. You could have a simple intro as above showing/eliciting the 'missed out' words and the correct second verb forms. Then straight into some practice examples (as gap-fills or freer practice). It works perfectly for any kind of list: 

  • daily routines (She leaves the house at 8 and catches the bus at 8.15)
  • a dramatic narrative (He opened the box and looked inside)
  • background to a narrative (People were sitting in the café, eating and drinking)
  • things people like doing (I like watching TV and chatting to my friends online)
  • things people would like to do in the future (I'd like to go to university and study drama)
  • things ticked off on a list (We've booked a room for the party and set up a Facebook page)
  • things on a to-do list (I still need to confirm the hotel booking and renew my travel insurance)

I'm happy to be proved wrong with a flurry of comments about ELT materials that practise exactly this already ...


Thursday, June 07, 2018

Learner corpus research: mixed methods and detective work


In a blog post at the end of last year, I said that one of my resolutions for 2018 was to focus more on the areas that especially interest me, and one of those was corpus research. So far, so good. I've spent the past couple of months researching Spanish learner errors for a writing project and next week, I'll be presenting at my first corpus linguistics conference, IVACS in Valletta, Malta. Coincidentally, I'm going to be talking about my work with the Cambridge Learner Corpus (CLC) researching errors by Spanish students to feed into ELT materials for the Spanish market.

Although this will be my first time speaking at a corpus linguistics conference, it’s far from the first time I’ve spoken about my work with the CLC. In fact, my first presentation at a major conference, at IATEFL in Dublin back in 2000, was about the work I’d been doing using the CLC to write common error notes for the Cambridge Learner’s Dictionary.

So what is the Cambridge Learner Corpus?

The CLC is a collection of exam scripts written by learners taking Cambridge exams, including everything from PET and KET, through First, CAE and Proficiency, to IELTS and the Business English exams. It includes data from over 250 000 students from 173 countries. You can either search through the raw data or use a coded version of the corpus, in which errors have been classified and corrections suggested, much as they would be by a teacher marking a student essay, allowing you to search for specific error types. The example below shows how a verb inflection error would be coded.

We are proud that you have <#IV> choosen | chosen </#IV> our company.

You can also search by CEFR level, by specific exams, by country and by L1.

Research for exams:

When I gave my talk in Dublin all those years ago, one of the first questions at the end came from the redoubtable Mario Rinvolucri who was sitting right in the middle of the front row. He was concerned that the corpus didn’t include any spoken data, so wasn’t really representative of student language in general. And he was right. One of the major drawbacks of the CLC is that it only reflects students’ written performance and then, only in an exam writing context. That means it doesn’t pick up issues that are specifically related to spoken language and the data is rather skewed by the topics and genres of exam writing tasks (largely emails and essays).

That does, however, make it perfect for informing exam practice materials. Over the years, I’ve carried out learner corpus research to feed into a whole range of exam materials. This has largely involved searching for the most frequent errors that match different parts of a coursebook syllabus in order to provide the writers with evidence and examples to help them better target particular problem areas. It also led to the Common Mistakes at … series, most of which I was involved in researching and two of which I wrote.


Mixed methods and detective work:

One of the things I enjoy most about working with the learner corpus, even after more than 18 years, is that I'm constantly finding new ways to get the best from the data. It's easy to start out by searching for the most frequent errors by type, such as the top ten spelling errors or the most commonly confused nouns (trip/travel, job/work, etc.). But the stats don't tell the whole story.

Firstly, there's the issue I've already mentioned about the skewing of the data by particular exam questions and topics. So, for example, one of the top noun confusion errors amongst Spanish learners in the corpus is to use 'jail' instead of 'cage', as in 'a lot of animals are locked up in jails'. It is a legitimate false friend error (cage is jaula in Spanish), but it's only so prominent in the data because of a classic FCE essay question about keeping animals in zoos. Does it merit highlighting in materials? Probably not, compared to other errors that involve more high-frequency nouns and crop up across a range of contexts. There's a balancing act to achieve between looking at the corpus stats, delving into the source of the errors (Were all the examples of an error prompted by a single exam question?), understanding the likely needs of students (Is it a high-frequency word or likely to come up again in an exam?) and understanding what'll work (and won't work!) in published materials. I think it's when it comes to these last two that, as a former teacher and full-time materials writer, I probably have the edge over a purely academic researcher.

Then there’s the tagging of errors to consider. Many learner corpus researchers are wary of error tagging because it can push the data into fixed categories that aren’t always appropriate. Any teacher who’s ever marked a student essay will know that some errors are very straightforward to mark (a spelling error or a wrong choice of preposition, for example), while others are messy. There can be several things going on in a single sentence that contribute to it not working and sometimes it’s difficult to pin down the root cause. Not to mention those chunks of text that are so garbled, you’re just not sure what the student intended and you don’t know how to go about correcting them. That means that while the coders who mark up the CLC data do a fantastic job, there are always instances that are open to interpretation. 

When I find an error that looks worth highlighting within a particular error category, I’ll often do a more general search for the word (or form or chunk) to see whether it crops up elsewhere with different tags. Sometimes, I’ll go back to the untagged data too to see how students are using the word more generally. This can help me pin down issues that stray across error categories. Then, if the error isn’t a straightforward one, I’ll flick over to a native-speaker corpus to check how the word or structure in question is typically used – after looking at too much learner data, you start to question your intuitions! – and check a few reference sources to help me pinpoint exactly where the mismatch lies and try to come up with a clear explanation for the students.

It's this multi-layered detective work, understanding where things are going wrong and figuring out the best way to help learners understand and, hopefully, overcome language issues, that I find so satisfying.

For anyone who's going to be joining me in Valletta, I'll be talking about delving into the issues specific to Spanish learners at the IVACS conference at 16.00 on Thursday, 14 June.


Wednesday, March 20, 2013

Why mistakes matter



Last week, I clicked on a link on a friend's Facebook page to read an interview in an online fashion magazine. As I started to read, I found that I had to reread the first few lines a couple of times and still couldn't quite get the flow of the writing. Thinking that in a slightly trendy, arty publication the writer was trying to achieve some kind of creative effect, I read on. I kept, however, stumbling over sentences, having to go back and reparse again and again. Inevitably, given my day job, I started to analyse what it was that was troubling me about the writer's style, and I started to pick out grammatical errors: missing subjects, mismatched subjects and verbs, awkward parallel constructions. After a while, the awkwardness of the grammar started to really irritate me and eventually became so tiring that I gave up reading. It was only then that I noticed the name of the writer and, after a bit of clicking around, realized that the website was based in Spain and quite possibly not written, or I suspect even edited, by native English speakers. This perhaps explained the slightly odd writing style, but my first impressions stuck and I couldn't summon up the energy or 'understanding' to go back and finish the article.

I don't mean to be disparaging about the writer's attempts to write in English. He was clearly a very proficient English speaker and had been ambitious in his writing style and very nearly pulled it off; the errors were not basic, but generally stemmed from his use of more challenging structures. But it seems to me that if you're going to publish for an international audience in English (or any language, come to that), then you really have to get your writing edited by a native (or near-native) speaker.

I've long been interested in the area of learner errors, especially through my long-standing work with the Cambridge Learner Corpus. When I started doing talks about learner errors and how to help students eliminate them, I often came up against resistance along the lines of: "But shouldn't we be encouraging fluency and confidence, not focusing on errors all the time?" And I would find myself explaining that yes, of course fluency and confidence are very important, especially in spoken communication, and no, I wasn't advocating a focus on error correction "all the time". I do firmly believe, though, that if we're going to teach writing skills, then helping students to identify, correct and eventually perhaps eliminate errors has to be a part of that process. And in some contexts, a very important part.

In my own current area of interest, EAP, we bang on a lot about critical thinking and we encourage students to ask critical questions about the accuracy, reliability and credibility of information. These are all qualities that are highly valued in academia: if you're going to make a claim, your arguments and evidence have to be clear, unambiguous and precise. If a student hands in a piece of writing to their subject tutor that contains inaccuracies or ambiguities, the tutor will quite likely question the student's understanding of the topic before they put the deficiencies down to language errors.

It seems to me that we're selling our students short if we mark a piece of written work littered with language errors as "good", when it clearly isn't (a brief nod to Jim Scrivener there). Ever since I've been involved in ELT, there seems to have been a general distinction made between errors which hamper meaning (bad and to be marked down) and those which don't (okay to let slide). Whilst that may be valid where our aim is simply to convey a message by whatever means possible, I don't think it's always enough, especially in high-stakes writing. In the same way that I got tired and irritated by the awkward grammar of my Spanish fashion journalist, a subject tutor ploughing through a pile of student essays may equally feel the linguistic strain placed on them by the errors of their international students, even where those errors don't directly affect the basic meaning. Even apparently minor errors in academic writing can undermine the writer's credibility and the degree to which their reader is persuaded of their argument.

So for me, regular error analysis and correction (in a variety of different forms) and occasional activities on ‘basic’ areas of grammar (articles, prepositions, subject-verb agreement) should always have a place in any EAP teacher’s repertoire, even at the highest levels.
