The occasional ramblings of a freelance lexicographer

Friday, March 16, 2018

So you want an editable coursebook …

The classic ELT coursebook has been falling out of favour in certain circles for some time. Critics say the topics are too bland and can’t possibly be interesting and relevant to all the different students they try to cater for, that the methodological approaches are too rigid and don’t always fit with the approaches teachers would like to adopt, and that the books themselves are too inflexible and difficult to adapt.

As someone who’s worked in ELT publishing for nearly 20 years (although never as a general coursebook author), I can see both sides of the issue. I understand the limitations that publishers are working within and the difficulties of delivering something that works for everyone when commercial constraints demand that an expensive-to-produce coursebook series sells to the widest possible demographic. I also know from talking to teachers I meet from around the world that beyond the ELT Twitterati, many ordinary teachers are actually quite happy with the status quo. At the same time though, I do agree with many of the criticisms of coursebooks and I find the current, publisher-led, writing-by-committee approach to coursebook production frustrating on all kinds of levels.

I’ve always found the “do away with coursebooks and let teachers write all their own original materials tailored to their students” argument unhelpful and unrealistic for most teachers. Recently though, I’ve read a number of articles which take a more constructive position, suggesting instead how coursebooks could be improved (see Sandy Millin and Kyle Dugan). One suggestion that I’ve seen in a number of places, and that I want to address here, is the option to make materials available in easily editable formats so that teachers can adapt them to be more relevant to their students. It sounds like a great idea, giving teachers (and students) the best of both worlds, but even leaving aside issues around students preferring print books to digital or to piles of handouts, there are a number of important issues to consider around copyright and intellectual property.

Original texts and permissions: 
Anyone who has ever written materials for publication will know that if you want to use authentic texts from external sources, there are lots of hoops to jump through to obtain permission from the copyright holders to reuse them. Some sources just point-blank refuse, others charge considerable fees and most put restrictions on exactly how the text can be reused (whether it can be changed, shortened, adapted, etc.). That means that making those texts available in a format that could be edited by teachers is generally just not possible from a legal perspective. The best that might be feasible in such a context would be to make the original text available in an uneditable form, say, as a PDF, perhaps with the accompanying activities in a separate Word document that could be edited. There are permissions issues with photos and artwork too, so those would probably need to be stripped out of any editable versions.

Intellectual property and reputation:
One reason why many copyright holders won’t allow their work to be used and adapted in any old way is that their original intent in writing the text and the message they intended to convey could too easily become distorted, misrepresenting their ideas in a way that they wouldn’t want to put their name to. Imagine you’re a political journalist who’s carefully constructed a piece to put across a particular argument and point of view, then someone comes along and chops it about in such a way as to miss the point or, worse still, end up suggesting an opposite view, but with your name still attached.

Arguably, ELT writers are slightly less precious when it comes to presenting their personal views in a grammar or vocab activity, but there is, nonetheless, a good deal of professional expertise and principle that goes into writing ELT materials. I frequently object to changes suggested by editors to activities or even individual items that I feel would change the nature of the activity so that it no longer achieves the intended language-learning objective. At other times, I’ve included something for a reason, for example to make the material more inclusive, and no, I don’t want all my female scientists swapped for men who are easier to find stock photos of! When materials go out into the world with my name attached, I want to feel that even if a few compromises were made along the way, I’m still essentially happy with the content.

I fully expect teachers to adapt the materials I write - to skip activities, to write up extra questions on the board or even to make a handout with alternative activities to go along with a reading or listening text in the book. It would be a bit strange if they didn’t adapt to some extent because they know their teaching context and their students and I don’t. Generally though, it still remains clear to the students, or to whoever else might see the resulting hybrid materials (parents, colleagues, etc.), who wrote what. So if those adaptations contain language errors or changes that I wouldn’t agree with or that completely miss the point I was trying to get across to students, then that doesn’t reflect on me or damage my professional reputation or that of the publisher. If those materials were fully editable, then the line between my intellectual property and any local additions (be they brilliant or error-strewn) would become blurred and versions of the material could start circulating that bore my name, but which I would never have written myself. For me, that’s problematic. 

Of course, we could just remove authors’ names from classroom materials. Much of the work I do already goes largely uncredited apart from in tiny print on the acknowledgements page, so I don’t really have a problem with that (although I think it somewhat lessens the incentive for authors to produce great content). But that still doesn’t solve the problem of publishers (or ‘content providers’, if you will) establishing and maintaining their reputation, not to mention avoiding piracy … but that’s maybe a topic for another day …

In short, I think editable ELT materials might be possible in some form, but they would have to be constructed in such a way that certain sections of the material could not be changed and also so that any changes that were made were clearly attributed. It’s an interesting challenge that I don’t think is insurmountable, but it’s certainly not as straightforward as it might at first seem.
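
To make that idea a little more concrete, here is a minimal sketch in Python of how such material might be modelled. Everything in it is hypothetical and invented for illustration (the Section class, its locked flag, the edit and credit methods); it isn't based on any existing publishing platform. It simply assumes that each section records its original author, that some sections (say, a licensed authentic text) are locked, and that every change to an unlocked section is logged against the person who made it.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One block of coursebook material (hypothetical model)."""
    text: str
    author: str
    locked: bool = False  # e.g. a licensed authentic text that must not change
    edits: list = field(default_factory=list)  # (editor, previous_text) pairs

    def edit(self, editor, new_text):
        """Replace the text of an unlocked section and record who did it."""
        if self.locked:
            raise PermissionError("This section cannot be changed")
        self.edits.append((editor, self.text))
        self.text = new_text

    def credit(self):
        """Return a credit line that keeps the author and adapters distinct."""
        editors = []
        for editor, _ in self.edits:
            if editor not in editors:
                editors.append(editor)
        if not editors:
            return f"Written by {self.author}"
        return f"Written by {self.author}; adapted by {', '.join(editors)}"


# A locked reading text and an editable activity that goes with it.
reading = Section("Authentic article text", author="Original journalist", locked=True)
activity = Section("Match the words to the definitions.", author="Coursebook author")

activity.edit("Class teacher", "Match the words to the pictures.")
print(activity.credit())
```

In this sketch the teacher’s change to the activity goes through and shows up in the credit line, while any attempt to edit the locked reading text raises an error. A real system would also have to deal with the permissions and piracy questions raised above, which a few lines of code obviously don’t solve.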


Monday, March 05, 2018

Corpus insider #1: Representativeness

As I was putting together my talk for the ELT Freelancers’ Awayday and the follow-up blog post, I realized that over 20 years of using corpora, there are a whole host of factors I’ve learnt to take into account. I touched very briefly on a few of them in my talk, but I thought some might be worth exploring further. So, this is the first in a series of posts about things you might need to bear in mind if you want to use corpus tools to inform your work on ELT materials.

When I explain to people what a corpus is, I usually start off by saying that it’s a large collection of language that we use to represent the way English is used as a whole. It seems a simple premise, but when you dig a bit deeper, it gets more complicated. As with any research, the validity of your results is dependent on the quality of your data. The data your chosen corpus contains will determine exactly what kind of language it can actually be said to represent and so how useful it is for your purpose. To take a simple example, if you were writing for a specifically British English market, using a corpus that contained only American data wouldn’t be very useful. Similarly, if you were working on speaking materials, looking at usage in a corpus of entirely written data wouldn’t really tell you much about how people normally speak. Understanding a bit about the corpus you plan to use, the data it contains, and what that might represent is absolutely essential before you start doing any corpus research.

Corpus types:
There are two main types of corpus: those which contain data drawn from one type of source or genre and those which are said to be ‘balanced’ and contain data from a wide variety of different genres. The first type includes purely spoken corpora (like the Spoken BNC2014) and corpora of academic writing (of either published texts, like the academic part of COCA, or student writing, like BAWE or MICUSP). Many corpora are also composed largely of journalism, because it’s one of the simplest sources of data to collect, especially for those corpora that rely on web-based content (e.g. Monco, NOW).

Large balanced corpora, containing written and spoken data from a wide range of sources, are much more difficult to put together. For this reason, they’re mainly owned and maintained by large publishers, especially those who produce dictionaries, and aren’t publicly available. The British National Corpus (BNC) is a balanced corpus that’s freely available, but it’s relatively small by modern standards and, perhaps more importantly, it’s becoming increasingly out of date (with data from the 1980s and 90s). The Corpus of Contemporary American English (COCA) occupies a middle ground, with data from spoken sources (although all radio transcripts rather than everyday conversation), fiction, popular magazines, newspapers and academic texts. It’s reasonably balanced, although all American, as the name suggests.

The trouble with media hype:
Data from newspapers, magazines and blogs is very easy to collect and makes up a large proportion of many corpora. It can provide lots of interesting information about language used to talk about a wide range of topics, but it’s important to remember that journalism as a genre has its own quite marked features that don’t necessarily reflect the way that ordinary people use language day to day. It may seem obvious to say that journalists report news, but that means they’re generally writing about what’s new, surprising, shocking or problematic. They also want to draw their readers in and keep their attention with colourful language choices and hyperbole. For my recent talk, I demonstrated an example of a query about the language of social media and in particular, which verbs collocate with the noun ‘newsfeed’. I used the Monco corpus, because I was interested in up-to-date usage, and came up with the following verbs:

scroll through your newsfeed
pop up on your newsfeed
fill/flood/dominate/clog up your newsfeed

The first two feel like expressions you might use in conversation. The others, however, are clearly journalistic in style, bemoaning the way that a particular trend is overtaking our online lives. Searching a couple of other news-dominated corpora came up with similar results (enTenTen: spam/clutter/bombard/clog your newsfeed; NOW: scroll through/appear on/pop up on/tweak/flood/clog your newsfeed). They’re all interesting collocations, but they’re probably not the first ones you’d choose to teach an intermediate learner who wants to talk about the way they use social media themselves. That’s not to say you shouldn’t use these corpora when you’re researching ideas for ELT materials, but knowing that a corpus contains only or largely data from journalistic sources means that you can be on the lookout for this type of language and be selective about what you use, as appropriate for the learners you’re writing for.
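
If you like to keep track of this kind of comparison more systematically, a few lines of Python can tally how many of the corpora each collocate turned up in. This is only a rough sketch: the lists are typed in by hand from the results above rather than pulled from any corpus tool, the corpus_spread function is a name made up for illustration, and I’ve assumed that the “up” in the Monco list belongs only to “clog”, recording it as “clog” so that it lines up with the other two lists.

```python
from collections import Counter

# Verb collocates of "newsfeed" as reported above, one list per corpus.
# Assumption: Monco's "clog up" is recorded as "clog" to match the other lists.
collocates = {
    "Monco": ["scroll through", "pop up on", "fill", "flood", "dominate", "clog"],
    "enTenTen": ["spam", "clutter", "bombard", "clog"],
    "NOW": ["scroll through", "appear on", "pop up on", "tweak", "flood", "clog"],
}


def corpus_spread(lists):
    """Count how many corpora each collocate appears in."""
    counts = Counter()
    for verbs in lists.values():
        counts.update(set(verbs))  # set() so a verb counts once per corpus
    return counts


spread = corpus_spread(collocates)

# Most widely attested first, then alphabetically.
for verb, n in sorted(spread.items(), key=lambda item: (-item[1], item[0])):
    print(f"{verb}: {n} of {len(collocates)} corpora")
```

A tally like this only shows how widely a collocation is attested, not what register it belongs to. “Clog” appears in all three lists and is still journalistic in flavour, because all three corpora are news-dominated, so the judgement about what suits your learners remains yours.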

Professional and lay writers:
Unsurprisingly, the majority of written corpus data comes from published sources and, as such, it’s written by people who are professional writers: authors, journalists, copywriters. As we saw with journalism, above, this can mean the language is more colourful and probably more varied than the average lay person tends to use. This came out very clearly in a recent study into academic vocabulary (Durrant, 2016*) which looked at how many of the words on the Academic Vocabulary List (based on a corpus of published academic writing) were actually used regularly by student writers (using a corpus of university-level student writing). It turned out that the student essays contained a vastly narrower range of vocabulary than the published academic texts, written by experienced (and edited!) academics. That’s not to say the student writing was in some way lacking – all the papers had got high marks – it’s just a different genre with different expectations.

When you’re using a corpus to search for ideas, it’s all too easy to pick out examples and patterns that are elegant or appealing, but I think it’s always important to ask yourself how typical they are of what the average person might say or write. Is it a writerly flourish? Is it helpful as a model for your target learners?

I’m not saying that as ELT writers and editors we should reject all corpus evidence as flawed and unhelpful. Far from it. I think corpus tools can be incredibly helpful in backing up our intuitions and uncovering patterns of usage we might not have thought of, but they are just that, ‘tools’, and should be used with an element of caution. It’s all too easy to be drawn in by a corpus that’s new or especially large or has a nice interface and nifty tools, but making sure you know what your corpus represents is vital. If a collocation or pattern feels unlikely or overly fancy, then ask yourself why. Don’t just accept the first results that pop up: click through to the examples, scroll down to see where they come from and understand exactly what’s going on.

* There's a good summary of Durrant's study on ELT Research Bites.
