Friday, April 27, 2012

Corpus pick-me-up

This week, I've mostly been working on EAP materials and, whilst I'm really enjoying getting my teeth into some writing, by lunchtime today I was starting to flag and I realised that progress was slowing down. I'd been looking at using journal article abstracts and was mulling over some language work on expressing the aims of a piece of writing. In particular, I was wondering about the use of impersonal subjects (This paper will explore/discuss/focus on, etc.) as opposed to personal pronouns (I/We will examine/discuss...). It's tempting to encourage students not to use personal pronouns in their academic writing - largely because you're trying to steer them away from the rather IELTS-y I think, I believe, In my opinion type of phrasing. But then, I know that some academic texts, especially in certain disciplines, do commonly use personal pronouns, particularly in stating aims.

Having been reminded of a couple of useful corpus resources over the weekend, I decided to put my intuitions to the test. I used several different sources to see how different types of writers express the aims of a piece of writing. As well as looking at conventional academic corpora of 'professional' academic writers (in the form of published journal articles, etc.), I used the British Academic Written English corpus (BAWE) and the Michigan Corpus of Upper-Level Student Papers (MICUSP) to investigate how students express their aims (largely in essay introductions rather than abstracts). As I built up some lovely lists of common phrases in my notebook, I felt my energy levels rising, even on a wet Friday afternoon! I'm not sure what it is about all those lines of words on the screen slowly resolving themselves into patterns, but it certainly perked me up.

My notebook, satisfyingly full of words, phrases and patterns.

The corpus search also revealed that both patterns are indeed common, as is the combination of the two (In this paper, we will ...), and that all of them appear in published academic texts and student writing alike. Yes, the different forms were clearly used with varying frequencies and in slightly different proportions in different text types, but for my purposes, the exact statistics didn't really matter - I'm being corpus-informed, not corpus-driven here. What was important was that all the different forms were worth highlighting. Sadly, I can't cram everything I discovered into the very short task I'm working on, but I'll keep it for future reference, and a couple of hours playing with words certainly lifted me out of my slump and sparked a few ideas.
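For anyone who fancies trying a similar quick check on their own collection of texts, here is a rough Python sketch of the kind of pattern count I have in mind. It is not how the BAWE or MICUSP search interfaces work, just a simple regular-expression tally. The sample sentences are made up for illustration, and the pattern names and the classify and count_aim_patterns functions are my own hypothetical choices.

```python
import re
from collections import Counter

# Made-up sentences for illustration only; NOT taken from BAWE, MICUSP
# or any other real corpus.
SAMPLE_SENTENCES = [
    "This paper will explore the role of feedback in peer review.",
    "In this essay, I will discuss two competing theories.",
    "We will examine the data in section three.",
    "This essay will focus on urban migration.",
    "In this paper, we will examine three case studies.",
    "I think the results are interesting.",
]

# Hypothetical patterns; the list of nouns is an assumption, not exhaustive.
TEXT_NOUNS = r"(?:paper|essay|article|study|report)"
IMPERSONAL = re.compile(rf"\bthis {TEXT_NOUNS} will \w+", re.IGNORECASE)
PERSONAL = re.compile(r"\b(?:I|we) will \w+", re.IGNORECASE)
FRAME = re.compile(rf"\bin this {TEXT_NOUNS}\b", re.IGNORECASE)


def classify(sentence):
    """Label a sentence by the way it states an aim, or return None."""
    if IMPERSONAL.search(sentence):
        return "impersonal"  # This paper will ...
    if PERSONAL.search(sentence):
        # In this paper, we will ... counts as the combined form.
        return "combined" if FRAME.search(sentence) else "personal"
    return None


def count_aim_patterns(sentences):
    """Tally how many sentences fall under each aim-statement pattern."""
    counts = Counter()
    for sentence in sentences:
        label = classify(sentence)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    for label, n in count_aim_patterns(SAMPLE_SENTENCES).most_common():
        print(label, n)
```

A tally like this only gives a rough feel for which forms turn up. It says nothing about discipline or text type, and you would still want to read the concordance lines themselves.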

