Ludwig Guru: a review
Recently, a fellow ELT writer posted in a Facebook group about
a new language tool they'd discovered. I hadn't come across it before, so couldn't
resist checking it out.
It's called Ludwig Guru and it describes itself as:
"the first sentence search engine that helps you write better English by
giving you contextualized examples taken from reliable sources."
It's
aimed at learners/non-expert users of English and the idea is you type in your
best guess at an English sentence, or part of a sentence, and it comes back
with examples of similar sentences from 'reliable sources'. Then you can see
how well the examples match your own attempt. Presumably, if you find lots that
are exactly the same, you know you're on the right track and if they're a bit
different, you can adjust yours to sound more natural.
The post that had led me to it was from an ELT writer
looking for ideas for how a slightly obscure tense (future continuous passive,
yes, it's a thing!) is typically used. My first reaction was "Why not use
a 'proper' corpus?" … but I am aware that corpus tools can be off-putting
until you get used to them and this looked like a potentially more
user-friendly alternative. I decided to test it out to see whether it might be a
useful tool for ELT writers for checking intuitions or searching for ideas for
authentic examples/contexts, as well as a tool to recommend to students.
As with many similar services, there's a free version
with limited functionality and a premium version that gives you the full
experience. I registered for the free version just to try it out. It's very
restrictive! You only get 6 searches per day – and that's a 24-hour period, so
if you hit your limit in the afternoon, you can't log back in the next morning
– which made it very difficult to test out in any meaningful way. You also only
get 15 results per search, which again made it difficult to know whether what I
was seeing was a representative sample of what you'd get from a wider search. You can sign up for a free 15-day trial of the premium version, but that requires you to enter your credit card details, which
I wasn't prepared to do. So, to be honest, I didn't get as far as I'd like
before I just gave up! But here's what I did find.
The data:
My first question was about what constituted 'reliable sources'. The site uses 22 sources: 8 news media sites (the BBC, the Guardian, the New York Times, etc.), 5 academic science sources (mostly scientific journals), a couple of wikis, a couple of encyclopedias, and a collection of other sources that it describes as 'Formal & Business' but which are a bit of a mixed bunch, including documents from UNICEF and the European Parliament. With the premium version, you can choose to filter your results by selecting which sources you want to include.
My first thought was that it's actually not a bad spread. Many
corpora depend heavily on news media sources because they're readily available
and reasonably wide-ranging in terms of topics (and so spread of
language/vocab). The encyclopedias and wikis will also provide a nice spread of
topic vocab. The language of journalism though (and of reference materials too,
I suspect) is quite a distinct genre, so isn't necessarily an ideal model for
other contexts.
The academic content is made up of only science journals,
so obviously doesn't help with other academic disciplines. I also noticed that a lot of the results that felt awkward (and in some cases positively incorrect) came from this section of the data. When I clicked
through (as you can) to the original sources, they were papers that appeared to
be written (at least judging very crudely by the names of the authors) by
non-native speakers of English. That's unsurprising seeing as many academic
papers in English language journals have a very international mix of authors.
Whether something that has managed to pass through a journal's reviewing process (and in this case, a small range of journals mostly from the same publisher) represents a good language model is up for debate.
The overall British/American split is difficult to
determine, but the news media are 50/50 – which is just something to bear in
mind as some searches will throw up clear differences between the two. For
example, write me (with the person as the direct object) is standard in
American English but sounds distinctly odd to a British English speaker (who'd
use write to me).
For learners:
The tool has been designed for non-expert users of English to search for specific
phrases, so this is where I started. The
results were kind of mixed and the main thing I took away was that they needed
quite a degree of language awareness and analysis to be useful. Here are just a few of the searches I tried and the issues they threw up:
Just as a side note, I actually took some of the examples
that sounded awkward/unlikely to me from interviews with the creators of the
app itself … and interestingly, those exact examples often came up as the first
result!
One obvious search is to check collocations, so I looked at a couple that seemed slightly odd to me: firmly think and obtain my goal.
My intuition tells me (as do reference sources and other corpus evidence) that
we'd be more likely to say firmly believe and attain my goal. Ludwig came back with the following results.
As a seasoned corpus researcher, I know that in a large
body of data, you'll probably find some examples of almost any combination of
words. Mostly though, with very small numbers like these, you'd discount them
as untypical and unhelpful for a learner. (As I mentioned above, many of the
results for obtain a goal, in fact, come from a handful of academic papers
likely written by non-native speakers.) Corpus research is all about
identifying frequent and typical patterns, not individual quirks of usage. For
the student using a tool like this, I guess the question is how they make that
judgment. Will they see that there are actually only a relatively small number
of matches and instead click through to see the similar patterns? Or will they
just see a first screen full of examples that appear to match their own,
possibly slightly awkward, wording and stick with it?
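To make that frequency judgment concrete, here's a toy sketch in Python – with an invented mini-corpus and a deliberately crude bit of suffix-stripping – showing how a corpus comparison of firmly think vs firmly believe might look:

```python
import re
from collections import Counter

# An invented mini-corpus for illustration; a real corpus has millions of words.
corpus = """
I firmly believe this is the right decision.
She firmly believes in second chances.
We firmly believe that standards matter.
Critics firmly think otherwise.
"""

# Count which verbs follow 'firmly', crudely stripping a final -s
# so that 'believes' is counted together with 'believe'.
counts = Counter(
    m.lower().rstrip("s")
    for m in re.findall(r"\bfirmly (\w+)", corpus, re.IGNORECASE)
)
print(counts.most_common())  # [('believe', 3), ('think', 1)]
```

In real corpus data the imbalance is far larger, of course, but the principle is the same: the typical pattern dwarfs the individual quirk.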
If students do discount patterns with fewer hits, then
the other tools available can be really helpful. The search for obtain a
goal above shows suggestions for achieve/realize/attain a goal – all
good, solid collocations. Another search for the slightly awkward a large
part of them only turned up 17 exact matches, but Ludwig allows a search
for synonyms (by putting an underscore before the word you want synonyms of), which offers some good alternatives shown in frequency order: a large percentage/proportion of them.
The other major issue, of course, is that learners need
to feel that a construction might not be right in order to decide to check it in the first place.
One review of the app, which explicitly highlighted that it had been written by a non-native English speaker using the app itself, still contained a few clear language errors. That's not a criticism of the writer – or even of Ludwig, to be honest
– but it goes to show that it's impossible to be conscious of all your own
errors.
For language research:
Both the basic searches and some of the other tools
available do have an appeal for the ELT writer wanting to check out typical
usage or just search for ideas, but I think the limitations probably outweigh
the benefits.
I was initially unsure whether the searches were lemmatised or not … by that I mean, if you search for take, do you just get results for that exact form, or do you also get takes/taking/took/taken?
It's difficult to be certain with so few search results returned – many of my
searches just seemed to come up with the exact form I'd typed in, but then some
less frequent ones, like obtain my goal above, did seem to show other forms
(obtaining) as 'similar' results. It seems though that exact matches always come up first and they are just that: exact. That's not very helpful for researching most language patterns, where you want to allow at least some
variation. Even if you were searching for a particular tense, say present
perfect, you'd want to allow for has done and have done. Certainly, if you were
looking to compare collocations using the comparison tool, e.g. [take get] a
bus, you wouldn't want to just look at the base form of the verb, you'd want to
compare across all verb forms.
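To show what's at stake with lemmatisation, here's a minimal sketch – toy lemma table and invented sentences; real corpus tools use full morphological dictionaries – of the difference between exact-form and lemmatised matching:

```python
# Map each inflected form to its lemma (toy table, verb 'take' only).
LEMMAS = {
    "take": "take", "takes": "take", "taking": "take",
    "took": "take", "taken": "take",
}

sentences = [
    "She takes the bus to work.",
    "They took the airport bus instead.",
    "I will take the next bus.",
]

def exact_match(query, sents):
    """Return only sentences containing the exact form typed in."""
    return [s for s in sents if query in s.lower().split()]

def lemma_match(query, sents):
    """Return sentences containing any inflected form of the query."""
    target = LEMMAS.get(query, query)
    return [
        s for s in sents
        if any(LEMMAS.get(w.strip(".,"), w) == target
               for w in s.lower().split())
    ]

print(exact_match("take", sentences))  # 1 hit: only "I will take the next bus."
print(lemma_match("take", sentences))  # 3 hits: takes/took/take all match
```

A lemmatised search surfaces every form of the verb at once, which is exactly what you want when comparing collocations like [take get] a bus across tenses.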
Another issue when it comes to searching for language
patterns is allowing for variation. So taking that same example of take/get +
bus, you want to see not just take/get a
bus, but take the bus, take the airport bus, take the next bus, etc. too. Similarly with
verb patterns, you want to allow for negatives have not done and possible
adverbs have already done. It may sound a bit silly but by searching for exact
matches, you only find what you were searching for … when actually what's often
more useful, typical or interesting are the variations you hadn't thought of.
While investigating, I did come up against a number
of unexpected results. So, for example, I searched for have * been which should have shown me the most common words that occur between have and been. A standard corpus search uncovers plenty of examples of have
already/now/also/just/long/not, etc. been, so it was slightly surprising that
Ludwig returned no matches at all. Oddly though, one of the suggested similar
searches, the much less frequent, have * participated, came up with 5 matches
(have also/never/already/not/consistently participated). This just planted a
seed of doubt in my mind about consistency and reliability, but wasn't something I could really explore further
within my limited searches.
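For what it's worth, wildcard queries of this kind aren't hard to support. Here's a rough sketch of how a have * been search can be answered with a regular expression over plain text (invented sentences; no claim about how Ludwig works internally):

```python
import re
from collections import Counter

# Invented sentences for illustration.
sentences = [
    "They have already been told twice.",
    "We have not been informed.",
    "Prices have also been rising.",
    "You have been warned.",  # nothing in the gap, so no match
]

# \w+ captures exactly one word between 'have' and 'been'.
pattern = re.compile(r"\bhave (\w+) been\b", re.IGNORECASE)

hits = Counter()
for s in sentences:
    for middle in pattern.findall(s):
        hits[middle.lower()] += 1

# Frequency-ordered fillers for the wildcard slot.
print(hits.most_common())  # [('already', 1), ('not', 1), ('also', 1)]
```

A tool that can answer have * participated should be able to answer have * been in exactly the same way, which is why the empty result was so puzzling.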
Conclusions:
Overall, I think the idea behind the project is a good
one and the app has some really nice features … but for me, the limitations of
the free version make it fairly unusable and the limitations of the whole thing
make it not worth paying for premium. Certainly in the case of an ELT writer,
you'd be much better off investing a bit of time and your subscription money in
learning to use a standard corpus tool which will give you much more
flexibility and functionality.
Labels: corpus research, online tools
7 Comments:
You've mentioned "standard corpus tool" – which one would you recommend?
I use Sketch Engine mostly: https://www.sketchengine.eu/ I have a subscription which gives me access to a wide range of corpora, but there's a free version that has fewer corpora but the same functionality. And if you're attached to a university in Europe, there's also a scheme which gives free institutional access.
Thank you, I'll check it.
Thank you so much. I really appreciate your advice. I just got a subscription to Sketch Engine via my university access. It is amazing.
Thank you!! Could you please explain the errors present in the review written by the non-native English speaker?
Hi Beatriz,
Firstly, apologies that it took me a while to spot your comment.
It's more than a year since I wrote this post, so I don't remember exactly what I spotted in the review initially, but looking back at it now, here are a few of the issues I'd pick out. It's tricky to highlight the errors and corrections with limited formatting here, but hopefully you can see the corrections I've made.
- The title of the post is a bit odd. "Online linguistic search engine Ludwig helps get your English on" - my first thought is that there's a word missing at the end of the sentence because it doesn't really make sense - unless it's a very clever play on a slang expression which I don't think it is. But I'm guessing maybe the writer is trying to use the phrasal verb "get on" in the sense of "make progress". However, that's an intransitive use - so someone gets on, you can't get sth on.
- "unless The New York Times, *BBC* > *the BBC* ... " ... "such as The New York Times, PLOS ONE, *BBC* > *the BBC* and [a number of] scientific publications."
- "To use Ludwig, people *should* > *need to/have to* type into the Ludwig bar not the sentence they want to translate ..."
- [Within a quote] “Wittgenstein came to a conclusion: *the meaning* > *meaning* is determined by context" [no article]
- "*a large part of them* > *a large proportion of them* (44 percent) *enrolled* > *were enrolled* in a STEM program"
- "an *ads-free* > *ad-free* desktop app"
- "the company would like to *sign* > *establish/develop/form* partnerships with reliable sources"
- "As for leaving Sicily for *the Silicon Valley* > *Silicon Valley*" [no article]
Like I said in my post, most of the errors are quite minor - apart from the headline which is pretty confusing. Several of them are incorrect use of articles which is something that the writer probably didn't notice and so didn't think to check - which is one of the drawbacks to this kind of app - you only tend to check the things you're not sure about and the minor slips go unnoticed.
I hope that answers your question.
Julie
I've only looked at Ludwig when it's come up in a Google search for a word or phrase, and my impression from that limited use is that it's very poor. (I'm a copy-editor, not a learner or an unconfident user of English, but sometimes I want to check something an author has written to confirm or otherwise my feeling that it's 'wrong', or so uncommon that it's best treated as wrong.) Today I searched for 'frustration at' ('about', 'with'). Some related queries popped up in the Google search, including one about 'frustration borne [out] of', which had a Ludwig answer. Of course, Ludwig, or anything sensible, should have said it's not 'borne of', it's 'born of'. But it said 'borne of' was fine and gave some examples of its use.
This review tells me why: it's a corpus and contains unassessed text, which will contain errors. So I think it should come with a strong health warning, and not appear near the top of Google searches. I had formed the impression that it was an extremely amateurish product; perhaps it's just posing as something it isn't.