Scratchpad

A blog, of sorts, intended as a place to experiment, struggle, question, and play with whatever research I am currently working on. The themes will thus change over time as my projects change, and the entries may be quotations that strike my fancy, attempts to puzzle through hairy problems, notes on sources, experiments, musings, dead ends, odd angles of looking at things. It is a voice to my frustrations, discoveries, curiosities, and confusions. It is thinking out loud.

Hasty Thoughts on Plagiarism Detection

10 May 2010

I spent the weekend applying for jobs, and this included putting together a sample syllabus. It's the first full syllabus I've put together, and it turned out pretty damn well, if I do say so myself. But as I thought about it yesterday evening, I realized that there was one especially nagging weakness in it: I failed to build my concern about plagiarism into it. Not that I failed to address it. I've stated clearly that I will fail students for plagiarism, and I indicated that I am fully aware of all of the devious little ways they go about plagiarizing (yes, I even know about all 600 paper mill sites, and I can tell you exactly why using each different flavor will not work). But I didn't actually construct my paper-writing assignment to prevent them from plagiarizing in the first place. Doh. Such a simple thing. At least I thought of it now, before I actually sit down in front of a class.

At any rate, it got me thinking about the tizzy plagiarism has got academia in, and a few random ideas on detection started popping into my mind.

1. A centralized, university-run database of papers
We do everything else consortium style, so why not this? Currently, professors have the option of submitting papers to Turnitin, but many refuse because of intellectual property issues (Turnitin's schtick is that it keeps copies of submitted papers in order to catch sharing between students and copying from other such unpublished sources). So why not run such a service ourselves? This would be especially beefy if we could work out some sort of a deal with the big database companies to let it plug into their backends as well, searching our own collection of papers and their databases of published papers simultaneously. (Honestly, though, Google Scholar is getting better at this every day. At the very least, it can identify a citation we need to track down, even if it won't provide us with the full text for it.) Free paper mill sites should also be scraped and their papers dumped into the database.
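For the curious, here is a rough sketch of how such a database might compare a submission against its stored papers. It is only one possible approach, not how Turnitin or anyone else actually does it: it breaks each text into overlapping runs of words ("shingles") and scores the overlap with Jaccard similarity. The function names, the shingle size `k`, and the `threshold` are all my own assumptions for illustration.

```python
import re

def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of overlap / size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def best_matches(submission, database, k=5, threshold=0.2):
    """Score a submission against every stored paper.

    database is a hypothetical dict of paper_id -> full text.
    Returns (paper_id, score) pairs at or above threshold, best first.
    """
    sub = shingles(submission, k)
    scored = ((pid, jaccard(sub, shingles(text, k)))
              for pid, text in database.items())
    hits = [(pid, score) for pid, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A real service would need something much faster than rescanning every stored paper for each submission, but the basic idea (store the text, compare word runs, flag high overlap) stays the same.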

2. Flood paper mills with dummy papers
To be fair, paper mills are already filled with crap papers, which is what makes it so easy to tell when someone has purchased one. But why not go ahead and intentionally submit more: ones that themselves plagiarize shamelessly from web sources (which makes them easy to catch with a simple Google search), that are simply duplicates of well-known published papers, or that have hidden red flag words or phrases that professors are made aware of in advance? And then loudly publicize the fact that we are doing it? This is basically the academic equivalent of sending undercover 18-year-olds to try to purchase liquor.
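Checking a paper for those planted phrases could be as simple as the sketch below. The phrase list is entirely made up, and I am assuming that lowercasing and collapsing whitespace is enough normalization; a student who rewords the sentence would slip right past it.

```python
import re

def normalize(text):
    """Lowercase the text and collapse any run of whitespace to one space."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def find_red_flags(paper_text, red_flags):
    """Return the planted phrases (hypothetical list) that appear in the paper."""
    haystack = normalize(paper_text)
    return [flag for flag in red_flags if normalize(flag) in haystack]

# Made-up phrases of the kind that might be seeded into dummy papers.
flags = ["cerulean marmot hypothesis", "as Dr. Quillfeather observed"]
paper = "In sum, the Cerulean  Marmot\nhypothesis holds."
print(find_red_flags(paper, flags))
```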

3. Sniff our networks
Yes, I can hear the privacy people screeching now. For what it's worth, I am a huge privacy advocate, and I believe there are ways to do this that do not interfere with privacy. I picture something like: sniff the network for connections or e-mails to or from paper mills; if a connection is found, save the pages transferred; extract the paper from the transmission; add it to the database in #1 or distribute it to faculty. At least in this scenario, it is not necessary or even desirable to identify who is downloading the paper. It's only plagiarism, after all, if someone turns it in, and there is no guarantee either that this will happen or that the person who downloaded it is the one who ultimately submits it. But if someone, anyone, turns it in, we already have a copy ready and waiting for us to compare it to.
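To make the "don't identify anyone" part concrete, here is a toy sketch of the filtering step. It assumes the capture has already happened somewhere else and handed us simple records; the record fields (`client_ip`, `host`, `body`), the host list, and the function name are all hypothetical. The point is only that the client address is never copied into what gets saved.

```python
import hashlib

# Hypothetical list of known paper mill hostnames.
PAPER_MILL_HOSTS = {"papermill.example"}

def archive_paper_mill_pages(records, archive):
    """Keep only page content fetched from paper mills.

    records: iterable of dicts with 'client_ip', 'host', and 'body' keys
             (an assumed format, not any real capture tool's output).
    archive: dict mapping a content hash to {'host', 'body'}.
    The client address is deliberately never stored.
    """
    for rec in records:
        host = rec["host"].lower()
        if host not in PAPER_MILL_HOSTS:
            continue  # not a paper mill: ignore the record entirely
        digest = hashlib.sha256(rec["body"].encode("utf-8")).hexdigest()
        # Identical pages downloaded by different people collapse to one entry.
        archive.setdefault(digest, {"host": host, "body": rec["body"]})
    return archive
```

Whatever ends up in the archive could then be fed into the database from #1 and compared against submissions like any other stored paper.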

That's it for now. I've left out the ways that already exist, thinking instead of ways more efficient than what we already do (seriously, Googling a suspicious paper and compiling painstaking evidence for several hours just sucks, and when you multiply that across multiple papers...well, I don't know about you, but I have other things I would much rather spend my time on). I'm sure there are plenty of other crafty, efficient ways to detect plagiarism that I'm not thinking of in this one-off little thought experiment. Anyone else have any brilliant schemes?