How do plagiarism-checkers work?

Every piece of text-matching software has its own approach, but most work on the same basic principle: check the submitted content against a database of source material and look for similarities. Given the vast amount of content that could potentially be plagiarized, though, this is not a trivial task. A simple line-by-line comparison against every source would be impractically slow and resource-intensive.

That’s why most tools that check text for plagiarism use fingerprinting. For every piece of text in the database and every piece of text they check, they extract sets of samples and run each one through a hashing algorithm, which turns each sample into a short identifier. Identical samples always produce identical identifiers, and different samples almost never do. If a paper has a fingerprint that’s identical to one in the database, the two texts very likely share the sampled passage, and the paper may contain plagiarism. Because only samples are compared rather than the full texts, this unavoidably results in lower accuracy, but a good fingerprinting algorithm can take samples from the paper in such a way that it detects not only exact matches but also plagiarism where some of the content has been altered, such as by a spinning program.

If the program finds a fingerprint match, it may simply flag a possible instance of plagiarism and call it a day. Higher-quality software, though, will often then use direct string matching to compare the texts line by line, a task that becomes much lighter computationally once the database has been narrowed down to a few candidate sources. This helps confirm the initial fingerprint hits and provides a lot more data for the humans making the ultimate decision.
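To make the two-stage idea more concrete, here is a minimal Python sketch. It is not the method of any particular checker, since those are rarely published. It assumes word-level samples of five words ("shingles"), SHA-1 hashing truncated to 48 bits, and a winnowing-style rule that keeps the smallest hash in each sliding window. The function names and the values of k, window, and min_words are all illustrative choices, not anything taken from a real product.

```python
import hashlib
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and reduce it to a list of plain words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingle_hashes(words, k=5):
    """Hash every run of k consecutive words (one 'sample' per position)."""
    hashes = []
    for i in range(len(words) - k + 1):
        shingle = " ".join(words[i:i + k])
        digest = hashlib.sha1(shingle.encode("utf-8")).hexdigest()
        # Truncated to 48 bits: collisions are unlikely but not impossible.
        hashes.append(int(digest[:12], 16))
    return hashes


def fingerprint(text, k=5, window=4):
    """Keep only the smallest hash from each sliding window of hashes."""
    hashes = shingle_hashes(normalize(text), k)
    if not hashes:
        return set()
    if len(hashes) <= window:
        return {min(hashes)}
    return {min(hashes[i:i + window])
            for i in range(len(hashes) - window + 1)}


def overlap(suspect, source, k=5, window=4):
    """Fraction of the suspect's fingerprint that also occurs in the source."""
    fp_suspect = fingerprint(suspect, k, window)
    fp_source = fingerprint(source, k, window)
    if not fp_suspect:
        return 0.0
    return len(fp_suspect & fp_source) / len(fp_suspect)


def confirm(suspect, source, min_words=5):
    """Second pass: direct matching that lists the shared runs of words."""
    a, b = normalize(suspect), normalize(source)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [" ".join(a[m.a:m.a + m.size])
            for m in matcher.get_matching_blocks() if m.size >= min_words]


if __name__ == "__main__":
    source = ("The quick brown fox jumps over the lazy dog while the cat "
              "sleeps on the warm windowsill all afternoon")
    suspect = source.replace("cat", "kitten")  # one altered word
    print(overlap(suspect, source))
    print(confirm(suspect, source))
```

The cheap fingerprint comparison is what would run against the whole database. The slower confirm step only makes sense for the handful of sources whose fingerprints overlap, which is why the second pass becomes affordable.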

Things to Look for in a Good Plagiarism Checker

A plagiarism checker should have:

Privacy policy

Many free (or, more often, freemium) plagiarism checkers are legitimate, making money through ads or by selling a premium version. However, some of the less-scrupulous ones may actually be taking the writing you check and using it for their own purposes. It may end up being used as content on a study website or being run through a “spinner” to change its wording and be put up as an article to generate traffic. It’s a good idea to check the privacy policy and do a quick check on the site’s reputation. Especially do this if it seems a bit sketchy or too good to be true.

Database

If a plagiarism checker doesn’t have access to the right source material, it won’t be able to tell when that material is plagiarized. This is typically the biggest thing that separates lower-quality plagiarism checkers from their premium counterparts. Getting access to collections of books, articles, and other content that is owned by someone else isn’t free or easy, so many tools can only check the Internet. That is where a lot of plagiarism happens, though, so having access to books, journal articles, or other private materials is most important if you’re checking for plagiarism that someone might have put a bit more work into.

Algorithm

Most plagiarism checkers don’t explicitly reveal their algorithms, but the quality and accuracy of the results are a good indicator of how well built they are. This can be difficult to measure directly, but looking at how much detail a tool returns, reading user reviews, and testing whether it can detect material you copy from other sources can give you a good idea of how comprehensively the site searches. If the free version fails to pick up a copy-paste from a Wikipedia article, for example, you probably can’t expect the paid version to be very thorough.

The Best Plagiarism Checker

Professional-level plagiarism checkers almost all come at a price, and most of the free options available are either worse than Google or have privacy policies that imply they might be using your content for their own purposes. The best you’re likely to get for free is either a few trial pages or a basic report that only tells you whether plagiarism is present. The latter can still be useful, since it gives you a quick way to decide whether to use a more in-depth tool or go through a paper manually. I tested each of the tools below using several texts (articles I’ve written, Wikipedia entries, and news sources), and all of them were able to accurately identify plagiarized content, along with the sources. I also tested quite a few completely free sites, but many of them were unable to identify passages from my articles and even failed to catch copy-pastes from the BBC and Wikipedia, despite a quick Google search turning up the plagiarized content immediately.

1. Google

If there’s a specific piece of text you suspect of having been plagiarized, Google is actually a great first stop. You can only search for 32 words at a time, but that can often be enough to turn up the website, paper, or book that someone has copied from, even if they’ve altered a few words.
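If you want to check a longer passage this way, you have to break it into pieces that fit under the limit. The following is a small Python sketch of that chunking step only; it does not contact Google or any other service. The function name search_phrases and the exact flag are made up for illustration, and the 32-word figure is simply the limit mentioned above.

```python
import re

MAX_QUERY_WORDS = 32  # the per-search word limit mentioned in the text


def search_phrases(text, max_words=MAX_QUERY_WORDS, exact=False):
    """Split text into sentences, then into phrases of at most max_words words.

    With exact=True each phrase is wrapped in double quotes, which search
    engines commonly treat as an exact-phrase query. Leave it False if you
    suspect a few words have been altered.
    """
    phrases = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = sentence.split()
        for i in range(0, len(words), max_words):
            phrase = " ".join(words[i:i + max_words])
            if phrase:
                phrases.append(f'"{phrase}"' if exact else phrase)
    return phrases


if __name__ == "__main__":
    sample = "This is the first suspicious sentence. Here is a second one!"
    for phrase in search_phrases(sample, exact=True):
        print(phrase)
```

Splitting on sentence boundaries first keeps each query readable, at the cost of producing a few more searches than a plain 32-word split would.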

2. Grammarly

Grammarly requires a subscription to its editing service for you to get full plagiarism results, but there’s no charge for the initial check, which tells you whether there’s likely to be plagiarism or not. That’s more than you get with a lot of other apps, and I found it correctly flagged plagiarism most of the time, making it a good first-line free option.

3. SearchEngineReports

It’s basically a Google wrapper, but it’s free and actually works better than a lot of other free options. It got most of what I put into it correct. SearchEngineReports lets you check up to 2,000 words of text per search (with no upper limit on the number of searches) and runs it through Google piece by piece, telling you which sentences produce hits. It also gives you the option to rewrite plagiarized content to avoid future detection, which I don’t advise doing.

4. Copyleaks

Copyleaks gives you 2,500 words, or about 10 pages, of free checking. It’s pretty widely used, has a user-friendly interface, and includes a large database of academic and scientific work to check against. If you need to go beyond Internet content, this is a reliable place to start. It got all the online content I threw at it.

5. Quetext

You get three 500-word checks for free, and after that you have to subscribe. Quetext has a good reputation for accuracy and thoroughness, though, and, accordingly, it performed well in my tests. Its database includes a lot of books and articles as well as Internet content. If you’re looking for something comprehensive but cheaper than Copyleaks, Quetext is a good place to start.

6. PlagScan

PlagScan has an extensive database of books, articles, and other texts and returns a detailed analysis that, for me, identified just about all of the plagiarized sources. The free trial is good for 2,000 words, after which you’ll have to buy credits to continue. If you don’t have a huge amount of text to check, the system of buying credits to check a certain number of words could end up being cheaper than the subscription options offered by most other plagiarism checkers.

There’s no magic bullet

Plagiarism checkers, especially the budget ones, almost certainly won’t be able to catch everything. If a plagiarist uses obscure sources or rewrites enough, there’s not much a machine can do to flag them, and even knowledgeable humans can be fooled. Checkers can still be a good first line of defense, though, and can at least deter low-effort plagiarism.