This report, researched during summer 2024, examines the increased use of AI chatbots in academia following the launch of ChatGPT in 2022 and the response by educational institutions through AI detection tools. It discusses the varying effectiveness of these tools in identifying AI-generated content, with issues like false positives and false negatives. The report also highlights ethical concerns around privacy, accuracy, and the use of AI in education. Some institutions are reconsidering the use of detection tools, opting instead to integrate AI into curricula responsibly.
This guide is primarily for universities and other higher education institutions participating in the Google.org Cybersecurity Seminars Program. It is addressed to the Faculty Champions and EDI Champions of these programs. Beyond the Google.org Cybersecurity Seminars, this guide may also be relevant for other organizations involved in practical cybersecurity education.
The launch of ChatGPT in 2022 generated worldwide interest in artificial intelligence (AI) and led to widespread use of AI chatbots, including by students. Following the emergence of AI chatbots, concerns were raised by higher education institutions about “unfair use of artificial intelligence generated content in an academic environment”1 and the “originality and appropriateness of the content generated by the chatbot”.2
To detect and manage the inappropriate or unfair use of such chatbots, AI detection tools have grown in popularity, with standard plagiarism tools, such as Turnitin, pivoting to detect AI-generated content with varying degrees of efficacy and at various price points.3 Most AI detection tools in academia are integrated into broader education platforms, such as Moodle, Canvas, Blackboard, Brightspace, Schoology, or Sakai.4
AI detection tools identify generated text by using pattern matching rather than comparing it to a database, as traditional plagiarism checkers do. Language models are trained on vast amounts of text data to learn probabilistic language rules, which they use to create new content. However, generated text often exhibits predictable patterns, such as consistent sentence structures, overuse of certain conjunctions and vocabulary, and predictable sentence or paragraph lengths. Detection tools aim to spot these patterns and may also incorporate traditional plagiarism checks to identify text that might have been reproduced directly from the model's training data.5
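To make this concrete, the sketch below illustrates two of the statistical signals described above: perplexity (how predictable the text is to a language model) and "burstiness" (variation in sentence length). This is a minimal, illustrative heuristic only; it uses the open "gpt2" model from the Hugging Face transformers library and is not the actual method used by any commercial detection tool, whose scoring pipelines are proprietary and considerably more sophisticated.

```python
# Minimal sketch (not any vendor's actual method): scoring text with
# perplexity and "burstiness" heuristics using an open language model.
# Requires: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

MODEL_NAME = "gpt2"  # small open model, chosen purely for illustration
tokenizer = GPT2TokenizerFast.from_pretrained(MODEL_NAME)
model = GPT2LMHeadModel.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """How 'predictable' the text is to the model; lower values are often
    taken (in simple heuristics) as a weak signal of machine-generated prose."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    return float(torch.exp(out.loss))

def burstiness(text: str) -> float:
    """Standard deviation of sentence length; human writing tends to vary more."""
    sentences = [s for s in text.replace("?", ".").replace("!", ".").split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    var = sum((l - mean) ** 2 for l in lengths) / (len(lengths) - 1)
    return var ** 0.5

sample = "Submitted essay text would go here."
print(f"perplexity={perplexity(sample):.1f}, burstiness={burstiness(sample):.1f}")
```

In practice, any thresholds applied to such scores would have to be calibrated carefully, which is precisely where the false positive and false negative problems discussed later in this report arise.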
When AI detection tools were first released, higher education institutions hastened to integrate them into education platforms. However, most, if not all, AI detection tools can be circumvented given enough time and effort.6 Some higher education institutions are therefore reversing their decision to use AI detectors. In 2023, Vanderbilt, Michigan State, Northwestern, and the University of Texas at Austin disabled their Turnitin AI detectors, citing the effectiveness problems discussed above.7 Other education institutions are likely to follow suit, as detection tools may be causing more problems than they solve.8 Some academic institutions are not only disabling AI detection tools, but finding ways to incorporate LLMs ethically and productively into their curricula.9
Moreover, new “humanizer” tools have been released to enable LLM users to bypass AI detection tools through “rephrasing sentences, altering structures, and incorporating varied vocabulary”, which significantly reduces the likelihood of AI detection.10 Initial research suggests that paraphrasing tools significantly complicate AI detection.11 For example, the Washington Post found that Turnitin struggles to identify AI-generated content when the text mixes human- and AI-generated content through paraphrasing tools.12
Although Turnitin has added an AI paraphrasing detection feature to its AI detection tool,13 such countermeasures create a difficult market context for AI detection, with other companies pivoting to other business models14 or closing.15
A selection of major AI detection tools is listed below in alphabetical order. We have also included publicly accessible information regarding the efficacy of detection, education platform integration, pricing (in USD), and release and/or update date. Note that most of the AI detection tools listed below are effective mainly against ChatGPT-3.5.
| AI Detection Tool | Is there integration into education platforms? | Pricing (USD) | Date released/updated |
| --- | --- | --- | --- |
| Compilatio | Yes: Moodle, Brightspace, Canvas, Microsoft Teams, Blackboard, Open LMS | No information found16 | February 2023 |
| Content at Scale | Yes: limited information | $49/month17 | No information |
| Content Detector AI | No information | No information found | 202318 |
| Copyleaks | Yes: Moodle, Canvas, Blackboard, Brightspace, Schoology, Sakai | $7.99-$13.99/month19 | January 2023 |
| Crossplag | No information | $7-$100/month20 | January 2023 |
| Detect GPT | No information | $7-$29/month21 | No information |
| Duplichecker | No information | $110-$2000/year22 | 2024 |
| Go Winston | No information | $12-$32/month23 | February 2023 |
| GPT-Zero | Yes: Canvas, Coursify.me, K16 solutions, NewsGuard | $10-$23/month24 | January 2023 |
| Originality | Yes: Moodle, Scribbr | $14.95-$30/month25 | November 2022 |
| Plagiarism Detector (AI detection) | No information | $110-$330/year26 | No information |
| Quillbot | Yes: no public details on which platforms | $0-$8.33/month27 | No information |
| Sapling | Unclear | $0-$12/month28 | January 2023 |
| Scispace | Likely, but little information available | $0-$8/month29 | No information |
| Turnitin | Yes: Brightspace, Scribbr | $3/student/year30 | April 2023 |
| Undetectable AI | No information | $5-$14.99/month31 | May 2023 |
| Wordtune | Likely, but little information available | $0-$9.99/month32 | January 2023 |
| Writer’s AI detector | No information | $0-$18/month33 | No information |
| ZeroGPT | Yes: no public details on which platforms | $0-$18.99/month34 | January 2023 |
In the context of AI detection tools, false positives occur when an AI detection tool incorrectly identifies submitted content as generated by AI. Some studies indicate that AI detection tools have a high false positive rate, and only a few AI detection tools have significantly low rates of false positive detection.35 In an academic setting, this may mean incorrectly flagging student work as generated by AI when it is, in fact, human-written. Detection also differs depending on which AI model generated the submitted text and which detection tool is used, with results varying across studies.36 In addition, content by non-native English speakers is more likely to be incorrectly classified as AI-generated, a clear problem for educational institutions with students from diverse backgrounds.37
In the context of AI detection tools, false negatives occur when an AI detection tool fails to identify submitted content as generated by AI. Some tools have shown low sensitivity, correctly identifying barely 15% of submitted samples as AI-generated,38 while others demonstrate near-perfect classification of human-written content and misclassify only 3% of AI-generated samples.39 In general, accuracy varies widely depending on which AI detection tool is used. One study suggests that only two of the main AI detection tools correctly classified all 126 samples as either AI- or human-generated.40 Other researchers claim that AI detection tools produce more false negatives when analyzing more sophisticated language.41
In general, the effectiveness of AI detection tools varies depending on which tool is used and against which model. One study found that AI detection tools are more effective on ChatGPT-3.5 content and less so on ChatGPT-4, except for Copyleaks, Turnitin, and Originality.ai, which had greater than 83% accuracy in detecting ChatGPT-4 content.42 This study concluded that “a detector’s free or paid status is not a good indicator of its accuracy”,43 although contrasting findings (with a small sample size) tentatively suggest that paid AI detection tools may perform better than free ones.44 Studies also generally focus on the effectiveness of AI detection tools against ChatGPT, ignoring other LLMs. This may be due to the greater popularity of OpenAI’s models compared to others such as Gemini, Mistral, or Command.
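The false positive rate, false negative rate, and accuracy figures cited above can be computed from a simple confusion matrix over an evaluation set of labeled samples. The sketch below shows how such metrics are typically derived; the ground-truth labels and detector verdicts here are entirely hypothetical and are not results from any of the studies referenced in this report.

```python
# Minimal sketch: computing false positive rate, false negative rate, and
# accuracy for a hypothetical detector evaluation. The labels below are
# made up purely for illustration, not drawn from any study.
from sklearn.metrics import confusion_matrix

# 1 = AI-generated, 0 = human-written (hypothetical ground truth and predictions)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]  # detector's verdicts

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
false_positive_rate = fp / (fp + tn)   # human work wrongly flagged as AI
false_negative_rate = fn / (fn + tp)   # AI-generated work missed
accuracy = (tp + tn) / len(y_true)

print(f"FPR={false_positive_rate:.2f}, FNR={false_negative_rate:.2f}, "
      f"accuracy={accuracy:.2f}")
```

Because the costs of the two error types differ sharply in an academic setting (a false positive can mean wrongly accusing a student of misconduct), aggregate accuracy alone is a poor basis for institutional decisions about these tools.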
The use of AI chatbots in academia raises significant ethical questions, beginning with reputational damage for both students and higher education institutions. For students, failing to disclose the use of AI-generated content and passing it off as their own can harm their ongoing education and future careers. Universities can similarly face accusations of enabling plagiarism, cheating, and failing to uphold academic integrity.
However, the use of AI detection tools without proper safeguards generates equally significant concerns around privacy and consent, especially regarding the contractual arrangements between universities and tool providers. Such concerns include what happens to uploaded content, how it is stored, and whether students have consented to uploaded content being used as future training data.
Furthermore, as the previous section discussed, AI detection tools may misidentify human-written content as AI-generated (false positives) or fail to detect AI-generated text (false negatives). Accuracy varies widely, with some tools performing better only against ChatGPT-3.5. Finally, detection tools play a cat-and-mouse game with methods to evade detection, including software that specifically generates content designed to be undetectable by standard AI detection tools.45
AI detection tools also contribute to broader debates around access, equity, and environmental impact. Students may be using AI to support translation and comprehension of coursework, especially if they are studying in an English-speaking country and are from a non-English-speaking or other minoritized background with historically fewer opportunities for university education. Access issues also arise from the commercial availability of LLMs: better-off students may be able to pay for more sophisticated models and/or feed their work through multiple LLMs, meaning that the chances of detection drop significantly.46
The Google.org Cybersecurity Seminars Program supports cybersecurity seminar courses in selected universities and other eligible higher education institutions in Europe, the Middle East, and Africa, to help students learn more about cybersecurity and explore pathways in the field. The program actively supports the expansion of cybersecurity training in universities, to build the diverse workforce needed to help the most vulnerable organizations prevent potential cyberattacks. It also addresses new risks from artificial intelligence (AI), providing students with an understanding of AI-based changes to the cyber threat landscape and helping them effectively integrate AI into practical cybersecurity measures.
Participating universities are expected to actively promote equality, diversity, and inclusion within their programs. They should encourage the strong participation of individuals from diverse backgrounds and create an inclusive environment for education, thereby enriching the overall learning experience and strengthening the cybersecurity community.