From Hallucinations to Homework Helpers: How One AI Writes Academic Papers with Real Citations

Academic writing is tough. Between crafting a coherent argument, locating the right supporting paper, and citing it all correctly in BibTeX format, it can feel like a puzzle with too many missing pieces.

Now imagine having an AI assistant that not only helps you write fluent, structured paragraphs but also pulls up exactly the right references and inserts properly formatted citations as you go.

Sound science-fictiony? Meet ScholarCopilot—a fresh new AI system designed specifically for academic writing. It doesn't just write like a pro; it cites like one too.

In today’s post, we’re breaking down a recent research paper by Yubo Wang and colleagues that introduces this game-changing AI system. We’ll explain what makes ScholarCopilot different, why it matters, and how it could reshape how researchers tackle the blank page.

Let’s get into it.


The Citation Struggle Is Real

Language models like GPT-4 can already write impressively fluent academic text. But there’s a catch: they often make up citations.

This issue, called citation hallucination, is more than a minor glitch: it undermines trust in AI for serious academic work. You don’t want to realize halfway through writing that the “citation” supporting your argument doesn’t exist.

To combat this, a popular approach called Retrieval-Augmented Generation (RAG for short) retrieves real documents from a database before generating text. So instead of guessing, the AI pulls in relevant info to ensure what it says is credible.

But there’s a big problem with traditional RAG systems: they're kind of robotic. They retrieve everything at the start—before they even know where the paper is going—making them inflexible. If your writing shifts to a new topic, the AI won’t know it needs to pull in new sources unless you manually prompt it.

ScholarCopilot changes all that.


Reinventing RAG: ScholarCopilot Enters the Chat

ScholarCopilot isn’t just another retrieval-augmented model. It does two very clever things:

  1. Dynamic Retrieval: As the AI writes, it decides on the fly when it needs to pull in a citation by inserting a special token: [RET]. That’s like the AI saying, “Hold on, I need to look something up.”

  2. Joint Training: ScholarCopilot learns to generate academic content and fetch the right citations at the same time—not as two separate steps. This dual training helps the model better understand the connection between what it writes and the references it needs.
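
The post doesn't spell out the training objective, so treat this as a rough sketch of one common way to train writing and retrieval together: add a next-token loss to a contrastive retrieval loss. Every function name here is made up for illustration, and the vectors are toy numbers, not anything from the paper.

```python
import math


def next_token_loss(correct_token_probs):
    """Average negative log-likelihood the model assigned to the correct tokens."""
    return -sum(math.log(p) for p in correct_token_probs) / len(correct_token_probs)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(query_vec, positive_vec, negative_vecs):
    """Softmax cross-entropy that rewards scoring the true reference above distractors."""
    scores = [dot(query_vec, positive_vec)] + [dot(query_vec, n) for n in negative_vecs]
    top = max(scores)
    log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum_exp - scores[0]


def joint_loss(correct_token_probs, query_vec, positive_vec, negative_vecs):
    """Assumed combination: a plain sum, so one update improves writing and citing."""
    return next_token_loss(correct_token_probs) + contrastive_loss(
        query_vec, positive_vec, negative_vecs
    )


# Toy example: the query vector points at the true reference, not the distractor.
loss = joint_loss([0.5, 0.5], [1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
print(round(loss, 3))
```

Because both terms feed the same number, the model cannot get better at one job while ignoring the other, which is the intuition behind training them jointly.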

So instead of a rigid "first retrieve, then write" pipeline, ScholarCopilot thinks, writes, and cites iteratively—just like a real researcher.
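
To make the [RET] idea concrete, here is a toy version of the write-then-look-up loop. It is only a sketch under heavy assumptions: the "model" is a scripted list of tokens, the corpus and citation keys are invented, and simple word overlap stands in for whatever similarity search the real system uses.

```python
RET = "[RET]"

# Invented mini-corpus: citation key -> a few words describing the paper.
CORPUS = {
    "ref_rag": "retrieval augmented generation for knowledge intensive tasks",
    "ref_bm25": "probabilistic relevance framework and bm25 ranking",
}


def retrieve(recent_tokens, corpus):
    """Return the key whose description shares the most words with the recent text."""
    query_words = {token.lower() for token in recent_tokens}

    def overlap(key):
        return len(query_words & set(corpus[key].split()))

    return max(corpus, key=overlap)


def write_with_citations(token_stream, corpus, window=8):
    """Copy tokens to the draft; on [RET], look up a reference using the latest words."""
    draft = []
    for token in token_stream:
        if token == RET:
            key = retrieve(draft[-window:], corpus)
            draft.append(f"\\cite{{{key}}}")
        else:
            draft.append(token)
    return " ".join(draft)


# The scripted "model output" changes topic halfway through.
scripted_tokens = (
    "Retrieval augmented generation grounds text in documents [RET] . "
    "Keyword ranking with bm25 is a common baseline [RET] ."
).split()

print(write_with_citations(scripted_tokens, CORPUS))
```

Each [RET] is resolved against the words written just before it, so the second lookup follows the topic shift instead of reusing whatever was fetched at the start.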


How ScholarCopilot Works (Minus the Jargon)

To train this model, the researchers gathered a giant dataset: 500,000 computer science papers from arXiv, complete with their BibTeX citations.

Here’s what they did:

  • Formatted the raw LaTeX into structured sections (like intro, related work, citations).
  • Used another language model to extract clean paper titles from messy BibTeX entries.
  • Matched those titles to real papers in arXiv and Semantic Scholar to ensure the references were valid.
  • Trained ScholarCopilot to:
    • Write coherent academic text.
    • Detect when a citation is needed.
    • Retrieve the relevant reference from a database.
    • Insert that citation into the text—accurately.
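
The title-matching step above can be pictured with a small sketch. This is an assumption about how such matching might work (exact match after normalization), not the authors' actual pipeline; the paper IDs, titles, and function names are invented for illustration.

```python
import re


def normalize_title(title):
    """Lowercase, drop LaTeX braces and punctuation, collapse whitespace."""
    title = re.sub(r"[{}]", "", title)
    title = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(title.split())


def match_references(extracted_titles, known_papers):
    """Split extracted titles into those found in the database and those that are not."""
    index = {normalize_title(t): paper_id for paper_id, t in known_papers.items()}
    matched, unmatched = {}, []
    for title in extracted_titles:
        paper_id = index.get(normalize_title(title))
        if paper_id is None:
            unmatched.append(title)
        else:
            matched[title] = paper_id
    return matched, unmatched


# Invented database entry and two titles pulled from messy BibTeX.
known_papers = {"paper-001": "A Toy Study of Citation Retrieval"}
extracted = ["{A} Toy Study of {C}itation Retrieval.", "A Paper Nobody Wrote"]

matched, unmatched = match_references(extracted, known_papers)
print(matched)
print(unmatched)
```

Only references that land in the matched group would be kept as training targets, which is what keeps invented citations out of the data.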

And yes, those citations aren’t just window dressing. ScholarCopilot can even incorporate details from the referenced paper into the prose.

It’s like co-authoring a paper with an AI that also happens to be a lightning-fast librarian.


Does It Actually Work? The Results Say Yes

ScholarCopilot wasn’t just tested in theory. The researchers ran a battery of evaluations to see how it compared with other top models like Qwen-2.5-72B (a huge 72-billion-parameter model) and retrieval baselines like BM25 and E5-Mistral-7B-Instruct.

Here’s how it stacked up:

📌 Citation Retrieval Accuracy:

  • ScholarCopilot ranked the correct reference first (top-1 accuracy) 40.1% of the time.
  • That’s roughly 4x BM25 (9.8%) and more than 2.5x E5-Mistral-7B-Instruct (15.0%).
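
If "top-1 accuracy" is unfamiliar, this small sketch shows how a number like that is typically computed. The rankings and paper IDs are made up; only the counting logic matters.

```python
def top_k_accuracy(ranked_results, gold_refs, k=1):
    """Share of queries whose correct reference appears in the first k results."""
    hits = sum(
        1 for ranked, gold in zip(ranked_results, gold_refs) if gold in ranked[:k]
    )
    return hits / len(gold_refs)


# Four made-up citation lookups: each list is a retriever's ranking, best first.
ranked_results = [
    ["p1", "p2", "p3"],
    ["p2", "p1", "p3"],
    ["p3", "p1", "p2"],
    ["p1", "p3", "p2"],
]
gold_refs = ["p1", "p1", "p2", "p3"]

print(top_k_accuracy(ranked_results, gold_refs, k=1))  # 0.25
print(top_k_accuracy(ranked_results, gold_refs, k=2))  # 0.75
```

Top-1 is the strictest version of the metric: the right paper has to be the very first suggestion, not just somewhere on the list.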

📝 Generation Quality:

Scored across 5 dimensions (relevance, coherence, rigor, completeness, innovation), ScholarCopilot earned:

  • 16.2 / 25 points, edging out the much larger Qwen-2.5-72B (15.8) despite being a far smaller model.

🧠 Human Studies:

Ten experienced researchers tried the tool and rated it:

  • Citation Accuracy: 4.6 / 5
  • Writing Style: 4.5 / 5
  • Likelihood to Use Again: 4.1 / 5

Bonus? ScholarCopilot beat ChatGPT in almost every area related to citation quality and academic tone.


More Than Just a Smart Typist

What makes ScholarCopilot shine isn’t just the model—it’s the experience.

The system lets you:

  • Write incrementally, triggering citations at specific points.
  • See and edit retrieved abstracts as references are pulled in.
  • Stay in control, guiding retrieval or letting the AI lead.

Researchers especially appreciated how it handled the dreaded related work section, a part that often requires sifting through tons of papers and paraphrasing related studies. ScholarCopilot streamlined that process significantly.

One student called the AI “like an unusually helpful grad student who's read a thousand papers and never sleeps.”


But Wait—There Are Some Caveats

ScholarCopilot is promising, but it’s not perfect. Current limitations include:

  • Domain-specific: It’s trained mainly on computer science papers. Don’t expect it to quote Shakespeare or handle biology questions (yet).
  • Only works on certain paper sections: Currently focused on introductions and related work.
  • Not great at innovation: It excels at mimicking academic writing, but it's not here to brainstorm radically new research ideas.

Also, speed was a mixed bag in the user study. With limited hardware, response times varied—something the team hopes to fix with better server support and optimization.


ScholarCopilot Could Change How Research Gets Written

The implications of ScholarCopilot go beyond convenience.

Imagine:

  • Graduate students reducing hours spent Googling citation candidates.
  • Professors drafting literature reviews with smarter AI support.
  • Non-native English speakers getting native-level academic tone with trustworthy references.

In short, ScholarCopilot isn’t just a writing tool—it’s a productivity booster, research assistant, and full-stack citation engine rolled into one.

As the team continues expanding its capabilities to more disciplines and sections (like methods and experiments), tools like this could redefine academic writing workflows.

It won’t replace human insight or creativity, but when it comes to academic accuracy and efficiency? It’s a game-changer.


Key Takeaways

  • ScholarCopilot is a new AI system built specifically for academic writing.
    • It helps generate content while pulling in real, relevant, and properly formatted citations.
  • Its secret weapon? Dynamically inserting [RET] tokens while it writes to signal that it needs to retrieve a supporting reference.
  • Joint training means it learns how to write and cite in sync, unlike previous systems that treat these as separate problems.
  • On key benchmarks, ScholarCopilot beat retrieval baselines on citation accuracy and a much larger model on writing quality.
  • Researchers love it: 100% of users in the study rated it better than ChatGPT for citations, and 70% preferred it overall.
  • While it still has limitations (domain, creativity, speed), it’s a major leap forward in making AI writing tools actually useful for serious research.


If you're using AI to help with academic work, here’s something to try: structure your prompts more like this system does—asking the AI to “cite from real papers related to [X]” and encouraging step-by-step references. It won’t match ScholarCopilot's precision, but it'll inch you closer.
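
Here is one way that tip could look as a reusable prompt builder. The wording of each step is just a suggestion and the function name is invented; adjust it to whatever assistant you use.

```python
def build_citation_prompt(topic, section="related work"):
    """Assemble a numbered prompt that asks for real citations one step at a time."""
    steps = [
        f"Write the {section} section of a paper about {topic}.",
        f"Cite from real papers related to {topic}.",
        "After each claim that needs support, name the paper you are citing.",
        "Finish with a BibTeX entry for every paper you cited.",
    ]
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))


print(build_citation_prompt("citation retrieval"))
```

Numbering the steps nudges the assistant to attach a reference at each point where one is needed, rather than tacking a list on at the end.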

With tools like ScholarCopilot on the horizon, we’re not far from a future where writing the first draft isn’t the hardest part—it’s deciding what groundbreaking idea you want to explore next.


Interested in trying a version of ScholarCopilot or discussing the future of AI-powered research tools? Drop us a comment below or share your thoughts on Twitter/X!

And if you're a graduate student still formatting citations by hand... maybe it's time to find yourself a co-pilot. ✍️🤖📚

Stephen, Founder of The Prompt Index

About the Author

Stephen is the founder of The Prompt Index, the #1 AI resource platform. With a background in sales, data analysis, and artificial intelligence, Stephen has successfully leveraged AI to build a free platform that helps others integrate artificial intelligence into their lives.