Can We Use AI to Detect AI Bots on Social Media?

research

Published

February 3, 2026

Short answer: We can reliably detect obvious spam bots using large language models. However, we do not know if we can reliably detect human-like AI accounts.

This post summarizes our findings from the November 2025 Apart Research def/acc hackathon, during which Andreas Raaskov and I tested whether large language models (LLMs) could help detect social media bots on Bluesky. One of our key findings: when bots are not obvious, the prompts we use to detect them strongly influence the detection results.

Our central question: Using only publicly visible content and metadata, can we detect an AI-generated account that mimics a human user?

In brief:

Existing algorithmic methods and our LLM-based methods worked well on obvious spam and scam bots.
These methods failed on the hard cases: accounts behaving like ordinary, engaged humans.
LLM-based detection varies wildly depending on prompt wording (from ~9% to ~91% of the same accounts flagged).
Without reliable ground truth, detection results are not interpretable.
A major risk today is false positives.

Why Bot Detection Is Getting Harder

Bot detection used to be straightforward. Automated accounts posted constantly, followed thousands of users at once, or repeated the same promotional text. Existing tools are very good at catching that behavior. We found that low-cost LLMs were also able to detect this kind of bot behavior.

Recently, the quality of bot text generation has improved dramatically. Modern LLMs can produce short, context-appropriate replies that look indistinguishable from normal human participation, especially in settings like social media threads. An account powered by an LLM does not need to post frequently, coordinate with other accounts, or impersonate a specific individual to be disruptive. LLMs can simply participate.

This change in bot text quality raises the question we set out to test.

Our Approach

To examine this question, we chose to look at Bluesky posts because its data is openly accessible via the AT Protocol and the platform functions as a high-engagement public discussion space.

We evaluated four detection approaches: 1. Follower / following patterns 2. Posting frequency and timing 3. Text-based heuristics (repetition, templating, typical AI phrases) 4. LLMs used as judges, via zero-shot prompts (asking a model to judge without prior examples) asking whether an account appeared AI-generated

A key challenge with LLM-based detection is prompt sensitivity: the exact wording of the question can dramatically change results. We tested multiple prompt variations to examine this effect.

We tested these on two datasets:

A known bot list, consisting mostly of obvious spam and scam accounts
A real-world sample of 1,000 accounts that had replied to popular posts in active discussion threads

The second sample was significantly more challenging to analyze, because it contained accounts that looked human-generated.

Obvious Cases Were Easily Detectable

Against the known bot list, all of our methods worked.

Algorithmic methods flagged accounts with non-human posting rates, suspicious follower ratios, and repetitive promotional content. LLM-based judges also detected nearly all of these accounts, across multiple prompt variations. Our findings aligned with prior work and deployed tools, including algorithmic bot detectors, keyword-based heuristics, and network analysis approaches, all of which are generally effective against high-volume spam and coordinated campaigns.

This result is reassuring, but not surprising. These are exactly the kinds of bots that existing detection systems are designed to catch. LLMs turn out to be quite good at detecting them too.

Detection Failed for the Difficult Cases

When we applied the same methods to the 1,000 accounts drawn from active reply threads, the results changed dramatically.

Algorithmic Methods:

Algorithmic methods detected almost no bots in this sample.

Follower ratios looked normal. Posting rates fell within human ranges. Text analysis found little repetition or templating. Either there were no sophisticated bots in this sample—or the bots were behaving exactly like humans.

LLM-Based Detection:

LLM-based detection gave unstable responses.

Different prompts produced radically different results:

One prompt flagged ~16% of accounts as bots
Another flagged ~21%
Another flagged ~29%
A highly specific prompt flagged ~91%

The chart in the center shows how the same accounts receive dramatically different classifications depending only on prompt wording.

Prompt Sensitivity Is a Core Problem

The variability in results reveals a deeper issue.

When accounts fall into an ambiguous middle ground—posting frequently, expressing opinions, engaging in debate—LLMs do not have a stable basis for classification. Prompting becomes prompt steering: asking the model to be more or less suspicious directly changes the outcome.

In one case, an account was flagged as “likely AI” by all four prompts. Manual review suggested the posts came from a politically engaged human who posting frequently and contextually. The LLM had mistaken this user’s enthusiasm for automation. Without ground truth, we cannot tell which prompt, if any, is accurate.

Is 91% detection uncovering hidden bots?
Or is it massively over-flagging humans?
Is 16% conservative and accurate—or simply missing things?

There is no way to know which method is accurate without ground truth - and how can we assess a ground truth about bots designed to be undetectable?

Why This Matters

Two risks: one is that undetected bots flood social media spaces, influencing conversations in ways completely unknown to humans on the platform - a major concern. Another risk is false positives. If detection systems cannot distinguish between human-like AI behavior and normal human participation, they will inevitably mislabel real users—especially those who post frequently, argue passionately, or communicate in unusual styles. Both of these risks pose dangers for platform governance, and user trust.

The Real Bottleneck: Ground Truth

The fundamental barrier for our research is the absence of reliable examples of bots that act like humans.

We do not have a validated dataset of human-like, LLM-generated social media accounts that:

Behave organically over time
Show normal engagement patterns
Avoid obvious spam or coordination signals

Without such a reference, detection results cannot be calibrated, compared, or meaningfully evaluated.

In short: Our intervention was to test LLM-based detection against realistic, ambiguous accounts rather than obvious bots. The main problem we face now is a lack of reliable reference points.

What We’re Doing Next

Our next step is to build a synthetic reference dataset—not to deploy bots on live platforms, but to generate realistic posting histories offline.

The goal is to create controlled examples of accounts that are:

Indistinguishable from humans to casual inspection
Generated using multiple models and prompting styles
Accompanied by realistic metadata and engagement patterns

With such a dataset, we can ask a meaningful question: are there any signals—textual or behavioral—that consistently distinguish human expression from AI-generated participation?

It is possible the answer will be “no.” In that case, knowing that we cannot detect LLM accounts will also be an important finding.

Future extensions to this research that we have considered but have not yet attempted include:

Larger samples across different languages and communities
Few-shot or fine-tuned detection models (using additional training methods to improve the LLM’s detection behavior)
Broader network-level analysis combined with content analysis - potentially partnering with platform operators

We intentionally scoped this work to evaluate zero-shot LLM judgment.

How This Fits with Prior Work

Prior work on social media bot detection has focused largely on algorithmic signals (such as posting frequency or network structure) and, more recently, on deep learning classifiers trained on labeled datasets. These approaches work well for coordinated campaigns and spam bots, but they assume access to reliable ground truth.

Recent experimental work has also shown that modern LLMs can pass controlled Turing-style evaluations, suggesting that text alone may no longer reliably distinguish humans from AI. Work has also been done to show that LLMs can be effective at zero-shot anomaly detection, which we think will be useful to pursue in this context.

Limits and Open Questions

Our work does not show that sophisticated LLM bots are currently operating on Bluesky. If they are, we have not yet found a reliable method for detecting them. Our work shows how fragile current approaches are when applied to real-world, ambiguous cases.

There is also a deeper uncertainty: it may be that human and AI-generated social media behavior has already converged to the point where reliable distinction is impossible using content alone, raising the possibility that human and AI social media behavior may already be converging in these contexts.

Because we lack verified examples of human-like AI accounts, we cannot estimate detection accuracy yet. Testing that is the point of the next phase of research.

Ethical note: We deliberately avoided publishing lists of flagged accounts, with the exception of referencing a published known botlist. Our results showed high false-positive risk, including misclassifying human users as bots. Any future dataset or detection work must prioritize harm minimization, anonymization, and informed consent, especially when classifications can affect real people.

Bottom Line

Current detection methods are effective against obvious bots and spam. These detection methods struggle where the problem matters most - ambiguous cases. This problem will be exacerbated as LLM quality improves.

Open questions: - Are there any stable signals that distinguish human and AI participation at all? If so, might these vanish in the future? - Will platforms need to shift from detection to disclosure or provenance mechanisms? - How could moderation systems handle inevitable ambiguity without harming users?

These are gaps our research aims to help close.

Code:

Github repo for the Hackathon project

Contact: [email protected]

We’d love to hear your ideas. If you’re working on related problems or have thoughts on approaches we should try, please reach out.

Acknowledgements

We would like to thank Apart Research, BlueDot Impact, and Halcyon Futures for hosting the hackathon that initiated this project. Lambda.ai provided compute credits. Li-Lian Ang, Mackenzie Puig-Hall, and the Apart Lab Studio provided valuable feedback on our initial draft.

Tools used in writing this post: Claude Opus 4.5, Claude Sonnet 4.5, ChatGPT 5.2, NotebookLM, and SolveIt.