Technology

A Solo Developer Mapped 10,000 Malware Repos on GitHub. The Bait Is Aimed at AI Coding Agents.

The clones copy a real project, keep the code and contributors, and poison only the README. New repos outrank the originals in search.

Janet Torvalds

June 26, 2026

A developer who writes under the alias Orchid wanted to check whether search engines had indexed one of his GitHub projects. He typed the name into Google and found his repo. He typed the same name into Bing and found someone else's: same name, same description, a full copy of his commit history, with his own account listed as a contributor. An hour earlier, a new commit had added a link to a ZIP archive in the README.

That was the thread he pulled. By the time he published his writeup on June 18, he had found roughly 10,000 GitHub repositories running the same scam, and he had the script to prove it.

The trick is distribution, not the malware

The payload here is not new. What is worth paying attention to is how it gets in front of you.

The operators clone a repository that was created recently, not a popular one. They copy the entire commit history and the contributor list, so the page looks like a real project that has been around for a while. It is not a fork, so GitHub's fork tooling does not group it with the original. The only thing they change is the README, which is stripped of the real instructions and fitted with a link to a ZIP file buried deep in the directory tree, at a path like repo/some/deep/path/project-name-version.zip, so that it reads as an ordinary build artifact.
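The README pattern described above can be checked mechanically. What follows is a minimal sketch, not Orchid's script: the function name, the GitHub hosts in the regex, and the depth threshold of three directories are all assumptions made for illustration, and the example URLs are invented.

```python
import re

# Assumption: the poisoned link points at a .zip hosted on GitHub itself.
ZIP_LINK = re.compile(
    r"https?://(?:github\.com|raw\.githubusercontent\.com)/"
    r"[^\s)\"'<>]+\.zip",
    re.IGNORECASE,
)

# Ordinary download locations that should not be flagged.
ORDINARY = {"releases", "archive"}


def suspicious_zip_links(readme_text, min_depth=3):
    """Return ZIP links buried at least `min_depth` path segments below owner/repo."""
    hits = []
    for url in ZIP_LINK.findall(readme_text):
        # Drop the scheme and host, keep "owner/repo/...".
        path = url.split("://", 1)[1].split("/", 1)[1]
        segments = path.split("/")
        if len(segments) > 2 and segments[2] in ORDINARY:
            continue
        # Everything between owner/repo and the file name counts as depth.
        if len(segments) - 3 >= min_depth:
            hits.append(url)
    return hits
```

A hit is a reason to look closer, not a verdict. Plenty of legitimate projects link to archives, which is why release assets and source archives are skipped here.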

Then they keep poking it. Every few hours the operators delete the repo's last commit and push an identical one back, always with the message "Update README.md." Orchid's read is that the constant churn is meant to ride GitHub's freshness signals and keep the clone near the top of search results. Cloning new repositories instead of established ones is the same idea: a brand-new project name is a low-competition search term, so the clone surfaces first for anyone looking for that exact thing.
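That delete-and-repush loop leaves a recognizable trace: the head commit keeps changing while the history never gets longer. Here is a rough sketch of a check for it, assuming you have periodic snapshots of a repository's default branch. The Snapshot record and the threshold of three rewrites are hypothetical, not taken from Orchid's writeup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One observation of a repo's default branch (hypothetical record)."""
    commit_count: int
    head_sha: str
    head_message: str


def looks_like_churn(snapshots, min_rewrites=3, message="Update README.md"):
    """True if the head commit is replaced repeatedly while history length stays flat."""
    rewrites = 0
    for prev, cur in zip(snapshots, snapshots[1:]):
        same_length = cur.commit_count == prev.commit_count
        new_head = cur.head_sha != prev.head_sha
        if same_length and new_head and cur.head_message == message:
            rewrites += 1
    return rewrites >= min_rewrites
```

A project under normal development adds commits, so its count grows between snapshots and the check stays quiet.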

What is in the ZIP

The archive is consistent across the campaign. It holds a one-line batch launcher (Application.cmd or Launcher.cmd), a renamed LuaJIT executable (loader.exe, luajit.exe, boot.exe, names rotate), an obfuscated Lua script hidden under a benign .txt or .log extension, and in some variants a lua51.dll. Run the batch file and it starts the LuaJIT interpreter with the obfuscated script as its argument.
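Because the layout is so consistent, an archive can be screened by listing its members without extracting or running anything. The sketch below checks only for the three pieces named above: a .cmd launcher, an .exe, and a .txt or .log file. The function names are illustrative, and matching on extensions alone is an assumption that will produce false positives on some legitimate Windows bundles.

```python
import zipfile

LAUNCHER_EXT = (".cmd",)
SCRIPT_EXT = (".txt", ".log")


def matches_bundle_layout(names):
    """True if the member names include a launcher, an executable, and a disguised script."""
    files = [n.lower() for n in names if not n.endswith("/")]
    has_launcher = any(n.endswith(LAUNCHER_EXT) for n in files)
    has_exe = any(n.endswith(".exe") for n in files)
    has_script = any(n.endswith(SCRIPT_EXT) for n in files)
    return has_launcher and has_exe and has_script


def inspect_archive(path_or_file):
    """Read only the ZIP's member list; nothing is extracted or executed."""
    with zipfile.ZipFile(path_or_file) as zf:
        return matches_bundle_layout(zf.namelist())
```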

That script is a loader the security firm HexaStrike tracks as SmartLoader. HexaStrike published its own teardown on April 18, after independently finding 109 of these repositories across 103 accounts. Their description of the execution chain is specific enough to trust: SmartLoader uses LuaJIT's foreign function interface to call Windows APIs straight from Lua, hides its console window, runs an anti-debug check using shellcode copied into executable memory, then resolves its command-and-control server through a smart contract on the Polygon blockchain. Because the loader reads the live C2 address from the chain at run time, the operator can rotate infrastructure without rebuilding the loader or touching every staged copy. From there it pulls a second Lua stage from another GitHub repo and loads StealC, a commodity infostealer that lifts crypto wallets, saved logins, credit cards, cookies, browser history, and session data from Steam, Discord, and Telegram.

One detail that matters for anyone checking a suspicious repo: pasting the link to the ZIP into VirusTotal comes back clean. You have to submit the archive itself before it flags the Trojan.

Why the timing is bad

Several developers on Hacker News raised the same point Orchid did, which is that the people most likely to actually run one of these archives are not people. They are AI coding agents.

A human browsing GitHub has to be talked into downloading a ZIP and executing what is inside it, which is a high bar. An agent searching for a dependency to satisfy a build does not have that instinct. It searches for a package, lands on the poisoned clone that outranks the real one, and pulls the attached file. As one commenter, posting as guhcampos, put it: "They just need to appear on a fraction of the searches agents do to add dependencies and get lucky a couple of times to start a new infection cluster." That is a lead, not a confirmed targeting strategy. Nobody has published the agent-infection telemetry to prove it. But the incentives line up, and the clones are built to look exactly like the legitimate dependency an agent went looking for.

How long this has been running

Not days. Based on compile timestamps and commit history, HexaStrike dated the campaign to at least seven weeks before its April writeup. A separate analysis Orchid cites, from derp.ca, traces the smart-contract infrastructure to a deployment in March 2025. A Reddit thread complaining about spoofed repositories goes back to February 2025. So this has been live, in one form or another, for well over a year.

GitHub's handling is the uncomfortable part. Orchid says it took nearly two months and repeated requests to get two clones of his project removed. Once he published his list of 9,330 repositories, GitHub deleted most of them quickly. Then he reran the script, found fresh ones, and watched them sit untouched.

"These repositories have been around for many months, some even for over a year, and GitHub does not automatically detect and delete them."

His method is not exotic. He downloaded GitHub's own public event archive from GH Archive, filtered five days of data (about 16 million commit-push events) down to repos updated between 1 and 24 times a day, got 40,000 candidates, and found that roughly 10,000 of them, a quarter, matched the full pattern. He hit GitHub's API rate limit of 5,000 requests an hour doing it, which is why he thinks the real number is higher. GitHub has no such limit on its own infrastructure. Another writeup he points to notes the clones can be surfaced with a single search query: path:README.md /software-v.*.zip/.
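The first filtering step is simple enough to sketch. GH Archive publishes hourly gzip files with one JSON event per line, and each event carries a type, a repo name, and a created_at timestamp. The code below is a rough reconstruction under assumptions, not Orchid's actual script: the function names are made up, and reading "between 1 and 24 times a day" as "every day on which the repo appears falls in that range" is one interpretation of the writeup.

```python
import gzip
import json
from collections import Counter


def read_events(path):
    """Yield events from one GH Archive hourly file (gzip, one JSON object per line)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            yield json.loads(line)


def candidate_repos(events, min_per_day=1, max_per_day=24):
    """Return repos whose daily PushEvent count stays within the range on every day seen."""
    per_repo_day = Counter()
    for ev in events:
        if ev.get("type") != "PushEvent":
            continue
        day = ev["created_at"][:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
        per_repo_day[(ev["repo"]["name"], day)] += 1

    counts_by_repo = {}
    for (repo, _day), n in per_repo_day.items():
        counts_by_repo.setdefault(repo, []).append(n)

    return sorted(
        repo
        for repo, counts in counts_by_repo.items()
        if all(min_per_day <= n <= max_per_day for n in counts)
    )
```

This stage only narrows the field. Confirming the full pattern on each candidate needs per-repo lookups, and that second stage is where the API rate limit applies.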

HexaStrike's assessment is that one operator is behind it: "The campaign appears to be operated by a single threat actor or tightly controlled cluster based on infrastructure overlap, synchronized repository updates, and consistent tooling." Who that is remains unknown.

What to actually do

If you maintain open source on GitHub, search for clones of your own projects by name and report them. If you pull code or build artifacts from a repo you found through search, check that the account and history are real before you run anything, and submit the actual ZIP to VirusTotal rather than the link. If you run coding agents that fetch dependencies, the takeaway is that "it was the top GitHub result" is not provenance. Pin your sources.
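One concrete form of pinning is to record a SHA-256 digest for an artifact you have already vetted and refuse anything that does not match it. Here is a minimal sketch using only Python's standard library. The function names are illustrative, and where the pinned digest comes from (a lockfile, a reviewed config) is left to your own tooling.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are never loaded into memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path, expected_hex):
    """True only if the file's SHA-256 equals the digest pinned in advance."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A matching digest proves only that the file is the one you pinned. It says nothing about whether that file was safe to begin with, so the vetting still has to happen before the pin.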


Sources (4)

I discovered a large-scale malware distribution on GitHub (orchidfiles.com)
10,000 malicious GitHub repos detected: AI agents compromising their owners (cybernews.com)
Cloned, Loaded, and Stolen: How 109 Fake GitHub Repositories Delivered SmartLoader and StealC (hexastrike.com)
Hacker News discussion (news.ycombinator.com)
