AI Bots Bypass Defenses, Overwhelm Codeberg with DDoS-like Traffic

The Register

The open-source code hosting platform Codeberg, a community-driven initiative based in Berlin, finds itself at the forefront of a new digital battleground, grappling with an overwhelming surge of sophisticated AI bots. These automated agents have successfully circumvented Codeberg’s previously robust defense system, known as Anubis, leading to significant service disruptions and highlighting a growing threat to online communities.

Anubis, designed as a “tarpit,” functions as a proof-of-work proxy, requiring incoming connections to solve computationally expensive challenges before being granted access. The mechanism was implemented to deter malicious AI crawlers and reduce the need for constant manual blacklisting, and it effectively safeguarded Codeberg’s infrastructure for months. However, Codeberg volunteer staff recently reported via Mastodon that the AI crawlers have “learned how to solve the Anubis challenges,” rendering the defense ineffective. The bypass is particularly concerning in light of a recent Anubis fix (commit e09d0226a628f04b1d80fd83bee777894a45cd02) that addressed a vulnerability allowing attackers to skip the proof-of-work entirely by supplying a difficulty of zero.

The sheer volume of bot traffic has amounted to a denial-of-service (DoS) attack, causing “extreme slowness” across the platform. Codeberg has also noted that some of the offending bots appear to originate from networks controlled by Huawei, a China-based telecommunications company.
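The difficulty-zero flaw is easy to see in a minimal proof-of-work sketch. The Python below is purely illustrative (Anubis is a separate project and its real code is not shown here; all function names are invented): a nonce “solves” a challenge if its hash starts with `difficulty` zero hex digits, so a client-supplied difficulty of zero makes every hash a valid solution unless the server explicitly rejects it.

```python
import hashlib
import os


def meets_difficulty(digest_hex: str, difficulty: int) -> bool:
    # A digest "solves" the challenge if it begins with `difficulty`
    # zero hex digits. Note: "0" * 0 == "", and every string starts
    # with "" -- so at difficulty 0, *every* digest qualifies.
    return digest_hex.startswith("0" * difficulty)


def verify_solution(challenge: str, nonce: str, difficulty: int) -> bool:
    # Server-side check. If the client can influence `difficulty`,
    # sending 0 would bypass the work entirely -- hence the guard,
    # mirroring the kind of fix described in the article.
    if difficulty < 1:
        return False
    digest = hashlib.sha256((challenge + nonce).encode()).hexdigest()
    return meets_difficulty(digest, difficulty)


def solve(challenge: str, difficulty: int) -> str:
    # Client-side work loop: brute-force random nonces until one hashes
    # to a qualifying digest. Expected cost grows 16x per difficulty step.
    while True:
        nonce = os.urandom(8).hex()
        digest = hashlib.sha256((challenge + nonce).encode()).hexdigest()
        if meets_difficulty(digest, difficulty):
            return nonce
```

Doing honest work at difficulty 2 takes a few hundred hash attempts on average, while the zero-difficulty shortcut is rejected outright by the guard.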

The incident at Codeberg is far from isolated; it underscores a pervasive and escalating challenge facing open-source projects and, indeed, the wider internet. Reports indicate that “bad bots” constituted a staggering 71% of all bot traffic in 2024, a notable increase from 63% in 2023. A significant portion of this surge is attributed to AI-driven “grey bots” that indiscriminately scrape data to train large language models (LLMs) without explicit permission, placing an immense and often uncompensated burden on hosting providers and volunteer-run communities. This relentless scraping not only inflates bandwidth and hosting costs but also degrades service for legitimate users, forcing some platforms to block entire countries or implement their own “AI mazes” to combat the deluge.

The ethical implications of this unchecked AI activity are a growing point of contention within the tech community. Bradley M. Kuhn, a policy fellow at the Software Freedom Conservancy, has openly condemned these actions, labeling them as “DDoS attacks against the kindest and most giving people in our community.” He asserts that companies deploying bots for LLM training should be held accountable for their “insatiable greed for more and more training data.” Critics argue that the current business models of many AI companies are unsustainable without this free access to data, advocating for stronger regulations to ensure fair compensation and adherence to established internet protocols. The ongoing arms race between evolving AI defenses and increasingly sophisticated bots, which are now capable of autonomous exploit generation and advanced phishing, signals a critical juncture for digital security and the sustainability of open-source ecosystems. Codeberg’s exploration of alternative defenses like “Iocaine” highlights the urgent need for new strategies in this evolving threat landscape.