Is your website being bombarded by relentless AI scraping bots, feasting on your bandwidth and leaving you with nothing but a performance drain? If so, you’re not alone. The rise of AI has brought with it a new wave of web scraping, and protecting your valuable content is more critical than ever. Fortunately, there’s a solution: Anubis. This innovative tool acts as a gatekeeper, distinguishing between genuine human visitors and data-hungry bots, effectively safeguarding your website.

Why You Need to Protect Your Website from AI Scraping

The problem is widespread. Websites like kernel.org, Codeberg, ScummVM, FreeCAD, and even some UN sites have implemented protection against these scraping bots to maintain their online presence. As you read this, a constant battle is being waged between website maintainers dedicated to providing valuable content and armies of bots dispatched by AI giants to harvest data for model training.

While services like Cloudflare offer protection, they introduce reliance on centralized intermediaries, which contradicts the open nature of the web. And let’s be honest: if you host a small open-source project, paying for protection against a problem you didn’t create can feel unfair.

Enter Anubis: A Modern Solution Inspired by a Proven Concept

Anubis leverages a clever concept rooted in the past: Hashcash. Proposed by Adam Back in 1997 to combat email spam, Hashcash attaches a small “computational cost” to each request. Think of it as a toll on a road: negligible for occasional visitors, but incredibly costly for companies running thousands of “trucks” (bots) per day.
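To make the toll concrete, here is a rough back-of-the-envelope cost model. The hash rate and the scraper's page count below are illustrative assumptions, not measured figures; the key point is the asymmetry between one visitor and an industrial crawler.

```python
# Rough cost model for a hashcash-style toll.
# hashes_per_second and pages_scraped are illustrative assumptions.

difficulty = 4                    # required leading zero hex digits
avg_attempts = 16 ** difficulty   # expected SHA-256 evaluations per challenge
print(avg_attempts)               # 65536

hashes_per_second = 20_000        # assumed rate for in-browser JavaScript hashing
seconds_per_solve = avg_attempts / hashes_per_second
print(seconds_per_solve)          # ~3.3 s: a one-off wait for a human visitor

pages_scraped = 1_000_000         # assumed daily load of an industrial scraper
scraper_cpu_seconds = pages_scraped * seconds_per_solve
print(scraper_cpu_seconds / 86_400)  # roughly 38 CPU-days of work per day
```

A few seconds once per session is acceptable friction for a person; multiplied across millions of requests, it becomes a real electricity and hardware bill for the scraper.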

How Anubis Works: A Technical Deep Dive

Here’s how Anubis works in practice:

  1. Visitor Arrives: A user arrives at your website, now protected by Anubis.
  2. Challenge Presented: The server presents the visitor with a mathematical challenge: find a number (a nonce) that, when appended to a given string of characters and passed through the SHA-256 hash function, produces a digest with a certain number of leading zeros.
  3. Browser Calculation: The visitor’s browser calculates this hash in the background using Web Workers (JavaScript that runs without blocking the user interface).
  4. Verification & Cookie: Once found, the result is sent to the server, which verifies its validity. If correct, a special cookie is generated, authorizing future visits without redoing the test.
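The four steps above can be sketched in a few lines of Python. This is a simplified model, not Anubis’s actual implementation: the function names `solve` and `verify` and the challenge string are illustrative, and difficulty here is counted in zero hex digits of the digest.

```python
import hashlib
import itertools

def solve(challenge: str, difficulty: int) -> int:
    """Step 3 (client side): brute-force a nonce until the SHA-256
    digest of challenge+nonce starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Step 4 (server side): re-hash once — checking an answer is cheap,
    even though finding it was expensive."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# Step 2: the server issues a (normally random) challenge string.
challenge = "example-challenge-string"
nonce = solve(challenge, difficulty=3)  # low difficulty so the demo runs fast
assert verify(challenge, nonce, difficulty=3)
```

The asymmetry is the whole trick: the client may need thousands of hash attempts, while the server verifies the answer with a single hash before setting the authorization cookie.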

This calculation takes only a few seconds for a modern browser on a standard PC, creating acceptable friction. But for an industrial scraper processing millions of pages, it’s a significant hurdle. The system is highly configurable, allowing you to adjust the difficulty (the number of required zeros, typically 4–5), use intentionally slow algorithms to penalize identified bots, and create custom rules with regular expressions and the Common Expression Language (CEL).

Customization at Your Fingertips: Rules and Examples

Here’s an example of a rule to allow API requests while blocking everything else:

- name: allow-api-requests
  action: ALLOW
  expression:
    all:
      - '"Accept" in headers'
      - 'headers["Accept"] == "application/json"'
      - 'path.startsWith("/api/")'

Or, to specifically block Amazon bots:

- name: amazonbot
  user_agent_regex: Amazonbot
  action: DENY

Setting Up Anubis: Simple and Efficient

The best part? Anubis is incredibly easy to set up. A basic VPS with Docker is more than enough, consuming less than 32MB of RAM on average.

The simplest method to set up Anubis is through Docker Compose:

services:
  anubis-nginx:
    image: ghcr.io/techarohq/anubis:latest
    environment:
      BIND: ":8080"
      DIFFICULTY: "4"
      TARGET: "http://nginx"
      SERVE_ROBOTS_TXT: "true"
    ports:
      - 8080:8080
  nginx:
    image: nginx
    volumes:
      - "./www:/usr/share/nginx/html"

Anubis is available for Debian, Ubuntu, RHEL, and other popular distributions if you prefer native packages. Experienced users can even install it from the source code.

The documentation provides clear instructions for configuring it with popular web servers:

  • Nginx: Add Anubis as an upstream and redirect all traffic through it.
  • Apache: Similar to Nginx, with Apache-specific details.
  • Traefik or Caddy: Supported for more modern setups.
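As a sketch of the Nginx pattern, the snippet below puts Anubis in front of the application. The domain, port, and header set are placeholder assumptions; adapt them from the official documentation for your setup.

```nginx
# Hypothetical reverse-proxy layout: Nginx listens publicly and forwards
# all traffic through Anubis, which in turn proxies to the real backend.
server {
    listen 80;
    server_name example.com;  # placeholder domain

    location / {
        proxy_pass http://127.0.0.1:8080;  # Anubis (matches BIND: ":8080")
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```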

After configuration, verify functionality by visiting your website. If you see the challenge page initially and then gain normal access, everything is working correctly!


Why Choose Anubis? The Advantages are Clear:

  • Free & Open Source: Unlike pay-per-request solutions, it costs you nothing.
  • Lightweight: Minimal resource consumption.
  • Full Control: You remain in complete control of your website and its traffic.
  • Highly Customizable: Tailor rules to your exact needs.
  • Privacy-Focused: No data sharing with third parties.
  • Independence: Maintain control over your infrastructure.
  • Resistance: Contribute to defending the open web.
  • Community: Support a project driven by ethical values.
  • Intellectual Satisfaction: An elegant solution to a complex problem.

Important Considerations:

  • Anubis requires JavaScript on the client-side.
  • There’s a small chance of blocking some bots or legitimate users if misconfigured.

However, the benefits of Anubis far outweigh the drawbacks when compared to commercial alternatives.

Conclusion:

Tired of watching your bandwidth disappear due to greedy bots? It’s time to take action. A few minutes of configuration today could save you hours of maintenance tomorrow. Anubis offers a powerful, flexible, and free solution to protect your website from AI scraping. If you have any questions about installation or configuration, consult the official documentation. Don’t let bots drain your resources – embrace Anubis and secure your online presence!
