Octopii: An Open-Source Project for Personal Data Detection

Octopii is a Python script that uses artificial intelligence to analyze images and to detect and extract personal data. It straightens and cleans up each image, recognizes identification documents such as passports and identity cards, and uses optical character recognition (OCR) to extract the text they contain.
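
To make the extraction step more concrete, here is a minimal sketch of pulling candidate personal data out of text that an OCR engine has already produced. It uses only Python's standard library. The `PATTERNS` table and the `find_pii` function are hypothetical names chosen for illustration; they are not taken from the Octopii source, whose detection rules may differ.

```python
import re

# Hypothetical patterns, for illustration only.
# A real tool needs broader, locale-aware rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def find_pii(ocr_text):
    """Return a dict mapping each pattern name to the matches found in the text."""
    results = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(ocr_text)
        if matches:
            results[name] = matches
    return results

sample = "Contact: jane.doe@example.com, tel +33 1 23 45 67 89"
print(find_pii(sample))
```

Simple patterns like these produce false positives, which is one reason a human review step remains useful.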

Using the information retrieved from both the image and its text, the algorithm then classifies each image so that a human can review it later.
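
As a rough illustration of what such a classification could look like, the sketch below scores OCR text against keyword lists and keeps the best-matching document type. This is an assumption about the general approach, not Octopii's actual algorithm; `DOCUMENT_KEYWORDS` and `classify` are hypothetical names, and the keyword lists are invented for the example.

```python
# Hypothetical keyword lists; the real project's definitions may differ.
DOCUMENT_KEYWORDS = {
    "passport": ["passport", "nationality", "date of birth"],
    "identity card": ["identity card", "id number", "date of birth"],
}

def classify(ocr_text):
    """Return (label, score): the best-matching document type and
    the fraction of its keywords found in the text."""
    text = ocr_text.lower()
    best_label, best_score = None, 0.0
    for label, keywords in DOCUMENT_KEYWORDS.items():
        hits = sum(1 for keyword in keywords if keyword in text)
        score = hits / len(keywords)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

label, score = classify("PASSPORT  Nationality: French  Date of birth: 01 JAN 1990")
print(label, round(score, 2))
```

A score like this is only a hint: images with a low or ambiguous score are the ones most worth passing to a human reviewer.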

The Octopii interface displays collected data (see screenshot).

The tool is useful for checking whether personal information has leaked onto the Internet, or for confirming that identity data is present in scanned or transmitted documents.

To install and test Octopii, visit the GitHub project.

  1. Clone the sources using Git and move into the project directory:
git clone https://github.com/redhuntlabs/Octopii.git
cd Octopii
  2. Install the dependencies:
pip install -r requirements.txt
  3. Install Tesseract (the OCR engine) on Debian or Ubuntu:
sudo apt install tesseract-ocr -y
  4. Launch Octopii by specifying the directory to scan:
python3 octopii.py DIRECTORY/
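
If you want to launch the scan from another Python script, the steps above can be wrapped as in the following sketch. It assumes it is run from inside the cloned repository and that the Tesseract binary is named `tesseract`, as installed by the `tesseract-ocr` package. The helper names `missing_tools`, `build_command` and `run_scan` are hypothetical and are not part of Octopii.

```python
import shutil
import subprocess

def missing_tools(names):
    """Return the tools from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]

def build_command(directory):
    """Build the command line from the last step for a given directory."""
    return ["python3", "octopii.py", directory]

def run_scan(directory):
    """Check prerequisites, then run the scan in the current directory."""
    missing = missing_tools(["python3", "tesseract"])
    if missing:
        raise RuntimeError("Missing required tools: " + ", ".join(missing))
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(build_command(directory), check=True)
```

Calling `run_scan("DIRECTORY/")` then behaves like the last command above, but fails early with a clear message if Tesseract is not installed.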

Octopii provides a confidence index for the presence of personal data in images, which makes it well suited to checking whether personal information has leaked onto the Internet and to tracking identifying information in scanned documents.

Mohamed SAKHRI
