Octopii – The open-source project that detects personal data

Octopii is a Python script that uses artificial intelligence to analyze images and detect the personal data they contain. It straightens and cleans each image, recognizes identification documents such as passports and identity cards, and uses optical character recognition (OCR) to extract the text from them.
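To make the extraction step more concrete, here is a minimal sketch of how personal data might be pulled out of text once OCR has produced it. It assumes the OCR output is already available as a string. The `find_pii` function and its two patterns are hypothetical illustrations, not Octopii's actual rules.

```python
import re

# Hypothetical patterns for illustration only; not Octopii's real definitions.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def find_pii(text):
    """Return a dict mapping each PII type to the matches found in text."""
    found = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


# Stand-in for text that an OCR engine would have extracted from an image.
ocr_text = "Name: Jane Doe\nEmail: jane.doe@example.com\nTel: +1 555-010-9999"
print(find_pii(ocr_text))
```

Real documents would need many more patterns, and OCR errors mean that matches should be treated as candidates rather than confirmed findings.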

Using the information retrieved from both the image and its text, the algorithm then classifies each image so that a human can review it later.
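The classification step could be sketched as a simple keyword match over the extracted text. This is only an assumption about how such a classifier might work: the `DOCUMENT_KEYWORDS` table and the `classify_text` function are hypothetical, and Octopii's own logic may differ.

```python
# Hypothetical keyword lists; Octopii's real rules may differ.
DOCUMENT_KEYWORDS = {
    "passport": ["passport", "nationality", "date of birth"],
    "identity card": ["identity card", "card number", "date of birth"],
}


def classify_text(text):
    """Return (best_label, score), where score is the fraction of keywords found."""
    lowered = text.lower()
    best_label, best_score = None, 0.0
    for label, keywords in DOCUMENT_KEYWORDS.items():
        hits = sum(1 for keyword in keywords if keyword in lowered)
        score = hits / len(keywords)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Stand-in for OCR output from a scanned passport.
sample = "PASSPORT\nNationality: Utopian\nDate of birth: 01 JAN 1990"
print(classify_text(sample))
```

The score plays the role of a rough confidence value: a low score signals that the image deserves a closer look from a human reviewer.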

The Octopii interface displays collected data (see screenshot).

This tool is useful for checking whether personal information has leaked onto the Internet, or for confirming that identity data is present in scanned or transmitted documents.

To install and test Octopii, visit the GitHub project.

  1. Clone the sources using Git and enter the project directory:
git clone https://github.com/redhuntlabs/Octopii.git
cd Octopii
  2. Install the dependencies:
pip install -r requirements.txt
  3. Install Tesseract (the OCR engine):
sudo apt install tesseract-ocr -y
  4. Launch Octopii by specifying the directory to scan:
python3 octopii.py DIRECTORY/
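If you prefer to launch the scan from Python, the last step can be wrapped in a small helper that first checks the prerequisites. This is a sketch under stated assumptions: `build_scan_command` and `run_scan` are hypothetical helper names, and the script is expected to sit in the current working directory (the cloned repository).

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_scan_command(directory, script="octopii.py"):
    """Build the argument list for scanning one directory."""
    return [sys.executable, script, str(directory)]


def run_scan(directory, script="octopii.py"):
    """Check prerequisites, then launch the scan; raise if something is missing."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH; install tesseract-ocr first")
    if not Path(script).is_file():
        raise FileNotFoundError(f"{script} not found; run this from the cloned repository")
    if not Path(directory).is_dir():
        raise NotADirectoryError(f"{directory} is not a directory")
    return subprocess.run(build_scan_command(directory, script), check=True)


# Example call (commented out so that importing this file has no side effects):
# run_scan("DIRECTORY/")
```

Failing early with a clear message is usually easier to debug than an error raised halfway through a scan.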

Octopii provides a confidence index on the presence of personal data in images. That makes it well suited to checking whether personal information has leaked onto the Internet and to tracking identifying information in scanned documents.

Mohamed SAKHRI

I am Mohamed SAKHRI, the creator and editor-in-chief of Tech To Geek, where I've demonstrated my passion for technology through extensive blogging. My expertise spans various operating systems, including Windows, Linux, macOS, and Android, with a focus on providing practical and valuable guides. Additionally, I delve into WordPress-related subjects.
