Octopii is a Python script that uses artificial intelligence to analyze images and find and extract the personal data they contain. It straightens and cleans images before analysis, recognizes identity documents such as passports and identity cards, and uses optical character recognition (OCR) to extract the text printed on them.
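OCR output from identity documents often contains misread characters. For passports, one way to verify extracted fields is the check-digit scheme that ICAO Doc 9303 defines for the machine-readable zone (MRZ). The sketch below illustrates that idea and is not Octopii's code. It assumes each MRZ field and its check digit have already been isolated from the OCR text, and the function names are hypothetical.

```python
# Hypothetical post-OCR check (not taken from Octopii): validate an ICAO 9303
# MRZ check digit so that misread passport fields can be flagged.


def mrz_char_value(ch: str) -> int:
    """Map an MRZ character to its ICAO 9303 numeric value."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(c) * weights[i % 3] for i, c in enumerate(field))
    return total % 10


def field_is_valid(field: str, check: str) -> bool:
    """True when the check digit read by OCR matches the recomputed one."""
    return check.isdigit() and mrz_check_digit(field) == int(check)


if __name__ == "__main__":
    print(mrz_check_digit("L898902C3"))   # 6 (document number)
    print(field_is_valid("740812", "2"))  # True (date of birth YYMMDD)
    print(field_is_valid("740812", "7"))  # False: likely an OCR misread
```

If the recomputed digit disagrees with the one read from the image, the field was most likely misread, which makes the image a good candidate for manual review.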
Using the information drawn from both the image and its extracted text, the tool classifies each image so that a human can review the results afterwards.
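The classification step can be pictured as scoring the OCR text against keywords typical of each document type. This is a minimal sketch that assumes a hypothetical `DOCUMENT_KEYWORDS` table and a simple score (the fraction of a type's keywords found in the text). Octopii's actual classifier and scoring may work differently.

```python
import re

# Hypothetical keyword table; Octopii's real rules may differ.
DOCUMENT_KEYWORDS = {
    "passport": ["passport", "nationality", "place of birth", "date of expiry"],
    "identity card": ["identity card", "date of birth", "sex", "signature"],
    "driving licence": ["driving licence", "categories", "date of issue"],
}


def classify_text(ocr_text, keywords=DOCUMENT_KEYWORDS):
    """Return (label, confidence) for the best-matching document type.

    Confidence is the fraction of that type's keywords found as whole
    words in the OCR text; (None, 0.0) means nothing matched.
    """
    # Lowercase and collapse the newlines and extra spaces OCR leaves behind
    text = re.sub(r"\s+", " ", ocr_text.lower())
    best_label, best_score = None, 0.0
    for label, words in keywords.items():
        hits = sum(
            1 for w in words
            if re.search(r"\b" + re.escape(w) + r"\b", text)
        )
        score = hits / len(words)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


if __name__ == "__main__":
    sample = "PASSPORT\nNationality: FRENCH\nDate of expiry: 2030"
    print(classify_text(sample))  # ('passport', 0.75)
```

Images whose best score is low are natural candidates for the human check mentioned above.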
The Octopii interface displays collected data (see screenshot).
The tool can be used to check whether personal information has leaked on the Internet, or to confirm that a given identity appears on scanned or transmitted documents.
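To confirm that a given identity appears on a scanned document, exact string matching is fragile because OCR misreads characters (for example `0` for `O`). The following sketch uses the standard library's `difflib.SequenceMatcher` and assumes the OCR text is already available. `identity_match_score` is a hypothetical helper, not part of Octopii.

```python
import difflib
import re


def identity_match_score(ocr_text, full_name):
    """Best similarity (0.0 to 1.0) between full_name and any run of the
    same number of consecutive words in the OCR text."""
    words = re.findall(r"\w+", ocr_text.lower())
    target = re.findall(r"\w+", full_name.lower())
    n = len(target)
    if n == 0 or len(words) < n:
        return 0.0
    target_str = " ".join(target)
    best = 0.0
    # Slide a window of n words; the words must appear in the given order
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        ratio = difflib.SequenceMatcher(None, window, target_str).ratio()
        best = max(best, ratio)
    return best


if __name__ == "__main__":
    print(identity_match_score("Name: JEAN DUPONT", "Jean Dupont"))  # 1.0
    print(identity_match_score("Name: JEAN DUP0NT", "Jean Dupont"))  # about 0.91
```

A threshold (for instance 0.85, an arbitrary assumption here) would then decide what counts as a match. The right value depends on OCR quality. Because the window keeps word order, a document printing the surname first would need the name passed in that order too.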
To install and test Octopii, visit the GitHub project.
- Clone the sources using Git:
git clone https://github.com/redhuntlabs/Octopii.git
- Move into the cloned directory and install the Python dependencies:
cd Octopii
pip install -r requirements.txt
- Install Tesseract, the OCR engine (the command below is for Debian and Ubuntu):
sudo apt install tesseract-ocr -y
- Launch Octopii by specifying the directory to scan:
python3 octopii.py DIRECTORY/
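Scanning a directory starts by finding the files worth analyzing. A minimal sketch of that step follows. It assumes a hypothetical extension list (`IMAGE_EXTENSIONS`), and the formats Octopii actually accepts may differ.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Hypothetical extension list; not taken from Octopii's source.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff"}


def find_images(directory: str) -> list[Path]:
    """Recursively collect files whose extension looks like an image,
    sorted so that repeated scans visit files in the same order."""
    root = Path(directory)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    # Scan the directory given on the command line, or the current one
    for path in find_images(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Matching on the lowercased suffix keeps files such as `SCAN.PNG` from being skipped.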
Octopii provides a confidence score for the presence of personal data in each image. This makes it useful for checking whether personal information has leaked on the Internet, or for tracking identifying information across scanned documents.