Today, I want to talk to you about a really cool tool for archiving web pages. Sure, you can already save a web page with your browser, but this tool, called Monolith, goes much further. It not only saves the target page but also embeds all of its CSS, images, and JavaScript into a single HTML5 file.
Unlike a standard save or even using wget, Monolith integrates all assets as data URLs. This means that your browser will display the page exactly as it was on the web, even without an Internet connection!
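To make the idea of a data URL concrete, here is a minimal sketch that builds one by hand for a tiny stylesheet. It only illustrates the format and is not how Monolith works internally; the variable names are made up for this example, and it assumes a `base64` command is available.

```shell
# A tiny stylesheet standing in for an external asset.
css='body{color:red}'

# Base64-encode it and strip any line wrapping added by base64.
encoded=$(printf '%s' "$css" | base64 | tr -d '\n')

# A data URL is: "data:" + MIME type + ";base64," + the encoded payload.
data_url="data:text/css;base64,${encoded}"

printf '%s\n' "$data_url"
```

Pasting the printed value into the href of a stylesheet link tag would load the style with no network request, which is the same trick applied to every asset of an archived page.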
Installing it is super simple. Whether you are on Windows, macOS, GNU/Linux, or even on exotic devices with ARM processors, it will work:
- With Cargo (cross-platform):
cargo install monolith
- Via Homebrew (macOS and GNU/Linux):
brew install monolith
- With Snapcraft (GNU/Linux):
snap install monolith
- And many other options…
To save, for example, this article from my site, just enter the following command:
monolith https://www.tech2geek.net/monolith-archivage-web-html-autonome.html -o monolith.html
And bam, it generates a monolith.html file with everything in it. You can open it in your browser even without internet access. It’s magical.
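If you want a rough check that the saved file really is self-contained, one option is to count the src attributes that still point at http(s) URLs. The function name below is hypothetical, and the check is deliberately crude: it ignores srcset attributes and url() references inside CSS, so treat a result of 0 as a good sign rather than proof.

```shell
# Count src="http(s)://..." attributes left in an HTML file.
# Embedded assets use src="data:..." and are therefore not counted.
count_remote_refs() {
    grep -Eo 'src="https?://[^"]*"' "$1" | wc -l | tr -d ' '
}

# Usage, assuming monolith.html was produced by the command above:
# count_remote_refs monolith.html
```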
But Monolith has many more tricks up its sleeve. You can, for instance, feed it HTML directly through standard input (STDIN):
cat index.html | monolith -aMcIiFfv -b https://site.com/ - > result.html
Here, we pass the HTML content via standard input (the lone - argument) and use -b to set the base URL against which relative links are resolved. The remaining options are:
- -a to remove audio
- -M to not add date and URL information
- -c to exclude CSS
- -I to isolate the document
- -i to remove images
- -F to exclude web fonts
- -f to skip frames
- -v to remove videos
In short, you have full control over what you want to keep or exclude.
You can also specify allowed or forbidden domains for fetching assets, like:
monolith -I -d example.com -d www.example.com https://example.com -o example-only.html
Here we only allow the domains example.com and www.example.com. Everything else will be excluded. Conversely, you can add -B so that the -d list is treated as a blocklist, typically to exclude domains that serve ads or trackers:
monolith -I -B -d .googleusercontent.com -d googleanalytics.com -d .google.com https://example.com -o example-no-ads.html
Note that Monolith does not embed a JavaScript engine. So for more complex web pages that fetch data after the initial load, the archive can end up incomplete. But no worries! We can use a headless browser like Chromium to render the page first, then pass the resulting DOM to Monolith:
chromium --headless --incognito --dump-dom https://github.com | monolith - -I -b https://github.com -o github.html
And there you go, problem solved!
Perfect for web archivists or data hoarders who want to keep a trace of everything, or even automate it all in their scripts.
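As a sketch of what that automation could look like, the script below archives every URL listed in a text file, one per line. The function names, the file name urls.txt, and the output directory are all assumptions made for this example; only the monolith URL -o FILE form shown earlier is used.

```shell
#!/bin/sh

# Hypothetical helper: turn a URL into a safe file name.
# Drops the scheme, then replaces anything outside [A-Za-z0-9._-] with "_".
url_to_filename() {
    name=$(printf '%s\n' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|[^A-Za-z0-9._-]|_|g')
    printf '%s.html\n' "$name"
}

# Hypothetical helper: archive each URL found in a list file.
#   $1 = file with one URL per line, $2 = output directory
archive_list() {
    list=$1
    outdir=$2
    mkdir -p "$outdir"
    while IFS= read -r url; do
        # Skip blank lines and lines starting with "#".
        case $url in
            ''|'#'*) continue ;;
        esac
        out="$outdir/$(url_to_filename "$url")"
        # Detach stdin so the command cannot consume the rest of the list.
        monolith "$url" -o "$out" </dev/null || printf 'failed: %s\n' "$url" >&2
    done < "$list"
}

# Usage, assuming monolith is on your PATH and urls.txt exists:
# archive_list urls.txt archive
```

A failed download is reported on standard error and the loop moves on, so one dead link does not stop the whole batch.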