How to Mount a Remote Zip Archive with Cloudzip and Access Its Files Without Downloading Everything

Imagine you have a huge zip archive stored somewhere in the cloud, say in an S3 bucket, and you need just a few specific files inside it. What do you do? Well, like everyone else, you download all 32 GB, unzip the whole thing, and all of that just to retrieve three measly files…

Well, guess what? I’ve found a nifty tool that will make your life easier: Cloudzip! It lets you mount a remote zip archive directly on your machine, like an external hard drive, so you can access, copy, and use the files you need without downloading the entire archive.

For example, this command lists the files inside an archive stored on S3:

cz ls s3://example-bucket/path/to/archive.zip

Pretty cool, right?

The way Cloudzip works is quite ingenious. It relies on two simple but remarkably effective principles:

  1. Zip files support random access. Every zip archive ends with a “central directory” that lists all the files it contains along with their offsets in the archive, and a small “end of central directory” record at the very end says where that directory starts. So there is no need to read the entire archive to find a file.
  2. Most HTTP servers and cloud storage services (S3, Google Cloud Storage, Azure Blob Storage, etc.) support HTTP requests with a “Range” header, which lets you fetch only a specific byte range of a remote file, for example Range: bytes=0-1023 for its first kilobyte.

By combining these two principles, Cloudzip can retrieve just the central directory of your zip archive (usually only a few KB, unless the archive holds a very large number of files) to get the list of files, and then download only the byte ranges of the files you actually access!

To install it, clone the repository and build the binary (this requires a Go toolchain):

git clone https://github.com/ozkatz/cloudzip.git
cd cloudzip
go build -o cz main.go

Then copy the cz binary to a directory in your $PATH (writing to /usr/local/bin may require sudo):

cp cz /usr/local/bin/

And where it gets even crazier (oops, I meant “interesting”) is the mount command: Cloudzip can actually mount your remote zip archive as a local directory. It starts a small local NFS server and mounts that NFS share on the directory of your choice.

Another example:

cz mount s3://example-bucket/path/to/archive.zip some_dir/

This way, all your files appear as if they were local: you can open them directly in your applications and process them, without ever downloading the entire archive.

And the best part is that Cloudzip works with almost any remote storage you can imagine. Of course, there’s S3, but also HTTP, HTTPS, GCS, Azure, and even… drumroll… Kaggle!

Ah, Kaggle, that haven for data scientists where datasets are bigger than a Bitcoin miner’s electricity bill… Cloudzip can use Kaggle’s API to access a dataset’s zip archive directly, without downloading it first. You can literally mount a Kaggle dataset locally and start working on it in seconds. And if you ever need a particular file to test something, no problem: it will be downloaded on demand.

Of course, it’s not perfect. The NFS mount, for example, is only available on Linux and macOS for now. And don’t expect blazing performance either: every file access still means downloading file segments over the network. But for all those cases where you need a few files from a huge zip archive, it’s perfect!

And besides, it’s open source (you didn’t think I would recommend a proprietary tool, did you?). You can find the project on GitHub at https://github.com/ozkatz/cloudzip.

Mohamed SAKHRI

I'm the creator and editor-in-chief of Tech To Geek. Through this little blog, I share with you my passion for technology. I specialize in various operating systems such as Windows, Linux, macOS, and Android, focusing on providing practical and valuable guides.
