htmlq – A command-line tool for extracting data from HTML

I have already covered the jc command in a previous article. As a reminder, jc lets you turn the text output of commands or scripts into structured data such as JSON.

Today, I'd like to talk about htmlq, which works on the same principle as jq, except that it operates on HTML instead of JSON. The tool lets you select and extract elements from an HTML document using CSS selectors.

To make this concrete, here is how to retrieve the HTML of every element with the class post (CSS selector .post):

curl --silent https://tech2geek.net/ | htmlq '.post'

Or, to output the href attribute of every link on a page:

curl --silent https://tech2geek.net/ | htmlq --attribute href a
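
Because the link list is plain text with one value per line, it can be piped into the usual Unix tools. The following is only a sketch: the function name unique_https_links is made up for this illustration and is not part of htmlq. It keeps absolute https links and removes duplicates.

```shell
# Hypothetical helper (not part of htmlq): read one link per line on stdin,
# keep only absolute https URLs, and print each of them once, sorted.
unique_https_links() {
  grep '^https://' | sort -u
}

# Possible usage with the command above:
#   curl --silent https://tech2geek.net/ | htmlq --attribute href a | unique_https_links
```

Relative links such as /about are dropped by this filter, so adjust the pattern if you need them.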

Or to retrieve only the text (without the HTML tags):

curl --silent https://tech2geek.net | htmlq --text .post
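
To experiment without hitting a live site, the same three queries can be run against a local file. This is a minimal sketch: the sample page and its path are invented for the illustration, and it relies on htmlq's --filename option to read from a file instead of stdin.

```shell
# Write a small sample page to a temporary file (hypothetical content).
sample="${TMPDIR:-/tmp}/htmlq-sample.html"
cat > "$sample" <<'EOF'
<!DOCTYPE html>
<html>
  <body>
    <div class="post">
      <h2>First post</h2>
      <p>Read <a href="/first">more</a>.</p>
    </div>
    <div class="post">
      <h2>Second post</h2>
      <p>Read <a href="https://example.com/second">more</a>.</p>
    </div>
  </body>
</html>
EOF

if command -v htmlq >/dev/null 2>&1; then
  # Same three queries as in the curl examples, reading from the file.
  htmlq --filename "$sample" '.post'             # HTML of each .post element
  htmlq --filename "$sample" --attribute href a  # href of each link
  htmlq --filename "$sample" --text '.post'      # text only, no tags
else
  echo "htmlq is not installed; skipping the queries" >&2
fi
```

Working on a saved file also avoids downloading the page again for every selector you try.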

This makes a lot of extraction tasks easy, without having to write code that deals with XPath expressions.

Now, to install htmlq, the command depends on your OS and package manager:

  • Cargo (Linux, or any system with a Rust toolchain):
  cargo install htmlq
  • FreeBSD:
  pkg install htmlq
  • Homebrew (macOS):
  brew install htmlq
  • Scoop (Windows):
  scoop install htmlq
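
After installing, a quick check like the one below confirms that the binary is reachable. It is only a sketch: check_tool is a made-up helper name, and the ~/.cargo/bin hint assumes cargo's default install location.

```shell
# Hypothetical helper: succeed if the named program can be found on PATH.
check_tool() {
  command -v "$1" >/dev/null 2>&1
}

if check_tool htmlq; then
  htmlq --version
else
  echo "htmlq not found on PATH" >&2
  # Assumption: cargo's default location for installed binaries is ~/.cargo/bin.
  if [ -x "$HOME/.cargo/bin/htmlq" ]; then
    echo "found $HOME/.cargo/bin/htmlq; add $HOME/.cargo/bin to PATH" >&2
  fi
fi
```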

For all the details, see the htmlq documentation on GitHub.


Mohamed SAKHRI

I'm the creator and editor-in-chief of Tech To Geek. Through this little blog, I share with you my passion for technology. I specialize in various operating systems such as Windows, Linux, macOS, and Android, focusing on providing practical and valuable guides.
