How to convert HTML to text?

How can I convert HTML to plain text on Linux? For example, I want to use curl to send a query to Google, convert the returned HTML to text, and read the converted text in my terminal. I am using RHEL 6.

2 Answers

I don't think curl has a built-in HTML processor. However:

lynx --dump <URL>

does the trick.

If you still want to use curl, you could use html2text (available in Ubuntu).
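To make the curl route concrete, here is a minimal sketch of a shell function that reads HTML on standard input and prints text, using whichever converter happens to be installed. The name html_to_text is made up for illustration, and the final sed branch is only a crude fallback I am assuming is acceptable when neither tool is present: it does not decode entities or handle tags that span several lines.

```shell
# Hypothetical helper (the name is an assumption, not part of any tool):
# convert HTML on stdin to plain text on stdout.
html_to_text() {
    if command -v lynx >/dev/null 2>&1; then
        # -stdin reads the page from standard input;
        # -nolist leaves out the list of links lynx normally appends to a dump
        lynx -dump -nolist -stdin
    elif command -v html2text >/dev/null 2>&1; then
        # html2text reads standard input when no file is given
        html2text
    else
        # Crude fallback: strips only tags that open and close on the same line
        sed -e 's/<[^>]*>//g'
    fi
}

# Example use (needs network access, so it is left commented out):
#   curl -sL google.com | html_to_text
```

Exact spacing and line wrapping will differ between the three branches, so treat the output as something to read rather than something to parse.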


You can install html2text (an advanced HTML-to-text converter); its usage is straightforward:

$ html2text
$ cat file.html | html2text -o file.txt

Install by:

  • Linux (Debian/Ubuntu): apt-get install html2text
  • OS X: brew install html2text

Example with curl:

$ curl -sL google.com | html2text
Search Images Maps Play YouTube News Gmail Drive More ?
Web History | Settings | Sign in A better way to browse the web Get Google Chrome Advanced search Language tools [Google Search][I'm Feeling Lucky] Advertising Programmes Business Solutions+GoogleAbout GoogleGoogle.com ? 2016 - Privacy - Terms
