Simple web scraping with Bash: Ski Report
Linux Magazine|#262/September 2022
With one line of Bash code, Pete scrapes the web and builds a desktop notification app to get the daily snow report.
Pete Metcalfe
Simple web scraping with Bash: Ski Report

While recently doing a small project, I was amazed by how much web scraping I could do with just one line of Bash. I used the text-based Lynx browser [1] and then piped the output to a grep search. Figure 1 shows the one-line Bash example that scrapes the current snow depth from the Sunshine Village Snow Forecast web page.

In this article, I will introduce some techniques to easily scrape web pages, and then I will create a desktop notification script that provides the daily snow forecast.

The Lynx Text Browser

For my Bash web scraping, I started out by looking at using command-line tools such as curl [2] with the htm12text [3] utility. This technique definitely works, but I found that using the Lynx browser offers a one-step solution with a slightly cleaner text output.

To install Lynx on Raspian/Debian/ Ubuntu, use:

sudo apt install lynx

The Lynx -dump option will output a web page to text with HTML tags, HTML encoding, and JavaScript removed. Figure 2 shows that a Lynx dump can greatly clean up the original web page and make searching considerably easier.

Sometimes a simple Bash grep search might be all that you need. However, there are many cases where some text manipulation is required. The good news is that Bash has a nice selection of line and string manipulation tools.

Esta historia es de la edición #262/September 2022 de Linux Magazine.

Comience su prueba gratuita de Magzter GOLD de 7 días para acceder a miles de historias premium seleccionadas y a más de 9,000 revistas y periódicos.

Esta historia es de la edición #262/September 2022 de Linux Magazine.

Comience su prueba gratuita de Magzter GOLD de 7 días para acceder a miles de historias premium seleccionadas y a más de 9,000 revistas y periódicos.

MÁS HISTORIAS DE LINUX MAGAZINEVer todo
MADDOG'S DOGHOUSE
Linux Magazine

MADDOG'S DOGHOUSE

The stakeholder approach of open source broadens the pool of who can access, influence, and benefit from information technologies.

time-read
3 minutos  |
#289/December 2024: Coding with AI
MakerSpace
Linux Magazine

MakerSpace

Rust, a potential successor to C/C++, claims to solve some memory safety issues while maintaining high performance. We look at Rust on embedded systems, where memory safety, concurrency, and security are equally important

time-read
10+ minutos  |
#289/December 2024: Coding with AI
In Harmony
Linux Magazine

In Harmony

Using the Go Interface mechanism, Mike demonstrates its practical application with a refresh program for local copies of Git repositories.

time-read
9 minutos  |
#289/December 2024: Coding with AI
Monkey Business
Linux Magazine

Monkey Business

Even small changes in a web page can improve the browsing experience. Your preferred web browser provides all the tools you need to inject JavaScript to adapt the page. You just need a browser with its debugging tools, some knowledge of scripting, and the browser extension Tampermonkey.

time-read
10+ minutos  |
#289/December 2024: Coding with AI
Smarter Navigation
Linux Magazine

Smarter Navigation

Zoxide, a modern version of cd, lets you navigate long directory paths with less typing.

time-read
4 minutos  |
#289/December 2024: Coding with AI
Through the Back Door
Linux Magazine

Through the Back Door

Cybercriminals are increasingly discovering Linux and adapting malware previously designed for Windows systems. We take you inside the Linux version of a famous Windows ransomware tool.

time-read
9 minutos  |
#289/December 2024: Coding with AI
Page Pulse
Linux Magazine

Page Pulse

Do you want to be alerted when a product is back in stock on your favorite online store? Do you want to know when a website without an RSS feed gets an update? With changedetection.io, you can stay up-to-date on website changes.

time-read
8 minutos  |
#289/December 2024: Coding with AI
Arco Linux
Linux Magazine

Arco Linux

ArcoLinux, an Arch derivative, offers easier installs while educating users about Arch Linux along the way.

time-read
5 minutos  |
#289/December 2024: Coding with AI
Ghost Coder
Linux Magazine

Ghost Coder

Artificial intelligence is increasingly supporting programmers in their daily work. How effective are these tools? What are the dangers? And how can you benefit from Al-assisted development today?

time-read
10+ minutos  |
#289/December 2024: Coding with AI
Zack's Kernel News
Linux Magazine

Zack's Kernel News

Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.

time-read
9 minutos  |
#289/December 2024: Coding with AI