Simple web scraping with Bash: Ski Report
Linux Magazine | #262/September 2022
With one line of Bash code, Pete scrapes the web and builds a desktop notification app to get the daily snow report.
Pete Metcalfe

While working on a small project recently, I was amazed at how much web scraping I could do with just one line of Bash. I used the text-based Lynx browser [1] and piped its output to a grep search. Figure 1 shows the one-line Bash example that scrapes the current snow depth from the Sunshine Village Snow Forecast web page.
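The text does not spell out the URL or the search term used in Figure 1, so the sketch below uses placeholders. It splits the one-liner into two small functions so that the filtering step can be tried on any text. The search string `snow depth` is an assumption about how the page labels the value. Check a Lynx dump of the real page for the exact wording.

```shell
#!/usr/bin/env bash
# Sketch of the "lynx -dump | grep" pattern.
# The search term is an assumed label; adjust it after inspecting a dump.

# Keep only the lines of plain text (read on stdin) that mention snow depth,
# ignoring upper/lower case.
filter_snow_depth() {
  grep -i 'snow depth'
}

# Dump a page as plain text with Lynx, then filter it.
# $1: page URL (supply the forecast page you actually want to scrape)
scrape_snow_depth() {
  lynx -dump "$1" | filter_snow_depth
}
```

Written as a single line with a placeholder URL, the same idea is `lynx -dump "https://www.example.com/forecast" | grep -i 'snow depth'`.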

In this article, I will introduce some techniques to easily scrape web pages, and then I will create a desktop notification script that provides the daily snow forecast.
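Here is one way a script might pass a result to the desktop. It is a minimal sketch, not the author's script. `notify_report` is a hypothetical name. The function assumes `notify-send`, which the libnotify-bin package provides on Debian/Ubuntu. When no graphical display is detected, it falls back to printing on the terminal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show a short report as a desktop notification.
# $1: notification title, $2: notification body
notify_report() {
  local title="$1" body="$2"
  # Use notify-send only if it is installed and a display appears to exist
  if command -v notify-send >/dev/null 2>&1 &&
     [ -n "${DISPLAY:-}${WAYLAND_DISPLAY:-}" ]; then
    notify-send "$title" "$body"
  else
    # Fallback for sessions without a graphical display (e.g., over SSH)
    printf '%s: %s\n' "$title" "$body"
  fi
}
```

For example, `notify_report "Ski Report" "sample text"` shows a pop-up in a desktop session. In a session without a display, it prints `Ski Report: sample text`.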

The Lynx Text Browser

For my Bash web scraping, I started out with command-line tools such as curl [2] combined with the html2text [3] utility. This technique works, but I found that the Lynx browser offers a one-step solution with slightly cleaner text output.
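The sketch below sets the two approaches side by side. The function names are hypothetical. It relies on html2text reading HTML from standard input when it is given no file argument, and on curl's `-s` flag, which silences the progress meter.

```shell
#!/usr/bin/env bash
# Two ways to turn a web page into searchable text.
# $1: page URL

# Two-step: download the raw HTML with curl, then strip it with html2text
page_text_curl() {
  curl -s "$1" | html2text
}

# One-step: Lynx fetches, renders, and dumps the page as plain text
page_text_lynx() {
  lynx -dump "$1"
}
```

Either function can be piped into grep, for example `page_text_lynx "$URL" | grep -i 'snow'`. Here `URL` stands for whatever page you choose.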

To install Lynx on Raspbian/Debian/Ubuntu, use:

sudo apt install lynx
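A scraping script can check for Lynx and any other tools it needs before it starts, rather than failing partway through. The guard below is a small sketch, and `require_cmds` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical guard: check that every named command is available.
# Prints each missing command to stderr; returns 0 if all exist, else 1.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, a script can start with `require_cmds lynx grep || exit 1`.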

The Lynx -dump option outputs a web page as plain text, with HTML tags and JavaScript stripped and HTML-encoded characters converted to ordinary text. Figure 2 shows that a Lynx dump can greatly clean up the original web page and make searching considerably easier.
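Two other Lynx options are useful alongside `-dump`. `-nolist` omits the numbered list of links that a dump normally appends. `-width` sets the line width used to format the dump, which can stop long rows from wrapping. In the sketch below, the function name and the width of 200 are illustrative choices. Saving the dump once means several searches can run without fetching the page again.

```shell
#!/usr/bin/env bash
# Save a cleaned-up text dump of a page so it can be searched repeatedly.
# $1: page URL, $2: output file
dump_page() {
  # -nolist: omit the trailing list of link references
  # -width=200: format the dump at 200 columns so long lines wrap less
  lynx -dump -nolist -width=200 "$1" > "$2"
}
```

A typical use is `dump_page "$URL" /tmp/snow.txt`, followed by as many searches as needed, such as `grep -i 'snow depth' /tmp/snow.txt`.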

Sometimes a simple grep search is all you need. In many cases, however, the matched text needs some further manipulation. The good news is that Bash, together with standard utilities such as sed, awk, and cut, offers a good selection of line and string manipulation tools.
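As an illustration, suppose a dump contains lines such as `Top: 180cm` and `Bottom: 95cm`. That sample text was made up for this sketch and is not taken from the real page. The hypothetical function below uses `grep -m 1 -F` to pick the first line that contains a literal label. It then uses Bash parameter expansion to cut away the label, remove spaces, and strip a trailing `cm` unit.

```shell
#!/usr/bin/env bash
# Sketch: extract a numeric value from dump text read on stdin.
# Assumes (hypothetically) lines shaped like "Top: 180cm".
# $1: label to search for, e.g. "Top:"
extract_depth() {
  local label="$1" line value
  # First line containing the label; -F matches the label literally
  line=$(grep -m 1 -F -- "$label")
  # Drop everything up to and including the label
  value=${line#*"$label"}
  # Remove all spaces, then a trailing "cm" unit
  value=${value// /}
  value=${value%cm}
  printf '%s\n' "$value"
}
```

With the sample text above, `extract_depth 'Top:'` prints `180` and `extract_depth 'Bottom:'` prints `95`.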
