While recently doing a small project, I was amazed by how much web scraping I could do with just one line of Bash. I used the text-based Lynx browser [1] and then piped the output to a grep search. Figure 1 shows the one-line Bash example that scrapes the current snow depth from the Sunshine Village Snow Forecast web page.
In this article, I will introduce some techniques to easily scrape web pages, and then I will create a desktop notification script that provides the daily snow forecast.
The Lynx Text Browser
For my Bash web scraping, I started out by looking at using command-line tools such as curl [2] with the htm12text [3] utility. This technique definitely works, but I found that using the Lynx browser offers a one-step solution with a slightly cleaner text output.
To install Lynx on Raspian/Debian/ Ubuntu, use:
sudo apt install lynx
The Lynx -dump option will output a web page to text with HTML tags, HTML encoding, and JavaScript removed. Figure 2 shows that a Lynx dump can greatly clean up the original web page and make searching considerably easier.
Sometimes a simple Bash grep search might be all that you need. However, there are many cases where some text manipulation is required. The good news is that Bash has a nice selection of line and string manipulation tools.
Esta historia es de la edición #262/September 2022 de Linux Magazine.
Comience su prueba gratuita de Magzter GOLD de 7 días para acceder a miles de historias premium seleccionadas y a más de 9,000 revistas y periódicos.
Ya eres suscriptor ? Conectar
Esta historia es de la edición #262/September 2022 de Linux Magazine.
Comience su prueba gratuita de Magzter GOLD de 7 días para acceder a miles de historias premium seleccionadas y a más de 9,000 revistas y periódicos.
Ya eres suscriptor? Conectar
Tracking your finances with plain text accounting Plain Numbers
If you're tired of tinkering with spreadsheets, using hledger and plain text accounting offers a simpler method for managing your finances without vendor lock-in
Dependency resolution with apt-get and apt Evolutionary Tale
Over the past 30 years, the apt family has played an important role in dependency resolution for Debian distros.
Cryptomining with Litecoin Traveling Lite
Although not as popular as headliners like Bitcoin and Ethereum, Litecoin is one of the oldest crytocurrencies, and it offers some useful features, such as dual-mining with Dogecoin.
Software Update SnoopGod
SnoopGod delivers an Ubuntu-based pentesting distribution with an emphasis on security education.
Kernel Trouble
This deep look at how intruders attack an out-of-date kernel should be enough to convince you of the need to stay vigilant.
Using Wake-on-LAN for a NAS backup Power Saver
Put your backup server to sleep when you don't need it and then wake it on demand using the Wake-on-LAN feature built into network adapters.
Time Travel
Mike Schilli uses a Go program to check whether a strategy for trading stocks is making gains or losses on the basis of historical price data.
URL filtering with Pi-hole Into the Funnel
Supporting browser plug-ins, network-based DNS blockers like Pi-hole help protect you against online tracking and unwanted content.
Artificial intelligence on the Raspberry Pi Learning Experience
You don't need a powerful computer system to use Al. We show what it takes to benefit from Al on the Raspberry Pi and what tasks the small computer can handle.
MakerSpace Manage your greenhouse with a Raspberry Pi Pico W Sheltered Growth
You can safely assign some greenhouse tasks to a Raspberry Pi Pico W, such as controlling ventilation, automating a heater, and opening and closing windows.