
Building An App That Helps You Catch More Fish

An alert system to notify you when and where to fish using data

Photo by Kelly Sikkema on Unsplash

I recently started getting into freshwater fishing. Anyone who’s been fishing knows it can be hit or miss, especially if you’re newer to the sport or are fishing in an unfamiliar location. Nobody wants to get "skunked". Fortunately, fish stocking has made this easier to avoid.

Fish stocking is the practice of raising fish in a hatchery and releasing them into bodies of water to supplement existing populations or create new populations where none exist. Stocking is usually carried out by state departments, and it also benefits recreational fishing by offering more abundant fish populations.

In the US, stocking updates are often published online by the state’s conservation or environmental departments. Usually, people flock to locations once they see they’ve been stocked and those spots will be heavily fished over the next few days.

That got me thinking. What if I built an app that could scrape the website and alert me anytime an update on a stocking occurred?

With this in mind, I decided to build a Python script and deploy it on Heroku to run on an automatic schedule and notify me via email anytime a new stocking update is identified. My goal is not necessarily to be first to deplete all the lakes of fish ASAP but simply to be aware of any updates instead of having to remember to check for them. This way, if I’m out and about and get a notification, I can swing by to do a bit of fishing and enjoy the time of day!

Below, I’ll walk you through how to build this app and how it works so you can build similar apps.

The Target Data

The updates I’ll be scraping are Connecticut’s Trout Stocking reports published by the CT Department of Energy and Environmental Protection (DEEP). The report is stored in tables in a PDF and hosted on their website.

Page 1 of the CT DEEP Trout Stocking Report. Note: extracting this data is permitted under the CT gov website’s policies

The date listed at the top of the report will serve as our indicator as to whether or not the report has been updated since we last scraped the PDF. Below that, each page has a long-running table that lists all of the specific water bodies, their locations, and details on when they were last stocked.

Reading in PDF Data

The first thing we’ll do is create a one-line text file, last_stocked_date.txt, that stores the last updated date of the PDF. We can set the date to 5/1/2021 in the file to begin with. This will allow us to read in the data and store it in a variable LAST_STOCK_DATE in our main program.
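As a minimal sketch (assuming the file sits in the same directory as the script), the one-time setup and read-in might look like this:

```python
# Seed the tracking file with a starting date (one-time setup).
with open("last_stocked_date.txt", "w") as f:
    f.write("5/1/2021")

def read_last_stock_date(path="last_stocked_date.txt"):
    """Return the stored date string, e.g. '5/1/2021'."""
    with open(path) as f:
        return f.read().strip()

LAST_STOCK_DATE = read_last_stock_date()
```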

Next, we’ll want to download and store the current PDF from the website. To check if the PDF file has been updated, we’ll use pdfplumber to open the file and extract all of the text from the first page only. Using some string manipulation, we’ll then extract the date from that text based on the substring "STOCKING UPDATE AS OF".
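A sketch of those steps is below. The download URL is a placeholder (not the real DEEP address), and the third-party requests and pdfplumber libraries are imported lazily so the date parsing can be used on its own:

```python
import re

def download_pdf(url, dest="trout_report.pdf"):
    # Placeholder URL; swap in the actual DEEP report link.
    import requests
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    with open(dest, "wb") as f:
        f.write(resp.content)
    return dest

def first_page_text(pdf_path):
    import pdfplumber  # third-party PDF text extraction
    with pdfplumber.open(pdf_path) as pdf:
        return pdf.pages[0].extract_text()

def extract_update_date(text, marker="STOCKING UPDATE AS OF"):
    """Pull the date that follows the marker, e.g.
    '...STOCKING UPDATE AS OF 5/7/2021...' -> '5/7/2021'."""
    match = re.search(marker + r"\s*([\d/]+)", text)
    return match.group(1) if match else None
```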

Checking For Recent Updates

Here we’ll initiate our main program and check whether the date from the PDF differs from the last date we scraped and stored. If so, the program will extract the data from the tables on the first 8 pages and store it in a dataframe. We only need to extract the first 8 pages, as these give us all locations that are stocked (anything beyond that is duplicate data).
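A rough sketch of the table extraction follows. The column names are assumptions for illustration (the real report headers may differ), and pdfplumber’s per-page extract_table output is flattened into one pandas dataframe:

```python
import pandas as pd

def tables_to_df(raw_tables):
    """Flatten per-page table output (a list of row lists per page)
    into a single dataframe. Column names here are illustrative
    assumptions, not the exact headers DEEP uses."""
    columns = ["waterbody", "town", "last_stocked"]
    rows = [row for table in raw_tables for row in table]
    return pd.DataFrame(rows, columns=columns)

def extract_report_tables(pdf_path, n_pages=8):
    import pdfplumber
    with pdfplumber.open(pdf_path) as pdf:
        # Pages beyond 8 repeat earlier locations, so stop there.
        return [page.extract_table() for page in pdf.pages[:n_pages]]
```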

Next, we will create a new column in our dataframe last_update that only extracts the last time a particular location was stocked and converts this to a datetime type. This allows us to filter the dataframe by those locations that have been stocked in the past 3 days for example.

Additionally, we will want to remember to update our text file storing our last stocked date. This will ensure that the next time our script runs, it will continue on processing only if a new date is identified.
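The write-back itself is small, sketched here with the same assumed filename:

```python
def save_last_stock_date(date_str, path="last_stocked_date.txt"):
    # Overwrite the tracking file so the next run only proceeds
    # when a newer date appears in the PDF.
    with open(path, "w") as f:
        f.write(date_str)
```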

Sending Out an Email Alert

The next thing to do is send out an email to alert us that fish have been stocked and where they’ve been stocked.

To do that, we’ll need to set up environment variables first to collect our Gmail account, credentials, and recipient details. Since sending emails requires providing credentials, we do not want to store such sensitive data in our script directly. Instead, we’ll create an environment file to store this data. This will work for running the script locally, but when using a cloud solution such as Heroku for live deployment, we’ll also need to set these environment variables as Config Vars.
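For example, a local environment file might look like this (the variable names are illustrative; on Heroku the same keys would be added as Config Vars, e.g. via heroku config:set GMAIL_USER=...):

```
# .env -- keep this file out of version control
GMAIL_USER=you@gmail.com
GMAIL_PASSWORD=your-app-password
RECIPIENT=you@example.com
```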

Next, we can load in those environment variables needed.

We will also want to convert our filtered dataframe of stocked locations from before to a string so that we can include it in the body of our email. After that, we can piece everything together for the email, choosing any subject line we want.
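A sketch of the email assembly and send using the standard library’s EmailMessage and smtplib (the environment variable names are the illustrative ones from earlier, and the subject line is just an example):

```python
import os
import smtplib
from email.message import EmailMessage

def build_email(body_text, subject="New Trout Stocking Update!"):
    """Assemble the alert email; sender and recipient come from
    the environment variables set up earlier."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = os.environ.get("GMAIL_USER", "")
    msg["To"] = os.environ.get("RECIPIENT", "")
    msg.set_content(body_text)
    return msg

def send_email(msg):
    # Gmail's SMTP-over-SSL endpoint; requires credentials that
    # Gmail will accept (see the security note below).
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(os.environ["GMAIL_USER"], os.environ["GMAIL_PASSWORD"])
        server.send_message(msg)

# Example body: the filtered dataframe rendered as plain text,
# e.g. body = recent_df.to_string(index=False)
```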

With that set, we should be able to run the script and receive an email similar to this! Note – when using Google’s Gmail, you may get a security notification the first time and may need to allow access for less secure apps to send emails using your credentials.

Example of a successful email alert sent by the program (Provided by Author)

Putting Everything Together

We now have a fully built alert system for tracking fish stocking updates! The full code also includes error handling to catch and notify us of any potential issues that may occur if the PDF file format or data changes in a way that can no longer be captured by the program’s code.
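One simple pattern for that error handling is a small wrapper that catches any exception and forwards the traceback to a notifier, such as the email sender:

```python
import traceback

def run_safely(task, notify):
    """Run the scraping task; on failure, pass the traceback to
    a notifier callback instead of crashing silently."""
    try:
        return task()
    except Exception:
        notify("Stocking alert script failed:\n" + traceback.format_exc())
        return None
```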

Automating the Script

From here, the final step is automating the alert system to run on its own. There are many ways you can do this but there are two main approaches:

  1. Run it locally and schedule it through software like Task Scheduler (Windows) or cron (Mac or Linux).
  2. Deploy the program to a vendor’s cloud platform such as Heroku or AWS and use their tools to schedule it.

The first approach is easy to set up and free, but it requires that the machine be powered on for the script to run. The second approach hosts the script on a server in the cloud, so it can run anytime, but it may be more complex and can cost money to set up.
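For the local route on Mac or Linux, a single crontab entry is enough to run the script hourly (the script path here is a placeholder):

```
# Run the alert script at the top of every hour (add via crontab -e)
0 * * * * /usr/bin/python3 /path/to/fishing_alert/main.py
```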

For our purposes, I’ve deployed the app on Heroku using a free tier membership and have scheduled the script to run every hour. This is enough to run each month for free without going over the free hours or credits Heroku provides for its basic membership. Deploying on Heroku will require a few additional steps in our main program as well as additional files such as requirements.txt and runtime.txt files.
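For reference, the Heroku-specific files might look roughly like this. These are three separate files, and the Python version and package list are illustrative, not exact:

```
# Procfile
worker: python main.py

# runtime.txt
python-3.9.7

# requirements.txt
pdfplumber
pandas
requests
```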

If you prefer to host the program locally, there’s a ton of great tutorials out there on how to use a tool like Task Scheduler.

Automate your Python Scripts with Task Scheduler


I hope you found this tutorial useful and enjoyable. Please reach out if you have any questions or feedback. And if you’re interested in a more in-depth guide for deployment on Heroku, please feel free to comment below!

All of the code for this project can be found in this GitHub Repository, which you can download and start running right away, either locally or on Heroku if you have an account.


Related Articles