Automating the Christmas Bird Count

Modern tech applied to the longest-running community science bird project

John Hurley
Towards Data Science


Photo by Ray Hennessy on Unsplash

The Christmas Bird Count has been a tradition for 120 years. Started by ornithologist Frank M. Chapman in 1900, it was intended to encourage conservation by counting birds rather than hunting them. One thing that hasn’t changed since then is that it involves a lot of manual paperwork.

I had the privilege of participating in my first annual Christmas Bird Count (CBC) in San Jose, CA this past winter (2019). One thing that surprised me was the number of paper forms that had to be filled out by each team. These were turned in to the CBC compiler for the county, who then entered this data manually into the Audubon site. Given that our count circles are in the heart of Silicon Valley, we should be able to improve this state of affairs.

The long-term advantage is that a new generation of birders is already familiar with much of this technology. Apps such as eBird, Birdathon, Merlin and Sibley are rapidly gaining acceptance. Fillable PDFs save time for both the participants and the compiler, and they are easier to read than handwritten forms.

In addition, having code that can quickly manipulate the electronic data means that checklists can be sorted in multiple ways, data can be collected and processed more rapidly, and time currently spent on manual tasks can be spent birding instead.

Goals

The ultimate goal of this project is to allow a CBC team to automate the generation of the required forms using eBird data, and have them submitted electronically to Audubon.

We are using some of these services experimentally for the San Jose (CASJ), Calero-Morgan Hill (CACR) and Palo Alto (CAPA) count circles during the upcoming 2020 Christmas Bird Count.

This year in particular, the COVID-19 pandemic provides additional incentive for reducing physical paperwork. In the past, forms could be handed in at the traditional count dinner, but that won’t be possible this year.

Tech is a tool, not an end in itself, so we have to keep in mind that the primary activities of the day are birding and collecting scientific data. For many birders, the paper forms are more familiar and convenient, and it is important that people can contribute however they like. The workflow supports a mix of automated and manual input.

The GitHub repository will initially contain some fillable PDFs, because the code itself is still in flux. Based on the results of using the services with local count circles, I will clean up the Python code and publish it in early 2021.

Fillable PDFs

One of the simplest ways to automate is to allow data entry on a computer rather than on a paper form. This makes the data more readable and able to be processed electronically.

A fillable PDF is the electronic equivalent of a paper form. Using Adobe Acrobat, one can take a PDF file and scan it for data entry fields. Additional fields can be manually added. The final result is a new PDF that can be printed out normally. If it is opened in a browser or PDF reader, the user can type into the fields and then save or print.

The official Audubon Rare Bird form and a CASJ specific Eagle log have both been converted to fillable PDFs and are available in the GitHub repo.

Services

The overall project is split into a set of semi-independent services that are aimed at different audiences. This is a summary of the services provided by this project.

Note: The single most useful service for sector leaders and compilers is the “Merge” service, which takes species/count lists from individuals and teams and aggregates the results into a summary with subtotals.

Services before, during and after count day

The services are divided into three groups: those for use before Count Day, during Count Day, and after Count Day. The services can be used singly or in combination, and in general the output of a service is an Excel spreadsheet or a PDF file (for printing or emailing).

Each service is a Jupyter notebook that requires an environment running Python 3.7 or later. This limits who can run the services directly, but I am hoping to develop them into web services that will be much more accessible. Not everyone needs to be a programmer: the outputs from the services are mostly Excel spreadsheets, and the fillable PDFs can be used by anyone familiar with a web browser. On count day, the recorder for each team will be using the usual Cornell eBird app on their mobile device.

Service-Merge

The Merge service takes a number of individual sector (or team) tally sheets and creates a single spreadsheet with a grand total for each species and columns for each sector total. The names of the sector columns are derived from the names of the input tally sheets.

Summary for multiple sectors

There are several hidden columns, among them “Rare” and “Taxon Order”. The “Taxon Order” column is important for sorting in various ways and currently uses the Clements/eBird order. Both AOU and IOC taxon orders will be added soon, along with the order used by the Audubon compiler entry system, once I can figure out what that is. It largely agrees with AOU, but there are some differences, and its taxonomy has some obsolete names, e.g. Western Scrub-Jay instead of California Scrub-Jay.

The “Rare” column is carried along from the original count checklist created by the compiler. In this context “Rare” means that the participant must fill out a Rare Bird Form to submit to the regional Audubon office.

The most important thing to realize about this service is that it is actually very simple at its core: it takes a number of tally sheets and produces a summary tally sheet. The only essential requirement for each input tally sheet is that it has “CommonName” and “Total” columns, so it can be a CSV, Excel or Numbers file.

By cascading this service multiple times, a summary for the whole count circle can be produced. First, combine team results for each team in a sector into a sector summary. Next, combine each of the sector summaries using the merge service into the summary for the count circle.
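The merge-and-cascade idea can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the project's actual code: the function names (read_tally, merge_tallies, as_tally), the sheet names and the counts are all made up for the example, and rows are sorted alphabetically here rather than by taxon order.

```python
import csv
import io
from collections import defaultdict


def read_tally(csv_text):
    """Return {CommonName: Total} from CSV text that has those two columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["CommonName"]: int(row["Total"] or 0) for row in reader}


def merge_tallies(tallies):
    """Merge {sheet_name: {species: count}} into one row per species.

    Each row holds a grand total plus one column per input sheet.
    """
    summary = defaultdict(dict)
    for sheet_name, tally in tallies.items():
        for species, count in tally.items():
            summary[species][sheet_name] = summary[species].get(sheet_name, 0) + count
    rows = []
    for species in sorted(summary):  # alphabetical, not taxonomic
        per_sheet = {name: summary[species].get(name, 0) for name in tallies}
        rows.append({"CommonName": species, "Total": sum(per_sheet.values()), **per_sheet})
    return rows


def as_tally(rows):
    """Collapse a merged summary back to {species: total} so it can be merged again."""
    return {row["CommonName"]: row["Total"] for row in rows}


# Illustrative data only: two teams in one sector, then two sectors in a circle.
team_a = read_tally("CommonName,Total\nBald Eagle,1\nWestern Bluebird,12\n")
team_b = read_tally("CommonName,Total\nWestern Bluebird,5\nAcorn Woodpecker,7\n")
sector_1 = merge_tallies({"team_a": team_a, "team_b": team_b})

sector_2 = [{"CommonName": "Bald Eagle", "Total": 2}]
circle = merge_tallies({"sector_1": as_tally(sector_1), "sector_2": as_tally(sector_2)})
for row in circle:
    print(row)
```

Because the output of one merge has the same “CommonName” and “Total” columns as its inputs, the same function serves at both the sector and the circle level.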

Service-ProcessEBird

The ProcessEBird service pulls all checklists filed on count day for the given county and filters using the participant list to limit the results. The output is an Excel spreadsheet with multiple sheets: counts, rarities, locations, team efforts (miles/hours), team details, and a list of all filed checklists.
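As a rough sketch of the pull-and-filter step, the code below builds a request for a region's checklists on a given date and then keeps only those filed by known participants. It reflects my understanding of the eBird API 2.0 (the product/lists endpoint and the X-eBirdApiToken header) and should be checked against the official documentation. The userDisplayName field name and the helper function names are assumptions for illustration, not the service's actual code.

```python
import json
import urllib.request

EBIRD_API = "https://api.ebird.org/v2"


def checklist_feed_url(region_code, year, month, day):
    """Build the URL for the checklists submitted in a region on one date."""
    return f"{EBIRD_API}/product/lists/{region_code}/{year}/{month}/{day}"


def fetch_checklists(region_code, year, month, day, api_key):
    """Download the checklist feed (needs network access and a valid API key)."""
    request = urllib.request.Request(
        checklist_feed_url(region_code, year, month, day),
        headers={"X-eBirdApiToken": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def filter_by_participants(checklists, participants):
    """Keep only checklists whose observer name is on the participant list."""
    wanted = {name.strip().lower() for name in participants}
    return [c for c in checklists
            if c.get("userDisplayName", "").strip().lower() in wanted]


# Made-up records shaped like the feed, so the filter runs without a network call.
sample = [
    {"subId": "S1", "userDisplayName": "Pat Example"},
    {"subId": "S2", "userDisplayName": "Someone Else"},
]
kept = filter_by_participants(sample, ["pat example"])
print([c["subId"] for c in kept])
```

Matching on display names is fragile, since people may register in eBird under a different name than the one on the sign-up sheet, so a real participant list would likely need an explicit eBird name column.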

The first sheet is the list of species and counts and will be merged later into the full circle report.

Sheet 1 of the Summary report

Species that have a non-zero count are highlighted in green, and the sheet can be filtered to show only those species for entry into the Audubon site.

The Rarities sheet shows a count for each species that the compiler marked as rare (i.e. needing a Rare Bird Form). This is broken out by eBird location.

Subset of the rarities sheet

The locations sheet shows details about the locations pulled from eBird.

Locations details

The Team Efforts sheet pulls data from eBird for time and distance.

Team Efforts Sheet
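Totalling effort per team is a simple aggregation once the checklists are in hand. The sketch below assumes each checklist record carries a duration in hours and a distance in kilometers (named durationHrs and effortDistanceKm here, which I believe match the eBird checklist fields but have not verified) plus a hypothetical team label added from the participant list. The sample records are made up.

```python
from collections import defaultdict

KM_PER_MILE = 1.609344


def team_efforts(checklists):
    """Sum hours and miles per team from checklist effort fields."""
    totals = defaultdict(lambda: {"hours": 0.0, "miles": 0.0})
    for checklist in checklists:
        team = totals[checklist["team"]]
        # Missing or empty effort fields (e.g. stationary counts) count as zero.
        team["hours"] += checklist.get("durationHrs") or 0.0
        team["miles"] += (checklist.get("effortDistanceKm") or 0.0) / KM_PER_MILE
    return {name: {k: round(v, 2) for k, v in t.items()} for name, t in totals.items()}


# Made-up checklists for illustration only.
sample = [
    {"team": "Team 1", "durationHrs": 1.5, "effortDistanceKm": 1.609344},
    {"team": "Team 1", "durationHrs": 0.5},  # stationary count, no distance
    {"team": "Team 2", "durationHrs": 2.0, "effortDistanceKm": 3.218688},
]
efforts = team_efforts(sample)
print(efforts)
```

This assumes one checklist per outing per team. If several team members file separate checklists for the same walk, the totals would need de-duplicating first.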

The information in each of these sheets can be used to fill in data for Audubon on the compiler site. At some point in the future, I would hope that a file like this could be sent directly to Audubon and automatically tallied.

Service-EMailToContacts

The EMailToContacts service reads a saved email and creates an initial contacts database with the names and emails extracted from the email. Audubon needs names, email, phone and city at a minimum for all participants. The contact information is also used by the Rare Bird Form service to pre-populate the relevant fields on the form.

There is still some amount of work on the part of the sector leaders to collect additional information from the participants such as city of residence and phone number. A better long term solution might be a standardized registration page that could collect this information.
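The extraction step can be approximated with Python's standard email package. This is a minimal sketch that assumes the saved email is a plain RFC 822 style text file and that participants appear in the From, To or Cc headers. The sample message and the contacts_from_email name are invented for the example.

```python
from email import message_from_string
from email.utils import getaddresses

# Made-up message for illustration only.
SAVED_EMAIL = """\
From: Sector Leader <leader@example.org>
To: Pat Example <pat@example.org>, "Lee, Sam" <sam@example.org>
Cc: robin@example.org
Subject: Count day logistics

See you at 7 AM.
"""


def contacts_from_email(raw_text):
    """Return one contact dict per unique address found in the address headers."""
    message = message_from_string(raw_text)
    headers = []
    for field in ("From", "To", "Cc"):
        headers.extend(message.get_all(field, []))
    contacts = {}
    for name, address in getaddresses(headers):
        if address:
            # Phone and city start empty; they are collected from participants later.
            contacts.setdefault(
                address.lower(),
                {"name": name, "email": address.lower(), "phone": "", "city": ""},
            )
    return list(contacts.values())


contacts = contacts_from_email(SAVED_EMAIL)
for contact in contacts:
    print(contact)
```

Addresses without a display name, like the Cc entry here, come back with an empty name, which is one of the gaps a sector leader would still have to fill in by hand.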

Service-Parse

The Parse service reads a tally sheet and creates a fully populated spreadsheet with standardized species names and annotations such as Rare and taxonomic order. It accepts many formats: PDF, Excel, Word, CSV, and plain text. This service is useful for standardizing tally sheets and enriching the fields available for sorting and printing.

This service was one of the most fun to write, as I used various natural language processing (NLP) techniques to detect species names even in the presence of misspellings, abbreviations, etc.
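To give a flavor of tolerant name matching, here is a small sketch based on difflib from the standard library. It is a stand-in, not the NLP approach the service actually uses, and it only handles misspellings, not abbreviations. The reference list, the standardize_name function and the 0.8 cutoff are choices made for this example.

```python
import difflib

# Tiny illustrative reference list; a real run would load the full checklist.
REFERENCE_SPECIES = [
    "Acorn Woodpecker",
    "Anna's Hummingbird",
    "Bald Eagle",
    "California Scrub-Jay",
    "Western Bluebird",
]
_LOOKUP = {name.lower(): name for name in REFERENCE_SPECIES}


def standardize_name(raw_name, cutoff=0.8):
    """Return the closest reference species name, or None if nothing is close."""
    matches = difflib.get_close_matches(
        raw_name.strip().lower(), list(_LOOKUP), n=1, cutoff=cutoff
    )
    return _LOOKUP[matches[0]] if matches else None


for raw in ["Califrnia Scrub-Jay", "bald eagel", "Ana's Hummingbrd", "xyzzy"]:
    print(raw, "->", standardize_name(raw))
```

A higher cutoff reduces false matches between similar names at the cost of rejecting more typos, so anything returned as None is best flagged for manual review rather than silently dropped.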

An example of the output from this service is shown below.

Output from processing the checklist for CAHF

The main features of this tally sheet are the species in taxonomic order with the species group. If the compiler designated a species as “Rare” (i.e. needing a Rare Bird Form), the name is shown in bold. Additional fields can be used to show the difficulty of finding a particular species, Adult/Immature counts (particularly for eagles) or different morphs. Any of these can be shown or hidden for printed tally sheets.

Service-RareBird

This service works in conjunction with the EMailToContacts service and the ProcessEBird service. For each species that needs a writeup (“rare”), data pulled from eBird and the contacts database are used to partially populate a copy of the fillable rare bird form.

There is some additional data that still needs to be filled in manually, but the species and contact data is already filled in.
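The pre-population step amounts to joining a rare sighting with the matching contact record to produce the values for the form fields. The sketch below stops short of writing the PDF, which needs a PDF library. Every field name in it (Species, ObserverName and so on) is hypothetical, and the real form's field names would have to be read from the PDF itself.

```python
def rare_bird_form_fields(sighting, contacts):
    """Build a {field_name: value} dict for one partially filled form."""
    observer = contacts.get(sighting["observer_email"], {})
    return {
        "Species": sighting["species"],
        "Number": str(sighting["count"]),
        "Location": sighting["location"],
        "ObserverName": observer.get("name", ""),
        "ObserverEmail": sighting["observer_email"],
        "ObserverPhone": observer.get("phone", ""),
        # Left blank on purpose: the observer writes the description by hand.
        "Description": "",
    }


# Made-up contact database and sighting for illustration only.
contacts = {"pat@example.org": {"name": "Pat Example", "phone": "555-0100"}}
sighting = {
    "species": "Bald Eagle",
    "count": 1,
    "location": "Example Reservoir",
    "observer_email": "pat@example.org",
}
fields = rare_bird_form_fields(sighting, contacts)
print(fields)
```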

Service-Weather

This service is described more fully in my article “Weather or not…”. It produces a summary weather report for the day at 7AM, 11AM, 1PM and at the time it is run. It uses data from the nearest weather station and can be helpful when filling in forms that need weather information.
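One way to build such a summary is to take a station's readings for the day and pick the one closest to each report time. The sketch below shows only that selection step. The readings, the date and the function names are invented for illustration, and it does not fetch real weather data or choose a station.

```python
from datetime import datetime


def report_times(run_time, hours=(7, 11, 13)):
    """Return the fixed report times for the run date, plus the run time itself."""
    fixed = [run_time.replace(hour=h, minute=0, second=0, microsecond=0) for h in hours]
    return fixed + [run_time]


def nearest_reading(readings, when):
    """Pick the reading whose timestamp is closest to `when`."""
    return min(readings, key=lambda r: abs(r["time"] - when))


# Made-up hourly readings for an arbitrary day, for illustration only.
day = datetime(2020, 1, 1)
readings = [{"time": day.replace(hour=h), "temp_f": 40 + h} for h in range(6, 18)]

run_time = day.replace(hour=15, minute=20)
summary = [
    (t.strftime("%H:%M"), nearest_reading(readings, t)["temp_f"])
    for t in report_times(run_time)
]
print(summary)
```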

Caveats

This code assumes that you are using Python 3.7 or later and have an environment where you can run a Jupyter notebook. The CSV files and Excel files can be read and written by Apple Numbers, Microsoft Excel and Google Sheets. To run the ProcessEBird service you will need an API key for the eBird API.

Most of the reports can be run for a date other than today, but the weather report is only valid when it is run on the actual day.

Conclusion

My hope is that these tools will help streamline the process for the Christmas Bird Count and help collect more accurate information with less effort. Realistically, full automation will take years, but we can get there in steps by making incremental improvements.

Questions, suggestions or requests are welcomed.
