Building Web App For Canada-US Border Crossing Wait Time Forecast

Peng Wang
Towards Data Science
5 min read · Nov 25, 2019

Black Friday is just around the corner. Many Canadian shoppers will drive south and join the shopping extravaganza. If you plan to cross the border on November 29, you will want to plan ahead to avoid long delays. To help with this, we built a web app that forecasts border crossing wait times for the next 7 days.

Image by Imre Tömösvári on Unsplash

Here is the workflow of the project:

  1. Retrieve border crossing wait times from the Cascade Gateway API
  2. Build a predictive model for future crossings using Python + XGBoost
  3. Develop a web app with a REST API using Flask, HTML, CSS, and Ajax
  4. Deploy the web app on AWS
  5. Refresh the data and rebuild the predictive model daily

Link to the web app: http://35.164.32.109:5000/

It does not have a permanent address because it’s a low-cost AWS instance. Currently the forecast is only available for southbound traffic at the Peace Arch crossing (between British Columbia and the state of Washington) for the next 7 days.

One thing to note: the volume of cars crossing the border could be used as the prediction target instead of wait time. Volume should be more predictable because it is less affected by changes in border control, e.g. tougher security checks, the number of lanes/posts open, and occasional on-site construction. However, a commuter is more interested in the expected delay than in the total number of vehicles crossing the border.

Data Gathering

Border crossing wait time data is retrieved from the Cascade Gateway data warehouse. It has a well-documented API reference and is quite simple to use.

We extract hourly crossing wait times since 01/01/2014 for Peace Arch (BC-Washington) southbound cars. Here are some example records:

Group Starts,Avg - Delay (Peace Arch)
2018-01-01 07:00:00,0.3
2018-01-01 08:00:00,1.6
2018-01-01 09:00:00,1.3
2018-01-01 10:00:00,18.8
2018-01-01 11:00:00,37.8
2018-01-01 12:00:00,41.4
2018-01-01 13:00:00,49.1

Each day, the previous day’s border crossing wait time records are added to the model training data.
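
As a rough sketch of that daily append step, the snippet below parses records in the CSV layout shown above and merges a new batch into the existing history, keyed by timestamp so that re-running the job does not duplicate hours. The function names are hypothetical, the two batches are simply the example records split in two with an overlap, and the actual download from the Cascade Gateway API is not shown.

```python
import csv
import io
from datetime import datetime

TIME_COLUMN = "Group Starts"
DELAY_COLUMN = "Avg - Delay (Peace Arch)"


def parse_records(csv_text):
    """Parse CSV text into a {timestamp: delay} dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        datetime.strptime(row[TIME_COLUMN], "%Y-%m-%d %H:%M:%S"): float(row[DELAY_COLUMN])
        for row in reader
    }


def append_records(history, new_records):
    """Merge new records into history; the newer value wins on duplicate hours."""
    merged = dict(history)
    merged.update(new_records)
    return dict(sorted(merged.items()))


HISTORY_CSV = """Group Starts,Avg - Delay (Peace Arch)
2018-01-01 07:00:00,0.3
2018-01-01 08:00:00,1.6
2018-01-01 09:00:00,1.3
2018-01-01 10:00:00,18.8
"""

# Overlaps with the history at 10:00 to show that duplicates are collapsed.
NEW_CSV = """Group Starts,Avg - Delay (Peace Arch)
2018-01-01 10:00:00,18.8
2018-01-01 11:00:00,37.8
2018-01-01 12:00:00,41.4
2018-01-01 13:00:00,49.1
"""

merged = append_records(parse_records(HISTORY_CSV), parse_records(NEW_CSV))
print(len(merged), "hourly records after the merge")
```

Keying on the timestamp keeps the step idempotent, which is convenient when a scheduled job is occasionally run twice.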

Model Training

From the border crossing date and time, we extract year, month, day of month, day of week, and hour as training features.

data['HourOfDay'] = data['Date_time'].dt.hour        
data['Year'] = data['Date_time'].dt.year
data['Month'] = data['Date_time'].dt.month
data['DayOfMonth'] = data['Date_time'].dt.day
data['DayOfWeek'] = data['Date_time'].dt.dayofweek

BC and WA holidays are also added as features.

import holidays
import pandas as pd

# Get Canadian - BC holidays
# (newer releases of the holidays package expose this as
#  holidays.country_holidays('CA', subdiv='BC'))
ca_holidays = holidays.CountryHoliday('CA', prov='BC', state=None)
# Look up which Canadian holiday, if any, falls on each date
data['Holiday_CA'] = [ca_holidays.get(x) for x in data['Date_time']]
# Treat an observed holiday the same as the regular one
# (raw string + regex=True so the pattern is treated as a regex on any pandas version)
data['Holiday_CA'] = pd.Series(data['Holiday_CA']).str.replace(r" \(Observed\)", "", regex=True)
# One-hot encode the holiday names
data = pd.get_dummies(data, columns=['Holiday_CA'])
# Get US - WA holidays and process them the same way
us_holidays = holidays.CountryHoliday('US', prov=None, state='WA')
data['Holiday_US'] = [us_holidays.get(x) for x in data['Date_time']]
data['Holiday_US'] = pd.Series(data['Holiday_US']).str.replace(r" \(Observed\)", "", regex=True)
data = pd.get_dummies(data, columns=['Holiday_US'])

We train our model using the XGBoost algorithm and evaluate its performance using RMSE (root mean square error).

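import xgboost as xgb
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

n_iter = 48                         # number of hyperparameter combinations to try
tscv = TimeSeriesSplit(n_splits=4)  # folds that respect time order
xgb_regressor = xgb.XGBRegressor(random_state=29, n_jobs=-1)
# xgb_parameters is the hyperparameter grid defined below
xgb_grid_search = RandomizedSearchCV(xgb_regressor,
                                     xgb_parameters,
                                     n_iter=n_iter,
                                     cv=tscv,
                                     scoring='neg_mean_squared_error',  # RMSE = sqrt(-score)
                                     verbose=1,
                                     n_jobs=-1,
                                     random_state=50)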
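
The point of TimeSeriesSplit, as opposed to a shuffled K-fold, is that each fold is trained only on observations that come before the ones it is scored on. The sketch below is a plain-Python illustration of that expanding-window idea, not scikit-learn's actual implementation; the function name is hypothetical and the 10-sample size is chosen only to keep the output readable.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices); training data always precedes test data."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("Not enough samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))                         # everything before the test block
        test = list(range(test_start, test_start + test_size))  # the next block in time
        yield train, test


for train, test in expanding_window_splits(n_samples=10, n_splits=4):
    print("train:", train, "test:", test)
```

Each successive fold trains on a longer history, which mirrors how the model is used in practice: it only ever sees the past when forecasting.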

We can define a grid of XGBoost hyperparameter ranges, and randomly sample from the grid. The set of parameters that produces the best result is selected as the final model.

xgb_parameters = {'objective': ['reg:squarederror'],
                  'n_estimators': [80, 100, 120],
                  'learning_rate': [0.01, 0.1, 0.5],
                  'gamma': [0, 0.01, 0.1],
                  'reg_lambda': [0.5, 1],
                  'max_depth': [3, 5, 10],
                  'subsample': [0.5, 1.0],
                  'colsample_bytree': [0.5, 0.7, 1],
                  'seed': [0]
                  }
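
To give a feel for how much of this grid a randomized search covers, here is a small standard-library sketch that enumerates every combination and draws 48 of them without replacement. It only illustrates the sampling idea: the helper name is hypothetical, and the seed used here will not reproduce the combinations that RandomizedSearchCV itself picks.

```python
import itertools
import random

param_grid = {
    'objective': ['reg:squarederror'],
    'n_estimators': [80, 100, 120],
    'learning_rate': [0.01, 0.1, 0.5],
    'gamma': [0, 0.01, 0.1],
    'reg_lambda': [0.5, 1],
    'max_depth': [3, 5, 10],
    'subsample': [0.5, 1.0],
    'colsample_bytree': [0.5, 0.7, 1],
    'seed': [0],
}


def sample_candidates(grid, n_iter, seed):
    """Return n_iter distinct parameter dicts drawn from the grid, plus the grid size."""
    keys = list(grid)
    all_combinations = list(itertools.product(*(grid[key] for key in keys)))
    rng = random.Random(seed)
    chosen = rng.sample(all_combinations, n_iter)  # sampling without replacement
    return [dict(zip(keys, values)) for values in chosen], len(all_combinations)


candidates, grid_size = sample_candidates(param_grid, n_iter=48, seed=50)
print(f"{len(candidates)} of {grid_size} combinations sampled")
```

With this grid there are 972 possible combinations, so 48 iterations evaluate roughly 5% of them, which is the trade-off a randomized search makes against an exhaustive grid search.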

Wait time records of the last 7 days in the dataset are used for validation. Here is an example comparing real border crossing delays with our predictions for 08/25 to 08/31/2018.
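
A minimal sketch of that hold-out scheme, assuming hourly records sorted by time: the last 7 days are set aside for validation and RMSE is computed on them. The helper names are hypothetical and the records below are placeholders with dummy values, not real wait times.

```python
import math
from datetime import datetime, timedelta


def split_last_days(records, days=7):
    """Split time-sorted (timestamp, value) pairs into training and validation parts."""
    cutoff = records[-1][0] - timedelta(days=days)
    train = [record for record in records if record[0] <= cutoff]
    valid = [record for record in records if record[0] > cutoff]
    return train, valid


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Placeholder data: 10 days of hourly timestamps with a dummy value of 0.0.
start = datetime(2018, 8, 22)
records = [(start + timedelta(hours=h), 0.0) for h in range(240)]

train, valid = split_last_days(records, days=7)
print(len(train), "training rows,", len(valid), "validation rows")

# Errors of -1 and 7 give sqrt((1 + 49) / 2) = 5.0
print(rmse([10.0, 20.0], [11.0, 13.0]))
```

Because the search above is scored with 'neg_mean_squared_error', the cross-validated RMSE of the best candidate can be recovered as math.sqrt(-xgb_grid_search.best_score_) once the search has been fitted.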

Finally, we can save our model and make it ready for generating forecasts for the future 7 days.
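
To forecast the next 7 days, the saved model needs one feature row per future hour. The sketch below builds those rows with the same date and time features used in training; the function name is hypothetical and the holiday columns are left out for brevity, so it is not a drop-in input for the trained model.

```python
from datetime import datetime, timedelta


def build_future_features(start, days=7):
    """Return one feature dict per hour for `days` days starting at `start`."""
    rows = []
    for hour_offset in range(days * 24):
        ts = start + timedelta(hours=hour_offset)
        rows.append({
            "Date_time": ts,
            "HourOfDay": ts.hour,
            "Year": ts.year,
            "Month": ts.month,
            "DayOfMonth": ts.day,
            "DayOfWeek": ts.weekday(),  # Monday=0, same convention as pandas dt.dayofweek
        })
    return rows


future = build_future_features(datetime(2019, 11, 25))
print(len(future), "hourly rows from", future[0]["Date_time"], "to", future[-1]["Date_time"])
```

Seven days of hourly rows gives 168 inputs, one prediction per hour of the forecast window.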

Flask Web App

Now that the predictive model is built, we are ready to expose it through a web app. Flask is a micro web framework that is popular for building web applications in Python. It is lightweight because it does not require particular tools or libraries to get started. It is designed to make getting started quick and easy, with the ability to scale up to complex applications.
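
For readers new to Flask, here is a minimal sketch of what a forecast endpoint could look like. It is not the app's actual code: the route name /api/forecast, the load_forecast helper, and the JSON field names are all assumptions, and the returned row is a placeholder rather than a real prediction.

```python
from flask import Flask, jsonify

app = Flask(__name__)


def load_forecast():
    """Hypothetical loader: would return the rows produced by the daily model run."""
    return [{"date_time": "2019-11-29 10:00:00", "wait_time": 0.0}]  # placeholder values


@app.route("/api/forecast")
def forecast():
    # Serve the forecast as JSON so the page can fetch it with an Ajax call
    return jsonify({"forecast": load_forecast()})


# Exercise the endpoint without starting a server
with app.test_client() as client:
    response = client.get("/api/forecast")
    print(response.status_code, response.get_json())
```

Calling app.run(host="0.0.0.0", port=5000) would serve this on port 5000 with Flask's development server, and keeping the model output behind a small JSON endpoint lets the HTML page stay static.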

AWS Setup

There are plenty of resources on how to set up an AWS EC2 instance, e.g. the AWS user guide and the fast.ai reference. We use an Amazon Linux 2 AMI on a t2.micro instance, which has 1 vCPU and 1 GB of memory, because it’s cheap. Just remember to allow HTTP, SSH, and port 5000 (used to access our web app) in the security group.

Environment Setup and Package Install

After setting up the AWS EC2 instance, clone our GitHub project:

git clone https://github.com/wangpengcn/border-crossing-delay-forecast-web-app-flask.git border_forecast

Then, run the install script to install Python, create a virtual environment, install the necessary packages, and schedule the daily model rebuild:

./install.sh

Let’s take a look at what this script does.

Define some environment variables.

FOLDER_PATH="/home/ec2-user/border_forecast/"
FOLDER_NAME="border_forecast"
VENV_NAME="venv_border_forecast"

Update the system packages with yum and install Python 3.6.

sudo yum update
sudo yum install python36-pip

Note: if you are not sure which Python packages are available to install, run sudo yum list | grep python3 to check.

Next, install virtualenv and create a virtual environment, which has its own site directories and is isolated from the system site directories. This makes project deployment much easier and independent of other ongoing Python projects.

pip3 install --user virtualenv
python3 -m venv $VENV_NAME

Activate our virtual environment.

source $VENV_NAME/bin/activate

Next, install required Python packages

$VENV_NAME/bin/pip3 install -r requirements.txt

That’s it! Your Python environment is good to go.

The last line of the script schedules a daily job that rebuilds the prediction model and generates the forecast for the next 7 days. The cron expression 0 5 * * * runs it at 05:00 server time every day.

(crontab -l 2>/dev/null; echo "0 5 * * * cd $FOLDER_PATH && $VENV_NAME/bin/python3 border_wait_time_forecast.py > /tmp/border.log") | crontab -

It’s coffee time! Let the script run for several minutes, and we’re ready to launch our web app.

Run Web App

Run ./run_app.sh to launch the web app. It will start with the warning message “This is a development server”, because we’re running on the Flask development server. In production it should be replaced by a WSGI server.

The web app is now running on http://0.0.0.0:5000. To access it, replace 0.0.0.0 with your AWS instance’s public IPv4 address, e.g. http://35.164.32.109:5000/

Done! Your web app is now live!

Future Work

  • Expand the forecast to other border crossing ports
  • Allow users to report actual border crossing times and incorporate them into the model

Python code can be found on my GitHub.

Happy Thanksgiving! Happy Machine Learning!

If you like this post or have any questions, feel free to leave a comment. Connect with me on LinkedIn.
