When I look back at my publishing frequency, I find that my most productive season for "weekend build and learn" projects is autumn. I think the reason is the autumn colors and the coffee. OK, let's see what we will build and learn this weekend.

In this build and learn series, I will walk you through the Conception, Design, Architecture, and Build, joined up with the Coding, so that we get end-to-end logical thinking along with our favorite part: the code.
Use case definition
Since the pandemic and the work-from-home kick-off, I no longer need to take the Go Train to work, but I still receive my registered Go Train delay alerts by email. From time to time the alert emails arrive more frequently, and a few questions come to mind:
- How many delays happen per day?
- What is the average delay in minutes?
- Which station has the most delays?
- What if this data could be used to build a model that predicts Go Train delays? That would be pretty neat.
- Etc.

OK, let's summarize our requirements; they break down into two items:
- As a user, I would like to see a dashboard that shows the data about Go Train delays.
- As a user, I would like to predict Go Train delays on upcoming dates.
Of those two requirements, #1 is the essential one, because #2 depends on it. Let's prioritize #1.
Technical Analysis
A requirement describes what people want to achieve, the definition of the outcome. However:
- Can it be done?
- Anything available for the quick deliverable outcomes?
- Are there any technical challenges?
- Etc.
We can call the analysis above Business/Technical Analysis: it translates the business language and requirements into IT/technology language. In a traditional company, this task usually falls to a business systems analyst (BSA) or an architect, with a project milestone.
In a small startup, it often falls to the engineers within the sprints.
A machine learning pipeline needs a consistent data feed to improve its accuracy. So, in this case, we need to figure out the easiest and quickest way to get the data. Let's google it and see whether an existing API or dataset is available, ideally in a scalable way.
Pretty quickly, I found that the Go Train service does offer some APIs and datasets. However, after some review and analysis, my conclusions are as follows:
- The GTFS dataset file does not contain the Go Train delay alerts and info, based on what I have found so far.
- We need to register for the API to access it; I have applied for personal use but am still waiting for approval. Based on the dataset and response data structures, I did not see the delay time, which is the critical piece of information we need for this analysis.
OK, let's step back: what do we need? We need to know
- the date and time the delay happened
- the departure station
- the scheduled departure time
- the destination station
- the scheduled arrival time
- the delay in minutes.
But all of that information exists in the alert email. Cool. How can I consistently and automatically get those alert emails?
Architecture
The Architecture part focuses more on the how-to.
One prerequisite: the email account where I receive the Go Train alert messages is Gmail.
After further analysis, we find that the Google Gmail API provides access to the mailbox and even has push notification capabilities. This can be used for incremental data processing and avoids heavy reprocessing every time.
Now that we know how to get the data, the focus shifts to what to get. If we can authenticate, process the emails, and filter down to only the Go Train delay emails, we can retrieve the information we need, because the alert emails follow a pattern like this:
Subject: Trains delay – Oakville GO 17:32 – Union Station 18:15
The Oakville GO 17:32 – Union Station 18:15 train is estimated to be delayed 10 to 15 minutes from Clarkson GO due to an earlier track inspection. We apologize for the delay. You can use gotracker.ca to track your train and to see when your train is expected to arrive at your station stop:
As you can see, the email subject already contains the stations and the scheduled times, and the email body contains the delay in minutes.
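To make the extraction concrete, here is a minimal parsing sketch in Go. The `DelayRecord` struct, the `parseAlert` function, and the regular expressions are illustrative names of my own, tuned to the sample alert above; if the alert wording ever changes, the patterns would need to follow.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// DelayRecord holds the fields we want to load into the database later.
type DelayRecord struct {
	Departure   string // departure station
	DepartTime  string // scheduled departure time, HH:MM
	Destination string // destination station
	ArriveTime  string // scheduled arrival time, HH:MM
	DelayMins   int    // estimated delay in minutes
}

var (
	// Subject pattern: `Trains delay – <station> <HH:MM> – <station> <HH:MM>`.
	// Note the separator is an en dash (U+2013), as in the sample alert.
	subjectRe = regexp.MustCompile(`Trains delay – (.+?) (\d{1,2}:\d{2}) – (.+?) (\d{1,2}:\d{2})`)
	// Body pattern: "delayed 10 to 15 minutes" or "delayed 10 minutes".
	delayRe = regexp.MustCompile(`delayed (\d+)(?: to (\d+))? minutes`)
)

// parseAlert extracts a DelayRecord from one alert email's subject and body.
func parseAlert(subject, body string) (DelayRecord, error) {
	m := subjectRe.FindStringSubmatch(subject)
	if m == nil {
		return DelayRecord{}, fmt.Errorf("subject did not match: %q", subject)
	}
	rec := DelayRecord{Departure: m[1], DepartTime: m[2], Destination: m[3], ArriveTime: m[4]}
	if b := delayRe.FindStringSubmatch(body); b != nil {
		// Take the upper bound of a "10 to 15 minutes" range as a conservative estimate.
		upper := b[1]
		if b[2] != "" {
			upper = b[2]
		}
		rec.DelayMins, _ = strconv.Atoi(upper)
	}
	return rec, nil
}
```

Running `parseAlert` over the sample email above would yield departure "Oakville GO" at 17:32, destination "Union Station" at 18:15, and a 15-minute delay.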
It seems we are heading down the right path.
Let's see the data flow.

Build, Coding
I have been writing a lot of Rust recently, but for this build and learn I will use Golang. Golang built for the Go Train, pretty cool, right?
Joking aside, we chose Golang because the Google Gmail API has a native, built-in Golang SDK, which facilitates the development process.
This is an excellent example of focusing on the most effective way to get things done. It's not about which language or technology is the best; my previous article below also articulates this point.
Let's break it down into a more detailed technical flow:
- OAuth authentication to access the Gmail mailbox via the Gmail API.
- Fetch only the emails whose subject matches the Go Train delay alert.
- Process each email, extracting the data elements we need.
- Insert/load the data into the database.
- Use the data to train and produce the prediction model.
- Expose an API that serves the train delay predictions.
Before we jump into each item, here is our code structure.

1. Authentication
This should be straightforward; the only part I want to highlight is OAuth 2.0 vs. Service Account, and why we are not using the service account authentication approach.
As per Google:
Requests to the Gmail API must be authorized using OAuth 2.0 credentials. You should use server-side flow when your application needs to access Google APIs on behalf of the user, for example, when the user is offline. This approach requires passing a one-time authorization code from your client to your server; this code is used to acquire an access token and refresh tokens for your server.
Using the credentials.json you download from Google GCP, we generate a token via the OAuth flow and then use that token for future requests.
The Google Golang Gmail API also ships ready-to-go sample code.
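For reference, here is a sketch of that authentication step, following the shape of Google's Go quickstart for Gmail. The file names `credentials.json` and `token.json` are conventions rather than requirements, and `newGmailService` is my own wrapper name.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"

	"golang.org/x/oauth2"
	"golang.org/x/oauth2/google"
	"google.golang.org/api/gmail/v1"
	"google.golang.org/api/option"
)

// newGmailService exchanges the downloaded credentials.json for an
// authorized Gmail client, caching the OAuth token in token.json so the
// interactive consent step only happens once.
func newGmailService(ctx context.Context) (*gmail.Service, error) {
	b, err := os.ReadFile("credentials.json") // downloaded from GCP
	if err != nil {
		return nil, err
	}
	// The read-only scope is enough: we only need to read the alert emails.
	config, err := google.ConfigFromJSON(b, gmail.GmailReadonlyScope)
	if err != nil {
		return nil, err
	}
	tok, err := tokenFromFile("token.json")
	if err != nil {
		tok = tokenFromWeb(ctx, config) // one-time interactive OAuth consent
		saveToken("token.json", tok)
	}
	return gmail.NewService(ctx, option.WithHTTPClient(config.Client(ctx, tok)))
}

// tokenFromFile loads a previously cached token, if one exists.
func tokenFromFile(path string) (*oauth2.Token, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	tok := &oauth2.Token{}
	return tok, json.NewDecoder(f).Decode(tok)
}

// tokenFromWeb asks the user to visit the consent URL and paste back the code.
func tokenFromWeb(ctx context.Context, config *oauth2.Config) *oauth2.Token {
	fmt.Printf("Visit this URL, then paste the auth code here:\n%v\n",
		config.AuthCodeURL("state-token", oauth2.AccessTypeOffline))
	var code string
	if _, err := fmt.Scan(&code); err != nil {
		log.Fatalf("unable to read auth code: %v", err)
	}
	tok, err := config.Exchange(ctx, code)
	if err != nil {
		log.Fatalf("unable to exchange auth code: %v", err)
	}
	return tok
}

// saveToken caches the token for future runs.
func saveToken(path string, tok *oauth2.Token) {
	f, err := os.Create(path)
	if err != nil {
		log.Fatalf("unable to cache token: %v", err)
	}
	defer f.Close()
	json.NewEncoder(f).Encode(tok)
}
```

Requesting `oauth2.AccessTypeOffline` is what gives us the refresh token, so subsequent runs never prompt the user again.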
2. Application initialization and main run sequence
3. Get only the emails with the Go Train delay alert subject
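One way to sketch this filtering step: build a Gmail search query (the same syntax as the Gmail search box) with a small helper, then pass it as the `q` parameter of `Users.Messages.List`. The `delayQuery` helper and the `newer_than` lookback window are my own choices; the subject text comes from the sample alert above.

```go
package main

// delayQuery builds a Gmail search query (same syntax as the Gmail search
// box) that matches only the Go Train delay alert emails.
// newerThan is optional, e.g. "7d" to look back one week.
func delayQuery(newerThan string) string {
	q := `subject:"Trains delay"`
	if newerThan != "" {
		q += " newer_than:" + newerThan
	}
	return q
}

// With an authenticated *gmail.Service (see the Authentication step),
// the query would be used roughly like this:
//
//	resp, err := srv.Users.Messages.List("me").Q(delayQuery("7d")).Do()
//	for _, m := range resp.Messages {
//		msg, err := srv.Users.Messages.Get("me", m.Id).Format("full").Do()
//		// ... pull the Subject header and body out of msg.Payload,
//		// then hand them to the parsing step ...
//	}
```

Filtering server-side with `q` keeps us from downloading and inspecting every message in the mailbox.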
4. Insert/Load to the Database
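A hedged sketch of the load step using the standard `database/sql` package. The `train_delays` table and its column names are invented for this walkthrough; any driver works the same way (with Postgres you would swap the `?` placeholders for `$1..$6`).

```go
package main

import "database/sql"

// Schema for the parsed delay records; the table and column names here
// are my own choice for this walkthrough.
const createTable = `
CREATE TABLE IF NOT EXISTS train_delays (
    id          INTEGER PRIMARY KEY,
    alert_date  TEXT NOT NULL,    -- date the alert email was received
    departure   TEXT NOT NULL,    -- departure station
    depart_time TEXT NOT NULL,    -- scheduled departure, HH:MM
    destination TEXT NOT NULL,    -- destination station
    arrive_time TEXT NOT NULL,    -- scheduled arrival, HH:MM
    delay_mins  INTEGER NOT NULL  -- estimated delay in minutes
)`

const insertDelay = `
INSERT INTO train_delays
    (alert_date, departure, depart_time, destination, arrive_time, delay_mins)
VALUES (?, ?, ?, ?, ?, ?)`

// loadDelay ensures the table exists and inserts one parsed alert.
func loadDelay(db *sql.DB, alertDate, departure, departTime,
	destination, arriveTime string, delayMins int) error {
	if _, err := db.Exec(createTable); err != nil {
		return err
	}
	_, err := db.Exec(insertDelay,
		alertDate, departure, departTime, destination, arriveTime, delayMins)
	return err
}
```

Every processed alert email becomes one row, which is exactly the shape the future dashboard and training pipeline will consume.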
Finally, the running output and result:


Bonus – Incremental load
If you paid attention to the final result, you may ask:
"Wait a minute, the implementation above only covers the initial load; what about the incremental load?"
Let’s implement it right now.
The incremental load should be based on email updates, and it seems this all rolls up within the Google API/SDK. Let's see if there is a Pub/Sub mechanism we can use to listen for email updates and trigger the processing, which means:
1. Create the Pub/Sub client.
2. Listen for the changes.
3. Process each newly arriving email.
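The three steps above could be sketched roughly as follows, using Gmail's `Users.Watch` push notifications together with the Cloud Pub/Sub Go client. The project, topic, and subscription IDs are placeholders, and the handler only looks at `messageAdded` history changes since the notified history ID.

```go
package main

import (
	"context"
	"encoding/json"
	"log"

	"cloud.google.com/go/pubsub"
	"google.golang.org/api/gmail/v1"
)

// gmailPush is the JSON payload Gmail publishes to the Pub/Sub topic.
type gmailPush struct {
	EmailAddress string `json:"emailAddress"`
	HistoryID    uint64 `json:"historyId"`
}

// watchInbox registers the Pub/Sub topic with Gmail so every mailbox
// change is pushed to it. The topic name is a placeholder.
func watchInbox(srv *gmail.Service) error {
	_, err := srv.Users.Watch("me", &gmail.WatchRequest{
		TopicName: "projects/my-project/topics/gmail-delays", // placeholder
		LabelIds:  []string{"INBOX"},
	}).Do()
	return err
}

// listen blocks, handling each push notification as it arrives.
func listen(ctx context.Context, srv *gmail.Service, projectID, subID string) error {
	client, err := pubsub.NewClient(ctx, projectID)
	if err != nil {
		return err
	}
	sub := client.Subscription(subID)
	return sub.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		m.Ack()
		var push gmailPush
		if err := json.Unmarshal(m.Data, &push); err != nil {
			log.Printf("bad notification payload: %v", err)
			return
		}
		// List what changed since the notified history ID, then fetch and
		// process only the newly added delay alert messages.
		hist, err := srv.Users.History.List("me").
			StartHistoryId(push.HistoryID).
			HistoryTypes("messageAdded").Do()
		if err != nil {
			log.Printf("history list failed: %v", err)
			return
		}
		for _, h := range hist.History {
			for _, added := range h.MessagesAdded {
				log.Printf("new message to process: %s", added.Message.Id)
				// ... fetch it, filter by subject, parse, insert into the DB ...
			}
		}
	})
}
```

One operational note: Gmail watch registrations expire after about a week, so `watchInbox` should be re-run periodically (e.g., from a daily cron).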
The new final result is as follows.

We now have the initial load plus a listener for all incoming delay alerts, which keeps the database growing and increases the training dataset's size and volume.
Conclusion for part 1
In part 1, we started with the use case, analysis, conception, and solutioning, and of course the coding and build.
The target state will be an entire end-to-end ML Ops Pipeline.
In Part 2, based on the ETL-ed table, we will start building:
- The dashboard
- The ML model, and the API through which the model can serve user requests.
In Part 3, we will focus on GitHub Actions (CI/CD) and deployment.
Thank you for taking the time to join me on my build and learn weekend.
Disclaimer: This is purely a build and learn project; the numbers, the analysis, and the dashboard shown in this article are not related to any services or APIs provided by Metrolinx. They are only my own analysis and commentary, for my learning use only.
I started my writing journey almost three years ago. Your support is the most important motivation that keeps me moving forward and writing more exciting learning and sharing.
Also, you can buy me a coffee via the link below to keep me motivated for more weekend build and learn sessions.
