From Punch Cards to ChatGPT

My grandfather’s glimpse into generative AI

Ty Stephens
Towards Data Science

--

My maternal grandfather, Skip, was always a farmer in my eyes. Tragically, my mother passed away from leukemia just a month after my birth in 1988. I was the first grandchild in the family, and Skip and I were very close. As a child, I spent my days riding on the armrest of tractors and combines during the annual wheat harvest, and when I became a teenager, I worked the farm myself as a summer job.

Aerial view of a John Deere combine during a summer wheat harvest. My grandfather and I used two of these during our own wheat harvest seasons in the late 1990s and early 2000s. It was my first “job”. Photo by Scott Goodwill on Unsplash

Yet Skip’s earlier life was a far cry from the farming world I knew him for. Before I entered the scene, he had delved deep into academia, completing his PhD coursework in Statistics at Texas A&M, College Station, by 1972. Soon after, he took a professorship at the University of Maryland, finalizing his thesis in 1974. His groundbreaking research aimed at predicting and pinpointing safety and material risks in industrial settings. It was a monumental task that demanded years of effort: he had to gather a decade’s worth of accident reports from diverse companies by hand, work the statistics manually, and then convert those insights into punch card instructions for the university’s computer system. Securing time on that computer was not immediate; it required reservations made weeks or even months ahead. A single coding error could mean starting from scratch, potentially stalling his research by several months.

He left that life in the 1980s to take over the family farm back in East Texas and branch out into entrepreneurship. But the desire to leverage statistical inference was baked into everything he did as a farmer; I just didn’t realize it as a child. To my child’s mind, Skip was doing “office work”, as he called it. In reality, he was leveraging IT to forecast and secure financing to meet his operational expenses, optimizing fertilizer chemistry to increase crop yields, and developing strategies to reduce uncertainty in his cash flows by trading futures on the Chicago Commodity Exchange, all on a TRS-80 he bought at Radio Shack with 16 KB of RAM connected to a dot matrix printer. Agriculture can be a terribly low-margin business, and Skip’s bet was that he could use statistics to level the field a bit.

Over the years, the farm did not endure the test of time. It turns out inter-generational farming does not fare well when forced to skip a generation, and today’s input costs are more unforgiving than ever; economies of scale become the only way to compete profitably. So most small to mid-size farmers from Skip’s generation were bought out and consolidated. But that happens gradually, little by little (at least it did for us).

I, of course, grew to appreciate the strong connection between statistics and agriculture. I still remember the annual visit from the U.S. Department of Agriculture, meticulously sampling crop yields (on every farm, ours included) as part of its National Agricultural Statistics Service. That service remains, in my opinion, one of the great unsung and ongoing data projects in history, helping generations of farmers make “data-driven” decisions before that was even a buzzword. But I found even more appreciation for all that Skip did decades later, as I embarked on my own analytics and data science career, my second act after spending my 20s and 30s serving and globetrotting as a US Army Officer. I often reconnect with him over the phone, asking him to recount how they used to run regressions or simulations, or how they controlled for random sampling “back in the day”. Occasionally I tell him how it is done now, gauging his excitement as I describe concepts like machine learning, deep learning, and reinforcement learning. It is somewhat science fiction to him, but he loves hearing about it, even if it isn’t quite real to him this late in life.

I gave Skip real-time updates when I saw John Deere present their keynote at CES in January 2023, showcasing new equipment, like the sprayer pictured above, that uses 36 cameras, computer vision, and neural networks to distinguish weeds from crops in real time and minimize the use of herbicides. Image by Author.

Enter the experiment.

I decided over the weekend to show, not tell. I wanted Skip to experience the new technology for himself, not just hear me talk about it. What if we could make a business plan to launch a hypothetical new farm in under 40 minutes? I could do it using ChatGPT (GPT-4) with Advanced Data Analysis enabled, over screen share to his iPhone, and we’d collaborate to build it together.

We briefly discussed our assumptions (captured in a short code sketch after the list):

  1. We have 1,000 acres to farm in East Texas
  2. We have $1M in startup capital to purchase equipment
  3. We need to secure a line of credit from a bank to cover year 1 operating expenses
  4. We are interested in growing wheat and soybeans (what Skip actually farmed back in the day)
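
For anyone following along in code, those assumptions boil down to a handful of constants. The encoding below is my own shorthand for the snippets that follow, not something we fed to ChatGPT:

# Starting assumptions for the hypothetical farm (author's shorthand)
ACRES = 1_000                  # East Texas acreage to farm
STARTING_CAPITAL = 1_000_000   # USD available to purchase or lease equipment
CROPS = ['wheat', 'soybeans']  # what Skip actually farmed
# Year 1 variable operating expenses will be covered by a bank line of credit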

We started simple:

Prompt 1: “Write me an outline for a business plan to start a new farm in East Texas to grow wheat and soybeans.”

Skip was immediately impressed with the long-form results. But wait, we were just getting started…

Prompt 2: “Give me a list of equipment I need to get started”
Prompt 3: “Put it in a table and prioritize numerically based on what is most urgently needed to least urgent”
Prompt 4: “Now create a new column for budget allocation. Assume I’m going to use a combination of leases and purchases; create an additional column to recommend lease vs. purchase. Assume I have $1M in starting capital.”

(Time elapsed 4 minutes)

Result:

Data Generated from GPT-4, Image by Author.

Prompt 5: “How much should I set aside to ask for a line of credit for variable cost (seed, fertilizer, labor, etc.), assume 1000 acres. Create a new table.”

Result:

Data Generated from ChatGPT 4, Image by Author

By this point, I felt like Skip had just witnessed Apollo 11 all over again. He could barely contain his excitement. But we still weren’t done…

Prompt 6: “Write me a Monte Carlo simulation that lets me adjust my input assumptions and test the outcome on my budget”

Results:

import numpy as np
import matplotlib.pyplot as plt

# Define number of simulations
n_simulations = 10000

# Per-acre cost assumptions, each as [mean, std deviation]
wheat_seed_cost = [20, 2]
soybean_seed_cost = [50, 5]
fertilizer_wheat_cost = [50, 5]
fertilizer_soybean_cost = [40, 4]
labor_cost = [25, 3]
pesticide_cost = [30, 3]
fuel_maintenance_cost = [15, 2]
miscellaneous_cost = [10, 1]

def simulate_farm_costs():
    wheat_seed = np.random.normal(wheat_seed_cost[0], wheat_seed_cost[1])
    soybean_seed = np.random.normal(soybean_seed_cost[0], soybean_seed_cost[1])
    fertilizer_wheat = np.random.normal(fertilizer_wheat_cost[0], fertilizer_wheat_cost[1])
    fertilizer_soybean = np.random.normal(fertilizer_soybean_cost[0], fertilizer_soybean_cost[1])
    labor = np.random.normal(labor_cost[0], labor_cost[1])
    pesticide = np.random.normal(pesticide_cost[0], pesticide_cost[1])
    fuel_maintenance = np.random.normal(fuel_maintenance_cost[0], fuel_maintenance_cost[1])
    miscellaneous = np.random.normal(miscellaneous_cost[0], miscellaneous_cost[1])

    # Sum the per-acre cost draws and scale to 1,000 acres
    total_cost = (wheat_seed + soybean_seed + fertilizer_wheat + fertilizer_soybean +
                  labor + pesticide + fuel_maintenance + miscellaneous) * 1000

    return total_cost

results = [simulate_farm_costs() for _ in range(n_simulations)]

# Visualization
plt.hist(results, bins=50, edgecolor='black', alpha=0.7)
plt.title('Distribution of Total Farm Costs')
plt.xlabel('Total Cost')
plt.ylabel('Frequency')
plt.grid(True, alpha=0.2)
plt.show()

Now for the moment of truth: would the code run? We hopped over to a Jupyter notebook to run it, and there it was on the first try:

Data Generated from GPT-4, Chart Rendered in Python, Image by Author
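
One small addition if you are reproducing this at home (we did not generate it on the call): a few extra lines summarize the simulated distribution and put a number on the line-of-credit ask from Prompt 5. The percentile targets here are my own assumption.

import numpy as np

# Assumes `results` from the simulation script above is in scope
costs = np.asarray(results)
print(f"Mean year 1 variable cost: ${costs.mean():,.0f}")
print(f"90% of simulations fall between ${np.percentile(costs, 5):,.0f} "
      f"and ${np.percentile(costs, 95):,.0f}")
# A conservative line-of-credit ask might cover the 95th percentile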

By this point, we were only 10–15 minutes in, with time to spare. Could we make it interactive?

We needed a list of parameters that would let the user dynamically adjust the assumptions with sliders. Again we turned to ChatGPT for recommendations on what those parameters should be, based on the Monte Carlo simulation we built earlier:

Data Generated by GPT-4, Image by Author

Once we had our list of parameters, we created a ‘measures’ table in Power BI tied to 16 slicer visuals (a mean and a standard deviation for each of the eight cost categories), allowing the user to manually select their inputs and dynamically update the Monte Carlo simulation. To do this, we created a ‘Python visual’ in Power BI, dragged all of our measures in, and then updated the code as follows:

# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: 

# dataset = pandas.DataFrame(fertilizer_soybean_cost_avg Value, fertilizer_soybean_cost_std Value, fertilizer_wheat_cost_avg Value, fertilizer_wheat_cost_std Value, fuel_maintenance_cost_avg Value, fuel_maintenance_cost_std Value, labor_cost_avg Value, labor_cost_std Value, miscellaneous_cost_avg Value, miscellaneous_cost_std Value, pesticide_cost_avg Value, pesticide_cost_std Value, soybean_seed_cost_avg Value, soybean_seed_cost_std Value, wheat_seed_cost_avg Value, wheat_seed_cost_std Value)
# dataset = dataset.drop_duplicates()

# Paste or type your script code here:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Assuming the data from Power BI is passed as 'dataset'
df = dataset

# Fetch values from the dataset
wheat_seed_cost_avg = df['wheat_seed_cost_avg Value'].iloc[0]
wheat_seed_cost_std = df['wheat_seed_cost_std Value'].iloc[0]
soybean_seed_cost_avg = df['soybean_seed_cost_avg Value'].iloc[0]
soybean_seed_cost_std = df['soybean_seed_cost_std Value'].iloc[0]
fertilizer_wheat_cost_avg = df['fertilizer_wheat_cost_avg Value'].iloc[0]
fertilizer_wheat_cost_std = df['fertilizer_wheat_cost_std Value'].iloc[0]
fertilizer_soybean_cost_avg = df['fertilizer_soybean_cost_avg Value'].iloc[0]
fertilizer_soybean_cost_std = df['fertilizer_soybean_cost_std Value'].iloc[0]
labor_cost_avg = df['labor_cost_avg Value'].iloc[0]
labor_cost_std = df['labor_cost_std Value'].iloc[0]
pesticide_cost_avg = df['pesticide_cost_avg Value'].iloc[0]
pesticide_cost_std = df['pesticide_cost_std Value'].iloc[0]
fuel_maintenance_cost_avg = df['fuel_maintenance_cost_avg Value'].iloc[0]
fuel_maintenance_cost_std = df['fuel_maintenance_cost_std Value'].iloc[0]
miscellaneous_cost_avg = df['miscellaneous_cost_avg Value'].iloc[0]
miscellaneous_cost_std = df['miscellaneous_cost_std Value'].iloc[0]

# Define number of simulations
n_simulations = 10000

# Assumptions with [mean, std deviation]
wheat_seed_cost = [wheat_seed_cost_avg, wheat_seed_cost_std]
soybean_seed_cost = [soybean_seed_cost_avg, soybean_seed_cost_std]
fertilizer_wheat_cost = [fertilizer_wheat_cost_avg, fertilizer_wheat_cost_std]
fertilizer_soybean_cost = [fertilizer_soybean_cost_avg, fertilizer_soybean_cost_std]
labor_cost = [labor_cost_avg, labor_cost_std]
pesticide_cost = [pesticide_cost_avg, pesticide_cost_std]
fuel_maintenance_cost = [fuel_maintenance_cost_avg, fuel_maintenance_cost_std]
miscellaneous_cost = [miscellaneous_cost_avg, miscellaneous_cost_std]

def simulate_farm_costs():
    wheat_seed = np.random.normal(wheat_seed_cost[0], wheat_seed_cost[1])
    soybean_seed = np.random.normal(soybean_seed_cost[0], soybean_seed_cost[1])
    fertilizer_wheat = np.random.normal(fertilizer_wheat_cost[0], fertilizer_wheat_cost[1])
    fertilizer_soybean = np.random.normal(fertilizer_soybean_cost[0], fertilizer_soybean_cost[1])
    labor = np.random.normal(labor_cost[0], labor_cost[1])
    pesticide = np.random.normal(pesticide_cost[0], pesticide_cost[1])
    fuel_maintenance = np.random.normal(fuel_maintenance_cost[0], fuel_maintenance_cost[1])
    miscellaneous = np.random.normal(miscellaneous_cost[0], miscellaneous_cost[1])

    # Sum the per-acre cost draws and scale to 1,000 acres
    total_cost = (wheat_seed + soybean_seed + fertilizer_wheat + fertilizer_soybean +
                  labor + pesticide + fuel_maintenance + miscellaneous) * 1000

    return total_cost

results = [simulate_farm_costs() for _ in range(n_simulations)]

# Convert results into a dataframe
df_simulated_results = pd.DataFrame(results, columns=['Total Cost'])

# Calculate the Interquartile Range (IQR)
Q1 = df_simulated_results['Total Cost'].quantile(0.25)
Q3 = df_simulated_results['Total Cost'].quantile(0.75)

# Plotting the histogram
plt.figure(figsize=(10, 6))
n, bins, patches = plt.hist(df_simulated_results['Total Cost'], bins=50, color='blue', edgecolor='black', alpha=0.7)
plt.title('Distribution of Year 1 Variable Farm Costs from Simulation')
plt.xlabel('Year 1 Variable Cost')
plt.ylabel('Frequency')
plt.grid(True, which='both', linestyle='--', linewidth=0.5)

# Shade the histogram bars whose left edge falls within the IQR
# (iterate over patches, not bins, since len(bins) == len(patches) + 1)
for i in range(len(patches)):
    if Q1 < bins[i] < Q3:
        patches[i].set_facecolor('green')

plt.axvline(Q1, color='red', linestyle='dashed', linewidth=1)
plt.axvline(Q3, color='red', linestyle='dashed', linewidth=1)
plt.tight_layout()
plt.savefig('simulated_costs_histogram.png') # This will save the figure as an image file
plt.show()

Just for fun, we prompted ChatGPT to calculate the interquartile range (IQR) and shade it a different color; we also manually updated the chart labels and x-axis. The rest was just cleaning up the visuals a bit in Power BI to make it a little more user-friendly. The end result:

Data Generated by Monte Carlo Simulation based on User-Selected Parameter Inputs, Python Code Generated by GPT-4, Dashboard Built in MS Power BI, Image by Author

Now we had a dynamic Monte Carlo simulation for playing with different input cost assumptions and projecting the variable operating expenses we would need to get our farming operation off the ground. Leveraging GPT-4, we wrote virtually no code ourselves, just tweaked it a bit. We did most of the work from an iPhone over screen share, built the last piece in Power BI Desktop, and shared it over the Power BI iPhone app. All in about 30–40 minutes of work.
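
If you would rather stay in Python end to end, Jupyter’s ipywidgets can stand in for the Power BI slicers. Below is a minimal sketch of the same idea, cut down to two of the 16 sliders to keep it short; the slider ranges are my own assumptions, and this is not what we built on the call:

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import FloatSlider, interact

N_SIMULATIONS = 10_000
ACRES = 1_000

def run_simulation(wheat_seed_avg=20.0, wheat_seed_std=2.0):
    # Hold the other per-acre costs at the means from the earlier script
    other_costs = 50 + 50 + 40 + 25 + 30 + 15 + 10
    wheat_seed = np.random.normal(wheat_seed_avg, wheat_seed_std, N_SIMULATIONS)
    total_cost = (wheat_seed + other_costs) * ACRES
    plt.hist(total_cost, bins=50, edgecolor='black', alpha=0.7)
    plt.title('Distribution of Year 1 Variable Farm Costs')
    plt.xlabel('Total Cost')
    plt.ylabel('Frequency')
    plt.show()

# Moving a slider reruns the simulation and redraws the histogram
interact(run_simulation,
         wheat_seed_avg=FloatSlider(min=10, max=40, step=1, value=20),
         wheat_seed_std=FloatSlider(min=0.5, max=5, step=0.5, value=2))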

My grandfather’s verdict? We accomplished in 40 minutes what would have taken him two years to do on his own “back in the day”. Yes, I recognize there is a lot more we could do, and the “simulation” is far from perfect. (For example, we do not delineate what share of the acreage is devoted to soybeans versus wheat.) But for 40 minutes? Even I was impressed. And this is the promise of Gen AI: democratizing data science, encouraging experimentation, and speeding up development, all within the palm of your hand. It gave a grandfather and his grandson an opportunity to reconnect over some statistics, and to leverage technology in new and unexpected ways.
