
ChatGPT’s Code Interpreter, now renamed Advanced Data Analysis, has been out for some time. It was launched on July 6th 2023 and is a plugin developed by OpenAI that allows users to upload data and perform analysis on it. This can range from cleaning the data to creating visualisations and summarising the data.
Rather than relying on you to write Python code to analyse your data, you can leverage ChatGPT by telling it what to do in plain English. From that, it will carry out the analysis for you.
As many of my regular readers will know, I am a big fan of matplotlib. Even though the library appears to be clunky and time-consuming to use, it can be used to create stunning visualisations with a little bit of effort.
After playing around with this new tool, I thought it was about time to see how ChatGPT and the Advanced Data Analysis plugin could be used to create custom plots for working with well log data.
Before proceeding, a word of caution in light of the growing number of legal cases against OpenAI:
Always be cautious of the data you upload to ChatGPT as that data and your input could be used to train future models. If in doubt, avoid uploading any data and always follow your company’s policies.
Enabling Advanced Data Analysis in ChatGPT
To use the Advanced Data Analysis plugin within ChatGPT, you first have to enable it.
This can be done by going to Settings and then selecting Beta Features. In here, you will see the option to turn on Advanced Data Analysis, which will enable the plugin.

The plugin will now be available when you start a new chat.
Uploading and Converting Data to a Pandas Dataframe
To start, we need to upload our file. For this example, I am using a well log data set from the NLOG database (details at the end of the article). This data set contains a series of well log measurements obtained from an oil and gas exploration well.
We first click the plus icon on the chat input box and then select the file containing our data.

Next, we need to provide a prompt for ChatGPT. In this case, I am going to tell it to be a petrophysicist (a niche role within geoscience) and that it understands well log measurements. This can help steer some of the responses that are returned from ChatGPT.

After submitting the above input, ChatGPT will load the file and look at the contents.
Below is the response we get back from ChatGPT’s Advanced Data Analysis plugin, which provides information about each of the measurements in our dataset.
In this case, it is using pandas to read the CSV file into a dataframe and then output the first few rows using the common df.head() command.

Most of the comments about the curves contained within the file are correct. However, there are a few inconsistencies, such as DT being described as Delta-Time. This is partially correct; however, this curve represents Acoustic Compressional Slowness, a measure of the compressional arrival from a sound wave emitted by a sonic tool and travelling through the formation.
In my experience with ChatGPT, it can be challenging to use within niche topics such as petrophysics and can come back with incorrect information. This is something that everyone should be aware of when working with large language models (LLMs).
Always double-check the output and make sure it makes sense.
I have previously covered this topic in the article below:
Clicking on the Show Work drop down box, we can examine the code that was used to load the data. We can see that it has done some basic Python coding to read our CSV file into a pandas dataframe.
import pandas as pd
# Load the data from the provided CSV file
well_data = pd.read_csv("/mnt/data/L0509WellData.csv")
# Display the first few rows of the data
well_data.head()
ChatGPT has also identified placeholder values of -999 in some of the curves. These values are representative of missing data. However, as we will see, this can cause some confusion and issues with the responses we get.
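Before asking ChatGPT to clean anything, it can be worth checking for yourself how widespread the placeholder is. The snippet below is a minimal sketch of one way to do that; the small dataframe is a synthetic stand-in with made-up values, not the actual NLOG file.

```python
import pandas as pd

# Synthetic stand-in for the well log file; the values are made up for illustration
well_data = pd.DataFrame({
    "DEPTH": [4609.8, 4609.9, 4610.0, 4610.1],
    "GR": [89.1, 90.7, 90.7, 89.6],
    "DT": [-999.0, -999.0, -999.0, 66.2],
})

# Count how many samples in each column carry the -999 placeholder
placeholder_counts = (well_data == -999).sum()
print(placeholder_counts)
```

A non-zero count against a curve tells you which columns to keep an eye on in the steps that follow.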
Data Exploration With Code Interpreter
When working with any dataset, the most time-consuming part is understanding what data you have, carrying out data quality checks and cleaning up the data.
This is where I personally see ChatGPT’s Advanced Data Analysis plugin being the most helpful to petrophysicists and data scientists. However, as always, you must check the results and work carried out by these AI systems, as they could have inadvertently made a few mistakes.
Whilst experimenting with the data cleaning step, ChatGPT had trouble converting the -999 values to NaNs. When doing this, it also converted the column data type to string.
As a result, I had to be specific in my request to make sure it kept the columns as numeric.

It finally came back with the following response:

And the following code:
# Replace -999 values with NaN
well_data.replace(-999, float("nan"), inplace=True)
# Display the first few rows of the updated data
well_data.head()
Which generated the following output:
RESULT
       DEPTH         GR         DT      RHOB      DRHO      NPHI
0  4609.8008  89.059479       <NA>  2.483700  0.018494  0.129119
1  4609.9008  90.721848       <NA>  2.489308  0.013656  0.108034
2  4610.0008  90.709061       <NA>  2.501088  0.011289  0.085650
3  4610.1008  89.568954  66.223099  2.515450  0.008615  0.070332
4  4610.2008  88.169571  66.705551  2.530982  0.005628  0.065343
It is interesting, and a little odd, that the response builds the NaN by converting the string "nan" to a float. I would have used np.nan in the dataframe’s replace function to put NaNs in place of the -999 values. This seems to have worked for now, but it will cause issues further on.
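For comparison, here is a minimal sketch of the np.nan approach, followed by a check that the columns are still numeric. Both float("nan") and np.nan produce an ordinary floating-point NaN, so the dtype check is the part that matters. The dataframe is again a synthetic stand-in with made-up values.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the well log file; the values are made up for illustration
well_data = pd.DataFrame({
    "DEPTH": [4609.8, 4609.9, 4610.0, 4610.1],
    "GR": [89.1, 90.7, 90.7, 89.6],
    "DT": [-999.0, -999.0, -999.0, 66.2],
})

# Swap the -999 placeholder for NaN, returning a new dataframe
cleaned = well_data.replace(-999, np.nan)

# Confirm the columns are still floating point rather than object/string
print(cleaned.dtypes)

# Confirm the placeholders are now counted as missing
print(cleaned["DT"].isna().sum())
```

pandas' read_csv also accepts an na_values argument, which may be a tidier place to declare the placeholder if you are loading the file yourself.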
Generating a Descriptive Summary
Next, let’s see if we can get a descriptive summary table of each of the measurements in the dataset using the following simple prompt.

ChatGPT returns the following table in its response.

At first glance, it may appear to be the same output as the df.describe() function; however, ChatGPT has also added units to each of the measurements. This is handy if we want to take this content and place it in a report, although it would have been nicer to have the units on the left in the row headers rather than in every box.
Additionally, it has converted the porosity units from decimal to percentage. This could potentially be misleading when reporting or passing the information on to colleagues. I would have preferred it leave the data in the original units.
Finally, it has failed to create statistics for the DT curve due to nulls being present. This ties back to the earlier section where ChatGPT struggled to convert the -999 values to nulls. However, if we were to use the df.describe() method, and as long as the data is numeric, we should still see the statistics for that curve.
To an unaware data analyst, this could potentially slip through without being noticed.
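As a quick illustration of that last point, the sketch below runs describe() on a numeric column that contains NaNs. The values are synthetic and made up for illustration; the point is only that pandas skips the missing samples rather than giving up on the column.

```python
import numpy as np
import pandas as pd

# Synthetic data: DT has three missing samples, GR has none
well_data = pd.DataFrame({
    "GR": [89.1, 90.7, 90.7, 89.6, 88.2],
    "DT": [np.nan, np.nan, np.nan, 66.2, 66.7],
})

# describe() ignores NaNs, so DT still gets a full set of statistics
summary = well_data.describe()
print(summary)
```

The count row is the one to look at: it reports only the non-missing samples, which makes gaps in a curve easy to spot.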
Creating Well Log Plots With Matplotlib and ChatGPT’s Advanced Data Analysis Plugin
When I first started writing articles on Medium, I focused on how to create basic well log plots with matplotlib and how to work with well log data using Python. That process took a lot of time as I was relatively new to Python and struggled to get the coding right.
Essentially, a well log plot consists of subplots, which are often referred to as tracks. Within each of these tracks, different scientific measurements are plotted, which can be used together to develop an interpretation of the subsurface. Each measurement is plotted against depth on the y-axis.
As a quick test of ChatGPT’s Advanced Data Analysis plugin, I wanted to see if it can help recreate a well log plot from my earlier article with a few simple prompts.

The first step is to get it to create a basic well log plot, where each of the measurements is displayed in its own subplot.

After processing and writing the code for the first attempt, ChatGPT again comes up against the issue of NaN values within the dataset. As a result, it has to recreate the plot.

Once the plot has been generated, it comes back with a relatively good first-pass plot of the well measurements.
Interestingly, it has also made a few adjustments to the plot, including:
- Inverting the y-axis, so that we go from a shallow depth to a deeper depth
- Applying standard scales to some of the measurements, e.g. GR is scaled 0 to 150
The code ChatGPT generated is as follows:
import matplotlib.pyplot as plt
# Setting up the figure and axes again
fig, ax = plt.subplots(nrows=1, ncols=6, figsize=(15, 10), sharey=True)
# Plotting DEPTH on the first track
ax[0].plot(well_data['DEPTH'], well_data['DEPTH'], color='black')
ax[0].set_title('DEPTH')
ax[0].set_ylabel('Depth (m)')
ax[0].invert_yaxis() # Invert the y-axis to have depth increasing downwards
ax[0].grid()
# Plotting GR on the second track
ax[1].plot(well_data['GR'], well_data['DEPTH'], color='green')
ax[1].set_title('GR (Gamma Ray)')
ax[1].set_xlim(0, 150) # Typical scale for GR
ax[1].grid()
# Plotting DT on the third track (excluding NaN values)
ax[2].plot(well_data['DT'].dropna(), well_data['DEPTH'].loc[well_data['DT'].dropna().index], color='blue')
ax[2].set_title('DT (Delta Time)')
ax[2].grid()
# Plotting RHOB on the fourth track
ax[3].plot(well_data['RHOB'], well_data['DEPTH'], color='red')
ax[3].set_title('RHOB (Bulk Density)')
ax[3].set_xlim(1.5, 3.0) # Typical scale for RHOB
ax[3].grid()
# Plotting DRHO on the fifth track
ax[4].plot(well_data['DRHO'], well_data['DEPTH'], color='cyan')
ax[4].set_title('DRHO')
ax[4].grid()
# Plotting NPHI on the sixth track
ax[5].plot(well_data['NPHI'], well_data['DEPTH'], color='magenta')
ax[5].set_title('NPHI (Neutron Porosity)')
ax[5].set_xlim(-0.15, 0.45) # Typical scale for NPHI
ax[5].grid()
plt.tight_layout()
plt.show()
This code is very similar to the code in my earlier article, but rather than using subplot2grid, it accesses the individual axes and adds the plots.
We can now take this plot and modify it by making several changes at once.
When working with ChatGPT, I have often found that splitting up prompts into individual instructions can help get the required output. However, to see how ChatGPT’s Code Interpreter responds to multiple commands at once, we can use a single prompt that asks for the following changes:
- Remove the subplot containing the Depth curve
- Add gridlines to each subplot
- Add spines to the right and top sides of each subplot
- Add standard colours for each curve
- Make sure that all curves have defined x-axis scale ranges

After further processing, ChatGPT came back with a plot containing the changes I requested.

It also refactored the code from having individual calls to each axis, and it now uses a loop to go through each measurement and apply changes using some common functions such as the spines and grid colour.
# Setting up the figure and axes with one less subplot for depth
fig, ax = plt.subplots(nrows=1, ncols=5, figsize=(15, 10), sharey=True)
# Define a dictionary for colors and scales for each log
colors = {
    "GR": "green",
    "DT": "magenta",
    "RHOB": "red",
    "DRHO": "grey",
    "NPHI": "blue"
}
scales = {
    "GR": (0, 150),
    "DT": (140, 40),
    "RHOB": (1.95, 2.95),
    "DRHO": (-0.2, 0.2),
    "NPHI": (0.45, -0.15)
}
# Plotting each log with the specified colors and scales
logs = ['GR', 'DT', 'RHOB', 'DRHO', 'NPHI']
for i, log in enumerate(logs):
    ax[i].plot(well_data[log], well_data['DEPTH'], color=colors[log])
    ax[i].set_xlim(scales[log])
    ax[i].set_title(log)
    ax[i].grid(color="lightgrey")
    # Adding top and right spines
    ax[i].spines['top'].set_visible(True)
    ax[i].spines['right'].set_visible(True)
ax[0].set_ylabel('Depth (m)')
ax[0].invert_yaxis() # Invert the y-axis to have depth increasing downwards
plt.tight_layout()
plt.show()
Adding a Variable Fill Between the Y-axis and a Line
On log plots, it is common to apply shading and additional fills between curves and the edges of the subplots. This helps the petrophysicist understand the data through visual cues.
One such method is to apply a variable shading for the gamma-ray (GR) to help identify shale and non-shale intervals. For this, we commonly use the Yellow – Orange – Brown colourmap, which can help show clean reservoir intervals as brighter colours and more shaley intervals as darker colours.
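To make the idea concrete, the sketch below shows how a GR value can be turned into a colour from the YlOrBr colourmap. It assumes the 0 to 150 GR scale used for the track elsewhere in this article, and the two GR values are made up for illustration.

```python
from matplotlib import cm
from matplotlib.colors import Normalize

# Scale GR onto the 0-1 range a colourmap expects
# (0 to 150 is the assumed GR track scale)
norm = Normalize(vmin=0, vmax=150)
cmap = cm.YlOrBr

# Look up an RGBA colour for a low and a high GR reading
clean_colour = cmap(norm(20))    # low GR, e.g. a clean sand
shaly_colour = cmap(norm(130))   # high GR, e.g. a shale

print(clean_colour)
print(shaly_colour)
```

The low reading lands at the pale yellow end of the map and the high reading at the dark brown end, which is what gives the fill its bright-to-dark appearance.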
I covered the process in depth in my earlier article below.
I have previously attempted this with ChatGPT and Code Interpreter, but it has always been unsuccessful. Perhaps this is down to how I worded the prompt.
Let’s see if ChatGPT can help create a YlOrBr colour map shading for the GR curve by being more specific in the prompt.

We get back the following plot:

On the first attempt, it managed to get the fill on the subplot. However, it did not do it from the left-hand side of the subplot to the line.
We can see if ChatGPT will correct this:

And it did, which is very surprising as previous attempts have failed.

We can take a closer look at the code to see what it did to generate the plot.
# Setting up the figure and axes again
fig, ax = plt.subplots(nrows=1, ncols=5, figsize=(15, 10), sharey=True)
# Plotting each log with the specified colors and scales, excluding NaN values for DT
for i, log in enumerate(logs):
    valid_indices = well_data[log].dropna().index
    ax[i].plot(well_data[log].loc[valid_indices], well_data['DEPTH'].loc[valid_indices], color=colors[log])
    ax[i].set_xlim(scales[log])
    ax[i].set_title(log)
    ax[i].grid(color="lightgrey")
    # Adding top and right spines
    ax[i].spines['top'].set_visible(True)
    ax[i].spines['right'].set_visible(True)
    # Adding the fill for the GR subplot
    if log == 'GR':
        # Add color based on the value of the GR line
        for depth, value in zip(well_data['DEPTH'].loc[valid_indices], well_data[log].loc[valid_indices]):
            ax[i].fill_betweenx([depth-0.05, depth+0.05],
                                0, value,
                                facecolor=plt.cm.YlOrBr(value/150),
                                edgecolor='none',
                                zorder=0) # Place the fill behind the line
ax[0].set_ylabel('Depth (m)')
ax[0].invert_yaxis() # Invert the y-axis to have depth increasing downwards
plt.tight_layout()
plt.show()
The approach taken by ChatGPT differs from my earlier attempt, which looped through a set of sorted threshold values and filled under the curve for each one. ChatGPT instead works depth by depth, filling a thin band at each sample. My earlier code, for comparison:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
left_col_value = 0
right_col_value = 150
#assign the column to a variable for easier reading
curve = well_data['GR']
#calculate the span of values
span = abs(left_col_value - right_col_value)
#assign a color map
cmap = plt.get_cmap('YlOrBr')
#create array of values to divide up the area under curve
color_index = np.arange(left_col_value, right_col_value, span / 100)
#setup the plot
well_data.plot(x='GR', y='DEPTH', c='black', lw=0.5, legend=False, figsize=(6,15))
plt.ylim(4800, 4600)
plt.xlim(left_col_value, right_col_value)
plt.title('Plot With a Variable Fill to Y-Axis')
#loop through each value in the color_index
for index in sorted(color_index):
    index_value = (index - left_col_value)/span
    color = cmap(index_value) #obtain colour for color index value
    plt.fill_betweenx(well_data['DEPTH'], 0 , curve, where = curve >= index, color = color)
plt.show()
The above code generates the following GR plot with the shading.

Moving Two Lines onto the Same Subplot
On log plots, it is common to plot the Bulk Density (RHOB) and Neutron Porosity (NPHI) on the same track. As these two curves have different scales, we need to place one of them on a secondary x-axis.
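Stripped back to its essentials, the secondary-axis idea looks something like the sketch below. The curves are synthetic and made up purely for illustration, the scale ranges are the ones used for RHOB and NPHI elsewhere in this article, and the variable names are my own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic curves, made up purely for illustration
depth = np.linspace(4600, 4800, 200)
rhob = 2.45 + 0.15 * np.sin(depth / 15)
nphi = 0.15 + 0.10 * np.cos(depth / 15)

fig, ax_rhob = plt.subplots(figsize=(4, 10))

# Primary x-axis: bulk density
ax_rhob.plot(rhob, depth, color="red")
ax_rhob.set_xlim(1.95, 2.95)
ax_rhob.set_xlabel("RHOB", color="red")

# Secondary x-axis sharing the same depth axis: neutron porosity, reversed scale
ax_nphi = ax_rhob.twiny()
ax_nphi.plot(nphi, depth, color="blue")
ax_nphi.set_xlim(0.45, -0.15)
ax_nphi.set_xlabel("NPHI", color="blue")

# Depth increases downwards; the twinned axes share the y-axis
ax_rhob.invert_yaxis()

# In a notebook, drop the Agg line above and call plt.show() to display the figure
```

Because twiny() shares the y-axis, inverting depth on one axes object is enough for both curves.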
This can be tricky to get right in matplotlib, so we will see how ChatGPT manages it with the following prompt.

The returned plot is not terrible. ChatGPT has managed to get the NPHI curve on the same subplot as RHOB and has placed it on a secondary x-axis. However, the labels for the subplot overlap each other and it is not clear what scale belongs to the curves.
Additionally, we now have two sets of grid lines on the subplot, which can cause confusion.

We can modify that with a simple prompt to make sure the labels don’t overlap and remove the gridlines from one of the curves.

ChatGPT has done what it was asked, but it has also added colour to the different labels.
I was not expecting this result; however, it does allow us to easily link each label with the correct line.

Therefore, based on ChatGPT changing the label colours, we can use that as inspiration to change the rest of the labels to match the colour of the lines.
This should help us if we have multiple lines on the same subplot, which is very common with log plots.

ChatGPT has now responded with our final plot, and it is now easy for us to see the NPHI and RHOB scales and what line they belong to.

The following code is what ChatGPT came up with to generate the above plot.
# Setting up the figure and axes again
fig, ax = plt.subplots(nrows=1, ncols=4, figsize=(15, 10), sharey=True)
# NPHI now shares the RHOB track, so only four logs get their own subplot
# (this line is assumed; it is not shown in the original output)
logs = ['GR', 'DT', 'RHOB', 'DRHO']
# Plotting each log with the specified colors and scales, excluding NaN values
for i, log in enumerate(logs):
    valid_indices = well_data[log].dropna().index
    ax[i].plot(well_data[log].loc[valid_indices], well_data['DEPTH'].loc[valid_indices], color=colors[log])
    ax[i].set_xlim(scales[log])
    # Remove subplot title
    ax[i].set_title("")
    # Add axis label representing the name of the log measurement
    ax[i].set_xlabel(log, color=colors[log])
    # Color tick marks and tick labels in the same color as the line
    ax[i].tick_params(axis='x', colors=colors[log])
    # Adding top and right spines
    ax[i].spines['top'].set_visible(True)
    ax[i].spines['right'].set_visible(True)
    # Adding the fill for the GR subplot
    if log == 'GR':
        # Add color based on the value of the GR line
        for depth, value in zip(well_data['DEPTH'].loc[valid_indices], well_data[log].loc[valid_indices]):
            ax[i].fill_betweenx([depth-0.05, depth+0.05],
                                0, value,
                                facecolor=plt.cm.YlOrBr(value/150),
                                edgecolor='none',
                                zorder=0) # Place the fill behind the line
    ax[i].grid(color="lightgrey")
    # Adding NPHI to the RHOB subplot with a secondary x-axis
    if log == 'RHOB':
        ax2 = ax[i].twiny() # Create a secondary x-axis for NPHI
        valid_indices_nphi = well_data['NPHI'].dropna().index
        ax2.plot(well_data['NPHI'].loc[valid_indices_nphi], well_data['DEPTH'].loc[valid_indices_nphi], color=colors['NPHI'])
        ax2.set_xlim(scales['NPHI'])
        ax2.set_xlabel('NPHI', color=colors['NPHI'])
        ax2.tick_params(axis='x', colors=colors['NPHI'])
        # Remove gridlines for NPHI and display the ones for RHOB
        ax2.grid(False)
        ax[i].grid(color="lightgrey")
ax[0].set_ylabel('Depth (m)')
ax[0].invert_yaxis() # Invert the y-axis to have depth increasing downwards
plt.tight_layout()
plt.show()
The plot and code look reasonable, although there are a few more modifications I would make. However, at this point I felt it was best to continue modifying the plot in a Jupyter Notebook.
This was due to a few issues I had previously where if I went back and changed an earlier prompt, it would wipe out anything after that prompt and recreate it.
This was especially problematic if I had left the ChatGPT window open for several hours or even days, and it had timed out.
Summary
Overall, I have found ChatGPT’s Advanced Data Analysis plugin (previously Code Interpreter) to be a useful tool in generating well log plots for petrophysics and geoscience. However, I do have several reservations and issues with using it.
I found it very difficult to replicate results in a new chat instance. I had previously attempted the same process above and ended up with completely different results, and ChatGPT struggled to generate the plot I wanted. This even occurred when using exactly the same prompts.
Sometimes the results generated by the Advanced Data Analysis plugin were questionable and even erroneous. As with any LLM, it is always wise to review the output and make sure it makes sense programmatically and technically.
If you make any errors in the prompts, it is not easy to go back and change them. If you do try to change one of your prompts, it can result in the deletion of any chat after that prompt. Therefore, I would recommend copying the code over to Jupyter Notebook as you go along so that you do not lose any information.
Finally, my biggest reservation about using ChatGPT and the Advanced Data Analysis plugin is how easy it can be to upload proprietary data (in this example, I have used public data, which is open to use). The data, prompts and output can all be used to train future models without you knowing it. The issues surrounding copyright and intellectual property are increasing daily, and extreme caution is advised when working with this tool and proprietary data.
It would be great to hear your comments, thoughts and concerns about using ChatGPT and the Advanced Data Analysis plugin.
Data Used in this Example
The data used within this tutorial was downloaded from NLOG.nl, which is a website that contains well logging data for the entire Dutch sector of the North Sea. The data is free to download and use. Full details of the data licence can be found here.