Creating a Multi-Well Integrated Well Log and Formation Tops Dataframe in Python

Combining formation data and well log measurements for multiple wells in Python

When working with well log measurements and subsurface data, we are often dealing with different file formats and sample rates. For instance, well log measurements are typically stored and transferred within .las or .dlis files and sampled every 0.1 m or 0.5 ft. Geological formation tops, on the other hand, are single discrete depth points. To combine the two, the formation data has to be interpolated so that every well log sample carries a formation name.

In my previous tutorial, we saw how to merge well log data and formation data for a single well. Within this tutorial, we are going to see how we can do this for multiple wells.

Importing Libraries

The first step in the process is to import the libraries we will be working with.

For this tutorial we will be using lasio to load in .las files, os to read files from a directory, pandas to enable us to work with dataframes, and csv to load formation data stored within csv files.

Importing Well Log LAS Files Using LASIO

Next, we will begin importing the data.

The data used within this tutorial was downloaded from NLOG.NL, a website that contains well logging data for the entire Dutch sector of the North Sea. The data is free to download and use. Full details are given in the site's data licence; the following summary of the usage terms comes from its Intellectual Property Rights section:

NLOG.NL does not claim any rights (except domain names, trademark rights, patents and other intellectual property rights) in respect of information provided on or through this website. Users are permitted to copy, to download and to disclose in any way, to distribute or to simplify the information provided on this website without the prior written permission of NLOG.NL or the lawful consent of the entitled party. Users are also permitted to copy, duplicate, process or edit the information and/or layout, provided NLOG.NL is quoted as the source.

From the website, we will be using data from three wells: L07-01, L07-05 and L07-04.

When loading a single file, we can simply pass the file location into lasio.read(). However, as we are working with multiple .las files, we need to read them one at a time and append the resulting dataframes to a list.

The dataframes stored within that list are then joined together using pd.concat().

The loading loop reads all files ending in .las within a directory named Data/Notebook 36.

Once a file name has been obtained, it is combined with the directory path it is stored in to give the full path. The las file is then read and converted to a pandas dataframe.

In order to distinguish which well the data came from, we add a new column called WELL to the dataframe. Its value is set to the well name contained within the well header section of the las file.

When loading files with LASIO and converting them to dataframes, the index of the dataframe will be set to the depth curve. We can change this so that we have a simple integer index and depth as an actual column within the dataframe. This is achieved by using the .reset_index() method within pandas.

Next, we need to sort the dataframe so that it goes from the shallowest depth measurement to the deepest. Doing this will put the index out of order, so we need to reset the index again, but this time we do not want the old index transformed into a column, so we set the drop parameter to True.

Once the dataframe has been sorted, we can append it to our list of dataframes, df_list.

This process repeats until all available .las files have been read within the specified directory.

Finally, the dataframes stored within the list are joined together using pd.concat() to give a single dataframe, well_df.
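The loading steps described above can be sketched as follows. This is a minimal illustration rather than the original listing: the function names, the reader parameter and the depth column name DEPT are assumptions made for this example, and lasio is imported inside the function only so that the per-file helper can be tried without lasio installed.

```python
import os

import pandas as pd


def las_to_dataframe(las, depth_col="DEPT"):
    """Turn one lasio LASFile-like object into a depth-sorted dataframe."""
    las_df = las.df()  # curves as columns, depth curve as the index

    # Tag every row with the well name from the LAS header section
    las_df["WELL"] = las.well.WELL.value

    # Move depth out of the index into a column called depth_col (assumed name)
    las_df = las_df.rename_axis(depth_col).reset_index()

    # Sort shallowest to deepest, then rebuild a clean integer index
    return las_df.sort_values(depth_col).reset_index(drop=True)


def load_las_directory(directory, reader=None):
    """Read every .las file in `directory` into one combined dataframe."""
    if reader is None:
        import lasio  # lazy import: only needed when real files are read

        reader = lasio.read

    df_list = []
    for file_name in sorted(os.listdir(directory)):
        if file_name.lower().endswith(".las"):
            las = reader(os.path.join(directory, file_name))
            df_list.append(las_to_dataframe(las))

    # Join the per-well dataframes into a single dataframe
    return pd.concat(df_list, ignore_index=True)


# Example call, using the directory named in the text:
# well_df = load_las_directory("Data/Notebook 36")
```

If your files name the depth curve differently, pass another depth_col to las_to_dataframe so that the column name matches your data.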

When we call upon the well_df dataframe, we get back the following view of the first 5 and last 5 rows.

Loading Formation Tops Data from CSV

Formation top data is often stored within tabular form, most commonly within .csv files. These files will contain the name of the geological formation, and the associated top and bottom depth.

The csv files for this example already have a column called Well, which contains the well name. Doing this upfront prior to loading them into Python is helpful, but not essential. If you don’t do this though, you may have to extract the well name from the file name, which can be more time-consuming.

To load the formation data, we again create a blank list to store the dataframes in.

Next, we loop over all files ending with .csv within the specified directory, read them using pd.read_csv(), and append them to the list called df_formation_list.

Once all files have been read, we can then call upon pd.concat() to join the dataframes together.
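A minimal sketch of this loop is shown below. The function name and the directory argument are assumptions for illustration, as the article does not state where the csv files are stored.

```python
import os

import pandas as pd


def load_formation_tops(directory):
    """Read every .csv file in `directory` and stack them into one dataframe."""
    df_formation_list = []

    for file_name in sorted(os.listdir(directory)):
        if file_name.lower().endswith(".csv"):
            file_path = os.path.join(directory, file_name)
            df_formation_list.append(pd.read_csv(file_path))

    # Join the per-well tables into a single formations dataframe
    return pd.concat(df_formation_list, ignore_index=True)


# Example call with a hypothetical path:
# formations_df = load_formation_tops("path/to/formation_csvs")
```

Because each csv file already carries its own Well column, no extra bookkeeping is needed to keep track of which rows belong to which well.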

When we call upon the formations_df dataframe we get back the following view:

Creating a Dictionary of Formation Data

Now that we have the formations within a simple pandas dataframe, we need to convert this dataframe to a nested dictionary.

This makes the process of combining the two datasets much easier and allows us to create a continuous column with the formation name at each depth level.

We can do this by using a dictionary comprehension.
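One way to write such a comprehension is sketched here. The column names Top and Formation, and the toy values, are assumptions for illustration; only the Well column is named in the article, so substitute the headers used in your own csv files.

```python
import pandas as pd

# Toy formation tops table standing in for formations_df
formations_df = pd.DataFrame(
    {
        "Well": ["WELL-A", "WELL-A", "WELL-B"],
        "Top": [0.0, 500.0, 120.0],
        "Formation": ["Unit 1", "Unit 2", "Unit 1"],
    }
)

# Outer key: well name. Inner dictionary: top depth -> formation name.
formations_dict = {
    well: dict(zip(group["Top"].tolist(), group["Formation"].tolist()))
    for well, group in formations_df.groupby("Well")
}

print(formations_dict)
```

Grouping by well first keeps the tops of different wells apart, even when two wells share the same formation names or depths.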

Once we have run the above code, we can call upon formations_dict and we get back the following result.

From it, we can see that the main key is the well name, and within each well we have a sub-dictionary with the depth as the key and the formation name as the value.

You may wonder why we are using depth as the key rather than the formation name. Doing it this way will allow us to check if the depth we are currently at (in the loop we will cover in the next section) is between two of the keys. If it is, then we can simply get the formation name.

If we want to view the tops for a specific well, we can call upon specific wells within the call to the dictionary like so:
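As a small illustration, the lookup is ordinary dictionary indexing. The well name and tops below are made-up stand-ins; with real data the key must match the well name string exactly as it appears in the Well column.

```python
# Toy nested dictionary: well name -> {top depth: formation name}
formations_dict = {
    "WELL-A": {0.0: "Unit 1", 500.0: "Unit 2"},
    "WELL-B": {120.0: "Unit 1"},
}

# Tops for a single well
well_tops = formations_dict["WELL-A"]

# Print the tops from shallowest to deepest
for top_depth, formation in sorted(well_tops.items()):
    print(top_depth, formation)

# .get() avoids a KeyError when a well has no formation data
missing_tops = formations_dict.get("WELL-C", {})
```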


Which will return the formation data for that specific well:

Merging Formation Data with Well Log Data

Now that the processing and setup have been completed, we can move on to integrating the formation tops dictionary with the well log dataframe.

For this, we will use the following function.

The function first gets the depths (keys from formations_dict) for the formations of the well (well_name) that is passed in.

We then need to catch a few edge cases.

First, we need to see if we are at the last formation within the formations dictionary. If we are then we will set a flag at_last_formation to True, otherwise, we will create a new variable called below which will be the closest formation depth below the current depth (depth).

Next, we need to see if we are at the first formation within the dictionary. In this situation, we are checking if we have any depth values above the first formation listed. This can occur if we only have formations from a specific depth instead of from the surface. If the current depth is above (shallower than) the first formation depth, then we will set the formation name to a blank string. Otherwise, we will get the depth value for the formation from above the current depth.

Finally, we need to check where the current depth value sits within the formation depths. If we don’t do this, then the correct formation name will not be set. If the current depth is equal to one contained within the formation dictionary then we will set it to the formation name at the depth listed.
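The logic above can be condensed into a short sketch. This is not the original listing: it replaces the separate flags with a single search for the deepest top at or above the current depth, which covers the exact-match and last-formation cases in one step. It also assumes, for illustration, that formations_dict is passed in as an argument and that samples below the deepest top belong to the last formation.

```python
def add_formations_to_df(depth, well_name, formations_dict):
    """Return the formation name at `depth` for the well `well_name`."""
    tops = formations_dict[well_name]

    # Formation tops that sit at or above (shallower than) the current depth
    at_or_above = [top for top in tops if top <= depth]

    # Shallower than the first listed formation: nothing to assign
    if not at_or_above:
        return ""

    # The deepest of those tops is the formation we are currently inside
    return tops[max(at_or_above)]


# Toy data for a quick check
example_tops = {"WELL-A": {100.0: "Unit 1", 200.0: "Unit 2"}}
print(add_formations_to_df(150.0, "WELL-A", example_tops))
```

Using <= rather than < means a depth that lands exactly on a top is assigned to the formation that starts there.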

Once the function has been written, we can call upon it using the apply method within pandas. This allows us to iterate over each row within the dataframe.
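A row-wise call with apply might look like the sketch below. The column names DEPT and FORMATION and the toy data are assumptions, and a compact stand-in for the lookup function is included only so that the snippet runs on its own.

```python
import pandas as pd

# Toy inputs standing in for the real dictionary and dataframe
formations_dict = {"WELL-A": {100.0: "Unit 1", 200.0: "Unit 2"}}
well_df = pd.DataFrame(
    {"WELL": ["WELL-A"] * 4, "DEPT": [50.0, 100.0, 150.0, 250.0]}
)


def add_formations_to_df(depth, well_name):
    """Compact stand-in: deepest top at or above `depth`, else blank."""
    tops = formations_dict[well_name]
    at_or_above = [top for top in tops if top <= depth]
    return tops[max(at_or_above)] if at_or_above else ""


# axis=1 hands each row to the lambda, one row at a time
well_df["FORMATION"] = well_df.apply(
    lambda row: add_formations_to_df(row["DEPT"], row["WELL"]), axis=1
)

print(well_df)
```

Row-wise apply is easy to read but calls the function once per sample, so it can be slow on long logs.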

When we call upon well_df we get back the following view of the dataframe:

Checking the Final Result

When doing anything like this, it is essential you check the results close up. For example, we can check the results within one of the wells, between the depths where we expect a formation change.

We can do that as follows.
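One possible check is to filter the combined dataframe to a single well and a narrow depth window around an expected top. The well name, column names and depths below are toy values chosen for illustration, not the article's data.

```python
import pandas as pd

# Toy combined dataframe with a formation change at 200.0
well_df = pd.DataFrame(
    {
        "WELL": ["WELL-A"] * 5,
        "DEPT": [199.8, 199.9, 200.0, 200.1, 200.2],
        "FORMATION": ["Unit 1", "Unit 1", "Unit 2", "Unit 2", "Unit 2"],
    }
)

# Keep one well and a narrow depth window around the expected top
in_window = (well_df["WELL"] == "WELL-A") & well_df["DEPT"].between(199.85, 200.15)
window = well_df.loc[in_window, ["WELL", "DEPT", "FORMATION"]]
print(window)

# Shallowest sample of each formation inside the window
first_samples = window.groupby("FORMATION", sort=False)["DEPT"].min()
print(first_samples)
```

The shallowest sample of the lower formation should line up with the top depth listed in the csv file.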

In the original formation tops csv file above, we can see that the transition between the Brussels Marl Member and the Ieper Member occurs at 930 ft. This occurs at the same point within the combined dataframe.

This helps us gain confidence that the process has worked.

To be sure, it is always wise to check multiple wells and intervals in this way, or by generating a well log plot.


Integrating well log data and formation data for multiple wells can be a challenge within Python. Within this short article, we have seen how to load multiple las files and formation data files and combine them together into a single dataframe.

This will allow us to integrate formation data and well log data into machine learning models or well log displays.

Be sure to check out the following article if you are looking to deal with a single well:

Combining Formation Data With Well Log Measurements in Pandas
