
Unlocking Data from Graphs: How to Digitise Plots and Figures with WebPlotDigitizer

Unlocking Digital Potential from Static Image Data

Going from paper to digital. Image generated using DALLE by the author.

When working in data science, geoscience or petrophysics, we often come across data or charts that only exist in image form, such as figures within publications. Because the underlying data is not provided, it can be difficult to use in our own interpretation or research.

This is where a tool like WebPlotDigitizer becomes really useful. This online tool helps us take those charts from images and turn them into data that we can use for further research and analysis.


There are a number of areas in petrophysics and geoscience where digitising charts can be very beneficial, including:

  • Extracting and digitising charts within service company chart books
  • Digitising well log measurements from images
  • Digitising figures from publications for further research
  • Digitising well positions from a map

In this article, we will see how we can use WebPlotDigitizer to extract data from a scatter plot made with synthetic data. The figures we deal with in practice will usually be of poorer quality than this clean example.

Synthetic data on a scatter plot that will be used for extraction. Image by the author.

Also, it is important to remember that when we use data from other sources, we should always cite where it came from and state how it was obtained.

Loading the Image File

After capturing the image from the publication, it is time to load it into the WebPlotDigitizer.

To do this, we first navigate to:

File -> Load Image Files

Once the image has loaded, we can choose what type of plot we are dealing with.

As we are working with a simple scatter plot with data points and a regression line, we can select the 2D (X-Y) Plot.

Once the correct chart type has been selected, we can begin the process of aligning the axes.

Set Up the Axes

To set up the axes, we need to define four key positions: the start and end of the x-axis, and the start and end of the y-axis.

First, we set the X1 and X2 positions, followed by the Y1 and Y2 positions.

Chart after selecting the x and y start (x1, y1) and end (x2, y2) points. Image by the author.

To improve accuracy when selecting the points, you can click on each one and nudge it with the arrow keys on your keyboard so that the points on each axis line up. This assumes that the figure is straight to begin with.

For example, on the y-axis, I need to make sure the x-coordinate is the same for both Y1 and Y2.

Close up of selecting and micro-adjusting the selected axes points. Image by the author.

Once these positions have been defined, we can assign the values to these positions. In our figure, we go from 0 to 10 on the X-axis and from 0 to 100 on the Y-axis.

Setting the values for the x and y axes. Image by the author.
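To make the calibration step more concrete, here is a minimal sketch of how two known positions on each axis can be used to convert a pixel position into a data value. It assumes linear axes and an image that is not rotated. The function name and the pixel coordinates are hypothetical, and this is an illustration of the idea rather than WebPlotDigitizer's actual implementation.

```python
def pixel_to_data(pixel, pixel_1, pixel_2, value_1, value_2):
    """Linearly interpolate a data value from a pixel position along one axis."""
    fraction = (pixel - pixel_1) / (pixel_2 - pixel_1)
    return value_1 + fraction * (value_2 - value_1)


# Hypothetical pixel positions of the four calibration points in the image.
X1_PIXEL, X2_PIXEL = 100, 600  # x-axis: data values 0 and 10
Y1_PIXEL, Y2_PIXEL = 450, 50   # y-axis: data values 0 and 100 (pixel rows increase downwards)

# A marker clicked at pixel column 350, pixel row 250.
x_value = pixel_to_data(350, X1_PIXEL, X2_PIXEL, 0, 10)
y_value = pixel_to_data(250, Y1_PIXEL, Y2_PIXEL, 0, 100)
print(x_value, y_value)  # 5.0 50.0
```

A small error in placing any of the four calibration points shifts every extracted value, which is why it is worth nudging them carefully.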

Extracting Point Data

We can extract the point data from this chart in a couple of ways:

  • Automatically by drawing areas around points and running algorithms
  • Manually by clicking on each marker separately

Automatic Extraction of Data Points from a Chart

The first is by using the Automatic Extraction feature. This works great if our image is very clear and crisp. I have found it fails to work properly when dealing with figures that have been scanned in and those that are of poor quality.

To do this, we click on Box in the Automatic Extraction tools.

Then, we select the foreground colour of the points we want to select. In this case, we are going to select the blue markers.

Parameters for automatic selection. Image by the author.

Next, we go over the plot and draw a box over all of the data points.

Animation of how the automatic point selection works in WebPlotDigitizer. Image by the author.
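To give a feel for how colour-based detection of this kind can work in general, here is a simplified sketch: it keeps the pixels that are close to a chosen foreground colour, groups touching pixels into blobs, and reports the centre of each blob. This is an assumption about the general approach, not WebPlotDigitizer's actual algorithm, and the function name, tolerance and tiny test image are all hypothetical.

```python
from collections import deque


def find_marker_centres(image, target, tolerance):
    """Return the centre (column, row) of each connected blob of target-coloured pixels."""
    height, width = len(image), len(image[0])

    def matches(pixel):
        # Euclidean distance in RGB space between the pixel and the target colour.
        distance = sum((a - b) ** 2 for a, b in zip(pixel, target)) ** 0.5
        return distance <= tolerance

    visited = set()
    centres = []
    for row in range(height):
        for col in range(width):
            if (row, col) in visited or not matches(image[row][col]):
                continue
            # Breadth-first search collects every touching pixel of this blob.
            queue = deque([(row, col)])
            visited.add((row, col))
            blob = []
            while queue:
                r, c = queue.popleft()
                blob.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and (nr, nc) not in visited
                            and matches(image[nr][nc])):
                        visited.add((nr, nc))
                        queue.append((nr, nc))
            mean_col = sum(c for _, c in blob) / len(blob)
            mean_row = sum(r for r, _ in blob) / len(blob)
            centres.append((mean_col, mean_row))
    return centres


WHITE, BLUE = (255, 255, 255), (31, 119, 180)
# Tiny hypothetical image: two blue markers on a white background.
image = [
    [WHITE, WHITE, WHITE, WHITE, WHITE, WHITE],
    [WHITE, BLUE, BLUE, WHITE, WHITE, WHITE],
    [WHITE, BLUE, BLUE, WHITE, WHITE, BLUE],
    [WHITE, WHITE, WHITE, WHITE, WHITE, BLUE],
]
centres = find_marker_centres(image, target=BLUE, tolerance=60)
print(centres)  # [(1.5, 1.5), (5.0, 2.5)]
```

In a scheme like this, the match is made purely on colour, so anything of that colour inside the drawn box is treated as a data point.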

You will notice that when we used the automatic selection, the algorithm incorrectly selected the marker within the legend. This can easily be deleted by:

  • Selecting Delete Point on the Manual Extraction menu
  • Hovering the mouse cursor over the data point we want to delete
  • Left mouse clicking on the point

Options for adding, deleting and adjusting selected points. Image by the author.

Manual Extraction of Data Points from a Chart

In cases where we have an old scanned image or the image is black and white, the automatic extraction can fail.

In these situations, we need to extract the data points manually.

This can be done by:

  • Selecting Add Point from the Manual Extraction menu
  • Left mouse clicking on the points in the chart
  • Adjusting points afterwards by using the arrow keys on the keyboard

Example of manually selecting markers in WebPlotDigitizer. Image by the author.

Viewing the Extracted Data in WebPlotDigitizer

Once we are happy with the selected points, we can view the data by clicking the View Data button under the Dataset menu on the left.

Raw data view of the extracted data points. Image by the author.

Here, we can see the actual values that were picked, and if we want, we can change the formatting of the data points. For example, we can reduce the numbers from several decimal places to two.

Extracted data points after changing the number formatting. Image by the author.

We can also sort the values either by the x or the y-axis, which is handy if we have manually selected the points.

Once we are happy with the format of the data, we can copy it to the clipboard or export it as a CSV file.
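If the data is going to be used in Python afterwards, the exported file can be read with the standard library. The sketch below assumes the export contains two comma-separated columns (x, then y) with no header row, so check your own file first. The values and the file name mentioned in the comment are hypothetical.

```python
import csv
import io

# Hypothetical export contents. In practice, open the downloaded file instead,
# e.g. open("extracted_points.csv", newline="").
exported = io.StringIO(
    "7.512345, 76.123456\n"
    "1.034567, 12.345678\n"
    "4.498765, 44.987654\n"
)

points = []
for row in csv.reader(exported):
    x, y = (float(value) for value in row)  # float() ignores surrounding spaces
    points.append((round(x, 2), round(y, 2)))

points.sort()  # sort by x first, then by y
print(points)  # [(1.03, 12.35), (4.5, 44.99), (7.51, 76.12)]
```

This mirrors the rounding and sorting options available in the data view, which is useful if the raw values were exported without formatting.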

However, there is one feature that I really like, and that is graphing the data directly in Plotly. Not only does this create a nice interactive online visualisation, but it also saves you the time of writing code yourself to verify the extracted data.

Exporting raw data to automatically graph in Plotly. By default, the plot will be set to a Line Plot. Image by the author.

By default, the chart will open as a line chart, but that can easily be changed on the left-hand side by selecting Type and changing it to a scatter plot.

The various styles of plot that can be viewed on the Online Plotly chart viewer. Image by the author.

Once we have changed the plot style, we will get the following plot.

Extracted data points after changing the chart type to x-y scatter. Image by the author.

Whilst in this interface, we can also analyse our data and add regression lines, which is great for a quick-look analysis.

Applying a linear regression line to the extracted data using the Analytical tools. Image by the author.
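For anyone who would rather check the regression offline, an ordinary least-squares line can be fitted with a few lines of Python. This is only a sketch: the function name is my own and the points are hypothetical stand-ins for extracted values, not the data from the figure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical extracted points.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [12.0, 19.0, 32.0, 39.0]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 9.40x + 2.00
```

If the original figure reports a regression equation, comparing it with the fitted slope and intercept is a quick way to sanity-check the extraction.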

Summary

The online WebPlotDigitizer tool can be very useful in data science and in petrophysics to extract data from graphics when the raw data source is unavailable.

However, when extracting this data, be aware of the accuracy of the picked points. They are only as good as the original image and give a close approximation of the true values rather than the exact values. Where an axis covers a large range, each pixel represents a larger interval, so the accuracy is reduced.
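A rough way to think about this limit is to work out how many data units a single pixel covers along an axis. The sketch below assumes a linear axis, and the function name and the 500-pixel axis length are hypothetical.

```python
def units_per_pixel(axis_min, axis_max, axis_length_pixels):
    """Data units represented by one pixel along a linear axis."""
    return (axis_max - axis_min) / axis_length_pixels


# The same 500-pixel-long axis with two different data ranges.
small_range = units_per_pixel(0, 100, 500)
large_range = units_per_pixel(0, 10000, 500)
print(small_range)  # 0.2
print(large_range)  # 20.0
```

Being one pixel off on the first axis changes the value by 0.2, whereas on the second it changes it by 20, so it helps to work from the highest-resolution image available.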

If you can get the original data, then that is the best approach. Many authors of papers are approachable and willing to discuss their work. In some cases, they may be able to share the data if it is not proprietary or confidential.

In any case, always cite your data source and state how the data you used has been obtained.

WebPlotDigitizer Citation

Rohatgi, A. (2022). WebPlotDigitizer: Version 4.6. https://automeris.io/WebPlotDigitizer

