Preparing for Climate Change with an AI Assistant

Simplifying complicated data through conversation

Matthew Harris
Towards Data Science


Image generated using OpenAI’s ChatGPT and DALL·E 3

TL;DR

In this article, we explore how to create a conversational AI agent using climate change data from the excellent Probable Futures API and the new OpenAI Assistants API. The AI agent is able to answer questions about how climate might affect a specified location and also perform basic data analysis. AI assistants can be well-suited to tasks like this, providing a promising channel for presenting complex data to non-technical users.

I was recently chatting with a neighbor about how climate change might affect us and how best to prepare homes for extreme weather events. There are some amazing websites that provide information related to this in map form, but I wondered if sometimes people might simply want to ask questions like “How will my home be affected by climate change?” and “What can I do about it?” and get a concise summary with tips on how to prepare. So I decided to explore some of the AI tools made available in the last few weeks.

OpenAI’s Assistants API

AI agents powered by large language models like GPT-4 are emerging as a way for people to interact with documents and data through conversation. These agents interpret what the person is asking, call APIs and databases to get data, and generate and run code to carry out analysis, before presenting results back to the user. Brilliant frameworks like LangChain and AutoGen are leading the way, providing patterns for easily implementing agents. Recently, OpenAI joined the party with their launch of GPTs as a no-code way to create agents, which I explored in this article. These are designed very well and open the way for a much wider audience, but they do have a few limitations. They require an API with an OpenAPI (openapi.json) specification, which means they don’t currently support standards such as GraphQL. They also don’t support the ability to register functions, which is to be expected for a no-code solution but can limit their capabilities.

Enter OpenAI’s other recent launch — Assistants API.

The Assistants API (in beta) is a programmatic way to configure OpenAI Assistants, with support for functions, web browsing, and knowledge retrieval from uploaded documents. Functions are a big difference compared to GPTs, as they enable more complex interaction with external data sources. Functions are how Large Language Models (LLMs) like GPT-4 are made aware that some user input should result in a call to a code function. The LLM generates a response in JSON format with the exact parameters needed to call the function, which can then be executed locally. To see how they work in detail with OpenAI, see here.
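To illustrate, the model doesn’t execute anything itself; it emits a structured tool call that our own code must parse and run. Using the get_pf_data function we define later in this article, such a payload would look something like this (the argument values are illustrative) …

{
    "name": "get_pf_data",
    "arguments": "{\"address\": \"New Delhi\", \"country\": \"India\", \"warming_scenario\": \"1.5\"}"
}

Note that "arguments" arrives as a JSON-encoded string, so it needs a json.loads before the function can be invoked.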

A Comprehensive API for Climate Change — Probable Futures

To create an AI agent that helps with preparing for climate change, we need a good source of climate change data and an API for extracting that information. Any such resource must apply a rigorous approach to combining General Circulation Model (GCM) predictions.

Luckily, the folks at Probable Futures have done an amazing job!

Probable Futures provide a range of resources related to climate change predictions

Probable Futures is “a non-profit climate literacy initiative that makes practical tools, stories, and resources available online to everyone, everywhere,” and it provides a series of maps and data based on the CORDEX-CORE framework, a standardization of climate model output from the REMO2015 and REGCM4 regional climate models. [Side note: I am not affiliated with Probable Futures]

Importantly, they provide a GraphQL API for this data, which I was able to use after requesting an API key.

Based on the documentation, I created functions for calling the API, which I saved into a file assistant_tools.py …

import os
import requests

pf_api_url = "https://graphql.probablefutures.org"
pf_token_audience = "https://graphql.probablefutures.com"
pf_token_url = "https://probablefutures.us.auth0.com/oauth/token"


def get_pf_token():
    """Fetch an OAuth access token for the Probable Futures API."""
    client_id = os.getenv("CLIENT_ID")
    client_secret = os.getenv("CLIENT_SECRET")
    response = requests.post(
        pf_token_url,
        json={
            "client_id": client_id,
            "client_secret": client_secret,
            "audience": pf_token_audience,
            "grant_type": "client_credentials",
        },
    )
    access_token = response.json()["access_token"]
    return access_token


def get_pf_data(address, country, warming_scenario="1.5"):
    """Query the Probable Futures GraphQL API for climate indicators at a location."""
    variables = {}

    location = f"""
        country: "{country}"
        address: "{address}"
    """

    query = (
        """
        mutation {
            getDatasetStatistics(input: { """
        + location
        + """
                warmingScenario: \"""" + warming_scenario + """\"
            }) {
                datasetStatisticsResponses{
                    datasetId
                    midValue
                    name
                    unit
                    warmingScenario
                    latitude
                    longitude
                    info
                }
            }
        }
        """
    )
    print(query)

    access_token = get_pf_token()
    url = pf_api_url + "/graphql"
    headers = {"Authorization": "Bearer " + access_token}
    response = requests.post(
        url, json={"query": query, "variables": variables}, headers=headers
    )
    return str(response.json())

I intentionally excluded datasetId from the query input so that all indicators are returned, giving the AI agent a wide range of information to work with.

The API is robust in that it accepts towns and cities as well as full addresses. For example …

get_pf_data(address="New Delhi", country="India", warming_scenario="1.5")

Returns a JSON record with climate change information for the location …

{'data': {'getDatasetStatistics': {'datasetStatisticsResponses': [{'datasetId': 40601, 'midValue': '17.0', 'name': 'Change in total annual precipitation', 'unit': 'mm', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40616, 'midValue': '14.0', 'name': 'Change in wettest 90 days', 'unit': 'mm', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40607, 'midValue': '19.0', 'name': 'Change in dry hot days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40614, 'midValue': '0.0', 'name': 'Change in snowy days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40612, 'midValue': '2.0', 'name': 'Change in frequency of “1-in-100-year” storm', 'unit': 'x as frequent', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40101, 'midValue': '28.0', 'name': 'Average temperature', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40901, 'midValue': '4.0', 'name': 'Climate zones', 'unit': 'class', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {'climateZoneName': 'Dry semi-arid (or steppe) hot'}}, {'datasetId': 40613, 'midValue': '49.0', 'name': 'Change in precipitation “1-in-100-year” storm', 'unit': 'mm', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40701, 'midValue': '7.0', 'name': 'Likelihood of year-plus extreme drought', 'unit': '%', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40702, 'midValue': '30.0', 'name': 'Likelihood of year-plus drought', 'unit': '%', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40704, 'midValue': '5.0', 'name': 'Change in wildfire danger days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40703, 'midValue': '-0.2', 'name': 'Change in water balance', 'unit': 'z-score', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40201, 'midValue': '21.0', 'name': 'Average nighttime temperature', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40205, 'midValue': '0.0', 'name': 'Freezing days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40301, 'midValue': '71.0', 'name': 'Days above 26°C (78°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40302, 'midValue': '24.0', 'name': 'Days above 28°C (82°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40303, 'midValue': '2.0', 'name': 'Days above 30°C (86°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40102, 'midValue': '35.0', 'name': 'Average daytime temperature', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40103, 'midValue': '49.0', 'name': '10 hottest days', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40104, 'midValue': '228.0', 'name': 'Days above 32°C (90°F)', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, 
{'datasetId': 40105, 'midValue': '187.0', 'name': 'Days above 35°C (95°F)', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40106, 'midValue': '145.0', 'name': 'Days above 38°C (100°F)', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40202, 'midValue': '0.0', 'name': 'Frost nights', 'unit': 'nights', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40304, 'midValue': '0.0', 'name': 'Days above 32°C (90°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40305, 'midValue': '29.0', 'name': '10 hottest wet-bulb days', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40203, 'midValue': '207.0', 'name': 'Nights above 20°C (68°F)', 'unit': 'nights', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40204, 'midValue': '147.0', 'name': 'Nights above 25°C (77°F)', 'unit': 'nights', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}]}}}

Creating an OpenAI Assistant

Next, we need to build the AI assistant using the beta API. There are some good resources in the documentation and also the very useful OpenAI Cookbook. However, being so new and still in beta, there isn’t much information around yet, so at times it was a bit of trial and error.

First, we need to configure the tools the assistant can use, such as the function for getting climate change data. Following the documentation …

get_pf_data_schema = {
    "name": "get_pf_data",
    "parameters": {
        "type": "object",
        "properties": {
            "address": {
                "type": "string",
                "description": "The address of the location to get data for",
            },
            "country": {
                "type": "string",
                "description": "The country of location to get data for",
            },
            "warming_scenario": {
                "type": "string",
                "enum": ["1.0", "1.5", "2.0", "2.5", "3.0"],
                "description": "The warming scenario to get data for. Default is 1.5",
            },
        },
        "required": ["address", "country"],
    },
    "description": """
        This is the API call to the Probable Futures API to get predicted climate change indicators for a location
    """,
}

You’ll notice we’ve provided text descriptions for each parameter in the function. From experimentation, these seem to be used by the agent when populating parameters, so take care to be as clear as possible and to note any idiosyncrasies so the LLM can adjust. From this we define the tools …

tools = [
    {
        "type": "function",
        "function": get_pf_data_schema,
    },
    {"type": "code_interpreter"},
]

You’ll notice I left code_interpreter in, giving the assistant the ability to run code needed for data analysis.

Next, we need to specify a set of instructions (a system prompt). These are absolutely key in tailoring the assistant’s performance to our task. Based on some quick experimentation I arrived at this set …

instructions = """ 
"Hello, Climate Change Assistant. You help people understand how climate change will affect their homes"
"You will use Probable Futures Data to predict climate change indicators for a location"
"You will summarize perfectly the returned data"
"You will also provide links to local resources and websites to help the user prepare for the predicted climate change"
"If you don't have enough address information, request it"
"You default to warming scenario of 1.5 if not specified, but ask if the user wants to try others after presenting results"
"Group results into categories"
"Always link to the probable futures website for the location using URL and replacing LATITUDE and LONGITUDE with location values: https://probablefutures.org/maps/?selected_map=days_above_32c&map_version=latest&volume=heat&warming_scenario=1.5&map_projection=mercator#9.2/LATITUDE/LONGITUDE"
"GENERATE OUTPUT THAT IS CLEAR AND EASY TO UNDERSTAND FOR A NON-TECHNICAL USER"
"""

You can see I’ve added instructions for the assistant to provide resources, such as websites, to help users prepare for climate change. This is a bit open-ended; for a production assistant we’d probably want tighter curation of these links.

One wonderful thing that’s now possible is that we can also give instructions about general tone, in the above case requesting that output is clear to a non-technical user. Obviously, all of this needs some systematic prompt engineering, but it’s interesting to note how we now ‘program’ in part through persuasion. 😊

OK, now we have our tools and instructions, let’s create the assistant …

import os

from dotenv import load_dotenv
from openai import AsyncOpenAI

load_dotenv()

api_key = os.environ.get("OPENAI_API_KEY")
assistant_id = os.environ.get("ASSISTANT_ID")
model = os.environ.get("MODEL")
client = AsyncOpenAI(api_key=api_key)

name = "Climate Change Assistant"

# Note: top-level 'await' assumes an async context, e.g. a Jupyter notebook
try:
    # If an assistant already exists for our saved ID, update it in place
    my_assistant = await client.beta.assistants.retrieve(assistant_id)
    print("Updating existing assistant ...")
    assistant = await client.beta.assistants.update(
        assistant_id,
        name=name,
        instructions=instructions,
        tools=tools,
        model=model,
    )
except Exception:
    # Otherwise create a brand-new assistant
    print("Creating assistant ...")
    assistant = await client.beta.assistants.create(
        name=name,
        instructions=instructions,
        tools=tools,
        model=model,
    )
    print(assistant)
    print("Now save the assistant ID in your .env file")

The above assumes we have defined keys and our assistant ID in a .env file. You’ll notice the code first checks whether the assistant exists using the ASSISTANT_ID in the .env file, updating it if so; otherwise it creates a brand-new assistant, and the ID generated must be copied to the .env file. Without this, I was creating a LOT of assistants!
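For reference, the .env file looks something like this (placeholder values shown; the variable names are the ones referenced in the code above) …

OPENAI_API_KEY=<your OpenAI API key>
ASSISTANT_ID=<the ID printed the first time the assistant is created>
MODEL=gpt-4-1106-preview
CLIENT_ID=<your Probable Futures API client ID>
CLIENT_SECRET=<your Probable Futures API client secret>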

Once the assistant is created, it becomes visible in the OpenAI user interface, where it can be tested in the Playground. Since most of the development and debugging related to function calls actually calling local code, I didn’t find the Playground super useful for this analysis, but it’s designed nicely and might be useful in other work.

For this analysis, I decided to use the new GPT-4 Turbo model by setting model to "gpt-4-1106-preview".

Creating a User Interface

We want to be able to create a full chatbot, so I started with this Chainlit cookbook example, adjusting it slightly to separate the agent code into a dedicated file accessed via …

import assistant_tools as at

Chainlit is very concise, and the user interface is easy to set up; you can find the code for the app here.
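At the heart of any Assistants API app is the run loop: create a thread, add the user’s message, start a run, poll it, and, when the run status becomes requires_action, execute the requested function locally and submit the output back. The Chainlit app does this asynchronously with streaming updates; below is a simplified, synchronous sketch of the same pattern (the question and variable names are illustrative, not the app’s actual code) …

import json
import os
import time

from openai import OpenAI

import assistant_tools as at

client = OpenAI()  # uses OPENAI_API_KEY from the environment
assistant_id = os.environ["ASSISTANT_ID"]

# Create a conversation thread and add the user's question
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="How will climate change affect Mombasa, Kenya?",
)

# Start a run and poll it until it reaches a terminal state
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant_id)
while run.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

    # The assistant is asking us to call a function locally
    if run.status == "requires_action":
        tool_outputs = []
        for call in run.required_action.submit_tool_outputs.tool_calls:
            if call.function.name == "get_pf_data":
                args = json.loads(call.function.arguments)
                result = at.get_pf_data(**args)
                tool_outputs.append({"tool_call_id": call.id, "output": result})
        run = client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id, run_id=run.id, tool_outputs=tool_outputs
        )

# Messages are returned most-recent-first
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)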

Trying Out Our Climate Change Assistant AI Agent

Putting it all together — see code here — we start the agent with a simple chainlit run app.py

Let’s ask about a location …

Note that above I intentionally misspelled Mombasa.

The agent then starts its work, calling the API and processing the JSON response (it took about 20 seconds) …

Based on our instructions, it then finishes off with …

But is it right?

Let’s call the API and review the output …

get_pf_data(address="Mombassa", country="Kenya", warming_scenario="1.5")

Which queries the API with …

mutation {
    getDatasetStatistics(input: {
        country: "Kenya"
        address: "Mombassa"
        warmingScenario: "1.5"
    }) {
        datasetStatisticsResponses{
            datasetId
            midValue
            name
            unit
            warmingScenario
            latitude
            longitude
            info
        }
    }
}

This gives the following (truncated to display just a few indicators) …

{
    "data": {
        "getDatasetStatistics": {
            "datasetStatisticsResponses": [
                {
                    "datasetId": 40601,
                    "midValue": "30.0",
                    "name": "Change in total annual precipitation",
                    "unit": "mm",
                    "warmingScenario": "1.5",
                    "latitude": -4,
                    "longitude": 39.6,
                    "info": {}
                },
                {
                    "datasetId": 40616,
                    "midValue": "70.0",
                    "name": "Change in wettest 90 days",
                    "unit": "mm",
                    "warmingScenario": "1.5",
                    "latitude": -4,
                    "longitude": 39.6,
                    "info": {}
                },
                {
                    "datasetId": 40607,
                    "midValue": "21.0",
                    "name": "Change in dry hot days",
                    "unit": "days",
                    "warmingScenario": "1.5",
                    "latitude": -4,
                    "longitude": 39.6,
                    "info": {}
                },
                {
                    "datasetId": 40614,
                    "midValue": "0.0",
                    "name": "Change in snowy days",
                    "unit": "days",
                    "warmingScenario": "1.5",
                    "latitude": -4,
                    "longitude": 39.6,
                    "info": {}
                },
                {
                    "datasetId": 40612,
                    "midValue": "1.0",
                    "name": "Change in frequency of “1-in-100-year” storm",
                    "unit": "x as frequent",
                    "warmingScenario": "1.5",
                    "latitude": -4,
                    "longitude": 39.6,
                    "info": {}
                },

                .... etc

            ]
        }
    }
}

Spot-checking the values, it seems the agent captured them perfectly and presented an accurate summary to the user.

Improving Usability Through Instruction

The AI agent can be improved through some instructions about how to present information.

One of the instructions was to always generate a link to the map visualization back on the Probable Futures website, which, when clicked, takes the user to the right location …

The agent always generates a URL to take the user to the correct map visualization for their query on the Probable Futures website
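For example, plugging the coordinates returned for Mombasa (latitude -4, longitude 39.6) into the URL template from the instructions produces …

https://probablefutures.org/maps/?selected_map=days_above_32c&map_version=latest&volume=heat&warming_scenario=1.5&map_projection=mercator#9.2/-4/39.6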

Another instruction asked the agent to always prompt the user to try other warming scenarios. By default, the agent produces results for a predicted 1.5°C global temperature increase, but we allow the user to explore other, somewhat depressing, scenarios.

Analysis Tasks

Since we gave the AI agent the code interpreter tool, it should be able to execute Python code to carry out basic data analysis. Let’s try this out.

First I asked how climate change would affect London and New York, to which the agent provided summaries. Then I asked …

This resulted in the agent using code interpreter to generate and run Python code to create a plot …

The AI agent is able to carry out basic data analysis tasks using climate change data extracted from the API

Not bad!

Conclusions and Future Work

Using the Probable Futures API and an OpenAI assistant, we were able to create a conversational interface that lets people ask questions about climate change and get advice on how to prepare. The agent was able to make API calls as well as do some basic data analysis. This offers another channel for climate awareness, one which may be more attractive to some non-technical users.

We could of course have developed a chatbot that determines intent and entities, plus code to handle the API, but this is more work and would need to be revisited for any API changes and whenever new APIs are added. An LLM agent, by contrast, does a good job of interpreting user input and summarizing with very limited development, and takes things to another level in being able to run code and carry out basic data analysis. Our particular use case seems especially well suited to an AI agent because the task is constrained in scope.

There are some challenges, though: the technique is a bit slow (queries took about 20–30 seconds to complete). Also, LLM token costs weren’t analyzed for this article and may be prohibitive.

That said, the OpenAI Assistants API is still in beta and the agent wasn’t tuned in any way, so with further work and extra functions for common tasks, performance and cost could likely be optimized for this exciting new technique.

References

This article is based on data and other content made available by Probable Futures, a Project of SouthCoast Community Foundation, and certain of that data may have been provided to Probable Futures by Woodwell Climate Research Center, Inc. or The Coordinated Regional Climate Downscaling Experiment (CORDEX).

Code for this analysis can be found here.

You can find more of my articles here.


Matt is the Head of Data Science at DataKind, helping social sector organizations harness the power of data science and AI in the service of humanity.