Hands-On Building a Virtual Property Consultant Using Artificial Intelligence

This is how I took real estate data and powered it with GPT3, OpenAI’s Large Language Model

Image made by author using DALL·E3

This article starts with a personal story.

I’m from Italy, and I have been living in the United States for 4 years now. I moved fresh out of my university in Rome, and I came here when I was 23 years old. I had a lot of dreams, a huge passion (the one for Artificial Intelligence), and a doctorate to get at the University of Cincinnati.

I lived the majority of my life in Italy, eating good food, drinking good coffee, and sitting for good hours in traffic (to go literally anywhere) 🙃 . Nonetheless, I’ve "adulted" (my wife taught me this term) so much in the United States, as I started doing things by myself that I had never done in Italy, my parents being an ocean away. One thing that my wife and I are doing together for the first time is house hunting.

1. About House Hunting

For my non-American followers, "house hunting" is simply the practice of searching for your perfect house. The term "hunting" is used because there are so many houses, realtors, and websites to look at that it’s more than just "searching" for your house; it’s an active hunt. It’s about questions like:

What is the best website? What is the best offer? Where is the best neighborhood? How much taxes do I pay? How high is my insurance?

In physics, we would call that an optimization problem, where you want to satisfy all the properties of the house of your dreams (with some room for negotiation) while keeping the budget under a certain value of x dollars:

Image made by the author

The market is so big, so full of opportunities, and so customizable that real estate agents work closely with clients to get them the house of their dreams. For this reason, the figure of the real estate agent in the United States is usually linked with the concept of a "go-getter," which highlights how essential human qualities and soft skills are in an agent’s day-to-day activity.

2. House Hunting using Data Science

Now, I would like to think of myself as a "go-getter" as well, but I am not a real estate agent. Actually, I am a Machine Learning engineer, so I am probably a "nerd go-getter," if that makes sense. As a nerd go-getter, I was thinking about where data science and the real estate world meet. Thinking about it, data science and real estate already meet in so many places, like Zillow, Realtor, or Parafin. These companies are the ones I go to in my delusional moments, when I’m dreaming of a multimillion-dollar home in California, and they do a wonderful job of creating a bridge between the client and the seller.

Red is the traditional approach; Green would be one of those companies cited above.

Now, how does the green work? If you go, for example, on Zillow, you see that you have a bunch of features you can select. Examples are price, number of bedrooms/bathrooms, if it’s an apartment or a house with a yard, the budget (of course), the interest rate, the city and zip code. Once you select the features you care about, you can have a virtual tour (with pictures and videos) of the house. If you like it, you can schedule a tour with the real estate agent of the property.
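To make this kind of feature-based search concrete, here is a minimal sketch of the exact filtering it boils down to. The listings and field names are made up for illustration; this is not Zillow’s data or API.

```python
# Hypothetical listings; the field names are assumptions for illustration.
listings = [
    {"zip_code": "98112", "price": 950_000, "bedrooms": 3, "type": "house"},
    {"zip_code": "98103", "price": 1_200_000, "bedrooms": 4, "type": "house"},
    {"zip_code": "98122", "price": 640_000, "bedrooms": 2, "type": "apartment"},
]


def hard_search(rows, max_price, min_bedrooms, home_type):
    """Return only the rows that satisfy every filter exactly."""
    return [
        row for row in rows
        if row["price"] <= max_price
        and row["bedrooms"] >= min_bedrooms
        and row["type"] == home_type
    ]


matches = hard_search(listings, max_price=1_000_000, min_bedrooms=3, home_type="house")
print(matches)
```

Note that a listing even slightly above the budget is silently dropped, however good it is.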

If you ask me, it is a huge step forward. It requires way less commitment: you don’t have to drive to the office, and you are not stuck with a real estate agent you don’t like. I am also extremely biased because I love software and everything around it, so I have huge respect for the people who built Zillow from scratch.

3. House Hunting using Generative AI (this article)

I think that with generative Artificial Intelligence, a next-step experience for the user is possible. Let me clarify this:

I still strongly believe in the human in the loop.

I think that not even the inventor of Artificial Intelligence (who doesn’t exist) would buy a house just because a computer says so. You want to talk to people, see what they think, help them help you, compare ideas, and talk about football while you’re there.

The only part of this process where I see an opportunity is the following. Sometimes I don’t even know what I want: I have a sort of budget, but I could go a little higher for a house that takes my breath away; I want to live in a certain place, but 20 minutes away is OK too, provided it’s worth it; I would ideally not live in an apartment, but maybe I would. So in this article, we will do this:

We will replace the hard database search with a chat experience

What do I mean? Let me show you an example.

Image made by author

As you can see, my query is:

"I would like a house in East Seattle, probably big but not too expensive. I have around 1M of budget."

And my AI is communicating with a database for the city of Seattle in the same way that your data science search algorithm would, but better 🙃 . I would argue that it is better because: it helps you in that grey area (I said "probably big" and it gave me the reference for the size); it provided me the zip code; it informed me about the possibility of negotiating; and it gave me the option to schedule a viewing right away.

In a few words, I think that house hunting would become incredibly more interesting with the adoption of generative AI, and I hope that, after this article, you will agree with me.

This is what we are going to do:

  • We will download and preprocess a real estate house list. In particular, we will use the example of Seattle for simplicity
  • We will connect the OpenAI GPT model to the database using Langchain
  • We will create a WebApp where you can actively talk to the GPT, just like the example shows, using Streamlit
  • We will show some more examples

Hopefully, you are as excited as I am. Let’s get started 🚀

3.0 Import Libraries

As I promised, we will build a web app, but I still want to walk you through the process notebook-style first, even though we will end up using the app. So let’s dive in.

The libraries that we used for the core part (not for the app) are the following:

You would need this constants.py script, where you replace OPENAI_API_KEY with your OpenAI API key. Keep in mind that this has some (very limited) cost every time you run Langchain. Use the OpenAI API page for more information.
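A minimal sketch of what such a constants.py could look like. Reading the key from an environment variable first is my assumption (it keeps the key out of the code); the placeholder string is what you would replace by hand otherwise.

```python
# constants.py (sketch)
import os

# Assumption: prefer the OPENAI_API_KEY environment variable when it is set,
# otherwise fall back to a placeholder that you replace with your own key.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "your-openai-api-key")
```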

The people at Langchain and OpenAI have worked to make large language models easy to access, so you shouldn’t have any problem installing any missing libraries. As always, you can do it with:

pip install <everythingyouneed>

in your terminal. In the web app, we will also do some (optional) visualization of the map of Seattle. But we will see it in a bit.

3.1 Data and Data Processing

The dataset that we are going to use is an example for the city of Seattle, as I mentioned. In order to create an open-source dataset that is ready for everyone to use, I generated a realistic dataset with the zip codes of the city of Seattle and some realistic prices, lot sizes, and numbers of bathrooms and bedrooms. Remember that you can use whatever real estate dataset you have, and with some minor adjustments you should be able to get the code running.

The dataset can be found in this Google drive folder.

Let me show you the dataset:

We are doing some processing now according to the following steps:

  • Standardizing the lot_size into square feet for all the rows and removing lot_size_units.
ACRE_TO_LOT = 43560
  • A client might not have the zip code inside their head, but they might want to know if it’s North, South, East or West. I used the following map to do the conversion.
SEATTLE_ZIPCODES = {
    'North': ['98103', '98107', '98115', '98117', '98125', '98133', '98155', '98177'],
    'South': ['98108', '98118', '98144', '98146', '98168', '98178', '98188', '98198'],
    'East': ['98102', '98112', '98122', '98109', '98105'],  # More central than truly east, but for categorization
    'West': ['98106', '98116', '98126', '98136', '98199']
}
  • We add the columns "possibility to negotiate" and "price cut," which are random variables that are meant to represent how much the real estate agent can help the client.

All these steps are done using the following code:
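As a rough sketch of the three steps, here is a plain-Python version that works on rows stored as dictionaries (the same logic translates directly to DataFrame operations). The column names, the unit labels "acre" and "sqft", and the ranges of the random columns are assumptions for illustration; only ACRE_TO_LOT and SEATTLE_ZIPCODES come from the text above.

```python
import random

ACRE_TO_LOT = 43560  # square feet in one acre

SEATTLE_ZIPCODES = {
    "North": ["98103", "98107", "98115", "98117", "98125", "98133", "98155", "98177"],
    "South": ["98108", "98118", "98144", "98146", "98168", "98178", "98188", "98198"],
    "East": ["98102", "98112", "98122", "98109", "98105"],
    "West": ["98106", "98116", "98126", "98136", "98199"],
}

# Invert the map once so that each zip code points to its area.
ZIP_TO_AREA = {z: area for area, zips in SEATTLE_ZIPCODES.items() for z in zips}


def preprocess(rows, seed=0):
    """Apply the three steps to a list of dict rows (column names are assumed)."""
    rng = random.Random(seed)
    processed = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        # Step 1: express every lot size in square feet and drop the unit column.
        units = row.pop("lot_size_units")
        if units == "acre":
            row["lot_size"] = row["lot_size"] * ACRE_TO_LOT
        # Step 2: map the zip code to North/South/East/West.
        row["area"] = ZIP_TO_AREA.get(str(row["zip_code"]), "Unknown")
        # Step 3: random placeholders for how much the agent can help.
        row["possibility_to_negotiate"] = rng.choice(["yes", "no"])
        row["price_cut"] = round(rng.uniform(0.0, 0.1), 3)  # assumed: fraction of price
        processed.append(row)
    return processed


sample = [
    {"zip_code": 98112, "price": 950000, "lot_size": 0.25, "lot_size_units": "acre"},
    {"zip_code": 98103, "price": 720000, "lot_size": 5000, "lot_size_units": "sqft"},
]
for row in preprocess(sample):
    print(row)
```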

And this is the preprocessed dataset:

I saved this dataset in "seattle_data_proccesed.csv," as we will see in a few moments.

3.2 Langchain

Langchain is a super cool LLM tool that allows you to connect the power of a large language model with some agents. These agents basically allow communication between your input data and the LLM logic. I will admit that describing Langchain like that is extremely reductive, as this company does a lot more stuff. If you are an LLM fanatic, I absolutely recommend checking them out.

The Langchain agent that we will use is "create_csv_agent". This function allows the LLM to scan through a CSV document (your data table) and answer your query (what you would write on ChatGPT) according to the content of the CSV table. In this blog post we will use OpenAI to do that (but you can swap it for your favorite LLM).

For example, if you ask GPT3 through OpenAI’s ChatGPT, it will tell you that it doesn’t have access to real-time data, as expected. On the other hand, if you connect it with real-time data using Langchain, it can extract exactly the house with the lowest price:

Image made by author

After all this talk, the code to run all of this fits in the following few lines:
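A minimal sketch of how create_csv_agent can be wired up, under some assumptions: it targets a recent Langchain release, where the CSV agent lives in the langchain_experimental package and the OpenAI wrapper in langchain_openai (older releases exposed it as `from langchain.agents import create_csv_agent`). The helper names build_agent and ask are mine, not part of Langchain.

```python
CSV_PATH = "seattle_data_proccesed.csv"  # file name as saved after preprocessing


def build_agent(api_key, csv_path=CSV_PATH):
    """Create a CSV agent that lets the LLM answer questions from the table."""
    # Imports are local so this module loads even before the packages are installed.
    from langchain_experimental.agents.agent_toolkits import create_csv_agent
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(api_key=api_key, temperature=0)
    # The agent executes model-generated pandas code, so recent versions
    # require an explicit opt-in through allow_dangerous_code.
    return create_csv_agent(llm, csv_path, verbose=True, allow_dangerous_code=True)


def ask(agent, question):
    """Send one question to the agent and return the text of its answer."""
    result = agent.invoke({"input": question})
    return result["output"]
```

You would then call `agent = build_agent(OPENAI_API_KEY)` once and `ask(agent, "Which house has the lowest price?")` for each question. Every call to the agent is billed by OpenAI.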

My question is this:

And this is the answer:

Note! To run this you need an OpenAI API key. You can get one here. Keep in mind that this operation has some costs (even if very limited). Set a spending limit on your account to avoid unpleasant surprises.

Now, that is enough to show how cool Langchain is and how fun this approach could be. We have actively connected our Real Estate information with the latest AI technology. If you are into AI, I think you might find this stuff incredibly cool, and you could see its potential.

One application of this is to build a Web app about it. Here is how I did it:

4. Web App

In order to create the web app that you saw in the previous examples, we would need some extra steps.

These are the Python scripts you want:

4.1 constants.py

The same constants file we saw earlier.

4.2 seattle_data_loader.py

This is where we do the preprocessing. You might need to install some Python libraries using pip.

4.3 prompts.json

The gold prompts are saved here:

4.4 realestategpt.py

This code has all the GPT stuff, and it has an option to keep the conversational part (when one_shot==False):
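One way the one_shot switch could work is sketched below: a small wrapper that either forwards each question as-is or prepends the previous exchanges. The class name, the prompt layout, and the ask_fn argument (any function that takes a prompt and returns the answer text) are assumptions for illustration, not the actual realestategpt.py.

```python
class RealEstateChat:
    """Keep (or drop) the conversation history around a question-answering function."""

    def __init__(self, ask_fn, one_shot=True):
        self.ask_fn = ask_fn      # callable: prompt text -> answer text
        self.one_shot = one_shot  # True: every question is independent
        self.history = []         # list of (question, answer) pairs

    def build_prompt(self, question):
        """Return the question alone, or the question preceded by past exchanges."""
        if self.one_shot or not self.history:
            return question
        past = "\n".join(f"Client: {q}\nConsultant: {a}" for q, a in self.history)
        return f"Previous conversation:\n{past}\n\nClient: {question}"

    def ask(self, question):
        """Answer one question and, in conversational mode, remember the exchange."""
        answer = self.ask_fn(self.build_prompt(question))
        if not self.one_shot:
            self.history.append((question, answer))
        return answer
```

With one_shot=False, a follow-up such as "Anything cheaper?" reaches the model together with the earlier exchange, so it can be understood in context. The price is a longer prompt on every turn.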

4.5 app.py

This is the app.py that uses folium to build the interactive map, sets the title of the app, and then uses some Streamlit syntax to create an interactive chatbot.
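A minimal sketch of how these pieces can fit together, not the actual app.py. It assumes a recent Streamlit release (for st.chat_input and st.chat_message) plus folium; the area coordinates are rough values for illustration only, and answer_fn stands for whatever function turns the user’s message into a reply.

```python
# Approximate area centers (latitude, longitude); rough illustrative values.
AREA_CENTERS = {
    "North": (47.70, -122.33),
    "South": (47.53, -122.30),
    "East": (47.63, -122.30),
    "West": (47.56, -122.38),
}


def build_map(centers=AREA_CENTERS):
    """Return a folium map of Seattle with one marker per area."""
    import folium

    seattle_map = folium.Map(location=[47.6062, -122.3321], zoom_start=11)
    for name, (lat, lon) in centers.items():
        folium.Marker([lat, lon], popup=name).add_to(seattle_map)
    return seattle_map


def main(answer_fn):
    """Render the page; answer_fn maps the user's message to the reply text."""
    import streamlit as st
    import streamlit.components.v1 as components

    st.title("Virtual Property Consultant")
    # Embed the folium map as raw HTML inside the Streamlit page.
    components.html(build_map().get_root().render(), height=450)

    # Streamlit reruns the script on every interaction, so the chat
    # history has to live in the session state.
    if "messages" not in st.session_state:
        st.session_state.messages = []
    for role, text in st.session_state.messages:
        st.chat_message(role).write(text)

    question = st.chat_input("Describe the house you are looking for")
    if question:
        st.chat_message("user").write(question)
        reply = answer_fn(question)
        st.chat_message("assistant").write(reply)
        st.session_state.messages.extend([("user", question), ("assistant", reply)])
```

In app.py itself, the file would end with a call to main, passing in the function that wraps your agent.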

To run this, just do this:

streamlit run app.py

5. Some examples:

These are some examples:

6. Conclusions:

Thank you for experiencing this journey with me! I highly appreciate it. This is what we have done in this blog post:

  1. We introduced the concept of house hunting. We described what it is to look for a house in the US and how companies like Zillow or Realtor are using Data Science algorithms to help in the process
  2. We saw the opportunity for generative AI. Rather than using a hard search of the house, we can talk with a virtual assistant that can talk to us and assist us in finding our dream home
  3. We introduced Langchain and saw how the OpenAI GPT model can be used to combine CSV data with generative AI.
  4. We developed a notebook and, most importantly, a whole web app to test it out, with a folium map and a chatbot.

Let me just discuss what I learned from this:

  • Hard searches throughout the database are, sometimes, limiting. Let me explain. If I’m looking for something that I already know very well, hard searches work: I just want to scroll down until I find my items. Sometimes though, I don’t really know what I want, I just want to browse and see what I like. In those cases, generative AI can help.
  • Generative AI is much more than a chatbot. I would argue that the best thing you can do with generative AI is to feed it examples and data, just like we did in this blog post.
  • Is this going to replace a Real Estate agent? I’d say it’s not even slightly close. What this tool is useful for is to create a bridge between the client and the real estate agent, making life easier for both.
  • What did you learn from this? Let me know in the comments below 🙂

7. About me!

Thank you again for your time. It means a lot ❤

My name is Piero Paialunga and I’m this guy here:

I am a Ph.D. candidate at the University of Cincinnati Aerospace Engineering Department and a Machine Learning Engineer for Gen Nine. I talk about AI and Machine Learning in my blog posts and on Linkedin. If you liked the article and want to know more about machine learning and follow my studies, you can:

  A. Follow me on Linkedin, where I publish all my stories.
  B. Subscribe to my newsletter. It will keep you updated about new stories and give you the chance to text me with any corrections or doubts you may have.
  C. Become a referred member, so you won’t have any "maximum number of stories for the month" and you can read whatever I (and thousands of other Machine Learning and Data Science top writers) write about the newest technology available.
  D. Want to work with me? Check my rates and projects on Upwork!

If you want to ask me questions or start a collaboration, leave a message here or on Linkedin:

[email protected]

