A Data Science Web App to Predict Real Estate Price

Data Science + Machine Learning + Web Development

Prajwal
Towards Data Science


Web development and data science have been long-time passions of mine, and I had always wanted to combine these two interests in a single project. I finally managed to do so by building a web app that predicts real estate prices for properties and houses across the city of Bengaluru, India. It was a big project and took a while to finish, so I will walk you through every step that I followed.

Photo by Stephen Dawson on Unsplash

The project mainly has 4 steps: data science (cleaning the dataset), machine learning (building the model), the backend Flask server, and the frontend.

Architecture of the Application

Data Science

The first step is typical data science work: we take a dataset from Kaggle called ‘Bengaluru House price data’. We will perform some extensive data cleaning on it to ensure that it gives accurate results during prediction.

The Jupyter notebook entitled ‘RealEstatePricePredictor.ipynb’ is where we perform all the data science work. As the notebook is self-explanatory, I shall only briefly touch upon the concepts I have implemented. Our dataset requires a lot of work in terms of data cleaning. In fact, about 70% of the notebook is data cleaning, where we drop empty rows and remove unnecessary columns that won’t help the prediction.
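To give a flavour of the cleaning step, here is a minimal pandas sketch. The file name and the exact columns dropped (area_type, society, balcony, availability) are assumptions based on the Kaggle dataset; refer to the notebook for the actual steps.

import pandas as pd

# Load the Kaggle CSV (file name assumed).
df = pd.read_csv("Bengaluru_House_Data.csv")

# Drop columns that are unlikely to help the prediction (assumed set).
df = df.drop(columns=["area_type", "society", "balcony", "availability"])

# Drop rows with missing values.
df = df.dropna()

print(df.shape)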

Next comes feature engineering, the process of extracting the useful and important information from the dataset that will contribute the most towards a successful prediction.
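Here is a rough sketch of the kind of feature engineering this dataset needs, continuing from the previous snippet. The column names (size, total_sqft, price, location) come from the Kaggle dataset; the helper function and the rare-location threshold are illustrative, not the exact notebook code.

def to_sqft(value):
    # total_sqft sometimes holds a range such as "2100 - 2850"; take the average.
    parts = str(value).split("-")
    if len(parts) == 2:
        return (float(parts[0]) + float(parts[1])) / 2
    try:
        return float(value)
    except ValueError:
        return None  # unparseable entries, e.g. values given in square metres

# 'size' holds strings such as "2 BHK" or "4 Bedroom"; keep only the number.
df["bhk"] = df["size"].apply(lambda s: int(str(s).split(" ")[0]))

df["total_sqft"] = df["total_sqft"].apply(to_sqft)
df = df.dropna(subset=["total_sqft"])

# Price is in lakhs, so multiply by 100,000 to get price per square foot in rupees.
df["price_per_sqft"] = df["price"] * 100000 / df["total_sqft"]

# Group rare locations into an "other" bucket to keep the one-hot columns manageable.
counts = df["location"].str.strip().value_counts()
rare = counts[counts <= 10].index
df["location"] = df["location"].str.strip().apply(lambda loc: "other" if loc in rare else loc)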

The final step is handling outliers. Outliers are anomalies that can do an enormous amount of damage to the data and the prediction. There is a lot to understand from the dataset logically in order to detect and remove these outliers.
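As an illustration, two typical checks on this dataset could look like the sketch below (continuing from the previous snippet; the 300 sqft-per-bedroom rule and the one-standard-deviation filter are assumed thresholds, not necessarily the notebook’s exact values):

import pandas as pd

# Domain check: a bedroom realistically needs at least ~300 sqft, so a
# 6 BHK listed at 1,000 sqft is treated as bad data and dropped.
df = df[df["total_sqft"] / df["bhk"] >= 300]

# Per-location filter: keep rows whose price_per_sqft lies within one
# standard deviation of that location's mean.
def remove_pps_outliers(frame):
    kept = []
    for _, group in frame.groupby("location"):
        mean, std = group["price_per_sqft"].mean(), group["price_per_sqft"].std()
        kept.append(group[(group["price_per_sqft"] > mean - std) &
                          (group["price_per_sqft"] <= mean + std)])
    return pd.concat(kept, ignore_index=True)

df = remove_pps_outliers(df)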

Again, all of these have been explained in the jupyter notebook.

In the end, the original dataset, which had almost 13,000 rows and 9 columns, is reduced to roughly 7,000 rows and 5 columns.

Machine Learning

The cleaned data is then fed to a machine learning model. We mainly use K-fold cross-validation and scikit-learn’s GridSearchCV to perform hyperparameter tuning and find the best algorithm and parameters for the model.

It turns out that a linear regression model gives the best results for our data, with a score above 80%, which is not bad.
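The model-selection loop could look roughly like this; the feature-matrix construction and the parameter grids are illustrative, and the real grids live in the notebook.

import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV, ShuffleSplit

# One-hot encode the location column and build the feature matrix.
dummies = pd.get_dummies(df["location"], drop_first=True)
X = pd.concat([df[["total_sqft", "bath", "bhk"]], dummies], axis=1)
y = df["price"]

cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

candidates = {
    "linear_regression": (LinearRegression(), {"fit_intercept": [True, False]}),
    "lasso": (Lasso(), {"alpha": [1, 2]}),
    "decision_tree": (DecisionTreeRegressor(), {"criterion": ["squared_error", "friedman_mse"]}),
}

for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=cv)
    search.fit(X, y)
    print(name, search.best_score_, search.best_params_)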

Now, our model needs to be exported into a pickle file (Bengaluru_House_Data.pickle), which converts the Python object into a byte stream. We also need to export the locations (columns) into a JSON file (columns.json) so the frontend can interact with them.
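Exporting these artefacts takes only a few lines; the ‘data_columns’ key name and the lower-casing of column names are assumptions that the server sketch later in this article relies on.

import json
import pickle
from sklearn.linear_model import LinearRegression

# Fit the chosen model on the full feature matrix from the previous step.
model = LinearRegression().fit(X, y)

# Serialise the trained model to the pickle file used by the server.
with open("Bengaluru_House_Data.pickle", "wb") as f:
    pickle.dump(model, f)

# Save the column order (sqft, bath, bhk, then the one-hot locations) so the
# server can rebuild the same feature vector at prediction time.
with open("columns.json", "w") as f:
    json.dump({"data_columns": [c.lower() for c in X.columns]}, f)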

Server

We will use a Flask server as our backend to host the application locally. In the server folder we will set up two files:

The server.py file is responsible for handling the routes that fetch the location names and predict the house price. It also receives the form data from the frontend and feeds it to util.py. These routes can be tested using the Postman app.
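A minimal sketch of server.py could look like this. The route names match the URLs used in script.js, but the util function names (load_saved_artifacts, get_location_names, get_estimated_price) are assumptions, not necessarily the exact names in the repository.

from flask import Flask, request, jsonify
import util

app = Flask(__name__)

@app.route("/get_location_names", methods=["GET"])
def get_location_names():
    # Return the list of locations for the dropdown on the frontend.
    return jsonify({"locations": util.get_location_names()})

@app.route("/predict_home_price", methods=["POST"])
def predict_home_price():
    # Read the form data posted by script.js and hand it to util.py.
    total_sqft = float(request.form["total_sqft"])
    location = request.form["location"]
    bhk = int(request.form["bhk"])
    bath = int(request.form["bath"])
    price = util.get_estimated_price(location, total_sqft, bhk, bath)
    return jsonify({"estimated_price": price})

if __name__ == "__main__":
    util.load_saved_artifacts()
    app.run(port=5000)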

The util.py file is the brains behind the backend. It has a function to load the JSON and pickle files; it takes the form data passed in from server.py and uses the model to predict the estimated price of the property.
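A corresponding sketch of util.py, under the same naming assumptions as above (file names from the article, feature-vector layout from the earlier export snippet):

import json
import pickle
import numpy as np

__data_columns = None
__locations = None
__model = None

def load_saved_artifacts():
    global __data_columns, __locations, __model
    with open("columns.json") as f:
        __data_columns = json.load(f)["data_columns"]
        __locations = __data_columns[3:]  # first three entries are sqft, bath, bhk
    with open("Bengaluru_House_Data.pickle", "rb") as f:
        __model = pickle.load(f)

def get_location_names():
    return __locations

def get_estimated_price(location, sqft, bhk, bath):
    # Build a feature vector in the same column order the model was trained on.
    x = np.zeros(len(__data_columns))
    x[0], x[1], x[2] = sqft, bath, bhk
    if location.lower() in __data_columns:
        x[__data_columns.index(location.lower())] = 1
    return round(float(__model.predict([x])[0]), 2)  # price in lakhs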

Frontend

The frontend is made up of simple HTML, CSS and JavaScript. The user selects the square-foot area, BHK, number of bathrooms and location in the form and hits the ‘ESTIMATE’ button to get the estimated price. The JavaScript file is responsible for interacting with both the backend Flask routes and the frontend HTML: it collects the form data filled in by the user, calls the prediction route, and renders the estimated price in lakh rupees (1 lakh = 100,000).

Result

Let's see how our project works. Run the server.py file in the backend and open the HTML web page we created. Input the area of the property in square feet, the number of BHK, the number of bathrooms and the location, and hit ‘ESTIMATE’. 😲 Yay! We just predicted the price of what could be someone's dream house.

Bonus — Deployment

This part is quite tricky, so make sure to follow every step along with me.

So far the project only runs locally; now we will learn how to deploy the application to the web using an AWS EC2 instance.

Architecture

We will use nginx, a web server that can serve HTTP requests and static files. nginx can then pass API requests on to our Flask server.

Download nginx from here and install it.

  • STEP 1: To make the estimation work, the script.js file initially had these two lines:
var url = "http://127.0.0.1:5000/predict_home_price";
var url = "http://127.0.0.1:5000/get_location_names";

These two were just for the sake of running our app locally; now we must change them to:

var url = "/api/predict_home_price";
var url = "/api/get_location_names";

We need to configure a reverse proxy on the nginx server so that all requests to ‘/api’ are routed to port 5000, where the Flask server listens.

  • STEP 3: Create a new key pair, enter a name, download the key (.pem file) and launch the EC2 instance.
  • STEP 4: Click on the ‘Connect’ option beside ‘Launch Instance’, where there will be a command under ‘Example’.

Copy this command. It would look something like this.

  • STEP 5: Open up a bash shell and paste this command, replacing the blacked-out area with the path to your downloaded .pem file.

You are now connected to the Linux machine on AWS cloud.

  • STEP 6: Now you need to copy your project onto that virtual machine using WinSCP. Download and launch WinSCP.
  • STEP 7: On AWS, click on ‘Connect’ again; there is a command above ‘Example’ that looks something like this:

Copy this into the WinSCP ‘Host name’ field, as this is the host name of your virtual machine.

  • The username is ‘ubuntu’. Before entering the password, we need to convert the .pem file into a .ppk file using PuTTYgen. Please download PuTTYgen from this link.
  • Click on the ‘Advanced’ button under the Password field. Select the downloaded .pem file; it will open up PuTTYgen to convert the .pem file into a .ppk file. Select the new .ppk file and click ‘OK’.
  • Hit ‘Login’.
  • Copy your whole root project folder (RealEstatePricePrediction) from your machine to the cloud server.
  • On the ubuntu shell of the server execute the following commands:
sudo apt-get update
sudo apt-get install nginx
  • Copy the hostname from the ‘Connect’ dialog box we saw earlier and paste it into the browser.

You should see the homepage of nginx running.

  • STEP 8: Execute the following commands:
cd /etc/nginx/sites-enabled/
sudo unlink default
cd ../sites-available/
sudo vim BengaluruHousePrediction.conf

Copy-paste the following code in the .conf file. This is the reverse proxy setup.

server {
    listen 80;
    server_name BengaluruHousePrediction;
    root /home/ubuntu/RealEstatePricePrediction/client;
    index index.html;
    location /api/ {
        rewrite ^/api(.*) $1 break;
        proxy_pass http://127.0.0.1:5000;
    }
}
  • Exit vim and execute the following commands:
cd ../sites-enabled/
sudo ln -v -s /etc/nginx/sites-available/BengaluruHousePrediction.conf .

The above command creates a symbolic link to the configuration file inside /etc/nginx/sites-enabled.

  • Restart the server using:
sudo service nginx restart
  • Check status using:
sudo service nginx status
  • Go to your browser and enter the hostname in the URL bar. You should see your application frontend loading, but the backend does not work yet.
  • STEP 9: Go back to the terminal and type:
cd ~
cd RealEstatePricePrediction/server/
sudo apt-get install python3-pip
sudo pip3 install Flask
sudo pip3 install scikit-learn
sudo pip3 install numpy
  • Start backend server using:
python3 server.py

Your application is now live on the internet, accessible from anywhere in the world.

I understand if the deployment phase was a little tough. You can contact me on my LinkedIn if you run into any errors. Feel free to check out the entire code-base on my GitHub.

I hope you all understood and learnt many things along the way.

Thank You!
