
Best Feature Engineering Methods Every Data Scientist Should Know

Tips and tricks senior data scientists use for fast feature engineering

Photo by Sabrina Nedjah on Unsplash

Article co-authored with @bonnefoypy and @emeric.chaize, CEOs at Olexya.

Feature engineering is a natural step after exploratory data analysis in most data science projects. The process consists of engineering the right features to obtain the best predictions. Easy-to-use feature engineering tools include PyCaret and Azure Studio.

To become a unicorn data scientist, mastering the most recent feature engineering methods is a must. In this article, we review Kaggle winners’ feature engineering methods, which can be implemented quickly and easily.

1. PyCaret

PyCaret is a simple, easy-to-use sequential pipeline with well-integrated preprocessing functions, including one-step feature engineering:

# install and import libraries
!pip install pycaret
import pandas as pd
from pycaret.regression import *

# open the dataset
df = pd.read_csv('dog_data.csv')
df
Dog dataset (Image by Author)
#define feature engineering pipeline:
from pycaret.regression import *
exp = setup(data = df, target = 'breed', feature_interaction = True)
Pycaret feature engineering operations (Image by Author)

All preprocessing steps are applied within setup(), which offers more than 20 feature engineering methods to prepare your dataset for machine learning, including polynomial, trigonometric, and arithmetic operations. The best features are then automatically selected based on their correlation with the target.

New features from Pycaret arithmetic engineering pipeline (Image by Author)

# enable polynomial feature generation
model = setup(df, target = 'breed', polynomial_features = True)

# inspect the first element returned by setup()
model[0]

New features from Pycaret polynomial engineering pipeline (Image by Author)
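
To make the arithmetic, polynomial, and trigonometric operations mentioned above more concrete, here is a minimal hand-written sketch using only pandas and NumPy. It is not PyCaret's internal implementation: the helper names `engineer_features` and `rank_by_target_correlation`, the toy columns `height`, `weight`, and `price`, and all values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def engineer_features(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Return a copy of df with simple polynomial, trigonometric,
    and pairwise arithmetic (product) features added."""
    out = df.copy()
    numeric_cols = [
        c for c in df.select_dtypes(include="number").columns if c != target
    ]
    # Polynomial (degree 2) and trigonometric features, one per column.
    for col in numeric_cols:
        out[f"{col}_squared"] = df[col] ** 2
        out[f"{col}_sin"] = np.sin(df[col])
    # Arithmetic interactions: the product of every pair of columns.
    for i, a in enumerate(numeric_cols):
        for b in numeric_cols[i + 1:]:
            out[f"{a}_multiply_{b}"] = df[a] * df[b]
    return out


def rank_by_target_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """Rank features by absolute Pearson correlation with the target."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)


# Hypothetical toy data (not the dog dataset used in the article).
toy = pd.DataFrame({
    "height": [30.0, 45.0, 60.0, 25.0, 55.0],
    "weight": [8.0, 20.0, 35.0, 5.0, 30.0],
    "price": [300.0, 520.0, 800.0, 250.0, 700.0],
})

features = engineer_features(toy, target="price")
ranking = rank_by_target_correlation(features, target="price")
print(ranking)
```

Ranking by absolute correlation is only one simple selection heuristic. It misses non-linear relationships, so in practice it is worth validating the retained features with a model.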

For more details about PyCaret’s preprocessing abilities, click here.

Photo by Slashio Photography on Unsplash

2. Azure Studio

This free tool from Microsoft (no credit card is required to register) takes a modular approach to creating features quickly and automatically, using a customizable data engineering pipeline with comprehensive data import options.

Azure Studio import data wizard (Image by Author)

The example below uses this method to engineer new features from a financial dataset with 139067 rows and 56 columns.

Azure studio feature engineering pipeline (Image by Author)

For example, from the numeric feature ‘paymentInstrumentAgeInAccount’, the framework can create more than 300 new features in one step, in less than one minute:

Azure studio math operations (Image by Author)

The complete list includes more than 300 operations and functions:

Azure studio special operations (Image by Author)
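
The idea of deriving many new features from a single numeric column by applying a catalogue of math operations can be sketched in pandas as follows. This is an illustration only and not how Azure Studio implements it: the `MATH_OPERATIONS` dictionary, the `apply_math_operations` helper, and the sample values are hypothetical; only the column name comes from the example above.

```python
import numpy as np
import pandas as pd

# A small, hypothetical catalogue of unary math operations.
# The tool described above offers far more; these are examples only.
MATH_OPERATIONS = {
    "abs": np.abs,
    "sqrt": np.sqrt,
    "log1p": np.log1p,
    "square": np.square,
    "round": np.round,
}


def apply_math_operations(df, column, operations=MATH_OPERATIONS):
    """Return a copy of df with one new column per math operation."""
    out = df.copy()
    for name, func in operations.items():
        out[f"{name}({column})"] = func(df[column])
    return out


# Hypothetical sample values for the feature named in the article.
payments = pd.DataFrame(
    {"paymentInstrumentAgeInAccount": [0.0, 1.5, 30.0, 365.0]}
)

expanded = apply_math_operations(payments, "paymentInstrumentAgeInAccount")
print(expanded.columns.tolist())
```

Keeping the operations in a dictionary makes it easy to extend the catalogue, or to restrict it to functions that are valid for the column's range (for example, `sqrt` and `log1p` assume non-negative inputs here).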

A more complex pipeline can be built from a list of preconfigured options combined with R, SQL, and Python scripts, as you can see below:

Azure studio features engineering pipeline using R scripts (image by Author)

For more details about Azure’s preprocessing abilities, click here.


Summary

This brief overview is a reminder of the importance of using several feature engineering methods in data science. This post covers fast and simple feature engineering methods and shares useful documentation.

Conclusion

If you have some spare time, I’d recommend reading these:

4 Tips for Advanced Feature Engineering and Preprocessing

Exploratory Data Analysis, Feature Engineering and Modelling using Supermarket Sales Data. Part 1.

I hope you enjoy it, keep exploring!

Photo by Ozgu Ozden on Unsplash
