
Article co-authored with @bonnefoypy and @emeric.chaize, CEOs at Olexya.
Feature engineering is a natural step after exploratory data analysis in most data science projects. It consists of building the features that lead to the best predictions. Easy-to-use feature engineering tools include PyCaret and Azure studio.
To become a unicorn data scientist, mastering the most recent feature engineering methods is a must-have skill. In this article, we review feature engineering methods used by Kaggle winners that can be implemented quickly and easily.
1. PyCaret
PyCaret is a simple, easy-to-use sequential pipeline with well-integrated preprocessing functions, including one-step feature engineering:
#import libraries
!pip install pycaret
import pandas as pd
from pycaret.regression import *

#open the dataset
df = pd.read_csv('dog_data.csv')
df

#define feature engineering pipeline:
from pycaret.regression import *
exp = setup(data = df, target = 'breed', feature_interaction = True)

All the preprocessing steps are applied within setup(), which offers more than 20 feature engineering methods to prepare your dataset for machine learning, including polynomial, trigonometric, and arithmetic operations. The best features are then selected automatically based on their correlation with the target.
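
To make these feature families concrete, here is a minimal hand-written sketch in pandas and NumPy of a few polynomial, trigonometric, and arithmetic features. It is not PyCaret's internal implementation: the toy columns `weight` and `height`, the helper `add_basic_features`, and the generated column names are all hypothetical and only serve as an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df_toy = pd.DataFrame({
    "weight": [4.0, 12.0, 30.0],
    "height": [0.25, 0.5, 0.75],
})

def add_basic_features(data, cols):
    """Return a copy of `data` with simple engineered features added."""
    out = data.copy()
    for col in cols:
        # Polynomial feature: the column raised to the power 2.
        out[f"{col}_squared"] = out[col] ** 2
        # Trigonometric feature: sine of the column.
        out[f"sin_{col}"] = np.sin(out[col])
    # Arithmetic interactions between every pair of columns.
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            out[f"{a}_multiply_{b}"] = out[a] * out[b]
            out[f"{a}_divide_{b}"] = out[a] / out[b]
    return out

features = add_basic_features(df_toy, ["weight", "height"])
print(features.columns.tolist())
```

Two input columns already produce six new ones here, which is why an automatic selection step is useful once many operations are combined.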

New features from the PyCaret arithmetic engineering pipeline (Image by Author)
model = setup(df, target = 'breed', polynomial_features = True)
model[0]
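
The idea of keeping the features most correlated with the target can be sketched by hand as follows. This is a simplified stand-in that ranks features by absolute Pearson correlation, not a description of how PyCaret performs its selection; the helper `top_correlated_features` and the toy columns are hypothetical.

```python
import pandas as pd

def top_correlated_features(data, target, k=2):
    """Return the k feature names most correlated (in absolute value) with `target`."""
    # Pearson correlation of every numeric column with the target column.
    corr = data.corr()[target].drop(target).abs()
    return corr.sort_values(ascending=False).head(k).index.tolist()

# Hypothetical toy data: `x` and `x_squared` track the target, `noise` does not.
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "x_squared": [1, 4, 9, 16, 25],
    "noise": [1, -1, 0, -1, 1],
    "y": [2, 4, 6, 8, 10],
})

selected = top_correlated_features(toy, target="y", k=2)
print(selected)
```

Correlation only captures linear relationships, so a ranking like this is a quick first filter rather than a complete selection method.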

For more details about PyCaret's preprocessing abilities, see the PyCaret documentation.

2. Azure studio
This free tool by Microsoft (no credit card is needed to register) takes a modular approach: features are created automatically, quickly, and easily through a personalized data engineering pipeline with a complete set of data import options.

The example below uses this method to engineer new features from a financial dataset with 139067 rows and 56 columns.

For example, from the numeric feature ‘paymentInstrumentAgeInAccount’, the framework can create more than 300 new features in one step, in less than one minute:

The complete list includes more than 300 operations and functions:

A more complex pipeline can be built from a list of preconfigured options combined with R, SQL, and Python scripts, as you can see below:

For more details about Azure's preprocessing abilities, see the Azure documentation.
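
As a rough illustration of the Python script option mentioned above, the sketch below derives a few math features from the ‘paymentInstrumentAgeInAccount’ column. It assumes the studio's Execute Python Script module, whose entry point is conventionally a function named `azureml_main` that receives up to two pandas DataFrames and returns a sequence of DataFrames; check this convention against your version of the tool. The three operations and the new column names are hypothetical choices, not the studio's built-in list.

```python
import numpy as np
import pandas as pd

# Assumed entry-point convention of the Execute Python Script module.
def azureml_main(dataframe1=None, dataframe2=None):
    col = "paymentInstrumentAgeInAccount"
    out = dataframe1.copy()
    # A few unary math operations applied to one numeric column.
    out[f"log1p_{col}"] = np.log1p(out[col])
    out[f"sqrt_{col}"] = np.sqrt(out[col])
    out[f"{col}_squared"] = out[col] ** 2
    # The module expects a sequence of DataFrames, hence the one-element tuple.
    return out,

# Local check with hypothetical values, outside the studio.
sample = pd.DataFrame({"paymentInstrumentAgeInAccount": [0.0, 4.0, 9.0]})
(result,) = azureml_main(sample)
print(result)
```

Running the function locally on a small sample first makes it easier to debug before wiring the script into a larger pipeline.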
Sum up
This brief overview is a reminder of the importance of using several feature engineering methods in data science. This post aims to cover fast and simple feature engineering methods and to share useful documentation.
Conclusion
If you have some spare time, I'd recommend reading these:
4 Tips for Advanced Feature Engineering and Preprocessing
Exploratory Data Analysis, Feature Engineering and Modelling using Supermarket Sales Data. Part 1.
I hope you enjoy it, keep exploring!
