Hands-on Tutorials

An Overview of Data Preprocessing: Features Enrichment, Automatic Feature Selection

Useful feature engineering methods with Python implementations in one view

Ibrahim Kovan
Towards Data Science
9 min read · Aug 2, 2021

The dataset should be made suitable for the machine learning model so that the algorithm's predictions yield more successful results. Looking at a dataset, it is often apparent that some features matter more than others, that is, they have more impact on the output. For example, better results may be obtained by replacing a feature with its logarithmic values, and other mathematical operations such as the square root or exponential can also improve results. The key is to choose the data preprocessing method that suits the model and the project. This article presents different ways of looking at a dataset to make it easier for algorithms to learn from it. All techniques are illustrated with Python applications.
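As a minimal sketch of the idea above, the snippet below applies logarithmic, square-root, and exponential transforms to a small, hypothetical skewed feature (the values are illustrative and not from the article). `np.log1p` is used for the log transform because it computes log(1 + x) and so handles zeros safely.

```python
import numpy as np

# Hypothetical right-skewed feature (illustrative values only)
feature = np.array([1.0, 2.0, 5.0, 10.0, 100.0, 1000.0])

# Log transform: compresses large values, often reducing skew
log_feature = np.log1p(feature)

# Square-root transform: a milder compression than log
sqrt_feature = np.sqrt(feature)

# Exponential transform: here scaled to [0, 1] first to avoid overflow
exp_feature = np.exp(feature / feature.max())

print(log_feature.round(3))
```

Which transform helps depends on the feature's distribution and the model; the point is simply that such replacements are cheap to try and can make a feature easier for the algorithm to learn.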

Table of Contents (TOC)
1. Binning
2. Polynomial & Interaction Features
3. Non-Linear Transform
3.1. Log Transform
3.2. Square Root Transform
3.3. Exponential Transform
3.4. Box-cox Transform
3.5. Reciprocal Transform
4. Automatic Feature Selection
4.1. Analysis of Variance (ANOVA)
4.2. Model-Based Feature Selection
4.3. Iterative Feature Selection
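As a taste of the automatic feature selection methods listed above, the sketch below uses scikit-learn's `SelectKBest` with the ANOVA F-test (`f_classif`) to keep the ten highest-scoring features of the breast cancer dataset. The dataset and `k=10` are illustrative choices, not necessarily those used in the article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

# Example dataset: 569 samples, 30 numeric features
X, y = load_breast_cancer(return_X_y=True)

# Score each feature with the ANOVA F-test and keep the top 10
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # (569, 30) -> (569, 10)
```

`selector.get_support()` returns a boolean mask of the retained columns, which is useful for mapping the selection back to feature names.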

