Weekly Selection — Mar 9, 2018

TDS Editors
Towards Data Science
3 min read · Mar 9, 2018

Introduction to Markov Chains

by Devin Soni — 5 min read

Markov chains are a fairly common, and relatively simple, way to statistically model random processes. They have been used in many different domains, ranging from text generation to financial modeling.
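As a rough illustration of the idea in this teaser, here is a minimal sketch of a Markov chain in Python. The two weather states and their transition probabilities are hypothetical and chosen only for illustration; they do not come from the linked article. The key property shown is that each next state is sampled using only the current state.

```python
import random

# Hypothetical two-state weather model; the probabilities are illustrative only.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def simulate(start, steps, rng):
    """Walk the chain: each next state depends only on the current one."""
    state = start
    path = [state]
    for _ in range(steps):
        options = TRANSITIONS[state]
        # Sample the next state according to the current state's row.
        state = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(state)
    return path


# A seeded generator keeps the run reproducible.
path = simulate("sunny", 10, random.Random(0))
print(path)
```

The same structure carries over to text generation if the states are words and the transition table is estimated from word-pair counts.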

Selecting the best Machine Learning algorithm for your regression problem

by George Seif — 5 min read

When approaching any type of Machine Learning (ML) problem there are many different algorithms to choose from. In machine learning, there’s something called the “No Free Lunch” theorem which basically states that no one ML algorithm is best for all problems.

Beyond Accuracy: Precision and Recall

by William Koehrsen — 11 min read

Would you believe someone who claimed to create a model entirely in their head to identify terrorists trying to board flights with greater than 99% accuracy? Well, here is the model: simply label every single person flying from a US airport as not a terrorist. Given the 800 million average passengers on US flights per year and the 19 (confirmed) terrorists who boarded US flights from 2000–2017, this model achieves an astounding accuracy of 99.9999999%!
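The arithmetic behind that figure can be checked with a short sketch. It uses the numbers quoted above and assumes the period 2000–2017 spans 18 years with a constant 800 million passengers per year. The variable names are illustrative, and recall (the share of actual positives the model finds) is computed alongside accuracy to show what the headline number hides.

```python
# Figures quoted in the teaser; the 18-year span (2000-2017 inclusive)
# and the constant yearly passenger count are assumptions.
passengers_per_year = 800_000_000
years = 18
terrorists = 19

total = passengers_per_year * years

# Model: label every passenger "not a terrorist".
true_positives = 0
false_negatives = terrorists
true_negatives = total - terrorists

# Accuracy counts all correct labels; recall counts only detected positives.
accuracy = (true_positives + true_negatives) / total
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy = {accuracy:.7%}")
print(f"recall   = {recall:.0%}")
```

Accuracy is dominated by the overwhelmingly large negative class, while recall is zero because the model never flags anyone.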

Black-Box Attacks on Perceptual Image Hashes with GANs

by Nick Locascio — 5 min read

TL;DR: This post demonstrates that GANs are capable of breaking image hash algorithms in two key ways: (1) Reversal Attack: synthesizing the original image from the hash; (2) Poisoning Attack: synthesizing hash collisions for arbitrary natural image distributions.

Data Analytics with Python by Web scraping: Illustration with CIA World Factbook

by Tirthajyoti Sarkar — 9 min read

In a data science project, the most time-consuming and messy part is almost always the data gathering and cleaning. Everyone likes to build a cool deep neural network (or XGBoost) model or two and show off one’s skills with cool 3D interactive plots. But the models need raw data to start with, and that data doesn’t come easy or clean.

Machine Learning Workflow on Diabetes Data : Part 02

by Lahiru Liyanapathirana — 7 min read

In my last article of this series, we discussed the machine learning workflow on the diabetes data set, covering topics such as data exploration, data cleaning, feature engineering basics and the model selection process. You can find the previous article below.

Why take the log of a continuous target variable?

by Radek Osmulski — 5 min read

Data science is a conspiracy.

“Hi, my name is Bob and I’ll be your instructor. I’ll teach you how to drive a car. Open your books on page 147 and let’s learn about different types of exhaust manifolds. Here is the formula for the essential Boyle’s Law…”

Machine Learning From Scratch: Part 3

by Sebastian Kwiatkowski — 12 min read

Part 3 introduces arrays. This family of higher-order collections allows us to describe images and text documents in a format that can be processed by machine learning algorithms.

Building a vibrant data science and machine learning community. Share your insights and projects with our global audience: bit.ly/write-for-tds