by Devin Soni – 5 min read
Markov chains are a fairly common, and relatively simple, way to statistically model random processes. They have been used in many different domains, ranging from text generation to financial modeling.
Selecting the best Machine Learning algorithm for your regression problem
by George Seif – 5 min read
When approaching any type of Machine Learning (ML) problem, there are many different algorithms to choose from. In machine learning, there’s something called the "No Free Lunch" theorem, which basically states that no one ML algorithm is best for all problems.
Beyond Accuracy: Precision and Recall
by William Koehrsen – 11 min read
Would you believe someone who claimed to create a model entirely in their head to identify terrorists trying to board flights with greater than 99% accuracy? Well, here is the model: simply label every single person flying from a US airport as not a terrorist. Given the 800 million average passengers on US flights per year and the 19 (confirmed) terrorists who boarded US flights from 2000–2017, this model achieves an astounding accuracy of 99.9999999%!
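The arithmetic behind that teaser can be checked with a rough sketch. It assumes the figures quoted above (800 million passengers per year, 19 terrorists) and counts 2000 through 2017 as 18 years, which is an assumption about how the span is measured. The variable names are illustrative only.

```python
# Figures quoted in the teaser; the 18-year span is an assumption
# (2000 through 2017, inclusive).
passengers_per_year = 800_000_000
years = 18
terrorists = 19

total_passengers = passengers_per_year * years

# The "model" labels every passenger as not a terrorist, so:
true_positives = 0                       # no terrorist is ever flagged
false_negatives = terrorists             # every terrorist is missed
true_negatives = total_passengers - terrorists
false_positives = 0

accuracy = (true_positives + true_negatives) / total_passengers
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy: {accuracy:.10f}")
print(f"recall:   {recall:.1f}")
```

Accuracy comes out extremely close to 1 while recall is exactly 0, which is the gap the article title points at. Precision is undefined here because the model never predicts the positive class.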
Black-Box Attacks on Perceptual Image Hashes with GANs
by Nick Locascio – 5 min read
TL;DR: This post demonstrates that GANs are capable of breaking image hash algorithms in two key ways: (1) Reversal Attack: synthesizing the original image from the hash; (2) Poisoning Attack: synthesizing hash collisions for arbitrary natural image distributions.
Data Analytics with Python by Web scraping: Illustration with CIA World Factbook
by Tirthajyoti Sarkar – 9 min read
In a data science project, the most time-consuming and messy part is almost always the data gathering and cleaning. Everyone likes to build a cool deep neural network (or XGBoost) model or two and show off their skills with cool 3D interactive plots. But the models need raw data to start with, and it doesn’t come easy or clean.
Machine Learning Workflow on Diabetes Data : Part 02
by Lahiru Liyanapathirana – 7 min read
In my last article of this series, we discussed the machine learning workflow on the diabetes data set, covering topics such as data exploration, data cleaning, feature engineering basics, and the model selection process. You can find the previous article below.
Why take the log of a continuous target variable?
by Radek Osmulski – 5 min read
Data science is a conspiracy.
"Hi, my name is Bob and I’ll be your instructor. I’ll teach you how to drive a car. Open your books on page 147 and let’s learn about different types of exhaust manifolds. Here is the formula for the essential Boyle’s Law…"
Machine Learning From Scratch: Part 3
by Sebastian Kwiatkowski – 12 min read
Part 3 introduces arrays. This family of higher-order collections allows us to describe images and text documents in a format that can be processed by machine learning algorithms.