Weekly Selection – Apr 26, 2019

Ensemble methods: bagging, boosting and stacking

By Joseph Rocca and Baptiste Rocca – 20 min read

"Unity is strength". This old saying expresses pretty well the underlying idea that rules the very powerful "ensemble methods" in machine learning.


Democratising Machine learning with H2O

By Parul Pandey – 9 min read

Overview of H2O: the open-source, distributed in-memory machine learning platform.


5 Advanced Features of Python and How to Use Them

By George Seif – 4 min read

Python is a beautiful language. Simple to use yet powerfully expressive. But are you using everything that it has to offer?


Detecting Malaria with Deep Learning

By Dipanjan (DJ) Sarkar – 16 min read

Welcome to the AI for Social Good Series, where we will focus on different aspects of how Artificial Intelligence (AI), coupled with popular open-source tools, technologies and frameworks, is being used for the development and betterment of our society.


Linear programming and discrete optimization with Python using PuLP

By Tirthajyoti Sarkar – 11 min read

Linear and integer programming are key techniques for discrete optimization problems and they pop up pretty much everywhere in modern business and technology sectors.


A Radiologist’s Exploration of the Stanford ML Group’s MRNet data

By Walter Wiggins – 8 min read

This post reviews the recently released Stanford MRNet knee MRI data set and competition. As I am a senior radiology resident, I will focus on exploring the data through basic domain knowledge – addressing aspects of the data distribution that non-physicians may find perplexing.


Top 10 Coding Mistakes Made by Data Scientists

By Norm Niemer – 5 min read

A data scientist is a "person who is better at statistics than any software engineer and better at software engineering than any statistician". Many data scientists have a statistics background and little experience with software engineering.


Machine learning for anomaly detection and condition monitoring

By Vegard Flovik – 10 min read

The current article focuses mostly on the technical aspects, and includes all the code needed to set up anomaly detection models based on multivariate statistical analysis and autoencoder neural networks.


Simplifying Deep Learning with Fast.ai

By Andrei Lyskov – 7 min read

Deep learning is a field notorious for gatekeeping. If you try to find answers online on how to break into the field, you’ll likely find yourself overwhelmed with a long list of requirements.


Making the Mueller Report Searchable with OCR and Elasticsearch

By Kyle Gallatin – 6 min read

April 18th marked the full release of the Mueller Report – a document outlining the investigation of potential Russian interference in the 2016 presidential election. Like most government documents, it is long (448 pages) and would be painfully tedious to read.

