
January Edition: Image and Speech Recognition

10 Must-Read Articles

Humans have come a long way thanks to evolution. An insatiable desire to explore and experiment set humans apart from other living beings. To achieve big things, you first need to take baby steps, one at a time. The complex human anatomy evolved over millions of years and will continue to evolve into a state we cannot even imagine. That desire to explore and experiment grew out of humans’ ability to perceive their surroundings through the five basic senses: "sight", "hearing", "smell", "taste" and "touch".

If we fast-forward evolution to today, we can be extremely proud of the journey we have travelled so far. Today, we are discussing colonizing Mars and going beyond the Milky Way. There is no doubt that mankind has achieved great success in knowledge, science and technology. Technology has become a necessity in day-to-day life and has significantly improved the quality of life. Even though we rely on technology and have reached a very advanced state, the driving force behind these tools is still human bravery and the human brain. Scientists in the 1950s and 1960s began to research and experiment with the idea of "how to teach machines to learn by themselves like human beings". Just as evolution took years and years, this idea also took decades to take its first few baby steps.

Thanks to the hard work of the visionary thinkers who pioneered modern artificial intelligence (deep learning), we see a promising journey towards the next era of human evolution. "Data is the new oil". With the availability of massive amounts of data, the power of high-performance computing and distributed systems (cloud computing), and access to free and open-source technology, today we see great innovations in artificial intelligence. The ultimate goal of AI is to empower humans to do tremendous work that we never imagined before. Today we hear news of machines beating humans at certain tasks (DOTA, medical imaging, etc.), but that is not the end of it; it is merely the beginning of a long journey.

One of the challenges that today’s research community is trying to address is giving machines the power to recognize, generate and infer decisions from visuals and sounds. Over the past few years we have seen a glut of research on many aspects of deep learning, particularly in computer vision, natural language processing/generation, geometric deep learning and GANs, where researchers try to bring vision and speech to machines. Based on our reader statistics and feedback, we would like to present a few selected articles from our platform under the theme "Image and Speech Recognition".

Chamin Nalinda, TDS Editorial Associate.


Image Recognition/Generation

Tutorial: Build a lane detector

By Chuan En Lin 林傳恩 – 10 min read

Waymo’s self-driving taxi service just hit the road this month, but how do autonomous vehicles even work? The lines drawn on roads show human drivers where the lanes are, act as a guiding reference for which direction to steer the vehicle, and set a convention for how vehicle agents interact harmoniously on the road.


The 10 coolest papers from CVPR 2018

By George Seif – 8 min read

The 2018 Conference on Computer Vision and Pattern Recognition (CVPR) took place last week in Salt Lake City, USA. It’s the world’s top conference in the field of computer vision. This year, CVPR received 3,300 main conference paper submissions and accepted 979. Over 6,500 people attended the conference, and boy was it epic!


What’s new in YOLO v3?

By Ayoosh Kathuria – 9 min read

You only look once, or YOLO, is one of the faster object detection algorithms out there. Though it is no longer the most accurate object detection algorithm, it is a very good choice when you need real-time detection without losing too much accuracy.


Only Numpy: Implementing GAN and Adam Optimizer using Numpy with Interactive Code.

By Jae Duk Seo – 6 min read

So today I was inspired by this blog post, "Generative Adversarial Nets in TensorFlow", and I wanted to implement GAN myself using Numpy. Here is the original GAN paper by @goodfellow_ian. Below is a GIF of all generated images from Simple GAN.


Real-time and video processing object detection using Tensorflow, OpenCV and Docker.

By Léo Beaucourt – 7 min read

In this article, I will present how I managed to use the Tensorflow Object-detection API in a Docker container to perform both real-time (webcam) and video post-processing. I used OpenCV with the python3 multiprocessing and multi-threading libraries.


Audio/Speech Recognition

Recurrent Neural Networks: The Powerhouse of Language Modeling

By James Le – 12 min read

During the spring semester of my junior year in college, I had the opportunity to study abroad in Copenhagen, Denmark. I had never been to Europe before that, so I was incredibly excited to immerse myself in a new culture, meet new people, travel to new places, and, most important, encounter a new language.


Automatic Speech Recognition Data Collection with Youtube V3 API, Mask-RCNN and Google Vision API

By 黃功詳 Steeve Huang – 8 min read

With the rapid development of Machine Learning, especially Deep Learning, Speech Recognition has improved significantly. Such technology relies on large amounts of high-quality data.


Audio Classification using FastAI and On-the-Fly Frequency Transforms

By John Hartquist – 9 min read

While deep learning models are able to help tackle many different types of problems, image classification is the most prevalent example for courses and frameworks, often acting as the "hello, world" introduction.


Human-Like Machine Hearing With AI

By Daniel Rothmann – 9 min read

Significant breakthroughs in AI technology have been achieved through modeling human systems. While artificial neural networks (NNs) are mathematical models which are only loosely coupled with the way actual human neurons function, their application in solving complex and ambiguous real-world problems has been profound.


Kaggle Tensorflow Speech Recognition Challenge

By Chris Dinant – 12 min read

From November 2017 to January 2018, the Google Brain team hosted a speech recognition challenge on Kaggle. The goal of this challenge was to write a program that can correctly identify one of 10 words being spoken in a one-second-long audio file.


We also thank all the great new writers who joined us recently: Guilherme Lichand, Aimun Khan, Pol Ferrando, Pitchaya Thipkham, Dr. Salih Tutun, Ian Macomber, Douglas Coimbra de Andrade, Rishabh garg, Thijs Bressers, Hlynur Davíð Hlynsson, Jade Abbott, Cheryl Liao, Frankinetics, Gautham Nekkanti, Massimo Belloni, Tomer Dicturel, Jonathan Oheix, Vincent Vanhoucke, Nicole Kwan, Colin Sinclair, Ashrith, Michael Hunger, John Braunlin, and many others. We invite you to take a look at their profiles and check out their work.

