A Deep Dive into The Internet of Things

Eric van Rees
Towards Data Science
Aug 31, 2017


On reading “Analytics for the Internet of Things (IoT)” by Andrew Minteer

This week I've been reading a recent Packt Publishing book called "Analytics for the Internet of Things (IoT)". It is a great read, and I can definitely recommend it to anyone who wants to know more about this fascinating field.

There are many books on IoT, but this one stood out for me. The blurb, "Break through the hype and learn how to extract actionable intelligence from the flood of IoT data", especially appealed to me: IoT is a term that pops up everywhere, even in places where you'd least expect it. Also, the book was published only a month ago (July 2017), so the content would be up to date, which is an advantage in a field that is advancing at a rapid pace.

I was interested not only in IoT applications, but also in the technology behind them, the combination of hardware and software, and, finally, the developer's perspective. The author of the book is Andrew Minteer, currently senior director of data science and research at a leading global retail company. He is an expert on IoT, with a background in statistics, software development, database design, and cloud architecture, and he has led analytics teams for over 10 years. Judging from this book, he is also a great writer.

"Analytics for the Internet of Things (IoT)" targets various reader groups. The preface of every Packt Publishing book has a section called "Who this book is for" that describes the intended readers. Foremost, this book is meant for "professionals that are either currently struggling with how to create value with IoT data or are thinking about building this capability in the near future." These could be "developers, analytics practitioners, data scientists, and general IoT enthusiasts." The theme of value creation, weighing costs against benefits, is indeed a prominent one, but equally important are the different components of the IoT data flow: the devices and sensors, the network protocols, and the data collection technology. Finally, the reader gets an overview of data storage and processing options and strategies, all with the underlying goal of maximizing business value from large IoT datasets. There are code examples in Python and R, as well as examples in Tableau.

What I enjoyed about this book is that the author starts from zero, explaining what people mean by "IoT" and "IoT data" and why IoT data is different from other data. The author uses everyday language to make his points. For example, his rule number one in IoT analytics is: "never trust data you don't know. Treat it like a stranger offering you candy." Another great idea is to open every chapter with a fictional scenario in which a data scientist, in conversation with his CEO, is handed a new task that applies IoT data analytics to a new scenario and introduces new challenges. These challenges become the topics of each chapter and give the book a clear logic. The scenarios are also very funny to read and give the text some breathing room, as most of the chapters are quite complex.

Things get complex quite early, for example in Chapter 2, which covers IoT devices and network protocols. The next chapter covers IoT analytics for the cloud, describing Amazon Web Services (AWS), from cloud infrastructure market leader Amazon. The list of AWS services is impressively long, and the author covers the services most relevant to this book, such as AWS Lambda and the AWS IoT Platform, in detail in later chapters. As the author explains, cloud infrastructure is the best option when lots of data comes in from devices everywhere; the cloud is the number one choice for handling and analyzing IoT data.

The next step is to define strategies for collecting IoT data in order to enable analytics. Here we enter big data territory, so expect a lot of information on IoT-specific services from AWS and Microsoft Azure, and on big data storage technology such as the Hadoop ecosystem. I found this chapter one of the strongest in the book, as it explains many components of multiple ecosystems in detail while at the same time demystifying the complexity of big data analytics. Well done.

The second half of the book is more practical: you learn how to use Tableau for exploring and visualizing data, how to use R to augment visualization tools, and you get to know a few industry-specific examples. External datasets can add value to your existing ones and may be easy to access, as many free datasets are available online. Tableau is used again for visualizing the results of IoT data analytics, along with some great advice on preparing dashboards.

A separate chapter on applying geospatial analytics to IoT data is a bit of a disappointment, as it is rather short and therefore incomplete. It does not mention more recent IoT and big data analytics tools, such as Esri's ArcGIS GeoAnalytics and ArcGIS GeoEvent Server. Although geospatial analytics is not an important part of this book, the topic could easily fill a book of its own. On the other hand, the author does explain well the different file formats used in GIS and how several open- and closed-source databases handle spatial data.

Machine learning concepts are introduced in a chapter on data science for IoT analytics, together with deep learning and forecasting. Special attention is given to using these methods with IoT data. The core concepts of each are reviewed along with examples in R. Again, if you're new to these topics, this book is a great resource that explains them along with some example code. Reflecting the author's expertise in these fields, this is a complex and detailed chapter and, along with Chapter 3 on IoT analytics for the cloud, one of the core chapters of the book. The last two chapters focus on economics (budgeting) and on strategies for organizing your data for analytics. The two are interrelated: more data means higher cloud infrastructure costs, and data lakes can turn into data swamps without a clear maintenance strategy.

One last word about the use of Python and R: both languages are used throughout the book, but the author acknowledges that Python is the better option for large datasets, which is often (if not always) the case in IoT data analytics. It is also true that R can sometimes be a handier option for data analytics than Python. Choosing R over Python, as the author does several times in this book, may also simply be a matter of personal preference. Another big insight I gained from this book is the role of Linux in big data computing clusters. As the author writes on page 130: "…you will need to know how to interact with the Linux OS for IoT analytics. It is unavoidable and not so bad once you get used to it. You will not need to be an expert, but you will need to know how to find your way around, run programs, and do some basic scripting." This book offers lots of advice like this. It's a great resource on everything to do with IoT analytics and is highly recommended.
