The Beginner’s Guide to Distributed Computing

7 Fundamental Concepts to Succeed With Distributed Computing in Python

Avril Aysha
Towards Data Science

Enter the Distributed Universe

More and more data scientists are venturing into the world of distributed computing to scale up their computations and process larger datasets faster. But starting your distributed computing journey can feel a bit like entering an alternate universe: overwhelming, intimidating and confusing.

Animated image of a bear in a space suit looking around on an empty planet
image via giphy.com

But here’s the good news: you don’t need to know everything about distributed computing to get started.

It’s a bit like going on a holiday to a country where you don’t speak the language. It’d be overkill to learn how to hold an entire conversation on the intricacies of the local political system before getting on your flight. But it’s probably smart to know enough to get around and ask for help when you need it.

This post explains the 7 foundational concepts you’ll need to get started with distributed computing. Mastering them early will save you hours of research and help you avoid expensive mistakes down the line. Each concept is demonstrated with Python code using Dask.

Let’s dive in.

1. Lazy Evaluation
