Notes From Industry

How to Connect a Local or Remote Machine to a Databricks Cluster

The intersection of Databricks, Python, and Docker

Pedram Ataee, PhD
Towards Data Science
5 min read · Apr 11, 2021

Photo by Clay Banks on Unsplash

When you start working with Databricks, you will reach a point where you want to write code outside of Databricks while still tapping into its computation power, a.k.a. a Databricks cluster. Why? Mainly because one of the main features of Databricks is its Spark job management, which can make your life easier. Thanks to its Spark engines, you can submit a series of Spark jobs against a large-scale dataset and get your results back in a matter of seconds. In this article, I describe the first step: how to configure your local or remote machine to connect to a Databricks cluster.
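Before going further, it may help to see the general shape of such a setup. The standard tool for connecting an external machine to a Databricks cluster is Databricks Connect, which replaces your local PySpark installation and routes Spark jobs to the remote cluster. This is a minimal sketch, not the article's exact procedure; the client version shown is a placeholder that must match your cluster's Databricks Runtime, and the configure step will prompt for your own workspace URL, access token, and cluster ID:

```shell
# Databricks Connect conflicts with a local PySpark install, so remove it first
pip uninstall -y pyspark

# Install the client version matching your cluster's Databricks Runtime
# (e.g. 7.3.* for DBR 7.3 LTS); the version here is only an example
pip install -U "databricks-connect==7.3.*"

# Interactive prompt for workspace URL, token, cluster ID, org ID, and port
databricks-connect configure

# Verify the setup by running a small Spark job on the remote cluster
databricks-connect test
```

After `databricks-connect test` passes, any local `SparkSession` you create executes its jobs on the remote cluster rather than on your machine.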

— What is Databricks?

Databricks is an abstraction layer sitting on top of cloud infrastructures such as AWS and Azure that lets you easily manage computation power, data storage, job scheduling, and model management. It provides a development environment to obtain preliminary…


🤖 AI Architect 📚 Author of “Artificial Intelligence: Unorthodox Lessons” ❤️ Learn more about me: dancewithdata.com