Whether you are interested in data science, work as a professional data scientist, or are just looking for applications you can use Python for, PyCon is a great place to learn from knowledgeable and accomplished speakers. Python is one of the best-known and most widely used programming languages today, with millions of developers and users worldwide.
Because Python is used internationally, many communities have formed to connect developers, build networks, and share experiences. One of the biggest yearly Python-related events is PyCon (Python Conference). PyCon is not just one conference; it’s a collection of conferences worldwide, with the main one held in the US.
PyCon often includes talks, tutorials, and workshops on different Python-related topics for all levels, from beginners to experts. So anytime I want to learn (or refresh my knowledge) about anything Python-related, I first look for related talks from previous PyCons before heading to Google to search for other tutorials and information.
In my experience, PyCon talks are often short (under 30 minutes), concise, and presented in a fun, simple way by amazing, accomplished people. Despite their brevity, they often contain all the details you need to gain a solid understanding of a topic.
This year’s (2021) PyCon US is already done, and like many other conferences this year, it was fully virtual. The conference included a ton of eye-opening talks, simple-to-follow workshops, and useful tutorials. All videos of PyCon are now available on YouTube. Although I would recommend you go through all the materials from PyCon, in this article I will focus on the data science-related talks that I attended and learned a lot from.
№1: From NumPy to PyTorch, A Story of API Compatibility, by Randall Hunt and Mike Ruberry
Let’s start with a talk by Randall Hunt and Mike Ruberry, two software engineers with experience at large tech companies like Facebook, AWS, and SpaceX. Hunt and Ruberry discuss how NumPy-style code behaves in PyTorch, whether PyTorch is actually NumPy-compatible, and how you can deal with compatibility issues for smoother execution.
№2: Patterns of ML Models in Production, by Simon Mo
Simon Mo, a software engineer at Anyscale, talks about the hassle of deploying machine learning models to production. As a data scientist, you should be spending most of your time building and training machine learning models, not deploying them. Mo walks you through the process of deploying your model with less hassle. He also covers model deployment with Ray Serve, a scalable model-serving framework.
№3: Testing stochastic AI models with Hypothesis, by Marina Shvartz
Marina Shvartz is an AI software engineer at Aidoc Medical. Shvartz addresses the struggle of testing AI models efficiently when we can’t manually set their exact edge cases. She talks about property-based testing, the Hypothesis library, and how it can help data scientists generate edge cases that lead to well-tested, well-developed AI models.
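To make the idea of property-based testing concrete, here is a minimal sketch that uses only the standard library. It is not the Hypothesis API and not code from the talk: the `softmax` function and the `check_softmax_properties` helper are hypothetical names chosen for illustration. Instead of asserting one exact output, the test generates many random inputs and checks properties that must hold for all of them. Hypothesis automates this kind of input generation and also shrinks failing cases to small examples.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def check_softmax_properties(trials=500, seed=0):
    """Check properties that must hold for any input, not one exact output."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(trials):
        size = rng.randint(1, 20)
        scores = [rng.uniform(-1000.0, 1000.0) for _ in range(size)]
        probs = softmax(scores)

        # Property 1: one probability per input score.
        assert len(probs) == len(scores)
        # Property 2: every output is a valid probability.
        assert all(0.0 <= p <= 1.0 for p in probs)
        # Property 3: the probabilities sum to 1 (within float tolerance).
        assert math.isclose(sum(probs), 1.0, rel_tol=1e-9)
        # Property 4: the highest score gets the highest probability.
        assert probs[scores.index(max(scores))] == max(probs)
    return trials


if __name__ == "__main__":
    print(f"{check_softmax_properties()} random cases passed")
```

The same pattern carries over to a stochastic model: you rarely know the exact prediction in advance, but you usually know constraints that every prediction must satisfy.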
№4: Data Processing on Ray, by SangBin Cho
SangBin Cho, another software engineer from Anyscale, builds on Simon Mo’s talk about deploying machine learning models. In Cho’s talk, you will learn more about how Ray deals with the challenge of scaling data science applications written in Python. He talks about the challenges the team faced in developing Ray and how they overcame them to support the processing of large-scale datasets.
№5: Event-driven applications: Apache Kafka and Python, by Francesco Tisiot
Apache Kafka is one of the best-known data-streaming platforms. Francesco Tisiot, a developer advocate at Aiven, helps you explore what Apache Kafka is capable of and the problems it can solve. Tisiot offers some tips and tricks on using Kafka with Python libraries and then introduces Kafka Connect, a tool for streaming data between Kafka and other systems that can take your application to the next level.
№6: Optimizing Data Retrieval with Python Celery, by Jenna Conn and Hannah Cline
We live in an expanding world of data; the amount of data our applications need to process and handle is growing rapidly. Jenna Conn and Hannah Cline, both software engineers, discuss different ways you can use the Python library Celery to optimize your data retrieval process by creating queues for your application, resulting in a better overall experience.
№7: Large Scale Data Validation, by Kevin Kho
Kevin Kho, an Open Source Community Engineer at Prefect and former data scientist, discusses the process of validating data using Spark and Dask, with a focus on large-scale data. Kho looks at the challenge of validating different partitions of the data and how this challenge can be overcome with the right mix of tools. He explains how to validate data with Pandera and how the same validation can be done more efficiently at scale using Spark, Dask, and Fugue.
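The idea of validating different partitions with different rules can be sketched in plain Python. This is only an illustration under stated assumptions, not code from the talk and not the API of any of the libraries mentioned: the data, the `rules` table, and the helpers `partition_by` and `validate_partition` are all hypothetical. In a real pipeline, a validation library would express the checks and a distributed engine would run them on each partition in parallel.

```python
from collections import defaultdict


def partition_by(rows, key):
    """Group rows (dicts) into partitions keyed by the value of `key`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def validate_partition(rows, column, low, high):
    """Return the rows whose `column` value falls outside [low, high]."""
    return [row for row in rows if not (low <= row[column] <= high)]


# Hypothetical data: each region has its own plausible price range,
# so a single global check would be either too strict or too loose.
rows = [
    {"region": "US", "price": 12.0},
    {"region": "US", "price": 250.0},
    {"region": "EU", "price": 9.5},
    {"region": "EU", "price": -1.0},
]
rules = {"US": (0.0, 100.0), "EU": (0.0, 50.0)}

# Apply the rule that belongs to each partition and collect failures.
failures = {}
for region, part in partition_by(rows, "region").items():
    low, high = rules[region]
    bad = validate_partition(part, "price", low, high)
    if bad:
        failures[region] = bad

print(failures)
```

Because each partition is checked independently, the loop body is the part that a framework could distribute across workers.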
Takeaways
One of the most useful and reliable sources for information about anything Python-related is PyCon talks, tutorials, and workshops. PyCon is a series of Python conferences organized yearly by volunteers from the Python community and held worldwide.
The first and main PyCon is the US version, which is often held in the first half of the year. PyCon US 2021 was held in May and contained hundreds of talks and tutorials focusing on different aspects of Python and targeting people at all experience levels. Even if you’re new to Python and the programming world, I can guarantee that you will find a PyCon talk that matches your level and provides you with useful information.
Python is a very versatile language that can be used in many applications, but one of its most common application domains is data science. In this article, I suggested seven amazing PyCon US 2021 talks that are data science-related and filled with useful information for data scientists of all levels.
In addition to the talks I suggested in this article, I would recommend you go through the entire list of PyCon talks; you may find a talk that interests you that I didn’t mention in this list.