Author Spotlight

The More You Write, the Better You Are at Explaining Your Work

Dr. Varshita Sher: “My first advice for a complete beginner would be to just start anywhere”

TDS Editors
Towards Data Science
Apr 20, 2022


In the Author Spotlight series, TDS Editors chat with members of our community about their career path in data science, their writing, and their sources of inspiration. Today, we’re thrilled to share our conversation with Dr. Varshita Sher.

Dr. Sher is currently working as a data scientist at the Alan Turing Institute’s Applied Research Centre, using deep learning to solve problems in the NLP and computer vision domains. She has a Master’s degree in Computer Science from the University of Oxford and a Ph.D. in Learning Analytics from Simon Fraser University. Over the last eight years, her work has focused on the intersection of research and implementation of AI/ML algorithms across myriad sectors, including Edtech, Fintech, and Healthcare.

What inspired you to choose the career path you’re currently on?

I was quite interested and invested in mathematics and statistics from an early age: everything from solving Sudoku puzzles and magic squares to algebra, calculus, etc. It wasn’t until I started working on big data visualizations during my master’s thesis (in human-computer interaction and visual analytics) that I truly appreciated the power of data, and how useful visuals can be for bringing patterns and insights to light.

I now realize that data viz is just a subset of the wide data science spectrum, but that tip of the iceberg was enough to pique my interest in the field.

Following my master’s, I was offered the opportunity to work on an EU project that involved, among other things, building a sophisticated dashboard and conducting naturalistic user studies to assess its usability by the general public. It was then that I knew data was my true calling. I loved how creative the entire process was, from data mining to EDA (exploratory data analysis) to modelling. I have since moved on to core ML and, more recently, deep learning (DL), but what I love most is that there is no one-size-fits-all approach that works on all data types and all problem statements.

Along the way, was there any aspect of data science you found particularly difficult to tackle?

At the very beginning of my DS journey, I found it challenging to prioritize which concepts to learn. As you may know, the DS field is ever-evolving, and knowing where to start can be overwhelming. It wasn’t until I got my hands dirty with a few datasets that I realized there are recurring themes, so much so that I was able to sort concepts and techniques into “must-know,” “could-know,” and “should-know” buckets.

What kinds of projects do you prefer to focus on these days?

While data science is an incredible tool for solving complex problems, I find beauty in simplicity. So, for me, DSSG (Data Science for Social Good) projects are some of the most interesting ones to read about. For those interested, DataKind has done a fantastic job in this sector, and some of the projects the team has been working on are remarkable. One that comes to mind used satellite imagery to automatically identify villages in need of monetary aid. This works because, in these villages, families living under thatched roofs are generally poorer than those living under metal roofs, so roof type visible from above serves as a proxy for poverty.

A common theme across all these projects is that it is not about implementing the fanciest algorithm, but about finding insights through exploratory analysis, getting the job done, and seeing timely impact.

Are there any specific areas you see as particularly promising?

I am excited about more inclusive ML and DS techniques. For instance, state-of-the-art NLP models are mostly trained on English, so speakers of other languages and dialects have largely been unable to take advantage of GPT-like models. Prompt engineering is another field to watch, because the output of most GPT-like models is sensitive to the wording or phrasing of the prompt, and getting them to perform well sometimes requires trial and error.

With a very busy work life, why did you decide to start writing publicly about data-related topics?

I started writing mainly during the pandemic. Writing was a way of documenting everything I was learning for my future self. This holds true even to this day: I don’t think anyone reads my articles as much as I do!

I wanted my writing style to be lucid, without technical jargon, and easily comprehensible (to me at least), and that’s why I was able to start writing so effortlessly. As time went by, writing became a way for me to test my understanding of a particular topic—because unless you can explain something in an ELI5 (Explain Like I am 5) manner, do you really understand it yourself?

Does your work at the Turing Institute inform the kind of posts you write for a broader audience on TDS?

Actually, quite the opposite. I tend to write mostly about topics that are broadly applicable to people who don’t work in the same sector as me. This is quite useful, because transferable skills are super important in the field of data science. I should, however, mention that some of my more popular posts are based on the kinds of things I get stuck on during a typical day at work: importing files, git commits, setting up a virtual environment, etc.

How do you decide on the topics you write about? And what advice would you give to someone who wants to write about their work, but isn’t sure where to start?

I have a strict policy of quality over quantity. So, before I begin writing, I do a quick scan of the top Google search results on the potential topic. I begin writing only if nothing relevant comes up or if I feel I can explain the topic better. I also have some go-to blogs and vlogs that I have full faith in, such as MachineLearningMastery, Jay Alammar’s blog, Yannic Kilcher, Aleksa Gordic, etc., and I scout these before I begin writing. This makes sense to me because if someone has already explained everything in a way I could only have imagined, I don’t see the point in adding to the plethora of resources out there; I would rather point my readers to it.

Having said all that, my first advice for a complete beginner would be to just start anywhere — irrespective of how much has already been written about that topic, irrespective of the length of your article, irrespective of how well you think it will be received by your audience. Just pick a topic you feel confident explaining to your parents, siblings, or friends and write about it.

The topic doesn’t even have to be a full-fledged algorithm (for me, it was explaining how a p-value should be interpreted); it could be niche, such as a common runtime error (you can find plenty of them in GitHub issues) and the fix for it. If picking a topic seems tough, pick a data modality (image, audio, video, etc.) that you like working with and gather some common analysis techniques that an outsider ought to know about it. For instance, data augmentation techniques in computer vision.

Soon you’ll realize: the more you write, the better you are at explaining your work. That’s something you should definitely mention as an example of good communication skills during an interview.

To end on a future-leaning note, what are some of the changes you’d like to see in data science over the next couple of years?

I feel that in the coming years, there’s hope for FDA-level assessment protocols for ML models used in high-stakes sectors such as healthcare and autonomous vehicles. Just last month, the U.S. government penalized an app vendor for building an algorithm on data collected in violation of data-collection laws.

Similar measures will prove quite useful for embedding trust in these systems and will allow proper auditing when things go south. XAI (Explainable AI) has paved the way for decoding these black box models, but it’s still got a long way to go.

I am also excited to see the field move away from the assumption that larger models lead to better results, and I hope to see more models that rival GPT-3 or Wu Dao 2.0 while being an order of magnitude smaller. And as far as moon-shot wishful thinking goes, I am also excited to see whether Ray Kurzweil’s prediction about the singularity (the point at which machines reach and then surpass human-level thinking) comes true within our lifetime.

To explore Dr. Sher’s work, you can follow her on Medium, Twitter, and LinkedIn. In case you’d like to dive right into her writing, here’s a selection of some of her standout TDS articles:

Feeling inspired to share some of your own writing with a wide audience? We’d love to hear from you.

This Q&A was lightly edited for length and clarity.

