For the last ~12 months I’ve been working as a Data Scientist at a startup in the health-tech space. I joined as one of two Data Scientists on the team, which essentially meant we needed to be full-stack engineers and scientists to get the work done and build scalable systems that set the company up for success in the future.
In this time, I have learned more than in any other role I’ve held, and this piece covers the three main ideas that best summarize my experience so far.
Essentially, you need to be a paradox. You have to walk the fine line between worlds that are often at odds with one another. This aspect of the job can be really difficult to shine in, as it often requires you to be a lot more than most other roles ask of you. But for those hungry for an intensely rewarding learning experience, it’s unbeatable.
Being a Builder and a Strategist
How much you have to build really depends on how early the startup is and how many people are on the Data Science/ML, Data Engineering, and Data Analysis teams. Regardless, it’ll likely fall on you not only to operate the full ML stack (data ingestion to deployment) but also to build a platform that makes future projects better.
Oftentimes this looks like sitting in meetings with key business stakeholders and creating models that directly impact the bottom line. That is true of most data projects, but there is often a lot more noise in this kind of scenario. By definition, you’re working in a space that is trying to do something novel or solve a problem in a way that provides greater value to your customers than your competitors do. This means you’re likely not going to work on traditional ML projects doing what everyone’s already doing, and you’re definitely not going to get the project handed to you; you’ll be in charge of attempting to build something new. Whether it’s a novel approach to dataset curation, feature engineering, modeling, application of models, or all of the above, you should be trying to innovate (while still remaining ethical and within legal constraints, obviously).
The projects usually start with a domain expert who deeply understands the problem and/or the customers and what they currently need. As the data scientist, you have to wade through the laundry list of ideas to scope out which can be done in a timely fashion, which need more quality data, and which are long-term goals to work toward. This takes strategizing with the domain experts so you can execute in the right direction. The strategizing here is notably different because of the startup nature of the organization: you will often need to scrape data from trusted sites to augment datasets, or pair very closely with the Data Engineers to build the ETL flows or the data curation architecture that your idea needs and that doesn’t exist yet.
Strategizing to build the right thing is immensely harder in practice, because oftentimes you need to take a lot of shots at building the wrong thing to get concrete feedback before you can eventually build the right thing. Iteration is a central tenet, but a high speed of execution is a religion. You have to be able to move faster than you have likely ever been asked to before. By nature of the velocity, you’re forced to focus on only what truly matters, gather feedback, and keep going.
Looking around corners, separating signal from noise, shipping code each week, and constantly responding to feedback are highly valuable traits. It’s often difficult to stay in this mode for an extended period of time, so a really tight and strong team is crucial for making this work.
Being an Engineer and a Scientist
Although I stressed the importance of going fast, that’s weirdly not the complete picture. It’s more accurate to say that you want to build in a way that lets you go faster tomorrow. Velocity is important, but acceleration is imperative. What this essentially means for Data Scientists is that you need to come in with 2 rock-solid foundations: Python and Statistics. These have always been requirements for getting hired as a Data Scientist, but the work I’ve needed to do has sometimes looked like pure software engineering and sometimes like pure statistics.
My work initially started with a lot of data engineering type of work – gathering quality datasets, setting up ETL flows, curating datasets for downstream analytics or modeling, etc. I’m not a Data Engineer by trade though, so I did this to the best of my ability to keep the team’s speed up. Immediately afterward, I started on the Machine Learning work, but working only in Jupyter notebooks, one model at a time, was a cumbersome approach that wouldn’t accelerate us further. The solution was to iteratively build an ML platform so that, over time, each model iteration becomes easier to build and deploy. This was almost entirely software engineering work, and without a strong handle on Python, I don’t think I would’ve stood a chance. As I was building this out, the statistics side of the work was picking up, because I needed to read a multitude of research papers that employed various ML methods on healthcare data – could we replicate that work? Were there gaps they didn’t consider? Which of the techniques they tested could we adopt? Knowing how to read papers and translate them into code quickly became a skill I needed to develop.
Every corner of the Data Scientist Venn diagram got magnified 10x for me in this role. I needed to lean into as many of them as possible to deliver at the level that was expected of me. In return, I nearly eradicated my imposter syndrome while learning an immense amount along the way. I definitely don’t mean to glamorize this either – doing all of this work is probably not the norm for the Data Scientist role, but if you find yourself wanting to do this work at a startup, this is likely what it’ll look like. As the saying goes, "A jack of all trades is a master of none, but oftentimes better than a master of one."
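To make the "platform instead of one notebook per model" idea concrete, here is a minimal sketch, not the platform described above. It assumes a hypothetical registry (`TRAINERS`, `register`) where every model plugs into the same train-and-evaluate path, so adding a model means writing one function. The toy trainers, `run_experiment`, and `compare_all` are all illustrative names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Sequence

# A "model" is any function that maps one feature row to a prediction.
Model = Callable[[Sequence[float]], float]
# A trainer takes training features and targets and returns a fitted model.
Trainer = Callable[[List[Sequence[float]], List[float]], Model]

# Hypothetical registry: model name -> trainer function.
TRAINERS: Dict[str, Trainer] = {}


def register(name: str):
    """Decorator that adds a trainer to the shared registry."""
    def decorator(fn: Trainer) -> Trainer:
        TRAINERS[name] = fn
        return fn
    return decorator


@register("mean_baseline")
def train_mean_baseline(X, y) -> Model:
    # Toy model: always predicts the mean of the training targets.
    avg = mean(y)
    return lambda row: avg


@register("last_feature")
def train_last_feature(X, y) -> Model:
    # Toy model: predicts the last feature value unchanged.
    return lambda row: row[-1]


@dataclass
class RunResult:
    name: str
    mae: float  # mean absolute error on the test split


def run_experiment(name, X_train, y_train, X_test, y_test) -> RunResult:
    """Train one registered model and score it the same way as all others."""
    model = TRAINERS[name](X_train, y_train)
    errors = [abs(model(row) - target) for row, target in zip(X_test, y_test)]
    return RunResult(name, mean(errors))


def compare_all(X_train, y_train, X_test, y_test) -> List[RunResult]:
    """Run every registered model and return results sorted by error."""
    results = [
        run_experiment(name, X_train, y_train, X_test, y_test)
        for name in TRAINERS
    ]
    return sorted(results, key=lambda r: r.mae)
```

The design choice worth noting is that evaluation lives in one place: a new model only has to satisfy the `Trainer` signature, and it is compared against every earlier model with no extra notebook code.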
Being an Owner and a Collaborator
This is a line most people have to walk as they evolve in their career, but doing it at a startup is like doing it blindfolded. You often don’t have the cushion or safety net to absorb a large-scale project that delivers no impact and only wastes work and time. Everyone’s spread a bit thin and the customer always comes first, so you need to completely own the projects that fall in your space. Hardly any of my teammates wait to be told what to work on; they usually see what needs to be done and get it done, in addition to what’s asked, so it’s built right.
Ownership here usually means end-to-end ownership. For a data project, it’s from the second data can be curated (from where? how often? how does it need to be cleaned? where does it need to be stored? etc.) to data analysis/manipulation (what metrics need to be created and how? how should we analyze this data? what are relevant patterns to track? etc.) to data modeling (how should this data be modeled? what models will work best? what does current literature say? etc.) to evaluation and deployment (how do we measure success? what does failure mean? where is the model deployed? how can customers access this? etc.). This end-to-end thinking doesn’t end with just deployment either because you then need to have model monitoring and triggers for retraining. It’s a lot to own and not something to be done alone.
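The monitoring and retraining step at the end of that chain can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `RetrainMonitor`, the rolling-window rule, and the parameters `tolerance`, `window`, and `min_samples` are hypothetical choices, not something prescribed here. The rule is simple: flag retraining once the recent mean absolute error exceeds the error measured at deployment by more than a tolerance.

```python
from collections import deque
from statistics import mean


class RetrainMonitor:
    """Tracks recent prediction errors and flags when retraining looks due.

    All thresholds are illustrative assumptions; real values depend on the
    model and on what failure costs the business.
    """

    def __init__(self, baseline_error, tolerance=0.2, window=50, min_samples=10):
        self.baseline_error = baseline_error  # error measured at deployment
        self.tolerance = tolerance            # allowed relative degradation
        self.min_samples = min_samples        # avoid triggering on too little data
        self.errors = deque(maxlen=window)    # rolling window of absolute errors

    def record(self, prediction, actual):
        """Store the absolute error of one prediction once the truth is known."""
        self.errors.append(abs(prediction - actual))

    def should_retrain(self):
        """True when the rolling error exceeds baseline by more than tolerance."""
        if len(self.errors) < self.min_samples:
            return False
        return mean(self.errors) > self.baseline_error * (1 + self.tolerance)
```

In use, a scheduled job would call `record` as ground truth arrives and kick off the training pipeline whenever `should_retrain()` returns `True`. A production setup would likely also watch input drift, not only the error.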
Collaboration can be particularly hard at a high-growth company, as new people are joining and roles are still being defined. You quickly need to make friends who will do favors for you so you’re able to drive these projects to completion. Honestly, the culture of a company can largely be boiled down to how many people are comfortable doing favors for others. If you happen to luck into an organization where people are quick to make friends and help you out after a quick Slack meet, you better be sure you show deep gratitude. Intelligent people are rare, intelligent people with a strong work ethic are ultra rare, and intelligent people with a strong work ethic who are always ready to help you are unicorns.
Nothing I’ve done would have been possible without the graciousness of people around the org willing to extend themselves to get me across the finish line. Over time, this creates a deeply resilient and formidable culture. You can create it around you by first being someone who is quick to help others. Instead of waiting for someone else to help you, be the first to extend an olive branch for no reason other than to see a friend succeed. Over time, you’ll build a team around you filled with people who won’t let you fail in a way you can’t come back from.
I hope this story gives something of an inside look into my last year (and partially explains why I’ve been away from writing for a bit). I’d especially love to talk to people with a similar experience or in a similar position and learn from you all. I’m sure there are elements of my world that I can improve on, and I would love to know how.
And of course, if there are more aspects of my job that you’d like me to write about, please let me know! Thanks a ton for reading along.