
Why You Should Think of the Enterprise of Data Science More Like a Business, Less Like Science

For Eric J. Daza, "how you sell your work matters in setting you up for success."

Author Spotlight

In the Author Spotlight series, TDS Editors chat with members of our community about their career path in Data Science, their writing, and their sources of inspiration. Today, we’re thrilled to share Eric J. Daza, DrPH, MPS’s conversation with Ben Huberman.

Dr. Eric J. Daza is a data science statistician at Evidation Health, a digital health company. He has worked for 18+ years in both industry and academia, in fields including pharma clinical trials and survey sampling, nutrition, maternal and child health, global and international health, health promotion and disease prevention, digital health, and behavioral medicine.

Dr. Daza also created and edits Stats-of-1, a digital-health statistics blog. He investigates how to discover causal relationships from an individual’s own wearable device, sensor, and app data.


You variously describe yourself online as "healthcare data scientist" and "data science statistician" – can you flesh out these roles for us?

I was "born" professionally as a biostatistician. I worked as a master’s-level biostatistician in a small oncology pharmaceutical company for five years, followed by two years at a contract/clinical research organization when I first started my doctoral program. I then worked on a few survey-sampling projects for a year, and finished grad school with five years as the biostatistician for a global public-health study.

At the end of grad school, I got interested in tailored or personalized health interventions. This quickly led me to n-of-1 trials and single-case experimental designs, which I found fascinating – and also easier for me to grasp than more complex adaptive designs.

How did you land in this corner of the data science field?

My motivation? Someone very dear to me had irritable bowel syndrome, an idiosyncratic chronic illness: While it has a common spectrum of symptoms and triggers, the relationships between these are highly individual-specific. For example, a food that triggers one person’s diarrhea may actually help ameliorate another person’s symptoms. Other such conditions include migraines and chronic pain.

N-of-1 trials were a promising solution, but they involve experimentation via single-individual randomized or alternating crossover treatments. These could be overly tedious and anxiety-inducing for someone already suffering from a lifelong disease that they didn’t want to constantly think about.

I thought wearable sensors and digital health apps might provide a better way of more passively screening possible triggers (before embarking on an n-of-1 self-experiment). That led me to digital health – and having to learn how to clean and process such dense temporal data, as well as techniques for both feature discovery and creation. So I explored these and picked up a little machine learning during a three-year postdoc.

Afterwards, I joined a healthtech startup. We wanted to give insurers and health providers better insights into physician and health-facility performance in terms of patient health and care outcomes. While the company’s mission didn’t overlap much with my own digital-health interests, I valued learning about a completely new analysis domain – healthcare claims data – and how to manage all of it. I’d never before worked with hundreds of thousands (sometimes millions) of rows of data!

There, I learned how to use AWS clusters and Spark, and ramped up my SQL skills (well beyond PROC SQL from my pharma SAS days). And I also began to recognize what to me were key similarities and differences between data science and applied statistics. I was no longer a biostatistician, but a healthcare data scientist.

How did this trajectory shape your current interests?

I now happily work at Evidation Health on what I really care about: how to use passively collected wearable-sensor data and non-experimental app-based surveys to begin to carefully move from correlation to causation in examining one person’s health history. I call myself a data science statistician because while I work as a digital-health data scientist, I still maintain a very strong identity as a statistician.

I help my team succeed by providing guidance on how to practice good statistical hygiene in various client-driven and internal projects. This includes clearly identifying pre-planned analysis results versus those that emerge during exploratory data analysis (to avoid overgeneralizing/overfitting), and avoiding describing a "statistically significant" finding as a "significant" finding because the two are unrelated. I also try my best to share insights on various classical statistical models and use-cases, probability distributions, and causal inference – my other favorite topic!

What other kinds of projects do you find yourself drawn to these days?

While in grad school, I’d understandably developed a strong interest in finding a faculty position. The required application materials can be intimidating: You need to list your accomplishments in a curriculum vitae (CV), which is basically a detailed resume with all your publications and presentations. But you also have to write a research statement (your detailed 5-year research plan), a teaching statement on your teaching philosophy and background, and maybe a diversity statement on how you can contribute to fostering diversity and inclusivity in academia.

By early 2020, after over two years of searching, it was clear that the probability of me securing a faculty position was almost surely converging to zero. But around then, I was invited to share a bit about my work as a scientist through a weeklong Instagram takeover for Pinoy Scientists, a small but growing global network of Filipino scientists and researchers.

It was a lot of work, but I had a lot of fun! I realized afterwards that I’d pretty much written a short blog series. I was also hoping to attend the big annual behavioral health and medicine conference for the first time that April, and I wanted to have some easily recognized "brand" that folks could remember. So that’s when I decided to turn my research statement into a research blog.

Stats-of-1 launched on February 4, 2020. This year, I took on two other editors: a clinical psychologist and a digital-health computer scientist. Together, we cover perfectly complementary subject domains all centered on individual-focused methods. Our vision at Stats-of-1 is to bring together professionals and experts scattered across various fields like ours, who all share an interest in developing digitally enabled, individual-focused study designs and analysis methods for improving each person’s own health.

As someone who’s worked both in academia and in industry, what were the biggest challenges you’ve had to overcome along the way?

I’d applied to faculty roles, and failed. I realized afterwards that I’d probably pitched my research too narrowly. Saying I liked working on "n-of-1 or single-individual methods with wearable data" may have sounded too niche and risky to faculty search committees. If academic funding organizations like the National Institutes of Health (NIH) and National Science Foundation (NSF) don’t find your field that interesting, or find it too risky, you won’t get money (or a faculty role) to pursue it.

Instead, I may have had better success marketing my work as "precision digital health." That’s a lesson I’ve learned working in industry: How you sell your work matters in setting you up for success.

What has your experience been like with these transitions between academia and industry?

At both of my recent startups, I’ve had to learn how to take action more quickly, and communicate outside of a traditional statistics role. I recall early on when a trusted manager told me that I needed to jump in to offer help when needed, alongside other team members. I didn’t yet realize that my academic training made me overthink decisions that needed to be made quickly, and I have since been getting more comfortable with moving much faster. You can also see this in my blog posts: My earlier super-detailed posts have since yielded to ones that are more relaxed and concise – and hopefully more fun to read!

I’ve also had to learn to make quick scientific decisions in helping a client design a project or study. Many clients come with ambiguous research requests – very much the opposite of academic research. I’ve had to learn that some clients expect me to work with them as what academics would call a "co-investigator."

Academic statisticians and clinical-trials biostatisticians rarely (if ever) work as co-investigators; we’re not trained to do so. My academic training tells me that making decisions as a co-investigator is overstepping, maybe even disrespectful (i.e., by sending the message that I don’t trust the scientist, even though it’s clearly not my field of expertise). It’s not a typical responsibility for professional statisticians.

So it’s been a struggle for me to learn that for many client requests, it’s OK (and is really what’s expected) to recommend something scientific (not statistical) based mostly on a cursory literature review and fuzzy statistical logic. That said, it’s important to cover your butt by making sure your logic is clear – both for you, and for the client. This is another way a "data scientist" is different from a "statistician": While "being a data scientist" requires knowing core statistical principles and having basic statistical skills, "being a statistician" in data science can be a liability!

This example leads me to a question I really wanted to ask: there’s an ongoing debate, especially among early-career data scientists, on the value of being a generalist vs. a specialist. What advice would you give to people who are just starting out?

At a recent data science career panel, I realized it may help to think of the enterprise of data science more like business, less like science. Unless you are hired into a role specifically to meet a technical need, you have to be more of a generalist.

For example, "machine learning engineer" and "biostatistician" are specific technical roles. Folks in these roles are responsible for really digging into the details, and communicating with a handful of stakeholders. They aren’t expected to handle multiple, often vague, client requests – which require navigating many interpersonal relationships (both internal and external to the organization or company), as well as managing multiple delivery timelines.

A good data scientist will be able to home in on and help clarify the client’s true underlying goals, regardless of the technical language they use (which will often be technically inaccurate). It is your job to help "translate" your client’s needs into something understandable by both technical and non-technical stakeholders.

Data science also resembles science in that you should have a specific technical skill set you can contribute to your team. Data science research questions are fairly nebulous, requiring exploratory data analyses. To do so well requires pooling the deep training and expertise of a scientifically diverse team.

Your public writing – for example, here on TDS – seems to follow a similar pattern, in that it doesn’t stay confined within one narrow field.

A good data science team draws from diverse scientific backgrounds. This means that being able to communicate first principles and fundamental concepts from your own field is super important! Unlike academia, you’re constantly talking to folks "outside your field."

It’s also a great way to combat impostor syndrome. Getting good at explaining fundamentals means you get to see how much your teammates appreciate really learning those concepts from you – and your patience in helping them along. It’s also great for team-building, because more often than not, you’ll be the teammate learning someone else’s fundamentals!

That’s how I got into writing for a broad audience.

How do you go about choosing the specific topics you cover?

The fundamentals I like to write about are ones I’ve thought of since grad school. Hence, I have pretty strong feelings about them. These include changing the phrase "statistical significance" to something more scientifically honest (i.e., that can’t be conflated with "significance"), what I call "statistical hygiene" in reporting which analyses were planned before or after looking at your study data (and the complementary message that coming up with new analyses after seeing your initial results is totally ok – and is at least half of science), and the need for causal inference (and how it relates to achieving ethical artificial intelligence and fairness in machine learning).

Looking into the future, what are your hopes for the field of data science in the next couple of years?

For now, I hope "data science" writ large becomes formally recognized or defined as something like "a business or business-paced enterprise made up of teams of scientific experts and specialists." Core competencies should include strong fundamentals in management/consulting practice, statistics, and computer science. This will help schools and training programs create curricula that better prepare data scientists for the truly fascinating, inspiring, and fast-paced work they will do.


Curious to learn more about Eric’s work and projects? Follow him on Medium, LinkedIn, and Twitter.

