Five Tips For Writing A Great Data Science Thesis

Write for your reader, not for yourself

A good thesis always focuses on the reader. Learn which principles are necessary. Photo by Green Chameleon on Unsplash

In this article, I will share some tips on how to improve your Data Science thesis. Over the years, I have supervised my share of Data Science thesis projects, at organizations ranging from Big Four firms to local SMEs and from multinational banks to software consultancies. The academic program I am active in typically involves internships in which data is used to resolve a corporate problem: think designing decision-support dashboards, detecting financial anomalies with machine learning algorithms, or improving real-time parcel routing. Although educational programs, conventions and thesis requirements vary wildly, I hope to offer some common guidelines for any student currently working on a Data Science thesis.

The article offers five guidance points, but may effectively be summarized in a single line:

"Write for your reader, not for yourself."

Data Science is a complex field, and the myriad algorithms, performance metrics and data structures are hard to fully grasp even for the most seasoned veteran. As such, your job as a writer is to help the reader as much as possible in digesting your research, guiding and clarifying wherever you can. Anyone can make matters more complicated; simplifying them is the true test of your skill.

I. Intro, content, conclusion

Intro

Always lead with an intro that outlines what the reader can expect. The key is to make this intro specific. Don’t just mention that you will perform a literature review, collect data and model the problem; foreshadow what you will study, what data you will collect, and what structures or decisions you will model. Academic writing is the opposite of an exciting novel that prolongs the plot: academic readers do not like surprises. A proper introduction provides a framework that aids the reader in structuring the content that follows.

Content

The content will take the bulk of the page volume, and as such it is imperative to clearly structure it. Think in advance about the messages you want to get across. It is often helpful to first write the ‘skeleton’ of the text (e.g., the chapter theme, the section purpose, a bullet point per paragraph). Verify whether the messages have a logical sequence and form a cohesive story. Try to limit yourself to one key message per paragraph. Without a predefined structure and a thought-out message, content quickly collapses into a tangled mess of formulas, data structures and experimental results.

Conclusion

Always end your text (whether it is the thesis, a chapter, or even a single paragraph) with a clear conclusion or summary. Your audience will likely not read, remember or understand every detail you jot down, so the closing lines are your chance to state what they should take away. Of course, the length of the conclusion should be proportionate to what it is concluding. For the thesis itself you are looking at a full chapter; for a paragraph, a single closing line suffices. Ending with strong closures is vital for your thesis.

II. Recap, interpret, explain

Truly reading and comprehending a thesis is a tough job; readers need all the help they can get. As a writer, you might be completely immersed in the topic for many weeks, but for your reader the thesis is likely one of many documents to skim through. In fact, few people will ever read your work cover to cover. As such, it is your job as a writer to aid the reader as much as possible. Recap that ω_t you last mentioned 10 pages ago. Interpret what that AUC of 0.7 actually means. Explain why you perform that t-test. Do not assume that the reader will figure out how to put the pieces together – make an active effort to guide your audience through your research.
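
To make "interpret what that AUC of 0.7 actually means" concrete, here is a minimal sketch. It relies on the standard reading of AUC as the fraction of (positive, negative) pairs in which the model scores the positive case higher. The function name `pairwise_auc` and the toy labels and scores are made up purely for illustration; in a real thesis you would compute the metric on your own held-out data.

```python
from itertools import product


def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive case
    receives the higher score; ties count as half a correct pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("Need at least one positive and one negative case.")

    correct = 0.0
    for pos, neg in product(positives, negatives):
        if pos > neg:
            correct += 1.0
        elif pos == neg:
            correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical toy data: 5 positive cases followed by 2 negative cases.
labels = [1, 1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.5, 0.35]

auc = pairwise_auc(labels, scores)
print(f"AUC = {auc:.2f}")  # 7 of the 10 pairs are ranked correctly
```

A sentence such as "if we pick one positive and one negative case at random, the model ranks the positive case higher about 70% of the time" tells the reader far more than the bare number does.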

The main purpose of a thesis is to explain and interpret your research. Presenting results is not enough. [Photo by Marc Schaefer on Unsplash]

III. Select solutions appropriate to the problem

The problem should always lead the selection of techniques. It is tempting to explore the Machine Learning hype of the week, but chances are it simply isn’t the best tool for the job. First (i) study your problem setting, (ii) define suitable research questions, (iii) set the requirements and restrictions, (iv) analyze your data sets, and (v) determine success criteria. Only when all that is done can you truly make an informed decision on appropriate solution methods. In all fairness, thesis projects typically offer a bit more room for exploration than regular corporate projects; after all, sometimes companies just want to explore whether something new works or not. Nonetheless, always let the problem drive the solution method, not the other way around.

IV. Start broadly, end broadly

Data Science theses have a tendency to go into great depth on tiny aspects, e.g., fine-tuning hyperparameters or running scores of experiments. In itself, this is fine. However, if the problem you try to resolve was never clear to begin with (problem statement, context analysis) or your experimental results never link back to the original research motivation (conclusion, recommendations), you miss the chance to make a meaningful impact with your research. The following structures might help to frame your work:

  • Hourglass model: start broadly at the level of the corporate/societal problem, gradually zoom in at the technical level, then translate the results back to managerial insights.
  • Double Diamond model: alternate divergent and convergent thinking, both in the research phase and in the design phase. Deliberately build in times to explore and times to focus.

The bulk of your work will likely be at the content level (data collection and cleaning, modeling, parameter tuning, experimentation). However, do not forget to first sketch the scene, and close with a compelling final act.

V. Define your key metrics

Try to identify the key metrics that capture the success of your research. For a balanced viewpoint, it is often necessary to report multiple metrics (precision, recall, AUC, F1 score, etc.). However, avoid presenting a mere array of metrics without a comprehensive interpretation. That 98.3% accuracy for your fraud detection model sounds great, but tells little about its practical usability. Your result table with hundreds of metrics is impressive, but can you capture its key message in one sentence? Is the higher precision yet lower recall an improvement compared to the base model? Vigorously explore your results from various angles, but make sure to distill the key outcomes for your Twitter summary.
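
To illustrate why a 98.3% accuracy can say little about practical usability, consider a small sketch under assumed numbers: a hypothetical set of 1,000 transactions of which 17 are fraudulent, and a "model" that simply labels everything as legitimate. The class balance, the helper name `summarize` and the baseline are all invented for this example and are not taken from any real project.

```python
def summarize(y_true, y_pred):
    """Return accuracy, precision and recall for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Precision is undefined when the model never predicts fraud.
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Assumed class balance: 17 fraudulent and 983 legitimate transactions.
y_true = [1] * 17 + [0] * 983

# Trivial baseline that labels every transaction as legitimate.
always_legit = [0] * len(y_true)

baseline = summarize(y_true, always_legit)
print(baseline)
```

In this toy setting the baseline reaches the same 98.3% accuracy while catching none of the fraud cases (recall of zero). Reporting a headline metric next to such a trivial baseline is one way to show the reader what the number is actually worth.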

Although data analysis is often multi-faceted, highlighting the key metrics is important to get across your main insights. [Photo by Luke Chesser on Unsplash]

Wrapping up

This article discussed five tips to help you write your Data Science thesis. The overarching principle is to always keep your reader in mind, and to go the distance in actively structuring, explaining and interpreting your research for the intended audience. The five tips might be summarized as follows:

  • Intro, content, conclusion – Use a consistent structure at all levels of your text (thesis, chapter, section, paragraph), leading with an introductory outline or signal and ending with a conclusion or summary.
  • Recap, interpret, explain – A successful thesis guides the reader through your research, providing helpful explanations to support your techniques and results.
  • Select solutions appropriate to the problem – Make sure to thoroughly study the problem, context and expectations before selecting a solution method that fits the nature of the assignment.
  • Start broadly, end broadly – Going in-depth is perfectly fine, but don’t forget to (i) clearly outline the problem setting and (ii) translate your main findings into tangible insights.
  • Define your key metrics – A single metric rarely suffices to capture the full depth of an analysis, but ultimately it is necessary to boil down your research to some digestible numbers.
