Writing More Successful Machine Learning Research Papers

Things to keep in mind to impress your reviewers and your fellow researchers.

Prof. Marc Aubreville
Towards Data Science

You have spent many hours on your research, sometimes failing, redoing experiments, and finally arriving at some decent results. Now you want to see them published: maybe it's a requirement for your PhD, maybe it's just for your own ego, because it's cool to have your name on a paper in a prestigious journal. So you go ahead, write your paper, and submit it to a journal or conference.

But wait, don't submit yet! There are a couple of things you should avoid. How do I know? I lead a small research group and have been reviewing papers for a while now, and some mistakes are made over and over again. Here is my list of recommendations to follow if you want to write a successful machine learning paper:

1. Don’t assume the reader knows about the importance of your topic!

You know what you're doing and why you're doing it. But your average reader will not. Always keep in mind that you will have at least four kinds of readers:

The probability that your reader has no in-depth knowledge of your exact research field is high! Image by author.
  • Your supervisor. Don't write for your supervisor. A good supervisor already knows what you are doing and, especially, why you are doing it.
  • People in the same research field as you. They mostly know all the related work and all the relevant terms. But the closer these people are to your exact research topic, the fewer of them exist. So it's not very likely that they will be your reviewers.
  • People in closely related research areas. They will not know exactly what your research is about or its specific problems, but they have a good general understanding of the wider research area. If you fail to give a good motivation (including the current challenges and why your research is necessary), you will lose these readers. And finally:
  • People from remotely related research areas. This is the biggest group, and it is quite likely that at least some of your reviewers will come from it. They don't know all of the related work, why your research is important, or what makes your method special. So you do need to explain this to them.

Bottom line:

Spend enough time on the motivation. Formulate clear thoughts and back up your claims with literature! All of this goes into the introduction section of your work.

2. Write about novel insights, not technical novelties.

Almost all machine learning papers state their "novelty" at the end of the introduction. Why do they do this? Because many journals and conferences require that what's being presented is new. And it's good to require that of a paper, because if it's not novel, why bother reading it?

A novelty is something that was not known before. Authors (and sometimes also reviewers), however, misunderstand it as: you need to develop a ground-breaking new method, otherwise your paper is not worthy.

From a review of one of our own papers. Image by author.

Consequently, most authors feel that they need to tick the novelty box by creating a method that differs in some way from the state of the art. And it's easy to get the feeling that technical novelty is the most important aspect of machine learning papers.

Sometimes this is my impression when reviewing papers. Image by author.

But it's not. Insights are the most important aspect of any paper, regardless of the domain. If readers learn something they did not know beforehand, that's a novelty!

Bottom line:

Spend time on the analysis and interpretation of your results. If your method shows better results than the state of the art, you should first doubt your results! Analyze and interpret, and only then, as a last step, publish.
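One concrete way to doubt a suspiciously good result is to check whether any test samples also appear in your training data. The sketch below is a minimal illustration, assuming each sample can be serialized to bytes (for example, raw image files); the function name `find_train_test_overlap` is hypothetical. Exact hashing only catches byte-identical duplicates, not near-duplicates such as resized or re-encoded copies.

```python
import hashlib
from typing import Iterable, List, Set


def _digest(sample: bytes) -> str:
    """Return a SHA-256 hex digest identifying a sample's exact content."""
    return hashlib.sha256(sample).hexdigest()


def find_train_test_overlap(train: Iterable[bytes], test: Iterable[bytes]) -> List[int]:
    """Return indices of test samples whose exact bytes also occur in the training set.

    Only byte-identical duplicates are detected; near-duplicates
    (resized, re-compressed or augmented copies) need a different check.
    """
    train_hashes: Set[str] = {_digest(s) for s in train}
    return [i for i, s in enumerate(test) if _digest(s) in train_hashes]


if __name__ == "__main__":
    # Toy byte strings standing in for serialized samples (hypothetical data).
    train_set = [b"slide_001", b"slide_002", b"slide_003"]
    test_set = [b"slide_004", b"slide_002"]
    print(find_train_test_overlap(train_set, test_set))  # [1]
```

A non-empty result means some of your "unseen" test data was seen during training, which alone can explain an improvement over the state of the art.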

3. Don’t assume the reader knows your previous work

Some authors make heavy use of references to their previous work. Image by author.

We all agree that research is incremental, and you will very likely publish papers that build upon your previous findings. Still, assume that none of your readers has read those papers yet. And even more: don't try to force them to read them.

Most reviewers are senior scientists or professors with a lot on their to-do list. It is in your best interest to make your paper as easy to read as possible! Thus, your paper needs to be fully self-contained.

Especially in machine learning, I've seen people introduce their own abbreviations and names for their own work (e.g., model architectures) and expect readers to know what they're talking about in a follow-up paper. They don't.

4. Follow the scientific method

I don't want to blame anyone here, but judging from some of the reviews I've conducted, the following workflow is not uncommon:

  1. Development of a method
  2. Test of that method
  3. Analysis of the results and then
  4. Interpretation of why the method worked.

But that's bad science. It's called HARKing: hypothesizing after the results are known. And machine learning is especially prone to HARKing.

Instead, the hypothesis should always go first:

The Scientific Method. Image by author.

I am sure you know this already. But I want to encourage you to also structure your paper this way. Your method is always the test of your hypothesis!

  1. What did we observe in the data that makes us think that we could improve on the state-of-the-art in a specific way? (observation)
  2. How do we need to design our method? (hypothesis)
  3. How can we design a test that specifically determines whether our hypothesis is true or false? Note: decide in advance how you will make sure your test is not biased! (derive test)
  4. Run the inference on an independent test set (experiment)
  5. Analyze your results and relate them to your hypothesis! (analysis)
  6. Once you have drawn your conclusions, and think that these are interesting, you are ready to write your paper!
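Step 4 only works if the test set is fixed before development starts and never touched for tuning. Here is a minimal sketch of such a split, assuming samples are addressed by integer indices; `split_indices` is a hypothetical helper, and the 15 % validation and test fractions are arbitrary examples. If several samples share a source (for instance, the same patient), you would need to split by that group instead, so related samples cannot end up on both sides.

```python
import random
from typing import List, Tuple


def split_indices(n: int, val_frac: float = 0.15, test_frac: float = 0.15,
                  seed: int = 42) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 with a fixed seed and cut them into disjoint train/val/test lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    indices = list(range(n))
    # A local RNG makes the split reproducible without touching global random state.
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = indices[:n_test]                  # held out until the final experiment
    val = indices[n_test:n_test + n_val]     # used for model selection and tuning
    train = indices[n_test + n_val:]         # used for fitting
    return train, val, test


train, val, test = split_indices(100)
print(len(train), len(val), len(test))  # 70 15 15
```

Fixing the seed and saving the resulting index lists alongside your code lets reviewers verify that the test set stayed the same throughout.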

5. Be your own devil’s advocate

Being humble is a very good precondition to being a scientist. This means that you should always be aware of the limitations of your research. Name them and write them down. These are an essential part of your paper.

Your reviewers will be asked to find weaknesses in your paper:

The field for weaknesses in the MIDL review form on OpenReview. Image by author.

Try to be smart and identify the weaknesses before they do. The goal is not to find an excuse. The goal is simply to describe the limits of what you did, so that others don't make avoidable mistakes.

Here are some evil questions a reviewer might ask:

  • Could the findings just be due to a lucky choice of dataset / hyperparameters / random states?
  • Why did you choose X, Y, Z in your experimental setup?
  • Will this also work on datasets other than the one(s) it was demonstrated on?
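To pre-empt the first question, you can repeat your experiment with several random seeds and report the spread rather than only the best run. This is a minimal sketch; `train_and_evaluate` is a hypothetical stand-in for your own training pipeline and here just returns a noisy toy score around 0.80.

```python
import random
import statistics
from typing import Callable, List, Tuple


def train_and_evaluate(seed: int) -> float:
    """Hypothetical stand-in for a full training run; returns a toy 'accuracy'."""
    rng = random.Random(seed)
    return 0.80 + rng.gauss(0.0, 0.02)  # replace with your real pipeline


def evaluate_over_seeds(run: Callable[[int], float],
                        seeds: List[int]) -> Tuple[float, float, List[float]]:
    """Run the experiment once per seed; return mean, sample standard deviation and all scores."""
    if len(seeds) < 2:
        raise ValueError("need at least two seeds to estimate a standard deviation")
    scores = [run(s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores), scores


mean, std, scores = evaluate_over_seeds(train_and_evaluate, seeds=[0, 1, 2, 3, 4])
print(f"accuracy: {mean:.3f} ± {std:.3f} over {len(scores)} seeds")
```

Reporting mean and standard deviation over seeds shows whether an improvement is larger than the run-to-run noise; it does not answer the third question, which still requires testing on other datasets.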

And finally: please do not forget to proofread your text before sharing it.

6. Avoid “unnecessary mathiness”

Formulae are a great tool for describing something very precisely. They have a side effect, however: it often takes the reader much longer to understand a formula than a verbal or pseudocode description of what you did.

Then why do people often use more math than necessary? Ian Goodfellow summarized it beautifully in this Twitter thread:

While it might help you pass peer review, as Ian states, in the long run your paper will have much less impact if it's hard to understand.

So: use formulas when they help the reader understand more precisely what you did. Don't add math to show off.

7. Concept first, writing second.

Yes, the deadline is approaching at lightspeed. And yes, your results are not quite ready yet, but you feel you need to start writing now, because otherwise you will miss the deadline.

Don’t write your paper while your results are not yet known. Image by author.

Don't write "as you go". Writing a paper has a lot in common with writing a novel: you should know the main storyline before writing the first paragraph.

It’s quite likely that you are wasting your time if you start writing the paper before having your results and running an analysis on them!

There is one exception: the only part I recommend writing before your results are analyzed is the introduction. Try to write the introduction as early as possible. It will help you identify relevant related work and get a clearer picture of your own work.

8. Write the abstract last

The abstract is the most important part of your paper, because it is the part most people will read. I recommend writing the abstract at the very end. Only after writing the discussion do you know the key essence and takeaways of your paper.

Always remember that the abstract is a short summary of the complete paper, including the conclusions. I always try to summarize every part of the paper (introduction / methods / results / discussion) in one to three sentences.

I use comments to structure my own abstract. Image by author.

Here's a little exercise I recommend before writing the abstract:

Try to summarize your paper on a single sheet of paper with a thick Sharpie. Then try to explain your concept to someone else using that sheet. If that works, you have managed to summarize your paper, and you are ready to write its final part!

I teach and do research in medical image recognition at THI, Germany, primarily focused on tumor diagnostics.