Why you should NOT use MS MARCO to evaluate semantic search

And likely not many other widely used datasets either

Published in Towards Data Science

7 min read · Mar 23, 2020

If we want to investigate the power and limitations of semantic vectors (pre-trained or not), we should prioritize datasets that are less biased towards term-matching signals. This piece shows that the MS MARCO dataset is more biased towards those signals than we expected, and that the same bias is likely present in many other datasets because they follow similar data collection designs.

Written by Thiago G. Martins