How to Create your First Competition on CodaLab

It’s free and easy!

Adrien Pavao
Towards Data Science

Source: istockphoto.com (purchased by the author)

Organizing a challenge allows you to crowd-source the most difficult machine learning problems. It is also an excellent way to learn data science. This short hands-on tutorial will give you everything you need to create your first competition — as early as today!

Why Codalab?

Codalab is an open-source web platform for hosting data science competitions. Because it is open source, you can set up your own instance or use the main instance at codalab.lisn.fr. Codalab puts an emphasis on science, and each year hundreds of challenges take place on it, pushing the limits in many areas: physics, medicine, computer vision, natural language processing and even machine learning itself. Its flexibility lets you tackle a wide variety of tasks!

Once you have an account, you can publish your first competition right away! The only limit is your imagination.

Get started

To create your first machine learning challenge, all you need to do is to upload a competition bundle. A competition bundle is a ZIP file containing all the pieces of your competition: the data, the documentation, the scoring program and the configuration settings.
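As a rough sketch of what "a ZIP file containing all the pieces" means in practice, the snippet below packs a local folder into a bundle using only Python's standard library. The helper name `make_bundle` and the folder layout are hypothetical; the only thing taken from the text above is that the bundle is a ZIP archive.

```python
import zipfile
from pathlib import Path


def make_bundle(source_dir, zip_path):
    """Pack every file under source_dir into a ZIP archive at zip_path.

    Paths inside the archive are stored relative to source_dir, so the
    files sit at the root of the ZIP rather than inside a parent folder.
    Returns the sorted list of archive member names.
    """
    source = Path(source_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in source.rglob("*"):
            if path.is_file():
                # Use forward slashes so the archive looks the same on every OS.
                arcname = path.relative_to(source).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return sorted(names)
```

For example, `make_bundle("my_bundle", "my_competition_bundle.zip")` would zip a folder named `my_bundle` (a placeholder name). Write the ZIP outside the source folder so that it does not try to include itself.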

Let’s start from an example; it’s the easiest way. Here is the competition bundle of the Iris Challenge, based on Fisher’s famous Iris dataset. Simply click on this link: Iris Competition Bundle.

Now upload the file named “iris_competition_bundle.zip” to Codalab as follows:

Go to “My Competitions”, then “Competitions I’m Running” and finally “Create Competition”
That’s it! Your competition is ready to receive submissions.

Customize your competition

That’s cool and all… but I just re-created the Iris Challenge. I want to design my own task and include my awesome dataset! — You

To customize your competition, you need to change the files contained in the bundle and re-upload it. Let’s have a closer look at what’s inside the bundle.
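If you prefer to look inside the bundle from a script rather than a file manager, a minimal sketch like the one below lists and unpacks it. The function names are hypothetical helpers, not part of Codalab.

```python
import zipfile


def list_bundle(zip_path):
    """Return the sorted names of the files stored in a bundle ZIP."""
    with zipfile.ZipFile(zip_path) as archive:
        # Directory entries end with a slash; keep only real files.
        return sorted(n for n in archive.namelist() if not n.endswith("/"))


def unpack_bundle(zip_path, target_dir):
    """Extract the bundle into target_dir and return the extracted file names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return list_bundle(zip_path)
```

Calling `unpack_bundle("iris_competition_bundle.zip", "iris_bundle")` would give you an editable copy of the example in a folder named `iris_bundle` (a placeholder name).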

HTML files

The HTML files define the various pages that participants can see when going to your competition. Use them to provide the documentation and the rules, as well as any information you find important.

Main page of the Iris Challenge. The web pages are defined by the HTML files from the bundle.

Logo

Replace “logo.png” with your awesome logo!

“logo.png” from Iris Competition Bundle

Data

If you are designing a machine learning problem, you most likely have data. The public data (or input data) folder holds the data that participants can access, and the reference data folder holds the ground truth, usually the test-set labels that you want to keep secret. You can use the same data format as the provided example or any other format you like. If you change the format, you’ll need to update the scoring program accordingly; we’ll talk about it in the next section.
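To make the public/reference split concrete, here is a minimal sketch that divides a labeled dataset into files participants can see and a file only the scoring program should see. The folder names, file names and CSV format are assumptions chosen for illustration; the Iris bundle may use a different layout.

```python
import csv
import random
from pathlib import Path


def split_dataset(rows, bundle_dir, test_fraction=0.3, seed=0):
    """Split (features, label) rows into public data and secret reference data.

    Writes, under bundle_dir (all names are hypothetical):
      public_data/train.csv            features + label, for training
      public_data/test.csv             features only, to be predicted
      reference_data/test_labels.csv   labels only, kept secret
    Returns (number of training rows, number of test rows).
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed makes the split reproducible
    n_test = max(1, int(len(rows) * test_fraction))
    test, train = rows[:n_test], rows[n_test:]

    public = Path(bundle_dir) / "public_data"
    reference = Path(bundle_dir) / "reference_data"
    public.mkdir(parents=True, exist_ok=True)
    reference.mkdir(parents=True, exist_ok=True)

    def write(path, lines):
        with open(path, "w", newline="") as handle:
            csv.writer(handle).writerows(lines)

    write(public / "train.csv", [list(x) + [y] for x, y in train])
    write(public / "test.csv", [list(x) for x, _ in test])
    # Same row order as test.csv, so line i here is the label of line i there.
    write(reference / "test_labels.csv", [[y] for _, y in test])
    return len(train), len(test)
```

The key design point is that the test labels never appear in the public folder, while the test features do, so participants can produce predictions without seeing the answers.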

If your problem doesn’t involve data, don’t worry! Codalab is flexible and lets you define your problem however you want (e.g. reinforcement learning tasks).

Ingestion and scoring programs

Ingestion and scoring programs are the main pieces of code of your competition: they define the way submissions will be evaluated. If you want to allow only result submissions, then you only need the scoring program; the ingestion program is useful for code submissions.

  • Ingestion program: defines how to train the models and save their predictions.
  • Scoring program: defines how to compare the predictions with the ground truth.
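As an illustration of the second bullet, here is a minimal sketch of a scoring program that computes accuracy. It assumes a layout commonly used in Codalab bundles, where the program receives an input folder containing `ref` (ground truth) and `res` (the submission) and writes a `scores.txt` file to an output folder; the file names `test_labels.csv` and `predictions.csv` and the metric name `accuracy` are hypothetical. Check the scoring program shipped in the Iris bundle for the exact conventions.

```python
import sys
from pathlib import Path


def read_labels(path):
    """Read one label per line, ignoring blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def accuracy(truth, predictions):
    """Fraction of predictions that equal the ground-truth label."""
    if not truth:
        raise ValueError("empty ground truth")
    if len(truth) != len(predictions):
        raise ValueError(
            "expected %d predictions, got %d" % (len(truth), len(predictions))
        )
    correct = sum(t == p for t, p in zip(truth, predictions))
    return correct / len(truth)


def score(input_dir, output_dir):
    """Compare predictions with the ground truth and write the score.

    Assumed layout (verify against your own bundle):
      input_dir/ref/test_labels.csv   ground truth
      input_dir/res/predictions.csv   participant output
      output_dir/scores.txt           one "name: value" line per metric
    """
    truth = read_labels(Path(input_dir) / "ref" / "test_labels.csv")
    predictions = read_labels(Path(input_dir) / "res" / "predictions.csv")
    value = accuracy(truth, predictions)
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    (Path(output_dir) / "scores.txt").write_text("accuracy: %.4f\n" % value)
    return value


# Runs only when invoked as: python score.py <input_dir> <output_dir>
if __name__ == "__main__" and len(sys.argv) == 3:
    score(sys.argv[1], sys.argv[2])
```

Raising an error on a length mismatch, rather than scoring a truncated list, makes a malformed submission fail visibly instead of receiving a misleading score.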

Starting kit

If you have already joined a challenge as a competitor, you know how important it is to have a good starting kit. Here you just include everything needed by the participants to easily dive into your challenge: some example submissions, Jupyter notebooks, etc. They’ll be able to download it from the web page “Participate > Files”.

Users can download the Starting Kit from the web page “Participate > Files”

The competition.yaml file

Last but not least, the competition.yaml file defines the settings of your challenge: the title, description, Docker image (simply put a DockerHub name and the submissions will run inside it), dates, prizes and so on.
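To give a feel for what such a settings file looks like, the sketch below renders a handful of settings as YAML text with the standard library only. The key names (`title`, `description`, `docker_image`, `start_date`, `end_date`, `prizes`) and all values are illustrative assumptions based on the list above; the competition.yaml inside the Iris bundle is the authoritative reference for the keys Codalab actually reads.

```python
import json

# Hypothetical settings: key names and values are placeholders, not
# guaranteed to match the keys Codalab expects.
SETTINGS = {
    "title": "My First Challenge",
    "description": "Classify flowers from four measurements.",
    "docker_image": "python:3.11",  # a DockerHub image name
    "start_date": "2030-01-01",
    "end_date": "2030-06-30",
    "prizes": "None, just glory",
}


def to_yaml(settings):
    """Render a flat dict of strings, numbers and booleans as YAML text.

    Each value is written with json.dumps, whose output for these scalar
    types is also a valid YAML scalar (quoted strings, true/false, numbers).
    """
    return "\n".join(f"{key}: {json.dumps(value)}" for key, value in settings.items())


print(to_yaml(SETTINGS))
```

In practice it is usually simpler to copy the example file and edit it by hand; generating it from a script is mainly useful if you maintain several similar competitions.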

Edit an existing competition

Your competition is up and running, but you wish to edit it? It is still possible. As an administrator of your own challenge, you have access to the “Edit” menu; click on the button as follows:

Admin features include editing an existing competition

You’ll then access a panel where you can edit every setting. If you wish to change the dataset or the scoring program, you’ll first need to upload the new version from the “My Datasets” page.

To go further

Congratulations! You know how to create a competition on Codalab! However, we have barely scratched the surface of all the possibilities offered by the software. To learn more, you can refer to Codalab’s Wiki. You’ll learn, for instance, how to link your personal compute workers (CPU, GPU) or how to define complex leaderboards with multiple criteria. You can even join the effort and develop your own features!

Source: codalab.org

Research engineer and PhD student in Machine Learning @ Université Paris-Saclay