Predicting campaign success on Kickstarter with classification algorithms

Model selection, discussion of feature importance and a Flask app to try.

Published in

Towards Data Science

5 min read · May 10, 2019

As part of my data science quest to explore the creative industry, using classification algorithms to predict the outcome of a Kickstarter campaign seemed like a perfect project idea. Kickstarter's mission is to help bring creative projects to life, and as I write this, it has helped 182,897 projects succeed. With $4,284,585,270 pledged to Kickstarter projects, it is a powerful platform for both new and experienced creators.

An example of a Kickstarter campaign page.

Kickstarter campaigns use an all-or-nothing funding model. If a campaign fails, no money changes hands, which puts the heavier burden of failure on the campaign creators, who may have invested their own time and money into the campaign.

For this analysis, I focused on developing a robust predictive model overall, with a side quest of making sure the creators are taken care of. To accomplish this, I chose the area under the receiver operating characteristic curve (ROC AUC) as the primary evaluation metric, while also watching the precision score for the “Success” class so that the model would not predict too many successes that turn out to be failures (that is, keeping false positives low).
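To make the two metrics concrete, here is a minimal sketch using scikit-learn's `roc_auc_score` and `precision_score`. The labels and predicted probabilities are made up for illustration (1 = success, 0 = failure), and the 0.5 decision threshold is an assumption, not necessarily the one used in this project.

```python
from sklearn.metrics import precision_score, roc_auc_score

# Toy data, invented for illustration: 1 = successful campaign, 0 = failed.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]

# Assumed threshold: predict "success" when the probability is at least 0.5.
y_pred = [int(p >= 0.5) for p in y_prob]

# ROC AUC is computed from the ranking of the probabilities, not the hard labels.
auc = roc_auc_score(y_true, y_prob)

# Precision for the success class: of all predicted successes, how many were real?
success_precision = precision_score(y_true, y_pred, pos_label=1)

print(f"ROC AUC: {auc:.3f}")
print(f"Precision (success): {success_precision:.2f}")
```

In this toy example, four campaigns are predicted to succeed and three of them actually do, so the precision for the success class is 0.75. The one false positive is exactly the kind of error that hurts a creator who trusted the prediction.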

The data for this analysis came from the Web Robots site, which compiles Kickstarter data on an ongoing basis. I examined the period from April 2018 through March 2019, a full year of successful and failed campaigns.

An example of a wildly successful Kickstarter campaign.

I used only information that would be available when a campaign is launched, such as:

the description of the campaign (used as length in words)
campaign length (in days)
whether or not the campaign has Kickstarter endorsement
the goal in USD
location of the campaign (US-based or not)
the category of the campaign
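The list above could be turned into model inputs roughly as follows. This is a sketch under assumptions: the column names (`blurb`, `launched_at`, `deadline`, `staff_pick`, `goal_usd`, `country`, `category`) and the three sample rows are hypothetical stand-ins, and the actual preprocessing in this project may differ. Dropping the art column makes art the reference category for the remaining category indicators.

```python
import pandas as pd

# Hypothetical raw rows; the column names are assumptions, not the real schema.
raw = pd.DataFrame({
    "blurb": [
        "A card game about cats",
        "Hand-bound journals for travellers and dreamers",
        "Large-scale murals for the city library",
    ],
    "launched_at": pd.to_datetime(["2018-04-01", "2018-05-10", "2018-06-01"]),
    "deadline": pd.to_datetime(["2018-05-01", "2018-06-24", "2018-07-01"]),
    "staff_pick": [True, False, False],   # assumed flag for Kickstarter endorsement
    "goal_usd": [5000.0, 1200.0, 20000.0],
    "country": ["US", "GB", "US"],
    "category": ["games", "crafts", "art"],
})

features = pd.DataFrame({
    # Description length in words.
    "blurb_words": raw["blurb"].str.split().str.len(),
    # Campaign length in days.
    "campaign_days": (raw["deadline"] - raw["launched_at"]).dt.days,
    # Endorsement and location as 0/1 indicators.
    "staff_pick": raw["staff_pick"].astype(int),
    "goal_usd": raw["goal_usd"],
    "is_us": (raw["country"] == "US").astype(int),
})

# One indicator column per category, with art dropped as the reference level.
category_dummies = pd.get_dummies(raw["category"], prefix="cat").drop(columns="cat_art")
features = features.join(category_dummies)

print(features)
```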

Slightly more than half of campaigns within my dataset were successful (54% success, 46% failure), so the classes were fairly well balanced.

In an initial evaluation of several classification algorithms, with the Area Under the Curve (AUC) as my main evaluation metric, XGBoost and Logistic Regression were the top performers.
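A comparison like this can be sketched with cross-validated ROC AUC in scikit-learn. Everything below is illustrative: the data is synthetic (generated with roughly the 46/54 class split mentioned above), the candidate list is an assumption, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs no extra dependency.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; NOT the Kickstarter dataset.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4,
    weights=[0.46, 0.54], random_state=0,
)

# Assumed candidates; gradient boosting here is a stand-in for XGBoost.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean ROC AUC over 5 cross-validation folds for each candidate.
auc_by_model = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

for name, auc in sorted(auc_by_model.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```

Scoring every candidate on the same folds with the same metric keeps the comparison fair. The ranking on synthetic data says nothing about the ranking on the real campaigns.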

The Receiver Operating Characteristic (ROC) curves for the models used in the analysis.

Double-checking the precision score for the success class, to make sure we are not predicting too many successes that turn out to be failures, confirms XGBoost as the overall top performer, with a precision score of 0.71.

Results for the XGBoost model performance on the test dataset.

When I ran the selected model, XGBoost, on the test data, it performed consistently, which suggests the model was neither overfitting nor underfitting. The precision score for the success class increased slightly, from 0.71 to 0.72.

After settling on the best model for this analysis, I reviewed which features matter most for a successful campaign.

Using SHAP values, which measure how much each feature pushes an individual prediction up or down, we can examine which attributes have higher importance for campaign success.

Feature importance for XGBoost Kickstarter model.

The color indicates the magnitude of the feature value, and the direction indicates a positive or negative impact on the model outcome. The splotchy appearance comes from many data points clustered together in those areas. For example, having a small goal has positively affected many campaigns, while having a large goal has been detrimental to a smaller proportion of campaigns. Being endorsed by Kickstarter makes a big difference for the campaigns that have it, but not getting it doesn’t hurt that much.

The categories here are compared against art as the baseline, so we can conclude that design and games get more traction than art, while journalism and crafts appear to hurt a campaign’s odds of success.
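To show what a SHAP value is underneath the plot, here is a brute-force sketch of exact Shapley values for a single prediction. It is not the `shap` package and not the model from this project: `toy_model`, its coefficients, and the random background data are all invented for illustration. Each feature's value is its average marginal contribution to the prediction across all orders in which features could be "switched on".

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(predict, x, background):
    """Exact Shapley values for one row `x`, relative to `background` rows."""
    n_features = len(x)

    def value(subset):
        # Features in `subset` are fixed to their values in x;
        # the remaining features keep their background values.
        data = background.copy()
        cols = list(subset)
        if cols:
            data[:, cols] = x[cols]
        return predict(data).mean()

    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            # Standard Shapley weight for a coalition of this size.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


def toy_model(data):
    """Hypothetical score from (scaled goal, endorsement flag, scaled duration)."""
    goal, staff_pick, days = data[:, 0], data[:, 1], data[:, 2]
    return 0.6 - 0.3 * goal + 0.25 * staff_pick - 0.1 * days * (1 - staff_pick)


rng = np.random.default_rng(0)
background = rng.random((200, 3))
background[:, 1] = (background[:, 1] > 0.9).astype(float)  # endorsement is rare

x = np.array([0.1, 1.0, 0.5])  # small goal, endorsed, medium duration
phi = shapley_values(toy_model, x, background)
baseline = toy_model(background).mean()
prediction = toy_model(x[None, :])[0]

print("Shapley values:", phi)
print("baseline + sum of values:", baseline + phi.sum(), "prediction:", prediction)
```

The values always add up to the gap between this prediction and the average prediction, which is why each dot in a SHAP summary plot can be read as a signed push on the model output. Enumerating every feature subset is only feasible for a handful of features; the `shap` package uses much faster algorithms for tree models.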

To demonstrate this predictive algorithm at work, I developed a Flask app using the logistic regression model, the second-highest-performing algorithm, which proved easier and faster to deploy than XGBoost.
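A minimal sketch of what such an app can look like is below. It is not the deployed app: the `/predict` route, the feature names, and especially the coefficients are placeholders invented for illustration. It does hint at why logistic regression is easy to deploy, since scoring needs only an intercept, a handful of weights, and a sigmoid.

```python
import math

from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder values for illustration only; NOT the fitted model from this project.
INTERCEPT = 0.2
COEFFICIENTS = {
    "log_goal_usd": -0.4,
    "campaign_days": -0.02,
    "staff_pick": 1.5,
    "is_us": 0.1,
}


def predict_success_probability(features):
    """Logistic regression scoring: sigmoid of the intercept plus weighted features."""
    z = INTERCEPT + sum(
        weight * float(features[name]) for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    # Reject requests that do not supply every feature the model expects.
    missing = [name for name in COEFFICIENTS if name not in payload]
    if missing:
        return jsonify({"error": f"missing features: {missing}"}), 400
    return jsonify({"success_probability": predict_success_probability(payload)})
```

The endpoint can be exercised without starting a server by using Flask's built-in test client, for example `app.test_client().post("/predict", json={"log_goal_usd": 8.5, "campaign_days": 30, "staff_pick": 1, "is_us": 1})`.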

The app itself can be found here — feel free to play around with your favorite Kickstarter campaign and let me know how it turns out!

In conclusion, I recommend that creators do the following to improve their campaigns’ chances of success:

Set a small goal in USD that fits the scope of the campaign
Don’t stretch the campaign past 30 days
Consider the category carefully
Get endorsed by Kickstarter if possible

Going forward, I’d like to deploy the XGBoost model as a Heroku app with more functionality and visual appeal, look into how much successful campaigns raise and whether setting stretch goals is beneficial, and dive into the ongoing challenge of campaign fulfillment and how it can be improved.

All the materials for this project can be found on my GitHub page.

Natasha Borders, MBA
LinkedIn: @natashaborders

Written by Natasha Borders