
lazy-text-predict

An open-source library for quick and easy text classification

CLASSIFY TEXT WITH AUTO-ML

Photo by Geralt via Pixabay (CC0)

Introduction and Motivation

There are lots of great text classification packages out there that can help you solve a business problem, such as deciding the sentiment of a sentence. However, there is a complication. Many tasks are custom, so there is no pre-trained, off-the-shelf model you can use out of the box. Building a custom text classification model is very challenging for beginners. The differences between machine learning packages are often subtle, so choosing among the possible tools takes significant experience, and implementing the chosen solution on your own dataset takes technical skill as well. This keeps the uninitiated from moving forward.

If you want to skip straight to the code, jump to the Quickstart section below.

AutoML to the Rescue

We are a team of machine learning developers. Recognizing the challenges faced by the developer community, we asked: "Is there a way for developers to easily choose between different text classification options without having to become machine learning experts?" Having found no easy solution for text datasets, we decided to build one and share it with the developer community. We call it lazy-text-predict. With this tool, you can test your dataset on a variety of text classification models, then further train the best one into a solution customized for your specific needs!

We were inspired to build this project by the lazypredict project and other AutoML solutions.

lazy-text-predict is a toolkit that lets you load your data, train different text classification models on it, and compare their performance so you can choose the best tool to deploy in your situation. We have selected several deep learning models (e.g., transformers) and count-vectorizer-based models (scikit-learn pipelines) that can handle a variety of text classification tasks. Getting started is pretty simple.

Quickstart

Getting started with this tool is really easy! You only need to run one command and type five lines of code:

First, install the package:

pip install lazy-text-predict

Next, use the code:

The code above trains a series of models to predict the label "Y" from the input text "X" and reports the performance of each model type. You can see a summary of the trial with "trial.print_metrics_table()" and try the models on custom text you make up on the fly. For example:

trial.predict("This movie was so good that I sold my house to buy tickets to see it")
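The workflow described above (train several models on "X" and "Y", score each one, then try them all on a sentence you make up) can be sketched without the library. The snippet below is a minimal, standard-library-only illustration of the count-vectorizer idea, not lazy-text-predict's code: the toy reviews, the "pos"/"neg" labels, and the names `count_vector`, `NaiveBayes`, `NearestCentroid`, `compare` and `predict_all` are all hypothetical, and the two classifiers (multinomial Naive Bayes and a cosine nearest-centroid rule) merely stand in for the scikit-learn pipelines and transformers the tool trains.

```python
import math
import re
from collections import Counter


def count_vector(text):
    # Crude count vectorizer: lowercase word tokens mapped to their counts
    return Counter(re.findall(r"[a-z']+", text.lower()))


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, X, Y):
        self.priors = Counter(Y)  # documents per label
        self.counts = {y: Counter() for y in self.priors}
        for text, y in zip(X, Y):
            self.counts[y].update(count_vector(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        words = count_vector(text)

        def log_score(y):
            denom = sum(self.counts[y].values()) + len(self.vocab)
            # log(count) differs from log(prior) by a constant, so the argmax is unchanged
            return math.log(self.priors[y]) + sum(
                n * math.log((self.counts[y][w] + 1) / denom)
                for w, n in words.items()
                if w in self.vocab  # words never seen in training are skipped
            )

        return max(sorted(self.priors), key=log_score)


class NearestCentroid:
    """Pick the label whose summed count vector is most cosine-similar to the text."""

    def fit(self, X, Y):
        self.centroids = {}
        for text, y in zip(X, Y):
            # Summing rather than averaging does not change cosine similarity
            self.centroids.setdefault(y, Counter()).update(count_vector(text))
        return self

    def predict(self, text):
        v = count_vector(text)

        def cosine(y):
            c = self.centroids[y]
            dot = sum(n * c[w] for w, n in v.items())
            norms = math.sqrt(sum(n * n for n in v.values())) * math.sqrt(
                sum(n * n for n in c.values()))
            return dot / norms if norms else 0.0

        return max(sorted(self.centroids), key=cosine)


def compare(models, X_test, Y_test):
    # Accuracy of every fitted model on the same held-out examples
    return {name: sum(m.predict(x) == y for x, y in zip(X_test, Y_test)) / len(Y_test)
            for name, m in models.items()}


def predict_all(models, text):
    # One prediction per model, for a sentence you make up on the fly
    return {name: m.predict(text) for name, m in models.items()}


X_train = ["a wonderful, moving film", "great acting and a good story",
           "so good, I loved every minute", "a boring, predictable mess",
           "terrible acting and a dull story", "so bad, I hated every minute"]
Y_train = ["pos", "pos", "pos", "neg", "neg", "neg"]
models = {"naive_bayes": NaiveBayes().fit(X_train, Y_train),
          "nearest_centroid": NearestCentroid().fit(X_train, Y_train)}

scores = compare(models, ["a great, moving story", "a dull and boring film"], ["pos", "neg"])
print(scores)  # {'naive_bayes': 1.0, 'nearest_centroid': 1.0}
preds = predict_all(models, "This movie was so good that I sold my house to buy tickets to see it")
print(preds)  # {'naive_bayes': 'pos', 'nearest_centroid': 'pos'}
```

With a held-out set this small the accuracies mean little; the point is the pattern of fitting every candidate on the same training data and scoring them on the same held-out examples, which is what makes the comparison fair.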

You can load your own dataset into "X" and "Y" and train the models yourself. Once you have done this, you can train your favorite model more rigorously and then export it for use in your own code (which this tool also facilitates). We have prepared several annotated examples showing how to do this, and our docs give pretty clear instructions as well (we hope!). Here is a link to the documentation:

https://github.com/lemay-ai/lazyTextPredict
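Exporting a model boils down to serializing whatever the training step learned so that another program can reload it. As a minimal sketch, assuming a hypothetical lexicon model stored as a plain dict (the weights below are hand-picked, not learned, and `predict`, `export_model` and `load_model` are illustrative names, not the tool's export API), a JSON round trip looks like this:

```python
import json
import os
import tempfile


def predict(model, text):
    # Sum the weights of known words; unknown words contribute nothing
    score = sum(model["weights"].get(w, 0.0) for w in text.lower().split())
    return model["labels"][1] if score > model["threshold"] else model["labels"][0]


def export_model(model, path):
    # Write the model's parameters to a JSON file another program can read
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)


def load_model(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical stand-in for a trained model: hand-picked word weights, not learned ones
model = {"weights": {"good": 1.0, "great": 1.0, "boring": -1.0, "dull": -1.0},
         "threshold": 0.0, "labels": ["neg", "pos"]}

path = os.path.join(tempfile.mkdtemp(), "best_model.json")
export_model(model, path)
restored = load_model(path)
label = predict(restored, "This movie was so good that I sold my house to buy tickets to see it")
print(label)  # pos
```

JSON keeps the file human-readable and safe to load. Objects that are not plain data, such as a fitted scikit-learn pipeline, are usually persisted with pickle or joblib instead (load those only from trusted files), while Hugging Face transformers models provide `save_pretrained` and `from_pretrained`.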

Other Tools

We think our tool is a great way to get started with NLP, and we would love for you to use it. But if you really want to dig deep into this exciting field, there are several other resources you should check out:

Hugging Face: The team at Hugging Face curates a great selection of NLP models and packages for deploying them. We have incorporated many of their models into this tool. Check them out!

Scikit-learn: scikit-learn is one of the most popular machine learning and data science libraries today, so it was only natural to include it in our tool. Its documentation is a great resource for learning more about machine learning in general, with lots of useful tutorials on implementing NLP and other machine learning pipelines.

What’s Next?

lazy-text-predict is an open-source project that is continually being improved, and we are interested in collaborating with the community on it. If you would like to take part, please contact us at [email protected]

Special thanks to the dev team that contributed to the first release of this project and to this post.

-Daniel Lemay.ai [email protected]

