Docker image for NLP

A ready-to-use solution

Vitaly Davydov
Towards Data Science
3 min read · Jan 14, 2019

NLP is one of the main directions of our work at Poteha Labs. We do text analysis, chatbot development and information retrieval, so we regularly use Flair, Natasha, TensorFlow, PyTorch and NLTK, sometimes working with languages other than English. Existing solutions are not always suitable for every problem we face: some are difficult to launch, others are too complicated and heavy.

Consequently, we’ve compiled our own Docker image with all the NLP frameworks we find convenient, including deep learning ones. It covers almost 80% of our tasks and saves the time otherwise spent installing the frameworks.

Current approaches

Generally, data science solutions are now deployed in one of two ways:

  1. On bare metal. You have to spend a lot of time setting up CUDA and cuDNN and installing drivers for Ubuntu, and then, having succeeded, make plenty of attempts to get it all running together. Something will definitely go wrong.
  2. Using Docker (a simpler approach). However, a regular Docker image won’t do: you need a specific one in which everything related to the GPU is preconfigured. Docker Hub already hosts several ready-made images built with different versions of CUDA, cuDNN and other modules.

How do you deploy a GPU setup in Docker? First, choose a base image that matches your graphics card (search by tag at hub.docker.com/nvidia). When using a GPU, everything is ultimately inherited from an Nvidia image of the required version (for CPU-only work, by contrast, any convenient image will do). Then inherit from that base image, build your own on top of it, and run it. The whole image will weigh around 3 GB, but everything will work fine.
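
Once the container is running, a quick sanity check from Python helps confirm that the GPU wiring actually works. A minimal sketch, assuming PyTorch is installed in the image (as it is in the one described below):

    # GPU sanity check from inside the container
    import torch

    print(torch.cuda.is_available())   # True if the container sees a GPU
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # name of the first visible device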

The solution

Having gone through all these difficulties ourselves, we have created a production Docker image for NLP (source code), which is free and available to everyone. A ready-made image is on our Docker Hub. It contains a handful of modern NLP frameworks, including ones for deep learning. Out of the box it includes torch, flair, spacy, dateparser, pymorphy2, yargy, natasha, nltk and yake (versions). In more detail:

  • flair is a state-of-the-art NLP library which provides convenient NER, PoS tagging, word sense disambiguation and classification.
  • natasha is a module for NER in Russian.
  • yargy is a Russian language parser.
  • yake is an automatic keyword extractor. Its main features are an unsupervised approach and corpus, domain and language independence (flair and yake are both shown in the sketch after this list).
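
To give a flavour of the bundled libraries, here is a minimal usage sketch; the sample sentences, the "ner" model name and the keyword settings are illustrative, and the exact APIs may differ slightly between the versions pinned in the image:

    # Named entity recognition with flair
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load("ner")  # loads a pretrained English NER model
    sentence = Sentence("George Washington went to Washington.")
    tagger.predict(sentence)
    for entity in sentence.get_spans("ner"):
        print(entity)  # tagged spans such as PER and LOC

    # Unsupervised keyword extraction with yake
    import yake

    extractor = yake.KeywordExtractor(lan="en", n=2, top=5)  # up to 2-word keywords
    text = "Docker images make it easy to ship NLP models to production."
    for kw in extractor.extract_keywords(text):
        print(kw)  # (keyword, score) pairs; lower score means more relevant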

Four simple steps to install and run the image:

  1. Clone the repo.
  2. Build the image: docker build -t nlp-cpu -f ./Dockerfile.cpu . (or, for the GPU version, docker build -t nlp-gpu -f ./Dockerfile.gpu .)
  3. Run it: docker run -it --runtime=nvidia iwitaly/nlp:gpu nvidia-smi
  4. You can also pass the CUDA_VISIBLE_DEVICES environment variable to limit which GPUs the container sees, e.g. docker run -it --runtime=nvidia -e CUDA_VISIBLE_DEVICES=0 iwitaly/nlp:gpu nvidia-smi (see the sketch after this list).
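
To double-check that the variable is respected, you can count the devices visible to PyTorch inside the container. A minimal sketch (the docker invocation in the comment assumes the GPU image from step 3):

    # Start the container with a restricted device list, e.g.:
    #   docker run -it --runtime=nvidia -e CUDA_VISIBLE_DEVICES=0 iwitaly/nlp:gpu python
    import torch

    # With CUDA_VISIBLE_DEVICES=0 this prints 1, no matter how many
    # GPUs the host machine actually has.
    print(torch.cuda.device_count())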

With the help of this Docker image you can save time on deployment: launch it quickly and get straight to the NLP itself. We hope it will simplify your processes at least a little.

If you have any questions about the installation, please leave a comment here or contact me directly. Also, feel free to fork the repo and modify the original files.

Thank you for reading! Please ask your questions, leave comments and stay tuned!

