Serverless Vector Search with txtai

Build a serverless txtai instance with Kubernetes and Knative

David Mezzetti
NeuML

--


Serverless application development is a popular way for developers to skip over the complexity of servers and focus on delivering products to their users. All the popular cloud providers offer a form of function as a service (FaaS); examples include AWS Lambda, Google Cloud Functions and Azure Functions. Pricing is based on execution time and memory consumed, so there are significant savings to be had for low to medium frequency functions.

txtai is an open-source platform for semantic search and workflows powered by language models. txtai supports building YAML-configured applications with a “build once, run anywhere” paradigm.
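
For example, the same config.yml that drives a hosted API instance can also be loaded as an embedded application. A minimal sketch, assuming txtai is installed locally (pip install txtai) and a config.yml like the one created later in this article:

# Sketch: run a YAML-configured txtai application embedded in Python
from txtai.app import Application

# Load the same configuration that powers the API service
app = Application("config.yml")

# Add documents, build the index and search, mirroring the API calls below
app.add([{"id": 0, "text": "US tops 5 million confirmed virus cases"}])
app.index()
print(app.search("virus", limit=1))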


This article will set up a txtai vector search instance (also known as semantic/similarity/neural search) on a Kubernetes-based environment with Knative. This approach is cloud-agnostic, enabling serverless deployments on any cloud provider. Kubernetes also supports GPU instances, which are needed to achieve the fastest model inference times.
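
As a quick illustration, vector search encodes text into embeddings and ranks results by semantic similarity rather than keyword overlap. A minimal sketch with txtai's Embeddings class and the same model used later in this article (assuming pip install txtai):

# Sketch: vector search ranks results by semantic similarity, not keywords
from txtai.embeddings import Embeddings

embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2"})

# Index (id, text, tags) tuples, then search
embeddings.index([(0, "US tops 5 million confirmed virus cases", None)])

# Returns (id, score) matches
print(embeddings.search("pandemic numbers", 1))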

Note: For those interested in an AWS Lambda-based example, check out this documentation.

Install Kubernetes

The first step is installing a local Kubernetes cluster. There are a number of options available, including kind, minikube, MicroK8s and K3s. This article uses kind but others could be swapped in. If there is already a Kubernetes cluster available, this step can be skipped.

The kind documentation is great; follow the instructions here. If everything worked, the following should return a version similar to below.

$ kind version
kind v0.11.1 go1.16.4 linux/amd64

Install Knative

Next, we’ll install Knative. Per Knative’s website:

Knative is an Open-Source Enterprise-level solution to build Serverless and Event Driven Applications

Basically, Knative makes building serverless applications on Kubernetes easier.

Note: If installing on an existing Kubernetes cluster, skip the rest of this section and use these install instructions instead.

As with kind, Knative provides solid documentation. Follow the instructions here to install the Knative CLI (kn); the quickstart steps below then install a Knative instance into a Kubernetes cluster.

Once the Knative CLI is installed, do the following:

  1. Install the base kind cluster.
$ kind create cluster
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂

2. Install Knative into a local cluster via the quickstart plugin.

$ kn quickstart kind
Running Knative Quickstart using Kind
✅ Checking dependencies...
Kind version is: 0.11.1
☸ Creating Kind cluster...
Creating cluster "knative" ...
✓ Ensuring node image (kindest/node:v1.22.4) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Waiting ≤ 2m0s for control-plane = Ready ⏳
• Ready after 18s 💚
Set kubectl context to "kind-knative"
You can now use your cluster with:
kubectl cluster-info --context kind-knative
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
🍿 Installing Knative Serving v1.2.0 ...
CRDs installed...
Core installed...
Finished installing Knative Serving
🕸️ Installing Kourier networking layer v1.2.0 ...
Kourier installed...
Ingress patched...
Finished installing Kourier Networking layer
🕸 Configuring Kourier for Kind...
Kourier service installed...
Domain DNS set up...
Finished configuring Kourier
🔥 Installing Knative Eventing v1.2.0 ...
CRDs installed...
Core installed...
In-memory channel installed...
Mt-channel broker installed...
Example broker installed...
Finished installing Knative Eventing
🚀 Knative install took: 1m38s
🎉 Now have some fun with Serverless and Event Driven Apps!

3. If everything worked, the following should return two clusters similar to below.

$ kind get clusters
kind
knative

Build txtai image

Next we'll build an image bundled with txtai, its dependencies, configuration and a cached model.

  1. Create a new working directory.
$ mkdir app && cd app

2. Create a config.yml file. This creates an in-memory index.

# config.yml
writable: true

embeddings:
  path: sentence-transformers/nli-mpnet-base-v2
  content: true

3. Create a Dockerfile. This Dockerfile can also be downloaded from GitHub.

# Set base image
ARG BASE_IMAGE=neuml/txtai-cpu
FROM $BASE_IMAGE

# Copy configuration
COPY config.yml .

# Run local API instance to cache models in container
RUN python -c "from txtai.api import API; API('config.yml', False)"

# Start server and listen on all interfaces
ENV CONFIG "config.yml"
ENTRYPOINT ["uvicorn", "--host", "0.0.0.0", "txtai.api:app"]

4. Build the image.

Note: The tag prefix is important, as it prevents Knative from attempting to pull the image from Docker Hub.

$ docker build -t dev.local/txtai:v1 .
Sending build context to Docker daemon 3.072kB
Step 1/6 : ARG BASE_IMAGE=neuml/txtai-cpu
Step 2/6 : FROM $BASE_IMAGE
---> 894803a0dc04
Step 3/6 : COPY config.yml .
---> b4b01e846285
Step 4/6 : RUN python -c "from txtai.api import API; API('config.yml', False)"
---> Running in 2407b608e382
Downloading: 100%|██████████| 587/587 [00:00<00:00, 570kB/s]
Downloading: 100%|██████████| 418M/418M [00:04<00:00, 90.3MB/s]
Downloading: 100%|██████████| 1.16k/1.16k [00:00<00:00, 505kB/s]
Downloading: 100%|██████████| 226k/226k [00:00<00:00, 5.47MB/s]
Downloading: 100%|██████████| 455k/455k [00:00<00:00, 10.0MB/s]
Downloading: 100%|██████████| 239/239 [00:00<00:00, 104kB/s]
Removing intermediate container 2407b608e382
---> f35b58edfef9
Step 5/6 : ENV CONFIG "config.yml"
---> Running in fdfaa1596467
Removing intermediate container fdfaa1596467
---> b7496b4daea7
Step 6/6 : ENTRYPOINT ["uvicorn", "--host", "0.0.0.0", "txtai.api:app"]
---> Running in 9a70ab36501c
Removing intermediate container 9a70ab36501c
---> a9901b24be98
Successfully built a9901b24be98
Successfully tagged dev.local/txtai:v1

5. Add the image to the Knative cluster (skip if installing into an existing cluster).

$ kind load docker-image --name knative dev.local/txtai:v1
Image: "dev.local/txtai:v1" with ID "sha256:a9901b24be986571178eff9471d875d6002c023dbc55353cd95171d832b7c851" not yet present on node "knative-control-plane", loading...

6. Create the Knative txtai service.

$ kn service create txtai --image dev.local/txtai:v1 --port 8000 --scale-max 1
Creating service 'txtai' in namespace 'default':
0.020s The Route is still working to reflect the latest desired specification.
0.032s ...
0.045s Configuration "txtai" is waiting for a Revision to become ready.
11.200s ...
11.227s Ingress has not yet been reconciled.
11.273s Waiting for load balancer to be ready
11.540s Ready to serve.
Service 'txtai' created to latest revision 'txtai-00001' is available at URL:
http://txtai.default.127.0.0.1.sslip.io

Test the service

txtai has a number of language bindings available to work with the API. To keep things simple, we'll interact with txtai via cURL (a Python equivalent is sketched at the end of this section).

  1. Add data to the index.
$ curl -XPOST "http://txtai.default.127.0.0.1.sslip.io/add" -H "Content-Type: application/json" \
--data-binary @- << EOF
[{"id": 0, "text": "US tops 5 million confirmed virus cases"},
{"id": 1, "text": "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg"},
{"id": 2, "text": "Beijing mobilises invasion craft along coast as Taiwan tensions escalate"},
{"id": 3, "text": "The National Park Service warns against sacrificing slower friends in a bear attack"},
{"id": 4, "text": "Maine man wins \$1M from \$25 lottery ticket"},
{"id": 5, "text": "Make huge profits without work, earn up to $100,000 a day"}]
EOF

2. Index the data.

$ curl "http://txtai.default.127.0.0.1.sslip.io/index"

3. Run a search.

$ curl "http://txtai.default.127.0.0.1.sslip.io/search?query=feel+good+story&limit=1"
[{
"id":"4",
"text":"Maine man wins $1M from $25 lottery ticket",
"score":0.08329004049301147
}]

Ok, time to take a breath. We’ve been running commands for a while. But we just built an embeddings index and successfully executed a search 🎉
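
For reference, the same add/index/search calls can be made from any language with an HTTP client. A minimal sketch in Python with the requests library, using the service URL printed by kn above:

# Sketch: the same workflow as the cURL commands above, via Python requests
import requests

url = "http://txtai.default.127.0.0.1.sslip.io"

documents = [
    {"id": 0, "text": "US tops 5 million confirmed virus cases"},
    {"id": 4, "text": "Maine man wins $1M from $25 lottery ticket"},
]

# Add data, build the index and run a search
requests.post(f"{url}/add", json=documents)
requests.get(f"{url}/index")
print(requests.get(f"{url}/search", params={"query": "feel good story", "limit": 1}).json())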

Add persistent storage

One important thing about Knative: services are scaled up and down on demand. If you wait a minute and then run the following:

$ curl "http://txtai.default.127.0.0.1.sslip.io/count"
0

The count is now 0? Yes. Due to inactivity, the txtai service was terminated and the in-memory index was lost with it. We could overcome this by forcing an instance to always run (for example, with kn's --scale-min flag), but that is wasteful. Next, we'll cover how to persist index data.

  1. Keeping things local, we’ll add a local S3 instance with LocalStack.
$ docker run -p 4566:4566 --rm -it localstack/localstack
2022-02-28 18:21:39,477 INFO supervisord started with pid 14
2022-02-28 18:21:40,481 INFO spawned: 'infra' with pid 20
2022-02-28 18:21:41,620 INFO success: infra entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
LocalStack version: 0.14.0
LocalStack build date: 2022-02-15
LocalStack build git hash: 8db17b52
Starting edge router (https port 4566)...
Ready.
[2022-02-28 18:21:41 +0000] [20] [INFO] Running on https://0.0.0.0:4566 (CTRL + C to quit)
2022-02-28T18:21:41.803:INFO:hypercorn.error: Running on https://0.0.0.0:4566 (CTRL + C to quit)

2. Then we'll edit our Kubernetes cluster to be able to talk to LocalStack. First, let's get the cluster network's IP.

$ docker container ls -f "name=knative" 
CONTAINER ID IMAGE
949b4e6bdda5 kindest/node:v1.22.4
$ docker inspect 949b4e6bdda5 | grep IPAddress
"IPAddress": "172.19.0.3",

The container's IP is 172.19.0.3, so the Docker network gateway, which is the host IP from inside the cluster, is 172.19.0.1.

$ kubectl edit cm coredns -n kube-system

This brings up the CoreDNS ConfigMap in an editor. Add the following section right after the line containing "prometheus :9153". It maps localhost.localstack.cloud to the Docker network gateway, so pods in the cluster can reach the LocalStack instance running on the host.

hosts {
  172.19.0.1 localhost.localstack.cloud
  fallthrough
}

3. Restart the DNS service.

$ kubectl rollout restart -n kube-system deployment/coredns
deployment.apps/coredns restarted

4. Now we need to edit the txtai config.yml.

# config.yml
writable: true
path: /tmp/index.tar.gz

cloud:
  provider: s3
  container: index
  key: ""
  secret: ""
  host: localhost.localstack.cloud
  port: 4566

embeddings:
  path: sentence-transformers/nli-mpnet-base-v2
  content: true

This configuration persists the index to the local file system (/tmp/index.tar.gz) and syncs it to the LocalStack S3 bucket. We'll verify that sync directly at the end of this section.

5. Rebuild the txtai image

Note: This command and the following commands are re-running what we’ve done before to rebuild the txtai service with new settings.

$ docker build -t dev.local/txtai:v1 .
Sending build context to Docker daemon 3.072kB
Step 1/6 : ARG BASE_IMAGE=neuml/txtai-cpu
Step 2/6 : FROM $BASE_IMAGE
---> 894803a0dc04
Step 3/6 : COPY config.yml .
---> 5c88e1e649b0
Step 4/6 : RUN python -c "from txtai.api import API; API('config.yml', False)"
---> Running in dad9eb8ad91f
Downloading: 100%|██████████| 587/587 [00:00<00:00, 249kB/s]
Downloading: 100%|██████████| 418M/418M [00:04<00:00, 89.3MB/s]
Downloading: 100%|██████████| 1.16k/1.16k [00:00<00:00, 582kB/s]
Downloading: 100%|██████████| 226k/226k [00:00<00:00, 6.90MB/s]
Downloading: 100%|██████████| 455k/455k [00:00<00:00, 9.82MB/s]
Downloading: 100%|██████████| 239/239 [00:00<00:00, 111kB/s]
Removing intermediate container dad9eb8ad91f
---> 91eb73e6756b
Step 5/6 : ENV CONFIG "config.yml"
---> Running in 49573ceb2fbf
Removing intermediate container 49573ceb2fbf
---> 95bb5499b335
Step 6/6 : ENTRYPOINT ["uvicorn", "--host", "0.0.0.0", "txtai.api:app"]
---> Running in fca01a3b9d52
Removing intermediate container fca01a3b9d52
---> fcf0db204af0
Successfully built fcf0db204af0
Successfully tagged dev.local/txtai:v1

6. Replace the image on the Knative cluster (skip if installing into an existing cluster).

$ kind load docker-image --name knative dev.local/txtai:v1
Image: "dev.local/txtai:v1" with ID "sha256:fcf0db204af0c38dc9ebd5779e077d1d9fb0a05ae64642d5a3595024220a270d" not yet present on node "knative-control-plane", loading...

7. Recreate the Knative txtai service.

$ kn service create txtai --image dev.local/txtai:v1 --port 8000 --scale-max 1 --force
Replacing service 'txtai' in namespace 'default':
0.039s The Configuration is still working to reflect the latest desired specification.
11.192s Traffic is not yet migrated to the latest revision.
11.231s Ingress has not yet been reconciled.
11.263s Waiting for load balancer to be ready
11.537s Ready to serve.
Service 'txtai' replaced to latest revision 'txtai-00002' is available at URL:
http://txtai.default.127.0.0.1.sslip.io

8. Test the service.

$ curl -XPOST "http://txtai.default.127.0.0.1.sslip.io/add" -H "Content-Type: application/json" \
--data-binary @- << EOF
[{"id": 0, "text": "US tops 5 million confirmed virus cases"},
{"id": 1, "text": "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg"},
{"id": 2, "text": "Beijing mobilises invasion craft along coast as Taiwan tensions escalate"},
{"id": 3, "text": "The National Park Service warns against sacrificing slower friends in a bear attack"},
{"id": 4, "text": "Maine man wins \$1M from \$25 lottery ticket"},
{"id": 5, "text": "Make huge profits without work, earn up to $100,000 a day"}]
EOF
$ curl "http://txtai.default.127.0.0.1.sslip.io/index"$ curl "http://txtai.default.127.0.0.1.sslip.io/search?query=feel+good+story&limit=1"
[{
"id":"4",
"text":"Maine man wins $1M from $25 lottery ticket",
"score":0.08329004049301147
}]

9. Wait for the service to terminate.

$ kubectl get pod -l serving.knative.dev/service=txtai -w
NAME READY STATUS RESTARTS AGE
txtai-00002-deployment-745c77586f-ppd8c 2/2 Running 0 42s
txtai-00002-deployment-745c77586f-ppd8c 2/2 Terminating 0 92s

Now run again.

$ curl "http://txtai.default.127.0.0.1.sslip.io/search?query=feel+good+story&limit=1"
[{
"id":"4",
"text":"Maine man wins $1M from $25 lottery ticket",
"score":0.08329004049301147
}]

The index persisted through the restart because it's synced to cloud storage (in this case LocalStack, but it can be any supported cloud provider; read more here).
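
To double-check the sync, the archive can be inspected directly in LocalStack. A sketch with boto3, assuming the edge port accepts plain HTTP and placeholder credentials (LocalStack doesn't validate them); the bucket name matches the container setting in config.yml:

# Sketch: list the index archive synced to the LocalStack "index" bucket
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost.localstack.cloud:4566",
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)

for obj in s3.list_objects_v2(Bucket="index").get("Contents", []):
    print(obj["Key"], obj["Size"])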

Wrapping up

This article demonstrated how a serverless txtai vector search instance can be run on Kubernetes. While this example covered an embeddings index, the same concepts can be applied to build a scalable translation, summarization or workflow service with txtai.
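
For instance, the building block for a translation service is a single pipeline call. A minimal sketch, assuming txtai's Translation pipeline (the first call downloads the underlying models):

# Sketch: translation pipeline that could back a serverless translation service
from txtai.pipeline import Translation

# Translate text to Spanish
translate = Translation()
print(translate("Serverless vector search", "es"))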

The ☁️ is the limit with how this approach can be applied to other problems!
