The world’s leading publication for data science, AI, and ML professionals.

Using AI to make my own Smart Assistant App

Smart Assistant App made using Python and GPT-3

Applied Machine Learning

Image by maxuser in Shutterbox

Smart assistants are becoming more and more popular, with products like Apple’s Siri, Amazon’s Alexa, and Google Home. In this article, I make my own smart assistant in Python.

Introduction

Smart assistants let you gather information from the web quickly through voice interaction. My goal was a web app with a Graphical User Interface through which I can interact with the smart assistant using my voice. In this article, I detail how I went about making this app, what tools I used, and the methods I applied. Many of the tools used for this little project can be applied to many others.

Method

In order to make the smart assistant, I use GPT-3, a massive language model trained by OpenAI. It is pre-trained and generalizes well to a wide variety of use cases.

My goal is to make an app through which I can interact with GPT-3 using my voice.

The overall flowchart of the app looks like this:

Image by Author

In order to pass my voice to OpenAI, I first need to convert my speech to text. So first I record my voice using the Streamlit app. I then encode the recording and pass it to AssemblyAI to be transcribed. I receive the transcription and pass it to OpenAI. I can then take the response and display it in the app.

To communicate with these services through the app, I made use of their APIs.

What is an API?

Application programming interfaces (APIs) are ways for software programs to talk to each other. To make that interaction work, the API provides a set of protocols and tools.

Both services have Python APIs, so I can easily integrate them into the app. The APIs allow me to access services, tools, and AI models developed and hosted by these companies. I send them a request describing what I want their models to process for me, and I get a response with the output of their pre-trained models.
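As a concrete sketch of what such a request looks like (the model name, prompt, and key below are placeholders, not the exact values I used), a GPT-3 completion request can be built with Python's standard library:

```python
import json
import urllib.request

API_KEY = "YOUR_OPENAI_API_KEY"  # placeholder: substitute a real key

# The body of the request: which model to run and what prompt to process.
payload = {
    "model": "text-davinci-003",  # a GPT-3 model
    "prompt": "Hello, who are you?",
    "max_tokens": 64,
}

# Build (but do not yet send) a POST request to the completions endpoint.
request = urllib.request.Request(
    "https://api.openai.com/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)

# response = urllib.request.urlopen(request)  # requires a valid API key
```

The response comes back as JSON containing the model's generated text.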

How do you use APIs?

To transcribe the audio in real time, I used the AssemblyAI WebSocket API. Unlike a standard HTTP connection, a WebSocket connection lets the app send and receive messages from the API server bi-directionally. This way I can transcribe the data live.

To establish the WebSocket connection, I used the asyncio and websockets libraries in Python. Here is how the code looked:

In the code above, I connect to the WebSocket URL and attempt to receive a confirmation of the connection (the session_begins message).

Once connected I can define the send and receive functions which will allow me to communicate with the APIs the way I want.

In the send function, I first gather my audio, base64-encode it into a UTF-8 string, and send it over the WebSocket. This kind of encoding is standard when sending audio, or any binary data, to text-based APIs.

In the receive function, I receive the transcribed text, send that text to OpenAI, and get back the response from GPT-3.
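A simplified sketch of these two coroutines is below. The function names and the ask_gpt3 callback are my own; AssemblyAI's real-time API marks completed utterances with the FinalTranscript message type:

```python
import asyncio
import base64
import json

def encode_chunk(chunk: bytes) -> str:
    # base64-encode raw audio bytes into a UTF-8 string for the JSON payload
    return base64.b64encode(chunk).decode("utf-8")

async def send_audio(ws, stream):
    # `stream` yields raw audio chunks from the microphone recording
    for chunk in stream:
        await ws.send(json.dumps({"audio_data": encode_chunk(chunk)}))

async def receive_responses(ws, ask_gpt3):
    # `ask_gpt3` is any callable that forwards a prompt to OpenAI
    # and returns GPT-3's reply
    while True:
        message = json.loads(await ws.recv())
        if message.get("message_type") == "SessionTerminated":
            break
        if message.get("message_type") == "FinalTranscript" and message.get("text"):
            reply = ask_gpt3(message["text"])
            print(f"You: {message['text']}")
            print(f"Bot: {reply}")
```

In the real app both coroutines run concurrently over the same WebSocket opened above.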

Building an App with Streamlit

Streamlit can be used to build web apps quickly. The goal of the app is to provide a Graphical User Interface (GUI) so that the user can interact with the smart assistant seamlessly, without writing any code.

I’ll now detail the structure of the code I used to make the app. I always wrap my Streamlit apps in a class. This is what the whole code structure looks like:

The app contains two buttons. The first is a record button, the second is a clear chat button.

The record button starts recording your voice, then transcribes and sends the text to OpenAI. The response is then shown on the app as a chat dialogue.

The app keeps a record of the dialogue: the chat is stored in .txt files. When the clear chat button is pressed, the .txt files are emptied.
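A minimal sketch of that logging logic, with hypothetical file names (the article only says the chat is kept in .txt files), could look like:

```python
from pathlib import Path

# Hypothetical file names for the two sides of the dialogue.
USER_LOG = Path("user_chat.txt")
BOT_LOG = Path("bot_chat.txt")

def log_message(path: Path, text: str) -> None:
    # append one line of dialogue to the given transcript file
    with path.open("a", encoding="utf-8") as f:
        f.write(text + "\n")

def clear_chat() -> None:
    # called when the clear chat button is pressed: empty both files
    for path in (USER_LOG, BOT_LOG):
        path.write_text("", encoding="utf-8")
```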

One final addition to the app, which I cannot show here, is text-to-speech. When the bot responds, it reads out the sentence. To do this I used the pyttsx3 library.

Interacting with the tool

Image by Author

Above is a screenshot of what the app looks like. On top are the two buttons of the app. To interact with the smart assistant, you press record and say your message. The transcription of your message is displayed on the left, the response by GPT-3 is shown on the right.

The smart assistant is apparently named Sarah and is 26 years old. The app seems to work well: it can detect my voice and display the conversation in a comprehensible manner. The smart assistant can also take more complicated queries:

Image by Author

The smart assistant has no issues working with basic geometry or with acronyms. The bot is also contextually aware of modern companies:

Image by Author

Conclusions

In this article, I walk through how I made a smart assistant app. The app records your voice, transcribes it, passes the transcription to OpenAI’s GPT-3, and reads the response out loud. I walk through what the pipeline of the app looks like and how the app was programmed in Streamlit. Finally, I go through some example interactions I had with the smart assistant. Overall, I think the app is useful and works well.

Support me

Hopefully this helped you; if you enjoyed it, you can follow me!

You can also become a Medium member using my referral link to get access to all my articles and more: https://diegounzuetaruedas.medium.com/membership

Other articles you might enjoy

Using AI to Analyze Speech

AI Applied to Mask Detection
