Create Your Own Large Language Model Playground in SageMaker Studio

Now you can deploy LLMs and experiment with them all in one place

Heiko Hotz
Towards Data Science


Image by author — created with Midjourney

What is this about?

Utilising large language models (LLMs) through a REST endpoint offers numerous benefits, but experimenting with them via raw API calls can be cumbersome. The screenshot below shows what interacting with a model deployed to an Amazon SageMaker endpoint looks like.

Image by author
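To make the cumbersome part concrete, here is a minimal sketch of what a raw call to such an endpoint looks like with boto3. The endpoint name and payload shape are my assumptions, based on a typical Hugging Face setup, not taken from the screenshot:

```python
# A sketch of invoking a SageMaker endpoint directly (assumed setup).
import json

ENDPOINT_NAME = "flan-t5-xxl"  # hypothetical endpoint name


def build_request(prompt: str) -> str:
    # Every call requires hand-serialising the prompt and generation
    # parameters into the JSON body the endpoint expects.
    return json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 250}})


def invoke(prompt: str) -> str:
    import boto3  # imported lazily; calling this requires AWS credentials

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=build_request(prompt),
    )
    return response["Body"].read().decode("utf-8")
```

Typing out JSON payloads like this for every prompt variation is exactly the friction a playground app removes.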

To streamline this process, it would be advantageous to develop a playground app that allows for seamless interaction with the deployed model. In this tutorial, we will achieve this by using Amazon SageMaker (SM) Studio as our all-in-one IDE, deploying a Flan-T5-XXL model to a SageMaker endpoint, and subsequently creating a Streamlit-based playground app that can be accessed directly within Studio.

All of the code for this tutorial is available in this GitHub repository.

Why is it important?

Assessing and contrasting different LLMs is crucial for organisations to identify the most fitting model for their unique requirements and to experiment quickly. A playground app presents the most accessible, rapid, and straightforward method for stakeholders (technical & non-technical) to experiment with deployed models.

In addition, utilising a playground app enhances comparison and promotes further customisation, such as incorporating feedback buttons and ranking the model output. These supplementary features enable users to offer feedback that enhances the model’s precision and overall performance. In essence, a playground app grants a more thorough comprehension of a model’s strengths and weaknesses, ultimately guiding well-informed decisions in choosing the most suitable LLM for the intended application.

Let’s get started!

Deploying the Flan-T5-XXL model

Before we can set up the playground we need to set up a REST API to access our model. Fortunately, this is very straightforward in SageMaker. Just as we did when deploying the Flan-UL2 model, we can write an inference script that downloads the model from the Hugging Face Model Hub and deploys it to a SageMaker endpoint. That endpoint then provides us with a REST API that we can access within our AWS account without having to put API Gateway on top.

Note that we are using the option to load the model in 8-bit precision, which allows us to deploy the model onto a single GPU (G5 instance).
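A sketch of such an inference script is shown below. The function names follow the SageMaker inference toolkit convention; the heavy imports are kept inside `model_fn` so the request-parsing logic can be exercised without the GPU dependencies installed. Treat this as an illustration of the approach, not the exact script from the repository:

```python
# inference.py -- illustrative sketch of a SageMaker inference script
import json

MODEL_ID = "google/flan-t5-xxl"


def model_fn(model_dir):
    # Heavy imports kept local so the rest of the module stays importable
    # without transformers/torch installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # load_in_8bit quantises the weights so the 11B-parameter model
    # fits on a single GPU.
    model = AutoModelForSeq2SeqLM.from_pretrained(
        MODEL_ID, device_map="auto", load_in_8bit=True
    )
    return model, tokenizer


def input_fn(request_body, content_type="application/json"):
    # Parse the JSON request into a prompt and generation parameters.
    payload = json.loads(request_body)
    return payload["inputs"], payload.get("parameters", {})


def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    prompt, params = data
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output_ids = model.generate(input_ids, **params)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```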

Once we have the inference script ready we can deploy the model with just one command:
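That command, sketched with the SageMaker Python SDK below, boils down to a single `model.deploy(...)` call. The framework versions, directory layout, and instance type here are my assumptions rather than values from the repository, so check them against the deployment notebook:

```python
# Hypothetical deployment sketch using the SageMaker Python SDK.
INSTANCE_TYPE = "ml.g5.4xlarge"  # assumed single-GPU G5 instance
ENDPOINT_NAME = "flan-t5-xxl"    # hypothetical endpoint name


def deploy():
    # Imported lazily; running this requires AWS credentials and the
    # sagemaker SDK installed.
    import sagemaker
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        role=sagemaker.get_execution_role(),
        entry_point="inference.py",  # the inference script described above
        source_dir="code",           # assumed directory layout
        transformers_version="4.26",
        pytorch_version="1.13",
        py_version="py39",
    )
    # The "one command": provision the instance, load the model,
    # and expose it behind a REST endpoint.
    return model.deploy(
        initial_instance_count=1,
        instance_type=INSTANCE_TYPE,
        endpoint_name=ENDPOINT_NAME,
    )
```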

For more detailed information check out the deployment notebook and my previous blog post on deploying Flan-UL2.

Once the endpoint is up and running we can get to the fun part — setting up a playground app to interact with the model.

Playground app

We will employ Streamlit to develop a streamlined playground app. With just a few lines of code, it enables us to create a text box and showcase various generation parameters within a user-friendly interface. You are welcome to modify the app and exhibit an alternate set of generation parameters for even greater control over the text generation procedure.

A list of all generation parameters can be found here.
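A minimal sketch of such an app is shown below. The endpoint name, the particular parameters exposed, and their ranges are my assumptions for illustration; swap in whichever generation parameters matter for your use case:

```python
# flan-t5-playground.py -- illustrative sketch of the playground app
import json

try:
    import streamlit as st
except ImportError:  # lets build_payload be tested without Streamlit installed
    st = None

ENDPOINT_NAME = "flan-t5-xxl"  # replace with your endpoint name


def build_payload(prompt, max_new_tokens, temperature, do_sample):
    # Bundle the prompt and generation parameters into the JSON body
    # the endpoint expects.
    return json.dumps({
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "do_sample": do_sample,
        },
    })


def query_endpoint(payload):
    import boto3  # calling this requires AWS credentials

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=payload,
    )
    return response["Body"].read().decode("utf-8")


def main():
    st.title("Flan-T5-XXL Playground")
    # Expose a few generation parameters in the sidebar.
    max_new_tokens = st.sidebar.slider("max_new_tokens", 10, 500, 250)
    temperature = st.sidebar.slider("temperature", 0.0, 1.5, 1.0)
    do_sample = st.sidebar.checkbox("do_sample", value=True)
    prompt = st.text_area("Prompt", height=200)
    if st.button("Generate"):
        st.write(query_endpoint(
            build_payload(prompt, max_new_tokens, temperature, do_sample)
        ))


if st is not None and __name__ == "__main__":
    main()
```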

Note that you will have to specify the endpoint name in the app script, which you can retrieve from the deployment notebook or the SageMaker console.

Test

Now it’s time to deploy and test our playground app. Inspired by the documentation on how to use TensorBoard in SM Studio, we can use the same mechanism to spin up our Streamlit app in SM Studio.

To do so, we can execute the command streamlit run flan-t5-playground.py --server.port 6006 in the terminal. After that we will be able to access the playground on https://<YOUR_STUDIO_ID>.studio.<YOUR_REGION>.sagemaker.aws/jupyter/default/proxy/6006/.

Image by author

Conclusion

In this tutorial, we successfully deployed a cutting-edge language model and established a playground app within a single environment, SageMaker Studio. The process of initiating LLM experimentation has never been more straightforward. I hope you found this information valuable, and please feel free to reach out if you have any questions or require further assistance.


👋 Follow me on Medium and LinkedIn to read more about Generative AI, Machine Learning, and Natural Language Processing.

👥 If you’re based in London join one of our NLP London Meetups.

https://www.linkedin.com/in/heikohotz/
