The world’s leading publication for data science, AI, and ML professionals.

How to Build Your Own Google AI Chatbot Within 5 Minutes

Fully utilising the power of Google LLM and your private knowledge

Authors: Selina Li, Tianyi Li

  • The Problem
  • The Solution
  • Use Case
  • Step by Step Guide
      • Step 1: Environment Setup
      • Step 2: Prepare Private Knowledge and store them into Google Cloud Storage (low code)
      • Step 3: Create Chatbot and the Data Store sitting behind the Chatbot (no code)
      • Step 4: Test the Chatbot (no code)
      • Step 5: Publish / Integrate your Chatbot (low code)
      • Step 6 (Optional): Publish it through a Beautiful Application (low code)
  • What makes this "magic"?
  • Some Observations
  • Wrap-up
  • Enjoyed This Story?

The Problem

You might be familiar with AI chatbots powered by Large Language Models (LLMs) such as OpenAI ChatGPT or Google Bard. And you might have noticed one thing: these LLMs have extensive general knowledge about the world, but might not give you satisfactory answers when you ask about a very specific or specialized area, especially if knowledge of that area is not publicly available or shareable.

Have you thought about "giving" your private knowledge to LLM and creating your own Chatbot?

Did you know this can be done within 5 minutes with no code or low code?

The end product will look like this:

Github link: https://github.com/bianbianzhu/property-hunter

The Solution

During the Asia Pacific Google Cloud Applied AI Summit, Alan Blount from Google shared an interesting way of achieving this using Google Cloud Vertex AI Search and Conversation, which I found well worth trying out.

The idea is simple. First, put a corpus of private knowledge documents onto Google Cloud Storage:

then create a Data Store, and import the documents from the Cloud Storage into the Data Store:

finally plug that Data Store into Dialogflow CX:

then we are done!

We can test the Chatbot like this:

And if we want to publish it through a beautiful application, Google provides a public git repo for a Chat App that we can utilise. With a bit of coding knowledge, we can plug the link of the Dialogflow Chatbot into the Chat App, and customize the interface like this:

OR this:


Use Case

In this use case, assume I am the owner of an ecommerce website. I would like to create a Chatbot so my users can ask specific questions about anything on this website (price, product, service, shipping, etc.) as if they were in the store. The Chatbot will be supplied with this "private knowledge" and will ground its answers in the contents of the website.

Since I don't actually own an ecommerce website, I will work around this by crawling content from an existing website on the Internet. This is tricky because most websites prohibit scraping in their terms of use, and scraping ecommerce websites such as Amazon, eBay or Alibaba could be illegal.

ChatGPT provided me with a perfect option –

Books to Scrape (https://books.toscrape.com/). A simulated bookstore specifically designed for web scraping practice. It offers a straightforward structure for scraping book details like title, price, and rating.

In this use case, I will assume I am the owner of the Books to Scrape website and create the Chatbot based on it.

Step by Step Guide

This might look a bit lengthy at first glance because it covers every detailed step you will need. Once you have run through it, you can get the same done within 5 minutes.

Step 1: Environment Setup

The tool we are going to use sits on Google Vertex AI, so we will need a Google Cloud Platform (GCP) account.

Google has a free-tier program that provides new GCP users with a 90-day trial period including $300 in free Cloud Billing credits.

Follow the tutorial here to set up the free Google Cloud account.

After you have set up your Google Cloud account and can access the console, create a storage bucket (step-by-step guide here) to use in the next step.
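If you prefer to script this step, the sketch below assumes the google-cloud-storage Python client and an already-authenticated environment. It first checks the name against a simplified subset of Cloud Storage's bucket-naming rules (lowercase letters, digits, hyphens and underscores; 3 to 63 characters; starting and ending with a letter or digit; no "goog" prefix and no "google" anywhere), then calls `Client.create_bucket`. The function names and the default `US` location are illustrative choices, not part of the original guide.

```python
import re

# Simplified subset of Cloud Storage bucket naming rules (assumption: names
# without dots). Dotted names have extra rules, so this check rejects them.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the simplified bucket-name check."""
    if not _BUCKET_RE.match(name):
        return False
    # Names may not start with "goog" or contain "google".
    return not name.startswith("goog") and "google" not in name


def create_bucket(name: str, location: str = "US"):
    """Create the bucket with google-cloud-storage (requires credentials)."""
    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    from google.cloud import storage  # imported lazily; needs google-cloud-storage

    client = storage.Client()
    return client.create_bucket(name, location=location)


if __name__ == "__main__":
    for candidate in ["books-to-scrape-kb", "Books_KB", "ab", "google-docs"]:
        print(candidate, is_valid_bucket_name(candidate))
```

The console's own validation remains the authority; this check only catches common mistakes before making an API call.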

Step 2: Prepare Private Knowledge and store them into Google Cloud Storage (low code)

As mentioned above, the private knowledge in this case will be the contents sitting on the book store website.

For owners of ecommerce websites, all you need to do is to provide the website URLs, and Google can automatically crawl website content from a list of domains you define.

Since I am not a real owner, I will resolve this by crawling. Alan Blount from Google provided a very useful notebook for this. All the code snippet does is crawl webpages from the website you specify and store them in the Google Cloud Storage bucket you specify.
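The notebook's code is not reproduced here. As a rough, hypothetical sketch of the core crawling idea, the standard-library code below extracts the links on one page, keeps only those on the same host (so the crawl stays inside books.toscrape.com), and maps each page URL to a Cloud Storage object name. The helper names and the object-naming scheme are assumptions for illustration, not the notebook's actual logic.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url: str, html: str) -> list[str]:
    """Resolve links on a page and keep unique ones on the same host."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in parser.hrefs:
        # Make the link absolute and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


def object_name_for(url: str) -> str:
    """Map a page URL to an object name (hypothetical naming scheme)."""
    parsed = urlparse(url)
    path = parsed.path.lstrip("/") or "index.html"
    if path.endswith("/"):
        path += "index.html"
    return f"{parsed.netloc}/{path}"
```

A full crawler would fetch each page (for example with `urllib.request.urlopen`), queue the new links it finds, and upload the HTML under `object_name_for(url)`. Only crawl sites whose terms allow it.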

This is all you need to do:

2.1 Save a copy of the notebook in your own drive

Recall that in Step 1 you created a Google account when you registered for Google Cloud. That account comes with Google Drive, where you can save a copy of this notebook.

Select "Save a copy in Drive" option from the dropdown menu of "File"

Image from Google Colab Notebook by Alan Blount

Then if you go to Google Drive, you will be able to see the notebook you created. Feel free to rename it according to your need.

2.2 In your own notebook, locate the following variables and set them

Image from Google Colab Notebook

website_url refers to the URL of the website page you would like to crawl.

storage_bucket refers to the Google Cloud Storage bucket you created in Step 1.

metadata_filename refers to a JSON file that will be created and stored together with the webpages. You might want to make it relevant to your website by changing applied_ai_summit_flutter_search to something that describes your use case.

This is my version:

Image from Google Colab Notebook
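The notebook's metadata file format isn't shown in this guide. Assuming it follows the JSON Lines layout that Vertex AI Search accepts for unstructured documents with metadata (one JSON object per line with an `id`, optional `structData`, and a `content` object holding `mimeType` and a `gs://` `uri`), building one line might look like the sketch below. The SHA-1-based document ID and the `title`/`url` fields are hypothetical choices.

```python
import hashlib
import json


def doc_id_for(object_name: str) -> str:
    """Derive a stable document ID from the object name (hypothetical scheme)."""
    # A SHA-1 hex digest is 40 characters of [0-9a-f], which keeps the ID
    # short and free of slashes or dots.
    return hashlib.sha1(object_name.encode("utf-8")).hexdigest()


def metadata_line(bucket: str, object_name: str, title: str, url: str) -> str:
    """Return one JSON Lines record describing an HTML page in the bucket."""
    record = {
        "id": doc_id_for(object_name),
        "structData": {"title": title, "url": url},  # hypothetical fields
        "content": {
            "mimeType": "text/html",
            "uri": f"gs://{bucket}/{object_name}",
        },
    }
    return json.dumps(record)
```

Writing one such line per crawled page, joined with newlines, would give a file to store next to the HTML. Check the current Vertex AI Search documentation before relying on the exact field names.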

2.3 Run all

Image from Google Colab Notebook

2.4 When it prompts you to authenticate the Google Colab notebook to access your Google credentials, click "Allow" -> "Continue"

Image from Google Colab Notebook

The script should then run through and show the crawling progress at the bottom, like this:

Image from Google Colab Notebook

And if you check your Google Cloud Storage bucket, you will see the crawled HTML files stored properly in it:

Image from Google Cloud Console

One thing to note is that the code snippet is not designed for every use case, and you might need to tune the code slightly to achieve your goal.

For example, in my case, I tuned the code a bit by changing

blob.upload_from_string(html_string)

into

blob.upload_from_string(html_string, content_type='text/html')

By default, html_string is uploaded as text/plain. By changing it to text/html, I wanted the HTML content to display properly at a later stage.
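Rather than hard-coding `text/html`, one option is to infer the content type from the object name with Python's standard `mimetypes` module and pass the result to `upload_from_string`. A small sketch; the `content_type_for` helper is hypothetical:

```python
import mimetypes


def content_type_for(object_name: str, default: str = "text/plain") -> str:
    """Guess a Content-Type from the object's file extension."""
    guessed, _encoding = mimetypes.guess_type(object_name)
    return guessed or default


# In an upload step this could be used as (blob being a google-cloud-storage
# Blob obtained from bucket.blob(...)):
#   blob.upload_from_string(html_string, content_type=content_type_for(blob.name))

if __name__ == "__main__":
    for name in ["catalogue/index.html", "notes.txt", "report.pdf", "no-extension"]:
        print(name, "->", content_type_for(name))
```

Object names without an extension fall back to the default, so for a crawler that only ever stores HTML, passing `content_type='text/html'` directly, as above, is simpler.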

You can tune the code as much as you like.

Step 3: Create Chatbot and the Data Store sitting behind the Chatbot (no code)

Go to the Google Cloud Console (https://console.cloud.google.com/) and search for "search and conversation" as the service:

Create "NEW APP":

Image from Google Cloud Console

Select "Chat":

Image from Google Cloud Console

Provide your "Company name" and "Agent name". Note that the "Agent name" here will be the name of the Chatbot, so you might want to choose a name that suits your users.

Image from Google Cloud Console

At this "Data" page, select "CREATE NEW DATA STORE":

Image from Google Cloud Console

For owners of ecommerce websites, select "Website URLs" and provide your website URLs.

As I have crawled the website contents into Cloud Storage, we can select "Cloud Storage" here:

Image from Google Cloud Console

Specify the Cloud Storage bucket name, and select "Unstructured documents" below:

Image from Google Cloud Console

Give your data store a name, then "CREATE"

Image from Google Cloud Console

You will see your data store listed, then "CREATE"

Image from Google Cloud Console

Your data store will be created as below

Image from Google Cloud Console

If you click into it, you will see your data store is "processing data" by importing documents from the Cloud Storage bucket that we specified earlier:

Image from Google Cloud Console

If we click the "ACTIVITY" tab, we can see the import is in progress:

Image from Google Cloud Console

Import will take minutes to hours depending on the number of documents in your Cloud Storage bucket.

In my case, I had over 1,000 files and the import finished within minutes.
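If you would rather not refresh the console by hand, a generic polling loop with exponential backoff is one way to wait. The sketch below assumes you supply a `check` callable (hypothetical here) that returns True once the import is done, for example by querying the data store through whichever client you use; no Google API is called in the sketch itself.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float = 3600,
               initial_delay_s: float = 10, max_delay_s: float = 300,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` until it returns True or `timeout_s` of sleeping elapses.

    The delay between checks doubles each time, capped at `max_delay_s`.
    Returns True if `check` succeeded, False on timeout.
    """
    waited = 0.0
    delay = initial_delay_s
    while True:
        if check():
            return True
        if waited >= timeout_s:
            return False
        # Never sleep past the overall timeout.
        step = min(delay, timeout_s - waited)
        sleep(step)
        waited += step
        delay = min(delay * 2, max_delay_s)
```

The `sleep` parameter is injectable so the loop can be tested without actually waiting; a final check is made once the timeout is reached.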

After the import completes, the highlighted status changes:

Image from Google Cloud Console

And if you switch back to the "DOCUMENTS" tab, you will see the list of files imported into the data store:

Image from Google Cloud Console

That means you’ve got all the materials and you are ready to cook!

Step 4: Test the Chatbot (no code)

In Step 3 above, we already created a Chatbot app as well as the data store sitting behind it.

Click "Apps" on the top:

Image from Google Cloud Console

You will see the Chatbot you created in Step 3:

Image from Google Cloud Console

If you click into the Chatbot name, you will be directed to the Dialogflow CX page like below:

Image from Google Cloud Console

To test the Chatbot, select "Test Agent" in the top-right corner:

Image from Google Cloud Console

And the dialogue box will pop up:

Image from Google Cloud Console

You can start the conversation by saying "hi" and start asking questions to the Chatbot:

Image from Google Cloud Console

It works!
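Besides the console's "Test Agent" panel, the same conversation can be scripted. The sketch below assumes the google-cloud-dialogflow-cx Python client (`google.cloud.dialogflowcx_v3`) and working credentials; the project ID, location and agent ID are placeholders you would take from your own agent, and the helper names are hypothetical. Only the session-path builder runs without Google libraries installed.

```python
def session_path(project_id: str, location: str, agent_id: str,
                 session_id: str) -> str:
    """Build a Dialogflow CX session resource name."""
    return (f"projects/{project_id}/locations/{location}"
            f"/agents/{agent_id}/sessions/{session_id}")


def ask_agent(project_id: str, location: str, agent_id: str, text: str,
              session_id: str = "test-session",
              language_code: str = "en") -> list[str]:
    """Send one text query to the agent and return its text replies.

    Requires the google-cloud-dialogflow-cx package and credentials.
    """
    from google.cloud import dialogflowcx_v3  # imported lazily

    client_options = None
    if location != "global":
        # Regional agents are served from a regional endpoint.
        client_options = {"api_endpoint": f"{location}-dialogflow.googleapis.com:443"}
    client = dialogflowcx_v3.SessionsClient(client_options=client_options)
    request = dialogflowcx_v3.DetectIntentRequest(
        session=session_path(project_id, location, agent_id, session_id),
        query_input=dialogflowcx_v3.QueryInput(
            text=dialogflowcx_v3.TextInput(text=text),
            language_code=language_code,
        ),
    )
    response = client.detect_intent(request=request)
    # Keep only messages that carry text (others may be payloads, etc.).
    return [" ".join(msg.text.text)
            for msg in response.query_result.response_messages
            if msg.text.text]
```

For example, `ask_agent("my-project", "global", "my-agent-id", "Do you have books about travel?")` would return the agent's text replies; all three IDs are placeholders.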

Step 5: Publish / Integrate your Chatbot (low code)

If you are happy with the Chatbot, it is easy to integrate it with your web application.

Go to the left pane, select "Manage" -> "Integrations" -> "Dialogflow Messenger"

Image from Google Cloud Console

You can choose the type of API and the UI style according to your needs.

For demo purposes, I selected "Unauthenticated API" as the API and "Pop-out" as the UI style:

Image from Google Cloud Console

After selecting "Done", an HTML code snippet will be generated on the next page, as below:

Image from Google Cloud Console

You may copy the code snippet and easily paste it into your applications for integration.

For demo purposes, I pasted this HTML snippet into JSFiddle and ran it, and my little Chatbot appeared and worked in the bottom-right corner!

Image from JSFiddle
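As an alternative to JSFiddle, one option is to wrap the copied snippet in a minimal HTML page and serve it locally. The `build_test_page` helper below is hypothetical and does nothing more than string assembly; it makes no assumptions about the snippet's contents.

```python
import html


def build_test_page(messenger_snippet: str,
                    title: str = "Chatbot test page") -> str:
    """Wrap the snippet copied from the console in a minimal HTML page."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{html.escape(title)}</title>\n"  # escape user-provided title
        "</head>\n"
        "<body>\n"
        f"{messenger_snippet}\n"  # inserted verbatim, as copied from the console
        "</body>\n"
        "</html>\n"
    )
```

Write the returned string to `index.html`, run `python -m http.server 8000` in that folder, and open `http://localhost:8000` in a browser.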

Step 6 (Optional): Publish it through a Beautiful Application (low code)

In case you don’t have an application yet and want one, Google provides a good starting point through its public git repository, Chat App.

This is a Chatbot application written in Node.js, and you can easily adapt it for your own use by changing a few code snippets in chat-app/src/routes/+page.svelte.

You will need to change the project-id, agent-id and chat-title to your own values.

Image from git repo https://github.com/GoogleCloudPlatform/generative-ai/tree/main/conversation/chat-ap
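The project ID and agent ID both appear in the agent's Dialogflow CX resource name, which has the form `projects/PROJECT/locations/LOCATION/agents/AGENT`. A small, hypothetical helper to pull them out of that string, or out of any URL that contains it:

```python
import re

_AGENT_RE = re.compile(
    r"projects/(?P<project_id>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/agents/(?P<agent_id>[^/?#]+)"
)


def parse_agent_name(text: str) -> dict[str, str]:
    """Extract project ID, location and agent ID from an agent resource
    name, or from a URL that contains one."""
    match = _AGENT_RE.search(text)
    if match is None:
        raise ValueError("no 'projects/.../locations/.../agents/...' segment found")
    return match.groupdict()
```

The returned `project_id` and `agent_id` are the values to substitute into the app's configuration.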

And once you run/deploy the app, you will get the web UI like this:

Image from git repo https://github.com/GoogleCloudPlatform/generative-ai/tree/main/conversation/chat-app

Of course, you can change the appearance of the UI as you like.

Now you can have your own application!


What makes this "magic"?

Recall the solution design we described at the beginning. It looks a bit like magic, as you can get your own LLM-powered Chatbot simply by supplying your private knowledge to a Google Cloud Storage bucket.

This is possible because Google has done quite a bit of integration behind the scenes: it integrated the Vertex AI platform with the chatbot agent service Dialogflow CX and came up with a new abstraction called Vertex AI Conversation (formerly Gen App Builder). This new abstraction also supports Search and Recommend, and the full name of the service is "Vertex AI Search and Conversation".

As we can see, this new abstraction of "Vertex AI Search and Conversation" sits on top of Vertex AI, which orchestrates a number of foundation models, and is "augmented" by user-supplied, up-to-date real-world information so it can contextualize its responses to that information.
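Google does not document the internals here, and the sketch below is not how Vertex AI implements them. It is a toy illustration of the general retrieve-then-ground pattern behind this kind of augmentation: find the passages most relevant to the question, then place them in the prompt with instructions to answer only from them. Simple keyword overlap stands in for the semantic search a real system would use, and the LLM call itself is left out; all names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return IDs of up to k documents sharing the most terms with the question."""
    q_terms = Counter(tokenize(question))
    scores = {}
    for doc_id, text in docs.items():
        d_terms = Counter(tokenize(text))
        scores[doc_id] = sum(min(q_terms[t], d_terms[t]) for t in q_terms)
    # Highest score first; ties broken by ID for deterministic output.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ranked[:k] if scores[d] > 0]


def grounded_prompt(question: str, docs: dict[str, str], k: int = 2) -> str:
    """Build a prompt asking the model to answer only from retrieved text."""
    context = "\n\n".join(f"[{d}] {docs[d]}" for d in retrieve(question, docs, k))
    return (
        "Answer using only the sources below and cite them by ID.\n"
        "If the answer is not in the sources, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

Grounding narrows what the model draws on, but as noted in the observations further down, it does not by itself guarantee the answer is free of hallucination.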

Image as a slide from Google Cloud CEO speech Generative AI: The next frontier for developers

The integration is great as it can help at least two groups of people –

  1. traditional Chatbot builder, and
  2. people exploring GenAI solutions who have not yet identified a good use case

Imagine you are a traditional Chatbot builder using Dialogflow CX: you create pages, intents and routes to send each customer intention to the corresponding page. Basically, you are defining "if the customer says this, then I respond with this", which amounts to hard-coding. Now Google plugs in Vertex AI, which can use LLMs (e.g. text-bison, gemini) to generate agent responses and control the conversation flow in a much smarter way. This can significantly reduce agent design time and improve agent quality.
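To make the contrast concrete, here is a toy sketch. It is not Dialogflow CX's actual matching, which uses trained intents rather than substring checks. Hand-written routes answer only the phrasings the designer anticipated; anything else falls through to a `fallback` callable standing in for a generative, data-store-grounded response. The routes and answers are made-up examples.

```python
from typing import Callable

# Hypothetical hand-written routes: trigger phrase -> canned response.
ROUTES = {
    "opening hours": "We are open 9am to 5pm, Monday to Friday.",
    "refund": "You can request a refund within 30 days.",
}


def respond(utterance: str, fallback: Callable[[str], str]) -> str:
    """Return a canned answer if a route matches, else defer to `fallback`."""
    text = utterance.lower()
    for phrase, answer in ROUTES.items():
        if phrase in text:
            return answer
    # Hard-coded routes cover only what the designer anticipated; everything
    # else goes to the fallback (e.g. an LLM answer grounded in a data store).
    return fallback(utterance)
```

In the hard-coded world, every new question type needs another route; with a generative fallback, unanticipated questions can still get a grounded answer.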

On the other hand, imagine you are exploring the power of LLMs and Generative AI but are not sure what to do with it. This Vertex AI Conversation feature enables you to build and launch your own Chatbot applications quickly and make them available for real use cases. This can significantly shorten the go-to-market time of LLM and GenAI solutions.


Some Observations

Despite the "magic" it appears to have, we observed several things worth sharing with developers who are considering using this "Vertex AI Search and Conversation" feature.

Our gut feeling is that this is a new product Google built by "integrating" several existing tools, and it is still being improved. It is unclear how the integration works behind the scenes, and how developers can best understand and configure it.

I got our chatbot working very quickly, but once I started looking at how to fine-tune it, it took me quite a bit of time to figure out how Dialogflow CX works, and what a "generator" is and how it works. At the moment I’m still unsure why this Chatbot works so well without my configuring any "generator" as described in the Google doc, and whether and how we can make it better by using one.

Some other observations during the development:

  • Indexing a website or a set of documents can take minutes or days, depending on the amount of data. There is no clear estimate of how long this process will take, and all developers can do is wait and check periodically.
  • We know how to link a data store to a Chatbot app, but it looks like we cannot "unlink" it.
  • Despite the level of grounding, the quality of the data supplied by users can significantly impact the performance of the Chatbot. "Rubbish in, rubbish out" still applies to a great extent.
  • "Augmenting" the model by supplying private data and knowledge helps resolve one issue of LLMs: the lack of up-to-date real-world information. But the issue of hallucination remains, as the Chatbot can sometimes give "fake" information (depending, of course, on the quality of the private knowledge you supplied).
  • The Chatbot provides links to the relevant web page or document page (e.g. PDF) while chatting with users. This is great, but the links come as Google Cloud Storage Authenticated URLs, which only users with granted permission can access. Developers need to figure out how to turn them into signed URLs, which are safe to share with public anonymous users, instead of exposing the Authenticated URLs.
  • The data store sitting behind the Chatbot works best for unstructured data. For structured data, it supports linking to CSV data, but the CSV has to be in a "question" and "answer" format, as mentioned in the Google doc:
Image from Google Cloud Console
Image from Google Cloud Dialogflow Guides
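One way to handle the authenticated-URL issue, sketched under assumptions: if the links take the form `https://storage.cloud.google.com/BUCKET/OBJECT`, a server-side step could parse out the bucket and object and issue a short-lived V4 signed URL with google-cloud-storage's `Blob.generate_signed_url`. Signing needs credentials that can sign (for example a service account key), so this belongs in your backend, never in the browser. The helper names and the 15-minute default are illustrative.

```python
from datetime import timedelta
from urllib.parse import unquote, urlparse

AUTH_HOST = "storage.cloud.google.com"


def parse_authenticated_url(url: str) -> tuple[str, str]:
    """Split https://storage.cloud.google.com/BUCKET/OBJECT into (bucket, object)."""
    parsed = urlparse(url)
    if parsed.netloc != AUTH_HOST:
        raise ValueError(f"not an authenticated Cloud Storage URL: {url}")
    bucket, _, object_name = parsed.path.lstrip("/").partition("/")
    if not bucket or not object_name:
        raise ValueError(f"missing bucket or object in: {url}")
    return bucket, unquote(object_name)


def signed_url_for(url: str, minutes: int = 15, client=None) -> str:
    """Return a time-limited V4 signed GET URL for the same object.

    Requires google-cloud-storage and credentials that can sign.
    """
    bucket_name, object_name = parse_authenticated_url(url)
    if client is None:
        from google.cloud import storage  # imported lazily

        client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    return blob.generate_signed_url(
        version="v4", expiration=timedelta(minutes=minutes), method="GET"
    )
```

Keeping the expiry short limits how long a leaked link stays usable.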

Wrap-up

In the above use case, I assumed I was an online bookstore owner and created a Chatbot based on my ecommerce website contents in HTML.

Similarly, you can supply "private knowledge" in the format of blogs, files (e.g. PDF, HTML, TXT) and all kinds of websites to the Google Cloud Storage, and create your own Chatbot.

This enables individuals and businesses to fully utilize the power of Google LLMs (text-bison, gemini, etc.), augment them with private knowledge, and create their own Chatbots very quickly.

This marks the end of this article. Hope you find it helpful!

(PS: I am working on a video to make this step-by-step guide easier to follow. I will share it if I get it done in the near future.)

Enjoyed This Story?

Selina Li (Selina Li, LinkedIn) is a Principal Data Engineer working at Officeworks in Melbourne, Australia. Selina is passionate about AI/ML, data engineering and investment.

Jason Li (Tianyi Li, LinkedIn) is a Full-stack Developer working at Mindset Health in Melbourne, Australia. Jason is passionate about AI, front-end development and space-related technologies.

Selina and Jason would love to explore technologies to help people achieve their goals.

Unless otherwise noted, all images are by the authors.

