

Your company has a web application, mobile app, and/or website that gets thousands of users every single day. Questions and support from your users are getting difficult to manage with the current number of employees you have on staff. You think about how to offset some of the demand from your users for simple tasks like changing settings and resetting passwords so that your employees can focus on the more difficult requests and support. You decide to add a chatbot.

However, you don’t just want this chatbot to provide canned answers that are going to frustrate your users. Instead, you want a chatbot that is smarter and more personalized, one that makes the conversation feel like talking to a real person. You’ve heard that AI can do that. Would it make sense to use?

How do you start?

We got you.

AI Models

How does an AI chatbot work? It starts with a model.

A model is what generates a response from the information passed to it. For example, when building a chatbot, whatever question a user types into the chat window is fed into the model, and the model produces a response that is passed back to the user.

Models exist and can be created to solve all kinds of problems. When a model is used to generate content, such as a response to a question, at a high level it’s considered Generative AI because it’s generating something. At a more specific level, it’s also considered a Large Language Model (LLM) because it can comprehend and/or generate human-language text.

Foundation Models

We have two options when it comes to establishing a starting point for an AI model. We can start from scratch, building and training a model entirely within our organization based only on data we provide, or we can start with a Foundation Model (FM).

A Foundation Model is a model that was built and trained by another organization on a large and broad amount of data so that it can be a generalized model that can be used for a variety of purposes. OpenAI would be an example of an organization that creates foundation models.

Building and training a model from scratch is typically not cost-effective: the costs involved can easily run into the millions of dollars, and the process takes a long time. It is also resource-intensive, requiring a specialized team of data scientists and engineers and a large amount of training data that you may not have available.

Foundation Models are a more cost-effective way to get started and require far fewer resources.

Picking a Foundation Model

When building an AI-powered chatbot, we know a couple of specifications that we want for our foundation model. We know we want it to be a Generative AI model, an LLM, and able to specifically generate text.

Finding a model can also depend on the platform we want to use, and there are many AI platforms available to support getting started with an FM.

Each platform has pros and cons, and choosing one over another might come down to which foundation models are available and how well they suit your needs. The ecosystem, support, and the experience of developing an AI model within the platform are also things to consider.

In this case, we’re going to focus on Google Cloud Vertex AI Studio. At the time this article was published, the following foundation model groups were available to pick from:

  • Gemini

  • PaLM

  • Codey

  • Imagen

Each group has specific models that combine particular features and are designed for particular use cases. The following are two examples of models from the Gemini and PaLM model groups:

Gemini 1.0 Pro (gemini-1.0-pro)

Designed to handle natural language tasks, multiturn text and code chat, and code generation. Use Gemini 1.0 Pro for prompts that only contain text.

PaLM 2 for Chat (chat-bison)

Fine-tuned for multi-turn conversation use cases.

In this particular use case, the “Gemini 1.0 Pro” model offers more functionality than we actually need. So instead, we should start with the “PaLM 2 for Chat” model and work to customize it.

Configuring the Model

When creating an AI chatbot for your users, you want to make sure the conversation is grounded in a discussion about your web application, your company, support-related questions, and so on.

You don’t want to create a chatbot for your web application that supports the user asking it about things like a sports team, evaluating code, or anything else that is unrelated to your company. At the same time, you do want the chatbot to be aware of specific prompts and responses that would be appropriate for users to request so that the chatbot can provide the user with a grounded experience.

For the PaLM 2 for Chat model, we have the following options to configure the model:

Context allows us to put guardrails on the model and to establish the style and tone of the response. Some guardrails that we might put in place are things like topics to focus on or avoid, what to do when the model doesn’t know the answer, and what words the model can’t use.

Examples provide the model with ideal responses to questions that may be asked, demonstrating to the model what is expected.

Grounding helps make sure the responses are focused on specific information such as company support features and frequently asked questions and answers.
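To make the three options above concrete, here is a small sketch of how context, few-shot examples, and grounding sources might be organized before being handed to a chat model. The field names, the `ExampleApp` product, and the source paths are illustrative assumptions, not the exact Vertex AI SDK signature; in the actual SDK these settings are passed when starting a chat session.

```python
# Sketch: bundling guardrail context, few-shot examples, and grounding
# sources for a chat model. Field names here are illustrative, not the
# exact platform API.

def build_chat_config(context, examples, grounding_sources=None):
    """Bundle the configuration pieces described above into one structure."""
    return {
        "context": context,
        "examples": [{"input": q, "output": a} for q, a in examples],
        "grounding_sources": grounding_sources or [],
    }

config = build_chat_config(
    context=(
        "You are a support assistant for ExampleApp. "  # hypothetical product
        "Only answer questions about ExampleApp features and account support. "
        "If you do not know the answer, say so and offer to open a ticket."
    ),
    examples=[
        ("How do I reset my password?",
         "Go to Settings > Account > Reset Password and follow the prompts."),
    ],
    grounding_sources=["faq.md", "support-articles/"],  # hypothetical paths
)

print(len(config["examples"]))  # 1
```

The context doubles as the guardrail: it names the topics to focus on and tells the model what to do when it doesn’t know the answer.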

Tuning the Model

There are different techniques for ensuring a model adapts to custom data and/or sources. Fine-tuning is just one technique. Additional articles in this series will highlight the different techniques along with their specific use cases. For this example though, we are going to focus on just fine-tuning the model in order to handle a custom dataset. Something to keep in mind is that fine-tuning can be rather expensive and resource-intensive depending on the amount of fine-tuning that is necessary.

In this step, we will first need to prepare a dataset to be used for tuning the model.

Preparing the dataset involves developing examples of a conversation that might occur for a user with our chatbot. The more examples we can provide in the dataset, the better-tuned our model will be.
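A tuning dataset like this is commonly written as JSON Lines, one example conversation per line. The sketch below uses a shape similar to what Vertex AI documents for chat model supervised tuning (a `context` plus alternating user/assistant `messages`); verify the exact schema against the current platform documentation, and note that `ExampleApp` is a hypothetical product.

```python
import json

# Sketch: writing example conversations to a JSONL tuning dataset.
# The schema mirrors the general shape used for chat model tuning;
# check the platform docs for the exact required fields.

conversations = [
    {
        "context": "You are a support assistant for ExampleApp.",  # hypothetical
        "messages": [
            {"author": "user", "content": "How do I change my email address?"},
            {"author": "assistant",
             "content": "Open Settings > Account, select Email, and enter the new address."},
        ],
    },
    {
        "context": "You are a support assistant for ExampleApp.",
        "messages": [
            {"author": "user", "content": "Can you recommend a sports team?"},
            {"author": "assistant",
             "content": "I can only help with ExampleApp questions. Is there "
                        "something about your account I can help with?"},
        ],
    },
]

with open("tuning_dataset.jsonl", "w") as f:
    for conv in conversations:
        f.write(json.dumps(conv) + "\n")
```

Notice the second example deliberately shows the model how to decline off-topic questions, which reinforces the guardrails discussed earlier.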

Once we have a dataset ready, the next step is running the model through a fine-tuning process with the dataset. Depending on the platform and process you use to build your AI model, this will look different. For Google Cloud Vertex AI Studio, it involves storing the tuning dataset in a Google Cloud Storage bucket and then kicking off a text model supervised tuning job.

Evaluating the Model

Once we have completed the process of tuning our AI model, the final step to prepare the model for production is to evaluate the model with a test dataset. This will determine if the model is responding appropriately to our questions in a chat context.

The simplest way to evaluate the tuned model is to compare it with the pre-trained model. This involves preparing a dataset to be used for evaluation that contains questions that are representative of what our users might ask.

We would want to run the dataset through the pre-trained model to determine the responses. We would then run the dataset through our fine-tuned model and then compare the responses between the two result sets.

Specific metrics we would be looking for are the following:

  • Length of response

  • Whether the response had a positive or negative sentiment

  • Answer quality, coherence, relevance, fluency

We should establish a threshold for each metric that we want to target. Meeting these defined thresholds indicates that our model is ready for deployment. If the model is not quite ready for production, we should continue fine-tuning it until it reaches the thresholds.
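The comparison loop can be sketched in a few lines: score both result sets with the same metric functions, then check each aggregate against its threshold. The metric (average response length) and the threshold value below are illustrative assumptions; real evaluations would add sentiment and quality scoring.

```python
# Sketch: comparing pre-trained vs. fine-tuned responses against
# per-metric thresholds. Metric choice and threshold are illustrative.

def evaluate(responses):
    """Aggregate simple metrics over a set of model responses."""
    lengths = [len(r.split()) for r in responses]
    return {"avg_length": sum(lengths) / len(lengths)}

def meets_thresholds(metrics, thresholds):
    """True if every metric is at or above its target."""
    return all(metrics[name] >= target for name, target in thresholds.items())

pretrained = ["I am not sure.", "Try the settings page."]
fine_tuned = [
    "Go to Settings > Account > Reset Password and follow the prompts.",
    "Open Settings > Notifications and toggle email alerts off.",
]

thresholds = {"avg_length": 8}  # assumed target: reasonably detailed answers
print(meets_thresholds(evaluate(pretrained), thresholds))  # False
print(meets_thresholds(evaluate(fine_tuned), thresholds))  # True
```

The same structure extends to sentiment and quality metrics: add a scoring function, an aggregate, and a threshold entry for each.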

Utilizing Your Model in a Solution

We have fine-tuned our AI model and evaluated it so that it is ready for deployment. Now we need to deploy the model so that it can be used behind API endpoints. Most cloud platforms make this straightforward; the choice will come down to your preference and where you likely already have infrastructure in the cloud.

Once we have the model API deployed, we are ready to update our web application, mobile app, and/or website to have a chat interface that directly interacts with our AI model API endpoints.
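From the application side, the chat interface only needs to POST the user’s message to that endpoint. A minimal sketch of building such a request follows; the endpoint URL and payload fields are hypothetical and should match however you expose your deployed model.

```python
import json
import urllib.request

# Sketch: building the request a chat interface would send to the
# deployed model's API. URL and payload fields are hypothetical.

def build_chat_request(endpoint, user_id, message):
    payload = json.dumps({"user_id": user_id, "message": message}).encode()
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "https://api.example.com/chat",  # hypothetical endpoint
    user_id="user-123",
    message="How do I reset my password?",
)
# Actually sending it would be: urllib.request.urlopen(req)
print(req.get_method())  # POST
```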

Since we have thousands of users hitting our site every day, an isolated roll-out of the feature would likely be warranted so that we can ensure the AI model is effective in production before rolling it out to all of our users.
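One simple way to run an isolated roll-out is a deterministic percentage gate: hash each user’s ID and route only the users who fall into a small bucket to the chatbot. The 5% default below is an illustrative assumption.

```python
import hashlib

# Sketch: route a fixed percentage of users to the new chatbot.
# Hashing the user ID keeps each user's experience stable across visits.

def in_rollout(user_id, percent=5):
    """Return True if this user falls inside the roll-out bucket."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return bucket < percent

print(in_rollout("user-123", percent=100))  # True
```

Because the bucket is derived from the user ID rather than chosen at random per request, the same user always sees the same version, which keeps the experiment clean.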

Additional metrics we likely want to measure and evaluate once our AI model is being used in production are the following:

  • Satisfaction Rate

  • Non-response Rate

  • Average Chat Time

  • Bounce Rate

  • Performance Rate

  • Self-service Rate
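Several of these metrics fall out directly from per-session records. The sketch below computes a satisfaction rate, self-service rate, and average chat time from a handful of made-up sessions; the record fields are illustrative assumptions about what your chat logs might capture.

```python
# Sketch: deriving production metrics from simple chat session records.
# The session fields are assumed, not a real log schema.

sessions = [
    {"resolved_by_bot": True,  "rated_helpful": True,  "minutes": 3},
    {"resolved_by_bot": False, "rated_helpful": False, "minutes": 9},
    {"resolved_by_bot": True,  "rated_helpful": True,  "minutes": 2},
    {"resolved_by_bot": True,  "rated_helpful": False, "minutes": 4},
]

total = len(sessions)
self_service_rate = sum(s["resolved_by_bot"] for s in sessions) / total
satisfaction_rate = sum(s["rated_helpful"] for s in sessions) / total
avg_chat_time = sum(s["minutes"] for s in sessions) / total

print(self_service_rate)  # 0.75
print(satisfaction_rate)  # 0.5
print(avg_chat_time)      # 4.5
```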

There will likely be additional metrics that you will want to determine as well that will be specific to your organization.

What’s Next

It’s not difficult to create a custom AI chatbot for your organization. It takes some time, preparation of datasets for fine-tuning and evaluation, and measurement of the effectiveness of the AI model before and after deployment.

Once you have the chatbot being utilized within your organization, it is important to continue to evaluate the AI model regularly to ensure it maintains a threshold for specific metrics identified by your organization.

Additionally, as new content, questions & answers, and services & offerings change within your organization, a combination of techniques may be necessary to ensure the AI model continues to provide relevant and up-to-date information to the user through chat conversations.

Next up, our friends at OmbuLabs will go more in-depth on enhancing the capabilities of your LLM.

This article was written by Travis Smith, Director of Engineering