
Many people I know are interested in experimenting with OpenAI’s large language models (LLMs). But hosting LLMs is expensive, and thus inference services like OpenAI’s Application Programming Interface (API) are not free. Entering your payment information without knowing how the inferencing costs will add up can be a little intimidating.
Usually, I like to include a little indicator of the API costs of a walkthrough in my articles, so my readers know what to expect and can get a feeling for inferencing costs.
This article introduces you to the tiktoken library, which I use to estimate inferencing costs for OpenAI foundation models.
What is tiktoken?
tiktoken is an open-source byte pair encoding (BPE) tokenizer developed by OpenAI that is used for tokenizing text in their LLMs. It allows developers to count how many tokens are in a text before making calls to the OpenAI endpoint.
It thus helps with estimating the associated costs of using the OpenAI API, because these costs are billed in units of 1,000 tokens according to OpenAI’s pricing page [1].
Tokens and Tokenization
Tokens are common sequences of characters in a text, and tokenization is the process of splitting a text string into a list of tokens. A token can be equal to a whole word, but longer or rarer words usually consist of multiple tokens.
Natural language processing (NLP) models are trained on tokens and understand the relationships between them. Thus, the input text is tokenized before an NLP model processes it.
But how exactly words are tokenized depends on the tokenizer used.
Below you can see an example of how the text
"Alice has a parrot.
What animal is Alice’s pet?
Alice’s pet is a parrot."
can be tokenized.

You can see that the text is split into chunks of characters, including spaces and punctuation, and even line breaks. Then each token is encoded as an integer.
While some shorter words are equivalent to one token, longer words, e.g., the word "parrot", are separated into multiple tokens, as shown below:

Depending on where a word appears in the text, the same word may not be encoded as the same token. E.g., in this example, the word Alice is once tokenized as "Alice" and once as " Alice" (with a leading space), depending on where in the text this word appeared. Thus, the tokens "Alice" and " Alice" (with a leading space) have different token identifiers.

OpenAI uses a tokenization technique called byte pair encoding (BPE), which replaces the most frequent pairs of bytes in a text with a single byte and thus helps NLP models understand grammar better [4].
E.g., "ing" is a frequent substring of characters in the English language. Thus, BPE will split words ending in "ing", e.g., "walking", into "walk" and "ing".
On average, each token corresponds to about 4 bytes or 4 characters for common English text in BPE, which roughly translates to 100 tokens for 75 words [2, 4].
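This rule of thumb can be turned into a quick back-of-the-envelope estimator. Note that this is only a rough heuristic for common English prose, not a replacement for actually tokenizing the text:

```python
def rough_token_estimate(text: str) -> int:
    """Rough token estimate for English text: ~4 characters per token."""
    return max(1, round(len(text) / 4))

# A short sentence is estimated at a handful of tokens
print(rough_token_estimate("Alice has a parrot."))
```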
How To Use tiktoken To Estimate OpenAI API Costs
Estimating the OpenAI API costs with tiktoken consists of the following four simple steps, which we will discuss in detail:
Step 1: Installation and setup
First, you need to install tiktoken as follows:
pip install tiktoken
Then you import the library:
import tiktoken
Step 2: Define encoding
Next, you need to define which encoding to use for tokenization, because OpenAI models use different encodings [3]:
- cl100k_base: for gpt-4, gpt-3.5-turbo, and text-embedding-ada-002
- p50k_base: for Codex models, text-davinci-002, and text-davinci-003
- gpt2 (or r50k_base): for GPT-3 models like davinci
If you know the encoding of your model, you can define the encoding as shown below:
encoding = tiktoken.get_encoding("cl100k_base")
Alternatively, you can define the encoding according to the used model:
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
Step 3: Tokenize the text
Finally, you can tokenize any body of text with the .encode()
method, which will return a list of integers that represent the tokens.
text = "Alice has a parrot.\nWhat animal is Alice's pet?\nAlice's pet is a parrot."
tokens = encoding.encode(text)
[44484, 468, 257, 1582, 10599, 13, 198, 2061, 5044, 318, 14862, 338, 4273, 30, 198, 44484, 338, 4273, 318, 257, 1582, 10599, 13]
Step 4: Estimate OpenAI API costs
To estimate the OpenAI API costs, you can now count the number of tokens in your text and estimate the associated costs of the inferencing service according to the model you are using.
len(tokens)
If you are using an embedding model, you are only charged for the number of tokens of the input text to embed. E.g., text-embedding-ada-002 costs $0.0001 per 1,000 tokens at the time of writing [1].
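Putting the token count and the price together is then straightforward. A minimal sketch, where the default price per 1,000 tokens is the text-embedding-ada-002 pricing quoted above and will change over time:

```python
def embedding_cost(n_tokens: int, price_per_1k: float = 0.0001) -> float:
    """Estimated cost in USD for embedding n_tokens tokens."""
    return n_tokens / 1000 * price_per_1k

# Cost of embedding the 23-token example text from above
print(embedding_cost(23))
```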

But note, if you are using a conversational model, you are charged both for the number of input tokens (number of tokens of your prompt) as well as for the number of output tokens (number of tokens of the returned completion). E.g., the gpt-3.5-turbo
(4K context) model costs $0.0015 for 1,000 input tokens and $0.002 for 1,000 output tokens at the time of writing [1].

That’s why you need to control the number of output tokens in addition to managing the length of your input text to avoid unexpected costs. You can control the number of output tokens via the optional but highly recommended max_tokens
parameter.
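With max_tokens capped, you can compute a worst-case estimate before sending a request. A sketch using the gpt-3.5-turbo (4K context) prices quoted above as defaults; prices change, so treat them as placeholders:

```python
def max_chat_cost(n_input_tokens: int, max_tokens: int,
                  input_price_per_1k: float = 0.0015,
                  output_price_per_1k: float = 0.002) -> float:
    """Worst-case cost in USD, assuming the completion uses all max_tokens."""
    return (n_input_tokens / 1000 * input_price_per_1k
            + max_tokens / 1000 * output_price_per_1k)

# 23 prompt tokens, completion capped at 500 tokens
print(max_chat_cost(23, 500))
```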
Optional step: Decode tokens
Another advantage of BPE is that it is reversible. If you want to decode a list of tokens, you can use the .decode_single_token_bytes() method as shown below:
decoded_text = [encoding.decode_single_token_bytes(token) for token in tokens]
[b"Alice", b" has", b" a", b" par", b"rot", b".", b"\n", b"What", b" animal", b" is", b" Alice", b"'s", b" pet", b"?", b"\n", b"Alice", b"'s", b" pet", b" is", b" a", b" par", b"rot", b"."]
Summary
This article showed how you can easily calculate the number of tokens in your input text (text to embed or prompt) with the tiktoken library before calling the OpenAI API endpoint. By including this step in your coding practice, you will get a feeling for the resulting API costs. Additionally, we discussed that you should also set the max_tokens parameter when using an LLM that returns a completion, to avoid unexpected costs.
Enjoyed This Story?
Subscribe for free to get notified when I publish a new story.
Find me on LinkedIn, Twitter, and Kaggle!
References
Image References
If not otherwise stated, all images are created by the author.
Web & Literature
[1] OpenAI (2023). Pricing (accessed July 31st, 2023)
[2] OpenAI (2023). Tokenizer (accessed July 31st, 2023)
[3] OpenAI on GitHub (2023). OpenAI Cookbook (accessed July 31st, 2023)
[4] OpenAI on GitHub (2023). tiktoken (accessed July 31st, 2023)