
Easily Estimate Your OpenAI API Costs with Tiktoken

Count your tokens and avoid going bankrupt from using the OpenAI API

Fresh tokens! $0.0015 per kilo!

Many people I know are interested in playing with OpenAI’s large language models (LLMs). But hosting LLMs is expensive, and thus, inference services like OpenAI’s Application Programming Interface (API) are not free. And entering your payment information without knowing what inferencing costs will add up to can be a little intimidating.

Usually, I like to include a little indicator of the API costs for the walkthroughs in my articles, so my readers know what to expect and can get a feeling for inferencing costs.

This article introduces you to the tiktoken library I use to estimate inferencing costs for OpenAI foundation models.

What is tiktoken?

tiktoken is an open-source byte pair encoding (BPE) tokenizer developed by OpenAI that is used for tokenizing text in their LLMs. It allows developers to count how many tokens are in a text before making calls to the OpenAI endpoint.

It thus helps with estimating the associated costs of using the OpenAI API because usage is billed in units of 1,000 tokens, as listed on OpenAI’s pricing page [1].

GitHub – openai/tiktoken: tiktoken is a fast BPE tokeniser for use with OpenAI’s models.

Tokens and Tokenization

Tokens are common sequences of characters in a text, and tokenization is the process of splitting a text string into a list of tokens. A token can correspond to a whole word, but longer or less common words are often split into multiple tokens.

Natural language processing (NLP) models are trained on tokens and understand the relationships between them. Thus, the input text is tokenized before an NLP model processes it.

But how exactly words are tokenized depends on the tokenizer used.

Below you can see an example of how the text

"Alice has a parrot.

What animal is Alice’s pet?

Alice’s pet is a parrot."

can be tokenized.

You can see that the text is split into chunks of characters, including spaces and punctuation, and even line breaks. Then each token is encoded as an integer.

While some shorter words are equivalent to one token, longer words, e.g., the word "parrot", are separated into multiple tokens, as shown below:

Even within the same tokenizer, the same word may not always be encoded as the same token. In this example, the word Alice is tokenized once as "Alice" and once as " Alice" (with a leading space), depending on where in the text the word appears. Thus, the tokens "Alice" and " Alice" (with a leading space) have different token identifiers.

OpenAI uses a tokenization technique called byte pair encoding (BPE), which iteratively replaces the most frequent pairs of bytes in a text with a single new token. The resulting subword vocabulary helps NLP models handle common word pieces like prefixes and suffixes [4].

For example, "ing" is a frequent character sequence in English. Thus, BPE will split words ending in "ing", such as "walking", into "walk" and "ing".
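To make the merging idea concrete, here is a toy sketch of a single BPE merge step in plain Python (my own simplified illustration, not tiktoken’s actual implementation): count how often each adjacent pair of symbols occurs and merge the most frequent pair into one new symbol. Real BPE training repeats this step until a target vocabulary size is reached.

```python
from collections import Counter

def most_frequent_pair(symbols):
    """Count adjacent symbol pairs and return the most frequent one."""
    pairs = Counter(zip(symbols, symbols[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(symbols, pair):
    """Replace each occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged

# Start from individual characters and apply one merge step:
symbols = list("walking walking walks")
pair = most_frequent_pair(symbols)   # ('w', 'a') occurs three times
symbols = merge_pair(symbols, pair)
print(symbols[:4])  # ['wa', 'l', 'k', 'i']
```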

On average, each token corresponds to about 4 bytes or 4 characters for common English text in BPE, which roughly translates to 100 tokens for 75 words [2, 4].
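This rule of thumb is enough for a quick back-of-the-envelope estimate before you even load a tokenizer. A minimal sketch (the function name and the 4-characters-per-token constant come from the heuristic above, not from an official formula):

```python
def rough_token_estimate(text: str) -> int:
    """Heuristic: common English text averages ~4 characters per token."""
    return max(1, round(len(text) / 4))

print(rough_token_estimate("Alice has a parrot."))  # ~5 tokens (19 characters)
```

For precise numbers, use the actual tokenizer as shown in the steps below.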

For an in-depth explanation of BPE, I recommend this article:

Byte-Pair Encoding: Subword-based tokenization algorithm

How To Use tiktoken To Estimate OpenAI API Costs

Estimating the OpenAI API costs with tiktoken consists of the following four simple steps, which we will discuss in detail:

  1. Installation and setup
  2. Define encoding
  3. Tokenize text
  4. Estimate OpenAI API costs

Step 1: Installation and setup

First, you need to install tiktoken as follows:

pip install tiktoken

Then you import the library:

import tiktoken

Step 2: Define encoding

Next, you need to define which encoding to use for tokenization because OpenAI models use different encodings [3]:

  • cl100k_base: for gpt-4, gpt-3.5-turbo, and text-embedding-ada-002
  • p50k_base: for codex models, text-davinci-002, text-davinci-003
  • gpt2 (or r50k_base): for GPT-3 models like davinci

If you know the encoding of your model, you can define the encoding as shown below:

encoding = tiktoken.get_encoding("cl100k_base")

Alternatively, you can define the encoding according to the used model:

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

Step 3: Tokenize the text

Now you can tokenize any body of text with the .encode() method, which returns a list of integers representing the tokens.

text = "Alice has a parrot.\nWhat animal is Alice's pet?\nAlice's pet is a parrot."

tokens = encoding.encode(text)
[44484, 468, 257, 1582, 10599, 13, 198, 2061, 5044, 318, 14862, 338, 4273, 30, 198, 44484, 338, 4273, 318, 257, 1582, 10599, 13]

Step 4: Estimate OpenAI API costs

To estimate the OpenAI API costs, you can now count the number of tokens in your text and estimate the associated costs of the inferencing service according to the model you are using.

len(tokens)
23

If you are using an embedding model, you are only charged for the number of tokens of the input text to embed. E.g., text-embedding-ada-002 cost $0.0001 per 1,000 tokens at the time of writing [1].

But note that if you are using a conversational model, you are charged both for the number of input tokens (the tokens in your prompt) and for the number of output tokens (the tokens in the returned completion). E.g., the gpt-3.5-turbo (4K context) model cost $0.0015 per 1,000 input tokens and $0.002 per 1,000 output tokens at the time of writing [1].
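Putting these prices together, a small helper can turn token counts into a dollar estimate. This is a sketch under the July 2023 prices quoted above; the function and constant names are my own, and you should plug in the current prices from OpenAI’s pricing page:

```python
# Prices per 1,000 tokens for gpt-3.5-turbo (4K context), as of July 2023 [1]
PRICE_PER_1K_INPUT = 0.0015
PRICE_PER_1K_OUTPUT = 0.002

def estimate_chat_cost(n_input_tokens: int, n_output_tokens: int) -> float:
    """Estimate the cost in USD of a single chat completion call."""
    return (n_input_tokens / 1000 * PRICE_PER_1K_INPUT
            + n_output_tokens / 1000 * PRICE_PER_1K_OUTPUT)

# 23 prompt tokens plus a completion capped at 100 tokens via max_tokens:
print(f"${estimate_chat_cost(23, 100):.7f}")
```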

That’s why you need to control the number of output tokens in addition to managing the length of your input text to avoid unexpected costs. You can control the number of output tokens via the optional but highly recommended max_tokens parameter.

Optional step: Decode tokens

Another advantage of BPE is that it is reversible. If you want to decode a list of tokens back to text, you can use the .decode_single_token_bytes() method shown below:

decoded_text = [encoding.decode_single_token_bytes(token) for token in tokens]
[b"Alice", b" has", b" a", b" par", b"rot", b".", b"\n", b"What", b" animal", b" is", b" Alice", b"'s", b" pet", b"?", b"\n", b"Alice", b"'s", b" pet", b" is", b" a", b" par", b"rot", b"."]

Summary

This article showed how you can easily calculate the number of tokens in your input text (text to embed or prompt) with the tiktoken library before calling the OpenAI API endpoint. By including this step in your coding practice, you will get a feeling for the resulting API costs. Additionally, we discussed that you should also use the max_tokens parameter when using an LLM that will output a completion to avoid unexpected costs.



Find me on LinkedIn, Twitter, and Kaggle!

References

Image References

If not otherwise stated, all images are created by the author.

Web & Literature

[1] OpenAI (2023). Pricing (accessed July 31st, 2023)

[2] OpenAI (2023). Tokenizer (accessed July 31st, 2023)

[3] OpenAI on GitHub (2023). OpenAI Cookbook (accessed July 31st, 2023)

[4] OpenAI on GitHub (2023). tiktoken (accessed July 31st, 2023)

