Why OpenAI’s API Is More Expensive for Non-English Languages

Beyond words: How byte pair encoding and Unicode encoding factor into pricing disparities

Leonie Monigatti
Towards Data Science
7 min read · Aug 16, 2023

How can it be that the phrase “Hello world” has two tokens in English and 12 tokens in Hindi?
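Part of the answer can be glimpsed before any tokenizer is involved. The following is a minimal sketch, not the article's own code, that compares how many UTF-8 bytes each phrase occupies. The Hindi rendering used here is an assumption, since the phrase itself is not given above, and the helper name `utf8_profile` is hypothetical. Byte counts are not token counts, but a byte-level byte pair encoding starts from these bytes, so a longer byte sequence that is merged less often tends to end up as more tokens.

```python
def utf8_profile(text: str) -> tuple[int, int]:
    """Return (number of Unicode code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


english = "Hello world"
# Assumed Hindi rendering of the phrase; the text above does not spell it out.
hindi = "नमस्ते दुनिया"

for label, phrase in (("English", english), ("Hindi", hindi)):
    code_points, n_bytes = utf8_profile(phrase)
    print(f"{label}: {code_points} code points, {n_bytes} UTF-8 bytes")
```

Each ASCII character takes one byte in UTF-8, while each Devanagari code point takes three, so the Hindi phrase is several times longer in bytes even though it has a similar number of code points.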

After publishing my recent article on how to estimate the cost of OpenAI’s API, I received an interesting comment: a reader had noticed that the OpenAI API is much more expensive in other languages, such as…


Developer Advocate @ Weaviate. Follow for practical data science guides - whether you're a data scientist or not. linkedin.com/in/804250ab