
While ChatGPT already yields impressive results with the default settings, there is huge untapped potential in its advanced parameters.
By adjusting settings like Top P, frequency penalty, presence penalty, stop sequences, maximum length, and temperature, we can steer text generation to meet nuanced demands, unlocking new levels of creativity and specificity.
In this article, we explore these advanced settings and learn how to tune them effectively.
Table of Contents
(1) Temperature
(2) Maximum Length
(3) Stop Sequences
(4) Top P
(5) Frequency Penalty
(6) Presence Penalty
Introduction
Using ChatGPT is simple – type a prompt and receive a response. Yet, there are numerous advanced parameters that we can configure to enrich the output generated.
The OpenAI Playground lets us interact with language models while offering various configuration options, as shown below:

These advanced settings can also be configured in code via the API:
import openai

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user",
         "content": ""}
    ],
    temperature=0,
    max_tokens=100,
    stop=["goodbye"],
    top_p=0.5,
    frequency_penalty=0,
    presence_penalty=0,
)
Aside from the self-explanatory mode and model parameters, let’s take a deep dive into the other six parameters.
(1) Temperature
Temperature controls the degree of randomness in the responses and its value ranges between 0 and 2.
At zero, the outputs are more predictable and will stick closely to the most likely words. If we want consistent answers, a zero temperature is ideal, especially when using these models for grounded responses.
If we want responses that are more creative and unpredictable, we can increase the temperature. Consider the following sentence:
"The cat sat on the mat and started to ___"
- At a low temperature (e.g., 0), the model will choose a highly probable word like "purr" or "sleep."
- At a medium temperature (e.g., 1), the model might introduce slightly less expected yet reasonable words, such as "groom" or "stretch."
- At a high temperature (e.g., 2), the model may generate more diverse and less predictable outcomes, such as "contemplate" or "brainstorm."
At higher temperatures, the model is more inclined to take risks, leading to a wider variety of possible completions. However, high temperatures may lead to nonsensical output, as shown below:

From a technical perspective, a higher temperature flattens the probability distribution, making the typically less-common tokens relatively more likely to be generated. On the other hand, a lower temperature sharpens the distribution such that the more common tokens have an even higher probability of being generated.
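To see this numerically, here is a minimal sketch of how temperature rescales a probability distribution before sampling. The candidate words and logit values are made up for the "cat sat on the mat" example:

import numpy as np

def softmax_with_temperature(logits, temperature):
    # Dividing logits by the temperature before normalizing flattens the
    # distribution at high temperatures and sharpens it at low ones.
    scaled = np.array(logits, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

# Hypothetical logits for completions of "The cat sat on the mat and started to ___"
candidates = ["purr", "sleep", "stretch", "contemplate"]
logits = [5.0, 4.5, 3.0, 1.0]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", dict(zip(candidates, probs.round(3))))

A temperature of exactly 0 corresponds to greedy decoding (always picking the single most likely token), so it is handled as a special case rather than by dividing by zero.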
(2) Maximum Length
The maximum length relates to the maximum number of tokens that will be generated.
For English text, 1 token is roughly 0.75 words (or 4 characters). Check out OpenAI’s token counter to calculate the number of tokens in your text.
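If you prefer counting tokens in code, OpenAI’s tiktoken library (a separate pip install) can be used; a minimal sketch:

import tiktoken  # pip install tiktoken

# Load the tokenizer that GPT-4 uses and count the tokens in a sample prompt.
encoding = tiktoken.encoding_for_model("gpt-4")
prompt = "Generate an SMS marketing message for a local bakery in London."
print(len(encoding.encode(prompt)), "tokens")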
One caveat is that the maximum length includes the input prompt. If we set the maximum length at 1,000 and our input has 300 tokens, the output will be capped at 1,000 – 300 = 700 tokens.
Furthermore, the upper limit to the maximum length is model-specific. For example, the GPT-4 model can reach 8,191 tokens.
With this, we can generate responses that fit within custom token limits for different use cases. An example is marketing campaigns where we create SMS messages within the 160-character (~40 tokens) limit.
Suppose we have the following 20-token prompt:
"Generate an SMS marketing message for a local bakery in London called Delights that is offering a discount."
To generate a message that fits the SMS limit, we set the maximum length as 40 + 20 = 60 tokens. After entering the prompt, we get this concise SMS message:
Hey! Delicious savings at Delights Bakery, London!
Enjoy 20% off exquisite pastries & delightful bread.
Pop in & make your day a bit sweeter. Hurry, offer ends soon!
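As a rough sketch, the same request with the legacy openai SDK shown earlier might look like this (assuming the client is already configured with an API key):

# max_tokens caps the length of the generated SMS message.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Generate an SMS marketing message for a local bakery "
                          "in London called Delights that is offering a discount."}],
    max_tokens=60,
)
print(response["choices"][0]["message"]["content"])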
(3) Stop Sequences
The "stop sequences" parameter instructs the model to halt generation upon reaching a certain string. This is useful when we want the output to end at specific points, ensuring that the response is concise and omits unwanted information.
Suppose we have the following output after asking ChatGPT to generate a resignation letter template:
Subject: Resignation Notice
Dear [Manager's Name],
I hereby resign from my position at [Company Name],
effective [Last Working Day, typically two weeks from the date of the email].
Best,
[Your Name]
If we want to exclude the sign-off at the bottom, we can set the string "Best" as one of the stop sequences. By doing so, the regenerated output will cut off at the stop sequence, as seen below:

The returned output excludes the stop sequence itself, and up to four stop sequences can be defined for each request.
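In code, the same idea looks like this minimal sketch with the legacy openai SDK (the prompt wording is illustrative):

# Generation halts as soon as the model is about to emit "Best",
# and the stop sequence itself is not included in the returned text.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Generate a resignation letter template."}],
    stop=["Best"],
)
print(response["choices"][0]["message"]["content"])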
(4) Top P
Top P is associated with the top-p sampling technique (aka nucleus sampling). As a recap, GPT models generate the next word by assigning probabilities to all possible next words in its vocabulary.
With top-p sampling, instead of considering the entire vocabulary, the next word will be sampled from a smaller set of words that collectively have a cumulative probability above the Top P value.
Top P ranges from 0 to 1 (default), and a lower Top P means the model samples from a narrower selection of words. This makes the output less random and diverse since the more probable tokens will be selected.
For instance, if Top P is set at 0.1, only tokens comprising the top 10% probability mass are considered.
_Given that Top P impacts output randomness, OpenAI recommends adjusting either Top P or temperature, but not both. Nonetheless, there is no harm in experimenting with tuning both._
The following shows the outputs for different Top P values based on this prompt:
"Write a wildly creative short synopsis about a whale"

The example above shows that the output from a lower Top P of 0.01 appears less creative and fanciful in its description.
Technical Details
If Top P is set to 0.1, it does not strictly mean that tokens in the top 10% probability mass are considered. Rather, the model finds the smallest set of most probable tokens whose cumulative probability exceeds 10%.
It starts from the most probable token and adds others in descending probabilities until the Top P is met. In some cases, this could involve many tokens if no single token has a very high probability and the distribution is relatively flat.
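A minimal sketch of this selection process, using a made-up four-word vocabulary and hypothetical probabilities:

import numpy as np

rng = np.random.default_rng()

def top_p_sample(probs, top_p):
    # Sort tokens by probability (descending), keep the smallest set whose
    # cumulative probability exceeds top_p (the "nucleus"), then sample
    # from that renormalized nucleus.
    probs = np.array(probs, dtype=float)
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=nucleus_probs)

# Hypothetical next-token distribution over a tiny vocabulary
vocab = ["whale", "ocean", "dream", "zeppelin"]
probs = [0.55, 0.30, 0.10, 0.05]
print(vocab[top_p_sample(probs, top_p=0.5)])   # only "whale" makes the nucleus here
print(vocab[top_p_sample(probs, top_p=0.95)])  # the nucleus now includes rarer words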
(5) Frequency Penalty
The frequency penalty addresses a common problem in text generation: repetition. By applying penalties to frequently appearing words, the model is encouraged to diversify language use.
Positive frequency penalty values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood of repeating the same line verbatim.
Based on API documentation, the frequency penalty ranges from -2 to 2 (default 0). However, the range available on the Playground is 0 to 2. We shall follow the API documentation’s range.
The following shows the outputs for different frequency penalties based on this prompt:
"Write a poem where every word starts with Z"

The example above shows that a larger frequency penalty leads to fewer repeated words and greater diversity, such that we even get words that do not begin with ‘Z.’
Reasonable values for the frequency penalty are around 0.1 to 1. We can increase it further to suppress repetition strongly, but it can degrade output quality. Negative values can also be set to increase repetition instead of reducing it.
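To reproduce this comparison in code, a small sketch with the legacy openai SDK (looping over a few penalty values) might look like:

# Higher frequency_penalty discourages the model from reusing tokens
# it has already generated in the current response.
for penalty in (0, 1.0, 2.0):
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": "Write a poem where every word starts with Z"}],
        frequency_penalty=penalty,
    )
    print(f"--- frequency_penalty={penalty} ---")
    print(response["choices"][0]["message"]["content"])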
(6) Presence Penalty
Like the frequency penalty, the presence penalty aims to reduce token repetition.
Positive presence penalty values penalize new tokens based on whether they have appeared in the text so far, increasing the model’s likelihood of talking about new topics.
Based on API documentation, the presence penalty ranges from -2 to 2 (default 0), whereas the range on the Playground is 0 to 2.
What is the difference between a frequency penalty and a presence penalty?
The subtle difference lies mainly in the degree of penalty on the repeated tokens. The frequency penalty is proportional (i.e., a relative marker) to how often a particular token has been generated.
On the other hand, the presence penalty is a once-off (additive) penalty applied to a token that has appeared at least once, like a Boolean (1/0) marker.
The impact of these penalties is seen in the following equation for the logit (unnormalized log probability) μ[j] of a token j:

μ[j] → μ[j] − c[j] · α_frequency − 1[c[j] > 0] · α_presence

Here, c[j] refers to how often a token has been generated previously, and the α values are the penalty coefficients (i.e., between -2 and 2).
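In code, this adjustment can be sketched as follows (the variable names are illustrative, not taken from OpenAI’s implementation):

def apply_penalties(logit, count, alpha_frequency, alpha_presence):
    # The frequency penalty scales with how often the token has already
    # appeared (count), while the presence penalty is a one-off deduction
    # applied as soon as count > 0.
    return logit - count * alpha_frequency - (1 if count > 0 else 0) * alpha_presence

# A token that has already appeared 3 times, with both penalties set to 0.5:
print(apply_penalties(logit=2.0, count=3, alpha_frequency=0.5, alpha_presence=0.5))  # 2.0 - 1.5 - 0.5 = 0.0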
Reasonable values for the presence penalty are the same as described for the frequency penalty.
Wrapping It Up
After understanding what each parameter does, we can tweak these advanced settings more confidently to meet our needs.
Tuning these parameters is a delicate blend of art and science, so it is recommended to play around with different configurations to see what works best for various use cases.
Before you go
I welcome you to join me on a journey of Data Science discovery! Follow this Medium page and visit my GitHub to stay updated with more engaging and practical content. Meanwhile, have fun experimenting with ChatGPT’s advanced settings!