
How to Integrate the Microsoft Translator API in Your Code

A comprehensive, beginner-friendly guide

Photo from Unsplash courtesy of Edurne.

There are many good translation services out there, but one of the most versatile and easiest to set up is Microsoft Translator [1], which gives you access to translators for a multitude of low- and high-resource languages for free (subject to some monthly translation limits).

In this tutorial, I’ll go over how to set up a Translator instance on Azure and how to write an interface that connects to it from your code, following best practices. If you are familiar with Azure and already have a Translator instance set up, then visit the project repository directly for access to the code.

This tutorial will cover:

  1. Setting up Azure Translator Instance
  2. Sending Your First Translation Request
  3. Cleaning Up Your Code and Structuring Your Project
  4. Considerations for Using Jupyter Notebooks
  5. Conclusion

Setting up a Translator Instance on Azure

Creating an Azure Account

The first step is to create a Microsoft Azure account. This will require you to have:

  • A valid address
  • A valid email account
  • A valid phone number
  • A valid credit or debit card*

Once you’ve made an account, you will be asked if you want to use the free service or the pay-as-you-go subscription. Go for the free service; you can always move to the pay-as-you-go subscription later if you think that is more suitable for you**. Azure will aggressively try to switch you to the pay-as-you-go service, but you can always stick to the free service.

*Note: the credit/debit card is used to verify who you are. No funds are taken if you are using the free tier account.

**Note: For more information on the pricing of the translator service, visit the Translator Pricing documentation.

Setting Up the Translator API

Once you’ve logged into Azure, click "Create a resource" then search "translator". Finally, click on the translator service and click create.

Once you’ve done that, you’ll find a page that requires a number of parameters:

  • Resource Group: a name to collect multiple resources that belong to the same project. This controls how you are billed if you go for a non-free subscription. Name this something that is relevant to your project.
  • Region*: the region where your instance is running. This is related to how Microsoft manages resources and disaster recovery. The recommended region is Global.
  • Name: the name of your translator service. For translation purposes, this has no effect, but if you need document translation then it will affect the name of your resource’s endpoint.
  • Pricing Tier*: go for the free version to start.

Once filled, click create. Azure will run a simple validation and take you to another page where you can confirm the creation of your resource.

*Note: you cannot have multiple instances of the translator with the same Region and Pricing Tier. For example, if you have a free tier instance with Region as East US, to add another free instance you need to change the Region.

Finding Information About Your Resource

By default, Azure will take you to the resource that you created. However, the next time you log in you’ll have to find it yourself. You can do this from the home page by clicking on the Translator icon. This will take you to the translator page where you can find all your instances.

Clicking on your instance will take you to the instance page, where you can find all its relevant configurations and details. These will become relevant in the next section.

For now, you can use your translation instance in the browser to get an idea of how the input text is represented, and what the output text looks like:


Send Your First Translation Request

Azure gives you sample code that you can copy to make your first translation request. However, if you are unfamiliar with how requests work then you may struggle to understand what it’s doing, and so you won’t be able to use it effectively in your code. Here I’ll go step by step through the concepts involved in making your first translation request.

Brief Intro to HTTPS Requests

Before writing the code, it’s worth covering a couple of concepts related to the translation API.

  • The API has a URL which allows you to access it. For the Microsoft Translator, this is a public URL:

Azure Cognitive Services Translator documentation – quickstarts, tutorials, API reference – Azure…

  • The API has endpoints (these are like paths on a URL) that you send HTTPS requests to. For example, the most basic endpoint is the languages endpoint, which simply returns all the languages that you can choose from. It is a GET endpoint because it "gets" resources or data from the API.
  • Each endpoint has parameters that specify what you are asking from the endpoint. For instance, the languages endpoint has a parameter api-version, which indicates which version of the translator you’re using.

For example, the complete URI for the languages endpoint using version 3.0 of the Microsoft API is as follows:

https://api.cognitive.microsofttranslator.com/languages?api-version=3.0

You can call the languages endpoint in Python using the requests module:

import requests
microsoft_api_url = 'https://api.cognitive.microsofttranslator.com/languages?api-version=3.0'
resp = requests.get(microsoft_api_url)
print(resp.json())

This sends an HTTPS request to the API to retrieve the languages available for version 3.0 of the translator. In fact, because this endpoint is public, you can copy and paste that URL into your browser* to get the same output that you would in code:

https://api.cognitive.microsofttranslator.com/languages?api-version=3.0

*Note: in the background, your browser is sending a GET request to the URL and showing you the output.

You can find more info on the endpoints available in the official API documentation.
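To get a feel for what comes back, here is a small sketch that pulls language codes and names out of a languages response. The sample_response dictionary is a hand-written, heavily abbreviated stand-in that assumes the response groups languages under a top-level "translation" key; the helper name list_translation_languages is my own and not part of the API.

```python
# Abbreviated, hand-written stand-in for resp.json() (assumed shape, not real output)
sample_response = {
    "translation": {
        "ar": {"name": "Arabic", "nativeName": "العربية", "dir": "rtl"},
        "de": {"name": "German", "nativeName": "Deutsch", "dir": "ltr"},
    }
}

def list_translation_languages(languages_json):
    """Return {language_code: english_name} for languages that can be translated."""
    translation_block = languages_json.get("translation", {})
    return {code: info["name"] for code, info in translation_block.items()}

languages = list_translation_languages(sample_response)
print(sorted(languages))
```

The codes returned here are the values you would later pass to the translate endpoint as target languages.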

The translate Endpoint

This is the endpoint that allows us to translate text. Unlike the languages endpoint, this is a post request and not a get request. This means that you are sending some data over to generate an output. You are not merely "getting" a resource. You send the data as part of a request body. These are bytes of data that are transmitted as part of your request, but they are different to parameters in that they don’t get appended to the URI path.

The translate endpoint has the following requirements:

Parameters

  • api-version (required): version of the translator you want to use
  • to (required): ISO 639–1 language code(s) of the language(s) you want to translate your text(s) into

Request Body

  • Array of texts that you want to translate in the following format: {"text": "This is a sentence I want to translate"}
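To make the difference between parameters and the request body concrete, the sketch below builds a body in the format above and serialises it to bytes with the standard json module. This is roughly what the requests module does for you when you pass json=...; the helper name build_body is an assumption for illustration.

```python
import json

def build_body(texts):
    """Wrap each text in the {"text": ...} structure the translate endpoint expects."""
    return [{"text": text} for text in texts]

body = build_body(["First sentence", "Second sentence"])

# The body travels as bytes inside the request, not as part of the URL
payload = json.dumps(body).encode("utf-8")
print(payload)
```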

In Python, you can send POST requests as follows. I’ve deliberately added two translation languages to show you how multiple parameters of the same name are added to a request URL.

import requests
body = [{"text": "First sentence I want to translate"}, {"text": "Second sentence I want to translate"}]
api_version = "3.0"
german_iso_code = 'de'
arabic_iso_code = 'ar'
endpoint = 'translate'

url = f'https://api.cognitive.microsofttranslator.com/{endpoint}?api-version={api_version}&to={german_iso_code}&to={arabic_iso_code}'

resp = requests.post(url, json=body)
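If hand-writing the query string feels error-prone, the standard library can build it for you. This is a sketch using urllib.parse.urlencode with doseq=True, which repeats a parameter once per list item; the helper name build_translate_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.cognitive.microsofttranslator.com"

def build_translate_url(target_languages, api_version="3.0"):
    """Build the translate URL, repeating the `to` parameter for each language."""
    params = {"api-version": api_version, "to": target_languages}
    # doseq=True expands the list into to=de&to=ar instead of encoding it as one value
    return f"{BASE_URL}/translate?{urlencode(params, doseq=True)}"

print(build_translate_url(["de", "ar"]))
```

The requests module performs the same expansion if you pass params={'api-version': '3.0', 'to': ['de', 'ar']} instead of building the URL yourself.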

Managing Access

Now, if you ran the above code then you should have gotten an error. This is because we cannot simply send the POST request on its own. We need authentication.

This is why we needed to create the account and translator instance in the first place.

Attached to your instance is a unique key that allows Microsoft to:

a) verify that the request you are sending is coming from a source that has an Azure account

b) calculate your usage of the service, for billing or restriction purposes*

*Note: remember that on the free version, while you are not billed, you are subject to a certain number of translations that you can make per month.

This unique key can be communicated to Microsoft using request headers. These are a key concept in HTTP. They can tell the server the following information about your request:

  • IP address and port
  • Type of data to expect
  • Authentication details

The translator API requires the following items in the header:

  • Subscription Key: this is the authentication key that tells Microsoft that you are authorised to use the service. It is tied to the translator resource that you created in the beginning of the tutorial.
  • Subscription Region: this is the region where your project exists.
  • Content Type: the type of data that is being sent
  • Client Trace ID: a client-generated GUID that uniquely identifies the request, which is why the code below creates one with uuid. You can read more about this here.

You can find your subscription key on the Azure project page:

In the "Keys and Endpoint" page, you can find two API keys (either of which can be used to authenticate you).

Finally, you can define the headers and add them to the post request you created above, to get a successful translation output:


# code as before, new additions enclosed in ------

import requests
body = [{"text": "First sentence I want to translate"}, {"text": "Second sentence I want to translate"}]
api_version = "3.0"
german_iso_code = 'de'
arabic_iso_code = 'ar'
endpoint = 'translate'

### -----------------------------------------------------
import uuid

# YOUR PROJECT CREDENTIALS
your_key = "your_key_keep_this_a_secret"
your_project_location = "your_project_location"

# headers
headers = {
  'Ocp-Apim-Subscription-Key': your_key,
  'Ocp-Apim-Subscription-Region': your_project_location,
  # default values
  'Content-type': 'application/json',
  'X-ClientTraceId': str(uuid.uuid4())
}
### -----------------------------------------------------

url = f'https://api.cognitive.microsofttranslator.com/{endpoint}?api-version={api_version}&to={german_iso_code}&to={arabic_iso_code}'

resp = requests.post(
  url,
  headers=headers,  # add the headers
  json=body
)
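A successful call returns JSON that you can read with resp.json(). As a sketch of how to unpack it, the sample below assumes one entry per input text, each holding a "translations" list of {"text", "to"} items; the sample values are placeholders rather than real translator output, and collect_translations is a hypothetical helper.

```python
# Hand-written placeholder mimicking resp.json() for two texts and two languages
sample_response = [
    {"translations": [{"text": "<german 1>", "to": "de"}, {"text": "<arabic 1>", "to": "ar"}]},
    {"translations": [{"text": "<german 2>", "to": "de"}, {"text": "<arabic 2>", "to": "ar"}]},
]

def collect_translations(response_json):
    """Return one {language_code: translated_text} dict per input text."""
    return [
        {item["to"]: item["text"] for item in entry["translations"]}
        for entry in response_json
    ]

for translations in collect_translations(sample_response):
    print(translations)
```

The entries come back in the same order as the texts in the request body, so the first dictionary belongs to the first sentence you sent.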

Your API keys are what allow you to use the service. These must never get leaked, and it is a good idea to regenerate them every couple of months. In the next section, we will cover best practices for decreasing the chances of leaks.


Cleaning Up the Code and Structuring Your Project

This section will get into good software development practices for integrating the Microsoft Translate API functionality within your code and projects. We will cover:

  • Directory structure
  • How to hide credentials
  • How to package the requests into functions and add basic logging
  • How to add informative documentation

Directory Structure

When developing an application, you may be interacting with multiple external APIs. As such, it is a good practice to store functionality for external APIs in separate files and then call them in your main application code. I recommend having all the external APIs in a subfolder called ‘external_apis’ under your package, and separate Python files that include functions for calling each API. I also recommend adding a config.py file within the external_apis subfolder to add configurations for your external APIs.

Hiding Credentials Using Environment Variables

Remember: you should never leak your API keys. If they do leak, regenerate them straight away.

Yet, you need them in order to make translation requests. In general, you should avoid (in decreasing order of severity):

  • Hard coding the key in your code: even if you host your code privately, the key will always be available in commit histories.
  • Printing your key (anywhere): less risky, but having print statements increases the likelihood that your key is pushed to GitHub as part of Jupyter Notebook outputs or stored in server logs.
  • Saving your key in configuration files: far less risky, as pushing configuration files by accident is unlikely, and .gitignore can make it near impossible. However, there is still a better method.

The best method for using credentials in your code is to use environment variables. When set with export, these are session-based variables, meaning that they only exist for the duration of the terminal session in which you run your code, thus greatly minimising human error.

To do this, we can make use of the config.py file:

import os

MICROSOFT_TRANSLATE_API_KEY = os.environ.get('MICROSOFT_TRANSLATE_API_KEY', 'default_key')

With this, by default our key takes the value "default_key". We’d need to explicitly set it prior to running any code using the terminal:

python -c "from package_name.external_apis.config import MICROSOFT_TRANSLATE_API_KEY; print(MICROSOFT_TRANSLATE_API_KEY)"

export MICROSOFT_TRANSLATE_API_KEY="your_actual_key"

python -c "from package_name.external_apis.config import MICROSOFT_TRANSLATE_API_KEY; print(MICROSOFT_TRANSLATE_API_KEY)"
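One drawback of a placeholder default is that a forgotten export only shows up later as a failed request. A possible alternative, sketched here with a hypothetical helper get_required_env, is to raise an error as soon as the variable is missing:

```python
import os

def get_required_env(name):
    """Return the value of an environment variable, or fail loudly if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Demonstrate the failure path with a variable name that is assumed to be unset
try:
    get_required_env("SOME_UNSET_VARIABLE_FOR_DEMO")
except RuntimeError as err:
    print(err)
```

Whether you prefer this over a default value is a design choice: failing early is stricter, while a default lets the package import cleanly on machines that never call the API.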

If you want to be extra cautious, you can add extra levels of abstraction to the API key to make it difficult to accidentally extract its value. For example, you can create a class Password, store the password as a hidden variable, and then add an explicit "get_password" method:

import os

class Password:
  def __init__(self, password):
    self.__password = password

  def get_password(self):
    return self.__password

MICROSOFT_TRANSLATE_API_KEY_CLASS = Password(os.environ.get('MICROSOFT_TRANSLATE_API_KEY', 'default_key'))

print(MICROSOFT_TRANSLATE_API_KEY_CLASS.get_password())  # prints password
# the following lines raise AttributeError, so they are left commented out
# print(MICROSOFT_TRANSLATE_API_KEY_CLASS.password)
# print(MICROSOFT_TRANSLATE_API_KEY_CLASS.__password)

This way, you call the get_password method when defining the headers for the request.
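Building on the same idea, you could also override __repr__ so that an accidental print or log of the object shows a mask instead of the key. This is a sketch of one option, with a hypothetical class name MaskedSecret; it reduces accidents but does not make the value truly inaccessible in Python.

```python
class MaskedSecret:
    """Holds a secret and hides it from print(), repr() and f-strings."""

    def __init__(self, secret):
        self.__secret = secret

    def get_secret(self):
        # the only intended way to read the value
        return self.__secret

    def __repr__(self):
        # __str__ falls back to __repr__, so print() is masked too
        return "MaskedSecret(****)"

key = MaskedSecret("example_value")
print(key)  # MaskedSecret(****)
```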

Packaging Your Code Into Functions and Adding Logging

Now that we are aware of the basics, we make some improvements:

  • Add all identifiers for the Microsoft Translator API in the config.py file
"""
config.py file
"""
import os

# MICROSOFT API CONFIGS
MICROSOFT_TRANSLATE_URL = 'https://api.cognitive.microsofttranslator.com'
MICROSOFT_TRANSLATE_LOCATION = os.environ.get('MICROSOFT_TRANSLATE_LOCATION', 'default_location')
MICROSOFT_TRANSLATE_API_KEY = os.environ.get('MICROSOFT_TRANSLATE_API_KEY', 'default_key')

Here we have also added the location of your instance as an environment variable.

  • Add separate functions for each endpoint
"""
microsoft.py file
"""

import uuid
import requests
from package_name.external_apis.config import (
  MICROSOFT_TRANSLATE_URL,
  MICROSOFT_TRANSLATE_LOCATION,
  MICROSOFT_TRANSLATE_API_KEY
)

# -- prepare headers
HEADERS = {
  'Ocp-Apim-Subscription-Key': MICROSOFT_TRANSLATE_API_KEY,
  'Ocp-Apim-Subscription-Region': MICROSOFT_TRANSLATE_LOCATION,
  'Content-type': 'application/json',
  'X-ClientTraceId': str(uuid.uuid4())
}

# -- utils
def _is_response_valid(status_code):
    if str(status_code).startswith('2'):
        return True

# -- functions for endpoints

# /languages endpoint
def get_languages(api_version='3.0'):

    # prepare url
    url = f'{MICROSOFT_TRANSLATE_URL}/languages?api-version={api_version}'

    # send request and process outputs
    resp = requests.get(url)
    status_code = resp.status_code
    if _is_response_valid(status_code):
        return resp.json(), status_code

    return resp.text, status_code

# /translate endpoint
def translate_text(text, target_language, source_language=None, api_version='3.0'):

    # prepare url
    url = f'{MICROSOFT_TRANSLATE_URL}/translate?api-version={api_version}'

    # standardise target language type
    if isinstance(target_language, str):
        target_language = [target_language]

    # dynamically add array parameter to url
    for lang in target_language:
        url = f'{url}&to={lang}'

    if source_language:
        url = f'{url}&from={source_language}'

    # standardise text type
    if isinstance(text, str):
        text = [text]

    # dynamically build the request body
    body = [{'text': text_} for text_ in text]

    # send request and process outputs
    resp = requests.post(url, headers=HEADERS, json=body)
    status_code = resp.status_code

    if _is_response_valid(status_code):
        return resp.json(), status_code

    return resp.text, status_code
  • Add logging and documentation using typing and sphinx style docstrings
"""
microsoft.py file
"""

import uuid
import logging
import requests
from package_name.external_apis.config import (
  MICROSOFT_TRANSLATE_URL,
  MICROSOFT_TRANSLATE_LOCATION,
  MICROSOFT_TRANSLATE_API_KEY
)

# imports for typing annotations
from typing import Optional, Union, List

# -- configure logger. Taken from official python docs
LOGGER = logging.getLogger(__name__)
LOGGER.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
date_format = '%Y-%m-%d %H:%M:%S'
formatter = logging.Formatter('%(asctime)s:%(name)s:%(levelname)s:%(message)s', datefmt=date_format)
ch.setFormatter(formatter)
LOGGER.addHandler(ch)

# -- prepare headers
HEADERS = {
  'Ocp-Apim-Subscription-Key': MICROSOFT_TRANSLATE_API_KEY,
  'Ocp-Apim-Subscription-Region': MICROSOFT_TRANSLATE_LOCATION,
  'Content-type': 'application/json',
  'X-ClientTraceId': str(uuid.uuid4())
}

# -- utils
def _is_response_valid(status_code: int) -> Optional[bool]:
    """ Function to check response is valid or not

    :param status_code: status code from response
    :returns: True if valid response, None otherwise
    """
    if str(status_code).startswith('2'):
        return True

# -- functions for endpoints

# /languages endpoint
def get_languages(api_version: str = '3.0') -> tuple:
    """ get languages available from API for specific version

    :param api_version: version of API to use
    :returns: (available languages, status_code)

    """
    # prepare url
    url = f'{MICROSOFT_TRANSLATE_URL}/languages?api-version={api_version}'

    # send request and process outputs
    LOGGER.info(f'Getting languages available on api_version={api_version}')
    resp = requests.get(url)
    status_code = resp.status_code
    if _is_response_valid(status_code):
        return resp.json(), status_code

    LOGGER.error('Failed to get languages')
    return resp.text, status_code

# /translate endpoint
def translate_text(text: Union[str, List[str]], target_language: Union[str, List[str]], source_language: Optional[str] = None, api_version: str = '3.0') -> tuple:
    """translates text using the microsoft translate API

    :param text: text to be translated. Either single or multiple (stored in a list)
    :param target_language: ISO format of target translation languages
    :param source_language: ISO format of source language. If not provided is inferred by the translator, defaults to None
    :param api_version: api version to use, defaults to "3.0"
    :return: for successful response, ([{"translations": [{"text": translated_text_1, "to": lang_1}, ...]}, ...], status_code)
    """
    # prepare url
    url = f'{MICROSOFT_TRANSLATE_URL}/translate?api-version={api_version}'

    # standardise target language type
    if isinstance(target_language, str):
        target_language = [target_language]

    # dynamically add array parameter to url
    for lang in target_language:
        url = f'{url}&to={lang}'

    if source_language:
        url = f'{url}&from={source_language}'

    # standardise text type
    if isinstance(text, str):
        text = [text]

    # dynamically build the request body
    body = [{'text': text_} for text_ in text]

    LOGGER.info(f'Translating {len(text)} texts to {len(target_language)} languages')
    # send request and process outputs
    resp = requests.post(url, headers=HEADERS, json=body)
    status_code = resp.status_code

    if _is_response_valid(status_code):
        return resp.json(), status_code
    LOGGER.error('Failed to translate texts')
    return resp.text, status_code
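
One thing to be aware of in the module above: HEADERS is built once at import time, so every request made in the same session carries the same X-ClientTraceId. If you would rather have a fresh ID per request, one option is to build the headers in a function. This is a sketch; build_headers is a hypothetical helper, and the key and location are passed in as arguments so the example stays self-contained.

```python
import uuid

def build_headers(api_key, location):
    """Return request headers with a new trace ID on every call."""
    return {
        'Ocp-Apim-Subscription-Key': api_key,
        'Ocp-Apim-Subscription-Region': location,
        'Content-type': 'application/json',
        'X-ClientTraceId': str(uuid.uuid4()),  # regenerated per call
    }

first = build_headers('placeholder_key', 'placeholder_location')
second = build_headers('placeholder_key', 'placeholder_location')
print(first['X-ClientTraceId'] != second['X-ClientTraceId'])
```

Inside translate_text you would then pass headers=build_headers(MICROSOFT_TRANSLATE_API_KEY, MICROSOFT_TRANSLATE_LOCATION) instead of the module-level HEADERS.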

Considerations for Using Jupyter Notebooks

When using Jupyter Notebook, simply setting environment variables on the Terminal is not enough, because by default Jupyter will not be able to see them. Instead here is what I recommend:

  • Append "_jupyter" when setting your environment variables in the Terminal, then run jupyter notebook
export MICROSOFT_TRANSLATE_API_KEY_JUPYTER='my_key'
jupyter notebook
  • Use the python-dotenv package (you may have to install this using pip) to set the correct environment variable by reading the "_jupyter" environment variable. Add the %%capture magic command to ensure that the environment variable is not printed.
%%capture
import os
import json
from dotenv import load_dotenv
load_dotenv() # loads key values pairs into env
MICROSOFT_TRANSLATE_API_KEY = os.environ.get('MICROSOFT_TRANSLATE_API_KEY_JUPYTER')
%set_env MICROSOFT_TRANSLATE_API_KEY=$MICROSOFT_TRANSLATE_API_KEY

You should now be able to authenticate your requests with Microsoft within Jupyter Notebooks.
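To confirm that the key made it into the notebook without ever displaying it, a check along these lines may help. It is a sketch with a hypothetical helper env_status that only reports whether each variable is set:

```python
import os

def env_status(names):
    """Report which environment variables are set, without revealing their values."""
    return {name: bool(os.environ.get(name)) for name in names}

print(env_status(['MICROSOFT_TRANSLATE_API_KEY', 'MICROSOFT_TRANSLATE_LOCATION']))
```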


Concluding Remarks

In this article, we went through setting up a Microsoft Translate instance on Azure and integrating it into projects using best practices.

It’s worth mentioning that while the free version is very good, it is subject to resource limits (2 million characters per month). While that seems like a lot, it runs out pretty quickly. I experienced this recently in a project where I was using the Translate API for data augmentation. Further, there is a limit of 50,000 characters per translation request, which means you have to be very careful when translating larger texts. The size of a request is calculated as follows: total_chars_in_your_texts * n_languages. So in cases where you have larger texts, it makes sense to translate them separately per language or per batch of languages.
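
As a rough illustration of that arithmetic, the sketch below computes the character count of a request and greedily groups texts so that each batch stays under the limit for a given number of languages. The 50,000 figure comes from the paragraph above; the function names and the greedy strategy are my own assumptions, and a single text that exceeds the limit on its own is simply placed in its own batch.

```python
MAX_CHARS_PER_REQUEST = 50000

def request_size(texts, n_languages):
    """Characters counted for one request: total characters times target languages."""
    return sum(len(text) for text in texts) * n_languages

def batch_texts(texts, n_languages, limit=MAX_CHARS_PER_REQUEST):
    """Greedily group texts so each batch's request size stays within the limit."""
    batches, current = [], []
    for text in texts:
        # start a new batch if adding this text would push the request over the limit
        if current and request_size(current + [text], n_languages) > limit:
            batches.append(current)
            current = []
        current.append(text)
    if current:
        batches.append(current)
    return batches

texts = ['a' * 20000, 'b' * 20000, 'c' * 20000]
print(request_size(texts, 2))                    # 120000, over the limit
print([len(b) for b in batch_texts(texts, 1)])   # [2, 1]
print([len(b) for b in batch_texts(texts, 2)])   # [1, 1, 1]
```

Note how doubling the number of target languages halves how much text fits in one request.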

I will be releasing an advanced guide for using the Microsoft API where I’ll introduce functions for automatically batching texts such that you are making the best use of the max char limit. Till then, you can find the code for this article here:

ml-utils/microsoft.py at develop · namiyousef/ml-utils

Author’s Note

If you liked this article or learned something new, please consider getting a membership using my referral link:

Join Medium with my referral link – Yousef Nami

This gives you unrestricted access to all of Medium, while helping me produce more content at no extra cost to you.


Reference List

[1] Microsoft Translator. Available from: https://www.google.com/search?q=microsoft+translator&oq=microsoft+translator&aqs=chrome.0.35i39j69i59l2j0i512l2j69i60l3.2307j0j7&sourceid=chrome&ie=UTF-8

All images by author unless otherwise specified

