How to Validate Your JSON Using JSON Schema

A Gentle Introduction to JSON Schema In Python

Sivan Biham
Towards Data Science


Photo by Ferenc Almasi on Unsplash

Imagine the following scenario: you and your teammate are working on a new feature. Your part is to create a JSON with some results and send it to your teammate. Her part is to take this JSON, parse it and save it in the database. You verbally agreed on what the keys and types should be and each one of you implemented their part. Sounds legit, and it will indeed work if the JSON structure is simple. But one day you had a bug and sent the wrong key. You learned your lesson and decided to create an API and document it in your team's favorite documentation platform. Now you can both take a look at this API to make sure you implemented it correctly.

But is it enough? Let’s say you both implemented it correctly. Then another teammate makes a change, and the code now returns an array of numbers instead of a single number. That teammate is not aware of your API documentation, and everything breaks.

What if you could validate your JSON directly in the code, before sending it and before parsing it? That is what we have JSON Schema for!

In this post, I will introduce JSON Schema, explain why it is so powerful, and show how we can use it in different scenarios.

What is JSON Schema?

JSON Schema is a JSON-based format for defining the structure of JSON data. It provides a contract for what JSON data is required for a given application and how to interact with it. It can be used for validation, documentation, hyperlink navigation, and interaction control of JSON data.

The schema can be defined in a JSON file and loaded into your code, or it can be created directly in the code.
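As a rough illustration of those two options, the sketch below builds a schema as a Python dict and also round-trips it through a JSON file. The file name `product.schema.json` and the variable names are hypothetical, and a temporary directory is used only to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path

# Option 1: build the schema directly in code as a plain dict.
schema_in_code = {
    "type": "object",
    "properties": {
        "price": {"type": "number"},
        "name": {"type": "string"},
    },
}

# Option 2: keep the schema in a JSON file and load it when needed.
# The file is written here first only so the sketch can run on its own.
with tempfile.TemporaryDirectory() as tmp_dir:
    schema_path = Path(tmp_dir) / "product.schema.json"  # hypothetical name
    schema_path.write_text(json.dumps(schema_in_code, indent=2), encoding="utf-8")

    with schema_path.open(encoding="utf-8") as schema_file:
        schema_from_file = json.load(schema_file)

# Both routes yield the same dict, so the validation code
# does not need to know where the schema came from.
print(schema_from_file == schema_in_code)
```

Keeping the schema in a file makes it easier to share between services; keeping it in code is convenient for small, local checks.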

How to validate our JSON?

Easy!

validate(instance=your_json, schema=schema)

For example:

>>> from jsonschema import validate

>>> # A sample schema, like what we'd get from json.load()
>>> schema = {
...     "type" : "object",
...     "properties" : {
...         "price" : {"type" : "number"},
...         "name" : {"type" : "string"},
...     },
... }

>>> # If no exception is raised by validate(), the instance is valid.
>>> validate(instance={"name" : "Eggs", "price" : 34.99}, schema=schema)

>>> validate(
...     instance={"name" : "Eggs", "price" : "Invalid"}, schema=schema,
... )
Traceback (most recent call last):
...
ValidationError: 'Invalid' is not of type 'number'

Why should I use JSON Schema?

Each JSON object has a basic key-value structure. The key is a string, and the value can be of any type: number, string, array, another JSON object, etc.

In some cases, the value can be of only a specific type, and in other cases, the value is more flexible. Some keys in our JSON are required, and some of them are optional. There are also more complicated scenarios. For example, if a certain key is present, then a second key must appear too, or the value of one key may depend on the value of another key.

All those scenarios and many more can be tested and validated locally using JSON Schema. By using it, you can validate your own JSON, and make sure it meets the API requirements before integrating with other services.

Simple JSON Schema

In this example, our JSON contains information about dogs.

{
"breed": "golden retriever",
"age": 5,
"weight": 13.5,
"name": "Luke"
}

Let's take a closer look at this JSON’s properties and the requirements we want to enforce on each one:

  • Breed: only three breeds are allowed (golden retriever, Belgian Malinois, and Border Collie). Any other value should fail validation.
  • Age: we want the age to be rounded to whole years, so the value will be represented as an integer. In this example, we also want to limit the maximum age to 15.
  • Weight: can be any non-negative number, int or float.
  • Name: always a string. Can be any string.

Our schema will be:

{
  "type": "object",
  "properties": {
    "breed": {
      "type": "string",
      "enum": ["golden retriever", "Belgian Malinois", "Border Collie"]
    },
    "age": {"type": "integer", "minimum": 0, "maximum": 15},
    "weight": {"type": "number", "minimum": 0},
    "name": {"type": "string"}
  }
}

With this schema, only integer ages between 0 and 15, non-negative weights, and the three listed breeds pass validation.
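To see these rules in action, here is a small sketch that checks a few dogs against the schema in Python. Note that JSON Schema spells the whole-number type `integer`, and that the enum entries must match the data exactly, so the breed is written in the singular here. The helper `check_dog` and the sample records are hypothetical.

```python
from jsonschema import ValidationError, validate

dog_schema = {
    "type": "object",
    "properties": {
        "breed": {
            "type": "string",
            "enum": ["golden retriever", "Belgian Malinois", "Border Collie"],
        },
        "age": {"type": "integer", "minimum": 0, "maximum": 15},
        "weight": {"type": "number", "minimum": 0},
        "name": {"type": "string"},
    },
}


def check_dog(dog):
    """Return None for a valid dog, or the validation message otherwise."""
    try:
        validate(instance=dog, schema=dog_schema)
    except ValidationError as error:
        return error.message
    return None


luke = {"breed": "golden retriever", "age": 5, "weight": 13.5, "name": "Luke"}

print(check_dog(luke))                          # valid record
print(check_dog({**luke, "age": 20}))           # age above the maximum
print(check_dog({**luke, "breed": "poodle"}))   # breed not in the enum
print(check_dog({**luke, "weight": -1}))        # negative weight
```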

Simple Array Schema

We can also validate array values.

For example, we want an array with the following properties: between 2 and 5 items, unique values, strings only.

["a", "b", "c"]

{
  "type": "array",
  "items": {"type": "string"},
  "minItems": 2,
  "maxItems": 5,
  "uniqueItems": true
}
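A quick way to try the array rules from Python is sketched below. The helper name `array_is_valid` is an assumption for this example. When the schema is written as a Python dict, JSON's `true` becomes Python's `True`.

```python
from jsonschema import ValidationError, validate

array_schema = {
    "type": "array",
    "items": {"type": "string"},
    "minItems": 2,
    "maxItems": 5,
    "uniqueItems": True,  # JSON's true is Python's True
}


def array_is_valid(values):
    """Return True if the list satisfies the array schema."""
    try:
        validate(instance=values, schema=array_schema)
    except ValidationError:
        return False
    return True


print(array_is_valid(["a", "b", "c"]))   # 3 unique strings
print(array_is_valid(["a"]))             # too few items
print(array_is_valid(["a", "a"]))        # duplicate values
print(array_is_valid(["a", 1]))          # non-string item
```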

More complex functionality

Required properties

Some of the properties are must-haves and we would like to raise an error if they are missing.

You can add the required keyword.

{
  "type": "object",
  "properties": {
    "breed": {"type": "string"},
    "age": {"type": "integer", "maximum": 15, "minimum": 0}
  },
  "required": ["breed"]
}

In this case, an error will be raised when the “breed” property is missing. Other properties like “age” remain optional.
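The sketch below shows how a missing required property surfaces in Python. The function name `validation_error_for` is hypothetical, and the schema uses the `integer` type name that JSON Schema defines.

```python
from jsonschema import ValidationError, validate

required_schema = {
    "type": "object",
    "properties": {
        "breed": {"type": "string"},
        "age": {"type": "integer", "maximum": 15, "minimum": 0},
    },
    "required": ["breed"],
}


def validation_error_for(instance):
    """Return the validation message, or None if the instance is valid."""
    try:
        validate(instance=instance, schema=required_schema)
    except ValidationError as error:
        return error.message
    return None


print(validation_error_for({"breed": "Border Collie"}))  # age is optional
print(validation_error_for({"age": 3}))                  # breed is missing
```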

Dependent required

The dependentRequired keyword conditionally requires certain properties to be present if a given property is present in an object.

{
  "type": "object",

  "properties": {
    "name": { "type": "string" },
    "credit_card": { "type": "number" },
    "billing_address": { "type": "string" }
  },

  "required": ["name"],

  "dependentRequired": {
    "credit_card": ["billing_address"]
  }
}

In this case, if the “credit_card” property appears, then “billing_address” is required.

One of / Any of

Until now, each property has been of only one type. What if our property can be of several different types?

Example 1: anyOf. To validate against anyOf, the given data must be valid against any (one or more) of the given subschemas.

{
  "anyOf": [
    { "type": "string" },
    { "type": "number", "minimum": 0 }
  ]
}

In this case, our data can be either a string or a number greater than or equal to 0.

Example 2: oneOf. To validate against oneOf, the given data must be valid against exactly one of the given subschemas.

{
  "oneOf": [
    { "type": "number", "multipleOf": 5 },
    { "type": "number", "multipleOf": 3 }
  ]
}

In this case, the data must be a number that is either a multiple of 5 or a multiple of 3, but not both! For example, 10 and 9 are valid, while 15 is rejected because it matches both subschemas.

Summary

JSON Schema is a powerful tool. It enables you to validate your JSON structure and make sure it meets the required API. You can create a schema as complex and nested as you need; all you need are the requirements. You can use it in your code as an additional test or at runtime.
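As one possible shape for the runtime use, the sketch below validates a payload on both sides of the hand-off: before it is serialized and after it is parsed. The schema, the `score` key, and the function names `serialize_results` and `parse_results` are all hypothetical.

```python
import json

from jsonschema import ValidationError, validate

# Hypothetical contract shared by the sender and the receiver.
RESULT_SCHEMA = {
    "type": "object",
    "properties": {"score": {"type": "number"}},
    "required": ["score"],
}


def serialize_results(results):
    """Validate before sending; raises ValidationError if the contract is broken."""
    validate(instance=results, schema=RESULT_SCHEMA)
    return json.dumps(results)


def parse_results(payload):
    """Validate after parsing, before the data goes anywhere else."""
    results = json.loads(payload)
    validate(instance=results, schema=RESULT_SCHEMA)
    return results


payload = serialize_results({"score": 0.87})
print(parse_results(payload))

# A payload that switched to an array of numbers is rejected on arrival.
try:
    parse_results('{"score": [0.87, 0.91]}')
except ValidationError as error:
    print(f"Rejected: {error.message}")
```

The same two functions can be called from a unit test, so the contract is checked both in CI and in production.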

In this post, I introduced the basic structure and mentioned some more complex options. There is much more to explore in the JSON Schema documentation.

I think anyone who works with JSON as part of their job should be familiar with this tool and its options. It has the potential to save you a lot of time and ease your integration process, just by validating your JSON structure. I know it has saved me a lot of time since I started using it.
