From Zero to Hero with Pydantic

By Wouter Donders, Mon 05 December 2022, in category Blog

api, json, pydantic, python

As a data person, you will undoubtedly use REST APIs to retrieve data in JSON format. With Python, you can load JSON data into plain dictionaries using the json.loads function from the standard library json module. However, plain dictionaries make the data awkward to work with. For example, if your JSON data has a nested structure, you need chained indexing to access particular fields. Moreover, there is no straightforward way to test your assumptions about the data that the REST API returns. Luckily, there is the pydantic Python package, which lets you describe the expected structure as a typed data model and validates incoming data against it, making your life much easier.
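To make the problem concrete, here is a small sketch using only the standard library. The JSON payload is made up for illustration (its field names anticipate the model later in this post); it shows the chained indexing and the absence of any up-front check on the data.

```python
import json

# Made-up API response with a nested structure (illustrative data only)
raw = """
{
  "people": [
    {"id": 1, "is_active": true, "name": {"first": "Jane", "last": "Doe"}, "birth_date": "17-05-1990"}
  ]
}
"""

data = json.loads(raw)  # plain dicts and lists; no schema is attached

# Reaching a nested field requires chained indexing
last_name = data["people"][0]["name"]["last"]
print(last_name)  # Doe

# Nothing was checked up front: a missing key only surfaces when you access it
try:
    company = data["people"][0]["company"]
except KeyError as exc:
    print(f"Missing field: {exc}")  # Missing field: 'company'

# The date is still a string, in whatever format the API happened to use
birth_date = data["people"][0]["birth_date"]
print(type(birth_date).__name__)  # str
```

Every assumption about the payload (which keys exist, what types and formats the values have) lives implicitly in the code that indexes into it, which is exactly what a pydantic model makes explicit.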

As part of PyData 2022, we offer a crash course in using pydantic that teaches you how to:
- define a data model
- deal with varied data types in fields
- deal with missing data
- deal with unwanted data field names
- validate values
- deal with pesky dates
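Several of the topics in that list map onto basic pydantic features. The following is a minimal sketch, not taken from the course repository: the `Employee` model, its fields and the camelCase `"isActive"` key are hypothetical. It only uses calls that exist in pydantic 1 (the version whose `validator` API this post uses) and, as far as we know, behaves the same on pydantic 2; treat that as an assumption worth checking.

```python
import datetime
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Employee(BaseModel):
    """Hypothetical model; the fields and the "isActive" key are illustrative assumptions."""

    # Varied input types: pydantic coerces the string "42" to the int 42
    id: int
    # Unwanted field name: map the camelCase JSON key onto a snake_case attribute
    is_active: bool = Field(alias="isActive")
    # Missing data: the key may be absent (or null), in which case company is None
    company: Optional[str] = None
    # Dates: an ISO 8601 string is parsed into a datetime.date
    birth_date: datetime.date


employee = Employee(**{"id": "42", "isActive": "true", "birth_date": "1990-05-17"})
print(employee.id, employee.is_active, employee.company, employee.birth_date)
# 42 True None 1990-05-17

# Values that cannot be coerced are rejected with a ValidationError naming the field
try:
    Employee(**{"id": "not a number", "isActive": True, "birth_date": "1990-05-17"})
except ValidationError as exc:
    print(exc)
```

Note that the JSON key (`isActive`) is used when creating the model, while your Python code only ever sees the snake_case attribute `is_active`.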

The repository is available here. You will learn about pydantic data models, type annotations, fields, constrained types and validators, so that you can configure complex data models such as the one below.

from pydantic import BaseModel, Field, NonNegativeInt, constr, validator
import datetime
from typing import ClassVar

class Name(BaseModel):
    first: str | None
    last: constr(min_length=1)


class Person(BaseModel):
    id: NonNegativeInt
    is_active: bool
    name: Name
    company: str | None
    birth_date: datetime.date

    # ClassVar annotations mark class variables: pydantic does not treat them as model fields
    # Note: TODAY is evaluated once, when the class is defined, not on every validation
    TODAY: ClassVar[datetime.date] = datetime.date.today()
    EARLIEST_ALLOWED_BIRTH_DATE: ClassVar[datetime.date] = datetime.date(1900, 1, 1)
    LATEST_ALLOWED_BIRTH_DATE: ClassVar[datetime.date] = TODAY

    @validator("birth_date", pre=True)
    def birth_date_must_be_iso8601(cls, value):
        """Convert a %d-%m-%Y string to %Y-%m-%d before the actual `datetime.date` validation runs"""
        if not isinstance(value, str):
            # Not a string (e.g. already a date): leave it to pydantic's own date validation
            return value
        birth_date = datetime.datetime.strptime(value, "%d-%m-%Y")
        return birth_date.strftime("%Y-%m-%d")

    @validator("birth_date")
    def birth_date_not_before(cls, value):
        """Validate that birth_date value is not before EARLIEST_ALLOWED_BIRTH_DATE"""
        if value < cls.EARLIEST_ALLOWED_BIRTH_DATE:
            raise ValueError(
                f"Birth date {value} precedes earliest allowed date {cls.EARLIEST_ALLOWED_BIRTH_DATE}"
            )
        return value

    @validator("birth_date")
    def birth_date_not_after(cls, value):
        """Validate that birth_date value does not exceed LATEST_ALLOWED_BIRTH_DATE"""
        if value > cls.LATEST_ALLOWED_BIRTH_DATE:
            raise ValueError(
                f"Birth date {value} exceeds latest allowed date {cls.LATEST_ALLOWED_BIRTH_DATE}"
            )
        return value


class Data(BaseModel):
    people: list[Person]