By Wouter Donders, Mon 05 December 2022, in category Blog
As a data person, you will undoubtedly use REST APIs to retrieve data in JSON format.
With Python, you can load JSON data into simple dictionaries using the json.loads function from the standard library module json.
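As a minimal sketch of that loading step, the snippet below parses a JSON string with `json.loads`. The `response_text` value is a made-up stand-in for a REST API response body, not data from the course.

```python
import json

# Hypothetical response body, standing in for what a REST API might return.
response_text = (
    '{"id": 1, "is_active": true, '
    '"name": {"first": "Jane", "last": "Doe"}, "company": null}'
)

record = json.loads(response_text)

# JSON objects become dicts, true/false become bools, and null becomes None.
print(type(record))
print(record["is_active"], record["company"])
```

Nothing in the result records which keys or types were expected; the dictionary simply mirrors whatever the server sent.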
However, these simple dictionaries can make it difficult to work with the JSON data.
For example, if your JSON data has a nested structure, you will need chained indexing to access particular fields.
Moreover, there is no straightforward way to test your assumptions about the data that is returned by the REST API.
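To make both problems concrete, here is a small sketch using only the standard library. The payload and the `looks_like_person` helper are hypothetical and exist purely for illustration.

```python
import json

# Hypothetical payload in which the second person has no "name" object.
payload = json.loads(
    '{"people": [{"id": 1, "name": {"first": "Jane", "last": "Doe"}},'
    ' {"id": 2}]}'
)

# Chained indexing: every level must exist, or the lookup raises.
first_last_name = payload["people"][0]["name"]["last"]

try:
    payload["people"][1]["name"]["last"]
    missing_key = None
except KeyError as error:
    # The problem only surfaces at the moment the field is accessed.
    missing_key = error.args[0]


def looks_like_person(record):
    """Hand-written check of a few assumptions about one record."""
    return (
        isinstance(record.get("id"), int)
        and isinstance(record.get("name"), dict)
        and isinstance(record["name"].get("last"), str)
    )


checks = [looks_like_person(person) for person in payload["people"]]
print(first_last_name, missing_key, checks)
```

Checks like `looks_like_person` have to be written and maintained by hand for every field and every nesting level, which is the kind of work a data model can take over.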
Luckily, there is the pydantic Python package, which will make your life much easier.
As part of PyData 2022, we offer a crash course in using pydantic that teaches you how to:
- define a data model
- deal with varied data types in fields
- deal with missing data
- deal with unwanted data field names
- validate values
- deal with pesky dates
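As a rough preview of several of these topics, the sketch below defines one small model. The `Employee` class and the `firstName` key are invented for illustration and are not taken from the course material; the code assumes pydantic is installed and uses only features that should behave the same in pydantic 1.x and 2.x.

```python
from typing import Optional

from pydantic import BaseModel, Field, NonNegativeInt, ValidationError


class Employee(BaseModel):  # hypothetical model, for illustration only
    # Validate values: the id must be an integer >= 0.
    id: NonNegativeInt
    # Varied data types: inputs such as "true" are coerced to a bool.
    is_active: bool
    # Missing data: an absent "company" key becomes None.
    company: Optional[str] = None
    # Unwanted field names: read "firstName" from the input, expose first_name.
    first_name: str = Field(alias="firstName")


employee = Employee(**{"id": "7", "is_active": "true", "firstName": "Jane"})
print(employee.id, employee.is_active, employee.company, employee.first_name)

try:
    Employee(**{"id": -1, "is_active": False, "firstName": "Jane"})
    negative_id_rejected = False
except ValidationError:
    # A negative id violates the NonNegativeInt constraint.
    negative_id_rejected = True
```

The point of the design is that the assumptions live in the type annotations, so a violation is reported when the model is constructed rather than when a field is first used.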
The repository is available here. You will learn pydantic data models, type annotations, fields, constrained types and validators so you can configure complex data models such as the one below.
    import datetime
    from typing import ClassVar

    # `validator` is the pydantic 1.x API; pydantic 2 deprecates it in favour of `field_validator`.
    from pydantic import BaseModel, Field, NonNegativeInt, constr, validator


    class Name(BaseModel):
        first: str | None
        last: constr(min_length=1)


    class Person(BaseModel):
        id: NonNegativeInt
        is_active: bool
        name: Name
        company: str | None
        birth_date: datetime.date

        # ClassVar types refer to class variables, i.e. these are not fields for class instances
        TODAY: ClassVar[datetime.date] = datetime.date.today()
        EARLIEST_ALLOWED_BIRTH_DATE: ClassVar[datetime.date] = datetime.date(1900, 1, 1)
        LATEST_ALLOWED_BIRTH_DATE: ClassVar[datetime.date] = TODAY

        @validator("birth_date", pre=True)
        def birth_date_must_be_iso8601(cls, value):
            """Convert the %d-%m-%Y string to %Y-%m-%d before the actual `datetime.date` validation is run"""
            birth_date = datetime.datetime.strptime(value, "%d-%m-%Y")
            return birth_date.strftime("%Y-%m-%d")

        @validator("birth_date")
        def birth_date_not_before(cls, value):
            """Validate that birth_date value is not before EARLIEST_ALLOWED_BIRTH_DATE"""
            if value < cls.EARLIEST_ALLOWED_BIRTH_DATE:
                raise ValueError(
                    f"Birth date {value} precedes earliest allowed date {cls.EARLIEST_ALLOWED_BIRTH_DATE}"
                )
            return value

        @validator("birth_date")
        def birth_date_not_after(cls, value):
            """Validate that birth_date value does not exceed LATEST_ALLOWED_BIRTH_DATE"""
            if value > cls.LATEST_ALLOWED_BIRTH_DATE:
                raise ValueError(
                    f"Birth date {value} exceeds latest allowed date {cls.LATEST_ALLOWED_BIRTH_DATE}"
                )
            return value


    class Data(BaseModel):
        people: list[Person]
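To show how a model of this shape might be used, here is a trimmed, self-contained sketch that parses a JSON string into nested objects. It is a simplified restatement rather than the full model above: the payload, the person in it, and the validator name `parse_day_month_year` are made up for illustration. It keeps the `validator` decorator used above, which pydantic 2.x still accepts but reports as deprecated.

```python
import datetime
import json

from pydantic import BaseModel, ValidationError, validator


class Name(BaseModel):
    first: str
    last: str


class Person(BaseModel):
    id: int
    name: Name
    birth_date: datetime.date

    @validator("birth_date", pre=True)
    def parse_day_month_year(cls, value):
        # Same idea as the pre-validator above: DD-MM-YYYY in, ISO 8601 out.
        parsed = datetime.datetime.strptime(value, "%d-%m-%Y")
        return parsed.strftime("%Y-%m-%d")


class Data(BaseModel):
    people: list[Person]


# Hypothetical response body with a day-first date string.
raw = (
    '{"people": [{"id": 1, "name": {"first": "Jane", "last": "Doe"},'
    ' "birth_date": "24-03-1985"}]}'
)

data = Data(**json.loads(raw))
person = data.people[0]

# Attribute access replaces chained indexing, and the date is a real date object.
print(person.name.last, person.birth_date.year)

try:
    # A year-first date does not match the expected DD-MM-YYYY format.
    Person(id=2, name={"first": "John", "last": "Doe"}, birth_date="1985-03-24")
    wrong_format_rejected = False
except ValidationError:
    wrong_format_rejected = True
```

Because `name` is declared as a `Name` model, the nested dictionary is converted automatically, and a date string in an unexpected format fails at construction time.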