Pydantic is a Python library for data validation and settings management using Python type hints. In this post, we will look at some Pydantic tips and tricks.

Jul 2, 2023

#python

Pydantic Tips & Tricks

Recently, I got introduced to Pydantic. I was using FastAPI heavily and absolutely love how it pushes you to use Pydantic for data serialization and validation. Life before Pydantic was mostly Flask and Django. Both are great frameworks, but it took something like FastAPI to show me where they fall short. Enough about FastAPI. This post is not about that.

Let’s go back to Pydantic.

What is Pydantic?

It is a Python library. You run pip install pydantic and get access to some powerful stuff. It is primarily used to validate data coming into your application and serialize data going out of it.

A simple Pydantic model looks like this:

from pydantic import BaseModel
import datetime

class Publisher(BaseModel):
    name: str
    location: str
    started_date: datetime.date

class BookModel(BaseModel):
    name: str
    publisher: Publisher
    price: float
    isbn: str
    published_date: datetime.date

For the rest of this post, we will use the above snippet as our example and make changes to it and explore Pydantic features.

Data Deserialization

Deserializing from a dict

book_dict = {
    "name": "The Alchemist",
    "publisher": {
        "name": "HarperCollins",
        "location": "New York",
        "started_date": "1989-01-01"  # illustrative value; the field is required by Publisher
    },
    "price": 10.0,
    "isbn": "978-0062315007",
    "published_date": "2014-05-01"
}

book = BookModel(**book_dict)
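To make the conversion concrete, here is a minimal sketch with a trimmed-down copy of the models (the shorter field list is my simplification, not the full example above). It shows that the nested dict becomes a Publisher instance and the ISO date string becomes a datetime.date.

```python
import datetime

from pydantic import BaseModel


class Publisher(BaseModel):
    name: str
    location: str


class BookModel(BaseModel):
    name: str
    publisher: Publisher
    price: float
    published_date: datetime.date


book = BookModel(**{
    "name": "The Alchemist",
    "publisher": {"name": "HarperCollins", "location": "New York"},
    "price": 10.0,
    "published_date": "2014-05-01",  # ISO string, parsed into a date
})

print(type(book.publisher).__name__)  # Publisher
print(book.published_date.year)       # 2014
```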

Deserializing from JSON

import json
book_json = json.dumps(book_dict)
book = BookModel.parse_raw(book_json)

Pretty cool, right? Now let’s see what happens when you feed it data in the wrong format.

Data Validation

Marking fields as optional

from typing import Optional

from pydantic import BaseModel

class BookModel(BaseModel):
    ...
    published_date: Optional[datetime.date] = None
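A small sketch of how an optional field behaves, using a two-field model of my own for brevity. The explicit = None default is my addition: Pydantic v1 infers it for Optional fields, while newer major versions treat an Optional field without a default as required, so spelling it out keeps the behaviour the same either way.

```python
import datetime
from typing import Optional

from pydantic import BaseModel


class BookModel(BaseModel):
    name: str
    published_date: Optional[datetime.date] = None


# The field can be left out entirely...
draft = BookModel(name="Untitled draft")
print(draft.published_date)  # None

# ...and is still parsed and validated when it is provided.
released = BookModel(name="The Alchemist", published_date="2014-05-01")
print(released.published_date.year)  # 2014
```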

Check if the year is greater than 1800

from pydantic import Field

class BookModel(BaseModel):
    ...
    published_date: datetime.date | None = Field(None, gt=datetime.date(1800, 1, 1))

Validate ISBN

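To add custom validation to a field, you can use the validator decorator.

import re

from pydantic import BaseModel, validator


class BookModel(BaseModel):
    ...
    @validator("isbn")
    def isbn_must_be_valid(cls, v):
        # 10 or 13 digits in total, made up of digits and hyphens only
        regex = r"^(?=(?:\D*\d){10}(?:(?:\D*\d){3})?$)[\d-]+$"
        if not re.search(regex, v):
            raise ValueError("Invalid ISBN")
        return v  # a validator must return the (possibly modified) value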
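It helps to see what the pattern actually accepts before wiring it into a model. The sketch below uses only the standard library; looks_like_isbn is a hypothetical helper name of mine. Note that the pattern only checks the character set and the digit count (10 or 13), not the ISBN check digit.

```python
import re

ISBN_PATTERN = re.compile(r"^(?=(?:\D*\d){10}(?:(?:\D*\d){3})?$)[\d-]+$")


def looks_like_isbn(value: str) -> bool:
    """Return True if value has 10 or 13 digits and only digits/hyphens."""
    return ISBN_PATTERN.search(value) is not None


samples = [
    "978-0062315007",  # 13 digits
    "0-306-40615-2",   # 10 digits
    "abcdef",          # no digits at all
    "12345",           # too few digits
    "978 0062315007",  # right digit count, but contains a space
]
for sample in samples:
    print(sample, looks_like_isbn(sample))
```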

Validate all fields

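There will be cases where you want to validate one field against another. You can use the root_validator decorator, which receives all of the model’s field values at once.

from pydantic import root_validator, BaseModel

class BookModel(BaseModel):
    ...

    @root_validator
    def check_published_date(cls, values):
        published_date = values.get("published_date")
        publisher = values.get("publisher")  # a Publisher instance, not a dict
        # A field is missing from values if its own validation already failed
        if published_date and publisher and published_date < publisher.started_date:
            raise ValueError("Book cannot be published before publisher started")
        return values  # a root validator must return the values dict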
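Here is a self-contained sketch of the same cross-field check that you can run as is. The trimmed models and the skip_on_failure=True argument are my choices: with it, the root validator only runs once every field has validated, so both values are guaranteed to be present. It assumes the v1-style decorator used in this post.

```python
import datetime

from pydantic import BaseModel, ValidationError, root_validator


class Publisher(BaseModel):
    name: str
    started_date: datetime.date


class Book(BaseModel):
    name: str
    publisher: Publisher
    published_date: datetime.date

    @root_validator(skip_on_failure=True)
    def check_published_date(cls, values):
        # Compare a field on this model with a field on the nested model
        if values["published_date"] < values["publisher"].started_date:
            raise ValueError("Book cannot be published before publisher started")
        return values


publisher = {"name": "Example Press", "started_date": "2000-01-01"}

ok = Book(name="Later book", publisher=publisher, published_date="2005-06-01")

error_message = ""
try:
    Book(name="Too early", publisher=publisher, published_date="1999-12-31")
except ValidationError as exc:
    error_message = str(exc)

print(error_message)
```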

So how does it work?

After adding all the above validators, run the following code. You will encounter a series of validation errors. Now imagine this class hooked to an HTTP request body. You no longer have to handle independent validations.

book = BookModel(name="The Alchemist",
                publisher=Publisher(name="HarperCollins",
                                    location="New York",
                                    started_date="1989-01-01"), # illustrative value
                price=10.0,
                isbn="abcdef", # error
                published_date="1799-05-01" # error
)
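What makes this pleasant in practice is that Pydantic reports every failing field in one go instead of stopping at the first. A minimal sketch, using a reduced model of my own with two deliberately bad values:

```python
import datetime

from pydantic import BaseModel, Field, ValidationError


class BookModel(BaseModel):
    name: str
    price: float = Field(..., gt=0)
    published_date: datetime.date


failed_fields = []
try:
    BookModel(name="The Alchemist", price=-5, published_date="not-a-date")
except ValidationError as exc:
    # errors() returns one entry per failure; "loc" is the path to the field
    failed_fields = sorted(err["loc"][0] for err in exc.errors())

print(failed_fields)  # ['price', 'published_date']
```

Each entry in errors() also carries a message, which is what a framework like FastAPI sends back to the client.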

Configure Models

Configuring fields

import datetime

from pydantic import BaseModel, Field

class BookModel(BaseModel):
    name: str = Field(..., min_length=3, max_length=50, alias="book_name")
    publisher: Publisher
    price: float = Field(..., gt=0)
    isbn: str = Field(..., regex=r"^(?=(?:\D*\d){10}(?:(?:\D*\d){3})?$)[\d-]+$")
    published_date: datetime.date = Field(..., gt=datetime.date(1800, 1, 1))
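A quick sketch of how these constraints and the alias behave at runtime. I have kept only two fields, and is_valid is a hypothetical helper for the demo.

```python
from pydantic import BaseModel, Field, ValidationError


class BookModel(BaseModel):
    name: str = Field(..., min_length=3, max_length=50, alias="book_name")
    price: float = Field(..., gt=0)


def is_valid(**data) -> bool:
    """Return True if BookModel accepts the given input."""
    try:
        BookModel(**data)
    except ValidationError:
        return False
    return True


# Input uses the alias; the attribute keeps the field name
book = BookModel(book_name="The Alchemist", price=10.0)
print(book.name)

print(is_valid(book_name="ab", price=10.0))            # name too short
print(is_valid(book_name="The Alchemist", price=0))    # price must be > 0
```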

Configuring model

We can add a Config class to tweak the behavior of the model. For example, we can set extra to forbid to prevent extra fields from being passed to the model. We can set allow_population_by_field_name to True so that a field with an alias can be populated by either its alias or its field name. We can also set fields to a dictionary that maps field names to their configuration, for example to set an alias for a field.

from pydantic import BaseModel, Field

class BookModel(BaseModel):
    ...

    class Config:
        extra = "forbid"
        allow_population_by_field_name = True
        fields = {
            "name": {
                "alias": "book_name"
            },
            "published_date": {
                "alias": "published"
            }
        }
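To see extra = "forbid" in action, here is a small sketch with a two-field model. The colour key is a made-up extra field for the demo.

```python
from pydantic import BaseModel, ValidationError


class BookModel(BaseModel):
    name: str
    price: float

    class Config:
        extra = "forbid"


bad_fields = []
try:
    # "colour" is not declared on the model, so it is rejected
    BookModel(name="The Alchemist", price=10.0, colour="red")
except ValidationError as exc:
    bad_fields = [err["loc"][0] for err in exc.errors()]

print(bad_fields)  # ['colour']
```

Without this setting the unknown key would be silently ignored, which can hide typos in a request body.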

Setting a common alias

from pydantic import BaseModel, Field

def to_camel(string: str) -> str:
    # "published_date" -> "PublishedDate" (every word is capitalized, including the first)
    return "".join(word.capitalize() for word in string.split("_"))

class BookModel(BaseModel):
    ...
    class Config:
        extra = "forbid"
        allow_population_by_field_name = True
        alias_generator = to_camel
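Since the alias generator is a plain function, you can try it on its own. The sketch below shows what to_camel returns and adds a hypothetical to_lower_camel variant (my addition, not from Pydantic) in case your API expects a lowercase first letter.

```python
def to_camel(string: str) -> str:
    return "".join(word.capitalize() for word in string.split("_"))


def to_lower_camel(string: str) -> str:
    # Hypothetical variant: leave the first word untouched
    first, *rest = string.split("_")
    return first + "".join(word.capitalize() for word in rest)


print(to_camel("published_date"))        # PublishedDate
print(to_lower_camel("published_date"))  # publishedDate
print(to_lower_camel("isbn"))            # isbn
```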

There are a lot more configs which you can explore in the docs 🔗

Inheritance woohoo!

Inheritance with pydantic becomes even more powerful since we are also inheriting the config from the parent class.

class EbookModel(BookModel):
    format: str

class AudioBookModel(BookModel):
    duration: int

class PaperBookModel(BookModel):
    weight: float
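A short sketch to show that a subclass really does pick up both the field constraints and the Config of its parent. The base model here is a reduced version of my own, and rejected is a hypothetical helper for the demo.

```python
from pydantic import BaseModel, Field, ValidationError


class BookModel(BaseModel):
    name: str
    price: float = Field(..., gt=0)

    class Config:
        extra = "forbid"


class AudioBookModel(BookModel):
    duration: int


def rejected(**data) -> bool:
    """Return True if AudioBookModel refuses the given input."""
    try:
        AudioBookModel(**data)
    except ValidationError:
        return True
    return False


audio = AudioBookModel(name="The Alchemist", price=12.5, duration=240)
print(audio.name, audio.duration)

# Inherited constraint: price must be > 0
print(rejected(name="The Alchemist", price=-1, duration=240))
# Inherited config: unknown fields are forbidden
print(rejected(name="The Alchemist", price=12.5, duration=240, narrator="?"))
```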

Serializing Data

Serialize to Dict


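book_model = BookModel(**book_dict)

book_model.dict() # all fields
book_model.dict(exclude={"isbn"}) # exclude isbn
book_model.dict(exclude={"isbn"}, by_alias=True) # use aliases as keys
book_model.dict(include={"name": ..., "price": ..., "publisher": {"name"}}) # only these fields
book_model.dict(exclude_unset=True) # drops fields that were never explicitly set
book_model.dict(exclude_none=True) # drops fields whose value is None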
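The difference between exclude_unset and exclude_none is easy to mix up, so here is a minimal sketch. The subtitle and edition fields are hypothetical, added only to have one field that is explicitly set to None and one that is never set.

```python
from typing import Optional

from pydantic import BaseModel


class BookModel(BaseModel):
    name: str
    price: float
    subtitle: Optional[str] = None
    edition: Optional[int] = None


# subtitle is passed explicitly (as None); edition is never set
book_model = BookModel(name="The Alchemist", price=10.0, subtitle=None)

only_set = book_model.dict(exclude_unset=True)
no_nones = book_model.dict(exclude_none=True)

print(only_set)  # keeps subtitle, because it was set explicitly
print(no_nones)  # drops both subtitle and edition
```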

Serialize to JSON

book_model.json() # all fields
book_model.json(exclude={"isbn"}) # exclude isbn
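A quick round trip, as a sketch with a reduced model of my own: the date is written out as an ISO string, and parsing the JSON back gives an equal model. It uses the same json() and parse_raw() calls as the rest of this post.

```python
import datetime
import json

from pydantic import BaseModel


class BookModel(BaseModel):
    name: str
    price: float
    published_date: datetime.date


book_model = BookModel(name="The Alchemist", price=10.0, published_date="2014-05-01")

payload = book_model.json()               # a JSON string
restored = BookModel.parse_raw(payload)   # back to a model

print(json.loads(payload)["published_date"])  # 2014-05-01
print(restored == book_model)                 # True
```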

Pickle

import pickle

book_model_bytes = pickle.dumps(book_model)

Comparison with other libraries

Dataclasses

Let’s start by writing a sample dataclass

from dataclasses import dataclass
import datetime

@dataclass
class Publisher:
    name: str
    location: str

@dataclass
class Book:
    name: str
    price: float
    isbn: str
    published_date: datetime.date

Now let’s see how we can use this dataclass to validate data

book = Book(name= "The Alchemist",
            publisher=Publisher(name="HarperCollins",
                                location="New York"),
            price="abcd",
            isbn="abcdef",
            published_date="1799-05-01"
)

This gives an error, TypeError: __init__() got an unexpected keyword argument 'publisher', because Book does not declare a publisher field. Let’s fix that and retry.

book = Book(name= "The Alchemist",
            price="abcd",
            isbn="abcdef",
            published_date="1799-05-01"
)

That passed. But notice how we didn’t get any error for the price field. That’s because dataclasses don’t validate data.
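You can confirm this with a couple of lines. In this sketch (a two-field dataclass for brevity) the annotation says float, but the string is stored untouched:

```python
from dataclasses import dataclass


@dataclass
class Book:
    name: str
    price: float


# No error: type hints on a plain dataclass are not enforced at runtime
book = Book(name="The Alchemist", price="abcd")
print(type(book.price).__name__)  # str
```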

Let’s see how we can add validation to dataclasses


import datetime
import re
from dataclasses import dataclass, fields

@dataclass
class Book:
    ...

    def __post_init__(self):
        for f in fields(self):
            value = getattr(self, f.name)  # fields() yields metadata, not values
            if f.name == "price":
                if not isinstance(value, float):
                    raise ValueError("Price must be a float")
            elif f.name == "isbn":
                regex = r"^(?=(?:\D*\d){10}(?:(?:\D*\d){3})?$)[\d-]+$"
                if not re.search(regex, value):
                    raise ValueError("Invalid ISBN")
            elif f.name == "published_date":
                # dataclasses do not coerce, so this must already be a datetime.date
                if value < datetime.date(1800, 1, 1):
                    raise ValueError("Published date must not be before 1800-01-01")

That’s a lot of hoops to jump through, and the code doesn’t look clean. We cannot blame dataclasses completely for this. They were not designed to validate data; they were designed to create classes with less boilerplate. The price of staying generic is that they lack powerful features like the ones Pydantic offers.

Want validation, but with dataclasses?

Pydantic has you covered there: use from pydantic.dataclasses import dataclass, and you can use it just like you would use a standard dataclass.
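As a sketch of what that swap buys you, here is the same kind of Book class (reduced to three fields) using the Pydantic decorator. The date string is parsed, and the bad price is now rejected.

```python
import datetime

from pydantic import ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class Book:
    name: str
    price: float
    published_date: datetime.date


book = Book(name="The Alchemist", price=10.0, published_date="2014-05-01")
print(book.published_date.year)  # 2014

rejected = False
try:
    Book(name="The Alchemist", price="abcd", published_date="2014-05-01")
except ValidationError:
    rejected = True

print(rejected)  # True
```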

Attrs

Attrs is another library similar to dataclasses, though it is closer to Pydantic than dataclasses are. Let’s see how we can use attrs to validate data. Its authors make a pretty good argument for why you should use attrs over dataclasses. You can read it here 🔗


import datetime

from attrs import Factory, define, field

@define
class Publisher:
    name: str
    location: str

@define
class Book:
    name: str
    price: float
    isbn: str
    # attrs does not coerce types on its own, so convert the ISO string explicitly
    published_date: datetime.date = field(converter=datetime.date.fromisoformat)
    # Factory takes a zero-argument callable that builds the default value
    publisher: Publisher = Factory(lambda: Publisher(name="HarperCollins", location="New York"))

    # published date should be >= 1800
    @published_date.validator
    def _check_published_date(self, attribute, value):
        if value < datetime.date(1800, 1, 1):
            raise ValueError("Published date must not be before 1800-01-01")


book = Book(name="The Alchemist",
            price="abcd", # accepted: attrs does not check annotations by default
            isbn="abcdef", # accepted: no validator is attached to isbn
            published_date="1799-05-01" # raises ValueError
)