Pydantic is a Python library for data validation and settings management using Python type hints. In this post, we will look at some Pydantic tips and tricks.
Pydantic Tips & Tricks
Recently, I was introduced to Pydantic. I had been using FastAPI heavily and love how it pushes you to use Pydantic for data serialization and validation. Life before Pydantic was mostly Flask and Django. Both are great frameworks, but it took something like FastAPI to show where they fall short. Enough about FastAPI, though. This post is not about that.
Let’s go back to Pydantic.
What is Pydantic?
It is a Python library. Run pip install pydantic and you get access to some powerful features. It is primarily used to validate data coming into your application and to serialize data going out of it.
A simple Pydantic model looks like this:
import datetime

from pydantic import BaseModel

class Publisher(BaseModel):
    name: str
    location: str
    started_date: datetime.date

class BookModel(BaseModel):
    name: str
    publisher: Publisher
    price: float
    isbn: str
    published_date: datetime.date
For the rest of this post, we will use the above snippet as our example, changing it as we go to explore Pydantic features. The code uses the Pydantic 1.x API; Pydantic 2 renames several of these methods and decorators.
Data Deserialization
- Deserialize from a dict
book_dict = {
    "name": "The Alchemist",
    "publisher": {
        "name": "HarperCollins",
        "location": "New York",
        "started_date": "1989-01-01"  # illustrative value; Publisher requires this field
    },
    "price": 10.0,
    "isbn": "978-0062315007",
    "published_date": "2014-05-01"
}
book = BookModel(**book_dict)
- Deserialize from JSON
import json

book_json = json.dumps(book_dict)
book = BookModel.parse_raw(book_json)
Pretty cool, right? Now let's see what happens when you pass data in the wrong format.
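As a minimal sketch of what "wrong data" looks like in practice, the snippet below feeds badly typed values to a cut-down model and collects the names of the fields that failed. The BookSketch class and its sample values are hypothetical stand-ins for the BookModel above. The calls used here (BaseModel, ValidationError and its errors() method) exist in both Pydantic 1.x and 2.x.

```python
import datetime

from pydantic import BaseModel, ValidationError


class BookSketch(BaseModel):  # hypothetical, trimmed-down BookModel
    name: str
    price: float
    published_date: datetime.date


try:
    BookSketch(name="The Alchemist", price="abcd", published_date="not-a-date")
except ValidationError as exc:
    # errors() returns one dict per failed field; "loc" holds the field path
    failed_fields = sorted(err["loc"][0] for err in exc.errors())
else:
    failed_fields = []

print(failed_fields)
```

Both bad fields are reported in a single exception, so the caller does not have to fix one problem, retry, and then discover the next.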
Data Validation
- Marking fields as optional
from typing import Optional

class BookModel(BaseModel):
    ...
    published_date: Optional[datetime.date] = None
- Check that the date is after 1800-01-01
from typing import Optional
from pydantic import BaseModel, Field

class BookModel(BaseModel):
    ...
    published_date: Optional[datetime.date] = Field(None, gt=datetime.date(1800, 1, 1))
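- Validate ISBN
To add custom validation to a field, you can use the validator decorator. The validator must return the value, otherwise the field ends up as None.
import re

from pydantic import BaseModel, validator

class BookModel(BaseModel):
    ...

    @validator("isbn")
    def isbn_must_be_valid(cls, v):
        # raw string so that \D and \d reach the regex engine unchanged
        regex = r"^(?=(?:\D*\d){10}(?:(?:\D*\d){3})?$)[\d-]+$"
        if not re.search(regex, v):
            raise ValueError("Invalid ISBN")
        return v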
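To get a feel for what that pattern accepts, here is a small standard-library sketch that runs it against a few sample strings. The looks_like_isbn helper and the sample values are assumptions made for illustration. Note that the pattern only checks the shape (exactly 10 or 13 digits, with nothing but digits and hyphens); it does not verify the ISBN check digit.

```python
import re

# Same pattern as in the validator above, written as a raw string
ISBN_PATTERN = re.compile(r"^(?=(?:\D*\d){10}(?:(?:\D*\d){3})?$)[\d-]+$")


def looks_like_isbn(value: str) -> bool:  # hypothetical helper name
    """Return True when the value has the shape of an ISBN-10 or ISBN-13."""
    return ISBN_PATTERN.search(value) is not None


samples = ["978-0062315007", "0-306-40615-2", "abcdef", "12345"]
results = {sample: looks_like_isbn(sample) for sample in samples}

for sample, ok in results.items():
    print(sample, ok)
```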
- Validate all fields
There will be cases where you want to validate one field based on another field. You can use the root_validator decorator, which receives all the fields at once.
from pydantic import root_validator, BaseModel

class BookModel(BaseModel):
    ...

    @root_validator
    def check_published_date(cls, values):
        published_date = values.get("published_date")
        publisher = values.get("publisher")
        # a key is missing from values when that field already failed validation
        if published_date and publisher and published_date < publisher.started_date:
            raise ValueError("Book cannot be published before publisher started")
        return values
- So how does it work?
After adding all the above validators, run the following code. You will get a validation error that lists every failing field at once. Now imagine this class hooked up to an HTTP request body: you no longer have to write each check by hand.
book = BookModel(
    name="The Alchemist",
    publisher=Publisher(
        name="HarperCollins",
        location="New York",
        started_date="1989-01-01",  # illustrative value; Publisher requires this field
    ),
    price=10.0,
    isbn="abcdef",  # error
    published_date="1799-05-01",  # error
)
Configure Models
- Configuring fields
import datetime

from pydantic import BaseModel, Field

class BookModel(BaseModel):
    name: str = Field(..., min_length=3, max_length=50, alias="book_name")
    publisher: Publisher
    price: float = Field(..., gt=0)
    isbn: str = Field(..., regex=r"^(?=(?:\D*\d){10}(?:(?:\D*\d){3})?$)[\d-]+$")
    published_date: datetime.date = Field(..., gt=datetime.date(1800, 1, 1))
- Configuring model
We can add a Config class to tweak the behavior of the model. For example, we can set extra to forbid to reject extra fields passed to the model. We can set allow_population_by_field_name to True so the model can be populated by field name as well as by alias. We can also set fields to a dictionary that maps field names to their configuration, for example to set an alias for a field.
from pydantic import BaseModel, Field

class BookModel(BaseModel):
    ...

    class Config:
        extra = "forbid"
        allow_population_by_field_name = True
        fields = {
            "name": {
                "alias": "book_name"
            },
            "published_date": {
                "alias": "published"
            }
        }
- Setting common alias
from pydantic import BaseModel, Field

def to_camel(string: str) -> str:
    return "".join(word.capitalize() for word in string.split("_"))

class BookModel(BaseModel):
    ...

    class Config:
        extra = "forbid"
        allow_population_by_field_name = True
        alias_generator = to_camel
There are a lot more configs which you can explore in the docs 🔗
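To see what the alias generator actually produces, here is a small sketch that needs only the standard library. Note that to_camel as written capitalizes every word, so the aliases come out in PascalCase (PublishedDate). The to_lower_camel variant is a hypothetical helper, not part of the snippet above, for when lowerCamelCase keys are wanted instead.

```python
def to_camel(string: str) -> str:
    # Same function as above: every word is capitalized, including the first
    return "".join(word.capitalize() for word in string.split("_"))


def to_lower_camel(string: str) -> str:  # hypothetical variant
    # Keep the first word as-is and capitalize only the following words
    first, *rest = string.split("_")
    return first + "".join(word.capitalize() for word in rest)


field_names = ["name", "isbn", "published_date"]
pascal_aliases = {name: to_camel(name) for name in field_names}
lower_aliases = {name: to_lower_camel(name) for name in field_names}

print(pascal_aliases)
print(lower_aliases)
```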
Inheritance woohoo!
Inheritance with Pydantic becomes even more powerful, since subclasses also inherit the config from the parent class.
class EbookModel(BookModel):
    format: str

class AudioBookModel(BookModel):
    duration: int

class PaperBookModel(BookModel):
    weight: float
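The sketch below is one way to check that the config really is inherited: the parent forbids extra fields, and the subclass, which declares no Config of its own, rejects them too. ParentBook, EbookSketch and the drm field are hypothetical names used only for this illustration. The nested Config class is the Pydantic 1.x style; as far as I know, Pydantic 2.x still accepts it but emits a deprecation warning.

```python
from pydantic import BaseModel, ValidationError


class ParentBook(BaseModel):  # hypothetical, trimmed-down BookModel
    name: str

    class Config:
        extra = "forbid"


class EbookSketch(ParentBook):  # declares no Config of its own
    format: str


try:
    EbookSketch(name="The Alchemist", format="epub", drm=True)
    rejected_extra = False
except ValidationError:
    # "drm" is not a declared field, so the inherited extra="forbid" rejects it
    rejected_extra = True

valid_ebook = EbookSketch(name="The Alchemist", format="epub")
print(rejected_extra, valid_ebook.format)
```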
Serializing Data
- Serialize to Dict
# book_model is a BookModel instance
book_model.dict()  # all fields
book_model.dict(exclude={"isbn"})  # exclude isbn
book_model.dict(exclude={"isbn"}, by_alias=True)  # use alias
book_model.dict(include={"name": ..., "price": ..., "publisher": {"name"}})  # only these fields (... keeps the whole field)
book_model.dict(exclude_unset=True)  # omit fields that were never explicitly set
book_model.dict(exclude_none=True)  # omit fields whose value is None
- Serialize to JSON
book_model.json()  # all fields
book_model.json(exclude={"isbn"})  # exclude isbn
- Pickle
import pickle
book_model_bytes = pickle.dumps(book_model)
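The difference between exclude_unset and exclude_none in the dict() calls above is easy to mix up, so here is a minimal sketch of it. BookSketch and its subtitle field are hypothetical and exist only to give the model an optional value. The dict() method is the Pydantic 1.x name; Pydantic 2.x keeps it as a deprecated alias.

```python
from typing import Optional

from pydantic import BaseModel


class BookSketch(BaseModel):  # hypothetical model for illustration
    name: str
    subtitle: Optional[str] = None
    price: float = 0.0


# subtitle is passed explicitly (as None); price is left at its default
book = BookSketch(name="The Alchemist", subtitle=None)

all_fields = book.dict()
only_set = book.dict(exclude_unset=True)    # drops price: it was never set
without_none = book.dict(exclude_none=True)  # drops subtitle: its value is None

print(all_fields)
print(only_set)
print(without_none)
```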
Comparison with other libraries
Dataclasses
Let’s start by writing a sample dataclass
from dataclasses import dataclass
import datetime

@dataclass
class Publisher:
    name: str
    location: str

@dataclass
class Book:
    name: str
    price: float
    isbn: str
    published_date: datetime.date
Now let’s see how this dataclass handles the same data
book = Book(
    name="The Alchemist",
    publisher=Publisher(name="HarperCollins", location="New York"),
    price="abcd",
    isbn="abcdef",
    published_date="1799-05-01",
)
This gives the error TypeError: __init__() got an unexpected keyword argument 'publisher', because Book has no publisher field. Let’s drop that argument and retry.
book = Book(
    name="The Alchemist",
    price="abcd",
    isbn="abcdef",
    published_date="1799-05-01",
)
That passed. But notice that we didn’t get any error for the price field, even though "abcd" is not a float. That’s because dataclasses don’t validate data.
Let’s see how we can add validation to dataclasses
import datetime
import re
from dataclasses import dataclass, fields

@dataclass
class Book:
    ...

    def __post_init__(self):
        for f in fields(self):
            # a dataclasses.Field holds metadata only; read the value from the instance
            value = getattr(self, f.name)
            if f.name == "price":
                if not isinstance(value, float):
                    raise ValueError("Price must be a float")
            elif f.name == "isbn":
                regex = r"^(?=(?:\D*\d){10}(?:(?:\D*\d){3})?$)[\d-]+$"
                if not re.search(regex, value):
                    raise ValueError("Invalid ISBN")
            elif f.name == "published_date":
                if value < datetime.date(1800, 1, 1):
                    raise ValueError("Published date must be greater than 1800")
That’s a lot of hoops to jump through, and the code doesn’t look clean. We cannot blame dataclasses completely for this. They were not designed to validate data; they were designed to create classes with less boilerplate. The price of staying generic is the lack of powerful features like the ones Pydantic offers.
Want validation, but with a dataclass?
Pydantic has you covered there too: use from pydantic.dataclasses import dataclass, and you can use it just like you would use the standard dataclass decorator.
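As a rough sketch of that drop-in behavior, the example below declares a dataclass with Pydantic's decorator and shows that values are coerced and validated on construction. BookSketch and the sample values are hypothetical. The pydantic.dataclasses.dataclass decorator is available in both Pydantic 1.x and 2.x.

```python
import datetime

from pydantic import ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class BookSketch:  # hypothetical, trimmed-down Book
    name: str
    price: float
    published_date: datetime.date


# Strings that can be parsed are converted to the annotated types
book = BookSketch(name="The Alchemist", price="10.5", published_date="2014-05-01")

try:
    BookSketch(name="The Alchemist", price="abcd", published_date="2014-05-01")
    price_rejected = False
except ValidationError:
    # unlike the standard dataclass, the bad price is caught at construction
    price_rejected = True

print(book.price, book.published_date, price_rejected)
```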
Attrs
Attrs is another library which is similar to dataclasses, and it is closer to Pydantic than dataclasses are. Let’s see how we can use attrs to validate data. Its authors make a pretty good argument for why you should use attrs over dataclasses. You can read it here 🔗
import datetime

from attrs import Factory, define, field, validators

@define
class Publisher:
    name: str
    location: str

@define
class Book:
    name: str
    # attrs does not enforce type annotations on its own; validators do the checking
    price: float = field(validator=validators.instance_of(float))
    isbn: str
    published_date: datetime.date = field(validator=validators.instance_of(datetime.date))
    # default publisher, built by the factory when none is passed
    publisher: Publisher = Factory(lambda: Publisher(name="HarperCollins", location="New York"))

    # published date should be >= 1800
    @published_date.validator
    def _check_published_date(self, attribute, value):
        if value < datetime.date(1800, 1, 1):
            raise ValueError("Published date must not be before 1800")

book = Book(
    name="The Alchemist",
    price="abcd",  # raises TypeError: not a float
    isbn="abcdef",
    published_date="1799-05-01",
)