Dataclasses vs Attrs vs Pydantic
Posted on Fri 07 August 2020 in Data Science • 6 min read
Python 3.7 introduced dataclasses, a handy decorator that makes creating classes much easier. This post compares a regular class, a dataclass, a class built with attrs and a pydantic dataclass. Dataclasses were based on attrs, a Python package that also aims to make creating classes a much more enjoyable experience. Dataclasses are included in the standard library (Python 3.7+), while attrs must be installed with pip (eg, pip install attrs). Both mainly automate the (sometimes) painful experience of writing dunder methods. You can read more about dunder methods in a previous post here: https://jackmckew.dev/dunders-in-python.html
Dataclasses vs Attrs vs Pydantic: Features¶
Feature | Dataclass | Attrs | Pydantic |
---|---|---|---|
frozen | ✅ | ✅ | ✅ |
defaults | ✅ | ✅ | ✅ |
totuple | ✅ | ✅ | ✅ |
todict | ✅ | ✅ | ✅ |
validators | ❌ | ✅ | ✅ |
converters | ❌ | ✅ | ✅ |
slotted | ✅ (Python 3.10+) | ✅ | ❌ |
programmatic creation | ✅ | ✅ | ✅ |
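To make the frozen, defaults, todict and totuple rows a bit more concrete, here is a minimal sketch using only the standard library. The Point class is a hypothetical example and not part of the original post; attrs offers similarly named helpers (attr.asdict and attr.astuple).

```python
from dataclasses import FrozenInstanceError, asdict, astuple, dataclass


@dataclass(frozen=True)
class Point:  # hypothetical example class
    x: float = 0.0  # default value
    y: float = 0.0


point = Point(x=1.5)

# "todict" and "totuple" are module-level helpers in the standard library.
point_as_dict = asdict(point)
point_as_tuple = astuple(point)

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    point.x = 2.0
    mutated = True
except FrozenInstanceError:
    mutated = False

print(point_as_dict, point_as_tuple, mutated)
```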
When to use Dataclasses¶
Dataclasses are mainly about 'grouping' variables together. Choose dataclasses if:
- The main concern is around the type of the variable, not the value
- Adding another package dependency isn't trivial
When to use Attrs¶
attrs is about both grouping and validating. Choose attrs if:
- You're concerned about performance (attrs supports generating slotted classes, which are optimised for CPython)
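As a rough illustration of what a slotted class looks like, the sketch below asks attrs to generate one with slots=True. The SlottedPoint class is a hypothetical example; the general idea is that attributes live in fixed slots rather than in a per-instance dictionary, which usually saves memory.

```python
import attr


@attr.s(auto_attribs=True, slots=True)
class SlottedPoint:  # hypothetical example class
    x: float = 0.0
    y: float = 0.0


point = SlottedPoint(1.0, 2.0)

# Slotted instances have no per-instance __dict__ ...
has_dict = hasattr(point, "__dict__")

# ... so assigning an attribute that was never declared fails.
try:
    point.z = 3.0
    accepted_new_attribute = True
except AttributeError:
    accepted_new_attribute = False

print(has_dict, accepted_new_attribute)
```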
When to use Pydantic¶
Pydantic is about thorough data validation. Choose pydantic if:
- You want to validate the values inside each class
- You want to sanitise the input
Example Class¶
First off let's start with our example class in the default way that it would be implemented in Python.
We will also use type hints in our class definitions; this is best practice for documenting the type we intend each variable to be (plain type hints are not enforced at runtime). Type hints are also integrated into attrs for creating classes.
import typing
class Data:
    def __init__(self, x: float = None, y: float = None, kwargs: typing.Dict = None):
        self.x = x
        self.y = y
        self.kwargs = kwargs
Each argument passed to the __init__ constructor is repeated when it is assigned to the instance attribute of the same name. The duplication would only be absent if the arguments and the attributes didn't match. Luckily this is something that both dataclasses and attrs can help with (which we'll see later on).
Now, to demonstrate everything that dataclasses and attrs automate for us, let's define a function that takes a class constructor and prints the results of a few checks for each of our classes.
def class_tester(class_constructor):
    test_class_1 = class_constructor()
    test_class_2 = class_constructor()
    print(f"Repr/str dunder method representation: {test_class_1}")
    print(f"Equality dunder method (using ==) (should be True if implemented): {test_class_1 == test_class_2}")
    print(f"Identity check (using is) (always False for two separate instances): {test_class_1 is test_class_2}")
class_tester(Data)
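For the regular class, the representation is just the default object address and the == check comes out False, because neither __repr__ nor __eq__ has been written. As a sketch of the boilerplate we would otherwise need by hand (ManualData is a hypothetical name, chosen so it doesn't clash with Data):

```python
import typing


class ManualData:  # hypothetical name, to avoid clashing with Data above
    def __init__(
        self,
        x: typing.Optional[float] = None,
        y: typing.Optional[float] = None,
        kwargs: typing.Optional[typing.Dict] = None,
    ):
        self.x = x
        self.y = y
        self.kwargs = kwargs

    def __repr__(self) -> str:
        return f"ManualData(x={self.x!r}, y={self.y!r}, kwargs={self.kwargs!r})"

    def __eq__(self, other: object) -> bool:
        # Only compare against other ManualData instances.
        if not isinstance(other, ManualData):
            return NotImplemented
        return (self.x, self.y, self.kwargs) == (other.x, other.y, other.kwargs)


first, second = ManualData(), ManualData()
print(first)
print(first == second)  # equal field values
print(first is second)  # still two separate objects
```

This is exactly the kind of repetitive code the decorators in the rest of this post write for us.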
Dataclasses¶
Dataclasses by default automatically generate a bunch of dunder methods for us in a class, such as:
- __init__: the initialisation method for the class
- __repr__: how the class is represented when print() is called
- __str__: how the class is represented as a string (not generated separately, so it falls back to __repr__)
- __eq__: used when equality operators are used (eg, ==)
- __hash__: the hash for the class (its handling depends on eq and frozen; with the defaults it is set to None, so instances are unhashable)
There's also a stack of other dunder methods that can be automated, which are detailed at: https://docs.python.org/3/library/dataclasses.html
Thank you to Michael Kosher over on Twitter: It's worth noting validation can be added to dataclasses using a __post_init__ hook. However, it's pretty low level relative to attrs/pydantic. I did a similar comparison https://mpkocher.github.io/2019/05/22/Dataclasses-in-Python-3-7/
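As a minimal sketch of that hook, the example below validates a field inside __post_init__, which the generated __init__ calls once the fields have been assigned. The CheckedData class and its rule (x must be at least 42) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckedData:  # hypothetical example class
    x: Optional[float] = None

    def __post_init__(self):
        # Called by the generated __init__ after the fields are assigned.
        if self.x is not None and self.x < 42:
            raise ValueError("x must be at least 42")


valid = CheckedData(42)

try:
    CheckedData(-35)
    error_message = None
except ValueError as error:
    error_message = str(error)

print(valid, error_message)
```

Every check has to be written by hand this way, which is the "low level" part of the comment above.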
from dataclasses import dataclass
@dataclass
class Data:
    x: float = None
    y: float = None
    kwargs: typing.Dict = None
class_tester(Data)
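To poke at the generated methods a little further, here is a small sketch comparing a default dataclass with a frozen one. Mutable and Frozen are hypothetical example classes. It shows that hashing is switched off for a default dataclass and is only generated from the fields once frozen=True is set.

```python
from dataclasses import dataclass


@dataclass
class Mutable:  # hypothetical example class
    x: int = 0


@dataclass(frozen=True)
class Frozen:  # hypothetical example class
    x: int = 0


print(repr(Mutable()))                   # generated __repr__
print(Mutable() == Mutable())            # generated __eq__ compares field values
print(Mutable.__hash__ is None)          # eq=True without frozen=True disables hashing
print(hash(Frozen()) == hash(Frozen()))  # frozen=True adds a field-based __hash__
```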
Now for our attrs class. There are two main functions in attrs: attr.s and attr.ib(). attr.s is the decorator to put on a class to have the package generate the dunder methods for us, while attr.ib() defines the attributes of the class (it is optional when attr.s is called with auto_attribs=True and the attributes are annotated). There are lots of optional arguments for both attr.s and attr.ib(), which are documented at: https://www.attrs.org/en/stable/api.html. The optional arguments are mainly for enabling or disabling the different dunder methods in the class.
import attr
@attr.s
class Data:
    x: float = attr.ib(default=None)
    y: float = attr.ib(default=None)
    kwargs: typing.Dict = attr.ib(default=None)
class_tester(Data)
Attrs¶
Next let's dive into attrs.
Validators in attrs¶
One major piece of functionality that attrs has but dataclasses don't is validators. These let us check the inputs against specific requirements whenever instances of our classes are created. Let's build an example that ensures our parameter x is an integer of at least 42, and raises an error to the user if not.
import attr
@attr.s
class ValidatedData:
    x: int = attr.ib(default=None, validator=attr.validators.instance_of(int))
    y: float = attr.ib(default=None)
    kwargs: typing.Dict = attr.ib(default=None)
    @x.validator
    def more_than_the_meaning_of_life(self, attribute, value):
        if not value >= 42:
            raise ValueError("Must be more than the meaning of life!")
test_data_point_1 = ValidatedData(42)
test_data_point_2 = ValidatedData(-35)  # raises ValueError
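Since a failing validator stops the script, it can be handy to catch the error when exploring. The sketch below uses a trimmed-down, hypothetical CheckedInt class and a hypothetical try_create helper to show which exception each kind of bad input produces.

```python
import attr


@attr.s
class CheckedInt:  # hypothetical example class
    x = attr.ib(validator=attr.validators.instance_of(int))

    @x.validator
    def _at_least_42(self, attribute, value):
        if value < 42:
            raise ValueError(f"{attribute.name} must be at least 42")


def try_create(value):
    """Return the name of the exception raised, or None on success."""
    try:
        CheckedInt(value)
    except (TypeError, ValueError) as error:
        return type(error).__name__
    return None


print(try_create(42))    # passes both validators
print(try_create(-35))   # an int, but fails the custom check
print(try_create("42"))  # not an int, rejected with a TypeError
```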
Converters in Attrs¶
Converters are used to sanitise the input data when creating instances. If a parameter is intended to be an integer but we want to let our users pass anything that can be turned into one (the string "42", for example), we can sanitise this input with converters. This lets our classes be much more flexible for our users while still keeping the types behind the parameters stable.
import attr
@attr.s
class ConvertedData:
    x: int = attr.ib(default=None, converter=int)
    y: float = attr.ib(default=None)
    kwargs: typing.Dict = attr.ib(default=None)
    @x.validator
    def more_than_the_meaning_of_life(self, attribute, value):
        if not value >= 42:
            raise ValueError("Must be more than the meaning of life!")
test_data_point_1 = ConvertedData(42)
print(test_data_point_1)
test_data_point_2 = ConvertedData("42")  # the string is converted to the integer 42
print(test_data_point_2)
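One detail worth knowing is that attrs runs converters before validators, so the validator always sees the converted value. A minimal sketch, using a hypothetical ConvertedInt class:

```python
import attr


@attr.s
class ConvertedInt:  # hypothetical example class
    x = attr.ib(converter=int)

    @x.validator
    def _at_least_42(self, attribute, value):
        # value has already been passed through int() at this point.
        if value < 42:
            raise ValueError(f"{attribute.name} must be at least 42")


from_string = ConvertedInt("42")
from_float = ConvertedInt(42.9)  # int() truncates towards zero

try:
    ConvertedInt("41")  # converted to 41 first, then rejected by the validator
    rejected = False
except ValueError:
    rejected = True

print(from_string.x, from_float.x, rejected)
```

Note that int() silently drops the fractional part of a float, which may or may not be the sanitisation you want.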
Programmatic Creation of Attrs¶
In some cases you may want to create classes programmatically. attrs doesn't let us down and provides the attr.make_class function for this: we simply pass the class name and a dictionary of all the attributes we need.
ProgrammaticData = attr.make_class("Data",
    {'x': attr.ib(default=None),
     'y': attr.ib(default=None),
     'kwargs': attr.ib(default=None)}
)
print(Data())
print(ProgrammaticData())
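For comparison, the standard library has a similar helper, dataclasses.make_dataclass. The sketch below mirrors the attrs example; the class name StdlibProgrammaticData is made up for illustration.

```python
from dataclasses import field, make_dataclass

# Each entry is a (name, type, Field) tuple; the names mirror the example above.
StdlibProgrammaticData = make_dataclass(
    "StdlibProgrammaticData",  # hypothetical class name
    [
        ("x", float, field(default=None)),
        ("y", float, field(default=None)),
        ("kwargs", dict, field(default=None)),
    ],
)

instance = StdlibProgrammaticData(x=1.0)
print(instance)
```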
Pydantic Dataclasses¶
Pydantic is a Python package for data validation and settings management using Python type annotations. Perfect, this is what we were trying to do with dataclasses and attrs. Better still, pydantic provides a dataclass decorator that enables data validation on our dataclasses. This lets us create extensible classes with data validation even more easily than with attrs!
The biggest benefit here is that the type annotations are now enforced at runtime by default, and any invalid data raises a nicely formatted error.
from pydantic.dataclasses import dataclass
import typing
@dataclass
class Data:
    x: float = None
    y: float = None
    kwargs: typing.Dict = None
class_tester(Data)
pydantic also automatically implements conversion and data validation. Let's test this out.
test_data_point = Data(x='123')  # the string is converted to the float 123.0
print(test_data_point)
Data(x='t')  # raises a ValidationError
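If you'd rather handle the failure than let it stop the program, the error can be caught as pydantic.ValidationError. The sketch below uses a hypothetical PydanticChecked class and is written to avoid version-specific features, though the exact wording of the error message does differ between pydantic releases.

```python
import typing

from pydantic import ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class PydanticChecked:  # hypothetical example class
    x: typing.Optional[float] = None


converted = PydanticChecked(x="123")  # the numeric string is coerced to a float

try:
    PydanticChecked(x="t")
    validation_failed = False
except ValidationError as error:
    validation_failed = True
    print(error)  # the formatted validation report

print(converted, validation_failed)
```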
As we can see above, it gives developers a nicely formatted error message when data validation fails, and smoothly sanitises the input when it needs to.