Dataclasses vs Attrs vs Pydantic

Posted on Fri 07 August 2020 in Data Science • 6 min read

Python 3.7 introduced dataclasses, a handy decorator that makes creating classes much easier. This post compares a regular class, a dataclass, a class using attrs and a pydantic dataclass. Dataclasses were based on attrs, a Python package that also aims to make creating classes a much more enjoyable experience. Dataclasses are included in the standard library (from Python 3.7), while attrs must be installed with pip (eg, pip install attrs). Mainly they automate the (sometimes) painful experience of writing dunder methods. You can read more about dunder methods in a previous post here: https://jackmckew.dev/dunders-in-python.html

Dataclasses vs Attrs vs Pydantic: Features

Feature Dataclass Attrs Pydantic
frozen
defaults
totuple
todict
validators
converters
slotted
programmatic creation
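
As a rough illustration of a few rows in the table, here is a minimal sketch using only the standard-library dataclasses module. The Point class and its fields are made up for this example; frozen=True, asdict and astuple are the dataclasses counterparts of the frozen, todict and totuple rows.

```python
from dataclasses import FrozenInstanceError, asdict, astuple, dataclass


# Hypothetical example class, not part of the original post
@dataclass(frozen=True)
class Point:
    x: float = 0.0  # defaults are plain assignments
    y: float = 0.0


point = Point(1.5)

# todict / totuple equivalents
point_as_dict = asdict(point)
point_as_tuple = astuple(point)
print(point_as_dict)
print(point_as_tuple)

# frozen: assigning to a field after creation raises an error
try:
    point.x = 10.0
    was_blocked = False
except FrozenInstanceError:
    was_blocked = True
print(f"Assignment blocked: {was_blocked}")
```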

When to use Dataclasses

Dataclasses are mainly about 'grouping' variables together. Choose dataclasses if:

  • The main concern is around the type of the variable, not the value
  • Adding another package dependency isn't trivial

When to use Attrs

Attrs are about both grouping & validating. Choose attrs if:

  • You're concerned about performance (attrs supports generating slotted classes, which are optimized for CPython)
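
To illustrate what 'slotted' means, here is a plain-Python sketch with a hand-written __slots__ (attrs can generate the equivalent for us with @attr.s(slots=True)). The class names are hypothetical and only there for the comparison.

```python
class RegularPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    # Only these attribute names are allowed; no per-instance __dict__ is created
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


regular = RegularPoint(1, 2)
slotted = SlottedPoint(1, 2)

print(hasattr(regular, "__dict__"))
print(hasattr(slotted, "__dict__"))

# A regular instance accepts new attributes; a slotted one rejects them
regular.z = 3
try:
    slotted.z = 3
    rejected = False
except AttributeError:
    rejected = True
print(f"New attribute rejected on slotted class: {rejected}")
```

Skipping the per-instance __dict__ is where the memory saving of slotted classes comes from.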

When to use Pydantic

Pydantic is about thorough data validation. Choose pydantic if:

  • You want to validate the values inside each class
  • You want to sanitise the input

Example Class

First off let's start with our example class in the default way that it would be implemented in Python.

We will also use type hints in our class definitions. This is good practice for documenting the types we intend our variables to have, although Python itself does not enforce them at runtime. Type hints are also integrated into attrs for creating classes.

In [1]:
import typing

class Data:
    def __init__(self, x: float = None, y: float = None, kwargs: typing.Dict = None):
        self.x = x
        self.y = y
        self.kwargs = kwargs

Notice the duplication: every argument passed to the __init__ constructor is repeated when it is assigned to an attribute of the same name. That repetition is typical whenever the constructor arguments map one-to-one onto the attributes. Luckily this is something that both dataclasses and attrs can help with (which we'll see later on).

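Now, to demonstrate the different things that both dataclasses & attrs automate for us, let's define a function which takes in the class constructor and prints out how each of our classes behaves. Note that is compares object identity rather than calling a dunder method, so the last check stays False for any two separate instances, whichever tool we use.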

In [2]:
def class_tester(class_constructor):
    test_class_1 = class_constructor()
    test_class_2 = class_constructor()

    print(f"Repr/str dunder method representation: {test_class_1}")

    print(f"Equality dunder method (using ==) (should be True if implemented): {test_class_1 == test_class_2}")

    print(f"Equality dunder method (using is) (should be True if implemented): {test_class_1 is test_class_2}")

class_tester(Data)
Repr/str dunder method representation: <__main__.Data object at 0x00000269A5758A90>
Equality dunder method (using ==) (should be True if implemented): False
Equality dunder method (using is) (should be True if implemented): False
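
For comparison, this is roughly what we would have to write by hand to get a readable representation and value-based equality. It's only a sketch; the name ManualData is made up for this example so it doesn't clash with Data.

```python
import typing


class ManualData:
    def __init__(self, x: float = None, y: float = None, kwargs: typing.Dict = None):
        self.x = x
        self.y = y
        self.kwargs = kwargs

    def __repr__(self):
        return f"ManualData(x={self.x!r}, y={self.y!r}, kwargs={self.kwargs!r})"

    def __eq__(self, other):
        # Only compare against the same class; let Python handle anything else
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y, self.kwargs) == (other.x, other.y, other.kwargs)


first = ManualData()
second = ManualData()
print(first)
print(first == second)
print(first is second)
```

Every new attribute has to be added in three places here, which is exactly the boilerplate the decorators remove.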

Dataclasses

Dataclasses automatically generate a bunch of dunder methods for us in a class, such as:

  • __init__ The initialisation method for the class
  • __repr__ How an instance is represented, for example when print() is called
  • __str__ How an instance is converted to a string (not generated separately; the default falls back to __repr__)
  • __eq__ Used when equality operators are used (eg, ==)
  • __hash__ The hash for the class (tied to __eq__; with the default settings instances are unhashable, and a hash is generated when frozen=True or unsafe_hash=True is passed)

There's also a stack of other dunder methods that can be automated, which are detailed at: https://docs.python.org/3/library/dataclasses.html
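
As one example of those optional methods, passing order=True asks the decorator to generate the comparison methods as well (__lt__, __le__, __gt__, __ge__), and combining the default eq=True with frozen=True makes instances hashable. A minimal sketch, with a made-up Version class:

```python
from dataclasses import dataclass


# Hypothetical class for illustration
@dataclass(order=True, frozen=True)
class Version:
    major: int = 0
    minor: int = 0


versions = [Version(1, 2), Version(0, 9), Version(1, 0)]

# Instances compare field by field, like tuples, so they can be sorted
sorted_versions = sorted(versions)
print(sorted_versions)

# eq=True together with frozen=True generates __hash__, so instances work in sets
unique_versions = {Version(1, 0), Version(1, 0)}
print(len(unique_versions))
```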

Thank you to Michael Kosher over on Twitter: It's worth noting validation can be added to dataclasses using a __post_init__ hook. However, it's pretty low level relative to attrs/#pydantic. I did a similar comparison https://mpkocher.github.io/2019/05/22/Dataclasses-in-Python-3-7/
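
As a sketch of the hook mentioned in that comment: the generated __init__ calls __post_init__ after assigning the fields, so a check can live there. The class name and the rule itself are made up here, chosen to resemble the attrs validator example further down.

```python
from dataclasses import dataclass


# Hypothetical class; the ">= 42" rule is only for illustration
@dataclass
class CheckedData:
    x: float = 42

    def __post_init__(self):
        # Runs at the end of the generated __init__
        if not self.x >= 42:
            raise ValueError("Must be at least the meaning of life!")


valid_point = CheckedData(42)
print(valid_point)

try:
    CheckedData(-35)
    error_message = None
except ValueError as error:
    error_message = str(error)
print(error_message)
```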

In [3]:
from dataclasses import dataclass

@dataclass
class Data:
    x: float = None
    y: float = None
    kwargs: typing.Dict = None

class_tester(Data)
Repr/str dunder method representation: Data(x=None, y=None, kwargs=None)
Equality dunder method (using ==) (should be True if implemented): True
Equality dunder method (using is) (should be True if implemented): False

Finally we have our attrs class. There are two main functions that are part of attrs: attr.s and attr.ib(). attr.s is the decorator to put on a class to have the package initialise the dunder methods for us, while attr.ib() can optionally be used for defining the parameters of the class. There are lots of optional arguments for both attr.s and attr.ib(), which are documented at: https://www.attrs.org/en/stable/api.html. Mainly the optional arguments are for enabling/disabling the different dunder methods in the class.

In [4]:
import attr

@attr.s
class Data:
    x: float = attr.ib(default=None)
    y: float = attr.ib(default=None)
    kwargs: typing.Dict = attr.ib(default=None)

class_tester(Data)
Repr/str dunder method representation: Data(x=None, y=None, kwargs=None)
Equality dunder method (using ==) (should be True if implemented): True
Equality dunder method (using is) (should be True if implemented): False

Attrs

Next, let's dive deeper into attrs.

Validators in attrs

One major piece of functionality that attrs has but dataclasses don't is validators. These let us check the inputs against specific rules when our classes are being created. Let's build an example that ensures our parameter x is an int and is at least 42 (the check is >= 42, so 42 itself is accepted), and raises an error to the user if not.

In [5]:
import attr

@attr.s
class ValidatedData:
    x: float = attr.ib(default=None,validator=attr.validators.instance_of(int))
    y: float = attr.ib(default=None)
    kwargs: typing.Dict = attr.ib(default=None)

    @x.validator
    def more_than_the_meaning_of_life(self, attribute, value):
        if not value >= 42:
            raise ValueError("Must be more than the meaning of life!")

test_data_point_1 = ValidatedData(42)

test_data_point_2 = ValidatedData(-35)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-1f0c941b3dd2> in <module>
     14 test_data_point_1 = ValidatedData(42)
     15 
---> 16 test_data_point_2 = ValidatedData(-35)

<attrs generated init __main__.ValidatedData> in __init__(self, x, y, kwargs)
      4     self.kwargs = kwargs
      5     if _config._run_validators is True:
----> 6         __attr_validator_x(self, __attr_x, self.x)

c:\Users\jackm\Documents\GitHub\jackmckew.dev\drafts\2020\dataclasses-vs-attrs\.env\lib\site-packages\attr\_make.py in __call__(self, inst, attr, value)
   2144     def __call__(self, inst, attr, value):
   2145         for v in self._validators:
-> 2146             v(inst, attr, value)
   2147 
   2148 

<ipython-input-5-1f0c941b3dd2> in more_than_the_meaning_of_life(self, attribute, value)
     10     def more_than_the_meaning_of_life(self, attribute, value):
     11         if not value >= 42:
---> 12             raise ValueError("Must be more than the meaning of life!")
     13 
     14 test_data_point_1 = ValidatedData(42)

ValueError: Must be more than the meaning of life!

Converters in Attrs

Converters are used to sanitise the input data when creating classes. If we want to let our users pass values such as the string "42" for parameters which are intended to be integers, we can sanitise this input with converters. This lets our classes be much more flexible with our users while still keeping stability in the typing behind the parameters.

In [6]:
import attr

@attr.s
class ConvertedData:
    x: float = attr.ib(default=None,converter=int)
    y: float = attr.ib(default=None)
    kwargs: typing.Dict = attr.ib(default=None)

    @x.validator
    def more_than_the_meaning_of_life(self, attribute, value):
        if not value >= 42:
            raise ValueError("Must be more than the meaning of life!")

test_data_point_1 = ConvertedData(42)

print(test_data_point_1)

test_data_point_2 = ConvertedData("42")

print(test_data_point_2)
ConvertedData(x=42, y=None, kwargs=None)
ConvertedData(x=42, y=None, kwargs=None)

Programmatic Creation of Attrs

In some cases you may want to create classes programmatically. attrs doesn't let us down here and provides the attr.make_class function for this! We can simply pass a dictionary of all the parameters we need.

In [7]:
ProgrammaticData = attr.make_class("Data",
                            {'x': attr.ib(default=None),
                            'y': attr.ib(default=None),
                            'kwargs': attr.ib(default=None)}
                            )

print(Data())
print(ProgrammaticData())
Data(x=None, y=None, kwargs=None)
Data(x=None, y=None, kwargs=None)
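
The standard library has a counterpart for this as well: dataclasses.make_dataclass. Below is a minimal sketch that builds a similar class; the name DynamicData is made up for this example.

```python
import typing
from dataclasses import field, make_dataclass

# Each field is a (name, type, Field) tuple; DynamicData is a hypothetical name
DynamicData = make_dataclass(
    "DynamicData",
    [
        ("x", float, field(default=None)),
        ("y", float, field(default=None)),
        ("kwargs", typing.Dict, field(default=None)),
    ],
)

dynamic_point = DynamicData(x=1.0)
print(dynamic_point)
print(dynamic_point == DynamicData(x=1.0))
```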

Pydantic Dataclasses

Pydantic is a Python package for data validation and settings management using Python type annotations. Perfect, this is what we were trying to do with dataclasses and attrs. Better still, pydantic provides a dataclass decorator to enable data validation on our dataclasses. This lets us create extensible classes with data validation even more easily than with attrs!

The biggest benefit here is that the type annotations are now enforced at runtime by default, and any invalid data raises a nicely formatted error.

In [8]:
from pydantic.dataclasses import dataclass
import typing

@dataclass
class Data:
    x: float = None
    y: float = None
    kwargs: typing.Dict = None

class_tester(Data)
Repr/str dunder method representation: Data(x=None, y=None, kwargs=None)
Equality dunder method (using ==) (should be True if implemented): True
Equality dunder method (using is) (should be True if implemented): False

Pydantic also automatically implements conversion & data validation. Let's test this out.

In [9]:
test_data_point = Data(x='123')
print(test_data_point)
Data(x='t')
Data(x=123.0, y=None, kwargs=None)
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-9-67163ff1f554> in <module>
      1 test_data_point = Data(x='123')
      2 print(test_data_point)
----> 3 Data(x='t')

<string> in __init__(self, x, y, kwargs)

c:\Users\jackm\Documents\GitHub\jackmckew.dev\drafts\2020\dataclasses-vs-attrs\.env\lib\site-packages\pydantic\dataclasses.cp38-win_amd64.pyd in pydantic.dataclasses._process_class._pydantic_post_init()

ValidationError: 1 validation error for Data
x
  value is not a valid float (type=type_error.float)

As we can see above, it gives developers a nicely formatted error message when the data validation fails, and smoothly sanitises the input when it needs to.