Django testing approaches (fixtures vs SQL dumps vs factories)

In this article, we try to find the most appropriate way to populate a database for tests in a Django project. Django is an open-source Python framework with “batteries included” that lets us write code quickly and cleanly. And what is crucial for keeping a project neat? Tests.

Importance of quick-running unit* tests

*Strictly speaking, Django “unit” tests are closer to integration tests (most of the time they combine the database, models, and views), but let’s call them “unit” tests for simplicity, since that is the commonly used term

It’s unbearable to have a big project that is poorly tested. Having tests at different levels of the app (unit tests, integration tests, system tests) is the main way to prevent errors. Unfortunately, when you have a large, thoroughly tested project, running the tests locally becomes a burden. One may say that it’s okay for unit tests to run an hour or even more during the CI process.

But let’s be honest: developers do run tests locally to make sure everything works, or they wait for a response from the remote server before moving on. That slows development down. Five to ten minutes is acceptable for a local run; sixty is not. And it’s not only about the time you spend. It’s about the programmers’ motivation as well. A long feedback loop wears developers down. As a result, they neither want to test what they have done nor write new tests.

So we should keep unit tests running fast. What makes tests slow in the first place? In Django, one of the main bottlenecks is database population. It’s possible to mock model objects without hitting the database, but since the correctness of the results usually depends on the correctness of the data, that is rarely a good choice. So we need to populate the database, and to do it as quickly and with data as relevant as we can.
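
For illustration, here is a minimal sketch of what such mocking might look like, so the test never touches the database; the most_contagious helper and the test class are hypothetical and exist only for this example (Disease is the model shown further down in this article):

from unittest import TestCase
from unittest.mock import MagicMock, patch

from diseases.models import Disease  # the model shown later in this article


def most_contagious():
    # hypothetical piece of business logic we want to test
    return Disease.objects.order_by('-contagiousness').first()


class MostContagiousMockedTestCase(TestCase):
    def test_most_contagious(self):
        fake = MagicMock(contagiousness=90)
        # replace the manager call so no query is ever executed
        with patch.object(Disease.objects, 'order_by') as order_by:
            order_by.return_value.first.return_value = fake
            self.assertIs(most_contagious(), fake)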

Actually, this is exactly why we decided to redesign our approach to writing tests: while the tests themselves ran in about 2 minutes, we spent around 20 minutes on database population.

Fixtures vs SQL dumps vs factories

Let’s go through the main pros and cons of each technique. SQL dumps and fixtures are built around the same idea, so we will start with their common pros and cons and then cover what is specific to each of them.

Fixtures and SQL dumps:

pros

  • easy to create via dumping the existing database
  • you don’t need to know all of the aspects of the app to use them in the tests
  • they are the same for the whole test case
  • they usually contain all the related data you might possibly need

cons

  • most of the time you load data that you don’t actually need in the test, which makes it slow

Fixtures:

pros

  • Django provides hooks for loading them during testing (see the sketch below)

cons

  • they are slow to load
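
To make those loading hooks concrete, here is a hedged sketch of the usual workflow: the fixture is produced by dumping existing data and is then attached to a test case through the fixtures attribute. The app name and file names are illustrative, and Disease is the model defined later in this article:

# creating the fixture is a one-liner, e.g.:
#     python manage.py dumpdata diseases --indent 2 > diseases/fixtures/diseases.json

from django.test import TestCase

from diseases.models import Disease


class DiseaseFixturesTestCase(TestCase):
    # Django's built-in hook: the fixture is loaded into the test database
    # for every test in this class
    fixtures = ['diseases.json']

    def test_fixture_is_loaded(self):
        self.assertTrue(Disease.objects.exists())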

SQL dumps:

pros

  • they are fast to load

cons

  • Django doesn’t provide hooks for loading them during testing
  • it’s hard to browse them or change manually

Factories:

pros

  • the data is highly relevant to the test
  • they are easy to change
  • part of the data may be random, which widens coverage

cons

  • you need to code them manually and think them through
  • you may miss something important in a test because you haven’t created all the relevant objects
  • you must know the project well to know exactly what you should create and how to do it
  • if you create a lot of objects, they are slower than the other approaches

As you can see, each approach has pros and cons, so there is no obvious winner. However, later on I will explain why we ended up sticking with factories.

Let’s look at how each of them is used and compare their loading speed and usability.

Let’s start with the models we have and the tests:

from django.core.validators import MaxValueValidator
from django.db import models


class Sphere(models.Model):
    name = models.CharField('name', max_length=256, unique=True)

    class Meta:
        ordering = ['name']

    def __str__(self):
        return self.name


class Disease(models.Model):
    sphere = models.ForeignKey(Sphere, on_delete=models.CASCADE)
    name = models.CharField(max_length=256, db_index=True, unique=True)
    chronic = models.BooleanField(default=False)
    symptoms = models.ManyToManyField(to=Symptom, through='DiseaseSymptom')
    duration = models.PositiveSmallIntegerField(default=10)
    contagiousness = models.PositiveSmallIntegerField(validators=[MaxValueValidator(100)])
    malignancy = models.PositiveSmallIntegerField(validators=[MaxValueValidator(100)])
    description = models.TextField()
    diagnostics = models.TextField(blank=True, null=True)
    treatment = models.TextField(blank=True, null=True)
    passing = models.TextField(blank=True, null=True)
    recommendations = models.TextField(blank=True, null=True)
    # occurrence = models.PositiveIntegerField(default=1)  # How many times this disease has occurred
    number = models.PositiveIntegerField('number of people on average to get disease from 10^6', default=0)

    class Meta:
        ordering = ['name']

    def __str__(self):
        return self.name

So these are two simple models connected by a one-to-many relation.

The main part of the test code:

from django.test import TestCase

from diseases.models import Disease, Sphere


class SymptomFixturesTestCase(TestCase):
    fixtures = ['diseases.json']  # the dumped test data; the file name here is illustrative

    def setUp(self):
        super(SymptomFixturesTestCase, self).setUp()
        self.new_sphere = Sphere.objects.create(name='fake')

    def test_update_name(self):
        # update some diseases to run rollbacks
        for disease in Disease.objects.all()[:3]:
            disease.name = 'fake_name' + str(disease.id)
            disease.save()

        self.assertEqual(Disease.objects.filter(name__startswith='fake_name').count(), 3)

    def test_delete(self):
        # delete some diseases to run rollbacks
        disease_count = Disease.objects.count()
        for disease in Disease.objects.all()[:3]:
            disease.delete()

        self.assertEqual(Disease.objects.count(), disease_count - 3)

    def test_create(self):
        diseases_count = Disease.objects.count()
        Disease.objects.create(name='fake', sphere=self.new_sphere, duration=15,
                               contagiousness=15, malignancy=50, description='fake')

        self.assertEqual(Disease.objects.count(), diseases_count + 1)

    def test_remove_sphere(self):
        # check that deleting a sphere removes all of its diseases
        sphere = Sphere.objects.first()
        sphere_disease_count = Disease.objects.filter(sphere_id=sphere.id).count()
        all_disease_count = Disease.objects.count()
        sphere.delete()
        self.assertEqual(Disease.objects.count(), all_disease_count - sphere_disease_count)

The code is quite simplified, but it shows the idea well.
Let’s start with the fixtures approach. As is common, I simply dumped our test database, which covers most of the edge cases. As a result, we have a bit less than 1000 objects in total. That may seem excessive, since we certainly don’t use most of them in these tests, but as I said, the example is simplified: most of those objects are used in other tests. This is how it usually goes: we start with one TestCase that needs data and a neat fixture of 5 objects, and we end up with a monster fixture that is used in 20 test cases and contains 500 objects. The problems fixtures cause are the following:

– fixtures tend to grow in size, because most of the time we try to fill them with data suitable for all our tests; as they grow, we load and process more and more irrelevant data.

– we have two sources of data: the fixtures and the objects created in the setUp method, which is quite a common pattern. We have to maintain both, which makes the approach even more error-prone and harder to change.

– most of the time we don’t know which object we are dealing with; we just take the first or the last one, which makes the tests obscure.

Let’s see how much time the tests take.

Ran 4 tests in 0.745s

Let’s see how much of that is actually spent on running the tests, according to PyCharm:

Test Results 26ms

It’s just ridiculous. Most of the time is spent on the fixture loading.

Maybe the problem is in the fixtures themselves and not in the data volume? Fixtures have to be parsed, and then the Django ORM steps in to create the model objects, which could be the source of the slowdown. Let’s replace our fixtures with an SQL dump. The first problem we face is that Django doesn’t support SQL loading out of the box, especially in tests. So we need to adjust our TestCase class in the following way:

SQL approach: TestCase setUpClass method

import os

import sqlparse
from django.db import connection
from django.test import TestCase

__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))


class SymptomFixturesTestCase(TestCase):

    @classmethod
    def setUpClass(cls):
        super(SymptomFixturesTestCase, cls).setUpClass()
        # load the raw SQL dump once per test class
        with connection.cursor() as cur:
            with open(os.path.join(__location__, 'dump.sql')) as f:
                for statement in sqlparse.split(f.read()):
                    if not statement:
                        continue
                    cur.execute(statement)

This snippet loads the SQL dump when the test class is first set up. It has to be said that this code is a proof of concept only and relies heavily on the transaction support of our test database. So what are the results of this approach?

Ran 4 tests in 0.659s

We have gained a little speedup, but it doesn’t help us a lot. It also creates problems of its own: loading SQL dumps is not a solution recommended by the Django team, and in fact they deliberately dropped support for it. So the reason our tests run slowly is not the speed of the Django ORM or of fixture parsing; it’s the data volume.

One way to solve this is to split our fixtures into smaller ones that are more relevant to each TestCase. Unfortunately, this solves only the first problem with fixtures and makes maintenance even harder.
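
A rough sketch of that splitting, with purely illustrative fixture and class names, could look like this:

from django.test import TestCase


class DiseaseUpdateTestCase(TestCase):
    # only the handful of objects these tests actually touch
    fixtures = ['spheres_minimal.json', 'diseases_minimal.json']


class SphereDeletionTestCase(TestCase):
    fixtures = ['spheres_minimal.json']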

Another solution is to create the objects ourselves in each test so they are easy to change, which also keeps the data as relevant as possible. Since creating objects directly through the ORM produces a lot of boilerplate, factories that can auto-populate data for us are a good choice. So how does it look with factory_boy?

import factory
import factory.fuzzy

from diseases.models import Sphere, Disease


class SphereFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Sphere

    name = factory.Sequence(lambda n: 'Name {0}'.format(n))


class DiseaseFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Disease

    sphere = factory.SubFactory(SphereFactory)
    name = factory.Sequence(lambda n: 'Name {0}'.format(n))
    contagiousness = factory.fuzzy.FuzzyInteger(low=1, high=100)
    malignancy = factory.fuzzy.FuzzyInteger(low=1, high=100)
    description = factory.Sequence(lambda n: 'Description {0}'.format(n))

And our test case will look like the following:

from django.test import TestCase

from diseases.models import Disease
# assuming the factory classes above live in diseases/factories.py
from diseases.factories import DiseaseFactory, SphereFactory


class SymptomFactoriesTestCase(TestCase):

    def test_update_name(self):
        # update some disease to run rollbacks

        for disease in DiseaseFactory.create_batch(3):
            disease.name = 'fake_name' + str(disease.id)
            disease.save()

        self.assertEqual(Disease.objects.filter(name__startswith='fake_name').count(), 3)

    def test_delete(self):
        # delete some disease to run rollbacks
        DiseaseFactory.create_batch(5)
        disease_count = Disease.objects.count()
        for disease in Disease.objects.all()[:3]:
            disease.delete()

        self.assertEqual(Disease.objects.count(), disease_count - 3)

    def test_create(self):
        disease_count = Disease.objects.count()
        DiseaseFactory()

        self.assertEqual(Disease.objects.count(), disease_count + 1)

    def test_remove_sphere(self):

        # check that deletion of sphere removes all disease
        sphere = SphereFactory()
        DiseaseFactory.create_batch(3, sphere=sphere)
        sphere_disease_count = Disease.objects.filter(sphere_id=sphere.id).count()
        all_disease_count = Disease.objects.count()
        sphere.delete()
        self.assertEqual(Disease.objects.count(), all_disease_count - sphere_disease_count)

And what do the time metrics show?

Ran 4 tests in 0.028s

And that’s it: almost no time is spent on creating data anymore.

As you can see, this sped our tests up nearly 30 times. That number is somewhat artificial; from what I have seen, a speedup of 2 to 10 times is typical.

Conclusion

In conclusion, I want to say that there is no silver bullet. Either you write your tests carefully, loading only the relevant data but spending more time writing them, or you write them quickly and simply and lose that time later while running them. What to choose depends on the specific case. It’s fine to start with fixtures and rewrite them as factories later, once you see the fixtures growing into an unmaintainable data source that is large and shared across many tests.

 

  • #Custom software development
  • #Django
  • #Programming
  • #Python
  • #Testing