Key Takeaways
By the time you are done reading this article, you will:
- Understand the age-old problem of managing concurrency in scalable applications
- Learn how you can leverage Django’s built-in select_for_update() method to manage concurrency scenarios
Prerequisites
To follow along, you should be familiar with Django, a popular open-source Python web framework for building websites and backend applications quickly. You should also be familiar with its object-relational mapping (ORM) layer, which lets you work with data from various relational databases in an object-oriented fashion without writing SQL queries by hand.
Introduction
Concurrency is a concept that allows computer programs to handle numerous tasks simultaneously. This may require multiple sequences of operations running in overlapping periods.
For applications handling substantial traffic, which translates to multiple workers across various server instances (depending on the scaling strategy adopted), concurrency inevitably needs to be managed.
How does it play out in Django? When you load an object through the built-in ORM, change it in memory, and call the save() method, two different server instances can do exactly the same thing to the same object at the same time. Each works from its own copy, so the later save() silently overwrites the earlier one and the stored data ends up wrong.
How can we resolve this? A pessimistic approach dictates that you lock the resource in question exclusively until you are finished with it.
While a database operation is in progress, the row or set of rows being updated stays locked until the operation completes, so that no other transaction can lock or modify those rows in the meantime. This prevents multiple server instances from acting on stale data held in memory and overwriting each other's changes.
Let’s look at a real-world scenario:
Consider a web application that handles event ticket reservations, and you want to update the available seats for an event in your database when someone makes a reservation.
If multiple people try to reserve a ticket for the same event simultaneously, you don't want them to accidentally buy more tickets than are available. The risk grows on days when traffic spikes.
This is where the concurrency problem shows up.
Suppose two people want to buy the last ticket to a viral event. If they both try to make a reservation simultaneously, the system might not realise there's only one ticket left before completing their transactions. This could lead to selling more seats than you have.
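To make the failure concrete, here is a minimal sketch of that interleaving in plain Python, with no Django involved. The `row` dictionary and the `load` and `save` helpers are hypothetical stand-ins for a database row and the ORM's read and write steps; the point is only the ordering, where both buyers read before either one writes.

```python
# Stand-in for the single database row that tracks seats (hypothetical data).
row = {"available_seats": 1}


def load():
    """Return a private in-memory copy, as each worker would get from a read."""
    return dict(row)


def save(copy):
    """Write the worker's copy back, overwriting whatever is stored."""
    row.update(copy)


# Both buyers read the row before either of them writes.
buyer_a = load()
buyer_b = load()

tickets_sold = 0
for copy in (buyer_a, buyer_b):
    # Each buyer checks its own stale copy, which still shows one seat.
    if copy["available_seats"] > 0:
        copy["available_seats"] -= 1
        save(copy)
        tickets_sold += 1
```

Both checks pass because each buyer only sees its own copy, so two tickets are sold for one seat while the stored count still ends at zero.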
Django addresses this problem with the select_for_update() method, which returns a queryset that locks all the rows it selects until the outermost transaction it runs in is committed or rolled back. The query has to be evaluated inside a transaction, typically a transaction.atomic() block.
This ensures that only one request can hold the row for the reservation at a time. While one person is in the process of buying the ticket, the other person's request waits until the first transaction is completed and then sees the updated seat count. This way, you avoid selling more seats than you have available.
We can use Django's select_for_update to lock the row during the reservation.
Keep in mind that for this to work, the database you are using must support transactions and row locks. Django's PostgreSQL, MySQL, and Oracle backends support select_for_update(). On SQLite, which does not support SELECT ... FOR UPDATE, the call has no effect. My recommendation would be to use PostgreSQL.
Conclusion
In conclusion, select_for_update in Django helps prevent data consistency issues when multiple operations might happen simultaneously on the same piece of data in a database, which is most likely when you are running multiple workers of your Django application.
Sidenote
The Django select_for_update() method is one way to handle application concurrency issues, but not every approach is pessimistic. An alternative is to let requests proceed without locks and use either version numbers or timestamps to check, at write time, that the version on disk is still the version the instance originally read. If the check fails, the write is rejected and can be retried. This is known as Optimistic Concurrency Control, or optimistic locking.
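A minimal sketch of the version-number variant follows, using Python's standard sqlite3 module so it runs without a Django project. The `event` table, its `version` column, and the `read_event` and `try_reserve` helpers are assumptions made for illustration. In Django, a comparable conditional write could be expressed with `filter(...).update(...)`, which returns the number of rows matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event ("
    "id INTEGER PRIMARY KEY, "
    "available_seats INTEGER NOT NULL, "
    "version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO event (id, available_seats, version) VALUES (1, 1, 0)")
conn.commit()


def read_event(event_id):
    """Return (available_seats, version) as currently stored."""
    return conn.execute(
        "SELECT available_seats, version FROM event WHERE id = ?",
        (event_id,),
    ).fetchone()


def try_reserve(event_id, seats_seen, version_seen):
    """Write only if the row still has the version this caller read."""
    if seats_seen < 1:
        return False
    cursor = conn.execute(
        "UPDATE event SET available_seats = ?, version = ? "
        "WHERE id = ? AND version = ?",
        (seats_seen - 1, version_seen + 1, event_id, version_seen),
    )
    conn.commit()
    # Zero rows updated means another writer changed the row first.
    return cursor.rowcount == 1


# Two buyers read the same state before either one writes.
seats_a, version_a = read_event(1)
seats_b, version_b = read_event(1)

first_ok = try_reserve(1, seats_a, version_a)
second_ok = try_reserve(1, seats_b, version_b)
```

The first write succeeds and bumps the version, so the second write matches no rows and is rejected. The caller can then re-read the row and retry or report that the event is sold out.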
select_for_update() is a convenient way to handle concurrency situations in Django without writing much custom code.