PostgreSQL Çorbası: Multi-Version Concurrency Control

Giriş

Açıklaması şöyle

Postgres was the first DBMS to rollout multi-version concurrency control (MVCC), which means reading never blocks writing and vice versa. This feature is one of the main reasons why businesses prefer Postgres to MySQL. As Postgres explains, "Unlike most other database systems which use locks for concurrency control, Postgres maintains data consistency by using a multiversion model. This means that while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. This protects the transaction from viewing inconsistent data that could be caused by (other) concurrent transaction updates on the same data rows, providing transaction isolation for each database session."

Utilizing what Oracle calls 'snapshot isolation', MVCC lets multiple readers and writers concurrently interact with the Postgres database, eliminating the need for a read-write lock every time someone interacts with the data. A side benefit is this process provides a big efficiency boost.

Açıklaması şöyle

When we execute a SQL statement, it uses a snapshot of data instead of every row. This prevents users from viewing inconsistent data generated by concurrent transactions. It also minimizes lock contentions for different sessions trying to read or write data.

Bir başka açıklama şöyle

Postgres works with MVCC (Multiversion Concurrency Control) that ultimately creates a new row for update/insert. This helps in concurrent transactions with older transactions having access to an older version of data(as they need to be completed) but at the same time, it’s adding a lot of stale storage in the system. So at a time, only one version of a row will be active and all others will present as dead(stale) rows.

Kısaca tüm bu açıklamalar şunu söylüyor, PostgreSQL satırların kopyasını alıyor ve bir müddet sonra bu satırlar gereksiz/geçersiz hale geliyorlar. Bu yüzden bu satırları ara ara VACUUM, REPACK Extension gibi bir şeyle temizlemek gerekiyor.

Transaction ID

Her transaction için bir tane ID üretilir. Bunu görmek için şöyle yaparız

BEGIN;
SELECT * FROM table1;
SELECT txid_current();
txid_current 
--------------
          754
(1 row)

COMMIT;

MVCC model

MVCC modelde her satır için gizli xmin ve xmax sütunları vardır. Bunları görmek için şöyle yaparız

SELECT xmin, xmax, * FROM <table_name>:
 xmin | xmax | <column_name>
------+------+---------------
  800 |    0 | <column_value>

Açıklaması şöyle

xmin holds the txid of the transaction that inserted this tuple.
xmax holds the txid of the transaction that deleted or updated this tuple. If this tuple has not been deleted or updated, t_xmax is set to 0, which means INVALID.

Örnek

Şeklen şöyle

| xmin | xmax | c1  | c_tid |
|------|------|-----|-------|
| 9    | 12   | 100 | (0,1) |
| 12   | -    | 200 | (0,2) |

Açıklaması şöyle

If transaction 9 creates a row, a new tuple is created with a header xmin value of 9. If a later transaction 12 updated the row, a new tuple is created with xmin 12 and the old tuple xmax is marked as 12 indicating that the old tuple was “alive” from 9–12. Thus the row has two tuples (row versions) one which lived between 9–12 and one that is alive starting from 12 and onwards. A repeatable read transaction 10 for instance must read the row’s old tuple 9–12 and not the new one updated by transaction 11.

Daha sonra VACUUM çalıştırılır ve şöyle olur

| xmix | xmax | c1  | c_tid |
|------|------|-----|-------|
| 12   | -    | 200 | (0,2) |

Açıklaması şöyle

A VACUUM operation clean tuples that have been deleted AND no longer required by any running transactions.

DELETE İşlemi

Açıklaması şöyle. PostgreSQL silinmesi istenen satırları hemen silmiyor. Bu yüzden bu satırları ara ara VACUUM ile silmek gerekiyor.

When your Java application executes a DELETE or UPDATE statement against a PostgreSQL database, a deleted record is not removed immediately nor is an existing record updated in its place. Instead, the deleted record is marked as a dead tuple and will remain in storage. The updated record is, in fact, a brand new record that PostgreSQL inserts by copying the previous version of the record and updating requested columns. The previous version of that updated record is considered deleted and, as with the DELETE operation, marked as a dead tuple.

There is a good reason why the database engine keeps old versions of the deleted and updated records in its storage. For starters, your application can run a bunch of transactions against PostgreSQL in parallel. Some of those transactions do start earlier than others. But if a transaction deletes a record that still might be of interest to a few transactions started earlier, then the record needs to be kept in the database (at least until the point in time when all earlier started transactions finish). This is how PostgreSQL implements MVCC (multi-version concurrency protocol).

It’s clear that PostgreSQL can’t and doesn’t want to keep the dead tuples forever. This is why the database has its own garbage collection process called vacuuming. There are two types of VACUUM — the plain one and the full one. The plain VACUUM works in parallel with your application workloads and doesn’t block your queries. This type of vacuuming marks the space occupied by dead tuples as free, making it available for new data that your app will add to the same table later. The plain VACUUM doesn’t return the space to the operating system so that it can be reused by other tables or 3rd party applications (except in some corner cases when a page includes only dead tuples and the page is in the end of a table).

PostgreSQL Çorbası

6 Ocak 2021 Çarşamba

Multi-Version Concurrency Control

Hiç yorum yok:

Yorum Gönder