PostgreSQL Çorbası: Ocak 2021

25 Ocak 2021 Pazartesi

Replication

Giriş

Primary veri tabanından diğerlerine replication işini hiç yapmadım.

Şartlar

Açıklaması şöyle

The basic requirement replication requirement,

1. Replica DB required same PG version
2. Replica DB is in always sync with Primary DB.

Replication Lag

Örnek

Gecikmenin sebebi için bir örnek burada. Burada replica üzerinde koşan sorgu, WAL sender'i engellediği için gecikme oluyor. Özet olarak silme işlemi eskice kayıtlar üzerinde yapılsa daha iyi

You see when you have postgres replication you have a dilemma.

In one hand someone is querying standby with long running query, pinning a snapshot at t7.

On the other hand the primary has deleted some rows and vacuum is about to purge deleted tuples on t7 as no query on the primary needs them. Vacuum purges those, new WAL records of the purge.

WAL sender kicks in and send the vacuumed WAL entries to purge t7 on the standby but guess what? standby still need them. This blocks replication as the WAL receiver cannot apply the purged row if a query need them.

This creates lag in replication and as a result entire replication halt until queries are done.

Replication Lag Ölçümü

Eğer Patroni kullanmıyorsak elle ölçüm yapılabilir

Örnek

Şöyle yaparız

# Check the current LSN on primary DB.
psql -h primary_host -c "SELECT pg_current_wal_lsn();"

# check replica lag
psql -h replica_host 
  -c "SELECT pg_is_in_recovery(),pg_is_wal_replay_paused(),pg_last_wal_receive_lsn(),
    pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();"

# Check the difference between primary DB LSN and replica replay LSN.
lsn_diff_size = psql -h replica_host 
  -c "SELECT pg_wal_lsn_diff('54/5A282990','54/5A282990');"

# Take the different and Check the size in MB or GB.
// LAG difference in MB
psql -h replica_host -c "SELECT round(lsn_diff_size/pow(1024,2.0),2 missing_lsn_mIB;" 

// LAG difference in GB
psql -h replica_host -c "SELECT round(lsn_diff_size/pow(1024,3.0),2 missing_lsn_GIB;"

Ayarlar

Not : Bir örnek burada

1. Master Ayarları

Master üzerinde bir kullanıcı yarat

CREATE USER replication 
REPLICATION LOGIN CONNECTION LIMIT 1 
ENCRYPTED PASSWORD 'replicationpa55word';

# change the maximum number of connections allowed to the replication user
ALTER ROLE replication CONNECTION LIMIT -1;

/etc/postgresql/12/main/postgresql.conf dosyasına şunu ekle. 172.XXX adresini master bilgisayarın gerçek IP adresi ile değiştir.

listen_addresses = 'localhost,172.16.10.220'
wal_level = replica
max_wal_senders = 10
wal_keep_segments = 64

Slave bilgisayarın bağlanabilmesi için doğrulanması gerekir. etc/postgresql/12/main/pg_hba.conf dosyasına şunu ekle.

host    replication     replication     172.16.10.119/0   md5

Master bilgisayarı tekrar başlat

sudo /etc/init.d/postgresql restart
veya 
sudo service postgresql restart

2. Slave Ayarları

/etc/postgresql/12/main/postgresql.conf dosyasına şunu ekle. 172.XXX adresini slave bilgisayarın gerçek IP adresi ile değiştir.

listen_addresses = 'localhost,172.16.10.119'
wal_level = replica
max_wal_senders = 10
wal_keep_segments = 64

Master bilgisayarın bağlanabilmesi için doğrulanması gerekir. etc/postgresql/12/main/pg_hba.conf dosyasına şunu ekle.

host    replication     replication     172.16.10.220/0   md5

Slave bilgisayarın tüm verisini sil

sudo rm -rfv * cd /var/lib/postgresql/12/main/

Master bilgisayardaki tüm veriyi slave bilgisayara kopyala. Burada şifre isteyecektir. Şifre replicationpa55word. Bitince slave bilgisayar tekrar başlatılır

sudo su postgres \
  pg_basebackup \
  -h 172.16.10.220 \
  -U replication \
  -p 5432 \
  -D /var/lib/postgresql/12/main/  \
  -Fp \
  -Xs \
  -P \
  -R

6 Ocak 2021 Çarşamba

Multi-Version Concurrency Control

Giriş

Açıklaması şöyle

Postgres was the first DBMS to rollout multi-version concurrency control (MVCC), which means reading never blocks writing and vice versa. This feature is one of the main reasons why businesses prefer Postgres to MySQL. As Postgres explains, "Unlike most other database systems which use locks for concurrency control, Postgres maintains data consistency by using a multiversion model. This means that while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. This protects the transaction from viewing inconsistent data that could be caused by (other) concurrent transaction updates on the same data rows, providing transaction isolation for each database session."

Utilizing what Oracle calls 'snapshot isolation', MVCC lets multiple readers and writers concurrently interact with the Postgres database, eliminating the need for a read-write lock every time someone interacts with the data. A side benefit is this process provides a big efficiency boost.

Açıklaması şöyle

When we execute a SQL statement, it uses a snapshot of data instead of every row. This prevents users from viewing inconsistent data generated by concurrent transactions. It also minimizes lock contentions for different sessions trying to read or write data.

Bir başka açıklama şöyle

Postgres works with MVCC (Multiversion Concurrency Control) that ultimately creates a new row for update/insert. This helps in concurrent transactions with older transactions having access to an older version of data(as they need to be completed) but at the same time, it’s adding a lot of stale storage in the system. So at a time, only one version of a row will be active and all others will present as dead(stale) rows.

Kısaca tüm bu açıklamalar şunu söylüyor, PostgreSQL satırların kopyasını alıyor ve bir müddet sonra bu satırlar gereksiz/geçersiz hale geliyorlar. Bu yüzden bu satırları ara ara VACUUM, REPACK Extension gibi bir şeyle temizlemek gerekiyor.

Transaction ID

Her transaction için bir tane ID üretilir. Bunu görmek için şöyle yaparız

BEGIN;
SELECT * FROM table1;
SELECT txid_current();
txid_current 
--------------
          754
(1 row)

COMMIT;

MVCC model

MVCC modelde her satır için gizli xmin ve xmax sütunları vardır. Bunları görmek için şöyle yaparız

SELECT xmin, xmax, * FROM <table_name>:
 xmin | xmax | <column_name>
------+------+---------------
  800 |    0 | <column_value>

Açıklaması şöyle

xmin holds the txid of the transaction that inserted this tuple.
xmax holds the txid of the transaction that deleted or updated this tuple. If this tuple has not been deleted or updated, t_xmax is set to 0, which means INVALID.

Örnek

Şeklen şöyle

| xmin | xmax | c1  | c_tid |
|------|------|-----|-------|
| 9    | 12   | 100 | (0,1) |
| 12   | -    | 200 | (0,2) |

Açıklaması şöyle

If transaction 9 creates a row, a new tuple is created with a header xmin value of 9. If a later transaction 12 updated the row, a new tuple is created with xmin 12 and the old tuple xmax is marked as 12 indicating that the old tuple was “alive” from 9–12. Thus the row has two tuples (row versions) one which lived between 9–12 and one that is alive starting from 12 and onwards. A repeatable read transaction 10 for instance must read the row’s old tuple 9–12 and not the new one updated by transaction 11.

Daha sonra VACUUM çalıştırılır ve şöyle olur

| xmix | xmax | c1  | c_tid |
|------|------|-----|-------|
| 12   | -    | 200 | (0,2) |

Açıklaması şöyle

A VACUUM operation clean tuples that have been deleted AND no longer required by any running transactions.

DELETE İşlemi

Açıklaması şöyle. PostgreSQL silinmesi istenen satırları hemen silmiyor. Bu yüzden bu satırları ara ara VACUUM ile silmek gerekiyor.

When your Java application executes a DELETE or UPDATE statement against a PostgreSQL database, a deleted record is not removed immediately nor is an existing record updated in its place. Instead, the deleted record is marked as a dead tuple and will remain in storage. The updated record is, in fact, a brand new record that PostgreSQL inserts by copying the previous version of the record and updating requested columns. The previous version of that updated record is considered deleted and, as with the DELETE operation, marked as a dead tuple.

There is a good reason why the database engine keeps old versions of the deleted and updated records in its storage. For starters, your application can run a bunch of transactions against PostgreSQL in parallel. Some of those transactions do start earlier than others. But if a transaction deletes a record that still might be of interest to a few transactions started earlier, then the record needs to be kept in the database (at least until the point in time when all earlier started transactions finish). This is how PostgreSQL implements MVCC (multi-version concurrency protocol).

It’s clear that PostgreSQL can’t and doesn’t want to keep the dead tuples forever. This is why the database has its own garbage collection process called vacuuming. There are two types of VACUUM — the plain one and the full one. The plain VACUUM works in parallel with your application workloads and doesn’t block your queries. This type of vacuuming marks the space occupied by dead tuples as free, making it available for new data that your app will add to the same table later. The plain VACUUM doesn’t return the space to the operating system so that it can be reused by other tables or 3rd party applications (except in some corner cases when a page includes only dead tuples and the page is in the end of a table).

5 Ocak 2021 Salı

Sütun Tipleri - json

Giriş

Json verisi saklamak için kullanılır. Açıklaması şöyle.

The only difference between json & jsonb is their storage:

- json is stored in its plain text format, while
- jsonb is stored in some binary representation

There are 3 major consequences of this:

- jsonb usually takes more disk space to store than json (sometimes not)
- jsonb takes more time to build from its input representation than json
- json operations take significantly more time than jsonb (& parsing also needs to be done each time you do some operation at a json typed value)

When jsonb will be available with a stable release, there will be two major use cases, when you can easily select between them:

- If you only work with the JSON representation in your application, PostgreSQL is only used to store & retrieve this representation, you should use json.
- If you do a lot of operations on the JSON value in PostgreSQL, or use indexing on some JSON field, you should use jsonb.

Açıklaması şöyle. json tipi 2012 yılında PostgreSQL 9.2 ile eklendi. Yani jsonb sütün tipinden daha önce eklendi.

In 2012, PostgreSQL 9.2 introduced the first JSON data type in Postgres. It had syntax validation but it stored the incoming document directly as text with white spaces included. It wasn't very useful for real-world querying, index-based searching, and other functionalities you would normally do with a JSON document.

Açıklaması şöyle.

...a JSON datatype (2012), and a potpourri of new features in PostgreSQL 10 (better native support for partitioning and replication, full text search support for JSON, etc.)

Örnek

Şöyle yaparız

CREATE TABLE IF NOT EXISTS posts (
   ...
   metadata json default '{}'
 );