Wednesday, December 27, 2023

ROW LEVEL SECURITY - CREATE POLICY or DROP POLICY

Introduction
The explanation is as follows
Row Level Security (RLS) lets you define access control policies at the row level within tables. This means that you can specify which users can SELECT, INSERT, UPDATE, or DELETE rows based on the actual data within the row.
Example
Suppose we have a table like this
CREATE TABLE client (
    client_id INT PRIMARY KEY,
    client_name VARCHAR(255) NOT NULL,
    email VARCHAR(255),
    registration_date DATE DEFAULT CURRENT_DATE
);
To enable ROW LEVEL SECURITY we do this
ALTER TABLE client ENABLE ROW LEVEL SECURITY;
To drop a previously created policy we do this
DROP POLICY IF EXISTS client_isolation_policy ON client;
Then we do this
CREATE POLICY client_isolation_policy ON client
  USING (current_user = CONCAT('appuser_', client_id::text));
If 'appuser_' plus the client_id column value equals current_user, that row is shown to the user.

Let's create a new user
CREATE USER appuser_1001 WITH ENCRYPTED PASSWORD 'password';

GRANT CONNECT ON DATABASE sample_db TO appuser_1001;

GRANT USAGE ON SCHEMA public TO appuser_1001;

GRANT SELECT ON ALL TABLES IN SCHEMA public TO appuser_1001;

GRANT INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO appuser_1001;
A superuser such as postgres is not affected and can see everything (table owners also bypass RLS unless FORCE ROW LEVEL SECURITY is set on the table).
select * from client c ;

client_id|client_name|email          |registration_date|
---------+-----------+---------------+-----------------+
     1001|Client 1001|abc@example.com|       2023-10-17|
     1002|Client 1002|abc@example.com|       2023-10-17|
The appuser_1001 user can only see the rows whose client_id is 1001.
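To test this from a superuser session without reconnecting, we can switch to the new role first; current_user then becomes appuser_1001 and the policy applies:
SET ROLE appuser_1001;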
select * from client c ;

client_id|client_name|email          |registration_date|
---------+-----------+---------------+-----------------+
     1001|Client 1001|abc@example.com|       2023-10-17|

Column Types - ULID - Universally Unique Lexicographically Sortable Identifier

I first saw this type in the article Performance of ULID and UUID in Postgres Database. The explanation is as follows
ULIDs contain a 48-bit timestamp followed by 80 bits of random data. The timestamp allows ULIDs to be sorted, which works much better with the b-tree data structure databases use for indexing.
UUIDs cause a performance loss because their randomness scatters writes across the b-tree index. The other options are:
ULIDs
Snowflake
Composite keys

Monday, December 25, 2023

Checkpointing

Introduction
Checkpointing, in short, means actually applying the changes recorded in the WAL to the data page files.

The explanation is as follows
The question here is: can I control how data pages are flushed to disk? And the answer is yes, this is called checkpointing, where a new WAL record called a checkpoint record is written after the entire set of data pages in memory is flushed to disk. This is a very delicate process; too much checkpointing sure makes startup faster, but it does take a toll on your I/O, as checkpointing is an I/O-heavy operation and, as we mentioned before, data pages are very large.
The explanation is as follows. Pages are written to disk and that moment is recorded
Technically speaking we are consistent in memory, because we write to the pages as we write the WAL, and because we always read and write from and to memory first we get the latest and greatest. So data file and index pages are up to date in memory but they are out of date on disk.

So in the happy path, I issue a commit, my WAL is flushed successfully to disk, and the database returns OK to the client (me) who issued the commit. So the WAL is updated. Then moments later the database starts flushing my full pages to disk, and now both the data files and the WAL are in sync on disk. We call this a checkpoint, and the process that creates it is checkpointing. The database will actually create a checkpoint record that says: I hereby declare that at this moment in time, everything written to the WAL is completely in sync with what we have in the data files on disk.
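A checkpoint can also be triggered manually, and the checkpoint-related settings can be inspected through pg_settings; a small sketch:
CHECKPOINT;  -- forces an immediate checkpoint (superuser, or the pg_checkpoint role on newer versions)

SELECT name, setting, short_desc
FROM pg_settings
WHERE name LIKE '%checkpoint%';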
Undo logs
The explanation is as follows. If the database crashes midway through checkpointing, some of the dirty pages in memory will have been written to disk and some will not.
Ok. Here is another edge case for you: I flush my WAL and commit, a checkpointer process kicks in and starts flushing dirty pages to disk (dirty pages are pages in memory that have been written to), and midway through checkpointing the database crashes!

You now have half of the data pages updated and half of them out of date with respect to the WAL. The WAL only has the changes; it has no clue what the original rows looked like. So for that we need another log that records the original state of the row before we made the change, and that is the undo log.

Now with the undo log, the database first checks for transactions that started but didn't commit and rolls those back; any changes made by those transactions must be discarded from the WAL. Any changes that made it to the data files MUST be undone so we go back to a consistent state where all rows in the data pages are trusted and committed. And finally we redo the WAL changes on those original pages to bring them back in sync with the WAL.

The undo logs are also used by transactions that need MVCC visibility to the old version of the row. This means a long-running transaction that makes massive changes may slow down repeatable read/serializable isolation level read transactions, as those have to go to the undo log and restore the original row before use.

Postgres's clever architecture can get away without undo logs entirely; I explain it here.

Thursday, December 14, 2023

FOR LOOP

Example - Batch Deletion
We do it like this
BEGIN; -- BEGIN A TRANSACTION

DO $$
DECLARE
  batch RECORD;
BEGIN
  -- FOR .. LOOP is PL/pgSQL, so it must run inside a DO block (or a function)
  FOR batch IN
    SELECT id FROM large_table
    ORDER BY id ASC
    LIMIT 10000 -- 10k records in a batch
  LOOP
    DELETE FROM large_table
    WHERE id = batch.id;
  END LOOP;
END $$;

COMMIT;  -- COMMITTING THE TRANSACTION
The explanation is as follows
With batch deletion, we’re gently removing a bunch of records in each go. This not only keeps our database humming but also reduces the risk of causing chaos. ALWAYS RUN THIS IN A TRANSACTION AND COMMIT IT TOGETHER
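As a side note, the same batch can be deleted without a loop using a single set-based statement; a sketch under the same table assumptions:
DELETE FROM large_table
WHERE id IN (
  SELECT id FROM large_table
  ORDER BY id ASC
  LIMIT 10000
);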

Thursday, December 7, 2023

pg_partman

Introduction
The explanation is as follows
pg_partman is an extension for PostgreSQL that provides automatic partitioning of tables. 
...
pg_partman simplifies the process of managing partitioned tables by automating the creation and maintenance of partitions. It includes features to handle both time-based and serial-based partitioning. This extension is particularly useful for tables that grow over time and need to be partitioned based on date or other criteria.
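A minimal sketch of how it is used, assuming pg_partman 4.x and a hypothetical events table partitioned natively by day:
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman SCHEMA partman;

CREATE TABLE public.events (
  id bigint,
  created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- pre-creates child partitions and registers the table for ongoing maintenance
SELECT partman.create_parent(
  p_parent_table := 'public.events',
  p_control := 'created_at',
  p_type := 'native',
  p_interval := 'daily'
);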


Parallel Sequential Scan

Introduction
It is like a Sequential Scan, except that it is performed by multiple workers in parallel

Example
We do it like this. Since the table has no index, a full table scan is performed
CREATE TABLE t1 (
    a   int,
    b   int
);
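The plan below assumes the table holds about a million rows; a sketch that loads matching data (the distribution is an assumption):
INSERT INTO t1 SELECT i % 10000, i FROM generate_series(1, 1000000) i;
ANALYZE t1;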
test=# explain analyze select count(*) from t1 where a=9999;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10633.65..10633.66 rows=1 width=8) (actual time=43.930..47.172 rows=1 loops=1)
   ->  Gather  (cost=10633.44..10633.65 rows=2 width=8) (actual time=43.766..47.164 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=9633.44..9633.45 rows=1 width=8) (actual time=34.394..34.395 rows=1 loops=3)
               ->  Parallel Seq Scan on t1  (cost=0.00..9633.33 rows=42 width=0) (actual time=34.373..34.385 rows=33 loops=3)
                     Filter: (a = 9999)
                     Rows Removed by Filter: 333300
 Planning Time: 0.313 ms
 Execution Time: 47.250 ms
(10 rows)

Pghero

Introduction
The explanation is as follows
Pghero is a performance dashboard and monitoring tool for PostgreSQL. It provides an easy-to-use web interface to track database metrics, query performance, and index usage. Pghero helps you identify slow queries, analyze database trends, and optimize your PostgreSQL database for better performance.

Autovacuum Daemon

Introduction
The explanation is as follows
Postgres automates the VACUUM processing using the autovacuum daemon. By default, it runs every 1 minute. When autovacuum wakes up, it invokes three workers. These workers do the VACUUM processing on the target tables.

You can query pg_settings to check various configurations for the autovacuum process in Postgres:
We do it like this
SELECT
  name,
  setting,
  category,
  short_desc
FROM pg_settings
WHERE name LIKE '%autovacuum%';
The autovacuum_vacuum_scale_factor Field
The explanation is as follows
The default value is 0.2: autovacuum starts once 20% of a table's rows have become dead due to MVCC. For tables with heavier write traffic, autovacuum_vacuum_scale_factor can be set to a lower value.
Example
We do it like this
ALTER TABLE SampleTable2 SET
  (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 100);
The explanation is as follows
 ... we set up autovacuum for SampleTable2 to run once it has more than 100 DEAD tuples.
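To see which tables are actually accumulating dead tuples and when autovacuum last ran on them, we can query pg_stat_user_tables:
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;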

Monday, December 4, 2023

Soft Delete

Introduction
The explanation is as follows
When using the soft delete mechanism on the database, you might run into a situation where a record with a unique constraint was deleted and you want to reuse the unique value.
Example
Suppose we have a table like this
CREATE TABLE users (
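  -- note: uuid_generate_v7() is not built into Postgres; it assumes a UUIDv7 extension such as pg_uuidv7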
  user_id UUID PRIMARY KEY DEFAULT uuid_generate_v7(),
  email TEXT NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
  deleted_at TIMESTAMP WITH TIME ZONE,
  created_by UUID NOT NULL,
  updated_by UUID NOT NULL,
  FOREIGN KEY (created_by) REFERENCES users(user_id),
  FOREIGN KEY (updated_by) REFERENCES users(user_id),
  is_system SMALLINT DEFAULT 0 NOT NULL CHECK (is_system BETWEEN 0 AND 1)
);
The explanation is as follows
Whether a row should be created that has the same natural key as a ‘deleted’ row depends on the table. If this is desirable, unique indexes should be written with a WHERE clause, for example:
We do it like this. That way, once a row is soft-deleted, another user with the same email value can be created
CREATE UNIQUE INDEX users_email_unique
  ON users(email, deleted_at)
  WHERE deleted_at IS NULL;
Or we do it like this. Here the email field is treated as case insensitive, so a case insensitive collation is used
CREATE UNIQUE INDEX users_email_unique
  ON users(email COLLATE case_insensitive)
  WHERE deleted_at IS NULL;
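For the second index to work, the case_insensitive collation has to be created first; the usual ICU definition from the PostgreSQL documentation is:
CREATE COLLATION case_insensitive (
  provider = icu,
  locale = 'und-u-ks-level2',
  deterministic = false
);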


Sunday, December 3, 2023

CREATE TEMPORARY TABLE

Introduction
A TEMPORARY TABLE can be emptied in two ways
1. DELETE (all rows)
2. TRUNCATE

DELETE is not a good option anyway.
The explanation for TRUNCATE is as follows
TRUNCATE simply creates a new file on disk and does an UPDATE of the pg_class table.
So
1. It causes the pg_class system table to bloat
2. It requires disk I/O

Solution 1
The explanation is as follows
In Linux, there is one approach that can significantly reduce disk usage of temporary tables. It is to allocate a separate RAM disk for temporary tables.
The temp_tablespaces option is used for this
Example
We do it like this
mkdir /mnt/dbtemp
chmod 700 /mnt/dbtemp
chown postgres:postgres /mnt/dbtemp
Then we do this in SQL. A tablespace that uses the specified directory is created; under /mnt/dbtemp a new subdirectory such as /mnt/dbtemp/PG_13_202007201/ is created
CREATE TABLESPACE temp LOCATION '/mnt/dbtemp';
Then we do this. A new directory such as /mnt/dbtemp/PG_13_202007201/936082 is created for the table
CREATE TEMPORARY TABLE mytemptable (key0 int8) TABLESPACE temp;
We add this to the /etc/fstab file; on the next restart the directory is mounted automatically.
tmpfs /mnt/dbtemp/PG_13_202007201/936082 tmpfs \ 
  rw,nodev,nosuid,noatime,nodiratime,size=1G 0 0
But rather than wait for the next restart, we mount the directory onto RAM right away
mount /mnt/dbtemp/PG_13_202007201/936082
Then we set this in the postgresql.conf file
temp_tablespaces = 'temp'
and to apply it we run
SELECT pg_reload_conf();
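To verify that the setting took effect we do this
SHOW temp_tablespaces;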