Monday, January 15, 2024

Generalized Inverted Index - GIN Index - An Index for the JSONB Column Type and for Full Text Search

Introduction
Two kinds of GIN index can be created for the JSONB column type. The explanation is as follows:
A set of GIN-specific operators that can be used in a query depends on the operator class specified at index creation. For JSONB, supported classes are:

jsonb_ops — the default operator class, which provides two categories of operators: checking the existence of keys and checking the existence of values based on JSONPath expressions or key-value pairs.
jsonb_path_ops — provides only the latter category and offers better performance for these operators.
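For example, the default jsonb_ops class also indexes the key-existence operators (?, ?|, ?&). A minimal sketch, assuming an illustrative table named docs with a jsonb column named payload:
CREATE TABLE docs (id serial PRIMARY KEY, payload jsonb);

-- jsonb_ops is the default operator class, so it does not need to be named
CREATE INDEX docs_payload_gin_idx ON docs USING GIN (payload);

-- key-existence query that can use this index (not supported by jsonb_path_ops)
SELECT * FROM docs WHERE payload ? 'name';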

The jsonb_path_ops Type
The explanation is as follows:
Create a GIN index on the JSONB column using the jsonb_path_ops operator class. This limits the operator to only @>, but that operator allows equality comparisons on any number of properties located at different levels within the JSONB document in a single WHERE criteria.
Example
We do it like this:
CREATE TABLE sample_jsonb (id serial NOT NULL PRIMARY KEY, data jsonb);

INSERT INTO sample_jsonb (data) VALUES 
('{"name":"First", "count":12, "orderDate":"2022-03-03T12:14:01", "extra":"some text"}');

INSERT INTO sample_jsonb (data) VALUES
('{"name": "Second", "count": 23, "orderDate": "2022-04-03T15:17:01"}');

CREATE INDEX sample_jsonb_path_ops_idx
ON sample_jsonb USING GIN (data jsonb_path_ops);

-- select all rows where the name property is equal to "First"
SELECT * FROM sample_jsonb WHERE data @> '{"name":"First"}'::jsonb;

-- select all rows where count is greater than 15
SELECT * FROM sample_jsonb WHERE data @@ '$.count > 15';

Example
It is needed if we are using the @> operator, that is, the containment operator. Let's create an index:
CREATE INDEX gin_data ON books_data USING GIN (data);
Let's run a query. We can see that the GIN index is used:
EXPLAIN ANALYZE SELECT * FROM books_data WHERE data @> '{"author" : "Mark Figueroa"}';
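In the plan we would expect the GIN index to show up as a bitmap scan, roughly in this shape (costs and row counts depend entirely on the data, so they are omitted):
Bitmap Heap Scan on books_data
  Recheck Cond: (data @> '{"author": "Mark Figueroa"}'::jsonb)
  ->  Bitmap Index Scan on gin_data
        Index Cond: (data @> '{"author": "Mark Figueroa"}'::jsonb)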

Wednesday, December 27, 2023

ROW LEVEL SECURITY - CREATE POLICY or DROP POLICY

Introduction
The explanation is as follows:
Row Level Security (RLS) lets you define access control policies at the row level within tables. This means that you can specify which users can SELECT, INSERT, UPDATE, or DELETE rows based on the actual data within the row.
Example
Suppose we have a table like this:
CREATE TABLE client (
    client_id INT PRIMARY KEY,
    client_name VARCHAR(255) NOT NULL,
    email VARCHAR(255),               
    registration_date DATE DEFAULT CURRENT_DATE 
);
To enable ROW LEVEL SECURITY we do the following:
ALTER TABLE client ENABLE ROW LEVEL SECURITY;
To drop an existing policy we do the following:
DROP POLICY IF EXISTS client_isolation_policy ON client;
Then we create the policy:
CREATE POLICY client_isolation_policy ON client
  USING (current_user = CONCAT('appuser_', client_id::text));
If 'appuser_' plus the client_id column value matches current_user, that row is shown to the user.

Let's create a new user:
CREATE USER appuser_1001 WITH ENCRYPTED PASSWORD 'password';

GRANT CONNECT ON DATABASE sample_db TO appuser_1001;

GRANT USAGE ON SCHEMA public TO appuser_1001;

GRANT SELECT ON ALL TABLES IN SCHEMA public TO appuser_1001;

GRANT INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO appuser_1001;
The root or postgres user is not affected; it can see everything:
select * from client c ;

client_id|client_name|email          |registration_date|
---------+-----------+---------------+-----------------+
     1001|Client 1001|abc@example.com|       2023-10-17|
     1002|Client 1002|abc@example.com|       2023-10-17|
The appuser_1001 user can only see the rows whose client_id is 1001:
select * from client c ;

client_id|client_name|email          |registration_date|
---------+-----------+---------------+-----------------+
     1001|Client 1001|abc@example.com|       2023-10-17|
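By default, superusers and the table owner bypass the policies; the owner can be included with FORCE ROW LEVEL SECURITY, and writes can be restricted with a WITH CHECK clause. A minimal sketch building on the client table above (the policy name is made up):
ALTER TABLE client FORCE ROW LEVEL SECURITY;  -- apply policies to the table owner too

-- only allow a user to insert rows whose client_id matches its own suffix
CREATE POLICY client_insert_policy ON client
  FOR INSERT
  WITH CHECK (current_user = CONCAT('appuser_', client_id::text));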

Column Types - ULID - Universally Unique Lexicographically Sortable Identifier

I first saw this type in the article Performance of ULID and UUID in Postgres Database. The explanation is as follows:
ULIDs contain a 48-bit timestamp followed by 80 bits of random data. The timestamp allows ULIDs to be sorted, which works much better with the b-tree data structure databases use for indexing.
UUID causes a performance loss. The other options are as follows:
ULIDs
Snowflake
Composite keys
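PostgreSQL has no built-in ULID type; ULIDs are usually stored as 26-character text or bytea and generated by the application or an extension. A rough sketch, where gen_ulid() stands in for whatever generator is available (the function name is hypothetical):
CREATE TABLE orders_ulid (
    id   text PRIMARY KEY DEFAULT gen_ulid(),  -- hypothetical generator function
    body jsonb
);

-- the first 48 bits encode a millisecond timestamp, so lexicographic order
-- roughly follows insertion order and new keys land at the right edge of the b-tree
SELECT id FROM orders_ulid ORDER BY id DESC LIMIT 10;  -- newest rows first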

Monday, December 25, 2023

Checkpointing

Introduction
In short, checkpointing means that the changes recorded in the WAL file are actually applied to the data page files.

The explanation is as follows:
The question here is: can I control how data pages are flushed to disk? And the answer is yes, this is called checkpointing, where a new WAL record called a checkpoint record is written after the entire set of data pages in memory is flushed to disk. This is a very delicate process; too much checkpointing sure makes startup faster, but it takes a toll on your I/O, as checkpointing is an I/O-heavy operation and, as we mentioned before, data pages are very large.
The explanation is as follows. Pages are written to disk and that moment is recorded:
Technically speaking we are consistent in memory because we do write to pages as we write the WAL and because we always read and write from and to memory first we get the latest and greatest. So data files and indexes pages are up to date in memory but they are out of date on disk.

So in a happy path, I issue a commit and my WAL is flushed successfully to disk and the database returns OK to the client (me) who issued the commit. So the WAL is updated. Then moments later the database start flushing my full pages to disk and now both data files and WAL are in sync on disk. We call this a checkpoint and the process that creates this checkpoint is checkpointing. The database will actually create a checkpoint record that says, I hereby declare at this moment in time, everything written to WAL on this moment is completely in SYNC with what we have on data files on disk.
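A few commands to observe and tune this behaviour (the pg_stat_bgwriter columns below exist up to PostgreSQL 16; later versions move the checkpoint counters to pg_stat_checkpointer):
-- force a checkpoint immediately (superuser or the pg_checkpoint role required)
CHECKPOINT;

-- how many checkpoints happened and how many buffers they flushed
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint FROM pg_stat_bgwriter;

-- settings that control how often checkpoints run and how their I/O is spread out
SHOW checkpoint_timeout;
SHOW max_wal_size;
SHOW checkpoint_completion_target;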
Undo logs
The explanation is as follows. If the database crashes in the middle of checkpointing, some of the dirty pages in memory will have been written to disk and some will not:
Ok. Here is another edge case for you: I flush my WAL and commit, a checkpointer process kicks in and starts flushing dirty pages to disk (dirty pages are pages in memory that have been written to), and midway through checkpointing the database crashes!

You now have half of the data pages updated and half of them are outdated with the WAL. The WAL only has changes, it has no clue what the original rows looked like. So for that we need another log to record the original state of the row before we made the change and that is the undo log.

Now with the undo log, the database first checks for transactions that started but didn’t commit and roll those back, any changes made by those transactions must be discarded from the WAL. Any changes that made it to the data files MUST be undone so we go back to consistent state where all rows in the data pages are trusted and committed. And finally we redo the WAL changes on those original pages to bring the pages back in sync with the WAL.

The undo logs are also used by transactions that need MVCC visibility to the old version of the row. This means having a long running transaction that does massive changes may slow down repeatable read/serializable isolation level read transactions as those have to go to the undo log and restore the original row before use.

Postgres' clever architecture can completely get away without undo logs; I explain it here.

Thursday, December 14, 2023

FOR LOOP

Example - Batch Deletion
We do it like this:
BEGIN; -- begin a transaction

DO $$
DECLARE
  batch RECORD;
BEGIN
  -- FOR ... LOOP is PL/pgSQL, so it has to run inside a DO block or function
  FOR batch IN
    SELECT id FROM large_table
    ORDER BY id ASC
    LIMIT 10000 -- 10k records in a batch
  LOOP
    DELETE FROM large_table
    WHERE id = batch.id;
  END LOOP;
END $$;

COMMIT; -- commit the transaction
The explanation is as follows:
With batch deletion, we’re gently removing a bunch of records in each go. This not only keeps our database humming but also reduces the risk of causing chaos. ALWAYS RUN THIS IN A TRANSACTION AND COMMIT IT TOGETHER
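If more than one batch has to go, a common variant repeats until nothing matches. A sketch, assuming PostgreSQL 11+ (COMMIT inside a DO block only works when it is not wrapped in an outer transaction) and a hypothetical created_at column as the purge condition:
DO $$
DECLARE
  deleted int;
BEGIN
  LOOP
    DELETE FROM large_table
    WHERE id IN (
      SELECT id FROM large_table
      WHERE created_at < now() - interval '1 year'  -- hypothetical purge condition
      ORDER BY id
      LIMIT 10000
    );
    GET DIAGNOSTICS deleted = ROW_COUNT;
    EXIT WHEN deleted = 0;  -- nothing left to delete
    COMMIT;                 -- keep each batch in its own transaction
  END LOOP;
END $$;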

Thursday, December 7, 2023

pg_partman

Introduction
The explanation is as follows:
pg_partman is an extension for PostgreSQL that provides automatic partitioning of tables. 
...
pg_partman simplifies the process of managing partitioned tables by automating the creation and maintenance of partitions. It includes features to handle both time-based and serial-based partitioning. This extension is particularly useful for tables that grow over time and need to be partitioned based on date or other criteria.
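A rough setup sketch for time-based partitioning, assuming the extension is installed into a partman schema; the create_parent() argument list differs between pg_partman versions (shown here roughly as in the 4.x series), so the exact call should be checked against the installed version:
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman SCHEMA partman;

-- the parent table uses native (declarative) partitioning
CREATE TABLE public.orders_part (
    id         bigserial,
    order_date timestamptz NOT NULL,
    amount     numeric
) PARTITION BY RANGE (order_date);

-- let pg_partman create and maintain daily partitions going forward
SELECT partman.create_parent(
    p_parent_table := 'public.orders_part',
    p_control      := 'order_date',
    p_type         := 'native',
    p_interval     := 'daily'
);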


Parallel Sequential Scan

Introduction
It is like a Sequential Scan, except that it is performed by more than one worker.

Example
We do it like this. Since the table has no index, a full table scan is performed:
CREATE TABLE t1 (
    a   int,
    b   int
);
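-- (assumption: the table was populated with roughly one million rows beforehand,
--  e.g. something like INSERT INTO t1 SELECT i % 10000, i FROM generate_series(1, 1000000) AS i;
--  otherwise the row counts in the plan below would not show up)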
test=# explain analyze select count(*) from t1 where a=9999;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10633.65..10633.66 rows=1 width=8) (actual time=43.930..47.172 rows=1 loops=1)
   ->  Gather  (cost=10633.44..10633.65 rows=2 width=8) (actual time=43.766..47.164 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=9633.44..9633.45 rows=1 width=8) (actual time=34.394..34.395 rows=1 loops=3)
               ->  Parallel Seq Scan on t1  (cost=0.00..9633.33 rows=42 width=0) (actual time=34.373..34.385 rows=33 loops=3)
                     Filter: (a = 9999)
                     Rows Removed by Filter: 333300
 Planning Time: 0.313 ms
 Execution Time: 47.250 ms
(10 rows)
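The planner only chooses this plan when parallelism is enabled and the table is large enough; the relevant settings can be inspected like this:
-- workers the planner may use under a single Gather node (default 2)
SHOW max_parallel_workers_per_gather;

-- tables smaller than this are not considered for a parallel scan (default 8MB)
SHOW min_parallel_table_scan_size;

-- overall cap on parallel workers for the whole instance
SHOW max_parallel_workers;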