PostgreSQL Çorbası: 2021

3 Aralık 2021 Cuma

Sütun Tipleri - uuid

Giriş

Açıklaması şöyle

PostgreSQL allows you to store and compare UUID values, but it doesn't have any built-in methods for creating them.

UUID üretmek için şu modüller kullanılabilir.

1. uuid-ossp Module'deki uuid_generate_v1(),uuid_generate_v4() kullanılabilir

2. pgcrypto Module'deki gen_random_uuid() kullanılabilir

3. UUIDv7 üretmek için pg_uuidv7 extension kullanılır

VARCHAR Sütun Tipi

UUID saklamak için bazen VARCHAR sütun tipi kullanılıyor.

Örnek

Şöyle yaparız. Burada id alanı UUID ama VARCHAR olarak saklanıyor

CREATE TABLE test.speed_uuid
(
    id       varchar(36) PRIMARY KEY,
    name    varchar(50),
    created timestamp
);

1. uuid-ossp Module

Açıklaması şöyle

If using Azure Database for PostgreSQL, this can be enabled by going to Server Parameters → azure.extensions → Choose uuid-ossp. User needs to have admin credentials to do this.

Örnek

Eğer kurulu değilse kurmak için şöyle yaparız

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

Örnek

Şöyle yaparız

CREATE TABLE books (
  id              UUID DEFAULT uuid_generate_v4 (),
  title           VARCHAR(100) NOT NULL,
  primary_author  VARCHAR(100) NULL,
  PRIMARY KEY (id)
);

Örnek

Şöyle yaparız

CREATE EXTENSION "uuid-ossp";

INSERT INTO tenant (id, name) VALUES (uuid_generate_v4(), 'Company 1');
INSERT INTO tenant (id, name) VALUES (uuid_generate_v4(), 'Company 2');

2. pgcrypto Module

Örnek

Şöyle yaparız

CREATE TABLE thingie (
  id UUID PRIMARY KEY DEFAULT public.gen_random.uuid(),
  foo VARCHAR,
  bar VARCHAR,
);

3. UUID7

Açıklaması şöyle

Similar to UUID v4, UUID v7 is a 128-bit identifier represented as a 32-character sequence of letters and numbers, formatted as 8–4–4–4–12. The distinctive feature of UUID v7 lies in its nature as a time-ordered UUID, encoding a Unix timestamp with millisecond precision in the most significant 48 bits. In alignment with UUID formats, 4 bits designate the UUID version, and 2 bits denote the variant. The remaining 74 bits are generated randomly, contributing to the uniqueness of this identifier.

Şeklen şöyle

Örnek

Şöyle yaparız

CREATE EXTENSION IF NOT EXISTS pg_uuidv7;
CREATE TABLE examples (
  example_id UUID PRIMARY KEY DEFAULT uuid_generate_v7(),
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
  deleted_at TIMESTAMP WITH TIME ZONE,
  created_by UUID NOT NULL,
  updated_by UUID NOT NULL,
  FOREIGN KEY (created_by) REFERENCES users(user_id),
  FOREIGN KEY (updated_by) REFERENCES users(user_id)
);

1 Aralık 2021 Çarşamba

Analytic Functions / Window Functions

Window Function Nedir?

Açıklaması şöyle

Similar to an aggregate function, a window function operates on a set of rows. However, it does not reduce the number of rows returned by the query.

The term window describes the set of rows on which the window function operates. A window function returns values from the rows in a window.

Söz dizimi şöyle. Yani aslında bir şeyleri grupluyor ve gruptaki her satır için bir şey hesaplıyor

window_function(arg1, arg2,..) OVER (
   [PARTITION BY partition_expression]
   [ORDER BY sort_expression [ASC | DESC] [NULLS {FIRST | LAST }])

Eğer PARTITION BY kullanılmazsa tek grup varmış gibi davranır. Gruptaki satırlar ORDER BY ile sıralanabilir.

Window Function Nedir vs SubSelect/Subquery

Açıklaması şöyle

In a standards-compliant RDBMS, you can eliminate some of your subqueries and your in clause by using a windowing expression:

Örnek

Elimizde şöyle bir kod olsun. Harcama toplamı belli bir miktarı geçen müşterileri bulmak isteyelim. Burada GROUP BY + HAVING ile harcama tablosundaki müşteriler seçiliyor. Daha sonra JOIN ile müşteri tablosu ile birleştiriliyor.

SELECT exp.employee_id, emp.name, exp.amount
FROM Employee emp
JOIN Expenditure exp ON exp.employee_id = emp.id
WHERE exp.employee_id IN (
    SELECT employee_id
    FROM Expenditure
    GROUP BY employee_id
    HAVING SUM(amount) >= 100::money
)
ORDER BY exp.amount DESC;

Window Function' ile şöyle yaparız

SELECT exp.employee_id, emp.name, exp.amount
FROM Employee emp
JOIN (
    SELECT
        employee_id, amount,
        SUM(amount) OVER (PARTITION BY employee_id) AS total
    FROM Expenditure
) exp ON exp.employee_id = emp.id
WHERE exp.total >= 100::money
ORDER BY exp.amount DESC;

frame_clause Nedir?

Sorguda birden fazla Window Function kullanabilmeyi sağlar.

Örnek

Şöyle yaparız

SELECT
    wf1() OVER(PARTITION BY c1 ORDER BY c2),
    wf2() OVER(PARTITION BY c1 ORDER BY c2)
FROM table_name;

Eğer sürekli PARTITION BY yazmak istemiyorsak bunu kısaltmak için şöyle yaparız. Burada WINDOW kullanılıyor

SELECT 
   wf1() OVER w,
   wf2() OVER w,
FROM table_name
WINDOW w AS (PARTITION BY c1 ORDER BY c2);

Window Function Listesi

Bir açıklamaya göre Window Function Listesi şöyle. Bence bu listeye AVG ve diğer bazı şeyler de eklenmeli

Name        Description
CUME_DIST        Return the relative rank of the current row.
DENSE_RANK Rank the current row within its partition without gaps.
FIRST_VALUE Return a value evaluated against the first row within its partition.
LAG        Return a value evaluated at the row that is at a specified physical offset                                           row before the current row within the partition.
LAST_VALUE        Return a value evaluated against the last row within its partition.
LEAD        Return a value evaluated at the row that is offset rows after the current row                                        within the partition.
NTILE        Divide rows in a partition as equally as possible and assign each row an integer                                starting from 1 to the argument value.
NTH_VALUE        Return a value evaluated against the nth row in an ordered partition.
PERCENT_RANK Return the relative rank of the current row (rank-1) / (total rows – 1)
RANK        Rank the current row within its partition with gaps.
ROW_NUMBER Number the current row within its partition starting from 1.

Başka bir açıklamaya göre Basic Window Functions şöyle. Neyse kimin ne dediği önemli değil

Aggregate Functions : SUM, COUNT, MIN, MAX, AVG
Ranking Functions : RANK, DENSE_RANK, ROW_NUMBER, NTILE
Value Functions : LAG, LEAD, FIRST_VALUE, LAST_VALUE

FIRST_VALUE

Elimizde şöyle bir tablo olsun. Her sütundaki son null olmayan değeri isteyelim. Bu tablodaki sonuç (A1,B2,C1) olmalı

version col_A   col_B   col_C
1       A1      B1     (null)
2       A2      B3     (null)
3       A3      B2     (null)
4       A5     (null)   C1
5       A1     (null)  (null)

Şöyle yaparız. Burada frame_clause kullanılıyor. Sorguyu kısaltmak için WINDOW da kullanılabilirdi. ORDER BY ile NULL değer 0 ile değiştiriliyor ve azalan şekilde sıralanıyor. Böylece FIRST_VALUE en son sütun değerini buluyor

SELECT DISTINCT
FIRST_VALUE(col_A) OVER (ORDER BY CASE WHEN col_A IS NULL THEN 0 ELSE version END DESC)
  AS LastA,
FIRST_VALUE(col_B) OVER (ORDER BY CASE WHEN col_B IS NULL THEN 0 ELSE version END DESC)
  AS LastB,
FIRST_VALUE(col_C) OVER (ORDER BY CASE WHEN col_C IS NULL THEN 0 ELSE version END DESC)
  AS LastC
FROM tab

LAST_VALUE

Açıklaması şöyle

Q : Write a SQL query to find the total number of people who are inside the hospital.

A : the last_value window function is used to find the last value of a particular partition.

So here we can use this to find the last action of each user.

And if the last action is 'in', then the user is inside the hospital.

Şöyle yaparız

SELECT cte AS (
  SELECT*,LAST_VALUE(action) OVER(PARTITION BY emp_id) = 'in' AS 'in_action'
  FROM hospital
) 
SELECT DISTINCT emp_id FROM cte WHERE
  in_action = 1

28 Ekim 2021 Perşembe

SET

max_parallel_workers_per_gather

Örnek - Disable parallel query execution for the current session

Şöyle yaparız

SET max_parallel_workers_per_gather = 0;

Örnek

Şöyle yaparız

-- to enable parallel query execution
SET max_parallel_workers_per_gather = 4;

-- to calculate total sales amount for each product category using parallelism
SELECT category, SUM(amount) AS total_sales
FROM sales
GROUP BY category;

Örnek

Şöyle yaparız

INSERT INTO sample_data SELECT generate_series(200000, 4000000);
SET max_parallel_workers_per_gather = 4;

SELECT COUNT(*) FROM sample_data;

EXPLAIN ANALYZE - Cümleyi Çalıştırır

Giriş

SQL cümlesi için kullanılan "Query Plan" bilgisini gösterir. Açıklaması şöyle

There are several node types in PostgreSQL query plans,
Scans.
Joins.
Sort.
Aggregates etc.,

Explain vs Explain Analyze

Explain için söz dizimi şöyle

EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM your_table WHERE condition;

Açıklaması şöyle. Yani EXPLAIN ANALYZE cümleyi çalıştırır

You can use the EXPLAIN and EXPLAIN ANALYZE commands to view the query plan and see if it makes sense (i.e. by running EXPLAIN ANALYZE <query_statement>).

Note that EXPLAIN is a read-only DB operation but EXPLAIN ANALYZE isn’t! Be very careful not to use EXPLAIN ANALYZE on an UPDATE or DELETE statement, as it will actually execute the query and modify the database.

Scan Çeşitleri

Bazı çeşitler şöyle

1. Sequential Scans

2. Index Scans

3. Index Only Scans

4. Bitmap Heap Scan & Bitmap Index Scan

5. Parallel Scans

1. Sequential Scans

Sequential Scans yazısına taşıdım

2. Index Scans

Eğer tabloda bir sütun için indeks varsa kullanılır. Index genellikle where koşulundaki sütuna konulur

Where koşulu Sırası

Açıklaması şöyle. Yani (tenantId, status) şeklinde index olsa bile ikinci sırada belirtilen where koşulu bunu kullanmayabiliyor. (status, tenantId) şeklinde yeni bir index yaratılabilir ama bence en kolayı sorguyu değiştirmek

Two of the queries had the following WHERE clause:
WHERE "tenantId" = 'some_tenant_id' AND "status" != 'dismissed'.
The other two had the conditions in the opposite order:
WHERE "status" != 'dismissed' AND "tenantId" = 'some_tenant_id'.

Postgres may choose different execution plans depending on the order of the conditions in the WHERE clause, so it treated these as distinct queries. As mentioned earlier, I added a multi-column index on the (tenantId, status) columns in the Notes table, but this only improved the performance of the first two queries. The other two queries, which had the opposite order of conditions in the WHERE clause, were unable to benefit from this index.

Örnek

Şöyle yaparız. Burada id alanı için indeks yaratılıyor

CREATE INDEX id_idx ON fake_data USING BTREE(id);

Sorgu için şöyle yaparız

EXPLAIN ANALYZE SELECT * FROM fake_data WHERE 
 fake_data.id = 1000;

Örnek

Şöyle yaparız

SELECT * FROM songs WHERE (artists_id = 1 AND album_id = 1);

Çıktısı şöyle

Index Scan using songs_artists_id_album_id_index on songs  
(cost=0.28..6.05 rows=1 width=159) (actual time=5.555..5.562 rows=10 loops=1)
   Index Cond: ((artists_id = 1) AND (album_id = 1))
 Planning Time: 311.482 ms
 Execution Time: 9.266 ms
(4 rows)

Açıklaması şöylee

(cost=0.28..6.05 rows=1 width=159) refers to the planner's estimations while (actual time=5.555..5.562 rows=10 loops=1) refers to the actual results of the executing the plan. The planner estimated 1 row would be returned, but there were actually 10.

The planner calculated its row estimate by first taking the total number of Songs (1000), then considering the artists_id filter. 10% of Songs have artists_id = 1 so that leaves 100 Songs. Next it considers the album_id filter. 1% of Songs have album_id = 1, so it's left with 1 Song.

The key piece of information Postgres is missing is that artist_id and album_id are strongly correlated. In fact, knowing the album_iduniquely determines the artist_id. Had Postgres known about this, it could have used only the album_id = 1 filter in its estimation and come up with the correct result of 10 Songs.

3. Index Only Scans

Index Only Scans yazısına taşıdım

4. Bitmap Index Scan ve Bitmap Heap Scan

Bitmap Index Scan ve Bitmap Heap Scan yazısına taşıdım

12 Ekim 2021 Salı

Generalized Inverted Index - GIN Index - Full Text Search İçindir

Giriş

GIN bir Full Text Search için kullanılır. Açıklaması şöyle

The GIN (Generalized Inverted Index) index is a specialized index type in PostgreSQL designed for efficient full-text search. When a GIN index is created on a `ts_vector` column, it builds an index structure that maps each lexeme to the documents that contain it. This index is optimized for quick lookups, making it ideal for full-text search operations.

Açıklaması şöyle

The GIN index enables PostgreSQL to perform queries involving the `@@` operator, which checks if a `ts_vector` column matches a `tsquery` (a query expressed in the same tokenized format). The GIN index speeds up the search by efficiently locating the relevant documents based on the lexemes present in the search query, without the need for a sequential scan of all documents.

TSVECTOR Sütun Tipine Göre Index

TSVECTOR Sütun Tipine Göre Index yazısına taşıdım

TEXT Sütun Tipine Göre Index

Örnek

Şöyle yaparız

CREATE INDEX trgm_idx on storm_events USING gin(event_type gin_trgm_ops);

JSONB Sütun Tipine Göre Index

JSONB Sütun Tipine Göre Index yazısına taşıdım

29 Eylül 2021 Çarşamba

LEAD ve LAG

Giriş

Basic Window Functions şöyle

Aggregate Functions : SUM, COUNT, MIN, MAX, AVG
Ranking Functions : RANK, DENSE_RANK, ROW_NUMBER, NTILE
Value Functions : LAG, LEAD, FIRST_VALUE, LAST_VALUE

Açıklaması şöyle. Bir önceki veya sonraki satırın değerine erişebilme imkanı sağlar. Eğer böyle bir satır yoksa NULL döner.

The LEAD() function returns the next value of the current value.
The LAG() function returns the previous value of the current value.

Açıklaması şöyle

NOTE: LEAD() / LAG() function never works without ORDER BY clause.

Örnek

Şöyle yaparız

SELECT name, city, LEAD(amount) OVER(PARTITION BY city ORDER BY amount ASC)
AS next_amount_value FROM Orders

SELECT name, city, LAG(amount) OVER(PARTITION BY city ORDER BY amount ASC)
AS next_amount_value FROM Orders

DENSE_RANK + OVER - Gruplar ve Her Gruba Sırayla Numara Verir

Giriş

DENSE_RANK() : Satırlar gruplanır. Bir gruptaki satırlara aynı sayıyı verir. Grup değişince bir sonraki sayıdan devam eder. Verilen sayılar arasında boşluk olmaz.

RANK(): Satırlar gruplanır. Bir gruptaki satırlara aynı sayıyı verir. Grup değişince bir sonraki sayıdan devam etmez .

ORDER_BY(): Satırları gruplamaz. Sadece sayı verir.

Şeklen şöyle

Bir başka şekil şöyle

İki kullanım var

1. DENSE_RANK + OVER (PARTITION BY ...) :

2. DENSE_RANK + OVER (ORDER BY ...)

1. DENSE_RANK + OVER (PARTITION BY ...)

PARTITION BY + ORDER BY grubun ilk n sayıdaki elemanını çekmek için kullanılır.

Örnek

Şöyle yaparız

SELECT * FROM (
  SELECT ad,tarih ,DENSE_RANK() OVER(PARTITION BY ad ORDER BY tarih DESC) sira FROM
  kullanici)
WHERE sira=1

Örnek
Elimizde şöyle bir tablo olsun

qid  segment task    before  after   diff
------------------------------------------
1       0    aaa       5     5       0
1       0    aaa       5     5       0
1       1    aaa       15    15      0
1       1    aaa       10    5      10
1       0    aaa       10    5       5
3       0    bbb       10    4       6

Bu tablodaki aynı task grubu için en büyük qid değerini ve ona ait segment değerini çekmek isteyelim. Yani çıktı şöyle olsun

task    before  after   diff
-----------------------------
aaa       25    20      10
bbb       10    4       6

Şöyle yaparız. task alanına göre gruplayıp, qid ve segment büyükten küçüğe sıralanır. Bu satıra rnk olarak 1 verilir. Daha sonra rnk = 1 olan satırlar seçilir.

with cte as
(
    select
        *,
        dense_rank() over (partition by task order by qid desc, segment desc) as rnk
    from myTable
)

select
    task,
    sum(before) as before,
    sum(after)  as after,
    sum(diff) as  diff
from cte
where rnk = 1
group by
    task

2. DENSE_RANK + OVER (ORDER BY ...)

DENSE_RANK + OVER (ORDER BY ...) yazısına taşıdım