Tuesday, September 26, 2023

PG_STAT_PROGRESS_VACUUM System View - Shows Running Vacuum Operations

Introduction
The explanation is as follows:
The pg_stat_progress_vacuum view provides insights into the progress of ongoing vacuum operations. It allows you to monitor the vacuum process, track the number of dead tuples, and estimate the remaining time for completion. By using this view, you can identify tables with significant bloat, assess the effectiveness of vacuuming, and schedule maintenance activities accordingly. This feature helps maintain database health and ensures optimal query performance.
Example
We do the following:
SELECT * FROM pg_stat_progress_vacuum;
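The view shows one row per running vacuum worker. Joining it with pg_class makes the output easier to read; a minimal sketch (the join and the percentage column are additions of mine, assuming the PostgreSQL 13-16 column names):
SELECT
  p.pid,
  p.datname,
  c.relname,          -- table currently being vacuumed
  p.phase,            -- e.g. 'scanning heap', 'vacuuming indexes'
  round(100.0 * p.heap_blks_scanned / NULLIF(p.heap_blks_total, 0), 1) AS scanned_pct,
  p.num_dead_tuples,
  p.max_dead_tuples
FROM pg_stat_progress_vacuum p
LEFT JOIN pg_class c ON c.oid = p.relid;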




EXPLAIN

Introduction
The syntax for EXPLAIN is as follows:
EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM your_table WHERE condition;
EXPLAIN does not execute the statement.
EXPLAIN ANALYZE executes the statement.
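Because EXPLAIN ANALYZE really executes the statement, a common pattern for data-modifying statements is to wrap it in a transaction and roll it back. A minimal sketch, using the same placeholder names as above:
-- EXPLAIN only plans the statement, it does not run it:
EXPLAIN SELECT * FROM your_table WHERE condition;

-- EXPLAIN ANALYZE runs the statement, so wrap modifications in a transaction:
BEGIN;
EXPLAIN (ANALYZE, BUFFERS) DELETE FROM your_table WHERE condition;
ROLLBACK;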

Friday, September 15, 2023

TOAST - The Oversized-Attribute Storage Technique

Introduction
The explanation is as follows:
If the row/tuple is too big, Postgres will compress and (maybe) move the data out of the main table. The process is called TOAST (The Oversized-Attribute Storage Technique). Queries involving large data objects stored in TOAST tables can be slower than those with smaller data objects.
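Whether a particular table actually uses TOAST can be estimated by comparing the size of the main relation with the total size, which includes the TOAST table and indexes. A minimal sketch; your_table is a placeholder:
SELECT
  pg_size_pretty(pg_relation_size('your_table'))       AS main_table_size,
  pg_size_pretty(pg_total_relation_size('your_table')) AS total_size_incl_toast_and_indexes;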
The PG_STATIO_USER_TABLES system view shows the I/O usage of the tables in the database.

Example
We do the following:
-- list of TOAST tables and indexes
SELECT *
FROM   pg_catalog.pg_class
WHERE  relname LIKE 'pg_toast%';
We do the following:
-- list of table that use TOAST
SELECT 
  schemaname,
  relname,
  toast_blks_read,  -- Number of disk blocks read from this table's TOAST table (if any)
  toast_blks_hit,   -- Number of buffer hits in this table's TOAST table (if any)
  tidx_blks_read,   -- Number of disk blocks read from this table's TOAST table indexes (if any)
  tidx_blks_hit     -- Number of buffer hits in this table's TOAST table indexes (if any)
FROM pg_catalog.pg_statio_user_tables
WHERE toast_blks_read + toast_blks_hit + tidx_blks_read + tidx_blks_hit > 0;
We do the following:
-- show storage/toast strategy
-- m = Main: Compression, but out of line storage only as a last resort.
-- p = Plain: No compression, no out of line storage. This is for data types which are not TOASTable at all.
-- x = Extended: Compression and out of line storage (the default for most TOASTable types).
-- e = External: No compression, but out of line storage.
SELECT psut.schemaname, psut.relname, pa.attname, atttypid::regtype, attstorage
FROM pg_catalog.pg_statio_user_tables psut
  JOIN pg_catalog.pg_attribute pa ON psut.relid = pa.attrelid
WHERE psut.toast_blks_read > 0;
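The storage strategy of a column can also be changed. A minimal sketch, assuming a table mytable with a large text column mycolumn; EXTERNAL keeps out of line storage but disables compression, which can speed up substring access on large values:
ALTER TABLE mytable ALTER COLUMN mycolumn SET STORAGE EXTERNAL;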

INFORMATION_SCHEMA.TABLES System View

Example
The explanation is as follows:
Sometimes we need to create a temporary table during data migration or handle the issue which is usually safe to be dropped later on. We can check whether there is a name convention for temporary tables (like tmp_* , temp_* , *_backup , etc) within the organization.
We do the following:
SELECT table_catalog, table_schema, table_name, table_type
FROM information_schema.tables 
WHERE
  table_schema NOT IN ( 'pg_catalog', 'information_schema' )
  AND ( table_name LIKE 'tmp_%' OR table_name LIKE 'temp_%' OR table_name LIKE '%_backup' );
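Once the candidates have been reviewed, the same query can be used to generate DROP statements instead of running anything blindly. A minimal sketch:
-- Generate (but do not execute) DROP statements for review
SELECT format('DROP TABLE IF EXISTS %I.%I;', table_schema, table_name) AS drop_stmt
FROM information_schema.tables
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
  AND (table_name LIKE 'tmp_%' OR table_name LIKE 'temp_%' OR table_name LIKE '%_backup');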

Friday, September 8, 2023

Sequential Scan

Introduction
Note: A Parallel Sequential Scan is the same as a Sequential Scan, except that multiple workers are used.

Also known as a Full Table Scan. If the table has no index, a Full Table Scan has to be performed. The explanation is as follows:
This is the simplest of them all; if there is no index or a lesser number of rows, then the Planner resorts to scanning all of the rows present. You would/should never encounter Sequential Scans in a typical production system since they are very slow and get slower as the data increases. There are exceptions when the table is too small where an index is pointless and Sequential Scans are fast enough that you don’t have to worry.
The explanation is as follows:
One thing to understand is that there are certain kinds of queries, such as Count, Avg and other aggregate queries, which always result in Sequential Scan because they have to scan the entirety of the table anyway for the result. 
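A minimal sketch of such a query, using the weather table from a later example; since every row has to be read, the planner will normally choose a (possibly parallel) sequential scan:
EXPLAIN SELECT count(*), avg(damage_crops) FROM weather;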
Example
We do the following. As can be seen from the Seq Scan text in the plan, a sequential scan is used:
explain analyze SELECT EXISTS (
  SELECT
    1
  FROM
    balances_snapshots
  WHERE
    total_balance = double precision 'NaN'
  LIMIT 1
) as exists;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=4.75..4.76 rows=1 width=1) (actual time=237365.680..237365.681 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Seq Scan on balances_snapshots  (cost=0.00..9257326.32 rows=1948181 width=0) (actual time=237365.675..237365.676 rows=0 loops=1)
           Filter: ((total_balance)::double precision = 'NaN'::double precision)
           Rows Removed by Filter: 389636289
 Planning Time: 23.985 ms
 Execution Time: 237365.719 ms
(7 rows)
The reason is that the total_balance column is of type numeric, but the query compares it as double precision, i.e. as a double.
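If the literal is written as numeric instead, the per-row cast disappears and an index on total_balance, assuming one exists, becomes usable. A minimal sketch:
SELECT EXISTS (
  SELECT 1
  FROM balances_snapshots
  WHERE total_balance = 'NaN'::numeric
  LIMIT 1
) AS exists;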

Example
We do the following. Here there is no index on event_type:
EXPLAIN ANALYZE SELECT * FROM weather WHERE event_type = 'Winter Storm'; 

Seq Scan on weather  (cost=0.00..115.00 rows=11 width=622) (actual time=0.100..19.800 rows=11 loops=1)
  Filter: ((event_type)::text = 'Winter Storm'::text)
  Rows Removed by Filter: 1189
Planning Time: 30.000 ms
Execution Time: 20.800 ms
(5 rows)
If we create an index, it will not do a Sequential Scan:
CREATE INDEX idx_weather_type ON weather(event_type);
If the WHERE clause had two columns, the index should also contain two columns, and the order of the columns matters. We do the following:
CREATE INDEX idx_storm_crop ON weather (event_type, damage_crops); 
SELECT * FROM weather WHERE event_type = 'Winter Storm' AND damage_crops > 0;        
If we use GROUP BY, it does a Sequential Scan again:
SELECT state FROM weather
WHERE event_type = 'Winter Storm' AND damage_crops > 0
GROUP BY state;
The explanation is as follows:
The planner will need to load all of the data to the memory and sort all of the data by state, then for each state it’ll need to filter out by our WHERE statement.
This time the index needs to be enlarged. We do the following:
CREATE INDEX idx_storm_crop ON weather (state, event_type, damage_crops);


Tuesday, September 5, 2023

Generalized Inverted Index - GIN Index - An Index on a TSVECTOR Column, Used for Full Text Search

Introduction
The table has a column of type tsvector. We create a GIN index on this column:
CREATE INDEX indexname ON mytable USING GIN(tsvector_column_name);
1. To populate the tsvector column, an UPDATE is also needed after the INSERT

2. To query this column, plainto_tsquery() or to_tsquery() is used (see the sketch after this list)
- plainto_tsquery() is for simpler queries
- to_tsquery() is for more complex queries. It supports operators such as & (AND), | (OR), ! (NOT), :* (prefix match) and <-> (followed by)
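A minimal sketch of the difference between the two functions; the search terms are made up for illustration:
-- plainto_tsquery() treats the input as plain text (all words ANDed together)
SELECT plainto_tsquery('english', 'my first story');

-- to_tsquery() accepts an explicit query with &, |, !, :* and <->
SELECT to_tsquery('english', 'stor:* & (first | second) & !draft');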

Example
We do the following:
SELECT * FROM mytable WHERE tsvector_column_name @@ plainto_tsquery('my first story');

Example
We do the following:
ALTER TABLE storm_events ADD COLUMN episode_narrative_tsv tsvector;
    
UPDATE storm_events SET episode_narrative_tsv = to_tsvector(episode_narrative);

CREATE INDEX tsv_idx ON storm_events USING GIN(episode_narrative_tsv);
Example
We do the following:
-- Create the table
CREATE TABLE blog_posts (
 id SERIAL PRIMARY KEY,
 content TEXT
);

-- Insert sample data
INSERT INTO blog_posts (content)
VALUES
 ('Welcome to my blog on alternative of Elasticsearch'),
 ('In this blog post, we explore the power of ts_vector and GIN indexing'),
 ('PostgreSQL offers robust full-text search capabilities'),
 ('Searching for keywords in PostgreSQL has never been easier');

-- Create tsvector column and GIN index
ALTER TABLE blog_posts ADD COLUMN document tsvector;
UPDATE blog_posts SET document = to_tsvector('english', content);
CREATE INDEX gin_index ON blog_posts USING GIN(document);

-- Perform full-text search
SELECT * FROM blog_posts
  WHERE document @@ to_tsquery('english', 'power & search');
Example
Suppose we have the following tables. There are two tables, users and stories (stories belong to users). The stories table contains a tsvector column named fulltext.
-- Create Users table
CREATE TABLE IF NOT EXISTS users
(
  id BIGSERIAL NOT NULL,
  name CHARACTER VARYING(100) NOT NULL,
  rating INTEGER,
  PRIMARY KEY (id)
);

CREATE INDEX usr_rating_idx ON users
  USING BTREE (rating ASC NULLS LAST) TABLESPACE pg_default;

-- Create Stories table
CREATE TABLE IF NOT EXISTS stories
(
  id BIGSERIAL NOT NULL,
  create_date TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  num_views BIGINT NOT NULL,
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  fulltext TSVECTOR,
  user_id BIGINT,
  PRIMARY KEY (id),
  CONSTRAINT user_id_fk FOREIGN KEY (user_id) REFERENCES users (id)
    MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION NOT VALID
);

CREATE INDEX str_bt_idx ON stories
  USING BTREE (create_date ASC NULLS LAST, num_views ASC NULLS LAST, user_id ASC NULLS LAST);

CREATE INDEX fulltext_search_idx ON stories USING GIN (fulltext);
Inserting a row into the stories table is done like this:
INSERT INTO stories (create_date, num_views, title, body, user_id)
VALUES (?, ?, ?, ?, ?)
Then we update the TSVECTOR column:
UPDATE stories SET fulltext = to_tsvector(title || ' ' || body) where id = ?
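Instead of running a manual UPDATE after every INSERT, the built-in tsvector_update_trigger can keep the fulltext column in sync automatically. A minimal sketch, assuming the english text search configuration:
CREATE TRIGGER stories_fulltext_update
  BEFORE INSERT OR UPDATE ON stories
  FOR EACH ROW
  EXECUTE FUNCTION tsvector_update_trigger(fulltext, 'pg_catalog.english', title, body);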
To query, we do the following. With the plainto_tsquery() function we convert the string into the type required for the search:
SELECT
 s.id id,
 create_date,
 num_views,
 title,
 body,
 user_id,
 name user_name,
 rating user_rating
FROM stories s INNER JOIN users u
 ON s.user_id = u.id
WHERE true
 AND rating BETWEEN 0 AND 100
 AND num_views BETWEEN 0 AND 100   
 AND create_date BETWEEN '2023-08-28' AND '2023-08-29'   
 AND fulltext @@ plainto_tsquery('Second story')  
ORDER BY create_date DESC OFFSET 0 LIMIT 100
;


Monday, September 4, 2023

Docker and PostgreSQL Debezium

Example
We do the following:
docker pull debezium/postgres

# Start the Debezium-ready PostgreSQL image (a standard PostgreSQL image
# preconfigured for logical decoding):
docker run \
  -it \
  --rm \
  --name debezium-postgres \
  -p 5432:5432 \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  debezium/postgres:latest

# The Kafka Connect settings below belong to the debezium/connect image,
# not to debezium/postgres:
docker run \
  -it \
  --rm \
  --name debezium-connect \
  -p 8083:8083 \
  -e BOOTSTRAP_SERVERS=kafka:9092 \
  -e GROUP_ID=1 \
  -e CONFIG_STORAGE_TOPIC=my-connect-configs \
  -e OFFSET_STORAGE_TOPIC=my-connect-offsets \
  -e STATUS_STORAGE_TOPIC=my-connect-status \
  -e CONFIG_STORAGE_REPLICATION_FACTOR=1 \
  -e OFFSET_STORAGE_REPLICATION_FACTOR=1 \
  -e STATUS_STORAGE_REPLICATION_FACTOR=1 \
  debezium/connect:latest
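Debezium captures changes through logical decoding, and the debezium/postgres image ships with it enabled. Once the database is up, the prerequisites can be checked from psql; a minimal sketch:
SHOW wal_level;                      -- should return 'logical'
SELECT * FROM pg_replication_slots;  -- replication slot created by the connector, if any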

The postgres User - Administrator

When PostgreSQL is installed, an administrator user named postgres is usually created.
There can also be a root user; I think the two are the same.
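Which administrative roles actually exist can be checked from the pg_roles catalog. We do the following:
SELECT rolname, rolsuper, rolcreaterole, rolcreatedb, rolcanlogin
FROM pg_roles
ORDER BY rolname;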