PostgreSQL Çorbası: CLUSTER - Tabloda Performans Problemi Varsa Kullanılır

30 Ocak 2023 Pazartesi

CLUSTER - Tabloda Performans Problemi Varsa Kullanılır

Giriş

Açıklaması şöyle

Another option is to use the CLUSTER command, where the entire table will be rewritten, taking and reordering based on the created index

Google Bard Açıklaması şöyle

The CLUSTER command in PostgreSQL is used to physically reorder the data in a table based on the index information. This can improve the performance of queries that use the index, as the rows will be stored in a contiguous block on disk.

The syntax for the CLUSTER command is as follows:

CLUSTER table_name [USING index_name]
The table_name parameter specifies the name of the table to be clustered. The index_name parameter specifies the name of the index to use for clustering. If the index_name parameter is not specified, PostgreSQL will use the primary key index, if one exists.

Örnek

170 milyon satır içeren büyük bir tabloya satır ekleme işlemi uzun süren bir örnek burada. Çözüm olarak CLUSTER komutu ile tablo tekrar tekrar fiziksel olarak yaratılmış. Açıklaması şöyle

1. Created a btree index on the unique identifier column (took about 20 minutes to complete, the size of the index was about 14gb).
2. Reduced the fill factor from 100 to 85 (we do get a lot of updates on the same unique id)
3. Ran ‘CLUSTER <table> USING <id index>’ (took about 45 minutes to complete)
4. Ran ‘VACUUM FULL ANALYZE’ (we could have ran just ‘ANALYZE’ alone, but i wanted to do the check fresh).
5. Re-ran the bulk upsert command, but this time the temp table will have data sorted as per the unique id (ascending), so that the upsert are done in the sorted id format.

Örnek

Şöyle yaparız

CLUSTER table_name USING index_name

PostgreSQL Çorbası

30 Ocak 2023 Pazartesi

CLUSTER - Tabloda Performans Problemi Varsa Kullanılır

Hiç yorum yok:

Yorum Gönder