22 Ağustos 2019 Perşembe

pg_dumpall komutu

Giriş
Açıklaması şöyle.
pg_dump dumps only a single database at a time, and it does not dump information about roles or tablespaces (because those are cluster-wide rather than per-database). To support convenient dumping of the entire contents of a database cluster, the pg_dumpall program is provided. pg_dumpall backs up each database in a given cluster, and also preserves cluster-wide data such as role and tablespace definitions. The basic usage of this command is:

pg_dumpall > outfile
--globals-only seçeneği
Şöyle yaparız. Böylece örneğin kullanıcı bilgilerini de kaydedebiliriz.
pg_dumpall --globals-only  --file=globals.sql

20 Ağustos 2019 Salı

High Cardinality - Öğe Sayısı

Giriş
Açıklaması şöyle.
In the world of databases, cardinality refers to the number of unique values contained in a particular column, or field, of a database.
Örnek
Açıklaması şöyle.
Say there are 10,000 pieces of equipment, each with 100 sensors, running 10 different firmware versions, spread across 100 sites:

The maximum cardinality of this dataset then becomes 1 billion [10,000 x 100 x 10 x 100].

Now imagine that the equipment can move as well, and we’d like to store the precise GPS location (lat, long) and use that as indexed metadata to query by. Because (lat, long) is a continuous field (as opposed to a discrete field like equipment_id), by indexing on location, the max cardinality of this dataset is now infinitely large (unbounded).
High Cardinality Varsa
Aslında bu kötü bir şey olduğu anlamına gelmez. Ancak cardinality değerini düşürmek için bazı çözümler şöyle.
1. The first question you can answer is: do you need every unique value that you’re storing? For example, you might be able to insert data every minute instead of every 5 seconds without losing the patterns in your data.
2. Another option is to expire data after a specified window of time to keep the dataset smaller.

19 Ağustos 2019 Pazartesi

postgresql.conf Dosyası

Giriş
Açıklaması şöyle.
PostgreSQL has two important files where most of the developer-friendly configurations are defined: "postgresql.conf" and "pg_hba.conf". You can find these files under "C:\Program Files\PostgreSQL\11\data" or incase of Linux "/var/lib/pgsql/11/data/".
Not : pg_hba.conf Dosyası yazısına bakabilirsiniz.

autovacuum  Alanı
Örnek
Şöyle yaparız
autovacuum = on autoanalyze = on
 Açıklaması şöyle
1. Autovacuum ensures that space occupied by deleted or outdated data is reclaimed, preventing unnecessary bloat and optimizing query execution.
2. Auto analyze updates table statistics, enabling the query optimizer to generate accurate query plans. By configuring and monitoring these processes, you can prevent performance degradation due to excessive table bloat or outdated statistics.
autovacuum_freeze_max_age Alanı
Açıklaması şöyle
This is the max age (in no. of transactions) a table’s pg_class.relfrozenxid can attain before vacuum is forced in order to prevent XID wraparound.

Default is 200m which is ~10% of the possible 2³¹ values (wraparound). Now 10% may seem too less and one might be tempted to increase it. But that would just delay the inevitable forced vacuum process further (thereby also increasing the time it’d take to complete).

The right value can be found out based on your transaction burn rate i.e. how fast the XIDs will be used up. But we might just be able to get away with the default value itself if other things are configured correctly (as explained a bit later).
vacuum_freeze_min_age
Açıklaması şöyle
The cutoff age (in transactions) that vacuum should use to decide whether to freeze row versions while scanning a table.

Default is 50m which means all XIDs that are older than 50m will be frozen and the youngest XID in the table will have an age of 50m. This is very very conservative number. Consider transaction advancement rate of 50m in 24 hours. Unless we have transactions or queries that run for 24 hours, do we really need that old XIDs to be present? Another way to think about it is — even after vacuum is run, we are only 50m transactions away from the next forced vacuum. From a practical standpoint, this value can be reduced by a lot (considering the default 200m autovacuum_max_freeze_age).

The right value can be decided based on two things — the rate of transactions and the duration of longest running query/transaction served by the cluster.

A vacuum freeze instead of plain vaccumimplicitly sets this value to 0 i.e. XID on all rows will be frozen.
vacuum_freeze_table_age
Açıklaması şöyle
If the table’s pg_class.relfrozenxid value reaches this threshold, then vacuum will aggressively scan the tuples i.e. it will visit every page that may contain unfrozen tuples, not just the ones that contain dead tuples. This is provided because normally, vacuum uses an optimization to avoid scanning all pages. The visibility_map tells which pages may contain dead tuples and vacuum can only scan those.

Default value is 150m. Recommended is 80% of autovacuum_freeze_max_age

Again, a vacuum freeze instead of plain vaccumimplicitly sets this value to 0 ie. XID on all rows will be frozen.
autovacuum_vacuum_scale_factor Alanı
Varsayılan değer 0.2. Tablodaki satırların %20'sı MVCC yüzünden atıl hale gelirse auto vacuum başlar. Yazma işlemi daha fazla olan tablolarda autovacuum_vacuum_scale_factor daha düşük bir değer olabilir
Örnek
Şöyle yaparız
# Increase Shared Memory shared_buffers = 4GB # Optimize Disk I/O random_page_cost = 1.1 # Tune Autovacuum Settings autovacuum_vacuum_scale_factor = 0.1

commit_delay Alanı
Açıklaması şöyle
PostgreSQL has a commit_delay configuration where transaction commits are delayed for a configured amount. During this delay, more commits can arrive and be batched in the WAL memory (write-ahead log). If delay elapses, the database flushes the WAL changes from all batched transactions once saving multiple disk I/Os.
Örnek
Şöyle yaparız
# postgresql.conf # Set commit_delay to 10 milliseconds commit_delay = 10ms # Set wal_writer_delay to 5 milliseconds wal_writer_delay = 5ms
veya şöyle yaparız
ALTER SYSTEM SET commit_delay TO '30ms'; ALTER SYSTEM SET wal_writer_delay TO '200ms';

deadlock_timeout  Alanı
Açıklaması şöyle
In PostgreSQL, there's a configuration parameter called deadlock_timeout that sets the time to wait on a lock before checking for a deadlock. If the system detects a deadlock, it rolls back one of the transactions and returns an error.
effective_io_concurrency Alanı
Açıklaması şöyle
This parameter determines the maximum number of parallel I/O requests that a PostgreSQL instance can execute simultaneously, which in turn affects how efficiently the database can retrieve data from disk.
...
If your storage system has multiple disks, you may want to set the parameter to a value greater than 1 to take advantage of parallelism, resulting in improved I/O performance. However, if your storage system has a single disk, it is recommended to set the parameter to 1 to avoid the overhead of parallel I/O.

keystore_location Alanı
keystore_location ve tablespace_encryption_algorithm birlikte kullanılarak "Tablespace-Level Encryption" yapılabilir.

listen_addresses Alanı
İlk kurulumda sadece localhost'tan bağlantıya izin veriliyor. Aslında benzer bir yöntem diğer veritabanları tarafından da takip ediliyor. Örneğin MySql için açıklama şöyle
Usually, mysql server is accessed only from the same machine, in which case you can set it to listen only on a unix socket, or on a loopback interface.

If it indeed needs to be accessed from other machines, these are generally just a few ones, in which case you can firewall the port from any host but those that legitimately need access to it. (Moreover, if all mysql users are host-restricted, mysql itself won't allow connecting from hosts different than those)
Sadece localhost'tan bağlanmak genellikle yetersiz olduğu için değiştirmek gerekiyor. Açıklaması şöyle
By default, PostgreSQL DB server listen address is set to the 'localhost' , and we need to change it so it accepts connection from any IP address; or you can use comma separated list of addresses.
Genellikle "*" değeri atanarak makinedeki her ağ arayüzünden bağlanılabilmesi şeklinde değiştiriliyor.
Şöyle yaparız
listen_addresses = ‘*’
 Açıklaması şöyle.
Make PostgreSQL listen on all IP Addresses. This is mandatory if you want to connect or debug remotely. Search for "listen_addresses" and use '*' as its value. You can also change the port number (default 5432)
1. Eğer host based authentication yöntemi kullanılıyorsa pg_hba.conf dosyasında da benzer bir değişiklik yapmak gerekiyor. Bu dosyada hangi istemcilerin veri tabanı sunucusuna bağlanabileceği belirtiliyor.
2. Bu değişiklikten sonra veri tabanı sunucusunu tekrar başlatmak gerekiyor.
Örnek
Şöyle yaparız
# /etc/init.d/postgresql restart
log_min_duration_statement
Açıklaması şöyle
log_min_duration_statement: just logging the statements whose duration is more than this value in milliseconds. This is very useful for detecting slow queries.
Açıklaması şöyle
It sets the minimum execution time in milliseconds (ms) above which all statements will be logged.

The default value for the log_min_duration_statement parameter is -1, which disables logging statements.

Setting the PostgreSQL parameter log_min_duration_statement to 0 will print all statements' durations.
Örnek
Şöyle yaparız
# set log_min_duration_statement=1;
SET # show log_min_duration_statement; log_min_duration_statement ---------------------------- 1ms (1 row)
PostgreSQL log dosyalarındaki statement log cümleleri şöyledir
LOG: duration: 5.477 ms statement: insert into abc values(1); LOG: duration: 8.656 ms statement: insert into abc values(2);

log_statement_sample_rate
log_destination
Bu alanlar için Audit Trails yazısına bakabilirsiniz

log_destination Alanı
PostgreSQL 15 için açıklama şöyle
PostgreSQL supports several methods for logging server messages, including stderr (default), csvlog, and syslog. With this release, jsonlog is also supported which is convenient for exporting logs and debugging. In order to enable this, add the jsonlog under the log_destination inside the postgresql.conf file.

This will be a nice feature to have with exports to logging and monitoring tools like HoneyComb and DataDog etc.

max_connections Alanı
Açıklaması şöyle.
Maximum connections to DB, default is 100. Based on your application's requirement, you can set this number to a higher value. However, it is recommended to use a connections pool to safeguard from database overhead.
PostgreSQL thread kullanmıyor, process kullanıyor. Bu yüzden şeklen şöyle


Örnek
Şöyle yaparız
listen_addresses = '*' port = 5432 max_connections = 20 shared_buffers = 128MB temp_buffers = 8MB work_mem = 4MB wal_level = logical max_wal_senders = 3
max_parallel_workers_per_gather Alanı
Açıklaması şöyle
PostgreSQL allows parallel query execution to utilize multiple CPU cores for computationally intensive queries. The max_parallel_workers_per_gather setting controls the maximum number of worker processes that can be used per query.
Örnek
Şöyle yaparız
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;

work_mem Alanı
Açıklaması şöyle
PostgreSQL uses the work_mem configuration parameter to determine the amount of memory available for sorting and hashing operations. Properly configuring work_mem can greatly impact the performance of queries involving sorting or hash joins.

Configuring the work_mem parameter in PostgreSQL is crucial for optimizing query performance, but it requires finding the right balance. Increasing work_mem on high-performance systems can speed up sorting and hashing operations, leading to faster query execution.

However, setting it too high might cause excessive memory usage and potential memory contention, impacting overall performance. To determine the ideal work_mem value, start conservatively and monitor system performance while gradually increasing it. Keep a close watch on memory consumption and query execution times. Additionally, consider using partitioned queries or limiting result set sizes for resource-intensive queries to prevent excessive memory usage. Striking the right balance ensures PostgreSQL operates efficiently without overwhelming system memory and maximizes the benefits of work_mem for improved query performance.

Örnek
Şöyle yaparız
SET work_mem = "128MB";

shared_buffers Alanı
Açıklaması şöyle
The shared_buffers parameter controls the amount of memory used for caching database pages in memory, while the effective_cache_size parameter estimates the amount of system memory available for caching data.
Açıklaması şöyle
Pro tip: If you are setting a PostgreSQL or MySQL, don't use the default database settings because those are meant for personal computers or notebooks.

One example is the Buffer Pool size:

- For MySQL, increase the innodb_buffer_pool_size
- For PostgreSQL, increase shared_buffers and effective_cache_size to match your OS cache size
Açıklaması şöyle
The best practice recommends setting the shared_buffers value to 1/4 (one-fourth) of the system memory (free -m).

Örnek
Şöyle yaparız
shared_buffers = 256MB effective_cache_size = 1GB
wal_level Alanı
Write-Ahead Log yazısına taşıdım

wal_writer_delay Alanı
Write-Ahead Log yazısına taşıdım

Örnek
Şöyle yaparız
max_wal_size = 8GB min_wal_size = 512MB checkpoint_completion_target = 0.9