Monday, August 30, 2021

JDBC PostgreSQL Connection String

Introduction
The explanation is as follows. That is, the host, port, and database parameters can all be specified. If one is left out, each has a default value; for example, if database is not specified, the username used to connect is assumed.
... this takes one of the following forms:
- jdbc:postgresql:database
- jdbc:postgresql:/
- jdbc:postgresql://host/database
- jdbc:postgresql://host/
- jdbc:postgresql://host:port/database
- jdbc:postgresql://host:port/

The parameters have the following meanings:

host = The host name of the server. Defaults to localhost . To specify an IPv6 address you must enclose the host parameter with square brackets, for example: jdbc:postgresql://[::1]:5740/accounting

port = The port number the server is listening on. Defaults to the PostgreSQL® standard port number (5432).

database = The database name. The default is to connect to a database with the same name as the user name used to connect to the server.
The parameters needed to connect over JDBC are on the Postgres JDBC documentation page.
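These defaults can be sketched in code. The buildUrl helper below is my own illustration (not part of the driver); it composes a URL the same way the driver fills in missing parts:

```java
// Illustrative sketch (not driver code): compose a PostgreSQL JDBC URL,
// applying the documented defaults for host, port, and database.
public class PgUrl {
    static String buildUrl(String host, Integer port, String database, String user) {
        String h = (host == null) ? "localhost" : host;   // host defaults to localhost
        int p = (port == null) ? 5432 : port;             // port defaults to 5432
        String db = (database == null) ? user : database; // database defaults to the user name
        return "jdbc:postgresql://" + h + ":" + p + "/" + db;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("dbhost", 5433, "accounting", "bob"));
        // jdbc:postgresql://dbhost:5433/accounting
        System.out.println(buildUrl(null, null, null, "bob"));
        // jdbc:postgresql://localhost:5432/bob
    }
}
```

The resulting string is what you would pass to DriverManager.getConnection.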

If the Database Name Is Not Specified
That is, if we write jdbc:postgresql://host:port/, the default database name is the same as the user name. The explanation is as follows:
... the dbname parameter in the connection string defaults to the value of the user parameter.

currentSchema
Moved to the Schema post. In short, to make more than one schema visible we do the following:
jdbc:postgresql://localhost:5432/tst?currentSchema=app1,public
reWriteBatchedInserts
The explanation is as follows:
Asking PostgreSQL to Rewrite batched inserts
Hibernate will send multiple insert statements to the RDBMS at once, in order to insert data, and this will be done in the same transaction which is great; however if we are using PostgreSQL we could go a little further in our optimizations by asking it to rewrite those inserts into a single multi-value insert statement; this way if we have 100 insert statements, it will rewrite them into a single multi-value statement.
It turns many separate INSERT statements into a single INSERT with multiple VALUES. That is, it becomes:
-- before reWriteBatchedInserts
INSERT INTO container (...) VALUES (...);
INSERT INTO container (...) VALUES (...);
...
INSERT INTO container (...) VALUES (...);

-- after reWriteBatchedInserts
INSERT INTO container (...) VALUES (...), (...), ..., (...);
Example
We do the following:
jdbc:postgresql://localhost:5432/mastership?reWriteBatchedInserts=true
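To make the rewrite concrete, the sketch below (my own illustration; the real rewrite happens inside the driver) builds the single multi-value statement that N batched single-row inserts collapse into:

```java
// Illustrative sketch: the multi-value INSERT that reWriteBatchedInserts
// effectively produces from N batched single-row inserts.
public class RewriteSketch {
    static String multiValueInsert(String table, String columns, int rows) {
        StringBuilder sql = new StringBuilder("INSERT INTO " + table + " (" + columns + ") VALUES ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) sql.append(", ");
            sql.append("(?, ?)"); // one placeholder group per batched row (two columns here)
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(multiValueInsert("container", "a, b", 3));
        // INSERT INTO container (a, b) VALUES (?, ?), (?, ?), (?, ?)
    }
}
```

In application code you still write one PreparedStatement and call addBatch()/executeBatch(); the driver performs this rewrite transparently when reWriteBatchedInserts=true.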

Prepared Statement Settings
Three settings can be made. The explanation is as follows:
prepareThreshold, default: 5
preparedStatementCacheQueries, default: 256
preparedStatementCacheSizeMiB, default: 5
prepareThreshold
The explanation is as follows. After a prepared statement has been executed 5 times, it becomes a server-side prepared statement.
When a prepared statement is defined and executed in Java, it doesn't create a server-side prepared statement. Instead, it waits for 4 client-side executions before a server-side prepared statement is created. 
preparedStatementCacheQueries
The explanation is as follows. Even if the prepared statement is closed, it continues to live on the server.
A prepared statement is automatically cached. It's logical a prepared statement is kept available as long as the prepared statement is not closed. It's less obvious that the prepared statement is cached even if it's closed. The reason for the cache is to keep a potential server-side prepared statement in existence.
preparedStatementCacheSizeMiB
The explanation is as follows:
The prepared statement cache described with preparedStatementCacheQueries also has a limit in size, which is 5 MB. Please mind this is the client-side prepared statement itself, not the data.

What I don't know for the cache's number of queries and size is how that relates to open/used prepared statements, in other words: is the number the total (open+closed) number or the number of closed prepared statements, and does this limit open cursors.
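These three settings can be passed as connection properties (or as URL query parameters). A minimal sketch with the default values quoted above; no live database is needed just to build the configuration:

```java
import java.util.Properties;

// Sketch: the three prepared-statement settings as JDBC connection properties,
// set to their documented defaults.
public class PreparedStatementSettings {
    static Properties defaults() {
        Properties props = new Properties();
        props.setProperty("prepareThreshold", "5");                // server-side after 5 executions
        props.setProperty("preparedStatementCacheQueries", "256"); // max cached statements
        props.setProperty("preparedStatementCacheSizeMiB", "5");   // client-side cache size limit
        return props;
    }

    public static void main(String[] args) {
        // would be passed as: DriverManager.getConnection(url, props)
        System.out.println(defaults());
    }
}
```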
user + password Settings
Example
We do the following:
jdbc:postgresql://myserver:5432/test?user=postgres&password=<postgresql-password>
ssl Settings
The explanation is as follows. That is, it enables the use of SSL.
Connect using SSL. The server must have been compiled with SSL support. This property does not need a value associated with it. The mere presence of it specifies an SSL connection. However, for compatibility with future versions, the value “true” is preferred.
We do the following:
jdbc:postgresql://postgres.example.com/test_db?user=bob&password=secret&ssl=true
SSL can operate in several modes. The mode is specified as sslmode=XXX. The modes are:
disable
allow : Non-validating
prefer : Non-validating
require : Non-validating
verify-ca : Validating
verify-full : Validating
verify-full Mode
The explanation is as follows. That is, with ssl=true the default behavior is verify-full.
As of 42.2.5 ssl=true implies verify-full as per the release notes. If you wish to get the old behaviour use sslmode=require
In this mode we may get an error like this:
org.postgresql.util.PSQLException: Could not open SSL root certificate file
  blahblah/.postgresql/root.crt.
or
org.postgresql.util.PSQLException: Could not open SSL root certificate file 
  C:\Users\user\AppData\Roaming\postgresql\root.crt.
Example
We do the following:
jdbc:postgresql://myamazondomain.rds.amazonaws.com:5432/db_name?
sslmode=verify-full&
sslfactory=org.postgresql.ssl.SingleCertValidatingFactory&
sslfactoryarg=classpath:cert/rds-ca-cert_name.p12
The explanation is as follows:
We need to use SingleCertValidatingFactory class to specify certificate file on classpath (or from file system, environment variables etc). This class has argument sslfactoryarg where we can add path to certificate file.

require Mode
The explanation is as follows:
require, allow and prefer all default to a non-validating SSL factory and do not check the validity of the certificate or the host name.



Thursday, August 19, 2021

PgBouncer - Connection Pooler

Introduction
The explanation is as follows:
A lightweight connection pooler for PostgreSQL that can help manage connections and improve performance, but lacks advanced features like replication and failover support. PGBouncer is a good option if you need a simpler solution focused primarily on connection pooling.
The explanation is as follows:
PostgreSQL was built to handle a small number of long-lasting connections.
That’s why it has a relatively small default limit of connections and it’s not uncommon for people to get the error “connection limit exceeded for non-superusers”.
...
The answer is using a connection pool. Sometimes the framework you’re using offers that (like SQLAlchemy), other times you might need a specialized tool.

But if you’re using multiple machines to serve your API (as you probably are), you will run into the previous problem again since your framework pooler will only be responsible for managing the instance it’s running on.

When you have multiple servers, a good solution is using PGBouncer.

This amazing tool will allow you to connect multiple servers to it instead of connecting directly to Postgres.

It will queue incoming connections when the concurrent amount is greater than the one specified by you.
Why PgBouncer Is Needed
Perhaps Spring Boot + Hikari already does connection pooling for us. But suppose 100 microservices connect to our central database. Even if each one opens only 2 connections, that still makes 200. In that case, using PgBouncer instead of Hikari alone is much better.
As a diagram:

If we also put HAProxy, i.e. a load balancer, in front of PgBouncer, the diagram looks like this:
The pgbouncer Command
To start it we do the following:
pgbouncer -d pgbouncer.ini
To reload we do the following:
pgbouncer -d -R pgbouncer.ini
To follow the log file:
tail -f pgbouncer.log

The pgbouncer.ini File

1. The databases Section
Under the [databases] section, connections to the backend database are specified.
Example
We do the following:
* = host=localhost port=5432
Example
We do the following:
[databases]
mydatabase = host=127.0.0.1 port=5432 dbname=mydatabase
2. The pgbouncer Section
Under the [pgbouncer] section, settings for the PgBouncer server itself are specified.
The admin_users Field
The explanation is as follows:
This is the database user who will have permission to make PgBouncer settings in the database.
Example
We do the following:
;;;
;;; Users allowed into database 'pgbouncer'
;;;

;; comma-separated list of users who are allowed to change settings
;admin_users = user2, someadmin, otheradmin
admin_users = my_db_user
...

The auth_file Field
The explanation is as follows:
The auth_file setting in the pgbouncer configuration file specifies the path to a file that contains a list of users and their passwords for authentication. When a client attempts to connect to the pgbouncer, it checks the user’s credentials against the list of users in the userlist.txt file.

"username" "password"
This file contains a single user named username with a password of password. You can add additional users to the file by adding additional lines with the format “username” “password”.

It is important to note that the userlist.txt file should be secured with appropriate permissions, as it contains sensitive information. By default, pgbouncer expects the userlist.txt file to be owned by the same user as the pgbouncer process and only readable by that user. As per your environment, you can modify the ownership and permissions of the file.

Moreover, it’s advisable to refrain from using a plain text file for storing user credentials. In production environments, it is recommended to use a more secure authentication method, such as SSL or GSSAPI, instead of plain text passwords.
Example
We do the following:
[databases]
mydatabase = host=127.0.0.1 port=5432 dbname=mydatabase

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 100
default_pool_size = 20
The auth_type Field
The explanation is as follows:
In the “Authentication settings” section, set auth_type = md5 to authenticate users by a password. The file with the database login and password will be located at /etc/pgbouncer/userlist.txt

The listen_port Field
Example
We do the following. Here pgbouncer listens on port 6432.
$ cat /etc/pgbouncer/pgbouncer.ini
 
[databases]
world = host=127.0.0.1 port=5432 dbname=world
 
[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
admin_users = admindb
Example
We do the following. Here pgbouncer listens on port 6432.
[databases]
* = host = free-tier.gcp-us-central1.cockroachlabs.cloud  dbname = artem-freetier-2924.defaultdb user = artem port = 26257

 [pgbouncer]
 listen_port = 6432
 listen_addr = *
 auth_type = trust
 auth_file = userlist.txt
 logfile = pgbouncer.log
 pidfile = pgbouncer.pid
 admin_users = artem
 ignore_startup_parameters=extra_float_digits,options

# free tier provided cert
server_tls_sslmode = verify-full
server_tls_ca_file = /Users/artem/.postgresql/root.crt
The max_client_conn Field
Specifies the maximum number of client connections.
Example
We do the following:
;;;
;;; Connection limits
;;;

;; Total number of clients that can connect
max_client_conn = 5000
...
The pool_mode Field
It seems this is usually set to transaction.
Example
We do the following:
;;;
;;; Pooler personality questions
;;;

;; When server connection is released back to pool:
;;   session - after client disconnects (default)
;;   transaction - after transaction finishes
;;   statement - after statement finishes
pool_mode = transaction
...

The userlist.txt File
Required for authentication. The explanation is as follows:
It takes the form of "username" and "password" separated by a space on each new line.
Example
We do the following:
$ cat /etc/pgbouncer/userlist.txt
"admindb" "root123"
The explanation is as follows:
So, in this case, I have installed PgBouncer in the same database node, listening in all IP addresses, and it connects to a PostgreSQL database called “world”. I am also managing the allowed users in the userlist.txt file with a plain-text password that can be encrypted if needed.
PostgreSQL 13 and Earlier
The explanation is as follows:
If you are using PostgreSQL versions up to 13 inclusively, then the default password_encryption method is md5.

Place the username in double quotes and the md5 password hash (in one line):
Example
We do the following:
"my_db_user" "md5badc318d987f61146c6ad8e15d84a111"
To compute the md5 value we do the following:
echo "md5"$(echo -n 'YourdbpasswordYourdbusername' | md5sum | awk ' { print $1 } ')
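The same entry can be produced from Java instead of the shell one-liner; a sketch using java.security.MessageDigest (md5 of password concatenated with username, prefixed with md5, exactly as the command above computes it):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: build a pgbouncer userlist.txt md5 entry, i.e.
// "username" "md5<md5hex(password + username)>"
public class UserlistMd5 {
    static String md5Entry(String username, String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest((password + username).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder("md5");
            for (byte b : digest) hex.append(String.format("%02x", b));
            return "\"" + username + "\" \"" + hex + "\"";
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Entry("my_db_user", "Yourdbpassword"));
    }
}
```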
PostgreSQL 14 and Later
The explanation is as follows:
If you are using PostgreSQL versions starting from 14, then the default password_encryption method is scram-sha-256.
Example
We do the following:
"my_db_user" "SCRAM-SHA-256$4096:lLN4+i05+kpeffD4s3rRiw==$Oq62iUGamAaF5cpB+agWV4u3xfc5cZCRtvMhmA+Zm3E=:hHkCesEi0p0wLWk1uUEeTtJTYLXHKDLdy2te3VAOe8s="
To obtain the scram-sha-256 value we read it back from the server like this:
psql -h  -p  -Atq -U postgres -d postgres \
  -c "SELECT concat('\"', usename, '\" \"', passwd, '\"') FROM pg_shadow"



Wednesday, August 11, 2021

Comparison Operators - BETWEEN

Introduction
The comparison operators are as follows:
< less than
> greater than
<= less than or equal to
>= greater than or equal to
= equal
<> or != not equal
BETWEEN
Example
We do the following:
select * from test a
inner join (
  select distinct order_id from test
  where status_update = 'Delivered'
) b on a.order_id = b.order_id
where status_date between '2020-01-01' and '2020-04-25'
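One detail worth remembering: BETWEEN is inclusive on both ends, i.e. x BETWEEN lo AND hi means x >= lo AND x <= hi. A small sketch of the semantics (string comparison works here because ISO 'YYYY-MM-DD' dates sort lexicographically):

```java
// Sketch: BETWEEN semantics -- inclusive on both bounds.
public class BetweenDemo {
    static boolean between(String x, String lo, String hi) {
        return x.compareTo(lo) >= 0 && x.compareTo(hi) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(between("2020-01-01", "2020-01-01", "2020-04-25")); // true: lower bound included
        System.out.println(between("2020-04-25", "2020-01-01", "2020-04-25")); // true: upper bound included
        System.out.println(between("2020-04-26", "2020-01-01", "2020-04-25")); // false
    }
}
```

So the query above includes rows dated exactly 2020-01-01 and 2020-04-25.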

Thursday, August 5, 2021

COPY - Exporting and Importing Data

Introduction
The explanation is as follows. That is, there are the COPY, COPY TO, and COPY FROM commands.
The COPY command can import data to Postgres if it has access to text, CSV, or binary format data.
...
The file mentioned in the COPY command must be accessible by the Postgres user and should be specified from the perspective of the Postgres server.

The command can also use a SELECT query to load data to a table. It also allows you to specify a list of columns to insert the data into instead of the whole table. On successful completion, the COPY command outputs result in the format COPY count, denoting the number of rows inserted by the command.

Both the text and the CSV file formats allow you to specify a delimiter. But if your input is CSV, it is better to use the CSV format with the DELIMITER option rather than the TEXT format since CSV format adheres to the commonly accepted CSV escaping mechanism. By contrast, the TEXT format follows Postgres-specific escaping rules.
Açıklaması şöyle
The COPY command has many optional parameters that you can use to customize its behavior. Some of the important ones are listed below:
  • QUOTE: Specify the character used to quote the data values.
  • NULL: Specifies the character used to represent the NULL value.
  • ESCAPE: Specifies the character used to escape a character that is being used as the QUOTE character.
  • ENCODING: Used to describe the encoding of the file. If nothing is mentioned, it defaults to client encoding.
EXPORT
Example - CSV
We do the following:
COPY table_name TO '/path/to/file.csv' DELIMITER ',' CSV HEADER;   -- export
COPY table_name FROM '/path/to/file.csv' DELIMITER ',' CSV HEADER; -- import (covered below)
IMPORT
Syntax
The form is COPY [tablename] FROM [filename]
Example - CSV
We do the following:
COPY customer FROM '/home/data/customer.csv' DELIMITER ',' CSV HEADER;
Example - CSV
To load only certain columns from the CSV file we do the following:
COPY customer(first_name,last_name,email) FROM '/home/data/customers1.csv' DELIMITER ',' CSV HEADER;
Example
We do the following:
-- Create temporary table
DROP TABLE IF EXISTS music_track;
CREATE TABLE music_track (
  id         varchar primary key,
  artist_id  varchar,
  title      varchar
);

-- first we copy from the csv into a temporary table
-- if the file is inside the server, you can use COPY
-- otherwise, use \copy to point to your machine
-- note: in psql, \copy must be written on a single line
\copy music_track FROM '/home/username/data/music-track-external.csv' WITH (format csv, header)

-- we use the temporary table to populate the correct values into the production table
INSERT INTO music_catalog (track_id, artist_id, title, url)
  SELECT music_track.id, music_track.artist_id, music_track.title, music_file.url
  FROM music_track INNER JOIN music_file ON music_track.id = music_file.track_id
ON CONFLICT (track_id) DO UPDATE SET
  artist_id = EXCLUDED.artist_id,
  title     = EXCLUDED.title,
  url       = EXCLUDED.url;
The explanation is as follows:
In the example above, we needed to import a large amount of data from an external source. Using COPY or \copy (depending on if you have the file in the server or the client) is probably the fastest option.

You can also leverage the power of “INSERT INTO SELECT” to merge the data from a COPY with data from other tables, loading the initial data into a temporary table first.

Insert Into Select - For Bulk insert/update

The explanation is as follows:
You can also leverage the power of “INSERT INTO SELECT” to merge the data from a COPY with data from other tables, loading the initial data into a temporary table first.

Wednesday, August 4, 2021

Subquery Expressions - EXISTS

Introduction
The subquery only cares whether a row exists or not; the row's contents do not matter. That's why SELECT 1 can be used.

NOT IN vs NOT EXISTS
The explanation is as follows:
NOT EXISTS will usually outperform “NOT IN” by a good margin.

Usage
Example
To select the foo rows that are more than 10 units away from bar.geom, we do the following. bar.geom and foo.geom must be in the same unit; if that unit is meters, then 10 units actually means 10 meters.
SELECT *
FROM   foo
WHERE  NOT EXISTS (
  SELECT 1
  FROM   bar
  WHERE  ST_DWithin(bar.geom, foo.geom, 10)
);


PostGIS ST_DWithin

Introduction
Returns the points that fall within X units of the specified point.

Example
To select the foo rows that are more than 10 units away from bar.geom, we do the following. bar.geom and foo.geom must be in the same unit; if that unit is meters, then 10 units actually means 10 meters.
SELECT *
FROM   foo
WHERE  NOT EXISTS (
  SELECT 1
  FROM   bar
  WHERE  ST_DWithin(bar.geom, foo.geom, 10)
);

Monday, August 2, 2021

High Availability

Introduction

I believe there are 3 basic scenarios. They are as follows:
1. Running PostgreSQL Outside of Kubernetes
As a diagram:

The explanation is as follows:
The PostgreSQL cluster consists of a Master Node and a Standby Node. In case the Master Node failed for some reason (e.g., hardware or network defect) the standby server can overtake the role of the master. The PG-Bouncer in this picture is a component from Postgres and acts as a kind of reverse proxy server. In case of a failure, the switch from the Master to the Standby node can be done by an administrator or can be automated by scripts. From the view of a client, this switch is transparent.
2. Running PostgreSQL Inside of Kubernetes
Example
The Deployment is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: spring-keycloak-demo
spec:
  selector:
    matchLabels:
      app: postgres
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:latest
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: postgres
            - name: POSTGRES_USER
              value: postgres
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: spring-keycloak-secrets
                  key: postgres-pass
The Service is as follows:
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
  namespace: spring-keycloak-demo
  labels:
    app: postgres
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
  type: NodePort
Example
We do the following. The storageClassName is specific to the Kubernetes cluster.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  labels:
    component: postgres
spec:
  selector:
    matchLabels:
      component: postgres
  serviceName: postgres
  template:
    metadata:
      labels:
        component: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:11
          ports:
            - containerPort: 5432
          volumeMounts:
            - mountPath: /var/lib/postgresql/data
              name: postgres-data
          env:
            - name: POSTGRES_DB
              value: postgres
            - name: POSTGRES_USER
              value: postgres
            - name: POSTGRES_PASSWORD
              value: postgres
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: hostpath
        resources:
          requests:
            storage: 5Gi
For the Service we do the following:
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    component: postgres
spec:
  selector:
    component: postgres
  ports:
    - port: 5432
3. Running PostgreSQL on a Distributed Block Storage


Select For Share - Read Lock

Introduction
The explanation is as follows:
A weaker form of select for update is the select for share query. It is ideal for ensuring referential integrity when creating child records for a parent.
Example
The explanation is as follows:
Suppose that we want to create a new purchase for a user. First, we would select the user from the database and then insert a new record in the purchases database. Can we safely insert a new purchase into the database? With a regular select statement we can’t. Other processes could delete the user in the moments between selecting the user and inserting the purchase.

One way to avoid potential issues is to query for the user with the FOR SHARE locking clause.
We do the following:
process A: BEGIN;
process A: SELECT * FROM users WHERE id = 1 FOR SHARE;
process B: DELETE FROM users WHERE id = 1;
-- process B blocks and must wait for process A to finish

process A: INSERT INTO purchases (id, user_id) VALUES (1, 1);
process A: COMMIT;
-- process B now unblocks and deletes the user
The explanation is as follows:
Select for share prevented other processes from deleting the user, but does not prevent concurrent processes from selecting users. This is the major difference between select for share and select for update.

The select for share prevents updates and deletes of rows, but doesn’t prevent other processes from acquiring a select for share. On the other hand, select for update also blocks updates and deletes, but it also prevents other processes from acquiring a select for update lock.

Select For Update Skip Locked

Introduction
The explanation is as follows. In short, it skips the rows locked by another transaction's select for update and returns the remaining rows.
Processing Non-Locked Database Rows
Select for update can be a rigid lock on your table. Concurrent processes can be blocked and starved out. Waiting is the slowest form of concurrent processing. If only one CPU can be active at a time, it is pointless to scale your servers. For this purpose, in PostgreSQL there is a mechanism for selecting only rows that are not locked.

The select ... for update skip locked is a statement that allows you to query rows that have no locks. Let’s observe the following scenario to grasp its use case:
Example
We do the following:
-- session 1
BEGIN;
SELECT * FROM student WHERE id = 1 FOR UPDATE;

-- session 2
SELECT * FROM student FOR UPDATE SKIP LOCKED;
Example
We do the following:
process A: SELECT * FROM purchases
process A:   WHERE processed = false FOR UPDATE SKIP LOCKED;
process B: SELECT * FROM purchases
process B:   WHERE created_at < now()::date - interval '1w'
process B:   FOR UPDATE SKIP LOCKED;
-- process A selects and locks all unprocess rows
-- process B selects all non locked purchases older than a week

process A: UPDATE purchases SET ...;
process B: UPDATE purchases SET ...;
Example
We do the following. Even if several workers run this statement at the same time, they do not wait on each other, so in effect they split the work among themselves. For example, if there are 100 rows and the LIMIT is 10, i.e. we run in batches of 10, then e.g. 4 workers can run in parallel.
UPDATE scheduled_tasks st1 SET picked = true, …
WHERE <instance> IN (
    SELECT <instance> FROM scheduled_tasks st2
    WHERE <due-condition>
    FOR UPDATE SKIP LOCKED
    LIMIT <limit>)
RETURNING st1.*