21 Mart 2023 Salı

Data Partitioning - Hash partitioning

Giriş
Hash fonksiyonun çıktısı % tablo sayısı formülü kullanılarak hangi tabloya erişeleceği bulunur

Açıklaması şöyle
This type of partitioning is used to divide a table into partitions based on a hash value. This type of partitioning is useful when you want to evenly distribute data across partitions, but don’t have a specific value to partition on.
Açıklaması şöyle
Hash partitioning is similar to list partitioning, but instead of using a natural finite domain defined by the possible values of a column, it maps all possible values of a column to a finite list of possible values. The randomization used in hashing also ensures that partitions are of comparable sizes.
Şeklen şöyle

Bazı Zorlukları
Açıklaması şöyle. Hash fonksiyonunu veri tabanı sayısına göre yapmamak lazım. Bu çok kolay anlaşılıyor zaten
- I can never use incremental Id as the primary key.
- Primary or foreign keys and unique keys do not make sense anymore. it is only applied to the shard database table. we need another solution to get this feature back.
- The hashing function needs to be chosen very very carefully. e.g. For the above sample, when I add a machine, need to change the function into row_id%3. which is very very bad — re-shard is the cost is very very high: migration is complicated, potentially may lose data, and the downtime is unknown.
- Need to rewrite many queries (into very simple ones). no subquery, no joins, and no aggregate functions can be used.
- The transaction is hard to control. distributed system transaction is already complicated, sharding makes it even harder.
- When a slow query (after a query from multiple machines then merges and sorts in memory) happens, my luck depends on caching or the engine in the middle like vitess or shardingsphere.
- With above. horizontal sharding seems more suitable for NoSQL. but I don’t try until I have to.
Dolayısıyla şöyle bir işlem zor oluyor
Q: “Could you tell me how to query the recent 500th to 1000th updated records and order by modified date? ”

Örnek
Şöyle yaparız
-- Create the main table
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    order_number INTEGER,
    customer_id INTEGER,
    total_amount DECIMAL(10,2)
);

-- Create the partition tables
CREATE TABLE orders_p1 (LIKE orders);
CREATE TABLE orders_p2 (LIKE orders);
CREATE TABLE orders_p3 (LIKE orders);

-- Attach the partition tables to the main table
ALTER TABLE orders ADD PARTITION BY HASH (order_number)
    (PARTITION orders_p1, PARTITION orders_p2, PARTITION orders_p3);



Hiç yorum yok:

Yorum Gönder