Tuesday, 7 March 2023

Why a Write-Ahead Log (WAL) Is Needed

Introduction
The WAL file serves two purposes:
1. Reducing I/O Cost
2. Crash Recovery

1. Reducing I/O Cost
1. The WAL file is smaller than the data file.
2. The data file is actually divided into smaller data pages, but a single operation may touch many data pages in random, that is non-sequential, order. The explanation is as follows:
1. So WAL is much smaller than the actual data files, so we can flush them relatively faster.
2. They are also sequential unlike data pages changes which are random.
Another explanation is as follows. Even if only a few bytes change, the whole data page has to be written to disk. Because the WAL file is append-only, only those few bytes are written:
WALs are not just smaller but also need fewer IOs. You have to write out a full sector to disk even if only one bit in it changed. WALs are append only and you only write to the end of the file. If you modify 4 rows that data may be in 4 sectors but may only need 1 WAL sector.

And even if you write multiple WAL sectors they are sequential which is faster than doing random IOs. The advantage was massive with spindle drives but even on SSDs it is faster.
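To put rough numbers on the comparison above, here is a small back-of-the-envelope sketch. The page, sector and record sizes are assumptions picked only for illustration; real systems use different values, and the function names are hypothetical.

```python
# Hypothetical sizes chosen only for illustration.
PAGE_SIZE = 8192    # bytes per data page (assumed)
SECTOR_SIZE = 512   # bytes per disk sector (assumed)
RECORD_SIZE = 64    # bytes per WAL record (assumed)


def bytes_for_page_flush(pages_touched: int) -> int:
    """Every touched page is written in full, wherever it sits in the file."""
    return pages_touched * PAGE_SIZE


def bytes_for_wal_append(records: int) -> int:
    """WAL records are packed back to back; round up to whole sectors."""
    raw = records * RECORD_SIZE
    sectors = -(-raw // SECTOR_SIZE)  # ceiling division
    return sectors * SECTOR_SIZE


rows_modified = 4
# Worst case for the data file: the 4 rows live on 4 different pages.
page_bytes = bytes_for_page_flush(rows_modified)
wal_bytes = bytes_for_wal_append(rows_modified)
print(page_bytes, wal_bytes)
```

Under these assumed sizes the four page writes are also scattered across the file, while the WAL write is a single append at the end.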
The explanation is as follows. In other words, trying to write many pages to disk incurs a very high I/O cost.
Pages
Both data and indexes live as fixed-size pages in data files on disk. Databases may do it differently but essentially every table/collection is stored as an array of fixed-size pages in a file. The database can tell which offset to seek to from the page number, and the page length tells it how many bytes to read. This allows the database to issue an operating system read and get that exact page in memory.

#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count);
That single I/O to read a page will get you not one row but many that is because a page will have many rows/documents. And similarly, to update a column in one row in the page you have to flush the entire page to disk even though only few bytes might have changed.
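The offset arithmetic described above can be sketched as follows. This is a minimal illustration, not how any particular database does it: the 8192-byte page size and the `read_page` helper are assumptions, and `os.pread` is only available on Unix-like systems.

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size; real databases differ


def read_page(fd: int, page_no: int, page_size: int = PAGE_SIZE) -> bytes:
    """Read one fixed-size page: offset = page number * page size."""
    return os.pread(fd, page_size, page_no * page_size)


# Build a throwaway file of 3 pages, each filled with its own page number.
with tempfile.NamedTemporaryFile(delete=False) as f:
    for n in range(3):
        f.write(bytes([n]) * PAGE_SIZE)
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    page = read_page(fd, 2)  # a single read returns the whole page
finally:
    os.close(fd)
    os.remove(path)
```

One call brings the entire page into memory, which is why reading a single row also brings in every other row stored on the same page.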

This is key to understanding database performance. You want pages to remain “dirty” and hopefully receive a lot of writes so we can flush it once to disk to minimize I/O.

WAL
I can start a transaction that updates a row in page 0, inserts a new row in page 10 and deletes a row from page 1 and to commit my transaction I can write the three pages to disk. Two problems with this approach. Small writes lead to massive I/Os which slows down commits. The second problem and the worse is if the database crashed midway of writing the pages, we have some pages written some are not and we lose atomicity. In addition we have no reference on startup after crash to remove pages belonging to transactions that crashed and must be rollbacked as we don’t even know what the original page looked like.

Meet WAL or write-ahead log. How about we do this instead? In addition to updating pages in memory we also record the changes transactions make in an append log and we flush the log to disk when the transaction commits. We know flushing the log is going to be fast because the changes are much smaller than writing the full pages of data files and indexes. And yes we can safely leave the updated data pages in memory and not flush them to disk saving massive I/Os, I’ll explain why later.

By “writing-ahead” in this log file, the data files on disk are technically left behind and older than what is in the log file, this is where the name write-ahead in WAL comes from.
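The idea can be sketched with a toy log. Everything here is an assumption made for illustration: the `TinyWal` class, the one-JSON-record-per-line format and the field names are invented, and real WAL formats are binary and far more involved.

```python
import json
import os
import tempfile


class TinyWal:
    """Toy append-only log: one JSON record per line (made-up format)."""

    def __init__(self, path: str) -> None:
        self.f = open(path, "ab")

    def append(self, record: dict) -> None:
        self.f.write(json.dumps(record).encode() + b"\n")

    def commit(self, txid: int) -> None:
        self.append({"tx": txid, "op": "commit"})
        self.f.flush()             # push Python's buffer to the OS
        os.fsync(self.f.fileno())  # ask the OS to persist it to disk

    def close(self) -> None:
        self.f.close()


wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
wal = TinyWal(wal_path)
dirty_pages = {}  # page number -> in-memory contents, never flushed here

# The transaction from the text: touch pages 0, 10 and 1.
for page_no, change in [(0, "update row"), (10, "insert row"), (1, "delete row")]:
    dirty_pages[page_no] = change
    wal.append({"tx": 1, "op": "change", "page": page_no, "change": change})
wal.commit(1)  # the only forced disk write at commit time
wal.close()

with open(wal_path, "rb") as f:
    records = [json.loads(line) for line in f]
```

The three pages stay dirty in memory; only the small sequential log is forced to disk when the transaction commits.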
2. Crash Recovery
Changes recorded in the WAL file after the most recent checkpoint are reapplied. In other words, the WAL records are applied to the data files. The explanation is as follows:
As the database starts back up, the file is out of date we can’t just pull it on memory and have clients read them, the WAL is the source of truth, we need to REDO the changes in the WAL back on the data files, and during that process nothing is allowed to read (otherwise we get corruption). The database startup time slows down the more out of sync the data files are from the WAL (many writes has happened but the data files were not updated for a long time).
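A minimal sketch of the REDO step, assuming each WAL record carries a monotonically increasing sequence number (called `lsn` here), the page it touched and the new value. These field names and the `redo` function are hypothetical, and the sketch assumes every record belongs to a committed transaction.

```python
def redo(pages: dict, wal: list, checkpoint_lsn: int) -> dict:
    """Reapply, in log order, every record newer than the checkpoint."""
    recovered = dict(pages)  # copy of what the data files held on disk
    for rec in wal:
        if rec["lsn"] > checkpoint_lsn:
            recovered[rec["page"]] = rec["value"]
    return recovered


# Data files as of the last checkpoint (lsn 2): stale for pages 0 and 10.
pages_on_disk = {0: "a1", 1: "b1"}
wal = [
    {"lsn": 1, "page": 0, "value": "a1"},
    {"lsn": 2, "page": 1, "value": "b1"},
    {"lsn": 3, "page": 0, "value": "a2"},
    {"lsn": 4, "page": 10, "value": "c1"},
]
recovered = redo(pages_on_disk, wal, checkpoint_lsn=2)
```

The further the data files lag behind the log, the more records this loop has to replay, which matches the slower startup described above.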
3. If a WAL Write Is Left Incomplete
Because the transaction never finished, its partial records in the WAL are cleaned up. The explanation is as follows:
I know what you’re thinking, what if we crashed midway through writing the WAL? That is also fine because we know exactly what transaction wrote those WAL changes and upon startup we will clean up the WAL records that belong to transactions that didn’t commit, we effectively roll it back.
Another explanation is as follows:
For example if you are in the middle of a transaction and the database crashed, we consider the transaction rolled-back by default, so WAL entries flushed by this uncommitted transaction will be discarded.
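As a rough sketch of that cleanup, assume each WAL record names its transaction and that a commit is itself a record in the log. The record layout and the `committed_changes` helper are hypothetical.

```python
def committed_changes(wal: list) -> list:
    """Keep only change records whose transaction has a commit record."""
    committed = {rec["tx"] for rec in wal if rec["op"] == "commit"}
    return [
        rec for rec in wal
        if rec["op"] == "change" and rec["tx"] in committed
    ]


wal = [
    {"tx": 1, "op": "change", "page": 0},
    {"tx": 2, "op": "change", "page": 5},
    {"tx": 1, "op": "commit"},
    {"tx": 2, "op": "change", "page": 6},  # crash here: tx 2 never committed
]
survivors = committed_changes(wal)
```

Transaction 2 has no commit record, so both of its changes are dropped, which amounts to rolling it back.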














