1 Haziran 2023 Perşembe

Materialize - Streaming Database

Giriş
Açıklaması şöyle
What is Materialize?

Materialize is a streaming database for real-time applications. It allows you to work with streaming data from multiple external sources using nothing but standard SQL.

You write arbitrarily complex queries; Materialize takes care of maintaining the results automatically up to date as new data arrives.

Your applications can query Materialized to get blazing fast results which often falls in the sub-second latency range.
PostgreSQL Uyumluluğu
Açıklaması şöyle
As a developer, you work with Materialize as if it were a PostgreSQL database. Materialize is wire compatible with Postgres, allowing you to use a CLI tool like psql or mzcli (a wrapper around psql) to issue ANSI-92 compatible SQL statements against it.
CREATE SOURCE
Örnek
Şöyle yaparız
CREATE SOURCE raw_pageviews FROM KAFKA BROKER 'kafka:9092' TOPIC 'page_views' FORMAT BYTES;

// The raw_pageviews source produces data as a single text column containing JSON. 
// To extract the JSON fields for each page view event, you can use the built-in // jsonb operators
CREATE VIEW pageviews AS
  SELECT
    (pageview_data->'user_id')::INT as user_id,
    (pageview_data->'url')::STRING as url,
    (pageview_data->'channel')::STRING as channel,
    TO_TIMESTAMP((pageview_data->'received_at')::INT AS ts
  FROM (
    SELECT CONVERT_FROM(data, 'utf8')::jsonb AS pageview_data
    FROM pageviews
);
Buraya kadar sadece non-materialized view yarattık. Sonra şöyle yaparız
CREATE MATERIALIZED VIEW page_views_by_channel AS
SELECT channel,
       count(*) as pageview_count
FROM pageviews
GROUP BY channel;

// Reading the output of a materialized view
SELECT * from page_views_by_channel;
CREATE SINK
Şöyle yaparız
CREATE SINK dashboard_sink
FROM page_views_by_channel
INTO KAFKA BROKER 'localhost' TOPIC 'frank-quotes-sink'
FORMAT AVRO USING
    CONFLUENT SCHEMA REGISTRY 'http://localhost:8081';
Joining multiple data sources
Şöyle yaparız. Burada pageviews ve users birleştiriliyor
CREATE MATERIALIZED VIEW pageviews_by_user_segment AS
SELECT
  users.is_vip,       
  pageviews.channel,       
  date_trunc('hour', pageviews.ts) as ts_hour,
  count(*) as pageview_count     
FROM users     
JOIN pageviews ON pageviews.user_id = users.id     
GROUP BY 1,2,3;

Hiç yorum yok:

Yorum Gönder