Giriş
Aslında hem Stream Processing hem de Streaming Database bileşimidir.
PostgreSQL uyumlu SQL kullanır. Açıklaması şöyle
Apache Flink, as a big-data system, has faced criticism regarding its ease-of-use and cost efficiency. Observing this, some vendors are building their own stream processing systems to offer an improved experience in the cloud. RisingWave, for example, is creating a distributed SQL streaming database from scratch, aiming to provide a PostgreSQL-like stream processing experience with a cloud-native architecture.Stream processing is poised to be the primary battleground in the realm of data streaming. The dominance of Flink as the de-facto standard remains uncertain, as emerging alternatives such as RisingWave and Materialize show potential for capturing larger market shares. Additionally, the competitive strategies employed by Flink-based vendors in their quest for superiority are of great interest to us.
Örnek
Şöyle yaparız
// the mapping for the driver's Kafka topicCREATE SOURCE driver_data (driver_id BIGINT,location VARCHAR,speed BIGINT,) WITH (connector = 'kafka',topic = 'driver_topic',properties.bootstrap.server = 'message_queue:29092',scan.startup.mode = 'earliest') ROW FORMAT JSON;
CREATE TABLE
Örnek
Şöyle yaparız
CREATE TABLE users (_id JSONB PRIMARY KEY, payload JSONB) WITH (
connector = 'kafka',
kafka.topic = 'dbserver1.random_data.users',
kafka.brokers = 'message_queue:29092',
kafka.scan.startup.mode = 'earliest'
) ROW FORMAT DEBEZIUM_MONGO_JSON;CREATE MATERIALIZED VIEW
Örnek - Join Kullanan
Şöyle yaparız.San Francisco'da en çok müşteri taşıyan sürücüleri görüntüler
CREATE MATERIALIZED VIEW most_active_drivers AS SELECT drivers.driver_id, drivers.location, drivers.rating, COUNT(rides.ride_id) as total_rides FROM drivers JOIN rides ON drivers.driver_id = rides.driver_id WHERE drivers.location = 'San Francisco' GROUP BY drivers.driver_id, drivers.location, drivers.rating ORDER BY total_rides DESC
Örnek
Şöyle yaparız
CREATE MATERIALIZED VIEW page_visits_mv AS
SELECT page_id,
COUNT(*) AS total_visits,
COUNT(DISTINCT user_id) AS unique_visitors,
MAX(timestamp) AS last_visit_time
FROM website_visits
GROUP BY page_id;Örnek
Şöyle yaparız
CREATE TABLE users (_id JSONB PRIMARY KEY, payload JSONB) WITH (
connector = 'kafka',
kafka.topic = 'dbserver1.random_data.users',
kafka.brokers = 'message_queue:29092',
kafka.scan.startup.mode = 'earliest'
) ROW FORMAT DEBEZIUM_MONGO_JSON;
CREATE MATERIALIZED VIEW normalized_users AS
SELECT
payload ->> 'name' as name,
payload ->> 'email' as email,
payload ->> 'address' as address
FROM
users;
SELECT * FROM normalized_users LIMIT 10;TUMBLE
Örnek
Şöyle yaparız
CCREATE SOURCE live_stream_metrics (
client_ip VARCHAR,
user_agent VARCHAR,
user_id VARCHAR,
-- The live room.
room_id VARCHAR,
-- Sent bits per second.
video_bps BIGINT,
-- Sent frames per second. Typically 30 fps.
video_fps BIGINT,
-- Round-trip time (in ms). 200ms is recommended.
video_rtt BIGINT,
-- Lost packets per second.
video_lost_pps BIGINT,
-- How long was the longest freeze (in ms).
video_longest_freeze_duration BIGINT,
-- Total freeze duration.
video_total_freeze_duration BIGINT,
report_timestamp TIMESTAMPTZ,
country VARCHAR
) WITH (
connector = 'kafka',
topic = 'live_stream_metrics',
properties.bootstrap.server = 'message_queue:29092',
scan.startup.mode = 'earliest'
) ROW FORMAT JSON;
-- A real-time dashboard of the total UV.
CREATE MATERIALIZED VIEW total_user_visit_1min AS
SELECT
window_start AS report_ts,
COUNT(DISTINCT user_id) AS uv
FROM
TUMBLE(
live_stream_metrics,
report_timestamp,
INTERVAL '1' MINUTE
)
GROUP BY window_start;
SELECT * FROM total_user_visit_1min LIMIT 1;
Hiç yorum yok:
Yorum Gönder