Giriş
Pattern matching veya full text search başlığı altına girer. Açıklaması şöyle. trgm kısaltması trigrams anlamına gelir.
Örnek
Şöyle yaparız.
Açıklaması şöyle
Pattern matching veya full text search başlığı altına girer. Açıklaması şöyle. trgm kısaltması trigrams anlamına gelir.
The pg_trgm module provides functions and operators for determining the similarity of alphanumeric text based on trigram matching
trigram için açıklaması şöyle. Yani 3 karakter anlamına gelir.
Postgres provides a module called pg_trgm which ships different functions to work with trigrams matching. A trigram is a group of three consecutive characters taken from a string. Based on this we can measure the similarity of two strings by counting the number of trigrams they share and estimating how similar they are on a scale between 0 and 1. For example, the string “hello” would be represented by the following set of trigrams:“ h”, “ he”, “hel”, “ell”, “llo”, “lo “In order to make use of this module we need to enable it using the following statement:CREATE EXTENSION pg_trgm;
pg_trm module için açıklama şöyle. Yani diğer araçlar kadar gelişmiş olmayabilir
PostgreSQL has a lot of advanced extensions and indexes to speed up the same. But one should keep in mind that these have their own limitations in terms of speed and might not work with languages other than English/multi-byte encoding, such as UTF-8. However, these fill the critical gap where the functionalities mentioned are necessary, but we cannot go for a full-fledged search solution such as Solr/Elasticsearch.
Kurulum
Şöyle yaparız
CREATE EXTENSION pg_trgm;
tsvector Nedir?
tsvector Veri Yapısı Nedir? yazısına bakabilirsiniz
to_tsvector Metodu - Text to tsvector Type
to_tsvector metodu yazısına bakabilirsiniz
tsquery Nedir? - Searching Against the Vector
Örnek
Şöyle yaparız ve çıktı olarak şunu alırız
SELECT to_tsvector('the cat got scared by a cucumber') @@to_tsquery('scares') true
Açıklaması şöyle
By now, we know that the tsvector is not a standard text column, and there is a lot of pre-processing that happens for easier search. We need to use a particular function called tsquery to search against the tsvector column.The output as we see is simply true or false. In this case, note that we searched for the word scares, but still, the result was true. That was because of the normalization we saw earlier. The @@ operator is used for direct matches. We have other operators that perform more sophisticated matches.
Örnek
Şöyle yaparız
SELECT episode_narrative FROM storm_events WHERE
episode_narrative_tsv @@ to_tsquery('logan & international & airport')
GIST Index Yaratma - Generalized Search Tree
Örnek
Şöyle yaparız
CREATE TABLE IF NOT EXISTS PEOPLE (
ID VARCHAR(36) PRIMARY KEY,
NICKNAME VARCHAR(32) UNIQUE NOT NULL,
NAME VARCHAR(100) NOT NULL,
BIRTHDATE VARCHAR(10) NOT NULL,
STACK TEXT[],
SEARCH_TRGM TEXT
);
CREATE EXTENSION IF NOT EXISTS PG_TRGM;
CREATE INDEX CONCURRENTLY IF NOT EXISTS IDX_PEAPLE_SEARCH_TGRM
ON PEOPLE USING GIST (SEARCH_TRGM GIST_TRGM_OPS);
GIN Index Yaratma - Generalized Inverted Index
gin biraz daha hızlı bir Full Text Search index galiba.
Örnek
Şöyle yaparız
CREATE INDEX trgm_idx on storm_events USING gin(event_type GIN_TGRM_OPS);
Örnek - tsvector alanına göre index
Şöyle yaparız
ALTER TABLE storm_events ADD COLUMN episode_narrative_tsv tsvector;
UPDATE storm_events SET episode_narrative_tsv = to_tsvector(episode_narrative);
CREATE INDEX tsv_idx ON storm_events USING gin(episode_narrative_tsv);
operator <->
Şöyle yaparız.
SELECT t, 'super cool' <-> t AS dist
FROM test_trgm
ORDER BY dist DESC LIMIT 10;
similarity metoduAçıklaması şöyle
The SIMILARITY function returns what is called a score. We can adjust this score according to the matches we need. The default score is 0.3 And if we are not worried about the score, then we can use the below shorthand.SELECT * FROM storm_events WHERE event_type % 'Drot'
Örnek
Tablomuzda şöyle bir veri olsun
show_trgm metodu
Örnek
Şöyle yaparız.
Örnek ver
strict_word_similarity metodu
Örnek ver
show_limit metodu
Örnek ver
show_limit metodu
Örnek ver
- Hello
? Hello
| Hello
$ Hello
! Hello
!? Hello
Şöyle yaparız.SELECT my_column, similarity('$ Hello', my_column) AS sml
FROM my_table
WHERE my_column % '$ Hello'
ORDER BY sml DESC, my_column;
Çıktı olarak 1 alırız.show_trgm metodu
Örnek
Şöyle yaparız.
SELECT show_trgm('$ Hello');
show_trgm
---------------------------------
{" h"," he",ell,hel,llo,"lo "}
(1 row)
SELECT show_trgm('- Hello');
show_trgm
---------------------------------
{" h"," he",ell,hel,llo,"lo "}
(1 row)
word_similarity metoduÖrnek ver
strict_word_similarity metodu
Örnek ver
show_limit metodu
Örnek ver
show_limit metodu
Örnek ver
STRING_TO_ARRAY metodu
Elimizde şöyle bir tablo olsun
CREATE SEQUENCE todo_id_seq
START WITH 1
INCREMENT BY 10
NO MINVALUE
NO MAXVALUE CACHE 1;
CREATE TABLE t_todo
(
id BIGINT NOT NULL DEFAULT nextval('todo_id_seq'),
ext_ref CHARACTER VARYING(255) NOT NULL,
title CHARACTER VARYING(255) NOT NULL,
active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id),
UNIQUE (ext_ref),
UNIQUE (title)
);
Şöyle yaparız@Repository
public interface ToDoRepository extends JpaRepository<ToDo, Long> {
@Query(value = "SELECT * FROM t_todo todo WHERE ?1 %
ANY(STRING_TO_ARRAY(todo.title,' '))",
nativeQuery = true)
List<ToDo> findAllMatching(String partialTitle);
}
Açıklaması şöyle
One more thing…the % operator uses the default similarity threshold having a value of 0.3 but if we need to adjust it (e.g 0.5) we can use the SIMILARITY function:
Bu durumda şöyle yaparız
SELECT * FROM t_todo todo WHERE SIMILARITY(todo.title, 'Clean') > 0.4
2. Root (Kök ve Lexeme) Kelime Arama
Örnek
Elimizde şöyle bir tablo olsun
CREATE TABLE log(name TEXT,description TEXT);INSERT INTO log VALUES('ERROR1','Failed to retreive credentials');INSERT INTO log VALUES('ERROR2','Fatal error. No records present. Please try again with a
different value');INSERT INTO log VALUES('ERROR3','Unable to connect. Credentials missing');
Arama için şöyle yaparız
SELECT * FROM log WHERE to_tsvector(description) @@ to_tsquery('miss');
Açıklaması şöyle
The to_tsvector function converts the values to their lexeme and the to_tsquery function will try to match the words.
3. Levenshtein Matching
Şöyle yaparız
4. Phonetic MatchSELECT * FROM storm_events WHERE levenshtein(event_type, 'Drot') < 4
Açıklaması şöyle
The fuzzystrmatch extension has another cool feature where we can search for text which does not have similar spelling but sounds the same.
Örnek
Şöyle yaparız. Telaffuzu "he" kelimesine benzeyen Heat ve Hail gibi şeyler döndürür
SELECT DISTINCT(event_type) FROM storm_events WHERE DIFFERENCE(event_type, 'he') > 2
Hiç yorum yok:
Yorum Gönder