Your similar query takes only about 7 ms, while mine has a very large number of shared hits. With my own Chinese documents (100,000 of them), a similar query also takes several thousand milliseconds. Is something misconfigured on my side?
set smlar.type = 'tfidf';
set smlar.stattable = 'documents_body_stats';
set smlar.persistent_cache = true;
set smlar.threshold = 0.4;
set smlar.tf_method = 'n';
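For reference, smlar.stattable is expected to follow the layout described in the smlar README: one row per distinct token with its document frequency (ndoc), plus a single row with a NULL value holding the total document count. A minimal sketch, assuming text tokens:

```sql
-- Shape of the TF-IDF statistics table that smlar.stattable points at
-- (per the smlar README); the table/column names here match the setting above.
create table documents_body_stats (
    value text unique,                      -- a distinct token; NULL row = corpus size
    ndoc  int4 not null check (ndoc > 0)    -- number of documents containing the token
);
```

If this table is missing, stale, or lacks the NULL-value total row, the TF-IDF weights smlar computes will be off even though the query still runs.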
I followed the steps on your page one by one. The data came from https://www.mockaroo.com/c5418bd0, downloaded with curl "https://api.mockaroo.com/api/c5418bd0?count=100000&key=b21830e0" > "tfidf.csv". A single request returns at most 5,000 rows, so I ran the download 20 times to get 100,000 rows.
Here is my execution plan:
postgres=# explain (analyze,verbose,timing,costs,buffers) select
postgres-# *,
postgres-# smlar( tokenize(body), '{Etiam,pretium,iaculis}'::text[] )
postgres-# from
postgres-# documents
postgres-# where
postgres-# tokenize(body) %% '{Etiam,pretium,iaculis}'::text[] -- where TFIDF similarity >= smlar.threshold
postgres-# order by
postgres-# smlar( tokenize(body), '{Etiam,pretium,iaculis}'::text[] ) desc
postgres-# limit 10;
                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Limit  (cost=368.05..368.07 rows=10 width=258) (actual time=4957.830..4957.832 rows=10 loops=1)
   Output: document_id, body, (smlar(regexp_split_to_array(lower(body), '[^[:alnum:]]'::text), '{Etiam,pretium,iaculis}'::text[]))
   Buffers: shared hit=4870
   ->  Sort  (cost=368.05..368.30 rows=100 width=258) (actual time=4957.829..4957.829 rows=10 loops=1)
         Output: document_id, body, (smlar(regexp_split_to_array(lower(body), '[^[:alnum:]]'::text), '{Etiam,pretium,iaculis}'::text[]))
         Sort Key: (smlar(regexp_split_to_array(lower(documents.body), '[^[:alnum:]]'::text), '{Etiam,pretium,iaculis}'::text[])) DESC
         Sort Method: top-N heapsort  Memory: 26kB
         Buffers: shared hit=4870
         ->  Bitmap Heap Scan on public.documents  (cost=17.05..365.89 rows=100 width=258) (actual time=111.757..4957.313 rows=666 loops=1)
               Output: document_id, body, smlar(regexp_split_to_array(lower(body), '[^[:alnum:]]'::text), '{Etiam,pretium,iaculis}'::text[])
               Recheck Cond: (regexp_split_to_array(lower(documents.body), '[^[:alnum:]]'::text) %% '{Etiam,pretium,iaculis}'::text[])
               Rows Removed by Index Recheck: 17737
               Heap Blocks: exact=3525
               Buffers: shared hit=4870
               ->  Bitmap Index Scan on documents_tokenize_idx  (cost=0.00..17.03 rows=100 width=0) (actual time=95.413..95.413 rows=18403 loops=1)
                     Index Cond: (regexp_split_to_array(lower(documents.body), '[^[:alnum:]]'::text) %% '{Etiam,pretium,iaculis}'::text[])
                     Buffers: shared hit=1345
 Planning time: 0.456 ms
 Execution time: 4957.952 ms
(19 rows)
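Nearly all of the 4.9 s is spent in the Bitmap Heap Scan (actual time 111.757..4957.313), where regexp_split_to_array(lower(body), ...) is re-evaluated for every candidate row during the recheck, and again inside each smlar() call. One common workaround, sketched here as an assumption (the body_tokens column and index name are hypothetical; _text_sml_ops is smlar's GIN operator class for text[]), is to materialize the token array once:

```sql
-- Hypothetical sketch: store the tokenized body so the recheck and smlar()
-- compare precomputed arrays instead of re-running the regexp per row.
alter table documents add column body_tokens text[];
update documents set body_tokens = tokenize(body);
create index documents_body_tokens_idx
    on documents using gin (body_tokens _text_sml_ops);

-- The query then filters and ranks on the stored column:
select *,
       smlar(body_tokens, '{Etiam,pretium,iaculis}'::text[])
from documents
where body_tokens %% '{Etiam,pretium,iaculis}'::text[]
order by smlar(body_tokens, '{Etiam,pretium,iaculis}'::text[]) desc
limit 10;
```

If the expression index must stay, the same effect can come from making the per-row expression cheaper, since the repeated tokenization is what drives both the high shared-hit count and the multi-second recheck.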