Bug report #63

Open
kmblack1 opened this issue Feb 19, 2024 · 1 comment

@kmblack1

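PostgreSQL uses a multi-process architecture, and pg_jieba loads its dictionary data separately in every backend process, which wastes memory. In my opinion the dictionary data should be loaded into shared memory (shared_buffers) instead.

The following uses the keyword "新华三", which is not in the dictionary, to demonstrate this:

1 Start two psql clients, A and B, and run the following in each: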

select to_tsvector('public.jiebacfg', '新华三');

Output (identical in A and B):

   to_tsvector
-----------------
 '':2 '新华':1
(1 row)

2 Add the keyword

Edit the file jieba_user.dict on the server:

sudo vim jieba_user.dict

Add the keyword "新华三" to the user dictionary:

云计算
韩玉鉴赏
蓝翔 nz
区块链 10 nz
新华三
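
Appending the keyword can also be scripted. The sketch below is a hypothetical helper (add_user_word is not part of pg_jieba). It assumes, based only on the sample entries above, that each line of jieba_user.dict holds a word optionally followed by a frequency and a part-of-speech tag.

```shell
#!/bin/sh
# Hypothetical helper: append a word to a jieba user dictionary unless
# some line already has that word as its first field.
add_user_word() {
    dict=$1
    word=$2
    # awk exits 0 only when an existing line starts with the word.
    if ! awk -v w="$word" '$1 == w { found = 1 } END { exit !found }' "$dict"; then
        # Make sure the file ends with a newline before appending.
        if [ -n "$(tail -c 1 "$dict")" ]; then
            printf '\n' >> "$dict"
        fi
        printf '%s\n' "$word" >> "$dict"
    fi
}
```

For example, add_user_word jieba_user.dict 新华三 (run with sufficient permissions) leaves the file unchanged if the word is already listed.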

3 Terminal A

-- Reload the dictionary
select jieba_reload_dict();

select to_tsvector('public.jiebacfg', '新华三');

Terminal A output:

 to_tsvector
-------------
 '新华三':1
(1 row)

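4 Terminal B

-- The dictionary was already reloaded in terminal A, so it is not reloaded again in terminal B
select to_tsvector('public.jiebacfg', '新华三');

Terminal B output:

   to_tsvector
-----------------
 '':2 '新华':1
(1 row)

Terminal A and terminal B now return different results: jieba_reload_dict() only refreshed the dictionary copy held by terminal A's backend process.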

5 Workarounds:

1 Restart the server, which is not acceptable in production;
2 After changing the dictionary, reload it in every connection before tokenizing.
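
Workaround 2 might look like the sketch below: the reload and the query are sent through the same psql session, since the steps above show that jieba_reload_dict() only affects the session that calls it. The helper name session_sql is hypothetical, and the psql invocation in the comment assumes default connection settings.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL for one session, reloading the
# dictionary first and tokenizing the given text afterwards.
session_sql() {
    text=$1
    # Double any single quotes so the text is a valid SQL string literal.
    escaped=$(printf '%s' "$text" | sed "s/'/''/g")
    printf '%s\n' "select jieba_reload_dict();"
    printf "select to_tsvector('public.jiebacfg', '%s');\n" "$escaped"
}

# Usage (assumption: psql connects without extra options):
#   session_sql '新华三' | psql
```

Reloading on every connection repeats the dictionary load each time, so this trades startup cost for consistent results across sessions.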

@SharkSyl

How do I get a custom dictionary to take effect? I tried the following, but the dictionary was not picked up:

root@3152d06267d8:/usr/share/postgresql/15/tsearch_data# ls
danish.stop   french.stop            hunspell_sample_long.affix  ispell_sample.affix  jieba_hmm.model  nepali.stop      spanish.stop          turkish.stop
dutch.stop    german.stop            hunspell_sample_long.dict   ispell_sample.dict   jieba.idf        norwegian.stop   swedish.stop          unaccent.rules
english.stop  hungarian.stop         hunspell_sample_num.affix   italian.stop         jieba.stop       portuguese.stop  synonym_sample.syn    xsyn_sample.rules
finnish.stop  hunspell_sample.affix  hunspell_sample_num.dict    jieba_base.dict      jieba_user.dict  russian.stop     thesaurus_sample.ths
root@3152d06267d8:/usr/share/postgresql/15/tsearch_data# cd ..
root@3152d06267d8:/usr/share/postgresql/15# cd ..
root@3152d06267d8:/usr/share/postgresql# mkdir tsearch_data
root@3152d06267d8:/usr/share/postgresql# cd tsearch_data/
root@3152d06267d8:/usr/share/postgresql/tsearch_data# cp ../15/tsearch_data/jieba_user.dict ./
root@3152d06267d8:/usr/share/postgresql/tsearch_data# ls
jieba_user.dict
root@3152d06267d8:/usr/share/postgresql/tsearch_data# psql
psql (15.10 (Debian 15.10-0+deb12u1), server 15.8 (Debian 15.8-1.pgdg120+1))
Type "help" for help.

postgres=# CREATE EXTENSION pg_jieba;
ERROR:  extension "pg_jieba" already exists
postgres=# select to_tsvector('jiebacfg', '你是AI助手云计算泽阳');
   to_tsvector
-----------------
 'ai':3 '助手':4
(1 row)

postgres=# exit
root@3152d06267d8:/usr/share/postgresql/tsearch_data# cat ../15/tsearch_data/jieba_user.dict
创新办 3 i
云计算 5
凱特琳 nz
台中
ai助手

This is my Dockerfile:

# Use a small base image
FROM debian:bookworm-slim AS builder

# Install the required tools and dependencies
RUN apt-get update && \
    apt-get install -y \
    wget \
    unzip \
    cmake \
    make \
    gcc \
    g++ \
    git \
    libpq-dev \
    gnupg \
    postgresql-server-dev-15 && \
    echo "deb http://apt.postgresql.org/pub/repos/apt bookworm-pgdg main" > /etc/apt/sources.list.d/pgdg.list && \
    wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - && \
    apt-get update

# Copy the pg_jieba archive
COPY pg_jieba.zip /tmp/pg_jieba.zip

# Unpack and build pg_jieba
RUN unzip /tmp/pg_jieba.zip -d /tmp/pg_jieba && \
    cd /tmp/pg_jieba/pg_jieba && \
    git submodule update --init --recursive && \
    mkdir build && \
    cd build && \
    cmake -DPostgreSQL_TYPE_INCLUDE_DIR=$(pg_config --includedir-server) .. && \
    make && \
    make install

# Build the final image from a smaller base image
FROM pgvector/pgvector:pg15

# Copy the built pg_jieba files
COPY --from=builder /usr/share/postgresql/15/extension/ /usr/share/postgresql/15/extension/
COPY --from=builder /usr/share/postgresql/15/tsearch_data/ /usr/share/postgresql/15/tsearch_data/
COPY --from=builder /usr/lib/postgresql/ /usr/lib/postgresql/

# Create an init script that appends shared_preload_libraries = 'pg_jieba' to postgresql.conf
RUN echo "  \n\
  echo \"shared_preload_libraries = 'pg_jieba'\" >> /var/lib/postgresql/data/pgdata/postgresql.conf" \
  > /docker-entrypoint-initdb.d/init-dict.sh \
  && echo "CREATE EXTENSION pg_jieba;" > /docker-entrypoint-initdb.d/init-jieba.sql 

# Clean up leftover files
RUN apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
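
One thing worth checking in the Dockerfile above is the hard-coded path /var/lib/postgresql/data/pgdata/postgresql.conf, which only matches when PGDATA points at that directory. Below is a minimal sketch of an init script that relies on the PGDATA variable set by the official postgres image instead. The function name enable_preload is hypothetical, and whether this resolves the problem reported here is untested.

```shell
#!/bin/sh
# Hypothetical init script body for /docker-entrypoint-initdb.d/.
# Append the preload setting to a postgresql.conf unless the exact
# line is already present, so repeated runs do not duplicate it.
enable_preload() {
    conf=$1
    line="shared_preload_libraries = 'pg_jieba'"
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Only act when PGDATA is set, as it is inside the postgres image.
if [ -n "${PGDATA:-}" ]; then
    enable_preload "$PGDATA/postgresql.conf"
fi
```

Appending a new line overrides any earlier shared_preload_libraries entry in the same file, so other preloaded libraries would have to be listed on that line too. The setting also only takes effect once the server has been restarted after the file was changed.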

The startup log contains the following entry. Is it related?

pgvector-1  | 2025-01-24 00:54:38.352 UTC [190] LOG:  pg_jieba Extension is not loaded by shared_preload_libraries, Variables can't be configured

I have confirmed that postgresql.conf contains the following setting. Replacing pg_jieba with pg_jieba.so produces the same log entry.

# Add settings for extensions here
shared_preload_libraries = 'pg_jieba'
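
To double-check which value the file actually yields, the last uncommented assignment can be extracted, since PostgreSQL uses the last occurrence when a parameter is set more than once in the file. A rough sketch follows. effective_preload is a hypothetical helper that only handles the name = value form; it does not follow include directives or postgresql.auto.conf.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last uncommented
# shared_preload_libraries assignment in a postgresql.conf file.
effective_preload() {
    sed -n "s/^[[:space:]]*shared_preload_libraries[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Usage (the path is an assumption, adjust it to the real data directory):
#   effective_preload "$PGDATA/postgresql.conf"
```

On a running server, SHOW shared_preload_libraries; reports the value the server actually started with, which is the more reliable check.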
