cutforsearch #54

Open
bryrosal opened this issue Feb 21, 2019 · 2 comments

bryrosal commented Feb 21, 2019

Hi, this is my first time using this library, so please bear with me :).
I tried the cutForSearch demo:
$seg_list = Jieba::cutForSearch("小明硕士毕业于中国科学院计算所,后在日本京都大学深造"); # search engine mode
var_dump($seg_list);

The demo output is array(18), without the comma, but when I run it locally the output is array(19), with the comma.
That run uses Jieba::init(array('mode'=>'test','dict'=>'big'));
(screenshot of the output)

If I call Jieba::init only, without those options, the output is array(20).

(screenshot of the output)

@bryrosal
Author

How can I remove this punctuation pattern for cutForSearch only?

$re_punctuation_pattern = '([\x{ff5e}\x{ff01}\x{ff08}\x{ff09}\x{300e}'.
'\x{300c}\x{300d}\x{300f}\x{3001}\x{ff1a}\x{ff1b}'.
'\x{ff0c}\x{ff1f}\x{3002}]+)';

@fukuball
Owner

You can replace the punctuation with spaces first and then call cutForSearch. That may give you the result you want.
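
A rough sketch of that suggestion, in case it helps. The helper name replacePunctuationWithSpace is hypothetical and not part of jieba-php. The character class reuses the full-width punctuation from the pattern quoted above, and adding the ASCII comma is an assumption made because the demo sentence in this issue contains one. The Jieba::cutForSearch call is left commented out since it needs the library installed and initialized.

```php
<?php
// Hypothetical helper (not part of jieba-php): replace punctuation with a
// space so the segmenter never sees it.
function replacePunctuationWithSpace(string $text): string
{
    // Full-width punctuation copied from the pattern quoted in this issue,
    // plus the ASCII comma (an assumption, see the note above).
    // The "u" modifier is needed so PCRE accepts the \x{....} code points.
    $pattern = '/[\x{ff5e}\x{ff01}\x{ff08}\x{ff09}\x{300e}'
             . '\x{300c}\x{300d}\x{300f}\x{3001}\x{ff1a}\x{ff1b}'
             . '\x{ff0c}\x{ff1f}\x{3002},]+/u';

    // preg_replace returns null on failure; fall back to the input then.
    $spaced = preg_replace($pattern, ' ', $text) ?? $text;

    // Collapse runs of whitespace and drop leading/trailing spaces.
    $collapsed = preg_replace('/\s+/u', ' ', $spaced) ?? $spaced;

    return trim($collapsed);
}

$text = "小明硕士毕业于中国科学院计算所,后在日本京都大学深造";
$clean = replacePunctuationWithSpace($text);
echo $clean, PHP_EOL; // 小明硕士毕业于中国科学院计算所 后在日本京都大学深造

// With jieba-php loaded and initialized, the cleaned text would then be cut:
// $seg_list = Jieba::cutForSearch($clean);
// var_dump($seg_list);
```

This leaves the library's own punctuation pattern untouched, so other cut modes keep their current behavior. Whether the spaces themselves show up as tokens in the result is something to check against your local output.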
