Problems when using a custom dictionary #1
Comments
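I just took a look at your code:
senti = Sentiment(pos=r"D:\情感词典\负面词典.txt",  # relative path to the positive dictionary txt file
                  neg=r"D:\情感词典\正面词典.txt",  # relative path to the negative dictionary txt file
                  encoding='unicode_escape')  # both txt files are UTF-8 encoded
There are three possible causes:
1. pos and neg stand for positive and negative, and you passed them the wrong way round: the negative dictionary (负面词典.txt) is given as pos and the positive dictionary (正面词典.txt) as neg.
2. The encoding parameter takes the encoding of the custom dictionary files. If your txt files are UTF-8, encoding='utf-8' is enough. I am not familiar with unicode_escape and do not know whether it also causes problems.
3. cnsenti uses jieba for word segmentation, so jieba may segment new sentiment words incorrectly. I have not developed this part in cnsenti yet; it will be improved later.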
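Regarding point 2, here is a minimal sketch (standard library only, independent of cnsenti) of why `unicode_escape` is unlikely to be the right codec for a UTF-8 dictionary. Python's `unicode_escape` decoder treats non-escape bytes as Latin-1, so each 3-byte Chinese character comes out as three unrelated characters and no longer equals the word in the text being analyzed. How cnsenti opens the file internally is not shown in this thread, so this only illustrates the decoding step.

```python
# A dictionary word as it would be stored in a UTF-8 text file.
word = "中流砥柱"
raw = word.encode("utf-8")  # 12 bytes: 3 bytes per character

# Decoding with the matching codec recovers the word.
as_utf8 = raw.decode("utf-8")

# Decoding with unicode_escape maps every byte to one Latin-1 character.
as_escape = raw.decode("unicode_escape")

print(as_utf8 == word)    # True
print(as_escape == word)  # False
print(len(as_escape))     # 12
```

A word decoded this way can never match a token from the input text, which would be consistent with a positive count of 0.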
…------------------ Original message ------------------
From: "DesmondLiu"
Sent: Sunday, 12 April 2020, 10:48 am
To: "thunderhit/cnsenti"
Cc: "Subscribed"
Subject: [thunderhit/cnsenti] Problems when using a custom dictionary (#1)
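Hello, I ran into some problems while doing sentiment analysis with the "custom dictionary" option and would like to ask for your advice.
Issue 1: the dictionary is not recognized. Here is the code.
`from cnsenti import Sentiment
senti = Sentiment(pos=r"D:\情感词典\负面词典.txt",  # relative path to the positive dictionary txt file
                  neg=r"D:\情感词典\正面词典.txt",  # relative path to the negative dictionary txt file
                  encoding='unicode_escape')  # both txt files are UTF-8 encoded
test_text = '这家公司是行业的引领者,是中流砥柱。今年的业绩非常好'
result1 = senti.sentiment_count(test_text)
result2 = senti.sentiment_calculate(test_text)
print('sentiment_count',result1)
print('sentiment_calculate',result2)`
Here is the output. The reported number of positive words is 0, although two words in this text, "引领者" (leader) and "中流砥柱" (mainstay), are both positive words in the dictionary.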
`D:\Anaconda3\python.exe D:/pythonproject/test1/金融文本情感-中文-cnsenti.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\刘铭基\AppData\Local\Temp\jieba.cache
Loading model cost 0.675 seconds.
sentiment_count {'words': 15, 'sentences': 2, 'pos': 0, 'neg': 0}
Prefix dict has been built succesfully.
sentiment_calculate {'sentences': 2, 'words': 15, 'pos': 0, 'neg': 0}
Process finished with exit code 0
`
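One way a count can stay at 0 even when a word is in the dictionary is that the segmenter never produces that word as a single token. The sketch below uses a toy forward-maximum-matching segmenter as a stand-in for jieba (it is not jieba's algorithm, and `segment`, `count_hits`, and `base_vocab` are hypothetical names for illustration) to show how a lexicon lookup misses "引领者" when it is split into "引领" + "者".

```python
def segment(text, vocab, max_len=4):
    """Forward maximum matching: a toy stand-in for a real segmenter."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def count_hits(tokens, lexicon):
    """Count tokens that appear in the sentiment lexicon."""
    return sum(1 for token in tokens if token in lexicon)


pos_words = {"引领者", "中流砥柱"}
text = "这家公司是行业的引领者"

# Assumed segmenter vocabulary that does not know the custom words.
base_vocab = {"这家", "公司", "行业", "引领"}

tokens_before = segment(text, base_vocab)
tokens_after = segment(text, base_vocab | pos_words)

print(tokens_before, count_hits(tokens_before, pos_words))
print(tokens_after, count_hits(tokens_after, pos_words))
```

With jieba itself, custom words can be registered through its documented `jieba.add_word` or `jieba.load_userdict` functions; whether cnsenti does this for custom dictionaries is not stated in this thread.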
Issue 2: if the text being analyzed ends with a full stop (。), an error is raised. Here is the code.
`from cnsenti import Sentiment
senti = Sentiment(pos=r"D:\情感词典\负面词典.txt",  # relative path to the positive dictionary txt file
                  neg=r"D:\情感词典\正面词典.txt",  # relative path to the negative dictionary txt file
                  encoding='unicode_escape')  # both txt files are UTF-8 encoded
test_text = '这家公司是行业的引领者,是中流砥柱。今年的业绩非常好。'
result1 = senti.sentiment_count(test_text)
result2 = senti.sentiment_calculate(test_text)
print('sentiment_count',result1)
print('sentiment_calculate',result2)`
Here is the output.
`D:\Anaconda3\python.exe D:/pythonproject/test1/金融文本情感-中文-cnsenti.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\刘铭基\AppData\Local\Temp\jieba.cache
Loading model cost 0.685 seconds.
Prefix dict has been built succesfully.
Traceback (most recent call last):
  File "D:/pythonproject/test1/金融文本情感-中文-cnsenti.py", line 9, in <module>
    result2 = senti.sentiment_calculate(test_text)
  File "D:\Anaconda3\lib\site-packages\cnsenti\sentiment.py", line 218, in sentiment_calculate
    pos = np.sum(score_array[:, 0])
IndexError: too many indices for array
Process finished with exit code 1
`
Finally, thank you very much for your contribution. I hope you can help!
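Thank you very much for your reply!
1. I had swapped the positive and negative dictionaries.
2. I changed the encoding to "utf-8".
However, the sentiment words are still not recognized.
Also, regarding the second issue, the error raised when the analyzed text ends with a full stop: is there a way to fix it? The code is as follows.
from cnsenti import Sentiment
senti = Sentiment(pos=r"D:\情感词典\正面词典.txt",  # relative path to the positive dictionary txt file
                  neg=r"D:\情感词典\负面词典.txt",  # relative path to the negative dictionary txt file
                  encoding='utf-8')  # both txt files are UTF-8 encoded
test_text = '这家公司是行业的引领者,是中流砥柱。今年的业绩非常好。'
result1 = senti.sentiment_count(test_text)
result2 = senti.sentiment_calculate(test_text)
print('sentiment_count',result1)
print('sentiment_calculate',result2)
The output is as follows: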
D:\Anaconda3\python.exe D:/pythonproject/test1/金融文本情感-中文-cnsenti.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\AppData\Local\Temp\jieba.cache
Loading model cost 0.739 seconds.
Prefix dict has been built succesfully.
Traceback (most recent call last):
  File "D:/pythonproject/test1/金融文本情感-中文-cnsenti.py", line 9, in <module>
    result2 = senti.sentiment_calculate(test_text)
  File "D:\Anaconda3\lib\site-packages\cnsenti\sentiment.py", line 218, in sentiment_calculate
    pos = np.sum(score_array[:, 0])
IndexError: too many indices for array
Process finished with exit code 1
Thank you very much for your help!
…------------------ Original message ------------------
From: "thunderhit"
Sent: Sunday, 12 April 2020, 11:00 am
To: "thunderhit/cnsenti"
Cc: "Author"
Subject: Re: [thunderhit/cnsenti] Problems when using a custom dictionary (#1)
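On the first issue: I ran the code on my side and it recognized 中流砥柱, so I see no problem there.
On the second issue you raised: I am working on a fix now.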
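When the same code recognizes a word on one machine but not on another, the dictionary file itself is worth checking. The sketch below assumes a one-word-per-line txt file (an assumption, not confirmed in this thread) and uses a hypothetical helper `load_words` to show one common pitfall: a file saved as "UTF-8 with BOM" keeps an invisible `\ufeff` on its first word when read with `utf-8`, while `utf-8-sig` strips it.

```python
import os
import tempfile


def load_words(path, encoding="utf-8"):
    """Hypothetical helper: read one word per line, dropping blank lines."""
    with open(path, encoding=encoding) as fh:
        return {line.strip() for line in fh if line.strip()}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pos.txt")
    # Simulate an editor that saves the file as UTF-8 with a BOM.
    with open(path, "w", encoding="utf-8-sig") as fh:
        fh.write("引领者\n中流砥柱 \n")

    plain = load_words(path, "utf-8")    # BOM stays attached to the first word
    sig = load_words(path, "utf-8-sig")  # BOM is removed

print("引领者" in plain)  # False
print("引领者" in sig)    # True
print(sorted(repr(w) for w in plain))
```

Printing the `repr` of each loaded word makes hidden characters such as the BOM visible.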
…--- Original message ---
From: "DesmondLiu"
Sent: Sunday, 12 April 2020, 11:08 am
To: "thunderhit/cnsenti"
Cc: "Comment"; "thunderhit"
Subject: Re: [thunderhit/cnsenti] Problems when using a custom dictionary (#1)
OK. Thank you again!
Sent from my iPhone
…------------------ Original message ------------------
From: thunderhit
Sent: 12 April 2020, 11:31
To: thunderhit/cnsenti
Cc: DesmondLiu, Author
Subject: Re: [thunderhit/cnsenti] Problems when using a custom dictionary (#1)
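I have updated the code. You can uninstall cnsenti,
wait about half an hour, and then install the new cnsenti.
Here is the cnsenti documentation; the multiple-sentence problem has been fixed:
https://github.com/thunderhit/cnsenti/blob/master/README.md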
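The thread does not show what changed in the library. As a sketch of one plausible cause only, assuming (not confirmed here) that sentences are split on terminal punctuation: a trailing 。 leaves an empty final piece, and an empty piece that produces no per-sentence scores could leave the score array without a second dimension, so that `score_array[:, 0]` fails. The hypothetical `split_sentences` below drops such empty pieces.

```python
import re


def split_sentences(text):
    """Split on Chinese sentence-ending punctuation, dropping empty pieces."""
    parts = re.split(r"[。!?]", text)
    return [part for part in parts if part.strip()]


text = "这家公司是行业的引领者,是中流砥柱。今年的业绩非常好。"

naive = re.split(r"[。!?]", text)  # keeps the empty string after the last 。
cleaned = split_sentences(text)

print(len(naive), repr(naive[-1]))  # 3 ''
print(len(cleaned))                 # 2
```

Filtering empty sentences before scoring keeps every row of the score list the same length, whatever the input's final character is.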
OK. Thank you very much!
I wish you all the best in your work!
Sent from my iPhone
…------------------ Original message ------------------
From: thunderhit
Sent: 12 April 2020, 12:13
To: thunderhit/cnsenti
Cc: DesmondLiu, Author
Subject: Re: [thunderhit/cnsenti] Problems when using a custom dictionary (#1)