The following code appears to have a bug: it reshapes a matrix in an apparent attempt to transpose it:
```julia
function embedding(embedding_matrix, x)
    # inefficient code:
    temp = embedding_matrix[:, Int64(x[1])+1]
    for i = 2:length(x)
        temp = hcat(temp, embedding_matrix[:, Int64(x[i])+1])
    end
    # bug: reshape reorders elements column-major, it does not transpose
    return reshape(temp, reverse(size(temp)))
end
```
I propose the following:
```julia
function embedding(embedding_matrix, x)
    return embedding_matrix[:, Int64.(x) .+ 1]'
end
```
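To illustrate why the original version is wrong (a small standalone demonstration, not code from the package): `reshape` keeps the column-major element order and only changes the dimensions, whereas `permutedims` (or `'`) actually transposes.

```julia
# A 2×3 matrix with distinct entries
A = [1 2 3;
     4 5 6]

# reshape yields a 3×2 matrix, but the entries are scrambled,
# because Julia stores matrices column-major (1, 4, 2, 5, 3, 6)
reshape(A, reverse(size(A)))  # == [1 5; 4 3; 2 6]

# permutedims gives the true transpose
permutedims(A)                # == [1 4; 2 5; 3 6]
```

So the reshaped embedding matrix silently mixes up the embedding vectors rather than transposing them.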
# After this change the results of the following are much improved
```julia
d_good = StringDocument("A very nice thing that everyone likes")
prepare!(d_good, strip_case | strip_punctuation)
d_bad = StringDocument("A boring long dull movie that no one likes")
prepare!(d_bad, strip_case | strip_punctuation)
s = SentimentAnalyzer()
s(d_good)
s(d_bad)
```
I believe there is a deeper problem in the word embedding. Bad words such as "hate" get good scores.
```julia
using TextAnalysis

d_bad = StringDocument("a horrible thing that everyone hates")
prepare!(d_bad, strip_case | strip_punctuation)
s = SentimentAnalyzer()
# `words` is a collection of single-word documents, e.g.:
words = [StringDocument(w) for w in split("a horrible thing that everyone hates")]
[s(w) for w in words]
```
I note that there are 88587 words in the dictionary (`length(s.model.words)`), but the lookup table has only 5000 entries (`size(s.model.weight[:embedding_1]["embedding_1"]["embeddings:0"], 2)`).
Invoking `s(d_illegal)`, where `d_illegal` is any `StringDocument` containing a word that maps to an index higher than 5000, will cause an error. I don't know where the weights come from exactly, so I can't track it down.
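The failure mode can be sketched in isolation (a hypothetical miniature, with made-up sizes standing in for the 88587-word dictionary and the 5000-column table): indexing an embedding table with a word id beyond its column count throws a `BoundsError`.

```julia
# Stand-ins for the real sizes (88587 words vs. 5000 embedding columns)
table_cols = 5                # columns actually present in the lookup table
table = rand(3, table_cols)   # 3-dimensional embeddings, 5 columns

word_id = 7                   # a word whose dictionary index exceeds the table

err = try
    table[:, word_id]         # this is the kind of lookup s(d_illegal) performs
    nothing
catch e
    e
end

err isa BoundsError           # true: the lookup fails rather than degrading
```

This suggests either the published weights are truncated to a 5000-word vocabulary, or the word-to-index mapping should be restricted to the ids the table actually covers.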