Skip to content

Latest commit

 

History

History
17 lines (10 loc) · 379 Bytes

文本相似度专题.md

File metadata and controls

17 lines (10 loc) · 379 Bytes

编辑距离

Jaccard similarity:

$$ J(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A|+|B|-|A \cap B|} $$

Jaccard distance:

$$ D(A,B) = 1 - J(A,B) $$

当每个标签具有权重时,相似度定义为:

$$ J(x,y) = \frac{\sum\nolimits_{i}min(x_i,y_i)}{\sum\nolimits_{i}max(x_i,y_i)} $$