Improve model score #119

connorjoleary · 2022-01-28T00:05:35Z

Is your feature request related to a problem? Please describe.
Currently the model scores each block of text by its similarity to its parent. Then, after every block has been scored, it multiplies this score by the block's similarity to the original claim. This algorithm leads to two problems:

1. Claims that are more dissimilar to the original claim but further down in the tree don't get scored highly. We should take into account how far down the tree the source is and, if it is very far down, increase the score.
2. The algorithm can have runaway branches, where it doesn't realize how far off the claim it is going because it only looks at the similarity to the parent.
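To make the two problems concrete, here is a minimal Python sketch of the current scoring as described above. It assumes each block of text is already embedded as a vector and that parent similarities are combined by multiplying them along the branch; both are assumptions, and the names (`cosine`, `parent_only_scores`, `current_score`, `unit`) and the toy 2-D vectors are illustrative, not DeepCite's actual code.

```python
import math

# Assumption: a branch is a list of embedding vectors, from the original claim
# (index 0) down to the candidate source (index -1). Multiplying parent
# similarities along the branch is a hypothetical aggregation.


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def parent_only_scores(path):
    """Running score at each depth when only similarity to the parent is used."""
    scores, score = [], 1.0
    for parent, child in zip(path, path[1:]):
        score *= cosine(parent, child)
        scores.append(score)
    return scores


def current_score(path):
    """Parent similarities first; similarity to the claim applied only at the end."""
    return parent_only_scores(path)[-1] * cosine(path[0], path[-1])


def unit(degrees):
    """2-D unit vector, used here to build a branch that drifts layer by layer."""
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r))


# Each layer is 20 degrees away from its parent, so every parent similarity
# looks good, but the branch ends up 80 degrees away from the claim.
drift = [unit(d) for d in (0, 20, 40, 60, 80)]
print(parent_only_scores(drift))
print(cosine(drift[0], drift[-1]))
print(current_score(drift))
```

On this drifting branch the parent-only score is still about 0.78 after four layers while similarity to the claim has fallen to about 0.17; nothing during the traversal signals the drift until the final multiplication.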

Describe the solution you'd like
A model that calculates a score at each layer, takes into account the similarity to the original claim at every layer (not just at the end), and weights deeper nodes more heavily.
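One way such a model could look, sketched in Python under stated assumptions: blocks are embedded as vectors, each layer's score is a weighted blend of parent similarity and claim similarity, a branch is pruned as soon as its similarity to the claim drops below a threshold, and a per-layer multiplier rewards depth. `proposed_score`, `alpha`, `depth_bonus`, and `min_claim_sim` are hypothetical, and the defaults are untuned placeholders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def proposed_score(path, alpha=0.5, depth_bonus=0.1, min_claim_sim=0.3):
    """Hypothetical layer-by-layer score for a branch path[0] (claim) -> path[-1] (source)."""
    claim = path[0]
    score = 1.0
    for parent, child in zip(path, path[1:]):
        claim_sim = cosine(claim, child)
        if claim_sim < min_claim_sim:
            return 0.0  # branch has drifted too far from the claim: prune it
        # Blend parent and claim similarity at every layer, clamped at 0.
        layer = alpha * cosine(parent, child) + (1 - alpha) * claim_sim
        score *= max(0.0, layer)
    depth = len(path) - 1
    # Reward deeper sources; this is a ranking score and may exceed 1.
    return score * (1 + depth_bonus) ** depth


def unit(degrees):
    """2-D unit vector for toy examples."""
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r))


drift = [unit(d) for d in (0, 20, 40, 60, 80)]  # wanders away from the claim
shallow = [unit(0), unit(10)]
deep = [unit(0), unit(10), unit(10), unit(10)]  # same topic, further down
print(proposed_score(drift), proposed_score(shallow), proposed_score(deep))
```

The drifting branch is cut off once it reaches 80 degrees from the claim, and the deeper branch outranks the shallow one. The multiplicative depth bonus is only one option; an additive bonus or a learned weight would fit the same structure.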

Describe alternatives you've considered
Potentially this is where we could introduce a simple ML model that learns these scores from the data we have labeled, but this seems like overkill at this point.

Additional context
Using a Jupyter notebook to rerun the data would be a good way to test this. Then you can see whether the labeled data is getting higher or lower scores.
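For the notebook, one simple check is to compare how far apart the scores of correctly and incorrectly labeled sources are under the old and new scoring. This sketch assumes the labeled data can be loaded as `(path, is_correct)` pairs and that each scorer maps a path to a float; `score_gap` and `compare` are hypothetical helpers, and the toy data is made up.

```python
from statistics import mean


def score_gap(labeled, scorer):
    """Mean score of correct sources minus mean score of incorrect ones."""
    correct = [scorer(path) for path, ok in labeled if ok]
    incorrect = [scorer(path) for path, ok in labeled if not ok]
    if not correct or not incorrect:
        raise ValueError("need at least one correct and one incorrect example")
    return mean(correct) - mean(incorrect)


def compare(labeled, scorers):
    """Return {name: gap} so old and new scoring can be compared side by side."""
    return {name: score_gap(labeled, fn) for name, fn in scorers.items()}


# Toy, made-up data: each "path" is just a number standing in for a branch.
toy = [(0.9, True), (0.8, True), (0.4, False), (0.2, False)]
print(compare(toy, {"old": lambda p: 0.5, "new": lambda p: p}))
```

A larger gap means the scorer separates correct sources from incorrect ones more clearly, which is a more direct signal than whether labeled scores simply went up or down.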


connorjoleary self-assigned this Jan 29, 2022

connorjoleary added a commit that referenced this issue Apr 14, 2022

Merge branch 'connor-#119' of github.com:connorjoleary/DeepCite into …

8c3e26c

…connor-#119
