
Simplify evaluation to plain functions #3

Open
rafguns opened this issue Feb 13, 2015 · 1 comment

Comments

rafguns commented Feb 13, 2015

sklearn.metrics is in a way much simpler, using plain functions. Can we do something analogous, or even depend on scikit-learn for things like ROC, recall-precision, etc.?
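For context, the plain-function style in question looks like this (a minimal sketch with made-up labels and scores; in linkpred these would come from a predictor):

import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve

y_true = np.array([1, 0, 1, 1, 0])              # ground truth per node pair
y_score = np.array([0.9, 0.4, 0.7, 0.2, 0.1])   # predicted link scores

fpr, tpr, _ = roc_curve(y_true, y_score)
precision, recall, _ = precision_recall_curve(y_true, y_score)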

rafguns commented Nov 6, 2019

Looking at this again. The main issue is that linkpred uses its own data structure (Scoresheet) to track prediction scores. This has at least two advantages:

  1. Node order is never a problem: (a, b) == (b, a), and the ranking of pairs with equal scores is deterministic.
  2. Only node pairs for which there is a prediction need to be tracked, which is less memory-intensive. This is especially a concern for larger networks: e.g., 5,000 nodes yield 12,497,500 node pairs (see the quick check below).

Point 2, especially, is fundamentally different from scikit-learn, whose metric functions expect arrays covering every candidate pair.
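A quick check of that pair count (n choose 2) with the standard library:

import math

math.comb(5000, 2)  # 12497500 unordered node pairs for 5000 nodes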

The way forward is probably to replace Scoresheet with a pandas Series whose keys are all node pairs and whose values are scores. The index could be built prior to evaluation:

import itertools
import pandas as pd

idx = pd.MultiIndex.from_tuples(itertools.combinations(G.nodes(), 2))

and shared across evaluations. The underlying numpy array could then be passed to scikit-learn metrics.
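A rough end-to-end sketch of how that could fit together (assuming a networkx graph G; the predictions dict and y_true Series are hypothetical stand-ins, not existing linkpred API):

import itertools

import networkx as nx
import pandas as pd
from sklearn.metrics import roc_auc_score

G = nx.erdos_renyi_graph(50, 0.1, seed=42)   # stand-in for the test network
idx = pd.MultiIndex.from_tuples(itertools.combinations(G.nodes(), 2))

# Unpredicted pairs default to 0; only predicted pairs are filled in.
scores = pd.Series(0.0, index=idx)
predictions = {(0, 1): 0.9, (2, 3): 0.4}     # hypothetical predictor output
for pair, score in predictions.items():
    scores.loc[pair] = score                 # pair order must match the index (point 1)

# Ground truth: 1 where the pair is an edge in the test network.
y_true = pd.Series([int(G.has_edge(u, v)) for u, v in idx], index=idx)

# The underlying numpy arrays can be handed straight to scikit-learn.
print(roc_auc_score(y_true.to_numpy(), scores.to_numpy()))

Sharing the index across evaluations would then amount to reusing the same MultiIndex object when building each Series.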

I am not yet sure how best to deal with point 1, or to what extent it actually constitutes a problem.
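One possible way to handle point 1 (just a sketch, not a settled design): canonicalize each pair before it is used as a key, so that (a, b) and (b, a) always map to the same index entry. The predictor's output would need the same normalization as the index.

def canonical(pair):
    # Hypothetical helper: order endpoints so (a, b) and (b, a) share a key.
    # Assumes node labels are mutually comparable.
    u, v = pair
    return (u, v) if u <= v else (v, u)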
