Plan out what a multigraph approach to coordination networks is #24

Open
SamHames opened this issue Oct 8, 2021 · 1 comment

SamHames commented Oct 8, 2021

Thoughts to consider, off the top of my head:

  • What does it mean to create multigraphs - do we need to keep track of which graphs have been created?
  • Does the CLI need some changes - I feel like we may need a different interface that makes sense of the idea that outputs are multigraphs by default?
  • Are there implications for the downstream graph formats?
@SamHames
Collaborator Author

Multigraph proposal

What makes sense to me is that we start to think about multigraphs as composed not only of different types of networks, but also of different parameters for the same type of network. For example, co-reply networks built with windows of 60, 900, and 3600 seconds are qualitatively different and describe different kinds of coordination, even though they're all co-reply networks.

The multigraph of coordination we construct is then composed of accounts and associated metadata as the nodes, as it is now, but with distinct types of edges. Each directed edge is characterised by the count of coordinated messages, the type of event used for coordination, the leading and lagging time windows (to account for symmetric and asymmetric windows, which allows us to tackle #13), and any additional parameters used in the network construction (such as the similarity threshold).
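
To make the edge description above concrete, here is a minimal sketch of how such edges could be keyed. All class, field, and function names are hypothetical and not taken from the existing codebase; the point is only that two edges between the same pair of accounts stay distinct when their type or parameters differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeKey:
    """Identifies one typed edge in the multigraph (hypothetical names)."""
    source: str               # account id at the tail of the directed edge
    target: str               # account id at the head of the directed edge
    network_type: str         # type of event used for coordination
    lead_window: int          # leading time window, in seconds
    lag_window: int           # lagging time window, in seconds
    extra_params: tuple = ()  # e.g. (("similarity_threshold", 0.8),)


def add_edge(edges, key, count):
    """Accumulate the count of coordinated messages for one typed edge."""
    edges[key] = edges.get(key, 0) + count


edges = {}
# Same pair of accounts, same network type, different windows: two edges.
add_edge(edges, EdgeKey("account_a", "account_b", "co_reply", 60, 60), 3)
add_edge(edges, EdgeKey("account_a", "account_b", "co_reply", 0, 3600), 11)
# Same key as the first edge: the count is added to the existing edge.
add_edge(edges, EdgeKey("account_a", "account_b", "co_reply", 60, 60), 2)
```

Keeping the leading and lagging windows as separate fields means a symmetric window is just the case where both values are equal.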

I propose that a starting point for enabling this is the following workflow, which extends slightly on the current workflow:

  1. Preprocess data to ingest messages, as is done now
  2. Construct as many networks as necessary, either by running a series of individual commands, or by specifying a set of networks and different parameters from a configuration file.
  3. Output a single multigraph in graphml format, including all constructed networks by default, or a selected subset.
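
As a rough illustration of step 3, the sketch below writes parallel, typed edges into a single GraphML document using only the standard library. The function name, the edge dictionary layout, and the attribute names are assumptions made for this example, not the toolkit's actual output format.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Hypothetical edge attributes and their GraphML types.
EDGE_ATTRS = {
    "network_type": "string",
    "lead_window": "int",
    "lag_window": "int",
    "weight": "int",
}


def multigraph_to_graphml(nodes, edges):
    """Serialise accounts and typed edges to a GraphML string."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare each edge attribute once, before the graph element.
    for name, attr_type in EDGE_ATTRS.items():
        ET.SubElement(root, "key", {
            "id": name, "for": "edge",
            "attr.name": name, "attr.type": attr_type,
        })
    graph = ET.SubElement(root, "graph", {"edgedefault": "directed"})
    for node in nodes:
        ET.SubElement(graph, "node", {"id": str(node)})
    # Each edge gets its own id, so parallel edges between the same
    # pair of accounts remain distinguishable.
    for i, edge in enumerate(edges):
        element = ET.SubElement(graph, "edge", {
            "id": f"e{i}", "source": edge["source"], "target": edge["target"],
        })
        for name in EDGE_ATTRS:
            data = ET.SubElement(element, "data", {"key": name})
            data.text = str(edge[name])
    return ET.tostring(root, encoding="unicode")


all_edges = [
    {"source": "a", "target": "b", "network_type": "co_reply",
     "lead_window": 60, "lag_window": 60, "weight": 5},
    {"source": "a", "target": "b", "network_type": "co_reply",
     "lead_window": 3600, "lag_window": 3600, "weight": 11},
]
graphml_text = multigraph_to_graphml(["a", "b"], all_edges)
# Exporting a selected subset is then just a filter before serialising.
subset_text = multigraph_to_graphml(
    ["a", "b"], [e for e in all_edges if e["lead_window"] == 60]
)
```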

The new components here are:

  • a new data structure to store networks that is aware of edge weights, edge types, and associated parameters
  • step 2 requires keeping track of both the edges and the associated parameters for that network construction, rather than just the type of network as is currently the case
  • step 2 and step 3 also imply that we will have a more standardised tracking infrastructure for listing which network types and parameters have already been run
  • functionality to read from a configuration file and map that to a set of networks to be created
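
For the configuration file idea, one possible shape (purely an assumption, as are the JSON format, the key names, and the network type names used here) is a list of network types with lists of parameter values, expanded into one construction job per combination:

```python
import json
from itertools import product

# Hypothetical configuration: each entry names a network type and the
# parameter values to build it with.
CONFIG = """
{
  "networks": [
    {"type": "co_reply", "time_window": [60, 900, 3600]},
    {"type": "co_similar_tweet", "time_window": [60],
     "similarity_threshold": [0.8, 0.9]}
  ]
}
"""


def expand_network_specs(config_text):
    """Return one spec per combination of parameter values in the config."""
    specs = []
    for entry in json.loads(config_text)["networks"]:
        params = {k: v for k, v in entry.items() if k != "type"}
        names = sorted(params)
        # Cartesian product over the listed values of every parameter.
        for values in product(*(params[name] for name in names)):
            specs.append({"type": entry["type"], **dict(zip(names, values))})
    return specs


specs = expand_network_specs(CONFIG)
```

Each resulting spec carries both the network type and its parameters, which is the same information the inventory of already-built networks would need to record.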

The machinery for this also suggests a couple of possible quality-of-life improvements for workflows that let us tackle some of #25:

  • if we're keeping track of the networks that have already been created in a convenient inventory, we can also start to track the data/files that have been inserted, and whether the networks are actually up to date
  • Preprocessing that results in new data being inserted could mark existing networks as stale, and we could provide functionality to refresh those networks in bulk from a single command
  • Writing graphml files could also warn or error if networks are marked as stale

This also might be a good opportunity to review the CLI, and see if we can refactor that to be a little more consistent.
