Plan out what a multigraph approach to coordination networks is #24

SamHames · 2021-10-08T01:54:53Z

Some thoughts to consider, off the top of my head:

What does it mean to create multigraphs - do we need to keep track of which graphs have been created?
Does the CLI need some changes? I feel like there may need to be a different interface that makes sense of the idea that outputs are multigraphs by default.
Are there implications for the downstream graph formats?

SamHames · 2021-12-21T04:12:31Z

Multigraph proposal

What makes sense to me is that we start to think about multigraphs as composed of both different types of networks and different parameters for the same network. For example, co-reply networks built on windows of 60, 900, and 3600 seconds are qualitatively different and describe different kinds of coordination, even though they're all co-reply networks.

The multigraph of coordination we construct is then composed of accounts and associated metadata as the nodes, as it is now, but with distinct types of edges. Each directed edge is characterised by the count of coordinated messages, the type of event used for coordination, the leading and lagging time windows (to account for symmetric and asymmetric windows, which allows us to tackle #13), and any additional parameters used in the network construction (such as the similarity threshold).
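To make the edge description above concrete, here is a minimal sketch of how one typed edge could be keyed. This is only an illustration of the idea, not an existing part of the toolkit: `EdgeKey`, `CoordinationMultigraph`, and all field names are hypothetical, and the standard library is the only dependency.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeKey:
    """Identifies one typed, directed edge (all names here are hypothetical)."""

    source: str              # account the edge starts from
    target: str              # account the edge points to
    network_type: str        # type of event used for coordination, e.g. "co_reply"
    lead_window: int         # leading time window, in seconds
    lag_window: int          # lagging time window, in seconds
    extra_params: tuple = () # sorted (name, value) pairs, e.g. (("min_similarity", 0.8),)


class CoordinationMultigraph:
    """Stores a count of coordinated messages per typed edge."""

    def __init__(self):
        self.edge_weights = defaultdict(int)

    def add_coordination(self, key, count=1):
        # Edges that differ in type or parameters are kept separate;
        # only identical keys accumulate into the same weight.
        self.edge_weights[key] += count

    def edges_between(self, source, target):
        # All parallel edges from source to target, one per type/parameter set.
        return {
            key: weight
            for key, weight in self.edge_weights.items()
            if key.source == source and key.target == target
        }
```

With this shape, a 60 second and a 3600 second co-reply edge between the same two accounts stay as two parallel edges rather than being merged into one weight.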

I propose that a starting point for enabling this is the following workflow, which slightly extends the current workflow:

1. Preprocess data to ingest messages, as is done now.
2. Construct as many networks as necessary, either by running a series of individual commands or by specifying a set of networks and their parameters in a configuration file.
3. Output a single multigraph in GraphML format, including all constructed networks by default, or a selected subset.
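As a rough sketch of the output step, the following writes parallel typed edges into a single GraphML document using only the standard library. The function name, the edge dictionary keys, and the `include_types` argument are assumptions made for illustration, not the toolkit's actual interface.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Hypothetical edge attributes, each declared as a GraphML <key>.
EDGE_ATTRS = [
    ("weight", "int"),
    ("network_type", "string"),
    ("lead_window", "int"),
    ("lag_window", "int"),
]


def multigraph_to_graphml(nodes, edges, include_types=None):
    """Serialise nodes and typed edges to a GraphML string.

    edges is a list of dicts with the keys source, target and every
    name in EDGE_ATTRS. include_types=None keeps all networks;
    otherwise only edges whose network_type is in the set are written.
    """
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    for name, attr_type in EDGE_ATTRS:
        ET.SubElement(
            root,
            "key",
            {"id": name, "for": "edge", "attr.name": name, "attr.type": attr_type},
        )
    graph = ET.SubElement(root, "graph", edgedefault="directed")
    for node in nodes:
        ET.SubElement(graph, "node", id=str(node))
    selected = [
        e for e in edges
        if include_types is None or e["network_type"] in include_types
    ]
    for i, edge in enumerate(selected):
        # Each edge gets its own id, so parallel edges between the same
        # pair of accounts remain distinct in the output.
        element = ET.SubElement(
            graph, "edge", id=f"e{i}",
            source=str(edge["source"]), target=str(edge["target"]),
        )
        for name, _ in EDGE_ATTRS:
            data = ET.SubElement(element, "data", key=name)
            data.text = str(edge[name])
    return ET.tostring(root, encoding="unicode")
```

Whether downstream tools treat parallel edges sensibly is a separate question that would need checking per tool.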

The new components here are:

a new data structure to store networks that is aware of edge weights, edge types and associated parameters
step 2 requires keeping track of both the edges and the associated parameters for each network construction, rather than just the type of network as is currently the case
step 2 and step 3 also imply that we will have a more standardised tracking infrastructure for listing which network types and parameters have already been run
functionality to read from a configuration file and map that to a set of networks to be created
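One possible shape for the configuration file component, sketched with JSON so that it needs nothing beyond the standard library. The file layout, the parameter names (`time_window`, `min_similarity`), and `expand_config` are all hypothetical; the point is only that each entry lists candidate values and expands into one network per parameter combination.

```python
import json
from itertools import product

# Hypothetical configuration: each parameter maps to a list of values to try.
EXAMPLE_CONFIG = """
{
  "networks": [
    {"type": "co_reply", "time_window": [60, 900, 3600]},
    {"type": "co_similarity", "time_window": [60], "min_similarity": [0.8, 0.9]}
  ]
}
"""


def expand_config(text):
    """Expand a config string into one spec dict per network to construct."""
    config = json.loads(text)
    specs = []
    for entry in config["networks"]:
        params = dict(entry)
        network_type = params.pop("type")
        names = sorted(params)
        # Cartesian product over every listed value of every parameter.
        for values in product(*(params[name] for name in names)):
            specs.append({"type": network_type, **dict(zip(names, values))})
    return specs


if __name__ == "__main__":
    for spec in expand_config(EXAMPLE_CONFIG):
        print(spec)
```

The same list of specs could double as the record of which network types and parameters have already been run.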

The machinery for this also suggests a couple of possible quality-of-life improvements for workflows that let us tackle some of #25:

if we're keeping track of the networks that have already been created in a convenient inventory, we can also start to track data/files that have been inserted, and whether the networks are actually up to date
preprocessing that results in new data being inserted could mark existing networks as stale, and we could provide functionality to refresh those networks in bulk from a single command
writing GraphML files could also warn or error if networks are marked as stale
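A minimal sketch of the inventory and staleness idea, assuming the inventory lives in a SQLite table. The table name, columns, and function names are hypothetical, and the parameters are stored as a plain string purely to keep the example short.

```python
import sqlite3
import warnings


def create_inventory(db):
    # One row per constructed network: its type, its parameters, and a stale flag.
    db.execute(
        """
        create table if not exists network_inventory (
            network_type text not null,
            parameters text not null,
            stale integer not null default 0,
            primary key (network_type, parameters)
        )
        """
    )


def record_network(db, network_type, parameters):
    # Called after a network is (re)built; this clears any stale flag.
    db.execute(
        "insert or replace into network_inventory values (?, ?, 0)",
        (network_type, parameters),
    )


def mark_all_stale(db):
    # Called when preprocessing inserts new data.
    db.execute("update network_inventory set stale = 1")


def stale_networks(db):
    return db.execute(
        "select network_type, parameters from network_inventory "
        "where stale = 1 order by network_type, parameters"
    ).fetchall()


def check_before_export(db, strict=False):
    """Warn (or raise, if strict) when any network is out of date."""
    stale = stale_networks(db)
    if stale:
        message = f"{len(stale)} network(s) are stale and should be refreshed"
        if strict:
            raise RuntimeError(message)
        warnings.warn(message)
    return stale
```

A bulk refresh command could then loop over `stale_networks(db)`, rebuild each one, and call `record_network` to mark it current again.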

This also might be a good opportunity to review the CLI, and see if we can refactor that to be a little more consistent.
