
<style type="text/css" media="screen">
/*
.nodes-image {
	margin:-100;
}
*/	
@import url("//maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css");

.imageblock .content img, .image img {max-width: 100%;}
.deck h3, .deck h4 {display: block !important;margin-bottom:8px;margin-top:5px;}
.listingblock {margin:8px;}
.pull-bottom {position:relative;bottom:1em;}
.admonitionblock td.icon [class^="fa icon-"]{font-size:2.5em;text-shadow:1px 1px 2px rgba(0,0,0,.5);cursor:default}
.admonitionblock td.icon .icon-note:before{content:"\f05a";color:#19407c}
.admonitionblock td.icon .icon-tip:before{content:"\f0eb";text-shadow:1px 1px 2px rgba(155,155,0,.8);color:#111}
.admonitionblock td.icon .icon-warning:before{content:"\f071";color:#bf6900}
.admonitionblock td.icon .icon-caution:before{content:"\f06d";color:#bf3400}
.admonitionblock td.icon .icon-important:before{content:"\f06a";color:#bf0000}
.admonitionblock.note.speaker { display:none; }
</style>
<style type="text/css" media="screen">
/* #editor.maximize-editor .CodeMirror-code { font-size:24px; line-height:26px; } */
</style>
<article class="guide" ng-controller="AdLibDataController">
  <carousel class="deck container-fluid">
    <!--slide class="row-fluid">
      <div class="col-sm-3">
        <h3>Graph Algorithms</h3>
        <p class="lead">Information</p>
			<!dl>
				
				
			</dl>
		</div>
      <div class="col-sm-9">
        <figure>
          <img style="width:300px" src=""/>
        </figure>
      </div>
    </slide-->
    

   <h4>Graph Algorithms</h4>
   

<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Neo4j Graph Data Science</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>The Neo4j Graph Data Science (GDS) library contains a set of graph algorithms, exposed through Cypher procedures.
Graph algorithms provide insights into the graph structure and elements, for example, by computing centrality and similarity scores, and detecting communities.
The GDS library is divided into three tiers of maturity: product, beta and alpha.</p>
</div>
<div class="paragraph">
<p>This guide follows the typical workflow for running the product tier algorithms: PageRank, Label Propagation, Weakly Connected Components, Louvain, and Node Similarity.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Estimate memory usage for your graph and the algorithm you want to run.</p>
</li>
<li>
<p>Create a graph and manage created graphs.</p>
</li>
<li>
<p>Configure the algorithm to suit your needs and run it in one of the supported modes: stream, write, and stats.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>For more resources, see <a href="https://neo4j.com/developer/graph-data-science/" target="_blank">the developer guides</a>.</p>
</div>
<div class="paragraph">
<p>The official Graph Data Science (GDS) library documentation can be found <a href="https://neo4j.com/docs/graph-data-science/current/" target="_blank">here</a>.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>The example dataset</h3>
    <br/>
    <div>
      <div class="imageblock" style="float: right;">
<div class="content">
<img src="https://upload.wikimedia.org/wikipedia/en/2/24/AStormOfSwords.jpg" alt="AStormOfSwords" width="150">
</div>
</div>
<div class="paragraph">
<p>Before you can run any of the algorithms, you need to import your data into Neo4j.<br>
The example dataset used to demonstrate the GDS library is based on the Game of Thrones fantasy saga.
You may recognize it from the blogs, events, and sandbox.
However, both the data and the queries differ enough from previous installments to merit your attention.
&#160;<br>
&#160;<br>
&#160;<br></p>
</div>


   <h4>Attribution</h4>
   <div class="paragraph">
<p>The dataset is partly based on the following works:</p>
</div>
<div class="paragraph">
<p><em><a href="https://networkofthrones.wordpress.com/" target="_blank">Network of Thrones, A Song of Math and Westeros</a>, research by Dr. Andrew Beveridge.</em><br>
<em><a href="https://www.macalester.edu/~abeverid/index.html" target="_blank">A. Beveridge and J. Shan, "Network of Thrones," Math Horizons Magazine , Vol. 23, No. 4 (2016), pp. 18-22</a></em><br>
<em><a href="https://www.kaggle.com/mylesoneill/game-of-thrones">Game of Thrones, Explore deaths and battles from this fantasy world</a>, by Myles O&#8217;Neill, <a href="https://www.kaggle.com/" target="_blank">https://www.kaggle.com/</a></em><br>
<em><a href="https://github.com/tomasonjo/neo4j-game-of-thrones" target="_blank">Game of Thrones</a>, by Tomaz Bratanic, GitHub repository.</em></p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Graph of character interactions... and more</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>The graph contains <code>:Person</code> nodes, representing the characters, and <code>:INTERACTS</code> relationships, representing the characters' interactions.
An interaction occurs each time two characters' names (or nicknames) <strong>appear within 15 words of one another</strong> in the book text.
For more information about the data extraction process, see <em><a href="https://networkofthrones.wordpress.com/from-book-to-network/" target="_blank">Network of Thrones, A Song of Math and Westeros</a>, research by Dr. Andrew Beveridge.</em></p>
</div>
<div class="paragraph">
<p>The <code>(:Person)-[:INTERACTS]-&gt;(:Person)</code> graph is enriched with data on houses, battles, commanders, kings, knights, regions, locations, and deaths.</p>
</div>
<div class="paragraph">
<p>Now, let&#8217;s import the data.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Data ingestion</h3>
    <br/>
    <div>
      <div class="listingblock">
<div class="title">Enable <code>multi statement queries</code></div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding"><!--code-->:config "enableMultiStatementMode":true<!--/code--></pre>
</div>
</div>
<div class="listingblock">
<div class="title">Create unique constraints on the names of the nodes <code>:Location</code>, <code>:Region</code>, <code>:Battle</code>, <code>:Person</code>, and <code>:House</code>. This ensures your data integrity and improves performance.</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CREATE CONSTRAINT IF NOT EXISTS FOR (n:Location) REQUIRE (n.name) IS UNIQUE;
CREATE CONSTRAINT IF NOT EXISTS FOR (n:Region) REQUIRE (n.name) IS UNIQUE;
CREATE CONSTRAINT IF NOT EXISTS FOR (n:Battle) REQUIRE (n.name) IS UNIQUE;
CREATE CONSTRAINT IF NOT EXISTS FOR (n:Person) REQUIRE (n.name) IS UNIQUE;
CREATE CONSTRAINT IF NOT EXISTS FOR (n:House) REQUIRE (n.name) IS UNIQUE;<!--/code--></pre>
</div>
</div>
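<div class="paragraph">
<p>To check that the constraints are in place before loading the data, you can list them. This is a minimal sketch; it assumes a Neo4j version that supports the <code>SHOW CONSTRAINTS</code> command, which the <code>FOR ... REQUIRE</code> syntax above suggests.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// List all constraints; expect one uniqueness constraint per label created above.
SHOW CONSTRAINTS<!--/code--></pre>
</div>
</div>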
<div class="listingblock">
<div class="title">Then, ingest the data.</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->LOAD CSV WITH HEADERS FROM 'https://s3.eu-north-1.amazonaws.com/com.neo4j.gds.browser-guide/data/battles.csv' AS row
MERGE (b:Battle {name: row.name})
  ON CREATE SET b.year = toInteger(row.year),
  b.summer = row.summer,
  b.major_death = row.major_death,
  b.major_capture = row.major_capture,
  b.note = row.note,
  b.battle_type = row.battle_type,
  b.attacker_size = toInteger(row.attacker_size),
  b.defender_size = toInteger(row.defender_size);

LOAD CSV WITH HEADERS FROM 'https://s3.eu-north-1.amazonaws.com/com.neo4j.gds.browser-guide/data/battles.csv' AS row

// Because there is only attacker_outcome in the data, do a CASE statement for defender_outcome.
WITH row,
     CASE WHEN row.attacker_outcome = 'win' THEN 'loss'
       ELSE 'win'
       END AS defender_outcome

// Match the battle
MATCH (b:Battle {name: row.name})

// All battles have at least one attacker, so you don't have to use FOREACH.
MERGE (attacker1:House {name: row.attacker_1})
MERGE (attacker1)-[a1:ATTACKER]-&gt;(b)
  ON CREATE SET a1.outcome = row.attacker_outcome

// Use FOREACH to skip the null values.
FOREACH
(ignoreMe IN CASE WHEN row.defender_1 IS NOT NULL THEN [1]
  ELSE []
  END |
  MERGE (defender1:House {name: row.defender_1})
  MERGE (defender1)-[d1:DEFENDER]-&gt;(b)
    ON CREATE SET d1.outcome = defender_outcome
)
FOREACH
(ignoreMe IN CASE WHEN row.defender_2 IS NOT NULL THEN [1]
  ELSE []
  END |
  MERGE (defender2:House {name: row.defender_2})
  MERGE (defender2)-[d2:DEFENDER]-&gt;(b)
    ON CREATE SET d2.outcome = defender_outcome
)
FOREACH
(ignoreMe IN CASE WHEN row.attacker_2 IS NOT NULL THEN [1]
  ELSE []
  END |
  MERGE (attacker2:House {name: row.attacker_2})
  MERGE (attacker2)-[a2:ATTACKER]-&gt;(b)
    ON CREATE SET a2.outcome = row.attacker_outcome
)
FOREACH
(ignoreMe IN CASE WHEN row.attacker_3 IS NOT NULL THEN [1]
  ELSE []
  END |
  MERGE (attacker3:House {name: row.attacker_3})
  MERGE (attacker3)-[a3:ATTACKER]-&gt;(b)
    ON CREATE SET a3.outcome = row.attacker_outcome
)
FOREACH
(ignoreMe IN CASE WHEN row.attacker_4 IS NOT NULL THEN [1]
  ELSE []
  END |
  MERGE (attacker4:House {name: row.attacker_4})
  MERGE (attacker4)-[a4:ATTACKER]-&gt;(b)
    ON CREATE SET a4.outcome = row.attacker_outcome
);

LOAD CSV WITH HEADERS FROM
'https://s3.eu-north-1.amazonaws.com/com.neo4j.gds.browser-guide/data/battles.csv'
AS row
MATCH (b:Battle {name: row.name})

// Use coalesce to replace the null values with "Unknown".
MERGE (location:Location {name: coalesce(row.location, 'Unknown')})
MERGE (b)-[:IS_IN]-&gt;(location)
MERGE (region:Region {name: row.region})
MERGE (location)-[:IS_IN]-&gt;(region);

LOAD CSV WITH HEADERS FROM 'https://s3.eu-north-1.amazonaws.com/com.neo4j.gds.browser-guide/data/battles.csv' AS row

// Split the columns that may contain more than one person.
WITH row,
     split(row.attacker_commander, ',') AS att_commanders,
     split(row.defender_commander, ',') AS def_commanders,
     split(row.attacker_king, '/') AS att_kings,
     split(row.defender_king, '/') AS def_kings,
     row.attacker_outcome AS att_outcome,
     CASE WHEN row.attacker_outcome = 'win' THEN 'loss'
       ELSE 'win'
       END AS def_outcome
MATCH (b:Battle {name: row.name})

UNWIND att_commanders AS att_commander
MERGE (p:Person {name: trim(att_commander)})
MERGE (p)-[ac:ATTACKER_COMMANDER]-&gt;(b)
  ON CREATE SET ac.outcome = att_outcome

// To end the unwind and reset the cardinality (number of rows), use any aggregation function (e.g. count(*)).
WITH b, def_commanders, def_kings, att_kings, att_outcome, def_outcome,
     COUNT(*) AS c1
UNWIND def_commanders AS def_commander
MERGE (p:Person {name: trim(def_commander)})
MERGE (p)-[dc:DEFENDER_COMMANDER]-&gt;(b)
  ON CREATE SET dc.outcome = def_outcome

// Reset cardinality with an aggregation function (end the unwind).
WITH b, def_kings, att_kings, att_outcome, def_outcome, COUNT(*) AS c2
UNWIND def_kings AS def_king
MERGE (p:Person {name: trim(def_king)})
MERGE (p)-[dk:DEFENDER_KING]-&gt;(b)
  ON CREATE SET dk.outcome = def_outcome

// Reset cardinality with an aggregation function (end the unwind).
WITH b, att_kings, att_outcome, COUNT(*) AS c3
UNWIND att_kings AS att_king
MERGE (p:Person {name: trim(att_king)})
MERGE (p)-[ak:ATTACKER_KING]-&gt;(b)
  ON CREATE SET ak.outcome = att_outcome;

LOAD CSV WITH HEADERS FROM
'https://s3.eu-north-1.amazonaws.com/com.neo4j.gds.browser-guide/data/character-deaths.csv'
AS row

WITH row,
     CASE WHEN row.Nobility = '1' THEN 'Noble'
       ELSE 'Commoner'
       END AS status_value

// Remove House for better linking.
MERGE (house:House {name: replace(row.Allegiances, 'House ', '')})
MERGE (person:Person {name: row.Name})

SET person.gender = CASE WHEN row.Gender = '1' THEN 'male'
  ELSE 'female'
  END,
person.book_intro_chapter = row.`Book Intro Chapter`,
person.book_death_chapter = row.`Death Chapter`,
person.book_of_death = row.`Book of Death`,
person.death_year = toInteger(row.`Death Year`)
MERGE (person)-[:BELONGS_TO]-&gt;(house)
MERGE (status:Status {name: status_value})
MERGE (person)-[:HAS_STATUS]-&gt;(status)

// Use FOREACH to skip the null values.
FOREACH
(ignoreMe IN CASE WHEN row.GoT = '1' THEN [1]
  ELSE []
  END |
  MERGE (book1:Book {sequence: 1})
    ON CREATE SET book1.name = 'Game of thrones'
  MERGE (person)-[:APPEARED_IN]-&gt;(book1)
)
FOREACH
(ignoreMe IN CASE WHEN row.CoK = '1' THEN [1]
  ELSE []
  END |
  MERGE (book2:Book {sequence: 2})
    ON CREATE SET book2.name = 'Clash of kings'
  MERGE (person)-[:APPEARED_IN]-&gt;(book2)
)
FOREACH
(ignoreMe IN CASE WHEN row.SoS = '1' THEN [1]
  ELSE []
  END |
  MERGE (book3:Book {sequence: 3})
    ON CREATE SET book3.name = 'Storm of swords'
  MERGE (person)-[:APPEARED_IN]-&gt;(book3)
)
FOREACH
(ignoreMe IN CASE WHEN row.FfC = '1' THEN [1]
  ELSE []
  END |
  MERGE (book4:Book {sequence: 4})
    ON CREATE SET book4.name = 'Feast for crows'
  MERGE (person)-[:APPEARED_IN]-&gt;(book4)
)
FOREACH
(ignoreMe IN CASE WHEN row.DwD = '1' THEN [1]
  ELSE []
  END |
  MERGE (book5:Book {sequence: 5})
    ON CREATE SET book5.name = 'Dance with dragons'
  MERGE (person)-[:APPEARED_IN]-&gt;(book5)
)
FOREACH
(ignoreMe IN CASE WHEN row.`Book of Death` IS NOT NULL THEN [1]
  ELSE []
  END |
  MERGE (book:Book {sequence: toInteger(row.`Book of Death`)})
  MERGE (person)-[:DIED_IN]-&gt;(book)
);

LOAD CSV WITH HEADERS FROM
'https://s3.eu-north-1.amazonaws.com/com.neo4j.gds.browser-guide/data/character-predictions.csv'
AS row
MERGE (p:Person {name: row.name})
// Set properties on the person node.
SET p.title = row.title,
p.death_year = toInteger(row.DateoFdeath),
p.birth_year = toInteger(row.dateOfBirth),
p.age = toInteger(row.age),
p.gender = CASE WHEN row.male = '1' THEN 'male'
  ELSE 'female'
  END

// Use FOREACH to skip the null values.
FOREACH
(ignoreMe IN CASE WHEN row.mother IS NOT NULL THEN [1]
  ELSE []
  END |
  MERGE (mother:Person {name: row.mother})
  MERGE (p)-[:RELATED_TO {name: 'mother'}]-&gt;(mother)
)
FOREACH
(ignoreMe IN CASE WHEN row.spouse IS NOT NULL THEN [1]
  ELSE []
  END |
  MERGE (spouse:Person {name: row.spouse})
  MERGE (p)-[:RELATED_TO {name: 'spouse'}]-&gt;(spouse)
)
FOREACH
(ignoreMe IN CASE WHEN row.father IS NOT NULL THEN [1]
  ELSE []
  END |
  MERGE (father:Person {name: row.father})
  MERGE (p)-[:RELATED_TO {name: 'father'}]-&gt;(father)
)
FOREACH
(ignoreMe IN CASE WHEN row.heir IS NOT NULL THEN [1]
  ELSE []
  END |
  MERGE (heir:Person {name: row.heir})
  MERGE (p)-[:RELATED_TO {name: 'heir'}]-&gt;(heir)
)

// Remove "House " from the value for better linking.
FOREACH
(ignoreMe IN CASE WHEN row.house IS NOT NULL THEN [1]
  ELSE []
  END |
  MERGE (house:House {name: replace(row.house, 'House ', '')})
  MERGE (p)-[:BELONGS_TO]-&gt;(house)
);

LOAD CSV WITH HEADERS FROM
'https://s3.eu-north-1.amazonaws.com/com.neo4j.gds.browser-guide/data/character-predictions.csv'
AS row

MERGE (p:Person {name: row.name})

// Use FOREACH to skip the null values. Lower row.culture for better linking.
FOREACH
(ignoreMe IN CASE WHEN row.culture IS NOT NULL THEN [1]
  ELSE []
  END |
  MERGE (culture:Culture {name: toLower(row.culture)})
  MERGE (p)-[:MEMBER_OF_CULTURE]-&gt;(culture)
)
FOREACH
(ignoreMe IN CASE WHEN row.book1 = '1' THEN [1]
  ELSE []
  END |
  MERGE (book:Book {sequence: 1})
  MERGE (p)-[:APPEARED_IN]-&gt;(book)
)
FOREACH
(ignoreMe IN CASE WHEN row.book2 = '1' THEN [1]
  ELSE []
  END |
  MERGE (book:Book {sequence: 2})
  MERGE (p)-[:APPEARED_IN]-&gt;(book)
)
FOREACH
(ignoreMe IN CASE WHEN row.book3 = '1' THEN [1]
  ELSE []
  END |
  MERGE (book:Book {sequence: 3})
  MERGE (p)-[:APPEARED_IN]-&gt;(book)
)
FOREACH
(ignoreMe IN CASE WHEN row.book4 = '1' THEN [1]
  ELSE []
  END |
  MERGE (book:Book {sequence: 4})
  MERGE (p)-[:APPEARED_IN]-&gt;(book)
)
FOREACH
(ignoreMe IN CASE WHEN row.book5 = '1' THEN [1]
  ELSE []
  END |
  MERGE (book:Book {sequence: 5})
  MERGE (p)-[:APPEARED_IN]-&gt;(book)
);

LOAD CSV WITH HEADERS FROM 'https://s3.eu-north-1.amazonaws.com/com.neo4j.gds.browser-guide/data/character-predictions.csv' AS row

WITH row,
     CASE WHEN row.isAlive = '0' THEN [1]
       ELSE []
       END AS dead_person,
     CASE WHEN row.isAliveMother = '0' THEN [1]
       ELSE []
       END AS dead_mother,
     CASE WHEN row.isAliveFather = '0' THEN [1]
       ELSE []
       END AS dead_father,
     CASE WHEN row.isAliveHeir = '0' THEN [1]
       ELSE []
       END AS dead_heir,
     CASE WHEN row.isAliveSpouse = '0' THEN [1]
       ELSE []
       END AS dead_spouse

MATCH (p:Person {name: row.name})

// Use OPTIONAL MATCH so that the query does not stop when a related Person is not found.
OPTIONAL MATCH (mother:Person {name: row.mother})
OPTIONAL MATCH (father:Person {name: row.father})
OPTIONAL MATCH (heir:Person {name: row.heir})
OPTIONAL MATCH (spouse:Person {name: row.spouse})

// Set the label Dead to each dead person.
FOREACH (d IN dead_person |
  SET p:Dead
)
FOREACH (d IN dead_mother |
  SET mother:Dead
)
FOREACH (d IN dead_father |
  SET father:Dead
)
FOREACH (d IN dead_heir |
  SET heir:Dead
)
FOREACH (d IN dead_spouse |
  SET spouse:Dead
);

MATCH (p:Person) WHERE p.death_year IS NOT NULL
SET p:Dead;

MATCH (p:Person)-[:DEFENDER_KING|ATTACKER_KING]-()
SET p:King;

MATCH (p:Person) WHERE toLower(p.title) CONTAINS 'king'
SET p:King;

MATCH (p:Person) WHERE p.title = 'Ser'
SET p:Knight;

// Map the names coming from the different data sources.
:param [map] =&gt; {
  RETURN
    {
      `Aemon Targaryen (Maester Aemon)`: 'Aemon Targaryen (son of Maekar I)',
      `Arstan`:                          'Barristan Selmy',
      `Garin (orphan)`:                  'Garin (Orphan)',
      `Hareth (Moles Town)`:             "Hareth (Mole's Town)",
      `Jaqen Hghar`:                     "Jaqen H'ghar",
      `Lommy Greenhands`:                'Lommy',
      `Rattleshirt`:                     'Lord of Bones',
      `Thoros of Myr`:                   'Thoros'
    } AS map
};

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/mathbeveridge/asoiaf/2d8ded13eda5128ace5e3b995253d69e62bc4bf6/data/asoiaf-book1-edges.csv' AS row
WITH replace(row.Source, '-', ' ') AS srcName,
     replace(row.Target, '-', ' ') AS tgtName,
     toInteger(row.weight) AS weight
MERGE (src:Person {name: coalesce($map[srcName], srcName)})
MERGE (tgt:Person {name: coalesce($map[tgtName], tgtName)})
MERGE (src)-[i:INTERACTS {book: 1}]-&gt;(tgt)
  ON CREATE SET i.weight = weight
  ON MATCH SET i.weight = i.weight + weight
MERGE (src)-[r:INTERACTS_1]-&gt;(tgt)
  ON CREATE SET r.weight = weight, r.book = 1;

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/mathbeveridge/asoiaf/2d8ded13eda5128ace5e3b995253d69e62bc4bf6/data/asoiaf-book2-edges.csv' AS row
WITH replace(row.Source, '-', ' ') AS srcName,
     replace(row.Target, '-', ' ') AS tgtName,
     toInteger(row.weight) AS weight
MERGE (src:Person {name: coalesce($map[srcName], srcName)})
MERGE (tgt:Person {name: coalesce($map[tgtName], tgtName)})
MERGE (src)-[i:INTERACTS {book: 2}]-&gt;(tgt)
  ON CREATE SET i.weight = weight
  ON MATCH SET i.weight = i.weight + weight
MERGE (src)-[r:INTERACTS_2]-&gt;(tgt)
  ON CREATE SET r.weight = weight, r.book = 2;

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/mathbeveridge/asoiaf/2d8ded13eda5128ace5e3b995253d69e62bc4bf6/data/asoiaf-book3-edges.csv' AS row
WITH replace(row.Source, '-', ' ') AS srcName,
     replace(row.Target, '-', ' ') AS tgtName,
     toInteger(row.weight) AS weight
MERGE (src:Person {name: coalesce($map[srcName], srcName)})
MERGE (tgt:Person {name: coalesce($map[tgtName], tgtName)})
MERGE (src)-[i:INTERACTS {book: 3}]-&gt;(tgt)
  ON CREATE SET i.weight = weight
  ON MATCH SET i.weight = i.weight + weight
MERGE (src)-[r:INTERACTS_3]-&gt;(tgt)
  ON CREATE SET r.weight = weight, r.book = 3;

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/mathbeveridge/asoiaf/2d8ded13eda5128ace5e3b995253d69e62bc4bf6/data/asoiaf-book4-edges.csv' AS row
WITH replace(row.Source, '-', ' ') AS srcName,
     replace(row.Target, '-', ' ') AS tgtName,
     toInteger(row.weight) AS weight
MERGE (src:Person {name: coalesce($map[srcName], srcName)})
MERGE (tgt:Person {name: coalesce($map[tgtName], tgtName)})
MERGE (src)-[i:INTERACTS {book: 4}]-&gt;(tgt)
  ON CREATE SET i.weight = weight
  ON MATCH SET i.weight = i.weight + weight
MERGE (src)-[r:INTERACTS_4]-&gt;(tgt)
  ON CREATE SET r.weight = weight, r.book = 4;

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/mathbeveridge/asoiaf/2d8ded13eda5128ace5e3b995253d69e62bc4bf6/data/asoiaf-book5-edges.csv' AS row
WITH replace(row.Source, '-', ' ') AS srcName,
     replace(row.Target, '-', ' ') AS tgtName,
     toInteger(row.weight) AS weight
MERGE (src:Person {name: coalesce($map[srcName], srcName)})
MERGE (tgt:Person {name: coalesce($map[tgtName], tgtName)})
MERGE (src)-[i:INTERACTS {book: 5}]-&gt;(tgt)
  ON CREATE SET i.weight = weight
  ON MATCH SET i.weight = i.weight + weight
MERGE (src)-[r:INTERACTS_5]-&gt;(tgt)
  ON CREATE SET r.weight = weight, r.book = 5;<!--/code--></pre>
</div>
</div>
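<div class="paragraph">
<p>Once the ingestion has finished, a quick sanity check is to count the relationships per type. The sketch below makes no assumption about the expected numbers, since they depend on the CSV files at the time of import.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Count relationships grouped by type, largest group first.
MATCH ()-[r]-&gt;()
RETURN type(r) AS relationshipType, count(*) AS relationships
ORDER BY relationships DESC<!--/code--></pre>
</div>
</div>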
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Data visualization</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Let&#8217;s briefly explore the dataset before running some algorithms.</p>
</div>
<div class="paragraph">
<p>Run the following query to visualize the schema of your graph:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL db.schema.visualization()<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The <code>:Dead</code>, <code>:King</code>, and <code>:Knight</code> labels all appear on <code>:Person</code> nodes.
You may find it useful to remove them from the visualization to make it easier to inspect.</p>
</div>
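<div class="paragraph">
<p>To see how these extra labels are combined on the <code>:Person</code> nodes, you can group the nodes by their label sets. This is only an illustrative query and is not required for the rest of the guide.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Group Person nodes by their full set of labels.
MATCH (p:Person)
RETURN labels(p) AS labels, count(*) AS people
ORDER BY people DESC<!--/code--></pre>
</div>
</div>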
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Summary statistics</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Calculate some simple statistics to see how data is distributed.
For example, find the minimum, maximum, average, and standard deviation of the number of interactions per character:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (c:Person)-[:INTERACTS]-&gt;()
WITH c, count(*) AS num
RETURN min(num) AS min, max(num) AS max, avg(num) AS avg_interactions, stdev(num) AS stdev<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Calculate the same grouped by book:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (c:Person)-[r:INTERACTS]-&gt;()
WITH r.book AS book, c, count(*) AS num
RETURN book, min(num) AS min, max(num) AS max, avg(num) AS avg_interactions, stdev(num) AS stdev
ORDER BY book<!--/code--></pre>
</div>
</div>
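<div class="paragraph">
<p>The aggregates above do not show which characters sit at the upper end of the distribution. As a small follow-up sketch, the query below lists the ten characters with the most outgoing <code>INTERACTS</code> relationships, counted in the same way as in the queries above.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Rank characters by their number of outgoing INTERACTS relationships.
MATCH (c:Person)-[:INTERACTS]-&gt;()
RETURN c.name AS character, count(*) AS interactions
ORDER BY interactions DESC
LIMIT 10<!--/code--></pre>
</div>
</div>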
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Estimate memory usage: why?</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Now that you have data and know something about its shape, you need to estimate the memory usage of your graph and algorithm(s) and configure your Neo4j server with a much larger heap size than you would use for a transactional deployment.
Why?</p>
</div>
<div class="paragraph">
<p>Because the graph algorithms run on an in-memory, heap-allocated projection of the Neo4j graph, which resides outside the main database.
This means that before you execute an algorithm, you must create (explicitly or implicitly) a projection of your graph in memory.</p>
</div>
<div class="paragraph">
<p>However, creating graphs and running algorithms on them can have a significant memory footprint.</p>
</div>
<div class="paragraph">
<p>Therefore, it is a good habit to always estimate the amount of RAM you need and configure a large enough heap size before running a memory-heavy workload.</p>
</div>
<div class="paragraph">
<p>In the following three chapters, you will be able to exercise memory estimation and explore its results.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Memory estimation: graphs</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>The GDS library offers a set of procedures that can help you estimate the memory needed to create a graph and run algorithms.</p>
</div>
<div class="paragraph">
<p>To estimate the required memory for a subset of your graph, for example, the <code>Person</code> nodes and <code>INTERACTS</code> relationships, call the following procedure.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project.estimate('Person', 'INTERACTS') YIELD nodeCount, relationshipCount, requiredMemory<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The result shows that the example graph is small.
So, you can create your projected graph and name it, for example, <code>got-interactions</code>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project('got-interactions', 'Person', 'INTERACTS')<!--/code--></pre>
</div>
</div>
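<div class="paragraph">
<p>The same procedure can also estimate a graph that you have not loaded yet, which helps when sizing a server in advance. The sketch below assumes that your GDS version accepts the <code>nodeCount</code> and <code>relationshipCount</code> configuration keys for such a fictitious graph; the counts themselves are made-up placeholder values.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Estimate a hypothetical graph; the node and relationship counts are placeholders.
CALL gds.graph.project.estimate('*', '*', {
  nodeCount: 1000000,
  relationshipCount: 10000000
})
YIELD nodeCount, relationshipCount, requiredMemory<!--/code--></pre>
</div>
</div>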
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Estimate memory usage: algorithms</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>To estimate the memory needed to execute an algorithm on your <code>got-interactions</code> graph, for example, PageRank, call the following procedure.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.pageRank.stream.estimate('got-interactions', {}) YIELD requiredMemory<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>This estimation considers only the algorithm execution, as the graph is already in memory.</p>
</div>
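<div class="paragraph">
<p>Each execution mode has its own estimation procedure. As a sketch, the query below estimates the <code>write</code> mode of PageRank and also returns the lower and upper bounds in bytes. The property name <code>pageRank</code> is an arbitrary choice for this example.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Estimate PageRank in write mode; 'pageRank' is an illustrative property name.
CALL gds.pageRank.write.estimate('got-interactions', {writeProperty: 'pageRank'})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory<!--/code--></pre>
</div>
</div>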
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Estimate memory usage: details</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>If you want to look at the full details of the memory estimation, remove the <code>YIELD</code> clause.
The procedure returns a tree view and a map view of all the "components" with their memory estimates.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.pageRank.stream.estimate('got-interactions', {})<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>As you can see, the more detailed views contain estimates for the individual compute steps and the result data structures.</p>
</div>
<div class="paragraph">
<p>Now, you can filter the result to the top-level components: graph and algorithm.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.pageRank.stream.estimate('got-interactions',{}) YIELD mapView
UNWIND [ x IN mapView.components | [x.name, x.memoryUsage] ] AS component
RETURN component[0] AS name, component[1] AS size<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>For more details, see <em><a href="https://neo4j.com/docs/graph-data-science/current/common-usage/memory-estimation/" target="_blank">the Memory Estimation section in the GDS Manual</a></em>.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Memory estimation: cleanup</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>If you no longer need the projected graph, it is good practice to release it from memory.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.drop('got-interactions');<!--/code--></pre>
</div>
</div>
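<div class="paragraph">
<p>To confirm that the drop took effect, you can ask the graph catalog whether a graph with that name still exists. After a successful drop, you would expect <code>exists</code> to be <code>false</code>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Check whether the named graph is still in the graph catalog.
CALL gds.graph.exists('got-interactions') YIELD graphName, exists<!--/code--></pre>
</div>
</div>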
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Graph creation</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>The first stage of execution in GDS is always graph creation, but what does this mean?</p>
</div>
<div class="paragraph">
<p>To enable fast caching of the graph topology, containing only the relevant nodes, relationships, and weights, the GDS library operates on in-memory graphs that are created as projections of the Neo4j stored graph.</p>
</div>
<div class="paragraph">
<p>These projections may change the nature of the graph elements by any of the following:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Subgraphing</p>
</li>
<li>
<p>Renaming relationship types or node labels</p>
</li>
<li>
<p>Merging several relationship types or node labels</p>
</li>
<li>
<p>Altering relationship direction</p>
</li>
<li>
<p>Aggregating parallel relationships and their properties</p>
</li>
<li>
<p>Deriving relationships from larger patterns</p>
</li>
</ul>
</div>
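<div class="paragraph">
<p>As a rough sketch of how several of these transformations combine in one projection, the call below renames the relationship type, ignores its direction, and sums the <code>weight</code> of parallel relationships. The graph name <code>got-interactions-demo</code> and the type alias <code>TALKS_TO</code> are hypothetical names chosen only for this example.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// 'got-interactions-demo' and 'TALKS_TO' are illustrative names, not part of the dataset.
CALL gds.graph.project(
  'got-interactions-demo',
  'Person',                       // subgraph: only Person nodes
  {
    TALKS_TO: {                   // rename the relationship type
      type: 'INTERACTS',
      orientation: 'UNDIRECTED',  // alter the relationship direction
      properties: {
        weight: {
          property: 'weight',
          aggregation: 'SUM'      // aggregate parallel relationships
        }
      }
    }
  }
)
YIELD graphName, nodeCount, relationshipCount<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>When you are done with it, <code>CALL gds.graph.drop('got-interactions-demo')</code> releases the example graph from memory.</p>
</div>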
<div class="paragraph">
<p>There are two ways of creating graphs – <em>explicit</em> and <em>implicit</em>.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Graph catalog</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>The typical workflow is to create the projected graph <em>explicitly</em> by giving it a name and storing it in the <em>graph catalog</em>.
This allows you to operate on the graph multiple times.</p>
</div>
<div class="paragraph">
<p>In the <em>Memory estimation</em> chapters, you calculated the memory needed for creating a small graph of interactions, called <code>got-interactions</code>.
If you have removed it from memory, you can create it again.
Because each <code>INTERACTS</code> relationship is symmetric, you can even ignore its direction by creating your graph with an <code>UNDIRECTED</code> orientation.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project('got-interactions', 'Person', {
  INTERACTS: {
    orientation: 'UNDIRECTED'
  }
})<!--/code--></pre>
</div>
</div>
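<div class="paragraph">
<p>To inspect what is now stored in the graph catalog, you can list the graph by name. With an <code>UNDIRECTED</code> orientation, each stored relationship is projected in both directions, so you would expect <code>relationshipCount</code> to be roughly twice the number of <code>INTERACTS</code> relationships in the database.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Show size information for the projected graph.
CALL gds.graph.list('got-interactions')
YIELD graphName, nodeCount, relationshipCount, memoryUsage<!--/code--></pre>
</div>
</div>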
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Graph catalog: standard creation and Cypher projection</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>The GDS library supports two approaches for loading projected graphs: <strong>standard creation</strong> (<code>gds.graph.project()</code>) and <strong>Cypher projection</strong> (<code>gds.graph.project.cypher()</code>).</p>
</div>
<div class="paragraph">
<p>In the <strong>standard creation</strong> approach, which you used to create your graph, you specify node labels and relationship types and project them onto the in-memory graph, optionally under new names.
You can also specify which properties to load for each node label and relationship type.
For some use cases, this approach is sufficient.
However, it cannot project only a subset of the nodes with a given label or only a subset of the relationships of a given type.
One workaround is to add extra labels in the stored graph that mark the subset of nodes you want to project.</p>
</div>
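<div class="paragraph">
<p>As a minimal sketch of renaming during standard creation, the following call projects <code>INTERACTS</code> under a new relationship type and loads its <code>weight</code> property.
The graph name <code>got-interactions-renamed</code> and the type name <code>KNOWS</code> are hypothetical and chosen only for illustration.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// KNOWS is a hypothetical new name for the stored INTERACTS type
CALL gds.graph.project(
  'got-interactions-renamed',
  'Person',
  {
    KNOWS: {
      type: 'INTERACTS',
      orientation: 'UNDIRECTED',
      properties: 'weight'
    }
  }
)
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>If you run this sketch, remember to drop the extra graph from the catalog when you no longer need it.</p>
</div>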
<div class="paragraph">
<p>In the <strong>Cypher projection</strong> approach, you use Cypher queries to project nodes and relationships onto the in-memory graph.
Instead of specifying labels and relationship types, you define node-statements and relationship-statements.
In this way, you can leverage the expressivity of the Cypher language and describe your graph in a more sophisticated way.</p>
</div>
<div class="paragraph">
<p>It is important to note that the standard creation is orders of magnitude faster than the Cypher projection.
When designing a use case with Cypher projection at a production scale, make sure to measure the performance in advance.</p>
</div>
<div class="paragraph">
<p>Now, let’s try the Cypher projection and load the same graph with a new name, for example, <code>got-interactions-cypher</code>.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Graph catalog: Cypher projection</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>You specify two queries: one for the nodes and one for the relationships.
The node query must return an <code>id</code> column and the relationship query must return <code>source</code> and <code>target</code> columns; you can optionally return label, relationship type, and property columns.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project.cypher(
  'got-interactions-cypher',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (s:Person)-[i:INTERACTS]-&gt;(t:Person) RETURN id(s) AS source, id(t) AS target, i.weight AS weight'
)<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The first query returns the node IDs; the second one returns the source and target IDs of the relationships, as well as one relationship property <code>weight</code>.
Here, you can use any pair of Cypher queries as long as they return the expected columns and field types.<br>
To aggregate relationships, standard Cypher features can be used, such as <code>DISTINCT</code>.
You can find more details about relationship aggregations <em><a href="https://neo4j.com/docs/graph-data-science/current/management-ops/cypher-projection/#cypher-projection-relationship-aggregation" target="_blank">here</a></em>.</p>
</div>
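<div class="paragraph">
<p>Aggregation functions can be used in the same way.
The following sketch assumes that parallel <code>INTERACTS</code> relationships between the same pair of people should be collapsed into one relationship whose <code>weight</code> is the sum of the individual weights.
The graph name <code>got-interactions-aggregated</code> is hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Cypher groups by source and target, so sum() aggregates parallel relationships
CALL gds.graph.project.cypher(
  'got-interactions-aggregated',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (s:Person)-[i:INTERACTS]-&gt;(t:Person) RETURN id(s) AS source, id(t) AS target, sum(i.weight) AS weight'
)
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount<!--/code--></pre>
</div>
</div>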
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Graph catalog: Cypher projection of virtual relationships</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Another interesting feature of the Cypher graph projection is that it allows you to represent complex patterns by computing relationships that do not exist in the Neo4j stored graph.
This is especially useful when the algorithm you want to run supports only mono-partite graphs.<br>
For example, you can use the following query to create a graph with <code>Person</code> nodes connected with an (untyped) relationship if they belong to the same house.
The projected relationship does not exist in the stored graph.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project.cypher(
  'same-house-graph',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (p1:Person)-[:BELONGS_TO]-(:House)-[:BELONGS_TO]-(p2:Person) RETURN id(p1) AS source, id(p2) AS target'
)<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Graph catalog: listing</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>After you create your projected graph, you can try several useful queries to manage it.</p>
</div>
<div class="paragraph">
<p>You can list all information about it by using the following procedure:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.list('got-interactions-cypher')<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>You can list the graphs you have loaded so far by using the following procedure:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.list()<!--/code--></pre>
</div>
</div>
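<div class="paragraph">
<p>The listing returns many columns.
If you only need an overview, you can restrict the output with <code>YIELD</code>, as in this sketch, which keeps the name, size, and memory usage of each graph:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Keep only a few of the columns returned by the listing
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount, memoryUsage
RETURN graphName, nodeCount, relationshipCount, memoryUsage
ORDER BY graphName<!--/code--></pre>
</div>
</div>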
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Graph catalog: existence</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>You can check if a graph exists by using the following procedure:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.exists('got-interactions')<!--/code--></pre>
</div>
</div>
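<div class="paragraph">
<p>The procedure returns one row with the graph name and a boolean flag.
A short sketch that names both columns explicitly:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// exists is true if the named graph is currently in the catalog
CALL gds.graph.exists('got-interactions')
YIELD graphName, exists
RETURN graphName, exists<!--/code--></pre>
</div>
</div>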
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Graph catalog: removal</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>You can free up memory space by dropping some of the created graphs from the catalog:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.drop('got-interactions-cypher');<!--/code--></pre>
</div>
</div>
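<div class="paragraph">
<p>Dropping a graph that is not in the catalog normally raises an error.
The following sketch assumes your GDS version supports the optional second argument (<code>failIfMissing</code>); passing <code>false</code> should make the call return no rows instead of failing.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Second argument: failIfMissing = false (assumed to be available in your GDS version)
CALL gds.graph.drop('got-interactions-cypher', false)
YIELD graphName
RETURN graphName<!--/code--></pre>
</div>
</div>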
<div class="paragraph">
<p><strong>TIP:</strong> It is good practice to remove unused graphs from memory, both your own and those left by previous users.</p>
</div>
<div class="paragraph">
<p><strong>NOTE:</strong> Running algorithms concurrently by multiple users is not supported.</p>
</div>
<div class="paragraph">
<p>Now you are ready to run some actual algorithms.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Getting started with algorithms</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>With Neo4j, you can run algorithms on explicitly and implicitly created graphs.<br>
In this tutorial, we will show you how to get the most out of the following algorithms:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Page Rank</p>
</li>
<li>
<p>Label Propagation</p>
</li>
<li>
<p>Weakly Connected Components (WCC)</p>
</li>
<li>
<p>Louvain</p>
</li>
<li>
<p>Node Similarity</p>
</li>
<li>
<p>Triangle Count</p>
</li>
<li>
<p>Local Clustering Coefficient</p>
</li>
</ul>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Algorithm syntax: explicit graphs</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Running algorithms on explicitly created graphs allows you to operate on a graph multiple times.
To do this, refer to the graph by its name,  as it is stored in the graph catalog.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding"><!--code-->CALL gds.&lt;algo-name&gt;.&lt;mode&gt;(
  graphName: String,
  configuration: Map
)<!--/code--></pre>
</div>
</div>
<div class="ulist">
<ul>
<li>
<p><code>&lt;algo-name&gt;</code> is the algorithm name.</p>
</li>
<li>
<p><code>&lt;mode&gt;</code> is the algorithm execution mode.
The modes used in this guide are:</p>
<div class="ulist">
<ul>
<li>
<p><code>write</code>: writes results to the Neo4j database and returns a summary of the results.</p>
</li>
<li>
<p><code>stats</code>: returns the same summary as <code>write</code> but does not write anything to the Neo4j database.</p>
</li>
<li>
<p><code>stream</code>: streams results back to the user.</p>
</li>
</ul>
</div>
</li>
<li>
<p>The <code>graphName</code> parameter value is the name of the graph from the graph catalog.</p>
</li>
<li>
<p>The <code>configuration</code> parameter value is the algorithm-specific configuration.</p>
</li>
</ul>
</div>
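<div class="paragraph">
<p>As an illustration of this syntax, the following sketch runs Page Rank in <code>stats</code> mode.
It assumes that the <code>got-interactions</code> graph is in the catalog and passes no configuration, so the algorithm defaults apply.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// stats mode: returns a summary only, nothing is written to the database
CALL gds.pageRank.stats('got-interactions')
YIELD ranIterations, didConverge, centralityDistribution
RETURN ranIterations, didConverge, centralityDistribution<!--/code--></pre>
</div>
</div>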
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Algorithm syntax: implicit graphs</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>The implicit variant does not access the graph catalog.
If you want to run an algorithm on such a graph, you configure the graph creation within the algorithm configuration map.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding"><!--code-->CALL gds.&lt;algo-name&gt;.&lt;mode&gt;(
  configuration: Map
)<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>After the algorithm execution finishes, the graph is released from memory.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Page Rank</h3>
    <br/>
    <div>
      <div class="imageblock" style="float: right;">
<div class="content">
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/fb/PageRanks-Example.svg/758px-PageRanks-Example.svg.png" alt="758px PageRanks Example.svg" width="300">
</div>
</div>
<div class="paragraph">
<p>Page Rank is an algorithm that measures the transitive influence and connectivity of nodes to find the most <strong>influential</strong> nodes in a graph.<br>
It computes an influence value for each node, called a <em>score</em>.
As a result, the score of a node is a certain weighted average of the scores of its direct neighbors.</p>
</div>
<div class="paragraph">
<p><strong>How Page Rank works</strong></p>
</div>
<div class="paragraph">
<p>Page Rank is an <em>iterative</em> algorithm.
In each iteration, every node distributes its score evenly among its neighbors.<br>
The algorithm runs for a configurable maximum number of iterations (default is 20), or until the node scores converge,
that is, until the maximum change in any node score between two consecutive iterations is smaller than the configured <code>tolerance</code> value.</p>
</div>
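<div class="paragraph">
<p>These settings are passed in the configuration map.
The following sketch assumes that the <code>got-interactions</code> graph is in the catalog.
The <code>maxIterations</code> value is the default mentioned above; the <code>tolerance</code> and <code>dampingFactor</code> values are, to our understanding, the library defaults, so check them against the documentation of your GDS version before relying on them.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Stop after 20 iterations or once no score changes by more than the tolerance
CALL gds.pageRank.stream('got-interactions', {
  maxIterations: 20,
  tolerance: 0.0000001,
  dampingFactor: 0.85
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10<!--/code--></pre>
</div>
</div>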
<div class="paragraph">
<p>In the following chapters, you will see how Page Rank identifies the most important nodes.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Page Rank: stream mode</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Let&#8217;s find out who is influential in the graph by running Page Rank.
If you have removed the <code>got-interactions</code> graph from the catalog, you have to create it again:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project('got-interactions', 'Person', {
  INTERACTS: {
    orientation: 'UNDIRECTED'
  }
})<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>First, you run a basic Page Rank call in <code>stream</code> mode.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.pageRank.stream('got-interactions') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Then, you compare the Page Rank of each <code>Person</code> node with the number of interactions for that node.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.pageRank.stream('got-interactions') YIELD nodeId, score AS pageRank
WITH gds.util.asNode(nodeId) AS n, pageRank
MATCH (n)-[i:INTERACTS]-()
RETURN n.name AS name, pageRank, count(i) AS interactions
ORDER BY pageRank DESC LIMIT 10<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The result shows that the most talkative characters do not always have the highest rank.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Page Rank: write mode</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Now that you have the results from your Page Rank query, you write them back to Neo4j and use them for further queries.<br>
You specify the name of the property to which the algorithm will write using the <code>writeProperty</code> key in the config map passed to the procedure.</p>
</div>
<div class="paragraph">
<p>Note that the results are written to the Neo4j database, not to the projected graph <code>got-interactions</code>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.pageRank.write('got-interactions', {writeProperty: 'pageRank'})<!--/code--></pre>
</div>
</div>
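<div class="paragraph">
<p>Once the write has finished, the scores are ordinary node properties.
As a quick check, this plain Cypher query reads them back without calling the GDS library:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Read the stored scores directly from the Person nodes
MATCH (p:Person)
WHERE p.pageRank IS NOT NULL
RETURN p.name AS name, p.pageRank AS pageRank
ORDER BY pageRank DESC LIMIT 10<!--/code--></pre>
</div>
</div>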
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Page Rank: rank per book</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Along with the generic <code>INTERACTS</code> relationships, you also have <code>INTERACTS_1</code>, <code>INTERACTS_2</code>, etc., for the different books.
Let&#8217;s load a graph for the interactions in book 1 and compute and write the Page Rank scores.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project(
  'got-interactions-1',
  'Person',
  {
    INTERACTS_1: {
      orientation: 'UNDIRECTED'
    }
  }
);<!--/code--></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.pageRank.write(
  'got-interactions-1',
  {
    writeProperty: 'pageRank-1'
  }
)<!--/code--></pre>
</div>
</div>
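<div class="paragraph">
<p>Because the property name <code>pageRank-1</code> contains a hyphen, it must be quoted with backticks when you read it in Cypher.
The following sketch compares the book 1 score with the overall score; it assumes you have also written the <code>pageRank</code> property as shown on the previous slide.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Backticks are required because of the hyphen in the property name
MATCH (p:Person)
WHERE p.`pageRank-1` IS NOT NULL
RETURN p.name AS name, p.pageRank AS overall, p.`pageRank-1` AS book1
ORDER BY book1 DESC LIMIT 10<!--/code--></pre>
</div>
</div>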
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Page Rank: exercise</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Let&#8217;s see what you have learned so far.</p>
</div>
<div class="paragraph">
<p>Try to calculate the Page Rank of the other books in the series and store the results in the database.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Write queries that call <code>gds.pageRank.write</code> for the <code>INTERACTS_2</code>, <code>INTERACTS_3</code>, <code>INTERACTS_4</code>, and <code>INTERACTS_5</code> relationship types.
You can load a graph for each relationship type explicitly, or use the shorthand.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Then, try to write queries to answer the following questions:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Which character has the biggest increase in influence from book 1 to book 5?</p>
</li>
<li>
<p>Which character has the biggest decrease?</p>
</li>
</ul>
</div>
<div class="paragraph">
<p><strong>Bonus task</strong></p>
</div>
<div class="ulist">
<ul>
<li>
<p>Use a Cypher projection to create a graph of <code>House</code>s that fought in the same <code>Battle</code>s and run Page Rank.</p>
</li>
<li>
<p>Does the result change if you weight Page Rank with the number of shared <code>Battle</code>s?</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>You can find the solution on the next slide.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Page Rank: exercise answer</h3>
    <br/>
    <div>
      <div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project.cypher(
  'house-battles',
  'MATCH (h:House) RETURN id(h) AS id',
  'MATCH (h1:House)--&gt;(b:Battle)&lt;--(h2:House) RETURN id(h1) AS source, id(h2) AS target, count(b) AS weight'
)<!--/code--></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.pageRank.stream(
  'house-battles',
  {
    relationshipWeightProperty: 'weight'
  }
)
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC<!--/code--></pre>
</div>
</div>
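<div class="paragraph">
<p>The queries above cover the bonus task.
For the main exercise, one possible approach for book 5 is sketched below; it is not the only solution.
The graph name <code>got-interactions-5</code> and the property name <code>pageRank-5</code> are hypothetical and simply follow the naming used for book 1.
First, project the book 5 interactions:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project('got-interactions-5', 'Person', {
  INTERACTS_5: {
    orientation: 'UNDIRECTED'
  }
})<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Then, write the scores to the database:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.pageRank.write('got-interactions-5', {writeProperty: 'pageRank-5'})<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Finally, assuming <code>pageRank-1</code> has also been written, rank the characters by the change in score. Sort in ascending order instead to find the biggest decrease.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Positive change: more influential in book 5 than in book 1
MATCH (p:Person)
RETURN p.name AS name, p.`pageRank-5` - p.`pageRank-1` AS change
ORDER BY change DESC LIMIT 10<!--/code--></pre>
</div>
</div>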
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Label Propagation</h3>
    <br/>
    <div>
      <div class="imageblock" style="float: right;">
<div class="content">
<img src="https://s3.amazonaws.com/dev.assets.neo4j.com/wp-content/uploads/20190226091707/label-propagation-graph-algorithm-1.png" alt="label propagation graph algorithm 1" width="300">
</div>
</div>
<div class="paragraph">
<p>Label Propagation (LPA) is a fast algorithm for finding communities in a graph.
It propagates labels throughout the graph and forms communities of nodes based on their influence.</p>
</div>
<div class="paragraph">
<p><strong>How Label Propagation works</strong></p>
</div>
<div class="paragraph">
<p>LPA is an <em>iterative</em> algorithm.
First, it assigns a unique community label to each node.<br>
In each iteration, every node updates its label to the most common one among its neighbors.
Densely connected nodes quickly broadcast their labels across the graph.<br>
At the end of the propagation, only a few labels remain.<br>
Nodes that have the same community label at convergence are considered to belong to the same community.
The algorithm runs for a configurable maximum number of iterations, or until it converges.</p>
</div>
<div class="paragraph">
<p>For more details, see <em><a href="https://neo4j.com/docs/graph-data-science/current/algorithms/label-propagation/" target="_blank">the documentation</a></em>.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Label Propagation: example</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Let&#8217;s run Label Propagation to find the five largest communities of people interacting with each other.<br>
In this example, you first project a new graph that includes the <code>weight</code> relationship property.<br>
The weight property on the relationship represents the number of interactions between two people.
In LPA, the weight is used to determine the influence of neighboring nodes when voting on community assignment.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project(
  'got-interactions-weighted',
  'Person',
  {
    INTERACTS: {
      orientation: 'UNDIRECTED',
      properties: 'weight'
    }
  }
)<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Let&#8217;s now run LPA with just one iteration:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.labelPropagation.stream(
  'got-interactions-weighted',
  {
    relationshipWeightProperty: 'weight',
    maxIterations: 1
  }
) YIELD nodeId, communityId
RETURN communityId, count(nodeId) AS size
ORDER BY size DESC
LIMIT 5<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>You can see that the nodes are assigned to initial communities: 2166 nodes to 1476 communities.<br>
However, the algorithm needs multiple iterations to achieve a stable result.
Run the same procedure with two iterations and see how the results change.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.labelPropagation.stream(
  'got-interactions-weighted',
  {
    relationshipWeightProperty: 'weight',
    maxIterations: 2
  }
) YIELD nodeId, communityId
RETURN communityId, count(nodeId) AS size
ORDER BY size DESC
LIMIT 5<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Usually, label propagation requires more than a few iterations to converge on a stable result.
The number of required iterations depends on the graph structure, so you should experiment.</p>
</div>
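<div class="paragraph">
<p>One way to experiment is to run the algorithm in <code>stats</code> mode, which reports how many iterations were actually executed and whether the run converged.
The following sketch assumes that the <code>got-interactions-weighted</code> graph is still in the catalog; the value of 10 for <code>maxIterations</code> is only a starting point.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// didConverge is false if the run stopped because maxIterations was reached
CALL gds.labelPropagation.stats('got-interactions-weighted', {
  relationshipWeightProperty: 'weight',
  maxIterations: 10
})
YIELD ranIterations, didConverge, communityCount
RETURN ranIterations, didConverge, communityCount<!--/code--></pre>
</div>
</div>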
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Label Propagation: seeding</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Label Propagation can be seeded with an initial community label from a pre-existing node property.
This allows you to compute communities incrementally.<br>
Let&#8217;s write the results after the first iteration back to the source graph, under the write property name <code>community</code>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.labelPropagation.write(
  'got-interactions-weighted',
  {
    relationshipWeightProperty: 'weight',
    maxIterations: 1,
    writeProperty: 'community'
  }
)<!--/code--></pre>
</div>
</div>
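<div class="paragraph">
<p>To confirm that the property was written, you can inspect it with plain Cypher.
This sketch lists the five largest communities stored in the <code>community</code> property:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Group Person nodes by the community ID written in the previous step
MATCH (p:Person)
WHERE p.community IS NOT NULL
RETURN p.community AS communityId, count(*) AS size
ORDER BY size DESC
LIMIT 5<!--/code--></pre>
</div>
</div>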
<div class="paragraph">
<p>You can now use the <code>community</code> property as a seed property for the second iteration.
The results should be the same as the previous run with two iterations.<br>
Seeding is particularly useful when the source graph grows and you want to compute communities incrementally, without starting again from scratch.
Since <code>got-interactions-weighted</code> does not contain the <code>community</code> property, you must create a new graph that does.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project(
  'got-interactions-seeded',
  {
    Person: {
      properties: 'community'
    }
  },
  {
    INTERACTS: {
      orientation: 'UNDIRECTED',
      properties: 'weight'
    }
  }
)<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Then, you can use the <code>seedProperty</code> configuration key to specify the property from which you want to seed community IDs.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.labelPropagation.stream(
  'got-interactions-seeded',
  {
    relationshipWeightProperty: 'weight',
    maxIterations: 1,
    seedProperty: 'community'
  }
) YIELD nodeId, communityId
RETURN communityId, count(nodeId) AS size
ORDER BY size DESC
LIMIT 5<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Label Propagation: exercise</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Now that you understand the basics of LPA, let&#8217;s experiment a little.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>How many iterations does it take for LPA to converge on a stable number of communities? How many communities do you end up with?</p>
</li>
<li>
<p>What happens when you run LPA with <code>maxIterations</code> set to 1,000? (<em>hint: try using <code>YIELD ranIterations</code></em>)</p>
</li>
<li>
<p>What happens if you run LPA without weights? Do you find the same communities?</p>
</li>
<li>
<p><strong>Bonus task</strong>: What if you use house affiliations as seeds for communities? How would you use Cypher to create the initial seeds? Run the algorithm with the new seeds. Do you find a different set of communities?</p>
</li>
</ul>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Label Propagation: cleanup</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Now that you are done with Label Propagation, you can remove the graphs from the catalog.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.drop('got-interactions-weighted');
CALL gds.graph.drop('got-interactions-seeded');<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Weakly Connected Components</h3>
    <br/>
    <div>
      <div class="imageblock" style="float: right;">
<div class="content">
<img src="https://s3.amazonaws.com/dev.assets.neo4j.com/wp-content/uploads/20190222092528/union-find-graph-algorithm-visualization-3.png" alt="union find graph algorithm visualization 3" width="350">
</div>
</div>
<div class="paragraph">
<p>The Weakly Connected Components algorithm (previously known as Union Find) finds sets of connected nodes in an <em>undirected</em> graph, where each node is reachable from any other node in the same set.
It is called <em>weakly</em> connected because it considers a relationship between two nodes regardless of its direction, so the graph is treated as <em>undirected</em>.<br>
This algorithm is useful for identifying disjoint subgraphs, when pre-processing graphs, or for disambiguation purposes.</p>
</div>
<div class="paragraph">
<p>Let&#8217;s start with a simple example that shows how to run the algorithm and stream the results.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Weakly Connected Components: example</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>You can use the <code>got-interactions</code> graph and run the algorithm to compute components.
If you have removed it from the catalog, you have to create it again:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project('got-interactions', 'Person', {
  INTERACTS: {
    orientation: 'UNDIRECTED'
  }
})<!--/code--></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.wcc.stream('got-interactions')
YIELD nodeId, componentId
RETURN componentId AS component, count(nodeId) AS size
ORDER BY size DESC<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The result is one large component containing 795 characters and many isolated characters.</p>
</div>
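<div class="paragraph">
<p>If you only need the number of components rather than the full list, the <code>stats</code> mode gives a compact summary.
A short sketch, assuming the same <code>got-interactions</code> graph:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// componentDistribution summarizes the component sizes (min, max, percentiles)
CALL gds.wcc.stats('got-interactions')
YIELD componentCount, componentDistribution
RETURN componentCount, componentDistribution<!--/code--></pre>
</div>
</div>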
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Weakly Connected Components: connected components</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Let&#8217;s use a Cypher projection to build a new graph named <code>got-culture-interactions-cypher</code>.
It connects people who belong to the same culture.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project.cypher(
  'got-culture-interactions-cypher',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (p1:Person)-[:MEMBER_OF_CULTURE]-&gt;(c:Culture)&lt;-[:MEMBER_OF_CULTURE]-(p2:Person) RETURN id(p1) AS source, id(p2) AS target'
)<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Now, run the algorithm to compute components.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.wcc.stream('got-culture-interactions-cypher')
YIELD nodeId, componentId
RETURN componentId AS component, count(nodeId) AS size ORDER BY size DESC<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The result is a set of components of different sizes.</p>
</div>
<div class="paragraph">
<p>Reviewing the results, which cultures are represented by the five largest components?</p>
</div>
<div class="paragraph">
<p>Can you modify the query to write the components back to the database?
Add the property <code>wcc_partition</code> to your <code>:Person</code> nodes.</p>
</div>
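<div class="paragraph">
<p>For the first question, one possible approach is to join the streamed components back to the stored <code>Culture</code> nodes.
The sketch below assumes that <code>Culture</code> nodes have a <code>name</code> property, which is not shown elsewhere in this guide.
People without a culture are left out by the <code>MATCH</code> clause.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// c.name is an assumed property on Culture nodes
CALL gds.wcc.stream('got-culture-interactions-cypher')
YIELD nodeId, componentId
WITH componentId, gds.util.asNode(nodeId) AS p
MATCH (p)-[:MEMBER_OF_CULTURE]-&gt;(c:Culture)
RETURN componentId, collect(DISTINCT c.name) AS cultures, count(DISTINCT p) AS size
ORDER BY size DESC
LIMIT 5<!--/code--></pre>
</div>
</div>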
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Weakly Connected Components: thresholds</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>You can also use some additional configuration options:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><code>threshold</code> for connectivity (used along with <code>relationshipWeightProperty</code>)</p>
</li>
<li>
<p><code>seedProperty</code></p>
</li>
</ul>
</div>
<div class="paragraph">
<p><strong>Threshold</strong></p>
</div>
<div class="paragraph">
<p>If the <code>threshold</code> option is specified, the <code>relationshipWeightProperty</code> option must also be present.
In this case, relationships whose weight is below the given threshold will not be used in the computation.</p>
</div>
<div class="paragraph">
<p>You will consider a graph with relationships weighted by the number of times a pair of individuals have interacted.</p>
</div>
<div class="paragraph">
<p><strong>Note:</strong> You are casting the weight property from the graph as a float because that is what the algorithm expects as an input.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project('got-wcc-weighted-interactions',
  'Person',
  {
    INTERACTS: {
      orientation: 'NATURAL',
      properties: {
        weight: {
          property: 'weight',
          defaultValue: 0.0,
          aggregation: 'SINGLE'
        }
      }
    }
  }
)<!--/code--></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.wcc.stream(
  'got-wcc-weighted-interactions',
  {
    relationshipWeightProperty:'weight',
    threshold:5.0
  }
)
YIELD nodeId, componentId
RETURN count(distinct componentId) AS components<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>How does the number of identified communities change when you change the threshold?
What happens to their size?
What value produces the most communities?</p>
</div>
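<div class="paragraph">
<p>One way to explore these questions is to sweep several thresholds in a single query. The following is a minimal sketch, assuming the <code>got-wcc-weighted-interactions</code> graph projected above still exists in the catalog. The threshold values are arbitrary examples, and the query uses the <code>stats</code> mode, which reports <code>componentCount</code> without streaming individual nodes.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Threshold values are illustrative; adjust them to your weight distribution
UNWIND [1.0, 5.0, 10.0, 25.0, 50.0] AS threshold
CALL gds.wcc.stats(
  'got-wcc-weighted-interactions',
  {
    relationshipWeightProperty: 'weight',
    threshold: threshold
  }
)
YIELD componentCount
RETURN threshold, componentCount
ORDER BY threshold<!--/code--></pre>
</div>
</div>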
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Weakly Connected Components: seeding</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Now you can use the <code>wcc_partition</code> property to seed the algorithm with an initial community label.
This allows you to compute communities incrementally.</p>
</div>
<div class="paragraph">
<p>If you have not managed to create the property <code>wcc_partition</code>, execute the following query.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.wcc.write(
  'got-culture-interactions-cypher',
  {
    writeProperty: 'wcc_partition'
  }
)<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Then, you can create a projected graph called <code>got-wcc-interactions-seeded</code> that includes the <code>wcc_partition</code> property on its <code>Person</code> nodes:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project(
  'got-wcc-interactions-seeded',
  {
    Person: {
      properties: 'wcc_partition'
    }
  },
  {
    INTERACTS: {
      orientation: 'UNDIRECTED',
      properties: 'weight'
    }
  }
)<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p><strong>Seeding</strong></p>
</div>
<div class="paragraph">
<p>For the Weakly Connected Components algorithm, this functionality is most useful when you want to add data to an existing graph.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (p:Person)
WITH p.wcc_partition AS community, collect(p) AS members
WITH community, size(members) AS size, members[0] AS someGuy
    ORDER BY size DESC
    LIMIT 6
WITH collect(someGuy) AS someGuys
WITH someGuys, someGuys[0] AS first
MERGE (mats:Person {name: 'Mats'})
MERGE (mats)-[:INTERACTS]-&gt;(first)
WITH someGuys, someGuys[1] AS second
MERGE (martin:Person {name: 'Martin'})
MERGE (martin)-[:INTERACTS]-&gt;(second)
WITH someGuys, someGuys[2] AS third
MERGE (jonatan:Person {name: 'Jonatan'})
MERGE (jonatan)-[:INTERACTS]-&gt;(third)
WITH someGuys, someGuys[3] AS fourth
MERGE (max:Person {name: 'Max'})
MERGE (max)-[:INTERACTS]-&gt;(fourth)
WITH someGuys, someGuys[4] AS fifth
MERGE (soren:Person {name: 'Soren'})
MERGE (soren)-[:INTERACTS]-&gt;(fifth)
WITH someGuys, someGuys[5] AS sixth
MERGE (paul:Person {name: 'Paul'})
MERGE (paul)-[:INTERACTS]-&gt;(sixth)<!--/code--></pre>
</div>
</div>
<div class="paragraph">
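<p>Now let&#8217;s use the previously written <code>wcc_partition</code> property as a seed and assign communities to your new nodes. An in-memory graph is a snapshot of the database, so if you projected <code>got-wcc-interactions-seeded</code> before adding the new nodes, drop it and project it again first:</p>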
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.wcc.stream(
  'got-wcc-interactions-seeded',
  {
    seedProperty: 'wcc_partition'
  }
)
YIELD nodeId, componentId
RETURN componentId, count(nodeId) AS size
ORDER BY size DESC<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The number of communities is the same as before, but the new nodes have now also been assigned to communities.
On a small graph the saving is negligible, but on a large graph seeding can save a lot of computation time.</p>
</div>
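<div class="paragraph">
<p>To check which community each new node ended up in, you can filter the seeded result by name. This is a sketch under the assumption that the projected graph <code>got-wcc-interactions-seeded</code> already contains the six new <code>Person</code> nodes.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.wcc.stream(
  'got-wcc-interactions-seeded',
  {
    seedProperty: 'wcc_partition'
  }
)
YIELD nodeId, componentId
// Keep only the nodes added in the seeding example
WITH gds.util.asNode(nodeId) AS person, componentId
WHERE person.name IN ['Mats', 'Martin', 'Jonatan', 'Max', 'Soren', 'Paul']
RETURN person.name AS name, componentId
ORDER BY componentId<!--/code--></pre>
</div>
</div>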
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Weakly Connected Components: exercise</h3>
    <br/>
    <div>
      <div class="ulist">
<ul>
<li>
<p>Can you use a Cypher projection to create a graph that contains at least five communities with more than two members?</p>
</li>
<li>
<p>Can you use a Cypher projection with thresholding (you can use Cypher to add a new weight property if you want) to break the graph into multiple partitions?
Does increasing your threshold create <em>more</em> or <em>fewer</em> partitions?</p>
</li>
<li>
<p>Using the previous exercise, write the partitions to the graph, and then use them as seeds for Weakly Connected Components (formerly known as Union Find) on the full graph, using <code>Person</code> and <code>INTERACTS</code>.
How many communities do you find?
What happened?</p>
</li>
</ul>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Weakly Connected Components: cleanup</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>To remove the nodes that have been created during the seeding exercise, run the following query:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (p:Person) WHERE p.name IN ['Mats', 'Martin', 'Jonatan', 'Max', 'Soren', 'Paul'] DETACH DELETE p<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>To clean up the in-memory graphs created during the exercises, you can run the following queries.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.drop('got-culture-interactions-cypher');
CALL gds.graph.drop('got-wcc-weighted-interactions');
CALL gds.graph.drop('got-wcc-interactions-seeded');<!--/code--></pre>
</div>
</div>
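<div class="paragraph">
<p>If you skipped some of the exercises, one or more of these graphs may not exist, and the corresponding <code>gds.graph.drop</code> call fails. As a hedged alternative, the sketch below passes <code>false</code> as the second argument (<code>failIfMissing</code>) so that missing graphs are skipped. The list of names is taken from the queries above.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->UNWIND [
  'got-culture-interactions-cypher',
  'got-wcc-weighted-interactions',
  'got-wcc-interactions-seeded'
] AS name
// The second argument (failIfMissing = false) skips graphs that do not exist
CALL gds.graph.drop(name, false)
YIELD graphName
RETURN graphName<!--/code--></pre>
</div>
</div>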
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Louvain</h3>
    <br/>
    <div>
      <div class="imageblock" style="float: right;">
<div class="content">
<img src="https://neo4j.com/docs/graph-algorithms/current/images/louvain-multilevel-graph.svg" alt="louvain multilevel graph" width="400">
</div>
</div>
<div class="paragraph">
<p>The Louvain algorithm, like Label Propagation and Weakly Connected Components, is a community detection algorithm designed to identify clusters of nodes in a graph.
It heuristically maximizes modularity, a score that measures how densely connected the nodes within a community (module) are compared to how connected they would be in a random graph.
Louvain also reveals a hierarchy of communities at different scales, which enables you to zoom in on different levels of granularity and find sub-communities within larger communities.</p>
</div>
<div class="paragraph">
<p><strong>How Louvain works</strong></p>
</div>
<div class="paragraph">
<p>Louvain is a <em>greedy</em>, <em>hierarchical clustering</em> algorithm.
It alternates between the following two steps:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Assign the nodes to communities, favoring local optimizations of modularity.</p>
</li>
<li>
<p>Aggregate the nodes from the same community to form a single node, which inherits all connected relationships.</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>These two steps are repeated until no further modularity-increasing reassignments of communities are possible.
Because ties are broken arbitrarily, you can get different results between different runs of the Louvain algorithm.</p>
</div>
<div class="paragraph">
<p><strong>What to consider</strong></p>
</div>
<div class="paragraph">
<p>Louvain is significantly slower than Label Propagation and Weakly Connected Components, and the results can be hard to interpret.</p>
</div>
<div class="paragraph">
<p>The algorithm is sensitive to the weighting scheme used on the relationships.
A good sign that you need to tweak your schema or weighting is when you notice that the results include only a <em>single</em> giant community, or every node is a community on its own.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Louvain: examples</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Let&#8217;s compute the Louvain community structure of the graph <code>got-interactions</code>.
If you have removed it from the catalog, you have to create it again:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project('got-interactions', 'Person', {
  INTERACTS: {
    orientation: 'UNDIRECTED'
  }
})<!--/code--></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.louvain.stream('got-interactions')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS person, communityId
ORDER BY communityId DESC<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The query returns the name of each person and the id of the community to which they belong.
If you want to investigate how many communities were found, and the number of members of each community, you can change the <code>RETURN</code> clause.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.louvain.stream('got-interactions')
YIELD nodeId, communityId
RETURN communityId, COUNT(DISTINCT nodeId) AS members
ORDER BY members DESC<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The result is 1382 communities, 11 of which have more than one member.</p>
</div>
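<div class="paragraph">
<p>If you only need summary figures rather than per-node results, the <code>stats</code> mode is a lighter option. The sketch below assumes the <code>got-interactions</code> graph from above and returns the number of communities, the final modularity score, and the number of hierarchy levels that were computed. Because Louvain breaks ties arbitrarily, the values can differ slightly between runs.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Summary only: no per-node rows are streamed
CALL gds.louvain.stats('got-interactions')
YIELD communityCount, modularity, ranLevels
RETURN communityCount, modularity, ranLevels<!--/code--></pre>
</div>
</div>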
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Louvain: weighting</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Now let&#8217;s run the Louvain algorithm on a weighted graph.
This way, it considers the relationship weights when calculating the modularity.</p>
</div>
<div class="paragraph">
<p>First, you must project a graph that includes the <code>weight</code> relationship property.
For relationships that lack this property, the number specified in <code>defaultValue</code> is used as a fallback.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project(
  'got-weighted-interactions',
  'Person',
  {
    INTERACTS: {
      orientation: 'UNDIRECTED',
      aggregation: 'NONE',
      properties: {
        weight: {
          property: 'weight',
          aggregation: 'NONE',
          defaultValue: 0.0
        }
      }
    }
  }
)<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Then, use the <code>weight</code> property on the INTERACTS relationship and see what happens:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.louvain.stream(
  'got-weighted-interactions',
  {
    relationshipWeightProperty: 'weight'
  }
)
YIELD nodeId, communityId
RETURN communityId, COUNT(DISTINCT nodeId) AS members
ORDER BY members DESC<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The result is 1384 communities, 13 of which have more than one member.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Louvain: intermediate communities</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Now let&#8217;s try to identify communities at multiple levels in the graph: first small communities, and then combine them in large ones.</p>
</div>
<div class="paragraph">
<p>To retrieve the intermediate communities, set <code>includeIntermediateCommunities</code> to <code>true</code>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.louvain.stream(
  'got-interactions',
  {
    includeIntermediateCommunities: true
  }
)
YIELD nodeId, communityId, intermediateCommunityIds
RETURN communityId, COUNT(DISTINCT nodeId) AS members, intermediateCommunityIds<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>You can extract membership in different levels of communities and see how the composition changes:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.louvain.stream(
  'got-interactions',
  {
    includeIntermediateCommunities: true
  }
)
YIELD nodeId, intermediateCommunityIds
RETURN count(distinct intermediateCommunityIds[0]), count(distinct intermediateCommunityIds[1])<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p><code>includeIntermediateCommunities: false</code> is the default value, in which case, the <code>intermediateCommunityIds</code> field of the result is <code>null</code>.</p>
</div>
<div class="paragraph">
<p><strong>Bonus task</strong></p>
</div>
<div class="paragraph">
<p>Can you identify nodes that belong to different communities in the first level of the hierarchy, but combine to the same community in the next level?</p>
</div>
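<div class="paragraph">
<p>As a possible starting point for the bonus task, the sketch below counts how many first-level communities were merged into each final community. It assumes the <code>got-interactions</code> graph from above; the aliases <code>firstLevelId</code>, <code>firstLevelCommunities</code>, and <code>totalMembers</code> are chosen here for illustration. Final communities with more than one first-level community contain nodes that started apart and were combined later.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.louvain.stream(
  'got-interactions',
  {
    includeIntermediateCommunities: true
  }
)
YIELD nodeId, communityId, intermediateCommunityIds
// One row per (final community, first-level community) pair
WITH communityId, intermediateCommunityIds[0] AS firstLevelId, count(*) AS members
// Count the first-level communities inside each final community
RETURN communityId, count(firstLevelId) AS firstLevelCommunities, sum(members) AS totalMembers
ORDER BY firstLevelCommunities DESC<!--/code--></pre>
</div>
</div>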
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Louvain: cleanup</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>To clean up the in-memory graph created during the Louvain exercise, run the following query:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.drop('got-weighted-interactions');<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Node Similarity</h3>
    <br/>
    <div>
      <div class="imageblock" style="float: right;">
<div class="content">
<img src="https://miro.medium.com/max/4000/0*ZjP7pSSaidIgSDmm.png" alt="0*ZjP7pSSaidIgSDmm" width="350">
</div>
</div>
<div class="paragraph">
<p>The Node Similarity algorithm compares pairs of nodes in a graph based on their connections to other nodes.
Two nodes are considered similar if they share many of the same neighbors.</p>
</div>
<div class="paragraph">
<p>The algorithm uses the so-called <em>Jaccard Similarity Score</em> to obtain a similarity measure between two sets.
More precisely, the similarity between two nodes A and B is given by the following formula:</p>
</div>
<div class="paragraph">
<p>Similarity (A,B) = [#nodes neighboring A and B] / [#nodes neighboring A or B (or both)]</p>
</div>
<div class="paragraph">
<p>That is, nodes A and B are similar if most nodes that are neighbors to either node are also neighbors to both.</p>
</div>
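<div class="paragraph">
<p>To make the formula concrete, the following self-contained sketch computes the Jaccard score for two hand-written neighbor lists. The list contents are made up for illustration and the query does not touch the database; it also assumes that neither list contains duplicates. Two of the four distinct neighbors are shared, so the similarity is 0.5.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Hypothetical neighbor sets of two nodes A and B
WITH ['A Game of Thrones', 'House Stark', 'Northmen'] AS neighborsA,
     ['A Game of Thrones', 'House Stark', 'Valyrian'] AS neighborsB
WITH neighborsA, neighborsB,
     [x IN neighborsA WHERE x IN neighborsB] AS intersection
// |A or B| = |A| + |B| - |A and B|
WITH size(intersection) AS shared,
     size(neighborsA) + size(neighborsB) - size(intersection) AS total
RETURN shared, total, toFloat(shared) / total AS similarity<!--/code--></pre>
</div>
</div>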
<div class="paragraph">
<p><strong>How it works</strong></p>
</div>
<div class="paragraph">
<p>The input of this algorithm is a bipartite, connected graph containing two disjoint node sets.
Each relationship starts from a node in the first node set and ends at a node in the second node set.
The Node Similarity algorithm compares all nodes from the first node set with each other based on their relationships to nodes in the second set.
The complexity of this comparison grows quadratically with the number of nodes to compare.
The algorithm reduces the complexity by ignoring disconnected nodes.</p>
</div>
<div class="paragraph">
<p>For more information, see <a href="https://neo4j.com/docs/graph-data-science/current/algorithms/node-similarity/" target="_blank">the documentation</a>.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Node Similarity: example graph</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Before you run the Node Similarity algorithm, you have to create a projected graph that consists of GOT characters and the various entities to which they relate.
The task will be to find similar characters by comparing the books they appear or die in, and the houses and cultures to which they belong.
It is a bipartite graph between <code>Person</code> on one side and <code>Book</code>, <code>House</code>, and <code>Culture</code> on the other side.</p>
</div>
<div class="paragraph">
<p>You create the graph using the following query:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project('got-character-related-entities', ['Person', 'Book', 'House', 'Culture'], '*')<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>This graph creation uses projection with multiple node labels.
You load all types of relationships with <code>*</code>.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Node Similarity: simple run</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Now, you can run Node Similarity with the default settings and extract the top 10 most similar pairs of characters.
The algorithm computes similarities only for <code>Person</code> nodes, as they are the only nodes with outgoing relationships.
To get more interesting results, you can use the configuration option <code>degreeCutoff</code> to consider only characters with at least 20 related entities.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.nodeSimilarity.stream(
  'got-character-related-entities',
  {
    degreeCutoff: 20
  }
)
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS character1, gds.util.asNode(node2).name AS character2, similarity
ORDER BY similarity DESC
LIMIT 10<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Node Similarity: similarity cutoff</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>In most real-world graphs, the number of pairs of nodes to compare is huge, and most pairs are not similar.
Therefore, it is useful to be able to limit the output.
There are several ways to deal with this.
One way is to set a threshold for a minimum similarity by specifying the <code>similarityCutoff</code> property.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.nodeSimilarity.stream(
  'got-character-related-entities',
  {
    degreeCutoff: 20,
    similarityCutoff: 0.45
  }
)
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS character1, gds.util.asNode(node2).name AS character2, similarity
ORDER BY similarity DESC<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Note that you no longer need to use the LIMIT clause.</p>
</div>
<div class="paragraph">
<p>By default, the <code>similarityCutoff</code> value is a very small number, effectively filtering out pairs that have zero similarity.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Node Similarity: topN</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>You can also limit the number of similarities returned by using the <code>topN</code> config option.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.nodeSimilarity.stream(
  'got-character-related-entities',
  {
    degreeCutoff: 20,
    topN: 10
  }
)
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS character1, gds.util.asNode(node2).name AS character2, similarity
ORDER BY similarity DESC<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>This algorithm-specific way of limiting is more memory efficient than constructing the entire stream and using the LIMIT clause afterwards.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Node Similarity: topK</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Another way to limit the results is the <code>topK</code> config option.
The algorithm output will be the <code>K</code> most similar characters for each character.
Let&#8217;s set this value to 1, and see if Loras Tyrell has only one similar neighbor instead of two.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.nodeSimilarity.stream(
  'got-character-related-entities',
  {
    degreeCutoff: 20,
    topN: 10,
    topK: 1
  }
)
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS character1, gds.util.asNode(node2).name AS character2, similarity
ORDER BY similarity DESC<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Did you notice anything surprising?
Loras Tyrell still appeared twice as character2.<br>
With <code>topK: 1</code>, the algorithm returns only the single most similar character for each character1, so Loras appears at most once in that column.
However, several other characters may each have Loras as their most similar neighbor, which is why he can still appear more than once as character2.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Node Similarity: bottomN and bottomK</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>The <code>bottomN</code> and <code>bottomK</code> config options limit the results in the same way as <code>topN</code> and <code>topK</code>, but return the least similar pairs.</p>
</div>
<div class="paragraph">
<p>Why don&#8217;t you try it yourself?</p>
</div>
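<div class="paragraph">
<p>If you want a starting point, the sketch below mirrors the <code>topN</code> query with <code>bottomN</code> instead. It assumes the <code>got-character-related-entities</code> graph from above; the values 20 and 10 are simply carried over from the earlier examples. Note the ascending sort order, which lists the least similar pairs first.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.nodeSimilarity.stream(
  'got-character-related-entities',
  {
    degreeCutoff: 20,
    bottomN: 10
  }
)
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS character1, gds.util.asNode(node2).name AS character2, similarity
// Least similar pairs first
ORDER BY similarity ASC<!--/code--></pre>
</div>
</div>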
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Node Similarity: writing</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Now, let&#8217;s see how to write similarity scores back to Neo4j.
The output of the algorithm is written as weighted relationships between the compared nodes.
Each relationship stores the computed similarity score of the node pair it connects.
The config option <code>writeRelationshipType</code> specifies the relationship type, and <code>writeProperty</code> specifies the name of the property that holds the score.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.nodeSimilarity.write(
  'got-character-related-entities',
  {
    degreeCutoff: 20,
    topN: 10,
    topK: 1,
    writeRelationshipType: 'SIMILARITY',
    writeProperty: 'character_similarity'
  }
)<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The query writes 10 relationships, as limited by the <code>topN</code> value.</p>
</div>
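<div class="paragraph">
<p>To inspect what was written, you can match the new relationships directly in the database. This sketch assumes the write query above has been run with the relationship type <code>SIMILARITY</code> and the property <code>character_similarity</code>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Read the relationships written by gds.nodeSimilarity.write
MATCH (a:Person)-[s:SIMILARITY]-&gt;(b:Person)
RETURN a.name AS character1, b.name AS character2, s.character_similarity AS similarity
ORDER BY similarity DESC<!--/code--></pre>
</div>
</div>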
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Node Similarity: cleanup</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>To clean up the in-memory graph created during the tutorial, you can run the following query:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.drop('got-character-related-entities');<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Triangle Count</h3>
    <br/>
    <div>
      <div class="paragraph">
<p><strong>Since GDS 1.2</strong></p>
</div>
<div class="paragraph">
<p>A triangle in a graph is a set of three nodes all connected to each other.
The triangle count of a node is the number of triangles that node belongs to.
The Graph Data Science library provides procedures for all standard execution modes in the namespace <code>gds.triangleCount</code>.
The algorithm is only defined for undirected graphs, so we make sure to fulfil this requirement in the examples below.</p>
</div>
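<div class="paragraph">
<p>The definition can also be expressed as a plain Cypher pattern, without GDS. The sketch below counts triangles among <code>Person</code> nodes over <code>INTERACTS_1</code> relationships, ignoring direction. The ordering on node ids is only there so that each set of three nodes is counted once. This is meant to illustrate the definition; pattern matching like this can become slow on larger graphs.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Three nodes that are all connected to each other, direction ignored
MATCH (a:Person)-[:INTERACTS_1]-(b:Person)-[:INTERACTS_1]-(c:Person)-[:INTERACTS_1]-(a)
// Order the ids so each triangle is matched as one node triple only
WHERE id(a) &lt; id(b) AND id(b) &lt; id(c)
// DISTINCT guards against parallel relationships between the same pair
RETURN count(DISTINCT [id(a), id(b), id(c)]) AS triangles<!--/code--></pre>
</div>
</div>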
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Triangle Count: examples</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>In order to better understand the concept of triangle counting, let us visualize a part of the GoT graph.
We will select two characters and only include relationships from the first book between them and their neighbours.
First make sure to uncheck 'Connect result nodes' in the settings of Neo4j Browser.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (n:Person)-[r:INTERACTS_1]-&gt;(m:Person)
WHERE n.name IN ["Robb Stark", "Tyrion Lannister"]
RETURN n, m, r<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>How many triangles do you see?</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Triangle Count: examples</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Let us verify with the triangle count procedure executed on the same subgraph as above.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project.cypher(
  'small_got',
  'MATCH (n:Person) RETURN id(n) AS id',
  "MATCH (n:Person)-[r:INTERACTS_1]-&gt;(m:Person)
   WHERE n.name IN ['Robb Stark', 'Tyrion Lannister']
   RETURN id(n) AS source, id(m) AS target
   UNION
   MATCH (n:Person)-[r:INTERACTS_1]-&gt;(m:Person)
   WHERE n.name IN ['Robb Stark', 'Tyrion Lannister']
   RETURN id(m) AS source, id(n) AS target"
)<!--/code--></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.triangleCount.stream('small_got')
YIELD nodeId, triangleCount
WITH gds.util.asNode(nodeId).name AS name, triangleCount
WHERE triangleCount &gt; 0
RETURN name, triangleCount<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>As you might have seen, there are exactly two triangles, which gives Tyrion and Robb a triangle count of two each, and Tywin and Yoren a triangle count of one each.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Triangle Count: examples</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>To find the people with the highest overall triangle count in book 1, we can do the following:</p>
</div>
<div class="listingblock">
<div class="title">This will create the named graph we are going to use in the examples (run if not already created)</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project(
  'got-interactions-1',
  'Person',
  {
    INTERACTS_1: {
      orientation: 'UNDIRECTED'
    }
  }
);<!--/code--></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.triangleCount.stream('got-interactions-1')
YIELD nodeId, triangleCount
RETURN gds.util.asNode(nodeId).name AS name, triangleCount
ORDER BY triangleCount DESC
LIMIT 10<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Does Eddard Stark perhaps have an inclination for triangle dramas?</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Triangle Count: stats mode</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>The stats mode can be used to compute the total number of triangles in the graph.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.triangleCount.stats('got-interactions-1')
YIELD globalTriangleCount<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Triangle Count: max degree</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>For nodes with a high degree, it is expensive to compute the triangle count.
You can exclude nodes whose degree exceeds a given limit by setting the configuration option <code>maxDegree</code> as follows.
For each excluded node, the triangle count will be reported as <code>-1</code>.
These nodes will also be excluded from the triangle counts of the adjacent nodes.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.triangleCount.stream('got-interactions-1', {maxDegree: 10})
YIELD nodeId, triangleCount
WHERE triangleCount &lt;&gt; 0
RETURN gds.util.asNode(nodeId).name AS name, triangleCount
LIMIT 20<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Note that, for example, Eddard Stark is no longer among the characters with a high triangle count, since his degree of 51 exceeds the <code>maxDegree</code> setting.
Moreover, the triangle counts for nodes of lower degrees are also affected.
You can verify this by running the query below with and without <code>maxDegree</code>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.triangleCount.stream('got-interactions-1', {maxDegree: 10})
YIELD nodeId, triangleCount
WITH gds.util.asNode(nodeId).name AS name, triangleCount
WHERE name = "Halder"
RETURN name, triangleCount<!--/code--></pre>
</div>
</div>
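<div class="paragraph">
<p>For comparison, here is a sketch of the same query with the <code>maxDegree</code> option left out.
Since excluding nodes can only remove triangles, the count returned for Halder here should be at least as large as the one above.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Same query as above, but without the maxDegree configuration
CALL gds.triangleCount.stream('got-interactions-1')
YIELD nodeId, triangleCount
WITH gds.util.asNode(nodeId).name AS name, triangleCount
WHERE name = "Halder"
RETURN name, triangleCount<!--/code--></pre>
</div>
</div>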
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Triangle Count: cleanup</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>To clean up the in-memory graph created during the Triangle Count exercise, run the following query:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.drop('got-interactions-1');<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Local Clustering Coefficient</h3>
    <br/>
    <div>
      <div class="paragraph">
<p><strong>Since GDS 1.2</strong></p>
</div>
<div class="paragraph">
<p>The local clustering coefficient is a metric quantifying how connected the neighborhood of a node is.
It is the probability that two random neighbors of the node are connected in the graph.
This can be obtained from the triangle count and the degree of the node.
The Graph Data Science library provides procedures for all standard execution modes in the namespace <code>gds.localClusteringCoefficient</code>.
The algorithm is only defined for undirected graphs, so we make sure to fulfil this requirement in the examples below.</p>
</div>
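<div class="paragraph">
<p>To make the relationship concrete: a node of degree <code>d</code> has <code>d * (d - 1) / 2</code> pairs of neighbors, and each triangle through the node corresponds to one connected pair, so the coefficient is <code>2 * triangles / (d * (d - 1))</code>.
The following sketch evaluates this for a hypothetical node; the values 3 and 4 are made up for illustration and are not taken from the GoT data.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Hypothetical node: degree 4, part of 3 triangles
// 4 neighbors form 6 pairs, 3 of which are connected
WITH 3 AS triangles, 4 AS degree
RETURN 2.0 * triangles / (degree * (degree - 1)) AS localClusteringCoefficient<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>This returns 0.5. The formula is only meaningful for nodes with at least two neighbors.</p>
</div>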
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Local Clustering Coefficient: stream mode</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>To find the people with the highest local clustering coefficient in book 1, we can do the following:</p>
</div>
<div class="listingblock">
<div class="title">This will create the named graph we are going to use in the examples (run if not already created)</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project(
  'got-interactions-1',
  'Person',
  {
    INTERACTS_1: {
      orientation: 'UNDIRECTED'
    }
  }
);<!--/code--></pre>
</div>
</div>
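<div class="paragraph">
<p>Projecting a graph under a name that is already in the catalog fails.
If you are unsure whether the projection above is needed, a catalog lookup along these lines can tell you whether <code>got-interactions-1</code> already exists.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// exists is true if the named graph is already in the catalog
CALL gds.graph.exists('got-interactions-1')
YIELD graphName, exists<!--/code--></pre>
</div>
</div>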
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.localClusteringCoefficient.stream('got-interactions-1')
YIELD nodeId, localClusteringCoefficient
RETURN gds.util.asNode(nodeId).name AS name, localClusteringCoefficient
ORDER BY localClusteringCoefficient DESC
LIMIT 10<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Several nodes have a local clustering coefficient of 1.0, but they have only a few neighbors and triangles, sometimes just a single triangle.
In the following example, we identify nodes with a high local clustering coefficient while filtering out nodes with a low triangle count.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Local Clustering Coefficient: triangleCountProperty</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>To compute the local clustering coefficient, we need to know the number of triangles for each node.
The Local Clustering Coefficient algorithm can reuse previously computed triangle counts.</p>
</div>
<div class="paragraph">
<p>First we compute the triangle counts and save them in the in-memory graph as a node property called <code>triangleCount</code>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.triangleCount.mutate('got-interactions-1', {mutateProperty: "triangleCount"})<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>In the following, we look at nodes which have both a high triangle count and a high local clustering coefficient.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.localClusteringCoefficient.stream('got-interactions-1', {triangleCountProperty: "triangleCount"})
YIELD nodeId, localClusteringCoefficient AS lcc
WITH gds.util.asNode(nodeId).name AS name, lcc, gds.util.nodeProperty('got-interactions-1', nodeId, "triangleCount") AS triangleCount
WHERE triangleCount &gt; 50
RETURN name, lcc, triangleCount
ORDER BY lcc DESC
LIMIT 10<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The people listed here might be regarded as central in medium to large communities.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Local Clustering Coefficient: stats mode</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>To see if the GoT person graph of book 1 is a small-world network, we can run the following:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.localClusteringCoefficient.stats('got-interactions-1')
YIELD averageClusteringCoefficient<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>As we see, the average clustering coefficient of around 0.036 is rather small.
In comparison, clustering coefficients of 0.11 have been reported for the World Wide Web and 0.59 for a network of company directors.
The explanation for the lack of small-world structure could be that there are many characters in GoT, and it would require even more pages to turn them into a small-world network.</p>
</div>
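<div class="paragraph">
<p>One way to sanity-check this value is to average the streamed per-node coefficients yourself.
This sketch assumes that the stats procedure reports the plain mean over all nodes of the projected graph, in which case the two numbers should agree up to floating-point rounding.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Mean of the per-node local clustering coefficients
CALL gds.localClusteringCoefficient.stream('got-interactions-1')
YIELD localClusteringCoefficient
RETURN avg(localClusteringCoefficient) AS averageClusteringCoefficient<!--/code--></pre>
</div>
</div>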
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Local Clustering Coefficient: cleanup</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>To clean up the in-memory graph created during the Local Clustering Coefficient exercise, run the following query:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.drop('got-interactions-1');<!--/code--></pre>
</div>
</div>
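<div class="paragraph">
<p>If the graph is no longer in the catalog, for example because you already dropped it after the Triangle Count exercise, the call above raises an error.
A variant that should tolerate a missing graph passes <code>false</code> as the second argument (<code>failIfMissing</code>).</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Second argument is failIfMissing: false means a missing graph is not an error
CALL gds.graph.drop('got-interactions-1', false)
YIELD graphName<!--/code--></pre>
</div>
</div>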
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Betweenness Centrality</h3>
    <br/>
    <div>
      <div class="imageblock" style="float: right;">
<div class="content">
<img src="https://upload.wikimedia.org/wikipedia/commons/6/60/Graph_betweenness.svg" alt="Graph betweenness" width="300">
</div>
</div>
<div class="paragraph">
<p><strong>Since GDS 1.3</strong></p>
</div>
<div class="paragraph">
<p>Betweenness Centrality is a way of detecting the amount of influence a node has over the flow of information in a graph.
It is often used to find nodes that serve as a bridge from one part of a graph to another.</p>
</div>
<div class="paragraph">
<p><strong>How Betweenness Centrality works</strong></p>
</div>
<div class="paragraph">
<p>The algorithm calculates unweighted shortest paths between all pairs of nodes in a graph.
Each node receives a score, based on the number of shortest paths that pass through the node.
Nodes that more frequently lie on shortest paths between other nodes will have higher betweenness centrality scores.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Betweenness Centrality: stream mode</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Let&#8217;s find out who is influential in the graph by running Betweenness Centrality.
If you have removed the <code>got-interactions</code> graph from the catalog, you have to create it again:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.project('got-interactions', 'Person', {
  INTERACTS: {
    orientation: 'UNDIRECTED'
  }
})<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>First, run the Betweenness Centrality algorithm in <code>stream</code> mode.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.betweenness.stream('got-interactions') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>If you ran Page Rank previously, you may notice that the result is similar.
You can run the Page Rank query again and compare the result.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.pageRank.stream('got-interactions') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The result is similar, but not identical.
In general, Betweenness Centrality is a good metric for identifying bottlenecks and bridges in a graph, while Page Rank is used to understand the influence of a node in a network.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Betweenness Centrality: sampling</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Betweenness Centrality is computationally very expensive, because it considers shortest paths between all pairs of nodes.
To make it feasible on large graphs, the algorithm can sample.
Sampling means that shortest paths are computed for only a subset of the nodes.
The number of nodes sampled is configured using the <code>samplingSize</code> parameter.</p>
</div>
<div class="paragraph">
<p>Find out how many nodes are in your graph:</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.graph.list('got-interactions') YIELD nodeCount<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Decide how large a sample to use.
Here we use half the node count as <code>samplingSize</code>.
The appropriate sample size for a use case will depend on the size and shape of the graph, as well as the resources (RAM and CPU) available.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.betweenness.stream('got-interactions', {samplingSize: 1083}) YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10<!--/code--></pre>
</div>
</div>
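<div class="paragraph">
<p>Instead of copying the number by hand, the sample size can be derived from the catalog in the same query.
The sketch below halves <code>nodeCount</code> with integer division and also sets <code>samplingSeed</code> so that repeated runs pick the same sample; the seed value 42 is arbitrary and only an assumption for illustration.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Look up the node count and use half of it as the sample size
CALL gds.graph.list('got-interactions') YIELD nodeCount
WITH nodeCount / 2 AS samplingSize
CALL gds.betweenness.stream('got-interactions', {samplingSize: samplingSize, samplingSeed: 42})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10<!--/code--></pre>
</div>
</div>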
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Betweenness Centrality: stats, write and mutate</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>In stats mode, Betweenness Centrality returns summary statistics of the centrality scores, such as their distribution.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->CALL gds.betweenness.stats('got-interactions')
YIELD centralityDistribution<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>The write and mutate modes return the same statistics, in addition to writing results back to Neo4j or mutating the in-memory graph, respectively.</p>
</div>
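<div class="paragraph">
<p>As an illustration of mutate mode, the following sketch stores the scores in the in-memory graph only, leaving the Neo4j database untouched.
The property name <code>betweenness</code> is an assumption chosen for this example; running the query a second time is expected to fail because the property would already exist in the in-memory graph.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Adds a node property named 'betweenness' to the in-memory graph
CALL gds.betweenness.mutate('got-interactions', {mutateProperty: 'betweenness'})
YIELD nodePropertiesWritten, centralityDistribution<!--/code--></pre>
</div>
</div>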
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>The end</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>You just learnt how to explore the graph structure and elements by computing centrality and similarity scores, and detecting communities.<br>
To learn more about the Neo4j Graph Data Science (GDS) library, see <a href="https://neo4j.com/docs/graph-data-science/current/" target="_blank">the documentation</a>.</p>
</div>
	</div>
  </div>
</slide>
  </carousel>
</article>