From 421662e6438fbfe1a3c98e4742d06f90b449df5c Mon Sep 17 00:00:00 2001 From: Tomaz Bratanic Date: Sun, 17 Apr 2022 14:37:11 +0100 Subject: [PATCH] Update to GDS 2.0 --- .../Exploratory graph analysis.ipynb | 7304 +++++++++++------ 1 file changed, 4727 insertions(+), 2577 deletions(-) diff --git a/Marvel_series/Exploratory graph analysis.ipynb b/Marvel_series/Exploratory graph analysis.ipynb index ca46e2a..c94d358 100644 --- a/Marvel_series/Exploratory graph analysis.ipynb +++ b/Marvel_series/Exploratory graph analysis.ipynb @@ -1,2642 +1,4792 @@ { - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# Define Neo4j connections\n", - "import pandas as pd\n", - "from neo4j import GraphDatabase\n", - "host = 'neo4j://localhost:7687'\n", - "user = 'neo4j'\n", - "password = 'letmein'\n", - "driver = GraphDatabase.driver(host,auth=(user, password))" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def run_query(query):\n", - " with driver.session() as session:\n", - " result = session.run(query)\n", - " return pd.DataFrame([r.values() for r in result], columns=result.keys())" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "import_queries = \"\"\"\n", - "\n", - "CALL apoc.schema.assert({Character:['name']},{Comic:['id'], Character:['id'], Event:['id'], Group:['id']});\n", - "\n", - "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroes.csv\" as row\n", - "CREATE (c:Character)\n", - "SET c += row;\n", - "\n", - "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/groups.csv\" as row\n", - "CREATE (c:Group)\n", - "SET c += row;\n", - "\n", - "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/events.csv\" as row\n", - "CREATE (c:Event)\n", - 
"SET c += row;\n", - "\n", - "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/comics.csv\" as row\n", - "CREATE (c:Comic)\n", - "SET c += apoc.map.clean(row,[],[\"null\"]);\n", - "\n", - "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroToComics.csv\" as row\n", - "MATCH (c:Character{id:row.hero})\n", - "MATCH (co:Comic{id:row.comic})\n", - "MERGE (c)-[:APPEARED_IN]->(co);\n", - "\n", - "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroToEvent.csv\" as row\n", - "MATCH (c:Character{id:row.hero})\n", - "MATCH (e:Event{id:row.event})\n", - "MERGE (c)-[:PART_OF_EVENT]->(e);\n", - "\n", - "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroToGroup.csv\" as row\n", - "MATCH (c:Character{id:row.hero})\n", - "MATCH (g:Group{id:row.group})\n", - "MERGE (c)-[:PART_OF_GROUP]->(g);\n", - "\n", - "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroToHero.csv\" as row\n", - "MATCH (s:Character{id:row.source})\n", - "MATCH (t:Character{id:row.target})\n", - "CALL apoc.create.relationship(s,row.type, {}, t) YIELD rel\n", - "RETURN distinct 'done';\n", - "\n", - "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroStats.csv\" as row\n", - "MATCH (s:Character{id:row.hero})\n", - "CREATE (s)-[:HAS_STATS]->(stats:Stats)\n", - "SET stats += apoc.map.clean(row,['hero'],[]);\n", - "\n", - "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroFlight.csv\" as row\n", - "MATCH (s:Character{id:row.hero})\n", - "SET s.flight = row.flight;\n", - "\n", - "MATCH (s:Stats)\n", - "WITH keys(s) as keys LIMIT 1\n", - "MATCH (s:Stats)\n", - "UNWIND keys as key\n", - "CALL apoc.create.setProperty(s, key, toInteger(s[key]))\n", - "YIELD 
node\n", - "RETURN distinct 'done';\n", - "\"\"\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Graph import" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with driver.session() as session:\n", - " for statement in import_queries.split(';'):\n", - " try:\n", - " session.run(statement.strip())\n", - " except:\n", - " pass" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Graph schema\n", - "In the center of the graph, there are characters, also known as heroes. They can appear in multiple comics, are part of an event, and can belong to a group. For some of the characters, we also know their stats like speed and fighting skills. Finally, we have social ties between characters that represent relative, ally, or enemy relationships.\n", - "\n", - "There are 1105 characters that have appeared in 38875 comics.\n", - "We have stats for 470 of the characters. There are also 92 groups and 74 events stored in the graph.\n", - "## Exploratory graph analysis\n", - "To get to know our graph, we will begin with a basic graph data exploration process. First, we will take a look at the characters that have most frequently appeared in comics." - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "source": [ + "* Updated to GDS 2.0 version\n", + "* Link to original blog post: https://towardsdatascience.com/exploratory-network-analysis-of-marvel-universe-c557f4959048" + ], + "metadata": { + "id": "iXLxc35asnnV" + } + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   character                         comics
0  Spider-Man (1602)                 3357
1  Tony Stark                        2354
2  Logan                             2098
3  Steve Rogers                      2019
4  Thor (Marvel: Avengers Alliance)  1547
\n", - "
" + "cell_type": "code", + "source": [ + "!pip install neo4j" ], - "text/plain": [ - " character comics\n", - "0 Spider-Man (1602) 3357\n", - "1 Tony Stark 2354\n", - "2 Logan 2098\n", - "3 Steve Rogers 2019\n", - "4 Thor (Marvel: Avengers Alliance) 1547" + "metadata": { + "id": "a51ym4p3stDD", + "outputId": "8983fc4d-63f1-4537-ea41-c9c146601ee5", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 1, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Collecting neo4j\n", + " Downloading neo4j-4.4.2.tar.gz (89 kB)\n", + "\u001b[?25l\r\u001b[K |███▋ | 10 kB 19.9 MB/s eta 0:00:01\r\u001b[K |███████▎ | 20 kB 13.6 MB/s eta 0:00:01\r\u001b[K |███████████ | 30 kB 10.5 MB/s eta 0:00:01\r\u001b[K |██████████████▋ | 40 kB 9.2 MB/s eta 0:00:01\r\u001b[K |██████████████████▎ | 51 kB 4.3 MB/s eta 0:00:01\r\u001b[K |██████████████████████ | 61 kB 5.0 MB/s eta 0:00:01\r\u001b[K |█████████████████████████▋ | 71 kB 5.7 MB/s eta 0:00:01\r\u001b[K |█████████████████████████████▎ | 81 kB 6.1 MB/s eta 0:00:01\r\u001b[K |████████████████████████████████| 89 kB 4.0 MB/s \n", + "\u001b[?25hRequirement already satisfied: pytz in /usr/local/lib/python3.7/dist-packages (from neo4j) (2018.9)\n", + "Building wheels for collected packages: neo4j\n", + " Building wheel for neo4j (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", + " Created wheel for neo4j: filename=neo4j-4.4.2-py3-none-any.whl size=115365 sha256=37440807321d6c4c73bd85c38c1683cf854f8a8e68d3ed332d5782abd497de89\n", + " Stored in directory: /root/.cache/pip/wheels/10/d6/28/95029d7f69690dbc3b93e4933197357987de34fbd44b50a0e4\n", + "Successfully built neo4j\n", + "Installing collected packages: neo4j\n", + "Successfully installed neo4j-4.4.2\n" + ] + } ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (c:Character)\n", - "RETURN c.name as character, \n", - " size((c)-[:APPEARED_IN]->()) as comics\n", - "ORDER BY comics DESC\n", - "LIMIT 5\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The top five most frequent characters come as no surprise. Spiderman is the most frequent or popular character. It is no wonder that they created a younger version of Spiderman just recently, given his popularity. Tony Stark, also known as Iron Man, is in second place. It seems that Logan, also known as Wolverine, was quite popular throughout history, but I think that his popularity slowly faded away in recent times. Steve Rogers, who goes by the more popular name Captain America, is also quite famous. It would seem that the recent Marvel movies showcased the more popular characters from the comics.\n", - "\n", - "Next, we will look at how many comics were released throughout the decades. The year of the comic is stored as a string in our graph, so we can use the substringfunction to extract the decade." - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
    decade  count
0   1930    95
1   1940    584
2   1950    756
3   1960    4114
4   1970    1956
5   1980    2428
6   1990    3738
7   2000    8309
8   2010    11139
9   2020    19
10  None    5737
\n", - "
" + "cell_type": "markdown", + "source": [ + "I recommend you setup a [blank project on Neo4j Sandbox environment](https://sandbox.neo4j.com/?usecase=blank-sandbox), but you can also use other environment versions\n", + "\n" ], - "text/plain": [ - " decade count\n", - "0 1930 95\n", - "1 1940 584\n", - "2 1950 756\n", - "3 1960 4114\n", - "4 1970 1956\n", - "5 1980 2428\n", - "6 1990 3738\n", - "7 2000 8309\n", - "8 2010 11139\n", - "9 2020 19\n", - "10 None 5737" + "metadata": { + "id": "_-yzSIlvswPi" + } + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "uIStX_HGSsH-" + }, + "outputs": [], + "source": [ + "# Define Neo4j connections\n", + "import pandas as pd\n", + "from neo4j import GraphDatabase\n", + "\n", + "host = 'bolt://3.235.2.228:7687'\n", + "user = 'neo4j'\n", + "password = 'seats-drunks-carbon'\n", + "driver = GraphDatabase.driver(host,auth=(user, password))" ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (c:Comic)\n", - "RETURN substring(c.year, 0, 3) + \"0\" as decade, \n", - " count(*) as count\n", - "ORDER BY decade ASC\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Interesting to see that the first comics were produced in the 1930s. Some of the heroes are relatively senior by now. There was a spike in the 1960s and then gradual progression over the decades with 11.139 comics in the 2010s. The last column represents comics with a null date, so for around 6000 comics out of 38000, we don’t have the date available. And we haven’t scraped all the comics in the 2020s either.\n", - "\n", - "Next, we will take a look at the most popular characters in the comics throughout the decades. We will iterate over comics and extract the top three most frequent heroes by the decade." 
- ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   decade  top_3_characters
0  1930    [Johnny Storm, Sub-Mariner, Archangel]
1  1940    [Johnny Storm, Two-Gun Kid, Steve Rogers]
2  1950    [Rawhide Kid, Tony Stark, Stephen Strange]
3  1960    [Thor (Marvel: Avengers Alliance), Spider-Man ...
4  1970    [Spider-Man (1602), Stephen Strange, Shang-Chi...
5  1980    [Logan, Tony Stark, Spider-Man (1602)]
6  1990    [Spider-Man (1602), Tony Stark, Steve Rogers]
7  2000    [Spider-Man (1602), Tony Stark, Logan]
8  2010    [Spider-Man (1602), Steve Rogers, Logan]
\n", - "
" + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "I3C3cMuiSsIC" + }, + "outputs": [], + "source": [ + "def run_query(query):\n", + " with driver.session() as session:\n", + " result = session.run(query)\n", + " return pd.DataFrame([r.values() for r in result], columns=result.keys())" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "vVOKmHldSsIC" + }, + "outputs": [], + "source": [ + "import_queries = \"\"\"\n", + "\n", + "CALL apoc.schema.assert({Character:['name']},{Comic:['id'], Character:['id'], Event:['id'], Group:['id']});\n", + "\n", + "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroes.csv\" as row\n", + "CREATE (c:Character)\n", + "SET c += row;\n", + "\n", + "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/groups.csv\" as row\n", + "CREATE (c:Group)\n", + "SET c += row;\n", + "\n", + "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/events.csv\" as row\n", + "CREATE (c:Event)\n", + "SET c += row;\n", + "\n", + "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/comics.csv\" as row\n", + "CREATE (c:Comic)\n", + "SET c += apoc.map.clean(row,[],[\"null\"]);\n", + "\n", + "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroToComics.csv\" as row\n", + "MATCH (c:Character{id:row.hero})\n", + "MATCH (co:Comic{id:row.comic})\n", + "MERGE (c)-[:APPEARED_IN]->(co);\n", + "\n", + "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroToEvent.csv\" as row\n", + "MATCH (c:Character{id:row.hero})\n", + "MATCH (e:Event{id:row.event})\n", + "MERGE (c)-[:PART_OF_EVENT]->(e);\n", + "\n", + "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroToGroup.csv\" as row\n", 
+ "MATCH (c:Character{id:row.hero})\n", + "MATCH (g:Group{id:row.group})\n", + "MERGE (c)-[:PART_OF_GROUP]->(g);\n", + "\n", + "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroToHero.csv\" as row\n", + "MATCH (s:Character{id:row.source})\n", + "MATCH (t:Character{id:row.target})\n", + "CALL apoc.create.relationship(s,row.type, {}, t) YIELD rel\n", + "RETURN distinct 'done';\n", + "\n", + "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroStats.csv\" as row\n", + "MATCH (s:Character{id:row.hero})\n", + "CREATE (s)-[:HAS_STATS]->(stats:Stats)\n", + "SET stats += apoc.map.clean(row,['hero'],[]);\n", + "\n", + "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/Marvel/heroFlight.csv\" as row\n", + "MATCH (s:Character{id:row.hero})\n", + "SET s.flight = row.flight;\n", + "\n", + "MATCH (s:Stats)\n", + "WITH keys(s) as keys LIMIT 1\n", + "MATCH (s:Stats)\n", + "UNWIND keys as key\n", + "CALL apoc.create.setProperty(s, key, toInteger(s[key]))\n", + "YIELD node\n", + "RETURN distinct 'done';\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uPqZhSJXSsIE" + }, + "source": [ + "## Graph import" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "0SgWfhB9SsIF" + }, + "outputs": [], + "source": [ + "with driver.session() as session:\n", + " for statement in import_queries.split(';'):\n", + " try:\n", + " session.run(statement.strip())\n", + " except:\n", + " pass" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pDc-7vY2SsIG" + }, + "source": [ + "## Graph schema\n", + "In the center of the graph, there are characters, also known as heroes. They can appear in multiple comics, are part of an event, and can belong to a group. For some of the characters, we also know their stats like speed and fighting skills. 
Finally, we have social ties between characters that represent relative, ally, or enemy relationships.\n", + "\n", + "There are 1105 characters that have appeared in 38875 comics.\n", + "We have stats for 470 of the characters. There are also 92 groups and 74 events stored in the graph.\n", + "## Exploratory graph analysis\n", + "To get to know our graph, we will begin with a basic graph data exploration process. First, we will take a look at the characters that have most frequently appeared in comics." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "5znOehe4SsIG", + "outputId": "2637175d-e557-4577-8a63-d4fad1bac047", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " character comics\n", + "0 Spider-Man (1602) 3357\n", + "1 Tony Stark 2354\n", + "2 Logan 2098\n", + "3 Steve Rogers 2019\n", + "4 Thor (Marvel: Avengers Alliance) 1547" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   character                         comics
0  Spider-Man (1602)                 3357
1  Tony Stark                        2354
2  Logan                             2098
3  Steve Rogers                      2019
4  Thor (Marvel: Avengers Alliance)  1547
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 6 + } ], - "text/plain": [ - " decade top_3_characters\n", - "0 1930 [Johnny Storm, Sub-Mariner, Archangel]\n", - "1 1940 [Johnny Storm, Two-Gun Kid, Steve Rogers]\n", - "2 1950 [Rawhide Kid, Tony Stark, Stephen Strange]\n", - "3 1960 [Thor (Marvel: Avengers Alliance), Spider-Man ...\n", - "4 1970 [Spider-Man (1602), Stephen Strange, Shang-Chi...\n", - "5 1980 [Logan, Tony Stark, Spider-Man (1602)]\n", - "6 1990 [Spider-Man (1602), Tony Stark, Steve Rogers]\n", - "7 2000 [Spider-Man (1602), Tony Stark, Logan]\n", - "8 2010 [Spider-Man (1602), Steve Rogers, Logan]" + "source": [ + "run_query(\"\"\"\n", + "MATCH (c:Character)\n", + "RETURN c.name as character, \n", + " size((c)-[:APPEARED_IN]->()) as comics\n", + "ORDER BY comics DESC\n", + "LIMIT 5\n", + "\"\"\")" ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (c:Comic)<-[:APPEARED_IN]-(c1:Character)\n", - "WHERE NOT c.year = \"null\"\n", - "WITH substring(c.year,0,3) + \"0\" as decade, \n", - " c1.name as character, \n", - " count(*) as count\n", - "ORDER BY count DESC\n", - "RETURN decade, collect(character)[..3] as top_3_characters\n", - "ORDER BY decade\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It seems it all started with Johnny Storm, also known as the Human Torch. Iron Man (Tony Stark) was already popular in the 1950s, and Spiderman and Captain America (Steve Rogers) have risen in popularity in the 1960s. From then on, it seems that Spiderman, Wolverine, Iron Man, and Captain America win the popularity contest.\n", - "You might be wondering what the events are in our graph, so let’s take a look. We will examine the events with the highest count of participating heroes." - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   event               count_of_heroes  start                end                  description
0  Fear Itself         132              2011-04-16 00:00:00  2011-10-18 00:00:00  The Serpent, God of Fear and brother to the Al...
1  Dark Reign          128              2008-12-01 00:00:00  2009-12-31 12:59:00  Norman Osborn came out the hero of Secret Inva...
2  Acts of Vengeance!  93               1989-12-10 00:00:00  2008-01-04 00:00:00  Loki sets about convincing the super-villains ...
3  Secret Invasion     89               2008-06-02 00:00:00  2009-01-25 00:00:00  The shape-shifting Skrulls have been infiltrat...
4  Civil War           86               2006-07-01 00:00:00  2007-01-29 00:00:00  After a horrific tragedy raises questions on w...
\n", - "
" + "cell_type": "markdown", + "metadata": { + "id": "DH8gYWHDSsII" + }, + "source": [ + "The top five most frequent characters come as no surprise. Spiderman is the most frequent or popular character. It is no wonder that they created a younger version of Spiderman just recently, given his popularity. Tony Stark, also known as Iron Man, is in second place. It seems that Logan, also known as Wolverine, was quite popular throughout history, but I think that his popularity slowly faded away in recent times. Steve Rogers, who goes by the more popular name Captain America, is also quite famous. It would seem that the recent Marvel movies showcased the more popular characters from the comics.\n", + "\n", + "Next, we will look at how many comics were released throughout the decades. The year of the comic is stored as a string in our graph, so we can use the substringfunction to extract the decade." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "WVFZR8EhSsIJ", + "outputId": "6c89d8ca-c443-4e67-f451-5baa1d9017ed", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 394 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " decade count\n", + "0 1930 95\n", + "1 1940 584\n", + "2 1950 756\n", + "3 1960 4114\n", + "4 1970 1956\n", + "5 1980 2428\n", + "6 1990 3738\n", + "7 2000 8309\n", + "8 2010 11139\n", + "9 2020 19\n", + "10 None 5737" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    decade  count
0   1930    95
1   1940    584
2   1950    756
3   1960    4114
4   1970    1956
5   1980    2428
6   1990    3738
7   2000    8309
8   2010    11139
9   2020    19
10  None    5737
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 7 + } ], - "text/plain": [ - " event count_of_heroes start \\\n", - "0 Fear Itself 132 2011-04-16 00:00:00 \n", - "1 Dark Reign 128 2008-12-01 00:00:00 \n", - "2 Acts of Vengeance! 93 1989-12-10 00:00:00 \n", - "3 Secret Invasion 89 2008-06-02 00:00:00 \n", - "4 Civil War 86 2006-07-01 00:00:00 \n", - "\n", - " end description \n", - "0 2011-10-18 00:00:00 The Serpent, God of Fear and brother to the Al... \n", - "1 2009-12-31 12:59:00 Norman Osborn came out the hero of Secret Inva... \n", - "2 2008-01-04 00:00:00 Loki sets about convincing the super-villains ... \n", - "3 2009-01-25 00:00:00 The shape-shifting Skrulls have been infiltrat... \n", - "4 2007-01-29 00:00:00 After a horrific tragedy raises questions on w... " + "source": [ + "run_query(\"\"\"\n", + "MATCH (c:Comic)\n", + "RETURN substring(c.year, 0, 3) + \"0\" as decade, \n", + " count(*) as count\n", + "ORDER BY decade ASC\n", + "\"\"\")" ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (e:Event)\n", - "RETURN e.title as event, \n", - " size((e)<-[:PART_OF_EVENT]-()) as count_of_heroes,\n", - " e.start as start,\n", - " e.end as end,\n", - " e.description as description \n", - "ORDER BY count_of_heroes DESC \n", - "LIMIT 5\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "I have little to no idea what these events represent, but it is interesting to see that many characters participate. Most of the events span over less than a year, while the Acts of Vengeance spans over two decades. And judging by the description, Loki had something to do with it along with 92! other characters. Unfortunately, we don’t have the connection between comics and events stored in our graph to allow further analysis. 
If someone will scrape the Marvel API, I will gladly add it to the dataset.\n", - "\n", - "Let’s also take a look at the biggest groups of characters." - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   group                    members
0  X-Men                    41
1  Avengers                 31
2  Defenders                26
3  Next Avengers            14
4  Guardians of the Galaxy  12
\n", - "
" + "cell_type": "markdown", + "metadata": { + "id": "lDjnXxBYSsIJ" + }, + "source": [ + "Interesting to see that the first comics were produced in the 1930s. Some of the heroes are relatively senior by now. There was a spike in the 1960s and then gradual progression over the decades with 11.139 comics in the 2010s. The last column represents comics with a null date, so for around 6000 comics out of 38000, we don’t have the date available. And we haven’t scraped all the comics in the 2020s either.\n", + "\n", + "Next, we will take a look at the most popular characters in the comics throughout the decades. We will iterate over comics and extract the top three most frequent heroes by the decade." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "r03jU04TSsIK", + "outputId": "a6f64b13-a35c-4218-b653-792564b2c9c9", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 332 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " decade top_3_characters\n", + "0 1930 [Johnny Storm, Sub-Mariner, Archangel]\n", + "1 1940 [Johnny Storm, Two-Gun Kid, Steve Rogers]\n", + "2 1950 [Rawhide Kid, Tony Stark, Stephen Strange]\n", + "3 1960 [Thor (Marvel: Avengers Alliance), Spider-Man ...\n", + "4 1970 [Spider-Man (1602), Stephen Strange, Shang-Chi...\n", + "5 1980 [Logan, Tony Stark, Spider-Man (1602)]\n", + "6 1990 [Spider-Man (1602), Tony Stark, Steve Rogers]\n", + "7 2000 [Spider-Man (1602), Tony Stark, Logan]\n", + "8 2010 [Spider-Man (1602), Steve Rogers, Logan]" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   decade  top_3_characters
0  1930    [Johnny Storm, Sub-Mariner, Archangel]
1  1940    [Johnny Storm, Two-Gun Kid, Steve Rogers]
2  1950    [Rawhide Kid, Tony Stark, Stephen Strange]
3  1960    [Thor (Marvel: Avengers Alliance), Spider-Man ...
4  1970    [Spider-Man (1602), Stephen Strange, Shang-Chi...
5  1980    [Logan, Tony Stark, Spider-Man (1602)]
6  1990    [Spider-Man (1602), Tony Stark, Steve Rogers]
7  2000    [Spider-Man (1602), Tony Stark, Logan]
8  2010    [Spider-Man (1602), Steve Rogers, Logan]
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 8 + } ], - "text/plain": [ - " group members\n", - "0 X-Men 41\n", - "1 Avengers 31\n", - "2 Defenders 26\n", - "3 Next Avengers 14\n", - "4 Guardians of the Galaxy 12" + "source": [ + "run_query(\"\"\"\n", + "MATCH (c:Comic)<-[:APPEARED_IN]-(c1:Character)\n", + "WHERE NOT c.year = \"null\"\n", + "WITH substring(c.year,0,3) + \"0\" as decade, \n", + " c1.name as character, \n", + " count(*) as count\n", + "ORDER BY count DESC\n", + "RETURN decade, collect(character)[..3] as top_3_characters\n", + "ORDER BY decade\n", + "\"\"\")" ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (g:Group)\n", - "RETURN g.name as group, \n", - " size((g)<-[:PART_OF_GROUP]-()) as members\n", - "ORDER BY members DESC LIMIT 5\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "There are 41 characters in X-Men, which makes sense as they had a whole academy. You might be surprised by 31 members of Avengers, but in the comics, there were many members of Avengers, although most are former members.\n", - "\n", - "Just because we can, let’s inspect if some members of the same group are also enemies." - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K87WMRe8SsIK" + }, + "source": [ + "It seems it all started with Johnny Storm, also known as the Human Torch. Iron Man (Tony Stark) was already popular in the 1950s, and Spiderman and Captain America (Steve Rogers) have risen in popularity in the 1960s. From then on, it seems that Spiderman, Wolverine, Iron Man, and Captain America win the popularity contest.\n", + "You might be wondering what the events are in our graph, so let’s take a look. We will examine the events with the highest count of participating heroes." + ] + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
   character1                         character2               group
0  CAIN MARKO JUGGERNAUT              Storm (Marvel Heroes)    X-Men
1  Logan                              Mystique (House of M)    X-Men
2  CAIN MARKO JUGGERNAUT              Logan                    X-Men
3  Logan                              Sabretooth (House of M)  X-Men
4  Rogue (X-Men: Battle of the Atom)  Warren Worthington III   X-Men
\n", - "
" + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "-kYriTS7SsIL", + "outputId": "b6240ecf-037e-4e21-ad4f-502ba24ae8ff", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " event count_of_heroes start \\\n", + "0 Fear Itself 132 2011-04-16 00:00:00 \n", + "1 Dark Reign 128 2008-12-01 00:00:00 \n", + "2 Acts of Vengeance! 93 1989-12-10 00:00:00 \n", + "3 Secret Invasion 89 2008-06-02 00:00:00 \n", + "4 Civil War 86 2006-07-01 00:00:00 \n", + "\n", + " end description \n", + "0 2011-10-18 00:00:00 The Serpent, God of Fear and brother to the Al... \n", + "1 2009-12-31 12:59:00 Norman Osborn came out the hero of Secret Inva... \n", + "2 2008-01-04 00:00:00 Loki sets about convincing the super-villains ... \n", + "3 2009-01-25 00:00:00 The shape-shifting Skrulls have been infiltrat... \n", + "4 2007-01-29 00:00:00 After a horrific tragedy raises questions on w... " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
eventcount_of_heroesstartenddescription
0Fear Itself1322011-04-16 00:00:002011-10-18 00:00:00The Serpent, God of Fear and brother to the Al...
1Dark Reign1282008-12-01 00:00:002009-12-31 12:59:00Norman Osborn came out the hero of Secret Inva...
2Acts of Vengeance!931989-12-10 00:00:002008-01-04 00:00:00Loki sets about convincing the super-villains ...
3Secret Invasion892008-06-02 00:00:002009-01-25 00:00:00The shape-shifting Skrulls have been infiltrat...
4Civil War862006-07-01 00:00:002007-01-29 00:00:00After a horrific tragedy raises questions on w...
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 9 + } ], - "text/plain": [ - " character1 character2 group\n", - "0 CAIN MARKO JUGGERNAUT Storm (Marvel Heroes) X-Men\n", - "1 Logan Mystique (House of M) X-Men\n", - "2 CAIN MARKO JUGGERNAUT Logan X-Men\n", - "3 Logan Sabretooth (House of M) X-Men\n", - "4 Rogue (X-Men: Battle of the Atom) Warren Worthington III X-Men" + "source": [ + "run_query(\"\"\"\n", + "MATCH (e:Event)\n", + "RETURN e.title as event, \n", + " size((e)<-[:PART_OF_EVENT]-()) as count_of_heroes,\n", + " e.start as start,\n", + " e.end as end,\n", + " e.description as description \n", + "ORDER BY count_of_heroes DESC \n", + "LIMIT 5\n", + "\"\"\")" ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (c1:Character)-[:PART_OF_GROUP]->(g:Group)<-[:PART_OF_GROUP]-(c2:Character)\n", - "WHERE (c1)-[:ENEMY]-(c2) and id(c1) < id(c2)\n", - "RETURN c1.name as character1, c2.name as character2, g.name as group\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It seems that Logan does not get along with some of the other X-Men. For some of the characters, we also have the place of origin and education available, so let’s quickly look at that. During the scraping, I noticed a hero originated from Yugoslavia, so I wonder if there are more characters from Yugoslavia." - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nvfiqgprSsIL" + }, + "source": [ + "I have little to no idea what these events represent, but it is interesting to see that many characters participate. Most of the events span over less than a year, while the Acts of Vengeance spans over two decades. And judging by the description, Loki had something to do with it along with 92! other characters. 
Unfortunately, we don’t have the connection between comics and events stored in our graph to allow further analysis. If someone scrapes the Marvel API, I will gladly add it to the dataset.\n", + "\n", + "Let’s also take a look at the biggest groups of characters." + ] + }, { - "data": { - "text/html": [ - "<div>
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
characterplace_of_originaliases
0Purple ManRijeka, YugoslaviaKillgrave the Purple Man, Killy
1Abomination (Ultimate)Zagreb, YugoslaviaAgent R-7, the Ravager of Worlds
\n", - "
" + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "FiOynPnrSsIL", + "outputId": "04a0dfc5-ed58-46ff-c296-104f0c580471", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " group members\n", + "0 X-Men 41\n", + "1 Avengers 31\n", + "2 Defenders 26\n", + "3 Next Avengers 14\n", + "4 Guardians of the Galaxy 12" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
groupmembers
0X-Men41
1Avengers31
2Defenders26
3Next Avengers14
4Guardians of the Galaxy12
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 10 + } ], - "text/plain": [ - " character place_of_origin \\\n", - "0 Purple Man Rijeka, Yugoslavia \n", - "1 Abomination (Ultimate) Zagreb, Yugoslavia \n", - "\n", - " aliases \n", - "0 Killgrave the Purple Man, Killy \n", - "1 Agent R-7, the Ravager of Worlds " + "source": [ + "run_query(\"\"\"\n", + "MATCH (g:Group)\n", + "RETURN g.name as group, \n", + " size((g)<-[:PART_OF_GROUP]-()) as members\n", + "ORDER BY members DESC LIMIT 5\n", + "\"\"\")" ] - }, - "execution_count": 21, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (c:Character)\n", - "WHERE c.place_of_origin contains \"Yugoslavia\"\n", - "RETURN c.name as character, \n", - " c.place_of_origin as place_of_origin,\n", - " c.aliases as aliases\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Two characters originated from today’s Croatia, which is less than two hours drive from where I live. Let’s also check out all the characters that completed their Ph.D. degree." - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ + }, + { + "cell_type": "markdown", + "metadata": { + "id": "W-oBTLSySsIM" + }, + "source": [ + "There are 41 characters in X-Men, which makes sense as they had a whole academy. You might be surprised by 31 members of Avengers, but in the comics, there were many members of Avengers, although most are former members.\n", + "\n", + "Just because we can, let’s inspect if some members of the same group are also enemies." + ] + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
charactereducation
0Doc SamsonPh.D. in psychiatry
1Goliath (Bill Foster)Ph.D. in Biochemistry from California Technica...
2MoonstonePh.D. in Psychology
3Beast (Earth-311)Ph.D. Biophysics
4Radioactive ManPh.D. in physics
5KillmongerPh.D. in engineering and MBA
6NightshadeExtensively self-taught in multiple discipline...
7HumbugPh.D. in entomology
8Sunset BainPh.D. from Massachusetts Institute of Technology
9AlbionPh.D. in History, B.A. in English Literature
\n", - "
" + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "4_jPXMWOSsIM", + "outputId": "36d2343b-df69-449a-e0f8-01c97b1fb7e0", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " character1 character2 group\n", + "0 Logan Sabretooth (House of M) X-Men\n", + "1 Logan Mystique (House of M) X-Men\n", + "2 CAIN MARKO JUGGERNAUT Logan X-Men\n", + "3 CAIN MARKO JUGGERNAUT Storm (Marvel Heroes) X-Men\n", + "4 Rogue (X-Men: Battle of the Atom) Warren Worthington III X-Men" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
character1character2group
0LoganSabretooth (House of M)X-Men
1LoganMystique (House of M)X-Men
2CAIN MARKO JUGGERNAUTLoganX-Men
3CAIN MARKO JUGGERNAUTStorm (Marvel Heroes)X-Men
4Rogue (X-Men: Battle of the Atom)Warren Worthington IIIX-Men
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 11 + } ], - "text/plain": [ - " character education\n", - "0 Doc Samson Ph.D. in psychiatry\n", - "1 Goliath (Bill Foster) Ph.D. in Biochemistry from California Technica...\n", - "2 Moonstone Ph.D. in Psychology\n", - "3 Beast (Earth-311) Ph.D. Biophysics\n", - "4 Radioactive Man Ph.D. in physics\n", - "5 Killmonger Ph.D. in engineering and MBA\n", - "6 Nightshade Extensively self-taught in multiple discipline...\n", - "7 Humbug Ph.D. in entomology\n", - "8 Sunset Bain Ph.D. from Massachusetts Institute of Technology\n", - "9 Albion Ph.D. in History, B.A. in English Literature" + "source": [ + "run_query(\"\"\"\n", + "MATCH (c1:Character)-[:PART_OF_GROUP]->(g:Group)<-[:PART_OF_GROUP]-(c2:Character)\n", + "WHERE (c1)-[:ENEMY]-(c2) and id(c1) < id(c2)\n", + "RETURN c1.name as character1, c2.name as character2, g.name as group\n", + "\"\"\")" ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (c:Character)\n", - "WHERE c.education contains \"Ph.D\"\n", - "RETURN c.name as character, c.education as education\n", - "LIMIT 10\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It looks like a lot of these heroes are quite employable. Only Nightshade seems a bit dodgy. It feels like something one would put on their LinkedIn profile to get noticed when searching for Ph.D. profiles. By the way, did you know that Professor X has four Ph.D.s and is also MD in psychiatry? Quite the educated men.\n", - "## Analyzing communities of allies and relatives\n", - "We have examined basic graph statistics, and now we will focus more on network analysis. We will investigate the social ties between characters.\n", - "To start, we will calculate the degree values for each relationship type between characters and display the heroes with the highest overall degree." 
- ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7a97nGmbSsIN" + }, + "source": [ + "It seems that Logan does not get along with some of the other X-Men. For some of the characters, we also have the place of origin and education available, so let’s quickly look at that. During the scraping, I noticed a hero originated from Yugoslavia, so I wonder if there are more characters from Yugoslavia." + ] + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
namealliesenemiesrelative
0Scarlet Witch (Marvel Heroes)16148
1Thor (Marvel: Avengers Alliance)91410
2Invisible Woman (Marvel: Avengers Alliance)13107
3Logan14105
4Karnak6217
\n", - "
" + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "VyWjGhDmSsIN", + "outputId": "01f06941-5ac1-4c74-a7ad-fc8427157e79", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 112 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " character place_of_origin \\\n", + "0 Purple Man Rijeka, Yugoslavia \n", + "1 Abomination (Ultimate) Zagreb, Yugoslavia \n", + "\n", + " aliases \n", + "0 Killgrave the Purple Man, Killy \n", + "1 Agent R-7, the Ravager of Worlds " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
characterplace_of_originaliases
0Purple ManRijeka, YugoslaviaKillgrave the Purple Man, Killy
1Abomination (Ultimate)Zagreb, YugoslaviaAgent R-7, the Ravager of Worlds
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 12 + } ], - "text/plain": [ - " character place_of_origin \\\n", - "0 Purple Man Rijeka, Yugoslavia \n", - "1 Abomination (Ultimate) Zagreb, Yugoslavia \n", - "\n", - " aliases \n", - "0 Killgrave the Purple Man, Killy \n", - "1 Agent R-7, the Ravager of Worlds " + "source": [ + "run_query(\"\"\"\n", + "MATCH (c:Character)\n", + "WHERE c.place_of_origin contains \"Yugoslavia\"\n", + "RETURN c.name as character, \n", + " c.place_of_origin as place_of_origin,\n", + " c.aliases as aliases\n", + "\"\"\")" ] - }, - "execution_count": 21, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (c:Character)\n", - "WHERE c.place_of_origin contains \"Yugoslavia\"\n", - "RETURN c.name as character, \n", - " c.place_of_origin as place_of_origin,\n", - " c.aliases as aliases\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Two characters originated from today’s Croatia, which is less than two hours drive from where I live. Let’s also check out all the characters that completed their Ph.D. degree." - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hOqK29i2SsIN" + }, + "source": [ + "Two characters originated from today’s Croatia, which is less than a two-hour drive from where I live. Let’s also check out all the characters that completed their Ph.D. degree." 
+ ] + }, { - "data": { - "text/html": [ - "<div>
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
nodesrelationships
0[(aliases, education, identity, name, id, plac...[(), (), (), (), (), (), (), (), (), (), (), (...
\n", - "
" + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "jkBUu0vqSsIO", + "outputId": "92e6148e-dca1-492c-9256-b7687bfe3677", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 363 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " character \\\n", + "0 UNKNOWN ACHEBE \n", + "1 PROFESSOR MENDEL STROMM MENDEL STROMM \n", + "2 FRANKLIN HALL GRAVITON \n", + "3 Morbius \n", + "4 Tony Stark \n", + "5 Hulk-dok \n", + "6 Sasquatch (Walter Langkowski) \n", + "7 Professor X (Ultimate) \n", + "8 Klaw \n", + "9 High Evolutionary \n", + "\n", + " education \n", + "0 Ph.D. in Law (Yale), degrees in Psychology, Po... \n", + "1 Ph.D. in robotics \n", + "2 Ph.D. in physics \n", + "3 Ph.D in Biochemistry \n", + "4 Ph.Ds in physics and electrical engineering \n", + "5 Ph.D in nuclear physics and two other fields \n", + "6 Ph.D. in physics from the Massachusetts Instit... \n", + "7 Ph.Ds in genetics, biophysics, psychology, and... \n", + "8 Ph.D. in physics, bachelor’s degree in geology \n", + "9 Uncompleted Ph.D at Oxford University " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
charactereducation
0UNKNOWN ACHEBEPh.D. in Law (Yale), degrees in Psychology, Po...
1PROFESSOR MENDEL STROMM MENDEL STROMMPh.D. in robotics
2FRANKLIN HALL GRAVITONPh.D. in physics
3MorbiusPh.D in Biochemistry
4Tony StarkPh.Ds in physics and electrical engineering
5Hulk-dokPh.D in nuclear physics and two other fields
6Sasquatch (Walter Langkowski)Ph.D. in physics from the Massachusetts Instit...
7Professor X (Ultimate)Ph.Ds in genetics, biophysics, psychology, and...
8KlawPh.D. in physics, bachelor’s degree in geology
9High EvolutionaryUncompleted Ph.D at Oxford University
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 13 + } ], - "text/plain": [ - " nodes \\\n", - "0 [(aliases, education, identity, name, id, plac... \n", - "\n", - " relationships \n", - "0 [(), (), (), (), (), (), (), (), (), (), (), (... " + "source": [ + "run_query(\"\"\"\n", + "MATCH (c:Character)\n", + "WHERE c.education contains \"Ph.D\"\n", + "RETURN c.name as character, c.education as education\n", + "LIMIT 10\n", + "\"\"\")" ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH p=(c:Character{name:\"Triton\"})\n", - "CALL apoc.path.subgraphAll(id(c), {relationshipFilter:\"RELATIVE\"})\n", - "YIELD nodes, relationships\n", - "RETURN nodes, relationships\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "I never knew that some of the Marvel heroes have quite a big happy family. It wouldn’t be accurate if there weren’t a black sheep of the family present. Maximus looks like the family’s black sheep here as he has four enemies within the family. You might wonder why ally and enemy relationships are shown when we only traversed the relative ties. Neo4j Browser has a feature that displays all connections between nodes on the screen.\n", - "\n", - "## Weakly Connected Components algorithm\n", - "The Weakly Connected Components is a part of almost every graph analysis workflow. It is used to find disconnected components or islands within the network. In this example, the graph consists of two components. Michael, Mark, and Doug belong to the first component, while Bridget, Alice, and Charles belong to the second component. We will apply the Weakly Connected Components algorithm to find the largest component of allied characters. As we don’t plan to run any other algorithms on this network, we will use the anonymous graph projection." 
- ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ + }, + { + "cell_type": "markdown", + "metadata": { + "id": "64g3z5ZKSsIO" + }, + "source": [ + "It looks like a lot of these heroes are quite employable. Only Nightshade seems a bit dodgy. It feels like something one would put on their LinkedIn profile to get noticed when searching for Ph.D. profiles. By the way, did you know that Professor X has four Ph.D.s and is also an M.D. in psychiatry? Quite the educated man.\n", + "## Analyzing communities of allies and relatives\n", + "We have examined basic graph statistics, and now we will focus more on network analysis. We will investigate the social ties between characters.\n", + "To start, we will calculate the degree values for each relationship type between characters and display the heroes with the highest overall degree." + ] + }, { - "data": { - "text/html": [ - "<div>
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
componentIdmembers
00195
17724
21993
37522
47482
\n", - "
" + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "ngpqol3mSsIO", + "outputId": "01481c9d-4f13-4b4f-ac79-0a643b3ae2ea", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " name allies enemies relative\n", + "0 Scarlet Witch (Marvel Heroes) 16 14 8\n", + "1 Thor (Marvel: Avengers Alliance) 9 14 10\n", + "2 Invisible Woman (Marvel: Avengers Alliance) 13 10 7\n", + "3 Logan 14 10 5\n", + "4 Karnak 6 2 17" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
namealliesenemiesrelative
0Scarlet Witch (Marvel Heroes)16148
1Thor (Marvel: Avengers Alliance)91410
2Invisible Woman (Marvel: Avengers Alliance)13107
3Logan14105
4Karnak6217
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 14 + } ], - "text/plain": [ - " componentId members\n", - "0 0 195\n", - "1 772 4\n", - "2 199 3\n", - "3 752 2\n", - "4 748 2" + "source": [ + "run_query(\"\"\"\n", + "MATCH (c:Character)\n", + "RETURN c.name as name,\n", + " size((c)-[:ALLY]->()) as allies,\n", + " size((c)-[:ENEMY]->()) as enemies,\n", + " size((c)-[:RELATIVE]->()) as relative\n", + "ORDER BY allies + enemies + relative DESC \n", + "LIMIT 5\n", + "\"\"\")" ] - }, - "execution_count": 25, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "CALL gds.wcc.stream({\n", - " nodeProjection:'Character',\n", - " relationshipProjection:'ALLY'})\n", - "YIELD nodeId, componentId\n", - "WITH componentId, count(*) as members\n", - "WHERE members > 1\n", - "RETURN componentId, members\n", - "ORDER BY members DESC\n", - "LIMIT 5\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The largest component of allies has 195 members. Then we have a couple of tiny allies islands with only a few members. If we visualize the largest component of allies in the Neo4j Browser and have the connect results nodes option selected, we get the following visualization.\n", - "\n", - "Although we have found the largest allies component, we can observe that many of the characters in the component are actually enemies (red relationships). To better understand why this occurs, let’s look at the following example." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Custom ally component algorithm\n", - "Suppose we wanted to find communities of allies where there are no enemies within the given component. The algorithm implementation is relatively straightforward, and you could use Neo4j custom procedures, for example. Still, if you are like me and don’t speak Java, you can always resort to your favorite scripting language. 
I have developed the custom Ally component algorithm in Python. First, we define some helper functions for fetching allies and enemies of a single node." - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [], - "source": [ - "def get_allies(node_id):\n", - " data = session.run(\"\"\"MATCH (c:Character)-[:ALLY]-(ally) \n", - " WHERE c.id = $node_id \n", - " RETURN collect(ally.id) as allies\"\"\",\n", - " {'node_id':node_id})\n", - " return data.single()['allies']\n", - "\n", - "def get_enemies(node_id):\n", - " data = session.run(\"\"\"MATCH (c:Character)-[:ENEMY]-(enemy) \n", - " WHERE c.id = $node_id \n", - " RETURN collect(enemy.id) as allies\"\"\",\n", - " {'node_id':node_id})\n", - " return data.single()['allies']\n", - "\n", - "def get_members():\n", - " return session.run(\"\"\"\n", - " CALL gds.wcc.stream({\n", - " nodeProjection:'Character',\n", - " relationshipProjection:'ALLY'})\n", - " YIELD nodeId, componentId\n", - " WITH componentId, collect(gds.util.asNode(nodeId).id) as members\n", - " WHERE size(members) > 10\n", - " RETURN componentId, members\n", - " \"\"\").single()['members']" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "My implementation is relatively simple. The input to the algorithm is the list of all node ids in the largest allied components. Start from a single node, load its enemies into the enemies list and load its allies into a queue that will be processed later. Then we iterate over the allied queue. If a node is not an enemy with any of the existing nodes in the component, add them to the community list and add their enemies to the community’s enemies list. I’ve added some minor performance tweaks like if we have traversed the node already in the allies queue, we can remove that node from the global list of starting nodes." 
- ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [], - "source": [ - "from collections import deque\n", - "\n", - "def get_largest_stable_allies(node_list):\n", - " final_communities = list()\n", - " while node_list:\n", - " community = set()\n", - " enemies_list = set()\n", - " visited = set()\n", - " \n", - " allies_list = deque()\n", - " allies_list.appendleft(node_list[0])\n", - " \n", - " while allies_list:\n", - " # Get the node from the queue\n", - " start_node = allies_list.pop()\n", - " \n", - " # Skip if current node is enemy with anyone\n", - " if start_node in enemies_list:\n", - " continue\n", - " \n", - " # Get allies and enemies\n", - " allies = get_allies(start_node)\n", - " enemies = get_enemies(start_node)\n", - " \n", - " visited.add(start_node)\n", - " # Add enemies\n", - " enemies_list.update(enemies)\n", - " # Add allies to the list of next visits\n", - " allies_list.extendleft([x for x in allies if (x not in enemies_list) and (x not in visited)])\n", - " # Add current node to community\n", - " community.add(start_node)\n", - " \n", - " # Remove visited nodes from global node list\n", - " try:\n", - " node_list.remove(start_node)\n", - " except:\n", - " pass\n", - " final_communities.append(list(community))\n", - " return max(final_communities, key=len)" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/plain": [ - "['cbe82e9f-ab46-4351-b3ed-b83bb9474c45',\n", - " 'b9b4c676-7fb8-4d55-91b1-718a4194a84e',\n", - " '09cdc768-8215-4518-8789-0c0d065a937d',\n", - " 'b11ae496-02aa-42a9-b9d8-09f47eda2dea',\n", - " '79444afc-74d2-4149-8bc7-4608ca46b220',\n", - " '3ec99504-71f5-4300-bf94-e25748cadbdb',\n", - " '84e21ea4-8f4e-404b-8be2-d4742cf14100',\n", - " '58bb9a56-41c0-4895-83c2-0a098a7e244a',\n", - " '8728da4e-32c6-469b-9469-b3e8233d22d7',\n", - " '33010f8d-9c11-4150-afc3-60eb99228c74',\n", - " 'e6ba329c-99cd-433c-9a4b-7704542c7638',\n", 
- " '1b7345c8-3534-4dae-86e5-44afeedbdbf7',\n", - " '7fda53b6-048b-4712-b7e7-823dc41867a6',\n", - " 'c9238bda-687a-4cef-94d4-5c7fbdc4789a',\n", - " '0f3d523c-204c-438d-ae0e-2379ad264d97',\n", - " '48fdcbf7-c928-46ea-9aa2-182b71ecc7ae',\n", - " '67328fa7-4fc4-477f-bfc9-a556b41c9cbb',\n", - " 'bcb5832b-d2f2-4ef3-9f64-11f99a13bde9',\n", - " 'ad7aa449-6be7-494b-8c85-5a31f170d8cf',\n", - " '0defe21f-0289-480a-b227-3ec712d9b080',\n", - " '54a64fdf-6d5b-4c0b-b51b-b23407df8652',\n", - " '952db63e-9d8f-480d-b77f-3b92a759d0ea',\n", - " 'e84aa488-32ae-4026-b906-cff81c3e4735',\n", - " '36c8f4fb-612c-41e7-9565-3b47009f77e7',\n", - " '5e208f97-0faf-4965-9ff0-a301a492473e',\n", - " 'd6b0736d-3d94-4482-b44d-35da66ee08d4',\n", - " '2470246f-d1e8-4e33-b687-8ef2385bf1e8',\n", - " '7b018aa5-33ab-4f1a-a5ee-7ba2cd626b9b',\n", - " 'c431aa75-e146-4424-8d82-d2bf4ec90019',\n", - " '3c65ff51-d217-4f91-8cd5-4988e7709ced',\n", - " 'ff28c7ef-f142-44ac-9007-c0e828fce62b',\n", - " 'e1178968-bf0b-4477-9cb7-a92bbb13a234',\n", - " 'a1047c83-faad-40c0-8361-b75ede304d39',\n", - " 'bd1d9edc-c59d-4ac2-9175-8f2e37888c7f',\n", - " 'd2c30a1e-3a2b-4716-b1e0-4fef8c9bfe1f',\n", - " '246ee50b-2b04-48c4-87f4-9e6f24c0c389',\n", - " 'fd659c02-0ac4-4dce-be1d-750fa9eb48a0',\n", - " 'e8a83f57-6717-403f-84e9-56ecded817c7',\n", - " '84888a4c-38a6-4c15-9ae5-e807934bf0a1',\n", - " '4331cf46-99e2-4b5d-967d-011d9e80c326',\n", - " '2e73d163-0c6e-4d14-815d-c42ca60b5c9e',\n", - " '7f2c9491-876d-4335-9963-d99a1d33b886',\n", - " '02d3876c-8ba0-4a7e-b749-be5fa1897169',\n", - " 'db45aa16-0da8-4c30-a6d6-d2ffe8f8719c',\n", - " 'c485aa0d-c65f-41b3-a4d2-1af3ed217e22',\n", - " 'a0c2db0e-f565-4953-969e-ca11493c92d8',\n", - " '9cacdab8-2e9e-4446-a6f6-ff0fe6cfaee7',\n", - " 'b1fa175b-5425-4b12-9a39-e6adff1fd0a9',\n", - " '22cb1883-a5d6-470f-91b0-73008fa18fe1',\n", - " 'be28ee6f-7cb3-4c61-8d26-9a49a24214c3',\n", - " '8f6665e0-f8e6-4b0e-a557-16c11e00f951',\n", - " 'd4920ff6-62fc-41c0-a0eb-432638dfa949',\n", - " 
'e0b5cd04-9573-4c02-83dd-2ad1df8c68ac',\n", - " 'ad6003b1-7e99-4467-a046-f3fc4deab043',\n", - " '00cc62ce-d881-44ed-b0d6-7ec0841077b1',\n", - " '2ba70ea4-0c6e-439e-ba82-5fda67a1a6e6',\n", - " '48e3455d-c108-46b4-87d2-885426266aa8',\n", - " '2a19068f-cabc-4922-8004-1f1cb2c945a1',\n", - " 'c9f50852-5adc-42d1-8c37-7681071c8529',\n", - " '69157309-1072-486c-a00d-4852989fa19f',\n", - " '5ee4bb0c-e647-48b1-8dca-f173405d0ae8',\n", - " '36e9a5f5-37ef-45eb-b1c7-bd320cf572e1',\n", - " '74128e40-1bc8-44ab-ac6b-1e12cd431a2e',\n", - " 'f04de33d-64cd-47a7-aab8-a7ba874576ce',\n", - " 'e027479d-969e-485e-851a-2473cd2c2731',\n", - " 'b869c676-a78c-4cfa-af42-bc17af1ef741',\n", - " '5adb38ae-41fc-420a-b555-f446669401c8',\n", - " 'c7da34d0-bc0d-4f10-81db-9d591c236325',\n", - " '2e844ba1-9786-4ee7-ba32-61d85eccbea0',\n", - " '69450170-f0e0-4d86-8484-268536b32925',\n", - " 'c52f0d2e-c478-4307-8f71-3f6d751f6b4e',\n", - " '5e6cd684-4d14-430f-a880-47fe6226ef43',\n", - " '54ef7a3c-cc94-46a3-a575-b90e45ccdc31',\n", - " 'eadd8f5e-399a-4aad-8d6b-02eb641dea64',\n", - " 'db7ff8c3-ba60-4218-8a44-50ce1c61fe78',\n", - " '567ce31c-8848-4974-87e8-ecff9b649371',\n", - " 'd1e13ba3-b3eb-4fb0-a07c-e1c4e0b788f8',\n", - " '48cd33ec-9c69-473d-a1d7-d3b654fd5119',\n", - " '927ac3a8-7139-4735-94c7-31fa5d80b835',\n", - " '8d4b5284-d94d-4bfd-bcf5-5fcda7b03a40',\n", - " 'f1ec7818-22a3-43bd-9292-206027ef0384',\n", - " '0d4673bd-843e-4ef6-91c3-6e46c8b506df',\n", - " 'cef01eb5-7abd-4301-9377-57dbf508700c',\n", - " '32a2fc37-8c14-4220-ba50-b8144e2744e4',\n", - " '43ebaa36-b8bf-4f84-9e37-c42377489522',\n", - " 'ac75b839-5c59-47f5-a75d-313b87c61925',\n", - " 'ce2d47bd-e0b6-4833-af53-d2c284145863',\n", - " '16239a1e-a119-4f7b-9fdb-ee9d1178d779',\n", - " 'd089f294-a95e-4446-9a88-4abdde5081b8',\n", - " 'f7fafb58-6fdb-4948-9baa-a09283efd7e0',\n", - " '9c4e5a4c-8bd7-421f-bcd0-4fe4bc8b26d9',\n", - " '1add487d-d6ab-43dc-8d84-bf76cfcfcfb9',\n", - " '9a664df4-1f99-4d38-9098-8a63bf322d43',\n", - " 
'1af24a18-c1bb-4542-a9b4-b1f537035a98',\n", - " 'c0c7d949-1658-4e33-80fb-5b25bdc329b8',\n", - " '1c02ce63-41d9-4e6e-974e-dd2a5fa1183f',\n", - " '0addb92c-a068-41a0-a447-c89c3f3323aa',\n", - " '8590a607-0702-4b60-b4a6-de178effbe6d',\n", - " 'f32b8909-9207-4e5b-aa01-7a348e7fe8ac',\n", - " '263a1e51-326b-45f7-91b4-298acbaa6291',\n", - " 'af3c94db-e147-4272-b9fd-eca13bf45f23',\n", - " '51b6a301-e57c-48fb-950b-f37e3f19ecda',\n", - " '6e75668b-dc28-479e-b6a5-01b6327fbd57',\n", - " '51db3028-6637-42ec-bf24-f249b12a2b40',\n", - " '2253bd88-3f20-42f7-8540-cbf3674532da',\n", - " '482cf9d9-4c6b-4af2-a3b3-8fb8b8d5f39f',\n", - " '682f57bc-9422-43db-9406-224cda3735fc',\n", - " 'f4a95cc5-a812-468a-8d61-c999405677be',\n", - " 'ce4e47da-c2e8-4e18-8378-b8db0b0266b9',\n", - " '280a1394-51cc-4491-bc46-41c7bfd00f83',\n", - " '28015751-dea1-4478-99f1-98d98d721113',\n", - " '04290c46-731d-4afb-ba0c-aa5cf5ed2be5',\n", - " '262500db-86d4-45a7-9a4a-c4544057ff70',\n", - " '7b8635c0-9a4a-4a10-a66d-43c79b4e505b',\n", - " '7613b9ae-52d3-40de-8861-81df6d7111ce',\n", - " '271c3f2a-340e-49f8-8519-6532c933c6fa',\n", - " 'c1897cfa-41de-4a88-be0c-a5ba965bf839',\n", - " '83dae6e1-bda9-4d45-8b37-44997da2ac85',\n", - " '89b72dd5-8ea3-45f2-8271-0a0ea222ab8d',\n", - " '221ff1d2-1c2d-4c0a-8710-e1a91d589f77',\n", - " '6b5296f2-8a41-4816-818a-c698a80b3d1c',\n", - " 'a891aaf5-7aed-4216-aed8-f128c0f182ec',\n", - " '1658c37f-3f02-4feb-8c18-ecd825ebbd85',\n", - " 'cf2b03a3-d071-4a89-82ca-94e31ea1f151',\n", - " '05856574-6524-4d1a-b293-77288599c5a1',\n", - " '5f4e3b38-f4b5-45e5-adcb-03347afbfe49',\n", - " '1091571f-9fcc-4b0c-b756-f87d3b67c591',\n", - " 'c6c96262-9695-4e29-9e39-8870b5c6d373',\n", - " 'bb2bd40e-f547-46b7-ab65-2f7f192bfd31',\n", - " '50a78f57-d806-4479-9871-d616cf1a5f3d',\n", - " '920ee492-02a7-48ad-b5e4-3f8b76071e78',\n", - " '705d7d9e-ac82-48f0-a825-e2b90dac7da9',\n", - " '916a9f53-38b1-4dcc-a35a-8113dc166227',\n", - " '7612cca2-e740-4606-a823-a6ae7a699e22',\n", - " 
'a8521022-a926-4e58-95dd-ecc26def8ee8',\n", - " '423a200d-9ffc-4f74-a748-3be4e771c346',\n", - " '7bef3917-91d7-4277-bdbf-01b4526c19fb',\n", - " '90b5dc02-8256-40a5-bb89-8fd303648011']" + "cell_type": "markdown", + "metadata": { + "id": "e8zoVxfvSsIP" + }, + "source": [ + "Scarlet Witch and Thor seem to have the most direct enemies. Wolverine has the most allies but also many enemies. It looks like Triton has a big family with 17 direct relative relationships. We can use the `apoc.path.subgraphAll` procedure to examine the relatives' community of Triton." ] - }, - "execution_count": 34, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "members = get_members()\n", - "get_largest_stable_allies(members)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this code, the algorithm only returns the ids of nodes that belong to the largest allied component where there are no enemies within. It shouldn’t be a problem to mark these nodes in Neo4j, as you can match them by their ids. The largest component of allies, where there are no enemies within, has 142 members. If we visualize it in Neo4j Browser, we can see that there are no enemy relationships visible." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Analyzing characters’ stats\n", - "In the last part of our analysis, we will examine the stats of the characters. We have the stats available for a total of 470 heroes. This information was scraped from Marvel’s website. The scale for stats ranges from zero to seven, and Iron Man does not have a single seven. Probably not the strongest of the heroes, even though he is one of the more popular ones. Now we will explore the characters with the highest stats average. Whenever I need some help with my cypher queries, I turn to Neo4j Slack. Luckily for us, Andrew Bowman is always around with great advice on optimizing and prettifying our cypher queries. 
This time he showed me the `apoc.map.values` procedure. It can be used to fetch all properties of a single node without explicitly writing the property keys." - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
characteraverage_stats
0Asylum7.0
1CHTHON7.0
2Bloodscream7.0
3Dracula7.0
4Eternity7.0
5Reaper7.0
6Living Tribunal7.0
7Hyperion (Earth-712)7.0
8Juggernaut7.0
9SET7.0
\n", - "
" + "cell_type": "code", + "execution_count": 15, + "metadata": { + "id": "iuYqg9DNSsIP", + "outputId": "34bd025f-efd6-4005-f011-9e911f08c1f8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 81 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " nodes \\\n", + "0 [(aliases, education, identity, name, id, plac... \n", + "\n", + " relationships \n", + "0 [(), (), (), (), (), (), (), (), (), (), (), (... " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
nodesrelationships
0[(aliases, education, identity, name, id, plac...[(), (), (), (), (), (), (), (), (), (), (), (...
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 15 + } ], - "text/plain": [ - " character average_stats\n", - "0 Asylum 7.0\n", - "1 CHTHON 7.0\n", - "2 Bloodscream 7.0\n", - "3 Dracula 7.0\n", - "4 Eternity 7.0\n", - "5 Reaper 7.0\n", - "6 Living Tribunal 7.0\n", - "7 Hyperion (Earth-712) 7.0\n", - "8 Juggernaut 7.0\n", - "9 SET 7.0" + "source": [ + "run_query(\"\"\"\n", + "MATCH p=(c:Character{name:\"Triton\"})\n", + "CALL apoc.path.subgraphAll(id(c), {relationshipFilter:\"RELATIVE\"})\n", + "YIELD nodes, relationships\n", + "RETURN nodes, relationships\n", + "\"\"\")" ] - }, - "execution_count": 35, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (c:Character)-[:HAS_STATS]->(stats)\n", - "RETURN c.name as character, \n", - " apoc.coll.avg(apoc.map.values(stats, keys(stats))) as average_stats\n", - "ORDER BY average_stats DESC\n", - "LIMIT 10\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It seems many characters have their stats maxed out. I am not sure exactly how this data collection process works, but I found a fascinating heroine by the name of Squirrel Girl that could probably kick Iron Man’s ass with one hand while making sourdough bread with the other. Or polish her nails, not exactly sure what type of girl she is. The only thing certain is that she is a badass.\n", - "## k-Nearest Neighbours algorithm\n", - "The k-Nearest Neighbour is one of the more standard graph algorithms and was already implemented in the Graph Data Science library before in the form of Cosine, Euclidian, and Pearson similarity algorithms. Those were basic implementation where the algorithms compared a given vector for all node pairs in the network. Because comparing all node pairs does not scale well, another implementation of the kNN algorithm was added to the library. 
It is based on the Efficient k-nearest neighbor graph construction for generic similarity measures article. Instead of comparing every node pair, the algorithm selects possible neighbors based on the assumption that the neighbors-of-neighbors of a node are most likely already the nearest one. The algorithm scales quasi-linear with respect to the node count instead of being quadratic. The implementation uses the Cosine similarity to compare two vectors.\n", - "First, we need to create a vector (array of numbers) that will be compared between the pairs of heroes. We will use the characters’ stats as well as their ability to fly to populate the vector. Because all stats have the same range between zero and seven, there is no need for normalization. We only need to encode the flight feature to span between zero and seven as well. Those characters that can fly will have the value of flight feature seven, while those who can’t fly will have the value zero." - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ULqVl7c1SsIP" + }, + "source": [ + "I never knew that some of the Marvel heroes have quite a big happy family. It wouldn’t be accurate if there weren’t a black sheep of the family present. Maximus looks like the family’s black sheep here as he has four enemies within the family. You might wonder why ally and enemy relationships are shown when we only traversed the relative ties. Neo4j Browser has a feature that displays all connections between nodes on the screen.\n", + "\n", + "## Weakly Connected Components algorithm\n", + "The Weakly Connected Components is a part of almost every graph analysis workflow. It is used to find disconnected components or islands within the network. In this example, the graph consists of two components. Michael, Mark, and Doug belong to the first component, while Bridget, Alice, and Charles belong to the second component. 
We will apply the Weakly Connected Components algorithm to find the largest component of allied characters. We project the network as a named graph called `allies`; as we don’t plan to run any other algorithms on it, we will drop the projection once we are done." + ] + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
" + "cell_type": "code", + "source": [ + "run_query(\"\"\"\n", + "CALL gds.graph.project('allies', 'Character', 'ALLY')\n", + "\"\"\")" ], - "text/plain": [ - "Empty DataFrame\n", - "Columns: []\n", - "Index: []" + "metadata": { + "id": "bRJ6FcK3thiW", + "outputId": "e2f25953-af46-4dc6-d348-388253b1731c", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 81 + } + }, + "execution_count": 19, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " nodeProjection \\\n", + "0 {'Character': {'label': 'Character', 'properti... \n", + "\n", + " relationshipProjection graphName nodeCount \\\n", + "0 {'ALLY': {'orientation': 'NATURAL', 'aggregati... allies 1105 \n", + "\n", + " relationshipCount projectMillis \n", + "0 313 27 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
nodeProjectionrelationshipProjectiongraphNamenodeCountrelationshipCountprojectMillis
0{'Character': {'label': 'Character', 'properti...{'ALLY': {'orientation': 'NATURAL', 'aggregati...allies110531327
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 19 + } ] - }, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (c:Character)-[:HAS_STATS]->(s)\n", - "WITH c, [s.durability, s.energy, s.fighting_skills, \n", - " s.intelligence, s.speed, s.strength,\n", - " CASE WHEN c.flight = 'true' THEN 7 ELSE 0 END] as stats_vector\n", - "SET c.stats_vector = stats_vector\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We will also tag the characters that have the stats vector with a second label. This way, we can easily filter heroes with a stats vector in our native projection of the named graph." - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
" + "cell_type": "code", + "execution_count": 17, + "metadata": { + "id": "_sTRSNI4SsIP", + "outputId": "cd39e100-0d26-41ad-f92a-74eb034da933", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " componentId members\n", + "0 0 195\n", + "1 26 4\n", + "2 245 3\n", + "3 6 2\n", + "4 2 2" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
componentIdmembers
00195
1264
22453
362
422
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 17 + } ], - "text/plain": [ - "Empty DataFrame\n", - "Columns: []\n", - "Index: []" + "source": [ + "run_query(\"\"\"\n", + "CALL gds.wcc.stream('allies')\n", + "YIELD nodeId, componentId\n", + "WITH componentId, count(*) as members\n", + "WHERE members > 1\n", + "RETURN componentId, members\n", + "ORDER BY members DESC\n", + "LIMIT 5\n", + "\"\"\")" ] - }, - "execution_count": 37, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (c:Character)\n", - "WHERE exists (c.stats_vector)\n", - "SET c:CharacterStats\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that everything is ready, we can go ahead and load our named graph. We will project all nodes with the CharacterStats label and their stats_vector properties in a named graph. If you need a quick refresher or introduction to how the GDS library works, I would suggest taking the Introduction to Graph Algorithms course." - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [ + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GIoYK30eSsIQ" + }, + "source": [ + "The largest component of allies has 195 members. Then we have a couple of tiny allies islands with only a few members. If we visualize the largest component of allies in the Neo4j Browser and have the connect results nodes option selected, we get the following visualization.\n", + "\n", + "Although we have found the largest allies component, we can observe that many of the characters in the component are actually enemies (red relationships). To better understand why this occurs, let’s look at the following example." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xRX4udRpSsIQ" + }, + "source": [ + "## Custom ally component algorithm\n", + "Suppose we wanted to find communities of allies where there are no enemies within the given component. The algorithm implementation is relatively straightforward, and you could use Neo4j custom procedures, for example. Still, if you are like me and don’t speak Java, you can always resort to your favorite scripting language. I have developed the custom Ally component algorithm in Python. First, we define some helper functions for fetching allies and enemies of a single node." + ] + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
nodeProjectionrelationshipProjectiongraphNamenodeCountrelationshipCountcreateMillis
0{'CharacterStats': {'properties': {'stats_vect...{'__ALL__': {'orientation': 'NATURAL', 'aggreg...marvel47051536
\n", - "
" + "cell_type": "code", + "execution_count": 20, + "metadata": { + "id": "Mws_aYp3SsIQ" + }, + "outputs": [], + "source": [ + "def get_allies(node_id):\n", + "    # Ids of all characters linked to node_id by an ALLY relationship\n", + "    data = session.run(\"\"\"MATCH (c:Character)-[:ALLY]-(ally) \n", + "                          WHERE c.id = $node_id \n", + "                          RETURN collect(ally.id) as allies\"\"\",\n", + "                       {'node_id':node_id})\n", + "    return data.single()['allies']\n", + "\n", + "def get_enemies(node_id):\n", + "    # Ids of all characters linked to node_id by an ENEMY relationship\n", + "    data = session.run(\"\"\"MATCH (c:Character)-[:ENEMY]-(enemy) \n", + "                          WHERE c.id = $node_id \n", + "                          RETURN collect(enemy.id) as enemies\"\"\",\n", + "                       {'node_id':node_id})\n", + "    return data.single()['enemies']\n", + "\n", + "def get_members():\n", + "    # Member ids of the only WCC component with more than 10 members\n", + "    return session.run(\"\"\"\n", + "    CALL gds.wcc.stream('allies')\n", + "    YIELD nodeId, componentId\n", + "    WITH componentId, collect(gds.util.asNode(nodeId).id) as members\n", + "    WHERE size(members) > 10\n", + "    RETURN componentId, members\n", + "    \"\"\").single()['members']" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c0gNv0AvSsIR" + }, + "source": [ + "My implementation is relatively simple. The input to the algorithm is the list of all node ids in the largest allied component. We start from a single node, load its enemies into the enemies list, and load its allies into a queue that will be processed later. Then we iterate over the allies queue. If a node is not an enemy of any node already in the community, we add it to the community and add its enemies to the community’s enemies list. 
I’ve also added a minor performance tweak: once a node has been traversed from the allies queue, it is removed from the global list of starting nodes."
 + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "id": "9VelqZB5SsIR" + }, + "outputs": [], + "source": [ + "from collections import deque\n", + "\n", + "def get_largest_stable_allies(node_list):\n", + "    final_communities = list()\n", + "    while node_list:\n", + "        community = set()\n", + "        enemies_list = set()\n", + "        visited = set()\n", + "        \n", + "        allies_list = deque()\n", + "        allies_list.appendleft(node_list[0])\n", + "        \n", + "        while allies_list:\n", + "            # Get the node from the queue\n", + "            start_node = allies_list.pop()\n", + "            \n", + "            # Skip if current node is enemy with anyone\n", + "            if start_node in enemies_list:\n", + "                continue\n", + "            \n", + "            # Get allies and enemies\n", + "            allies = get_allies(start_node)\n", + "            enemies = get_enemies(start_node)\n", + "            \n", + "            visited.add(start_node)\n", + "            # Add enemies\n", + "            enemies_list.update(enemies)\n", + "            # Add allies to the list of next visits\n", + "            allies_list.extendleft([x for x in allies if (x not in enemies_list) and (x not in visited)])\n", + "            # Add current node to community\n", + "            community.add(start_node)\n", + "            \n", + "            # Remove visited nodes from global node list\n", + "            try:\n", + "                node_list.remove(start_node)\n", + "            except ValueError:\n", + "                # Already removed: the node was queued more than once\n", + "                pass\n", + "        final_communities.append(list(community))\n", + "    return max(final_communities, key=len)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "id": "2h03aJT2SsIR", + "outputId": "a81deb75-7b02-4eac-dedc-0ced7f93c7e8", + "colab": { + "base_uri": 
"https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "['2253bd88-3f20-42f7-8540-cbf3674532da',\n", + " '84888a4c-38a6-4c15-9ae5-e807934bf0a1',\n", + " 'c6c96262-9695-4e29-9e39-8870b5c6d373',\n", + " 'b869c676-a78c-4cfa-af42-bc17af1ef741',\n", + " '7b8635c0-9a4a-4a10-a66d-43c79b4e505b',\n", + " '9153ead5-bfb9-476c-95b1-db110bbdb336',\n", + " 
'3c65ff51-d217-4f91-8cd5-4988e7709ced',\n", + " 'bd1d9edc-c59d-4ac2-9175-8f2e37888c7f',\n", + " '69450170-f0e0-4d86-8484-268536b32925',\n", + " '2e73d163-0c6e-4d14-815d-c42ca60b5c9e',\n", + " 'db7ff8c3-ba60-4218-8a44-50ce1c61fe78',\n", + " 'e6ba329c-99cd-433c-9a4b-7704542c7638',\n", + " 'bb2bd40e-f547-46b7-ab65-2f7f192bfd31',\n", + " '51b6a301-e57c-48fb-950b-f37e3f19ecda',\n", + " '952db63e-9d8f-480d-b77f-3b92a759d0ea',\n", + " '6e75668b-dc28-479e-b6a5-01b6327fbd57',\n", + " '1658c37f-3f02-4feb-8c18-ecd825ebbd85',\n", + " '263a1e51-326b-45f7-91b4-298acbaa6291',\n", + " '3ec99504-71f5-4300-bf94-e25748cadbdb',\n", + " 'cbe82e9f-ab46-4351-b3ed-b83bb9474c45',\n", + " '2a19068f-cabc-4922-8004-1f1cb2c945a1',\n", + " '1b838aa2-c05e-4a85-aaed-b5ffda25cdb5',\n", + " 'a4532c76-9c68-412e-a31c-6d51483354be',\n", + " '69157309-1072-486c-a00d-4852989fa19f',\n", + " '0f3d523c-204c-438d-ae0e-2379ad264d97',\n", + " '9c4e5a4c-8bd7-421f-bcd0-4fe4bc8b26d9',\n", + " '927ac3a8-7139-4735-94c7-31fa5d80b835',\n", + " '22cb1883-a5d6-470f-91b0-73008fa18fe1',\n", + " '00cc62ce-d881-44ed-b0d6-7ec0841077b1',\n", + " '74128e40-1bc8-44ab-ac6b-1e12cd431a2e',\n", + " '5f4e3b38-f4b5-45e5-adcb-03347afbfe49',\n", + " '33010f8d-9c11-4150-afc3-60eb99228c74',\n", + " 'f3099ddb-f5e5-45d5-b227-5eb14c03f298',\n", + " '280a1394-51cc-4491-bc46-41c7bfd00f83',\n", + " '36e9a5f5-37ef-45eb-b1c7-bd320cf572e1',\n", + " 'c9238bda-687a-4cef-94d4-5c7fbdc4789a',\n", + " 'c431aa75-e146-4424-8d82-d2bf4ec90019',\n", + " 'fd659c02-0ac4-4dce-be1d-750fa9eb48a0',\n", + " '4d7a7eee-5da7-4d4e-9dc2-d8dca18da46b',\n", + " 'd089f294-a95e-4446-9a88-4abdde5081b8',\n", + " '2ba70ea4-0c6e-439e-ba82-5fda67a1a6e6',\n", + " '262500db-86d4-45a7-9a4a-c4544057ff70',\n", + " '1af24a18-c1bb-4542-a9b4-b1f537035a98',\n", + " 'b11ae496-02aa-42a9-b9d8-09f47eda2dea',\n", + " 'f1ec7818-22a3-43bd-9292-206027ef0384',\n", + " '246ee50b-2b04-48c4-87f4-9e6f24c0c389',\n", + " 'cef01eb5-7abd-4301-9377-57dbf508700c',\n", + " 
'bcb5832b-d2f2-4ef3-9f64-11f99a13bde9',\n", + " 'ff28c7ef-f142-44ac-9007-c0e828fce62b',\n", + " 'c52f0d2e-c478-4307-8f71-3f6d751f6b4e',\n", + " '682f57bc-9422-43db-9406-224cda3735fc',\n", + " '58bb9a56-41c0-4895-83c2-0a098a7e244a',\n", + " '36c8f4fb-612c-41e7-9565-3b47009f77e7',\n", + " 'cf2b03a3-d071-4a89-82ca-94e31ea1f151',\n", + " 'ad7aa449-6be7-494b-8c85-5a31f170d8cf',\n", + " '0defe21f-0289-480a-b227-3ec712d9b080',\n", + " '705d7d9e-ac82-48f0-a825-e2b90dac7da9',\n", + " 'eadd8f5e-399a-4aad-8d6b-02eb641dea64',\n", + " 'b1fa175b-5425-4b12-9a39-e6adff1fd0a9',\n", + " '54a64fdf-6d5b-4c0b-b51b-b23407df8652',\n", + " '1f9fc7ec-5861-4133-bc65-8ae5f73542c2',\n", + " '7f2c9491-876d-4335-9963-d99a1d33b886',\n", + " '1091571f-9fcc-4b0c-b756-f87d3b67c591',\n", + " 'db45aa16-0da8-4c30-a6d6-d2ffe8f8719c',\n", + " '04290c46-731d-4afb-ba0c-aa5cf5ed2be5',\n", + " 'e84aa488-32ae-4026-b906-cff81c3e4735',\n", + " 'd4920ff6-62fc-41c0-a0eb-432638dfa949',\n", + " '7613b9ae-52d3-40de-8861-81df6d7111ce',\n", + " '4331cf46-99e2-4b5d-967d-011d9e80c326',\n", + " 'a1047c83-faad-40c0-8361-b75ede304d39',\n", + " '1b7345c8-3534-4dae-86e5-44afeedbdbf7',\n", + " '5adb38ae-41fc-420a-b555-f446669401c8',\n", + " 'f32b8909-9207-4e5b-aa01-7a348e7fe8ac',\n", + " '48fdcbf7-c928-46ea-9aa2-182b71ecc7ae',\n", + " 'af3c94db-e147-4272-b9fd-eca13bf45f23',\n", + " 'cab2b9ec-2162-4d76-a7cf-6b12249e30e7',\n", + " '0addb92c-a068-41a0-a447-c89c3f3323aa',\n", + " '916a9f53-38b1-4dcc-a35a-8113dc166227',\n", + " '16239a1e-a119-4f7b-9fdb-ee9d1178d779',\n", + " 'ce4e47da-c2e8-4e18-8378-b8db0b0266b9',\n", + " 'c9f50852-5adc-42d1-8c37-7681071c8529',\n", + " '67328fa7-4fc4-477f-bfc9-a556b41c9cbb',\n", + " '32a2fc37-8c14-4220-ba50-b8144e2744e4',\n", + " 'a891aaf5-7aed-4216-aed8-f128c0f182ec',\n", + " 'd6b0736d-3d94-4482-b44d-35da66ee08d4',\n", + " '6b5296f2-8a41-4816-818a-c698a80b3d1c',\n", + " '5e208f97-0faf-4965-9ff0-a301a492473e',\n", + " 'ad6003b1-7e99-4467-a046-f3fc4deab043',\n", + " 
'5e6cd684-4d14-430f-a880-47fe6226ef43',\n", + " '77871f33-4165-4b8a-829c-f20a61a9a9bc',\n", + " 'e0b5cd04-9573-4c02-83dd-2ad1df8c68ac',\n", + " 'c7da34d0-bc0d-4f10-81db-9d591c236325',\n", + " '482cf9d9-4c6b-4af2-a3b3-8fb8b8d5f39f',\n", + " '28015751-dea1-4478-99f1-98d98d721113',\n", + " '8728da4e-32c6-469b-9469-b3e8233d22d7',\n", + " '2e844ba1-9786-4ee7-ba32-61d85eccbea0',\n", + " '271c3f2a-340e-49f8-8519-6532c933c6fa',\n", + " 'ac75b839-5c59-47f5-a75d-313b87c61925',\n", + " 'c0c7d949-1658-4e33-80fb-5b25bdc329b8',\n", + " '7612cca2-e740-4606-a823-a6ae7a699e22',\n", + " '48e3455d-c108-46b4-87d2-885426266aa8',\n", + " 'c485aa0d-c65f-41b3-a4d2-1af3ed217e22',\n", + " '8590a607-0702-4b60-b4a6-de178effbe6d',\n", + " '02d3876c-8ba0-4a7e-b749-be5fa1897169',\n", + " '8f6665e0-f8e6-4b0e-a557-16c11e00f951',\n", + " '423a200d-9ffc-4f74-a748-3be4e771c346',\n", + " '05856574-6524-4d1a-b293-77288599c5a1',\n", + " 'f04de33d-64cd-47a7-aab8-a7ba874576ce',\n", + " 'f7fafb58-6fdb-4948-9baa-a09283efd7e0',\n", + " '83dae6e1-bda9-4d45-8b37-44997da2ac85',\n", + " '09cdc768-8215-4518-8789-0c0d065a937d',\n", + " 'e1178968-bf0b-4477-9cb7-a92bbb13a234',\n", + " '1add487d-d6ab-43dc-8d84-bf76cfcfcfb9',\n", + " '54ef7a3c-cc94-46a3-a575-b90e45ccdc31',\n", + " 'd2c30a1e-3a2b-4716-b1e0-4fef8c9bfe1f',\n", + " '84e21ea4-8f4e-404b-8be2-d4742cf14100',\n", + " '9cacdab8-2e9e-4446-a6f6-ff0fe6cfaee7',\n", + " 'be28ee6f-7cb3-4c61-8d26-9a49a24214c3',\n", + " '8db480a9-63a0-4e0a-b9bb-5d4dd0ab9f20',\n", + " '1c02ce63-41d9-4e6e-974e-dd2a5fa1183f',\n", + " '221ff1d2-1c2d-4c0a-8710-e1a91d589f77',\n", + " 'ce2d47bd-e0b6-4833-af53-d2c284145863',\n", + " 'b9b4c676-7fb8-4d55-91b1-718a4194a84e',\n", + " 'e027479d-969e-485e-851a-2473cd2c2731',\n", + " '0d4673bd-843e-4ef6-91c3-6e46c8b506df',\n", + " '5ee4bb0c-e647-48b1-8dca-f173405d0ae8',\n", + " '8d4b5284-d94d-4bfd-bcf5-5fcda7b03a40',\n", + " '7fda53b6-048b-4712-b7e7-823dc41867a6',\n", + " 'f4a95cc5-a812-468a-8d61-c999405677be',\n", + " 
'79444afc-74d2-4149-8bc7-4608ca46b220',\n", + " 'e666034e-43ea-4629-b6f4-f77e27c379b0',\n", + " '9a664df4-1f99-4d38-9098-8a63bf322d43',\n", + " '51db3028-6637-42ec-bf24-f249b12a2b40',\n", + " 'c1897cfa-41de-4a88-be0c-a5ba965bf839',\n", + " '90b5dc02-8256-40a5-bb89-8fd303648011',\n", + " '43ebaa36-b8bf-4f84-9e37-c42377489522',\n", + " '567ce31c-8848-4974-87e8-ecff9b649371',\n", + " 'a0c2db0e-f565-4953-969e-ca11493c92d8',\n", + " '97f0c4c2-04df-447b-8a74-0c109d8e467f',\n", + " 'a8521022-a926-4e58-95dd-ecc26def8ee8',\n", + " 'e8a83f57-6717-403f-84e9-56ecded817c7',\n", + " '7bef3917-91d7-4277-bdbf-01b4526c19fb']" + ] + }, + "metadata": {}, + "execution_count": 22 + } ], - "text/plain": [ - " nodeProjection \\\n", - "0 {'CharacterStats': {'properties': {'stats_vect... \n", - "\n", - " relationshipProjection graphName nodeCount \\\n", - "0 {'__ALL__': {'orientation': 'NATURAL', 'aggreg... marvel 470 \n", - "\n", - " relationshipCount createMillis \n", - "0 515 36 " + "source": [ + "members = get_members()\n", + "get_largest_stable_allies(members)" ] - }, - "execution_count": 38, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "CALL gds.graph.create('marvel', 'CharacterStats',\n", - " '*', {nodeProperties:'stats_vector'})\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now, we can go ahead and infer the similarity network with the new kNN algorithm. We will use the mutate mode of the algorithm. The mutate mode stores the results back to the projected graph instead of the Neo4j stored graph. This way, we can use the kNN algorithm results as the input for the community detection algorithms later in the workflow. The kNN algorithm has some parameters we can use to fine-tune the results:\n", - "* topK: The number of neighbors to find for each node. 
The K-nearest neighbors are returned.\n", - "* sampleRate: Sample rate to limit the number of comparisons per node.\n", - "* deltaThreshold: Value as a percentage to determine when to stop early. If fewer updates than the configured value happen, the algorithm stops.\n", - "* randomJoins: Between every iteration, how many attempts are being made to connect new node neighbors based on random selection.\n", - "\n", - "We will define the topK value of 15 and sampleRate of 0.8, and leave the other parameters at default values." - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ + }, + { + "cell_type": "markdown", + "metadata": { + "id": "q-8YOnneSsIS" + }, + "source": [ + "In this code, the algorithm only returns the ids of nodes that belong to the largest allied component where there are no enemies within. It shouldn’t be a problem to mark these nodes in Neo4j, as you can match them by their ids. The largest component of allies, where there are no enemies within, has 142 members. If we visualize it in Neo4j Browser, we can see that there are no enemy relationships visible." + ] + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
createMilliscomputeMillismutateMillispostProcessingMillisnodesComparedrelationshipsWrittensimilarityDistributionconfiguration
0027339-14707050{'p1': 0.2500009536743164, 'max': 1.0000066757...{'topK': 15, 'maxIterations': 100, 'randomJoin...
\n", - "
" + "cell_type": "code", + "source": [ + "run_query(\"\"\"\n", + "CALL gds.graph.drop('allies')\n", + "\"\"\")" ], - "text/plain": [ - " createMillis computeMillis mutateMillis postProcessingMillis \\\n", - "0 0 273 39 -1 \n", - "\n", - " nodesCompared relationshipsWritten \\\n", - "0 470 7050 \n", - "\n", - " similarityDistribution \\\n", - "0 {'p1': 0.2500009536743164, 'max': 1.0000066757... \n", - "\n", - " configuration \n", - "0 {'topK': 15, 'maxIterations': 100, 'randomJoin... " + "metadata": { + "id": "sJjJmZLju4a2", + "outputId": "eaaf61f0-6d12-43d4-9dc9-a92defb21900", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 142 + } + }, + "execution_count": 23, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " graphName database memoryUsage sizeInBytes nodeCount relationshipCount \\\n", + "0 allies neo4j -1 1105 313 \n", + "\n", + " configuration density \\\n", + "0 {'relationshipProjection': {'ALLY': {'orientat... 0.000257 \n", + "\n", + " creationTime modificationTime \\\n", + "0 2022-04-17T13:30:32.452544000+00:00 2022-04-17T13:30:32.479791000+00:00 \n", + "\n", + " schema \n", + "0 {'relationships': {'ALLY': {}}, 'nodes': {'Cha... " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
graphNamedatabasememoryUsagesizeInBytesnodeCountrelationshipCountconfigurationdensitycreationTimemodificationTimeschema
0alliesneo4j-11105313{'relationshipProjection': {'ALLY': {'orientat...0.0002572022-04-17T13:30:32.452544000+00:002022-04-17T13:30:32.479791000+00:00{'relationships': {'ALLY': {}}, 'nodes': {'Cha...
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 23 + } ] - }, - "execution_count": 39, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "CALL gds.beta.knn.mutate('marvel', {nodeWeightProperty:'stats_vector', \n", - " sampleRate:0.8, topK:15, mutateProperty:'score', mutateRelationshipType:'SIMILAR'})\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Louvain Modularity algorithm\n", - "The similarity network is inferred and stored in the named graph. We can examine the community structure of this new similarity network with the Louvain Modularity algorithm. As the similarity scores of relationships are available as their properties, we will use the weighted variant of the Louvain Modularity algorithm. Using the `relationshipWeightProperty` parameter, we let the algorithm know it should consider the relationships’ weight when calculating the network’s community structure. This time we will use the `write` mode of the algorithm to store the results back to the Neo4j stored graph." - ] - }, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
writeMillisnodePropertiesWrittenmodularitymodularitiesranLevelscommunityCountcommunityDistributionpostProcessingMilliscreateMilliscomputeMillisconfiguration
054700.628005[0.5516351717871184, 0.6280045375871258]28{'p99': 100, 'min': 15, 'max': 100, 'mean': 58...150373{'maxIterations': 10, 'writeConcurrency': 4, '...
\n", - "
" + "cell_type": "markdown", + "metadata": { + "id": "c0zjM88gSsIS" + }, + "source": [ + "## Analyzing characters’ stats\n", + "In the last part of our analysis, we will examine the stats of the characters. We have the stats available for a total of 470 heroes. This information was scraped from Marvel’s website. The scale for stats ranges from zero to seven, and Iron Man does not have a single seven. He is probably not the strongest of the heroes, even though he is one of the more popular ones. Now we will explore the characters with the highest stats average. Whenever I need some help with my Cypher queries, I turn to Neo4j Slack. Luckily for us, Andrew Bowman is always around with great advice on optimizing and prettifying our Cypher queries. This time he showed me the `apoc.map.values` function. It can be used to fetch all property values of a single node without explicitly writing the property keys." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "id": "Hq_NzZ4qSsIS", + "outputId": "954cbafd-6973-4256-fdf1-bcf2fb837fd6", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 363 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " character average_stats\n", + "0 Sasquatch (Walter Langkowski) 7.0\n", + "1 Squirrel Girl 7.0\n", + "2 Galactus 7.0\n", + "3 Deathstrike (Ultimate) 7.0\n", + "4 GRAYDON CREED 7.0\n", + "5 CHTHON 7.0\n", + "6 UNREVEALED; GAEA IS HER GREEK NAME GAEA 7.0\n", + "7 SET 7.0\n", + "8 Rogue (X-Men: Battle of the Atom) 7.0\n", + "9 Legion 7.0" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
characteraverage_stats
0Sasquatch (Walter Langkowski)7.0
1Squirrel Girl7.0
2Galactus7.0
3Deathstrike (Ultimate)7.0
4GRAYDON CREED7.0
5CHTHON7.0
6UNREVEALED; GAEA IS HER GREEK NAME GAEA7.0
7SET7.0
8Rogue (X-Men: Battle of the Atom)7.0
9Legion7.0
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 24 + } ], - "text/plain": [ - " writeMillis nodePropertiesWritten modularity \\\n", - "0 5 470 0.628005 \n", - "\n", - " modularities ranLevels communityCount \\\n", - "0 [0.5516351717871184, 0.6280045375871258] 2 8 \n", - "\n", - " communityDistribution postProcessingMillis \\\n", - "0 {'p99': 100, 'min': 15, 'max': 100, 'mean': 58... 15 \n", - "\n", - " createMillis computeMillis \\\n", - "0 0 373 \n", - "\n", - " configuration \n", - "0 {'maxIterations': 10, 'writeConcurrency': 4, '... " + "source": [ + "run_query(\"\"\"\n", + "MATCH (c:Character)-[:HAS_STATS]->(stats)\n", + "RETURN c.name as character, \n", + " apoc.coll.avg(apoc.map.values(stats, keys(stats))) as average_stats\n", + "ORDER BY average_stats DESC\n", + "LIMIT 10\n", + "\"\"\")" ] - }, - "execution_count": 40, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "CALL gds.louvain.write('marvel',\n", - " {relationshipTypes:['SIMILAR'], \n", - " relationshipWeightProperty:'score', \n", - " writeProperty:'louvain'});\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can examine the community structure results with the following cypher query." - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
communitymembersfighting_skillsdurabilityenergyintelligencespeedstrengthflight
01051003.6900004.1100002.9200003.2700003.1500003.7400000.840000
19446.0681826.8636366.6363646.3409096.9090916.8409091.750000
2372944.4042555.6382985.2446814.3191495.1382984.9574471.712766
3152604.1333333.2666672.3166673.0833332.9666673.2333330.350000
4151324.6250005.4062504.4687504.5000004.1562504.9375001.093750
536432.8837212.4883720.8139532.9534881.9302332.0697670.162791
6302823.1097562.5609762.0487802.8414632.3170732.2439020.597561
744154.6000003.5333332.2666673.4000003.1333334.1333330.000000
\n", - "
" + "cell_type": "markdown", + "metadata": { + "id": "S6eQ0u4ESsIT" + }, + "source": [ + "It seems many characters have their stats maxed out. I am not sure exactly how this data collection process works, but I found a fascinating heroine by the name of Squirrel Girl that could probably kick Iron Man’s ass with one hand while making sourdough bread with the other. Or polish her nails, not exactly sure what type of girl she is. The only thing certain is that she is a badass.\n", + "## k-Nearest Neighbours algorithm\n", + "The k-Nearest Neighbour is one of the more standard graph algorithms and was already implemented in the Graph Data Science library before in the form of Cosine, Euclidian, and Pearson similarity algorithms. Those were basic implementation where the algorithms compared a given vector for all node pairs in the network. Because comparing all node pairs does not scale well, another implementation of the kNN algorithm was added to the library. It is based on the Efficient k-nearest neighbor graph construction for generic similarity measures article. Instead of comparing every node pair, the algorithm selects possible neighbors based on the assumption that the neighbors-of-neighbors of a node are most likely already the nearest one. The algorithm scales quasi-linear with respect to the node count instead of being quadratic. The implementation uses the Cosine similarity to compare two vectors.\n", + "First, we need to create a vector (array of numbers) that will be compared between the pairs of heroes. We will use the characters’ stats as well as their ability to fly to populate the vector. Because all stats have the same range between zero and seven, there is no need for normalization. We only need to encode the flight feature to span between zero and seven as well. Those characters that can fly will have the value of flight feature seven, while those who can’t fly will have the value zero." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "id": "HnARtBNHSsIT", + "outputId": "36ed9ea8-668b-4e1f-be5c-11aac8ceddad", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Empty DataFrame\n", + "Columns: []\n", + "Index: []" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 25 + } ], - "text/plain": [ - " community members fighting_skills durability energy intelligence \\\n", - "0 105 100 3.690000 4.110000 2.920000 3.270000 \n", - "1 9 44 6.068182 6.863636 6.636364 6.340909 \n", - "2 372 94 4.404255 5.638298 5.244681 4.319149 \n", - "3 152 60 4.133333 3.266667 2.316667 3.083333 \n", - "4 151 32 4.625000 5.406250 4.468750 4.500000 \n", - "5 36 43 2.883721 2.488372 0.813953 2.953488 \n", - "6 302 82 3.109756 2.560976 2.048780 2.841463 \n", - "7 44 15 4.600000 3.533333 2.266667 3.400000 \n", - "\n", - " speed strength flight \n", - "0 3.150000 3.740000 0.840000 \n", - "1 6.909091 6.840909 1.750000 \n", - "2 5.138298 4.957447 1.712766 \n", - "3 2.966667 3.233333 0.350000 \n", - "4 4.156250 4.937500 1.093750 \n", - "5 1.930233 2.069767 0.162791 \n", - "6 2.317073 2.243902 0.597561 \n", - "7 3.133333 4.133333 0.000000 " + "source": [ + "run_query(\"\"\"\n", + "MATCH (c:Character)-[:HAS_STATS]->(s)\n", + "WITH c, [s.durability, s.energy, s.fighting_skills, \n", + " s.intelligence, s.speed, s.strength,\n", + " CASE WHEN c.flight = 'true' THEN 7 ELSE 0 END] as stats_vector\n", + "SET c.stats_vector = stats_vector\n", + "\"\"\")" ] - }, - "execution_count": 41, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (c:Character)-[:HAS_STATS]->(stats)\n", - "RETURN c.louvain as community, count(*) as members, \n", - " avg(stats.fighting_skills) as fighting_skills,\n", - " avg(stats.durability) as durability,\n", - " avg(stats.energy) as energy,\n", - " avg(stats.intelligence) as intelligence,\n", - " avg(stats.speed) as speed,\n", - " avg(stats.strength) as strength,\n", - " avg(CASE WHEN c.flight = 'true' THEN 7.0 ELSE 0.0 END) as flight\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It would make sense to add the standard deviation for each stat, but it wouldn’t be presentable for a blog 
post. The community with an id 68 has the most powerful members. The average for most stats is 6.5, which means that they are almost entirely maxed out. The average value of flight at 2 indicates that around 30% (2/7) of the members can fly. The largest community with 106 members has their stats averaged between 2 and 3, which would indicate that they might be support characters with lesser abilities. The characters with stronger abilities are usually the lead characters.\n", - "\n", - "## Label Propagation algorithm\n", - "Label Propagation algorithm can also be used to determine the community structure of a network. We will apply it to the inferred similarity network and compare the results with the Louvain Modularity algorithm results." - ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, - "outputs": [ + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
writeMillisnodePropertiesWrittenranIterationsdidConvergecommunityCountcommunityDistributionpostProcessingMilliscreateMilliscomputeMillisconfiguration
0747010False16{'p99': 132, 'min': 3, 'max': 132, 'mean': 29....7057{'maxIterations': 10, 'writeConcurrency': 4, '...
\n", - "
" + "cell_type": "markdown", + "metadata": { + "id": "jBJcE8-tSsIT" + }, + "source": [ + "We will also tag the characters that have the stats vector with a second label. This way, we can easily filter heroes with a stats vector in our native projection of the named graph." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "id": "C9FhxG_uSsIT", + "outputId": "304c8dd9-680c-4bec-eca1-068862baf3c5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 49 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Empty DataFrame\n", + "Columns: []\n", + "Index: []" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 26 + } ], - "text/plain": [ - " writeMillis nodePropertiesWritten ranIterations didConverge \\\n", - "0 7 470 10 False \n", - "\n", - " communityCount communityDistribution \\\n", - "0 16 {'p99': 132, 'min': 3, 'max': 132, 'mean': 29.... \n", - "\n", - " postProcessingMillis createMillis computeMillis \\\n", - "0 7 0 57 \n", - "\n", - " configuration \n", - "0 {'maxIterations': 10, 'writeConcurrency': 4, '... " + "source": [ + "run_query(\"\"\"\n", + "MATCH (c:Character)\n", + "WHERE exists (c.stats_vector)\n", + "SET c:CharacterStats\n", + "\"\"\")" ] - }, - "execution_count": 42, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "run_query(\"\"\"\n", - "CALL gds.labelPropagation.write('marvel',\n", - " {relationshipTypes:['SIMILAR'],\n", - " relationshipWeightProperty:'score', \n", - " writeProperty:'labelPropagation'})\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We investigate the results of the Label Propagation algorithm.\n" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [ + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VlIOJaADSsIT" + }, + "source": [ + "Now that everything is ready, we can go ahead and load our named graph. We will project all nodes with the CharacterStats label and their stats_vector properties in a named graph. If you need a quick refresher or introduction to how the GDS library works, I would suggest taking the Introduction to Graph Algorithms course." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "id": "YBv0q7zESsIU", + "outputId": "dc5281e8-960f-4090-adb6-4eb597af3e6d", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 81 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " nodeProjection \\\n", + "0 {'CharacterStats': {'label': 'CharacterStats',... 
\n", + "\n", + " relationshipProjection graphName nodeCount \\\n", + "0 {'__ALL__': {'orientation': 'NATURAL', 'aggreg... marvel 470 \n", + "\n", + " relationshipCount projectMillis \n", + "0 515 348 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
nodeProjectionrelationshipProjectiongraphNamenodeCountrelationshipCountprojectMillis
0{'CharacterStats': {'label': 'CharacterStats',...{'__ALL__': {'orientation': 'NATURAL', 'aggreg...marvel470515348
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 27 + } + ], + "source": [ + "run_query(\"\"\"\n", + "CALL gds.graph.project('marvel', 'CharacterStats',\n", + " '*', {nodeProperties:'stats_vector'})\n", + "\"\"\")" + ] + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
communitymembersfighting_skillsdurabilityenergyintelligencespeedstrengthflight
02211323.4621213.9469702.9242423.2045453.1212123.5151521.166667
1105224.3181824.4090913.4545453.7727273.8636364.2727270.000000
2100144.9285716.7857146.4285715.8571436.9285716.7857142.000000
340480424.6904766.3095245.3571434.3571435.5714295.3571431.833333
487204.5500004.7500005.0500004.4500004.4500004.2000001.050000
5192234.5217392.6521742.3043482.8695652.6956522.5652170.608696
6136306.6000006.9000006.7333336.5666676.9000006.8666671.633333
785194.5263165.7894745.3684214.4736845.1052635.1052631.473684
8378243.5833333.3333332.0833333.1250003.2083333.5000000.000000
9119294.7586215.4827594.4827594.4482764.2758625.0000000.965517
10269352.2285711.9142860.8571432.8571431.7428571.6285710.200000
1124154.0000004.0000001.0000004.2000003.8000004.0000000.000000
1256931.0000001.0000001.0000001.0000001.0000001.0000000.000000
13113303.0000002.4333332.5333332.6333332.0333332.1333330.233333
14475304.2000002.9000001.2666673.1666672.2333332.7000000.700000
15583124.6666673.5000002.4166673.5000003.1666674.1666670.000000
\n", - "
" + "cell_type": "markdown", + "metadata": { + "id": "zqmGpbBfSsIU" + }, + "source": [ + "Now, we can go ahead and infer the similarity network with the new kNN algorithm. We will use the mutate mode of the algorithm. The mutate mode stores the results back to the projected graph instead of the Neo4j stored graph. This way, we can use the kNN algorithm results as the input for the community detection algorithms later in the workflow. The kNN algorithm has some parameters we can use to fine-tune the results:\n", + "* topK: The number of neighbors to find for each node. The K-nearest neighbors are returned.\n", + "* sampleRate: Sample rate to limit the number of comparisons per node.\n", + "* deltaThreshold: Value as a percentage to determine when to stop early. If fewer updates than the configured value happen, the algorithm stops.\n", + "* randomJoins: Between every iteration, how many attempts are being made to connect new node neighbors based on random selection.\n", + "\n", + "We will define the topK value of 15 and sampleRate of 0.8, and leave the other parameters at default values." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "id": "2ZxBxqg4SsIU", + "outputId": "809af1c6-e74a-4f45-e8c5-8b5f9d74f5c0", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 159 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " ranIterations nodePairsConsidered didConverge preProcessingMillis \\\n", + "0 6 455147 True 0 \n", + "\n", + " computeMillis mutateMillis postProcessingMillis nodesCompared \\\n", + "0 1477 199 -1 470 \n", + "\n", + " relationshipsWritten similarityDistribution \\\n", + "0 7050 {'p1': 0.40000057220458984, 'max': 1.000006675... \n", + "\n", + " configuration \n", + "0 {'topK': 15, 'maxIterations': 100, 'randomJoin... " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ranIterationsnodePairsConsidereddidConvergepreProcessingMilliscomputeMillismutateMillispostProcessingMillisnodesComparedrelationshipsWrittensimilarityDistributionconfiguration
06455147True01477199-14707050{'p1': 0.40000057220458984, 'max': 1.000006675...{'topK': 15, 'maxIterations': 100, 'randomJoin...
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 29 + } + ], + "source": [ + "run_query(\"\"\"\n", + "CALL gds.knn.mutate('marvel', {nodeProperties:'stats_vector', \n", + " sampleRate:0.8, topK:15, mutateProperty:'score', mutateRelationshipType:'SIMILAR'})\n", + "\"\"\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JgZtHETpSsIU" + }, + "source": [ + "## Louvain Modularity algorithm\n", + "The similarity network is inferred and stored in the named graph. We can examine the community structure of this new similarity network with the Louvain Modularity algorithm. As the similarity scores of relationships are available as their properties, we will use the weighted variant of the Louvain Modularity algorithm. Using the `relationshipWeightProperty` parameter, we let the algorithm know it should consider the relationships’ weight when calculating the network’s community structure. This time we will use the `write` mode of the algorithm to store the results back to the Neo4j stored graph." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "id": "hvFThW3mSsIU", + "outputId": "401bc3ad-2393-4047-9f8e-a2150160a1ef", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 142 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " writeMillis nodePropertiesWritten modularity \\\n", + "0 285 470 0.6083 \n", + "\n", + " modularities ranLevels communityCount \\\n", + "0 [0.5742237273581017, 0.6082996409215549] 2 7 \n", + "\n", + " communityDistribution postProcessingMillis \\\n", + "0 {'p99': 119, 'min': 16, 'max': 119, 'mean': 67... 4 \n", + "\n", + " preProcessingMillis computeMillis \\\n", + "0 0 1496 \n", + "\n", + " configuration \n", + "0 {'maxIterations': 10, 'writeConcurrency': 4, '... " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
writeMillisnodePropertiesWrittenmodularitymodularitiesranLevelscommunityCountcommunityDistributionpostProcessingMillispreProcessingMilliscomputeMillisconfiguration
02854700.6083[0.5742237273581017, 0.6082996409215549]27{'p99': 119, 'min': 16, 'max': 119, 'mean': 67...401496{'maxIterations': 10, 'writeConcurrency': 4, '...
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 30 + } ], - "text/plain": [ - " community members fighting_skills durability energy intelligence \\\n", - "0 221 132 3.462121 3.946970 2.924242 3.204545 \n", - "1 105 22 4.318182 4.409091 3.454545 3.772727 \n", - "2 100 14 4.928571 6.785714 6.428571 5.857143 \n", - "3 40480 42 4.690476 6.309524 5.357143 4.357143 \n", - "4 87 20 4.550000 4.750000 5.050000 4.450000 \n", - "5 192 23 4.521739 2.652174 2.304348 2.869565 \n", - "6 136 30 6.600000 6.900000 6.733333 6.566667 \n", - "7 85 19 4.526316 5.789474 5.368421 4.473684 \n", - "8 378 24 3.583333 3.333333 2.083333 3.125000 \n", - "9 119 29 4.758621 5.482759 4.482759 4.448276 \n", - "10 269 35 2.228571 1.914286 0.857143 2.857143 \n", - "11 241 5 4.000000 4.000000 1.000000 4.200000 \n", - "12 569 3 1.000000 1.000000 1.000000 1.000000 \n", - "13 113 30 3.000000 2.433333 2.533333 2.633333 \n", - "14 475 30 4.200000 2.900000 1.266667 3.166667 \n", - "15 583 12 4.666667 3.500000 2.416667 3.500000 \n", - "\n", - " speed strength flight \n", - "0 3.121212 3.515152 1.166667 \n", - "1 3.863636 4.272727 0.000000 \n", - "2 6.928571 6.785714 2.000000 \n", - "3 5.571429 5.357143 1.833333 \n", - "4 4.450000 4.200000 1.050000 \n", - "5 2.695652 2.565217 0.608696 \n", - "6 6.900000 6.866667 1.633333 \n", - "7 5.105263 5.105263 1.473684 \n", - "8 3.208333 3.500000 0.000000 \n", - "9 4.275862 5.000000 0.965517 \n", - "10 1.742857 1.628571 0.200000 \n", - "11 3.800000 4.000000 0.000000 \n", - "12 1.000000 1.000000 0.000000 \n", - "13 2.033333 2.133333 0.233333 \n", - "14 2.233333 2.700000 0.700000 \n", - "15 3.166667 4.166667 0.000000 " + "source": [ + "run_query(\"\"\"\n", + "CALL gds.louvain.write('marvel',\n", + " {relationshipTypes:['SIMILAR'], \n", + " relationshipWeightProperty:'score', \n", + " writeProperty:'louvain'});\n", + "\"\"\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nw6aA0KWSsIV" + }, + "source": [ + "We can examine the community 
structure results with the following cypher query." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "id": "iri4abwxSsIV", + "outputId": "41300bee-b113-479d-863c-8a42349a8caa", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 269 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " community members fighting_skills durability energy intelligence \\\n", + "0 144 119 4.252101 4.705882 3.697479 3.831933 \n", + "1 50 64 2.593750 2.218750 1.562500 2.750000 \n", + "2 235 77 3.948052 3.142857 2.324675 2.948052 \n", + "3 259 92 3.445652 3.554348 2.532609 3.326087 \n", + "4 120 16 4.500000 3.500000 2.375000 3.312500 \n", + "5 27 46 5.826087 6.739130 6.760870 6.413043 \n", + "6 67 56 4.660714 6.160714 5.267857 4.267857 \n", + "\n", + " speed strength flight \n", + "0 3.857143 4.344538 0.588235 \n", + "1 1.906250 1.921875 0.218750 \n", + "2 2.974026 2.961039 0.636364 \n", + "3 2.630435 3.086957 0.760870 \n", + "4 3.125000 4.125000 0.000000 \n", + "5 6.782609 6.565217 2.130435 \n", + "6 5.696429 5.500000 2.375000 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
communitymembersfighting_skillsdurabilityenergyintelligencespeedstrengthflight
01441194.2521014.7058823.6974793.8319333.8571434.3445380.588235
150642.5937502.2187501.5625002.7500001.9062501.9218750.218750
2235773.9480523.1428572.3246752.9480522.9740262.9610390.636364
3259923.4456523.5543482.5326093.3260872.6304353.0869570.760870
4120164.5000003.5000002.3750003.3125003.1250004.1250000.000000
527465.8260876.7391306.7608706.4130436.7826096.5652172.130435
667564.6607146.1607145.2678574.2678575.6964295.5000002.375000
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 31 + } + ], + "source": [ + "run_query(\"\"\"\n", + "MATCH (c:Character)-[:HAS_STATS]->(stats)\n", + "RETURN c.louvain as community, count(*) as members, \n", + " avg(stats.fighting_skills) as fighting_skills,\n", + " avg(stats.durability) as durability,\n", + " avg(stats.energy) as energy,\n", + " avg(stats.intelligence) as intelligence,\n", + " avg(stats.speed) as speed,\n", + " avg(stats.strength) as strength,\n", + " avg(CASE WHEN c.flight = 'true' THEN 7.0 ELSE 0.0 END) as flight\n", + "\"\"\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EuDqexP_SsIV" + }, + "source": [ + "It would make sense to add the standard deviation for each stat, but it wouldn’t be presentable for a blog post. The community with an id 27 has the most powerful members. The average for most stats is around 6.5, which means that they are almost entirely maxed out. The average value of flight at roughly 2 indicates that around 30% (2/7) of the members can fly. The largest community, with 119 members, has most of its stats averaged around 4, noticeably lower, which would indicate that they might be support characters with lesser abilities. The characters with stronger abilities are usually the lead characters.\n", + "\n", + "## Label Propagation algorithm\n", + "The Label Propagation algorithm can also be used to determine the community structure of a network. We will apply it to the inferred similarity network and compare the results with the Louvain Modularity algorithm results."
+ ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "id": "Swa-elcbSsIV", + "outputId": "d1a8bc89-c333-496d-95b4-4c4e6014bdae", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 142 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " writeMillis nodePropertiesWritten ranIterations didConverge \\\n", + "0 45 470 10 False \n", + "\n", + " communityCount communityDistribution \\\n", + "0 13 {'p99': 147, 'min': 3, 'max': 147, 'mean': 36.... \n", + "\n", + " postProcessingMillis preProcessingMillis computeMillis \\\n", + "0 4 0 693 \n", + "\n", + " configuration \n", + "0 {'maxIterations': 10, 'writeConcurrency': 4, '... " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
writeMillisnodePropertiesWrittenranIterationsdidConvergecommunityCountcommunityDistributionpostProcessingMillispreProcessingMilliscomputeMillisconfiguration
04547010False13{'p99': 147, 'min': 3, 'max': 147, 'mean': 36....40693{'maxIterations': 10, 'writeConcurrency': 4, '...
\n", + "
\n", + " \n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 32 + } + ], + "source": [ + "run_query(\"\"\"\n", + "CALL gds.labelPropagation.write('marvel',\n", + " {relationshipTypes:['SIMILAR'],\n", + " relationshipWeightProperty:'score', \n", + " writeProperty:'labelPropagation'})\n", + "\"\"\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o-XFMfXYSsIV" + }, + "source": [ + "We investigate the results of the Label Propagation algorithm.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": { + "id": "x3bMD8sySsIW", + "outputId": "7617bf5f-8262-4a16-e9df-9503067867ff", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 457 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " community members fighting_skills durability energy intelligence \\\n", + "0 218 147 4.020408 4.639456 3.510204 3.782313 \n", + "1 343 37 3.297297 2.270270 1.783784 2.810811 \n", + "2 118 65 3.969231 3.430769 2.476923 3.153846 \n", + "3 270 14 5.285714 5.357143 4.714286 4.785714 \n", + "4 589 30 3.566667 3.433333 2.600000 3.033333 \n", + "5 215 16 4.500000 3.500000 2.375000 3.312500 \n", + "6 675 7 2.000000 1.000000 1.857143 3.000000 \n", + "7 722 10 3.100000 2.300000 0.400000 3.200000 \n", + "8 216 38 2.842105 2.763158 1.921053 2.736842 \n", + "9 73 47 5.829787 6.744681 6.617021 6.276596 \n", + "10 150 42 4.595238 6.285714 5.547619 4.380952 \n", + "11 395 3 1.000000 1.000000 1.000000 1.000000 \n", + "12 390 14 3.357143 2.857143 2.428571 2.642857 \n", + "\n", + " speed strength flight \n", + "0 3.571429 4.197279 0.809524 \n", + "1 2.405405 2.027027 1.135135 \n", + "2 3.230769 3.292308 0.861538 \n", + "3 4.642857 4.857143 0.000000 \n", + "4 2.900000 3.033333 0.233333 \n", + "5 3.125000 4.125000 0.000000 \n", + "6 0.285714 0.857143 0.000000 \n", + "7 2.500000 1.900000 0.000000 \n", + "8 2.026316 2.342105 0.368421 \n", + "9 6.787234 6.574468 2.085106 \n", + "10 5.761905 5.595238 2.333333 \n", + 
"11 1.000000 1.000000 0.000000 \n", + "12 2.785714 2.571429 0.000000 " + ], + "text/html": [ + "\n", + "
\n", + " " + ] + }, + "metadata": {}, + "execution_count": 33 + } + ], + "source": [ + "run_query(\"\"\"\n", + "MATCH (c:Character)-[:HAS_STATS]->(stats)\n", + "RETURN c.labelPropagation as community, count(*) as members, \n", + " avg(stats.fighting_skills) as fighting_skills,\n", + " avg(stats.durability) as durability,\n", + " avg(stats.energy) as energy,\n", + " avg(stats.intelligence) as intelligence,\n", + " avg(stats.speed) as speed,\n", + " avg(stats.strength) as strength,\n", + " avg(CASE WHEN c.flight = 'true' THEN 7.0 ELSE 0.0 END) as flight\n", + "\"\"\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OUqOLjljSsIW" + }, + "source": [ + "Notice that the Label Propagation algorithm found twice as many communities as the Louvain Modularity algorithm. Some of them are relatively tiny. For example, the community with an id of 395 has only three members, and every one of its average stats other than flight is 1.0. Its members are the heroes who go by the names Maggott, Deathbird, and Slayback. Funky names. In the output above, the most powerful community has an id of 73 and 47 members. Remember, the most powerful community found by the Louvain Modularity algorithm had 46 members and a slightly lower value of average stats.\n", + "\n", + "## Conclusion\n", + "I hope you have learned some tricks for performing network analysis in Neo4j with the help of the APOC and GDS libraries. There are still many things we could do with this graph, so expect a new post shortly." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BcGLRfhVSsIW" + }, + "outputs": [], + "source": [ + "" ] - }, - "execution_count": 43, - "metadata": {}, - "output_type": "execute_result" } - ], - "source": [ - "run_query(\"\"\"\n", - "MATCH (c:Character)-[:HAS_STATS]->(stats)\n", - "RETURN c.labelPropagation as community, count(*) as members, \n", - " avg(stats.fighting_skills) as fighting_skills,\n", - " avg(stats.durability) as durability,\n", - " avg(stats.energy) as energy,\n", - " avg(stats.intelligence) as intelligence,\n", - " avg(stats.speed) as speed,\n", - " avg(stats.strength) as strength,\n", - " avg(CASE WHEN c.flight = 'true' THEN 7.0 ELSE 0.0 END) as flight\n", - "\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can notice that the Label Propagation algorithm found twice as many communities as the Louvain Modularity algorithm. Some of them are relatively tiny. For example, the community with an id 693 has only three members, and all their average stats are at 1.0 value. They are the heroes that go by the name of Maggott, Deathbird, and Slayback. Funky names. The most powerful community has an id of 137 and only 23 members. Remember, the most powerful community found by the Louvain Modularity algorithm had 46 members and a slightly lower value of average stats.\n", - "\n", - "## Conclusion\n", - "I hope you have learned some tricks on performing network analysis in Neo4j with the help of APOC and GDS libraries. There are still many things we could do with this graph, so expect a new post shortly." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "scispacy", - "language": "python", - "name": "scispacy" + ], + "metadata": { + "kernelspec": { + "display_name": "scispacy", + "language": "python", + "name": "scispacy" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.10" + }, + "colab": { + "name": "Exploratory graph analysis.ipynb", + "provenance": [], + "include_colab_link": true + } }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.10" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file