Added global and temp view example

tirthajyoti · Jul 13, 2019 · 292a66c
1 parent d640745
commit 292a66c
Showing 1 changed file with 192 additions and 0 deletions.
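Review note: the cells added below contrast session-scoped temporary views with global temporary views. One detail matters for that comparison: in PySpark, `SparkSession.builder.getOrCreate()` returns the already running session when one exists, whereas `newSession()` creates a separate session that shares the same `SparkContext` but keeps its own temporary views. The following is a minimal sketch of the comparison under those assumptions, not the notebook's own code. The helper names `can_query`, `view_visibility` and `main`, the app name, and the two-row in-memory DataFrame (standing in for the JDBC `artists` table) are all hypothetical.

```python
def can_query(session, view_name):
    """Return True if `view_name` resolves in `session`, False otherwise."""
    try:
        # spark.sql() analyzes the query eagerly, so an unknown view fails here.
        session.sql(f"SELECT * FROM {view_name} LIMIT 1")
        return True
    except Exception:  # PySpark raises AnalysisException for an unknown view
        return False


def view_visibility(spark):
    """Register one temp view and one global temp view, then probe both
    from the original session and from a genuinely new session."""
    # Hypothetical stand-in data; the notebook loads this table over JDBC.
    df = spark.createDataFrame([(1, "AC/DC"), (2, "Accept")], ["ArtistId", "Name"])
    df.createOrReplaceTempView("temp_artists")          # session-scoped
    df.createOrReplaceGlobalTempView("global_artists")  # application-scoped

    other = spark.newSession()  # same SparkContext, separate temp-view catalog
    return {
        "temp_same_session": can_query(spark, "temp_artists"),
        "temp_new_session": can_query(other, "temp_artists"),
        "global_new_session": can_query(other, "global_temp.global_artists"),
    }


def main():
    # Imported lazily so the helpers above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("view_scope_demo")  # hypothetical app name
             .getOrCreate())
    try:
        print(view_visibility(spark))
    finally:
        spark.stop()  # standalone use only; do not stop a shared notebook session
```

Inside the notebook, calling `view_visibility(spark1)` should report the temp view as visible only in the session that created it and the global view as visible from the new session. `main()` is intended for running the sketch as a standalone script.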
diff --git a/Dataframe_SQL_query.ipynb b/Dataframe_SQL_query.ipynb
@@ -54,6 +54,15 @@
     "![catalyst-2](https://cdn-images-1.medium.com/max/1500/1*81ZOMxCci-tM2b-HNUX6Ww.png)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Useful references for this Notebook\n",
+    "* [PySpark in Jupyter Notebook — Working with Dataframe & JDBC Data Sources](https://medium.com/@thucnc/pyspark-in-jupyter-notebook-working-with-dataframe-jdbc-data-sources-6f3d39300bf6)\n",
+    "* [PySpark - Working with JDBC Sqlite database](http://mitzen.blogspot.com/2017/06/pyspark-working-with-jdbc-sqlite.html)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -642,6 +651,189 @@
    "source": [
     "df_combined.show()"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### What's the difference between temporary and global temporary SQL views?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### A temporary view is scoped to the session that created it and is not shared across sessions"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "+--------+--------------------+\n",
+      "|ArtistId|                Name|\n",
+      "+--------+--------------------+\n",
+      "|       1|               AC/DC|\n",
+      "|       2|              Accept|\n",
+      "|       3|           Aerosmith|\n",
+      "|       4|   Alanis Morissette|\n",
+      "|       5|     Alice In Chains|\n",
+      "|       6|Antônio Carlos Jobim|\n",
+      "|       7|        Apocalyptica|\n",
+      "|       8|          Audioslave|\n",
+      "|       9|            BackBeat|\n",
+      "|      10|        Billy Cobham|\n",
+      "+--------+--------------------+\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "df_artists.createOrReplaceTempView(\"temp_artists\")\n",
+    "\n",
+    "df_temp = spark1.sql(\"SELECT * FROM temp_artists LIMIT 10\")\n",
+    "df_temp.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### A new session is created but the temp view `temp_artists` cannot be accessed"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "spark2 = spark1.newSession()  # getOrCreate() would return the existing session, not a new one"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### We use `try...except` to catch the error and display a generic message"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "try:\n",
+    "    df_temp = spark2.sql(\"SELECT * FROM temp_artists LIMIT 10\")\n",
+    "except Exception:\n",
+    "    print(\"Error happened in this execution\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
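+    "#### Now, a global temporary view is created in this session\n",
+    "A global temporary view is tied to the system-preserved database `global_temp`, so it must be referenced by its qualified name, e.g. `global_temp.global_artists`. It is shared across all sessions of the same Spark application and is dropped when the application terminates."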
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tablename = \"artists\"\n",
+    "df_artists = spark2.read.format(\"jdbc\").option(\"url\", url).option(\"dbtable\", tablename).option(\"driver\", driver).load()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "+--------+--------------------+\n",
+      "|ArtistId|                Name|\n",
+      "+--------+--------------------+\n",
+      "|       1|               AC/DC|\n",
+      "|       2|              Accept|\n",
+      "|       3|           Aerosmith|\n",
+      "|       4|   Alanis Morissette|\n",
+      "|       5|     Alice In Chains|\n",
+      "|       6|Antônio Carlos Jobim|\n",
+      "|       7|        Apocalyptica|\n",
+      "|       8|          Audioslave|\n",
+      "|       9|            BackBeat|\n",
+      "|      10|        Billy Cobham|\n",
+      "+--------+--------------------+\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "df_artists.createOrReplaceGlobalTempView(\"global_artists\")\n",
+    "\n",
+    "df_global = spark2.sql(\"SELECT * FROM global_temp.global_artists LIMIT 10\")\n",
+    "df_global.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Start a new session. The view `global_artists` can be accessed across sessions"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "spark3 = spark2.newSession()  # a genuinely new session sharing the same SparkContext"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "+--------+--------------------+\n",
+      "|ArtistId|                Name|\n",
+      "+--------+--------------------+\n",
+      "|       1|               AC/DC|\n",
+      "|       2|              Accept|\n",
+      "|       3|           Aerosmith|\n",
+      "|       4|   Alanis Morissette|\n",
+      "|       5|     Alice In Chains|\n",
+      "|       6|Antônio Carlos Jobim|\n",
+      "|       7|        Apocalyptica|\n",
+      "|       8|          Audioslave|\n",
+      "|       9|            BackBeat|\n",
+      "|      10|        Billy Cobham|\n",
+      "+--------+--------------------+\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "df_global = spark3.sql(\"SELECT * FROM global_temp.global_artists LIMIT 10\")\n",
+    "df_global.show()"
+   ]
   }
  ],
  "metadata": {