Added global and temp view example
tirthajyoti committed Jul 13, 2019
1 parent d640745 commit 292a66c
Showing 1 changed file with 192 additions and 0 deletions.
192 changes: 192 additions & 0 deletions Dataframe_SQL_query.ipynb
@@ -54,6 +54,15 @@
"![catalyst-2](https://cdn-images-1.medium.com/max/1500/1*81ZOMxCci-tM2b-HNUX6Ww.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Useful references for this Notebook\n",
"* [PySpark in Jupyter Notebook — Working with Dataframe & JDBC Data Sources](https://medium.com/@thucnc/pyspark-in-jupyter-notebook-working-with-dataframe-jdbc-data-sources-6f3d39300bf6)\n",
"* [PySpark - Working with JDBC Sqlite database](http://mitzen.blogspot.com/2017/06/pyspark-working-with-jdbc-sqlite.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -642,6 +651,189 @@
"source": [
"df_combined.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What's the difference between temporary and global temporary SQL views?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### A temporary view is session-scoped: it is not shared across sessions"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+--------+--------------------+\n",
"|ArtistId|                Name|\n",
"+--------+--------------------+\n",
"|       1|               AC/DC|\n",
"|       2|              Accept|\n",
"|       3|           Aerosmith|\n",
"|       4|   Alanis Morissette|\n",
"|       5|     Alice In Chains|\n",
"|       6|Antônio Carlos Jobim|\n",
"|       7|        Apocalyptica|\n",
"|       8|          Audioslave|\n",
"|       9|            BackBeat|\n",
"|      10|        Billy Cobham|\n",
"+--------+--------------------+\n",
"\n"
]
}
],
"source": [
"df_artists.createOrReplaceTempView(\"temp_artists\")\n",
"\n",
"df_temp = spark1.sql(\"SELECT * FROM temp_artists LIMIT 10\")\n",
"df_temp.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create a new session, from which the temp view `temp_artists` cannot be accessed"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"# getOrCreate() would hand back the already-running session, so use newSession() for a separate one\n",
"spark2 = spark1.newSession()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### We use `try...except` to catch the error and display a generic message"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
"    df_temp = spark2.sql(\"SELECT * FROM temp_artists LIMIT 10\")\n",
"except Exception:\n",
"    print(\"Error happened in this execution\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Now, create a global temporary view in this session\n",
"A global temporary view is tied to the system-preserved database `global_temp`, so it must be referenced by its qualified name, e.g. `global_temp.global_artists`."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"tablename = \"artists\"\n",
"df_artists = spark2.read.format(\"jdbc\").option(\"url\", url).option(\"dbtable\", tablename).option(\"driver\", driver).load()"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+--------+--------------------+\n",
"|ArtistId|                Name|\n",
"+--------+--------------------+\n",
"|       1|               AC/DC|\n",
"|       2|              Accept|\n",
"|       3|           Aerosmith|\n",
"|       4|   Alanis Morissette|\n",
"|       5|     Alice In Chains|\n",
"|       6|Antônio Carlos Jobim|\n",
"|       7|        Apocalyptica|\n",
"|       8|          Audioslave|\n",
"|       9|            BackBeat|\n",
"|      10|        Billy Cobham|\n",
"+--------+--------------------+\n",
"\n"
]
}
],
"source": [
"df_artists.createOrReplaceGlobalTempView(\"global_artists\")\n",
"\n",
"df_global = spark2.sql(\"SELECT * FROM global_temp.global_artists LIMIT 10\")\n",
"df_global.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Start a new session. The view `global_artists` can be accessed across sessions"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"# newSession() gives a separate session; getOrCreate() would return the existing one\n",
"spark3 = spark2.newSession()"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+--------+--------------------+\n",
"|ArtistId|                Name|\n",
"+--------+--------------------+\n",
"|       1|               AC/DC|\n",
"|       2|              Accept|\n",
"|       3|           Aerosmith|\n",
"|       4|   Alanis Morissette|\n",
"|       5|     Alice In Chains|\n",
"|       6|Antônio Carlos Jobim|\n",
"|       7|        Apocalyptica|\n",
"|       8|          Audioslave|\n",
"|       9|            BackBeat|\n",
"|      10|        Billy Cobham|\n",
"+--------+--------------------+\n",
"\n"
]
}
],
"source": [
"df_global = spark3.sql(\"SELECT * FROM global_temp.global_artists LIMIT 10\")\n",
"df_global.show()"
]
}
],
"metadata": {