-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path10-prog.html
129 lines (123 loc) · 10.8 KB
/
10-prog.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: Databases and SQL</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<h1 class="title">Databases and SQL</h1>
<h2 class="subtitle">Programming with Databases</h2>
<div id="learning-objectives" class="objectives panel panel-warning">
<div class="panel-heading">
<h2 id="learning-objectives" class="objectives panel panel-warning"><span class="glyphicon glyphicon-certificate"></span>Learning Objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>Write short programs that execute SQL queries.</li>
<li>Trace the execution of a program that contains an SQL query.</li>
<li>Explain why most database applications are written in a general-purpose language rather than in SQL.</li>
</ul>
</div>
</div>
<p>To close, let’s have a look at how to access a database from a general-purpose programming language like Python. Other languages use almost exactly the same model: library and function names may differ, but the concepts are the same.</p>
<p>Here’s a short Python program that selects latitudes and longitudes from an SQLite database stored in a file called <code>survey.db</code>:</p>
<pre class="sourceCode python"><code class="sourceCode python"><span class="ch">import</span> sqlite3
connection = sqlite3.<span class="ot">connect</span>(<span class="st">"survey.db"</span>)
cursor = connection.cursor()
cursor.execute(<span class="st">"SELECT site.lat, site.long FROM site;"</span>)
results = cursor.fetchall()
<span class="kw">for</span> r in results:
<span class="dt">print</span> r
cursor.close()
connection.close()</code></pre>
<pre class="output"><code>(-49.85, -128.57)
(-47.15, -126.72)
(-48.87, -123.4)</code></pre>
<p>The program starts by importing the <code>sqlite3</code> library. If we were connecting to MySQL, DB2, or some other database, we would import a different library, but all of them provide the same functions, so that the rest of our program does not have to change (at least, not much) if we switch from one database to another.</p>
<p>Line 2 establishes a connection to the database. Since we’re using SQLite, all we need to specify is the name of the database file. Other systems may require us to provide a username and password as well. Line 3 then uses this connection to create a <a href="reference.html#cursor">cursor</a>. Just like the cursor in an editor, its role is to keep track of where we are in the database.</p>
<p>On line 4, we use that cursor to ask the database to execute a query for us. The query is written in SQL, and passed to <code>cursor.execute</code> as a string. It’s our job to make sure that SQL is properly formatted; if it isn’t, or if something goes wrong when it is being executed, the database will report an error.</p>
<p>The database returns the results of the query to us in response to the <code>cursor.fetchall</code> call on line 5. This result is a list with one entry for each record in the result set; if we loop over that list (line 6) and print those list entries (line 7), we can see that each one is a tuple with one element for each field we asked for.</p>
<p>Finally, lines 8 and 9 close our cursor and our connection, since the database can only keep a limited number of these open at one time. Since establishing a connection takes time, though, we shouldn’t open a connection, do one operation, then close the connection, only to reopen it a few microseconds later to do another operation. Instead, it’s normal to create one connection that stays open for the lifetime of the program.</p>
<p>Queries in real applications will often depend on values provided by users. For example, this function takes a user’s ID as a parameter and returns their name:</p>
<pre class="sourceCode python"><code class="sourceCode python"><span class="kw">def</span> get_name(database_file, person_ident):
query = <span class="st">"SELECT personal || '; '; || family FROM Person WHERE ident=';"</span> + person_ident + <span class="st">"';;"</span>
connection = sqlite3.<span class="ot">connect</span>(database_file)
cursor = connection.cursor()
cursor.execute(query)
results = cursor.fetchall()
cursor.close()
connection.close()
<span class="kw">return</span> results[<span class="dv">0</span>][<span class="dv">0</span>]
<span class="dt">print</span> <span class="st">"full name for dyer:"</span>, get_name(<span class="st">';survey.db'</span>;, <span class="st">';dyer'</span>;)</code></pre>
<pre class="output"><code>full name for dyer: William Dyer</code></pre>
<p>We use string concatenation on the first line of this function to construct a query containing the user ID we have been given. This seems simple enough, but what happens if someone gives us this string as input?</p>
<pre class="sourceCode sql"><code class="sourceCode sql">dyer<span class="st">'; DROP TABLE Survey; SELECT '</span></code></pre>
<p>It looks like there’s garbage after the name of the project, but it is very carefully chosen garbage. If we insert this string into our query, the result is:</p>
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">SELECT</span> personal || <span class="st">' '</span> || family <span class="kw">FROM</span> Person <span class="kw">WHERE</span> ident=<span class="st">'dyer'</span>; <span class="kw">DROP</span> <span class="kw">TABLE</span> Survey; <span class="kw">SELECT</span> <span class="st">''</span>;</code></pre>
<p>If we execute this, it will erase one of the tables in our database.</p>
<p>This is called an <a href="reference.html#sql-injection-attack">SQL injection attack</a>, and it has been used to attack thousands of programs over the years. In particular, many web sites that take data from users insert values directly into queries without checking them carefully first.</p>
<p>Since a villain might try to smuggle commands into our queries in many different ways, the safest way to deal with this threat is to replace characters like quotes with their escaped equivalents, so that we can safely put whatever the user gives us inside a string. We can do this by using a <a href="reference.html#prepared-statement">prepared statement</a> instead of formatting our statements as strings. Here’s what our example program looks like if we do this:</p>
<pre class="sourceCode python"><code class="sourceCode python"><span class="kw">def</span> get_name(database_file, person_ident):
query = <span class="st">"SELECT personal || '; '; || family FROM Person WHERE ident=?;"</span>
connection = sqlite3.<span class="ot">connect</span>(database_file)
cursor = connection.cursor()
cursor.execute(query, [person_ident])
results = cursor.fetchall()
cursor.close()
connection.close()
<span class="kw">return</span> results[<span class="dv">0</span>][<span class="dv">0</span>]
<span class="dt">print</span> <span class="st">"full name for dyer:"</span>, get_name(<span class="st">';survey.db'</span>;, <span class="st">';dyer'</span>;)</code></pre>
<pre class="output"><code>full name for dyer: William Dyer</code></pre>
<p>The key changes are in the query string and the <code>execute</code> call. Instead of formatting the query ourselves, we put question marks in the query template where we want to insert values. When we call <code>execute</code>, we provide a list that contains as many values as there are question marks in the query. The library matches values to question marks in order, and translates any special characters in the values into their escaped equivalents so that they are safe to use.</p>
<div id="filling-a-table-vs.printing-values" class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="filling-a-table-vs.printing-values" class="challenge panel panel-success"><span class="glyphicon glyphicon-pencil"></span>Filling a Table vs. Printing Values</h2>
</div>
<div class="panel-body">
<p>Write a Python program that creates a new database in a file called <code>original.db</code> containing a single table called <code>Pressure</code>, with a single field called <code>reading</code>, and inserts 100,000 random numbers between 10.0 and 25.0. How long does it take this program to run? How long does it take to run a program that simply writes those random numbers to a file?</p>
</div>
</div>
<div id="filtering-in-sql-vs.filtering-in-python" class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="filtering-in-sql-vs.filtering-in-python" class="challenge panel panel-success"><span class="glyphicon glyphicon-pencil"></span>Filtering in SQL vs. Filtering in Python</h2>
</div>
<div class="panel-body">
<p>Write a Python program that creates a new database called <code>backup.db</code> with the same structure as <code>original.db</code> and copies all the values greater than 20.0 from <code>original.db</code> to <code>backup.db</code>. Which is faster: filtering values in the query, or reading everything into memory and filtering in Python?</p>
</div>
</div>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/sql-novice-survey">Source</a>
<a class="label swc-blue-bg" href="mailto:[email protected]">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>