diff --git a/quarto/.gitignore b/quarto/.gitignore
index 097cf229..e5b5ac02 100644
--- a/quarto/.gitignore
+++ b/quarto/.gitignore
@@ -3,4 +3,5 @@
/_freeze/
/*/*_files/
/*/*.ipynb/
+/*/references.bib
weave_support.jl
\ No newline at end of file
diff --git a/quarto/ODEs/differential_equations.qmd b/quarto/ODEs/differential_equations.qmd
index 992dbb9f..082e3245 100644
--- a/quarto/ODEs/differential_equations.qmd
+++ b/quarto/ODEs/differential_equations.qmd
@@ -71,12 +71,13 @@ $$
The authors apply this model to flu statistics from Hong Kong where:

-
+$$
\begin{align*}
S(0) &= 7,900,000\\
I(0) &= 10\\
R(0) &= 0\\
\end{align*}
+$$

In `Julia` we define these, `N` to model the total population, and `u0` to be the proportions.
@@ -130,12 +131,13 @@ The plot shows steady decay, as there is no mixing of infected with others.
Adding in the interaction requires a bit more work. We now have what is known as a *system* of equations:

-
+$$
\begin{align*}
\frac{ds}{dt} &= -b \cdot s(t) \cdot i(t)\\
\frac{di}{dt} &= b \cdot s(t) \cdot i(t) - k \cdot i(t)\\
\frac{dr}{dt} &= k \cdot i(t)\\
\end{align*}
+$$

Systems of equations can be solved in a similar manner as a single ordinary differential equation, though adjustments are made to accommodate the multiple functions.
@@ -277,11 +279,12 @@ We now solve numerically the problem of a trajectory with a drag force from air
The general model is:

-
+$$
\begin{align*}
x''(t) &= - W(t,x(t), x'(t), y(t), y'(t)) \cdot x'(t)\\
y''(t) &= -g - W(t,x(t), x'(t), y(t), y'(t)) \cdot y'(t)\\
\end{align*}
+$$

with initial conditions: $x(0) = y(0) = 0$ and $x'(0) = v_0 \cos(\theta), y'(0) = v_0 \sin(\theta)$.
diff --git a/quarto/ODEs/euler.qmd b/quarto/ODEs/euler.qmd
index 92dc4a73..ceff1cbf 100644
--- a/quarto/ODEs/euler.qmd
+++ b/quarto/ODEs/euler.qmd
@@ -602,12 +602,13 @@ $$
We can try the Euler method here. A simple approach might be this iteration scheme:

-
+$$
\begin{align*}
x_{n+1} &= x_n + h,\\
u_{n+1} &= u_n + h v_n,\\
v_{n+1} &= v_n - h \cdot g/l \cdot \sin(u_n).
\end{align*}
+$$

Here we need *two* initial conditions: one for the initial value $u(t_0)$ and one for the initial value $u'(t_0)$. We have seen that if we start at an angle $a$ and release the bob from rest, so that $u'(0)=0$, we get a sinusoidal answer to the linearized model. What happens here? We let $a=1$, $l=5$ and $g=9.8$:
diff --git a/quarto/ODEs/odes.qmd b/quarto/ODEs/odes.qmd
index 03cbe85c..55e87225 100644
--- a/quarto/ODEs/odes.qmd
+++ b/quarto/ODEs/odes.qmd
@@ -86,12 +86,13 @@ $$
Again, we can integrate to get an answer for any value $t$:

-
+$$
\begin{align*}
x(t) - x(t_0) &= \int_{t_0}^t \frac{dx}{dt} dt \\
&= (v_0t + \frac{1}{2}a t^2 - at_0 t) |_{t_0}^t \\
&= (v_0 - at_0)(t - t_0) + \frac{1}{2} a (t^2 - t_0^2).
\end{align*}
+$$

There are three constants: the initial value for the independent variable, $t_0$, and the two initial values for the velocity and position, $v_0, x_0$. Assuming $t_0 = 0$, we can simplify the above to get a formula familiar from introductory physics:
@@ -336,11 +337,12 @@ Differential equations are classified according to their type. Different types h
The first-order initial value equations we have seen can be described generally by

-
+$$
\begin{align*}
y'(x) &= F(y,x),\\
y(x_0) &= y_0.
\end{align*}
+$$

Special cases include:
@@ -667,12 +669,13 @@ Though `y` is messy, it can be seen that the answer is a quadratic polynomial in
In a resistive medium, there are drag forces at play.
If this force is proportional to the velocity, say, with proportion $\gamma$, then the equations become: - +$$ \begin{align*} x''(t) &= -\gamma x'(t), & \quad y''(t) &= -\gamma y'(t) -g, \\ x(0) &= x_0, &\quad y(0) &= y_0,\\ x'(0) &= v_0\cos(\alpha),&\quad y'(0) &= v_0 \sin(\alpha). \end{align*} +$$ We now attempt to solve these. diff --git a/quarto/_make_pdf.jl b/quarto/_make_pdf.jl index f0defcc2..6dfc3e3c 100644 --- a/quarto/_make_pdf.jl +++ b/quarto/_make_pdf.jl @@ -17,11 +17,13 @@ execute: format: typst: toc: false - section-numbering: "1." + section-numbering: "1.1.1" + number-depth: 3 keep-typ: false include-before-body: - text: | #set figure(placement: auto) +bibliography: references.bib --- """ diff --git a/quarto/_quarto.yml b/quarto/_quarto.yml index d44c3cdb..357c5a21 100644 --- a/quarto/_quarto.yml +++ b/quarto/_quarto.yml @@ -1,4 +1,4 @@ -version: "0.21" +version: "0.22" engine: julia project: diff --git a/quarto/alternatives/SciML.qmd b/quarto/alternatives/SciML.qmd index 124299ae..80925955 100644 --- a/quarto/alternatives/SciML.qmd +++ b/quarto/alternatives/SciML.qmd @@ -573,13 +573,14 @@ As well, suppose we wanted to parameterize our function and then differentiate. Consider $d/dp \int_0^\pi \sin(px) dx$. We can do this integral directly to get - +$$ \begin{align*} \frac{d}{dp} \int_0^\pi \sin(px)dx &= \frac{d}{dp}\left( \frac{-1}{p} \cos(px)\Big\rvert_0^\pi\right)\\ &= \frac{d}{dp}\left( -\frac{\cos(p\cdot\pi)-1}{p}\right)\\ &= \frac{\cos(p\cdot \pi) - 1}{p^2} + \frac{\pi\cdot\sin(p\cdot\pi)}{p} \end{align*} +$$ Using `Integrals` with `QuadGK` we have: diff --git a/quarto/derivatives/condition.qmd b/quarto/derivatives/condition.qmd index 98c94aed..b7ff1b8a 100644 --- a/quarto/derivatives/condition.qmd +++ b/quarto/derivatives/condition.qmd @@ -2,73 +2,74 @@ In Part III of @doi:10.1137/1.9781611977165 we find language of numerical analysis useful to formally describe the zero-finding problem. Key concepts are errors, conditioning, and stability. -Abstractly a *problem* is a mapping, or function, from a domain $X$ of data to a range $Y$ of solutions. Both $X$ and $Y$ have a sense of distance given by a *norm*. A norm is a generalization of the absolute value. +Abstractly a *problem* is a mapping, $F$, from a domain $X$ of data to a range $Y$ of solutions. Both $X$ and $Y$ have a sense of distance given by a *norm*. A norm is a generalization of the absolute value and gives quantitative meaning to terms like small and large. -> A *well-conditioned* problem is one with the property that all small perturbations of $x$ lead to only small changes in $f(x)$. +> A *well-conditioned* problem is one with the property that all small perturbations of $x$ lead to only small changes in $F(x)$. -This sense of "small" is quantified through a *condition number*. +This sense of "small" is measured through a *condition number*. -If we let $\delta_x$ be a small perturbation of $x$ then $\delta_f = f(x + \delta_x) - f(x)$. +If we let $\delta_x$ be a small perturbation of $x$ then $\delta_F = F(x + \delta_x) - F(x)$. -The *forward error* is $\lVert\delta_f\rVert = \lVert f(x+\delta_x) - f(x)\rVert$, the *relative forward error* is $\lVert\delta_f\rVert/\lVert f\rVert = \lVert f(x+\delta_x) - f(x)\rVert/ \lVert f(x)\rVert$. +The *forward error* is $\lVert\delta_F\rVert = \lVert F(x+\delta_x) - F(x)\rVert$, the *relative forward error* is $\lVert\delta_F\rVert/\lVert F\rVert = \lVert F(x+\delta_x) - F(x)\rVert/ \lVert F(x)\rVert$. 
The *backward error* is $\lVert\delta_x\rVert$, the *relative backward error* is $\lVert\delta_x\rVert / \lVert x\rVert$.

- The *absolute condition number* $\hat{\kappa}$ is worst case of this ratio $\lVert\delta_f\rVert/ \lVert\delta_x\rVert$ as the perturbation size shrinks to $0$.
-The relative condition number $\kappa$ divides $\lVert\delta_f\rVert$ by $\lVert f(x)\rVert$ and $\lVert\delta_x\rVert$ by $\lVert x\rVert$ before taking the ratio.
+ The *absolute condition number* $\hat{\kappa}$ is the worst case of this ratio $\lVert\delta_F\rVert/ \lVert\delta_x\rVert$ as the perturbation size shrinks to $0$.
+The relative condition number $\kappa$ divides $\lVert\delta_F\rVert$ by $\lVert F(x)\rVert$ and $\lVert\delta_x\rVert$ by $\lVert x\rVert$ before taking the ratio.

-A *problem* is a mathematical concept, an *algorithm* the computational version. Algorithms may differ for many reasons, such as floating point errors, tolerances, etc. We use notation $\tilde{f}$ to indicate the algorithm.
+A *problem* is a mathematical concept, an *algorithm* the computational version. Algorithms may differ for many reasons, such as floating point errors, tolerances, etc. We use notation $\tilde{F}$ to indicate the algorithm.

-The absolute error in the algorithm is $\lVert\tilde{f}(x) - f(x)\rVert$, the relative error divides by $\lVert f(x)\rVert$. A good algorithm would have smaller relative errors.
+The absolute error in the algorithm is $\lVert\tilde{F}(x) - F(x)\rVert$, the relative error divides by $\lVert F(x)\rVert$. A good algorithm would have small relative errors.

An algorithm is called *stable* if

$$
-\frac{\lVert\tilde{f}(x) - f(\tilde{x})\rVert}{\lVert f(\tilde{x})\rVert}
+\frac{\lVert\tilde{F}(x) - F(\tilde{x})\rVert}{\lVert F(\tilde{x})\rVert}
$$

is *small* for *some* $\tilde{x}$ relatively near $x$ (that is, with $\lVert\tilde{x}-x\rVert/\lVert x\rVert$ small).

-> "A *stable* algorithm gives nearly the right answer to nearly the right question."
+> A *stable* algorithm gives nearly the right answer to nearly the right question.

-(The answer it gives is $\tilde{f}(x)$, the nearly right question is $f(\tilde{x})$.)
+(The answer it gives is $\tilde{F}(x)$, the nearly right question: what is $F(\tilde{x})$?)

-A related concept is an algorithm $\tilde{f}$ for a problem $f$ is *backward stable* if for each $x \in X$,
+A related concept: an algorithm $\tilde{F}$ for a problem $F$ is *backward stable* if for each $x \in X$,

$$
-\tilde{f}(x) = f(\tilde{x})
+\tilde{F}(x) = F(\tilde{x})
$$

- for some $\tilde{x}$ with $\lVert\tilde{x} - x\rVert/\lVert x\rVert$ is small.
+for some $\tilde{x}$ with $\lVert\tilde{x} - x\rVert/\lVert x\rVert$ small.

> "A backward stable algorithm gives exactly the right answer to nearly the right question."

-The concepts are related by Trefethen and Bao's Theorem 15.1 which says for a backward stable algorithm the relative error $\lVert\tilde{f}(x) - f(x)\rVert/\lVert f(x)\rVert$ is small in a manner proportional to the relative condition number.
+The concepts are related by Trefethen and Bau's Theorem 15.1, which says that for a backward stable algorithm the relative error $\lVert\tilde{F}(x) - F(x)\rVert/\lVert F(x)\rVert$ is small in a manner proportional to the relative condition number.

Applying this to zero-finding, we follow @doi:10.1137/1.9781611975086.

-To be specific, the problem is finding a zero of $f$ starting at an initial point $x_0$. The data is $(f, x_0)$, the solution is $r$ a zero of $f$.
+To be specific, the problem, $F$, is finding a zero of a function $f$ starting at an initial point $x_0$. The data is $(f, x_0)$, the solution is $r$, a zero of $f$.

Take the algorithm as Newton's method. Any implementation must incorporate tolerances, so this is a computational approximation to the problem. The data is the same, but technically we use $\tilde{f}$ for the function, as any computation is dependent on machine implementations. The output is $\tilde{r}$, an *approximate* zero.

-Suppose for sake of argument that $\tilde{f}(x) = f(x) + \epsilon$ and $r$ is a root of $f$ and $\tilde{r}$ is a root of $\tilde{f}$. Then by linearization:
+Suppose, for the sake of argument, that $\tilde{f}(x) = f(x) + \epsilon$, $f$ has a continuous derivative, $r$ is a root of $f$, and $\tilde{r}$ is a root of $\tilde{f}$. Then by linearization:

+$$
\begin{align*}
0 &= \tilde{f}(\tilde r) \\
  &= f(r + \delta) + \epsilon\\
  &\approx f(r) + f'(r)\delta + \epsilon\\
  &= 0 + f'(r)\delta + \epsilon
\end{align*}
-
-Rearranging gives $\lVert\delta/\epsilon\rVert \approx 1/\lVert f'(r)\rVert$ leading to:
+$$
+Rearranging gives $\lVert\delta/\epsilon\rVert \approx 1/\lVert f'(r)\rVert$. But the $|\delta|/|\epsilon|$ ratio is related to the condition number:

> The absolute condition number is $\hat{\kappa}_r = |f'(r)|^{-1}$.

-The error formula in Newton's method includes the derivative in the denominator, so we see large condition numbers are tied into larger errors.
+The error formula for Newton's method, which measures the distance between the actual root and an approximation, includes the derivative in the denominator, so we see that large condition numbers are tied to possibly larger errors.

Now consider $g(x) = f(x) - f(\tilde{r})$. Call $f(\tilde{r})$ the residual. We have $g$ is near $f$ if the residual is small. The algorithm will solve $(g, x_0)$ with $\tilde{r}$, so with a small residual an exact solution to an approximate question will be found. Driscoll and Braun state
@@ -83,4 +84,5 @@ Practically these two observations lead to

For the first observation, the example of Wilkinson's polynomial is often used where $f(x) = (x-1)\cdot(x-2)\cdot \cdots\cdot(x-20)$. When expanded, this function cannot be represented exactly in typical floating point values, the condition number is large, and some of the roots found are quite different from the mathematical values.

-The second observation helps explain why a problem like finding the zero of $f(x) = x \cdot \exp(x)$ using Newton's method starting at $2$ might return a value like $5.89\dots$. The residual is checked to be zero in a *relative* manner which would basically use a tolerance of `atol + abs(xn)*rtol`. Functions with asymptotes of $0$ will eventually be smaller than this value.
+
+The second observation follows as $f(x_n)$ monitors the backward error, and the product of the condition number and the backward error monitors the forward error. This product is on the order of $|f(x_n)/f'(x_n)|$ or $|x_{n+1} - x_n|$.
diff --git a/quarto/derivatives/derivatives.qmd b/quarto/derivatives/derivatives.qmd
index d60fbc1d..1ae16472 100644
--- a/quarto/derivatives/derivatives.qmd
+++ b/quarto/derivatives/derivatives.qmd
@@ -22,6 +22,8 @@ nothing
---

+![Device to measure units of distance by units of time](figures/galileo-ramp.png){width=60%}
+
Before defining the derivative of a function, let's begin with two motivating examples.
@@ -520,13 +522,14 @@ This holds two rules: the derivative of a constant times a function is the const This example shows a useful template: - +$$ \begin{align*} [2x^2 - \frac{x}{3} + 3e^x]' & = 2[\square]' - \frac{[\square]'}{3} + 3[\square]'\\ &= 2[x^2]' - \frac{[x]'}{3} + 3[e^x]'\\ &= 2(2x) - \frac{1}{3} + 3e^x\\ &= 4x - \frac{1}{3} + 3e^x \end{align*} +$$ ### Product rule @@ -548,12 +551,13 @@ The output uses the Leibniz notation to represent that the derivative of $u(x) \ This example shows a useful template for the product rule: - +$$ \begin{align*} [(x^2+1)\cdot e^x]' &= [\square]' \cdot (\square) + (\square) \cdot [\square]'\\ &= [x^2 + 1]' \cdot (e^x) + (x^2+1) \cdot [e^x]'\\ &= (2x)\cdot e^x + (x^2+1)\cdot e^x \end{align*} +$$ ### Quotient rule @@ -572,12 +576,13 @@ limit((f(x+h) - f(x))/h, h => 0) This example shows a useful template for the quotient rule: - +$$ \begin{align*} [\frac{x^2+1}{e^x}]' &= \frac{[\square]' \cdot (\square) - (\square) \cdot [\square]'}{(\square)^2}\\ &= \frac{[x^2 + 1]' \cdot (e^x) - (x^2+1) \cdot [e^x]'}{(e^x)^2}\\ &= \frac{(2x)\cdot e^x - (x^2+1)\cdot e^x}{e^{2x}} \end{align*} +$$ ##### Examples @@ -672,19 +677,21 @@ There are $n$ terms, each where one of the $f_i$s have a derivative. Were we to With this, we can proceed. Each term $x-i$ has derivative $1$, so the answer to $f'(x)$, with $f$ as above, is - +$$ \begin{align*} f'(x) &= f(x)/(x-1) + f(x)/(x-2) + f(x)/(x-3)\\ &+ f(x)/(x-4) + f(x)/(x-5), \end{align*} +$$ That is - +$$ \begin{align*} f'(x) &= (x-2)(x-3)(x-4)(x-5) + (x-1)(x-3)(x-4)(x-5)\\ &+ (x-1)(x-2)(x-4)(x-5) + (x-1)(x-2)(x-3)(x-5) \\ &+ (x-1)(x-2)(x-3)(x-4). \end{align*} +$$ --- @@ -749,17 +756,18 @@ Combined, we would end up with: To see that this works in our specific case, we assume the general power rule that $[x^n]' = n x^{n-1}$ to get: - +$$ \begin{align*} f(x) &= x^2 & g(x) &= \sqrt{x}\\ f'(\square) &= 2(\square) & g'(x) &= \frac{1}{2}x^{-1/2} \end{align*} +$$ We use $\square$ for the argument of `f'` to emphasize that $g(x)$ is the needed value, not just $x$: - +$$ \begin{align*} [(\sqrt{x})^2]' &= [f(g(x)]'\\ &= f'(g(x)) \cdot g'(x) \\ @@ -767,6 +775,7 @@ We use $\square$ for the argument of `f'` to emphasize that $g(x)$ is the needed &= \frac{2\sqrt{x}}{2\sqrt{x}}\\ &=1 \end{align*} +$$ This is the same as the derivative of $x$ found by first evaluating the composition. For this problem, the chain rule is not necessary, but typically it is a needed rule to fully differentiate a function. @@ -778,11 +787,12 @@ This is the same as the derivative of $x$ found by first evaluating the composit Find the derivative of $f(x) = \sqrt{1 - x^2}$. We identify the composition of $\sqrt{x}$ and $(1-x^2)$. We set the functions and their derivatives into a pattern to emphasize the pieces in the chain-rule formula: - +$$ \begin{align*} f(x) &=\sqrt{x} = x^{1/2} & g(x) &= 1 - x^2 \\ f'(\square) &=(1/2)(\square)^{-1/2} & g'(x) &= -2x \end{align*} +$$ Then: @@ -823,11 +833,12 @@ This is a useful rule to remember for expressions involving exponentials. Find the derivative of $\sin(x)\cos(2x)$ at $x=\pi$. - +$$ \begin{align*} [\sin(x)\cos(2x)]'\big|_{x=\pi} &=(\cos(x)\cos(2x) + \sin(x)(-\sin(2x)\cdot 2))\big|_{x=\pi} \\ & =((-1)(1) + (0)(-0)(2)) = -1. 
\end{align*} +$$ ##### Proof of the Chain Rule @@ -844,23 +855,25 @@ g(a+h) = g(a) + g'(a)h + \epsilon_g(h) h = g(a) + h', $$ Where $h' = (g'(a) + \epsilon_g(h))h \rightarrow 0$ as $h \rightarrow 0$ will be used to simplify the following: - +$$ \begin{align*} f(g(a+h)) - f(g(a)) &= f(g(a) + g'(a)h + \epsilon_g(h)h) - f(g(a)) \\ &= f(g(a)) + f'(g(a)) (g'(a)h + \epsilon_g(h)h) + \epsilon_f(h')(h') - f(g(a))\\ &= f'(g(a)) g'(a)h + f'(g(a))(\epsilon_g(h)h) + \epsilon_f(h')(h'). \end{align*} +$$ Rearranging: - +$$ \begin{align*} f(g(a+h)) &- f(g(a)) - f'(g(a)) g'(a) h\\ &= f'(g(a))\epsilon_g(h)h + \epsilon_f(h')(h')\\ &=(f'(g(a)) \epsilon_g(h) + \epsilon_f(h') (g'(a) + \epsilon_g(h)))h \\ &=\epsilon(h)h, \end{align*} +$$ where $\epsilon(h)$ combines the above terms which go to zero as $h\rightarrow 0$ into one. This is the alternative definition of the derivative, showing $(f\circ g)'(a) = f'(g(a)) g'(a)$ when $g$ is differentiable at $a$ and $f$ is differentiable at $g(a)$. @@ -871,17 +884,18 @@ where $\epsilon(h)$ combines the above terms which go to zero as $h\rightarrow 0 The chain rule name could also be simply the "composition rule," as that is the operation the rule works for. However, in practice, there are usually *multiple* compositions, and the "chain" rule is used to chain together the different pieces. To get a sense, consider a triple composition $u(v(w(x)))$. This will have derivative: - +$$ \begin{align*} [u(v(w(x)))]' &= u'(v(w(x))) \cdot [v(w(x))]' \\ &= u'(v(w(x))) \cdot v'(w(x)) \cdot w'(x) \end{align*} +$$ The answer can be viewed as a repeated peeling off of the outer function, a view with immediate application to many compositions. To see that in action with an expression, consider this derivative problem, shown in steps: - +$$ \begin{align*} [\sin(e^{\cos(x^2-x)})]' &= \cos(e^{\cos(x^2-x)}) \cdot [e^{\cos(x^2-x)}]'\\ @@ -889,6 +903,7 @@ The answer can be viewed as a repeated peeling off of the outer function, a view &= \cos(e^{\cos(x^2-x)}) \cdot e^{\cos(x^2-x)} \cdot (-\sin(x^2-x)) \cdot [x^2-x]'\\ &= \cos(e^{\cos(x^2-x)}) \cdot e^{\cos(x^2-x)} \cdot (-\sin(x^2-x)) \cdot (2x-1)\\ \end{align*} +$$ ##### More examples of differentiation @@ -1004,7 +1019,7 @@ Find the derivative of $f(x) = x \cdot e^{-x^2}$. Using the product rule and then the chain rule, we have: - +$$ \begin{align*} f'(x) &= [x \cdot e^{-x^2}]'\\ &= [x]' \cdot e^{-x^2} + x \cdot [e^{-x^2}]'\\ @@ -1012,6 +1027,7 @@ f'(x) &= [x \cdot e^{-x^2}]'\\ &= e^{-x^2} + x \cdot e^{-x^2} \cdot (-2x)\\ &= e^{-x^2} (1 - 2x^2). \end{align*} +$$ --- @@ -1022,7 +1038,7 @@ Find the derivative of $f(x) = e^{-ax} \cdot \sin(x)$. Using the product rule and then the chain rule, we have: - +$$ \begin{align*} f'(x) &= [e^{-ax} \cdot \sin(x)]'\\ &= [e^{-ax}]' \cdot \sin(x) + e^{-ax} \cdot [\sin(x)]'\\ @@ -1030,6 +1046,7 @@ f'(x) &= [e^{-ax} \cdot \sin(x)]'\\ &= e^{-ax} \cdot (-a) \cdot \sin(x) + e^{-ax} \cos(x)\\ &= e^{-ax}(\cos(x) - a\sin(x)). \end{align*} +$$ --- @@ -1164,13 +1181,14 @@ Find the first $3$ derivatives of $f(x) = ax^3 + bx^2 + cx + d$. Differentiating a polynomial is done with the sum rule, here we repeat three times: - +$$ \begin{align*} f(x) &= ax^3 + bx^2 + cx + d\\ f'(x) &= 3ax^2 + 2bx + c \\ f''(x) &= 3\cdot 2 a x + 2b \\ f'''(x) &= 6a \end{align*} +$$ We can see, the fourth derivative – and all higher order ones – would be identically $0$. This is part of a general phenomenon: an $n$th degree polynomial has only $n$ non-zero derivatives. 
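The vanishing of the higher-order derivatives is easy to confirm symbolically. A quick sketch, assuming the `SymPy` package used throughout these notes, with `a`, `b`, `c`, `d` as symbolic coefficients of a generic cubic:

```{julia}
using SymPy
@syms a b c d x
fx  = a*x^3 + b*x^2 + c*x + d
fx1 = diff(fx, x)     # 3a*x^2 + 2b*x + c
fx2 = diff(fx1, x)    # 6a*x + 2b
fx3 = diff(fx2, x)    # 6a
fx4 = diff(fx3, x)    # 0; every higher-order derivative is also 0
(fx1, fx2, fx3, fx4)
```

Each call to `diff` lowers the degree by one, so after $n+1$ differentiations of a degree-$n$ polynomial nothing is left.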
@@ -1181,7 +1199,7 @@ We can see, the fourth derivative – and all higher order ones – would be ide Find the first $5$ derivatives of $\sin(x)$. - +$$ \begin{align*} f(x) &= \sin(x) \\ f'(x) &= \cos(x) \\ @@ -1190,6 +1208,7 @@ f'''(x) &= -\cos(x) \\ f^{(4)} &= \sin(x) \\ f^{(5)} &= \cos(x) \end{align*} +$$ We see the derivatives repeat themselves. (We also see alternative notation for higher order derivatives.) @@ -1616,13 +1635,14 @@ The right graph is of $g(x) = \exp(x)$ at $x=1$, the left graph of $f(x) = \sin( Assuming the approximation gets better for $h$ close to $0$, as it visually does, the derivative at $1$ for $f(g(x))$ should be given by this limit: - +$$ \begin{align*} \frac{d(f\circ g)}{dx}\mid_{x=1} &= \lim_{h\rightarrow 0} \frac{f(g(1) + g'(1)h)-f(g(1))}{h}\\ &= \lim_{h\rightarrow 0} \frac{f(g(1) + g'(1)h)-f(g(1))}{g'(1)h} \cdot g'(1)\\ &= \lim_{h\rightarrow 0} (f\circ g)'(g(1)) \cdot g'(1). \end{align*} +$$ What limit law, described below assuming all limits exist. allows the last equals sign? diff --git a/quarto/derivatives/figures/galileo-ramp.png b/quarto/derivatives/figures/galileo-ramp.png new file mode 100644 index 00000000..d32f4f05 Binary files /dev/null and b/quarto/derivatives/figures/galileo-ramp.png differ diff --git a/quarto/derivatives/first_second_derivatives.qmd b/quarto/derivatives/first_second_derivatives.qmd index 9baad5a8..c6ca4b2f 100644 --- a/quarto/derivatives/first_second_derivatives.qmd +++ b/quarto/derivatives/first_second_derivatives.qmd @@ -36,16 +36,16 @@ Of course, we define *negative* in a parallel manner. The intermediate value th Next, +::: {.callout-note icon=false} +## Strictly increasing -> A function, $f$, is (strictly) **increasing** on an interval $I$ if for any $a < b$ it must be that $f(a) < f(b)$. - - +A function, $f$, is (strictly) **increasing** on an interval $I$ if for any $a < b$ it must be that $f(a) < f(b)$. The word strictly is related to the inclusion of the $<$ precluding the possibility of a function being flat over an interval that the $\leq$ inequality would allow. A parallel definition with $a < b$ implying $f(a) > f(b)$ would be used for a *strictly decreasing* function. - +::: We can try and prove these properties for a function algebraically – we'll see both are related to the zeros of some function. However, before proceeding to that it is usually helpful to get an idea of where the answer is using exploratory graphs. @@ -160,13 +160,17 @@ This leaves the question: This question can be answered by considering the first derivative. -> *The first derivative test*: If $c$ is a critical point for $f(x)$ and *if* $f'(x)$ changes sign at $x=c$, then $f(c)$ will be either a relative maximum or a relative minimum. -> -> * $f$ will have a relative maximum at $c$ if the derivative changes sign from $+$ to $-$. -> * $f$ will have a relative minimum at $c$ if the derivative changes sign from $-$ to $+$. -> -> Further, If $f'(x)$ does *not* change sign at $c$, then $f$ will *not* have a relative maximum or minimum at $c$. +::: {.callout-note icon=false} +## The first derivative test + +If $c$ is a critical point for $f(x)$ and *if* $f'(x)$ changes sign at $x=c$, then $f(c)$ will be either a relative maximum or a relative minimum. + * $f$ will have a relative maximum at $c$ if the derivative changes sign from $+$ to $-$. + * $f$ will have a relative minimum at $c$ if the derivative changes sign from $-$ to $+$. + + Further, If $f'(x)$ does *not* change sign at $c$, then $f$ will *not* have a relative maximum or minimum at $c$. 
+ +::: The classification part, should be clear: e.g., if the derivative is positive then negative, the function $f$ will increase to $(c,f(c))$ then decrease from $(c,f(c))$ – so $f$ will have a local maximum at $c$. @@ -424,12 +428,15 @@ The graph attempts to illustrate that for this function the secant line between This is a special property not shared by all functions. Let $I$ be an open interval. +::: {.callout-note icon=false} +## Concave up -> **Concave up**: A function $f(x)$ is concave up on $I$ if for any $a < b$ in $I$, the secant line between $a$ and $b$ lies above the graph of $f(x)$ over $[a,b]$. - +A function $f(x)$ is concave up on $I$ if for any $a < b$ in $I$, the secant line between $a$ and $b$ lies above the graph of $f(x)$ over $[a,b]$. +A similar definition exists for *concave down* where the secant lines lie below the graph. +::: -A similar definition exists for *concave down* where the secant lines lie below the graph. Notationally, concave up says for any $x$ in $[a,b]$: +Notationally, concave up says for any $x$ in $[a,b]$: $$ @@ -491,14 +498,16 @@ sign_chart(h'', 0, 8) Concave up functions are "opening" up, and often clearly $U$-shaped, though that is not necessary. At a relative minimum, where there is a $U$-shape, the graph will be concave up; conversely at a relative maximum, where the graph has a downward $\cap$-shape, the function will be concave down. This observation becomes: +::: {.callout-note icon=false} +## The second derivative test -> The **second derivative test**: If $c$ is a critical point of $f(x)$ with $f''(c)$ existing in a neighborhood of $c$, then -> -> * $f$ will have a relative maximum at the critical point $c$ if $f''(c) > 0$, -> * $f$ will have a relative minimum at the critical point $c$ if $f''(c) < 0$, and -> * *if* $f''(c) = 0$ the test is *inconclusive*. +If $c$ is a critical point of $f(x)$ with $f''(c)$ existing in a neighborhood of $c$, then + * $f$ will have a relative maximum at the critical point $c$ if $f''(c) > 0$, + * $f$ will have a relative minimum at the critical point $c$ if $f''(c) < 0$, and + * *if* $f''(c) = 0$ the test is *inconclusive*. +::: If $f''(c)$ is positive in an interval about $c$, then $f''(c) > 0$ implies the function is concave up at $x=c$. In turn, concave up implies the derivative is increasing so must go from negative to positive at the critical point. diff --git a/quarto/derivatives/lhospitals_rule.qmd b/quarto/derivatives/lhospitals_rule.qmd index 829a62b8..05d0fb66 100644 --- a/quarto/derivatives/lhospitals_rule.qmd +++ b/quarto/derivatives/lhospitals_rule.qmd @@ -52,15 +52,18 @@ Wouldn't that be nice? We could find difficult limits just by differentiating th Well, in fact that is more or less true, a fact that dates back to [L'Hospital](http://en.wikipedia.org/wiki/L%27H%C3%B4pital%27s_rule) - who wrote the first textbook on differential calculus - though this result is likely due to one of the Bernoulli brothers. +::: {.callout-note icon=false} +## L'Hospital's rule -> *L'Hospital's rule*: Suppose: -> -> * that $\lim_{x\rightarrow c+} f(c) =0$ and $\lim_{x\rightarrow c+} g(c) =0$, -> * that $f$ and $g$ are differentiable in $(c,b)$, and -> * that $g(x)$ exists and is non-zero for *all* $x$ in $(c,b)$, -> -> then **if** the following limit exists: $\lim_{x\rightarrow c+}f'(x)/g'(x)=L$ it follows that $\lim_{x \rightarrow c+}f(x)/g(x) = L$. 
+Suppose:
+ * that $\lim_{x\rightarrow c+} f(x) =0$ and $\lim_{x\rightarrow c+} g(x) =0$,
+ * that $f$ and $g$ are differentiable in $(c,b)$, and
+ * that $g'(x)$ exists and is non-zero for *all* $x$ in $(c,b)$,
+
+then **if** the following limit exists, $\lim_{x\rightarrow c+}f'(x)/g'(x)=L$, it follows that $\lim_{x \rightarrow c+}f(x)/g(x) = L$.
+
+:::

That is, *if* the right limit of $f(x)/g(x)$ is indeterminate of the form $0/0$, but the right limit of $f'(x)/g'(x)$ is known, possibly by simple continuity, then the right limit of $f(x)/g(x)$ exists and is equal to that of $f'(x)/g'(x)$.
@@ -308,23 +311,25 @@ L'Hospital's rule generalizes to other indeterminate forms, in particular the in
The value $c$ in the limit can also be infinite. Consider this case with $c=\infty$:

-
+$$
\begin{align*}
\lim_{x \rightarrow \infty} \frac{f(x)}{g(x)}
&=
\lim_{x \rightarrow 0} \frac{f(1/x)}{g(1/x)}
\end{align*}
+$$

L'Hospital's limit applies as $x \rightarrow 0$, so we differentiate to get:

-
+$$
\begin{align*}
\lim_{x \rightarrow 0} \frac{[f(1/x)]'}{[g(1/x)]'}
&=
\lim_{x \rightarrow 0} \frac{f'(1/x)\cdot(-1/x^2)}{g'(1/x)\cdot(-1/x^2)}\\
&=
\lim_{x \rightarrow 0} \frac{f'(1/x)}{g'(1/x)}\\
&= \lim_{x \rightarrow \infty} \frac{f'(x)}{g'(x)},
\end{align*}
+$$

*assuming* the latter limit exists, L'Hospital's rule assures the equality
@@ -415,11 +420,12 @@ Be just saw that $\lim_{x \rightarrow 0+}\log(x)/(1/x) = 0$. So by the rules for
A limit $\lim_{x \rightarrow c} f(x) - g(x)$ of indeterminate form $\infty - \infty$ can be reexpressed to be of the form $0/0$ through the transformation:

-
+$$
\begin{align*}
f(x) - g(x) &= f(x)g(x) \cdot (\frac{1}{g(x)} - \frac{1}{f(x)}) \\
&= \frac{\frac{1}{g(x)} - \frac{1}{f(x)}}{\frac{1}{f(x)g(x)}}.
\end{align*}
+$$

Applying this to
@@ -438,7 +444,7 @@ $$
\lim_{x \rightarrow 1} \frac{x\log(x)-(x-1)}{(x-1)\log(x)}
$$

-In `SymPy` we have (using italic variable names to avoid a problem with the earlier use of `f` as a function):
+In `SymPy` we have:

```{julia}
diff --git a/quarto/derivatives/linearization.qmd b/quarto/derivatives/linearization.qmd
index ef877cbe..a02c0837 100644
--- a/quarto/derivatives/linearization.qmd
+++ b/quarto/derivatives/linearization.qmd
@@ -510,21 +510,23 @@ $$
Suppose $f(x)$ and $g(x)$ are represented by their tangent lines about $c$, respectively:

-
+$$
\begin{align*}
f(x) &= f(c) + f'(c)(x-c) + \mathcal{O}((x-c)^2), \\
g(x) &= g(c) + g'(c)(x-c) + \mathcal{O}((x-c)^2).
\end{align*}
+$$

Consider the sum; after rearranging we have:

-
+$$
\begin{align*}
f(x) + g(x)
&= \left(f(c) + f'(c)(x-c) + \mathcal{O}((x-c)^2)\right) + \left(g(c) + g'(c)(x-c) + \mathcal{O}((x-c)^2)\right)\\
&= \left(f(c) + g(c)\right) + \left(f'(c)+g'(c)\right)(x-c) + \mathcal{O}((x-c)^2).
\end{align*}
+$$

The two big "Oh" terms become just one as the sum of a constant times $(x-c)^2$ plus a constant times $(x-c)^2$ is just some other constant times $(x-c)^2$. What we can read off from this is the term multiplying $(x-c)$ is just the derivative of $f(x) + g(x)$ (from the sum rule), so this too is a tangent line approximation.

Is it a coincidence that a basic algebraic operation with tangent line approximations produces a tangent line approximation?
Let's try multiplication: - +$$ \begin{align*} f(x) \cdot g(x) &= [f(c) + f'(c)(x-c) + \mathcal{O}((x-c)^2)] \cdot [g(c) + g'(c)(x-c) + \mathcal{O}((x-c)^2)]\\ &=[f(c) + f'(c)(x-c)] \cdot [g(c) + g'(c)(x-c)] + (f(c) + f'(c)(x-c)) \cdot \mathcal{O}((x-c)^2) + (g(c) + g'(c)(x-c)) \cdot \mathcal{O}((x-c)^2) + [\mathcal{O}((x-c)^2)]^2\\ @@ -541,6 +543,7 @@ f(x) \cdot g(x) &= [f(c) + f'(c)(x-c) + \mathcal{O}((x-c)^2)] \cdot [g(c) + g'( &= f(c) \cdot g(c) + [f'(c)\cdot g(c) + f(c)\cdot g'(c)] \cdot (x-c) + [f'(c)\cdot g'(c) \cdot (x-c)^2 + \mathcal{O}((x-c)^2)] \\ &= f(c) \cdot g(c) + [f'(c)\cdot g(c) + f(c)\cdot g'(c)] \cdot (x-c) + \mathcal{O}((x-c)^2) \end{align*} +$$ The big "oh" notation just sweeps up many things including any products of it *and* the term $f'(c)\cdot g'(c) \cdot (x-c)^2$. Again, we see from the product rule that this is just a tangent line approximation for $f(x) \cdot g(x)$. @@ -803,13 +806,14 @@ numericq(abs(answ)) The [Birthday problem](https://en.wikipedia.org/wiki/Birthday_problem) computes the probability that in a group of $n$ people, under some assumptions, that no two share a birthday. Without trying to spoil the problem, we focus on the calculus specific part of the problem below: - +$$ \begin{align*} p &= \frac{365 \cdot 364 \cdot \cdots (365-n+1)}{365^n} \\ &= \frac{365(1 - 0/365) \cdot 365(1 - 1/365) \cdot 365(1-2/365) \cdot \cdots \cdot 365(1-(n-1)/365)}{365^n}\\ &= (1 - \frac{0}{365})\cdot(1 -\frac{1}{365})\cdot \cdots \cdot (1-\frac{n-1}{365}). \end{align*} +$$ Taking logarithms, we have $\log(p)$ is diff --git a/quarto/derivatives/mean_value_theorem.qmd b/quarto/derivatives/mean_value_theorem.qmd index 64222c9d..1968497c 100644 --- a/quarto/derivatives/mean_value_theorem.qmd +++ b/quarto/derivatives/mean_value_theorem.qmd @@ -92,9 +92,12 @@ Lest you think that continuous functions always have derivatives except perhaps We have defined an *absolute maximum* of $f(x)$ over an interval to be a value $f(c)$ for a point $c$ in the interval that is as large as any other value in the interval. Just specifying a function and an interval does not guarantee an absolute maximum, but specifying a *continuous* function and a *closed* interval does, by the extreme value theorem. -> *A relative maximum*: We say $f(x)$ has a *relative maximum* at $c$ if there exists *some* interval $I=(a,b)$ with $a < c < b$ for which $f(c)$ is an absolute maximum for $f$ and $I$. +::: {.callout-note icon=false} +## A relative maximum +We say $f(x)$ has a *relative maximum* at $c$ if there exists *some* interval $I=(a,b)$ with $a < c < b$ for which $f(c)$ is an absolute maximum for $f$ and $I$. +::: The difference is a bit subtle, for an absolute maximum the interval must also be specified, for a relative maximum there just needs to exist some interval, possibly really small, though it must be bigger than a point. @@ -139,12 +142,16 @@ For a continuous function $f(x)$, call a point $c$ in the domain of $f$ where ei We can combine Bolzano's extreme value theorem with Fermat's insight to get the following: +::: {.callout-note icon=false} +## Absolute maxima characterization -> A continuous function on $[a,b]$ has an absolute maximum that occurs at a critical point $c$, $a < c < b$, or an endpoint, $a$ or $b$. +A continuous function on $[a,b]$ has an absolute maximum that occurs at a critical point $c$, $a < c < b$, or an endpoint, $a$ or $b$. +A similar statement holds for an absolute minimum. +::: -A similar statement holds for an absolute minimum. 
This gives a restricted set of places to look for absolute maximum and minimum values - all the critical points and the endpoints. +The above gives a restricted set of places to look for absolute maximum and minimum values - all the critical points and the endpoints. It is also the case that all relative extrema occur at a critical point, *however* not all critical points correspond to relative extrema. We will see *derivative tests* that help characterize when that occurs. @@ -263,10 +270,12 @@ Here the maximum occurs at an endpoint. The critical point $c=0.67\dots$ does no Let $f(x)$ be differentiable on $(a,b)$ and continuous on $[a,b]$. Then the absolute maximum occurs at an endpoint or where the derivative is $0$ (as the derivative is always defined). This gives rise to: +::: {.callout-note icon=false} +## [Rolle's](http://en.wikipedia.org/wiki/Rolle%27s_theorem) theorem -> *[Rolle's](http://en.wikipedia.org/wiki/Rolle%27s_theorem) theorem*: For $f$ differentiable on $(a,b)$ and continuous on $[a,b]$, if $f(a)=f(b)$, then there exists some $c$ in $(a,b)$ with $f'(c) = 0$. - +For $f$ differentiable on $(a,b)$ and continuous on $[a,b]$, if $f(a)=f(b)$, then there exists some $c$ in $(a,b)$ with $f'(c) = 0$. +::: This modest observation opens the door to many relationships between a function and its derivative, as it ties the two together in one statement. @@ -311,10 +320,12 @@ We are driving south and in one hour cover 70 miles. If the speed limit is 65 mi The mean value theorem is a direct generalization of Rolle's theorem. +::: {.callout-note icon=false} +## Mean value theorem -> *Mean value theorem*: Let $f(x)$ be differentiable on $(a,b)$ and continuous on $[a,b]$. Then there exists a value $c$ in $(a,b)$ where $f'(c) = (f(b) - f(a)) / (b - a)$. - +Let $f(x)$ be differentiable on $(a,b)$ and continuous on $[a,b]$. Then there exists a value $c$ in $(a,b)$ where $f'(c) = (f(b) - f(a)) / (b - a)$. +::: This says for any secant line between $a < b$ there will be a parallel tangent line at some $c$ with $a < c < b$ (all provided $f$ is differentiable on $(a,b)$ and continuous on $[a,b]$). @@ -425,13 +436,20 @@ Suppose it is known that $f'(x)=0$ on some interval $I$ and we take any $a < b$ ### The Cauchy mean value theorem -[Cauchy](http://en.wikipedia.org/wiki/Mean_value_theorem#Cauchy.27s_mean_value_theorem) offered an extension to the mean value theorem above. Suppose both $f$ and $g$ satisfy the conditions of the mean value theorem on $[a,b]$ with $g(b)-g(a) \neq 0$, then there exists at least one $c$ with $a < c < b$ such that +[Cauchy](http://en.wikipedia.org/wiki/Mean_value_theorem#Cauchy.27s_mean_value_theorem) offered an extension to the mean value theorem above. + +::: {.callout-note icon=false} +## Cauchy mean value theorem + +Suppose both $f$ and $g$ satisfy the conditions of the mean value theorem on $[a,b]$ with $g(b)-g(a) \neq 0$, then there exists at least one $c$ with $a < c < b$ such that $$ f'(c) = g'(c) \cdot \frac{f(b) - f(a)}{g(b) - g(a)}. $$ +::: + The proof follows by considering $h(x) = f(x) - r\cdot g(x)$, with $r$ chosen so that $h(a)=h(b)$. Then Rolle's theorem applies so that there is a $c$ with $h'(c)=0$, so $f'(c) = r g'(c)$, but $r$ can be seen to be $(f(b)-f(a))/(g(b)-g(a))$, which proves the theorem. 
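The conclusion of the Cauchy mean value theorem can be checked numerically. A small sketch, assuming $f(x) = x^3$ and $g(x) = x^2$ on $[1, 2]$, with `find_zero` from the `Roots` package locating the promised $c$:

```{julia}
using Roots
f(x) = x^3
g(x) = x^2
a, b = 1, 2
r = (f(b) - f(a)) / (g(b) - g(a))   # (8-1)/(4-1) = 7/3
h(x) = 3x^2 - r * 2x                # f'(x) - r*g'(x), as in the proof
c = find_zero(h, (a, b))            # 14/9, which lies in (1, 2)
```

Here `h` changes sign over $(1,2)$, so a bracketing method finds the value $c = 14/9$ satisfying $f'(c) = g'(c) \cdot (f(b)-f(a))/(g(b)-g(a))$, as the theorem guarantees.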
diff --git a/quarto/derivatives/more_zeros.qmd b/quarto/derivatives/more_zeros.qmd index 9995045b..81b6c9b0 100644 --- a/quarto/derivatives/more_zeros.qmd +++ b/quarto/derivatives/more_zeros.qmd @@ -127,12 +127,13 @@ Though the derivative is related to the slope of the secant line, that is in the Let $\epsilon_{n+1} = x_{n+1}-\alpha$, where $\alpha$ is assumed to be the *simple* zero of $f(x)$ that the secant method converges to. A [calculation](https://math.okstate.edu/people/binegar/4513-F98/4513-l08.pdf) shows that - +$$ \begin{align*} \epsilon_{n+1} &\approx \frac{x_n-x_{n-1}}{f(x_n)-f(x_{n-1})} \frac{(1/2)f''(\alpha)(\epsilon_n-\epsilon_{n-1})}{x_n-x_{n-1}} \epsilon_n \epsilon_{n-1}\\ & \approx \frac{f''(\alpha)}{2f'(\alpha)} \epsilon_n \epsilon_{n-1}\\ &= C \epsilon_n \epsilon_{n-1}. \end{align*} +$$ The constant `C` is similar to that for Newton's method, and reveals potential troubles for the secant method similar to those of Newton's method: a poor initial guess (the initial error is too big), the second derivative is too large, the first derivative too flat near the answer. @@ -185,7 +186,7 @@ Here we use `SymPy` to identify the degree-$2$ polynomial as a function of $y$, @syms y hs[0:2] xs[0:2] fs[0:2] H(y) = sum(hᵢ*(y - fs[end])^i for (hᵢ,i) ∈ zip(hs, 0:2)) -eqs = [H(fᵢ) ~ xᵢ for (xᵢ, fᵢ) ∈ zip(xs, fs)] +eqs = tuple((H(fᵢ) ~ xᵢ for (xᵢ, fᵢ) ∈ zip(xs, fs))...) ϕ = solve(eqs, hs) hy = subs(H(y), ϕ) ``` @@ -279,41 +280,6 @@ We can see it in action on the sine function. Here we pass in $\lambda$, but i chandrapatla(sin, 3, 4, λ3, verbose=true) ``` -```{julia} -#| output: false - -#= -The condition `Φ^2 < ξ < 1 - (1-Φ)^2` can be visualized. Assume `a,b=0,1`, `fa,fb=-1/2,1`, Then `c < a < b`, and `fc` has the same sign as `fa`, but what values of `fc` will satisfy the inequality? - - -XX```{julia} -ξ(c,fc) = (a-b)/(c-b) -Φ(c,fc) = (fa-fb)/(fc-fb) -Φl(c,fc) = Φ(c,fc)^2 -Φr(c,fc) = 1 - (1-Φ(c,fc))^2 -a,b = 0, 1 -fa,fb = -1/2, 1 -region = Lt(Φl, ξ) & Lt(ξ,Φr) -plot(region, xlims=(-2,a), ylims=(-3,0)) -XX``` - -When `(c,fc)` is in the shaded area, the inverse quadratic step is chosen. We can see that `fc < fa` is needed. - - -For these values, this area is within the area where a implicit quadratic step will result in a value between `a` and `b`: - - -XX```{julia} -l(c,fc) = λ3(fa,fb,fc,a,b,c) -region₃ = ImplicitEquations.Lt(l,b) & ImplicitEquations.Gt(l,a) -plot(region₃, xlims=(-2,0), ylims=(-3,0)) -XX``` - -There are values in the parameter space where this does not occur. -=# -nothing -``` - ## Tolerances @@ -426,6 +392,96 @@ So a modified criteria for convergence might look like: It is not uncommon to assign `rtol` to have a value like `sqrt(eps())` to account for accumulated floating point errors and the factor of $f'(\alpha)$, though in the `Roots` package it is set smaller by default. +### Conditioning and stability + +In Part III of @doi:10.1137/1.9781611977165 we find language of numerical analysis useful to formally describe the zero-finding problem. Key concepts are errors, conditioning, and stability. These give some theoretical justification for the tolerances above. + +Abstractly a *problem* is a mapping, $F$, from a domain $X$ of data to a range $Y$ of solutions. Both $X$ and $Y$ have a sense of distance given by a *norm*. A norm is a generalization of the absolute value and gives quantitative meaning to terms like small and large. + + +> A *well-conditioned* problem is one with the property that all small perturbations of $x$ lead to only small changes in $F(x)$. 
+This sense of "small" is measured through a *condition number*.
+
+If we let $\delta_x$ be a small perturbation of $x$ then $\delta_F = F(x + \delta_x) - F(x)$.
+
+The *forward error* is $\lVert\delta_F\rVert = \lVert F(x+\delta_x) - F(x)\rVert$, the *relative forward error* is $\lVert\delta_F\rVert/\lVert F\rVert = \lVert F(x+\delta_x) - F(x)\rVert/ \lVert F(x)\rVert$.
+
+The *backward error* is $\lVert\delta_x\rVert$, the *relative backward error* is $\lVert\delta_x\rVert / \lVert x\rVert$.
+
+ The *absolute condition number* $\hat{\kappa}$ is the worst case of this ratio $\lVert\delta_F\rVert/ \lVert\delta_x\rVert$ as the perturbation size shrinks to $0$.
+The relative condition number $\kappa$ divides $\lVert\delta_F\rVert$ by $\lVert F(x)\rVert$ and $\lVert\delta_x\rVert$ by $\lVert x\rVert$ before taking the ratio.
+
+
+A *problem* is a mathematical concept, an *algorithm* the computational version. Algorithms may differ for many reasons, such as floating point errors, tolerances, etc. We use notation $\tilde{F}$ to indicate the algorithm.
+
+The absolute error in the algorithm is $\lVert\tilde{F}(x) - F(x)\rVert$, the relative error divides by $\lVert F(x)\rVert$. A good algorithm would have small relative errors.
+
+An algorithm is called *stable* if
+
+$$
+\frac{\lVert\tilde{F}(x) - F(\tilde{x})\rVert}{\lVert F(\tilde{x})\rVert}
+$$
+
+is *small* for *some* $\tilde{x}$ relatively near $x$ (that is, with $\lVert\tilde{x}-x\rVert/\lVert x\rVert$ small).
+
+> A *stable* algorithm gives nearly the right answer to nearly the right question.
+
+(The answer it gives is $\tilde{F}(x)$, the nearly right question: what is $F(\tilde{x})$?)
+
+A related concept: an algorithm $\tilde{F}$ for a problem $F$ is *backward stable* if for each $x \in X$,
+
+$$
+\tilde{F}(x) = F(\tilde{x})
+$$
+
+for some $\tilde{x}$ with $\lVert\tilde{x} - x\rVert/\lVert x\rVert$ small.
+
+> "A backward stable algorithm gives exactly the right answer to nearly the right question."
+
+
+The concepts are related by Trefethen and Bau's Theorem 15.1, which says that for a backward stable algorithm the relative error $\lVert\tilde{F}(x) - F(x)\rVert/\lVert F(x)\rVert$ is small in a manner proportional to the relative condition number.
+
+Applying this to zero-finding, we follow @doi:10.1137/1.9781611975086.
+
+To be specific, the problem, $F$, is finding a zero of a function $f$ starting at an initial point $x_0$. The data is $(f, x_0)$, the solution is $r$, a zero of $f$.
+
+Take the algorithm as Newton's method. Any implementation must incorporate tolerances, so this is a computational approximation to the problem. The data is the same, but technically we use $\tilde{f}$ for the function, as any computation is dependent on machine implementations. The output is $\tilde{r}$, an *approximate* zero.
+
+Suppose, for the sake of argument, that $\tilde{f}(x) = f(x) + \epsilon$, $f$ has a continuous derivative, $r$ is a root of $f$, and $\tilde{r}$ is a root of $\tilde{f}$. Then by linearization:
+
+$$
+\begin{align*}
+0 &= \tilde{f}(\tilde r) \\
+  &= f(r + \delta) + \epsilon\\
+  &\approx f(r) + f'(r)\delta + \epsilon\\
+  &= 0 + f'(r)\delta + \epsilon
+\end{align*}
+$$
+Rearranging gives $\lVert\delta/\epsilon\rVert \approx 1/\lVert f'(r)\rVert$. But the $|\delta|/|\epsilon|$ ratio is related to the condition number:
+
+> The absolute condition number is $\hat{\kappa}_r = |f'(r)|^{-1}$.
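A numeric illustration of this condition number, a sketch assuming the cubic $f(x) = x^3 - 2x - 5$, a perturbation $\epsilon = 10^{-8}$, and `find_zero` from the `Roots` package:

```{julia}
using Roots
f(x) = x^3 - 2x - 5
ϵ = 1e-8
r  = find_zero(f, 2)                # a zero of f
rϵ = find_zero(x -> f(x) + ϵ, 2)    # a zero of the perturbed function
abs(rϵ - r), ϵ / abs(3r^2 - 2)      # observed shift versus ϵ/|f'(r)|
```

The observed shift in the zero essentially matches $\epsilon \cdot \hat{\kappa}_r$; were $|f'(r)|$ near $0$, the same perturbation would move the zero much further.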
+
+
+The error formula for Newton's method, which measures the distance between the actual root and an approximation, includes the derivative in the denominator, so we see that large condition numbers are tied to possibly larger errors.
+
+Now consider $g(x) = f(x) - f(\tilde{r})$. Call $f(\tilde{r})$ the residual. We have $g$ is near $f$ if the residual is small. The algorithm will solve $(g, x_0)$ with $\tilde{r}$, so with a small residual an exact solution to an approximate question will be found. Driscoll and Braun state
+
+> The backward error in a root estimate is equal to the residual.
+
+
+Practically these two observations lead to
+
+* If there is a large condition number, it may not be possible to find an approximate root near the real root.
+
+* A tolerance in an algorithm should consider both the size of $x_{n} - x_{n-1}$ and the residual $f(x_n)$.
+
+For the first observation, the example of Wilkinson's polynomial is often used where $f(x) = (x-1)\cdot(x-2)\cdot \cdots\cdot(x-20)$. When expanded, this function cannot be represented exactly in typical floating point values, the condition number is large, and some of the roots found are quite different from the mathematical values.
+
+
+The second observation follows as $f(x_n)$ monitors the backward error, and the product of the condition number and the backward error monitors the forward error. This product is on the order of $|f(x_n)/f'(x_n)|$ or $|x_{n+1} - x_n|$.

## Questions
diff --git a/quarto/derivatives/newtons_method.qmd b/quarto/derivatives/newtons_method.qmd
index eb0181ff..541d181d 100644
--- a/quarto/derivatives/newtons_method.qmd
+++ b/quarto/derivatives/newtons_method.qmd
@@ -178,15 +178,18 @@ x4, f(x4), f(x3)
We see now that $f(x_4)$ is within machine tolerance of $0$, so we call $x_4$ an *approximate zero* of $f(x)$.

+::: {.callout-note icon=false}
+## Newton's method
-> **Newton's method:** Let $x_0$ be an initial guess for a zero of $f(x)$. Iteratively define $x_{i+1}$ in terms of the just generated $x_i$ by:
->
-> $$
-> x_{i+1} = x_i - f(x_i) / f'(x_i).
-> $$
->
-> Then for reasonable functions and reasonable initial guesses, the sequence of points converges to a zero of $f$.
+Let $x_0$ be an initial guess for a zero of $f(x)$. Iteratively define $x_{i+1}$ in terms of the just generated $x_i$ by:
+$$
+x_{i+1} = x_i - f(x_i) / f'(x_i).
+$$
+
+Then for reasonable functions and reasonable initial guesses, the sequence of points converges to a zero of $f$.
+
+:::

On the computer, we know that actual convergence will likely never occur, but accuracy to a certain tolerance can often be achieved.
@@ -206,7 +209,12 @@ In practice, the algorithm is implemented not by repeating the update step a fix
:::{.callout-note}
## Note
-Newton looked at this same example in 1699 (B.T. Polyak, *Newton's method and its use in optimization*, European Journal of Operational Research. 02/2007; 181(3):1086-1096.) though his technique was slightly different as he did not use the derivative, *per se*, but rather an approximation based on the fact that his function was a polynomial (though identical to the derivative). Raphson (1690) proposed the general form, hence the usual name of the Newton-Raphson method.
+
+Newton looked at this same example in 1699 (B.T. Polyak, *Newton's method and its use in optimization*, European Journal of Operational Research.
02/2007; 181(3):1086-1096; and Deuflhard, *Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms*) though his technique was slightly different as he did not use the derivative, *per se*, but rather an approximation based on the fact that his function was a polynomial.
+
+We can read that he guessed the answer was ``2 + p``, as there is a sign change between $2$ and $3$. Newton put this guess into the polynomial to get, after simplification, ``p^3 + 6p^2 + 10p - 1``. This has an **approximate** zero found by solving the linear part ``10p-1 = 0``. Taking ``p = 0.1`` he can then say the answer looks like ``2 + p + q`` and repeat to get ``q^3 + 6.3\cdot q^2 + 11.23 \cdot q + 0.061 = 0``. Again taking just the linear part estimates `q = -0.005431...`. After two steps the estimate is `2.094568...`. This can be continued by expressing the answer as ``2 + p + q + r`` and then solving for an estimate for ``r``.
+
+Raphson (1690) proposed a simplification avoiding the computation of new polynomials, hence the usual name of the Newton-Raphson method. Simpson introduced derivatives into the formulation and systems of equations.
:::
@@ -392,6 +400,24 @@ x, f(x)
```

To machine tolerance the answer is a zero, even though the exact answer is irrational and all finite floating point values can be represented as rational numbers.

+##### Example non-polynomial
+
+The first example by Newton of applying the method to a non-polynomial function was solving an equation from astronomy: $x - e \sin(x) = M$, where $x$ is the eccentric anomaly, $e$ the eccentricity, and $M$ the mean anomaly. Newton used polynomial approximations for the trigonometric functions; here we can solve directly.
+
+Let $e = 1/2$ and $M = 3/4$. With $f(x) = x - e\sin(x) - M$, $f'(x) = 1 - e\cos(x)$. Starting at 1, Newton's method for 3 steps becomes:
+
```{julia}
ec, M = 0.5, 0.75
f(x) = x - ec * sin(x) - M
fp(x) = 1 - ec * cos(x)
x = 1
x = x - f(x) / fp(x)
x = x - f(x) / fp(x)
x = x - f(x) / fp(x)
x, f(x)
```

##### Example
@@ -429,7 +455,6 @@ end
So it takes $8$ steps to get an increment that small and about `10` steps to get to full convergence.
-
##### Example division as multiplication
@@ -686,7 +711,7 @@ $$
For this value, we have

-
+$$
\begin{align*}
x_{i+1} - \alpha
&= \left(x_i - \frac{f(x_i)}{f'(x_i)}\right) - \alpha\\
&= \left(x_i - \alpha\right) - \frac{f(x_i)}{f'(x_i)}\\
&= \frac{1}{f'(x_i)}\left(f'(x_i)(x_i - \alpha) - f(x_i)\right)\\
&= \frac{1}{2}\frac{f''(\xi)}{f'(x_i)} \cdot(x_i - \alpha)^2.
\end{align*}
+$$

That is
diff --git a/quarto/derivatives/optimization.qmd b/quarto/derivatives/optimization.qmd
index cf132340..fef9f8ce 100644
--- a/quarto/derivatives/optimization.qmd
+++ b/quarto/derivatives/optimization.qmd
@@ -302,20 +302,20 @@ We could also do the above problem symbolically with the aid of `SymPy`. Here ar
```{julia}
-@syms 𝐰::real 𝐡::real
+@syms w₀::real h₀::real

-𝐀₀ = 𝐰 * 𝐡 + pi * (𝐰/2)^2 / 2
-𝐏erim = 2*𝐡 + 𝐰 + pi * 𝐰/2
-𝐡₀ = solve(𝐏erim - 20, 𝐡)[1]
-𝐀₁ = 𝐀₀(𝐡 => 𝐡₀)
-𝐰₀ = solve(diff(𝐀₁,𝐰), 𝐰)[1]
+A₀ = w₀ * h₀ + pi * (w₀/2)^2 / 2
+Perim = 2*h₀ + w₀ + pi * w₀/2
+h₁ = solve(Perim - 20, h₀)[1]
+A₁ = A₀(h₀ => h₁)
+w₁ = solve(diff(A₁,w₀), w₀)[1]
```

-We know that `𝐰₀` is the maximum in this example from our previous work. We shall see soon, that just knowing that the second derivative is negative at `𝐰₀` would suffice to know this. Here we check that condition:
+We know that `w₁` is the maximum in this example from our previous work. We shall see soon that just knowing that the second derivative is negative at `w₁` would suffice to know this.
Here we check that condition: ```{julia} -diff(𝐀₁, 𝐰, 𝐰)(𝐰 => 𝐰₀) +diff(A₁, w₀, w₀)(w₀ => w₁) ``` As an aside, compare the steps involved above for a symbolic solution to those of previous work for a numeric solution: @@ -614,7 +614,7 @@ We see two terms: one with $x=L$ and another quadratic. For the simple case $r_0 ```{julia} -solve(q(r1=>r0), x) +solve(q(r1=>r0) ~ 0, x) ``` Well, not so fast. We need to check the other endpoint, $x=0$: @@ -632,7 +632,7 @@ Now, if, say, travel above the line is half as slow as travel along, then $2r_0 ```{julia} -out = solve(q(r1 => 2r0), x) +out = solve(q(r1 => 2r0) ~ 0, x) ``` It is hard to tell which would minimize time without more work. To check a case ($a=1, L=2, r_0=1$) we might have @@ -1372,11 +1372,12 @@ solve(x/b ~ (x+a)/(b + b*p), x) With $x = a/p$ we get by Pythagorean's theorem that - +$$ \begin{align*} c^2 &= (a + a/p)^2 + (b + bp)^2 \\ &= a^2(1 + \frac{1}{p})^2 + b^2(1+p)^2. \end{align*} +$$ The ladder problem minimizes $c$ or equivalently $c^2$. diff --git a/quarto/derivatives/taylor_series_polynomials.qmd b/quarto/derivatives/taylor_series_polynomials.qmd index 45fefa3e..e743892b 100644 --- a/quarto/derivatives/taylor_series_polynomials.qmd +++ b/quarto/derivatives/taylor_series_polynomials.qmd @@ -115,7 +115,7 @@ The term "best" is deserved, as any other straight line will differ at least in (This is a consequence of Cauchy's mean value theorem with $F(c) = f(c) - f'(c)\cdot(c-x)$ and $G(c) = (c-x)^2$ - +$$ \begin{align*} \frac{F'(\xi)}{G'(\xi)} &= \frac{f'(\xi) - f''(\xi)(\xi-x) - f(\xi)\cdot 1}{2(\xi-x)} \\ @@ -124,6 +124,7 @@ The term "best" is deserved, as any other straight line will differ at least in &= \frac{f(c) - f'(c)(c-x) - (f(x) - f'(x)(x-x))}{(c-x)^2 - (x-x)^2} \\ &= \frac{f(c) + f'(c)(x-c) - f(x)}{(x-c)^2} \end{align*} +$$ That is, $f(x) = f(c) + f'(c)(x-c) + f''(\xi)/2\cdot(x-c)^2$, or $f(x)-tl(x)$ is as described.) @@ -154,13 +155,14 @@ As in the linear case, there is flexibility in the exact points chosen for the i Now, we take a small detour to define some notation. Instead of writing our two points as $c$ and $c+h,$ we use $x_0$ and $x_1$. For any set of points $x_0, x_1, \dots, x_n$, define the **divided differences** of $f$ inductively, as follows: - +$$ \begin{align*} f[x_0] &= f(x_0) \\ f[x_0, x_1] &= \frac{f[x_1] - f[x_0]}{x_1 - x_0}\\ \cdots &\\ f[x_0, x_1, x_2, \dots, x_n] &= \frac{f[x_1, \dots, x_n] - f[x_0, x_1, x_2, \dots, x_{n-1}]}{x_n - x_0}. \end{align*} +$$ We see the first two values look familiar, and to generate more we just take certain ratios akin to those formed when finding a secant line. @@ -252,11 +254,12 @@ A proof based on Rolle's theorem appears in the appendix. Why the fuss? The answer comes from a result of Newton on *interpolating* polynomials. Consider a function $f$ and $n+1$ points $x_0$, $x_1, \dots, x_n$. Then an interpolating polynomial is a polynomial of least degree that goes through each point $(x_i, f(x_i))$. The [Newton form](https://en.wikipedia.org/wiki/Newton_polynomial) of such a polynomial can be written as: - +$$ \begin{align*} f[x_0] &+ f[x_0,x_1] \cdot (x-x_0) + f[x_0, x_1, x_2] \cdot (x-x_0) \cdot (x-x_1) + \\ & \cdots + f[x_0, x_1, \dots, x_n] \cdot (x-x_0)\cdot \cdots \cdot (x-x_{n-1}). \end{align*} +$$ The case $n=0$ gives the value $f[x_0] = f(c)$, which can be interpreted as the slope-$0$ line that goes through the point $(c,f(c))$. @@ -485,16 +488,19 @@ On inspection, it is seen that this is Newton's method applied to $f'(x)$. 
This Starting with the Newton form of the interpolating polynomial of smallest degree: - +$$ \begin{align*} f[x_0] &+ f[x_0,x_1] \cdot (x - x_0) + f[x_0, x_1, x_2] \cdot (x - x_0)\cdot(x-x_1) + \\ & \cdots + f[x_0, x_1, \dots, x_n] \cdot (x-x_0) \cdot \cdots \cdot (x-x_{n-1}). \end{align*} +$$ and taking $x_i = c + i\cdot h$, for a given $n$, we have in the limit as $h > 0$ goes to zero that coefficients of this polynomial converge to the coefficients of the *Taylor Polynomial of degree n*: + + $$ f(c) + f'(c)\cdot(x-c) + \frac{f''(c)}{2!}(x-c)^2 + \cdots + \frac{f^{(n)}(c)}{n!} (x-c)^n. $$ @@ -850,23 +856,25 @@ The actual code is different, as the Taylor polynomial isn't used. The Taylor p For notational purposes, let $g(x)$ be the inverse function for $f(x)$. Assume *both* functions have a Taylor polynomial expansion: - +$$ \begin{align*} f(x_0 + \Delta_x) &= f(x_0) + a_1 \Delta_x + a_2 (\Delta_x)^2 + \cdots a_n (\Delta_x)^n + \dots\\ g(y_0 + \Delta_y) &= g(y_0) + b_1 \Delta_y + b_2 (\Delta_y)^2 + \cdots b_n (\Delta_y)^n + \dots \end{align*} +$$ Then using $x = g(f(x))$, we have expanding the terms and using $\approx$ to drop the $\dots$: - +$$ \begin{align*} x_0 + \Delta_x &= g(f(x_0 + \Delta_x)) \\ &\approx g(f(x_0) + \sum_{j=1}^n a_j (\Delta_x)^j) \\ &\approx g(f(x_0)) + \sum_{i=1}^n b_i \left(\sum_{j=1}^n a_j (\Delta_x)^j \right)^i \\ &\approx x_0 + \sum_{i=1}^{n-1} b_i \left(\sum_{j=1}^n a_j (\Delta_x)^j\right)^i + b_n \left(\sum_{j=1}^n a_j (\Delta_x)^j\right)^n \end{align*} +$$ That is: @@ -1207,7 +1215,7 @@ $$ These two polynomials are of degree $n$ or less and have $u(x) = h(x)-g(x)=0$, by uniqueness. So the coefficients of $u(x)$ are $0$. We have that the coefficient of $x^n$ must be $a_n-b_n$ so $a_n=b_n$. Our goal is to express $a_n$ in terms of $a_{n-1}$ and $b_{n-1}$. Focusing on the $x^{n-1}$ term, we have: - +$$ \begin{align*} b_n(x-x_n)(x-x_{n-1})\cdot\cdots\cdot(x-x_1) &- a_n\cdot(x-x_0)\cdot\cdots\cdot(x-x_{n-1}) \\ @@ -1215,6 +1223,7 @@ b_n(x-x_n)(x-x_{n-1})\cdot\cdots\cdot(x-x_1) a_n [(x-x_1)\cdot\cdots\cdot(x-x_{n-1})] [(x- x_n)-(x-x_0)] \\ &= -a_n \cdot(x_n - x_0) x^{n-1} + p_{n-2}, \end{align*} +$$ where $p_{n-2}$ is a polynomial of at most degree $n-2$. (The expansion of $(x-x_1)\cdot\cdots\cdot(x-x_{n-1}))$ leaves $x^{n-1}$ plus some lower degree polynomial.) Similarly, we have $a_{n-1}(x-x_0)\cdot\cdots\cdot(x-x_{n-2}) = a_{n-1}x^{n-1} + q_{n-2}$ and $b_{n-1}(x-x_n)\cdot\cdots\cdot(x-x_2) = b_{n-1}x^{n-1}+r_{n-2}$. Combining, we get that the $x^{n-1}$ term of $u(x)$ is diff --git a/quarto/differentiable_vector_calculus/polar_coordinates.qmd b/quarto/differentiable_vector_calculus/polar_coordinates.qmd index d1bc568d..b8b91f41 100644 --- a/quarto/differentiable_vector_calculus/polar_coordinates.qmd +++ b/quarto/differentiable_vector_calculus/polar_coordinates.qmd @@ -321,10 +321,15 @@ As well, see this part of a [Wikipedia](http://en.wikipedia.org/wiki/Polar_coord Imagine we have $a < b$ and a partition $a=t_0 < t_1 < \cdots < t_n = b$. Let $\phi_i = (1/2)(t_{i-1} + t_{i})$ be the midpoint. Then the wedge of radius $r(\phi_i)$ with angle between $t_{i-1}$ and $t_i$ will have area $\pi r(\phi_i)^2 (t_i-t_{i-1}) / (2\pi) = (1/2) r(\phi_i)^2(t_i-t_{i-1})$, the ratio $(t_i-t_{i-1}) / (2\pi)$ being the angle to the total angle of a circle. Summing the area of these wedges over the partition gives a Riemann sum approximation for the integral $(1/2)\int_a^b r(\theta)^2 d\theta$. This limit of this sum defines the area in polar coordinates. 
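As a quick numeric check of this limiting process, the following sketch sums the wedge areas over a fine partition. It assumes the cardioid $r(\theta) = 1 + \cos(\theta)$, whose enclosed area is $3\pi/2$:

```{julia}
r(θ) = 1 + cos(θ)                  # a cardioid
a, b, n = 0, 2pi, 10_000
ts = range(a, b, length=n + 1)
mids = (ts[1:end-1] + ts[2:end]) / 2
# each wedge contributes (1/2) * r(ϕ)^2 * Δθ
total = sum(1/2 * r(ϕ)^2 * (b - a)/n for ϕ in mids)
total, 3pi/2
```

The wedge sum agrees with $3\pi/2$ to several digits, as the limit definition promises.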
+::: {.callout-note icon=false} +## Area of polar regions -> *Area of polar regions*. Let $R$ denote the region bounded by the curve $r(\theta)$ and bounded by the rays $\theta=a$ and $\theta=b$ with $b-a \leq 2\pi$, then the area of $R$ is given by: -> -> $A = \frac{1}{2}\int_a^b r(\theta)^2 d\theta.$ +Let $R$ denote the region bounded by the curve $r(\theta)$ and bounded by the rays $\theta=a$ and $\theta=b$ with $b-a \leq 2\pi$, then the area of $R$ is given by: + +$$ +A = \frac{1}{2}\int_a^b r(\theta)^2 d\theta. +$$ +::: @@ -412,18 +417,19 @@ The answer is the difference: The length of the arc traced by a polar graph can also be expressed using an integral. Again, we partition the interval $[a,b]$ and consider the wedge from $(r(t_{i-1}), t_{i-1})$ to $(r(t_i), t_i)$. The curve this wedge approximates will have its arc length approximated by the line segment connecting the points. Expressing the points in Cartesian coordinates and simplifying gives the distance squared as: - +$$ \begin{align*} d_i^2 &= (r(t_i) \cos(t_i) - r(t_{i-1})\cos(t_{i-1}))^2 + (r(t_i) \sin(t_i) - r(t_{i-1})\sin(t_{i-1}))^2\\ &= r(t_i)^2 - 2r(t_i)r(t_{i-1}) \cos(t_i - t_{i-1}) + r(t_{i-1})^2 \\ &\approx r(t_i)^2 - 2r(t_i)r(t_{i-1}) (1 - \frac{(t_i - t_{i-1})^2}{2})+ r(t_{i-1})^2 \quad(\text{as} \cos(x) \approx 1 - x^2/2)\\ &= (r(t_i) - r(t_{i-1}))^2 + r(t_i)r(t_{i-1}) (t_i - t_{i-1})^2. \end{align*} +$$ As was done with arc length we multiply $d_i$ by $(t_i - t_{i-1})/(t_i - t_{i-1})$ and move the bottom factor under the square root: - +$$ \begin{align*} d_i &= d_i \frac{t_i - t_{i-1}}{t_i - t_{i-1}} \\ @@ -431,13 +437,19 @@ d_i \frac{r(t_i)r(t_{i-1}) (t_i - t_{i-1})^2}{(t_i - t_{i-1})^2}} \cdot (t_i - t_{i-1})\\ &= \sqrt{(r'(\xi_i))^2 + r(t_i)r(t_{i-1})} \cdot (t_i - t_{i-1}).\quad(\text{the mean value theorem}) \end{align*} +$$ Adding the approximations to the $d_i$ looks like a Riemann sum approximation to the integral $\int_a^b \sqrt{(r'(\theta)^2) + r(\theta)^2} d\theta$ (with the extension to the Riemann sum formula needed to derive the arc length for a parameterized curve). That is: +::: {.callout-note icon=false} +## Arc length of a polar curve -> *Arc length of a polar curve*. The arc length of the curve described in polar coordinates by $r(\theta)$ for $a \leq \theta \leq b$ is given by: -> -> $\int_a^b \sqrt{r'(\theta)^2 + r(\theta)^2} d\theta.$ +The arc length of the curve described in polar coordinates by $r(\theta)$ for $a \leq \theta \leq b$ is given by: + +$$ +\int_a^b \sqrt{r'(\theta)^2 + r(\theta)^2} d\theta. +$$ +::: diff --git a/quarto/differentiable_vector_calculus/scalar_functions.qmd b/quarto/differentiable_vector_calculus/scalar_functions.qmd index 91e55df2..323fe6d6 100644 --- a/quarto/differentiable_vector_calculus/scalar_functions.qmd +++ b/quarto/differentiable_vector_calculus/scalar_functions.qmd @@ -36,12 +36,13 @@ nothing Consider a function $f: R^n \rightarrow R$. It has multiple arguments for its input (an $x_1, x_2, \dots, x_n$) and only one, *scalar*, value for an output. Some simple examples might be: - +$$ \begin{align*} f(x,y) &= x^2 + y^2\\ g(x,y) &= x \cdot y\\ h(x,y) &= \sin(x) \cdot \sin(y) \end{align*} +$$ For two examples from real life consider the elevation Point Query Service (of the [USGS](https://nationalmap.gov/epqs/)) returns the elevation in international feet or meters for a specific latitude/longitude within the United States. 
The longitude can be associated to an $x$ coordinate, the latitude to a $y$ coordinate, and the elevation a $z$ coordinate, and as long as the region is small enough, the $x$-$y$ coordinates can be thought to lie on a plane. (A flat earth assumption.) @@ -631,23 +632,25 @@ Before answering this, we discuss *directional* derivatives along the simplified If we compose $f \circ \vec\gamma_x$, we can visualize this as a curve on the surface from $f$ that moves in the $x$-$y$ plane along the line $y=c$. The derivative of this curve will satisfy: - +$$ \begin{align*} (f \circ \vec\gamma_x)'(x) &= \lim_{t \rightarrow x} \frac{(f\circ\vec\gamma_x)(t) - (f\circ\vec\gamma_x)(x)}{t-x}\\ &= \lim_{t\rightarrow x} \frac{f(t, c) - f(x,c)}{t-x}\\ &= \lim_{h \rightarrow 0} \frac{f(x+h, c) - f(x, c)}{h}. \end{align*} +$$ The latter expresses this to be the derivative of the function that holds the $y$ value fixed, but lets the $x$ value vary. It is the rate of change in the $x$ direction. There is special notation for this: - +$$ \begin{align*} \frac{\partial f(x,y)}{\partial x} &= \lim_{h \rightarrow 0} \frac{f(x+h, y) - f(x, y)}{h},\quad\text{and analogously}\\ \frac{\partial f(x,y)}{\partial y} &= \lim_{h \rightarrow 0} \frac{f(x, y+h) - f(x, y)}{h}. \end{align*} +$$ These are called the *partial* derivatives of $f$. The symbol $\partial$, read as "partial", is reminiscent of "$d$", but indicates the derivative is only in a given direction. Other notations exist for this: @@ -685,11 +688,12 @@ Let $f(x,y) = x^2 - 2xy$, then to compute the partials, we just treat the other Then - +$$ \begin{align*} \frac{\partial (x^2 - 2xy)}{\partial x} &= 2x - 2y\\ \frac{\partial (x^2 - 2xy)}{\partial y} &= 0 - 2x = -2x. \end{align*} +$$ Combining, gives $\nabla{f} = \langle 2x -2y, -2x \rangle$. @@ -697,12 +701,13 @@ Combining, gives $\nabla{f} = \langle 2x -2y, -2x \rangle$. If $g(x,y,z) = \sin(x) + z\cos(y)$, then - +$$ \begin{align*} \frac{\partial g }{\partial x} &= \cos(x) + 0 = \cos(x),\\ \frac{\partial g }{\partial y} &= 0 + z(-\sin(y)) = -z\sin(y),\\ \frac{\partial g }{\partial z} &= 0 + \cos(y) = \cos(y). \end{align*} +$$ Combining, gives $\nabla{g} = \langle \cos(x), -z\sin(y), \cos(y) \rangle$. @@ -938,12 +943,17 @@ where $\epsilon(h) \rightarrow 0$ as $h \rightarrow 0$. It is this characterization of differentiable that is generalized to define when a scalar function is *differentiable*. +::: {.callout-note icon=false} +## Differentiable -> *Differentiable*: Let $f$ be a scalar function. Then $f$ is [differentiable](https://tinyurl.com/qj8qcbb) at a point $C$ **if** the first order partial derivatives exist at $C$ **and** for $\vec{h}$ going to $\vec{0}$: -> -> $\|f(C + \vec{h}) - f(C) - \nabla{f}(C) \cdot \vec{h}\| = \mathcal{o}(\|\vec{h}\|),$ -> -> where $\mathcal{o}(\|\vec{h}\|)$ means that dividing the left hand side by $\|\vec{h}\|$ and taking a limit as $\vec{h}\rightarrow 0$ the limit will be $0$. +Let $f$ be a scalar function. Then $f$ is [differentiable](https://tinyurl.com/qj8qcbb) at a point $C$ **if** the first order partial derivatives exist at $C$ **and** for $\vec{h}$ going to $\vec{0}$: + +$$ +\|f(C + \vec{h}) - f(C) - \nabla{f}(C) \cdot \vec{h}\| = \mathcal{o}(\|\vec{h}\|), +$$ + +where $\mathcal{o}(\|\vec{h}\|)$ means that dividing the left hand side by $\|\vec{h}\|$ and taking a limit as $\vec{h}\rightarrow 0$ the limit will be $0$. 
+::: @@ -962,8 +972,12 @@ Later we will see how Taylor's theorem generalizes for scalar functions and inte In finding a partial derivative, we restricted the surface along a curve in the $x$-$y$ plane, in this case the curve $\vec{\gamma}(t)=\langle t, c\rangle$. In general if we have a curve in the $x$-$y$ plane, $\vec{\gamma}(t)$, we can compose the scalar function $f$ with $\vec{\gamma}$ to create a univariate function. If the functions are "smooth" then this composed function should have a derivative, and some version of a "chain rule" should provide a means to compute the derivative in terms of the "derivative" of $f$ (the gradient) and the derivative of $\vec{\gamma}$ ($\vec{\gamma}'$). -> *Chain rule*: Suppose $f$ is *differentiable* at $C$, and $\vec{\gamma}(t)$ is differentiable at $c$ with $\vec{\gamma}(c) = C$. Then $f\circ\vec{\gamma}$ is differentiable at $c$ with derivative $\nabla f(\vec{\gamma}(c)) \cdot \vec{\gamma}'(c)$. +::: {.callout-note icon=false} +## Chain rule + +Suppose $f$ is *differentiable* at $C$, and $\vec{\gamma}(t)$ is differentiable at $c$ with $\vec{\gamma}(c) = C$. Then $f\circ\vec{\gamma}$ is differentiable at $c$ with derivative $\nabla f(\vec{\gamma}(c)) \cdot \vec{\gamma}'(c)$. +::: This is similar to the chain rule for univariate functions $(f\circ g)'(u) = f'(g(u)) g'(u)$ or $df/dx = df/du \cdot du/dx$. However, when we write out in components there are more terms. For example, for $n=2$ we have with $\vec{\gamma} = \langle x(t), y(t) \rangle$: @@ -1217,7 +1231,10 @@ Let $f(x,y) = \sin(x+2y)$ and $\vec{v} = \langle 2, 1\rangle$. The directional d $$ -\nabla{f}\cdot \frac{\vec{v}}{\|\vec{v}\|} = \langle \cos(x + 2y), 2\cos(x + 2y)\rangle \cdot \frac{\langle 2, 1 \rangle}{\sqrt{5}} = \frac{4}{\sqrt{5}} \cos(x + 2y). +\nabla{f}\cdot \frac{\vec{v}}{\|\vec{v}\|} = +\langle \cos(x + 2y), 2\cos(x + 2y)\rangle \cdot +\frac{(\langle 2, 1 \rangle)}{\sqrt{5}} = +\frac{4}{\sqrt{5}} \cos(x + 2y). $$ ##### Example @@ -1408,17 +1425,18 @@ Let $f(x,y) = x^2 + y^2$ be a scalar function. We have if $G(r, \theta) = \langl Were this computed through the chain rule, we have: - +$$ \begin{align*} \nabla G_1 &= \langle \frac{\partial r\cos(\theta)}{\partial r}, \frac{\partial r\cos(\theta)}{\partial \theta} \rangle= \langle \cos(\theta), -r \sin(\theta) \rangle,\\ \nabla G_2 &= \langle \frac{\partial r\sin(\theta)}{\partial r}, \frac{\partial r\sin(\theta)}{\partial \theta} \rangle= \langle \sin(\theta), r \cos(\theta) \rangle. \end{align*} +$$ We have $\partial f/\partial x = 2x$ and $\partial f/\partial y = 2y$, which at $G$ are $2r\cos(\theta)$ and $2r\sin(\theta)$, so by the chain rule, we should have - +$$ \begin{align*} \frac{\partial (f\circ G)}{\partial r} &= \frac{\partial{f}}{\partial{x}}\frac{\partial G_1}{\partial r} + @@ -1430,6 +1448,7 @@ We have $\partial f/\partial x = 2x$ and $\partial f/\partial y = 2y$, which at \frac{\partial f}{\partial y}\frac{\partial G_2}{\partial \theta} = 2r\cos(\theta)(-r\sin(\theta)) + 2r\sin(\theta)(r\cos(\theta)) = 0. \end{align*} +$$ ## Higher order partial derivatives @@ -1467,9 +1486,11 @@ In `SymPy` the variable to differentiate by is taken from left to right, so `dif We see that `diff(ex, x, y)` and `diff(ex, y, x)` are identical. This is not a coincidence, as by [Schwarz's Theorem](https://tinyurl.com/y7sfw9sx) (also known as Clairaut's theorem) this will always be the case under typical assumptions: +::: {.callout-note icon=false} +## Theorem on mixed partials -> Theorem on mixed partials. 
If the mixed partials $\partial^2 f/\partial x \partial y$ and $\partial^2 f/\partial y \partial x$ exist and are continuous, then they are equal.
-
+If the mixed partials $\partial^2 f/\partial x \partial y$ and $\partial^2 f/\partial y \partial x$ exist and are continuous, then they are equal.
+:::

For higher order mixed partials, something similar to Schwarz's theorem still holds. Say $f:R^n \rightarrow R$ is $C^k$ if $f$ is continuous and all partial derivatives of order $j \leq k$ are continuous. If $f$ is $C^k$, and $k=k_1+k_2+\cdots+k_n$ ($k_i \geq 0$) then
diff --git a/quarto/differentiable_vector_calculus/scalar_functions_applications.qmd b/quarto/differentiable_vector_calculus/scalar_functions_applications.qmd
index 2c6ac567..f3a74288 100644
--- a/quarto/differentiable_vector_calculus/scalar_functions_applications.qmd
+++ b/quarto/differentiable_vector_calculus/scalar_functions_applications.qmd
@@ -341,11 +341,12 @@ The level curve $f(x,y)=0$ and the level curve $g(x,y)=0$ may intersect. Solving

To elaborate, consider two linear equations written in a general form:

-
+$$
\begin{align*}
ax + by &= u\\
cx + dy &= v
\end{align*}
+$$

A method to solve this by hand would be to solve for $y$ from one equation, substitute this expression into the second equation, and then solve for $x$. From there, $y$ can be found. A more advanced method expresses the problem in a matrix formulation of the form $Mx=b$ and solves that equation. This form of solving is implemented in `Julia` through the "backslash" operator. Here is the general solution:
@@ -422,21 +423,23 @@ We look to find the intersection point near $(1,1)$ using Newton's method

We have by linearization:

-
+$$
\begin{align*}
f(x,y) &\approx f(x_n, y_n) + \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y \\
g(x,y) &\approx g(x_n, y_n) + \frac{\partial g}{\partial x}\Delta x + \frac{\partial g}{\partial y}\Delta y,
\end{align*}
+$$

where $\Delta x = x- x_n$ and $\Delta y = y-y_n$. Setting $f(x,y)=0$ and $g(x,y)=0$ leaves these two linear equations in $\Delta x$ and $\Delta y$:

-
+$$
\begin{align*}
\frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y &= -f(x_n, y_n)\\
\frac{\partial g}{\partial x} \Delta x + \frac{\partial g}{\partial y} \Delta y &= -g(x_n, y_n).
\end{align*}
+$$

One step of Newton's method defines $(x_{n+1}, y_{n+1})$ to be the values $(x,y)$ that make the linearized functions about $(x_n, y_n)$ both equal to $\vec{0}$.
@@ -679,14 +682,18 @@ An *absolute* maximum over $U$, should it exist, would be $f(\vec{a})$ if there

The difference is the same as the one-dimensional case: local is a statement about nearby points only, absolute a statement about all the points in the specified set.

+::: {.callout-note icon=false}
+## The [Extreme Value Theorem](https://tinyurl.com/yyhgxu8y)

-> The [Extreme Value Theorem](https://tinyurl.com/yyhgxu8y) Let $f:R^n \rightarrow R$ be continuous and defined on *closed* set $V$. Then $f$ has a minimum value $m$ and maximum value $M$ over $V$ and there exists at least two points $\vec{a}$ and $\vec{b}$ with $m = f(\vec{a})$ and $M = f(\vec{b})$.
-
-
+Let $f:R^n \rightarrow R$ be continuous and defined on a *closed* set $V$. Then $f$ has a minimum value $m$ and maximum value $M$ over $V$ and there exist at least two points $\vec{a}$ and $\vec{b}$ with $m = f(\vec{a})$ and $M = f(\vec{b})$.
+:::

-> [Fermat](https://tinyurl.com/nfgz8fz)'s theorem on critical points.
Let $f:R^n \rightarrow R$ be a continuous function defined on an *open* set $U$. If $x \in U$ is a point where $f$ has a local extrema *and* $f$ is differentiable, then the gradient of $f$ at $x$ is $\vec{0}$.

+::: {.callout-note icon=false}
+## [Fermat](https://tinyurl.com/nfgz8fz)'s theorem on critical points

+Let $f:R^n \rightarrow R$ be a continuous function defined on an *open* set $U$. If $x \in U$ is a point where $f$ has a local extremum *and* $f$ is differentiable, then the gradient of $f$ at $x$ is $\vec{0}$.
+:::

Call a point in the domain of $f$ where the function is differentiable and the gradient is zero a *stationary point* and a point in the domain where the function is either not differentiable or is a stationary point a *critical point*. Local extrema can only happen at critical points, by Fermat's theorem.
@@ -735,16 +742,16 @@ To identify these through formulas, and not graphically, we could try and use th

The generalization of the *second* derivative test is more concrete though. Recall that the second derivative test is about the concavity of the function at the critical point. When the concavity can be determined as non-zero, the test is conclusive; when the concavity is zero, the test is not conclusive. Similarly here:

+::: {.callout-note icon=false}
+## The [second](https://en.wikipedia.org/wiki/Second_partial_derivative_test) Partial Derivative Test for $f:R^2 \rightarrow R$.

-> The [second](https://en.wikipedia.org/wiki/Second_partial_derivative_test) Partial Derivative Test for $f:R^2 \rightarrow R$.
->
-> Assume the first and second partial derivatives of $f$ are defined and continuous; $\vec{a}$ be a critical point of $f$; $H$ is the hessian matrix, $[f_{xx}\quad f_{xy};f_{xy}\quad f_{yy}]$, and $d = \det(H) = f_{xx} f_{yy} - f_{xy}^2$ is the determinant of the Hessian matrix. Then:
->
-> * The function $f$ has a local minimum at $\vec{a}$ if $f_{xx} > 0$ *and* $d>0$,
-> * The function $f$ has a local maximum at $\vec{a}$ if $f_{xx} < 0$ *and* $d>0$,
-> * The function $f$ has a saddle point at $\vec{a}$ if $d < 0$,
-> * Nothing can be said if $d=0$.

+Assume the first and second partial derivatives of $f$ are defined and continuous; let $\vec{a}$ be a critical point of $f$, $H$ the Hessian matrix, $[f_{xx}\quad f_{xy};f_{xy}\quad f_{yy}]$, and $d = \det(H) = f_{xx} f_{yy} - f_{xy}^2$ the determinant of the Hessian matrix. Then:
+
 * The function $f$ has a local minimum at $\vec{a}$ if $f_{xx} > 0$ *and* $d>0$,
 * The function $f$ has a local maximum at $\vec{a}$ if $f_{xx} < 0$ *and* $d>0$,
 * The function $f$ has a saddle point at $\vec{a}$ if $d < 0$,
 * Nothing can be said if $d=0$.
+:::

---
@@ -1069,11 +1076,12 @@ $$

Another might be the vertical squared distance to the line:

-
+$$
\begin{align*}
d2(\alpha, \beta) &= (y_1 - l(x_1))^2 + (y_2 - l(x_2))^2 + (y_3 - l(x_3))^2 \\
&= (y_1 - (\alpha + \beta x_1))^2 + (y_2 - (\alpha + \beta x_2))^2 + (y_3 - (\alpha + \beta x_3))^2
\end{align*}
+$$

Another might be the *shortest* distance to the line:
@@ -1407,18 +1415,22 @@ contour!(xs, ys, f, levels = [.7, .85, 1, 1.15, 1.3])

We can still identify the tangent and normal directions. What is different about this point is that local movement on the constraint curve is also local movement on the contour line of $f$, so $f$ doesn't increase or decrease here, as it would if this point were an extremum along the constraint. The key to seeing this is that the contour lines of $f$ are *tangent* to the constraint.
The respective gradients are *orthogonal* to their tangent lines, and in dimension $2$, this implies they are parallel to each other. +::: {.callout-note icon=false} +## The method of Lagrange multipliers + +To optimize $f(x,y)$ subject to a constraint $g(x,y) = k$ we solve for all *simultaneous* solutions to + +$$ +\begin{align*} +\nabla{f}(x,y) &= \lambda \nabla{g}(x,y), \text{and}\\ +g(x,y) &= k. +\end{align*} +$$ + -> *The method of Lagrange multipliers*: To optimize $f(x,y)$ subject to a constraint $g(x,y) = k$ we solve for all *simultaneous* solutions to -> -> -> \begin{align*} -> \nabla{f}(x,y) &= \lambda \nabla{g}(x,y), \text{and}\\ -> g(x,y) &= k. -> \end{align*} -> -> -> These *possible* points are evaluated to see if they are maxima or minima. +These *possible* points are evaluated to see if they are maxima or minima. +::: The method will not work if $\nabla{g} = \vec{0}$ or if $f$ and $g$ are not differentiable. @@ -1472,12 +1484,13 @@ $$ The we have - +$$ \begin{align*} \frac{\partial L}{\partial{x}} &= \frac{\partial{f}}{\partial{x}} - \lambda \frac{\partial{g}}{\partial{x}}\\ \frac{\partial L}{\partial{y}} &= \frac{\partial{f}}{\partial{y}} - \lambda \frac{\partial{g}}{\partial{y}}\\ \frac{\partial L}{\partial{\lambda}} &= 0 + (g(x,y) - k). \end{align*} +$$ But if the Lagrange condition holds, each term is $0$, so Lagrange's method can be seen as solving for point $\nabla{L} = \vec{0}$. The optimization problem in two variables with a constraint becomes a problem of finding and classifying zeros of a function with *three* variables. @@ -1556,13 +1569,14 @@ The starting point is a *perturbation*: $\hat{y}(x) = y(x) + \epsilon_1 \eta_1(x With this notation, and fixing $y$ we can re-express the equations in terms of $\epsilon_1$ and $\epsilon_2$: - +$$ \begin{align*} F(\epsilon_1, \epsilon_2) &= \int f(x, \hat{y}, \hat{y}') dx = \int f(x, y + \epsilon_1 \eta_1 + \epsilon_2 \eta_2, y' + \epsilon_1 \eta_1' + \epsilon_2 \eta_2') dx,\\ G(\epsilon_1, \epsilon_2) &= \int g(x, \hat{y}, \hat{y}') dx = \int g(x, y + \epsilon_1 \eta_1 + \epsilon_2 \eta_2, y' + \epsilon_1 \eta_1' + \epsilon_2 \eta_2') dx. \end{align*} +$$ Then our problem is restated as: @@ -1590,7 +1604,7 @@ $$ Computing just the first one, we have using the chain rule and assuming interchanging the derivative and integral is possible: - +$$ \begin{align*} \frac{\partial{F}}{\partial{\epsilon_1}} &= \int \frac{\partial}{\partial{\epsilon_1}}( @@ -1598,6 +1612,7 @@ f(x, y + \epsilon_1 \eta_1 + \epsilon_2 \eta_2, y' + \epsilon_1 \eta_1' + \epsil &= \int \left(\frac{\partial{f}}{\partial{y}} \eta_1 + \frac{\partial{f}}{\partial{y'}} \eta_1'\right) dx\quad\quad(\text{from }\nabla{f} \cdot \langle 0, \eta_1, \eta_1'\rangle)\\ &=\int \eta_1 \left(\frac{\partial{f}}{\partial{y}} - \frac{d}{dx}\frac{\partial{f}}{\partial{y'}}\right) dx. \end{align*} +$$ The last line by integration by parts: @@ -1664,11 +1679,12 @@ ex2 = Eq(ex1.lhs()^2 - 1, simplify(ex1.rhs()^2) - 1) Now $y'$ can be integrated using the substitution $y - C = \lambda \cos\theta$ to give: $-\lambda\int\cos\theta d\theta = x + D$, $D$ some constant. That is: - +$$ \begin{align*} x + D &= - \lambda \sin\theta\\ y - C &= \lambda\cos\theta. \end{align*} +$$ Squaring gives the equation of a circle: $(x +D)^2 + (y-C)^2 = \lambda^2$. @@ -1680,11 +1696,12 @@ We center and *rescale* the problem so that $x_0 = -1, x_1 = 1$. Then $L > 2$ as We have $y=0$ at $x=1$ and $-1$ giving: - +$$ \begin{align*} (-1 + D)^2 + (0 - C)^2 &= \lambda^2\\ (+1 + D)^2 + (0 - C)^2 &= \lambda^2. 
\end{align*} +$$ Squaring out and solving gives $D=0$, $1 + C^2 = \lambda^2$. That is, an arc of circle with radius $\sqrt{1+C^2}$ and centered at $(0, C)$. @@ -1776,7 +1793,7 @@ where $R_k(x) = f^{k+1}(\xi)/(k+1)!(x-a)^{k+1}$ for some $\xi$ between $a$ and $ This theorem can be generalized to scalar functions, but the notation can be cumbersome. Following [Folland](https://sites.math.washington.edu/~folland/Math425/taylor2.pdf) we use *multi-index* notation. Suppose $f:R^n \rightarrow R$, and let $\alpha=(\alpha_1, \alpha_2, \dots, \alpha_n)$. Then define the following notation: - +$$ \begin{align*} |\alpha| &= \alpha_1 + \cdots + \alpha_n, \\ \alpha! &= \alpha_1!\alpha_2!\cdot\cdots\cdot\alpha_n!, \\ @@ -1784,6 +1801,7 @@ This theorem can be generalized to scalar functions, but the notation can be cum \partial^\alpha f &= \partial_1^{\alpha_1}\partial_2^{\alpha_2}\cdots \partial_n^{\alpha_n} f \\ & = \frac{\partial^{|\alpha|}f}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}. \end{align*} +$$ This notation makes many formulas from one dimension carry over to higher dimensions. For example, the binomial theorem says: @@ -1800,8 +1818,8 @@ $$ (x_1 + x_2 + \cdots + x_n)^n = \sum_{|\alpha|=k} \frac{k!}{\alpha!} \vec{x}^\alpha. $$ -Taylor's theorem then becomes: - +::: {.callout-note icon=false} +## Taylor's theorem using multi-index If $f: R^n \rightarrow R$ is sufficiently smooth ($C^{k+1}$) on an open convex set $S$ about $\vec{a}$ then if $\vec{a}$ and $\vec{a}+\vec{h}$ are in $S$, @@ -1812,18 +1830,20 @@ $$ where $R_{\vec{a},k} = \sum_{|\alpha|=k+1}\partial^\alpha \frac{f(\vec{a} + c\vec{h})}{\alpha!} \vec{h}^\alpha$ for some $c$ in $(0,1)$. +::: ##### Example The elegant notation masks what can be complicated expressions. Consider the simple case $f:R^2 \rightarrow R$ and $k=2$. Then this says: - +$$ \begin{align*} f(x + dx, y+dy) &= f(x, y) + \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy \\ &+ \frac{\partial^2 f}{\partial x^2} \frac{dx^2}{2} + 2\frac{\partial^2 f}{\partial x\partial y} \frac{dx dy}{2}\\ &+ \frac{\partial^2 f}{\partial y^2} \frac{dy^2}{2} + R_{\langle x, y \rangle, k}(\langle dx, dy \rangle). \end{align*} +$$ Using $\nabla$ and $H$ for the Hessian and $\vec{x} = \langle x, y \rangle$ and $d\vec{x} = \langle dx, dy \rangle$, this can be expressed as: diff --git a/quarto/differentiable_vector_calculus/vector_fields.qmd b/quarto/differentiable_vector_calculus/vector_fields.qmd index c4daf38a..bc270f3b 100644 --- a/quarto/differentiable_vector_calculus/vector_fields.qmd +++ b/quarto/differentiable_vector_calculus/vector_fields.qmd @@ -191,11 +191,12 @@ surface(unzip(Phi.(thetas, phis'))...) The partial derivatives of each component, $\partial{\Phi}/\partial{\theta}$ and $\partial{\Phi}/\partial{\phi}$, can be computed directly: - +$$ \begin{align*} \partial{\Phi}/\partial{\theta} &= \langle -\sin(\phi)\sin(\theta), \sin(\phi)\cos(\theta),0 \rangle,\\ \partial{\Phi}/\partial{\phi} &= \langle \cos(\phi)\cos(\theta), \cos(\phi)\sin(\theta), -\sin(\phi) \rangle. \end{align*} +$$ Using `SymPy`, we can compute through: @@ -359,7 +360,7 @@ where $\epsilon(h) \rightarrow \vec{0}$ as $h \rightarrow \vec{0}$. 
We have, using this for *both* $F$ and $G$: - +$$ \begin{align*} F(G(a + \vec{h})) - F(G(a)) &= F(G(a) + (dG_a \cdot \vec{h} + \epsilon_G \vec{h})) - F(G(a))\\ @@ -367,18 +368,20 @@ F(G(a) + (dG_a \cdot \vec{h} + \epsilon_G \vec{h})) - F(G(a))\\ &+ \quad\epsilon_F (dG_a \cdot \vec{h} + \epsilon_G \vec{h}) - F(G(a))\\ &= dF_{G(a)} \cdot (dG_a \cdot \vec{h}) + dF_{G(a)} \cdot (\epsilon_G \vec{h}) + \epsilon_F (dG_a \cdot \vec{h}) + (\epsilon_F \cdot \epsilon_G\vec{h}) \end{align*} +$$ The last line uses the linearity of $dF$ to isolate $dF_{G(a)} \cdot (dG_a \cdot \vec{h})$. Factoring out $\vec{h}$ and taking norms gives: - +$$ \begin{align*} \frac{\| F(G(a+\vec{h})) - F(G(a)) - dF_{G(a)}dG_a \cdot \vec{h} \|}{\| \vec{h} \|} &= \frac{\| dF_{G(a)}\cdot(\epsilon_G\vec{h}) + \epsilon_F (dG_a\cdot \vec{h}) + (\epsilon_F\cdot\epsilon_G\vec{h}) \|}{\| \vec{h} \|} \\ &\leq \| dF_{G(a)}\cdot\epsilon_G + \epsilon_F (dG_a) + \epsilon_F\cdot\epsilon_G \|\frac{\|\vec{h}\|}{\| \vec{h} \|}\\ &\rightarrow 0. \end{align*} +$$ ### Examples @@ -660,7 +663,7 @@ det(A1), 1/det(A2) The technique of *implicit differentiation* is a useful one, as it allows derivatives of more complicated expressions to be found. The main idea, expressed here with three variables is if an equation may be viewed as $F(x,y,z) = c$, $c$ a constant, then $z=\phi(x,y)$ may be viewed as a function of $x$ and $y$. Hence, we can use the chain rule to find: $\partial z / \partial x$ and $\partial z /\partial y$. Let $G(x,y) = \langle x, y, \phi(x,y) \rangle$ and then differentiation $(F \circ G)(x,y) = c$: - +$$ \begin{align*} 0 &= dF_{G(x,y)} \circ dG_{\langle x, y\rangle}\\ &= [\frac{\partial F}{\partial x}\quad \frac{\partial F}{\partial y}\quad \frac{\partial F}{\partial z}](G(x,y)) \cdot @@ -670,6 +673,7 @@ The technique of *implicit differentiation* is a useful one, as it allows deriva \frac{\partial \phi}{\partial x} & \frac{\partial \phi}{\partial y} \end{bmatrix}. \end{align*} +$$ Solving yields @@ -685,14 +689,17 @@ Where the right hand side of each is evaluated at $G(x,y)$. When can it be reasonably assumed that such a function $z= \phi(x,y)$ exists? +::: {.callout-note icon=false} +The [Implicit Function Theorem](https://en.wikipedia.org/wiki/Implicit_function_theorem) (slightly abridged) -The [Implicit Function Theorem](https://en.wikipedia.org/wiki/Implicit_function_theorem) provides a statement (slightly abridged here): +Let $F:R^{n+m} \rightarrow R^m$ be a continuously differentiable function and let $R^{n+m}$ have (compactly defined) coordinates $\langle \vec{x}, \vec{y} \rangle$, Fix a point $\langle \vec{a}, \vec{b} \rangle$ with $F(\vec{a}, \vec{b}) = \vec{0}$. Let $J_{F, \vec{y}}(\vec{a}, \vec{b})$ be the Jacobian restricted to *just* the $y$ variables. ($J$ is $m \times m$.) If this matrix has non-zero determinant (it is invertible), then there exists an open set $U$ containing $\vec{a}$ and a *unique* continuously differentiable function $G: U \subset R^n \rightarrow R^m$ such that $G(\vec{a}) = \vec{b}$, $F(\vec{x}, G(\vec{x})) = 0$ for $\vec x$ in $U$. Moreover, the partial derivatives of $G$ are given by the matrix product: -> Let $F:R^{n+m} \rightarrow R^m$ be a continuously differentiable function and let $R^{n+m}$ have (compactly defined) coordinates $\langle \vec{x}, \vec{y} \rangle$, Fix a point $\langle \vec{a}, \vec{b} \rangle$ with $F(\vec{a}, \vec{b}) = \vec{0}$. Let $J_{F, \vec{y}}(\vec{a}, \vec{b})$ be the Jacobian restricted to *just* the $y$ variables. ($J$ is $m \times m$.) 
If this matrix has non-zero determinant (it is invertible), then there exists an open set $U$ containing $\vec{a}$ and a *unique* continuously differentiable function $G: U \subset R^n \rightarrow R^m$ such that $G(\vec{a}) = \vec{b}$, $F(\vec{x}, G(\vec{x})) = 0$ for $\vec x$ in $U$. Moreover, the partial derivatives of $G$ are given by the matrix product: -> -> $\frac{\partial G}{\partial x_j}(\vec{x}) = - [J_{F, \vec{y}}(x, F(\vec{x}))]^{-1} \left[\frac{\partial F}{\partial x_j}(x, G(\vec{x}))\right].$ +$$ +\frac{\partial G}{\partial x_j}(\vec{x}) = - [J_{F, \vec{y}}(x, F(\vec{x}))]^{-1} \left[\frac{\partial F}{\partial x_j}(x, G(\vec{x}))\right]. +$$ +::: --- diff --git a/quarto/differentiable_vector_calculus/vector_valued_functions.qmd b/quarto/differentiable_vector_calculus/vector_valued_functions.qmd index a382a328..738b8c45 100644 --- a/quarto/differentiable_vector_calculus/vector_valued_functions.qmd +++ b/quarto/differentiable_vector_calculus/vector_valued_functions.qmd @@ -800,7 +800,7 @@ Vector-valued functions do not have multiplication or division defined for them, For the dot product, the combination $\vec{f}(t) \cdot \vec{g}(t)$ we have a univariate function of $t$, so we know a derivative is well defined. Can it be represented in terms of the vector-valued functions? In terms of the component functions, we have this calculation specific to $n=2$, but that which can be generalized: - +$$ \begin{align*} \frac{d}{dt}(\vec{f}(t) \cdot \vec{g}(t)) &= \frac{d}{dt}(f_1(t) g_1(t) + f_2(t) g_2(t))\\ @@ -808,6 +808,7 @@ For the dot product, the combination $\vec{f}(t) \cdot \vec{g}(t)$ we have a uni &= f_1'(t) g_1(t) + f_2'(t) g_2(t) + f_1(t) g_1'(t) + f_2(t) g_2'(t)\\ &= \vec{f}'(t)\cdot \vec{g}(t) + \vec{f}(t) \cdot \vec{g}'(t). \end{align*} +$$ Suggesting that a product rule like formula applies for dot products. @@ -839,11 +840,12 @@ diff.(uₛ × vₛ, tₛ) - (diff.(uₛ, tₛ) × vₛ + uₛ × diff.(vₛ, t In summary, these two derivative formulas hold for vector-valued functions $R \rightarrow R^n$: - +$$ \begin{align*} (\vec{u} \cdot \vec{v})' &= \vec{u}' \cdot \vec{v} + \vec{u} \cdot \vec{v}',\\ (\vec{u} \times \vec{v})' &= \vec{u}' \times \vec{v} + \vec{u} \times \vec{v}'. \end{align*} +$$ ##### Application. Circular motion and the tangent vector. @@ -896,11 +898,12 @@ Combining, Newton states $\vec{a} = -(GM/r^2) \hat{x}$. Now to show the first law. Consider $\vec{x} \times \vec{v}$. It is constant, as: - +$$ \begin{align*} (\vec{x} \times \vec{v})' &= \vec{x}' \times \vec{v} + \vec{x} \times \vec{v}'\\ &= \vec{v} \times \vec{v} + \vec{x} \times \vec{a}. \end{align*} +$$ Both terms are $\vec{0}$, as $\vec{a}$ is parallel to $\vec{x}$ by the above, and clearly $\vec{v}$ is parallel to itself. @@ -912,34 +915,37 @@ This says, $\vec{x} \times \vec{v} = \vec{c}$ is a constant vector, meaning, the Now, by differentiating $\vec{x} = r \hat{x}$ we have: - +$$ \begin{align*} \vec{v} &= \vec{x}'\\ &= (r\hat{x})'\\ &= r' \hat{x} + r \hat{x}', \end{align*} +$$ and so - +$$ \begin{align*} \vec{c} &= \vec{x} \times \vec{v}\\ &= (r\hat{x}) \times (r'\hat{x} + r \hat{x}')\\ &= r^2 (\hat{x} \times \hat{x}'). \end{align*} +$$ From this, we can compute $\vec{a} \times \vec{c}$: - +$$ \begin{align*} \vec{a} \times \vec{c} &= (-\frac{GM}{r^2})\hat{x} \times r^2(\hat{x} \times \hat{x}')\\ &= -GM \hat{x} \times (\hat{x} \times \hat{x}') \\ &= GM (\hat{x} \times \hat{x}')\times \hat{x}. \end{align*} +$$ The last line by anti-commutativity. @@ -948,22 +954,24 @@ The last line by anti-commutativity. 
But, the triple cross product can be simplified through the identity $(\vec{u}\times\vec{v})\times\vec{w} = (\vec{u}\cdot\vec{w})\vec{v} - (\vec{v}\cdot\vec{w})\vec{u}$. So, the above becomes:

-
+$$
\begin{align*}
\vec{a} \times \vec{c} &= GM ((\hat{x}\cdot\hat{x})\hat{x}' - (\hat{x} \cdot \hat{x}')\hat{x})\\
&= GM (1 \hat{x}' - 0 \hat{x}).
\end{align*}
+$$

Now, since $\vec{c}$ is constant, we have:

-
+$$
\begin{align*}
(\vec{v} \times \vec{c})' &= (\vec{a} \times \vec{c})\\
&= GM \hat{x}'\\
&= (GM\hat{x})'.
\end{align*}
+$$

The two sides have the same derivative, hence differ by a constant:
@@ -979,7 +987,7 @@ As $\vec{x}$ and $\vec{v}\times\vec{c}$ lie in the same plane - orthogonal to $\

Now

-
+$$
\begin{align*}
c^2 &= \|\vec{c}\|^2 \\
&= \vec{c} \cdot \vec{c}\\
@@ -989,6 +997,7 @@ c^2 &= \|\vec{c}\|^2 \\
&= GMr + r \hat{x} \cdot \vec{d}\\
&= GMr + rd \cos(\theta).
\end{align*}
+$$

Solving for $r$ gives the first law. That is, the radial distance is in the form of an ellipse:
@@ -1397,12 +1406,15 @@ plotly();

In [Arc length](../integrals/arc_length.html) there is a discussion of how to find the arc length of a parameterized curve in $2$ dimensions. The general case is discussed by [Destafano](https://randomproofs.files.wordpress.com/2010/11/arc_length.pdf) who shows:

-> *Arc-length*: if a curve $C$ is parameterized by a smooth function $\vec{r}(t)$ over an interval $I$, then the arc length of $C$ is:
->
-> $$
-> \int_I \| \vec{r}'(t) \| dt.
-> $$

+::: {.callout-note icon=false}
+## Arc-length

+If a curve $C$ is parameterized by a smooth function $\vec{r}(t)$ over an interval $I$, then the arc length of $C$ is:
+
+$$
+\int_I \| \vec{r}'(t) \| dt.
+$$
+:::

If we associate $\vec{r}'(t)$ with the velocity, then this is the integral of the speed (the magnitude of the velocity).
@@ -1519,12 +1531,13 @@ $$

As before, but further: if $\kappa$ is the curvature and $\tau$ the torsion, these relationships express the derivatives with respect to $s$ in terms of the components in the frame:

-
+$$
\begin{align*}
\hat{T}'(s) &= &\kappa \hat{N}(s) &\\
\hat{N}'(s) &= -\kappa \hat{T}(s) & &+ \tau \hat{B}(s)\\
\hat{B}'(s) &= &-\tau \hat{N}(s) &
\end{align*}
+$$

These are the [Frenet-Serret](https://en.wikipedia.org/wiki/Frenet%E2%80%93Serret_formulas) formulas.
@@ -1637,12 +1650,13 @@ end

Levi and Tabachnikov prove in their Proposition 2.4:

-
+$$
\begin{align*}
\kappa(u) &= \frac{d\alpha(u)}{du} + \frac{\sin(\alpha(u))}{a},\\
|\frac{dv}{du}| &= |\cos(\alpha)|, \quad \text{and}\\
k &= \frac{\tan(\alpha)}{a}.
\end{align*}
+$$

The first equation relates the steering angle with the curvature. If the steering angle is not changed ($d\alpha/du=0$) then the curvature is constant and the motion is circular. It will be greater for larger angles (up to $\pi/2$). As the curvature is the reciprocal of the radius, this means the radius of the circular trajectory will be smaller. For the same constant steering angle, the curvature will be smaller for longer wheelbases, meaning the circular trajectory will have a larger radius. For cars, which have similar dynamics, this means longer wheelbase cars will take more room to make a U-turn.
@@ -1657,13 +1671,14 @@ The last equation, relates the curvature of the back wheel track to the steering

To derive the first one, we have previously noted that when a curve is parameterized by arc length, the curvature is more directly computed: it is the magnitude of the derivative of the tangent vector. The tangent vector is of unit length when parameterized by arc length.
This implies its derivative will be orthogonal. If $\vec{r}(t)$ is a parameterization by arc length, then the curvature formula simplifies as: - +$$ \begin{align*} \kappa(s) &= \frac{\| \vec{r}'(s) \times \vec{r}''(s) \|}{\|\vec{r}'(s)\|^3} \\ &= \frac{\| \vec{r}'(s) \times \vec{r}''(s) \|}{1} \\ &= \| \vec{r}'(s) \| \| \vec{r}''(s) \| \sin(\theta) \\ &= 1 \| \vec{r}''(s) \| 1 = \| \vec{r}''(s) \|. \end{align*} +$$ So in the above, the curvature is $\kappa = \| \vec{F}''(u) \|$ and $k = \|\vec{B}''(v)\|$. @@ -1691,7 +1706,7 @@ $$ It must be that the tangent line of $\vec{B}$ is parallel to $\vec{U} \cos(\alpha) + \vec{V} \sin(\alpha)$. To utilize this, we differentiate $\vec{B}$ using the facts that $\vec{U}' = -\kappa \vec{V}$ and $\vec{V}' = \kappa \vec{U}$. These coming from $\vec{U} = \vec{F}'$ and so it's derivative in $u$ has magnitude yielding the curvature, $\kappa$, and direction orthogonal to $\vec{U}$. - +$$ \begin{align*} \vec{B}'(u) &= \vec{F}'(u) -a \vec{U}' \cos(\alpha) -a \vec{U} (-\sin(\alpha)) \alpha' @@ -1703,12 +1718,13 @@ a (\kappa) \vec{U} \sin(\alpha) - a \vec{V} \cos(\alpha) \alpha' \\ + a(\alpha' - \kappa) \sin(\alpha) \vec{U} - a(\alpha' - \kappa) \cos(\alpha)\vec{V}. \end{align*} +$$ Extend the $2$-dimensional vectors to $3$ dimensions, by adding a zero $z$ component, then: - +$$ \begin{align*} \vec{0} &= (\vec{U} + a(\alpha' - \kappa) \sin(\alpha) \vec{U} @@ -1721,6 +1737,7 @@ a(\alpha' - \kappa) \cos(\alpha)\vec{V} \times \vec{U} \cos(\alpha) \\ a(\alpha'-\kappa) \cos^2(\alpha)) \vec{U} \times \vec{V} \\ &= (\sin(\alpha) + a (\alpha' - \kappa)) \vec{U} \times \vec{V}. \end{align*} +$$ The terms $\vec{U} \times\vec{U}$ and $\vec{V}\times\vec{V}$ being $\vec{0}$, due to properties of the cross product. This says the scalar part must be $0$, or @@ -1733,7 +1750,7 @@ $$ As for the second equation, from the expression for $\vec{B}'(u)$, after setting $a(\alpha'-\kappa) = -\sin(\alpha)$: - +$$ \begin{align*} \|\vec{B}'(u)\|^2 &= \| (1 -\sin(\alpha)\sin(\alpha)) \vec{U} +\sin(\alpha)\cos(\alpha) \vec{V} \|^2\\ @@ -1742,6 +1759,7 @@ As for the second equation, from the expression for $\vec{B}'(u)$, after setting &= \cos^2(\alpha)(\cos^2(\alpha) + \sin^2(\alpha))\\ &= \cos^2(\alpha). \end{align*} +$$ From this $\|\vec{B}(u)\| = |\cos(\alpha)\|$. But $1 = \|d\vec{B}/dv\| = \|d\vec{B}/du \| \cdot |du/dv|$ and $|dv/du|=|\cos(\alpha)|$ follows. @@ -1778,11 +1796,12 @@ Consider a parameterization of a curve by arc-length, $\vec\gamma(s) = \langle u Consider two nearby points $t$ and $t+\epsilon$ and the intersection of $l_t$ and $l_{t+\epsilon}$. That is, we need points $a$ and $b$ with: $l_t(a) = l_{t+\epsilon}(b)$. Setting the components equal, this is: - +$$ \begin{align*} u(t) - av'(t) &= u(t+\epsilon) - bv'(t+\epsilon) \\ v(t) + au'(t) &= v(t+\epsilon) + bu'(t+\epsilon). \end{align*} +$$ This is a linear equation in two unknowns ($a$ and $b$) which can be solved. Here is the value for `a`: @@ -1801,24 +1820,26 @@ out[a] Letting $\epsilon \rightarrow 0$ we get an expression for $a$ that will describe the evolute at time $t$ in terms of the function $\gamma$. Looking at the expression above, we can see that dividing the *numerator* by $\epsilon$ and taking a limit will yield $u'(t)^2 + v'(t)^2$. If the *denominator* has a limit after dividing by $\epsilon$, then we can find the description sought. 
Pursuing this leads to: - +$$ \begin{align*} \frac{u'(t) v'(t+\epsilon) - v'(t) u'(t+\epsilon)}{\epsilon} &= \frac{u'(t) v'(t+\epsilon) -u'(t)v'(t) + u'(t)v'(t)- v'(t) u'(t+\epsilon)}{\epsilon} \\ &= \frac{u'(t)(v'(t+\epsilon) -v'(t))}{\epsilon} + \frac{(u'(t)- u'(t+\epsilon))v'(t)}{\epsilon}, \end{align*} +$$ which in the limit will give $u'(t)v''(t) - u''(t) v'(t)$. All told, in the limit as $\epsilon \rightarrow 0$ we get - +$$ \begin{align*} a &= \frac{u'(t)^2 + v'(t)^2}{u'(t)v''(t) - v'(t) u''(t)} \\ &= 1/(\|\vec\gamma'\|\kappa) \\ &= 1/(\|\hat{T}\|\kappa) \\ &= 1/\kappa, \end{align*} +$$ with $\kappa$ being the curvature of the planar curve. That is, the evolute of $\vec\gamma$ is described by: @@ -1844,13 +1865,14 @@ plot_parametric!(0..2pi, t -> (rₑ₃(t) + Normal(rₑ₃, t)/curvature(rₑ₃ We computed the above illustration using $3$ dimensions (hence the use of `[1:2]...`) as the curvature formula is easier to express. Recall, the curvature also appears in the [Frenet-Serret](https://en.wikipedia.org/wiki/Frenet%E2%80%93Serret_formulas) formulas: $d\hat{T}/ds = \kappa \hat{N}$ and $d\hat{N}/ds = -\kappa \hat{T}+ \tau \hat{B}$. In a planar curve, as under consideration, the binormal is $\vec{0}$. This allows the computation of $\vec\beta(s)'$: - +$$ \begin{align*} \vec{\beta}' &= \frac{d(\vec\gamma + (1/ \kappa) \hat{N})}{ds}\\ &= \hat{T} + (-\frac{\kappa '}{\kappa ^2}\hat{N} + \frac{1}{\kappa} \hat{N}')\\ &= \hat{T} - \frac{\kappa '}{\kappa ^2}\hat{N} + \frac{1}{\kappa} (-\kappa \hat{T})\\ &= - \frac{\kappa '}{\kappa ^2}\hat{N}. \end{align*} +$$ We see $\vec\beta'$ is zero (the curve is non-regular) when $\kappa'(s) = 0$. The curvature changes from increasing to decreasing, or vice versa at each of the $4$ crossings of the major and minor axes - there are $4$ non-regular points, and we see $4$ cusps in the evolute. @@ -1915,11 +1937,12 @@ $$ If $\vec\gamma(s)$ is parameterized by arc length, then this simplifies quite a bit, as the unit tangent is just $\vec\gamma'(s)$ and the remaining arc length just $(s-a)$: - +$$ \begin{align*} \vec\beta_a(s) &= \vec\gamma(s) - \vec\gamma'(s) (s-a) \\ &=\vec\gamma(s) - \hat{T}_{\vec\gamma}(s)(s-a).\quad (a \text{ is the arc-length parameter}) \end{align*} +$$ With this characterization, we see several properties: @@ -1940,11 +1963,12 @@ $$ In the following we show that: - +$$ \begin{align*} \kappa_{\vec\beta_a}(s) &= 1/(s-a),\\ \hat{N}_{\vec\beta_a}(s) &= \hat{T}_{\vec\beta_a}'(s)/\|\hat{T}_{\vec\beta_a}'(s)\| = -\hat{T}_{\vec\gamma}(s). \end{align*} +$$ The first shows in a different way that when $s=a$ the curve is not regular, as the curvature fails to exists. In the above figure, when the involute touches $\vec\gamma$, there will be a cusp. @@ -1953,7 +1977,7 @@ The first shows in a different way that when $s=a$ the curve is not regular, as With these two identifications and using $\vec\gamma'(s) = \hat{T}_{\vec\gamma(s)}$, we have the evolute simplifies to - +$$ \begin{align*} \vec\beta_a(s) + \frac{1}{\kappa_{\vec\beta_a}(s)}\hat{N}_{\vec\beta_a}(s) &= @@ -1962,6 +1986,7 @@ With these two identifications and using $\vec\gamma'(s) = \hat{T}_{\vec\gamma(s \vec\gamma(s) + \hat{T}_{\vec\gamma}(s)(s-a) + \frac{1}{1/(s-a)} (-\hat{T}_{\vec\gamma}(s)) \\ &= \vec\gamma(s). \end{align*} +$$ That is the evolute of an involute of $\vec\gamma(s)$ is $\vec\gamma(s)$. @@ -1970,12 +1995,13 @@ That is the evolute of an involute of $\vec\gamma(s)$ is $\vec\gamma(s)$. 
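Before deriving these identities, a quick numeric spot check is reassuring. This sketch uses `ForwardDiff`, relied on throughout these notes, for the unit circle $\vec\gamma(s) = \langle \cos(s), \sin(s)\rangle$ with $a=0$; the claimed curvature of the involute is then $1/(s-a) = 1/s$. The helper names are chosen here for illustration.

```{julia}
using ForwardDiff, LinearAlgebra
# For the unit-speed circle γ(s) = ⟨cos(s), sin(s)⟩ and a = 0, the involute
# β₀(s) = γ(s) - γ′(s)·s should have curvature 1/s.
gamma(s) = [cos(s), sin(s)]
gamma_p(s) = [-sin(s), cos(s)]          # γ′, computed by hand
beta(s) = gamma(s) - gamma_p(s) * s

beta_p(s)  = ForwardDiff.derivative(beta, s)
beta_pp(s) = ForwardDiff.derivative(beta_p, s)
# signed curvature of a planar curve: (x′y″ - y′x″) / ‖β′‖³
curvature(s) = (beta_p(s)[1]*beta_pp(s)[2] - beta_p(s)[2]*beta_pp(s)[1]) / norm(beta_p(s))^3

s0 = 2.0
curvature(s0), 1/s0    # the two values agree
```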
We have: - +$$ \begin{align*} \beta_a(s) &= \vec\gamma - \vec\gamma'(s)(s-a)\\ \beta_a'(s) &= -\kappa_{\vec\gamma}(s)(s-a)\hat{N}_{\vec\gamma}(s)\\ \beta_a''(s) &= (-\kappa_{\vec\gamma}(s)(s-a))' \hat{N}_{\vec\gamma}(s) + (-\kappa_{\vec\gamma}(s)(s-a))(-\kappa_{\vec\gamma}\hat{T}_{\vec\gamma}(s)), \end{align*} +$$ the last line by the Frenet-Serret formula for *planar* curves which show $\hat{T}'(s) = \kappa(s) \hat{N}$ and $\hat{N}'(s) = -\kappa(s)\hat{T}(s)$. @@ -1984,11 +2010,12 @@ the last line by the Frenet-Serret formula for *planar* curves which show $\hat To compute the curvature of $\vec\beta_a$, we need to compute both: - +$$ \begin{align*} \| \vec\beta' \|^3 &= |\kappa^3 (s-a)^3|\\ \| \vec\beta' \times \vec\beta'' \| &= |\kappa(s)^3 (s-a)^2|, \end{align*} +$$ the last line using both $\hat{N}\times\hat{N} = \vec{0}$ and $\|\hat{N}\times\hat{T}\| = 1$. The curvature then is $\kappa_{\vec\beta_a}(s) = 1/(s-a)$. @@ -2672,13 +2699,14 @@ radioq(choices, answ) The evolute comes from the formula $\vec\gamma(T) - (1/\kappa(t)) \hat{N}(t)$. For hand computation, this formula can be explicitly given by two components $\langle X(t), Y(t) \rangle$ through: - +$$ \begin{align*} r(t) &= x'(t)^2 + y'(t)^2\\ k(t) &= x'(t)y''(t) - x''(t) y'(t)\\ X(t) &= x(t) - y'(t) r(t)/k(t)\\ Y(t) &= y(t) + x'(t) r(t)/k(t) \end{align*} +$$ Let $\vec\gamma(t) = \langle t, t^2 \rangle = \langle x(t), y(t)\rangle$ be a parameterization of a parabola. diff --git a/quarto/differentiable_vector_calculus/vectors.qmd b/quarto/differentiable_vector_calculus/vectors.qmd index f621e343..1a7fa0b4 100644 --- a/quarto/differentiable_vector_calculus/vectors.qmd +++ b/quarto/differentiable_vector_calculus/vectors.qmd @@ -443,12 +443,13 @@ $$ The left hand sides are in the form of a dot product, in this case $\langle a,b \rangle \cdot \langle x, y\rangle$ and $\langle a,b,c \rangle \cdot \langle x, y, z\rangle$ respectively. When there is a system of equations, something like: - +$$ \begin{align*} 3x &+ 4y &- 5z &= 10\\ 3x &- 5y &+ 7z &= 11\\ -3x &+ 6y &+ 9z &= 12, \end{align*} +$$ Then we might think of $3$ vectors $\langle 3,4,-5\rangle$, $\langle 3,-5,7\rangle$, and $\langle -3,6,9\rangle$ being dotted with $\langle x,y,z\rangle$. Mathematically, matrices and their associated algebra are used to represent this. In this example, the system of equations above would be represented by a matrix and two vectors: diff --git a/quarto/integral_vector_calculus/div_grad_curl.qmd b/quarto/integral_vector_calculus/div_grad_curl.qmd index f043b8c1..5180b127 100644 --- a/quarto/integral_vector_calculus/div_grad_curl.qmd +++ b/quarto/integral_vector_calculus/div_grad_curl.qmd @@ -105,7 +105,7 @@ annotate!([ (.5, -.1, "Δy"), (1+.75dx, .1, "Δx"), (1+dx+.1, .75, "Δz"), - (.5,.15,L"(x,y,z)"), + (.5,.15,"(x,y,z)"), (.45,.6, "î"), (1+.8dx, .7, "ĵ"), (.8, 1+dy+.1, "k̂") @@ -204,20 +204,20 @@ arrow!([1/2, 1-dx], .01 *[-1,0], linewidth=3, color=:blue) arrow!([1-dx, 1/2], .01 *[0, 1], linewidth=3, color=:blue) annotate!([ - (0,-1/16,L"(x,y)"), - (1, -1/16, L"(x+\Delta{x},y)"), - (0, 1+1/16, L"(x,y+\Delta{y})"), - (1/2, 4dx, L"\hat{i}"), - (1/2, 1-4dx, L"-\hat{i}"), - (3dx, 1/2, L"-\hat{j}"), - (1-3dx, 1/2, L"\hat{j}") + (0,-1/16,"(x,y)"), + (1, -1/16, "(x+Δx,y)"), + (0, 1+1/16, "(x,y+Δy)"), + (1/2, 4dx, "î}"), + (1/2, 1-4dx, "-î"), + (3dx, 1/2, "-ĵ"), + (1-3dx, 1/2, "ĵ") ]) ``` Let $F=\langle F_x, F_y\rangle$. 
For small enough values of $\Delta{x}$ and $\Delta{y}$ the line integral, $\oint_C F\cdot d\vec{r}$ can be *approximated* by $4$ terms: - +$$ \begin{align*} \left(F(x,y) \cdot \hat{i}\right)\Delta{x} &+ \left(F(x+\Delta{x},y) \cdot \hat{j}\right)\Delta{y} + @@ -230,6 +230,7 @@ F_x(x, y+\Delta{y}) (-\Delta{x}) + F_y(x,y) (-\Delta{y})\\ (F_y(x + \Delta{x}, y) - F_y(x, y))\Delta{y} - (F_x(x, y+\Delta{y})-F_x(x,y))\Delta{x}. \end{align*} +$$ The Riemann approximation allows a choice of evaluation point for Riemann integrable functions, and the choice here lends itself to further analysis. Were the above divided by $\Delta{x}\Delta{y}$, the area of the box, and a limit taken, partial derivatives appear to suggest this formula: @@ -275,7 +276,7 @@ annotate!([ (.5, -.1, "Δy"), (1+.75dx, .1, "Δx"), (1+dx+.1, .75, "Δz"), - (.5,.15,L"(x,y,z)"), + (.5,.15,"(x,y,z)"), (.45,.6, "î"), (1+.8dx, .667, "ĵ"), (.8, 1+dy+.067, "k̂"), @@ -309,10 +310,10 @@ annotate!([ (.9, 1+dx, "C₁"), - (2*dx, 1/2, L"\hat{T}=\hat{i}"), - (1+2*dx,1/2, L"\hat{T}=-\hat{i}"), - (1/2,-3/2*dx, L"\hat{T}=\hat{j}"), - (1/2, 1+(3/2)*dx, L"\hat{T}=-\hat{j}"), + (2*dx, 1/2, "T̂=î"), + (1+2*dx,1/2, "T̂=-î"), + (1/2,-3/2*dx, "T̂= ĵ"), + (1/2, 1+(3/2)*dx, "T̂=-ĵ"), (3dx,1-2dx, "(x,y,z+Δz)"), (4dx,2dx, "(x+Δx,y,z+Δz)"), @@ -326,18 +327,19 @@ p Now we compute the *line integral*. Consider the top face, $S_1$, connecting $(x,y,z+\Delta z), (x + \Delta x, y, z + \Delta z), (x + \Delta x, y + \Delta y, z + \Delta z), (x, y + \Delta y, z + \Delta z)$, Using the *right hand rule*, parameterize the boundary curve, $C_1$, in a counter clockwise direction so the right hand rule yields the outward pointing normal ($\hat{k}$). Then the integral $\oint_{C_1} F\cdot \hat{T} ds$ is *approximated* by the following Riemann sum of $4$ terms: - +$$ \begin{align*} F(x,y, z+\Delta{z}) \cdot \hat{i}\Delta{x} &+ F(x+\Delta x, y, z+\Delta{z}) \cdot \hat{j} \Delta y \\ &+ F(x, y+\Delta y, z+\Delta{z}) \cdot (-\hat{i}) \Delta{x} \\ &+ F(x, y, z+\Delta{z}) \cdot (-\hat{j}) \Delta{y}. \end{align*} +$$ (The points $c_i$ are chosen from the endpoints of the line segments.) - +$$ \begin{align*} \oint_{C_1} F\cdot \hat{T} ds &\approx (F_y(x+\Delta x, y, z+\Delta{z}) \\ @@ -345,17 +347,19 @@ F(x,y, z+\Delta{z}) \cdot \hat{i}\Delta{x} &+ F(x+\Delta x, y, z+\Delta{z}) \cd &- (F_x(x,y + \Delta{y}, z+\Delta{z}) \\ &- F_x(x, y, z+\Delta{z})) \Delta{x} \end{align*} +$$ As before, were this divided by the *area* of the surface, we have after rearranging and cancellation: - +$$ \begin{align*} \frac{1}{\Delta{S_1}} \oint_{C_1} F \cdot \hat{T} ds &\approx \frac{F_y(x+\Delta x, y, z+\Delta{z}) - F_y(x, y, z+\Delta{z})}{\Delta{x}}\\ &- \frac{F_x(x, y+\Delta y, z+\Delta{z}) - F_x(x, y, z+\Delta{z})}{\Delta{y}}. \end{align*} +$$ In the limit, as $\Delta{S} \rightarrow 0$, this will converge to $\partial{F_y}/\partial{x}-\partial{F_x}/\partial{y}$. @@ -367,7 +371,7 @@ Had the bottom of the box been used, a similar result would be found, up to a mi Unlike the two dimensional case, there are other directions to consider and here the other sides will yield different answers. Consider now the face connecting $(x,y,z), (x+\Delta{x}, y, z), (x+\Delta{x}, y, z + \Delta{z})$, and $(x,y,z +\Delta{z})$ with outward pointing normal $-\hat{j}$. Let $S_2$ denote this face and $C_2$ describe its boundary. Orient this curve so that the right hand rule points in the $-\hat{j}$ direction (the outward pointing normal). 
Then, as before, we can approximate: - +$$ \begin{align*} \oint_{C_2} F \cdot \hat{T} ds &\approx @@ -378,6 +382,7 @@ F(x,y,z) \cdot \hat{i} \Delta{x} \\ &= (F_z(x+\Delta{x},y,z) - F_z(x, y, z))\Delta{z} - (F_x(x,y,z+\Delta{z}) - F(x,y,z)) \Delta{x}. \end{align*} +$$ Dividing by $\Delta{S}=\Delta{x}\Delta{z}$ and taking a limit will give: @@ -401,16 +406,18 @@ $$ In short, depending on the face chosen, a different answer is given, but all have the same type. +::: {.callout-note icon=false} +## The curl -> Define the *curl* of a $3$-dimensional vector field $F=\langle F_x,F_y,F_z\rangle$ by: -> -> $$ -> \text{curl}(F) = -> \langle \frac{\partial{F_z}}{\partial{y}} - \frac{\partial{F_y}}{\partial{z}}, -> \frac{\partial{F_x}}{\partial{z}} - \frac{\partial{F_z}}{\partial{x}}, -> \frac{\partial{F_y}}{\partial{x}} - \frac{\partial{F_x}}{\partial{y}} \rangle. -> $$ +Define the *curl* of a $3$-dimensional vector field $F=\langle F_x,F_y,F_z\rangle$ by: +$$ +\text{curl}(F) = +\langle \frac{\partial{F_z}}{\partial{y}} - \frac{\partial{F_y}}{\partial{z}}, +\frac{\partial{F_x}}{\partial{z}} - \frac{\partial{F_z}}{\partial{x}}, +\frac{\partial{F_y}}{\partial{x}} - \frac{\partial{F_x}}{\partial{y}} \rangle. +$$ +::: If $S$ is some surface with closed boundary $C$ oriented so that the unit normal, $\hat{N}$, of $S$ is given by the right hand rule about $C$, then @@ -474,7 +481,7 @@ The divergence, gradient, and curl all involve partial derivatives. There is a n This is a *vector differential operator* that acts on functions and vector fields through the typical notation to yield the three operations: - +$$ \begin{align*} \nabla{f} &= \langle \frac{\partial{f}}{\partial{x}}, @@ -512,6 +519,7 @@ F_x & F_y & F_z \end{bmatrix} ,\quad\text{the curl}. \end{align*} +$$ :::{.callout-note} @@ -842,12 +850,13 @@ Let $f$ and $g$ denote scalar functions, $R^3 \rightarrow R$ and $F$ and $G$ be As with the sum rule of univariate derivatives, these operations satisfy: - +$$ \begin{align*} \nabla(f + g) &= \nabla{f} + \nabla{g}\\ \nabla\cdot(F+G) &= \nabla\cdot{F} + \nabla\cdot{G}\\ \nabla\times(F+G) &= \nabla\times{F} + \nabla\times{G}. \end{align*} +$$ ### Product rule @@ -856,12 +865,13 @@ As with the sum rule of univariate derivatives, these operations satisfy: The product rule $(uv)' = u'v + uv'$ has related formulas: - +$$ \begin{align*} \nabla{(fg)} &= (\nabla{f}) g + f\nabla{g} = g\nabla{f} + f\nabla{g}\\ \nabla\cdot{fF} &= (\nabla{f})\cdot{F} + f(\nabla\cdot{F})\\ \nabla\times{fF} &= (\nabla{f})\times{F} + f(\nabla\times{F}). \end{align*} +$$ ### Rules over cross products @@ -870,12 +880,13 @@ The product rule $(uv)' = u'v + uv'$ has related formulas: The cross product of two vector fields is a vector field for which the divergence and curl may be taken. There are formulas to relate to the individual terms: - +$$ \begin{align*} \nabla\cdot(F \times G) &= (\nabla\times{F})\cdot G - F \cdot (\nabla\times{G})\\ \nabla\times(F \times G) &= F(\nabla\cdot{G}) - G(\nabla\cdot{F}) + (G\cdot\nabla)F-(F\cdot\nabla)G\\ &= \nabla\cdot(BA^t - AB^t). \end{align*} +$$ The curl formula is more involved. @@ -921,7 +932,7 @@ Second, This is not as clear, but can be seen algebraically as terms cancel. First: - +$$ \begin{align*} \nabla\cdot(\nabla\times{F}) &= \langle @@ -938,6 +949,7 @@ This is not as clear, but can be seen algebraically as terms cancel. 
First: \left(\frac{\partial^2{F_x}}{\partial{z}\partial{y}} - \frac{\partial^2{F_z}}{\partial{x}\partial{y}}\right) + \left(\frac{\partial^2{F_y}}{\partial{x}\partial{z}} - \frac{\partial^2{F_x}}{\partial{y}\partial{z}}\right) \end{align*} +$$ Focusing on one component function, $F_z$ say, we see this contribution: @@ -974,10 +986,10 @@ apoly!(ps, linewidth=3, color=:red) ps = [[1,0],[1+dx, dy],[1+dx, 1+dy],[1,1]] apoly!(ps, linewidth=3, color=:green) -annotate!(dx+.02, dy-0.05, L"P_1") -annotate!(0+0.05, 0 - 0.02, L"P_2") -annotate!(1+0.05, 0 - 0.02, L"P_3") -annotate!(1+dx+.02, dy-0.05, L"P_4") +annotate!(dx+.02, dy-0.05, "P₁") +annotate!(0+0.05, 0 - 0.02, "P₂") +annotate!(1+0.05, 0 - 0.02, "P₃") +annotate!(1+dx+.02, dy-0.05, "P₄") p ``` @@ -1014,7 +1026,7 @@ This is because of how the line integrals are oriented so that the right-hand ru The [invariance of charge](https://en.wikipedia.org/wiki/Maxwell%27s_equations#Charge_conservation) can be derived as a corollary of Maxwell's equation. The divergence of the curl of the magnetic field is $0$, leading to: - +$$ \begin{align*} 0 &= \nabla\cdot(\nabla\times{B}) \\ &= @@ -1024,6 +1036,7 @@ The [invariance of charge](https://en.wikipedia.org/wiki/Maxwell%27s_equations#C &= \mu_0(\nabla\cdot{J} + \frac{\partial{\rho}}{\partial{t}}). \end{align*} +$$ That is $\nabla\cdot{J} = -\partial{\rho}/\partial{t}$. This says any change in the charge density in time ($\partial{\rho}/\partial{t}$) is balanced off by a divergence in the electric current density ($\nabla\cdot{J}$). That is, charge can't be created or destroyed in an isolated system. @@ -1048,7 +1061,7 @@ $$ Without explaining why, these values can be computed using volume and surface integrals: - +$$ \begin{align*} \phi(\vec{r}') &= \frac{1}{4\pi} \int_V \frac{\nabla \cdot F(\vec{r})}{\|\vec{r}'-\vec{r} \|} dV - @@ -1056,16 +1069,18 @@ Without explaining why, these values can be computed using volume and surface in A(\vec{r}') &= \frac{1}{4\pi} \int_V \frac{\nabla \times F(\vec{r})}{\|\vec{r}'-\vec{r} \|} dV + \frac{1}{4\pi} \oint_S \frac{F(\vec{r})}{\|\vec{r}'-\vec{r} \|} \times \hat{N} dS. \end{align*} +$$ If $V = R^3$, an unbounded domain, *but* $F$ *vanishes* faster than $1/r$, then the theorem still holds with just the volume integrals: - +$$ \begin{align*} \phi(\vec{r}') &=\frac{1}{4\pi} \int_V \frac{\nabla \cdot F(\vec{r})}{\|\vec{r}'-\vec{r} \|} dV\\ A(\vec{r}') &= \frac{1}{4\pi} \int_V \frac{\nabla \times F(\vec{r})}{\|\vec{r}'-\vec{r}\|} dV. \end{align*} +$$ ## Change of variable @@ -1080,7 +1095,7 @@ Some details are [here](https://en.wikipedia.org/wiki/Curvilinear_coordinates), We restrict to $n=3$ and use $(x,y,z)$ for Cartesian coordinates and $(u,v,w)$ for an *orthogonal* curvilinear coordinate system, such as spherical or cylindrical. If $\vec{r} = \langle x,y,z\rangle$, then - +$$ \begin{align*} d\vec{r} &= \langle dx,dy,dz \rangle = J \langle du,dv,dw\rangle\\ &= @@ -1091,6 +1106,7 @@ d\vec{r} &= \langle dx,dy,dz \rangle = J \langle du,dv,dw\rangle\\ \frac{\partial{\vec{r}}}{\partial{v}} dv + \frac{\partial{\vec{r}}}{\partial{w}} dw. \end{align*} +$$ The term ${\partial{\vec{r}}}/{\partial{u}}$ is tangent to the curve formed by *assuming* $v$ and $w$ are constant and letting $u$ vary. Similarly for the other partial derivatives. Orthogonality assumes that at every point, these tangent vectors are orthogonal. 
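This orthogonality can be checked symbolically. Here is a sketch using `SymPy`, as elsewhere in these notes, for spherical coordinates (with $\phi$ the polar angle, as below): the three tangent vectors are computed and their pairwise dot products should all simplify to $0$.

```{julia}
using SymPy
# Pairwise orthogonality of the tangent vectors for spherical coordinates.
@syms r::positive theta::real phi::real
rvec = [r*cos(theta)*sin(phi), r*sin(theta)*sin(phi), r*cos(phi)]

e_r     = diff.(rvec, r)       # ∂r⃗/∂r
e_theta = diff.(rvec, theta)   # ∂r⃗/∂θ
e_phi   = diff.(rvec, phi)     # ∂r⃗/∂ϕ

simplify.((sum(e_r .* e_theta), sum(e_r .* e_phi), sum(e_theta .* e_phi)))
```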
@@ -1138,7 +1154,7 @@ This uses orthogonality, so $\hat{e}_v \times \hat{e}_w$ is parallel to $\hat{e} The volume element is found by *projecting* $d\vec{r}$ onto the $\hat{e}_u$, $\hat{e}_v$, $\hat{e}_w$ coordinate system through $(d\vec{r} \cdot\hat{e}_u) \hat{e}_u$, $(d\vec{r} \cdot\hat{e}_v) \hat{e}_v$, and $(d\vec{r} \cdot\hat{e}_w) \hat{e}_w$. Then forming the triple scalar product to compute the volume of the parallelepiped: - +$$ \begin{align*} \left[(d\vec{r} \cdot\hat{e}_u) \hat{e}_u\right] \cdot \left( @@ -1149,6 +1165,7 @@ The volume element is found by *projecting* $d\vec{r}$ onto the $\hat{e}_u$, $\h &= h_u h_v h_w du dv dw, \end{align*} +$$ as the unit vectors are orthonormal, their triple scalar product is $1$ and $d\vec{r}\cdot\hat{e}_u = h_u du$, etc. @@ -1214,7 +1231,7 @@ p The tangent vectors found from the partial derivatives of $\vec{r}$: - +$$ \begin{align*} \frac{\partial{\vec{r}}}{\partial{r}} &= \langle \cos(\theta) \cdot \sin(\phi), \sin(\theta) \cdot \sin(\phi), \cos(\phi)\rangle,\\ @@ -1223,12 +1240,13 @@ The tangent vectors found from the partial derivatives of $\vec{r}$: \frac{\partial{\vec{r}}}{\partial{\phi}} &= \langle r\cdot\cos(\theta)\cdot\cos(\phi), r\cdot\sin(\theta)\cdot\cos(\phi), -r\cdot\sin(\phi) \rangle. \end{align*} +$$ With this, we have $h_r=1$, $h_\theta=r\sin(\phi)$, and $h_\phi = r$. So that - +$$ \begin{align*} dl &= \sqrt{dr^2 + (r\sin(\phi)d\theta)^2 + (rd\phi)^2},\\ dS_r &= r^2\sin(\phi)d\theta d\phi,\\ @@ -1236,6 +1254,7 @@ dS_\theta &= rdr d\phi,\\ dS_\phi &= r\sin(\phi)dr d\theta, \quad\text{and}\\ dV &= r^2\sin(\phi) drd\theta d\phi. \end{align*} +$$ The following visualizes the volume and the surface elements. @@ -1292,7 +1311,7 @@ p If $f$ is a scalar function then $df = \nabla{f} \cdot d\vec{r}$ by the chain rule. Using the curvilinear coordinates: - +$$ \begin{align*} df &= \frac{\partial{f}}{\partial{u}} du + @@ -1303,6 +1322,7 @@ df &= \frac{1}{h_v}\frac{\partial{f}}{\partial{v}} h_vdv + \frac{1}{h_w}\frac{\partial{f}}{\partial{w}} h_wdw. \end{align*} +$$ But, as was used above, $d\vec{r} \cdot \hat{e}_u = h_u du$, etc. so $df$ can be re-expressed as: diff --git a/quarto/integral_vector_calculus/double_triple_integrals.qmd b/quarto/integral_vector_calculus/double_triple_integrals.qmd index 3f61d7a4..82866498 100644 --- a/quarto/integral_vector_calculus/double_triple_integrals.qmd +++ b/quarto/integral_vector_calculus/double_triple_integrals.qmd @@ -391,17 +391,19 @@ By "iterated" we mean performing two different definite integrals. For example, The question then: under what conditions will the three integrals be equal? +::: {.callout-note icon=false} +## [Fubini](https://math.okstate.edu/people/lebl/osu4153-s16/chapter10-ver1.pdf) -> [Fubini](https://math.okstate.edu/people/lebl/osu4153-s16/chapter10-ver1.pdf). Let $R \times S$ be a closed rectangular region in $R^n \times R^m$. Suppose $f$ is bounded. Define $f_x(y) = f(x,y)$ and $f^y(x) = f(x,y)$ where $x$ is in $R^n$ and $y$ in $R^m$. *If* $f_x$ and $f^y$ are integrable then -> -> $$ -> \iint_{R\times S}fdV = \iint_R \left(\iint_S f_x(y) dy\right) dx -> = \iint_S \left(\iint_R f^y(x) dx\right) dy. -> $$ - +Let $R \times S$ be a closed rectangular region in $R^n \times R^m$. Suppose $f$ is bounded. Define $f_x(y) = f(x,y)$ and $f^y(x) = f(x,y)$ where $x$ is in $R^n$ and $y$ in $R^m$. *If* $f_x$ and $f^y$ are integrable then +$$ +\iint_{R\times S}fdV = \iint_R \left(\iint_S f_x(y) dy\right) dx += \iint_S \left(\iint_R f^y(x) dx\right) dy. 
+$$ Similarly, if $f^y$ is integrable for all $y$, then $\iint_{R\times S}fdV =\iint_S \iint_R f(x,y) dx dy$. +::: + An immediate corollary is that the above holds for continuous functions when $R$ and $S$ are bounded, the case described here. @@ -939,11 +941,12 @@ In [Katz](http://www.jstor.org/stable/2689856) a review of the history of "chang We view $R$ in two coordinate systems $(x,y)$ and $(u,v)$. We have that - +$$ \begin{align*} dx &= A du + B dv\\ dy &= C du + D dv, \end{align*} +$$ where $A = \partial{x}/\partial{u}$, $B = \partial{x}/\partial{v}$, $C= \partial{y}/\partial{u}$, and $D = \partial{y}/\partial{v}$. Lagrange, following Euler, first sets $x$ to be constant (as is done in iterated integration). Hence, $dx = 0$ and so $du = -(B/A) dv$ and, after substitution, $dy = (D-C(B/A))dv$. Then Lagrange set $y$ to be a constant, so $dy = 0$ and hence $dv=0$ so $dx = Adu$. The area "element" $dx dy = A du \cdot (D - C(B/A)) dv = (AD - BC) du dv$. Since areas and volumes are non-negative, the absolute value is used. With this, we have "$dxdy = |AD-BC|du dv$" as the analog of $dx = g'(u) du$. @@ -952,11 +955,12 @@ where $A = \partial{x}/\partial{u}$, $B = \partial{x}/\partial{v}$, $C= \partial The expression $AD - BC$ was also derived by Euler, by related means. Lagrange extended the analysis to 3 dimensions. Before doing so, it is helpful to understand the problem from a geometric perspective. Euler was attempting to understand the effects of the following change of variable: - +$$ \begin{align*} x &= a + mt + \sqrt{1-m^2} v\\ y & = b + \sqrt{1-m^2}t -mv \end{align*} +$$ Euler knew this to be a clockwise *rotation* by an angle $\theta$ with $\cos(\theta) = m$, a *reflection* through the $x$ axis, and a translation by $\langle a, b\rangle$. All these *should* preserve the area represented by $dx dy$, so he was *expecting* $dx dy = dt dv$. @@ -1090,13 +1094,15 @@ Using the fact that the two vectors involved are columns in the Jacobian of the The absolute value of the determinant of the Jacobian is the multiplying factor that is seen in the change of variable formula for all dimensions: +::: {.callout-note icon=false} +## [Change of variable](https://en.wikipedia.org/wiki/Integration_by_substitution#Substitution_for_multiple_variables) -> [Change of variable](https://en.wikipedia.org/wiki/Integration_by_substitution#Substitution_for_multiple_variables) Let $U$ be an open set in $R^n$, $G:U \rightarrow R^n$ be an *injective* differentiable function with *continuous* partial derivatives. If $f$ is continuous and compactly supported, then -> -> $$ -> \iint_{G(S)} f(\vec{x}) dV = \iint_S (f \circ G)(\vec{u}) |\det(J_G)(\vec{u})| dU. -> $$ +Let $U$ be an open set in $R^n$, $G:U \rightarrow R^n$ be an *injective* differentiable function with *continuous* partial derivatives. If $f$ is continuous and compactly supported, then +$$ +\iint_{G(S)} f(\vec{x}) dV = \iint_S (f \circ G)(\vec{u}) |\det(J_G)(\vec{u})| dU. +$$ +::: For the one-dimensional case, there is no absolute value, but there the interval is reversed, producing "negative" area. This is not the case here, where $S$ is parameterized to give positive volume. @@ -1308,12 +1314,13 @@ What about other triangles, say the triangle bounded by $x=0$, $y=0$ and $y-x=1$ This can be seen as a reflection through the line $x=1/2$ of the triangle above. 
If $G_1$ represents the mapping from $U = [0,1]\times[0,1]$ into the triangle of the last problem, and $G_2$ represents the reflection through the line $x=1/2$, then the transformation $G_2 \circ G_1$ will map the box $U$ into the desired region. By the chain rule, we have:

-
+$$
\begin{align*}
\int_{(G_2\circ G_1)(U)} f dx
&= \int_U (f\circ G_2 \circ G_1) |\det(J_{G_2 \circ G_1})| du \\
&= \int_U (f\circ G_2 \circ G_1) |\det(J_{G_2}(G_1(u)))||\det(J_{G_1}(u))| du.
\end{align*}
+$$

(In [Katz](http://www.jstor.org/stable/2689856) it is mentioned that Jacobi showed this in 1841.)

diff --git a/quarto/integral_vector_calculus/line_integrals.qmd b/quarto/integral_vector_calculus/line_integrals.qmd
index 6e246749..94b0b8da 100644
--- a/quarto/integral_vector_calculus/line_integrals.qmd
+++ b/quarto/integral_vector_calculus/line_integrals.qmd
@@ -166,13 +166,14 @@ However, it proves more interesting to define an integral incorporating how prop

The canonical example is [work](https://en.wikipedia.org/wiki/Work_(physics)), which is a measure of a force times a distance. For an object following a path, the work done is still a force times a distance, but only that force in the direction of the motion is considered. (The *constraint force* keeping the object on the path does no work.) Mathematically, $\hat{T}$ describes the direction of motion along a path, so the work done in moving an object over a small segment of the path is $(F\cdot\hat{T}) \Delta{s}$. Adding up incremental amounts of work leads to a Riemann sum for a line integral involving a vector field.

+::: {.callout-note icon=false}
+## Work

-> The *work* done in moving an object along a path $C$ by a force field, $F$, is given by the integral
->
-> $$
-> \int_C (F \cdot \hat{T}) ds = \int_C F\cdot d\vec{r} = \int_a^b ((F\circ\vec{r}) \cdot \frac{d\vec{r}}{dt})(t) dt.
-> $$
-
+The *work* done in moving an object along a path $C$ by a force field, $F$, is given by the integral
+$$
+\int_C (F \cdot \hat{T}) ds = \int_C F\cdot d\vec{r} = \int_a^b ((F\circ\vec{r}) \cdot \frac{d\vec{r}}{dt})(t) dt.
+$$
+:::

---


In the $n=2$ case, there is another useful interpretation of the line integral. In this dimension the normal vector, $\hat{N}$, is well defined in terms of the tangent vector, $\hat{T}$, through a rotation: $\langle a,b\rangle^t = \langle b,-a\rangle$. (The negative, $\langle -b,a\rangle$, is also a candidate; the difference in this choice would lead to a sign difference in the answer.) This allows the definition of a different line integral, called a flow integral, as detailed later:

+::: {.callout-note icon=false}
+## Flow

-> The *flow* across a curve $C$ is given by
->
-> $$
-> \int_C (F\cdot\hat{N}) ds = \int_a^b (F \circ \vec{r})(t) \cdot (\vec{r}'(t))^t dt.
-> $$

+The *flow* across a curve $C$ is given by
+$$
+\int_C (F\cdot\hat{N}) ds = \int_a^b (F \circ \vec{r})(t) \cdot (\vec{r}'(t))^t dt.
+$$
+:::

### Examples

@@ -296,9 +299,11 @@ using the Fundamental Theorem of Calculus.

The main point above is that *if* the vector field is the gradient of a scalar field, then the work done depends *only* on the endpoints of the path and not the path itself.
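This point can be illustrated numerically before it is stated formally. A sketch, assuming `QuadGK` is available, as in other sections: take $F = \nabla{f}$ for $f(x,y) = xy$ and compute the work along two different paths joining $(0,0)$ to $(1,1)$:

```{julia}
using QuadGK

F(x, y) = [y, x]                      # F = ∇f for f(x, y) = x*y
r₁(t) = [t, t];   dr₁(t) = [1, 1]     # straight-line path
r₂(t) = [t, t^2]; dr₂(t) = [1, 2t]    # parabolic path, same endpoints

w₁ = quadgk(t -> sum(F(r₁(t)...) .* dr₁(t)), 0, 1)[1]
w₂ = quadgk(t -> sum(F(r₂(t)...) .* dr₂(t)), 0, 1)[1]
w₁ ≈ w₂ ≈ 1.0                         # both equal f(1,1) - f(0,0)
```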
+::: {.callout-note icon=false}
+## Conservative vector field

-> **Conservative vector field**: If $F$ is a vector field defined in an *open* region $R$; $A$ and $B$ are points in $R$ and *if* for *any* curve $C$ in $R$ connecting $A$ to $B$, the line integral of $F \cdot \vec{T}$ over $C$ depends *only* on the endpoint $A$ and $B$ and not the path, then the line integral is called *path indenpendent* and the field is called a *conservative field*.
-
+If $F$ is a vector field defined in an *open* region $R$; $A$ and $B$ are points in $R$ and *if* for *any* curve $C$ in $R$ connecting $A$ to $B$, the line integral of $F \cdot \vec{T}$ over $C$ depends *only* on the endpoints $A$ and $B$ and not the path, then the line integral is called *path independent* and the field is called a *conservative field*.
+:::

The force of gravity is the gradient of a scalar field. As such, the two integrals above which yield $0$ could have been computed more directly. The particular scalar field is $f = -GMm/\|\vec{r}\|$, which goes by the name the gravitational *potential* function. As seen, $f$ depends only on magnitude, and as the endpoints of the path in the example have the same distance to the origin, the work integral, $(f\circ\vec{r})(b) - (f\circ\vec{r})(a)$ will be $0$.

@@ -345,17 +350,19 @@ There are technical assumptions about curves and regions that are necessary for

### The fundamental theorem of line integrals

-The fact that work in a potential field is path independent is a consequence of the Fundamental Theorem of Line [Integrals](https://en.wikipedia.org/wiki/Gradient_theorem):
+The fact that work in a potential field is path independent is a consequence of the following.

+::: {.callout-note icon=false}
+## The Fundamental Theorem of Line [Integrals](https://en.wikipedia.org/wiki/Gradient_theorem)

-> Let $U$ be an open subset of $R^n$, $f: U \rightarrow R$ a *differentiable* function and $\vec{r}: R \rightarrow R^n$ a differentiable function such that the the path $C = \vec{r}(t)$, $a\leq t\leq b$ is contained in $U$. Then
->
-> $$
-> \int_C \nabla{f} \cdot d\vec{r} =
-> \int_a^b \nabla{f}(\vec{r}(t)) \cdot \vec{r}'(t) dt =
-> f(\vec{r}(b)) - f(\vec{r}(a)).
-> $$
+Let $U$ be an open subset of $R^n$, $f: U \rightarrow R$ a *differentiable* function and $\vec{r}: R \rightarrow R^n$ a differentiable function such that the path $C = \vec{r}(t)$, $a\leq t\leq b$ is contained in $U$. Then
+$$
+\int_C \nabla{f} \cdot d\vec{r} =
+\int_a^b \nabla{f}(\vec{r}(t)) \cdot \vec{r}'(t) dt =
+f(\vec{r}(b)) - f(\vec{r}(a)).
+$$
+:::

That is, a line integral through a gradient field can be evaluated by evaluating the original scalar field at the endpoints of the curve. In other words, line integrals through gradient fields are conservative.

diff --git a/quarto/integral_vector_calculus/review.qmd b/quarto/integral_vector_calculus/review.qmd
index 812a06e6..7e56b1e6 100644
--- a/quarto/integral_vector_calculus/review.qmd
+++ b/quarto/integral_vector_calculus/review.qmd
@@ -99,12 +99,13 @@ In dimension $m=3$, the **binormal** vector, $\hat{B}$, is the unit vector $\hat

The [Frenet-Serret]() formulas define the **curvature**, $\kappa$, and the **torsion**, $\tau$, by

-
+$$
\begin{align*}
\frac{d\hat{T}}{ds} &= & \kappa \hat{N} &\\
\frac{d\hat{N}}{ds} &= -\kappa\hat{T} & & + \tau\hat{B}\\
\frac{d\hat{B}}{ds} &= & -\tau\hat{N}&
\end{align*}
+$$

These formulas apply in dimension $m=2$ with $\hat{B}=\vec{0}$.

@@ -122,13 +123,14 @@ The chain rule says $(\vec{r}(g(t))' = \vec{r}'(g(t)) g'(t)$.
A scalar function, $f:R^n\rightarrow R$, $n > 1$, has **partial derivatives** defined. For $n=2$, these are:

-
+$$
\begin{align*}
\frac{\partial{f}}{\partial{x}}(x,y) &= \lim_{h\rightarrow 0}
\frac{f(x+h,y)-f(x,y)}{h}\\
\frac{\partial{f}}{\partial{y}}(x,y) &= \lim_{h\rightarrow 0}
\frac{f(x,y+h)-f(x,y)}{h}.
\end{align*}
+$$

The generalization to $n>2$ is clear - the partial derivative in $x_i$ is the derivative of $f$ when the *other* $x_j$ are held constant.

@@ -356,7 +358,7 @@ $$

In two dimensions, we have the following interpretations:

-
+$$
\begin{align*}
\iint_R dA &= \text{area of } R\\
\iint_R \rho dA &= \text{mass with constant density }\rho\\
@@ -364,12 +366,13 @@ In two dimensions, we have the following interpretations:
\frac{1}{\text{area}}\iint_R x \rho(x,y)dA &= \text{centroid of region in } x \text{ direction}\\
\frac{1}{\text{area}}\iint_R y \rho(x,y)dA &= \text{centroid of region in } y \text{ direction}
\end{align*}
+$$

In three dimensions, we have the following interpretations:

-
+$$
\begin{align*}
\iint_VdV &= \text{volume of } V\\
\iint_V \rho dV &= \text{mass with constant density }\rho\\
@@ -378,6 +381,7 @@ In three dimensions, we have the following interpretations:
\frac{1}{\text{volume}}\iint_V y \rho(x,y)dV &= \text{centroid of volume in } y \text{ direction}\\
\frac{1}{\text{volume}}\iint_V z \rho(x,y)dV &= \text{centroid of volume in } z \text{ direction}
\end{align*}
+$$

To compute integrals over non-box-like regions, Fubini's theorem may be utilized. Alternatively, a **transformation** of variables

diff --git a/quarto/integral_vector_calculus/stokes_theorem.qmd b/quarto/integral_vector_calculus/stokes_theorem.qmd
index afd2bbaa..0901e9f5 100644
--- a/quarto/integral_vector_calculus/stokes_theorem.qmd
+++ b/quarto/integral_vector_calculus/stokes_theorem.qmd
@@ -109,7 +109,7 @@ p = plot(legend=false, xticks=nothing, yticks=nothing, border=:none, ylim=(-1/2,
for m in ms
drawf!(p, f, m, 0.9*dx/2)
end
-annotate!([(ms[6]-dx/2,-0.3, L"x_{i-1}"), (ms[6]+dx/2,-0.3, L"x_{i}")])
+annotate!([(ms[6]-dx/2,-0.3, "xᵢ₋₁"), (ms[6]+dx/2,-0.3, "xᵢ")])
p
```

@@ -214,18 +214,20 @@ However, the microscopic boundary integrals have cancellations that lead to a ma

This all suggests that the flow integral around the surface of the larger region (the blue square) is equivalent to the integral of the curl component over the region. This is [Green](https://en.wikipedia.org/wiki/Green%27s_theorem)'s theorem, as stated by Wikipedia:

+::: {.callout-note icon=false}
+## Green's theorem

-> **Green's theorem**: Let $C$ be a positively oriented, piecewise smooth, simple closed curve in the plane, and let $D$ be the region bounded by $C$. If $F=\langle F_x, F_y\rangle$, is a vector field on an open region containing $D$ having continuous partial derivatives then:
->
-> $$
-> \oint_C F\cdot\hat{T}ds =
-> \iint_D \left(
-> \frac{\partial{F_y}}{\partial{x}} - \frac{\partial{F_x}}{\partial{y}}
-> \right) dA=
-> \iint_D \text{curl}(F)dA.
-> $$
+Let $C$ be a positively oriented, piecewise smooth, simple closed curve in the plane, and let $D$ be the region bounded by $C$. If $F=\langle F_x, F_y\rangle$ is a vector field on an open region containing $D$ having continuous partial derivatives, then:
+$$
+\oint_C F\cdot\hat{T}ds =
+\iint_D \left(
+\frac{\partial{F_y}}{\partial{x}} - \frac{\partial{F_x}}{\partial{y}}
+\right) dA=
+\iint_D \text{curl}(F)dA.
+$$
+:::

The statement of the theorem applies only to regions whose boundaries are simple closed curves. Not all simple regions have such boundaries.
An annulus, for example. This is a restriction that will be generalized.

@@ -271,11 +273,12 @@ r(t) = [a*cos(t),b*sin(t)]

To compute the area of the triangle with vertices $(0,0)$, $(a,0)$ and $(0,b)$ we can orient the boundary counterclockwise. Let $A$ be the line segment from $(0,b)$ to $(0,0)$, $B$ be the line segment from $(0,0)$ to $(a,0)$, and $C$ be the other. Then

-
+$$
\begin{align*}
\frac{1}{2} \int_A F\cdot\hat{T} ds &=\frac{1}{2} \int_A -ydx = 0\\
\frac{1}{2} \int_B F\cdot\hat{T} ds &=\frac{1}{2} \int_B xdy = 0,
\end{align*}
+$$

as on $A$, $x=0$ and $dx=0$, and on $B$, $y=0$ and $dy=0$.

@@ -311,7 +314,7 @@ For the two dimensional case the curl is a scalar. *If* $F = \langle F_x, F_y\ra

Now assume $\partial{F_y}/\partial{x} - \partial{F_x}/\partial{y} = 0$. Let $P$ and $Q$ be two points in the plane. Take any path, $C_1$, from $P$ to $Q$ and any return path, $C_2$, from $Q$ to $P$ that do not cross and such that $C$, the concatenation of the two paths, satisfies Green's theorem. Then, as $F$ is continuous on an open region containing $D$, we have:

-
+$$
\begin{align*}
0 &= \iint_D 0 dA \\
&=
@@ -321,6 +324,7 @@ Now assume $\partial{F_y}/\partial{x} - \partial{F_x}/\partial{y} = 0$. Let $P$
&=
\int_{C_1} F \cdot \hat{T} ds + \int_{C_2}F \cdot \hat{T} ds.
\end{align*}
+$$

Reversing $C_2$ to go from $P$ to $Q$, we see the two work integrals are identical; that is, the field is conservative.

@@ -339,13 +343,14 @@ For example, let $F(x,y) = \langle \sin(xy), \cos(xy) \rangle$. Is this a conser

We can check by taking partial derivatives. Those of interest are:

-
+$$
\begin{align*}
\frac{\partial{F_y}}{\partial{x}} &= \frac{\partial{(\cos(xy))}}{\partial{x}} =
-\sin(xy) y,\\
\frac{\partial{F_x}}{\partial{y}} &= \frac{\partial{(\sin(xy))}}{\partial{y}} =
\cos(xy)x.
\end{align*}
+$$

It is not the case that $\partial{F_y}/\partial{x} - \partial{F_x}/\partial{y}=0$, so this vector field is *not* conservative.

@@ -417,24 +422,26 @@ p

Let $A$ label the red line, $B$ the green curve, $C$ the blue line, and $D$ the black line. Then the area is given from Green's theorem by considering half of the line integral of $F(x,y) = \langle -y, x\rangle$, or $\oint_C (xdy - ydx)$. To that end we have:

-
+$$
\begin{align*}
\int_A (xdy - ydx) &= a(-f(a))\\
\int_C (xdy - ydx) &= b f(b)\\
\int_D (xdy - ydx) &= 0\\
\end{align*}
+$$

Finally, the integral over $B$, using integration by parts:

-
+$$
\begin{align*}
\int_B F(\vec{r}(t))\cdot \frac{d\vec{r}(t)}{dt} dt &=
\int_b^a \langle -f(t),t \rangle\cdot\langle 1, f'(t)\rangle dt\\
&= \int_a^b f(t)dt - \int_a^b tf'(t)dt\\
&= \int_a^b f(t)dt - \left(tf(t)\mid_a^b - \int_a^b f(t) dt\right).
\end{align*}
+$$

Combining, we have after cancellation $\oint (xdy - ydx) = 2\int_a^b f(t) dt$, or after dividing by $2$ the signed area under the curve.

@@ -470,7 +477,7 @@ The cut leads to a counter-clockwise orientation on the outer ring and a clockw

To see that the area integral of $F(x,y) = (1/2)\langle -y, x\rangle$ produces the area for this orientation we have, using $C_1$ as the outer ring, and $C_2$ as the inner ring:

-
+$$
\begin{align*}
\oint_{C_1} F \cdot \hat{T} ds &=
\int_0^{2\pi} (1/2)(2)\langle -\sin(t), \cos(t)\rangle \cdot (2)\langle-\sin(t), \cos(t)\rangle dt \\
@@ -479,6 +486,7 @@ To see that the area integral of $F(x,y) = (1/2)\langle -y, x\rangle$ produces t
\int_{0}^{2\pi} (1/2) \langle \sin(t), \cos(t)\rangle \cdot \langle-\sin(t), -\cos(t)\rangle dt\\
&= -(1/2)(2\pi) = -\pi.
\end{align*}
+$$

(Using $\vec{r}(t) = 2\langle \cos(t), \sin(t)\rangle$ for the outer ring and $\vec{r}(t) = 1\langle \cos(t), -\sin(t)\rangle$ for the inner ring.)

@@ -739,7 +747,7 @@ $$

This gives the series of approximations:

-
+$$
\begin{align*}
\oint_C F\cdot\hat{T} ds &=
\sum \oint_{C_i} F\cdot\hat{T} ds \\
@@ -750,18 +758,21 @@ This gives the series of approximations:
&\approx
\iint_S \nabla\times{F}\cdot\hat{N} dS.
\end{align*}
+$$

In terms of our expanding popcorn, the boundary integral - after accounting for cancellations, as in Green's theorem - can be seen as a microscopic sum of boundary integrals each of which is approximated by a term $\nabla\times{F}\cdot\hat{N} \Delta{S}$, which is viewed as a Riemann sum approximation for the integral of the curl over the surface. The cancellation depends on a proper choice of orientation, but with that we have:

+::: {.callout-note icon=false}
+## Stokes' theorem

-> **Stokes' theorem**: Let $S$ be an orientable smooth surface in $R^3$ with boundary $C$, $C$ oriented so that the chosen normal for $S$ agrees with the right-hand rule for $C$'s orientation. Then *if* $F$ has continuous partial derivatives
->
-> $$
-> \oint_C F \cdot\hat{T} ds = \iint_S (\nabla\times{F})\cdot\hat{N} dA.
-> $$
+Let $S$ be an orientable smooth surface in $R^3$ with boundary $C$, $C$ oriented so that the chosen normal for $S$ agrees with the right-hand rule for $C$'s orientation. Then *if* $F$ has continuous partial derivatives
+$$
+\oint_C F \cdot\hat{T} ds = \iint_S (\nabla\times{F})\cdot\hat{N} dA.
+$$
+:::

Green's theorem is an immediate consequence upon viewing the region in $R^2$ as a surface in $R^3$ with normal $\hat{k}$.

@@ -997,17 +1008,17 @@ $$

the last approximation through a Riemann sum approximation. This heuristic leads to:

+::: {.callout-note icon=false}
+## The divergence theorem

-> **The divergence theorem**: Suppose $V$ is a $3$-dimensional volume which is bounded (compact) and has a boundary, $S$, that is piecewise smooth. If $F$ is a continuously differentiable vector field defined on an open set containing $V$, then:
->
-> $$
-> \iiint_V (\nabla\cdot{F}) dV = \oint_S (F\cdot\hat{N})dS.
-> $$
-
+Suppose $V$ is a $3$-dimensional volume which is bounded (compact) and has a boundary, $S$, that is piecewise smooth. If $F$ is a continuously differentiable vector field defined on an open set containing $V$, then:
+$$
+\iiint_V (\nabla\cdot{F}) dV = \oint_S (F\cdot\hat{N})dS.
+$$
That is, the volume integral of the divergence can be computed from the flux integral over the boundary of $V$.
-
+:::

### Examples of the divergence theorem

@@ -1130,12 +1141,13 @@ The divergence theorem provides two means to compute a value, the point here is

Following Schey, we now consider a continuous analog to the crowd counting problem through a flow with a non-uniform density that may vary in time. Let $\rho(x,y,z;t)$ be the time-varying density and $v(x,y,z;t)$ be a vector field indicating the direction of flow. Consider some three-dimensional volume, $V$, with boundary $S$ (though two-dimensional would also be applicable). Then these integrals have interpretations:

-
+$$
\begin{align*}
\iiint_V \rho dV &&\quad\text{Amount contained within }V\\
\frac{\partial}{\partial{t}} \iiint_V \rho dV &=
\iiint_V \frac{\partial{\rho}}{\partial{t}} dV &\quad\text{Change in time of amount contained within }V
\end{align*}
+$$

Moving the derivative inside the integral requires an assumption of continuity.
Assume the material is *conserved*, meaning that if the amount in the volume $V$ changes it must flow in and out through the boundary. The flow out through $S$, the boundary of $V$, is diff --git a/quarto/integrals/arc_length.qmd b/quarto/integrals/arc_length.qmd index 10d164cb..c31ff5e6 100644 --- a/quarto/integrals/arc_length.qmd +++ b/quarto/integrals/arc_length.qmd @@ -48,14 +48,18 @@ Recall the distance formula gives the distance between two points: $\sqrt{(x_1 - Consider now two functions $g(t)$ and $f(t)$ and the parameterized graph between $a$ and $b$ given by the points $(g(t), f(t))$ for $a \leq t \leq b$. Assume that both $g$ and $f$ are differentiable on $(a,b)$ and continuous on $[a,b]$ and furthermore that $\sqrt{g'(t)^2 + f'(t)^2}$ is Riemann integrable. +::: {.callout-note icon=false} +## The arc length of a curve -> **The arc length of a curve**. For $f$ and $g$ as described, the arc length of the parameterized curve is given by -> -> $L = \int_a^b \sqrt{g'(t)^2 + f'(t)^2} dt.$ -> -> For the special case of the graph of a function $f(x)$ between $a$ and $b$ the formula becomes $L = \int_a^b \sqrt{ 1 + f'(x)^2} dx$ (taking $g(t) = t$). +For $f$ and $g$ as described, the arc length of the parameterized curve is given by +$$ +L = \int_a^b \sqrt{g'(t)^2 + f'(t)^2} dt. +$$ + +For the special case of the graph of a function $f(x)$ between $a$ and $b$ the formula becomes $L = \int_a^b \sqrt{ 1 + f'(x)^2} dx$ (taking $g(t) = t$). +::: :::{.callout-note} ## Note @@ -126,7 +130,7 @@ $$ But looking at each term, we can push the denominator into the square root as: - +$$ \begin{align*} d_i &= d_i \cdot \frac{t_i - t_{i-1}}{t_i - t_{i-1}} \\ @@ -134,6 +138,7 @@ d_i &= d_i \cdot \frac{t_i - t_{i-1}}{t_i - t_{i-1}} \left(\frac{f(t_i)-f(t_{i-1})}{t_i-t_{i-1}}\right)^2} \cdot (t_i - t_{i-1}) \\ &= \sqrt{ g'(\xi_i)^2 + f'(\psi_i)^2} \cdot (t_i - t_{i-1}). \end{align*} +$$ The values $\xi_i$ and $\psi_i$ are guaranteed by the mean value theorem and must be in $[t_{i-1}, t_i]$. @@ -272,7 +277,7 @@ nothing The museum notes have -> For his Catenary series (1997–2003), of which Near the Lagoon is the largest and last work, Johns formed catenaries—a term used to describe the curve assumed by a cord suspended freely from two points—by tacking ordinary household string to the canvas or its supports. +> For his Catenary series (1997–2003), of which Near the Lagoon is the largest and last work, Johns formed catenaries—a term used to describe the curve assumed by a cord suspended freely from two points—by tacking ordinary household string to the canvas or its supports. @@ -377,11 +382,12 @@ nothing The [nephroid](http://www-history.mcs.st-and.ac.uk/Curves/Nephroid.html) is a curve that can be described parametrically by - +$$ \begin{align*} g(t) &= a(3\cos(t) - \cos(3t)), \\ f(t) &= a(3\sin(t) - \sin(3t)). \end{align*} +$$ Taking $a=1$ we have this graph: @@ -407,7 +413,7 @@ quadgk(t -> sqrt(𝒈'(t)^2 + 𝒇'(t)^2), 0, 2pi)[1] The answer seems like a floating point approximation of $24$, which suggests that this integral is tractable. Pursuing this, the integrand simplifies: - +$$ \begin{align*} \sqrt{g'(t)^2 + f'(t)^2} &= \sqrt{(-3\sin(t) + 3\sin(3t))^2 + (3\cos(t) - 3\cos(3t))^2} \\ @@ -417,6 +423,7 @@ The answer seems like a floating point approximation of $24$, which suggests th &= 3\sqrt{2}\sqrt{1 - \cos(2t)}\\ &= 3\sqrt{2}\sqrt{2\sin(t)^2}. \end{align*} +$$ The second to last line comes from a double angle formula expansion of $\cos(3t - t)$ and the last line from the half angle formula for $\cos$. 
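Continuing, $3\sqrt{2}\sqrt{2\sin(t)^2} = 6\lvert\sin(t)\rvert$, so the arc length should be $\int_0^{2\pi} 6\lvert\sin(t)\rvert dt = 24$. A quick numeric confirmation of the simplification (a sketch; it assumes `QuadGK` is loaded, as above):

```{julia}
quadgk(t -> 6abs(sin(t)), 0, 2pi)[1]   # 24, matching the earlier answer
```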
@@ -452,13 +459,14 @@ A teacher of small children assigns his students the task of computing the lengt

Mathematically, suppose a curve is described parametrically by $(g(t), f(t))$ for $a \leq t \leq b$. A new parameterization is provided by $\gamma(t)$. Suppose $\gamma$ is strictly increasing, so that an inverse function exists. (This assumption is implicitly made by the teacher, as it implies the student won't start counting in the wrong direction.) Then the same curve is described by composition through $(g(\gamma(u)), f(\gamma(u)))$, $\gamma^{-1}(a) \leq u \leq \gamma^{-1}(b)$. That the arc length is the same follows from substitution:

-
+$$
\begin{align*}
\int_{\gamma^{-1}(a)}^{\gamma^{-1}(b)} \sqrt{([g(\gamma(t))]')^2 + ([f(\gamma(t))]')^2} dt
&=\int_{\gamma^{-1}(a)}^{\gamma^{-1}(b)} \sqrt{(g'(\gamma(t) )\gamma'(t))^2 + (f'(\gamma(t) )\gamma'(t))^2 } dt \\
&=\int_{\gamma^{-1}(a)}^{\gamma^{-1}(b)} \sqrt{g'(\gamma(t))^2 + f'(\gamma(t))^2} \gamma'(t) dt\\
&=\int_a^b \sqrt{g'(u)^2 + f'(u)^2} du = L
\end{align*}
+$$

(Using $u=\gamma(t)$ for the substitution.)

@@ -483,12 +491,13 @@ For a simple example, we have $g(t) = R\cos(t)$ and $f(t)=R\sin(t)$ parameterizi

What looks at first glance to be just a slightly more complicated equation is that of an ellipse, with $g(t) = a\cos(t)$ and $f(t) = b\sin(t)$. Taking $a=1$ and $b > 1$, and writing $c = b^2 - 1 > 0$, we get that the arc length as a function of $u$ is just

-
+$$
\begin{align*}
s(u) &= \int_0^u \sqrt{(-\sin(t))^2 + (b\cos(t))^2} dt\\
&= \int_0^u \sqrt{\sin(t)^2 + \cos(t)^2 + (b^2 - 1)\cos(t)^2} dt \\
&=\int_0^u \sqrt{1 + c\cos(t)^2} dt.
\end{align*}
+$$

But, despite it not looking too daunting, this integral is not tractable through our techniques and has an answer involving elliptic integrals. We can work numerically though. Letting $a=1$ and $b=2$, the arc length is given by:

@@ -588,11 +597,12 @@ $$

So

-
+$$
\begin{align*}
\int_0^1 (tf'(u) + (1-t)f'(v)) dt &< \int_0^1 f'(tu + (1-t)v) dt, \text{or}\\
\frac{f'(u) + f'(v)}{2} &< \frac{1}{v-u}\int_u^v f'(w) dw,
\end{align*}
+$$

by the substitution $w = tu + (1-t)v$. Using the fundamental theorem of calculus to compute the mean value of the integral of $f'$ over $[u,v]$ gives the following as a consequence of strict concavity of $f'$:

@@ -684,24 +694,26 @@ which holds by the strict concavity of $f'$, as found previously.

Using the substitution $x = f_i^{-1}(u)$ as needed, we see:

-
+$$
\begin{align*}
\int_a^u f(x) dx &= \int_0^{f(u)} u [f_1^{-1}]'(u) du \\
&> -\int_0^h u [f_2^{-1}]'(u) du \\
&= \int_h^0 u [f_2^{-1}]'(u) du \\
&= \int_v^b f(x) dx.
\end{align*}
+$$

For the latter claim, integrating in the $y$ variable gives

-
+$$
\begin{align*}
\int_u^c (f(x)-h) dx &= \int_h^m (c - f_1^{-1}(y)) dy\\
&> \int_h^m (c - f_2^{-1}(y)) dy\\
&= \int_c^v (f(x)-h) dx
\end{align*}
+$$

Now, the area under $h$ over $[u,c]$ is greater than that over $[c,v]$ as $(u+v)/2 < c$ or $v-c < c-u$. That means the area under $f$ over $[u,c]$ is greater than that over $[c,v]$.

@@ -724,7 +736,7 @@ or $\phi'(z) < 0$. Moreover, we have by the first assertion that $f'(z) < -f'(\p

Using the substitution $x = \phi(z)$ gives:

-
+$$
\begin{align*}
\int_v^b \sqrt{1 + f'(x)^2} dx &=
\int_u^a \sqrt{1 + f'(\phi(z))^2} \phi'(z) dz\\
@@ -733,6 +745,7 @@ Using the substitution $x = \phi(z)$ gives:
&= \int_a^u \sqrt{\phi'(z)^2 + f'(z)^2} dz\\
&< \int_a^u \sqrt{1 + f'(z)^2} dz
\end{align*}
+$$

Letting $h=f(u \rightarrow c)$ we get the *inequality*

@@ -782,11 +795,12 @@ $$

with the case above corresponding to $W = -m(k/m)$.
The set of equations then satisfy:

-
+$$
\begin{align*}
x''(t) &= - W(t,x(t), x'(t), y(t), y'(t)) \cdot x'(t)\\
y''(t) &= -g - W(t,x(t), x'(t), y(t), y'(t)) \cdot y'(t)\\
\end{align*}
+$$

with initial conditions: $x(0) = y(0) = 0$ and $x'(0) = v_0 \cos(\theta), y'(0) = v_0 \sin(\theta)$.

@@ -795,28 +809,30 @@ with initial conditions: $x(0) = y(0) = 0$ and $x'(0) = v_0 \cos(\theta), y'(0)

Only for certain drag forces can this set of equations be solved exactly, though it can be approximated numerically for admissible $W$. If $W$ is strictly positive, then it can be shown that $x(t)$ is increasing on $[0, x_\infty)$ and so invertible, and $f(u) = y(x^{-1}(u))$ is three times differentiable with both $f$ and $f'$ being strictly concave, as it can be shown that (say $x(v) = u$, so $dv/du = 1/x'(v) > 0$):

-
+$$
\begin{align*}
f''(u) &= -\frac{g}{x'(v)^2} < 0\\
f'''(u) &= \frac{2gx''(v)}{x'(v)^3} \\
&= -\frac{2gW}{x'(v)^2} \cdot \frac{dv}{du} < 0
\end{align*}
+$$

The latter follows by differentiating; the former is a consequence of the following formulas for derivatives of inverse functions:

-
+$$
\begin{align*}
[x^{-1}]'(u) &= 1 / x'(v) \\
[x^{-1}]''(u) &= -x''(v)/(x'(v))^3
\end{align*}
+$$

For then

-
+$$
\begin{align*}
f(u) &= y(x^{-1}(u)) \\
f'(u) &= y'(x^{-1}(u)) \cdot {x^{-1}}'(u) \\
f''(u) &= y''(x^{-1}(u))\cdot[x^{-1}]'(u)^2 + y'(x^{-1}(u)) \cdot [x^{-1}]''(u) \\
&= -g/(x'(v))^2 - W y'/(x'(v))^2 - y'(v) \cdot (- W \cdot x'(v)) / x'(v)^3\\
&= -g/x'(v)^2.
\end{align*}
+$$

## Questions

diff --git a/quarto/integrals/area.qmd b/quarto/integrals/area.qmd
index 263ed79b..3f18360f 100644
--- a/quarto/integrals/area.qmd
+++ b/quarto/integrals/area.qmd
@@ -230,7 +230,7 @@ To successfully compute a good approximation for the area, we would need to choo

For Archimedes' problem - finding the area under $f(x)=x^2$ between $0$ and $1$ - if we take as a partition $x_i = i/n$ and $c_i = x_i$, then the above sum becomes:

-
+$$
\begin{align*}
S_n &= f(c_1) \cdot (x_1 - x_0) + f(c_2) \cdot (x_2 - x_1) + \cdots + f(c_n) \cdot (x_n - x_{n-1})\\
&= (x_1)^2 \cdot \frac{1}{n} + (x_2)^2 \cdot \frac{1}{n} + \cdots + (x_n)^2 \cdot \frac{1}{n}\\
@@ -238,6 +238,7 @@ S_n &= f(c_1) \cdot (x_1 - x_0) + f(c_2) \cdot (x_2 - x_1) + \cdots + f(c_n) \cd
&= \frac{1}{n^3} \cdot (1^2 + 2^2 + \cdots + n^2) \\
&= \frac{1}{n^3} \cdot \frac{n\cdot(n+1)\cdot(2n+1)}{6}.
\end{align*}
+$$

The latter uses a well-known formula for the sum of squares of the first $n$ natural numbers.

@@ -301,13 +302,18 @@ The general statement allows for any partition such that the largest gap goes to

Riemann sums weren't named after Riemann because he was the first to approximate areas using rectangles. Indeed, others had been using even more efficient ways to compute areas for centuries prior to Riemann's work. Rather, Riemann put the definition of the area under the curve on a firm theoretical footing with the following theorem which gives a concrete notion of what functions are integrable:

-> **Riemann Integral**: A function $f$ is Riemann integrable over the interval $[a,b]$ and its integral will have value $V$ provided for every $\epsilon > 0$ there exists a $\delta > 0$ such that for any partition $a =x_0 < x_1 < \cdots < x_n=b$ with $\lvert x_i - x_{i-1} \rvert < \delta$ and for any choice of points $x_{i-1} \leq c_i \leq x_{i}$ this is satisfied:
->
-> $$
-> \lvert \sum_{i=1}^n f(c_i)(x_{i} - x_{i-1}) - V \rvert < \epsilon.
-> $$
->
-> When the integral exists, it is written $V = \int_a^b f(x) dx$.
+::: {.callout-note icon=false}
+## Riemann Integral
+
+A function $f$ is Riemann integrable over the interval $[a,b]$ and its integral will have value $V$ provided for every $\epsilon > 0$ there exists a $\delta > 0$ such that for any partition $a =x_0 < x_1 < \cdots < x_n=b$ with $\lvert x_i - x_{i-1} \rvert < \delta$ and for any choice of points $x_{i-1} \leq c_i \leq x_{i}$ this is satisfied:
+
+$$
+\lvert \sum_{i=1}^n f(c_i)(x_{i} - x_{i-1}) - V \rvert < \epsilon.
+$$
+
+When the integral exists, it is written $V = \int_a^b f(x) dx$.
+
+:::



@@ -370,13 +376,14 @@ The area is invariant under shifts left or right.

Any partition $a =x_0 < x_1 < \cdots < x_n=b$ is related to a partition of $[a-c, b-c]$ through $a-c < x_0-c < x_1-c < \cdots < x_n - c = b-c$. Let $d_i=c_i-c$ denote these shifted points; then we have:

-
+$$
\begin{align*}
f(c_1 -c) \cdot (x_1 - x_0) &+ f(c_2 -c) \cdot (x_2 - x_1) + \cdots\\
&\quad + f(c_n -c) \cdot (x_n - x_{n-1})\\
&= f(d_1) \cdot(x_1-c - (x_0-c)) + f(d_2) \cdot(x_2-c - (x_1-c)) + \cdots\\
&\quad + f(d_n) \cdot(x_n-c - (x_{n-1}-c)).
\end{align*}
+$$

The left side will have a limit of $\int_a^b f(x-c) dx$; the right would have a "limit" of $\int_{a-c}^{b-c}f(x)dx$.

@@ -471,7 +478,7 @@ Using the definition, we can compute a few definite integrals:

This is just the area of a trapezoid with heights $a$ and $b$ and side length $b-a$, or $1/2 \cdot (b + a) \cdot (b - a)$. The right sum would be:

-
+$$
\begin{align*}
S &= x_1 \cdot (x_1 - x_0) + x_2 \cdot (x_2 - x_1) + \cdots + x_n \cdot (x_n - x_{n-1}) \\
&= (a + 1\frac{b-a}{n}) \cdot \frac{b-a}{n} + (a + 2\frac{b-a}{n}) \cdot \frac{b-a}{n} + \cdots + (a + n\frac{b-a}{n}) \cdot \frac{b-a}{n}\\
@@ -480,6 +487,7 @@ S &= x_1 \cdot (x_1 - x_0) + x_2 \cdot (x_2 - x_1) + \cdots x_n \cdot (x_n - x_{
& \rightarrow a \cdot(b-a) + \frac{(b-a)^2}{2} \\
&= \frac{b^2}{2} - \frac{a^2}{2}.
\end{align*}
+$$


> $$

This is similar to the Archimedes case with $a=0$ and $b=1$ shown above.

Cauchy showed this using a *geometric series* for the partition, not the arithmetic series $x_i = a + i (b-a)/n$. The series defined by $1 + \alpha = (b/a)^{1/n}$, then $x_i = a \cdot (1 + \alpha)^i$. Here the bases $x_{i+1} - x_i$ simplify to $x_i \cdot \alpha$ and $f(x_i) = (a\cdot(1+\alpha)^i)^k = a^k (1+\alpha)^{ik}$, or $f(x_i)(x_{i+1}-x_i) = a^{k+1}\alpha[(1+\alpha)^{k+1}]^i$, so, using $u=(1+\alpha)^{k+1}=(b/a)^{(k+1)/n}$, $f(x_i) \cdot(x_{i+1} - x_i) = a^{k+1}\alpha u^i$. This gives

-
+$$
\begin{align*}
S &= a^{k+1}\alpha u^0 + a^{k+1}\alpha u^1 + \cdots + a^{k+1}\alpha u^{n-1}\\
&= a^{k+1} \cdot \alpha \cdot (u^0 + u^1 + \cdots + u^{n-1}) \\
@@ -510,6 +518,7 @@ S &= a^{k+1}\alpha u^0 + a^{k+1}\alpha u^1 + \cdots + a^{k+1}\alpha u^{n-1}\\
&= (b^{k+1} - a^{k+1}) \cdot \frac{\alpha}{(1+\alpha)^{k+1} - 1} \\
&\rightarrow \frac{b^{k+1} - a^{k+1}}{k+1}.
\end{align*}
+$$


> $$

Certainly other integrals could be computed with various tricks, but we won't pursue this.

### Some other consequences


 - * The definition is defined in terms of any partition with its norm bounded by $\delta$. If you know a function $f$ is Riemann integrable, then it is enough to consider just a regular partition $x_i = a + i \cdot (b-a)/n$ when forming the sums, as was done above. It is just that showing a limit for just this particular type of partition would not be sufficient to prove Riemann integrability.
 - * The choice of $c_i$ is arbitrary to allow for maximum flexibility. The Darboux integrals use the maximum and minimum over the subinterval.
It is sufficient to prove integrability to show that the limit exists with just these choices.
 - * Most importantly,

+* The definition is stated in terms of any partition with its norm bounded by $\delta$. If you know a function $f$ is Riemann integrable, then it is enough to consider just a regular partition $x_i = a + i \cdot (b-a)/n$ when forming the sums, as was done above. It is just that showing a limit for just this particular type of partition would not be sufficient to prove Riemann integrability.

+* The choice of $c_i$ is arbitrary to allow for maximum flexibility. The Darboux integrals use the maximum and minimum over the subinterval. It is sufficient to prove integrability to show that the limit exists with just these choices.


+* Most importantly,

> A continuous function on $[a,b]$ is Riemann integrable on $[a,b]$.



The main idea behind this is that the difference between the maximum and minimum values over a partition gets small. That is, if $[x_{i-1}, x_i]$ is like $1/n$ in length, then the difference between the maximum of $f$ over this interval, $M$, and the minimum, $m$, over this interval will go to zero as $n$ gets big. That $m$ and $M$ exist is due to the extreme value theorem; that this difference goes to $0$ is a consequence of continuity. What is needed is that this value goes to $0$ at the same rate – no matter what interval is being discussed – and this is a consequence of a notion of uniform continuity, a concept discussed in advanced calculus, but one which holds for continuous functions on closed intervals. Armed with this, the Riemann sum for a general partition can be bounded by this difference times $b-a$, which will go to zero. So the upper and lower Riemann sums will converge to the same value.

 - * A "jump", or discontinuity of the first kind, is a value $c$ in $[a,b]$ where $\lim_{x \rightarrow c+} f(x)$ and $\lim_{x \rightarrow c-}f(x)$ both exist, but are not equal. It is true that a function that is not continuous on $I=[a,b]$, but only has discontinuities of the first kind on $I$ will be Riemann integrable on $I$.
+* A "jump", or discontinuity of the first kind, is a value $c$ in $[a,b]$ where $\lim_{x \rightarrow c+} f(x)$ and $\lim_{x \rightarrow c-}f(x)$ both exist, but are not equal. It is true that a function that is not continuous on $I=[a,b]$, but only has discontinuities of the first kind on $I$ will be Riemann integrable on $I$.

For example, the function $f(x) = 1$ for $x$ in $[0,1]$ and $0$ otherwise will be integrable, as it is continuous at all but two points, $0$ and $1$, where it jumps.

 - * Some functions can have infinitely many points of discontinuity and still be integrable. The example of $f(x) = 1/q$ when $x=p/q$ is rational, and $0$ otherwise is often used as an example.
+* Some functions can have infinitely many points of discontinuity and still be integrable. The example of $f(x) = 1/q$ when $x=p/q$ is rational, and $0$ otherwise is often used as an example.

## Numeric integration

@@ -799,7 +811,7 @@ While such bounds are disappointing, often, when looking for specific values, th

The Riemann sum above is actually extremely inefficient. To see how much, we can derive an estimate for the error in approximating the value using an arithmetic progression as the partition.
Let's assume that our function $f(x)$ is increasing, so that the right sum gives an upper estimate and the left sum a lower estimate; the error in the estimate will then be between these two values:

-
+$$
\begin{align*}
\text{error} &\leq
\left[
@@ -809,6 +821,7 @@
f(x_1) \cdot (x_{1} - x_0) + f(x_2) \cdot (x_{2} - x_1) + \cdots + f(x_{n-1})(
&= \frac{b-a}{n} \cdot (\left[f(x_1) + f(x_2) + \cdots + f(x_n)\right] - \left[f(x_0) + \cdots + f(x_{n-1})\right]) \\
&= \frac{b-a}{n} \cdot (f(b) - f(a)).
\end{align*}
+$$

We see the error goes to $0$ at a rate of $1/n$ with the constant depending on $b-a$ and the function $f$. In general, a similar bound holds when $f$ is not monotonic.

@@ -853,11 +866,12 @@ This formula will actually be exact for any 3rd degree polynomial. In fact an en

The formulas for an approximation to the integral $\int_{-1}^1 f(x) dx$ discussed so far can be written as:

-
+$$
\begin{align*}
S &= f(x_1) \Delta_1 + f(x_2) \Delta_2 + \cdots + f(x_n) \Delta_n\\
&= w_1 f(x_1) + w_2 f(x_2) + \cdots + w_n f(x_n).
\end{align*}
+$$

The $w$s are "weights" and the $x$s are nodes. A [Gaussian](http://en.wikipedia.org/wiki/Gaussian_quadrature) *quadrature rule* is a set of weights and nodes for $i=1, \dots, n$ for which the sum is *exact* for any $f$ which is a polynomial of degree $2n-1$ or less. Such choices then also approximate well the integrals of functions which are not polynomials of degree $2n-1$, provided $f$ can be well approximated by a polynomial over $[-1,1]$. (Which is the case for the "nice" functions we encounter.) Some examples are given in the questions.

diff --git a/quarto/integrals/center_of_mass.qmd b/quarto/integrals/center_of_mass.qmd
index 5c0f1cbc..a3f486b9 100644
--- a/quarto/integrals/center_of_mass.qmd
+++ b/quarto/integrals/center_of_mass.qmd
@@ -158,23 +158,29 @@ The figure shows the approximating rectangles and circles representing their mas

Generalizing from this figure shows that the center of mass for such an approximation will be:

-
+$$
\begin{align*}
&\frac{\rho f(c_1) (x_1 - x_0) \cdot x_1 + \rho f(c_2) (x_2 - x_1) \cdot x_2 + \cdots + \rho f(c_n) (x_n- x_{n-1}) \cdot x_n}{\rho f(c_1) (x_1 - x_0) + \rho f(c_2) (x_2 - x_1) + \cdots + \rho f(c_n) (x_n- x_{n-1})} \\
&=\\
&\quad\frac{f(c_1) (x_1 - x_0) \cdot x_1 + f(c_2) (x_2 - x_1) \cdot x_2 + \cdots + f(c_n) (x_n- x_{n-1}) \cdot x_n}{f(c_1) (x_1 - x_0) + f(c_2) (x_2 - x_1) + \cdots + f(c_n) (x_n- x_{n-1})}.
\end{align*}
+$$

But the top part is an approximation to the integral $\int_a^b x f(x) dx$ and the bottom part the integral $\int_a^b f(x) dx$. The ratio of these defines the center of mass.

+::: {.callout-note icon=false}
+## Center of Mass

-> **Center of Mass**: The center of mass (in the $x$ direction) of a region in the $x-y$ plane described by the area under a (positive) function $f(x)$ between $a$ and $b$ is given by
->
-> $\text{Center of mass} = \text{cm}_x = \frac{\int_a^b xf(x) dx}{\int_a^b f(x) dx}.$
->
-> For regions described by a more complicated set of equations, the center of mass is found from the same formula where $f(x)$ is the total height in the $x$ direction for a given $x$.

+The center of mass (in the $x$ direction) of a region in the $x-y$ plane described by the area under a (positive) function $f(x)$ between $a$ and $b$ is given by
+$$
+\text{Center of mass} =
+\text{cm}_x = \frac{\int_a^b xf(x) dx}{\int_a^b f(x) dx}.
+$$ + +For regions described by a more complicated set of equations, the center of mass is found from the same formula where $f(x)$ is the total height in the $x$ direction for a given $x$. +::: For the triangular shape, we have by the fact that $f(x) = 1 - \lvert x \rvert$ is an even function that $xf(x)$ will be odd, so the integral around $-1,1$ will be $0$. So the center of mass formula applied to this problem agrees with our expectation. diff --git a/quarto/integrals/ftc.qmd b/quarto/integrals/ftc.qmd index f1e38b48..14abf852 100644 --- a/quarto/integrals/ftc.qmd +++ b/quarto/integrals/ftc.qmd @@ -100,7 +100,7 @@ where we define $g(i) = f(a + ih)h$. In the above, $n$ relates to $b$, but we co Again, we fix a large $n$ and let $h=(b-a)/n$. And suppose $x = a + Mh$ for some $M$. Then writing out the approximations to both the definite integral and the derivative we have - +$$ \begin{align*} F'(x) = & \frac{d}{dx} \int_a^x f(u) du \\ & \approx \frac{F(x) - F(x-h)}{h} \\ @@ -113,17 +113,19 @@ F'(x) = & \frac{d}{dx} \int_a^x f(u) du \\ \left(f(a + 1h) + f(a + 2h) + \cdots + f(a + (M-1)h) \right) \\ &= f(a + Mh). \end{align*} +$$ If $g(i) = f(a + ih)$, then the above becomes - +$$ \begin{align*} F'(x) & \approx D(S(g))(M) \\ &= f(a + Mh)\\ &= f(x). \end{align*} +$$ That is $F'(x) \approx f(x)$. @@ -138,13 +140,14 @@ $$ With these heuristics, we now have: +::: {.callout-note icon=false} +## The fundamental theorem of calculus + +Part 1: Let $f$ be a continuous function on a closed interval $[a,b]$ and define $F(x) = \int_a^x f(u) du$ for $a \leq x \leq b$. Then $F$ is continuous on $[a,b]$, differentiable on $(a,b)$ and moreover, $F'(x) =f(x)$. -> **The fundamental theorem of calculus** -> -> Part 1: Let $f$ be a continuous function on a closed interval $[a,b]$ and define $F(x) = \int_a^x f(u) du$ for $a \leq x \leq b$. Then $F$ is continuous on $[a,b]$, differentiable on $(a,b)$ and moreover, $F'(x) =f(x)$. -> -> Part 2: Now suppose $f$ is any integrable function on a closed interval $[a,b]$ and $F(x)$ is *any* differentiable function on $[a,b]$ with $F'(x) = f(x)$. Then $\int_a^b f(x)dx=F(b)-F(a)$. +Part 2: Now suppose $f$ is any integrable function on a closed interval $[a,b]$ and $F(x)$ is *any* differentiable function on $[a,b]$ with $F'(x) = f(x)$. Then $\int_a^b f(x)dx=F(b)-F(a)$. +::: :::{.callout-note} @@ -366,25 +369,27 @@ This statement is nothing more than the derivative formula $[cf(x) + dg(x)]' = c * The antiderivative of the polynomial $p(x) = a_n x^n + \cdots + a_1 x + a_0$ follows from the linearity of the integral and the general power rule: - +$$ \begin{align*} \int (a_n x^n + \cdots + a_1 x + a_0) dx &= \int a_nx^n dx + \cdots + \int a_1 x dx + \int a_0 dx \\ &= a_n \int x^n dx + \cdots + a_1 \int x dx + a_0 \int dx \\ &= a_n\frac{x^{n+1}}{n+1} + \cdots + a_1 \frac{x^2}{2} + a_0 \frac{x}{1}. \end{align*} +$$ * More generally, a [Laurent](https://en.wikipedia.org/wiki/Laurent_polynomial) polynomial allows for terms with negative powers. These too can be handled by the above. For example - +$$ \begin{align*} \int (\frac{2}{x} + 2 + 2x) dx &= \int \frac{2}{x} dx + \int 2 dx + \int 2x dx \\ &= 2\int \frac{1}{x} dx + 2 \int dx + 2 \int xdx\\ &= 2\log(x) + 2x + 2\frac{x^2}{2}. 
\end{align*} +$$ * Consider this integral: @@ -645,13 +650,14 @@ Under assumptions that the $X$ are identical and independent, the largest value, This problem is constructed to take advantage of the FTC, and we have: - +$$ \begin{align*} \left[P(M \leq a)\right]' &= \left[F(a)^n\right]'\\ &= n \cdot F(a)^{n-1} \left[F(a)\right]'\\ &= n F(a)^{n-1}f(a) \end{align*} +$$ ##### Example diff --git a/quarto/integrals/integration_by_parts.qmd b/quarto/integrals/integration_by_parts.qmd index 7e99c021..9d80ff13 100644 --- a/quarto/integrals/integration_by_parts.qmd +++ b/quarto/integrals/integration_by_parts.qmd @@ -95,7 +95,7 @@ An illustration can clarify. Consider the integral $\int_0^\pi x\sin(x) dx$. If we let $u=x$ and $dv=\sin(x) dx$, then $du = 1dx$ and $v=-\cos(x)$. The above then says: - +$$ \begin{align*} \int_0^\pi x\sin(x) dx &= \int_0^\pi u dv\\ &= uv\big|_0^\pi - \int_0^\pi v du\\ @@ -104,6 +104,7 @@ Consider the integral $\int_0^\pi x\sin(x) dx$. If we let $u=x$ and $dv=\sin(x) &= \pi + \sin(x)\big|_0^\pi\\ &= \pi. \end{align*} +$$ The technique means one part is differentiated and one part integrated. The art is to break the integrand up into a piece that gets easier through differentiation and a piece that doesn't get much harder through integration. @@ -129,7 +130,7 @@ $$ Putting together gives: - +$$ \begin{align*} \int_1^2 x \log(x) dx &= (\log(x) \cdot \frac{x^2}{2}) \big|_1^2 - \int_1^2 \frac{x^2}{2} \frac{1}{x} dx\\ @@ -137,6 +138,7 @@ Putting together gives: &= 2\log(2) - (1 - \frac{1}{4}) \\ &= 2\log(2) - \frac{3}{4}. \end{align*} +$$ ##### Example @@ -145,14 +147,15 @@ Putting together gives: This related problem, $\int \log(x) dx$, uses the same idea, though perhaps harder to see at first glance, as setting `dv=dx` is almost too simple to try: - +$$ \begin{align*} u &= \log(x) & dv &= dx\\ du &= \frac{1}{x}dx & v &= x \end{align*} +$$ - +$$ \begin{align*} \int \log(x) dx &= \int u dv\\ @@ -161,6 +164,7 @@ du &= \frac{1}{x}dx & v &= x &= x \log(x) - \int dx\\ &= x \log(x) - x \end{align*} +$$ Were this a definite integral problem, we would have written: @@ -244,13 +248,14 @@ $$ Positive integer powers of trigonometric functions can be addressed by this technique. Consider $\int \cos(x)^n dx$. We let $u=\cos(x)^{n-1}$ and $dv=\cos(x) dx$. Then $du = (n-1)\cos(x)^{n-2}(-\sin(x))dx$ and $v=\sin(x)$. So, - +$$ \begin{align*} \int \cos(x)^n dx &= \cos(x)^{n-1} \cdot (\sin(x)) + \int (\sin(x)) ((n-1)\sin(x) \cos(x)^{n-2}) dx \\ &= \sin(x) \cos(x)^{n-1} + (n-1)\int \sin^2(x) \cos(x)^{n-2} dx\\ &= \sin(x) \cos(x)^{n-1} + (n-1)\int (1 - \cos(x)^2) \cos(x)^{n-2} dx\\ &= \sin(x) \cos(x)^{n-1} + (n-1)\int \cos(x)^{n-2}dx - (n-1)\int \cos(x)^n dx. \end{align*} +$$ We can then solve for the unknown ($\int \cos(x)^{n}dx$) to get this *reduction formula*: @@ -279,12 +284,13 @@ The visual interpretation of integration by parts breaks area into two pieces, t Let $uv = x f^{-1}(x)$. Then we have $[uv]' = u'v + uv' = f^{-1}(x) + x [f^{-1}(x)]'$. So, up to a constant $uv = \int [uv]'dx = \int f^{-1}(x)dx + \int x [f^{-1}(x)]'dx$. Re-expressing gives: - +$$ \begin{align*} \int f^{-1}(x) dx &= xf^{-1}(x) - \int x [f^{-1}(x)]' dx\\ &= xf^{-1}(x) - \int f(u) du.\\ \end{align*} +$$ The last line follows from the $u$-substitution: $u=f^{-1}(x)$ for then $du = [f^{-1}(x)]' dx$ and $x=f(u)$. 
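For instance, taking $f(u) = e^u$, so $f^{-1}(x) = \log(x)$, the formula recovers the earlier answer: $\int \log(x) dx = x\log(x) - e^{\log(x)} = x\log(x) - x$. A symbolic check of this value (a sketch, assuming `SymPy`, as used throughout these notes):

```{julia}
using SymPy
@syms x::positive
integrate(log(x), x)    # x*log(x) - x, as the inverse-function formula predicts
```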
@@ -293,12 +299,13 @@ The last line follows from the $u$-substitution: $u=f^{-1}(x)$ for then $du = [f

We use this to find an antiderivative for $\sin^{-1}(x)$:

-
+$$
\begin{align*}
\int \sin^{-1}(x) dx &= x \sin^{-1}(x) - \int \sin(u) du \\
&= x \sin^{-1}(x) + \cos(u) \\
&= x \sin^{-1}(x) + \cos(\sin^{-1}(x)).
\end{align*}
+$$

Using right triangles to simplify, the last value $\cos(\sin^{-1}(x))$ can otherwise be written as $\sqrt{1 - x^2}$.

@@ -322,11 +329,12 @@ This [proof](http://www.math.ucsd.edu/~ebender/20B/77_Trap.pdf) for the error es

First, for convenience, we consider the interval $x_i$ to $x_i+h$. The actual answer over this is just $\int_{x_i}^{x_i+h}f(x) dx$. By a $u$-substitution with $u=x-x_i$ this becomes $\int_0^h f(t + x_i) dt$. For analyzing this we integrate once by parts using $u=f(t+x_i)$ and $dv=dt$. But instead of letting $v=t$, we choose to add - as is our prerogative - a constant of integration $A$, so $v=t+A$:

-
+$$
\begin{align*}
\int_0^h f(t + x_i) dt &= uv \big|_0^h - \int_0^h v du\\
&= f(t+x_i)(t+A)\big|_0^h - \int_0^h (t + A) f'(t + x_i) dt.
\end{align*}
+$$

We choose $A$ to be $-h/2$ (any constant is possible), for then the term $f(t+x_i)(t+A)\big|_0^h$ becomes $(1/2)(f(x_i+h) + f(x_i)) \cdot h$, or the trapezoid approximation. This means the error over this interval - actual minus estimate - satisfies:

$$
...
$$


For this, we *again* integrate by parts with

-
+$$
\begin{align*}
u &= f'(t + x_i) & dv &= (t + A)dt\\
du &= f''(t + x_i)dt & v &= \frac{(t + A)^2}{2} + B
\end{align*}
+$$

Again we added a constant of integration, $B$, to $v$. The error becomes:

@@ -418,13 +427,14 @@ We added a rectangle for a Riemann sum for $t_i = \pi/3$ and $t_{i+1} = \pi/3 + 

Taking this Riemann sum approach, we can approximate the area under the curve parameterized by $(x(t), y(t))$ over the time range $[t_i, t_{i+1}]$ as a rectangle with height $y(t_i)$ and base $x(t_{i}) - x(t_{i+1})$. Then we get, as expected:

-
+$$
\begin{align*}
A &\approx \sum_i y(t_i) \cdot (x(t_{i}) - x(t_{i+1}))\\
&= - \sum_i y(t_i) \cdot (x(t_{i+1}) - x(t_{i}))\\
&= - \sum_i y(t_i) \cdot \frac{x(t_{i+1}) - x(t_i)}{t_{i+1}-t_i} \cdot (t_{i+1}-t_i)\\
&\approx -\int_a^b y(t) x'(t) dt.
\end{align*}
+$$

So with a counterclockwise rotation, the actual answer for the area includes a minus sign. If the area is traced out in a *clockwise* manner, there is no minus sign.

diff --git a/quarto/integrals/mean_value_theorem.qmd b/quarto/integrals/mean_value_theorem.qmd
index e28bba76..1079764f 100644
--- a/quarto/integrals/mean_value_theorem.qmd
+++ b/quarto/integrals/mean_value_theorem.qmd
@@ -89,13 +89,14 @@ Though not continuous, $f(x)$ is integrable as it contains only jumps. The integ

What is the average value of the function $e^{-x}$ between $0$ and $\log(2)$?

-
+$$
\begin{align*}
\text{average} &= \frac{1}{\log(2) - 0} \int_0^{\log(2)} e^{-x} dx\\
&= \frac{1}{\log(2)} (-e^{-x}) \big|_0^{\log(2)}\\
&= -\frac{1}{\log(2)} (\frac{1}{2} - 1)\\
&= \frac{1}{2\log(2)}.
\end{align*}
+$$

Visualizing, we have

@@ -118,11 +119,15 @@ $$

When we assume that $f(x)$ is continuous, we can describe $K$ as a value in the range of $f$:

+::: {.callout-note icon=false}
+## The mean value theorem for integrals

-> **The mean value theorem for integrals**: Let $f(x)$ be a continuous function on $[a,b]$ with $a < b$. Then there exists $c$ with $a \leq c \leq b$ with
->
-> $f(c) \cdot (b-a) = \int_a^b f(x) dx.$`

+Let $f(x)$ be a continuous function on $[a,b]$ with $a < b$.
Then there exists $c$ with $a \leq c \leq b$ with
+$$
+f(c) \cdot (b-a) = \int_a^b f(x) dx.
+$$
+:::

The proof comes from the intermediate value theorem and the extreme value theorem. Since $f$ is continuous on a closed interval, there exist values $m$ and $M$ with $f(c_m) = m \leq f(x) \leq M=f(c_M)$, for some $c_m$ and $c_M$ in the interval $[a,b]$. Since $m \leq f(x) \leq M$, we must have:

diff --git a/quarto/integrals/partial_fractions.qmd b/quarto/integrals/partial_fractions.qmd
index 138aca16..2fc0983e 100644
--- a/quarto/integrals/partial_fractions.qmd
+++ b/quarto/integrals/partial_fractions.qmd
@@ -28,13 +28,16 @@ Let $f(x) = p(x)/q(x)$, where $p$ and $q$ are polynomial functions with real co

The function $q(x)$ will factor over the real numbers. The fundamental theorem of algebra can be applied to say that $q(x)=q_1(x)^{n_1} \cdots q_k(x)^{n_k}$ where $q_i(x)$ is a linear or quadratic polynomial and $n_i$ a positive integer.

+::: {.callout-note icon=false}
+## Partial Fraction Decomposition

-> **Partial Fraction Decomposition**: There are unique polynomials $a_{ij}$ with degree $a_{ij} <$ degree $q_i$ such that
->
-> $$
-> \frac{p(x)}{q(x)} = a(x) + \sum_{i=1}^k \sum_{j=1}^{n_i} \frac{a_{ij}(x)}{q_i(x)^j}.
-> $$

+There are unique polynomials $a_{ij}$ with degree $a_{ij} <$ degree $q_i$ such that
+$$
+\frac{p(x)}{q(x)} = a(x) + \sum_{i=1}^k \sum_{j=1}^{n_i} \frac{a_{ij}(x)}{q_i(x)^j}.
+$$
+
+:::

The method is attributed to John Bernoulli, one of the prolific Bernoulli brothers who put a stamp on several areas of math. This Bernoulli was a mentor to Euler.

@@ -109,7 +112,7 @@ What remains is to establish that we can take $A(x) = a(x)\cdot P(x)$ with a deg

In Proposition 3.8 of [Bradley](http://www.m-hikari.com/imf/imf-2012/29-32-2012/cookIMF29-32-2012.pdf) and Cook we can see how. Recall, for example, that the division algorithm says there are $q_k$ and $r_k$ with $A=q\cdot q_k + r_k$ where the degree of $r_k$ is less than that of $q$, which is linear or quadratic. This is repeatedly applied below:

-
+$$
\begin{align*}
\frac{A}{q^k} &= \frac{q\cdot q_k + r_k}{q^k}\\
&= \frac{r_k}{q^k} + \frac{q_k}{q^{k-1}}\\
@@ -119,6 +122,7 @@ In Proposition 3.8 of [Bradley](http://www.m-hikari.com/imf/imf-2012/29-32-2012/
&= \cdots\\
&= \frac{r_k}{q^k} + \frac{r_{k-1}}{q^{k-1}} + \cdots + q_1.
\end{align*}
+$$

So the term $A(x)/q(x)^k$ can be expressed in terms of a sum where the numerators of each term have degree less than $q(x)$, as expected by the statement of the theorem.

@@ -208,13 +212,14 @@ integrate(B/((a*x)^2 - 1)^4, x)

In [Bronstein](http://www-sop.inria.fr/cafe/Manuel.Bronstein/publications/issac98.pdf) this characterization can be found - "This method, which dates back to Newton, Leibniz and Bernoulli, should not be used in practice, yet it remains the method found in most calculus texts and is often taught. Its major drawback is the factorization of the denominator of the integrand over the real or complex numbers."
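Despite that caveat, a computer algebra system automates the decomposition step. `SymPy`'s `apart` function does so; the rational function below is only an illustration:

```{julia}
using SymPy
@syms x::real
apart((x^2 - 1) / (x * (x - 2)^2))   # one term per factor, with powers up to the multiplicity
```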
We can also find the following formulas which formalize the above exploratory calculations ($j>1$ and $b^2 - 4c < 0$ below):

-
+$$
\begin{align*}
\int \frac{A}{(x-a)^j} &= \frac{A}{1-j}\frac{1}{(x-a)^{j-1}}\\
\int \frac{A}{x-a} &= A\log(x-a)\\
\int \frac{Bx+C}{x^2 + bx + c} &= \frac{B}{2} \log(x^2 + bx + c) + \frac{2C-bB}{\sqrt{4c-b^2}}\cdot \arctan\left(\frac{2x+b}{\sqrt{4c-b^2}}\right)\\
\int \frac{Bx+C}{(x^2 + bx + c)^j} &= \frac{B' x + C'}{(x^2 + bx + c)^{j-1}} + \int \frac{C''}{(x^2 + bx + c)^{j-1}}
\end{align*}
+$$

The first returns a rational function; the second yields a logarithm term; the third yields a logarithm and an arctangent term; while the last, which has explicit constants available, provides a reduction that can be recursively applied.

@@ -288,7 +293,7 @@ The answers found can become quite involved. [Corless](https://arxiv.org/pdf/171
ex = (x^2 - 1) / (x^4 + 5x^2 + 7)
```

-But the integral is something best suited to a computer algebra system:
+But the integral is something best suited for a computer algebra system:


```{julia}
@@ -482,11 +487,12 @@ How to see that these give rise to real answers on integration is the point of t

Breaking the terms up over $a$ and $b$ we have:

-
+$$
\begin{align*}
I &= \frac{a}{x - (\alpha + i \beta)} + \frac{a}{x - (\alpha - i \beta)} \\
II &= i\frac{b}{x - (\alpha + i \beta)} - i\frac{b}{x - (\alpha - i \beta)}
\end{align*}
+$$

Integrating $I$ leads to two logarithmic terms, which are combined to give:

diff --git a/quarto/integrals/substitution.qmd b/quarto/integrals/substitution.qmd
index a3dd9634..62d08f25 100644
--- a/quarto/integrals/substitution.qmd
+++ b/quarto/integrals/substitution.qmd
@@ -41,13 +41,14 @@ $$

So,

-
+$$
\begin{align*}
\int_a^b g(u(t)) \cdot u'(t) dt &= \int_a^b (G \circ u)'(t) dt\\
&= (G\circ u)(b) - (G\circ u)(a) \quad\text{(the FTC, part II)}\\
&= G(u(b)) - G(u(a)) \\
&= \int_{u(a)}^{u(b)} g(x) dx. \quad\text{(the FTC part II)}
\end{align*}
+$$

That is, this substitution formula applies:

@@ -181,7 +182,7 @@ when $-1 \leq x \leq 1$ and $0$ otherwise. The area under $f$ is just $1$ - the

Let $u(x) = (x-c)/h$ and $g(x) = (1/h) \cdot f(u(x))$. Then, as $du = 1/h dx$

-
+$$
\begin{align*}
\int_{c-h}^{c+h} g(x) dx
&= \int_{c-h}^{c+h} \frac{1}{h} f(u(x)) dx\\
@@ -189,6 +190,7 @@ Let $u(x) = (x-c)/h$ and $g(x) = (1/h) \cdot f(u(x))$. Then, as $du = 1/h dx$
&= \int_{-1}^1 f(u) du\\
&= 1.
\end{align*}
+$$

So the area of this transformed function is still $1$. The shifting by $c$ we know doesn't affect the area, the scaling by $h$ inside of $f$ does, but is balanced out by the multiplication by $1/h$ outside of $f$.

@@ -248,13 +250,14 @@ $$

But $u^3/3 - 4u/3 = (1/3) \cdot u(u-2)(u+2)$, so between $-2$ and $0$ it is positive and between $0$ and $1$ negative, so this integral is:

-
+$$
\begin{align*}
\int_{-2}^0 (u^3/3 - 4u/3 ) du + \int_{0}^1 -(u^3/3 - 4u/3) du
&= (\frac{u^4}{12} - \frac{4}{3}\frac{u^2}{2}) \big|_{-2}^0 - (\frac{u^4}{12} - \frac{4}{3}\frac{u^2}{2}) \big|_{0}^1\\
&= \frac{4}{3} - -\frac{7}{12}\\
&= \frac{23}{12}.
\end{align*}
+$$

##### Example

@@ -270,13 +273,14 @@ $$

Integrals involving this function are typically transformed by substitution. For example:

-
+$$
\begin{align*}
\int_a^b f(x; \mu, \sigma) dx &= \int_a^b \frac{1}{\sqrt{2\pi}}\frac{1}{\sigma} \exp(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2) dx \\
&= \int_{u(a)}^{u(b)} \frac{1}{\sqrt{2\pi}} \exp(-\frac{1}{2}u^2) du \\
&= \int_{u(a)}^{u(b)} f(u; 0, 1) du,
\end{align*}
+$$

where $u = (x-\mu)/\sigma$, so $du = (1/\sigma) dx$.
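The standardization is easy to verify numerically. A sketch, assuming `QuadGK` is available; the density is typed in directly, and the values anticipate the example below:

```{julia}
using QuadGK

f(x; μ=0, σ=1) = 1/(sqrt(2pi)*σ) * exp(-((x - μ)/σ)^2/2)

a, b, μ, σ = 1, 3, 1, 2
quadgk(x -> f(x; μ=μ, σ=σ), a, b)[1] ≈ quadgk(u -> f(u), (a-μ)/σ, (b-μ)/σ)[1]
```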
@@ -295,12 +299,13 @@ $$

A further change of variables by $t = u/\sqrt{2}$ (with $\sqrt{2}dt = du$) gives:

-
+$$
\begin{align*}
\int_a^b f(x; \mu, \sigma) dx &= \int_{t(u(a))}^{t(u(b))} \frac{\sqrt{2}}{\sqrt{2\pi}} \exp(-t^2) dt\\
&= \frac{1}{2} \int_{t(u(a))}^{t(u(b))} \frac{2}{\sqrt{\pi}} \exp(-t^2) dt
\end{align*}
+$$

Up to a factor of $1/2$ this is `erf`.
@@ -309,13 +314,14 @@ Up to a factor of $1/2$ this is `erf`.

So we would have, for example, with $\mu=1$, $\sigma=2$ and $a=1$ and $b=3$ that:

-
+$$
\begin{align*}
t(u(a)) &= (1 - 1)/2/\sqrt{2} = 0\\
t(u(b)) &= (3 - 1)/2/\sqrt{2} = \frac{1}{\sqrt{2}}\\
\int_1^3 f(x; 1, 2) &= \frac{1}{2} \int_0^{1/\sqrt{2}} \frac{2}{\sqrt{\pi}} \exp(-t^2) dt.
\end{align*}
+$$

Or
@@ -488,7 +494,7 @@ integrate(1 / (a^2 + (b*x)^2), x)

The expression $1-x^2$ can be attacked by the substitution $\sin(u) =x$ as then $1-x^2 = 1-\sin(u)^2 = \cos(u)^2$. Here we see this substitution being used successfully:

-
+$$
\begin{align*}
\int \frac{1}{\sqrt{9 - x^2}} dx &= \int \frac{1}{\sqrt{9 - (3\sin(u))^2}} \cdot 3\cos(u) du\\
&=\int \frac{1}{3\sqrt{1 - \sin(u)^2}}\cdot3\cos(u) du \\
@@ -496,6 +502,7 @@ The expression $1-x^2$ can be attacked by the substitution $\sin(u) =x$ as then
&= u \\
&= \sin^{-1}(x/3).
\end{align*}
+$$

Further substitution allows the following integral to be solved for an antiderivative:
@@ -513,23 +520,25 @@ integrate(1 / sqrt(a^2 - b^2*x^2), x)

The expression $x^2 - 1$ is a bit different; this lends itself to $\sec(u) = x$ for a substitution, as $\sec(u)^2 - 1 = \tan(u)^2$. For example, we try $\sec(u) = x$ to integrate:

-
+$$
\begin{align*}
\int \frac{1}{\sqrt{x^2 - 1}} dx &= \int \frac{1}{\sqrt{\sec(u)^2 - 1}} \cdot \sec(u)\tan(u) du\\
&=\int \frac{1}{\tan(u)}\sec(u)\tan(u) du\\
&= \int \sec(u) du.
\end{align*}
+$$


This doesn't seem that helpful, but the antiderivative of $\sec(u)$ is $\log\lvert (\sec(u) + \tan(u))\rvert$, so we can proceed to get:

-
+$$
\begin{align*}
\int \frac{1}{\sqrt{x^2 - 1}} dx &= \int \sec(u) du\\
&= \log\lvert (\sec(u) + \tan(u))\rvert\\
&= \log\lvert x + \sqrt{x^2-1} \rvert.
\end{align*}
+$$

SymPy gives a different representation using the arccosine:
@@ -566,13 +575,14 @@ $$

The identity $\cos(u)^2 = (1 + \cos(2u))/2$ makes this tractable:

-
+$$
\begin{align*}
4ab \int \cos(u)^2 du &= 4ab\int_0^{\pi/2}(\frac{1}{2} + \frac{\cos(2u)}{2}) du\\
&= 4ab(\frac{1}{2}u + \frac{\sin(2u)}{4})\big|_0^{\pi/2}\\
&= 4ab (\pi/4 + 0) = \pi ab.
\end{align*}
+$$

Keeping in mind that a circle with radius $a$ is an ellipse with $b=a$, we see that this gives the correct answer for a circle.
diff --git a/quarto/integrals/surface_area.qmd b/quarto/integrals/surface_area.qmd
index 688de64f..ee946fb3 100644
--- a/quarto/integrals/surface_area.qmd
+++ b/quarto/integrals/surface_area.qmd
@@ -50,19 +50,24 @@ revolution, there is an easier way.

(Photo credit to [firepanjewellery](http://firepanjewellery.com/).)
](./figures/gehry-hendrix.jpg)

-> The surface area generated by rotating the graph of $f(x)$ between $a$ and $b$ about the $x$-axis is given by the integral
->
-> $$
-> \int_a^b 2\pi f(x) \cdot \sqrt{1 + f'(x)^2} dx.
-> $$
->
-> If the curve is parameterized by $(g(t), f(t))$ between $a$ and $b$ then the surface area is
->
-> $$
-> \int_a^b 2\pi f(t) \cdot \sqrt{g'(t)^2 + f'(t)^2} dx.
-> $$
->
-> These formulas do not add in the surface area of either of the ends.
+
+::: {.callout-note icon=false}
+## Surface area of a rotated curve
+
+The surface area generated by rotating the graph of $f(x)$ between $a$ and $b$ about the $x$-axis is given by the integral
+
+$$
+\int_a^b 2\pi f(x) \cdot \sqrt{1 + f'(x)^2} dx.
+$$
+
+If the curve is parameterized by $(g(t), f(t))$ between $a$ and $b$ then the surface area is
+
+$$
+\int_a^b 2\pi f(t) \cdot \sqrt{g'(t)^2 + f'(t)^2} dt.
+$$
+
+These formulas do not add in the surface area of either of the ends.
+:::
@@ -129,7 +134,7 @@ Lets see that the surface area of an open cone follows from this formula, even t

A cone can be envisioned as rotating the function $f(x) = x\tan(\theta)$ between $0$ and $h$ around the $x$ axis. This integral yields the surface area:

-
+$$
\begin{align*}
\int_0^h 2\pi f(x) \sqrt{1 + f'(x)^2}dx &= \int_0^h 2\pi x \tan(\theta) \sqrt{1 + \tan(\theta)^2}dx \\
@@ -137,6 +142,7 @@ A cone can be envisioned as rotating the function $f(x) = x\tan(\theta)$ between
&= \pi \tan(\theta) \sec(\theta) h^2 \\
&= \pi r^2 / \sin(\theta).
\end{align*}
+$$

(There are many ways to express this; we used $r$ and $\theta$ to match the work above. If the cone is parameterized by a height $h$ and radius $r$, then the surface area of the sides is $\pi r\sqrt{h^2 + r^2}$. If the base is included, there is an additional $\pi r^2$ term.)
@@ -322,13 +328,14 @@ plot(g, f, 0, 1pi)

The integrand simplifies to $8\sqrt{2}\pi \sin(t) (1 + \cos(t))^{3/2}$. This lends itself to $u$-substitution with $u=\cos(t)$.

-
+$$
\begin{align*}
\int_0^\pi 8\sqrt{2}\pi \sin(t) (1 + \cos(t))^{3/2}
&= 8\sqrt{2}\pi \int_1^{-1} (1 + u)^{3/2} (-1) du\\
&= 8\sqrt{2}\pi (2/5) (1+u)^{5/2} \big|_{-1}^1\\
&= 8\sqrt{2}\pi (2/5) 2^{5/2} = \frac{2^7 \pi}{5}.
\end{align*}
+$$


## The first Theorem of Pappus
@@ -378,11 +385,12 @@ surface(ws..., legend=false, zlims=(-12,12))

The surface area of the sphere will be SA$=2\pi \rho (\pi r) = 2 \pi^2 r \cdot \rho$. What is $\rho$? The centroid of an arc formula can be derived in a manner similar to that of the centroid of a region. The formulas are:

-
+$$
\begin{align*}
\text{cm}_x &= \frac{1}{L} \int_a^b g(t) \sqrt{g'(t)^2 + f'(t)^2} dt\\
\text{cm}_y &= \frac{1}{L} \int_a^b f(t) \sqrt{g'(t)^2 + f'(t)^2} dt.
\end{align*}
+$$

Here, $L$ is the arc length of the curve.
diff --git a/quarto/limits/continuity.qmd b/quarto/limits/continuity.qmd
index 39ad2534..9cc18013 100644
--- a/quarto/limits/continuity.qmd
+++ b/quarto/limits/continuity.qmd
@@ -16,6 +16,8 @@ using SymPy

---

+![A Möbius strip by Koo Jeong A](figures/korean-mobius.jpg){width=40%}
+
The definition Google finds for *continuous* is *forming an unbroken whole; without interruption*.
@@ -54,12 +56,15 @@ However, [Cauchy](http://en.wikipedia.org/wiki/Cours_d%27Analyse) defined contin

The [modern](http://en.wikipedia.org/wiki/Continuous_function#History) definition simply pushes the details to the definition of the limit:

+::: {.callout-note icon=false}
+## Definition of continuity at a point

-> A function $f(x)$ is continuous at $x=c$ if $\lim_{x \rightarrow c}f(x) = f(c)$.
+A function $f(x)$ is continuous at $x=c$ if $\lim_{x \rightarrow c}f(x) = f(c)$.
+:::


-This says three things
+The definition says three things:

* The limit exists at $c$.
@@ -67,11 +72,14 @@ This says three things

* The value of the limit is the same as $f(c)$.
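Each of the three can be checked with `SymPy`. A small sketch for $u = \sin(x)/x$ at $c=0$, where the limit exists but the function value does not:

```{julia}
using SymPy
@syms x
u = sin(x)/x
limit(u, x => 0)  # the limit exists and is 1; but sin(0)/0 is indeterminate,
                  # so u is not continuous at 0 unless redefined there
```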
-This speaks to continuity at a point, we can extend this to continuity over an interval $(a,b)$ by saying:
+The definition speaks to continuity at a point; we can extend it to continuity over an interval $(a,b)$ by saying:

+::: {.callout-note icon=false}
+## Definition of continuity over an open interval

-> A function $f(x)$ is continuous over $(a,b)$ if at each point $c$ with $a < c < b$, $f(x)$ is continuous at $c$.
+A function $f(x)$ is continuous over $(a,b)$ if at each point $c$ with $a < c < b$, $f(x)$ is continuous at $c$.
+:::

Finally, as with limits, it can be convenient to speak of *right* continuity and *left* continuity at a point, where the limit in the definition is replaced by a right or left limit, as appropriate.
@@ -192,7 +200,7 @@ $$

What value of $c$ will make $f(x)$ a continuous function?

-We note that for $x < 0$ and for $x > 0$ the function is a simple polynomial, so is continuous. At $x=0$ to be continuous we need a limit to exists and be equal to $f(0)$, which is $c$. A limit exists if the left and right limits are equal. This means we need to solve for $c$ to make the left and right limits equal. We do this next with a bit of overkill in this case:
+We note that for $x < 0$ and for $x > 0$ the function is defined by a simple polynomial, so is continuous. At $x=0$ to be continuous we need a limit to exist and be equal to $f(0)$, which is $c$. A limit exists if the left and right limits are equal. This means we need to solve for $c$ to make the left and right limits equal. We do this next with a bit of overkill in this case:


```{julia}
@@ -206,7 +214,7 @@ We need to solve for $c$ to make `del` zero:


```{julia}
-solve(del, c)
+solve(del ~ 0, c)
```

This gives the value of $c$.
diff --git a/quarto/limits/figures/buzz-infinity.jpg b/quarto/limits/figures/buzz-infinity.jpg
new file mode 100644
index 00000000..b7d31990
Binary files /dev/null and b/quarto/limits/figures/buzz-infinity.jpg differ
diff --git a/quarto/limits/figures/ivt.jpg b/quarto/limits/figures/ivt.jpg
new file mode 100644
index 00000000..4987a145
Binary files /dev/null and b/quarto/limits/figures/ivt.jpg differ
diff --git a/quarto/limits/figures/korean-mobius.jpg b/quarto/limits/figures/korean-mobius.jpg
new file mode 100644
index 00000000..8816f765
Binary files /dev/null and b/quarto/limits/figures/korean-mobius.jpg differ
diff --git a/quarto/limits/intermediate_value_theorem.qmd b/quarto/limits/intermediate_value_theorem.qmd
index b6e51ca0..5711ac15 100644
--- a/quarto/limits/intermediate_value_theorem.qmd
+++ b/quarto/limits/intermediate_value_theorem.qmd
@@ -17,14 +17,19 @@ using SymPy

---

-Continuity for functions is a valued property which carries implications. In this section we discuss two: the intermediate value theorem and the extreme value theorem. These two theorems speak to some fundamental applications of calculus: finding zeros of a function and finding extrema of a function.
+![Between points M and M lies an F](figures/ivt.jpg){width=40%}
+
+Continuity for functions is a valued property which carries implications. In this section we discuss two: the intermediate value theorem and the extreme value theorem. These two theorems speak to some fundamental applications of calculus: finding zeros of a function and finding extrema of a function. [L'Hospital's](https://ia801601.us.archive.org/26/items/infinimentpetits1716lhos00uoft/infinimentpetits1716lhos00uoft.pdf) figure 55, above, suggests why.
## Intermediate Value Theorem

+::: {.callout-note icon=false}
+## The intermediate value theorem

-> The *intermediate value theorem*: If $f$ is continuous on $[a,b]$ with, say, $f(a) < f(b)$, then for any $y$ with $f(a) \leq y \leq f(b)$ there exists a $c$ in $[a,b]$ with $f(c) = y$.
+If $f$ is continuous on $[a,b]$ with, say, $f(a) < f(b)$, then for any $y$ with $f(a) \leq y \leq f(b)$ there exists a $c$ in $[a,b]$ with $f(c) = y$.
+:::


```{julia}
@@ -98,7 +103,9 @@ sign_chart(f, -3, 3)

The intermediate value theorem can find the sign of the function *between* adjacent zeros, but how are the zeros identified? Here, we use the Bolzano theorem to give an algorithm - the *bisection method* - to locate a value $c$ in $[a,b]$ with $f(c) = 0$ under the assumptions:

+
* $f$ is continuous on $[a,b]$
+
* $f$ changes sign between $a$ and $b$. (In particular, when $f(a)$ and $f(b)$ have different signs.)

::: {.callout-note}
@@ -375,11 +382,11 @@ For symbolic expressions, as below, then, as a convenience, an equation (formed

```{julia}
@syms x
-solve(cos(x) ~ x, (0, 2))
+find_zero(cos(x) ~ x, (0, 2))
```
:::

-[![Intersection of two curves as illustrated by Canadian artist Kapwani Kiwanga.](figures/intersection-biennale.jpg)](https://www.gallery.ca/whats-on/touring-exhibitions-and-loans/around-the-world/canada-pavilion-at-the-venice-biennale/kapwani-kiwanga-trinket)
+[![Intersection of two curves as illustrated by Canadian artist Kapwani Kiwanga.](figures/intersection-biennale.jpg)](https://www.gallery.ca/whats-on/touring-exhibitions-and-loans/around-the-world/canada-pavilion-at-the-venice-biennale/kapwani-kiwanga-trinket){width=40%}

##### Example: Inverse functions
@@ -496,11 +503,12 @@ For the model without wind resistance, we can graph the function easily enough.

plot(j, 0, 500)
```

-Well, we haven't even seen the peak yet. Better to do a little spade work first. This is a quadratic function, so we can use `roots` from `SymPy` to find the roots:
+Well, we haven't even seen the peak yet. Better to do a little spade work first. This is a quadratic function, so we can use `solve` from `SymPy` to find the roots:


```{julia}
-roots(j(x))
+@syms x
+solve(j(x) ~ 0, x)
```

We see that $1250$ is the largest root. So we plot over this domain to visualize the flight:
@@ -706,22 +714,25 @@ The Extreme Value Theorem is another consequence of continuity.

To discuss the extreme value theorem, we define an *absolute maximum*.

+::: {.callout-note icon=false}
+## Absolute maximum, absolute minimum

-> The absolute maximum of $f(x)$ over an interval $I$, when it exists, is the value $f(c)$, $c$ in $I$, where $f(x) \leq f(c)$ for any $x$ in $I$.
->
-> Similarly, an *absolute minimum* of $f(x)$ over an interval $I$ can be defined, when it exists, by a value $f(c)$ where $c$ is in $I$ *and* $f(c) \leq f(x)$ for any $x$ in $I$.
+The absolute maximum of $f(x)$ over an interval $I$, when it exists, is the value $f(c)$, $c$ in $I$, where $f(x) \leq f(c)$ for any $x$ in $I$.

+Similarly, an *absolute minimum* of $f(x)$ over an interval $I$ can be defined, when it exists, by a value $f(c)$ where $c$ is in $I$ *and* $f(c) \leq f(x)$ for any $x$ in $I$.
+:::

Related but different is the concept of relative or *local* extrema:

+::: {.callout-note icon=false}
+## Local maximum, local minimum

-> A local maxima for $f$ is a value $f(c)$ where $c$ is in **some** *open* interval $I=(a,b)$, $I$ in the domain of $f$, and $f(c)$ is an absolute maxima for $f$ over $I$.
Similarly, an local minima for $f$ is a value $f(c)$ where $c$ is in **some** *open* interval $I=(a,b)$, $I$ in the domain of $f$, and $f(x)$ is an absolute minima for $f$ over $I$.
-
+A local maximum for $f$ is a value $f(c)$ where $c$ is in **some** *open* interval $I=(a,b)$, $I$ in the domain of $f$, and $f(c)$ is an absolute maximum for $f$ over $I$. Similarly, a local minimum for $f$ is a value $f(c)$ where $c$ is in **some** *open* interval $I=(a,b)$, $I$ in the domain of $f$, and $f(c)$ is an absolute minimum for $f$ over $I$.

The term *local extrema* is used to describe either a local maximum or local minimum.
-
+:::

The key point is that the extrema are values in the *range* that are realized by some value in the *domain* (possibly more than one).
@@ -742,14 +753,16 @@ nothing
```

[![Elevation profile of the Hardrock 100 ultramarathon. Treating the elevation profile as a function, the absolute maximum is just about 14,000 feet and the absolute minimum about 7600 feet. These are of interest to the runner for different reasons. Also of interest would be each local maxima and local minima - the peaks and valleys of the graph - and the total elevation climbed - the latter so important/unforgettable its value makes it into the chart's title.
-](limits/figures/hardrock-100.jpeg)](https://hardrock100.com)
+](figures/hardrock-100.jpeg)](https://hardrock100.com){width=50%}

The extreme value theorem discusses an assumption that ensures absolute maximum and absolute minimum values exist.

+::: {.callout-note icon=false}
+## The extreme value theorem

-> The *extreme value theorem*: If $f(x)$ is continuous over a closed interval $[a,b]$ then $f$ has an absolute maximum and an absolute minimum over $[a,b]$.
-
+If $f(x)$ is continuous over a closed interval $[a,b]$ then $f$ has an absolute maximum and an absolute minimum over $[a,b]$.
+:::

(By continuous over $[a,b]$ we mean continuous on $(a,b)$ and right continuous at $a$ and left continuous at $b$.)
@@ -1013,7 +1026,7 @@ nothing
```

![Trajectories of potential cannonball fires with air-resistance included. (http://ej.iop.org/images/0143-0807/33/1/149/Full/ejp405251f1_online.jpg)
-](figures/cannonball.jpg)
+](figures/cannonball.jpg){width=50%}

In 1638, according to Amir D. [Aczel](http://books.google.com/books?id=kvGt2OlUnQ4C&pg=PA28&lpg=PA28&dq=mersenne+cannon+ball+tests&source=bl&ots=wEUd7e0jFk&sig=LpFuPoUvODzJdaoug4CJsIGZZHw&hl=en&sa=X&ei=KUGcU6OAKJCfyASnioCoBA&ved=0CCEQ6AEwAA#v=onepage&q=mersenne%20cannon%20ball%20tests&f=false), an experiment was performed in the French countryside. A monk, Marin Mersenne, launched a cannonball straight up into the air in an attempt to help Descartes prove facts about the rotation of the earth. Though the experiment was not successful, Mersenne later observed that the time for the cannonball to go up was greater than the time to come down.
["Vertical Projection in a Resisting Medium: Reflections on Observations of Mersenne".](http://www.maa.org/publications/periodicals/american-mathematical-monthly/american-mathematical-monthly-contents-junejuly-2014) diff --git a/quarto/limits/limits.qmd b/quarto/limits/limits.qmd index 577fb7ca..ba2221b6 100644 --- a/quarto/limits/limits.qmd +++ b/quarto/limits/limits.qmd @@ -181,9 +181,12 @@ Informally, if a limit exists it is the value that $f(x)$ gets close to as $x$ g The modern formulation is due to Weierstrass: +::: {.callout-note icon=false} +## The $\epsilon-\delta$ Definition of a limit of $f(x)$ -> The limit of $f(x)$ as $x$ approaches $c$ is $L$ if for every real $\epsilon > 0$, there exists a real $\delta > 0$ such that for all real $x$, $0 < \lvert x − c \rvert < \delta$ implies $\lvert f(x) − L \rvert < \epsilon$. The notation used is $\lim_{x \rightarrow c}f(x) = L$. +The limit of $f(x)$ as $x$ approaches $c$ is $L$ if for every real $\epsilon > 0$, there exists a real $\delta > 0$ such that for all real $x$, $0 < \lvert x − c \rvert < \delta$ implies $\lvert f(x) − L \rvert < \epsilon$. The notation used is $\lim_{x \rightarrow c}f(x) = L$. +::: We comment on this later. @@ -266,14 +269,14 @@ xs = [1/10^i for i in 1:5] This progression can be seen to be increasing. Cauchy, in his treatise, can see this through: - +$$ \begin{align*} (1 + \frac{1}{m})^n &= 1 + \frac{1}{1} + \frac{1}{1\cdot 2}(1 - \frac{1}{m}) + \\ & \frac{1}{1\cdot 2\cdot 3}(1 - \frac{1}{m})(1 - \frac{2}{m}) + \cdots \\ &+ \frac{1}{1 \cdot 2 \cdot \cdots \cdot m}(1 - \frac{1}{m}) \cdot \cdots \cdot (1 - \frac{m-1}{m}). \end{align*} - +$$ These values are clearly increasing as $m$ increases. Cauchy showed the value was bounded between $2$ and $3$ and had the approximate value above. Then he showed the restriction to integers was not necessary. Later we will use this definition for the exponential function: @@ -597,6 +600,7 @@ Hmm, the values in `ys` appear to be going to $0.5$, but then end up at $0$. Is ```{julia} +xs = [1/10^i for i in 1:8] y1s = [1 - cos(x) for x in xs] y2s = [x^2 for x in xs] [xs y1s y2s] @@ -722,7 +726,7 @@ For example, the limit at $0$ of $(1-\cos(x))/x^2$ is easily handled: limit((1 - cos(x)) / x^2, x => 0) ``` -The pair notation (`x => 0`) is used to indicate the variable and the value it is going to. A `dir` argument is used to indicate ``x \rightarrow c+`` (the default), ``x \rightarrow c-``, and ``x \rightarrow c``. +The pair notation (`x => 0`) is used to indicate the variable and the value it is going to. A `dir` argument is used to indicate $x \rightarrow c+$ (the default, or `dir="+"`), $x \rightarrow c-$ (`dir="-"`), and $x \rightarrow c$ (`dir="+-"`). ##### Example @@ -856,7 +860,7 @@ This accurately shows the limit does not exist mathematically, but `limit(ceil(x The `limit` function doesn't compute limits from the definition, rather it applies some known facts about functions within a set of rules. Some of these rules are the following. Suppose the individual limits of $f$ and $g$ always exist (and are finite) below. 
-
+$$
\begin{align*}
\lim_{x \rightarrow c} (a \cdot f(x) + b \cdot g(x)) &=
a \cdot \lim_{x \rightarrow c} f(x) + b \cdot \lim_{x \rightarrow c} g(x)
@@ -870,7 +874,7 @@ The `limit` function doesn't compute limits from the definition, rather it appli
\frac{\lim_{x \rightarrow c} f(x)}{\lim_{x \rightarrow c} g(x)}
&(\text{provided }\lim_{x \rightarrow c} g(x) \neq 0)\\
\end{align*}
-
+$$

These are verbally described as follows, assuming the individual limits exist and are finite:
@@ -920,7 +924,7 @@ $$

This is clearly related to the function $f(x) = \sin(x)/x$, which has a limit of $1$ as $x \rightarrow 0$. We see $g(x) = k f(kx)$ is the limit in question. As $kx \rightarrow 0$, though not taking a value of $0$ except when $x=0$, the limit above is $k \lim_{x \rightarrow 0} f(kx) = k \lim_{u \rightarrow 0} f(u) = k$.

-Basically when taking a limit as $x$ goes to $0$ we can multiply $x$ by any constant and figure out the limit for that. (It is as though we "go to" $0$ faster or slower. but are still going to $0$.
+Basically when taking a limit as $x$ goes to $0$ we can multiply $x$ by any constant and figure out the limit for that. (It is as though we "go to" $0$ faster or slower, but are still going to $0$.)

Similarly,
diff --git a/quarto/limits/limits_extensions.qmd b/quarto/limits/limits_extensions.qmd
index 63e2b65e..2c46b2bd 100644
--- a/quarto/limits/limits_extensions.qmd
+++ b/quarto/limits/limits_extensions.qmd
@@ -22,6 +22,8 @@ nothing

---

+![To infinity and beyond](figures/buzz-infinity.jpg){width=40%}
+
The limit of a function at $c$ need not exist for one of many different reasons. Some of these reasons can be handled with extensions to the concept of the limit, others are just problematic in terms of limits. This section covers examples of each.
@@ -97,8 +99,11 @@ But unlike the previous example, this function *would* have a limit if the defin

Let's loosen up the language in the definition of a limit to read:

-> The limit of $f(x)$ as $x$ approaches $c$ is $L$ if for every neighborhood, $V$, of $L$ there is a neighborhood, $U$, of $c$ for which $f(x)$ is in $V$ for every $x$ in $U$, except possibly $x=c$.
+::: {.callout-note icon=false}

+The limit of $f(x)$ as $x$ approaches $c$ is $L$ if for every neighborhood, $V$, of $L$ there is a neighborhood, $U$, of $c$ for which $f(x)$ is in $V$ for every $x$ in $U$, except possibly $x=c$.
+:::

The $\epsilon-\delta$ definition has $V = (L-\epsilon, L + \epsilon)$ and $U=(c-\delta, c+\delta)$. This is a rewriting of $L-\epsilon < f(x) < L + \epsilon$ as $|f(x) - L| < \epsilon$.
@@ -106,12 +111,16 @@ The $\epsilon-\delta$ definition has $V = (L-\epsilon, L + \epsilon)$ and $U=(c-

Now for the definition:

+::: {.callout-note icon=false}
+## The $\epsilon-\delta$ Definition of a right limit

-> A function $f(x)$ has a limit on the right of $c$, written $\lim_{x \rightarrow c+}f(x) = L$ if for every $\epsilon > 0$, there exists a $\delta > 0$ such that whenever $0 < x - c < \delta$ it holds that $|f(x) - L| < \epsilon$. That is, $U$ is $(c, c+\delta)$
+A function $f(x)$ has a limit on the right of $c$, written $\lim_{x \rightarrow c+}f(x) = L$ if for every $\epsilon > 0$, there exists a $\delta > 0$ such that whenever $0 < x - c < \delta$ it holds that $|f(x) - L| < \epsilon$. That is, $U$ is $(c, c+\delta)$.
+
+Similarly, a limit on the left is defined where $U=(c-\delta, c)$.
+:::

-Similarly, a limit on the left is defined where $U=(c-\delta, c)$.
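A quick numeric sketch of differing one-sided limits, using $f(x) = \lvert x\rvert/x$, which jumps at $0$:

```{julia}
f(x) = abs(x)/x         # equals 1 for x > 0 and -1 for x < 0
hs = [0.1, 0.01, 0.001]
f.(hs), f.(-hs)         # approaching 0 from the right, then from the left
```

The right limit appears to be $1$ and the left limit $-1$, so no (two-sided) limit exists at $0$.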
The `SymPy` function `limit` has a keyword argument `dir="+"` or `dir="-"` to request that a one-sided limit be formed. The default is `dir="+"`. Passing `dir="+-"` will compute both one-sided limits, and throw an error if the two are not equal, in agreement with no limit existing.
@@ -354,7 +363,7 @@ limit(g(x), x=>0, dir="+")

## Limits of sequences

-After all this, we still can't formalize the basic question asked in the introduction to limits: what is the area contained in a parabola. For that we developed a sequence of sums: $s_n = 1/2 \cdot((1/4)^0 + (1/4)^1 + (1/4)^2 + \cdots + (1/4)^n)$. This isn't a function of $x$, but rather depends only on non-negative integer values of $n$. However, the same idea as a limit at infinity can be used to define a limit.
+After all this, we still can't formalize the basic question asked in the introduction to limits: what is the area contained in a parabola? For that we developed a sequence of sums: $s_n = 1/2 \cdot((1/4)^0 + (1/4)^1 + (1/4)^2 + \cdots + (1/4)^n)$. This isn't a function of real $x$, but rather depends only on non-negative integer values of $n$. However, the same idea as a limit at infinity can be used to define a limit.

> Let $a_0,a_1, a_2, \dots, a_n, \dots$ be a sequence of values indexed by $n$. We have $\lim_{n \rightarrow \infty} a_n = L$ if for every $\epsilon > 0$ there exists an $M>0$ where if $n > M$ then $|a_n - L| < \epsilon$.
diff --git a/quarto/precalc/calculator.qmd b/quarto/precalc/calculator.qmd
index f6ed671a..4780c3de 100644
--- a/quarto/precalc/calculator.qmd
+++ b/quarto/precalc/calculator.qmd
@@ -533,7 +533,7 @@ sqrt(11^2 + 12^2)

##### Example

-A formula from statistics to compute the variance of a binomial random variable for parameters $p$ and $n$ is $\sqrt{n p (1-p)}$. Compute this value for $p=1/4$ and $n=10$.
+A formula from statistics to compute the standard deviation of a binomial random variable with parameters $p$ and $n$ is $\sqrt{n p (1-p)}$. Compute this value for $p=1/4$ and $n=10$.


```{julia}
@@ -568,20 +568,54 @@ Not all computations on a calculator are valid. For example, the Google calculat

In `Julia`, there is a richer set of error types. The value `0/0` will in fact not be an error, but rather a value `NaN`. This is a special floating point value indicating "not a number" and is the result for various operations. The output of $\sqrt{-1}$ (computed via `sqrt(-1)`) will indicate a domain error:

+```{julia}
+#| error: true
+sqrt(-1)
+```
+
+Other calls may result in an overflow error:

```{julia}
#| error: true
factorial(1000)
```

-How `Julia` handles overflow is a study in tradeoffs. For integer operations that demand high performance, `Julia` does not check for overflow. So, for example, if we are not careful strange answers can be had. Consider the difference here between powers of 2:
+How `Julia` handles overflow is a study in tradeoffs. For integer operations that demand high performance, `Julia` does not check for overflow. So, for example, if we are not careful strange answers can be had. Consider the difference here between these powers of 2:


```{julia}
-2^62, 2^63
+2^62, 2^63, 2^64
```

-On a machine with $64$-bit integers, the first of these two values is correct, the second, clearly wrong, as the answer given is negative. This is due to overflow. The cost of checking is considered too high, so no error is thrown. The user is expected to have a sense that they need to be careful when their values are quite large.
(Or the user can use floating point numbers, which though not always exact, can represent much bigger values and are exact for a reasonably wide range of integer values.)
+On a machine with $64$-bit integers, the first of these three values is correct; the second is clearly "wrong," as the answer given is negative; and the third, $0$, is clearly "wrong" as well.
+
+Wrong is in quotes as, though the answers are mathematically incorrect, computationally they are correct. The last two are due to overflow. The cost of checking is considered too high, so no error is thrown and the values represent what happens at the machine level.
+
+The user is expected to have a sense that they need to be careful when their values are quite large. But the better recommendation is that the user use floating point numbers, which is as easy as typing `2.0^63`. Though not always exact, floating point values can represent a much bigger range of values and are exact for a reasonably wide range of integer values.
+
+
+::: {.callout-note}
+## Bit-level details
+
+We can see in the following, using the smaller 8-bit type, what goes on internally with successive powers of `2`: the bit pattern is found by shifting the previous one over to the left, consistent with what happens at the bit level when multiplying by `2`:
+
+```{julia}
+[bitstring(Int8(2)^i) for i in 1:8]
+```
+
+The last line is similar to what happens to `2^64`, which is also `0`, as seen. The second to last line requires some understanding of how integers are represented internally. Of the 8 bits, the last 7 represent which powers of `2` (`2^6`, `2^5`, ..., `2^1`, `2^0`) are included. The first bit, when `1`, represents `-2^7`. So all zeros, as we see `Int8(2)^8` is, means `0`, but `"10000000"` means `-2^7`. The largest *positive* number would be represented as `"01111111"` or ``2^6 + 2^5 + \cdots + 2^1 + 2^0 = 2^7 - 1``. These values can be seen using `typemin` and `typemax`:
+
+```{julia}
+typemin(Int8), typemax(Int8)
+```
+For 64-bit, these are:
+
+```{julia}
+typemin(Int), typemax(Int)
+```
+
+And we see `2^63` is also just `typemin(Int)`.
+:::
diff --git a/quarto/precalc/exp_log_functions.qmd b/quarto/precalc/exp_log_functions.qmd
index c8d416ef..d3e4694f 100644
--- a/quarto/precalc/exp_log_functions.qmd
+++ b/quarto/precalc/exp_log_functions.qmd
@@ -8,7 +8,8 @@ This section uses the following add-on packages:

```{julia}
using CalculusWithJulia
-using Plots; plotly()
+using Plots
+plotly()
```
@@ -27,7 +28,7 @@ The family of exponential functions is defined by $f(x) = a^x, -\infty< x < \inf

For a given $a$, defining $a^n$ for positive integers is straightforward, as it means multiplying $n$ copies of $a.$ From this, for *integer powers*, the key properties of exponents: $a^x \cdot a^y = a^{x+y}$, and $(a^x)^y = a^{x \cdot y}$ are immediate consequences. For example with $x=3$ and $y=2$:

-
+$$
\begin{align*}
a^3 \cdot a^2 &= (a\cdot a \cdot a) \cdot (a \cdot a) \\
&= (a \cdot a \cdot a \cdot a \cdot a) \\
@@ -36,7 +37,7 @@ a^3 \cdot a^2 &= (a\cdot a \cdot a) \cdot (a \cdot a) \\
&= (a\cdot a \cdot a \cdot a\cdot a \cdot a) \\
&= a^6 = a^{3\cdot 2}.
\end{align*}
-
+$$

For $a \neq 0$, $a^0$ is defined to be $1$.
@@ -167,7 +168,7 @@ Later we will see an easy way to certify this statement.

##### The mathematical constant $e$

-Euler's number, $e$, may be defined several ways. One way is to define $e^x$ by the limit $(1+x/n)^n$. Then $e=e^1$. The value is an irrational number. This number turns up to be the natural base to use for many problems arising in Calculus.
In `Julia` there are a few mathematical constants that get special treatment, so that when needed, extra precision is available. The value `e` is not immediately assigned to this value, rather `ℯ` is. This is typed `\euler[tab]`. The label `e` is thought too important for other uses to reserve the name for representing a single number. However, users can issue the command `using Base.MathConstants` and `e` will be available to represent this number. When the `CalculusWithJulia` package is loaded, the value `e` is defined to be the floating point number returned by `exp(1)`. This loses the feature of arbitrary precision, but has other advantages.
+Euler's number, $e$, may be defined several ways. One way is to define $e^x$ by the limit as $n$ grows infinitely large of $(1+x/n)^n$. Then $e=e^1$. The value is an irrational number. This number turns out to be the natural base to use for many problems arising in calculus. In `Julia` there are a few mathematical constants that get special treatment, so that when needed, extra precision is available. The value `e` is not immediately assigned to this value, rather `ℯ` is. This is typed `\euler[tab]`. The label `e` is thought too important for other uses to reserve the name for representing a single number. However, users can issue the command `using Base.MathConstants` and `e` will be available to represent this number. When the `CalculusWithJulia` package is loaded, the value `e` is defined to be the floating point number returned by `exp(1)`. This loses the feature of arbitrary precision, but has other advantages.

A [cute](https://www.mathsisfun.com/numbers/e-eulers-number.html) appearance of $e$ is in this problem: Let $a>0$. Cut $a$ into $n$ equal pieces and then multiply them. What $n$ will produce the largest value? Note that the formula is $(a/n)^n$ for a given $a$ and $n$.
@@ -215,32 +216,32 @@ co2_1970 = [(1860, 293), (1870, 293), (1880, 294), (1890, 295), (1900, 297),
co2_2021 = [(1960, 318), (1970, 325), (1980, 338), (1990, 358), (2000, 370),
(2010, 390), (2020, 415)]

-xs,ys = unzip(co2_1970)
-plot(xs, ys, legend=false)
+plot(co2_1970, legend=false) # vector of points interface
+plot!(co2_2021)

-𝒙s, 𝒚s = unzip(co2_2021)
-plot!(𝒙s, 𝒚s)
+exp_model(;r, x0, P0) = x -> P0 * exp(r * (x - x0))

r = 0.002
-x₀, P₀ = 1960, 313
-plot!(x -> P₀ * exp(r * (x - x₀)), 1950, 1990, linewidth=5, alpha=0.25)
+x0, P0 = 1960, 313
+plot!(exp_model(; r, x0, P0), 1950, 1990, linewidth=5, alpha=0.25)
+
+r = 0.005
+x0, P0 = 2000, 370

-𝒓 = 0.005
-𝒙₀, 𝑷₀ = 2000, 370
-plot!(x -> 𝑷₀ * exp(𝒓 * (x - 𝒙₀)), 1960, 2020, linewidth=5, alpha=0.25)
+plot!(exp_model(; r, x0, P0), 1960, 2020, linewidth=5, alpha=0.25)
```

-(The `unzip` function is from the `CalculusWithJulia` package and will be explained in a subsequent section.) We can see that the projections from the year $1970$ hold up fairly well.
+We can see that the projections from the year $1970$ hold up fairly well.

-On this plot we added two *exponential* models. at $1960$ we added a *roughly* $0.2$ percent per year growth (a rate mentioned in an accompanying caption) and at $2000$ a roughly $0.5$ percent per year growth. The former barely keeping up with the data.
+On this plot we added two *exponential* models. At $1960$ we added a *roughly* $0.2$ percent per year growth (a rate mentioned in an accompanying caption) and at $2000$ a roughly $0.5$ percent per year growth. The former barely keeps up with the data. (To do so, we used a parameterized function, making for easier code reuse.)
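As a usage sketch, the `exp_model` closure defined above makes such predictions easy to query; for example, the second model's value at $2020$ can be compared with the recorded $415$:

```{julia}
m = exp_model(r=0.005, x0=2000, P0=370)
m(2020)   # ≈ 409, a bit under the 415 recorded above for 2020
```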
The word **roughly** above could be made exact. Suppose we knew that between $1960$ and $1970$ the level went from $313$ to $320$. If this followed an exponential model, then $r$ above would satisfy:


$$
-P_{1970} = P_{1960} e^{r * (1970 - 1960)}
+P_{1970} = P_{1960} e^{r \cdot (1970 - 1960)}
$$

or on division $320/313 = e^{r\cdot 10}$. Solving for $r$ can be done – as explained next – and yields $0.002211\dots$.
@@ -271,6 +272,7 @@ xs = range(-2, stop=2, length=100)
ys = f.(xs)
plot(xs, ys, color=:blue, label="2ˣ") # plot f
plot!(ys, xs, color=:red, label="f⁻¹") # plot f^(-1)
+
xs = range(1/4, stop=4, length=100)
plot!(xs, log2.(xs), color=:green, label="log₂") # plot log2
```
@@ -306,10 +308,10 @@ The half-life of a radioactive material is the time it takes for half the materi

The carbon $14$ isotope is a naturally occurring isotope on Earth, appearing in trace amounts. Unlike Carbon $12$ and $13$ it decays, in this case with a half life of $5730$ years (plus or minus $40$ years). In a [technique](https://en.wikipedia.org/wiki/Radiocarbon_dating) due to Libby, measuring the amount of Carbon 14 present in an organic item can indicate the time since death. The amount of Carbon $14$ at death is essentially that of the atmosphere, and this amount decays over time. So, for example, if roughly half the carbon $14$ remains, then the death occurred about $5730$ years ago.

-A formula for the amount of carbon $14$ remaining $t$ years after death would be $P(t) = P_0 \cdot 2^{-t/5730}$.
+A formula for the expected amount of carbon $14$ remaining $t$ years after death would be $P(t) = P_0 \cdot 2^{-t/5730}$.


-If $1/10$ of the original carbon $14$ remains, how old is the item? This amounts to solving $2^{-t/5730} = 1/10$. We have: $-t/5730 = \log_2(1/10)$ or:
+If $1/10$ of the original carbon $14$ remains, how old is the item? This amounts to solving $2^{-t/5730} = 1/10$. Applying the inverse function, we have $-t/5730 = \log_2(1/10)$, or:


```{julia}
@@ -382,19 +384,16 @@ a^{(\log_b(x)/\log_b(a))} =
(b^{\log_b(a)})^{(\log_b(x)/\log_b(a))} =
b^{\log_b(a) \cdot \log_b(x)/\log_b(a) } = b^{\log_b(x)} = x.
$$

-In short, we have these three properties of logarithmic functions:
-
-
-If $a, b$ are positive bases; $u,v$ are positive numbers; and $x$ is any real number then:
-
+In short, we have these three properties of logarithmic functions when $a, b$ are positive bases; $u,v$ are positive numbers; and $x$ is any real number:

+$$
\begin{align*}
\log_a(uv) &= \log_a(u) + \log_a(v), \\
\log_a(u^x) &= x \log_a(u), \text{ and} \\
\log_a(u) &= \log_b(u)/\log_b(a).
\end{align*}
-
+$$

##### Example
diff --git a/quarto/precalc/functions.qmd b/quarto/precalc/functions.qmd
index df4bd966..de069b4a 100644
--- a/quarto/precalc/functions.qmd
+++ b/quarto/precalc/functions.qmd
@@ -7,7 +7,8 @@ This section will use the following add-on packages:


```{julia}
-using CalculusWithJulia, Plots
+using CalculusWithJulia
+using Plots
plotly()
```
@@ -153,10 +154,14 @@ The definition of $s(x)$ above has two cases:


$$
-s(x) = \begin{cases} -1 & s < 0\\ 1 & s > 0. \end{cases}
+s(x) =
+\begin{cases}
+-1 & x < 0 \\
+ 1 & x > 0.
+\end{cases}
$$

-We learn to read this as: when $s$ is less than $0$, then the answer is $-1$. If $s$ is greater than $0$ the answer is $1.$ Often - but not in this example - there is an "otherwise" case to catch those values of $x$ that are not explicitly mentioned. As there is no such "otherwise" case here, we can see that this function has no definition when $x=0$.
This function is often called the "sign" function and is also defined by $\lvert x\rvert/x$. (`Julia`'s `sign` function actually defines `sign(0)` to be `0`.)
+We learn to read this as: when $x$ is less than $0$, then the answer is $-1$. If $x$ is greater than $0$ the answer is $1.$ Often - but not in this example - there is an "otherwise" case to catch those values of $x$ that are not explicitly mentioned. As there is no such "otherwise" case here, we can see that this function has no definition when $x=0$. This function is often called the "sign" function and is also defined by $\lvert x\rvert/x$. (`Julia`'s `sign` function actually defines `sign(0)` to be `0`.)

How do we create conditional statements in `Julia`? Programming languages generally have "if-then-else" constructs to handle conditional evaluation. In `Julia`, the following code will handle the above condition:
@@ -333,8 +338,8 @@ We utilize this as follows. Suppose we wish to solve $f(x) = 0$ and we have two

```{julia}
q(x) = x^2 - 2
-𝒂, 𝒃 = 1, 2
-𝒄 = secant_intersection(q, 𝒂, 𝒃)
+a, b = 1, 2
+c = secant_intersection(q, a, b)
```

In our example, we see that in trying to find an answer to $f(x) = 0$ ( $\sqrt{2}\approx 1.414\dots$) our value found from the intersection point is a better guess than either $a=1$ or $b=2$:


```{julia}
#| echo: false
-plot(q, 𝒂, 𝒃, linewidth=5, legend=false)
-plot!(zero, 𝒂, 𝒃)
-plot!([𝒂, 𝒃], q.([𝒂, 𝒃]))
-scatter!([𝒄], [q(𝒄)])
+plot(q, a, b, linewidth=5, legend=false)
+plot!(zero, a, b)
+plot!([a, b], q.([a, b]))
+scatter!([c], [q(c)])
```

-Still, `q(𝒄)` is not really close to $0$:
+Still, `q(c)` is not really close to $0$:


```{julia}
-q(𝒄)
+q(c)
```

*But* it is much closer than either $q(a)$ or $q(b)$, so it is an improvement. This suggests renaming $a$ and $b$ with the old $b$ and $c$ values and trying again; we might do better still:


```{julia}
#| hold: true
-𝒂, 𝒃 = 𝒃, 𝒄
-𝒄 = secant_intersection(q, 𝒂, 𝒃)
-q(𝒄)
+a, b = b, c
+c = secant_intersection(q, a, b)
+q(c)
```

Yes, now the function value at this new $c$ is even closer to $0$. Trying a few more times we see we just get closer and closer. Here we start again to see the progress:


```{julia}
#| hold: true
-𝒂,𝒃 = 1, 2
+a,b = 1, 2
for step in 1:6
-    𝒂, 𝒃 = 𝒃, secant_intersection(q, 𝒂, 𝒃)
-    current = (c=𝒃, qc=q(𝒃))
+    a, b = b, secant_intersection(q, a, b)
+    current = (c=b, qc=q(b))
    @show current
end
```
@@ -486,7 +491,7 @@ During this call, values for `m` and `b` are found from how the function is call

mxplusb(0; m=3, b=2)
```

-Keywords are used to mark the parameters whose values are to be changed from the default. Though one can use *positional arguments* for parameters - and there are good reasons to do so - using keyword arguments is a good practice if performance isn't paramount, as their usage is more explicit yet the defaults mean that a minimum amount of typing needs to be done. Keyword arguments are widely used with plotting commands, as there are numerous options to adjust, but typically only a handful adjusted per call.
+Keywords are used to mark the parameters whose values are to be changed from the default.
Though one can use *positional arguments* for parameters - and there are good reasons to do so - using keyword arguments is a good practice if performance isn't paramount, as their usage is more explicit yet the defaults mean that a minimum amount of typing needs to be done.
+
+
+Keyword arguments are widely used with plotting commands, as there are numerous options to adjust, but typically only a handful are adjusted per call. The `Plots` package, whose commands we illustrate throughout these notes starting with the next section, has this in its docs: `Plots.jl` follows two simple rules with data and attributes:
+
+* Positional arguments correspond to input data
+* Keyword arguments correspond to attributes
+

##### Example
@@ -526,7 +538,14 @@ p = (g=9.8, v0=200, theta = 45*pi/180, k=1/2)

trajectory(100, p)
```

-The style isn't so different from using keyword arguments, save the extra step of unpacking the parameters. The *big* advantage is consistency – the function is always called in an identical manner regardless of the number of parameters (or variables).
+The style isn't so different from using keyword arguments, save the extra step of unpacking the parameters. Unpacking as above has shortcut syntax to extract selected fields by name:
+
+```{julia}
+(; v0, theta) = p
+v0, theta
+```
+
+The *big* advantage of bundling parameters into a container is consistency – the function is always called in an identical manner regardless of the number of parameters (or variables).

## Multiple dispatch
@@ -568,7 +587,7 @@ methods(log, (Number,))

##### Example

-A common usage of multiple dispatch is, as is done with `log` above, to restrict the type of an argument and define a method for just this type. Types in `Julia` can be abstract or concrete. This distinction is important when construction *composite types* (which we are not doing here), but otherwise not so important. In the following example, we use the abstract types `Integer`, `Real`, and `Complex` to define methods for a function we call `twotox`:
+A common usage of multiple dispatch is, as is done with `log` above, to restrict the type of an argument and define a method for just this type. Types in `Julia` can be abstract or concrete. This distinction is important when constructing *composite types* (which we are not doing here), but otherwise not so important. In the following example, we use the abstract types `Integer`, `Real`, and `Complex` to define methods for a function we call `twotox`:

```{julia}
function twotox(x::Integer)
@@ -706,7 +725,7 @@ When the `->` is seen a function is being created.

:::{.callout-warning}
## Warning

-Generic versus anonymous functions. Julia has two types of functions, generic ones, as defined by `f(x)=x^2` and anonymous ones, as defined by `x -> x^2`. One gotcha is that `Julia` does not like to use the same variable name for the two types. In general, Julia is a dynamic language, meaning variable names can be reused with different types of variables. But generic functions take more care, as when a new method is defined it gets added to a method table. So repurposing the name of a generic function for something else is not allowed. Similarly, repurposing an already defined variable name for a generic function is not allowed. This comes up when we use functions that return functions as we have different styles that can be used: When we defined `l = shift_right(f, c=3)` the value of `l` is assigned an anonymous function. This binding can be reused to define other variables.
However, we could have defined the function `l` through `l(x) = shift_right(f, c=3)(x)`, being explicit about what happens to the variable `x`. This would add a method to the generic function `l`. Meaning, we get an error if we tried to assign a variable to `l`, such as an expression like `l=3`. We generally employ the latter style, even though it involves a bit more typing, as we tend to stick to methods of generic functions for consistency.
+Generic versus anonymous functions. Julia has two types of functions, generic ones, as defined by `f(x)=x^2` and anonymous ones, as defined by `x -> x^2`. One gotcha is that `Julia` does not like to use the same variable name for the two types. In general, Julia is a dynamic language, meaning variable names can be reused with different types of variables. But generic functions take more care, as when a new method is defined it gets added to a method table. So repurposing the name of a generic function for something else is not allowed. Similarly, repurposing an already defined variable name for a generic function is not allowed. This comes up when we use functions that return functions as we have different styles that can be used: When we defined `l = shift_right(f, c=3)` the value of `l` is assigned an anonymous function. This binding can be reused to define other variables. However, we could have defined the function `l` through `l(x) = shift_right(f, c=3)(x)`, being explicit about what happens to the variable `x`. This would add a method to the generic function `l`. Meaning, we get an error if we try to assign a variable to `l`, such as an expression like `l=3`. The latter style is inefficient, so is not preferred.
:::
diff --git a/quarto/precalc/inversefunctions.qmd b/quarto/precalc/inversefunctions.qmd
index b650c3c5..9864556e 100644
--- a/quarto/precalc/inversefunctions.qmd
+++ b/quarto/precalc/inversefunctions.qmd
@@ -40,10 +40,12 @@ plot(f, 0, 4, legend=false)
plot!([2,2,0], [0,f(2),f(2)])
```

-The graph of a function is a representation of points $(x,f(x))$, so to *find* $f(c)$ from the graph, we begin on the $x$ axis at $c$, move vertically to the graph (the point $(c, f(c))$), and then move horizontally to the $y$ axis, intersecting it at $f(c)$. The figure shows this for $c=2$, from which we can read that $f(c)$ is about $4$. This is how an $x$ is associated to a single $y$.
+The graph of a function is a representation of points $(x,f(x))$, so to *find* $y = f(c)$ from the graph, we begin on the $x$ axis at $c$, move vertically to the graph (the point $(c, f(c))$), and then move horizontally to the $y$ axis, intersecting it at $y = f(c)$. The figure shows this for $c=2$, from which we can read that $f(c)$ is about $4$. This is how an $x$ is associated to a single $y$.

-If we were to *reverse* the direction, starting at $f(c)$ on the $y$ axis and then moving horizontally to the graph, and then vertically to the $x$-axis we end up at a value $c$ with the correct $f(c)$. This operation will form a function **if** the initial movement horizontally is guaranteed to find *no more than one* value on the graph. That is, to have an inverse function, there can not be two $x$ values corresponding to a given $y$ value. This observation is often visualized through the "horizontal line test" - the graph of a function with an inverse function can only intersect a horizontal line at most in one place.
+
+If we were to *reverse* the direction, starting at $y = f(c)$ on the $y$ axis and then moving horizontally to the graph, and then vertically to the $x$-axis, we end up at a value $c$ with the correct $f(c)$. This allows solving for $x$ knowing $y$ in $y=f(x)$.
+
+The operation described will form a function **if** the initial movement horizontally is guaranteed to find *no more than one* value on the graph. That is, to have an inverse function, there can not be two $x$ values corresponding to a given $y$ value. This observation is often visualized through the "horizontal line test" - the graph of a function with an inverse function can only intersect a horizontal line at most in one place.

More formally, a function is called *one-to-one* *if* for any two $a \neq b$, it must be that $f(a) \neq f(b)$. Many functions are one-to-one, many are not. Familiar one-to-one functions are linear functions ($f(x)=a \cdot x + b$ with $a\neq 0$), odd powers of $x$ ($f(x)=x^{2k+1}$), and functions of the form $f(x)=x^{1/n}$ for $x \geq 0$. In contrast, all *even* functions are *not* one-to-one, as $f(x) = f(-x)$ for any nonzero $x$ in the domain of $f$.
@@ -70,13 +72,13 @@ However, typically we have a rule describing our function. What is the process t

When we solve algebraically for $x$ in $y=9/5 \cdot x + 32$ we do the same thing as we do verbally: we subtract $32$ from each side, and then divide by $9/5$ to isolate $x$:

-
+$$
\begin{align*}
y &= 9/5 \cdot x + 32\\
y - 32 &= 9/5 \cdot x\\
(y-32) / (9/5) &= x.
\end{align*}
-
+$$

From this, we have the function $g(y) = (y-32) / (9/5)$ is the inverse function of $f(x) = 9/5\cdot x + 32$.
@@ -102,7 +104,7 @@ Suppose a transformation of $x$ is given by $y = f(x) = (ax + b)/(cx+d)$. This f

From the expression $y=f(x)$ we *algebraically* solve for $x$:

-
+$$
\begin{align*}
y &= \frac{ax +b}{cx+d}\\
y \cdot (cx + d) &= ax + b\\
ycx - ax &= b - yd\\
(cy-a) \cdot x &= b - dy\\
x &= -\frac{dy - b}{cy-a}.
\end{align*}
-
+$$

We see that to solve for $x$ we need to divide by $cy-a$, so this expression can not be zero. So, using $x$ as the dummy variable, we have
@@ -128,14 +130,14 @@ The function $f(x) = (x-1)^5 + 2$ is strictly increasing and so will have an inv

Again, we solve algebraically starting with $y=(x-1)^5 + 2$ and solving for $x$:

-
+$$
\begin{align*}
y &= (x-1)^5 + 2\\
y - 2 &= (x-1)^5\\
(y-2)^{1/5} &= x - 1\\
(y-2)^{1/5} + 1 &= x.
\end{align*}
-
+$$

We see that $f^{-1}(x) = 1 + (x - 2)^{1/5}$. The fact that the power $5$ is an odd power is important, as this ensures a unique (real) solution to the fifth root of a value, in the above $y-2$.
@@ -171,14 +173,14 @@ The [inverse function theorem](https://en.wikipedia.org/wiki/Inverse_function_th

Consider the function $f(x) = (1+x^2)^{-1}$. This bell-shaped function is even (symmetric about $0$), so can not possibly be one-to-one. However, if the domain is restricted to $[0,\infty)$ it is. The restricted function is strictly decreasing and its inverse is found, as follows:

-
+$$
\begin{align*}
y &= \frac{1}{1 + x^2}\\
1+x^2 &= \frac{1}{y}\\
x^2 &= \frac{1}{y} - 1\\
x &= \sqrt{(1-y)/y}, \quad 0 < y \leq 1.
\end{align*}
-
+$$

Then $f^{-1}(x) = \sqrt{(1-x)/x}$ where $0 < x \leq 1$. The somewhat complicated restriction for the domain coincides with the range of $f(x)$. We shall see next that this is no coincidence.
@@ -268,9 +270,9 @@ We drew a line connecting $(1/2, f(1/2))$ to $(f(1/2),1/2)$.
We can see that it

One consequence of this symmetry is that if $f$ is strictly increasing, then so is its inverse.

-
-!!!note In the above we used `cbrt(x)` and not `x^(1/3)`. The latter usage assumes that $x \geq 0$ as it isn't guaranteed that for all real exponents the answer will be a real number. The `cbrt` function knows there will always be a real answer and provides it.
-
+::: {.callout-note}
+In the above we used `cbrt(x)` and not `x^(1/3)`. The latter usage assumes that $x \geq 0$ as it isn't guaranteed that for all real exponents the answer will be a real number. The `cbrt` function knows there will always be a real answer and provides it.
+:::

### Lines
diff --git a/quarto/precalc/julia_overview.qmd b/quarto/precalc/julia_overview.qmd
index 8680e974..bac1c2d6 100644
--- a/quarto/precalc/julia_overview.qmd
+++ b/quarto/precalc/julia_overview.qmd
@@ -32,7 +32,7 @@ The [https://mybinder.org/](https://mybinder.org/) service in particular allows

* [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/jverzani/CalculusWithJuliaBinder.jl/sympy?labpath=blank-notebook.ipynb) (Image with SymPy, longer to load)

-[Google colab](https://colab.research.google.com/) offers a free service with more computing power than `binder`, though setup is a bit more fussy. To use `colab`, you need to execute a command that downloads `Julia` and installs the `CalculusWithJulia` package and a plotting package. (Modify the `pkg"add ..."` command to add other desired packages):
+[Google colab](https://colab.research.google.com/) offers a free service with more computing power than `binder`, though setup is a bit more fussy. To use `colab` along with these notes, you need to execute a command that downloads `Julia` and installs the `CalculusWithJulia` package and a plotting package. (Modify the `pkg"add ..."` command to add other desired packages; update the julia version as necessary):

```
# Installation cell
@@ -50,6 +50,9 @@ julia -e 'using Pkg; Pkg.add(url="https://github.com/mth229/BinderPlots.jl")'
echo 'Now change the runtime type'
```

+(`BinderPlots` is a light-weight, barebones plotting package that uses `PlotlyLight` to render graphics with commands mostly following those of the `Plots` package. Though suitable for most examples herein, the `Plots` package could instead be installed.)
+
+
After this executes (which can take quite some time, as in a few minutes) under the `Runtime` menu select `Change runtime type` and then select `Julia`. After that, in a cell execute these commands to load the two installed packages:
@@ -63,11 +66,10 @@ As mentioned, other packages can be chosen for installation.



## Interacting with `Julia`

-At a basic level, `Julia` provides a means to read commands or instructions, evaluate those commands, and then print or return those commands. At a user level, there are many different ways to interact with the reading and printing. For example:
+At a basic level, `Julia` provides an interactive means to read commands or instructions, evaluate those commands, and then print or return those commands. At a user level, there are many different ways to interact with the reading and printing. For example:

* The REPL. The `Julia` terminal is the built-in means to interact with `Julia`. A `Julia` Terminal has a command prompt, after which commands are typed and then sent to be evaluated by the `enter` key. The terminal may look something like the following where `2+2` is evaluated:

```
$ julia
               _
   _       _ _(_)_     |
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   | Type "?"
for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
-  | | |_| | | | (_| |  | Version 1.7.0 (2021-11-30)
+  | | |_| | | | (_| |  | Version 1.11.1 (2024-10-16)
 _/ |\__'_|_|_|\__'_|  | Official https://julialang.org/ release
|__/                   |
@@ -104,8 +106,8 @@ The `Pluto` interface has some idiosyncrasies that need explanation:

  * Cells can only have one command within them. Multiple-command cells must be contained in a `begin` block or a `let` block.
  * By default, the cells are *reactive*. This means when a variable in one cell is changed, then any references to that variable are also updated – like a spreadsheet. This is fantastic for updating several computations at once. However it means variable names can not be repeated within a page. Pedagogically, it is convenient to use variable names and function names (e.g., `x` and `f`) repeatedly, but this is only possible *if* they are within a `let` block or a function body.
-  * To not repeat names, but to be able to reference a value from cell-to-cell, some Unicode variants are used within a page. Visually these look familiar, but typing the names requires some understanding of Unicode input. The primary usages is *bold italic* (e.g., `\bix[tab]` or `\bif[tab]`) or *bold face* (e.g. `\bfx[tab]` or `\bff[tab]`).
-  * The notebooks snapshot the packages they depend on, which is great for reproducibility, but may mean older versions are silently used.
+  * To not repeat names, but to be able to reference a value from cell-to-cell, some Unicode variants may be used within a page. Visually these look familiar, but typing the names requires some understanding of Unicode input. The primary usage is *bold italic* (e.g., `\bix[tab]` or `\bif[tab]`) or *bold face* (e.g. `\bfx[tab]` or `\bff[tab]`).
+  * The notebooks snapshot the packages they depend on, which is great for reproducibility, but may lead to older versions of the packages being silently used.

## Augmenting base `Julia`
@@ -152,7 +154,7 @@ This command instructs `Julia` to look at its *general registry* for the `Calcul

:::{.callout-note}
## Note

-In a terminal setting, there is a package mode, entered by typing `]` as the leading character and exited by entering `` at a blank line. This mode allows direct access to `Pkg` with a simpler syntax. The command above would be just `add CalculusWithJulia`.)
+In a terminal setting, there is a package mode, entered by typing `]` as the leading character and exited by entering `backspace` at a blank line. This mode allows direct access to `Pkg` with a simpler syntax. The command above would be just `add CalculusWithJulia`. As well, when a package is not installed, calling `using SomePackage` will prompt the user if they wish to install the package in the current environment.)
:::
@@ -167,11 +169,11 @@ For these notes, the following packages, among others, are used:

```{julia}
#| eval: false
-Pkg.add("CalculusWithJulia") # for some simplifying functions and a few packages (SpecialFunctions, ForwardDiff)
-Pkg.add("Plots") # for basic plotting
-Pkg.add("SymPy") # for symbolic math
-Pkg.add("Roots") # for solving `f(x)=0`
-Pkg.add("QuadGk") # for integration
+Pkg.add("CalculusWithJulia") # for some convenience functions and a few packages (SpecialFunctions, ForwardDiff)
+Pkg.add("Plots")             # for basic plotting
+Pkg.add("SymPy")             # for symbolic math
+Pkg.add("Roots")             # for numerically solving `f(x)=0` and `f(x)=g(x)`
+Pkg.add("QuadGK")            # for 1-dimensional numeric integration
Pkg.add("HCubature")         # for higher-dimensional integration
```
@@ -236,7 +238,7 @@ Integer operations may silently overflow, producing odd answers, at first glance
2^64
```

-(Though the output is predictable, if overflow is taken into consideration appropriately.)
+(Though the output is predictable, knowing why requires understanding of how the hardware implements these operations.)

When different types of numbers are mixed, `Julia` will usually promote the values to a common type before the operation:
@@ -249,7 +251,10 @@ When different types of numbers are mixed, `Julia` will usually promote the valu

`Julia` will first add `2` and `1//2` promoting `2` to rational before doing so. Then add the result, `5//2` to `0.5` by promoting `5//2` to the floating point number `2.5` before proceeding.

-`Julia` uses a special type to store a handful of irrational constants such as `pi`. The special type allows these constants to be treated without round off, until they mix with other floating point numbers. There are some functions that require these be explicitly promoted to floating point. This can be done by calling `float`.
+`Julia` uses a special type to store a handful of irrational constants such as `pi`. The special type allows these constants to be treated without round off, until they mix with other floating point numbers. An irrational value for `e` is not exported; the `CalculusWithJulia` package exports a floating point value `e=exp(1)`.
+
+
+There are some functions that require these be explicitly promoted to floating point. This can be done by calling `float`.

The standard mathematical operations are implemented by `+`, `-`, `*`, `/`, `^`. Parentheses are used for grouping.
@@ -275,6 +280,9 @@ Values will be promoted to a common type (or type `Any` if none exists). For exa

(Vectors are used as a return type from some functions, as such, some familiarity is needed.)

+Other common container types are variants of vectors (higher-dimensional arrays, offset arrays, etc.); tuples (for heterogeneous, immutable, indexed values); named tuples (which add a name to each value in a tuple); and dictionaries (for associative relationships between a key and a value).
+
+
Regular arithmetic sequences can be defined by either:
@@ -325,7 +333,7 @@ u + a_really_long_name + a0 - b0 + α

Within `Pluto`, names are idiosyncratic: within the global scope, only a single usage is possible per notebook; functions and variables can be freely renamed; structures can be redefined or renamed; ...

-Outside of `Pluto`, names may be repurposed, even with values of different types (`Julia` is a dynamic language), save for (generic) function names, which have some special rules and can only be redefined as another function. Generic functions are central to `Julia`'s design. 
Generic functions use a method table to dispatch on, so once a name is assigned to a generic function, it can not be used as a variable name; the reverse is also true.
+Outside of `Pluto`, names may be repurposed, even with values of different types (`Julia` is a dynamic language), save for (generic) function names, which have some special rules and can only be redefined as another method for the function. Generic functions are central to `Julia`'s design. Generic functions use a method table to dispatch on, so once a name is assigned to a generic function, it can not be used as a variable name; the reverse is also true.

## Functions
@@ -362,7 +370,7 @@ log

Besides `^`, there are `sqrt` and `cbrt` for powers. In addition basic functions for exponential and logarithmic functions:

-```{verbatim}
+```
sqrt, cbrt
exp
log # base e
@@ -375,7 +383,7 @@ log10, log2, # also log(b, x)

The `6` standard trig functions are implemented; their implementations for degree arguments; their inverse functions; and the hyperbolic analogs.

-```{verbatim}
+```
sin, cos, tan, csc, sec, cot
asin, acos, atan, acsc, asec, acot
sinh, cosh, tanh, csch, sech, coth
@@ -385,7 +393,7 @@ asinh, acosh, atanh, acsch, asech, acoth

If degrees are preferred, the following are defined to work with arguments in degrees:

-```{verbatim}
+```
sind, cosd, tand, cscd, secd, cotd
```
@@ -444,7 +452,7 @@ a = 4
f(3)  # now 2 * 4 + 3
```

-User-defined functions can have $0$, $1$ or more arguments:
+User-defined functions can have $0$, $1$ or more positional arguments:

```{julia}
@@ -454,7 +462,7 @@ area(w, h) = w*h

Julia makes different *methods* for *generic* function names, so function definitions whose argument specification is different are for different uses, even if the name is the same. This is *polymorphism*. The practical use is that it means users need only remember a much smaller set of function names, as attempts are made to give common expectations to the same name. (That is, `+` should be used only for "add"-ing objects, however defined.)

-Functions can be defined with *keyword* arguments that may have defaults specified:
+Functions can also be defined with *keyword* arguments that may have defaults specified:

```{julia}
@@ -465,11 +473,13 @@ f(1, m=10)  # uses m=10, b=0 -> 10 * 1 + 0
f(1, m=10, b=5)  # uses m=10, b=5 -> 10 * 1 + 5
```

+Keyword arguments are not considered for dispatch.
+
Longer functions can be defined using the `function` keyword; the last command executed is returned:

```{julia}
-function 𝒇(x)
+function f(x)
  y = x^2
  z = y - 3
  z
@@ -534,7 +544,7 @@ sin.(xs)  # gives back [sin(1), sin(2), sin(3), sin(4), sin(5)]

For "infix" operators, the dot precedes the operator, as in this example instructing pointwise multiplication of each element in `xs`:

-```{juila}
+```{julia}
xs .* xs
```
@@ -566,8 +576,9 @@ With `Plots` loaded, we can plot a function by passing the function object by na
plot(sin, 0, 2pi)  # plot a function - by name - over an interval [a,b]
```

-!!! note This is in the form of **the** basic pattern employed: `verb(function_object, arguments...)`. The verb in this example is `plot`, the object `sin`, the arguments `0, 2pi` to specify `[a,b]` domain to plot over.
-
+::: {.callout-note}
+This is in the form of **the** basic pattern employed: `verb(function_object, arguments...)`. The verb in this example is `plot`, the object `sin`, the arguments `0, 2pi` to specify the `[a,b]` domain to plot over. 
+:::

Plotting more than one function over `[a,b]` is achieved through the `plot!` function, which modifies the existing plot (`plot` creates a new one) by adding a new layer:
@@ -578,6 +589,8 @@ plot!(cos, 0, 2pi)
plot!(zero, 0, 2pi)    # add the line y=0
```

+(There are alternatives that plot several functions or other traces all at once.)
+
Individual points are added with `scatter` or `scatter!`:
@@ -587,7 +600,7 @@ plot!(cos, 0, 2pi)
scatter!([pi/4, pi+pi/4], [sin(pi/4), sin(pi + pi/4)])
```

-(The extra argument `legend=false` suppresses the automatic legend drawing. There are many other useful arguments to adjust a graphic. For example, passing `markersize=10` to the `scatter!` command would draw the points larger than the default.)
+(The extra argument `legend=false` suppresses the automatic legend drawing. There are many other useful keyword arguments to adjust attributes of a trace of a graphic. For example, passing `markersize=10` to the `scatter!` command would draw the points larger than the default.)

Plotting an *anonymous* function is a bit more immediate than the two-step approach of defining a named function then calling `plot` with this as an argument:
@@ -607,6 +620,20 @@ ys = [sin(2x) + sin(3x) + sin(4x) for x in xs]
plot(xs, ys)
```

+There are different plotting interfaces. Though not shown, all of these `plot` commands produce a plot of `f`, though with minor differences:
+
+```{julia}
+#| eval: false
+xs = range(a, b, length=251)
+ys = f.(xs)
+plot(f, a, b)                # recipe for a function
+plot(xs, f)                  # alternate recipe
+plot(xs, ys)                 # plot coordinates as two vectors
+plot([(x,f(x)) for x in xs]) # plot a vector of points
+```
+
+The choice should depend on convenience.
+

## Equations
@@ -623,7 +650,7 @@ In `Julia` the equals sign is **only** for *assignment* and *mutation*. The *lef

## Symbolic math

-Symbolic math is available through an add-on package `SymPy` (among others). Once loaded, symbolic variables are created with the macro `@syms`:
+Symbolic math is available through an add-on package `SymPy` (among others). Once loaded, symbolic variables in `SymPy` are created with the macro `@syms`:

```{julia}
diff --git a/quarto/precalc/numbers_types.qmd b/quarto/precalc/numbers_types.qmd
index 49c54a8b..4f517daa 100644
--- a/quarto/precalc/numbers_types.qmd
+++ b/quarto/precalc/numbers_types.qmd
@@ -62,8 +62,7 @@ Similarly, each type is printed slightly differently.

The key distinction is between integers and floating points. While floating point values include integers, and so can be used exclusively on the calculator, the difference is that an integer is guaranteed to be an exact value, whereas a floating point value, while often an exact representation of a number, is also often just an *approximate* value. This can be an advantage – floating point values can model a much wider range of numbers.

-
-Now in nearly all cases the differences are not noticeable. Take for instance this simple calculation involving mixed types.
+In nearly all cases the differences are not noticeable. Take for instance this simple calculation involving mixed types.

```{julia}
@@ -89,10 +88,10 @@ These values are *very* small numbers, but not exactly $0$, as they are mathemat

---

-The only common issue is with powers. `Julia` tries to keep a predictable output from the input types (not their values). Here are the two main cases that arise where this can cause unexpected results:
+The only common issue is with powers. We saw this previously when discussing a distinction between `2^64` and `2.0^64`. 
`Julia` tries to keep a predictable output from the input types (not their values). Here are the two main cases that arise where this can cause unexpected results:

- * integer bases and integer exponents can *easily* overflow. Not only `m^n` is always an integer, it is always an integer with a fixed storage size computed from the sizes of `m` and `n`. So the powers can quickly get too big. This can be especially noticeable on older $32$-bit machines, where too big is $2^{32} = 4,294,967,296$. On $64$-bit machines, this limit is present but much bigger.
+* integer bases and integer exponents can *easily* overflow. Not only `m^n` is always an integer, it is always an integer with a fixed storage size computed from the sizes of `m` and `n`. So the powers can quickly get too big. This can be especially noticeable on older $32$-bit machines, where too big is $2^{32} = 4,294,967,296$. On $64$-bit machines, this limit is present but much bigger.

Rather than give an error though, `Julia` gives seemingly arbitrary answers, as can be seen in this example on a $64$-bit machine:
@@ -102,13 +101,13 @@ Rather than give an error though, `Julia` gives seemingly arbitrary answers, as
2^62, 2^63
```

-(They aren't arbitrary, rather integer arithmetic is implemented as modular arithmetic.)
+(They aren't arbitrary, as explained previously.)


-This could be worked around, as it is with some programming languages, but it isn't, as it would slow down this basic computation. So, it is up to the user to be aware of cases where their integer values can grow to big. The suggestion is to use floating point numbers in this domain, as they have more room, at the cost of sometimes being approximate values.
+This could be worked around, as it is with some programming languages, but it isn't, as it would slow down this basic computation. So, it is up to the user to be aware of cases where their integer values can grow too big. The suggestion is to use floating point numbers in this domain, as they have more room, at the cost of sometimes being only approximate for fairly large values.


- * the `sqrt` function will give a domain error for negative values:
+* the `sqrt` function will give a domain error for negative values:

```{julia}
@@ -161,11 +160,12 @@ Integers are often used casually, as they come about from parsing. As with a cal

### Floating point numbers

-[Floating point](http://en.wikipedia.org/wiki/Floating_point) numbers are a computational model for the real numbers. For floating point numbers, $64$ bits are used by default for both $32$- and $64$-bit systems, though other storage sizes can be requested. This gives a large ranging - but still finite - set of real numbers that can be represented. However, there are infinitely many real numbers just between $0$ and $1$, so there is no chance that all can be represented exactly on the computer with a floating point value. Floating point then is *necessarily* an approximation for all but a subset of the real numbers. Floating point values can be viewed in normalized [scientific notation](http://en.wikipedia.org/wiki/Scientific_notation) as $a\cdot 2^b$ where $a$ is the *significand* and $b$ is the *exponent*. Save for special values, the significand $a$ is normalized to satisfy $1 \leq \lvert a\rvert < 2$, the exponent can be taken to be an integer, possibly negative.
+[Floating point](http://en.wikipedia.org/wiki/Floating_point) numbers are a computational model for the real numbers. 
For floating point numbers, $64$ bits are used by default for both $32$- and $64$-bit systems, though other storage sizes can be requested. This gives a large - but still finite - set of real numbers that can be represented. However, there are infinitely many real numbers just between $0$ and $1$, so there is no chance that all can be represented exactly on the computer with a floating point value. Floating point then is *necessarily* an approximation for all but a subset of the real numbers. Floating point values can be viewed in normalized [scientific notation](http://en.wikipedia.org/wiki/Scientific_notation) as $a\cdot 2^b$ where $a$ is the *significand* and $b$ is the *exponent*. Save for special values, the significand $a$ is normalized to satisfy $1 \leq \lvert a\rvert < 2$, the exponent can be taken to be an integer, possibly negative.

As per IEEE Standard 754, the `Float64` type gives 52 bits to the precision (with an additional implied one), 11 bits to the exponent and the other bit is used to represent the sign. Positive, finite, floating point numbers have a range approximately between $10^{-308}$ and $10^{308}$, as 308 is about $\log_{10} 2^{1023}$. The numbers are not evenly spread out over this range, but, rather, are much more concentrated closer to $0$.

+The use of 32-bit floating point values is common, as some widely used computer chips expect this. These values have a narrower range of possible values.

:::{.callout-warning}
## More on floating point numbers
@@ -205,8 +205,10 @@ The special coding `aeb` (or if the exponent is negative `ae-b`) is used to repr

avogadro = 6.022e23
```

-Here `e` is decidedly *not* the Euler number, rather syntax to separate the exponent from the mantissa.
-
+::: {.callout-note}
+## Not `e`
+Here `e` is decidedly *not* the Euler number, rather **syntax** to separate the exponent from the mantissa.
+:::

The first way of representing this number required using `10.0` and not `10` as the integer power will return an integer and even for 64-bit systems is only valid up to `10^18`. Using scientific notation avoids having to concentrate on such limitations.
@@ -296,7 +298,7 @@ That is adding `1/10` and `2/10` is not exactly `3/10`, as expected mathematical

1/10 + (2/10 + 3/10) == (1/10 + 2/10) + 3/10
```

- * For real numbers subtraction of similar-sized numbers is not exceptional, for example $1 - \cos(x)$ is positive if $0 < x < \pi/2$, say. This will not be the case for floating point values. If $x$ is close enough to $0$, then $\cos(x)$ and $1$ will be so close, that they will be represented by the same floating point value, `1.0`, so the difference will be zero:
+ * Mathematically, for real numbers, subtraction of similar-sized numbers is not exceptional, for example $1 - \cos(x)$ is positive if $0 < x < \pi/2$, say. This will not be the case for floating point values. If $x$ is close enough to $0$, then $\cos(x)$ and $1$ will be so close that they will be represented by the same floating point value, `1.0`, so the difference will be zero:

```{julia}
@@ -306,7 +308,7 @@ That is adding `1/10` and `2/10` is not exactly `3/10`, as expected mathematical

### Rational numbers

-Rational numbers can be used when the exactness of the number is more important than the speed or wider range of values offered by floating point numbers. In `Julia` a rational number is comprised of a numerator and a denominator, each an integer of the same type, and reduced to lowest terms. 
The operations of addition, subtraction, multiplication, and division will keep their answers as rational numbers. As well, raising a rational number to a positive, integer value will produce a rational number. +Rational numbers can be used when the exactness of the number is more important than the speed or wider range of values offered by floating point numbers. In `Julia` a rational number is comprised of a numerator and a denominator, each an integer of the same type, and reduced to lowest terms. The operations of addition, subtraction, multiplication, and division will keep their answers as rational numbers. As well, raising a rational number to an integer value will produce a rational number. As mentioned, these are constructed using double slashes: @@ -643,5 +645,5 @@ Finding the value through division introduces a floating point deviation. Which #| echo: false as = [`1/10^21`, `1e-21`] explanation = "The scientific notation is correct. Due to integer overflow `10^21` is not the same number as `10.0^21`." -buttonq(as, 2; explanation=explanation) +buttonq(as, 2; explanation) ``` diff --git a/quarto/precalc/plotting.qmd b/quarto/precalc/plotting.qmd index a574087e..8386bf39 100644 --- a/quarto/precalc/plotting.qmd +++ b/quarto/precalc/plotting.qmd @@ -30,7 +30,7 @@ A scalar, univariate function, such as $f(x) = 1 - x^2/2$, can be thought of in * It can be represented through a rule of what it does to $x$, as above. This is useful for computing numeric values. * it can be interpreted verbally, as in *square* $x$, take half then *subtract* from one. This can give clarity to what the function does. - * It can be thought of in terms of its properties: a polynomial, continuous, $U$-shaped, an approximation for $\cos(x)$ near $0$, $\dots$ + * It can be thought of in terms of its properties: a polynomial, continuous, upside down $U$-shaped, an approximation for $\cos(x)$ near $0$, $\dots$ * it can be visualized graphically. This is useful for seeing the qualitative behavior of a function. @@ -243,15 +243,15 @@ For instances where a *specific* set of $x$ values is desired to be used, the `r ```{julia} -𝒙s = range(0, 2pi, length=10) -𝒚s = sin.(𝒙s) +xs = range(0, 2pi, length=10) +ys = sin.(xs) ``` Finally, to plot the set of points and connect with lines, the $x$ and $y$ values are passed along as vectors: ```{julia} -plot(𝒙s, 𝒚s) +plot(xs, ys) ``` This plots the points as pairs and then connects them in order using straight lines. Basically, it creates a dot-to-dot graph. The above graph looks primitive, as it doesn't utilize enough points. @@ -490,7 +490,7 @@ For plotting points with `scatter`, or `scatter!` the markers can be adjusted vi * `scatter(..., marker=:square)`: change the marker (uses a symbol, not a string to specify) -Of course, zero, one, or more of these can be used on any given call to `plot`, `plot!`, `scatter` or `scatter!`. +Of course, zero, one, or more of these can be used on any given call to `plot`, `plot!`, `scatter`, or `scatter!`. 
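+
+As a minimal sketch combining several of these keyword attributes in one figure (the particular values are arbitrary, chosen only for illustration):
+
+```{julia}
+#| eval: false
+plot(sin, 0, 2pi; linewidth=3, color=:red, legend=false)  # line attributes
+scatter!([pi/2, 3pi/2], [1, -1]; marker=:square, markersize=8)  # marker attributes
+```
+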
#### Example: Bresenham's algorithm
@@ -575,9 +575,9 @@ The most "famous" parametric graph is one that is likely already familiar, as it

```{julia}
-𝒇(x) = cos(x); 𝒈(x) = sin(x)
-𝒕s = range(0, 2pi, length=100)
-plot(𝒇.(𝒕s), 𝒈.(𝒕s), aspect_ratio=:equal)   # make equal axes
+f(x) = cos(x); g(x) = sin(x)
+ts = range(0, 2pi, length=100)
+plot(f.(ts), g.(ts), aspect_ratio=:equal)   # make equal axes
```

Any point $(a,b)$ on this graph is represented by $(\cos(t), \sin(t))$ for some value of $t$, and in fact multiple values of $t$, since $t + 2k\pi$ will produce the same $(a,b)$ value as $t$ will.
@@ -599,8 +599,8 @@ DataFrame(θ=θs, x=cos.(θs), y=sin.(θs))

```{julia}
#| hold: true
θs =[0, pi/6, pi/4, pi/3, pi/2, 2pi/3, 3pi/4, 5pi/6, pi]
-plot(𝒇.(θs), 𝒈.(θs), legend=false, aspect_ratio=:equal)
-scatter!(𝒇.(θs), 𝒈.(θs))
+plot(f.(θs), g.(θs), legend=false, aspect_ratio=:equal)
+scatter!(f.(θs), g.(θs))
```

---
@@ -610,7 +610,7 @@ As with the plot of a univariate function, there is a convenience interface for

```{julia}
-plot(𝒇, 𝒈, 0, 2pi, aspect_ratio=:equal)
+plot(f, g, 0, 2pi, aspect_ratio=:equal)
```

##### Example
@@ -876,6 +876,24 @@ answ = 2
radioq(choices, answ)
```

+###### Question
+
+The `plot` function can have its data specified through a vector of points. Such data can be generated through a comprehension. Does this command plot the given expression avoiding the need to define a function?
+
+```{julia}
+#| eval: false
+plot([(x, sin(x^2)-sin(x)) for x in range(-pi, pi, 100)])
+```
+
+```{julia}
+#| echo: false
+explanation = """
+Yes, it does. Whether this is more convenient than say `plot(x -> sin(x^2) - sin(x), -pi, pi)` is a different question.
+"""
+buttonq(["Yes", "No"], 1; explanation)
+```
+
+
###### Question

diff --git a/quarto/precalc/polynomial.qmd b/quarto/precalc/polynomial.qmd
index 495841f8..260a6962 100644
--- a/quarto/precalc/polynomial.qmd
+++ b/quarto/precalc/polynomial.qmd
@@ -28,7 +28,7 @@ Polynomials are a particular class of expressions that are simple enough to have

Here we discuss some vocabulary and basic facts related to polynomials and show how the add-on `SymPy` package can be used to model polynomial expressions within `SymPy`. `SymPy` provides a Computer Algebra System (CAS) for `Julia`. In this case, by leveraging a mature `Python` package [SymPy](https://www.sympy.org/). Later we will discuss the `Polynomials` package for polynomials.

-For our purposes, a *monomial* is simply a non-negative integer power of $x$ (or some other indeterminate symbol) possibly multiplied by a scalar constant. For example, $5x^4$ is a monomial, as are constants, such as $-2=-2x^0$ and the symbol itself, as $x = x^1$. In general, one may consider restrictions on where the constants can come from, and consider more than one symbol, but we won't pursue this here, restricting ourselves to the case of a single variable and real coefficients.
+For our purposes, a *monomial* is simply a non-negative integer power of $x$ (or some other indeterminate symbol) possibly multiplied by a scalar constant. For example, $5x^4$ is a monomial, as are constants, such as $-2$ (it being $-2x^0$) and the symbol $x$ itself (it being $x^1$). In general, one may consider restrictions on where the constants can come from, and consider more than one symbol, but we won't pursue this here, restricting ourselves to the case of a single variable and real coefficients.

A *polynomial* is a sum of monomials. 
After combining terms with the same powers, a non-zero polynomial may be written uniquely as:
@@ -158,7 +158,7 @@ The above shows that multiple symbols can be defined at once. The annotation `x:

:::{.callout-note}
## Note
-Macros in `Julia` are just transformations of the syntax into other syntax. The `@` indicates they behave differently than regular function calls.
+Macros in `Julia` are just transformations of the syntax into other syntax. The leading `@` indicates they behave differently than regular function calls.
:::

@@ -235,7 +235,7 @@ To illustrate, to do the task above for the polynomial $-16x^2 + 100$ we could h

p(x => (x-1)^2)
```

-This "call" notation takes pairs (designated by `a=>b`) where the left-hand side is the variable to substitute for, and the right-hand side the new value. The value to substitute can depend on the variable, as illustrated; be a different variable; or be a numeric value, such as $2$:
+This "call" notation takes pairs (designated by `a=>b`) where the left-hand side is the variable to substitute for, and the right-hand side the new value. (This mirrors a similar use with `replace`.) The value to substitute can depend on the variable, as illustrated; be a different variable; or be a numeric value, such as $2$:

```{julia}
@@ -264,15 +264,15 @@ Suppose we have the polynomial $p = ax^2 + bx +c$. What would it look like if we

```{julia}
@syms E F
-p₂ = a*x^2 + b*x + c
-p₂(x => x-E) + F
+p = a*x^2 + b*x + c
+p(x => x-E) + F
```

And expanded this becomes:

```{julia}
-expand(p₂(x => x-E) + F)
+expand(p(x => x-E) + F)
```

### Conversion of symbolic numbers to Julia numbers
@@ -287,7 +287,17 @@ p = -16x^2 + 100

y = p(2)
```

-The value, $36$ is still symbolic, but clearly an integer. If we are just looking at the output, we can easily translate from the symbolic value to an integer, as they print similarly. However the conversion to an integer, or another type of number, does not happen automatically. If a number is needed to pass along to another `Julia` function, it may need to be converted. In general, conversions between different types are handled through various methods of `convert`. However, with `SymPy`, the `N` function will attempt to do the conversion for you:
+The value, $36$, is still symbolic, but clearly an integer. If we are just looking at the output, we can easily translate from the symbolic value to an integer, as they print similarly. However, the conversion to an integer, or another type of number, does not happen automatically. If a number is needed to pass along to another `Julia` function, it may need to be converted. In general, conversions between different types are handled through various methods of `convert`.
+
+For real numbers, an easy-to-call conversion is available through the `float` method:
+
+```{julia}
+float(y)
+```
+
+
+
+The use of the generic `float` method returns a floating point number. `SymPy` objects have their own internal types. To preserve these on conversion to a related `Julia` value, the `N` function from `SymPy` is useful:

```{julia}
@@ -296,7 +306,7 @@ p = -16x^2 + 100

N(p(2))
```

-Where `convert(T,x)` requires a specification of the type to convert `x` to, `N` attempts to match the data type used by SymPy to store the number. As such, the output type of `N` may vary (rational, a BigFloat, a float, etc.) For getting more digits of accuracy, a precision can be passed to `N`. 
The following command will take the symbolic value for $\pi$, `PI`, and produce about $60$ digits worth as a `BigFloat` value:
+Where `convert(T, x)` requires a specification of the type to convert `x` to, `N` attempts to match the data type used by SymPy to store the number. As such, the output type of `N` may vary (rational, a BigFloat, a float, etc.) For getting more digits of accuracy, a precision can be passed to `N`. The following command will take the symbolic value for $\pi$, `PI`, and produce about $60$ digits worth as a `BigFloat` value:

```{julia}
@@ -329,7 +339,7 @@ pp = lambdify(p)
pp(2)
```

-The `lambdify` function uses the name of the similar `SymPy` function which is named after Pythons convention of calling anoynmous function "lambdas." The use above is straightforward. Only slightly more complicated is the use when there are multiple symbolic values. For example:
+The `lambdify` function uses the name of the similar `SymPy` function which is named after Python's convention of calling anonymous functions "lambdas." The use above is straightforward. Only slightly more complicated is the use when there are multiple symbolic values. For example:

```{julia}
@@ -366,7 +376,7 @@ This graph illustrates the key features of polynomial graphs:

* there may be values for `x` where the graph crosses the $x$ axis (real roots of the polynomial);
* there may be peaks and valleys (local maxima and local minima)
- * except for constant polynomials, the ultimate behaviour for large values of $\lvert x\rvert$ is either both sides of the graph going to positive infinity, or negative infinity, or as in this graph one to the positive infinity and one to negative infinity. In particular, there is no *horizontal asymptote*.
+ * except for constant polynomials, the ultimate behaviour for large values of $|x|$ is either both sides of the graph going to positive infinity, or negative infinity, or as in this graph one to the positive infinity and one to negative infinity. In particular, there is no *horizontal asymptote*.

To investigate this last point, let's consider the case of the monomial $x^n$. When $n$ is even, the following animation shows that larger values of $n$ have greater growth once outside of $[-1,1]$:
@@ -390,7 +400,7 @@ plotly()
ImageFile(imgfile, caption)
```

-Of course, this is expected, as, for example, $2^2 < 2^4 < 2^6 < \cdots$. The general shape of these terms is similar - $U$ shaped, and larger powers dominate the smaller powers as $\lvert x\rvert$ gets big.
+Of course, this is expected, as, for example, $2^2 < 2^4 < 2^6 < \cdots$. The general shape of these terms is similar - $U$ shaped, and larger powers dominate the smaller powers as $|x|$ gets big.

For odd powers of $n$, the graph of the monomial $x^n$ is no longer $U$ shaped, but rather constantly increasing. This graph of $x^5$ is typical:
@@ -443,7 +453,7 @@ $$
x^5 \cdot (2 - \frac{1}{x^4} + \frac{1}{x^5}).
$$

-For large $\lvert x\rvert$, the last two terms in the product on the right get close to $0$, so this expression is *basically* just $2x^5$ - the leading term.
+For large $|x|$, the last two terms in the product on the right get close to $0$, so this expression is *basically* just $2x^5$ - the leading term.

---
@@ -523,7 +533,7 @@ The `SymPy` function `expand` will perform these algebraic manipulations without
expand((x-1)*(x-2)*(x-3))
```

-Factoring a polynomial is several weeks worth of lessons, as there is no one-size-fits-all algorithm to follow. 
There are some tricks that are taught: for example factoring differences of perfect squares, completing the square, the rational root theorem, $\dots$. But in general the solution is not automated. The `SymPy` function `factor` will find all rational factors (terms like $(qx-p)$), but will leave terms that do not have rational factors alone. For example:
+Factoring a polynomial is several weeks' worth of lessons, as there is no one-size-fits-all algorithm available to teach to algebra students. There are some tricks that are taught: for example factoring differences of perfect squares, completing the square, the rational root theorem, $\dots$. But in general the solution is not automated without some more advanced techniques. The `SymPy` function `factor` will find all rational factors (terms like $(qx-p)$), but will leave terms that do not have rational factors alone. For example:

```{julia}
@@ -563,7 +573,7 @@ However, *polynomial functions* are easily represented by `Julia`, for example,

f(x) = -16x^2 + 100
```

-The distinction is subtle, the expression is turned into a function just by adding the `f(x) =` preface. But to `Julia` there is a big distinction. The function form never does any computation until after a value of $x$ is passed to it. Whereas symbolic expressions can be manipulated quite freely before any numeric values are specified.
+The distinction is subtle: the expression is turned into a function just by adding the "`f(x) =`" preface. But to `Julia` there is a big distinction. The function form never does any computation until after a value of $x$ is passed to it. Whereas symbolic expressions can be manipulated quite freely before any numeric values are specified.

It is easy to create a symbolic expression from a function - just evaluate the function on a symbolic value:
@@ -581,7 +591,7 @@ p = f(x)

p(2)
```

-For many uses, the distinction is unnecessary to make, as the many functions will work with any callable expression. One such is `plot` – either `plot(f, a, b)` or `plot(f(x),a, b)` will produce the same plot using the `Plots` package.
+For many uses, the distinction is unnecessary to make, as many functions will work with any callable expression. For `Plots` there is a recipe – either `plot(f, a, b)` or `plot(f(x), a, b)` will produce the same plot using the `Plots` package.

## Questions
diff --git a/quarto/precalc/polynomial_roots.qmd b/quarto/precalc/polynomial_roots.qmd
index a04b6f8c..8f5dd794 100644
--- a/quarto/precalc/polynomial_roots.qmd
+++ b/quarto/precalc/polynomial_roots.qmd
@@ -288,33 +288,35 @@ Finding roots with `SymPy` can also be done through its `solve` function, a func

```{julia}
-solve(x^2 + 2x - 3)
+solve(x^2 + 2x - 3 ~ 0, x)
```

-The answer is a vector of values that when substituted in for the free variable `x` produce $0.$ The call to `solve` does not have an equals sign. To solve a more complicated expression of the type $f(x) = g(x),$ one can solve $f(x) - g(x) = 0,$ use the `Eq` function, or use `f ~ g`.
+The answer is a vector of values that when substituted in for the free variable `x` produce $0.$
+We use the `~` notation to define an equation to pass to `solve`. This convention is not necessary here, as `SymPy` will assume an expression passed to `solve` is an equation set to `0`, but it is pedagogically useful. Equations do not have an equals sign, which is reserved for assignment. To solve a more complicated expression of the type $f(x) = g(x),$ one can solve $f(x) - g(x) = 0,$ use the `Eq` function, or use `f ~ g`. 
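+
+As a small sketch of the `f ~ g` form (our own example, reusing the symbolic `x` from above), the equation $x^2 = x + 1$ could be solved directly, or after moving everything to one side:
+
+```{julia}
+#| eval: false
+solve(x^2 ~ x + 1, x)          # the f ~ g form
+solve(x^2 - (x + 1) ~ 0, x)    # the equivalent f - g = 0 form
+```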
-
-When the expression to solve has more than one free variable, the variable to solve for should be explicitly stated with a second argument. For example, here we show that `solve` is aware of the quadratic formula:
+
+When the expression to solve has more than one free variable, the variable to solve for should be explicitly stated with a second argument. (With just one free variable, as above, this specification is not strictly necessary.) For example, here we show that `solve` is aware of the quadratic formula:

```{julia}
@syms a b::real c::positive
-solve(a*x^2 + b*x + c, x)
+solve(a*x^2 + b*x + c ~ 0, x)
```

The `solve` function will respect assumptions made when a variable is defined through `symbols` or `@syms`:

```{julia}
-solve(a^2 + 1) # works, as a can be complex
+solve(a^2 + 1 ~ 0, a) # works, as a can be complex
```

```{julia}
-solve(b^2 + 1) # fails, as b is assumed real
+solve(b^2 + 1 ~ 0, b) # fails, as b is assumed real
```

```{julia}
-solve(c + 1) # fails, as c is assumed positive
+solve(c + 1 ~ 0, c) # fails, as c is assumed positive
```

Previously, it was mentioned that `factor` only factors polynomials with integer coefficients over rational roots. However, `solve` can be used to factor. Here is an example:
@@ -328,7 +330,7 @@ Nothing is found, as the roots are $\pm \sqrt{2}$, irrational numbers.

```{julia}
-rts = solve(x^2 - 2)
+rts = solve(x^2 - 2 ~ 0, x)
prod(x-r for r in rts)
```

Solving cubics and quartics can be done exactly using radicals. For example, her

```{julia}
@syms y # possibly complex
-solve(y^4 - 2y - 1)
+solve(y^4 - 2y - 1 ~ 0, y)
```

Third- and fourth-degree polynomials can be solved in general, with increasingly more complicated answers. The following finds one of the answers for a general third-degree polynomial:

```{julia}
#| hold: true
@syms a[0:3]
p = sum(a*x^(i-1) for (i,a) in enumerate(a))
-rts = solve(p, x)
+rts = solve(p ~ 0, x)
rts[1] # there are three roots
```

Some fifth degree polynomials are solvable in terms of radicals, however, `solve

```{julia}
-solve(x^5 - x + 1)
+solve(x^5 - x + 1 ~ 0, x)
```

(Though there is no formula involving only radicals like the quadratic equation, there is a formula for the roots in terms of a function called the [Bring radical](http://en.wikipedia.org/wiki/Bring_radical).)
@@ -399,7 +401,7 @@ The `solve` function can be used to get numeric approximations to the roots. It

```{julia}
#| hold: true
-rts = solve(x^5 - x + 1 ~ 0)
+rts = solve(x^5 - x + 1 ~ 0, x)
N.(rts)     # note the `.(` to broadcast over all values in rts
```

Here we see another example:

```{julia}
ex = x^7 -3x^6 + 2x^5 -1x^3 + 2x^2 + 1x^1 - 2
-solve(ex)
+solve(ex ~ 0, x)
```

This finds two of the seven possible roots; the remainder of the real roots can be found numerically:

```{julia}
-N.(solve(ex))
+N.(solve(ex ~ 0, x))
```

### The solveset function

-SymPy is phasing in the `solveset` function to replace `solve`. The main reason being that `solve` has too many different output types (a vector, a dictionary, ...). The output of `solveset` is always a set. For tasks like this, which return a finite set, we use the `elements` function to access the individual answers. To illustrate:
+SymPy is phasing in the `solveset` function to replace `solve`. The main reason being that `solve` has too many different output types (a vector, a dictionary, ...). The output of `solveset` is always a set. 
For tasks like this, which return a finite set, we use the `collect` function to access the individual answers. To illustrate:

```{julia}
p = 8x^4 - 8x^2 + 1
-p_rts = solveset(p)
+p_rts = solveset(p ~ 0, x)
```

The `p_rts` object, a `Set`, does not allow indexed access to its elements. For that `collect` will work to return a vector:
@@ -443,7 +445,7 @@ To get the numeric approximation, we can broadcast:

```{julia}
-N.(solveset(p))
+N.(solveset(p ~ 0, x))
```

(There is no need to call `collect` -- though you can -- as broadcasting over a set falls back to broadcasting over the iteration of the set and in this case returns a vector.)
@@ -497,7 +499,7 @@ in fact there are three, two are *very* close together:

```{julia}
-N.(solve(h))
+N.(solve(h ~ 0, x))
```

:::{.callout-note}
@@ -545,7 +547,7 @@ For another example, if we looked at $f(x) = x^5 - 100x^4 + 4000x^3 - 80000x^2 +

```{julia}
j = x^5 - 100x^4 + 4000x^3 - 80000x^2 + 799999x - 3199979
-N.(solve(j))
+N.(solve(j ~ 0, x))
```

### Cauchy's bound on the magnitude of the real roots.
diff --git a/quarto/precalc/polynomials_package.qmd b/quarto/precalc/polynomials_package.qmd
index 0d58b39c..288bd31a 100644
--- a/quarto/precalc/polynomials_package.qmd
+++ b/quarto/precalc/polynomials_package.qmd
@@ -93,7 +93,7 @@ This variable is a `Polynomial` object, so can be manipulated as a polynomial; w

r = (x-2)^2 * (x-1) * (x+1)
```

-The product is expanded for storage by `Polynomials`, which may not be desirable for some uses. A new variable can produced by calling `variable()`; so we could have constructed `p` by:
+The product is expanded for storage by `Polynomials`, which may not be desirable for some uses. A new variable can be produced by calling `variable()`; so we could have constructed `p` by:

```{julia}
@@ -106,7 +106,7 @@ A polynomial in factored form, as `r` above is, can be constructed from its root

```{julia}
-fromroots([2,2,1,-1])
+fromroots([2, 2, 1, -1])
```

The `fromroots` function is basically the [factor theorem](https://en.wikipedia.org/wiki/Factor_theorem) which links the factored form of the polynomial with the roots of the polynomial: $(x-k)$ is a factor of $p$ if and only if $k$ is a root of $p$. By combining a factor of the type $(x-k)$ for each specified root, the polynomial can be constructed by multiplying its factors. For example, using `prod` and a generator, we would have:

```{julia}
#| hold: true
x = variable()
-prod(x - k for k in [2,2,1,-1])
+prod(x - k for k in [2, 2, 1, -1])
```

The `Polynomials` package has different ways to represent polynomials, and a factored form can also be used. For example, the `fromroots` function constructs polynomials from the specified roots and `FactoredPolynomial` leaves these in a factored form:
@@ -156,11 +156,11 @@ Polynomial objects have a plot recipe defined – plotting from the `Plots` pack

plot(r, legend=false) # suppress the legend
```

-The choice of domain is heuristically identified; and it can be manually adjusted, as with:
+The choice of domain is heuristically identified; it can be manually adjusted, as with:

```{julia}
-plot(r, 1.5, 2.5, legend=false)
+plot(r, 1.5, 2.5; legend=false)
```

## Roots
@@ -205,6 +205,13 @@ p = (x-1)^5

Polynomials.Multroot.multroot(p)
```

+Converting to the `FactoredPolynomial` type also does this work:
+
+```{julia}
+convert(FactoredPolynomial, p)
+```
+
+
Floating point error can also prevent the finding of real roots. 
For example, this polynomial has $3$ real roots, but `roots` finds but $1$, as the two nearby ones are identified as complex:
@@ -251,21 +258,22 @@ A line, $y=mx+b$ can be a linear polynomial or a constant depending on $m$, so w

Knowing we can succeed, we approach the problem of $3$ points, say $(x_0, y_0)$, $(x_1,y_1)$, and $(x_2, y_2)$. There is a polynomial $p = a\cdot x^2 + b\cdot x + c$ with $p(x_i) = y_i$. This gives $3$ equations for the $3$ unknown values $a$, $b$, and $c$:

+$$
\begin{align*}
a\cdot x_0^2 + b\cdot x_0 + c &= y_0\\
a\cdot x_1^2 + b\cdot x_1 + c &= y_1\\
a\cdot x_2^2 + b\cdot x_2 + c &= y_2\\
\end{align*}
+$$

-Solving this with `SymPy` is tractable. A comprehension is used below to create the $3$ equations; the `zip` function is a simple means to iterate over $2$ or more iterables simultaneously:
+Solving this with `SymPy` is tractable. A generator is used below to create the $3$ equations; the `zip` function is a simple means to iterate over $2$ or more iterables simultaneously:

```{julia}
SymPy.@syms a b c xs[0:2] ys[0:2]
-eqs = [a*xi^2 + b*xi + c ~ yi for (xi,yi) in zip(xs, ys)]
-abc = SymPy.solve(eqs, [a,b,c])
+eqs = tuple((a*xi^2 + b*xi + c ~ yi for (xi,yi) in zip(xs, ys))...)
+abc = SymPy.solve(eqs, (a,b,c))
```

As can be seen, the terms do get quite unwieldy when treated symbolically. Numerically, the `fit` function from the `Polynomials` package will return the interpolating polynomial. To compare,
@@ -294,8 +302,8 @@ A related problem, that will arise when finding iterative means to solve for zer

```{julia}
#| hold: true
SymPy.@syms a b c xs[0:2] ys[0:2]
-eqs = [a*yi^2 + b*yi + c ~ xi for (xi, yi) in zip(xs,ys)]
-abc = SymPy.solve(eqs, [a,b,c])
+eqs = tuple((a*yi^2 + b*yi + c ~ xi for (xi, yi) in zip(xs,ys))...)
+abc = SymPy.solve(eqs, (a,b,c))
abc[c]
```

We can graphically see the result for the specific values of `xs` and `ys` as fo

#| hold: true
#| echo: false
SymPy.@syms a b c xs[0:2] ys[0:2]
-eqs = [a*yi^2 + b*yi + c ~ xi for (xi, yi) in zip(xs,ys)]
-abc = SymPy.solve(eqs, [a,b,c])
+eqs = tuple((a*yi^2 + b*yi + c ~ xi for (xi, yi) in zip(xs,ys))...)
+abc = SymPy.solve(eqs, (a,b,c))
abc[c]

𝒙s, 𝒚s = [1,2,3], [3,1,2]
@@ -390,12 +398,13 @@ radioq(choices, answ, keep_order=true)

Consider the polynomial $p(x) = a_1 x - a_3 x^3 + a_5 x^5$ where

+$$
\begin{align*}
a_1 &= 4(\frac{3}{\pi} - \frac{9}{16}) \\
a_3 &= 2a_1 -\frac{5}{2}\\
a_5 &= a_1 - \frac{3}{2}.
\end{align*}
+$$

* Form the polynomial `p` by first computing the $a$s and forming `p=Polynomial([0,a1,0,-a3,0,a5])`
@@ -546,12 +555,13 @@ This last answer is why $p$ is called an *interpolating* polynomial and this que

The Chebyshev ($T$) polynomials are polynomials which use a different basis from the standard basis. Denote the basis elements $T_0$, $T_1$, ... where we have $T_0(x) = 1$, $T_1(x) = x$, and for bigger indices $T_{i+1}(x) = 2xT_i(x) - T_{i-1}(x)$. The next few are then:

+$$
\begin{align*}
T_2(x) &= 2xT_1(x) - T_0(x) = 2x^2 - 1\\
T_3(x) &= 2xT_2(x) - T_1(x) = 2x(2x^2-1) - x = 4x^3 - 3x\\
T_4(x) &= 2xT_3(x) - T_2(x) = 2x(4x^3-3x) - (2x^2-1) = 8x^4 - 8x^2 + 1
\end{align*}
+$$

With these definitions, what is the polynomial associated to the coefficients $[0,1,2,3]$ with this basis? 
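+
+One brute-force way to check an answer (a sketch of ours, assuming the `Polynomials` function `variable` is in scope, as above) is to build the basis elements from the recursion and form the linear combination:
+
+```{julia}
+#| eval: false
+x = variable()
+T0, T1 = one(x), x        # T_0(x) = 1, T_1(x) = x
+T2 = 2x*T1 - T0           # the recursion: T_{i+1} = 2x T_i - T_{i-1}
+T3 = 2x*T2 - T1
+0*T0 + 1*T1 + 2*T2 + 3*T3 # the combination with coefficients [0,1,2,3]
+```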
diff --git a/quarto/precalc/ranges.qmd b/quarto/precalc/ranges.qmd
index 986cda66..0a30c388 100644
--- a/quarto/precalc/ranges.qmd
+++ b/quarto/precalc/ranges.qmd
@@ -66,7 +66,7 @@ Rather than express sequences by the $a_0$, $h$, and $n$, `Julia` uses the start

1:10
```

-But wait, nothing different printed? This is because `1:10` is efficiently stored. Basically, a recipe to generate the next number from the previous number is created and `1:10` just stores the start and end point and that recipe is used to generate the set of all values. To expand the values, you have to ask for them to be `collect`ed (though this typically isn't needed in practice):
+But wait, nothing different printed? This is because `1:10` is efficiently stored. Basically, a recipe to generate the next number from the previous number is created and `1:10` just stores the start and end points (the step size is implicit in how this is stored) and that recipe is used to generate the set of all values. To expand the values, you have to ask for them to be `collect`ed (though this typically isn't needed in practice, as values are usually *iterated* over):

```{julia}
@@ -168,11 +168,13 @@ Now we concentrate on some more general styles to modify a sequence to produce a

### Filtering

+The act of throwing out elements of a collection based on some condition is called *filtering*.
+
For example, another way to get the values between $0$ and $100$ that are multiples of $7$ is to start with all $101$ values and throw out those that don't match. To check if a number is divisible by $7$, we could use the `rem` function. It gives the remainder upon division. Multiples of `7` match `rem(m, 7) == 0`. Checking for divisibility by seven is unusual enough that there is nothing built in for that, but checking for division by $2$ is common, and for that, there is a built-in function `iseven`.

-The act of throwing out elements of a collection based on some condition is called *filtering*. The `filter` function does this in `Julia`; the basic syntax being `filter(predicate_function, collection)`. The "`predicate_function`" is one that returns either `true` or `false`, such as `iseven`. The output of `filter` consists of the new collection of values - those where the predicate returns `true`.
+The `filter` function does this in `Julia`; the basic syntax being `filter(predicate_function, collection)`. The "`predicate_function`" is one that returns either `true` or `false`, such as `iseven`. The output of `filter` consists of the new collection of values - those where the predicate returns `true`.

To see it used, let's start with the numbers between `0` and `25` (inclusive) and filter out those that are even:
@@ -210,7 +212,7 @@ Let's return to the case of the set of even numbers between $0$ and $100$. We ha

* The set of numbers $\{2k: k=0, \dots, 50\}$.

-While `Julia` has a special type for dealing with sets, we will use a vector for such a set. (Unlike a set, vectors can have repeated values, but as vectors are more widely used, we demonstrate them.) Vectors are described more fully in a previous section, but as a reminder, vectors are constructed using square brackets: `[]` (a special syntax for [concatenation](http://docs.julialang.org/en/latest/manual/arrays/#concatenation)). Square brackets are used in different contexts within `Julia`, in this case we use them to create a *collection*. 
If we separate single values in our collection by commas (or semicolons), we will create a vector:
+While `Julia` has a special type for dealing with sets, we will use a vector for such a set. (Unlike a set, vectors can have repeated values, but, as vectors are more widely used, we demonstrate them.) Vectors are described more fully in a previous section, but as a reminder, vectors are constructed using square brackets: `[]` (a special syntax for [concatenation](http://docs.julialang.org/en/latest/manual/arrays/#concatenation)). Square brackets are used in different contexts within `Julia`, in this case we use them to create a *collection*. If we separate single values in our collection by commas (or semicolons), we will create a vector:

```{julia}
@@ -277,20 +279,24 @@ A typical pattern would be to generate a collection of numbers and then apply a

sum([2^i for i in 1:10])
```

-Conceptually this is easy to understand, but computationally it is a bit inefficient. The generator syntax allows this type of task to be done more efficiently. To use this syntax, we just need to drop the `[]`:
+Conceptually this is easy to understand: one step generates the numbers, the other adds them up. Computationally it is a bit inefficient. The generator syntax allows this type of task to be done more efficiently. To use this syntax, we just need to drop the `[]`:

```{julia}
sum(2^i for i in 1:10)
```

-(The difference being no intermediate object is created to store the collection of all values specified by the generator.)
+The difference is that no intermediate object is created to store the collection of all values specified by the generator. Not all functions allow generators as arguments, but most common reductions do.

### Filtering generated expressions

-Both comprehensions and generators allow for filtering through the keyword `if`. The following shows *one* way to add the prime numbers in $[1,100]$:
+Both comprehensions and generators allow for filtering through the keyword `if`. The basic pattern is
+
+`[expr for variable in collection if condition]`
+
+The following shows *one* way to add the prime numbers in $[1,100]$:

```{julia}
@@ -299,6 +305,10 @@ sum(p for p in 1:100 if isprime(p))

The value on the other side of `if` should be an expression that evaluates to either `true` or `false` for a given `p` (like a predicate function, but here specified as an expression). The value returned by `isprime(p)` is such.

+::: {.callout-note}
+In these notes we primarily use functions rather than expressions for various actions. We will see that creating a function is not much more difficult than specifying an expression, though there is additional notation necessary. Generators are one very useful means to use expressions; symbolic math will be seen as another.
+:::
+
In this example, we use the fact that `rem(k, 7)` returns the remainder found from dividing `k` by `7`, and so is `0` when `k` is a multiple of `7`:
@@ -323,7 +333,7 @@ This example of Stefan Karpinski's comes from a [blog](http://julialang.org/blog

First, a simple question: using pennies, nickels, dimes, and quarters how many different ways can we generate one dollar? Clearly $100$ pennies, or $20$ nickels, or $10$ dimes, or $4$ quarters will do this, so the answer is at least four, but how much more than four?

-Well, we can use a comprehension to enumerate the possibilities. This example illustrates how comprehensions and generators can involve one or more variable for the iteration.
+Well, we can use a comprehension to enumerate the possibilities. 
This example illustrates how comprehensions and generators can involve one or more variables for the iteration. By judiciously choosing what is iterated over, the entire set can be described.

First, we either have $0,1,2,3$, or $4$ quarters, or $0$, $25$ cents, $50$ cents, $75$ cents, or a dollar's worth. If we have, say, $1$ quarter, then we need to make up $75$ cents with the rest. If we had $3$ dimes, then we need to make up $45$ cents out of nickels and pennies; if we then had $6$ nickels, we know we must need $15$ pennies.

The following expression shows how counting this can be done through enumeration

```{julia}
-ways = [(q, d, n, p) for q = 0:25:100 for d = 0:10:(100 - q) for n = 0:5:(100 - q - d) for p = (100 - q - d - n)]
+ways = [(q, d, n, p)
+        for q = 0:25:100
+        for d = 0:10:(100 - q)
+        for n = 0:5:(100 - q - d)
+        for p = (100 - q - d - n)]
length(ways)
```

-We see $242$ cases, each distinct. The first $3$ are:
+There are $242$ distinct cases. The first three are:

```{julia}
ways[1:3]
```

-The generating expression reads naturally. It introduces the use of multiple `for` statements, each subsequent one depending on the value of the previous (working left to right). Now suppose, we want to ensure that the amount in pennies is less than the amount in nickels, etc. We could use `filter` somehow to do this for our last answer, but using `if` allows for filtering while the events are generating. Here our condition is simply expressed: `q > d > n > p`:
+The generating expression reads naturally. It introduces the use of multiple `for` statements, each subsequent one depending on the value of the previous (working left to right).
+
+The cashier might like to know the number of coins, not the dollar amount:
+```{julia}
+[amt ./ [25, 10, 5, 1] for amt in ways[1:3]]
+```
+
+There are various ways to get integer values, and not floating point values. One way is to call `round`. Here though, we use the integer division operator, `div`, through its infix operator `÷`:

```{julia}
-[(q, d, n, p) for q = 0:25:100
+[amt .÷ [25, 10, 5, 1] for amt in ways[1:3]]
+```
+
+
+Now suppose we want to ensure that the amount in pennies is less than the amount in nickels, etc. We could use `filter` somehow to do this for our last answer, but using `if` allows for filtering while the events are generating. Here our condition is simply expressed: `q > d > n > p`:
+
+
+```{julia}
+[(q, d, n, p)
+ for q = 0:25:100
 for d = 0:10:(100 - q)
 for n = 0:5:(100 - q - d)
 for p = (100 - q - d - n)
@@ -523,6 +553,24 @@ radioq(choices, answ)

###### Question

+An arithmetic sequence ($a_0$, $a_1 = a_0 + h$, $a_2=a_0 + 2h, \dots,$ $a_n=a_0 + n\cdot h$) is specified with a starting point ($a_0$), a step size ($h$), and a number of points $(n+1)$. This is not the case with the colon constructor, which takes a starting point, a step size, and a suggested last value. Nor is it the case with the default for the `range` function, with signature `range(start, stop, length)`. However, the documentation for `range` shows that indeed the three values ($a_0$, $h$, and $n$) can be passed in. Which signature (from the docs) would allow this:
+
+```{julia}
+#| echo: false
+choices = [
+    "`range(start, stop, length)`",
+    "`range(start, stop; length, step)`",
+    "`range(start; length, stop, step)`",
+    "`range(;start, length, stop, step)`"]
+answer = 3
+explanation = """
+This is a somewhat vague question, but the use of `range(a0; length=n+1, step=h)` will produce the arithmetic sequence with this parameterization.
+"""
+buttonq(choices, answer; explanation)
+```
+
+###### Question
+
Create the sequence $10, 100, 1000, \dots, 1,000,000$ using a list comprehension. Which of these works?
@@ -670,7 +718,7 @@ Let's see if `4137 8947 1175 5804` is a valid credit card number?

First, we enter it as a value and immediately break the number into its digits:

```{julia}
-x = 4137_8947_1175_5904  # _ in a number is ignored by parser
+x = 4137_8947_1175_5804  # _ in a number is ignored by parser
xs = digits(x)
```
@@ -688,7 +736,7 @@ for i in 1:2:length(xs)

end
```

-Number greater than 9, have their digits added, then all the resulting numbers are added. This can be done with a generator:
+Numbers greater than 9 have their digits added, then all the resulting numbers are added. This can be done with a generator:

```{julia}
diff --git a/quarto/precalc/rational_functions.qmd b/quarto/precalc/rational_functions.qmd
index 3244fb07..fe4a6fbc 100644
--- a/quarto/precalc/rational_functions.qmd
+++ b/quarto/precalc/rational_functions.qmd
@@ -153,7 +153,7 @@ This decomposition breaks the rational expression into two pieces: $x-4$ and $40

plot(apart(h) - (x - 4), 10, 100)
```

-Similarly, a plot over $[-100, -10]$ would show decay towards $0$, though in that case from below. Combining these two facts then, it is now no surprise that the graph of the rational function $f(x)$ should approach a straight line, in this case $y=x-4$ as $x \rightarrow \pm \infty$.
+Similarly, a plot over $[-100, -10]$ would show decay towards $0$, though in that case from below. Combining these two facts then, it is now no surprise that the graph of the rational function $f(x)$ should approach a straight line, in this case $y=x-4$, as $x \rightarrow \pm \infty$.

We can easily do most of this analysis without needing a computer or algebra. First, we should know the four eventual shapes of a polynomial, that the graph of $y=mx$ is a line with slope $m$, the graph of $y = c$ is a constant line at height $c$, and the graph of $y=c/x^m$, $m > 0$ will decay towards $0$ as $x \rightarrow \pm\infty$. The latter should be clear, as $x^m$ gets big, so its reciprocal goes towards $0$.
@@ -358,8 +358,8 @@ We can avoid the vertical asymptotes in our viewing window. For example we could

```{julia}
-𝒇(x) = (x-1)^2 * (x-2) / ((x+3)*(x-3) )
-plot(𝒇, -2.9, 2.9)
+f(x) = (x-1)^2 * (x-2) / ((x+3)*(x-3) )
+plot(f, -2.9, 2.9)
```

This backs off by $\delta = 0.1$. As we have that $3 - 2.9$ is $\delta$ and $1/\delta$ is 10, the $y$ axis won't get too large, and indeed it doesn't.
@@ -373,7 +373,7 @@ We can also clip the `y` axis. The `plot` function can be passed an argument `yl

```{julia}
#| hold: true
-plot(𝒇, -5, 5, ylims=(-20, 20))
+plot(f, -5, 5, ylims=(-20, 20))
```

This isn't ideal, as the large values are still computed, just the viewing window is clipped. This leaves the vertical asymptotes still affecting the graph.
@@ -386,7 +386,7 @@ This was discussed in an earlier section where the `rangeclamp` function was int

```{julia}
-plot(rangeclamp(𝒇, 30), -25, 25) # rangeclamp is in the CalculusWithJulia package
+plot(rangeclamp(f, 30), -25, 25) # rangeclamp is in the CalculusWithJulia package
```

We can see the general shape of $3$ curves broken up by the vertical asymptotes. The two on the side heading off towards the line $x-4$ and the one in the middle. We still can't see the precise location of the zeros, but that wouldn't be the case with most graphs that show asymptotic behaviors. However, we can clearly tell where to "zoom in" were those of interest. 
@@ -464,28 +464,28 @@ In the following, we import some functions from the `Polynomials` package. We av
 import Polynomials: Polynomial, variable, lowest_terms, fromroots, coeffs
```

-The `Polynomials` package has support for rational functions. The `//` operator can be used to create rational expressions:
+The `Polynomials` package has support for rational functions. The `//` operator can be used to create rational expressions from polynomial expressions:


```{julia}
-𝒙 = variable()
-𝒑 = (𝒙-1)*(𝒙-2)^2
-𝒒 = (𝒙-2)*(𝒙-3)
-𝒑𝒒 = 𝒑 // 𝒒
+x = variable()
+p = (x-1)*(x-2)^2
+q = (x-2)*(x-3)
+pq = p // q
```

 A rational expression is a formal object; a rational function adds the viewpoint that this object will be evaluated by substituting values for the indeterminate. Rational expressions made within `Polynomials` are evaluated just like functions:


```{julia}
-𝒑𝒒(4) # p(4)/q(4)
+pq(4) # p(4)/q(4)
```

 The rational expressions are not in lowest terms unless requested through the `lowest_terms` method:


```{julia}
-lowest_terms(𝒑𝒒)
+lowest_terms(pq)
```

 For polynomials as simple as these, this computation is not a problem, but there is the very real possibility that the lowest term computation may be incorrect. Unlike `SymPy` which factors symbolically, `lowest_terms` uses a numeric algorithm and does not, as would be done by hand or with `SymPy`, factor the polynomial and then cancel common factors.
@@ -512,7 +512,7 @@ Similarly, we can divide a polynomial by the polynomial $1$, which in `Julia` is


```{julia}
-pp = 𝒑 // one(𝒑)
+pp = p // one(p)
```

 And as with rational numbers, `p` is recovered by `numerator`:
@@ -532,7 +532,7 @@ For the rational expression `pq` above, we have from observation that $1$ and $2$ will be


```{julia}
-plot(𝒑𝒒)
+plot(pq)
```

 To better see the zeros, a plot over a narrower interval, say $[0,2.5]$, would be encouraged; to better see the slant asymptote, a plot over a wider interval, say $[-10,10]$, would be encouraged.
@@ -608,8 +608,8 @@ We can verify this does what we want through example with the previously defined


```{julia}
-𝐩 = Polynomial([1, 2, 3, 4, 5])
-𝐪 = mobius_transformation(𝐩, 4, 6)
+p = Polynomial([1, 2, 3, 4, 5])
+q = mobius_transformation(p, 4, 6)
```

 As contrasted with


```{julia}
#| hold: true

 a, b = 4, 6
-pq = 𝐩 // one(𝐩)
+pq = p // one(p)
 x = variable(pq)
-d = Polynomials.degree(𝐩)
+d = Polynomials.degree(p)
 numerator(lowest_terms( (x + 1)^d * pq((a*x + b)/(x + 1))))
```

@@ -699,22 +699,22 @@ More challenging problems can be readily handled by this package. The following


```{julia}
-𝒔 = Polynomial([0,1]) # also just variable(Polynomial{Int})
-𝒖 = -1 + 254*𝒔 - 16129*𝒔^2 + 𝒔^15
+s = Polynomial([0,1]) # also just variable(Polynomial{Int})
+u = -1 + 254*s - 16129*s^2 + s^15
```

 has three real roots, two of which are clustered very close to each other:

```{julia}
-𝒔𝒕 = ANewDsc(coeffs(𝒖))
+st = ANewDsc(coeffs(u))
```

 and

```{julia}
-refine_roots(𝒔𝒕)
+refine_roots(st)
```

 The SymPy package (`sympy.real_roots`) can accurately identify the three roots but it can take a **very** long time. The `Polynomials.roots` function from the `Polynomials` package identifies the cluster as complex valued. Though the implementation in `RealPolynomialRoots` doesn't handle such large polynomials, the authors of the algorithm have implementations that can quickly solve polynomials with degrees as high as $10,000$.
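As an aside, the `fromroots` function imported at the top of this section goes in the other direction: given a collection of roots, it constructs the corresponding polynomial. A quick check with the roots of the numerator `p` used above:

```{julia}
fromroots([1, 2, 2])  # expands (x-1) * (x-2)^2
```

Multiplying out $(x-1)(x-2)^2$ by hand gives $x^3 - 5x^2 + 8x - 4$, matching this result.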
diff --git a/quarto/precalc/transformations.qmd b/quarto/precalc/transformations.qmd
index 20b0e67a..333ff0a5 100644
--- a/quarto/precalc/transformations.qmd
+++ b/quarto/precalc/transformations.qmd
@@ -54,7 +54,7 @@ ss = sin + sqrt
 ss(4)
```

-Doing this works, as Julia treats functions as first class objects, lending itself to [higher](https://en.wikipedia.org/wiki/Higher-order_programming) order programming. However, this definition in general is kind of limiting, as functions in mathematics and Julia can be much more varied than just the univariate functions we have defined addition for. We won't pursue this further.
+Doing this works, as Julia treats functions as first class objects, lending itself to [higher](https://en.wikipedia.org/wiki/Higher-order_programming) order programming. However, this definition in general is kind of limiting, as functions in mathematics and Julia can be much more varied than just the univariate functions we have defined addition for. Further, users shouldn't be modifying base methods on types they don't control, as that can lead to really unexpected and undesirable behaviours. This is called *type piracy*. We won't pursue this possibility further.

 Rather, we will define new functions by what they do to their values, such as `h(x) = f(x) + g(x)`.

 ### Composition of functions
@@ -95,7 +95,7 @@ plot!(gf, label="g∘f")

 :::{.callout-note}
## Note
-Unlike how the basic arithmetic operations are treated, `Julia` defines the infix Unicode operator `\circ[tab]` to represent composition of functions, mirroring mathematical notation. This infix operations takes in two functions and returns an anonymous function. It can be useful and will mirror standard mathematical usage up to issues with precedence rules.
+Unlike how the basic arithmetic operations are treated, `Julia` defines the infix Unicode operator `\circ[tab]` to represent composition of functions, mirroring mathematical notation. This infix operation takes in two functions and returns a composed function. It can be useful and will mirror standard mathematical usage up to issues with precedence rules.

 :::
@@ -109,15 +109,16 @@ $$
 (f \circ g)(x) = (e^x - x)^2 + 2(e^x - x) - 1.
$$

-It can be helpful to think of the argument to $f$ as a "box" that gets filled in by $g$:
-
+It can be helpful to think of the argument to $f$ as a "box" that gets filled in by $g(x)$:

+$$
 \begin{align*}
 g(x) &= e^x - x\\
 f(\square) &= (\square)^2 + 2(\square) - 1\\
 f(g(x)) &= (g(x))^2 + 2(g(x)) - 1 = (e^x - x)^2 + 2(e^x - x) - 1.
 \end{align*}
+$$

 Here we look at a few compositions:
@@ -171,46 +172,46 @@ To illustrate, let's define a hat-shaped function as follows:


```{julia}
-𝒇(x) = max(0, 1 - abs(x))
+f(x) = max(0, 1 - abs(x))
```

 A plot over the interval $[-2,2]$ is shown here:


```{julia}
-plot(𝒇, -2,2)
+plot(f, -2,2)
```

 The same graph of $f$ and its image shifted up by $2$ units would be given by:


```{julia}
-plot(𝒇, -2, 2, label="f")
-plot!(up(𝒇, 2), label="up")
+plot(f, -2, 2, label="f")
+plot!(up(f, 2), label="up")
```

 A graph of $f$ and its shift over by $2$ units would be given by:


```{julia}
-plot(𝒇, -2, 4, label="f")
-plot!(over(𝒇, 2), label="over")
+plot(f, -2, 4, label="f")
+plot!(over(f, 2), label="over")
```

 A graph of $f$ and its stretch by $2$ units would be given by:


```{julia}
-plot(𝒇, -2, 2, label="f")
-plot!(stretch(𝒇, 2), label="stretch")
+plot(f, -2, 2, label="f")
+plot!(stretch(f, 2), label="stretch")
```

 Finally, a graph of $f$ and its scaling by $2$ would be given by:


```{julia}
-plot(𝒇, -2, 2, label="f")
-plot!(scale(𝒇, 2), label="scale")
+plot(f, -2, 2, label="f")
+plot!(scale(f, 2), label="scale")
```

 Scaling by $2$ shrinks the non-zero domain; scaling by $1/2$ would stretch it. If this is not intuitive, the definition `x-> f(x/c)` could have been used, which would have opposite behaviour for scaling.
@@ -226,16 +227,16 @@ A shift right by $2$ and up by $1$ is achieved through


```{julia}
-plot(𝒇, -2, 4, label="f")
-plot!(up(over(𝒇,2), 1), label="over and up")
+plot(f, -2, 4, label="f")
+plot!(up(over(f,2), 1), label="over and up")
```

-Shifting and scaling can be confusing. Here we graph `scale(over(𝒇,2),1/3)`:
+Shifting and scaling can be confusing. Here we graph `scale(over(f,2),1/3)`:


```{julia}
-plot(𝒇, -1,9, label="f")
-plot!(scale(over(𝒇,2), 1/3), label="over and scale")
+plot(f, -1,9, label="f")
+plot!(scale(over(f,2), 1/3), label="over and scale")
```

 This graph is over by $6$ with a width of $3$ on each side of the center. Mathematically, we have $h(x) = f((1/3)\cdot x - 2)$.

 Compare this to the same operations in opposite order:


```{julia}
-plot(𝒇, -1, 5, label="f")
-plot!(over(scale(𝒇, 1/3), 2), label="scale and over")
+plot(f, -1, 5, label="f")
+plot!(over(scale(f, 1/3), 2), label="scale and over")
```

 This graph first scales the symmetric graph, stretching from $-3$ to $3$, then shifts right by $2$. The resulting function is $f((1/3)\cdot (x-2))$.
@@ -265,9 +266,9 @@ We can view this as a composition of "scale" by $1/a$, then "over" by $b$, and


```{julia}
#| hold: true
 a = 2; b = 5
-𝒉(x) = stretch(over(scale(𝒇, 1/a), b), 1/a)(x)
-plot(𝒇, -1, 8, label="f")
-plot!(𝒉, label="h")
+h(x) = stretch(over(scale(f, 1/a), b), 1/a)(x)
+plot(f, -1, 8, label="f")
+plot!(h, label="h")
```

 (This transformation keeps the same amount of area in the triangles; can you tell from the graph?)
@@ -324,12 +325,6 @@ delta = (newyork(185) - datetime) * 60

 This is off by a fair amount - almost $12$ minutes. Clearly a trigonometric model, based on the assumption of circular motion of the earth around the sun, is not accurate enough for precise work, but it does help one understand how summer days are longer than winter days and how the length of a day changes fastest at the spring and fall equinoxes.

-##### Example: a growth model in fisheries
-
-
-The von Bertalanffy growth [equation](https://en.wikipedia.org/wiki/Von_Bertalanffy_function) is $L(t) =L_\infty \cdot (1 - e^{k\cdot(t-t_0)})$.
This family of functions can be viewed as a transformation of the exponential function $f(t)=e^t$. Part is a scaling and shifting (the $e^{k \cdot (t - t_0)}$) along with some shifting and stretching. The various parameters have physical importance which can be measured: $L_\infty$ is a carrying capacity for the species or organism, and $k$ is a rate of growth. These parameters may be estimated from data by finding the "closest" curve to a given data set.
-
-
 ##### Example: the pipeline operator
@@ -348,6 +343,44 @@ pi/2 |> g |> f

 The output of the preceding expression is passed as the input to the next. This notation is especially convenient when the enclosing function is not the main focus. (Some programming languages have more developed [fluent interfaces](https://en.wikipedia.org/wiki/Fluent_interface) for chaining function calls. Julia has more powerful chaining macros provided in packages, such as `DataPipes.jl` or `Chain.jl`.)

+##### Example: a growth model in fisheries
+
+
+The von Bertalanffy growth [equation](https://en.wikipedia.org/wiki/Von_Bertalanffy_function) is $L(t) =L_\infty \cdot (1 - e^{k\cdot(t-t_0)})$. This family of functions can be viewed as a transformation of the exponential function $f(t)=e^t$. Part is a scaling and shifting (the $e^{k \cdot (t - t_0)}$) along with some shifting and stretching. The various parameters have physical importance which can be measured: $L_\infty$ is a carrying capacity for the species or organism, and $k$ is a rate of growth. These parameters may be estimated from data by finding the "closest" curve to a given data set.
+
+##### Example: representing data visually
+
+Suppose we have a data set like the following:^[Which comes from the "Palmer Penguins" data set]
+
+|flipper length | bill length | body mass | gender | species |
+|---------------|-------------|-----------|--------|:--------|
+| 38.8 | 18.3 | 3701 | male | Adelie |
+| 48.8 | 18.4 | 3733 | male | Chinstrap |
+| 47.5 | 15.0 | 5076 | male | Gentoo |
+
+We might want to plot flipper length versus bill length on an $x$-$y$ axis, and also indicate body mass by using a larger marker for bigger values.
+
+We could do so by transforming a marker: scaling by size, then shifting it to an `x-y` position; then plotting. Something like this:
+
+```{julia}
+flipper = [38.8, 48.8, 47.5]
+bill = [18.3, 18.4, 15.0]
+bodymass = [3701, 3733, 5076]
+
+shape = Shape(:star5)
+p = plot(; legend=false)
+for (x,y,sz) in zip(flipper, bill, bodymass)
+    sz = (sz - 2000) ÷ 1000
+
+    new_shape = Plots.translate(Plots.scale(shape, sz, sz), x, y)
+
+    plot!(p, new_shape; fill=(:red, 0.25), stroke=(:black, 2))
+end
+p
+```
+
+While some of the commands in this example are unfamiliar and won't be explained further, the use of `translate` and `scale` for shapes is very similar to how transformations for functions are being described. (Though this `translate` function combines `up` and `over`, and this `scale` function allows different values depending on direction.) In the above, the function names are qualified, as they are not exported by the `Plots.jl` package. More variables from the data set could be encoded through colors, different shapes, etc., allowing very data-rich graphics.

 ### Operators
@@ -392,14 +425,14 @@ To see that it works, we take a typical function


```{julia}
-𝐟(k) = 1 + k^2
+f(k) = 1 + k^2
```

 and check:


```{julia}
-D(𝐟)(3), 𝐟(3) - 𝐟(3-1)
+D(f)(3), f(3) - f(3-1)
```

 That the two are the same value is no coincidence.
(Again, pause for a second to make sure you understand why `D(f)(3)` makes sense. If this is unclear, you could assign the function `D(f)` to a name and then call that name with a value of `3`.)
@@ -416,7 +449,7 @@ To check if this works as expected, compare these two values:


```{julia}
-S(𝐟)(4), 𝐟(1) + 𝐟(2) + 𝐟(3) + 𝐟(4)
+S(f)(4), f(1) + f(2) + f(3) + f(4)
```

 So one function adds, the other subtracts. Addition and subtraction are somehow inverse to each other so should "cancel" out. This holds for these two operations as well, in the following sense: subtracting after adding leaves the function alone:

```{julia}
 k = 10 # some arbitrary value k >= 1
-D(S(𝐟))(k), 𝐟(k)
+D(S(f))(k), f(k)
```

 Any positive integer value of `k` will give the same answer (up to overflow). This says the difference of the accumulation process is just the last value to accumulate.
@@ -434,7 +467,7 @@ Adding after subtracting also leaves the function alone, save for a vestige of $


```{julia}
-S(D(𝐟))(15), 𝐟(15) - 𝐟(0)
+S(D(f))(15), f(15) - f(0)
```

 That is, the accumulation of differences is just the difference of the end values.
@@ -597,9 +630,11 @@ Consider this expression


$$
-\left(f(1) - f(0)\right) + \left(f(2) - f(1)\right) + \cdots + \left(f(n) - f(n-1)\right) =
--f(0) + f(1) - f(1) + f(2) - f(2) + \cdots + f(n-1) - f(n-1) + f(n) =
-f(n) - f(0).
+\begin{align*}
+\left(f(1) - f(0)\right) &+ \left(f(2) - f(1)\right) + \cdots + \left(f(n) - f(n-1)\right) \\
+&= -f(0) + f(1) - f(1) + f(2) - f(2) + \cdots + f(n-1) - f(n-1) + f(n) \\
+&= f(n) - f(0).
+\end{align*}
$$

 Referring to the definitions of `D` and `S` in the example on operators, which relationship does this support:
diff --git a/quarto/precalc/trig_functions.qmd b/quarto/precalc/trig_functions.qmd
index 3f5719a3..6d496715 100644
--- a/quarto/precalc/trig_functions.qmd
+++ b/quarto/precalc/trig_functions.qmd
@@ -44,14 +44,16 @@ annotate!([(.75, .25, "θ"), (4.0, 1.25, "opposite"), (2, -.25, "adjacent"), (1.

 With these, the basic definitions for the primary trigonometric functions are

-
-
+::: {.callout-note icon=false}
+## Trigonometric definitions
+$$
 \begin{align*}
 \sin(\theta) &= \frac{\text{opposite}}{\text{hypotenuse}} &\quad(\text{the sine function})\\
 \cos(\theta) &= \frac{\text{adjacent}}{\text{hypotenuse}} &\quad(\text{the cosine function})\\
-\tan(\theta) &= \frac{\text{opposite}}{\text{adjacent}}. &\quad(\text{the tangent function})
+\tan(\theta) &= \frac{\text{opposite}}{\text{adjacent}} &\quad(\text{the tangent function})
 \end{align*}
-
+$$
+:::

 :::{.callout-note}
## Note
@@ -122,11 +124,12 @@ Julia has the $6$ basic trigonometric functions defined through the functions `s

 Two right triangles (the one with two equal $\pi/4$ angles, and the one with angles $\pi/6$ and $\pi/3$) can have the ratio of their sides computed from basic geometry. In particular, this leads to the following values, which are usually committed to memory:

+$$
 \begin{align*}
 \sin(0) &= 0, \quad \sin(\pi/6) = \frac{1}{2}, \quad \sin(\pi/4) = \frac{\sqrt{2}}{2}, \quad\sin(\pi/3) = \frac{\sqrt{3}}{2},\text{ and } \sin(\pi/2) = 1\\
 \cos(0) &= 1, \quad \cos(\pi/6) = \frac{\sqrt{3}}{2}, \quad \cos(\pi/4) = \frac{\sqrt{2}}{2}, \quad\cos(\pi/3) = \frac{1}{2},\text{ and } \cos(\pi/2) = 0.
 \end{align*}
+$$

 Using the circle definition allows these basic values to inform us of values throughout the unit circle.
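For instance, the symmetry of the circle gives $\sin(\pi - \theta) = \sin(\theta)$ and $\cos(\pi + \theta) = -\cos(\theta)$, extending the table above into the second and third quadrants. A quick numeric check of these two relations:

```{julia}
# reflecting across the y axis and rotating by a half revolution
# relate values on the circle back to the fundamental ones
θ = pi/6
sin(pi - θ) ≈ sin(θ), cos(pi + θ) ≈ -cos(θ)
```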
@@ -144,10 +147,10 @@ The fact that $x^2 + y^2 = 1$ for the unit circle leads to the "Pythagorean iden


$$
-\sin(\theta)^2 + \cos(\theta)^2 = 1.
+\sin^2(\theta) + \cos^2(\theta) = 1.
$$

-This basic fact can be manipulated many ways. For example, dividing through by $\cos(\theta)^2$ gives the related identity: $\tan(\theta)^2 + 1 = \sec(\theta)^2$.
+This basic fact can be manipulated many ways. For example, dividing through by $\cos^2(\theta)$ gives the related identity: $\tan^2(\theta) + 1 = \sec^2(\theta)$.

 `Julia`'s functions can compute values for any angles, including these fundamental ones:
@@ -157,8 +160,15 @@ This basic fact can be manipulated many ways. For example, dividing through by $

 [cos(theta) for theta in [0, pi/6, pi/4, pi/3, pi/2]]
```

-These are floating point approximations, as can be seen clearly in the last value. Symbolic math can be used if exactness matters:
+To compute $\sin^2(\theta)$, the power is applied to the value of $\sin(\theta)$ and not the `sin` function. (Think of $\sin^2(\theta)$ as $(\sin(\theta))^2$.)
+
+```{julia}
+theta = pi/8
+sin(theta)^2
+```
+
+These values are floating point approximations, as can be seen clearly in the computation of `cos(pi/2)`, which is mathematically $0$. Symbolic math can be used, by using `PI` in place of `pi`, if exactness matters:

```{julia}
 cos.([0, PI/6, PI/4, PI/3, PI/2])
```

@@ -167,7 +177,7 @@ cos.([0, PI/6, PI/4, PI/3, PI/2])

 The `sincos` function computes both `sin` and `cos` simultaneously, which can be more performant when both values are needed.

-```{juila}
+```{julia}
 sincos(pi/3)
```

@@ -266,11 +276,11 @@ $$
 g(x) = a + b \sin((2\pi n)x)
$$

-That is $g$ is shifted up by $a$ units, scaled vertically by $b$ units and has a period of $1/n$. We see a simple plot here where we can verify the transformation:
+That is, a graph of $g$ will be the sine curve shifted up by $a$ units, scaled vertically by $b$ units, and with period $1/n$. We see a simple plot here where we can verify the transformation:


```{julia}
-g(x; b=1,n=1) = b*sin(2pi*n*x)
+g(x; b=1, n=1) = b*sin(2pi*n*x)
 g1(x) = 1 + g(x, b=2, n=3)
 plot(g1, 0, 1)
```

@@ -388,51 +398,181 @@ In `Julia`, the functions `sind`, `cosd`, `tand`, `cscd`, `secd`, and `cotd` are

 Consider the point on the unit circle $(x,y) = (\cos(\theta), \sin(\theta))$. In terms of $(x,y)$ (or $\theta$) is there a way to represent the angle found by rotating an additional $\theta$, that is what is $(\cos(2\theta), \sin(2\theta))$?

-More generally, suppose we have two angles $\alpha$ and $\beta$, can we represent the values of $(\cos(\alpha + \beta), \sin(\alpha + \beta))$ using the values just involving $\beta$ and $\alpha$ separately?
+More generally, suppose we have two angles $\alpha$ and $\beta$: can we represent the values of $(\cos(\alpha + \beta), \sin(\alpha + \beta))$ using the values just involving $\beta$ and $\alpha$ separately? The sum formulas express the sine and cosine of $\alpha + \beta$ in terms of the sines and cosines of $\alpha$ and $\beta$. We show variations on the basic decomposition of a right triangle using sine and cosine to illustrate the resulting formula. According to [Wikipedia](https://en.wikipedia.org/wiki/Trigonometric_functions#Identities) this geometric derivation has ideas that date to Ptolemy.
-According to [Wikipedia](https://en.wikipedia.org/wiki/Trigonometric_functions#Identities) the following figure (from [mathalino.com](http://www.mathalino.com/reviewer/derivation-of-formulas/derivation-of-sum-and-difference-of-two-angles)) has ideas that date to Ptolemy:

```{julia}
#| echo: false
-# ImageFile(:precalc, "figures/summary-sum-and-difference-of-two-angles.jpg", "Relations between angles")
-nothing
-```
-![Relations between angles](figures/summary-sum-and-difference-of-two-angles.jpg)
+using Plots, LaTeXStrings
+
+# two angles
+α = pi/5
+β = pi/6
+
+## points used in the figure
+A = (0,0)
+B = (cos(α + β), 0)
+C = (cos(α)*cos(β), 0)
+D = (cos(α + β), sin(α)cos(β))
+E = (cos(α)*cos(β), sin(α)cos(β))
+F = (cos(α + β), sin(α + β))
+
+color1 = :royalblue
+color2 = :forestgreen
+color3 = :brown3
+color4 = :mediumorchid2
+canvas() = plot(axis=([],false), legend=false, aspect_ratio=:equal)
+
+r = 0.10
+# distances between labeled points, used to position the annotations
+dae = sqrt(sum((A.-E).^2))
+daf = sqrt(sum((A.-F).^2))
+dbc = sqrt(sum((B.-C).^2))
+dce = sqrt(sum((C.-E).^2))
+def = sqrt(sum((E.-F).^2))
+dde = sqrt(sum((D.-E).^2))
+ddf = sqrt(sum((D.-F).^2))
+Δ = 0.0
+
+alphabeta = (r*cos(α/2 + β/2), r*sin(α/2 + β/2),
+            text("α + β",:hcenter; rotation=pi/2))
+cosαβ = (B[1]/2, 0, text("cos(α + β)", :top))
+sinαβ = (B[1], F[2]/2, text("sin(α + β)"))
+
+txtpoints = (
+    one = (F[1]/2, F[2]/2, "1",:right),
+    beta=(r*cos(α + β/2), r*sin(α + β/2),
+        text("β", :hcenter)),
+    alpha = (r*cos(α/2), r*sin(α/2),
+        text("α",:hcenter)),
+    alphaa = (F[1] + r*sin(α/2), F[2] - r*cos(α/2),
+        text("α", :hcenter)),
+    cosβ = (dae/2*cos(α),dae/2*sin(α) + Δ,
+        text("cos(β)",:hcenter)),
+    sinβ = (B[1] + dbc/2 + Δ/2, D[2] + ddf/2 + Δ/2,
+        text("sin(β)",:bottom)),
+    cosαcosβ = (C[1]/2, 0 - Δ, text("cos(α)cos(β)", :top)),
+    sinαcosβ = (cos(α)*cos(β) - 0.1, dce/2,
+        text("sin(α)cos(β)", :hcenter)),
+    cosαsinβ = (D[1] - Δ, D[2] + ddf/2,
+        text("cos(α)sin(β)", :top)),
+    sinαsinβ = (D[1] + dde/2, D[2] + Δ,
+        text("sin(α)sin(β)", :top)),
+)
+
+# Plot 1: the right triangle with angle α + β and hypotenuse 1
+p1 = canvas()
+plot!(Shape([A,B,F]), fill=(color4, 0.15))
+annotate!([txtpoints[:one], alphabeta, cosαβ, sinαβ])
+plot!([A,B,F,A]; line=(5,:red, 0.25))
+
+
+# Plot 2: isolate the angle β in a triangle with hypotenuse 1
+p2 = canvas()
+plot!(Shape([A,E,F]), fill=(color1, 0.15))
+plot!([A,B,F,A]; line=(5,:red, 0.25))
+annotate!(map(s -> getindex(txtpoints,s), [:one, :cosβ, :sinβ, :beta]))
+
+
+# Plot 3: the triangle with hypotenuse cos(β) and angle α
+p3 = canvas()
+plot!(Shape([A,E,F]), fill=(color1, 0.15))
+plot!(Shape([A,C,E]), fill=(color2, 0.15))
+annotate!(map(s -> getindex(txtpoints,s), [:alpha, :cosβ, :cosαcosβ, :sinαcosβ]))
+plot!([A,B,F,A]; line=(5,:red, 0.25))
+annotate!([txtpoints[:beta]])
+
+# Plot 4: the triangle with hypotenuse sin(β) and angle α
+p4 = canvas()
+plot!(Shape([A,E,D, F]), fill=(color1, 0.15))
+plot!(Shape([A,C,E]), fill=(color2, 0.15))
+plot!(Shape([D,E,F]), fill=(color3, 0.15))
+annotate!(map(s -> getindex(txtpoints,s), [:alphaa, :sinβ, :sinαsinβ, :cosαsinβ]))
+plot!([A,B,F,A]; line=(5,:red, 0.25))

+# Plot 5: all the pieces together
+p5 = canvas()
+plot!(Shape([A,E,D, F]), fill=(color1, 0.15))
+plot!(Shape([A,C,E]), fill=(color2, 0.15))
+plot!(Shape([D,E,F]), fill=(color3, 0.15))
+plot!(Shape([F,B]), fill=(:black, 0.35))
+annotate!(map(s -> getindex(txtpoints,s), collect(keys(txtpoints))))
+
+
+p1
+```
+
+Another right triangle with hypotenuse of length $1$ can be made by isolating the angle $\beta$, as below:

-To read this, there are
three triangles: the bigger (green with pink part) has hypotenuse $1$ (and adjacent and opposite sides that form the hypotenuses of the other two); the next biggest (yellow) hypotenuse $\cos(\beta)$, adjacent side (of angle $\alpha$) $\cos(\beta)\cdot \cos(\alpha)$, and opposite side $\cos(\beta)\cdot\sin(\alpha)$; and the smallest (pink) hypotenuse $\sin(\beta)$, adjacent side (of angle $\alpha$) $\sin(\beta)\cdot \cos(\alpha)$, and opposite side $\sin(\beta)\sin(\alpha)$.

+```{julia}
+#| echo: false
+p2
+```

-This figure shows the following sum formula for sine and cosine:
+We can make two more right triangles, one with hypotenuse $\cos(\beta)$ and one with hypotenuse $\sin(\beta)$, each having an angle $\alpha$ (the latter using some geometry); we can then apply right-triangle trigonometry to find the lengths of their sides.

+```{julia}
+#| echo: false
+plot(p3, p4)
+```

+From the left figure and the initial triangle, by comparing the lengths along the $x$ direction, we can see the decomposition:

-\begin{align*}
-\sin(\alpha + \beta) &= \sin(\alpha)\cos(\beta) + \cos(\alpha)\sin(\beta), & (\overline{CE} + \overline{DF})\\
-\cos(\alpha + \beta) &= \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta). & (\overline{AC} - \overline{DE})
-\end{align*}

+$$
+\cos(\alpha)\cos(\beta) = \cos(\alpha + \beta) + \sin(\alpha)\sin(\beta)
+$$

+Similarly, this relationship comes from considering the vertical lengths:

-Using the fact that $\sin$ is an odd function and $\cos$ an even function, related formulas for the difference $\alpha - \beta$ can be derived.

+$$
+\sin(\alpha+\beta) = \sin(\alpha)\cos(\beta) + \cos(\alpha)\sin(\beta)
+$$

+These lead to:

-Taking $\alpha = \beta$ we immediately get the "double-angle" formulas:

+::: {.callout-note icon=false}
+## The *sum* formulas for sine and cosine
+$$
+\begin{align*}
+\sin(\alpha+\beta) &= \sin(\alpha)\cos(\beta) + \cos(\alpha)\sin(\beta) \\
+\cos(\alpha + \beta) &= \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta)
+\end{align*}
+$$
+:::

+Taking $\alpha = \beta$, we immediately get

+::: {.callout-note icon=false}
+## The "double-angle" formulas
+$$
 \begin{align*}
 \sin(2\alpha) &= 2\sin(\alpha)\cos(\alpha)\\
-\cos(2\alpha) &= \cos(\alpha)^2 - \sin(\alpha)^2.
+\cos(2\alpha) &= \cos^2(\alpha) - \sin^2(\alpha).
 \end{align*}
+$$
+:::

-
-The latter looks like the Pythagorean identify, but has a minus sign. In fact, the Pythagorean identify is often used to rewrite this, for example $\cos(2\alpha) = 2\cos(\alpha)^2 - 1$ or $1 - 2\sin(\alpha)^2$.
+The latter looks like the Pythagorean identity, but has a minus sign. In fact, the Pythagorean identity is often used to rewrite this, for example $\cos(2\alpha) = 2\cos^2(\alpha) - 1$ or $1 - 2\sin^2(\alpha)$.

-Applying the above with $\alpha = \beta/2$, we get that $\cos(\beta) = 2\cos(\beta/2)^2 -1$, which rearranged yields the "half-angle" formula: $\cos(\beta/2)^2 = (1 + \cos(\beta))/2$.
+Applying the above with $\alpha = \beta/2$, we get that $\cos(\beta) = 2\cos^2(\beta/2) -1$. Similarly, using the Pythagorean identity, a formula for sine can be derived; when rearranged these yield the "half-angle" formulas:

+::: {.callout-note icon=false}
+## The "half-angle" formulas
+$$
+\begin{align*}
+\sin^2(\frac{\beta}{2}) &= \frac{1 - \cos(\beta)}{2}\\
+\cos^2(\frac{\beta}{2}) &= \frac{1 + \cos(\beta)}{2}
+\end{align*}
+$$
+:::

 ##### Example

@@ -440,18 +580,19 @@ Consider the expressions $\cos((n+1)\theta)$ and $\cos((n-1)\theta)$.
These can be re-expressed as:

+$$
 \begin{align*}
 \cos((n+1)\theta) &= \cos(n\theta + \theta) = \cos(n\theta) \cos(\theta) - \sin(n\theta)\sin(\theta), \text{ and}\\
 \cos((n-1)\theta) &= \cos(n\theta - \theta) = \cos(n\theta) \cos(-\theta) - \sin(n\theta)\sin(-\theta).
 \end{align*}
+$$

 But $\cos(-\theta) = \cos(\theta)$, whereas $\sin(-\theta) = -\sin(\theta)$. Using this, we add the two formulas above to get:


$$
-\cos((n+1)\theta) = 2\cos(n\theta) \cos(\theta) - \cos((n-1)\theta).
+\cos((n+1)\theta) = 2 \cos(\theta) \cos(n\theta) - \cos((n-1)\theta).
$$

 That is, the cosine of the multiple $(n+1)\theta$ can be expressed in terms of the cosines of the multiples $n\theta$ and $(n-1)\theta$. This can be used recursively to find expressions for $\cos(n\theta)$ in terms of polynomials in $\cos(\theta)$.
@@ -554,11 +695,11 @@ The approximation error is about $2.7$ percent.

 ##### Example

-The AMS has an interesting column on [rainbows](http://www.ams.org/publicoutreach/feature-column/fcarc-rainbows) the start of which uses some formulas from the previous example. Click through to see a ray of light passing through a spherical drop of water, as analyzed by Descartes. The deflection of the ray occurs when the incident light hits the drop of water, then there is an *internal* deflection of the light, and finally when the light leaves, there is another deflection. The total deflection (in radians) is $D = (i-r) + (\pi - 2r) + (i-r) = \pi + 2i - 4r$. However, the incident angle $i$ and the refracted angle $r$ are related by Snell's law: $\sin(i) = n \sin(r)$. The value $n$ is the index of refraction and is $4/3$ for water. (It was $3/2$ for glass in the previous example.) This gives
+The AMS has an interesting column on [rainbows](http://www.ams.org/publicoutreach/feature-column/fcarc-rainbows), the start of which uses some formulas from the previous example. Click through to see a ray of light passing through a spherical drop of water, as analyzed by Descartes. The deflection of the ray occurs when the incident light hits the drop of water, then there is an *internal* deflection of the light, and finally when the light leaves, there is another deflection. The total deflection (in radians) is $d = (i-r) + (\pi - 2r) + (i-r) = \pi + 2i - 4r$. However, the incident angle $i$ and the refracted angle $r$ are related by Snell's law: $\sin(i) = n \sin(r)$. The value $n$ is the index of refraction and is $4/3$ for water. (It was $3/2$ for glass in the previous example.) This gives


$$
-D = \pi + 2i - 4 \arcsin(\frac{1}{n} \sin(i)).
+d = \pi + 2i - 4 \arcsin(\frac{1}{n} \sin(i)).
$$

 Graphing this for incident angles between $0$ and $\pi/2$ we have:


```{julia}
#| hold: true
 n = 4/3
-D(i) = pi + 2i - 4 * asin(sin(i)/n)
-plot(D, 0, pi/2)
+d(i) = pi + 2i - 4 * asin(sin(i)/n)
+plot(d, 0, pi/2)
```

 Descartes was interested in the minimum value of this graph, as it relates to where the light concentrates. This is roughly at $1$ radian or about $57$ degrees:
@@ -588,17 +729,17 @@ Consider again this equation derived with the sum-and-difference formula:


$$
-\cos((n+1)\theta) = 2\cos(n\theta) \cos(\theta) - \cos((n-1)\theta).
+\cos((n+1)\theta) = 2 \cos(\theta) \cos(n\theta) - \cos((n-1)\theta).
$$

-Let $T_n(x) = \cos(n \arccos(x))$. Calling $\theta = \arccos(x)$ for $-1 \leq x \leq 1$ we get a relation between these functions:
+Let $T_n(x) = \cos(n \arccos(x))$. Note $T_1(x) = \cos(\arccos(x)) = x$.
By identifying $\theta$ with $\arccos(x)$ for $-1 \leq x \leq 1$, we get a relation between these functions:


$$
 T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x).
$$

-We can simplify a few: For example, when $n=0$ we see immediately that $T_0(x) = 1$, the constant function. Whereas with $n=1$ we get $T_1(x) = \cos(\arccos(x)) = x$. Things get more interesting as we get bigger $n$, for example using the equation above we get $T_2(x) = 2xT_1(x) - T_0(x) = 2x\cdot x - 1 = 2x^2 - 1$. Continuing, we'd get $T_3(x) = 2 x T_2(x) - T_1(x) = 2x(2x^2 - 1) - x = 4x^3 -3x$.
+We can simplify a few of the above: for example, when $n=0$ we see immediately that $T_0(x) = 1$, the constant function. We used above that for $n=1$ we get $T_1(x) = \cos(\arccos(x)) = x$. Things get more interesting as $n$ gets bigger; for example, using the equation above we get $T_2(x) = 2xT_1(x) - T_0(x) = 2x\cdot x - 1 = 2x^2 - 1$. Continuing, we'd get $T_3(x) = 2 x T_2(x) - T_1(x) = 2x(2x^2 - 1) - x = 4x^3 -3x$.

 A few things become clear from the above two representations:
@@ -621,7 +762,7 @@ plot!(abs ∘ q, -1,1, label="|q|")

 ## Hyperbolic trigonometric functions

-Related to the trigonometric functions are the hyperbolic trigonometric functions. Instead of associating a point $(x,y)$ on the unit circle with an angle $\theta$, we associate a point $(x,y)$ on the unit *hyperbola* ($x^2 - y^2 = 1$). We define the hyperbolic sine ($\sinh$) and hyperbolic cosine ($\cosh$) through $(\cosh(\theta), \sinh(\theta)) = (x,y)$.
+Related to the trigonometric functions are the hyperbolic trigonometric functions. Instead of associating a point $(x,y)$ on the unit circle with an angle $\theta,$ we associate a point $(x,y)$ on the unit *hyperbola* ($x^2 - y^2 = 1$). We define the hyperbolic sine ($\sinh$) and hyperbolic cosine ($\cosh$) through $(\cosh(\theta), \sinh(\theta)) = (x,y)$.


```{julia}
@@ -671,11 +812,12 @@ end

 These values are more commonly expressed using the exponential function as:

+$$
 \begin{align*}
 \sinh(x) &= \frac{e^x - e^{-x}}{2}\\
 \cosh(x) &= \frac{e^x + e^{-x}}{2}.
 \end{align*}
+$$

 The hyperbolic tangent is then the ratio of $\sinh$ and $\cosh$. As well, three inverse hyperbolic functions can be defined.
@@ -791,7 +933,7 @@ numericq(val)

 ###### Question

-For any positive integer $n$ the equation $\cos(x) - nx = 0$ has a solution in $[0, \pi/2]$. Graphically estimate the value when $n=10$.
+For any positive integer $n$ the equation $\cos(x) - nx = 0$ has a solution in $[0, \pi/2].$ Graphically estimate the value when $n=10.$


```{julia}
diff --git a/quarto/precalc/variables.qmd b/quarto/precalc/variables.qmd
index a939c5e8..2726bd70 100644
--- a/quarto/precalc/variables.qmd
+++ b/quarto/precalc/variables.qmd
@@ -28,7 +28,7 @@ nothing

 The Google calculator has a button `Ans` to refer to the answer to the previous evaluation. This is a form of memory. The last answer is stored in a specific place in memory for retrieval when `Ans` is used. In some calculators, more advanced memory features are possible. For some, it is possible to push values onto a stack of values for them to be referred to at a later time. This proves useful for complicated expressions, as the expression can be broken into smaller intermediate steps to be computed. These values can then be appropriately combined. This strategy is a good one, though the memory buttons can make its implementation a bit cumbersome.

-With `Julia`, as with other programming languages, it is very easy to refer to past evaluations.
This is done by *assignment* whereby a computed value stored in memory is associated with a name. The name can be used to look up the value later. Assignment does not change the value of the object being assigned, it only introduces a reference to it.
+With `Julia`, as with other programming languages, it is very easy to refer to past evaluations. This is done by *assignment*, whereby a computed value stored in memory is associated with a name (sometimes thought of as a symbol or label). The name can be used to look up the value later. Assignment does not change the value of the object being assigned; it only introduces a reference to it.

 Assignment in `Julia` is handled by the equals sign and takes the general form `variable_name = value`. For example, here we assign values to the variables `x` and `y`
@@ -49,7 +49,7 @@ x

 Just typing a variable name (without a trailing semicolon) causes the assigned value to be displayed.

-Variable names can be reused, as here, where we redefine `x`:
+Variable names can be reused (or reassigned), as here, where we redefine `x`:


```{julia}
@@ -115,7 +115,7 @@ By defining a new variable `a` to represent a value that is repeated a few times

 A [grass swale](https://stormwater.pca.state.mn.us/index.php?title=Design_criteria_for_dry_swale_(grass_swale)) is a design to manage surface water flow resulting from a storm. Swales detain, filter, and infiltrate runoff, limiting erosion in the process.

-![Swale cross section](precalc/figures/swale.png)
+![Swale cross section](figures/swale.png)

 There are a few mathematical formulas that describe the characteristics of a swale:
@@ -155,7 +155,7 @@ n, S = 0.025, 2/90

 A = (b + d/tan(theta)) * d
 P = b + 2d/sin(theta)
 R = A / P
-Q = R^(2/3) * S^(1/2) * A / n
+Q = R^(2/3) * S^(1/2) / n * A
```


@@ -198,7 +198,7 @@ This is completely unlike the mathematical equation $x = x^2$ which is typically

 ##### Example

-Having `=` as assignment is usefully exploited when modeling sequences. For example, an application of Newton's method might end up with this expression:
+Having `=` as assignment is usefully exploited when modeling sequences. For example, an application of Newton's method might end up with this mathematical expression:


$$
@@ -208,7 +208,7 @@ $$

 As a mathematical expression, for each $i$ this defines a new value for $x_{i+1}$ in terms of a known value $x_i$. This can be used recursively to generate a sequence, provided some starting point is known, such as $x_0 = 2$.

-The above might be written instead with:
+The above might be written instead using assignment with:


```{julia}
@@ -220,11 +220,19 @@ x = x - (x^2 - 2) / (2x)

 Repeating this last line will generate new values of `x` based on the previous one - no need for subscripts. This is exactly what the mathematical notation indicates is to be done.

+::: {.callout-note}
+## Use of =
+
+The distinction between ``=`` and `=` is important and is one area where common math notation and common computer notation diverge. The mathematical ``=`` indicates *equality*, and is often used with equations and also for assignment. Later, when symbolic math is introduced, the `~` symbol will be used to indicate an equation, though this is by convention and not part of base `Julia`. The computer syntax use of `=` is for *assignment* and *re-assignment*. Equality is tested with `==`; identity with `===`.
+
+:::

 ## Context

-The binding of a value to a variable name happens within some context. For our simple illustrations, we are assigning values, as though they were typed at the command line.
This stores the binding in the `Main` module. `Julia` looks for variables in this module when it encounters an expression and the value is substituted. Other uses, such as when variables are defined within a function, involve different contexts which may not be visible within the `Main` module.
+The binding of a value to a variable name happens within some context. When a variable is assigned or referenced, the scope of the variable -- the region of code where it is accessible -- is taken into consideration.
+
+For our simple illustrations, we are assigning values, as though they were typed at the command line. This stores the binding in the `Main` module. `Julia` looks for variables in this module when it encounters an expression and the value is substituted. Other uses, such as when variables are defined within a function, involve different contexts which may not be visible within the `Main` module.


 :::{.callout-note}
 The `varinfo` function will list the variables currently defined in the main workspace.
 :::

 :::{.callout-warning}
## Warning
-**Shooting oneselves in the foot.** `Julia` allows us to locally redefine variables that are built in, such as the value for `pi` or the function object assigned to `sin`. For example, this is a perfectly valid command `sin=3`. However, it will overwrite the typical value of `sin` so that `sin(3)` will be an error. At the terminal, the binding to `sin` occurs in the `Main` module. This shadows that value of `sin` bound in the `Base` module. Even if redefined in `Main`, the value in base can be used by fully qualifying the name, as in `Base.sin(pi)`. This uses the notation `module_name.variable_name` to look up a binding in a module.
+**Shooting oneself in the foot.** `Julia` allows us to locally redefine variables that are built in, such as the value for `pi` or the function object assigned to `sin`. This is called shadowing. For example, `sin=3` is a perfectly valid command. However, it will overwrite the typical value of `sin` so that `sin(3)` will be an error. At the terminal, the binding to `sin` occurs in the `Main` module. This shadows the value of `sin` bound in the `Base` module. Even if redefined in `Main`, the value in `Base` can be used by fully qualifying the name, as in `Base.sin(pi)`. This uses the notation `module_name.variable_name` to look up a binding in a module.

 :::


 ## Variable names

-`Julia` has a very wide set of possible [names](https://docs.julialang.org/en/stable/manual/variables/#Allowed-Variable-Names-1) for variables. Variables are case sensitive and their names can include many [Unicode](http://en.wikipedia.org/wiki/List_of_Unicode_characters) characters. Names must begin with a letter or an appropriate Unicode value (but not a number). There are some reserved words, such as `try` or `else` which can not be assigned to. However, many built-in names can be locally overwritten. Conventionally, variable names are lower case. For compound names, it is not unusual to see them squished together, joined with underscores, or written in camelCase.
+`Julia` has a very wide set of possible [names](https://docs.julialang.org/en/stable/manual/variables/#Allowed-Variable-Names-1) for variables. Variables are case sensitive and their names can include many [Unicode](http://en.wikipedia.org/wiki/List_of_Unicode_characters) characters. Names must begin with a letter or an appropriate Unicode value (but not a number). There are some reserved words, such as `try` or `else`, which can not be assigned to.
However, many built-in names can be locally overwritten (shadowed).
+
+Conventionally, variable names are lower case. For compound names, it is not unusual to see them squished together, joined with underscores, or written in camelCase.


```{julia}
@@ -273,17 +283,19 @@ For example, we could have defined `theta` (`\theta[tab]`) and `v0` (`v\_0[tab]`) as:
```

 :::{.callout-note}
-## Unicode
-These notes can be presented as HTML files *or* as `Pluto` notebooks. They often use Unicode alternatives to avoid the `Pluto` requirement of a single use of assigning to a variable name in a notebook without placing the assignment in a `let` block or a function body.
+## Emojis
+There is even support for tab-completion of [emojis](https://github.com/JuliaLang/julia/blob/master/stdlib/REPL/src/emoji_symbols.jl) such as `\:snowman:[tab]` or `\:koala:[tab]`

 :::


+:::{.callout-note}
-## Emojis
-There is even support for tab-completion of [emojis](https://github.com/JuliaLang/julia/blob/master/stdlib/REPL/src/emoji_symbols.jl) such as `\:snowman:[tab]` or `\:koala:[tab]`
+## Unicode
+These notes often use Unicode alternatives for some variables. Originally this was to avoid a requirement of `Pluto` of a single use of assigning to a variable name in a notebook without placing the assignment in a `let` block or a function body. Now they are used just for clarity through distinction.

 :::


 ##### Example

@@ -322,7 +334,7 @@ a, b = 1, 2
 a, b = b, a
```

-#### Example, finding the slope
+### Example: finding the slope

 Find the slope of the line connecting the points $(1,2)$ and $(4,6)$. We begin by defining the values and then applying the slope formula:
@@ -337,6 +349,7 @@ m = (y1 - y0) / (x1 - x0)

 Of course, this could be computed directly with `(6-2) / (4-1)`, but by using familiar names for the values we can be certain we apply the formula properly.

+
 ## Questions
diff --git a/quarto/precalc/vectors.qmd b/quarto/precalc/vectors.qmd
index cd4e2c12..58f3ff71 100644
--- a/quarto/precalc/vectors.qmd
+++ b/quarto/precalc/vectors.qmd
@@ -87,12 +87,13 @@ For the motion in the above figure, the object's $x$ and $y$ values change according

 It is common to work with *both* formulas at once. Mathematically, when graphing, we naturally pair off two values using Cartesian coordinates (e.g., $(x,y)$). Another means of combining related values is to use a *vector*. The notation for a vector varies, but to distinguish vectors from points we will use $\langle x,~ y\rangle$. With this notation, we can represent the position, the velocity, and the acceleration at time $t$ through:

+$$
 \begin{align*}
 \vec{x} &= \langle x_0 + v_{0x}t,~ -(1/2) g t^2 + v_{0y}t + y_0 \rangle,\\
 \vec{v} &= \langle v_{0x},~ -gt + v_{0y} \rangle, \text{ and }\\
 \vec{a} &= \langle 0,~ -g \rangle.
 \end{align*}
+$$



@@ -358,7 +359,13 @@ fibs = [1, 1, 2, 3, 5, 8, 13]

 Later we will discuss different ways to modify the values of a vector to create new ones, similar to how scalar multiplication does.

-As mentioned, vectors in `Julia` are comprised of elements of a similar type, but the type is not limited to numeric values. For example, a vector of strings might be useful for text processing, a vector of Boolean values can naturally arise, some applications are even naturally represented in terms of vectors of vectors (such as happens when plotting a collection points). Look at the output of these two vectors:
+As mentioned, vectors in `Julia` are comprised of elements of a similar type, but the type is not limited to numeric values.
Some examples:
+
+* a vector of strings might be useful for text processing. For example, the `WordTokenizers.jl` package takes text and produces tokens from the words.
+* a vector of Boolean values can naturally arise and is widely used within Julia's `DataFrames.jl` package.
+* some applications are even naturally represented in terms of vectors of vectors (such as happens when plotting a collection of points).
+
+
+Look at the output of these two vectors; in particular, note how the underlying type of the components is described on printing.


```{julia}
@@ -369,6 +376,8 @@ As mentioned, vectors in `Julia` are comprised of elements of a similar type,

 [true, false, true] # vector of Bool values
```

+
+
 Finally, we mention that if `Julia` has values of different types it will promote them to a common type if possible. Here we combine three types of numbers, and see that each is promoted to `Float64`:
@@ -441,7 +450,7 @@ vs[2] = 10
 vs
```

-The assignment `vs[2]` is different than the initial assignment `vs=[1,2]` in that, `vs[2]=10` **modifies** the container that `vs` points to, whereas `v=[1,2]` **replaces** the binding for `vs`. The indexed assignment is then more memory efficient when vectors are large. This point is also of interest when passing vectors to functions, as a function may modify components of the vector passed to it, though can't replace the container itself.
+The assignment `vs[2]` is different from the initial assignment `vs=[1,2]` in that `vs[2]=10` **modifies** the container that `vs` points to, whereas `vs=[1,2]` **replaces** any binding for `vs`. The indexed assignment is more memory efficient when vectors are large. This point is also of interest when passing vectors to functions, as a function may modify components of the vector passed to it, though can't replace the container itself.

 ## Some useful functions for working with vectors.
@@ -520,7 +529,7 @@ append!(v1, [6,8,7])
```

 These two functions modify or mutate the values stored within the vector `𝒗` that is passed as an argument. In the `push!` example above, the value `5` is added to the vector of $4$ elements. In `Julia`, a convention is to name mutating functions with a trailing exclamation mark. (Again, these do not mutate the binding of `𝒗` to the container, but do mutate the contents of the container.) There are functions with mutating and non-mutating definitions; an example is `sort` and `sort!`.

-If only a mutating function is available, like `push!`, and this is not desired a copy of the vector can be made. It is not enough to copy by assignment, as with `w = 𝒗`. As both `w` and `𝒗` will be bound to the same memory location. Rather, you call `copy` to make a new container with copied contents, as in `w = copy(𝒗)`.
+If only a mutating function is available, like `push!`, and this is not desired, a copy of the vector can be made. It is not enough to copy by assignment, as with `w = 𝒗`, as both `w` and `𝒗` will be bound to the same memory location. Rather, you call `copy` (or sometimes `deepcopy`) to make a new container with copied contents, as in `w = copy(𝒗)`.

 Creating new vectors of a given size is common for programming, though not much use will be made of this here. There are many different functions to do so: `ones` to make a vector of ones, `zeros` to make a vector of zeros, `trues` and `falses` to make Boolean vectors of a given size, and `similar` to make a similar-sized vector (with no particular values assigned).
@@ -610,7 +619,7 @@ This shows many of the manipulations that can be made with vectors.
Rather than
 :::{.callout-note}
## Note
-The `map` function is very much related to broadcasting and similarly named functions are found in many different programming languages. (The "dot" broadcast is mostly limited to `Julia` and mirrors on a similar usage of a dot in `MATLAB`.) For those familiar with other programming languages, using `map` may seem more natural. Its syntax is `map(f, xs)`.
+The `map` function is very much related to broadcasting and similarly named functions are found in many different programming languages. (The "dot" broadcast is mostly limited to `Julia` and mirrors a similar usage of a dot in `MATLAB`.) For those familiar with other programming languages, using `map` may seem more natural. Its syntax is `map(f, xs)`.

 :::
@@ -631,18 +640,18 @@ Comprehension notation is similar. The above could be created in `Julia` with:


```{julia}
-𝒙s = [1,2,3,4,5]
-[x^3 for x in 𝒙s]
+xs = [1,2,3,4,5]
+[x^3 for x in xs]
```

 Something similar can be done more succinctly:


```{julia}
-𝒙s .^ 3
+xs .^ 3
```

-However, comprehensions have a value when more complicated expressions are desired as they work with an expression of `𝒙s`, and not a pre-defined or user-defined function.
+However, comprehensions have value when more complicated expressions are desired, as they work with an expression in `xs`, and not a pre-defined or user-defined function.

 Another typical example of set notation might include a condition, such as the numbers divisible by $7$ between $1$ and $100$. Set notation might be:
@@ -680,21 +689,21 @@ The first task is to create the data. We will soon see more convenient ways to g

```{julia}
 a, b, n = -1, 1, 7
 d = (b-a) // (n-1)
-𝐱s = [a, a+d, a+2d, a+3d, a+4d, a+5d, a+6d] # 7 points
+xs = [a, a+d, a+2d, a+3d, a+4d, a+5d, a+6d] # 7 points
```

 To get the corresponding $y$ values, we can use a comprehension (or define a function and broadcast):


```{julia}
-𝐲s = [x^2 for x in 𝐱s]
+ys = [x^2 for x in xs]
```

 Vectors can be compared by combining them into a separate container, as follows:


```{julia}
-[𝐱s 𝐲s]
+[xs ys]
```

 (If there is a space between objects they are horizontally combined. In our construction of vectors using `[]` we used a comma for vertical combination. More generally we should use a `;` for vertical concatenation.)
@@ -711,8 +720,41 @@ The style generally employed here is to use plural variable names for a collecti

 ## Other container types

+We end this section with some general comments for those interested in a bit more; they aren't needed to understand most of what follows.
+
+Vectors in `Julia` are a container for values. Vectors are one of many different types of containers. The `Julia` manual uses the word "collection" to refer to a container of values that has properties like a vector. Here we briefly review some alternate container types that are common in Julia and find use in these notes.
+
+First, here are some of the properties of a *vector*:
+
+* Vectors are *homogeneous*. That is, the values the container holds all have a common type. This type might be an abstract type, but for high performance, concrete types (like 64-bit floating point or 64-bit integers) are more typical.
+
+* Vectors are $1$-dimensional.
+
+* Vectors are ordered and *indexable* by their order. In Julia, the default indexing for vectors is $1$-based (starting with one), with numeric access to the first, second, third, ..., last entries.
+
+* Vectors are *mutable*.
That is, their elements may be changed; the container may be grown or shrunk.
+
+* Vectors are *iterable*. That is, their values can be accessed one-by-one in various manners.
+
+These properties may not all be desirable for one reason or another, and `Julia` has developed a large number of alternative container types, of which we describe a few here.
+
+### Arrays
+
+Vectors are $1$-dimensional, but other dimensions are often desired. Vectors are implemented as a special case of a more general array type. Arrays are of dimension $N$ for various non-negative values of $N$. A common, and somewhat familiar, mathematical use of a $2$-dimensional array is a matrix.
+
+Arrays have their entries accessed by an index for each dimension. By default these are $1$-based, but other offsets are possible through the `OffsetArrays.jl` package. A matrix can refer to its values either by row and column indices or, as a matrix has linear indexing, by a single index.
+
+For large collections of data with many entries being $0$, a sparse array is beneficial for less memory-intensive storage. These are implemented in the `SparseArrays.jl` package.
+
+There are numerous array types available. `Julia` has a number of *generic* methods for working with different arrays. An example would be `eachindex`, which provides an efficient iterator over the indices of the underlying array.
+
+### Tuples
+
+Tuples are fixed-length containers where there is no expectation or enforcement of their having a common type. Tuples just combine values together. Like vectors, they can be accessed by index (also $1$-based). Unlike vectors, the containers are *immutable*: elements can not be changed and the length of the container may not change. This has benefits for performance purposes. (For fixed-length, mutable containers that have the benefits of tuples and vectors, the `StaticArrays.jl` package is available.)
+
+While a vector is formed by placing comma-separated values within a `[]` pair (e.g., `[1,2,3]`), a tuple is formed by placing comma-separated values within a `()` pair. A tuple of length $1$ uses a convention of a trailing comma to distinguish it from a parenthesized expression (e.g. `(1,)` is a tuple, `(1)` is just the value `1`).

 :::{.callout-note}
## Well, actually...

 Technically, the tuple is formed just by the use of commas, which separate different values.

 :::


-Tuples are used in programming, as they don't typically require allocated memory to be used so they can be faster. Internal usages are for function arguments and function return types. Unlike vectors, tuples can be heterogeneous collections. (When commas are used to combine more than one output into a cell, a tuple is being used.) (Also, a big technical distinction is that tuples are also different from vectors and other containers in that tuple types are *covariant* in their parameters, not *invariant*.)
-
-Unlike vectors, tuples can have names which can be used for referencing a value, similar to indexing but possibly more convenient. Named tuples are similar to *dictionaries* which are used to associate a key (like a name) with a value.
+There are *named tuples*, where each component has an associated name. Like a tuple, these can be indexed by number; unlike regular tuples, they can also be indexed by name.

 For example, here a named tuple is constructed, and then its elements referenced:


```{julia}
 nt = (one=1, two="two", three=:three)  # heterogeneous values (Int, String, Symbol)
-nt.one, nt[2], nt[end] # named tuples have name or index access
+nt.one, nt[2], nt[end]  # named tuples have name or index access
```

+
+::: {.callout-note}
+## Named tuples and destructuring
+
+A *named* tuple is a container that allows access by index *or* by name. They are easily constructed. For example:
+
+```{julia}
+nt = (x0 = 1, x1 = 4, y0 = 2, y1 = 6)
+```
+
+The values in a named tuple can be accessed using the "dot" notation:
+
+```{julia}
+nt.x1
+```
+
+Alternatively, the index notation -- using a *symbol* for the name -- can be used:
+
+```{julia}
+nt[:x1]
+```
+
+Named tuples are employed to pass parameters to functions. To find the slope of the line connecting $(x0, y0)$ and $(x1, y1)$, we could do:
+
+```{julia}
+(nt.y1 - nt.y0) / (nt.x1 - nt.x0)
+```
+
+
+However, more commonly used is destructuring, where named variables are extracted by name when the left hand side matches the right hand side:
+
+```{julia}
+(;x0, x1) = nt # only extract what is desired
+x1 - x0
+```
+
+(This works for named tuples and other iterable containers in `Julia`. It also works the other way: if `x0` and `x1` are defined, then `(;x0, x1)` creates a named tuple with those values.)
+
+:::
+
+
+### Associative arrays
+
+Named tuples associate a name (in this case a symbol) to a value. More generally an associative array associates to each key a value, where the keys and values may be of different types.
+
+The pair notation, `key => value`, is used to make one association. A *dictionary* is used to have a container of associations. For example, this constructs a simple dictionary associating a spelled-out name with a numeric value:
+
+```{julia}
+d = Dict("one" => 1, "two" => 2, "three" => 3)
+```
+
+The printout shows that, in this case, the keys are of type `String` and the values of type `Int64`. There are a number of different ways to construct dictionaries.
+
+
+The values in a dictionary can be accessed by key:
+
+
+```{julia}
+d["two"]
+```
+
+Named tuples are associative arrays where the keys are restricted to symbols. There are other types of associative arrays, specialized cases of the `AbstractDict` type with performance benefits for specific use cases. In these notes, dictionaries appear as output in some function calls.
+
+Unlike vectors and tuples, named tuples and dictionaries are not currently supported by broadcasting. This causes no loss in usefulness, as the values can easily be iterated over, but the convenience of the dot notation is lost.



 ## Questions
@@ -881,6 +987,7 @@ From [transum.org](http://www.transum.org/Maths/Exam/Online_Exercise.asp?Topic=V

```{julia}
#| hold: true
#| echo: false
+let
 p = plot(xlim=(0,10), ylim=(0,5), legend=false, framestyle=:none)
 for j in (-3):10
     plot!(p, [j, j + 5], [0, 5*sqrt(3)], color=:blue, alpha=0.5)
@@ -906,7 +1013,8 @@ annotate!(p, [(2, 3/2*sqrt(3) -delta, L"a"),
 ])

-p
+    p
+end
```

 The figure shows $5$ vectors.