Matthias-2015-07-27.txt

Started in the 30s with Church.

Gödel has a language that can prove anything.

Church says you can prove anything with functions.

Set notation was `^ x | x² < 0`; Church annotated it as `'^ | x² < 0`, which
was typeset as `λ | x² < 0`.

    e ⩴ x | λx.e | e e

See things as trees:

    ((λx.x)(λx.x)) = (λ.←) (λ.←)

The names don't matter; the meaning of a variable is its binding site.

    λdavid. (david david) = λjeff. (jeff jeff)
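
A minimal sketch of "the meaning of a variable is its binding site": replace
each bound name by the distance to its binder (the arrows in the tree picture)
and compare the results. The tuple encoding of terms and the function names
are assumptions of this sketch, not part of the lecture.

```python
# Terms as tuples (an encoding assumed for this sketch):
#   ("var", x) | ("lam", x, body) | ("app", fun, arg)

def to_nameless(term, bound=()):
    """Replace each bound variable by the distance to its binder."""
    tag = term[0]
    if tag == "var":
        name = term[1]
        # `bound` lists binder names, innermost first
        return ("idx", bound.index(name)) if name in bound else ("free", name)
    if tag == "lam":
        return ("lam", to_nameless(term[2], (term[1],) + bound))
    return ("app", to_nameless(term[1], bound), to_nameless(term[2], bound))

def alpha_equal(e1, e2):
    """Two terms are the same tree once names are replaced by binding sites."""
    return to_nameless(e1) == to_nameless(e2)

david = ("lam", "david", ("app", ("var", "david"), ("var", "david")))
jeff = ("lam", "jeff", ("app", ("var", "jeff"), ("var", "jeff")))
print(alpha_equal(david, jeff))
```

Terms that differ only in where a variable points, such as `λx.λy.x` and
`λx.λy.y`, stay different under this comparison.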

"`λ` is just 7th-grade algebra hyped up a little bit."

    f(x) ≔ x²

    f(6) + 5 = 6² + 5 = 41

    (λx.e)e' = e[x←e'] -- β

    (λx.x) a = a
  
    (λxy.xyy)ab = (λy.ayy)b = abb
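
A sketch of `β` as code, assuming terms are encoded as tuples (my encoding,
not the lecture's). `subst` is `e[x←e']` with binder renaming so that free
variables of `e'` are not captured; `beta` only fires when there is a `λ` on
the outside of an application. The helper names are hypothetical.

```python
import itertools

# Terms: ("var", x) | ("lam", x, body) | ("app", fun, arg)

def free_vars(e):
    if e[0] == "var":
        return {e[1]}
    if e[0] == "lam":
        return free_vars(e[2]) - {e[1]}
    return free_vars(e[1]) | free_vars(e[2])

def fresh(base, avoid):
    """Pick a variable name based on `base` that is not in `avoid`."""
    for i in itertools.count():
        candidate = f"{base}{i}"
        if candidate not in avoid:
            return candidate

def subst(e, x, s):
    """e[x←s], renaming a binder when it would capture a free variable of s."""
    if e[0] == "var":
        return s if e[1] == x else e
    if e[0] == "app":
        return ("app", subst(e[1], x, s), subst(e[2], x, s))
    y, body = e[1], e[2]
    if y == x:
        return e  # x is shadowed below this binder
    if y in free_vars(s):
        z = fresh(y, free_vars(s) | free_vars(body) | {x})
        body, y = subst(body, y, ("var", z)), z
    return ("lam", y, subst(body, x, s))

def beta(e):
    """Apply β at the root; None when e is not of the form (λx.e0) e1."""
    if e[0] == "app" and e[1][0] == "lam":
        return subst(e[1][2], e[1][1], e[2])
    return None

a, b = ("var", "a"), ("var", "b")
x, y = ("var", "x"), ("var", "y")
f = ("lam", "x", ("lam", "y", ("app", ("app", x, y), y)))  # λxy.xyy
step1 = beta(("app", f, a))        # (λy.ayy)
step2 = beta(("app", step1, b))    # abb
print(step2)
```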

Lambdas are not functions! What about partial functions?

`β` is a relation, and requires a `λ` on the outside.

    λx.λy.λz.⸤(λx.x)(zz)⸥y

We can't reduce with `β`.

`β` is a "notion of reduction".

We start defining `=⸤β⸥`:

    e β e'
    ---------
    e =⸤β⸥ e'

    e =⸤β⸥ e'
    ---------------
    λx.e =⸤β⸥ λx.e'

The template structure `(λx.e)e'` is called a "redex".

["redex" is not Latin, it just means reducible expression. "contractum" is
Latin.]

Need another set of rules:

    e =⸤β⸥ e'
    ----------------
    e₀ e =⸤β⸥ e₀ e'

    e =⸤β⸥ e'
    ----------------
    e e₀ =⸤β⸥ e' e₀

We are creating a "syntactic compatibility closure".
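
A sketch of the compatibility closure, read left to right as one-step
reduction: a `β` step may happen at the root, under a `λ`, or on either side
of an application. The tuple encoding and function names are assumptions of
this sketch, and `subst` is deliberately naive (it assumes no variable capture
can occur in the examples). Taking the reflexive, symmetric, transitive
closure of this relation is not shown.

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", fun, arg)

def subst(e, x, s):
    """Naive e[x←s]; assumes no free variable of s gets captured."""
    if e[0] == "var":
        return s if e[1] == x else e
    if e[0] == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, s))
    return ("app", subst(e[1], x, s), subst(e[2], x, s))

def beta(e):
    """β proper: only applies with a λ on the outside of an application."""
    if e[0] == "app" and e[1][0] == "lam":
        return subst(e[1][2], e[1][1], e[2])
    return None

def one_step(e):
    """All e' reachable from e by contracting one β-redex anywhere in e."""
    results = []
    root = beta(e)
    if root is not None:
        results.append(root)                                      # e β e'
    if e[0] == "lam":
        results += [("lam", e[1], b) for b in one_step(e[2])]     # λx.e
    elif e[0] == "app":
        results += [("app", f, e[2]) for f in one_step(e[1])]     # e e₀
        results += [("app", e[1], a) for a in one_step(e[2])]     # e₀ e
    return results

z, y = ("var", "z"), ("var", "y")
identity = ("lam", "x", ("var", "x"))
# λx.λy.λz.((λx.x)(zz))y : β does not apply at the root
inner = ("app", ("app", identity, ("app", z, z)), y)
term = ("lam", "x", ("lam", "y", ("lam", "z", inner)))
print(beta(term), one_step(term))
```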

We also need the reflexive, symmetric and transitive closure of `=⸤β⸥`.

This gives an equivalence relation.

We have "a system of calculating equivalences between terms".

Q: "does something have meaning?"

Two possible meanings:

1. "Can you prove 'true = false', or 'is everything related'"
   You need to prove (meta-proof), that you cannot prove (in the system), that
   some two terms are equal.
   This is called a consistency theorem, developed by Church+Rosser, "the
   Church+Rosser lemma".
   This shows that the system relates some terms, but not all terms.
2. Is there a topologically, algebraically generated space of functions
   generated by `λ` and satisfying `=⸤β⸥`.
   This was worked out by Dana Scott.

"lambda-calculus and denotational semantics had a terrible influence on
computer science" --MF

1958: Lisp and Algol 60 were created.

Lisp: 
- introduced `λ`-notation, got it wrong

Algol 60: 
- based on substitution model of `λ`-calculus
- call-by-name parameter passing (`β`-rule) (was very slow)
- then also introduced call-by-value
- cbn vs cbv "one was correct, one was fast"

For the next 15 years, people struggled to relate call-by-name (correct) with
call-by-value (fast).

Landin (1960s, '62, '63), invented the idea of abstract syntax.
Böhm did the same thing.
McCarthy tried something similar.

Abelson and Sussman popularized "applicative order application".

Dana Scott assigned a mathematical meaning to `λ`-calculus:
1. Created the function space
2. Assigned a mapping `λ → ⟦⟧`

MF opinion: denotational semantics took us off track.

Plotkin solved all of this (1972/1974) "Call-by-name, call-by-value and the
lambda calculus". Launched enough research ideas to fill 15 people's entire
research lives. Read this paper, it's really good!

Gives an algorithm to understand what a calculus and a semantics is for a
programming language (13 steps?).

Launched research into CPS.

1. Pick a term language, scoped
2. Pick a subset of terms, called programs, and another subset, called values
   (first appearance of words 'program' and 'value' in study of `λ` up to that
   point.)
   - Programs are things we don't really know what to do with immediately
   - Values are things "you see" at the end of computation. `λ` is a value.

                .- input
         (λi.e) e' ~~~~~~> output
         -----
         ^ program proper

3. Define a notion of reduction: `β` and `β-value`

       βᵥ: (λx.e)v ~> e[x←v]

4. Uniformly create a calculus `=ₓ` from the notions of reduction.

       `=ₙ` from β and `=ᵥ` from βᵥ

   A way of equating arbitrary program fragments.

5. Define a semantics from `=ₓ`

       evalₓ ∈ 𝒫(Program × Value)

       e evalₓ v 𝑖𝑓𝑓 e =ₓ v

6. Prove that `evalₓ` is a function.

   Via Church-Rosser Lemma, `evalₓ` is a (partial) function

       evalₓ(e) ≔ { n          for "base" value 
                  | 'closure   for λ-expression }

   You can now prove things like:
   
       e (Y e) =ₙ Y e
   
   Computation should be directed, which for now is not specified, and
   problematic.

7. Prove that `=ₓ` satisfies a "standardization" property:

       𝑖𝑓  e =ₓ e' 𝑡ℎ𝑒𝑛 you can prove it in an algorithmic fashion

   An algorithm means you know how to pick the next redex.

   The algorithm is the same for CBN and CBV.

   "left-most outer-most strategy", 𝑖.𝑒. standard reduction. `|-->ₓ`.

   A strategy is a meta-function for picking a redex.

   "If all you care about is the value at the end, you can use standard
   reduction".

       evalₓ(e) = v 𝑖𝑓𝑓 e |-->ₓ* v

   Proof in Curry and Feys: the Curry-Feys standardization theorem.

(Aside: you must give readers a guide for how to pronounce math notation! A
reader should be able to read your paper aloud.)

We have two semantics, `evalₓ` based on standard reduction, and `=ₓ` based on
equality.

Must prove that `evalₓSR` is the same function as `evalₓ=`.

CBN calculus inconsistent with CBV interpreter, CBV calculus inconsistent with
CBN interpreter.

What do calculations on program mean?

1. (Syntactic) because you prove Church/Rosser, you know that calculations are
   consistent with the "fast" interpreter
2. (Semantic) via snippet from Jim Morris (63) dissertation, created
   polymorphic lambda calculus (PAL): introduce a relation known as
   observational equivalence.

   `e ≃ e'` means for all ways of placing a term into a complete program (a
   context) called C, evalₓ(C[e]) ~ evalₓ(C[e'])

   Two versions: `≃ₙ` and `≃ᵥ`. These are the largest possible consistent
   equivalence relations that let you calculate programs. Therefore they are
   unique (because they are largest). They are the _truth_.

   Every programming language has "the truth" (`≃ₓ`) by virtue of having an
   interpreter. The goal is to make the proof system (`=ₓ`) consistent with the
   truth.

   MF: "On the expressive power of programming languages", previous draft
   attempted to prove `≃ᵥ ⊆ ≃ₙ`, but the two had been proved different two
   months earlier.

CBV and CBN functional programming are not related other than in the syntax of
the terms. CBN is not "a different strategy".

Laziness and CBN are related, by subset.

Q: what use is studying functional programming if programs aren't purely
functional?

    (f (call/cc g)) ~ g(f)

A calculus equation for a very imperative idea.

Technical insights: "evaluation context semantics" _use contexts instead of
inference rules_.

    e β e'
    ---------    <-- inference rule
    e =⸤β⸥ e'

"Syntactic compatibility"

"left-most-outer-most"

Contexts:

    e ⩴ x | λx.e | e e
    C ⩴ □ | λx.C | C e | e C

one-hole contexts.

`C[e]` "textually" put `e` into hole.

    (λx.□)(λy.y)
          
     / \
    λ  λ←←
    |  | ↑
    □  ⋅→→

with contexts:

    =⸤β⸥ : e =⸤β⸥ e' 𝑖𝑓𝑓  ∃ C,
         e  = C[(λx.e₀)e₁]
         e' = C[e₀[x ←e₁]]
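
A sketch of one-hole contexts, assuming a tuple encoding of terms with an
extra `("hole",)` node (my encoding). `plug` is the textual `C[e]`, so a
binder in `C` may capture variables of `e`. `redex_contexts` (a hypothetical
helper) enumerates every way to write `e` as `C[(λx.e₀)e₁]`, which is exactly
the set of choices the context-based definition of `=⸤β⸥` quantifies over.

```python
# Terms:    ("var", x) | ("lam", x, body) | ("app", fun, arg)
# Contexts: the same, with exactly one ("hole",) somewhere inside
HOLE = ("hole",)

def plug(C, e):
    """C[e]: textually put e into the hole of C (capture is intended)."""
    if C == HOLE:
        return e
    if C[0] == "lam":
        return ("lam", C[1], plug(C[2], e))
    if C[0] == "app":
        return ("app", plug(C[1], e), plug(C[2], e))
    return C  # a variable contains no hole

def redex_contexts(e):
    """Yield every pair (C, r) with plug(C, r) == e and r a β-redex."""
    if e[0] == "app" and e[1][0] == "lam":
        yield HOLE, e
    if e[0] == "lam":
        for C, r in redex_contexts(e[2]):
            yield ("lam", e[1], C), r
    elif e[0] == "app":
        for C, r in redex_contexts(e[1]):
            yield ("app", C, e[2]), r
        for C, r in redex_contexts(e[2]):
            yield ("app", e[1], C), r

id_x = ("lam", "x", ("var", "x"))
id_y = ("lam", "y", ("var", "y"))

# (λx.□)(λy.y) with x plugged in: the hole's x is captured by λx
captured = plug(("app", ("lam", "x", HOLE), id_y), ("var", "x"))

# (λx.x)((λy.y) a) has two redexes, hence two decompositions
term = ("app", id_x, ("app", id_y, ("var", "a")))
pairs = list(redex_contexts(term))
print(len(pairs))
```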

Evaluation context:

    E ⩴ □ | E e

Thm: E[(λx.e)e'] is the LMOM redex.

For CBV you need:

    E ⩴ □ | v E | E e

You could also use:

    E ⩴ □ | e E | E v

also left-most-outer-most!

    E[(λx.e)e'] |-->ₙ E[e[x←e']]

fully describes CBN standard reduction.

    E[(λx.e)v] |-->ᵥ E[e[x←v]]

fully describes CBV standard reduction.
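
A sketch of both standard reductions as one `step` function, assuming a tuple
encoding of terms (my encoding) and closed programs. The recursion follows
the grammars `E ⩴ □ | E e` (CBN) and `E ⩴ □ | v E | E e` (CBV). The `fuel`
bound is an assumption added so that a diverging program returns `None`
instead of looping forever.

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", fun, arg)

def subst(e, x, s):
    """e[x←s]. Naive: standard reduction of a closed program only ever
    substitutes closed terms, so nothing can be captured."""
    if e[0] == "var":
        return s if e[1] == x else e
    if e[0] == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, s))
    return ("app", subst(e[1], x, s), subst(e[2], x, s))

def step(e, cbv):
    """One standard-reduction step; None if e is a value or stuck."""
    if e[0] != "app":
        return None
    fun, arg = e[1], e[2]
    if fun[0] == "lam" and (not cbv or arg[0] == "lam"):
        return subst(fun[2], fun[1], arg)          # E = □, β or βᵥ
    if fun[0] != "lam":                            # E ⩴ E e
        fun2 = step(fun, cbv)
        return None if fun2 is None else ("app", fun2, arg)
    arg2 = step(arg, cbv)                          # E ⩴ v E (CBV only)
    return None if arg2 is None else ("app", fun, arg2)

def evaluate(e, cbv, fuel=1000):
    """Iterate step; returns the final term, or None when fuel runs out."""
    for _ in range(fuel):
        nxt = step(e, cbv)
        if nxt is None:
            return e
        e = nxt
    return None

identity = ("lam", "x", ("var", "x"))
self_app = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
omega = ("app", self_app, self_app)                # reduces to itself
prog = ("app", ("lam", "x", identity), omega)      # (λx.λx.x) Ω

print(evaluate(prog, cbv=False))   # CBN discards Ω
print(evaluate(prog, cbv=True))    # CBV keeps reducing Ω until fuel runs out
```

The same program has a value under CBN and none under CBV, which is one
concrete way the two semantics differ.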

"evaluation context semantics" should be called "standard reduction semantics".

Technical Insight 2:

    E[ THING v ]

`THING` can manipulate `E`, the evaluation context.

From this you can do side-effects, continuations, etc.

𝑒.𝑔.

    E[raise e] ~> raise e

full equational system for exceptions:

    x | λx.e | ee | raise e

calculation system:

    C[(λx.e)v]    =ₑₓ C[e[x←v]]
    C[E[raise e]] =ₑₓ C[raise e]

These two rules give you a consistent Church/Rosser system for exceptions. Same
two equations work for CBN.

Standard reduction:

    E[(λx.e)v]     |-->ₑₓ E[e[x←v]]
    E[E'[raise e]] |-->ₑₓ E[raise e]   [ |-->ₑₓ raise e , as a coincidence  ]
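
A sketch of the exception rules on top of CBV, assuming a tuple encoding with
an extra `("raise", e)` node (my encoding). Two simplifications are
assumptions of this sketch: the payload of `raise` is not evaluated, and each
step drops a single frame of the context, which is an instance of the rule
`E[E'[raise e]] |-->ₑₓ E[raise e]` with a one-frame `E'`.

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", fun, arg) | ("raise", e)

def subst(e, x, s):
    """Naive e[x←s]; fine for closed programs under standard reduction."""
    if e[0] == "var":
        return s if e[1] == x else e
    if e[0] == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, s))
    if e[0] == "raise":
        return ("raise", subst(e[1], x, s))
    return ("app", subst(e[1], x, s), subst(e[2], x, s))

def step(e):
    """One CBV step; None when e is an answer (a λ or a raise) or stuck."""
    if e[0] != "app":
        return None
    fun, arg = e[1], e[2]
    if fun[0] == "raise":
        return fun                                  # drop the frame (□ arg)
    if fun[0] == "lam" and arg[0] == "raise":
        return arg                                  # drop the frame (v □)
    if fun[0] == "lam" and arg[0] == "lam":
        return subst(fun[2], fun[1], arg)           # βᵥ
    if fun[0] == "app":                             # E ⩴ E e
        fun2 = step(fun)
        return None if fun2 is None else ("app", fun2, arg)
    if fun[0] == "lam" and arg[0] == "app":         # E ⩴ v E
        arg2 = step(arg)
        return None if arg2 is None else ("app", fun, arg2)
    return None                                     # stuck on a free variable

def evaluate(e):
    while (nxt := step(e)) is not None:
        e = nxt
    return e

identity = ("lam", "x", ("var", "x"))
k = ("lam", "a", ("lam", "b", ("var", "a")))
thrower = ("lam", "x", ("raise", ("var", "x")))
# (I (thrower K)) I : the raise escapes both surrounding applications
prog = ("app", ("app", identity, ("app", thrower, k)), identity)
print(evaluate(prog))
```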

Standard reduction for assignment:

    e ⩴ x | λx.e | e e | set! x e | letrec ((x v) ..) e
    v ⩴ λx.e

    E ⩴ □ | E e | v E | set! x E

    (βₛₑₜ):  (λx.e) v                          R  letrec ((x v)) e
    (x):     letrec (.. (x v) ..) E[x]         R  letrec (.. (x v) ..) E[v]
    (set!):  letrec (.. (x v) ..) E[set! x u]  R  letrec (.. (x u) ..) E[λx.x]

    (scope extrusion:)
      E[letrec (...) e]  R  letrec (...) E[e]

    (merge:)
      letrec (.. (x v) ..) (letrec (.. (y u) ..) e)  R  letrec (.. (x v) .. .. (y u) ..) e

You can calculate in parallel, but standard reduction doesn't capture parallel
execution.

Technical Insight 3:

    t:   E[(λx.e)e']   = Pₜ
    t+1: E[e[x←e']]    = Pₜ₊₁

Idea: separate `E` from the expression where the "machine" is looking for a
redex.

Two register machine: control and stack registers:

    ⟨e,E⟩

Next idea: change data representation from context to stack:

    ⟨e,[app₁]⟩
       [app₂]
       [app₃]
        ...

Next idea: substitution is hard and inefficient. Make substitution lazy;
reveals an explicit environment.

    control:     e
    environment: ρ  mapping free-variables to values
    stack:       κ  control stack
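
A sketch of the resulting three-register machine for CBV, assuming a tuple
encoding of terms and closed programs (both my assumptions). The environment
maps variables to closures (a `λ` paired with its environment), which is the
lazy form of substitution; the stack is a Python list of frames, each frame
being one layer of the evaluation context. Frame tags `"arg"` and `"fun"` are
hypothetical names.

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", fun, arg)

def run(term):
    """Run ⟨control, environment, stack⟩ to a final closure (λ, environment)."""
    control, env, stack = term, {}, []
    while True:
        if control[0] == "app":
            # Focus on the function position; remember the argument (E e)
            stack.append(("arg", control[2], env))
            control = control[1]
            continue
        if control[0] == "var":
            # Lazy substitution: look the variable up instead of rewriting
            control, env = env[control[1]]
        # control is now a λ and env its environment: a value
        if not stack:
            return control, env
        frame = stack.pop()
        if frame[0] == "arg":
            # Function is a value; now evaluate the argument (v E)
            _, arg, arg_env = frame
            stack.append(("fun", control, env))
            control, env = arg, arg_env
        else:
            # Argument is a value; βᵥ by extending the λ's environment
            _, lam, lam_env = frame
            env = {**lam_env, lam[1]: (control, env)}
            control = lam[2]

id_a = ("lam", "a", ("var", "a"))
id_b = ("lam", "b", ("var", "b"))
const = ("lam", "x", ("lam", "y", ("var", "x")))
prog = ("app", ("app", const, id_a), id_b)    # ((λx.λy.x) A) B
print(run(prog))
```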