Commit

Merge pull request #652 from finch-tensor/wma/docs3
Wma/docs3
willow-ahrens authored Nov 22, 2024
2 parents 63a50b0 + 243ee5f commit ce1184a
Showing 54 changed files with 1,597 additions and 203 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "Finch"
uuid = "9177782c-1635-4eb9-9bfb-d9dfa25e6bce"
authors = ["Willow Ahrens"]
version = "0.6.33"
version = "1.0.0"

[deps]
AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
40 changes: 26 additions & 14 deletions README.md
@@ -37,7 +37,7 @@ julia> using Pkg; Pkg.add("Finch")
julia> using Finch

# Create a sparse tensor
julia> A = Tensor(Dense(SparseList(Element(0.0))), [1 0 0; 0 2 0; 0 0 3])
julia> A = Tensor(CSCFormat(), [1 0 0; 0 2 0; 0 0 3])
3×3 Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
1.0 0.0 0.0
0.0 2.0 0.0
@@ -58,6 +58,12 @@ Finch first translates high-level array code into **FinchLogic**, a custom inter

Finch supports most major sparse formats (CSR, CSC, DCSR, DCSC, CSF, COO, Hash, Bytemap). Finch also allows users to define their own sparse formats with a parameterized format language.

```julia
CSC_matrix = Tensor(CSCFormat())
CSR_matrix = swizzle(Tensor(CSCFormat()), 2, 1)
CSF_tensor = Tensor(CSFFormat(3))
```
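
For instance, the CSC layout produced by `CSCFormat()` can also be composed by hand from individual level constructors (this mirrors the quick-start example above, where `Tensor(CSCFormat(), ...)` builds a `Dense` level over a `SparseList` over an `Element`):

```julia
# Hand-composed CSC-style format: a dense level of columns, each storing a
# sparse list of row entries, with 0.0 as the background value.
A = Tensor(Dense(SparseList(Element(0.0))), [1 0 0; 0 2 0; 0 0 3])
```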

Finch also supports a wide variety of array structures beyond sparsity. Whether you're dealing with [custom background (zero) values](https://en.wikipedia.org/wiki/GraphBLAS), [run-length encoding](https://en.wikipedia.org/wiki/Run-length_encoding), or matrices with [special structures](https://en.wikipedia.org/wiki/Sparse_matrix#Special_structure) like banded or triangular matrices, Finch’s compiler can understand and optimize various data patterns and computational rules to adapt to the structure of the data.
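
As a small sketch of one of these patterns, a custom background (fill) value can be chosen by changing the argument of the `Element` level (this sketch assumes the level constructors shown above; the input matrix is purely illustrative):

```julia
# Hypothetical example: a matrix whose background (fill) value is 1.0 rather
# than 0.0, so the format treats 1.0 as the implicit "empty" entry.
A = Tensor(Dense(SparseList(Element(1.0))), [1 1 1; 1 2 1; 1 1 3])
```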

### Examples:
@@ -76,6 +82,9 @@ julia> B = Tensor(Dense(SparseList(Element(0.0))), [0 1 1; 1 0 0; 0 0 1; 0 0 1])
# Element-wise multiplication
julia> C = A .* B

# Element-wise max
julia> C = max.(A, B)

# Sum over rows
julia> D = sum(C, dims=2)
```
@@ -84,25 +93,28 @@ For situations where more complex operations are needed, Finch supports an `@ein
```julia
julia> @einsum E[i] += A[i, j] * B[i, j]

julia> @einsum F[i] <<max>>= A[i, j] + B[i, j]
julia> @einsum F[i, k] <<max>>= A[i, j] + B[j, k]

```

Finch even allows users to fuse multiple operations into a single kernel with `lazy` and `compute`.
Finch even allows users to fuse multiple operations into a single kernel with `lazy` and `compute`. The `lazy` function creates a lazy tensor, which is a symbolic representation of the computation. The `compute` function evaluates the computation.
Different optimizers can be used with `compute`, such as the state-of-the-art `Galley` optimizer, which can adapt to the
sparsity patterns of the inputs.

```julia
julia> C = lazy(A) .+ lazy(B)
?×?-LazyTensor{Float64}
julia> using Finch, BenchmarkTools

julia> D = sum(C, dims=2)
?-LazyTensor{Float64}

julia> compute(D)
4 Tensor{SparseDictLevel{Int64, Vector{Int64}, Vector{Int64}, Vector{Int64}, Dict{Tuple{Int64, Int64}, Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}:
3.1
6.5
5.4
6.5
julia> A = fsprand(1000, 1000, 0.1); B = fsprand(1000, 1000, 0.1); C = fsprand(1000, 1000, 0.0001);

julia> A = lazy(A); B = lazy(B); C = lazy(C);

julia> sum(A * B * C)

julia> @btime compute(sum(A * B * C));
263.612 ms (1012 allocations: 185.08 MiB)

julia> @btime compute(sum(A * B * C), ctx=galley_scheduler());
153.708 μs (667 allocations: 29.02 KiB)
```

## Learn More
Binary file removed docs/src/assets/levels-A-d-d-e.png
Binary file removed docs/src/assets/levels-A-d-sl-e.png
Binary file removed docs/src/assets/levels-A-matrix.png
Binary file removed docs/src/assets/levels-A-sc2-e.png
Binary file removed docs/src/assets/levels-A-sl-sl-e.png
105 changes: 53 additions & 52 deletions docs/src/docs/array_api.md
@@ -45,6 +45,14 @@ julia> map(x -> x^2, B)
0.0 0.0 0.0 0.0 0.0 0.0
```

# Einsum

Finch also supports a highly general `@einsum` macro which supports any reduction over any simple pointwise array expression.

```@docs
@einsum
```
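
For instance (mirroring the README examples), `@einsum` can reduce with `+` or with a user-chosen operator such as `max`; the inputs here are arbitrary random tensors for illustration:

```julia
using Finch

A = fsprand(4, 4, 0.5); B = fsprand(4, 4, 0.5)  # illustrative inputs

# Sum-reduce the pointwise product over j.
@einsum E[i] += A[i, j] * B[i, j]

# Max-reduce the pointwise sum over j.
@einsum F[i, k] <<max>>= A[i, j] + B[j, k]
```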

# Array Fusion

Finch supports array fusion, which allows you to compose multiple array operations
@@ -78,69 +86,62 @@ together and divides each result by 2, without materializing an intermediate.
```@docs
lazy
compute
fused
```
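
As a brief sketch of the fused pattern described above (adding two tensors and dividing each result by 2 without materializing an intermediate), one might write something like the following; the inputs and sizes are illustrative:

```julia
using Finch

A = fsprand(100, 100, 0.1); B = fsprand(100, 100, 0.1)  # illustrative inputs

# Build the whole expression lazily, then evaluate it as a single fused kernel.
C = compute((lazy(A) .+ lazy(B)) ./ 2)
```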

The `lazy` and `compute` functions allow the compiler to fuse operations together, resulting in asymptotically more efficient code.

```julia
julia> using BenchmarkTools

julia> A = fsprand(1000, 1000, 100); B = Tensor(rand(1000, 1000)); C = Tensor(rand(1000, 1000));

julia> @btime A .* (B * C);
145.940 ms (859 allocations: 7.69 MiB)

julia> @btime compute(lazy(A) .* (lazy(B) * lazy(C)));
694.666 μs (712 allocations: 60.86 KiB)

```

## Optimizers

Different optimizers can be used with `compute`, such as the state-of-the-art
Galley optimizer, which can adapt to the sparsity patterns of the inputs. The
optimizer can be set as the `ctx` keyword argument to the `compute` function, or using
`set_scheduler!` or `with_scheduler`.

```@docs
set_scheduler!
with_scheduler
default_scheduler
```
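
As a minimal sketch, the optimizer can be selected for a single call through the `ctx` keyword (the inputs are placeholders; `galley_scheduler` is described below):

```julia
using Finch

A = lazy(fsprand(100, 100, 0.1)); B = lazy(fsprand(100, 100, 0.1))

# Choose the optimizer for this one evaluation via the `ctx` keyword argument.
result = compute(sum(A * B), ctx=galley_scheduler())

# A default scheduler can also be installed globally; see the docstrings for
# `set_scheduler!` and `with_scheduler` above for the exact usage.
```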

## The Galley Optimizer
### The Galley Optimizer

Galley is a cost-based optimizer for Finch's lazy evaluation interface, based on techniques from database
query optimization. To use Galley, just pass the keyword argument `ctx=galley_scheduler()` to the `compute`
function. While the default optimizer (`ctx=default_scheduler()`) makes decisions based entirely on
the types of the inputs, Galley gathers statistics on their sparsity to make cost-based optimization
decisions.

Consider the following set of small examples:

```
N = 300
A = lazy(Tensor(Dense(SparseList(Element(0.0))), fsprand(N, N, .5)))
B = lazy(Tensor(Dense(SparseList(Element(0.0))), fsprand(N, N, .5)))
C = lazy(Tensor(Dense(SparseList(Element(0.0))), fsprand(N, N, .01)))
println("Galley: A * B * C")
empty!(Finch.codes)
@btime begin
compute($A * $B * $C, ctx=galley_scheduler())
end
println("Galley: C * B * A")
empty!(Finch.codes)
@btime begin
compute($C * $B * $A, ctx=galley_scheduler())
end
println("Galley: sum(C * B * A)")
empty!(Finch.codes)
@btime begin
compute(sum($C * $B * $A), ctx=galley_scheduler())
end
println("Finch: A * B * C")
empty!(Finch.codes)
@btime begin
compute($A * $B * $C, ctx=Finch.default_scheduler())
end
println("Finch: C * B * A")
empty!(Finch.codes)
@btime begin
compute($C * $B * $A, ctx=Finch.default_scheduler())
end
println("Finch: sum(C * B * A)")
empty!(Finch.codes)
@btime begin
compute(sum($C * $B * $A), ctx=Finch.default_scheduler())
end
```@docs
galley_scheduler
```

By taking advantage of the fact that C is highly sparse, Galley can better structure the computation. In the matrix chain multiplication,
it always starts with the C*B matmul before multiplying with A. In the summation, it takes advantage of distributivity to push the reduction
down to the inputs. It first sums over A and C, then multiplies those vectors with B.
```julia
julia> A = fsprand(1000, 1000, 0.1); B = fsprand(1000, 1000, 0.1); C = fsprand(1000, 1000, 0.0001);

# Einsum
julia> A = lazy(A); B = lazy(B); C = lazy(C);

Finch also supports a highly general `@einsum` macro which supports any reduction over any simple pointwise array expression.
julia> @btime compute(sum(A * B * C));
282.503 ms (1018 allocations: 184.43 MiB)

```@docs
@einsum
```
julia> @btime compute(sum(A * B * C), ctx=galley_scheduler());
152.792 μs (672 allocations: 28.81 KiB)

```

By taking advantage of the fact that C is highly sparse, Galley can better structure the computation. In the matrix chain multiplication,
it always starts with the C*B matmul before multiplying with A. In the summation, it takes advantage of distributivity to push the reduction
down to the inputs. It first sums over A and C, then multiplies those vectors with B.
71 changes: 71 additions & 0 deletions docs/src/docs/sparse_utils.md
@@ -96,4 +96,75 @@ julia> sum(map(last, B))
julia> sum(map(first, B))
4.0
```

## Format Conversion and Storage Order

### Converting Between Formats

You can convert between tensor formats with the `Tensor` constructor. Simply construct a new Tensor in the desired format and pass the existing tensor as the argument to copy from.

```jldoctest tensorformats; setup = :(using Finch)
# Create a 3×4 sparse matrix in CSC format
julia> A = Tensor(CSCFormat(), [0 0 2 1; 0 0 1 0; 1 0 0 0])
3×4 Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
0.0 0.0 2.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0
julia> B = Tensor(DCSCFormat(), A)
3×4 Tensor{SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
0.0 0.0 2.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0
```

### Storage Order

By default, tensors in Finch are column-major. However, you can use the
`swizzle` function to transpose them lazily. To materialize a tensor in a transposed
format, copy it into a swizzled destination with `dropfills!`. Note that the `permutedims` function transposes eagerly.

```@docs
swizzle
```

```jldoctest tensorformats; setup = :(using Finch)
julia> A = Tensor(CSCFormat(), [0 0 2 1; 0 0 1 0; 1 0 0 0])
3×4 Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
0.0 0.0 2.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0
julia> tensor_tree(swizzle(A, 2, 1))
SwizzleArray (2, 1)
└─ 3×4-Tensor
└─ Dense [:,1:4]
├─ [:, 1]: SparseList (0.0) [1:3]
│ └─ [3]: 1.0
├─ [:, 2]: SparseList (0.0) [1:3]
├─ [:, 3]: SparseList (0.0) [1:3]
│ ├─ [1]: 2.0
│ └─ [2]: 1.0
└─ [:, 4]: SparseList (0.0) [1:3]
└─ [1]: 1.0
julia> tensor_tree(permutedims(A, (2, 1)))
4×3-Tensor
└─ SparseDict (0.0) [:,1:3]
├─ [:, 1]: SparseDict (0.0) [1:4]
│ ├─ [3]: 2.0
│ └─ [4]: 1.0
├─ [:, 2]: SparseDict (0.0) [1:4]
│ └─ [3]: 1.0
└─ [:, 3]: SparseDict (0.0) [1:4]
└─ [1]: 1.0
julia> dropfills!(swizzle(Tensor(CSCFormat()), 2, 1), A)
3×4 Finch.SwizzleArray{(2, 1), Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}}:
0.0 0.0 2.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0
```
