Commit

Merge pull request #652 from finch-tensor/wma/docs3
Wma/docs3
willow-ahrens authored Nov 22, 2024
2 parents 63a50b0 + 243ee5f commit ce1184a
Showing 54 changed files with 1,597 additions and 203 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "Finch"
uuid = "9177782c-1635-4eb9-9bfb-d9dfa25e6bce"
authors = ["Willow Ahrens"]
version = "0.6.33"
version = "1.0.0"

[deps]
AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
40 changes: 26 additions & 14 deletions README.md
@@ -37,7 +37,7 @@ julia> using Pkg; Pkg.add("Finch")
julia> using Finch

# Create a sparse tensor
julia> A = Tensor(Dense(SparseList(Element(0.0))), [1 0 0; 0 2 0; 0 0 3])
julia> A = Tensor(CSCFormat(), [1 0 0; 0 2 0; 0 0 3])
3×3 Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
1.0 0.0 0.0
0.0 2.0 0.0
@@ -58,6 +58,12 @@ Finch first translates high-level array code into **FinchLogic**, a custom inter

Finch supports most major sparse formats (CSR, CSC, DCSR, DCSC, CSF, COO, Hash, Bytemap). Finch also allows users to define their own sparse formats with a parameterized format language.

```julia
CSC_matrix = Tensor(CSCFormat())
CSR_matrix = swizzle(Tensor(CSCFormat()), 2, 1)
CSF_tensor = Tensor(CSFFormat(3))
```
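
For instance, the CSC layout produced by `CSCFormat()` can also be composed by hand from individual level constructors (this mirrors the quick-start example above, where `Tensor(CSCFormat(), ...)` builds a `Dense` level over a `SparseList` over an `Element`):

```julia
# Hand-composed CSC-style format: a dense level of columns, each storing a
# sparse list of row entries, with 0.0 as the background value.
A = Tensor(Dense(SparseList(Element(0.0))), [1 0 0; 0 2 0; 0 0 3])
```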

Finch also supports a wide variety of array structures beyond sparsity. Whether you're dealing with [custom background (zero) values](https://en.wikipedia.org/wiki/GraphBLAS), [run-length encoding](https://en.wikipedia.org/wiki/Run-length_encoding), or matrices with [special structures](https://en.wikipedia.org/wiki/Sparse_matrix#Special_structure) like banded or triangular matrices, Finch’s compiler can understand and optimize various data patterns and computational rules to adapt to the structure of the data.
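
As a small sketch of one of these patterns, a custom background (fill) value can be chosen by changing the argument of the `Element` level (this sketch assumes the level constructors shown above; the input matrix is purely illustrative):

```julia
# Hypothetical example: a matrix whose background (fill) value is 1.0 rather
# than 0.0, so the format treats 1.0 as the implicit "empty" entry.
A = Tensor(Dense(SparseList(Element(1.0))), [1 1 1; 1 2 1; 1 1 3])
```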

### Examples:
@@ -76,6 +82,9 @@ julia> B = Tensor(Dense(SparseList(Element(0.0))), [0 1 1; 1 0 0; 0 0 1; 0 0 1])
# Element-wise multiplication
julia> C = A .* B

# Element-wise max
julia> C = max.(A, B)

# Sum over rows
julia> D = sum(C, dims=2)
```
@@ -84,25 +93,28 @@ For situations where more complex operations are needed, Finch supports an `@ein
```julia
julia> @einsum E[i] += A[i, j] * B[i, j]

julia> @einsum F[i] <<max>>= A[i, j] + B[i, j]
julia> @einsum F[i, k] <<max>>= A[i, j] + B[j, k]

```

Finch even allows users to fuse multiple operations into a single kernel with `lazy` and `compute`.
Finch even allows users to fuse multiple operations into a single kernel with `lazy` and `compute`. The `lazy` function creates a lazy tensor, which is a symbolic representation of the computation. The `compute` function evaluates the computation.
Different optimizers can be used with `compute`, such as the state-of-the-art `Galley` optimizer, which can adapt to the
sparsity patterns of the inputs.

```julia
julia> C = lazy(A) .+ lazy(B)
?×?-LazyTensor{Float64}
julia> using Finch, BenchmarkTools

julia> D = sum(C, dims=2)
?-LazyTensor{Float64}

julia> compute(D)
4 Tensor{SparseDictLevel{Int64, Vector{Int64}, Vector{Int64}, Vector{Int64}, Dict{Tuple{Int64, Int64}, Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}:
3.1
6.5
5.4
6.5
julia> A = fsprand(1000, 1000, 0.1); B = fsprand(1000, 1000, 0.1); C = fsprand(1000, 1000, 0.0001);

julia> A = lazy(A); B = lazy(B); C = lazy(C);

julia> sum(A * B * C)

julia> @btime compute(sum(A * B * C));
263.612 ms (1012 allocations: 185.08 MiB)

julia> @btime compute(sum(A * B * C), ctx=galley_scheduler());
153.708 μs (667 allocations: 29.02 KiB)
```

## Learn More
Binary file removed docs/src/assets/levels-A-d-d-e.png
Binary file removed docs/src/assets/levels-A-d-sl-e.png
Binary file removed docs/src/assets/levels-A-matrix.png
Binary file removed docs/src/assets/levels-A-sc2-e.png
Binary file removed docs/src/assets/levels-A-sl-sl-e.png
105 changes: 53 additions & 52 deletions docs/src/docs/array_api.md
@@ -45,6 +45,14 @@ julia> map(x -> x^2, B)
0.0 0.0 0.0 0.0 0.0 0.0
```

# Einsum

Finch also supports a highly general `@einsum` macro which supports any reduction over any simple pointwise array expression.

```@docs
@einsum
```
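
For instance (mirroring the README examples), `@einsum` can reduce with `+` or with a user-chosen operator such as `max`; the inputs here are arbitrary random tensors for illustration:

```julia
using Finch

A = fsprand(4, 4, 0.5); B = fsprand(4, 4, 0.5)  # illustrative inputs

# Sum-reduce the pointwise product over j.
@einsum E[i] += A[i, j] * B[i, j]

# Max-reduce the pointwise sum over j.
@einsum F[i, k] <<max>>= A[i, j] + B[j, k]
```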

# Array Fusion

Finch supports array fusion, which allows you to compose multiple array operations
@@ -78,69 +86,62 @@ together and divides each result by 2, without materializing an intermediate.
```@docs
lazy
compute
fused
```
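
As a brief sketch of the fused pattern described above (adding two tensors and dividing each result by 2 without materializing an intermediate), one might write something like the following; the inputs and sizes are illustrative:

```julia
using Finch

A = fsprand(100, 100, 0.1); B = fsprand(100, 100, 0.1)  # illustrative inputs

# Build the whole expression lazily, then evaluate it as a single fused kernel.
C = compute((lazy(A) .+ lazy(B)) ./ 2)
```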

The `lazy` and `compute` functions allow the compiler to fuse operations together, resulting in asymptotically more efficient code.

```julia
julia> using BenchmarkTools

julia> A = fsprand(1000, 1000, 100); B = Tensor(rand(1000, 1000)); C = Tensor(rand(1000, 1000));

julia> @btime A .* (B * C);
145.940 ms (859 allocations: 7.69 MiB)

julia> @btime compute(lazy(A) .* (lazy(B) * lazy(C)));
694.666 μs (712 allocations: 60.86 KiB)

```

## Optimizers

Different optimizers can be used with `compute`, such as the state-of-the-art
Galley optimizer, which can adapt to the sparsity patterns of the inputs. The
optimizer can be set as the `ctx` keyword argument to the `compute` function, or using
`set_scheduler!` or `with_scheduler`.

```@docs
set_scheduler!
with_scheduler
default_scheduler
```
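
As a minimal sketch, the optimizer can be selected for a single call through the `ctx` keyword (the inputs are placeholders; `galley_scheduler` is described below):

```julia
using Finch

A = lazy(fsprand(100, 100, 0.1)); B = lazy(fsprand(100, 100, 0.1))

# Choose the optimizer for this one evaluation via the `ctx` keyword argument.
result = compute(sum(A * B), ctx=galley_scheduler())

# A default scheduler can also be installed globally; see the docstrings for
# `set_scheduler!` and `with_scheduler` above for the exact usage.
```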

## The Galley Optimizer
### The Galley Optimizer

Galley is a cost-based optimizer for Finch's lazy evaluation interface, based on techniques from database
query optimization. To use Galley, just pass the keyword argument `ctx=galley_scheduler()` to the `compute`
function. While the default optimizer (`ctx=default_scheduler()`) makes decisions based entirely on
the types of the inputs, Galley gathers statistics on their sparsity to make cost-based optimization
decisions.

Consider the following set of small examples:

```
N = 300
A = lazy(Tensor(Dense(SparseList(Element(0.0))), fsprand(N, N, .5)))
B = lazy(Tensor(Dense(SparseList(Element(0.0))), fsprand(N, N, .5)))
C = lazy(Tensor(Dense(SparseList(Element(0.0))), fsprand(N, N, .01)))
println("Galley: A * B * C")
empty!(Finch.codes)
@btime begin
compute($A * $B * $C, ctx=galley_scheduler())
end
println("Galley: C * B * A")
empty!(Finch.codes)
@btime begin
compute($C * $B * $A, ctx=galley_scheduler())
end
println("Galley: sum(C * B * A)")
empty!(Finch.codes)
@btime begin
compute(sum($C * $B * $A), ctx=galley_scheduler())
end
println("Finch: A * B * C")
empty!(Finch.codes)
@btime begin
compute($A * $B * $C, ctx=Finch.default_scheduler())
end
println("Finch: C * B * A")
empty!(Finch.codes)
@btime begin
compute($C * $B * $A, ctx=Finch.default_scheduler())
end
println("Finch: sum(C * B * A)")
empty!(Finch.codes)
@btime begin
compute(sum($C * $B * $A), ctx=Finch.default_scheduler())
end
```@docs
galley_scheduler
```

By taking advantage of the fact that C is highly sparse, Galley can better structure the computation. In the matrix chain multiplication,
it always starts with the C*B matmul before multiplying with A. In the summation, it takes advantage of distributivity to push the reduction
down to the inputs. It first sums over A and C, then multiplies those vectors with B.
```julia
julia> A = fsprand(1000, 1000, 0.1); B = fsprand(1000, 1000, 0.1); C = fsprand(1000, 1000, 0.0001);

# Einsum
julia> A = lazy(A); B = lazy(B); C = lazy(C);

Finch also supports a highly general `@einsum` macro which supports any reduction over any simple pointwise array expression.
julia> @btime compute(sum(A * B * C));
282.503 ms (1018 allocations: 184.43 MiB)

```@docs
@einsum
```
julia> @btime compute(sum(A * B * C), ctx=galley_scheduler());
152.792 μs (672 allocations: 28.81 KiB)

```

By taking advantage of the fact that C is highly sparse, Galley can better structure the computation. In the matrix chain multiplication,
it always starts with the C*B matmul before multiplying with A. In the summation, it takes advantage of distributivity to push the reduction
down to the inputs. It first sums over A and C, then multiplies those vectors with B.
71 changes: 71 additions & 0 deletions docs/src/docs/sparse_utils.md
@@ -96,4 +96,75 @@ julia> sum(map(last, B))
julia> sum(map(first, B))
4.0
```

## Format Conversion and Storage Order

### Converting Between Formats

You can convert between tensor formats with the `Tensor` constructor. Simply construct a new Tensor in the desired format and pass the existing tensor as the argument to copy from.

```jldoctest tensorformats; setup = :(using Finch)
# Create a 3×4 sparse matrix in CSC format
julia> A = Tensor(CSCFormat(), [0 0 2 1; 0 0 1 0; 1 0 0 0])
3×4 Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
0.0 0.0 2.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0
julia> B = Tensor(DCSCFormat(), A)
3×4 Tensor{SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
0.0 0.0 2.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0
```

### Storage Order

By default, tensors in Finch are column-major. However, you can use the
`swizzle` function to transpose them lazily. To materialize a tensor in a transposed
format, copy it into a swizzled destination with `dropfills!`. Note that the `permutedims` function transposes eagerly.

```@docs
swizzle
```

```jldoctest tensorformats; setup = :(using Finch)
julia> A = Tensor(CSCFormat(), [0 0 2 1; 0 0 1 0; 1 0 0 0])
3×4 Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
0.0 0.0 2.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0
julia> tensor_tree(swizzle(A, 2, 1))
SwizzleArray (2, 1)
└─ 3×4-Tensor
└─ Dense [:,1:4]
├─ [:, 1]: SparseList (0.0) [1:3]
│ └─ [3]: 1.0
├─ [:, 2]: SparseList (0.0) [1:3]
├─ [:, 3]: SparseList (0.0) [1:3]
│ ├─ [1]: 2.0
│ └─ [2]: 1.0
└─ [:, 4]: SparseList (0.0) [1:3]
└─ [1]: 1.0
julia> tensor_tree(permutedims(A, (2, 1)))
4×3-Tensor
└─ SparseDict (0.0) [:,1:3]
├─ [:, 1]: SparseDict (0.0) [1:4]
│ ├─ [3]: 2.0
│ └─ [4]: 1.0
├─ [:, 2]: SparseDict (0.0) [1:4]
│ └─ [3]: 1.0
└─ [:, 3]: SparseDict (0.0) [1:4]
└─ [1]: 1.0
julia> dropfills!(swizzle(Tensor(CSCFormat()), 2, 1), A)
3×4 Finch.SwizzleArray{(2, 1), Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}}:
0.0 0.0 2.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0
```
