intrinsics module with alternative implementations #915
Open: jalvesz wants to merge 30 commits into fortran-lang:master from jalvesz:intrinsics
Commits (30), all by jalvesz:
08ec0aa intrinsics module with fast sums
c36251e Merge branch 'fortran-lang:master' into intrinsics
2207f41 Merge branch 'fortran-lang:master' into intrinsics
2bc7af9 add fast dot_product and start tests
4625205 Merge branch 'intrinsics' of https://github.com/jalvesz/stdlib into i…
243ea6f add complex sum test
c38dcd6 test masked sum
bf1ce2f add dot_product tests
cc9df61 start specs
671fd61 Merge branch 'fortran-lang:master' into intrinsics
75945f1 split into submodules
d05903f specs and examples
c0d96e5 Merge branch 'intrinsics' of https://github.com/jalvesz/stdlib into i…
4abd8d3 Merge branch 'fortran-lang:master' into intrinsics
7c6e8a4 fix specs
7cea1fd fix test: complex initialization
eaffa4a fix test: complex assignment caused accuracy loss
ad64162 Merge branch 'fortran-lang:master' into intrinsics
a3d24e4 extend fsum support for ndarrays
5a1fdcb remove unnecessary definition
47396ac update specs, change name of kahan kernel
ecb7050 small reorganization
87ef502 Merge branch 'intrinsics' of https://github.com/jalvesz/stdlib into i…
14be974 change names to stdlib_*
aaa68bc add comments
cc232e1 Merge branch 'fortran-lang:master' into intrinsics
6e36b6f extend kahan sum for rank N arrays
65175d7 Merge branch 'intrinsics' of https://github.com/jalvesz/stdlib into i…
8a35f38 Merge branch 'fortran-lang:master' into intrinsics
16a0e96 Merge branch 'fortran-lang:master' into intrinsics
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,158 @@ | ||
--- | ||
title: intrinsics | ||
--- | ||
|
||
# The `stdlib_intrinsics` module | ||
|
||
[TOC] | ||
|
||
## Introduction | ||
|
||
The `stdlib_intrinsics` module provides replacements for some well-known Fortran intrinsic functions for which a faster and/or more accurate implementation exists and has proven to be of interest to the Fortran community. | ||
|
||
<!-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --> | ||
### `stdlib_sum` function | ||
|
||
#### Description | ||
|
||
The `stdlib_sum` function can replace the intrinsic `sum` for `real` or `complex` arrays. It follows a chunked implementation that maximizes vectorization potential and reduces round-off error. This procedure is recommended when summing large arrays; for repeated summation of smaller arrays, consider the intrinsic `sum`. | ||
Review comment: Why is it not for |
||
|
||
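The chunked strategy described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation in this pull request: the module name `chunked_sum_sketch`, the function name `chunked_sum`, and the chunk size of 64 are all hypothetical. The idea is to keep a fixed-size array of partial sums so that the inner update is a whole-array operation the compiler can vectorize, while each partial sum accumulates fewer terms than a single running total would.

```fortran
module chunked_sum_sketch
    use, intrinsic :: iso_fortran_env, only: sp => real32
    implicit none
    private
    public :: sp, chunked_sum

    integer, parameter :: chunk = 64   ! assumed chunk size, not taken from the PR

contains

    pure function chunked_sum(a) result(s)
        real(sp), intent(in) :: a(:)
        real(sp) :: s
        real(sp) :: acc(chunk)   ! one partial sum per lane
        integer :: i, n, r

        n = size(a)
        r = mod(n, chunk)

        ! Seed the first r lanes with the leftover elements; zero the rest.
        acc(1:r) = a(1:r)
        acc(r+1:chunk) = 0.0_sp

        ! The remaining n - r elements form whole chunks; add them lane by lane.
        do i = r + 1, n, chunk
            acc = acc + a(i:i+chunk-1)
        end do

        ! Reduce the partial sums to the final result.
        s = sum(acc)
    end function chunked_sum

end module chunked_sum_sketch

program demo_chunked_sum
    use chunked_sum_sketch, only: sp, chunked_sum
    implicit none
    real(sp), allocatable :: x(:)

    allocate(x(1000))
    call random_number(x)
    ! The two values should agree up to round-off.
    print *, sum(x), chunked_sum(x)
end program demo_chunked_sum
```

Whether this is faster or more accurate than the intrinsic depends on the compiler, its flags, and the array size, which is consistent with the recommendation above to prefer it for large arrays.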
#### Syntax | ||
|
||
`res = ` [[stdlib_intrinsics(module):stdlib_sum(interface)]] ` (x [,mask] )` | ||
|
||
`res = ` [[stdlib_intrinsics(module):stdlib_sum(interface)]] ` (x, dim [,mask] )` | ||
|
||
#### Status | ||
|
||
Experimental | ||
|
||
#### Class | ||
|
||
Pure function. | ||
|
||
#### Argument(s) | ||
|
||
`x`: N-D array of either `real` or `complex` type. This argument is `intent(in)`. | ||
|
||
`dim` (optional): scalar of type `integer` with a value in the range from 1 to n, where n equals the rank of `x`. | ||
|
||
`mask` (optional): N-D array of `logical` values, with the same shape as `x`. This argument is `intent(in)`. | ||
|
||
#### Output value or Result value | ||
|
||
If `dim` is absent, the output is a scalar of the same type and kind as `x`. Otherwise, the output is an array of rank n-1, where n is the rank of `x`, whose shape is that of `x` with dimension `dim` dropped. | ||
|
||
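The result-shape rule above mirrors that of the intrinsic `sum`. The short program below illustrates it with the intrinsic only, on the assumption that `stdlib_sum` accepts the same `x`, `dim`, and `mask` arguments as documented in the Syntax section; the program name and the array contents are made up for the illustration.

```fortran
program sum_dim_shape_sketch
    implicit none
    real :: x(2,3)
    logical :: mask(2,3)

    ! Column-major fill: x(:,1) = [1,2], x(:,2) = [3,4], x(:,3) = [5,6]
    x = reshape([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape(x))
    mask = x > 2.0

    print *, sum(x)                     ! scalar: 21
    print *, shape(sum(x, dim=1))       ! rank-1 result of extent 3
    print *, shape(sum(x, dim=2))       ! rank-1 result of extent 2
    print *, sum(x, dim=2, mask=mask)   ! per-row sums of elements > 2: 8 and 10
end program sum_dim_shape_sketch
```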
<!-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --> | ||
### `stdlib_sum_kahan` function | ||
|
||
#### Description | ||
|
||
The `stdlib_sum_kahan` function can replace the intrinsic `sum` for `real` or `complex` arrays. It follows a chunked implementation that maximizes vectorization potential, complemented by an `elemental` kernel based on the [Kahan summation](https://en.wikipedia.org/wiki/Kahan_summation_algorithm) strategy to reduce round-off error: | ||
|
||
```fortran | ||
elemental subroutine kahan_kernel_<kind>(a,s,c) | ||
    type(<kind>), intent(in) :: a | ||
    type(<kind>), intent(inout) :: s | ||
    type(<kind>), intent(inout) :: c | ||
    type(<kind>) :: t, y | ||
    y = a - c | ||
    t = s + y | ||
    c = (t - s) - y | ||
    s = t | ||
end subroutine | ||
``` | ||
|
||
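To see what the compensation term `c` buys, the kernel above can be instantiated for a single kind and compared with a plain running sum. The sketch below assumes single precision (`sp` taken from `iso_fortran_env`) and uses the hypothetical name `kahan_kernel_sp`; it adds the same value many times, a case where a naive single-precision accumulator typically drifts.

```fortran
program kahan_sketch
    use, intrinsic :: iso_fortran_env, only: sp => real32
    implicit none
    integer, parameter :: n = 1000000
    real(sp) :: s_naive, s_kahan, c
    integer :: i

    s_naive = 0.0_sp
    s_kahan = 0.0_sp
    c = 0.0_sp            ! running compensation for lost low-order bits
    do i = 1, n
        s_naive = s_naive + 0.1_sp
        call kahan_kernel_sp(0.1_sp, s_kahan, c)
    end do

    print *, 'naive sum:', s_naive
    print *, 'kahan sum:', s_kahan

contains

    ! Same update as the kernel shown above, specialized to real(sp).
    elemental subroutine kahan_kernel_sp(a, s, c)
        real(sp), intent(in) :: a
        real(sp), intent(inout) :: s
        real(sp), intent(inout) :: c
        real(sp) :: t, y
        y = a - c          ! correct the new term by the previous error
        t = s + y          ! tentative new sum (low-order bits of y may be lost)
        c = (t - s) - y    ! recover what was lost in the addition
        s = t
    end subroutine kahan_kernel_sp

end program kahan_sketch
```

Note that value-unsafe optimization flags (for example `-ffast-math` in GCC) allow the compiler to simplify `(t - s) - y` algebraically to zero, which would remove the compensation; the comparison is only meaningful without such flags.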
#### Syntax | ||
|
||
`res = ` [[stdlib_intrinsics(module):stdlib_sum_kahan(interface)]] ` (x [,mask] )` | ||
|
||
`res = ` [[stdlib_intrinsics(module):stdlib_sum_kahan(interface)]] ` (x, dim [,mask] )` | ||
|
||
#### Status | ||
|
||
Experimental | ||
|
||
#### Class | ||
|
||
Pure function. | ||
|
||
#### Argument(s) | ||
|
||
`x`: N-D array of either `real` or `complex` type. This argument is `intent(in)`. | ||
|
||
`dim` (optional): scalar of type `integer` with a value in the range from 1 to n, where n equals the rank of `x`. | ||
|
||
`mask` (optional): N-D array of `logical` values, with the same shape as `x`. This argument is `intent(in)`. | ||
|
||
#### Output value or Result value | ||
|
||
If `dim` is absent, the output is a scalar of the same type and kind as `x`. Otherwise, the output is an array of rank n-1, where n is the rank of `x`, whose shape is that of `x` with dimension `dim` dropped. | ||
|
||
#### Example | ||
|
||
```fortran | ||
{!example/intrinsics/example_sum.f90!} | ||
``` | ||
|
||
<!-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --> | ||
### `stdlib_dot_product` function | ||
|
||
#### Description | ||
|
||
The `stdlib_dot_product` function can replace the intrinsic `dot_product` for 1D `real` or `complex` arrays. It follows a chunked implementation that maximizes vectorization potential and reduces round-off error. This procedure is recommended when processing large arrays; for repeated products of smaller arrays, consider the intrinsic `dot_product`. | ||
|
||
#### Syntax | ||
|
||
`res = ` [[stdlib_intrinsics(module):stdlib_dot_product(interface)]] ` (x, y)` | ||
|
||
#### Status | ||
|
||
Experimental | ||
|
||
#### Class | ||
|
||
Pure function. | ||
|
||
#### Argument(s) | ||
|
||
`x`: 1D array of either `real` or `complex` type. This argument is `intent(in)`. | ||
|
||
`y`: 1D array of the same type and kind as `x`. This argument is `intent(in)`. | ||
|
||
#### Output value or Result value | ||
|
||
The output is a scalar of the same type and kind as `x` and `y`. | ||
|
||
<!-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --> | ||
### `stdlib_dot_product_kahan` function | ||
|
||
#### Description | ||
|
||
The `stdlib_dot_product_kahan` function can replace the intrinsic `dot_product` for 1D `real` or `complex` arrays. It follows a chunked implementation that maximizes vectorization potential, complemented by the same `elemental` kernel based on [Kahan summation](https://en.wikipedia.org/wiki/Kahan_summation_algorithm) that is used for `stdlib_sum_kahan`, to reduce round-off error. | ||
Review comment: Is the license of Wikipedia in agreement with the MIT license of stdlib? |
||
|
||
#### Syntax | ||
|
||
`res = ` [[stdlib_intrinsics(module):stdlib_dot_product_kahan(interface)]] ` (x, y)` | ||
|
||
#### Status | ||
|
||
Experimental | ||
|
||
#### Class | ||
|
||
Pure function. | ||
|
||
#### Argument(s) | ||
|
||
`x`: 1D array of either `real` or `complex` type. This argument is `intent(in)`. | ||
|
||
`y`: 1D array of the same type and kind as `x`. This argument is `intent(in)`. | ||
|
||
#### Output value or Result value | ||
|
||
The output is a scalar of the same type and kind as `x` and `y`. | ||
|
||
#### Example | ||
|
||
```fortran | ||
{!example/intrinsics/example_dot_product.f90!} | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
ADD_EXAMPLE(sum) | ||
ADD_EXAMPLE(dot_product) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
program example_dot_product | ||
use stdlib_kinds, only: sp | ||
use stdlib_intrinsics, only: stdlib_dot_product, stdlib_dot_product_kahan | ||
implicit none | ||
|
||
real(sp), allocatable :: x(:), y(:) | ||
real(sp) :: total_prod(3) | ||
|
||
allocate( x(1000), y(1000) ) | ||
call random_number(x) | ||
call random_number(y) | ||
|
||
total_prod(1) = dot_product(x,y) !> compiler intrinsic | ||
total_prod(2) = stdlib_dot_product(x,y) !> chunked summation over inner product | ||
total_prod(3) = stdlib_dot_product_kahan(x,y) !> chunked kahan summation over inner product | ||
print *, total_prod(1:3) | ||
|
||
end program example_dot_product |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
program example_sum | ||
use stdlib_kinds, only: sp | ||
use stdlib_intrinsics, only: stdlib_sum, stdlib_sum_kahan | ||
implicit none | ||
|
||
real(sp), allocatable :: x(:) | ||
real(sp) :: total_sum(3) | ||
|
||
allocate( x(1000) ) | ||
call random_number(x) | ||
|
||
total_sum(1) = sum(x) !> compiler intrinsic | ||
total_sum(2) = stdlib_sum(x) !> chunked summation | ||
total_sum(3) = stdlib_sum_kahan(x) !> chunked kahan summation | ||
print *, total_sum(1:3) | ||
|
||
end program example_sum |
Question: why are these functions not implemented by the compilers if they are faster and more accurate? Is it a standard limitation?
For the cases where it is less accurate, I think there should be a warning in the specs.