Commit: Make font italic
tomvothecoder committed Jan 23, 2024
1 parent 8835da2 commit f7c487a
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/paper/paper.md
@@ -43,7 +43,7 @@ xCDAT addresses this need by combining the power of Xarray with meticulously dev

Performance is one fundamental driver in how xCDAT is designed, especially with large datasets. xCDAT conveniently inherits Xarray's support for parallel computing with Dask [@dask:2016]. Parallel computing with Dask enables users to take advantage of compute resources through multithreading or multiprocessing. To use Dask's default multithreading scheduler, users only need to open and chunk datasets in Xarray before calling xCDAT APIs. xCDAT's seamless support for parallel computing enables users to run large-scale computations with minimal effort. If users require more resources, they can also configure and use a local Dask cluster to meet resource-intensive computational needs. Figure 1 shows xCDAT's significant performance advantage over CDAT for global spatial averaging on datasets of varying sizes.

- ![A performance benchmark for global spatial averaging computations using CDAT (serial only) and xCDAT (serial and parallel with Dask distributed scheduler). xCDAT outperforms CDAT by a wide margin for the 7 GB and 12 GB datasets. Runtimes could not be captured for CDAT with datasets >= 22 GB and xCDAT serial for the 105 GB dataset due to memory allocation errors. The performance benchmark setup and scripts are available in the [`xcdat-validation` repo](https://github.com/xCDAT/xcdat-validation/tree/main/validation/v0.6.0/xcdat-cdat-perf-metrics). __Disclaimer: Performance will vary depending on hardware, dataset shapes/sizes, and how Dask and chunking schemes are configured. There are also some cases where selecting a regional averaging domain (e.g., Niño 3.4) can lead to CDAT outperforming xCDAT.__ \label{fig:figure1}](figures/figure1.png){ height=40% }
+ ![A performance benchmark for global spatial averaging computations using CDAT (serial only) and xCDAT (serial and parallel with Dask distributed scheduler). xCDAT outperforms CDAT by a wide margin for the 7 GB and 12 GB datasets. Runtimes could not be captured for CDAT with datasets >= 22 GB and xCDAT serial for the 105 GB dataset due to memory allocation errors. The performance benchmark setup and scripts are available in the [`xcdat-validation` repo](https://github.com/xCDAT/xcdat-validation/tree/main/validation/v0.6.0/xcdat-cdat-perf-metrics). _Disclaimer: Performance will vary depending on hardware, dataset shapes/sizes, and how Dask and chunking schemes are configured. There are also some cases where selecting a regional averaging domain (e.g., Niño 3.4) can lead to CDAT outperforming xCDAT._ \label{fig:figure1}](figures/figure1.png){ height=40% }

xCDAT's intentional design emphasizes software sustainability and reproducible science. It aims to make analysis code reusable, readable, and less error-prone by abstracting common Xarray boilerplate logic into simple and configurable APIs. xCDAT extends Xarray by using [accessor classes](https://docs.xarray.dev/en/stable/internals/extending-xarray.html) that operate directly on Xarray Dataset objects. xCDAT is rigorously tested using real-world datasets and maintains 100% unit test coverage (at the time this paper was written). To demonstrate the value of xCDAT's API design, Figure 2 compares code to calculate annual averages for global climatological anomalies using Xarray against xCDAT. xCDAT requires fewer lines of code and supports further user options (e.g., regional or seasonal averages, not shown). Figure 2 shows the plots for the results produced by xCDAT.

