---
title: Open GPU Data Science
description: A suite of software libraries for executing end-to-end data science completely on GPUs
tagline: Open GPU Data Science
button_text: GET STARTED
button_link: start.html
layout: default
name: index
---

GPU DATA SCIENCE

{: .section-title-full }

{% capture about_top_left %}

Accelerated Data Science

The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs.
Learn about RAPIDS {% endcapture %}

{% capture about_top_middle %}

Scale Out on GPUs

Seamlessly scale from GPU workstations to multi-GPU servers and multi-node clusters with Dask.
Learn about Dask {% endcapture %}

{% capture about_top_right %}

Python Integration

Accelerate your Python data science toolchain with minimal code changes and no new tools to learn.
Learn about our libraries {% endcapture %}

{% capture about_bottom_left %}

Top Model Accuracy

Increase machine learning model accuracy by iterating on models faster and deploying them more frequently.
Learn about RAPIDS for model optimization {% endcapture %}

{% capture about_bottom_middle %}

Reduced Training Time

Drastically improve your productivity with more interactive data science tools like XGBoost.
Learn about XGBoost
Learn about accelerated ML with cuML {: target="_blank"} {% endcapture %}

{% capture about_bottom_right %}

Open Source

RAPIDS is an open source project. Supported by NVIDIA, it also relies on Numba, Apache Arrow, and many more open source projects.
Learn about our projects {% endcapture %}

{% include section-double-thirds.html background="background-white" padding-top="1em" padding-bottom="10em" content-top-left-third=about_top_left content-top-middle-third=about_top_middle content-top-right-third=about_top_right content-bottom-left-third=about_bottom_left content-bottom-middle-third=about_bottom_middle content-bottom-right-third=about_bottom_right %}

{% capture start_left %}

Getting Started

{: .section-title-halfs}

The RAPIDS data science framework is designed to have a familiar look and feel to data scientists working in Python. Here’s a code snippet where we read in a CSV file and output some descriptive statistics:

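import cudf

# Read the CSV file into a GPU DataFrame.
gdf = cudf.read_csv('path/to/file.csv')
# Print the mean of each column (assumes every column is numeric).
for column in gdf.columns:
    print(gdf[column].mean())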

Find more details on our Get Started Page
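
As a minimal sketch of the same idea for data that mixes numeric and text columns, `describe()` summarizes only the numeric columns by default. The column names (`trip_km`, `vendor`) and the `xdf` alias are hypothetical, and the fallback to pandas is an assumption added here so the snippet also runs on a machine without a GPU:

```python
# Prefer cuDF; fall back to pandas (an assumption for machines without a GPU).
try:
    import cudf as xdf
except Exception:
    import pandas as xdf


def summarize(df):
    """Return descriptive statistics for the numeric columns of a DataFrame."""
    return df.describe()


# Hypothetical in-memory data standing in for a CSV file.
df = xdf.DataFrame({"trip_km": [1.0, 2.0, 3.0], "vendor": ["a", "b", "a"]})
summary = summarize(df)
print(summary)
```

The text column `vendor` is left out of the summary, so no per-column type check is needed.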

Try Now Online

Jump right into a GPU-powered RAPIDS notebook, online, with SageMaker Studio Lab (free account required):

![SageMaker Studio Lab]({{ site.baseurl }}{% link /assets/images/Open-StudioLab.png%}){: target="_blank"}{: .half-image}

{% endcapture %}

{% capture start_right %}

10 Minutes to cuDF and Dask-cuDF

{: .section-subtitle-top-1}

Modeled after 10 Minutes to Pandas, this is a short introduction to cuDF geared mainly toward new users.
Go to guide {: target="_blank"}

Example Notebooks

{: .section-subtitle-top-1}

A GitHub repository with our introductory examples of XGBoost, cuML demos, cuGraph demos, and more.
Go to repo {: target="_blank"}

Example Community Notebooks

{: .section-subtitle-top-1}

A second GitHub repository with our extended collection of community-contributed notebook examples.
Go to repo {: target="_blank"}

{% endcapture %} {% include slopecap.html background="background-gray" position="top" slope="down" %} {% include section-halfs.html background="background-gray" padding-top="1em" padding-bottom="1em" content-left-half=start_left content-right-half=start_right %} {% capture posts_title %}

RAPIDS News

{% endcapture %} {% include section-single.html background="background-gray" padding-top="0em" padding-bottom="0em" content-single=posts_title %}

{% include medium-thirds-json.html background="background-gray" padding-top="1em" padding-bottom="1em" %}

{% include tweet-thirds-json.html background="background-gray" padding-top="1em" padding-bottom="10em" %}

{% capture com_left %}

RAPIDS Repositories

{: .section-title-halfs}

RAPIDS is committed to open source. We strive to release new features on a two-month cadence{: target="_blank"}, following the generalized release schedule below. Learn more on our Release Blogs {: target="_blank"}

Release Schedule

<style> .cls-1, .cls-2 { fill: none; stroke: #e0e0e0; stroke-width: 3px; } .cls-1, .cls-2, .cls-3 { stroke-miterlimit: 10; }
    .cls-2 {
        stroke-dasharray: 10.11 10.11;
    }

    .cls-3,
    .cls-4 {
        fill: #fff;
    }

    .cls-3 {
        stroke: #9943ff;
        stroke-width: 2px;
    }
    .cls-4 {
        font-size: 15px;
        font-family: sans-serif, Helvetica;
    }
    .cls-7 {
        fill: #fff;
        font-weight: bold;
        font-size: 25px;
    }

    </style>
</defs>
<title>Release Schedule</title>
<line class="cls-1" x1="37.36" y1="62.02" x2="320.24" y2="62.02" />
<line class="cls-1" x1="320.24" y1="62.02" x2="325.24" y2="62.02" />
<line class="cls-2" x1="335.35" y1="62.02" x2="593.08" y2="62.02" />
<line class="cls-1" x1="598.13" y1="62.02" x2="603.13" y2="62.02" />
<circle class="cls-3" cx="37.36" cy="62.02" r="16.61" />
<circle class="cls-3" cx="320.24" cy="62.02" r="16.61" />
<circle class="cls-3" cx="603.13" cy="62.02" r="16.61" />
<text class="cls-4" transform="translate(0 100.00)"> {{ site.data.releases.legacy-date }} </text>
<text class="cls-4" transform="translate(283.48 100.00)"> {{ site.data.releases.stable-date }} </text>
<text class="cls-4" transform="translate(564.22 100.00)"> {{ site.data.releases.nightly-date }} </text>
<text class="cls-4" transform="translate(7.00 12.00)"> LEGACY </text>
<text class="cls-4" transform="translate(292.00 12.00)"> STABLE </text>
<text class="cls-4" transform="translate(571.00 12.00)"> NIGHTLY </text>
<text class="cls-7" transform="translate(10.00 38.00)"> {{ site.data.releases.legacy-version }} </text>
<text class="cls-7" transform="translate(295.00 38.00)"> {{ site.data.releases.stable-version }} </text>
<text class="cls-7" transform="translate(578.00 38.00)"> {{ site.data.releases.nightly-version }} </text>
{: .padding-top-1em .padding-bottom-2em } {% endcapture %}

{% capture com_right %}

RAPIDS APIs and Libraries

{: .section-subtitle-top-1}

RAPIDS is open source, licensed under Apache 2.0, and spans multiple projects that range from GPU dataframes to GPU-accelerated ML algorithms. It also provides native array_interface support, allowing Apache Arrow data to be pushed to deep learning frameworks.
Learn more

Contributing

Whether you are new to RAPIDS, looking to help, or are part of the team, learn about our contributing guidelines on our contributing page.
Go to Docs {: target="_blank"} {% endcapture %}

{% include slopecap.html background="background-purple" position="top" slope="up" %} {% include section-halfs.html background="background-purple" padding-top="5em" padding-bottom="6em" content-left-half=com_left content-right-half=com_right %}

{% capture lib1_left %}

cuDF API

GitHub{: target="_blank"} / Docs{: target="_blank"} / Change Log{: target="_blank"} {: .no-tb-margins }

cuDF is a Python GPU DataFrame library (built on the Apache Arrow{: target="_blank"} columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating data all in a pandas-like{: target="_blank"} API familiar to data scientists.
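
A minimal sketch of that pandas-like workflow (join, filter, aggregate) might look like the following. The tables, column names, and the `region_totals` helper are hypothetical, and the fallback to pandas is an assumption added so the snippet also runs without a GPU:

```python
# Prefer cuDF; fall back to pandas (an assumption for machines without a GPU).
try:
    import cudf as xdf
except Exception:
    import pandas as xdf


def region_totals():
    """Join two small tables, filter rows, and sum amounts per region."""
    orders = xdf.DataFrame(
        {"customer_id": [1, 2, 1, 3], "amount": [10.0, 25.0, 5.0, 40.0]}
    )
    customers = xdf.DataFrame(
        {"customer_id": [1, 2, 3], "region": ["east", "west", "east"]}
    )

    joined = orders.merge(customers, on="customer_id", how="inner")  # join
    large = joined[joined["amount"] >= 10.0]                         # filter
    totals = large.groupby("region")["amount"].sum()                 # aggregate

    # cuDF results live on the GPU; copy them to the host before iterating.
    if hasattr(totals, "to_pandas"):
        totals = totals.to_pandas()
    return {region: float(value) for region, value in totals.items()}


print(region_totals())
```

The `merge`, boolean-mask indexing, and `groupby(...).sum()` calls are the same ones a pandas user would write, which is the point of the pandas-like API.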

{% endcapture %} {% capture lib1_right %}

libcudf LIB

GitHub{: target="_blank"} / Docs{: target="_blank"} / Change Log{: target="_blank"} {: .no-tb-margins }

libcudf is a CUDA C++ library for implementing standard dataframe operations. It is part of the cuDF repository.

{% endcapture %} {% capture lib2_left %}

cuML API

GitHub{: target="_blank"} / Docs{: target="_blank"} / Change Log{: target="_blank"} {: .no-tb-margins }

cuML is a suite of libraries that implement machine learning algorithms and mathematical primitive functions that are compatible with other RAPIDS projects, all in a scikit-learn-like{: target="_blank"} API familiar to data scientists.

{% endcapture %}

{% capture lib2_right %}

cuGraph API

GitHub{: target="_blank"} / Docs{: target="_blank"} / Change Log{: target="_blank"} {: .no-tb-margins }

cuGraph is a GPU accelerated graph analytics library, with functionality like NetworkX{: target="_blank"}, which is seamlessly integrated into the RAPIDS data science platform.

{% endcapture %} {% capture lib3_left %}

cuSignal API

GitHub{: target="_blank"} / Docs{: target="_blank"} / Change Log{: target="_blank"} {: .no-tb-margins }

cuSignal is a GPU accelerated signal processing library built around a SciPy Signal-like{: target="_blank"} API, CuPy, and custom Numba and CuPy CUDA kernels. cuSignal is written exclusively in Python and demonstrates GPU speeds without a C++ software layer.

{% endcapture %} {% capture lib3_right %}

cuSpatial API

GitHub{: target="_blank"} / Docs{: target="_blank"} / Change Log{: target="_blank"} {: .no-tb-margins }

cuSpatial is an efficient C++ library accelerated on GPUs with Python bindings to enable use by the data science community. Compared to CPU-based implementations, cuSpatial significantly accelerates common spatial and spatiotemporal operations such as point-in-polygon tests, distances between trajectories, and trajectory clustering.

{% endcapture %} {% capture lib4_left %}

cuxfilter API

GitHub{: target="_blank"} / Docs{: target="_blank"} / Change Log{: target="_blank"} {: .no-tb-margins }

cuxfilter is a framework to connect web visualizations to GPU accelerated crossfiltering. Inspired by the JavaScript library crossfilter{: target="_blank"}, it enables interactive and super fast multi-dimensional filtering of 100 million+ row tabular datasets via cuDF{: target="_blank"}.

{% endcapture %} {% capture lib4_right %}

CLX API

GitHub{: target="_blank"} / Docs{: target="_blank"} / Change Log{: target="_blank"} {: .no-tb-margins }

Cyber Log Accelerators (CLX), also pronounced "clicks", provides a collection of RAPIDS examples for security analysts, data scientists, and engineers to quickly get started applying RAPIDS and GPU acceleration to real-world cybersecurity use cases. {% endcapture %} {% capture lib5_left %}

RMM LIB

GitHub{: target="_blank"} / Docs{: target="_blank"} / Change Log{: target="_blank"} {: .no-tb-margins }

RAPIDS Memory Manager (RMM) is a central place for all device memory allocations in cuDF (C++ and Python) and other RAPIDS libraries. In addition, it is a replacement allocator for CUDA Device Memory (and CUDA Managed Memory) and a pool allocator to make CUDA device memory allocation / deallocation faster and asynchronous.

{% endcapture %} {% capture lib5_right %}

cuCIM API

GitHub{: target="_blank"} / Docs{: target="_blank"} / Change Log{: target="_blank"} {: .no-tb-margins }

cuCIM is an extensible toolkit designed to provide GPU-accelerated I/O, computer vision, and image processing primitives for N-dimensional images with a focus on biomedical imaging. Our API mirrors scikit-image for image manipulation and OpenSlide for image loading.

{% endcapture %}

{% include section-halfs.html background="background-purple" padding-top="0em" padding-bottom="1em" content-left-half=lib1_left content-right-half=lib1_right %} {% include section-halfs.html background="background-purple" padding-top="0em" padding-bottom="1em" content-left-half=lib2_left content-right-half=lib2_right %} {% include section-halfs.html background="background-purple" padding-top="0em" padding-bottom="1em" content-left-half=lib3_left content-right-half=lib3_right %} {% include section-halfs.html background="background-purple" padding-top="0em" padding-bottom="1em" content-left-half=lib4_left content-right-half=lib4_right %} {% include section-halfs.html background="background-purple" padding-top="0em" padding-bottom="6em" content-left-half=lib5_left content-right-half=lib5_right %} {% include slopecap.html background="background-purple" position="bottom" slope="down" %}

Community and Projects

{: .section-title-full .padding-top-1em}

{% capture com_top_left %} ![RAPIDS+SQL]({{ site.baseurl }}{% link /assets/images/RAPIDS-SQL.png %}){: .third-image-center}

RAPIDS + SQL

RAPIDS integrates with Spark SQL and Dask-SQL to accelerate your SQL queries at scale and make GPU acceleration available to an even broader set of users.
Learn more about RAPIDS + Spark SQL
Learn more about RAPIDS + Dask-SQL

{% endcapture %}

{% capture com_top_right %} ![Dask]({{ site.baseurl }}{% link /assets/images/dask_logo.png %}){: .third-image-center}

RAPIDS + Dask

Dask is an open source project providing advanced parallelism for analytics that enables performance at scale. RAPIDS is actively contributing to Dask, and Dask integrates with RAPIDS cuDF, XGBoost, and RAPIDS cuML for GPU-accelerated data analytics and machine learning.
Learn more on our Dask page

{% endcapture %}

{% capture com_bottom_left %} ![xgboost]({{ site.baseurl }}{% link /assets/images/xgboost_logo.png %}){: .third-image-center}

RAPIDS + XGBoost

XGBoost is a well-known gradient boosted decision trees (GBDT) machine learning package used to tackle regression, classification, and ranking problems. The RAPIDS team works closely with the Distributed Machine Learning Common (DMLC) XGBoost organization to upstream code and ensure that all components of the GPU-accelerated analytics ecosystem work together.
Learn more on our XGBoost page

{% endcapture %}

{% capture com_bottom_right %} ![cloud]({{ site.baseurl }}{% link /assets/images/RAPIDS-cloud.png %}){: .third-image-center}

RAPIDS + Cloud

RAPIDS’ GPU accelerated data science tools can be deployed on all of the major clouds, allowing anyone to take advantage of the speed increases and TCO reductions that RAPIDS enables.
Learn more on our cloud page {% endcapture %}

{% capture com2_top_left %} ![Plotly]({{ site.baseurl }}{% link /assets/images/Plotly_Dash_logo.png %}){: .third-image-center}

RAPIDS + Plotly Dash

Plotly’s Dash enables Data Science teams to focus on the data and models, while producing and sharing enterprise-ready analytic apps that sit on top of RAPIDS-accelerated Python dataframes.
Learn more on our Plotly page

{% endcapture %}

{% capture com2_top_right %} ![HPO]({{ site.baseurl }}{% link /assets/images/csp+hpo.png %}){: .third-image-center}

RAPIDS + HPO

Accelerate Hyperparameter Optimization (HPO) in the Cloud. The RAPIDS team works closely with major cloud providers and open source hyperparameter optimization solutions to ensure smooth integration and high performance, regardless of your deployment platform.
Learn more on our HPO page

{% endcapture %}

{% capture com2_bottom_left %} ![merlin]({{ site.baseurl }}{% link /assets/images/NVLogo_2D_H.png%}){: .third-image-center}

RAPIDS + NVIDIA MERLIN

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems. Merlin leverages RAPIDS cuDF and Dask-cuDF for dataframe transformation during ETL and inference, as well as for the optimized dataloaders in TensorFlow, PyTorch, or HugeCTR to accelerate deep learning training.
Learn more on our Merlin page

{% endcapture %}

{% capture com2_bottom_right %} ![slurm]({{ site.baseurl }}{% link /assets/images/slurm-logo.png%}){: .third-image-center}

RAPIDS + HPC

RAPIDS works extremely well in traditional HPC environments where GPUs are often co-located with accelerated networking hardware such as InfiniBand.
Learn more on our HPC page

{% endcapture %}

{% capture com3_top_left %} ![spark]({{ site.baseurl }}{% link /assets/images/spark-logo-trademark.png %}){: .third-image-center}

RAPIDS + Spark

NVIDIA is bringing RAPIDS to Apache Spark to accelerate ETL workflows with GPUs.
Learn more on the RAPIDS for Apache Spark page {: target="_blank"}

{% endcapture %}

{% capture com3_top_right %} ![monai]({{ site.baseurl }}{% link /assets/images/MONAI-logo_color.png%}){: .third-image-center}

RAPIDS + MONAI

The Medical Open Network for AI (MONAI) has been called by some the PyTorch of healthcare. RAPIDS cuCIM has been integrated into the MONAI Transforms component to accelerate the data pathology training pipeline on GPU.
Learn more in MONAI's latest highlights

{% endcapture %} {% capture com3_bottom_left %} ![pip]({{ site.baseurl }}{% link /assets/images/pypip.png%}){: .third-image-center}

RAPIDS + PIP

RAPIDS users can once again install RAPIDS via pip! Early-access experimental pip packages are now available!
Learn more about RAPIDS and pip

{% endcapture %}

{% include section-double-halfs.html background="background-white" padding-top="2em" padding-bottom="0em" content-top-left-half=com_top_left content-top-right-half=com_top_right content-bottom-left-half=com_bottom_left content-bottom-right-half=com_bottom_right %} {% include section-double-halfs.html background="background-white" padding-top="2em" padding-bottom="0em" content-top-left-half=com2_top_left content-top-right-half=com2_top_right content-bottom-left-half=com2_bottom_left content-bottom-right-half=com2_bottom_right %} {% include section-double-halfs.html background="background-white" padding-top="0em" padding-bottom="1em" content-top-left-half=com3_top_left content-top-right-half=com3_top_right content-bottom-left-half=com3_bottom_left content-bottom-right-half=com3_bottom_right %}

Contributors

{: .section-title-full} {% include contributing-logos.html padding-top="0em" padding-bottom="5em" %}

Adopters

{: .section-title-full} {% include adopter-logos.html padding-top="0em" padding-bottom="5em" %}

Open Source

{: .section-title-full} {% include open-source-logos.html padding-top="0em" padding-bottom="10em" %}

{% include slopecap.html background="background-darkpurple" position="top" slope="up" %} {% include cta-footer.html name="Experience Data Science on GPUs with RAPIDS" tagline="" button="GET STARTED" link="start.html" %}