From 43a5654bda761a1bd02079fc606a9f6e0107b979 Mon Sep 17 00:00:00 2001
From: Ubuntu
Date: Tue, 30 Jan 2024 02:20:59 +0000
Subject: [PATCH] build since blog

---
 feed.xml                        | 220 ++++++++-
 feed/blog.xml                   |   2 +-
 index.html                      |  27 +-
 page/2.html                     |  24 +-
 page/3.html                     |  25 +-
 page/4.html                     |  24 +-
 page/5.html                     |  24 +-
 page/6.html                     |  12 +
 pages/about.html                | 153 +++---
 pages/index.html                |  51 +-
 pages/page/2.html               |  48 +-
 pages/page/3.html               |  52 +-
 pages/page/4.html               |  54 +-
 pages/page/5.html               |  63 +--
 pages/page/6.html               |  36 ++
 pages/start.html                | 841 ++++++++++++++++----------
 release/2024/01/26/release.html | 833 +++++++++++++++++++++++++++++++
 17 files changed, 1772 insertions(+), 717 deletions(-)
 create mode 100644 release/2024/01/26/release.html

diff --git a/feed.xml b/feed.xml
index 1ec791f..ef3bf3f 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,4 +1,203 @@
-Jekyll2023-08-17T01:43:26+00:00https://www.dgl.ai/feed.xmlDeep Graph LibraryEasy Deep Learning on GraphsDGL 1.0: Empowering Graph Machine Learning for Everyone2023-02-20T00:00:00+00:002023-02-20T00:00:00+00:00https://www.dgl.ai/release/2023/02/20/release<p>We are thrilled to announce the arrival of DGL 1.0, a cutting-edge machine
+Jekyll2024-01-30T02:20:10+00:00https://www.dgl.ai/feed.xmlDeep Graph LibraryEasy Deep Learning on GraphsDGL 2.0: Streamlining Your GNN Data Pipeline from Bottleneck to Boost2024-01-26T00:00:00+00:002024-01-26T00:00:00+00:00https://www.dgl.ai/release/2024/01/26/release<p>We’re thrilled to announce the release of DGL 2.0, a major milestone in our
+mission to empower developers with cutting-edge tools for Graph Neural Networks
+(GNNs). Traditionally, data loading has been a significant bottleneck in GNN
+training. Complex graph structures and the need for efficient sampling often
+lead to slow data loading times and resource constraints. This can drastically
+hinder the training speed and scalability of your GNN models. DGL 2.0 breaks
+free from these limitations with the introduction of dgl.graphbolt, a
+revolutionary data loading framework that supercharges your GNN training by
+streamlining the data pipeline.</p>
+
+<p><img src="/assets/images/posts/2024-01-26-release/diagram.png" alt="diagram" width="800" class="aligncenter" /></p>
+<p><center>High-Level Architecture of the GraphBolt Data Pipeline</center></p>
+
+<h2 id="flexible-data-pipeline--customizable-stages">Flexible data pipeline &amp; customizable stages</h2>
+
+<p>One size doesn’t fit all, especially when it comes to dealing with a
+variety of graph data and GNN tasks. For instance, link prediction requires
+negative sampling while node classification does not; some features are too
+large to be stored in memory; and occasionally we might combine multiple
+sampling operations to form subgraphs. To offer adaptable operators while
+maintaining high performance, dgl.graphbolt integrates seamlessly with PyTorch
+DataPipes, relying on the unified “MiniBatch” data structure to connect
+processing stages.
+The core stages are defined as:</p>
+
+<ul>
+  <li><strong>Item Sampling</strong>: randomly selects a subset (nodes, edges, graphs) from the
+entire training set as an initial mini-batch for downstream computation.</li>
+  <li><strong>Negative Sampling (for Link Prediction)</strong>: generates non-existing edges as
+negative examples.</li>
+  <li><strong>Subgraph Sampling</strong>: generates subgraphs based on the input nodes/edges.</li>
+  <li><strong>Feature Fetching</strong>: fetches related node/edge features from the dataset for
+the given input.</li>
+  <li><strong>Data Moving (for training on GPU)</strong>: moves the data to the specified device
+for training.</li>
+</ul>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="c"># Seed edge sampler.</span>
+<span class="n">dp</span> <span class="o">=</span> <span class="n">gb</span><span class="o">.</span><span class="n">ItemSampler</span><span class="p">(</span><span class="n">train_edge_set</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">1024</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
+<span class="c"># Negative sampling.</span>
+<span class="n">dp</span> <span class="o">=</span> <span class="n">dp</span><span class="o">.</span><span class="n">sample_uniform_negative</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">negative_ratio</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
+<span class="c"># Neighbor sampling.</span>
+<span class="n">dp</span> <span class="o">=</span> <span class="n">dp</span><span class="o">.</span><span class="n">sample_neighbor</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">fanouts</span><span class="o">=</span><span class="p">[</span><span class="mi">15</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>
+<span class="c"># Fetch features.</span>
+<span class="n">dp</span> <span class="o">=</span> <span class="n">dp</span><span class="o">.</span><span class="n">fetch_feature</span><span class="p">(</span><span class="n">features</span><span class="p">,</span> <span class="n">node_feature_keys</span><span class="o">=</span><span class="p">[</span><span class="s">"feat"</span><span class="p">])</span>
+<span class="c"># Copy to GPU for training.</span>
+<span class="n">dp</span> <span class="o">=</span> <span class="n">dp</span><span class="o">.</span><span class="n">copy_to</span><span class="p">(</span><span class="n">device</span><span class="o">=</span><span class="s">"cuda:0"</span><span class="p">)</span>
+</code></pre>
+</div>
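+<p>To make the flow concrete, here is a minimal sketch of consuming such a
+pipeline. It assumes the gb.DataLoader wrapper and the MiniBatch fields
+(blocks, node_features) used in the GraphBolt tutorials; exact names may
+differ between releases:</p>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code># Sketch: wrap the datapipe in a GraphBolt dataloader and iterate.
+# Each step yields one MiniBatch carrying the sampled structure,
+# the fetched features and the labels for that batch.
+dataloader = gb.DataLoader(dp)
+for minibatch in dataloader:
+    # Message-flow graphs and inputs for a DGL model; the field names
+    # follow the tutorials and are assumptions, not a fixed contract.
+    blocks = minibatch.blocks
+    x = minibatch.node_features["feat"]
+    ...  # forward/backward pass goes here
+</code></pre>
+</div>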
class="p">)</span> +<span class="c"># Negative sampling.</span> +<span class="n">dp</span> <span class="o">=</span> <span class="n">dp</span><span class="o">.</span><span class="n">sample_uniform_negative</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">negative_ratio</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span> +<span class="c"># Neighbor sampling.</span> +<span class="n">dp</span> <span class="o">=</span> <span class="n">dp</span><span class="o">.</span><span class="n">sample_neighbor</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">fanouts</span><span class="o">=</span><span class="p">[</span><span class="mi">15</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span> + +<span class="c"># Exclude seed edges.</span> +<span class="n">dp</span> <span class="o">=</span> <span class="n">dp</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">gb</span><span class="o">.</span><span class="n">exclude_seed_edges</span><span class="p">)</span> + +<span class="c"># Fetch features.</span> +<span class="n">dp</span> <span class="o">=</span> <span class="n">dp</span><span class="o">.</span><span class="n">fetch_feature</span><span class="p">(</span><span class="n">features</span><span class="p">,</span> <span class="n">node_feature_keys</span><span class="o">=</span><span class="p">[</span><span class="s">"feat"</span><span class="p">])</span> +<span class="c"># Copy to GPU for training.</span> +<span class="n">dp</span> <span class="o">=</span> <span class="n">dp</span><span class="o">.</span><span class="n">copy_to</span><span class="p">(</span><span class="n">device</span><span class="o">=</span><span class="s">"cuda:0"</span><span class="p">)</span> + +</code></pre> +</div> + +<p>The dgl.graphbolt empowers you to customize stages in your data pipelines. +Implement custom stages using pre-defined APIs, such as loading features from +external storage or adding customized caching mechanisms (e.g. +<a href="https://github.com/dmlc/dgl/blob/0cb309a1b406d896311b5cfc2b5b1a1915f57c3b/python/dgl/graphbolt/impl/gpu_cached_feature.py#L11">GPUCachedFeature</a>), +and integrate the custom stages seamlessly without any modifications to your +core training code.</p> + +<h2 id="speed-enhancement--memory-efficiency">Speed enhancement &amp; memory efficiency</h2> + +<p>The dgl.graphbolt doesn’t just give you flexibility, it also provides top +performance under the hood. It features a compact graph data structure for +efficient sampling, blazing-fast multi-threaded neighbor sampling operator and +edge exclusion operator, and a built-in option to store large feature tensors +outside your CPU’s main memory. 
+<h2 id="speed-enhancement--memory-efficiency">Speed enhancement &amp; memory efficiency</h2>
+
+<p>dgl.graphbolt doesn’t just give you flexibility; it also provides top
+performance under the hood. It features a compact graph data structure for
+efficient sampling, blazing-fast multi-threaded neighbor sampling and edge
+exclusion operators, and a built-in option to store large feature tensors
+outside your CPU’s main memory. Additionally, dgl.graphbolt takes care of
+scheduling across all hardware, minimizing wait times and maximizing efficiency.</p>
+
+<p>dgl.graphbolt brings impressive speed gains to your GNN training: over 30%
+faster node classification in our benchmarks, and a remarkable ~390%
+acceleration for link prediction in benchmarks that involve edge exclusion.</p>
+
+<table style="text-align: center;">
+  <tr>
+    <th>Epoch time (s)</th>
+    <th>GraphSAGE</th>
+    <th>R-GCN</th>
+  </tr>
+  <tr>
+    <td>DGL Dataloader</td>
+    <td>22.5</td>
+    <td>73.6</td>
+  </tr>
+  <tr>
+    <td>dgl.graphbolt</td>
+    <td>17.2</td>
+    <td>64.6</td>
+  </tr>
+  <tr>
+    <td><strong>Speedup</strong></td>
+    <td><strong>1.31x</strong></td>
+    <td><strong>1.14x</strong></td>
+  </tr>
+</table>
+<p><center>Node classification speedup (NVIDIA T4 GPU). GraphSAGE is tested on OGBN-Products; R-GCN is tested on OGBN-MAG.</center></p>
+
+<table style="text-align: center;">
+  <tr>
+    <th>Epoch time (s)</th>
+    <th>include seeds</th>
+    <th>exclude seeds</th>
+  </tr>
+  <tr>
+    <td>DGL Dataloader</td>
+    <td>37.75</td>
+    <td>135.32</td>
+  </tr>
+  <tr>
+    <td>dgl.graphbolt</td>
+    <td>15.51</td>
+    <td>27.62</td>
+  </tr>
+  <tr>
+    <td><strong>Speedup</strong></td>
+    <td><strong>2.43x</strong></td>
+    <td><strong>4.90x</strong></td>
+  </tr>
+</table>
+<p><center>Link prediction speedup (NVIDIA T4 GPU) on OGBN-Citation2.</center></p>
+
+<p>For memory-constrained training on enormous graphs like OGBN-MAG240m,
+dgl.graphbolt also proves its worth. While both dataloaders utilize mmap-based
+optimization, dgl.graphbolt boasts a substantial speedup over the optimized DGL
+dataloader. dgl.graphbolt’s well-defined component API also streamlines future
+contributions on out-of-core solutions, ensuring even the most massive graphs
+can be tackled with ease.</p>
+
+<table style="text-align: center;">
+  <tr>
+    <th>Iteration time (s) under different RAM sizes</th>
+    <th>128GB RAM</th>
+    <th>256GB RAM</th>
+    <th>384GB RAM</th>
+  </tr>
+  <tr>
+    <td>Naïve DGL dataloader</td>
+    <td>OOM</td>
+    <td>OOM</td>
+    <td>OOM</td>
+  </tr>
+  <tr>
+    <td>Optimized DGL dataloader</td>
+    <td>65.42</td>
+    <td>3.86</td>
+    <td>0.30</td>
+  </tr>
+  <tr>
+    <td>dgl.graphbolt</td>
+    <td>60.99</td>
+    <td>3.21</td>
+    <td>0.23</td>
+  </tr>
+</table>
+<p><center>Node classification on OGBN-MAG240m under different RAM sizes. The optimized DGL dataloader baseline uses mmap to load features.</center></p>
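+<p>The out-of-core option is reached through the dataset API. A minimal sketch,
+assuming the gb.OnDiskDataset layout described in the GraphBolt documentation:</p>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code># Assumed API: load a GraphBolt dataset whose metadata marks large
+# feature tensors as on-disk, so reads page in on demand instead of
+# materializing everything in main memory.
+dataset = gb.OnDiskDataset("/path/to/dataset").load()
+graph = dataset.graph      # compact structure consumed by the samplers
+features = dataset.feature # feature store consumed by fetch_feature
+</code></pre>
+</div>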
+<h2 id="whats-more">What’s more</h2>
+
+<p>Furthermore, DGL 2.0 includes various new additions such as a
+hetero-relational GCN example and several new datasets. Improvements span the
+system, examples, and documentation, including tcmalloc updates to the CPU
+Docker image, support for sparse matrix slicing operators, and enhancements to
+various examples. A set of <a href="https://docs.dgl.ai/api/python/nn-pytorch.html#utility-modules-for-graph-transformer">utilities</a> for building graph transformer models is
+released along with this version, including NN modules such as positional
+encoders and layers as building blocks, with <a href="https://github.com/dmlc/dgl/tree/master/examples/core/Graphormer">examples</a> and <a href="https://docs.dgl.ai/en/latest/graphtransformer/index.html">tutorials</a>
+demonstrating their usage. Additionally, numerous bug fixes have been
+implemented, resolving issues such as the cusparseCreateCsr format for CUDA 12
+and the lazy device-copy problem related to DGL node/edge features. For more
+information on the new additions and changes in DGL 2.0, please refer to our
+<a href="https://github.com/dmlc/dgl/releases/tag/v2.0.0">release note</a>.</p>
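+<p>As a taste of the transformer building blocks, here is a minimal sketch;
+the module name comes from the linked utility-module docs, while the exact
+constructor and call signatures are assumptions:</p>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>import torch
+import dgl.nn as dglnn
+
+# Assumed shapes: the Graphormer-style layer attends over densely
+# batched node features of shape (batch, num_nodes, feat_size).
+x = torch.rand(2, 16, 128)
+layer = dglnn.GraphormerLayer(feat_size=128, hidden_size=256, num_heads=8)
+out = layer(x)  # same shape as the input
+</code></pre>
+</div>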
+<h2 id="get-started-with-dgl-20">Get started with DGL 2.0</h2>
+
+<p>You can easily install DGL 2.0 with dgl.graphbolt on any platform using <a href="https://www.dgl.ai/pages/start.html">pip or conda</a>.
+To jump right in, dive into our brand-new <a href="https://docs.dgl.ai/en/latest/stochastic_training/index.html">Stochastic Training of GNNs with GraphBolt tutorial</a>
+and experiment with our <a href="https://colab.research.google.com/github/dmlc/dgl/blob/master/notebooks/stochastic_training/node_classification.ipynb">node classification</a>
+and <a href="https://colab.research.google.com/github/dmlc/dgl/blob/master/notebooks/stochastic_training/link_prediction.ipynb">link prediction</a>
+examples in Google Colab. No need to set up a local environment: just point and
+click! This first release of DGL 2.0 with dgl.graphbolt packs a punch with
+<a href="https://github.com/dmlc/dgl/tree/master/examples/sampling/graphbolt">7 comprehensive single-GPU examples</a>
+and <a href="https://github.com/dmlc/dgl/tree/master/examples/multigpu/graphbolt">1 multi-GPU example</a>, covering a wide range of tasks.</p>
+
+<p>We welcome your feedback and are available via <a href="https://github.com/dmlc/dgl/issues">GitHub issues</a> and <a href="https://discuss.dgl.ai/">Discuss posts</a>.
+Join our <a href="http://slack.dgl.ai/">Slack channel</a> to stay updated and to connect with the community.</p>DGLTeamWe’re thrilled to announce the release of DGL 2.0, a major milestone in our mission to empower developers with cutting-edge tools for Graph Neural Networks (GNNs). Traditionally, data loading has been a significant bottleneck in GNN training. Complex graph structures and the need for efficient sampling often lead to slow data loading times and resource constraints. This can drastically hinder the training speed and scalability of your GNN models. DGL 2.0 breaks free from these limitations with the introduction of dgl.graphbolt, a revolutionary data loading framework that supercharges your GNN training by streamlining the data pipeline.DGL 1.0: Empowering Graph Machine Learning for Everyone2023-02-20T00:00:00+00:002023-02-20T00:00:00+00:00https://www.dgl.ai/release/2023/02/20/release<p>We are thrilled to announce the arrival of DGL 1.0, a cutting-edge machine
 learning framework for deep learning on graphs. Over the past three years,
 there has been growing interest from both academia and industry in this
 technology. Our framework has received requests from various scenarios, from
@@ -1767,21 +1966,4 @@ training loops and code snippets in DGL to realize them.</p>
 
 <ul>
   <li>Full release note: <a href="https://github.com/dmlc/dgl/releases/tag/v0.6.0">https://github.com/dmlc/dgl/releases/tag/v0.6.0</a></li>
-</ul>DGLTeamThe recent DGL 0.6 release is a major update on many aspects of the project including documentation, APIs, system speed, and scalability. This article highlights some of the new features and enhancements.DGL Empowers Service for Predictions on Connected Datasets with Graph Neural Networks2020-12-15T00:00:00+00:002020-12-15T00:00:00+00:00https://www.dgl.ai/news/2020/12/15/neptuneml<p>AWS just announced the availability of <a href="http://aws.amazon.com/neptune/machine-learning/">Neptune ML</a>.
-Amazon Neptune is a fast,
-reliable, fully managed graph database service that makes it easy to build and
-run applications that work with highly connected datasets. Neptune ML is a new
-capability that uses graph neural networks (GNNs), a machine learning (ML)
-technique purpose-built for graphs, for making easy, fast, and accurate
-predictions on graphs. The accuracy of most predictions for graphs increases to
-50% with Neptune ML when compared to non-graph methods. Neptune ML uses the
-Deep Graph Library (DGL), an open-source library to which AWS contributes that
-makes it easy to develop and apply GNN models on graph data. Now, developers
-can create, train, and apply ML on Neptune data in hours instead of weeks
-without the need to learn new tools and ML technologies.</p>
-
-<p>We would love to see more commercial vendors build innovation on top of DGL in
-the future. For more information about Neptune ML, please visit the <a href="https://aws.amazon.com/blogs/database/announcing-amazon-neptune-ml-easy-fast-and-accurate-predictions-on-graphs/">AWS blog</a>
-and <a href="https://aws.amazon.com/neptune/machine-learning/">product page</a>. Watch the
-<a href="https://reinvent.awsevents.com/keynotes/">re:Invent 2020 Machine Learning Keynote</a>
-by Swami Sivasubramanian for the full announcement.</p>DGLTeamAWS just announced the availability of Neptune ML. Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. Neptune ML is a new capability that uses graph neural networks (GNNs), a machine learning (ML) technique purpose-built for graphs, for making easy, fast, and accurate predictions on graphs. The accuracy of most predictions for graphs increases to 50% with Neptune ML when compared to non-graph methods. Neptune ML uses the Deep Graph Library (DGL), an open-source library to which AWS contributes that makes it easy to develop and apply GNN models on graph data. Now, developers can create, train, and apply ML on Neptune data in hours instead of weeks without the need to learn new tools and ML technologies.
\ No newline at end of file
+</ul>DGLTeamThe recent DGL 0.6 release is a major update on many aspects of the project including documentation, APIs, system speed, and scalability. This article highlights some of the new features and enhancements.
\ No newline at end of file
diff --git a/feed/blog.xml b/feed/blog.xml
index e42cbe9..961ae85 100644
--- a/feed/blog.xml
+++ b/feed/blog.xml
@@ -1,4 +1,4 @@
-Jekyll2023-08-17T01:43:26+00:00https://www.dgl.ai/feed/blog.xmlDeep Graph Library | BlogEasy Deep Learning on GraphsImproving Graph Neural Networks via Network-in-network Architecture2022-11-28T00:00:00+00:002022-11-28T00:00:00+00:00https://www.dgl.ai/blog/2022/11/28/ngnn<p>As Graph Neural Networks (GNNs) has become increasingly popular, there is a
+Jekyll2024-01-30T02:20:10+00:00https://www.dgl.ai/feed/blog.xmlDeep Graph Library | BlogEasy Deep Learning on GraphsImproving Graph Neural Networks via Network-in-network Architecture2022-11-28T00:00:00+00:002022-11-28T00:00:00+00:00https://www.dgl.ai/blog/2022/11/28/ngnn<p>As Graph Neural Networks (GNNs) has become increasingly popular, there is a
 wide interest of designing deeper GNN architecture. However, deep GNNs suffer
 from the <em>oversmoothing</em> issue where the learnt node representations
 quickly become indistinguishable with more layers.
 This blog features a simple yet

diff --git a/index.html b/index.html
index e7d32b2..654a428 100644
--- a/index.html
+++ b/index.html
@@ -634,6 +634,20 @@
[hunk body lost to extraction: the regenerated homepage apparently gains the new DGL 2.0 release card; only the stray fragment "Find an example to get started" survives]
diff --git a/page/2.html b/page/2.html
index 8cadb31..73f8568 100644
--- a/page/2.html
+++ b/page/2.html
@@ -636,12 +636,12 @@
[hunk body lost to extraction; the surviving card titles show every entry shifting down one slot as the new release post is prepended: "v0.9 Release Highlights" replaces "May 2022 Update Note", which replaces "April 2022 Update Note", which replaces "v0.8 Release Highlights", which replaces "v0.7 Release Highlights", with the matching blurbs shifting alongside]
diff --git a/page/3.html b/page/3.html
index d5c2b0c..e4eea48 100644
--- a/page/3.html
+++ b/page/3.html
@@ -634,6 +634,19 @@
[hunk body lost to extraction; only the fragment "Find an example to get started" survives from the regenerated pagination markup]
diff --git a/page/4.html b/page/4.html
index 7df9989..8697607 100644
--- a/page/4.html
+++ b/page/4.html
@@ -634,6 +634,18 @@
[hunk body lost to extraction; only the fragment "Find an example to get started" survives from the regenerated pagination markup]
diff --git a/page/5.html b/page/5.html
index bbc9579..3c78e02 100644
--- a/page/5.html
+++ b/page/5.html
@@ -636,11 +636,11 @@
[hunk body lost to extraction; the surviving card titles show entries shifting down one slot: "DGL v0.4 Release (heterogeneous graph update)" replaces "DGL v0.3.1 Release", which replaces "Large-Scale Training of Graph Neural Networks", which replaces "DGL v0.3 Release", which replaces "When Kernel Fusion meets Graph Neural Networks", with the matching blurbs shifting alongside]
diff --git a/page/6.html b/page/6.html
index 98e8b17..7472104 100644
--- a/page/6.html
+++ b/page/6.html
@@ -634,6 +634,18 @@
[hunk body lost to extraction; only the fragment "Find an example to get started" survives from the regenerated pagination markup]