[Bug] Allgather with proxy channel hangs at H2D cudaMemcpyAsync #440

Closed
cubele opened this issue Jan 5, 2025 · 5 comments


cubele commented Jan 5, 2025

We implemented a simple intra-node allgather algorithm using mscclpplang with proxy channels. However, when we run the generated JSON algorithm file through the NCCL interface by setting MSCCL_NCCL_PLAN_DIR, mscclpp reports the following error:

/include/mscclpp/semaphore_device.hpp:30: void mscclpp::Host2DeviceSemaphoreDeviceHandle::wait(signed long): block: [0,0,0], thread: [1,0,0] Assertion (atomicLoad(inboundSemaphoreId, memoryOrderAcquire) < (*expectedInboundSemaphoreId)) failed.

We suspect this is the same error as in #394 and #285, where the H2D cudaMemcpyAsync hangs and the semaphore is never signaled. However, there is still no working solution to this error. Do you have any thoughts on this?
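To make the failure mode concrete, here is a simplified host-side model of a semaphore wait. It assumes the device-side `wait` is a bounded spin on the condition shown in the assertion text (inbound counter still below the expected counter) and that giving up is what surfaces as the assertion. The class and parameter names (`BoundedSpinSemaphore`, `max_spin`) are hypothetical and are not part of the mscclpp API.

```python
class BoundedSpinSemaphore:
    """Toy model: wait() spins until the inbound counter catches up with the expected one."""

    def __init__(self, max_spin):
        self.inbound = 0          # bumped each time a signal is delivered
        self.expected = 0         # bumped each time wait() is called
        self.max_spin = max_spin  # hypothetical spin budget before giving up

    def signal(self):
        # Stands in for the proxy delivering a signal to the device.
        self.inbound += 1

    def wait(self):
        self.expected += 1
        spins = 0
        # Same condition as in the reported assertion: inbound < expected means "not yet signaled".
        while self.inbound < self.expected:
            spins += 1
            if spins >= self.max_spin:
                raise TimeoutError(
                    f"inbound={self.inbound} still below expected={self.expected}"
                )


sem = BoundedSpinSemaphore(max_spin=1000)
sem.signal()
sem.wait()  # returns at once: the signal was delivered before the wait

try:
    sem.wait()  # no second signal is ever delivered, so the spin budget runs out
except TimeoutError as err:
    print("wait gave up:", err)
```

Under this model, a copy that never completes on the proxy side means the matching signal is never delivered, so the waiting thread exhausts its budget no matter how long it spins.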

The algorithm json file is generated using the following mscclpplang program:

import argparse
from msccl.language import *
from msccl.topologies import *
from msccl.language.collectives import AllGather

def ring_allgather(gpus, instances, inplace=False):
    size = gpus
    topology = fully_connected(size)
    collective = AllGather(size, 1, inplace)

    with MSCCLPPProgram(
        f"allgather_ring_proxy_n={size}_i={instances}_inp={inplace}",
        topology,
        collective,
        instances,
        protocol="Simple",
        replication_policy=ReplicationPolicy.interleaved,
    ):
        # Chunk i
        for i in range(size):
            for step in range(size - 1):
                send_rank = (i + step) % size
                recv_rank = (i + step + 1) % size

                c = chunk(send_rank, Buffer.input, 0) if step == 0 else chunk(send_rank, Buffer.output, i)
                c.put(
                    recv_rank,
                    Buffer.output,
                    i,
                    sendtb=0,
                    chan_type=ChannelType.proxy,
                )
                c.signal(recv_rank, Buffer.output, i, sendtb=0, chan_type=ChannelType.proxy)
                c.flush(recv_rank, Buffer.output, i, sendtb=0, chan_type=ChannelType.proxy)
                cr = chunk(recv_rank, Buffer.output, i)
                cr.wait(send_rank, Buffer.input, 0, recvtb=0, chan_type=ChannelType.proxy)

        if not inplace:
            for i in range(size):
                c = chunk(i, Buffer.input, 0)
                c.copy(i, Buffer.output, i, sendtb=0)

        Json()
        Check()

parser = argparse.ArgumentParser()
parser.add_argument("num_gpus", type=int, help="number of gpus")
parser.add_argument("instances", type=int, help="number of instances")
parser.add_argument("--inplace", action="store_true", help="inplace allgather")
args = parser.parse_args()

ring_allgather(args.num_gpus, args.instances, args.inplace)

Allgather is called on a non-default stream using the following example code provided by NCCL:

#include <stdio.h>
#include "cuda_runtime.h"
#include "nccl.h"
#include "mpi.h"
#include <unistd.h>
#include <stdint.h>
#include <stdlib.h>

#define MPICHECK(cmd) do {                          \
  int e = cmd;                                      \
  if( e != MPI_SUCCESS ) {                          \
    printf("Failed: MPI error %s:%d '%d'\n",        \
        __FILE__,__LINE__, e);   \
    exit(EXIT_FAILURE);                             \
  }                                                 \
} while(0)

#define CUDACHECK(cmd) do {                         \
  cudaError_t e = cmd;                              \
  if( e != cudaSuccess ) {                          \
    printf("Failed: Cuda error %s:%d '%s'\n",             \
        __FILE__,__LINE__,cudaGetErrorString(e));   \
    exit(EXIT_FAILURE);                             \
  }                                                 \
} while(0)

#define NCCLCHECK(cmd) do {                         \
  ncclResult_t r = cmd;                             \
  if (r!= ncclSuccess) {                            \
    printf("Failed, NCCL error %s:%d '%s'\n",             \
        __FILE__,__LINE__,ncclGetErrorString(r));   \
    exit(EXIT_FAILURE);                             \
  }                                                 \
} while(0)

static uint64_t getHostHash(const char* string) {
  // Based on DJB2a, result = result * 33 ^ char
  uint64_t result = 5381;
  for (int c = 0; string[c] != '\0'; c++){
    result = ((result << 5) + result) ^ string[c];
  }
  return result;
}

static void getHostName(char* hostname, int maxlen) {
  gethostname(hostname, maxlen);
  for (int i=0; i< maxlen; i++) {
    if (hostname[i] == '.') {
        hostname[i] = '\0';
        return;
    }
  }
}

int main(int argc, char* argv[])
{
  int size = 32*1024*1024; // Size of the buffer each rank will send
  int myRank, nRanks, localRank = 0;

  //initializing MPI
  MPICHECK(MPI_Init(&argc, &argv));
  MPICHECK(MPI_Comm_rank(MPI_COMM_WORLD, &myRank));
  MPICHECK(MPI_Comm_size(MPI_COMM_WORLD, &nRanks));

  //calculating localRank based on hostname which is used in selecting a GPU
  uint64_t hostHashs[nRanks];
  char hostname[1024];
  getHostName(hostname, 1024);
  hostHashs[myRank] = getHostHash(hostname);
  MPICHECK(MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, hostHashs, sizeof(uint64_t), MPI_BYTE, MPI_COMM_WORLD));
  for (int p=0; p<nRanks; p++) {
     if (p == myRank) break;
     if (hostHashs[p] == hostHashs[myRank]) localRank++;
  }

  ncclUniqueId id;
  ncclComm_t comm;
  float *sendbuff, *recvbuff;
  cudaStream_t s;

  //get NCCL unique ID at rank 0 and broadcast it to all others
  if (myRank == 0) ncclGetUniqueId(&id);
  MPICHECK(MPI_Bcast((void *)&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD));

  //picking a GPU based on localRank, allocate device buffers
  CUDACHECK(cudaSetDevice(localRank));
  CUDACHECK(cudaMalloc(&sendbuff, size * sizeof(float)));
  CUDACHECK(cudaMalloc(&recvbuff, size * nRanks * sizeof(float))); // Allocate for allgather
  CUDACHECK(cudaStreamCreate(&s));

  //initializing NCCL
  NCCLCHECK(ncclCommInitRank(&comm, nRanks, id, myRank));

  NCCLCHECK(ncclAllGather((const void*)sendbuff, (void*)recvbuff, size, ncclFloat, comm, s));

  CUDACHECK(cudaStreamSynchronize(s));

  //free device buffers
  CUDACHECK(cudaFree(sendbuff));
  CUDACHECK(cudaFree(recvbuff));

  //finalizing NCCL
  ncclCommDestroy(comm);

  //finalizing MPI
  MPICHECK(MPI_Finalize());

  printf("[MPI Rank %d] Success \n", myRank);
  return 0;
}

chhwang commented Jan 10, 2025

Can you share your environment details and the commands to reproduce?


cubele commented Jan 11, 2025

> Can you share your environment details and the commands to reproduce?

  • Device: Eight NVIDIA A100-SXM4-80GB, Driver Version: 550.127.08, CUDA Version: 12.4, with NVSwitch
  • Docker: ghcr.io/microsoft/mscclpp/mscclpp:base-cuda12.4
  • Azure/msccl-tools: commit ceaf52ff3d84831de1e7967c15c20c16fd14d982
  • mscclpp: commit 8ac50dc
  • nccl-tests: commit 29f4114f027fed903649a3c81babc5d52e8d41ae
  • command:
mpirun --allow-run-as-root --tag-output --bind-to numa -np 8 -x LD_PRELOAD='/path/to/libmscclpp_nccl.so:' -x MSCCLPP_EXECUTION_PLAN_DIR='/path/to/algo/json/dir' -x MSCCLPP_DEBUG='INFO' /path/to/nccl-test/all_gather_perf -b 1K -e 1G -f 2 -w 10 -n 50

We found that the test only fails for the non-in-place allgather, so we only use the non-in-place version to reproduce it.

By the way, we encountered the same problem when implementing an alltoall kernel using proxy channels with the Python bindings. This allgather example and the alltoall case both involve a proxy channel copying from the local input buffer to the remote output buffer. Other in-place examples with proxy channels that only involve the local and remote output buffers, such as the in-place allgather and the allreduce3 kernel in the benchmark, run normally. Do you have any idea how involving an extra input buffer could cause the algorithm to hang?
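To show where the input buffer enters the picture, the sketch below replays the loop structure of the allgather program in plain Python, without msccl-tools, and lists which puts read from the input buffer. The function name `ring_allgather_puts` and the tuple layout are ours, for illustration only.

```python
from collections import Counter


def ring_allgather_puts(size):
    """Return every put as (chunk, step, src_rank, src_buffer, dst_rank)."""
    puts = []
    for i in range(size):
        for step in range(size - 1):
            send_rank = (i + step) % size
            recv_rank = (i + step + 1) % size
            # Only the first hop of each chunk reads from the sender's input buffer;
            # later hops forward the chunk from the sender's output buffer.
            src_buffer = "input" if step == 0 else "output"
            puts.append((i, step, send_rank, src_buffer, recv_rank))
    return puts


puts = ring_allgather_puts(8)
print(Counter(p[3] for p in puts))
for chunk_id, step, src, buf, dst in puts:
    if buf == "input":
        print(f"chunk {chunk_id}: rank {src} input -> rank {dst} output[{chunk_id}]")
```

For 8 GPUs this gives 56 puts in total, of which 8 (one per rank, at step 0) read from the input buffer. Those are the only transfers that differ from the output-to-output pattern used by the examples that run normally.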

Command to generate the json algorithm using msccl-tools: python3 allgather_ring_proxy.py 8 1
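When comparing the in-place and non-in-place plans, a short script can summarize each generated JSON file. This is a minimal sketch that relies only on field names visible in the plan posted in this comment (`gpus`, `threadblocks`, `ops`, `name`, `srcs`, `buff`). The function name `summarize_plan` and the tiny inline sample are ours, and the sample is not a complete plan.

```python
import json
from collections import Counter


def summarize_plan(plan):
    """Per GPU: count ops by name and count ops whose source is the input buffer ("i")."""
    summary = {}
    for gpu in plan["gpus"]:
        op_names = Counter()
        input_sourced_ops = 0
        for tb in gpu["threadblocks"]:
            for op in tb["ops"]:
                op_names[op["name"]] += 1
                # Ops without a "srcs" list (wait, nop, copy) are skipped by this check.
                if any(src.get("buff") == "i" for src in op.get("srcs", [])):
                    input_sourced_ops += 1
        summary[gpu["id"]] = {
            "ops": dict(op_names),
            "input_sourced_puts": input_sourced_ops,
        }
    return summary


# Reduced sample in the same shape as the generated plan (illustrative only).
sample_plan = {
    "gpus": [
        {
            "id": 0,
            "threadblocks": [
                {
                    "id": 0,
                    "ops": [
                        {"name": "pwsf", "srcs": [{"buff": "i", "off": 0}], "ctype": "proxy"},
                        {"name": "wait", "ctype": "proxy"},
                        {"name": "pwsf", "srcs": [{"buff": "o", "off": 2}], "ctype": "proxy"},
                    ],
                }
            ],
        }
    ]
}

print(json.dumps(summarize_plan(sample_plan), indent=2))
```

To run it on a real plan, replace `sample_plan` with the result of `json.load` on the generated file.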
The python file for the algorithm and the generated json are as follows:
allgather_ring_proxy.py

import argparse
from msccl.language import *
from msccl.topologies import *
from msccl.language.collectives import AllGather

def ring_allgather(gpus, instances, inplace=False):
    size = gpus
    topology = fully_connected(size)
    collective = AllGather(size, 1, inplace)

    with MSCCLPPProgram(
        f"allgather_ring_proxy_n={size}_i={instances}_inp={inplace}",
        topology,
        collective,
        instances,
        protocol="Simple",
        replication_policy=ReplicationPolicy.interleaved,
    ):
        # Chunk i
        for i in range(size):
            for step in range(size - 1):
                send_rank = (i + step) % size
                recv_rank = (i + step + 1) % size

                c = chunk(send_rank, Buffer.input, 0) if step == 0 else chunk(send_rank, Buffer.output, i)
                c.put(
                    recv_rank,
                    Buffer.output,
                    i,
                    sendtb=0,
                    chan_type=ChannelType.proxy,
                )
                c.signal(recv_rank, Buffer.output, i, sendtb=0, chan_type=ChannelType.proxy)
                c.flush(recv_rank, Buffer.output, i, sendtb=0, chan_type=ChannelType.proxy)
                cr = chunk(recv_rank, Buffer.output, i)
                cr.wait(send_rank, Buffer.input, 0, recvtb=0, chan_type=ChannelType.proxy)
        if not inplace:
            for i in range(size):
                c = chunk(i, Buffer.input, 0)
                c.copy(i, Buffer.output, i, sendtb=0)

        Json()
        Check()

parser = argparse.ArgumentParser()
parser.add_argument("num_gpus", type=int, help="number of gpus")
parser.add_argument("instances", type=int, help="number of instances")
parser.add_argument("--inplace", action="store_true", help="inplace")
args = parser.parse_args()

ring_allgather(args.num_gpus, args.instances, args.inplace)

allgather_ring_proxy_n=8_i=1_inp=False.json

{
  "name": "allgather_ring_proxy_n=8_i=1_inp=False",
  "collective": "allgather",
  "protocol": "Simple",
  "inplace": false,
  "gpus": [
    {
      "id": 0,
      "inputChunks": 1,
      "outputChunks": 8,
      "scratchChunks": 0,
      "chunkGroups": 1,
      "threadblocks": [
        {
          "id": 0,
          "ops": [
            {
              "name": "pwsf",
              "o_buff": {
                "src": "i",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "srcs": [
                {
                  "buff": "i",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 1,
                  "off": 0
                },
                {
                  "id": 1,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 1
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 2
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 2
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 1,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 4
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 3
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 3
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 1,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 7
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 4
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 4
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 1,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 10
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 5
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 5
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 1,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 13
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 6
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 6
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 1,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 16
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 7
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 7
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 0
                }
              ]
            },
            {
              "name": "copy",
              "src": 0,
              "srcbuff": "i",
              "srcoff": 0,
              "dst": 0,
              "dstbuff": "o",
              "dstoff": 0,
              "ctype": "none",
              "cnt": 1
            }
          ],
          "channels": [
            {
              "src": "i",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0,
                1
              ]
            },
            {
              "src": "o",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0
              ]
            }
          ]
        }
      ],
      "channels": [
        {
          "srcbuff": "i",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            1,
            7
          ]
        },
        {
          "srcbuff": "o",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            1
          ]
        }
      ]
    },
    {
      "id": 1,
      "inputChunks": 1,
      "outputChunks": 8,
      "scratchChunks": 0,
      "chunkGroups": 1,
      "threadblocks": [
        {
          "id": 0,
          "ops": [
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 0
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "i",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 1,
                  "off": 1
                }
              ],
              "srcs": [
                {
                  "buff": "i",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                },
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 4
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 3
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 3
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 7
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 4
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 4
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 10
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 5
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 5
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 13
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 6
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 6
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 16
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 7
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 7
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 3
                }
              ]
            },
            {
              "name": "copy",
              "src": 1,
              "srcbuff": "i",
              "srcoff": 0,
              "dst": 1,
              "dstbuff": "o",
              "dstoff": 1,
              "ctype": "none",
              "cnt": 1
            }
          ],
          "channels": [
            {
              "src": "i",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0,
                1
              ]
            },
            {
              "src": "o",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0
              ]
            }
          ]
        }
      ],
      "channels": [
        {
          "srcbuff": "i",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            0,
            2
          ]
        },
        {
          "srcbuff": "o",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            2
          ]
        }
      ]
    },
    {
      "id": 2,
      "inputChunks": 1,
      "outputChunks": 8,
      "scratchChunks": 0,
      "chunkGroups": 1,
      "threadblocks": [
        {
          "id": 0,
          "ops": [
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 0
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 3
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 1
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 1
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "i",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 1,
                  "off": 2
                }
              ],
              "srcs": [
                {
                  "buff": "i",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                },
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 7
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 4
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 4
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 10
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 5
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 5
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 13
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 6
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 6
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 16
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 7
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 7
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 6
                }
              ]
            },
            {
              "name": "copy",
              "src": 2,
              "srcbuff": "i",
              "srcoff": 0,
              "dst": 2,
              "dstbuff": "o",
              "dstoff": 2,
              "ctype": "none",
              "cnt": 1
            }
          ],
          "channels": [
            {
              "src": "i",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0,
                1
              ]
            },
            {
              "src": "o",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0
              ]
            }
          ]
        }
      ],
      "channels": [
        {
          "srcbuff": "i",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            1,
            3
          ]
        },
        {
          "srcbuff": "o",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            3
          ]
        }
      ]
    },
    {
      "id": 3,
      "inputChunks": 1,
      "outputChunks": 8,
      "scratchChunks": 0,
      "chunkGroups": 1,
      "threadblocks": [
        {
          "id": 0,
          "ops": [
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 0
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 3
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 1
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 1
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 6
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 2
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 2
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "i",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 1,
                  "off": 3
                }
              ],
              "srcs": [
                {
                  "buff": "i",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                },
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 10
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 5
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 5
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 13
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 6
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 6
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 16
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 7
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 7
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 9
                }
              ]
            },
            {
              "name": "copy",
              "src": 3,
              "srcbuff": "i",
              "srcoff": 0,
              "dst": 3,
              "dstbuff": "o",
              "dstoff": 3,
              "ctype": "none",
              "cnt": 1
            }
          ],
          "channels": [
            {
              "src": "i",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0,
                1
              ]
            },
            {
              "src": "o",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0
              ]
            }
          ]
        }
      ],
      "channels": [
        {
          "srcbuff": "i",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            2,
            4
          ]
        },
        {
          "srcbuff": "o",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            4
          ]
        }
      ]
    },
    {
      "id": 4,
      "inputChunks": 1,
      "outputChunks": 8,
      "scratchChunks": 0,
      "chunkGroups": 1,
      "threadblocks": [
        {
          "id": 0,
          "ops": [
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 0
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 3
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 1
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 1
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 6
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 2
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 2
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 9
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 3
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 3
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "i",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 1,
                  "off": 4
                }
              ],
              "srcs": [
                {
                  "buff": "i",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                },
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 13
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 6
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 6
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 16
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 7
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 7
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 12
                }
              ]
            },
            {
              "name": "copy",
              "src": 4,
              "srcbuff": "i",
              "srcoff": 0,
              "dst": 4,
              "dstbuff": "o",
              "dstoff": 4,
              "ctype": "none",
              "cnt": 1
            }
          ],
          "channels": [
            {
              "src": "i",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0,
                1
              ]
            },
            {
              "src": "o",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0
              ]
            }
          ]
        }
      ],
      "channels": [
        {
          "srcbuff": "i",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            3,
            5
          ]
        },
        {
          "srcbuff": "o",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            5
          ]
        }
      ]
    },
    {
      "id": 5,
      "inputChunks": 1,
      "outputChunks": 8,
      "scratchChunks": 0,
      "chunkGroups": 1,
      "threadblocks": [
        {
          "id": 0,
          "ops": [
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 0
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 3
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 1
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 1
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 6
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 2
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 2
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 9
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 3
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 3
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 12
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 4
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 4
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "i",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 1,
                  "off": 5
                }
              ],
              "srcs": [
                {
                  "buff": "i",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                },
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 16
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 7
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 7
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 15
                }
              ]
            },
            {
              "name": "copy",
              "src": 5,
              "srcbuff": "i",
              "srcoff": 0,
              "dst": 5,
              "dstbuff": "o",
              "dstoff": 5,
              "ctype": "none",
              "cnt": 1
            }
          ],
          "channels": [
            {
              "src": "i",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0,
                1
              ]
            },
            {
              "src": "o",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0
              ]
            }
          ]
        }
      ],
      "channels": [
        {
          "srcbuff": "i",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            4,
            6
          ]
        },
        {
          "srcbuff": "o",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            6
          ]
        }
      ]
    },
    {
      "id": 6,
      "inputChunks": 1,
      "outputChunks": 8,
      "scratchChunks": 0,
      "chunkGroups": 1,
      "threadblocks": [
        {
          "id": 0,
          "ops": [
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 0
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 3
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 1
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 1
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 6
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 2
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 2
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 9
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 3
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 3
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 12
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 4
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 4
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 15
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 5
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 5
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "i",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 1,
                  "off": 6
                }
              ],
              "srcs": [
                {
                  "buff": "i",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 0,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 18
                }
              ]
            },
            {
              "name": "copy",
              "src": 6,
              "srcbuff": "i",
              "srcoff": 0,
              "dst": 6,
              "dstbuff": "o",
              "dstoff": 6,
              "ctype": "none",
              "cnt": 1
            }
          ],
          "channels": [
            {
              "src": "i",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0,
                1
              ]
            },
            {
              "src": "o",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0
              ]
            }
          ]
        }
      ],
      "channels": [
        {
          "srcbuff": "i",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            5,
            7
          ]
        },
        {
          "srcbuff": "o",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            7
          ]
        }
      ]
    },
    {
      "id": 7,
      "inputChunks": 1,
      "outputChunks": 8,
      "scratchChunks": 0,
      "chunkGroups": 1,
      "threadblocks": [
        {
          "id": 0,
          "ops": [
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 1,
                  "off": 0
                },
                {
                  "id": 1,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 0
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 1
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 1
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 1,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 3
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 2
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 2
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 1,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 6
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 3
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 3
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 1,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 9
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 4
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 4
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 1,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 12
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 5
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 5
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "wait",
              "i_buff": {
                "src": "i",
                "dst": "o"
              },
              "i_cids": [
                {
                  "id": 1,
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 15
                }
              ]
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "o",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 6
                }
              ],
              "srcs": [
                {
                  "buff": "o",
                  "off": 6
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "pwsf",
              "o_buff": {
                "src": "i",
                "dst": "o"
              },
              "o_cids": [
                {
                  "id": 0,
                  "off": 7
                }
              ],
              "srcs": [
                {
                  "buff": "i",
                  "off": 0
                }
              ],
              "ctype": "proxy",
              "cnt": 1
            },
            {
              "name": "nop",
              "deps": [
                {
                  "tb": 0,
                  "step": 18
                }
              ]
            },
            {
              "name": "copy",
              "src": 7,
              "srcbuff": "i",
              "srcoff": 0,
              "dst": 7,
              "dstbuff": "o",
              "dstoff": 7,
              "ctype": "none",
              "cnt": 1
            }
          ],
          "channels": [
            {
              "src": "i",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0,
                1
              ]
            },
            {
              "src": "o",
              "dst": "o",
              "ctype": "proxy",
              "cids": [
                0
              ]
            }
          ]
        }
      ],
      "channels": [
        {
          "srcbuff": "i",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            0,
            6
          ]
        },
        {
          "srcbuff": "o",
          "dstbuff": "o",
          "type": "proxy",
          "connectedTo": [
            0
          ]
        }
      ]
    }
  ],
  "num_threads_per_block": 1024,
  "use_double_scratch_buffer": false,
  "min_message_size": 0,
  "max_message_size": 18446744073709551615
}

@Binyang2014
Contributor

I am confused by this line: c = chunk(send_rank, Buffer.input, 0) if step == 0 else chunk(send_rank, Buffer.output, i). What does it mean?

@cubele
Author

cubele commented Jan 14, 2025

I am confused by this line: c = chunk(send_rank, Buffer.input, 0) if step == 0 else chunk(send_rank, Buffer.output, i). What does it mean?

For the out-of-place ring allgather, step 0 copies chunk i from the input buffer at rank i to the output buffer at rank i + 1. From step 1 onward we can therefore operate on the output buffers only: chunk i is copied from the output buffer at rank i + step to the output buffer at rank i + step + 1.
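
The data movement described above can be sketched in plain Python, without mscclpplang. This is only an illustrative simulation: the function name simulate_ring_allgather and the list-based buffers are assumptions made for the example, and the final local copy mirrors the "copy" op at the end of each rank's op list in the JSON plan.

```python
def simulate_ring_allgather(inputs):
    """Simulate an out-of-place ring allgather on plain Python lists.

    inputs[r] is the single input chunk of rank r (hypothetical model).
    Returns outputs, where outputs[r][i] is chunk i as seen by rank r.
    """
    size = len(inputs)
    outputs = [[None] * size for _ in range(size)]

    for i in range(size):  # chunk i starts at rank i
        for step in range(size - 1):
            send_rank = (i + step) % size
            recv_rank = (send_rank + 1) % size
            if step == 0:
                # Step 0 reads from the sender's input buffer.
                value = inputs[send_rank]
            else:
                # Later steps forward what arrived in the output buffer.
                value = outputs[send_rank][i]
            outputs[recv_rank][i] = value

    # Each rank also copies its own input chunk into its output buffer.
    for rank in range(size):
        outputs[rank][rank] = inputs[rank]
    return outputs


if __name__ == "__main__":
    for rank, out in enumerate(simulate_ring_allgather(["a", "b", "c", "d"])):
        print(rank, out)
```

After size - 1 steps per chunk plus the local copy, every rank should hold every chunk in its output buffer.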

While explaining this I found the bug in the allgather algorithm: the cr.wait line should be aligned with the sender chunk c instead of always waiting on the input buffer chunk 😂. The algorithm runs normally after this change. I guess mscclpplang is missing some checks for such deadlocks?
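
A rough sketch of the kind of check that would catch this mismatch, under a simplifying assumption: each (sender, receiver, source buffer, destination buffer) tuple is treated as its own channel, and every wait must be paired with a signal on the same channel. The op tuples and the names build_ops and unmatched_waits are hypothetical and are not part of mscclpplang or mscclpp.

```python
from collections import Counter


def build_ops(size, fixed):
    """Build a toy list of signal/wait ops for the ring allgather.

    Each op is (kind, send_rank, recv_rank, src_buff, dst_buff).
    With fixed=False the receiver always waits on the input-buffer
    channel, which models the bug described above.
    """
    ops = []
    for i in range(size):
        for step in range(size - 1):
            send_rank = (i + step) % size
            recv_rank = (send_rank + 1) % size
            src_buff = "i" if step == 0 else "o"
            ops.append(("signal", send_rank, recv_rank, src_buff, "o"))
            wait_buff = src_buff if fixed else "i"
            ops.append(("wait", send_rank, recv_rank, wait_buff, "o"))
    return ops


def unmatched_waits(ops):
    """Return channels with more waits than signals, and the excess."""
    signals = Counter(op[1:] for op in ops if op[0] == "signal")
    waits = Counter(op[1:] for op in ops if op[0] == "wait")
    return {ch: n - signals[ch] for ch, n in waits.items() if n > signals[ch]}


if __name__ == "__main__":
    print("buggy:", unmatched_waits(build_ops(4, fixed=False)))
    print("fixed:", unmatched_waits(build_ops(4, fixed=True)))
```

In this toy model, any channel reported by unmatched_waits has a wait that can never be satisfied, which would show up as a hang at run time.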

@cubele closed this as completed Jan 14, 2025
@Binyang2014
Contributor

Binyang2014 commented Jan 14, 2025

We are missing some checks in the current version. We are revising the current API and will then add more correctness checks.
