Infrastructure

Terraform is used to create the infrastructure needed to run ingest. Currently, scripts are provided to run the ingest on Google Cloud, but it can be run locally or on another cloud provider with appropriate setup.

Requirements

  • GCloud SDK (tested with 455.0.0)
  • Terraform (v1.6.3)
  • Helm (v3.13.2)
  • kubectl (v1.28.2)

Terraform (docs)

IMPORTANT: This setup assumes that a Bigtable instance has already been created. To reduce latency, it is recommended that all resources be co-located in the same region as the Bigtable instance.

The provided scripts create a VPC network, a subnet, a Redis instance, and a cluster with separately managed node pools for the master and workers. Customize the variables in terraform/terraform.tfvars to create the infrastructure in your Google Cloud project.
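
As a rough sketch (the variable names below are illustrative; use the ones actually declared by the provided scripts), terraform.tfvars might look like:

project_id = "<your_project_id>"
region     = "us-east1"
zone       = "us-east1-b"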

You will need at least the following roles:

  • Service Account Admin
  • Cloud Memorystore Redis Admin
  • Kubernetes Engine Cluster Admin
  • Compute Network Admin

Run the following to create required resources.

cd terraform/
export GOOGLE_OAUTH_ACCESS_TOKEN=$(gcloud auth print-access-token) # temporary token
terraform init # only needed the first time
terraform plan
terraform apply

This will output some variables useful for next steps:

kubernetes_cluster_context = "gcloud container clusters get-credentials chunkedgraph-ingest --zone us-east1-b --project neuromancer-seung-import"
kubernetes_cluster_name = "chunkedgraph-ingest"
project_id = "neuromancer-seung-import"
redis_host = "10.128.211.211"
region = "us-east1"
zone = "us-east1-b"

Use the value of kubernetes_cluster_context to connect to your cluster. Use the value of redis_host in helm/pychunkedgraph/values.yaml (more info in the Helm section).

You can also look these up again with terraform show from within the terraform/ directory.
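
For the example output above, connecting and verifying access would look like:

gcloud container clusters get-credentials chunkedgraph-ingest --zone us-east1-b --project neuromancer-seung-import
kubectl get nodes # confirm kubectl points at the new cluster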

Private Cluster

If your project does not have enough quota for in-use IP addresses in the given region, you may need to use a private cluster. GCP assigns an external IP to each VM that is part of a public cluster, which can easily exhaust the quota when you use a large number of workers.

To use a private cluster, add the following changes to terraform/cluster.tf within the google_container_cluster resource block.

  addons_config {
    http_load_balancing {
      disabled = true
    }
    network_policy_config { // Calico
      disabled = false
    }
  }

  network_policy {
    enabled = true
  }

  master_authorized_networks_config {}
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.16/28"
  }

You will also need to enable the Service Networking API.
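
This can be done from the console or with gcloud:

gcloud services enable servicenetworking.googleapis.com --project=<your_project_id>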

After this, run the terraform commands to create the required infrastructure. Once they complete, you will need to disable authorized networks to allow kubectl access to the GKE control plane.

gcloud container clusters update <CLUSTER_NAME> --no-enable-master-authorized-networks

This method allows us to use a private cluster easily. If you need to give cluster access only to specific IPs, you can follow the instructions in this guide; it involves a few additional components and steps to access the cluster with kubectl.

Docker image

Build the PyChunkedgraph docker image:

git clone https://github.com/seung-lab/PyChunkedGraph.git
cd PyChunkedGraph
gcloud builds submit . --tag=gcr.io/<your_project_id>/pychunkedgraph:<custom_tag> --project=<your_project_id>

and reference the image and tag in values.yaml (refer to example_values.yaml):

image:
  repository: gcr.io/<your_project_id>/pychunkedgraph
  tag: <custom_tag>
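
To confirm the image was pushed, you can list its tags:

gcloud container images list-tags gcr.io/<your_project_id>/pychunkedgraph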

Helm (docs)

Helm is used to run the ingest. The provided chart installs the Kubernetes resources (ConfigMaps, Secrets, and Deployments) needed to run the ingest. Refer to the example helm/pychunkedgraph/example_values.yaml file for more information.

IMPORTANT: If you have a large dataset to ingest, it is recommended to do this layer by layer. See scaling.

NOTE: Depending on your dataset, you will need to figure out the optimal cpu and memory limits for your worker deployments. To provide those resources, adjust the count and machine variables in terraform.tfvars. The optimal values can vary with chunk size, size of supervoxels (atomic segments in layer 1), number of edges per chunk, and so on.
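
For illustration only (a hypothetical sketch; the actual keys are defined in example_values.yaml), a worker deployment entry with resource limits might look like:

workerDeployments:
  - enabled: true        # hypothetical layout; match the keys in example_values.yaml
    resources:
      requests:
        cpu: "2"         # tune per dataset
        memory: "8Gi"
      limits:
        memory: "8Gi"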

Chart Installation

When all variables are ready, rename your values file to values.yaml (ignored by git because it can contain sensitive information). If a different name is preferred (for different datasets/projects), use the format values*.[yml|yaml], which is also ignored by git; the file name then needs to be passed to helm install explicitly with -f <values_file.yml>.
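
For example, to use a dataset-specific values file with the commands below (the file name here is just an example), add the -f flag:

helm install <release_name> . -f values_mydataset.yaml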

Then run:

cd helm/pychunkedgraph
helm dependencies update
helm install <release_name> . --debug --dry-run --set workerDeployments[0].enabled=true
helm upgrade <release_name> . # upgrade an existing release after changing values.yaml

You can enable one deployment and disable another without changing the values file. For example, when L2 chunks are done, L2 workers are no longer needed and their resources can be freed by disabling them (note that the default in the values file is enabled: false, so only enabling needs to be set explicitly).

helm upgrade <release_name> . --set workerDeployments[1].enabled=true

If the dry run is successful, run the same command without --dry-run. This will create the master and worker Kubernetes deployments.

Pods will have the dataset configuration mounted at /app/datasets; /app is the WORKDIR.

NOTE: Refer to dataset_config.md for more information about the config structure.

The number of workerDeployments in values.yaml corresponds to the number of layers in the graph. For example, a graph with 6 layers must have 5 deployments in workerDeployments (l2 - l6). To disable a deployment, set its enabled key to false; you may want this for higher-layer deployments because there will be no jobs for them until the lower layers finish.

Tracker workers listen for the completion of jobs at each layer and queue the parent chunks. Set trackerDeployments.count to the root_layer of the chunkedgraph. The root_layer can be determined as follows:

import numpy as np

# voxel_counts - size of the dataset along each axis, for example:
voxel_counts = np.array([4096, 4096, 1024])
# chunk_size - chunkedgraph chunk size, for example:
chunk_size = np.array([512, 512, 128])
# fanout - fan-out between layers of the chunkedgraph (commonly 2)
fanout = 2

chunks = np.ceil(voxel_counts / chunk_size).astype(int)
max_chunks = max(chunks)  # most chunks along any dimension
log_chunks = np.log(max_chunks) / np.log(fanout)
root_layer = int(np.ceil(log_chunks)) + 2
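
For the example values above, chunks = [8, 8, 8], so max_chunks = 8; with fanout = 2, log_chunks = 3 and root_layer = 5, meaning trackerDeployments.count should be set to 5.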

Ingest

The process is named ingest because you are ingesting data into a Bigtable table.

Pods should now be in Running status, provided there were no issues. Run the following to create a Bigtable table and enqueue jobs.

kubectl exec -ti deploy/master -- bash
# now you're in the container
> ingest graph <unique_test_bigtable_name> datasets/test.yml --test

RQ is used to create jobs; this library uses Redis as a task queue.

The --test flag will queue 8 child chunks that share the same parent. When the child chunk jobs finish, the worker listening on the t2 queue (tracker for layer 2) should enqueue the parent chunk. (To avoid race conditions, there must be only one worker listening on the tracker queue for each layer; the provided helm chart makes sure of this, but it is important not to forget.)

NOTE: For large datasets, the ingest graph command can take a long time because the queue is buffered so as not to exceed Redis memory. You will need to size the Redis instance accordingly. See the Scaling section for the recommended way to run ingest for large datasets.

You can check the progress with ingest status. In another shell, run:

kubectl exec -ti deploy/master -- bash
> ingest status
2       : 8 / 64 # progress at each layer
3       : 1 / 8
4       : 0 / 1 # this graph has 4 layers

The output should look like this if the run was successful. Now you can rerun the ingest without the --test flag (make sure to use a different bigtable name).

NOTE: Make sure to flush Redis (ingest flush_redis) after running an ingest and before another helm install. Leftovers from previous ingest runs can lead to inaccurate information.

Scaling

When creating a large chunkedgraph with tens of thousands of level 2 chunks, it is recommended to do it layer by layer. This is because there is simply too much information to keep track of when automatically enqueueing parent chunks, which easily exhausts Redis memory and disrupts the process. In addition, the enqueueing of parent chunks is slow because there can only be one tracker worker per layer.

To do this, add this env var to the helm/pychunkedgraph/values.yaml file:

env:
- name: &commonEnvVars "pychunkedgraph"
  vars:
    REDIS_HOST: "<redis_host>" # refer to output of terraform apply
    REDIS_PORT: 6379
    REDIS_PASSWORD: ""
    .
    .
    .
    DO_NOT_AUTOQUEUE_PARENT_CHUNKS: <any_non_empty_string>

This essentially disables all the tracking needed to automatically enqueue parent chunks. The trackerDeployments can be disabled as well because they will not be necessary.
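
After the helm upgrade, you can confirm the variable is set in the pods, for example:

kubectl exec deploy/master -- env | grep DO_NOT_AUTOQUEUE_PARENT_CHUNKS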

Follow the same process as above to enqueue level 2 chunks with the ingest graph command. When all of them complete you can proceed to layer 3.

> ingest layer 3

Then layer 4, and so on up to the max layer. Proceed to the next layer only after all the jobs in the current layer finish.
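
For example, for a graph whose root layer is 6, the remaining commands would be:

> ingest layer 4
> ingest layer 5
> ingest layer 6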

Troubleshooting

Retrying

If the ingest graph command is killed by mistake, or if Kubernetes restarts the master pod for any reason while ingest is in progress, you can restart the process with --retry. This option must only be used when the table has already been created and the ingest process was interrupted.

> ingest graph <same_name_as_above> datasets/test.yml --retry

Failed jobs

Sometimes jobs fail for any number of reasons. Assuming the causes were external, you can requeue them using rqx requeue <queue_name> --all. Refer to pychunkedgraph/ingest/cli.py and pychunkedgraph/ingest/rq_cli.py for more commands.

RQ Cheatsheet:

rq info # show info for all existing queues

rqx status <queue_name>
rqx failed <queue_name> # will print a list of failed jobs
rqx failed <queue_name> <job_id> # will print the traceback of error why the job failed
rqx requeue <queue_name> --all
rqx requeue <queue_name> <job_ids_space_separated>

Stuck jobs

Sometimes, even after all jobs have been queued and none of the workers are busy anymore, you might see a mismatch between completed jobs and total jobs at a given layer. This does not happen often; it is caused by preemptible VMs failing to report when they get killed. Ideally RQ should notice and mark the job as failed, but sometimes it doesn't.

To re-queue such jobs, you will need to log in to Redis and check which of them might be stuck.

If you are using a private cluster: apt will not work inside the pods because they have no public IP address, so you may want to build a new Docker image that already includes Redis (for example, the Dockerfile below) and recreate the cluster with it.

FROM seunglab/pychunkedgraph:graph-tool_dracopy
COPY override/timeout.conf /etc/nginx/conf.d/timeout.conf
COPY override/supervisord.conf /etc/supervisor/conf.d/supervisord.conf
COPY requirements.txt /app
RUN pip install pip==20.2 && pip install --no-cache-dir --upgrade -r requirements.txt \
    && apt update; exit 0
RUN apt install redis-server -y
COPY . /app

Log in to the master pod, install the Redis CLI, and connect:

kubectl exec -ti deploy/master -- bash
> apt update
> apt install redis -y
> redis-cli -h $REDIS_HOST
redis> keys rq:job:2_*

The command keys rq:job:2_* will give you a list of jobs stuck in the l2 queue. The output looks something like this:

1) "rq:job:2_319_290_14"
2) "rq:job:2_932_683_2"

Enqueue these chunks individually with ingest chunk <queue> <layer> <X> <Y> <Z>, for instance:

> ingest chunk l2 2 319 290 14
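
If there are many stuck chunks, a small loop like this (a sketch, assuming the rq:job:<layer>_<x>_<y>_<z> key format shown above) can re-queue them from inside the master pod:

redis-cli -h $REDIS_HOST --scan --pattern "rq:job:2_*" \
  | sed 's/^rq:job://' | tr '_' ' ' \
  | while read layer x y z; do ingest chunk l2 "$layer" "$x" "$y" "$z"; done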

Once all these stuck jobs finish, the completed count and the total count should match.