aqlaboratory · juliocesar-io · Aug 10, 2024 · Aug 27, 2024
diff --git a/notebooks/OpenFoldLocal.ipynb b/notebooks/OpenFoldLocal.ipynb
@@ -0,0 +1,360 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# OpenFold Local Notebook\n",
+    "\n",
+    "This notebook lets you run inference on a target sequence using a local Docker installation of [OpenFold](https://github.com/aqlaboratory/openfold), and visualize the results with the same plots used in the OpenFold Colab notebook.\n",
+    "\n",
+    "This notebook utilizes the provided utility functions to execute OpenFold via Docker. It includes logic to handle results, allowing you to experiment with different parameters, reuse computed MSAs, filter the best model, and plot metrics. It also supports asynchronous and long-running executions.\n",
+    "\n",
+    "If you have access to a machine and want to perform quick inference and visualize results, this notebook offers several useful features:\n",
+    "\n",
+    "- Use precomputed alignments, enabling you to run inference with different model parameters for result comparison.\n",
+    "- Identify the best model and generate metric plots.\n",
+    "- Manage long-running executions.\n",
+    "- Work with large datasets by splitting your input and performing asynchronous runs in parallel processes on multiple GPUs.\n",
+    "\n",
+    "While you can achieve this entirely through Docker commands in the terminal, you would need to write new code or adapt the Colab functions to work with local data. This notebook gives you a head start.\n",
+    "\n",
+    "**Citing this work**\n",
+    "\n",
+    "Any publication that discloses findings arising from using this notebook should [cite](https://github.com/deepmind/alphafold/#citing-this-work) DeepMind's [AlphaFold paper](https://doi.org/10.1038/s41586-021-03819-2).\n",
+    "\n",
+    "**Licenses**\n",
+    "\n",
+    "This notebook supports inference with the [AlphaFold model parameters](https://github.com/deepmind/alphafold/#model-parameters-license), made available under the Creative Commons Attribution 4.0 International ([CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode)) license. The notebook itself is provided under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0). See the full license statement below.\n",
+    "\n",
+    "**More information**\n",
+    "\n",
+    "You can find more information about how AlphaFold/OpenFold works in DeepMind's two Nature papers:\n",
+    "\n",
+    "*   [AlphaFold methods paper](https://www.nature.com/articles/s41586-021-03819-2)\n",
+    "*   [AlphaFold predictions of the human proteome paper](https://www.nature.com/articles/s41586-021-03828-1)\n",
+    "\n",
+    "An FAQ on how to interpret AlphaFold/OpenFold predictions is available [here](https://alphafold.ebi.ac.uk/faq)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Set up the notebook\n",
+    "\n",
+    "First, build OpenFold using Docker. Follow this [guide](https://openfold.readthedocs.io/en/latest/original_readme.html#building-and-using-the-docker-container).\n",
+    "\n",
+    "Then, go to the notebooks folder\n",
+    "\n",
+    "`cd notebooks`\n",
+    "\n",
+    "Create an environment to run Jupyter with the requirements\n",
+    "\n",
+    "`mamba create -n openfold_notebook python=3.10`\n",
+    "\n",
+    "Activate the environment\n",
+    "\n",
+    "`mamba activate openfold_notebook`\n",
+    "\n",
+    "Install the requirements\n",
+    "\n",
+    "`pip install -r src/requirements.txt`\n",
+    "\n",
+    "Start your Jupyter server in the current folder\n",
+    "\n",
+    "`jupyter lab . --ip=\"0.0.0.0\"`\n",
+    "\n",
+    "Access the notebook URL or connect remotely using VSCode.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Running Inference \n",
+    "\n",
+    "**Inputs:** files or strings with sequences\n",
+    "\n",
+    "**Output:** \n",
+    "\n",
+    "```bash\n",
+    "data/ \n",
+    "├── run_<date>_<run_id>/ # each run is stored with a random ID; this ID can be used to re-run inference\n",
+    "│   ├── fasta_dir/ \n",
+    "│   │   ├── tmp/ # one generated .fasta file per sequence\n",
+    "│   │   └── sequences.fasta # validated input sequences are merged into a single .fasta file\n",
+    "│   └── output/\n",
+    "│       ├── alignments/ # one folder of resulting MSAs per sequence\n",
+    "│       ├── msa_plots/ # one .png file per alignment\n",
+    "│       ├── predictions/ # inference results .pkl and .pdb files\n",
+    "│       ├── selected_predictions/ # best selected prediction and metric plots\n",
+    "│       └── timings.json # inference time\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Initialize the client"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import docker\n",
+    "from src.inference import InferenceClientOpenFold\n",
+    "\n",
+    "# Connect to the local Docker daemon using the environment configuration\n",
+    "docker_client = docker.from_env()\n",
+    "\n",
+    "# Alternatively, connect to a remote Docker daemon, e.g.:\n",
+    "# remote_docker_client = docker.DockerClient(base_url='tcp://<remote_server_ip>:2375')\n",
+    "\n",
+    "# Initialize the OpenFold Docker client, setting the database path\n",
+    "\n",
+    "databases_dir = \"/path/to/databases\"\n",
+    "\n",
+    "openfold_client = InferenceClientOpenFold(databases_dir, docker_client)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Inference using a sequence string"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# For multiple sequences, separate sequences with a colon `:`\n",
+    "input_string = \"DAGAQGAAIGSPGVLSGNVVQVPVHVPVNVCGNTVSVIGLLNPAFGNTCVNA:AGETGRTGVLVTSSATNDGDSGWGRFAG\"\n",
+    "\n",
+    "model_name = \"multimer\" # or \"monomer\"\n",
+    "weight_set = 'AlphaFold' # or 'OpenFold'\n",
+    "\n",
+    "# Run inference\n",
+    "run_id = openfold_client.run_inference(weight_set, model_name, inference_input=input_string)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Inference using a fasta file"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "input_file = \"/path/to/test.fasta\"\n",
+    "\n",
+    "run_id = openfold_client.run_inference(weight_set, model_name, inference_input=input_file)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Inference using pre-computed alignments for a run_id"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_name = \"monomer\"\n",
+    "weight_set = 'OpenFold'\n",
+    "\n",
+    "openfold_client.run_inference(weight_set, model_name, use_precomputed_alignments=True, run_id=run_id)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Metrics and Visualizations "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Get the MSA Plots for one sequence in a run"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from src.plot_msas import get_msa_plot\n",
+    "\n",
+    "# Provide the fasta sequence id and the run_id\n",
+    "get_msa_plot(run_id, fasta_id=\"<sequence_id>\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# To get the MSA plots for all sequence alignments in the run\n",
+    "get_msa_plot(run_id)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Get the best prediction by pLDDT and metrics"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from src.metrics import get_metrics_and_visualizations, plot_plddt_legend\n",
+    "\n",
+    "model_name = \"multimer\"\n",
+    "weight_set = 'AlphaFold'\n",
+    "\n",
+    "plot_plddt_legend()\n",
+    "get_metrics_and_visualizations(run_id, weight_set, model_name, \"<sequence_id>\", relax_prediction=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Concurrent/Async inference\n",
+    "\n",
+    "If you have multiple GPUs, you can run several inference experiments concurrently, assigning each one to its own device."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from concurrent.futures import ProcessPoolExecutor, as_completed\n",
+    "import signal\n",
+    "\n",
+    "def experiment_1():\n",
+    "    print(\"Experiment 1 is running\")\n",
+    "    input_file = \"/path/to/experiment_1.fasta\"\n",
+    "    gpu = \"cuda:0\"\n",
+    "    model_name = \"multimer\"\n",
+    "    weight_set = 'AlphaFold'\n",
+    "    run_id = openfold_client.run_inference(weight_set, model_name, inference_input=input_file, gpu=gpu)  \n",
+    "    return \"Experiment 1 completed\"\n",
+    "\n",
+    "def experiment_2():\n",
+    "    print(\"Experiment 2 is running\")\n",
+    "    input_file = \"/path/to/experiment_2.fasta\"\n",
+    "    gpu = \"cuda:1\"\n",
+    "    model_name = \"monomer\"\n",
+    "    weight_set = 'OpenFold'\n",
+    "    run_id = openfold_client.run_inference(weight_set, model_name, inference_input=input_file, gpu=gpu)\n",
+    "    return \"Experiment 2 completed\"\n",
+    "\n",
+    "experiments = [experiment_1, experiment_2]\n",
+    "\n",
+    "# Function to handle keyboard interrupt\n",
+    "def signal_handler(sig, frame):\n",
+    "    print(\"Interrupt received, stopping...\")\n",
+    "    raise KeyboardInterrupt\n",
+    "\n",
+    "# Register the signal handler\n",
+    "signal.signal(signal.SIGINT, signal_handler)\n",
+    "\n",
+    "try:\n",
+    "    # Execute tasks in parallel\n",
+    "    with ProcessPoolExecutor() as executor:\n",
+    "        futures = [executor.submit(task) for task in experiments]\n",
+    "        results = []\n",
+    "        for future in as_completed(futures):\n",
+    "            results.append(future.result())\n",
+    "    print(\"Results:\", results)\n",
+    "except KeyboardInterrupt:\n",
+    "    print(\"Execution interrupted by user.\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# License and Disclaimer\n",
+    "\n",
+    "This notebook and other information provided is for theoretical modelling only, caution should be exercised in its use. It is provided ‘as-is’ without any warranty of any kind, whether expressed or implied. Information is not intended to be a substitute for professional medical advice, diagnosis, or treatment, and does not constitute medical or other professional advice.\n",
+    "\n",
+    "## AlphaFold/OpenFold Code License\n",
+    "\n",
+    "Copyright 2021 AlQuraishi Laboratory\n",
+    "\n",
+    "Copyright 2021 DeepMind Technologies Limited.\n",
+    "\n",
+    "Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.\n",
+    "\n",
+    "Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.\n",
+    "\n",
+    "## Model Parameters License\n",
+    "\n",
+    "DeepMind's AlphaFold parameters are made available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You can find details at: https://creativecommons.org/licenses/by/4.0/legalcode\n",
+    "\n",
+    "\n",
+    "## Third-party software\n",
+    "\n",
+    "Use of the third-party software, libraries or code referred to in this notebook may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.\n",
+    "\n",
+    "\n",
+    "## Mirrored Databases\n",
+    "\n",
+    "The following databases have been mirrored by DeepMind, and are available with reference to the following:\n",
+    "* UniRef90: v2021\\_03 (unmodified), by The UniProt Consortium, available under a [Creative Commons Attribution-NoDerivatives 4.0 International License](http://creativecommons.org/licenses/by-nd/4.0/).\n",
+    "* MGnify: v2019\\_05 (unmodified), by Mitchell AL et al., available free of all copyright restrictions and made fully and freely available for both non-commercial and commercial use under [CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/).\n",
+    "* BFD: (modified), by Steinegger M. and Söding J., modified by DeepMind, available under a [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by/4.0/). See the Methods section of the [AlphaFold proteome paper](https://www.nature.com/articles/s41586-021-03828-1) for details."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/notebooks/src/__init__.py b/notebooks/src/__init__.py