diff --git a/README.md b/README.md
index c664ed3..ece8813 100644
--- a/README.md
+++ b/README.md
@@ -5,34 +5,42 @@
 The data analyzed by C-COMPASS typically derives from proteomics fractionation samples that result in compartment-specific protein profiles. Our tool can be used to analyze datasets derived from various experimental techniques.
 
 ## Key Features
+
 - **Protein Localization Prediction**: Use a neural network to predict the spatial distribution of proteins within cellular compartments.
 - **Dynamic Compartment Composition Analysis**: Model changes in compartment composition based on protein abundance data under various conditions.
 - **Comparison of Biological Conditions**: Compare different biological conditions to identify and quantify relocalization of proteins and re-organization of cellular compartments.
 - **Multi-Omics Support**: Combine your proteomics experiment with different omics measurements such as lipidomics to bring your project to the spacial multi-omics level.
 - **User-Friendly Interface**: No coding skills required; the tool features a simple GUI for conducting analysis.
 
-## System Requirements
-- 64-bit Windows Operating System
-- **No** Python Installation Required
+## Documentation
+
+Further documentation is available at https://c-compass.readthedocs.io/en/latest/.
 
 ## Installation
 
 ### Single-file executables
 
-Single-file executables that don't require a Python installation are available on the release page.
+Single-file executables that don't require a Python installation are available
+on the release page for Linux, Windows, and macOS. Download the appropriate
+file for your operating system and run it.
 
-On Windows, make sure to install the Microsoft C and C++ (MSVC) runtime libraries before ([further information](ttps://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170), [direct download](https://aka.ms/vs/17/release/vc_redist.x64.exe)).
+On Windows, make sure to install the Microsoft C and C++ (MSVC) runtime
+libraries first ([further information](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170),
+[direct download](https://aka.ms/vs/17/release/vc_redist.x64.exe)).
 
 ### Via pip
 
 ```bash
+# install
 pip install ccompass
+# launch the GUI
+ccompass
 ```
 
-### Prerequisites
-C-COMPASS requires Python>=3.10, and due to its `tensorflow` dependency Python<=3.12.
+Note that C-COMPASS currently requires Python >= 3.10 and, due to its
+`tensorflow` dependency, Python <= 3.12.
 
-#### Ubuntu
+On Ubuntu Linux, the `python3-tk` package must be installed:
 
 ```bash
 sudo apt-get install python3-tk
@@ -40,16 +48,14 @@ sudo apt-get install python3-tk
 
 ## Usage
 
-To launch the GUI, run the following command:
-
-```bash
-ccompass
-```
+See also https://c-compass.readthedocs.io/en/latest/usage.html.
 
 ### Graphical User Interface (GUI)
-- The GUI will guide you through the process of loading and analyzing your proteomics dataset, including fractionation samples and Total Proteome samples.
-- Follow the on-screen instructions to perform the analysis and configure settings only if required
-- Standard parameters should fit for the majority of experiments. You **don't need to change the default settings!**
+
+* The GUI will guide you through the process of loading and analyzing your proteomics dataset, including fractionation samples and Total Proteome samples.
+* Follow the on-screen instructions to perform the analysis and configure settings only if required.
+* Standard parameters should fit the majority of experiments.
+  You **don't need to change the default settings!**
 
 ### Command-Line Usage (Optional)
 You can also run the software via the command line:
@@ -68,23 +74,24 @@ You can also run the software via the command line:
 - Principal analysis steps and calculations will be kept as they are in version 1.0 unless changes are suggested by the reviewers.
 
 ### Contributing
+
 Contributions to C-COMPASS are welcome! To contribute:
+
 1. **Fork the repository** on GitHub.
 2. **Create a new branch** for your changes.
 3. **Commit your changes**.
 4. **Submit a pull request**.
 
 ### License
+
 C-COMPASS is licensed under the BSD 3-Clause License.
 
 ### Trouble-Shooting
-- **SmartScreen Warning**: If Windows blocks the application via SmartScreen, this is due to the software being unsigned. Please consult your IT department to bypass this restriction if necessary.
-- **Long Path Issues on Windows**: If your system encounters long path errors, you can activate them in your registry under 'HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem' by setting the value for **LongPathsEnabled* from 0 to 1.
+
+* **SmartScreen Warning**: If Windows blocks the application via SmartScreen, this is because the software is unsigned. Please consult your IT department to bypass this restriction if necessary.
+* **Long Path Issues on Windows**: If your system encounters long path errors, you can enable long paths in the registry under 'HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem' by setting the value of **LongPathsEnabled** from 0 to 1.
 
 ### Contact
 
-For any questions, contact daniel.haas@helmholtz-munich.de
-### Pre-Publication Information
-The software documentation to C-COMPASS is accessible under
-**/docs/build/html/index.html**
-and will be publicly available by the official release of C-COMPASS.
+For any questions, contact `daniel.haas@helmholtz-munich.de` or post an
+issue at https://github.com/ICB-DCM/C-COMPASS/issues/.
diff --git a/doc/changelog.rst b/doc/changelog.rst
index 9e27567..1952dab 100644
--- a/doc/changelog.rst
+++ b/doc/changelog.rst
@@ -1,5 +1,5 @@
-VII. Changelog
-==============================
+Changelog
+=========
 
 Version 1.0.0
 -------------
diff --git a/doc/contributing.rst b/doc/contributing.rst
index 06910ea..6ca1bd9 100644
--- a/doc/contributing.rst
+++ b/doc/contributing.rst
@@ -1,5 +1,5 @@
-V. Contributions
-=========================
+Contributing
+============
 
 We welcome contributions to C-COMPASS and encourage the community to participate in its development. Whether you are fixing bugs, adding new features, or improving documentation, your help is greatly appreciated.
 
@@ -14,8 +14,8 @@ Before starting major changes, it's a good idea to open an issue to discuss the
 
 We appreciate your time and effort in making C-COMPASS even better!
 
-VI. Pre-commit Hooks
---------------------
+Pre-commit Hooks
+----------------
 
 We use `pre-commit `__ hooks to ensure code quality and consistency.
 Pre-commit hooks automatically run checks
diff --git a/doc/faq.rst b/doc/faq.rst
index b29b82b..5b8ea59 100644
--- a/doc/faq.rst
+++ b/doc/faq.rst
@@ -1,12 +1,6 @@
-IV. Help
-================
+FAQ
+====
 
-1. FAQ
---------------------------------------
+**What if one or more of my gradients are missing some fractions?**
 
-    a. What if one or more of my gradients are missing some fractions?
-
-    A. By default, C-COMPASS analyses each replicate secparately which means it is not necessary that all fractionations are complete. However, if the number of remaining fractions is decreased, the prediction accuracy is affected. Furthermore, it can happen that the missing fraction is an important feature for a distinct compartment. You can check the correlation matrix for marker proteins and their median profile to evaluate how distinguishable your profiles still are.
-
-2. Trouble Shooting
---------------------
+By default, C-COMPASS analyzes each replicate separately, so not all fractionations need to be complete. However, fewer remaining fractions can reduce prediction accuracy, and a missing fraction may be an important feature for distinguishing a particular compartment. To evaluate how distinguishable your profiles still are, check the correlation matrix of the marker proteins and their median profiles.
diff --git a/doc/index.rst b/doc/index.rst
index 226443b..21e4205 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -1,5 +1,5 @@
-Welcome to CCMPS Documentation
-==============================
+Welcome to C-COMPASS Documentation
+==================================
 
 Introduction
 ------------
@@ -18,7 +18,6 @@ With C-COMPASS, users can perform comprehensive quantitative analyses of compart
    :caption: Contents:
 
    installation.rst
-   preparation.rst
    usage.rst
    faq.rst
    contributing.rst
diff --git a/doc/installation.rst b/doc/installation.rst
index fdff9fc..a9f1551 100644
--- a/doc/installation.rst
+++ b/doc/installation.rst
@@ -1,29 +1,7 @@
-I. Installation
-==============================
-
-System Requirements
---------------------
-
-- 64-bit Windows Operating System
-
-
-Running the Software
---------------------
-
-- Download the ZIP file from the repository or release section.
-- Extract the ZIP file to any location on your machine.
-- Navigate to the extracted folder.
-- Double-click `C-CMPS.bat` to start the application.
-- The software will initialize the portable Python environment and launch the GUI (this may take a few minutes).
-
-
-Command-Line Usage (optional)
-------------------------------
-
-You can also run the software via the command line:
-
-> python CCMPS.py
+Installation
+============
 
+See `https://github.com/ICB-DCM/C-COMPASS?tab=readme-ov-file#installation `__ for the latest installation instructions.
 
 Trouble-Shooting
 ----------------
diff --git a/doc/license.rst b/doc/license.rst
index eba28d4..b455393 100644
--- a/doc/license.rst
+++ b/doc/license.rst
@@ -1,41 +1,5 @@
-VI. License
-===========
-
-**Note**
-
-WinPython components are distributed as they were received from
-their copyright holder, under their own copyright and/or license,
-and without any linking with each other.
-
-WinPython software collection (i.e., the collection of software,
-libraries, and documents) is licensed under the terms of the
-following license agreement.
-
-**WinPython License Agreement (MIT License)**
-
-Copyright (c) 2012 Pierre Raybaut, 2016+ WinPython team
-
-Permission is hereby granted, free of charge, to any person
-obtaining a copy of this software and associated documentation
-files (the "Software"), to deal in the Software without
-restriction, including without limitation the rights to use,
-copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the
-Software is furnished to do so, subject to the following
-conditions:
-
-The above copyright notice and this permission notice shall be
-included in all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
-OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
-HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
-WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
-OTHER DEALINGS IN THE SOFTWARE.
-
+License
+=======
 
 **C-COMPASS (BSD 3-Clause License)**
 
diff --git a/doc/preparation.rst b/doc/preparation.rst
deleted file mode 100644
index 1edac81..0000000
--- a/doc/preparation.rst
+++ /dev/null
@@ -1,29 +0,0 @@
-II. Preparation
-===============
-
-Input Data
------------
-
-1. To analyze your spatial proteomics datasets, you need the proteomics report file(s) derived from your spectral search software, such as MaxQuant, Spectronaut, DIANN, or others. Your data must be reported as a **pivot report table**, meaning that your table includes one column per sample, as well as additional columns for further information. The necessary columns are:
-
-    a. One column per sample (fraction).
-    b. One column containing a **unique identifier** (e.g., protein groups, protein ID, etc.).
-    c. One column containing key names that match the key names in your marker list (usually gene names). Ensure these keys are compatible, including case sensitivity.
-
-2. Furthermore, you need a file containing your marker proteins. C-COMPASS provides prepared marker lists from previous publications, or you can use a custom export from a database relevant to your project. This file must include at least two columns:
-
-    a. A column containing key names matching those in your dataset (usually gene names, see II.1.c).
-    b. A column containing **class annotations** (for spatial proteomics experiments, this should represent the compartments where the marker proteins are located).
-
-3. An additional dataset containing the total proteomes of the fractionation samples (proteomes derived from whole cell/tissue lysate) can be provided for **class-centric analysis** of compartments. This file should contain:
-
-    a. One column per total proteome sample.
-    b. One column containing the **same unique identifier** as used in the fractionation samples (see II.1.b).
-
-
-Additional Notes
------------------
-
-A) All input files must be **tab-delimited** (.tsv or .txt).
-B) If using an export file from **Perseus**, ensure that the file does not contain a second-layer header.
-C) Input datasets (for both fractionation and total proteome) can be stored in the same file or split across different files. If they are split, ensure that the **identifiers** are consistent.
diff --git a/doc/usage.rst b/doc/usage.rst
index 96ce919..7d6adf4 100644
--- a/doc/usage.rst
+++ b/doc/usage.rst
@@ -1,146 +1,198 @@
-III. Usage Guide
-================
+===========
+Usage Guide
+===========
+
+0. Data Preparation
+===================
+
+* To analyze your spatial proteomics datasets, you need the proteomics report file(s) derived from your spectral search software, such as MaxQuant, Spectronaut, DIA-NN, or others. Your data must be reported as a **pivot report table**, meaning that your table includes one column per sample, as well as additional columns for further information. The necessary columns are:
+
+  * One column per sample (fraction).
+  * One column containing a **unique identifier** (e.g., protein groups, protein ID, etc.).
+  * One column containing key names that match the key names in your marker list (usually gene names). Ensure that these keys match exactly, including letter case.
+
+* Furthermore, you need a file containing your marker proteins. C-COMPASS provides prepared marker lists from previous publications, or you can use a custom export from a database relevant to your project. This file must include at least two columns:
+
+  * A column containing key names matching those in your dataset (usually gene names, see above).
+  * A column containing **class annotations** (for spatial proteomics experiments, this should represent the compartments where the marker proteins are located).
+
+* An additional dataset containing the total proteomes of the fractionation samples (proteomes derived from whole cell/tissue lysate) can be provided for **class-centric analysis** of compartments. This file should contain:
+
+  * One column per total proteome sample.
+  * One column containing the **same unique identifier** as used in the fractionation samples (see above).
+
+
+Additional Notes
+----------------
+
+* All input files must be **tab-delimited** (``.tsv`` or ``.txt``).
+* If using an export file from **Perseus**, ensure that the file does not contain a second-layer header.
+* Input datasets (for both fractionation and total proteome) can be stored in the same file or split across different files. If they are split, ensure that the **identifiers** are consistent.
+
 
 1. Graphical User Interface (GUI)
---------------------------------------
+=================================
 
-    a. C-COMPASS allows you to save and load your sessions via the main toolbar.
+C-COMPASS allows you to save and load your sessions via the main toolbar.
 
-    b. A session can be saved as a NumPy (.npy) file, which includes all datasets, marker lists, settings, analyses, trainings, and statistics. These will be fully restored upon loading.
+A session can be saved as a NumPy (``.npy``) file, which includes all datasets,
+marker lists, settings, analyses, trainings, and statistics. These will be
+fully restored upon loading.
 
-2. Pre-Training
---------------------
+2. Before training
+==================
 
-    a. **Data Import**
+#. **Data Import**
 
-        i. There are two tabs for data import: Fractionation and TotalProteome.
+   #. There are two tabs for data import: `Fractionation` and `TotalProteome`.
 
-        ii. Fractionation data can be analyzed independently, but TotalProteome is required for final class-centric statistics.
+   #. Fractionation data can be analyzed independently, but TotalProteome is
+      required for final class-centric statistics.
 
-        iii. Use the "Add file..." button to import datasets. Multiple datasets can be imported and will appear in the dropdown menu. To remove a dataset, select it from the dropdown and click "Remove."
+   #. Use the `Add file...` button to import datasets.
+      Multiple datasets can be imported and will appear in the dropdown menu.
+      To remove a dataset, select it from the dropdown and click `Remove`.
 
-        iv. The table will display all column names found in the selected dataset.
+   #. The table will display all column names found in the selected dataset.
 
-    b. **Sample Annotation**
+#. **Sample Annotation**
 
-        i. For Fractionation data: Assign the condition, replicate number, and fraction numbers by selecting the relevant column names and clicking the appropriate button.
+   #. For Fractionation data: Assign the condition, replicate number, and
+      fraction numbers by selecting the relevant column names and clicking the
+      appropriate button.
 
-        ii. For TotalProteome data: Follow the same steps as Fractionation data, using consistent condition names.
+   #. For TotalProteome data: Follow the same steps as Fractionation data,
+      using consistent condition names.
 
-        iii. Set the identifier column (e.g., ProteinGroups) for both Fractionation and TotalProteome datasets using the "Set Identifier" button. Ensure compatibility between these columns.
+   #. Set the identifier column (e.g., `ProteinGroups`) for both Fractionation and
+      TotalProteome datasets using the `Set Identifier` button.
+      Ensure that the identifiers are compatible between the two datasets.
 
-        iv. For other columns, either remove them or mark them as "Keep." Data marked as "Keep" will not be used in the analysis but will be available for export.
+   #. For other columns, either remove them or mark them as `Keep`.
+      Data marked as `Keep` will not be used in the analysis but will be
+      available for export.
 
-        v. **IMPORTANT**: Ensure that the column matching the marker list's naming (usually the gene name column) is kept.
+   #. **IMPORTANT**: Ensure that the column matching the marker list's naming
+      (usually the gene name column) is kept.
 
-    c. **Pre-Processing**
+#. **Pre-Processing**
 
-        i. Once columns are annotated, click "Process Fract." or "Process TP" to import the data.
+   #. Once columns are annotated, click `Process Fract.` or `Process TP`
+      to import the data.
 
-        ii. Fractionation and TotalProteome data can be processed independently.
+   #. Fractionation and TotalProteome data can be processed independently.
 
-    d. **Marker List Import**
+#. **Marker List Import**
 
-        i. In the "Marker Selection" frame, load marker lists via the "Add..." button. Multiple marker lists can be imported, and individual lists can be removed using the "Remove" button.
+   #. In the `Marker Selection` frame, load marker lists via the `Add...`
+      button. Multiple marker lists can be imported, and individual lists can
+      be removed using the `Remove` button.
 
-        ii. Imported marker lists will be displayed in the box.
+   #. Imported marker lists will be displayed in the box.
 
-        iii. For each marker list, specify the key column (e.g., gene names) and the class column (e.g., compartment).
+   #. For each marker list, specify the key column (e.g., gene names)
+      and the class column (e.g., compartment).
 
-        iv. In the "Fract. Key" section, select the column from the fractionation dataset that contains the compatible key naming. If the identifier and key column are the same, select "[IDENTIFIER]."
+   #. In the `Fract. Key` section, select the column of the fractionation dataset whose key names match those of the marker list. If the identifier and key columns are the same, select `[IDENTIFIER]`.
 
-    e. **Marker Check & Matching**
+#. **Marker Check & Matching**
 
-        i. Click "Manage..." to view all class annotations from the marker lists. Unselect any classes you do not want in the analysis or rename them.
+   #. Click `Manage...` to view all class annotations from the marker lists.
+      Unselect any classes you do not want in the analysis or rename them.
 
-        ii. Classes with different nomenclatures (e.g., "ER" vs. "Endoplasmic Reticulum") can be merged by giving them the same name.
+   #. Classes with different nomenclatures (e.g., ``ER`` vs. ``Endoplasmic Reticulum``) can be merged by giving them the same name.
 
-        iii. Median profiles of marker proteins and Pearson correlation matrices can be displayed via the corresponding buttons. Export options for plots and tables are available.
+   #. Median profiles of marker proteins and Pearson correlation matrices
+      can be displayed via the corresponding buttons.
+      Export options for plots and tables are available.
 
-        iv. Confirm your marker selection by clicking "Match!."
+   #. Confirm your marker selection by clicking `Match!`.
 
 3. Training
----------------
+===========
 
-    a. Start the training process by clicking "Train C-COMPASS."
+#. Start the training process by clicking `Train C-COMPASS`.
 
-    b. Various network architectures will be trained and evaluated for optimal results. This process may take over an hour, depending on dataset size.
+#. Various network architectures will be trained and evaluated for optimal results. This process may take over an hour, depending on dataset size.
 
-    c. Progress will be shown in the background console window.
+#. Progress will be shown in the background console window.
 
-    d. **Hint**: Save your session after training to avoid repeating the process.
+#. **Hint**: Save your session after training to avoid repeating the process.
 
-    e. **Note**: Future versions will optimize training time while maintaining calculation accuracy.
+#. **Note**: Future versions will optimize training time while maintaining calculation accuracy.
 
-4. Post-Training
---------------------
+4. After training
+=================
 
-    a. **Statistics**
+#. **Statistics**
 
-        i. After training, create "Static Statistics" via "Predict Proteome" to generate quantitative classifications for each condition.
+   #. After training, create `Static Statistics` via `Predict Proteome`
+      to generate quantitative classifications for each condition.
 
-        ii. Predictions can be exported or imported for comparison across sessions, ensuring compatible identifiers.
+   #. Predictions can be exported or imported for comparison across sessions,
+      provided that the identifiers are compatible.
 
-        iii. Use the "Report" button to export results.
+   #. Use the `Report` button to export results.
 
-        iv. Create simple plots and export them, along with the corresponding data tables.
+   #. Create simple plots and export them, along with the corresponding data tables.
 
-    b. **Conditional Comparison - Global Changes**
+#. **Conditional Comparison - Global Changes**
 
-        i. "Calculate Global Changes" compares localization across conditions, providing relocalization results.
+   #. `Calculate Global Changes` compares localization across conditions,
+      providing relocalization results.
 
-        ii. Results can be displayed and exported similarly to the statistics.
+   #. Results can be displayed and exported similarly to the statistics.
 
-    c. **Conditional Comparison - Class-centric Changes**
+#. **Conditional Comparison - Class-centric Changes**
 
-        i. **CPA (Class-centric Protein Amount)**: The amount of protein within a compartment, normalized by total proteome data. This is a relative value that requires comparison across conditions.
+   #. **CPA (Class-centric Protein Amount)**: The amount of protein within a compartment, normalized by total proteome data. This is a relative value that is only meaningful when compared across conditions.
 
-        ii. **CFC (Class-centric Fold-Change)**: The fold change of proteins across conditions within a compartment, based on CPA values. Only proteins with valid fractionation and total proteome data for both conditions will have CFC values.
+   #. **CFC (Class-centric Fold-Change)**: The fold change of proteins across conditions within a compartment, based on CPA values. Only proteins with valid fractionation and total proteome data for both conditions will have CFC values.
 
 5. Spatial Lipidomics
--------------------------
+======================
 
-    a. C-COMPASS has been used for spatial lipidomics analysis, though no dedicated feature currently exists for multi-omics analysis.
+#. C-COMPASS has been used for spatial lipidomics analysis, though no dedicated feature currently exists for multi-omics analysis.
 
-    b. You can concatenate proteomics and lipidomics datasets into one file before importing into C-COMPASS. Lipids will be treated like proteins, and spatial information can be derived similarly.
+#. You can concatenate proteomics and lipidomics datasets into one file before importing it into C-COMPASS. Lipids will be treated like proteins, and spatial information can be derived similarly.
 
-    c. Future versions of C-COMPASS will include features specifically designed for lipidomics.
+#. Future versions of C-COMPASS will include features specifically designed for lipidomics.
 
 6. Parameters
------------------
+=============
 
-    a. All parameters are set to default values used in our publication. It is not recommended to change them unless you are familiar with the procedure and its impact on results.
+#. All parameters are set to the default values used in our publication. We do not recommend changing them unless you are familiar with the procedure and its impact on the results.
 
-    b. **Parameters - Fractionation**
+#. **Parameters - Fractionation**
 
-        i. Parameters for analysis and visualization can be adjusted independently.
+   #. Parameters for analysis and visualization can be adjusted independently.
 
-        ii. **Min. valid fractions**: Profiles with fewer valid values across fractions can be filtered out.
+   #. **Min. valid fractions**: Profiles with fewer valid values across fractions than this minimum are filtered out.
 
-        iii. **Found in at least X Replicates**: Proteins found in fewer replicates than specified will be removed.
+   #. **Found in at least X Replicates**: Proteins found in fewer replicates than specified will be removed.
 
-        iv. **Pre-scaling**: Options include MinMax scaling or Area scaling.
+   #. **Pre-scaling**: Options include MinMax scaling or Area scaling.
 
-        v. **Exclude Proteins from Worst Correlated Replicate**: Removes the replicate with the lowest Pearson correlation.
+   #. **Exclude Proteins from Worst Correlated Replicate**: Removes the replicate with the lowest Pearson correlation.
 
-        vi. **Post-scaling**: Same options as Pre-scaling, useful for median profiles.
+   #. **Post-scaling**: Same options as Pre-scaling, useful for median profiles.
 
-        vii. **Remove Baseline Profiles**: Removes profiles with only 0 values after processing.
+   #. **Remove Baseline Profiles**: Removes profiles that contain only zeros after processing.
 
-    c. **Parameters - TotalProteome**
+#. **Parameters - TotalProteome**
 
-        i. **Found in at least X**: Similar to Fractionation data, this filters proteins found in fewer replicates.
+   #. **Found in at least X**: As for the Fractionation data, proteins found in fewer than X replicates are removed.
 
-        ii. **Imputation**: Missing values can be replaced by 0 or other values.
+   #. **Imputation**: Missing values can be replaced by 0 or other values.
 
-    d. **Parameters - Marker Selection**
+#. **Parameters - Marker Selection**
 
-        i. Discrepancies across marker lists can be handled by excluding markers or taking the majority annotation.
+   #. If marker lists assign the same marker to different classes, the conflict can be resolved either by excluding such markers or by taking the majority annotation.
 
-    e. **Parameters - Spatial Prediction**
+#. **Parameters - Spatial Prediction**
 
-        i. **WARNING**: Changes here are not recommended!
+   #. **WARNING**: Changes here are not recommended!
 
-        ii. Various upsampling, noise, and SVM filtering methods are available for marker prediction.
+   #. Various upsampling, noise, and SVM filtering methods are available for marker prediction.
 
-    f. **Other parameters** for network training and optimization can be configured, including dense layer activation, output activation, loss function, optimizers, and number of epochs.
+#. **Other parameters** for network training and optimization can be configured, including dense layer activation, output activation, loss function, optimizers, and number of epochs.
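
The input requirements that this patch moves into *0. Data Preparation* of `doc/usage.rst` (a tab-delimited pivot table, a unique identifier column, keys that match the marker list exactly, and no Perseus second-layer header) can be checked before import. The following is a minimal sketch using only the Python standard library. `check_input_table` is a hypothetical helper, not part of C-COMPASS, and the assumption that Perseus marks its second-layer header rows with `#!{` should be verified against your own export.

```python
import csv
import io


def check_input_table(text, identifier_col, key_col, marker_keys):
    """Return a list of problems found in a tab-delimited pivot report.

    Hypothetical pre-import check; it is not part of C-COMPASS.
    text: file contents; identifier_col / key_col: column names;
    marker_keys: key names (e.g. gene names) from the marker list.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    for col in (identifier_col, key_col):
        if col not in header:
            problems.append(f"missing column: {col}")
    if problems:
        return problems

    rows = list(reader)
    # Assumption: Perseus exports mark second-layer header rows with "#!{".
    if rows and (rows[0].get(header[0]) or "").startswith("#!{"):
        problems.append("second-layer (Perseus) header row found; remove it")
        rows = rows[1:]

    # The identifier must be unique across all rows.
    ids = [row[identifier_col] for row in rows]
    if len(ids) != len(set(ids)):
        problems.append(f"identifier column {identifier_col!r} is not unique")

    # Keys are compared with the marker list case-sensitively.
    keys = {row[key_col] for row in rows if row[key_col]}
    markers = set(marker_keys)
    if not keys & markers:
        case_only = {k.lower() for k in keys} & {m.lower() for m in markers}
        hint = " (they only match when ignoring case)" if case_only else ""
        problems.append(f"no keys in {key_col!r} match the marker list{hint}")
    return problems


report = "Protein IDs\tGenes\tF1\tF2\nP1\tLMNA\t1.0\t2.0\nP2\tCANX\t0.5\t0.1\n"
print(check_input_table(report, "Protein IDs", "Genes", ["LMNA", "TOMM20"]))  # []
```

Returning a list of messages rather than raising on the first problem lets all issues in a file be fixed in one pass.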
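
The two pre-scaling options listed under *Parameters - Fractionation* are not defined in this patch. Assuming the conventional definitions (MinMax maps each profile linearly onto [0, 1]; Area scaling divides each profile by its sum so that the values add up to 1), they could be sketched as follows. `minmax_scale` and `area_scale` are hypothetical names, and C-COMPASS may treat missing values or constant profiles differently from the choices made here.

```python
import math


def minmax_scale(profile):
    """Map a fraction profile linearly onto [0, 1]; NaN (missing) values stay NaN."""
    values = [v for v in profile if not math.isnan(v)]
    if not values:
        return list(profile)
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant profile: no shape information, so map every valid value to 0.
        return [v if math.isnan(v) else 0.0 for v in profile]
    return [v if math.isnan(v) else (v - lo) / (hi - lo) for v in profile]


def area_scale(profile):
    """Divide a fraction profile by its sum so the valid values add up to 1."""
    total = sum(v for v in profile if not math.isnan(v))
    if total == 0:
        return list(profile)
    return [v if math.isnan(v) else v / total for v in profile]


print(minmax_scale([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
print(area_scale([1.0, 1.0, 2.0]))    # [0.25, 0.25, 0.5]
```

MinMax keeps the shape of a profile but discards its total, whereas Area scaling makes the fractions comparable as shares of one whole.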
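
The option *Exclude Proteins from Worst Correlated Replicate* only states that the replicate with the lowest Pearson correlation is removed. One plausible reading, sketched below as an assumption rather than the C-COMPASS implementation, scores each replicate by its mean Pearson correlation with all other replicates and flags the lowest. `worst_correlated_replicate` is hypothetical, and each replicate is assumed to be a flat list of values with proteins and fractions in the same order.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant input; treated as uncorrelated here
    return cov / (sx * sy)


def worst_correlated_replicate(replicates):
    """Return the replicate with the lowest mean correlation to all others.

    replicates: dict of replicate name -> flat list of values (at least two
    replicates, same proteins and fractions in the same order; hypothetical layout).
    """
    scores = {name: [] for name in replicates}
    for a, b in combinations(replicates, 2):
        r = pearson(replicates[a], replicates[b])
        scores[a].append(r)
        scores[b].append(r)
    return min(scores, key=lambda name: mean(scores[name]))


reps = {"R1": [1, 2, 3, 4], "R2": [1.1, 2.0, 3.1, 3.9], "R3": [4, 1, 3, 2]}
print(worst_correlated_replicate(reps))  # R3
```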
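
The two strategies for conflicting marker annotations under *Parameters - Marker Selection* can be illustrated with a small vote over several `{key: class}` mappings. This is a hedged sketch: `merge_marker_lists` is a hypothetical helper, and treating a tie as a conflict (so the marker is dropped even under the majority strategy) is an assumption, not documented C-COMPASS behavior.

```python
from collections import Counter


def merge_marker_lists(marker_lists, strategy="majority"):
    """Combine several {key: class} marker lists into one mapping.

    strategy="exclude":  drop every key whose annotations disagree.
    strategy="majority": keep the most frequent class; ties are dropped
                         (an assumption, not documented C-COMPASS behavior).
    """
    if strategy not in ("majority", "exclude"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    votes = {}
    for markers in marker_lists:
        for key, cls in markers.items():
            votes.setdefault(key, []).append(cls)

    merged = {}
    for key, classes in votes.items():
        ranked = Counter(classes).most_common()
        if len(ranked) == 1:
            merged[key] = ranked[0][0]  # all lists agree
        elif strategy == "majority" and ranked[0][1] > ranked[1][1]:
            merged[key] = ranked[0][0]  # clear majority
    return merged


lists = [
    {"LMNA": "Nucleus", "CANX": "ER"},
    {"LMNA": "Nucleus", "CANX": "Mitochondrion"},
    {"LMNA": "Nuclear envelope", "TOMM20": "Mitochondrion"},
]
print(merge_marker_lists(lists))             # {'LMNA': 'Nucleus', 'TOMM20': 'Mitochondrion'}
print(merge_marker_lists(lists, "exclude"))  # {'TOMM20': 'Mitochondrion'}
```

Class names should be harmonized first (as described under *Marker Check & Matching*); otherwise ``ER`` and ``Endoplasmic Reticulum`` would count as a disagreement.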