Analysis mode and input parameters

### A template for this input parameter file can be downloaded here.

ProteinClusterQuant analysis uses an input parameter file that is a text file containing a set of property = value pairs as described here:

Algorithm main parameters, thresholds and filters:

inputType = FILE_TYPE
This parameter takes into consideration the type/format of data for input into the program. The possible values for the file type are:

CENSUS_CHRO - for data that was acquired based on isobaric isotopologue labeling of peptides
CENSUS_OUT - from ip2 for data that was acquired in silac format.
SEPARATED_VALUES - tab delimited data file containing the following columns:
- Column 1: PSM identifier
- Column 2: peptide sequence
- Column 3: ratio value (not log values!!).
- Column 4: ratio weight. This value can be any value measuring the quality of the ratio measurement. For example, for iTRAQ ratios, it could be the intensity of the higher peak contributing to the ratio; for SILAC it could be a measurement of the MS1 area divided by the fitting curve quality.
- Column 5: protein accession [OPTIONAL]. This column can be avoided if a fasta file is provided with the 'fastaFile' parameter.

isobaricRatioType = Rc/Ri
This paramter defines which type of ratio is used when using isobaric isotopologues approach, that is, when inputType = CENSUS_CHRO.
If this parameter is used with other inputType different than CENSUS_CHRO, it will throw an error.
Similarly, if this parameter is empty or is not present when inputType - CENSUS_CHRO, it will throw an error.

quantChannels = LIGHT/HEAVY
In case of having input files containing more than 2 channels (Light over Heavy), you should specify here which ones you want to use for getting the ratios of the analysis.
The format is a ratio of supported labels (separating them by '/').
Possible values of labels are: LIGHT, HEAVY, MEDIUM, TMT_6PLEX_126, TMT_6PLEX_127, TMT_6PLEX_128, TMT_6PLEX_129, TMT_6PLEX_130, TMT_6PLEX_131, N14, N15.
Default value if not provided: LIGHT/HEAVY

ignorePTMs = TRUE/FALSE
In case this toggle is set to TRUE, peptides modified and not modified will be considered as the same peptide. Otherwise, they will be considered as different and therefore, they will be displayed in different peptide nodes.

collapseIndistinguishablePeptides = TRUE/FALSE
In case the toggle is set to TRUE peptide nodes of individual peptides that are connected to exactly the same protein nodes are collapsed into one single peptide node. It reduces peptide node redundancies in connected components.
Default value if not provided: TRUE

collapsePeptidesBySites = TEXT, TEXT, ...
In case of having site specific quantitative data, you can specify here in which aminoacid(s) you get the ratios for, and therefore PCQ will perform differently 2 things:

PCQ will parse isobaric isotopologues ratios accordingly to the sites (discarding isobaric ratios from ions that do not exclusively contribute to the site specificity of the quantitation).
PCQ will collapse peptide nodes by sites in proteins. So different peptides that cover the same site of the same proteins, are collapsed together.

The values of this property is a comma separated list of aminoacid letters, or just one aminoacid letter if the quantitation has only been performed in one particular aminoacid.
If this property is not present or is empty, the peptides will be collapsed by how they are shared with proteins.

collapsePeptidesByPTMs = TEXT, TEXT, ...
In case of having a PTM analysis you may want to quantify the PTM positions in the proteins. In other words, quantitative values for peptides containing the same PTM will be averaged together even when they are not coming from peptides with the same sequence.
For that purpose, this parameter will specify which PTMs you want to use to collapse quantitative values.
The value of this parameter is a comma separated string collection in which each string is like:

"+79.96@PST", meaning, ptms with a mass shift of +79.96 at P,S or T aminoacids
"K", meaning, any ptm at K aminoacid
"+79.96", meaning, any aminoacid with mass shift of +79.96

createProteinPTMStates = TRUE/FALSE
In case of enabling this parameter (TRUE), protein nodes connected to peptides with modifications will be split to represent the different modifications states that a protein may have, keeping all the possibilities based on their peptides containing PTMs.
For example, for a protein p with 3 different PTM sites (PTM1-3), there will be 2^3 = 8 possible protein nodes: non modified p, p with PTM1,p with PTM2,p with PTM3, p with PTM1 and PTM2, p with PTM1 and PTM3, p with PTM2 and PTM3, p with PTM1 and PTM2 and PTM3.
In case of having collapsePeptidesByPTMs=FALSE, this parameter will be ignored.
Default value if not provided: FALSE

maxNumPTMsPerProtein = NUMERIC_VALUE
In case of setting parameters collapsePeptidesByPTMs and createProteinPTMStates set to TRUE, PCQ will collect all detected PTM sites for each protein and then, it will create a protein node with all combinations of PTMs that support the detected peptides.
The example above shows how a protein with 3 different PTMs can derive in 8 different protein nodes. However, you can imagine that a protein with multiple PTM sites could end up generating a huge number of protein nodes which most of them are not interesting. For that reason, this parameter will limit when a protein with multiple PTMs will be split into multiple protein nodes, and that limit will be determined by the number of different PTM sites in the protein.
Default value if not provided: 4
Recommended values: any value less than 8.

collapseIndistinguishableProteins = TRUE/FALSE
In case the toggle is set to TRUE protein nodes of individual proteins that are connected to exactly the same peptide nodes are collapsed into one single protein node. It reduces protein node redundancies in a connected component.
Default value if not provided: TRUE

removeFilteredNodes = TRUE/FALSE
In case this toggle is set to TRUE and some filters are applied, if some peptide node is filtered-out, it will be removed from the network, and therefore the network will be rearranged accordingly.
In case of being set to FALSE, filtered-out nodes will remain in the output Cytoscape networks (.XGMML files), but they will be color in gray.
In any case, filtered-out nodes will not be considered for the ratio integration algorithm or for the classification schema.
Default value if not provided: TRUE

onlyOneSpectrumPerChromatographicPeakAndPerSaltStep = TRUE/FALSE
This parameter is only valid MS1-based quantitation data (like SILAC) and inputType = CENSUS_OUT.
When this parameter is set TRUE, it means that that only the best PSM will be taken from each chromatographic peak and salt step irrespective whether the quantification was done based on the heavy or the light peak that was identified. It prevents that a chromatographic pair gets included in the calculations twice because it was identified in light as well as in heavy. Note that using DTASelect the parameter -t 1 should be used prior to quantification and use of PCQ.

ionsPerPeptideNodeThresholdOn = TRUE/FALSE
Toggle that when set to TRUE a minimum amount of ions per peptide node is required in order to not being discarded.
Default value if not provided: FALSE

ionsPerPeptideNodeThreshold = NUMERIC_VALUE
This parameter works with the parameter ionsPerPeptideNodeThresholdOn to set the minimum amount of ions per peptide node to consider it in further analysis.
This filter is only applied if fileType = CENSUS_CHRO, that is, for data acquired based on isobaric isotopologue labeling of peptides.
Default value if not provided: 0

psmsPerPeptideNodeThresholdOn = TRUE/FALSE
Toggle that when set TRUE, checks if there is a threshold for minimum PSMs per peptide node.
Default value if not provided: FALSE

psmsPerPeptideNodeThreshold = NUMERIC_VALUE
This parameter works with the parameter 'psmsPerPeptideNodeThresholdOn' to define a minimum number of PSMs per peptide node to consider it in further analysis.
Default value if not provided: 0

replicatesPerPeptideNodeThresholdOn = TRUE/FALSE
Toggle that when set TRUE, checks if there is a threshold for minimum Replicates per peptide node.
Default value if not provided: FALSE

replicatesPerPeptideNodeThreshold = NUMERIC_VALUE
This parameter works with the parameter 'replicatesPerPeptideNodeThresholdOn' to define a minimum number of Replicates per peptide node to consider it in further analysis.
Default value if not provided: 0

skipSingletons = TRUE/FALSE
This parameter is only valid MS1-based quantitation data (like SILAC) and inputType = CENSUS_OUT.
When this parameter is set to TRUE, the singletons (quantified peptides only detected in one of the two conditions) are skipped and not considered in further analysis.
Default value if not provided: FALSE

labelSwap = TRUE/FALSE
This parameter should only be set to TRUE when a label swap experiment is analyzed, e.g. experiments are analyzed simultaneously in which sample 1 was labeled light and sample 2 heavy in one experiment(s) and sample 1 labeled heavy and sample 2 labeled light in another set of experiments.
Default value if not provided: FALSE

useMajorityRulesForInfinities = TRUE/FALSE
This parameter states how to proceed when averaging set of ratios containing INFINITY values (either positive or negative).
If this parameter is set to true, the mayority rule is applied, which means that if there are more non-infinity ratios than infinities, then it will report the average of the ratios, otherwise report the infinity.
More detailed cases are shown in these examples:

If useMajorityRulesForInfinities=TRUE and we have +INF, +INF, +INF, +2.5, +1.5, then the averaged value will be +INF.
If useMajorityRulesForInfinities=FALSE and we have +INF, +INF, +INF, +2.5, +1.5, then the averaged value will be +2.0.
If useMajorityRulesForInfinities=TRUE and we have +INF, +INF, +INF, +2.5, +1.5, +4.0 then the averaged value will be +4.0.
If useMajorityRulesForInfinities=FALSE and we have +INF, +INF, +INF, +2.5, then the averaged value will be +2.5.
Default value if not provided: TRUE

Peptide node ratio calculation algorithm parameters

Minimal parameters needed to perform the quantitative analysis within the protein-peptide network based on the publication Mol Cell Proteomics. 2011 Jan;10(1):M110.003335. doi: 10.1074/mcp.M110.003335. Epub 2010 Aug 31.

performRatioIntegration = TRUE/FALSE
The parameter toggles whether PCQ does apply the SanXot algorithm to calculate the final peptide node ratios or not. If the parameter is set to FALSE, the final peptide node ratios will be calculated as average of all individual quantitative ratios (in case of SILAC data) or as Ion Count ratios (Rc) (see publication for details) in case of isobaric isotopologues quantitation.
Default value if not provided: FALSE

sanxotPath = FOLDERPATH
If performRatioIntegration is enabled, SanXot scripts will be used in order to calculate the final peptide node ratios.
SanXot scripts are available under request to the Jesús Vázquez Cobos proteomics group at the CNIC, Spain.

outliersRemovalFDR = NUMERIC_VALUE
At each level of quantitative data integration (for example from PSM to peptide) outlier values are removed in case they are below a specified FDR. It allows to remove measurement values that are a result of an error of measurement. In case no numeric value is specified, the removal of outliers will not be performed.
Examples: outliersRemovalFDR = 0.01

significantFDRThreshold = NUMERIC_VALUE
This parameters refers to a threshold applied in the final ratio of the peptide node calculated after the ratio integration algorithm. The algorithm will report a FDR associated to each final peptide node ratio, and this threshold will determine if the peptide node is considered as significantly changing.
If this parameter is set, some statistics about the number of significantly regulated peptide nodes will be available in the summary file and an additional network file (XGMML) will be create containing the cluster that contains at least one peptide node with a FDR under the threshold.
If not provided, no statistics and additional network file is created.
Example: significantFDRThreshold = 0.05

Protein pair analysis parameters

applyClassificationsByProteinPairs = TRUE/FALSE
If this parameter is set to TRUE, the protein pair analysis is performed. If is set to FALSE, all the other parameters in this section will be ignored.
Default value if not provided: FALSE

thresholdForSignificance = NUMERIC_VALUE
This parameter dictates the threshold of the fold chance (not as log2 value) in determining significantly regulated or not in the protein pair analysis. This parameter is for marking significance and the higher the value, the larger the difference in ratios of peptides or proteins required to be considered significantly regulated.
This parameters is ignored if applyClassificationsByProteinPairs = FALSE.
Default value if not provided: 2

statisticalTestForProteinPairApplied = TRUE/FALSE
If this parameter is set to TRUE, a Iglewicz-Hoaglin statistical test is applied for detecting outliers in a protein pair analysis.
This test will only be performed when having enough measurements (10 individual ratios) in the protein pair.
Default value if not provided: FALSE

iglewiczHoaglinTest = NUMERIC_VALUE
This parameter sets the value for the Iglewicz-Hoaglin test. Usually 3.5 is considered a standard threshold value. This is a test applied in the protein pair analsysis to check if there are any peptide node outliers in each protein pair when is enabled in the previous parameter. If the results from this test equal or is greater than the value of the variable, it will be considered an outlier in the protein pair analysis.
The value calculated for this test is, giving a protein pair and testing the peptide node with ratio xi, the result of the test is calculated as Mi=0.6745(xi-x~)/MAD where x~ is the median of the population and MAD is the median absolute deviation of the population, considering the population all the individual PSM-level ratios in the rest of the peptide nodes of the protein pair.
Default value if not provided: 3.5
This parameter is ignored if significantProteinPairAnalysis = FALSE or if statisticalTestForProteinPairApplied = FALSE.

uniquePepOnly = TRUE/FALSE
This parameter, when set to TRUE, directs PCQ to compile protein pairs with peptide nodes that are unique within a connected component. When set to FALSE a 'protein pair centric' view is applied meaning that protein pairs are assembled independent of whether a peptide node for a protein pair is unique within the connected component.
Default value if not provided: TRUE

Peptide sequence alignment parameters

As an option for the user, peptide-to-peptide sequence alignments can be performed in order to detect analogous and orthologous sequences and cluster them in the same connect component. This could be useful for a multi-species comparison analysis.

makeAlignments = TRUE/FALSE
This parameter, when set to TRUE, aligns the peptides by similarity and denotes the similarity by an edge within the output protein-peptide network (XGMML files).
This parameter, as well as the rest of the parameters in this section are based of the Needleman-Wunsch alignment algorithm. This algorithm takes into consideration point mutations, deletions and insertions in peptide sequences and uses the BLOSUM62 matrix.
Note that it is only executed in case peptide nodes are based on single, individual peptide sequences (e.g. the value of the parameter 'collapseIndistinguishablePeptides' is set to FALSE), otherwise PCQ will report an error.
Default value if not provided: FALSE

finalAlignmentScore = NUMERIC_VALUE
This parameter sets the limit for the final alignment score from the Needleman-Wunsch alignment algorithm. Only alignments of peptide sequences with an score greater than that will be reported in the results.
Default value if not provided: 30
Example: finalAlignmentScore = 30

sequenceIdentity = NUMERIC_VALUE
This parameter sets the threshold on the sequence identify score from the Needleman-Wunsch alignment algorithm. Only alignments above that threshold will be considered in the analysis.
Default value if not provided: 0.8.
Example: sequenceIdentity = 0.8

minConsecutiveIdenticalAlignment = NUMERIC_VALUE
This value determines how many aminoacids have to consecutively be identical to be considered a valid alignment.
Default value if not provided: 6.
Example: minConsecutiveIdenticalAlignment = 6

PSEA-Quant input file generator

writePSEAQuantInputFiles = TRUE/FALSE
Whether to write or not output files that will be able to serve as input files for the PSEA-Quant annotation enrichment analysis.
Default value if not provided: TRUE

Parameters to properly read in the database and assign peptides to proteins

In case the input files doesn't contains all the peptide-to-protein connections (because the software who generated them remove some of the subset proteins or performed a protein inference heuristic), you can use a protein sequence database in order to map all the possible peptide-to-protein connections without losing any isoform for example.
Note: These parameters should be set according to the initial parameters used to search the database for identified peptide hits as well as how the search results were filtered.

enzymeArray = TEXT, TEXT, ...
This parameter specify the aminoacids where protease cleaves. Note: amino acids that are recognized as cleavage site are provided as comma-separated (',') values.
Trypsin cleaves at K and R, so 'K,R' should set here.
LysC cleaves at K only, so 'K' should be set here.
Default value if not provided: K,R
Example: enzymeArray = K,R

missedCleavages = NUMERIC_VALUE
This parameter allows the user to select how many missed cleavages were allowed in the search of the spectra.
Default value if not provided: 0
Example: missedCleavages = 2

semiCleavage = TRUE/FALSE
This parameter allows the user to allow semi cleavaged peptides or not.
Default value if not provided: FALSE
Example: semiCleavage = TRUE

discardDecoys = TEXT
Regular expression for discarding decoys from input files.
If empty or not provided, no decoys will be discarded from input files.
Example: discardDecoys = Reverse

ignoreNotFoundPeptidesInDB = TRUE/FALSE
If this parameter is set to TRUE, PCQ will check that all the peptides provided in the input files are found in the FASTA file (if provided). This is useful in order to check that the fasta in-silico digestion parameters are set correctly as well as in order to ensure that the correct database is being used.
Default value if not provided: FALSE

fastaFile = FILEPATH
Full path in which a FASTA database file is located

lookInUniprotForProteoforms = TRUE/FALSE
If this parameter is TRUE, PCQ will consider all the proteoforms for the input proteins that are available in UniprotKB. This will include available isoforms but also alternative products, mutations, sequence conflicts, etc... Even if these proteins were not searched before, they will be taken into account when building the network between proteins and peptides.
Default value if not provided: FALSE

Input files and folders parameters

inputFilePath = FILEPATH
Full path of the folder in which all the input files are located.
Default value if not provided: the system user's home folder.
Example: inputFilePath = C:\Users\Salva\Desktop\data\PINT projects\Fibulin\LacZ

inputFiles = EXP_NAME[FILE_NAME1, FILE_NAME2, ...] | EXP2_NAME[FILE_NAME4, FILE_NAME5, ...]
This parameter specifies the quantification input files to be used and groups them into groups, in order to be treated properly. EXP_NAME is an internal tag for the group of files provided subsequently between brackets. Several groups are separated by '|'. Several file names are separated by ','. File names don't include path names, because they are expected to be found in the folder specified by the inputFilePath parameter.
This parameter is required.
Examples:

DmDv[DmDv_rep1.xml, DmDv_rep2.xml, DmDv_rep3.xml] | DvDm[DvDm_rep1.xml, DvDm_rep2.xml, DvDm_rep3.xml]
A_Embryos[A_Embryos_rep1.xml, A_Embryos_rep2.xml, A_Embryos_rep3.xml]
isobaric_HEK[isobaric_HEK.xml]
CFBEvsHSB[CFBEvsHBE.xml]
Fibulin_LacZ[Fibulin_LacZ_rep1.txt, Fibulin_LacZ_rep2.txt, Fibulin_LacZ_rep3.txt]

inputIDFiles = EXP_NAME[FILE_NAME1, FILE_NAME2, ...] | EXP2_NAME[FILE_NAME4, FILE_NAME5, ...]
This parameter specifies the identification input files (DTASelect-filter.txt files) optionally to be used to complement the network with identified but not quantified nodes.
This parameter is optional.

uniprotReleasesFolder = FILEPATH
This parameter points to the folder where the UniProt annotations are going to be automatically stored in the machine were PCQ runs. UniProt annotations are retrieved from the latest version of UniProtKB in order to associate gene names, taxonomies, etc... that may not be available in the protein description line provided in the input files.
This process may take some time and some disk space, depending on the dataset size. However, once the annotations are locally stored, PCQ will retrieve the annotations from the local resource, taking just a few seconds.
Default value if not provided: the system user's home directory
Example: uniprotReleasesFolder = C:\Users\Salva\Desktop\tmp\uniprotKB

uniprotVersion = FORMATED_DATE
This parameter states the Uniprot release to use, corresponding to the subfolder name created in the folder stated by the uniprotReleasesFolder.
Leave blank for getting latest version annotations. In that case, each month (every two months from January 2020), UniProtKB releases a new version of the database and therefore PCQ will retrieve the information from Uniprot again and will store a new version of the annotations, creating a subfolder with the release date.
Default value if not provided: empty, which means, the latest version Example: uniprotVersion = 2016_07

outputFilePath = FILEPATH
This parameter points to the folder in which all output files will be generated on computer.
Default value if not provided: system user's home folder.
Example: outputFilePath = C:\Users\Salva\Desktop\data\Fibulin\LacZ\quant

outputPrefix = TEXT
Prefix appended to all output files.
This parameter is required.
Example: outputPrefix = Fibulin_LacZ_OLD_DB

outputSuffix = TEXT
Suffix appended to all output files.
This parameter is required.
Example: outputSuffix = 3PSMsPerPeptideNode_NoSingletons

recognizeACCFormat = TRUE/FALSE
If this parameter is set to TRUE, PCQ will try to extract a known formatted accession from the protein accessions in the input files. For example, having sp|P12345|P2P_HUMAN, it will be recognized as an Uniprot accession as P12345. NCBI gi and IPI accessions can also be recognized.
Set this parameter to FALSE, any accession will be taken as it is in the input files without any particular parsing. This will be appropriate for non standard accessions in order to save parsing time.
Default value if not provided: TRUE

printPTMPositionInProtein = TRUE/FALSE
If this parameter is set to TRUE, PCQ will print two new columns:

one with the position of the PTM of interest on the protein sequence and
another one with the position of the PTM on the peptide.

Default value is not provided: TRUE

Species comparison parameters

These parameters are meant for experiments in which each of the sample used for relative quantification may be from a different species. If two different species have been labelled heavy and light, respectively, these parameters will allow the program to estimate the number of incorrect quantifications. Needs to match with the scientific annotation of the species in Uniprot.

ignoreTaxonomies = TRUE/FALSE
When this is set to TRUE, taxonomies from proteins will be ignored and the following parameters will be also ignored.
This parameter should be set to TRUE when analyzing single species samples if the number of proteins is big and the proteins have not UniprotKB accessions, because to parse taxonomy from protein description is expensive for big datasets.
Default value if not provided: FALSE

lightSpecies = TEXT
Species that was labeled light. When this parameter is set, some statistics are shown in the output summary file.
Example: lightSpecies = Drosophila virilis

heavySpecies = TEXT
Species that was labeled heavy. When this parameter is set, some statistics are shown in the output summary file.
Example: heavySpecies = Drosophila virilis

Graphic options for networks exported to Cytoscape (XGMML files)

Color settings are encoded in hexadecimal color code following a ‘#’ symbol

printOnlyFirstGene= TRUE/FALSE
When this is set to TRUE, the program prints only the first gene when collapsing several proteins. This is applicable to the output files, but it is particularly useful for the tooltip on the proteins in the Cytoscape network visualization in order to avoid over-crowded tooltip text boxes.
Default value if not provided: TRUE

colorsByTaxonomy = TEXT, HEX_COLOR_CODE
Defines the color of the protein nodes depending to the taxonomy of the protein(s) in the node. It is a comma separated string were the first string is the scientific name of the taxonomy and the second string is the HEX_COLOR_CODE color code. If more than one taxonomy is present, it should be concatenated after a comma (see example).
Default value if not provided: WHITE color
Example: colorsByTaxonomy = Drosophila Virilis,#ff0000,Drosophila Melanogaster,#00ff00

colorMultiTaxonomy = HEX_COLOR_CODE
Defines the color the protein nodes that contains more than one taxonomy.
Default value if not provided: WHITE color
Example: colorMultiTaxonomy = #123456

colorAlignedPeptidesEdge = HEX_COLOR_CODE
Defines the color of the edge which connects two peptide nodes that have been aligned (makeAlignments = TRUE).
Default value if not provided: RED color

proteinLabel = PROTEIN_LABEL_TYPE
This parameter indicates which information will be displayed as the label of the protein nodes in the output Cytoscape networks. The possible values of this parameter are:

ACC. The labels on the protein nodes will be as: 'P02649' or 'P02649, P02650'.
ID. The labels on the protein nodes will be as: 'APOE_HUMAN'
GENE. The labels on the protein nodes will be as: 'APOE'
Default value if not provided: ACC

minimumRatioForColor = NUMERIC_VALUE
This parameter states the lower value of a log2 ratio range defining the gradient of colors between colorRatioMin and colorRatioMax.
Any log2 ratio value lower than this value will be showed with the color defined in parameter colorRatioMin.
Default value if not provided: -10.

maximumRatioForColor = NUMERIC_VALUE
This parameter states the higher value of a log2 ratio range defining the gradient of colors between colorRatioMin and colorRatioMax.
Any log2 ratio value higher than this value will be showed with the color defined in parameter colorRatioMax.
Default value if not provided: 10.

colorRatioMin = HEX_COLOR_CODE
This parameter defines lower color of the color gradient for the peptide nodes.
The peptides nodes will be shown with a color depending on their final peptide node ratio value and the ratio range defined by minimumRatioForColor and maximumRatioForColor.
Default value if not provided: "#00ffff" (CYAN).

colorRatioMax = HEX_COLOR_CODE
This parameter defines higher color of the color gradient for the peptide nodes.
The peptides nodes will be shown with a color depending on their final peptide node ratio value and the ratio range defined by minimumRatioForColor and maximumRatioForColor.
Default value if not provided: #00ffff (CYAN).

colorNonRegulatedPeptides = HEX_COLOR_CODE
If provided, this parameter defines the color of the peptide nodes that result to be non significatively (up/down)-regulated.
If not provided, the peptide nodes will be colored using the color scale defined by colorRatioMin and colorRatioMax.

proteinNodeWidth = NUMERIC_VALUE
This parameters defines the width of the protein nodes in the Cytoscape network view.
Default value if not provided: 70.

proteinNodeHeight = NUMERIC_VALUE
This parameters defines the height of the protein nodes in the Cytoscape network view.
Default value if not provided: 30.

peptideNodeWidth = NUMERIC_VALUE
This parameters defines the width of the peptide nodes in the Cytoscape network view.
Default value if not provided: 70.

peptideNodeHeight = NUMERIC_VALUE
This parameters defines the height of the peptide nodes in the Cytoscape network view.
Default value if not provided: 30.

proteinNodeShape = SHAPE_TYPE
This parameter defines the shape of the protein nodes in the Cytoscape network view. The possible values of this parameter are:
ELLIPSE, RECTANGLE, TRIANGLE, DIAMOND, HEXAGON, OCTAGON, PARALLELOGRAM, ROUNDRECT, VEE
Default value if not provided: ELLIPSE.

peptideNodeShape = SHAPE_TYPE
This parameter defines the shape of the peptide nodes in the Cytoscape network view. The possible values of this parameter were described above.
Default value if not provided: ROUNDRECT.

showCasesInEdges = TRUE/FALSE
If this parameter is set to TRUE, the edges will contain a label showing the classification cases for the protein pair analysis.
This parameter will not have any effect if significantProteinPairAnalysis = FALSE.
Default value if not provided: TRUE.

remarkSignificantPeptides = TRUE/FALSE
If this parameter is set to TRUE, the border of significant peptides nodes and edges will be shown with the highlight color.
Default value if not provided: TRUE.

uniprot_xpath = [XPATH, SUB_XPATH, COLUMNNAME]
New columns to include in XGMML files, so that it can be queried in Cytoscape.
Using the Uniprot XML structure (http://www.uniprot.org/docs/uniprot.xsd), you can specify an specific annotation to include in the XGMML files.
An example of a Uniprot XML file can be found at: http://www.uniprot.org/uniprot/B4P3Q9.xml
Values in this property are a list of triplets enclosed in brackets.
Each triplet is composed by 3 elements (separated by commas):

XPATH: XPath to the XML element containing the information you want to include. It may have a condition value (see examples below)
SUB_XPATH: XPath over the latest (deepest) element in the previous XPATH which indicates the actual value to retrieve from the XML.
COLUMNNAME: name to assign to that property in the XGMML. It will appear as a new column in the data tables in Cytoscape.

You can specify as many triplets as you want (no separation character): _[tri,plet,1] [tri,plet,2] _

Example 1: Consider the following part of one Uniprot XML:

<dbReference type="Proteomes" id="UP000002282">
   <property type="component" value="Unassembled WGS sequence"/>  
</dbReference>  
<dbReference type="GO" id="GO:0005739">  
   <property type="term" value="C:mitochondrion"/>  
   <property type="evidence" value="ECO:0000501"/>  
   <property type="project" value="UniProtKB-SubCell"/>  
</dbReference>  
<dbReference type="GO" id="GO:0005524">  
   <property type="term" value="F:ATP binding"/>  
   <property type="evidence" value="ECO:0000501"/>  
   <property type="project" value="UniProtKB-KW"/>  
</dbReference>

You may be interested on including all the GO term names for your proteins, so you want a new column that for this particular protein will have the value: "C:mitochondrion, F:ATP binding"

In order to tell to PCQ to retrieve those annotations you will have to write a triplet like:

XPATH: dbReference$type=GO
SUB_XPATH: property$value
COLUMNNAME: GO_Term_Name

So it would be:
uniprot_xpath = [dbReference$type=GO, property$value, GO_Term_Name]

Note that the first XPATH indicates the element you are using to filter the annotations, saying that you want all 'dbReference' elements having an attribute 'type' equals to "GO". The second SUB_XPATH indicates the xpath starting on 'dbReference' element, for the data to retrieve so that you want the attribute 'value' of the subelement 'property' of the element 'dbReference'.

Example 2: Consider the following part of one Uniprot XML:

<comment type="subcellular location">`  
    <subcellularLocation>`  
        <location evidence="1">Mitochondrion</location>`  
    </subcellularLocation>`  
</comment>`

You may be interested on including all the subcellular locations of your proteins, so you want a new column that for this particular protein will have the value: "Mitochondrion"

In order to tell to PCQ to retrieve those annotations you will have to write a triplet like:

XPATH: comment$type=subcellular location
SUB_XPATH: subcellularLocation/location
COLUMNNAME: Subcellular_Location

So it would be:
uniprot_xpath = [comment$type=subcellular location, subcellularLocation/location, Subcellular_Location]

Note that the first XPATH indicates the element you are using to filter the annotations, saying that you want all 'comment' elements having an attribute 'type' equals to "subcellular location". The second SUB_XPATH indicates the xpath starting on 'çomment' element, for the data to retrieve so that you want the text value (note that here there is no attribute notation '$') of the subelement location in the subelement subcellularLocation on the element 'comment'.

Proteomics Yates Laboratory

Contact person:
Salvador Martínez-Bartolomé (salvador at scripps.edu)
Research Associate
The Scripps Research Institute
10550 North Torrey Pines Road
La Jolla, CA 92037
Git-Hub profile

Provide feedback

Saved searches

Use saved searches to filter your results more quickly